
How DEEP is your Data Science project?



 

 

One of the fun parts of coaching people in the Data Science/Machine Learning field is that we get loads of questions, not just about data science and ML, but also about their jobs, their managers and sometimes even family issues.

Last weekend (early Saturday morning), I got a call from one of my students/mentees. He started off with:

“…, I am frustrated with my Data Science job … I get blamed for every project failure … I don’t want to be a Data Scientist … I wanna quit.”

I was taken aback. If I remember correctly, he was one of the smartest and brightest chaps. He was one of the geeks in Machine Learning and specifically an expert on neural nets.

Well, I slowly and steadily calmed him down and started asking questions (in a ‘5 Whys’ manner). Something serious came out of the discussion. It looks like his management had provided some data and asked him to come up with proposed solutions. No problem statement, no goal statement, no project objectives, no understanding of the business domain, no process flow diagrams, no knowledge of the data source, no data dictionary, nothing about the vision.

They believed that if you have the title of Data Scientist, you are a magician: you look into the data and are expected to give the management “solutions” immediately. I wish this were true. On discussing further with him, I found that, due to time pressure, he had jumped directly into solution mode. No data science framework was used, nor was any methodology followed.

Typically, data scientists are so engrossed in building algorithms that they tend to miss the bigger picture. More importantly, we have noticed that several institutions teach data science and machine learning algorithms, but very few give importance to teaching a Data Science methodology or framework.

The call with my mentee compelled me to write the piece below on a Data Science framework/methodology. I thought of coining it D.E.E.P (no, it’s not Deep Learning).

DEEP stands for Define, Explore, Exploit and Productionize.

Let’s look into each phase in detail.

The Define phase is one of the most critical phases in a data science project. Unfortunately, it is also the phase most neglected by Data Science teams. In Define we do the following

Based on the discussion I had with my mentee (the person who had called in), it looks like corners were cut in the Define phase.

In the Explore phase, we carry out most of the dirty work (oops, sorry, I shouldn’t call it that). This is the phase in which most data scientists try to cut corners.

Exploit is the phase in which the Data Scientist plays with the data, builds the model and, importantly, thinks this is the only value-adding activity that he/she should do (which is not right). Some of the key activities of the Exploit phase are

All looks good: all learnings done, model built, results reported to management and so on. It is time to move the model into the production/live environment with a live data feed. Below are the key activities within the Productionize phase

Lastly, as said in one of the above steps, visualization is the best way of describing information; the above methodology can be summarized in a single view (visual) as below.

 

I hope the above methodology/framework gives us discipline when setting up and developing a Data Science project.

Happy DEEPing your Data Science Project!

Author

Safdar Hussain BE, MS (Oxford, UK), M.Tech (IIT), M.Phil, PGMP (IIM), PMP, Lean Expert, Six Sigma Black Belt (ASQ), MBB (GE), CPP, ITIL V3 Expert, PgMP

Email: Hussain.pmp@gmail.com / safdar.oxford@gmail.com

 
