
Data Biryani – Simply finger lickin’good

Back in the 1990s, Sunday was the most awaited day of the week. Not just because it was a weekend holiday, but more importantly because it meant mom-made “Biryani” (a South Asian mixed rice dish, generally made with spices, rice and meat, and popular throughout the subcontinent). Simply amazing aroma, and finger-licking good. Very nostalgic.

Biryani Picture (Zomato)

If I recall correctly, my mom spent hours in the kitchen preparing the Biryani. She carried out several preparation activities before she started cooking: ensuring that the meat was well cleaned (with all the fat removed), removing the dried leaves from a bunch of coriander, picking the best tomatoes and onions, removing any pebbles from the rice (yeah, this was a daunting task), and so on. At times, looking at the preparation tasks, I used to get impatient and ask why we needed to do all this, and why we couldn’t just put everything in “As-Is” and start cooking. Patiently, she used to say one thing: “Real Biryani tastes good only when it is cleaned and well preprocessed”.

After many years, I relate this to my “Data Science” profession. Ask any real Data Scientist “Where do you spend most of your time while building models?” and the answer is “the Data Preparation phase”. (Data Preparation is the process of collecting, cleaning, and consolidating data into one file, primarily for use in data modelling/analysis.) It is estimated that 73% of a Data Scientist’s (I mean a real data scientist’s) time is spent on “Data Preparation & Exploration”.

But the question is why am I saying this? What am I trying to say?

Recently I met a Data Scientist (working at an extremely reputed management consulting firm) at a workshop (the Middle East Banking Summit). While having lunch with him (incidentally, both of us were eating and discussing Biryani), I found that this gentleman knew only 2 things in Data Science: “Fit” and “Predict” (many of my Data Science friends would know this). What was more surprising was that he acknowledged that these 2 things were all he did once the data was given to him. I was taken aback. To my further surprise, I met a few more people at the workshop who described themselves as Data Analytics/Data Science professionals, and found that very few had actually carried out the “Data Preparation” phase. It was as good as putting all the Biryani ingredients in “As-Is” and starting to cook.

This prompted me to share some of my experiences of a few of the most common activities in the data preparation phase of any data modelling exercise. Even though some of the activities are quite basic, carrying them out helps with data hygiene (and ultimately with getting better model results).

Example: Defaulters on Auto Loans have increased by 2.3% (Year-On-Year) and the goal is to reduce the default rate by 90%. The data sources would be “Delinquency Table, Customer Transaction Table, Account-360, Social Media association”, etc.
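To make the "collect, clean, consolidate" idea a little more concrete, here is a minimal sketch in plain Python of what preparing two of these sources might look like before any “Fit” or “Predict”. Everything in it is an assumption for illustration: the column names (`customer_id`, `days_past_due`, `monthly_income`), the sample values, and the helper functions are all hypothetical, and filling gaps with the median is just one simple choice among many, not a recommendation for real loan data.

```python
from statistics import median

# Hypothetical rows standing in for two of the sources named above.
# All column names and values are invented for illustration.
delinquency_rows = [
    {"customer_id": " C001 ", "days_past_due": "45"},  # stray whitespace
    {"customer_id": "C002", "days_past_due": ""},      # missing value
    {"customer_id": "C002", "days_past_due": ""},      # exact duplicate
    {"customer_id": "C003", "days_past_due": "0"},
]
customer_rows = [
    {"customer_id": "C001", "monthly_income": "4200"},
    {"customer_id": "C002", "monthly_income": "3100"},
    {"customer_id": "C003", "monthly_income": "n/a"},  # unusable entry
]


def clean(rows):
    """Trim whitespace and drop exact duplicate rows, keeping first-seen order."""
    seen, cleaned = set(), []
    for row in rows:
        tidy = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


def to_number(text):
    """Return a float, or None when the text is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def impute_median(rows, column):
    """Parse a column as numbers and fill gaps with the median of known values.

    Assumes at least one usable value exists in the column.
    """
    parsed = [to_number(row[column]) for row in rows]
    known = [value for value in parsed if value is not None]
    fill = median(known)
    return [
        {**row, column: fill if value is None else value}
        for row, value in zip(rows, parsed)
    ]


def consolidate(left, right, key):
    """Inner-join two tables on a shared key into one flat table."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


delinquency = impute_median(clean(delinquency_rows), "days_past_due")
customers = impute_median(clean(customer_rows), "monthly_income")
model_table = consolidate(delinquency, customers, "customer_id")

for row in model_table:
    print(row)
```

Only after steps like these would the single consolidated table be handed to a model. In practice, a library such as pandas would likely do the same work in fewer lines; the standard library is used here only to keep the sketch self-contained.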

Sample Data Dictionary (provided by Christopher Kalodikis):

https://www.youtube.com/watch?v=kH0bcw9P2Lc

I wish to conclude by saying: “The one secret to better results in Data Modelling is an emphasis on Data Preparation”. In the meantime, enjoy the Data Biryani!

Kindest Regards, Safdar Hussain

https://www.linkedin.com/pulse/data-biryani-simply-finger-lickingood-safdar-hussain/
