A few years back my dad was diagnosed with an inguinal hernia, and his doctors recommended surgery. Because he was nearing 70, he was admitted to the hospital two days before the scheduled surgery for pre-operative tests. A parade of specialists, from the heart surgeon to the anaesthetist, visited him to make sure he was fit for the operation. The night before the surgery, he was given an enema to clear his bowels and was counselled to be mentally prepared, as patients often get frightened about going under the knife. On the day of the surgery, he was taken to the operating theatre and was all set to be put under anaesthesia when the surgeon shouted, "What is his EF?" EF, short for Ejection Fraction, is a measurement, expressed as a percentage, of how much blood the left ventricle pumps out with each contraction. A healthy heart typically has an EF of 55% or more, whereas my dad's was just 40%. Warning that a patient with a weak heart could face serious complications coming out of anaesthesia, the surgeon asked the team to abort the surgery! You did not need a surgeon in the operating theatre to arrive at this decision. The doctors could have caught this well in advance and saved everyone's time, since a surgery requires an array of paramedical staff, such as nurses, to work in a coordinated manner.
Cut to 2018. We were working on a medium-scale data science project that involved sourcing data from multiple vendors. The team had data scientists, product managers and data engineers.
Modelling by data scientists is the third stage in the process. By then, there has to be clarity on how the data is collected or sourced, validation of the data must be complete, and the data must be available in a form that lets modelling happen seamlessly. In this case, 8 weeks into the project, the data scientists unearthed a flaw in the way data was collected from one particular source. Because of this flaw, some of our fundamental hypotheses could not be validated as we had expected. Although the project was salvaged to some extent, for me it was déjà vu. The data scientist was akin to the surgeon: the surgeon had to abort the surgery in the operating theatre, whereas the data scientist nearly had to abort the project at the modelling stage.
It is said that data scientists spend a lot of time cleaning and reorganising data, but that work falls in the realm of data engineers. Data scientists also worry about how the data is collected, but that again falls in the realm of data product managers. If data scientists increasingly have to clean up data, question how it was collected and find basic flaws in it, then the product managers and data engineers are not doing their jobs. Don't expect the surgeon to find out that the blood test looks wrong!
Image Credit: Almondbite3