Even if the ML project failure rate is not really ~80% (failing at the Proof of Concept stage, for example, is a good thing and arguably should not count), the “real” failure rate is quite likely still very high.
This story repeats itself all too often:
- A data scientist is brought in to do ML on the data being collected.
- She discovers that the data is unusable. Either nobody knows what it is, or it is incomplete and unreliable.
- Somehow she manages to clean the data, experiment, and build a model. She has it all in her Jupyter notebook.
- Management considers it done and ready to deploy, only to learn that significant work is needed to take it to production.
- Disappointed, management says, “Okay, do it.” The data scientist replies, “I can’t, engineers have to do it.” And the engineers are like, “Who, me? This math?”
- Nobody yet realizes that it is NOT done even after deployment. The model must be monitored for data drift and retrained.
- Nobody is happy in the end. Management thinks ML is a hoax. The data scientist thinks management doesn’t get it.
ML project failures can happen due to:
- Lack of ownership: waterfall-like “thrown over the wall” handoffs between teams
- Poor problem formulation: solving the wrong problem, optimizing the wrong metrics
- Data access, insufficiency, quality, collection, and curation issues
- Infeasibility or cost of deploying a model
- Lack of model monitoring and maintenance
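
To make the monitoring item a bit more concrete, here is a minimal sketch of one possible data drift check: the Population Stability Index (PSI) between a feature's training-time values and its live values. This is only an illustration, not a recommended monitoring setup. The function name `psi`, the choice of 10 quantile bins, the `eps` floor, and the synthetic Gaussian samples are all assumptions made for the example. It uses only the Python standard library.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    ref_sorted = sorted(reference)
    # Bin edges sit at reference quantiles, so each bin holds roughly
    # the same share of the reference (training-time) data.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Bin index = number of edges at or below v (0 .. n_bins - 1).
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data (hypothetical): one live sample from the training
# distribution, and one whose mean has shifted by one standard deviation.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print("PSI, no drift:  ", psi(train, live_same))
print("PSI, mean shift:", psi(train, live_shifted))
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a shift worth investigating, but these cutoffs are conventions rather than guarantees. In practice such a check would run on a schedule for each important feature, and a high value would prompt a closer look and possibly retraining.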