When to (Not) Use Machine Learning (ML4Devs, Issue 4)
In the previous issue, I discussed why Machine Learning projects fail. In this issue, let’s start figuring out how to build successful Machine Learning products. The first step is to understand when Machine Learning is more effective than traditional programming.
Traditional programs use deterministic logic to solve a problem, whereas Machine Learning is probabilistic: it leverages patterns in data to tune the logic.
For a problem:
If you have deterministic logic that solves it with 100% accuracy, then that is cheaper, easier, and more accurate than any ML model you can build.
If you have some stable heuristic rules that solve it most of the time, the extra work/complexity of ML might not be worth it.
If your heuristics do not reach the desired accuracy and require constant updates, then ML can be a good bet.
Always evaluate the tradeoffs of additional complexity and cost against performance gains to determine if it is really worth it.
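One way to make this tradeoff concrete is to measure how an existing heuristic performs on labeled examples before reaching for ML. The following is only a minimal sketch under stated assumptions: the keyword rule, the sample messages, the target accuracy, and the names keyword_spam_rule, accuracy, and suggest_approach are all hypothetical, and the decision function simplifies the three cases above by treating either low accuracy or frequent rule changes as a reason to evaluate ML.

```python
# Hypothetical illustration: is a heuristic good enough, or is ML worth evaluating?
SPAM_KEYWORDS = ("free money", "winner", "click here")

# Tiny hand-labeled sample (made up for this sketch): (message, is_spam)
LABELED_EXAMPLES = [
    ("Click here to claim your prize", True),
    ("You are a WINNER of free money", True),
    ("Meeting moved to 3pm", False),
    ("Limited offer just for you", True),
    ("Lunch tomorrow?", False),
]


def keyword_spam_rule(message: str) -> bool:
    """A simple heuristic: flag a message if it contains a known spam phrase."""
    text = message.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)


def accuracy(rule, labeled_examples) -> float:
    """Fraction of labeled examples on which the rule agrees with the label."""
    correct = sum(1 for text, label in labeled_examples if rule(text) == label)
    return correct / len(labeled_examples)


def suggest_approach(heuristic_accuracy: float,
                     target_accuracy: float,
                     rules_change_often: bool) -> str:
    """Rough mapping of the cases above to a recommendation (an assumption)."""
    if heuristic_accuracy >= 1.0:
        # Perfect on the sample: treat it as plain deterministic logic.
        return "deterministic logic"
    if heuristic_accuracy >= target_accuracy and not rules_change_often:
        # Stable rules that are good enough: ML may not be worth the complexity.
        return "keep heuristics"
    # Below target or constantly changing rules: ML can be a good bet.
    return "consider ML"


measured = accuracy(keyword_spam_rule, LABELED_EXAMPLES)
recommendation = suggest_approach(measured, target_accuracy=0.9,
                                  rules_change_often=False)
```

In this sample the rule misses the message with no known spam phrase, so it falls short of the assumed 0.9 target and the sketch suggests evaluating ML. A real decision would also weigh the cost of building and running the model.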
Some of the problems that are better solved with ML:
Search result ranking
Mail classification and spam detection
Expected Time of Arrival in maps
Clustering similar news stories
Web/App Ads click-through rate prediction in Google, Facebook, etc.
Product recommendations in Amazon, Netflix, Facebook, Quora, etc.
Personal Assistants like Siri, Alexa, Google Assistant
Fraudulent transaction detection
Customer segmentation and churn prediction
Equipment failure prediction
Network intrusion detection
Text sentiment analysis
It is not possible to design 100% correct logic for these problems. Earlier, many of them were solved with heuristic rules that had to be updated constantly. It is easier and better to collect data and train an ML model instead. (However, these heuristic solutions are a good starting point for collecting the needed data.)
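As one possible reading of how a heuristic can help collect data, the sketch below logs each input with the heuristic's decision and the label later confirmed by the user, producing rows that could be used for training. This is an assumption about the workflow, not a described method: heuristic_is_spam, log_training_rows, the column names, and the sample events are all hypothetical.

```python
import csv
import io


def heuristic_is_spam(subject: str) -> bool:
    """Stand-in for an existing rule-based solution (hypothetical)."""
    return "winner" in subject.lower()


def log_training_rows(events, out_file) -> int:
    """Write one CSV row per event and return the number of rows written.

    `events` is an iterable of (subject, user_marked_spam) pairs, where the
    second item is the label confirmed by the user after the fact.
    """
    writer = csv.writer(out_file)
    writer.writerow(["subject", "heuristic_prediction", "label"])
    count = 0
    for subject, user_marked_spam in events:
        prediction = heuristic_is_spam(subject)
        writer.writerow([subject, int(prediction), int(user_marked_spam)])
        count += 1
    return count


# An in-memory buffer keeps the sketch self-contained; a real system would
# write to durable storage instead.
buffer = io.StringIO()
rows_written = log_training_rows(
    [("You are a winner!", True), ("Quarterly report", False)],
    buffer,
)
```

Keeping the heuristic's prediction next to the confirmed label also gives a baseline to compare a trained model against later.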
Always Start with User Experience
Building an ML product feature, just like building a non-ML feature, starts with thinking about the user experience. In the case of ML, we know that the solution will not be 100% correct, so a graceful failure experience has to be thought through.
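One common way to design for graceful failure is to let the model's confidence decide how much the product relies on a prediction. The sketch below is a hedged illustration only: the Prediction type, the choose_user_experience function, the two thresholds, and the three behaviours are hypothetical, and it assumes the model exposes a probability-like confidence score.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be a probability-like score in [0, 1]


def choose_user_experience(prediction: Prediction,
                           auto_threshold: float = 0.9,
                           suggest_threshold: float = 0.6) -> str:
    """Map model confidence to one of three hypothetical UX behaviours."""
    if prediction.confidence >= auto_threshold:
        # High confidence: act on the prediction directly.
        return f"apply:{prediction.label}"
    if prediction.confidence >= suggest_threshold:
        # Medium confidence: show the prediction and let the user confirm.
        return f"suggest:{prediction.label}"
    # Low confidence: hide the prediction and use the non-ML path.
    return "fallback:manual"


confident = choose_user_experience(Prediction("spam", 0.97))
unsure = choose_user_experience(Prediction("spam", 0.70))
no_idea = choose_user_experience(Prediction("spam", 0.30))
```

The thresholds here are placeholders. In practice they would be tuned against how costly a wrong automatic action is for the user.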
Building an ML feature takes several iterations over the following stages:
Translate product need to an ML problem
Iterate over models
Deploy and monitor
This may feel like a lot to begin with, but there is a wide spectrum of options to choose from depending on the scope of the problem:
Expand existing analytics infra (say, with SparkML or BigQuery ML)
Pick predefined models in AutoML solutions from cloud vendors
Build an ML pipeline and stack from scratch using scikit-learn, TensorFlow, PyTorch, etc.
If your organization is just beginning to do ML, the approach of expanding analytics infra is consistent with the hierarchy of needs for machine learning. In fact, advanced analytics requires data science and machine learning. Especially if your organization deals with tabular and semi-structured data, this approach will require less upfront investment.