To set the context for MLOps and ML pipelines, let’s step back and recap my three precepts from software engineering that we can apply to ML:
- Consolidate Ownership: Cross-functional team responsible for the end-to-end project.
- Integrate Early: Implement a simple (rule-based) model and develop product features around it.
- Iterate Often: Build better models and replace the simple model, monitor, and repeat.
Consolidate Ownership
The Modeling and Engineering silos can be avoided by setting up a cross-functional team of product managers, developers, data engineers, and data scientists that is responsible for the feature end-to-end.
This setup improves communication: data scientists stay in tune with business needs and production constraints, and developers learn the nuances of using the model.
Integrate Early
Do not jump straight into building an ML model. First, build a skeleton application end-to-end. For the ML component, just implement a dummy rule-based baseline. It is okay if it is not accurate.
Seeing how your ML model will be consumed in a barely functioning system is a superpower.
For the ML component, design application-centric APIs. Even if you have a strong sense of what model you are going to use, resist spilling the model specifics into the APIs. A nice encapsulation will make it easy to swap and experiment with the models.
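To make this concrete, here is a minimal sketch of an application-centric API, assuming a product-recommendation feature; the `Recommender` protocol and `RuleBasedRecommender` class are hypothetical names for illustration, not from any particular library:

```python
from typing import Protocol


class Recommender(Protocol):
    """Application-centric API: the application asks for suggestions for a user.

    Nothing about features, embeddings, or model files leaks into this
    interface, so implementations can be swapped freely.
    """

    def recommend(self, user_id: str, k: int = 3) -> list[str]:
        ...


class RuleBasedRecommender:
    """Dummy baseline: recommend globally popular products, ignoring the user."""

    def __init__(self, popular_products: list[str]):
        self._popular = popular_products

    def recommend(self, user_id: str, k: int = 3) -> list[str]:
        return self._popular[:k]
```

The rest of the application codes against `Recommender`; what runs behind it can change without touching the product code.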
Iterate Often
Integrating early with a dummy baseline model also decouples ML from the development of other parts. While some developers can iterate over and enrich the skeleton application, ML engineers can work on models and always have an end-to-end system to test and experiment with.
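Continuing the same hypothetical sketch, a trained model can later slot in behind the unchanged interface; the `predict_scores` method below is an assumed stand-in for whatever your real model wrapper exposes:

```python
class LearnedRecommender:
    """Drop-in replacement for RuleBasedRecommender, backed by a trained model."""

    def __init__(self, model):
        # `model` is assumed to expose predict_scores(user_id) -> dict[str, float];
        # this is a hypothetical wrapper, not a real library API.
        self._model = model

    def recommend(self, user_id: str, k: int = 3) -> list[str]:
        scores = self._model.predict_scores(user_id)
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [product_id for product_id, _ in ranked[:k]]


# Swapping models is then a one-line change where the recommender is constructed:
# recommender: Recommender = RuleBasedRecommender(popular_products)
# recommender: Recommender = LearnedRecommender(trained_model)
```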
This integration also gives you a baseline for benchmarking evaluation metrics. There are four important metrics:
- Business Success: Impact of the product feature on the business (e.g. the rate at which recommended products are bought). Measure it either implicitly through user actions or by designing a way for users to give feedback.
- Model Performance: Benchmark the effectiveness of the model itself.
- Latency: Time taken in model inference. A perfect model that tests users’ patience is a bad model.
- Data Drift: Used after deployment to monitor whether the distribution of data encountered in the wild is shifting away from the training data (see the sketch after this list).
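As a minimal sketch of a data-drift check, assuming a single numeric feature and a two-sample Kolmogorov–Smirnov test from SciPy; the feature, data, and p-value threshold are purely illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp


def drift_alert(train_feature: np.ndarray, live_feature: np.ndarray,
                p_threshold: float = 0.01) -> bool:
    """Flag drift when a two-sample KS test rejects 'same distribution'."""
    _statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < p_threshold


# Illustrative only: prices seen at training time vs. prices seen this week.
rng = np.random.default_rng(0)
train_prices = rng.normal(loc=20.0, scale=5.0, size=10_000)
live_prices = rng.normal(loc=26.0, scale=5.0, size=2_000)  # shifted distribution
print(drift_alert(train_prices, live_prices))  # True -> investigate
```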
With careful design, it is possible to deliver a great product experience despite not-so-high model performance. For example, showing search or recommendation results only when confidence is above a certain threshold, or showing the top 3 suggestions instead of one, can lead to higher user satisfaction. Careful product design plays a huge role in ML success.
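A small sketch of the confidence-threshold idea, reusing the hypothetical score dictionary from the recommender above; the `suggestions_to_show` helper and the 0.7 threshold are assumptions for illustration:

```python
def suggestions_to_show(scores: dict[str, float],
                        min_confidence: float = 0.7,
                        k: int = 3) -> list[str]:
    """Return up to k product IDs, or nothing if the model is not confident enough.

    `scores` maps product ID -> model confidence. The 0.7 threshold is
    illustrative and should be tuned against the business-success metric above.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_confidence:
        return []  # hide the widget rather than show a low-confidence guess
    return [product_id for product_id, _ in ranked[:k]]


print(suggestions_to_show({"book": 0.85, "pen": 0.80, "mug": 0.72, "hat": 0.10}))
# ['book', 'pen', 'mug']
```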