MLOps for Continuous Integration, Delivery, and Training of ML Models (ML4Devs, Issue 6)
Continuing the theme of “Integrate Early and Iterate Often” from the previous issue, the obvious question is how to do it well. We briefly touched upon ML pipelines there. In this issue, let’s examine how the best in the business do it.
Google has probably been running ML models at large scale for longer than anyone, and they have published their best practices for automating machine learning pipelines. They confirm that only a small fraction of a real-world ML system is the ML code itself. You have probably seen this diagram:
The automated pipeline needs to be built for:
Continuous Integration: Tests for not just code but also for validating data, data schemas, and models.
Continuous Delivery: Deploy not just one (ML prediction) service, but an ML training pipeline that automatically deploys the ML prediction service when desired.
Continuous Training: New and unique to ML for automatically retraining and serving ML models.
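Continuous Integration for ML means the build should fail on bad data, not just bad code. Here is a minimal sketch of such a check; the schema, column names, and records are illustrative, not from any real project:

```python
# Hypothetical CI test: validate incoming training data against an
# expected schema before the pipeline is allowed to train on it.
EXPECTED_SCHEMA = {
    "user_id": int,
    "age": int,
    "clicked": int,  # label: 0 or 1
}

def validate_record(record: dict) -> list:
    """Return a list of schema violations found in one record."""
    errors = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            errors.append(f"{column}: expected {expected_type.__name__}, "
                          f"got {type(record[column]).__name__}")
    return errors

# In CI, checks like this run alongside unit tests, so a bad data drop
# fails the build before a model is ever trained on it.
good = {"user_id": 1, "age": 34, "clicked": 0}
bad = {"user_id": "abc", "age": 34}
assert validate_record(good) == []
assert len(validate_record(bad)) == 2  # wrong type + missing column
```

Real projects would use a dedicated data-validation library instead of hand-rolled checks, but the CI principle is the same.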
That article also defines MLOps maturity levels:
Level 0: Manual process: Train and deploy ML models manually.
Level 1: ML pipeline automation: An automated pipeline for continuous training and continuous delivery of the ML prediction service; however, the ML pipeline itself is deployed manually.
Level 2: CI/CD pipeline automation: Automated ML pipeline deployment.
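The heart of Level 1 is a monitoring loop that triggers retraining automatically instead of waiting for a human. A toy sketch of that idea, with made-up metric names and thresholds:

```python
# Illustrative sketch (not a real framework) of a continuous-training
# trigger: watch the live model's metric and retrain when it degrades.
def should_retrain(live_accuracy: float, baseline_accuracy: float,
                   tolerance: float = 0.02) -> bool:
    """Trigger retraining when live accuracy drops beyond the tolerance."""
    return live_accuracy < baseline_accuracy - tolerance

def continuous_training_step(live_accuracy, baseline_accuracy, retrain_fn):
    """One monitoring tick: rerun the training pipeline, or keep serving."""
    if should_retrain(live_accuracy, baseline_accuracy):
        return retrain_fn()  # e.g., kick off the pipeline on fresh data
    return None  # current model is still good enough

assert should_retrain(0.85, 0.90) is True   # degraded: retrain
assert should_retrain(0.89, 0.90) is False  # within tolerance: keep serving
```

In production, the trigger might also fire on a schedule or on the arrival of a fresh batch of data, not only on metric decay.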
Just a conceptual pipeline does not suffice. We need tools to implement the pipeline. There has been an explosion of MLOps tools. I am listing here only a few prominent alternatives:
Tools are evolving very rapidly. TFX is the most comprehensive but also complex. MLflow and Metaflow are quite mature and not as complex as TFX. It is also common to combine multiple tools with orchestrators like Airflow and Kubeflow.
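Whatever the tool, the common abstraction is the pipeline as a sequence of steps (a DAG) that the orchestrator schedules, retries, and logs. A toy plain-Python sketch of that shape; the step names and values are purely illustrative:

```python
# Toy pipeline-as-code: each function is one step; an orchestrator like
# Airflow or Kubeflow would run these as a DAG with retries and logging.
def ingest():
    return {"rows": 1000}                     # pull fresh training data

def validate(data):
    assert data["rows"] > 0, "empty dataset"  # fail fast on bad data
    return data

def train(data):
    return {"model": "m1", "trained_on": data["rows"]}

def evaluate(model):
    return {**model, "accuracy": 0.91}        # illustrative metric

def run_pipeline():
    data = validate(ingest())
    return evaluate(train(data))

result = run_pipeline()
assert result["trained_on"] == 1000
```

In a real tool, each step would run in its own container or task, with artifacts and metrics tracked between them; the code above only shows the data-flow skeleton.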
My apologies for so many links in this issue. I hope they will be useful, and that you will return to this issue at a later date.