Introducing Machine Learning for Developers Newsletter (ML4Devs, Issue 1)

Machine Learning for Developers
Hi there,
Thank you for your attention. My aim with this newsletter is:
To curate and create resources for practitioners to design, develop, deploy, and maintain ML applications at scale to drive measurable positive business impact.
In each issue, I will explore a topic from the developer’s point of view, with a few links to relevant resources for you to dig deeper.
In this issue, I want to discuss the experiences of data scientists and developers who build real-world ML applications, and underscore:
  • Modeling is only a fraction of the work; most of the work is rigorous engineering
  • Developers can adapt their engineering skills to ML without needing a math Ph.D.
At present, the Machine Learning (ML) conversation is dominated by exciting modeling techniques and the latest state-of-the-art (SOTA) models. This creates an impression that one must master all the math behind these models to build ML systems. But very few business use cases need SOTA models. About a dozen well-known models suffice for a large number of applications.
It takes a lot more than models to create useful ML applications. It is everything else that can make or break an ML application in production. But these other things are not discussed as often or as much as they deserve.
Like any other software application, the ML journey starts with product and user experience design. It culminates in operations: deployment and monitoring. There are three key disciplines in building ML applications:
  • Software Engineering: continuously design-develop-test-deploy-monitor software systems to achieve business goals,
  • Machine Learning: implement some product features using statistical models, and evaluate whether models are effective in light of new data,
  • Data Engineering: collect, curate, and manage quality, privacy, and security of the data needed for training ML models.
The better we learn this union, and not just ML models, the more likely we are to succeed in making ML work in the real world. Here are my 5 precepts:
  1. Deep use case understanding and domain knowledge lead to better products.
  2. Early app and ML pipeline integration with unified ownership works better.
  3. Automated tests, logs & monitoring speed up bug detection and fixing.
  4. Better data always wins.
  5. Simple, fast, easy to train & explain models rule.
The first 3 are not new to software engineers, who have known them for over two decades. Since in ML the data + model is the logic, the last 2 are in a way saying that simple, fast, easy-to-understand algorithms are more successful in production than complex, asymptotically superior algorithms. This is also not new for developers.
You can see that the engineering considerations in non-ML applications have an echo in ML applications too. You will see the same echo in the experiences of data scientists who build ML applications used by thousands of people.
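To make precepts 3 and 4 concrete, here is a minimal sketch of an automated data-quality check of the kind a developer might run before training. The function name `validate_rows`, the field names, and the allowed range are all hypothetical, chosen only for illustration; a real pipeline would likely use a dedicated validation library and its own schema.

```python
def validate_rows(rows, required_fields, ranges):
    """Return a list of (row_index, problem) for rows that fail basic checks."""
    problems = []
    for i, row in enumerate(rows):
        # Flag required fields that are absent or explicitly None.
        for field in required_fields:
            if row.get(field) is None:
                problems.append((i, f"missing {field}"))
        # Flag numeric values that fall outside the allowed closed interval.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((i, f"{field} out of range"))
    return problems


# Hypothetical training records: one is clean, two have problems.
rows = [
    {"age": 34, "country": "IN"},
    {"age": None, "country": "US"},
    {"age": 212, "country": "DE"},
]

problems = validate_rows(
    rows,
    required_fields=["age", "country"],
    ranges={"age": (0, 120)},
)
print(problems)
```

A check like this can run in the same CI pipeline as ordinary unit tests, so bad data is caught with the same tooling developers already use for bad code.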
Data Scientists who crossed over to Engineering
Eugene Yan (@eugeneyan) is an Applied Scientist shipping ML recommender systems at Amazon. His essay “Unpopular Opinion - Data Scientists Should Be More End-to-End” highlights the issues with data scientists working in silos: diffusion of responsibility and loss in translation. Reading it gave me déjà vu of a time when developers coded in silos, tested ad hoc, and threw code over the wall for testers to write tests and verify the expected functionality. Since then, software development has become agile with CI/CD. We now work in cross-functional teams of product managers, designers, engineers, and DevOps for better communication and early detection of integration issues. So much so that Full-Stack Developer has become a thing.
Shreya Shankar (@sh_reya) is an ML Researcher. She studied at Stanford and interned at Google Brain. She wrote an insightful article based on her experience as an ML engineer: Reflecting on a year of making machine learning actually useful. She emphasizes the importance of data: how adding data and tweaking features can yield better RoI than iterating over different models. She writes, “90% of my work involves modeling-agnostic tasks.” She underscores the importance of testing. She also explains concepts of reproducibility and replicability that are unique to ML applications due to their probabilistic nature.
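As a small illustration of the reproducibility point, the sketch below fixes the random seed so that a train/test split comes out the same on every run. It uses only Python's standard library; the function name `split_dataset`, the default seed, and the toy data are assumptions made for this example and are not taken from the article.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and return (train, test)."""
    # A local generator keeps the split independent of global random state.
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for real training records

train_a, test_a = split_dataset(rows, seed=42)
train_b, test_b = split_dataset(rows, seed=42)

# The same seed and the same input give the same split on every run.
print(train_a == train_b and test_a == test_b)
```

Seeding covers only one source of randomness. Library versions, hardware, and data changes can still affect results, which is part of why reproducibility needs deliberate engineering.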
Chip Huyen (@chipro) is an ML Researcher, also from Stanford, who specializes in MLOps and deploying ML in production. She posted an interesting list of things to learn to become an ML engineer. See how many of these items developers already do:
Chip Huyen
Things I’d prioritize learning if I was to study to become a ML engineer again:

1. Version control
2. SQL + NoSQL
3. Python
4. Pandas/Dask
5. Data structures
6. Prob & stats
7. ML algos
8. Parallel computing
10. Kubernetes + Airflow
11. Unit/integration tests
Chip also speaks of the same 90-10 split:
Chip Huyen
Machine learning engineering is 10% machine learning and 90% engineering.
Developers who crossed over to Machine Learning
Machine Learning libraries and frameworks have matured in the last 5 years. Tooling is evolving rapidly. It all has become very accessible for developers. For end-to-end ownership, developers should stop treating ML models as black boxes.
Today, product managers and developers are expected to have an understanding of distributed system concepts. Tomorrow, the same will be expected for ML.
Santiago Valdarrama (@svpino) has described his experience of this transition in a crisp Twitter thread:
I am not particularly inclined to Math.
I do not have a Ph.D.
I do not like to read research papers.

But I do make a pretty good living working on the Data Science/AI/Machine Learning field.

You can also do it.

Here is how I got here.

I had been an engineer for 15 years before venturing into ML. I wrote about my journey in “An Engineer’s trek into Machine Learning.”
If you prefer structured courses, here are 5 free ML courses and tutorials to start with:
There has never been a better time to learn to build ML applications. I look forward to sharing my experiences and learning from yours.
Please connect on Twitter or LinkedIn, and send your feedback, experiences, and suggestions.
Satish Chandra Gupta @scgupta
