Commendable Sources for the MLOps and ML System Design Journey

2022/03/05 | 6-minute read

MLOps is a newly coined term. As per Wikipedia, “MLOps or ML Ops is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of “machine learning” and the continuous development practice of DevOps in the software field”. Companies are flocking to adopt it, and a myriad of PaaS startups now offer end-to-end MLOps services. The MLOps market is projected to reach $126 billion by 2025.

I want to highlight a few resources that are useful for the MLOps journey. To gather them, I reached out to many people, searched GitHub and Reddit, joined multiple Discord channels, and followed many researchers on Twitter. I hope you find them useful as well. If you have anything to add to this ever-growing list, please feel free to leave a comment.

CS 329S: Machine Learning Systems Design

This course is, without question, a unique, first-of-its-kind machine learning systems design course. It is taught by Chip Huyen at Stanford. It covers all aspects of ML system design with practical examples and explains why each one matters. I found the lecture notes comprehensive for refreshing ML system design basics. We should be grateful to the course staff for making it available for FREE.

Hidden Technical Debt in Machine Learning Systems

Anyone working in ML system design or interested in MLOps has likely come across this paper or references to it. Many blogs credit this 2015 paper from Google with highlighting the importance of infrastructure in real-world ML systems. It explains that the maintenance costs of real-world ML systems are massive, and that the ML code is only a small part of a much larger system from which it cannot be decoupled.

Machine Learning Operations by DTU MLOps

One of the most detailed MLOps courses is available for FREE from DTU. The course focuses on exercises, with an emphasis on practical tools and coding skills for machine learning in production. It introduces a number of coding practices that help you organize, scale, monitor, and deploy machine learning models in either a research or a production setting. It also provides hands-on experience with a number of frameworks, both local and in the cloud, for working with large-scale machine learning models.

The course objectives are to:

Organize code efficiently for easy maintainability and shareability (Git, code structure, debugging, profiling, experiment logging)
Understand the importance of reproducibility and how to create reproducible containerized applications and experiments (Docker, config files)
Use version control to collaborate efficiently on code development (GitHub, DVC)
Apply continuous integration (CI) and continuous machine learning (CML) to automate code development
Debug, profile, visualize, and monitor multiple experiments to assess model performance (GCP Monitoring)
Use online cloud-based computing services to scale experiments (local and cloud deployment, data drifting)
Demonstrate knowledge of different distributed training paradigms in machine learning and how to apply them (distributed data loading, distributed training)
Deploy machine learning models, both locally and in the cloud

Bonus point: MEMES!!

Machine Learning Ops Roundup

MLOps Roundup is a monthly newsletter that brings together the best articles, news, and papers about MLOps. It is curated by Nihit Desai and Rishabh Bhargava. I have always found it very informative and a one-stop place for MLOps news and advancements. At the time of writing this part of the blog, 30 issues have been published, filled with a ton of information. It is a treat for an MLOps nerd!

University of Cincinnati Business Analytics Course

If you are an R nerd looking for a quality course in machine learning, UC R Programming is the place. It has great explanations of basic algorithms such as linear regression, naive Bayes, and random forests. Its Preparing for Regression Problems section crisply introduces the concepts necessary for any type of supervised machine learning model.

ML System Design by University of Minnesota

The ML System Design course is offered by the University of Minnesota and taught by Hamza Farooq. Hamza is a Data Science Manager at Google and an adjunct professor at the University of Minnesota. When I last emailed Hamza, he said that the course material would be made available on GitHub.

Machine Learning Design Patterns

It is one of the great books on MLOps. It provides solutions to common challenges in data preparation, model building, and MLOps. The design patterns capture best practices and solutions to recurring problems in machine learning. The book explains 30 patterns for data and problem representation, operationalization, repeatability, reproducibility, flexibility, explainability, and fairness.

It will help you learn how to:

Identify and mitigate common challenges when training, evaluating and deploying ML models
Choose the right model type for specific problems
Deploy scalable ML systems that you can retrain and update to reflect new data
Interpret model predictions for stakeholders and ensure models are treating users fairly

Machine Learning Operations

It describes an end-to-end ML development process to design, build, and manage reproducible, testable, and evolvable ML-powered software.

A few noteworthy reads on the website:

Motivation for MLOps
End-to-End ML workflow lifecycle
MLOps Principles
CRISP-ML
ML Model Governance

Made With ML

As the repository states, “Learn how to apply ML to build a production-grade product and deliver value.” It is a great resource for software engineers, data scientists, and product managers. Its content is hands-on, intuition-first, grounded in software engineering, and focused yet holistic. It covers:

Purpose
Data
Modeling
Scripting
Interfaces
Testing
Reproducibility
Production

Designing Machine Learning Systems by Chip Huyen

DMLS: An Iterative Process for Production-Ready Applications. This book covers topics such as:

Data engineering and choosing the right metrics
Automating continuous development, continuous evaluation, and continuous deployment
Developing a production monitoring system
Architecting an ML platform and building responsible ML systems

MLOps Zoomcamp

It teaches the practical aspects of productionizing ML services, from collecting requirements to model deployment and monitoring. The course has just started!

ML in Production

It showcases best practices for building real-world machine learning systems in production. Its content helps readers build, deploy, and run ML systems.

Large Language Models

Stanford CS324 course. It teaches the fundamentals of the modeling, theory, ethics, and systems aspects of large language models, and provides hands-on experience working with them.

Implementing MLOps in the Enterprise

This practical guide helps your company bring data science to life for different real-world MLOps scenarios.

Learn the MLOps process, including its technological and business value
Build and structure effective MLOps pipelines
Efficiently scale MLOps across your organization
Explore common MLOps use cases
Build MLOps pipelines for hybrid deployments, real-time predictions, and composite AI
Learn how to prepare for and adapt to the future of MLOps
Effectively use pre-trained models from providers like Hugging Face and OpenAI to complement your MLOps strategy

Product Management for AI

It covers:

What you need to know about Product Management for AI
Practical skills for the AI Product Manager
Bringing an AI Product to Market
AI Product Management after Deployment

Harvard CS197: AI Research Experiences

A free offering from Harvard. This course is designed not only to teach you how to use current technologies and tools in the field of artificial intelligence, but also to develop a comprehensive understanding of what it means to be a successful AI researcher. Through this course, you will learn how to effectively read and understand research papers, generate innovative ideas, and present your ideas through written papers or presentations. Additionally, you will gain valuable skills in project management and effective communication within a team, which are essential for top AI researchers.

I will keep adding resources to this post! Thank you for reading.

Thank you!


Ashish Tele