Simon Stiebellehner is a lecturer in Data Mining & Data Warehousing at University of Applied Sciences Vienna and Lead MLOps Engineer at Transaction Monitoring Netherlands (TMNL). This article is part of the series “The MLOps Engineer” and originally appeared on Simon's Medium.
In this article, you’ll read about:
- Why failing is part of the game when doing data science,
- How you can minimize the blast radius of failure,
- Why vertical prototypes are a key ingredient in this,
- How a “golden path” makes vertical prototyping scalable, and
- What MLOps has to do with the “golden path”.
TL;DR? >> Fast track to the gist:
- Failure is inherent in doing data science.
- However, we have some options to reduce the blast radius of failure.
- Besides structuring the product development process appropriately, making the vertical breakthrough happen fast is key to detecting risks early and being able to iterate on the model development as close to a production setting and to the stakeholders as possible.
- Of course, developing vertical prototypes early can be expensive if your model development teams always need to take care of all the infrastructure, deployment and serving overhead.
- This is why the MLOps paradigm and, at its core, the establishment of a golden path — a set of processes, tools, templates and training — is crucial.
- It enables model development teams to push out vertical prototypes fast at very little additional cost by reducing the non-model-specific overhead that comes with bringing ML systems (close) to production, ultimately reducing the blast radius of failure in data science.
In for the long read? Great!
Doing Data Science Means Failing (sometimes)
In 2020/2021, when I was building machine learning models at bol.com (the largest e-commerce retailer in the Netherlands), a team colleague and I gave a talk on product development for data science at GoDataFest. In this talk, I explain why I believe that a phased, incremental process of developing and productionising ML systems, in which the “vertical breakthrough” is made early, is critical to eventually delivering business value with machine learning. Below is what we had been applying back then in the team (more on it in the linked talk).
One of the key reasons why we had been drawing up this approach is that Data Science — and with this term I’m more specifically referring to the model development aspects of it — is a highly experimental discipline. Failure — meaning a model not delivering (business) value — is an absolutely normal co-occurrence of doing data science. Bluntly speaking, I’d even argue that it’s in the name of the discipline: “science”, by definition, describes the process of generating knowledge through research — i.e. exploring the unknown. In the case of data science, there are often many unknowns, some critical ones being, for example, data properties (quality, skew…) and true model performance on unseen data “in the wild”.
Consequently, uncertainty and failure are inherent in doing (data) science. The one and only way to really find out whether your endeavor will succeed is putting work into it.
This is one of the key differences between more “traditional” software engineering and data science. When you deal with modeling, every problem is unique because every data set has unique properties that often have large effects on the modeling approach and outcome. Applying well-known, battle-proven model architectures and preprocessing logic increases your chance of success, but it is by no means a guarantee that your model eventually works as expected and that your solution delivers the business value it was envisioned to deliver.
PS: There are several critical differences in the nature of software engineering and developing ML models. These need to be accounted for in organisational structures and team setups to develop robust ML systems that eventually solve business problems. Interested in these topics? Follow me on LinkedIn!
One of the key points I’m advocating in the linked talk is developing “vertical prototypes” (deep, narrow prototypes) of the desired machine learning system as soon as possible, even if the model in it is still a simple baseline that is far from delivering business value.
Concretely, this means we deploy a very first version of our ML system to production (or to a very production-like environment) in the first ~1–2 sprints.
This principle might sound natural to mature software engineering teams following agile principles, such as the Scrum framework, where shipping functioning versions of the product in defined intervals (“releases”) is normal and features are added from release to release. However, in data science, the largest focus in the initial phases of a project is often put on bringing the actual model to a state that ticks the “accuracy” box. Only then do teams usually consider moving it to production. This is a common pitfall: the focus is solely on the model itself, not on the ML system as a whole, which is far more than the model. It is especially common in data science/analytics teams that lack the engineering perspective.
Too bad if you then realize that you’ve spent months chasing some accuracy threshold in your 10-fold cross-validation but the model suddenly fails to deliver the experimental results in a production setting.
That may be because of performance issues, production data looking slightly different, or the solution being made available to stakeholders only after it’s fully done and hence not being what they “really wanted”. This type of failure in data science is a pretty bad one. Not so much because it costs a lot of time and stakeholders might be disappointed, but because it probably could’ve been avoided.
Focusing on developing a vertical prototype very early in the process allows us to sketch out engineering risks (performance, latency, costs, security…) as soon as possible and test model iterations fast and early in a production(-like) setting — also together with business stakeholders. From there we can keep developing the entire pipeline including the model to a production-grade state in an incremental, iterative way, being as close to the technical production setting and to our business stakeholders as possible.
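To make the idea of a vertical prototype more tangible, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the setup used in the talk: the names `MeanBaseline` and `handle_request` as well as the JSON request shape are hypothetical. The point is that a deliberately naive baseline model is already wrapped in a serving layer that validates input and reports latency, so engineering risks show up before any serious modeling happens.

```python
import json
import statistics
import time


class MeanBaseline:
    """Deliberately naive model: always predicts the mean of the training targets."""

    def fit(self, targets):
        self.mean_ = statistics.fmean(targets)
        return self

    def predict(self, rows):
        # One prediction per input row, ignoring the features entirely.
        return [self.mean_ for _ in rows]


def handle_request(model, body: str) -> str:
    """Hypothetical serving layer: parse JSON, validate, predict, report latency."""
    start = time.perf_counter()
    payload = json.loads(body)
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list) or not rows:
        return json.dumps({"error": "expected a non-empty 'rows' list"})
    predictions = model.predict(rows)
    latency_ms = (time.perf_counter() - start) * 1000
    return json.dumps({"predictions": predictions, "latency_ms": latency_ms})


# End-to-end run: train the baseline, then send it a request as a client would.
model = MeanBaseline().fit([10.0, 12.0, 14.0])
response = json.loads(handle_request(model, '{"rows": [{"x": 1}, {"x": 2}]}'))
print(response["predictions"])
```

Once this thin slice runs end to end, swapping `MeanBaseline` for a better model is an iteration on a working system rather than a first deployment.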
No matter what your 10-fold cross validation promises, only a model that runs in a production setting and whose results can be analyzed and evaluated by the stakeholders tells you the truth.
But isn’t it a lot of effort to develop a vertical prototype, which often requires setting up a lot of infrastructure around it, such as a CI(CD) pipeline, artifact storage and retrieval, deployment and serving logic, and more? What if the model eventually just doesn’t work out to an extent that actually adds value? All the additional work required for the vertical breakthrough would be lost.
That’s true, but these “sunk costs” for the vertical breakthrough should be very small, and on average the value gained through developing and testing the model close to a production setting, as well as getting early stakeholder feedback, should outweigh them by far.
The step from a model prototype to a vertical prototype of the ML system should not be a significant effort. If it is, you are likely suffering from the infamous deployment gap. This is where the Golden Path and MLOps come into play.
Primer Into MLOps
MLOps is the cross-domain field (Machine Learning + Development + Operations) that is concerned with developing infrastructure and tooling that facilitates model development, deployment and operations.
Have a look at the slides of my talk at last year’s (2021) WeAreDevelopers World Congress, where I spent 45 minutes on explaining how MLOps helps manage the complexity of Machine Learning in organizations and why it’s critical to actually getting business value out of machine learning systems.
In a nutshell: if your organization wants to start scaling Data Science/modeling, you need to think about MLOps. Period. MLOps principles and tools help you level up your model development process from various perspectives. They make your teams as well as your infrastructure fit for scaling from ten to hundreds and thousands of models in a controlled, secure, efficient and effective way. If we’d like to put MLOps into one graphic, the below one is a reasonable (but still minimal) stab at breaking it down.
I like to think about MLOps as a platform that’s used in model development, deployment and operations, spanning all the way from prototyping to production, but with emphasis on the production end of the scale. Its ultimate purpose is to make data science scalable.
MLOps paves the Golden Path
One of the main things that your MLOps team should be busy with is “paving the golden path”. The “golden path” refers to the supported way that models should (or even must) be brought to production. Paving it typically requires developing a set of processes, documentation, tools, packages and templates that together make model deployment and operations easy and fast for the user (data scientists, MLEs).
Model developers should be able to focus on what they can do best and what they typically like to do most: developing great models.
What the golden path concretely looks like depends on a variety of factors, some of them being technical (e.g. tech stack), others being skills (e.g. what roles/skills exist in the model development teams) or even cultural (e.g. team autonomy vs. centralization). Therefore, golden paths may look fairly different depending on the organization. Typically, they’re characterized by the following points:
- Generalizable infrastructure work is stripped away from the duties of the model development teams through tooling.
- There’s a balance between flexibility and enforced standardization that fits the organization’s risk appetite, types/heterogeneity of models, company culture as well as teams’ skill sets.
- Documentation and training are in place: data scientists and machine learning engineers are aware of what the golden path looks like, what it requires the model to comply with and how the path can be followed.
- Time-to-deployment of models is significantly reduced.
- It’s actually followed in practice and model development teams do not try to work around or dodge it.
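As an illustration of how “enforced standardization” could look in tooling, the following is a minimal sketch in plain Python. Everything in it is an assumption made for the example: `ModelPackage`, `REQUIRED_METADATA`, `check_compliance` and `deploy` are hypothetical names, and the rules they encode stand in for whatever your organization decides a model must comply with.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical organization-wide standard: every model must declare these fields.
REQUIRED_METADATA = ("name", "version", "owner")


@dataclass
class ModelPackage:
    """What a model team hands over to the (hypothetical) golden-path tooling."""
    predict: Callable                                # rows -> one prediction per row
    metadata: dict = field(default_factory=dict)
    smoke_rows: list = field(default_factory=list)   # tiny sample for a smoke test


def check_compliance(package: ModelPackage) -> list:
    """Return human-readable violations; an empty list means compliant."""
    violations = []
    for key in REQUIRED_METADATA:
        if not package.metadata.get(key):
            violations.append(f"missing metadata: {key}")
    if not callable(package.predict):
        violations.append("predict is not callable")
    elif not package.smoke_rows:
        violations.append("no smoke-test rows provided")
    else:
        outputs = package.predict(package.smoke_rows)
        if len(outputs) != len(package.smoke_rows):
            violations.append("predict must return one output per input row")
    return violations


def deploy(package: ModelPackage) -> str:
    """Stand-in for the real deployment step: refuse non-compliant packages."""
    violations = check_compliance(package)
    if violations:
        raise ValueError("not on the golden path: " + "; ".join(violations))
    return f"deployed {package.metadata['name']}:{package.metadata['version']}"


good = ModelPackage(
    predict=lambda rows: [0.5 for _ in rows],
    metadata={"name": "churn", "version": "0.1.0", "owner": "team-a"},
    smoke_rows=[{"x": 1}, {"x": 2}],
)
print(deploy(good))
```

The model team only supplies a `predict` callable, some metadata and a few sample rows; the checks and the deployment step live in shared tooling. How strict the checks are is exactly the flexibility-versus-standardization trade-off described above.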
You cannot buy the “golden path” off the shelf although some marketing material of MLOps platform providers might make it seem so. The golden path is composed of a palette of software products, glue code, processes and documentation. Its continuous development is typically at the very core of the MLOps team.
Developing a path that actually adds value is a complex endeavor with potentially a diverse set of stakeholders. It should be broken down into various loosely coupled components/products with clear interfaces and dedicated release cycles. It requires comprehensive understanding of user (model developers’) needs and workflows. It often encodes opinionated rules about how model developers should interact with resources such as data or model artifacts. It’s the manifestation of the decision in the trade-off between flexibility and standardization. It requires training teams and users, good documentation and high-quality examples.
The golden path is the heart of the MLOps team. With growing data science efforts, its components can quickly become the backbone for the entire model landscape. This needs to be accounted for during the design process as the blast radius of things going wrong can be significant. Changing patterns in the golden path, e.g. how model deployment is done, can potentially impact hundreds or even thousands of models.
The earlier you start designing your golden path, the less retrofitting of model pipelines you’ll have to do and the less technical debt you’ll build up.
Again, if your organization considers growing data science efforts, you need to think about MLOps and the golden path. Now. Period.
Now we know that failure is inherent in doing data science. However, we have some options to reduce the blast radius of failure. Besides structuring the product development process appropriately, making the vertical breakthrough happen fast is key to detecting risks early and being able to iterate on the model development as close to a production setting and to the stakeholders as possible. Of course, developing vertical prototypes early can be expensive if your model development teams always need to take care of all the infrastructure, deployment and serving overhead. This is why the MLOps paradigm and, at its core, the establishment of a golden path (a set of processes, tools, templates and training) is so important. It enables model development teams to push out vertical prototypes fast at very little additional cost by reducing the non-model-specific overhead that comes with bringing ML systems (close) to production, ultimately reducing the blast radius of failure in data science.
In the next article, we’ll dive deeper into the golden path by looking at a typical core component of it: the CI/CD pipeline for model development. We’ll think about things like…
- Centralized pipelines vs. team autonomy
- How to make a CI/CD pipeline fit for the “data sciency” way of working
- Why and how are requirements different to traditional software engineering?
> Do you have feedback or would you like to chat? Message me!
- Talk recording “Product Development for Data Science” by Simon Stiebellehner and Melissa Perotti at GoDataFest 2020: https://www.youtube.com/watch?v=Sa430abUnKQ&t=2706s&ab_channel=GoDataDrivenAmsterdam
- Talk slides “Effective Machine Learning: Managing Complexity with MLOps” by Simon Stiebellehner at WeAreDevelopers World Congress 2021: https://drive.google.com/file/d/1DBFcMmSzi1_xbMHamMCM9iIrjVfX4u9P/view?usp=sharing