Applied Machine Learning — Deploying machine learning models using execution patterns

Keld Stehr Nielsen
11 min read · Nov 11, 2020

Until recently, large financial services organizations onboarded data science teams, placed them next to the delivery organization and gave them tools for experimentation. It was as if every organization had to demonstrate for itself that its data, combined with statistical modelling, had the potential to generate business insights. When the derived insights failed to translate into actual, running models continuously delivering business value, frustration grew among business leaders. The recent interest in applied machine learning tooling, including MLOps and DataOps, is a recognition that there is commercial potential to harvest by remedying this executive frustration.

The lesson is that hiring a group of data scientists is no guarantee of commercial success. At a minimum, they must engage in Applied Machine Learning and carefully balance any experimentation or research conducted outside that paradigm. Applied Machine Learning is an engineering discipline aimed at deploying machine learning systems that solve particular real-world problems.[1] Opposed to this stands Machine Learning Research, which experiments in the pursuit of better algorithms or explores the frontiers of industry application. The key principle of Applied Machine Learning is to think execution before experimentation — think of how to execute before what to execute.[2]

We have reached a tipping point, and 2020 appears to be the breakthrough year for the productionization of machine learning models: the literature on the topic is growing, MLOps is becoming a familiar term among executives, and software providers are crowding the field.[3] Digital natives have talked about topics within Applied Machine Learning for some years, but over the last year contributions from other organizations have started to grow significantly.[4] Top of mind in many organizations, including my own, is the efficient delivery of machine learning models into an execution setup.

Machine learning programs are essentially code, so how hard could it be for a couple of DevOps engineers to set up the pipeline?

Could data science not just adopt software engineering principles? This is true to some degree, but there are more steps in the machine learning process due to i) the nature of testing statistical models, and ii) the degree of collaboration required. The implications are that the model lifecycle must have more structure defined beforehand and that everyone must understand their role in that lifecycle. The purpose of this article is to describe the steps in the machine learning productionization lifecycle and to explain the concept of an execution pattern, which is the essential mechanism for thinking execution before experimentation. Based on their specific needs and circumstances, organizations should create their own versions of the examples provided, but I hope they can serve as inspiration.

The Machine Learning Productionization Life Cycle

The Machine Learning Productionization Life Cycle is the process of developing and integrating machine learning models into a live setting where they generate value for the business, and then continuously improving them.

To circumvent the experimentation-before-execution mindset, the life cycle starts at the final step of the iteration and views all prior steps from that perspective. The final step is where the model executes in a live setting and serves scores to the intended consumer. Compare how amateur and professional teams prepare for a match. Amateur football teams often play for fun and socializing. They join up during the week, play games against each other and then have a beer afterwards. At the weekend, there is the tournament match. Professional football teams start with the upcoming weekend's match and plan the training so that they are in the best position to win that match. The defence and offence run separate programs during the week while regularly meeting to test collective strength. If time permits, there is a test match before the real match, the only difference being that the result does not count.

The serve step in the machine learning productionization lifecycle has an offline and an online mode. Like the test football match, the offline mode in the serve phase contains the final validation step: when everything is deployed, the test data set and a set of live scores are run and evaluated offline as a final validation that the model executes and scores as intended.

This validation step could be automated, but sometimes the criteria and timing for moving from offline to online mode are holistic and contextual. For example, an updated model might confuse downstream consumers if it generates different scores than the old model on the same input, so some communication may need to happen first.
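As an illustration, here is a minimal sketch of what such an offline validation could look like, assuming a model object with a predict method and a set of reference scores produced during development (all names are illustrative, not part of any specific platform):

```python
import numpy as np


def offline_validation(model, test_features, expected_scores, tolerance=1e-6):
    """Re-score the held-out test set in the deployed environment and
    compare against the scores produced during development."""
    deployed_scores = model.predict(test_features)
    max_deviation = np.max(np.abs(deployed_scores - expected_scores))
    if max_deviation > tolerance:
        raise RuntimeError(
            f"Deployed model deviates from development scores "
            f"(max deviation {max_deviation:.2e}); keep serving offline."
        )
    return True  # safe to consider switching to online mode
```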

To arrive at the final serve step, integration tests run on the execution package. The platform team exclusively owns this step: it consists of the tests the platform team needs to run in order to allow the model onto the platform. Integration tests execute when triggered, and the flow is automatic, because the execution package adheres to the standards set by the execution pattern. The equivalent in football is the formation, e.g. 4–4–2 or 3–4–3, which defines the structure of offence, midfield and defence.
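The article does not spell out the platform team's tests, but a hedged sketch of one automated check against an execution pattern's standards might look as follows; the required file names are assumptions for illustration:

```python
from pathlib import Path

# Illustrative assumption: the execution pattern mandates these files.
REQUIRED_FILES = ["model.pkl", "score.py", "metadata.yaml", "requirements.txt"]


def integration_check(package_dir: str) -> list[str]:
    """Return a list of violations; an empty list means the execution
    package conforms to the pattern and can proceed to deployment."""
    root = Path(package_dir)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]
```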

The key principle of Applied Machine Learning is to think execution before experimentation — think of how to execute before what to execute.


Based on the requirements description, it should be possible to determine which specific execution pattern to use. There is no good analogy in football for this way of determining the formation. However, in applied machine learning, a client or business stakeholder should provide the following types of requirements, all made explicit during the requirements step (a sketch of how such a description could map to a pattern follows the list):

· Functional requirements, including what should be predicted, scope, performance metric, level of quality and precision, degree of explainability, data governance

· Non-functional requirements, including execution timing, availability, destination and format, access and segregation

· Project requirements, including cost and time to market
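As the forward reference above suggests, here is a minimal sketch of how a requirements description could be mapped to an execution pattern. The pattern names anticipate the ones discussed later in the article (experimentation, big data, inline), while the "standard_batch" default and the specific fields are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Illustrative subset of non-functional requirements
    execution_timing: str   # "scheduled" or "on_demand"
    needs_live_model: bool
    uses_big_data: bool


def select_execution_pattern(req: Requirements) -> str:
    """Map a requirements description to one of four illustrative patterns."""
    if not req.needs_live_model:
        return "experimentation"
    if req.uses_big_data and req.execution_timing == "scheduled":
        return "big_data"
    if req.execution_timing == "on_demand":
        return "inline"
    return "standard_batch"  # assumed default pattern
```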

The functional requirements determine the direction of experimentation and are usually the focus for data science teams. In applied machine learning, however, the non-functional and project requirements take priority. The non-functional requirements determine which kind of experimentation is acceptable, and the project requirements determine the acceptable amount of time spent on experimentation.[5]

Based on the requirements provided, the team forms and work begins. Building machine learning models requires someone to get data to train the model and to build a pipeline so that input data can be served to the system when live. It requires someone to design the model algorithm that will take input and generate appropriate output. It requires someone to configure the IT, and perhaps even order the system, where the model will run. These are the three work streams. On one side of the analogy, there is data, algorithm and IT. On the other, defence, midfield, offence. Who is who?

Anyway, before these work streams take off, the team and client have to agree on the execution pattern. The next section will provide some examples of the execution patterns we are currently developing in Danske Bank.

The execution pattern

An execution pattern is a predefined template that describes the required and acceptable output from each of the work streams, such that these can be integrated and productionized in the available setup; when all the components satisfy the respective requirements of the execution pattern, they combine into a well-defined and executable execution package that can be automatically integrated and deployed.

The platform team managing the execution platform must implement a set of execution patterns meeting the needs of the organization and available on their platform (that makes the platform team analogous to the football coach). This requires understanding the organization's ambitions and the use cases that have proven successful within the industry. For example, working in the financial services industry, we are currently aiming for the four execution patterns seen in the diagram below.

Experimentation is included as a special execution pattern. This is the pattern for cases where there is no requirement for a live model, for example when a client requests explorative research or a one-time analysis. There can be good reasons why an organization allows free experimentation within machine learning, but if so, make an explicit distinction between open-ended experimentation and experimentation that is part of a process targeting productionization. In the experimentation pattern, the lifecycle does not proceed to the integration phase, and the team working within it must be aware that productionization is not possible.

Each execution pattern in the serve pattern branch requires adherence to a particular file structure and code formatting standards defined and owned by the platform team, who in return takes responsibility for the deployment and ongoing execution. Non-functional requirements cannot exceed the levels provided by the execution patterns without generating additional work. In practice, this means that the platform team has to interact continuously with clients regarding requirements, as well as with the teams working on the data, algorithm and IT configuration. This has to happen in order to ensure the execution patterns are coordinated with the needs and preferences of these teams.

In the case at hand, there are four execution patterns. The current conclusion from dialogue with business stakeholders, applied machine learning team members, and owners of consuming systems is that these four will satisfy the current demand. The demand in this case ranges from scheduled execution of complex big data models to simple on-demand execution of inline models, where all information needed for the calculation is part of the original call. The big data execution pattern is very comprehensive, whereas the inline execution pattern is simpler in the sense that it involves fewer steps. Currently, the inline pattern only specifies the folder structure, the execution file structure including utilities (modelbase, logger, SQL connection), the naming convention, metadata information, and the model packaging and delivery guide.
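The inline pattern's specification is not reproduced here, but based on the elements just listed, an execution package might be laid out roughly like this; apart from the named utilities (modelbase, logger, SQL connection), all file and folder names are illustrative assumptions:

```
churn_model_v1/                  # naming convention, e.g. <model_name>_v<version> (assumed)
├── score.py                     # entry point executed by the platform
├── model/
│   └── model.pkl                # serialized model artefact
├── utilities/
│   ├── modelbase.py             # shared base class supplied by the pattern
│   ├── logger.py                # standard logging setup
│   └── sql_connection.py        # standard SQL connectivity
├── metadata.yaml                # metadata information: owner, version, consumers
└── requirements.txt             # pinned dependencies for packaging and delivery
```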

Execution patterns are essential irrespective of whether the delivery organization is centred around one diverse team containing all roles or whether there are several teams spread across the larger organization. They are essential because they contain the structure needed to collaborate and combine multiple disciplines. The creation of a deployable machine learning model requires data engineering, machine learning modelling and configuration of IT for execution pipelines. In organizations of a certain size, competencies are often specialized, so they sit with different people and sometimes even with different teams. In this setup, the execution patterns facilitate collaboration, because they allow each team to focus on its individual contribution while still collaborating with the other teams — they allow the defence, midfield and offence to form a unity. In organizations where work is concentrated in a single team containing all the roles, there is sometimes not a deep level of expertise within all steps. Here the execution pattern ensures the required quality level on all dimensions — it tells the practitioners when they are playing defence and when they are playing offence.

Designing an execution pattern

The team building the execution platform must lead the design phase for an execution pattern. They need to sign off that adherence to the execution pattern will make a model deployable. Furthermore, they define the demarcations and interfaces between the platform that executes the model and the solutions delivering data, at one end, and consuming the scores, at the other.

The execution platform team must involve the data and model teams in the design phase. Approaching the productionization 'problem' as a purely software engineering problem simplifies it too much. It is about bringing developed code to production, but software-centric tests are not the only tests needed: there is also extensive testing of the statistical performance of the model on the defined features. It is generally not efficient to do all the model-related testing first and then the code and integration related testing afterwards, even though this would make a division of labour much easier. You will want to test your code before you train and validate the model, because otherwise you risk running very slow code. For the more complex execution patterns, you might even need some level of integration testing during development.
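To make the point about testing code before training concrete, a common tactic (an assumption here, not something prescribed by the patterns) is to smoke-test the full training code path on a tiny sample before committing to a long run:

```python
def smoke_test_pipeline(train_fn, data, sample_size=100):
    """Run the full training code path on a tiny sample first, so that
    bugs and pathologically slow code surface in seconds, not hours."""
    sample = data[:sample_size]
    model = train_fn(sample)  # must complete quickly on the sample
    assert model is not None, "training returned no model"
    return model
```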

On one side of the applied machine learning — football analogy, there is data, algorithm and IT. On the other, defence, midfield, offence. Which is which?


Data activities warrant a separate, independent discussion. In the above framework, the activities falling within this category are data collection and preparation, as well as feature engineering.

During data collection and preparation, the analyst collecting the data must ensure that it is accessible, sizeable, useable, understandable and reliable. This includes ensuring that the data can be accessed for live scoring so that a serving setup is possible, as well as producing a labelled set of examples that can be used for training, validating and testing the machine learning models. Furthermore, the data must be stored, formatted and versioned, with appropriate security measures applied.
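A hedged sketch of what a few of these checks could look like for a tabular data set, using pandas; the thresholds and the label-column convention are illustrative assumptions:

```python
import pandas as pd


def basic_data_checks(df: pd.DataFrame, label_column: str, min_rows: int = 10_000):
    """Illustrative checks for the sizeable/useable/reliable criteria."""
    assert len(df) >= min_rows, "data set too small to train on"        # sizeable
    assert label_column in df.columns, "labelled examples are missing"  # useable
    missing_share = df.isna().mean().max()
    assert missing_share < 0.5, "a column is mostly missing values"     # reliable
```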

Feature engineering is the process of designing and then implementing the programming code that transforms the raw data into a feature vector that can be consumed by the candidate machine learning model. Techniques in this context include normalization, numerical transformation and enrichment via indirect data.
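As a minimal illustration of such a transformation, here is a sketch using scikit-learn; the column names are hypothetical:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw columns: monetary amounts get a numerical
# transformation (scaling); the customer segment is one-hot encoded.
feature_pipeline = ColumnTransformer([
    ("scale", StandardScaler(), ["transaction_amount", "account_balance"]),
    ("encode", OneHotEncoder(handle_unknown="ignore"), ["customer_segment"]),
])
# feature_vector = feature_pipeline.fit_transform(raw_dataframe)
```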

Data collection and preparation, as well as feature engineering, will take a significant part of their requirements from the model developer teams rather than directly from the client requirements. Furthermore, these requirements will necessarily be specified iteratively as the model developer and data engineer collaborate to understand the predictive power of the data sets in combination with the model design. In fact, the most common practice is probably that feature engineering is an integrated part of the model development — sometimes this is the case for all data activities except the ingestion of raw data to the data platform.

The tight coupling between data activities and model engineering might make sense because the interaction loop is tight, but they should be seen as distinct disciplines irrespective of how they are organized. Creating a pipeline of engineered data follows the same standards as ETL, while the machine learning context adds activities such as labelling and checking for data leakage.
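Checking for data leakage is one of those machine-learning-specific additions. Here is a minimal sketch of one simple check, testing for entity overlap between training and test sets on a hypothetical identifier column:

```python
def check_row_leakage(train_df, test_df, key_column="customer_id"):
    """Fail if any entity appears in both training and test data, a
    simple form of leakage that inflates offline performance."""
    overlap = set(train_df[key_column]) & set(test_df[key_column])
    if overlap:
        raise ValueError(f"{len(overlap)} entities leak from train to test")
```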

The introduction of execution patterns is critical because it forces the machine learning teams to consider deployment at the very start of their projects. This way the vicious cycle of not being able to productionize is avoided.

The Algorithm activities

When the vantage point is machine learning experimentation and you only start to think about deployment and industrialized execution afterwards, you get into a dreaded vicious cycle where it takes weeks, months or even years to deploy a model.

The Deploy Phase

Literature

[1] Andrew Ng makes the distinction between Artificial General Intelligence (AGI) and Artificial Narrow Intelligence (ANI). The latter is very much related to Applied Machine Learning.

[2] Applied Machine Learning is, however, not just a discipline and an accompanying set of tools. It is also a mind-set that practitioners must adopt in place of the elitist approach not uncommon among data scientists.

[3] See e.g. IDC Marketscape: Worldwide Advanced Machine Learning Software Platforms 2020 Vendor Assessment.

[4] See the list of literature in this article's final section.

[5] The implication is that the IT team members need to be involved in the dialogue from the beginning because otherwise the non-functional requirements are at risk of being forgotten.
