Demystifying MLOps: Enhancing Machine Learning Workflow and Addressing Key Challenges
What is MLOps?
Main Source:
https://azure.microsoft.com/en-in/resources/drive-efficiency-and-productivity-with-machine-learning-operations/
MLOps:
- Concept or way of working. Not a product or service
- Approach for streamlining end-to-end ML life cycle
- Does for Data scientists and/or ML engineers what DevOps does for developers
What problems does it address?
MLOps addresses the following gaps:
- DevOps teams strive to automate all their workflows, and for the most part, today they have. However, data scientists/ml engineers are excluded. For example, Git is a perfect tool for building, testing and deploying to a production environment. However, due to the nature of ML projects, Git can be an environment that makes this problematic.
- ML projects may require tuning, scalability, automation, and packaging. These tasks may take months, and the process is iterative. DevOps does not propose a strategy to support this continuity.
Versioning
What is dataset/model versioning?
It is the process of applying different naming of different stages of a data/model.
In which cases would it be useful?
- Facilitating easy sharing
- Collaboration
- Model reuse in an environment that includes comprehensive access control
- Traceability
How does it differ from software versioning?
In ML, the data changes over time, and models need to be retrained regularly, which is not the case for software engineering. If the module requires the requirements in the software engineering field, it is frozen.
ML in Academia vs. Industry
What are the differences between doing ML in academic and industrial settings?
| |
Academia |
Industry |
| Approach to Metric |
Reach an unreached value |
Reach an unreached value with many constraints such as less complexity, and low computational cost. |
| Data |
Encourage to open source |
Confidentiality of both company and customer data |
| Model |
No constrains |
Large models are not preferred (as they will be used in real-time) |
| Deployment |
No need deployment |
Optimization has to be done considering the deployment phase |
| Pipeline |
No constrains |
Long-term and sustainable (Since it has deployment and mass production, it has a long life cycle) |
| Owner |
Individual or Team |
Company |
| Motivation |
Internally motivated |
Product-oriented |
| Money Support |
Rare or few (at least for Turkey) |
Full support from the company |
I assume that the performance refers to the metric in this question. Let’s say you get high accuracy in your test set but not in production. These are the possible reasons:
- There might be a distribution shift:
- The customer uses different data distribution than yours. For example, your test set does not cover your customer’s need: Bilby is an Australian animal, not found in the animal photos you collected in Turkey. But one of your customers is from Australia and unhappy with this situation.
- The customer uses different data structures. For example, your test set for image segmentation problems has a high resolution, ex. 1080. But the customers have low resolution, ex 360.
- There might be a concept drift. (“change in the relationships between input and output data in the underlying problem overtime”)
- Although you consider your customer’s needs and your test set covers perfectly, the concept of that field may change.
If we assume that the performance refers to the needs:
- There might be a computational cost issue:
- Your test environment has a higher and more powerful GPU but the production has poor. Although the model fits in the GPU, the evaluation process takes longer in real time. This could be a poor performance from the customer’s point of view.
- The model output requires memory. Your test environment has high memory, and memory is not an issue for you. But the product has less memory. Let us say after a few usages; the customer could not use the product without deleting the old ones because the memory is full. This could be a poor performance from the customer’s point of view.
How would you address this problem?
The data in which the model fails is examined manually. Try to find a pattern among the data with an error. After finding, a dataset should be created and the model should be retrained with this new dataset. (It is for deep learning. If the data is for machine learning, a discovered or missing feature should be added.)