Cloudera Shares Customer Lessons on How to Scale Production Machine Learning

November 26, 2019

Research By: Natalia Modjeska, Info-Tech Research Group

To make machine learning (ML) repeatable and scalable, you need to invest in serving infrastructure (the “last mile”), ML operations, and governance, says Cloudera’s Sr. Product Manager Alex Breshears in the MIT-Cloudera webinar “How to Scale Production Machine Learning in the Enterprise.”

In the webinar, Breshears shared key challenges and lessons learned from Cloudera customers who have built large-scale production ML systems.

The webinar also featured Tom Davenport, a distinguished professor and author of several books including Competing on Analytics and The AI Advantage: How to Put the Artificial Intelligence Revolution to Work.

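Many organizations experimenting with AI and ML learn very quickly that ML models make up only a small fraction of real-world ML systems – the small black box in the middle of the diagram below, said Breshears, citing a diagram from a paper by Google researchers. Production ML requires a lot more: in that diagram, the model code is surrounded by much larger components such as configuration, data collection, feature extraction, serving infrastructure, and monitoring.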

Courtesy: Sculley, D. et al. “Hidden Technical Debt in Machine Learning Systems”, NIPS 2015

Organizations intending to put ML into production and run it at scale need to invest in the following:

Serving infrastructure: How will the output of an ML model be served to its consumers or integrated with applications they are using? (The “last mile” delivery of ML predictions.)
Model operations and monitoring at scale: Models need to be packaged, deployed, monitored for performance and drift, and retrained periodically. With only a handful of models, you can do that manually. With thousands of them in production, as at Cisco Systems (the example Davenport gave), you’ll need ModelOps. Cisco has gone from a few models in production to 60,000 sales propensity models covering 160 million of its customers. The only way to achieve this without hiring an army of data scientists was to create a “model factory.”
ML governance: You will also need to think about – and plan for – model security, model governance, a model catalogue, and related controls.
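
To make "monitored for drift" more concrete, here is a minimal sketch of one automated check a model factory might run per model. It is not taken from the webinar and is not Cloudera's or Cisco's method. It assumes drift is measured with the Population Stability Index (PSI) on a single numeric feature, and that a PSI above 0.2 (a common rule of thumb, not a standard) triggers retraining. The function names `psi` and `needs_retraining` are hypothetical.

```python
import math
from typing import Sequence


def psi(baseline: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width and derived from the baseline (training-time) sample.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def needs_retraining(
    baseline: Sequence[float], current: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag a model for retraining when drift exceeds the assumed threshold."""
    return psi(baseline, current) > threshold


if __name__ == "__main__":
    training_sample = [i / 100 for i in range(1000)]
    shifted_sample = [v + 5 for v in training_sample]  # simulated drift
    print(needs_retraining(training_sample, training_sample))
    print(needs_retraining(training_sample, shifted_sample))
```

Run on a schedule across every deployed model, a check like this replaces manual inspection, which is the point of ModelOps at the scale described above. A real deployment would also track prediction quality, not only input distributions.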

Our Take

To achieve scale with ML and truly start reaping its benefits by embedding it everywhere, you will need to automate as many components in the ML development and deployment lifecycle as possible. While production ML projects are largely custom, a platform like Cloudera (and other tools – see “Want to Know More?”) can help you achieve that automation.