Deploying Machine Learning Models at Scale in Cloud // Vishnu Prathish // MLOps Meetup #60
MLOps.community - A podcast by Demetrios Brinkmann
MLOps community meetup #60! Last Wednesday we talked to Vishnu Prathish, Director of Engineering, AI Products, Innovyze.

//Abstract
The way data science is done is changing. Notebook sharing and collaboration were messy, and there was minimal visibility or QA into the model deployment process. Vishnu will talk about building an ops platform that deploys hundreds of models at scale every month: a platform that supports typical MLOps features (CI/CD; separated QA, Dev, and PROD environments; experiment tracking; isolated retraining; real-time model monitoring; automatic retraining with live data) and ensures quality and observability without compromising the collaborative nature of data science.

//Bio
With 10 years of building production-grade, data-first software at BBM & HP Labs, I started building Emagin's AI platform about three years ago with the goal of optimizing operations for the water industry. At Innovyze post-acquisition, we are part of the org building a world-leading water infrastructure data analytics product.

//Takeaways
Why is MLOps necessary for model building at scale?
What are various cloud-based models for MLOps?
Where can ops help at various points in the ML pipeline: data prep, feature engineering, model building, training, retraining, evaluation, and inference?

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vishnuprathish/

Timestamps:
[00:00] Introduction to Vishnu Prathish
[00:16] Vishnu's background
[04:18] Use cases on wooden pipes for freshwater
[04:55] Virtual representation of actual, physical, tangible assets
[06:56] Platform built by Vishnu
[08:30] Build a reliable representation of network
[11:52] Pipeline architecture
[16:17] "MLOps is still an evolving discipline. You need to try and fail many times before you figure out what's right for you."
[17:11] Open-sourcing
[18:17] Platform for virtual twin
[20:02] Entirely Amazon SageMaker
[20:43] Data quality issues
[23:21] Reproducibility
[23:40] "Reproducibility is important for everybody. Most of the frameworks do that for you."
[25:00] Reproducibility as Innovyze's core business
[26:38] Each model is individual to each customer
[27:50] Solving reproducibility problems
[28:24] "Reproducibility applies to the process of training pipelines. It starts with historical raw data collected from customers. In real time, there's also this data being collected directly from sensors coming from a certain pipeline."
[31:55] "Reusable training is step one to attaining automated retraining."
[32:17] Collaboration of Vishnu's team
[36:23] War stories
[41:36] Data prediction
[44:24] "A data scientist is the most expensive hire you can make."
[47:55] 3 Tiers
[48:53] MLOps problems
[52:25] Automatically retraining
[52:34] "Because of the number of models that go through this pipeline, it's impossible for somebody to manually monitor and retrain as necessary. It's not easy, it takes a lot of time."
[54:22] Metrics on retraining
[56:42] "Retraining is a little less prevalent for our industry compared to a churn prediction model that changes a lot. There are external factors that depend on it but a pump is a pump."