A curated list of tools, frameworks, platforms, and resources for Machine Learning Operations (MLOps).
MLOps stands at the intersection of machine learning, DevOps, and data engineering. This list is intended for ML engineers, data scientists, DevOps practitioners, and anyone building, deploying, monitoring, and scaling machine learning systems.
- General Resources
- Model Development & Experiment Tracking
- Model Deployment
- Model Monitoring
- Model Governance & Fairness
- Data Versioning & Management
- CI/CD for ML
- Frameworks & Platforms
- Courses & Learning
- Related Awesome Lists
- MLOps Guide by Google – Google's foundational guide to implementing MLOps.
- ml-ops.org – Open-source resource defining MLOps best practices.
- Hidden Technical Debt in Machine Learning Systems (paper) – Seminal research on operational complexity in ML.
- MLflow – Open-source platform for managing the ML lifecycle, including experimentation and reproducibility.
- Weights & Biases – Experiment tracking, model management, and collaboration tools.
- Neptune.ai – Metadata store for ML experiments.
- Comet – Experiment tracking, model optimization, and monitoring.
- Sacred – Lightweight experiment configuration and tracking tool.
- Seldon Core – Deploy machine learning models on Kubernetes.
- KServe (formerly KFServing) – Kubernetes-based model serving with autoscaling and inference graph support.
- BentoML – Framework for serving, optimizing, and deploying ML models.
- MLServer – Fast and lightweight inference server for deploying ML models.
- Triton Inference Server – Scalable GPU/CPU inference server by NVIDIA.
- Evidently – Monitor data drift, model performance, and fairness.
- WhyLabs – Observability for ML models and data.
- Arize AI – ML performance and drift monitoring platform.
- Fiddler AI – Explainable AI and monitoring for production ML models.
- AI Fairness 360 – IBM's toolkit for detecting and mitigating bias in ML models.
- Fairlearn – Python library for assessing and improving fairness.
- Model Cards for Model Reporting – Framework for transparent model documentation.
- Audit-AI – Bias and discrimination auditing for models.
- DVC (Data Version Control) – Git-like version control for datasets and ML pipelines.
- lakeFS – Git-like operations for data lakes.
- Delta Lake – Reliable data lakes with ACID transactions and time travel.
- Pachyderm – Data versioning and lineage for ML pipelines.
- Feast – Feature store for production ML.
- ZenML – MLOps framework for reproducible, production-ready pipelines.
- Metaflow – Netflix-developed tool for real-world ML pipelines.
- Kubeflow Pipelines – End-to-end ML workflows on Kubernetes.
- Flyte – Scalable and structured workflows for ML and data processing.
- Dagster – Data orchestrator for machine learning, analytics, and ETL.
- Tecton – Enterprise-grade feature store.
- Airflow – Workflow orchestration for ETL and ML pipelines.
- Dagster – Build and monitor data applications and ML systems.
- Metaflow – Human-centric workflow tool for ML.
- Coursera – MLOps Specialization by DeepLearning.AI – Learn to build, deploy, and monitor ML systems at scale.
- MLOps Zoomcamp – Free course to learn MLOps from scratch.
- Full Stack Deep Learning – Covers the full lifecycle of deep learning projects.
- Awesome Production Machine Learning – Companion list with tutorials and practical guides.
- Awesome LLMOps – Resources focused on managing large language models in production.
- Awesome Prompt Engineering – Techniques and tools for prompt design.
- Awesome AI Infrastructure – Tools for managing AI pipelines and infrastructure.
Contributions are welcome!