Kubeflow

ML toolkit for Kubernetes that orchestrates end-to-end workflows

Why Kubeflow?

Kubeflow is the Kubernetes-native ML platform I use to orchestrate multi-stage ML workflows at scale. It provides the building blocks for production ML systems on Kubernetes: pipelines, notebooks, training operators, and model serving.

Key Components I Use

Kubeflow Pipelines

  • DAG-based workflow orchestration
  • Reusable pipeline components
  • Experiment tracking integration
  • Scheduled and triggered runs
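
To make the DAG idea concrete, here is a minimal sketch of how an orchestrator can derive a run order from step dependencies. It uses only the Python standard library (graphlib, Python 3.9+), not the Kubeflow Pipelines SDK, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest": set(),
    "transform": {"ingest"},
    "validate": {"transform"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


def ready_batches(dag):
    """Group steps into batches whose members could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable result
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Calling execution_order(PIPELINE) yields the steps from ingest through deploy in dependency order. ready_batches shows why a DAG matters: steps with no mutual dependency end up in the same batch and can be scheduled side by side.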

Notebooks

  • Jupyter notebooks on Kubernetes
  • GPU/CPU resource allocation
  • Persistent volumes for data
  • Collaboration and sharing

Training Operators

  • Distributed training support
  • TensorFlow and PyTorch operators (TFJob, PyTorchJob)
  • Resource management
  • Fault tolerance

Serving

  • Model deployment and serving
  • Auto-scaling capabilities
  • A/B testing support
  • Canary deployments

My Experience

At CML Insights, I've built comprehensive ML pipelines using Kubeflow:

Data Pipelines

  • Ingestion from multiple sources
  • Transformation and feature engineering
  • Validation and quality checks
  • Feature store integration
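
A validation step of the kind listed above might look roughly like this. It is a simplified sketch using only the standard library; the column names, the null-ratio rule, and the 10% default threshold are assumptions for illustration, not the checks used in the actual pipelines.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    total: int = 0
    errors: list = field(default_factory=list)

    @property
    def passed(self):
        return not self.errors


def validate_rows(rows, required, max_null_ratio=0.1):
    """Flag required columns whose share of missing values exceeds the threshold."""
    report = ValidationReport(total=len(rows))
    if not rows:
        report.errors.append("no rows received")
        return report
    for column in required:
        # A value counts as missing when the key is absent or set to None.
        missing = sum(1 for row in rows if row.get(column) is None)
        ratio = missing / len(rows)
        if ratio > max_null_ratio:
            report.errors.append(
                f"{column}: {ratio:.0%} null exceeds {max_null_ratio:.0%}"
            )
    return report
```

A pipeline step can stop the run when report.passed is false, so bad data never reaches training.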

Training Pipelines

  • Hyperparameter optimization
  • Distributed training workflows
  • Experiment tracking with MLflow
  • Model validation and testing
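
As an illustration of the hyperparameter optimization step, the sketch below runs an exhaustive grid search with the standard library. The search space and objective are hypothetical, and grid search is a stand-in here; the source does not say which search strategy or tuning tool the pipelines use.

```python
from itertools import product


def grid(space):
    """Yield every combination of the search space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def best_trial(space, objective):
    """Evaluate each combination and return (params, score) with the lowest score."""
    trials = ((params, objective(params)) for params in grid(space))
    return min(trials, key=lambda trial: trial[1])
```

In a real training pipeline each objective call would be a separate training run, which is what makes running trials in parallel on the cluster worthwhile.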

Deployment Pipelines

  • Automated model deployment
  • Smoke testing in staging
  • Progressive rollout to production
  • Monitoring integration
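
The progressive rollout step can be thought of as a small decision rule driven by monitoring data. The sketch below is one possible shape for it; the traffic steps and the 1% error budget are assumed values, not the ones used in production.

```python
def next_traffic_share(current, error_rate, max_error_rate=0.01,
                       steps=(5, 25, 50, 100)):
    """Return the next traffic percentage for the new model, or 0 to roll back."""
    if error_rate > max_error_rate:
        return 0  # error budget exceeded: send all traffic back to the old model
    for step in steps:
        if step > current:
            return step  # promote to the next larger share
    return current  # already at the final step
```

Starting from 0, a healthy model moves through 5, 25, 50, and 100 percent of traffic, and any reading above the error budget drops it back to 0.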

Architecture

Kubeflow runs on Amazon EKS with:

  • S3 for pipeline artifacts
  • PostgreSQL for metadata
  • MLflow for model registry
  • ArgoCD for deployment

Benefits

  • Kubernetes-native workflows
  • Resource efficiency
  • Reproducible experiments
  • Version-controlled pipelines
  • Team collaboration