Kubeflow | Imantha Ahangama

Why Kubeflow?

Kubeflow is the Kubernetes-native ML platform I use to orchestrate complex ML workflows at scale. It supplies the building blocks for production ML systems: pipelines, notebooks, training operators, and model serving.

Key Components I Use

Kubeflow Pipelines

DAG-based workflow orchestration
Reusable pipeline components
Experiment tracking integration
Scheduled and triggered runs
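
As a rough illustration of what DAG-based orchestration means, the sketch below orders a hypothetical four-step pipeline so that each step runs only after its dependencies. It uses Python's standard graphlib module rather than the Kubeflow Pipelines SDK, and the step names are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "ingest": set(),
    "transform": {"ingest"},
    "train": {"transform"},
    "evaluate": {"train", "transform"},
}


def execution_order(dag):
    """Return one valid run order in which dependencies always come first."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

A real pipeline engine additionally runs independent steps in parallel and caches their outputs; the ordering constraint is the part shown here.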

Notebooks

Jupyter notebooks on Kubernetes
GPU/CPU resource allocation
Persistent volumes for data
Collaboration and sharing

Training Operators

Distributed training support
TensorFlow and PyTorch operators
Resource management
Fault tolerance
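
To make the distributed training and fault tolerance points concrete, here is a minimal sketch that builds a PyTorchJob manifest as a plain Python dict. The field layout follows the kubeflow.org/v1 PyTorchJob resource as I understand it and should be checked against the installed Training Operator version. The job name and image are placeholders, not values from a real cluster.

```python
import json


def pytorch_job(name, image, workers):
    """Build a PyTorchJob manifest as a dict: one master plus `workers` workers."""

    def replica(count):
        return {
            "replicas": count,
            # Restart failed pods, which gives basic fault tolerance.
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [{"name": "pytorch", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


# Placeholder name and image, used only for illustration.
manifest = pytorch_job("demo-train", "example.com/train:latest", workers=2)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, since Kubernetes accepts JSON manifests as well as YAML.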

Serving

Model deployment and serving
Auto-scaling capabilities
A/B testing support
Canary deployments
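
The idea behind a canary deployment can be sketched in a few lines: a fixed share of requests goes to the new model version, and a given request ID always lands on the same side. In practice the serving layer performs this split; the function below is only a hypothetical stand-in to show the logic.

```python
import hashlib


def route(request_id, canary_percent):
    """Send a stable slice of requests to the canary model version."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "canary" if bucket < canary_percent else "stable"


# Count how 1000 made-up request IDs are split at a 10% canary share.
counts = {"canary": 0, "stable": 0}
for i in range(1000):
    counts[route(f"req-{i}", canary_percent=10)] += 1
print(counts)
```

Hashing the request ID instead of drawing a random number keeps routing reproducible, which helps when comparing the two versions in an A/B test.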

My Experience

At CML Insights, I've built end-to-end ML pipelines with Kubeflow, covering data, training, and deployment:

Data Pipelines

Ingestion from multiple sources
Transformation and feature engineering
Validation and quality checks
Feature store integration
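
A validation and quality check step might look like the following minimal sketch. The field names, the range rule, and the sample rows are hypothetical; a real pipeline would likely rely on a dedicated data validation library.

```python
def validate_rows(rows, required, ranges):
    """Split rows into valid rows and (row index, reason) rejections."""
    valid, rejected = [], []
    for i, row in enumerate(rows):
        # Reject rows with a required field that is absent or None.
        missing = [f for f in required if row.get(f) is None]
        if missing:
            rejected.append((i, f"missing: {', '.join(missing)}"))
            continue
        # Reject rows with a present value outside its allowed range.
        out_of_range = [
            f for f, (lo, hi) in ranges.items()
            if row.get(f) is not None and not (lo <= row[f] <= hi)
        ]
        if out_of_range:
            rejected.append((i, f"out of range: {', '.join(out_of_range)}"))
            continue
        valid.append(row)
    return valid, rejected


# Made-up sample data for illustration.
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": 240},
]
valid, rejected = validate_rows(
    rows, required=["user_id", "age"], ranges={"age": (0, 120)}
)
print(valid)
print(rejected)
```

Returning the rejections with a reason, instead of silently dropping rows, makes it possible to fail the pipeline run when the rejected share exceeds a threshold.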

Training Pipelines

Hyperparameter optimization
Distributed training workflows
Experiment tracking with MLflow
Model validation and testing
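
Hyperparameter optimization in its simplest form is an exhaustive grid search, sketched below in plain Python. The toy objective stands in for a real validation loss, and the parameter names are assumptions chosen for the example. On a cluster, each combination would typically run as its own training job.

```python
from itertools import product


def grid_search(objective, space):
    """Evaluate every combination in `space` and return the lowest-scoring one."""
    names = list(space)
    best_params, best_score = None, float("inf")
    for values in product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objective standing in for a validation loss; lowest at lr=0.01, depth=4.
def toy_loss(lr, depth):
    return (lr - 0.01) ** 2 + (depth - 4) ** 2


best, score = grid_search(toy_loss, {"lr": [0.001, 0.01, 0.1], "depth": [2, 4, 8]})
print(best, score)
```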

Deployment Pipelines

Automated model deployment
Smoke testing in staging
Progressive rollout to production
Monitoring integration
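
A progressive rollout can be thought of as a loop over traffic percentages with a health gate at each step. The sketch below assumes a hypothetical monitoring callback that reports the error rate at a given traffic share; the threshold and the step sizes are illustrative only.

```python
def progressive_rollout(steps, error_rate_at, max_error_rate):
    """Walk through traffic percentages, stopping at the first unhealthy step.

    Returns (final_percent, status), where final_percent is the traffic share
    left on the new model: the last step on success, 0 after a rollback.
    """
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return 0, f"rolled back at {percent}%"
    return steps[-1], "promoted"


# Hypothetical monitor: the error rate degrades once traffic passes 50%.
def fake_error_rate(percent):
    return 0.01 if percent <= 50 else 0.08


full = progressive_rollout([5, 25, 50, 100], fake_error_rate, max_error_rate=0.05)
partial = progressive_rollout([5, 25, 50], fake_error_rate, max_error_rate=0.05)
print(full)
print(partial)
```

In a real deployment the callback would query the monitoring system after a soak period at each step, instead of returning a fixed value.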

Architecture

I run Kubeflow on Amazon EKS with:

S3 for pipeline artifacts
PostgreSQL for metadata
MLflow for model registry
ArgoCD for deployment

Benefits

Kubernetes-native workflows
Resource efficiency
Reproducible experiments
Version-controlled pipelines
Team collaboration