CML Insights - ML Platform & Products

CML Insights | July 2022 - Present | Machine Learning Engineering Lead

Key Highlights

  • Architected end-to-end ML infrastructure for multiple production applications
  • Built scalable MLOps pipelines with Kubeflow, MLflow, and Dagster
  • Implemented GitOps workflows with ArgoCD and infrastructure-as-code with Terraform
  • Delivered 4 major products: CML Insights App, Evidence Hub, Evidence Hub Curator, Fair Appraisal
  • Led successful projects for Kids Read Now and JG Wentworth clients

Overview

As Machine Learning Engineering Lead at CML Insights, I architect and implement end-to-end ML solutions spanning multiple products in educational and financial domains. My role encompasses infrastructure design, MLOps implementation, database architecture, and hands-on development of production ML systems.

Products & Solutions

CML Insights App

Core analytics platform providing ML-powered insights for educational assessment and intervention.

  • Microservices architecture on Kubernetes
  • Real-time data processing pipelines
  • Custom ML models for predictive analytics
  • RESTful APIs with comprehensive authentication

Evidence Hub Ecosystem

Two-part solution for evidence collection and curation:

Evidence Hub App

  • Mobile and web data collection system
  • Multi-modal evidence capture (text, image, video)
  • Offline-first architecture with sync capabilities

Evidence Hub Curator App

  • Content management and review workflows
  • ML-assisted tagging and categorization
  • Admin dashboards for quality control

Fair Appraisal App

ML-driven appraisal and evaluation system:

  • Automated scoring algorithms
  • Bias detection and fairness metrics
  • Explainable AI for decision transparency
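
As an illustration of the fairness metrics bullet, the following is a minimal sketch of two widely used group-fairness measures: demographic parity difference and the disparate impact ratio. The function names, group labels, and sample data are hypothetical and are not taken from the Fair Appraisal codebase.

```python
from collections import defaultdict
from typing import Dict, Sequence


def selection_rates(decisions: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Return the share of positive decisions (1) for each group."""
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(rates: Dict[str, float]) -> float:
    """Gap between the highest and lowest selection rate (0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest selection rate divided by the highest (1 means parity)."""
    highest = max(rates.values())
    # If no group receives a positive decision, the rates are trivially equal.
    return min(rates.values()) / highest if highest > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical appraisal outcomes: 1 = favorable, 0 = unfavorable.
    decisions = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    rates = selection_rates(decisions, groups)
    print(rates, demographic_parity_difference(rates), disparate_impact_ratio(rates))
```

A ratio well below 1 (a common rule of thumb uses 0.8 as the threshold) would flag the scoring model for closer review; the threshold itself is a policy choice, not something this sketch decides.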

Technical Architecture

MLOps Infrastructure

Built comprehensive MLOps pipelines enabling rapid experimentation and deployment:

Orchestration

  • Kubeflow Pipelines for ML workflows
  • MLflow for experiment tracking and model registry
  • Dagster for data pipeline orchestration
  • Custom automation for model versioning and deployment
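
The model versioning and deployment automation can be pictured as a promotion gate: a new version replaces the production model only if its validation metric is good enough. The sketch below uses a plain in-memory registry as a stand-in; `ModelRegistry`, `promote_if_better`, and `min_gain` are hypothetical names, and this is not the MLflow API or the production implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    metric: float  # validation metric, higher is better (assumption)
    stage: str = "staging"


@dataclass
class ModelRegistry:
    """In-memory stand-in for a model registry, for illustration only."""
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, metric: float) -> ModelVersion:
        """Record a new version; version numbers increase by one per model."""
        history = self.versions.setdefault(name, [])
        model_version = ModelVersion(version=len(history) + 1, metric=metric)
        history.append(model_version)
        return model_version

    def production(self, name: str) -> Optional[ModelVersion]:
        """Return the version currently in production, if any."""
        for model_version in self.versions.get(name, []):
            if model_version.stage == "production":
                return model_version
        return None

    def promote_if_better(self, name: str, candidate: ModelVersion,
                          min_gain: float = 0.0) -> bool:
        """Promote the candidate only if it beats production by min_gain."""
        current = self.production(name)
        if current is not None and candidate.metric < current.metric + min_gain:
            return False  # candidate stays in staging
        if current is not None:
            current.stage = "archived"
        candidate.stage = "production"
        return True


if __name__ == "__main__":
    registry = ModelRegistry()
    first = registry.register("reading-progress", metric=0.80)
    print(registry.promote_if_better("reading-progress", first))   # True
    weaker = registry.register("reading-progress", metric=0.79)
    print(registry.promote_if_better("reading-progress", weaker))  # False
```

Keeping the gate as a pure function of recorded metrics makes it easy to run as a pipeline step after training and before any deployment manifest is updated.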

Infrastructure as Code

  • Terraform modules for AWS resource provisioning
  • Kustomize for Kubernetes configuration management
  • GitOps workflows with ArgoCD
  • Automated environment provisioning (dev/staging/prod)

Monitoring & Observability

  • Grafana dashboards for system and ML metrics
  • Prometheus for metrics collection
  • Loki for log aggregation
  • Alertmanager for proactive incident response

Authentication & API Gateway

  • Keycloak for identity and access management
  • Kong Gateway for API routing and rate limiting
  • OAuth 2.0 and OIDC implementations
  • Role-based access control (RBAC)
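
A role-based access check of the kind listed above can be sketched in a few lines. The example assumes the token has already been verified and decoded, and that realm roles appear under `realm_access.roles` (Keycloak's default claim layout). The role names and permission strings are hypothetical.

```python
from typing import Dict, Set

# Hypothetical role-to-permission mapping; real roles would be defined
# in the identity provider and mapped per application.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"reports:read"},
    "curator": {"reports:read", "evidence:review"},
    "admin": {"reports:read", "evidence:review", "users:manage"},
}


def roles_from_claims(claims: dict) -> Set[str]:
    """Extract realm roles from already-verified token claims."""
    return set(claims.get("realm_access", {}).get("roles", []))


def is_allowed(claims: dict, permission: str) -> bool:
    """Grant access if any of the caller's roles carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in roles_from_claims(claims)
    )


if __name__ == "__main__":
    curator_claims = {"realm_access": {"roles": ["curator"]}}
    print(is_allowed(curator_claims, "evidence:review"))  # True
    print(is_allowed(curator_claims, "users:manage"))     # False
```

Unknown roles and missing claims fall through to "deny", which is the safer default for an access check.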

Database Design

Designed normalized PostgreSQL schemas, backed by read replicas and connection pooling, optimized for:

  • High-throughput ML feature access
  • Transaction consistency for application data
  • Efficient querying for analytics workloads
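
One way to use read replicas is to route plain reads to a replica and everything else to the primary. The sketch below shows that routing decision only; `QueryRouter` and the connection names are hypothetical, and a real setup would hand the chosen target to a pooled database client rather than return a string.

```python
import itertools
from typing import List


class QueryRouter:
    """Pick a database target per statement: replicas for reads, primary otherwise."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Round-robin over replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str, in_transaction: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        # Reads inside a transaction stay on the primary so they see
        # the transaction's own uncommitted writes.
        if is_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = QueryRouter("primary", ["replica-1", "replica-2"])
    print(router.target_for("SELECT * FROM features WHERE student_id = 1"))
    print(router.target_for("INSERT INTO assessments (score) VALUES (1)"))
```

The prefix check is deliberately naive: statements such as `SELECT ... FOR UPDATE` or data-modifying CTEs would need to go to the primary, and replica lag matters for read-after-write flows.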

ML Capabilities

  • Scikit-learn for classical ML models
  • PyTorch for deep learning applications
  • Dask for distributed data processing
  • Integration with OpenAI API for LLM features
  • Hugging Face model deployment

Notable Client Projects

Kids Read Now

Educational literacy program serving thousands of students:

  • ML models predicting reading progress and intervention needs
  • Data pipelines processing reading assessments
  • Dashboards for educators and administrators
  • Scalable infrastructure handling peak loads

JG Wentworth

Financial services ML solutions:

  • Risk assessment models
  • Document processing with NLP
  • Secure data handling complying with financial regulations

Technical Challenges & Solutions

Scalability

Challenge: Handle variable loads across multiple products
Solution: Kubernetes HPA with custom metrics; microservices isolation; asynchronous processing with message queues
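
For context, the Horizontal Pod Autoscaler computes its target as `ceil(currentReplicas * currentMetric / desiredMetric)` and skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that core calculation with replica bounds. The function name and the queue-depth metric in the usage are hypothetical, and details such as stabilization windows and missing-metric handling are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule, simplified for illustration."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical custom metric: average queue depth per pod, target 100.
    print(desired_replicas(current_replicas=4, current_metric=150, target_metric=100))  # 6
```

With a custom metric like queue depth, scaling follows the backlog rather than CPU, which suits asynchronous workers fed by message queues.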

Deployment Velocity

Challenge: Reduce time from model training to production
Solution: Automated CI/CD pipelines; containerized deployments; feature flags for safe rollouts; comprehensive testing automation
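
Feature flags for safe rollouts are often implemented as a percentage rollout keyed on a stable hash of the user. The following is a minimal sketch of that idea, not the flagging system used in production; the flag and user identifiers are hypothetical.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent of users."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical flag guarding a newly deployed model version.
    print(is_enabled("new-scoring-model", "user-42", rollout_percent=10))
```

Because the bucket depends only on the flag name and user ID, a user who sees the new behavior at 10% keeps seeing it as the rollout widens, and including the flag name in the hash keeps different flags from always selecting the same users.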

Data Privacy & Security

Challenge: Handle sensitive educational and financial data
Solution: Encryption at rest and in transit; audit logging; compliance with FERPA and SOC 2; regular security audits

Cost Optimization

Challenge: Manage infrastructure costs while maintaining performance
Solution: Right-sizing resources; spot instances for batch workloads; intelligent caching; query optimization
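
As one example of the caching idea, the sketch below shows a small time-to-live cache that avoids recomputing an expensive result until it expires. `TTLCache` and `get_or_compute` are hypothetical names; in a deployment like the one described here, a shared store such as Redis would typically play this role instead of a per-process dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Per-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or compute and store it if missing or expired."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)
    # The lambda stands in for an expensive analytics query.
    print(cache.get_or_compute("district-summary", lambda: {"students": 1200}))
```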

Impact

  • Scale: Serving 4 production applications with thousands of daily active users
  • Reliability: Maintained 99.5%+ uptime across all services
  • Velocity: Reduced feature deployment time from weeks to days
  • Team Growth: Established MLOps best practices adopted across engineering teams
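
To put the reliability figure above in concrete terms, an availability percentage can be converted into a downtime budget. The sketch assumes a 30-day measurement window, which is an assumption of this example and not something stated for these services.

```python
def downtime_budget_minutes(availability_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over a window."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # 99.5% over 30 days allows about 216 minutes (3.6 hours) of downtime.
    print(round(downtime_budget_minutes(99.5), 1))
```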

Technology Stack

Languages: Python, SQL, Shell scripting
ML/Data: PyTorch, Scikit-learn, Pandas, Dask, NumPy
MLOps: Kubeflow, MLflow, Dagster
Infrastructure: Kubernetes, Docker, Terraform, Kustomize
Cloud: AWS (EKS, S3, RDS, EC2, Lambda)
CI/CD: ArgoCD, GitHub Actions
Monitoring: Grafana, Prometheus, Loki
Security: Keycloak, Kong Gateway
Databases: PostgreSQL, Redis
APIs: OpenAI, Hugging Face