CML Insights App - Causal ML Platform

Overview

A causal machine learning platform that goes beyond standard predictions to answer the "why" behind events. Built for higher education institutions, it helps them understand the true drivers of student outcomes and make evidence-based decisions grounded in causal relationships rather than correlations.

Architecture

Full-stack platform with layered architecture:

Data layer: PostgreSQL schemas normalized for efficient feature access
Processing layer: Python microservices for ETL, feature engineering, and model training
Orchestration: Dagster pipelines coordinating batch jobs and retraining
Deployment: Kubernetes on GCP with Terraform-managed infrastructure

Key Technical Contributions

Causal Inference Engine

Built an improved propensity score matching algorithm that uses gradient boosting models to estimate propensity scores, custom distance metrics for treatment-control matching, and sensitivity analysis to validate assumptions. Optimized it with approximate nearest neighbor search and Dask parallelization, reducing runtime from hours to minutes on datasets of 100K+ observations.
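
As an illustration of the general technique (not the platform's actual implementation), the sketch below estimates propensity scores with scikit-learn's GradientBoostingClassifier and performs 1:1 matching with replacement on the score. It uses exact nearest neighbor search and plain absolute score distance in place of the approximate search and custom metrics described above. The function names and the caliper value are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import NearestNeighbors


def match_on_propensity(X, treated, caliper=0.05, random_state=0):
    """Match each treated unit to its nearest control on the propensity score.

    Hypothetical helper: returns (treated_idx, control_idx, scores), where the
    two index arrays are aligned pair by pair. Controls may be reused.
    """
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    # Step 1: estimate P(treated | X) with a gradient boosting classifier.
    model = GradientBoostingClassifier(random_state=random_state)
    model.fit(X, treated)
    scores = model.predict_proba(X)[:, 1]

    # Step 2: for each treated unit, find the control with the closest score.
    # Exact search here; an approximate index could be swapped in at scale.
    treated_idx = np.flatnonzero(treated)
    control_idx = np.flatnonzero(~treated)
    nn = NearestNeighbors(n_neighbors=1)
    nn.fit(scores[control_idx].reshape(-1, 1))
    dist, pos = nn.kneighbors(scores[treated_idx].reshape(-1, 1))

    # Step 3: discard pairs whose score gap exceeds the caliper.
    keep = dist[:, 0] <= caliper
    return treated_idx[keep], control_idx[pos[keep, 0]], scores


def att_estimate(y, treated_idx, control_idx):
    """Average outcome difference across matched pairs (effect on the treated)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(y[treated_idx] - y[control_idx]))
```

A caller would pass a covariate matrix, a boolean treatment indicator, and an outcome vector, then feed the matched indices to att_estimate. A sensitivity analysis would sit on top of this step and is not shown.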

Multi-Tenancy System

Designed a flexible metadata layer that maps client-specific data schemas to platform standards, so new institutions can be onboarded without code changes. Implemented multiple imputation strategies to handle the 20-30% missing-data rates common in educational datasets.
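
A minimal sketch of the idea, assuming the metadata layer can be reduced to a per-tenant column mapping: onboarding a new institution then means adding a mapping entry rather than writing code. The tenant names, column names, and helper functions are hypothetical, and the simple median/mean fill stands in for whichever imputation strategies the platform actually uses.

```python
import pandas as pd

# Hypothetical per-tenant metadata: client column name -> platform column name.
TENANT_SCHEMAS = {
    "college_a": {"stu_id": "student_id", "cum_gpa": "gpa", "hrs_earned": "credits_earned"},
    "university_b": {"ID": "student_id", "GPA": "gpa", "Credits": "credits_earned"},
}
PLATFORM_COLUMNS = ["student_id", "gpa", "credits_earned"]


def to_platform_schema(df, tenant):
    """Rename a tenant's columns to the platform standard and reorder them."""
    mapping = TENANT_SCHEMAS[tenant]
    missing = set(mapping) - set(df.columns)
    if missing:
        raise KeyError(f"{tenant} data is missing columns: {sorted(missing)}")
    return df.rename(columns=mapping)[PLATFORM_COLUMNS]


def impute_numeric(df, columns, strategy="median"):
    """Fill missing numeric values with the column median (default) or mean."""
    out = df.copy()
    for col in columns:
        fill = out[col].median() if strategy == "median" else out[col].mean()
        out[col] = out[col].fillna(fill)
    return out


raw = pd.DataFrame(
    {"ID": [1, 2, 3, 4], "GPA": [3.0, None, 2.0, 4.0], "Credits": [30, 45, None, 60]}
)
standardized = to_platform_schema(raw, "university_b")
filled = impute_numeric(standardized, ["gpa", "credits_earned"])
```

Keeping the mapping as data means it could live in a database table or configuration file per tenant, which is one way the "no code changes" property can be achieved.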

ML Pipeline Automation

Created end-to-end workflows that ingest data from multiple sources (CSV, databases, APIs), engineer domain-specific features (retention, graduation, performance metrics), train ensemble models with hyperparameter optimization, and deploy via GitOps with data drift monitoring.
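
The drift monitoring step could take many forms. As one common option, the sketch below computes a population stability index (PSI) between a reference sample of a feature (for example, training data) and a current sample. The choice of PSI, the function name, and the bin count are assumptions for illustration, not a description of the platform's monitoring.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature; larger means more drift."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges come from reference quantiles, so each bin holds
    # roughly the same share of the reference sample.
    interior = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]

    # Assign every value to a bin index in [0, n_bins - 1] and count per bin.
    ref_bins = np.searchsorted(interior, reference, side="right")
    cur_bins = np.searchsorted(interior, current, side="right")
    ref_frac = np.bincount(ref_bins, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_bins, minlength=n_bins) / len(current)

    # Clip to avoid log(0) when a bin is empty in either sample.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(size=5000)
same_distribution = rng.normal(size=5000)
shifted = rng.normal(loc=1.0, size=5000)

psi_same = population_stability_index(reference, same_distribution)
psi_shifted = population_stability_index(reference, shifted)
```

In a retraining pipeline, a score above a chosen threshold for a key feature could be used to raise an alert or trigger a retraining job. The threshold itself is a tuning decision.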

Technical Leadership

Wrote design documents, conducted architecture reviews, mentored engineers on ML best practices, and established coding standards ensuring scalability and maintainability across the engineering team.

Technologies

Python: Scikit-learn, Pandas, NumPy, Dask
Kubernetes: Microservices deployment with autoscaling
PostgreSQL: ML feature storage and application state
GCP: GKE, Cloud SQL, Cloud Storage
MLOps: Dagster, Kubeflow Pipelines
IaC: Terraform, Kustomize, ArgoCD

Impact

Platform serves institutions from small colleges to large university systems, identifying actionable interventions with proven causal effects. Enables resource allocation to effective programs while avoiding ineffective ones, supporting rigorous causal studies with publishable methodology.
