
Causal machine learning platform that goes beyond standard predictions to answer the "why" behind events. Architected for higher education institutions to understand true drivers of student outcomes and make evidence-based decisions using causal relationships rather than correlations.
Full-stack platform with layered architecture:
Built an improved propensity score matching algorithm using gradient boosting models for propensity estimation, custom distance metrics for treatment-control matching, and sensitivity analysis for assumption validation. Optimized with approximate nearest neighbor search and Dask parallelization, reducing runtime from hours to minutes on datasets of 100K+ observations.
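As a rough illustration of the matching step only, the sketch below pairs treated and control units by nearest propensity score within a caliper. It assumes the scores have already been estimated (for example by a gradient boosting classifier) and uses an exact linear scan in place of approximate nearest neighbor search and Dask. The function name `match_nearest`, the caliper of 0.05, and the sample scores are hypothetical and not taken from the platform.

```python
from typing import Dict, List, Tuple


def match_nearest(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: float = 0.05,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement."""
    available = dict(control)  # controls not yet used in a pair
    pairs: List[Tuple[str, str]] = []
    # Process treated units in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Exact scan for the closest remaining control (an ANN index would replace this).
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        # Accept the match only if it falls within the caliper.
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores keyed by unit id.
treated_scores = {"t1": 0.62, "t2": 0.35, "t3": 0.90}
control_scores = {"c1": 0.60, "c2": 0.33, "c3": 0.10, "c4": 0.66}
pairs = match_nearest(treated_scores, control_scores)
print(pairs)
```

Here `t3` stays unmatched because no remaining control lies within the caliper; dropping such units trades sample size for better covariate balance.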
Designed a flexible metadata layer that maps client-specific data schemas to platform standards, so new institutions can be onboarded without code changes. Implemented multiple imputation strategies to handle the 20-30% missing data rates common in educational datasets.
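One way such a metadata layer could work is to keep each institution's column mapping as data rather than code, so onboarding means adding a mapping instead of changing the pipeline. The sketch below is a minimal example under that assumption; the column names, `CLIENT_MAPPING`, and `to_platform_schema` are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical per-institution metadata: client column name -> platform column name.
CLIENT_MAPPING: Dict[str, str] = {
    "stu_id": "student_id",
    "GPA_cum": "cumulative_gpa",
    "ret_fall": "retained_next_fall",
}


def to_platform_schema(
    rows: List[Dict[str, Any]], mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename mapped columns and drop any column the mapping does not list."""
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in rows
    ]


client_rows = [
    {"stu_id": "A1", "GPA_cum": 3.4, "ret_fall": True, "notes": "transfer"},
    {"stu_id": "A2", "GPA_cum": None, "ret_fall": False, "notes": ""},
]
standardized = to_platform_schema(client_rows, CLIENT_MAPPING)
print(standardized)
```

Missing values (the `None` GPA above) pass through unchanged, leaving them for a later imputation step.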
Created end-to-end workflows that ingest from multiple sources (CSV, databases, APIs), engineer domain-specific features (retention, graduation, performance metrics), train ensemble models with hyperparameter optimization, and deploy via GitOps with data drift monitoring.
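The drift metric used for monitoring is not specified here. As one common option, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a new sample of a single feature. The function name `psi`, the bin edges, and the GPA values are assumptions for illustration.

```python
import math
from typing import List, Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Population Stability Index over bins defined by ascending interior edges."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges at or below the value.
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor each proportion at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical GPA samples: training baseline vs. a newly ingested term.
baseline = [2.0, 2.5, 3.0, 3.5]
shifted = [3.0, 3.5, 3.6, 3.9]
edges = [2.5, 3.0, 3.5]
print(psi(baseline, baseline, edges), psi(baseline, shifted, edges))
```

Identical samples give a PSI of zero, and the value grows as the new sample's bin proportions move away from the baseline.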
Wrote design documents, conducted architecture reviews, mentored engineers on ML best practices, and established coding standards ensuring scalability and maintainability across the engineering team.
Platform serves institutions ranging from small colleges to large university systems, identifying actionable interventions with proven causal effects. Enables directing resources to effective programs and away from ineffective ones, and supports rigorous causal studies with publishable methodology.