
PLM (Physical Language Model) is a production chat-based application that lets astrophysicists analyze raw X-ray event data from the Chandra Source Catalog through natural language. I led end-to-end development of the system: fine-tuning models on 50,000+ sources, building the FastAPI backend with multi-agent orchestration, implementing MongoDB vector search, and creating an interactive Next.js frontend with UMAP visualization and real-time streaming chat.
Backend: FastAPI (Python) with multi-agent workflow orchestration
Frontend: Next.js 14 with React, TypeScript, TailwindCSS, D3.js visualizations
Database: MongoDB with vector search indices for 64D PCA embeddings
ML Infrastructure: RunPod GPUs for fine-tuning Qwen-7B on X-ray event data
Each X-ray source contains photon arrival times and energies. I built a comprehensive processing pipeline that converts each source's raw event list into standardized metrics, including hardness ratios, binned light curves, and spectral model fit parameters.
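Two of these metrics can be sketched in a few lines of NumPy; the energy band edges and bin count below are placeholder assumptions, not the pipeline's actual values:

```python
import numpy as np

# Illustrative energy bands in keV; the pipeline's real band edges are assumptions here.
SOFT_BAND = (0.5, 2.0)
HARD_BAND = (2.0, 7.0)

def hardness_ratio(energies: np.ndarray) -> float:
    """Hardness ratio HR = (H - S) / (H + S) from photon energies in keV."""
    soft = np.count_nonzero((energies >= SOFT_BAND[0]) & (energies < SOFT_BAND[1]))
    hard = np.count_nonzero((energies >= HARD_BAND[0]) & (energies < HARD_BAND[1]))
    total = soft + hard
    return (hard - soft) / total if total else 0.0

def light_curve(times: np.ndarray, n_bins: int = 20) -> np.ndarray:
    """Photon counts binned over the observation: a simple light curve."""
    counts, _ = np.histogram(times, bins=n_bins)
    return counts
```

Metrics like these feed the downstream analysis agents in place of raw event lists.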
Built with LangGraph, the system orchestrates 5 specialized agents with streaming progress updates:
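The production graph is built with LangGraph, but the orchestration pattern itself can be sketched without that dependency. The agent names and outputs below are placeholders, not the system's actual agents:

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Each agent reads the shared state and returns a partial update to merge in.
Agent = Callable[[Dict], Dict]

def run_pipeline(agents: List[Tuple[str, Agent]], state: Dict) -> Iterator[Tuple[str, Dict]]:
    """Run agents in sequence, yielding (agent_name, state) after each step
    so the frontend can stream progress updates as the analysis advances."""
    for name, agent in agents:
        state = {**state, **agent(state)}
        yield name, state

# Two stub agents standing in for the five specialized agents.
agents = [
    ("fine_tuned_model", lambda s: {"classification": "AGN"}),
    ("synthesizer", lambda s: {"confidence": 0.8}),
]
events = list(run_pipeline(agents, {"query": "What kind of source is this?"}))
```

In the real system, each yielded step becomes a streamed progress event in the chat UI.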
I fine-tuned Qwen-7B on 50,000 X-ray sources to directly interpret raw photon event data. The model learned to classify sources (AGN, stars, SNR, galaxies) and identify variability patterns from time-energy sequences without traditional text inputs.
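Fine-tuning a language model on event data requires serializing photon lists into text; the delimiter and precision below are illustrative assumptions, not the actual training format:

```python
from typing import Iterable

def events_to_text(times: Iterable[float], energies: Iterable[float]) -> str:
    """Serialize photon (arrival time, energy) pairs into a compact text
    sequence suitable for language-model fine-tuning. The exact format
    used in training is an assumption here."""
    return " | ".join(f"t={t:.1f} e={e:.1f}" for t, e in zip(times, energies))
```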
Analyzes computed spectral features and astrophysical metadata. Rather than fine-tuning on raw data, I translate event data into standardized metrics (hardness ratios, light curves, model fit parameters)—the same approach a professional astrophysicist would use—enabling enterprise LLMs to provide expert-level physical reasoning.
I implemented MongoDB vector search with cosine similarity on 64D PCA embeddings to find the 10 most similar sources. The analyst compares spectral properties of neighbors, leverages known classifications, and provides comparative context. This "wisdom of the crowd" approach significantly improves classification confidence.
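In production the lookup runs as a MongoDB vector search over the pca_64d field; the underlying computation is plain cosine similarity, sketched in-memory here for illustration:

```python
import numpy as np

def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k nearest PCA embeddings by cosine similarity.
    This mirrors what the MongoDB vector index computes at scale."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q  # cosine similarity of each row against the query
    return np.argsort(-sims)[:k]
```

The returned neighbor indices map back to catalog sources whose known classifications seed the comparative analysis.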
Provides LLM-accessible tools, primarily hips2fits for querying multi-wavelength imagery (infrared, optical, X-ray) at varying fields of view. The LLM autonomously queries images of the surrounding region to understand spatial and spectral context.
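hips2fits is queried over HTTP; a minimal sketch of building a cutout request follows. Parameter names follow the public CDS hips2fits API, while the default survey and image size here are arbitrary choices for illustration:

```python
from urllib.parse import urlencode

HIPS2FITS = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits"

def cutout_url(ra: float, dec: float, fov_deg: float,
               hips: str = "CDS/P/DSS2/color", size: int = 256) -> str:
    """Build a hips2fits URL for a sky cutout centered on (ra, dec)
    at the given field of view, in degrees."""
    params = {
        "hips": hips, "ra": ra, "dec": dec, "fov": fov_deg,
        "width": size, "height": size, "projection": "TAN", "format": "jpg",
    }
    return f"{HIPS2FITS}?{urlencode(params)}"
```

Varying hips and fov lets the LLM zoom across surveys and scales, from wide-field context down to the immediate neighborhood of the source.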
Synthesizes all agent outputs, identifies agreements and disagreements, evaluates evidence strength, and assesses overall confidence. Provides critical review ensuring physically plausible conclusions.
Produces the final response in either "Normal" (conversational) or "Advanced" (structured with explicit reasoning sections) modes, synthesizing all analyses into a coherent answer.
Database schema: pca_64d field for cosine similarity queries
Frontend: Next.js 14, React 18, TypeScript, TailwindCSS, D3.js, SVG visualizations
Backend: FastAPI (Python), LangChain, LangGraph, NumPy/SciPy
ML: PyTorch, Transformers, Qwen-7B fine-tuning, OpenAI GPT-4/GPT-5
Data: MongoDB with vector search, Pinecone (alternative vector DB)
Infrastructure: RunPod (GPU), Docker, HTTP microservices
Enables astrophysicists to analyze X-ray sources through conversational AI, reducing analysis time from hours to minutes. The system combines the strengths of specialized fine-tuned models (pattern recognition learned from 50,000+ sources), general-purpose LLMs (physical reasoning), and automated computational pipelines (standardized metrics), providing comprehensive multi-perspective analysis with confidence assessments.
Key Contributions: