
Collaborative research project with the Center for Astrophysics at Harvard to create shared embedding spaces between Chandra X-ray Observatory event data and astronomical research papers using contrastive learning.
Astronomical research involves two distinct modalities that traditionally exist in separate silos: observational data (here, Chandra X-ray event data) and the research papers that interpret it.
The goal was to create a unified embedding space enabling cross-modal retrieval between data and text.
Developed a contrastive learning framework with dual encoders, one for the X-ray event data and one for the paper text.
Used hard negative mining and temperature scaling for efficient training on paired observation-paper datasets.
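As a rough illustration of the training objective described above, the sketch below computes a symmetric, temperature-scaled contrastive loss over a batch of paired embeddings and picks the hardest negative for each pair. It is a minimal pure-Python sketch, not the project's actual code: the function names, the cosine-similarity choice, and the temperature value of 0.07 are all assumptions made for illustration, and a real training loop would use a tensor library with automatic differentiation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(data_emb, text_emb):
    """sim[i][j] = cosine similarity of observation i and paper j."""
    d = [l2_normalize(v) for v in data_emb]
    t = [l2_normalize(v) for v in text_emb]
    return [[sum(a * b for a, b in zip(dv, tv)) for tv in t] for dv in d]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(data_emb, text_emb, temperature=0.07):
    """Average of data-to-text and text-to-data losses (temperature is an assumed value)."""
    sim = similarity_matrix(data_emb, text_emb)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose for the text-to-data direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def hardest_negatives(sim):
    """For each row i, the index of the most similar non-matching column."""
    return [
        max((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        for i, row in enumerate(sim)
    ]

if __name__ == "__main__":
    observations = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-ins for event-data embeddings
    papers = [[0.9, 0.1], [0.2, 0.8]]        # toy stand-ins for paper embeddings
    print(symmetric_contrastive_loss(observations, papers))
    print(hardest_negatives(similarity_matrix(observations, papers)))
```

A lower temperature sharpens the softmax, so the loss concentrates on the negatives closest to the true match. Hard negative mining makes a similar emphasis explicit by selecting those near-miss pairs directly.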
Achieved >85% top-5 retrieval accuracy for paper-to-data matching, enabling intelligent data retrieval, literature-guided analysis, and automated annotation of astronomical observations. Embeddings cluster meaningfully by astronomical object type and generalize to unseen source types.
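For reference, a top-k retrieval metric like the one quoted above can be computed along the following lines. This is a hedged sketch: it assumes each paper query has exactly one correct observation, located at the same index in the similarity matrix, which may differ from the project's actual evaluation protocol. The function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(sim, k=5):
    """Fraction of queries whose true match ranks within the top k candidates.

    sim[i][j] is the similarity between query paper i and candidate
    observation j; the true match for query i is assumed to be j == i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

if __name__ == "__main__":
    toy_sim = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.5],
        [0.1, 0.2, 0.7],
    ]
    print(top_k_accuracy(toy_sim, k=1))
    print(top_k_accuracy(toy_sim, k=3))
```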
