Production LLM Systems
RAG and retrieval/ranking services, multi-agent pipelines, and real-time inference, built to hold accuracy under latency, cost, and safety constraints.
Machine Learning Engineer / New York, NY
Applied ML and LLM systems engineer. I take models from problem formulation through training, evaluation, real-time deployment, and monitoring, owning the accuracy, latency, cost, and safety tradeoffs in production.

Fine-tuning and post-training, survival/risk modeling, and graph learning, with rigorous evaluation on imbalanced, out-of-distribution data.
Model safety and robustness, interpretability, and evaluation harnesses, with first-author research on refusal robustness and tokenization.
First-author research, applied modeling, and shipped demos, with live links where they exist.
Training-time safety defense and evaluation harness for Llama-3.2-1B-Instruct. Post-trained via class-conditional mean/covariance matching with temperature-scaled KL distillation from a frozen instruct model.
Raised the linear-ablation rank required to break refusal from K=1 to K≥16, with baseline behavior preserved.
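As a rough illustration of what a rank-K linear ablation means here, the sketch below removes a K-dimensional subspace from a single hidden-state vector. It assumes the attack projects activations onto the orthogonal complement of K candidate "refusal directions"; the function names (`orthonormalize`, `ablate`) and the toy vectors are hypothetical and are not taken from the project's code.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `directions`.

    Directions that are (numerically) linear combinations of earlier
    ones are dropped, so the basis may have fewer than K vectors.
    """
    basis = []
    for d in directions:
        r = list(d)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        if norm > eps:
            basis.append([ri / norm for ri in r])
    return basis

def ablate(hidden, directions):
    """Remove the component of `hidden` lying in span(directions)."""
    out = list(hidden)
    for q in orthonormalize(directions):
        c = dot(out, q)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out

# Toy example (assumed values): ablate a rank-2 subspace of a 4-d vector.
hidden = [1.0, 2.0, 3.0, 4.0]
directions = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
ablated = ablate(hidden, directions)
```

Under this reading, a defense that holds up to K=16 means an attacker has to find and remove at least sixteen such directions before refusals stop, rather than one.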
Formulated vocabulary selection as a mixed-integer optimization (Gurobi) with greedy-consistency constraints over token-selection and pretoken-encoding variables. Advisors: Dr. Chris Tanner & Dr. Craig Schmidt.
Reduced tokenization cost by 1.5% versus Byte Pair Encoding through joint vocabulary optimization.
Framed Mg²⁺ metal-ion binding-site detection as binary classification over graph-structured RNA, engineering node/edge features from 3D molecular structure and addressing severe class imbalance. Trained GCN and GNN-DTI models.
+6.2 pp ROC-AUC over CNN baselines, with a live Streamlit + Mol* 3D demo.
Fully-local, real-time PT coaching: a vision-language model (Qwen2.5-VL) gives form feedback while MediaPipe handles rep counting and range-of-motion, with spoken coaching delivered in-browser, no cloud required.
Top-8 of 30 teams at the Dell × NVIDIA Hackathon 2026 (NYU CDS).
Built a multimodal dataset from ROS2 recordings, aligning RGB, depth, IMU, odometry, and action streams, and benchmarked CNN, DINOv2 ViT-S/14, and TD-JEPA encoders with GRU-RSSM dynamics.
Evaluated with multi-horizon average and final displacement error (ADE/FDE at 1.6 / 3.2 / 5.0 s), replacing single-step MSE with deployment-relevant motion-forecasting metrics.
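For readers unfamiliar with the metrics, here is a minimal sketch of multi-horizon ADE/FDE using the standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions up to a horizon, and FDE is the distance at the horizon's last step. The 0.1 s sampling interval, the function name `ade_fde`, and the toy trajectories are assumptions for illustration, not details from the project.

```python
import math

def ade_fde(pred, truth, horizon_s, dt):
    """Return (ADE, FDE) over the first `horizon_s` seconds.

    pred, truth: sequences of (x, y) positions, one per time step,
                 where step i corresponds to time (i + 1) * dt.
    """
    steps = round(horizon_s / dt)
    if steps < 1 or steps > min(len(pred), len(truth)):
        raise ValueError("horizon does not fit inside the trajectories")
    errors = [math.dist(p, t) for p, t in zip(pred[:steps], truth[:steps])]
    return sum(errors) / steps, errors[-1]

# Toy trajectories (assumed): ground truth moves along x,
# the prediction drifts sideways by 1 cm per step.
DT = 0.1  # seconds per step, an assumed sampling interval
truth = [(0.1 * (i + 1), 0.0) for i in range(50)]
pred = [(0.1 * (i + 1), 0.01 * (i + 1)) for i in range(50)]

results = {h: ade_fde(pred, truth, h, DT) for h in (1.6, 3.2, 5.0)}
```

Reporting the pair at several horizons shows how error compounds over a rollout, which a single-step MSE cannot reveal.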
Co-founded two AI startup concepts through NYU Leslie eLab Startup School: Axentra (clinical-trial protocol design automation) and LaRa-Home (architecture blueprint / building-code compliance verification). Designed LLM/RAG workflows for domain document reasoning, structured extraction, compliance checking, and human-in-the-loop review.
Ran customer discovery and MVP scoping to validate pain points across clinical research operations and architecture / building-code review.
Side projects and research tools, public on GitHub.
A multi-agent product-discovery pipeline: cluster evidence into insights, then generate, critique, and score directions.
Token-by-token reasoning phases in LLMs, modeled with an HMM over per-token features.
Sparse-autoencoder feature circuits in the ESM-2 protein language model, with an interactive circuit viewer.
Owning ML systems end-to-end across LLMs, retrieval, and applied modeling.
Validity of Machine Learning-Based COVID-19 Prediction
Benchmarked and validated 7 classification models on 195k clinical records; quantified ~20% AUROC degradation under cross-continental distribution shift and released an open-source evaluation toolkit.
Auto Encoders for Communication-Efficient Distributed Learning
Autoencoder-based method for communication-efficient distributed learning.
AI in Coronary Physiology: Where Do We Stand?
Review of AI's role in cardiovascular disease detection and intervention.
Exploring Protein Design Landscapes with Semi-Supervised Adaptive Sampling
Poster: semi-supervised adaptive sampling over protein design landscapes.