Machine Learning Research Engineer / New York, NY

LLM research, and the production systems I own.

ML research engineer focused on LLMs, with research on refusal robustness, interpretability, and tokenizer efficiency. I also build and own the production side: real-time retrieval and ranking, multi-agent pipelines, and inference tuned for latency, cost, and safety. A researcher who ships.


GitHub · LinkedIn · dm6262@nyu.edu

Deepanshu Mody · LLM Research · Safety & Interpretability · Production ML

41→84% top-1 retrieval accuracy · Incedo

92.0 token-level F1 · Incedo

~18K biomedical KG entities · Pfizer

≥16× ablation rank to break refusal · NYU

195k+ records validated · PLOS ONE

Production LLM Systems

RAG and retrieval/ranking services, multi-agent pipelines, and real-time inference, built to hold accuracy under latency, cost, and safety constraints.

RAG
Retrieval & ranking
Multi-agent
Real-time inference
Monitoring

Applied ML & Modeling

Fine-tuning and post-training, survival/risk modeling, and graph learning, with rigorous evaluation on imbalanced, out-of-distribution data.

Fine-tuning / QLoRA
Survival modeling
GNNs
Calibration
OOD validation

Safety, Evaluation & Research

Model safety, interpretability, and evaluation harnesses, with research on refusal robustness and tokenization.

Refusal robustness
Interpretability
Eval harnesses
Tokenization

Selected projects & research

Research, applied modeling, and shipped demos, with live links where they exist.

NYU · LLM safety · Jan-May 2026

Robustness & Evaluation of Refusal in Open-Weight LLMs

Training-time safety defense and evaluation harness for Llama-3.2-1B-Instruct. Post-trained via class-conditional mean/covariance matching with temperature-scaled KL distillation from a frozen instruct model.

Raised the linear-ablation rank required to break refusal from K=1 to K≥16, with baseline behavior preserved.

Preprint coming soon · Read case study

Explainer · Code

Kensho/MIT EECS · COLM 2026 · planned · Sep 2025 - Present

Optimizing a Tokenizer for Greedy Left-to-Right Inference

Formulated vocabulary selection as a mixed-integer optimization (Gurobi) with greedy-consistency constraints over token-selection and pretoken-encoding variables. Advisors: Dr. Chris Tanner & Dr. Craig Schmidt.

Reduced tokenization cost by 1.5% versus Byte Pair Encoding through joint vocabulary optimization.

Preprint coming soon · Read case study

Research Intern · Purdue · Dr. Kihara · Jun-Dec 2022

Imbalanced Binary Classification on Graph-Structured Data (RNA GNNs)

Framed Mg²⁺ metal-ion binding-site detection as binary classification over graph-structured RNA, engineering node/edge features from 3D molecular structure and addressing severe class imbalance. Trained GCN and GNN-DTI models.

+6.2 pp ROC-AUC over CNN baselines, with a live Streamlit + Mol* 3D demo.

Live demo · Code

Dell × NVIDIA Hackathon · Top-8 / 30 · NYU CDS · 2026

PhysioCoach: Real-Time AI Physical-Therapy Coach

Fully local, real-time PT coaching: a vision-language model (Qwen2.5-VL) gives form feedback while MediaPipe handles rep counting and range-of-motion tracking, and spoken coaching is delivered in the browser with no cloud required.

Top-8 of 30 teams at the Dell × NVIDIA Hackathon 2026 (NYU CDS).

Code

CoRL 2026 · planned · Jan-May 2026

World-Model Training & Forecasting for Robot Locomotion

Built a multimodal dataset from ROS2 recordings, aligning RGB, depth, IMU, odometry, and action streams, and benchmarked CNN, DINOv2 ViT-S/14, and TD-JEPA encoders with GRU-RSSM dynamics.

Replaced single-step MSE with multi-horizon ADE/FDE evaluation at 1.6, 3.2, and 5.0 s, giving deployment-relevant motion-forecasting metrics.

Preprint coming soon

Co-Founder · Two AI startups · Dec 2024 - Jan 2026

Axentra & LaRa-Home · NYU Leslie eLab Startup School

Co-founded two AI startup concepts through NYU Leslie eLab Startup School: Axentra (clinical-trial protocol design automation) and LaRa-Home (architecture blueprint and building-code compliance verification). Designed LLM/RAG workflows for domain-document reasoning, structured extraction, compliance checking, and human-in-the-loop review.

Ran customer discovery and MVP scoping to validate pain points across clinical research operations and architecture / building-code review.

Axentra demo · Credential

Experience

Owning ML systems end-to-end across LLMs, retrieval, and applied modeling.

Feb 2026 - May 2026 · ~1M conversations mined · ~93% intent accuracy

Data Analyst Intern · Living Brands AI · Brooklyn, NY

Built a reproducible Python/SQL pipeline that mined ~1M user conversations with embedding-based retrieval and a two-pass classifier, reaching ~93% accuracy segmenting 2,650 queries into intent groups to quantify what drives user engagement.
Designed a Bradley–Terry ranking methodology with bootstrap 95% confidence intervals, extracting ~2.9M implicit pairwise comparisons from a 2,810-response corpus at zero added cost, turning wide statistical tiers into defensible, decision-ready rankings.
Defined and validated a 245-prompt visibility-metric battery for a financial-services client by triangulating three independent sources, quantifying distribution gaps of up to 25 points and auditing data-quality issues; delivered a methodology report for non-technical stakeholders.
Built classification workflows with confidence scoring, probability calibration, and deterministic structured-output inference, defining decision thresholds that route low-confidence predictions to human review, a safety pattern for high-stakes, ambiguous decisions.
Designed and shipped a production model-monitoring system (React/TypeScript on AWS, PostgreSQL backend) tracking live model outputs, confidence distributions, category drift, and failure modes to accelerate error analysis and continuous evaluation.

Jun 2025 - Aug 2025 · 6-agent QA system · ~18K-entity KG

Statistics & AI/ML Intern · Pfizer · Boston, MA

Built and deployed an evidence-grounded biomedical QA system on AWS using LangGraph, orchestrating six agents across planning, entity normalization, tool routing, graph retrieval, validation, and synthesis.
Constructed a biomedical knowledge graph of ~18K entities and ~95K typed, evidence-weighted relationships across diseases, genes, proteins, pathways, drugs, and source evidence.
Integrated eight production retrieval tools (including Neo4j traversal, relationship lookup, and evidence retrieval) so LLM outputs cite supporting graph paths and provenance, producing auditable answers suited to regulated clinical settings.

Jul 2023 - Jul 2024 · 41% → 84% exact-match · 92.0 F1

Data Scientist · Software Engineer, Data & AI · Incedo Inc. · Gurugram, India

Owned and productionized a real-time retrieval and ranking service (hybrid BM25 + dense retrieval with cross-encoder reranking) over 1,200+ documents, raising exact-match top-result accuracy from 41% to 84% and token-level F1 to 92.0.
Fine-tuned open-weight LLMs (Llama-2) for named-entity recognition on medical and telecom data, lifting entity F1 from ~0.75 zero-shot to ~0.90 in-domain (models published to Hugging Face) and surfacing a ~0.60 out-of-domain drop that guided fine-tune-vs-prompt decisions.
Owned production inference code serving live requests, tuning precision configurations (FP32/FP16/FP8/INT8) and benchmarking latency vs. throughput, balancing accuracy, latency, and cost to meet real-time SLAs without degrading quality.

Jan 2023 - Jun 2023 · RISC-V · LLVM backend

Software Engineering Intern · Kinara AI (acquired by NXP Semiconductors) · Hyderabad, India

Prototyped a RISC-V vector extension and LLVM backend (custom lowering for vectorized memory and scatter/gather intrinsics) to accelerate ML-kernel execution in cycle-accurate simulation.

Publications & profile

Publications & conferences

Validity of Machine Learning-Based COVID-19 Prediction
Benchmarked and validated 7 classification models on 195k clinical records; quantified ~20% AUROC degradation under cross-continental distribution shift and released an open-source evaluation toolkit.
PLOS ONE · 2025
Paper · Code · Website

Auto Encoders for Communication-Efficient Distributed Learning
Autoencoder-based method for communication-efficient distributed learning.
AAAI Deployable AI Workshop · 2023
Workshop

AI in Coronary Physiology: Where Do We Stand?
Review of AI's role in cardiovascular disease detection and intervention.
Review article
Paper

Exploring Protein Design Landscapes with Semi-Supervised Adaptive Sampling
Poster: semi-supervised adaptive sampling over protein design landscapes.
22nd Int'l Conference on Bioinformatics, Brisbane · Nov 2023

Let's talk

Building something where ML has to actually work in production?

dm6262@nyu.edu

GitHub · LinkedIn