Machine Learning Engineer / New York, NY

I build and own ML systems end-to-end, from training to production.

Applied ML and LLM systems engineer. I take models from problem formulation through training, evaluation, real-time deployment, and monitoring, owning the accuracy, latency, cost, and safety tradeoffs in production.

Portrait of Deepanshu Mody
FIG.01 · Applied ML · LLM Systems · Applied Research
41% → 84% · top-1 retrieval accuracy · Incedo
92.0 · token-level F1 · Incedo
~50% · lower query cost · Pfizer
16× · refusal attack-rank · ICML '26
195k+ · records validated · PLOS ONE
01

Production LLM Systems

RAG and retrieval/ranking services, multi-agent pipelines, and real-time inference, built to hold accuracy under latency, cost, and safety constraints.

  • RAG
  • Retrieval & ranking
  • Multi-agent
  • Real-time inference
  • Monitoring
02

Applied ML & Modeling

Fine-tuning and post-training, survival/risk modeling, and graph learning, with rigorous evaluation on imbalanced, out-of-distribution data.

  • Fine-tuning / QLoRA
  • Survival modeling
  • GNNs
  • Calibration
  • OOD validation
03

Safety, Evaluation & Research

Model safety and robustness, interpretability, and evaluation harnesses, with first-author research on refusal robustness and tokenization.

  • Refusal robustness
  • Interpretability
  • Eval harnesses
  • Tokenization
01

Selected projects & research

First-author research, applied modeling, and shipped demos, with live links where they exist.

First Author · ICML 2026 · submitted · Jan-May 2026

Robustness & Evaluation of Refusal in Open-Weight LLMs

Training-time safety defense and evaluation harness for Llama-3.2-1B-Instruct. Post-trained via class-conditional mean/covariance matching with temperature-scaled KL distillation from a frozen instruct model.

Raised the linear-ablation rank required to break refusal from K=1 to K≥16, with baseline behavior preserved.

First Author · Kensho/MIT EECS · COLM 2026 · planned · Sep 2025 - Present

Optimizing a Tokenizer for Greedy Left-to-Right Inference

Formulated vocabulary selection as a mixed-integer optimization (Gurobi) with greedy-consistency constraints over token-selection and pretoken-encoding variables. Advisors: Dr. Chris Tanner & Dr. Craig Schmidt.

Reduced tokenization cost by 1.5% versus Byte Pair Encoding through joint vocabulary optimization.

Preprint coming soon
Research Intern · Purdue · Dr. Kihara · Jun-Dec 2022

Imbalanced Binary Classification on Graph-Structured Data (RNA GNNs)

Framed Mg²⁺ metal-ion binding-site detection as binary classification over graph-structured RNA, engineering node/edge features from 3D molecular structure and addressing severe class imbalance. Trained GCN and GNN-DTI models.

+6.2 pp ROC-AUC over CNN baselines, with a live Streamlit + Mol* 3D demo.

Dell × NVIDIA Hackathon · Top-8 / 30 · NYU CDS · 2026

PhysioCoach: Real-Time AI Physical-Therapy Coach

Fully local, real-time PT coaching: a vision-language model (Qwen2.5-VL) gives form feedback while MediaPipe handles rep counting and range of motion, with spoken coaching delivered in-browser and no cloud required.

Top-8 of 30 teams at the Dell × NVIDIA Hackathon 2026 (NYU CDS).

Co-First Author · CoRL 2026 · planned · Jan-May 2026

World-Model Training & Forecasting for Robot Locomotion

Built a multimodal dataset from ROS2 recordings, aligning RGB, depth, IMU, odometry, and action streams, and benchmarked CNN, DINOv2 ViT-S/14, and TD-JEPA encoders with GRU-RSSM dynamics.

Replaced single-step MSE with multi-horizon ADE/FDE evaluation (1.6 / 3.2 / 5.0 s), a deployment-relevant set of motion-forecasting metrics.

Preprint coming soon
Co-Founder · Two AI startups · Dec 2024 - Jan 2026

Axentra & LaRa-Home · NYU Leslie eLab Startup School

Co-founded two AI startup concepts through NYU Leslie eLab Startup School: Axentra (clinical-trial protocol design automation) and LaRa-Home (architecture blueprint / building-code compliance verification). Designed LLM/RAG workflows for domain document reasoning, structured extraction, compliance checking, and human-in-the-loop review.

Ran customer discovery and MVP scoping to validate pain points across clinical research operations and architecture / building-code review.

02

Open source & experiments

Side projects and research tools, public on GitHub.

03

Experience

Owning ML systems end-to-end across LLMs, retrieval, and applied modeling.

Feb 2026 - May 2026 · ~1M conversations mined · ~93% intent accuracy

Data Analyst Intern · Living Brands AI · Brooklyn, NY

  • Built a reproducible Python/SQL pipeline that mined ~1M user conversations with embedding-based retrieval and a two-pass classifier, reaching ~93% accuracy segmenting 2,650 queries into intent groups to quantify what drives user engagement.
  • Designed a Bradley–Terry ranking methodology with bootstrap 95% confidence intervals, extracting ~2.9M implicit pairwise comparisons from a 2,810-response corpus at zero added cost, turning wide statistical tiers into defensible, decision-ready rankings.
  • Defined and validated a 245-prompt visibility-metric battery for a financial-services client by triangulating three independent sources, quantifying distribution gaps up to 25 points and auditing data-quality issues; delivered a methodology report for non-technical stakeholders.
Jun 2025 - Aug 2025 · ~50% lower cost @ ~90% acc · 6-agent pipeline

Statistics & AI/ML Intern · Pfizer · Boston, MA

  • Built and deployed a production LLM multi-agent pipeline (LangGraph; Gemini Flash, DeepSeek-R1) on AWS (EC2/S3), orchestrating 6 agents across planning, tool use, retrieval, validation, and evidence-grounded synthesis for biomedical question answering.
  • Benchmarked 7 open- and closed-weight LLMs and routed per task, cutting per-query cost ~50% while holding ~90% accuracy on a 200-question evaluation.
  • Built an LLM extraction pipeline over ~3,000 oncology and biomedical conference abstracts, disambiguating synonyms and drug/company-name variants against reference databases (entity F1 ~0.88; ~92% linking accuracy), then assembled a knowledge graph of ~18k entities and 95k evidence-weighted relationships.
  • Integrated 8 grounding tools (Neo4j graph traversal, relationship and evidence retrieval) to produce auditable, traceable answers suited to regulated clinical settings.
Jul 2023 - Jul 2024 · 41% → 84% exact-match · 92.0 F1

Data Scientist · Software Engineer, Data & AI · Incedo Inc. · Gurugram, India

  • Owned and productionized a real-time retrieval and ranking service (hybrid BM25 + dense retrieval with cross-encoder reranking) over 1,200+ documents, raising exact-match top-result accuracy from 41% to 84% and token-level F1 to 92.0.
  • Fine-tuned open-weight LLMs (Llama-2) for named-entity recognition on medical and telecom data, lifting entity F1 from ~0.75 zero-shot to ~0.90 in-domain (models published to Hugging Face) and surfacing a ~0.60 out-of-domain drop that guided fine-tune-vs-prompt decisions.
  • Owned production inference code serving live requests, tuning precision configurations (FP32/FP16/FP8/INT8) and benchmarking latency vs. throughput, directly trading off accuracy, latency, and cost to meet real-time SLAs without degrading quality.
  • Built schema-constrained extraction (LangChain, Pydantic, llama.cpp grammars) turning scanned PDFs, forms, and tables into validated fields at ~95% accuracy, plus an LLM guardrail/evaluation suite (relevancy, factual accuracy, completeness, jailbreak detection) that flagged ~12% of responses for review.
  • Engineered ranking pipelines (FAISS, BM25, sentence-transformers) evaluated against held-out relevance sets, reaching 9/10 top-3 accuracy, and built scalable multimodal data pipelines (text, PDF, image, audio, video) on Azure ML.
Jan 2023 - Jun 2023 · RISC-V · LLVM backend

Software Engineering Intern · Kinara AI (acquired by NXP Semiconductors) · Hyderabad, India

  • Prototyped a RISC-V vector extension and LLVM backend (custom lowering for vectorized memory and scatter/gather intrinsics) to accelerate ML-kernel execution in cycle-accurate simulation.
04

Publications & profile

Publications & conferences

  • Validity of Machine Learning-Based COVID-19 Prediction

    Benchmarked and validated 7 classification models on 195k clinical records; quantified ~20% AUROC degradation under cross-continental distribution shift and released an open-source evaluation toolkit.

    PLOS ONE
  • Auto Encoders for Communication-Efficient Distributed Learning

    Autoencoder-based method for communication-efficient distributed learning.

    AAAI Deployable AI Workshop · 2023
  • AI in Coronary Physiology: Where Do We Stand?

    Review of AI's role in cardiovascular disease detection and intervention.

    Review article
  • Exploring Protein Design Landscapes with Semi-Supervised Adaptive Sampling

    Poster: semi-supervised adaptive sampling over protein design landscapes.

    22nd Int'l Conference on Bioinformatics, Brisbane · Nov 2023
Let's talk

Building something where ML has to actually work in production?

dm6262@nyu.edu