Capstone Project (Advisor: Dr. Chris Tanner)
- Designed and implemented Markov chain Monte Carlo (MCMC) and reinforcement learning (RL) approaches to globally optimize BPE tokenization against entropy and compression objectives, trained on the MiniPile corpus.
Final-year M.S. Data Science student at NYU with industry experience building LLM-based and agentic AI systems. My research spans ML, transformers, and multimodal models, with interests in retrieval, interpretability, and tokenization.

Building agentic LLM workflows and retrieval systems.
Academic and capstone work spanning tokenization, imaging, and GNNs.
Industry roles focused on applied ML, retrieval, and systems engineering.
Selected publications and workshop presentations.
Proposed a novel method that uses autoencoders to optimize distributed learning, and presented it at the workshop.
Benchmarked 7 hematology-based prognostic models on 195k patient records across Brazil, Italy, and Western Europe; uncovered a ~20% AUROC drop under cross-continental transfer and released an open-source validation toolkit.
Academic training in data science, computer science, and biology.
Technical stack spanning ML, systems, and deployment.
Let's discuss research, collaboration, or ML engineering roles.