About
I am an AI engineer who has spent the past year immersed in agentic AI — not just building with it, but understanding where it actually fails and designing systems that fail less. I started with experiments, hit a lot of dead ends, read a lot of research, and gradually built an intuition for what makes production retrieval and reasoning systems trustworthy rather than just impressive.
That year of work became Singularity — a production multi-agent research platform I shipped solo in 13 days. The speed was only possible because the hard decisions had already been made. Before writing a line of code I knew why most RAG systems fail (open-ended queries, no credibility filtering), what the architectural fix looked like (plan before you retrieve), and where the reliability boundaries were. The 13 days was execution. The year before it was the actual work.
I am a two-time founder with a strong bias toward shipping things that work for real people under real constraints. I am finishing my M.S. in Computer Engineering at NYU Tandon in May 2026 and looking for AI engineering roles where the problem is genuinely hard and the measure of the work is whether it actually helps someone do something that matters.
Projects
A production multi-agent research platform that plans before it retrieves. The core insight driving the architecture: open-ended queries against a vector store produce semantically similar results, not relevant results. The fix is structural — agents construct a full report plan first, and every search query targets a specific planned section rather than a vague topic. Retrieval becomes deterministic given a good plan.
- Phase-5 orchestration: 3 manager agents propose full report trees in parallel, a lead agent synthesizes a canonical plan, every query targets a planned section — measurably reducing hallucination and retrieval fanout
- 44-skill auto-registration system (`__init_subclass__` hook, zero manual wiring) spanning 18 retrieval sources (ArXiv, PubMed, SEC EDGAR, GitHub, ClinicalTrials, YouTube transcripts, legal databases, and more), 18 analysis skills, and 8 output skills
- 2-pass credibility-weighted source gate that filters low-quality sources before synthesis, regardless of semantic similarity score
- Pure BYOK architecture: 10 models across xAI Grok, Gemini 2.5 Pro/Flash, and DeepSeek R1/V3 routed through a single injected LLM client; Fernet-encrypted keys at rest; JWT family rotation with reuse detection
- fastembed/ONNX embeddings — 1.4GB lighter than sentence-transformers, no GPU at inference; deployed as 6-container Docker stack on AWS t3-small at ~$26/month
Stack: FastAPI · ARQ · LangGraph · Qdrant · fastembed/ONNX · Redis · PostgreSQL 16 · Next.js 16 · Docker · AWS · SSE
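The auto-registration idea above can be sketched in a few lines. This is a minimal, hypothetical illustration — the base-class name `Skill`, the `name` attribute, and the registry dict are illustrative, not Singularity's actual API — showing how `__init_subclass__` registers every subclass at class-definition time with no manual wiring:

```python
# Hypothetical sketch of skill auto-registration via __init_subclass__.
# Class and attribute names are illustrative, not the actual codebase.
SKILL_REGISTRY = {}

class Skill:
    name = None  # each concrete skill must set this

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name is None:
            raise TypeError(f"{cls.__name__} must define a 'name'")
        SKILL_REGISTRY[cls.name] = cls  # registered the moment the class is defined

class ArxivSearch(Skill):
    name = "arxiv_search"

    def run(self, query: str) -> str:
        return f"searching arXiv for {query!r}"
```

Because registration happens at import time, adding a new skill is just defining a subclass — no central list to update, which is what makes a 44-skill catalog maintainable.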
A career intelligence platform built around the observation that most resume-to-job matching tells you how similar two texts are, not whether you actually meet the requirements. Those are different problems. Wand separates them.
- 6-step wave-parallelized LLM pipeline (profile extraction → JD parsing → company intel → match scoring → contact strategy → action plan) using `asyncio.gather()`; each step streams its result via WebSocket in real time as it completes
- Multi-model weighted scoring engine: Qualification Match (30%), Technical Skill (25%), Keyword (25%), Formatting (20%) — Gemini 2.5 Pro at temperature=0.0 for nuanced qualification reasoning, Flash for pattern matching; evidence field enforced non-empty to prevent hallucinated matches
- Unified LLMClient routing all AI calls across Grok, Gemini, and DeepSeek at runtime; all outputs as typed Pydantic models via Instructor — zero JSON parsing anywhere in the codebase
- Multi-source profile unification (PDF resume + LinkedIn export + HTML portfolio) with semantic entity alignment and explicit merge-priority rules
Stack: FastAPI · Celery · Redis · WebSockets · Instructor · Pydantic · Gemini · DeepSeek · SQLite · Docker
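The wave-parallel pattern described above can be sketched as follows. The step names and wave grouping here are illustrative stand-ins (the real pipeline's dependency graph may differ): steps within a wave have no mutual dependencies and run concurrently via `asyncio.gather()`, while waves execute in order.

```python
import asyncio

# Illustrative wave-parallel pipeline: steps in the same wave run concurrently;
# waves run sequentially because later steps depend on earlier results.
async def run_step(name: str, ctx: dict) -> tuple[str, str]:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return name, f"{name}:done"

async def run_pipeline() -> dict:
    waves = [
        ["profile_extraction", "jd_parsing"],   # independent inputs
        ["company_intel", "match_scoring"],     # depend on wave 1
        ["contact_strategy", "action_plan"],    # depend on wave 2
    ]
    ctx: dict = {}
    for wave in waves:
        results = await asyncio.gather(*(run_step(s, ctx) for s in wave))
        ctx.update(results)  # each finished step could also be streamed here
    return ctx

results = asyncio.run(run_pipeline())
```

Grouping by dependency rather than running all six steps serially is what turns a 6x-latency chain into roughly 3 waves of wall-clock time.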
- Dual-mode LangGraph system: Chat mode (2–4 LLM calls, 2–4s latency) for quick queries; Think mode (10–15 calls, 15–30s) with Planner → Financial Agent → Publisher workflow and self-correction loops
- 18+ financial tools: quantitative metrics (yfinance), news aggregation (NewsAPI + BeautifulSoup), SEC filing RAG pipeline (10-K/10-Q/8-K → ChromaDB, <200ms vector search)
- MongoDB-backed session persistence; real-time WebSocket streaming; deployed on AWS EC2 with S3/CloudFront; 50+ req/sec throughput
Stack: LangGraph · FastAPI · ChromaDB · MongoDB · Gemini 1.5 Flash · React 18 · Docker · AWS
- Fine-tuned LLaVA-1.5/1.6 (7B) with LoRA on InstaCities1M (100k urban images); FastAPI inference server with <2s P90 latency; ~300 req/hr peak
- Full IaC stack: Terraform (VM provisioning on Chameleon Cloud), Ansible (config management), Kubernetes via Kubespray (3-node cluster); MLflow experiment tracking across BLEU, CIDEr, and test loss
- Feedback-to-retraining loop: user corrections → MinIO → Label Studio annotation → MLflow artifact registration; Prometheus/Grafana monitoring
Stack: LLaVA · LoRA · PyTorch · MLflow · FastAPI · Terraform · Ansible · Kubernetes · Prometheus · Grafana
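The LoRA fine-tuning mentioned above rests on a simple piece of linear algebra worth making explicit. This is a generic numerical sketch of the LoRA update (not the project's training code): the adapted weight is W + (alpha/r)·B·A, where only the low-rank factors B and A are trained, and B starts at zero so training begins from the frozen base model.

```python
import numpy as np

# Generic illustration of the LoRA weight update: W' = W + (alpha/r) * B @ A.
# Dimensions are tiny placeholders; only A and B would be trainable.
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d, k))          # frozen base weight
A = rng.standard_normal((r, k)) * 0.01   # low-rank factor (trainable)
B = np.zeros((d, r))                     # zero init -> no drift at step 0

delta = (alpha / r) * B @ A
W_adapted = W + delta
```

The trainable-parameter savings is the point: A and B together hold d·r + r·k values versus d·k for the full matrix, which is how the verifier project below reaches 7.86M trainable parameters (0.28% of the model).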
Research
A novel gradient-level intervention for spurious correlation mitigation in CNNs. The key insight: shortcut learning manifests as frequency-dominant gradient energy in a specific band. SGE applies 2D FFT over convolutional kernel spatial dimensions post-backpropagation, decomposes the weight gradient into spurious-band and causal-band components, and attenuates the spurious contribution via a per-channel scaling coefficient — without requiring group labels at training time.
- +3.1pp worst-group accuracy over ERM baseline on Waterbirds benchmark (ResNet-18); +2.5pp post-DFR gain with Tail-loss pairing
- Spurious-to-causal gradient energy ratio exceeds 1.0 from epoch 1 across all runs — shortcut-frequency dominance is an immediate training phenomenon, not late-stage
- Benchmarked 5 objectives (ERM, Group-Balanced, Tail-loss, GroupDRO, JTT); diagnosed GroupDRO failure as calibration stationarity under adversarial reweighting
- Unsupervised calibration via loss-stratified pseudo-groups estimates the radial frequency cutoff — no ground-truth group annotations required
Stack: PyTorch · ResNet-18 · Waterbirds · GroupDRO · JTT · DFR
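The band decomposition at the heart of SGE can be sketched numerically. This is a simplified stand-in, not the actual method: the cutoff and scaling coefficient are placeholder arguments (in SGE the cutoff is estimated from loss-stratified pseudo-groups and the scaling is per-channel), but the mechanics — 2D FFT of the gradient, a radial split into low- and high-frequency bands, attenuation of the high band — follow the description above.

```python
import numpy as np

# Simplified sketch of spectral gradient attenuation: split a conv-kernel
# gradient's 2D spectrum at a radial frequency cutoff and scale down the
# high-frequency ("spurious") band before transforming back.
def attenuate_spurious(grad: np.ndarray, cutoff: float, scale: float) -> np.ndarray:
    h, w = grad.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)          # radial frequency of each bin
    spectrum = np.fft.fft2(grad)
    causal = spectrum * (radius <= cutoff)   # low-frequency band, kept
    spurious = spectrum * (radius > cutoff)  # high-frequency band, attenuated
    return np.fft.ifft2(causal + scale * spurious).real

g = np.random.default_rng(1).standard_normal((3, 3))
g_out = attenuate_spurious(g, cutoff=0.25, scale=0.5)
```

Two sanity properties make this easy to test: with the cutoff above all frequencies, or with scale = 1, the gradient passes through unchanged.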
Lightweight dual-model framework for step-level mathematical reasoning verification. My contribution: the Verifier module end-to-end.
- Fine-tuned TinyLLaMA (1.1B) on PRM800K for step-level correctness classification (correct/unclear/wrong) via 2-phase LoRA — Phase 1 on simple-context examples, Phase 2 on complex multi-step reasoning chains
- 87.91% step classification accuracy; outperformed Phi-2 (81.5%) across QLoRA ablations at rank 16 and 64
- Generator: Phi-2 (2.7B) fine-tuned on GSM8K via QLoRA achieved 64.06% Pass@1 with 7.86M trainable parameters (0.28%)
Stack: QLoRA · LoRA · TinyLLaMA · Phi-2 · PRM800K · GSM8K · HuggingFace PEFT
- Benchmarked EDSR (CNN), SwinIR (Transformer), and Stable Diffusion x4 across DIV2K (natural), TextZoom (scene text), and STAR (astronomical) datasets under zero-shot and domain-adapted protocols
- Introduced Cross-Domain Drop (CDD) metric; diffusion models exhibit catastrophic CDD (79.18% on TextZoom); domain-specific fine-tuning recovers 75–100% of lost performance
- Key finding: text domains require 3x more adaptation (+10.6 dB) than astronomical imagery (+3.4 dB) due to high-frequency structural discontinuities
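One plausible formulation of a cross-domain drop metric is the relative performance loss when moving from the source domain to a target domain — an assumption on my part, since the exact CDD definition is not given here:

```python
# Assumed formulation of a cross-domain drop metric (the paper's exact
# definition may differ): relative loss, as a percentage, when a model
# trained on one domain is evaluated on another.
def cross_domain_drop(in_domain: float, cross_domain: float) -> float:
    return 100.0 * (in_domain - cross_domain) / in_domain

drop = cross_domain_drop(in_domain=28.0, cross_domain=14.0)  # e.g. PSNR in dB
```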
- FGSM, I-FGSM, PGD, and patch attacks on ResNet-34 under strict l-infinity constraint (epsilon=0.02); PGD reduced Top-1 from 76% to 0.2%
- Confirmed black-box transferability: ResNet-34 adversarial examples degraded DenseNet-121 Top-1 from 74.80% to 59.80% without target model access
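The simplest of the attacks above, FGSM, can be sketched in a few lines. The gradient here is a random stand-in — in practice it comes from backpropagating the loss through the model to the input — but the update rule and the l-infinity budget are the real ones:

```python
import numpy as np

# Minimal FGSM sketch under an l-infinity budget: perturb the input by
# epsilon * sign(dLoss/dInput), then clip back to the valid pixel range.
def fgsm(x: np.ndarray, grad: np.ndarray, epsilon: float) -> np.ndarray:
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

rng = np.random.default_rng(2)
x = rng.uniform(0.2, 0.8, size=(4, 4))   # stand-in image
grad = rng.standard_normal((4, 4))       # stand-in input gradient
x_adv = fgsm(x, grad, epsilon=0.02)
```

I-FGSM and PGD iterate this step with a projection back into the epsilon-ball, which is why PGD under the same epsilon=0.02 budget is so much more damaging.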
Experience
- Standardized PyTorch training pipelines (data loading, training loops, evaluation) used by 100+ students across course offerings
- Led debugging sessions on CNN optimization, backpropagation, and autograd — helping students diagnose training failures and performance bottlenecks
- Created supplementary video lectures on ML calculus concepts for a graduate cohort of 50+ students
- Architected distributed microservices (FastAPI + Node.js) serving 1K+ DAU across 10+ clients; maintained 99.9% system availability
- Built full-stack products end-to-end: TypeScript/React/Next.js frontends, ML inference backends, AWS deployments, automated CI/CD pipelines
Technical Skills
- AI/ML: RAG pipelines, Multi-agent orchestration, LLM fine-tuning (LoRA/QLoRA), RLHF (GRPO/PPO/DPO), Instructor (structured outputs), LangGraph, LangChain, DSPy, Vector stores (Qdrant, ChromaDB, Pinecone), fastembed/ONNX, TensorRT, MLflow, PyTorch, HuggingFace Transformers, PEFT
- Backend: FastAPI, PostgreSQL, Redis, ARQ, asyncpg, SQLAlchemy, Celery, WebSockets, SSE, REST, gRPC, Node.js, Django
- Infra: Docker, Kubernetes, AWS (EC2, S3, RDS, IAM, CloudFront), Terraform, Ansible, GitHub Actions, Prometheus, Grafana, Caddy
- Frontend: Next.js, React, TypeScript
- Languages: Python, TypeScript, Java, C++, SQL
Teaching
- ML Course Assistant @ NYU Tandon: Mentoring 50+ graduate students. Standardized training pipelines used by 100+ students across course offerings.
- Web Development Mentor: Taught 120+ students, enabling deployment of 40+ web apps.
- LeetCode: 200+ problems solved, 52 solutions published, 4.6K+ community views.
- Tech Blogger: Writing on AI, agents, and software on Medium.