About
I am an AI engineer who has spent the past year immersed in agentic AI — not just building with it, but understanding where it actually fails and designing systems that fail less. I started with experiments, hit a lot of dead ends, read a lot of research, and gradually built an intuition for what makes production retrieval and reasoning systems trustworthy rather than just impressive.
That year of work became Singularity — a production multi-agent research platform I shipped solo in 13 days. The speed was only possible because the hard decisions had already been made. Before writing a line of code I knew why most RAG systems fail (open-ended queries, no credibility filtering), what the architectural fix looked like (plan before you retrieve), and where the reliability boundaries were. The 13 days was execution. The year before it was the actual work.
I am a two-time founder with a strong bias toward shipping things that work for real people under real constraints. I am finishing my M.S. in Computer Engineering at NYU Tandon in May 2026 and looking for AI engineering roles where the problem is genuinely hard and the measure of the work is whether it actually helps someone do something that matters.
Projects
A production multi-agent research platform that plans before it retrieves. The core insight driving the architecture: open-ended queries against a vector store produce semantically similar results, not relevant results. The fix is structural — agents construct a full report plan first, and every search query targets a specific planned section rather than a vague topic. Retrieval becomes deterministic given a good plan.
- Phase-5 orchestration: 3 manager agents propose full report trees in parallel, a lead agent synthesizes a canonical plan, every query targets a planned section — measurably reducing hallucination and retrieval fanout
- 44-skill auto-registration system (`__init_subclass__` hook, zero manual wiring) spanning 18 retrieval sources (ArXiv, PubMed, SEC EDGAR, GitHub, ClinicalTrials, YouTube transcripts, legal databases, and more), 18 analysis skills, and 8 output skills
- 2-pass credibility-weighted source gate that filters low-quality sources before synthesis, regardless of semantic similarity score
- Pure BYOK architecture: 10 models across xAI Grok, Gemini 2.5 Pro/Flash, and DeepSeek R1/V3 routed through a single injected LLM client; Fernet-encrypted keys at rest; JWT family rotation with reuse detection
- fastembed/ONNX embeddings — 1.4GB lighter than sentence-transformers, no GPU at inference; deployed as 6-container Docker stack on AWS t3-small at ~$26/month
Stack: FastAPI · ARQ · LangGraph · Qdrant · fastembed/ONNX · Redis · PostgreSQL 16 · Next.js 16 · Docker · AWS · SSE
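The auto-registration idea above can be sketched in a few lines. This is a minimal, hypothetical illustration — the base-class name `Skill`, the `name` attribute, and the registry dict are illustrative, not Singularity's actual API — showing how `__init_subclass__` registers every subclass at class-definition time with no manual wiring:

```python
# Hypothetical sketch of skill auto-registration via __init_subclass__.
# Class and attribute names are illustrative, not the actual codebase.
SKILL_REGISTRY = {}

class Skill:
    name = None  # each concrete skill must set this

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name is None:
            raise TypeError(f"{cls.__name__} must define a 'name'")
        SKILL_REGISTRY[cls.name] = cls  # registered the moment the class is defined

class ArxivSearch(Skill):
    name = "arxiv_search"

    def run(self, query: str) -> str:
        return f"searching arXiv for {query!r}"
```

Because registration happens at import time, adding a new skill is just defining a subclass — no central list to update, which is what makes a 44-skill catalog maintainable.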
A career intelligence platform built around the observation that most resume-to-job matching tells you how similar two texts are, not whether you actually meet the requirements. Those are different problems. Wand separates them.
- 6-step wave-parallelized LLM pipeline (profile extraction → JD parsing → company intel → match scoring → contact strategy → action plan) using `asyncio.gather()`; each step streams its result via WebSocket in real time as it completes
- Multi-model weighted scoring engine: Qualification Match (30%), Technical Skill (25%), Keyword (25%), Formatting (20%) — Gemini 2.5 Pro at temperature=0.0 for nuanced qualification reasoning, Flash for pattern matching; evidence field enforced non-empty to prevent hallucinated matches
- Unified LLMClient routing all AI calls across Grok, Gemini, and DeepSeek at runtime; all outputs as typed Pydantic models via Instructor — zero JSON parsing anywhere in the codebase
- Multi-source profile unification (PDF resume + LinkedIn export + HTML portfolio) with semantic entity alignment and explicit merge-priority rules
Stack: FastAPI · Celery · Redis · WebSockets · Instructor · Pydantic · Gemini · DeepSeek · SQLite · Docker
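The wave-parallel pattern described above can be sketched as follows. The step names and wave grouping here are illustrative stand-ins (the real pipeline's dependency graph may differ): steps within a wave have no mutual dependencies and run concurrently via `asyncio.gather()`, while waves execute in order.

```python
import asyncio

# Illustrative wave-parallel pipeline: steps in the same wave run concurrently;
# waves run sequentially because later steps depend on earlier results.
async def run_step(name: str, ctx: dict) -> tuple[str, str]:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return name, f"{name}:done"

async def run_pipeline() -> dict:
    waves = [
        ["profile_extraction", "jd_parsing"],   # independent inputs
        ["company_intel", "match_scoring"],     # depend on wave 1
        ["contact_strategy", "action_plan"],    # depend on wave 2
    ]
    ctx: dict = {}
    for wave in waves:
        results = await asyncio.gather(*(run_step(s, ctx) for s in wave))
        ctx.update(results)  # each finished step could also be streamed here
    return ctx

results = asyncio.run(run_pipeline())
```

Grouping by dependency rather than running all six steps serially is what turns a 6x-latency chain into roughly 3 waves of wall-clock time.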
- Dual-mode LangGraph system: Chat mode (2–4 LLM calls, 2–4s latency) for quick queries; Think mode (10–15 calls, 15–30s) with Planner → Financial Agent → Publisher workflow and self-correction loops
- 18+ financial tools: quantitative metrics (yfinance), news aggregation (NewsAPI + BeautifulSoup), SEC filing RAG pipeline (10-K/10-Q/8-K → ChromaDB, <200ms vector search)
- MongoDB-backed session persistence; real-time WebSocket streaming; deployed on AWS EC2 with S3/CloudFront; 50+ req/sec throughput
Stack: LangGraph · FastAPI · ChromaDB · MongoDB · Gemini 1.5 Flash · React 18 · Docker · AWS
- Fine-tuned LLaVA-1.5/1.6 (7B) with LoRA on InstaCities1M (100k urban images); FastAPI inference server with <2s P90 latency; ~300 req/hr peak
- Full IaC stack: Terraform (VM provisioning on Chameleon Cloud), Ansible (config management), Kubernetes via Kubespray (3-node cluster); MLflow experiment tracking across BLEU, CIDEr, and test loss
- Feedback-to-retraining loop: user corrections → MinIO → Label Studio annotation → MLflow artifact registration; Prometheus/Grafana monitoring
Stack: LLaVA · LoRA · PyTorch · MLflow · FastAPI · Terraform · Ansible · Kubernetes · Prometheus · Grafana
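The LoRA fine-tuning mentioned above rests on a simple piece of linear algebra worth making explicit. This is a generic numerical sketch of the LoRA update (not the project's training code): the adapted weight is W + (alpha/r)·B·A, where only the low-rank factors B and A are trained, and B starts at zero so training begins from the frozen base model.

```python
import numpy as np

# Generic illustration of the LoRA weight update: W' = W + (alpha/r) * B @ A.
# Dimensions are tiny placeholders; only A and B would be trainable.
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d, k))          # frozen base weight
A = rng.standard_normal((r, k)) * 0.01   # low-rank factor (trainable)
B = np.zeros((d, r))                     # zero init -> no drift at step 0

delta = (alpha / r) * B @ A
W_adapted = W + delta
```

The trainable-parameter savings is the point: A and B together hold d·r + r·k values versus d·k for the full matrix, which is how the verifier project below reaches 7.86M trainable parameters (0.28% of the model).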
Research
A novel gradient-level intervention for spurious correlation mitigation in CNNs. The key insight: shortcut learning manifests as frequency-dominant gradient energy in a specific band. SGE applies 2D FFT over convolutional kernel spatial dimensions post-backpropagation, decomposes the weight gradient into spurious-band and causal-band components, and attenuates the spurious contribution via a per-channel scaling coefficient — without requiring group labels at training time.
- +3.1pp worst-group accuracy over ERM baseline on Waterbirds benchmark (ResNet-18); +2.5pp post-DFR gain with Tail-loss pairing
- Spurious-to-causal gradient energy ratio exceeds 1.0 from epoch 1 across all runs — shortcut-frequency dominance is an immediate training phenomenon, not late-stage
- Benchmarked 5 objectives (ERM, Group-Balanced, Tail-loss, GroupDRO, JTT); diagnosed GroupDRO failure as calibration stationarity under adversarial reweighting
- Unsupervised calibration via loss-stratified pseudo-groups estimates the radial frequency cutoff — no ground-truth group annotations required
Stack: PyTorch · ResNet-18 · Waterbirds · GroupDRO · JTT · DFR
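The band decomposition at the heart of SGE can be sketched numerically. This is a simplified stand-in, not the actual method: the cutoff and scaling coefficient are placeholder arguments (in SGE the cutoff is estimated from loss-stratified pseudo-groups and the scaling is per-channel), but the mechanics — 2D FFT of the gradient, a radial split into low- and high-frequency bands, attenuation of the high band — follow the description above.

```python
import numpy as np

# Simplified sketch of spectral gradient attenuation: split a conv-kernel
# gradient's 2D spectrum at a radial frequency cutoff and scale down the
# high-frequency ("spurious") band before transforming back.
def attenuate_spurious(grad: np.ndarray, cutoff: float, scale: float) -> np.ndarray:
    h, w = grad.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)          # radial frequency of each bin
    spectrum = np.fft.fft2(grad)
    causal = spectrum * (radius <= cutoff)   # low-frequency band, kept
    spurious = spectrum * (radius > cutoff)  # high-frequency band, attenuated
    return np.fft.ifft2(causal + scale * spurious).real

g = np.random.default_rng(1).standard_normal((3, 3))
g_out = attenuate_spurious(g, cutoff=0.25, scale=0.5)
```

Two sanity properties make this easy to test: with the cutoff above all frequencies, or with scale = 1, the gradient passes through unchanged.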
Lightweight dual-model framework for step-level mathematical reasoning verification. My contribution: the Verifier module end-to-end.
- Fine-tuned TinyLLaMA (1.1B) on PRM800K for step-level correctness classification (correct/unclear/wrong) via 2-phase LoRA — Phase 1 on simple-context examples, Phase 2 on complex multi-step reasoning chains
- 87.91% step classification accuracy; outperformed Phi-2 (81.5%) across QLoRA ablations at rank 16 and 64
- Generator: Phi-2 (2.7B) fine-tuned on GSM8K via QLoRA achieved 64.06% Pass@1 with 7.86M trainable parameters (0.28%)
Stack: QLoRA · LoRA · TinyLLaMA · Phi-2 · PRM800K · GSM8K · HuggingFace PEFT
- Benchmarked EDSR (CNN), SwinIR (Transformer), and Stable Diffusion x4 across DIV2K (natural), TextZoom (scene text), and STAR (astronomical) datasets under zero-shot and domain-adapted protocols
- Introduced Cross-Domain Drop (CDD) metric; diffusion models exhibit catastrophic CDD (79.18% on TextZoom); domain-specific fine-tuning recovers 75–100% of lost performance
- Key finding: text domains require 3x more adaptation (+10.6 dB) than astronomical imagery (+3.4 dB) due to high-frequency structural discontinuities
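One plausible formulation of a cross-domain drop metric is the relative performance loss when moving from the source domain to a target domain — an assumption on my part, since the exact CDD definition is not given here:

```python
# Assumed formulation of a cross-domain drop metric (the paper's exact
# definition may differ): relative loss, as a percentage, when a model
# trained on one domain is evaluated on another.
def cross_domain_drop(in_domain: float, cross_domain: float) -> float:
    return 100.0 * (in_domain - cross_domain) / in_domain

drop = cross_domain_drop(in_domain=28.0, cross_domain=14.0)  # e.g. PSNR in dB
```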
- FGSM, I-FGSM, PGD, and patch attacks on ResNet-34 under strict l-infinity constraint (epsilon=0.02); PGD reduced Top-1 from 76% to 0.2%
- Confirmed black-box transferability: ResNet-34 adversarial examples degraded DenseNet-121 Top-1 from 74.80% to 59.80% without target model access
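The simplest of the attacks above, FGSM, can be sketched in a few lines. The gradient here is a random stand-in — in practice it comes from backpropagating the loss through the model to the input — but the update rule and the l-infinity budget are the real ones:

```python
import numpy as np

# Minimal FGSM sketch under an l-infinity budget: perturb the input by
# epsilon * sign(dLoss/dInput), then clip back to the valid pixel range.
def fgsm(x: np.ndarray, grad: np.ndarray, epsilon: float) -> np.ndarray:
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

rng = np.random.default_rng(2)
x = rng.uniform(0.2, 0.8, size=(4, 4))   # stand-in image
grad = rng.standard_normal((4, 4))       # stand-in input gradient
x_adv = fgsm(x, grad, epsilon=0.02)
```

I-FGSM and PGD iterate this step with a projection back into the epsilon-ball, which is why PGD under the same epsilon=0.02 budget is so much more damaging.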
Experience
- Standardized PyTorch training pipelines (data loading, training loops, evaluation) used by 100+ students across course offerings
- Led debugging sessions on CNN optimization, backpropagation, and autograd — helping students diagnose training failures and performance bottlenecks
- Created supplementary video lectures on ML calculus concepts for a graduate cohort of 50+ students
- Architected distributed microservices (FastAPI + Node.js) serving 1K+ DAU across 10+ clients; maintained 99.9% system availability
- Built full-stack products end-to-end: TypeScript/React/Next.js frontends, ML inference backends, AWS deployments, automated CI/CD pipelines
Technical Skills
- AI/ML: RAG pipelines, Multi-agent orchestration, LLM fine-tuning (LoRA/QLoRA), RLHF (GRPO/PPO/DPO), Instructor (structured outputs), LangGraph, LangChain, DSPy, Vector stores (Qdrant, ChromaDB, Pinecone), fastembed/ONNX, TensorRT, MLflow, PyTorch, HuggingFace Transformers, PEFT
- Backend: FastAPI, PostgreSQL, Redis, ARQ, asyncpg, SQLAlchemy, Celery, WebSockets, SSE, REST, gRPC, Node.js, Django
- Infra: Docker, Kubernetes, AWS (EC2, S3, RDS, IAM, CloudFront), Terraform, Ansible, GitHub Actions, Prometheus, Grafana, Caddy
- Frontend: Next.js, React, TypeScript
- Languages: Python, TypeScript, Java, C++, SQL
Teaching
- ML Course Assistant @ NYU Tandon: Mentoring 50+ graduate students. Standardized training pipelines used by 100+ students across course offerings.
- Web Development Mentor: Taught 120+ students, enabling deployment of 40+ web apps.
- LeetCode: 200+ problems solved, 52 solutions published, 4.6K+ community views.
- Tech Blogger: Writing on AI, agents, and software on Medium.