13+ years at AMD & Xilinx bridging Applied AI and systems engineering. Built end-to-end AI-based solutions. Mentored 10+ engineers across global sites.
Hackathons & Research
- Deep RL for Floorplan Optimization — AMD Internal Conference Finalist; GIN on 15M-node netlists, 2% QoR gain. (arXiv pending)
- ML-based Delay Prediction for EDA — AMD Internal Conference Finalist; GNN complexity models with automated fine-tuning & drift detection
- Adaptive OFDM Pilots — IEEE WAMICON 2009; adaptive pilot placement for OFDM systems
- ALTERA Design Challenge — Top 15, Innovate India Design Contest 2007
- ImageNet Classifier — ResNet-50, 77.4% Top-1 on ImageNet-1K; CutMix, MixUp, Random Erasing, LR Finder
- Elite Mentorship Program — AMD
Applied AI
- Researching auto-generated scheduling algorithms for executing large computational DAGs on memory-constrained accelerators, where tensors far exceed on-chip capacity and data movement must be orchestrated across execution stages
- Approach: an auto-research pipeline identifies DAG motifs and synthesizes motif-specific scheduling strategies, combining greedy and heuristic algorithms with AI-assisted heuristic selection via a multi-armed bandit (MAB) that learns which heuristic to apply per motif, avoiding one-size-fits-all solutions; 50% latency reduction vs. baseline schedulers
- Built an agentic auto-optimizer for MoE workloads on B200: a two-agent feedback loop in which one agent iteratively builds, runs, and evaluates kernels while a second profiles them with Nsight Compute & Nsight Systems; the loop continues until hitting the SOL (speed-of-light) limit, achieving ~20% gain over a naive PyTorch baseline
- Deep RL for FPGA directive optimization: GIN feature extraction on 15M-node netlists, reward shaping — 2% QoR gain
- Ray distributed training with Grid/ASHA/PBT hyperparameter search for scalable RL experiments
- GNN delay prediction with automated fine-tuning, monitoring, and drift detection
- Production agentic AI framework: reverse-engineers EDA tools to build a skill document cataloguing every known error pattern and its resolution; during live sessions the agent reasons over the error against the skill document & Python docs to suggest targeted fixes; graph-based LLM orchestration, iterative self-correction, Dockerized evaluation
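The per-motif heuristic selection described above can be sketched as an epsilon-greedy multi-armed bandit, one bandit per DAG motif, with each arm a candidate scheduling heuristic. This is a minimal illustration only; the class and heuristic names (`MotifBandit`, `greedy_depth`, `min_spill`) are hypothetical and not taken from the actual system.

```python
import random

class MotifBandit:
    """Epsilon-greedy bandit: each arm is a candidate scheduling heuristic
    for one DAG motif; reward could be, e.g., negative schedule latency."""

    def __init__(self, heuristics, epsilon=0.1):
        self.heuristics = list(heuristics)
        self.epsilon = epsilon
        self.counts = {h: 0 for h in self.heuristics}
        self.values = {h: 0.0 for h in self.heuristics}

    def select(self):
        # Explore a random heuristic with probability epsilon,
        # otherwise exploit the arm with the best estimated reward.
        if random.random() < self.epsilon:
            return random.choice(self.heuristics)
        return max(self.heuristics, key=lambda h: self.values[h])

    def update(self, heuristic, reward):
        # Incremental mean update of the arm's estimated reward.
        self.counts[heuristic] += 1
        n = self.counts[heuristic]
        self.values[heuristic] += (reward - self.values[heuristic]) / n
```

In use, the scheduler would keep one `MotifBandit` per detected motif, call `select()` to pick a heuristic when that motif is scheduled, and feed the measured latency back through `update()`.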
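The ASHA-style search mentioned in the Ray bullet rests on successive halving: evaluate many configurations at a small budget, then promote only the top fraction to larger budgets. A plain-Python sketch of that idea (independent of Ray's actual `ASHAScheduler` API; function and parameter names here are illustrative):

```python
def successive_halving(configs, train_eval, budgets=(1, 3, 9), keep_frac=1 / 3):
    """Successive halving: score all surviving configs at each budget level,
    keep the top `keep_frac`, and re-evaluate survivors at the next budget.

    `train_eval(config, budget)` returns a score where higher is better.
    """
    survivors = list(configs)
    for budget in budgets:
        scored = [(train_eval(c, budget), c) for c in survivors]
        scored.sort(key=lambda t: t[0], reverse=True)  # best scores first
        keep = max(1, int(len(scored) * keep_frac))
        survivors = [c for _, c in scored[:keep]]
    return survivors[0]
```

Ray's ASHA additionally promotes trials asynchronously instead of waiting for a full rung to finish, which is what makes it scale across a cluster.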
Backend Engineering
- Mentored 10+ engineers across global sites on simulation tooling for 10nm/7nm/2nm FPGA nodes
- Client/server system (Boost Asio + Protobuf) for concurrent multi-capture — 3x throughput
- Divide-and-conquer parallel processing via LSF Farm — 20x scale
- Graph compression pipeline: 3.5B datapoints → 500K patterns with Python analytics
- Tool profiling, linters, dashboards, and YAML semantic verifier for HW/SW validation
Technical Skills
ML/DL Frameworks: PyTorch, HuggingFace, TRL, vLLM, DeepSpeed, FSDP
RL & Agents: Stable-Baselines3, Ray, Gym, multi-agent orchestration
GPU & Performance: Triton, Flash Attention, Nsight Systems/Compute, CUDA, mixed-precision
Languages: Python, C++, Golang
Infrastructure: Kubernetes, Docker, Ray, LSF, W&B, Optuna
Education
M.Eng Electrical Engineering — University of Cincinnati, 2012
B.Eng Electronics & Communication — Anna University, 2007
Certifications
Triton Kernel Dev on AMD Instinct GPUs · LLM Serving with vLLM & MI300X · Agentic Framework (HuggingFace) · Generative AI with LLMs (DeepLearning.AI) · ML Ops (DeepLearning.AI) · Machine Learning (Stanford) · Analytics Edge (MITx) · Parallel & Distributed Computing (Rice) · Kubernetes (Udacity) · Big Data with Spark (Berkeley)