ML Engineer · Distributed Training · LLM Systems · Computer Vision
I build things that work at scale -- and try to understand why they work at all.
I work in the gap between ML research and production engineering -- where the math is clean and the cluster is not.
Day-to-day: cloud-scale ML infrastructure at a hyperscaler -- distributed training, LLM safety systems, and the occasional Triton kernel when PyTorch decides it's done for the day. Most of my production work lives in private repos -- this is where the side projects land.
Things I care about technically
- Large-scale pre-training infrastructure -- MoE routing, fault-tolerant checkpointing, tensor/pipeline parallelism
- LLM safety and observability -- keeping models honest at inference time
- The hardware-software boundary: SIMD, CUDA, kernel-level optimization
- Novel architectures worth deploying, not just benchmarking
Things I care about less
- Code that impresses interviewers but breaks on week two
- Benchmarks that only win on synthetic data
- Documentation that describes the happy path and nothing else
Most projects here are built to solve a real problem, not to fill a portfolio. I'd rather have three things that work than ten that look good.
| Project | What it is | Status |
|---|---|---|
| Composed-MoE-Engine | Sparse MoE training runtime -- Triton Top-K routing, DP+EP+TP distributed, async sharded checkpointing, TorchElastic fault recovery | Active |
| GuardRail Studio | LLM firewall -- sub-10ms p99 inline guardrails, DistilRoBERTa + ONNX + Triton, continuous drift detection and LoRA retraining | Active |
| KANX | Production KAN library -- TF + PyTorch + ONNX, Docker/K8s ready, published to PyPI | Active · `pip install kanx` |
| RLHF-PPO-DPO | Modular RLHF framework -- PPO and DPO, reward model training, policy optimization | Active |
| SIMD Microkernels | C++ AVX2 kernels for ML primitives -- tiled GEMM, vectorized GeLU, Python bindings | Experimental |
| ML from scratch | NumPy-only implementations of supervised, unsupervised, RL, and Bayesian methods | Reference |
Not a comprehensive list. Just what I actually reach for.
Training & inference
PyTorch · TensorFlow · Triton · ONNX · TensorRT · FSDP2 · TorchElastic
LLM ecosystem
Transformers · PEFT / LoRA · vLLM · LangChain · FastAPI · Triton Inference Server
Distributed & infra
NCCL · Kubernetes · Helm · Terraform · Airflow · Ray
Observability
Prometheus · Grafana · OpenTelemetry · Weights & Biases
Low-level
C++ · AVX2 / SIMD · CUDA · pybind11
Data
PostgreSQL · Qdrant · MongoDB · Spark · Dask
Most of my interesting work happens in private repositories -- production systems at cloud scale where open-sourcing isn't an option. This GitHub is a public window, not the full picture.
That said: the repos here are held to the same standard as the private ones -- CI, tests, type checking, real benchmarks. If something is experimental, the README says so. I'd rather write documentation that admits its limitations than documentation that hides them.
I'm particularly interested in the fault-tolerance problems that only appear at real cluster scale, the latency-accuracy tradeoffs in LLM safety systems, and the open question of whether KAN-style architectures will find their niche or stay a curiosity.
- Working on: fixing MoE engine chaos scenario A -- sudden node failure under expert resharding
- Reading: the Megatron-LM codebase and the FlexAttention paper
- Thinking about: whether MFU tracking gives you enough signal to catch silent training degradation early
The idea that a machine could hold memory across time -- that the past could shape the present through nothing more than a weight matrix -- was the moment I understood why this field is worth a lifetime.
The equation is simple. What it implies is not.
Outside of work I'm usually reading something I don't fully understand yet, listening to music that has no business being that good, and occasionally wondering if the model actually converged or if I just got lucky. I like working with people who say "I don't know" without embarrassment and argue about architecture in good faith.
Open to interesting conversations about distributed training, LLM infrastructure, or any hard ML systems problem worth losing sleep over.





