Agentic Incident Triage (MCP)
Production Model Context Protocol platform that autonomously correlates observability signals across Splunk, OTel, and Azure to triage incidents before a human pages.
Principal Site Reliability Engineer
Principal SRE with 16+ years building production-grade platforms for Tier-1 banks.
Each role includes queryable AI context — the real story behind the bullet points.
Loading…
A curated set of production systems — agentic AI, LLM observability, and platform reliability work shipped across 16+ years in regulated FinTech.
Production Model Context Protocol platform that autonomously correlates observability signals across Splunk, OTel, and Azure to triage incidents before a human pages.
End-to-end telemetry pipeline for LLM and agent workloads — adopted as the enterprise AI observability standard across the bank.
Terraform-driven cost telemetry, right-sizing, and SLO automation across AWS Fargate microservices — kept SLO attainment at 99.97% while cutting spend.
Architected and rolled out 20+ microservices with hardened CI/CD pipelines — accelerated delivery for regulated banking workloads.
Three buckets. No five-star ratings. No "expert in everything."
Paste a job description. Get an honest assessment of whether I'm the right person — including when I'm not.
"This signals something completely different than 'please consider my resume.' You're qualifying them. Your time is valuable too."