Principal Engineer · Fractional Architecture & SRE Advisory

I build the systems that operate the systems —
and now the systems that reason about them.

25+ years turning operational complexity into automated, measurable reliability. Today that means petabyte-per-day SIEM and observability, plus production agentic-AI operations: autonomous systems that investigate, diagnose, and remediate, with humans kept in the loop for governance and audit.

PB/day SIEM & telemetry at scale
25+ yrs Cisco · TiVo · T-Mobile
3 agentic integrations shipped
2,000+ tests across OSS frameworks

Why teams bring me in

Data- and AI-heavy platforms are exactly the problem I solve

Most data and AI platforms are the same problem underneath: ingest from many noisy sources, normalize and aggregate, turn signals into intent, and expose it all through a reliable, governable API. That pipeline — and the scale and compliance pressure around it — is my daily work.

Ingestion & data quality

🧹 Multi-source pipelines that stay clean

High-volume ingestion, routing, and shaping across heterogeneous sources — controlling cost, noise, and fidelity. The difference between a demo and a product is what happens to the data on the way in.

Scale & reliability

📈 Built to not fall over

I design systems knowing what will break first as volume and customer count grow, and what it costs to fix before it bites. SLOs, observability, and graceful degradation by default.

Signals & automation

🤖 Signal → intent → action

Agentic systems that analyze, predict, and act — with safety rails. The layer that turns raw aggregated data into decisions a product can charge for.

Governance & trust

⚖️ Compliant, auditable, defensible

I ship automation into regulated, audited environments. When a platform is built on sensitive or third-party data, the data-rights and governance posture matters as much as the model, and I flag gaps in it early.

The data-platform spine I work across

01
Ingest

Public, court, market & social sources — resilient collectors

02
Normalize

Clean, dedupe, entity-resolve, standardize

03
Signals

Aggregate & model intent / demand trends

04
API

Reliable, versioned, governed access by geo

05
UI & Insights

Campaigns, dashboards, behavioral analytics

Capabilities

What I bring to the table

🧠

Agentic AI Operations

Production autonomous systems with governance built in.

  • Investigate → root-cause → remediate loops
  • LLM agent frameworks & tool use
  • Human-in-the-loop & audit trails

🛰️

SIEM & Observability

Petabyte-scale detection and telemetry.

  • Splunk · ADX · Cribl · Vector · Anvilogic
  • Prometheus · Grafana · OpenTelemetry
  • MTTD/MTTR detection & response tuning

🗄️

Data & Pipelines

High-volume ingestion to analytics.

  • Snowflake · ADX · streaming ETL
  • Cost/noise/fidelity tuning
  • Schema & entity resolution

☁️

Cloud & Containers

Resilient, reproducible infrastructure.

  • Azure (ACA/ACI) · Kubernetes · AWS
  • Ephemeral sandboxed workloads
  • Helm · Terraform · GitOps

🚀

Release & CI/CD

Ship safely, roll back faster.

  • GitLab CI/CD orchestration
  • Multi-environment deploys
  • Automated rollback & safe deploys

🧭

Architecture & Advisory

Senior judgment, prioritized roadmaps.

  • Scalability & risk assessment
  • Build-vs-buy & first-hire guidance
  • Data-rights & compliance flags

Proof, not promises

Shipped work you can click

Live demo

AS-Demo — Agentic Operations Platform

A working platform showcasing autonomous workflows across Confluence, JIRA, and Splunk: incident response, SRE on-call triage, change management, and knowledge sync — backed by a real observability stack (Grafana LGTM, Redis, queue orchestration).

Hosted on this server · Open the live demo →

Open source

Assistant-Skills Frameworks

Production frameworks for building AI-powered operational tooling — natural-language interfaces to Splunk and JIRA, with 2,000+ tests. The same agentic patterns that power autonomous investigation and remediation.

This page is itself a small proof. It's served from a single DigitalOcean droplet I administer, running containerized nginx, Redis, a queue service, and a full Grafana LGTM observability stack, with TLS certificates that now renew automatically. Infrastructure I set up, took over without breaking anything, and hardened.
Agentic AI: Claude Code · MCP · LLM agents
SIEM: Splunk · ADX · Cribl · Vector · Anvilogic
Observability: Prometheus · Grafana · OpenTelemetry · Loki
Data: Snowflake · ADX · streaming ETL
Cloud: Azure ACA/ACI · Kubernetes · AWS
CI/CD: GitLab · Helm · Terraform
Languages: Python · Go · Java · TypeScript

Engage

Let's talk about your platform

I work as a fractional technical advisor — architecture, SRE/reliability, and data. A typical start is a focused, fixed-fee discovery: current-state assessment, scalability & risk analysis, and a prioritized 90-day roadmap.