Mission Control

Demo

Central command — log clusters, anomaly synopsis, and AI chat in one pane

Ask Mission Control

ai-sre chat

Ask about your logs — e.g. “What’s driving the error spike?”

Live Anomalies

4
New error signature — never seen before
prod/checkout-api
1

first occurrence

novel event · 1,243,891,402 logs scanned

Memory cgroup out of memory: Killed process 4821 (java)
constraint=CONSTRAINT_MEMCG oom_memcg=/kubepods/burstable/pod-7c4f
Reason: OOMKilled · Exit Code: 137

Zero history, no learned baseline, no monitor configured. Restarted in 8s, so all 42 pods stayed green and no threshold fired.

Risk if unchecked

Likely the first sign of a memory leak. At peak load it could OOM-kill every replica: all 42 checkout pods down and a full checkout outage.

Pods affected: 1 / 42 · Restarted: 8s · Monitors: 0 · Detected in: 6s · 14:02:08
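The card above reports a signature with zero history. A common way to produce such signatures, sketched here as an assumption rather than Mission Control's actual method, is to mask the tokens that vary between otherwise identical events (PIDs, pod hash suffixes), hash the resulting template, and flag a hash the first time it appears. The masking rules, the `template`/`fingerprint` helpers and the in-memory `NoveltyDetector` are all hypothetical; a real system would persist the seen set and use richer templating.

```python
import hashlib
import re

# Hypothetical masking rules, applied in order: pod hash suffixes first,
# then any standalone number (PIDs, exit codes, counters).
_MASKS = [
    (re.compile(r"\bpod-[0-9a-f]+\b"), "pod-<ID>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template(line: str) -> str:
    """Reduce a raw log line to a template by masking variable tokens."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line


def fingerprint(line: str) -> str:
    """Stable short hash of the line's template, used as its signature."""
    return hashlib.sha256(template(line).encode("utf-8")).hexdigest()[:16]


class NoveltyDetector:
    """Flags signatures that have never been seen before (in-memory sketch)."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def observe(self, line: str) -> bool:
        """Record the line's signature; return True only on its first occurrence."""
        fp = fingerprint(line)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        return True


detector = NoveltyDetector()
# First OOM line: new signature.
print(detector.observe("Memory cgroup out of memory: Killed process 4821 (java)"))  # True
# Same event with a different PID maps to the same template: not novel.
print(detector.observe("Memory cgroup out of memory: Killed process 5107 (java)"))  # False
```

Masking before hashing is what lets "never seen before" mean a new kind of event rather than just a new PID.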
CrashLoopBackOff following the OOM
prod/checkout-api
3

restarts in 2 min

Back-off restarting failed container app in pod checkout-api-7c4f9b
Last state: Terminated · Reason: Error · Exit Code: 137

Correlated with the cgroup OOM 40s earlier on the same pod.

Risk if unchecked

Each restart widens the back-off. The pod stops serving entirely and checkout capacity drops to 41/42 replicas.

Back-off: 5m0s · Pod: 7c4f9b · Detected in: 7s · 14:02:49
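Exit code 137 in both OOM-related cards follows the common convention that a status above 128 means the process was killed by signal `status - 128`. Here 137 - 128 = 9, which is `SIGKILL` on Linux, the signal the kernel OOM killer sends. For the back-off, the Kubernetes documentation describes the default kubelet restart delay as exponential (10s, 20s, 40s, …) and capped at five minutes. The helpers below are hypothetical illustrations of both rules (POSIX signal numbering assumed), not part of any Kubernetes API.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Statuses above 128 conventionally mean 'killed by signal (code - 128)'."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by signal {code - 128}"
    return f"exited with status {code}"


def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Default-style restart back-off in seconds: start at `initial`,
    double after each restart, never exceed `cap` (300s = 5 min)."""
    return [min(initial * 2 ** i, cap) for i in range(restarts)]


print(describe_exit_code(137))  # SIGKILL
print(crashloop_delays(7))      # [10, 20, 40, 80, 160, 300, 300]
```

Under this schedule the delay roughly doubles with each failed restart until it sits at the 5-minute ceiling, which is why a crash-looping pod spends longer and longer out of service.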
Readiness probe failing
prod/checkout-api
0 / 3

readiness probes passing

Readiness probe failed: HTTP probe failed with statuscode: 503
Endpoint removed from Service load balancer

Pod marked NotReady; traffic shifted onto the 41 healthy replicas.

Risk if unchecked

The 41 remaining replicas now absorb extra load. If one more drops, checkout runs under capacity and latency spikes.

Endpoints: 41 / 42 · Probe: /healthz · Detected in: 9s · 14:03:01
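If traffic is spread evenly across ready endpoints, each ready replica carries `total / ready` times its normal share: 42/41 ≈ 1.024 (about 2.4% more) and 42/40 = 1.05. The sketch below makes the card's "if one more drops" reasoning concrete. The even-spread assumption and the 1.03 headroom limit are hypothetical, chosen only for illustration.

```python
def per_replica_load_factor(total: int, ready: int) -> float:
    """Load on each ready replica relative to a fully ready fleet,
    assuming traffic is spread evenly across ready endpoints."""
    if ready <= 0:
        raise ValueError("no ready replicas: the Service has no endpoints")
    return total / ready


def has_headroom(total: int, ready: int, max_factor: float) -> bool:
    """True if each ready replica stays within the given load limit."""
    return per_replica_load_factor(total, ready) <= max_factor


HEADROOM = 1.03  # hypothetical: replicas sized for at most 3% extra load

for ready in (42, 41, 40):
    factor = per_replica_load_factor(42, ready)
    print(f"{ready}/42 ready: {factor:.3f}x load, headroom ok: {has_headroom(42, ready, HEADROOM)}")
# 42/42 ready: 1.000x load, headroom ok: True
# 41/42 ready: 1.024x load, headroom ok: True
# 40/42 ready: 1.050x load, headroom ok: False
```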
503 rate climbing at the edge
prod/api-gateway
4.2%

5xx rate · 210× over baseline

upstream checkout-api: 503 Service Unavailable
baseline 0.02% · now 4.2% · 210× over learned normal

Downstream blast radius of the checkout-api OOM cascade.

Risk if unchecked

Real users are already failing to check out. At 4.2% and climbing, every minute brings more lost orders and direct revenue loss.

Baseline: 0.02% · Now: 4.2% · Detected in: 11s · 14:03:20
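The multiplier on this card is the current rate divided by the learned baseline: 4.2% / 0.02% = 210. The sketch below shows that arithmetic plus one hypothetical way to learn the baseline (an exponentially weighted moving average) and to gate alerts on both a multiplier and an absolute floor, so that a tiny baseline cannot turn noise into a huge ratio. The smoothing factor and both thresholds are assumptions, not values taken from Mission Control.

```python
def update_baseline(baseline_pct: float, sample_pct: float, alpha: float = 0.01) -> float:
    """EWMA update: a small alpha makes the learned baseline move slowly."""
    return (1 - alpha) * baseline_pct + alpha * sample_pct


def over_baseline(now_pct: float, baseline_pct: float) -> float:
    """How many times the current rate exceeds the learned baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return now_pct / baseline_pct


def is_anomalous(now_pct: float, baseline_pct: float,
                 min_multiplier: float = 10.0, min_rate_pct: float = 0.5) -> bool:
    """Hypothetical rule: flag only if the rate is far above baseline
    AND above an absolute floor."""
    return (over_baseline(now_pct, baseline_pct) >= min_multiplier
            and now_pct >= min_rate_pct)


print(f"{over_baseline(4.2, 0.02):.0f}x over baseline")  # 210x over baseline
print(is_anomalous(4.2, 0.02))                            # True
print(is_anomalous(0.03, 0.02))                           # False: only 1.5x
```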
Demo feed · replace with live anomaly subscription
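The feed presents these four cards as one cascade that starts with the OOM on checkout-api. One simple way to assemble such a cascade, sketched here under stated assumptions, is to chain anomalies that arrive within a time window of each other and touch the same or a dependent service. The 60s window, the `DEPENDS_ON` topology, the `Anomaly` type and the `hhmmss` helper are hypothetical; only the timestamps and service names come from the cards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical service topology: api-gateway calls checkout-api.
DEPENDS_ON: dict[str, set[str]] = {"prod/api-gateway": {"prod/checkout-api"}}


@dataclass
class Anomaly:
    title: str
    service: str
    at: datetime


def related(a: Anomaly, b: Anomaly) -> bool:
    """Same service, or one service depends directly on the other."""
    return (
        a.service == b.service
        or b.service in DEPENDS_ON.get(a.service, set())
        or a.service in DEPENDS_ON.get(b.service, set())
    )


def group_incidents(anomalies: list[Anomaly],
                    window: timedelta = timedelta(seconds=60)) -> list[list[Anomaly]]:
    """Walk anomalies in time order. Each one joins the current incident if it
    arrives within `window` of that incident's latest member and is related to
    any member; otherwise it starts a new incident."""
    incidents: list[list[Anomaly]] = []
    for a in sorted(anomalies, key=lambda x: x.at):
        current = incidents[-1] if incidents else None
        if (current is not None
                and a.at - current[-1].at <= window
                and any(related(a, m) for m in current)):
            current.append(a)
        else:
            incidents.append([a])
    return incidents


def hhmmss(s: str) -> datetime:
    """Parse a card timestamp; the date part is irrelevant for gaps."""
    return datetime.strptime(s, "%H:%M:%S")


FEED = [
    Anomaly("New error signature", "prod/checkout-api", hhmmss("14:02:08")),
    Anomaly("CrashLoopBackOff following the OOM", "prod/checkout-api", hhmmss("14:02:49")),
    Anomaly("Readiness probe failing", "prod/checkout-api", hhmmss("14:03:01")),
    Anomaly("503 rate climbing at the edge", "prod/api-gateway", hhmmss("14:03:20")),
]

incidents = group_incidents(FEED)
print(f"{len(incidents)} incident(s), {len(incidents[0])} anomalies")  # 1 incident(s), 4 anomalies
```

Time proximity alone would also merge unrelated failures that happen to coincide; the topology check is what keeps the gateway's 503s attached to the checkout-api cascade while excluding, say, an unrelated service failing at the same moment.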