Playbook: Deploying an Autonomous SRE Agent for GKE on Cloudflare

Note: This playbook is published on autoclaw.sh, which is primarily focused on OpenClaw. This guide covers a separate project — an SRE agent for Kubernetes infrastructure. We’ve published it here as a practical reference for teams running similar stacks.

Overview

This playbook walks through deploying an autonomous SRE incident investigation agent. The agent:

Receives down alerts from Uptime Kuma via webhook
Runs an autonomous investigation loop against a GKE cluster using an LLM with Kubernetes tool access
Posts threaded findings to Google Chat

Stack: Cloudflare Workers + Durable Objects + Queues + KV + Workers AI (Gemma 4) + GKE + Uptime Kuma + Google Chat

Prerequisites

Cloudflare account with Workers paid plan (Durable Objects requires it)
GKE cluster with kubectl access
Uptime Kuma instance with at least one monitor configured
Google Chat space with incoming webhook configured
Node.js 18+ and npx wrangler available locally
A domain managed in Cloudflare DNS

Part 1 — GKE: Create a read-only ServiceAccount

Create a ServiceAccount with cluster-wide read permissions (no write, no secrets access).

# sre-agent-rbac.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sre-agent-ns
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sre-agent
  namespace: sre-agent-ns
---
apiVersion: v1
kind: Secret
metadata:
  name: sre-agent-token
  namespace: sre-agent-ns
  annotations:
    kubernetes.io/service-account.name: sre-agent
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sre-agent-reader
rules:
  - apiGroups: [""]
    resources: ["namespaces", "pods", "pods/log", "events", "nodes", "services"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sre-agent-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sre-agent-reader
subjects:
  - kind: ServiceAccount
    name: sre-agent
    namespace: sre-agent-ns

Apply it:

kubectl apply -f sre-agent-rbac.yaml

Extract the bearer token:

kubectl get secret sre-agent-token -n sre-agent-ns \
  -o jsonpath='{.data.token}' | base64 -d

Save this token — you’ll use it as KUBECONFIG_TOKEN.

Get your GKE master endpoint:

kubectl cluster-info
# Kubernetes control plane is running at https://<IP_OR_HOSTNAME>

Part 2 — Cloudflare DNS: expose the GKE API

The GKE master endpoint is on the public internet but addressed by IP. Cloudflare Workers require a hostname. Create a DNS record on your domain:

Field	Value
Type	A
Name	`gke-api`
IPv4	`<GKE master IP>`
Proxy	Proxied (orange cloud)

Then add a Page Rule:

URL: gke-api.yourdomain.com/*
Setting: SSL → Full

This lets Cloudflare proxy requests to the GKE API without strict certificate verification (GKE’s cert is for its IP, not your subdomain).

The record should be proxied, not DNS-only. DNS-only would expose the GKE IP directly and cause TLS errors from Workers.

Part 3 — Cloudflare: Create infrastructure

Create the required Cloudflare resources:

export CLOUDFLARE_ACCOUNT_ID=<your-account-id>

# KV namespace for deduplication
npx wrangler kv namespace create DEDUPE_KV

# Queues
npx wrangler queues create sre-incident-queue
npx wrangler queues create sre-incident-dlq

Note the KV namespace ID from the output — you’ll need it in wrangler.toml.

Part 4 — Worker: configure wrangler.toml

name = "sre-incident-agent"
main = "src/index.ts"
compatibility_date = "2025-01-01"
compatibility_flags = ["nodejs_compat"]
account_id = "<your-cloudflare-account-id>"

[observability]
enabled = true
head_sampling_rate = 1

[ai]
binding = "AI"

[[durable_objects.bindings]]
name = "INCIDENT_DO"
class_name = "IncidentDO"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["IncidentDO"]

[[queues.producers]]
binding = "INCIDENT_QUEUE"
queue = "sre-incident-queue"

[[queues.consumers]]
queue = "sre-incident-queue"
max_batch_size = 5
max_batch_timeout = 2
max_retries = 3
dead_letter_queue = "sre-incident-dlq"

[[kv_namespaces]]
binding = "DEDUPE_KV"
id = "<kv-namespace-id-from-step-3>"

[vars]
ENVIRONMENT = "production"
GKE_API_SERVER = "https://gke-api.yourdomain.com"
LLM_MODEL = "@cf/google/gemma-4-26b-a4b-it"

Part 5 — Worker: set secrets

# GKE ServiceAccount bearer token (from Part 1)
npx wrangler secret put KUBECONFIG_TOKEN

# Google Chat incoming webhook URL
npx wrangler secret put GOOGLE_CHAT_WEBHOOK_URL

# Uptime Kuma webhook bearer token (choose any string, use same value in Kuma)
npx wrangler secret put UPTIME_KUMA_SECRET

Part 6 — Worker: deploy

npm install
npx wrangler deploy

Verify it’s live:

curl https://sre-incident-agent.<your-subdomain>.workers.dev/health
# {"ok":true,"environment":"production"}

Test K8s connectivity (also sends first 10 namespaces to Google Chat):

curl https://sre-incident-agent.<your-subdomain>.workers.dev/test/k8s
# {"ok":true,"message":"Connected to GKE. Found N namespaces..."}

Part 7 — Uptime Kuma: configure notification

In Uptime Kuma → Notifications → Add Notification:

Field	Value
Type	Webhook
Name	SRE Agent
URL	`https://sre-incident-agent.<subdomain>.workers.dev/webhook/uptime-kuma`
Method	POST
Content Type	`application/json`
Additional Headers	`{"Authorization": "Bearer <your-UPTIME_KUMA_SECRET>"}`

Hit Test — you should see “Test ping acknowledged” (200 OK). Uptime Kuma test pings send heartbeat: null which the agent handles gracefully without triggering an investigation.

Assign this notification to whichever monitors you want covered.

Part 8 — Verify end-to-end

Trigger a fake down event:

curl -X POST https://sre-incident-agent.<subdomain>.workers.dev/webhook/uptime-kuma \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-secret>" \
  -d '{
    "heartbeat": {"status": 0, "time": "2026-01-01T00:00:00Z", "msg": "Test"},
    "monitor": {"id": 1, "name": "my-service", "url": "https://my-service.example.com"},
    "msg": "Test incident"
  }'

Check Google Chat — you should see a thread appear with:

Alert received notification
Investigation step notifications as tool calls fire
A final card with root cause, recommended action, and confidence level

Check the incident state:

curl https://sre-incident-agent.<subdomain>.workers.dev/incidents/<incidentId>

Architecture reference

Uptime Kuma
    │  POST /webhook/uptime-kuma
    ▼
Cloudflare Worker (sre-incident-agent)
    │  verify bearer token
    │  deduplicate (KV)
    │  notify Google Chat: "received"
    │  enqueue (Workers Queue)
    ▼
Queue Consumer (same Worker)
    │  dispatch to Durable Object by incidentId
    ▼
IncidentDO (Durable Object)
    │  POST /start → schedule alarm
    ▼
alarm() loop [up to 20 turns, 3s between alarms]
    │
    ├── Gemma 4 (Workers AI)
    │       calls tools: list_pods, get_deployment_status,
    │                     get_recent_events, get_pod_logs,
    │                     get_rollout_status, get_node_readiness
    │
    ├── KubeHttpClient
    │       → https://gke-api.yourdomain.com  (Cloudflare proxy)
    │       → GKE master endpoint             (Full SSL, bearer token)
    │
    └── Google Chat webhook (threaded by incidentId)
            "investigation step N"
            "complete: root cause + recommended action"

Security notes

The ServiceAccount is strictly read-only — no write access, no access to Secrets
The webhook endpoint validates a bearer token on every request
The GKE endpoint is protected by the ServiceAccount token even when DNS-proxied through Cloudflare
Secrets are stored as Cloudflare Worker secrets (encrypted at rest, not in wrangler.toml)
The deduplication KV prevents the same alert from triggering multiple investigations

Tuning

Parameter	Location	Default	Notes
Max tool calls	`loop.ts` `MAX_TURNS`	20	Increase for complex clusters
Alarm interval	`IncidentDO.ts`	3s	Rate limit buffer for Workers AI
Queue batch size	`wrangler.toml`	5	Max concurrent incidents processed
Queue retries	`wrangler.toml`	3	Before DLQ
LLM model	`wrangler.toml` `LLM_MODEL`	`@cf/google/gemma-4-26b-a4b-it`	Any Workers AI model with tool calling