Phuriwaj

Agentic Cost Control

Core Idea

Token tracking alone is insufficient for cost control in agentic AI systems. Production agent pipelines need per-task spend caps, trajectory scoring, and webhook stop signals built into the AI gateway, not bolted on after the fact.

Why This Matters

A single poorly scoped agentic task can silently consume hundreds of dollars. Devin averages ~800 LLM turns per task, so without hard stops one runaway agent can exhaust a monthly budget in a single bad run. This is an infrastructure-level risk, not a prompt problem.

Key Points

  • Per-task spend caps: set a max_budget_usd on each agentic call and cut off the session when the cap is hit
  • Trajectory scoring: evaluate on each turn whether the agent is making progress; abort if it is stuck in a loop or producing low-value output
  • Webhook stop signals: the AI gateway should expose a kill signal that external monitoring can trigger (e.g. a cost alert fires and a webhook stops the session)
  • Token tracking is a lagging indicator: by the time you see high token counts, the cost has already been incurred, so you need predictive budget accounting that checks expected spend before each call
  • Model selection matters: routing cheap, fast tasks to smaller models (MiniMax, Haiku) and reserving Opus/Sonnet for hard reasoning tasks can cut costs 3–5× without quality loss
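The first four key points can be combined into one per-task guard at the gateway. This is a minimal sketch, not a reference implementation. It assumes the gateway can estimate a call's cost before sending it (for example from prompt length and the request's output-token limit) and that some numeric progress signal exists per turn (tests passing, subgoals completed). `TaskGuard`, `StopAgent`, `reserve`, `settle`, `score_turn`, and `patience` are hypothetical names; only `max_budget_usd` comes from the note. The `kill` event stands in for the webhook stop signal (the HTTP handler that would set it is not shown). A simple "no new best score within `patience` turns" rule stands in for trajectory scoring; a real scorer might also detect repeated tool calls.

```python
import threading
from dataclasses import dataclass, field


class StopAgent(Exception):
    """Raised when a task must be cut off: cap hit, kill signal, or no progress."""


@dataclass
class TaskGuard:
    max_budget_usd: float        # per-task spend cap, as in the bullet above
    patience: int = 5            # hypothetical: turns allowed without progress
    spent_usd: float = 0.0       # billed cost of completed calls
    reserved_usd: float = 0.0    # estimated cost of calls still in flight
    kill: threading.Event = field(default_factory=threading.Event)  # set by a webhook handler (assumed)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)
    _best: float = field(default=float("-inf"), init=False, repr=False)
    _stale_turns: int = field(default=0, init=False, repr=False)

    def reserve(self, estimated_usd: float) -> None:
        """Predictive check: refuse a call *before* sending it if its estimate would breach the cap."""
        with self._lock:
            if self.kill.is_set():
                raise StopAgent("external stop signal received")
            committed = self.spent_usd + self.reserved_usd
            if committed + estimated_usd > self.max_budget_usd:
                raise StopAgent(
                    f"cap ${self.max_budget_usd:.2f} would be exceeded "
                    f"(committed ${committed:.2f} + estimate ${estimated_usd:.2f})"
                )
            self.reserved_usd += estimated_usd

    def settle(self, estimated_usd: float, actual_usd: float) -> None:
        """Swap the reservation for the billed cost once the call returns."""
        with self._lock:
            self.reserved_usd -= estimated_usd
            self.spent_usd += actual_usd

    def score_turn(self, progress: float) -> None:
        """Trajectory check: stop after `patience` consecutive turns without a new best score."""
        with self._lock:
            if progress > self._best:
                self._best, self._stale_turns = progress, 0
            else:
                self._stale_turns += 1
                if self._stale_turns >= self.patience:
                    raise StopAgent(f"no progress in {self.patience} turns")


guard = TaskGuard(max_budget_usd=5.00, patience=3)
guard.reserve(0.40)                      # estimate before the LLM call
guard.settle(0.40, actual_usd=0.31)      # reconcile with the billed cost
guard.score_turn(progress=1.0)           # e.g. number of tests now passing
# A webhook handler would call guard.kill.set(); the next reserve() then raises StopAgent.
```

Reserving the estimate before the call and settling afterwards is what makes the cap predictive rather than lagging: concurrent requests from one session cannot jointly overshoot, because each must fit under the cap alongside the others' reservations. A production gateway would also need to cancel in-flight calls when `kill` fires; this sketch only blocks new ones.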

Benchmark

  • Devin: ~800 LLM turns per task; a single bug-fix task can cost $180 and still return a non-compiling PR
  • Claude Code: ~30 turns for equivalent tasks
  • Rule of thumb: 1 active agentic Claude Code session = 2–5 concurrent API requests at the gateway level
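The benchmark numbers reduce to simple arithmetic that can feed the spend caps in the key points. This is a minimal sketch assuming the single $180 / ~800-turn Devin figure is representative enough to serve as an average per-turn cost (one task is a thin sample, so treat the result as rough). The function names and the $20 cap are hypothetical.

```python
import math


def avg_cost_per_turn(task_cost_usd: float, turns: int) -> float:
    """Average spend per LLM turn for one observed task."""
    return task_cost_usd / turns


def turn_cap(max_budget_usd: float, cost_per_turn_usd: float) -> int:
    """Convert a dollar cap into a rough turn limit at a given average per-turn cost."""
    return math.floor(max_budget_usd / cost_per_turn_usd)


def gateway_concurrency(sessions: int, low: int = 2, high: int = 5) -> tuple[int, int]:
    """Concurrent-request range implied by the 2-5 requests-per-session rule of thumb."""
    return sessions * low, sessions * high


per_turn = avg_cost_per_turn(180.0, 800)   # Devin bug-fix figure above
print(f"${per_turn:.3f} per turn")         # $0.225 per turn
print(turn_cap(20.0, per_turn))            # 88 turns under a hypothetical $20 cap
print(gateway_concurrency(10))             # (20, 50) for 10 active sessions
```

A turn limit derived this way is only a backstop; per-turn cost varies with context length, so the dollar cap itself should remain the hard stop.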

Connections

Source

Conversation: “LLM-powered news search and summarization sites” (2026-05-23 AI Dev Brief); GSD autonomous-dev pipeline analysis (2026-05-19/20)