Claude Code — Autonomous Development Pipeline (GSD + autonomous-dev)
The strongest known pattern for fully autonomous requirements → prototype development using Claude Code, combining GSD for planning/context management with the autonomous-dev harness for adversarial verification.
Why / When to Use
Use when you need to run a multi-phase development task overnight or in CI without human-in-the-loop review at each step. The pipeline is designed to catch its own failures before declaring “done.”
Core Concept
Four fundamental failure modes in autonomous coding, and how this stack addresses them:
| Failure | Problem | Solution |
|---|---|---|
| Drift | Claude interprets requirements differently than intended | GSD locks requirements in PROJECT.md and REQUIREMENTS.md before code is written |
| Context rot | Quality degrades mid-execution on long tasks | GSD spawns fresh subagent contexts (200K window each), rotates as needed |
| No verification gate | “Done” means Claude says it is done, not that the code works | autonomous-dev: 0 test failures gate + spec-blind reviewer |
| No recovery | Failure at step 6 = restart from 0 | GSD’s verify step diagnoses, generates fix plans, re-executes |
The Two Components
GSD (Get Shit Done)
Handles the requirements → structured plan → execution phase.
State files GSD maintains across sessions:
- PROJECT.md: project rules, constraints, architectural decisions
- REQUIREMENTS.md: full feature spec, locked before any code
- ROADMAP.md: phases, milestones, what’s been completed
- STATE.md: current phase, what’s in progress, what’s next
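A pre-flight check along these lines could confirm that the state files exist before an execution phase starts. This is a minimal sketch: the four file names come from the list above, but the helper `missing_state_files` and the assumption that the files sit in the project root are hypothetical and not part of GSD.

```python
from pathlib import Path

# File names taken from the list above. Their location in the project
# root is an assumption of this sketch, not documented GSD behaviour.
STATE_FILES = ("PROJECT.md", "REQUIREMENTS.md", "ROADMAP.md", "STATE.md")


def missing_state_files(project_root: str) -> list[str]:
    """Return the state files that are absent or empty under project_root."""
    root = Path(project_root)
    missing = []
    for name in STATE_FILES:
        path = root / name
        # An empty file is treated as missing: nothing has been locked yet.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty result means all four files are present and non-empty. Whether their contents are good enough still needs the human checkpoints.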
Key commands:
# Bootstrap from requirements file, fully autonomous
gsd headless new-milestone --context requirements.md --auto
# Or interactive phase-by-phase
/gsd-new-project # parallel research agents → roadmap
/gsd-discuss-phase # lock decisions: API shapes, data model
/gsd-plan-phase # 2–3 tasks per plan, fits in 50% context window
/gsd-execute-phase # wave-based parallel subagents, atomic commits
/gsd-verify-work # diagnose → fix plan → re-execute
/gsd-autonomous # runs all phases to completion (headless)
Install:
npx get-shit-done-cc@latest --claude --global
autonomous-dev Harness
Adds adversarial verification on top of GSD’s execution layer. The key innovation: a spec-blind reviewer agent tests the implementation without having seen the source code, working only from the acceptance criteria and the running code. This is the closest the pipeline gets to an independent QA reviewer.
Hard gates (pipeline stops if any fail):
- Tests written before implementation (spec → test → code order)
- 0 test failures required to proceed
- No stubs or placeholders allowed
- Security scan must pass
- Spec-blind validation: separate agent writes behavioural tests from acceptance criteria alone, then validates against the implementation
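The stop-on-first-failure behaviour of the hard gates can be sketched as an ordered list of checks. This is an illustration under stated assumptions, not the harness's code: `Gate`, `run_gates`, and the gate names are hypothetical, and each check is reduced to a function returning a boolean.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Gate:
    name: str
    check: Callable[[], bool]  # returns True when the gate passes


def run_gates(gates: list[Gate]) -> tuple[bool, Optional[str]]:
    """Run gates in order; stop at the first failure and report its name."""
    for gate in gates:
        if not gate.check():
            return False, gate.name
    return True, None


# Hypothetical results for one run: the test suite still has failures.
results = {
    "tests_written_first": True,
    "zero_test_failures": False,
    "no_stubs": True,
    "security_scan": True,
}
gates = [Gate(name, lambda ok=ok: ok) for name, ok in results.items()]
print(run_gates(gates))  # (False, 'zero_test_failures')
```

Later gates are never evaluated once one fails, which matches the "pipeline stops if any fail" rule above.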
The adversarial layer:
implementer agent → builds the feature
reviewer agent → sees only: acceptance criteria + running code
→ writes its own tests from spec, never from implementation
→ verdict: pass / fail / escalate
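One way to picture the isolation is to build the reviewer's input from an allow-list, so source files cannot leak in by accident. The sketch below is hypothetical throughout: `ReviewerContext`, `build_reviewer_context`, `decide`, and the task dictionary keys are illustrative names, and the rule "no tests means escalate" is an assumption rather than documented harness behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ReviewerContext:
    acceptance_criteria: tuple[str, ...]
    run_command: str  # how to exercise the running code, e.g. a CLI call


def build_reviewer_context(task: dict) -> ReviewerContext:
    """Copy only the fields the reviewer may see; source files never pass through."""
    return ReviewerContext(
        acceptance_criteria=tuple(task["acceptance_criteria"]),
        run_command=task["run_command"],
    )


def decide(test_results: list[bool]) -> Verdict:
    """Map the reviewer's own test results to a verdict.

    Assumption: if no tests could be derived, the criteria were too vague
    to check, so the case is escalated instead of passed.
    """
    if not test_results:
        return Verdict.ESCALATE
    return Verdict.PASS if all(test_results) else Verdict.FAIL
```

Escalating on an empty test list guards against a vague spec being reported as a pass.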
Install and trigger:
/implement # runs the full autonomous-dev pipeline
Full End-to-End Pipeline
requirements.md
↓
GSD: /gsd-new-project → parallel research agents → roadmap
↓
GSD: /gsd-discuss-phase → lock API shapes, data model
↓
GSD: /gsd-plan-phase → 2–3 tasks per plan, 50% context headroom
↓
GSD: /gsd-execute-phase → wave-based parallel subagents, atomic commits
↓
autonomous-dev: /implement
├── acceptance tests written BEFORE implementation
├── 0 failures gate — loops back if failing
├── no stubs/placeholders gate
└── security scan gate + spec-blind reviewer
↓
GSD: /gsd-verify-work → diagnose → fix plan → re-execute
↓
prototype on branch, PR opened
↓
YOU review diff and merge → production
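The "loops back if failing" and "diagnose → fix plan → re-execute" steps amount to a bounded retry loop. A minimal sketch, assuming the test run and the fix step are plain callables; `verify_fix_loop`, its parameters, and the default of three fix attempts are hypothetical and not taken from GSD or autonomous-dev.

```python
from typing import Callable


def verify_fix_loop(
    run_tests: Callable[[], int],
    apply_fix: Callable[[int], None],
    max_attempts: int = 3,
) -> bool:
    """Re-run tests until there are zero failures or fix attempts run out.

    run_tests returns the number of failing tests. apply_fix receives that
    count and is expected to change the code before the next run.
    max_attempts bounds the number of fix attempts, not the test runs.
    """
    for _ in range(max_attempts):
        failures = run_tests()
        if failures == 0:
            return True  # gate passed; the pipeline may proceed
        apply_fix(failures)
    # One last run to see whether the final fix cleared the failures.
    return run_tests() == 0
```

The bound matters for unattended runs: without it, a failure the agent cannot fix would loop indefinitely and keep spending tokens.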
Human Checkpoints (Intentionally Minimal)
| Checkpoint | Why human | Time |
|---|---|---|
| Approve roadmap after /gsd-new-project | Confirm scope before any code | 5 min |
| Review /gsd-discuss-phase decisions | Lock API shapes, data model | 10–15 min |
| Review final PR diff | Merge decision | Your call |
Everything else — research, planning, coding, testing, fixing, committing — runs autonomously.
GitHub Actions Pattern (Overnight / CI)
Trigger autonomously when requirements.md is pushed:
on:
  push:
    paths: ['requirements.md']
jobs:
  autonomous-pipeline:        # job id is a placeholder
    runs-on: ubuntu-latest    # assumed runner
    steps:
      - uses: actions/checkout@v4
      - name: Run autonomous pipeline
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          npm install -g @anthropic-ai/claude-code
          npx get-shit-done-cc@latest --claude --global
          claude --dangerously-skip-permissions -p \
            "/gsd-autonomous" --max-turns 100
Local Overnight Pattern
tmux new -s build
claude --dangerously-skip-permissions
/gsd-autonomous # runs all phases to completion
# detach (Ctrl+B, D); keep the machine awake, or run this on a remote host if you want to close the laptop
Comparison with Simpler Approaches
| Method | Laptop needed? | Adversarial testing? | Context management | Best for |
|---|---|---|---|---|
| Ralph bash loop | Yes | No | None (relies on CLAUDE.md) | Simple sequential tasks |
| GSD alone | Yes | No | ★★★★★ | Large multi-phase projects |
| GSD + autonomous-dev | Yes/CI | ★★★★★ | ★★★★★ | Production-quality autonomous builds |
| Claude Code Routines | No | No | None | Scheduled cloud automation |
Gotchas
- The “fully autonomous, zero human touch” framing is aspirational; you still want PR review gates for production
- Some hook behaviours and config details may shift between Claude Code versions; cross-check against code.claude.com/docs
- autonomous-dev’s spec-blind reviewer only works if acceptance criteria are precise — vague specs produce false passes
- GSD spawns many subagents; costs can accumulate quickly on large codebases
Source
Conversations “CC-Autonomous” (Claude Code project) and “Evaluating Claude code automation credibility” — 2026-05-19. Article by Kevin Collins (Echofold / Manus Fellow), published April 2026. GitHub: autonomous-dev harness repo.
Updates — 2026-05-20
Detailed Pipeline Walk-Through (CC-Autonomous conversation)
Full end-to-end pipeline confirmed from conversation:
requirements.md
↓
GSD: /gsd-new-project (parallel research agents → roadmap)
↓
GSD: /gsd-discuss-phase (lock decisions: API shapes, data model)
↓
GSD: /gsd-plan-phase (2–3 tasks per plan, fits in 50% context)
↓
GSD: /gsd-execute-phase (wave-based parallel subagents, atomic commits)
↓
autonomous-dev: /implement (hard gates fire here)
├── tests written BEFORE seeing implementation
├── 0 failures gate — loop back if failing
├── no stubs/placeholders gate
└── security scan gate
↓
GSD: /gsd-verify-work (diagnose → fix plan → re-execute)
↓
prototype on branch, PR opened
↓
YOU review diff and merge → production
Minimal human checkpoints:
| Checkpoint | Why human | Time |
|---|---|---|
| Approve roadmap after /gsd-new-project | Confirm scope before any code | 5 min |
| Review /gsd-discuss-phase decisions | Lock API shapes, data model | 10–15 min |
| Review final PR diff | Merge decision | Your call |
GitHub Actions trigger on requirements push:
on:
  push:
    paths: ['requirements.md']
jobs:
  autonomous-pipeline:        # job id is a placeholder
    runs-on: ubuntu-latest    # assumed runner
    steps:
      - uses: actions/checkout@v4
      - name: Run autonomous pipeline
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          npm install -g @anthropic-ai/claude-code
          npx get-shit-done-cc@latest --claude --global
          claude --dangerously-skip-permissions -p \
            "/gsd-autonomous" --max-turns 100
Overnight local run:
tmux new -s build
claude --dangerously-skip-permissions
/gsd-autonomous # runs all phases to completion; detach and sleep
Credibility Note on Echofold Article
Kevin Collins (Claude Ambassador, Manus Fellow, founder of Echofold) published an article on this pipeline (April 2026). Technical details are accurate. The “fully autonomous, zero human touch” framing is aspirational for production — PR review gates remain important. Best sections: Phase 3 (Hooks), Phase 9 (Agent SDK/session-per-ticket), complete config reference.
Source: Conversations “CC-Autonomous” and “Evaluating Claude code automation credibility” — 2026-05-20
Updates — 2026-05-21
Landscape Shift: Superpowers + oh-my-claudecode Now Dominant
The CC-Autonomous conversation (2026-05-21) confirms the stack has evolved significantly. Three frameworks now lead the Claude Code ecosystem:
- GStack: thinking/research layer
- GSD: context management (still valid, unchanged)
- Superpowers: execution pipeline with mandatory TDD enforcement
Superpowers (replaces autonomous-dev)
7-phase pipeline: Brainstorm → Spec → Plan → TDD → Subagent Dev → Review → Finalize
Key design choice: mandatory TDD with architectural enforcement. Any code written before its tests exist is deleted and a restart is forced. This is stricter than a hook check because it is built into the pipeline structure.
- 124K GitHub stars (vs autonomous-dev’s 27K) → larger community, faster fixes when Claude Code updates break things
- Jesse Vincent: Claude Code with Superpowers can work autonomously for hours without deviating from the initial plan
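The tests-before-code rule can be illustrated with a check over the order in which files were written. This is only a sketch of the idea, not how Superpowers implements it: `code_written_before_tests` is a hypothetical helper, and it assumes a `tests/test_<name>.py` naming convention for `src/<name>.py`.

```python
from pathlib import PurePosixPath


def code_written_before_tests(write_order: list[str]) -> list[str]:
    """Return implementation files that appear before their test file.

    Assumes the hypothetical convention tests/test_<name>.py for
    src/<name>.py; write_order lists paths in the order they were created.
    """
    tests_seen: set[str] = set()
    violations = []
    for raw in write_order:
        path = PurePosixPath(raw)
        if path.name.startswith("test_"):
            # Remember which module this test file covers.
            tests_seen.add(path.name[len("test_"):])
        elif path.name not in tests_seen:
            violations.append(raw)
    return violations
```

In the enforcement described above, a non-empty result would mean deleting the listed files and restarting. Here the function only reports them.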
# Install via Claude Code marketplace (details TBC)
# Run: /superpowers-setup
oh-my-claudecode: Single-Stack Alternative
Multi-agent orchestration plugin: 19 specialized agents, 36 built-in skills. Covers analysis → design → planning → execution → QA → verification from one install.
5 execution modes:
| Mode | Description |
|---|---|
| Autopilot | Single-threaded, traditional |
| Ultrapilot | 5 concurrent workers, 3–5x speedup |
| Team | Staged pipeline: plan → PRD → execute → verify → fix (quality gates) |
| Ralph | Persistent execution with verify-fix loops (overnight) |
| Ecomode | Cost-optimized with automatic model routing |
Performance:
- 3–5x faster on large projects via parallel execution (Ultrapilot)
- 30–50% cheaper via automatic model routing (Opus for hard tasks, Haiku for simple ones)
- Zero config: install via Claude Code marketplace → /omc-setup → done
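The cost claim follows from simple arithmetic on how many tasks go to the cheaper model. The sketch below is illustrative only: `route_model`, `estimated_saving`, the complexity score, and the threshold are hypothetical and do not describe oh-my-claudecode's actual routing. It also assumes every task uses roughly the same number of tokens.

```python
def route_model(task_complexity: float, threshold: float = 0.5) -> str:
    """Pick a model tier from a complexity score in [0, 1]."""
    if not 0.0 <= task_complexity <= 1.0:
        raise ValueError("task_complexity must be between 0 and 1")
    return "opus" if task_complexity >= threshold else "haiku"


def estimated_saving(simple_share: float, cheap_relative_cost: float) -> float:
    """Fraction saved versus sending every task to the expensive model.

    simple_share: fraction of tasks routed to the cheaper model (0..1).
    cheap_relative_cost: the cheaper model's cost per task as a fraction
    of the expensive model's cost (0..1).
    """
    return simple_share * (1.0 - cheap_relative_cost)
```

For example, if half the tasks are routed to a model costing one fifth as much, the saving is 0.5 × 0.8 = 40%, which falls inside the 30–50% range quoted above. These inputs are made up for illustration, not measured.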
# Install via Claude Code marketplace
/omc-setup
# Team mode: plan → PRD → execute → verify pipeline
# Ralph mode: overnight verify-fix loop
Updated Recommendation Matrix
| Goal | Best stack |
|---|---|
| Maximum quality gates + TDD enforcement | Superpowers + GSD |
| Simplest single install, requirements → prototype | oh-my-claudecode (Team + Ralph mode) |
| Requirements → prototype with spec traceability | Speckit + Superpowers + GSD |
| Full autonomous overnight, no babysitting | oh-my-claudecode Ralph mode or /gsd-autonomous |
| Recommended for most use cases | oh-my-claudecode Team mode + GSD (context isolation) |
Source: Conversation “CC-Autonomous” — 2026-05-21
Updates — 2026-05-24
Cherry-Picked agent-skills for the Stack (Addy Osmani)
Evaluated github.com/addyosmani/agent-skills (43K GitHub stars) — 23 production-grade skills covering Define → Plan → Build → Verify → Review → Ship. Most overlap with existing tools (Speckit, Superpowers, gstack), but three add genuine, non-overlapping value for a Next.js 16 / Django 6 project:
| Skill | When to Use |
|---|---|
| doubt-driven-development | Any production / security / irreversible decision. Pattern: CLAIM → EXTRACT → DOUBT → RECONCILE → STOP. Optional cross-model escalation. |
| source-driven-development | Any work touching Next.js 16, Django 6, or cutting-edge frameworks. Grounds every decision in official docs; flags unverified claims. Enforces the “verify against official docs” warning already in CLAUDE.md. |
| context-engineering | Session starts and when switching between backend/frontend. Feeds agents the right information at the right time via rules files, context packing, MCP integrations. |
Install:
/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills
CLAUDE.md activation (selective; skip the rest):
## agent-skills (selective)
- Use doubt-driven-development for any production/security/irreversible decisions
- Use source-driven-development when using Next.js 16, Django 6, or any
  framework where training data may be outdated
- Use context-engineering at session start and when switching between
  backend and frontend tasks
Updated stack with clean job separation:
- Speckit → WHAT to build (spec + plan)
- agent-skills → HOW to think (doubt, source-verify, context)
- Superpowers → HOW to execute (TDD, subagents, gates)
- gstack → HOW to verify + ship (qa, review, cso, ship)
Source: Conversation “CC-Autonomous” — 2026-05-24. Repo by Addy Osmani (Google Chrome team lead). See also addy-osmani.