Claude Code — Autonomous Development Pipeline (GSD + autonomous-dev)

The strongest pattern found so far for fully autonomous requirements → prototype development with Claude Code: GSD handles planning and context management, and the autonomous-dev harness adds adversarial verification.

Why / When to Use

Use when you need to run a multi-phase development task overnight or in CI without human-in-the-loop review at each step. The pipeline is designed to catch its own failures before declaring “done.”

Core Concept

Four fundamental failure modes in autonomous coding, and how this stack addresses them:

Failure	Problem	Solution
Drift	Claude interprets requirements differently than intended	GSD locks requirements in PROJECT.md, REQUIREMENTS.md before code is written
Context rot	Quality degrades mid-execution on long tasks	GSD spawns fresh subagent contexts (200K window each), rotates as needed
No verification gate	“Done” = Claude says done, not actually working	autonomous-dev: 0 test failures gate + spec-blind reviewer
No recovery	Failure at step 6 = restart from 0	GSD’s verify step diagnoses, generates fix plans, re-executes

The Two Components

GSD (Get Shit Done)

Handles the requirements → structured plan → execution phase.

State files GSD maintains across sessions:

PROJECT.md — project rules, constraints, architectural decisions
REQUIREMENTS.md — full feature spec locked before any code
ROADMAP.md — phases, milestones, what’s been completed
STATE.md — current phase, what’s in progress, what’s next
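A pre-flight check along these lines can confirm the four state files are present before an unattended run. This is a minimal sketch: `missing_state_files` is a hypothetical helper, not part of GSD, and it assumes the files live in the project root.

```python
from pathlib import Path

# The four state files listed above. Their location in the project root
# is an assumption; adjust if your GSD version stores them elsewhere.
STATE_FILES = ("PROJECT.md", "REQUIREMENTS.md", "ROADMAP.md", "STATE.md")


def missing_state_files(project_root: str) -> list[str]:
    """Return the names of expected state files that do not exist."""
    root = Path(project_root)
    return [name for name in STATE_FILES if not (root / name).is_file()]
```

An empty result means all four files exist; anything else is worth resolving before starting a long run.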

Key commands:

# Bootstrap from requirements file, fully autonomous
gsd headless new-milestone --context requirements.md --auto
 
# Or interactive phase-by-phase
/gsd-new-project       # parallel research agents → roadmap
/gsd-discuss-phase     # lock decisions: API shapes, data model
/gsd-plan-phase        # 2–3 tasks per plan, fits in 50% context window
/gsd-execute-phase     # wave-based parallel subagents, atomic commits
/gsd-verify-work       # diagnose → fix plan → re-execute
/gsd-autonomous        # runs all phases to completion (headless)
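
The 50% sizing rule in /gsd-plan-phase comes down to simple arithmetic: with the 200K-token subagent window cited in the failure-mode table, a plan should stay under roughly 100K tokens. A sketch of that check follows; `fits_context_budget` and the per-task token estimates are hypothetical and do not reflect how GSD itself measures context.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # per-subagent window cited in this note
PLAN_BUDGET_FRACTION = 0.5       # "fits in 50% context window"


def fits_context_budget(task_token_estimates: list[int]) -> bool:
    """True if a plan's tasks together stay within half the context window."""
    budget = int(CONTEXT_WINDOW_TOKENS * PLAN_BUDGET_FRACTION)  # 100_000 tokens
    return sum(task_token_estimates) <= budget
```

Three tasks estimated at 30K tokens each fit the budget; two tasks at 60K and 50K do not.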

Install:

npx get-shit-done-cc@latest --claude --global

autonomous-dev Harness

Adds adversarial verification on top of GSD’s execution layer. The key innovation: a spec-blind reviewer agent tests the implementation without having seen the source code — only the acceptance criteria. This is the closest approximation to an independent QA reviewer.

Hard gates (pipeline stops if any fail):

Tests written before implementation (spec → test → code order)
0 test failures required to proceed
No stubs or placeholders allowed
Security scan must pass
Spec-blind validation: separate agent writes behavioural tests from acceptance criteria alone, then validates against the implementation
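
The "pipeline stops if any fail" behaviour can be sketched as an ordered list of checks that halts at the first failure. This is an illustration only: `Gate` and `run_gates` are hypothetical names, and the lambdas stand in for whatever the harness actually runs at each gate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Gate:
    name: str
    check: Callable[[], bool]  # returns True when the gate passes


def run_gates(gates: list[Gate]) -> tuple[bool, Optional[str]]:
    """Run gates in order; stop at the first failure and report its name."""
    for gate in gates:
        if not gate.check():
            return False, gate.name
    return True, None


# Placeholder checks; a real setup would call the test runner, scanner, etc.
example_gates = [
    Gate("tests written before implementation", lambda: True),
    Gate("0 test failures", lambda: False),
    Gate("no stubs or placeholders", lambda: True),
    Gate("security scan", lambda: True),
]
result = run_gates(example_gates)  # (False, "0 test failures")
```

Stopping at the first failure keeps later gates from running against code that is already known to be broken.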

The adversarial layer:

implementer agent    → builds the feature
reviewer agent       → sees only: acceptance criteria + running code
                     → writes its own tests from spec, never from implementation
                     → verdict: pass / fail / escalate
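
One way to picture the reviewer's three-way verdict is as a function over per-criterion test outcomes. The sketch below is an assumption about how such a mapping could work, not the harness's actual logic; in particular, the rule that untestable criteria lead to "escalate" is invented for illustration.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    ESCALATE = "escalate"


def decide(outcomes: dict[str, Optional[bool]]) -> Verdict:
    """Map per-criterion outcomes to a verdict.

    Keys are acceptance criteria; values are True (behavioural test passed),
    False (failed), or None (the reviewer could not test the criterion).
    """
    if any(result is False for result in outcomes.values()):
        return Verdict.FAIL
    if not outcomes or any(result is None for result in outcomes.values()):
        # Assumed rule: untestable or missing criteria go to a human.
        return Verdict.ESCALATE
    return Verdict.PASS
```

Note that the input carries only acceptance criteria and observed behaviour, never source code, which is what keeps the reviewer spec-blind.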

Trigger (once the harness is installed):

/implement    # runs the full autonomous-dev pipeline

Full End-to-End Pipeline

requirements.md
  ↓
GSD: /gsd-new-project   → parallel research agents → roadmap
  ↓
GSD: /gsd-discuss-phase → lock API shapes, data model
  ↓
GSD: /gsd-plan-phase    → 2–3 tasks per plan, 50% context headroom
  ↓
GSD: /gsd-execute-phase → wave-based parallel subagents, atomic commits
  ↓
autonomous-dev: /implement
  ├── acceptance tests written BEFORE implementation
  ├── 0 failures gate — loops back if failing
  ├── no stubs/placeholders gate
  └── security scan gate + spec-blind reviewer
  ↓
GSD: /gsd-verify-work   → diagnose → fix plan → re-execute
  ↓
prototype on branch, PR opened
  ↓
YOU review diff and merge → production
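
The "loops back if failing" and "diagnose → fix plan → re-execute" steps share one shape: verify, fix, verify again. A minimal sketch of that loop follows. `verify_fix_loop` is a hypothetical helper, and the cap on fix attempts is an assumption added here so the loop always terminates; the source does not state a retry limit.

```python
from typing import Callable


def verify_fix_loop(
    verify: Callable[[], bool],
    apply_fix: Callable[[], None],
    max_fixes: int = 3,
) -> bool:
    """Re-run verification after each fix until it passes or fixes run out."""
    for _ in range(max_fixes):
        if verify():
            return True
        apply_fix()  # stands in for "diagnose -> fix plan -> re-execute"
    # Final check after the last allowed fix.
    return verify()
```

A False return is the point at which an unattended run should stop and leave the branch for a human to inspect.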

Human Checkpoints (Intentionally Minimal)

Checkpoint	Why human	Time
Approve roadmap after `/gsd-new-project`	Confirm scope before any code	5 min
Review `/gsd-discuss-phase` decisions	Lock API shapes, data model	10–15 min
Review final PR diff	Merge decision	Your call

Everything else — research, planning, coding, testing, fixing, committing — runs autonomously.

GitHub Actions Pattern (Overnight / CI)

Trigger autonomously when requirements.md is pushed:

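on:
  push:
    paths: ['requirements.md']

jobs:
  autonomous-build:            # job id is arbitrary
    runs-on: ubuntu-latest
    steps:
      # Check out the repo so requirements.md is available to the pipeline
      - uses: actions/checkout@v4
      - name: Run autonomous pipeline
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          npm install -g @anthropic-ai/claude-code
          npx get-shit-done-cc@latest --claude --global
          claude --dangerously-skip-permissions -p \
            "/gsd-autonomous" --max-turns 100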

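The same headless invocation can be wrapped in a small script when the run is started from something other than GitHub Actions. This sketch uses only the flags that appear in the workflow above; `build_headless_command` and `run_headless` are hypothetical helper names.

```python
import subprocess


def build_headless_command(slash_command: str, max_turns: int) -> list[str]:
    """Assemble the argv for the non-interactive invocation shown above."""
    return [
        "claude",
        "--dangerously-skip-permissions",
        "-p", slash_command,
        "--max-turns", str(max_turns),
    ]


def run_headless(slash_command: str = "/gsd-autonomous", max_turns: int = 100) -> int:
    """Run the pipeline and return the CLI's exit code."""
    completed = subprocess.run(build_headless_command(slash_command, max_turns))
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems around the slash command.
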
Local Overnight Pattern

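tmux new -s build                       # start a detachable session named "build"
claude --dangerously-skip-permissions   # launch Claude Code inside tmux
/gsd-autonomous     # typed at the Claude Code prompt; runs all phases to completion
# detach (Ctrl+B, D); keep the machine awake (a sleeping laptop pauses the run)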

Comparison with Simpler Approaches

Method	Laptop needed?	Adversarial testing?	Context management	Best for
Ralph bash loop	Yes	No	None (relies on CLAUDE.md)	Simple sequential tasks
GSD alone	Yes	No	★★★★★	Large multi-phase projects
GSD + autonomous-dev	Yes/CI	★★★★★	★★★★★	Production-quality autonomous builds
Claude Code Routines	No	No	None	Scheduled cloud automation

Gotchas

The “fully autonomous, zero human touch” framing is aspirational; keep PR review gates for production
Some hook behaviours and config details may shift between Claude Code versions; cross-check against code.claude.com/docs
autonomous-dev’s spec-blind reviewer only works if acceptance criteria are precise — vague specs produce false passes
GSD spawns many subagents; costs can accumulate quickly on large codebases

Source

Conversations “CC-Autonomous” (Claude Code project) and “Evaluating Claude code automation credibility” — 2026-05-19. Article by Kevin Collins (Echofold / Manus Fellow), published April 2026. GitHub: autonomous-dev harness repo.

Updates — 2026-05-20

Detailed Pipeline Walk-Through (CC-Autonomous conversation)

Full end-to-end pipeline confirmed from conversation:

requirements.md
↓
GSD: /gsd-new-project       (parallel research agents → roadmap)
↓
GSD: /gsd-discuss-phase     (lock decisions: API shapes, data model)
↓
GSD: /gsd-plan-phase        (2–3 tasks per plan, fits in 50% context)
↓
GSD: /gsd-execute-phase     (wave-based parallel subagents, atomic commits)
↓
autonomous-dev: /implement  (hard gates fire here)
  ├── tests written BEFORE seeing implementation
  ├── 0 failures gate — loop back if failing
  ├── no stubs/placeholders gate
  └── security scan gate
↓
GSD: /gsd-verify-work       (diagnose → fix plan → re-execute)
↓
prototype on branch, PR opened
↓
YOU review diff and merge → production

Credibility Note on Echofold Article

Kevin Collins (Claude Ambassador, Manus Fellow, founder of Echofold) published an article on this pipeline (April 2026). Technical details are accurate. The “fully autonomous, zero human touch” framing is aspirational for production — PR review gates remain important. Best sections: Phase 3 (Hooks), Phase 9 (Agent SDK/session-per-ticket), complete config reference.

Source: Conversations “CC-Autonomous” and “Evaluating Claude code automation credibility” — 2026-05-20

Updates — 2026-05-21

Landscape Shift: Superpowers + oh-my-claudecode Now Dominant

The CC-Autonomous conversation (2026-05-21) confirms the stack has evolved significantly. Three frameworks now lead the Claude Code ecosystem:

gstack — thinking/research layer
GSD — context management (still valid, unchanged)
Superpowers — execution pipeline with mandatory TDD enforcement

Superpowers (replaces autonomous-dev)

7-phase pipeline: Brainstorm → Spec → Plan → TDD → Subagent Dev → Review → Finalize

Key design choice: mandatory TDD with architectural enforcement. Any code written before its tests is deleted and a restart is forced. This is stricter than a hook check; it’s built into the pipeline structure.
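
The tests-first rule can be illustrated by scanning the order in which files were written and flagging implementation files that appeared before any test. This is a sketch of the idea only, not Superpowers' mechanism: both function names and the test-file naming convention are assumptions.

```python
def is_test_file(path: str) -> bool:
    """Naming convention assumed for this sketch only."""
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or name.endswith("_test.py")


def code_written_before_tests(write_order: list[str]) -> list[str]:
    """Return implementation files written before the first test file."""
    premature = []
    for path in write_order:
        if is_test_file(path):
            break
        premature.append(path)
    return premature
```

Under the rule described above, any file this returns would be deleted and the phase restarted; if no test file exists at all, every file is flagged.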

124K GitHub stars (vs autonomous-dev’s 27K) → larger community, faster fixes when Claude Code updates break things
Per Jesse Vincent, Claude Code with Superpowers can work autonomously for hours without deviating from the initial plan

# Install via Claude Code marketplace (details TBC)
# Run: /superpowers-setup

oh-my-claudecode — Single-Stack Alternative

Multi-agent orchestration plugin: 19 specialized agents, 36 built-in skills. Covers analysis → design → planning → execution → QA → verification from one install.

5 execution modes:

Mode	Description
Autopilot	Single-threaded, traditional
Ultrapilot	5 concurrent workers, 3–5x speedup
Team	Staged pipeline: plan → PRD → execute → verify → fix (quality gates)
Ralph	Persistent execution with verify-fix loops (overnight)
Ecomode	Cost-optimized with automatic model routing

Performance:

3–5x faster on large projects via parallel execution (Ultrapilot)
30–50% cheaper via automatic model routing (Opus for hard tasks, Haiku for simple ones)
Zero config: install via Claude Code marketplace → /omc-setup → done
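
Automatic model routing, as described above, sends hard tasks to Opus and simple ones to Haiku. The sketch below shows the general idea with a single threshold; the 0 to 1 difficulty score, the default threshold, and the function name are all hypothetical, since the source does not say how oh-my-claudecode classifies tasks.

```python
def route_model(difficulty: float, threshold: float = 0.5) -> str:
    """Pick a model tier from a 0-1 difficulty score (hypothetical scale)."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    # Hard tasks go to the larger model, simple ones to the cheaper one.
    return "opus" if difficulty >= threshold else "haiku"
```

The cost saving comes from how many tasks fall below the threshold, so the split depends heavily on the workload.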

# Install via Claude Code marketplace
/omc-setup
# Team mode: plan → PRD → execute → verify pipeline
# Ralph mode: overnight verify-fix loop

Updated Recommendation Matrix

Goal	Best stack
Maximum quality gates + TDD enforcement	Superpowers + GSD
Simplest single install, requirements → prototype	oh-my-claudecode (Team + Ralph mode)
Requirements → prototype with spec traceability	Speckit + Superpowers + GSD
Full autonomous overnight, no babysitting	oh-my-claudecode Ralph mode or `/gsd-autonomous`
Recommended for most use cases	oh-my-claudecode Team mode + GSD (context isolation)

Source: Conversation “CC-Autonomous” — 2026-05-21

Updates — 2026-05-24

Cherry-Picked agent-skills for the Stack (Addy Osmani)

Evaluated github.com/addyosmani/agent-skills (43K GitHub stars) — 23 production-grade skills covering Define → Plan → Build → Verify → Review → Ship. Most overlap with existing tools (Speckit, Superpowers, gstack), but three add genuine, non-overlapping value for a Next.js 16 / Django 6 project:

Skill	When to Use
doubt-driven-development	Any production / security / irreversible decision. Pattern: CLAIM → EXTRACT → DOUBT → RECONCILE → STOP. Optional cross-model escalation.
source-driven-development	Any work touching Next.js 16, Django 6, or cutting-edge frameworks. Grounds every decision in official docs; flags unverified claims. Enforces the “verify against official docs” warning already in CLAUDE.md.
context-engineering	Session starts and when switching between backend/frontend. Feeds agents the right information at the right time via rules files, context packing, MCP integrations.

Install:

/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills

CLAUDE.md activation (selective — skip the rest):

## agent-skills (selective)
- Use doubt-driven-development for any production/security/irreversible decisions
- Use source-driven-development when using Next.js 16, Django 6, or any
  framework where training data may be outdated
- Use context-engineering at session start and when switching between
  backend and frontend tasks

Updated stack with clean job separation:

Speckit → WHAT to build (spec + plan)
agent-skills → HOW to think (doubt, source-verify, context)
Superpowers → HOW to execute (TDD, subagents, gates)
gstack → HOW to verify + ship (qa, review, cso, ship)

Source: Conversation “CC-Autonomous” — 2026-05-24. Repo by Addy Osmani (Google Chrome team lead). See also addy-osmani.