
Validate built features through conversational testing with persistent state. Creates UAT.md that tracks test progress, survives /clear, and feeds gaps into /gsd:plan-phase --gaps.

User tests, Claude records. One test at a time. Plain text responses.

**Show expected, ask if reality matches.**

Claude presents what SHOULD happen. User confirms or describes what's different.

  • "yes" / "y" / "next" / empty → pass
  • Anything else → logged as issue, severity inferred

No Pass/Fail buttons. No severity questions. Just: "Here's what should happen. Does it?"

@/home/payload/payload-cms/.claude/get-shit-done/templates/UAT.md

**First: Check for active UAT sessions**
find .planning/phases -name "*-UAT.md" -type f 2>/dev/null | head -5

If active sessions exist AND no $ARGUMENTS provided:

Read each file's frontmatter (status, phase) and Current Test section.
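Reading those fields might be sketched as follows (the `sed`-based frontmatter parsing and the helper name `list_uat_sessions` are assumptions based on the template's key names):

```shell
# Sketch (assumption): pull status and phase from each session's YAML frontmatter
list_uat_sessions() {
  find .planning/phases -name "*-UAT.md" -type f 2>/dev/null | head -5 | while read -r f; do
    # Restrict sed to the frontmatter block that opens the file
    status=$(sed -n '1,/^---$/{s/^status:[[:space:]]*//p;}' "$f")
    phase=$(sed -n '1,/^---$/{s/^phase:[[:space:]]*//p;}' "$f")
    printf '%s\tphase=%s\tstatus=%s\n' "$f" "$phase" "$status"
  done
}
```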

Display inline:

## Active UAT Sessions

| # | Phase | Status | Current Test | Progress |
|---|-------|--------|--------------|----------|
| 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
| 2 | 05-auth | testing | 1. Login Form | 0/4 |

Reply with a number to resume, or provide a phase number to start new.

Wait for user response.

  • If user replies with number (1, 2) → Load that file, go to resume_from_file
  • If user replies with phase number → Treat as new session, go to create_uat_file

If active sessions exist AND $ARGUMENTS provided:

Check if session exists for that phase. If yes, offer to resume or restart. If no, continue to create_uat_file.

If no active sessions AND no $ARGUMENTS:

No active UAT sessions.

Provide a phase number to start testing (e.g., /gsd:verify-work 4)

If no active sessions AND $ARGUMENTS provided:

Continue to create_uat_file.

**Find what to test:**

Parse $ARGUMENTS as phase number (e.g., "4") or plan number (e.g., "04-02").

# Find phase directory (match both zero-padded and unpadded)
PHASE_NUM="${PHASE_ARG%%-*}"   # plan arg "04-02" -> phase "04"; plain "4" stays "4"
PADDED_PHASE=$(printf "%02d" "$((10#${PHASE_NUM}))" 2>/dev/null) || PADDED_PHASE="${PHASE_NUM}"
PHASE_DIR=$(ls -d .planning/phases/${PADDED_PHASE}-* .planning/phases/${PHASE_NUM}-* 2>/dev/null | head -1)

# Find SUMMARY files
ls "$PHASE_DIR"/*-SUMMARY.md 2>/dev/null

Read each SUMMARY.md to extract testable deliverables.

**Extract testable deliverables from SUMMARY.md:**

Parse for:

  1. Accomplishments - Features/functionality added
  2. User-facing changes - UI, workflows, interactions

Focus on USER-OBSERVABLE outcomes, not implementation details.

For each deliverable, create a test:

  • name: Brief test name
  • expected: What the user should see/experience (specific, observable)

Examples:

  • Accomplishment: "Added comment threading with infinite nesting" → Test: "Reply to a Comment" → Expected: "Clicking Reply opens inline composer below comment. Submitting shows reply nested under parent with visual indentation."

Skip internal/non-observable items (refactors, type changes, etc.).

**Create UAT file with all tests:**
mkdir -p "$PHASE_DIR"

Build test list from extracted deliverables.

Create file:

---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---

## Current Test
<!-- OVERWRITE each test - shows where we are -->

number: 1
name: [first test name]
expected: |
  [what user should observe]
awaiting: user response

## Tests

### 1. [Test Name]
expected: [observable behavior]
result: [pending]

### 2. [Test Name]
expected: [observable behavior]
result: [pending]

...

## Summary

total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0

## Gaps

[none yet]

Write to .planning/phases/XX-name/{phase}-UAT.md
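A minimal sketch of that write (the `$PHASE_DIR`/`$PHASE` defaults here are example values, and the real file carries the tests built from the extracted deliverables rather than an empty skeleton):

```shell
# Sketch: write the UAT skeleton shown above (example phase values, not real ones)
PHASE_DIR="${PHASE_DIR:-.planning/phases/04-comments}"
PHASE="${PHASE:-04-comments}"
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # ISO timestamp for started/updated

mkdir -p "$PHASE_DIR"
cat > "$PHASE_DIR/${PHASE}-UAT.md" <<EOF
---
status: testing
phase: ${PHASE}
source: []
started: ${NOW}
updated: ${NOW}
---

## Current Test

number: 1
awaiting: user response

## Tests

## Summary

total: 0
passed: 0
issues: 0
pending: 0
skipped: 0

## Gaps

[none yet]
EOF
```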

Proceed to present_test.

**Present current test to user:**

Read Current Test section from UAT file.

Display using checkpoint box format:

╔══════════════════════════════════════════════════════════════╗
║  CHECKPOINT: Verification Required                           ║
╚══════════════════════════════════════════════════════════════╝

**Test {number}: {name}**

{expected}

──────────────────────────────────────────────────────────────
→ Type "pass" or describe what's wrong
──────────────────────────────────────────────────────────────

Wait for user response (plain text, no AskUserQuestion).

**Process user response and update file:**

If response indicates pass:

  • Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"

Update Tests section:

### {N}. {name}
expected: {expected}
result: pass

If response indicates skip:

  • "skip", "can't test", "n/a"

Update Tests section:

### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]

If response is anything else:

  • Treat as issue description

Infer severity from description:

  • Contains: crash, error, exception, fails, broken, unusable → blocker
  • Contains: doesn't work, wrong, missing, can't → major
  • Contains: slow, weird, off, minor, small → minor
  • Contains: color, font, spacing, alignment, visual → cosmetic
  • Default if unclear: major
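As a sketch, the rules above amount to this keyword matcher, checked in the order listed (the function name and exact patterns are illustrative; the real inference is Claude reading the description, not a script):

```shell
# Sketch: keyword-based severity heuristic, checked in the order listed above
infer_severity() {
  desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$desc" in
    *crash*|*error*|*exception*|*fails*|*broken*|*unusable*) echo "blocker" ;;
    *"doesn't work"*|*wrong*|*missing*|*"can't"*)            echo "major" ;;
    *slow*|*weird*|*" off"*|*minor*|*small*)                 echo "minor" ;;
    *color*|*font*|*spacing*|*alignment*|*visual*)           echo "cosmetic" ;;
    *)                                                       echo "major" ;;  # default if unclear
  esac
}
```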

Update Tests section:

### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}

Append to Gaps section (structured YAML for plan-phase --gaps):

- truth: "{expected behavior from test}"
  status: failed
  reason: "User reported: {verbatim user response}"
  severity: {inferred}
  test: {N}
  artifacts: []  # Filled by diagnosis
  missing: []    # Filled by diagnosis

After any response:

Update Summary counts. Update frontmatter.updated timestamp.
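The recount can be derived from the file itself rather than tracked separately (a sketch; it assumes each test carries exactly one `result:` line, as in the template above):

```shell
# Sketch: recount the Summary section from result lines in the Tests section
count_results() {
  uat="$1"
  passed=$(grep -c '^result: pass$' "$uat" || true)
  issues=$(grep -c '^result: issue$' "$uat" || true)
  skipped=$(grep -c '^result: skipped$' "$uat" || true)
  pending=$(grep -c '^result: \[pending\]$' "$uat" || true)
  printf 'passed=%s issues=%s skipped=%s pending=%s\n' "$passed" "$issues" "$skipped" "$pending"
}
```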

If more tests remain → Update Current Test, go to present_test
If no more tests → Go to complete_session

**Resume testing from UAT file:**

Read the full UAT file.

Find first test with result: [pending].

Announce:

Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}

Continuing from Test {N}...

Update Current Test section with the pending test. Proceed to present_test.

**Complete testing and commit:**

Update frontmatter:

  • status: complete
  • updated: [now]

Clear Current Test section:

## Current Test

[testing complete]

Commit the UAT file:

git add ".planning/phases/XX-name/{phase}-UAT.md"
git commit -m "test({phase}): complete UAT - {passed} passed, {issues} issues"

Present summary:

## UAT Complete: Phase {phase}

| Result | Count |
|--------|-------|
| Passed | {N}   |
| Issues | {N}   |
| Skipped| {N}   |

[If issues > 0:]
### Issues Found

[List from Issues section]

If issues > 0: Proceed to diagnose_issues

If issues == 0:

All tests passed. Ready to continue.

- `/gsd:plan-phase {next}` — Plan next phase
- `/gsd:execute-phase {next}` — Execute next phase

**Diagnose root causes before planning fixes:**
---

{N} issues found. Diagnosing root causes...

Spawning parallel debug agents to investigate each issue.
  • Load diagnose-issues workflow
  • Follow @/home/payload/payload-cms/.claude/get-shit-done/workflows/diagnose-issues.md
  • Spawn parallel debug agents for each issue
  • Collect root causes
  • Update UAT.md with root causes
  • Proceed to plan_gap_closure

Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.

**Auto-plan fixes from diagnosed gaps:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning planner for gap closure...

Spawn gsd-planner in --gaps mode:

Task(
  prompt="""
<planning_context>

**Phase:** {phase_number}
**Mode:** gap_closure

**UAT with diagnoses:**
@.planning/phases/{phase_dir}/{phase}-UAT.md

**Project State:**
@.planning/STATE.md

**Roadmap:**
@.planning/ROADMAP.md

</planning_context>

<downstream_consumer>
Output consumed by /gsd:execute-phase
Plans must be executable prompts.
</downstream_consumer>
""",
  subagent_type="gsd-planner",
  description="Plan gap fixes for Phase {phase}"
)

On return:

  • PLANNING COMPLETE: Proceed to verify_gap_plans
  • PLANNING INCONCLUSIVE: Report and offer manual intervention

**Verify fix plans with checker:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► VERIFYING FIX PLANS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning plan checker...

Initialize: iteration_count = 1

Spawn gsd-plan-checker:

Task(
  prompt="""
<verification_context>

**Phase:** {phase_number}
**Phase Goal:** Close diagnosed gaps from UAT

**Plans to verify:**
@.planning/phases/{phase_dir}/*-PLAN.md

</verification_context>

<expected_output>
Return one of:
- ## VERIFICATION PASSED — all checks pass
- ## ISSUES FOUND — structured issue list
</expected_output>
""",
  subagent_type="gsd-plan-checker",
  description="Verify Phase {phase} fix plans"
)

On return:

  • VERIFICATION PASSED: Proceed to present_ready
  • ISSUES FOUND: Proceed to revision_loop

**Iterate planner ↔ checker until plans pass (max 3):**

If iteration_count < 3:

Display: Sending back to planner for revision... (iteration {N}/3)

Spawn gsd-planner with revision context:

Task(
  prompt="""
<revision_context>

**Phase:** {phase_number}
**Mode:** revision

**Existing plans:**
@.planning/phases/{phase_dir}/*-PLAN.md

**Checker issues:**
{structured_issues_from_checker}

</revision_context>

<instructions>
Read existing PLAN.md files. Make targeted updates to address checker issues.
Do NOT replan from scratch unless issues are fundamental.
</instructions>
""",
  subagent_type="gsd-planner",
  description="Revise Phase {phase} plans"
)

After planner returns → spawn checker again (verify_gap_plans logic)
Increment iteration_count

If iteration_count >= 3:

Display: Max iterations reached. {N} issues remain.

Offer options:

  1. Force proceed (execute despite issues)
  2. Provide guidance (user gives direction, retry)
  3. Abandon (exit, user runs /gsd:plan-phase manually)

Wait for user response.

**Present completion and next steps:**

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Phase {X}: {Name}** — {N} gap(s) diagnosed, {M} fix plan(s) created

| Gap | Root Cause | Fix Plan |
|-----|------------|----------|
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |

Plans verified and ready for execution.

───────────────────────────────────────────────────────────────

## ▶ Next Up

**Execute fixes** — run fix plans

`/clear` then `/gsd:execute-phase {phase} --gaps-only`

───────────────────────────────────────────────────────────────

<update_rules> Batched writes for efficiency:

Keep results in memory. Write to file only when:

  1. Issue found — Preserve the problem immediately
  2. Session complete — Final write before commit
  3. Checkpoint — Every 5 passed tests (safety net)
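Expressed as a tiny predicate (illustrative only; the names are not part of the workflow):

```shell
# Sketch: decide whether to flush in-memory results to the UAT file
should_flush() {
  result="$1"              # pass | issue | complete
  passes_since_flush="$2"  # passes recorded since the last write
  [ "$result" = "issue" ] && return 0            # preserve the problem immediately
  [ "$result" = "complete" ] && return 0         # final write before commit
  [ "$passes_since_flush" -ge 5 ] && return 0    # every-5-passes safety net
  return 1
}
```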

| Section | Rule | When Written |
|---------|------|--------------|
| Frontmatter.status | OVERWRITE | Start, complete |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test | OVERWRITE | On any file write |
| Tests.{N}.result | OVERWRITE | On any file write |
| Summary | OVERWRITE | On any file write |
| Gaps | APPEND | When issue found |

On context reset: File shows last checkpoint. Resume from there. </update_rules>

<severity_inference> Infer severity from user's natural language:

| User says | Infer |
|-----------|-------|
| "crashes", "error", "exception", "fails completely" | blocker |
| "doesn't work", "nothing happens", "wrong behavior" | major |
| "works but...", "slow", "weird", "minor issue" | minor |
| "color", "spacing", "alignment", "looks off" | cosmetic |

Default to major if unclear. User can correct if needed.

Never ask "how severe is this?" - just infer and move on. </severity_inference>

<success_criteria>

  • UAT file created with all tests from SUMMARY.md
  • Tests presented one at a time with expected behavior
  • User responses processed as pass/issue/skip
  • Severity inferred from description (never asked)
  • Batched writes: on issue, every 5 passes, or completion
  • Committed on completion
  • If issues: parallel debug agents diagnose root causes
  • If issues: gsd-planner creates fix plans (gap_closure mode)
  • If issues: gsd-plan-checker verifies fix plans
  • If issues: revision loop until plans pass (max 3 iterations)
  • Ready for /gsd:execute-phase --gaps-only when complete </success_criteria>