# Fall-ID Anonymisierung — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Replace patient last names in fall_ids with KVNR (or 6-char random fallback), retroactively migrate all existing fall_ids, and auto-update fall_id when KVNR is later entered.

**Architecture:** Modify `generate_fall_id()` to use KVNR instead of Nachname. Add a helper `generate_random_suffix()` for cases without KVNR. Write an Alembic migration that rebuilds all existing fall_ids. Extend the `PUT /cases/{case_id}/kvnr` endpoint to update the fall_id when the current suffix is a random sequence (not a KVNR).

**Tech Stack:** Python 3.12, FastAPI, SQLAlchemy 2.0, Alembic, Pydantic v2, pytest

---

### Task 1: Update `generate_fall_id()` and add `generate_random_suffix()`

**Files:**
- Modify: `backend/app/services/import_service.py:22-30`
- Test: `backend/tests/test_import.py`

**Step 1: Write failing tests for new fall_id format**

Add these tests to `backend/tests/test_import.py` in `TestGenerateFallId`:

```python
def test_uses_kvnr_when_available(self):
    """fall_id uses KVNR instead of Nachname when KVNR is present."""
    pc = _make_parsed_case(nachname="Tonn", kvnr="A123456789", fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    assert result == "2026-06-onko-A123456789"
    assert "Tonn" not in result

def test_random_suffix_when_no_kvnr(self):
    """fall_id uses 6-char random suffix when KVNR is missing."""
    pc = _make_parsed_case(nachname="Tonn", kvnr=None, fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    # Format: YYYY-KW-fallgruppe-XXXXXX (6 alphanumeric uppercase+digits)
    parts = result.split("-")
    assert parts[0] == "2026"
    assert parts[1] == "06"
    assert parts[2] == "onko"
    suffix = parts[3]
    assert len(suffix) == 6
    assert suffix.isalnum()
    assert suffix == suffix.upper()  # Only uppercase + digits
    assert "Tonn" not in result

def test_random_suffix_when_empty_kvnr(self):
    """fall_id uses random suffix when KVNR is empty string."""
    pc = _make_parsed_case(nachname="Tonn", kvnr="", fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    parts = result.split("-")
    suffix = parts[3]
    assert len(suffix) == 6
    assert suffix.isalnum()

def test_random_suffixes_are_unique(self):
    """Different calls produce different random suffixes."""
    pc = _make_parsed_case(kvnr=None)
    ids = {generate_fall_id(pc) for _ in range(20)}
    assert len(ids) >= 15  # Allow small collision chance
```

Also add a new test for `generate_random_suffix`:

```python
from app.services.import_service import generate_random_suffix

class TestGenerateRandomSuffix:
    def test_length(self):
        assert len(generate_random_suffix()) == 6

    def test_charset(self):
        """Only uppercase letters and digits."""
        import re
        for _ in range(50):
            s = generate_random_suffix()
            assert re.match(r'^[A-Z0-9]{6}$', s)
```

**Step 2: Run tests to verify they fail**

Run: `cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py -v -k "kvnr or random_suffix"`
Expected: FAIL — tests reference new behavior/function not yet implemented.

**Step 3: Implement `generate_random_suffix()` and update `generate_fall_id()`**

In `backend/app/services/import_service.py`, replace the existing `generate_fall_id` function and add `generate_random_suffix`:

```python
import random
import string

def generate_random_suffix(length: int = 6) -> str:
    """Generate a random alphanumeric suffix (uppercase + digits)."""
    charset = string.ascii_uppercase + string.digits
    return "".join(random.choices(charset, k=length))


def generate_fall_id(parsed: ParsedCase) -> str:
    """Generate unique fall_id: YYYY-KW02d-fallgruppe-KVNR.

    Uses KVNR as identifier. Falls back to 6-char random suffix if
    KVNR is missing or empty.

    Examples:
        - 2026-06-onko-A123456789
        - 2026-12-kardio-X7K9M2  (random fallback)
    """
    suffix = parsed.kvnr if parsed.kvnr else generate_random_suffix()
    return f"{parsed.jahr}-{parsed.kw:02d}-{parsed.fallgruppe}-{suffix}"
```

**Step 4: Update existing tests that expect Nachname in fall_id**

Update `test_format` and other tests in `TestGenerateFallId` that assert Nachname-based fall_ids. Since the default `_make_parsed_case` has `kvnr="D410126355"`, the fall_id will now use that. Specifically update:

- `test_format`: expect `"2026-06-onko-D410126355"` (not `"2026-06-onko-Tonn"`)
- `test_different_cases_produce_different_ids`: change to use different KVNRs
- `test_same_patient_same_week_same_fallgruppe`: same KVNR → same fall_id
- `test_umlauts_preserved`: pass kvnr=None, check random suffix format instead
- `test_hyphenated_name`: pass kvnr=None, check random suffix format instead
- `test_all_fallgruppen`: remains valid (checks `-{fg}-` pattern)
- `test_year_boundary`: remains valid (checks year prefix)

In `TestImportRowSchema.test_full_row` update `fall_id="2026-06-kardio-D410126355"`.

In `TestPreviewImportMocked` tests that check `fall_id is not None`: still valid.

**Step 5: Run all tests to verify they pass**

Run: `cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py -v`
Expected: ALL PASS

**Step 6: Update module docstring**

Change the docstring at top of `import_service.py` from:
```
- fall_id generation: YYYY-KW02d-fallgruppe-Nachname
```
to:
```
- fall_id generation: YYYY-KW02d-fallgruppe-KVNR (or 6-char random suffix)
```

**Step 7: Commit**

```bash
git add backend/app/services/import_service.py backend/tests/test_import.py
git commit -m "feat: use KVNR instead of Nachname in fall_id generation"
```

---

### Task 2: Update `check_duplicate()` for new fall_id format

**Files:**
- Modify: `backend/app/services/import_service.py:33-59`
- Test: `backend/tests/test_import.py`

**Step 1: Analyze impact**

`check_duplicate()` currently checks:
1. Exact fall_id match
2. Personal data match (nachname + fallgruppe + datum + optional vorname/geburtsdatum)

With the new fall_id format:
- Criterion 1 still works if same KVNR → same fall_id (deterministic).
- For random-suffix cases, fall_id match won't catch duplicates → criterion 2 is essential.
- Criterion 2 (personal data match) stays as-is — it doesn't depend on fall_id format.

**No code change needed** for `check_duplicate()` — it already has both detection paths. The personal data match (criterion 2) handles the random-suffix case correctly.

**Step 2: Write a verification test**

Add to `TestCheckDuplicateMocked`:

```python
def test_duplicate_detected_by_personal_data_when_kvnr_missing(self):
    """Duplicate detected by personal data even when fall_id uses random suffix."""
    db = MagicMock()
    query = MagicMock()
    db.query.return_value = query
    query.filter.return_value = query
    # First .first() (fall_id check) → no match, second .first() (personal data) → match
    query.first.side_effect = [None, MagicMock()]
    pc = _make_parsed_case(kvnr=None)
    assert check_duplicate(db, pc) is True
```

**Step 3: Run test**

Run: `cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py::TestCheckDuplicateMocked -v`
Expected: ALL PASS

**Step 4: Commit**

```bash
git add backend/tests/test_import.py
git commit -m "test: verify duplicate detection works with new fall_id format"
```

---

### Task 3: Retroactive migration of existing fall_ids

**Files:**
- Create: `backend/alembic/versions/006_anonymize_fall_ids.py`

**Step 1: Write the Alembic migration**

This migration:
1. Reads all cases with their fall_id and kvnr
2. For each case, regenerates the fall_id as `YYYY-KW-fallgruppe-KVNR` (or random suffix)
3. Updates in bulk

```python
"""Anonymize fall_ids: replace Nachname with KVNR or random suffix.

Revision ID: 006_anonymize_fall_ids
Revises: 005_add_disclosure_requests
"""

import random
import string

from alembic import op
import sqlalchemy as sa

revision = "006_anonymize_fall_ids"
down_revision = "005_add_disclosure_requests"
branch_labels = None
depends_on = None


def _random_suffix(length=6):
    charset = string.ascii_uppercase + string.digits
    return "".join(random.choices(charset, k=length))


def upgrade():
    conn = op.get_bind()
    cases = conn.execute(
        sa.text("SELECT id, fall_id, kvnr, jahr, kw, fallgruppe FROM cases")
    ).fetchall()

    for case in cases:
        case_id, old_fall_id, kvnr, jahr, kw, fallgruppe = case
        suffix = kvnr if kvnr else _random_suffix()
        new_fall_id = f"{jahr}-{kw:02d}-{fallgruppe}-{suffix}"
        conn.execute(
            sa.text("UPDATE cases SET fall_id = :new_id WHERE id = :case_id"),
            {"new_id": new_fall_id, "case_id": case_id},
        )


def downgrade():
    # Cannot restore original Nachname-based fall_ids — data is lost
    pass
```

**Step 2: Run migration locally (dry-run check)**

Run: `cd /home/frontend/dak_c2s/backend && python -c "from alembic.versions import *; print('syntax ok')"`

The actual migration will be run on production during deploy (Task 5).

**Step 3: Commit**

```bash
git add backend/alembic/versions/006_anonymize_fall_ids.py
git commit -m "feat: add migration to anonymize existing fall_ids"
```

---

### Task 4: Auto-update fall_id when KVNR is entered

**Files:**
- Modify: `backend/app/api/cases.py:417-452` (set_case_kvnr endpoint)
- Test: manual verification after deploy

**Step 1: Add helper to detect random-suffix fall_ids**

In `backend/app/services/import_service.py`, add:

```python
import re

# KVNR format: letter followed by 9 digits (e.g. A123456789)
KVNR_PATTERN = re.compile(r'^[A-Z]\d{9}$')

def has_random_suffix(fall_id: str) -> bool:
    """Check if a fall_id ends with a random suffix (not a KVNR).

    Random suffix: exactly 6 alphanumeric chars (uppercase + digits).
    KVNR: letter + 9 digits (10 chars total).
    Nachname: any other string (legacy format, also treated as non-KVNR).
    """
    if not fall_id:
        return False
    parts = fall_id.rsplit("-", 1)
    if len(parts) < 2:
        return False
    suffix = parts[1]
    # If it matches KVNR pattern, it's NOT a random suffix
    if KVNR_PATTERN.match(suffix):
        return False
    # Otherwise it's either a random suffix or a legacy Nachname — either way, should be updated
    return True
```

**Step 2: Write tests for `has_random_suffix()`**

Add to `backend/tests/test_import.py`:

```python
from app.services.import_service import has_random_suffix

class TestHasRandomSuffix:
    def test_kvnr_suffix(self):
        """Fall_id with KVNR is NOT random."""
        assert has_random_suffix("2026-06-onko-A123456789") is False

    def test_random_suffix(self):
        """Fall_id with 6-char random suffix IS random."""
        assert has_random_suffix("2026-06-onko-X7K9M2") is True

    def test_legacy_nachname_suffix(self):
        """Fall_id with legacy Nachname suffix IS treated as non-KVNR."""
        assert has_random_suffix("2020-32-onko-Bartl-Zimmermann") is True
        assert has_random_suffix("2026-06-kardio-Tonn") is True

    def test_empty_fall_id(self):
        assert has_random_suffix("") is False
        assert has_random_suffix(None) is False
```

**Step 3: Run tests**

Run: `cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py::TestHasRandomSuffix -v`
Expected: ALL PASS

**Step 4: Modify `set_case_kvnr` endpoint**

In `backend/app/api/cases.py`, update the endpoint to also update the fall_id:

```python
from app.services.import_service import has_random_suffix

@router.put("/{case_id}/kvnr", response_model=CaseResponse)
def set_case_kvnr(
    case_id: int,
    payload: dict,
    request: Request,
    db: Session = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Update the KVNR for a case. Accessible to both admin and dak_mitarbeiter.

    If the current fall_id has a random suffix (no KVNR), it is automatically
    updated to use the new KVNR.
    """
    case = db.query(Case).filter(Case.id == case_id).first()
    if not case:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Case not found",
        )

    old_kvnr = case.kvnr
    old_fall_id = case.fall_id
    new_kvnr = payload.get("kvnr")
    case.kvnr = new_kvnr

    # Auto-update fall_id if it currently uses a random suffix or legacy Nachname
    if new_kvnr and case.fall_id and has_random_suffix(case.fall_id):
        prefix = case.fall_id.rsplit("-", 1)[0]
        case.fall_id = f"{prefix}-{new_kvnr}"

    case.updated_by = user.id
    db.commit()
    db.refresh(case)

    log_action(
        db,
        user_id=user.id,
        action="kvnr_updated",
        entity_type="case",
        entity_id=case.id,
        old_values={"kvnr": old_kvnr, "fall_id": old_fall_id},
        new_values={"kvnr": new_kvnr, "fall_id": case.fall_id},
        ip_address=request.client.host if request.client else None,
        user_agent=request.headers.get("user-agent"),
    )

    return case
```

Note: The `rsplit("-", 1)` approach works for the standard format `YYYY-KW-fallgruppe-suffix`. For legacy Nachnames with hyphens like `2020-32-onko-Bartl-Zimmermann`, rsplit with maxsplit=1 gives `("2020-32-onko-Bartl", "Zimmermann")` which is wrong. Better approach: reconstruct from case fields.

**Corrected implementation** — use case fields instead of string parsing:

```python
    if new_kvnr and case.fall_id and has_random_suffix(case.fall_id):
        case.fall_id = f"{case.jahr}-{case.kw:02d}-{case.fallgruppe}-{new_kvnr}"
```

This is more robust because it uses the actual database fields.

**Step 5: Commit**

```bash
git add backend/app/services/import_service.py backend/app/api/cases.py backend/tests/test_import.py
git commit -m "feat: auto-update fall_id when KVNR is entered on a case"
```

---

### Task 5: Build, deploy, and run migration

**Files:**
- Frontend: `frontend/` (build)
- Production DB: run Alembic migration

**Step 1: Build frontend**

Run: `cd /home/frontend/dak_c2s/frontend && pnpm build`
Expected: Build succeeds (no frontend code changes needed for this feature)

**Step 2: Run all backend tests**

Run: `cd /home/frontend/dak_c2s/backend && python -m pytest tests/ -v`
Expected: ALL PASS

**Step 3: Commit all, push develop, merge to main**

```bash
cd /home/frontend/dak_c2s
git push origin develop
git checkout main && git pull origin main && git merge develop && git push origin main
git checkout develop
```

**Step 4: Deploy to Hetzner 1**

```bash
# Pull latest code
ssh hetzner1 "cd /opt/dak-portal && git pull origin main"

# Run migration (updates all ~2900 fall_ids)
ssh hetzner1 "cd /opt/dak-portal/backend && source /opt/dak-portal/venv/bin/activate && alembic upgrade head"

# Build and deploy frontend
ssh hetzner1 "cd /opt/dak-portal/frontend && pnpm install && pnpm build && cp -r dist/* /var/www/vhosts/complexcaresolutions.de/dak.complexcaresolutions.de/dist/"

# Restart backend
ssh hetzner1 "systemctl restart dak-backend"
```

**Step 5: Verify on production**

1. Check fall_ids no longer contain patient names: `ssh hetzner1 "mysql -u dak_c2s_admin -p'R@TIa&s7ygxm4x0b' dak_c2s -e \"SELECT fall_id FROM cases LIMIT 20\""`
2. Import a new CSV → verify new fall_ids use KVNR format
3. Enter KVNR on a case with random suffix → verify fall_id updates automatically