mirror of https://github.com/complexcaresolutions/dak.c2s.git synced 2026-03-17 20:43:41 +00:00

CCS Admin d33fc7d242 docs: add implementation plan for fall-id anonymization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-26 17:01:09 +00:00

15 KiB

Raw Blame History

Fall-ID Anonymisierung — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Replace patient last names in fall_ids with KVNR (or 6-char random fallback), retroactively migrate all existing fall_ids, and auto-update fall_id when KVNR is later entered.

Architecture: Modify generate_fall_id() to use KVNR instead of Nachname. Add a helper generate_random_suffix() for cases without KVNR. Write an Alembic migration that rebuilds all existing fall_ids. Extend the PUT /cases/{case_id}/kvnr endpoint to update the fall_id when the current suffix is a random sequence (not a KVNR).

Tech Stack: Python 3.12, FastAPI, SQLAlchemy 2.0, Alembic, Pydantic v2, pytest

Task 1: Update `generate_fall_id()` and add `generate_random_suffix()`

Files:

Modify: backend/app/services/import_service.py:22-30
Test: backend/tests/test_import.py

Step 1: Write failing tests for new fall_id format

Add these tests to backend/tests/test_import.py in TestGenerateFallId:

def test_uses_kvnr_when_available(self):
    """fall_id uses KVNR instead of Nachname when KVNR is present."""
    pc = _make_parsed_case(nachname="Tonn", kvnr="A123456789", fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    assert result == "2026-06-onko-A123456789"
    assert "Tonn" not in result

def test_random_suffix_when_no_kvnr(self):
    """fall_id uses 6-char random suffix when KVNR is missing."""
    pc = _make_parsed_case(nachname="Tonn", kvnr=None, fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    # Format: YYYY-KW-fallgruppe-XXXXXX (6 alphanumeric uppercase+digits)
    parts = result.split("-")
    assert parts[0] == "2026"
    assert parts[1] == "06"
    assert parts[2] == "onko"
    suffix = parts[3]
    assert len(suffix) == 6
    assert suffix.isalnum()
    assert suffix == suffix.upper()  # Only uppercase + digits
    assert "Tonn" not in result

def test_random_suffix_when_empty_kvnr(self):
    """fall_id uses random suffix when KVNR is empty string."""
    pc = _make_parsed_case(nachname="Tonn", kvnr="", fallgruppe="onko", jahr=2026, kw=6)
    result = generate_fall_id(pc)
    parts = result.split("-")
    suffix = parts[3]
    assert len(suffix) == 6
    assert suffix.isalnum()

def test_random_suffixes_are_unique(self):
    """Different calls produce different random suffixes."""
    pc = _make_parsed_case(kvnr=None)
    ids = {generate_fall_id(pc) for _ in range(20)}
    assert len(ids) >= 15  # Allow small collision chance

Also add a new test for generate_random_suffix:

from app.services.import_service import generate_random_suffix

class TestGenerateRandomSuffix:
    def test_length(self):
        assert len(generate_random_suffix()) == 6

    def test_charset(self):
        """Only uppercase letters and digits."""
        import re
        for _ in range(50):
            s = generate_random_suffix()
            assert re.match(r'^[A-Z0-9]{6}$', s)

Step 2: Run tests to verify they fail

Run: cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py -v -k "kvnr or random_suffix" Expected: FAIL — tests reference new behavior/function not yet implemented.

Step 3: Implement generate_random_suffix() and update generate_fall_id()

In backend/app/services/import_service.py, replace the existing generate_fall_id function and add generate_random_suffix:

import random
import string

def generate_random_suffix(length: int = 6) -> str:
    """Generate a random alphanumeric suffix (uppercase + digits)."""
    charset = string.ascii_uppercase + string.digits
    return "".join(random.choices(charset, k=length))


def generate_fall_id(parsed: ParsedCase) -> str:
    """Generate unique fall_id: YYYY-KW02d-fallgruppe-KVNR.

    Uses KVNR as identifier. Falls back to 6-char random suffix if
    KVNR is missing or empty.

    Examples:
        - 2026-06-onko-A123456789
        - 2026-12-kardio-X7K9M2  (random fallback)
    """
    suffix = parsed.kvnr if parsed.kvnr else generate_random_suffix()
    return f"{parsed.jahr}-{parsed.kw:02d}-{parsed.fallgruppe}-{suffix}"

Step 4: Update existing tests that expect Nachname in fall_id

Update test_format and other tests in TestGenerateFallId that assert Nachname-based fall_ids. Since the default _make_parsed_case has kvnr="D410126355", the fall_id will now use that. Specifically update:

test_format: expect "2026-06-onko-D410126355" (not "2026-06-onko-Tonn")
test_different_cases_produce_different_ids: change to use different KVNRs
test_same_patient_same_week_same_fallgruppe: same KVNR → same fall_id
test_umlauts_preserved: pass kvnr=None, check random suffix format instead
test_hyphenated_name: pass kvnr=None, check random suffix format instead
test_all_fallgruppen: remains valid (checks -{fg}- pattern)
test_year_boundary: remains valid (checks year prefix)

In TestImportRowSchema.test_full_row update fall_id="2026-06-kardio-D410126355".

In TestPreviewImportMocked tests that check fall_id is not None: still valid.

Step 5: Run all tests to verify they pass

Run: cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py -v Expected: ALL PASS

Step 6: Update module docstring

Change the docstring at top of import_service.py from:

- fall_id generation: YYYY-KW02d-fallgruppe-Nachname

to:

- fall_id generation: YYYY-KW02d-fallgruppe-KVNR (or 6-char random suffix)

Step 7: Commit

git add backend/app/services/import_service.py backend/tests/test_import.py
git commit -m "feat: use KVNR instead of Nachname in fall_id generation"

Task 2: Update `check_duplicate()` for new fall_id format

Files:

Modify: backend/app/services/import_service.py:33-59
Test: backend/tests/test_import.py

Step 1: Analyze impact

check_duplicate() currently checks:

Exact fall_id match
Personal data match (nachname + fallgruppe + datum + optional vorname/geburtsdatum)

With the new fall_id format:

Criterion 1 still works if same KVNR → same fall_id (deterministic).
For random-suffix cases, fall_id match won't catch duplicates → criterion 2 is essential.
Criterion 2 (personal data match) stays as-is — it doesn't depend on fall_id format.

No code change needed for check_duplicate() — it already has both detection paths. The personal data match (criterion 2) handles the random-suffix case correctly.

Step 2: Write a verification test

Add to TestCheckDuplicateMocked:

def test_duplicate_detected_by_personal_data_when_kvnr_missing(self):
    """Duplicate detected by personal data even when fall_id uses random suffix."""
    db = MagicMock()
    query = MagicMock()
    db.query.return_value = query
    query.filter.return_value = query
    # First .first() (fall_id check) → no match, second .first() (personal data) → match
    query.first.side_effect = [None, MagicMock()]
    pc = _make_parsed_case(kvnr=None)
    assert check_duplicate(db, pc) is True

Step 3: Run test

Run: cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py::TestCheckDuplicateMocked -v Expected: ALL PASS

Step 4: Commit

git add backend/tests/test_import.py
git commit -m "test: verify duplicate detection works with new fall_id format"

Task 3: Retroactive migration of existing fall_ids

Files:

Create: backend/alembic/versions/006_anonymize_fall_ids.py

Step 1: Write the Alembic migration

This migration:

Reads all cases with their fall_id and kvnr
For each case, regenerates the fall_id as YYYY-KW-fallgruppe-KVNR (or random suffix)
Updates in bulk

"""Anonymize fall_ids: replace Nachname with KVNR or random suffix.

Revision ID: 006_anonymize_fall_ids
Revises: 005_add_disclosure_requests
"""

import random
import string

from alembic import op
import sqlalchemy as sa

revision = "006_anonymize_fall_ids"
down_revision = "005_add_disclosure_requests"
branch_labels = None
depends_on = None


def _random_suffix(length=6):
    charset = string.ascii_uppercase + string.digits
    return "".join(random.choices(charset, k=length))


def upgrade():
    conn = op.get_bind()
    cases = conn.execute(
        sa.text("SELECT id, fall_id, kvnr, jahr, kw, fallgruppe FROM cases")
    ).fetchall()

    for case in cases:
        case_id, old_fall_id, kvnr, jahr, kw, fallgruppe = case
        suffix = kvnr if kvnr else _random_suffix()
        new_fall_id = f"{jahr}-{kw:02d}-{fallgruppe}-{suffix}"
        conn.execute(
            sa.text("UPDATE cases SET fall_id = :new_id WHERE id = :case_id"),
            {"new_id": new_fall_id, "case_id": case_id},
        )


def downgrade():
    # Cannot restore original Nachname-based fall_ids — data is lost
    pass

Step 2: Run migration locally (dry-run check)

Run: cd /home/frontend/dak_c2s/backend && python -c "from alembic.versions import *; print('syntax ok')"

The actual migration will be run on production during deploy (Task 5).

Step 3: Commit

git add backend/alembic/versions/006_anonymize_fall_ids.py
git commit -m "feat: add migration to anonymize existing fall_ids"

Task 4: Auto-update fall_id when KVNR is entered

Files:

Modify: backend/app/api/cases.py:417-452 (set_case_kvnr endpoint)
Test: manual verification after deploy

Step 1: Add helper to detect random-suffix fall_ids

In backend/app/services/import_service.py, add:

import re

# KVNR format: letter followed by 9 digits (e.g. A123456789)
KVNR_PATTERN = re.compile(r'^[A-Z]\d{9}$')

def has_random_suffix(fall_id: str) -> bool:
    """Check if a fall_id ends with a random suffix (not a KVNR).

    Random suffix: exactly 6 alphanumeric chars (uppercase + digits).
    KVNR: letter + 9 digits (10 chars total).
    Nachname: any other string (legacy format, also treated as non-KVNR).
    """
    if not fall_id:
        return False
    parts = fall_id.rsplit("-", 1)
    if len(parts) < 2:
        return False
    suffix = parts[1]
    # If it matches KVNR pattern, it's NOT a random suffix
    if KVNR_PATTERN.match(suffix):
        return False
    # Otherwise it's either a random suffix or a legacy Nachname — either way, should be updated
    return True

Step 2: Write tests for has_random_suffix()

Add to backend/tests/test_import.py:

from app.services.import_service import has_random_suffix

class TestHasRandomSuffix:
    def test_kvnr_suffix(self):
        """Fall_id with KVNR is NOT random."""
        assert has_random_suffix("2026-06-onko-A123456789") is False

    def test_random_suffix(self):
        """Fall_id with 6-char random suffix IS random."""
        assert has_random_suffix("2026-06-onko-X7K9M2") is True

    def test_legacy_nachname_suffix(self):
        """Fall_id with legacy Nachname suffix IS treated as non-KVNR."""
        assert has_random_suffix("2020-32-onko-Bartl-Zimmermann") is True
        assert has_random_suffix("2026-06-kardio-Tonn") is True

    def test_empty_fall_id(self):
        assert has_random_suffix("") is False
        assert has_random_suffix(None) is False

Step 3: Run tests

Run: cd /home/frontend/dak_c2s/backend && python -m pytest tests/test_import.py::TestHasRandomSuffix -v Expected: ALL PASS

Step 4: Modify set_case_kvnr endpoint

In backend/app/api/cases.py, update the endpoint to also update the fall_id:

from app.services.import_service import has_random_suffix

@router.put("/{case_id}/kvnr", response_model=CaseResponse)
def set_case_kvnr(
    case_id: int,
    payload: dict,
    request: Request,
    db: Session = Depends(get_db),
    user: User = Depends(get_current_user),
):
    """Update the KVNR for a case. Accessible to both admin and dak_mitarbeiter.

    If the current fall_id has a random suffix (no KVNR), it is automatically
    updated to use the new KVNR.
    """
    case = db.query(Case).filter(Case.id == case_id).first()
    if not case:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Case not found",
        )

    old_kvnr = case.kvnr
    old_fall_id = case.fall_id
    new_kvnr = payload.get("kvnr")
    case.kvnr = new_kvnr

    # Auto-update fall_id if it currently uses a random suffix or legacy Nachname
    if new_kvnr and case.fall_id and has_random_suffix(case.fall_id):
        prefix = case.fall_id.rsplit("-", 1)[0]
        case.fall_id = f"{prefix}-{new_kvnr}"

    case.updated_by = user.id
    db.commit()
    db.refresh(case)

    log_action(
        db,
        user_id=user.id,
        action="kvnr_updated",
        entity_type="case",
        entity_id=case.id,
        old_values={"kvnr": old_kvnr, "fall_id": old_fall_id},
        new_values={"kvnr": new_kvnr, "fall_id": case.fall_id},
        ip_address=request.client.host if request.client else None,
        user_agent=request.headers.get("user-agent"),
    )

    return case

Note: The rsplit("-", 1) approach works for the standard format YYYY-KW-fallgruppe-suffix. For legacy Nachnames with hyphens like 2020-32-onko-Bartl-Zimmermann, rsplit with maxsplit=1 gives ("2020-32-onko-Bartl", "Zimmermann") which is wrong. Better approach: reconstruct from case fields.

Corrected implementation — use case fields instead of string parsing:

    if new_kvnr and case.fall_id and has_random_suffix(case.fall_id):
        case.fall_id = f"{case.jahr}-{case.kw:02d}-{case.fallgruppe}-{new_kvnr}"

This is more robust because it uses the actual database fields.

Step 5: Commit

git add backend/app/services/import_service.py backend/app/api/cases.py backend/tests/test_import.py
git commit -m "feat: auto-update fall_id when KVNR is entered on a case"

Task 5: Build, deploy, and run migration

Files:

Frontend: frontend/ (build)
Production DB: run Alembic migration

Step 1: Build frontend

Run: cd /home/frontend/dak_c2s/frontend && pnpm build Expected: Build succeeds (no frontend code changes needed for this feature)

Step 2: Run all backend tests

Run: cd /home/frontend/dak_c2s/backend && python -m pytest tests/ -v Expected: ALL PASS

Step 3: Commit all, push develop, merge to main

cd /home/frontend/dak_c2s
git push origin develop
git checkout main && git pull origin main && git merge develop && git push origin main
git checkout develop

Step 4: Deploy to Hetzner 1

# Pull latest code
ssh hetzner1 "cd /opt/dak-portal && git pull origin main"

# Run migration (updates all ~2900 fall_ids)
ssh hetzner1 "cd /opt/dak-portal/backend && source /opt/dak-portal/venv/bin/activate && alembic upgrade head"

# Build and deploy frontend
ssh hetzner1 "cd /opt/dak-portal/frontend && pnpm install && pnpm build && cp -r dist/* /var/www/vhosts/complexcaresolutions.de/dak.complexcaresolutions.de/dist/"

# Restart backend
ssh hetzner1 "systemctl restart dak-backend"

Step 5: Verify on production

Check fall_ids no longer contain patient names: ssh hetzner1 "mysql -u dak_c2s_admin -p'R@TIa&s7ygxm4x0b' dak_c2s -e \"SELECT fall_id FROM cases LIMIT 20\""
Import a new CSV → verify new fall_ids use KVNR format
Enter KVNR on a case with random suffix → verify fall_id updates automatically

15 KiB Raw Blame History

Fall-ID Anonymisierung — Implementation Plan

Task 1: Update generate_fall_id() and add generate_random_suffix()

Task 2: Update check_duplicate() for new fall_id format

Task 3: Retroactive migration of existing fall_ids

Task 4: Auto-update fall_id when KVNR is entered

Task 5: Build, deploy, and run migration

15 KiB

Raw Blame History

Task 1: Update `generate_fall_id()` and add `generate_random_suffix()`

Task 2: Update `check_duplicate()` for new fall_id format