3 The Diagnostic Engine

Before fitting any survival model, survival-pipe runs six automated checks on your data. This chapter explains each check, the threshold configuration, and how the routing logic works.

3.1 Why Diagnose First?

Note

Standard survival analysis tutorials jump straight to Kaplan-Meier curves and Cox models. In real clinical data, this often leads to biased estimates because the data violate assumptions (a single event type, independent subjects, non-informative censoring) that were never checked.

The diagnostic engine (sa-diagnostics/) ensures that data complexities are detected before model fitting, not discovered during peer review.

3.2 The Six Checks

1. Competing Risks

Script: sa-diagnostics/scripts/01_check_competing_risks.R

Detects whether multiple distinct event types are present. If patients can experience death from cancer or death from cardiovascular disease, the naive Kaplan-Meier approach (1 − KM, with deaths from the other cause treated as censored) overestimates the cumulative incidence of each cause.
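The R script itself is not reproduced in this chapter. As a rough illustration of the idea only, the Python sketch below counts distinct event types in a status column. The function name, the return keys, and the coding convention (0 = censored, positive integers = causes) are assumptions made for this example, not the script's actual interface.

```python
from collections import Counter


def check_competing_risks(event_codes, censor_code=0):
    """Flag competing risks when more than one distinct event type appears.

    Assumption for this sketch: `event_codes` holds one status code per
    subject and `censor_code` marks censored observations.
    """
    counts = Counter(code for code in event_codes if code != censor_code)
    return {
        "detected": len(counts) > 1,
        "n_event_types": len(counts),
        "counts": dict(counts),
    }


# Toy data: 0 = censored, 1 = cancer death, 2 = cardiovascular death
result = check_competing_risks([0, 1, 2, 1, 0, 2, 1])
print(result["detected"], result["n_event_types"])
```

A single non-censored code (for example `[0, 1, 1]`) leaves the flag unset, which corresponds to the "Single event type" finding in a standard analysis.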

2. Recurrent Events

Script: sa-diagnostics/scripts/02_check_recurrent_events.R

Checks whether any subject has more than one event. Recurrent hospitalisations, infections, or relapses require models that handle within-subject correlation: Andersen-Gill (AG), Prentice-Williams-Peterson (PWP), Wei-Lin-Weissfeld (WLW), or frailty models.
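The check reduces to counting events per subject. A minimal Python sketch of that count follows, assuming long-format data with one `(subject_id, status)` pair per observed interval; the function name and return keys are hypothetical and chosen only for illustration.

```python
from collections import Counter


def check_recurrent_events(rows, censor_code=0):
    """Count events per subject and flag any subject with more than one.

    Assumption for this sketch: `rows` is an iterable of
    (subject_id, status) pairs, one per observed interval.
    """
    events = Counter(sid for sid, status in rows if status != censor_code)
    max_events = max(events.values(), default=0)
    return {
        "detected": max_events > 1,
        "max_events_per_subject": max_events,
        "n_subjects_recurrent": sum(1 for n in events.values() if n > 1),
    }


# Toy data: subject p1 is hospitalised twice, p2 is censored, p3 has one event
rows = [("p1", 1), ("p1", 1), ("p2", 0), ("p3", 1)]
report = check_recurrent_events(rows)
print(report)
```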

3. Time-Varying Exposure

Script: sa-diagnostics/scripts/03_check_time_varying.R

Identifies exposures that change status after baseline (e.g., transplant receipt, drug switching). Flags potential immortal time bias when treatment is coded as a fixed baseline variable but was actually assigned during follow-up.
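One way to picture the immortal time flag: if a subject is coded as treated but treatment began after time zero, the interval before treatment is follow-up during which the subject could not, by definition, have had the event as a treated patient. The Python sketch below looks for that pattern. The field names `treated` and `treatment_start` are hypothetical, and the real script may use different criteria.

```python
def check_immortal_time(subjects):
    """Flag treated subjects whose treatment started after the time origin.

    Assumption for this sketch: each subject is a dict with
    'treated' (bool) and 'treatment_start' (time since origin, or None).
    """
    delayed = [
        s for s in subjects
        if s["treated"]
        and s["treatment_start"] is not None
        and s["treatment_start"] > 0
    ]
    return {
        "detected": bool(delayed),
        "n_delayed": len(delayed),
        # Person-time that a fixed baseline coding would misattribute to treatment
        "immortal_person_time": sum(s["treatment_start"] for s in delayed),
    }


subjects = [
    {"treated": True, "treatment_start": 30},   # transplant on day 30
    {"treated": True, "treatment_start": 0},    # treated from baseline
    {"treated": False, "treatment_start": None},
]
flag = check_immortal_time(subjects)
print(flag)
```

When the flag is set, the usual remedy is to split follow-up at the treatment date so the exposure enters the model as time-varying.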

4. Left Truncation

Script: sa-diagnostics/scripts/04_check_left_truncation.R

Detects delayed entry: subjects who enter observation partway through the time scale rather than at time zero. Common in prevalent cohorts and studies using calendar-time enrolment.
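In counting-process terms, delayed entry shows up as an entry time greater than zero. The following Python sketch illustrates such a scan over `(entry_time, exit_time)` pairs; the function name, the `tolerance` parameter, and the extra sanity check on interval ordering are assumptions added for this example.

```python
def check_left_truncation(records, tolerance=0.0):
    """Flag delayed entry on the analysis time scale.

    Assumption for this sketch: `records` is a list of
    (entry_time, exit_time) pairs measured from the time origin.
    """
    delayed = [(entry, exit_) for entry, exit_ in records if entry > tolerance]
    # Intervals that end at or before they start usually indicate a data error
    invalid = [(entry, exit_) for entry, exit_ in records if entry >= exit_]
    return {
        "detected": bool(delayed),
        "n_delayed": len(delayed),
        "prop_delayed": len(delayed) / len(records) if records else 0.0,
        "n_invalid": len(invalid),
    }


# Toy data: the first subject is followed from time zero, the others enter late
truncation = check_left_truncation([(0, 5), (2, 7), (3.5, 4)])
print(truncation)
```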

5. Informative Censoring

Script: sa-diagnostics/scripts/05_check_informative_censoring.R

Tests whether censoring is related to the outcome. When sicker patients are more likely to be lost to follow-up, standard methods produce biased survival estimates.
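Informative censoring cannot be verified directly from the observed data, so screens of this kind are indirect. One common heuristic compares a baseline marker of severity between censored and uncensored subjects. The Python sketch below does so with a permutation test on the difference in means. This is a substituted illustration, not necessarily the test the script runs, and every name in it is hypothetical.

```python
import random
from statistics import mean


def censoring_covariate_test(values, censored, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in a baseline covariate
    between censored and uncensored subjects.

    Assumption for this sketch: `values` and `censored` are parallel lists
    with one entry per subject; `censored` holds booleans.
    """
    cens = [v for v, c in zip(values, censored) if c]
    obs = [v for v, c in zip(values, censored) if not c]
    if not cens or not obs:
        raise ValueError("need both censored and uncensored subjects")
    observed = abs(mean(cens) - mean(obs))

    rng = random.Random(seed)
    pooled = list(values)
    n_cens = len(cens)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of who was censored
        diff = abs(mean(pooled[:n_cens]) - mean(pooled[n_cens:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive
    return (extreme + 1) / (n_perm + 1)


# Toy data: censored subjects have markedly higher severity scores
severity = [9, 8, 9, 10, 8, 9, 2, 3, 1, 2, 3, 2]
is_censored = [True] * 6 + [False] * 6
p_value = censoring_covariate_test(severity, is_censored)
print(p_value < 0.05)
```

A small p-value suggests that dropout is associated with baseline severity, which is a warning sign rather than proof; a large one does not rule out dependence on unmeasured factors.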

6. Clustering

Script: sa-diagnostics/scripts/06_check_clustering.R

Detects non-independence from hierarchical structure: patients nested within hospitals, siblings within families, or repeated measures within individuals.
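A first-pass structural check only needs to know whether a cluster variable exists and whether any cluster contains more than one subject. The Python sketch below illustrates that; the function name and return keys are assumptions, and it does not estimate how strongly outcomes correlate within clusters.

```python
from collections import Counter


def check_clustering(cluster_ids):
    """Flag hierarchical structure when any cluster holds more than one subject.

    Assumption for this sketch: `cluster_ids` has one label per subject
    (e.g. a hospital ID), or is None when no cluster variable exists.
    """
    if cluster_ids is None:
        return {"detected": False, "reason": "no cluster variable"}
    sizes = Counter(cluster_ids)
    shared = sum(1 for n in sizes.values() if n > 1)
    return {
        "detected": shared > 0,
        "n_clusters": len(sizes),
        "max_cluster_size": max(sizes.values(), default=0),
    }


# Toy data: two patients share hospital h1
clusters = check_clustering(["h1", "h1", "h2"])
print(clusters)
```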

3.3 Routing Logic

After all checks run, sa-diagnostics/scripts/07_diagnostic_report.R produces a summary. The diagnostic router (sa-end-to-end/scripts/diagnostic_router.py) reads this report and selects the analysis path:

flowchart TD
    R[Diagnostic Report] --> A{Competing risks?}
    A -- Yes --> CR[sa-competing-risks]
    A -- No --> B{Recurrent events?}
    B -- Yes --> RE[sa-recurrent-multistate]
    B -- No --> C{Time-varying?}
    C -- Yes --> TV[sa-time-varying]
    C -- No --> SK[sa-standard-km]
    CR --> D{Left truncation OR<br/>informative censoring<br/>OR clustering?}
    RE --> D
    TV --> D
    SK --> D
    D -- Yes --> ADJ[sa-advanced-adjustments]
    D -- No --> FIG[sa-publication-figures]
    ADJ --> FIG
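
The flowchart can be read as a priority chain followed by an optional adjustment step. The Python sketch below encodes exactly the branches drawn above. It is an illustration of the logic, not the contents of diagnostic_router.py, and the `DiagnosticFlags` class and `route` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticFlags:
    """One boolean per diagnostic check (hypothetical container)."""
    competing_risks: bool = False
    recurrent_events: bool = False
    time_varying: bool = False
    left_truncation: bool = False
    informative_censoring: bool = False
    clustering: bool = False


def route(flags):
    """Return the ordered list of modules implied by the flowchart."""
    # Primary path: the first matching check wins, in flowchart order
    if flags.competing_risks:
        primary = "sa-competing-risks"
    elif flags.recurrent_events:
        primary = "sa-recurrent-multistate"
    elif flags.time_varying:
        primary = "sa-time-varying"
    else:
        primary = "sa-standard-km"

    path = [primary]
    # Any one of the three secondary checks adds the adjustment step
    if flags.left_truncation or flags.informative_censoring or flags.clustering:
        path.append("sa-advanced-adjustments")
    path.append("sa-publication-figures")
    return path


print(route(DiagnosticFlags(competing_risks=True, clustering=True)))
```

As drawn, the primary branches are mutually exclusive: a dataset with both competing risks and recurrent events follows the competing-risks path, because that question is asked first.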

Tip: Auto-Routing

Run make analyze-auto PROJECT=my-study to let the router both diagnose and execute the selected analysis in one step.

3.4 Example Diagnostic Output

A typical diagnostic report flags which checks triggered:

Diagnostic Summary for: demo-competing
─────────────────────────────────────
  Competing risks:       YES (2 event types detected)
  Recurrent events:      NO
  Time-varying exposure: NO
  Left truncation:       NO
  Informative censoring: NO
  Clustering:            NO

Recommended path: sa-competing-risks
Adjustments needed: none
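
If the summary needs to be consumed programmatically, its YES/NO lines are regular enough to parse. The Python sketch below assumes the plain-text layout shown above; the on-disk format the router actually reads is not specified in this chapter, and `parse_report` is a hypothetical helper.

```python
import re

REPORT = """\
Diagnostic Summary for: demo-competing
─────────────────────────────────────
  Competing risks:       YES (2 event types detected)
  Recurrent events:      NO
  Time-varying exposure: NO
  Left truncation:       NO
  Informative censoring: NO
  Clustering:            NO

Recommended path: sa-competing-risks
Adjustments needed: none
"""

# Matches "<check name>: YES|NO" with an optional parenthesised detail
LINE = re.compile(
    r"^\s*(?P<check>[^:]+):\s+(?P<flag>YES|NO)\b(?:\s+\((?P<detail>.*)\))?\s*$"
)


def parse_report(text):
    """Map each check name to True (YES) or False (NO)."""
    flags = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            flags[match.group("check").strip()] = match.group("flag") == "YES"
    return flags


flags = parse_report(REPORT)
print([name for name, hit in flags.items() if hit])
```

Lines that do not carry an uppercase YES or NO, such as the header and the two recommendation lines, are ignored by the pattern.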

3.5 Demo: Scenario 1 (Standard Survival)

The diagnostic engine correctly identified no complexities in the standard 2-arm RCT data (N=500, 40% censored), routing to sa-standard-km:

| Check | Detected | Finding |
|---|---|---|
| Competing risks | No | Single event type |
| Recurrent events | No | Max 1 event per subject |
| Time-varying | No | No counting-process format |
| Left truncation | No | No delayed entry |
| Informative censoring | No | No significant differences (40% censored) |
| Clustering | No | No cluster variable |

Primary route: sa-standard-km (KM + Cox PH)