flowchart TD
R[Diagnostic Report] --> A{Competing risks?}
A -- Yes --> CR[sa-competing-risks]
A -- No --> B{Recurrent events?}
B -- Yes --> RE[sa-recurrent-multistate]
B -- No --> C{Time-varying?}
C -- Yes --> TV[sa-time-varying]
C -- No --> SK[sa-standard-km]
CR --> D{Left truncation OR<br/>informative censoring<br/>OR clustering?}
RE --> D
TV --> D
SK --> D
D -- Yes --> ADJ[sa-advanced-adjustments]
D -- No --> FIG[sa-publication-figures]
ADJ --> FIG
3 The Diagnostic Engine
Before fitting any survival model, survival-pipe runs six automated checks on your data. This chapter explains each check, the threshold configuration, and how the routing logic works.
3.1 Why Diagnose First?
Standard survival analysis tutorials jump straight to Kaplan-Meier curves and Cox models. In real clinical data, this often leads to biased estimates because the data violates assumptions that were never checked.
The diagnostic engine (sa-diagnostics/) ensures that data complexities are detected before model fitting, not discovered during peer review.
3.2 The Six Checks
1. Competing Risks
Script: sa-diagnostics/scripts/01_check_competing_risks.R
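Detects whether multiple distinct event types are present. If patients can experience death from cancer or death from cardiovascular disease, the naive approach (1 minus KM, with the other cause treated as censoring) overestimates the cumulative incidence of each cause, because it assumes patients who died of the other cause could still go on to experience the event of interest.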
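The overestimate can be seen on a toy dataset. The sketch below is illustrative only: the data are made up, the function names are hypothetical, and the comparison uses the Aalen-Johansen cumulative incidence estimator, which is a standard choice but not necessarily what sa-competing-risks uses internally.

```python
# Toy data (invented for illustration): follow-up time and event code.
# 0 = censored, 1 = cause 1 (e.g. cancer death), 2 = cause 2 (e.g. CVD death).
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 2, 0, 1]


def naive_km_incidence(times, causes, cause):
    """1 - KM for one cause, treating all other event types as censoring."""
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        surv *= 1 - d / at_risk
    return 1 - surv


def aalen_johansen_cif(times, causes, cause):
    """Cumulative incidence for one cause, accounting for competing events."""
    overall_surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        cif += overall_surv * d_cause / at_risk
        overall_surv *= 1 - d_any / at_risk
    return cif


for k in (1, 2):
    print(k, naive_km_incidence(times, causes, k), aalen_johansen_cif(times, causes, k))
```

On this dataset the two naive estimates sum to more than 1, which is impossible for mutually exclusive causes, while the Aalen-Johansen estimates sum to at most 1.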
2. Recurrent Events
Script: sa-diagnostics/scripts/02_check_recurrent_events.R
Checks whether subjects have more than one event. Recurrent hospitalisations, infections, or relapses require models that handle within-subject correlation: Andersen-Gill (AG), Prentice-Williams-Peterson (PWP), Wei-Lin-Weissfeld (WLW), or frailty models.
3. Time-Varying Exposure
Script: sa-diagnostics/scripts/03_check_time_varying.R
Identifies exposures that change status after baseline (e.g., transplant receipt, drug switching). Flags potential immortal time bias when treatment is coded as a fixed baseline variable but was actually assigned during follow-up.
4. Left Truncation
Script: sa-diagnostics/scripts/04_check_left_truncation.R
Detects delayed entry: subjects who enter observation partway through the time scale. Common in prevalent cohorts and studies using calendar-time enrolment.
5. Informative Censoring
Script: sa-diagnostics/scripts/05_check_informative_censoring.R
Tests whether censoring is related to the outcome. When sicker patients are more likely to be lost to follow-up, standard methods produce biased survival estimates.
6. Clustering
Script: sa-diagnostics/scripts/06_check_clustering.R
Detects non-independence from hierarchical structure: patients nested within hospitals, siblings within families, or repeated measures within individuals.
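Three of these checks are purely structural and can be sketched in a few lines. The actual checks are the R scripts listed above; the Python below is only a minimal illustration, and the record layout (id, entry, exit, event) and function names are assumptions, not the pipeline's real schema.

```python
from collections import Counter

# Hypothetical long-format records: one row per subject interval.
# event: 0 = censored, positive integers = distinct event types.
records = [
    {"id": 1, "entry": 0.0, "exit": 5.0, "event": 1},
    {"id": 2, "entry": 0.0, "exit": 3.0, "event": 2},
    {"id": 3, "entry": 2.0, "exit": 7.0, "event": 0},  # enters at t = 2
    {"id": 4, "entry": 0.0, "exit": 4.0, "event": 1},
    {"id": 4, "entry": 4.0, "exit": 9.0, "event": 1},  # second event, same subject
]


def has_competing_risks(rows):
    """True if more than one distinct non-censoring event type appears."""
    event_types = {r["event"] for r in rows if r["event"] != 0}
    return len(event_types) > 1


def has_recurrent_events(rows):
    """True if any subject has more than one event row."""
    counts = Counter(r["id"] for r in rows if r["event"] != 0)
    return any(n > 1 for n in counts.values())


def has_left_truncation(rows):
    """True if any subject's earliest entry time is after time zero."""
    first_entry = {}
    for r in rows:
        first_entry[r["id"]] = min(r["entry"], first_entry.get(r["id"], r["entry"]))
    return any(t > 0 for t in first_entry.values())


print(has_competing_risks(records), has_recurrent_events(records), has_left_truncation(records))
```

Informative censoring and clustering need statistical tests or design knowledge rather than a structural scan, so they are left out of this sketch.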
3.3 Routing Logic
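After all checks run, sa-diagnostics/scripts/07_diagnostic_report.R produces a summary. The diagnostic router (sa-end-to-end/scripts/diagnostic_router.py) reads this report and selects the analysis path. The primary module is the first match in this order: competing risks (sa-competing-risks), recurrent events (sa-recurrent-multistate), time-varying exposure (sa-time-varying), otherwise sa-standard-km. If left truncation, informative censoring, or clustering is detected, the analysis then passes through sa-advanced-adjustments. Every path ends at sa-publication-figures.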
Run make analyze-auto PROJECT=my-study to let the router both diagnose and execute the selected analysis in one step.
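The decision order in the routing flowchart can be sketched as a small function. This is not the contents of diagnostic_router.py: the DiagnosticFlags class and route function are hypothetical names, and only the module names and branch order are taken from the flowchart.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticFlags:
    """Hypothetical container for the six check results."""
    competing_risks: bool = False
    recurrent_events: bool = False
    time_varying: bool = False
    left_truncation: bool = False
    informative_censoring: bool = False
    clustering: bool = False


def route(flags):
    """Return the ordered list of modules to run for a set of flags."""
    # Primary route: the first matching check wins, in flowchart order.
    if flags.competing_risks:
        primary = "sa-competing-risks"
    elif flags.recurrent_events:
        primary = "sa-recurrent-multistate"
    elif flags.time_varying:
        primary = "sa-time-varying"
    else:
        primary = "sa-standard-km"

    path = [primary]
    # Any of these three adds the adjustment step after the primary route.
    if flags.left_truncation or flags.informative_censoring or flags.clustering:
        path.append("sa-advanced-adjustments")
    path.append("sa-publication-figures")
    return path


print(route(DiagnosticFlags(competing_risks=True)))
print(route(DiagnosticFlags(time_varying=True, clustering=True)))
```

Because the primary branches are checked in a fixed order, a dataset with both competing risks and recurrent events is routed to sa-competing-risks under this reading of the flowchart.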
3.4 Example Diagnostic Output
A typical diagnostic report flags which checks triggered:
Diagnostic Summary for: demo-competing
─────────────────────────────────────
Competing risks: YES (2 event types detected)
Recurrent events: NO
Time-varying exposure: NO
Left truncation: NO
Informative censoring: NO
Clustering: NO
Recommended path: sa-competing-risks
Adjustments needed: none
3.5 Demo: Scenario 1 (Standard Survival)
The diagnostic engine correctly identified no complexities in the standard 2-arm RCT data (N=500, 40% censored), routing to sa-standard-km:
| Check | Detected | Finding |
|---|---|---|
| Competing risks | No | Single event type |
| Recurrent events | No | Max 1 event per subject |
| Time-varying | No | No counting-process format |
| Left truncation | No | No delayed entry |
| Informative censoring | No | No significant differences (40% censored) |
| Clustering | No | No cluster variable |
Primary route: sa-standard-km (KM + Cox PH)