```mermaid
flowchart LR
subgraph "Stage 0"
A[Raw data] --> B[Validate & Profile]
B --> C[Diagnose]
C --> D[Route]
end
subgraph "Stage 1"
D --> E[Analysis Module]
D --> F[Advanced Adjustments]
end
subgraph "Stage 2"
E --> G[Publication Figures]
F --> G
end
subgraph "Stage 3"
G --> H[Manuscript + Render]
end
```
1 Project Overview
Survival-Pipe organises survival analysis into ten independent modules, each responsible for one part of the workflow. This chapter describes the architecture, directory layout, and design philosophy.
1.1 Module Map
| Module | Stage | Purpose |
|---|---|---|
| `sa-data-intake` | Data prep | Ingest, validate, and profile raw TTE data |
| `sa-diagnostics` | Diagnostics | Six automated checks + routing report |
| `sa-standard-km` | KM + Cox | Kaplan-Meier curves, log-rank tests, Cox PH |
| `sa-competing-risks` | Competing risks | CIF estimation, cause-specific Cox, Fine-Gray |
| `sa-recurrent-multistate` | Recurrent/multi-state | AG, PWP, WLW, frailty, multi-state models |
| `sa-time-varying` | Time-varying | Landmark analysis, `tmerge`, time-dependent Cox |
| `sa-advanced-adjustments` | Adjustments | Left truncation, IPCW, frailty, cluster SE |
| `sa-publication-figures` | Figures | 300 DPI multi-panel assembly |
| `sa-manuscript-quarto` | Manuscript | Table 1, results draft, Quarto rendering |
| `sa-end-to-end` | Orchestration | Project init, diagnostic router, QA, checkpoints |
1.2 Directory Structure
Each analysis project lives under `projects/<name>/`:

```text
projects/<name>/
├── 01_data/          # raw/ and simulated/
├── 02_diagnostics/   # diagnostic check outputs
├── 03_analysis/      # model results and RDS files
├── 04_figures/       # PNG + PDF at 300 DPI
├── 05_manuscript/    # Quarto source and rendered output
└── 06_qa/            # QA validation reports
```
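As an illustration, the layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not Survival-Pipe's own initialiser (project initialisation belongs to `sa-end-to-end`); the function `init_project` and the constant `PROJECT_SUBDIRS` are hypothetical names, and the `raw/` and `simulated/` subfolders are taken from the comment on `01_data/`.

```python
import tempfile
from pathlib import Path

# Hypothetical list of subdirectories, transcribed from the layout above.
# The raw/ and simulated/ split under 01_data/ follows the tree's comment.
PROJECT_SUBDIRS = [
    "01_data/raw",
    "01_data/simulated",
    "02_diagnostics",
    "03_analysis",
    "04_figures",
    "05_manuscript",
    "06_qa",
]


def init_project(root: Path, name: str) -> Path:
    """Create projects/<name>/ under root with the standard subdirectories.

    Hypothetical helper; the pipeline's real initialiser may differ.
    """
    project = root / "projects" / name
    for sub in PROJECT_SUBDIRS:
        # parents=True creates intermediate dirs; exist_ok=True makes reruns safe.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


if __name__ == "__main__":
    # Scaffold a throwaway project and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        proj = init_project(Path(tmp), "demo")
        for p in sorted(proj.rglob("*")):
            print(p.relative_to(proj))
```

Because every `mkdir` call uses `exist_ok=True`, running the initialiser twice on the same project is harmless.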
1.3 Data Flow
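The data-flow diagram is a small directed acyclic graph, so one valid execution order can be derived with a topological sort. The sketch below uses Python's standard-library `graphlib` (3.9+); node names are copied from the diagram, while `DEPENDS_ON` and `execution_order` are hypothetical names. It assumes both branches out of Route can run, which the diagram does not state explicitly.

```python
from graphlib import TopologicalSorter

# Edges transcribed from the flowchart: each step maps to the set of
# steps it depends on (its predecessors).
DEPENDS_ON = {
    "Validate & Profile": {"Raw data"},
    "Diagnose": {"Validate & Profile"},
    "Route": {"Diagnose"},
    "Analysis Module": {"Route"},
    "Advanced Adjustments": {"Route"},
    "Publication Figures": {"Analysis Module", "Advanced Adjustments"},
    "Manuscript + Render": {"Publication Figures"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one order in which every step runs after its predecessors."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDS_ON):
        print(step)
```

Because `Analysis Module` and `Advanced Adjustments` depend only on `Route`, a scheduler could run them in parallel; `TopologicalSorter` exposes such batches through `prepare()` and `get_ready()`.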
1.4 Design Philosophy
Note: Diagnostic-First Principle
The pipeline never assumes which survival model is appropriate. Instead, it runs six automated checks on every dataset and uses the results to recommend (or automatically select) the appropriate analysis path.
This prevents common mistakes such as:
- Ignoring competing risks when multiple event types exist
- Fitting naive Cox models in the presence of immortal time bias
- Overlooking clustered observations, which violate the independence assumption
- Treating left-truncated data as if follow-up began at time zero
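The routing idea can be sketched as a mapping from diagnostic flags to modules. This is a hypothetical illustration, not the pipeline's actual router: `DiagnosticFlags`, its field names, and `route` are invented here, only the four failure modes listed above are modelled (the remaining two checks are not described in this section), and falling back to `sa-standard-km` when no flag fires is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticFlags:
    """Hypothetical outcomes for four of the six diagnostic checks."""
    multiple_event_types: bool = False  # competing risks present
    immortal_time_risk: bool = False    # exposure defined after time zero
    clustered: bool = False             # non-independent observations
    left_truncated: bool = False        # delayed entry into the risk set


def route(flags: DiagnosticFlags) -> list[str]:
    """Map diagnostic flags to the modules that should handle them."""
    modules: list[str] = []
    if flags.multiple_event_types:
        modules.append("sa-competing-risks")  # CIF, cause-specific Cox, Fine-Gray
    if flags.immortal_time_risk:
        modules.append("sa-time-varying")  # landmark or time-dependent Cox
    if flags.clustered or flags.left_truncated:
        # Cluster SE, frailty, and left-truncation handling share one module.
        modules.append("sa-advanced-adjustments")
    if not modules:
        # Assumed default: no red flags, so KM + Cox PH is adequate.
        modules.append("sa-standard-km")
    return modules
```

For example, `route(DiagnosticFlags(multiple_event_types=True, clustered=True))` returns `["sa-competing-risks", "sa-advanced-adjustments"]`, while a dataset that trips no check is sent to `sa-standard-km`.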
Each module is self-contained with its own scripts/, references/, and SKILL.md. Modules communicate through the standardised project directory layout, reading from and writing to well-defined subdirectories.
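One way to enforce the self-contained convention is to check each module directory for its three required entries. The sketch assumes `scripts/` and `references/` are directories and `SKILL.md` is a regular file; `missing_entries` and `REQUIRED_ENTRIES` are hypothetical names, not part of the pipeline.

```python
import tempfile
from pathlib import Path

# Entries every module is expected to contain, per the convention above.
# Values record whether each entry should be a directory or a file.
REQUIRED_ENTRIES = {"scripts": "dir", "references": "dir", "SKILL.md": "file"}


def missing_entries(module_dir: Path) -> list[str]:
    """Return required entries that are absent or have the wrong type."""
    missing = []
    for name, kind in REQUIRED_ENTRIES.items():
        path = module_dir / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Build a throwaway module that only has scripts/ and report the gaps.
    with tempfile.TemporaryDirectory() as tmp:
        module = Path(tmp) / "sa-diagnostics"
        (module / "scripts").mkdir(parents=True)
        print(missing_entries(module))  # ['references', 'SKILL.md']
```

Returning the list of gaps, rather than a single boolean, lets a QA report state exactly what each module is missing.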