1 Project Overview

Survival-Pipe organises survival analysis into ten independent modules, each responsible for one stage of the workflow. This chapter describes the architecture, directory layout, and design philosophy.

1.1 Module Map

| Module | Stage | Purpose |
|---|---|---|
| `sa-data-intake` | Data prep | Ingest, validate, and profile raw TTE data |
| `sa-diagnostics` | Diagnostics | Six automated checks + routing report |
| `sa-standard-km` | KM + Cox | Kaplan-Meier curves, log-rank tests, Cox PH |
| `sa-competing-risks` | Competing risks | CIF estimation, cause-specific Cox, Fine-Gray |
| `sa-recurrent-multistate` | Recurrent/multi-state | AG, PWP, WLW, frailty, multi-state models |
| `sa-time-varying` | Time-varying | Landmark analysis, tmerge, time-dependent Cox |
| `sa-advanced-adjustments` | Adjustments | Left truncation, IPCW, frailty, cluster SE |
| `sa-publication-figures` | Figures | 300 DPI multi-panel assembly |
| `sa-manuscript-quarto` | Manuscript | Table 1, results draft, Quarto rendering |
| `sa-end-to-end` | Orchestration | Project init, diagnostic router, QA, checkpoints |

1.2 Directory Structure

Each analysis project lives under projects/<name>/:

projects/<name>/
├── 01_data/           # raw/ and simulated/
├── 02_diagnostics/    # diagnostic check outputs
├── 03_analysis/       # model results and RDS files
├── 04_figures/        # PNG + PDF at 300 DPI
├── 05_manuscript/     # Quarto source and rendered output
└── 06_qa/             # QA validation reports
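
As a rough illustration of the layout above, the skeleton could be created with a few lines of Python. This is only a sketch: `init_project` and `STAGE_DIRS` are hypothetical names chosen for this example, not part of Survival-Pipe, and the project-initialisation step in `sa-end-to-end` may work differently.

```python
from pathlib import Path
import tempfile

# Stage directories copied from the layout above; raw/ and simulated/
# are the two subdirectories mentioned for 01_data/.
STAGE_DIRS = [
    "01_data/raw",
    "01_data/simulated",
    "02_diagnostics",
    "03_analysis",
    "04_figures",
    "05_manuscript",
    "06_qa",
]


def init_project(root, name):
    """Create projects/<name>/ with the standard subdirectories.

    Hypothetical helper for illustration only.
    """
    project = Path(root) / "projects" / name
    for sub in STAGE_DIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        project = init_project(tmp, "demo-study")
        for path in sorted(p for p in project.rglob("*") if p.is_dir()):
            print(path.relative_to(project))
```

Because every directory is created with `exist_ok=True`, re-running the helper on an existing project leaves its contents untouched.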

1.3 Data Flow

flowchart LR
    subgraph "Stage 0"
        A[Raw data] --> B[Validate & Profile]
        B --> C[Diagnose]
        C --> D[Route]
    end
    subgraph "Stage 1"
        D --> E[Analysis Module]
        D --> F[Advanced Adjustments]
    end
    subgraph "Stage 2"
        E --> G[Publication Figures]
        F --> G
    end
    subgraph "Stage 3"
        G --> H[Manuscript + Render]
    end

1.4 Design Philosophy

Diagnostic-First Principle

The pipeline never assumes which survival model is appropriate. Instead, it runs six automated checks on every dataset and uses the results to recommend, or automatically select, the appropriate analysis path.

This prevents common mistakes such as:

- Ignoring competing risks when multiple event types exist
- Fitting naive Cox models in the presence of immortal time bias
- Missing clustered observations that violate independence
- Treating left-truncated data as if follow-up began at time zero
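
One way to picture the routing idea is a small function that maps diagnostic findings to module names. The sketch below is an assumption throughout: `DiagnosticFlags`, its field names, and the mapping rules are invented for illustration, and it covers only the four pitfalls listed above rather than all six checks. The actual router in `sa-end-to-end` may use different inputs and logic.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticFlags:
    """Hypothetical summary of diagnostic findings (illustrative field names)."""
    multiple_event_types: bool = False
    immortal_time_risk: bool = False
    clustered_observations: bool = False
    left_truncation: bool = False


def route(flags):
    """Return the module names to run, in order, for a set of findings.

    The mapping is an assumption inferred from the module map, not the
    pipeline's documented routing rules.
    """
    modules = []
    if flags.multiple_event_types:
        modules.append("sa-competing-risks")
    if flags.immortal_time_risk:
        modules.append("sa-time-varying")
    if not modules:
        # Nothing special detected: fall back to the standard KM + Cox path.
        modules.append("sa-standard-km")
    if flags.clustered_observations or flags.left_truncation:
        # Adjustments are layered on top of whichever analysis was chosen.
        modules.append("sa-advanced-adjustments")
    return modules


if __name__ == "__main__":
    print(route(DiagnosticFlags()))
    print(route(DiagnosticFlags(multiple_event_types=True, left_truncation=True)))
```

Keeping the decision in one pure function makes the routing easy to test without fitting any models, which fits the diagnostic-first approach.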

Each module is self-contained with its own scripts/, references/, and SKILL.md. Modules communicate through the standardised project directory layout, reading from and writing to well-defined subdirectories.
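
The directory-based hand-off between modules can be sketched as a producer that writes a file into a stage directory and a consumer that reads it back. The file name `routing_report.json`, the JSON format, and both function names are assumptions made for this example; the source does not specify how the routing report is stored.

```python
import json
import tempfile
from pathlib import Path

REPORT_NAME = "routing_report.json"  # hypothetical file name


def write_routing_report(project, recommended):
    """Producer side: a diagnostics step saves its recommendation to disk."""
    out_dir = Path(project) / "02_diagnostics"
    out_dir.mkdir(parents=True, exist_ok=True)
    report_path = out_dir / REPORT_NAME
    report_path.write_text(json.dumps({"recommended_modules": recommended}, indent=2))
    return report_path


def read_routing_report(project):
    """Consumer side: a later step reads the recommendation from disk."""
    report_path = Path(project) / "02_diagnostics" / REPORT_NAME
    return json.loads(report_path.read_text())["recommended_modules"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "projects" / "demo-study"
        write_routing_report(project, ["sa-competing-risks"])
        print(read_routing_report(project))
```

Neither function imports the other's module; the only shared contract is the path inside the project directory, which is what lets each module stay self-contained.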