1  Project Overview

Survival-Pipe organises survival analysis into ten independent modules, each responsible for one stage of the workflow. This chapter describes the architecture, directory layout, and design philosophy.

1.1 Module Map

| Module                  | Stage                 | Purpose                                          |
|-------------------------|-----------------------|--------------------------------------------------|
| sa-data-intake          | Data prep             | Ingest, validate, and profile raw TTE data       |
| sa-diagnostics          | Diagnostics           | Six automated checks + routing report            |
| sa-standard-km          | KM + Cox              | Kaplan-Meier curves, log-rank tests, Cox PH      |
| sa-competing-risks      | Competing risks       | CIF estimation, cause-specific Cox, Fine-Gray    |
| sa-recurrent-multistate | Recurrent/multi-state | AG, PWP, WLW, frailty, multi-state models        |
| sa-time-varying         | Time-varying          | Landmark analysis, tmerge, time-dependent Cox    |
| sa-advanced-adjustments | Adjustments           | Left truncation, IPCW, frailty, cluster SE       |
| sa-publication-figures  | Figures               | 300 DPI multi-panel assembly                     |
| sa-manuscript-quarto    | Manuscript            | Table 1, results draft, Quarto rendering         |
| sa-end-to-end           | Orchestration         | Project init, diagnostic router, QA, checkpoints |

1.2 Directory Structure

Each analysis project lives under projects/<name>/:

projects/<name>/
├── 01_data/           # raw/ and simulated/
├── 02_diagnostics/    # diagnostic check outputs
├── 03_analysis/       # model results and RDS files
├── 04_figures/        # PNG + PDF at 300 DPI
├── 05_manuscript/     # Quarto source and rendered output
└── 06_qa/             # QA validation reports
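
As an illustration of the layout above, the following minimal Python sketch creates the same directory tree. The helper name `init_project` and the `root` argument are hypothetical and are not part of the pipeline itself; only the subdirectory names come from the listing.

```python
from pathlib import Path

# Subdirectory names taken from the layout above; the nested raw/ and
# simulated/ folders follow the comment on 01_data/.
PROJECT_SUBDIRS = [
    "01_data/raw",
    "01_data/simulated",
    "02_diagnostics",
    "03_analysis",
    "04_figures",
    "05_manuscript",
    "06_qa",
]


def init_project(root: Path, name: str) -> Path:
    """Create projects/<name>/ with the standard subdirectories.

    Hypothetical helper for illustration only.
    """
    project_dir = root / "projects" / name
    for sub in PROJECT_SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir
```

Calling `init_project(Path("."), "demo")` would create `projects/demo/` under the current directory; repeating the call leaves existing files untouched.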

1.3 Data Flow

flowchart LR
    subgraph "Stage 0"
        A[Raw data] --> B[Validate & Profile]
        B --> C[Diagnose]
        C --> D[Route]
    end
    subgraph "Stage 1"
        D --> E[Analysis Module]
        D --> F[Advanced Adjustments]
    end
    subgraph "Stage 2"
        E --> G[Publication Figures]
        F --> G
    end
    subgraph "Stage 3"
        G --> H[Manuscript + Render]
    end
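
The flowchart above can also be read as a dependency graph. The sketch below encodes its arrows in Python and derives one valid execution order with the standard-library `graphlib` module (Python 3.9+). The dictionary and the function name `execution_order` are illustrative assumptions, not the pipeline's actual orchestration code.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of steps it depends on, mirroring the arrows
# in the flowchart (A --> B becomes "B depends on A").
DEPENDS_ON = {
    "Validate & Profile": {"Raw data"},
    "Diagnose": {"Validate & Profile"},
    "Route": {"Diagnose"},
    "Analysis Module": {"Route"},
    "Advanced Adjustments": {"Route"},
    "Publication Figures": {"Analysis Module", "Advanced Adjustments"},
    "Manuscript + Render": {"Publication Figures"},
}


def execution_order() -> list[str]:
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())
```

The two Stage 1 steps have no dependency on each other, so they may appear in either order; every other position in the result is fixed by the graph.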

1.4 Design Philosophy

Note: Diagnostic-First Principle

The pipeline never assumes which survival model is appropriate. Instead, it runs six automated checks on every dataset and uses the results to recommend, or automatically select, the appropriate analysis path.

This prevents common mistakes such as:

  • Ignoring competing risks when multiple event types exist
  • Fitting naive Cox models in the presence of immortal time bias
  • Missing clustered observations that violate independence
  • Treating left-truncated data as if follow-up began at time zero
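
To make the routing idea concrete, here is a minimal Python sketch that maps diagnostic findings to modules from the module map. The flag names, the `DiagnosticFlags` class, and the routing rules are all assumptions made for illustration; the actual six checks and the router's logic are not specified in this chapter.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticFlags:
    """Hypothetical summary of diagnostic results; field names are illustrative."""
    multiple_event_types: bool = False
    immortal_time_risk: bool = False
    clustered_observations: bool = False
    left_truncation: bool = False


def recommend_modules(flags: DiagnosticFlags) -> list[str]:
    """Map diagnostic flags to module names (assumed rules, not the real router)."""
    modules = []
    if flags.multiple_event_types:
        modules.append("sa-competing-risks")
    if flags.immortal_time_risk:
        modules.append("sa-time-varying")
    # Assumption: fall back to standard KM + Cox when no analysis flag is raised.
    if not modules:
        modules.append("sa-standard-km")
    # Assumption: clustering and left truncation are handled as adjustments
    # applied alongside the chosen analysis module.
    if flags.clustered_observations or flags.left_truncation:
        modules.append("sa-advanced-adjustments")
    return modules
```

For example, a dataset flagged for multiple event types and left truncation would be routed to `sa-competing-risks` together with `sa-advanced-adjustments`, while a dataset with no flags would go to `sa-standard-km`.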

Each module is self-contained with its own scripts/, references/, and SKILL.md. Modules communicate through the standardised project directory layout, reading from and writing to well-defined subdirectories.
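
The self-contained module convention lends itself to a simple check. The Python sketch below reports which of the three expected entries are absent from a module directory; the function name `missing_module_parts` is hypothetical, and the check assumes `scripts/` and `references/` are directories and `SKILL.md` is a regular file.

```python
from pathlib import Path


def missing_module_parts(module_dir: Path) -> list[str]:
    """List the expected module entries that are absent from module_dir."""
    missing = []
    if not (module_dir / "scripts").is_dir():
        missing.append("scripts/")
    if not (module_dir / "references").is_dir():
        missing.append("references/")
    if not (module_dir / "SKILL.md").is_file():
        missing.append("SKILL.md")
    return missing
```

An empty list means the module directory follows the convention described above.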