4 Artificial Intelligence and Informatics in Hematology

4.1 Session Overview

Session Details
Session: Artificial Intelligence & Informatics in Hematology: Driving Practice Improvement and Patient-Centered Care
Speaker: Barbara Lam, MD
Affiliation: Fred Hutch Cancer Center, University of Washington, Seattle, WA
Time: Day 1, 11:45 a.m.–12:15 p.m.

Dr. Lam organizes this year’s ASH AI abstracts into four major themes—AI classification of hematologic diseases from pathology, AI for dynamic risk prediction, large language models (LLMs) for clinical decision-making, and wearable technology for home monitoring of patients [slide p.4]. The learning objectives are to describe new AI and remote-monitoring applications for research and clinical practice and to identify the benefits and pitfalls of using LLMs for clinical decision-making [slide p.3]. The talk is case-based, walking through representative 2025 ASH abstracts under each theme and ending with a unifying view of how diverse data inputs (H&E slides, methylation arrays, longitudinal labs, wearable streams) feed modern neural-network, transformer, and LLM architectures to deliver faster diagnosis, dynamic risk prediction, and better prognostication [slide p.44].

4.2 Speaker Spotlight

Barbara Lam, MD, is a physician-informaticist at Fred Hutchinson Cancer Center specializing in the application of electronic health record (EHR) data, natural language processing (NLP), and machine learning (ML) to hematologic malignancies. Her research focuses on clinical informatics for hematopoietic cell transplantation and cellular therapy, including predictive modeling for adverse events and outcomes optimization using real-world data.

4.3 What’s New in 2025–2026

Dr. Lam opens by framing AI as neural networks that take an input (for example, an H&E image) through an arbitrary number of hidden layers to generate a classification output—the familiar “is this an image of a cat” schematic reapplied to hematologic diagnosis and risk prediction [slides p.7, p.14]. She then structures the talk around four themes drawn from this year’s AI abstracts [slide p.4].
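The "input, hidden layers, classification output" picture can be made concrete with a tiny feed-forward network. The sketch below is illustrative only: the layer sizes, the three class labels in `CLASSES`, and the random (untrained) weights are all hypothetical, and real pathology models such as the ones discussed below use far larger convolutional or transformer architectures trained on labeled slides.

```python
import math
import random

CLASSES = ["benign", "follicular lymphoma", "DLBCL"]  # hypothetical output labels


def dense(x, weights, bias):
    """Fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Hidden-layer non-linearity: negative activations are set to zero."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Convert output scores into class probabilities that sum to 1."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights; a real model would learn these from labeled slides."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Input -> any number of ReLU hidden layers -> softmax classification output."""
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    out_weights, out_bias = layers[-1]
    return softmax(dense(x, out_weights, out_bias))


rng = random.Random(0)
sizes = [8, 16, 16, len(CLASSES)]  # 8 input features, two hidden layers, 3 classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
features = [rng.random() for _ in range(8)]  # stand-in for features from an image patch
probs = forward(features, layers)
prediction = CLASSES[probs.index(max(probs))]
```

Adding hidden layers is just a longer `sizes` list; the "arbitrary number of hidden layers" in the schematic corresponds to how many entries sit between the input and output sizes.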

4.3.1 Theme 1 — AI to Classify Hematologic Diseases from Pathology

The motivating case is a 45-year-old woman with Sjögren disease and an enlarging cervical node whose core biopsy is “concerning for lymphoma, additional studies pending”—the team debates what to do while ancillary stains run for several more weeks [slide p.6]. Could a neural network give them a faster answer?

Abstract 13972 — Lymphovision (Seheult et al., Mayo Clinic) trained a lymphoma-specialized foundation model on 31,211 H&E slides from 9,155 Mayo Clinic cases (2006–2024), using 37 million patches at four magnifications (5x, 10x, 20x, 40x) [slide p.10]. On H&E alone, the model could:

  • classify B-cell lymphomas (AUC 0.95),
  • grade follicular lymphoma (AUC 0.88),
  • identify DLBCL cell of origin (AUC 0.93), and
  • prognosticate DLBCL, with 24-month event-free survival AUC rising from 0.67 (clinical only) to 0.74 when H&E images were added [slide p.10].
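The AUC values above can be read as a ranking probability: for a binary task (or one class versus the rest), AUC is the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, so 0.5 is chance and 1.0 is perfect separation. A minimal sketch of that pairwise definition, on hypothetical toy scores rather than any abstract's data:

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random
    negative case; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 0.75
```

Read this way, the DLBCL prognosis result means that adding H&E images raised the share of correctly ordered event/no-event pairs from about 67% to about 74%.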

Abstract 10859 — Rapid DNA-methylation–based classification (Achterberg et al.) trained on 5,550 methylation-array samples and classifies 58 hematologic entities spanning leukemias and lymphomas [slide p.12]. Paired with nanopore sequencing, the pipeline can return a high-confidence call within hours rather than weeks, and achieved 95% accuracy on a retrospective cohort of 178 adult and pediatric patients at a confidence threshold of 0.95 [slide p.12]. The take-home from Theme 1 is that AI classifiers can accept many input types (H&E, methylation arrays, other molecular data) and could extend beyond malignancies to other hematologic disorders [slide p.13].
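A confidence threshold such as 0.95 usually means the classifier only reports a diagnosis when its top probability clears the cutoff and otherwise abstains. The sketch below assumes, without confirmation from the slide, that the reported 95% accuracy was computed over such confident calls. The entity names and probabilities are hypothetical toy values.

```python
def call_with_threshold(probs, threshold=0.95):
    """Return the top-scoring entity if its probability clears the threshold, else abstain (None)."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


def thresholded_accuracy(predictions, truths, threshold=0.95):
    """Accuracy over samples that received a call, plus the fraction of samples called."""
    calls = [(call_with_threshold(p, threshold), t) for p, t in zip(predictions, truths)]
    made = [(c, t) for c, t in calls if c is not None]
    call_rate = len(made) / len(calls) if calls else 0.0
    accuracy = sum(c == t for c, t in made) / len(made) if made else 0.0
    return accuracy, call_rate


predictions = [
    {"AML": 0.97, "MDS": 0.03},    # confident and correct
    {"B-ALL": 0.60, "T-ALL": 0.40},  # below threshold: no call
    {"DLBCL": 0.96, "FL": 0.04},   # confident but wrong
]
truths = ["AML", "B-ALL", "FL"]
accuracy, call_rate = thresholded_accuracy(predictions, truths)
```

The design trade-off is explicit: a higher threshold tends to raise accuracy on the calls that are made while leaving more samples to conventional work-up.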

Clinical Pearl: Foundation Models Speed Up the Diagnostic Clock

Both Lymphovision (H&E) and the DNA-methylation classifier tackle the same bottleneck as Dr. Lam’s Sjögren case—weeks of ancillary testing. In the future, AI models may more rapidly diagnose hematologic malignancies and shorten the interval between biopsy and treatment decisions [slide p.14].

4.3.2 Theme 2 — AI for Dynamic Risk Prediction

Motivating case: a 72-year-old man with pancreatic cancer on prophylactic anticoagulation whose disease and blood counts have improved. How does his clot risk change over time [slide p.16]? Traditional scores such as the Geneva VTE prophylaxis score or the Khorana score assign fixed points to static yes/no variables measured at a single time point [slide p.17]. Two abstracts in this theme “re-envision” the traditional risk score by using richer, dynamic inputs [slide p.18].
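To make "fixed points for static yes/no variables" concrete, here is a sketch of the Khorana score as originally described (cancer site, pre-chemotherapy platelet, hemoglobin, and leukocyte counts, and BMI). The function and parameter names are hypothetical, and the lab values in the example are invented for the pancreatic-cancer case rather than taken from the slides.

```python
VERY_HIGH_RISK_SITES = {"stomach", "pancreas"}  # 2 points
HIGH_RISK_SITES = {"lung", "lymphoma", "gynecologic", "bladder", "testicular"}  # 1 point


def khorana_score(site, platelets_1e9_per_l, hemoglobin_g_dl, uses_esa,
                  leukocytes_1e9_per_l, bmi):
    """Sum of fixed points, each from a yes/no check on pre-chemotherapy values."""
    score = 0
    if site in VERY_HIGH_RISK_SITES:
        score += 2
    elif site in HIGH_RISK_SITES:
        score += 1
    if platelets_1e9_per_l >= 350:
        score += 1
    if hemoglobin_g_dl < 10 or uses_esa:  # anemia or red cell growth factor use
        score += 1
    if leukocytes_1e9_per_l > 11:
        score += 1
    if bmi >= 35:
        score += 1
    return score


def khorana_category(score):
    """Conventional risk bands: 0 low, 1 to 2 intermediate, 3 or more high."""
    if score == 0:
        return "low"
    return "intermediate" if score <= 2 else "high"


# Hypothetical baseline values for the pancreatic-cancer case.
baseline = khorana_score("pancreas", 380, 11.2, False, 8.0, 24)  # 2 + 1 = 3
```

Nothing in the score changes when the patient's disease or counts improve; it is a single snapshot, which is exactly the limitation the dynamic models below target.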

Abstract 7816 — Dynamic deep-learning VTE model (He et al., VA) developed a transformer model in 88,808 cancer patients from the US Veterans Affairs system and externally validated it in 9,752 patients from Harris Health [slide p.21]. Inputs are time-stamped trajectories of 1,863 ICD phecodes, CBC/CMP lab values, and static features (BMI, sex, race), with diagnoses anchored relative to the cancer-diagnosis and treatment index dates [slide p.21]. Across four prediction intervals (0–3, 3–6, 6–9, and 9–12 months), the transformer outperformed both the Khorana score and an EHR-CAT comparator, and its AUC rose over time, suggesting that dynamic transformer models may eventually replace static risk scores [slide p.22].
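The idea of events "anchored relative to index dates" can be sketched as a data-preparation step: each time-stamped diagnosis, lab, or treatment becomes a (days from index, code) token, and outcomes are assigned to the 0 to 3, 3 to 6, 6 to 9, or 9 to 12 month windows. This is a hypothetical illustration of the input format, not the abstract's pipeline; the event codes, the dates, and the 30.44-day average month are all assumptions, and the transformer itself is not shown.

```python
from datetime import date

WINDOWS = [(0, 3), (3, 6), (6, 9), (9, 12)]  # prediction intervals in months


def anchor_events(events, index_date):
    """Convert (date, code) events into (days_from_index, code) tokens sorted in time."""
    return sorted(((d - index_date).days, code) for d, code in events)


def window_for(outcome_date, index_date):
    """Prediction interval an outcome falls in, or None if outside 0 to 12 months."""
    months = (outcome_date - index_date).days / 30.44  # approximate month length
    for lo, hi in WINDOWS:
        if lo <= months < hi:
            return (lo, hi)
    return None


index_date = date(2024, 1, 15)  # hypothetical cancer-diagnosis index date
events = [
    (date(2024, 2, 1), "lab:platelets_high"),     # hypothetical event codes
    (date(2023, 12, 1), "dx:hypertension"),
    (date(2024, 3, 10), "tx:chemotherapy_start"),
]
tokens = anchor_events(events, index_date)
vte_window = window_for(date(2024, 5, 20), index_date)  # about 4.1 months -> (3, 6)
```

Events before the index date get negative offsets, so the model can see pre-diagnosis history while still knowing where the cancer and treatment anchors fall.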

Abstract 14158 — Multi-endpoint AI morphology model (MEAM) in MPNs (Hu et al.) instead feeds bone-marrow biopsy whole-slide images through a pretrained vision-transformer foundation model to predict four endpoints simultaneously in 788 patients (643 essential thrombocythemia, 145 polycythemia vera): vascular events, transformation to myelofibrosis, transformation to AML/MDS, and death [slide p.25]. Combining MEAM with conventional risk scores improved C-index for transformation (+23% to +46% in ET; +31% to +38% in PV) and death (+13% to +24% in ET; +4% to +17% in PV), whereas prediction of thrombosis from bone-marrow images alone was only comparable to existing models; the study has no external validation yet [slide p.26].
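The C-index reported for MEAM is a concordance measure for time-to-event outcomes: among patient pairs where the one with shorter follow-up had the event, it is the fraction in which that patient was also assigned the higher predicted risk (0.5 is random, 1.0 is perfect ordering). A minimal sketch of Harrell's version on hypothetical toy data; the slide does not state which C-index variant the authors used.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had the event and a shorter time
    than patient j. It is concordant when i also has the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]          # hypothetical follow-up in years
events = [1, 1, 0, 1]         # 1 = transformation observed, 0 = censored
c_mixed = harrell_c_index(times, events, [0.9, 0.2, 0.5, 0.4])  # 3 of 5 pairs -> 0.6
```

Because censored patients still contribute as the later member of a pair, the measure uses partial follow-up, which matters in slowly progressing diseases like ET and PV.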

From Static Scores to Dynamic Trajectories

The common thread across Theme 2 is that a patient’s risk is not a single number at diagnosis but a trajectory shaped by labs, diagnoses, treatments, and tissue biology [slides p.18, p.27]. Bringing those data streams into AI models lets risk be re-estimated as the patient’s course unfolds.

4.3.3 Theme 3 — Large Language Models for Clinical Decision-Making

Motivating case: a 35-year-old man with Castleman disease presenting with fevers—a rare entity, so the clinician opens ChatGPT to help prep for the visit and likes the suggested labs but is unsure about the differential [slide p.29]. Dr. Lam polls the audience on whether they and their patients already use LLMs for clinical decision-making and visit preparation [slides p.30–31].

Abstract 11366 — Evaluating LLMs in real-world hematologic decision-making (Swoboda et al.) presented 30 complex MDS cases integrating clinical, morphologic, cytogenetic, and molecular data to four LLMs (GPT-o3, GPT-4o, DeepSeek, Claude) using a standardized prompt; eleven international MDS experts graded diagnosis, prognosis, treatment, and clinical-trial recommendations on a 1–5 Likert scale (≥4 = correct) [slide p.33]. Overall percent-correct performance was modest and heterogeneous: GPT-o3 66.4%, GPT-4o 40.9%, DeepSeek 38.2%, and Claude 33.8%, with major factual errors in 24.0%, 24.7%, 30.7%, and 32.0% of responses, respectively [slide p.33]. Dr. Lam’s conclusion: current state-of-the-art LLMs underperform in highly specialized hematologic malignancy scenarios and should supplement, not replace, other sources [slide p.34]. She also flags important caveats: the exact prompts behind published evaluations are usually unknown, models (and their versions) change rapidly, and clinicians are increasingly using other tools such as GPT-5, OpenEvidence, or Doximity GPT; LiveBench.ai was offered as one public benchmark for tracking LLM performance [slide p.34].
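The scoring rule (1 to 5 Likert ratings, with 4 or higher counted as correct) turns expert grades into a percent-correct figure. A minimal sketch with hypothetical ratings for two unnamed models; how the study pooled ratings across experts, cases, and domains is not described on the slide.

```python
def percent_correct(ratings, threshold=4):
    """Percentage of 1-5 Likert ratings at or above the 'correct' threshold."""
    if not ratings:
        raise ValueError("no ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    return 100.0 * sum(r >= threshold for r in ratings) / len(ratings)


# Hypothetical expert ratings, not data from abstract 11366.
ratings_by_model = {
    "model_a": [5, 4, 3, 2, 4],
    "model_b": [2, 3, 4, 1, 3],
}
summary = {model: percent_correct(r) for model, r in ratings_by_model.items()}
```

Dichotomizing at 4 makes models easy to compare but hides how far off the "incorrect" answers were, which is why the study also tracked major factual errors separately.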

Abstract 2556 — Generative-AI clinical decision support for germline predisposition to myeloid neoplasms (Gurnari et al.) tries to make an LLM “evidence based” by wrapping GPT-4 in a retrieval-augmented generation (RAG) pipeline: a curated library of peer-reviewed literature and clinical guidelines is chunked, embedded, and stored in a vector database; each user question is matched to the most relevant passages, which are then supplied to GPT-4 as context for its answer; and a multimodal feature additionally lets the chatbot review medical images [slide p.37]. Preliminary feedback suggested increased clinician confidence, though the chatbot still needs prospective testing on real-world cases [slide p.37].
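The chunk, embed, store, and retrieve steps of a RAG pipeline can be sketched in a few dozen lines. This is a toy illustration under stated assumptions: a bag-of-words term-count vector stands in for a neural embedding model, an in-memory list stands in for the vector database, the two passages are toy text rather than a curated library, and the final call to GPT-4 is omitted. All class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Stand-in embedding: term counts (a real pipeline would use a neural embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """In-memory (embedding, chunk) pairs standing in for a vector database."""

    def __init__(self):
        self.items = []

    def add(self, text):
        for c in chunk(text):
            self.items.append((embed(c), c))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [c for _, c in ranked[:k]]


def build_prompt(question, passages):
    """Place retrieved passages ahead of the question so the LLM answers from them."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return ("Answer using only the sources below and cite them by number.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")


store = VectorStore()
store.add("DDX41 germline variants predispose to myeloid neoplasms with late onset.")
store.add("Iron deficiency anemia is treated with oral or intravenous iron.")
question = "Which germline variants predispose to myeloid neoplasms?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

Grounding the prompt in retrieved, numbered sources is what lets answers be traced back to specific literature, which is the "evidence based" goal of the approach.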

Clinical Pearl: LLMs Are Baseline Helpers, Not Oracles

LLMs can provide basic assistance with clinical decision-making, but there are many models available and they are constantly changing; in the future, added features such as RAG and multimodal inputs may make them more reliable for clinical use [slide p.38]. Until then, treat LLM outputs in hematology as one more input to cross-check against guidelines, experts, and the primary literature.

4.3.4 Theme 4 — Wearable Technology for Home Monitoring

Motivating case: a 22-year-old woman with sickle cell disease who has presented to the ED three times in the past month and often delays care to avoid missing work—the clinician wants a way to monitor vitals from home and guide when to come in [slide p.40].

Abstract 11609 — Hospital-at-Home for hematologic malignancies (Shah et al., Tampa General) implemented an inpatient-level Hospital-at-Home (HaH) program combining continuous biometric vitals monitoring and hematology virtual visits with in-home twice-daily (BID) nursing, phlebotomy, infusions, transfusions, oxygen/respiratory support, PT/OT, and daily APP or MD visits [slide p.42]. Of 101 patients screened over 15 months, 45 were enrolled (median age 56, range 28–80); 62% (28 patients) had acute leukemia [slide p.42]. Median in-hospital length of stay was 14 days (range 1–59), with a median of 5 days (range 1–11) in HaH; 22.2% received home transfusions, 20% had unplanned admissions, and 30-day mortality was 0%. The program saved 226 inpatient bed-days overall [slide p.42].

Abstract 14941 — SCD-CARRE wearable feasibility (Jonassaint et al.) is a 12-month multi-site randomized controlled trial in 173 high-risk adults with sickle cell disease, asking whether ≥80% of participants can provide valid data from a wearable worn continuously for 7 days at baseline and follow-ups, and whether wearable metrics track disease severity [slide p.44]. Baseline results met feasibility: valid data collection rate 85.2%, valid data in 132/173 participants, mean daily wear time 21.5 hours, and any device data in 89.6% [slide p.44]. Wearable-derived metrics scaled with disease class: steps/day fell from 4,730 (Class I) to 1,970 (Class IV, p<0.001), distance from 3.30 to 1.24 km (p<0.001), active time from 3.58 to 2.35 hours (p=0.004), with modest differences in average heart rate and total sleep [slide p.44]. These data support integrating wearables into future sickle cell trials and routine care [slide p.44].
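The three baseline feasibility figures are easier to reconcile with a quick calculation. The slide does not state the denominator for the 85.2% valid-data rate; the sketch below assumes, as an inference from the reported numbers, that it was computed among participants who returned any device data (89.6% of 173).

```python
def implied_feasibility(enrolled, any_data_pct, valid_n):
    """Back-calculate counts and rates from the reported SCD-CARRE baseline figures."""
    any_data_n = round(enrolled * any_data_pct / 100)  # implied count with any device data
    return {
        "any_data_n": any_data_n,
        "valid_pct_of_any_data": round(100 * valid_n / any_data_n, 1),
        "valid_pct_of_enrolled": round(100 * valid_n / enrolled, 1),
    }


result = implied_feasibility(enrolled=173, any_data_pct=89.6, valid_n=132)
# any_data_n = 155; 132/155 -> 85.2%; 132/173 -> 76.3%
```

Under this reading, the 85.2% figure matches the reported rate and clears the 80% target, whereas the share of all 173 enrolled participants with valid data would be about 76%, so the denominator matters when judging feasibility.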

Clinical Pearl: Home Monitoring Works for Both Malignant and Classical Hematology

A home monitoring program using wearable devices can give clinicians information that correlates with disease severity in sickle cell disease [slide p.45], while a Hospital-at-Home program delivered inpatient-level care at home to a selected cohort of 45 patients (62% with acute leukemia) with 0% 30-day mortality [slide p.42]. Theme 4 reframes remote monitoring as a deployable tool today, not a future vision.

4.3.5 Putting the Four Themes Together

In her closing synthesis, Dr. Lam emphasizes that clinicians and researchers are experimenting with pulling diverse data types—bone marrow slides, lymph-node biopsies, DNA methylation, labs over time, and wearable streams—into modern AI architectures (neural networks, transformer models, and LLMs) to deliver faster diagnosis, dynamic risk prediction, and better prognostication [slide p.44].

Asia-Pacific Consideration

The abstracts highlighted by Dr. Lam were predominantly derived from US and European cohorts (Mayo Clinic, US Veterans Affairs, Harris Health, UK/European MPN groups) [editorial]. Before deploying any of these models at scale, Asia-Pacific hematology programs should validate them locally, account for differences in disease epidemiology and genetic background, and establish data-governance frameworks appropriate to each jurisdiction.

4.4 Clinical Pearls

Five Key Takeaways
  1. Four AI themes in hematology this year. Dr. Lam’s framing highlights AI for (1) pathology-based disease classification, (2) dynamic risk prediction, (3) LLMs for clinical decision-making, and (4) wearable home monitoring [slide p.4].
  2. Foundation models can diagnose lymphoma from H&E. Lymphovision (abstract 13972) classified B-cell lymphomas with AUC 0.95 and improved DLBCL event-free survival prediction from 0.67 to 0.74 when H&E images were added to clinical data [slide p.10]; a DNA-methylation classifier (abstract 10859) can assign a sample to one of 58 hematologic entities within hours, with 95% accuracy at a 0.95 confidence threshold [slide p.12].
  3. Dynamic beats static for risk prediction. A VA transformer VTE model using 1,863 phecodes plus labs across cancer trajectories outperformed Khorana and EHR-CAT across 0–12 month windows [slides p.21–22], and a bone-marrow vision-transformer (MEAM) improved transformation and mortality C-indices by 4–46% in ET and PV over NCCN/IPSET/MIPSS scores [slides p.25–26].
  4. LLMs are useful but fallible in specialist hematology. On 30 complex MDS cases, GPT-o3, GPT-4o, DeepSeek, and Claude were overall correct only 33.8–66.4% of the time with major factual errors in 24–32% of answers; prompts and versions are a moving target, and RAG plus multimodal inputs (abstract 2556) are one path to making them more reliable [slides p.33–34, p.37].
  5. Wearables and Hospital-at-Home are already deliverable. A Tampa HaH program (62% acute leukemia) saved 226 inpatient bed-days with 0% 30-day mortality [slide p.42], and the SCD-CARRE RCT showed 85% valid-data collection from wearables in high-risk sickle cell adults, with steps, distance, and active time scaling with disease severity [slide p.44].

4.5 Key References

  1. Seheult JN, Han W, Keser RK, et al. Lymphovision: a lymphoma-specialized foundation model for histology-based lymphoma classification and subtyping. Blood. 2025;146(Suppl 1):abstract 13972 [slides p.9–10].
  2. Achterberg T, de Ruijter E, van Tuil M, et al. Rapid DNA-methylation–based classification of hematological malignancies. Blood. 2025;146(Suppl 1):abstract 10859 [slides p.11–12].
  3. He T, Zheng C, La J, et al. A deep-learning model to dynamically predict cancer-associated thrombosis using electronic health records from the U.S. Veterans Affairs healthcare system. Blood. 2025;146(Suppl 1):abstract 7816 [slides p.20–22].
  4. Hu X, Aberdeen A, Ruane S, et al. Multi-endpoint AI morphology model (MEAM) enhances risk prediction for vascular events and disease progression in MPNs. Blood. 2025;146(Suppl 1):abstract 14158 [slides p.24–26].
  5. Swoboda D, DeZern A, England J, et al. Evaluating large language models in real-world hematologic clinical decision-making: performance, limitations, and clinical implications. Blood. 2025;146(Suppl 1):abstract 11366 [slides p.32–34].
  6. Gurnari C, Pérez Míguez C, Crucitti D, et al. Generative AI-powered clinical decision support in germline predisposition to myeloid neoplasms. Blood. 2025;146(Suppl 1):abstract 2556 [slides p.36–37].
  7. Shah N, Barris J, Mcintosh C, et al. Implementation of an innovative Hospital-at-Home program for patients with hematologic malignancies. Blood. 2025;146(Suppl 1):abstract 11609 [slides p.41–42].
  8. Jonassaint C, Colvin A, Ballantyne C, et al. Feasibility of wearable technology for remote monitoring in high-risk adults with sickle cell disease: baseline data from the SCD-CARRE trial. Blood. 2025;146(Suppl 1):abstract 14941 [slides p.43–44].