GigaPath
https://doi.org/10.1038/s41586-024-07441-w
Received: 30 November 2023 | Accepted: 19 April 2024 | Published online: 22 May 2024 | Open access

Hanwen Xu1,2,7, Naoto Usuyama1,7, Jaspreet Bagga1, Sheng Zhang1, Rajesh Rao1, Tristan Naumann1, Cliff Wong1, Zelalem Gero1, Javier González1, Yu Gu1, Yanbo Xu1, Mu Wei1, Wenhui Wang1, Shuming Ma1, Furu Wei1, Jianwei Yang1, Chunyuan Li1, Jianfeng Gao1, Jaylen Rosemon3, Tucker Bower3, Soohee Lee4, Roshanthi Weerasinghe4, Bill J. Wright4, Ari Robicsek4, Brian Piening3,5, Carlo Bifulco3,5 ✉, Sheng Wang2,6 ✉ & Hoifung Poon1 ✉
Digital pathology poses unique computational challenges, as a standard gigapixel
slide may comprise tens of thousands of image tiles1–3. Prior models have often
resorted to subsampling a small portion of tiles for each slide, thus missing the
important slide-level context4. Here we present Prov-GigaPath, a whole-slide
pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image
tiles in 171,189 whole slides from Providence, a large US health network comprising
28 cancer centres. The slides originated from more than 30,000 patients covering 31
major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision
transformer architecture for pretraining gigapixel pathology slides. To scale
GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath
adapts the newly developed LongNet5 method to digital pathology. To evaluate
Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer
subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data6. With
large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains
state-of-the-art performance on 25 out of 26 tasks, with significant improvement
over the second-best method on 18 tasks. We further demonstrate the potential of
Prov-GigaPath on vision–language pretraining for pathology7,8 by incorporating
the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model
that achieves state-of-the-art performance on various digital pathology tasks,
demonstrating the importance of real-world data and whole-slide modelling.
Computational pathology has the potential to transform cancer diagnostics by empowering diverse clinical applications, including cancer subtyping2,9,10, cancer staging1,11–13, diagnostic prediction14–17 and prognostic prediction18–23. Despite the encouraging performance of existing computational approaches, these are often developed for a specific application and require a large amount of annotated data for supervised learning. Data annotation is expensive and time-consuming and has emerged as an important bottleneck for computational pathology. Recently, self-supervised learning has shown promising results in leveraging unlabelled data to pretrain a foundation model, which can substantially reduce the demand for task-specific annotations24–28. Owing to their strong generalizability, foundation models have been developed for biomedical domains where labelled data are scarce but unlabelled data are abundant, a situation that aptly describes computational pathology29–33.

There are three major challenges that hinder the development and use of pathology foundation models for real-world clinical applications. First, publicly available pathology data are relatively scarce and of varying quality, which limits the performance of foundation models pretrained on such data. For example, existing pathology foundation models were mainly pretrained on whole-slide images (WSIs) from The Cancer Genome Atlas (TCGA), an expert-curated dataset comprising approximately 30,000 slides and 208 million image tiles. Although they are a tremendous resource, TCGA data might not be sufficiently large to fully address the challenges around real-world digital pathology in clinical practice, such as heterogeneity and noise artefacts34, leading to a substantial performance drop when using TCGA-based predictive models and biomarkers on out-of-distribution samples. Second, it remains challenging to design a model architecture that can effectively capture both local patterns in individual tiles and global patterns across whole slides35–39. Existing models often treat each image tile as an independent sample and formulate slide-level modelling as multiple instance learning4,40–43, thus limiting their ability to model complex global patterns in gigapixel whole slides. A notable exception is Hierarchical Image Pyramid Transformer (HIPT), which explores hierarchical self-attention over the tiles35. Third, in the rare cases in which pretraining has been conducted on large-scale real-world patient data, the resulting foundation models are typically not accessible to
1Microsoft Research, Redmond, WA, USA. 2Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. 3Providence Genomics, Portland, OR, USA. 4Providence Research Network, Renton, WA, USA. 5Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA. 6Department of Surgery, University of Washington, Seattle, WA, USA. 7These authors contributed equally: Hanwen Xu, Naoto Usuyama. ✉e-mail: carlo.bifulco@providence.org; swang@cs.washington.edu; hoifung@microsoft.com
[Fig. 1 schematic: a 256 × 256 image tile sequence feeds a tile-level encoder (vision transformer) and a slide-level encoder (LongNet with dilated attention and a [CLS] token), yielding image-level and slide-level embeddings; tile-level pretraining uses a contrastive loss and slide-level pretraining uses a reconstruction loss.]

Fig. 1 | Overview of Prov-GigaPath. a, Flow chart showing the model architecture of Prov-GigaPath. Prov-GigaPath first serializes each input WSI into a sequence of 256 × 256 image tiles in row-major order and uses an image tile-level encoder to convert each image tile into a visual embedding. Then Prov-GigaPath applies a slide-level encoder based on the LongNet architecture to generate contextualized embeddings, which can serve as the basis for various downstream applications. b, Image tile-level pretraining using DINOv2. c, Slide-level pretraining with LongNet using masked autoencoder. [CLS] is the classification token.
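To make the two-stage design in Fig. 1 concrete, the following PyTorch-style sketch wires a tile-level encoder to a slide-level encoder. It is a minimal illustration rather than the released implementation: `tile_encoder` stands in for the DINOv2-pretrained vision transformer, `slide_encoder` stands in for the LongNet-based encoder, and the module names, embedding dimension and single-slide batch handling are assumptions; only the 256 × 256 tiling, row-major serialization and [CLS] summarization come from the caption.

```python
import torch
import torch.nn as nn

class GigaPathSketch(nn.Module):
    """Minimal two-stage sketch: tile encoder -> slide-level (LongNet-style) encoder."""

    def __init__(self, tile_encoder: nn.Module, slide_encoder: nn.Module, embed_dim: int = 1536):
        super().__init__()
        self.tile_encoder = tile_encoder          # e.g. a DINOv2-pretrained ViT (frozen at inference)
        self.slide_encoder = slide_encoder        # e.g. a LongNet/dilated-attention transformer
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # [CLS] summarizes the slide

    def forward(self, tiles: torch.Tensor):
        # tiles: (N, 3, 256, 256) -- all foreground tiles of one WSI, serialized in row-major order
        tile_emb = self.tile_encoder(tiles)                     # (N, D) tile-level embeddings
        seq = torch.cat([self.cls_token.squeeze(0), tile_emb])  # prepend [CLS] -> (N + 1, D)
        ctx = self.slide_encoder(seq.unsqueeze(0))              # (1, N + 1, D) contextualized embeddings
        slide_emb, tile_ctx = ctx[:, 0], ctx[:, 1:]             # slide-level and tile-level outputs
        return slide_emb, tile_ctx
```

In practice the tile embeddings would typically be computed once per slide and cached, so that downstream fine-tuning only needs to run the slide encoder and a small task head.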
6.5% macro-AUROC improvement and 18.7% AUPRC improvement (Fig. 2c,h and Extended Data Fig. 3).

We also conducted a head-to-head comparison of all approaches on TCGA data to examine the generalizability of Prov-GigaPath. We again used LUAD-specific five-gene mutation prediction as a key evaluation task (Fig. 2d,i and Extended Data Fig. 4). We observed a similar advantage of Prov-GigaPath over the competing methods. This is all the more remarkable given that the competing methods35,41,42 were all pretrained on TCGA. To further test the generalizability of Prov-GigaPath, we collected a new cohort of 403 patients with colorectal cancer from Providence. These data were collected after March 2023, whereas all data used for pretraining Prov-GigaPath were collected before March 2023. We found that Prov-GigaPath again outperformed competing methods on this cohort. We also noted that the performance was not significantly different from that on previous data from patients with colorectal cancer (Extended Data Fig. 5). Finally, we examined the prediction of overall tumour mutation burden (TMB), a predictive biomarker in solid tumours that is particularly relevant for immunotherapy. Prov-GigaPath achieved the best performance with an average AUROC of 0.708, with significant improvement over the second-best method (Fig. 2e,j).

We observed that GigaPath pretrained on Prov-Path achieves a substantial improvement over the same model architecture pretrained on TCGA data when tested on LUAD-specific five-gene mutation prediction in TCGA, indicating the high quality of Prov-Path (Extended Data Fig. 6). We further found that GigaPath outperformed HIPT when both are trained on Prov-Path, indicating the effectiveness of the GigaPath framework (Extended Data Figs. 7 and 8). To further assess the pretraining strategy of our method, we observed that pretraining using DINOv2 is better than pretraining using the contrastive-learning-based approach SimCLR26 or masked autoencoders45 (Supplementary Fig. 4), demonstrating the effectiveness of our pretraining strategy. Prov-GigaPath also outperformed a supervised learning approach that uses an ImageNet-trained model, underscoring the necessity of our self-supervised learning framework (Supplementary Fig. 4).

Overall, Prov-GigaPath demonstrated clear performance gains on various pathomics tasks over prior state-of-the-art pathology foundation models. We hypothesize that this significant improvement reflects the distinctive advantage of our whole-slide modelling.

Prov-GigaPath improves cancer subtyping

Given the overall utility of pathology images in assigning tumour subtypes2,9,10,49, we next examined whether Prov-GigaPath can accurately predict cancer subtypes from images. We evaluated our method on subtyping for nine major cancer types in Prov-Path (Fig. 3). Prov-GigaPath outperformed all competing approaches on all nine cancer types and achieved significant improvements compared with the second-best method on six cancer types, indicating that our tile encoder and slide encoder work synergistically to extract meaningful features for differentiating minute pathological patterns. A key difference between HIPT and Prov-GigaPath is the aggregation layer over image tile embeddings. The substantial improvement of Prov-GigaPath over HIPT demonstrates the promise of using LongNet for efficient and effective aggregation of the ultra-large collection of image tiles in a whole slide.
Fig. 2 | Gene mutation prediction. a–j, Bar plots comparing the AUROC and AUPRC scores of Prov-GigaPath and competing methods on pan-cancer 18-biomarker (a,f), LUAD-specific 5-gene mutation prediction (b,g), pan-cancer 5-gene mutation prediction (c,h), LUAD-specific 5-gene mutation prediction on TCGA (d,i) and pan-cancer TMB prediction (e,j). k, Bar plot showing AUROC for each gene on LUAD-specific five-gene mutation prediction on TCGA. a–k, Data are mean ± s.e.m. across n = 10 independent experiments. The listed P value indicates the significance for Prov-GigaPath outperforming the best comparison approach, with one-sided Wilcoxon test. l, Comparison of AUROC scores for individual biomarkers in pan-cancer 18-biomarker predictions.
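The evaluation protocol summarized in the Fig. 2 legend (per-run AUROC/AUPRC, mean ± s.e.m. over n = 10 runs, and a one-sided Wilcoxon test against the best baseline) maps directly onto standard library calls. The snippet below is a generic sketch with synthetic per-run labels and scores standing in for real predictions; it is not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from scipy.stats import wilcoxon

def evaluate_runs(y_true_runs, y_score_runs):
    """Return per-run AUROC and AUPRC for a list of independent runs (e.g. n = 10 seeds/folds)."""
    auroc = np.array([roc_auc_score(y, s) for y, s in zip(y_true_runs, y_score_runs)])
    auprc = np.array([average_precision_score(y, s) for y, s in zip(y_true_runs, y_score_runs)])
    return auroc, auprc

# Hypothetical per-run scores for two methods evaluated on the same 10 splits.
rng = np.random.default_rng(0)
labels = [rng.integers(0, 2, size=200) for _ in range(10)]
model_scores = [y * 0.3 + rng.random(200) * 0.7 for y in labels]      # stand-in "better" model
baseline_scores = [y * 0.1 + rng.random(200) * 0.9 for y in labels]   # stand-in baseline

auroc_model, _ = evaluate_runs(labels, model_scores)
auroc_base, _ = evaluate_runs(labels, baseline_scores)

# Mean ± s.e.m. across runs, plus a one-sided paired Wilcoxon signed-rank test.
sem = auroc_model.std(ddof=1) / np.sqrt(len(auroc_model))
stat, p = wilcoxon(auroc_model, auroc_base, alternative="greater")
print(f"AUROC {auroc_model.mean():.3f} ± {sem:.3f}, one-sided Wilcoxon P = {p:.4f}")
```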
Finally, we conducted ablation studies to systematically assess how each component of Prov-GigaPath contributes to its performance in cancer subtyping (Supplementary Fig. 5). To examine the importance of LongNet pretraining, we replaced the LongNet encoder pretrained on Prov-Path with a randomly initialized model. We observed a substantial performance decrease in average AUROC from 0.903 to 0.886 (P value < 2.0 × 10⁻³), indicating that pretraining our LongNet encoder better captures slide-level cancer heterogeneity. We observed that freezing and unfreezing the LongNet encoder achieved comparable performance on cancer subtyping tasks, which suggests that our pretraining approach can effectively learn high-quality representations, reducing the need for additional fine-tuning of LongNet. To verify the superiority of using the LongNet encoder to aggregate image patterns across the whole slide, we then tested one alternative by removing LongNet and aggregating only through an attention-based deep multiple instance learning (ABMIL) layer. On average, the ABMIL layer cannot achieve performance similar to LongNet as the slide encoder (P value < 0.012), confirming the necessity of modelling long-range dependencies in pathology slides.
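For readers unfamiliar with the ABMIL baseline used in this ablation, the sketch below shows a gated attention-based MIL pooling layer in the style of Ilse et al.: tile embeddings are weighted by learned attention scores and summed into a single slide representation. The layer sizes are illustrative assumptions, not the configuration used in the ablation.

```python
import torch
import torch.nn as nn

class GatedABMILPooling(nn.Module):
    """Gated attention-based MIL pooling: weights each tile embedding, then sums them."""

    def __init__(self, in_dim: int = 1536, hidden_dim: int = 384):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)

    def forward(self, tile_emb: torch.Tensor) -> torch.Tensor:
        # tile_emb: (N, D) embeddings of all tiles in one slide, treated as an unordered "bag".
        scores = self.attn_w(self.attn_v(tile_emb) * self.attn_u(tile_emb))  # (N, 1) per-tile scores
        weights = torch.softmax(scores, dim=0)                               # attention over tiles
        return (weights * tile_emb).sum(dim=0)                               # (D,) slide representation
```

Because this pooling treats the tiles as an unordered bag and models no tile-to-tile interactions, it cannot capture the long-range spatial dependencies that the LongNet slide encoder is designed to model, which is exactly the contrast this ablation probes.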
Fig. 3 | Comparison on cancer subtyping. a–f, Bar plots comparing cancer subtyping performance in terms of AUROC (a,c,e) and balanced accuracy (b,d,f) on nine cancer types. Data are mean ± s.e.m. across n = 10 independent experiments. The listed P value indicates the significance for Prov-GigaPath outperforming the best comparison approach, with one-sided Wilcoxon test. BACC, balanced accuracy. BRCA, breast invasive carcinoma; CNS, central nervous system; COADREAD, colorectal adenocarcinoma; DIFG, diffuse intrinsic pontine glioma; EGC, early gastric cancer; HB, hepatobiliary; NSCLC, non-small cell lung cancer; OVT, ovarian cancer; RCC, renal cell cancer.
Slide-level vision–language alignment

The promising results of Prov-GigaPath on pathology images further motivated us to explore Prov-GigaPath in multimodal vision–language processing. Prior work on pathology vision–language modelling tends to focus on tile-level alignment of pathology images and text, as their studies were limited by the sources of image–text pairs (textbook examples7 or Twitter data8). By contrast, we examined slide-level alignment of pathology images and text by leveraging the associated report for each slide (Fig. 4a). Such naturally occurring slide–report pairs can potentially uncover richer slide-level information, but the modelling is considerably more challenging as we do not have fine-grained alignment information between individual image tiles and text snippets. We used the standard cross-modal contrastive loss in continual pretraining of Prov-GigaPath as the visual encoder and PubMedBERT29, a state-of-the-art biomedical language model, as the textual encoder (Fig. 4b).

We evaluated the resulting Prov-GigaPath on zero-shot cancer subtyping in NSCLC and COADREAD following the same setting used in MI-Zero7, a state-of-the-art pathology vision–language model. In the zero-shot setting, no training images are provided for any of the target cancer subtypes. Slides and the corresponding cancer subtypes were collected from Prov-Path. Compared with three state-of-the-art pathology vision–language models, Prov-GigaPath attained the best zero-shot classification results on all three metrics in both cancer types (Fig. 4c,e, Extended Data Fig. 9 and Supplementary Fig. 6), suggesting that slide-level alignment enabled by LongNet is indeed advantageous. Prov-GigaPath attained larger improvement on NSCLC than COADREAD, which can be ascribed to the more prevalent presence of lung tissue in Prov-Path. Prov-GigaPath outperformed PLIP by a considerable margin, which potentially reflects the superiority of real-world clinical data over Twitter data.

Next, we examined the possibility of predicting gene mutations using the vision–language pretrained Prov-GigaPath (Fig. 4d,e and Extended Data Fig. 9) in the same zero-shot setting. We adopted the prompts used for cancer subtyping by replacing the cancer type name with the name of the gene whose binary mutation status we want to predict. Prov-GigaPath substantially outperformed state-of-the-art pathology vision–language models across all six mutations we examined (P value < 0.001) (Fig. 4d,e). The improvement of our approach is larger on mutation prediction than on cancer subtyping, which may be partially attributable to richer mutation information in pathology reports from real-world data compared with text commentary in Twitter8 and scientific papers50. To our knowledge, this is the first time zero-shot gene mutation prediction has been evaluated for pathology vision–language modelling. The promising performance of Prov-GigaPath on this novel task bodes well for potential future applications in studying rare cancer types and new mutations.
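The continual pretraining step described above uses a standard cross-modal contrastive objective between slide embeddings and report embeddings. The sketch below shows a symmetric CLIP/InfoNCE-style loss in isolation; the temperature value and encoder interfaces are assumptions, and in the actual pipeline the text embeddings would come from PubMedBERT trained with OpenCLIP-style tooling.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(slide_emb: torch.Tensor, text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched (slide, report) embedding pairs.

    slide_emb: (B, D) slide-level embeddings from the vision encoder.
    text_emb:  (B, D) report embeddings from the text encoder (e.g. PubMedBERT [CLS]).
    """
    slide_emb = F.normalize(slide_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = slide_emb @ text_emb.t() / temperature           # (B, B) cosine-similarity logits
    targets = torch.arange(slide_emb.size(0), device=slide_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)               # slide -> matching report
    loss_t2i = F.cross_entropy(logits.t(), targets)           # report -> matching slide
    return 0.5 * (loss_i2t + loss_t2i)
```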
Fig. 4 | Comparison on image–report alignment. a, Flow chart showing the fine-tuning of Prov-GigaPath using pathology reports. Real-world pathology reports are processed using GPT-3.5 from OpenAI to remove information irrelevant to cancer diagnosis. We performed CLIP-based contrastive learning to align Prov-GigaPath and PubMedBERT. b, The fine-tuned Prov-GigaPath can then be used to perform zero-shot cancer subtyping and mutation prediction. The input of Prov-GigaPath is a sequence of tiles segmented from a WSI, and the inputs of the text encoder PubMedBERT are manually designed prompts representing cancer types and mutations. Based on the outputs of Prov-GigaPath and PubMedBERT, we can calculate the probability of the input WSI being classified into specific cancer subtypes and mutations. c, Bar plots comparing zero-shot subtyping performance on NSCLC and COADREAD in terms of BACC, precision and f1. d, Bar plots comparing the performance on mutation prediction using the fine-tuned model for six genes. c,d, Data are mean ± s.e.m. across n = 50 experiments. The listed P value indicates the significance for Prov-GigaPath outperforming the best comparison approach, with one-sided Wilcoxon test. e, Scatter plots comparing the performance between Prov-GigaPath and MI-Zero in terms of BACC on zero-shot cancer subtyping. Each dot indicates one trial with a particular set of text query formulations.
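Zero-shot subtyping and mutation prediction, as sketched in Fig. 4b, reduce to nearest-prompt classification in the shared embedding space: each candidate class is described by a text prompt, and the slide is assigned according to cosine similarity between its embedding and the prompt embeddings. The snippet below illustrates the idea with hypothetical encoder handles and prompt strings; the actual prompt templates follow MI-Zero and are not reproduced here.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(slide_emb: torch.Tensor, prompt_embs: torch.Tensor) -> torch.Tensor:
    """Return class probabilities for one slide given one text embedding per class.

    slide_emb:   (D,)  slide embedding from the vision-language fine-tuned model.
    prompt_embs: (C, D) embeddings of one prompt per class (e.g. per subtype or per mutation status).
    """
    slide_emb = F.normalize(slide_emb, dim=-1)
    prompt_embs = F.normalize(prompt_embs, dim=-1)
    sims = prompt_embs @ slide_emb             # (C,) cosine similarities
    return torch.softmax(sims / 0.07, dim=0)   # temperature-scaled probabilities (0.07 is an assumption)

# Hypothetical usage for binary mutation status, mirroring the prompt swap described in the text:
# prompts = ["a whole slide image with EGFR mutant tumour", "a whole slide image with EGFR wild-type tumour"]
# prompt_embs = text_encoder(prompts); probs = zero_shot_classify(slide_encoder(wsi_tiles), prompt_embs)
```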
Discussion

We have introduced Prov-GigaPath, a pathology foundation model for a broad range of digital pathology applications. Prov-GigaPath was pretrained on Prov-Path, a large real-world dataset derived from the Providence health system with diverse slide types and qualities. Prov-Path is substantially larger than TCGA, comprising 1,384,860,229 image tiles from 171,189 whole pathology slides of around 30,000 patients. We proposed GigaPath for pretraining, which adapted the cutting-edge LongNet5 as the vision transformer to facilitate ultra-large-context modelling of gigapixel WSIs. In comprehensive evaluation on both Providence and TCGA datasets, we demonstrated state-of-the-art performance for Prov-GigaPath on a variety of pathomics and cancer subtyping tasks, as well as on vision–language processing. Prov-GigaPath has the potential to assist clinical diagnostics and decision support, and GigaPath can potentially be applied to broader biomedical domains for efficient self-supervised learning from high-resolution images.
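The ultra-large-context modelling mentioned here rests on LongNet's dilated attention: the tile sequence is split into segments and, within each segment, only every r-th token attends to the others, so cost grows roughly linearly with sequence length while long-range interactions are retained. The sketch below shows the core segment-and-dilate pattern for a single head, following the published description rather than the torchscale implementation; the segment length and dilation rate are illustrative, and the mixing across multiple dilation rates and heads is omitted.

```python
import torch
import torch.nn.functional as F

def dilated_attention_single_head(q, k, v, segment_len: int = 4, dilation: int = 2):
    """Simplified dilated self-attention for one head (no mixing across dilation rates).

    q, k, v: (L, D). Tokens are grouped into blocks of segment_len * dilation positions;
    within each block only every `dilation`-th token attends to the others, which
    sparsifies attention while preserving reach across the block.
    """
    L, D = q.shape
    out = torch.zeros_like(q)
    for start in range(0, L, segment_len * dilation):
        for offset in range(dilation):
            idx = torch.arange(start + offset, min(start + segment_len * dilation, L), dilation)
            attn = F.softmax(q[idx] @ k[idx].t() / D**0.5, dim=-1)  # attention within the sparse segment
            out[idx] = attn @ v[idx]
    return out

# Example: 16 tokens, blocks of 8 positions sampled at dilation 2 -> each token attends to 4 tokens.
q = k = v = torch.randn(16, 32)
ctx = dilated_attention_single_head(q, k, v)
```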
Data availability

The pathology imaging data used for the pretraining were created from oncology pathology slides at Providence. The associated clinical data used for fine-tuning and testing were obtained from the corresponding medical records. These proprietary data cannot be made publicly available. Researchers may obtain a de-identified test subset from Providence Health System by reasonable request and subject to local and national ethical approvals. To help researchers use our model, we provide a de-identified subset of our data at https://doi.org/10.5281/zenodo.10909616 (ref. 57) and https://doi.org/10.5281/zenodo.10909922 (ref. 58) for a few patients. We also collected publicly available TCGA WSIs from the NIH Genomic Data Commons Data Portal. The TCGA-LUAD dataset, comprising whole pathology slides and labels, is available via the NIH Genomic Data Commons portal at https://portal.gdc.cancer.gov/projects/TCGA-LUAD.

Competing interests C.B. is a member of the scientific advisory board and owns stock in PrimeVax and BioAI; is on the scientific board of Lunaphore and SironaDx; has a consultant or advisory relationship with Sanofi, Agilent, Roche and Incendia; contributes to institutional research for Illumina; and is an inventor on US patent applications US20180322632A1 (Image Processing Systems and Methods for Displaying Multiple Images of a Biological Specimen), filed by Ventana Medical Systems, Providence Health and Services Oregon, and US20200388033A1 (System and Method for Automatic Labeling of Pathology Images), filed by Providence Health and Services Oregon, Omics Data Automation. The other authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-024-07441-w.
Correspondence and requests for materials should be addressed to Carlo Bifulco, Sheng Wang or Hoifung Poon.
Peer review information Nature thanks Akshay Chaudhari, Joe Yeong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Comparison on Pan-cancer 18-biomarker prediction. Bar plot showing the AUPRC score for each biomarker on the 18-biomarker
prediction by Prov-GigaPath and competing methods.
Extended Data Fig. 2 | Comparison on LUAD 5-gene mutation prediction. Bar plots showing AUROC and AUPRC scores for predicting each gene mutation on LUAD 5-gene mutation prediction. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms the best comparison approach, with one-sided Wilcoxon test.
Extended Data Fig. 3 | Comparison on Pan-cancer 5-gene mutation prediction. Bar plots showing AUROC and AUPRC scores for predicting each gene mutation on Pan-cancer 5-gene mutation prediction. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms the best comparison approach, with one-sided Wilcoxon test.
Extended Data Fig. 4 | Comparison on LUAD 5-gene mutation prediction in TCGA. Bar plots showing AUPRC scores for predicting each gene mutation on LUAD 5-gene mutation prediction in TCGA. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms the best comparison approach, with one-sided Wilcoxon test.
Extended Data Fig. 5 | Comparison on mutation prediction on new colorectal patients. Bar plots showing AUROC and AUPRC scores for predicting 5-gene mutation and TMB status on new patients from Providence. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms the best comparison approach, with one-sided Wilcoxon test.
Extended Data Fig. 6 | Comparison between pretraining the same model using Prov-Path and TCGA. a,b, Bar plots showing the AUROC (a) and AUPRC (b) on LUAD 5-gene mutation prediction in TCGA using models trained on Prov-Path and TCGA. Prov-GigaPath is GigaPath trained on Prov-Path. GigaPath-TCGA is GigaPath trained on TCGA. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms GigaPath-TCGA, with one-sided Wilcoxon test.
Extended Data Fig. 7 | Comparison between GigaPath trained using Prov-Path and HIPT trained using Prov-Path on mutation prediction. a–j, Bar plots showing the AUROC (a–e) and AUPRC (f–j) of mutation prediction tasks by Prov-GigaPath and HIPT-Prov-Path. HIPT-Prov-Path indicates HIPT pretrained on Prov-Path. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms HIPT-Prov-Path, with one-sided Wilcoxon test.
Extended Data Fig. 8 | Comparison between GigaPath trained using Prov-Path and HIPT trained using Prov-Path on cancer subtyping. a–f, Bar plots showing the AUROC (a,c,e) and BACC (b,d,f) of cancer subtyping tasks by Prov-GigaPath and HIPT-Prov-Path. HIPT-Prov-Path indicates HIPT pretrained on Prov-Path. The error bars show the standard error across n = 10 independent experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms HIPT-Prov-Path, with one-sided Wilcoxon test.
Extended Data Fig. 9 | Alignment between pathology reports and images. a–d, Bar plots showing the performance of f1 (a), precision (b), AUROC (c) and AUPRC (d) using fine-tuned Prov-GigaPath to predict mutations in the zero-shot learning setting. The error bars show the standard error across n = 50 experiments and the bar centre shows the mean value. The listed p-value indicates the significance level that Prov-GigaPath outperforms the best comparison approach, with one-sided Wilcoxon test. e, Scatter plots comparing Prov-GigaPath and MI-Zero on cancer subtyping prediction and mutation prediction in terms of balanced accuracy (BACC).
Corresponding author(s): Hoifung Poon
Last updated by author(s): 2024/04/01
Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.
Data analysis This work uses open-source codebases and libraries to analyse the data. We used DINOv2 (https://github.com/facebookresearch/dinov2/tree/main) to pretrain the ViT tile encoder and OpenCLIP (https://github.com/mlfoundations/open_clip) to train the vision-language alignment model. For the LongNet model, we used the implementation in torchscale==0.1.1. To install torchscale, we used the following public packages: torch==2.0.0+cu117, torchvision==0.15.0+cu117, tensorboard==2.15.1, timm==0.9.12, xformers==0.0.18, einops==0.7.0, fairscale==0.4.13, huggingface-hub==0.19.4. We used scikit-learn==1.3.2, scipy==1.11.4 and numpy==1.24.1 to evaluate model performance, and matplotlib==3.3.0 to visualize the data. All the code to reproduce our experiments will be made public upon publication.
Data
The pathology imaging data used for the pretraining were created from oncology pathology slides at Providence. The associated clinical data used for fine-tuning
and testing were obtained from the corresponding medical records. These proprietary data cannot be made publicly available. Researchers may obtain a de-
identified test subset from Providence Health System by reasonable request and subject to local and national ethical approvals. To help researchers use our model,
we provide a de-identified subset of our data at https://doi.org/10.5281/zenodo.10909616 and https://doi.org/10.5281/zenodo.10909922 for a few patients. We
also collected publicly available TCGA whole slide images from NIH Genomic Data Commons Data Portal. The TCGA LUAD dataset, comprising whole pathology slides
and labels, is available via the NIH Genomic Data Commons portal at https://portal.gdc.cancer.gov/projects/TCGA-LUAD.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions We identified tiles that do not have substantial tissue occupancy as background and filtered them out from pretraining and fine-tuning.
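A tissue-occupancy filter of this kind is commonly approximated with a colour-saturation heuristic, since background glass is nearly unsaturated while stained tissue is not. The sketch below illustrates one such check; the thresholds are assumptions, not the values used in this study.

```python
import numpy as np
from PIL import Image

def is_background_tile(tile: Image.Image, sat_threshold: int = 20,
                       min_tissue_fraction: float = 0.1) -> bool:
    """Heuristic background check: low HSV saturation ~ empty glass rather than stained tissue."""
    hsv = np.asarray(tile.convert("HSV"))                      # (H, W, 3) uint8
    tissue_fraction = (hsv[..., 1] > sat_threshold).mean()     # fraction of saturated pixels
    return tissue_fraction < min_tissue_fraction
```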
Replication Across all 26 tasks, we ran 10-fold cross-validation with 10 different seeds to determine whether the improvement of our model is significant compared to baseline approaches.
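This replication protocol corresponds to a repeated stratified cross-validation loop. The snippet below is a schematic using a placeholder scikit-learn estimator rather than the actual slide-level models.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression  # placeholder for the real slide-level model
from sklearn.metrics import roc_auc_score

def repeated_cv_auroc(X, y, n_splits: int = 10, n_seeds: int = 10):
    """10-fold cross-validation repeated with 10 seeds; returns mean AUROC and its s.e.m."""
    scores = []
    for seed in range(n_seeds):
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for train_idx, test_idx in cv.split(X, y):
            model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
            scores.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
    scores = np.array(scores)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))
```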
Blinding During the test, the researchers were blinded to the group allocation.