PixCell: A generative foundation model for digital histopathology images

Yellapragada, Srikar; Graikos, Alexandros; Li, Zilinghan; Triaridis, Kostas; Belagali, Varun; Nandi, Tarak Nath; Bai, Karen; Knudsen, Beatrice S.; Kurc, Tahsin; Gupta, Rajarsi R.; Prasanna, Prateek; Madduri, Ravi K; Saltz, Joel; Samaras, Dimitris

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2506.05127 (eess)

[Submitted on 5 Jun 2025 (v1), last revised 3 Dec 2025 (this version, v2)]



Abstract: The digitization of histology slides has revolutionized pathology, providing massive datasets for cancer diagnosis and research. Self-supervised and vision-language models have been shown to effectively mine large pathology datasets to learn discriminative representations. However, pathology also faces unique problems, such as scarce annotated data, privacy regulations that restrict data sharing, and inherently generative tasks like virtual staining. Generative models, which can synthesize realistic and diverse images, offer a compelling way to address these problems. We introduce PixCell, the first generative foundation model for histopathology images. PixCell is a diffusion model trained on PanCan-30M, a large, diverse dataset derived from 69,184 H&E-stained whole slide images of various cancer types. We employ a progressive training strategy and self-supervision-based conditioning, which allow us to scale up training without any human-annotated data. By conditioning on real slides, the synthetic images capture the properties of the real data and can be used as data augmentation for small-scale datasets to boost classification performance. We demonstrate the foundational versatility of PixCell by applying it to two generative downstream tasks: privacy-preserving synthetic data generation and virtual IHC staining. PixCell's high-fidelity conditional generation enables institutions to use their private data to synthesize highly realistic, site-specific surrogate images that can be shared in place of raw patient data. Furthermore, using datasets of roughly paired H&E-IHC tiles, we learn to translate PixCell's conditioning from H&E to multiple IHC stains, allowing the generation of IHC images from H&E inputs. Our trained models are publicly released to accelerate research in computational pathology.
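The abstract describes PixCell as a diffusion model conditioned on self-supervised embeddings of real slides. As a rough illustration of what conditional diffusion training optimizes, here is a minimal, generic DDPM-style sketch in plain Python, not PixCell's implementation. The abstract does not specify the architecture, noise schedule, or encoder, so the linear beta schedule (a common DDPM default), the flattened toy image, the conditioning vector `cond`, and the placeholder network `toy_denoiser` are all assumptions. The objective shown is the standard noise-prediction loss: noise a clean sample to a random step t, then penalize the squared error between the true noise and the network's prediction given (x_t, t, c).

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t (assumed linear schedule, a common DDPM default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I); also return the noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(abar[t])
    s = math.sqrt(1.0 - abar[t])
    xt = [a * xi + s * ei for xi, ei in zip(x0, eps)]
    return xt, eps

def toy_denoiser(xt, t, cond):
    """Hypothetical stand-in for the conditional network eps_theta(x_t, t, c).
    A real model would be a large neural network; this placeholder returns zeros."""
    return [0.0 for _ in xt]

def training_loss(x0, cond, abar, rng, denoiser=toy_denoiser):
    """One conditional denoising step: MSE between the true and predicted noise."""
    t = rng.randrange(len(abar))          # random diffusion step
    xt, eps = add_noise(x0, t, abar, rng)  # noised sample and the noise used
    pred = denoiser(xt, t, cond)           # conditioning enters only through the network
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)

if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    x0 = [0.5, -0.2, 0.1, 0.9]  # a flattened toy "image"
    cond = [0.3, 0.7]           # hypothetical self-supervised embedding of a real tile
    print(training_loss(x0, cond, abar, rng))
```

Here `cond` is only threaded through to show where the conditioning enters; a trained network would use it to steer its noise prediction, which is how conditioning on an embedding from a real tile lets generated images reflect properties of that tile.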

Comments:	Project page - this https URL
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2506.05127 [eess.IV]
	(or arXiv:2506.05127v2 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2506.05127

Submission history

From: Srikar Yellapragada
[v1] Thu, 5 Jun 2025 15:14:32 UTC (12,114 KB)
[v2] Wed, 3 Dec 2025 04:16:13 UTC (40,224 KB)
