Celatlas Spatial is a comprehensive spatial transcriptomics analysis pipeline that provides end-to-end processing from raw sequencing data to publication-ready results. Version 1.7.0 introduces major enhancements including H&E staining-guided analysis (HE mode), flexible parameter configuration, and optimized multi-threading support.
Choose the installation method that best fits your situation:
# Option 1: Download and unzip the package, then install
# 1. Download the zip file from the release page
# 2. Navigate to the directory where you downloaded the file
# 3. Unzip the file to a folder
unzip celatlas_spatial-main.zip
cd celatlas_spatial-main
pip install dist/celatlas_spatial-1.7.0-py3-none-any.whl
# Option 2: git clone
git clone [repository_url]
cd Celatlas_spatial
pip install dist/celatlas_spatial-1.7.0-py3-none-any.whl
IMPORTANT: Download the Swin-UNET model before running analysis:
# 1. Download swin_tiny_model.tar.gz (97MB) from GitHub Release
# 2. Extract to workspace src directory
mkdir -p /mnt/strna/celatlas_spatial/src
tar -xzf swin_tiny_model.tar.gz -C /mnt/strna/celatlas_spatial/src/
# If using custom workspace:
# mkdir -p $CELATLAS_WORKSPACE/src
# tar -xzf swin_tiny_model.tar.gz -C $CELATLAS_WORKSPACE/src/
# 3. Verify extraction
ls -lh /mnt/strna/celatlas_spatial/src/swin_tiny.pth
# Should show a ~106MB swin_tiny.pth file
Model Details:
- Filename: swin_tiny.pth
- Size: ~106MB
- Location: $CELATLAS_WORKSPACE/src/swin_tiny.pth
- Purpose: Tissue segmentation for all analysis modes
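You can check that the weights ended up in the right place with a small script like the one below. This is a minimal sketch and is not shipped with the package. check_model is a hypothetical helper name, and the script assumes the default workspace /mnt/strna/celatlas_spatial whenever CELATLAS_WORKSPACE is unset.

```shell
# Hypothetical helper (not part of Celatlas Spatial): verify that the Swin-UNET
# weights exist under $CELATLAS_WORKSPACE/src, falling back to the default
# workspace path when the variable is unset.
check_model() {
    ws="${CELATLAS_WORKSPACE:-/mnt/strna/celatlas_spatial}"
    model="$ws/src/swin_tiny.pth"
    if [ ! -f "$model" ]; then
        echo "Model not found: $model" >&2
        return 1
    fi
    # wc -c reports bytes; divide by 1 MiB to compare with the ~106MB shown by ls -lh
    size_mb=$(( $(wc -c < "$model") / 1048576 ))
    echo "Found $model (${size_mb} MB)"
}

# Usage:
# check_model || echo "Download and extract swin_tiny_model.tar.gz first"
```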
Verify installation success:
celatlas_spatial --help
Prerequisites:
- Python: >= 3.9
- External tools: STAR, SAMtools, Subread, TRUST4 (install via conda)
- Python packages: Automatically installed (PyTorch, Pandas, NumPy, Plotly, Jinja2)
- Hardware (HE mode):
  - GPU: NVIDIA GPU with CUDA 12 support (recommended for deep learning tissue segmentation)
  - CPU: Multi-core processor (16+ cores recommended)
  - RAM: 64GB+ (128GB recommended for large tissue sections)
  - Storage: 100GB+ free space for temporary files and results
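STAR, SAMtools and Subread are installed outside pip, so it is worth confirming they are on PATH before starting a long run. The sketch below assumes the standard executable names STAR, samtools and featureCounts (the Subread counting tool). check_tools is a hypothetical helper. If you also want to check TRUST4, add it to the list under the executable name your installation provides.

```shell
# Hypothetical helper: report which external executables are missing from PATH.
# Returns 0 only if every tool given as an argument is found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage (executable names are assumptions):
# check_tools STAR samtools featureCounts || echo "Install the missing tools via conda first"
```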
Important: You must install the STAR aligner and other necessary software separately before using Celatlas Spatial:
conda install --file conda_pkgs.txt
IMPORTANT: You must download the Swin-UNET model for tissue segmentation:
# 1. Download swin_tiny_model.tar.gz from GitHub Release or provided link
# 2. Extract model to src directory
mkdir -p $CELATLAS_WORKSPACE/src
tar -xzf swin_tiny_model.tar.gz -C $CELATLAS_WORKSPACE/src/
# 3. Verify the model file exists
ls -lh $CELATLAS_WORKSPACE/src/swin_tiny.pth
# Should show a ~106MB file
Notes:
- Model size: ~106MB
- Required for tissue segmentation in all analysis modes
- Default workspace: /mnt/strna/celatlas_spatial (customizable via CELATLAS_WORKSPACE)
Before running analysis, you need to build reference genome indices:
1. Create the reference directory structure:
mkdir -p reference/Homo_sapiens
# or
mkdir -p reference/Mus_musculus
# or
mkdir -p reference/others
2. Download genome files (example for human):
Note: For detailed parameters and the latest guidelines, refer to the official STAR documentation at https://github.com/alexdobin/STAR.
cd reference/Homo_sapiens
# Download genome FASTA
wget http://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
# Download GTF annotation
wget http://ftp.ensembl.org/pub/release-110/gtf/homo_sapiens/Homo_sapiens.GRCh38.110.gtf.gz
gunzip Homo_sapiens.GRCh38.110.gtf.gz
3. Build the STAR index:
celatlas_spatial rna mkref \
--genome_name Homo_sapiens \
--fasta reference/Homo_sapiens/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--gtf reference/Homo_sapiens/Homo_sapiens.GRCh38.110.gtf \
--thread 8
After installation, you can run the complete spatial transcriptomics analysis pipeline using the Celatlas.sh script. Version 1.7.0 introduces flexible parameter modes: you can use traditional positional arguments or modern named arguments, and optionally add performance-tuning parameters.
# Modern usage (Recommended - with named arguments)
Celatlas.sh --sample <name> --chip <chip> --casno <case> --chemistry <chem> \
--species <species> --method <method> --mode <mode> [OPTIONS]
# Legacy usage (Positional arguments - backward compatible)
Celatlas.sh <sample_name> <chip_number> <casno> <chemistry> <species> <method> <mode> [OPTIONS]
Celatlas.sh <chip_number> <casno> <chemistry> <species> <method> <mode> [OPTIONS]
# Hybrid mode (Recommended for flexibility)
Celatlas.sh <positional args...> --thread 32 --bin 10,50,100
| Parameter | Description | Valid Values | Example |
|---|---|---|---|
| sample_name | Sample identifier (omitted in the 6-argument form) | Any string | DEMO |
| chip_number | Chip/slide number | Unique identifier | ST110001_A1 |
| casno | Case/project number for organizing results | Project ID | HE_TEST |
| chemistry | Chemistry version of the kit | BBV0, BBV2.4, BBV3 | BBV2.4 |
| species | Target species for alignment | Homo_sapiens, Mus_musculus, others | Mus_musculus |
| method | Analysis method (see detailed explanation below) | HE, ssDNA, gene_expr | HE |
| mode | Pipeline mode | strna, scrna | strna |
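The legacy 7-argument and 6-argument forms use the same order. They differ only in whether sample_name is present, so the argument count is enough to tell them apart. The sketch below illustrates that idea only. parse_positional is a hypothetical name and is not the real parser in Celatlas.sh. It also ignores trailing [OPTIONS], so you would need to strip those first.

```shell
# Hypothetical sketch: distinguish the 7-argument and 6-argument positional forms.
# Not Celatlas.sh's own parser; trailing --options are not handled here.
parse_positional() {
    if [ "$#" -eq 7 ]; then
        sample=$1
        shift
    elif [ "$#" -eq 6 ]; then
        sample="N/A"   # sample_name omitted; reports show "N/A"
    else
        echo "expected 6 or 7 positional arguments, got $#" >&2
        return 1
    fi
    chip=$1 casno=$2 chemistry=$3 species=$4 method=$5 mode=$6
    echo "sample=$sample chip=$chip casno=$casno chemistry=$chemistry species=$species method=$method mode=$mode"
}

# Usage:
# parse_positional ST110001_A1 HE_TEST BBV2.4 Mus_musculus HE strna
```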
Understanding the --method parameter is crucial for successful analysis:
| Method | Full Name | When to Use | Required Files | Key Features |
|---|---|---|---|---|
| HE | H&E Staining Guided Analysis | Most common: when you have H&E-stained histology images | • FASTQ files • {chip}_he.(tif\|png\|jpg) • Barcode position files | ✨ NEW in v1.7.0 • Gene expression-based tissue detection • HE image registration • Combined visualization • Optimal for histological context |
| ssDNA | Single-strand DNA Probes | When using fluorescent ssDNA tissue imaging | • FASTQ files • {chip}.tif (tissue image) • Barcode position files | • Tissue boundary from imaging • Traditional image segmentation • Alias for image mode |
| gene_expr | Gene Expression Only | When you have NO imaging data | • FASTQ files • Barcode position files | • Pure computational tissue detection • UMI-based segmentation • No image required |
🚨 CRITICAL WARNING: HE Image File Naming
The _he suffix is MANDATORY!
| Your Chip Number | ✅ Correct Filename | ❌ Common Mistakes |
|---|---|---|
| ST110001_A1 | ST110001_A1_he.tif | ST110001_A1.tif (missing _he) |
| ST110001 | ST110001_he.png OR ST110001_HE.png | ST110001.png (missing _he/_HE) |
| ST110001 | ST110001_he.jpg OR ST110001_HE.jpg | ST110001he.tif (missing underscore) |
Note: Matching is case-insensitive; _he, _HE, _He and _hE are all supported!
Pipeline behavior:
- With the _he suffix → HE mode (gene expression + H&E registration)
- Without the _he suffix → ssDNA mode (looks for the {chip}.tif fluorescent image)
- Wrong format (e.g., missing underscore) → ERROR: HE mode requires H&E staining image!
Important Notes:
- HE mode is the recommended method for most applications as it combines gene expression data with histological information
- The HE image filename must follow the pattern {chip_number}_he.(tif|png|jpg|jpeg). For example, if chip_number=ST110001_A1, the HE image should be named ST110001_A1_he.tif (or .png, .jpg)
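You can check the naming rule before launching a run. This is a minimal sketch that makes two assumptions. First, the images sit in a single directory. Second, the _he suffix is matched case-insensitively, as the case-insensitivity note describes. The extension is lower-cased too here, which is an additional assumption. find_he_image is a hypothetical helper, not the pipeline's own lookup.

```shell
# Hypothetical helper: print the H&E image for a chip, or return 1 if none matches
# {chip}_he.(tif|png|jpg|jpeg). Everything after the chip number is compared
# case-insensitively (an assumption; see the naming notes).
find_he_image() {
    dir=$1
    chip=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # The chip number itself must match exactly and be followed by "_"
        case "$name" in
            "$chip"_*) ;;
            *) continue ;;
        esac
        # Lower-case the part after the chip number, e.g. "_HE.TIF" -> "_he.tif"
        rest=$(printf '%s' "${name#"$chip"}" | tr '[:upper:]' '[:lower:]')
        case "$rest" in
            _he.tif|_he.png|_he.jpg|_he.jpeg)
                echo "$f"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage:
# find_he_image /mnt/strna/celatlas_spatial/images ST110001_A1 || echo "No H&E image for ST110001_A1"
```

Files such as ST110001_A1.tif or ST110001_A1he.tif are rejected because the chip number is not followed by an underscore.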
These parameters allow fine-tuning of performance and resolution:
| Parameter | Description | Default | Valid Range | Recommendation |
|---|---|---|---|---|
| --thread | Number of CPU threads | Auto-detect | 1-128 | Use ~80% of available cores, e.g., --thread 64 on an 80-core machine |
| --bin | Bin sizes in micrometers (comma-separated) | 50 | 10,20,50,100 | • 10: Highest resolution (~single cell) • 20: High resolution • 50: Balanced (recommended) • 100: More genes, lower resolution |
| --insertR2 | Insert R2 fragment size (bp) | 150 | 50-300 | Match your sequencing read length |
| --cell_num | Expected number of cells/spots | 50000 | 1000-200000 | Adjust based on tissue size |
| --pixelSize | Pixel size in micrometers | 0.5 | 0.1-2.0 | Check your imaging system specs |
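The ~80% rule of thumb for --thread is simple integer arithmetic: for example, 80 cores × 0.8 = 64 threads. The sketch below assumes GNU nproc is available for auto-detection and falls back to 1 otherwise. recommended_threads is a hypothetical helper. The clamp to 1-128 follows the valid range in the table.

```shell
# Hypothetical helper: ~80% of the core count, clamped to the documented 1-128 range.
recommended_threads() {
    # Core count: first argument if given, else auto-detect (1 if nproc is unavailable)
    cores=${1:-$(nproc 2>/dev/null || echo 1)}
    t=$(( cores * 8 / 10 ))   # ~80%, rounded down
    if [ "$t" -lt 1 ]; then t=1; fi
    if [ "$t" -gt 128 ]; then t=128; fi
    echo "$t"
}

# Usage:
# bash Celatlas.sh DEMO ST110001_A1 HE_test BBV2.4 Mus_musculus HE strna --thread "$(recommended_threads)"
```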
For users with shared data environments or custom storage layouts, you can now specify custom paths for reference genomes, images, and FASTQ files:
| Parameter | Description | Default | Use Case |
|---|---|---|---|
| --reference_dir | Custom reference genome directory | $CELATLAS_WORKSPACE/reference | Shared reference across multiple projects |
| --mask_dir | Custom mask/barcode directory | $CELATLAS_WORKSPACE/ST_mask | Centralized barcode storage |
| --image_dir | Custom image directory | $CELATLAS_WORKSPACE/images | Separate image storage location |
| --fastq_dir | Custom FASTQ directory | $CELATLAS_WORKSPACE/fastq/$chemistry | Avoid duplicate data copies; use the original sequencing location |
| --fastq_name | Custom FASTQ filename prefix | chip_number | When FASTQ files have different names than chip numbers |
Benefits:
- ✅ Avoid data duplication: Point directly to original sequencing files
- ✅ Shared resources: Multiple users can share reference genomes and images
- ✅ Flexible organization: Adapt to existing storage infrastructure
- ✅ Storage savings: No need to copy large files into the workspace
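Each default in the table follows one rule: use the override if given, otherwise use a fixed subdirectory of the workspace. The sketch below spells that rule out with shell parameter expansion. It assumes the workspace root is CELATLAS_WORKSPACE, with a default of /mnt/strna/celatlas_spatial. The upper-case override variables (REFERENCE_DIR, FASTQ_DIR and so on) are hypothetical stand-ins for the command-line flags. The pipeline does not read them.

```shell
# Hypothetical sketch of the documented path defaults; the *_DIR/FASTQ_NAME
# variables stand in for --reference_dir, --mask_dir, --image_dir,
# --fastq_dir and --fastq_name.
resolve_paths() {
    chemistry=$1
    chip=$2
    ws=${CELATLAS_WORKSPACE:-/mnt/strna/celatlas_spatial}
    echo "reference_dir=${REFERENCE_DIR:-$ws/reference}"
    echo "mask_dir=${MASK_DIR:-$ws/ST_mask}"
    echo "image_dir=${IMAGE_DIR:-$ws/images}"
    echo "fastq_dir=${FASTQ_DIR:-$ws/fastq/$chemistry}"
    echo "fastq_name=${FASTQ_NAME:-$chip}"
}

# Usage:
# FASTQ_DIR=/data/sequencing/run001 resolve_paths BBV2.4 ST110001_A1
```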
The pipeline now supports 4 FASTQ naming formats with automatic detection:
| Priority | Format | Pattern | Example |
|---|---|---|---|
| 1 | Multi-lane (Recommended) | {chip}_S*_L*_R1_*.fastq.gz | ST110001_A1_S1_L001_R1_001.fastq.gz, ST110001_A1_S1_L002_R1_001.fastq.gz |
| 2 | Multi-fold (Legacy) | {chip}_fold{1-5}_1.fq.gz | ST110001_A1_fold1_1.fq.gz, ST110001_A1_fold2_1.fq.gz |
| 3 | Simple format | {chip}_1.fq.gz | ST110001_A1_1.fq.gz, ST110001_A1_2.fq.gz |
| 4 | _R1/_R2 format | {chip}_R1.fq.gz | ST110001_A1_R1.fq.gz, ST110001_A1_R2.fq.gz |
Note: The pipeline automatically detects and uses the first available format.
# Using H&E stained image for tissue analysis
# Required files: ST110001_A1_he.tif (⚠️ MUST have the "_he" suffix!), FASTQ files, barcode files
bash Celatlas.sh DEMO ST110001_A1 HE_test BBV2.4 Mus_musculus HE strna
🚨 CRITICAL: HE Mode File Naming Rule
The HE image MUST have the _he suffix before the file extension!
✅ Correct: ST110001_A1_he.tif (or .png, .jpg, .jpeg)
❌ Wrong: ST110001_A1.tif → This will be treated as ssDNA mode!
✅ Also correct: ST110001_A1_HE.tif → Case-insensitive (both _he and _HE work!)
❌ Wrong: ST110001_A1he.tif → Missing underscore before he
Why this matters: Without the _he suffix, the pipeline cannot detect HE mode and will fail with an error.
# High-performance mode with 64 threads and single bin size
bash Celatlas.sh DEMO ST110001_A1 HE_test BBV2.4 Mus_musculus HE strna \
--thread 64 --bin 50
# Generate multiple bin sizes for comparison (10µm, 20µm, 50µm, 100µm)
# Useful for exploring optimal spatial resolution
bash Celatlas.sh DEMO ST110001_A1 multi_res BBV2.4 Mus_musculus HE strna \
--thread 32 --bin 10,20,50,100
# When you don't have tissue images
bash Celatlas.sh DEMO ST110001_A1 expr_only BBV2.4 Mus_musculus gene_expr strna
# When using fluorescent ssDNA tissue imaging
# Required: ST110001_A1.tif (tissue image)
bash Celatlas.sh DEMO ST110001_A1 ssDNA_test BBV2.4 Homo_sapiens ssDNA strna \
--thread 48
# Single-cell RNA-seq analysis (no spatial information)
bash Celatlas.sh DEMO ST110001_A1 scrna_test BBV0 Homo_sapiens gene_expr scrna
# Complete parameter customization
bash Celatlas.sh DEMO ST110001_A1 custom_test BBV2.4 Homo_sapiens HE strna \
--thread 48 \
--bin 10,20,50,100 \
--insertR2 120 \
--cell_num 80000 \
--pixelSize 0.5
# Recommended for automation and reproducibility
bash Celatlas.sh \
--sample DEMO \
--chip ST110001_A1 \
--casno batch_001 \
--chemistry BBV2.4 \
--species Mus_musculus \
--method HE \
--mode strna \
--thread 32 \
--bin 50
# Ideal for production environments with centralized storage
bash Celatlas.sh \
--chip ST110001_A1 \
--casno shared_proj_001 \
--chemistry BBV2.4 \
--species Mus_musculus \
--method HE \
--mode strna \
--reference_dir /data/shared/reference \
--fastq_dir /data/sequencing/run001 \
--mask_dir /data/shared/masks \
--image_dir /data/shared/images
Why use custom directories?
- Avoid copying 100+ GB FASTQ files
- Share reference genomes across multiple projects
- Use original sequencing output location directly
# When FASTQ files have different naming than chip numbers
bash Celatlas.sh \
--chip ST110001_A1 \
--casno test_001 \
--chemistry BBV2.4 \
--species Mus_musculus \
--method HE \
--mode strna \
--fastq_name "Sample_ABC_XYZ"
# Pipeline will look for:
# Sample_ABC_XYZ_S1_L001_R1_001.fastq.gz (multi-lane)
# Sample_ABC_XYZ_fold1_1.fq.gz (multi-fold)
# Sample_ABC_XYZ_1.fq.gz (simple)
# Sample_ABC_XYZ_R1.fq.gz (R1/R2 format)
Understanding the sample naming system is critical for file organization:
The pipeline uses a two-part naming system to distinguish the biological sample from the chip that produced the data:
- sample_name (Optional): Biological sample identifier
  - Example: DEMO, Sample001, MouseBrain_A
  - Used for: Grouping related samples, metadata tracking
  - Can be omitted (will show as "N/A" in reports)
- chip_number (Required): Chip/slide identifier (the actual data identifier)
  - Example: ST110001_A1, ST110001
  - Used for: Finding FASTQ files, matching image files, creating output directories
  - This is the primary identifier used throughout the pipeline
All input files must use the chip_number as the base name:
| File Type | Naming Pattern | Example | Notes |
|---|---|---|---|
| FASTQ (Multi-lane) | {chip}_S*_L*_R1_*.fastq.gz, {chip}_S*_L*_R2_*.fastq.gz | ST110001_A1_S1_L001_R1_001.fastq.gz, ST110001_A1_S1_L002_R1_001.fastq.gz | ✅ Recommended format; automatically detects all lanes |
| FASTQ (Multi-fold) | {chip}_fold1_1.fq.gz, {chip}_fold2_1.fq.gz | ST110001_A1_fold1_1.fq.gz, ST110001_A1_fold1_2.fq.gz | Legacy format; supports fold1-fold5 |
| FASTQ (Simple) | {chip}_1.fq.gz, {chip}_2.fq.gz | ST110001_A1_1.fq.gz, ST110001_A1_2.fq.gz | Simple format |
| FASTQ (_R1/_R2) | {chip}_R1.fq.gz, {chip}_R2.fq.gz | ST110001_A1_R1.fq.gz, ST110001_A1_R2.fq.gz | Common alternative format |
| HE Image | {chip}_he.(tif\|png\|jpg\|jpeg) | ST110001_A1_he.tif | Required for HE mode |
| Tissue Image | {chip}.tif | ST110001_A1.tif | Required for ssDNA mode |
| Barcode Position | {chip}.barcodeToPos.h5, {chip}_FilterBarcodes.csv, {chip}_tissue_bbox.csv | ST110001_A1.barcodeToPos.h5 | Required for spatial modes |
The most common error in HE mode is incorrect image file naming. Please note:
- Mandatory suffix: The HE image MUST have _he before the file extension
- Case-insensitive: _he, _HE, _He and _hE are all accepted
- With underscore: Must be _he, not he (a missing underscore will fail)
- No extra text: Must be {chip}_he.ext, not {chip}_he_anything.ext
Examples:
- ✅ ST110001_A1_he.tif → Correct
- ✅ ST110001_he.png → Correct
- ✅ ST110001_A1_HE.tif → Correct (case-insensitive)
- ❌ ST110001_A1.tif → Will be treated as ssDNA mode
- ❌ ST110001_A1he.tif → Missing underscore, will not be detected
The pipeline searches for FASTQ files in this order:
Priority 1: Multi-lane format (Recommended)
├── Pattern: {chip}_S*_L*_R1_*.fastq.gz
├── Example: ST110001_A1_S1_L001_R1_001.fastq.gz
│            ST110001_A1_S1_L002_R1_001.fastq.gz
└── Auto-detects and merges all lanes
Priority 2: Multi-fold format (Legacy)
├── Pattern: {chip}_fold{1-5}_1.fq.gz
├── Example: ST110001_A1_fold1_1.fq.gz
│            ST110001_A1_fold2_1.fq.gz
└── Supports up to 5 fold files
Priority 3: Single file format (Simple)
├── Pattern: {chip}_1.fq.gz
├── Example: ST110001_A1_1.fq.gz
└── Single file pair
Priority 4: _R1/_R2 format
├── Pattern: {chip}_R1.fq.gz / {chip}_R2.fq.gz
├── Example: ST110001_A1_R1.fq.gz
│            ST110001_A1_R2.fq.gz
└── Common alternative naming convention
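You can mimic the search order above with shell globs to see which files a given chip name would pick up. This is a minimal sketch that assumes all FASTQs sit in one directory. detect_fastq is a hypothetical helper that prints the R1 files of the first matching format. It is not the pipeline's own detection code.

```shell
# Hypothetical sketch of the documented FASTQ priority order.
# Prints the R1 files of the first format that matches and returns 0;
# returns 1 if no format matches.
detect_fastq() {
    dir=$1
    name=$2
    for pattern in \
        "${name}_S*_L*_R1_*.fastq.gz" \
        "${name}_fold[1-5]_1.fq.gz" \
        "${name}_1.fq.gz" \
        "${name}_R1.fq.gz"
    do
        found=0
        # $pattern is left unquoted on purpose so the shell expands the glob
        for f in "$dir"/$pattern; do
            [ -e "$f" ] || continue
            echo "$f"
            found=1
        done
        if [ "$found" -eq 1 ]; then return 0; fi
    done
    return 1
}

# Usage:
# detect_fastq /mnt/strna/celatlas_spatial/fastq/BBV2.4 ST110001_A1
```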
Custom FASTQ Directory & Name:
You can override the default FASTQ location and naming with the --fastq_dir and --fastq_name parameters (see the custom-directory and custom FASTQ name examples above).
| Version | Sample Identifier | File Base Name | Example Command |
|---|---|---|---|
| v1.7.0 | chip_number (primary), sample_name (optional metadata) | Uses chip_number | Celatlas.sh DEMO ST110001_A1 ... |
| v1.6.x | sample only | Uses sample | Celatlas.sh Sample001 ... |
Key Change: In v1.7.0, the chip_number parameter is now the primary identifier for file matching, while sample_name is optional metadata. This allows better organization when multiple samples share the same chip.
Example 1: Standard HE Mode with Multi-lane FASTQ
# File structure:
# /mnt/strna/celatlas_spatial/
# ├── fastq/BBV2.4/
# │   ├── ST110001_A1_S1_L001_R1_001.fastq.gz
# │   ├── ST110001_A1_S1_L001_R2_001.fastq.gz
# │   ├── ST110001_A1_S1_L002_R1_001.fastq.gz
# │   └── ST110001_A1_S1_L002_R2_001.fastq.gz
# ├── images/
# │   └── ST110001_A1_he.tif
# └── ST_mask/
#     ├── ST110001_A1.barcodeToPos.h5
#     ├── ST110001_A1_FilterBarcodes.csv
#     └── ST110001_A1_tissue_bbox.csv
bash Celatlas.sh DEMO ST110001_A1 HE_test BBV2.4 Mus_musculus HE strna
Example 2: Gene Expression Mode with Simple FASTQ
# File structure:
# /mnt/strna/celatlas_spatial/
# ├── fastq/BBV2.4/
# │   ├── ST110001_1.fq.gz
# │   └── ST110001_2.fq.gz
# └── ST_mask/
#     ├── ST110001.barcodeToPos.h5
#     ├── ST110001_FilterBarcodes.csv
#     └── ST110001_tissue_bbox.csv
bash Celatlas.sh DEMO ST110001 test_001 BBV2.4 Homo_sapiens gene_expr strna
You can customize the pipeline behavior using environment variables:
| Variable | Description | Default Value |
|---|---|---|
| CELATLAS_WORKSPACE | Main workspace directory | /mnt/strna/celatlas_spatial |
| MAX_PARALLEL_FILES | Maximum number of FASTQ files processed in parallel (barcode extraction) | 3 |
Option 1: Export environment variables
export CELATLAS_WORKSPACE="/home/user/my_workspace"
bash Celatlas.sh DEMO ST110001 CAS250801 BBV2.4 Mus_musculus image strna
Option 2: One-time usage
CELATLAS_WORKSPACE="/home/user/workspace" \
bash Celatlas.sh DEMO ST110001 CAS250801 BBV2.4 Homo_sapiens image strna
Option 3: Create a configuration script
#!/bin/bash
# create_config.sh
export CELATLAS_WORKSPACE="/your/workspace/path"
export MAX_PARALLEL_FILES=4
# Usage: source create_config.sh && bash Celatlas.sh ...
The pipeline expects and creates the following directory structure:
$CELATLAS_WORKSPACE/
├── binSegment/            # Segmentation results
├── images/                # Input images
├── fastq/                 # FASTQ files organized by chemistry
│   └── BBV2.4/
│
├── reference/             # Reference genomes
│   ├── Homo_sapiens/
│   ├── Mus_musculus/
│   └── others/
│
├── src/                   # Source files and models
├── rawdata/               # Raw data per sample
└── results/               # Analysis results
    └── {casno}/           # Organized by case number
        └── {chip_number}/ # Sample-specific results
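To set up a fresh workspace in one step, you can create the directories above with mkdir -p, which leaves existing directories untouched. This is a minimal sketch. init_workspace is a hypothetical helper, and BBV2.4 is only an example chemistry folder. ST_mask is included because it is the default --mask_dir location, even though it is not drawn in the tree.

```shell
# Hypothetical helper: create the workspace layout (idempotent thanks to mkdir -p).
# Takes the workspace path as $1, else CELATLAS_WORKSPACE, else the default path.
init_workspace() {
    ws=${1:-${CELATLAS_WORKSPACE:-/mnt/strna/celatlas_spatial}}
    mkdir -p \
        "$ws/binSegment" "$ws/images" "$ws/src" "$ws/rawdata" "$ws/results" \
        "$ws/ST_mask" \
        "$ws/fastq/BBV2.4" \
        "$ws/reference/Homo_sapiens" "$ws/reference/Mus_musculus" "$ws/reference/others"
}

# Usage:
# init_workspace /home/user/my_workspace
```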
The Celatlas.sh script runs the following analysis steps. The workflow varies depending on the selected method (HE, ssDNA, or gene_expr):
- Sample Processing (00.sample)
  - Sample validation and metadata collection
  - Chemistry detection and verification
  - System resource check
- Barcode Extraction (01.barcode)
  - Spatial barcode identification from R1 reads
  - Whitelist-based barcode correction
  - UMI extraction
  - Performance: Multi-file parallel processing (configurable with MAX_PARALLEL_FILES)
- Adapter Trimming (02.cutadapt)
  - Quality control and adapter removal from R2 reads
  - NextSeq-specific quality trimming
  - Insert size filtering based on the --insertR2 parameter
- Sequence Alignment (03.star)
  - STAR alignment to the reference genome
  - Threading: Uses the --thread parameter for parallel alignment
  - Adaptive parameters based on mode (stricter for spatial, relaxed for scRNA)
- Feature Counting (04.featureCounts)
  - Gene expression quantification
  - Counts reads per gene per barcode
  - Threading: Multi-core counting enabled
- UMI Counting (05.count)
  - UMI deduplication
  - Cell/spot calling using the EmptyDrops algorithm
  - Expected cell number based on the --cell_num parameter
  - Generates filtered and raw count matrices
The binSegment step varies significantly based on the selected method:
HE mode: 06.binSegment - Gene Expression + H&E Image Registration
This is the most advanced mode, combining computational and visual tissue detection:
Step 6A: Gene Expression-based Tissue Detection
├── Aggregate UMI counts into spatial grid (default: 20µm bins)
├── Apply UMI threshold filtering (default: ≥30 UMIs)
├── Enhance signal using percentile normalization (p_low=5, p_high=95)
├── Generate tissue mask from gene expression heatmap
└── Create GEM (Gene Expression Matrix) tissue boundary
Step 6B: H&E Image Registration
├── Load H&E stained histology image ({chip}_he.tif/png/jpg)
├── Tissue segmentation using Swin-UNET deep learning model
├── Extract tissue region from H&E image
├── Register H&E tissue to GEM tissue boundary
│   ├── Registration method: Affine transformation (default)
│   ├── Alignment: SimpleITK with mutual information metric
│   └── Initial alignment: Contour-based initialization
└── Generate aligned overlay visualization
Step 6C: Spatial Binning
├── Generate bins at specified resolutions (--bin parameter)
│   ├── bin10/ - 10µm bins (~single cell resolution)
│   ├── bin20/ - 20µm bins (high resolution)
│   ├── bin50/ - 50µm bins (balanced, recommended)
│   └── bin100/ - 100µm bins (more genes, lower resolution)
├── Assign barcodes to spatial bins
├── Aggregate UMI counts per bin
└── Generate count matrices for each bin size
Key Features:
- Dual tissue detection: Uses both gene expression AND histology
- Robust registration: Aligns H&E image to gene expression coordinates
- Interactive visualization: HTML report shows HE-GEM overlay with toggle controls
- Quality metrics: Registration accuracy, tissue coverage statistics
Output Files (HE Mode):
06.binSegment/
├── {chip}_tissue_HE.jpg              # Segmented H&E tissue region
├── {chip}_1_tissue.png               # Gene expression tissue mask
├── {chip}_2_he_registered.jpg        # Registered H&E image
├── {chip}_3_overlay_combined.jpg     # HE + GEM overlay
├── {chip}_3c_gem_heatmap_only.png    # Pure GEM heatmap (for download)
└── square_bin/
    ├── {chip}_bin10/
    │   ├── filtered_feature_bc_matrix/
    │   ├── stat.txt
    │   └── downsample.tsv
    ├── {chip}_bin50/
    └── {chip}_bin100/
ssDNA mode: 06.binSegment - Fluorescent Image Segmentation
Traditional image-based tissue detection using fluorescent ssDNA probes:
Step 6: Image Segmentation + Spatial Binning
├── Load tissue image ({chip}.tif)
├── Tissue segmentation using Swin-UNET model
├── Extract tissue boundary
├── Generate spatial bins (--bin parameter)
├── Assign barcodes to bins
└── Create count matrices per bin
Required: High-quality fluorescent tissue image ({chip}.tif)
gene_expr mode: 06.binSegment - Pure Computational Segmentation
Tissue detection based solely on gene expression data:
Step 6: Gene Expression-based Segmentation
├── Aggregate UMI counts into spatial grid (--gem-bin-size, default: 20µm)
├── Apply UMI threshold (--umi-min-threshold, default: ≥30)
├── Enhance signal (--enhance-params)
├── Detect tissue boundary from expression pattern
├── Generate spatial bins (--bin parameter)
└── Create count matrices per bin
Advantages: No imaging required, works with any spatial platform
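The core of gene-expression tissue detection is simple: sum UMIs on a 20µm grid and keep grid cells with at least 30 UMIs. The awk sketch below illustrates that idea only and is not the binSegment implementation. It makes several assumptions. The input format is hypothetical: one whitespace-separated line per barcode, with x and y in micrometres followed by the UMI count. Coordinates are assumed to be non-negative. The percentile enhancement and mask post-processing steps are omitted.

```shell
# Rough, hypothetical sketch of grid aggregation + UMI thresholding.
# Input: lines of "x_um y_um umi"; output: "gridx,gridy total_umi" for kept cells.
tissue_bins() {
    input=$1
    bin=${2:-20}        # grid size in micrometres (README default: 20)
    min_umi=${3:-30}    # minimum UMIs per grid cell (README default: 30)
    awk -v bin="$bin" -v min="$min_umi" '
        # Sum UMIs into the grid cell containing each barcode
        { key = int($1 / bin) "," int($2 / bin); umi[key] += $3 }
        # Keep only grid cells that pass the threshold
        END { for (k in umi) if (umi[k] >= min) print k, umi[k] }
    ' "$input" | sort
}

# Usage (hypothetical file name):
# tissue_bins barcode_umis.txt 20 30
```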
- Spatial Analysis (07.analysis)
  - Dimensionality reduction (UMAP, t-SNE*)
  - Clustering (Leiden algorithm)
  - Marker gene identification
  - Spatial expression patterns
  - Performance:
    - Uses --thread for parallel computation
    - (*) Smart t-SNE handling: For high-resolution data (bin10, bin20), t-SNE is automatically skipped to prevent memory issues
    - UMAP is always computed for all bin sizes
- Report Generation
  - Interactive HTML report with Plotly visualizations
  - HE Mode: Includes Image Alignment Viewer with HE-GEM overlay
  - All Modes: QC metrics, spatial plots, clustering results
  - Downloadable figures (PNG format)
The pipeline includes several performance enhancements in v1.7.0:
- Auto-threading: Automatically detects CPU cores if --thread is not specified
- BLAS thread limiting: Prevents segmentation faults on high-core servers (auto-limits to 16 threads)
- Parallel barcode extraction: Multiple FASTQ files processed concurrently
- Smart t-SNE skipping: High-resolution bins skip t-SNE to avoid memory issues
- Intermediate file cleanup: Optional automatic cleanup of intermediate files (controlled by the CLEAN_INTERMEDIATE variable)
Thread Configuration Details:
# Auto-detected (recommended for most users)
bash Celatlas.sh ... HE strna # Uses all available cores
# Manual specification (for high-performance servers)
bash Celatlas.sh ... HE strna --thread 64
# Conservative mode (for shared servers)
bash Celatlas.sh ... HE strna --thread 16
Internal Thread Management:
- Analysis threads: Set by the --thread parameter (e.g., 64)
- BLAS threads: Auto-limited to 16 (prevents OpenBLAS segfaults)
- Environment variables set:
  - OMP_NUM_THREADS: Analysis threads
  - OPENBLAS_NUM_THREADS: Limited to 16 (max)
  - MKL_NUM_THREADS: Limited to 16 (max)
  - NUMEXPR_NUM_THREADS: Analysis threads
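If you run individual celatlas_spatial rna subcommands by hand, you may want to reproduce the thread split described above. This is a minimal sketch. set_thread_env is a hypothetical helper that exports the four variables using the same rule: OMP and NUMEXPR follow the requested count, while OpenBLAS and MKL are capped at 16. It is not code taken from Celatlas.sh.

```shell
# Hypothetical helper mirroring the documented rule:
# OMP/NUMEXPR = requested threads, OpenBLAS/MKL = min(requested, 16).
set_thread_env() {
    threads=$1
    blas=$threads
    if [ "$blas" -gt 16 ]; then blas=16; fi
    export OMP_NUM_THREADS="$threads"
    export NUMEXPR_NUM_THREADS="$threads"
    export OPENBLAS_NUM_THREADS="$blas"
    export MKL_NUM_THREADS="$blas"
}

# Usage:
# set_thread_env 64 && celatlas_spatial rna analysis --help
```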
The pipeline generates comprehensive outputs organized by case number and sample:
$CELATLAS_WORKSPACE/results/{casno}/{chip_number}/
├── 00.sample/
│   └── stat.json                          # Sample metadata and chemistry info
├── 01.barcode/
│   ├── {chip}_1.fq.gz                     # Barcode-corrected R1
│   ├── {chip}_2.fq.gz                     # Barcode-corrected R2
│   └── stat.json                          # Barcode extraction statistics
├── 02.cutadapt/
│   ├── {chip}_clean_2.fq.gz               # Adapter-trimmed R2
│   └── stat.json                          # Trimming statistics
├── 03.star/
│   ├── {chip}_Aligned.sortedByCoord.out.bam
│   ├── {chip}_Log.final.out               # Alignment summary
│   └── stat.json                          # STAR statistics
├── 04.featureCounts/
│   ├── {chip}_nameSorted.bam              # Name-sorted BAM for counting
│   ├── {chip}_counts.txt                  # Gene-level counts
│   └── stat.json                          # Feature counting stats
├── 05.count/
│   ├── {chip}_count_detail.txt            # Detailed UMI counts per barcode
│   ├── {chip}_filtered_feature_bc_matrix/ # Filtered count matrix (cells)
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── {chip}_raw_feature_bc_matrix/      # Raw count matrix (all barcodes)
│   └── stat.json                          # Cell calling statistics
├── 06.binSegment/
│   ├── {chip}_tissue_HE.jpg               # [HE mode] Segmented H&E tissue
│   ├── {chip}_1_tissue.png                # [HE mode] Gene expression mask
│   ├── {chip}_2_he_registered.jpg         # [HE mode] Registered H&E
│   ├── {chip}_3_overlay_combined.jpg      # [HE mode] HE+GEM overlay
│   ├── {chip}_3c_gem_heatmap_only.png     # [HE mode] Pure GEM heatmap
│   └── square_bin/
│       ├── {chip}_bin10/                  # 10µm resolution
│       │   ├── filtered_feature_bc_matrix/
│       │   ├── stat.txt                   # Bin-specific statistics
│       │   └── downsample.tsv             # Saturation curve data
│       ├── {chip}_bin20/                  # 20µm resolution
│       ├── {chip}_bin50/                  # 50µm resolution (recommended)
│       └── {chip}_bin100/                 # 100µm resolution
├── 07.analysis/
│   ├── {chip}_bin{size}_cluster.tsv       # Cluster assignments
│   ├── {chip}_bin{size}_umap.tsv          # UMAP coordinates
│   ├── {chip}_bin{size}_tsne.tsv          # t-SNE coordinates (if generated)
│   ├── {chip}_bin{size}_markers.csv       # Marker genes per cluster
│   └── spatial_plots/                     # Spatial visualization PNGs
├── {chip}_spatial_analysis_report.html    # ⭐ Main interactive report
└── pipeline.log                           # Complete pipeline execution log
| File | Description | Use Case |
|---|---|---|
| {chip}_spatial_analysis_report.html | Main deliverable: interactive HTML report with all QC metrics, plots, and visualizations | Share with collaborators, publication figures |
| 06.binSegment/square_bin/{chip}_bin50/filtered_feature_bc_matrix/ | Count matrix for downstream analysis | Load into Seurat, Scanpy, or other tools |
| 06.binSegment/{chip}_3_overlay_combined.jpg | [HE mode] HE-GEM alignment visualization | Verify registration quality |
| 07.analysis/{chip}_bin{size}_cluster.tsv | Spatial cluster assignments | Downstream spatial analysis |
| pipeline.log | Complete execution log with timing and errors | Troubleshooting, performance analysis |
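Before loading a bin matrix into Seurat or Scanpy, you can read its dimensions straight from the Matrix Market header. This sketch assumes the folder holds the 10x-style trio shown under 05.count (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz). In that layout, rows are features and columns are barcodes. mtx_dims is a hypothetical helper.

```shell
# Hypothetical helper: print genes/barcodes/non-zero counts from matrix.mtx.gz.
# Matrix Market files start with "%" comment lines, then a "rows cols entries" line.
mtx_dims() {
    gzip -dc "$1/matrix.mtx.gz" |
        awk '!/^%/ { print "genes=" $1, "barcodes=" $2, "nonzero=" $3; exit }'
}

# Usage (path assumed from the output layout):
# mtx_dims "$CELATLAS_WORKSPACE/results/HE_test/ST110001_A1/06.binSegment/square_bin/ST110001_A1_bin50/filtered_feature_bc_matrix"
```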
The interactive HTML report includes:
QC Metrics Section:
- Total reads, mapped reads, genes detected
- Sequencing saturation curve
- Barcode rank plot
- Median genes per square
Spatial Visualization:
- UMI distribution heatmap
- Gene expression overlay on tissue
- Cluster spatial distribution
Clustering Analysis:
- UMAP/t-SNE embeddings colored by cluster
- Top marker genes per cluster
- Cluster composition statistics
[HE Mode Only] Image Alignment Viewer:
- Interactive overlay of H&E and gene expression
- Toggle visibility of HE/GEM layers
- Opacity slider for blend control
- Download buttons for HE-only, GEM-only, or combined views
Individual pipeline components are also available as command-line tools:
# RNA analysis subcommands
celatlas_spatial rna sample --help
celatlas_spatial rna barcode --help
celatlas_spatial rna cutadapt --help
celatlas_spatial rna star --help
celatlas_spatial rna featureCounts --help
celatlas_spatial rna count --help
celatlas_spatial rna binSegment --help
celatlas_spatial rna analysis --help
All software dependencies and system requirements are detailed in the Installation & Setup section above. Python packages are installed automatically with pip, but bioinformatics tools such as STAR and SAMtools must be installed separately, as described in the Prerequisites section.
For issues and questions:
- Create an issue on GitHub
- Contact: rd@celatlas.com or lqs60667106@gmail.com
This project is licensed under the MIT License - see the LICENSE file for details.
If you use Celatlas Spatial in your research, please cite:
[https://github.com/celatlas/Celatlas_spatial/](https://github.com/celatlas/Celatlas_spatial/)