Abstract
Myelodysplastic syndrome (MDS) originates from hematopoietic stem cell (HSC) clones with acquired gene mutations. However, the molecular characteristics of MDS stem cells remain poorly understood. Here, we show that the chromatin accessibility profiles of MDS stem cells more accurately reflect disease status than those of progenitor cells and reveal the process of stem cell alterations during disease progression. Characterization of differentially accessible regions (DARs) shows that MDS stem cells acquire progenitor-like chromatin accessibility during disease progression, leading to disruption of the normal stem-progenitor hierarchy. Profiling of transcription factor-binding motifs at DARs further uncovers precocious activation of myeloid transcriptional networks in MDS stem cells, with a concurrent loss of HSC-associated regulatory programs. In particular, increased chromatin accessibility at CEBP target sites represents the myeloid reprogramming status of MDS stem cells. Newly developed “progenitor scores” based on chromatin accessibility stratify disease status and correlate well with prognosis. These findings indicate that chromatin landscapes of MDS stem cells define their cell-autonomous behavior and contribute to disease progression.
Similar content being viewed by others
Introduction
Myelodysplastic syndrome (MDS) is a cancer in which blood cells in the bone marrow (BM) do not mature efficiently and undergo apoptosis, resulting in cytopenia1. MDS originates from hematopoietic stem cell (HSC) clones with acquired gene mutations/deletions and frequently evolves into acute myeloid leukemia (AML) at high frequencies1,2,3,4,5. The onset of MDS is characterized by a diverse array of genetic mutations6,7,8,9,10,11, resulting in heterogeneity in the disease stage and prognosis, making treatment selection challenging. The International Prognostic Scoring System-Revised (IPSS-R) has been a global standard for patient risk stratification for decades. The IPSS-R relies on hematologic and cytogenetic features but does not consider gene mutations12,13. IPSS-Molecular [IPSS-M] was developed by combining genomic profiling with IPSS-R, improving the risk stratification of patients with MDS, and representing a valuable tool for clinical decision-making14,15. However, current risk assessments remain inadequate. A new approach that combines gene mutation with gene expression data, in addition to clinical parameters, improves prediction in MDS16. Therefore, there is a need to unravel the mechanisms underlying MDS pathogenesis, thereby developing new biomarkers for disease progression.
MDS and AML arise from a rare population of HSCs that maintain a stem-progenitor hierarchy. Immunophenotypically defined HSC architectures have been proposed as potential predictive biomarkers to guide second-line treatments in patients with MDS who fail DNA hypomethylating agents17. Leukemic stem cell (LSC)-specific gene expression patterns in AML serve prognostic predictions18,19. Moreover, comprehensive global profiling of AML stem and progenitor cells, including cis-regulatory element activity and interaction, transcription factor occupancy, and gene expression, has successfully identified transcription factor networks in AML, including the myeloid transcription factor CCAAT/enhancer-binding protein-α (C/EBPα)20. Myeloid differentiation to the granulocyte-macrophage progenitor (GMP) level is required for pre-leukemic HSCs to acquire an LSC phenotype. C/EBPα has been implicated in this process as a differentiation mediator, contributing to AML initiation but not its maintenance21,22. However, the transcription factor networks involving MDS stem/progenitor cells have not been extensively analyzed.
The epigenetic landscape plays a pivotal role in defining the disease status of MDS23. In particular, chromatin accessibility captures the integrated activity of enhancers, promoters, and transcription factors that govern hematopoiesis and cancer development24,25,26,27. In this study, we profiled the chromatin accessibility of BM stem and progenitor cells from patients with MDS and secondary AML arising from MDS. Our analysis revealed both precocious activation of the normal myeloid differentiation program and the emergence of an aberrant myeloid program unique to MDS. Furthermore, we developed a novel scoring system based on chromatin accessibility that stratifies disease status and correlates closely with prognosis.
Results
Chromatin accessibility in MDS stem cells represents disease status
To understand the transcriptional networks underlying MDS, we analyzed BM samples from 34 patients, including 13 with MDS with multilineage dysplasia (MDS-MLD), 15 with MDS with excess blasts (MDS-EB), 4 with AML with myelodysplasia-related changes (AML-MRC), 1 with idiopathic cytopenia of undetermined significance (ICUS), 1 with clonal cytopenia of undetermined significance (CCUS), and 4 healthy individuals (Fig. 1a, Supplementary Data 1)28,29. We sorted BM lineage marker-negative (Lin–) CD34+CD38– (Lin–CD34+CD38–) primitive cells (referred to here as stem cells) and Lin–CD34+CD38+ progenitor cells (referred to as progenitor cells), corresponding to the bottom 10% and top 30%, respectively, of Lin–CD34+-gated cells. These populations were subjected to ATAC-seq, RNA-seq, and targeted deep sequencing (Fig. 1b). In normal BM, Lin–CD34+CD38– primitive cells comprise HSCs, multipotent progenitors (MPPs), and multilymphoid progenitors (MLPs), while Lin–CD34+CD38+ progenitor cells include common myeloid progenitors (CMPs), granulocyte-macrophage progenitors (GMPs), megakaryocyte-erythroid progenitors (MEPs), and common lymphoid progenitors (CLPs) (Fig. 1b). Intriguingly, CD34+CD38–/low primitive MDS cells were positive for CD90, a marker typically absent in normal MPPs and MLPs. As a result, although a population consistent with long-term HSCs (CD34+CD38–/low CD90+CD45RA–CD49f+) was detectable in MDS, immunophenotypically compatible MPP and MLP populations were absent, indicating that MDS cells exhibit altered surface marker expression compared to their normal counterparts. Therefore, it was not feasible to analyze MPP and MLP counterparts in MDS using standard immunophenotypic definitions (Supplementary Fig. 1). Instead, we compared CD34+CD38–/low primitive cells (hereafter referred to as MDS stem cells) with CD34+CD38+ MDS progenitor cells. The major genetic mutations detected by targeted sequencing were ASXL1, RUNX1, and TP53 mutations in seven patients each (21.9%); DNMT3A mutations in five patients (15.6%); TET2 mutations in four patients (12.5%); and EZH2 and SRSF2 mutations in three patients each (9.4%) (Supplementary Data 2).
a Patient list. b Sorting gates. BM CD34+CD38– stem and CD34+CD38+ progenitor cells, corresponding to the bottom 10% and top 30%, respectively, of Lin–CD34+-gated cells, respectively, were sorted. c UMAP plots of the overall chromatin accessibility profiles of the stem and/or progenitor cells of each disease sample. d Venn diagrams showing overlaps of open and closed DARs (q < 0.05) in diseased stem and progenitor cells compared with their normal counterparts. e Number of open and closed DARs between stem and progenitor cells of normal controls (n = 4), MDS-MLD (n = 13), MDS-EB (n = 15), and AML-MRC (n = 4) (left). DARs between stem and progenitor cells in MDS-MLD and MDS-EB were also calculated using randomly selected four samples, and the calculation was repeated 40 times. The data are shown in box plots (right). The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean value. Statistical significance was determined by two-sided Tukey-Kramer multiple comparison test for post-hoc analysis. Source data are provided as a Source Data file.
We first analyzed transcriptome data and defined differentially expressed genes (DEGs) (q < 0.05) in MDS and AML-MRC cells compared with their normal counterparts. The DEGs of the disease groups largely overlapped, among which AML-MRC had larger numbers of DEGs downregulated in both stem and progenitor cells (Supplementary Fig. 2a). Gene set enrichment analysis revealed that inflammatory response gene signatures, including interferon response and TNFα signaling, are activated in MDS, particularly in AML-MRC in both stem and progenitor cells (Supplementary Fig. 2b). These data are consistent with observations by other groups30,31,32.
We then analyzed chromatin accessibility. The catalog of all ATAC peaks in any sample was produced by merging all the called peaks that overlapped by at least one base pair. The read numbers within the merged regions for each sample were then counted and normalized (Supplementary Fig. 3a). First, we compared the overall chromatin accessibility profiles of each disease sample with those of the normal controls. Uniform manifold approximation and projection (UMAP) analysis of the chromatin accessibility profile of each sample revealed that normal control/ICUS/CCUS and AML stem cells had clearly distinct profiles, whereas MDS stem cells with heterogeneous profiles were mapped around or between them (Fig. 1c). In contrast, the normal and diseased progenitors showed similar profiles. When stem and progenitor cells were combined in the UMAP plots, it was clear that AML stem cells exhibited profiles similar to those of progenitors. Hierarchical clustering with a correlation matrix also revealed a better separation between normal controls, MDS, and AML cells in the stem fraction than in the progenitor fraction (Supplementary Fig. 3b). These results indicate that chromatin accessibility in stem cells represents the disease status better than that in progenitor cells.
Next, we defined differentially accessible regions (DARs) (q < 0.05) in MDS and AML-MRC cells compared with their normal counterparts (Supplementary Data 3). The DARs of the MDS and AML-MRC groups largely overlapped and the number of DARs increased with disease progression in both stem and progenitor cells (Fig. 1d). DARs were mostly located in the intergenic and intronic regions and promoters (Supplementary Fig. 4a). DARs in stem and progenitor cells slightly overlapped at the MDS stage, but to a larger extent at the AML-MRC stage (Supplementary Fig. 4b). When MDS and AML-MRC progenitor cells were compared with their stem cells, the number of DARs gradually decreased with disease progression, and significant DARs were hardly detected between AML-MRC stem and progenitor cells (Fig. 1e and Supplementary Fig. 4c). These data clearly indicate that MDS gradually loses its stem-progenitor hierarchy with disease progression.
Chromatin accessibility reveals alterations in transcriptional networks in MDS
Before characterizing the transcriptional networks in MDS, we profiled DARs in normal CD34+CD38+ progenitor cells compared with those in CD34+CD38– HSCs (Fig. 2a), which were mostly located in the intergenic and intronic regions and promoters (Fig. 2b). Enrichment analysis of transcription factor binding motifs at DARs revealed that DARs with HOX and AP-1 family binding motifs had decreased accessibility during differentiation, whereas those with binding sites for CEBP myeloid transcription factors, GATA erythroid transcription factor, and lymphoid transcription factors, such as EBF, E2A, and TCF, had increased accessibility (Fig. 2c).
a Volcano plots showing DARs between normal CD34+CD38– HSCs and CD34+CD38+ progenitor cells (Open/Close in normal progenitor cells compared to normal stem cells). b Distribution of open and closed DARs in normal progenitor cells compared to normal stem cells. TSS, transcription start site; TTS, transcription termination site; UTR, untranslated region. c Transcription factor-binding motifs enriched in open and closed DARs in normal progenitor cells compared to normal stem cells. d List of strictly defined Lin–CD34+CD38– HSCs (HSCs, MPPs, and MLPs) and Lin–CD34+CD38+ progenitor cells (CMPs, GMPs, MEPs, and CLPs). e UMAP plots of normal HSPCs combined with MDS stem and progenitor pairs. f Heat map showing the enrichment of transcription factor binding motifs at the top 15% ATAC peaks of each HSPC fraction. The z-scores of the -logP values are shown as motif-enrichment scores. Each transcription factor was assigned to one of the HSPC fractions in which binding motif enrichment showed the highest scores and was at least 1.5-fold higher than the average of the other HSPC fractions. Source data are provided as a Source Data file.
To profile transcriptional network alterations during normal differentiation more in detail, we acquired chromatin accessibility profiles of more strictly defined Lin–CD34+CD38– HSCs [HSCs, multipotent progenitors (MPPs), and multilymphoid progenitors (MLPs)] and Lin–CD34+CD38+ progenitor cells [common myeloid progenitors (CMPs), granulocyte-macrophage progenitors (GMPs), megakaryocyte-erythroid progenitors (MEPs), and common lymphoid progenitors (CLPs)] (Fig. 2d, Supplementary Fig. 5a). UMAP plots of normal hematopoietic stem and progenitor cells (HSPCs) combined with disease stem and progenitor pairs revealed that most MDS pairs had altered profiles, largely tracing the differentiation paths of normal HSPCs (Fig. 2e). We next performed motif enrichment analysis using the top 15% ATAC peaks of each fraction (Fig. 2f). To understand the cell-type specificity of each transcription factor, each such factor was assigned to one of the HSPC fractions in which its binding motif enrichment showed the highest scores and was at least 1.5-fold higher than the average of the other HSPC fractions (Fig. 2f). Motifs related to lymphoid differentiation, including EBF, E2A, and TCF, were specifically enriched in CLPs, whereas the CEBP and GATA families were enriched in GMP and MEP, respectively. In contrast, AP-1 and HOX families were enriched in HSCs and MLPs, respectively. These results clearly demonstrate dynamic alterations in chromatin accessibility at the target binding sites of each transcription factor during normal adult BM hematopoietic differentiation, similar to cord blood hematopoietic differentiation25. The cell-type specificity of each transcription factor was applied to Fig. 2c and subsequent analyses. Transcription factor motif analysis was performed using the average of two normal samples. When analyzed individually, most transcription factor motifs showed consistent trends between the two samples; however, the AP-1 family motifs were not enriched in HSCs and MPPs in one of the two samples (Supplementary Fig. 5b). The reason for this discrepancy remains unclear. Nevertheless, in a separate analysis including four normal samples, AP-1 family motifs were significantly enriched in the stem cell fraction (including HSCs, MPPs and MLPs) compared to progenitor cells (Supplementary Fig. 7b). These findings suggest that chromatin regions containing the AP-1 family motifs are generally accessible in normal stem cell populations.
Next, we analyzed the changes in transcription factor networks in MDS and AML-MRC by focusing on transcription factors whose binding motifs were significantly enriched in open or closed DARs in MDS and AML-MRC stem and progenitor cells compared to their normal counterparts (Supplementary Fig. 6). A heatmap of motif enrichment scores was generated using the top 15% ATAC peaks for each disease fraction (Fig. 3a, Supplementary Fig. 7a). In MDS stem cells, chromatin accessibility at HOX, AP-1, and GATA target sites, which is high in stem cells compared to progenitor cells in normal controls, decreased. In MDS progenitor cells, chromatin accessibility at lymphoid transcription factor target sites, which is high in control progenitor cells, decreased. In contrast, chromatin accessibility at CEBP (myeloid) and KLF family (erythroid) transcription factor target sites increased in both MDS stem and progenitor cells compared to normal controls. These results indicate that some HSC and lymphoid-specific transcriptional networks were inactivated, whereas myeloid transcriptional networks were moderately activated in MDS stem cells, suggesting that precocious myeloid programming occurs in MDS stem cells.
a Heat map showing the enrichment of transcription factor binding motifs at the top 15% ATAC peaks of each disease stem and progenitor fraction. Z-scores of -logP values are shown as motif enrichment scores. Transcription factor-binding motifs whose -logP value of motif enrichment was higher than 30 at the DARs of any disease subtype, as shown in Supplementary Fig. 6, were selected. The HSPC fractions in which the binding motif of each transcription factor was most enriched in Fig. 2f are indicated at the bottom. b Pearson correlation coefficients between motif enrichment scores and gene expression of the corresponding transcription factors in MDS stem (left) and progenitor (right) samples. Motif enrichment of the transcription factor-binding motifs was calculated by averaging the ATAC peak values of MDS DARs (DARs between MDS-MLD or MDS-EB and normal samples) with the corresponding binding motifs. c Correlation between the enrichment of transcription factor-binding motifs at DARs and the expression of corresponding transcription factor genes during disease progression. d Transcription factor networks of MDS stem (left) and progenitor cells (right) constructed by the GeneNet method of GeNeCK using the motif enrichment score of each sample indicated in Fig. 3b and visualized using Cytoscape. The color and width of the edge indicate the partial correlation coefficient, reflecting the strength of the linear relationship between two transcription factor motifs, independent of other motifs. The size of the oval node represents the maximum enrichment score in the disease group. As the AP-1 family motifs showed high enrichment scores at closed DARs, they are depicted as relatively large blue ovals. Motifs were selected based on the criteria described in Fig. 3a for MDS stem and progenitor cells. Source data are provided as a Source Data file.
For a more quantitative evaluation of motif enrichment of transcription factors, we calculated the motif enrichment of each transcription factor in individual samples by taking the average of ATAC peak values of MDS DARs (DARs between MDS-MLD or MDS-EB and normal samples) with corresponding binding motifs. We then calculated the Pearson correlation coefficients between motif enrichment and gene expression for each transcription factor (Fig. 3b). Enrichment of the CEBPA and AP-1 motifs significantly correlated with their expression levels in MDS stem cells, whereas enrichment of the HOXA9 motif did not (Fig. 3b, c). Next, we determined transcription factor networks based on the correlation of the motif enrichment scores of the transcription factors in each sample (Fig. 3d). Several distinctive clusters were identified in MDS stem cells, including the AP-1 and homeobox families (decreased chromatin accessibility) and the CEBP and KLF/SP families (increased chromatin accessibility). Interestingly, chromatin accessibility tended to change in a manner similar to that of normal progenitor cells, confirming the acquisition of progenitor phenotypes by MDS stem cells. In contrast, many transcription factors, including lymphoid-related transcription factors, reduced chromatin accessibility in MDS progenitor cells, although they did not form clear clusters. Only the KLF/SP family was identified as a cluster that increased chromatin accessibility.
Chromatin accessibility reveals transcriptional networks associated with MDS prognosis
Next, we evaluated the association between MDS chromatin accessibility profiles and prognosis. IPSS-M is a scoring system that stratifies patient risk into six groups by combining somatic gene mutation data with hematological and cytogenetic parameters14,15. We calculated the Pearson correlation coefficients between the IPSS-M scores and individual ATAC peak values (Fig. 4a). We identified 3405 and 2980 positively and 3117 and 1961 negatively correlated peaks in MDS stem and progenitor cells, respectively (q < 0.05) (Fig. 4a), and performed motif analysis (Fig. 4b). In MDS stem cells, CEBP/AP-1 and HOX family motifs were significantly enriched in the positively and negatively correlated peaks, respectively. In progenitor cells, the ETS family and lymphoid transcription factor motifs were enriched in the positively and negatively correlated peaks, respectively. These results suggest that these transcription factors are significantly associated with MDS prognosis. Among the IPSS-M parameters, the percentage of blasts showed a strong positive correlation with the enrichment of AP-1 family binding motifs in both MDS stem and progenitor cells (Fig. 4c, d). Other hematological parameters and karyotype scores also showed differential associations with the enrichment of transcription factor-binding motifs (Supplementary Fig. 8).
a Correlation between IPSS-M scores and individual ATAC peak values. Pearson’s correlation coefficients (r) were also calculated (left panel) with biological replicates of MDS samples (n = 28), and two-sided P values were obtained using the t-distribution with (n − 2) degrees of freedom under the null hypothesis of zero correlation. Violin plot showing the distribution of significantly positively or negatively correlated peaks (p < 0.05) in MDS stem and progenitor cells (right panel). The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean. b Enrichment of transcription factor-binding motifs at the selected ATAC peaks in (a) showing significant positive and negative correlations with IPSS-M in MDS stem and progenitor cells. c Correlation between blast percentage and individual ATAC peak values as in (a) with biological replicates of MDS samples (n = 28). Violin plot showing the distribution of significantly positively or negatively correlated peaks (p < 0.05) in MDS stem and progenitor cells (right panel). The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean. d Enrichment of transcription factor-binding motifs at the selected ATAC peaks in (c) showing significant positive and negative correlations with blast percentage in MDS stem and progenitor cells, as in (b). Source data are provided as a Source Data file.
Alterations in DARs unique to MDS stem and progenitor cells
To investigate the characteristics of MDS DARs in detail, we compared their peak values. This analysis revealed that more than half of the MDS DARs fluctuated between stem and progenitor states and could be grouped into several characteristic clusters (Fig. 5a). In clusters 1-6, DARs exhibited dynamic switching between stem and progenitor states in normal controls (Fig. 5b). In MDS progenitor cells, clusters 1-3 followed normal progenitor chromatin profile, with MDS stem cells exhibiting a shift toward this state. Conversely, in clusters 4-6, MDS stem cells retained a normal stem-like state, while MDS progenitors partially reverted to this stem-like profile (Fig. 5b). While clusters 1, 2, and 3 became open and closed during normal differentiation, they became precociously open and closed in MDS stem cells (Fig. 5b). In contrast, clusters 4, 5, and 6 resisted chromatin reorganization between stem and progenitor cells and maintained stem-like chromatin accessibility in MDS progenitor cells (Fig. 5b). Motif analysis revealed that in MDS stem cells, the CEBP motif was highly enriched in cluster 1, which opened precociously in MDS stem cells, whereas the AP-1 and HOX family motifs were enriched in clusters 2 and 3, which closed precociously in MDS stem cells (Fig. 5c). In the MDS progenitor cells, GATA motifs were enriched in cluster 4, whereas lymphoid transcription factor motifs were enriched in clusters 5 and 6 (Fig. 5c). These results indicate bidirectional changes of MDS stem and progenitor cells toward progenitor and stem cells, respectively. Interestingly, clusters 7–9 and 10, which did not significantly change their chromatin accessibility during normal differentiation, aberrantly increased and decreased chromatin accessibility in both MDS stem and progenitor cells (Fig. 5d), representing MDS-specific chromatin reorganization that is not associated with normal differentiation. The CEBP and KLF motifs were enriched in clusters 7–9, which increased their accessibility in MDS stem and progenitor cells (Fig. 5e). These results revealed transcription factor networks unique to MDS stem and progenitor cells.
a K-means clustering of the peak values of the MDS DARs. Characteristic peak value changes in clusters 1-10 are boxed (increased and decreased peak values in the red and blue boxes, respectively). b, d Schematic representation of peak signal changes in clusters 1-10. c, e Enrichment of transcription factor-binding motifs at DARs in clusters 1-10. Source data are provided as a Source Data file.
Progenitor score is associated with prognosis
The data thus far demonstrate that the transcriptional networks in MDS stem and progenitor cells are gradually reprogrammed during disease progression, which may determine disease status. To define the reprogramming status of MDS stem cells toward progenitor cells based on chromatin accessibility, we developed a scoring system called the progenitor score (Fig. 6a). We selected MDS DARs that were differentially accessible during normal differentiation. We arbitrarily set the peak values of each DAR in the normal stem to 0 and the normal progenitor to 1. We then adjusted the peak values of each DAR in the disease samples to the 0–1 scale relative to those in their normal counterparts and took the average of all DARs in the disease stem and progenitor cells. According to our definition, the progenitor scores of normal stem and progenitor cells were calculated as 0 and 1, respectively (Fig. 6b). The progenitor scores in MDS stem cells gradually increased with disease progression (Fig. 6b), indicating the acquisition of progenitor phenotypes by MDS stem cells. These data clearly show that the stem-progenitor hierarchy is gradually lost during disease progression.
a Definition of progenitor scores. MDS DARs, which are also differentially accessible during normal differentiation, were selected to define progenitor scores. DARs were divided into four subgroups depending on chromatin accessibility changes in the normal control, MDS stem, and progenitor cells, as shown schematically, and their numbers are indicated. The peak values of each DAR in normal stem and progenitor cells were arbitrarily set to 0 and 1, respectively. The peak values of each DAR in diseased samples were adjusted to the 0–1 scale relative to those in their normal counterparts, and the average of all DARs in disease stem and progenitor cells were taken as progenitor scores. b Violin plots showing progenitor scores of normal and disease stem and progenitor cells with biological replicates of normal (n = 4), ICUS/MDS-MLD (n = 15), MDS-EB (n = 15) and AML-MRC samples (n = 4). The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean. Statistical significance was determined by the paired, two-tailed Student’s t-test. (c, d) Correlation of Progenitor scores with IPSS-M scores in MDS stem (c) and progenitor cells (d). e Progenitor score differences between progenitor and stem cells. The fitted least-squares linear regression (y ~ x) is shown as a solid line, with the 95% confidence band shaded (c, d, e). f Correlation of progenitor scores with survival. The cutoff values of IPSS-M scores and progenitor score differences to analyze their correlation with survival are indicated in (e). Kaplan-Meier survival curves of the IPSS-M high and low MDS groups (left), and progenitor score high and low MDS groups (right). Statistical significance was calculated using the log-rank test. g Evaluation of prognostic models in survival analysis. Concordance indexes (C-indexes) obtained with IPSS-M, progenitor score differences, and the combination of IPSS-M and progenitor score differences are depicted. Source data are provided as a Source Data file.
We examined the correlation between progenitor and IPSS-M scores. The progenitor scores of stem cells positively correlated with the IPSS-M scores and showed significantly better correlations than those of progenitor cells (Fig. 6c, d). Among the DARs used for progenitor score calculation, 887 DARs that became open in normal progenitor cells and MDS stem cells compared with normal stem cells showed the strongest positive correlation with IPSS-M in MDS stem cells (Supplementary Fig. 9a). Among these 887 DARs, 192 had a CEBP-binding motif and showed a strong positive correlation with IPSS-M in MDS stem cells (Supplementary Fig. 9b), confirming the correlation of increased accessibility at CEBP target sites in MDS stem cells with IPSS-M. Moreover, progenitor score differences between progenitor and stem cells (progenitor minus stem cells) showed better correlation with IPSS-M than progenitor scores in MDS stem cells (Fig. 6e), indicating that the level of stem-progenitor hierarchy is well correlated with prognosis. We then calculated progenitor scores based on RNA-seq data using the same framework, with normalized gene expression counts substituted for normalized peak counts. In contrast to ATAC-seq data, progenitor scores calculated using RNA-seq data did not show a significant correlation with IPSS-M scores (Supplementary Fig. 9c). The genomic distribution of the DARs used to calculate the Progenitor score closely resembled that of the corresponding MDS DARs (Supplementary Fig. 9d). It is known that chromatin accessibility at promoter-TSS regions tends to correlate more strongly with gene expression levels than accessibility at other genomic regions. However, the majority of DARs were located in non-TSS regions. Therefore, ATAC-seq data captures chromatin dynamics beyond promoter regions, which may explain why chromatin accessibility correlates more strongly with disease severity than transcriptomic data.
Next, we examined whether the progenitor scores were associated with patient survival. We arbitrarily set the cut-off IPSS-M scores and progenitor score differences (Fig. 6e) and analyzed survival. As previously reported, the survival rates of the high and low IPSS-M groups differed significantly (Fig. 6f, left panel). Notably, the low progenitor score difference group also showed significantly inferior survival compared to the high progenitor score difference group (Fig. 6f, right panel). To evaluate the correlation between actual survival times and the predictions by progenitor score differences, we calculated the Concordance index (C-index), a commonly used metric in survival analysis for evaluating the performance of a prediction mode33,34. Notably, C-index of the progenitor score difference was moderately higher than that of IPSS-M; however, combining IPSS-M with the progenitor score difference did not result in an additive improvement in the C-index (Fig. 6g). These results indicate that differences in progenitor scores between stem and progenitor cells are strongly associated with patient prognosis.
Among the IPSS-M parameters, progenitor scores were significantly correlated with karyotypes in stem cells and blast percentages in progenitor cells. As a result, progenitor score differences were also significantly correlated with both karyotype and blast percentage (Supplementary Fig. 9e, f). In contrast, progenitor scores did not show significant correlations with the other parameters (Supplementary Fig. 9e). These results suggest that the progenitor score serves as an independent risk stratification factor for MDS progression.
Because it was challenging to directly compare MDS samples with normal samples across well-defined subpopulations of HSPCs, we instead evaluated the correlation of ATAC peak profiles between MDS stem and progenitor fractions and normal HSPC subsets, including HSCs, MPPs, MLPs, CMPs, GMPs, MEPs, and CLPs. We first extracted ATAC peaks strongly associated with the progenitor score. Using these peaks, we trained a machine learning model to predict the progenitor score and selected the peaks that highly contributed to the prediction (Supplementary Fig. 10a). These progenitor score-associated peaks were then used to calculate correlation coefficients between MDS and AML-MRC samples and normal HSPC fractions, yielding an HSPC signature score of each sample (Supplementary Fig. 10a). In the MDS stem cell fraction, GMP-, MEP-, and CLP-like signatures were positively correlated with the progenitor score, whereas HSC- and MPP-like signatures showed negative correlations (Supplementary Fig. 10b). MLP- and CMP-like signatures did not show a clear relationship with the progenitor score. In contrast, no clear correlation was observed between any HSPC signature and progenitor score in the MDS and AML-MRC progenitor cell fraction (Supplementary Fig. 10b). These findings suggest that MDS stem cells acquire progenitor-like chromatin accessibility patterns at the expense of HSC and MPP programs.
To compare AML-MRC and de novo AML in terms of loss of the stem-progenitor hierarchy, we newly obtained five de novo AML samples and performed ATAC-seq. UMAP analysis of chromatin accessibility profiles revealed that de novo AML display highly similar profiles to AML-MRC, with minimal differences between stem and progenitor fractions compared to their normal counterparts (Supplementary Fig. 11a, b). Although the number of DARs in de novo AML was greater than in AML-MRC, DARs identified in AML-MRC and de novo AML overlapped substantially (Supplementary Fig. 11c). Moreover, when comparing stem and progenitor fractions within de novo AML, only a few significant DARs were detected, similar to findings in AML-MRC (Supplementary Fig. 11d). Consistently, the progenitor scores of de novo AML samples were nearly identical to those of AML-MRC (Supplementary Fig. 11e). These results suggest that although de novo AML undergoes more extensive chromatin remodeling during transformation, both de novo AML and AML-MRC similarly lose the stem-progenitor chromatin hierarchy and acquire a shared set of DARs.
CEBP family transcription factors mediate myeloid reprogramming of MDS stem cells
The CEBP family of transcription factors is a key regulator of myeloid differentiation. Among the CEBP factors, the expression of CEBPA increased moderately in MDS and was high in AML-MRC in both stem and progenitor cells (Fig. 3b, c). To investigate the role of CEBP transcription factors in MDS, we tested CEBP inhibition by expressing a dominant-negative CEBP (DN-CEBP)35,36, which antagonizes all CEBP members, in the MDS cell line MDS-L37,38, and performed ATAC-seq (Fig. 7a). Among the 129 DARs that became closed (q < 0.05, FC > 2.0) following DN-CEBP expression, 85 DARs contained CEBP-binding motifs (Fig. 7b). Consistently, CEBP inhibition markedly reduced chromatin accessibility at ATAC peaks harboring CEBP motifs and at DARs with CEBP motifs that were specifically open in MDS stem cells relative to normal stem cells (Fig. 7c). These results support the notion that activation of the myeloid transcriptional network in MDS stem cells is mediated, at least in part, by CEBP family transcription factors.
a CEBP inhibition by expressing dominant-negative CEBP (DN-CEBP) in the MDS cell line MDS-L. b Volcano plots showing DARs (q < 0.05, FC > 2.0) in MDS-L cells expressing DN-CEBP (n = 3 biological replicates) compared with parental cells (n = 3 biological replicates). DARs with CEBP motifs are shown in magenta. c Changes in ATAC peak signal levels in MDS-L cells by CEBP inhibition at all ATAC peaks with CEBP motifs and at DARs with CEBP motifs that become open in MDS stem cells compared with normal stem cells. The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean value. Statistical significance was determined by the paired, two-tailed Student’s t-test. Peak signals for the control and CEBP-DN were calculated as the mean of biological replicates (n = 3). d Clustering of 889 DARs with CEBP motifs that become open in MDS/AML stem cells using non-negative matrix factorization (NMF). The DARs were subdivided into three clusters (DAR1-3). MDS was stratified into three subgroups (MDS1-3) based on the chromatin accessibility profiles at DAR1-3. e Gene ontology terms significantly enriched in DAR 1-3. f Correlation of DAR1-3 with IPSS-M in MDS stem cells (n = 28 biological replicates). g Correlation of MDS subgroups 1-3 with progenitor scores in MDS stem cells (n = 28 biological replicates). The box plot represents the interquartile range (midline indicates the median) with whiskers (10th and 90th percentiles). + indicates mean value. Statistical significance was evaluated using the Tukey–Kramer post hoc test (two-sided) (f, g). The p values of the Jonckheere-Terpstra trend test are also shown. h Schematic representation of chromatin accessibility changes at the target sites of major transcription factors in MDS stem and progenitor cells. Myeloid transcriptional networks are moderately activated in MDS stem cells, while some HSC-specific transcriptional networks are inactivated in MDS stem cells, suggesting aberrant myeloid programming in MDS stem cells. Source data are provided as a Source Data file.
In RNA-seq analysis, non-negative Matrix Factorization (NMF) is commonly used to simultaneously identify latent sample clusters and extract underlying gene expression patterns by factorizing the gene expression matrix (genes × samples) into two non-negative matrices (W (genes × components k) and H (components k × samples)). We applied NMF to 889 DARs with CEBP-binding motifs that became accessible in MDS stem cells (Supplementary Fig. 12a). This analysis identified three distinct clusters of DARs, designated DAR1–3 (Fig. 7d, Supplementary Fig. 12b, c). Notably, most DAR1 and DAR3 corresponded to cluster 8 and cluster 1, respectively, as identified in Fig. 5a (Supplementary Fig. 12d). DAR2- and DAR3-related genes were significantly associated with gene ontology (GO) terms related to development/differentiation (DAR2) and leukocyte activation, migration, and immune processes (DAR2 and DAR3) (Fig. 7e). Moreover, DAR2 and DAR3 showed significantly stronger associations with IPSS-M scores than DAR1 (Fig. 7f). Based on the chromatin accessibility profiles of these CEBP-associated DARs, MDS samples were stratified into three subgroups (Fig. 7d and Supplementary Fig. 12e). Importantly, subgroup 3 exhibited the highest progenitor scores in MDS stem cells (Fig. 7g), suggesting a more advanced state of myeloid reprogramming. These findings indicate that chromatin accessibility at CEBP-target DARs can stratify MDS into subgroups with distinct transcriptional states and disease progression stages.
When MDS-L samples were overlaid onto the PCA plot with other samples, they mapped separately from both normal HSPCs and MDS stem/progenitor cells but positioned relatively closer to normal progenitor cells than to stem cells. Notably, overexpression of DN-CEBP shifted the chromatin accessibility profile of MDS-L cells further toward that of normal HSPCs (Supplementary Fig. 12f). We next investigated the chromatin regions that became less accessible upon DN-CEBP expression. Two-thirds of the closed DARs contained a CEBP binding motif, the majority of which were classified as CEBP DARs 2 and 3 (Supplementary Fig. 12g). Consistently, the peak signals at DARs 2 and 3 were significantly reduced in MDS-L cells upon DN-CEBP overexpression (Supplementary Fig. 12h). These results suggest that DN-CEBP overexpression reprograms the chromatin accessibility landscape of MDS-L cells to more closely resemble that of normal progenitor cells, primarily by attenuating chromatin accessibility at CEBP DARs 2 and 3.
We then analyzed three representative GO term-associated gene sets extracted from CEBP DARs 2 and 3 (Supplementary Fig. 12i). Gene set enrichment was assessed in each sample using gene set variation analysis (GSVA) based on the ATAC signal profiles. The resulting GSVA enrichment scores showed a significant positive correlation with both the IPSS-M score and the progenitor score (Supplementary Fig. 12i). In contrast, when GSVA was performed using transcriptomic data for genes associated with the same ATAC peaks, the correlations with IPSS-M and progenitor scores were relatively weaker (Supplementary Fig. 12j). These findings support that the ATAC-based signature extracted from CEBP DARs is more strongly correlated with disease prognosis than the corresponding gene expression-based signature.
In addition to CEBP motifs, AP-1 family motifs were significantly enriched in ATAC peaks correlated with IPSS-M, although their enrichment in MDS stem cells was weaker than that of CEBP motifs (Fig. 4b). Notably, AP-1 motif accessibility showed the strongest correlation with blast percentage (Fig. 4d). Among the AP-1 target peaks, a subset of 276 peaks was significantly associated with blast percentage. Interestingly, most of these peaks exhibited only moderate changes in chromatin accessibility below the threshold used to define DARs (Supplementary Fig. 13). This suggests that these regions may resist overt chromatin remodeling in MDS, yet remain in a primed state susceptible to reprogramming during leukemic transformation.
Lastly, we examined the relationship between gene mutation status and either progenitor scores or IPSS-M. No statistically significant associations were observed, except for a correlation between TP53 mutations and IPSS-M, likely due to the limited sample size (Supplementary Fig. 14).
Discussion
In this study, we found that chromatin accessibility in stem cells more accurately reflects disease status than either chromatin accessibility in progenitor cells or gene expression profiles in stem cells. As MDS progresses, the stem-progenitor hierarchy is gradually lost, with MDS stem cells acquiring progenitor-like chromatin accessibility profiles. MDS stem cells exhibit moderate activation of myeloid transcriptional networks and suppression of HSC-specific transcriptional programs, indicating the emergence of aberrant myeloid programming (Fig. 7h). CEBP family transcription factors appear to play a central role in driving this activation of myeloid transcriptional networks. Notably, chromatin accessibility at DARs harboring CEBP-binding motifs stratified MDS into subgroups with distinct disease stages. The CEBP family is a key regulator of myeloid differentiation, with C/EBPα being particularly critical for driving myelopoiesis39,40,41. C/EBPα also acts as a pioneer transcription factor that primes HSPCs for myeloid commitment by binding to chromatin at myeloid gene loci42. While loss-of-function mutations in CEBPA are implicated in certain AML subtypes39,40, CEBPA expression at the mRNA level remains intact in all AML subtypes40, suggesting its dual role in AML pathogenesis. Preleukemic HSCs with driver mutations are known to undergo leukemic transformation during differentiation into GMPs, a process in which CEBPα has been implicated21,22. However, its role in MDS progression has not been thoroughly explored. Our data showed that chromatin regions with CEBP motifs, which typically gain accessibility during normal differentiation (Fig. 5a, MDS DAR cluster 1), also exhibited increased accessibility in MDS stem cells, thereby precociously activating the myeloid differentiation program. In addition, we identified unique chromatin regions aberrantly activated in MDS stem and progenitor cells (Fig. 5a, clusters 7–9), distinct from those in normal differentiation, although their pathological significance remains to be determined. Given that chromatin accessibility of CEBPα-binding motifs showed the strongest correlation with disease progression and prognosis among all transcription factors analyzed, these findings suggest that CEBPα-mediated myeloid priming or reprogramming in MDS stem cells may be a key driver of MDS progression.
We also observed increased chromatin accessibility at KLF-binding regions during the progression from MDS to AML. KLF family transcription factors, which regulate hematopoietic cell differentiation and function43,44, are generally downregulated at the mRNA level in AMLs45,46. Nevertheless, KLF family transcription factors have been identified as a core regulatory cluster in AML regardless of subtype20. In our analysis, KLF motifs were co-enriched with CEBP motifs in MDS DAR clusters 1 and 7–9, suggesting a function convergence of KLF and CEBP transcription factors in promoting leukemic transformation in MDS22. In contrast to de novo AML, the AP-1 family, which plays a broad regulatory role on de novo AML subtypes, and the HOX family, which is more subtype-specific in AML20, were negatively enriched in MDS and AML-MRC stem and progenitor cells. This unique pattern may, at least in part, underlie the distinct phenotypic features of MDS and AML-MRC compared to de novo AML. These findings suggest that MDS harbors unique transcription factor networks consisting of different combinations of transcription factors compared to AML. Interestingly, however, a subset of AP-1 target peaks showed maintained or increased chromatin accessibility with disease progression in MDS stem cells. These peaks correlated positively with both blast percentage and IPSS-M scores, suggesting that a distinct subset of AP-1 target regions exhibits differential behavior and may be specifically associated with leukemic transformation.
MDS frequently progresses to secondary AML, which accounts for the poor prognosis of patients1,8,11. Prognostic assessment has traditionally relied on IPSS-R13, and more recently, on IPSS-M14. However, IPSS-M is still insufficient15. In this study, we defined “progenitor score” based on the chromatin accessibility to quantify progenitor cell-like reprograming in MDS stem cells. This score correlated strongly with disease progression and overall survival. Among the components of the IPSS-R and IPSSR-M scores, the progenitor score was significantly associated with blast percentage and cytogenetic abnormalities, but not with other clinical parameters, suggesting that it serves as an independent risk factor for MDS progression. Our study revealed a partial dissociation between ATAC-seq and RNA-seq data in relation to disease severity (Fig. 6a-e, Supplementary Fig. 9c). Although biologically relevant changes in chromatin accessibility changes are generally expected to manifest as transcriptional alterations, this correspondence was not uniformly observed. It is important to note that, progenitor scores based on RNA-seq data were generated using the same framework as those based on ATAC-seq data, with normalized gene expression counts substituted for normalized peak counts. Thus, the observed differences are likely attributable to the distinct biological features captured by each method. One distinction relates to numerical tendencies. Both gene expression values and chromatin accessibility peak intensities were derived from discrete read counts at specific genomic regions, normalized, and compared for variability using the R package DESeq2. However, gene expression values exhibited a broader dynamic range of change than peak intensities in this study, and the total number of factors (genes vs. peaks) as well as the number of variable factors (DEGs vs. DARs) were smaller in RNA-seq compared to ATAC-seq (Fig. 6a, Supplementary Fig. 9c). Consequently, progenitor scores derived from RNA-seq data may have been more strongly influenced by a limited number of high-value factors than those derived from ATAC-seq data. Another difference reflects the nature of transcriptional regulation. Alterations in the transcriptional regulatory system can affect not only regions that directly influence gene expression but also those harboring transcription factor binding motifs with limited direct transcriptional effects. As a result, ATAC-seq tends to emphasize changes associated with regulatory landscapes more prominently than gene expression-based analyses. Although it is difficult to quantitatively assess which of these features contributed more substantially to the discrepancy, our findings suggest that ATAC-seq is methodologically more sensitive for capturing alterations in regulatory landscapes.
Importantly, the progenitor score in MDS stem cells showed a strong correlation with the enrichment of CEBP-binding motifs, which have been implicated in the transformation of preleukemic HSCs21,22. Therefore, the progenitor score might reflect the biological state of preleukemic MDS stem cells and LSCs, justifying its utility in predicting disease progression and prognosis. Moreover, our findings suggest that chromatin properties of MDS stem cells better capture the cell-autonomous behavior and progression of MDS than transcriptomic profiles. Collectively, these data indicate that MDS-specific chromatin signatures represent promising biomarkers for disease stratification and provide deeper insights into MDS pathogenesis.
This study has several limitations. First, the purity of MDS cells in the BM may influence the accuracy of our analysis. Analysis of the variant allele frequency (VAF) of mutations revealed an average VAF of 0.45 (range: 0.07–0.81) (Supplementary Data 2), indicating that MDS cells comprise a major population in the BM of most samples analyzed. However, VAFs were not directly assessed in the CD34+ fractions used for chromatin profiling, which may limit precise estimation of clonal burden within the analyzed cell populations. Second, due to ethical constraints, it was very difficult to obtain age-matched normal BM samples. The normal BM samples used in this study were obtained from a commercial distributor and were exclusively from younger individuals compared to the MDS cohort. This age difference may serve as a potential confounding factor in the interpretation of age-sensitive chromatin and transcriptional changes. Third, we relied on conventional surface marker panels established in normal hematopoiesis to define MDS stem and progenitor cell compartments. However, as shown in Supplementary Fig. 1, CD34⁺CD38⁻/low MDS stem cells display quantitative and qualitative heterogeneity in surface marker expression that does not fully align with their normal counterparts. This heterogeneity may affect the interpretation of our findings and should be considered as a limitation of this study. Future investigations using single-cell RNA-seq or ATAC-seq approaches will be essential to more accurately resolve cellular identities and functional states in MDS. Fourth, our findings highlight that molecular identities of HSPCs may shift dynamically during MDS evolution. However, it remains unclear whether the changes we describe arise from true reprogramming of stem cells, or from altered clonal dynamics that favor progenitor expansion. Extensive literature suggests that AML cells-of-origin often reside within the progenitor rather than the stem cell compartment. In this context, it is important to analyze longitudinal sequential patient specimens to directly distinguish between these two possibilities. Lastly, progenitor scores were significantly associated with blast percentages, raising the possibility that the chromatin accessibility dynamics could be influenced, at least in part, by differences in blast burden. While VAF distributions across cohorts suggested comparable clonal architecture, VAF does not directly reflect the relative abundance of malignant primitive subsets. Further studies incorporating single-cell ATAC-seq and RNA-seq data will be essential to validate whether our conclusions hold at the level of malignant stem and progenitor populations.
Methods
Patients and ethics
BM samples were collected from patients who provided informed consent, in accordance with the Declaration of Helsinki. All the patients were untreated. All experiments were approved by the ethics committee of Chiba University (approval ID: 964), the Institute of Medical Science, The University of Tokyo (approval ID: 30-47-B1002, 2024-48-1029), and the cooperating hospitals, namely, Edogawa Hospital (approval ID: 42), Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital (approval ID: 2203), and Juntendo University Nerima Hospital (approval ID: 2018078).
Purification of stem and progenitor cells
Mononuclear cells were isolated from BM samples using Lymphoprep (STEMCELL Technologies) and then reacted with CD34 magnetic beads (Miltenyi Biotec) in the presence of Human Fc Receptor Blocking Solution (Miltenyi Biotec), followed by the purification of CD34-positive cells using MACS separation LS columns (Miltenyi Biotec). The cells were stored in a nitrogen tank. Normal BM samples from four healthy individuals were purchased from Lonza and processed in the same manner. Cells were thawed by the dropwise addition of IMDM containing 50% fetal bovine serum (FBS), washed, and resuspended in PBS containing 2% FBS. Cells were stained with anti-CD34 (BioLegend), anti-CD38 (BioLegend), and an anti-human lineage cocktail (a mixture of anti-CD3, CD14, CD16, CD19, CD20, and CD56; BioLegend). Normal and MDS stem (Lineage–CD34+CD38–) and progenitor (Lineage–CD34+CD38+) cells were sorted using FACSAria IIIu (BD Bioscience). Dead cells were removed by staining with 0.5 μg/ml of propidium iodide (Sigma-Aldrich, USA). Normal BM CD34+ cells from three healthy individuals purchased from Lonza were stained with anti-CD34 (BioLegend), anti-CD38 (BD Biosciences), anti-CD90 (BD Pharmingen), anti-CD45RA (BD Pharmingen), anti-CD135 (BD Biosciences), and anti-CD10 (BioLegend) antibodies and sorted as shown in Fig. 2d.
ATAC sequencing
For the ATAC sequence, a transposase reaction was performed using nuclei prepared from freshly sorted 20,000 cells from each fraction. Libraries were generated using a NEBNext Ultra DNA Library Prep Kit (New England BioLabs, Ipswich, MA, USA). Library DNA was size-selected (240–360 bps) using BluePippin (Sage Science, Beverly, MA, USA). Sequencing was performed using a HiSeq1500 or HiSeq2500 (Illumina) with a single-read sequencing length of 60 bp. Sequences were aligned to human genome sequences (hg19) using Bowtie2 (version 2.3.4.3)47 with the default settings. Peaks were called using MACS2 (version 2.2.6)48 with a q-value cutoff of <0.00001 and model settings. The catalog of all peaks called in any sample was produced by merging all called peaks that overlapped by at least one base pair using the bedtools (version 2.27.1) merge function. We then removed the regions that were detected in only one sample. A total of 66,848 peaks were detected and used as map files for downstream processing. The bedtools map function was used to count the reads in each region of the catalog using the bed files of each sample. A read count matrix was used for the normalization and detection of differentially accessible peaks (DARs) using DESeq2 (version 2.2.1)49. For the heatmap, the normalized read counts were z-score-scaled and plotted. For motif analysis, findMotifsGenome. pl. of Homer (version 4.11.1)50 was used with the -size 200 -mask option and all catalog regions were used as the background. For the annotation of peaks, annotatePeaks.pl of Homer was used with default settings.
RNA sequencing
Total RNA was extracted using the RNeasy Plus Micro Kit (Qiagen) and subjected to reverse transcription and amplification using the SMARTer Ultra Low Input RNA Kit (Clontech). After sonication using an ultrasonicator (Covaris), RNA-seq libraries were generated from the fragmented DNA with eight cycles of amplification using the NEB Next Ultra DNA Library Prep Kit (New England BioLabs). After the libraries were quantified using a Bioanalyzer (Agilent Technologies), sequencing was performed using a HiSeq2500 (Illumina) with a single-read sequencing length of 60 bp. Sequences were aligned to the reference human genome (hg19) using HISAT2 (version 2.1.0)51 and Bowtie2 (version 2.3.4.3) with the default parameters. Normalization and significant differences in expression were detected using DESeq2 (version 2.2.1) with raw counts generated using StringTie (version 1.3.4)52. Transcripts Per Million (TPM) were calculated using StringTie (version 1.3.4). Differentially expressed genes (DEGs) were defined as adjusted p-value < 0.05. Gene set enrichment analysis (GSEA) was performed using R package fgsea (version 1.30.0). Genes were ranked by DESeq2 Wald statistics and performed gene set enrichment using the Molecular Signatures Database (MSigDB) hallmark gene sets (v2024.1.Hs).
Targeted sequencing
Targeted sequencing was performed as described previously53,54. DNA was extracted using a Gentra Puregene Blood Kit (Qiagen) and quantified using a Qubit dsDNA HS Assay Kit (Life Technologies). For the panel analysis, libraries were prepared using the Human Myeloid Neoplasms Panel (Qiagen) and sequenced with 150 bp paired-end reads using the MiSeq with Reagent Kit v2 (Illumina). The human genome (hg19) was used as a reference for mapping. To identify single-nucleotide variants, we used a pipeline consisting of Genomon2 (https://genomon.hgc.jp/exome/en/). To identify structural polymorphisms, we used a pipeline comprising Pindel (http://gmt.genome.wustl.edu/packages/pindel/) and Genomon SV.
Data analysis
Motif enrichment score of transcription factors
To calculate the motif enrichment scores of HSPCs (Fig. 2f) and diseases (Fig. 3a), we took the average of normalized ATAC peak values of all samples and selected top 15% peaks, then performed motif enrichment analysis. -logP values of enrichment were used as the motif enrichment scores. For a more quantitative evaluation of motif enrichment of transcription factors, we calculated the motif enrichment of each transcription factor in individual samples by taking the average of the ATAC peak values of MDS DARs (DARs between MDS-MLD or MDS-EB and normal samples) with corresponding binding motifs. Normalized counts of gene expression and motif activity values were used to calculate Pearson correlations.
Transcription factor networks
Transcription factor networks of MDS stem and progenitor cells were constructed using the GeneNet method (Linear shrinkage estimator for a covariance matrix followed by a Gaussian graphical model (GGM) selection based on the partial correlation obtained from the shrinkage estimator) of GeNeCK (https://lce.biohpc.swmed.edu/geneck/)55 using the motif enrichment score of each sample indicated in Fig. 3b and visualized using Cytoscape 3.10.2 (https://cytoscape.org/)56.
Progenitor score
MDS DARs, which were also differentially accessible during normal differentiation, were selected to define progenitor scores. DARs were divided into four subgroups depending on their chromatin accessibility patterns in normal controls, MDS stem cells, and progenitor cells, as shown schematically in Fig. 6a. For each DAR, the peak values in normal stem and progenitor cells were arbitrarily set to 0 and 1, respectively. The peak values in MDS samples were then adjusted to the 0–1 scale relative to those in their normal counterparts, and the average of all DARs was calculated to obtain progenitor scores for disease stem and progenitor cells. Progenitor scores based on RNA-seq data were generated using the same framework, with normalized gene expression counts substituted for normalized peak counts.
Clustering of DARs with CEBP motifs
The data matrix, comprising normalized counts of DARs with CEBP motifs in MDS stem cell samples, was factorized into two matrices using the NMF package in R. This analysis allowed for the simultaneous clustering of both DARs and MDS stem cell samples into three groups. The optimal factorization rank (i.e., number of latent components) was estimated using the nmfEstimateRank function, which determines the most appropriate number of factors for matrix decomposition based on internal quality metrics.
CEBP inhibition in MDS-L cells
MDS-L cells37,38, kindly provided by Dr. Kaoru Tohyama (Kawasaki Medical School, Japan), were cultured in RPMI-1640 supplemented with 10% FBS (Sigma-Aldrich), 1% penicillin/streptomycin (Gibco), and human interleukin-3 (IL-3; 10 ng/mL, BioLegend). Cells were transfected with a pGCsam retroviral vector harboring either the mock or CEBP-dominant negative form (CEBP-DN)35,36. Five days after transduction, the transduced cells were purified by FACS Aria III (BD Biosciences) using GFP as a marker, and then processed for RNA- and ATAC-sequencing analysis.
Functional enrichment analysis
Functional enrichment analysis was performed using g:Profiler (version: e110_eg57_p18_4b54a898)57 with g:GOSt functional enrichment analysis applying a significance threshold of 0.05.
Survival analysis
For survival analysis, we use the coxph function from the R package “survival” to apply a Cox regression model and calculate Harrell’s concordance index (C-index)33,34. A C-index of 0.5 is considered a random predication and a value of 1.0 suggests that the model is a perfectly discriminating one. Kaplan-Meier method for survival curve and calculation of p-value by log-rank test were performed using the survfit and survdiff functions from the R package “survival”, respectively.
Select ATAC peaks highly contributing to the progenitor score
To identify ATAC peaks that strongly contribute to the progenitor score, we applied a machine learning approach using the randomForest package in R. The set of peaks used for progenitor score calculation was applied to a machine learning to predict the progenitor score. The optimal value of mtry (number of variables randomly sampled as candidates at each split) was determined using the tuneRF function, and the nodesize parameter (minimum size of terminal nodes) was set to 3. The number of trees was set equal to the number of explanatory variables. For each iteration, 2/3 of the explanatory variables were randomly selected, and the prediction was repeated 40 times. Peaks were considered to contribute highly to the progenitor score if the median relative importance (% IncMSE) across the 40 iterations was ≥ 0. The Median %IncMSE values of highly contributed peaks are presented in Supplemental Table 3.
Gene set variation analysis (GSVA)
GSVA enrichment scores for defined gene sets were calculated using the GSVA package in R. A matrix of normalized ATAC peak counts for each sample was used as input. Each peak was annotated to the gene whose TSS is closest to the center of the ATAC peak region, and the resulting gene-by-sample matrix was used for enrichment score calculation.
Statistical analysis
For statistical analysis, R version 3.5.0 to 4.4.0 (https://www.r-project.org/) was used. P values were calculated using an unpaired, two-tailed Student’s t-test to compare the progenitor score of stem and progenitor cells or peak signals of control and CEBP-DN, and were defined as significant when the P value was <0.05. In the differentially accessible analysis using DESeq2, significance was defined as an adjusted P value of <0.05. The two-tailed p-value of the Pearson correlation was defined as significant when it was <0.05. No statistical method was used to predetermine sample size. Progenitor cells from the M06 sample were excluded from analyses of the progenitor score after outlier detection by Grubbs’ test (Supplementary Fig. 9a).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
RNA-seq, ATAC-seq, and targeted sequence data from human samples were deposited in the National Bioscience Database Center (NBDC) of Japan (NBDC Research ID: hum0384.v1, Dataset ID: JGAS000603, JGAS000718 [https://humandbs.dbcls.jp/en/hum0384-v2]). Sequence data from the MDS-L cell line were deposited in the DDBJ under accession code PRJDB17935. Source data are provided with this paper. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request. Source data are provided with this paper.
Code availability
The codes used for calculating the Progenitor score is available via Zenodo [https://doi.org/10.5281/zenodo.15876435].
References
Corey, S. J. et al. Myelodysplastic syndromes: the complexity of stem-cell diseases. Nat. Rev. Cancer 7, 118–129 (2007).
Woll, P. S. et al. Myelodysplastic syndromes are propagated by rare and distinct human cancer stem cells in vivo. Cancer Cell 25, 794–808 (2014).
Shastri, A., Will, B., Steidl, U. & Verma, A. Stem and progenitor cell alterations in myelodysplastic syndromes. Blood 129, 1586–1594 (2017).
Chen, J. et al. Myelodysplastic syndrome progression to acute myeloid leukemia at the stem cell level. Nat. Med 25, 103–110 (2019).
Hall, T., Gurbuxani, S. & Crispino, J. D. Malignant progression of preleukemic disorders. Blood 143, 2245–2255 (2024).
Ogawa, S. Genetics of MDS. Blood 133, 1049–1059 (2019).
Haferlach, T. et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia 28, 241–247 (2014).
Lindsley, R. C. et al. Acute myeloid leukemia ontogeny is defined by distinct somatic mutations. Blood 125, 1367–1376 (2015).
Makishima, H. et al. Dynamics of clonal evolution in myelodysplastic syndromes. Nat. Genet 49, 204–212 (2017).
Sperling, A. S., Gibson, C. J. & Ebert, B. L. The genetics of myelodysplastic syndrome: from clonal haematopoiesis to secondary leukaemia. Nat. Rev. Cancer 17, 5–19 (2017).
Menssen, A. J. & Walter, M. J. Genetics of progression from MDS to secondary leukemia. Blood 136, 50–60 (2020).
Greenberg, P. et al. International scoring system for evaluating prognosis in myelodysplastic syndromes. Blood 89, 2079–2088 (1997).
Greenberg, P. L. et al. Revised international prognostic scoring system for myelodysplastic syndromes. Blood 120, 2454–2465 (2012).
Bernard, E. et al. Molecular international prognostic scoring system for myelodysplastic syndromes. NEJM Evid. 1, EVIDoa2200008 (2022).
DeZern, A. E. & Greenberg, P. L. The trajectory of prognostication and risk stratification for patients with myelodysplastic syndromes. Blood 142, 2258–2267 (2023).
Gerstung, M. et al. Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes. Nat. Commun. 6, 5901 (2015).
Ganan-Gomez, I. et al. Stem cell architecture drives myelodysplastic syndrome progression and predicts response to venetoclax-based therapy. Nat. Med 28, 557–567 (2022).
Ng, S. W. et al. A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 540, 433–437 (2016).
Eppert, K. et al. Stem cell gene expression programs influence clinical outcome in human leukemia. Nat. Med 17, 1086–1093 (2011).
Assi, S. A. et al. Subtype-specific regulatory network rewiring in acute myeloid leukemia. Nat. Genet 51, 151–162 (2019).
Ohlsson, E. et al. Initiation of MLL-rearranged AML is dependent on C/EBPalpha. J. Exp. Med 211, 5–13 (2014).
Ye, M. et al. Hematopoietic differentiation is required for initiation of acute myeloid leukemia. Cell Stem Cell 17, 611–623 (2015).
Issa, J. P. The myelodysplastic syndrome as a prototypical epigenetic disease. Blood 121, 3811–3817 (2013).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548.e1516 (2018).
Takayama, N. et al. The transition from quiescent to activated states in human hematopoietic stem cells is governed by dynamic 3D genome reorganization. Cell Stem Cell 28, 488–501.e410 (2021).
Itokawa, N. et al. Epigenetic traits inscribed in chromatin accessibility in aged hematopoietic stem cells. Nat. Commun. 13, 2691 (2022).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet 48, 1193–1203 (2016).
Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
Hasserjian, R. P., Germing, U. & Malcovati, L. Diagnosis and classification of myelodysplastic syndromes. Blood 142, 2247–2257 (2023).
Sallman, D. A. & List, A. The central role of inflammatory signaling in the pathogenesis of myelodysplastic syndromes. Blood 133, 1039–1048 (2019).
Winter, S., Shoaie, S., Kordasti, S. & Platzbecker, U. Integrating the “immunome” in the stratification of myelodysplastic syndromes and future clinical trial design. J. Clin. Oncol. 38, 1723–1735 (2020).
Schneider, M. et al. Activation of distinct inflammatory pathways in subgroups of LR-MDS. Leukemia 37, 1709–1718 (2023).
Harrell, F. E. Jr., Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the yield of medical tests. JAMA 247, 2543–2546 (1982).
Harrell, F. E. Jr., Lee, K. L. & Mark, D. B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med 15, 361–387 (1996).
Olive, M., Williams, S. C., Dezan, C., Johnson, P. F. & Vinson, C. Design of a C/EBP-specific, dominant-negative bZIP protein with both inhibitory and gain-of-function properties. J. Biol. Chem. 271, 2040–2047 (1996).
Iwama, A. et al. Reciprocal roles for CCAAT/enhancer binding protein (C/EBP) and PU.1 transcription factors in Langerhans cell commitment. J. Exp. Med 195, 547–558 (2002).
Kida, J. I. et al. An MDS-derived cell line and a series of its sublines serve as an in vitro model for the leukemic evolution of MDS. Leukemia 32, 1846–1850 (2018).
Kayamori, K. et al. DHODH inhibition synergizes with DNA-demethylating agents in the treatment of myelodysplastic syndromes. Blood Adv. 5, 438–450 (2021).
Rosenbauer, F. & Tenen, D. G. Transcription factors in myeloid development: balancing differentiation with transformation. Nat. Rev. Immunol. 7, 105–117 (2007).
Avellino, R. & Delwel, R. Expression and regulation of C/EBPalpha in normal myelopoiesis and in malignant transformation. Blood 129, 2083–2091 (2017).
Zhang, D. E. et al. Absence of granulocyte colony-stimulating factor signaling and neutrophil development in CCAAT enhancer binding protein alpha-deficient mice. Proc. Natl Acad. Sci. USA 94, 569–574 (1997).
Hasemann, M. S. et al. C/EBPalpha is required for long-term self-renewal and lineage priming of hematopoietic stem cells and for the maintenance of epigenetic configurations in multipotent progenitors. PLoS Genet 10, e1004079 (2014).
Cao, Z., Sun, X., Icli, B., Wara, A. K. & Feinberg, M. W. Role of kruppel-like factors in leukocyte development, function, and disease. Blood 116, 4404–4414 (2010).
Mahabeleshwar, G. H. et al. The myeloid transcription factor KLF2 regulates the host response to polymicrobial infection and endotoxic shock. Immunity 34, 715–728 (2011).
Humbert, M. et al. Deregulated expression of Kruppel-like factors in acute myeloid leukemia. Leuk. Res 35, 909–913 (2011).
Morris, V. A., Cummings, C. L., Korb, B., Boaglio, S. & Oehler, V. G. Deregulated KLF4 expression in myeloid leukemias alters cell proliferation and differentiation through MicroRNA and gene targets. Mol. Cell Biol. 36, 559–573 (2016).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Yokoyama, K. et al. Cell-lineage level-targeted sequencing to identify acute myeloid leukemia with myelodysplasia-related changes. Blood Adv. 2, 2513–2521 (2018).
Nakamura, S. et al. Prognostic impact of circulating tumor DNA status post-allogeneic hematopoietic stem cell transplantation in AML and MDS. Blood 133, 2682–2695 (2019).
Zhang, M. et al. GeNeCK: a web server for gene network construction and visualization. BMC Bioinforma. 20, 12 (2019).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498–2504 (2003).
Kolberg, L. et al. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res 51, W207–W212 (2023).
Acknowledgements
We would like to thank K. Tohyama for providing us with MDS-L cells, Y. Isshiki for technical assistance, and O. Rizq for critical reading of the manuscript. Supercomputing resources were provided by the Human Genome Center at the Institute of Medical Science, University of Tokyo, Japan. This work was supported in part by Grants-in-Aid for Scientific Research (21K08366 and 24K11554 to MO and 19H05653, 19H05746, and 24H00066 to AI) from the Japan Society for the Promotion of Science (JSPS) and the Innovative Cancer Medical Practice Research Project (18ck0106401h0001) and Moonshot project (21zf0127003h0001) from AMED, Japan.
Author information
Authors and Affiliations
Contributions
M.O. and N.T. performed the experiments, analyzed the results, produced the figures, and actively wrote the manuscript; Y.N.-T., D.S., N.I., S. Kurosawa., S. Kaito, T.K., Y.Y., S.A., K.K., S.K.P., and M.A.K. assisted with the experiments; N.Y., K. Yokoyama, Y.N., A.T., Y.Z., and S.I. performed targeted sequencing; B.R., A.K., K. Yamaguchi., and Y.F. performed sequencing; T. Muto., S.T., E. Sakaida, E. Sato, N.D., K.N., Y.D., T. Myojo., Y.H., and H.H. collected MDS BM samples and analyzed patients’ data; K.E. discussed the results; and A.I. conceived of and directed the project, secured funding, and actively wrote the manuscript. All the authors reviewed the manuscript and approved the final version for submission.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Cristina Pina and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Oshima, M., Takayama, N., Nakajima-Takagi, Y. et al. Chromatin accessibility in stem cells unveils progressive transcriptional alterations in myelodysplastic syndrome. Nat Commun 16, 10726 (2025). https://doi.org/10.1038/s41467-025-65753-5
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-025-65753-5









