Genome-wide accessible chromatin profiling in B. napus and its progenitors
To ascertain the chromatin regulatory landscape of two types of B. napus and their progenitors, we constructed genome-wide maps of ACRs of leaves through ATAC-Seq, with three biological replicates for all the genotypes. High Spearman correlations were observed between biological replicates (Supplementary Fig. 1). Our results showed that approximately 64 million (B. rapa, A2) to 102 million (B. oleracea, C2) raw reads were obtained from ATAC-Seq, more than 96% of which were clean reads in all samples (Table S1). According to previous studies38,40, we constructed an in silico ‘hybrid’ (A_C), which can fuse well with parental data for the convenience of subsequent studies, by mixing the reads of B. rapa and B. oleracea in equal proportions. As shown in Table S1, an average of approximately 94.7%, 95.6%, 93.0%, and 98.0% of clean reads from the ATAC-Seq data of B. rapa, B. oleracea, resynthesized B. napus, and natural B. napus were mapped to the genome of B. napus35, and the GC content of each sample was higher than 40%.
In this study, the peaks identified by MACS241 were referred to as accessible chromatin regions (ACRs). In total, 36,161, 40,913, and 27,965 ACRs were identified in in silico ‘hybrid’, resynthesized B. napus and natural B. napus (Fig. 1a), which were associated with 26,601, 29,476, and 21,182 genes (Fig. 1b), respectively. In addition, we defined 45.64% of ACRs as genic ACRs (gACRs, overlapping with genic region from transcriptional start site, TSS, to transcriptional terminate site, TTS), and the remainder as intergenic ACRs (iACRs, not overlapping with genic region) in in silico ‘hybrid’ (Fig. 1c). Compared with the in silico ‘hybrid’, the proportion of gACRs increased, whereas the proportion of iACRs decreased in resynthesized B. napus. The proportion of gACRs in natural B. napus was similar to that in resynthesized B. napus. The genes associated with ACRs in the three genotypes were involved in the GO terms ‘positive regulation of molecular function’, ‘positive regulation of protein metabolic process’, and ‘positive regulation of catalytic activity’ (Supplementary Fig. 2, Supplementary Data 1). At chromosomal level, we found that many regions with high chromatin accessibility were conserved among the three genotypes, but there were also many variations in accessible chromatin regions or openness among the three genotypes (Supplementary Fig. 3). To explore the distribution of ACRs among genes, we inspected the ATAC-Seq signal at ACRs in three genotypes using deepTools42. These analyses revealed that ACRs tended to be highly enriched around the transcript start site (TSS) in all genotypes (Fig. 1d). Surprisingly, in silico ‘hybrid’ had an average maximum of approximately 235 reads per kilobase per million mapped reads (RPKM) around TSS, whereas resynthesized B. napus and natural B. napus had higher accessibility in these regions, with an average maximum of approximately 700 and 770 RPKM around TSS, respectively (Fig. 1e). These results implied that the overall chromatin accessibility increased in resynthesized B. napus during polyploidization coupled with the hybridization process, and the overall chromatin accessibility also increased in natural B. napus during the subsequent evolution process. Then, we mapped these ACRs to genomic features, and found that the majority (61.55%) of ACRs were distributed in the promoter region, especially promoters within 1 kb (37.24%), followed by the distal intergenic region (34.09%), whereas only 2.3% and 2.06% of ACRs were distributed in the gene body (exons and introns) and within 300 bp downstream of gene end in the in silico ‘hybrid’ (Fig. 1f). Similar proportions of these ACRs were observed in resynthesized B. napus. Compared with resynthesized B. napus, the proportion of ACRs distributed in the promoter region decreased 4.45%, and the proportion of ACRs distributed in the distal intergenic region increased 4.76%, whereas the proportions of ACRs distributed in the gene body and within 300 bp downstream of the gene end were almost unchanged in natural B. napus (Fig. 1f). These results implied the far more important regulatory roles of ACRs in the promoter region and distal intergenic region than in the gene body and within 300 bp downstream of the gene end. Although the distribution of these ACRs hardly changed during allopolyploidization, the ACRs in distal intergenic regions seemed to play an increasingly important role in natural evolutionary processes. In addition, we compared the ATAC-Seq signals of two types of ACRs and found that iACRs had higher ATAC-Seq signal levels around the peak center than gACRs in all genotypes (Fig. 1g), which may imply that the ACRs in genic regions need less chromatin accessibility to regulate gene expression.
Chromatin accessibility changed dramatically in resynthesized and natural allopolyploid B. napus
To examine the changes in chromatin accessibility during the allopolyploidization process, we overlapped the ACRs identified in the three genotypes. We found that genotype-specific ACRs were more abundant than common ACRs (Fig. 2a). A total of 8531 ACRs were common among three genotypes, whereas 16,418, 19,274, and 11,089 genotype-specific ACRs were identified in in silico ‘hybrid’, resynthesized B. napus and natural B. napus, respectively. The abundant genotype-specific ACRs indicated drastic changes in chromatin accessibility, both during the formation and natural evolution process of B. napus. Then, we compared the distribution of the two types of ACRs in the genomic region and found that the distribution difference was mainly in the promoter region and distal intergenic region, although the sum of these two regions was almost unchanged (Fig. 2b). More common ACRs were distributed in the distal intergenic region (5.6−12.4%) but less of that distributed in the promoter region (4.8–11.2%) than genotype-specific ACRs (Fig. 2b). As showed in Fig. 2c, common ACRs were more enriched than genotype-specific ACRs especially near the peak center, indicating that common ACRs were more open than genotype-specific ACRs.
Since many ACRs were identified as common ACRs, we wondered if there were quantitative differences in common ACRs between genotypes. Only those ACRs that had a fold change of 2 or more (P value < 0.05) in a genotype were categorized as different enriched ACRs (DEAs) in that genotype and we identified 320 genes that had more accessible DEAs and 500 genes that had fewer accessible DEAs in in silico ‘hybrid’ than in resynthesized B. napus (Supplementary Fig. 4). The results of GO analysis showed that resynthesized B. napus-enriched DEAs-related genes were involved in ‘cell wall organization’ and ‘lipid metabolic process’, whereas in silico ‘hybrid’-enriched DEAs-related genes were involved in ‘chromosome organization’ and ‘organelle organization’ (Supplementary Data 2). A total of 556 genes had different accessible ACRs between the two types of B. napus, in which 415 DEA-related genes were enriched in natural B. napus and 141 DEA-related genes were enriched in resynthesized B. napus. Natural B. napus-enriched DEAs-related genes were associated with some metabolic processes, such as ‘ncRNA metabolic process’, whereas resynthesized B. napus-enriched DEAs-related genes were involved in ‘biosynthetic process’ and ‘gene expression’. There were 592 genes that had natural B. napus-enriched DEAs and 200 in silico ‘hybrid’-enriched DEAs, and these genes were involved in multiple biological processes, such as ‘mRNA process’ and ‘photosynthesis’. These results indicated that genes exhibiting different ACRs played important roles in metabolic processes and gene expression regulation.
The effects of chromatin accessibility on gene expression in B. napus and its progenitors
To determine the relationship between chromatin accessibility and gene expression levels, we explored the chromatin accessibility of four groups of genes divided by gene expression levels: highly expressed genes (high; transcripts per million reads (TPM) greater than 10), moderately expressed genes (med; TPM between 1 and 10), lowly expressed genes (low; TPM greater than 0 and less than 1), and silenced genes (none; TPM equal to 0). As shown in Fig. 3a, highly expressed genes had the highest ATAC-Seq signal, followed by moderately expressed genes. The ATAC-Seq signal of lowly expressed genes and silenced genes was the lowest. These results indicated that chromatin accessibility positively affected gene expression. Then, we wondered whether differential gene expression was caused by changes in chromatin accessibility. The distribution of expression levels of differentially expressed genes (DEGs) were shown in Supplementary Fig. 5. Surprisingly, compared with the in silico ‘hybrid’, both up-regulated and down-regulated DEGs had higher ATAC-Seq signals in resynthesized and natural B. napus (Fig. 3b), which may be caused by the higher whole ATAC-Seq signals of the two types of B. napus. Up- and down-regulated DEGs had higher ATAC-Seq signals in natural B. napus than in resynthesized B. napus, which was consistent with the higher whole ATAC-Seq signal in natural B. napus. Interestingly, the ATAC-Seq signals of down-regulated DEGs was higher than that of up-regulated DEGs in all comparisons. These results indicated that the changes in chromatin accessibility between genotypes did not seem to explain the differential gene expression due to differences in overall chromatin accessibility between genotypes and that the down-regulated DEGs needed higher chromatin accessibility to maintain gene transcription. To examine how many DEGs could be regulated by ACRs, we overlapped the genotype-enriched ACR-associated genes (genes related to genotype-specific ACRs and genotype-enriched DEAs) and DEGs. In each comparison, the highly expressed DEGs in a genotype were referred to as genotype-enriched DEGs. As shown in Fig. 3c, 17.4% (46) of in silico ‘hybrid’-enriched DEGs and 17.3% (53) of resynthesized B. napus-enriched DEGs were associated with genotype-enriched ACRs. In the comparison of two types of B. napus, 15.5% (101) of the resynthesized B. napus-enriched DEGs and 14.3% (95) of the natural B. napus-enriched DEGs were regulated by ACRs. Although most DEGs were found in comparison of in silico ‘hybrid’ and natural B. napus, the proportion of genotype-enriched ACR-associated DEGs was similar to that in the previous comparisons. These results indicated that the changes in chromatin accessibility regulated the differential expression of a part of DEGs, which was similar in maize22.
Identification of cis- and trans-regulators in ACRs of B. napus and its progenitors
Since active cis-regulatory elements (CREs) are generally embedded in ACRs that can be integrated by trans-regulatory proteins to regulate gene expression, we identified over-represented motifs from ACRs of each genotype. In total, 82, 84, and 71 motifs and corresponding transcription factor (TF) families (Supplementary Data 3) were identified and 21, 22, and 18 of them were unique in in silico ‘hybrid’, resynthesized B. napus and natural B. napus, respectively (Supplementary Fig. 6a). Then, we detected 23,907, 28,602, and 19,855 genes that were target genes of these TFs in in silico ‘hybrid’, resynthesized B. napus and natural B. napus, respectively. We selected the top 4 TFs that regulated the most DEGs and visualized their regulatory network with DEGs (Supplementary Fig. 6b–d). We overlapped these target genes and DEGs, and found 37.5%, 34.6%, and 35.6% DEGs were regulated by these TFs in comparisons of in silico ‘hybrid’ vs. resynthesized B. napus, in silico ‘hybrid’ vs. natural B. napus and resynthesized B. napus vs. natural B. napus, respectively (Supplementary Fig. 6e). These results implied that many genotype-unique over-represented motifs were bound by corresponding TFs to regulate the differential expression of genes.
Local epigenetic modifications affected chromatin accessibility in B. napus and its progenitors
To examine the relationship between chromatin accessibility and local epigenetic modifications, we compared three histone modification statuses and DNA methylation levels around the ACRs. Surprisingly, not only active histone modification (H3K4me3 and H3K27ac) but also repressive histone modification (H3K27me3) were found to be highly enriched around the peak center of ACRs (Supplementary Fig. 7), whereas DNA methylation except CHG content was deficient around the peak center of ACRs (Supplementary Fig. 8). To determine which histone modification had the greatest impact on ACR openness, we divided ACR into six clusters: cluster 1 was associated with H3K27me3, cluster 2 was associated with H3K4me3, cluster 3 was associated with H3K27ac, cluster 4 was associated with H3K27me3 and H3K4me3, cluster 5 was associated with H3K4me3 and H3K27ac, and cluster 6 was associated with H3K27me3 and H3K27ac. We found that cluster 1 had the highest ATAC-Seq signal, followed by cluster 3 in the in silico ‘hybrid’ (Fig. 4a, d). Cluster 6 had the highest ATAC-Seq signal, followed by cluster 1 in resynthesized B. napus (Fig. 4b, d), whereas cluster 1 had the highest ATAC-Seq signal, followed by cluster 6 in natural B. napus (Fig. 4c, d). These results indicated that H3K27me3 had the greatest impact on ACR openness in the in silico ‘hybrid’, whereas the bivalent histone modifications H3K27me3 and H3K27ac had the greatest impact on ACR openness in resynthesized B. napus, and the influence of H3K27me3 gradually became dominant in natural B. napus.
To further determine the relationship between ACR intensity and local histone modifications, we divided the ACRs into 10 groups by ranking their pileup from low to high for each genotype. All groups of ACRs exhibited higher three histone modification signals than both flanks (Fig. 5a–c), whereas all groups except rank 9 or rank 10 exhibited lower CG, CHG, and CHH DNA methylation levels than both flanks (Supplementary Fig. 9). Surprisingly, the intensity of ACRs was positively associated not only with three histone modification levels, but also with the DNA methylation level (Supplementary Fig. 10). Additionally, we compared four local epigenetic modifications around gACRs and iACRs and found that iACRs exhibited the much higher H3K27ac and H3K27me3 and DNA methylation levels than gACRs, whereas iACRs showed higher H3K4me3 around the peak center but lower H3K4me3 in both flanks than gACRs (Fig. 5e, f and Supplementary Fig. 9e, f). Since H3K27me3 has long been considered an inhibitory histone modification, we wondered why it was positively correlated with chromatin accessibility. Therefore, we examined the relationship between histone modification level and gene expression and found that highly expressed genes not only had the highest H3K4me3 and H3K27ac, but also had the highest H3K27me3 signal, followed by moderately expressed genes (Supplementary Fig. 11). Similar to the ATAC-Seq signals, the H3K27me3 signals of lowly expressed genes and silenced genes were almost absent. Although both lowly expressed genes and silenced genes had H3K4me3 and H3K27ac signals, the signal of the former was much higher than that of the latter. In contrast to the active histone modifications H3K4me3 and H3K27ac, the DNA methylation level was found to be negatively associated with the gene expression level (Supplementary Fig. 12).
Since the overall chromatin accessibility did not account for differential gene expression, we examined the histone modification level of up- and down-regulated DEGs. Similar to the ATAC-Seq signals, down-regulated DEGs showed higher H3K27me3 levels than up-regulated DEGs except in comparison of resynthesized B. napus and natural B. napus (Fig. 6c). Up-regulated DEGs exhibited lower H3K4me3, H3K27ac, and H3K27me3 levels, whereas down-regulated DEGs showed higher H3K4me3, H3K27ac, and H3K27me3 levels in in silico ‘hybrid’ than that in two types of B. napus (Fig. 6a–c). However, in the comparison of resynthesized B. napus and natural B. napus, up-regulated DEGs exhibited higher H3K4me3, H3K27ac, and H3K27me3 levels, whereas down-regulated DEGs showed lower H3K4me3, H3K27ac, but higher H3K27me3 levels in resynthesized B. napus. In contrast to active histone modification, up-regulated DEGs had lower DNA methylation levels, whereas down-regulated DEGs had higher DNA methylation levels (Supplementary Fig. 13). These results suggested that the chromatin accessibility of DEGs was correlated with the H3K27me3 level. The up-regulation of DEGs seemed to require active histone modifications (H3K4me3 and H3K27ac), but lower DNA methylation levels in the three genotypes.
Asymmetrical chromatin accessibility and epigenomic modifications between subgenomes
B. napus contains two distinct subgenomes derived from B. rapa and B. oleracea35. To examine which subgenome is more dominant, we initially compared gene expression levels of two subgenomes in leaves of three genotypes, and found that gene expression levels of subgenome Cn were significantly higher than subgenome An (Supplementary Fig. 14a). To compare the chromatin accessibility between two subgenomes, we counted the number of ACRs in two subgenomes and found that ACRs were more in all genomic regions of the Cn subgenome (Fig. 7a). There was no significant difference between intensity of ACRs between two subgenomes of in silico ‘hybrid’, whereas the intensity of ACRs of subgenome An was significantly higher than that of Cn of natural B. napus and resynthesized B. napus (Fig. 7b). In in silico ‘hybrid’, the intensity of intergenic ACRs in subgenome An was significantly higher than that of Cn, but the intensity of genic (including exon, intron, promoter-TSS and TTS) ACRs was significantly lower than that of Cn (Fig. 7c). The intensity of intergenic ACRs in subgenome An was also significantly higher than that of Cn in natural B. napus and resynthesized B. napus, but the differences of intensity of genic ACRs were attenuated in resynthesized B. napus and not even significant in natural B. napus. Further analysis revealed that intergenic ACRs showed higher H3K4me3, H3K27ac, H3K27me3 modifications and DNA methylation than genic ACRs (Supplementary Figs. 15 and 16). Then, we compared the chromatin accessibility, H3K4me3, H3K27ac, H3K27me3 modifications and DNA methylation between two subgenomes. Genes in subgenome Cn had higher average ATAC-Seq signals, H3K27me3 levels, and DNA methylation levels than the genes in subgenome An, whereas genes in subgenome Cn had slightly lower average H3K4me3 and H3K27ac levels than genes in subgenome An (Supplementary Figs. 14b–d and 17). To further explore the cause of the differences between the two subgenomes, we divided the genes of each genotype into homeologous genes and subgenome-unique genes according to our previous study38. As shown in Fig. 8a, subgenome Cn-unique genes exhibited the highest expression level in each genotype. Subgenome Cn-unique genes had significantly higher gene expression than those in subgenome An-unique in each genotype, whereas there was no significant difference between homeologous genes in two subgenomes except natural B. napus. Then, we investigated the epigenetic modifications of these genes, and found that subgenome Cn-unique genes showed the highest ATAC-Seq signal and H3K27me3 level in all genotypes (Fig. 8b, e). Surprisingly, homeologous genes had higher H3K4me3 and H3K27ac levels but lower DNA methylation levels than subgenome unique genes in each subgenome in all genotypes (Fig. 8c, d; Supplementary Fig. 18), which indicated that homeologous gene pairs may need more H3K4me3 and H3K27ac to finely regulate gene expression. Taken together, these results suggested that the higher overall gene expression of subgenome Cn may be due to the high expression of subgenome Cn-unique genes, which had higher chromatin accessibility and H3K27me3 levels.
According to our previous study38, homeologous gene pairs were divided into A-/C-biased expression genes (A-/C-BEGs) and no biased expression genes (no-BEGs). The distribution of gene expression levels of homeologous gene pairs is shown in Fig. 9a. Although a similar ATAC-Seq signal and H3K27me3 were observed between genes in two subgenomes of A-/C-BEGs and no-BEGs, we found that An genes had higher H3K4me3 and H3K27ac levels between 100 bp upstream and downstream of TSS in A-BEGs, whereas Cn genes had higher H3K4me3 and H3K27ac levels between 100 bp upstream and downstream of TSS in C-BEGs, and An genes and Cn genes had similar H3K4me3 and H3K27ac levels in no-BEGs in all genotypes (Fig. 9b–e). In contrast to the distribution of active histone modifications, genes with higher expression had lower DNA methylation level than their homeologous genes between 200 upstream and 700 bp downstream of TSS, and no BEGs had similar DNA methylation levels in the two subgenomes (Supplementary Fig. 19). Surprisingly, genes in subgenome Cn had higher DNA methylation levels than their homeologous genes in An on both sides of TSS regardless of gene expression bias. These results indicated that homeologous gene pairs had more H3K4me3 and H3K27ac modifications to finely regulate biased gene expression, which was seemingly independent of chromatin accessibility, and the gene expression was positively associated with H3K4me3 and H3K27ac levels but negatively related to DNA methylation levels around TSS.
Chromatin accessibility of singleton and five types of duplicated genes in B. napus
Allopolyploid B. napus is rich in duplicated genes due to multiple round WGDs during the evolution process35. According to a previous study43, we divided genes in B. napus into six types of genes: singletons (genes that have one copy), WGD-derived genes (genes derived from WGD), TD-derived genes (genes derived from tandem duplication), TRD-derived genes (genes derived from transposed duplication), DSD-derived genes (genes derived from dispersed duplication), and PD-derived genes (genes derived from proximal duplication). The TD-, TRD-, DSD-, and PD- derived genes were referred to as small-scale duplicated genes43. As shown in Fig. 10a, singletons had significantly higher gene expression than all duplicated genes in the three genotypes. Among duplicated genes, WGD-derived genes had the highest gene expression levels, followed by DSD- and TRD-derived genes, and PD- and TD-derived genes had the lowest gene expression levels in all genotypes. We wondered whether chromatin accessibility was related to the regulation of gene expression of these types of genes, and detected the ATAC-seq signal of these genes. Consistent with the higher gene expression level, singletons had higher chromatin accessibility than duplicated genes in all genotypes (Fig. 10b). WGD-derived genes had higher chromatin accessibility than small-scale duplicated genes, which may be related to the higher gene expression of WGD-derived genes. However, unlike the gene expression levels, TRD- and PD-derived genes had higher ATAC-Seq signals than DSD- and TD-derived genes. These results indicated that singletons and WGD-derived genes were more accessible than small-scale duplicated genes and were coupled with higher gene expression levels.