I agree that the evidence of Pol II pausing is not as prominent at exon-intron boundaries as compared to start and termination sites as indicated from ChIP-Seq data.
However, I would think that higher nucleosome occupancy in exons could cause pausing or at least decrease elongation rates if you will.
*****
This paper attempts to link DNA methylation, G+C content, and nucleosome occupancy in exons with cotranscriptional splicing.
DNA-methylation effect on cotranscriptional splicing is dependent on GC architecture of the exon–intron structure
Genome Research March 2013
Sahar Gelfman, Noa Cohen, Ahuvi Yearim, and Gil Ast
This is an older paper by the same group that links G+C content and nucleosome occupancy in exons with cotranscriptional splicing.
Differential GC Content between Exons and Introns Establishes Distinct Strategies of Splice-Site Recognition
Maayan Amit, Maya Donyo, Dror Hollander, Amir Goren, Eddo Kim, Sahar Gelfman, Galit Lev-Maor, David Burstein, Schraga Schwartz, Benny Postolsky, Tal Pupko, Gil Ast
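For anyone who wants to look at the exon-intron GC differential these two papers discuss on their own sequences, here is a minimal Python sketch. The function names and the choice to average the two flanking introns are my own assumptions for illustration, not the method used by the Ast lab.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    valid = sum(seq.count(base) for base in "ACGT")
    if valid == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / valid


def gc_differential(exon: str, upstream_intron: str, downstream_intron: str) -> float:
    """Exon GC fraction minus the mean GC fraction of the two flanking introns.

    Averaging the two flanks is an assumption made for this sketch.
    """
    flank_gc = (gc_content(upstream_intron) + gc_content(downstream_intron)) / 2
    return gc_content(exon) - flank_gc


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    print(gc_differential("GGCCGCGATC", "ATATTTAAGT", "TTTAAATAGC"))
```

A positive value means the exon is more GC-rich than its flanking introns, which is the situation the quoted papers relate to nucleosome occupancy over exons.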
Genome-wide studies have revealed that nucleosomes are preferentially associated with exons as compared to introns, and this phenomenon has been implicated in co-transcriptional splicing events. See the reference below. If you are really interested, you should read the paper, which I attached. I have pasted some quotes from it below.
Oesterreich, F., Bieberstein, N. and Neugebauer, K. (2011) Pause
"The increased GC content in exons relative to introns favors another potential influence on local elongation rates, nucleosome positioning. Nucleosome disfavoring sequence elements are located at exon–intron boundaries resulting in a depletion of nucleosomes over the 5′ and 3′ splice sites. In agreement, nucleosomes are reported by several independent studies to be positioned on internal exons in various species and cell types but one study challenges the generality of this phenomenon. An appealing hypothesis is that nucleosome positioning can regulate elongation kinetics. Indeed, in vitro as well as in vivo data show that nucleosomes impose a natural barrier to transcribing Pol II; in vivo data show a significantly higher density of Pol II over exons compared to introns, suggesting pausing. Slowing transcription within metazoan exons might regulate co-transcriptional splicing by increasing the time available for splicing of upstream introns, similar to terminal exon pausing seen in yeast. If pausing occurs near the 3′ SS, as observed in yeast, synthesis of alternative 3′ splice sites might be delayed, allowing inclusion of upstream exons. In agreement with splicing regulation, exons flanked by weak splice sites show a stronger enrichment of nucleosomes than exons with strong splice sites, and nucleosome occupancy is correlated with inclusion. Moreover, pseudo-exons are depleted of nucleosomes, suggesting that impaired nucleosome positioning inhibits exon recognition. Interestingly, nucleosome positioning on exons is independent of transcription, suggesting that this epigenetic mark is not determined by splicing."
"Nucleosome positioning can influence elongation and co-transcriptional splicing by (i) locally stalling Pol II and/or (ii) providing a local scaffold for recruitment of positive or negative splicing regulators via modified histone tails."
Thanks, Clayton, for your reply and for the reference. I'll read the paper with interest. The excerpts you pasted here suggest that nucleosomes are more than just mechanical and passive packaging devices. From an evolutionary point of view, I guess that the current structure of nucleosomes was first 'established' to compact the growing genomes, and then the size of exons was somehow adjusted to fit the nucleosome size because of the advantages this gave in controlling transcription and splicing...
Yes, I never really thought about exon length actually evolving to become nucleosomal DNA length. But, it makes sense if nucleosome occupancy in exons helps facilitate splicing / RNA polymerase pausing.
On a side note, I get so confused on this topic because I go back and forth from DNA and RNA. Of course, the RNA is not wrapped up in nucleosomes. At least, I don't think it is.
I see your background is in computer science. A number of factors dictate the positions of nucleosomes along chromosomes. The first is the DNA sequence itself. Long tracts of A or T nucleotides inhibit nucleosome formation because they prevent the DNA from wrapping around the histone octamer. These long A tracts and T tracts are known to occur in nucleosome-free regions around transcription start sites in yeast. In addition, ~10 bp periodicities of certain dinucleotides and tetranucleotides along nucleosomal DNA have been shown to facilitate the bending of DNA around the histone octamer, and highly flexible DNA sequence motifs frequently occur at the positions where wrapping around the octamer demands the greatest curvature. The second factor is nucleosome occlusion by other proteins that bind DNA, such as transcription factors and RNA polymerase. The third factor is chromatin remodeling factors, which can remove or reposition nucleosomes at the expense of ATP, overriding DNA sequence preferences. Other factors are DNA methylation and histone acetylation/methylation. The significance of these modifications is still being debated, in terms of how directly they affect nucleosome stability and gene expression. Many believe that the purpose of these epigenetic marks is to recruit other proteins, which are more important in regulating chromatin organization and thereby have a more significant role in governing gene expression.
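Since you come from CS, the two sequence features mentioned above (homopolymeric A/T tracts and the ~10 bp dinucleotide periodicity) are easy to look for yourself. Below is a minimal Python sketch. The function names, the 6 bp minimum tract length, and the choice of AA/TT as the dinucleotides to track are assumptions made for illustration, not established thresholds.

```python
import re


def at_tracts(seq: str, min_len: int = 6):
    """Return (start, end, tract) for each run of A or of T at least min_len long.

    min_len = 6 is an arbitrary illustrative cutoff, not a published threshold.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def dinucleotide_spacing_counts(seq: str, dinucs=("AA", "TT"), max_dist: int = 30):
    """Count pairs of the given dinucleotides that start d bp apart, for d up to max_dist.

    A peak near d = 10 (and its multiples) would be consistent with the ~10 bp
    periodicity described for nucleosomal DNA.
    """
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]
    counts = [0] * (max_dist + 1)
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            distance = second - first
            if distance > max_dist:
                break  # positions are sorted, so later ones are even farther away
            counts[distance] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence with an AA dinucleotide every 10 bp, invented for illustration.
    toy = ("AA" + "GCGCGCGC") * 5
    counts = dinucleotide_spacing_counts(toy)
    print([d for d, c in enumerate(counts) if c > 0])
    print(at_tracts("GGAAAAAAACCTTTTTTGG"))
```

On real data you would run the spacing count over many aligned nucleosomal sequences rather than a single one, since the periodic signal is weak in any individual sequence.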
Yes, I have read about that in Alberts et al.'s book, but I greatly appreciate this nice summary. You are right, my background is in CS, but I'm becoming more and more interested in so many computational features and concepts that are enclosed into cells!
There is a great need for bioinformaticians with strong computational skills, which is why I felt compelled to offer some general information about nucleosomes. I write Perl scripts for customized analysis of genome-wide data, but I'm definitely an amateur, as my background is not in CS.
Regarding the evolution of exons, apparently they came 'first' and genes came later (in eukaryotic cells) as aggregates of exons. As far as I know, this has been suggested by the correspondence between exons and protein domains. Is this consistent with exons that adapt their size to nucleosome capacity? They look like opposite evolutionary moves...
I suppose it depends on the length of the primary sequences in the protein domains. If the protein domains consist of roughly 50 amino acids, then that would correspond to about 150 bp of coding DNA sequence (3 bp per amino acid), the length of DNA in the nucleosome.
It appears that the average protein domain consists of ~100 amino acid residues. I am not sure what this implies.
Wikipedia "protein domain"
Domains have limits on size.[24] The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1,[15] but the majority, 90%, have less than 200 residues[25] with an average of approximately 100 residues.[26] Very short domains, less than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, greater than 300 residues, are likely to consist of multiple hydrophobic cores.
24. Savageau MA. (1986). "Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications". Proc Natl Acad Sci USA 83 (5): 1198–202. Bibcode:1986PNAS...83.1198S. doi:10.1073/pnas.83.5.1198. PMC 323042. PMID 3513170.
25. Islam SA, Luo J, Sternberg MJ. (1995). "Identification and analysis of domains in proteins". Protein Eng 8 (6): 513–25. doi:10.1093/protein/8.6.513. PMID 8532675.
26. Wheelan SJ, Marchler-Bauer A, Bryant SH. (2000). "Domain size distributions can predict domain boundaries". Bioinformatics 16 (7): 613–618. doi:10.1093/bioinformatics/16.7.613. PMID 11038331.
27. Garel J. (1992). "Folding of large proteins: Multidomain and multisubunit proteins". In Creighton, T. Protein Folding (First ed.). New York: W.H. Freeman and Company. pp. 405–454. ISBN 0-7167-7027-X.
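To put the domain sizes quoted above into nucleosome terms, here is a small Python sketch of the arithmetic. It assumes 3 bp of coding sequence per residue and a nucleosome core length of 147 bp (the thread rounds this to ~150 bp), and it ignores linker DNA and the fact that a domain can be split across several exons. The names are hypothetical.

```python
NUCLEOSOME_CORE_BP = 147  # assumed core-particle DNA length; rounded to ~150 bp in the thread


def coding_bp(residues: int) -> int:
    """Length of coding DNA needed for a given number of residues (3 bp per codon)."""
    return 3 * residues


def nucleosome_equivalents(residues: int) -> float:
    """How many nucleosome core lengths the coding DNA for a domain would span."""
    return coding_bp(residues) / NUCLEOSOME_CORE_BP


if __name__ == "__main__":
    # Domain sizes taken from the Wikipedia excerpt above, plus the 50-residue case.
    examples = [
        ("E-selectin domain", 36),
        ("50-residue domain", 50),
        ("average domain", 100),
        ("90% of domains are below", 200),
        ("lipoxygenase-1 domain", 692),
    ]
    for label, residues in examples:
        print(f"{label}: {residues} aa = {coding_bp(residues)} bp "
              f"= {nucleosome_equivalents(residues):.2f} nucleosome cores")
```

Under these assumptions an average ~100-residue domain needs about 300 bp of coding sequence, roughly two nucleosome core lengths, so a single nucleosome-sized exon would cover only about half of an average domain.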
In addition to Vladimir's excellent points, I'd caution against putting too much weight into the idea that PolII pauses on exons preferentially in higher eukaryotes. At least in terms of ChIP-Seq datasets for PolII, I don't think this hypothesis can be well supported. In contrast, PolII enrichment at the 3' terminal exon of expressed genes is likely a general phenomenon.
Thanks for the new references and comments! Vladimir, the information in Alberts et al.'s book comes from the paper "Initial sequencing and analysis of the human genome" by the International Human Genome Sequencing Consortium (published in Nature 409:860-921, February 2001, doi: 10.1038/35057062; it is freely available). Figure 35a (page 896) shows the diagram that has been used in the book to establish the claim that prompted my question. Table 21 on the same page gives 145 bp as the average size of exons for protein-coding genes in humans. Figures 35b and 35c concern the length of introns (for humans, worms and flies), but, in sharp contrast with the exon case, no clear conclusion about a common average size can be established, i.e., the average size of introns depends more on the species than the average size of exons does...
@Clayton- excellent post. Thanks for the citations from the Ast lab, and yes, I agree that nucleosomal occupancy might affect PolII elongation rates without causing pausing per se. Introns themselves are hardly passive players here, though. The fact that enhancers often occur within them will lead to decreased nucleosomal occupancy within introns due to TF binding. In addition, regulatory elements promoting association with the nuclear matrix or nuclear envelope (often AT-rich, and with lower nucleosomal content) are probably excluded from exons.
I was wondering if you guys knew the answers to my following questions. I am quite confused about long-range chromatin interactions that are associated with gene regulation.
Where do CTCF binding sites occur relative to the nuclear matrix? I have read that CTCF serves as an insulator to prevent promiscuous enhancer looping and that its binding sites are relevant in establishing chromatin topological domain boundaries.
And so what then is the relationship between chromatin topological domains and the theory of AT-rich nuclear matrix attachment sites, which, if I recall correctly, are located at the bases of chromatin loops?
Genuine enhancers have been identified and have been shown to be associated with chromatin characterized by certain histone modifications and DNaseI hypersensitivity. But if transcription factors bind enhancers, how can they be DNaseI hypersensitive? Are CTCF binding sites DNaseI hypersensitive?
And what is the purpose of nucleosomes occurring in phased arrays? This sort of leads back to Vladimir's old topic of discussion. Are they produced by chromatin remodelling activities, or is it simply a boundary effect that results in a statistical positioning pattern that just can't be reproduced in vitro? I think the former, because one finds H2A.Z in the nucleosomes flanking CTCF sites, and this variant always seems to pop up as a neighbor to something important. Also, these variants are associated with decreased nucleosome stability. So, back to the original question: what is the relevance of these arrays in terms of chromatin structure and gene regulation?
Thanks everyone for this interesting discussion. I am wondering whether in certain diseases there is deletion of an exon and a consequent alteration in nucleosome structure, shifting of nucleosomes, etc. Do we know of any such disease?