Following the sequencing of the human genome, the next major hurdle was to define the genes it encoded. Studies probing global transcriptional activity yielded a surprising result: the mammalian genome is pervasively transcribed with nearly the entire genome being transcribed into RNA under some circumstance. As the numbers of non-coding transcripts increased, so too did concerns that many of the transcripts were simply 'transcriptional noise' without a biological function. The reasons for concern included the observation that many of the transcripts are expressed at incredibly low levels and exhibit no evolutionary conservation. Yet, there were a handful of functional large ncRNAs that played important biological roles in genome regulation; a dramatic example is the XIST ncRNA which coats and silences the entire X chromosome. It was unclear whether these few examples represented quirky exceptions, or exemplified an entire class of functional large ncRNAs.
We reasoned that there were likely to be some bona fide functional large ncRNAs. The challenge was how to systematically identify them. Taking a cue from protein coding genes, we looked at a chromatin signature of active transcription. This signature, consisting of H3K4me3 and H3K36me3, marks all known actively transcribed protein-coding and non-coding genes. We reasoned that by identifying K4-K36 domains that lay outside known protein-coding gene loci, we would be able to systematically discover functional lincRNAs.
To do this, we developed a computational algorithm that identifies K4-K36 domains from genome-wide chromatin datasets and excludes those that overlap any annotated gene. This analysis yielded a set of ~3500 unannotated intergenic K4-K36 domains in the mouse and human genomes. Using tiling microarrays, we demonstrated that the vast majority of the intergenic K4-K36 domains produced multi-exonic RNAs with many canonical features of RNA Polymerase II transcription. These transcripts had little ability to encode a functional protein of any size. Importantly, these lincRNAs demonstrated distinct but clear patterns of evolutionary conservation across 29 mammalian genomes, providing evidence that the lincRNAs were likely to be biologically functional.
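As an illustration of the filtering logic described above, the following is a minimal sketch and not the algorithm used in the study. It assumes that H3K4me3 peaks and H3K36me3 regions have already been called as genomic intervals, pairs them when they overlap or lie within a hypothetical `max_gap` of each other, and discards any resulting domain that overlaps an annotated gene. The function names, the gap parameter, and the toy coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical representation
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def call_k4_k36_domains(k4_peaks: List[Interval],
                        k36_regions: List[Interval],
                        max_gap: int = 1000) -> List[Interval]:
    """Join each H3K4me3 peak with any H3K36me3 region that overlaps it
    or lies within max_gap bases (max_gap is an assumed parameter)."""
    domains = []
    for k4 in k4_peaks:
        for k36 in k36_regions:
            if k4[0] != k36[0]:
                continue
            # Gap between the intervals; zero or negative when they touch or overlap.
            gap = max(k4[1], k36[1]) - min(k4[2], k36[2])
            if gap <= max_gap:
                domains.append((k4[0], min(k4[1], k36[1]), max(k4[2], k36[2])))
    return domains


def intergenic_only(domains: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Keep only the domains that overlap no annotated gene."""
    return [d for d in domains if not any(overlaps(d, g) for g in genes)]


# Toy data (invented coordinates, for illustration only).
k4_peaks = [("chr1", 1000, 2000), ("chr1", 50000, 51000), ("chr2", 500, 1500)]
k36_regions = [("chr1", 2000, 9000), ("chr1", 51200, 60000), ("chr2", 90000, 95000)]
annotated_genes = [("chr1", 500, 9500)]

all_domains = call_k4_k36_domains(k4_peaks, k36_regions)
candidate_domains = intergenic_only(all_domains, annotated_genes)
print(candidate_domains)
```

In this toy example two K4-K36 domains are formed on chr1, and the one lying inside the annotated gene is removed, leaving a single intergenic candidate.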
Despite the identification of thousands of large ncRNAs, it remained to be determined how these RNAs function. Determining the functional role of an ncRNA requires direct perturbation experiments, such as loss of function and gain of function; yet without a clear hypothesis of what phenotype to look for, characterizing the functions of most ncRNAs proved difficult. To classify putative functional roles of lincRNAs more globally, we developed a 'guilt-by-association' method that systematically assigns candidate functions based on correlation of gene expression. This method associates ncRNAs with biological processes based on common expression patterns across cell types and tissues, and identifies groups of ncRNAs associated with specific cellular processes. This approach allowed us to classify hundreds of ncRNAs across diverse biological processes such as stem cell pluripotency, immune response, neural processes, and cell cycle regulation.
While such correlations do not prove that ncRNAs function in these processes, they provide hypotheses for targeted loss-of-function experiments. To test these predictions, we performed targeted perturbations to determine the role of specific ncRNAs in the associated classes. As an example, we predicted 39 ncRNAs involved in the p53-mediated DNA damage response and showed that one of these candidates, termed lincRNA-p21, is a direct target of p53. Perturbation of this ncRNA affected the apoptosis response upon exposure to DNA damage. Another lincRNA, lincEnc1, was predicted to have a role in cell-cycle regulation in ESCs and was shown in a distinct study to affect proliferation in ESCs. Overall, our 'guilt-by-association' approach implicated lincRNAs in diverse biological processes.
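The idea behind the 'guilt-by-association' approach can be sketched in a few lines. This is a simplified stand-in under stated assumptions, not the published method: it correlates one lincRNA expression profile with each protein-coding gene across samples and scores each gene set by the mean correlation of its members. The gene names, gene sets, and expression values are invented placeholders, and mean correlation is an assumed scoring rule chosen for brevity.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def associate(linc_profile: List[float],
              gene_profiles: Dict[str, List[float]],
              gene_sets: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each gene set by the mean correlation between the lincRNA
    and the set's member genes (an assumed, simplified scoring rule)."""
    corr = {gene: pearson(linc_profile, profile)
            for gene, profile in gene_profiles.items()}
    return {name: sum(corr[g] for g in members) / len(members)
            for name, members in gene_sets.items()}


# Toy expression values across four hypothetical samples.
linc_profile = [1.0, 2.0, 3.0, 4.0]
gene_profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 2.0, 3.0, 5.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
gene_sets = {"process_1": ["geneA", "geneB"], "process_2": ["geneC", "geneD"]}

scores = associate(linc_profile, gene_profiles, gene_sets)
best_process = max(scores, key=scores.get)
print(best_process, scores)
```

A high score only nominates a process as a hypothesis for follow-up perturbation experiments; it does not by itself show that the lincRNA functions in that process.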
Having identified a class of lincRNAs, we next sought to determine the function of these lincRNAs. A critical pre-requisite for comprehensive experimental studies of lincRNA function is defining their precise sequences. Discovering lincRNA gene structures required reconstructing a mammalian transcriptome from scratch, a significant computational challenge as read lengths are significantly shorter than the size of the original RNA. To address this challenge, we developed a statistical method, called Scripture, which was one of the first methods to accurately reconstruct a mammalian transcriptome without prior gene models. We performed RNA-Seq on mouse ES cells, mouse lung fibroblasts, and mouse neural progenitor cells. Importantly, Scripture identified the full-length gene structure of the vast majority of expressed lincRNA genes.
Using the lincRNA sequences, we set out to determine the functional role of lincRNAs using loss-of-function experiments. Unlike correlation analysis, these perturbation-based experiments provide evidence for the functional role of an ncRNA. We focused on mouse ES cells because the signalling, transcriptional, and chromatin regulatory networks controlling pluripotency have been well characterized, providing an ideal system to determine how lincRNAs integrate into the molecular circuitry of the cell. We designed, cloned, and validated shRNAs targeting all lincRNAs expressed in mouse ES cells. To determine whether lincRNAs play an important function in the cell, we studied the effects of knocking down each lincRNA on global transcription. Upon knockdown, virtually all of the lincRNAs showed a significant impact on gene expression, demonstrating that the lincRNAs are functionally important in the cell.
Having determined the role of lincRNAs in regulating gene expression programs, we sought to determine whether lincRNAs play a role in regulating ES cell state. Regulation of the ES cell state involves two components: maintaining the pluripotency program and repressing differentiation programs. To determine whether lincRNAs play a role in the maintenance of the pluripotency program, we studied the effects of lincRNA knockdown on the expression of Nanog, a key transcription factor that is required to establish, and uniquely marks, the pluripotent state. We identified 26 lincRNAs that had major effects on endogenous Nanog levels along with other markers of the pluripotent state.
Next, we sought to determine whether lincRNAs play a role in repressing differentiation programs. To do this, we compared the overall gene expression patterns resulting from knockdown of the lincRNAs to gene expression patterns resulting from induced differentiation of ESCs. We identified 30 lincRNAs whose knockdown produced expression patterns similar to differentiation into specific lineages. Together, these results demonstrate that many lincRNAs play important roles in regulating the ES cell state, including maintaining the pluripotent state and repressing specific differentiation lineages.
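The comparison between knockdown and differentiation expression patterns can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the analysis actually performed: each condition is represented as a vector of log fold-changes over the same genes, similarity is measured with cosine similarity, and a knockdown is assigned to a lineage only if the similarity exceeds a hypothetical cutoff `min_similarity`. The lineage names and values are invented.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two log fold-change vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def closest_lineage(knockdown: List[float],
                    lineages: Dict[str, List[float]],
                    min_similarity: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the most similar lineage signature and its score, or
    (None, score) if the best score is below the assumed cutoff."""
    name = max(lineages, key=lambda k: cosine(knockdown, lineages[k]))
    score = cosine(knockdown, lineages[name])
    return (name, score) if score >= min_similarity else (None, score)


# Toy log fold-changes over four hypothetical genes.
lineage_signatures = {
    "lineage_A": [-1.2, -0.9, 1.0, 0.0],
    "lineage_B": [-1.0, -1.1, -0.2, 1.3],
}
knockdown_lfc = [-1.0, -0.8, 1.2, 0.1]
unrelated_lfc = [0.5, -0.5, 0.1, -0.6]

matched = closest_lineage(knockdown_lfc, lineage_signatures)
unmatched = closest_lineage(unrelated_lfc, lineage_signatures)
print(matched, unmatched)
```

The cutoff is a design choice for the example only; in practice the significance of a match would need to be assessed against an appropriate null distribution.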
Having demonstrated the functional importance of lincRNAs in the cell, we wanted to determine how lincRNAs affect gene expression. Motivated by the XIST and HOTAIR ncRNAs, which interact with the polycomb complex, we tested whether lincRNAs more generally associate with the polycomb complex. We found that ~20% of expressed human lincRNAs and ~10% of the ESC lincRNAs physically associate with the polycomb complex. Next, we systematically analyzed chromatin-modifying proteins that have been shown to play critical roles in ESCs. We screened antibodies against 28 chromatin complexes and identified 11 additional chromatin complexes that are strongly and reproducibly associated with the ESC lincRNAs. These chromatin complexes are involved in 'reading', 'writing', and 'erasing' histone modifications. Altogether, we found that ~30% of the ESC lincRNAs are associated with at least one of these chromatin complexes. Interestingly, many of the lincRNAs physically interacted with multiple chromatin complexes.
Our results suggest a model whereby a distinct set of lincRNAs is transcribed in a cell type and interacts with ubiquitous regulatory protein complexes to give rise to cell-type-specific RNA-protein complexes that coordinate cell-type-specific gene expression programs. In this model, lincRNAs provide regulatory specificity to ubiquitous regulatory protein complexes. Because many of the lincRNAs interact with multiple different protein complexes, one hypothesis is that they act as cell-type-specific 'flexible scaffolds' to bring together protein complexes into larger functional units. In this model, RNA contains discrete domains that interact with specific protein complexes. These RNAs, through a combination of domains, bring specific regulatory components into proximity, resulting in the formation of a unique functional complex. While a model for lincRNAs acting as 'flexible scaffolds' is an attractive hypothesis, it remains to be tested.
Mammalian genomes encode thousands of large noncoding RNAs (lncRNAs), many of which regulate gene expression, interact with chromatin regulatory complexes, and are thought to play a role in localizing these complexes to target loci across the genome. A paradigm for this class of lncRNAs is Xist, which orchestrates mammalian X-chromosome inactivation (XCI) by coating and silencing one X-chromosome in females. Despite the central role of RNA-chromatin interactions in this process, the mechanisms by which Xist localizes to DNA and spreads across the X-chromosome remained unknown. To determine the mechanisms of lncRNA localization to chromatin, we developed a method termed RNA Antisense Purification (RAP), which enables high-resolution mapping of lncRNA localization. We explored Xist localization during initiation and maintenance of XCI. We found that during the maintenance of XCI, Xist localizes broadly across the entire X-chromosome, lacking focal binding sites. To gain insights into how Xist establishes this broad localization pattern during the initiation of XCI, we examined Xist localization upon activation in mouse embryonic stem (ES) cells. We found that Xist initially localizes to early distal sites across the X-chromosome. Interestingly, these sites are not defined by specific sequences. Rather, we found that the Xist RNA identifies these regions using a proximity-guided search mechanism, exploiting the three-dimensional conformation of the X-chromosome to spread to distal regions in close spatial proximity to the Xist genomic locus. Indeed, altering the conformational context of the Xist transcription locus by incorporating the Xist RNA into a transgenic location was sufficient to reposition its early target locations to reflect the three-dimensional proximity contacts of the new transgene integration site. Taken together, these results demonstrate that spatial proximity to the Xist transcription locus guides early Xist RNA localization.
Although early Xist localization correlated strongly with proximity contact frequency across the chromosome, we noticed several large chromosomal domains where Xist occupancy was lower than would be expected based on the observed proximity contacts. These regions were enriched for actively transcribed genes in ES cells. Initially, Xist is excluded from actively transcribed genes and accumulates on the periphery of regions containing many active genes. Based on this, we hypothesized that the ability of Xist to spread across active genes is dependent on its ability to silence gene expression. To test this, we explored the localization of Xist in the absence of the A-repeat, the RNA domain of Xist responsible for interacting with PRC2 and silencing gene expression. In the absence of the A-repeat, Xist showed a strong depletion over active gene-dense regions, suggesting that active gene-dense regions loop out of the Xist compartment. Together, these observations suggest that the A-repeat may allow Xist to access and spread across active gene-dense regions by modifying chromatin and altering chromosome architecture to reposition these regions into the Xist compartment.
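The notion of Xist occupancy being "lower than would be expected based on the observed proximity contacts" can be made concrete with a small sketch. This is an illustration under stated assumptions, not the analysis used in the study: it assumes the chromosome is divided into bins, that expected occupancy in each bin is simply proportional to that bin's contact frequency with the Xist locus, and that a bin counts as depleted when its observed-to-expected ratio falls below a hypothetical `threshold`. All values are invented.

```python
from typing import List


def expected_occupancy(contacts: List[float], occupancy: List[float]) -> List[float]:
    """Distribute total occupancy across bins in proportion to contact
    frequency (an assumed proportional model)."""
    total_contacts = sum(contacts)
    total_occupancy = sum(occupancy)
    return [total_occupancy * c / total_contacts for c in contacts]


def depleted_bins(contacts: List[float],
                  occupancy: List[float],
                  threshold: float = 0.5) -> List[int]:
    """Indices of bins whose observed/expected ratio is below threshold."""
    expected = expected_occupancy(contacts, occupancy)
    return [i for i, (obs, exp) in enumerate(zip(occupancy, expected))
            if obs / exp < threshold]


# Toy values for five hypothetical bins along the chromosome.
contact_frequency = [10.0, 8.0, 6.0, 4.0, 2.0]
observed_occupancy = [5.2, 4.1, 0.9, 2.0, 1.0]

depleted = depleted_bins(contact_frequency, observed_occupancy)
print(depleted)
```

Bins flagged this way could then be compared against gene activity, for example to ask whether they coincide with regions dense in actively transcribed genes.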
Our data suggest a model for how Xist can integrate its two functions – localization to DNA and silencing of gene expression – to coat the entire X-chromosome. In this model, Xist exploits three-dimensional conformation to identify and localize to initial target sites and leads to repositioning of these regions into the growing Xist compartment. These structural changes effectively pull new regions of the chromosome closer to the Xist genomic locus, allowing Xist RNA to spread to these newly accessible sites by proximity transfer. Since Xist is actively transcribed throughout XCI, it will remain spatially close to other actively transcribed genes – the precise targets required for propagating Xist-mediated silencing. This process – involving searching in three dimensions, modifying chromatin state and chromosome architecture, and spreading to newly accessible locations – would explain how Xist can silence the entire X-chromosome reproducibly, such that silencing occurs in each cell. This coordinated interplay between lncRNA localization and chromosome conformation is likely to have broader implications beyond Xist. Other lncRNAs may similarly take advantage of chromosome conformation to identify target sites in close spatial proximity, which could even reside on other chromosomes. This search strategy capitalizes on the unique ability of a lncRNA to act while tethered to its transcription locus, in contrast to an mRNA which requires export and translation to carry out its function. Because chromosome conformation is nonrandom, a proximity-guided search strategy might explain how low-abundance lncRNAs can reliably identify their genomic targets. Upon binding these targets, lncRNAs may in turn alter chromosome conformation through their interactions with various chromatin regulatory complexes. 
These alterations would allow localization to and regulation of previously inaccessible chromatin domains, and might even establish local nuclear compartments that contain the co-regulated targets of lncRNA complexes.
This localization strategy capitalizes on the abilities of a lncRNA to act while tethered to its transcription locus and to interact with chromatin regulatory proteins to modify chromatin structure.