A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation


Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has recently been linked to proliferative states and cancer. Different 3′ untranslated region (3′ UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3′ end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3′ UTR lengths remain to be characterized. Through a uniform analysis of a large number of 3′ end sequencing data sets, we have uncovered 18 signals, 6 of which are novel, whose positioning with respect to pre-mRNA cleavage sites indicates a role in pre-mRNA 3′ end processing in both mouse and human. With 3′ end sequencing, we have demonstrated that the heterogeneous ribonucleoprotein C (HNRNPC), which binds the poly(U) motif whose frequency also peaks in the vicinity of polyadenylation (poly(A)) sites, has a genome-wide effect on poly(A) site usage. HNRNPC-regulated 3′ UTRs are enriched in ELAV-like RNA binding protein 1 (ELAVL1) binding sites and include those of the CD47 gene, which participate in the recently discovered mechanism of 3′ UTR-dependent protein localization (UDPL). Our study thus establishes an up-to-date, high-confidence catalog of 3′ end processing sites and poly(A) signals, and it uncovers an important role of HNRNPC in regulating 3′ end processing. It further suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript’s life cycle.

Genome Research