The term “small non-coding RNA” broadly refers to diverse RNA species ~15–150 nucleotides (nt) in size that fulfill biological functions without being translated into proteins. While the involvement of small RNAs in cellular housekeeping processes such as transcript splicing and protein translation has been known since the 1960s, research over the past twenty years has revealed that small RNAs play vital roles in the regulation of gene expression, via both transcriptional and post-transcriptional mechanisms (Choudhuri, 2010).
Of the small RNAs involved in gene regulation, the best studied are microRNAs (miRNAs; ~22 nt in size), which facilitate post-transcriptional gene silencing by binding specific target mRNAs via base-pair complementarity and either blocking translation or triggering transcript degradation (Ha and Kim, 2014). Another well-characterized group is the Piwi-interacting RNAs (piRNAs), which silence transposons using a miRNA-like mechanism and also induce epigenetic modifications that influence the transcription of both transposons and protein-coding genes (Weick and Miska, 2014).
Tremendous progress has been made in the identification and characterization of small RNAs, and the current rate of discovery in this field suggests that much more remains to be elucidated. The development of next-generation sequencing (NGS) technology has proven instrumental to this progress, in part because it allows identification of small RNAs without prior knowledge of their existence (in contrast with array-based or qPCR methods) and can discriminate between small RNA variants that differ by a single nucleotide. However, small RNA-seq library preparation has its own challenges, which may include time-consuming enrichment steps prior to cDNA synthesis and sample misrepresentation due to biases in small RNA end modification, reverse transcription, and PCR amplification.
A major source of bias in small RNA-seq data is the manner in which small RNAs are captured during library construction (reviewed in Raabe et al., 2014). The most common method uses a T4 RNA ligase (T4Rnl) to attach adapters to RNA 5′ and 3′ ends. However, T4Rnl exhibits sequence-specific substrate preferences, such that certain combinations of adapters and small RNAs are more readily incorporated than others, leading to sample misrepresentation in small RNA-seq libraries (Jayaprakash et al., 2011; Hafner et al., 2011). An alternative to adapter ligation is RNA 3′ polyadenylation, in which a poly(A) polymerase adds a stretch of adenosines to RNA 3′ ends. In contrast with adapter ligation, RNA polyadenylation occurs in a sequence-independent manner. While RNA 3′ polyadenylation has previously been used to generate small RNA-seq libraries (Berezikov et al., 2006), that approach still relied on adapter ligation at RNA 5′ ends and therefore remained susceptible to sequence-specific biases.
Here we present data from the SMARTer smRNA-Seq Kit for Illumina, which employs RNA 3′ polyadenylation and SMART (Switching Mechanism at the 5′ end of RNA Template) technology (Chenchik et al., 1998) to generate sequencing libraries in a ligation-independent manner. Rather than ligating adapters to small RNAs, this method incorporates adapter sequences at both ends of nascent cDNAs during first-strand synthesis (Figure 1). Following polyadenylation of input RNA, first-strand cDNA synthesis is dT-primed (3′ smRNA dT Primer) and performed by the MMLV-derived PrimeScript Reverse Transcriptase (RT), which adds non-templated nucleotides upon reaching the 5′ end of each RNA template. The SMART smRNA Oligo then anneals to these non-templated nucleotides and serves as a template from which the RT extends the first-strand cDNA with an additional sequence. The sequences incorporated at the 5′ and 3′ ends of each cDNA molecule serve as primer-annealing sites for PCR, which is performed using oligos that add Illumina-compatible adapters and indexes during library amplification.
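One practical consequence of this design is that each sequenced insert is expected to be flanked by bases that do not come from the original small RNA: a few bases at the 5′ end arising from template switching, and the poly(A) stretch at the 3′ end. The Python sketch below illustrates how such reads might be trimmed before mapping. It is a minimal illustration, not the kit's recommended pipeline: the function name `trim_read`, the assumption that reads are in the sense orientation, the count of three leading bases, and the eight-A tail threshold are all hypothetical choices made for this example.

```python
def trim_read(read: str, n_leading: int = 3, min_tail: int = 8) -> str:
    """Return the presumed small RNA insert from a sense-orientation read.

    Assumptions (illustrative only, not taken from the kit documentation):
      * the first `n_leading` bases derive from template switching;
      * the insert is followed by a run of at least `min_tail` A's, and
        everything from that run onward (tail plus adapter) is discarded.
    """
    # Drop the bases assumed to derive from template switching.
    trimmed = read[n_leading:]

    # Locate the first run of A's long enough to be treated as the tail.
    tail_start = trimmed.find("A" * min_tail)
    if tail_start != -1:
        # Note: A's that genuinely end the insert are indistinguishable from
        # the added tail and are removed as well.
        trimmed = trimmed[:tail_start]
    return trimmed


if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"  # example 22-nt sequence
    read = "GGG" + insert + "A" * 15 + "AGATCGGAAG"
    print(trim_read(read))
```

Because polyadenylation adds A's directly to the RNA 3′ end, any adenosines that natively terminate a small RNA cannot be distinguished from the added tail by sequence alone; a trimming step of this kind therefore makes the exact 3′ end of such inserts ambiguous.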