The term “small non-coding RNA” broadly refers to diverse RNA species ~15–150 nucleotides (nt) in size that fulfill biological functions without being translated into proteins. While the involvement of small RNAs in cellular housekeeping processes such as transcript splicing and protein translation has been known since the 1960s, research over the past twenty years has revealed that small RNAs play vital roles in the regulation of gene expression, via both transcriptional and post-transcriptional mechanisms (Choudhuri, 2010).
Of the small RNAs involved in gene regulation, the best studied are microRNAs (miRNAs; ~22 nt in size), which facilitate post-transcriptional gene silencing by binding specific target mRNAs via base-pair complementarity and either blocking translation or triggering transcript degradation (Ha and Kim, 2014). Another well-characterized group is the Piwi-interacting RNAs (piRNAs), which silence transposons using a miRNA-like mechanism and also induce epigenetic modifications that influence the transcription of both transposons and protein-coding genes (Weick and Miska, 2014).
Tremendous progress has been made in the identification and characterization of small RNAs, and the current rate of discovery in this field suggests that much more remains to be elucidated. The development of next-generation sequencing (NGS) technology has proven instrumental to this progress, in part because it allows for identification of small RNAs without prior knowledge of their existence (in contrast with array-based or qPCR methods), and can discriminate between small RNA variants that differ by a single nucleotide. However, small RNA-seq library preparation is not without its challenges, which may include time-consuming enrichment steps prior to cDNA synthesis, and sample misrepresentation due to biases in small RNA end modification, reverse transcription, and PCR amplification.
A major source of bias in small RNA-seq data involves the manner in which small RNAs are captured during library construction (reviewed in Raabe et al., 2014). The most common method uses a T4 RNA ligase (T4Rnl) to attach adapters to RNA 5′ and 3′ ends. However, T4Rnl exhibits sequence-specific substrate preferences, such that certain combinations of adapters and small RNAs are more readily ligated than others, leading to sample misrepresentation in small RNA-seq libraries (Jayaprakash et al., 2011; Hafner et al., 2011). An alternative to adapter ligation is RNA 3′ polyadenylation, in which a poly(A) polymerase adds a stretch of adenosines to RNA 3′ ends. In contrast with adapter ligation, RNA polyadenylation occurs in a sequence-independent manner. Although RNA 3′ polyadenylation has previously been used to generate small RNA-seq libraries (Berezikov et al., 2006), that approach still involved adapter ligation at RNA 5′ ends and therefore remained susceptible to sequence-specific biases.
Here we present data from the SMARTer smRNA-Seq Kit for Illumina, which employs RNA 3′ polyadenylation and SMART (Switching Mechanism at the 5′ end of RNA Template) technology (Chenchik et al., 1998) to generate sequencing libraries in a ligation-independent manner. Rather than ligating adapters to small RNAs, this method incorporates adapters at both ends of nascent cDNAs during first-strand synthesis (Figure 1). Following polyadenylation of input RNA, first-strand cDNA synthesis is dT-primed (3′ smRNA dT Primer) and performed by the MMLV-derived PrimeScript Reverse Transcriptase (RT), which adds non-templated nucleotides upon reaching the 5′ end of each RNA template. The SMART smRNA Oligo then anneals to the non-templated nucleotides, and serves as a template for the incorporation of an additional sequence of nucleotides to the first-strand cDNA by the RT. Sequences incorporated at the 5′ and 3′ ends of each cDNA molecule serve as primer-annealing sites for PCR, which is performed using oligos that incorporate Illumina-compatible adapters and indexes during library amplification.

Figure 1. Schematic of technology used by the SMARTer smRNA-Seq Kit for Illumina. SMART technology is used in a ligation-free workflow to generate sequencing libraries for Illumina platforms. Input RNA is first polyadenylated in order to provide a priming sequence for an oligo(dT) primer. cDNA synthesis is primed by the 3′ smRNA dT Primer, which incorporates an adapter sequence (green) at the 5′ end of each first-strand cDNA molecule. When the MMLV-derived PrimeScript Reverse Transcriptase (RT) reaches the 5′ end of each RNA template, it adds non-templated nucleotides, which are bound by the SMART smRNA Oligo (enhanced with locked nucleic acid [LNA] technology for greater sensitivity). In the template-switching step, PrimeScript RT uses the SMART smRNA Oligo as a template for the addition of a second adapter sequence (purple) to the 3′ end of each first-strand cDNA molecule. In the next step, full-length Illumina adapters (including indexes for sample multiplexing) are added during PCR amplification. The Forward PCR Primer binds to the sequence added by the SMART smRNA Oligo, while the Reverse PCR Primer binds to the sequence added by the 3′ smRNA dT Primer. Resulting library cDNA molecules include adapters required for clustering on an Illumina flow cell (P5 shown in light blue, P7 shown in red), Illumina TruSeq® HT indexes (Index 2 [i5] shown in orange, Index 1 [i7] shown in yellow), and regions bound by the Read Primer 1 or Read Primer 2 sequencing primers (shown in purple and green, respectively). Note that adapters included in the final library add 153 bp to the size of RNA-derived insert sequences.
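The order in which sequence elements are assembled during first-strand synthesis can be sketched with a toy string model. This is only an illustration of the workflow described above, not the kit's actual chemistry: the adapter strings are placeholders (the real oligo sequences are not given here), the tail length of 12 is an arbitrary value, the oligo(dT) primer is assumed to cover the entire poly(A) tail, and the function names and the 22-nt example input are hypothetical.

```python
# Toy model of ligation-free first-strand synthesis.
# All adapter strings below are placeholders, not real oligo sequences.

RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base


def complementary_dna(rna):
    """Return the cDNA strand (written 5'->3') complementary to an RNA written 5'->3'."""
    return rna.translate(RNA_TO_CDNA)[::-1]


def polyadenylate(rna, tail_length=12):
    """Append a poly(A) tail; tail_length is an arbitrary illustrative value."""
    return rna + "A" * tail_length


def first_strand_segments(rna, tail_length=12):
    """List the first-strand cDNA segments in 5'->3' order as (label, sequence) pairs.

    Simplifying assumption: the oligo(dT) primer covers the whole poly(A) tail.
    """
    copied = complementary_dna(polyadenylate(rna, tail_length))
    return [
        ("adapter from 3' smRNA dT Primer", "<dT-primer-adapter>"),
        ("oligo(dT) stretch", copied[:tail_length]),
        ("copy of the small RNA", copied[tail_length:]),
        ("non-templated nucleotides", "<non-templated>"),
        ("copy of SMART smRNA Oligo", "<template-switch-adapter>"),
    ]


# Example with an arbitrary 22-nt input sequence.
segments = first_strand_segments("UGAGGUAGUAGGUUGUAUAGUU")
for label, sequence in segments:
    print(f"{label:<34}{sequence}")
```

Because both adapters are introduced by the primer and by template switching, no ligation step appears anywhere in the model; the PCR primers would then anneal to the first and last segments.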
Following PCR and column-based purification of PCR products, library profiles are analyzed using an Agilent Bioanalyzer (or similar device) to confirm that small RNA sequences were successfully incorporated and amplified. The combined length of the 5′ and 3′ library adapters is 153 bp. Therefore, library molecules containing miRNA-derived sequences typically yield a discrete peak in the ~172–178 bp size range in the resulting electropherograms (Figure 2, Panels A and B). For most applications, a size selection step is required; for example, libraries generated from total RNA typically include a substantial amount of high-molecular-weight products and yield a peak at ~1,000 bp (Figure 2, Panel A) due to dT-primed capture of mRNAs (which are naturally polyadenylated) during cDNA synthesis. For libraries that require size selection, there are two options: a gel-free, bead-based approach that retains library molecules with inserts ≤150 bp in size (Figure 2, Panel C), or size selection with the BluePippin system, which allows for enrichment of specific small RNA species (Figure 2, Panel D). Following size selection and validation, libraries are ready for sequencing on an Illumina platform.
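The size arithmetic in this paragraph can be written out as a short sketch. The 153 bp adapter length and the ≤150 bp bead-based insert limit come from the text; the function names are hypothetical, and real electropherogram peaks are approximate, so the results should be read as estimates rather than exact insert lengths.

```python
# Relating Bioanalyzer peak sizes to insert sizes.
ADAPTER_BP = 153           # combined length of 5' and 3' library adapters (from the text)
BEAD_MAX_INSERT_BP = 150   # insert limit for bead-based size selection (from the text)


def expected_library_size(insert_bp):
    """Predicted library molecule size for an insert of the given length."""
    return insert_bp + ADAPTER_BP


def inferred_insert_size(peak_bp):
    """Approximate insert length implied by an observed peak size."""
    return peak_bp - ADAPTER_BP


def within_bead_selection(peak_bp):
    """True if the peak corresponds to an insert of 0-150 bp."""
    return 0 <= inferred_insert_size(peak_bp) <= BEAD_MAX_INSERT_BP


print(expected_library_size(22))      # a ~22-nt miRNA -> 175
print(inferred_insert_size(172), inferred_insert_size(178))  # -> 19 25
print(within_bead_selection(176))     # miRNA peak -> True
print(within_bead_selection(1000))    # mRNA-derived peak -> False
```

On this arithmetic, the ~172–178 bp miRNA peak corresponds to inserts of roughly 19–25 nt, and the ~1,000 bp peak from total RNA implies inserts of about 847 bp, well above the bead-based limit.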

Figure 2. Small RNA-seq library profiles before and after size selection. Libraries were generated using the SMARTer smRNA-Seq Kit for Illumina with the indicated inputs and cycling parameters, and analyzed on an Agilent 2100 Bioanalyzer. Peaks labeled "LM" and "UM" correspond to DNA reference markers included in each analysis. Panel A. Typical result for a library generated from total RNA, prior to size selection. The peak at 176 bp corresponds to the predicted combined size of miRNAs plus adapters. Panel B. Enlarged view of the boxed region in Panel A, with individual peaks labeled by size (bp). Panel C. Typical result following gel-free, bead-based size selection of the library profiled in Panels A and B. Visible peaks fall within the size range of ~153–300 bp, which corresponds to inserts of 0–150 bp. Panel D. Typical result following BluePippin size selection, which affords greater stringency than the bead-based approach. The peak at 175 bp corresponds to the predicted combined size of miRNAs plus adapters.