An accurate portrait of expression levels for coding and non-coding RNAs from small sample inputs has the potential to advance both basic research and the development of novel therapeutics and clinical diagnostics. While next-generation sequencing (NGS) technology has contributed greatly to our understanding of cellular mRNA composition and dynamics, it has also revealed the existence of a vast assortment of non-coding RNAs that play diverse roles in processes such as gene expression regulation (Mattick and Makunin 2006; Kornienko et al. 2013) and are implicated in the development of various human diseases (Hindorff et al. 2009; Wapinski and Chang 2011). Whereas oligo(dT) priming is typically used to capture polyadenylated mRNA for NGS, random priming captures both coding and non-coding RNA and is often the only feasible option for processing degraded RNA inputs, such as those obtained from formalin-fixed, paraffin-embedded (FFPE) samples or liquid biopsies. However, a significant challenge associated with random priming is that it also captures ribosomal and mitochondrial RNA molecules, which are typically highly abundant but of little interest to researchers.
To enable NGS-based analysis of coding and non-coding RNA (i.e., total RNA-seq) from picogram inputs, we previously developed the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (referred to below as “Pico v1”), which incorporates a novel technology that enables removal of ribosomal cDNA following cDNA synthesis (as opposed to direct removal of corresponding rRNA molecules prior to reverse transcription).
In keeping with our tradition of continuously refining our products, we subsequently developed the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (referred to as “Pico v2”; see workflow in Figure 1). Features that distinguish the Pico v2 kit from its predecessor include superior sequencing performance (particularly on NextSeq and MiniSeq™ instruments, which use two-channel SBS technology, and on HiSeq 3000/4000 instruments) and a new PCR buffer formulation that enables a more user-friendly library-purification process.
Figure 1. Schematic of technology in the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian. SMART technology is used in this ligation-free protocol to preserve strand-of-origin information. Random priming (represented as the green N6 Primer) allows the generation of cDNA from all RNA fragments in the sample, including rRNA. When the SMARTScribe Reverse Transcriptase (RT) reaches the 5' end of the RNA fragment, the enzyme’s terminal transferase activity adds a few non-templated nucleotides to the 3' end of the cDNA (shown as Xs). The carefully designed Pico v2 SMART Adapter (included in the SMART TSO Mix v2) base-pairs with the non-templated nucleotide stretch, creating an extended template to enable the RT to continue replicating to the end of the oligonucleotide. The resulting cDNA contains sequences derived from the random primer and the Pico v2 SMART Adapter used in the reverse transcription reaction. In the next step, a first round of PCR amplification (PCR1) adds full-length Illumina adapters, including barcodes. The 5' PCR Primer binds to the Pico v2 SMART Adapter sequence (light purple), while the 3' PCR Primer binds to sequence associated with the random primer (green). The ribosomal cDNA (originating from rRNA) is then cleaved by ZapR v2 in the presence of the mammalian-specific R-Probes v2. This process leaves the library fragments originating from non-rRNA molecules untouched, with priming sites available on both 5' and 3' ends for further PCR amplification. These fragments are enriched via a second round of PCR amplification (PCR2) using primers universal to all libraries. The final library contains sequences allowing clustering on any Illumina flow cell (see details in Figure 2).
The improved sequencing performance provided by the Pico v2 kit is due to a reconfiguration of the resulting sequencing libraries (Figure 2). Libraries produced with the Pico v2 kit are generated such that bases corresponding to the random-priming site (located at the 3' end of each RNA molecule) are read at the beginning of Read 1, while bases corresponding to the non-templated nucleotides added during the template-switching process are read at the beginning of Read 2. This is essentially the reverse of the orientation of libraries generated with the original version of the kit (in which bases associated with template switching are read at the beginning of Read 1). The reconfigured libraries produced by the Pico v2 kit therefore provide greater nucleotide diversity at the beginning of Read 1. This in turn eliminates the need to add significant amounts of PhiX control library to the sequencing reaction to achieve a higher percentage of clusters passing filter (%PF), yielding more meaningful data per sequencing run and reducing sequencing costs.
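To make the notion of nucleotide diversity at the start of a read more concrete, the following minimal Python sketch computes the Shannon entropy of the base composition at each of the first few sequencing cycles. It is an illustration only, not part of the kit workflow or of any Illumina software. The function name `per_cycle_entropy` and the toy reads are hypothetical, and the shared three-base prefix in the low-diversity set is an arbitrary placeholder rather than the actual adapter-derived sequence.

```python
from collections import Counter
from math import log2


def per_cycle_entropy(reads, n_cycles):
    """Return the Shannon entropy (in bits) of base composition
    at each of the first n_cycles positions across all reads.

    0 bits means every read has the same base at that cycle;
    2 bits means A, C, G, and T are equally represented.
    """
    entropies = []
    for cycle in range(n_cycles):
        # Count the bases observed at this cycle, skipping reads that are too short.
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)
            continue
        # H = sum over bases of p * log2(1 / p)
        entropy = sum((c / total) * log2(total / c) for c in counts.values())
        entropies.append(entropy)
    return entropies


# Toy reads, invented for illustration only.
# Every read starts with the same three placeholder bases (low diversity).
low_diversity = ["GGGACGT", "GGGTTCA", "GGGCATG", "GGGGTAC"]
# Reads start with bases that vary from read to read (high diversity).
high_diversity = ["ACGTACG", "CGTACGT", "GTACGTA", "TACGTAC"]

print(per_cycle_entropy(low_diversity, 3))
print(per_cycle_entropy(high_diversity, 3))
```

In this toy example, the reads sharing a common prefix score zero bits of entropy over the first three cycles, whereas the reads with varied starting bases score the maximum of two bits per cycle. Low entropy in the first cycles of Read 1 is the kind of situation in which a PhiX spike-in is typically added.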
Figure 2. Structural features of final libraries generated with the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian. The adapters added using 5' PCR Primer HT and 3' PCR Primer HT contain sequences allowing clustering on any Illumina flow cell (P7 shown in light blue, P5 shown in red), Illumina TruSeq® HT indexes (Index 1 [i7] sequence shown in orange and Index 2 [i5] sequence shown in yellow), as well as the regions recognized by sequencing primers Read Primer 2 (Read 2, purple) and Read Primer 1 (Read 1, green). Read 1 generates sequences antisense to the original RNA, while Read 2 yields sequences sense to the original RNA (orientation of original RNA denoted by 5' and 3' in dark blue). The first three nucleotides of the second sequencing read (Read 2) are derived from the Pico v2 SMART Adapter (shown as Xs). These three nucleotides must be trimmed prior to mapping if performing paired-end sequencing.
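The trimming requirement described in the Figure 2 legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended production pipeline: the helper names `parse_fastq` and `trim_read2_start` and the example record are hypothetical, and the input is assumed to be a standard four-line-per-record FASTQ stream for Read 2.

```python
from itertools import islice


def parse_fastq(lines):
    """Group an iterable of FASTQ lines into (header, seq, plus, qual) tuples."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def trim_read2_start(records, n_trim=3):
    """Remove the first n_trim bases (and their quality scores) from each record.

    n_trim defaults to 3, matching the three adapter-derived nucleotides
    at the start of Read 2 described in the text.
    """
    for header, seq, plus, qual in records:
        yield header, seq[n_trim:], plus, qual[n_trim:]


# Hypothetical Read 2 record, invented for illustration.
example_lines = [
    "@read1/2\n",
    "GGGACGTACGT\n",
    "+\n",
    "IIIFFFFFFFF\n",
]

trimmed = list(trim_read2_start(parse_fastq(example_lines)))
for record in trimmed:
    print("\n".join(record))
```

Because `parse_fastq` accepts any iterable of lines, an open file handle for the Read 2 FASTQ file can be passed in place of the in-memory list. In practice, a dedicated read trimmer is usually preferable; for example, the `-U` option of cutadapt removes a fixed number of bases from the second read of each pair.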