SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian: improved ease of use and sequencing performance for whole transcriptome analysis of high-quality or degraded samples
Superior sequencing performance: reconfigured sequencing libraries perform well on all Illumina platforms (including NextSeq® and HiSeq® 3000/4000), achieving a high %PF without the addition of PhiX.
Obtaining an accurate portrait of expression levels for coding and non-coding RNAs from small sample inputs carries potential for both the fulfillment of basic research objectives and the development of novel therapeutics and clinical diagnostic solutions. While next-generation sequencing (NGS) technology has contributed greatly to our understanding of cellular mRNA composition and dynamics, it has also revealed the existence of a vast assortment of non-coding RNAs that play diverse roles in processes such as gene expression regulation (Mattick and Makunin 2006; Kornienko et al. 2013), and are implicated in the development of various human diseases (Hindorff et al. 2009; Wapinski and Chang 2011). Whereas oligo(dT) priming is typically used to capture polyadenylated mRNA for NGS, random priming allows for capture of both coding and non-coding RNA and is often the only feasible option available for processing degraded RNA inputs, such as those obtained from formalin-fixed, paraffin-embedded (FFPE) samples or liquid biopsies. However, a significant challenge associated with random priming is that it also captures ribosomal and mitochondrial RNA molecules, which are typically present in great abundance but not of interest to researchers.
To enable NGS-based analysis of coding and non-coding RNA (i.e., total RNA-seq) from picogram inputs, we previously developed the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (referred to below as “Pico v1”), which incorporates a novel technology that enables removal of ribosomal cDNA following cDNA synthesis (as opposed to direct removal of corresponding rRNA molecules prior to reverse transcription).
In keeping with our tradition of continuously refining and improving the performance of our products, we have subsequently developed the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (referred to as “Pico v2”; see workflow in Figure 1). Features that distinguish the Pico v2 kit from its predecessor include superior sequencing performance—particularly for NextSeq and MiniSeq™ instruments that use two-channel SBS technology and for HiSeq 3000/4000—and a new PCR buffer formulation enabling a more user-friendly library-purification process.
The improved sequencing performance provided by the Pico v2 kit is due to reconfiguration of the resulting sequencing libraries (Figure 2). Libraries produced with the Pico v2 kit are generated such that bases corresponding to the random-priming site (located at the 3' end of each RNA molecule) are read at the beginning of Read 1, while bases corresponding to nontemplated nucleotides added during the template-switching process are read at the beginning of Read 2. This is essentially the reverse of the orientation of libraries generated with the original version of the kit (in which bases associated with template-switching are read at the beginning of Read 1). The reconfigured libraries produced by the Pico v2 kit provide greater nucleotide diversity at the beginning of Read 1. This in turn removes the need to add significant amounts of PhiX control library to the sequencing reaction to achieve a higher percentage of clusters passing filter (%PF), yielding more meaningful data per sequencing run and reducing sequencing costs.
Results
Improved sequencing performance with the Pico v2 kit
Even with an industry-leading product such as the SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian, there is always room for improvement. As described above, a limitation of the Pico v1 kit is that it generates sequencing libraries with relatively low nucleotide diversity at the beginning of Read 1. This low nucleotide diversity results from the nontemplated nucleotides that facilitate adapter binding and incorporation via the template-switching mechanism (see Figure 1, above). Low nucleotide diversity at the beginning of Read 1 poses challenges for sequencing because the first 25 sequencing cycles are used to determine which clusters pass filtering, and it is particularly problematic on platforms using two-channel SBS technology (e.g., NextSeq and MiniSeq). These challenges can be mitigated by spiking in a suitable amount of PhiX control library (we recommend adding PhiX at concentrations as high as 30%, depending on the platform). However, this reduces the number of relevant sequencing reads generated per run, which increases the time and cost of sequencing.
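The cost of a PhiX spike-in can be made concrete with a little arithmetic. The sketch below is a rough illustration rather than a platform specification: the cluster count and the 85% PF rate are hypothetical round numbers, the function name is ours, and it assumes that PhiX and sample clusters pass filter at the same rate.

```python
def usable_reads(total_clusters: int, pf_rate: float, phix_fraction: float) -> float:
    """Estimate the number of sample-derived reads passing filter.

    total_clusters: raw clusters on the flow cell (hypothetical value below)
    pf_rate:        fraction of clusters passing filter, between 0 and 1
    phix_fraction:  fraction of the loaded library that is PhiX, between 0 and 1

    Assumption: PhiX and sample clusters pass filter at the same rate.
    """
    if not 0.0 <= pf_rate <= 1.0 or not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("pf_rate and phix_fraction must be between 0 and 1")
    return total_clusters * pf_rate * (1.0 - phix_fraction)


# Hypothetical run: 100 million clusters at 85% PF
clusters = 100_000_000
with_phix = usable_reads(clusters, 0.85, 0.30)    # 30% PhiX spike-in
without_phix = usable_reads(clusters, 0.85, 0.0)  # no PhiX

print(f"Sample reads with 30% PhiX: {with_phix:,.0f}")
print(f"Sample reads without PhiX:  {without_phix:,.0f}")
```

Under these assumptions, a 30% spike-in gives up 30% of the passing-filter reads to the control library, regardless of the cluster count chosen.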
To demonstrate the improved sequencing performance of the Pico v2 kit vs. the original kit, sequencing libraries were generated from various inputs of total RNA using each kit according to the corresponding user manuals and sequenced on both NextSeq and MiniSeq platforms (Figure 3). Whereas libraries generated with the Pico v1 kit yielded %PF values of 81.3% and 77.1% and quantities of reads passing filter that met or approached established benchmarks for NextSeq and MiniSeq instruments, respectively, libraries generated with the Pico v2 kit achieved %PF values of 88.3% and 90.5%, with quantities of reads passing filter that exceeded performance specifications for each platform by a considerable margin. These results demonstrate that the Pico v2 kit provides superior sequencing performance relative to Pico v1.
Improved ease of use during library purification with the Pico v2 kit
As with many NGS library prep kits, Pico v1 and Pico v2 both employ magnetic AMPure beads for multiple library purification steps. Customer feedback regarding the Pico v1 kit indicated that formation, drying, and resuspension of bead pellets during library purification were common pain points in the kit workflow. To address this, we optimized the PCR buffer for greater compatibility with AMPure bead purification while maintaining its performance for PCR. The new buffer formulation, SeqAmp CB PCR Buffer (CB = “compatible with beads”), allows the beads to separate more quickly, yielding a tighter bead pellet that dries more uniformly and is easier to resuspend (Figure 4).
Comparison of sequencing metrics for FFPE samples processed with Pico v1 and Pico v2
To further demonstrate the enhanced capabilities of the Pico v2 kit relative to its predecessor, particularly for analysis of challenging samples, sequencing libraries were generated from 1-ng and 10-ng inputs of human lung total RNA (DV200 = 68%) obtained from FFPE tissue and sequenced on a NextSeq 500 instrument. In comparison with Pico v1, library yields from the Pico v2 kit were considerably greater for both input amounts (Figure 5A). For the 1-ng input amount, sequencing data for the Pico v2 library identified thousands more transcripts than data for the Pico v1 library, whereas numbers of transcripts identified were comparable at the 10-ng input level. In contrast with the data generated using Pico v1, numbers of transcripts identified for 1-ng and 10-ng inputs using Pico v2 were very similar, suggesting that Pico v2 offers superior sensitivity for detection of low-abundance transcripts in low-input samples.
Proportions of reads mapping to various RNA species were comparable across kits and input amounts; however, libraries generated with the Pico v2 kit yielded a lower proportion of reads mapping to rRNA and mtRNA than the Pico v1 libraries. For both input amounts, the duplicate rate was lower for Pico v2 libraries, and for the 10-ng input in particular it was substantially lower (34.3% vs. 60.1%, a relative reduction of about 43%). Comparison of transcript expression levels across input amounts for each version of the kit indicated that the correlation was stronger for the Pico v2 libraries than for the Pico v1 libraries (Pearson = 0.96 and Spearman = 0.83 vs. Pearson = 0.91 and Spearman = 0.67, Figure 5B). These results suggest that Pico v2 outperforms Pico v1 by providing higher library yields, improved sensitivity, reduced representation of rRNA and mtRNA sequences, and a stronger correlation in gene expression measurements across input amounts.
Sequencing Alignment Metrics for 1-ng and 10-ng Inputs of Total RNA
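| Kit | Pico v1 | Pico v2 | Pico v1 | Pico v2 |
|---|---|---|---|---|
| RNA source | Human lung FFPE total RNA | Human lung FFPE total RNA | Human lung FFPE total RNA | Human lung FFPE total RNA |
| Input amount (ng) | 1 | 1 | 10 | 10 |
| Library yield (ng/µl) | 0.4 | 3.2 | 4.4 | 21.7 |
| Number of reads (millions, paired-end) | 8.25 | 8.25 | 8.25 | 8.25 |
| Number of transcripts >1 FPKM | 8,481 | 9,916 | 10,096 | 9,878 |
| Number of transcripts >0.1 FPKM | 14,347 | 19,594 | 20,724 | 21,325 |
| Exonic reads (%) | 15.9 | 15.0 | 16.4 | 14.9 |
| Intronic reads (%) | 50.5 | 53.9 | 54.9 | 57.9 |
| Intergenic reads (%) | 12.1 | 12.1 | 12.8 | 12.9 |
| rRNA reads (%) | 15.0 | 13.3 | 10.3 | 9.2 |
| Mitochondrial reads (%) | 1.3 | 0.9 | 1.5 | 0.7 |
| Duplicate rate (%) | 79.9 | 67.2 | 60.1 | 34.3 |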
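The kit-to-kit differences in the table above can be summarized as relative changes. The following sketch simply recomputes them from the tabulated duplicate rates and library yields; the dictionary layout and function names are ours and are used only for illustration.

```python
# Values copied from the table above, keyed by (kit, input amount in ng).
duplicate_rate = {  # percent
    ("Pico v1", 1): 79.9, ("Pico v2", 1): 67.2,
    ("Pico v1", 10): 60.1, ("Pico v2", 10): 34.3,
}
library_yield = {  # ng/µl
    ("Pico v1", 1): 0.4, ("Pico v2", 1): 3.2,
    ("Pico v1", 10): 4.4, ("Pico v2", 10): 21.7,
}


def relative_reduction(before: float, after: float) -> float:
    """Percent decrease going from `before` to `after`."""
    return 100.0 * (before - after) / before


def fold_change(before: float, after: float) -> float:
    """Ratio of `after` to `before`."""
    return after / before


summary = {}
for ng in (1, 10):
    summary[ng] = {
        "duplicate_rate_reduction_pct": relative_reduction(
            duplicate_rate[("Pico v1", ng)], duplicate_rate[("Pico v2", ng)]
        ),
        "yield_fold_change": fold_change(
            library_yield[("Pico v1", ng)], library_yield[("Pico v2", ng)]
        ),
    }
    print(
        f"{ng} ng: duplicate rate reduced by "
        f"{summary[ng]['duplicate_rate_reduction_pct']:.1f}%, "
        f"library yield changed {summary[ng]['yield_fold_change']:.1f}-fold"
    )
```

Computed this way, the duplicate rate for the 10-ng input drops by about 43% in relative terms, and library yield increases roughly 8-fold at 1 ng and roughly 5-fold at 10 ng.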
Summary
To better serve the scientific community, we have incorporated several design improvements into the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian that provide superior sequencing performance and a more user-friendly workflow relative to its predecessor. Sequencing libraries generated with the Pico v2 kit demonstrate a higher %PF rate relative to libraries produced with the original kit while requiring little or no addition of PhiX. This improvement will allow researchers to extract more meaningful data from each sequencing run, saving time and conserving resources. The Pico v2 kit also outperforms the Pico v1 kit by providing higher library yields, improved sensitivity, and greater consistency across input amounts, even for challenging samples obtained from FFPE tissue. Optimization of the PCR buffer included with the kit has streamlined the various bead-purification steps, which should also help reduce operational costs for labs performing RNA-seq at high throughput.
Methods
Comparison of pass-filter rates for Pico v1 and Pico v2 libraries
To compare the %PF rates for libraries generated with the Pico v1 and Pico v2 kits, sequencing libraries were generated from varying input types and amounts of total RNA and pooled together. Pools of sequencing libraries were run on the NextSeq 500 using the NextSeq 500/550 Mid Output Kit v2 (150 cycles; Cat. # FC-404-2001) with 2 x 75-bp paired-end reads, and on the MiniSeq using the MiniSeq High Output Kit (75 cycles; Cat. # FC-420-1001) with 2 x 38-bp paired-end reads.
Comparison of sequencing metrics for FFPE samples
To evaluate the performance of the Pico v1 and Pico v2 kits with FFPE samples, total RNA was extracted from a 5-µm curl of FFPE human lung tissue (Cureline) using a NucleoSpin totalRNA FFPE kit (Takara Bio, Cat. # 740982.10). Prior to library preparation, RNA integrity was evaluated on an Agilent Bioanalyzer using an Agilent RNA 6000 Pico Kit (Cat. # 5067-1513), yielding a DV200 value of 68%. Libraries were generated from the extracted RNA using both the Pico v1 and Pico v2 kits without additional RNA fragmentation (protocol option 2). Libraries were sequenced on a NextSeq 500 using the NextSeq 500/550 Mid Output Kit v2 and resulting sequencing datasets were downsampled to 8.25 million paired-end reads.
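Downsampling every dataset to the same number of read pairs keeps metrics such as transcripts detected and duplicate rate comparable across libraries. The tool used for this step is not stated here, so the following is only a minimal sketch of the idea: the function name, the in-memory list of read pairs, and the fixed random seed are all assumptions made for illustration. A real dataset of millions of reads would normally be processed by streaming the FASTQ files instead.

```python
import random


def downsample_pairs(read_pairs, target, seed=0):
    """Randomly keep `target` read pairs, preserving mate pairing and file order.

    read_pairs: list of (read1, read2) records (hypothetical in-memory layout)
    target:     number of pairs to keep
    seed:       fixed seed so the subsample is reproducible (an assumption)
    """
    if target >= len(read_pairs):
        return list(read_pairs)
    rng = random.Random(seed)
    # Sample indices rather than records so both mates are always kept together.
    keep = sorted(rng.sample(range(len(read_pairs)), target))
    return [read_pairs[i] for i in keep]


# Toy example: 100 hypothetical read pairs downsampled to 25
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(100)]
subset = downsample_pairs(pairs, 25)
print(len(subset))
```

Sampling pair indices, rather than sampling Read 1 and Read 2 independently, is what keeps the two mates of each fragment in sync.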
Sequence analysis
Reads from all libraries were trimmed and mapped to mammalian rRNA sequences and the human mitochondrial genome using CLC Genomics Workbench. The remaining reads were then mapped to the human genome (hg19) with RefSeq annotation, also using CLC. All percentages shown, including the percentages of reads that map to introns, exons, or intergenic regions, are percentages of the total reads in the library. The number of transcripts identified in each library was determined as the number of transcripts with an FPKM greater than or equal to 1 or 0.1, as shown in Figure 5A. Scatter plots were generated using FPKM values from CLC mapping to the transcriptome. To identify transcripts found in only one replicate (dropouts), 0.001 was added to each value prior to graphing.
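The transcript counts and correlation values described above can be reproduced in outline with a few lines of code. The sketch below is not the CLC workflow: the FPKM values are made-up toy numbers, the function names are ours, and applying a log10 transform after adding the 0.001 pseudocount is an assumption (the text states only that 0.001 was added before graphing).

```python
import math


def count_detected(fpkm_values, threshold):
    """Count transcripts with FPKM greater than or equal to `threshold`."""
    return sum(1 for v in fpkm_values if v >= threshold)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_fpkm(values, pseudocount=0.001):
    """log10(FPKM + pseudocount); the pseudocount keeps zero values plottable."""
    return [math.log10(v + pseudocount) for v in values]


# Hypothetical FPKM values for the same five transcripts at two input amounts
fpkm_1ng = [0.0, 0.5, 2.0, 10.0, 150.0]
fpkm_10ng = [0.2, 0.4, 2.5, 12.0, 140.0]

print(count_detected(fpkm_1ng, 1), count_detected(fpkm_1ng, 0.1))
print(pearson(log_fpkm(fpkm_1ng), log_fpkm(fpkm_10ng)))
print(spearman(fpkm_1ng, fpkm_10ng))
```

Because Spearman correlation depends only on ranks, it is unaffected by the pseudocount or the log transform, whereas the Pearson value changes with both.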
References
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U. S. A. 106, 9362–7 (2009).
Kornienko, A. E., Guenzl, P. M., Barlow, D. P. & Pauler, F. M. Gene regulation by the act of long non-coding RNA transcription. BMC Biol. 11, 59 (2013).
Mattick, J. S. & Makunin, I. V. Non-coding RNA. Hum. Mol. Genet. 15 Spec No, R17–29 (2006).
Wapinski, O. & Chang, H. Y. Long noncoding RNAs and human disease. Trends Cell Biol. 21, 354–361 (2011).