Enabling long-read RNA sequencing from low-input samples
The SMART-Seq mRNA Long Read (SSmRNA LR) system efficiently and reliably generates libraries for long-read RNA sequencing from low-input samples (as little as 10 pg of RNA, compared with the hundreds of nanograms required by other workflows).
Generate full-length cDNA libraries with sample barcoding
Detect isoforms up to 8 kb, determine strand orientation, obtain accurate measurements of transcript expression, and identify gene fusions
Long-read RNA sequencing (RNA-seq) offers unique advantages for resolving full-length transcript isoforms—enabling de novo transcriptome analysis, discovering fusion transcripts, and identifying novel structural variants—but typically requires hundreds of nanograms of sample. Requiring large amounts of RNA can make this technique difficult for certain sample types, such as tumor tissue, sorted cancer cells, and primary cells.
SMART (Switching Mechanism At the 5′ end of RNA Template) technology provides an efficient, sensitive method for generating full-length cDNA. The method is especially well-suited for very low amounts of input RNA, making it ideal for studies with limited sample material. However, until recently, analysis of intact full-length cDNA molecules has been severely limited due to the use of short-read sequencing approaches, which require fragmentation.
SMART-Seq mRNA Long Read (SSmRNA LR) offers the first commercial solution for sequencing intact, full-length cDNA molecules from low-input samples using long-read sequencing technologies, unlocking new opportunities for biomarker discovery. With a read-length N50 of ~2 kb and the ability to detect full-length transcripts over 8 kb, the kit prepares libraries in which complete cDNA molecules are sequenced from end to end. This retains positional information and eliminates the bioinformatic assembly required by short-read sequencing technologies. The kit combines SMART technology with a barcoding strategy to support long-read library preparation for up to 96 samples, from as little as 10 pg of input RNA or even single cells, for sequencing on the Oxford Nanopore Technologies (ONT) platform (Figure 1).
Figure 1. Library preparation workflow for the SSmRNA LR kit. First-strand cDNA synthesis is primed by the SMART-Seq LR Oligo-dT primer and performed by a Moloney murine leukemia virus (MMLV)-derived reverse transcriptase (RT). Upon reaching the 5′ end of each mRNA molecule, the RT adds non-templated nucleotides to the first-strand cDNA, facilitating hybridization with a template-switching oligonucleotide (TSO). In the template-switching step, the RT uses the remainder of the SMART-Seq LR TSO as a template for the incorporation of an additional sequence on the end of the first-strand cDNA. The first-strand cDNA is then barcoded and amplified by the first round of PCR (PCR1); after cleanup of the PCR1 product, a second round of PCR enriches for barcoded fragments. Samples are pooled and end-prepped, and sequencing adapters are ligated using the Ligation Sequencing Kit V14 (Oxford Nanopore Technologies or ONT). The SMART-Seq Library Prep Kit is used to generate sequencing-ready libraries. After sequencing, samples are basecalled and demultiplexed using Guppy (ONT). Downstream analysis is performed using minimap2 (GitHub), SAMtools, and Salmon.
In this technical note, we demonstrate that SSmRNA LR technology offers a sensitive and streamlined workflow for generating full-length cDNA libraries with sample barcoding to enable long-read RNA-seq from minimal sample inputs or single cells. The kit can reliably generate full-length cDNA fragments with average read lengths of 1,200–1,400 nucleotides (nt) and an N50 of ~2 kb, enabling detection of full-length transcripts. The kit delivers excellent sensitivity, detecting thousands of genes and transcripts with high reproducibility and accuracy. This technology enables the detection of full-length isoforms up to 8 kb with a high percentage of full-length reads, determination of strand orientation, accurate measurement of transcript expression, and identification of gene fusions. It is also compatible with direct cell inputs, demonstrating excellent performance from a single cell up to a thousand cells. The robust workflow supports automation and miniaturization for efficient, scalable, and cost-effective processing.
Results
Generate long-read sequencing libraries with broad size distribution and read lengths from low inputs
To demonstrate read-length distribution from low-input samples, the SSmRNA LR workflow was used to generate libraries from two sample amounts: 10 pg and 10 ng of mouse brain RNA (MBR). cDNA size distributions and read-length distributions were analyzed for the two MBR input amounts (Figure 2).
As shown in Figure 2, enriched barcoded cDNA profiles show that the kit generates barcoded cDNA fragments spanning a wide range of lengths (from 400 to >5,000 bp), with average sizes of 2,281 bp (10 pg) and 2,759 bp (10 ng). The 10 pg and 10 ng samples showed similar cDNA size distributions, indicating the efficacy of the SSmRNA LR workflow at both high and low RNA input amounts. After ONT library preparation and sequencing, read lengths extended beyond 5,000 nt.
Figure 2. SSmRNA LR kit generates barcoded cDNA with a wide range of fragment lengths. The SSmRNA LR workflow was used to create cDNA from 10 pg and 10 ng total mouse brain RNA (n = 8). Panel A. cDNA size distribution was measured on a 2100 Bioanalyzer (Agilent Technologies) using an Agilent High Sensitivity DNA Kit. For the 10 ng sample, the average size of cDNA was 2,759 bp. For the 10 pg sample, the average size of cDNA was 2,281 bp. Barcoded cDNA was pooled per input, and libraries were generated using the Ligation Sequencing Kit V14. Libraries were sequenced on a MinION Flow Cell (ONT) for 72 hr. Samples were basecalled and demultiplexed using Guppy, and read-length distribution was plotted using Microsoft Excel. Panels B and C. Read-length distributions are shown for a representative sample of 10 pg (N50 = 1.6 kb) and 10 ng (N50 = 1.9 kb) total MBR.
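Because this note reports both average read length and N50, it may help to see how the two statistics differ. The following is a minimal Python sketch, assuming the standard definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases). The function names and the toy read lengths are hypothetical and are not data from this note.

```python
def read_length_n50(lengths):
    """Return the N50 of a list of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return length


def mean_read_length(lengths):
    """Return the arithmetic mean read length."""
    if not lengths:
        raise ValueError("no read lengths given")
    return sum(lengths) / len(lengths)


# Hypothetical toy read lengths (nt), for illustration only.
toy_lengths = [400, 800, 1200, 1600, 2000, 5000]
toy_n50 = read_length_n50(toy_lengths)
toy_mean = mean_read_length(toy_lengths)
print(f"N50 = {toy_n50} nt, mean = {toy_mean:.0f} nt")
```

N50 is weighted by bases rather than by reads, so for a long-tailed length distribution it is typically larger than the mean, which is consistent with the kit's reported N50 exceeding its average read length.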
Deliver excellent sensitivity and reproducibility and generate libraries with uniform gene-body coverage
Libraries prepared from 10 pg and 10 ng MBR using the SSmRNA LR workflow demonstrate high gene and transcript sensitivity, allowing the user to uncover thousands of genes and transcripts from low input amounts (Figure 3, Panel A). The libraries also demonstrate a balanced barcode distribution (2.4-fold across barcodes) and a high demultiplexing rate of over 95% (data not shown). This was achieved using Takara Bio's demultiplexing protocol with ONT's Dorado basecaller. Furthermore, the libraries show high reproducibility among gene counts, indicated by a high Pearson correlation between technical replicates (range: 0.876–0.983; average: 0.968) (Figure 3, Panel B). Additionally, analysis of transcript coverage shows uniform read distribution across the gene body, indicating that the SSmRNA LR workflow generates unbiased, full-length, long-read sequencing libraries (Figure 3, Panel C).
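The barcode balance and demultiplexing rate quoted above can be computed from per-barcode read counts. The sketch below is illustrative only: it assumes that the fold value is the ratio of the most- to the least-represented barcode and that the demultiplexing rate is the fraction of all reads assigned to any barcode. Neither definition is stated in this note, and the function name and toy counts are hypothetical.

```python
def barcode_balance(reads_per_barcode, unassigned_reads):
    """Return (fold_range, demultiplexing_rate).

    Assumed definitions (not taken from this note):
      fold_range          = max reads per barcode / min reads per barcode
      demultiplexing_rate = assigned reads / (assigned + unassigned reads)
    """
    counts = list(reads_per_barcode.values())
    if not counts or min(counts) <= 0:
        raise ValueError("every barcode needs at least one read")
    if unassigned_reads < 0:
        raise ValueError("unassigned_reads must be non-negative")
    assigned = sum(counts)
    fold_range = max(counts) / min(counts)
    demux_rate = assigned / (assigned + unassigned_reads)
    return fold_range, demux_rate


# Hypothetical toy counts, for illustration only.
toy_counts = {"BC01": 1000, "BC02": 1500, "BC03": 2400}
toy_fold, toy_rate = barcode_balance(toy_counts, unassigned_reads=100)
print(f"fold range = {toy_fold:.1f}, demultiplexing rate = {toy_rate:.1%}")
```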
Figure 3. SSmRNA LR kit demonstrates high sensitivity and even gene-body coverage across a broad range of RNA inputs and reproducibly multiplexes bulk samples. To evaluate the performance of the SSmRNA LR workflow, cDNA was generated from 10 pg and 10 ng total MBR using the workflow described in Figure 1. After sequencing, data were basecalled and demultiplexed using Guppy, and reads were downsampled to the indicated read counts. Downsampling analysis of the 10 pg and 10 ng MBR datasets demonstrates the gene and transcript sensitivity of the workflow, where sensitivity is defined as the number of genes or transcripts detected (Panel A). The SSmRNA LR workflow was used to create cDNA from 96 replicates of 10 ng MBR. Barcoded cDNA was pooled, and libraries were generated and sequenced according to the workflow. Samples were basecalled and demultiplexed using Guppy and aligned with minimap2, and count matrices were generated using featureCounts. Pairwise correlation matrices were calculated from the count matrices in R. High Pearson correlation values between samples indicate high technical reproducibility (Panel B). To evaluate gene-body coverage, 10 pg and 10 ng MBR samples underwent the SSmRNA LR workflow. Gene-body coverage was averaged across eight replicates of the 10 pg and 10 ng MBR samples (Panel C).
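The downsampling analysis in this caption can be sketched as follows: draw a fixed number of reads at random, then count how many distinct genes (or transcripts) are still detected. This is a minimal illustration, not the analysis code used for Figure 3. The detection threshold of one read, the fixed random seed, and the toy gene names are all assumptions.

```python
import random


def detected_features(read_assignments, n_reads, min_reads=1, seed=0):
    """Downsample to n_reads and count features with >= min_reads reads.

    read_assignments: one gene or transcript name per assigned read.
    min_reads=1 is an assumed detection threshold.
    """
    if n_reads > len(read_assignments):
        raise ValueError("cannot downsample to more reads than available")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sampled = rng.sample(read_assignments, n_reads)  # without replacement
    counts = {}
    for feature in sampled:
        counts[feature] = counts.get(feature, 0) + 1
    return sum(1 for c in counts.values() if c >= min_reads)


# Hypothetical toy data: 100 reads assigned to four genes.
toy_reads = ["GeneA"] * 50 + ["GeneB"] * 30 + ["GeneC"] * 15 + ["GeneD"] * 5
for depth in (10, 50, 100):
    print(depth, detected_features(toy_reads, depth))
```

Low-abundance features are the first to drop out as depth decreases, which is why sensitivity is reported at matched read counts when comparing inputs.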
Detect complete, full-length isoforms (up to 8 kb) with accurate strand orientation
Long-read sequencing technologies can capture whole transcripts, enabling the identification and accurate measurement of alternative splicing and disease-specific isoforms of clinical relevance, which is not readily achievable with traditional short-read sequencing.
Analyses of long-read sequencing data from MBR libraries prepared using the SSmRNA LR workflow are shown below, demonstrating the detection of two full-length isoforms with different transcript lengths (Figure 4, Panels A and B).
Figure 4. SSmRNA LR kit detects full-length isoforms. cDNA was generated using the SSmRNA LR workflow described in Figure 1. Basecalling and demultiplexing were performed using Guppy, and reads were aligned using minimap2. Isoforms of Snap25 (Panel A) and Nbr1 (Panel B) detected from 10 pg MBR input are visualized in Integrative Genomics Viewer (IGV).
The completeness and accuracy of full-length isoform detection are further demonstrated using Spike-In RNA Variant (SIRV) controls (Lexogen). SSmRNA LR libraries prepared from 10 ng MBR spiked with SIRVs ranging from 600 to 2,492 bp show detection of all RNA fragments, with a high percentage (89–95%) of complete, full-length fragments (Figure 5, Panels A and B). The observed number of reads closely matches the expected number across the dataset, and the sense-strand data align almost perfectly with the expected read counts (Figure 5, Panel B). The sequencing data also show accurate strand identification across the different SIRV spike-in species (Figure 5, Panel B).
Figure 5. Visualization showing completeness of fragment lengths and strand orientation accuracy of the SSmRNA LR kit. This experiment used Spike-In RNA Variant (SIRV) controls (600–2,492 bp), showcasing detection of full-length fragments with a high percentage of complete transcripts and accurate identification of strand orientation across all isoforms. Expected results for perfect coverage from known concentrations are represented by the straight-lined boxes in the bottom section of both panels. For antisense strands, undercoverage results are blue; overcoverage results are tan. For sense strands, undercoverage results are green; overcoverage results are tan (very minimal). Panel A. Coverage analysis of complete, full-length fragments between 919 and 2,492 bp. Analysis shows 89% of full-length isoforms were detected. Panel B. Strand identification across different SIRV spike-in species, visualized via SIRVsuite on fragments between 600 and 1,597 bp. Analysis shows 95% of full-length isoforms were detected.
Spike-in controls of different lengths were further used to evaluate how well the SSmRNA LR kit detects complete transcripts as a function of fragment length. Figure 6 shows that the SSmRNA LR kit can detect full-length transcripts up to 8 kb and beyond (data not shown), with a substantial fraction of reads showing complete coverage.
Figure 6. SSmRNA LR kit performance with mRNA reference standards. To evaluate the performance of the SSmRNA LR workflow, cDNA was generated from SIRV-Set 4 (Lexogen) mRNA reference standards, which contain both ERCC quantification controls and Long SIRV mRNA standards. 10 ng of mouse brain total RNA was prepared, with SIRV spike-ins added to account for approximately 5% of reads, using the workflow described in Figure 1. Libraries were sequenced by ONT MinION, FASTQ data were read-strand corrected using the Restrander tool, and data were aligned with minimap2. IGV plots show data from 1 kb, 4 kb, 6 kb, and 8 kb long transcripts from the ERCC and long SIRV isoform set. Red-colored reads indicate positive-strand reads, blue indicates very rare negative-strand reads, and small purple or red marks indicate small indels/variations common in ONT sequencing. Full-length reads are defined as reads that cover at least 90% of the expected mRNA length.
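The full-length criterion in this caption (a read covering at least 90% of the expected mRNA length) can be expressed as a simple check on alignment coordinates. The sketch below assumes 0-based, end-exclusive alignment positions on the reference transcript and ignores internal gaps. The function names and toy alignments are hypothetical.

```python
def is_full_length(aln_start, aln_end, transcript_length, min_fraction=0.9):
    """Return True if an alignment spans >= min_fraction of the transcript.

    aln_start and aln_end are 0-based, end-exclusive positions on the
    reference transcript (an assumption of this sketch).
    """
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    if not 0 <= aln_start < aln_end <= transcript_length:
        raise ValueError("alignment coordinates out of range")
    covered = aln_end - aln_start
    return covered / transcript_length >= min_fraction


def full_length_fraction(alignments, transcript_length, min_fraction=0.9):
    """Return the fraction of alignments classified as full length."""
    if not alignments:
        raise ValueError("no alignments given")
    hits = sum(
        is_full_length(start, end, transcript_length, min_fraction)
        for start, end in alignments
    )
    return hits / len(alignments)


# Hypothetical toy alignments to an 8,000-nt spike-in transcript.
toy_alignments = [(0, 8000), (100, 7900), (0, 7300), (3000, 8000)]
toy_fraction = full_length_fraction(toy_alignments, transcript_length=8000)
print(f"{toy_fraction:.0%} of reads are full length")
```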
Quantify transcripts accurately using ERCC spike-in controls
To assess the accuracy of gene expression measurements, known amounts of External RNA Controls Consortium (ERCC) standards were added to 10 ng MBR, and libraries were prepared for ONT sequencing using the SSmRNA LR workflow. The ERCC data show a high correlation (r = 0.978) between the expected and measured spike-in RNA concentrations (Figure 7), indicating high accuracy in gene expression measurements across a wide dynamic range (10–10-5 RNA copies).
Figure 7. SSmRNA LR kit performance with mRNA reference standards. To evaluate the performance of the SSmRNA LR workflow, cDNA was generated from SIRV-Set 4 (Lexogen) mRNA reference standards, which contain both ERCC quantification controls and Long SIRV mRNA standards. 10 ng of mouse brain total RNA was prepared, with SIRV spike-ins added to account for approximately 5% of reads, using the workflow described in Figure 1. Libraries were sequenced by ONT MinION, FASTQ data were read-strand corrected using the Restrander tool, and data were aligned with minimap2. ERCC standard abundance for measured vs. theoretical concentration was plotted.
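A comparison of measured and theoretical ERCC abundance is commonly summarized by a correlation coefficient and a slope. The sketch below assumes that both quantities are compared on a log10 scale and that undetected spike-ins (zero measured abundance) are excluded. This note does not state either choice, so both are assumptions, and the toy values are hypothetical.

```python
import math


def ercc_log_fit(expected, measured):
    """Return (pearson_r, slope, n_used) for log10(measured) vs log10(expected).

    Pairs with a non-positive value are skipped, since they have no log.
    The log10 scale and the skipping rule are assumptions of this sketch.
    """
    pairs = [
        (math.log10(e), math.log10(m))
        for e, m in zip(expected, measured)
        if e > 0 and m > 0
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two detected spike-ins")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("values must not be constant")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # least-squares slope on the log-log scale
    return r, slope, n


# Hypothetical toy values: measured is exactly 2x expected,
# with one undetected spike-in.
toy_expected = [0.1, 1, 10, 100, 1000, 10000]
toy_measured = [0, 2, 20, 200, 2000, 20000]
toy_r, toy_slope, toy_n = ercc_log_fit(toy_expected, toy_measured)
print(f"r = {toy_r:.3f}, slope = {toy_slope:.3f}, n = {toy_n}")
```

A slope near 1 on the log-log scale indicates that measured abundance scales proportionally with input, while r reports how tightly the points follow that line.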
Demonstrate excellent performance from single-cell inputs
The sequencing data of single K562 cells and 1,000 K562 cells demonstrate excellent gene and transcript sensitivity (Figure 8, Panel A) and high reproducibility among gene counts with an average Pearson correlation of R = 0.938 (Figure 8, Panel B).
Figure 8. Single-cell performance of the SSmRNA LR kit. cDNA was generated using the SSmRNA LR workflow described in Figure 1. Basecalling and demultiplexing were performed using Guppy, and reads were aligned using minimap2 (Panel A). Isoforms of RAF1 (C-RAF) detected from 10 pg of RNA were isolated from primary lung cancer samples. Sensitivity is defined as the number of genes or transcripts detected. The SSmRNA LR workflow was used to create cDNA from eight single K562 cells (Panel B). Barcoded cDNA was pooled, and libraries were generated and sequenced according to the workflow. Samples were basecalled and demultiplexed using Guppy and aligned with minimap2, and count matrices were generated using featureCounts. Pairwise correlation matrices were calculated from the count matrices in R. High Pearson correlation values between samples indicate high technical reproducibility.
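The pairwise correlation analysis in this caption was run in R. As a rough Python stand-in, the sketch below computes a Pearson correlation matrix from per-sample gene counts. It works on raw counts, because this note does not say whether counts were transformed first. The sample names and toy counts are hypothetical.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with >= 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(counts_by_sample):
    """Return a nested dict of pairwise Pearson r values.

    counts_by_sample maps each sample name to its gene counts,
    with genes in the same order for every sample.
    """
    names = list(counts_by_sample)
    return {
        a: {b: pearson(counts_by_sample[a], counts_by_sample[b]) for b in names}
        for a in names
    }


# Hypothetical toy counts for four genes in three replicates.
toy_counts = {
    "rep1": [10, 0, 250, 40],
    "rep2": [12, 1, 240, 38],
    "rep3": [9, 0, 260, 45],
}
toy_matrix = correlation_matrix(toy_counts)
for name, row in toy_matrix.items():
    print(name, [round(r, 4) for r in row.values()])
```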
Detect gene fusions from single-cell inputs
Gene fusions represent an important type of structural variant and can often act as key drivers of cancer development or be used as drug targets. Accurately identifying gene fusions from short-read sequencing data can be challenging because few reads typically span the fusion breakpoint. The SSmRNA LR kit was used to prepare long-read, ONT-sequencing libraries. The known gene fusions in K562 cells were then examined (Figure 9).
Figure 9. Detection of the known NUP214-XKR3 fusion in single K562 cells using the SSmRNA LR kit. cDNA was generated using the SSmRNA LR workflow described in Figure 1. Basecalling and demultiplexing were performed using Guppy, and reads were aligned using minimap2. NUP214-XKR3 gene fusions detected from single-cell inputs are visualized in IGV.
Integrate automation and miniaturization into workflows
We further demonstrate that the SSmRNA LR workflow is well suited for automation and miniaturization, enabling efficient, scalable, and reproducible processing without compromising performance compared to manual workflows. The SSmRNA LR kit was used to prepare barcoded cDNA from 10 ng MBR with the full-volume manual protocol and with an 8X miniaturized protocol automated on the mosquito liquid handler (SPT Labtech), followed by conversion to ONT sequencing libraries. Sequencing data demonstrate comparable performance between the manual (full-volume) and automated (8X miniaturized) workflows (Figure 10, Panels A and B).
Figure 10. Automation- and miniaturization-friendly SSmRNA LR workflow comparison. Comparable gene and transcript detection between full-volume manual library prep and 8X miniaturization (Panel A). Table showing the mean and median sizes of reads between the manual and mosquito automation methods (Panel B).
Conclusions
This tech note demonstrated the SMART-Seq mRNA LR kit’s ability to:
Enable full-length transcriptome analysis from low-input samples using LR sequencing (Figure 2)
Generate LR sequencing libraries with broad size distribution and read lengths from low inputs (Figure 2)
Deliver excellent sensitivity and reproducibility and generate libraries of transcripts with uniform gene-body coverage (Figure 3)
Detect complete, full-length transcripts (up to 8 kb) with accurate strand orientation (Figures 4, 5, and 6)
Quantify transcripts accurately (Figure 7)
Demonstrate excellent performance from single-cell inputs (Figure 8)
Detect gene fusions (Figure 9)
Automate or miniaturize workflows (Figure 10)
These data showcase the reliability, reproducibility, sensitivity, and adaptability of the SSmRNA LR kit, giving users the confidence to generate full-length cDNA libraries with sample barcoding to enable long-read RNA-seq from inputs as low as 10 pg or from single cells. This technology also enables the detection of isoforms up to 8 kb, determination of strand orientation, accurate measurement of transcript expression, and identification of gene fusions. It is compatible with direct cell inputs, and the robust workflow supports automation and miniaturization for efficient, scalable, and cost-effective processing.
Methods
First-strand cDNA synthesis is primed by the SMART-Seq LR Primer and performed by a Moloney murine leukemia virus (MMLV)-derived reverse transcriptase (RT). Upon reaching the 5′ end of each mRNA molecule, the RT adds non-templated nucleotides to the first-strand cDNA, facilitating hybridization with a template-switching oligonucleotide (TSO). In the template-switching step, the RT uses the remainder of the SMART-Seq LR TSO as a template for the incorporation of an additional sequence on the end of the first-strand cDNA. The first-strand cDNA is then barcoded and amplified by the first round of PCR (PCR1); after cleanup of the PCR1 product, a second round of PCR enriches for barcoded fragments. Samples are pooled and end-prepped, and sequencing adapters are ligated using the Ligation Sequencing Kit V14 (Oxford Nanopore Technologies or ONT). The SMART-Seq Library Prep Kit is used to generate sequencing-ready libraries. After sequencing, samples are basecalled and demultiplexed using Guppy (ONT). Downstream analysis is performed using minimap2 (GitHub), SAMtools, and Salmon.