By examining gene expression in individual cells through single-cell RNA sequencing (scRNA-seq), researchers can untangle the cellular complexity within a heterogeneous sample such as a tumor or other diseased tissue (Hay et al. 2020). scRNA-seq can identify subpopulations with unique gene expression profiles and biomarkers that may be correlated with drug resistance or cancer progression.
Current scRNA-seq methods with full gene-body coverage have not been scaled to the cell throughput users need: existing methods are limited to 384 single cells per run. Widely used end-counting technologies, in contrast, give only partial information because they lack full gene-body coverage. Scientists need technologies that combine high cell throughput with full gene-body coverage so they can extract insights beyond gene counting, such as transcript-level information including alternative splicing and gene fusions. To address this long-standing need, we developed the Shasta™ Total RNA-Seq Kit (workflow illustrated in Figure 1), which for the first time enables scientists to generate scRNA-seq libraries with full gene-body coverage for up to 100,000 cells per run.
Evaluation of the Shasta Total RNA-Seq Kit demonstrated that the kit has a low doublet rate, full gene-body coverage, high sensitivity, and high-throughput capability. Data generated with Shasta Total RNA-Seq enables annotation of different cell populations in a heterogeneous sample. By combining scalability and sensitivity, the Shasta Total RNA-Seq Kit will be a powerful biomarker discovery tool for scientists seeking to better understand disease mechanisms and identify better treatment strategies for cancer and other diseases.
Results
Deconvolution of sample complexity with Shasta Total RNA-Seq
Peripheral blood mononuclear cells (PBMCs) are a heterogeneous population composed of various types of lymphocytes, monocytes, dendritic cells, and other cell populations. We processed human PBMCs using Shasta Total RNA-Seq and found that the sequencing data enabled identification of distinct cell populations based on gene expression profiles. In this experiment encompassing 8,000 PBMCs, Shasta Total RNA-Seq displayed very high sensitivity, detecting about 3,500 genes per cell. With its full gene-body coverage, Shasta Total RNA-Seq was not susceptible to sequencing saturation (Figure 2, Panels A and B), whereas end-counting technologies reach sequencing saturation readily at read depths of 25,000–50,000. The Shasta Total RNA-Seq data allowed us to distinguish several PBMC subpopulations, demonstrating that the data are sensitive enough to resolve their distinct gene expression programs (Figure 2, Panel C).
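Sequencing saturation of the kind discussed above is commonly assessed by downsampling a cell's reads and counting how many genes are still detected at each depth. The following Python sketch illustrates that general idea only; it is not the analysis behind Figure 2. The function names, the input format (a dictionary of per-gene read counts for one cell), and the fixed random seed are assumptions made for illustration.

```python
import random


def genes_detected_at_depth(gene_counts, depth, seed=0):
    """Downsample one cell's reads to `depth` and count genes with >= 1 read.

    gene_counts: dict mapping gene name -> read count (hypothetical input format).
    """
    # Expand the counts into a flat list with one entry per read.
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    if depth >= len(reads):
        # Nothing to downsample: every gene with at least one read is detected.
        return sum(1 for n in gene_counts.values() if n > 0)
    rng = random.Random(seed)
    sampled = rng.sample(reads, depth)  # sampling without replacement
    return len(set(sampled))


def saturation_curve(gene_counts, depths, seed=0):
    """Return (depth, genes detected) pairs for plotting a saturation curve."""
    return [(d, genes_detected_at_depth(gene_counts, d, seed)) for d in depths]


if __name__ == "__main__":
    # Toy cell: three expressed genes and one silent gene.
    cell = {"GENE_A": 50, "GENE_B": 5, "GENE_C": 1, "GENE_D": 0}
    for depth, n_genes in saturation_curve(cell, [1, 10, 25, 56]):
        print(depth, n_genes)
```

A curve that keeps rising as depth increases suggests the library is not yet saturated, while a curve that flattens early suggests additional reads add little new information.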
Low doublet rate and even gene-body coverage for better biomarker detection
When using a high-throughput workflow such as Shasta Total RNA-Seq to decode a complex biological sample, a low doublet rate is essential for confidence in the data. We tested Shasta Total RNA-Seq with a cell mixture composed of mouse 3T3 and human K562 cells, in which a doublet containing one cell of each species can be recognized by its mixed-species reads. Shasta Total RNA-Seq demonstrated a very low doublet rate (Figure 3, Panel A). Notably, the method provided full gene-body coverage, with no bias toward the 5′ end or 3′ end (Figure 3, Panel B).
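In a mouse/human mixing experiment like the one above, each cell barcode can be classified by the share of its reads that map to each genome, and barcodes with a substantial share from both species are counted as doublets. The Python sketch below shows one simple way to do this. It is an illustration under stated assumptions, not the method used for Figure 3: the 90% purity threshold, the function names, and the input format (pairs of human and mouse read counts per barcode) are all hypothetical.

```python
def classify_cell(human_reads, mouse_reads, purity=0.9):
    """Label a barcode as 'human', 'mouse', 'mixed', or 'empty'.

    purity: minimum fraction of reads from one species needed to call the
    barcode a single-species cell (an assumed, illustrative threshold).
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    if human_reads / total >= purity:
        return "human"
    if mouse_reads / total >= purity:
        return "mouse"
    return "mixed"


def observed_doublet_rate(cells, purity=0.9):
    """Fraction of non-empty barcodes classified as mixed-species.

    cells: iterable of (human_reads, mouse_reads) tuples, one per barcode.
    """
    labels = [classify_cell(h, m, purity) for h, m in cells]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return called.count("mixed") / len(called)


if __name__ == "__main__":
    barcodes = [(980, 20), (15, 1200), (400, 450), (2000, 10), (0, 0)]
    print(observed_doublet_rate(barcodes))
```

This calculation only captures doublets that contain one cell from each species. Doublets formed by two cells of the same species are invisible to it, so the observed value is a lower bound on the true doublet rate.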
Scalability and sensitivity empower the discovery of more biomarkers
With the Shasta Total RNA-Seq Kit at its highest cell throughput, we can obtain scRNA-seq data with full gene-body coverage for up to 100,000 cells per experiment (Figure 4, Panel A). The in situ reverse transcription (RT) step allows the addition of up to 96 sample-specific barcodes during this workflow. We tested the distribution of sequencing reads among these 96 barcodes and found it to be even, suggesting that the workflow is also compatible with high sample throughput (Figure 4, Panels B and C).
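One way to quantify how evenly reads are spread across sample barcodes is to convert the per-barcode read counts to fractions and compute their coefficient of variation (CV), where a CV near zero indicates an even distribution. The Python sketch below is a hedged illustration of that calculation, not the analysis used for Figure 4. The function name and the input format (a list of read counts, one per barcode) are assumptions.

```python
from statistics import mean, pstdev


def barcode_evenness(read_counts):
    """Summarize how evenly reads are distributed across sample barcodes.

    read_counts: list of read counts, one per barcode (hypothetical input).
    Returns the coefficient of variation of the per-barcode read fractions,
    the smallest and largest fractions, and the fraction expected if reads
    were split perfectly evenly.
    """
    total = sum(read_counts)
    if total == 0:
        raise ValueError("no reads assigned to any barcode")
    fractions = [count / total for count in read_counts]
    return {
        "cv": pstdev(fractions) / mean(fractions),
        "min_fraction": min(fractions),
        "max_fraction": max(fractions),
        "expected_fraction": 1 / len(read_counts),
    }


if __name__ == "__main__":
    # Toy example: 96 barcodes with small alternating deviations.
    counts = [10_000 + (i % 2) * 500 for i in range(96)]
    summary = barcode_evenness(counts)
    print(f"CV = {summary['cv']:.3f}")
```

Comparing the minimum and maximum fractions against the expected fraction (1/96 for a full set of 96 barcodes) gives a quick check for under- or over-represented samples.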
Conclusion
The Shasta Total RNA-Seq Kit is the first solution for high-throughput full-length scRNA-seq, enabling detection of novel biomarkers such as alternative splicing isoforms and gene fusions. The experiments above demonstrated that the Shasta Total RNA-Seq Kit provides full gene-body coverage at a throughput of up to 100,000 cells per experiment across up to 96 different samples. By combining high throughput with outstanding sensitivity, the Shasta Total RNA-Seq Kit offers scientists great potential to identify novel and hidden biomarkers and answer crucial biological questions.
Materials and methods
K562, an immortalized chronic myelogenous leukemia (CML) cell line, was used to generate libraries using the Shasta Total RNA-Seq Kit as per the user manual. Following purification, libraries were quantified using the Qubit, the Agilent 2100 Bioanalyzer, and the Library Quantification Kit (Cat. # 638324). Libraries were then sequenced on an Illumina NextSeq 500/550 platform with a Mid Output or a High Output Kit with 150-cycle cartridges. Sequencing data analysis was completed using Takara Bio Cogent NGS tools.
Peripheral blood mononuclear cells (PBMCs) extracted from whole blood were used to generate libraries using the Shasta Total RNA-Seq Kit as per the user manual. Following purification, libraries were quantified using the Qubit, the Agilent 2100 Bioanalyzer, and the Library Quantification Kit (Cat. # 638324). Libraries were then sequenced on an Illumina NextSeq 500/550 platform with a High Output Kit with 150-cycle cartridges. Sequencing data analysis was completed using Cogent NGS tools.
Reference
Hay, MA. et al. Identifying opportunities and challenges for patients with sarcoma as a result of comprehensive genomic profiling of sarcoma specimens. JCO Precision Oncology, 4, 176–182 (2020).