Splice variants or alternative isoforms play an important role in the understanding of human health and disease. It is estimated that nearly all protein-coding genes in the human genome are alternatively spliced, providing an essential source of protein diversity (Pan et al. 2008; Wang et al. 2008). As many cancer-associated genes are regulated by alternative splicing, tumor-specific splice variants have clear diagnostic value as biomarkers and may serve as potential drug targets in novel therapeutics (Zhang et al. 2021).
One of the toughest challenges in biology and medicine today is to map genotypes to phenotypes, which can be tackled by performing transcriptomics analysis via single-cell mRNA sequencing (Hwang et al. 2018). Full-transcriptome mRNA sequencing enables the discovery of rare biological events such as gene fusions, SNPs, and alternative splicing, which is an essential first step to new advancements in medicine.
Though there are plate-based methods for full-length, single-cell RNA-seq (scRNA-seq), they often lack integrated analysis tools. Our complete, automated solution for enhanced biomarker detection includes optimized SMART-Seq chemistry on the ICELL8 cx Single-Cell System and Cogent NGS tools (Figure 1). With the full-length coverage of our SMART-Seq Pro kit, you can identify clinically relevant, novel biomarkers at single-cell resolution with confidence.
Results
SMART-Seq Pro uncovers the most biological information from single cells
To determine if the automated, streamlined SMART-Seq Pro workflow could detect genes from single cells at the same rate as or better than other popular plate-based methods such as Takara Bio’s SMART-Seq v4 kit or the Smart-seq2 homebrew method, we used all three methods to prepare single-cell sequencing libraries. Then we analyzed the number of identified genes using Cogent NGS tools. With the SMART-Seq Pro workflow we identified a median of 3,195 genes/cell, while we identified a median of 1,747 genes/cell with the Smart-seq2 workflow, and a median of 2,574 genes/cell with the SMART-Seq v4 workflow, demonstrating that SMART-Seq Pro detects more genes per cell than plate-based methods (Figure 2).
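The genes-per-cell metric reported above can be illustrated with a minimal Python sketch. It assumes a count matrix held as a dictionary that maps each cell to its per-gene read counts, and it treats a gene as detected when it has at least one read. The data layout, the detection threshold, the function names, and the toy values are illustrative assumptions and do not describe how Cogent NGS tools compute this figure.

```python
from statistics import median


def genes_detected(counts_by_gene, min_count=1):
    """Count the genes with at least min_count reads in a single cell."""
    return sum(1 for count in counts_by_gene.values() if count >= min_count)


def median_genes_per_cell(matrix, min_count=1):
    """Median number of detected genes across all cells in the matrix."""
    return median(genes_detected(genes, min_count) for genes in matrix.values())


# Toy matrix with hypothetical counts (cell -> gene -> reads).
toy = {
    "cell_1": {"PTPRC": 12, "CD3E": 0, "ACTB": 40},
    "cell_2": {"PTPRC": 3, "CD3E": 5, "ACTB": 22},
    "cell_3": {"PTPRC": 0, "CD3E": 0, "ACTB": 9},
}
print(median_genes_per_cell(toy))  # 2
```

Raising `min_count` makes the detection call stricter, which lowers the reported median for every method being compared.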
SMART-Seq Pro enables transcript-level investigation, revealing information that can be missed with a gene-level-only investigation
To test whether the end-to-end SMART-Seq Pro workflow can identify information that is missed by gene-level-only analysis, we first identified transcript-level expression changes in a population of human peripheral blood mononuclear cells (PBMCs) using the complete SMART-Seq Pro workflow. Briefly, PBMCs were processed using the SMART-Seq Pro application kit on the ICELL8 cx Single-Cell System to generate full-length scRNA-seq libraries. After sequencing, data were demultiplexed and mapped using the transcript analysis option in the Cogent NGS Analysis Pipeline (Cogent AP), our free-to-use bioinformatics software. The output file from Cogent AP was used as input for Cogent NGS Discovery Software (Cogent DS), which was then used to perform clustering analysis and generate a UMAP plot based on transcript counts (Figure 3, Panel A).
Among the eight transcript-based clusters identified, Cluster 1 (red circle) and Cluster 5 (blue circle) were chosen for further analysis. From the 'Gene Discovery' menu in Cogent DS (Figure 3, Panel B), two lists of differentially expressed transcripts were downloaded, one for Cluster 1 and one for Cluster 5. For Cluster 1, 4,935 differentially expressed transcripts were identified, while 2,598 were identified for Cluster 5. Transcripts that were detected in both clusters but displayed opposite expression levels were then extracted, which resulted in a list of 959 differentially expressed transcripts.
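As a rough illustration of the extraction step described above, the following Python sketch intersects two per-cluster lists and keeps the transcripts whose changes point in opposite directions. It assumes each downloaded list can be reduced to a mapping from transcript ID to a signed log fold change, and that "opposite expression levels" means opposite signs. Both are assumptions made for this example, and the transcript IDs and values are hypothetical.

```python
def opposite_direction(cluster_a, cluster_b):
    """Return transcripts found in both lists whose fold changes differ in sign."""
    shared = cluster_a.keys() & cluster_b.keys()
    # A negative product means one value is positive and the other negative.
    return sorted(t for t in shared if cluster_a[t] * cluster_b[t] < 0)


# Hypothetical signed log fold changes (transcript -> value).
cluster_1 = {"TX-A": 1.8, "TX-B": -0.9, "TX-C": 2.1}
cluster_5 = {"TX-A": -1.2, "TX-B": -0.4, "TX-D": 0.7}
print(opposite_direction(cluster_1, cluster_5))  # ['TX-A']
```

Transcripts present in only one list (`TX-C`, `TX-D`) and transcripts changing in the same direction in both (`TX-B`) are excluded.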
Next, we identified expression changes in the same PBMC population using a gene-level-only analysis akin to the type of analysis that would be performed using data generated from 3′DE or 5′DE single-cell RNA sequencing. Cogent DS was used to generate a UMAP plot showing the results of gene-based clustering (Figure 4, Panel A). From the 'Gene Discovery' menu in Cogent DS (Figure 4, Panel B), a list of differentially expressed genes from all gene-based clusters was downloaded. Removing duplicate entries yielded a list of 5,987 unique genes. This list represents expression changes that could be identified by 3′DE or 5′DE methods.
To demonstrate the information gained by transcript-level analysis over a gene-level-only analysis, transcripts that were differentially expressed between transcript-based Clusters 1 and 5 but whose parental gene was not differentially expressed were extracted, which resulted in a list of 52 transcripts. These 52 transcripts represent expression changes that were identified only through a full-length, transcript-level analysis such as that enabled by SMART-Seq Pro.
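The two filtering steps described in this section (building a unique gene list across gene-based clusters, then keeping transcripts whose parental gene is absent from it) can be sketched in Python as follows. The inputs are assumed to be plain collections of identifiers plus a transcript-to-gene lookup. The function names, identifiers, and lookup table are hypothetical and are not taken from Cogent DS output.

```python
def unique_genes(per_cluster_gene_lists):
    """Merge per-cluster gene lists into one set without duplicates."""
    return set().union(*per_cluster_gene_lists)


def transcript_only_changes(de_transcripts, transcript_to_gene, de_genes):
    """Keep transcripts whose parental gene is not in the gene-level list.

    Transcripts missing from the lookup are dropped, since their
    parental gene cannot be checked.
    """
    return sorted(
        t for t in de_transcripts
        if t in transcript_to_gene and transcript_to_gene[t] not in de_genes
    )


# Hypothetical identifiers for illustration only.
gene_lists = [["GENE2", "GENE4"], ["GENE4", "GENE5"]]
lookup = {"GENE1-201": "GENE1", "GENE2-203": "GENE2", "GENE3-201": "GENE3"}
de_transcripts = ["GENE1-201", "GENE2-203", "GENE3-201", "GENE9-202"]

de_genes = unique_genes(gene_lists)
print(transcript_only_changes(de_transcripts, lookup, de_genes))
# ['GENE1-201', 'GENE3-201']
```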
SMART-Seq Pro empowers novel biomarker discovery with transcript-level insight
To demonstrate that the SMART-Seq Pro empowers novel, clinically relevant biomarker discovery, the 52 transcripts that were identified as being differentially expressed at the transcript level but not the gene level between transcript-based Clusters 1 and 5 were further investigated. Of these 52 transcripts, several PTPRC isoforms were identified, including PTPRC-201, PTPRC-207, and PTPRC-209. PTPRC encodes a transmembrane protein tyrosine phosphatase known as CD45 that is required for T-cell antigen receptor signal transduction (Li et al. 2020). Previous studies have shown that different PTPRC isoforms generated through alternative splicing depend on the state of T-cell activation (Li et al. 2020).
By using the 'Gene Discovery' module and choosing PTPRC in Cogent DS, the expression of the PTPRC gene across gene-based clusters was examined. High expression of PTPRC was observed across all gene-based clusters, which was consistent with our initial analysis (Figure 5, Panel A). However, when the expression of PTPRC-201, PTPRC-207, or PTPRC-209 across transcript-based clusters was examined, PTPRC-201 demonstrated higher expression in Cluster 1 (Figure 5, Panel B; red) compared to PTPRC-207 and PTPRC-209, while PTPRC-207 and PTPRC-209 demonstrated higher expression in Cluster 5 (Figure 5, Panel B; blue) compared to PTPRC-201. These data illustrate why an ultra-sensitive, full-length approach such as SMART-Seq Pro is needed for the detection of novel biomarkers.
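The per-cluster comparison of isoforms described above amounts to summarizing one transcript's expression within each cluster. A minimal Python sketch of that summary is shown below, assuming expression values and cluster labels are both available per cell. The data structures, the function name, and all values are hypothetical and do not reproduce the data in Figure 5.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_cluster(expression, cluster_of):
    """Average one transcript's expression within each cluster.

    expression: cell -> expression value for a single transcript
    cluster_of: cell -> cluster label
    """
    grouped = defaultdict(list)
    for cell, value in expression.items():
        grouped[cluster_of[cell]].append(value)
    return {cluster: mean(values) for cluster, values in grouped.items()}


# Hypothetical cluster labels and expression values.
cluster_of = {"c1": 1, "c2": 1, "c3": 5, "c4": 5}
isoform_expression = {"c1": 8.0, "c2": 6.0, "c3": 1.0, "c4": 3.0}
print(mean_expression_by_cluster(isoform_expression, cluster_of))
# {1: 7.0, 5: 2.0}
```

Running the same summary for each isoform of a gene makes cluster-specific isoform usage visible even when the gene-level total is similar across clusters.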
Conclusions
We successfully used the SMART-Seq Pro kit, together with the ICELL8 cx Single-Cell System and Cogent NGS analysis tools, to generate high-quality data that can be easily used for characterizing clinically relevant isoforms. The findings show how the ICELL8 cx system offers an efficient way to scale up biomarker discovery by leveraging the full-length sequencing coverage and unparalleled sensitivity of SMART-Seq Pro. With the freely available Cogent NGS tools, we streamlined the discovery of novel biomarkers by maximizing detection power, reproducibility, and resolution.
Methods
Cell staining and preparation
Cryopreserved PBMCs (Cat. # HUMAN-PBMC-M-170046, lot # 00PB000450, BioIVT) were thawed in a 37°C water bath for 60 seconds and then topped off with pre-warmed RPMI-1640 Complete Medium (20% FBS). After centrifugation and subsequent removal of the supernatant, the concentration of recovered PBMCs was determined via a Moxi automated cell counter. PBMCs were then stained with Hoechst 33342 and propidium iodide as described in the SMART-Seq Pro application kit user manual.
Cell dispensation and imaging
The PBMCs and controls were then dispensed into 5,184 nanowells of the ICELL8 350v Chip using the ICELL8 cx system and ICELL8 cx CELLSTUDIO v2.5 Software. The nanowells were then imaged by the ICELL8 cx system with both blue and red wavelength filters. After imaging, the chip was frozen at –80°C while the ICELL8 cx CellSelect v2.5 Software was used to analyze the resulting images with the automated threshold detection. After selecting candidate wells, the software generated a filter file to use for all the remaining dispenses.
Library construction
The chip was thawed and returned to the ICELL8 cx system, where RT reagents were dispensed only into the nanowells defined as candidates by the CellSelect filter file. The chip was run through a program to perform first-strand cDNA synthesis on the ICELL8 cx Thermal Cycler initiated by SMART-Seq Pro CDS (an oligo-dT primer). Following first-strand cDNA synthesis, the SMART-Seq Pro Oligonucleotide was hybridized to the 3′ end of the full-length cDNA and mediated template switching, serving as a priming site for second-strand cDNA synthesis. Once synthesized, the second-strand cDNA was amplified through PCR, resulting in copies of unbiased, full-length cDNA. After amplification, the full-length cDNA was tagmented by Illumina Bead-Linked Transposome (BLT). The tagmented cDNA was then amplified using forward and reverse indexing primers, generating the final library construct.
The resulting libraries were extracted from the chip, purified, amplified, and purified again. After validation steps, the libraries were loaded on the Illumina NextSeq® 500 system. The resulting sequencing reads were downsampled to 100,000 reads per cell before the number of genes detected was determined.
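Downsampling to a fixed depth per cell can be sketched in Python as random sampling without replacement. This is only an illustration of the idea: the in-memory read lists, the fixed seed, and the choice to keep cells with fewer reads than the target depth unchanged are assumptions, not a description of the tool used in this study.

```python
import random


def downsample_reads(reads_by_cell, depth=100_000, seed=0):
    """Randomly keep at most `depth` reads per cell, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    sampled = {}
    for cell, reads in reads_by_cell.items():
        if len(reads) <= depth:
            # Assumption: cells below the target depth are kept as they are.
            sampled[cell] = list(reads)
        else:
            sampled[cell] = rng.sample(reads, depth)
    return sampled


# Toy example with a depth of 3 instead of 100,000.
toy_reads = {
    "cell_1": ["r1", "r2", "r3", "r4", "r5"],
    "cell_2": ["r6", "r7"],
}
result = downsample_reads(toy_reads, depth=3)
print({cell: len(reads) for cell, reads in result.items()})
# {'cell_1': 3, 'cell_2': 2}
```

Equalizing depth in this way keeps differences in genes detected from being driven by differences in sequencing depth between cells or methods.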
Gene expression analysis
Cogent NGS Analysis Pipeline (Cogent AP) v1.5 was used for gene expression analysis. The pipeline performed adapter trimming (using cutadapt), genome alignment (using STAR), gene read counting (using featureCounts), and transcript read counting (using RSEM). The resulting gene and transcript matrices were then input into Cogent NGS Discovery Software (Cogent DS) v1.5 to perform clustering analysis and generate UMAP plots.
References
Hwang, B. et al. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp. Mol. Med. 50, 1–14 (2018).
Li, J. et al. Landscape of transcript isoforms in single T cells infiltrating in non-small-cell lung cancer. J. Genet. Genomics 47, 373–388 (2020).
Pan, Q. et al. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
Wang, E. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Zhang, Y. et al. Alternative splicing and cancer: a systematic review. Sig. Transduct. Target Ther. 6, 78 (2021).