Target enrichment with the ICELL8 full-length workflow for superior fusion detection
Introduction
Under-expressed biological events can be difficult to identify, even when one is expecting to see them, but they are frequently the events researchers most want to find. Rare fusions and isoforms, for example, have applications in oncology research as they can relate to carcinogenesis (Yu et al. 2019, Mertens et al. 2015). The single-cell full-length transcriptome analysis (ICELL8 SMART-Seq) application for the ICELL8 cx Single Cell System gives scientists the means to detect these biological events (see tech note for highlights).
Despite this capability, highly expressed transcripts consume a larger share of the finite number of reads and are more likely to be sequenced, so some low-expressor targets of interest may remain undetected. Enrichment is commonly used to investigate the specific nucleic acid sequences most relevant to your research, although it is not generally part of the standard ICELL8 full-length scRNA-seq workflow. We investigated how targeted enrichment can further improve detection with the full-length ICELL8 SMART-Seq application and rescue underrepresented regions from being lost in the crowd.
The experimental goal was to improve identification of certain important low-expressor fusions. Genes selected for enrichment included BCR and ABL1 (Grosveld et al. 1986), shown in Figure 1, as well as four other genes associated with fusions: PAX5, ETV6, EP300, and ZNF384. By incorporating target enrichment, detection of these known regions was expected to improve.
The ICELL8 cx Single Cell System was used to isolate 1,512 K562 cells, and the ICELL8 SMART-Seq workflow was implemented to generate an initial library, which was then enriched for specific targets of interest prior to sequencing. K562 cells, which carry the Philadelphia chromosome (Lozzio and Lozzio 1975), were used as a positive sample set for BCR-ABL1 fusions. The IDT xGen Lockdown kit was used for the target enrichment steps, though in theory any enrichment method would be compatible with this approach. The designed probes consisted of 5' biotinylated oligos for hybridization capture enrichment in next-generation sequencing.
Both unenriched and enriched data were then analyzed together using Cogent NGS Analysis Pipeline to map reads.
Results
Comparing the number of cells expressing the BCR-ABL1 fusion between the enriched and unenriched data sets showed that enrichment greatly increased the number of fusion-positive cells detected while requiring fewer reads. In the original library, only 40 of 1,512 cells were determined to express transcripts containing the BCR-ABL1 fusion, even at a sequencing depth of greater than 1.6 M reads. In contrast, targeted enrichment of BCR and ABL1 from the same library identified 264 cells expressing the BCR-ABL1 fusion at a depth of only 582 K reads (Figure 2).
By focusing on the BCR and ABL1 targets of interest, there was a >150-fold increase overall in junction and spanning reads detected (Table 1). Several fusions expected to be detected in K562, based on DepMap data (The Cancer Dependency Map Consortium, 2020), were identified using the SMART-Seq workflow. In the original library, BCR-ABL1 was present in 7% of the cells identified with any fusion; following enrichment, 94% of the fusion-identified cells carried the BCR-ABL1 fusion.
| | Original library | BCR- and ABL1-enriched |
|---|---|---|
| BCR-ABL1 junction reads | 13 | 1,735 |
| BCR-ABL1 spanning reads | 39 | 6,420 |
| Total # cells with any fusion identified | 590 | 280 |
| # fusion-identified cells expressing BCR-ABL1 | 40 | 264 |
Table 1. Targeted enrichment of the original library led to a significant increase in both junction and spanning reads for BCR-ABL1.
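The fold increases and cell fractions quoted above follow directly from the counts in Table 1. The arithmetic can be sketched as below; the dictionary keys and function names are illustrative choices, not part of any analysis software used in this study.

```python
# Read and cell counts copied from Table 1.
original = {"junction": 13, "spanning": 39,
            "cells_any_fusion": 590, "cells_bcr_abl1": 40}
enriched = {"junction": 1735, "spanning": 6420,
            "cells_any_fusion": 280, "cells_bcr_abl1": 264}


def supporting_reads(counts):
    """Total fusion-supporting reads: junction plus spanning."""
    return counts["junction"] + counts["spanning"]


def bcr_abl1_fraction(counts):
    """Fraction of fusion-identified cells that express BCR-ABL1."""
    return counts["cells_bcr_abl1"] / counts["cells_any_fusion"]


# Fold change = enriched count divided by original count.
junction_fold = enriched["junction"] / original["junction"]
spanning_fold = enriched["spanning"] / original["spanning"]
overall_fold = supporting_reads(enriched) / supporting_reads(original)

print(f"junction reads: {junction_fold:.1f}-fold")
print(f"spanning reads: {spanning_fold:.1f}-fold")
print(f"junction + spanning: {overall_fold:.1f}-fold")
print(f"BCR-ABL1 share, original: {bcr_abl1_fraction(original):.1%}")
print(f"BCR-ABL1 share, enriched: {bcr_abl1_fraction(enriched):.1%}")
```

The combined junction and spanning count is what exceeds 150-fold; junction reads alone rise somewhat less and spanning reads somewhat more.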
Cogent NGS Analysis Pipeline was used to determine the number of reads mapping to each of the six targeted genes (BCR, ABL1, EP300, ETV6, PAX5, and ZNF384); the analysis revealed that the percentage of reads for all six genes increased as a result of enrichment. For the genes involved in the BCR-ABL1 fusion, detection increased from ~0% to 6.18% of the total reads for BCR, and from ~0% to 5.06% of the total reads for ABL1 (Table 2).
| Targeted gene | Original library (% of total barcoded reads) | xGen-enriched (% of total barcoded reads) |
|---|---|---|
| BCR | 0.00% | 6.18% |
| ABL1 | 0.00% | 5.06% |
| Total % | 0.00% | 11.24% |
Table 2. Comparison of gene detection between unenriched original library set and same library enriched for the genes of primary interest, BCR and ABL1.
For the secondary genes targeted by enrichment, detection was also increased from ~0% in the original, unenriched library to a non-zero percentage in the enriched library (Table 3).
| Targeted gene | Original library (% of total barcoded reads) | xGen-enriched (% of total barcoded reads) |
|---|---|---|
| EP300 | 0.00% | 5.01% |
| ETV6 | 0.00% | 1.60% |
| PAX5 | 0.00% | 0.05% |
| ZNF384 | 0.00% | 22.5% |
| Total % | 0.00% | 29.16% |
Table 3. Comparison of K562 gene detection between unenriched original library set and same library enriched for the four secondary genes targeted for enrichment.
Overall, 40% of the total reads were attributed to the six targeted genes, compared to ~0% for the original, unenriched library data for the same targets (Table 4).
| | Original library | xGen-enriched |
|---|---|---|
| Percentage of reads to targeted genes, aggregate (out of total barcoded reads) | 0.00% | 40.4% |
Table 4. Comparison of gene detection between unenriched original library set and same library enriched using the protocol for all six targeted genes (Tables 2 and 3).
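As a consistency check, the aggregate in Table 4 should equal the sum of the per-gene percentages for the enriched library in Tables 2 and 3. A minimal sketch of that check follows; it uses `Decimal` so the sums are exact, and the variable names are illustrative.

```python
from decimal import Decimal

# Percentage of total barcoded reads per gene in the xGen-enriched library,
# copied from Tables 2 and 3.
enriched_pct = {
    "BCR": Decimal("6.18"),
    "ABL1": Decimal("5.06"),
    "EP300": Decimal("5.01"),
    "ETV6": Decimal("1.60"),
    "PAX5": Decimal("0.05"),
    "ZNF384": Decimal("22.5"),
}

primary_genes = ("BCR", "ABL1")
secondary_genes = ("EP300", "ETV6", "PAX5", "ZNF384")

primary_total = sum(enriched_pct[g] for g in primary_genes)
secondary_total = sum(enriched_pct[g] for g in secondary_genes)
aggregate = primary_total + secondary_total

print(f"primary genes:   {primary_total}%")
print(f"secondary genes: {secondary_total}%")
print(f"all six genes:   {aggregate}%")
```

The three totals agree with the "Total %" entries in Tables 2 and 3 and with the aggregate reported in Table 4.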
HPRT1 and GAPDH were chosen as representative non-targeted genes and, as expected, the percentage of reads for non-targeted genes diminished as more reads were distributed to the six targeted genes (Table 5).
| Non-targeted gene | Original library (% of total barcoded reads) | xGen-enriched (% of total barcoded reads) |
|---|---|---|
| HPRT1 | 0.02% | 0.01% |
| GAPDH | 0.20% | 0.10% |
Table 5. Comparison of K562 gene detection for non-targeted genes between unenriched original library set and enriched. The percentage for each gene not targeted for enrichment decreased by half after the protocol.
Conclusion
When there is interest in rare events and specific regions are being investigated, target enrichment can be a very powerful method to boost detection. While the process does introduce additional steps and time to the ICELL8 scRNA-seq workflow, the benefits are demonstrated by the results. The IDT xGen Lockdown protocol worked well in this experiment, though it is expected that other enrichment kits would also provide similar benefits. The ICELL8 SMART-Seq application coupled with target enrichment looks to greatly improve detection for important, low-expressor fusions.
Methods
Initial library preparation from single cells
An initial library from K562 cells was generated by following the SMART-Seq ICELL8 cx Application Kit User Manual using the ICELL8 cx system. Libraries from single cells were pooled and purified. The library profile and yield were quantified using Agilent High Sensitivity DNA Reagents (Agilent, Cat. # 5067-4627).
Target capture
Target enrichment was performed using the xGen Lockdown protocol available from IDT, which included custom-designed probes and reagent kits. Probes consisting of 5' biotinylated oligos for genes BCR, ABL1, PAX5, ETV6, EP300, and ZNF384 were designed following recommendations by the manufacturer. Each probe was 120 bases long, and probes were tiled at 1x to cover the fusion junction as well as 1 kb on either side (i.e., both 5' and 3' gene partners). A total of 106 probes were synthesized. Targeted enrichment from the initial library was performed according to the instructions of the xGen Hybridization and Wash Kit (IDT, Cat. # 1080557) and xGen Universal Blockers-NXT Mix (IDT, Cat. # 1079584). The captured library was amplified, purified, and validated using Agilent High Sensitivity DNA Reagents.
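To illustrate what 1x tiling of 120-base probes over a junction plus 1 kb flanks means, the sketch below lays end-to-end probes across a single region. It is a simplified illustration only: the function `tile_probes`, the junction coordinate, and the handling of the final probe are assumptions made for the example, not the manufacturer's design procedure, and it is not intended to reproduce the 106 probes used here.

```python
def tile_probes(region_start, region_end, probe_len=120):
    """Tile [region_start, region_end) end-to-end with fixed-length probes.

    Returns a list of half-open (start, end) intervals. If the region length
    is not a multiple of probe_len, a final probe is placed flush with the
    region end, so it overlaps the previous probe slightly.
    """
    if region_end - region_start < probe_len:
        raise ValueError("region is shorter than one probe")
    probes = []
    start = region_start
    while start + probe_len <= region_end:
        probes.append((start, start + probe_len))
        start += probe_len
    if start < region_end:  # leftover bases not yet covered
        probes.append((region_end - probe_len, region_end))
    return probes


# Hypothetical junction coordinate on a transcript, with 1 kb on either side.
junction = 10_000
flank = 1_000
probes = tile_probes(junction - flank, junction + flank)

print(len(probes), "probes; first:", probes[0], "last:", probes[-1])
```

Under these assumptions a 2 kb window needs 17 probes, with only the last one overlapping its neighbor.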
For this experiment, enrichment contributed additional costs of less than $0.10 per cell before sequencing (at current list price for necessary xGen Lockdown reagents and probes). With 264 fusions identified, this is equivalent to roughly an additional $0.40 per fusion for this particular experiment.
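The per-fusion figure is the per-cell enrichment cost scaled by the number of cells and divided by the number of fusions found. The sketch below shows that arithmetic. The exact per-cell cost is not stated above (only that it is under $0.10), so the $0.07 value is a hypothetical input chosen for illustration, and `cost_per_fusion` is an illustrative name.

```python
def cost_per_fusion(cost_per_cell, n_cells, n_fusions):
    """Added enrichment cost per detected fusion, in the same currency."""
    return cost_per_cell * n_cells / n_fusions


N_CELLS = 1512    # K562 cells isolated
N_FUSIONS = 264   # cells with BCR-ABL1 detected after enrichment

# Upper bound implied by "less than $0.10 per cell".
upper_bound = cost_per_fusion(0.10, N_CELLS, N_FUSIONS)

# Hypothetical per-cell cost, used only to show how a value near $0.40 arises.
illustrative = cost_per_fusion(0.07, N_CELLS, N_FUSIONS)

print(f"upper bound:  ${upper_bound:.2f} per fusion")
print(f"illustrative: ${illustrative:.2f} per fusion")
```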
Illumina sequencing
Quantified unenriched and enriched post-PCR libraries were loaded onto an Illumina NextSeq® 550 Sequencing System using a NextSeq 500/550 High-Output Kit v2.5 (Illumina, Cat. # 20024907) and a NextSeq 500/550 Mid-Output Kit v2.5 (Illumina, Cat. # 20024907), respectively, for sequencing. Libraries were loaded following loading concentrations recommended by Illumina.
Data analysis
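Both unenriched and enriched data were then analyzed using Cogent NGS Analysis Pipeline to map reads. Fusion detection was then performed using the STAR-Fusion pipeline described by Haas et al. (2017). In the first step of that pipeline, reads are aligned with STAR, which reports split and discordant alignments as candidate fusion evidence.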
In the second step, STAR-Fusion interpreted the reads: discordant reads became spanning reads and split reads became junction reads. SMART-Seq full-length chemistry used paired-end reads, so both junction and spanning reads were captured.
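The distinction between junction and spanning reads can be made concrete with a toy classifier. This is a conceptual sketch only, not STAR-Fusion's implementation: the `Mate` type and `classify_pair` function are hypothetical, and each mate is reduced to the ordered list of genes its alignment touches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mate:
    """One read of a pair, reduced to the genes its alignment touches."""
    genes: Tuple[str, ...]


def classify_pair(mate1: Mate, mate2: Mate,
                  left_gene: str, right_gene: str) -> Optional[str]:
    """Label a read pair as fusion evidence for left_gene/right_gene.

    "junction": at least one mate is split across both partner genes,
                so the read itself crosses the breakpoint.
    "spanning": each mate aligns wholly within a different partner gene,
                so the breakpoint lies in the unsequenced insert.
    None:       the pair does not support this fusion.
    """
    partners = {left_gene, right_gene}
    for mate in (mate1, mate2):
        if set(mate.genes) == partners:
            return "junction"
    if (len(mate1.genes) == 1 and len(mate2.genes) == 1
            and {mate1.genes[0], mate2.genes[0]} == partners):
        return "spanning"
    return None


split_pair = (Mate(("BCR", "ABL1")), Mate(("ABL1",)))
discordant_pair = (Mate(("BCR",)), Mate(("ABL1",)))
ordinary_pair = (Mate(("BCR",)), Mate(("BCR",)))

for pair in (split_pair, discordant_pair, ordinary_pair):
    print(classify_pair(*pair, "BCR", "ABL1"))
```

Spanning evidence requires two mates, which is why paired-end sequencing is needed to capture both categories.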
In the final step, Haas et al. applied a STAR-Fusion filter to remove sequence-similar gene pairs and promiscuous fusion partners. While this is a good approach for bulk analysis, for this single-cell experiment, we instead used a relaxed setting that did not apply these filters.
Refer to Figure 1 of Haas et al., 2017 for more details and visualization of this pipeline.
References
Grosveld, G. et al. The chronic myelocytic cell line K562 contains a breakpoint in bcr and produces a chimeric bcr/c-abl transcript. Mol. Cell. Biol. (1986).
Haas, B. et al. STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. bioRxiv (2017).
Lozzio, C. B. & Lozzio, B. B. Human chronic myelogenous leukemia cell line with positive Philadelphia chromosome. Blood (1975).
Mertens, F. et al. The emerging complexity of gene fusions in cancer. Nature Reviews Cancer (2015).
Yu, Y. et al. Identification of recurrent fusion genes across multiple cancer types. Sci. Rep. 9, 1074 (2019).