Reliable CNV calling from single cells

Figure 2. Single-cell CNV analysis using the Embgenix GT-omics Kit and Embgenix Analysis Software. Panel A. CNV plots displaying normalized counts of sequencing reads mapped to 1 Mb bins across each chromosome for six replicates of the GM08331 cell line. A segmental loss on chromosome 13 was identified in all replicates. Panel B. Automated sample classification, karyotype calls, and corresponding QC metrics for each replicate. Number of total reads denotes the total number of sequencing reads submitted for analysis, while Number of informative reads represents the number of sequencing reads that were successfully mapped and used for CNV analysis. DLRS (derivative log ratio spread) quantifies signal noise and serves as a key metric for evaluating whether data are suitable for accurate CNV analysis. QC status indicates whether a sample met predefined thresholds for informative reads (%) and DLRS, ensuring data quality for downstream analysis.
The Embgenix GT-omics Kit was used to generate DNA-seq and RNA-seq libraries from six single-cell replicates derived from the well-characterized lymphoblastoid cell line, GM08331. DNA-seq libraries were sequenced at a depth of 1.5 million paired-end reads per cell and analyzed using Embgenix Analysis Software (Figure 2, Panels A and B). Sequencing data from all replicates met the software's QC thresholds, and the assay accurately determined the cell line's karyotype, detecting a known segmental aneuploidy (a 12.1 Mb loss on chromosome 13) in each replicate. These results demonstrate that single-cell DNA-seq data generated with the Embgenix GT-omics Kit exhibit accuracy and reproducibility comparable to standalone approaches, enabling reliable CNV analysis with minimal background noise and a low likelihood of false positive calls.
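To make the DLRS metric concrete, the sketch below computes it from per-bin read counts using one common formulation: the standard deviation of differences between adjacent log2 ratios, divided by the square root of 2. This is an illustrative assumption, not the Embgenix Analysis Software implementation; the function names, the median normalization, and the 0.5 pseudocount are hypothetical choices, and some tools use a robust IQR-based estimator instead of the standard deviation.

```python
import math
import statistics


def log2_ratios(bin_counts):
    """Normalize per-bin read counts to the sample median and return log2 ratios.

    Median normalization and the 0.5 pseudocount are assumptions for this sketch.
    """
    median = statistics.median(bin_counts)
    # The pseudocount keeps empty bins from producing log2(0).
    return [math.log2((c + 0.5) / (median + 0.5)) for c in bin_counts]


def dlrs(ratios):
    """Standard deviation of adjacent log2-ratio differences, scaled by 1/sqrt(2)."""
    diffs = [b - a for a, b in zip(ratios, ratios[1:])]
    return statistics.stdev(diffs) / math.sqrt(2)


# Hypothetical bin counts: a single copy-number step versus bin-to-bin noise.
step_profile = [100] * 50 + [50] * 50
noisy_profile = [100, 50] * 50

print("step profile DLRS: ", round(dlrs(log2_ratios(step_profile)), 3))
print("noisy profile DLRS:", round(dlrs(log2_ratios(noisy_profile)), 3))
```

Because the metric is built from differences between neighboring bins, a true segmental change contributes only one large difference at its breakpoint, while bin-to-bin noise inflates every difference. That is why a low DLRS indicates a profile in which real copy-number changes can be distinguished from noise.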
RNA-seq libraries were sequenced at a depth of 4.0 × 10⁶ paired-end reads per cell and analyzed using the Cogent NGS Analysis Pipeline (CogentAP). The distribution of reads mapping to exonic, intronic, intergenic, ribosomal RNA, and mitochondrial regions was consistent across replicates. Each replicate yielded more than 11,150 unique detected genes, with an average of 12,195 genes identified across all six replicates (Figure 3). These results highlight the high quality and sensitivity of single-cell transcriptome data generated using the Embgenix GT-omics Kit.
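As a rough illustration of how a genes-detected figure can be derived from a count table, the sketch below counts genes at or above a read-count threshold in each replicate and averages the result. The threshold of one read, the function name, and the toy counts are all assumptions for illustration; the detection criterion used by CogentAP is not stated here.

```python
from statistics import mean


def genes_detected(counts_by_gene, min_count=1):
    """Count genes whose read count meets the (assumed) detection threshold."""
    return sum(1 for count in counts_by_gene.values() if count >= min_count)


# Hypothetical toy data: gene -> read count for each replicate.
replicates = {
    "rep1": {"GENE_A": 12, "GENE_B": 0, "GENE_C": 3},
    "rep2": {"GENE_A": 9, "GENE_B": 1, "GENE_C": 0},
}

per_replicate = {name: genes_detected(counts) for name, counts in replicates.items()}
print(per_replicate)
print("mean genes detected:", mean(per_replicate.values()))
```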
High-quality single-cell transcriptome data

Figure 3. Single-cell transcriptome analysis using the Embgenix GT-omics Kit and Cogent software. Single-cell RNA-seq data for each of six replicates from the GM08331 cell line. The bar charts depict the distributions of sequencing reads mapped to exonic, intronic, intergenic, ribosomal, and mitochondrial regions for each sample. The number of unique genes detected for each replicate, based on mapping of RNA-seq data, is shown at the top.
While the analysis of GM08331 cells (Figures 2 and 3) provided insights into data quality obtained with the Embgenix GT-omics Kit and its suitability for CNV characterization, it did not directly assess the accuracy or reproducibility of the corresponding transcriptome data. To address this, synthetic RNA reference standards from the External RNA Controls Consortium (ERCC) were used. These standards consisted of 92 distinct polyadenylated RNA species with known sequences, each present at defined concentrations in one of two formulations (Mix1 or Mix2). To simulate real-world sample processing conditions, ERCC standards were combined with five-cell samples derived from the GM05067 cell line at two different dilution levels (low and high), generating four distinct RNA concentration conditions. Samples were processed in triplicate using the Embgenix GT-omics Kit, and the resulting RNA-seq libraries were sequenced at a depth of 4 × 10⁶ reads per sample and analyzed using a custom analysis pipeline. Measured RNA quantities were compared to expected values to assess the correlation across the four concentration levels, validating the accuracy and reproducibility of the transcriptomic data obtained.
Accurate transcript quantitation at levels relevant to single-cell analysis

Figure 4. Assessing transcriptomic accuracy with synthetic RNA spike-in standards. Panel A. Pearson correlation matrix illustrating the relationships between measured quantities of 92 synthetic ERCC spike-in RNA species across two formulations (Mix1 vs. Mix2). Each spike-in mix was added to GM05067 cells at two dilution levels (low vs. high) in triplicate or measured directly using the Embgenix GT-omics Kit and a custom analysis pipeline. Panel B. Scatter plot comparing the measured fold changes of 92 ERCC spike-in RNA species (Y-axis) to their expected fold changes (X-axis) at four different concentrations. Each dot represents an individual RNA species, with spiked-in concentrations indicated by the color gradient to the right of the plot. The plot was generated using the Embgenix GT-omics Kit and a custom analysis pipeline.
The resulting profiles for synthetic ERCC transcripts exhibited strong linear correlations (R > 0.99 for intra-mix comparisons and R > 0.93 for comparisons with expected ERCC values), as visualized in a Pearson correlation matrix (Figure 4, Panel A), highlighting the high reproducibility provided by the GT-omics assay. Comparison of measured vs. expected ERCC transcript counts at each of four concentrations (Figure 4, Panel B) demonstrated the assay’s ability to detect transcripts at an abundance of 100 copies or more with a sequencing depth of 4 × 10⁶ reads. While measured fold changes did not precisely match expected values, particularly at lower abundance levels, a clear correlation was observed between expected and measured fold changes. These results highlight the ability of the Embgenix GT-omics Kit to generate reliable data for single-cell differential expression analysis, including quantification of low-abundance transcripts.
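The two comparisons described above, correlation of measured against expected abundances and measured against expected fold changes between mixes, can be sketched as follows. This is a minimal illustration and not the custom analysis pipeline used for Figure 4: the log2 transform, the pseudocount of 1, the function names, and the example values are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def log2_fold_changes(mix1, mix2, pseudocount=1.0):
    """Per-transcript log2(Mix1 / Mix2); the pseudocount guards against zero counts."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(mix1, mix2)]


# Hypothetical values for four spike-in species (not data from this study).
expected_copies = [1.0, 10.0, 100.0, 1000.0]
measured_counts = [0.0, 14.0, 90.0, 1100.0]

r = pearson(
    [math.log2(e + 1.0) for e in expected_copies],
    [math.log2(m + 1.0) for m in measured_counts],
)
print("Pearson R (log2 scale):", round(r, 3))
```

Working on a log scale keeps the few highly abundant species from dominating the correlation, which matters when spike-in concentrations span several orders of magnitude. The pseudocount also compresses fold changes for species with very few reads, which is one reason measured fold changes can deviate from expected values at low abundance.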