The presence of circulating DNA has long been observed in human blood and is primarily attributed to apoptosis (Williamson 1970). Circulating tumor DNA (ctDNA) isolated from the plasma of cancer patients has been the subject of many research studies (Murtaza et al. 2013; Klevebring et al. 2014).
Many of these experiments have been made possible with the advent of next-generation sequencing (NGS) platforms, which allow for identification of copy number and single nucleotide variants. Since the amount of DNA present in plasma is generally low, highly sensitive methods for preparing NGS libraries are required. ThruPLEX technology provides an excellent choice due to its low-input requirements, using as little as 1 ng of cfDNA, as well as its single-tube workflow, which helps prevent sample loss and cross-contamination and ensures positive sample identification.
Instead of sequencing the entire genome, many researchers direct their attention to the exome, or a targeted subset of it, to focus on the coding regions and reduce costs. This approach consists of enriching the exonic regions of interest to identify various types of genetic alterations. Here we integrate the ThruPLEX DNA-seq kit with the downstream enrichment tools from Agilent Technologies, namely the SureSelectXT, SureSelectXT2, and SureSelectQXT Target Enrichment Systems.
The ThruPLEX DNA-seq kit is an essential addition to the SureSelect platforms because the default SureSelect library preparation kits are limited to input amounts of 200 ng (SureSelectXT), 100 ng (SureSelectXT2), and 50 ng (SureSelectQXT), while cfDNA is frequently present at only 1 ng to 30 ng per 1 ml of plasma. The data presented here demonstrate a method that allows investigators to obtain high-quality exome data from limited amounts of starting material with minimal protocol adjustment.
Results
The relatively low level of cfDNA in plasma samples presents a major challenge to the detection of genomic variations using next-generation sequencing. We carried out whole-exome enrichment of cfDNA from plasma samples by integrating ThruPLEX DNA-seq kit with Agilent SureSelectXT, XT2, and QXT Target Enrichment Systems (Figure 1). As a reference, sheared gDNA was also enriched and sequenced following library preparation with the SureSelectXT, XT2, and QXT Library Prep Kits. The key sequencing metrics are summarized in Table I.
Results from whole-exome enrichment of the ThruPLEX DNA-seq cfDNA libraries with each of the SureSelect platforms were comparable to those of the ThruPLEX DNA-seq gDNA libraries in terms of key sequencing metrics (Table I) and on-target specificity (Figure 2). These data demonstrate the exceptional repair capacity of ThruPLEX technology. When comparing the number of enriched (on-bait plus near-bait) bases, ThruPLEX DNA-seq cfDNA libraries were 99% (QXT), 88% (XT2), and 79% (XT) efficient relative to the gDNA libraries (Figure 2). This loss of information can be attributed to factors such as shorter fragment length and lower complexity of cfDNA from plasma samples.
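| Enrichment platform | Library preparation | Input type | Input amount | Unique reads | Fold enrichment | Library size | % duplication |
|---|---|---|---|---|---|---|---|
| SureSelectXT | ThruPLEX DNA-seq | cfDNA | 500 pg | 1,396,048 | 31.6 | 3.65 × 10^6 | 4.67 |
| SureSelectXT | ThruPLEX DNA-seq | cfDNA | 2 ng | 1,444,625 | 34.0 | 1.51 × 10^7 | 1.32 |
| SureSelectXT | ThruPLEX DNA-seq | cfDNA | 10 ng | 1,458,560 | 31.5 | 4.57 × 10^7 | 0.40 |
| SureSelectXT | ThruPLEX DNA-seq | gDNA | 10 ng | 1,450,470 | 37.9 | 3.50 × 10^7 | 0.78 |
| SureSelectXT | SureSelectXT | gDNA | 200 ng | 1,456,623 | 34.5 | 2.11 × 10^8 | 0.12 |
| SureSelectXT2 | ThruPLEX DNA-seq | cfDNA | 500 pg | 619,266 | 32.1 | 7.36 × 10^6 | 1.47 |
| SureSelectXT2 | ThruPLEX DNA-seq | cfDNA | 2 ng | 621,686 | 32.9 | 1.22 × 10^7 | 1.04 |
| SureSelectXT2 | ThruPLEX DNA-seq | cfDNA | 10 ng | 624,013 | 32.0 | 3.77 × 10^7 | 0.31 |
| SureSelectXT2 | ThruPLEX DNA-seq | gDNA | 10 ng | 622,495 | 34.0 | 2.64 × 10^7 | 0.38 |
| SureSelectXT2 | SureSelectXT2 | gDNA | 100 ng | 625,674 | 29.5 | 8.05 × 10^7 | 0.55 |
| SureSelectQXT | ThruPLEX DNA-seq | cfDNA | 500 pg | 1,052,819 | 44.9 | 2.62 × 10^6 | 8.53 |
| SureSelectQXT | ThruPLEX DNA-seq | cfDNA | 2 ng | 1,122,128 | 45.3 | 9.12 × 10^6 | 3.30 |
| SureSelectQXT | ThruPLEX DNA-seq | cfDNA | 10 ng | 1,142,897 | 45.3 | 3.36 × 10^7 | 0.82 |
| SureSelectQXT | ThruPLEX DNA-seq | gDNA | 10 ng | 1,139,272 | 43.1 | 2.83 × 10^7 | 1.06 |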
Table I. High-quality exome-enriched libraries. Summary of sequencing metrics from whole exome sequencing of cfDNA and gDNA libraries prepared using the ThruPLEX DNA-seq kit or SureSelect Library Prep Kits and enriched with SureSelectXT, XT2, and QXT target enrichment systems.
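The "Library size" and "% duplication" columns are the kind of values reported by Picard's duplicate-marking metrics. As an illustration only, the sketch below shows one common way such a library size can be estimated: solving the Lander-Waterman relation C/X = 1 - exp(-N/X) for the number of distinct molecules X, given N read pairs examined and C unique read pairs. The function names and the example numbers are hypothetical, and because the read-pair counts behind Table I are not given here, the sketch is not expected to reproduce the table values.

```python
import math


def duplication_fraction(read_pairs: int, unique_read_pairs: int) -> float:
    """Fraction of examined read pairs that are duplicates."""
    return 1.0 - unique_read_pairs / read_pairs


def estimate_library_size(read_pairs: int, unique_read_pairs: int) -> float:
    """Estimate the number of distinct molecules X in a library.

    Solves C/X = 1 - exp(-N/X) for X by bisection, where
    N = read pairs examined and C = unique read pairs observed.
    """
    n, c = read_pairs, unique_read_pairs
    if not 0 < c < n:
        raise ValueError("need 0 < unique_read_pairs < read_pairs")

    def f(x: float) -> float:
        # Positive when x is below the solution, negative when above it.
        return c / x - 1.0 + math.exp(-n / x)

    lo, hi = float(c), float(c) * 100.0
    while f(hi) > 0:  # widen the bracket until it contains the root
        hi *= 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical round trip: simulate the unique-pair count expected from a
    # library of known size, then recover that size from the two counts.
    true_size = 3.65e6
    pairs = 1_000_000
    unique = round(true_size * (1.0 - math.exp(-pairs / true_size)))
    print(duplication_fraction(pairs, unique))
    print(estimate_library_size(pairs, unique))
```

The estimate is most sensitive when few duplicates are observed, which is one reason duplication rate and library size are best read together rather than in isolation.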
In general, the SureSelectXT2 platform showed similar performance compared to SureSelectXT. In the SureSelectXT2 workflow, samples are pooled prior to hybridization, which confers ease of use and cost advantages. ThruPLEX DNA-seq kit can be integrated very conveniently with the SureSelectXT2 platform, requiring only minor adjustments to the protocol and additional universal blocking oligos from IDT (Table II). The SureSelectQXT platform provided higher mean target coverage and required the shortest hybridization time. Enrichment with SureSelectQXT also appeared to be more efficient despite variable input amounts. However, SureSelectQXT resulted in much higher AT-dropout rates (Figure 3), which may be the consequence of the temperature cycling during hybridization used in its protocol.
| Reagent kit | Additional reagents required | Omitted reagents |
|---|---|---|
| SureSelectXT Reagent Kit | Illumina P5 Primer; Illumina P7 Primer; xGen Universal Blocking Oligo i5; xGen Universal Blocking Oligo i7 | SureSelect TE Kit Indexing Hyb Module Box #2; SureSelect ILM Indexing Pre Capture PCR Reverse Primer; SureSelect ILM Indexing Post Capture Forward PCR Primer; SureSelect Library Prep Kit |
| SureSelectXT2 Reagent Kit | xGen Universal Blocking Oligo i5; xGen Universal Blocking Oligo i7 | XT2 Pre-capture Indexes; XT2 Library Prep Kit, except SureSelect Herculase II Master Mix; XT2 Primer Mix |
| SureSelectQXT Reagent Kit | Illumina P5 Primer; Illumina P7 Primer; xGen Universal Blocking Oligo i5; xGen Universal Blocking Oligo i7 | QXT Library Prep Kit, Box 2, except Herculase II Fusion DNA Polymerase; 5X Herculase II Reaction Buffer; 100 mM dNTP Mix (25 mM each dNTP); QXT TE Kit, Hyb Module, Box #1; SureSelect QXT Stop Solution; QXT TE Kit, Hyb Module, Box #2; QXT Primer Mix |
Table II. SureSelect compatibility. List of reagents used when integrating ThruPLEX DNA-seq kit with SureSelect Target Enrichment Systems.
Deep sequencing data were also generated using an Illumina NextSeq® 500. The ThruPLEX DNA-seq gDNA/SureSelectXT library at 10 ng input required less than 1 Gb more sequencing data than the SureSelectXT gDNA library at 200 ng input to yield 20X coverage of at least 80% of the exome (Figure 4). As expected, more sequencing data are required for libraries made from 10 ng of cfDNA, likely because of the decreased diversity that comes with reduced input and the lower capture efficiency of plasma cfDNA samples. For the ThruPLEX DNA-seq/SureSelectXT library, 100 M total 75-base reads per sample were adequate for SNV calling of at least 85% of the exome (Figure 5). From a NextSeq 500 high output run (2 x 75 bp), up to 8 cfDNA libraries prepared from 10 ng input could be sequenced to achieve at least 80% coverage of the exome.
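The per-run capacity quoted above follows from simple arithmetic, sketched here in Python. The run output of 800 million paired-end reads is an assumption (a nominal figure for a NextSeq 500 high output run, not a value stated in this note); the 100 M reads per sample and the 75-base read length come from the text. The function names are illustrative.

```python
def samples_per_run(run_reads: int, reads_per_sample: int) -> int:
    """Whole number of samples that fit in one run at a given read budget."""
    return run_reads // reads_per_sample


def gigabases(reads: int, read_length: int) -> float:
    """Sequencing yield in gigabases for a number of reads of fixed length."""
    return reads * read_length / 1e9


# ASSUMPTION: nominal paired-end read count for a NextSeq 500 high output run.
RUN_READS = 800_000_000
# From the text: 100 M total 75-base reads per sample.
READS_PER_SAMPLE = 100_000_000
READ_LENGTH = 75

if __name__ == "__main__":
    print(samples_per_run(RUN_READS, READS_PER_SAMPLE))  # 8
    print(gigabases(READS_PER_SAMPLE, READ_LENGTH))      # 7.5
```

Under that assumption, a budget of 100 M reads (7.5 Gb) per sample corresponds to the 8 libraries per run reported above; actual run yields vary, so the figure should be treated as an upper bound.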
Conclusions
By integrating the ThruPLEX DNA-seq kit with Agilent SureSelect Target Enrichment Systems, we were able to exploit ThruPLEX technology's high sensitivity to perform library preparation and whole exome enrichment using the low amounts of cfDNA present in plasma samples. The amount of data generated is adequate for SNV calling. Compatibility of ThruPLEX DNA-seq kit with SureSelect platforms can be easily attained with minor adjustments to the SureSelect protocols and with the addition of universal blocking oligos and sequencing primers. The SureSelectXT2 platform, in which samples are pooled prior to hybridization, is the simplest to integrate. In addition to its higher sensitivity and excellent performance, ThruPLEX DNA-seq kit offers a faster and simpler workflow with a single-tube, three-step protocol. An integrated enrichment method combining ThruPLEX and SureSelect technologies will be instrumental in translational genomic research where the DNA of interest is present in limiting quantities.
Methods
DNA isolation
Plasma samples were acquired from Medical Research Networx, LLC. Blood was collected from healthy donors into BD Vacutainer EDTA tubes, and plasma was separated by double centrifugation at 4°C for 12 minutes at 1,500 x g. Processed plasma samples were stored at –80°C until DNA was extracted. The Qiagen QIAamp Circulating Nucleic Acid Kit was used to extract DNA from 5 ml of plasma. DNA quantity and size distribution were measured using a Qubit (Thermo Fisher Scientific) and a Bioanalyzer (Agilent), respectively.
Library preparation
Libraries were prepared from either cfDNA isolated from plasma samples or Covaris-sheared (average size 200 bp) NA12878 genomic DNA (gDNA) using ThruPLEX DNA-seq kit with dual indexes at different input amounts (Figure 1). The quality of prepared libraries was verified on Qubit and Bioanalyzer (Figure 1). All cfDNA libraries enriched on the same SureSelect platform were prepared from the same plasma sample. As a reference, libraries were also prepared with SureSelect Library Prep Kits using the lowest input amounts recommended by the manufacturer.
Whole exome enrichment and sequencing
Amplified libraries were purified using Agencourt AMPure XP (Beckman Coulter, Cat. # A63880) and eluted in 20–50 μl of PCR grade water. Prior to enrichment, purified libraries were individually assessed using a Qubit and a Bioanalyzer. For enrichment using the SureSelectXT2 platform, purified libraries were pooled to obtain 1.5 μg of indexed DNA. For SureSelectXT and QXT platforms, the entire volume of each ThruPLEX DNA-seq library was used for exome enrichment (Table III).
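| Whole-exome enrichment reagent kit and capture library | Sample | Library preparation kit | Library prep input (ng) | PCR cycles | Yield (ng) | Capture input (ng) |
|---|---|---|---|---|---|---|
| SureSelectXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 0.5 | 14 | 572 | 572 |
| SureSelectXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 2 | 11 | 388 | 388 |
| SureSelectXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 10 | 9 | 577 | 577 |
| SureSelectXT Reagent Kit, SureSelectXT Human All Exon V5 | gDNA | ThruPLEX DNA-seq kit | 10 | 7 | 449 | 449 |
| SureSelectXT Reagent Kit, SureSelectXT Human All Exon V5 | gDNA | SureSelectXT Library Prep Kit | 200 | 10 | 2,430 | 750 |
| SureSelectXT2 Reagent Kit, SureSelectXT2 Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 0.5 | 14 | 610 | 610 |
| SureSelectXT2 Reagent Kit, SureSelectXT2 Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 2 | 11 | 343 | 343 |
| SureSelectXT2 Reagent Kit, SureSelectXT2 Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 10 | 9 | 430 | 430 |
| SureSelectXT2 Reagent Kit, SureSelectXT2 Human All Exon V5 | gDNA | ThruPLEX DNA-seq kit | 10 | 7 | 268 | 268 |
| SureSelectXT2 Reagent Kit, SureSelectXT2 Human All Exon V5 | gDNA | SureSelectXT2 Library Prep Kit | 100 | 8 | 1,366 | 375 |
| SureSelectQXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 0.5 | 14 | 837 | 837 |
| SureSelectQXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 2 | 11 | 396 | 396 |
| SureSelectQXT Reagent Kit, SureSelectXT Human All Exon V5 | cfDNA | ThruPLEX DNA-seq kit | 10 | 9 | 496 | 496 |
| SureSelectQXT Reagent Kit, SureSelectXT Human All Exon V5 | gDNA | ThruPLEX DNA-seq kit | 10 | 7 | 365 | 365 |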
Table III. Experimental design. For each SureSelect Target Enrichment System, libraries were prepared from cfDNA or gDNA using ThruPLEX DNA-seq kit or the corresponding SureSelect Library Prep Kit. Whole exome enrichment was carried out using the SureSelect Reagent Kits and Human All Exon V5 probe sets.
Exome enrichment was performed using the SureSelect Reagent Kits and SureSelect Human All Exon V5 probe sets (Table III). To integrate the ThruPLEX DNA-seq kit with the SureSelect platforms, reagent use was modified (Table II; see Results section). For all three platforms, IDT xGen Universal Blocking Oligos (TS HT-i7 and TS HT-i5) were spiked into the blocking mixture containing ThruPLEX DNA-seq libraries prior to hybridization with the probes. The xGen Universal Blocking Oligos were each resuspended in nuclease-free water prior to use so that 1 μl (1 nmol) was added per reaction. For the SureSelectXT and QXT platforms, Illumina P5 and P7 primers were used for post-capture amplification of the ThruPLEX DNA-seq libraries. All samples were subjected to 10 cycles of post-capture amplification to produce the final sequencing libraries.
Sequencing
Pooled samples were quantified using KAPA Library Quantification Kit and loaded onto Illumina MiSeq® v3 flow cells. Reactions were carried out as 2 x 75 bp paired-end runs, and approximately 0.6–1.5 M reads per sample were generated. Selected samples were also sequenced on an Illumina NextSeq 500 as a 2 x 75 bp high output paired-end run.
Data analysis
Sequence reads were analyzed using DNAnexus. Reads were mapped to the human genome reference, hg19, using the Burrows-Wheeler Aligner algorithm BWA-MEM (Li and Durbin 2009) to generate BAM files for each sample. BAM files were downsampled to obtain equal numbers of reads, and duplicates were marked using Picard MarkDuplicates (Broad Institute 2017). Output files from Picard MarkDuplicates were used to determine quality metrics related to the whole exome capture and sequencing using Picard CalculateHsMetrics.
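For readers who want to approximate this workflow outside DNAnexus, the following Python sketch assembles the equivalent command lines without running them. It is an illustration under several assumptions, not the configuration used in this study: the file names are hypothetical, a `picard` wrapper script and `samtools` are assumed to be on the PATH, the legacy `KEY=value` Picard argument style is used, and the HS metrics tool defaults to `CollectHsMetrics`, the name under which newer Picard releases provide the tool cited above as CalculateHsMetrics. Bait and target files are assumed to be in Picard interval_list format.

```python
import shlex
from typing import List


def downsample_probability(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so each sample ends up near target_reads."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    return min(1.0, target_reads / total_reads)


def build_commands(sample: str, reference: str, fastq1: str, fastq2: str,
                   bait_intervals: str, target_intervals: str,
                   keep_probability: float,
                   hs_tool: str = "CollectHsMetrics") -> List[str]:
    """Return shell command strings for align, downsample, dedup, HS metrics."""
    q = shlex.quote
    # Hypothetical output names derived from the sample name.
    sorted_bam = f"{sample}.sorted.bam"
    down_bam = f"{sample}.downsampled.bam"
    dedup_bam = f"{sample}.markdup.bam"
    dup_metrics = f"{sample}.dup_metrics.txt"
    hs_metrics = f"{sample}.hs_metrics.txt"
    return [
        # Map paired reads with BWA-MEM and coordinate-sort the alignments.
        f"bwa mem {q(reference)} {q(fastq1)} {q(fastq2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # Randomly keep a fraction of reads to equalize depth across samples.
        f"picard DownsampleSam I={q(sorted_bam)} O={q(down_bam)}"
        f" P={keep_probability:.4f}",
        # Flag duplicate reads and write duplication metrics.
        f"picard MarkDuplicates I={q(down_bam)} O={q(dedup_bam)}"
        f" M={q(dup_metrics)}",
        # Compute hybrid-selection (capture) metrics against baits and targets.
        f"picard {hs_tool} I={q(dedup_bam)} O={q(hs_metrics)}"
        f" R={q(reference)} BAIT_INTERVALS={q(bait_intervals)}"
        f" TARGET_INTERVALS={q(target_intervals)}",
    ]


if __name__ == "__main__":
    p = downsample_probability(total_reads=2_000_000, target_reads=1_000_000)
    for cmd in build_commands("sample1", "hg19.fa", "s1_R1.fastq.gz",
                              "s1_R2.fastq.gz", "baits.interval_list",
                              "targets.interval_list", p):
        print(cmd)
```

Building the commands as strings keeps the sketch inspectable; in practice each step would be run and checked before the next, and the downsampling target would be set to the read count of the shallowest sample being compared.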
References
Broad Institute. Picard Tools - A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. http://broadinstitute.github.io/picard/ (2017).
Klevebring, D. et al. Evaluation of Exome Sequencing to Estimate Tumor Burden in Plasma. PLoS One 9, e104417 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Murtaza, M. et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature 497, 108–112 (2013).
Williamson, R. Properties of rapidly labelled deoxyribonucleic acid fragments isolated from the cytoplasm of primary cultures of embryonic mouse liver cells. J. Mol. Biol. 51, 157–168 (1970).