NGS library preparation with integrated enzymatic fragmentation: ThruPLEX DNA-Seq FLEX EF
Note: The protocols and QC procedures for ThruPLEX DNA-Seq HV PLUS have been updated to accommodate lower inputs and compatibility with the Unique Dual Index Kit sets. While product naming has been revised accordingly (ThruPLEX DNA-Seq FLEX EF), reagent formulations remain unchanged.
Successful next-generation sequencing (NGS) experiments rely on a streamlined, straightforward workflow that accommodates a wide range of sample inputs and produces consistent results. ThruPLEX DNA-Seq FLEX EF combines the complete, efficient, and accurate workflow of ThruPLEX DNA-Seq FLEX with an enzymatic fragmentation module to produce Illumina®-ready libraries. With the same three-step, one-tube workflow as ThruPLEX DNA-Seq FLEX, ThruPLEX DNA-Seq FLEX EF fragments and repairs in parallel to decrease hands-on time and to remove extraneous steps for separate fragmentation (Figure 1). This industry-leading, single-tube workflow prevents sample loss and eliminates the need for time-consuming post-ligation bead purification.
Figure 1. ThruPLEX DNA-Seq FLEX EF single-tube library preparation workflow. The ThruPLEX DNA-Seq FLEX EF workflow consists of three simple steps that take place in the same well or PCR tube, eliminating the need to purify or transfer the sample material. In this latest version of the ThruPLEX technology, an enzymatic fragmentation step at the start of the protocol streamlines the generation of higher-complexity libraries from double-stranded DNA, up to 200 ng in 30 µl.
Like its predecessor, ThruPLEX DNA-Seq FLEX EF accommodates up to 200 ng of starting input in a volume of up to 30 µl. The resulting libraries can be used directly for whole genome sequencing or enriched with custom panels on leading target-enrichment platforms.
Results
Familiar workflow, consistent performance—improved with fragmentation
The ThruPLEX DNA-Seq FLEX EF library preparation system expands upon the industry-leading single-tube workflow of ThruPLEX DNA-Seq FLEX with a modified template preparation step that enzymatically fragments intact DNA inputs. The enzymatic fragmentation module includes optimized protocols for generating library insert fragments of 300 and 450 bp. Fragment size can be modulated simply by varying the concentration of fragmentation enzyme. Because fragmentation takes place in the first step of the workflow, no separate enzymatic or mechanical fragmentation step is needed.
ThruPLEX DNA-Seq FLEX EF performs comparably to ThruPLEX DNA-Seq FLEX (without enzymatic fragmentation), providing similar coverage uniformity across a range of inputs, including at an input of 5 ng (Figure 2).
Figure 2. Reliable coverage and similar performance. ThruPLEX DNA-Seq FLEX (y-axis) and ThruPLEX DNA-Seq FLEX EF (x-axis) provide robust performance across a broad range of inputs. Correlation plots are shown for sequencing libraries generated with ThruPLEX DNA-Seq FLEX and ThruPLEX DNA-Seq FLEX EF with inputs of 5, 50, and 200 ng of NA12878 and downsampling of resulting sequencing data to 5 million total reads. Coverage of each 100 kb region of hg19 was compared across inputs.
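The comparison in Figure 2 amounts to counting reads in fixed 100 kb windows for each library and correlating the two count vectors. The following is a minimal sketch of that idea, assuming read start positions have already been extracted as (chromosome, position) pairs. The function names, the toy positions, and the use of Pearson's r are illustrative assumptions, not the analysis code used for the figure.

```python
from collections import Counter
from math import sqrt

BIN_SIZE = 100_000  # 100 kb windows, as described for Figure 2


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index) from (chrom, 0-based start) pairs."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_starts)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def coverage_correlation(reads_a, reads_b, bin_size=BIN_SIZE):
    """Correlate per-bin read counts of two libraries.

    Only bins observed in at least one library are compared; a bin that is
    missing from one library counts as zero for that library.
    """
    counts_a = bin_counts(reads_a, bin_size)
    counts_b = bin_counts(reads_b, bin_size)
    bins = sorted(set(counts_a) | set(counts_b))
    return pearson_r([counts_a[b] for b in bins], [counts_b[b] for b in bins])


# Toy positions (hypothetical), standing in for two libraries of the same sample.
flex = [("chr1", 10), ("chr1", 150_000), ("chr1", 160_000), ("chr2", 5)]
flex_ef = [("chr1", 20), ("chr1", 120_000), ("chr1", 170_000), ("chr2", 99_999)]
r = coverage_correlation(flex, flex_ef)
```

In practice the counts would come from aligned BAM files downsampled to the same read depth, so that differences between bins reflect the chemistry rather than sequencing depth.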
Uniform GC coverage
The robustness of a library preparation kit depends on its ability to accurately and uniformly cover an assortment of challenging genomes of varying GC content. ThruPLEX DNA-Seq FLEX, when used with DNA inputs subjected to mechanical shearing, has previously been demonstrated to handle complex genomes easily. Like ThruPLEX DNA-Seq FLEX, ThruPLEX DNA-Seq FLEX EF provides consistent GC coverage across a range of input amounts (Figure 3, Table 1). This uniformity can also be observed in microbial samples of varying GC content (Figure 4, Table 2).
Figure 3. Consistent GC coverage across inputs. Libraries were prepared in triplicate from 5, 50, and 200 ng inputs of NA12878 gDNA. Libraries were generated following the ThruPLEX DNA-Seq FLEX EF (purple, red, and green curves) and ThruPLEX DNA-Seq FLEX (yellow, blue, and black curves) protocols. Paired-end sequencing was performed using an Illumina NextSeq® 500/550 Mid Output Kit v2.5 (150 Cycles), and data were downsampled to 5 million total reads per sample. The vertical blue bars represent the expected GC content distribution using 100-bp windows.
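GC coverage curves of this kind compare how many reads fall in windows of a given GC content with the average over all windows, so that a value near 1.0 at every GC percentage indicates unbiased coverage. The sketch below illustrates the calculation on a toy sequence. It is a simplification under stated assumptions: it uses non-overlapping 100-bp windows and read start positions only, whereas Picard CollectGcBiasMetrics (used in the Methods) has its own windowing scheme, and all names here are hypothetical.

```python
from collections import defaultdict

WINDOW = 100  # bp, matching the 100-bp windows mentioned for Figure 3


def gc_percent(seq):
    """GC content of a sequence, rounded to the nearest whole percent."""
    seq = seq.upper()
    return round(100 * (seq.count("G") + seq.count("C")) / len(seq))


def normalized_gc_coverage(reference, read_starts, window=WINDOW):
    """Map GC% -> (mean reads per window at that GC%) / (mean reads per window overall)."""
    n_windows = len(reference) // window
    reads_per_window = [0] * n_windows
    for pos in read_starts:
        idx = pos // window
        if idx < n_windows:  # ignore reads starting in a trailing partial window
            reads_per_window[idx] += 1
    total = sum(reads_per_window)
    if total == 0:
        raise ValueError("no reads fall inside the reference windows")
    overall_mean = total / n_windows
    by_gc = defaultdict(list)
    for i, n_reads in enumerate(reads_per_window):
        window_seq = reference[i * window:(i + 1) * window]
        by_gc[gc_percent(window_seq)].append(n_reads)
    return {gc: (sum(v) / len(v)) / overall_mean for gc, v in sorted(by_gc.items())}


# Toy reference (hypothetical): one 0% GC window followed by one 100% GC window.
toy_reference = "AT" * 50 + "GC" * 50
toy_coverage = normalized_gc_coverage(toy_reference, read_starts=[0, 10, 150])
```

Here the AT-only window receives two of the three reads, so its normalized coverage is above 1.0 and that of the GC-only window is below 1.0; a flat curve would give 1.0 for both.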
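| Chemistry | Input | Total reads aligned | % reads aligned | % chimera | % duplicate |
|---|---|---|---|---|---|
| ThruPLEX DNA-Seq FLEX | 200 ng | 4.14E+06 | 96.79% | 0.60% | 0.72% |
| ThruPLEX DNA-Seq FLEX | 50 ng | 4.83E+06 | 96.69% | 0.49% | 0.75% |
| ThruPLEX DNA-Seq FLEX | 5 ng | 4.84E+06 | 96.73% | 0.50% | 0.80% |
| ThruPLEX DNA-Seq FLEX EF | 200 ng | 4.73E+06 | 95.10% | 2.59% | 0.82% |
| ThruPLEX DNA-Seq FLEX EF | 50 ng | 4.76E+06 | 96.03% | 1.54% | 0.77% |
| ThruPLEX DNA-Seq FLEX EF | 5 ng | 4.75E+06 | 96.11% | 1.12% | 0.91% |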
Table 1. Comparison of ThruPLEX DNA-Seq FLEX and ThruPLEX DNA-Seq FLEX EF. Processed data from ThruPLEX DNA-Seq FLEX and ThruPLEX DNA-Seq FLEX EF over a range of input DNA from 5 ng to 200 ng. % reads aligned refers to the percentage of reads successfully aligned to the reference genome. % chimera refers to the percentage of reads that align to two distinct portions of the genome. % duplicate refers to the percentage of reads that are copies of the same original DNA fragment, typically generated by PCR during library construction.
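Metrics like these are typically read from the tab-delimited text files that Picard writes, in which a line beginning with `## METRICS CLASS` is followed by a header row and one or more data rows. The sketch below parses that section. The column names `PCT_PF_READS_ALIGNED` and `PCT_CHIMERAS` are Picard alignment summary columns, but which columns were used to build Table 1 is not stated in this note, so treat the mapping as an assumption; the sample text and its values are made up for illustration.

```python
def parse_picard_metrics(text):
    """Return the rows of the first METRICS CLASS section as a list of dicts."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip():  # a blank line ends the metrics section
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no METRICS CLASS section found")


# Made-up example content, not data from this note.
SAMPLE_METRICS = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectAlignmentSummaryMetrics (command line omitted)",
    "",
    "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics",
    "CATEGORY\tPF_READS_ALIGNED\tPCT_PF_READS_ALIGNED\tPCT_CHIMERAS",
    "FIRST_OF_PAIR\t480\t0.96\t0.01",
    "SECOND_OF_PAIR\t480\t0.96\t0.01",
    "PAIR\t960\t0.96\t0.01",
    "",
])

rows = parse_picard_metrics(SAMPLE_METRICS)
pair_row = next(r for r in rows if r["CATEGORY"] == "PAIR")
pct_aligned = 100 * float(pair_row["PCT_PF_READS_ALIGNED"])
pct_chimera = 100 * float(pair_row["PCT_CHIMERAS"])
```

Picard reports these values as fractions, so they are multiplied by 100 here to express them as percentages.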
Figure 4. Uniform coverage in libraries with extreme base content. Panels A–C. Libraries were amplified in triplicate with ThruPLEX DNA-Seq FLEX (blue and purple curves) or ThruPLEX DNA-Seq FLEX EF (green and yellow curves) chemistry using 50-ng and 5-ng inputs of (Panel A) Haemophilus influenzae 51907D-5, (Panel B) Escherichia coli 11303, or (Panel C) Rhodopseudomonas palustris BAA-98D-5 (ATCC). After purification with AMPure beads, paired-end sequencing was performed on a NextSeq 150 Cycle Mid Output (2 x 75 bp). Normalized coverage is represented by the colored lines and the expected number of 100-bp regions at each %GC is represented by the vertical bars. The H. influenzae genome is 1.83 Mb in size, with 38% GC content. The E. coli genome is 4.7 Mb in size, with 51% GC content. The R. palustris genome is 5.46 Mb in size, with 65% GC content. ThruPLEX DNA-Seq FLEX and ThruPLEX DNA-Seq FLEX EF demonstrate similar performance when preparing libraries using microbial sample input with a variety of GC contents.
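| Chemistry | Starting input | Genome | Genome size | % GC | Total reads aligned | % reads aligned | Mean coverage | % chimera | % duplication |
|---|---|---|---|---|---|---|---|---|---|
| ThruPLEX DNA-Seq FLEX EF | 50 ng | H. influenzae | 1.83 Mb | 38% | 3.52E+06 | 94.5% | 114 | 2.93% | 3.72% |
| ThruPLEX DNA-Seq FLEX EF | 50 ng | E. coli | 4.7 Mb | 51% | 3.56E+06 | 93.7% | 44 | 4.58% | 2.47% |
| ThruPLEX DNA-Seq FLEX EF | 50 ng | R. palustris | 5.46 Mb | 65% | 3.46E+06 | 94.6% | 33 | 4.83% | 3.05% |
| ThruPLEX DNA-Seq FLEX | 50 ng | H. influenzae | 1.83 Mb | 38% | 3.77E+06 | 96.2% | 115 | 2.47% | 1.13% |
| ThruPLEX DNA-Seq FLEX | 50 ng | E. coli | 4.7 Mb | 51% | 3.74E+06 | 95.2% | 44 | 2.76% | 0.88% |
| ThruPLEX DNA-Seq FLEX | 50 ng | R. palustris | 5.46 Mb | 65% | 3.43E+06 | 96.8% | 37 | 2.35% | 1.14% |
| ThruPLEX DNA-Seq FLEX EF | 5 ng | H. influenzae | 1.83 Mb | 38% | 3.63E+06 | 94.4% | 110 | 3.08% | 3.19% |
| ThruPLEX DNA-Seq FLEX EF | 5 ng | E. coli | 4.7 Mb | 51% | 3.62E+06 | 93.1% | 42 | 4.30% | 1.88% |
| ThruPLEX DNA-Seq FLEX EF | 5 ng | R. palustris | 5.46 Mb | 65% | 3.54E+06 | 94.2% | 35 | 4.35% | 2.30% |
| ThruPLEX DNA-Seq FLEX | 5 ng | H. influenzae | 1.83 Mb | 38% | 3.68E+06 | 95.4% | 111 | 2.88% | 1.26% |
| ThruPLEX DNA-Seq FLEX | 5 ng | E. coli | 4.7 Mb | 51% | 3.72E+06 | 95.0% | 44 | 2.36% | 0.90% |
| ThruPLEX DNA-Seq FLEX | 5 ng | R. palustris | 5.46 Mb | 65% | 3.66E+06 | 96.5% | 38 | 2.47% | 1.05% |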
Table 2. Bacterial genomic data comparing ThruPLEX DNA-Seq FLEX and ThruPLEX DNA-Seq FLEX EF. Processed data for two sample DNA inputs, 5 ng and 50 ng, of H. influenzae, E. coli, and R. palustris, with the corresponding GC content of each genome. % reads aligned refers to the percentage of reads successfully aligned to the reference genome. Mean coverage is the average number of unique reads covering a given nucleotide position in the reconstructed sequence. % chimera refers to the percentage of reads that align to two distinct portions of the genome. % duplication refers to the percentage of reads that are copies of the same original DNA fragment, typically generated by PCR during library construction.
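As a sanity check on the mean coverage column, the maximum possible mean coverage can be estimated as aligned reads × read length ÷ genome size. The sketch below applies this to the 50 ng ThruPLEX DNA-Seq FLEX EF rows, assuming 75-bp reads (from the 2 x 75 bp run described for Figure 4). The function name is hypothetical, and the estimate is only an upper bound: reported values are expected to be lower because mean coverage counts unique reads and because trimming and metric-level filters remove bases.

```python
def max_mean_coverage(aligned_reads, read_length_bp, genome_size_bp):
    """Upper-bound mean coverage: total aligned bases divided by genome size."""
    return aligned_reads * read_length_bp / genome_size_bp


READ_LENGTH_BP = 75  # assumed from the 2 x 75 bp run noted for Figure 4

# (genome, total reads aligned, genome size in bp, reported mean coverage) from Table 2
flex_ef_50ng = [
    ("H. influenzae", 3.52e6, 1.83e6, 114),
    ("E. coli", 3.56e6, 4.70e6, 44),
    ("R. palustris", 3.46e6, 5.46e6, 33),
]

estimates = {
    genome: max_mean_coverage(reads, READ_LENGTH_BP, size)
    for genome, reads, size, _reported in flex_ef_50ng
}
```

The estimates come out at roughly 144x, 57x, and 48x, compared with reported values of 114x, 44x, and 33x, so the reported coverage sits below the upper bound in each case, as expected.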
Conclusion
ThruPLEX FLEX chemistry is engineered and optimized to generate DNA libraries with high molecular complexity and balanced GC representation from input volumes of up to 30 µl. Through workflow optimization, reformulation, and the addition of an enzymatic fragmentation module, ThruPLEX DNA-Seq FLEX EF performs size-tunable enzymatic fragmentation and template repair in parallel. The entire workflow is performed in a single tube in about two and a half hours and requires only 15 min of hands-on time.
ThruPLEX DNA-Seq FLEX EF demonstrates consistent performance comparable to that of ThruPLEX DNA-Seq FLEX, which is commonly used with mechanically sheared DNA inputs. Both chemistries provide unbiased GC coverage across input ranges and for microbial samples with varying GC content. The integration of an enzymatic fragmentation module into the ThruPLEX DNA-Seq FLEX workflow provides a more versatile and user-friendly solution for your NGS library preparation needs.
Methods
DNA preparation
Human genomic DNA (NA12878) and bacterial genomic DNA from Haemophilus influenzae 51907D-5, Escherichia coli 11303, and Rhodopseudomonas palustris BAA-98D-5 (ATCC) were either left intact (for ThruPLEX DNA-Seq FLEX EF kits) or mechanically sheared on a Covaris M220 following the 250- or 200-bp shearing protocol (for ThruPLEX DNA-Seq FLEX kits). The size of sheared DNA was evaluated on an Agilent 2100 Bioanalyzer using High Sensitivity DNA Reagents. Concentrations of intact and sheared samples were measured using a Qubit 2.0 Fluorometer with the dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific).
Library preparation
Libraries were prepared according to the ThruPLEX DNA-Seq FLEX User Manual or the ThruPLEX DNA-Seq FLEX EF User Manual. Amplified libraries were purified using AMPure XP (Beckman Coulter) and eluted in low-TE buffer for whole genome sequencing (WGS). The size of purified libraries was assessed on an Agilent 2100 Bioanalyzer using High Sensitivity DNA Reagents. Libraries were quantified on a Qubit 2.0 Fluorometer with the Quant-iT dsDNA Assay Kit, high sensitivity (Thermo Fisher Scientific).
Illumina sequencing
Quantified post-PCR libraries were pooled and loaded onto an Illumina NextSeq® 500/550 Mid Output Kit v2.5 (150 Cycles) flow cell for sequencing. Libraries were loaded following Illumina’s recommended loading concentrations.
Data analysis
Raw sequencing reads were downsampled to equal numbers across all samples using seqtk (v1.3-r106) and quality processed to remove adapters and low-quality bases using Trimmomatic (v0.36). Quality-processed reads were aligned to the UCSC hg19 reference genome with Bowtie 2 (v2.3.4.3) using default parameters. The resulting SAM files were sorted by coordinate using Picard SortSam (v2.18.3) and converted to BAM files with samtools view (v1.8). Duplicate reads were identified and marked in the sorted BAM files with Picard MarkDuplicates (v2.18.3). The marked BAM files were then used as input to collect alignment, insert size, GC bias, and WGS metrics with Picard CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, CollectGcBiasMetrics, and CollectWgsMetrics (all v2.18.3), respectively.
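The steps above can be sketched as an ordered list of shell commands for one paired-end sample. This is an illustrative outline, not the exact commands used for this note: file names, jar locations, the seqtk seed, the number of reads kept per file, and the Trimmomatic trimming settings (taken from that tool's documented paired-end example) are all assumptions, since the note does not state them. The function only builds the command strings and does not run anything.

```python
def build_commands(sample, r1, r2, bowtie2_index, reference_fasta,
                   reads_per_file=2_500_000, seed=100,
                   picard_jar="picard.jar",
                   trimmomatic_jar="trimmomatic-0.36.jar",
                   adapters="adapters.fa"):
    """Return shell commands for one paired-end sample, in run order.

    All defaults are placeholders (assumptions); paths are assumed to
    contain no spaces because no shell quoting is applied.
    """
    picard = f"java -jar {picard_jar}"
    sub1, sub2 = f"{sample}.sub_R1.fq", f"{sample}.sub_R2.fq"
    trim1, trim2 = f"{sample}.trim_R1.fq", f"{sample}.trim_R2.fq"
    sam = f"{sample}.sam"
    sorted_sam = f"{sample}.sorted.sam"
    sorted_bam = f"{sample}.sorted.bam"
    marked_bam = f"{sample}.marked.bam"
    return [
        # Downsample; using the same seed for both mates keeps pairs in sync.
        f"seqtk sample -s{seed} {r1} {reads_per_file} > {sub1}",
        f"seqtk sample -s{seed} {r2} {reads_per_file} > {sub2}",
        # Adapter and quality trimming (settings are assumed, see lead-in).
        f"java -jar {trimmomatic_jar} PE {sub1} {sub2} "
        f"{trim1} {sample}.unpaired_R1.fq {trim2} {sample}.unpaired_R2.fq "
        f"ILLUMINACLIP:{adapters}:2:30:10 LEADING:3 TRAILING:3 "
        "SLIDINGWINDOW:4:15 MINLEN:36",
        # Alignment with default Bowtie 2 parameters.
        f"bowtie2 -x {bowtie2_index} -1 {trim1} -2 {trim2} -S {sam}",
        # Coordinate sort, then SAM-to-BAM conversion.
        f"{picard} SortSam I={sam} O={sorted_sam} SORT_ORDER=coordinate",
        f"samtools view -b -o {sorted_bam} {sorted_sam}",
        # Mark duplicates, then collect the four metric sets.
        f"{picard} MarkDuplicates I={sorted_bam} O={marked_bam} "
        f"M={sample}.dup_metrics.txt",
        f"{picard} CollectAlignmentSummaryMetrics R={reference_fasta} "
        f"I={marked_bam} O={sample}.alignment_metrics.txt",
        f"{picard} CollectInsertSizeMetrics I={marked_bam} "
        f"O={sample}.insert_metrics.txt H={sample}.insert_hist.pdf",
        f"{picard} CollectGcBiasMetrics I={marked_bam} "
        f"O={sample}.gc_metrics.txt CHART={sample}.gc_bias.pdf "
        f"S={sample}.gc_summary.txt R={reference_fasta}",
        f"{picard} CollectWgsMetrics I={marked_bam} "
        f"O={sample}.wgs_metrics.txt R={reference_fasta}",
    ]


# Hypothetical sample and file names, for illustration only.
commands = build_commands("NA12878_5ng", "NA12878_5ng_R1.fq", "NA12878_5ng_R2.fq",
                          bowtie2_index="hg19", reference_fasta="hg19.fa")
```

Printing `commands` one per line gives a script that can be reviewed before anything is executed, which makes it easy to adapt the placeholder settings to a given run.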