Using Unique Molecular Identifiers (UMIs) in NGS experiments

Date: September 17, 2018

Author: Takara Bio Blog Team

Research in next-generation sequencing (NGS) is rapidly evolving, and the ability to confidently detect low-frequency alleles or to distinguish individual molecules is now critical to the development of highly sensitive NGS-based assays. Unique Molecular Identifiers (UMIs) are useful in many such situations and provide two major benefits: error correction and molecular de-duplication.

In NGS, errors are introduced during PCR, sequencing, and base calling. UMIs make it possible to identify and remove these errors, which reduces background noise and the false-positive rate so that you can detect rare alleles with high sensitivity and specificity. In addition, because UMIs label each DNA fragment before amplification, bioinformatics tools can remove PCR duplicates, distinguish them from molecular duplicates (independent starting fragments that happen to share the same sequence), and consequently obtain a valid count of the starting molecules. This provides accurate coverage information for the genomic regions of interest. The following mini-FAQ will help you understand what UMIs are, how they work, and how they can be applied to your research.

What are UMIs?

UMIs are unique sequences used to tag individual DNA fragments prior to amplification, so that each fragment can be tracked through library preparation, target enrichment, and data analysis. In each ThruPLEX Tag-seq kit, over 16 million UMIs are provided to ensure unique labeling of every fragment.
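
To put a figure like 16 million in context: a tag built from n fully degenerate positions (A, C, G, or T at each) has 4^n possible sequences. The short sketch below shows only that arithmetic. The choice of 12 positions is an illustrative assumption that happens to give a number just above 16 million; it is not a statement about how the kit's adapters are designed, and the function name is hypothetical.

```python
def umi_space(n_degenerate_bases: int, alphabet_size: int = 4) -> int:
    """Number of distinct tags that n fully degenerate positions can encode."""
    if n_degenerate_bases < 0:
        raise ValueError("n_degenerate_bases must be non-negative")
    return alphabet_size ** n_degenerate_bases


# Assumption for illustration only: 12 degenerate positions in total.
print(umi_space(12))  # 16777216
```

The larger the tag space is relative to the number of starting fragments, the less likely it is that two different fragments receive the same tag by chance.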

How do UMIs work?

For the ThruPLEX Tag-seq Kit, we designed our proprietary stem-loop adapters to contain a unique sequence made up of degenerate bases that acts as the 'tag.' The molecularly tagged adapters are incorporated during the ligation step of the library preparation process to label the starting DNA molecules. This allows the sequencing reads to be grouped during data processing into amplification families based on their UMIs.

Through bioinformatics analysis, the reads within each amplification family are compared and PCR artifacts and sequencing errors are removed to form a consensus sequence.
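
The grouping and consensus steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used with any particular kit. It assumes the UMI has already been extracted from each read, that reads in a family are aligned to the same start position and have comparable length, and that a simple per-position majority vote is acceptable. The function names and the 0.6 agreement threshold are hypothetical. Real tools typically also use mapping coordinates, tolerate errors within the UMI itself, and weigh base qualities.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs into amplification families keyed by UMI."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return dict(families)


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote; emit 'N' where no base reaches min_fraction."""
    length = min(len(s) for s in seqs)  # compare only positions all reads cover
    out = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        out.append(base if count / len(seqs) >= min_fraction else "N")
    return "".join(out)


# Toy data: three reads from one tagged molecule, one read from another.
reads = [
    ("ACGTAC", "TTGACCA"),
    ("ACGTAC", "TTGACCA"),
    ("ACGTAC", "TTGTCCA"),  # this read carries a single-base error
    ("GGATCC", "TTGACCA"),
]
families = group_by_umi(reads)
collapsed = {umi: consensus(seqs) for umi, seqs in families.items()}

print(len(collapsed))       # 2 distinct starting molecules
print(collapsed["ACGTAC"])  # TTGACCA (the error is outvoted)
```

Counting the keys of `collapsed` rather than the raw reads is what gives the de-duplicated molecule count, and the consensus string is what removes errors that appear in only a minority of a family's reads.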

When should I use UMIs?

Molecularly tagged libraries are useful for confident detection of mutations below 5% allele frequency, with many researchers interested in the range around and/or below 1%. There is, however, a price to pay for higher detection sensitivity and specificity (see "What else do I need to consider when sequencing libraries with UMIs?" below).

Conventional NGS libraries are sufficient for detecting mutations at around 5% allele frequency and above. Most PCR and sequencing errors, however, occur at frequencies below this range, so without UMIs a true low-frequency variant is difficult to tell apart from an error.

What else do I need to consider when sequencing libraries with UMIs?

Redundant (deep) sequencing is required to realize the full benefit of unique molecular tags. Each unique DNA fragment must be sequenced many times so that the duplicate reads can be compared to eliminate PCR and sequencing errors and reach a consensus sequence representing the original, unique fragment. This increases the number of reads needed per sample. One common way to mitigate the added sequencing burden is to use a targeted sequencing approach, which reduces the size of the genomic target and therefore the overall number of sequencing reads required for each sample.
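
A rough back-of-the-envelope estimate shows why a smaller target helps. The sketch below multiplies the reads needed for a given unique (post-consensus) depth by the number of reads per amplification family. The function name and all numbers in the example are hypothetical, and the estimate ignores on-target rate, uneven coverage, and variation in family size, so treat it as an order-of-magnitude guide only.

```python
import math


def raw_reads_needed(target_bp, read_length_bp, unique_depth, family_size):
    """Rough read count: unique coverage multiplied by reads per family."""
    unique_reads = target_bp * unique_depth / read_length_bp
    return math.ceil(unique_reads * family_size)


# Hypothetical comparison: the same depth and family size on two target sizes.
small_panel = raw_reads_needed(50_000, 150, 1000, 5)
large_panel = raw_reads_needed(5_000_000, 150, 1000, 5)
print(small_panel, large_panel)
```

Because the estimate scales linearly with target size, shrinking the target by a factor of 100 shrinks the read requirement by about the same factor at a fixed depth and family size.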

Also, specific data analysis tools are required to identify duplicate reads and to generate a consensus sequence from the duplicate reads.

