Home › Learning centers › Next-generation sequencing › Bioinformatics resources › Cogent NGS Analysis Pipeline v1.0 User Manual

Learning centers

New products

Need help?

Contact Sales

Cogent NGS Analysis Pipeline v1.0 User Manual

(Last updated: 01-Oct-2020)

Cogent NGS Analysis Pipeline (CogentAP) is bioinformatic software for analyzing RNA-seq NGS data generated using the following systems or kits:

ICELL8 cx Single-Cell System or the ICELL8 Single-Cell System on the single-cell full-length transcriptome (SMART-Seq ICELL8 workflow)
ICELL8 cx Single-Cell System or the ICELL8 Single-Cell System on the single-cell differential expression (3′ DE or 5′ DE) workflows (ICELL8 3′ DE or ICELL8 TCR)
SMARTer Stranded Total RNA-Seq Kit v3 - Pico Input Mammalian

The program takes input data from sequencing and outputs an HTML report, with results typical to single-cell analysis, plus other files, such as a gene matrix, to continue further analysis. R data object with pre-computed results based on recommended parameters are also output. Either the standard output files or the R data object can serve as input for Cogent NGS Discovery Software (CogentDS), another bioinformatic software package provided by Takara Bio.

CogentAP software is written in Python and can be run either via a GUI or command-line interface.

I. Before you begin

A. Supported operating systems

CogentAP software is designed to be installed on a server running Linux. The following versions of Linux have been tested and are supported for use with CogentAP software:

CentOS 6.9 & 6.10
RedHat 7.5
Ubuntu 17

B. Hardware requirements

For analyzing the output of Illumina NextSeq® High-Output sequencing data analysis, the following server requirements (or better) are recommended:

CPU: 24-cores
RAM: 64 GB
Free hard drive space: 500 GB

Testing was also done on MiniSeq™, MiSeq®, HiSeq®, and NovaSeq™ datasets.

MiniSeq or MiSeq—less computational power may be needed than the specifications described above for NextSeq output
HiSeq or NovaSeq—requires more computational power than described above for NextSeq output

Precise hardware requirements were not determined for output from these datasets. Support for performance issues of the servers in conjunction with these dataset types may be limited.

C. User account requirements

CogentAP software can be installed in two different access scenarios; the user account requirement depends on which scenario you wish to implement in your environment.

For use by a single user (single Linux username/account), CogentAP software can be installed and run by any account type with install and executable privileges (regular or root access).
For use by multiple users (accounts), CogentAP software can either be installed singly across the multiple accounts (regular or root access) or installed for system-wide access by an administrator with root access.

D. Additional hardware and software dependencies and recommendations

ICELL8 cx Single-Cell System or ICELL8 Single-Cell System
Bash UNIX shell
Internet connectivity on the server
The installation process requires Internet connectivity as it sources scripts from GitHub, Bioconda, and CRAN and downloads genome information from Ensembl. Please ensure that internet connectivity is available on the UNIX server while installing.
Conda
CogentAP software leverages the open-source package manager Conda for installation of the pipeline and its dependencies. Any tools and applications required by the pipeline are installed through Conda inside a local environment created specifically for CogentAP software.

If Conda is not currently installed on the server, instructions to do so can be found at https://conda.io/miniconda.html. We recommend installation of the lightweight version for Python 3.7+ (typically the 64-bit bash installer), also called Miniconda3 (version 4.8.2 or later).
bcl2fastq
CogentAP takes as input raw-fastq files that must be converted from the sequencer output FASTQ files using the conversion software bcl2fastq.

If you're not sure where to find bcl2fastq in your environment, it can be downloaded from https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html.
Keyboard, monitor, and mouse directly into the server, or a remote access program
CogentAP software must be run on the Linux server. If users do not have direct console access, a remote access program that enables a Virtual Network Computing (VNC) connection is required through a program such as RealVNC (realvnc.com), TightVNC (tightvnc.com), TigerVNC (tigervnc.org), or similar.

For more information on VNC, along with other VNC clients that can be used, please see the Wikipedia entry at https://en.wikipedia.org/wiki/Virtual_Network_Computing.

E. Required input files

Paired-end read FASTQ files
Sample description file
- Well-list file—a well-selection text file derived from CellSelect software that contains well-level sample information. A typical CellSelect well-list file should have no more than 1,728 wells.
- Well-list-like format file—a text file that contains sample information, including columns "Barcode" and "Sample". Each column name is case sensitive. The "Barcode" column contains i7 and i5 indexes concatenated with a plus-sign ("+") (e.g., TAGCGAGT+CCGTTGCG).

II. Software overview

CogentAP software consists of two main parts, the demultiplexer (demuxer) and the analyzer.

The demultiplexer extracts the barcode from the sequencing data (based on the protocol) and writes it into FASTQ files at the end of the read name. There are two options:
- The default leaves the barcode-assigned reads in combined FASTQ files (which saves time during subsequent analysis)
- The second option splits the data up into barcode-level FASTQ files which can be carried over to other analysis pipelines
The analyzer takes the data sent to it by the demultiplexer and performs the following functions:
- Read trimming (using Cutadapt)
- Genome alignment (using the STAR aligner)
- Calculating Unique Start Stop (USS) positions (using the samtools and bedtools, only for UMI-enabled kits)
- Counting (using Subread)
- Summarization (using custom scripts)
- Generating an HTML report (using CogentDS)

The demultiplexer and analyzer can be invoked using a graphical user interface (GUI), called the CogentAP launcher, or can be run on the command line.

III. Installation & configuration options

The CogentAP software is available for download on the ICELL8 software portal at takarabio.com/ICELL8-software. An email will be sent to you with information on downloading the install script and a unique password required to run the installation on your server.

A. Verifying Conda installation

The following steps can be used to verify that Conda is installed properly on the server.

Type the following command in at the prompt in any directory location on the server:

conda -V

If Conda is successfully installed, it should return text with the version number.

e.g.,
conda 4.8.2
Check to see if the base Conda environment can be activated.
1. Type the following command into the command-line Linux prompt on the server:
  
  source activate
  Or, if that returns an error, try:
  
  conda activate
  A successful Conda install will result in a change in the prompt, as shown in Figure 2.
  
  Figure 2. High-level screenshot of the Linux command line showing a successful check of the base Conda environment.
2. If Conda is successfully installed and the prompt changed as displayed in Figure 2, type the following command to return to the default Linux prompt:
  
  source deactivate
  
  or
  
  conda deactivate
  
  This command will take the user back to the Linux prompt and out of the Conda environment.
Installation of Miniconda3 typically adds the location of its installation to the user's system environment. This is also required for the successful installation of CogentAP software.

The following steps can be used to verify that the Conda $PATH is configured correctly.
1. Open the file .bash_profile, which will be located in the home directory (for an individual user account):
  
  more ~/.bash_profile
2. Verify a line similar to the following is showing in the file:
  
  export PATH="/home/<USERNAME>/miniconda3/bin:$PATH"
  where <USERNAME> is replaced by the username of the account that installed Conda.
  
  e.g., username is ‘myacct':
  
  export PATH="/home/myacct/miniconda3/bin:$PATH"
  If the line isn't displaying or the .bash_profile file does not exist, it will need to be manually created and populated. For more information on setting an environment variable, see a UNIX user manual or a forum post like https://stackoverflow.com/a/7502128.

B. Installation

Download the installation script (takarabio_Linux64_installer.sh), following the directions on the thank you page or the email sent after submitting the sign-up form on the CogentAP product page.
Move or copy the installation script onto the Linux server into the directory location where you want to install the CogentAP software.

NOTE: The account logged into while doing the installation must have read/write privileges to the install directory chosen.
From the same directory location in Step 2, run the following command:

bash takarabio_Linux64_installer.sh CogentAP <PASSWORD>

<PASSWORD> will be replaced by the unique password included in the sign-up confirmation email.

NOTE: The installation process at this point will take anywhere from 90 minutes to 3 hours to complete, depending on the computational capacity of the server and the download speed of the internet connection. No further user input is required until the install is complete. The installation procedure may be left to run overnight, with the user not present at the console. If desired, the user can work in another terminal window simultaneous to the install.

Once the installation is complete, a message will display saying CogentAP software is ready to use.

Figure 3. Text to console illustrating a successful CogentAP software install on the Linux server.

There should be a folder named CogentAP in the directory into which the software was installed (Step 2) which contains files and directories required by the pipeline’s scripts.

Figure 4. The sub-directory and files list of the CogentAP/ folder post-install.

C. (Optional) Set up a $COGENT_AP_HOME environmental variable

For ease of use, it's recommended that the CogentAP software install directory location be added to the .bash_profile as a permanent environmental variable.

Example:

If your account name is 'myacct', the absolute pathname for myacct's home directory is /home/myacct, and CogentAP software was installed in the ~/bin directory, edit .bash_profile to add the following line:

export COGENT_AP_HOME=/home/myacct/bin/CogentAP

Once added to the profile, you will either need to log out and back in to the account, or load the file in with the following command:

source ~/.bash_profile

The phrase $COGENT_AP_HOME can then be used as an alias shortcut to reference /home/myacct/bin/CogentAP.

Example:

Running the following command logged in as 'myacct':

cd $COGENT_AP_HOME

will change directories to ~/bin/CogentAP.

NOTE: Subsequent references to $COGENT_AP_HOME in this document refer to the full path where CogentAP software is installed.

D. Updating the pipeline scripts

To update CogentAP, run the following two commands in sequence:

cd $COGENT_AP_HOME

sh CogentAP_setup.sh update

E. Add a genome build

The human genome (Ensembl hg38 release 94) is built as default with the pipeline, but the genomes of other species can be loaded into the software post-install. To do so, download the following two files for the genome of interest:

The FASTA file containing all the sequences (chromosomes and contigs)
The GTF file containing the annotation, importantly the gene information for analysis
Run the script:

$COGENT_AP_HOME/cogent add_genome \ -g <common_species_name> \ -f <FASTA_FILENAME> \ -a <GTF_FILENAME>

Where <common_species_name> is the name of the genome being added (e.g., fruitfly) and the <FASTA_FILENAME> and <GTF_FILENAME> are the exact file names and path(s) for the FASTA and GTF files, respectively, on the server. For additional help with this script, type:

$COGENT_AP_HOME/cogent add_genome -h

Example, using the fruit fly genome from Ensembl.org (https://uswest.ensembl.org/info/data/ftp/index.html):

Download the FASTA file (Drosophila_melanogaster.BDGP6.dna.chromosome.2L.fa) using FTP:

wget ftp://ftp.ensembl.org/pub/release-94/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz

or copy and paste the URI:

ftp://ftp.ensembl.org/pub/release-94/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.dna.toplevel.fa.gz
into the address bar of a web browser.
Download the GTF file (Drosophila_melanogaster.BDGP6.94.gtf):

wget ftp://ftp.ensembl.org/pub/release-94/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.94.gtf.gz

or copy and paste:

ftp://ftp.ensembl.org/pub/release-94/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.94.gtf.gz
into the address bar of a web browser.
If the downloaded files are stored in ~/ensembl directory in "myacct", run the following script:

$COGENT_AP_HOME/cogent add_genome \ -g fruitfly \ -f ~/ensembl/Drosophila_melanogaster.BDGP6.dna.chromosome.2L.fa \ -a ~/ensembl/Drosophila_melanogaster.BDGP6.94.gtf

NOTE: As FASTA and GTF files are a standard file format, files from any source, should work with this script. However, this script has only been tested on genomes downloaded from Ensembl. If a problem is encountered using files from another source it is recommended to try the import using the Ensembl file versions of the genome.

F. Uninstall

CogentAP software can be uninstalled by deleting the CogentAP/ directory defined as $COGENT_AP_HOME from the server.

If $COGENT_AP_HOME has been defined in .bash_profile, edit the file to remove the reference to $COGENT_AP_HOME as well.

IV. Running the pipeline

Before running an analysis, raw-fastq files need to be generated from the sequencer-output FASTQ files. Once those are created, CogentAP can be run in either of two ways:

Using a graphical user interface (GUI) (Section IV.B)
On the command line (CLI) (Section IV.C)

A. Generating raw-fastq files

CogentAP demultiplexer takes two, one-paired raw-fastq files as input (not split by barcode).

Log in to the server that stores the sequencing run output folder. This server will typically have the program bcl2fastq installed (see Section I.D for more information about bcl2fastq).
Change to the directory where you want to generate the raw-fastq files.
Run bcl2fastq with the following syntax:

bcl2fastq -R <RUN_FOLDER> \ -o <RUN_ID> \ --no-lane-splitting \ --sample-sheet $COGENT_AP_HOME/config/SampleSheet_dummy.csv > <RUN_ID>.stdout \ 2> <RUN_ID>.stderr

where
- <RUN_FOLDER> is the path to the sequencing run folder
- <RUN_ID> is the ID automatically generated by Illumina sequencer
The file SampleSheet_dummy.csv is stored in the CogentAP config folder, under the main install folder.

NOTE: Recent versions of bcl2fastq (2.17 and higher) have a bug where the indexes required for demultiplexing will not be inserted into the raw-fastq if a samplesheet is not specified in the command syntax. We recommend using the SampleSheet_dummy.csv file provided for every run of the bcl2fastq command to generate the raw-fastq files in order to prevent encountering that issue.
Retrieve the raw-fastq files. These are typically named Undetermined_*.fastq.gz and are generated in the <RUN_ID> folder in the working directory from Step 2.

NOTE: It is recommended that the raw-fastq files are moved to a directory on the server on which CogentAP is installed in order to reduce processing time.

B. GUI

The CogentAP GUI is a program called CogentAP_launcher. Since it's graphical, it should be accessed by connecting to the Linux server using a remote access tool like VNC (see Section I.D, above).

Connect to the Linux console where CogentAP software is installed, either remotely or via a graphical interface on the server.
Once connected, navigate using the file browser functionality to the folder in which CogentAP software is installed. In Figure 5, the CogentAP install folder is located on the user's desktop.

Figure 5. The CogentAP/ folder on the desktop, opening it to highlight the CogentAP_launcher executable file.
In the install folder, locate the file CogentAP_launcher and double-click on it to run. This will open the CogentAP GUI (Figure 6).

Figure 6. High-level view of the CogentAP GUI.

The launcher has four tabs: Run Information, Advanced Options, Demuxer Options, and Analyzer Options.
1. The Run Information tab takes as input:
  - The workflow (default being 'Demultiplexer & Analyzer')
  - R1 and R2 raw-fastq files (generated in Section IV.A)
  - The well-list file (see Section I.E for more information about this file)
  - The protocol: full-length, 3′ DE, or 5′ DE
  - The genome to use for alignment (default: human hg38)
  - The directory location in which to output the results
  - The name for a new analysis folder, which will be created in the directory location specified above and will contain the analysis output files
  NOTE: Use the scrollbar on the right of the window to view all the fields.
The other three tabs are intended for advanced users only, but their purposes are listed below:
1. Advanced Options can be used to modify parameters such as the number of processor cores available for analysis, the number of mismatches to consider during barcode assignment, etc.
2. Demuxer Options can be used to modify parameters on the demultiplexer process. The UMI length can be changed in this tab. Please input corresponding length to the kit used, such as SMARTer Stranded Total RNA-Seq Kit v3 - Pico Input Mammalian.
3. Analyzer Options tab can be used to modify parameters of the analyzer's process such as normalizing method, skip trimming, etc.
Once all the fields have been appropriately filled, click on [Start] to launch the pipeline. The process typically takes a few hours to run.
Upon successful completion, the UI displays a "Finished" message as shown in the right panel of Figure 6 (above).

C. Command-line interface

NOTE: See Section V.B for an example of the full syntax for the command-line scripts.

CogentAP software starts from the main script, cogent, and has three subcommands: demux, analyze, and add_genome. The demux and analyze commands will be covered in this section.

In general, you would first run the demuxer (cogent demux) to obtain the demultiplexed FASTQ files. These files are then used as input to run the analyzer (cogent analyze) to obtain the output files (described in detail in Section VI). These scripts can be launched from any location (working directory) on the Linux server where CogentAP software is installed.

The full list of arguments can be accessed using the syntax:

cogent <COMMAND> -h

See examples for <COMMAND> values 'demux' and 'analyze' in Figures 7 and 8, below.

%COGENT_AP_HOME%/cogent demux -h

%COGENT_AP_HOME%/cogent analyze -h

D. Processing time

The time taken by the pipeline will vary based on the hardware specifications of the server on which it is run and the size of the raw-fastq input files. During testing, a MiSeq run (~25M read pairs) typically took about 1–1.5 hr, and a NextSeq High Ouput run (~400M read pairs) typically took ~20–25 hr to complete.

NOTE: These baselines might be exceeded if the raw-fastq files are stored on a network device instead of locally on the server where CogentAP is installed.

V. Test dataset

A mini dataset file (test dataset) is included in the CogentAP distribution package; it can be found under the main installation folder in a sub-folder called $COGENT_AP_HOME/test/ (Figure 9, below). This dataset can be used to test the running of the pipeline end-to-end and will provide a sample of the output files. The output (report and stats only) from the test dataset are also included and can be found in $COGENT_AP_HOME/test/out_test/. This included output can be used to compare to the output of your test run to verify everything is working correctly.

NOTE: The mini dataset should not be used for inference purposes. Output statistics and plots will only be meaningful with a real dataset.

A run using the test data can be started using either the UI launcher (Section A) or the command line (Section B).

The test run using either method should take ~5–10 min to complete successfully.

A. Test data through CogentAP launcher UI

The parameters defined in the CogentAP launcher fields in Figure 10 are listed below. The paths are written as $COGENT_AP_HOME being /usr/local/bin/.

Field name	Input value
Workflow	`Demultiplexer & Analyzer`
R1 Fastq	`/usr/local/bin/CogentAP/test/test_FL_R1.fastq.gz`
R2 Fastq	`/usr/local/bin/CogentAP/test/test_FL_R2.fastq.gz`
Well List File	`/usr/local/bin/CogentAP/test/99999_CogentAP`
Protocol	`Full Length`
Genome	`hg38`
Output Directory	`/tmp/CogentAP_testing/`
Output Folder	`install_test`

Table 1. Parameter information for example CogentAP launcher run of the provided test dataset in Figure 10.

Once completed, the output folder (install_test/) will be located under the directory specified as 'Output Directory' (/tmp/CogentAP_testing/).

B. Test data through the command-line interface

Run the demultiplexer script.

$COGENT_AP_HOME/cogent demux \ -i $COGENT_AP_HOME/test/test_FL_R1.fastq.gz \ -p $COGENT_AP_HOME/test/test_FL_R2.fastq.gz \ -b $COGENT_AP_HOME/test/99999_CogentAP_test_selected_WellList.TXT \ -t Full_length \ -o Desktop/install_test \ -n 8

Figure 11. Example syntax for running the demux script on the provided test data via the command line.
Once the demuxer is finished, run the data through the analysis script.

$COGENT_AP_HOME/cogent analyze \ -i $COGENT_AP_HOME/install_test/install_test_demuxed_R1.fastq \ -p $COGENT_AP_HOME/install_test/install_test_demuxed_R2.fastq \ -g hg38 \ -o Desktop/install_test/analysis \ -n 8 -d Desktop/install_test/install_test_counts_all.csv -t Full_length

Figure 12. Example syntax for running the demux script on the provided test data via the command line.

Once completed, the output folder (install_test/) will be located under the directory specified as 'Output Directory' (Desktop/ as part of the parameter -o Desktop/install_test).

VI. Output files

The pipeline produces output files that serve two purposes:

Summarization of results using typical statistics and plots
Facilitating further analyses using our interactive R kit, Cogent NGS Discovery Software (CogentDS), or any other tertiary analysis tool

A. Output folder structure

Both the CogentAP GUI launcher and CLI are designated to output to a folder specified in the run parameters, but the file structure within that directory is slightly different between the two methods.

The output folder from CogentAP launcher UI will resemble Figure 13:

Figure 13. Folders and files of the output directory by launcher UI.
The folder structure from the CLI will resemble Figure 14:

Figure 14. Folders and files of the output directory by command-line interface.

B. HTML report

The HTML report is generated by the same report generator process as CogentDS using standard parameters and contains the example statistics and plots listed below. For complete details, please see the Cogent NGS Discovery Software User Manual.

Experiment overview

Figure 15. Sample experiment overview section of the HTML report.
Correlation analyses

Figure 16. Example correlation analyses section of the HTML report.
Various counts like reads, genes, Mito, Ribo.

Figure 17. Example gene counts chart from the HTML report.
Clustering tables

Figure 18. Example clustering tables from the HTML report.

C. Stats file

The Stats file is provided in CSV format and contains barcode-level statistics across the analysis pipeline. Starting from barcoded reads, it summarizes the number of reads after every step in the pipeline: for example, trimmed reads, mapped reads, exon/intron/intergenic reads, mitochondrial reads, ribosomal reads, etc. It also lists the number of genes detected per barcode. The columns shown in this file depends on the reagent kit used to generate the input data. As examples, you can see columns related to UMI when you use ICELL8_3DE_UMI or Strnd_UMI. The full list of possible columns in this file is described in the Appendix.

D. Gene matrix file

The gene matrix file (sometimes called the gene table or counts matrix) is also in CSV format and contains gene counts for each barcode, with the genes in the rows and barcodes/cells in the columns. The file contains raw counts that can then be normalized and transformed using CogentDS. An example is shown below.

E. Gene info file

The gene info file is a CSV-formatted file that contains the main annotation for the genes as described in the GTF file that is part of the genome build. It includes gene ID (used in CogentDS), Ensembl gene ID, gene symbol, and gene length (used for some normalizations).

F. CogentDS.analysis.rda

During the generation of the HTML report, an R Data object, CogentDS.analysis.rda, is created with the results of various modules. This file can be used directly as input into CogentDS to perform further analysis, which saves on processing time in that tool.

G. Extra folder

The extras/ folder contains similar result files as B–F above, in the same format, but the data is calculated by also including intron regions of genes. Additionally, when used with a UMI enabled kit, results without UMI calculation are also stored in this folder. You can also use these files for tertiary analysis.

From the GUI, extras/ can be found in <output folder>/analysis
From the CLI, it will be in the folder specified by the value of the cogent analyze command output_dir (-o) argument

Appendix

Columns in Stats file

The tables below document all potential columns that might appear in the Stats output file (Section VI.C).

As mentioned in that section, not all stats files will include every column listed.

Columns that will be present in all Stats files output by CogentAP (input workflow agnostic).

Column name	Description
Barcode	Detected barcodes. This value will usually be the sample name from the well-list or well-list-like file, but there are three exceptions, documented in the table below.
Sample	Sample names described in sample description file.
Barcoded_Reads	Number of reads after demultiplexing.
Trimmed_Reads	Number of remained reads after trimming.
Unmapped_Reads	Number of reads not mapped to genome.
Mapped_Reads	Number of reads mapped to genome.
Multimapped_Reads	Number of reads mapped to multiple genomic locations.
Uniquely_Mapped_Reads	Number of reads mapped to one genomic location. These reads are used for counting.
Exon_Reads	Number of reads assigned to an exonic region.
Ambiguous_Exon_Reads	Number of reads assigned to exonic regions of multiple genes.
Intron_Reads	Number of reads assigned to an intronic region.
Ambiguous_Intron_Reads	Number of reads assigned to intronic regions of multiple genes.
Gene_Reads	Number of reads assigned to a gene region (exon + intron).
Intergenic_Reads	Number of reads assigned to an intergenic region.
No_of_Genes	Number of detected genes.
Mitochondrial_Reads	Number of reads assigned to mitochondrial chromosome.
Ribosomal_Reads	Number of reads assigned to a ribosomal gene.

The "Barcode" column, in addition to the samples named in the well-list or well-list-like file, will also have three additional rows, which are described in the following table.

Barcode field value	Description
Short	Number of reads containing N in barcode or having shorter length than barcode.
Unselected	Number of reads having a barcode included in Chip’s description, but not included in sample description file.
Undetermined	Number of reads having undetermined barcode.

Additional columns in the Stats file for 3′ DE analysis with UMIs on ICELL8 system.

The table below lists additional columns that will be present in the Stats file when the input FASTQ files result from the ICELL8 3′ DE for UMI Reagent Kit (Cat # 640005) workflow on the ICELL8 Single-Cell System (Cat # 640000).

Column name	Description
No_of_UMIs	Number of UMI variations detected after demultiplexing.
Exon_nUMIs	Number of deduplicated reads assigned to an exonic region. Deduplication is done by UMI.
Intron_nUMIs	Number of deduplicated reads assigned to an intronic region. Deduplication is done by UMI.
Gene_nUMIs	Number of deduplicated reads assigned to a gene region (exon + intron). Deduplication is done by UMI.

Additional columns in Stats file for the SMARTer Stranded Total RNA-Seq Kit v3- Pico Input Mammalian protocol.

The table below lists additional columns that will be present in the Stats file when the input FASTQ files result from the SMARTer Stranded Total RNA-Seq Kit v3- Pico Input Mammalian protocol.

Column name	Description
No_of_UMIs	Number of UMI variations detected after demultiplexing.
Exon_nUMIs	Number of deduplicated reads assigned to an exonic region. Deduplication is done by UMI.
Exon_nUSSs	Number of deduplicated reads assigned to an exonic region. Deduplication is done by Unique Start&Stop Site (USS).
Exon_nUMIs_USSs	Number of deduplicated reads assigned to an exonic region. Deduplication is done by both UMI and USS.
Intron_nUMIs	Number of deduplicated reads assigned to an intronic region. Deduplication is done by UMI.
Intron_nUSSs	Number of deduplicated reads assigned to an intronic region. Deduplication is done by Unique Start&Stop Site (USS).
Intron_nUMIs_USSs	Number of deduplicated reads assigned to an intronic region. Deduplication is done by both UMI and USS.
Gene_nUMIs	Number of deduplicated reads assigned to a gene region (exon + intron). Deduplication is done by UMI.
Gene_nUSSs	Number of deduplicated reads assigned to a gene region (exon + intron). Deduplication is done by Unique Start&Stop Site (USS).
Gene_nUMIs_USSs	Number of deduplicated reads assigned to a gene region (exon + intron). Deduplication is done by both UMI and USS.
Strand_Specificity	Ratio of reads detected as correct strand after mapping to genome.

RNA-seq

Cogent NGS Analysis Pipeline

Analyze sequencing data generated by select Takara Bio applications.

Cogent NGS Discovery Software

Visualize sequencing data using the output from the Cogent NGS Analysis Pipeline.

SMART-Seq DE3 Demultiplexer

Demultiplex sequencing data from SMART-Seq mRNA 3′ DE into sorted read data files.

Takara Bio USA, Inc.
United States/Canada: +1.800.662.2566 • Asia Pacific: +1.650.919.7300 • Europe: +33.(0)1.3904.6880 • Japan: +81.(0)77.565.6999
FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES. © 2025 Takara Bio Inc. All Rights Reserved. All trademarks are the property of Takara Bio Inc. or its affiliate(s) in the U.S. and/or other countries or their respective owners. Certain trademarks may not be registered in all jurisdictions. Additional product, intellectual property, and restricted use information is available at takarabio.com.