Step A. Trim adapters and reverse complement UMIs
The UMIs are read during the first seven cycles of Read 1 and Read 2. However, when reads are long or inserts are short, sequencing can continue past the insert and read the reverse complement of the UMI (rcUMI), which lies after the insert and before the Illumina adapter (see figure below). To remove this artificial sequence before alignment to the genome assembly, the reverse complement of the UMI is added to the Illumina adapter sequences used in the trimming step. The FASTA files containing these sequences are available here and from Takara Bio technical support.
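To make the term rcUMI concrete, the sketch below computes the reverse complement of a 7-nt sequence with standard command-line tools. The function name `revcomp` and the UMI `ACGTTGA` are hypothetical and chosen only for illustration; they are not taken from the kit or from the supplied FASTA file, and this is not necessarily how that file was generated.

```shell
# Minimal sketch: reverse-complement a short DNA sequence.
# "revcomp" is a hypothetical helper name; ACGTTGA is a made-up 7-nt UMI.
revcomp() {
    # Reverse the string, then swap each base for its complement.
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACGTTGA   # prints TCAACGT
```

In a read-through fragment, a sequence like this would appear immediately after the insert and before the adapter, which is why it has to be trimmed together with the adapter.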
java -jar trimmomatic-0.36.jar PE <read1.fastq.gz> <read2.fastq.gz> <paired_output1.fq.gz> <unpaired_output1.fq.gz> <paired_output2.fq.gz> <unpaired_output2.fq.gz> ILLUMINACLIP:TruSeq3-PE-2with_rcUMI.fa:1:10:5:9:true MINLEN:20
where:
- <read1.fastq.gz> is the input Illumina sequencing file for Read 1
- <read2.fastq.gz> is the input Illumina sequencing file for Read 2
- <paired_output1.fq.gz> is the output file containing paired forward reads
- <unpaired_output1.fq.gz> is the output file containing unpaired forward reads
- <paired_output2.fq.gz> is the output file containing paired reverse reads
- <unpaired_output2.fq.gz> is the output file containing unpaired reverse reads
- TruSeq3-PE-2with_rcUMI.fa is the downloaded FASTA file containing the rcUMI sequences
For example:
java -jar trimmomatic-0.36.jar PE read1_R1_001.fastq.gz read2_R2_001.fastq.gz trimmed_R1.fq.gz UnPaired_R1_001.fq.gz trimmed_R2.fq.gz UnPaired_R2_001.fq.gz ILLUMINACLIP:TruSeq3-PE-2with_rcUMI.fa:1:10:5:9:true MINLEN:20
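The colon-separated values after the FASTA file name in the ILLUMINACLIP step are not described above. As a rough guide, the sketch below assembles the same step from named shell variables, following the field order given in the Trimmomatic manual (seed mismatches, palindrome clip threshold, simple clip threshold, minimum adapter length, keep both reads). The variable names are illustrative assumptions; the values are the ones used in the command above.

```shell
# Sketch: build the ILLUMINACLIP step from named parts.
# Variable names are hypothetical; values match the command shown above.
adapter_fasta="TruSeq3-PE-2with_rcUMI.fa"  # adapter + rcUMI sequences
seed_mismatches=1                # mismatches allowed in the initial seed match
palindrome_clip_threshold=10     # score needed for a paired-end (palindrome) match
simple_clip_threshold=5          # score needed for a single-read adapter match
min_adapter_length=9             # shortest adapter fragment removed in palindrome mode
keep_both_reads=true             # keep the reverse read after palindrome clipping

illuminaclip="ILLUMINACLIP:${adapter_fasta}:${seed_mismatches}:${palindrome_clip_threshold}:${simple_clip_threshold}:${min_adapter_length}:${keep_both_reads}"

# Print the assembled step so it can be checked before running Trimmomatic.
echo "$illuminaclip"
```

The resulting string can be passed to Trimmomatic in place of the literal ILLUMINACLIP argument, followed by MINLEN:20, which discards reads shorter than 20 bases after trimming.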