T cells are essential parts of the adaptive immune response. The T‑cell receptors (TCRs) expressed on their surface enable the recognition of unique molecular patterns of the pathogens invading the host. Understanding the expression profiles of TCRs (i.e., the diversity of receptors and clonotypes) provides insights into the adaptive immune response in healthy individuals and those with a wide range of diseases. Accurate determination of the clonotypes expressed by the immune system will aid in generating a complete picture of the T‑cell repertoire and its role in human health, as well as help guide the development of immune therapy research.
Current next-generation sequencing (NGS) approaches for profiling T‑cell repertoires have yielded valuable insights into the adaptive immune response and clonal selection. Two major approaches are used to profile T‑cell repertoires: multiplex PCR and 5′-RACE, each combined with NGS. While multiplex PCR amplifies multiple TCR genes in one reaction, accurate, reproducible clonotype identification can be difficult to achieve because of suboptimal sensitivity and specificity and because certain sequences are preferentially amplified. The new SMARTer Human TCR a/b Profiling Kit v2 (TCRv2 kit) leverages SMART (Switching Mechanism at 5′ end of RNA Template) full-length cDNA synthesis technology and pairs NGS with a 5′-RACE approach to provide a sensitive, accurate, and optimized method for TCR profiling that captures complete V(D)J variable regions of TRA and TRB genes (Figure 1, Panel A). Unlike systems that use multiplex PCR, the 5′-RACE method does not require any prior knowledge of the sequences comprising the 5′ end of the TCR transcripts. Additionally, the 5′-RACE method reduces variability and allows for priming from the constant region of TRA/TRB (Figure 1, Panel B). Downstream sequencing provides accurate identification of top clonotypes.
The TCRv2 kit has several key improvements over our first TCR kit (referred to as TCRv1), from optimized chemistry to the addition of both unique molecular identifiers (UMIs) and unique dual indexes (UDIs). During the template-switching step, 12 random nucleotides are incorporated into the cDNA with the TCR SMART UMI Oligo. When the data are analyzed with Cogent NGS Immune Profiler (CogentIP), these UMIs allow PCR duplicates and sequencing errors to be detected and removed, enabling more accurate, reliable clonotype calling and quantification. The addition of the UDIs lets researchers pool multiple samples (currently up to 192 different samples) while providing greater confidence in sample integrity when sequencing on a patterned Illumina flow cell. The shorter length of the libraries makes them compatible with any Illumina sequencer, allowing researchers to save on sequencing costs and increase sample multiplexing on high-throughput sequencers like the NovaSeq™ system. Alternatively, full-length TCRa and/or TCRb transcript information can be obtained when libraries are sequenced on the MiSeq® system. The TCRv2 kit is designed to generate a consistent library yield from 10 ng–1 µg of PBMC RNA, 20 ng–200 ng of whole-blood RNA, and 1 ng–100 ng of T‑cell RNA, ensuring that a sufficient yield is achieved for sequencing and improving ease of use.
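The idea behind UMI-based duplicate removal can be illustrated with a minimal Python sketch. This is not the CogentIP algorithm, which is not described here; it assumes each read has already been reduced to a hypothetical `(umi, clonotype)` pair, treats every distinct 12-nt UMI as one original cDNA molecule, and resolves conflicting clonotype calls under the same UMI by majority vote. The function name, the `min_reads_per_umi` parameter, and the example sequences are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def collapse_umis(reads, min_reads_per_umi=1):
    """Count molecules per clonotype from (umi, clonotype) read pairs.

    Illustrative only: one UMI is assumed to tag one original cDNA molecule.
    """
    # Group the clonotype calls observed under each UMI.
    by_umi = defaultdict(Counter)
    for umi, clonotype in reads:
        by_umi[umi][clonotype] += 1

    molecule_counts = Counter()
    for clonotype_reads in by_umi.values():
        if sum(clonotype_reads.values()) < min_reads_per_umi:
            continue  # too few reads to trust this UMI
        # Majority vote: minority calls under the same UMI are treated as
        # PCR or sequencing errors and discarded.
        consensus, _ = clonotype_reads.most_common(1)[0]
        molecule_counts[consensus] += 1
    return molecule_counts

# Hypothetical reads: three share one UMI, and one of them carries an error.
reads = [
    ("AAACCCGGGTTT", "CASSLGTDTQYF"),
    ("AAACCCGGGTTT", "CASSLGTDTQYF"),
    ("AAACCCGGGTTT", "CASSLGTDTQYV"),  # hypothetical error read
    ("TTTGGGCCCAAA", "CASSLGTDTQYF"),
    ("ACGTACGTACGT", "CASSPDRGYEQYF"),
]
counts = collapse_umis(reads)
for clonotype, n_molecules in counts.most_common():
    print(clonotype, n_molecules)
```

In this toy input, five reads reduce to three molecules, and the erroneous call no longer appears as a separate clonotype. A production pipeline would also need to handle errors within the UMI sequence itself, which this sketch ignores.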
Figure 1. SMARTer Human TCR a/b Profiling Kit v2 workflow. Panel A. First-strand cDNA synthesis is dT-primed and performed by the MMLV-derived SMARTScribe Reverse Transcriptase (RT), which adds nontemplated nucleotides upon reaching the 5′ end of each mRNA template. The TCR SMART UMI Oligo anneals to these nontemplated nucleotides and serves as a template for the incorporation of an additional sequence of nucleotides into the first-strand cDNA by the RT (this is the template-switching step). The first-strand cDNA is then subjected to two rounds of gene-specific PCR amplification (see details in Panel B). The nested PCR in the second round ensures the incorporation of the entire V(D)J region of the TCR, thus ensuring that the vast majority of the reads map to TCR transcripts. Panel B. The first PCR uses the first-strand cDNA as a template. It includes a forward primer with complementarity to the Illumina Read Primer 2 sequence (hTCR PCR1 Universal Forward) and a reverse primer that is complementary to the constant region of TRA/TRB genes (hTCRa/hTCRb PCR1 reverse). By priming from the Read Primer 2 sequence and the constant region, the first PCR specifically amplifies the entire variable region and a considerable portion of the constant region of TRA/TRB genes. The second PCR takes the product from the first PCR as a template and uses semi-nested primers (hTCRa/hTCRb PCR2 UDI reverse) and SMARTer RNA unique dual indexes to amplify the entire variable region and a portion of the constant region of the TCRa and/or TCRb cDNA. The UDIs include adapter and index sequences that are compatible with Illumina sequencing platforms and allow for multiplexing of up to 192 samples in a single flow cell lane.
Results
Sensitive and reproducible clonotype detection from a wide range of RNA amounts
To evaluate the sensitivity of the new SMARTer Human TCR a/b Profiling Kit v2 (TCRv2), libraries were prepared from either 1, 10, or 100 ng of human CD3+ T‑cell total RNA (generating approximately 150,000, 900,000, and 2.2 million reads/sample, respectively) or 1,000 or 10,000 human CD3+ T cells (generating approximately 400,000 and 1.5 million reads/sample, respectively).
As shown in Figure 2, TRA and TRB clonotype counts from the TCRa and TCRb libraries consistently increase as the amount of RNA input increases. For the whole-cell inputs, 1,000 and 10,000 T cells were resuspended in lysis buffer containing RNase Inhibitor and used directly for library preparation, and a substantial number of clonotypes was identified (Figure 2, Panel B). Similar data were obtained when using RNA extracted from whole blood with inputs at the lower and upper ends of the range, 20 ng and 200 ng (Figure 2, Panel C), and from PBMCs with inputs ranging from 10 ng to 1 μg (data not shown). Approximately 1.7M reads/sample were obtained from an input of 20 ng of whole-blood total RNA, generating 4,000–7,000 TRA and 8,000–13,000 TRB clonotypes in the three donors (Figure 2, Panel C). These clonotype counts were equivalent to those obtained from half that input amount of PBMC total RNA (data not shown). These results demonstrate that the TCRv2 kit produces clonotype data from whole-blood RNA with minimal variability and good sensitivity, suggesting its utility for multicenter studies that require clonotype measurement. These data indicate that the kit is robust enough to accommodate variable sample types with very high complexity (e.g., whole-blood RNA) and that libraries can be generated directly from lysed T cells without the need for RNA purification.
Figure 2. Sensitive and reproducible clonotype detection from a broad range of sample types and RNA amounts. TRA and TRB libraries were generated from 1, 10, and 100 ng of human CD3+ T‑cell total RNA (Panel A), 1,000 and 10,000 CD3+ T cells (Panel B), and 20 ng of whole-blood RNA extracted from three different samples (Panel C). The sequence reads were processed by CogentIP.
In addition to high sensitivity and the ability to accommodate a large range of sample complexities, this protocol also shows a high level of reproducibility. Technical replicates of TCRb libraries generated with 100 ng of T‑cell RNA extracted from a single donor showed excellent correlation between overlapping TRB clones, as demonstrated by a Pearson correlation (r) of 0.999 and a Spearman's correlation (ρ) of 0.97 for the top 50 ranked TCRb clonotypes (Figure 3).
Figure 3. TCRv2 libraries show a high level of reproducibility. TCRb libraries were generated in duplicate with 1, 10, and 100 ng of T‑cell RNA from a healthy donor. Plots show the abundance of each TRB clonotype from the technical replicates.
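For readers who want to run the same kind of replicate comparison on their own clonotype tables, the following Python sketch computes a Pearson correlation (r) and a Spearman correlation (ρ) over the top-ranked clonotypes shared by two replicates. It is an illustration under stated assumptions, not the analysis used to produce Figure 3: the input format (a dictionary mapping clonotype to frequency), the choice to rank by the first replicate, and the toy values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def replicate_correlation(rep1, rep2, top_n=50):
    """Correlate the top_n clonotypes (ranked by rep1) found in both replicates."""
    shared = set(rep1) & set(rep2)
    top = sorted(shared, key=lambda c: rep1[c], reverse=True)[:top_n]
    x = [rep1[c] for c in top]
    y = [rep2[c] for c in top]
    return pearson(x, y), spearman(x, y)

# Hypothetical clonotype frequencies for two technical replicates.
rep1 = {"A": 0.060, "B": 0.030, "C": 0.010, "D": 0.005}
rep2 = {"A": 0.058, "B": 0.031, "C": 0.004, "D": 0.006, "E": 0.001}
r, rho = replicate_correlation(rep1, rep2)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson r is dominated by the most abundant clonotypes, while Spearman ρ reflects only rank order, which is one reason the two values can differ for the same pair of replicates.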
TCRv2 provides improved unbiased amplification of TCR transcripts
Even with an industry-leading product such as the SMARTer Human TCR a/b Profiling Kit, there is always room for improvement. To evaluate the improved performance of the TCRv2 kit, we compared libraries prepared from PBMC RNA from a single donor using the TCRv2 kit and our original TCRv1 kit. Since the TCRv1 kit does not include UMIs, an arbitrary frequency cutoff (here, 0.0001%, 0.001%, or 0.01%) needs to be set to remove low-confidence clonotypes, such as singletons generated by sequencing or PCR errors (Figure 4, Panel A). The TCRv2 data identified clonotypes with greater confidence due to the addition of UMIs. Furthermore, the detected TRA/TRB V and J segments overlapped between the two versions of the SMARTer TCR kits (Figure 4, Panel B). Chord diagrams showed similar patterns of V-J combinations between TCRv1 and TCRv2 (Figure 4, Panel C). These data demonstrate that TCRv2 chemistry preserves the unbiased amplification of V-J segments seen with TCRv1 while adding the greater confidence provided by UMI analysis.
Figure 4. Greater confidence in clonotype counts while maintaining V-J distributions between SMARTer TCRv1 and TCRv2 chemistries. Panel A. The clonotypes identified with TCRv2 have a greater degree of confidence thanks to the addition of the UMIs, although fewer are detected. The use of UMIs assures users that the clonotypes identified are truly present in the sample rather than a result of errors from PCR amplification or sequencing. Panel B. The distribution of TRAV (upper panel), TRAJ (middle-upper panel), TRBV (middle-lower panel), and TRBJ (lower panel) segments, indicated as TCRv1 (v1; gray) and TCRv2 (v2; blue). Panel C. Chord diagrams of TRA and TRB clonotype distributions observed with the v1 and v2 kits. Each chord diagram depicts the distribution of the indicated TRA and TRB variable-joining (V-J) segment combinations for the top 9,999 clonotypes generated with v1 and v2 from 100-ng inputs of PBMC RNA, with a read depth of approximately 2 million per sample. Each arc (on the periphery of each diagram) represents a V or J segment and is scaled lengthwise according to the relative proportion at which the segment is represented in the dataset. Each chord (connecting the arcs) represents the set of clonotypes that include the indicated V-J combination and is weighted according to the relative abundance of that combination in the dataset.
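The frequency-cutoff filtering needed for data without UMIs, as described for TCRv1 above, can be sketched in a few lines of Python. This is a hypothetical illustration rather than the procedure used for Figure 4: the function name, the input format (a dictionary mapping clonotype to read count), and the toy counts are assumptions.

```python
def apply_frequency_cutoff(read_counts, cutoff_percent):
    """Keep clonotypes whose share of all reads is at least cutoff_percent."""
    total = sum(read_counts.values())
    return {
        clonotype: n
        for clonotype, n in read_counts.items()
        if 100.0 * n / total >= cutoff_percent
    }

# Hypothetical read counts per clonotype (1,000,000 reads in total).
read_counts = {
    "c1": 900_000,
    "c2": 99_000,
    "c3": 900,
    "c4": 90,
    "c5": 9,
    "c6": 1,  # a singleton, possibly a PCR or sequencing error
}

for cutoff in (0.01, 0.001):
    kept = apply_frequency_cutoff(read_counts, cutoff)
    print(f"cutoff {cutoff}%: {len(kept)} clonotypes retained")
```

The number of retained clonotypes depends directly on the threshold chosen, which is why such a cutoff is arbitrary: a stricter value removes more error-derived singletons but can also discard genuinely rare clonotypes.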
Confident identification of low-abundance clonotypes
To further test the reproducibility and detection limit of the TCRv2 kit, we spiked a serial dilution of Jurkat RNA into 100 ng of PBMC RNA. As shown in Table 1, without UMI collapse we were able to accurately quantify the TRBV12-3-TRBJ1-2 Jurkat-specific sequence reads down to a spike-in concentration of 0.01% at a depth of ~2,500,000 total TRA/TRB reads (indicated by the gray background). At a spike-in concentration of 0.001%, however, repeated PCR cycles over-amplified Jurkat transcripts present at very low copy numbers, and the read counts no longer maintained the linear ratio. In contrast, when UMI collapse was performed, Jurkat-specific sequences were still detected linearly at 0.001%, which is evidence of the improved sensitivity afforded by the UMI-based analysis approach. When comparing the percentage of Jurkat RNA spiked into the sample with the percentage of detected Jurkat UMIs, there is a strong linear correlation (r > 0.99) from 10% to 0.001% (five concentrations spanning four orders of magnitude). This can be seen consistently in both technical duplicates (Figure 5). This result demonstrates that differences in the relative abundance of transcripts for a particular TCR clonotype are faithfully and reproducibly represented in sequencing libraries generated using SMARTer technology and a UMI approach. Thus, the TCRv2 kit can accommodate the detection of rare TCR clones.
| % Jurkat RNA spiked into 100 ng of PBMC RNA | Total read count (TRA/TRB) | Without UMI collapse: # of TRB raw reads | Without UMI collapse: # of reads for TRBV12‑3-TRBJ1‑2 | Without UMI collapse: detected percentage of Jurkat reads | With UMI collapse: # of detected UMIs | With UMI collapse: # of UMIs for TRBV12‑3-TRBJ1‑2 | With UMI collapse: detected percentage of Jurkat UMIs |
|---|---|---|---|---|---|---|---|
| 10.0% | 2,500,000 | 1,565,005 | 397,179 | 25.0% | 281,280 | 62,629 | 22.0% |
| 1.0% | 2,500,000 | 1,422,102 | 47,160 | 3.3% | 219,776 | 6,426 | 2.9% |
| 0.1% | 2,500,000 | 1,366,127 | 5,412 | 0.4% | 189,580 | 631 | 0.33% |
| 0.01% | 2,500,000 | 1,218,025 | 521 | 0.043% | 196,615 | 74 | 0.038% |
| 0.001% | 2,500,000 | 1,331,465 | 909 | 0.068% | 197,870 | 6 | 0.003% |
| 0.0001% | 2,500,000 | 1,409,199 | - | 0% | 124,149 | - | 0% |
| 0% | 2,500,000 | 1,222,245 | - | 0% | 197,933 | - | 0% |
Table 1. Assessing the sensitivity and reproducibility of the SMARTer approach. Spike-in analysis was performed in replicate on PBMC RNA samples spiked at varying concentrations (10%, 1.0%, 0.1%, 0.01%, 0.001%, and 0.0001%) with RNA obtained from a homogeneous population of leukemic Jurkat T cells (containing TRBV12-3-TRBJ1-2 clonotypes). TRB CDR3 regions were amplified from 100 ng of total RNA using the TCRv2 kit and sequenced. Reads of 2 x 150 bp were obtained on an Illumina NextSeq® system. The sequencing reads were downsampled to 2.5M reads. Read results for spike-in concentrations identified as the reliable concentration limit for each criterion (without and with UMI collapse) have data highlighted in gray. Without UMI collapse, PCR duplicates of TRBV12-3 were observed in 0.0010% of the raw reads.
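The percentages in Table 1 follow directly from the counts, and the step-to-step fold-changes make the loss of linearity without UMI collapse easy to see. The Python sketch below recomputes both from the table values; the variable names are illustrative, and the 0.0001% and 0% rows are omitted because no Jurkat reads or UMIs were reported for them.

```python
# (spike-in %, TRB raw reads, Jurkat reads, detected UMIs, Jurkat UMIs), from Table 1
TABLE_1 = [
    (10.0, 1_565_005, 397_179, 281_280, 62_629),
    (1.0, 1_422_102, 47_160, 219_776, 6_426),
    (0.1, 1_366_127, 5_412, 189_580, 631),
    (0.01, 1_218_025, 521, 196_615, 74),
    (0.001, 1_331_465, 909, 197_870, 6),
]

def percent(part, total):
    """Express part as a percentage of total."""
    return 100.0 * part / total

# Detected Jurkat percentage per row, by reads and by UMIs.
rows = [
    (spike, percent(jurkat_reads, reads), percent(jurkat_umis, umis))
    for spike, reads, jurkat_reads, umis, jurkat_umis in TABLE_1
]

# Fold-change between consecutive 10-fold dilution steps.
read_fold_changes = []
umi_fold_changes = []
for (hi_spike, hi_reads, hi_umis), (lo_spike, lo_reads, lo_umis) in zip(rows, rows[1:]):
    read_fold_changes.append(hi_reads / lo_reads)
    umi_fold_changes.append(hi_umis / lo_umis)
    print(
        f"{hi_spike}% -> {lo_spike}%: "
        f"reads x{read_fold_changes[-1]:.1f}, UMIs x{umi_fold_changes[-1]:.1f}"
    )
```

Each step is a 10-fold dilution, so a fold-change near 10 indicates proportional detection. In the read-based column, the last step drops below 1 because more Jurkat reads were counted at 0.001% (909) than at 0.01% (521), whereas the UMI-based counts continue to decrease (74 to 6).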
Figure 5. Successful identification of low-abundance Jurkat transcripts. 10 ng, 1 ng, 100 pg, 10 pg, and 1 pg of Jurkat RNA were spiked into 100 ng of PBMC RNA. The plots show high correlations between the spike-in RNA proportions (%) and the detected percentages of Jurkat UMIs for each replicate (shown in blue and purple).
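As a rough check of this linearity, a Pearson correlation can be computed between the spike-in percentage and the detected Jurkat UMI percentage. The Python sketch below uses the single set of UMI counts listed in Table 1 rather than the per-replicate data plotted in Figure 5, and it works on a log10 scale so that all five dilution steps contribute comparably. Whether the reported r was calculated on a linear or a log scale is not stated, so the log transform is an assumption of this example.

```python
import math

# Spike-in percentages and UMI counts for the five quantifiable rows of Table 1.
SPIKE_PERCENT = [10.0, 1.0, 0.1, 0.01, 0.001]
JURKAT_UMIS = [62_629, 6_426, 631, 74, 6]
DETECTED_UMIS = [281_280, 219_776, 189_580, 196_615, 197_870]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Detected Jurkat UMI percentage for each spike-in level.
detected_percent = [100.0 * j / d for j, d in zip(JURKAT_UMIS, DETECTED_UMIS)]

# Correlate on a log10 scale (assumption: the scale of the reported r is not given).
r_log = pearson(
    [math.log10(s) for s in SPIKE_PERCENT],
    [math.log10(p) for p in detected_percent],
)
print(f"Pearson r on log10 scale: {r_log:.4f}")
```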
Significant impact of biological variation on the number of clonotypes detected
The number and expression profile of T cells in peripheral blood circulation vary from person to person. We tested 10 ng of PBMC RNA from six different donors with the TCRv2 kit. A total of 12 libraries (TCRa and TCRb) from these six donors were pooled and sequenced to obtain approximately 1.5 million reads per sample. We found that clonotype counts were indeed very different from sample to sample, as shown in Figure 6. These data also demonstrate the large range of clonotype counts that the kit can identify. The smallest clonotype counts identified in one library were 4,200 (TRA) and 8,700 (TRB), while the largest were around 10,000 (TRA) and 17,000 (TRB). The on-target rates of these libraries ranged from 75% to 95% (data not shown).
Figure 6. TCRv2 kit identifies a wide range of clonotype counts. Duplicate libraries were generated from 10 ng of RNA extracted from single-donor PBMC samples (P1–P6) and sequenced on an Illumina MiSeq system using 300-bp paired-end reads to obtain approximately 1.5 million reads per sample. The resulting sequencing reads were processed with CogentIP. Panel A. Clonotype numbers from different donors for TRA (blue) and TRB (orange) are shown. Error bars represent the standard error between the duplicates. Panel B. Common clonotypes between duplicates were plotted for TRA (upper panels) and TRB (lower panels) libraries from healthy donor samples P1 and P5. The horizontal/vertical lines indicate the frequency of clonotypes in the samples. The most abundant clonotype accounted for 6% of the total clones (TRA, P1 top clone). The numbers above each panel show the Pearson correlation r-value. The numbers of overlapping clonotypes between replicates were 562 (P1) and 659 (P5) for TRA and 1,115 (P1) and 1,328 (P5) for TRB. The frequency of well-represented clonotypes was highly correlated in replicates from the same donors. In contrast, very few overlapping clonotypes were observed among different donors.
Avoid oversequencing with UMI analysis
The incorporation of unique molecular identifiers (UMIs) is another key feature of the SMARTer Human TCR a/b Profiling Kit v2. UMIs are commonly used to remove PCR duplicates and to correct errors introduced during PCR or sequencing. Without UMI-based correction (Figure 7, Panel A), the number of clonotypes identified keeps increasing as sequencing depth increases (yellow line), and it is difficult to say whether the newly identified clonotypes are genuinely rare or result from the accumulation of PCR and/or sequencing errors. When UMI-based correction is included, however, the clonotype count plateaus once sequencing depth reaches saturation; in these data, the plateau occurs at 1M reads per library. To further illustrate this, when comparing the clonotype calls between 1M (+UMI) and 5M (+UMI) reads, there is at least a 90% overlap in the clonotypes identified (Figure 7, Panel B). Because fewer reads are required to identify the same number of clonotypes, users can use the freed sequencing capacity to pool more samples. Collectively, these results suggest that for SMARTer human TCR libraries generated from 10 ng of PBMC RNA, 1M reads per library is sufficient to capture the majority of clones.
Figure 7. TCRv2 libraries allow users to confidently sequence at lower depths, resulting in sequencing cost savings. TCR profiling libraries from 10 ng of PBMC RNA from a single donor were prepared using the TCRv2 kit. The TRA (upper-left graph) and TRB (upper-right graph) clonotype counts at different sequencing depths are shown with (blue line) and without (yellow line) UMI-based error correction. TCRa/TCRb mixed libraries were sequenced at 5M reads and then downsampled to 2.5M, 1M, 500K, 250K, 125K, 63K, and 31K reads. All analyses at different sequencing depths were generated with the Cogent NGS Immune Profiler Software. Venn diagrams show overlapping TRA (lower left) and TRB (lower right) clonotypes between libraries with low (1M) and high (5M) sequencing depths processed with the UMI pipeline, indicating that a lower sequencing depth is enough to obtain similar information, thus saving sequencing costs.
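The downsampling and overlap bookkeeping behind this kind of saturation analysis can be sketched in Python as follows. This is a toy illustration, not the Cogent NGS Immune Profiler workflow: each read is represented only by a clonotype label, clonotype calling is replaced by a placeholder (`set`, which simply lists the distinct labels), the overlap is defined here as the share of low-depth clonotypes also found at high depth, and the repertoire and depths are scaled-down, made-up values.

```python
import random

def downsample(reads, depth, seed=0):
    """Draw `depth` reads without replacement; a fixed seed keeps the draw reproducible."""
    return random.Random(seed).sample(reads, depth)

def saturation_curve(reads, depths, call_clonotypes=set, seed=0):
    """Return (depth, number of clonotypes called) for each requested depth.

    call_clonotypes is a placeholder for a real clonotype caller.
    """
    return [(d, len(call_clonotypes(downsample(reads, d, seed)))) for d in depths]

def overlap_fraction(low_depth_calls, high_depth_calls):
    """Share of clonotypes called at low depth that are also called at high depth."""
    if not low_depth_calls:
        return 0.0
    return len(low_depth_calls & high_depth_calls) / len(low_depth_calls)

# Toy repertoire: 200 clonotypes whose read counts fall off with rank (illustrative only).
reads = [f"clone{i}" for i in range(200) for _ in range(max(1, 400 // (i + 1)))]

depths = [31, 63, 125, 250, 500, len(reads)]
curve = saturation_curve(reads, depths)
for depth, n_clonotypes in curve:
    print(f"{depth:>5} reads: {n_clonotypes} clonotypes")

low = set(downsample(reads, 500))
high = set(reads)
print(f"overlap of low-depth calls with full-depth calls: {overlap_fraction(low, high):.0%}")
```

With real data, the `call_clonotypes` placeholder would be replaced by the output of the analysis pipeline at each depth, run once with and once without UMI-based correction, to reproduce the two curves being compared.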
Superior sensitivity and reproducibility compared to alternative profiling approaches
Applications of NGS to genomes (DNA-seq) and transcriptomes (RNA-seq) are becoming standard components of immune profiling. However, it remains unclear which methods provide the best quantitative data. We, therefore, conducted comparative studies using technologies from two different vendors. Company Q takes advantage of a ligation-based method to add their adapters after reverse transcription using RNA as the starting input. Company A uses gDNA and multiplex PCR to amplify the TRB gene. Takara Bio's kit (TCRv2) uses a 5'-RACE RNA approach. A total of 5M PBMCs were used for each gDNA and RNA extraction, and a significant portion of gDNA (1.6 µg) and total RNA (100 ng) were used for library preparation. Clonotype numbers were generated following each respective company's pipeline. (Note: Company A does not provide TRA information, so it was not tested).
Downsampling allows for the fair comparison of different sequencing data, and we previously demonstrated that Takara Bio's TCRv2 has superior sensitivity in clonotype calling at 5M reads. Superior sensitivity was also observed for TRA and TRB clonotypes in two biologically different samples (Figure 8). The length distribution of the TRB CDR3 amino acid sequence showed similar patterns for all three technologies (Figure 9, Panel A). These data raised the question of whether the three technologies share identical clonotype listings in the top ranks. The clonotypes called in the top 10 and 20 ranks were identified by all three technologies (Figure 9, Panel B). In contrast, the clonotypes in the top 100 ranks correlated well in Takara Bio TCRv2 replicates (Figure 9, Panel B, left plot) but not in the replicates of the other technologies (data not shown). These results demonstrate that the Takara Bio TCRv2 method has greater reproducibility than other mRNA and gDNA methods.
Figure 8. Takara Bio's TCRv2 generates data with greater sensitivity and reproducibility than competitors' methods. We split 5M PBMCs from two different healthy donors for RNA and gDNA extraction. 1.6 µg of gDNA was used for library preparation according to the manufacturer's instructions (15% of the total amount of extracted gDNA). 100 ng of RNA was used for library preparation (2% of the total amount of extracted RNA). Panel A. We observed a dramatically higher clonotype number for TRB after downsampling with the TCRv2 kit (TRA results were similar; data not shown). Panel B. Clonotype numbers for TCRa/b libraries are shown for each company's technology (NT: not tested). In the comparison, TCRv2 generated 48.7K and 163K clonotypes for TRA and TRB, respectively, representing a 290% increase against Company Q's RNA-based approach and a 145% increase against Company A's gDNA-based approach. Importantly, the RNA methods used only 2% of the total RNA from the 5M PBMCs.
Figure 9. TCRv2 provides greater reproducibility in top clones identified. Panel A. CDR3 amino acid (AA) length distribution of the TRB transcripts among the three different technologies. Although the detected clonotype numbers varied among the three technologies, the histograms showed similar AA length distribution patterns for TRB CDR3 (TRA results were similar, but not shown). Panel B. The clonotype rank orders of TRB generated by Takara Bio's TCRv2 were compared to itself (Takara Bio TCRv2), Company Q, and Company A. Top-ranked clonotypes were differentially colored: Top 10 (yellow), Top 20 (white), Top 100 (orange), Top 500 (gray), and Top 5,000 (blue). Takara Bio replicates showed great rank correlation for the Top 10 clonotypes and good rank correlation for the Top 100. Although some common clonotypes were observed in the Top 20 clonotypes among Takara Bio TCRv2 and Companies Q and A, the rank correlation was not as good as that between the Takara Bio TCRv2 replicates.
Conclusions
The SMARTer Human TCR a/b Profiling Kit v2 is a powerful tool for profiling human T‑cell receptors. By leveraging SMART technology and combining a 5′-RACE approach with gene-specific amplification, this workflow captures complete V(D)J variable regions of TCRs and is optimized for highly sensitive and specific clonotype detection. With primers that incorporate Illumina-specific adapter sequences during cDNA amplification, the protocol generates indexed libraries ready for sequencing on Illumina platforms. This optimized method also includes a unique PCR cycling and pooling workflow, which reduces sequencing costs while still enabling accurate clonotype identification. By avoiding multiplex PCR, this kit also avoids the pitfalls of amplification biases of certain sequences, helping to provide a complete and accurate view of human TCR repertoires. Incorporating UMIs into the libraries makes it possible to remove reads derived from PCR or sequencing errors, thus ensuring more accurate and reliable results. Incorporating UDIs into the libraries allows for both pooling of multiple samples and sequencing on patterned flow cells without worrying about index hopping. Lastly, our Cogent NGS Immune Profiler Software provides an easy-to-use method for analyzing the immune repertoire at your fingertips.
Materials and methods
CD3+ T‑cell RNA was purchased from AllCells (Cat. # LP, CR, CD3+, NS, 25M). PBMC RNA from single donors and Jurkat RNA were purchased from Biochain (Cat. # R1255815-50), in addition to in-house RNA, which was extracted from PBMCs acquired from AllCells. Whole-blood RNA was extracted from three different patients. RNA was extracted using the Macherey-Nagel NucleoSpin RNA PLUS kit (available from Takara Bio, Cat. # 740984.50).
10 ng, 1 ng, 100 pg, 10 pg, and 1 pg of Jurkat RNA were spiked into 100 ng of a single donor's T‑cell RNA. All libraries containing TCRa/b sequences were generated using the SMARTer Human TCR a/b Profiling Kit v2, as per the user manual. Following purification and size selection, libraries were quantified using the Qubit and the Agilent 2100 Bioanalyzer. Pooled libraries were quantified with the Library Quantification Kit (Cat. # 638324) and sequenced on either an Illumina MiSeq platform with 600-cycle V3 cartridges (Illumina, Cat. # MS-102-3003), an Illumina MiSeq platform with 300-cycle V3 cartridges (Illumina, Cat. # MS-102-3001), or a NextSeq platform with 300-cycle cartridges (Illumina, Cat. # 20024905). Sequencing data analysis was completed using Cogent NGS Immune Profiler. The report of the top 9,999 clonotypes generated by the immune profiler was uploaded to the VDJviz browser (https://vdjviz.cdr3.net/) for chord diagram visualization.