Background
What are T-cell receptors?
In humans and closely related species, cellular immunity is mediated by T cells (or T lymphocytes), which participate directly in the detection and neutralization of pathogenic threats. Essential to T-cell function are highly specialized extracellular receptors (T-cell receptors or TCRs) that selectively bind specific antigens displayed by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells (APCs) (Figure 1, Panel A). Antigen recognition by TCRs activates T cells, causing them to proliferate rapidly and mount immune responses through the release of cytokines.
Given the relative specificity of TCR-antigen interactions, a tremendous diversity of TCRs is required to recognize the wide assortment of pathogenic agents one might encounter. To this end, the adaptive immune system has evolved a system for somatic diversification of TCRs that is unrivaled in all of biology. The vast majority of TCRs are heterodimers composed of two distinct subunit chains (α and β), both of which contain variable domains and, in humans, are encoded by single-copy genes. The term "clonotype" is typically used to refer either to a particular TCR variant (TCR-α or TCR-β subunit) or to a particular pairing of TCR subunit variants (TCR-α + TCR-β) shared among a clonal population of T cells. TCR diversity is generated during the early stages of T-cell development. T-cell progenitors, which are derived from hematopoietic stem cells (HSCs), develop in the thymus, and as these cells divide, extensive recombination occurs between the V- and J-segments of the TCR-α gene and between the V-, D-, and J-segments of the TCR-β gene, via a mechanism that also inserts and deletes nucleotides at the segment junctions (Figure 1, Panel B). Ultimately, this process, commonly referred to as "V(D)J recombination", yields a population of T cells with sufficient TCR diversity to collectively recognize an enormous range of peptide antigens. The region of TCR-β that spans the V-D and D-J junctions, known as "complementarity determining region 3" (CDR3), is unique to each TCR-β variant and is frequently used to quantify TCR diversity in high-throughput profiling experiments. Following somatic diversification, T cells that lack sufficient affinity for MHC molecules and those that recognize self-antigens are eliminated (positive and negative selection, respectively), yielding a functional T-cell repertoire.
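The way segment choice, junctional deletion, and nucleotide insertion combine to generate diversity can be illustrated with a toy simulation. This is only a sketch: the segment names and sequences are hypothetical placeholders (not real TCR-β gene segments), the trimming and insertion limits are arbitrary assumptions, and uniform random choice stands in for the actual enzymatic process.

```python
import random

# Hypothetical toy segments for illustration; these are NOT real TRB gene sequences.
V_SEGMENTS = {"V1": "TGTGCCAGCAGC", "V2": "TGTGCCAGTAGT"}
D_SEGMENTS = {"D1": "GGGACAGGG", "D2": "GGGACTAGC"}
J_SEGMENTS = {"J1": "AATGAAAAACTG", "J2": "TACGAGCAGTAC"}


def trim(seq, rng, max_trim, from_end):
    """Delete between 0 and max_trim nucleotides from one end of a segment."""
    n = rng.randint(0, min(max_trim, len(seq)))
    if n == 0:
        return seq
    return seq[:-n] if from_end else seq[n:]


def random_insert(rng, max_len):
    """Return 0 to max_len random nucleotides (stand-in for nontemplated additions)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine_beta(rng, max_trim=3, max_insert=4):
    """Join one V, one D, and one J segment with junctional deletion and insertion."""
    v_name = rng.choice(sorted(V_SEGMENTS))
    d_name = rng.choice(sorted(D_SEGMENTS))
    j_name = rng.choice(sorted(J_SEGMENTS))
    v = trim(V_SEGMENTS[v_name], rng, max_trim, from_end=True)   # trim 3' end of V
    d = trim(D_SEGMENTS[d_name], rng, max_trim, from_end=False)  # trim 5' end of D
    d = trim(d, rng, max_trim, from_end=True)                    # trim 3' end of D
    j = trim(J_SEGMENTS[j_name], rng, max_trim, from_end=False)  # trim 5' end of J
    junction = v + random_insert(rng, max_insert) + d + random_insert(rng, max_insert) + j
    return (v_name, d_name, j_name), junction


rng = random.Random(0)  # fixed seed so the run is repeatable
rearrangements = [recombine_beta(rng) for _ in range(1000)]
unique_junctions = {junction for _, junction in rearrangements}
print(f"{len(unique_junctions)} distinct junctions from 8 possible V-D-J combinations")
```

Even with only two toy segments of each type, the junctional edits produce far more distinct sequences than the eight segment combinations alone, which is the point the passage above makes about the V-D and D-J junctions.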
The seemingly endless number of potential TCR clonotypes, with estimates ranging from 10⁶–10⁷ (Six et al. 2013) to 10¹⁵–10²⁰ unique clonotypes (Murphy et al. 2012; Laydon, Bangham, and Asquith 2015), poses significant challenges for researchers seeking to characterize T-cell repertoires in the context of human development and disease, as extensive amounts of data must be obtained. While low-throughput approaches incorporating conventional cloning and Sanger sequencing and protein-based methods for identifying antigen-specific TCRs (e.g., tetramer assays) have yielded many important insights, the development of next-generation sequencing (NGS) technologies has dramatically expanded the prospects for this field of research.
Why do TCR profiling?
High-throughput TCR profiling experiments have already yielded fundamental insights regarding T-cell development and TCR repertoire diversity (Calis and Rosenberg 2014; Woodsworth, Castellarin, and Holt 2013). For example, these approaches have demonstrated that TCR variation does not determine T-cell fate (Wang et al. 2010) and that there is considerable overlap in the population at large for so-called "public TCRs" or "public clones", which occur much more frequently than would be expected by chance (Robins et al. 2010). A sampling of different populations has revealed that TCR repertoire diversity declines linearly with age and is significantly reduced in patients suffering from autoimmune diseases or cancer, relative to healthy individuals (Britanova et al. 2014; Sherwood et al. 2013; Klarenbeek et al. 2012).
In the clinic, TCR profiling has been used to analyze the recovery of the immune system in patients who have undergone hematopoietic stem cell transplants (HSCT), and to compare the efficacy of approaches aimed at accelerating this process (van Heijst et al. 2013). Looking to the future, high-throughput TCR profiling holds tremendous promise both as a diagnostic tool and as a means for developing new therapeutics and treatment modalities (Calis and Rosenberg 2014; Woodsworth, Castellarin, and Holt 2013). For example, TCR repertoire analysis could be used to evaluate a candidate vaccine's capacity to trigger a protective immune response.
Sequencing approaches for TCR repertoire analysis
The vast majority of TCR-profiling experiments performed thus far have focused on capturing genomic DNA or mRNA sequences that correspond to the CDR3 region of the TCR-β subunit chain (Calis and Rosenberg 2014; Woodsworth, Castellarin, and Holt 2013). Given that the CDR3 region is thought to be unique to each TCR-β variant, sequence variation in this region has served as a useful proxy for overall T-cell repertoire diversity.
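As a rough illustration of how CDR3 sequences can stand in for repertoire diversity, the sketch below groups reads by CDR3 sequence, treats each distinct sequence as one clonotype, and summarizes the result with a Shannon diversity index. The choice of Shannon diversity is an assumption made for this example (the text above does not name a metric), and the CDR3 strings are hypothetical.

```python
import math
from collections import Counter


def clonotype_frequencies(cdr3_sequences):
    """Map each distinct CDR3 sequence to its fraction of all reads."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def shannon_diversity(frequencies):
    """Shannon index H = -sum(p * ln p) over clonotype frequencies."""
    return -sum(p * math.log(p) for p in frequencies.values() if p > 0)


# Hypothetical CDR3 amino acid strings, for illustration only.
reads = ["CASSLGQGNEKLF"] * 5 + ["CASSPDRGYEQYF"] * 3 + ["CASRTGSYEQYF"] * 2
freqs = clonotype_frequencies(reads)
print(len(freqs), "clonotypes; Shannon diversity =", round(shannon_diversity(freqs), 3))
```

A repertoire dominated by a few expanded clones gives a low index, while an even spread across many clonotypes gives a high one. Real analyses would also need to account for sequencing and PCR errors, which this sketch ignores.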
While sequencing genomic DNA may be preferable for certain TCR-profiling applications, including those that involve quantifying various T-cell subpopulations, this approach is not without its limitations, and methods that involve analyzing mRNA sequences carry several important advantages. TCR mRNA templates are likely to be more highly represented than DNA templates in any one T cell, such that mRNA sequencing approaches will afford greater sensitivity and allow for more comprehensive identification of unique TCR variants, including those that are present in a very small proportion of T cells. Another important benefit of sequencing mRNA rather than genomic DNA is that it specifically allows for the identification of expressed TCR sequences that have undergone splicing and post-transcriptional processing and are likely to yield functional proteins. DNA-based approaches, by contrast, do not identify TCR sequences in their translated forms, and will unavoidably yield many nonproductive sequences that are functionally irrelevant. For this reason, mRNA sequencing is the preferred option for researchers interested in exploring functional aspects of specific TCR variants.
TCR profiling approaches that involve sequencing genomic DNA are also subject to significant technical limitations. Due to the lack of splicing, DNA-derived templates are considerably longer than their RNA counterparts, such that amplification of genomic DNA corresponding to TCR variable regions (including CDR3) requires multiplex PCR and is potentially susceptible to biases imposed by the various primer pairs. As demonstrated below, the relatively shorter length of TCR mRNA templates allows for simpler amplification schemes in which TCR-α and TCR-β variable regions are captured with single primer pairs, minimizing the potential for amplification biases and allowing for analysis of both subunit chains in the same experiment.
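The distinction between productive and nonproductive sequences can be made concrete with a simplified check. The sketch below assumes a commonly used working definition (a rearranged coding sequence is treated as productive if it is in frame and contains no stop codon); that definition, the function name, and the example sequences are illustrative assumptions, not part of the workflow described here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_nt):
    """Return True if the sequence is in frame and free of stop codons.

    Assumes coding_nt starts at a codon boundary (for example, at the
    first codon of the CDR3 region) and contains only A, C, G, and T.
    """
    seq = coding_nt.upper()
    if len(seq) % 3 != 0:  # a frameshift disrupts the downstream reading frame
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Hypothetical junction sequences, for illustration only.
examples = {
    "in frame, no stop": "TGTGCCAGCAGCTTAGGA",
    "frameshift": "TGTGCCAGCAGCTTAGG",
    "premature stop": "TGTGCCAGCTAGTTAGGA",
}
for label, seq in examples.items():
    print(label, "->", is_productive(seq))
```

Production analysis tools apply more detailed criteria (for example, checking conserved residues that flank CDR3), so this should be read as a minimal filter only.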
Experimental workflow
First-strand cDNA synthesis and template switching
This approach utilizes leukocyte RNA extracted from human peripheral blood or intact human T cells as starting material. First-strand cDNA synthesis is dT-primed (TCR dT Primer) and performed by the MMLV-derived SMARTScribe Reverse Transcriptase (RT), which adds nontemplated nucleotides upon reaching the 5′ end of each mRNA template (Figure 2, Panel A). The SMART-Seq v4 Oligonucleotide—enhanced with Locked Nucleic Acid (LNA) technology for increased sensitivity and specificity—then anneals to the nontemplated nucleotides, and serves as a template for the incorporation of an additional sequence of nucleotides to the first-strand cDNA by the RT (this is the template-switching step). This additional sequence—referred to as the "SMART sequence"—serves as a primer-annealing site for subsequent rounds of PCR, ensuring that only sequences from full-length cDNAs undergo amplification.
cDNA amplification and incorporation of Illumina adapters by semi-nested PCR
Following reverse transcription and extension, two rounds of PCR are performed in succession to amplify cDNA sequences corresponding to variable regions of TCR-α and/or TCR-β transcripts. The first PCR uses the first-strand cDNA as a template and includes a forward primer with complementarity to the SMART sequence (SMART Primer 1), and a reverse primer that is complementary to the constant (i.e., nonvariable) region of either TCR-α or TCR-β (TCRa/b Human Primer 1); both reverse primers may be included in a single reaction if analysis of both TCR subunit chains is desired. By priming from the SMART sequence and constant region, the first PCR specifically amplifies the entire variable region and a considerable portion of the constant region of TCR-α and/or TCR-β cDNA (Figure 2, Panel B).
The second PCR takes the product from the first PCR as a template, and uses semi-nested primers (TCR Primer 2 and TCRa/b Human Primer 2) to amplify the entire variable region and a portion of the constant region of TCR-α and/or TCR-β cDNA (once again, either or both TCR subunit chains may be amplified in a single reaction). Included in the forward and reverse primers are adapter and index sequences which are compatible with the Illumina sequencing platform (read 2 + i7 + P7 and read 1 + i5 + P5, respectively). Following post-PCR purification, size selection, and quality analysis, the library is ready for Illumina sequencing.
Library quality control and Illumina sequencing
Prior to sequencing, libraries are purified and size selected using Solid Phase Reversible Immobilization (SPRI) beads. To confirm the success of library amplification and purification, samples are run on a Fragment Analyzer or Bioanalyzer (Figure 3). The position and shape of electropherogram peaks vary depending on whether TCR-α and/or TCR-β sequence fragments are included in the library, the nature of the sample input, and the analysis method. Once the quality and size of each purified library have been confirmed, samples are sequenced on the Illumina platform using 300 bp paired-end reads, which fully capture the TCR sequence included in each cDNA molecule.
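Whether a given read length spans an amplified fragment is a matter of simple arithmetic, sketched below. The insert lengths in the example are hypothetical values chosen for illustration (the text above does not state fragment sizes), and `insert_len` refers to the sequenced cDNA fragment excluding adapters.

```python
def paired_end_coverage(insert_len, read_len=300):
    """Return (fully_covered, overlap) for one paired-end read pair.

    overlap is the number of bases read by both mates; a negative value
    is the size of the unsequenced gap in the middle of the insert.
    """
    overlap = min(insert_len, 2 * read_len - insert_len)
    return overlap >= 0, overlap


# Hypothetical insert lengths in bp, for illustration only.
for insert_len in (450, 600, 700):
    covered, overlap = paired_end_coverage(insert_len)
    print(f"insert {insert_len} bp: fully covered = {covered}, overlap = {overlap} bp")
```

In practice, merging the two mates into one contiguous sequence also requires some minimum overlap, so inserts close to twice the read length may be covered on paper but still difficult to merge.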