Next Generation Sequencing
The past decade has witnessed a profound revolution in biology, driven largely by advances in high-throughput sequencing, functional genomics, and computational technologies. Researchers are harnessing the power of high-throughput sequencing, also known as next-generation sequencing (NGS), to address an increasingly diverse range of biological questions. The scale and efficiency now achievable with NGS are driving unprecedented progress in areas ranging from the analysis of genomes (changes in gene copy number, sequence, expression, structure, modification, and interaction) to how proteins interact with nucleic acids, thereby enabling impressive scientific achievements and novel biological applications.
Since they were first introduced to the market in 2005, NGS technologies and applications have made a tremendous impact on genomic research. In particular, a great deal of NGS effort today centers on biomedical research, which is the major research theme of investigators at Greehey CCRI, UT Health Cancer Center, and UT Health San Antonio.
The Genome Sequencing Facility (GSF) provides high-quality and cost-effective next-generation sequencing services. The GSF handles various types of sequencing library preparation and Illumina next-generation sequencing. Our users and collaborators include researchers in the Greehey CCRI, UT Health Cancer Center, and UT Health San Antonio community, as well as investigators at other universities and institutions.
Single Cell Analysis
Single cell sequencing is changing our understanding of fundamental biological processes such as development, immunity, pathology and disease.
The 10X Genomics Chromium platform and its application-specific library preparation kits, combined with the GSF’s Illumina platforms (HiSeq and NextSeq) and their analysis software, provide users with a fully integrated, high-throughput solution for single cell genomics. This new and promising technology supports the following applications:
- The Chromium Single Cell Gene Expression Solution provides a comprehensive, scalable solution for cell characterization and gene expression profiling of hundreds to tens of thousands of cells. The latest improvements allow you to detect even more unique transcripts per cell. With the addition of Feature Barcoding technology, you can get a more complete molecular readout cell by cell: identify cell-specific CRISPR-mediated perturbations, or simultaneously measure gene and cell surface protein expression in the same cell, with further feature types possible.
- The Chromium Single Cell ATAC-seq Solution determines the regulatory landscape of chromatin to study epigenomics.
- The Chromium Single Cell Immune Profiling Solution is a comprehensive approach to simultaneously examine the cellular context of the adaptive immune response and the immune repertoires of hundreds to tens of thousands of T and B cells in human or mouse on a cell-by-cell basis.
- The Chromium Single Cell CNV Solution provides a comprehensive, scalable solution for revealing genome heterogeneity and understanding clonal evolution.
Next-generation sequencing (NGS) technologies and applications are still relatively new to many Principal Investigators. Designing the experiment and understanding the logic of the downstream bioinformatics analysis can be challenging and demand a significant investment, so project planning and experimental design are required. For each sequencing project the GSF works on, we request that a discussion take place beforehand between the GSF and the project investigators submitting the samples. Based on the biological questions you are asking, we will work with you to decide the experiment outline, including the number of samples needed (biological replicates and groups), the sample preparation details (DNA-Seq or RNA-Seq, polyA-selected or rRNA-depleted, with or without enrichment procedures), the number of reads needed (which affects the pooling scheme), single-read or paired-end sequencing, the sequencing length (sequencing module choice: 50SR, 75PE, or 150PE), and so on. Please contact us if you are interested in working with us.
We QC every sample we receive. Our assessment of a sample reflects its state at the time the QC data were collected. Different QC steps apply to different sample types. Generally, an Agilent Bioanalyzer or an agarose gel (for genomic DNA), together with Qubit, is used for QC. We will notify the user if there is any issue with the quality or quantity of the samples and if samples need to be replaced. Sample-specific QC metrics are tracked in our Wiki LIMS.
We have developed extensive tools to monitor sequence quality and accuracy. Every sequencing run performed by the GSF undergoes quality control evaluation, summarized in a report that reviews read output and overall quality metrics, including the Q30 score, percentage of undetermined reads, FastQC results, duplicate rate, and mappable rate, among others. These QC mechanisms allow the GSF to maintain a consistently high level of sequence quality, which simplifies subsequent analyses.
Next Generation Sequencing
Facts you may want to know about the new HiSeq 3000 sequencer:
- The Illumina HiSeq 3000/4000 (the HiSeq 3000 carries one flow cell; the HiSeq 4000 has two flow cell stations) is among the most powerful and efficient sequencing platforms in the Illumina NGS market. It offers high throughput, faster turnaround time, and lower cost.
- The enhanced sequencing performance of the HiSeq 3000/4000 is enabled by two new technologies: patterned flow cells and kinetic exclusion amplification. These two technologies were previously available only on the X Ten sequencers, for a single application: the re-sequencing of human genomic samples. In contrast to the random clustering employed previously (on the HiSeq 2500 and earlier models, including the GSF's old HiSeq 2000 sequencer), clusters on the HiSeq 3000 are generated in ordered nano-wells, which allows higher cluster densities and unambiguous cluster identification.
- The Illumina HiSeq 3000 uses an 8-lane flow cell. It typically generates 300 to 350 million single reads per lane, or 600 to 700 million paired-end reads (the single-read number × 2) per lane.
- The Illumina HiSeq 3000 carries one flow cell with 8 lanes and generates up to 750 Gb of data per run. We must have samples ready for all 8 lanes before we can start a sequencing run.
- The Illumina HiSeq provides high-quality sequencing data. The sequencing read length on the HiSeq 3000 is 50 bp, 75 bp, 100 bp, or 150 bp. The percentage of bases at or above Q30 (a 1 in 1,000 chance of error) is ≥ 85% for 50 bp reads and ≥ 75% for 150 bp reads.
- The Illumina HiSeq 3000 takes about 1 day to finish a 50 bp single-read sequencing run and about 3.5 days to complete a 150 bp paired-end sequencing run.
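The per-lane read counts and the per-run yield quoted in the list above can be related with simple arithmetic. The following is a rough sketch only: it assumes every lane delivers the same number of reads and that every read reaches its full length, and the function name `run_yield_gb` is made up for this illustration.

```python
# Back-of-the-envelope yield for one HiSeq 3000 run, using the figures quoted above.
LANES_PER_FLOW_CELL = 8
SINGLE_READS_PER_LANE = (300e6, 350e6)  # typical range quoted above


def run_yield_gb(reads_per_lane: float, read_length_bp: int, paired_end: bool) -> float:
    """Approximate yield of a full 8-lane run in gigabases (Gb).

    Assumption: all lanes perform identically and no bases are trimmed.
    """
    reads_per_cluster = 2 if paired_end else 1
    bases = reads_per_lane * reads_per_cluster * read_length_bp * LANES_PER_FLOW_CELL
    return bases / 1e9


low, high = (run_yield_gb(n, 150, paired_end=True) for n in SINGLE_READS_PER_LANE)
print(f"2 x 150 bp run: roughly {low:.0f} to {high:.0f} Gb")
```

Under these assumptions a 150 bp paired-end run lands in the same range as the "up to 750 Gb" figure, while a 50 bp single-read run yields far less because each cluster contributes only 50 bases.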
Facts you may want to know about the new NextSeq 500 sequencer:
- Using Illumina core SBS technology, NextSeq 500 offers a fast and easy workflow for any project size and sequencing throughput for numerous popular sequencing applications such as exome-seq, RNA-seq, targeted panels, and small RNA-seq.
- Based on sample volume and coverage needs, users can choose between two flow cell configurations (High Output and Mid Output), easily shifting between lower- and higher-throughput processing with each sequencing run.
- The Illumina NextSeq 500 provides high-quality sequencing data, similar to what the HiSeq offers. The sequencing read length on the NextSeq is 75 bp or 150 bp. The percentage of bases at or above Q30 (a 1 in 1,000 chance of error) is ≥ 80% for 75 bp reads and ≥ 75% for 150 bp reads.
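The Q30 figures quoted for both instruments rest on the Phred scale, where a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows that conversion and how a "percent of bases ≥ Q30" figure could be computed for a single read. It assumes the Phred+33 quality encoding used in current Illumina FASTQ files; the function names and the example quality string are invented for illustration and are not part of the GSF's QC tooling.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def fraction_at_least_q30(quality_string: str, offset: int = 33) -> float:
    """Fraction of bases with Q >= 30 in one FASTQ quality line.

    Assumption: qualities are encoded as ASCII characters with a Phred+33 offset.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(q >= 30 for q in scores) / len(scores)


# Q30 corresponds to a 1 in 1,000 chance of a wrong base call.
print(phred_to_error_probability(30))

# Made-up quality line: 'I' is Q40, '?' is Q30, '#' is Q2.
print(fraction_at_least_q30("IIII?###"))
```

A run-level Q30 percentage is the same calculation applied across all bases in the run rather than a single read.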
Sequencing libraries for HiSeq 3000 and NextSeq 500
- Sequencing libraries for the HiSeq 3000 can be sensitive because of the patterned flow cell: primer dimers must be < 0.5%, and the library insert size needs to be < 550 bp, or the total library length < 670 bp. Minor “tails” of longer fragments are still acceptable.
- Library fragments can be sequenced from one end (single reads) or from both ends (paired-end reads). Single reads are typically used for re-sequencing, gene expression profiling, ChIP sequencing, MBDCap DNA sequencing, and small RNA sequencing. Paired-end reads are most commonly used for de novo assembly, identification of splice variants and structural variation, and other applications.
- A pooling strategy can significantly improve sequencing efficiency. Samples that are individually barcoded during library preparation can be multiplexed on a lane. The number of samples that can be pooled per lane depends on the number of reads per sample needed for the downstream bioinformatics analysis. For human, mouse, and rat samples, the general guideline is that 6 to 8 RNA or ChIP samples, or 8 to 24 small RNA samples, can be pooled in one HiSeq 3000 lane.
- Illumina uses a green laser to sequence G/T and a red laser to sequence A/C. At each cycle, at least one of the two nucleotides for each color channel needs to be read to ensure proper image registration. It is important to maintain color balance for each base of the index read being sequenced; otherwise, index read sequencing could fail due to registration failure. Follow these low-plex pooling guidelines, depending on the TruSeq® Sample Prep kit you are using. (Attachment here)
- The Illumina NextSeq uses 2-channel SBS sequencing technology: C is detected in the red channel, T in the green channel, A in both channels, and G is unlabeled and appears dark. Because G bases give no signal, try to avoid having only G bases in the first few cycles.
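Two of the pooling points above lend themselves to a quick check before libraries are combined: how many samples fit in a lane at a given depth, and whether the pooled index sequences stay color balanced at every cycle. The sketch below is illustrative only. The function names, the index sequences, and the target of 40 million reads per sample are assumptions made up for this example; the authoritative low-plex combinations are those in the Illumina pooling guidelines for your kit.

```python
from typing import Iterator, List, Sequence, Set, Tuple

GREEN_LASER_BASES = set("GT")  # 4-channel chemistry: green laser reads G and T
RED_LASER_BASES = set("AC")    # 4-channel chemistry: red laser reads A and C


def samples_per_lane(reads_per_lane: float, reads_per_sample: float) -> int:
    """Largest whole number of samples one lane can hold at the requested depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return int(reads_per_lane // reads_per_sample)


def _cycles(indexes: Sequence[str]) -> Iterator[Tuple[int, Set[str]]]:
    """Yield (cycle number, set of bases seen) for each index-read cycle across the pool."""
    if not indexes:
        raise ValueError("no index sequences given")
    if len({len(ix) for ix in indexes}) != 1:
        raise ValueError("all index sequences must have the same length")
    for cycle, bases in enumerate(zip(*indexes), start=1):
        yield cycle, {b.upper() for b in bases}


def unbalanced_cycles(indexes: Sequence[str]) -> List[int]:
    """1-based cycles that lack a green-laser base or a red-laser base (4-channel)."""
    return [c for c, seen in _cycles(indexes)
            if not (seen & GREEN_LASER_BASES) or not (seen & RED_LASER_BASES)]


def all_g_cycles(indexes: Sequence[str]) -> List[int]:
    """1-based cycles where every index is G, i.e. no signal on a 2-channel instrument."""
    return [c for c, seen in _cycles(indexes) if seen == {"G"}]


# Hypothetical index sequences, made up for illustration only.
pool = ["ACGTAC", "TGGTTG"]
print(samples_per_lane(300e6, 40e6))  # 40 million reads per sample is an assumed target
print(unbalanced_cycles(pool))        # cycles 3 and 4 carry only green-laser bases
print(all_g_cycles(pool))             # cycle 3 is G in both indexes
```

An empty list from `unbalanced_cycles` means every cycle has at least one base in each color channel; any reported cycle suggests swapping or adding an index before pooling.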
10X Genomics Single-Cell
Description of 10X Genomics single-cell
The 10X Genomics Chromium system provides users with the latest and most advanced high-throughput sequencing technology available for single cell genomics.
The following table is an overall comparison of the single cell platforms on the market, conducted by the ABRF (Association of Biomolecular Resource Facilities) Genomics Research Group in 2018.
10x Genomics’ single-cell RNA-seq (scRNA-seq) technology, the Chromium™ Single Cell 3’ Solution, allows you to analyze transcriptomes on a cell-by-cell basis through the use of microfluidic partitioning to capture single cells and prepare barcoded, next-generation sequencing (NGS) cDNA libraries. Specifically, single cells, reverse transcription (RT) reagents, Gel Beads containing barcoded oligonucleotides, and oil are combined on a microfluidic chip to form reaction vesicles called Gel Beads in Emulsion, or GEMs. GEMs are formed in parallel within the microfluidic channels of the chip, allowing the user to process hundreds to tens of thousands of single cells in a single 18-minute Chromium™ Instrument run (V3.1). It’s important to note that cells are loaded at a limiting dilution in order to maximize the number of GEMs containing a single cell and so ensure a low doublet rate, while maintaining a high cell recovery rate of up to ~65%.
Cell density and viability are important!
The optimal cell density of your single-cell suspension stock is 1,000 cells/µl, and viability should be > 90%.
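The density, viability, and recovery figures above can be combined into a rough planning estimate. The sketch below is a simplification built only from the numbers stated on this page. Because ~65% is quoted as an upper bound on recovery, the computed volume is a minimum. Actual loading volumes should be taken from the 10x Genomics user guide for your kit. The function names are hypothetical.

```python
OPTIMAL_DENSITY_CELLS_PER_UL = 1000.0  # stock density recommended above
MIN_VIABILITY = 0.90                   # viability should exceed 90%
BEST_CASE_RECOVERY = 0.65              # "up to ~65%" cell recovery quoted above


def viability(live_cells: int, total_cells: int) -> float:
    """Fraction of counted cells that are alive."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return live_cells / total_cells


def min_stock_volume_ul(target_recovered_cells: float,
                        density_cells_per_ul: float = OPTIMAL_DENSITY_CELLS_PER_UL,
                        recovery: float = BEST_CASE_RECOVERY) -> float:
    """Smallest stock volume (in µl) that could yield the target number of recovered cells.

    Assumption: recovery is at its best-case value, so real needs may be higher.
    """
    cells_to_load = target_recovered_cells / recovery
    return cells_to_load / density_cells_per_ul


# Example count: 930 live cells out of 1,000 counted (made-up numbers).
print(viability(930, 1000) > MIN_VIABILITY)
print(round(min_stock_volume_ul(5000), 2))  # µl of stock for about 5,000 recovered cells
```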