Next Generation Sequencing

The past decade has witnessed a profound revolution in biology, driven largely by advances in high-throughput sequencing, functional genomics, and computational technologies. Researchers are harnessing the power of high-throughput sequencing, also known as next-generation sequencing (NGS), to address an increasingly diverse range of biological questions. The scale and efficiency that NGS now achieves are driving unprecedented progress in areas ranging from genome analysis (for example, changes in gene copy number, sequence, expression, structure, modification, and interaction) to how proteins interact with nucleic acids, enabling impressive scientific achievements and novel biological applications.

Since their commercial introduction in 2005, NGS technologies and applications have had a tremendous impact on genomic research. In particular, much of today's NGS effort centers on biomedical research, the major research theme of investigators at Greehey CCRI, UT Health Cancer Center, and UT Health San Antonio.

The Genome Sequencing Facility (GSF) provides high-quality, cost-effective next-generation sequencing services, including preparation of various types of sequencing libraries and Illumina next-generation sequencing. Our users and collaborators include researchers in the Greehey CCRI, UT Health Cancer Center, and UT Health San Antonio community, as well as investigators at other universities and institutions.

Experimental Design

Next-generation sequencing (NGS) technologies and applications are still relatively new to many Principal Investigators, and designing the experiment and understanding the logic of the downstream bioinformatics analysis can require significant effort and investment. Careful project planning and experimental design are therefore required. For every sequencing project the GSF undertakes, we request a discussion between the GSF and the investigators before the samples are submitted. Based on the biological questions you are asking, we will work with you to outline the experiment, including the number of samples needed (biological replicates and groups); sample preparation details (DNA-Seq or RNA-Seq, polyA selection or rRNA depletion, with or without an enrichment procedure); the number of reads needed, which determines the pooling scheme; single-read or paired-end sequencing; and read length (the sequencing module, e.g. 50SR, 75PE, or 150PE, i.e. 50 bp single-read, 75 bp paired-end, or 150 bp paired-end), among other details. Please contact us if you are interested in working with us.
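To make the link between reads per sample and the pooling scheme concrete, the sketch below estimates how many barcoded libraries fit in one HiSeq 3000 lane and how many lanes a design needs. It assumes the conservative end of the 300 to 350 million single reads per lane quoted under Quick Facts, counts single reads (clusters), so a paired-end target should be entered as read pairs, and ignores undetermined reads. The function names and the 40-million-read target in the example are hypothetical; this is not a GSF tool.

```python
import math

# Assumption: lower end of the 300-350 million single reads per
# HiSeq 3000 lane quoted under Quick Facts.
READS_PER_LANE = 300_000_000


def samples_per_lane(reads_per_sample: int,
                     reads_per_lane: int = READS_PER_LANE) -> int:
    """Largest number of barcoded libraries whose read targets fit in one lane."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return reads_per_lane // reads_per_sample


def lanes_needed(n_groups: int, n_replicates: int, reads_per_sample: int,
                 reads_per_lane: int = READS_PER_LANE) -> int:
    """Lanes required for n_groups x n_replicates libraries (hypothetical helper)."""
    n_samples = n_groups * n_replicates
    per_lane = samples_per_lane(reads_per_sample, reads_per_lane)
    if per_lane == 0:
        # A single library needs more than one lane: size the run by total reads.
        return math.ceil(n_samples * reads_per_sample / reads_per_lane)
    # Otherwise, pack whole libraries into lanes.
    return math.ceil(n_samples / per_lane)


if __name__ == "__main__":
    # Hypothetical design: 3 groups x 4 biological replicates, 40 M reads each.
    print(samples_per_lane(40_000_000))    # 7
    print(lanes_needed(3, 4, 40_000_000))  # 2
```

In this hypothetical design, 7 libraries fit per lane, so the 12 libraries need 2 lanes; trimming the target to about 25 million reads per sample would fit all 12 in one lane. That trade-off between depth and multiplexing is what the pre-submission discussion settles.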

Quality Control

We QC every sample we receive, and our assessment of each sample is based on the QC data available at the time the QC is performed. The QC steps differ by sample type. Generally, we use an Agilent Bioanalyzer (or an agarose gel for genomic DNA) together with Qubit quantification. We will notify you if there is any issue with the quality or quantity of your samples, or if samples need to be replaced. Specific QC metrics are tracked in our Wiki LIMS system.

We have developed extensive tools to monitor sequence quality and accuracy. Every sequencing run performed by the GSF undergoes a quality control evaluation, summarized in a report that reviews read output and overall quality metrics, including the Q30 score, percentage of undetermined reads, FastQC results, duplicate rate, and mappable rate. These QC mechanisms allow the GSF to maintain the highest level of sequence quality, which simplifies subsequent analyses.
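As an illustration of two of the run-level metrics listed above, the sketch below computes the percentage of base calls at or above Q30 from FASTQ quality strings, and the percentage of undetermined reads. It assumes Phred+33 quality encoding (used by current Illumina FASTQ output) and plain four-line FASTQ records, and uses the Phred relation Q = -10 log10(p), so Q30 corresponds to an error probability of 1 in 1,000. It is a minimal sketch, not the GSF's reporting pipeline; all function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator

Q30 = 30


def error_probability(q: float) -> float:
    """Phred relation Q = -10 * log10(p), solved for p."""
    return 10 ** (-q / 10)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def quality_lines(path: str) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record; handles .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:
                yield line.rstrip("\r\n")


def percent_q30(qualities: Iterable[str]) -> float:
    """Percentage of base calls with quality >= Q30."""
    total = passing = 0
    for qual in qualities:
        scores = phred_scores(qual)
        total += len(scores)
        passing += sum(1 for q in scores if q >= Q30)
    if total == 0:
        raise ValueError("no base calls found")
    return 100.0 * passing / total


def percent_undetermined(undetermined_reads: int, total_reads: int) -> float:
    """Share of reads whose index matched no sample barcode."""
    return 100.0 * undetermined_reads / total_reads
```

For a hypothetical file, `percent_q30(quality_lines("sample_R1.fastq.gz"))` streams the qualities without loading the whole file into memory.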

Quick Facts

Facts you may want to know about the new HiSeq 3000 sequencer:

  • Illumina HiSeq 3000/4000 (the HiSeq 3000 carries one flow cell, while the HiSeq 4000 has two flow cell stations) is the latest and most efficient sequencing platform in the Illumina NGS market. It offers higher throughput, faster turnaround time, and lower cost.
  • The enhanced sequencing performance of the HiSeq 3000/4000 is enabled by two new technologies: the patterned flow cell and kinetic exclusion amplification. Previously, these two technologies were available only on the HiSeq X Ten sequencers, for a single application: re-sequencing of human genomes. In contrast to the random clustering used previously (HiSeq 2500 and earlier models, including the GSF's old sequencer, the HiSeq 2000), HiSeq 3000 clusters are generated in ordered nanowells, allowing higher cluster densities and unambiguous cluster identification.
  • The HiSeq 3000 uses an 8-lane flow cell and typically generates 300 to 350 million single reads per lane, or 600 to 700 million paired-end reads per lane (twice the single-read number, since each cluster is read from both ends).
  • The HiSeq 3000 carries one flow cell with 8 lanes and generates up to 750 Gb of data per run. We must have samples ready for all 8 lanes before we can start a sequencing run.
  • The HiSeq 3000 provides high-quality sequencing data, with read lengths of 50, 75, 100, or 150 bp. The percentage of bases at or above Q30 (an error probability of 1 in 1,000) is ≥ 85% for 50 bp reads and ≥ 75% for 150 bp reads.
  • A 50 bp single-read run on the HiSeq 3000 takes about 1 day, and a 150 bp paired-end run takes about 3.5 days.
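The yield figures above follow from simple arithmetic: lanes × clusters per lane × reads per cluster × read length. The hypothetical helpers below encode that product. For example, 8 lanes × 312.5 million clusters × 2 reads × 150 bp gives 750 Gb, the per-run maximum quoted above, while a 50 bp single-read run at 300 million reads per lane gives 120 Gb. Actual yields vary with cluster density and pass-filter rate.

```python
def reads_per_lane(clusters_per_lane: int, paired_end: bool) -> int:
    """Each cluster yields one read (single-read) or two reads (paired-end)."""
    return clusters_per_lane * (2 if paired_end else 1)


def run_yield_gb(lanes: int, clusters_per_lane: int, read_length_bp: int,
                 paired_end: bool) -> float:
    """Total bases per run in gigabases: lanes x reads per lane x read length."""
    total_bases = lanes * reads_per_lane(clusters_per_lane, paired_end) * read_length_bp
    return total_bases / 1e9


if __name__ == "__main__":
    print(run_yield_gb(8, 312_500_000, 150, paired_end=True))  # 750.0
    print(run_yield_gb(8, 300_000_000, 50, paired_end=False))  # 120.0
```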

Facts you may want to know about the new NextSeq 500 sequencer:

  • Built on Illumina's core sequencing-by-synthesis (SBS) technology, the NextSeq 500 offers a fast, easy workflow across project sizes and throughput levels for many popular applications, such as exome-seq, RNA-seq, targeted panels, and small RNA-seq.
  • Depending on sample volume and coverage needs, users can choose between two flow cell configurations (High Output and Mid Output), easily shifting between lower- and higher-throughput processing from one sequencing run to the next.
  • The NextSeq 500 provides high-quality sequencing data, similar to the HiSeq. Read lengths are 75 or 150 bp. The percentage of bases at or above Q30 (an error probability of 1 in 1,000) is ≥ 80% for 75 bp reads and ≥ 75% for 150 bp reads.

Sequencing libraries for HiSeq 3000 and NextSeq 500

  • Because of the patterned flow cell, HiSeq 3000 sequencing is sensitive to library quality: primer dimers must be < 0.5%, and the library insert size must be < 550 bp (total library length < 670 bp). Minor "tails" of longer fragments are still acceptable.
  • Library fragments can be sequenced from one end (single reads) or from both ends (paired-end reads). Single reads are typically used for re-sequencing, gene expression profiling, ChIP sequencing, MBDCap DNA sequencing, and small RNA sequencing. Paired-end reads are most commonly used for de novo assembly, identification of splice variants and structural variation, and other applications.
  • A good pooling strategy can significantly improve sequencing efficiency. Samples individually barcoded during library preparation can be multiplexed in one lane. The number of samples that can be pooled per lane depends on the number of reads per sample needed for downstream bioinformatics analysis. For human, mouse, and rat samples, the general guideline is to pool 6 to 8 RNA-seq or ChIP-seq samples, or 8 to 24 small RNA samples, in one HiSeq 3000 lane.
  • On the HiSeq, Illumina uses a green laser to sequence G/T and a red laser to sequence A/C. At each cycle, at least one of the two nucleotides for each color channel must be read to ensure proper image registration. It is therefore important to maintain color balance at each base of the index read; otherwise index read sequencing could fail due to registration failure. Follow the low-plex pooling guidelines (attached) for the TruSeq® Sample Prep kit you are using.
  • The Illumina NextSeq uses two-channel SBS chemistry: C is detected in the red channel, T in the green channel, and A in both, while G is unlabeled and appears dark. Avoid having only G bases in the first few cycles, since a G-only cycle produces no signal.
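The color-balance rules in the last two bullets can be checked before pooling. The sketch below takes a set of equal-length index sequences and reports, per cycle, where the four-channel HiSeq rule (at least one A or C, and at least one G or T) fails, and where a two-channel NextSeq cycle would be completely dark because every base is G. It checks only these rules as stated above; it is not an Illumina tool, and the function names and index sequences are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Set, Tuple

# Four-channel chemistry (HiSeq): red laser reads A/C, green laser reads G/T.
FOUR_CHANNEL: Dict[str, Set[str]] = {"red": {"A", "C"}, "green": {"G", "T"}}

# Two-channel chemistry (NextSeq): C = red, T = green, A = both, G = dark.
TWO_CHANNEL: Dict[str, Set[str]] = {
    "A": {"red", "green"}, "C": {"red"}, "T": {"green"}, "G": set(),
}


def _cycles(indexes: Sequence[str]) -> Iterator[Tuple[int, Tuple[str, ...]]]:
    """Return (1-based cycle, bases at that cycle across all indexes) pairs."""
    if not indexes or len({len(s) for s in indexes}) != 1:
        raise ValueError("need one or more index sequences of equal length")
    upper = [s.upper() for s in indexes]
    return enumerate(zip(*upper), start=1)


def four_channel_failures(indexes: Sequence[str]) -> List[int]:
    """Cycles where the red (A/C) or green (G/T) channel would see no base."""
    failures = []
    for cycle, bases in _cycles(indexes):
        present = set(bases)
        if any(not (present & channel) for channel in FOUR_CHANNEL.values()):
            failures.append(cycle)
    return failures


def two_channel_dark_cycles(indexes: Sequence[str]) -> List[int]:
    """Cycles where every base is G, so a two-channel run sees no signal."""
    dark = []
    for cycle, bases in _cycles(indexes):
        signal = set().union(*(TWO_CHANNEL.get(b, set()) for b in bases))
        if not signal:
            dark.append(cycle)
    return dark


if __name__ == "__main__":
    print(four_channel_failures(["ACGT"]))            # [1, 2, 3, 4]
    print(four_channel_failures(["ACGT", "GTAC"]))    # []
    print(two_channel_dark_cycles(["GGTA", "GGCA"]))  # [1, 2]
```

A single index such as ACGT fails the four-channel check at every cycle, which is why low-plex pools need index combinations chosen so that each cycle covers both color channels.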