Bioinformatics

GCCRI’s Computational Biology and Bioinformatics (CBBI) team supports the Genome Sequencing Facility (GSF). Our bioinformatics services are highly customizable, so we will work with you to analyze your NGS data and produce the reports you need.

Our NGS data analysis support covers two major areas:

  1. NGS data quality assurance and initial genome alignment
  2. Customized NGS Analysis

NGS data quality assurance and initial genome alignment:

We have developed tools to monitor sequence quality and accuracy. Every sequencing run performed by the GSF undergoes a quality control evaluation, documented in a report that reviews read output and overall quality metrics, including the Q30 score (the percentage of bases with a Phred quality score of 30 or higher), the percentage of undetermined reads, FastQC results, the duplicate rate, and the mappable rate. These checks help the GSF maintain high sequence quality, which simplifies downstream analyses.
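To make some of the quality metrics above concrete, here is a minimal Python sketch of how the Q30 score, a duplicate rate, and the mappable rate can be computed. It assumes Phred+33 quality encoding and well-formed four-line FASTQ records. The function names are illustrative only, and the duplicate calculation is a simplified version based on identical read sequences rather than the alignment-position-based marking that dedicated tools usually apply. It is not the GSF's actual QC pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield seq, qual


def percent_q30(records: List[Tuple[str, str]]) -> float:
    """Percentage of base calls with Phred quality >= 30."""
    scores = [ord(ch) - PHRED_OFFSET for _, qual in records for ch in qual]
    if not scores:
        return 0.0
    return 100.0 * sum(q >= 30 for q in scores) / len(scores)


def percent_duplicates(records: List[Tuple[str, str]]) -> float:
    """Percentage of reads whose exact sequence also occurs in another read.

    Simplification: alignment-based tools usually mark duplicates by
    mapping position rather than by identical read sequence.
    """
    if not records:
        return 0.0
    unique = len({seq for seq, _ in records})
    return 100.0 * (len(records) - unique) / len(records)


def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Mappable rate: share of reads that aligned to the reference genome."""
    return 100.0 * mapped_reads / total_reads if total_reads else 0.0


if __name__ == "__main__":
    # Three hypothetical reads: 'I' = Q40, '#' = Q2, '5' = Q20 under Phred+33.
    example = ["@r1", "ACGT", "+", "IIII",
               "@r2", "ACGT", "+", "##II",
               "@r3", "GGCC", "+", "5555"]
    records = list(parse_fastq(example))
    print(percent_q30(records))  # 50.0
```

In the usage example, 6 of the 12 base calls are at Q30 or above, and the second read repeats the first read's sequence.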

Customized NGS Analysis:

CBBI can analyze nearly all types of sequencing data generated on the Illumina HiSeq 2000 platform, including ChIP-Seq, mRNA-Seq, small RNA-Seq, MBDCap-Seq, and exome-cap-Seq data. Examples of our bioinformatics capabilities for common NGS applications are listed below:

RNA-Seq:

CBBI’s RNA-Seq services include read counts for all known mRNAs, differential expression analysis, heatmaps, and other standard RNA-Seq processing. We can also provide intron-exon junction sites, non-coding RNA counts, SNPs within transcripts, and other analyses.

The following files will be provided with your whole-transcriptome results:

  1. Alignment report (total mappable reads, etc.)
  2. Alignment results (.BAM file; .SAM file optional)
  3. Counts file containing the number of reads matching annotated genes
  4. Differential expression report (optional)
  5. Functional analysis (optional)
  6. Non-coding counts report (optional)
  7. SNP report (optional)
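As a rough illustration of how a counts file feeds into differential expression, the sketch below normalizes hypothetical per-gene read counts to counts per million (CPM) and computes a log2 fold change between two samples. The gene names, the pseudocount of 1, and the single-sample comparison are assumptions made for illustration. A real differential expression report relies on biological replicates and a statistical model, such as those provided by DESeq2 or edgeR, rather than a raw fold change.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each gene's read count by the sample's total library size."""
    library_size = sum(counts.values())
    return {gene: n * 1e6 / library_size for gene, n in counts.items()}


def log2_fold_change(
    control: Dict[str, int], treated: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """log2(treated / control) on CPM values; the pseudocount avoids log2(0).

    Assumes both samples were counted against the same gene annotation,
    so every gene key in `control` is also present in `treated`.
    """
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in cpm_c
    }


if __name__ == "__main__":
    # Hypothetical counts for two made-up genes in one control and one treated sample.
    control = {"GENE_A": 500, "GENE_B": 500}
    treated = {"GENE_A": 250, "GENE_B": 750}
    for gene, lfc in log2_fold_change(control, treated).items():
        print(f"{gene}\t{lfc:.2f}")
```

Normalizing to CPM first keeps a sample with a larger library from appearing to have higher expression across the board.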

ChIP-Seq:

For ChIP-Seq data, in addition to aligning reads to the reference genome with the Burrows-Wheeler Aligner (BWA), CBBI will further analyze your data with tools such as Model-based Analysis of ChIP-Seq (MACS) to identify binding sites within the genome. You can load the results into the UCSC Genome Browser or IGV to view these regions in their genomic context. We will also help you use motif discovery software such as the MEME Suite of motif-based sequence analysis tools to find common binding motifs.

Example files provided with your ChIP-Seq results include:

  1. Alignment file (.SAM or .BAM file)
  2. Peaks file (in .BED format)
  3. Peak annotation file
  4. Binding peak characteristics (percentage of peaks in promoter, intronic, and intergenic regions)
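The binding peak characteristics in the last item amount to labeling each peak by the genomic feature it falls in and tallying percentages. The sketch below does this for peaks read from the first three columns of a BED file (0-based, half-open coordinates), classifying each peak by its midpoint. The `Gene` structure, the 1 kb upstream promoter window, the extra exonic category, and the priority order (promoter, then exonic, then intronic) are all hypothetical choices for illustration, not CBBI's annotation rules; production annotation typically draws on complete gene models from a reference annotation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

PROMOTER_UPSTREAM = 1000  # hypothetical promoter window (bp upstream of the TSS)


@dataclass
class Gene:
    """Hypothetical minimal gene model in 0-based, half-open coordinates."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)


def parse_bed_peaks(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Read (chrom, start, end) from the first three BED columns."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def classify(chrom: str, pos: int, genes: List[Gene]) -> str:
    """Label one position as promoter, exonic, intronic, or intergenic."""
    label = "intergenic"
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.strand == "+":
            tss = g.start
            in_promoter = tss - PROMOTER_UPSTREAM <= pos < tss
        else:
            tss = g.end - 1  # TSS of a minus-strand gene is its last base
            in_promoter = tss < pos <= tss + PROMOTER_UPSTREAM
        if in_promoter:
            return "promoter"  # promoter outranks every other label
        if g.start <= pos < g.end:
            if any(s <= pos < e for s, e in g.exons):
                label = "exonic"
            elif label != "exonic":
                label = "intronic"
    return label


def peak_distribution(peaks: List[Tuple[str, int, int]], genes: List[Gene]) -> Dict[str, float]:
    """Percentage of peaks in each category, using each peak's midpoint."""
    if not peaks:
        return {}
    counts = Counter(classify(c, (s + e) // 2, genes) for c, s, e in peaks)
    return {label: 100.0 * n / len(peaks) for label, n in counts.items()}


if __name__ == "__main__":
    gene = Gene("chr1", 5000, 10000, "+", [(5000, 6000), (9000, 10000)])
    bed = ["chr1\t4400\t4600\tpeak1", "chr1\t7000\t7200\tpeak2"]
    print(peak_distribution(parse_bed_peaks(bed), [gene]))
```

Using the midpoint keeps each peak in exactly one category; an alternative is to count base-pair overlap with each feature, which can split a single peak across categories.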