NGS Data Delivery and Storage Policy

NGS data is “Big Data”: every sequencing run generates hundreds of gigabytes. Depending on the services you request from the Genome Sequencing Facility (GSF), each service project will yield anywhere from several to hundreds of gigabytes of data. It is your responsibility to retrieve your data, keep it safe, and back it up.

The GSF delivers demultiplexed, gzip-compressed FASTQ files to users. These files contain only the reads from clusters that passed the Illumina quality filter. By default, we do not trim the sequencing data. FASTQ is a text-based file format that stores both the raw sequence data and the per-base quality scores produced from the sequencer's base calls. FASTQ has become the standard format for storing NGS data from Illumina sequencing systems and can be used as input for a wide variety of secondary analysis tools.
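If you want to inspect a delivered FASTQ file before running a full pipeline, a small reader is enough. The sketch below is a minimal, hypothetical example (the names read_fastq, FastqRecord and mean_phred are illustrative, not part of any GSF tool). It assumes the standard four-line FASTQ record (header, sequence, "+" separator, quality) and Phred+33 quality encoding, which current Illumina software uses. It reads gzip-compressed files directly without unpacking them first.

```python
import gzip
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ entry (four lines per read)."""
    header: str    # line 1, starts with '@'
    sequence: str  # line 2, the called bases
    quality: str   # line 4, one quality character per base


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Yield records from a plain or gzip-compressed (.gz) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            sequence = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"Malformed FASTQ record near {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"Sequence/quality length mismatch in {header!r}")
            yield FastqRecord(header, sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of one read, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)
```

For example, sum(1 for _ in read_fastq("sample_R1.fastq.gz")) counts the reads in a file (the file name is hypothetical). Because the reader yields one record at a time, it can stream through files of many gigabytes without loading them into memory.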

FASTQ files provided by the GSF are stored on the GSF server for one year. The GSF delivers sequencing data through SFTP or Dropbox accounts, where the files generally remain available for 3 months. We recommend that investigators download and archive their sequencing results as soon as they receive their data link.

The GSF delivers the following two files with your NGS data download:

  • MD5 checksum file – use this file to verify the integrity of your download
  • FASTQ files – generally packaged in a zip or tar archive so that the sequencing data are delivered as a single file
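To use the checksum file, recompute the MD5 digest of each downloaded file and compare it with the listed value. The sketch below assumes the checksum file follows the common md5sum layout of one "<32-character hex digest>  <filename>" entry per line, with filenames relative to the checksum file; the exact layout of the GSF file is an assumption, and the function names are hypothetical. On Linux, md5sum -c <checksum file> performs the same check from the command line.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file: Path) -> dict[str, bool]:
    """Compare each listed digest with the file on disk.

    Assumes md5sum-style lines: '<digest>  <filename>' (or '<digest> *<filename>').
    Returns {filename: True if the file exists and its digest matches}.
    """
    results: dict[str, bool] = {}
    base = checksum_file.parent
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.rstrip().lstrip("*")  # '*' marks binary mode in md5sum output
        target = base / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

Any entry reported as False means the file is missing or was corrupted in transfer and should be downloaded again before you archive it.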

For single-cell analysis projects using 10x Genomics, the GSF delivers the following two sets of files:

  • Cell Ranger count output:

We run cellranger count on all single-cell gene expression samples. Inside the top-level directory of your download, each sample has its own directory, named after the sample, that contains the results of the count step of the Cell Ranger pipeline. The web_summary.html file is usually the best place to start. The cloupe.cloupe file can be opened in Loupe Browser, the visualization software supplied by 10x Genomics. These files are located in the following path: [run_id]/Sample_[name]/outs

  • FASTQ files:

We use bcl2fastq2 to demultiplex all sequencing data. Each sample has four FASTQ directories associated with it, one for each of the four barcodes in the sample's 10x barcode set, named Sample_[number]_[letter]. The reads in all four directories belong to the same sample.
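When you process a 10x download yourself, it helps to gather each sample's four barcode directories together and to locate the Cell Ranger reports. The sketch below is a hypothetical helper, not a GSF or 10x tool. It assumes the directory names follow the Sample_[number]_[letter] and [run_id]/Sample_[name]/outs patterns described above, that FASTQ files end in .fastq.gz, and that a run directory contains both kinds of folders side by side.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming for barcode directories: Sample_[number]_[letter]
FASTQ_DIR_PATTERN = re.compile(r"^Sample_(?P<sample>.+)_(?P<letter>[A-Za-z])$")
EXPECTED_DIRS_PER_SAMPLE = 4  # one directory per barcode in a 10x barcode set


def group_barcode_dirs(names: list[str]) -> dict[str, list[str]]:
    """Group directory names by sample, e.g. Sample_1_A ... Sample_1_D -> '1'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        match = FASTQ_DIR_PATTERN.match(name)
        if match:
            groups[match.group("sample")].append(name)
    return {sample: sorted(dirs) for sample, dirs in groups.items()}


def incomplete_samples(groups: dict[str, list[str]]) -> list[str]:
    """Samples that do not have exactly four barcode directories."""
    return sorted(s for s, dirs in groups.items() if len(dirs) != EXPECTED_DIRS_PER_SAMPLE)


def fastq_files_for_sample(run_dir: Path, sample: str) -> list[Path]:
    """All *.fastq.gz files across one sample's barcode directories."""
    names = [p.name for p in run_dir.iterdir() if p.is_dir()]
    dirs = group_barcode_dirs(names).get(sample, [])
    return sorted(f for d in dirs for f in (run_dir / d).glob("*.fastq.gz"))


def web_summaries(run_dir: Path) -> list[Path]:
    """Cell Ranger count reports, assuming the Sample_[name]/outs layout."""
    return sorted(run_dir.glob("Sample_*/outs/web_summary.html"))
```

Grouping by the shared Sample_[number] prefix reflects the point that the four directories hold reads from one sample, so their files should be analyzed together; incomplete_samples simply flags any sample whose directory count differs from four so you can check the download.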

Please contact us if you would like help interpreting the results produced by Cell Ranger. We will do our best to answer your questions or direct you to the support resources provided by 10x Genomics.

For NGS projects that include bioinformatics analysis, please contact Dr. Yidong Chen, the bioinformatics director, at cheny8@uthscsa.edu regarding the format and delivery of the analyzed data.

Inquire About Our Services:

The Genome Sequencing Facility provides Illumina next-generation sequencing and 10x Genomics single-cell analysis services to researchers both inside and outside UT Health San Antonio, including other academic institutions and the biotechnology and pharmaceutical industries.

Zhao Lai, PhD
Director, Genome Sequencing Facility
Greehey CCRI
UT Health San Antonio
8403 Floyd Curl Dr.
San Antonio, TX, 78229, USA
Office: GCCRI 4.100.14
Phone: (210) 562-9246
laiz@uthscsa.edu