

NGS Data Delivery and Storage Policy


NGS data is “Big Data”. Every sequencing run generates hundreds of gigabytes of data. Depending on the services you request from the GSF, you will receive from several to hundreds of GB for each service project. It is your responsibility to retrieve your data, keep it safe, and back it up.

The GSF delivers demultiplexed, gzip-compressed FASTQ files to users. These files contain only the reads from clusters that pass the Illumina quality filter. By default, we do not trim the sequencing data. FASTQ is a text-based sequence file format, generated from the sequencer output, that stores raw sequence data and quality scores. FASTQ files have become the standard format for storing NGS data from Illumina sequencing systems and can be used as input for various secondary data analysis solutions.
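
As an illustration of the format, the sketch below reads a gzip-compressed FASTQ file record by record. It is a minimal example, not a GSF tool: it assumes the common four-line record layout (header, sequence, "+" separator, quality string) and Phred+33 quality encoding, and the function names read_fastq and phred_scores are hypothetical.

```python
import gzip
from typing import Iterator, List, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a gzip-compressed FASTQ file.

    Assumes four lines per record; wrapped (multi-line) records are not handled.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or len(sequence) != len(quality):
                raise ValueError(f"Malformed FASTQ record: {header!r}")
            yield header[1:], sequence, quality


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to numeric scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]
```

Counting the reads in a file is then sum(1 for _ in read_fastq(path)), which streams the file without loading it into memory.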

FASTQ sequencing files provided by the Genome Sequencing Facility will be stored on the GSF’s server for one year. The GSF uses an SFTP or Dropbox account for sequencing data delivery. The files will generally remain available in the user’s account for three months. We recommend that investigators download and archive their sequencing results as soon as they receive their data link.

The GSF delivers the following two files with your NGS data download:

  • md5 checksum – you can use this file to verify the integrity of your download
  • FASTQ files – generally packaged in zip or tar format so that the sequencing data is delivered as a single file
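
One way to use the md5 checksum file is sketched below. This is an illustrative example under stated assumptions: it assumes the checksum file follows the common md5sum layout (one "<hex digest>  <file name>" pair per line), which this policy does not specify, and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_file(path: str) -> Dict[str, str]:
    """Parse md5sum-style lines into {file name: expected digest}.

    Assumption: each line looks like '<hex digest>  <file name>'.
    """
    expected = {}
    with open(path, "r") as handle:
        for line in handle:
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                digest, name = parts
                # md5sum marks binary mode with a leading '*' on the name
                expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_download(checksum_file: str, data_dir: str) -> Dict[str, bool]:
    """Return {file name: True if the downloaded file matches its checksum}."""
    results = {}
    for name, digest in parse_checksum_file(checksum_file).items():
        results[name] = md5_of_file(os.path.join(data_dir, name)) == digest
    return results
```

Any file reported as False should be downloaded again before it is archived or analyzed.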

For a single-cell analysis project with 10X Genomics, the GSF delivers the following two sets of output:

  • Cellranger count output:
    • We run cellranger count on all single-cell gene expression samples. Inside the top directory of your download is a directory for each sample, named after the sample, that contains the results from the count step of the cellranger pipeline. The web_summary.html file is likely what you want to look at first. The cloupe file can be opened in the Loupe Browser supplied by 10x. You will be able to find these files in the following path: [run_id]/Sample_[name]/outs
  • FASTQ files:
    • We use bcl2fastq2 to demultiplex all sequencing data. You will notice that each sample has four FASTQ directories associated with it (one for each of the four barcodes in the 10x barcode set for that sample), named Sample_[number]_[letter].
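
To locate the per-sample reports after unpacking a download, a short script along these lines may help. It is a sketch that assumes the [run_id]/Sample_[name]/outs layout described above; the function name find_sample_reports is hypothetical, and the cloupe file is matched by its .cloupe extension rather than by a fixed name.

```python
from pathlib import Path
from typing import Dict, Optional


def find_sample_reports(run_dir: str) -> Dict[str, Dict[str, Optional[Path]]]:
    """Map each Sample_* directory to its web summary and cloupe file.

    Assumes the layout <run_dir>/Sample_<name>/outs; missing files map to None.
    """
    reports = {}
    for outs in sorted(Path(run_dir).glob("Sample_*/outs")):
        summary = outs / "web_summary.html"
        cloupe_files = sorted(outs.glob("*.cloupe"))
        reports[outs.parent.name] = {
            "web_summary": summary if summary.is_file() else None,
            "cloupe": cloupe_files[0] if cloupe_files else None,
        }
    return reports
```

A sample whose entries are None may have been unpacked incompletely and is worth checking against the md5 checksum.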

Please contact us if you would like assistance interpreting the results produced by cellranger. We
will do our best to answer any questions, or we can guide you toward assistance and resources
provided by 10x.

For NGS projects that include bioinformatics analysis, please contact the bioinformatics director,
Dr. Yidong Chen, at cheny8@uthscsa.edu regarding the format and delivery of analyzed data.

GSF_NGSData_DeliveryStoragePolicy_Aug2024.pdf

Inquire About Our Services:

The Genome Sequencing Facility provides Illumina next-generation sequencing and single-cell analysis services using 10X Genomics to researchers both inside and outside UT Health San Antonio, including other academic institutions and the biotechnology and pharmaceutical industries.

Zhao Lai, PhD
Director, Genome Sequencing Facility
Greehey CCRI
UT Health San Antonio
8403 Floyd Curl Dr.
San Antonio, TX, 78229, USA
Office: GCCRI 4.100.14
Phone: (210) 562-9246
laiz@uthscsa.edu