Sequence File Formats

Sequence file formats for a variety of data analysis options

Choose your preferred format for downstream analysis of sequencing data

File Formats for Illumina Sequencing

Numerous options are available for converting data to compatible sequence file formats such as FASTQ files, and for downstream analysis of next-generation sequencing (NGS) data. Illumina sequencing systems are designed so data can be easily streamed into cloud-based Illumina informatics platforms for data management, analysis, and collaboration.

Raw data files are provided in sequence file formats that are compatible with, or easily converted to, standardized data formats for streamlined aggregation and mining of large cohorts.

FASTQ Sequence File Formats

FASTQ File Format

FASTQ is a text-based sequencing data file format that stores both raw sequence data and quality scores. FASTQ files have become the standard format for storing NGS data from Illumina sequencing systems, and can be used as input for a wide variety of secondary data analysis solutions.
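As an illustration of how a FASTQ entry pairs sequence data with quality scores, the following is a minimal Python sketch, not an Illumina tool. It assumes the common four-line record layout (header, sequence, separator, quality) and Phred+33 quality encoding; the names `FastqRecord` and `read_fastq` are hypothetical and chosen only for this example.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one ASCII quality character per base

    def phred_scores(self, offset: int = 33) -> List[int]:
        """Decode quality characters to integer scores (Phred+33 assumed)."""
        return [ord(ch) - offset for ch in self.quality]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a text handle, assuming four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# A single made-up record, used only to exercise the parser.
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
records = list(read_fastq(example))
first_scores = records[0].phred_scores()
```

Here `I` decodes to a score of 40 and `#` to a score of 2, so the last base of the made-up read is the least reliable one.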

FASTQ files can contain millions of entries and range from megabytes to gigabytes in size, which often makes them too large to open in a typical text editor. Viewing FASTQ files is generally unnecessary, because they are intermediate output files used as input for tools that perform downstream data analysis.
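When a quick look at a large file is needed, it can be streamed line by line rather than opened in an editor. The sketch below is one possible approach, assuming four-line records and a file that is either plain text or gzip-compressed (fastq.gz); the function name `summarize_fastq` is hypothetical.

```python
import gzip


def summarize_fastq(path: str) -> dict:
    """Count reads and bases while holding only one line in memory."""
    # Assumption: a ".gz" suffix means gzip compression; anything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record, the second line (index 1) is the sequence.
            if line_number % 4 == 1:
                reads += 1
                bases += len(line.strip())
    return {"reads": reads, "bases": bases}
```

Calling `summarize_fastq("sample.fastq.gz")` on a file of your own (the file name here is a placeholder) returns a small dictionary with the read and base counts, regardless of how large the file is.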

FASTQ ORA File Format

FASTQ Original Read Archive (ORA) files are losslessly compressed files that make it easier to store, manage, and share large NGS data files. This file format reduces file size, transfer time, and data storage costs. FASTQ ORA files are up to 5x smaller than FASTQ files in traditional fastq.gz format, without compromising data integrity. FASTQ ORA files can be generated with Illumina DRAGEN secondary analysis software.

All fastq.ora files can be read using the free DRAGEN ORA Decompression Software provided by Illumina. Once it is installed, a simple command can pipe the decompressed output into popular mapping tools such as BWA [1], STAR [2], and Bowtie [3].
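The piping pattern itself can be sketched generically in Python. The example below does not call the DRAGEN ORA Decompression Software or any aligner: the two commands are stand-ins run through the Python interpreter, and `pipe_commands` is a hypothetical helper. The actual decompression and mapping command lines should be taken from the respective tool documentation.

```python
import subprocess
import sys
from typing import Sequence


def pipe_commands(producer_cmd: Sequence[str], consumer_cmd: Sequence[str]) -> bytes:
    """Stream the producer's stdout into the consumer's stdin without a temp file."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy so the producer sees a broken pipe if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    if producer.returncode != 0:
        raise RuntimeError(f"Producer failed with exit code {producer.returncode}")
    if consumer.returncode != 0:
        raise RuntimeError(f"Consumer failed with exit code {consumer.returncode}")
    return output


# Stand-in for a decompressor: prints one made-up FASTQ record to stdout.
fake_decompressor = [
    sys.executable, "-c",
    "print('@read1'); print('GATTACA'); print('+'); print('IIIIIII')",
]
# Stand-in for a mapping tool: counts the lines it receives on stdin.
fake_aligner = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]

line_count = pipe_commands(fake_decompressor, fake_aligner).strip()
```

Because the data flows through a pipe, the decompressed FASTQ never has to be written to disk, which is the point of streaming decompression output directly into a mapper.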

BCL Sequence File Format

Binary base call (BCL) files contain raw data generated by Illumina sequencing systems. The BCL sequence file format requires conversion to FASTQ format for use with user-developed or third-party data analysis tools.

DRAGEN secondary analysis offers rapid BCL conversion to FASTQ files as part of its suite of pipelines. Illumina also offers BCL Convert software to convert BCL files to FASTQ files. BCL Convert is a standalone software solution that demultiplexes data and converts BCL files to standard FASTQ file formats for downstream analysis.

Other Sequence File Formats

FASTQ files are the typical starting format for sequencing data analysis. However, BaseSpace Sequence Hub can create other file formats that are common to secondary and tertiary analysis programs.

During secondary or tertiary analysis of NGS data, Illumina software platforms and apps often convert FASTQ files to other file formats (for example, .bam and .vcf) as part of the analysis workflow.
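To give a sense of what such downstream formats look like, the sketch below parses one data line of a VCF file, which is tab-delimited text with eight fixed leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is a simplified illustration, not a full VCF parser: header lines, sample columns, and type conversion of INFO values are ignored, and the names `Variant` and `parse_vcf_line` are hypothetical. BAM files are binary and are normally read through a dedicated library rather than parsed by hand.

```python
from typing import Dict, List, NamedTuple, Union


class Variant(NamedTuple):
    chrom: str
    pos: int                           # 1-based position
    ref: str                           # reference allele
    alts: List[str]                    # alternate alleles
    info: Dict[str, Union[str, bool]]  # INFO entries; flags map to True


def parse_vcf_line(line: str) -> Variant:
    """Parse the fixed columns of a single VCF data line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("Expected at least 8 tab-separated columns")
    chrom, pos, _id, ref, alt, _qual, _filter, info_text = fields[:8]
    info: Dict[str, Union[str, bool]] = {}
    if info_text != ".":  # "." marks a missing value
        for item in info_text.split(";"):
            key, has_value, value = item.partition("=")
            info[key] = value if has_value else True
    return Variant(chrom, int(pos), ref, alt.split(","), info)


# A made-up data line, used only to exercise the parser.
variant = parse_vcf_line("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=100;DB")
```

In this made-up line, the reference base `A` at position 12345 has two alternate alleles, `G` and `T`, and the INFO column carries one key-value pair and one flag.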

Additional Resources

Developer Portal

Access user guides, release notes, and additional technical information.

NGS Training

Get hands-on NGS training from expert instructors. We also offer live or self-paced online courses and other educational resources.

Illumina DRAGEN secondary analysis pipelines

DRAGEN secondary analysis pipelines support various NGS experiment types, including genome, exome, transcriptome, and methylome studies.

Genomic Data Storage and Security

Store, process, and share large genomic and NGS datasets in the cloud with built-in speed, security, and scalability.

References
  1. Li H. and Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009 Jul 15; 25(14): 1754–1760.
  2. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan; 29(1): 15–21.
  3. Langmead B. et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009; 10: R25.