Bioinformatics

Custom bioinformatics

Sequencer output - FASTQ

DNA reads aligned to reference

RNA-seq reads aligned to reference with identification of splice junctions

Genotypes and variants

 

Custom bioinformatics

The genomics core is well equipped to carry out custom bioinformatics tasks and projects that go beyond the standard bioinformatics pipelines detailed below. We welcome requests to use our expertise, programming skills, and computing infrastructure.

Sequencer output - FASTQ 

Where?

Exome, Full genome, RNA-seq

What?

The raw sequence data are provided per sample in FASTQ format. FASTQ files combine each read's nucleotide sequence with per-base quality scores and are accompanied by a FastQC quality report.
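
To illustrate the FASTQ layout, the following minimal Python sketch reads four-line records and converts each quality character into a Phred score, which FastQC then summarizes across reads. It assumes the Phred+33 encoding used by CASAVA 1.8 and later (older versions used an offset of 64) and single-line sequence and quality fields. The class and function names are illustrative and are not part of the core's tooling.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    sequence: str
    quality: str

    def phred_scores(self, offset: int = 33) -> List[int]:
        # Each quality character encodes Q = ord(char) - offset (Phred+33 assumed)
        return [ord(c) - offset for c in self.quality]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


def error_probability(q: int) -> float:
    # Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)
    return 10 ** (-q / 10)


example = "@read1\nACGT\n+\nII5#\n"
for rec in read_fastq(io.StringIO(example)):
    print(rec.name, rec.sequence, rec.phred_scores())  # read1 ACGT [40, 40, 20, 2]
```

A score of 20 corresponds to an estimated 1% chance that the base call is wrong.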

How generated?

FASTQ files are generated with Illumina's CASAVA demultiplexing software. Base quality reports are produced with the FastQC tool.
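
This page does not describe CASAVA's internal logic. The Python sketch below only illustrates the general idea behind demultiplexing: each read is assigned to a sample by comparing its index (barcode) read against a sample sheet. It assumes a single index per read and a tolerance of one mismatch. The barcodes, sample names, and helper functions are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    # Number of mismatching positions between two equal-length strings
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_sheet: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode uniquely matches, else None (undetermined)."""
    hits = [sample for sample, barcode in sample_sheet.items()
            if len(barcode) == len(index_read)
            and hamming(barcode, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet: sample name -> index sequence
sheet = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}
print(assign_sample("ACGTAC", sheet))  # exact match -> sampleA
print(assign_sample("ACGTAA", sheet))  # one mismatch -> sampleA
print(assign_sample("GGGGGG", sheet))  # no match -> None
```

If an index read falls within the mismatch tolerance of two barcodes, it is left undetermined rather than assigned to either sample.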

Why interesting?

This raw material is the starting point for all bioinformatics analyses. Given appropriate bioinformatics expertise and computing infrastructure, stored FASTQ files make it possible to reproduce all downstream results and to reanalyze the data with updated or alternative pipelines.

DNA reads aligned to reference

Where?

Exome, Full genome

What?

The reads in each FASTQ file are aligned to the reference genome, and the alignments are described in SAM (Sequence Alignment/Map) format. For efficiency, results are delivered in BAM (Binary Alignment/Map) format, a compressed binary version of SAM. An alignment quality report is also provided.
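
SAM is a tab-delimited text format with eleven mandatory fields per alignment. BAM stores the same records in compressed binary form, so the sketch below applies only to text SAM, such as the output of `samtools view`. It parses a few of the mandatory fields and decodes the bitwise FLAG field according to the SAM specification. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Bit meanings of the SAM FLAG field
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int   # 1-based leftmost mapping position (0 if unmapped)
    mapq: int
    cigar: str
    seq: str

    def flag_names(self) -> List[str]:
        return [name for bit, name in FLAG_BITS.items() if self.flag & bit]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (header lines starting with '@' are not handled)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9])


line = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII"
rec = parse_sam_line(line)
print(rec.rname, rec.pos, rec.flag_names())
```

A quality report such as Picard's aggregates this kind of per-read information, for example the fraction of reads that are mapped or properly paired.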

How generated?

Two independent pipelines are used to align DNA reads:

For both pipelines, BAM files are supplemented with an alignment quality report generated with Picard.

Why interesting?

Aligning raw sequencing reads to a reference can be demanding in terms of computational resources. BAM files store the results of those computations in an indexed, highly compressed format that supports interactive exploration of the genome in desktop applications such as IGV. BAM files also serve as the starting point for genotyping, variant discovery, and multisample analyses.

RNA-seq reads aligned to reference with identification of splice junctions

Where?

RNA-seq

What?

This is similar to the alignment of DNA reads, but it takes into account that a single read may span several exons. In addition to the BAM file with aligned reads, the results include a list of identified exon-exon splice junctions.
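
In SAM/BAM output, a read that spans an exon-exon junction is encoded with the CIGAR operation `N`, which marks a region skipped on the reference (an intron). The minimal Python sketch below derives 1-based intron coordinates from a single alignment's position and CIGAR string. It does not aggregate junctions across reads or filter them by read support, as a real junction list would. The function name is hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def splice_junctions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (intron_start, intron_end) pairs for each 'N' op."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported or missing CIGAR: {cigar!r}")
    junctions = []
    ref = pos  # current 1-based reference coordinate
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


print(splice_junctions(1000, "50M200N50M"))  # [(1050, 1249)]
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, so they do not shift the reported junctions.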

How generated?

Two independent pipelines are also used to align RNA-seq reads:

Why interesting?

In addition to the benefits of aligned DNA reads, the results of RNA-seq alignment include all information required for transcriptome reconstruction and (differential) expression analysis.

Genotypes and variants

Where?

Exome, Full genome, RNA-seq

What?

Deviations from the reference genome are listed and functionally annotated. These "abnormalities" include single-nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). Depending on the pipeline, they are stored in a tabular text format or in the VCF (Variant Call Format) introduced by the 1000 Genomes Project. Functional annotation for single or combined samples is provided in Excel-compatible files. When multiple samples are combined, reference calls are added where needed to confirm the absence of a variant observed in other samples.
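
As an illustration of the VCF layout, the Python sketch below splits one data line into its leading columns (CHROM, POS, ID, REF, ALT) and labels each alternate allele as an SNP, insertion, or deletion from the allele lengths. This is a rough heuristic that assumes normalized alleles. It skips header lines and does not handle symbolic structural-variant alleles. The type and function names are hypothetical.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    kind: str


def classify(ref: str, alt: str) -> str:
    """Rough variant type from allele lengths (assumes normalized alleles)."""
    if alt.startswith("<") or alt == "*":
        return "symbolic"  # structural or overlapping-deletion alleles, not handled here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # e.g. multi-nucleotide substitution


def parse_vcf_line(line: str) -> List[Variant]:
    """Split one VCF data line into one Variant per ALT allele."""
    if line.startswith("#"):
        return []  # meta-information or column header line
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [Variant(chrom, int(pos), ref, alt, classify(ref, alt))
            for alt in alts.split(",")]


line = "chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=30"
for v in parse_vcf_line(line):
    print(v.chrom, v.pos, v.ref, v.alt, v.kind)
```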

How generated?

Variants are called with two independent pipelines:

The variants output by both pipelines are functionally annotated with ANNOVAR. In-house tools are used to combine multiple samples.
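
The in-house combination tools are not described here. The Python sketch below only illustrates the idea of adding reference calls when samples are combined. For every site that is variant in at least one sample, each other sample is marked "reference" if that position was adequately covered in it, and "no_call" otherwise. The per-sample sets of covered positions are a hypothetical input, for example positions above some depth threshold, and all names are illustrative.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def combine_samples(
    variants: Dict[str, Set[Site]],
    covered: Dict[str, Set[Tuple[str, int]]],  # hypothetical: well-covered positions per sample
) -> Dict[Site, Dict[str, str]]:
    """Build a site-by-sample table of 'variant', 'reference' or 'no_call'."""
    all_sites = set().union(*variants.values())  # union of variant sites across samples
    table: Dict[Site, Dict[str, str]] = {}
    for site in sorted(all_sites):
        chrom, pos = site[0], site[1]
        row = {}
        for sample in variants:
            if site in variants[sample]:
                row[sample] = "variant"
            elif (chrom, pos) in covered.get(sample, set()):
                row[sample] = "reference"  # covered, so absence of the variant is confirmed
            else:
                row[sample] = "no_call"    # not enough data to confirm absence
        table[site] = row
    return table


variants = {
    "s1": {("chr1", 100, "A", "G")},
    "s2": {("chr1", 200, "C", "T")},
}
covered = {"s1": {("chr1", 100), ("chr1", 200)}, "s2": {("chr1", 200)}}
for site, calls in combine_samples(variants, covered).items():
    print(site, calls)
```

Separating "reference" from "no_call" keeps "confirmed absent" distinct from "not observed", which matters when interpreting combined samples.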

Why interesting?

Annotated genotypes and variants, for individual or combined samples, are a main end point of the sequence analysis pipeline. They can be interpreted and further processed by experts in the biological domain who have no background in bioinformatics.

 

Contact: genomicscore@uzleuven.be +32 16 33 08 21