ELB19F

Course Description

This is an entry-level course aimed at those with a reasonable biological background but no significant experience with bioinformatics. The course is broadly based around a series of exercises in which a combination of simple analytical tools and publicly available databases is applied to the investigation of a single human gene. The training manual for the course comprises detailed instructions for the tasks undertaken, together with questions (with answers) and a discussion of how to interpret the results.

Participants are asked to imagine an interest in the disease aniridia. Course exercises then provide extremely detailed instruction leading participants to discover the gene primarily associated with this disease and all that is interesting about that gene and its protein products.

This course also provides a gentle introduction to Next Generation Sequencing (NGS) data analysis. This part of the course covers the basic skills needed to process NGS data, such as evaluating data quality, trimming sequences, converting between data formats, and visualising data. Participants then learn how to address a simple transcriptomics problem, step by step, using open-source bioinformatics tools.

Target Audience

This course is intended for researchers who want to learn how to begin exploiting the ever-expanding abundance of computing and data resources, and who are seeking help in using them.

Course Documentation

Note - All the datasets used for this training course are available via the following button. Unzip the file and follow the instructions throughout the documentation.

Download ELB19F Datasets (File Size: 185.9 MB)

Note - PowerPoint presentations for Days 1-3

Download ELB19F Presentations (File Size: 3.6 MB)

Day 1

1 - Short introduction to what Bioinformatics is

     Discussion about the best approach for the definition of Bioinformatics

2 - Genome Databases and Tools

     Investigating the gene(s) associated with the disease Aniridia

Day 2

3 - Graphical and Textual Pairwise Alignments

     Global vs. Sensitive Local Pairwise Sequence Comparison

Click here for Introductory Video of Pairwise Alignments
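To make the global vs. local distinction concrete, the following is a minimal sketch of the two classic dynamic-programming scores: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not the settings used in the course exercises.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best global (Needleman-Wunsch) or local (Smith-Waterman) score.

    Scoring values are illustrative assumptions with a simple linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalised along both borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment: a cell never drops below zero, so an
                # alignment can start anywhere in either sequence.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Local: best cell anywhere. Global: bottom-right cell (end-to-end).
    return best if local else score[-1][-1]
```

With these assumed scores, a short motif embedded in a longer sequence scores well locally but poorly globally, because the global score must pay for every unaligned flanking base.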

4 - Database Searching Methods (primarily BLAST)

     Database searching to determine gene structure
     Iterative database searching to discover and align sequence families (PSI-BLAST & COBALT)

Day 3

5 - Primer Design

     Primer design for the gene(s) responsible for Aniridia
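Two of the quantities usually checked when evaluating a candidate primer are its GC content and its approximate melting temperature. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides. The function names are hypothetical, and dedicated primer design tools use more elaborate thermodynamic models.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius using the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); a rule of thumb for short oligos only.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, as needed for a reverse primer."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))
```

The reverse primer is normally written as the reverse complement of the target's forward strand, which is why `reverse_complement` is included alongside the two checks.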

6 - Protein Structure

     Simple Protein Sequence Analysis
     Secondary Structure Prediction
     Protein Domain/Motifs Databases

7 - Multiple Sequence Alignment

     Use of various software tools and the differences between them

Optional: Some Background Materials for the High Throughput Sequencing (HTS) Section

FASTA & FASTQ Sequencing File Formats

Read Pair Sequencing - Overview

Read Pair Sequencing - Generation Techniques

Day 4

8 - Broadly describe the High Throughput Sequencing Workflow

     A small introduction about the common steps in most high throughput sequencing workflows

9: Interpret and Manipulate raw sequencing data

9.1 - The FastQ file format

       What information is in fastq files, and how is it organized
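As an illustration of how a FASTQ file is organized, the sketch below reads the common four-line record layout (header starting with `@`, sequence, separator starting with `+`, quality string) and decodes the quality characters into Phred scores. It assumes the Phred+33 encoding and unwrapped (single-line) sequences; the function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ for: " + header)
        yield header[1:], seq, qual


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# Small in-memory example with a made-up read.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, phred_scores(qual))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong.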

9.2 - Use FastQC to analyse the quality of data in a fastq file

       Detect low-quality bases in the QC reports  
       Detect sequence bias and the possible presence of adaptors and other contaminants
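To give a feel for what a per-base quality report summarises, here is a minimal sketch that computes the mean Phred score at each read position across a set of quality strings. It assumes Phred+33 encoding, and the function name is hypothetical. FastQC itself reports a fuller distribution per position rather than only the mean.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each position across all reads.

    Reads may have different lengths; each position is averaged over the
    reads that are long enough to cover it.
    """
    totals = []  # summed Phred scores per position
    counts = []  # number of reads covering each position
    for quality in quality_strings:
        for position, char in enumerate(quality):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]
```

A steady drop in these means towards the 3' end of the reads is the typical pattern that motivates trimming.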

9.3 - Use Trimmomatic to improve the quality of data in a fastq file

       Use Trimmomatic to filter/trim low-quality bases from your reads
       Remove adaptors (e.g. Illumina adaptors) from your reads
       Check results using FastQC on filtered data
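The idea behind sliding-window quality trimming can be sketched as follows: scan the read from the 5' end and cut it at the first window whose mean quality falls below a threshold. This is a simplified illustration, not Trimmomatic's exact implementation; the function name and the default window size and threshold are assumptions.

```python
def sliding_window_trim(seq, quals, window=4, min_mean=15):
    """Cut the read at the start of the first low-quality window.

    seq   -- the read sequence
    quals -- list of integer Phred scores, one per base
    Reads shorter than the window are returned unchanged in this sketch.
    """
    for start in range(len(seq) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_mean:
            # Keep only the bases before the failing window.
            return seq[:start], quals[:start]
    return seq, quals
```

The defaults mirror the spirit of a `SLIDINGWINDOW:4:15` step. After trimming, very short reads are usually discarded, and the result is checked again with FastQC.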

10: Align HTS data against a genome

10.1 - Use the BWA aligner to align HTS data against a genome

      Know the challenges of aligning millions of short reads to a genome
      The Burrows-Wheeler transform alignment method
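The Burrows-Wheeler transform itself can be shown in a few lines. The sketch below builds it naively by sorting all rotations of the text, which is fine for a toy string but far too slow and memory-hungry for a genome; real aligners build the transform through suffix-array based indexes. The function names and the `$` end marker are illustrative assumptions.

```python
def bwt(text, end="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if end in text:
        raise ValueError("text must not contain the end marker")
    text += end
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(char + row for char, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

The transform groups identical characters into runs and is reversible, which is what allows a compact index of the genome to be searched quickly for short reads.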

10.2 - The SAM/BAM alignment format

      What the SAM and BAM formats are, what they contain, and how they differ
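SAM is a tab-separated text format with eleven mandatory fields per alignment, and BAM is its compressed binary equivalent. As a minimal sketch of the contents, the code below splits one SAM alignment line into its mandatory fields and decodes the bitwise FLAG value. The function names and the short flag labels are hypothetical; in practice a library such as pysam or the samtools command line would be used.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# Bit values come from the SAM specification; the labels are shorthand.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "first_in_pair",
    0x80: "second_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def parse_sam_line(line):
    """Return the eleven mandatory SAM fields of one alignment line as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, fields[:11]))
    for name in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[name] = int(record[name])
    return record


def decode_flag(flag):
    """List the labels of all bits set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]
```

For example, a FLAG of 99 decodes to a paired read, in a proper pair, whose mate is on the reverse strand, and which is the first read of the pair.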

Day 5

11: Visualize alignments

11.1 - Use Qualimap to assess the quality of alignments

        Interpret general alignment statistics such as the percentage of aligned reads
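The percentage of aligned reads can be derived directly from the FLAG field of each alignment record. The following is a minimal sketch, assuming SAM-formatted text lines as input and a hypothetical function name; it counts each primary record once and treats a record as aligned when the "unmapped" bit (0x4) is not set.

```python
def percent_aligned(sam_lines):
    """Percentage of primary alignment records that are mapped.

    Header lines (starting with '@') and blank lines are skipped, and
    secondary (0x100) and supplementary (0x800) records are ignored so
    that each read is counted once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

A low percentage often points to contamination, a wrong reference genome, or poor read quality, which is why it is one of the first statistics to inspect.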

11.2 - Use IGV to visualize the content of a BAM file

12: Broadly describe different HTS applications

12.1 - Variant detection in resequencing experiments

       Understand the process of finding variants from alignments
       Use freebayes to infer variants
       Use the VEP online tool to infer the impact of variants

12.2 - De novo genome assembly and annotation

       How to obtain a complete genome from the assembly of millions of short reads
       Understand the different factors that can affect the genome assembly process

12.3 - Transcriptomics using RNA-Seq

       Run hisat2 to align RNA-Seq reads against a reference genome
       Generate gene counts from alignments and a reference annotation through htseq-count
       Use DESeq2 to calculate differential gene expression from the counts generated
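To illustrate what "differential expression from counts" means at its simplest, the sketch below normalises two tables of gene counts to counts per million (CPM) and computes a log2 fold change per gene. This is deliberately not DESeq2's method: DESeq2 uses its own normalisation and a statistical model with replicates to assess significance. The function names, the pseudocount, and the example counts are all illustrative assumptions.

```python
import math


def counts_per_million(counts):
    """Scale raw gene counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of treated vs. control CPM for every gene in control.

    The pseudocount (an assumption of this sketch) avoids division by
    zero and taking the log of zero for unexpressed genes.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated.get(gene, 0.0) + pseudocount)
                        / (cpm_control[gene] + pseudocount))
        for gene in cpm_control
    }


# Made-up counts for two hypothetical genes.
control = {"geneA": 100, "geneB": 900}
treated = {"geneA": 400, "geneB": 600}
print(log2_fold_change(control, treated))
```

A positive value means the gene is more abundant in the treated sample and a negative value means it is less abundant. A fold change alone says nothing about statistical significance.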

12.4 - 16S Metagenomics
12.5 - Epigenetics

Learning Objectives and Course Pre-requisites

Instructors

The source for this course webpage is on GitHub.

ELB19F by GTPB is licensed under a Creative Commons Attribution 4.0 International License.

ELB19F Entry Level Bioinformatics (First Course in 2019)