Course Description

This is an entry-level course aimed at those with a reasonable biological background but no significant experience with bioinformatics. The course is broadly based around a series of exercises in which a combination of simple analytical tools and publicly available databases is applied to the investigation of a single human gene. The training manual for the course comprises detailed instructions for the tasks undertaken. Also included are questions (with answers) and discussion of the interpretation of the results achieved.

Participants are asked to imagine an interest in the disease aniridia. Course exercises then provide extremely detailed instruction leading participants to discover the gene primarily associated with this disease and all that is interesting about that gene and its protein products.

This course will also provide a gentle introduction to Next Generation Sequencing (NGS) data analysis. This part of the course aims to provide the basic skills needed to process NGS data, such as evaluating data quality, trimming sequences, changing data formats and visualising data. Participants will then learn how to address a simple transcriptomics problem, step by step, using open source bioinformatics tools.
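As a rough illustration of the basic processing steps mentioned above (evaluating quality, trimming, changing formats), here is a minimal Python sketch. It is not taken from the course materials: the function names are hypothetical, the example read is invented for illustration, and the code assumes four-line FASTQ records with Phred+33 quality encoding.

```python
from io import StringIO
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (a blank line also stops the sketch)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual: str) -> List[int]:
    """Convert a quality string into a list of integer Phred scores."""
    return [ord(char) - PHRED_OFFSET for char in qual]


def mean_quality(qual: str) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def fastq_to_fasta(name: str, seq: str) -> str:
    """Format one read as a FASTA record (quality values are discarded)."""
    return f">{name}\n{seq}\n"


# Invented example read: six high-quality bases followed by two poor ones.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIII##\n"

if __name__ == "__main__":
    for name, seq, qual in read_fastq(StringIO(EXAMPLE)):
        print(name, "mean quality:", mean_quality(qual))
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
        print(fastq_to_fasta(name, trimmed_seq), end="")
```

In practice these steps are normally carried out with dedicated, well-tested tools rather than hand-written scripts; the sketch is only meant to show what the operations do to a single read.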

Target Audience

This course is intended for researchers who wish to investigate how they might begin to exploit the ever-expanding abundance of computing and data resources, and who are seeking help in using them.

Course Documentation

Note - All the datasets used for this training course are available via the following button. You need to unzip this file and follow the instructions throughout the documentation.

Download ELB18S Datasets (File Size: 185.9 MB)

Day 1

1 - Short introduction to what Bioinformatics is

     Discussion about the best approach for the definition of Bioinformatics

2 - Genome Databases and Tools

     Investigating the gene(s) associated with the disease Aniridia      

Day 2

3 - Graphical and Textual Pairwise Alignments

     Global vs. Sensitive Local Pairwise Sequence Comparison

4 - Database Searching Methods (primarily BLAST)

     Database searching to determine gene structure
     Iterative database searching to discover and align sequence families (PSI-BLAST & COBALT)

Day 3

5 - Primer Design

     Primer design for the gene(s) responsible for Aniridia

6 - Protein Structure

     Simple Protein Sequence Analysis
     Secondary Structure Prediction
     Protein Domain/Motifs Databases

7 - Multiple Sequence Alignment

     Use of various software tools and the differences between them

Day 4

8 - Broadly describe the High Throughput Sequencing Workflow

     A short introduction to the common steps in most high throughput sequencing workflows

9 - Interpret and Manipulate raw sequencing data

10 - Align HTS data against a genome

Day 5

11 - Visualize alignments

12 - Broadly describe different HTS applications

  • 12.1 - Variant detection in resequencing experiments

           Understand the process of finding variants from alignments
           Use freebayes to infer variants
           Use the VEP online tool to infer the impact of variants
  • 12.2 - De novo genome assembly and annotation

           How to obtain a complete genome from the assembly of millions of short reads
           Understand the different factors that can affect the genome assembly process
  • 12.3 - Transcriptomics using RNA-Seq

           Run hisat2 to align RNA-Seq reads against a reference genome
           Generate gene counts from alignments and a reference annotation using htseq-count
           Use DESeq2 to calculate differential gene expression from the counts generated
  • 12.4 - 16S Metagenomics

  • 12.5 - Epigenetics

Learning Objectives and Course Pre-requisites


The source for this course webpage is on github.

Creative Commons License
ELB18S by GTPB is licensed under a Creative Commons Attribution 4.0 International License.