ELB18F Entry Level Bioinformatics (First Course in 2018)
Course Description
This is an entry-level course aimed at those with a reasonable biological background but no significant experience with bioinformatics. The course is broadly based around a series of exercises in which a combination of simple analytical tools and publicly available databases is applied to the investigation of a single human gene. The training manual for the course comprises detailed instructions for the tasks undertaken, together with questions (with answers) and discussion of the interpretation of the results achieved.
Participants are asked to imagine an interest in the disease aniridia. Course exercises then provide extremely detailed instruction leading participants to discover the gene primarily associated with this disease and all that is interesting about that gene and its protein products.
This course also provides a gentle introduction to Next Generation Sequencing (NGS) data analysis. This part of the course aims to provide the basic skills needed to process NGS data, such as evaluating data quality, trimming sequences, changing data formats and visualising data. Participants will then learn how to address a simple transcriptomics problem, step by step, using open source bioinformatics tools.
Target Audience
This course is intended for those wishing to investigate how they might begin to exploit the ever-expanding abundance of computing and data resources available to researchers, and for those seeking help in using them.
Course Documentation
Note - All the datasets used for this training course are available via the button below. You need to unzip the downloaded file and follow the instructions throughout the documentation.
Download ELB18S Datasets File Size: 185,9MB
Day 1
1 - Short introduction to what Bioinformatics is
Discussion about the best approach for the definition of Bioinformatics
2 - Genome Databases and Tools
Investigating the gene(s) associated with the disease Aniridia
Day 2
3 - Graphical and Textual Pairwise Alignments
Global vs. Sensitive Local Pairwise Sequence Comparison
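The difference between global and local comparison can be sketched with a small dynamic programming example. This is a minimal illustration, not the method used by any particular tool in the course: the function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for readability. With `local=False` it fills the table in the Needleman-Wunsch style (every base of both sequences must be aligned), and with `local=True` in the Smith-Waterman style (scores are floored at zero and the best cell anywhere is reported).

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score of sequences a and b.

    local=False: global (Needleman-Wunsch style) score.
    local=True:  local (Smith-Waterman style) score.
    Scoring values are illustrative assumptions, not course settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps in the first row/column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align the two bases
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    # Two made-up sequences that share only the short region "CGT".
    s1, s2 = "AAAACGT", "CGTGGGG"
    print("global:", alignment_score(s1, s2))
    print("local: ", alignment_score(s1, s2, local=True))
```

The local score picks out the shared region, while the global score is dragged down by the unrelated flanks, which is why local methods are the more sensitive choice for finding short regions of similarity.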
4 - Database Searching Methods (primarily BLAST)
Database searching to determine gene structure
Iterative database searching to discover and align sequence families (PSI-BLAST & COBALT)
Day 3
5 - Primer Design
Primer design for the gene(s) responsible for Aniridia
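Two quantities that are commonly checked when judging a candidate primer are its GC content and an estimate of its melting temperature. The sketch below is a rough illustration only: it uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a crude approximation for short oligonucleotides, whereas dedicated primer design software typically relies on more accurate thermodynamic models. The function names and the example sequence are made up for illustration and are not taken from the course material.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (needed for the reverse primer)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, not a real primer
    print("GC content:", gc_content(primer))        # 0.5
    print("Wallace Tm:", wallace_tm(primer))        # 60
    print("Reverse complement:", reverse_complement(primer))
```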
6 - Simple Protein Sequence Analysis
Secondary Structure Prediction
Protein Domain/Motifs Databases
7 - Multiple Sequence Alignment
Use of various software tools and the differences between them
Day 4
8 - Broadly describe the High Throughput Sequencing Workflow
A small introduction about the common steps in most high throughput sequencing workflows
9: Interpret and Manipulate raw sequencing data
-
9.1 - The FastQ file format
What information is in fastq files, and how is it organized
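As a minimal sketch of how a FastQ file is organized: each read occupies four lines, namely an identifier line starting with `@`, the base sequence, a separator line starting with `+`, and a quality string with one character per base. The example below assumes unwrapped four-line records and the Phred+33 quality encoding; the function names `read_fastq` and `phred_scores` and the example read are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FastQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FastQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")  # made-up record
    for read_id, seq, qual in read_fastq(example):
        print(read_id, seq, phred_scores(qual))  # read1 ACGT [40, 40, 20, 0]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the base call is wrong.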
-
9.2 - Use FastQC to analyse the quality of data in a fastq file
Detect low-quality bases in the QC reports
Detect sequence bias and the possible presence of adaptors and other contaminants
-
9.3 - Use Trimmomatic to improve the quality of data in a fastq file
Use Trimmomatic to filter/trim low-quality bases from your reads
Remove adaptors (e.g. Illumina adaptors) from your reads
Check results using FastQC on filtered data
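The idea behind quality trimming can be sketched as a sliding window scan: move a fixed-size window along the read from the 5' end and cut the read where the average quality in the window first drops below a threshold. The code below is a simplified illustration of that idea and is not Trimmomatic's actual implementation; the function name, the window size of 4 and the threshold of 15 are assumptions for the example.

```python
def sliding_window_trim(seq, quals, window=4, min_mean=15):
    """Cut a read at the first window whose mean quality is below min_mean.

    seq   : base sequence (string)
    quals : list of Phred scores, one per base
    Returns the trimmed (seq, quals).
    """
    if not seq:
        return seq, quals
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


if __name__ == "__main__":
    # Made-up read: good quality at the start, poor quality at the 3' end.
    seq = "ACGTACGTAC"
    quals = [40, 40, 40, 40, 40, 40, 2, 2, 2, 2]
    print(sliding_window_trim(seq, quals))  # ('ACGTA', [40, 40, 40, 40, 40])
```

After trimming, very short reads are usually discarded as well, since they are hard to align uniquely.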
10: Align HTS data against a genome
-
10.1 - Use the BWA aligner to align HTS data against a genome
Know the challenges of aligning millions of short reads to a genome
The Burrows-Wheeler transform alignment method
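The principle behind Burrows-Wheeler based aligners can be sketched in a few lines. The example below builds the transform naively (by sorting all rotations) and then counts exact occurrences of a pattern with backward search. It is a toy illustration under strong simplifications: real aligners such as BWA use compact, precomputed index structures and allow mismatches and gaps, none of which is shown here. The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end marker."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(pattern, text):
    """Count exact occurrences of pattern in text by backward search on the BWT."""
    last = bwt(text)
    # first_row[ch]: index of the first sorted rotation starting with ch
    first_row = {}
    for index, ch in enumerate(sorted(last)):
        first_row.setdefault(ch, index)
    lo, hi = 0, len(last)  # current range of matching rotations
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        # Narrow the range using the number of ch seen before lo and hi.
        lo = first_row[ch] + last.count(ch, 0, lo)
        hi = first_row[ch] + last.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    print(bwt("banana"))                            # annb$aa
    print(count_occurrences("ana", "banana"))       # 2
    print(count_occurrences("ACG", "ACGTACGTTT"))   # 2
```

The key point is that the search cost depends on the length of the read rather than on the length of the genome, which is what makes aligning millions of short reads feasible once the index has been built.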
-
10.2 - The SAM/BAM alignment format
What is the SAM/BAM format, its contents and the differences between them
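SAM is a tab-separated text format, while BAM is its compressed binary equivalent holding the same information. As a minimal sketch of what one SAM alignment line contains, the example below splits a line into its eleven mandatory columns and decodes the bitwise FLAG field. The column and flag names used as dictionary keys are labels chosen for this illustration, and the example line is made up.

```python
SAM_COLUMNS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
               "rnext", "pnext", "tlen", "seq", "qual"]

# Meaning of each bit in the FLAG column.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped",
    0x8: "mate_unmapped", 0x10: "reverse_strand", 0x20: "mate_reverse_strand",
    0x40: "first_in_pair", 0x80: "second_in_pair", 0x100: "secondary",
    0x200: "qc_fail", 0x400: "duplicate", 0x800: "supplementary",
}


def parse_sam_line(line):
    """Parse one SAM alignment line (not a header line) into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = fields[11:]  # optional TAG:TYPE:VALUE fields
    record["flag_names"] = {name for bit, name in FLAG_BITS.items()
                            if record["flag"] & bit}
    return record


if __name__ == "__main__":
    line = "read1\t99\tchr11\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tNM:i:0"
    rec = parse_sam_line(line)
    print(rec["rname"], rec["pos"], rec["cigar"], sorted(rec["flag_names"]))
```

In practice BAM files are read through a library or converted with a dedicated tool rather than parsed by hand; the sketch is only meant to show what the columns hold.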
Day 5
11: Visualize alignments
-
11.1 - Use Qualimap to assess the quality of alignments
Interpret general alignment statistics such as the percentage of aligned reads
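To make the "percentage of aligned reads" statistic concrete, the sketch below computes it from the FLAG column of SAM lines. It assumes one primary record per read (secondary and supplementary records are skipped so a read is not counted twice) and treats a read as aligned when the unmapped bit (0x4) is not set; for paired-end data each mate counts as one read. The exact definition used by a given QC tool may differ, and the function name and example lines are made up.

```python
def percent_aligned(sam_lines):
    """Percentage of primary SAM records whose unmapped flag (0x4) is not set."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


if __name__ == "__main__":
    lines = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",    # mapped, forward
        "r2\t16\tchr1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, reverse
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",          # unmapped
        "r1\t256\tchr2\t99\t0\t4M\t*\t0\t0\tACGT\tIIII",   # secondary, ignored
    ]
    print(round(percent_aligned(lines), 1), "% aligned")
```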
12: Broadly describe different HTS applications
-
12.1 - Variant detection in resequencing experiments
Understand the process of finding variants from alignments
Use freebayes to infer variants
Use the VEP online tool to infer the impact of variants
-
12.2 - De novo genome assembly and annotation
How to obtain a complete genome from the assembly of millions of short reads
Understand the different factors that can affect the genome assembly process
-
12.3 - Transcriptomics using RNA-Seq
Run hisat2 to align RNA-Seq reads against a reference genome
Generate gene counts from alignments and a reference annotation using htseq-count
Use DESeq2 to calculate differential gene expression from the counts generated
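Before gene counts from different samples can be compared, they need to be corrected for differences in sequencing depth. The sketch below illustrates the median-of-ratios idea that DESeq2 uses for this: each sample gets a size factor equal to the median, over genes, of its count divided by that gene's geometric mean across samples. This is a plain Python illustration only; DESeq2 itself is an R package, and its statistical model for testing differential expression is not reproduced here. The function names, gene names and counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return {gene: [c / f for c, f in zip(row, factors)]
            for gene, row in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
    raw = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10]}
    print(size_factors(raw))
    print(normalize(raw))
```

In this made-up example the normalized counts of the two samples become equal, reflecting that the only difference between them was sequencing depth.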
-
12.4 - 16S Metagenomics
-
12.5 - Epigenetics
Learning Objectives and Course Pre-requisites
Instructors
The source for this course webpage is on GitHub.
ELB18F by GTPB is licensed under a Creative Commons Attribution 4.0 International License.