ELB18F Entry Level Bioinformatics (First Course in 2018)
This is an entry-level course aimed at those with a reasonable biological background but no significant experience with bioinformatics. The course is broadly based around a series of exercises in which a combination of simple analytical tools and publicly available databases is applied to the investigation of a single human gene. The training manual for the course comprises detailed instructions for the tasks undertaken. Also included are questions (with answers) and discussion of the interpretation of the results achieved.
Participants are asked to imagine an interest in the disease aniridia. Course exercises then provide extremely detailed instruction leading participants to discover the gene primarily associated with this disease and all that is interesting about that gene and its protein products.
This course will also provide a gentle introduction to Next Generation Sequencing (NGS) data analysis. This part of the course aims to provide the basic skills needed to process NGS data, such as evaluating data quality, trimming sequences, changing data formats and visualising data. Participants will then learn how to address a simple transcriptomics problem, step by step, using open source bioinformatics tools.
This course is intended for those who wish to investigate how they might begin to exploit the ever-expanding abundance of computing and data resources available to researchers, and who are seeking help in using them.
Note - All the datasets used for this training course are available via the following button. You need to unzip this file and follow the instructions throughout the documentation.
Download ELB18S Datasets File Size: 185,9MB
Discussion about the best approach for the definition of Bioinformatics
Investigating the gene(s) associated with the disease Aniridia
Global vs. Sensitive Local Pairwise Sequence Comparison
Database searching to determine gene structure
Iterative database searching to discover and align sequence families (PSI-BLAST & COBALT)
5 - Primer Design
Primer design for the gene(s) responsible for Aniridia
Simple Protein Sequence Analysis
Secondary Structure Prediction
Protein Domain/Motifs Databases
Use of various software tools and the differences between them
A brief introduction to the common steps in most high throughput sequencing workflows
9: Interpret and Manipulate raw sequencing data
9.1 - The FastQ file format
What information is in FastQ files, and how it is organized
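As a rough illustration of how a FastQ file is organized, the sketch below reads records of four lines each (identifier, sequence, separator, quality string) and converts the quality characters to Phred scores. It assumes the common Phred+33 encoding and unwrapped four-line records; the function name `read_fastq` and the example read are made up for illustration and are not part of the course material.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, scores) from a four-line-per-record FastQ stream.

    Assumes Phred+33 quality encoding and no line wrapping within a record.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line, starts with '+'
        quality = handle.readline().rstrip("\n")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


# Hypothetical single-read example, not taken from the course datasets.
example = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(example)))
for name, sequence, scores in records:
    print(name, sequence, scores)
```

Each base has exactly one quality character, so the sequence and the list of scores always have the same length.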
Detect low-quality bases in the QC reports
Detect sequence bias and the possible presence of adaptors and other contaminants
Use Trimmomatic to filter/trim low-quality bases from your reads
Remove adaptors (e.g. Illumina adaptors) from your reads
Check results using FastQC on filtered data
10: Align HTS data against a genome
Know the challenges of aligning millions of short reads to a genome
The Burrows-Wheeler transform aligner method
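To give a feel for the Burrows-Wheeler transform itself, here is a naive sketch that builds the transform by sorting all rotations of a text and then inverts it. This is a toy illustration only: real aligners do not materialise every rotation, and the function names and the `$` terminator convention used here are assumptions made for the example.

```python
def bwt(text, terminator="$"):
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    text = text + terminator  # terminator must sort before every other symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    # The row ending with the terminator is the original text.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

The transform tends to group identical characters together and is fully reversible, which is what makes it useful as the basis of a compact, searchable index of a genome.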
10.2 - The SAM/BAM alignment format
What the SAM and BAM formats are, what they contain, and the differences between them
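As a small sketch of what a SAM record contains, the code below splits one tab-separated alignment line into its eleven mandatory fields and decodes the bitwise FLAG value. BAM stores the same information in a compressed binary form, so it cannot be read as plain text like this. The example line is invented for illustration, and the helper names `parse_sam_line` and `decode_flag` are hypothetical.

```python
# The eleven mandatory SAM columns, in order.
FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
          "rnext", "pnext", "tlen", "seq", "qual"]

# Meaning of each FLAG bit; the short names are chosen for this example.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line):
    """Split one SAM alignment line into a dict of mandatory fields plus tags."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(FIELDS, columns[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = columns[11:]  # optional TAG:TYPE:VALUE fields
    return record


def decode_flag(flag):
    """List the names of the bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Invented alignment line, not taken from the course datasets.
line = "read1\t99\tchr11\t1000\t60\t4M\t=\t1200\t204\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(line)
print(record["rname"], record["pos"], decode_flag(record["flag"]))
```

Header lines in a SAM file start with `@` and would need to be skipped before passing lines to a parser like this one.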
11: Visualize alignments
Interpret general alignment statistics such as the percentage of aligned reads
12: Broadly describe different HTS applications
Understand the process of finding variants from alignments
Use freebayes to infer variants
Use the VEP online tool to infer the impact of variants
How to obtain a complete genome from the assembly of millions of short reads
Understand the different factors that can affect the genome assembly process
Run HISAT2 to align RNA-Seq reads against a reference genome
Generate gene counts from alignments and a reference annotation using htseq-count
Use DESeq2 to calculate differential gene expression from the counts generated
12.4 - 16S Metagenomics
12.5 - Epigenetics
The source for this course webpage is on GitHub.
ELB18F by GTPB is licensed under a Creative Commons Attribution 4.0 International License.