# Blind hackathon
There are plenty of online tools available for biological interpretation. Most of them test a submitted gene or protein list for functional enrichment, i.e. for over-representation of gene groups that share a particular property (e.g. membership in a common biological pathway).

_Why are there so many tools?_

This is a very good question! 

Possible reasons:

- No consensus on the "best" algorithm for the enrichment
- Additional features like using knowledge about protein-protein interaction networks
- Continuously changing knowledgebases
- Competition for the most user-friendly and visually appealing framework

_Example of an enriched KEGG pathway. Proteins/genes from the submitted list are labeled green._
![](resources/images/hsa05144.png)

Here, you will look into several online enrichment tools and compare the results. Check out this [tutorial](https://hbctraining.github.io/Training-modules/DGE-functional-analysis/lessons/02_functional_analysis.html) for a detailed overview of the methods. 

In order to spice this exercise up, you get only a data table without further information about the experiment. 

Hence, this blind hackathon will test the capability of the tools to answer simple questions like "Which cell type / tissue?" and "What the hell happened to these cells?"

**What will you learn?**

You will get a better feeling for how much you can trust the information obtained from gene set enrichment

You will get an overview of available tools

You will learn to use them


### List of tools

This list is by no means complete but should contain the major players in the omics field. For a more complete overview, take a look at https://bio.tools/t?operationID=%22operation_2436%22

- DAVID https://david.ncifcrf.gov/ 
- g:Profiler https://biit.cs.ut.ee/gprofiler 
- StringDB http://string-db.org/ 
- EnrichNet http://www.enrichnet.org
- GeneTrail https://genetrail2.bioinf.uni-sb.de/
- clusterProfiler https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
- Cytoscape app ClueGO http://apps.cytoscape.org/apps/cluego
- GOrilla http://cbl-gorilla.cs.technion.ac.il/ 
- Reactome https://reactome.org 
- Panther http://www.pantherdb.org/ 
- IMPaLA http://impala.molgen.mpg.de/
- Use your own way

Further alternatives: 
WebGestalt, L2L, GAGE, GOseq, SeqGSEA

_Interface of g:Profiler_
![](resources/images/g_Profiler.png)

### <span style="background-color:#ddffdd;border-left:6px;">Tasks</span>

You will use and test tools for biological interpretation. Please make notes to answer the following questions. You can either write them in this notebook or use a separate text document. 

#### Selection and testing
📓 Select 5 tools from the list and use their included examples to understand how they work and what information they provide  
👨‍💻 Open the given data table and find out which species you are dealing with  
📓 Before you start, write down one sentence for each tool: How do you expect the tool to perform in comparison to the others?

#### General knowledge
❔ What does functional enrichment mean?  
❔ What type of functional enrichment can you carry out?  
❔ Which information is tested for enrichment?
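
As a starting point for these questions, the following minimal sketch shows one common flavour of enrichment, an over-representation test. All numbers are invented for illustration, and the individual tools may use different statistics, backgrounds, and multiple-testing corrections.

```r
# Toy over-representation test (all counts are invented for illustration)
N <- 20000  # genes in the background ("universe")
K <- 100    # background genes annotated to the pathway of interest
n <- 200    # genes in the submitted list
k <- 8      # submitted genes that are annotated to the pathway

# By chance we would expect n * K / N = 1 pathway gene in the list.
# Hypergeometric upper tail: probability of observing k or more hits.
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same question asked as a one-sided Fisher's exact test
# rows: in pathway / not in pathway, columns: in list / not in list
tab <- matrix(c(k, n - k, K - k, N - K - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

In practice this test is repeated for every pathway or annotation term, so the resulting p-values need a multiple-testing correction, e.g. with `p.adjust(p_values, method = "BH")`.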

#### Data preparation
üë®‚Äçüíª Load the data tables


[Proteomics data set I](resources/data/UniprotAccsComma.txt)   
Platelet granule proteome taken from the study https://www.ncbi.nlm.nih.gov/pubmed/24549006. Most likely outcome (KEGG pathway): platelet activation

[Proteomics data set II](resources/data/UniprotAccs.csv)   
Malaria study of blood from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5137300/ 


[Gene set data set](resources/data/gene_set.txt)

<span style="background-color:#ffdddd;border-left:6px;">Note: We highly recommend using the R cell below</span>

  ❔ What information does the data table contain?  
  ❔ Which information would you use for subsequent data interpretation?   
  ❔ What are the criteria to select genes/proteins for further processing?   
📓 Give 3 statements on the criteria
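
If a table provides quantitative columns, selection criteria are often expressed as thresholds. The sketch below assumes a hypothetical table with columns `gene`, `log2FC`, and `pvalue`; these names, the values, and the cut-offs are placeholders and not taken from the hackathon files.

```r
# Hypothetical differential-expression table (values invented for illustration)
tab <- data.frame(
  gene   = c("G1", "G2", "G3", "G4", "G5"),
  log2FC = c(2.1, -1.8, 0.3, 1.2, -0.2),
  pvalue = c(0.0004, 0.003, 0.6, 0.04, 0.9),
  stringsAsFactors = FALSE
)

# Correct the p-values for multiple testing (Benjamini-Hochberg)
tab$padj <- p.adjust(tab$pvalue, method = "BH")

# Example criteria: adjusted p-value below 0.05 and at least a two-fold change
selected <- subset(tab, padj < 0.05 & abs(log2FC) >= 1)

# All measured genes form the background for the enrichment test
background <- tab$gene

print(selected)
```

Keeping the full list of measured genes as background matters because many tools otherwise compare against the whole genome, which can inflate the enrichment of tissue-specific terms.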

#### Run tools, compare and benchmark
Apply the tools and answer the following questions:  
 ❔ How easy was it?  
 ❔ Are the results comparable?  
 ❔ Which tools would you prefer and why?  
 ❔ Which annotations do you mainly use for interpretation?  
📓 Collect the three annotations that you consider to best describe the data.   
 ❔ Which cell line / tissue?  
 ❔ Any idea on disease or treatment?
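
One simple way to quantify how well the tools agree is the overlap of their top-ranked terms. The sketch below uses the Jaccard index on invented placeholder terms; `tool_A`, `tool_B`, `tool_C` and the term names are hypothetical and need to be replaced by the terms you exported from each tool.

```r
# Placeholder top terms per tool (invented; replace with your own exports)
results <- list(
  tool_A = c("term_1", "term_2", "term_3", "term_4"),
  tool_B = c("term_2", "term_3", "term_5"),
  tool_C = c("term_1", "term_2", "term_3")
)

# Jaccard index: shared terms divided by all distinct terms of both tools
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix between all tools
tools <- names(results)
sim <- sapply(tools, function(i) {
  sapply(tools, function(j) jaccard(results[[i]], results[[j]]))
})

# Terms reported by every tool
shared <- Reduce(intersect, results)

print(round(sim, 2))
print(shared)
```

Term names often differ between knowledgebases, so comparing stable identifiers (e.g. GO or KEGG IDs) is usually more reliable than comparing free-text descriptions.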

#### Disclosure and discussion
💬 Wrap up your impressions and present them to the group



In [3]:
# Read the files in here using e.g. read.csv()
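
A minimal sketch of how the files could be read. The layout of the files is an assumption (a flat list of identifiers separated by commas, semicolons, or whitespace), and `read_accessions()` is a hypothetical helper, so inspect the files first and adapt as needed. The accessions in the demonstration are only there to make the example runnable.

```r
# Hypothetical helper: read a flat list of identifiers separated by
# commas, semicolons, or whitespace (the file layout is an assumption)
read_accessions <- function(path) {
  lines <- readLines(path, warn = FALSE)
  ids <- unlist(strsplit(lines, "[,;[:space:]]+"))
  ids <- trimws(ids)
  unique(ids[nzchar(ids)])  # drop empty strings and duplicates
}

# Self-contained demonstration on a temporary file
demo_file <- tempfile(fileext = ".txt")
writeLines(c("P02768,P01009, P02787", "P02768"), demo_file)
demo_ids <- read_accessions(demo_file)
print(demo_ids)

# Applied to the hackathon files (uncomment once you know their layout):
# accs <- read_accessions("resources/data/UniprotAccsComma.txt")
# tab  <- read.csv("resources/data/UniprotAccs.csv")
# str(tab)   # column names and types
# head(tab)  # first rows
```

Most web tools accept such a vector pasted as one identifier per line, which `writeLines(demo_ids)` prints directly.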