3DAROC21

Course Description

3C-based methods, such as Hi-C, produce a huge amount of raw data in the form of pairs of DNA reads from regions that are in close spatial proximity in the cell nucleus. The interaction matrices built from these read pairs have been used to study how the genome folds within the nucleus, which is one of the most fascinating problems in modern biology. Rigorous analysis of the paired reads with computational tools has been essential to fully exploit the experimental technique. The wealth of data on genome structure is currently expanding, with many Hi-C datasets available at resolutions down to 1 kb (see for example: http://hic.umassmed.edu/welcome/welcome.php ; http://promoter.bx.psu.edu/hi-c/view.php or http://www.aidenlab.org/data.html).

In this course, participants will learn to use TADbit, a software package designed and developed to manage all dimensionalities of Hi-C data:

1D - Map paired-end sequences to generate Hi-C interaction matrices
2D - Normalize matrices and identify constitutive domains (TADs, compartments)
3D - Generate populations of structures which satisfy the Hi-C interaction matrices
4D - Compare samples at different time points
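
To make the 1D and 2D steps above more concrete, here is a minimal sketch in plain NumPy. It does not use TADbit and is not the method taught in the course: the function names `bin_contacts` and `coverage_normalize`, the toy read positions and the 50 kb bin size are all assumptions made for illustration, and the normalization is a simple coverage-based stand-in for the more careful procedures used in practice.

```python
import numpy as np


def bin_contacts(pairs, resolution, n_bins):
    """Count mapped read pairs per pair of genomic bins (hypothetical helper).

    `pairs` holds (position_1, position_2) tuples on a single chromosome.
    The returned matrix is symmetric.
    """
    matrix = np.zeros((n_bins, n_bins), dtype=float)
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1
    return matrix


def coverage_normalize(matrix):
    """Divide each cell by the product of its row and column coverage.

    The result is rescaled by the total number of contacts. Bins with zero
    coverage are left at zero. This is an illustrative simplification.
    """
    coverage = matrix.sum(axis=1)
    total = matrix.sum()
    expected = np.outer(coverage, coverage)
    normalized = np.zeros_like(matrix)
    valid = expected > 0
    normalized[valid] = matrix[valid] * total / expected[valid]
    return normalized


# Toy data (assumed): five read pairs on a 150 kb region, binned at 50 kb.
pairs = [(1200, 1800), (1500, 52000), (49000, 51000),
         (101000, 149000), (3000, 120000)]
raw = bin_contacts(pairs, resolution=50_000, n_bins=3)
normalized = coverage_normalize(raw)
print(raw)
print(normalized)
```

In a real analysis the read pairs would come from mapping and filtering, and low-coverage bins would be removed before normalization.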

Participants can bring specific biological questions and/or their own 3C-based data to analyze during the course. At the end of the course, participants will be familiar with the TADbit software and will be able to fully analyze Hi-C data. Although the TADbit software is central in this course, alternative software will be discussed for each part of the analysis.

Target Audience

The course is aimed at experimental researchers and bioinformaticians at the graduate and post-graduate levels. The last edition of this course was attended by people with different backgrounds and a shared interest in genome organization.

Participants in this course are likely either to be planning to generate Hi-C data for chromosome structure determination or to want to correctly interpret and analyse publicly available data.

Detailed Program

All the datasets used in this training course are available throughout the documentation.

Days	Lectures (pdf)	Core pipeline (notebooks)
Day 1	Welcome and introduction Overview on structure determination Overview of 3D genomes Live Introduction to Linux and working environment Introduction to TADbit and data handling for 3C-based experiments From raw data to Hi-C contact matrices	01-Inspect_Hi-C_input_data.ipynb 02-Generate_mapper_index.ipynb 03-Hi-C_Quality_Check.ipynb 04-Mapping.ipynb 05-Parsing.ipynb
Day 2	Chromatin structure and 3C data Integrative modeling applied to chromatin with TADbit Hi-C contact matrices: filtering Biological applications I: TAD response to hormone treatment Biological applications II: SOX2 gene activation dynamics	06-Intersecting_and_Filtering_mapped_reads.ipynb 07-Bin_filtering_and_normalization.ipynb 08a-Compartments_detection.ipynb
Day 3	Comparison between experiments Biological applications III: IMGR, combining image and Hi-C data for modeling	08b-TADs_detection.ipynb 09-Compare_and_merge_Hi-C_experiments.ipynb 09b_CHESS_Compare_chromatin_contact_data.ipynb 10-Modeling_parameters_optimization.ipynb 11-3D_Models_production_and_analysis.ipynb 12-3D_Modeling_molecular_dynamics_TADdyn.ipynb Higlass.ipynb