{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download the data from NCBI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use data from [(Stadhouders R, Vidal E, Serra F, Di Stefano B et al. 2018)](#cite-ralph), which comes from mouse cells where Hi-C experiment where conducted in different states during highly-efficient somatic cell reprogramming.\n", "\n", "The data can be downloaded from:\n", "\n", "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53463\n", "\n", "Once downloaded the files can be passed to the FASTQ format in order for TADbit to read them. This can be done with the fastq-dump program from SRA Toolkit (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! fastq-dump -A SRR5344921 -DQ '+' --defline-seq '@$ac.$si' -X 10000000 --split-files --outdir FASTQs/\n", "! mv FASTQs/SRR5344921_1.fastq FASTQs/mouse_B_rep1_1.fastq\n", "! mv FASTQs/SRR5344921_2.fastq FASTQs/mouse_B_rep1_2.fastq\n", "! fastq-dump -A SRR5344925 -DQ '+' --defline-seq '@$ac.$si' -X 10000000 --split-files --outdir FASTQs/\n", "! mv FASTQs/SRR5344925_1.fastq FASTQs/mouse_B_rep2_1.fastq\n", "! mv FASTQs/SRR5344925_2.fastq FASTQs/mouse_B_rep2_2.fastq\n", "! fastq-dump -A SRR5344969 -DQ '+' --defline-seq '@$ac.$si' -X 10000000 --split-files --outdir FASTQs\n", "! mv FASTQs/SRR5344969_1.fastq FASTQs/mouse_PSC_rep1_1.fastq\n", "! mv FASTQs/SRR5344969_2.fastq FASTQs/mouse_PSC_rep1_2.fastq\n", "! fastq-dump -A SRR5344973 -DQ '+' --defline-seq '@$ac.$si' -X 10000000 --split-files --outdir FASTQs/\n", "! mv FASTQs/SRR5344973_1.fastq FASTQs/mouse_PSC_rep2_1.fastq\n", "! mv FASTQs/SRR5344973_2.fastq FASTQs/mouse_PSC_rep2_2.fastq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For TADbit tools we will analyse yeast Hi-C from [(Lazar-Stefanita L et al.,2017)](#cite-lazar)\n", "\n", "The data can be downloaded from :\n", "\n", "https://www.ncbi.nlm.nih.gov/bioproject/PRJNA356300" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! fastq-dump -A SRR5077820 -DQ '+' --defline-seq '@$ac.$si' --split-files --outdir FASTQs/\n", "! mv FASTQs/SRR5077820_1.fastq FASTQs/yeast_rep1_1.fastq\n", "! mv FASTQs/SRR5077820_2.fastq FASTQs/yeast_rep1_2.fastq\n", "! fastq-dump -A SRR5077821 -DQ '+' --defline-seq '@$ac.$si' --split-files --outdir FASTQs/\n", "! mv FASTQs/SRR5077821_1.fastq FASTQs/yeast_rep2_1.fastq\n", "! mv FASTQs/SRR5077821_2.fastq FASTQs/yeast_rep2_2.fastq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the VRE we have human chromosome 21 from one of the samples of [rao](#cite-raopaper) in Cell.\n", "\n", "The data can be downloaded from:\n", "\n", "https://www.ncbi.nlm.nih.gov/sra/?term=SRR1658573" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note: alternatively you can also directly download the FASTQ from http://www.ebi.ac.uk/_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally we use DSRC [(Roguski and Deorowicz, 2014)](#cite-roguski2014dsrc) that allows better compression ration and, more importatntly, faster decompression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "! dsrc c -t8 FASTQs/mouse_B_rep1_1.fastq FASTQs/mouse_B_rep1_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_B_rep1_2.fastq FASTQs/mouse_B_rep1_2.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_B_rep2_1.fastq FASTQs/mouse_B_rep2_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_B_rep2_2.fastq FASTQs/mouse_B_rep2_2.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_PSC_rep1_1.fastq FASTQs/mouse_PSC_rep1_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_PSC_rep1_2.fastq FASTQs/mouse_PSC_rep1_2.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_PSC_rep2_1.fastq FASTQs/mouse_PSC_rep2_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/mouse_PSC_rep2_2.fastq FASTQs/mouse_PSC_rep2_2.fastq.dsrc\n", "! dsrc c -t8 FASTQs/yeast_rep1_1.fastq FASTQs/yeast_rep1_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/yeast_rep1_2.fastq FASTQs/yeast_rep1_2.fastq.dsrc\n", "! dsrc c -t8 FASTQs/yeast_rep2_1.fastq FASTQs/yeast_rep2_1.fastq.dsrc\n", "! dsrc c -t8 FASTQs/yeast_rep2_2.fastq FASTQs/yeast_rep2_2.fastq.dsrc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References\n", "\n", "[^](#ref-1) Stadhouders R, Vidal E, Serra F, Di Stefano B et al. 2018. _Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming_.\n", "\n", "[^](#ref-2) Lazar-Stefanita L et al., _Cohesins and condensins orchestrate the 4D dynamics of yeast chromosomes during the cell cycle_.\n", "\n", "[^](#ref-3) Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL, Suhas S.P. Rao, Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, Ido Machol, Arina D. Omer, Eric S. Lander, Erez Lieberman Aiden. _A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping_.\n", "\n", "[^](#ref-4) Roguski, \\Lukasz and Deorowicz, Sebastian. 2014. _DSRC 2—Industry-oriented compression of FASTQ files_.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }