{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "


\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Iterative vs fragment-based mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterative mapping first proposed by [(Imakaev et al., 2012)](#cite-Imakaev2012a), allows to map usually a high number of reads. However other methodologies, less \"brute-force\" can be used to take into account the chimeric nature of Hi-C reads.\n", "\n", "A simple alternative is to allow split mapping, just as with RNA-seq data.\n", "\n", "Another way consists in _pre-truncating_ [(Ay and Noble, 2015)](#cite-Ay2015a) reads that contain a ligation site and map only the longest part of the read [(Wingett et al., 2015)](#cite-Wingett2015).\n", "\n", "Finally, an intermediate approach, _fragment-based_, consists in mapping full length reads first, and than splitting unmapped reads at the ligation sites [(Serra, Ba\\`{u, Filion and Marti-Renom, 2016)](#cite-Serra2016)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advantages of iterative mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- It's the only solution when no restriction enzyme has been used (i.e. micro-C)\n", "- Can be faster when few windows (2 or 3) are used" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advantages of fragment-based mapping " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Generally faster\n", "- Safer: mapped reads are generally larger than 25-30 nm (the largest window used in iterative mapping). Less reads are mapped, but the difference is usually canceled or reversed when looking for \"valid-pairs\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "_Note:_ We use __GEM__ [(Marco-Sola, Sammeth, Guig\\'{o and Ribeca, 2012)](#cite-Marco-Sola2012), performance are very similar to Bowtie2, perhaps a bit better. 
\n", "\n", "_For now, TADbit is only compatible with GEM._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mapping" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pytadbit.mapping.full_mapper import full_mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `full_mapping` function can be used to perform either iterative or fragment-based mapping, or a combination of both." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterative mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an example of iterative mapping:\n", "(Estimated time: 15 h with 8 cores)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "cell = 'mouse_B' # or mouse_PSC\n", "rep = 'rep1' # or rep2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "! mkdir -p results/iterativ/$cell\\_$rep\n", "! mkdir -p results/iterativ/$cell\\_$rep/01_mapping" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Preparing FASTQ file\n", " - conversion to MAP format\n", " - trimming reads 1-25\n", "Mapping reads in window 1-25...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i 
/scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A_full_1-25\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A_full_1-25.map\n", "Preparing MAP file\n", " - trimming reads 1-35\n", " x removing original input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_y_ia0A_filt_1-25.map\n", "Mapping reads in window 1-35...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX_full_1-35\n", "Parsing result...\n", " x removing GEM input 
/scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX_full_1-35.map\n", "Preparing MAP file\n", " - trimming reads 1-45\n", " x removing original input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_OHfdlX_filt_1-35.map\n", "Mapping reads in window 1-45...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl_full_1-45\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl_full_1-45.map\n", "Preparing MAP file\n", " - trimming reads 1-55\n", " x removing original input 
/scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_frvaAl_filt_1-45.map\n", "Mapping reads in window 1-55...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd_full_1-55\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd_full_1-55.map\n", "Preparing MAP file\n", " - trimming reads 1-65\n", " x removing original input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_bflwpd_filt_1-55.map\n", "Mapping reads in window 1-65...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I 
/scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY_full_1-65\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY_full_1-65.map\n", "Preparing MAP file\n", " - trimming reads 1-75\n", " x removing original input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_J15GCY_filt_1-65.map\n", "Mapping reads in window 1-75...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_FrNeia\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i 
/scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_FrNeia -o /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_FrNeia_full_1-75\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_FrNeia\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_FrNeia_full_1-75.map\n" ] }, { "data": { "text/plain": [ "['results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-25.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-35.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-45.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-55.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-65.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-75.map']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# for the first side of the reads\n", "full_mapping(gem_index_path='genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem',\n", " out_map_dir='results/iterativ/{0}_{1}/01_mapping/mapped_{0}_{1}_r1/'.format(cell, rep),\n", " fastq_path='FASTQs/%s_%s_1.fastq.dsrc' % (cell,rep),\n", " frag_map=False, clean=True, nthreads=8,\n", " windows=((1,25),(1,35),(1,45),(1,55),(1,65),(1,75)),\n", " temp_dir='results/iterativ/{0}_{1}/01_mapping/mapped_{0}_{1}_r1_tmp/'.format(cell, rep))" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "And for the second side of the read:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/participant/miniconda2/lib/python2.7/site-packages/pytadbit/mapping/full_mapper.py:452: UserWarning: WARNING: only 58 Gb left on tmp_dir: /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp\n", "\n", " warn('WARNING: only %d Gb left on tmp_dir: %s\\n' % (fspace, temp_dir))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Preparing FASTQ file\n", " - conversion to MAP format\n", " - trimming reads 1-25\n", "Mapping reads in window 1-25...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR\n", "/home/participant/miniconda2/bin/gem-mapper -I /home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR_full_1-25\n", "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR\n", " x removing map /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR_full_1-25.map\n", "Preparing MAP file\n", " - trimming 
reads 1-35\n", " x removing original input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_kLhPCR_filt_1-25.map\n", "Mapping reads in window 1-35...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo\n", "/home/participant/miniconda2/bin/gem-mapper -I /home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo_full_1-35\n", "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo\n", " x removing map /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo_full_1-35.map\n", "Preparing MAP file\n", " - trimming reads 1-45\n", " x removing original input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_BhyNKo_filt_1-35.map\n", "Mapping reads in window 1-45...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4\n", "/home/participant/miniconda2/bin/gem-mapper -I 
/home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4 -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4_full_1-45\n", "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4\n", " x removing map /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4_full_1-45.map\n", "Preparing MAP file\n", " - trimming reads 1-55\n", " x removing original input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_jLDBI4_filt_1-45.map\n", "Mapping reads in window 1-55...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR\n", "/home/participant/miniconda2/bin/gem-mapper -I /home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i 
/home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR_full_1-55\n", "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR\n", " x removing map /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR_full_1-55.map\n", "Preparing MAP file\n", " - trimming reads 1-65\n", " x removing original input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_CH6gDR_filt_1-55.map\n", "Mapping reads in window 1-65...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN\n", "/home/participant/miniconda2/bin/gem-mapper -I /home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN_full_1-65\n", "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN\n", " x removing map 
/home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN_full_1-65.map\n", "Preparing MAP file\n", " - trimming reads 1-75\n", " x removing original input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_N4nuzN_filt_1-65.map\n", "Mapping reads in window 1-75...\n", "TO GEM /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_4B0q5i\n", "/home/participant/miniconda2/bin/gem-mapper -I /home/participant/3DAROC/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_4B0q5i -o /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_4B0q5i_full_1-75\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Parsing result...\n", " x removing GEM input /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_4B0q5i\n", " x removing map /home/participant/3DAROC/Notebooks/results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_4B0q5i_full_1-75.map\n" ] }, { "data": { "text/plain": [ "['results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-25.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-35.map',\n", " 
'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-45.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-55.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-65.map',\n", " 'results/iterativ/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-75.map']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# for the second side of the reads\n", "full_mapping(gem_index_path='genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem',\n", " out_map_dir='results/iterativ/{0}_{1}/01_mapping/mapped_{0}_{1}_r2/'.format(cell, rep),\n", " fastq_path='FASTQs/%s_%s_2.fastq.dsrc' % (cell,rep),\n", " frag_map=False, clean=True, nthreads=8,\n", " windows=((1,25),(1,35),(1,45),(1,55),(1,65),(1,75)), \n", " temp_dir='results/iterativ/{0}_{1}/01_mapping/mapped_{0}_{1}_r2_tmp/'.format(cell, rep))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fragment-based mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With fragment-based mapping, the equivalent calls are:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! mkdir -p results/fragment/$cell\\_$rep\n", "! 
mkdir -p results/fragment/$cell\\_$rep/01_mapping" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Preparing FASTQ file\n", " - conversion to MAP format\n", "Mapping reads in window 1-end...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU -o /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU_full_1-end\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU_full_1-end.map\n", " - splitting into restriction enzyme (RE) fragments using ligation sites\n", " - ligation sites are replaced by RE sites to match the reference genome\n", " * enzymes: MboI & MboI, ligation site: GATCGATC, RE site: GATC & GATC\n", "Preparing MAP file\n", " x removing pre-GEM input 
/scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU_filt_1-end.map\n", "Mapping fragments of remaining reads...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_Z4QyOF\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_Z4QyOF -o /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_Z4QyOF_frag_1-end\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_Z4QyOF\n", " x removing failed to map /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_NV8xRU_fail.map\n", " x removing tmp mapped /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1_tmp/mouse_B_rep1_1.dsrc_Z4QyOF_frag_1-end.map\n" ] }, { "data": { "text/plain": [ "['results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_full_1-end.map',\n", " 'results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r1/mouse_B_rep1_1.dsrc_frag_1-end.map']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } 
], "source": [ "# for the first side of the reads \n", "full_mapping(gem_index_path='genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem',\n", " out_map_dir='results/fragment/{0}_{1}/01_mapping/mapped_{0}_{1}_r1/'.format(cell, rep),\n", " fastq_path='FASTQs/%s_%s_1.fastq.dsrc' % (cell, rep),\n", " r_enz='MboI', frag_map=True, clean=True, nthreads=8, \n", " temp_dir='results/fragment/{0}_{1}/01_mapping/mapped_{0}_{1}_r1_tmp/'.format(cell, rep))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Preparing FASTQ file\n", " - conversion to MAP format\n", "Mapping reads in window 1-end...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa -o /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa_full_1-end\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa\n", " x removing map /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa_full_1-end.map\n", " - splitting into restriction 
enzyme (RE) fragments using ligation sites\n", " - ligation sites are replaced by RE sites to match the reference genome\n", " * enzymes: MboI & MboI, ligation site: GATCGATC, RE site: GATC & GATC\n", "Preparing MAP file\n", " x removing pre-GEM input /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa_filt_1-end.map\n", "Mapping fragments of remaining reads...\n", "TO GEM /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_wcm6hL\n", "/home/dcastillo/miniconda2/bin/gem-mapper -I /scratch/workspace/3DAROC_master/Notebooks/genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem -q offset-33 -m 0.04 -s 0 --allow-incomplete-strata 0.00 --granularity 10000 --max-decoded-matches 1 --min-decoded-strata 0 --min-insert-size 0 --max-insert-size 0 --min-matched-bases 0.8 --gem-quality-threshold 26 --max-big-indel-length 15 --mismatch-alphabet ACGT -E 0.30 --max-extendable-matches 20 --max-extensions-per-match 1 -e 0.04 -T 8 -i /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_wcm6hL -o /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_wcm6hL_frag_1-end\n", "Parsing result...\n", " x removing GEM input /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_wcm6hL\n", " x removing failed to map /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_qGXbqa_fail.map\n", " x removing tmp mapped /scratch/workspace/3DAROC_master/Notebooks/results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2_tmp/mouse_B_rep1_2.dsrc_wcm6hL_frag_1-end.map\n" ] }, { "data": { "text/plain": [ 
"['results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_full_1-end.map',\n", " 'results/fragment/mouse_B_rep1/01_mapping/mapped_mouse_B_rep1_r2/mouse_B_rep1_2.dsrc_frag_1-end.map']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# for the second side of the reads\n", "full_mapping(gem_index_path='genome/Mus_musculus-GRCm38.p6/Mus_musculus.gem',\n", " out_map_dir='results/fragment/{0}_{1}/01_mapping/mapped_{0}_{1}_r2/'.format(cell, rep),\n", " fastq_path='FASTQs/%s_%s_2.fastq.dsrc' % (cell, rep),\n", " r_enz='MboI', frag_map=True, clean=True, nthreads=8, \n", " temp_dir='results/fragment/{0}_{1}/01_mapping/mapped_{0}_{1}_r2_tmp/'.format(cell, rep))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References\n", "\n", "[^](#ref-1) Imakaev, Maxim V and Fudenberg, Geoffrey and McCord, Rachel Patton and Naumova, Natalia and Goloborodko, Anton and Lajoie, Bryan R and Dekker, Job and Mirny, Leonid A. 2012. _Iterative correction of Hi-C data reveals hallmarks of chromosome organization._. [URL](http://www.ncbi.nlm.nih.gov/pubmed/22941365)\n", "\n", "[^](#ref-2) Ay, Ferhat and Noble, William Stafford. 2015. _Analysis methods for studying the 3D architecture of the genome_. [URL](http://genomebiology.com/2015/16/1/183)\n", "\n", "[^](#ref-3) Wingett, Steven and Ewels, Philip and Furlan-Magaril, Mayra and Nagano, Takashi and Schoenfelder, Stefan and Fraser, Peter and Andrews, Simon. 2015. _HiCUP: pipeline for mapping and processing Hi-C data._. [URL](http://f1000research.com/articles/4-1310/v1)\n", "\n", "[^](#ref-4) Serra, Fran\\c{cois and Ba\\`{u, Davide and Filion, Guillaume and Marti-Renom, Marc A.. 2016. _Structural features of the fly chromatin colors revealed by automatic three-dimensional modeling._. 
[URL](http://biorxiv.org/content/early/2016/01/15/036764)\n", "\n", "[^](#ref-5) Marco-Sola, Santiago and Sammeth, Michael and Guigó, Roderic and Ribeca, Paolo. 2012. _The GEM mapper: fast, accurate and versatile alignment by filtration_.\n", "\n" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }