Learning objectives

In this exercise you learn how to

  • align reads against the graph,
  • call variants.

Getting started

Go to the day1_intro folder:

cd ~/day1_intro

Read mapping

Let’s step up to a slightly bigger example.

ls ~/vg/test/1mb1kgp

This directory contains 1Mbp of 1000 Genomes data for chr20:1000000-2000000. As for the tiny example, let’s’ build a graph that encodes the known sequence variation.

vg construct -r 1mb1kgp/z.fa -v 1mb1kgp/z.vcf.gz -m 1024 > z.vg

Let’s build the indexes needed for the mapping:

vg index -L -x z.xg z.vg
vg index -g z.gcsa -k 16 z.vg

Get the reads to map:

wget -c http://hypervolu.me/~guarracino/CPANG22/NA12878.20_1M-2M.30x.bam

samtools fastq -1 NA24385-20_1M-2M-30x_1.fq.gz -2 NA24385-20_1M-2M-30x_2.fq.gz NA12878.20_1M-2M.30x.bam 

The sample was used in the preparation of the 1000 Genomes results.

Map reads against the built graph:

vg map -d z -f NA24385-20_1M-2M-30x_1.fq.gz -f NA24385-20_1M-2M-30x_2.fq.gz > NA24385-20_1M-2M-30x.gam

Variant calling

Compute the read support:

vg pack -x z.xg -g NA24385-20_1M-2M-30x.gam -o NA24385-20_1M-2M-30x.pack -Q 5

Compute the snarls (using fewer threads with -t can reduce memory at the cost of increased runtime):

vg snarls z.xg > z.snarls

Call variants:

vg call z.xg -r z.snarls -k NA24385-20_1M-2M-30x.pack > NA24385-20_1M-2M-30x.vcf 

Back

Back to main page.