Application of PSMC to MS-simulated data

This is a simple tutorial showing how to use PSMC to infer the population size history from full DNA sequences. We will apply the PSMC to simulated data.

We need:

The ms software.
The PSMC software.
Two python scripts (one for converting files and the other one for plotting the results)

Downloading and compiling softwares

Download and compile ms

Go to http://home.uchicago.edu/rhudson1/source/mksamples.html and download the file ms.tar.gz
Uncompress the file

tar -zxvf ./ms.tar

Compile ms

cd msdir
gcc -O3 -o ms ms.c streec.c rand1.c -lm

Have a look to the ms readme file for more details. The original paper can be viewed here

Download and compile psmc

Clone the github repository

  git clone https://github.com/lh3/psmc.git

Compile

cd psmc
make
cd utils
make

The model of the PSMC was published by [Li and Durbin][1] in 2011. For the analysis we will generate a dataset similar to the first dataset used in the Supplementary Materials of the Li and Durbin paper.

Download the python scripts

  git clone https://github.com/willyrv/ms-PSMC.git

Finaly, create a new folder and copy the binary files ./msdir/ms and ./psmc/psmc as long as the python scripts.

Simulating data and running the analysis

I will suppose you have all this file in a folder:

|- ms
|- psmc
|- ms2psmcfa.py
|- plot_results.py

You need to verify that the files have execution permissions. To give execution permissions to the files type:

  chmod +x ./*

1-) Simulate the data with ms. Run:

./ms 2 100 -t 30000 -r 6000 30000000 -eN 0.01 0.1 -eN 0.06 1 -eN  0.2 0.5 -eN 1 1 -eN 2 2 -p 8  > sim1.ms

Normally this takes less than 30 seconds and should produce one output file (here called “sim1.ms”). Now we will use the ms2psmcfa.py python script to convert this ms output into the input file of PSMC.

2-) Run:

  ./ms2psmcfa.py ./sim1.ms > sim1.psmcfa

This creates the file “sim1.psmcfa” which is the input of PSMC. (This step is also fast, less than 30 seconds). Now we can run the PSMC analysis. For explanations about the parameters, see the README file of PSMC.

3-) Run:

  ./psmc -N25 -t15 -r5 -p 4+25*2+4+6 -o dem_history_sim1.psmc ./sim1.psmcfa

This is a bit long (about 3 hours). At the end, you should get a file like this

If you don’t want to wait, you can download the result file

  wget http://wrodrigu.perso.math.cnrs.fr/GTPB/tutorial_psmc/result_files/dem_history_sim1.psmc

Once the psmc analysis is finished, you can plot the results.

4-) Run:

  ./plot_results.py

And you will finally get a figure with the inferred demographic history (dashed blue line) and the real history (continuous black line) corresponding to the ms command. results

You can also plot the psmc results by using the perl script utils/psmc_plot.pl wich comes with the PSMC software.

[1] http://www.nature.com/articles/nature10231 "Heng Li & Richard Durbin, Inference of human population history from individual whole-genome sequences, Nature 475, 493–496 (28 July 2011) doi:10.1038/nature102312011"

Back

Back to Day #4.
Back to first page.