Molecular Evolution Tutorial

Jan 20, 2018 00:00 · 345 words · 2 minute read tutorial bioinformatics molecular evolution

This is a brief rundown of how to test for positive selection in sequence data using PAML and HYPHY.

HYPHY

  • All files needed for this tutorial are here

Pervasive positive selection in cone snail venom

1. Get the file Conus.01.fas

This is our translated alignment

2. We will now run the FUBAR analysis

  • This is a site-specific method for detecting selection at the amino acid level, helpful in arms race scenarios

3. Go to http://www.datamonkey.org/dataupload.php

4. Click Choose File and select Conus.01.fas

  • Then, press Upload

5. If your data is no bueno, this is where it will let you know

  • It accepts FASTA alignments and NEXUS files

  • Also listed here should be information about the alignments

  • If all is well, press Proceed to the analysis menu

6. Under Method select FUBAR

7. For now leave everything else alone and just press Run

  • If you had a NEXUS file, you could specify your own tree in Newick format

  • For now we’ll just let it use a generated neighbor-joining tree

8. The results will look like the following (the original example file is no longer available)

  • This is a heat map showing which sites are under positive selection and which are under negative selection
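The upload check in step 5 can be approximated locally before submitting anything. The sketch below is not Datamonkey's own validation; it only tests two basic assumptions about an alignment (at least two sequences, all of the same length). The helper names `read_fasta` and `alignment_problems` and the example sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_problems(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(records) < 2:
        problems.append("fewer than two sequences")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        problems.append("sequences differ in length: %s" % sorted(lengths))
    return problems


# Made-up example; for real data use open("Conus.01.fas").read() instead.
example = ">seqA\nATGAAA\n>seqB\nATGAA-\n>seqC\nATGA\n"
print(alignment_problems(read_fasta(example)))
```

An empty list only means these two checks passed; the server may still reject a file for other reasons.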

PAML

Serine/threonine-protein kinase gene family

  • This has been adapted from a tutorial by Romain Studer
  • In the evolution of vertebrates, we would like to know if the branch leading to the Teleost fishes (genes A50 to A54) was under positive selection

You will need the following files

1. TF105351.Eut.3.phy

  • this is the alignment file

2. TF105351.Eut.3.53876.tree

  • this is the Newick tree with the branch of interest selected

3. TF105351.Eut.3.53876.ctl

  • CodeML configuration file for alternative model

4. TF105351.Eut.3.53876.fixed.ctl

  • CodeML configuration file for null model
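To see how the alternative and null configurations differ, the two control files can be compared option by option. This is a minimal sketch that assumes the usual `key = value * comment` layout of CodeML control files. The two inline fragments are illustrative stand-ins, not the contents of the tutorial's files, and the assumption that the null model differs by fixing omega (`fix_omega = 1`) should be checked against the real files.

```python
def parse_ctl(text):
    """Parse 'key = value * comment' lines into a dict of strings."""
    options = {}
    for line in text.splitlines():
        line = line.split("*", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def ctl_differences(alt_text, null_text):
    """Return {option: (alternative value, null value)} for differing options."""
    alt, null = parse_ctl(alt_text), parse_ctl(null_text)
    keys = sorted(set(alt) | set(null))
    return {k: (alt.get(k), null.get(k)) for k in keys if alt.get(k) != null.get(k)}


# Illustrative fragments only (assumed values, not the real file contents).
alt_example = (
    "seqfile = TF105351.Eut.3.phy\n"
    "outfile = TF105351.Eut.3.53876.mlc\n"
    "fix_omega = 0 * omega is estimated\n"
)
null_example = (
    "seqfile = TF105351.Eut.3.phy\n"
    "outfile = TF105351.Eut.3.53876.fixed.mlc\n"
    "fix_omega = 1 * omega is fixed\n"
)
for option, (alt_value, null_value) in ctl_differences(alt_example, null_example).items():
    print(option, alt_value, null_value)
```

With the real files, pass `open("TF105351.Eut.3.53876.ctl").read()` and `open("TF105351.Eut.3.53876.fixed.ctl").read()` instead of the example strings.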

CODEML commands

codeml TF105351.Eut.3.53876.ctl
codeml TF105351.Eut.3.53876.fixed.ctl

Analyze results

Get likelihood values

grep lnL TF105351.Eut.3.53876.mlc
lnL(ntime: 41  np: 46):  -4707.209701      +0.000000

Likelihood value for alternative model is -4707.209701

grep lnL TF105351.Eut.3.53876.fixed.mlc
lnL(ntime: 41  np: 45):  -4710.222252      +0.000000

Likelihood value for null model is -4710.222252
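Instead of copying the numbers by hand, the lnL lines can be parsed with a regular expression. The sketch below assumes the line layout shown in the grep output above; the function name `parse_lnl` is made up for illustration.

```python
import re

# Matches lines such as: lnL(ntime: 41  np: 46):  -4707.209701      +0.000000
LNL_PATTERN = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")


def parse_lnl(line):
    """Extract ntime, np and lnL from one lnL line of a CodeML output file."""
    match = LNL_PATTERN.search(line)
    if match is None:
        raise ValueError("no lnL line found")
    ntime, n_params, lnl = match.groups()
    return {"ntime": int(ntime), "np": int(n_params), "lnL": float(lnl)}


alt = parse_lnl("lnL(ntime: 41  np: 46):  -4707.209701      +0.000000")
null = parse_lnl("lnL(ntime: 41  np: 45):  -4710.222252      +0.000000")
print(alt["np"], alt["lnL"])
print(null["np"], null["lnL"])
```

To read straight from the output files, loop over the lines of each `.mlc` file and call `parse_lnl` on the one that starts with `lnL`.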

ΔLRT = 2×(lnL1 - lnL0) = 2×(-4707.209701 - (-4710.222252)) = 6.025102

The degree of freedom is 1 (np1 - np0 = 46 - 45). p-value ≈ 0.0141 (under χ²) => significant at the 5% level.
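The test statistic and p-value can be recomputed with the Python standard library alone. This sketch only covers the case of 1 degree of freedom, where the χ² tail probability equals erfc(√(x/2)); for other degrees of freedom a statistics library would be needed. The function name `likelihood_ratio_test` is made up for illustration.

```python
import math


def likelihood_ratio_test(lnl_alt, lnl_null):
    """Return (statistic, p-value) for a likelihood ratio test with 1 df."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        # The alternative model fits no better than the null model.
        return stat, 1.0
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# lnL values taken from the two CodeML output files above.
stat, p = likelihood_ratio_test(-4707.209701, -4710.222252)
print(round(stat, 6), round(p, 4))
```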
