Contents
1 Citation
2 Authors
3 Overview
4 Installing PhyML
4.1 Sources and compilation
4.2 Installing PhyML on UNIX-like systems (including Mac OS)
4.3 Installing PhyML on Microsoft Windows
4.4 Installing the parallel version of PhyML
5 Program usage.
5.1 PHYLIP-like interface
5.1.1 Input Data sub-menu
5.1.2 Substitution model sub-menu
5.1.3 Tree searching sub-menu
5.1.4 Branch support sub-menu
5.2 Command-line interface
6 Inputs / outputs.
6.1 Sequence formats
6.1.1 Gaps and ambiguous characters
6.2 Tree format
6.3 Multiple alignments and trees
6.4 Custom amino-acid rate model
6.5 Output files
7 Recommendations on program usage.
8 Frequently asked questions
9 Acknowledgements
Copyright © 1999 - 2008 by PhyML Development Team.
The software PhyML is provided “as is” without warranty of any kind. In no event shall the
authors or their employers be held responsible for any damage resulting from the use of this software,
including but not limited to the frustration that you may experience in using the package. The
program package and this documentation are distributed free of charge for academic use only.
Permission is granted to copy and use programs in the package provided no fee is charged for it
and provided that this copyright notice is not removed.
1 Citation
• “A simple, fast and accurate algorithm to estimate large phylogenies by maximum
likelihood” Guindon S., Gascuel O. Systematic Biology 52(5):696-704
2 Authors
• Stéphane Guindon and Olivier Gascuel conceived the original PhyML algorithm.
• Stéphane Guindon, Wim Hordijk and Olivier Gascuel conceived the SPR-based tree
search algorithm.
• Maria Anisimova and Olivier Gascuel conceived the aLRT method for branch support.
• Stéphane Guindon, Franck Lethiec, Jean-Francois Dufayard and Vincent Lefort
implemented PhyML.
• Jean-Francois Dufayard created the benchmark and implemented the tools that are
used to check PhyML accuracy and performance.
• Vincent Lefort, Stéphane Guindon, Patrice Duroux and Olivier Gascuel conceived
and implemented the PhyML web server.
• Stéphane Guindon wrote this document.
3 Overview
PhyML [1] is a software package that estimates maximum likelihood phylogenies from alignments
of nucleotide or amino acid sequences. It provides a wide range of options designed to
facilitate standard phylogenetic analyses. The main strength of PhyML lies in the large
number of substitution models coupled with various options to search the space of
phylogenetic tree topologies, ranging from very fast and efficient methods to slower but
generally more accurate approaches. It also implements two methods to evaluate branch
supports in a sound statistical framework (the non-parametric bootstrap and the
approximate likelihood ratio test).
PhyML was designed to process moderate to large data sets. In theory, alignments
with up to 4,000 sequences, each up to 2,000,000 characters long, can be analyzed. In
practice, however, the amount of memory required to process a data set is proportional to
the product of the number of sequences and their length. Hence, a large number of
sequences can only be processed provided that they are short. Conversely, PhyML can
handle long sequences provided that they are not numerous. With most standard personal
computers, the “comfort zone” for PhyML generally lies around 100-200 sequences of less
than 2,000 characters each. For larger data sets, we recommend using other software such
as RAxML [2], GARLI [3] or Treefinder (http://www.treefinder.de).
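As a rough illustration of the size rule above, the sketch below computes the product of the number of sequences and their length for an alignment file. It assumes that the first line of the file holds these two numbers, as in a PHYLIP header; the helper name alignment_cells is hypothetical and is not part of PhyML.

```shell
# Hypothetical helper (not part of PhyML): print ntax * nchar for an alignment,
# ASSUMING the first line of the file is a PHYLIP-style header
# "<number of sequences> <sequence length>".
alignment_cells() {
    awk 'NR == 1 { printf "%.0f\n", $1 * $2; exit }' "$1"
}
```

The upper corner of the comfort zone quoted above corresponds to 200 x 2,000 = 400,000 characters, so a value far above that suggests the data set may be uncomfortable on a standard personal computer.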
4 Installing PhyML
4.1 Sources and compilation
The sources of the program are available free of charge by sending an e-mail to Stéphane
Guindon at guindon@lirmm.fr or guindon@stat.auckland.ac.nz.
The compilation on UNIX-like systems is fairly standard. It is described in the
‘INSTALL’ file that comes with the sources. In a command-line window, go to the directory
that contains the sources and type:
> aclocal;
> autoconf -f;
> automake -f;
> ./configure;
> make;
Note – when PhyML is going to be used mostly or exclusively in batch mode, it is
preferable to turn on the batch mode option in the Makefile. In order to do so, the file
Makefile.am needs to be modified: add -DBATCH to the line with DEFS=-DUNIX -D$(PROG)
-DDEBUG.
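The edit described in the note can also be scripted. The following is only a sketch: the helper name enable_batch_mode is hypothetical, and it assumes that the active DEFS line starts at the beginning of a line in Makefile.am. It writes through a temporary file so that it does not depend on the in-place option of any particular sed.

```shell
# Hypothetical helper (not shipped with PhyML): append -DBATCH to the
# uncommented DEFS= line of the given Makefile.am.
# ASSUMPTION: the active line starts with "DEFS=" in column one.
enable_batch_mode() {
    makefile_am="$1"
    tmp="${makefile_am}.tmp"
    # Only lines starting with DEFS= that do not already contain -DBATCH change.
    sed '/^DEFS=/{/-DBATCH/!s/$/ -DBATCH/;}' "$makefile_am" > "$tmp" &&
        mv "$tmp" "$makefile_am"
}
```

After calling enable_batch_mode Makefile.am in the source directory, the build files would need to be regenerated and the program recompiled with the commands listed above.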
4.2 Installing PhyML on UNIX-like systems (including Mac OS)
Copy the PhyML binary file into the directory of your choice. For the operating system to
be able to locate the program, this directory must be listed in the environment variable
PATH. To achieve this, add export PATH="/your_path/:$PATH" to the .bashrc or the
.bash_profile file located in your home directory (your_path is the path to the directory
that contains the PhyML binary).
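A slightly more defensive variant of the export line is sketched below. The function name add_to_path is hypothetical; it simply avoids adding the same directory twice when the start-up file is read more than once.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put the directory first
    esac
    export PATH
}
```

Placing this function in .bashrc followed by a call such as add_to_path "/your_path" has the same effect as the export line. In a new shell, typing command -v phyml should then print the location of the binary, assuming the binary is named phyml.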
4.3 Installing PhyML on Microsoft Windows
Copy the files phyml.exe and phyml.bat into the same directory. To launch PhyML, click
on the icon corresponding to phyml.bat. Clicking on the icon for phyml.exe works too,
but the dimensions of the window will not fit the PhyML interface.
4.4 Installing the parallel version of PhyML
Bootstrap analysis can run on multiple processors. Each processor analyses one
bootstrapped data set at a time. Therefore, the computing time needed to perform N
bootstrap replicates is divided by the number of processors available.
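The division is exact only when the number of replicates is a multiple of the number of processors. The sketch below (the function name bootstrap_rounds is hypothetical) computes how many successive rounds of analyses are needed, under the idealised assumption that every replicate takes about the same time.

```shell
# Hypothetical helper: number of successive rounds needed to analyse N
# bootstrap replicates on P processors, one replicate per processor per round.
# ASSUMPTION: all replicates take roughly the same time.
bootstrap_rounds() {
    n="$1"
    p="$2"
    echo $(( (n + p - 1) / p ))   # integer ceiling of n / p
}
```

For example, bootstrap_rounds 100 4 prints 25, whereas bootstrap_rounds 100 8 prints 13 rather than 12.5, because the last round keeps only four of the eight processors busy.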
This feature of PhyML relies on the MPI (Message Passing Interface) library. To
use it, your computer must have MPI installed. If MPI is not installed, you
can download it from http://www.mcs.anl.gov/research/projects/mpich2/. Once MPI is
installed, a few modifications must be made to the file ‘Makefile.am’. The relevant
section of this file and the instructions to add or remove the MPI option are
printed below:
# Uncomment (i.e. remove the ‘#’ character at the beginning of)
# the two lines below if you want to use MPI.
# Comment the two lines below if you don’t want to use MPI.
# CC=mpicc
# DEFS=-DUNIX -D$(PROG) -DDEBUG -DMPI
# Comment the line below if you want to use MPI.
# Uncomment the line below if you don’t want to use MPI.
DEFS=-DUNIX -D$(PROG) -DDEBUG
5 Program usage.
PhyML has two distinct user interfaces. The first interface is probably the most popular.
It is a PHYLIP-like text interface that makes the choice of options self-explanatory
(see Figure 1). The second, the command-line interface, is well suited to users who are
familiar with PhyML options or who want to run PhyML in batch mode.
5.1 PHYLIP-like interface
The default is to use the PHYLIP-like text interface (Figure 1), which is started by
simply typing ‘phyml’ in a command-line window or by clicking on the PhyML icon (see
Section 4.3). After entering the name of the input sequence file, a list of sub-menus
helps the user to set up the analysis. There are currently four distinct sub-menus:
1. Input Data: specify whether the input file contains amino-acid or nucleotide
sequences, what the sequence format is (see Section 6) and how many data sets
should be analysed.
2. Substitution Model: selection of the Markov model of substitution.
3. Tree Searching: selection of the tree topology searching algorithm.