Phylip free download for windows free –

PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables. These Web pages, by Joe Felsenstein of the Department of Genome Sciences and the Department of Biology at the University of Washington, contain information on PHYLIP and ways to transfer the executables, source code and documentation.
 
 

 


                                        No, use as outgroup species 1
  T         Use Threshold parsimony?    No, use ordinary parsimony
  N      Use Transversion parsimony?    No, count all steps
  W                  Sites weighted?    No
  M      Analyze multiple data sets?    No
  I     Input sequences interleaved?    Yes

  Y to accept these or type the letter for one to change

If you want to accept the default settings (they are shown in the above case) you can simply type Y followed by pressing on the Enter key. If you want to change any of the options, you should type the letter shown to the left of its entry in the menu. For example, to set a threshold type T. Lower-case letters will also work.
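The dialog just described can be sketched in Python. This is only an illustration: the helper name build_menu_input is hypothetical, and it simply assembles the keystrokes a user would type (an option letter, any supplementary value the program then asks for, and finally Y to accept the settings). Whether a particular program accepts its answers in exactly this order depends on its menu.

```python
def build_menu_input(changes):
    """Return the text a user would type at a menu of this kind.

    `changes` is a list of (letter, extra) pairs. `extra` is the answer to a
    follow-up prompt (for example a threshold value), or None when the option
    is a plain toggle. The helper and its argument format are hypothetical.
    """
    lines = []
    for letter, extra in changes:
        lines.append(letter.upper())   # lower-case letters also work at the menu
        if extra is not None:
            lines.append(str(extra))   # supplementary information, if requested
    lines.append("Y")                  # accept the settings shown
    return "\n".join(lines) + "\n"


# Typing T, answering the threshold prompt with 4, then accepting:
print(repr(build_menu_input([("t", 4)])))
```

Text built this way could be typed by hand or saved in a file of responses; the sketch does not run any program itself.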

For many of the options the program will ask for supplementary information, such as the value of the threshold. Note the Terminal type entry, which you will find on all menus. It allows you to specify which type of terminal your screen is.

Choosing zero (0) toggles among the three options (ANSI, IBM PC, and none) in cyclical order, changing each time the 0 option is chosen. If one of them is right for your terminal the screen will be cleared before the menu is displayed. If none works, the none option should probably be chosen. The programs should start with a terminal option appropriate for your computer, but if they do not, you can change the terminal type manually.

This is particularly important in program Retree where a tree is displayed on the screen – if the terminal type is set to the wrong value, the tree can look very strange. The other numbered options control which information the program will display on your screen or on the output files.

The option to Print indications of progress of run will show information such as the names of the species as they are successively added to the tree, and the progress of rearrangements. You will usually want to see these as reassurance that the program is running and to help you estimate how long it will take. But if you are running the program “in background” (as can be done on multitasking and multiuser systems), and do not have the program running in its own window, you may want to turn this option off so that it does not disturb your use of the computer while the program is running.

Note also menu option 3, “Print out tree”. This can be useful when you are running many data sets, and will be using the resulting trees from the output tree file. It may be helpful to turn off the printing out of the trees in that case, particularly if those files would be too big.

The Output File

Most of the programs write their output onto a file called (usually) outfile, and a representation of the trees found onto a file called outtree. The exact contents of the output file vary from program to program and also depend on which menu options you have selected. For many programs, if you select all possible output information, the output will consist of (1) the name of the program and its version number, (2) some of the input information printed out, and (3) a series of phylogenies, some with associated information indicating how much change there was in each character or on each part of the tree.

The numbers at the forks are arbitrary and are used (if present) merely to identify the forks. For many of the programs the tree produced is unrooted. Rooted and unrooted trees are printed in nearly the same form, but the unrooted ones are accompanied by the warning message: remember: this is an unrooted tree! Mathematicians still call an unrooted tree a tree, though some systematists unfortunately use the term “network” for an unrooted tree.

This conflicts with standard mathematical usage, which reserves the name “network” for a completely different kind of graph. The root of this tree could be anywhere, say on the line leading immediately to Mouse. It is important also to realize that the lengths of the segments of the printed tree may not be significant: some may actually represent branches of zero length, in the sense that there is no evidence that those branches are nonzero in length.

Some of the diagrams of trees attempt to print branches approximately proportional to estimated branch lengths, while in others the lengths are purely conventional and are presented just to make the topology visible. You will have to look closely at the documentation that accompanies each program to see what it presents and what is known about the lengths of the branches on the tree. The above tree attempts to represent branch lengths approximately in the diagram.

But even in those cases, some of the smaller branches are likely to be artificially lengthened to make the tree topology clearer. When a tree has branch lengths, it will be accompanied by a table showing for each branch the numbers or names of the nodes at each end of the branch, and the length of that branch.

For the first tree shown above, the corresponding table has the columns Between, And, Length, and Approx. Confidence Limits, and its first row describes the branch between node 1 and Bovine. Similar tables exist in distance matrix and likelihood programs, as well as in the parsimony programs Dnapars and Pars. Some of the parsimony programs in the package can print out a table of the number of steps that different characters or sites require on the tree. This table may not be obvious at first: each row is labelled with a multiple of ten and each column with a final digit. Thus site 23 is column “3” of row “20” and has 1 step in this case.
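The lookup in the steps table can be written as a one-line calculation. The Python sketch below assumes, as the site 23 example suggests, that rows are labelled 0, 10, 20 and so on and that columns are labelled 0 through 9; the function name table_position is hypothetical.

```python
def table_position(site):
    """Return (row_label, column_label) for a site or character number.

    Assumes rows labelled in tens and columns labelled 0..9, which is how
    the "site 23 is column 3 of row 20" example reads.
    """
    if site < 1:
        raise ValueError("sites are numbered from 1")
    return (site // 10) * 10, site % 10


print(table_position(23))  # (20, 3)
```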

There are many other kinds of information that can appear in the output file. They vary from program to program, and we leave their description to the documentation files for the specific programs.

The Tree File

In output from most programs, a representation of the tree is also written into the tree file outtree. The tree is specified by nested pairs of parentheses, enclosing names and separated by commas. We will describe how this works below.

Trailing blanks in the name may be omitted. The pattern of the parentheses indicates the pattern of the tree by having each pair of parentheses enclose all the members of a monophyletic group. The tree file could look like this:

((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human)))));

In this tree the first fork separates the lineage leading to Mouse and Bovine from the lineage leading to the rest. Within the latter group there is a fork separating Gibbon from the rest, and so on.
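To make the nested-parenthesis notation concrete, here is a minimal Python sketch that reads such a tree into nested lists and lists the members of a group. It handles names only (no branch lengths), removes all whitespace before parsing, and is not the reading code used by any of the programs; the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a names-only parenthesis tree into nested lists of names."""
    text = "".join(text.split())          # drop blanks and line breaks
    if not text.endswith(";"):
        raise ValueError("a tree must end with a semicolon")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ")"
            return children
        start = pos
        while text[pos] not in ",();":
            pos += 1
        if start == pos:
            raise ValueError(f"missing name at position {pos}")
        return text[start:pos]

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("unbalanced parentheses")
    return tree


def leaves(node):
    """Return the names enclosed by one pair of parentheses."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


tree = parse_newick("((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human)))));")
print(leaves(tree[0]))   # the group on one side of the first fork
print(leaves(tree[1]))   # the lineage leading to the rest
```

Each inner list corresponds to one pair of parentheses, so calling leaves on it gives the members of that monophyletic group.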

The entire tree is enclosed in an outermost pair of parentheses. The tree ends with a semicolon. In some programs such as Dnaml, Fitch, and Contml, the tree will be unrooted.

The single three-way split corresponds to one of the interior nodes of the unrooted tree (it can be any interior node of the tree). The remaining forks are encountered as you move out from that first node. Some newer programs are able to tolerate these other forks being multifurcations (multi-way splits). You should check the documentation files for the particular programs you are using to see which of these forms the user tree is expected to be in.

Note that many of the programs that actually estimate an unrooted tree (such as Dnapars) produce trees in the treefile in rooted form! This is done for reasons of arbitrary internal bookkeeping. The placement of the root is arbitrary. We are working toward having all programs be able to read all trees, whether rooted or unrooted, multifurcating or bifurcating, and having them do the right thing with them.

But this is a long-term goal and it is not yet achieved. For programs that infer branch lengths, these are given in the trees in the tree file as real numbers following a colon, and placed immediately after the group descended from that branch. These representations of trees are a subset of the standard adopted on 24 June at the annual meetings of the Society for the Study of Evolution by an informal committee (its final session in Newick’s lobster restaurant, hence its name, the Newick standard) consisting of Wayne Maddison (author of MacClade), David Swofford (PAUP), F.

Day, and me. This standard is a generalization of PHYLIP’s format, itself based on a well-known representation of trees in terms of parenthesis patterns which is due to the famous mathematician Arthur Cayley, and which has been around for over a century. The standard is now employed by most phylogeny computer programs but unfortunately has yet to be described in a formal published description.
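As a small illustration of how branch lengths sit in the tree file (a real number after a colon, placed right after the name or group it belongs to), the following Python sketch pulls them out with regular expressions. The example tree and its lengths are made up for the illustration, and the function name branch_lengths is hypothetical.

```python
import re

# A tip is a name followed by ":" and a number; an interior branch length
# follows the ")" that closes the group descended from that branch.
TIP = re.compile(r"([^():,;\s]+):([0-9.eE+-]+)")
INTERNAL = re.compile(r"\):([0-9.eE+-]+)")


def branch_lengths(newick):
    """Return ({tip name: length}, [interior branch lengths in file order])."""
    tips = {name: float(length) for name, length in TIP.findall(newick)}
    internal = [float(length) for length in INTERNAL.findall(newick)]
    return tips, internal


# Hypothetical three-species tree, not taken from any program's output.
tips, internal = branch_lengths("((A:1.5,B:2.0):0.5,C:3.0);")
print(tips)
print(internal)
```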

Options are selected in the menu.

Common options in the menu

A number of the options from the menu, the U (User tree), G (Global), J (Jumble), O (Outgroup), W (Weights), T (Threshold), M (multiple data sets), and the tree output options, are used so widely that it is best to discuss them in this document.

The U (User tree) option. This option toggles between the default setting, which allows the program to search for the best tree, and the User tree setting, which reads a tree or trees (“user trees”) from the input tree file and evaluates them.

The input tree file’s default name is intree. In many cases the programs will also tolerate having the trees be preceded by a line giving the number of trees:

3
((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));
((Alligator,Bear),(((Cow,Dog),Elephant),Ferret));
((Alligator,Bear),((Cow,Dog),(Elephant,Ferret)));

An initial line with the number of trees was formerly required, but this now can be omitted.
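A reader for a file of user trees that tolerates the optional count line might look like the Python sketch below. It is an illustration under simple assumptions (trees separated by semicolons, no blanks inside names), not the logic of the programs themselves; the name read_user_trees is hypothetical.

```python
def read_user_trees(text):
    """Split the contents of a user-tree file into individual tree strings.

    An initial line holding only a number is treated as the number of trees
    and checked against what follows; it may also be absent.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    expected = None
    if lines and lines[0].isdigit():
        expected = int(lines[0])
        lines = lines[1:]
    joined = "".join(lines)               # a tree may continue onto a new line
    trees = [part + ";" for part in joined.split(";") if part]
    if expected is not None and expected != len(trees):
        raise ValueError(f"header says {expected} trees, found {len(trees)}")
    return trees


example = """3
((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));
((Alligator,Bear),(((Cow,Dog),Elephant),Ferret));
((Alligator,Bear),((Cow,Dog),(Elephant,Ferret)));
"""
for tree in read_user_trees(example):
    print(tree)
```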

Some programs require rooted trees, some unrooted trees, and some can handle multifurcating trees. You should read the documentation for the particular program to find out which it requires.

Program Retree can be used to convert trees among these forms (on saving a tree from Retree, you are asked whether you want it to be rooted or unrooted). In using the user tree option, check the pattern of parentheses carefully. The programs do not always detect whether the tree makes sense, and if it does not there will probably be a crash (hopefully, but not inevitably, with an error message indicating the nature of the problem).

Trees written out by programs are typically in the proper form.

The G (Global) option. In the programs which construct trees (with exceptions such as Neighbor), rearrangements of the tree are tried in the course of the search. In most of these programs the rearrangements are automatically global, which in this case means that subtrees will be removed from the tree and put back on in all possible ways so as to have a better chance of finding a better tree.

Since this can be time consuming (it roughly triples the time taken for a run) it is left as an option in some of the programs, specifically Contml, Fitch, Dnaml and Proml. In these programs the G menu option toggles between the default of local rearrangement and global rearrangement.

The rearrangements are explained more below.

The J (Jumble) option. In most of the tree construction programs (with some exceptions), the result of the search can depend on the order in which the species are input. In these programs the J option enables you to tell the program to use a random number generator to choose the input order of species.

This option is toggled on and off by selecting option J in the menu. The program will then prompt you for a “seed” for the random number generator. Each different seed leads to a different sequence of addition of species. By simply changing the random number seed and re-running the programs one can look for other, and better trees.

If the seed entered is not odd, the program will not proceed, but will prompt for another seed. The Jumble option also causes the program to ask you how many times you want to restart the process. If you answer 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run).

Some people have asked what are good values of the random number seed. The random number seed is used to start a process of choosing “random” (actually pseudorandom) numbers, which behave as if they were unpredictably randomly chosen between 0 and 2^32 - 1 (which is 4,294,967,295). Because the numbers that follow from a given seed are effectively unpredictable, no particular seed is better than any other, provided that it is odd. However if you re-use a random number seed, the sequence of random numbers that result will be the same as before, resulting in exactly the same series of choices, which may not be what you want.
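The behaviour of the seed can be imitated in a few lines of Python. The sketch below is not the random number generator used by the programs: it relies on Python's own random module (which has no need for odd seeds) and only mimics the rules described here, namely that an even seed is refused, that each restart uses a fresh order of species, and that the same seed reproduces the same orders. The function name jumbled_orders is hypothetical.

```python
import random


def jumbled_orders(species, seed, times):
    """Return `times` shuffled input orders of `species`, driven by `seed`."""
    if seed % 2 == 0:
        # Mirrors the rule described above; Python itself does not need it.
        raise ValueError("the seed must be odd")
    rng = random.Random(seed)
    orders = []
    for _ in range(times):
        order = list(species)
        rng.shuffle(order)                # one random order per restart
        orders.append(order)
    return orders


names = ["Mouse", "Bovine", "Gibbon", "Orang", "Gorilla", "Chimp", "Human"]
first = jumbled_orders(names, seed=4333, times=10)
again = jumbled_orders(names, seed=4333, times=10)
print(first == again)   # re-using a seed repeats exactly the same choices
```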

The O (Outgroup) option. This specifies which species is to have the root of the tree be on the line leading to it. For example, if the outgroup is a species “Mouse” then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other.

This option is toggled on and off by choosing O in the menu (the alphabetic character O, not the digit 0). When it is on, the program will then prompt for the number of the outgroup (the species being taken in the numerical order that they occur in the input file). Responding by typing 6 and then an Enter character indicates that the sixth species in the data (the 6th in the first set of data if there are multiple data sets) is taken as the outgroup.

Outgroup-rooting will not be attempted if the data have already established a root for the tree from some other consideration, and may not be if it is a user-defined tree, despite your invoking the option. Thus programs such as Dollop that produce only rooted trees do not allow the Outgroup option. It is also not available in Kitsch, Dnamlk, Promlk or Clique.

When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form.

The T (Threshold) option. This sets a threshold for the parsimony programs such that if the number of steps counted in a character is higher than the threshold, it will be taken to be the threshold value rather than the actual number of steps.

The default is a threshold so high that it will never be surpassed (in which case the steps will simply be counted). The T menu option toggles on and off asking the user to supply a threshold.
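The effect of the threshold on the total that is being minimized can be shown with a short Python sketch. It assumes the number of steps for each character on a given tree has already been counted; the function name thresholded_steps and the example counts are hypothetical.

```python
def thresholded_steps(steps_per_character, threshold=None):
    """Total steps on a tree, with each character's count capped at `threshold`.

    With no threshold (the default behaviour described above) the steps are
    simply added up.
    """
    if threshold is None:
        return sum(steps_per_character)
    return sum(min(steps, threshold) for steps in steps_per_character)


steps = [1, 2, 5, 7]                      # made-up step counts for four characters
print(thresholded_steps(steps))           # 15: ordinary parsimony
print(thresholded_steps(steps, 3))        # 9: counts above 3 are taken as 3
```

Characters that need many steps contribute no more than the threshold, which is how rapidly evolving characters are de-weighted.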

The use of thresholds to obtain methods intermediate between parsimony and compatibility methods is described in one of my papers. When the T option is in force, the program will prompt for the numerical threshold value. This will be a positive real number greater than 1. In programs Dollop, Dolmove, and Dolpenny the threshold should never be 0. The T option is an important and underutilized one: it is, for example, the only way in this package (except for program Dnacomp) to do a compatibility analysis when there are missing data.

It is a method of de-weighting characters that evolve rapidly. I wish more people were aware of its properties.

The M (Multiple data sets) option. In menu programs there is an M menu option which allows one to toggle on the multiple data sets option. The program will ask you how many data sets it should expect. The data sets have the same format as the first data set.

Using the program Seqboot one can take any DNA, protein, restriction sites, gene frequency or binary character data set and make multiple data sets by bootstrapping.

Trees can be produced for all of these using the M option. They will be written on the tree output file if that option is left in force. Then the program Consense can be used with that tree file as its input file. The result is a majority rule consensus tree which can be used to make confidence intervals. The present version of the package allows, with the use of Seqboot and Consense and the M option, bootstrapping of many of the methods in the package. Programs Dnaml, Dnapars and Pars can also take multiple weights instead of multiple data sets.

They can then do bootstrapping by reading in one data set, together with a file of weights that show how the characters or sites are reweighted in each bootstrap sample. Thus a site that is omitted in a bootstrap sample has effectively been given weight 0, while a site that has been duplicated has effectively been given weight 2. Seqboot has a menu selection to produce the file of weights information automatically, instead of producing a file of multiple data sets. It can be renamed and used as the input weights file.
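The correspondence between a bootstrap sample and a set of weights can be sketched in Python: sites are drawn with replacement, and the weight of a site is the number of times it was drawn. This is an illustration of the idea, not the code of Seqboot; the function name bootstrap_weights is hypothetical and the random numbers come from Python's random module.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, rng):
    """Draw n_sites sites with replacement and return one weight per site."""
    draws = Counter(rng.randrange(n_sites) for _ in range(n_sites))
    # A site never drawn gets weight 0; a site drawn twice gets weight 2.
    return [draws[site] for site in range(n_sites)]


rng = random.Random(12345)
weights = bootstrap_weights(20, rng)
print(weights)
print(sum(weights))   # always equals the number of sites
```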

The W (Weights) option. This signals the program that, in addition to the data set, you want to read in a series of weights that tell how many times each character is to be counted. If the weight for a character is zero (0) then that character is in effect to be omitted when the tree is evaluated.

If it is 1 the character is to be counted once. Some programs allow weights greater than 1 as well. These have the effect that the character is counted as if it were present that many times, so that a weight of 4 means that the character is counted 4 times. The values 0-9 give weights 0 through 9, and the values A-Z give weights 10 through 35. By use of the weights we can give overwhelming weight to some characters, and drop others from the analysis.

In the molecular sequence programs only two values of the weights, 0 or 1 are allowed. The weights are used to analyze subsets of the characters, and also can be used for resampling of the data as in bootstrap and jackknife resampling. For those programs that allow weights to be greater than 1, they can also be used to emphasize information from some characters more strongly than others.

Of course, you must have some rationale for doing this. The weights are provided as a sequence of digits, in an input file whose default name is weights.

The weights in it are a simple string of digits. Blanks in the weightfile are skipped over and ignored, and the weights can continue to a new line. In programs such as Seqboot that can also output a file of weights, the input weights have a default file name of inweights , and the output file name has a default file name of outweights.
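A reader for such a weights file is short enough to sketch in Python. It assumes the coding in which the characters 0-9 stand for weights 0 through 9 and A-Z for 10 through 35, and it skips blanks and line breaks as described; the function name decode_weights is hypothetical.

```python
def decode_weights(text):
    """Turn the contents of a weights file into a list of integer weights."""
    weights = []
    for ch in text:
        if ch.isspace():
            continue                      # blanks and new lines are ignored
        if ch in "0123456789":
            weights.append(int(ch))
        elif "A" <= ch <= "Z":
            weights.append(ord(ch) - ord("A") + 10)
        else:
            raise ValueError(f"unexpected character in weights: {ch!r}")
    return weights


print(decode_weights("0011 1100\n1A Z"))
```

Programs that only accept weights of 0 or 1 would need an extra check that no decoded value exceeds 1.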

Weights can be used to analyze different subsets of characters by weighting the rest as zero. Alternatively, in the discrete characters programs they can be used to force a certain group to appear on the phylogeny (in effect confining consideration to only phylogenies containing that group).

This is done by adding an imaginary character that has 1’s for the members of the group, and 0’s for all the other species. That imaginary character is then given the highest weight possible: the result will be that any phylogeny that does not contain that group will be penalized by such a heavy amount that it will not (except in the most unusual circumstances) be considered.

Of course, the new character brings extra steps to the tree, but the number of these can be calculated in advance and subtracted out of the total when reporting the results.

This use of weights is an important one, and one sadly ignored by many users who could profit from it. In the case of molecular sequences we cannot use weights this way, so that to force a given group to appear we have to add a large extra segment of sites to the molecule, with say A’s for that group and C’s for every other species.

The option to write out the trees into a tree file. This specifies that you want the program to write out the tree not only on its usual output, but also onto a file in nested-parenthesis notation as described above. This option is sufficiently useful that it is turned on by default in all programs that allow it. You can optionally turn it off if you wish, by typing the appropriate number from the menu (it varies from program to program).

This option is useful for creating tree files that can be directly read into the programs, including the consensus tree and tree distance programs, and the tree plotting programs.

The output tree file has a default name of outtree.

The 0 (terminal type) option. This is the digit 0, not the alphabetic character O.

This affects the ability of the programs to clear the screen when they display their menus, and the graphics characters used to display trees in the programs Dnamove, Move, Dolmove, and Retree.

The Algorithm for Constructing Trees

All of the programs except Factor, Dnadist, Gendist, Dnainvar, Seqboot, Contrast, Retree, and the plotting and consensus tree programs act to construct an estimate of a phylogeny.

Move, Dolmove, and Dnamove let you construct it yourself by hand. Most of the rest (Neighbor is among the exceptions) use a common search strategy. They are trying to minimize or maximize some quantity over the space of all possible evolutionary trees. Each program contains a part that, given the topology of the tree, evaluates the quantity that is being minimized or maximized. The straightforward approach would be to evaluate all possible tree topologies one after another and pick the one which, according to the criterion being used, is best.

This would not be possible for more than a small number of species, since the number of possible tree topologies is enormous. A review of the literature on the counting of evolutionary trees will be found in one of my papers (Felsenstein) and in my book (Felsenstein, chapter 3).
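To give a feeling for how quickly the number of topologies grows, the Python sketch below counts rooted bifurcating trees with labelled tips. It assumes the usual counting argument, which matches the sequential addition description in this document: the third species can be attached in three places, the fourth in five, and in general the k-th species in 2k - 3 places, so the counts multiply. The function name rooted_bifurcating_trees is hypothetical.

```python
def rooted_bifurcating_trees(n_species):
    """Number of rooted bifurcating topologies for n labelled species."""
    if n_species < 2:
        raise ValueError("need at least two species")
    count = 1                             # two species: one possible tree
    for k in range(3, n_species + 1):
        count *= 2 * k - 3                # places where species k can be added
    return count


for n in (3, 4, 10, 20):
    print(n, rooted_bifurcating_trees(n))
```

Already for ten species the count is 34,459,425, which is why the programs search rather than enumerate.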

Since we cannot search all topologies, these programs are not guaranteed to always find the best tree, although they seem to do quite well in practice. The strategy they employ is as follows: the species are taken in the order in which they appear in the input file. The first two (in some programs the first three) are taken and a tree constructed containing only those.

There is only one possible topology for this tree. Then the next species is taken, and we consider where it might be added to the tree. If the initial tree is say a rooted tree with two species and we want the resulting three-species tree to be a bifurcating tree, there are only three places where we could add the third species. Each of these is tried, and each time the resulting tree is evaluated according to the criterion.

The best one is chosen to be the basis for further operations. Now we consider adding the fourth species, again at each of the five possible places that would result in a bifurcating tree. Again, the best of these is accepted. This is usually known as the Sequential Addition strategy.

Local rearrangements

The process continues in this manner, with one important exception.

After each species is added, and before the next is added, a number of rearrangements of the tree are tried, in an effort to improve it. The algorithms move through the tree, making all possible local rearrangements of the tree. A local rearrangement involves an internal segment of the tree in the following manner.

Each time a local rearrangement is successful in finding a better tree, the new arrangement is accepted. The phase of local rearrangements does not end until the program can traverse the entire tree, attempting local rearrangements, without finding any that improve the tree. This strategy of adding species and making local rearrangements will look at about (n-1) x (2n-3) different topologies, though if rearrangements are frequently successful the number may be larger. I have been describing the strategy when rooted trees are being considered.

For unrooted trees there is a precisely similar strategy, though the first tree constructed may be a three-species tree and the rearrangements may not start until after the addition of the fifth species. Though we are not guaranteed to have found the best tree topology, we are guaranteed that no nearby topology (i.e. none reachable by a single local rearrangement) is better.

In this sense we have reached a local optimum of our criterion. Note that the whole process is dependent on the order in which the species are present in the input file. We can try to find a different and better solution by reordering the species in the input file and running the program again or, more easily, by using the J option. If none of these attempts finds a better solution, then we have some indication that we may have found the best topology, though we can never be certain of this.

Note also that a new topology is never accepted unless it is better than the previous one, so that the rearrangement process can never fall into an endless loop. This is also the way ties in our criterion are resolved, namely by sticking with the tree found first. However, the tree construction programs other than Clique, Contml, Fitch, and Dnaml do keep a record of all trees found that are tied with the best one found.

This gives you some immediate idea of which parts of the tree can be altered without affecting the quality of the result.

Global rearrangement is a menu option (G) in some of the programs; in the others it automatically applies. When it is present there is an additional stage to the search for the best tree. Each possible subtree is removed from the tree and added back in all possible places. This process continues until all subtrees can be removed and added again without any improvement in the tree.

The purpose of this extra rearrangement is to make it less likely that one or more species gets “stuck” in a suboptimal region of the space of all possible trees.

The use of global optimization results in approximately a tripling (3x) of the run-time, which is why I have left it as an option in some of the slower programs. My book (Felsenstein, chapter 4) contains a review of work on these and other rearrangements and search methods.

In order to use text documents from the PC, users must save them in flat ASCII or Text Only format, as proprietary word-processor formats do not work when imported. Most of the programs look for their input data in a file called “infile”.

The programs can be terminated by typing control-C (press down the “control” key in the lower-left corner of the keyboard and type “c”).

It is also possible to run the executables from within a Terminal window by typing the program name, but this is a little harder. You will find the Terminal utility available in the Utilities folder in the Applications folder. You do need to have links made in the exe folder to the programs.

This can be done the first time you need them, by entering the exe folder and opening a Terminal window, and then typing source linkmac. This creates the proper links, and thereafter you do not need to do this again. The programs can be run by typing their names in a Terminal window whose current working directory is exe. The programs work well this way, though the programs Drawgram and Drawtree may be slow to open and close plotting windows.

The programs can be terminated by typing control-C or by closing the Terminal window by using the red button in the upper-left corner of the window. One problem we have often encountered using Mac OS X is that it is possible for data files to have the wrong kind of characters at the ends of their lines. This can happen with files transferred from other operating systems or files produced in some word processors.

It results in segmentation-fault or memory errors. If you encounter these, check this possibility carefully. You can find it with the command locate lsregister.

Running the programs on a Unix or Linux system.

Type the name of the program in lower-case letters (such as dnaml).

To terminate the program while it is running, type Control-C (which means to press down on the Ctrl key while typing the letter C). On some systems you may need to type ./ before the program name (for example ./dnaml).

This is mostly needed if the user’s PATH does not include their current directory, something which is often done as a security precaution.

Running the programs on a Macintosh with Mac OS 8 or 9 (deprecated)

We no longer produce and distribute Mac OS 8 and Mac OS 9 executables of the Phylip programs, as we no longer have access to these operating systems to produce and test them. Once you have the executables, you may follow the directions below. Double-click on the icon for the program. A window should open. Further dialog with the program occurs by typing on the keyboard in response to what you see in the window. The programs can be terminated by using the mouse to open the File menu in the upper-left corner of the program’s window area and then select Quit.

Alternatively, you can use the Command-Q key combination. When you use Quit, the program will ask you whether you want to save a file whose name is the program name (often followed by a suffix). This file is simply a record of everything that displayed on the program window, and you usually will not want to save it. Pressing the Enter key or selecting the Do Not Save button with the mouse will keep this from being saved. If you encounter memory limitations on a Mac OS 8 or 9 Macintosh, and determine that this is not due to a problem with the format of the input file, as it often will be, you may be able to solve it by raising the limits of the stack and heap sizes of the program.

To do this click on the program and then select Get Info from the Finder File menu. This will open a window which can be made to show the memory limits of the program. These can be changed by selecting them and typing in larger numbers. This may relieve nagging memory problems. If it does not, consult your local documentation and suspect problems with your input file format.

Running the Drawgram and Drawtree Java interfaces

Looking at available options, it seemed best to use Java to construct GUI interfaces, as this could be done in a reasonably compatible way across all three major platforms. There are disadvantages too: to get full compatibility we need to ask users to download the most recent available Java from its maker, Oracle.

That is not difficult but is a tiresome extra step. Oracle owns Java, and Java is not public-source, but there seems to be no sign that Oracle is going to make Java runtime machinery unavailable or charge for it. So for these two platforms you will need to download Oracle Java. We will give you instructions for that below. The work you do to put a recent version of Oracle Java on your system will make using version 4. For people who use Drawgram or Drawtree in a “pipeline” run by shell scripts, there should be no interruption in your ability to do that.

The current C code for those programs can either be called by the Java GUI or be run from a command line or a shellscript for which see below. Almost all of the features of Drawgram and Drawtree are available from their character-mode menu when run that way, except for the interactive previewing of plots. You can run them by clicking on their icons. Detailed instructions for using the interfaces are given in the general documentation file for tree-drawing programs draw. Installing a recent version of Oracle Java To run the interactive interfaces of the tree-drawing programs Drawgram and Drawtree, you need to have an appropriate version of Java installed on your computer.

If you have Java installed, you should test whether it is an appropriate version by trying to run Drawgram or Drawtree (for this you will need an input tree file present as well). Is it likely that you have a compatible Java on your system? On Windows systems no Java implementation is installed by default. You can download a recent Oracle Java on your Windows system by using this link and following the instructions there.

On some Linux systems there are Java installations which are not compatible with our Java interfaces. This is the result of licensing issues. You can remedy the situation by downloading a recent Oracle Java version and installing it: On Debian-based Linux systems such as Ubuntu and Linux Mint, you can download Java from this link and install it. If you do not have administrator privileges on the Linux system, you can install it in your own folders.

Once a usable version of Java is installed, you do not have to repeat the installation every time you run one of the programs Drawgram or Drawtree.

Running the programs on a Windows machine

A window should open with a menu in it. The programs can be terminated either by typing Control-C (which means to press down on the Ctrl key while typing the letter C), or by using the mouse to open the File menu in the upper-left corner of the program’s window area and then selecting Quit.

The tree-drawing programs Drawtree and Drawgram do allow use of the mouse to select some options. The programs open a window for their menus. This window may be too small for your tastes. It can be resized by tugging on the lower-right corner of the window.

In addition, the font may be too small. One of its tab options allows you to change the font and size of the print. I prefer large font sizes such as 16x The programs can also be run in a Command Prompt window under Windows, in much the same way as they were under the MSDOS operating system, which is what the Command Prompt window emulates.

Command Prompt windows can be opened by choosing that option in the Accessories menu, which is in the All Programs menu. Once in the Command Prompt window, make sure that you are in the correct folder, using the cd command as needed to find the folder where the executable PHYLIP programs are. Then type the name of the program that you want to use in lower-case letters (such as dnaml).

Running the programs in background or under control of a command file

In running the programs, you may sometimes want to put them in background so you can proceed with other work.

On systems with a windowing environment they can be put in their own window, and commands like the Unix and Linux nice command can be used to give them lower priority so that they do not interfere with interactive applications in other windows. This part of the discussion will assume either a Windows system or a Unix or Linux system. I will note when the commands work on one of these systems but not the other.

Mac OS X is actually Unix (surprise!). The Terminal utility can be found in the Utilities folder, which is inside the Applications folder. You will have to put all the responses to the interactive menu of the program into a file and tell the background job to take its input from that file (we cover this below). A command file can either be invoked by clicking on its icon or by typing its name from a Command Prompt window. The file of commands must have a name ending in.

You can run the batch file from a Command window by typing its name such as foofile without the. Below you will find a separate example for Windows.

If you are using Windows you should read that section instead. Suppose you want to run Dnaml in the background, taking its input data from a file called sequences. The file input need only contain two lines: sequences. The usual output file and tree file will also be created by this run (keep that in mind: if you run any other PHYLIP program from the same directory while this one is running in the background, you may overwrite the output file from one program with that from the other!).

If you have problems with creating output files that are too large, you may want to explore carefully the turning off of options in the programs you run.

An example (Windows)

If you have a Windows system and want to run Dnaml in the background, taking its input data from a file called sequences. This “batch file” that has commands and has its name end in. Alternatively, you can open a Command Prompt window yourself. It will be found in the All Programs menu, as one of the options under Accessories.

Make sure that after it opens, you tell it to change its working directory to the one that has the batch file in it. The batch file with this command runs the program with input responses coming from input and interactive output being put into file screenout. The usual output file and tree file will also be created by this run (keep that in mind: if you run any other PHYLIP program from the same directory while this one is running in the background, you may overwrite the output file from one program with that from the other!).
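The background runs described above can also be driven from a short script on either platform. The following is only a sketch in Python, not part of PHYLIP: the helper name run_with_responses is hypothetical, and it assumes the response file is called input and the screen log screenout, as in the examples above.

```python
import subprocess
import sys
from pathlib import Path


def run_with_responses(command, workdir, responses_name="input",
                       screen_name="screenout"):
    """Run `command` inside `workdir`, answering its menu from a file.

    Keyboard responses are read from `responses_name` and everything the
    program would have shown on screen is saved in `screen_name`. Both
    file names are taken relative to `workdir`. Returns the exit code.
    """
    folder = Path(workdir)
    with open(folder / responses_name, "rb") as keys, \
            open(folder / screen_name, "wb") as screen:
        finished = subprocess.run(command, stdin=keys, stdout=screen,
                                  cwd=folder)
    return finished.returncode
```

Assuming a dnaml executable can be found on your path, a call such as run_with_responses(["dnaml"], "my_analysis") would play the same role as the command files above; the usual outfile and outtree would still appear in that folder.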

Testing for existence of files

Note also that when PHYLIP programs attempt to open a new output file (such as outfile, outtree, or plotfile), if they see a file of that name already in existence they will ask you if you want to overwrite it, and offer alternatives including writing to another file, appending information to that file, or quitting the program without writing to the file.

This means that in writing batch files it is important to know whether there will be a prompt of this sort. You must know in advance whether the file will exist. You may want to put in your batch file a command that tests for the existence of a pre-existing output file and, if it exists, removes it, such as these commands in Unix, Linux, or Mac OS X:

    if test -e fubarfile
    then
        rm fubarfile
    fi

You might even want to put in a command that creates a file of that name, so that you can be sure it is there!
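The same housekeeping can be done in a way that does not depend on the operating system. The sketch below uses Python's standard library; the helper name clear_old_outputs is hypothetical, and the default list of file names is simply the three output names mentioned above.

```python
from pathlib import Path


def clear_old_outputs(folder=".", names=("outfile", "outtree", "plotfile")):
    """Delete pre-existing output files and report which ones were removed.

    After this call none of the named files exists, so the program will
    not stop to ask whether an old file should be overwritten.
    """
    removed = []
    for name in names:
        target = Path(folder) / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

Calling this before each batch run means the file of keyboard responses never needs an answer to the overwrite question.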

Either way, you will then know whether to put into your file of keyboard responses the proper response to the inquiry about overwriting that output file. Offhand, I do not know how to test for the existence of files in Windows, but I suspect that there is a way.

Prototyping keyboard response files

Making the proper files of keyboard responses for use with command files is most easily done if you prototype the process by simply running the program and keeping a careful record of the keyboard responses that you need to give to get the program to run properly.

Then create a file in an editor and type those keyboard responses into it. Thus if the program requires that you answer a question about what to do with the output file with a keyboard response of R, then wants you to type a menu selection of U (to have it use a User tree), then wants you to answer Y to end the menu, and another R to tell it to replace the output file, you would have the file of keyboard responses be

    R
    U
    Y
    R

Since, when you run the program interactively, each keyboard response is ended by pressing the Enter key on your keyboard, in the file of keyboard responses you must end each line after typing the appropriate character.
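If you generate response files from a script rather than an editor, the one rule to respect is that every response, including the last, is followed by a line ending. A minimal sketch, with the hypothetical helper names responses_text and write_responses:

```python
def responses_text(responses):
    """Return the file contents for a list of keyboard responses.

    Each response sits on its own line and every line, including the
    last one, ends with a newline (the equivalent of pressing Enter).
    """
    return "".join(f"{answer}\n" for answer in responses)


def write_responses(path, responses):
    """Write the responses to `path` as plain text."""
    with open(path, "w", newline="\n") as handle:
        handle.write(responses_text(responses))
```

For the example above, write_responses("input", ["R", "U", "Y", "R"]) would produce the four-line file shown.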

Testing the keyboard responses with an interactive run will be essential to having batch runs succeed. You can use a word processor or text editor to prepare them yourself, or you can use a program that produces a PHYLIP-format output.

With the 3. Within this TestData directory there is a subdirectory that has the name of the program for example contrast and within that there are the files contrastinfile. If you look at the Contrast documentation you can see infile , intree , and outfile mentioned in the example. Many word processors such as Microsoft Word save their files in a format that contains unprintable characters, unless you tell them not to.

In the Microsoft Word family of word processors, the first time you edit a file, when you go to Save in the File menu, the program will instead do a Save As function, and ask you in what format you want the file to be written. If you are using Microsoft Word, choose Plain Text. The settings that start with Western European also should work. None of the other encodings are likely to work. Do not choose Unicode Text Document. Once that is done, TextEdit also has a checkbox in the Save As window that defaults to providing a.

Save As also may have a check box that defaults to hiding the three-letter extension of the file, so that when the file is saved as say foofile. It is best to uncheck that box. For these word processors, the next time you edit the same file, using Save , the program should use those settings without asking you.

If you have some trouble getting an input file that the programs can read, look into whether you properly set these options. This can usually be done by using the Save As choice in the File menu and making the right settings. Text editors such as the vi and emacs editors on Unix and Linux (and available on Mac OS X too), or the pico editor that comes with the pine mailer program, produce their files in Text Only format and should not cause any trouble.

The format of the input files is discussed below, and you should also read the other PHYLIP documentation relevant to the particular type of data that you are using, and the particular programs you want to run, as there will be more details there.

The programs interact with the user by presenting a menu. Aside from the user’s choices from the menu, they read all other input from files. These files have default names. The program will try to find a file of that name – if it does not, it will ask the user to supply the name of that file.

Input data such as DNA sequences comes from a file whose default name is infile. If the user supplies a tree, this is in a file whose default name is intree. Values of weights for the characters are in weights, and the tree plotting programs need some digitized fonts which are supplied in fontfile (all these are default names).

Where the files are

When you run a program, you are in a current folder. If you run it by clicking on an icon, the folder is the one that has the icon. If you run it by typing the name of the program, the folder is the current folder when you do that.

The program will look for default files such as infile and intree in that folder. When it writes files, their default locations are also in the current folder. The program need not actually be in the current folder. An icon can sometimes be a link to a program located elsewhere. The operating system maintains a default path for your account, which is a series of names of folders. When you type the name of a program, the operating system will look in that series of folders until it finds the program, and then run it.

But in all of these cases, the input and output files will, by default, be in the current folder, even if the program is located in some other folder. Users can change where the input files are, or where the output files go.

If no file called infile is found in the current folder, you will be asked to type the name of the file. A similar process occurs when the program cannot find file intree. When the program starts to write an output file, such as outfile , a similar series of events happens, with one important difference.

It is when a file outfile already exists in the current folder that the user will be asked what to do. In the case of input files, it was when they did not exist that the user is asked what to do. You will be given the opportunity to Replace the file, Append to the file, write to a different File, or Quit. Understanding which folder is the current folder, and whether there are files named infile , intree , outfile , or outtree there, is crucial to successfully running PHYLIP programs, and making sure that they analyze the correct data set and write their files in the right place.

Data file format

I have tried to adhere to a rather stereotyped input and output format. These are in free format, separated by blanks.

The information for each species follows, starting with a ten-character species name (which can include blanks and some punctuation marks), and continuing with the characters for that species. The name should be on the same line as the first character of the data for that species. I will use the term “species” for the tips of the trees, recognizing that in some cases these will actually be populations or individual gene sequences. The name should be ten characters in length, and either terminated by a Tab character or filled out to the full ten characters by blanks if shorter.

If you forget to extend the names to ten characters in length by blanks, and do not terminate them with a Tab character, the program will get out of synchronization with the contents of the data file, and an error message will result.

A Tab character that terminates a name will not be taken as part of the name that is read; the name will then automatically be filled with blanks to a total length of 10 characters. In the discrete-character programs, DNA sequence programs and protein sequence programs the characters are each a single letter or digit, sometimes separated by blanks.
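The name rule can be written out in a few lines. This is a sketch of the rule as stated here, not the code PHYLIP itself uses; the helper name split_name is hypothetical.

```python
NAME_WIDTH = 10


def split_name(line):
    """Split the first line of a species' data into (name, characters).

    A Tab within the first ten columns ends the name early and is not
    part of it; otherwise the name is exactly the first ten columns.
    The returned name is always padded with blanks to ten characters.
    """
    head = line[:NAME_WIDTH]
    if "\t" in head:
        name, rest = line.split("\t", 1)
    elif len(line) >= NAME_WIDTH:
        name, rest = head, line[NAME_WIDTH:]
    else:
        raise ValueError("name is shorter than ten characters and has no Tab")
    return name.ljust(NAME_WIDTH), rest
```

A line that is neither padded nor Tab-terminated raises an error here; in the real programs the symptom is usually that reading gets out of step with the file.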

In the continuous-characters programs they are real numbers with decimal points, separated by blanks: Latimeria 2. The molecular sequence programs can take the data in “aligned” or “interleaved” format, in which we first have some lines giving the first part of each of the sequences, then some lines giving the next part of each, and so on.

The blank line which separates the two groups of lines the ones containing sites and ones containing sites may or may not be present. It is important that the number of sites in each group be the same for all species i. Alternatively, an option can be selected in the menu to take the data in “sequential” format, with all of the data for the first species, then all of the characters for the next species, and so on. This is also the way that the discrete characters programs and the gene frequencies and quantitative characters programs want to read the data.

They do not allow the interleaved format. In the sequential format, the character data can run on to a new line at any time, except in the middle of a species name or, in the case of continuous character and distance matrix programs, in the middle of a real number. Thus it is legal to have: Archaeopt or even: Archaeopt though note that the full ten characters of the species name must then be present: in the above case there must be a blank after the “t”.

In all cases it is possible to put internal blanks between any of the character values, so that Archaeopt is allowed. Note that you can convert molecular sequence data between the interleaved and the sequential data formats by using the Rewrite option of the J menu item in Seqboot. If you make an error in the format of the input file, the programs can sometimes detect that they have been fed an illegal character or illegal numerical value and issue an error message such as BAD CHARACTER STATE: , often printing out the bad value, and sometimes the number of the species and character in which it occurred.

The program will then stop shortly after. One of the things which can lead to a bad value is the omission of something earlier in the file, or the insertion of something superfluous, which cause the reading of the file to get out of synchronization.

The program then starts reading things it didn’t expect, and concludes that they are in error. So if you see this error message, you may also want to look for the earlier problem that may have led to the program becoming confused about what it is reading.
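To make the interleaved layout described earlier concrete, here is a small reader written as a sketch. It is not PHYLIP's own input code. It assumes that the first line of the file gives the number of species and the number of sites, that names occupy exactly ten columns (Tab-terminated names are not handled), and that blank lines between groups are optional. The function name parse_interleaved is hypothetical.

```python
def parse_interleaved(text, name_width=10):
    """Read interleaved sequence data into a {name: sequence} dictionary.

    Raises ValueError when the line count or sequence lengths do not
    match the header, which is how a file that has got out of step
    with its own header shows up here.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_species, n_sites = (int(token) for token in lines[0].split()[:2])
    body = lines[1:]
    if not body or len(body) % n_species != 0:
        raise ValueError("data lines are not a multiple of the species count")

    names = []
    pieces = [[] for _ in range(n_species)]
    for index, line in enumerate(body):
        if index < n_species:
            # Only the first group of lines carries the species names.
            names.append(line[:name_width].rstrip())
            line = line[name_width:]
        # Blanks inside the character data are ignored.
        pieces[index % n_species].append("".join(line.split()))

    sequences = {}
    for name, parts in zip(names, pieces):
        sequence = "".join(parts)
        if len(sequence) != n_sites:
            raise ValueError(f"{name!r} has {len(sequence)} sites, "
                             f"expected {n_sites}")
        sequences[name] = sequence
    return sequences
```

Writing the resulting sequences back out one species at a time gives the sequential layout.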

Some options are described below, but you should also read the documentation for the groups of the programs and for the individual programs.

The Menu

The menu is straightforward. It typically looks like this (this one is for Dnapars):

    DNA parsimony algorithm, version 3.

                                         Yes
      S  Search option?                  More thorough search
      V  Number of trees to save?
                                         Use input order
      O  Outgroup root?                  No, use as outgroup species 1
      T  Use Threshold parsimony?        No, use ordinary parsimony
      N  Use Transversion parsimony?     No, count all steps
      W  Sites weighted?                 No
      M  Analyze multiple data sets?     No
      I  Input sequences interleaved?    Yes

      Y to accept these or type the letter for one to change

If you want to accept the default settings (they are shown in the above case) you can simply type Y followed by pressing the Enter key.

If you want to change any of the options, you should type the letter shown to the left of its entry in the menu. For example, to set a threshold type T. Lower-case letters will also work. For many of the options the program will ask for supplementary information, such as the value of the threshold.

Note the Terminal type entry, which you will find on all menus. It allows you to specify which type of terminal your screen is.

Choosing zero (0) toggles among these three options in cyclical order, changing each time the 0 option is chosen. If one of them is right for your terminal the screen will be cleared before the menu is displayed. If none works, the none option should probably be chosen. The programs should start with a terminal option appropriate for your computer, but if they do not, you can change the terminal type manually.

This is particularly important in program Retree where a tree is displayed on the screen – if the terminal type is set to the wrong value, the tree can look very strange. The other numbered options control which information the program will display on your screen or on the output files. The option to Print indications of progress of run will show information such as the names of the species as they are successively added to the tree, and the progress of rearrangements.

You will usually want to see these as reassurance that the program is running and to help you estimate how long it will take. But if you are running the program “in background” as can be done on multitasking and multiuser systems, and do not have the program running in its own window, you may want to turn this option off so that it does not disturb your use of the computer while the program is running.

Note also menu option 3, “Print out tree”. This can be useful when you are running many data sets, and will be using the resulting trees from the output tree file. It may be helpful to turn off the printing out of the trees in that case, particularly if those files would be too big.

The Output File

Most of the programs write their output onto a file called (usually) outfile, and a representation of the trees found onto a file called outtree.

The exact contents of the output file vary from program to program and also depend on which menu options you have selected. For many programs, if you select all possible output information, the output will consist of (1) the name of the program and its version number, (2) some of the input information printed out, and (3) a series of phylogenies, some with associated information indicating how much change there was in each character or on each part of the tree.

The numbers at the forks are arbitrary and are used if present merely to identify the forks. For many of the programs the tree produced is unrooted. Rooted and unrooted trees are printed in nearly the same form, but the unrooted ones are accompanied by the warning message: remember: this is an unrooted tree! Mathematicians still call an unrooted tree a tree, though some systematists unfortunately use the term “network” for an unrooted tree. This conflicts with standard mathematical usage, which reserves the name “network” for a completely different kind of graph.

The root of this tree could be anywhere, say on the line leading immediately to Mouse. It is important also to realize that the lengths of the segments of the printed tree may not be significant: some may actually represent branches of zero length, in the sense that there is no evidence that those branches are nonzero in length.

Some of the diagrams of trees attempt to print branches approximately proportional to estimated branch lengths, while in others the lengths are purely conventional and are presented just to make the topology visible. You will have to look closely at the documentation that accompanies each program to see what it presents and what is known about the lengths of the branches on the tree. The above tree attempts to represent branch lengths approximately in the diagram.

But even in those cases, some of the smaller branches are likely to be artificially lengthened to make the tree topology clearer.

When a tree has branch lengths, it will be accompanied by a table showing for each branch the numbers or names of the nodes at each end of the branch, and the length of that branch. For the first tree shown above, the corresponding table is: Between And Length Approx. Confidence Limits 1 Bovine 0.

Similar tables exist in distance matrix and likelihood programs, as well as in the parsimony programs Dnapars and Pars. Some of the parsimony programs in the package can print out a table of the number of steps that different characters or sites require on the tree.

This table may not be obvious at first. Thus site 23 is column “3” of row “20” and has 1 step in this case. There are many other kinds of information that can appear in the output file. They vary from program to program, and we leave their description to the documentation files for the specific programs.

The Tree File

In output from most programs, a representation of the tree is also written into the tree file outtree.

The tree is specified by nested pairs of parentheses, enclosing names and separated by commas. We will describe how this works below. Trailing blanks in the name may be omitted. The pattern of the parentheses indicates the pattern of the tree by having each pair of parentheses enclose all the members of a monophyletic group.

The tree file could look like this:

    ((Mouse,Bovine),(Gibbon,(Orang,(Gorilla,(Chimp,Human)))));

In this tree the first fork separates the lineage leading to Mouse and Bovine from the lineage leading to the rest. Within the latter group there is a fork separating Gibbon from the rest, and so on. The entire tree is enclosed in an outermost pair of parentheses. The tree ends with a semicolon. In some programs such as Dnaml, Fitch, and Contml, the tree will be unrooted. The single three-way split corresponds to one of the interior nodes of the unrooted tree (it can be any interior node of the tree).

The remaining forks are encountered as you move out from that first node. Some of the newer programs are able to tolerate these other forks being multifurcations (multi-way splits). You should check the documentation files for the particular programs you are using to see in which of these forms the user tree is expected to be. Note that many of the programs that actually estimate an unrooted tree (such as Dnapars) produce trees in the treefile in rooted form! This is done for reasons of arbitrary internal bookkeeping.

The placement of the root is arbitrary. We are working toward having all programs be able to read all trees, whether rooted or unrooted, multifurcating or bifurcating, and having them do the right thing with them. But this is a long-term goal and it is not yet achieved. For programs that infer branch lengths, these are given in the trees in the tree file as real numbers following a colon, and placed immediately after the group descended from that branch. Here is a typical tree with branch lengths: cat These representations of trees are a subset of the standard adopted on 24 June at the annual meetings of the Society for the Study of Evolution by an informal committee (its final session was in Newick’s lobster restaurant, hence its name, the Newick standard) consisting of Wayne Maddison (author of MacClade), David Swofford (PAUP), F.

Day, and me. This standard is a generalization of PHYLIP’s format, itself based on a well-known representation of trees in terms of parenthesis patterns which is due to the famous mathematician Arthur Cayley, and which has been around for over a century. The standard is now employed by most phylogeny computer programs but unfortunately has yet to be described in a formal published description. Options are selected in the menu.
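The parenthesis notation described above, with its optional branch lengths after a colon, can be read with a short recursive routine. The sketch below is an illustration only and covers just the features mentioned in this section; it removes all white space first, so names containing blanks are not supported. The names parse_newick and tip_names are hypothetical.

```python
def parse_newick(text):
    """Parse one tree into nested dicts with name, length and children."""
    s = "".join(text.split())
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1                      # step over "("
            children.append(node())
            while peek() == ",":
                pos += 1                  # step over ","
                children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # step over ")"
        name = read_until(",():;")
        length = None
        if peek() == ":":
            pos += 1
            length = float(read_until(",();"))
        return {"name": name, "length": length, "children": children}

    root = node()
    if peek() != ";":
        raise ValueError("tree must end with a semicolon")
    return root


def tip_names(tree):
    """List the names at the tips, in the order they appear in the file."""
    if not tree["children"]:
        return [tree["name"]]
    names = []
    for child in tree["children"]:
        names.extend(tip_names(child))
    return names
```

Each pair of parentheses becomes one dictionary whose children are the members of that group, so a multifurcation is simply a node with more than two children.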

Common options in the menu

A number of the options from the menu, the U (User tree), G (Global), J (Jumble), O (Outgroup), W (Weights), T (Threshold), M (multiple data sets), and the tree output options, are used so widely that it is best to discuss them in this document.

The U (User tree) option. This option toggles between the default setting, which allows the program to search for the best tree, and the User tree setting, which reads a tree or trees (“user trees”) from the input tree file and evaluates them.

The input tree file’s default name is intree. In many cases the programs will also tolerate having the trees be preceded by a line giving the number of trees:

    3
    ((Alligator,Bear),((Cow,(Dog,Elephant)),Ferret));
    ((Alligator,Bear),(((Cow,Dog),Elephant),Ferret));
    ((Alligator,Bear),((Cow,Dog),(Elephant,Ferret)));

An initial line with the number of trees was formerly required, but this now can be omitted. Some programs require rooted trees, some unrooted trees, and some can handle multifurcating trees.

You should read the documentation for the particular program to find out which it requires. Program Retree can be used to convert trees among these forms (on saving a tree from Retree, you are asked whether you want it to be rooted or unrooted). In using the user tree option, check the pattern of parentheses carefully. The programs do not always detect whether the tree makes sense, and if it does not there will probably be a crash (hopefully, but not inevitably, with an error message indicating the nature of the problem).
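A quick mechanical check of the parentheses before a run can save a crash. The following sketch only counts parentheses and looks for the final semicolon; it does not check that the species names match your data. The function name check_parentheses is hypothetical.

```python
def check_parentheses(tree_text):
    """Return a list of problems in the parenthesis pattern (empty if none)."""
    problems = []
    depth = 0
    for position, char in enumerate(tree_text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {position}")
                depth = 0
    if depth > 0:
        problems.append(f"{depth} '(' never closed")
    if not tree_text.rstrip().endswith(";"):
        problems.append("missing final semicolon")
    return problems
```

An empty list means the pattern is at least balanced; it is not a guarantee that the program will accept the tree.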

Trees written out by programs are typically in the proper form.

The G (Global) option. In the programs which construct trees except for Neighbor, the ” In most of these programs the rearrangements are automatically global, which in this case means that subtrees will be removed from the tree and put back on in all possible ways so as to have a better chance of finding a better tree. Since this can be time consuming (it roughly triples the time taken for a run) it is left as an option in some of the programs, specifically Contml, Fitch, Dnaml and Proml.

In these programs the G menu option toggles between the default of local rearrangement and global rearrangement. The rearrangements are explained more below.

The J (Jumble) option. In most of the tree construction programs except for the ” In these programs the J option enables you to tell the program to use a random number generator to choose the input order of species. This option is toggled on and off by selecting option J in the menu. The program will then prompt you for a “seed” for the random number generator.

Each different seed leads to a different sequence of addition of species. By simply changing the random number seed and re-running the programs one can look for other, and better trees. If the seed entered is not odd, the program will not proceed, but will prompt for another seed.

The Jumble option also causes the program to ask you how many times you want to restart the process. If you answer 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run).

Some people have asked what are good values of the random number seed. The random number seed is used to start a process of choosing “random” (actually pseudorandom) numbers, which behave as if they were unpredictably randomly chosen between 0 and 2^32 - 1 (which is 4,294,967,295). You could put in the number and find that the next random number was ,, However if you re-use a random number seed, the sequence of random numbers that result will be the same as before, resulting in exactly the same series of choices, which may not be what you want.
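The behaviour of a seeded jumble can be imitated in a few lines. This sketch uses Python's own random number generator, not the one inside PHYLIP, so the orders it produces will not match those of the programs; it only illustrates the odd-seed rule and the fact that the same seed reproduces the same series of orders. The function name jumbled_orders is hypothetical.

```python
import random


def jumbled_orders(species, seed, times):
    """Return `times` shuffled copies of `species`, driven by `seed`.

    An even seed is rejected, mirroring the rule that the seed must be
    odd. Re-using a seed gives exactly the same orders again.
    """
    if seed % 2 == 0:
        raise ValueError("the random number seed must be odd")
    generator = random.Random(seed)
    orders = []
    for _ in range(times):
        order = list(species)
        generator.shuffle(order)
        orders.append(order)
    return orders
```

To search from genuinely different starting orders, change the seed between runs rather than re-using it.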

The O Outgroup option. This specifies which species is to have the root of the tree be on the line leading to it. For example, if the outgroup is a species “Mouse” then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other.

This option is toggled on and off by choosing O in the menu (the alphabetic character O, not the digit 0). When it is on, the program will then prompt for the number of the outgroup (the species being taken in the numerical order that they occur in the input file).

Responding by typing 6 and then an Enter character indicates that the sixth species in the data (the 6th in the first set of data if there are multiple data sets) is taken as the outgroup. Outgroup-rooting will not be attempted if the data have already established a root for the tree from some other consideration, and may not be if it is a user-defined tree, despite your invoking the option.

Thus programs such as Dollop that produce only rooted trees do not allow the Outgroup option. It is also not available in Kitsch, Dnamlk, Promlk or Clique. When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form.

The T (Threshold) option. This sets a threshold for the parsimony programs such that if the number of steps counted in a character is higher than the threshold, it will be taken to be the threshold value rather than the actual number of steps. The default is a threshold so high that it will never be surpassed (in which case the steps will simply be counted).

The T menu option toggles on and off asking the user to supply a threshold. The use of thresholds to obtain methods intermediate between parsimony and compatibility methods is described in my b paper. When the T option is in force, the program will prompt for the numerical threshold value.

This will be a positive real number greater than 1. In programs Dollop, Dolmove, and Dolpenny the threshold should never be 0. The T option is an important and underutilized one: it is, for example, the only way in this package except for program Dnacomp to do a compatibility analysis when there are missing data. It is a method of de-weighting characters that evolve rapidly.

I wish more people were aware of its properties. The M Multiple data sets option. In menu programs there is an M menu option which allows one to toggle on the multiple data sets option. The program will ask you how many data sets it should expect. The data sets have the same format as the first data set.

Using the program Seqboot one can take any DNA, protein, restriction sites, gene frequency or binary character data set and make multiple data sets by bootstrapping. Trees can be produced for all of these using the M option. They will be written on the tree output file if that option is left in force. Then the program Consense can be used with that tree file as its input file. The result is a majority rule consensus tree which can be used to make confidence intervals.

The present version of the package allows, with the use of Seqboot and Consense and the M option, bootstrapping of many of the methods in the package. Programs Dnaml, Dnapars and Pars can also take multiple weights instead of multiple data sets.

They can then do bootstrapping by reading in one data set, together with a file of weights that show how the characters or sites are reweighted in each bootstrap sample. Thus a site that is omitted in a bootstrap sample has effectively been given weight 0, while a site that has been duplicated has effectively been given weight 2. Seqboot has a menu selection to produce the file of weights information automatically, instead of producing a file of multiple data sets. It can be renamed and used as the input weights file.
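The link between a bootstrap sample and a set of weights can be sketched as follows. This is an illustration of the idea only, using Python's random number generator rather than Seqboot's, and the function name bootstrap_weights is hypothetical.

```python
import random
from collections import Counter


def bootstrap_weights(n_sites, generator=None):
    """Draw n_sites sites with replacement and return one weight per site.

    The weight of a site is the number of times it was drawn: 0 if the
    sample omits it, 2 if it was drawn twice, and so on.
    """
    generator = generator or random.Random()
    draws = Counter(generator.randrange(n_sites) for _ in range(n_sites))
    return [draws[site] for site in range(n_sites)]
```

The weights in each sample always add up to the number of sites, which is why a weights file can stand in for a resampled data set.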

The W Weights option. This signals the program that, in addition to the data set, you want to read in a series of weights that tell how many times each character is to be counted. If the weight for a character is zero 0 then that character is in effect to be omitted when the tree is evaluated. If it is 1 the character is to be counted once.

Some programs allow weights greater than 1 as well. These have the effect that the character is counted as if it were present that many times, so that a weight of 4 means that the character is counted 4 times.

The values 0-9 give weights 0 through 9, and the values A-Z give weights 10 through 35. By use of the weights we can give overwhelming weight to some characters, and drop others from the analysis. In the molecular sequence programs only two values of the weights, 0 or 1, are allowed. The weights are used to analyze subsets of the characters, and can also be used for resampling of the data, as in bootstrap and jackknife resampling. For those programs that allow weights to be greater than 1, they can also be used to emphasize information from some characters more strongly than others.

Of course, you must have some rationale for doing this. The weights are provided as a simple string of digits in an input file whose default name is weights. Blanks in the weights file are skipped over and ignored, and the weights can continue to a new line. In programs such as Seqboot that can also output a file of weights, the input weights file has a default name of inweights, and the output weights file has a default name of outweights.
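A minimal sketch of reading such a weights string, written in Python for illustration. It assumes the mapping described above (0-9 for weights 0 through 9, A for 10 and so on up the alphabet, so that Z is 35) and that only upper-case letters are accepted. The function name `parse_weights` is hypothetical and is not part of the package.

```python
import string

# Weight characters in order of value: "0".."9" then "A".."Z".
WEIGHT_VALUES = {ch: i for i, ch in enumerate(string.digits + string.ascii_uppercase)}


def parse_weights(text):
    """Turn the contents of a weights file into a list of integers.
    Blanks and line breaks are skipped, as described in the text."""
    weights = []
    for ch in text:
        if ch.isspace():
            continue
        if ch not in WEIGHT_VALUES:
            raise ValueError(f"not a weight character: {ch!r}")
        weights.append(WEIGHT_VALUES[ch])
    return weights


print(parse_weights("1001 1A\n0Z"))
```

Remember that the molecular sequence programs accept only 0 and 1, so a reader for those would reject anything else.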

Weights can be used to analyze different subsets of characters by weighting the rest as zero. Alternatively, in the discrete characters programs they can be used to force a certain group to appear on the phylogeny (in effect confining consideration to only phylogenies containing that group). This is done by adding an imaginary character that has 1's for the members of the group, and 0's for all the other species. That imaginary character is then given the highest weight possible: the result will be that any phylogeny that does not contain that group will be penalized by such a heavy amount that it will not (except in the most unusual circumstances) be considered.

Of course, the new character brings extra steps to the tree, but the number of these can be calculated in advance and subtracted out of the total when reporting the results.
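The trick of forcing a group can be sketched as follows. This is an illustrative Python helper, not something supplied with the package. The name `force_group` and the choice of "Z" as the highest weight character are assumptions; check what the program you are using allows.

```python
def force_group(data, weights, group, top_weight="Z"):
    """Append an imaginary 0/1 character that is 1 for members of `group`
    and 0 for every other species, and give it the heaviest weight.

    data    : dict mapping species name -> string of 0/1 characters
    weights : string with one weight character per existing character
    """
    new_data = {
        name: chars + ("1" if name in group else "0")
        for name, chars in data.items()
    }
    return new_data, weights + top_weight


data = {"Alpha": "0110", "Beta": "0100", "Gamma": "1001", "Delta": "1011"}
new_data, new_weights = force_group(data, "1111", {"Alpha", "Beta"})
for name, chars in new_data.items():
    print(name, chars)
print(new_weights)
```

On a tree that contains the group, the imaginary character needs a single change. The steps to subtract from the reported total would then be one change times the weight given to that character.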

This use of weights is an important one, and one sadly ignored by many users who could profit from it.

In the case of molecular sequences we cannot use weights this way, so that to force a given group to appear we have to add a large extra segment of sites to the molecule, with say A’s for that group and C’s for every other species.

The option to write out the trees into a tree file. This specifies that you want the program to write out the tree not only on its usual output, but also onto a file in nested-parenthesis notation as described above.

This option is sufficiently useful that it is turned on by default in all programs that allow it. You can turn it off if you wish, by typing the appropriate number from the menu (it varies from program to program). This option is useful for creating tree files that can be directly read into the programs, including the consensus tree and tree distance programs, and the tree plotting programs.

The output tree file has a default name of outtree.

The 0 (terminal type) option. This is the digit 0, not the alphabetic character O. It affects the ability of the programs to clear the screen when they display their menus, and the graphics characters used to display trees in the programs Dnamove, Move, Dolmove, and Retree.

The Algorithm for Constructing Trees. All of the programs except Factor, Dnadist, Gendist, Dnainvar, Seqboot, Contrast, Retree, and the plotting and consensus tree programs act to construct an estimate of a phylogeny.

Move, Dolmove, and Dnamove let you construct it yourself by hand. All of the rest but Neighbor, the "penny" programs (Penny, Dolpenny, and Dnapenny), and Clique share a common search strategy. They are trying to minimize or maximize some quantity over the space of all possible evolutionary trees.

Each program contains a part that, given the topology of the tree, evaluates the quantity that is being minimized or maximized. The straightforward approach would be to evaluate all possible tree topologies one after another and pick the one which, according to the criterion being used, is best.

This would not be possible for more than a small number of species, since the number of possible tree topologies is enormous. A review of the literature on the counting of evolutionary trees will be found in one of my papers (Felsenstein, a) and in my book (Felsenstein, chapter 3). Since we cannot search all topologies, these programs are not guaranteed to always find the best tree, although they seem to do quite well in practice.
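To give a feeling for "enormous", here is a short Python calculation. It uses the standard count of rooted bifurcating trees with labeled tips, which is the product of the odd numbers 1 x 3 x 5 x ... x (2n-3). The function name is invented for the example.

```python
def rooted_bifurcating_topologies(n):
    """Number of rooted bifurcating tree topologies for n labeled species:
    the product 1 * 3 * 5 * ... * (2n - 3)."""
    count = 1
    for k in range(3, n + 1):
        count *= 2 * k - 3      # factor contributed by the k-th species
    return count


for n in (3, 4, 5, 10, 20):
    print(n, rooted_bifurcating_topologies(n))
```

With ten species there are already more than 34 million topologies, and with twenty the count runs to 22 digits, which is why an exhaustive search is out of the question.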

The strategy they employ is as follows: the species are taken in the order in which they appear in the input file. The first two in some programs the first three are taken and a tree constructed containing only those. There is only one possible topology for this tree. Then the next species is taken, and we consider where it might be added to the tree. If the initial tree is say a rooted tree with two species and we want the resulting three-species tree to be a bifurcating tree, there are only three places where we could add the third species.

Each of these is tried, and each time the resulting tree is evaluated according to the criterion. The best one is chosen to be the basis for further operations. Now we consider adding the fourth species, again at each of the five possible places that would result in a bifurcating tree.
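The addition step can be sketched in Python as below. This is a simplified illustration, not the code in any of the programs. Rooted trees are written as nested pairs, the criterion is a small parsimony count by Fitch's method chosen only as a stand-in, and the data set is invented. The rearrangement steps that the real programs carry out are left out.

```python
def insertions(tree, new):
    """Yield each rooted bifurcating tree made by attaching `new`
    to one branch of `tree` (trees are nested 2-tuples of names)."""
    yield (tree, new)                    # attach on the branch above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, new):
            yield (t, right)
        for t in insertions(right, new):
            yield (left, t)


def fitch_steps(tree, data):
    """Stand-in criterion: parsimony steps summed over all sites."""
    n_sites = len(next(iter(data.values())))
    total = 0
    for site in range(n_sites):
        def states(node):
            nonlocal total
            if not isinstance(node, tuple):
                return {data[node][site]}
            a, b = states(node[0]), states(node[1])
            if a & b:
                return a & b
            total += 1                   # disjoint state sets cost one change
            return a | b
        states(tree)
    return total


def sequential_addition(species, data):
    tree = (species[0], species[1])      # only one topology for two species
    for sp in species[2:]:
        # min() keeps the first of several tied candidates
        tree = min(insertions(tree, sp), key=lambda t: fitch_steps(t, data))
    return tree


data = {"A": "AAGG", "B": "AAGC", "C": "CCTT", "D": "CCTA"}   # invented data
best = sequential_addition(["A", "B", "C", "D"], data)
print(best, fitch_steps(best, data))
```

Calling `insertions` on a two-species tree gives three candidates, and on a three-species tree gives five, matching the counts in the text.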

Again, the best of these is accepted. This is usually known as the Sequential Addition strategy.

Local rearrangements. The process continues in this manner, with one important exception. After each species is added, and before the next is added, a number of rearrangements of the tree are tried, in an effort to improve it.

The algorithms move through the tree, making all possible local rearrangements of the tree. A local rearrangement involves an internal segment of the tree: two subtrees hang from one end of the segment and a third is attached at the other end, and the rearrangement switches that third subtree with one of the first two. Each time a local rearrangement is successful in finding a better tree, the new arrangement is accepted.

The phase of local rearrangements does not end until the program can traverse the entire tree, attempting local rearrangements, without finding any that improve the tree. This strategy of adding species and making local rearrangements will look at about (n-1)(2n-3) different topologies, though if rearrangements are frequently successful the number may be larger. I have been describing the strategy when rooted trees are being considered.
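The rearrangement phase can be sketched in Python along the following lines. It is an illustration under stated assumptions: rooted trees are nested pairs, a local rearrangement is taken to be the exchange of a subtree across an internal segment (a nearest-neighbor interchange), and `toy_score` is an invented criterion that simply counts how many wanted groups are missing from the tree.

```python
def local_rearrangements(tree):
    """Yield the rooted trees one local rearrangement away from `tree`."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if isinstance(right, tuple):
        a, b = right
        yield ((left, a), b)
        yield ((left, b), a)
    for t in local_rearrangements(left):
        yield (t, right)
    for t in local_rearrangements(right):
        yield (left, t)


def tips(tree):
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return tips(tree[0]) | tips(tree[1])


def clades(tree):
    if not isinstance(tree, tuple):
        return set()
    return {tips(tree)} | clades(tree[0]) | clades(tree[1])


def toy_score(tree):
    """Invented criterion (smaller is better): wanted groups not on the tree."""
    wanted = {frozenset("AB"), frozenset("CD")}
    return len(wanted - clades(tree))


def rearrange(tree, score):
    best = score(tree)
    improved = True
    while improved:                      # stop after a pass with no improvement
        improved = False
        for candidate in local_rearrangements(tree):
            s = score(candidate)
            if s < best:                 # accept only a strictly better tree
                tree, best, improved = candidate, s, True
                break                    # start a new pass from the new tree
    return tree, best


print(rearrange(((("A", "C"), "B"), "D"), toy_score))
```

Because a candidate is accepted only when it is strictly better, the score falls at every acceptance and the loop must stop.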

For unrooted trees there is a precisely similar strategy, though the first tree constructed may be a three-species tree and the rearrangements may not start until after the addition of the fifth species. Though we are not guaranteed to have found the best tree topology, we are guaranteed that no nearby topology (i.e., none reachable by a single local rearrangement) is better. In this sense we have reached a local optimum of our criterion. Note that the whole process is dependent on the order in which the species are present in the input file.

We can try to find a different and better solution by reordering the species in the input file and running the program again or, more easily, by using the J option. If none of these attempts finds a better solution, then we have some indication that we may have found the best topology, though we can never be certain of this.

Note also that a new topology is never accepted unless it is better than the previous one, so that the rearrangement process can never fall into an endless loop. This is also the way ties in our criterion are resolved, namely by sticking with the tree found first. However, the tree construction programs other than Clique, Contml, Fitch, and Dnaml do keep a record of all trees found that are tied with the best one found.

This gives you some immediate idea of which parts of the tree can be altered without affecting the quality of the result.

Global rearrangements. In some of the programs global rearrangement is a menu option; in the others it applies automatically. When it is present there is an additional stage to the search for the best tree. Each possible subtree is removed from the tree and added back in all possible places.

This process continues until all subtrees can be removed and added again without any improvement in the tree. The purpose of this extra rearrangement is to make it less likely that one or more species get "stuck" in a suboptimal region of the space of all possible trees. The use of global optimization results in approximately a tripling (3x) of the run time, which is why I have left it as an option in some of the slower programs.

My book (Felsenstein, chapter 4) contains a review of work on these and other rearrangements and search methods. The programs doing global optimization print out a dot (".") each time a group is removed and re-added, as a sign that the run is proceeding. A new line of dots is started whenever a new round of global rearrangements is started following an improvement in the tree.

On the line before the dots are printed there is printed a bar, delimited by "!" characters, against which the length of the row of dots can be judged. The dots will not be printed out at a uniform rate: the later dots, which represent removal of larger groups from the tree and trying them consequently in fewer places, will print out more quickly.

With some compilers each row of dots may not be printed out until it is complete. It should be noted that Penny, Dolpenny, Dnapenny and Clique use a more sophisticated strategy of “depth-first search” with a “branch and bound” search method that guarantees that all of the best trees will be found. In the case of Penny, Dolpenny and Dnapenny there can be a considerable sacrifice of computer time if the number of species is greater than about ten: it is a matter for you to consider whether it is worth it for you to guarantee finding all the most parsimonious trees, and that depends on how much free computer time you have!

Clique finds all largest cliques, and does so without undue burning of computer time. Although all of these problems that have been investigated fall into the category of “NP-hard” problems that in effect do not have a rapid solution, the cases that cause this trouble for the largest-cliques algorithm in Clique apparently are not biologically realistic and do not occur in actual data.

Multiple jumbles. As just mentioned, for most of these programs the search depends on the order in which the species are entered into the tree. Using the J (Jumble) option you can supply a random number seed which will allow the program to put the species in a random order. Jumbling can be done multiple times. For example, if you tell the program to do it 10 times, it will go through the tree-building process 10 times, each with a different random order of adding species.

It will keep a record of the trees tied for best over the whole process. In other words, it does not just record the best trees from each of the 10 runs, but records the best ones overall.
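The bookkeeping described here can be sketched in Python as follows. This is an illustration only. The names `jumble_search`, `toy_search` and `toy_score` are invented, and the toy "tree" is just the order of the species, standing in for a real tree search and criterion.

```python
import random


def jumble_search(species, search, score, times, seed):
    """Run `search` on `times` random orderings of the species and keep
    every distinct tree tied for the best score over all runs."""
    rng = random.Random(seed)
    best_score, best_trees = None, []
    for _ in range(times):
        order = list(species)
        rng.shuffle(order)               # a new random order of addition
        tree = search(order)
        s = score(tree)
        if best_score is None or s < best_score:
            best_score, best_trees = s, [tree]    # strictly better: start over
        elif s == best_score and tree not in best_trees:
            best_trees.append(tree)               # tied with the best overall
    return best_score, best_trees


# Toy stand-ins so the sketch runs: the "tree" is the order itself and the
# score counts adjacent pairs that are out of alphabetical order.
def toy_search(order):
    return tuple(order)


def toy_score(tree):
    return sum(1 for x, y in zip(tree, tree[1:]) if x > y)


best_score, best_trees = jumble_search(list("ABCDE"), toy_search, toy_score,
                                       times=10, seed=4321)
print(best_score, best_trees)
```

A tree from one run that is worse than the best found in another run is simply dropped, so the record holds the best trees overall rather than the best from each run.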

Of course this is slow, taking 10 times longer than a single run. But it does give us a much greater chance of finding all of the most parsimonious trees.

In the terminology of Maddison it can find different “islands” of trees. The present algorithms do not guarantee us to find all trees in a given “island” from a single run, so multiple runs also help explore those “islands” that are found.

Saving multiple tied trees. For the parsimony and compatibility programs, one can have a perfect tie between two or more trees. In these programs these trees are all saved. For the newer parsimony programs such as Dnapars and Pars, global rearrangement is carried out on all of these tied trees. This can be turned off in the menu. For trees with criteria which are real numbers, such as the distance matrix programs Fitch and Kitsch, and the likelihood programs Dnaml, Dnamlk, Contml, and Restml, it is difficult to get an exact tie between trees.

Consequently these programs save only the single best tree, even though the others may be only a tiny bit worse.

Strategy for finding the best tree. In practice, it is advisable to use the Jumble option and specify that it be done many times, so that many different orderings of the input species are evaluated. This is usually not necessary when bootstrapping, though the programs will then default to doing it once to avoid artifacts caused by the order in which species are added to the tree.

People who want a magic “black box” program whose results they do not have to question or think about often are upset that these programs give results that are dependent on the order in which the species are entered in the data.

To me this property is an advantage, for it permits you to try different searches for better trees, simply by varying the input order of species. If you do not use the multiple Jumble option, but do multiple individual runs instead, you can easily decide which to pay most attention to: the one or ones that are best according to the criterion employed (for example, with parsimony, the one out of the runs that results in the tree with the fewest changes).

In practice, in a single run, it usually seems best to put species that are likely to be sources of confusion in the topology last, as by the time they are added the arrangement of the earlier species will have stabilized into a good configuration, and then the last few species will be fitted into that topology.

There will be less chance this way of a poor initial topology that would affect all subsequent parts of the search. However, a variety of arrangements of the input order of species should be tried, as can be done if the J option is used, and no species should be kept in a fixed place in the order of input.

Note that with global search, which is standard in many programs and in others is an option, each group (including each individual species) will be removed and re-added in all possible positions, so that a species causing confusion will have more chance of moving to a new location than it would without global rearrangement.

Nixon's search strategy. An innovative search strategy was developed by Kevin Nixon. If you use a manual rearrangement program such as Dnamove, Move, or Dolmove, and look at the distribution of characters on the trees, you will see some characters whose distributions appear to recommend alternative groupings.
