Friday, October 25, 2013

Neighbor Joining Trees

October 25, 2013. For the past two weeks, I have been reading about several major methods for estimating phylogenetic trees. There are two main approaches to tree estimation:

  • Algorithmic - uses an algorithm to estimate a tree from the data. Advantages of this approach include speed and the fact that it yields a single tree. Algorithmic methods include Neighbor Joining and UPGMA.
  • Tree-searching - estimates many trees and then uses some criterion to decide which is the best tree of all.
Another way to categorize these methods is distance-based vs. character-based:
  • distance - Neighbor Joining, UPGMA
  • character-based - Parsimony, Maximum Likelihood, Bayesian Inference
Our discussion today focused on the Neighbor Joining (NJ) method. Nonetheless, it is very important to acknowledge that the 'right' tree can never be known for certain, since we cannot know exactly what happened in the past. All of these methods only allow us to infer the order in which existing taxa (sequences) diverged from a hypothetical common ancestor, and to estimate the amount of change along the branches between divergence events.

NJ is one of the most popular distance-based algorithmic methods. It produces a single, strictly bifurcating tree (meaning that each internal node has exactly two branches descending from it). I downloaded another file for practice, which I am going to walk through here.
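To see roughly what happens behind MEGA's menu, here is a minimal Python sketch of the standard Neighbor Joining procedure (the Saitou and Nei Q-criterion). It is not MEGA's code: the function name `neighbor_joining`, the dict-of-dicts input and the tiny five-taxon matrix are my own assumptions for illustration. At each step it joins the pair of nodes that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to every other node, and replaces that pair with a new internal node.

```python
def neighbor_joining(dist):
    """Neighbor Joining on a symmetric dict-of-dicts distance matrix.

    dist[a][b] is the distance between taxa a and b (at least 3 taxa).
    Returns the unrooted tree as a Newick string with branch lengths.
    """
    dm = {a: dict(row) for a, row in dist.items()}  # copy; new nodes get added
    nodes = list(dm)
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: total distance from node i to every other active node
        r = {i: sum(dm[i][k] for k in nodes if k != i) for i in nodes}
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(pairs, key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u
        li = dm[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # the Newick text doubles as the node label
        dm[u] = {}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new node u to each remaining node k
                dm[u][k] = dm[k][u] = (dm[i][k] + dm[j][k] - dm[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: solve for the three branches meeting at the centre
    a, b, c = nodes
    la = (dm[a][b] + dm[a][c] - dm[b][c]) / 2
    lb = (dm[a][b] + dm[b][c] - dm[a][c]) / 2
    lc = (dm[a][c] + dm[b][c] - dm[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

# Example: a small made-up five-taxon distance matrix
taxa = ["a", "b", "c", "d", "e"]
M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
dist = {x: {y: M[i][j] for j, y in enumerate(taxa)} for i, x in enumerate(taxa)}
print(neighbor_joining(dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The three-way split at the outermost level is the usual way of writing an unrooted binary tree in Newick; once a root is placed on a branch, every internal node has exactly two descendants. In this sketch, ties in Q are broken by input order.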

First, I opened the file LargeData.meg in MEGA. The window shows the DNA sequence alignment. The program only shows a base when it differs from the base at the same position in the first sequence; otherwise, it just shows a dot (".").
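The dot convention is easy to reproduce. This short sketch (the function name `dot_display` and the toy sequences are hypothetical, not part of MEGA) keeps the first sequence in full and replaces every base in the other sequences that matches the first sequence at that position with ".":

```python
def dot_display(seqs):
    """Return the alignment with bases identical to the first sequence shown as '.'."""
    ref = seqs[0]
    shown = [ref]  # the first (reference) sequence is printed in full
    for s in seqs[1:]:
        # Keep a base only where it differs from the reference column
        shown.append("".join("." if b == r else b for b, r in zip(s, ref)))
    return shown

for row in dot_display(["ACGTTG", "ACGATG", "TCGTTA"]):
    print(row)
# ACGTTG
# ...A..
# T....A
```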


Then we calculate the average Jukes-Cantor (JC) distance for our data. The data are not suitable for NJ if the average JC distance is greater than 1.0. In this case, I got 0.811, so the data are fine to use.
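For reference, the Jukes-Cantor distance between two sequences is d = -(3/4) ln(1 - (4/3)p), where p is the proportion of sites that differ. The sketch below computes it for every pair of sequences and averages the results. It only approximates what MEGA reports: skipping gapped sites pair by pair is my assumption, and the function names are made up for illustration. The correction blows up as p approaches 0.75 (d goes to infinity), which is why very divergent data give unreliable distances.

```python
import itertools
import math

def p_distance(s1, s2):
    """Proportion of differing sites, skipping any site with a gap ('-').

    Assumes at least one ungapped site is shared by the two sequences.
    """
    sites = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    return sum(a != b for a, b in sites) / len(sites)

def jukes_cantor(s1, s2):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # saturated: the log argument would be <= 0
    return -0.75 * math.log(1 - 4 * p / 3)

def mean_jc(seqs):
    """Average JC distance over all pairs of aligned sequences."""
    pairs = list(itertools.combinations(seqs, 2))
    return sum(jukes_cantor(a, b) for a, b in pairs) / len(pairs)
```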

Next, I constructed the tree by choosing Construct/Test Neighbor Joining Tree from the Phylogeny menu. An Option Summary window pops up. While it is fine to stay with the default setting (Maximum Composite Likelihood), my mentor and I discussed several other models for the Model/Method section (for simplicity, I won't go into detail here):
  • Jukes-Cantor model
  • Kimura 2-Parameter model (K2P)
  • Tamura-Nei model
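Of these models, Kimura 2-Parameter is a small step up from Jukes-Cantor: it counts transitions (changes within purines, A/G, or within pyrimidines, C/T) separately from transversions, giving d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing a transition and a transversion. Here is a minimal sketch; the function name is hypothetical, and comparing only unambiguous A/C/G/T sites is my assumption.

```python
import math

def k2p_distance(s1, s2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Only sites where both sequences have an unambiguous A, C, G or T are used.
    """
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    # Transition: a change within purines (A/G) or within pyrimidines (C/T)
    transitions = sum(1 for a, b in sites
                      if a != b and ({a, b} <= purines or {a, b} <= pyrimidines))
    # Every other difference is a transversion
    transversions = sum(1 for a, b in sites if a != b) - transitions
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        return math.inf  # too divergent for the correction to be defined
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```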
To test the reliability of our tree, we also selected "bootstrap method" for Phylogeny Test, with 1000 replications. Bootstrapping is, in essence, a resampling method: it builds many new data sets by sampling alignment columns at random with replacement, constructs a tree from each one, and records how often a certain "split" appears among those trees.
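The resampling step itself is short. In this sketch (the function name and toy alignment are hypothetical; it shows only the column resampling, not MEGA's full procedure), every sequence uses the same randomly drawn column indices, so each column of a replicate is an intact column of the original alignment:

```python
import random

def bootstrap_replicate(seqs, rng=random):
    """One bootstrap pseudo-alignment: sample columns with replacement.

    All sequences share the same drawn columns, so each replicate column
    is an unchanged column of the original alignment.
    """
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]

# Example: 1000 replicates, matching the replication count used in MEGA
rng = random.Random(42)
alignment = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]  # made-up toy data
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
```

Building a tree from each replicate (for example with a distance method and NJ) and tallying how often each split appears gives the bootstrap percentages.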

When we hit Compute, MEGA draws the tree.
The number assigned to each internal node is the bootstrap value (bootstrap percentage): the percentage of bootstrap replicates in which that split was recovered. A high number shows that the split is well supported by the data rather than a chance result. We can resize the tree a little to get a better view by clicking Display Only Topology.
Each internal node only splits into two branches (strictly bifurcating)
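The percentage itself is simple counting. This sketch assumes each replicate tree has already been reduced to a set of splits, each split written as a frozenset of the taxa on one side of an internal branch (encoded consistently, e.g. always the side without a chosen reference taxon); extracting splits from trees is not shown, and the function names are hypothetical. Keeping only splits found in more than half of the replicates gives a majority-rule consensus.

```python
from collections import Counter

def split_support(replicate_splits):
    """Bootstrap percentage for every split seen in the replicate trees.

    replicate_splits: one set of splits per replicate tree; each split is a
    frozenset of the taxa on one side of an internal branch.
    """
    counts = Counter(split for splits in replicate_splits for split in set(splits))
    n = len(replicate_splits)
    return {split: 100 * c / n for split, c in counts.items()}

def majority_rule(replicate_splits):
    """Keep only splits present in more than half of the replicates."""
    return {s: pct for s, pct in split_support(replicate_splits).items() if pct > 50}
```

For example, with four made-up replicates where {A, B} appears three times and {C, D} twice, `split_support` gives 75.0 and 50.0, and only {A, B} survives `majority_rule`.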
A bootstrap value below 50% means we really have no idea what the branching order is. Scientists usually collapse such branches into polytomies: nodes from which more than two branches descend.
Majority rule tree
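Collapsing weak branches can be sketched on a simple nested representation. Here a leaf is a taxon name and an internal node is a `(support, children)` tuple; this format, the function name and the example support values are my own assumptions, not MEGA's. Any internal node with support under 50 is removed and its children are attached directly to its parent, which creates a polytomy.

```python
def collapse(node, threshold=50):
    """Collapse internal branches whose bootstrap support is below threshold.

    A leaf is a taxon name (str); an internal node is a tuple
    (support, children), where support may be None (e.g. at the root).
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)  # handle deeper nodes first
        weak = (not isinstance(child, str)
                and child[0] is not None and child[0] < threshold)
        if weak:
            new_children.extend(child[1])  # attach grandchildren directly: polytomy
        else:
            new_children.append(child)
    return (support, new_children)

tree = (None, [(99, ["A", "B"]), (32, [(87, ["C", "D"]), "E"])])  # made-up values
print(collapse(tree))
# (None, [(99, ['A', 'B']), (87, ['C', 'D']), 'E'])
```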
Finally, we can do various things to change the appearance of the tree to make it best represent our data...
A circular tree is an unrooted tree
This chapter was quite hard for me because it involved a lot of equations and concepts. Nonetheless, I am happy that I constructed a decent-looking phylogenetic tree in the end! Next week I will discuss the bootstrapping method further with my mentor, and hopefully it will make more sense to me! :)

4 comments:

  1. Love the trees! You provide an interesting and robust review of your work, as usual. I have come to expect high quality work from your blog - kudos to you!

    As noted before, I would love to get a demonstration of your tree building skills. Feel free to send me some times when you are free so that I can learn more about your great work.

  2. Thank you, Mr. Calos! I would love to show you the program! I have A and C free, if you are available then too.

    1. How about Friday, 15 November at 10:30 in my office?

    2. I don't have frees on Fridays... are you free during A or C?