Peggy's Intern Diary: January 2014

Friday, January 24, 2014

Small Mycobacterium Trees!

January 24, 2014. Last Friday I didn't meet with my mentor because he was out of town. Still, over the week I was able to successfully run BEAST on my mentor's website (finally!!). Today we made our first attempt to generate a tree for the small mycobacterium data!

The two main outputs of BEAST are the .log and the .tree files. First we analyzed our data by using a program called Tracer. After I imported the .log file...

One of the most important columns is the effective sample size (ESS) on the left. ESS is the number of independent samples that the trace is equivalent to, and a low value can point to autocorrelation in our samples that might result from poor mixing. Ideally every ESS should be above 200. To get there, I eventually went back to BEAUti and changed the chain length to 5,000,000 with a sampling step of 2000 (which brings the number of samples in our trace to 2500). It is also ideal for our graph to look roughly Normal (this one is pretty good :)).
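To make the numbers above concrete, here is a small Python sketch of the arithmetic behind the sample count, plus a rough ESS estimate. The function names are my own, and the ESS formula is a simple textbook approximation (N divided by one plus twice the summed autocorrelations), not necessarily what Tracer computes internally.

```python
def samples_logged(chain_length, log_every):
    """Number of states written to the log, ignoring the initial state 0."""
    return chain_length // log_every


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def rough_ess(values):
    """Rough ESS: N / (1 + 2 * sum of autocorrelations at lags 1, 2, ...).

    The sum stops at the first lag whose autocorrelation is not positive.
    This is an approximation for illustration, not Tracer's exact method.
    """
    n = len(values)
    total = 0.0
    for lag in range(1, n):
        rho = autocorrelation(values, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)


print(samples_logged(5_000_000, 2000))  # 2500
```

A trace that drifts slowly (each sample looks a lot like the previous one) gets a much smaller `rough_ess` than a trace of the same length that jumps around freely, which is the "poor mixing" situation the ESS column is meant to flag.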

After confirming that our run had converged to a stable posterior distribution, I used TreeAnnotator to summarize the information from the sample of trees (we have 2500) produced by BEAST onto a single “target” tree. The output of TreeAnnotator is a .nex file, which I then loaded into FigTree for visualization.

Voila, my first tree! Everything looked good except the part in red. In our data, myco16 and 23 were genetically identical. However, the sampling times of 23 and 30 were closer together, which put them closer together on the tree. So we wondered how much weight the dates carry compared with the DNA in Bayesian trees. I went back to BEAUti and excluded myco16 from the taxa so that each sequence corresponds to only one sample. The order of the samples on my second tree was good, except this time the "time" (in days) was way too large!
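Spotting identical sequences like myco16 and 23 before building the tree can be done with a quick check. Below is a minimal Python sketch, assuming the alignment is held in a dictionary of name to sequence. The sequences shown are made up for illustration and are not our real data.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def identical_pairs(alignment):
    """Return every pair of sample names whose sequences are identical."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(sorted(alignment), 2)
        if count_differences(alignment[name_a], alignment[name_b]) == 0
    ]


# Toy alignment: invented sequences, only the sample names echo the diary.
toy = {
    "myco16": "ACGTACGT",
    "myco23": "ACGTACGT",
    "myco30": "ACGTACGA",
}
print(identical_pairs(toy))  # [('myco16', 'myco23')]
```

When two samples carry no genetic differences at all, the DNA alone cannot order them, so it makes sense that the sampling dates end up deciding where they sit on the tree.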

I spent the rest of the time going back and forth to see which settings in BEAUti have which effects on my tree. Again, I generated at least 10 files in this process, including some that failed in the BEAST run (each time I had to go through ALL the steps from BEAUti to FigTree). However, at least we were finally able to get a sense of what our tree would look like :) More trees next week!

Saturday, January 11, 2014

First Meeting of the Second Semester!

January 10, 2014. Happy New Year everyone!! After enjoying some time at home, I had my first meeting in a long while! However, I wasn't in my best condition, since the jet lag made me a bit dizzy throughout the meeting. Nonetheless, since we had been lagging behind for a month, we decided to start running BEAST with the smaller data set we got from the zoo people.

We started making our input file with BEAUti. The data consisted of 6 samples. I had previously converted them into a NEXUS file, which was the only format accepted by BEAUti. Dr. Miller sent me a paper (estimating the divergence time of viruses is close to estimating that of bacteria) and suggested that we compare and discuss our work after working separately for a while.
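For anyone curious what the NEXUS conversion involves, here is a minimal Python sketch that writes aligned DNA sequences as a NEXUS DATA block. It is not the tool I actually used. The function name and the example sequences are invented for illustration, and a real file may need extra blocks or options.

```python
def to_nexus(alignment):
    """Format a dict of name -> aligned DNA sequence as a NEXUS DATA block."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in names)

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(names)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name in names:
        # Pad the names so the sequences line up in columns.
        lines.append(f"    {name.ljust(width)}  {alignment[name]}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Invented example sequences, not our real samples.
example = {"sampleA": "ACGT-CGT", "sampleB": "ACGTACGA"}
print(to_nexus(example))
```

The key requirement is that every sequence has the same length (that is, the data are already aligned), because NCHAR describes the whole matrix.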

So, for my part, I entered the date (the day each was sampled) of my samples in months (since Jan, 2001). Then I moved on to setting the substitution model to HKY, as suggested in the paper. I would like to test out different models later, as I have read about several other combinations that might fit our data. But for our first run I will just stick with the tutorial.

Tips panel - sampling date
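Converting a calendar date into "months since Jan, 2001" can be sketched in Python as below. This is only one possible convention: it assumes the origin is the first day of a month and counts partial months as a fraction of that month's length, which may not match exactly how I entered the values in BEAUti.

```python
import calendar
from datetime import date

# Assumed origin: 1 January 2001 (the "since Jan, 2001" in the text).
ORIGIN = date(2001, 1, 1)


def months_since_origin(sampled, origin=ORIGIN):
    """Months elapsed from `origin` (first day of a month) to `sampled`.

    Whole months are counted exactly; the remaining days are expressed
    as a fraction of the length of the sampling month.
    """
    whole = (sampled.year - origin.year) * 12 + (sampled.month - origin.month)
    days_in_month = calendar.monthrange(sampled.year, sampled.month)[1]
    return whole + (sampled.day - 1) / days_in_month


print(months_since_origin(date(2002, 1, 1)))  # 12.0
```

Whatever convention is used, the important part is that all samples use the same origin and the same unit, since the clock rate that BEAST estimates is expressed per unit of that time scale.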

As for the Clock panel, I set a strict clock (constant rate) because of the low diversity of the data we are analyzing. Next is the Tree Prior. The Priors panel allows the user to specify informative priors for all the parameters in the model, which can be helpful or burdensome, especially if no obvious prior distribution for a parameter exists (as in our case). Thus, we will need to try out different settings later. My initial settings are shown below.

Exponential because bacteria usually grow exponentially

When all was ready, I clicked Generate BEAST File. However, when I uploaded the file to run BEAST, it repeatedly gave an error! For some reason it said that my settings resulted in a prior of 0. One possible reason, which I read about in the BEAST user group online, was that "BEAST starts with a randomly generated tree and if you set tight priors for the parameters, the starting tree may have an extremely low probability which makes it impossible for BEAST to proceed." I went back to the Tree Prior panel and changed the random starting tree to a UPGMA starting tree. While it ran fine on my computer initially, it terminated with an error the next time. After going back and forth, I generated 7 files in total!! Still not working. We decided to make our own starting tree using my mentor's website next time.

Still, the whole process kind of gave me a headache from staring at the screen for 3 hours. To be honest, I am quite intimidated by the program, as we couldn't really understand the math behind those parameters well enough to fully understand what we were doing. Then again, those tools are indispensable to many scientists even though they would not have invented them themselves. I can only hope that they will start making more sense along the way. Trial and error - GO SCIENCE!!