Peggy's Intern Diary

Tuesday, May 20, 2014

Final Poster Session & Bicentennial!

My internship has finally come to an end. On April 30, 2014, we presented our poster to faculty and students during the assembly block. I am glad that many people stopped by and asked me questions. After analyzing my data, we concluded that most birds acquired the infection independently, while some acquired it through bird-to-bird transmission (not shown in the tree). What made me even happier was that my mentor was able to come to my presentation, and he also enjoyed the other students' presentations.

I was also fortunate to share my experience in one of the Academy classes over the bicentennial weekend. The group asked a lot of questions, and I was really happy to talk about my experience with the STEM program. I think it is incredible that Emma Willard is developing this Signature Program, and I am glad to be one of its earliest participants. After two years as a STEM intern, I'm more certain of what I want to do in the future, and I hope the program can help more students find their own passions!

Dr. Miller and me at the end-of-year presentation.

Monday, April 21, 2014

18 Samples + ATCC Reference Strain

I didn't meet with my mentor last week because I was on a college visit. Nonetheless, the week before (April 11, 2014), I had tried making several BEAST trees with six different BEAUti settings (without sampling times). I would love to include some visualizations, since I had some cool pictures, but my laptop crashed and everything on it was lost (running all that data was probably too demanding for it, oops). So I will try to explain what I did in words; please refer to my previous posts if needed!

I mainly focused on the effect of the clock model and the site model on the tree. Although I didn't actually look at the output tree visualizations, I examined the quality of my output data in Tracer (see my previous posts). Of the six runs, only one looked decent (approximately normal): the one with the strict clock model and the GTR site model. So I went on to include the sampling times in BEAUti and ran it in BEAST, but the resulting tree didn't look very good.

However, my mentor gave me a new input file in which our previous reference strain, MAV104, was replaced by an ATCC strain. MAV104, which was isolated from an AIDS patient, was too distant from our M. avium samples, whereas the ATCC strain also originated in birds. I ran the new data in BEAST (with sampling times) as well as in GARLI and RAxML, which build trees from the sequences alone. Something promising happened! The BEAST tree looks almost exactly like the RAxML tree, with a few small variations, which is reasonable since BEAST builds a tree from both the sequences and the sampling times. Next, I will analyze the GARLI tree to see whether it shows the same resemblance.

In the meantime, I am starting to work on my final poster! Whoo, time has gone by so quickly! I just got a new emergency laptop and am trying to reinstall a lot of software. Thankfully, I ran most of my data online, so I can still download the results to my computer. What will take a while is the actual pictures of the trees, which I will try to rebuild over the next few days.

Finally, since there is no visualization in this post, I would like to share a picture from my college visit in LA! This is me standing next to my future school's mascot - the Bruin! Can you guess where I will be spending the next four years? Yes - UCLA!!!! I am so glad that the application process is finally over, and I can't wait for college life!!!

Sunday, April 6, 2014

Spring:)

Last Friday (April 4, 2014), I finally met with my mentor again after a long time. However, we had a shorter meeting because I had a flight to Atlanta, GA, that evening for a college visit. Also, since I have missed several posts (I have been travelling a lot this month!), I will cover everything that has been going on in this post. Over the break, he forwarded me several emails to keep me updated about his conversations with the zoo people. The zoo people have reassembled many of their mycobacterium sample sequences to increase the number of signals and to eliminate sequencing errors. For his part, my mentor has been assessing the sequence alignment of the samples, and he did find some assembly errors: for example, one sequence contains an extra 1000 bp and another an extra 99 bp. Additionally, he found 3 pairs of identical sequences, so we will eliminate the duplicates later.

Before Friday, I edited the 21-sample file Dr. Miller sent me and ran it in BEAST with the default settings. However, the graph in Tracer did not look good: it had three bumps and was skewed to the left overall, instead of the nice normal shape we want, so I did not proceed to making the tree. I showed my result to my mentor, and we will keep testing different settings.

Meanwhile, he showed me what he had been doing over the break: comparing each sample sequence to a reference sequence, in this case MAV.104, using a program called Mauve, to check the validity of the sample sequences and see how similar they are to the reference. To align the sequences, I clicked Align with progressiveMauve and added two sequences: MAV 104 (the reference) and myc01.

The top one is the reference sequence and the bottom one is our sample (myc01). Each colored segment represents a contig, which is a set of overlapping DNA segments that together represent a consensus region of DNA. The bottom sequence looks two-sided simply because, when the sample was sequenced, some pieces were read from the positive strand and others from the negative strand. So it is useful to reorder the contigs and convert them all to the positive strand, which you can do by selecting Tools --> Move Contigs.
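Converting a contig from the negative strand to the positive strand means taking its reverse complement: swap each base for its partner (A with T, C with G) and read the result backwards. Mauve does this for you when it moves contigs; the small Python sketch below only illustrates the idea. The contig names and the `flipped` set (which contigs came from the negative strand) are hypothetical inputs made up for the example.

```python
# Pair each base with its complement; N (an unknown base) pairs with itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_contigs(contigs: dict, flipped: set) -> dict:
    """Return a copy of `contigs` with every contig named in `flipped`
    reverse-complemented, so all contigs read on the positive strand.

    `flipped` is a hypothetical input; in Mauve the orientation comes from
    aligning each contig against the reference.
    """
    return {
        name: reverse_complement(seq) if name in flipped else seq
        for name, seq in contigs.items()
    }


print(reverse_complement("ATGCCA"))  # → TGGCAT
print(orient_contigs({"contig1": "AAC", "contig2": "GGT"}, {"contig2"}))
# → {'contig1': 'AAC', 'contig2': 'ACC'}
```

Applying the reverse complement twice gives back the original sequence, which is a handy sanity check.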

After the reordering, we can really see how similar our sample sequence is to the reference. The white gaps are places that do not match. However, this image does not show the original order of our sample's contigs, since we reordered them. In bacteria and viruses, a sequence may be conserved without sitting at exactly the same site, since these small organisms are very good at rearranging their DNA.
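To put a rough number on "how similar", one simple measure is percent identity: out of the aligned positions, how many have the same base in both sequences? This is not what Mauve displays (it draws blocks of matching sequence). The sketch assumes two sequences that are already aligned to the same length, with '-' marking gaps, and the short example sequences are made up.

```python
def percent_identity(ref: str, sample: str) -> float:
    """Percentage of aligned columns where both sequences have the same base.

    Columns where either sequence has a gap ('-') are skipped, so this only
    measures similarity over the regions that actually aligned.
    """
    if len(ref) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(ref.upper(), sample.upper()):
        if a == "-" or b == "-":
            continue  # a gap column says nothing about base-by-base identity
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # → 87.5 (7 of 8 columns match)
```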

This week my assignment is to compare several more contigs with the reference to familiarize myself with Mauve, while continuing to strengthen our tree. I will keep working on the tree with different BEAUti settings, including a relaxed clock and UPGMA starting trees.

Thursday, March 20, 2014

Tree Comparison

Right before the break (early March), Dr. Miller and I had been working on constructing the 26-sample tree using various programs (see previous posts). After a lot of trial and error, we finally got decent trees from each program. However, since the resulting trees looked very different in each program, we used a program called TreeGraph to standardize and compare them. To load the files into TreeGraph, we had to make them NEXUS files, which simply meant adding .nex to the file name. Below is the comparison of the four trees:

In order: RAxML, GARLI, MrBayes, BEAST
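About the .nex step: renaming worked for us in TreeGraph, but a genuine NEXUS file wraps the tree in a TREES block. If a program ever wants real NEXUS content, a minimal sketch like the one below can wrap a Newick tree string. The tree name `tree1` is just a placeholder, the example leaf names are made up, and taxon names with spaces or punctuation would need quoting, which this sketch does not handle.

```python
def newick_to_nexus(newick: str, tree_name: str = "tree1") -> str:
    """Wrap a single Newick tree string in a minimal NEXUS TREES block.

    Assumes taxon names are plain words (no spaces or punctuation that
    would need quoting in NEXUS).
    """
    newick = newick.strip()
    if not newick.endswith(";"):
        newick += ";"  # a tree statement must end with a semicolon
    return "\n".join([
        "#NEXUS",
        "BEGIN TREES;",
        f"    TREE {tree_name} = {newick}",
        "END;",
        "",
    ])


print(newick_to_nexus("(myc01,(myc05,myc16));"))
```

The returned text can then be written to a file ending in .nex.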

Fortunately, the GARLI and MrBayes trees looked exactly the same, and the other two closely resembled them. This gave us a pretty good picture of the time-included tree that we will be working on later. This is a rather short post because the process was so lengthy and complicated that I don't think I can explain it comprehensively here.
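"Looked exactly the same" can also be checked with a number. One common measure of the difference between two trees is the Robinson-Foulds distance: count the groups of samples (clades) found in one tree but not the other. TreeGraph did the comparison for us; this is just a sketch of the rooted, clade-based version, and it assumes the trees are written as nested Python tuples of leaf names (the four-leaf trees are made up) instead of being read from the actual tree files.

```python
def clades(tree) -> set:
    """Collect every clade (the set of leaf names under an internal node).

    A tree is a leaf name (str) or a tuple of subtrees. The root clade,
    which contains every leaf, is dropped because all trees share it.
    """
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def robinson_foulds(tree_a, tree_b) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


t1 = ("A", ("B", ("C", "D")))
t2 = ("A", ("C", ("B", "D")))
print(robinson_foulds(t1, t1))  # → 0 (same topology)
print(robinson_foulds(t1, t2))  # → 2
```

Because clades are sets, the order in which branches are drawn does not matter, only which samples group together.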

Time is going fast, and it'll be April when we return from the break! Hopefully we will get a lot done before the year ends!

Sunday, February 23, 2014

More Trees with GARLI and RAxML

Last Friday (February 21, 2014), I didn't meet with my mentor because he was out of town. Nonetheless, I continued making trees with GARLI (using the right tool this time) and RAxML.

GARLI_1

The numbers on the branches are the bootstrap results. In this trial, I set the number of bootstrap replicates to 50. Basically, the bigger the number, the more support we have for that particular branch, and the maximum is 50. Though overall the tree looks pretty decent, notice that it gives very little support for myc 1, 2, 5, 23, 16, and 30.
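The bootstrap number on a branch is basically a count: in how many of the replicate trees does that same group of samples show up? The sketch below assumes the best tree and the replicate trees are written as nested tuples of leaf names (made up here, not our real GARLI output) and returns, for each group in the best tree, both the raw count and the percentage. A raw count tops out at the number of replicates (50 in this run), while a percentage always tops out at 100, which may be why different programs show different maximums.

```python
from collections import Counter


def clades(tree) -> set:
    """All non-root clades (sets of leaf names) of a nested-tuple tree."""
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def bootstrap_support(best_tree, replicates) -> dict:
    """Map each clade of `best_tree` to (count, percent) over the replicates."""
    counts = Counter()
    for rep in replicates:
        counts.update(clades(rep))  # each clade is counted at most once per tree
    n = len(replicates)
    return {c: (counts[c], 100.0 * counts[c] / n) for c in clades(best_tree)}


best = ("A", ("B", ("C", "D")))
reps = [best] * 3 + [("A", ("C", ("B", "D")))] * 2
for clade, (count, pct) in bootstrap_support(best, reps).items():
    print(sorted(clade), count, pct)
```

Here the group (C, D) appears in 3 of 5 replicates, so its support is 3, or 60%.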

Next, I ran the samples with RAxML. The branch lengths varied significantly, and I am still working out what that implies. The maximum bootstrap number is 100. While most of the numbers were pretty big, those for some branches were still very small (split to the extreme). This lack of support was due to identical sequences: myc01 is identical to myc02, and myc16 is identical to myc23.

Therefore, the next thing I did was to run both GARLI and RAxML again with identical sequences eliminated so that they wouldn't confuse the program.
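Dropping identical sequences is easy to automate: keep the first sample with each distinct sequence and record which sample each dropped one duplicated. A minimal sketch, assuming the aligned sequences are already loaded into a dict of sample name to sequence; the short sequences are made up, and only the myc01/myc02 and myc16/myc23 pairing mirrors what we found.

```python
def drop_identical(seqs: dict) -> tuple:
    """Keep the first sample for each distinct sequence.

    Returns (kept, removed), where `removed` maps each dropped sample to
    the earlier sample it is identical to.
    """
    first_seen = {}  # sequence -> name of the first sample that had it
    kept, removed = {}, {}
    for name, seq in seqs.items():
        key = seq.upper()  # compare case-insensitively
        if key in first_seen:
            removed[name] = first_seen[key]
        else:
            first_seen[key] = name
            kept[name] = seq
    return kept, removed


samples = {  # hypothetical toy sequences
    "myc01": "ACGTTGCA",
    "myc02": "ACGTTGCA",
    "myc05": "ACGTAGCA",
    "myc16": "TTGACGCA",
    "myc23": "TTGACGCA",
}
kept, removed = drop_identical(samples)
print(list(kept))  # → ['myc01', 'myc05', 'myc16']
print(removed)     # → {'myc02': 'myc01', 'myc23': 'myc16'}
```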

GARLI_3 with identical samples eliminated.

RAxML_2 with identical samples eliminated.

This time both resulting trees had branches with very high bootstrap numbers, suggesting that our trees were well supported.

Next, I am going to run the samples with MrBayes and compare the resulting topologies. I am finally meeting with my mentor this coming Friday, and hopefully we'll discuss the results further!

Sunday, February 16, 2014

Examining Different Trees Using Different Programs

Last Friday (February 14, 2014), my mentor wasn't able to come because his flight was cancelled due to the snowstorm. However, we chatted over Skype and got some work done via email. To continue our tree making, after reading several publications, we decided to run the larger data set (26 samples) all at once, since that might be more accurate. Wayne, one of the zoo people, sent us updated genomic sequences of the samples, this time with more identified SNPs. We also want to examine the overall topology before taking time into consideration, so that we can first get a sense of what our tree would look like. Thus, we decided to look at different tree visualization tools and software for describing the difference between any two phylogenetic trees.

The programs I will be exploring in addition to BEAST are RAxML, GARLI, and MrBayes. I ran GARLI first, on my mentor's website, CIPRES. The whole process took a while and is too complicated to describe here. Basically, our goal was to find a way to get a majority-rule consensus tree from the GARLI output. I later found out that I had chosen the wrong tool to run my data (there were several GARLI choices), but I still went ahead and analyzed the data. I converted the output to a NEXUS file so it could be read in Archaeopteryx, a powerful tree visualization tool that supports many file formats. My final tree looks like this:

I won't know how good this tree is until I make more with other tools so that I can compare them. I will also run GARLI again, using the right tool this time!
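The majority-rule consensus tree we wanted from the GARLI output keeps only the groups of samples that appear in more than half of the trees in a set (for example, the bootstrap trees). Any two such groups are always compatible, so together they describe a single tree. This sketch stops at listing those groups and their frequencies; it is not how GARLI or CIPRES compute the consensus, and it assumes the trees are nested tuples with made-up leaf names.

```python
from collections import Counter


def clades(tree) -> set:
    """All non-root clades (sets of leaf names) of a nested-tuple tree."""
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))
    return found


def majority_rule_clades(trees) -> dict:
    """Clades found in more than half of `trees`, mapped to their frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k > n / 2}


trees = [("A", ("B", ("C", "D")))] * 2 + [("A", ("C", ("B", "D")))]
for clade, freq in majority_rule_clades(trees).items():
    print(sorted(clade), round(freq, 2))
```

The group (B, D) shows up in only one of the three trees, so it is left out of the consensus.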

Tuesday, February 4, 2014

A Week Off

Last week I didn't meet with my mentor because I was under the weather :( We will try to catch up on our work this Friday. On a brighter note, it was also Chinese New Year, so Happy Year of the Horse, everyone!!

Retrieved from http://eastweek.my-magazine.me/?aid=30771