Monday, April 21, 2014

18 Samples + ATCC Reference Strain

I didn't meet with my mentor last week because I was on a college visit. The week before (April 11, 2014), however, I attempted to make several BEAST trees with 6 different BEAUti settings (no sampling times included). I would have included some visualizations, since I had some cool pictures, but my laptop crashed and everything stored on it is lost (all those data runs were probably too demanding for the computer, oops). So I will try to explain what I did in words. Please refer to my previous posts if needed!

I mainly focused on the effect of the clock model and the site model on the tree. Although I didn't actually look at the output tree visualizations, I examined the quality of my output data in Tracer (see my previous posts). Out of the 6 runs, only one looked decent (approximately Normal): the one with the strict clock model and the GTR site model. So I went on to include the sampling times in BEAUti and ran the analysis in BEAST, yet the resulting tree didn't look so good.

However, my mentor gave me a new input file in which our previous reference strain, MAV104, was replaced by an ATCC strain. MAV104, which was isolated from an AIDS patient, was too distant from our M. avium samples, and the ATCC strain originated in birds as well. So I ran the new data in BEAST (time included) as well as in GARLI and RAxML, where the trees are built exclusively from the sequences. Something promising happened! The BEAST tree looks almost exactly like the RAxML tree, with a few small variations, which is reasonable since BEAST builds a tree based on both sequences and time. I will go ahead and analyze the GARLI tree to see whether it shows the same resemblance.
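To put a rough number on "almost exactly like", one option is to count the clades (groups of samples sharing a branch) that appear in one tree but not in the other. The snippet below is only a minimal sketch and not what BEAST or RAxML do themselves: the nested-tuple tree format, the function names, and the sample labels are all made up for illustration. It also assumes both trees are rooted the same way, which a RAxML tree would first need (for example by rooting on the reference strain).

```python
def collect_clades(tree):
    """Return (leaves under this node, set of clades found at or below it).

    A tree is a nested tuple of leaf names, e.g. (("a", "b"), "c").
    Single leaves are not counted as clades.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def clade_difference(tree_a, tree_b):
    """Count clades present in exactly one of the two rooted trees."""
    _, clades_a = collect_clades(tree_a)
    _, clades_b = collect_clades(tree_b)
    return len(clades_a ^ clades_b)


# Hypothetical toy trees; the labels are placeholders, not real results.
beast_like = ((("s1", "s2"), "s3"), "s4")
raxml_like = ((("s1", "s3"), "s2"), "s4")

print(clade_difference(beast_like, beast_like))
print(clade_difference(beast_like, raxml_like))
```

A difference of zero means the two trees group the samples identically; the larger the number, the more the groupings disagree. Branch lengths are ignored here, so this only compares the branching pattern.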

In the meantime, I am starting to work on my final poster! Whoo, the time has gone by so quickly! I just got my new emergency laptop and am trying to reinstall a lot of software. Thankfully, I ran most of my data online, so I can still download the results to my computer. The thing that will take a while is the actual pictures of the trees, which I will try to rebuild over the next few days.

Finally, since there is no visualization in this post, I would like to share a picture from my college visit in LA! This is me standing next to my future school mascot - the Bruin! Can you guess where I will be spending my next 4 years? Yes - UCLA!!!! I am so glad that the application process is finally over, and I can't wait for college life!!!


Sunday, April 6, 2014

Spring:)

Last Friday (April 4, 2014), I finally met with my mentor again after so long. However, we had a shorter meeting because I had a flight to catch that evening to Atlanta, GA for a college visit. Also, since I have missed several posts (I have been travelling a lot this month!), I will include what has been going on in this post. He forwarded me several emails over the break to keep me updated on his conversations with the zoo people. Over the break, the zoo people reassembled many of their mycobacterium sample sequences to increase the number of signals and to eliminate sequencing errors. Meanwhile, my mentor has been assessing the sequence alignment of the samples, and he did find some assembly errors: one sequence contains an extra 1000 bp and another contains an extra 99 bp. Additionally, he found 3 pairs of identical sequences, so we will later eliminate the duplicates.
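For anyone curious how checks like these could be automated, here is a minimal sketch in Python. It is not the method my mentor used; the function names and the toy FASTA records are invented for illustration. It groups samples whose sequences are exactly identical and flags any sequence whose length differs from the most common length.

```python
from collections import Counter, defaultdict


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def identical_groups(records):
    """Return sorted groups of sample names that share the exact same sequence."""
    by_sequence = defaultdict(list)
    for name, seq in records.items():
        by_sequence[seq].append(name)
    return sorted(sorted(names) for names in by_sequence.values() if len(names) > 1)


def unusual_lengths(records):
    """Map each odd-length sample to its length minus the most common length."""
    if not records:
        return {}
    lengths = {name: len(seq) for name, seq in records.items()}
    typical = Counter(lengths.values()).most_common(1)[0][0]
    return {name: n - typical for name, n in lengths.items() if n != typical}


# Hypothetical toy data; real sample files would be far longer.
TOY_FASTA = """
>toy_a
ACGTACGT
>toy_b
ACGT
ACGT
>toy_c
ACGTTCGT
>toy_d
ACGTACGTAA
"""

records = parse_fasta(TOY_FASTA)
print(identical_groups(records))
print(unusual_lengths(records))
```

Exact string matching only catches perfect duplicates, and a length check only hints that something is off; finding where the extra bases actually sit still takes an alignment.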

Before Friday, I edited the 21-sample file Dr. Miller sent me and ran it in BEAST with the default settings. However, the graph in Tracer did not look so good: it had 3 bumps and was skewed overall to the left instead of having the nice Normal shape we want, so I did not proceed to the tree making. I showed my results to my mentor, and we will just keep testing different settings.

Meanwhile, he showed me what he has been doing over the break: comparing each sample sequence to a reference sequence, in this case MAV104, using a program called Mauve to assess the validity of each sample sequence and how similar it is to the reference. To align the sequences, I clicked Align with progressiveMauve and added two sequences: MAV104 (reference) and myc01.


The top one is the reference sequence and the bottom one is our sample (myc01). Each colored segment represents a contig, which is a set of overlapping DNA segments that together represent a consensus region of DNA. The bottom sequence is two-sided simply because, when we sequenced the sample, some pieces were copied from the positive strand and others from the negative strand. So it is useful to reorder the contigs and convert them all to the positive strand, which you can do by selecting Tools --> Move Contigs.
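Converting a piece from the negative strand to the positive strand means taking its reverse complement: read it backwards and swap A with T and C with G. The snippet below is a minimal sketch of just that step, not Mauve's actual procedure (Mauve also reorders the contigs against the reference). The function names and the toy contigs are made up for illustration.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_positive_strand(contigs):
    """Flip every '-' strand contig so that all contigs read on the '+' strand.

    contigs is a list of (name, strand, sequence) tuples.
    """
    flipped = []
    for name, strand, seq in contigs:
        if strand == "-":
            seq = reverse_complement(seq)
        flipped.append((name, "+", seq))
    return flipped


# Hypothetical toy contigs, not taken from myc01.
toy_contigs = [
    ("contig_1", "+", "ATGC"),
    ("contig_2", "-", "AAGC"),
]

for name, strand, seq in to_positive_strand(toy_contigs):
    print(name, strand, seq)
```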


After the reordering, we can really see how similar our sample sequence is to the reference. The white gaps simply mean that certain places do not match. However, this image does not show the original order of the contigs in our sample, since we've reordered them. In bacteria and viruses, a sequence may be preserved without being at the exact same site, since small organisms have a great ability to rearrange their DNA.

This week my assignment is to compare several more contigs with the reference to get familiar with Mauve while trying to improve our tree. I will continue working on our tree with different BEAUti settings, including a relaxed clock and UPGMA starting trees.