Last Friday (April 4, 2014), I finally met with my mentor again after a long break. We had a shorter meeting because I had a flight to catch that evening to Atlanta, GA for a college visit. Since I have missed several posts (I have been travelling a lot this month!), I will also cover what has been going on in the meantime. Over the break he forwarded me several emails to keep me updated on his conversations with the zoo people. During that time the zoo people reassembled many of their mycobacterium sample sequences to increase the number of signals and to eliminate sequencing errors. Meanwhile, my mentor has been assessing the sequence alignment of the samples, and he found some assembly errors; for example, one sequence contains an extra 1000 bp and another contains an extra 99 bp. He also found 3 pairs of identical sequences, so we will eliminate the duplicates later.
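Checks like the ones described above (identical sequences, sequences that are longer than expected) can also be scripted. The following is only a minimal sketch in Python, not the method my mentor used: the helper names `parse_fasta`, `find_duplicates`, and `length_outliers` are made up for illustration, the toy sequences are invented, and comparing each length to the median is just one assumed way to flag a suspicious sequence.

```python
from collections import defaultdict
from statistics import median


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sample name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def find_duplicates(records):
    """Return groups of sample names whose sequences are exactly identical."""
    groups = defaultdict(list)
    for name, seq in records.items():
        groups[seq].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def length_outliers(records, tolerance=0):
    """Return {name: length - median length} for sequences off by more than tolerance."""
    lengths = {n: len(s) for n, s in records.items()}
    typical = median(lengths.values())
    return {n: l - typical for n, l in lengths.items() if abs(l - typical) > tolerance}


# Toy data only: s1 and s2 are identical, s3 carries 2 extra bases.
demo = """>s1
ACGT
ACGT
>s2
ACGTACGT
>s3
ACGTACGTAA
"""

records = parse_fasta(demo)
print(find_duplicates(records))   # [['s1', 's2']]
print(length_outliers(records))   # {'s3': 2}
```

With real data, the FASTA text would come from the sample file instead of the `demo` string, and `tolerance` would need to be chosen by looking at how much the lengths normally vary.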
Before Friday, I edited the 21-sample file Dr. Miller sent me and ran it in BEAST with the default settings. However, the graph in Tracer did not look good: it had 3 bumps and was skewed to the left overall instead of having the clean Normal shape we want, so I did not proceed to tree making. I showed my result to my mentor, and we will keep testing different settings.
Meanwhile, he showed me what he has been doing over the break: comparing each sample sequence to a reference sequence, in this case MAV.104, using a program called Mauve to assess the validity of each sample sequence and how similar it is to the reference. To align the sequences, I clicked Align with progressiveMauve and added two sequences, MAV 104 (reference) and myc01.
The top one is the reference sequence and the bottom one is our sample (myc01). Each colored segment represents a contig, which is a set of overlapping DNA segments that together represent a consensus region of DNA. The bottom sequence is two-sided simply because, when we sequenced the sample, some pieces were copied from the positive strand and others from the negative strand. It is therefore useful to reorder the contigs and convert them all to the positive strand, which you can do by selecting Tools --> move contigs.
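To make the idea of "converting to the positive strand" concrete: a contig read from the negative strand can be turned into its positive-strand equivalent by taking its reverse complement. The Python sketch below illustrates only that idea and is not how Mauve itself is implemented. The function `orient_contigs`, the tuple layout `(name, sequence, strand, ref_start)`, and the toy contigs are all assumptions made for the example.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_contigs(contigs):
    """Flip negative-strand contigs and sort all contigs by reference position.

    contigs: list of (name, sequence, strand, ref_start) tuples, where strand
    is "+" or "-" and ref_start is where the contig matches the reference.
    This input layout is hypothetical, chosen only for this sketch.
    """
    oriented = []
    for name, seq, strand, ref_start in contigs:
        if strand == "-":
            seq = reverse_complement(seq)
        oriented.append((ref_start, name, seq))
    oriented.sort()
    return [(name, seq) for _, name, seq in oriented]


# Toy contigs only: c2 matches the reference on the negative strand.
contigs = [
    ("c2", "AACG", "-", 500),
    ("c1", "GGGA", "+", 10),
]
print(orient_contigs(contigs))  # [('c1', 'GGGA'), ('c2', 'CGTT')]
```

Sorting by `ref_start` mirrors the reordering step, which is also why the reordered picture no longer shows the original order of the contigs in the sample.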
After the reordering, we can really see how similar our sample sequence is to the reference. The white gaps simply mean that certain places do not match. However, this image does not show the original order of the contigs in our sample, since we have reordered them. A sequence may be conserved in bacteria and viruses without being at exactly the same site, since small organisms have a great ability to rearrange their DNA.
This week my assignment is to compare several more contigs with the reference to get familiar with Mauve while trying to reinforce our tree. I will continue working on the tree with different BEAUti settings, including a relaxed clock and UPGMA starting trees.
Thank you for the detailed post and for catching up on all that happened over break.
I also wanted to ask you if you've thought about how you will display your work at the Signature Exhibition. Is there a way you can walk others through your process using visuals and text?
Yeah, I know it's kind of hard to explain the whole process in a few minutes, especially since I did a lot of trial and error in my project. I plan to leave out the trial and error so I can show a clear path. I am also going to focus on BEAST since it's the software I mainly use, but I will mention the other software as a complement.