Tuesday, October 8, 2013

More MEGA 5 with Protein Sequences

Last Friday (October 4th, 2013) I learned more about using MEGA 5 and BLAST for constructing phylogenetic trees. This time, however, we used protein BLAST instead of nucleotide BLAST, because searching with protein sequences can detect much more distantly related homologs than searching with nucleotide sequences. If two sequences are "homologous," we mean that they descended from a common ancestor (this needs to be distinguished from "similarity," which only describes how alike the sequences look). Recall that an amino acid is coded by a codon and that the same amino acid can be coded by several different codons, so a silent substitution does not change the amino acid sequence. For DNA sequences, there are only 4 possible states (A, T, C, G) for each character, so two unrelated sequences are already expected to be about 25% identical by chance. When related sequences have diverged so much that they are only about 25% identical, the program cannot tell them apart from unrelated sequences, even though they may have very similar amino acid sequences if you translate them. Thus, the solution is to use a protein sequence as the query. Proteins have 20 possible states, so the lower limit of detectable homology drops to about 5% (instead of 25%).
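To make the silent-substitution idea concrete, here is a small Python sketch. The two DNA sequences are made up for illustration (they are not from the EbgC exercise), and the codon table only lists the handful of standard-code codons the example needs. It also prints the chance-identity baselines of 1/4 and 1/20 mentioned above.

```python
# Minimal codon table: only the standard-code codons used in this toy example.
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G",
}


def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percentage of positions that match (equal lengths, no gaps assumed)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical sequences: same protein, different codons (silent substitutions).
seq1 = "ATGCTGAAAGGT"
seq2 = "ATGTTAAAGGGC"

protein1 = translate(seq1)
protein2 = translate(seq2)

print("DNA identity:    ", percent_identity(seq1, seq2))
print("Protein identity:", percent_identity(protein1, protein2))

# Identity expected by chance if every state were equally likely.
print("Chance identity, DNA (4 states):     ", 100.0 / 4)
print("Chance identity, protein (20 states):", 100.0 / 20)
```

The DNA sequences differ at several positions, yet both translate to the same short peptide, which is why a protein query can still find them both.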

For this exercise we used the EbgC protein sequence as the query and ran blastp.


These protein sequences were identified as closely related to our query sequence. If you click on the first entry...

The first protein entry actually represents 1087 DNA sequences. These DNA sequences all code for the same protein sequence even though the DNA sequences themselves vary (silent mutations). Therefore, all 1087 sequences are very closely related. If we had done a blastn instead of a blastp, the results would have shown only 100 of these 1087 sequences, and we would not have seen any of the more distantly related sequences. The second entry (evolved beta-galactosidase subunit beta [Escherichia coli]) is an example of a more distantly related sequence (even though it is still pretty close).
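Here is a rough Python sketch of how many different DNA sequences can collapse into one protein entry. The strain names and sequences are invented for illustration, and this is only my guess at the general idea, not how BLAST actually builds its result list.

```python
from collections import defaultdict

# Minimal codon table: only the standard-code codons used in this toy example.
CODON_TABLE = {
    "ATG": "M",
    "GAA": "E", "GAG": "E",
    "CGT": "R", "CGC": "R",
    "ATT": "I", "ATC": "I",
    "GTT": "V",
}


def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def group_by_protein(records):
    """Map each distinct protein sequence to the names of the DNA records encoding it."""
    groups = defaultdict(list)
    for name, dna in records.items():
        groups[translate(dna)].append(name)
    return dict(groups)


# Hypothetical records: strain_A and strain_B differ only by silent mutations.
records = {
    "strain_A": "ATGGAACGTATT",
    "strain_B": "ATGGAGCGCATC",
    "strain_C": "ATGGAACGTGTT",
}

groups = group_by_protein(records)
for protein, names in groups.items():
    print(protein, len(names), names)
```

Three DNA records end up as two protein entries, in the same way that 1087 DNA sequences sit behind the first entry in our results.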

Later I chose several protein sequences and translated them back to DNA sequences to align them in MEGA and build a tree. The whole process took so long! Nonetheless, the main idea here is that protein sequences give us a bigger picture of the relatedness between species. We go from DNA sequence --> protein sequence --> protein structure --> function. The higher the level we look at, the more distantly related the species we can include.

I will be learning some command-line code over the next few weeks. However, I will not meet with Dr. Miller for the next two weeks because of schedule conflicts. In the meantime, I will continue reading the book and get more familiar with BLAST and MEGA :)

1 comment:

  1. Wow, so cool! I really need to see this all in action sometime soon.

    I am impressed with your command of DNA and protein information. You clearly know a great deal about these macromolecules, and can write about them commandingly.

    As we discussed today, please remember to meet with me if your mentor is unable to travel to campus. Then I can see the computer program in action!
