Friday, December 27, 2013

Models of Lineage-Specific Rate Variation

December 27, 2013. Unfortunately, because of the conjunction of Revels and winter break, I will not be meeting with Dr. Miller for almost a month! However, most of our work now involves reading papers online and using computer programs, so the long break won't be too big of a problem. For a little recap, our goal is to find the divergence time of the mycobacterium strains identified in our samples. We are going to construct a tree and calculate the divergence time by using the computer programs BEAUti (Bayesian Evolutionary Analysis Utility) and BEAST (Bayesian Evolutionary Analysis Sampling Trees) (very creative names). BEAUti is a program with a graphical user interface for creating an input file for BEAST, which must be written in XML. The part that really takes time is deciding which models to use in estimating the divergence time of bacteria. Therefore, what we are doing now is reading journals and previous studies on similar experiments to help us decide on our models.

The first type of models introduced here is lineage-specific rate variation. A tutorial mentioned five common models:
  • global molecular clock
  • local molecular clock
  • compound Poisson process
  • autocorrelated rate
  • uncorrelated rate
Sebastián Duchêne's PowerPoint provides a simplified explanation of those molecular clocks, which I found very helpful.
Different colors represent different rates of substitution. Relaxed clocks include autocorrelated and uncorrelated rates. Credit to S. Duchêne (page 8).
After reading several papers and websites, I decided to try global and local clocks because bacteria evolve quickly in a short period of time. Therefore, the lineage-specific rates of different strains are more likely to be the same as, or similar to, one another.
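To make the global clock idea concrete, here is a minimal Python sketch of how a single constant rate turns a genetic distance into a divergence time. This is only the textbook strict-clock relationship, not what BEAST does internally, and the function name and the numbers are made up for illustration.

```python
def divergence_time(distance, rate):
    """Time since two lineages split under one global (strict) clock.

    distance: substitutions per site separating the two sequences
    rate:     substitutions per site per year along a single lineage
    The factor of 2 is there because both lineages accumulate changes
    after they split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical values, chosen only for illustration (not from our data).
example_distance = 0.002   # substitutions per site
example_rate = 1e-7        # substitutions per site per year
print(divergence_time(example_distance, example_rate))
```

A local clock would use the same formula but allow a different rate in different parts of the tree.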

While I am still absorbing more info, I wish everyone happy holidays! I can't believe another year has passed and that I will be graduating in less than 6 months! Nonetheless, I hope to make a significant contribution to the bird project before I graduate:)

Friday, November 22, 2013

Happy Holiday!

Today (November 22, 2013), we had a short talk on the phone with the zoo people from San Diego whom we will be working with. Even without seeing their faces, I could tell they are a group of very vibrant and intelligent people. Although most of the time I was just listening to the conversation, as there was a lot to digest, I was nonetheless very excited for this project! After several follow-up emails, my goal was narrowed down to a single, clear task: What is the divergence time for these avium mycobacterium strains?

Dr. Miller and I are looking at data on our computer. Most of our work can be achieved by computer programs, so no fancy gear or species to show B-)


We won't be meeting next Friday because it's Thanksgiving break (Black Friday shopping!!)! Happy Holiday everyone!:)

Tuesday, November 12, 2013

Bayesian Inference vs. Maximum Likelihood

Last Friday (November 8, 2013), my mentor and I discussed a bit about the Bayesian Inference of Trees and the program BEAST.

Like Maximum Likelihood (ML), Bayesian Inference is a character-based tree method, and they both generate several trees and use some criterion to decide which tree is the best. However, BI differs from ML in that BI "seeks the tree that is most likely given the data and the chosen substitution model" whereas ML "seeks the tree that makes the data the most likely" (Hall, 140). Sounds the same, I know. However, after a short period of intense research and seeing a lot of alien equations, I interpreted the major distinction like this (please correct me if I am wrong, clarification needed!!):

  • ML - makes different trees and calculates the likelihood of each tree
  • BI - finds the tree that best accounts for X (the observation) given prior knowledge (i.e. the descendants or a substitution model).
Mathematically, according to Bayes' Theorem:

P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}

In Bayesian inference, the event B is fixed (the "given knowledge") in the discussion, and we wish to consider the impact of its having been observed on our belief in various possible events A (how the tree splits). In such a situation the denominator of the last expression, the probability of the given evidence B, is fixed; what we want to vary is A. Thus, the posterior probabilities are proportional to the numerator:


P(A|B) \propto P(A) \cdot P(B|A)

  • P(B|A) is the likelihood. It can be interpreted as P(data | hypothesis).
  • P(A) is the prior probability. It indicates our state of knowledge about the truth of a hypothesis before we have observed the data.
  • P(A|B) is the posterior probability. We can rewrite it as P(hypothesis | data). It shows how well our hypothesis is supported once the data have been observed --> Bayesian
I drew a picture and hopefully it helps with visualization. Suppose we know there are species A, B, C, E, and F. For ML, we are finding how the species should be placed on the tree so that P(B|A) is the largest. For BI, we know from data that A, D, E, B are in the order they are. Given this knowledge, we are finding where C and F should be located so that P(A|B) is the greatest.
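To check my understanding of the proportionality above, here is a small Python sketch that turns priors and likelihoods into posterior probabilities for three candidate trees. The tree names and all the numbers are hypothetical; real programs like BEAST sample trees instead of listing them all.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses (here: candidate trees).

    priors:      P(A) for each hypothesis
    likelihoods: P(B | A), the probability of the data under each hypothesis
    Returns P(A | B) for each hypothesis, normalized so they sum to 1.
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())   # this plays the role of P(B)
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical values: a flat prior and made-up likelihoods.
priors = {"tree_1": 1 / 3, "tree_2": 1 / 3, "tree_3": 1 / 3}
likelihoods = {"tree_1": 0.02, "tree_2": 0.01, "tree_3": 0.01}

post = posterior(priors, likelihoods)
for tree, prob in post.items():
    print(tree, round(prob, 3))
```

With a flat prior like this one, the tree with the highest posterior is also the tree with the highest likelihood, so the two approaches only disagree when the prior favors some trees over others.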

This is at least what I've got so far. Nonetheless, BI can be burdensome if no obvious prior distribution for a parameter exists. The researcher has to ensure that the prior selected is not inadvertently influencing the posterior distribution of the parameters of interest. I clarify my thoughts every time I write them out. Before we get into the bird data, we are trying to ask ourselves some fundamental questions about phylogenetic trees. Next Friday, I will be joining a talk with the data holders to discuss our participation in the Avian mycobacteria project.

Tuesday, November 5, 2013

More Phylo Talks!

Last Friday (November 1st, 2013), my mentor and I discussed more about the bootstrapping method and how it works. He gave me a paper, which I will be reading this week. We also talked about the derivation of the Jukes-Cantor model, which is the simplest evolutionary model used to predict the rate at which nucleotide substitutions occur during evolution. It has two main assumptions:

  • Equal frequencies of the four bases
  • The probability of changing from one state to a different state is always equal, i.e. A-->G is as likely to happen as G-->C

Q Matrix of the Jukes-Cantor Model
f(t) represents the mutation rate, which is the same whenever a nucleotide is substituted with a different nucleotide. The probability of changing from A to A is 1 - f(t), since the probabilities must add up to 1.
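Here is a short Python sketch of the two assumptions in action. It uses the standard textbook form of the Jukes-Cantor substitution probabilities, which may not be written exactly like the matrix in my picture; `alpha` (the rate of change to each specific other base) and the function name are my own labels.

```python
import math

BASES = "ACGT"


def jc69_matrix(alpha, t):
    """Jukes-Cantor probabilities of ending at base y after time t, starting at x.

    alpha: rate of change from one base to each specific other base
           (assumed equal for every pair, as the model requires)
    """
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay   # probability of ending on the starting base
    diff = 0.25 - 0.25 * decay   # probability of ending on one particular other base
    return {x: {y: (same if x == y else diff) for y in BASES} for x in BASES}


P = jc69_matrix(alpha=0.1, t=2.0)
print(P["A"]["G"] == P["G"]["C"])   # every change is equally likely
print(sum(P["A"].values()))         # each row adds up to 1 (up to rounding)
```

In this form, the chance of staying the same is one minus the combined chance of changing to any of the three other bases, and after a very long time every entry approaches 1/4.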

However, many modifications of this model were published later, including Felsenstein 81 (F81), Kimura 2-parameter (K2P), HKY85, TN93, GTR, etc.

While we are still waiting for the bird data for my project, Dr. Miller sent me a practice data set, and he would like me to explore his website on my own first (http://www.phylo.org/). So I looked at the demo, uploaded the data, and let it run with the default settings. I found the website pretty user-friendly, yet I don't really know what the results mean. Hopefully I will learn more about what I can get from these results this Friday.


Friday, October 25, 2013

Neighbor Joining Trees

October 25, 2013. For the past two weeks, I have been reading an introduction to several major methods for estimating phylogenetic trees. There are primarily two approaches to tree estimation:

  • Algorithmic - uses an algorithm to estimate a tree from the data. Advantages of this approach include speed and the fact that it yields only a single tree. Algorithmic methods include Neighbor Joining and UPGMA.
  • Tree-searching - estimates many trees and then uses some criterion to decide which is the best tree of all.
Another way to categorize those methods is distance vs. character-based methods.
  • distance - Neighbor Joining, UPGMA
  • character-based - Parsimony, Maximum Likelihood, Bayesian Inference
Our discussion today focused on the Neighbor Joining method (NJ). Nonetheless, it is very important to acknowledge that the 'right' tree does not exist since we cannot know what exactly happened in the past. All of these methods only allow us to deduce the order in which existing taxa (sequences) diverged from a hypothetical common ancestor, and to calculate the amount of change along the branches between the diverging events.

NJ is one of the most popular distance-based algorithmic methods. It produces a single, strictly bifurcating tree (meaning that each internal node has exactly two branches descending from it). I downloaded another file for practice, which I am going to show you here.
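To see what the algorithm actually does, here is a Python sketch of just the first joining step of NJ: it scores every pair of taxa from the distance matrix, picks the pair with the lowest score, and computes their branch lengths to the new internal node. The function name and the five-taxon distance matrix are only illustrative; MEGA repeats this step until the whole tree is built.

```python
def nj_first_join(names, d):
    """Pick the first pair of taxa that Neighbor Joining would join.

    names: list of taxon labels
    d:     symmetric distance matrix (list of lists), d[i][j] >= 0
    Returns (name_i, name_j, branch_i, branch_j), where the branch values
    are the lengths from each taxon to the new internal node.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # NJ criterion: small distance to each other, large distance to the rest
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2.0 * (n - 2))
    branch_j = d[i][j] - branch_i
    return names[i], names[j], branch_i, branch_j


# Illustrative distances between five taxa (not from my practice file).
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(taxa, dist))
```

Notice that the pair with the smallest raw distance (d and e) is not the one joined first here; NJ also takes into account how far each taxon is from everything else.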

First I opened up the file LargeData.meg in MEGA. The window shows the DNA sequence alignment. The program only shows a base when it is different from that of the first sequence. Otherwise it'll just show "."


Then we calculated the average Jukes-Cantor (JC) distance for our data. The data are not suitable for NJ if the average JC distance is greater than 1.0. In this case, I got 0.811 for this data set.
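For reference, this is a Python sketch of the standard Jukes-Cantor distance between two aligned sequences. I am assuming this is essentially what MEGA computes for each pair before averaging; the function name and the toy sequences are mine.

```python
import math


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance (expected substitutions per site) for two aligned sequences."""
    # Only compare positions where both sequences have a real base (skip gaps etc.)
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)   # proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too different for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy example: 2 differences out of 20 sites, so p = 0.1
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACTTACGTACGA"
print(jc_distance(s1, s2))
```

The corrected distance is always a bit larger than the raw proportion of differences because it accounts for multiple changes at the same site. The average JC distance would be the mean of this value over all pairs of sequences.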

Next, I constructed the tree by choosing Construct/Test Neighbor Joining Tree from Phylogeny menu. A window of Option Summary will pop up. While it is good to stay with default setting (Maximum Composite Likelihood), my mentor and I discussed several other models for the Model/Method section (for simplicity, I won't get into detail here). 
  • Jukes-Cantor model
  • Kimura 2-Parameter model (K2P)
  • Tamura-Nei model
In order to test the reliability of our tree, we also selected "Bootstrap method" for Phylogeny Test with 1000 replications. Bootstrapping is, in essence, a method that uses computational resampling of the data to test how consistently a certain "split" shows up.
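Here is a simplified Python sketch of the bootstrap idea. Each replicate resamples the alignment columns with replacement, and we count how often a result from the original data comes back. To keep it short I only check which two sequences are closest instead of rebuilding a whole NJ tree per replicate, so this is a stand-in for what MEGA really does; all names and sequences are hypothetical.

```python
import random


def p_distance(s1, s2):
    """Proportion of sites that differ between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def closest_pair(alignment):
    """Return the names of the two sequences with the smallest p-distance."""
    names = sorted(alignment)
    scored = [(p_distance(alignment[a], alignment[b]), a, b)
              for i, a in enumerate(names) for b in names[i + 1:]]
    return min(scored)[1:]


def resample_columns(alignment, rng):
    """Build a pseudo-alignment by drawing columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Percentage of replicates in which the original closest pair is recovered."""
    rng = random.Random(seed)
    target = closest_pair(alignment)
    hits = sum(closest_pair(resample_columns(alignment, rng)) == target
               for _ in range(replicates))
    return 100.0 * hits / replicates


toy = {
    "seq1": "ACGTACGTACGTACGT",
    "seq2": "ACGTACGTACGTACGA",
    "seq3": "ACGAACGGACTTACCA",
    "seq4": "TCGAACGGACTTACCT",
}
print(closest_pair(toy), bootstrap_support(toy, replicates=200))
```

A grouping that is supported by many columns survives the resampling almost every time, while a grouping that depends on one or two columns gets a low percentage.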

As we hit compute...
The number assigned to each internal node is the bootstrap value (bootstrap percentage). The high number shows that the split is not random. We can resize the tree a little bit to get a better view by clicking Display Only Topology.
Each internal node only splits into two branches (strictly bifurcating)
A bootstrap value of less than 50% means we really have no idea what the branching order is. Scientists usually collapse those branches into polytomies - nodes from which more than two branches descend.
Majority rule tree
Finally, we can do various things to change the appearance of the tree to make it best represent our data...
A circular tree is an unrooted tree
This chapter was quite hard for me because it involved a lot of equations and concepts. Nonetheless, I am happy that I constructed a decent-looking phylogenetic tree in the end! Next week I will discuss the bootstrapping method more with my mentor and hopefully it will make more sense to me!:)

Tuesday, October 8, 2013

More MEGA 5 with Protein Sequences

Last Friday (October 4th, 2013) I learned more about using MEGA 5 and BLAST for constructing phylogenetic trees. This time, however, we used protein BLAST instead of nucleotide BLAST because searching protein sequences can detect much more distantly related homologs than searching nucleotide sequences. If two sequences are "homologous," we assume that they descended from a common ancestor (this needs to be distinguished from "similarity"). Recall that an amino acid is coded by a codon and that the same amino acid can be coded by several codons, so a silent substitution will not cause the amino acid sequence to change. For DNA sequences, there are only 4 possible states (A, T, C, G) for each character, so when the sequences have diverged so much that they are only 25% identical, the program would classify them as not closely related even though they may have very similar amino acid sequences if you translate them. Thus, the solution is to use a protein sequence as the query. Proteins have 20 possible states, so the lower limit of detectable homology drops to about 5% (instead of 25%).
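A tiny Python sketch shows why silent substitutions hide relatedness at the DNA level. The codon table below is only a small excerpt of the standard genetic code (just the codons used in the example), and the two DNA sequences are made up.

```python
# Excerpt of the standard genetic code: only the codons needed below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
}


def translate(dna):
    """Translate a DNA string codon by codon using the excerpt table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percentage of aligned positions that are identical."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical genes that differ only at third codon positions.
dna_1 = "ATGGCTCTT"
dna_2 = "ATGGCCCTG"

print(translate(dna_1), translate(dna_2))
print(percent_identity(dna_1, dna_2))
print(percent_identity(translate(dna_1), translate(dna_2)))
```

The DNA sequences differ at two sites, but both translate to the same three amino acids, so a protein search still sees them as identical.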

For this exercise we used the EbgC protein sequence as the query and ran blastp.


Those protein sequences are identified as closely related to our query sequences. If you click on the first entry...

The first protein entry actually includes 1087 DNA sequences. Those DNA sequences all code for the same protein sequence even though the DNA sequences themselves vary (silent mutations). Therefore, all those 1087 sequences are very closely related. If we had done a blastn instead of a blastp, the result would only show 100 of these 1087 sequences, and we wouldn't even know about other distantly related sequences. The second entry (evolved beta-galactosidase subunit beta [Escherichia coli]) is an example of a distantly related sequence (even though it is still pretty close).

Later I chose several protein sequences and translated them back to DNA sequences to align them in MEGA and build a tree. The whole process took so long! Nonetheless, the main idea here is that protein sequences give us a bigger picture regarding the relatedness between species. We go from DNA sequence --> protein sequence --> protein structure --> function. The bigger the level we look at, the more distantly related species we can incorporate.

I will be learning some command-line code over the next few weeks. However, I will not meet with Dr. Miller for the next two weeks because of schedule conflicts. In the meantime, I will continue reading the book and get more familiar with BLAST and MEGA:)

Monday, September 30, 2013

Tutorial: A Phylogenetic Tree with MEGA 5

Last Friday (September 27, 2013) instead of having an internship, I read the tutorial book my mentor gave me and made my first phylogenetic tree!


First of all, I had to download a program called MEGA 5, which allows us to align and compare related sequences with our sequence of interest. For this tutorial, we used the sequence of an alpha-glucuronidase gene from the bacterium Thermotoga petrophila. We can obtain the sequence by clicking Do BLAST Search from the Align menu, which leads us to the following window:
MEGA 5 is linked to BLAST.
I entered the "accession number" and the "query subrange", which are given by the book, chose "Nucleotide collection" for the Database, and blastn for Program Selection.

The results showed up a few seconds after I hit search. It gave all the sequences that align well with our query sequence. The first entry was actually my query sequence (because that's what I searched for!). Each entry consists of several elements, including accession number, description, Max score, Total score, query coverage, E-value, and Max identity. For the sake of time, I will just briefly explain Max score, Total score, query coverage, and E-value. Based on the definitions given by the book,

  • Max score - the score for the highest-scoring segment of the subject sequence
  • Total score - the sum of the scores for all the segments that aligned (including noncontiguous alignments)
  • Query coverage (%) - shows how much of the query sequence aligns with the subject sequence.
  • E-value - describes the number of hits with scores this high that one would "expect" to see by chance when searching a database of a particular size. The closer the E-value to 0 the better.

 The book asked us to include those subject sequences that have E-value <10^-3 and query coverage >60%. Using those cutoffs, I chose 9 sequences, and I clicked "Add to Alignment" for each (some sequences had to be reversed before being added to the alignment).
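The selection rule from the book can be written as a simple filter. In this Python sketch the hit records, accession labels, and field names are all hypothetical placeholders (I picked the sequences by hand in the BLAST window); only the two cutoffs come from the book.

```python
def keep_hit(hit, max_evalue=1e-3, min_coverage=60.0):
    """Apply the book's cutoffs: E-value < 10^-3 and query coverage > 60%."""
    return hit["evalue"] < max_evalue and hit["query_coverage"] > min_coverage


# Hypothetical hits, for illustration only.
hits = [
    {"accession": "HYPOTHETICAL_1", "evalue": 0.0,   "query_coverage": 100.0},
    {"accession": "HYPOTHETICAL_2", "evalue": 2e-45, "query_coverage": 87.0},
    {"accession": "HYPOTHETICAL_3", "evalue": 5e-10, "query_coverage": 41.0},
    {"accession": "HYPOTHETICAL_4", "evalue": 0.3,   "query_coverage": 95.0},
]

selected = [hit["accession"] for hit in hits if keep_hit(hit)]
print(selected)
```

A hit has to pass both tests: the third one has a good E-value but covers too little of the query, and the fourth covers the query well but could easily be a chance match.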

Sequences of the nine samples
Then I clicked "Align DNA" and got:

This is the end portion of the sequences after being aligned
Finally, I constructed a phylogenetic tree from my DNA alignment by choosing "Construct / Test Neighbor Joining Tree" and tada~
My first phylogenetic tree!
This Friday, I will discuss the assumptions behind making phylogenetic trees with Dr. Miller and do more practice with MEGA 5. I can't wait!:)

Friday, September 20, 2013

Cladisticules

(September 20, 2013) I began my internship today!!! This year I am very lucky to be able to intern on campus, which cuts out all the travelling time:) Dr. Miller, who is my mentor, currently works at UCSD as a part of the CIPRES project (http://www.phylo.org/). He has been developing software and information technologies to assist biomedical research and now works with a supercomputer located at the San Diego Supercomputer Center that maintains a database for phylogenetic tree inference.

Last year, I wrote down that one of my reasons for applying for the internship was to "widen my horizon," and here is the opportunity! All of my past experiences have more to do with molecular biology, such as growing cells and running PCRs in a lab. However, this year I get to learn biology from an evolutionary perspective. Dr. Miller raised a good point that studying phylogenetic trees is critical because it allows us to take a step back and look at organisms from the most fundamental view, to see a bigger picture of the biological world. I also learned about many applications of phylogenetic trees, including predicting the properties of new pathogens, fighting diseases that destroy food sources, and maintaining biodiversity. Luckily, I had already had some experience with tools such as BLAST and CLUSTAL, which should help me through this course.
Different trees have different patterns, which I think are pretty impressive and very beautiful.
http://explorebio.wikispaces.com/The+Art+of+Phylogeny

After the introduction, we began a short exercise - to establish a simple cladogram.

  1. There are 8 different species. We described each using different characters. The variants within a character are called character states. For example, the color of the abdomen is a character, whereas white / black abdomen are the character states.
  2. Then we made a data matrix, with 0 meaning primary states and 1 being the developed states.
  3. Based on the matrix, we constructed a Venn diagram.
  4. Finally, we drew a cladogram based on the Venn diagram.
This exercise was actually really tricky because the black abdomen arises in two different places, which makes it a homoplastic character. Although I had some basic background from AP Biology, this exercise was harder than expected. Luckily, we now have technologies that help us do this (which is what I will be learning!), and this was just to give me a taste of how long scientists used to spend on sorting out the evolutionary tree.
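The matrix-to-Venn-diagram step can be sketched in a few lines of Python. Each character with derived state 1 defines a group of species; groups that are nested or completely separate fit on one tree, while groups that overlap only partly signal a character that must have arisen more than once. The four species and three characters here are invented, not the eight from our exercise.

```python
def derived_groups(matrix):
    """For each character, collect the species showing the derived state (1)."""
    characters = list(next(iter(matrix.values())))
    return {c: {species for species, states in matrix.items() if states[c] == 1}
            for c in characters}


def fits_one_tree(group_a, group_b):
    """Two groups are compatible if one is inside the other or they do not overlap."""
    return group_a <= group_b or group_b <= group_a or not (group_a & group_b)


# Hypothetical data matrix: 0 = primary state, 1 = developed state.
matrix = {
    "species_W": {"wings": 0, "stripes": 0, "black_abdomen": 0},
    "species_X": {"wings": 1, "stripes": 0, "black_abdomen": 1},
    "species_Y": {"wings": 1, "stripes": 1, "black_abdomen": 0},
    "species_Z": {"wings": 1, "stripes": 1, "black_abdomen": 1},
}

groups = derived_groups(matrix)
print(fits_one_tree(groups["wings"], groups["stripes"]))          # nested circles
print(fits_one_tree(groups["stripes"], groups["black_abdomen"]))  # partly overlapping
```

In this made-up matrix, the striped group sits neatly inside the winged group, but the black abdomen group cuts across the striped group, which is the same kind of conflict that made our exercise tricky.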

I definitely learned a lot, including a lot of vocabulary, during today's meeting. While Dr. Miller won't be here next week, he gave me a book, Phylogenetic Trees Made Easy: A How-To Manual, to read. My assignment is to learn to build a phylogenetic tree in MEGA. Today was a great start and I am ready to explore the beauty of a phylogenetic tree!


Here is the link for the video Dr. Miller showed me during the introduction, and I thought it was pretty cool and well explained: http://archive.peabody.yale.edu/exhibits/treeoflife/film_discovering.html

Sunday, September 8, 2013

A Brand New Year - STEM Intern 2013-2014!

September 8th, 2013. Time has gone by so fast and it has already been a week since school started! This year, I am really glad that I got the opportunity to participate in the EWS STEM Internship program again. However, it feels different doing an internship this year, especially after the intern experiences from last year and from this past summer. Last year, I focused more on learning lab techniques and the principles underlying systematic and molecular biology, and this past summer I practiced a lot of these techniques such as PCR, digestion, and gel electrophoresis. Thus, this year I hope I can learn more advanced techniques such as western blot. If possible, I wish to have my own project to work on, which would allow me to become familiar with the process that a lab project involves - question, hypothesis, experiments, and results. Last but not least, I would like to explore my topics of interest (molecular / cellular biology, genetics, oncology) in even more depth. I am very excited for the program to start, and I believe this year will be another wonderful year!

The image on the left shows a cell undergoing exocytosis, and the image on the right shows docked vesicles in a synapse in the cerebellar cortex. My work this past summer involved a protein that participates in intracellular trafficking in yeast. The image is retrieved from the Collins Lab at Cornell University. http://blogs.cornell.edu/collinslab/


Wednesday, May 1, 2013

End of the Year Poster Section

Today (May 1st, 2013) was my last day of internship of the year! Time passed by so fast and I can't believe I have come so far! I made a poster and we interns presented our research during lunch today in Kellas Commons. Many people came and we explained our research to them. Not until I repeated myself over and over again did I realize how much I had actually learned. What seemed so incomprehensible back in November now made perfect sense to me:)
A screenshot of my final presentation poster.
That afternoon I went to RPI for one last time. Eun Ji gave me an overview of some cool stuff she was working on in other labs. Guess what those are?
Frozen bovine (cow) cornea!
For the project on preventing malaria in the fetus during pregnancy, Eun Ji was trying to extract and modify a compound from bovine cornea that could bind to specific receptors on the placenta, blocking parasite-encoded variant surface antigens from binding to those receptors and transmitting malaria to the fetus. Later, I also helped prepare some dye for silver staining, which is widely used for detecting proteins in gels.

In the end, I said goodbye to both Eun Ji and Namita. I really had a wonderful year doing STEM internship at RPI! From the people I worked with, to lab environment, to the actual lab work, it gave me a taste of the life of a scientist. The uppermost thing I would take away from this experience in addition to a plethora of laboratory techniques is the “qualities of a scientist” – precise, robust, patient, inquisitive, inventive, and inspirational. Not everything will yield the result we want, and we just have to keep trying. This experience makes me more certain about my goals. In the future, I would like to participate in more research focusing on molecular biology to enrich my experience and widen my horizon! Last but not the least, I will be working in another lab at Cornell this summer on intracellular communication, which I am very excited about! I will keep posting interesting things and events, so stay tuned!:)

When I grow up...

Saturday, April 27, 2013

Happy National DNA Day!

This Thursday, April 25, 2013, was National DNA Day!!! This year in particular is the 60th anniversary of Watson and Crick's discovery of the structure of DNA and the 10th anniversary of the completion of the Human Genome Project (HGP)!
Retrieved from ASHG
I am especially interested in genetics because of its intricacy and its roles in diseases. HGP provides scientists an unprecedented opportunity to better understand the role of genetics in human health and gradually reveal the mystery of genetic diseases. Today, genetics is becoming increasingly important in diagnosis, drug development, and new treatments. Below is an excerpt of an essay I wrote on HGP:
HGP, though it does not boost the speed, increases the “success rate” of drug development. Identifying a specific mutation allows scientists to develop a targeted drug that directly tackles that mutation. Many studies on targeted drugs are done in cancers. Targeted cancer therapies are drugs or substances that inhibit the uncontrollable growth of cancer cells by blocking growth factors or inducing cell death (apoptosis). HGP facilitates the identification of these “targets”, which are usually defective genes that encode for proteins involved in cell signaling pathways. For instance, in chronic myeloid leukemia (CML), researchers had identified gene BCR-ABL – a result of translocation between chromosome 9 and 22. This gene produces a hyperactive protein that keeps Abl signaling pathway active and causes continuous proliferation of CML cells. Researchers can then develop a drug that represses this defective gene and treat the deadly disease (National Cancer Institute, 2011).
Here is the speech delivered by Francis Collins, the director of NIH, on National DNA Day: http://directorsblog.nih.gov/dnas-double-anniversary/#more-1194 I really wish I could attend the annual ASHG meeting someday! Anyways, Happy National DNA Day!!

Because I had proctor training this Wednesday, I wasn't able to go to RPI. My internship is soon coming to an end, and I have been working on my poster for my presentation:) I hope it'll all go well!

Sunday, April 21, 2013

High-Performance Liquid Chromatography (HPLC)

Last Wednesday (April 11th, 2013), Eun Ji showed me how high-performance liquid chromatography (HPLC) works. HPLC is a chromatographic technique used in biochemistry and analytical chemistry to separate a mixture of compounds in order to identify, quantify or purify the individual components of the mixture [1].

Simplified map of a HPLC
Retrieved from http://web.nmsu.edu/~kburke/Instrumentation/Waters_HPLC_MS_TitlePg.html
First, we connected the computer to the machine. We can control the flow rate by typing it into the computer. Eun Ji had made two kinds of buffer (solvent), A and B, for her proteins, each of which had a thin tube connected to the pump where the two buffers mix. We typed in 1.000 ml/min for the flow rate (*A+B = 1 ml/min in total, not 1 ml/min for A and 1 ml/min for B). Over time, the concentration of A decreased while the concentration of B increased, but the flow rate remained the same. This means that [A] and [B] in the mixed solution are constantly changing. The concentrations of A and B control the polarity in the column.
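The idea that A and B always add up to the same total flow can be sketched in Python. I am assuming a straight-line (linear) change from all A to all B just for illustration; I did not record the actual gradient program Eun Ji used, and the function names are mine.

```python
def pump_flows(percent_b, total_flow=1.0):
    """Split a fixed total flow (ml/min) between buffer A and buffer B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    flow_b = total_flow * percent_b / 100.0
    return total_flow - flow_b, flow_b


def linear_gradient(start_b, end_b, steps):
    """Evenly spaced %B values from start_b to end_b (assumed linear gradient)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [start_b + (end_b - start_b) * i / (steps - 1) for i in range(steps)]


for percent_b in linear_gradient(0.0, 100.0, 5):
    flow_a, flow_b = pump_flows(percent_b)
    print(f"{percent_b:5.1f}% B -> A: {flow_a:.3f} ml/min, B: {flow_b:.3f} ml/min")
```

At every step the two pump flows add up to 1.000 ml/min, so only the composition of the mix changes, not the speed.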

Next, Eun Ji injected her sample through the injector.

Together, the solution and sample traveled to the column. The column contains very small resin beads that act as a fine filter. The sample contained a mixture of proteins. However, according to the affinity of each protein, the proteins gradually separated from each other as [A] and [B] changed and flowed out of the column at different times.
Black sample is separated into blue, red, and yellow (3 proteins)
Retrieved from http://www.waters.com
As the separated protein bands leave the column, they pass immediately into the detector. The computer then constructs a graph that contains "peaks". Each peak represents a protein, so by counting how many peaks there were, we were able to determine how many kinds of protein were present in the sample (*but we cannot determine "what" proteins they are without using a special HPLC or conducting further study).
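Counting peaks is something a short Python function can do. This sketch just looks for points that are higher than their neighbors and above a noise threshold; the detector readings are invented, and real HPLC software is surely more sophisticated than this.

```python
def count_peaks(signal, threshold):
    """Count local maxima in a detector trace that rise above a threshold."""
    peaks = 0
    for i in range(1, len(signal) - 1):
        above_noise = signal[i] > threshold
        rising = signal[i] > signal[i - 1]
        not_yet_falling = signal[i] >= signal[i + 1]
        if above_noise and rising and not_yet_falling:
            peaks += 1
    return peaks


# Made-up detector readings over time with three clear bumps.
trace = [0, 1, 5, 1, 0, 2, 8, 2, 0, 0, 3, 6, 3, 0]
print(count_peaks(trace, threshold=2.5))
```

The threshold matters: set it too low and small bumps of noise get counted as proteins, set it too high and a protein present in a small amount is missed.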

Retrieved from http://www.waters.com
However, there was something wrong with the machine when we ran our HPLC. The pressure in the tubes continued to rise (normal: A-60, B-100; ours: A-90+, B-140+), and the pumps automatically stopped to prevent the tubes from bursting. We reset the machine, but it did not help. In addition, pump B was making a weird noise, so we stopped our experiment. Nevertheless, I thought HPLC was a really brilliant tool. I have read many papers that mention HPLC in their methods, and I am very glad that I now know what it means!

Where does this fit on our map?

Thursday, April 18, 2013

Buffer for Protein Purification

This Wednesday (April 17, 2013) I prepared some buffers for protein purification with Namita. We made four 200 ml buffers:
  1. Lysis buffer - used to lyse the bacteria in order to collect the proteins. The solution would contain all proteins the bacteria produce. 
  2. Wash buffer (20mM) - used to wash out some undesired proteins
  3. Wash buffer (150mM) - further wash out the undesired proteins by breaking bonds between undesired proteins and resins
  4. 300mM - 300mM imidazole solution can break the bonds between the targeted proteins and Ni. After washing away other proteins, the column by this time contains mainly the targeted protein bonded with nickel. 
Materials

  • H2O
  • NaH2PO4
  • NaCl
  • Imidazole 

The buffers only differ in their concentration of imidazole. Imidazole is used to break the bonds between nickel and proteins. The targeted protein is his-tagged, which gives it a high affinity for nickel when running through the column. However, some random proteins can also loosely bind to Ni. The higher the concentration of imidazole, the stronger the bond it can break. Here 300mM is the concentration needed to break the bond between our proteins and Ni.
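The kind of calculation behind each buffer is grams = concentration x volume x molar mass. Here is a Python sketch for the imidazole in the three buffers whose concentrations are listed above, using imidazole's molar mass of about 68.08 g/mol. This is my own reconstruction of the arithmetic, not Namita's actual worksheet.

```python
IMIDAZOLE_G_PER_MOL = 68.08   # molar mass of imidazole (C3H4N2)


def grams_needed(conc_mM, volume_ml, g_per_mol):
    """Mass of solute for a target concentration (mM) in a given volume (ml)."""
    moles = (conc_mM / 1000.0) * (volume_ml / 1000.0)   # mol/L times L
    return moles * g_per_mol


for label, conc in [("wash 20mM", 20), ("wash 150mM", 150), ("300mM", 300)]:
    grams = grams_needed(conc, 200, IMIDAZOLE_G_PER_MOL)
    print(f"{label}: {grams:.3f} g imidazole in 200 ml")
```

The same function works for NaH2PO4 and NaCl once their own concentrations and molar masses are plugged in.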

To determine what concentration breaks the bond between the targeted protein and Ni, one runs a gel to determine the size of the target protein and at what concentration (mM) the solution contains the most targeted protein.
Example of a protein gel.
Retrieved from http://www.sciencedirect.com/science/article/pii/S0168165610001926
For example, as shown above, suppose lane 1 is 10mM and the concentration increases by 10mM each lane. At 80mM (lane 8), a clear dark band is shown, meaning that 80mM of imidazole breaks the bond between OmpA70 and Ni the best. The lower concentrations are used to wash off the contamination shown by the blurry bands in lanes 1-7.

After we added the appropriate amounts of each substance to the four flasks according to our calculations, we needed to bring the pH to 8. We did this by adding NaOH to the solution and using a pH meter to measure the pH.
pH meter (right) and magnetic stirrer (left)

It took us quite a long time though, especially for the two with higher concentrations of imidazole, because imidazole is slightly acidic. Luckily, we had the magnetic stirrer, which I thought was a very brilliant invention, to speed up the mixing.
Magnetic stirrer
When all the buffers finally reached pH 8, we decided to call it a day. Making buffers, as Namita admitted, can be very boring, yet it is very demanding because everything has to be very precise. Later, Namita would use those buffers to establish an imidazole gradient to collect and purify the proteins from E. coli. For the details please look here.

Where does this fit on our map?

P.S. I will post the blog from last week asap!

Monday, April 8, 2013

Mutagenesis (Continued)

On Wednesday (April 3rd, 2013), I continued working on mutagenesis of the VvSTS enzyme with Namita. After we created several mutant plasmids by PCR last time, she added Dpn I to the solution to digest the original non-mutant DNA. Then, she did a transformation with a strain of bacteria (BW27784) that has the ability to ligate the new plasmids (because the polymerases actually form open-ended plasmids in PCR). Later, she sequenced a few transformed colonies in order to make sure that she had the right mutant before she did another transformation with another strain of bacteria for expression. In other words, the first transformation was to complete the circular plasmids and to check if the mutant plasmids had the right sequences. The second transformation was actually for expression in cells.

Now came my task of the day: to create stocks of mutant E. coli. There were 6 samples in total: control, T197I, T197A, T197M, etc.

Namita provided six 15ml tubes. Since aeration is important in cell growth, a container should usually hold only 1/5 of its max. volume of solution. I put 3ml of cell media (3/15 = 1/5) and 3μl of antibiotics (1μg / ml) into each tube. Adding antibiotics is essential for making sure that the cells keep the mutant plasmids, which contain an antibiotic resistance gene. Then, I picked one colony from each plate and added it to the media. Lastly, we put the tubes in the incubator to let the cells grow for a day before storing them at -80C.

My job today was short and simple. Yet, precision was very important so I had to be careful in every step. I am looking forward to discussing the process further with Namita:)

Where does this fit on our map?

Thursday, March 14, 2013

Site-Directed Mutagenesis

Yesterday (March 13, 2013), I worked with Namita on mutagenesis - a process by which the genetic information of an organism is changed in a stable manner, resulting in a mutation. She was trying to create mutants of the VvSTS enzyme - an enzyme found in the grape family that helps produce resveratrol, an interesting compound I have written about in my previous post (click here). She wanted to create a point mutation, which changes only one amino acid in a protein. However, a single change in the amino acid sequence can change the shape of the entire enzyme and may result in increased enzyme activity / efficiency or the ability to take up other molecules.

  • Note: T197A = we want to change the threonine (T) at position 197 to alanine (A)
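To make the naming convention concrete, here is a minimal Python sketch that splits a label such as T197A into its three parts. The function names are hypothetical, and the lookup table only covers the four amino acids that appear in this post.

```python
import re

# One-letter codes for the residues mentioned in this post only.
AMINO_ACIDS = {"T": "threonine", "A": "alanine", "I": "isoleucine", "M": "methionine"}


def parse_mutation(label):
    """Split a label like 'T197A' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def describe(label):
    """Spell out the mutation in words, falling back to the letter if unknown."""
    original, position, replacement = parse_mutation(label)
    old_name = AMINO_ACIDS.get(original, original)
    new_name = AMINO_ACIDS.get(replacement, replacement)
    return f"{old_name} at position {position} -> {new_name}"


for label in ["T197I", "T197A", "T197M"]:
    print(label, "=", describe(label))
```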
Procedure:

1) Mutant Strand Synthesis (by PCR)
Attach a mutagenic primer to the original DNA template. Use PfuUltra DNA polymerase (high-fidelity and mutation-tolerant) to extend the primers, so the newly synthesized plasmids carry the mutation.

2) Dpn I Digestion
Because DNA produced in organisms is usually methylated, Dpn I recognizes the original (non-mutant) template DNA from E. coli and digests it. The resulting mix contains only the mutant plasmids.

3) Transformation
Transform the mutated plasmids into cells so that they produce the mutant enzymes.
Retrieved from QuikChange II XL Site-Directed Mutagenesis Kit Instruction Manual

We did PCR with T197I, T197A, and T197M and prepared a 30 μl mix for each. While reviewing what I had to add, this time I focused more on technique, such as always checking whether the volume of solution in the pipette looks correct and vortexing the materials to make sure the concentration is consistent before adding them to the reaction mix.

While we were waiting for the PCR, Namita continued the protein extraction from a marine bacterium that she was working on with another student. The bacteria appeared purple because of a compound they were interested in (so pretty!). They repeated adding solvent and centrifuging the solution over and over because the compound was so hard to dissolve. Yet, eventually, by adding a lot more solvent, they successfully dissolved most of the compound :)

Where does this fit in the map?

I found mutagenesis super interesting, and I hope I can learn more about it in the future. I won't be going to my internship for the next two weeks because of spring break. However, I am looking forward to what I will be working on next! :)

Sunday, March 10, 2013

Enzyme Activity Assay (Attempt)

On Wednesday (March 6th, 2013), Eun Ji and I picked up where we left off last week and worked on the enzyme activity assay. Enzyme activity measures how much enzyme is present in a reaction and how active an enzyme is under certain conditions. Eun Ji showed me her initial result of the assay. She expected the graph to look like the red line, with the reaction rate eventually leveling off when all the substrates are converted to products. Yet, instead, hers looked like the blue line.

Why did this happen? We still don't know. Yet, we have come up with some hypotheses:

  • 3GT binds with the substrate instead of cyanidin Cl --> can't react
  • either the enzyme or the intermediate molecules denatured under the conditions
  • reverse reaction (The model below illustrates the enzyme action. E = enzyme, S = substrate, P = products. The main idea is that intermediate molecules can revert to substrate while a small portion goes on to the second reaction to produce the final product.)
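To get a feel for the reverse-reaction hypothesis, here is a rough Python sketch. It assumes the scheme is E + S <-> ES -> E + P, and every rate constant and concentration in it is a made-up illustrative number, not a value from our assay. It uses a simple Euler time-stepping loop rather than a proper ODE solver.

```python
def simulate(k_on, k_off, k_cat, e0=1.0, s0=10.0, dt=0.001, t_end=5.0):
    """Euler integration of E + S <-> ES -> E + P; returns final (E, S, ES, P).

    All rate constants and starting concentrations are hypothetical.
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        bind = k_on * e * s    # E + S -> ES
        revert = k_off * es    # ES -> E + S (the reverse reaction)
        convert = k_cat * es   # ES -> E + P
        e += (revert + convert - bind) * dt
        s += (revert - bind) * dt
        es += (bind - revert - convert) * dt
        p += convert * dt
    return e, s, es, p


slow_reverse = simulate(k_on=1.0, k_off=0.1, k_cat=1.0)
fast_reverse = simulate(k_on=1.0, k_off=50.0, k_cat=1.0)
print(f"product with a slow reverse step: {slow_reverse[3]:.2f}")
print(f"product with a fast reverse step: {fast_reverse[3]:.2f}")
```

With these toy numbers, a faster reverse step leaves less product after the same amount of time, which is the kind of effect the hypothesis describes. It does not tell us whether this is what actually happened in the assay.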

An enzyme assay examines the following controlled factors:
  • salt concentration
  • enzyme-substrate ratio
  • pH
  • inhibition (inhibitors decrease enzyme activity)
  • activators (increase enzyme activity)
  • temperature (most enzymes denature at high temperatures)
The two enzymes we examined were 3GT and ANS, and this time we examined enzyme activity at pH 6 and 7. We loaded our samples and some supplements into a 96-well plate. We spent quite a while recalculating the concentrations because we had previously messed up our calculation for the concentration of the standard. Also, for some reason the standard solution wouldn't dissolve until we diluted it to 10 mM. Luckily, we recognized that something was wrong before we added everything and were able to fix it :)
Division of the plate

Once all the substances were added, Eun Ji added HCl to the 0 min samples to stop the reaction (control) and transferred them to 4 microtubes. The rest of the samples were put in the incubator. When the time reached 15 min, Eun Ji would repeat what she did with the 0 min samples, and so on. Because we were short on time, I wasn't able to see the complete process. Later, Eun Ji would put these tubes in a spectrophotometer to analyze the enzyme activity and construct a graph.

Since I didn't quite understand the whole process, the information above includes some outside research. Nevertheless, I found some interesting facts about kuromanin Cl (the product). Kuromanin Cl belongs to the anthocyanin family. According to a paper the Koffas group previously published, anthocyanins are "red, purple, or blue plant pigments that belong to the family of polyphenolic compounds collectively called flavonoids". Their antioxidant properties give them economic value as food dyes.

Where does this fit on our map?

I will keep doing some research about the process, and I hope I can discuss this with my mentor next time!

Sunday, March 3, 2013

Protein Purification

Last Wednesday (February 27, 2013), Eun Ji taught me the basic steps of protein purification. Protein purification is a series of processes intended to isolate a single type of protein from a complex mixture. Eun Ji had already extracted the protein solution by breaking down E. coli. The whole process was done in a 4 °C room in order to prevent the proteins from degrading (so cold!!><). We went through the following steps to purify the protein:

1) Column Washing

We first set up two columns and poured some resin into them. The resin (an amphiphile) is a material used to attract proteins to its hydrophobic region. The original resin comes in a buffer that may destroy the proteins, so we needed to wash the resin by replacing the original buffer with our own buffer.

The resin settled at the bottom because its particles are relatively big and tightly packed. Resin loses its function once it dries, so I helped Eun Ji check the columns and add more buffer every 20 min.

2) Loading Samples
Once the resin was fully washed, we loaded our protein mixture into the column. During this process, the resin binds the targeted proteins with its hydrophobic region while the other proteins flow through. One can repeat this step several times to make sure all the targeted proteins are bound to the resin.
Resin can recognize the tag (red circles) we added to the targeted protein prior to this.
3) Washing
Keep adding buffer to the solution (at least 5x the volume of resin). This step can further purify the resin-protein mixture.

4) Elution
This step is very important. In order to separate the targeted protein from the resin, we add molecules (in this case maltose) that bind the resin with a stronger affinity than the protein does. This step increases the volume of the solution.

5) Concentration + Column Regeneration
Last, we need to concentrate the solution by removing buffer. The resin can be reused several times.

Another important thing is that when we extract the proteins from E. coli, the mixture also contains proteinases that would degrade our protein. Thus, we need to add a proteinase inhibitor to our buffer in order to prevent this.

Where does this fit in the big map?
Later, Eun Ji would test the enzyme activity and determine what conditions would maximize production. She told me that the product she would get from the reaction of her proteins (ANS and 3GT) is currently used in food coloring (red) and food preservation. I thought it was interesting that scientists suspect the molecule of being anti-cancer and anti-aging because of its antioxidant properties!

Even though the process was quite long, and it was pretty cold in the room, I thought it was quite an experience!