Lab 3: DNA - Atoms, Molecules, DNA, And More


Primaeval Life, Earth Version


From the Big Bang itself came a few of the smallest elements, especially hydrogen and helium. From the tremendous forces inside giant stars came heavier elements, spread by the occasional supernova explosion, including one in our neighborhood that supplied the elemental matter of our solar system. Our planet solidified from the gaseous masses spinning into and around our star, occupying the third orbital position. The third position is a good one for the easy origin of Life, with water existing as liquid, vapor, and ice, and with temperatures in a nice range for chemical elements to clump into the molecules that form the basic stuff of Life. The structure of molecules in Life on Earth is keyed on the letters in genes and on the amino acid molecules they represent. The letters of the code are strung out along DNA, long molecules that act like linear blueprints, controlling the construction of the molecules needed for building and operating organisms.


You'll recall from coverage of basic chemistry in Physical Geology that atoms of different elements form bonds with one another on the basis of their electron configurations, especially the electrons in the outermost "shell" of the atom. The periodic table below shows hydrogen, carbon, nitrogen, and oxygen, the elements involved in the most basic molecules of Life. These atoms form bonds with one another because together they can satisfy the octet rule in various combinations (hydrogen, the exception, needs only two electrons to fill its single shell). Hydrogen has one valence electron, carbon has four, nitrogen five, and oxygen six, so they typically form one, four, three, and two bonds, respectively. Carbon is such a common element in Life because it can form covalent (electron-sharing) bonds with the others in easy combinations that satisfy the octet rule.


Periodic Table, with H, C, N, and O


Molecules are formed by bonds between two or more atoms. For example, chlorine gas is made of just two atoms of chlorine, and table salt pairs just two elements, Na and Cl (strictly speaking, through ionic bonds rather than as a discrete molecule). But for larger molecules, of course, there can be three or four or several different elements involved. For the basic molecules of Life it is just these four: hydrogen, carbon, nitrogen, and oxygen. The "letters" of the genetic code, the so-called bases that pair up to spell out genes along DNA molecules, are molecules made of these elements. The "letters" are adenine, cytosine, guanine, and thymine, names given to the following simple molecules (carbon atoms lie at the unlabeled positions):










From Simple Molecules to Chemical Machinery of DNA, Genes, and Evolution


These basic molecules, abbreviated A, C, G, and T, are strung together along the "backbone" of the long DNA molecule. This "backbone" is made of simple sugar molecules bonded with phosphate groups. Each basic "unit" of DNA, thus, is made of a phosphate, a sugar, and one of the A, C, G, or T molecular "letters" of the genetic code. The "units" are called nucleotides (see the rectangle surrounding a single nucleotide at the bottom of the following illustration):



Nucleotides are strung along DNA molecules to spell out the genetic code. There are two strands of the "backbone" of DNA -- DNA is double-sided -- with each side joined along the A, C, G, T molecules in what are called base pairs. A pairs with T and C pairs with G. The pairing is made by hydrogen bonds at the junction. When this double-sided, paired structure of DNA was first seen back in the early 1950s, it was correctly seen as allowing a neat way for the spelling of the code to be copied by "unzipping" and "rezipping." This is exactly what happens in the everyday life of organisms:
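
The pairing rule is simple enough to sketch in a few lines of Python. This is only an illustration and not part of the original lab: the names PAIRS and partner_strand are made up for the example, and the sketch ignores the direction in which each strand is read.

```python
# Base-pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the letters that would sit opposite each letter of `strand`."""
    return "".join(PAIRS[letter] for letter in strand)


one_side = "GCTGTT"
other_side = partner_strand(one_side)
print(one_side)    # GCTGTT
print(other_side)  # CGACAA

# "Unzip" and "rezip": the partner of the partner restores the original spelling.
print(partner_strand(other_side) == one_side)  # True
```

Because each letter has exactly one partner, either side of the ladder carries enough information to rebuild the other side.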



DNA replication (copying) is a routine process that happens every moment in your body, as DNA in your cells orchestrates the activity for processing food, rebuilding cells, and any one of a myriad of things that go on in an organism daily. This illustration shows that DNA is not only double-sided, but that the sides are twisted into a double-helix. This structure has been likened to a ladder, where the sides of the ladder are formed by the sugar-phosphate backbone, and the steps are made by the connecting A, C, G, T molecules in base pairs -- of course, with the additional condition of the twisting of the ladder into a spiral, double-helix shape.


The unzipping you see above lets the messages along DNA be spelled out, from letters into words. Let's take a look at how the letters of the code spell out words. Several words will constitute genes. The words are three letters long, and are called codons, which code for amino acids, the basic building blocks of proteins. Got it? Well, if you do, that's pretty much "it," because proteins of one sort or another either do the constructing or the operating of an organism. A DNA-like molecule called RNA carries the messages spelled out along DNA to the machinery that does the actual work in a cell, with the messages being the three-letter codons:



Codons are three-letter sequences that code for amino acid components of proteins. In RNA, shown here, a T-substitute called uracil occurs, the U letter shown along with the A, C, and G letters. Uracil is equivalent to thymine, so it ends up having the same meaning as the spelling given by A, C, G, and T in DNA. So, in codon 1 in the illustration, the three-letter sequence GCU is the same as GCT, and for codon 4, GUU is the same as GTT in DNA. DNA is like the musical conductor of the orchestra. RNA messenger copies are like the instructions from the conductor to the musicians. Amino acids are like musical notes. Proteins are the melodies that result. The "music" of an organism (the constructing or operating) happens via proteins for doing this or that (anything and everything).
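
The T-for-U substitution can be written as a one-line swap. The Python below is a hedged sketch for illustration only; the function names dna_to_rna and rna_to_dna are hypothetical, and the sketch treats the RNA message simply as the DNA spelling with U in place of T, as the text does.

```python
def dna_to_rna(dna: str) -> str:
    """Spell a DNA codon (or longer sequence) the RNA way: U wherever DNA has T."""
    return dna.replace("T", "U")


def rna_to_dna(rna: str) -> str:
    """Spell an RNA codon (or longer sequence) the DNA way: T wherever RNA has U."""
    return rna.replace("U", "T")


print(dna_to_rna("GCT"))  # GCU  (codon 1 in the text)
print(rna_to_dna("GUU"))  # GTT  (codon 4 in the text)
```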


As you go along in this story, parts of the process don't seem all that complicated, but they accumulate into a certain complexity that can seem overwhelming. These illustrations are as good as they come, so hopefully they help your understanding. The illustrations are from the National Institutes of Health (NIH) Talking Glossary, which is a great resource that includes audio clips of people explaining things. Try it out!


The next step in the building complexity is the amino acids coded for by the three-letter codons. Amino acids are the basic building blocks of proteins. Amino acids are strung out along long molecules called proteins (few of these molecules are "short"):



Amino acids in a protein are coded for by three-letter codons. Pause a second to visualize that. Each of the circles is an amino acid molecule that is coded for by a three-letter codon in the DNA, like GCT or GTT. Amino acids have names like phenylalanine, leucine, serine, and cysteine, as shown at the end in the illustration. There are twenty amino acids in Life, but there are many others in the inorganic world. Here is an important point for understanding mutations and how evolution works: The three-letter codons redundantly code for the twenty amino acids. It works like this:


Given three letter words, with four letters in the alphabet, there are 64 possible words from all combinations of the letters, as follows:


   First     Second     Third letter:
   letter    letter     T      C      A      G

   A         A          AAT    AAC    AAA    AAG
             C          ACT    ACC    ACA    ACG
             G          AGT    AGC    AGA    AGG
             T          ATT    ATC    ATA    ATG

   C         A          CAT    CAC    CAA    CAG
             C          CCT    CCC    CCA    CCG
             G          CGT    CGC    CGA    CGG
             T          CTT    CTC    CTA    CTG

   G         A          GAT    GAC    GAA    GAG
             C          GCT    GCC    GCA    GCG
             G          GGT    GGC    GGA    GGG
             T          GTT    GTC    GTA    GTG

   T         A          TAT    TAC    TAA    TAG
             C          TCT    TCC    TCA    TCG
             G          TGT    TGC    TGA    TGG
             T          TTT    TTC    TTA    TTG
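
The count of 64 comes from multiplying the choices at each position: 4 x 4 x 4. As a rough check, the short Python sketch below (an illustration only; the name all_codons is made up here) builds every three-letter word from the four letters and counts them. Note that it lists the words in alphabetical order, which is not the same order as the table.

```python
from itertools import product

BASES = "ACGT"  # the four letters of the DNA alphabet


def all_codons(bases: str = BASES) -> list[str]:
    """Return every three-letter word that can be spelled with the given letters."""
    return ["".join(letters) for letters in product(bases, repeat=3)]


codons = all_codons()
print(len(codons))       # 64, that is 4 * 4 * 4
print(len(set(codons)))  # 64, so no word is repeated
print(codons[:4])        # ['AAA', 'AAC', 'AAG', 'AAT']
```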


But there aren't 64 amino acids in Life, only 20, so some of those three-letter codons code for the same amino acid, as follows:


   Isoleucine        ATT, ATC, ATA

   Leucine           CTT, CTC, CTA, CTG, TTA, TTG

   Valine            GTT, GTC, GTA, GTG

   Phenylalanine     TTT, TTC

   Methionine        ATG

   Cysteine          TGT, TGC

   Alanine           GCT, GCC, GCA, GCG

   Glycine           GGT, GGC, GGA, GGG

   Proline           CCT, CCC, CCA, CCG

   Threonine         ACT, ACC, ACA, ACG

   Serine            TCT, TCC, TCA, TCG, AGT, AGC

   Tyrosine          TAT, TAC

   Tryptophan        TGG

   Glutamine         CAA, CAG

   Asparagine        AAT, AAC

   Histidine         CAT, CAC

   Glutamic acid     GAA, GAG

   Aspartic acid     GAT, GAC

   Lysine            AAA, AAG

   Arginine          CGT, CGC, CGA, CGG, AGA, AGG

   Stop codons       TAA, TAG, TGA


So, in the list above, the most redundancy is for leucine, serine, and arginine, each coded for by six separate three-letter codons. At the other extreme, methionine and tryptophan are each coded for by only one codon. And, last but not least, you'll note that TAA, TAG, and TGA at the bottom are called stop codons. Stop codons mark the ends of genes, or of component parts of genes, called sequences. Ah ha! Now we are getting somewhere! And you may ask: if there are stops, must there be starts? There is only one start: the ATG (or AUG, in messenger RNA) that codes for methionine also acts as the start codon. Proteins, thus, start with methionine. Got it? Good.
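
The redundancy can be tallied directly from the list. The Python sketch below is an illustration only: it copies the codon assignments from the list in this lab into a dictionary (the variable names are made up for the example) and groups the amino acids by how many codons spell each one.

```python
# Codon assignments copied from the list in this lab ("Stop" is not an amino acid).
CODONS_BY_AMINO_ACID = {
    "Isoleucine": "ATT ATC ATA",
    "Leucine": "CTT CTC CTA CTG TTA TTG",
    "Valine": "GTT GTC GTA GTG",
    "Phenylalanine": "TTT TTC",
    "Methionine": "ATG",
    "Cysteine": "TGT TGC",
    "Alanine": "GCT GCC GCA GCG",
    "Glycine": "GGT GGC GGA GGG",
    "Proline": "CCT CCC CCA CCG",
    "Threonine": "ACT ACC ACA ACG",
    "Serine": "TCT TCC TCA TCG AGT AGC",
    "Tyrosine": "TAT TAC",
    "Tryptophan": "TGG",
    "Glutamine": "CAA CAG",
    "Asparagine": "AAT AAC",
    "Histidine": "CAT CAC",
    "Glutamic acid": "GAA GAG",
    "Aspartic acid": "GAT GAC",
    "Lysine": "AAA AAG",
    "Arginine": "CGT CGC CGA CGG AGA AGG",
    "Stop": "TAA TAG TGA",
}

table = {name: codons.split() for name, codons in CODONS_BY_AMINO_ACID.items()}

# Every one of the 64 possible codons should be used exactly once.
total = sum(len(codons) for codons in table.values())
print(total)  # 64

# Group the names by how many codons spell each one.
names_by_count: dict[int, list[str]] = {}
for name, codons in table.items():
    names_by_count.setdefault(len(codons), []).append(name)

for count in sorted(names_by_count, reverse=True):
    print(count, ", ".join(names_by_count[count]))
```

The first line printed for the groups lists leucine, serine, and arginine, and the last lists methionine and tryptophan.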


The genetic code spelled out along a DNA molecule looks like a sea of letters. The following genetic sequence of 100 three-letter codons, for a total of 300 letters, was generated randomly by a computer program:





















Somewhere within that sea of letters are start codons (ATG) and stop codons (TAA, TAG, or TGA). For this example sequence, here they are:
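
The program used to make the example sequence for this lab is not shown, so the following Python is only a sketch of how such a sequence could be generated and searched. The function names, the fixed seed, and the choice to read the letters in a single frame (three at a time from the first letter) are all assumptions made for illustration.

```python
import random

START = {"ATG"}
STOPS = {"TAA", "TAG", "TGA"}


def random_sequence(n_codons: int = 100, seed: int = 0) -> str:
    """Return 3 * n_codons random letters; the seed makes the result repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(3 * n_codons))


def find_codons(sequence: str, wanted: set[str]) -> list[tuple[int, str]]:
    """Read three letters at a time and report (codon number, codon) for each match."""
    hits = []
    for start in range(0, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        if codon in wanted:
            hits.append((start // 3 + 1, codon))
    return hits


sequence = random_sequence()
print(len(sequence))  # 300
print("starts:", find_codons(sequence, START))
print("stops: ", find_codons(sequence, STOPS))
```

With random letters, each codon has a 1 in 64 chance of being a start and a 3 in 64 chance of being a stop, so stops typically outnumber starts in a sequence like this.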





















The sections between a start and one or more stops are what make up genes. Some genes are in one continuous section between a start and a stop, but many are in different pieces that are "seamed" together by the cell's machinery to make the finished messages that build amino acids into proteins. There can also be "extra" sections of letters between starts and stops that are ignored (you'll see this important phenomenon below). There are various ways that mutations can happen to change the spelling of the genetic code, and this area is also important for the study of evolution:


Mutations and Evolution


Mutations (misspellings from the original spelling) happen in several ways:


Point Mutations happen as:


Silent: Consider the TGT and TGC codons. They both code for cysteine. If a point mutation (one letter) happens in the third position of TGT, from T to C, TGC results. But that doesn't change anything really, because TGC also codes for cysteine -- the organism is unaffected, so it is "silent."


Missense: A point mutation can actually change the amino acid specified by a codon. For example, if the third position in TGT changes from T to G, the result is TGG. And, TGG, instead of coding for cysteine, codes for tryptophan, which is the wrong amino acid. This might change the chemistry of the protein significantly, so that the process or structure involved is affected in a way that is harmful to the organism. Or, it could be helpful. Mutations, even those that do cause a change, are not all harmful (otherwise, evolution wouldn't work too well).


Nonsense: A point mutation that changes a normal amino acid-coding codon to a stop codon, for example, can "chop" a gene prematurely. For example, if the first position in AAA, coding for the amino acid lysine, changes from A to T, the codon becomes TAA, a stop codon. The unzipping machinery would go merrily along, sending out instructions to build a string of amino acids into a protein, but it would hit this new stop, and who knows what would happen. There could be a totally different protein produced, and as usual, it may or may not be a functional protein.
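
The three kinds of point mutation can be told apart by looking up the codon before and after the change. The Python sketch below is an illustration only: the dictionary repeats the codon list from this lab, and the function name classify_point_mutation and the "other" label (for a stop codon that turns into an amino acid, a case the text does not discuss) are made up for the example.

```python
# Codon assignments copied from the list in this lab ("Stop" is not an amino acid).
CODONS_BY_AMINO_ACID = {
    "Isoleucine": "ATT ATC ATA", "Leucine": "CTT CTC CTA CTG TTA TTG",
    "Valine": "GTT GTC GTA GTG", "Phenylalanine": "TTT TTC",
    "Methionine": "ATG", "Cysteine": "TGT TGC",
    "Alanine": "GCT GCC GCA GCG", "Glycine": "GGT GGC GGA GGG",
    "Proline": "CCT CCC CCA CCG", "Threonine": "ACT ACC ACA ACG",
    "Serine": "TCT TCC TCA TCG AGT AGC", "Tyrosine": "TAT TAC",
    "Tryptophan": "TGG", "Glutamine": "CAA CAG",
    "Asparagine": "AAT AAC", "Histidine": "CAT CAC",
    "Glutamic acid": "GAA GAG", "Aspartic acid": "GAT GAC",
    "Lysine": "AAA AAG", "Arginine": "CGT CGC CGA CGG AGA AGG",
    "Stop": "TAA TAG TGA",
}

# Flip the list around: look up the meaning of any single codon.
CODON_MEANING = {
    codon: name
    for name, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def classify_point_mutation(original: str, mutated: str) -> str:
    """Label a one-letter change to a codon as silent, missense, or nonsense."""
    changed_letters = sum(a != b for a, b in zip(original, mutated))
    if len(original) != 3 or len(mutated) != 3 or changed_letters != 1:
        raise ValueError("expected two codons that differ in exactly one letter")
    before, after = CODON_MEANING[original], CODON_MEANING[mutated]
    if before == after:
        return "silent"
    if after == "Stop":
        return "nonsense"
    if before == "Stop":
        return "other"  # a lost stop codon; not covered in the text
    return "missense"


print(classify_point_mutation("TGT", "TGC"))  # silent   (cysteine stays cysteine)
print(classify_point_mutation("TGT", "TGG"))  # missense (cysteine becomes tryptophan)
print(classify_point_mutation("AAA", "TAA"))  # nonsense (lysine becomes a stop)
```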


Insertion mutations happen when:


a section of letters is inserted into the DNA sequence, perhaps causing the unzipping (reading frame, technically) to go haywire, at least from the original meaning. Sometimes inserted sequences are simply ignored, or are chopped out by error-correcting parts of the machinery (which are amazing in their own right).


Deletion mutations happen when:


a section of letters is omitted from the DNA sequence during the "rezipping" step, or in other ways, resulting in the same sort of perhaps fundamental "frameshift" change that can be significant.
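
A frameshift is easiest to see by splitting a sequence into three-letter words before and after the change. The Python below is a sketch for illustration only; the short sequence and the name split_codons are made up for the example.

```python
def split_codons(sequence: str) -> list[str]:
    """Split a sequence into three-letter words, dropping any leftover letters."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCTGTTTGTTAA"
print(split_codons(original))  # ['ATG', 'GCT', 'GTT', 'TGT', 'TAA']

# Insert one letter after the fourth position: every later word changes.
inserted = original[:4] + "C" + original[4:]
print(split_codons(inserted))  # ['ATG', 'GCC', 'TGT', 'TTG', 'TTA']

# Delete the fifth letter: again, every later word changes.
deleted = original[:4] + original[5:]
print(split_codons(deleted))   # ['ATG', 'GTG', 'TTT', 'GTT']

# Insert three letters at once: one new word appears, but the rest keep their spelling.
in_frame = original[:3] + "CCC" + original[3:]
print(split_codons(in_frame))  # ['ATG', 'CCC', 'GCT', 'GTT', 'TGT', 'TAA']
```

In the one-letter insertion and deletion, the final TAA stop codon is no longer read as a word, which is one way a frameshift can change where a gene ends.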


Large-scale mutations happen when:


whole sections of DNA are duplicated, deleted, moved, flipped, scrambled, etc. At an even higher level, chromosomes, which hold the DNA in the cell, may become messed up. Sometimes things just go seriously wrong, like when you were combing your hair this morning.


So, mutations can be tiny or minor, causing just a little bit of change:


For example, maybe a particular hormone is 85% as concentrated, as compared to the 75% it was supposed to be.


Or, maybe a little knob on a bone is just a little bit bigger.


Or, mutations can be major:


resulting in total absence of a hormone, for example,


or resulting in total absence of a structure, or duplication of a structure, etc.


Evolution happens when individual organisms with more favorable traits survive better than other individuals, and their genes are carried on on a statistically more favored basis. This is called natural selection, which was first described with many examples by Charles Darwin. Mutations that happen now and again, sometimes harmful, sometimes helpful, generate change in the characteristics of individuals, sometimes gradually, sometimes suddenly. And evolution, which is the word for this change over time, is inevitable, sped up by differential survival of the favored individuals. This was given the catch-phrase, "survival of the fittest."


The study of evolutionary processes -- how the mechanics work, or what is the range of occurrence of different types of genetic change -- is amazing, but we want to look at the area of science dealing with reconstructing evolutionary history. We want to learn about why clams are next to snails on the family tree of invertebrate animals. Or why birds are considered to be surviving dinosaurs -- they fit within Dinosauria on the family tree of Life. Or why whales are put next to hippopotamuses on the family tree of mammals. Or, closer to home, why humans are put next to chimpanzees and bonobos on the primate family tree.


To do this, we could look at the various ways those types of mutations happen, and the degree to which organisms share overall similarity in their DNA, but there is something very special that has been discovered lately: SINES and LINES.




Special things can happen during evolution to the genetic code between those starts and stops: SINES and LINES.


SINES = short interspersed nuclear elements; most are about 300 base pairs ("letters") long.


LINES = long interspersed nuclear elements


The key word there is interspersed. The "unzipping" and "rezipping" mechanics of DNA mentioned above are involved in the everyday operation of an organism, and especially in the mixing and matching of the genetic code that goes on during reproduction to form offspring. A sequence of genetic code (e.g., CTTAAGTTTAGTTTAACCCGCGCGCTTGC....) can be "copied into" DNA without disrupting the functionality of genes, even though it may be completely "interspersed" within the spelling of genes. These interspersed sections are part of what had been called "junk DNA," to signify that it doesn't do anything, but it is now starting to look like it has importance in several functions, and should not be thought of as total "junk."


Notably, for this lab, SINES and LINES within long, let's call them mostly "unused," sections of DNA are like heredity markers. When a new section of genetic code gets interspersed within DNA in an individual, and that spreads through the population from parent to offspring over many generations, and then, perhaps many thousands or even millions of years later, a descendant population in the lineage splits in two ("speciation," as when a mountain range or various other possible events splits a population), the SINE or LINE will be carried along in both lineages. And, later, when another SINE or LINE gets stuck into DNA in one of the descendant lineages independently, the newly added SINE or LINE will be carried forward also, but only in that lineage... Well, you hopefully get the idea: SINES and LINES are heredity markers, accumulated during the history of evolution of species lineages.


Why are SINES and LINES special -- more special than other sections of DNA that actually are involved in coding for real things in the organism? Well, because they are stuck within noncoding sections, natural selection won't be acting to change them. There will be the occasional point mutation here and there, but for the most part the SINE will go along for the ride without much change.




With that background, we are now able to look at the description of the primate family tree, reconstructed by way of comparing SINES and LINES in various primates by Abdel-Halim Salem, et al., the primary scientific source for Carroll's example. Below is the family tree that results from this study:



A list of the representative primates in this example:


Owl Monkey - Aotus trivirgatus

Green Monkey - Chlorocebus aethiops sabaeus

Orangutan - Pongo pygmaeus

Lowland Gorilla - Gorilla gorilla

Bonobo - Pan paniscus

Common Chimpanzee - Pan troglodytes

Human - Homo sapiens


The family tree is reconstructed on the basis of which primates share the most SINES -- the SINES indicate the order of evolutionary splits that happened during the evolution of this group, as a whole. How do you tell which primate has which SINES? Well, first you need DNA samples of each one, usually from a blood sample. Then you find specific spellings of the genetic code to find your way along the long DNA molecule, and get to the part where a SINE would be located, if it is there. This work is done by fancy footwork in the lab, to chop up DNA into pieces and then put the pieces in a gel (it really is a gel, about the consistency of firm jello). The gel is electrified and the pieces of DNA will move according to their size and charge and genetic spelling, and you can take a photograph of the gel after this is done to see which species have which SINES, like this:


In this photograph of a DNA fragment gel, the bright lines indicate the presence of a SINE in a species. This really is a simple process, overall, because you just need to find out "who's got what." As labeled, all of the apes have this particular SINE, but not the monkeys (the two right-most species are monkeys). This illustration, and those below, are from the original Salem, et al. article, modified in the way presented in the book by Sean Carroll, Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution.


In this photograph of a DNA fragment gel, a slightly more specific relationship is indicated -- the orangutan fits with (shares more recent common ancestry with) the human-bonobo-chimp-gorilla group, but the siamang, which doesn't have this SINE, does not (nor do the monkeys). Modified from Salem, et al. (See link at end of this page).


Getting more specific now, the gorilla is more closely related to the human-bonobo-chimp group than are the orangutan, siamang, or the monkeys, which all lack this SINE. Modified from Salem, et al. (See link at end of this page).


In this photograph of a DNA fragment gel, only the human, bonobo, and chimp species have this SINE, indicating that these three share a more recent common ancestor with each other, than with the gorilla, orangutan, siamang, and the monkeys. Modified from Salem, et al. (See link at end of this page).


This one is thrown in for completeness. The human group is the only one to possess this particular SINE. Modified from Salem, et al. (See link at end of this page).
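

The "who's got what" reasoning in the five gel descriptions above can be written out as a small table and checked for the nested pattern a family tree predicts. The Python below is only an illustrative sketch: the labels "SINE 1" through "SINE 5" are made-up names for the five gels in the order described, and the carrier lists are taken from those descriptions rather than from the original data.

```python
SPECIES = [
    "human", "bonobo", "chimpanzee", "gorilla",
    "orangutan", "siamang", "green monkey", "owl monkey",
]

# Which species show each SINE, following the five gel descriptions in order.
SINE_CARRIERS = {
    "SINE 1": {"human", "bonobo", "chimpanzee", "gorilla", "orangutan", "siamang"},
    "SINE 2": {"human", "bonobo", "chimpanzee", "gorilla", "orangutan"},
    "SINE 3": {"human", "bonobo", "chimpanzee", "gorilla"},
    "SINE 4": {"human", "bonobo", "chimpanzee"},
    "SINE 5": {"human"},
}


def sine_counts() -> dict[str, int]:
    """Count how many of the five SINES each species carries."""
    return {
        species: sum(species in carriers for carriers in SINE_CARRIERS.values())
        for species in SPECIES
    }


def is_nested(groups: list[set[str]]) -> bool:
    """True if, from largest to smallest, each group sits inside the one before it."""
    ordered = sorted(groups, key=len, reverse=True)
    return all(smaller <= larger for larger, smaller in zip(ordered, ordered[1:]))


for species, count in sine_counts().items():
    print(f"{species:13s} {count}")

print(is_nested(list(SINE_CARRIERS.values())))  # True
```

Groups that nest inside one another like this are what a single branching history would produce; a SINE shared by, say, only the gorilla and an owl monkey would break the pattern.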




There are several ways to reconstruct evolutionary relationships. Traditionally, morphology of bones and muscles and other observable hard parts and tissues was used as the data for comparison of shared characteristics. This method has generated many highly supported "family trees" (cladograms) for various groups of organisms. There is sometimes confusion though, when some groups are poorly known by fossils, or when the pattern of evolution is inherently confusing (as with some cases of convergent evolution). With the advent of genetics and genetic sequence identification, very recently accelerated by computers and lab technology, we can now look directly at the genes that code for those morphologic features. Use of DNA sequence differences and mutation data for reconstructing phylogeny has been successful too, but it is sometimes confounded by challenges similar to those facing traditional morphology-based approaches, and by the complexity of the possible mutations that can happen to the genetic spelling of genes.

The new fields of study described by Salem, et al. and Carroll take advantage of the DNA heredity markers described above. SINES offer a definitive guide to the branching pattern of evolution, such that we can have greater confidence in phylogenetic reconstructions, at least for groups that have still living representatives (can't do this for most dinosaurs, for example). It offers direct discovery of the relationships of birds and mammals and other major groups dominant today, and it offers ways to learn about details of evolutionary change that will better constrain our attempts to work with ancient groups represented by fossils only.


As Sean Carroll describes in his nice book, the forensic approach has come to the study of evolution.


Let's finish with a look at the timing of evolutionary splits for primates, as indicated by studies of DNA that reveal timing and by the fossil record:



The cladogram above shows the relationship of primates indicated by SINES, along with the timing of evolutionary splits derived by several methods. It could be drawn right-to-left, as the gel photographs above are ordered, but the left-to-right presentation of the cladogram is the same (you can draw branching patterns -- trees -- in all sorts of ways, but the meaning is the same if the pattern of branching and grouping is correctly drawn). For timing of splits, see the discussion in Sean Carroll's book, in the Salem, et al., paper, and in a paper by Fortna, et al., linked below. The lineage involving modern humans ("hominids") split off from the common ancestor with the chimpanzee-bonobo clade around 5 million years ago, and bonobos and chimpanzees split only recently, around 2 million years ago. The apes in general go back 20 to 30 million years for their start, and have split into the orangutan and gorilla and other lineages during the last 20 million years. So, primates are a Cenozoic group, as are most other mammal groups. We'll return to this phylogeny later, when we look at the Late Cenozoic evolution of the human lineage.


Resources, and Recommended Reading


Abdel-Halim Salem, David A. Ray, Jinchuan Xing, Pauline A. Callinan, Jeremy S. Myers, Dale J. Hedges, Randall K. Garber, David J. Witherspoon, Lynn B. Jorde, and Mark A. Batzer, Alu elements and hominid phylogenetics, PNAS, Oct 2003; 100: 12787 - 12791.


Sean B. Carroll, *Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution*, Norton, 301 p.


Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, et al. (2004) Lineage-Specific Gene Duplication and Loss in Human and Great Ape Evolution. PLoS Biol 2(7)


National Institutes of Health (NIH) Talking Glossary


Example audio link at the NIH glossary website


Wikipedia entry on mutation


Early Primates, Part One, at


Early Primates, Part Two, at


Nice PBS Video Clip about primate evolution