Universal genetic code
From Wikipedia, the free encyclopedia
The term Universal Genetic Code, or Universal Code, is an old-fashioned name for the standard genetic code. In all known living creatures, the instructions for making proteins are encoded in DNA. Each group of three bases (a "codon") selects an amino acid. In the 1960s, biologists and chemists worked out which amino acid is selected by each codon: the genetic code. They were surprised to discover that this code was the same in every living creature they investigated, including plants, animals and bacteria. Nearly all organisms on Earth share an almost identical genetic code.
In most organisms, aminoacyl-tRNA synthetase enzymes carry out the pairing that the genetic code specifies: each synthetase attaches a particular amino acid to the transfer RNAs (tRNAs) that carry the corresponding anticodons, and the charged tRNAs are then used to build proteins. This shared pairing, the universal genetic code, is not chemically necessary, because any amino acid can be bonded to any tRNA through a condensation reaction between the OH of the acid (carboxyl) group of the amino acid and the OH at the 3' end of the tRNA[1]. Although codes significantly better than the standard genetic code do exist, they are extremely rare, which suggests that the standard code is adaptive; moreover, all known naturally occurring nonstandard codes, which are secondarily derived from the standard one, are less adaptive. On this view, the arrangement of amino acid assignments in the standard genetic code is a direct product of natural selection for a system that minimizes the phenotypic effects of genetic error. The "adaptive genetic code" hypothesis (Freeland, 2002) states that the code has evolved to minimize the deleterious effects of mutation and translation error (Freeland, 1998).
Beginning in the late 1970s and through the 1980s, it was discovered that the genetic code is not universal after all. Quite a few organisms use genetic codes that differ in one or two codons from what was once thought to be the universal genetic code, now called the "canonical code" with the standard list of 20 amino acids[1][2][3]. In these modified codes, one or more codons have been reassigned to an amino acid different from the one they specify in the canonical code. If the translation apparatus changes so that a codon is reassigned, the amino acid changes in every protein in which that codon occurs. A distinction should therefore be made between the historical notion of a "universal genetic code" and the more accurate term, and the one most widely used in practice, "standard genetic code."
The genetic code is therefore not universal, even in non-mitochondrial genomes. The specific relationships between codons and amino acids are proving to vary in many taxa, and a pattern is emerging in which many organisms do not use the standard genetic code (Lehman, 2001). It was once thought that any change in the genetic code would be lethal to the organism, because reassigning a single codon would simultaneously change the amino acid at every position where that codon occurs, in every protein the organism makes. These exceptions to the universal code were therefore quite surprising. Variant codes have been described in a wide range of nuclear and organellar systems, especially in metazoan (animal) mitochondria. Within a lineage, a reassigned codon consistently codes for its alternative amino acid, and such variants are found mainly in mitochondria, whose genes originated from an endosymbiont (Abascal et al., 2006).
The standard genetic code is a set of rules that relates the 20 canonical amino acids in proteins to groups of three bases (codons) in messenger RNA (mRNA). The mathematical schemes devised to explain how 20 amino acids are obtained from 64 codons reflect the human urge to find patterns in nature, not the natural order itself. Because there are more codons than amino acids, the code is redundant: different amino acids are specified by one, two, three, four, or six codons, and three codons serve as stop signals. This redundancy, together with the arrangement of the code, confers tolerance to error: many point mutations, especially at the third codon position, do not change the amino acid at all, and when a mutation does alter an amino acid, the substitute is likely to have properties similar to those of the original. Computer simulations show that the present code is nearly optimal in this respect.
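The degeneracy figures above can be checked directly. What follows is a minimal Python sketch, not taken from any of the cited works. It assumes the standard code is written as NCBI's compact one-letter string for translation table 1 (bases in U, C, A, G order, `*` for stop), and the helper names `build_code`, `degeneracy` and `synonymous_fraction` are hypothetical. It counts how many codons specify each amino acid and measures what fraction of single-base changes at each codon position leave the amino acid unchanged, a crude proxy for the error tolerance described.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # NCBI ordering of bases at each codon position

# Standard code (NCBI translation table 1) as a 64-character string:
# codons run UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks a stop codon.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(aas: str) -> dict[str, str]:
    """Map each of the 64 RNA codons to a one-letter amino acid or '*'."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, aas))


def degeneracy(code: dict[str, str]) -> Counter:
    """Number of codons assigned to each amino acid (and to stop)."""
    return Counter(code.values())


def synonymous_fraction(code: dict[str, str], position: int) -> float:
    """Share of single-base changes at `position` (0, 1 or 2) in sense
    codons that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # only start from sense codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if code[mutant] == aa:
                same += 1
    return same / total


STANDARD = build_code(STANDARD_AAS)
counts = degeneracy(STANDARD)
print(counts["*"])  # 3 stop codons
print(sorted({n for aa, n in counts.items() if aa != "*"}))  # [1, 2, 3, 4, 6]
for pos in range(3):
    print(pos + 1, round(synonymous_fraction(STANDARD, pos), 2))
```

With the standard table the loop prints about 0.04, 0.0 and 0.69 for the first, second and third positions: most third-position changes are synonymous, while every second-position change alters the amino acid. The sketch weights all substitutions equally and ignores mutation biases such as the excess of transitions, which more detailed analyses may take into account.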
Historical Overview
In the early 1970s, evolutionary biologists assumed that the genetic code was universal, so that a given piece of DNA would specify the same protein sequence in every living thing. Because such universality was unlikely to have arisen by chance, it was also interpreted as evidence that every organism had inherited its genetic code from a single common ancestor. In 1979, however, exceptions to the code were found in mitochondria. Other exceptions were subsequently found in bacteria and in the nuclei of algae and single-celled protozoa. It is now clear that the genetic code is not the same in all living things, and some have taken this as a possible indication that living things did not all evolve from a single common origin, or Last Universal Common Ancestor.
Since 1979, numerous nonstandard genetic codes have been discovered. In some organisms, a few of the 64 possible "words" of the genetic code have different meanings, although most biologists view each nonstandard code as a subtle derivative of the standard genetic code: 48 of the 64 words are identical in all known living organisms, and 16 are known to vary across the enormous diversity of living things. In our own mitochondria, for instance, 4 of the 64 words have different meanings from those in the standard code. In most organisms, these differences are so slight as to be trivial. In the mitochondria of common molds, for example, the codon UGA is translated into the amino acid tryptophan, whereas in the standard code it is a stop signal; the other 63 words are identical between humans, elephants, daisies, and molds. If the genetic code is regarded as the language of DNA, differences in a few codons are very small differences in vocabulary. As far as is known today, the genetic coding mechanism is the same in all organisms: 3-base codons, tRNAs and ribosomes read the message in the same direction (5' to 3'), translating it 3 letters at a time into a sequence of amino acids. In this respect the genetic code is universal, as is most evident from its use in biotechnology.
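The "4 of 64" figure for human mitochondria and the single UGA difference in mold mitochondria can be checked against NCBI's published translation tables. The Python sketch below assumes NCBI's compact one-letter encoding of table 1 (standard), table 2 (vertebrate mitochondrial) and table 4 (mold, protozoan and coelenterate mitochondrial, also used for Mycoplasma and Spiroplasma); the function name `differences` is illustrative.

```python
from itertools import product

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # UUU, UUC, ..., GGG

# NCBI translation tables in compact one-letter form ('*' = stop):
TABLE_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # standard
TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"  # vertebrate mito
TABLE_4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # mold/protozoan mito


def differences(a: str, b: str) -> list[tuple[str, str, str]]:
    """List (codon, meaning in a, meaning in b) for every codon whose meaning differs."""
    return [(codon, x, y) for codon, x, y in zip(CODONS, a, b) if x != y]


for codon, std, mito in differences(TABLE_1, TABLE_2):
    print(codon, std, "->", mito)
# UGA * -> W
# AUA I -> M
# AGA R -> *
# AGG R -> *

print(differences(TABLE_1, TABLE_4))  # [('UGA', '*', 'W')]
```

Only four of the 64 entries change between tables 1 and 2, and only one between tables 1 and 4, matching the counts given above.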
Variations to the Universal Genetic Code
Since 1985, coding changes have been found in the non-organellar genomes of dozens of organisms: many ciliated protozoans, but also Mycoplasma and other Firmicutes, some diplomonads other than Giardia, the yeast Candida and some other fungi, and Acetabularia and some other green algae. Most exceptions to the universal genetic code involve a standard stop codon being used to encode an amino acid. For example, UGA is normally a stop codon, but in the mitochondria of the fruit fly Drosophila melanogaster it encodes the amino acid tryptophan. Ciliated protozoans as a group use at least four different genetic codes: some taxa use the canonical code, several use the UAR codons (UAA and UAG) for glutamine, and some use UGA for cysteine while others use it for tryptophan (Lehman, 2001).
A striking feature of the great majority of these coding changes, including most mitochondrial ones, is that they involve codons that are either untranslated or cause chain termination. In addition, the "twenty-first amino acid", selenocysteine, is incorporated into some proteins in organisms as diverse as Escherichia coli and humans via the UGA ('opal' or 'umber') codon. Other deviations include the reassignment of stop codons, or the loss of codons, to an unspecified (Usp) assignment; the reassignment of a codon from one amino acid to another; and the use of a codon as a 'resume' codon in ssrA RNA. Some have taken these exceptions to suggest the possibility of multiple evolutionary origins of life.
Deviations from the universal genetic code found in the nuclear and bacterial genomes of a wide variety of taxa include: the use of CUG for serine instead of leucine in Candida; the loss of the AUA isoleucine codon from Micrococcus; the use of UAA and UAG for glutamine in ciliated protozoans and green algae; the use of UGA for tryptophan in Mycoplasma; the use of UGA as a suppressor codon specifying tryptophan in bacteria; the use of UGA for cysteine in the ciliate Euplotes; the use of UGA to encode selenocysteine (SeC); the loss of the CGG arginine codon in Spiroplasma; the loss of the AGA arginine codon in Micrococcus; and the use of GCA as a resume codon in ssrA RNA (Lehman, 2001).
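Because each variant differs from the standard code in only a few codons, a variant code can be represented as a small set of overrides on the standard table. The Python sketch below assumes that representation; the lineage labels and helper names are hypothetical, and it encodes only some of the reassignments listed above (real variant tables, such as those NCBI maintains, can differ in further details, for example in their start codons). The same mRNA then yields different proteins depending on which code reads it.

```python
from itertools import product

BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), STANDARD_AAS))

# Some of the reassignments listed above, written as overrides of the
# standard table (the dictionary keys are illustrative labels).
VARIANTS = {
    "candida": {"CUG": "S"},                  # leucine -> serine
    "ciliate_uar": {"UAA": "Q", "UAG": "Q"},  # stop -> glutamine
    "euplotes": {"UGA": "C"},                 # stop -> cysteine
    "mycoplasma": {"UGA": "W"},               # stop -> tryptophan
}


def variant_code(name: str) -> dict[str, str]:
    """Copy the standard table and apply one lineage's reassignments."""
    code = dict(STANDARD)
    code.update(VARIANTS[name])
    return code


def translate(rna: str, code: dict[str, str]) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = code[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = "AUGCUGUAGUGAGCUUAA"  # AUG CUG UAG UGA GCU UAA (hypothetical)
print(translate(mrna, STANDARD))                     # ML
print(translate(mrna, variant_code("candida")))      # MS
print(translate(mrna, variant_code("ciliate_uar")))  # MLQ
```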
One study found that the mitochondria of several arthropods use a new genetic code that translates the codon AGG as lysine, instead of serine (as in the invertebrate mitochondrial genetic code) or arginine (as in the standard genetic code). It also found several events of parallel evolution of the genetic code in arthropods, in which the AGG codon was reassigned between serine and lysine (Abascal et al., 2006a).
Other efforts focus on expanding the genetic code by inserting an artificial fifth base. Romesberg and colleagues at the Scripps Research Institute in La Jolla, Calif., designed a fifth base, 3-fluorobenzene (3FB), that pairs with itself. In addition to the canonical base pairs G-C (guanine–cytosine) and A-T (adenine–thymine), a third pairing occurs: 3FB-3FB. To improve replication, the researchers created their own polymerase enzyme, which recognizes 3FB and incorporates it appropriately into a replicating strand of DNA. Replication of the unnatural DNA makes about one mistake for every 1,000 base pairs, compared with about one mistake every 10 million bases for natural polymerases (Scripps, 2004 & 2005). A process called amber suppression changes the function of a stop codon (the amber codon, UAG) so that it codes for an unnatural amino acid, which an entirely new pathway specifically places onto a tRNA. Similarly, Ryan Mehl added a pathway to an E. coli bacterium that allows it to synthesize a new amino acid, p-aminophenylalanine (pAF), from simple carbon sources. The bacterium incorporates pAF into proteins alongside its 20 standard amino acids. The purpose is to expose the organism to selective pressures and observe whether the organism with the expanded genetic code has an evolutionary advantage over natural organisms (ACS, 2003). This approach offers a way to increase the genetic repertoire of living cells to include a wide variety of amino acids with novel structural, chemical, and physical properties not found in the common 20 amino acids.
Over 30 novel amino acids have been genetically encoded in response to unique triplet and quadruplet codons, including fluorescent, photoreactive, and redox-active amino acids, glycosylated amino acids, and amino acids with keto, azido, acetylenic, and heavy-atom-containing side chains. By removing the limitations imposed by the existing 20-amino-acid code, it should be possible to generate proteins, and perhaps entire organisms, with new or enhanced properties (Wang and Schultz, 2005).
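Both strategies described above, reassigning the amber stop codon UAG and adding four-base (quadruplet) codons, can be illustrated with a toy decoder. The Python sketch below is a simplification under stated assumptions: `X` is a placeholder for an unnatural amino acid such as pAF, the quadruplet `AGGA` is chosen only for illustration, the function name `translate_expanded` is hypothetical, and the decoder always prefers a quadruplet match, whereas in real cells the engineered tRNA competes with release factors at UAG and with the normal tRNAs that read the overlapping triplet.

```python
from itertools import product
from typing import Optional

BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), STANDARD_AAS))

# Amber suppression: the UAG stop codon now specifies an unnatural amino acid ('X').
AMBER = dict(STANDARD, UAG="X")


def translate_expanded(rna: str, triplets: dict[str, str],
                       quadruplets: Optional[dict[str, str]] = None) -> str:
    """Translate from the first base; a listed four-base codon is read in
    preference to the triplet starting at the same position."""
    quadruplets = quadruplets or {}
    protein, i = [], 0
    while i + 3 <= len(rna):
        if rna[i:i + 4] in quadruplets:
            protein.append(quadruplets[rna[i:i + 4]])
            i += 4
            continue
        aa = triplets[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
        i += 3
    return "".join(protein)


print(translate_expanded("AUGUUUUAGGCUUAA", STANDARD))  # MF   (UAG stops)
print(translate_expanded("AUGUUUUAGGCUUAA", AMBER))     # MFXA (UAG read through)
print(translate_expanded("AUGAGGAUUUUAA", STANDARD, {"AGGA": "X"}))  # MXF
```

Without the quadruplet entry, the last sequence is read in the ordinary triplet frame as `MRIL`, which shows why a four-base codon must be decoded by a dedicated tRNA to keep the downstream frame intact.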
Evolution of the Genetic Code and Theories behind Variations
The genetic code is thought to have evolved, along with the origin of life, from a much simpler, more primitive genetic code. The code may initially have functioned as a doublet code, ignoring the third base in each codon and specifying just a few amino acids; the translation mechanism then probably grew to include a few more. The most influential form of this idea is called "code coevolution", which proposes that the genetic code coevolved with the biosynthetic pathways for new amino acids. Many patterns of biosynthetic relatedness have been reported, especially amino acids from the same biosynthetic pathway being assigned to codons that begin with the same base. Evidence from pyrimidine biosynthesis suggests that the genetic code could have begun in an RNA world with the two letters A and U, grouped in eight triplets coding for seven amino acids and one stop signal. This code could then have evolved by gradually making use of the letters G and C, ending with 64 triplets coding for 20 amino acids and three stop signals. On this proposal, DNA could have appeared after the four-letter structure was already in place; in the newborn DNA world, T replaced U, giving greater physicochemical and genetic stability (Jimenez-Sanchez, 1995). Computer simulations of this type of evolution, starting from a hypothetical RNA cell with an RNA genome that codes for a replicase and for tRNAs, have been carried out successfully (Weberndorfer, 2002). It has long been conjectured that the canonical (standard) genetic code evolved from a simpler primordial form and that chemical determinism shaped the codon assignments for the current 20 amino acids, although the set of 20 has exceptions. The codon UGA, usually a stop signal, sometimes codes for a 21st amino acid, selenocysteine, and pyrrolysine, encoded by UAG in some archaea and bacteria, is now viewed as the 22nd.
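The codon counts in these scenarios follow from simple combinatorics: an alphabet of k letters read n at a time gives k^n distinct words. The worked arithmetic below uses the numbers stated in the paragraph; the doublet line is an assumption, since the paragraph does not say how many letters the doublet stage used, and here it is taken to be all four.

```latex
\begin{aligned}
\text{two letters } \{A, U\},\ \text{triplets:} &\quad 2^3 = 8 = 7\ \text{amino acids} + 1\ \text{stop} \\
\text{four letters},\ \text{doublets:} &\quad 4^2 = 16 < 20 + 1 \\
\text{four letters},\ \text{triplets:} &\quad 4^3 = 64 = 61\ \text{sense codons} + 3\ \text{stops}
\end{aligned}
```

A four-letter doublet code therefore has too few words for all 20 amino acids plus a stop signal, consistent with the idea that such a stage specified fewer amino acids, while the 61 sense triplets of the modern code are shared among 20 amino acids, which is the source of the redundancy discussed earlier.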
Although Francis Crick thought that the current canonical genetic code was "frozen" (his "frozen accident" hypothesis), many now view it as evolving in complexity toward a greater number of amino acids.
It is commonly held that selection among earlier alternative codes resulted in the near universality of the current one, because it is the most efficient at minimizing errors. Amino acids that share a biosynthetic pathway tend to be assigned codons with the same first base, a pattern thought to reflect the way the code evolved from a simpler one. On this view, the code could not have evolved in any other way than to give biochemically related amino acids related codons (Freeland, 1998).
Osawa, in his book "Evolution of the Genetic Code", points out how the code differs in the mitochondria of certain organisms and describes how the genetic code is still evolving. He discusses the distribution and origin of the non-universal codes, the codon capture theory, and the mechanisms involved in the evolution of the genetic code. The codon capture theory can be illustrated by the codon 'AAA' in echinoderm mitochondria. The ancestral lysine tRNA had an anticodon that translated both 'AAA' and 'AAG' as lysine, which is correct according to the standard genetic code. Under selective pressure toward a genome richer in G-C (guanine-cytosine) base pairs, 'AAA' codons were gradually replaced by 'AAG' until the 'AAA' codon disappeared. In the standard genetic code, asparagine is coded for by the similar codons 'AAU' and 'AAC'. As evolution continued, 'AAA' was "recaptured" as an additional codon for asparagine.
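The codon-capture sequence described above can be followed step by step in a toy model. In this Python sketch the gene is hypothetical, the G-C pressure is modelled simply as replacing every in-frame AAA by the synonymous AAG, and the function names are illustrative. It shows why the disappearance step makes the reassignment neutral: once no AAA remains, giving AAA a new meaning changes no existing protein, whereas the same reassignment made earlier would.

```python
from itertools import product

BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), STANDARD_AAS))


def codons_of(gene: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    return [gene[i:i + 3] for i in range(0, len(gene), 3)]


def translate(gene: str, code: dict[str, str]) -> str:
    """Translate every codon (the toy gene has no stop codon)."""
    return "".join(code[c] for c in codons_of(gene))


def synonymous_replace(gene: str, old: str, new: str) -> str:
    """Replace every in-frame `old` codon by `new`, e.g. AAA -> AAG."""
    return "".join(new if c == old else c for c in codons_of(gene))


gene = "AUGAAAAAUAAGUUU"                        # Met-Lys-Asn-Lys-Phe (hypothetical)
step1 = synonymous_replace(gene, "AAA", "AAG")  # G-C pressure removes AAA
captured = dict(STANDARD, AAA="N")              # AAA reassigned to asparagine

print("AAA" in codons_of(step1))   # False: the codon has disappeared
print(translate(step1, captured))  # MKNKF: the protein is unchanged
print(translate(gene, captured))   # MNNKF: reassigning too early changes it
```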
The genetic code is thought to have evolved in two distinct phases: first, the "canonical" code emerged;[2] subsequently, this code diverged in numerous nuclear and organelle lineages. Mitochondrial variant codes all seem slightly worse than the canonical code at minimizing errors. Most non-standard codes arise from alterations in tRNAs, mostly through post-transcriptional modifications, such as base modification or RNA editing, rather than through substitutions within tRNA anticodons. Francis Crick’s seminal observations on genetic-code evolution included speculative mechanisms for codon reassignment. In 1963, he proposed that biased mutation could render specific codons very rare, permitting their reassignment. Three years later, he suggested a specific example in which anticodon base modification could induce reassignment of AUA from Met (methionine) to Ile (isoleucine), through a stage in which the codon is translated ambiguously. However, subsequent findings regarding the apparent universality of the standard genetic code led him to an increasingly strong conviction that codon-reassignment events were limited to primordial evolution, when “the genetic message of the cell coded for only a small number of proteins which were somewhat crudely constructed.” More recently, there have been three main attempts to explain variation in the code. Osawa and Jukes’s “Codon Capture” hypothesis (Osawa and Jukes 1989)[3][4] proposes a neutral mechanism for codon reassignment through a stage in which the codon disappears from the genome entirely. Schultz and Yarus’s “Codon Ambiguity” hypothesis (Schultz and Yarus 1994)[5][6][7] suggests that the genetic code changes through a state in which some codons have more than one meaning, that is, tRNA mutations cause translational ambiguity followed by fixation of the new meaning. Andersson and Kurland’s “Genome Reduction” hypothesis (Andersson and Kurland 1990)[8] suggests that code change in mitochondria is driven by selection to minimize the translation apparatus, so that pressure to minimize mitochondrial genomes leads to the reassignment of specific codons. It has recently been found that the genome reduction model does not explain the variant codes in mitochondria at all (Knight et al., 2001).
Building on the long-standing idea that the genetic code has a stereochemical basis, the most recent theory of Yarus et al. (2005) is that the triplets escaped from their original function in amino acid–binding sites to become modern codons and anticodons: "There is significant evidence that cognate codons and/or anticodons are unexpectedly frequent in RNA-binding sites for seven of eight biological amino acids. This suggests that a substantial fraction of the genetic code has a stereochemical basis, the triplets having escaped from their original function in amino acid-binding sites to become modern codons and anticodons. This stereochemical basis is consistent with subsequent optimization of the code to minimize the effect of coding mistakes on protein structure. These data strengthen the argument for invention of the genetic code in an RNA world and for the RNA world itself."
The genetic code is still evolving in many lineages. The scope and extent of variation increases as new sequence data accumulate, which underscores the importance of related work in understanding how and why the standard code evolved in the way it has. This, in turn, provides the basis for asking important questions about the link between code structure and the process of molecular evolution. Comparative genomics is moving beyond the analysis of individual gene sequences towards analysis of assemblages of genes and the common evolutionary mechanisms that govern their alteration and rearrangement. As more non-standard codes are discovered, and the mechanisms that underlie codon reassignments clarified, we will be better able to explore the subtle relationships between coding rules and genome evolution (Knight et al., 2001).
Other Examples of Code Variants
In a study of the mitochondrial genome of the yeast Saccharomyces cerevisiae (which codes for 24 tRNAs), the nucleotide sequences of the tRNA genes suggest a unique set of rules governing the decoding of the mitochondrial genetic code. The tRNA for the arginine CGN codon family has an A in the wobble position of the anticodon. Interestingly, CGN codons had not been found in any of the other mitochondrial genes sequenced at the time (Bonitz et al., 1980). For an example of a genetic code with a functional quadruplet codon, see Anderson et al. (2004).
Computational tools called GENVIEW and GENCODE have been developed for testing the adaptive nature of a genetic code under different assumptions about patterns of genetic error and the nature of amino acid similarity. Rigorous statistical tests of the historical biosynthetic theories of the genetic code’s origin have concluded that "a randomly generated computer code, taking into account certain reasonable biochemical restrictions, like wobble pairing, is surprisingly likely to perform just as well as the canonical code, effectively overturning a widely accepted form of this theory held for the past 25 years." GENVIEW provides a user-friendly, point-and-click interface through which a user may reproduce and extend analyses of the adaptive properties of the standard genetic code or any of its secondary derivatives. It is a graphical user interface (GUI) program that runs on Linux, Unix and Microsoft Windows platforms and is based on the GTK+ toolkit (Ronneberg et al., 2001).
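The random-code comparison that such tools automate can be sketched in a few dozen lines. The Python below is not GENVIEW or GENCODE, whose internals are not described here; it is a generic illustration under several assumptions: Kyte-Doolittle hydropathy stands in for amino acid similarity (published analyses have used other measures, such as polar requirement), every single-base substitution between sense codons is weighted equally, and random codes are produced by shuffling the 20 amino acids among the standard code's synonymous codon blocks while keeping the stop codons fixed. The hypothetical function `fraction_better` reports the share of random codes with a lower error cost than the standard code.

```python
import random
from itertools import product

BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AAS))

# Kyte-Doolittle hydropathy, used here as a stand-in amino acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_cost(code: dict[str, str]) -> float:
    """Mean squared hydropathy change over all single-base substitutions
    between sense codons (changes to or from stop codons are ignored)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                n += 1
    return total / n


def shuffled_code(rng: random.Random) -> dict[str, str]:
    """Randomly reassign the 20 amino acids among the standard code's
    synonymous codon blocks, keeping block structure and stop codons fixed."""
    amino_acids = sorted(set(STANDARD_AAS) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    mapping = dict(zip(amino_acids, permuted))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD.items()}


def fraction_better(trials: int = 1000, seed: int = 0) -> float:
    """Share of random codes whose error cost is below the standard code's."""
    rng = random.Random(seed)
    target = error_cost(STANDARD)
    better = sum(error_cost(shuffled_code(rng)) < target for _ in range(trials))
    return better / trials


print(round(error_cost(STANDARD), 2))
print(fraction_better(trials=1000, seed=0))
```

The result depends on the chosen property, the error weighting and the number of trials, so the sketch makes no claim about the figures GENVIEW or GENCODE report.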
Classification Scheme for Variant Genetic Codes
In light of the variation among genetic codes, Wilhelm and Nikolajewa (2004) have proposed a new classification scheme based on the binary representation of purines and pyrimidines. The scheme reveals variant patterns more clearly through the “classification of strong, mixed, and weak codons as well as the ordering of codon families.” They hypothesize that the genetic code evolved from a binary doublet code and developed via a quaternary doublet code (A, G, C, U) into the contemporary, expanded triplet code. Their conclusion that code evolution must have started with doublets, and not with single letters, is supported by the correlations they observe between properties of amino acids and codon strengths.
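The strong/weak distinction can be made concrete with the standard code. The Python sketch below does not reproduce Wilhelm and Nikolajewa's binary scheme, whose exact encoding is not given here. As a simplifying assumption, it labels each of the 16 two-base codon prefixes strong, mixed or weak according to how many of its bases are G or C, then checks whether the four codons sharing that prefix all specify the same amino acid. The function names are illustrative.

```python
from itertools import product

BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), STANDARD_AAS))

STRONG = set("GC")  # G and C pair with three hydrogen bonds; A and U are "weak"


def doublet_class(doublet: str) -> str:
    """Classify a two-base codon prefix by how many of its bases are strong."""
    n_strong = sum(base in STRONG for base in doublet)
    return {2: "strong", 1: "mixed", 0: "weak"}[n_strong]


def family_boxes(code: dict[str, str]) -> dict[str, set[str]]:
    """Group the 64 codons into 16 boxes by their first two bases."""
    boxes: dict[str, set[str]] = {}
    for codon, aa in code.items():
        boxes.setdefault(codon[:2], set()).add(aa)
    return boxes


for doublet, meanings in sorted(family_boxes(STANDARD).items()):
    kind = "one amino acid" if len(meanings) == 1 else "split"
    print(doublet, doublet_class(doublet), kind)
```

In the standard code, the four strong prefixes (CC, CG, GC, GG) each specify a single amino acid, none of the four weak prefixes does, and among the eight mixed prefixes the box is undivided exactly when the second base is a pyrimidine (C or U).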
In any case, an accurate classification scheme for the universal genetic code needs to take into account the evolutionary phylogenies of the variant genetic codes: those proposed to account for its evolution, present variations, and potential future evolution. The origin of the genetic code remains debated: it is unclear whether the current code is frozen as an "accident" of evolution or was expanded from a primordial code with fewer amino acids. Nevertheless, it has been shown that the code can be expanded (Wang, 2003).
The National Center for Biotechnology Information, U.S. National Library of Medicine, maintains a current listing of exceptions to the genetic code, and complete sets of mitochondrial genomes are also available from GenBank.
Footnotes
- ^ Campbell, Neil A.; Jane B. Reece (2005). Biology (7th ed.). San Francisco: Pearson Education, Inc. Benjamin Cummings. ISBN 0-8053-7146-X
- ^ Freeland, S.J.; R.D. Knight, L.F. Landweber, and L.D. Hurst (2000). "Early fixation of an optimal genetic code". Mol. Biol. Evol. 17: 511-518.
- ^ Osawa, S.; T.H. Jukes, K. Watanabe, and A. Muto (1992). "Recent Evidence for the Evolution of the Genetic Code". Microbiol. Rev. 56: 229-264.
- ^ Osawa, S.; T.H. Jukes (1989). "Codon reassignment (codon capture) in evolution". J. Mol. Evol. 28: 271-278.
- ^ Schultz, D.W.; and M. Yarus (1996). "On malleability in the genetic code". J. Mol. Evol. 42: 597-601.
- ^ Schultz, D.W.; and M. Yarus (1994). "Transfer RNA mutation and the malleability of the genetic code". J. Mol. Biol. 235: 1377-1380.
- ^ Schultz, D.W.; and M. Yarus (1995). "drives the evolution of the translation system". Biochem. Cell Biol. 73: 775-787.
- ^ Andersson, S.G.E.; and C.G. Kurland (1998). "Reductive evolution of resident genomes". Trends Microbiol. 6: 263-268.
References
- Abascal, Federico, Rafael Zardoya, and David Posada. (2006). "GenDecoder: genetic code prediction for metazoan mitochondria." Nucleic Acids Res. Jul 1;34. Retrieved March 23, 2007.
- Abascal, Federico, David Posada, Robin D. Knight, and Rafael Zardoya. (2006a). "Parallel Evolution of the Genetic Code in Arthropod Mitochondrial Genomes." PLoS Biol. Apr 25;4. Retrieved March 24, 2007.
- ACS (American Chemical Society). (2003). "Expanding The Genetic Code: The World's First Truly Unnatural Organism." Reported in Science Daily, January 13, 2003. Retrieved March 24, 2007.
- Anderson, J. Christopher, Ning Wu, Stephen W. Santoro, Vishva Lakshman, David S. King, and Peter G. Schultz. (2004). "An expanded genetic code with a functional quadruplet codon." Proc Natl Acad Sci U S A. May 18; 101(20): 7566–7571. Retrieved March 20, 2007.
- Bonitz, S.G., R. Berlani, G. Coruzzi, M. Li, G. Macino, F.G. Nobrega, M.P. Nobrega, B.E. Thalenfeld, and A. Tzagoloff. (1980). "Codon recognition rules in yeast mitochondria." Proc Natl Acad Sci. Jun;77(6):3167-70. Retrieved March 20, 2007.
- Freeland, Stephen J. and Laurence D. Hurst. (1998). "Load Minimization of the Genetic Code: History Does not Explain the Pattern." Proceedings: Biological Sciences, Vol. 265, No. 1410 (Nov. 7, 1998), pp. 2111-2119.
- Jimenez-Sanchez, A. (1995). "On the origin and evolution of the genetic code." J. Mol. Evol. Dec. 41(6): 712-6.
- Knight, Robin D., Stephen J. Freeland and Laura F. Landweber. (January 2001). "Rewiring the Keyboard: Evolvability of the Genetic Code." Nature Reviews Genetics. Vol. 2. Retrieved March 20, 2007.
- Lehman, Niles. (January 2001). "Molecular evolution: Please release me, genetic code." Current Biology. Volume 11, Issue 2, pp. R63-R66.
- Osawa, Syozo. (1995). Evolution of the Genetic Code. Oxford: Oxford University Press. ISBN 0198547811.
- Ronneberg, T.A., S.J. Freeland, and L.F. Landweber. (2001). "Genview and Gencode: a pair of programs to test theories of genetic code evolution." Bioinformatics. Mar;17(3):280-1. Retrieved March 23, 2007.
- Scripps Research Institute. (2004). "Expanding the Genetic Code." Scientific Report 2004. The Skaggs Institute for Chemical Biology. Retrieved March 24, 2007.
- Scripps Research Institute in La Jolla. (2005). "DNA with three base pairs - A step towards expanding the genetic code." Reported on Innovations Report. A talk entitled "Efforts to Expand the Genetic Code" presented at the Biomimetic Polymers Symposium by the American Chemical Society, March 14. Retrieved March 24, 2007.
- Wang, Lei. (2003). "Expanding the Genetic Code." Science, Vol. 302, no. 5645, pp. 584-585.
- Wang, Lei. (2003a). "Expanding the Genetic Code of Escherichia coli." Essay - IUPAC Prize for Young Chemists. Retrieved March 20, 2007.
- Wang, Lei and Peter G. Schultz. (2005). "Expanding the Genetic Code." Angew. Chem. Int. Ed. 44, pp. 34–66. Retrieved March 24, 2007.
- Weberndorfer, Mag. Gunther. (2002). "Computation Models of the Genetic Code Evolution Based on Empirical Potentials." Dissertation. Retrieved March 21, 2007.
- Wilhelm, Thomas and Svetlana Nikolajewa. (2004). "A new classification scheme of the genetic code." Institute of Molecular Biotechnology. Retrieved March 21, 2007.
- Yarus, Michael, J. Gregory Caporaso, and Rob Knight. (2005). "Origins of the Genetic Code: The Escaped Triplet Theory." Annual Review of Biochemistry. Vol. 74: 179-198. Retrieved March 23, 2007.
See also
- Autocatalytic set
- Chemical evolution
- Double helix
- DNA
- Epigenetics
- Epigenetic code
- Gene-centric view of evolution
- Gene expression
- Genetic algorithm
- Gene regulatory network
- Genetic code
- Genetics
- Genomes
- Genomics
- International Code of Zoological Nomenclature
- List of notable genes
- Origin of Life
- Protein
- Protein biosynthesis
- Pseudogene
- pyrrolysine, the 22nd genetically encoded amino acid.
- Regulation of gene expression
- RNA
- RNA world hypothesis
- selenocysteine, the 21st genetically encoded amino acid.
- The Major Transitions in Evolution
- Translation
External links
- Freeland Lab. (2005). "Evolving Code: Non Standard Genetic Codes." Biological Sciences Department at UMBC. Retrieved March 20, 2007.
- Skybreak, Ardea. (2003). "Rare Variants of the Almost Entirely Universal Genetic Code are Evidence of Evolution, Not Design." Retrieved March 20, 2007.
- Online DNA → Amino Acid Converter
- DNA Sequence → Protein Sequence converter
- DNA to protein translation (6 frames/13 genetic codes)
- The Codon Usage Database → Codon frequency tables for many organisms
- Evolving Code - a themed wiki devoted to the topic of how the genetic code evolved, and its effects on the subsequent evolution of the genome.
Tutorial and news
- The Dolan DNA Learning Center
- DNA Interactive
- DNA From The Beginning
- Science aid: Genetics for beginners
- (2004) "Finishing the euchromatic sequence of the human genome". Nature 431 (7011): 931-45. PMID 15496913.