Read Gödel, Escher, Bach: An Eternal Golden Braid Online
Authors: Douglas R. Hofstadter
Amino Acids
Proteins are composed of sequences of amino acids, which come in twenty primary varieties, each with a three-letter abbreviation:
ala  alanine
arg  arginine
asn  asparagine
asp  aspartic acid
cys  cysteine
gln  glutamine
glu  glutamic acid
gly  glycine
his  histidine
ile  isoleucine
leu  leucine
lys  lysine
met  methionine
phe  phenylalanine
pro  proline
ser  serine
thr  threonine
trp  tryptophan
tyr  tyrosine
val  valine
Notice the slight numerical discrepancy with Typogenetics, where we had only fifteen "amino acids" composing enzymes. An amino acid is a small molecule of roughly the same complexity as a nucleotide; hence the building blocks of proteins and of nucleic acids (DNA, RNA) are roughly of the same size. However, proteins are composed of much shorter sequences of components: typically, about three hundred amino acids make a complete protein, whereas a strand of DNA can consist of hundreds of thousands or millions of nucleotides.
Ribosomes and Tape Recorders
Now when a strand of mRNA, after its escape into the cytoplasm, encounters a ribosome, a very intricate and beautiful process called translation takes place. It could be said that this process of translation is at the very heart of all of life, and there are many mysteries connected with it. But in essence it is easy to describe. Let us first give a picturesque image, and then render it more precise. Imagine the mRNA to be like a long piece of magnetic recording tape, and the ribosome to be like a tape recorder. As the tape passes through the playing head of the recorder, it is "read" and converted into music, or other sounds. Thus magnetic markings are "translated" into notes. Similarly, when a "tape" of mRNA passes through the "playing head" of a ribosome, the "notes" which are produced are amino acids, and the "pieces of music" which they make up are proteins. This is what translation is all about; it is shown in Figure 96.
The Genetic Code
But how can a ribosome produce a chain of amino acids when it is reading a chain of nucleotides? This mystery was solved in the early 1960's by the efforts of a large number of people, and at the core of the answer lies the Genetic Code: a mapping from triplets of nucleotides into amino acids (see Fig. 94). This is in spirit extremely similar to the Typogenetic Code, except that here, three consecutive bases (or nucleotides) form a codon, whereas there, only two were needed. Thus there must be 4x4x4 (equals 64) different entries in the table, instead of sixteen. A ribosome clicks down a strand of RNA three nucleotides at a time (which is to say, one codon at a time), and each time it does so, it appends a single new amino acid to the protein it is presently manufacturing. Thus, a protein comes out of the ribosome amino acid by amino acid.
CUA GAU
CU AG AU
A typical segment of mRNA read first as two triplets (above), and second as three duplets (below): an example of hemiolia in biochemistry.
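The counting argument and the two readings of the mRNA segment can be checked with a few lines of Python. This is only an illustrative sketch: the names `BASES` and `chunks` are invented here and are not from the text, and the choice to drop any incomplete trailing group is an assumption.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases


def chunks(strand, size):
    """Split a strand into consecutive, non-overlapping groups of `size` bases.

    Any incomplete group left over at the end is dropped (an assumption
    made for this sketch).
    """
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


# Every possible codon of three bases, and every possible pair of two bases.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
duplets = ["".join(p) for p in product(BASES, repeat=2)]

print(len(triplets), len(duplets))  # 64 16
print(chunks("CUAGAU", 3))          # ['CUA', 'GAU']
print(chunks("CUAGAU", 2))          # ['CU', 'AG', 'AU']
```

The same six bases give two codons when read in threes and three pairs when read in twos, which is the regrouping the small figure illustrates.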
Tertiary Structure
However, as a protein emerges from a ribosome, it is not only getting longer and longer, but it is also continually folding itself up into an extraordinary three-dimensional shape, very much in the way that those funny little Fourth-of-July fireworks called "snakes" simultaneously grow longer and curl up, when they are lit. This fancy shape is called the protein's tertiary structure (Fig. 95), while the amino acid sequence per se is called the primary structure of the protein. The tertiary structure is implicit in the primary structure, just as in Typogenetics. However, the recipe for deriving the tertiary structure, if you know only the primary structure, is by far more complex than that given in Typogenetics. In fact, it is one of the outstanding problems of contemporary molecular biology to figure out some rules by which the tertiary structure of a protein can be predicted if only its primary structure is known.
The Genetic Code.
(rows: first base; columns: second base; right-hand column: third base)

         U       C       A       G

  U     phe     ser     tyr     cys      U
        phe     ser     tyr     cys      C
        leu     ser     punc.   punc.    A
        leu     ser     punc.   trp      G

  C     leu     pro     his     arg      U
        leu     pro     his     arg      C
        leu     pro     gln     arg      A
        leu     pro     gln     arg      G

  A     ile     thr     asn     ser      U
        ile     thr     asn     ser      C
        ile     thr     lys     arg      A
        met     thr     lys     arg      G

  G     val     ala     asp     gly      U
        val     ala     asp     gly      C
        val     ala     glu     gly      A
        val     ala     glu     gly      G
FIGURE 94. The Genetic Code, by which each triplet in a strand of messenger RNA codes for one of twenty amino acids (or a punctuation mark).
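As a rough illustration of how the table in Figure 94 acts as a mapping from triplets to amino acids, here is a minimal Python sketch. The names `FIGURE_94`, `GENETIC_CODE`, and `translate` are invented for this example. Treating a punctuation mark as "stop reading" is an assumption of the sketch, as is ignoring how a real ribosome decides where to start.

```python
BASES = "UCAG"

# Rows copied from Figure 94. For each first base there is one line per
# third base (U, C, A, G); within a line there is one entry per second
# base (U, C, A, G).
FIGURE_94 = {
    "U": ["phe ser tyr cys", "phe ser tyr cys",
          "leu ser punc. punc.", "leu ser punc. trp"],
    "C": ["leu pro his arg", "leu pro his arg",
          "leu pro gln arg", "leu pro gln arg"],
    "A": ["ile thr asn ser", "ile thr asn ser",
          "ile thr lys arg", "met thr lys arg"],
    "G": ["val ala asp gly", "val ala asp gly",
          "val ala glu gly", "val ala glu gly"],
}

# Flatten the figure into a codon -> entry lookup table with 64 keys.
GENETIC_CODE = {
    first + second + third: entry
    for first, rows in FIGURE_94.items()
    for third, row in zip(BASES, rows)
    for second, entry in zip(BASES, row.split())
}


def translate(mrna):
    """Read mRNA one codon at a time, appending one amino acid per codon.

    Assumption for this sketch: reading starts at the first base and
    halts at the first punctuation mark.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        entry = GENETIC_CODE[mrna[i:i + 3]]
        if entry == "punc.":
            break
        protein.append(entry)
    return protein


print(len(GENETIC_CODE))    # 64
print(translate("CUAGAU"))  # ['leu', 'asp']
```

Keeping the data in the same row layout as the figure makes it easy to compare the code against the table entry by entry.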
Reductionistic Explanation of Protein Function
Another discrepancy between Typogenetics and true genetics (and this is probably the most serious one of all) is this: whereas in Typogenetics, each component amino acid of an enzyme is responsible for some specific "piece of the action", in real enzymes, individual amino acids cannot be assigned such clear roles. It is the tertiary structure as a whole which determines the mode in which an enzyme will function; there is no way one can say, "This amino acid's presence means that such-and-such an operation will get performed". In other words, in real genetics, an individual amino acid's contribution to the enzyme's overall function is not "context-free". However, this fact should not be construed in any way as ammunition for an antireductionist argument to the effect that "the whole [enzyme] cannot be explained as the sum of its parts". That would be wholly unjustified. What is justified is rejection of the simpler claim that "each amino acid contributes to the sum in a manner which is independent of the other amino acids present". In other words, the function of a protein cannot be considered to be built up from context-free functions of its parts; rather, one must consider how the parts interact. It is still possible in principle to write a computer program which takes as input the primary structure of a protein, and firstly determines its tertiary structure, and secondly determines the function of the enzyme. This would be a completely reductionistic explanation of the workings of proteins, but the determination of the "sum" of the parts would require a highly complex algorithm. The elucidation of the function of an enzyme, given its primary, or even its tertiary, structure, is another great problem of contemporary molecular biology.
FIGURE 95. The structure of myoglobin, deduced from high-resolution X-ray data. The large-scale "twisted pipe" appearance is the tertiary structure; the finer helix inside (the "alpha helix") is the secondary structure. [From A. Lehninger, Biochemistry]
Perhaps, in the last analysis, the function of the whole enzyme can be considered to be built up from functions of parts in a context-free manner, but where the parts are now considered to be individual particles, such as electrons and protons, rather than "chunks", such as amino acids. This exemplifies the "Reductionist's Dilemma": In order to explain everything in terms of context-free sums, one has to go down to the level of physics; but then the number of particles is so huge as to make it only a theoretical "in-principle" kind of thing.
So, one has to settle for a context-dependent sum, which has two disadvantages. The first is that the parts are much larger units, whose behavior is describable only on a high level, and therefore indeterminately. The second is that the word "sum" carries the connotation that each part can be assigned a simple function and that the function of the whole is just a context-free sum of those individual functions. This just cannot be done when one tries to explain a whole enzyme's function, given its amino acids as parts. But for better or for worse, this is a general phenomenon which arises in the explanations of complex systems. In order to acquire an intuitive and manageable understanding of how parts interact (in short, in order to proceed), one often has to sacrifice the exactness yielded by a microscopic, context-free picture, simply because of its unmanageability. But one does not sacrifice at that time the faith that such an explanation exists in principle.
Transfer RNA and Ribosomes
Returning, then, to ribosomes and RNA and proteins, we have stated that a protein is manufactured by a ribosome according to the blueprint carried from the DNA's "royal chambers" by its messenger, RNA. This seems to imply that the ribosome can translate from the language of codons into the language of amino acids, which amounts to saying that the ribosome "knows" the Genetic Code. However, that amount of information is simply not present in a ribosome. So how does it do it? Where is the Genetic Code stored? The curious fact is that the Genetic Code is stored (where else?) in the DNA itself. This certainly calls for some explanation.
Let us back off from a total explanation for a moment, and give a partial explanation. There are, floating about in the cytoplasm at any given moment, large numbers of four-leaf-clover-shaped molecules; loosely fastened (i.e., hydrogen-bonded) to one leaf is an amino acid, and on the opposite leaf there is a triplet of nucleotides called an anticodon. For our purposes, the other two leaves are irrelevant. Here is how these "clovers" are used by the ribosomes in their production of proteins. When a new codon of mRNA clicks into position in the ribosome's "playing head", the ribosome reaches out into the cytoplasm and latches onto a clover whose anticodon is complementary to the mRNA codon. Then it pulls the clover into such a position that it can rip off the clover's amino acid, and stick it covalently onto the growing protein. (Incidentally, the bond between an amino acid and its neighbor in a protein is a very strong covalent bond, called a "peptide bond". For this reason, proteins are sometimes called "polypeptides".) Of course it is no accident that the "clovers" carry the proper amino acids, for they have all been manufactured according to precise instructions emanating from the "throne room".
FIGURE 96. A section of mRNA passing through a ribosome. Floating nearby are tRNA molecules, carrying amino acids which are stripped off by the ribosome and appended to the growing protein. The Genetic Code is contained in the tRNA molecules, collectively. Note how the base-pairing (A-U, C-G) is represented by interlocking letter-forms in the diagram. [Drawing by Scott E. Kim]
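The division of labor described above, in which the ribosome only matches complementary triplets while the "clovers" collectively carry the code, can be sketched in Python. Everything named here (`PAIR`, `complement`, `make_clovers`, `ribosome`, and the three-entry code excerpt) is hypothetical and chosen for illustration. The sketch also assumes a simplification: the anticodon is written aligned base-for-base with the codon, ignoring strand direction and any looseness in pairing.

```python
# Base-pairing as given in the caption of Figure 96: A-U and C-G.
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement(triplet):
    """Return the base-by-base complementary triplet."""
    return "".join(PAIR[base] for base in triplet)


def make_clovers(code):
    """Build one 'clover' per codon, stored as anticodon -> amino acid.

    This step stands in for the instructions from the "throne room":
    it is the only place where the code itself is consulted.
    """
    return {complement(codon): amino for codon, amino in code.items()}


# A small excerpt of Figure 94, enough for the example below.
CODE_EXCERPT = {"AUG": "met", "CUA": "leu", "GAU": "asp"}
CLOVERS = make_clovers(CODE_EXCERPT)


def ribosome(mrna, clovers):
    """Assemble a protein using only anticodon matching.

    The function contains no codon table; it finds the clover whose
    anticodon complements the current codon and takes its amino acid.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        anticodon = complement(mrna[i:i + 3])
        protein.append(clovers[anticodon])
    return protein


print(ribosome("AUGCUAGAU", CLOVERS))  # ['met', 'leu', 'asp']
```

Note that `ribosome` would produce a different protein from the same mRNA if handed a differently loaded set of clovers, which is one way to see that the code resides in the clovers rather than in the ribosome.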