Foresight Update 6 - Table of Contents | Page1 | Page2 | Page3 | Page4 | Page5 |
With links added to 1997 WWW sources of information
Protein engineering is a new field; separate meetings, journals,
and books devoted to the topic have only appeared during the last
two to three years. One of the earliest uses of the term was in
an article contained in a special issue of Science
(February 11, 1983) that was devoted to the new field of
biotechnology: "Protein Engineering" by Kevin M. Ulmer
(Science 219:666-671). This article is an
overview that describes how several advances in different fields
have made it possible to attempt to modify many properties of
proteins by combining information on three-dimensional structure
and classical protein chemistry with new methods of genetic
engineering and molecular graphics. Ulmer concludes that this
developing technology will be used to further academic
understanding of how proteins work, and to produce altered
proteins for improved commercial products. He also envisions
"...paving the way for designing novel enzymes from first principles. Protein engineering thus represents the first major step toward a more general capability for molecular engineering which would allow us to structure matter atom by atom."
--Kevin M. Ulmer
The basic set of ideas works something like this. You use
computer graphics to display the three-dimensional structure of
the protein you are studying, experimentally determined, or,
failing that, the structure of a sufficiently closely related
protein that can serve as a model. You then combine calculation
and guesswork to decide what modifications in structure might
bring about a desired change in the properties of the protein.
You then produce this new protein in one of two ways. If it is a
very small protein, often called a peptide, you can synthesize it
chemically by using the Merrifield solid phase technique. This
has the additional advantage that you can use chemical building
blocks in addition to the ones used in biological systems. If the
protein is not very small, you can use the new techniques of
biotechnology to make a gene that will encode an altered protein.
One part of this technology employs solid-phase
oligodeoxynucleotide synthesis, a variant of the Merrifield
technique, to synthesize a small piece of DNA encoding the
altered portion of the protein. This small piece is then used to
mutate the natural gene into the gene for the desired protein using
the gene splicing methods of recombinant DNA. The altered gene is
introduced into a vector-host system of the type used in
biotechnology to abundantly produce proteins that are not
"well-expressed" (produced) in nature. This protein can
then be purified and studied to determine how well your
predictions worked. This cycle of "guess-experiment-guess
again" was not possible prior to the advent of these
techniques.
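The oligonucleotide design step described above can be sketched in a few lines of Python. Everything here is hypothetical and invented for illustration: the partial codon table, the toy gene, and the `mutant_oligo` helper; real mutagenesis design also weighs codon usage, restriction sites, and hybridization conditions.

```python
# Hypothetical sketch of one design step: choose a mutant codon encoding a
# desired amino-acid replacement, then cut out a short oligonucleotide
# spanning the change, suitable for carrying it into the gene.

PREFERRED_CODON = {          # one codon per amino acid, an arbitrary
    "A": "GCG", "V": "GTG",  # illustrative choice, not a real bias table
    "D": "GAT", "S": "AGC",
}

def mutant_oligo(gene, codon_index, new_aa, flank=6):
    """Return a short oligo: `flank` bases on either side of the replaced codon."""
    start = codon_index * 3
    new_codon = PREFERRED_CODON[new_aa]
    mutated = gene[:start] + new_codon + gene[start + 3:]
    return mutated[max(0, start - flank): start + 3 + flank]

gene = "ATGGCTGTTGATAGCTAA"           # hypothetical coding sequence
print(mutant_oligo(gene, 2, "A"))     # replace the third codon with alanine
```

The returned fragment matches the gene everywhere except the altered codon, which is exactly the property that lets it hybridize to the natural gene and direct the mutation.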
A major part of the intellectual effort of protein engineering is devoted to solving the "protein folding" problem. Enzymology, a branch of classical biochemistry, leads to the idea that a protein's function follows from its three-dimensional, or "tertiary," structure, and further, that its three-dimensional structure follows from the linear sequence of amino acid residues that comprises its "primary" structure. The linear sequence can be deduced from the DNA sequence of the gene that encodes the protein. The relationship between DNA and protein sequences is the genetic code, which was cracked during the late 1950's and early 1960's. Trying to understand how the primary structure determines the tertiary structure of the protein is, however, very much an unsolved problem at this time. It has often been referred to as "the second half of the genetic code."

It is unclear just how much of the folding problem will have to be solved to permit the design of novel proteins. Eric Drexler (in Engines of Creation and a 1981 PNAS article) has pointed out that natural proteins may embody obscure sequence-structure relationships for evolutionary reasons, and that it may thus be possible to develop a much simpler sequence-structure code for designed proteins. Whatever the size of the problem, solving some version of the protein folding problem will be a key step to using protein engineering to design first generation assemblers.
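The "first half" of the genetic code, the mapping from DNA codons to amino acids, is simple enough to demonstrate directly. The sketch below uses a deliberately partial codon table (the full table has 64 entries) to translate a toy coding sequence:

```python
# Translating a DNA coding sequence into protein with the standard genetic
# code. Only a subset of the 64 codons is listed here, for brevity.

CODON_TABLE = {
    "ATG": "M", "TGG": "W", "TTT": "F", "TTC": "F",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "TAA": "*", "TAG": "*", "TGA": "*",   # stop codons
}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i+3], "X")  # "X" marks codons missing from this partial table
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGGTGCATGGTAA"))  # → MGAW
```

The "second half of the genetic code," going from this amino-acid string to a folded three-dimensional structure, is precisely what has no such lookup table.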
Webmaster's Note: An
excellent introduction to protein structure is available
on the Internet. The following links are to the primary
site at Birkbeck College in England, and to the American
mirror at Brookhaven National Laboratory:
A number of more recent overviews of protein engineering are
available. The Preface by Dale L. Oxender and C. Fred Fox, and
the Introduction by Carl O. Pabo to the book Protein
Engineering (Oxender and Fox, ed., Alan R. Liss, Inc., New
York) are brief statements of the origins of the field. A
somewhat more technical introduction is afforded by the article
"Protein Engineering" by R. J. Leatherbarrow and A. R.
Fersht (1986) in the inaugural issue of Protein Engineering
1:7-16. The latter article considers various techniques
used to produce desired mutations in the genes encoding proteins,
discusses several proteins that are being intensively studied
using these techniques, and summarizes the results of some of
these studies. An overview specifically targeted to using
chemical synthesis for small proteins instead of genetic
engineering techniques is "Protein Engineering by Chemical
Means?" by R. E. Offord (1987), Protein Engineering
1:151-157.
To appreciate what is involved in protein engineering requires an
acquaintance with a number of fields. These include classical
biochemistry, especially of proteins; protein structure
determination, including new computer graphic methods to
represent protein structure and to calculate the effects of
different perturbations of the sequence upon the structure;
solid-phase methods for the chemical synthesis of proteins; the
new methods of genetic engineering, including both basic
molecular biology and the techniques of biotechnology. It is the
goal of this primer to provide a summary of some of this material
and a guide to further in-depth studies.
The following is a summary of the above article:
Hardware and software to model, analyze, and simulate novel
molecular structures is still in a very formative stage. The
modeling process begins with the coordinates of the 3-dimensional
structure (solved by X-ray crystallography) obtained (for
proteins) from the Brookhaven
National Laboratory's Protein Data Bank (PDB). Only about 300
protein structures have been determined, so in many cases,
modeling must be attempted using a known structure that has a
similar sequence to the unknown structure.
The molecular graphics program then converts the 3-dimensional
coordinates into a picture of the molecule, which can be
manipulated on the computer monitor to see specific bonds and
other features of the structure, just as a physical model could
be handled to observe various features from all angles.
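The coordinate files such a program starts from use the PDB's fixed-column ATOM record layout, which a few lines of Python can read. This is a minimal sketch; real parsers also handle HETATM records, alternate locations, occupancies, and multiple chains.

```python
# Minimal reader for atomic coordinates in PDB-format ATOM records,
# the raw input a molecular graphics program converts into a picture.
# Column positions follow the PDB ATOM record layout.

def read_atoms(lines):
    """Yield (atom_name, residue_name, x, y, z) for each ATOM record."""
    for line in lines:
        if line.startswith("ATOM"):
            yield (
                line[12:16].strip(),   # atom name, e.g. "CA" (alpha carbon)
                line[17:20].strip(),   # residue name, e.g. "GLY"
                float(line[30:38]),    # x coordinate (Angstroms)
                float(line[38:46]),    # y coordinate
                float(line[46:54]),    # z coordinate
            )

# A single made-up ATOM record, formatted to the fixed columns:
record = "ATOM      1  CA  GLY A   1      11.104  13.207   2.100  1.00 20.00           C"
print(list(read_atoms([record])))
```

From such coordinate lists a graphics program computes bonds, surfaces, and the rotations that let the structure be examined from all angles.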
Webmaster's Note: One
molecular graphics visualization tool available over the
Internet is RasMol, developed by Roger Sayle. RasMol is
available for UNIX, VMS, Macintosh and Microsoft Windows
(OS/2 and Windows NT). RasMol
is available by ftp. Excellent sources of information
about RasMol are:
The next step is the
use of molecular mechanics programs (based on classical Newtonian
mechanics) that calculate forces among the various atoms and
minimize the overall energy of the conformation to calculate the
preferred actual structure of the protein. Advanced programs use
molecular dynamics to refine the structural calculations. The
most effective programs need to use a supercomputer for these
calculations. Evaluating the nonbonded atomic interactions can
require 125 million floating-point operations for a single energy
calculation.
Ab initio calculations to solve the (quantum mechanical)
Schrödinger equation can only deal with 10-20 atoms per molecule
because of computational limitations.
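As a cartoon of what an energy minimizer does, the sketch below sums pairwise Lennard-Jones energies over all atom pairs (the O(N²) nonbonded cost alluded to above) and lowers the total by steepest descent on a numerical gradient. The potential, parameters, and step sizes are arbitrary illustrations, not any vendor's force field.

```python
# Illustrative energy minimization (not a production force field):
# nonbonded Lennard-Jones energy of a few atoms, reduced by steepest
# descent using a central-difference numerical gradient.
import math

EPS, SIGMA = 1.0, 1.0  # arbitrary Lennard-Jones parameters (reduced units)

def lj_energy(coords):
    """Sum pairwise Lennard-Jones energies over all atom pairs: O(N^2)."""
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            e += 4 * EPS * ((SIGMA / r) ** 12 - (SIGMA / r) ** 6)
    return e

def minimize(coords, steps=200, h=1e-4, lr=1e-2):
    """Steepest descent: nudge each coordinate downhill in energy."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        for p in coords:
            for k in range(3):
                p[k] += h
                e_plus = lj_energy(coords)
                p[k] -= 2 * h
                e_minus = lj_energy(coords)
                p[k] += h                      # restore the coordinate
                p[k] -= lr * (e_plus - e_minus) / (2 * h)
    return coords

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.6, 0.0)]
print(lj_energy(atoms), lj_energy(minimize(atoms)))  # energy decreases
```

Each sweep recomputes the full pairwise sum for every coordinate perturbation, which is why production codes use analytic gradients and still consume supercomputer time on proteins of thousands of atoms.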
The market for molecular modeling packages appears to be in flux,
with too many packages and too few users at the moment. Software
suppliers presently include Polygen (Waltham, MA), Biosym
Technologies (San Diego, CA), and Tripos Associates (St. Louis,
MO). The latter package will soon include techniques to model by
homology--comparing structural motifs that occur frequently in
nature. This "knowledge based" approach includes an
analysis of the vast protein sequence (not 3-D structure)
database to find a useful set of related proteins whose
structures can be compared to model the unknown structure.
Manufacturers of hardware include Silicon Graphics (Mountain View,
CA) and Evans & Sutherland (Salt Lake City, UT).
Webmaster's Note: For more
current information on molecular modeling, see:
Molecular modeling may "provide the forum for chemists, physicists, computer scientists, genetic engineers, and protein purifiers to come together."
In a sense, molecular dynamics is the most fundamental aspect of the study of proteins (or any other molecules) from the perspective of nanotechnology. This area deals with how each of the constitutive atoms of molecules, large or small, move and thus provides a time-evolving structural basis for considering the properties of the molecules. If we wish to make molecular machines, we have to understand how the parts move so that we can make the machines function appropriately. I (JBL, 7/17/88) have little knowledge of the subject, so I give here a few references of places to get started.
Webmaster's Note: One excellent introduction to molecular dynamics has been provided on the WWW by Biosym/MSI at:
http://lmb.niehs.nih.gov/LMB/docs/biosym/950/discover/General/Dynamics/Intro_Dyn.html
This information is part of the comprehensive online documentation of their Discover program, located at:
http://lmb.niehs.nih.gov/LMB/docs/biosym/950/discover/Disco_Home.html
A database of known motions in proteins is available at:
http://hyper.stanford.edu/~mbg/ftp/ProtMotDB/ProtMotDB.all.html
The following is a summary of the above article:
"The molecules essential to life are never at rest; they
would be unable to function if they were rigid. The internal
motions that underlie their workings are best explored in
computer simulations." This introduction begins by pointing
out the limitations of trying to understand in detail how
proteins function by knowing only the static structure of the
crystal, determined by X-ray crystallography (or occasionally in
solution by NMR), which represents only the time-averaged
structure of the protein.
Better understanding of how proteins function is provided by
theoretical studies, based on experimental structural
information, that lead to computer simulations of how the protein
actually moves. "The most direct approach to protein
dynamics is to treat each atom in the protein as a particle
responding to forces in the way prescribed by Newtonian physics,
in accord with Newton's equations of motion." Remember that
average-sized proteins contain 5000 or more atoms. Chemical bonds
can be treated like springs, and many weaker forces between
non-bonded atoms must be considered so that the force on each
atom depends upon the positions of every other atom in the
protein. The X-ray crystal structure gives the necessary
information to begin the simulation of atomic movements. However,
because the X-ray structure is an average, it gives an
unrealistic picture of the state of any particular molecule at
any particular instant, so a complex set of preliminary
calculations must first be performed to arrive at a realistic,
"equilibrated" protein.
This latter structure is used as the starting point for
molecular dynamics simulations of how the molecule will behave.
These calculations use steps of the order of a femtosecond. The
best simulations follow the protein for as long as a nanosecond
(a million such steps), requiring hundreds of hours of
supercomputer time. The combination of many small local motions
of individual amino acid residues and their constituent atoms can
produce more global displacements of different parts of the
protein. What sorts of movements are important over what time
scales is discussed in general terms. The particular example of
myoglobin, the oxygen-binding protein in muscle, is discussed.
The striking point is made that "If the atoms in myoglobin
were fixed in the positions found in the X-ray-crystallographic
structure, myoglobin would be useless: the time required for an
oxygen molecule to bind to the heme group or to get out again
when needed would be much longer than a whale's lifetime"
(or the lifetime of the universe, for that matter). Simulations
showed instead how fluctuations in the positions of specific
atoms allowed the oxygen to diffuse through the structure in a
reasonable amount of time.
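The Newtonian scheme Karplus describes (bonds as springs, each atom obeying Newton's equations of motion, timesteps on the order of a femtosecond) can be illustrated with a deliberately tiny system: two atoms joined by one harmonic bond, advanced with the velocity Verlet integrator widely used in molecular dynamics codes. All constants below are invented reduced units, not real molecular parameters.

```python
# Toy molecular dynamics: two atoms on a harmonic "spring" bond,
# integrated with velocity Verlet in many small timesteps.

K = 1.0      # spring constant (arbitrary reduced units)
R0 = 1.0     # equilibrium bond length
M = 1.0      # mass of each atom
DT = 0.001   # timestep, the analogue of the femtosecond steps in the text

def accel(x):
    """Accelerations on the two atoms from the Hooke's-law bond force."""
    f = K * ((x[1] - x[0]) - R0)   # positive when the bond is stretched
    return [f / M, -f / M]         # equal and opposite: atoms pull together

def step(x, v, dt=DT):
    """One velocity Verlet step for positions x and velocities v."""
    a = accel(x)
    x = [x[i] + v[i] * dt + 0.5 * a[i] * dt * dt for i in range(2)]
    a_new = accel(x)
    v = [v[i] + 0.5 * (a[i] + a_new[i]) * dt for i in range(2)]
    return x, v

x, v = [0.0, 1.2], [0.0, 0.0]   # start with the bond stretched by 0.2
for _ in range(10000):          # many small steps, as in real MD runs
    x, v = step(x, v)
print(round(x[1] - x[0], 3))    # bond length oscillates around R0
```

Because velocity Verlet is time-reversible and nearly energy-conserving, the bond length stays bounded between its initial extremes instead of drifting; a real simulation does the same bookkeeping for thousands of atoms and many coupled force terms, which is where the hundreds of supercomputer hours go.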
Also discussed is how critical parts of enzymatic reactions
usually occur over millisecond time scales, a million times as
long as can be handled with present computers. Specialized
approximations can sometimes be used and are discussed for a few
cases. These illustrate "the important role of small, high
frequency fluctuations in facilitating some larger and more
collective motions of proteins." Karplus predicts that
eventually these techniques will lead to the ability to calculate
the rates of enzymatic reactions and the binding of small
molecules to larger ones, thus providing better ways to modify
proteins for industrial purposes.
The following is a summary of the above article:
This review is a bit more technical and considers more the
interplay of calculation and experiment to provide more
meaningful results. For example, the role of NMR in studying
internal motions of proteins is discussed. Conversely, the
application of molecular dynamics methods to NMR data is quite
useful in deriving three-dimensional protein structures from the
data. This process is referred to as "restrained
dynamics." The take-home lesson is the same as for the above
review, with the list of expected future practical developments
expanded to include the design of inhibitors to cure diseases.
A real understanding of molecular dynamics, of course, cannot be
gained from brief review articles. A good textbook is probably Dynamics
of Proteins and Nucleic Acids by Andrew McCammon and
Stephen C. Harvey. Cambridge University Press, New York, 1987.
xii, 234 pp., illus. $39.50.
I say probably because I haven't seen it yet (let alone read it),
but I saw two very favorable reviews: one (titled "Good
Vibrations") by B. Robson in BioEssays, Volume
8, No. 2, p. 93 (February 1988)--admittedly a periodical
published by the same publisher as the book--and the other
(titled "Biomolecular Processes," in the more prosaic Science
fashion) by R. M. Levy in Science, 8 July, 1988, 241:234-235.
Both reviews agree that it is a very well-organized book and an
excellent place to begin to try to understand the field. Despite
starting from basics, the book is said to provide the background
needed to read the current literature of the field. The book is
about the time-dependent motions of these vital molecules. These
motions range from small-amplitude atomic vibrations that occur
in 0.1 picoseconds to large-scale allosteric transitions that
occur in milliseconds to several seconds. The theoretical and
computational methods are clearly described, with most emphasis
on the nanosecond scale since computational limitations make
detailed calculations on longer scales impractical, but these
slower processes are discussed in general terms. I take it from
what the reviewers say that molecular dynamics approaches can now
attempt to predict the three-dimensional structure of small
peptides, but since a large protein takes on the order of a
second to fold, and current simulations are limited to about a
nanosecond scale, we have a factor of a billion to go in
predicting three-dimensional structure for large proteins.
Abstract: "Prediction of the tertiary structures of
proteins may be carried out using a knowledge-based approach.
This depends on identification of analogies in secondary
structures, motifs, domains or ligand interactions between a
protein to be modeled and those of known three-dimensional
structures. Such techniques are of value in prediction of
receptor structures to aid in the design of drugs, herbicides or
pesticides, antigens in vaccine design, and novel molecules in
protein engineering."
The following is a summary of the above article:
After discussing the expected utility of structural knowledge in
applications from drug design to biological microchips, and
noting the fact that sequence information has increased much more
rapidly than 3-D structural information, this paper then
discusses the various steps involved in prediction of 3-D
structure from sequence:
The first step is to compare the sequence of the protein whose 3-D structure is to be predicted with the known sequences of other proteins available in the sequence database. Several algorithms are available to do this. If the new sequence is >25% similar to a sequence in the database, the match is easily distinguished above the background of randomized sequences. It is stated that an alignment score >6 standard deviations above random alignment will give reliable prediction of the secondary structures of most residues.
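This database-comparison step can be caricatured with a simple percent-identity score over ungapped, equal-length sequences. The sequences below are invented for illustration; real search programs align with gaps and use similarity matrices, not bare identity.

```python
# Toy database scan: percent identity between a query protein sequence
# and each sequence in a small, invented "database".

def percent_identity(a, b):
    """Percentage of aligned positions with identical residues (no gaps)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / max(len(a), len(b))

database = {                 # hypothetical sequences for illustration
    "protA": "MKVLATGFWQ",
    "protB": "MKVLSTGFWQ",
    "protC": "GGGHHHPPPA",
}
query = "MKVLATGYWQ"

hits = {name: percent_identity(query, seq) for name, seq in database.items()}
best = max(hits, key=hits.get)
print(best, hits[best])      # the closest relative in this toy database
```

In this cartoon, a hit at 90% identity would be far above the >25% threshold the article cites; in practice the score must be compared against the distribution obtained from randomized sequences to judge significance.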
In cases where the 3-D structures of homologous proteins are known, structure is conserved in evolution more than is primary protein sequence. Often changes are concentrated in the surface loops of the protein. This observation provides the rationale for using the known structure of the homologous protein to predict the unknown structure.
The aligned sequences are used to predict where one should create insertions, deletions, and replacements in the known structure. This is done using computer graphics; a widely used program is called FRODO. Initial models are then refined by energy minimization programs on the computer to avoid steric clashes. References are given to research that has used this approach.
Since only about 100 out of the 300 3-D structures in the Brookhaven databank are nonhomologous, there is often more than one structure available to use as a basis for modeling. Several approaches for simultaneously using different model structures to predict the unknown structure are discussed.
Loops are the most difficult regions to construct because the majority of significant differences occur in these regions. Databases and examples for loop construction are discussed in some detail. Particular attention is given to beta-hairpin loops (loops between two adjacent antiparallel beta strands). Ab initio calculations using molecular dynamics are recommended when no structure sufficiently similar for use in modeling can be found.
"Where the proteins have sequence homology of 50% or more, the models predicted by the methods described here will be probably correct to better than 1 Angstrom although individual side chains may be more in error." Some improvement in accuracy can be had by using such energy minimization programs as AMBER or CHARMM. Since energy minimization as it is now done finds only a local minimum, it is only expected to be useful if the errors in the starting structure are less than an Angstrom [Note: This seems a quite stringent requirement to meet-JBL].
Several cases where modeled proteins have been subsequently studied by X-ray are discussed, with the results shown to have been mixed. It is suggested that the distributions of the hydrophobic side-chains and the nature of the solvent-accessible surfaces are the most sensitive indicators of the reasonableness of the model. [It should be noted that this whole modeling procedure, although useful in some situations, is still very inexact and requires a great deal of experience and knowledge to interpret.-JBL]
Two challenges are discussed: (1) To extend the method to cases where there is no obvious sequence homology, but there is reason to suspect that the structure is a member of a known family of structural motifs, and (2) To design novel molecules.
Dr. Thornton notes that the delicate balance between properly folded and alternate structures of a protein has been impossible to predict so far from energy minimization, so that people have tried to use empirical predictions based on the 300 or so protein structures that have been experimentally determined. These have been of limited value. Even simple predictions of secondary structure only, rather than complete tertiary structures, are only about 60% accurate. By considering in more detail characteristics of a particular type of secondary structure (beta-beta hairpin turns), certain sequence features associated with specific varieties of this structure were identified that improved prediction a bit (to over 70%). This is progress, but the empirical approach to protein sequence-structure relationships has a long way to go before we can use it to help design first generation assemblers. A general review of this process of knowledge-based prediction of protein structure; i.e. modeling the structure of an unknown protein based on the known structure of a protein of similar sequence, was published last year: "Knowledge-based prediction of protein structures and the design of novel molecules" by T. L. Blundell, B. L. Sibanda, M. J. E. Sternberg, J. M. Thornton. 1987. Nature 326:347-352. [NOTE: This paper is abstracted above.]
The following is a summary of the above article:
She observes that attempts to predict structure from
sequence have been shifting from calculations using energy
functions to the "more pragmatic" structure recognition
by pattern matching, as exemplified by a paper by Rooman and
Wodak in the same issue (see below). "The good news is that
short sequence patterns which reliably define secondary structure
do exist. The bad news is that the prediction accuracy ... is
still only about 60 per cent." Apparently the main problem
is the relative scarcity of structural data. The 60% accuracy is
especially discouraging since the original attempts by Chou and
Fasman and by Garnier, both in 1978, achieved this level of
accuracy using only the 20 protein structures that were known
then (vs. >300 today), and used only the helix- or
sheet-forming properties of individual residues rather than those
of short sequences. Results are quoted from several years ago
that only 20% of identical pentapeptides in unrelated proteins of
known structure adopt the same secondary structure. The paper
below does an automated and systematic search of the structure
database, and identifies some peptides that are very predictive,
although most peptide sequences are not very predictive.
The reason that most patterns are not predictive is apparently
that most occur only a few times in the database so that patterns
can not be adequately recognized. It is suggested that sequence
patterns should occur about 15 times for accurate prediction,
while most 3-residue sequences occur < 3 times in the current
database. Rooman and Wodak speculate that a database of 1500
structures will be needed for adequate prediction of secondary
structure, which, optimistically, could take 20 years to produce.
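The pattern-statistics idea that Rooman and Wodak automate can be caricatured by tallying, over a toy "database" of invented sequences paired with secondary-structure strings (H = helix, C = coil), which structure label most often accompanies each tripeptide. With so few observations per pattern, the counts themselves show why a far larger structural database is needed.

```python
# Tally tripeptide -> secondary-structure co-occurrence in a toy database.
# Sequences and structure labels below are invented for illustration.
from collections import defaultdict

def tally(entries):
    """Count, for each tripeptide, the structure label of its central residue."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq, ss in entries:
        for i in range(len(seq) - 2):
            counts[seq[i:i+3]][ss[i+1]] += 1
    return counts

toy_db = [                         # (sequence, per-residue structure string)
    ("AALAALEE", "HHHHHHCC"),
    ("GGAALPP",  "CCHHHCC"),
    ("AALKKV",   "HHHCCC"),
]

for pattern, labels in tally(toy_db).items():
    best = max(labels, key=labels.get)
    total = sum(labels.values())
    if pattern == "AAL":           # the only pattern seen more than twice here
        print(pattern, best, labels[best], "of", total)
```

Even the best-sampled pattern here rests on four observations, far short of the roughly fifteen occurrences the article suggests are needed before a pattern's prediction can be trusted.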
Thornton suggests that prediction might be improved by (1)
incorporating recent sequence interpretation techniques designed
to recognize very distantly related proteins so that the known
structure of one could be used to model the other, and (2) using
what is known in some cases about elements of super-secondary
structure--motifs of clustered secondary structure elements
associated with particular classes of proteins. She also mentions
a recent article in which neural networks were trained to
recognize secondary structure from sequences, and got predictions
that were 64% accurate.
Webmaster's Note: For more
current information on Dr. Thornton's work, see: http://www.biochem.ucl.ac.uk/bsm/biocomp/index.html
Jim Lewis is a molecular biologist at Oncogen in Seattle. He is also the leader of the PATH HyperCard Project, a project of the Seattle Nanotechnology Study Group, which is working on a HyperCard stack on nanotechnology. The full text of Dr. Lewis's summary from which this adaptation was made is available from the Foresight Institute; send a stamped, self-addressed envelope with 65 cents postage.
From Foresight Update 6, originally
published 1 August 1989.
Foresight thanks Dave Kilbridge for converting Update 6 to
html for this web page.