Department of Chemistry
The University of Arizona
Tucson, Arizona 85721, U.S.A.
1. Introduction
Atomic co-ordinates derived from high resolution crystallographic analyses are
available for more than 30 proteins (Dickerson, 1972). Some method of describing
these structures in a way that allows simple and objective comparisons among them
seems necessary. Particular importance is attached to descriptions of the molecular
surface and the environments of reactive groups because these features should most
closely relate to chemical properties. Lee & Richards (1971) have made an effective
approach to this problem through developing a computer program for calculating
the exposure of protein atoms to solvent. The present report extends this method by
focusing attention on the nature of contacts between atoms. Results are given for
native lysozyme and insulin and for changes in their surfaces that occur during
folding and several association reactions, including crystallization.
2. Methods
Like Lee & Richards (1971), we describe the protein by a set of solvated van der Waals'
spheres. The surface of a sphere is represented by a set of 92 test points that are nearly
uniformly distributed. Each atom of the protein is considered separately as a central atom
that is checked for overlap with all other atoms of the molecule, the test atoms. The latter
are divided into two categories. Near test atoms are those of a Gly-X-Gly tripeptide model
in which residue X contains the central atom. The tripeptide model for a half-cystine
residue of a disulfide includes the SG, CB and CA atoms of the partner half-cystine as
near atoms. Long test atoms are all other test atoms.

The numbers of residues of each type included in the averaging (a total of 127 non-terminal residues in lysozyme and 94 in insulin dimer) are given in
parentheses following the residue type. The atom designations are those used in Imoto et al. (1972). BB and SC stand for the sums over all backbone and
over all side chain atoms of the residue, respectively. The first number column gives the average area (Å²) exposed to solvent in the peptide model; the
second gives the root-mean-square deviation. Mean values for terminal residues (Gly-X or X-Gly models) are given at the end of the list.
The exposure of a particular central atom to solvent is the area of the solvated sphere
that contains test points not occluded by test atoms. Each test point on the surface of a
central atom is considered separately with respect to all the test atoms. The test atom
given credit for occluding a test point is determined by the greatest value for the ratio of
the solvated radius of the test atom to the distance from the test point to the center of
the test atom. The list of interacting test atoms and the corresponding areas occluded on
a particular central atom describes the environment of the central atom. This list, which
is the basic output of the computation, is stored on magnetic tape for subsequent use in
summations and comparisons.
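
The test-point procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original program: the golden-spiral placement of the 92 points, the 1.4 Å solvent (probe) radius added to each van der Waals' radius, and the names sphere_points and environment are all hypothetical choices that do not come from the text.

```python
import math

def sphere_points(n=92):
    """Nearly uniform points on a unit sphere (golden-spiral layout, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts

def environment(central, test_atoms, probe=1.4, n=92):
    """Return (exposed_area, {test_atom_index: occluded_area}) for one central atom.

    Atoms are (x, y, z, vdw_radius) tuples; probe is an assumed solvent radius.
    """
    cx, cy, cz, cr = central
    big_r = cr + probe                       # solvated radius of the central atom
    area_per_point = 4.0 * math.pi * big_r * big_r / n
    exposed = 0.0
    occluded = {}
    for ux, uy, uz in sphere_points(n):
        p = (cx + big_r * ux, cy + big_r * uy, cz + big_r * uz)
        # A point is occluded when it lies inside a test atom's solvated sphere,
        # i.e. when (solvated radius / distance) exceeds 1; the test atom with
        # the greatest ratio is given credit for the point.
        best_idx, best_ratio = None, 1.0
        for idx, (x, y, z, r) in enumerate(test_atoms):
            d = math.dist(p, (x, y, z))
            ratio = (r + probe) / d if d > 0 else math.inf
            if ratio > best_ratio:
                best_idx, best_ratio = idx, ratio
        if best_idx is None:
            exposed += area_per_point
        else:
            occluded[best_idx] = occluded.get(best_idx, 0.0) + area_per_point
    return exposed, occluded
```

With no test atoms, every point of a central atom is exposed and the function returns the full area of the solvated sphere. The dictionary of occluded areas plays the role of the list of interacting test atoms that the text calls the environment of the central atom.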
The Gly-X-Gly tripeptide serves as a model for the environment of a central atom in
the unfolded protein. This model assumes that side chains of adjacent residues in an
unfolded chain on the average do not contact the central residue. The conformation used
for the tripeptide is that for the corresponding atoms of the native protein. The area
exposed to solvent for a particular type of atom (or residue) therefore varies for this model
according to its location in the folded molecule. Table 1 gives averages for areas exposed to
solvent in the unfolded state and the corresponding root-mean-square deviations (a) for
each type of atom and (b) for the sums over all backbone atoms and over all side chain
atoms for each type of residue. The small values of the root-mean-square deviations show
that use of the native conformation for the tripeptide model introduces no systematic
error and is likely a better representation of the random coil than that obtained using a
single conformation for the tripeptide.
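
The averaging behind Table 1 can be illustrated as follows. This is a minimal sketch: the function name mean_and_rms is hypothetical, and it assumes the root-mean-square deviation is taken about the mean with divisor N, which the text does not specify.

```python
import math
from collections import defaultdict

def mean_and_rms(exposures):
    """Group (atom_type, area) pairs and return {atom_type: (mean, rms_deviation)}."""
    groups = defaultdict(list)
    for atom_type, area in exposures:
        groups[atom_type].append(area)
    stats = {}
    for atom_type, vals in groups.items():
        mean = sum(vals) / len(vals)
        # Root-mean-square deviation about the mean (divisor N is an assumption).
        rms = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[atom_type] = (mean, rms)
    return stats
```

Each input pair would be the tripeptide-model exposure of one atom in one residue of the native structure. A small deviation for a given atom type then indicates that the exposure in the model depends only weakly on the local conformation taken from the folded molecule.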
In calculations for unfolded proteins, the test atoms are near atoms only. In computations
for folded molecules, the test atoms include both near and long atoms. Because near
and long test atoms are considered on equal terms in determining which test atom is given
credit for occluding a test point, the area assigned to near atoms in calculations for a
folded protein is in general less than that occluded by the same near atoms in the unfolded
Van der Waals' radii

Atom type                                 Radius (Å)   Area of solvated sphere (Å²)
All nitrogen: -N-, -NH, -NH2, -NH3+       1.5          106
Non-aromatic carbon: -CH-, -CH2, -CH3     2.0          145
Aromatic carbon: =CH, =C-                 1.85         133
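
The tabulated areas can be related to the radii by a short check. It assumes that the solvated sphere has radius equal to the van der Waals' radius plus a solvent radius of 1.4 Å; that solvent radius is an assumption not stated in this section, and the function names are hypothetical.

```python
import math

PROBE = 1.4  # assumed solvent radius in Å (not given in the text above)

def solvated_area(vdw_radius, probe=PROBE):
    """Surface area (Å²) of a sphere of radius vdw_radius + probe."""
    return 4.0 * math.pi * (vdw_radius + probe) ** 2

def radius_from_area(area, probe=PROBE):
    """Van der Waals' radius (Å) implied by a tabulated solvated-sphere area."""
    return math.sqrt(area / (4.0 * math.pi)) - probe
```

Under this assumption a radius of 2.0 Å gives an area of about 145 Å², and the tabulated areas of 106, 133 and 145 Å² correspond to radii of about 1.50, 1.85 and 2.00 Å, respectively.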
of the two zinc atoms within the insulin hexamer

A. Zn2+ (0.0, 0.0, +8.1)†: Total area exposed to solvent in hexamer = 0.6 Å²
   Extents of contact‡
                   Dimer I   Dimer II   Dimer III

B. Zn2+ (0.0, 0.0, -8.0)†: Total area exposed to solvent in hexamer = 8.2 Å²
   Extents of contact‡
                   Dimer I   Dimer II   Dimer III
                   (Å²)      (Å²)       (Å²)
   B10 His CE1     3.8       2.6        3.8
   B10 His NE2     10.6      10.6       11.3
   B10 His CD2     2.6       2.6        1.9
The input for a computation consists of the Cartesian co-ordinates of all atoms heavier
than hydrogen and for each such atom the residue number and a designator of the atom
type†. The co-ordinates for the tetragonal lysozyme crystal structure were obtained from
D. C. Phillips and colleagues (Blake et al., 1967; Blake, 1967) and those for the rhombohedral
2 zinc insulin crystal structure‡ from D. C. Hodgkin and colleagues (Blundell et
al., 1971). Both co-ordinate sets are those that were current in 1970 and both had been
refined by the method of Diamond (1966). We emphasize that the co-ordinates used in
Contact information for lysozyme
The first line of each block gives the residue number and name of the central residue. The following
lines list in descending order of significance all residues containing long atoms that occlude
surface of the central residue. The values are residue number, residue name, and surface area (Å²)
occluded on the central residue. Because only long atoms are considered, the area occluded by a
residue adjacent to a central residue represents contacts of only side chain atoms of the adjacent
FIG. 4. Ooi plot for the insulin dimer of contacts between residues. Letter symbols give the
extent to which an abscissa residue occludes area of an ordinate residue. Each successive letter
of the alphabet stands for a further 15 Å² of occluded surface. The right-hand column gives the
surface area (Å²) exposed to solvent times 1/30. Only long atom contacts are included in the sums;
thus, the diagonal is blank. The following gives the correspondence between the standard residue
designations of Table 5 and those used in this Figure: A1 to A21 = 101 to 121; B1 to B30 = 201 to
230; A*1 to A*21 = 301 to 321; B*1 to B*30 = 401 to 430. Regions of the plot corresponding to each
chain are delineated. Contacts within monomer units are given in the upper left and lower right
quadrants. Contacts between the monomers are shown in the upper right and lower left quadrants.
Lines along the diagonal indicate α-helical sections. The two enclosed regions off the diagonal
represent the β-structure at the monomer-monomer interface.
NAG C’ 36 3 35 69 1 48 44 82 0 58 54 62 120 92
43 2 35 98 2 74 37 97 3 61 34 88 86 122
NAG A 20 10 49 143 38 50
18 9 27 185 46 58
NAG B 112 81 18 50 37 11 3
93 70 19 69 57 20 3
NAG C 5 21 68 56 68 68 62 90 95
8 22 46 96 89 7% 56 134 58
NAG D 31 110 93 8 76 6 53 9 50 57
32 11% 120 7 85 3 63 17 39 63
NAG F 6 64 34 16 24 3 74
4 94 26 7 14 0 99
FIG. 6. Contacts of the N-acetylglucosamine hexasaccharide and the α-anomer of N-acetylglucosamine with residues of lysozyme. The α-anomer binds “anomalously”
with some contacts like those of the unit of the hexasaccharide bound at site C (Blake et al., 1967). Contacts for hexasaccharide bound at sites A to F are given
separately. The upper number of each pair gives the area (Å²) of the protein residue occluded by the saccharide unit; the lower number gives the converse. NAG,
                                            All      Polar    Charged  Non-polar   Backbone Side chain      Polar    Charged  Non-polar
                                                                                                       side chain side chain side chain
A. Lysozyme
Unfolded                                 21,723       6176       2466     13,082       5840     15,884       3777       5141       6966
Native                                     6583       1811       1261       3511       1599       4984       1564       2302       1118
Hexasaccharide complex                     5919       1586       1128       3205       1462       4457       1395       2162        900
Crystal lattice                            4786       1261        944       2581       1157       3629       1064       1659        906
B. Lysozyme differences
Unfolded-native                          15,140       4364       1205       9571       4241     10,900       2213       2839       5848
Native-hexasaccharide complex               664        225        133        306        137        527        169        140        218
Native-lattice                             1797        550        317        930        442       1355        500        643        212
C. Insulin
Unfolded dimer                           17,348       4178       1954     11,215       4507     12,841       2565       4212       6065
Monomers                                   7334       1642       1278       4459       1557       5777       1420       2508       1849
Dimer                                      6023       1346       1169       3510       1245       4778       1249       2053       1477
Dimer in hexamer                           4585       1130        859       2595       1017       3568       1119       1648        801
Dimer in lattice                           3057        766        506       1784        739       2317        791        978        549
D. Insulin differences
Unfolded dimer-native dimer              11,325       2833        785       7705       3262       8063       1316       2159       4588
Monomers-dimer                             1311        297        109        949        312        999        171        455        372
Dimer-dimer in hexamer                     1438        215        310        915        228       1210        130        405        676
Dimer in hexamer-dimer in lattice          1528        364        353        811        278       1251        328        670        252
(d) Summary tabulations
Table 7 gives results for lysozyme and insulin summed over classes of atoms (all
atoms; polar, charged, non-polar; backbone, side chain) and over types of side chains
(polar, charged, non-polar). The side chain categories are specified as follows: charged,
those containing groups that bear charge at any pH in the range 0 to 12; polar, those
containing polar but no charged atoms; and non-polar, those containing only
non-polar atoms, and tryptophan, methionine and cystine. The values presented in
Table 7 are the areas exposed to solvent and in sections B and D changes in exposed
area (areas are in Å²).
4. Discussion
(a) Comparison with results of Lee & Richards (1971)
The van der Waals' radii used in these computations (Table 2) differ significantly
from those of Lee & Richards (1971) in particular for side chain atoms for which Lee
& Richards use the uniform value of 1.8 Å. The values of the static accessibility†
calculated for lysozyme from column 2 of Table 4 and the surface areas of Table 2
are very close to the values for lysozyme listed by Lee & Richards. If areas rather
than ratios of areas are considered, differences due to changes in radii become appar-
ent. Nevertheless, general conclusions drawn from the computations remain essentially
unaltered by the changes in radii. For example, Lee & Richards made the striking
point that a large fraction of the total surface of globular proteins is comprised of
non-polar atoms in the folded as well as in the unfolded state. The data of Table 7
(compare columns 1 and 4) confirm this conclusion; non-polar atoms constitute
0.53 and 0.60 of the lysozyme surface for the folded and unfolded molecules, respec-
tively. Because in the present calculation the van der Waals’ radii assigned to non-
polar atoms are larger than those assigned to polar atoms, the above fractions are
each about 0.1 greater than those determined by Lee & Richards. The salient point is
that in spite of the crude model used in the computations, exposure values have semi-
quantitative reliability and trends within self-consistent sets of results appear to be
In explanation of the considerable non-polar surface in the folded state, exami-
nation of Table 7 (compare columns 4 and 9) shows that a relatively high proportion
(approx. two-thirds) of the non-polar surface of the folded molecules is associated with
non-polar atoms that are part of polar or charged side chains, e.g. the methylene
carbons of lysine. The extent to which the surface exposed in the unfolded state
becomes buried on folding is two to three times greater for non-polar residues than
for polar residues (charged and uncharged). This observation is consistent with the
“oil-drop” model of protein folding.
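
The fractions quoted in this discussion can be reproduced from the lysozyme rows of Table 7. The sketch below simply repeats the arithmetic; the dictionary keys and variable names are hypothetical labels, and the comparison of columns 4 and 9 follows the reading given in the text.

```python
# Areas (Å²) for lysozyme copied from Table 7, sections A and B.
unfolded = {"all": 21723, "nonpolar_atoms": 13082,
            "polar_sc": 3777, "charged_sc": 5141, "nonpolar_sc": 6966}
native = {"all": 6583, "nonpolar_atoms": 3511,
          "polar_sc": 1564, "charged_sc": 2302, "nonpolar_sc": 1118}

# Fraction of the surface contributed by non-polar atoms (columns 1 and 4).
nonpolar_fraction_native = native["nonpolar_atoms"] / native["all"]
nonpolar_fraction_unfolded = unfolded["nonpolar_atoms"] / unfolded["all"]

# Share of the native non-polar surface not accounted for by non-polar
# side chains (columns 4 and 9).
share_outside_nonpolar_sc = (
    (native["nonpolar_atoms"] - native["nonpolar_sc"]) / native["nonpolar_atoms"]
)

# Area buried on folding for each type of side chain (unfolded minus native).
buried = {k: unfolded[k] - native[k]
          for k in ("polar_sc", "charged_sc", "nonpolar_sc")}
ratio_vs_polar = buried["nonpolar_sc"] / buried["polar_sc"]
ratio_vs_charged = buried["nonpolar_sc"] / buried["charged_sc"]
```

These values give non-polar fractions of about 0.53 (native) and 0.60 (unfolded), a share of about 0.68 (roughly two-thirds), and buried-area ratios of about 2.6 and 2.1 for non-polar relative to polar and charged side chains.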
Cavities within the lysozyme structure located by the graphics display of Lee &
Richards (1971) do not exist according to the present calculations. This reflects the
† Defined by Lee & Richards (1971) as 100 × area of solvated sphere exposed to solvent/total
area of solvated sphere.
(f) Structural assumptions
It is assumed that computations based on the crystallographically defined co-
ordinates are relevant to solution properties. Two aspects of this assumption should
be discussed. First, uncertainty in the crystallographic results is generally estimated
at 0.5 Å for protein molecules studied at 2 to 3 Å resolution. In order to investigate
this difficulty, a random error was introduced into the Cartesian co-ordinates of
each atom of lysozyme using a Gaussian probability distribution that gave an arith-
metic average movement for each atom of 0.39 Å. Computations with this perturbed
set of co-ordinates show no significant changes in contacts between residues, i.e. very
few changes greater than 5 to 10 Å² in area of contact. Exposure of individual atoms
to solvent is also only slightly affected by this co-ordinate perturbation; atoms that
are completely buried according to the unperturbed calculations remain so and atoms
exposed to solvent undergo changes in exposure of approximately 5 Å². The exposure
summed over classes of atoms (as in Table 7) changes by less than 5%. Thus, the con-
clusions drawn from exposure and contact computations of the kind described here
are not sensitive to considerable error in the co-ordinates.
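
A perturbation of this kind might be generated as follows. This is a sketch only: it assumes independent Gaussian errors of equal width on x, y and z, in which case the mean displacement length equals sigma × sqrt(8/π). The function name perturb and the seed argument are hypothetical.

```python
import math
import random

def perturb(coords, mean_shift=0.39, seed=0):
    """Add isotropic Gaussian noise so the expected displacement is mean_shift (Å)."""
    # For three independent Gaussian components of width sigma, the mean
    # length of the displacement vector is sigma * sqrt(8 / pi).
    sigma = mean_shift / math.sqrt(8.0 / math.pi)
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma),
             y + rng.gauss(0.0, sigma),
             z + rng.gauss(0.0, sigma)) for x, y, z in coords]
```

Running the exposure and contact computation on both the original and the perturbed co-ordinates, and then comparing the two sets of areas, would repeat the sensitivity test described in this section.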
Second, the conformation of a protein molecule in the crystal may differ from that
in solution. This problem has been considered by many workers (see review by Rupley,
1969). The time-average conformation of a protein as reflected in equilibrium
properties appears to be unaffected by crystallization. Surface side chains involved in
lattice contacts can be expected to undergo perturbation if they are relatively
unrestricted in solution.
X-ray studies of complexes of lysozyme with the β(1→4)-linked monomer, dimer
and trimer of N-acetylglucosamine have provided co-ordinate information for the
saccharide moieties binding at sites A through C. The co-ordinates for the moieties
binding at the remaining sites (for N-acetylglucosamine hexamer) are derived from
model building. A few of the side chains of the protein residues that are involved in
the binding of hexasaccharide (see Figs 2(a) and 6) also participate in lattice contacts
(see Fig. 2(b)).
The co-ordinates of the free insulin monomer and dimer are assumed to be the same
as those in the hexamer. In the hexamer the first few residues of the B and B’ chains
are buried in the adjacent dimers. In the free dimer these residues are possibly
folded back onto the surface. Thus, the actual areas exposed to solvent in the free
dimer are presumably less than the computed values and the calculated changes in
exposed area brought about by the association of dimers to form hexamer are only
(g) Concluding remarks
Exposure values for atoms can be summed in different ways, e.g. to describe
exposure of chromophores or other side chain elements of a protein. Values of this
kind based on the present calculations have been used (see review by Imoto et al.,
1972) for examining free energies of association reactions and for understanding
perturbations of ionizable groups and perturbations of chromophores. Exposure and
contact information define environment more precisely than terms such as “partially
buried”.
Lee & Richards (1971) have discussed the limitations in applying exposure computations
that are based on the relatively crude model of hard-sphere atoms and on
the equilibrium structure determined by X-ray diffraction. In particular, conclusions
related to rate processes must be made with caution. Nevertheless, the use of exposure
calculations is justified by the need to summarize structural information objectively
and semi-quantitatively and by the advantages of concise tabulations and graphical
We are grateful to Professor D. C. Phillips and his colleagues for the hospitality they
extended and the encouragement they offered in the early stages of this work. We are
also indebted to Professor D. C. Hodgkin and her colleagues for giving us the co-ordinates
of insulin and for discussions on this structure. This work was supported by the American
Cancer Society, the National Institutes of Health and the University of Arizona Computer
Center. One of us (A. S.) thanks the National Institutes of Health for support in the form
of a postdoctoral fellowship from 1971 to 1973.
Blake, C. C. F. (1967). Proc. Roy. Soc., London (ser. B), 167, 435-438.
Blake, C. C. F., Johnson, L. N., Mair, G. A., North, A. C. T., Phillips, D. C. & Sarma, V. R.
(1967). Proc. Roy. Soc., London (ser. B), 167, 378-388.
Blundell, T. L., Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Hodgkin,
D. C., Mercola, D. A. & Vijayan, M. (1971). Nature (London), 231, 506-511.
Bondi, A. (1964). J. Phys. Chem. 68, 441-451.
Diamond, R. (1966). Acta Crystallog. 21, 253-266.
Dickerson, R. E. (1972). Annu. Rev. Biochem. 41, 815-842.
Imoto, T., Johnson, L. N., North, A. C. T., Phillips, D. C. & Rupley, J. A. (1972). The
Enzymes, 7, 665-868.
Lee, B. & Richards, F. M. (1971). J. Mol. Biol. 55, 379-400.
Nishikawa, K., Ooi, T., Isogai, Y. & Nobuhiko, S. (1972). J. Phys. Soc. (Japan), 32, 1331-
Pauling, L. C. (1960). The Nature of the Chemical Bond, 3rd edn. Cornell University Press,
Ithaca, New York.
Rupley, J. A. (1969). In Structure and Stability of Biological Molecules (Timasheff, S. N.
& Fasman, G. D., eds), pp. 291-352, Marcel Dekker, New York.
Venkatachalam, C. M. (1968). Biopolymers, 6, 1425-1436.