Shrake 1973
Department of Chemistry
The University of Arizona
Tucson, Arizona 85721, U.S.A.
1. Introduction
Atomic co-ordinates derived from high resolution crystallographic analyses are
available for more than 30 proteins (Dickerson, 1972). Some method of describing
these structures in a way that allows simple and objective comparisons among them
seems necessary. Particular importance is attached to descriptions of the molecular
surface and the environments of reactive groups because these features should most
closely relate to chemical properties. Lee & Richards (1971) have made an effective
approach to this problem through developing a computer program for calculating
the exposure of protein atoms to solvent. The present report extends this method by
focusing attention on the nature of contacts between atoms. Results are given for
native lysozyme and insulin and for changes in their surfaces that occur during
folding and several association reactions, including crystallization.
2. Methods
TABLE 1
The numbers of residues of each type included in the averaging (a total of 127 non-terminal
residues in lysozyme and 94 in insulin dimer) are given in parentheses following the residue
type. The atom designations are those used in Imoto et al. (1972). BB and SC stand for the
sums over all backbone and over all side chain atoms of the residue, respectively. The first
number column gives the average area (Å²) exposed to solvent in the peptide model; the
second gives the root-mean-square deviation. Mean values for terminal residues (Gly-X or
X-Gly models) are given at the end of the list.
EXPOSURE CALCULATIONS 363
Like Lee & Richards (1971), we describe the protein by a set of solvated van der Waals'
spheres. The surface of a sphere is represented by a set of 92 test points that are nearly
uniformly distributed. Each atom of the protein is considered separately as a central atom
that is checked for overlap with all other atoms of the molecule, the test atoms. The latter
are divided into two categories. Near test atoms are those of a Gly-X-Gly tripeptide model
in which residue X contains the central atom. The tripeptide model for a half-cystine
residue of a disulfide includes the SG, CB and CA atoms of the partner half-cystine as
near atoms. Long test atoms are all other test atoms.
The exposure of a particular central atom to solvent is the area of the solvated sphere
that contains test points not occluded by test atoms. Each test point on the surface of a
central atom is considered separately with respect to all the test atoms. The test atom
given credit for occluding a test point is determined by the greatest value for the ratio of
the solvated radius of the test atom to the distance from the test point to the center of
the test atom. The list of interacting test atoms and the corresponding areas occluded on
a particular central atom describes the environment of the central atom. This list, which
is the basic output of the computation, is stored on magnetic tape for subsequent use in
summations and comparisons.
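The assignment rule described above can be sketched as follows. This is a minimal illustration and not the original program: it assumes that a test point counts as occluded when it lies inside the solvated sphere of a test atom (ratio of solvated radius to distance greater than 1), and the function name, argument layout and labels are hypothetical. Any list of unit vectors may be supplied as `unit_points`.

```python
import math


def exposure_and_contacts(center, radius, test_atoms, unit_points):
    """Return (exposed_area, {label: occluded_area}) for one central atom.

    center      -- (x, y, z) of the central atom
    radius      -- solvated radius of the central atom
    test_atoms  -- list of (label, (x, y, z), solvated_radius)
    unit_points -- unit vectors defining the test points on the sphere
    """
    area_per_point = 4.0 * math.pi * radius ** 2 / len(unit_points)
    exposed = 0.0
    occluded = {}
    for ux, uy, uz in unit_points:
        point = (center[0] + radius * ux,
                 center[1] + radius * uy,
                 center[2] + radius * uz)
        best_label, best_ratio = None, 1.0
        for label, position, test_radius in test_atoms:
            distance = math.dist(point, position)
            # Assumed criterion: the point is occluded when radius / distance > 1.
            ratio = math.inf if distance == 0.0 else test_radius / distance
            if ratio > best_ratio:
                # Credit goes to the test atom with the greatest ratio.
                best_label, best_ratio = label, ratio
        if best_label is None:
            exposed += area_per_point
        else:
            occluded[best_label] = occluded.get(best_label, 0.0) + area_per_point
    return exposed, occluded
```

Because every occluded point is credited to exactly one test atom, the exposed area and the occluded areas always sum to the full area of the solvated sphere.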
The Gly-X-Gly tripeptide serves as a model for the environment of a central atom in
the unfolded protein. This model assumes that side chains of adjacent residues in an
unfolded chain on the average do not contact the central residue. The conformation used
for the tripeptide is that for the corresponding atoms of the native protein. The area
exposed to solvent for a particular type of atom (or residue) therefore varies for this model
according to its location in the folded molecule. Table 1 gives averages for areas exposed to
solvent in the unfolded state and the corresponding root-mean-square deviations (a) for
each type of atom and (b) for the sums over all backbone atoms and over all side chain
atoms for each type of residue. The small values of the root-mean-square deviations show
that use of the native conformation for the tripeptide model introduces no systematic
error and is likely a better representation of the random coil than that obtained using a
single conformation for the tripeptide.
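The averaging behind Table 1 can be illustrated with a short sketch. It assumes that the root-mean-square deviation is taken about the mean with division by the number of observations; the text does not say whether n or n - 1 was used. The function name and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def mean_and_rms_by_type(records):
    """records: iterable of (atom_type, exposed_area).

    Returns {atom_type: (mean_area, rms_deviation_about_the_mean)}.
    """
    groups = defaultdict(list)
    for atom_type, area in records:
        groups[atom_type].append(area)
    summary = {}
    for atom_type, areas in groups.items():
        mean = sum(areas) / len(areas)
        # Assumption: population form (divide by n, not n - 1).
        rms = math.sqrt(sum((a - mean) ** 2 for a in areas) / len(areas))
        summary[atom_type] = (mean, rms)
    return summary
```

A small deviation for a given atom type indicates that its exposure in the tripeptide model depends little on where the residue sits in the native structure.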
In calculations for unfolded proteins, the test atoms are near atoms only. In computations
for folded molecules, the test atoms include both near and long atoms. Because near
and long test atoms are considered on equal terms in determining which test atom is given
credit for occluding a test point, the area assigned to near atoms in calculations for a
folded protein is in general less than that occluded by the same near atoms in the unfolded
model.
TABLE 2
Van der Waals' radii

Atom type                                  Radius (Å)   Area of solvated sphere (Å²)
All nitrogen: -N-, -NH, -NH2, -NH3+           1.5                  106
Non-aromatic carbon: -CH-, -CH2, -CH3         2.0                  145
Aromatic carbon: =CH, =C-                     1.85                 133
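The tabulated areas of 106, 145 and 133 Å² are reproduced by 4π(r + r_s)² for van der Waals' radii r of 1.5, 2.0 and 1.85 Å if the solvent radius r_s is taken as 1.4 Å. The value of r_s is an assumption inferred from that agreement, not a figure quoted in this section, and the names below are illustrative.

```python
import math

SOLVENT_RADIUS = 1.4  # Å; assumed, chosen because it reproduces the tabulated areas


def solvated_sphere_area(vdw_radius, solvent_radius=SOLVENT_RADIUS):
    """Area (Å²) of the sphere of radius vdw_radius + solvent_radius."""
    return 4.0 * math.pi * (vdw_radius + solvent_radius) ** 2


for name, radius in [("nitrogen", 1.5),
                     ("non-aromatic carbon", 2.0),
                     ("aromatic carbon", 1.85)]:
    print(f"{name}: {solvated_sphere_area(radius):.0f}")
```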
TABLE 3
Contacts of the two zinc atoms within the insulin hexamer

A. Zn2+ (0.0, 0.0, +8.1)†: Total area exposed to solvent in hexamer = 0.6 Å²
                 Extents of contact‡
                 Dimer I   Dimer II   Dimer III
B. Zn2+ (0.0, 0.0, -8.0)†: Total area exposed to solvent in hexamer = 8.2 Å²
                 Extents of contact‡
                 Dimer I   Dimer II   Dimer III
                  (Å²)       (Å²)       (Å²)
B10 His CE1        3.8        2.6        3.8
B10 His NE2       10.6       10.6       11.3
B10 His CD2        2.6        2.6        1.9
The input for a computation consists of the Cartesian co-ordinates of all atoms heavier
than hydrogen and for each such atom the residue number and a designator of the atom
type†. The co-ordinates for the tetragonal lysozyme crystal structure were obtained from
D. C. Phillips and colleagues (Blake et al., 1967; Blake, 1967) and those for the rhombohedral
2 zinc insulin crystal structure‡ from D. C. Hodgkin and colleagues (Blundell et
al., 1971). Both co-ordinate sets are those that were current in 1970 and both had been
refined by the method of Diamond (1966). We emphasize that the co-ordinates used in
[Figure residue: plot of area exposed (Å²) for lysozyme; graphic not recoverable.]
TABLE 4
Exposure and environment results for lysozyme
[Table body (per-atom and per-residue areas) is not legible in this copy.]
A. SHRAKE AND J. A. RUPLEY
The values given for each atom are, from left to right (columns 2 to 4): the area (Å²) exposed to
solvent, the area occluded by polar long atoms and the area occluded by non-polar long atoms,
respectively. The corresponding sums of these areas over all atoms of each residue are given on the
first line of each block.
this paper are preliminary and are being refined in the crystallographic laboratories. The
effect of uncertainty in the co-ordinates is discussed in the Discussion (section (f)).
All computations were carried out on the CDC6400 computer of the University of
Arizona Computer Center except for preliminary work, which was done on the Argus
system of the Laboratory of Molecular Biophysics of Oxford University†. Run times on
the CDC6400 were approx. 5.6 and 4.2 min for lysozyme and insulin dimer, respectively.
3. Results
(b) Exposure and change in exposure of backbone and side chain elements
Figure 1 shows the area exposed to solvent for the backbone and side chain of each
residue of native lysozyme and insulin dimer. The values plotted are summations
over the appropriate atoms of the results given in Tables 4 and 5. Graphs of this kind
are a convenient way to present the changes in exposed surface area that follow from
association reactions. Figure 2 shows the changes developed through binding of the
† During tenure of a Special Fellowship from the National Institutes of Health held by J. A.
Rupley.
[Figure residue: panels (a) and (b); abscissae are lysozyme residue number and insulin residue number; graphic not recoverable.]
TABLE 5
[Table body (exposure and environment values for the insulin dimer) is not legible in this copy.]
The insulin monomer consists of two polypeptide chains, A and B. The polypeptide chains of
the second monomer unit of the dimer are distinguished by asterisks. See Table 4 for additional
description.
TABLE 6
Contact information for lysozyme
The first line of each block gives the residue number and name of the central residue. The following
lines list in descending order of significance all residues containing long atoms that occlude
surface of the central residue. The values are residue number, residue name, and surface area (Å²)
occluded on the central residue. Because only long atoms are considered, the area occluded by a
residue adjacent to a central residue represents contacts of only side chain atoms of the adjacent
residue.
[Table body is not legible in this copy.]
FIG. 4. Ooi plot for the insulin dimer of contacts between residues. Letter symbols give the
extent to which an abscissa residue occludes area of an ordinate residue. Each step up the alphabet
stands for 15 Å² of occluded surface. The right-hand column gives the surface area (Å²) exposed
to solvent times 1/30. Only long atom contacts are included in the sums; thus, the diagonal is
blank. The following gives the correspondence between the standard residue designations of Table 5
and those used in this Figure: A1 to A21 = 101 to 121; B1 to B30 = 201 to 230; A*1 to A*21 =
301 to 321; B*1 to B*30 = 401 to 430. Regions of the plot corresponding to each chain are delineated.
Contacts within monomer units are given in the upper left and lower right quadrants.
Contacts between the monomers are shown in the upper right and lower left quadrants. Lines along
the diagonal indicate α-helical sections. The two enclosed regions off the diagonal represent the
β-structure at the monomer-monomer interface.
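The correspondence between the residue designations of Table 5 and the numbers used in the Figure amounts to adding a fixed offset per chain. A minimal sketch, with a hypothetical function name and label format, is:

```python
import re

# Offsets taken from the legend: A = 100, B = 200, A* = 300, B* = 400.
CHAIN_OFFSETS = {"A": 100, "B": 200, "A*": 300, "B*": 400}


def figure_number(label):
    """Convert a residue label such as 'B*30' to the number used in the Figure."""
    match = re.fullmatch(r"([AB]\*?)(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    chain, number = match.group(1), int(match.group(2))
    return CHAIN_OFFSETS[chain] + number
```

For example, `figure_number("A*21")` gives 321 under this convention.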
NAG C’ 36 3 35 69 1 48 44 82 0 58 54 62 120 92
43 2 35 98 2 74 37 97 3 61 34 88 86 122
NAG A 20 10 49 143 38 50
18 9 27 185 46 58
NAG B 112 81 18 50 37 11 3
93 70 19 69 57 20 3
NAG C 5 21 68 56 68 68 62 90 95
8 22 46 96 89 7% 56 134 58
NAG D 31 110 93 8 76 6 53 9 50 57
32 11% 120 7 85 3 63 17 39 63
NAG F 6 64 34 16 24 3 74
4 94 26 7 14 0 99
Lysozyme
FIG. 6. Contacts of the N-acetylglucosamine hexasaccharide and the α-anomer of N-acetylglucosamine with residues of lysozyme. The α-anomer binds “anomalously”
with some contacts like those of the unit of the hexasaccharide bound at site C (Blake et al., 1967). Contacts for hexasaccharide bound at sites A to F are given
separately. The upper number of each pair gives the area (Å²) of the protein residue occluded by the saccharide unit; the lower number gives the converse. NAG,
N-acetylglucosamine.
TABLE 7
                                                                                                            Polar    Charged  Non-polar
                                            All      Polar    Charged  Non-polar   Backbone Side chain side chain side chain side chain
A. Lysozyme
Unfolded                                 21,723       6176       2466     13,082       5840     15,884       3777       5141       6966
Native                                     6583       1811       1261       3511       1599       4984       1564       2302       1118
Hexasaccharide complex                     5919       1586       1128       3205       1462       4457       1395       2162        900
Crystal lattice                            4786       1261        944       2581       1157       3629       1064       1659        906
B. Lysozyme differences
Unfolded - native                        15,140       4364       1205       9571       4241     10,900       2213       2839       5848
Native - hexasaccharide complex             664        225        133        306        137        527        169        140        218
Native - lattice                           1797        550        317        930        442       1355        500        643        212
C. Insulin
Unfolded dimer                           17,348       4178       1954     11,215       4507     12,841       2565       4212       6065
Monomers                                   7334       1642       1278       4459       1557       5777       1420       2508       1849
Dimer                                      6023       1346       1169       3510       1245       4778       1249       2053       1477
Dimer in hexamer                           4585       1130        859       2595       1017       3568       1119       1648        801
Dimer in lattice                           3057        766        506       1784        739       2317        791        978        549
D. Insulin differences
Unfolded dimer - native dimer            11,325       2833        785       7705       3262       8063       1316       2159       4588
Monomers - dimer                           1311        297        109        949        312        999        171        455        372
Dimer - dimer in hexamer                   1438        215        310        915        228       1210        130        405        676
Dimer in hexamer - dimer in lattice        1528        364        353        811        278       1251        328        670        252
(d) Summary tabulations
Table 7 gives results for lysozyme and insulin summed over classes of atoms (all
atoms; polar, charged, non-polar; backbone, side chain) and over types of side chains
(polar, charged, non-polar). The side chain categories are specified as follows: charged,
those containing groups that bear charge at any pH in the range 0 to 12; polar, those
containing polar but no charged atoms; and non-polar, those containing only
non-polar atoms, and tryptophan, methionine and cystine. The values presented in
Table 7 are the areas exposed to solvent and, in sections B and D, changes in exposed
area (areas are in Å²).
4. Discussion
(a) Comparison with results of Lee & Richards (1971)
The van der Waals' radii used in these computations (Table 2) differ significantly
from those of Lee & Richards (1971), in particular for side chain atoms, for which Lee
& Richards use the uniform value of 1.8 Å. The values of the static accessibility†
calculated for lysozyme from column 2 of Table 4 and the surface areas of Table 2
are very close to the values for lysozyme listed by Lee & Richards. If areas rather
than ratios of areas are considered, differences due to changes in radii become appar-
ent. Nevertheless, general conclusions drawn from the computations remain essentially
unaltered by the changes in radii. For example, Lee & Richards made the striking
point that a large fraction of the total surface of globular proteins is comprised of
non-polar atoms in the folded as well as in the unfolded state. The data of Table 7
(compare columns 1 and 4) confirm this conclusion; non-polar atoms constitute
0.53 and 0.60 of the lysozyme surface for the folded and unfolded molecules, respec-
tively. Because in the present calculation the van der Waals’ radii assigned to non-
polar atoms are larger than those assigned to polar atoms, the above fractions are
each about 0.1 greater than those determined by Lee & Richards. The salient point is
that in spite of the crude model used in the computations, exposure values have semi-
quantitative reliability and trends within self-consistent sets of results appear to be
meaningful.
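As an arithmetic check on the fractions quoted above, the sketch below recomputes them from the Table 7 entries for lysozyme (columns 1 and 4). The dictionary layout and the function name are illustrative only.

```python
# Areas (Å²) exposed to solvent for lysozyme, from Table 7:
# column 1 (all atoms) and column 4 (non-polar atoms).
lysozyme = {
    "native": {"all": 6583, "non_polar": 3511},
    "unfolded": {"all": 21723, "non_polar": 13082},
}


def non_polar_fraction(state):
    """Fraction of the exposed surface contributed by non-polar atoms."""
    row = lysozyme[state]
    return row["non_polar"] / row["all"]


for state in ("native", "unfolded"):
    print(f"{state}: {non_polar_fraction(state):.2f}")
```

The two ratios round to 0.53 for the native molecule and 0.60 for the unfolded model.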
In explanation of the considerable non-polar surface in the folded state, exami-
nation of Table 7 (compare columns 4 and 9) shows that a relatively high proportion
(approx. two-thirds) of the non-polar surface of the folded molecules is associated with
non-polar atoms that are part of polar or charged side chains, e.g. the methylene
carbons of lysine. The extent to which the surface exposed in the unfolded state
becomes buried on folding is two to three times greater for non-polar residues than
for polar residues (charged and uncharged). This observation is consistent with the
“oil-drop” model of protein folding.
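One way to read the comparison of columns 4 and 9 is to subtract the exposed area of non-polar side chains from the exposed area of all non-polar atoms and to express the remainder as a fraction of the latter. This reading is an assumption; the remainder also includes non-polar backbone atoms. The names below are illustrative.

```python
# Exposed areas (Å²) in the folded state, from Table 7:
# column 4 (non-polar atoms) and column 9 (non-polar side chains).
folded = {
    "lysozyme": {"non_polar_atoms": 3511, "non_polar_side_chains": 1118},
    "insulin dimer": {"non_polar_atoms": 3510, "non_polar_side_chains": 1477},
}


def fraction_outside_non_polar_side_chains(protein):
    """Share of exposed non-polar surface not belonging to non-polar side chains."""
    row = folded[protein]
    remainder = row["non_polar_atoms"] - row["non_polar_side_chains"]
    return remainder / row["non_polar_atoms"]
```

Under this reading the fraction is about 0.68 for native lysozyme and about 0.58 for the insulin dimer.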
Cavities within the lysozyme structure located by the graphics display of Lee &
Richards (1971) do not exist according to the present calculations. This reflects the
† Defined by Lee & Richards (1971) as 100 x area of solvated sphere exposed to solvent/total
area of solvated sphere.
(f) Structural assumptions
It is assumed that computations based on the crystallographically defined co-
ordinates are relevant to solution properties. Two aspects of this assumption should
be discussed. First, uncertainty in the crystallographic results is generally estimated
at 0.5 Å for protein molecules studied at 2 to 3 Å resolution. In order to investigate
this difficulty, a random error was introduced into the Cartesian co-ordinates of
each atom of lysozyme using a Gaussian probability distribution that gave an arithmetic
average movement for each atom of 0.39 Å. Computations with this perturbed
set of co-ordinates show no significant changes in contacts between residues, i.e. very
few changes greater than 5 to 10 Å² in area of contact. Exposure of individual atoms
to solvent is also only slightly affected by this co-ordinate perturbation; atoms that
are completely buried according to the unperturbed calculations remain so and atoms
exposed to solvent undergo changes in exposure of approximately 5 Å². The exposure
summed over classes of atoms (as in Table 7) changes by less than 5%. Thus, the conclusions
drawn from exposure and contact computations of the kind described here
are not sensitive to considerable error in the co-ordinates.
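A perturbation of this kind can be sketched as follows, assuming independent Gaussian errors of equal standard deviation on each Cartesian axis; the text does not specify how the distribution was applied, so this is one possible construction. For such errors the mean length of the displacement vector is σ√(8/π), which fixes σ once the desired average movement (0.39 Å) is chosen. The function names and the seed are illustrative.

```python
import math
import random


def perturb(coords, mean_shift=0.39, seed=0):
    """Add an isotropic Gaussian error to each (x, y, z).

    mean_shift is the intended average displacement in Å.
    """
    # Mean displacement of a 3-D isotropic Gaussian is sigma * sqrt(8 / pi).
    sigma = mean_shift / math.sqrt(8.0 / math.pi)
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def mean_displacement(before, after):
    """Arithmetic average distance moved per atom."""
    return sum(math.dist(a, b) for a, b in zip(before, after)) / len(before)
```

Recomputing exposures and contacts on the perturbed co-ordinates and comparing them with the unperturbed results then gives a direct measure of sensitivity to co-ordinate error.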
Second, the conformation of a protein molecule in the crystal may differ from that
in solution. This problem has been considered by many workers (see review by Rupley,
1969). The time-average conformation of a protein as reflected in equilibrium
properties appears to be unaffected by crystallization. Surface side chains involved in
lattice contacts can be expected to undergo perturbation if they are relatively
unrestricted in solution.
X-ray studies of complexes of lysozyme with the β(1→4)-linked monomer, dimer
and trimer of N-acetylglucosamine have provided co-ordinate information for the
saccharide moieties binding at sites A through C. The co-ordinates for the moieties
binding at the remaining sites (for N-acetylglucosamine hexamer) are derived from
model building. A few of the side chains of the protein residues that are involved in
the binding of hexasaccharide (see Figs 2(a) and 6) also participate in lattice contacts
(see Fig. 2(b)).
The co-ordinates of the free insulin monomer and dimer are assumed to be the same
as those in the hexamer. In the hexamer the first few residues of the B and B’ chains
are buried in the adjacent dimers. In the free dimer these residues are possibly
folded back onto the surface. Thus, the actual areas exposed to solvent in the free
dimer are presumably less than the computed values and the calculated changes in
exposed area brought about by the association of dimers to form hexamer are only
approximate.
(g) Concluding remarks
Exposure values for atoms can be summed in different ways, e.g. to describe
exposure of chromophores or other side chain elements of a protein. Values of this
kind based on the present calculations have been used (see review by Imoto et al.,
1972) for examining free energies of association reactions and for understanding
perturbations of ionizable groups and perturbations of chromophores. Exposure and
contact information define environment more precisely than terms such as “partially
buried”.
Lee & Richards (1971) have discussed the limitations in applying exposure computations
that are based on the relatively crude model of hard-sphere atoms and on
the equilibrium structure determined by X-ray diffraction. In particular, conclusions
related to rate processes must be made with caution. Nevertheless, the use of exposure
calculations is justified by the need to summarize structural information objectively
and semi-quantitatively and by the advantages of concise tabulations and graphical
display.
We are grateful to Professor D. C. Phillips and his colleagues for the hospitality they
extended and the encouragement they offered in the early stages of this work. We are
also indebted to Professor D. C. Hodgkin and her colleagues for giving us the co-ordinates
of insulin and for discussions on this structure. This work was supported by the American
Cancer Society, the National Institutes of Health and the University of Arizona Computer
Center. One of us (A. S.) thanks the National Institutes of Health for support in the form
of a postdoctoral fellowship from 1971 to 1973.
REFERENCES
Blake, C. C. F. (1967). Proc. Roy. Soc., London (ser. B), 167, 435-438.
Blake, C. C. F., Johnson, L. N., Mair, G. A., North, A. C. T., Phillips, D. C. & Sarma, V. R.
(1967). Proc. Roy. Soc., London (ser. B), 167, 378-388.
Blundell, T. L., Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Hodgkin,
D. C., Mercola, D. A. & Vijayan, M. (1971). Nature (London), 231, 506-511.
Bondi, A. (1964). J. Phys. Chem. 68, 441-451.
Diamond, R. (1966). Acta Crystallog. 21, 253-266.
Dickerson, R. E. (1972). Annu. Rev. Biochem. 41, 815-842.
Imoto, T., Johnson, L. N., North, A. C. T., Phillips, D. C. & Rupley, J. A. (1972). The
Enzymes, 7, 665-868.
Lee, B. & Richards, F. M. (1971). J. Mol. Biol. 55, 379-400.
Nishikawa, K., Ooi, T., Isogai, Y. & Nobuhiko, S. (1972). J. Phys. Soc. (Japan), 32, 1331-
1337.
Pauling, L. C. (1960). The Nature of the Chemical Bond, 3rd edn. Cornell University Press,
Ithaca, New York.
Rupley, J. A. (1969). In Structure and Stability of Biological Molecules (Timasheff, S. N.
& Fasman, G. D., eds), pp. 291-352, Marcel Dekker, New York.
Venkatachalam, C. M. (1968). Biopolymers, 6, 1425-1436.