
First-principles quantum chemistry in the life sciences

2004, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences

AI-generated Abstract

The paper discusses advancements in first-principles quantum chemistry and its applicability to biochemical systems, focusing on methods such as ab initio and density functional theory (DFT). With improvements in computational power, these techniques now enable accurate modeling of complex biomolecular systems beyond traditional methods, thus enhancing the reliability of results in life sciences research. The paper provides specific examples from neurotransmitters, peptides, and DNA fragments while discussing the theoretical framework and future potential of these methodologies.

Phil. Trans. R. Soc. Lond. A 2004 362, 2653-2670. doi: 10.1098/rsta.2004.1469. This journal is © 2004 The Royal Society. Downloaded from rsta.royalsocietypublishing.org on May 24, 2011.

By Tanja van Mourik
Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, UK (t.vanmourik@ucl.ac.uk)

Published online 24 September 2004

The area of computational quantum chemistry, which applies the principles of quantum mechanics to molecular and condensed systems, has developed drastically over the last decades, due to both increased computer power and the efficient implementation of quantum chemical methods in readily available computer programs. Because of this, accurate computational techniques can now be applied to much larger systems than before, bringing the area of biochemistry within the scope of electronic-structure quantum chemical methods. The rapid pace of progress of quantum chemistry makes it a very exciting research field; calculations that are too computationally expensive today may be feasible in a few months’ time! This article reviews the current application of ‘first-principles’ quantum chemistry in biochemical and life sciences research, and discusses its future potential. The current capability of first-principles quantum chemistry is illustrated in a brief examination of computational studies on neurotransmitters, helical peptides, and DNA complexes.
Keywords: quantum chemistry; ab initio calculations; density functional theory calculations; gas-phase biomolecules

One contribution of 17 to a Triennial Issue ‘Chemistry and life science’.

1. Introduction

As a result of the rapid development of computational quantum chemistry, high-accuracy techniques can be applied to much larger systems than previously envisaged, bringing the area of biochemistry within the reach of electronic structure quantum chemistry. With the ongoing advances in computer architecture and quantum chemical software development, the range of molecular systems that can be studied continues to increase progressively. This paper will focus on the application of ‘first-principles’ (ab initio and density functional theory) quantum chemical methods in biochemical and life sciences research.

The recent award of the 1998 Nobel Prize in Chemistry to Walter Kohn and John Pople reflects the enormous capability and value of first-principles quantum chemistry in today’s chemistry research. John Pople was honoured for the development of quantum chemical methods, whereas Walter Kohn’s contribution was the development of the relatively new density functional theory (DFT) methodology.

First-principles quantum mechanical computations are traditionally not well suited for large molecules, because of their huge computational demand. The results of cheaper methods (such as force-field and semi-empirical methods), however, vary significantly depending on the specific parametrization. As first-principles methods do not require empirical calibration, they are applicable to all molecular systems and properties one may be interested in, even in the absence of experimental data.
As a result of this, they have the potential of yielding much more accurate results than the methods traditionally used in the life sciences, thereby greatly enhancing the reliability of computational research in this area.

A comprehensive appraisal of first-principles studies in biochemistry and the life sciences is, if at all possible, beyond the scope of this paper. Researchers apply a plethora of different methods, implemented in a large number of quantum chemistry programs, to calculate properties of a wide variety of biomolecular systems. In this paper, I will therefore not attempt to be complete, but instead I will present a selection of interesting research done in this field, and hope that these case studies illustrate the current potential of first-principles methods. I will start with a very brief review of the theory of first-principles methods, after which calculations in three subareas (neurotransmitters, peptides, and DNA fragments) will be discussed. The reviewed research is focused on gas-phase studies at a temperature of 0 K. The paper concludes with an outlook on the future potential of first-principles research in the life sciences.

2. First-principles methods

Computational chemistry encompasses many methods, which can be divided into quantum chemical and classical (non-quantum chemical) methods. Classical methods consider only the nuclei of a molecular system, while ignoring the electronic motions. These methods calculate the energy of a molecular system using a force field, which is a parametric function of the positions of the nuclei only. Classical methods are very computationally efficient, and are generally applied to very large molecular systems. In contrast, quantum chemical (or electronic structure) methods do take the motions of the electrons into account: they deal with the computation of molecular electronic structures, and are therefore much more computationally demanding than their classical counterparts.
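To make the notion of a force field concrete, the following minimal Python sketch evaluates a toy classical energy as a parametric function of the nuclear positions only. The functional forms (a harmonic bond term and a 12-6 Lennard-Jones term), the function names and all parameter values are illustrative assumptions; they are not taken from any particular force field discussed in this paper.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def bond_energy(r, r0, k):
    """Harmonic bond-stretch term, 0.5 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term for one non-bonded atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def force_field_energy(coords, bonds, nonbonded):
    """Total energy as a function of nuclear positions and fixed parameters.

    coords    : list of (x, y, z) nuclear positions
    bonds     : list of (i, j, r0, k) harmonic bond parameters
    nonbonded : list of (i, j, epsilon, sigma) Lennard-Jones parameters
    """
    energy = 0.0
    for i, j, r0, k in bonds:
        energy += bond_energy(distance(coords[i], coords[j]), r0, k)
    for i, j, epsilon, sigma in nonbonded:
        energy += lennard_jones(distance(coords[i], coords[j]), epsilon, sigma)
    return energy


# Hypothetical three-atom system in arbitrary units (not real parameters).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
bonds = [(0, 1, 1.0, 300.0)]
nonbonded = [(0, 2, 0.1, 3.0)]
print(force_field_energy(coords, bonds, nonbonded))
```

No electronic degrees of freedom appear anywhere in this function, which is why such methods are cheap and also why their results depend entirely on the chosen parametrization.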
Molecular quantum chemistry attempts to solve the time-independent Schrödinger equation: HΨ = EΨ. Solving this equation yields the wavefunction of a molecular system, from which all its molecular properties can (in principle) be derived. The Schrödinger equation can be solved exactly only for one-electron systems such as the hydrogen atom, and therefore, all quantum chemical methods for molecules of practical interest are necessarily approximate. In this review, I will consider two groups of quantum chemical methods: the pure ‘ab initio’ methods and DFT.

Ab initio calculations use no other input than the Schrödinger equation, a few fundamental physical constants, and the atomic numbers of the atoms present in the molecular system of interest. The simplest ab initio method is the Hartree–Fock (HF) method. In the HF method each electron only sees an average field of the other electrons; local distortions of the electron distribution are ignored. As a result, HF ignores electron correlation, which limits the accuracy achievable with this method. There are many approaches that attempt to include this electron correlation, such as configuration interaction, coupled cluster and many-body perturbation-theory methods. The basic idea of perturbation theory is that the problem at hand is first solved for a simpler system, after which the solution is adjusted in the direction of the more complicated, true system. Møller–Plesset perturbation theory (Møller & Plesset 1934) considers the HF Hamiltonian to be the undistorted (simpler) system, and the electron correlation is computed as a perturbation. This leads to a series in which each higher-order method should (in principle) be more accurate than the previous one (however, the Møller–Plesset perturbation series does not always converge towards the ‘correct’ result, as first convincingly shown by Olsen et al. (1996)).
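The perturbation series described above can be written out in the standard textbook form. The notation below (the coupling parameter λ, the zeroth-order Hamiltonian H₀ taken as the sum of Fock operators, and the order labels in parentheses) is not used elsewhere in this paper and is introduced here only as an illustration.

```latex
\begin{align}
  \hat{H}(\lambda) &= \hat{H}_0 + \lambda \hat{V},
      \qquad \hat{V} = \hat{H} - \hat{H}_0, \\
  E(\lambda) &= E^{(0)} + \lambda E^{(1)} + \lambda^{2} E^{(2)}
      + \lambda^{3} E^{(3)} + \cdots, \\
  E_{\mathrm{HF}} &= E^{(0)} + E^{(1)}, \\
  E_{\mathrm{MP2}} &= E_{\mathrm{HF}} + E^{(2)} .
\end{align}
```

Setting λ = 1 recovers the true Hamiltonian. The first correction beyond the HF energy therefore appears at second order, and truncating the series at order n defines the MPn method.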
The second-order method in the Møller–Plesset series, MP2, is one of the most common electron-correlation methods, because of its relative inexpensiveness as compared with other correlated ab initio methods.

Another way to include electron correlation is via DFT. DFT is based on the notion that the total energy of a molecular system is a functional of the charge density (Hohenberg & Kohn 1964). The main idea of DFT is to describe a molecular system directly via its density, without first finding the wavefunction. However, the exact form of the universal energy density functional is unknown. The general strategy is therefore to approximate it by various model functionals. One of the most widely used functionals is B3LYP, devised by Becke (1993). The precise form of density functionals is commonly determined by fitting to atomic or molecular data. DFT is therefore strictly speaking not an ab initio method (though it is often labelled as one). In principle, ‘first principles’ is a synonym for ‘ab initio’. In this paper however, I use the term ‘first principles’ to include pure ab initio methods, as well as DFT. DFT is more computationally efficient than correlated ab initio methods and can therefore be applied to larger molecular systems. However, one of the major deficiencies of current density functionals is their inability to correctly account for the dispersion energy, the intermolecular energy contribution arising from the correlation between fluctuations in the electron distributions of neighbouring molecules (van Mourik & Gdanitz 2002). This makes DFT less suitable for dispersion-dominated interactions (such as stacking interactions and hydrogen bonds to π electron clouds).

Both ab initio and (Kohn–Sham) DFT methods generally use basis sets, which are collections of basis functions representing atomic orbitals (AOs). From a linear combination of these AOs, molecular orbitals (MOs) can be constructed.
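In symbols, the linear combination just described and the electron density built from the resulting orbitals take the following standard form. The symbols (χ for basis functions, c for expansion coefficients, n for orbital occupation numbers, K for the number of basis functions) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
  \phi_i(\mathbf{r}) &= \sum_{\mu=1}^{K} c_{\mu i}\, \chi_\mu(\mathbf{r}), \\
  \rho(\mathbf{r}) &= \sum_{i} n_i \,\lvert \phi_i(\mathbf{r}) \rvert^{2} .
\end{align}
```

The expansion is exact only in the limit of a complete basis; with a finite number K of basis functions it is an approximation whose quality depends on the basis set chosen.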
In HF-based methods, the MOs are a representation of the wavefunction, whereas in DFT the so-called Kohn–Sham orbitals are simply a way of representing the density. The expansion of the MOs (or Kohn–Sham orbitals) in a set of AOs is not an approximation if the basis is complete. Practically, however, finite basis sets are used, thereby introducing an additional approximation in the calculations. Particularly for ab initio methods it generally holds that, the larger the basis, the better the approximation, and the choice of basis set is therefore often determined by a trade-off between accuracy and computational cost.

3. First-principles calculations on neurotransmitters

(a) Neurotransmitters

Neurotransmitters are small chemical messengers that transmit information across the gap (‘synapse’) between nerve cells (‘neurons’). The neurotransmitter’s release from one side of the synapse to the other is triggered by an electrical impulse travelling along a nerve towards the synapse. The neurotransmitters cross the synapse and are received by a receptor, which is a protein that binds the neurotransmitter and undergoes some conformational change as a result. The result is either a continued electrical impulse, or an inhibition of it, on the other side of the synapse.

[Figure 1. The structures of a selection of neurotransmitters. Structures shown: GABA, acetylcholine, dopamine, NA, adrenaline, tryptamine and serotonin.]

The neurotransmitters are a diverse group of chemical compounds ranging from simple monoamines such as glycine, γ-aminobutyrate (GABA) and acetylcholine, compounds containing single aromatic rings (dopamine, noradrenaline and adrenaline) or double (fused) rings (tryptamine and serotonin), to polypeptides such as the enkephalins (see figure 1).
A common characteristic of these molecules is their flexibility, leading to a large number of possible conformations. A knowledge of the favoured shape of these molecules is of vital importance, as the shape influences their transport properties and plays an important role in the binding of these biomolecules to their receptor sites. In recent years, much effort has been devoted to illuminating the conformational preferences of neurotransmitters.

(b) Studies on neurotransmitters in the gas phase

A recent computational study provided a prediction for the structure and function of the β-adrenergic receptor, one of the receptors that bind the neurotransmitter adrenaline (Vaidehi et al. 2002). The novel aspect of this research was that the new set of methods used correctly predicted the structure and binding characteristics of the receptor, starting with nothing but a knowledge of its sequence of amino acids. However, such studies are still out of reach for first-principles quantum chemistry methods. The neurotransmitter-in-receptor study was feasible because it used newly developed computer programs for the prediction of protein structure and ligand-binding conformations, in which the interaction between atoms is calculated using force fields. Force-field methods apply a parametric function of the nuclear coordinates to calculate the atom–atom interactions and are much cheaper computationally than first-principles quantum chemistry methods.

Even though first-principles studies of neurotransmitters in genuine physiological environments are not yet feasible, the area is not wholly out of reach of first-principles quantum chemistry. The strategy of physical chemists has traditionally been to reduce and simplify the molecular problem at hand.
True to this approach, researchers have started to study neurotransmitters ‘in the gas phase’. Even though environmental effects play a very important role in modelling biological processes, an understanding of the intrinsic energetics of flexible biomolecules is essential. By studying the biomolecules and their hydrated clusters in the gas phase, the environmental effects are effectively eliminated, allowing the molecule–solvent interaction to be studied in a controlled environment. Comparison with studies in solution can then provide information about the relative importance of the intrinsic and environmental effects. The study of biomolecules in the gas phase can also be considered a first step in a progressive investigation from isolated (gas-phase) molecules, via small hydrated clusters, to completely solvated molecules, to molecules in truly physiological systems, such as the neurotransmitter in its binding site.

There are several reasons for the recent upsurge in gas-phase studies of molecules of biological interest. Firstly, the structures of flexible molecules depend on subtle intramolecular interactions. A study on the conformational landscape of serotonin showed that semi-empirical methods, which are computationally much less demanding than first-principles methods, are incapable of accurately predicting the structures and relative stabilities of the different serotonin conformers (van Mourik & Emson 2002), indicating the need for higher-level quantum chemical methods to study flexible biomolecules. Due to ongoing software and hardware developments, it is nowadays feasible to perform accurate first-principles calculations on a large number of possible conformations of molecules of the size of those in figure 1. A second reason for the renewed interest in gas-phase biomolecules lies in developments in electronic, vibrational and microwave spectroscopy techniques, using free-jet gas-phase expansions (Levy 1980; Smalley et al.
1974), which allow flexible molecules and their hydrated clusters to be studied in the gas phase at very low rotational and vibrational temperatures (Robertson & Simons 2001; Zwier 2001). The experimental and theoretical studies demonstrate a unique synergistic relationship, thereby encouraging collaborative research: assignment of the experimental spectra is crucially dependent on comparison with spectra computed using quantum chemical techniques, whereas the experimental results are indispensable for validation of the theoretical results. As a result, a large number of combined experimental/theoretical papers on biomolecules in the gas phase, including the neurotransmitters tryptamine and serotonin (Carney & Zwier 2000, 2001; van Mourik & Emson 2002), the ephedra (Butz et al. 2001b, 2002), and the adrenergic neurotransmitters and their analogues (Alagona & Ghio 2002; Butz et al. 2001a; Graham et al. 1999; Nagy et al. 2003; Snoek et al. 2003a, b), have appeared in the literature over recent years.

[Figure 2. The two hydrogen-bonding configurations of the catechol hydroxyl groups.]

(c) The adrenergic neurotransmitters

As figure 1 shows, the neurotransmitters noradrenaline (NA) and adrenaline (A) have a very similar structure; the only difference is that a methyl group in adrenaline replaces one of the amino hydrogens in NA. Noradrenaline has one chiral centre (the side chain carbon atom bearing four different groups), and thus, every conformer has a corresponding non-superimposable mirror image. The two mirror images (the chiral forms R and S) are spectroscopically indistinguishable, and therefore only one form needs to be considered.
The substitution of one of the amino hydrogens by a methyl group to form adrenaline introduces a second chiral centre (the nitrogen), resulting in the existence of two ‘diastereoisomers’, structures that are not mirror images of each other: one of these (1R2S/1S2R) is adrenaline, and the other (1S2S/1R2R) is pseudoadrenaline (PA). Thus, for A/PA, twice as many structures will need to be considered as for NA.

The two molecules not only have a similar structure, they also have a similar function in the body. Both are released as a response to mental or physical stress, and prepare the body for strenuous activity: the so-called ‘fight-or-flight’ syndrome, causing the familiar ‘adrenaline rush’. They also act as hormones and are secreted in response to a low blood-glucose level. They increase the amount of glucose released into the blood by the liver and decrease the use of glucose by muscle.

Most of the flexibility of NA and adrenaline is located in the ethanolamine side chain. The orientation of the side chain is determined by four torsion angles. In addition, the catecholic OH groups have two different orientations supporting an intramolecular hydrogen bond (see figure 2). If one only considers side-chain conformations in which atomic groups on adjacent atoms are in a staggered position with respect to each other, and not those that are eclipsed, then one needs to take into account only three different orientations of each of the four side-chain torsion angles. With these restrictions, there are 3⁴ (four flexible bonds, each having three different values for the torsion angle) × 2 (two different orientations of the catechol groups) = 162 possible conformations. Considering that a B3LYP/6-31+G* geometry optimization of one structure may take 1–2 days of computer time on a reasonably fast PC (1.7 GHz Pentium 4), it is clear that the full characterization of these molecules takes a fair amount of computer time.
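The count of 162 starting structures can be reproduced by explicit enumeration. The sketch below is illustrative only: the torsion labels ("g+", "a", "g-") and the catechol labels are hypothetical names introduced here, and the timing estimate simply multiplies the 1–2 days per optimization quoted above by the number of structures, assuming the optimizations are run one after another on a single machine.

```python
from itertools import product

# Three staggered orientations per side-chain torsion (labels are hypothetical).
TORSION_STATES = ("g+", "a", "g-")
# Two hydrogen-bonded orientations of the catechol OH groups (labels are hypothetical).
CATECHOL_STATES = ("OH-orientation-1", "OH-orientation-2")


def enumerate_conformers(n_torsions=4):
    """Return every (torsion tuple, catechol orientation) starting structure."""
    return [
        (torsions, catechol)
        for torsions in product(TORSION_STATES, repeat=n_torsions)
        for catechol in CATECHOL_STATES
    ]


conformers = enumerate_conformers()
n = len(conformers)
print(n)             # 162 = 3**4 * 2
print(n * 1, n * 2)  # 162 324: serial CPU days at 1 and 2 days per optimization
```

For A/PA the list doubles again, since both diastereoisomers must be considered.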
A theoretical search for the most stable NA, A, and PA conformers reveals a close competition between two structures: AG1a, which has an extended side chain, and GG1a, in which the side chain is folded back towards the catechol ring (Snoek et al. 2003b; van Mourik 2004) (see figure 3). The AG1a and GG1a structures are nearly isoenergetic, and which of the two is the most stable is dependent on the level of theory used. The narrow energy gap between the AG-type and GG-type conformers was also observed in 2-amino-1-phenylethanol (APE), the benzene analogue of NA (Graham et al. 1999; Macleod et al. 2003). However, whereas both AG1 and GG1 APE conformers were observed experimentally, almost the entire NA population was found to adopt the global-minimum structure (Snoek et al. 2003b). The reasons for the absence of a larger spread of populated conformers in jet-cooled NA are not well understood. In the experiments, NA was created by laser ablation into a supersonic argon expansion. The relatively large barriers (of the order of 10–20 kJ mol⁻¹ (van Mourik & Früchtl 2005)) between different NA conformers indicate that it is unlikely that conformational relaxation occurs during cooling in the supersonic jet. Hence, it seems likely that laser-ablated NA is generated predominantly in the global-minimum conformation. Thus far there have been no jet-cooled experimental data available for A/PA. It would be very interesting to see which A and PA conformers are formed experimentally.

[Figure 3. The two most stable conformers of the neurotransmitters NA, A and PA. Panels: NA (AG1a), NA (GG1a), A (AG1a), A (GG1a), PA (AG1a), PA (GG1a).]
(d) Protonated neurotransmitters

Under aqueous physiological conditions, biomolecules containing basic amino groups will exist predominantly in their protonated form. Though some theoretical studies on protonated neurotransmitters have appeared in the literature (Alagona & Ghio 2002; Nagy et al. 2003), corresponding spectroscopic work has been lagging behind. This is partly due to the fact that it is notoriously difficult to produce reasonable quantities of protonated biomolecules at sufficiently low vibrational and rotational temperatures to allow their structural analysis. However, a recent spectroscopic study on protonated ethanolamine (Macleod & Simons 2004) introduces a potentially very promising approach to generate protonated molecules in high concentration in the gas phase. The method is based upon the photo-excitation of hydrogen-bonded complexes with phenol. It is found that the infrared spectrum of the [phenoxy-protonated-ethanolamine] complex is almost identical to the computed spectrum of free protonated ethanolamine. I expect that this new approach will pave the way for an upsurge in experimental and theoretical studies probing the structural preferences of protonated biomolecules.

(e) Hydrated neurotransmitters

Gas-phase biomolecule–(H2O)n clusters bridge the gap between isolated and fully hydrated biomolecules. Their study allows the relative importance of individual water molecules to be investigated. To find the most stable hydrates, it is not sufficient to consider the water complexes of just the most stable isolated-biomolecule conformer, as the interaction with water may change the relative stability of the conformers (Butz et al. 2002). Indeed, the interaction with water may even change the biomolecular conformation to one that is non-existent in the absence of water, as found in calculations on adrenaline–(H2O)2 (van Mourik 2004).
Thus, as a minimum, one has to study hydrates involving several of the most stable isolated-biomolecule conformers. The functional groups of the adrenergic neurotransmitters provide many possible water-binding sites: calculations on 1:1 hydrates of NA and A (Snoek et al. 2003a; van Mourik 2004) show that there are about 10 different ways for a single water molecule to bind to these neurotransmitters. In addition, the number of local minima increases steeply with the number of constituents in the cluster, and thus, a full study of the hydrates is a formidable task. Structural data from spectroscopy experiments may help to reduce the conformational space that needs to be investigated, further stressing the importance of collaborative research in this field.

[Figure 4. The basic structure of a peptide. Amino acids are linked by peptide bonds to form polypeptide chains, running from the amino-terminal residue to the carboxyl-terminal residue.]

(f) Anharmonic vibrational frequencies

The interpretation of the experimental IR vibrational spectra in the studies discussed above crucially depends on comparison with reliable theoretical vibrational frequencies. Most standard algorithms to calculate vibrational frequencies employ the harmonic approximation, yielding harmonic vibrational frequencies. As the anharmonicity contribution to the vibrational frequencies can be rather large in hydrogen-bonded systems, this complicates the comparison between computed and experimental infrared spectroscopic data. The application of scaling factors (Scott & Radom 1996) can remedy this to some extent, but it is very difficult to account for varying degrees of anharmonicity in the frequency modes using scaling factors.
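The limitation of a single scaling factor can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the frequencies are invented numbers rather than computed or measured data, and the least-squares fit is one simple way of choosing a uniform factor, not necessarily the exact procedure used in the cited work.

```python
def scale_frequencies(harmonic, factor):
    """Apply one uniform scaling factor to a list of harmonic frequencies."""
    return [factor * w for w in harmonic]


def least_squares_scale_factor(harmonic, observed):
    """Uniform factor minimizing sum((observed - factor * harmonic)**2)."""
    numerator = sum(w * v for w, v in zip(harmonic, observed))
    denominator = sum(w * w for w in harmonic)
    return numerator / denominator


def residuals(harmonic, observed, factor):
    """Per-mode error left over after uniform scaling."""
    return [v - s for v, s in zip(observed, scale_frequencies(harmonic, factor))]


# Hypothetical frequencies in cm^-1 (invented for illustration only).
# The last mode mimics a strongly anharmonic hydrogen-bonded O-H stretch.
harmonic = [1650.0, 3100.0, 3700.0]
observed = [1610.0, 2980.0, 3420.0]

factor = least_squares_scale_factor(harmonic, observed)
print(round(factor, 4))
print([round(r, 1) for r in residuals(harmonic, observed, factor)])
```

Because the modes differ in their degree of anharmonicity, the residuals after scaling do not vanish together: a factor that suits one mode over-corrects or under-corrects another.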
To overcome these deficiencies, techniques have been developed to compute anharmonic vibrational frequencies. One of the principal methods for calculating anharmonic frequencies is the vibrational self-consistent field (VSCF) method, of which the basic structure was first introduced in 1978 (Bowman 1978). Because VSCF and its correlation-corrected form, CC-VSCF (Jung & Gerber 1996), require the calculation of many points on the ab initio potential energy surface, they are very computationally demanding, restricting their application to only small molecules. However, promising new implementations of the CC-VSCF method, employing cost-reducing techniques such as the use of ab initio-improved semi-empirical potentials (Gerber et al. 2003), or the use of pseudo-potential basis sets for heavy atoms coupled with an algorithm to reduce the number of pair-coupling elements (Benoit 2004), show potential for widening the application field of the CC-VSCF method.

4. First-principles calculations on peptides

(a) Peptide terminology

Amino acids are the building blocks of peptides and proteins. An amino acid consists of a central α carbon, an amino (NH2) group, a carboxyl (COOH) group, and a distinctive R group, often called the side chain (for reasons mentioned below). In solution, and at neutral pH, amino acids predominantly occur in their zwitterionic forms, containing a protonated amino group (NH3+) and a deprotonated carboxyl group (COO−). In peptides and proteins, amino acids are commonly called ‘residues’. They are linked by peptide bonds: the carboxyl group of one amino acid is joined to the amino group of another amino acid under elimination of a water molecule (see figure 4). The amino group of the first amino acid in a polypeptide chain, and the carboxyl group of the last amino acid, remain intact.
By convention, the chain extends from the ‘amino (or N) terminus’ to the ‘carboxyl (or C) terminus’. The successive peptide bonds generate the ‘main chain’ or ‘backbone’ of the peptide, which consists of repeating NH–CαH–C=O units. Thus, an n-residue peptide consists of a main chain and n side chains. The peptide group, (C=O)–N–H, is planar and rigid, but the peptide main chain has flexibility through the two bonds that flank the peptide bond, the Cα–C and N–Cα bonds. The dihedral angles that define the orientation of these bonds, δ(NCαCN) and δ(CNCαC) (ψ and φ), can be plotted against each other to produce a so-called ‘Ramachandran plot’ (Ramachandran et al. 1963). It turns out that many angle combinations almost never occur because they would produce collisions between different parts of the peptide (the exception is glycine, which, with only a hydrogen as its side chain, can adopt conformations that are forbidden for other residues).

(b) The dimensionality problem

One of the major problems of computational studies of peptides is their flexibility, resulting in a large number of possible conformations. Even if one were to ignore the flexibility of the side chains (which is reasonable only for glycine and alanine residues, which contain as their side chain just a hydrogen or methyl group, respectively), and were to take into account only three different positions of the Ramachandran angles φ and ψ, then for an octapeptide one would still need to consider 3¹⁴ = 4 782 969 different conformations. This dimensionality problem can be overcome to some extent by realizing that most polypeptide chains fold into regular periodic structures. These recurring structural motifs are called secondary structures. Some common secondary structure elements include the α helix, β pleated sheets and the β turn.
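The growth of this number with chain length can be made explicit with a few lines of Python. The exponent is an assumption inferred from the figure of 3¹⁴ quoted above for an octapeptide: it corresponds to 2n − 2 backbone angles for n residues, that is, one angle fewer at each terminus. The function name is hypothetical.

```python
def backbone_conformations(n_residues, states_per_angle=3):
    """Number of backbone conformations with rigid side chains.

    Assumes 2 * n_residues - 2 Ramachandran angles, which reproduces the
    3**14 quoted in the text for an octapeptide (n_residues = 8).
    """
    n_angles = 2 * n_residues - 2
    return states_per_angle ** n_angles


print(backbone_conformations(8))  # 4782969
for n in (2, 4, 8, 10):
    print(n, backbone_conformations(n))
```

Each additional residue multiplies the count by nine, which is why exhaustive searches become impractical well before the peptide reaches the length of even a single helical turn or two.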
Protein folding can be regarded as a process in which first secondary structure elements are formed before the protein assembles into its complete three-dimensional structure (Karplus & Weaver 1994). A detailed understanding of the properties of these secondary structures is therefore paramount to understanding protein folding.

(c) Studies on helical peptides

Until just a few years ago, first-principles calculations on peptides focused mainly on elucidating conformational preferences of small peptides (dipeptides and tripeptides), as reviewed by Császár & Perczel (1999). However, researchers have started to do calculations on unprecedentedly large systems, thereby advancing the boundaries of first-principles quantum chemistry. With HF, calculations on complete proteins are becoming feasible. Van Alsenoy et al. (1998) performed an HF geometry optimization of crambin, a 46-residue protein containing 642 atoms. Very recently, Zhang et al. (2003) computed the interaction energy of the streptavidin–biotin complex, containing 1775 atoms, at the HF level of theory. This calculation was made possible by using a new computational technique, MFCC (molecular fractionation with conjugate caps). Other researchers have used the (more expensive) DFT method to study helical peptides containing 10 (Elstner et al. 2000; Topol et al. 2001) or 17 (Wieczorek & Dannenberg 2003) alanine residues. In addition to traditional DFT, the study by Elstner et al. (2000) also used the self-consistent charge density functional tight-binding scheme (SCC-DFTB), which can be viewed as an approximation to DFT, and which enabled them to study 20-residue alanine-based peptides with inclusion of as many as 38 water molecules.

Many studies have focused on α helical peptides because the α helix is the most common secondary structure motif in peptides.
Most of these studies concentrate on alanine-rich peptides, as alanine has one of the highest helix propensities (Chakrabartty & Baldwin 1995; Marqusee et al. 1989; O’Neil & DeGrado 1990). Alanine peptides also have a slight advantage for theoretical studies, because alanine, with a methyl group as its side chain, is one of the simplest amino acids. Studies on alanine-based peptides are nevertheless computationally demanding. α-helices form hydrogen bonds between the C=O group of the nth residue and the NH group of the (n + 4)th residue, and thus, at least five residues are needed to form one α helical turn. Studies on α helical peptides therefore need to consider peptides containing at least five amino acid residues. However, calculations on alanine-based peptides show that, in vacuum, short peptides adopt the more tightly coiled 3₁₀ helical structure. 3₁₀-helices form hydrogen bonds between the nth and (n + 3)th amino acid residues. Longer peptide chains, and inclusion of solvent effects, are needed to stabilize the α helix (Elstner et al. 2000), increasing the complexity of the problem.

The inclusion in the peptide structure of residues more complex than glycine or alanine may also complicate the calculations. The larger side chains of these residues add to the dimensionality problem as they exhibit flexibility, and their hydrogen-bonding capabilities may affect the stability of the different structural motifs. HF calculations on isolated AKAAA-AKAAA (A = alanine, K = lysine) peptides indicate that lysine’s charged side chain (CH2CH2CH2CH2NH3+) tends to bind to the C-terminus of the peptide, thereby distorting the α-helical structure. Figure 5a shows how the side chain of the second lysine curves to enable hydrogen bonding to the carboxyl of the C-terminus. Increasing the peptide length improves the stability of the α helix, as indicated by the only slightly distorted α-helical structure of AKAAA-AKAAA-AKAAA (see figure 5b).
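The two hydrogen-bonding patterns described above, n → n + 4 for the α helix and n → n + 3 for the 3₁₀ helix, can be listed explicitly for a peptide of given length. The sketch below is a simple bookkeeping exercise under idealized assumptions: every residue pair at the right spacing is taken to form a hydrogen bond, and terminal groups and side chains are ignored. The function name is hypothetical.

```python
def helix_hbonds(n_residues, offset):
    """Idealized backbone hydrogen bonds as (C=O residue, N-H residue) pairs.

    Residues are numbered from 1 at the N-terminus.
    offset = 4 gives the alpha-helical pattern, offset = 3 the 3-10 pattern.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


print(helix_hbonds(5, offset=4))       # [(1, 5)]
print(helix_hbonds(5, offset=3))       # [(1, 4), (2, 5)]
print(len(helix_hbonds(10, offset=4)))  # 6
```

A four-residue peptide yields no n → n + 4 pair at all, which is the bookkeeping behind the statement that at least five residues are needed for one α helical turn.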
The inclusion of water molecules around the charged NH₃⁺ group of lysine, as well as around the carboxyl group of the C-terminus, prevents the lysine chain from hydrogen-bonding to the C-terminal carboxyl group, thereby reducing the distortion from the ideal α-helical structure (figure 5c).

Figure 5. HF/3-21G optimized structures of (AKAAA)n peptides. (a) AKAAA. (b) AKAAAAKAAAAKAAA. The view down the helical spiral evidences the α-helical nature of the structure. (c) AKAAAAKAAA + 10H₂O. The water molecules prevent the charged lysine chain from hydrogen-bonding to the C-terminus.

5. First-principles calculations on DNA fragments

DNA (deoxyribonucleic acid) is one of the most important biomolecules, as it encodes the genetic information in the nucleus of cells. The basic structural units of DNA are nucleotides, which consist of a deoxyribose sugar, a phosphate group and a base. The most common and best-known form of DNA is the double helix, the three-dimensional structure of which was deduced by Watson, Crick, Franklin and Wilkins (Watson & Crick 1953). This discovery, the 50th anniversary of which was widely celebrated last year, won Watson, Crick and Wilkins the 1962 Nobel Prize in Physiology or Medicine. In the DNA double helix, two helical nucleotide chains, coiled around a common axis, are held together by hydrogen bonding between the bases on opposite strands (see figure 6). DNA contains four different bases: thymine, cytosine, adenine and guanine. Adenine pairs with thymine, and guanine pairs with cytosine.

Figure 6. The DNA double helix.

Other forms of DNA are known to exist, however. For example, four-stranded DNA structures occur in telomeric DNA. Telomeres are the specialized ends of chromosomes that protect these ends from recombination and from being recognized as damaged DNA. Telomeric DNA contains guanine-rich segments, which can fold into four-stranded G-quadruplex structures (see figure 7). These quadruplexes result from the stacking of guanine tetrads. Cations (NH₄⁺ or monovalent metal cations), located either in the tetrad cavity or between successively stacked tetrads, are essential for the structural integrity of the DNA quadruplex. Recently, there has been considerable interest in these G-quadruplexes, not least because of their potential use in the development of anti-cancer drugs (Neidle & Parkinson 2002).

Figure 7. Crystal structure of the potassium form of an Oxytricha nova G-quadruplex.

Figure 8. Guanine tetrad structures optimized with B3LYP/6-311++G**. (a) Bifurcated hydrogen-bonded G-tetrad; (b) G·G hydrogen-bonded G-tetrad.

The DNA G-quadruplex has also attracted the interest of computational researchers, who have so far focused on the guanine tetrad and the cation–tetrad complex. First-principles calculations on systems of this size are computationally very demanding, and have only recently become feasible. The first HF and DFT calculations on the stability and structure of the G-tetrad were presented in 1999 (Gu et al. 1999). As the crystal structures of DNA quadruplexes show near-coplanar geometries for the guanine tetrads, these calculations were performed for the planar, C₄ₕ-symmetric structure of the guanine tetrad. Gu and co-workers (Gu & Leszczynski 2000; Gu et al.
1999) found that, whereas the cation-containing G-tetrad adopts the G·G N1-carbonyl, N7-amino hydrogen-bonded structure (in agreement with the crystal structures of the DNA quadruplexes), in the optimized geometry of the bare G-tetrad the guanine monomers are held together by bifurcated hydrogen bonds. Calculations performed in my group show that both the G·G hydrogen-bonded and the bifurcated hydrogen-bonded C₄ₕ-symmetric tetrads (see figure 8) can be located (van Mourik & Dingley 2005); however, these structures are transition states, not true minima, on the DFT/B3LYP potential energy surface (the true minimum is a twisted S₄-symmetric structure). Meyer et al. (2001) have shown that ions with a small radius (Li⁺, Be²⁺, Cu⁺ and Zn²⁺) cause non-planarity of the complex, which may prevent stacking of the G-tetrads. K⁺ is too large to fit in the central cavity, and this ion therefore prefers to be located between successive tetrads in the G-quadruplex.

6. Future prospects

Computational quantum chemistry is a rapidly developing field. This is partly due to the development and implementation of innovative first-principles methods. These include more efficient approaches to existing methods, such as linear-scaling algorithms (Goedecker & Scuseria 2003), as well as hybrid (QM/MM) methods based on the combination of classical (molecular mechanics (MM)) and quantum mechanics (QM) methodologies (Morokuma 2002). Promising new techniques are also being developed that treat anharmonicity and quantum effects in order to calculate the free energies of biomolecular systems, which are required at temperatures above 0 K. Whereas diffusion quantum Monte Carlo (DQMC) techniques (Benoit & Clary 2000; Clary 2001; van Mourik et al. 2001) yield the vibrational ground state (at 0 K) of a molecular system, the torsional path integral Monte Carlo (TPIMC) technique (Miller & Clary 2002) does account for temperature effects.
The construction of a global ab initio potential energy surface is not yet feasible for biomolecules larger than glycine (Miller & Clary 2004), and thus both DQMC and TPIMC currently rely on force fields to calculate the potential energy surface of biomolecular systems.

A second cause of the increasing capability of quantum chemistry is the advance in computer technology. Moore (1965) observed that the number of components on an integrated circuit grows exponentially with time; in its popular form, dubbed ‘Moore’s law’, the speed of computers doubles roughly every 18 months. In addition, the advent of parallel computer architectures and developments in ‘Grid technology’ (Foster & Kesselman 1999) further increase the available computational power. However, the rapid increase in central-processing-unit speed and the novel architectures also bring huge challenges for scientists and engineers (Dunning et al. 2002), as algorithms and programs need to be rewritten and parallelized to make full use of the computational resources.

The future potential of first-principles quantum chemistry can hardly be overestimated. I expect a dramatic increase in the application of first-principles quantum chemistry in the life sciences. This may lead to improved methodologies for drug development as well as an increased understanding of complex processes such as protein folding. As outlined in this paper, small biomolecular systems (such as neurotransmitters, DNA fragments and peptides) are already being studied using ab initio and DFT methods, and papers reporting first-principles computational research on ever larger systems continue to appear in the literature. With more powerful computers and more efficient computer algorithms, electronic-structure calculations on complete proteins and neurotransmitter–receptor systems will be feasible.
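The 18-month doubling time quoted above translates directly into an estimate of how long one must wait before a calculation that is too expensive today becomes affordable. The sketch below is only that arithmetic: it assumes steady exponential growth with a fixed doubling time and ignores the algorithmic and parallelization gains also discussed in this section. The function names are hypothetical.

```python
import math


def speedup(years: float, doubling_time_years: float = 1.5) -> float:
    """Growth factor in computer speed after `years`, assuming steady doubling."""
    if doubling_time_years <= 0:
        raise ValueError("doubling_time_years must be positive")
    return 2.0 ** (years / doubling_time_years)


def years_until_affordable(cost_ratio: float, doubling_time_years: float = 1.5) -> float:
    """Years of hardware growth needed to absorb a `cost_ratio`-fold excess in cost."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Solve 2 ** (t / T) = cost_ratio for t.
    return doubling_time_years * math.log2(cost_ratio)


if __name__ == "__main__":
    for years in (1.5, 5, 10):
        print(f"{years:>4} years -> speed-up factor {speedup(years):.1f}")
    print(f"100x too expensive today: about {years_until_affordable(100):.1f} years")
```

Because the waiting time grows only logarithmically with the cost ratio, hardware growth alone makes even a hundredfold shortfall a matter of roughly a decade under these assumptions; more efficient algorithms shorten it further.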
I believe that research will become more interdisciplinary, with experimental and theoretical researchers working side by side on the same projects. In addition, despite the complexity of quantum chemical methods, computer programs are becoming more and more ‘user friendly’, inviting non-theoreticians to supplement their experimental research with complementary results from computer simulations.

I gratefully acknowledge The Royal Society for their support under the University Research Fellowship scheme. I thank my collaborators in Oxford (Professor John P. Simons and Dr Lavina C. Snoek) and at University College London (Dr Andrew J. Dingley) for joint experimental/theoretical research and stimulating discussions.

References

Alagona, G. & Ghio, C. 2002 Interplay of intra- and intermolecular H-bonds for the addition of a water molecule to the neutral and N-protonated forms of noradrenaline. Int. J. Quant. Chem. 90, 641–656.
Becke, A. D. 1993 Density functional thermochemistry. 3. The role of exact exchange. J. Chem. Phys. 98, 5648.
Benoit, D. M. 2004 Fast vibrational self-consistent field calculations through a reduced mode–mode coupling scheme. J. Chem. Phys. 120, 562–573.
Benoit, D. M. & Clary, D. C. 2000 Quaternion formulation of diffusion Monte Carlo for the rotation of rigid molecules in clusters. J. Chem. Phys. 113, 5193–5202.
Bowman, J. M. 1978 Self-consistent field energies and wavefunctions for coupled oscillators. J. Chem. Phys. 68, 608–610.
Butz, P., Kroemer, R. T., MacLeod, N. A., Robertson, E. G. & Simons, J. P. 2001a Conformational preferences of neurotransmitters: norephedrine and the adrenaline analogue, 2-methylamino-1-phenylethanol. J. Phys. Chem. A 105, 1050–1056.
Butz, P., Kroemer, R. T., MacLeod, N. A. & Simons, J. P. 2001b Conformational preferences of neurotransmitters: ephedrine and its diastereoisomer, pseudoephedrine. J. Phys. Chem. A 105, 544–551.
Butz, P., Kroemer, R. T., MacLeod, N. A. & Simons, J. P. 2002 Hydration of neurotransmitters: a spectroscopic and computational study of ephedrine and its diastereoisomer pseudoephedrine. Phys. Chem. Chem. Phys. 4, 3566–3574.
Carney, J. R. & Zwier, T. S. 2000 The infrared and ultraviolet spectra of individual conformational isomers of biomolecules: tryptamine. J. Phys. Chem. A 104, 8677–8688.
Carney, J. R. & Zwier, T. S. 2001 Conformational flexibility in small biomolecules: tryptamine and 3-indole-propionic acid. Chem. Phys. Lett. 341, 77–85.
Chakrabartty, A. & Baldwin, R. L. 1995 Stability of α-helices. Adv. Protein Chem. 46, 141–176.
Clary, D. C. 2001 Torsional diffusion Monte Carlo: a method for quantum simulations of proteins. J. Chem. Phys. 114, 9725–9732.
Császár, A. G. & Perczel, A. 1999 Ab initio characterization of building units in peptides and proteins. Prog. Biophys. Mol. Biol. 71, 243–309.
Dunning Jr, T. H., Harrison, R. J., Feller, D. & Xantheas, S. S. 2002 Promise and challenge of high-performance computing, with examples of molecular modelling. Phil. Trans. R. Soc. Lond. A 360, 1089–1105.
Elstner, M., Jalkanen, K. J., Knapp-Mohammady, M., Frauenheim, T. & Suhai, S. 2000 DFT studies on helix formation in N-acetyl-(L-alanyl)n-N′-methylamide for n = 1–20. Chem. Phys. 256, 15–27.
Foster, I. & Kesselman, C. (eds) 1999 The Grid: blueprint for a new computing infrastructure. San Mateo, CA: Morgan Kaufmann.
Gerber, R. B., Chaban, G. M., Gregurick, S. K. & Brauer, B. 2003 Vibrational spectroscopy and the development of new force fields for biological molecules. Biopolymers 68, 370–382.
Goedecker, S. & Scuseria, G. 2003 Linear scaling electronic structure methods in chemistry and physics. Comput. Sci. Engng 5, 14–21.
Graham, R. J., Kroemer, R. T., Mons, M., Robertson, E. G., Snoek, L. C. & Simons, J. P. 1999 Infrared ion-dip spectroscopy of a noradrenaline analogue: hydrogen bonding in 2-amino-1-phenylethanol and its singly hydrated complex. J. Phys. Chem. A 103, 9706–9711.
Gu, J. & Leszczynski, J. 2000 A remarkable alteration in the bonding pattern: an HF and DFT study on the interactions between the metal cations and the Hoogsteen hydrogen-bonded G-tetrad. J. Phys. Chem. A 104, 6308–6313.
Gu, J., Leszczynski, J. & Bansal, M. 1999 A new insight into the structure and stability of Hoogsteen hydrogen-bonded G-tetrad: an ab initio SCF study. Chem. Phys. Lett. 311, 209–214.
Hohenberg, P. & Kohn, W. 1964 Inhomogeneous electron gas. Phys. Rev. A 136, 864–871.
Jung, J. O. & Gerber, R. B. 1996 Vibrational wave functions and spectroscopy of (H₂O)n, n = 2, 3, 4, 5: vibrational self-consistent field with correlation corrections. J. Chem. Phys. 105, 10 332–10 348.
Karplus, M. & Weaver, D. L. 1994 Protein folding dynamics: the diffusion-collision model and experimental data. Protein Sci. 3, 650–668.
Levy, D. H. 1980 Laser spectroscopy of cold gas-phase molecules. A. Rev. Phys. Chem. 31, 197–225.
Macleod, N. A. & Simons, J. P. 2004 Neurotransmitters in the gas phase: infrared spectroscopy and structure of protonated ethanolamine. Phys. Chem. Chem. Phys. 6, 2821–2826.
Macleod, N. A., Robertson, E. G. & Simons, J. P. 2003 Hydration of neurotransmitters: a computational and spectroscopic study of a noradrenaline analogue, 2-amino-1-phenyl-ethanol. Mol. Phys. 101, 2199–2210.
Marqusee, S., Robbins, V. H. & Baldwin, R. L. 1989 Unusually stable helix formation in short alanine-based peptides. Proc. Natl Acad. Sci. USA 86, 5286–5290.
Meyer, M., Steinke, T., Brandl, M. & Sühnel, J. 2001 Density functional study of guanine and uracil quartets and of guanine quartet/metal ion complexes. J. Computat. Chem. 22, 109–124.
Miller III, T. F. & Clary, D. C. 2002 Torsional path integral Monte Carlo method for the quantum simulation of large molecules. J. Chem. Phys. 116, 8262–8269.
Miller III, T. F. & Clary, D. C. 2004 Quantum free energies of the conformers of glycine on an ab initio potential energy surface. Phys. Chem. Chem. Phys. 6, 2563–2571.
Møller, C. & Plesset, M. S. 1934 Note on an approximation treatment for many-electron systems. Phys. Rev. 46, 618–622.
Moore, G. E. 1965 Cramming more components onto integrated circuits. Electronics 38, 114–117.
Morokuma, K. 2002 New challenges in quantum chemistry: quests for accurate calculations for large molecular systems. Phil. Trans. R. Soc. Lond. A 360, 1149–1164.
Nagy, P. I., Alagona, G., Ghio, C. & Takács-Novák, K. 2003 Theoretical conformational analysis for neurotransmitters in the gas phase and in aqueous solution: norepinephrine. J. Am. Chem. Soc. 125, 2770–2785.
Neidle, S. & Parkinson, G. 2002 Telomere maintenance as a target for anticancer drug discovery. Nat. Rev. Drug. Discov. 1, 383–393.
Olsen, J., Christiansen, O., Koch, H. & Jørgensen, P. 1996 Surprising cases of divergent behavior in Møller–Plesset perturbation theory. J. Chem. Phys. 105, 5082–5090.
O’Neil, K. T. & DeGrado, W. F. 1990 A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250, 646–651.
Ramachandran, G. N., Sasisekharan, V. & Ramakrishnan, C. J. 1963 Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7, 95–99.
Robertson, E. G. & Simons, J. P. 2001 Getting into shape: conformational and supramolecular landscapes in small biomolecules and their hydrated clusters. Phys. Chem. Chem. Phys. 3, 1–18.
Scott, A. P. & Radom, L. 1996 Harmonic vibrational frequencies: an evaluation of Hartree–Fock, Møller–Plesset, quadratic configuration interaction, density functional theory, and semiempirical scale factors. J. Phys. Chem. 100, 16 502–16 513.
Smalley, R. E., Ramakrishna, B. L., Levy, D. H. & Wharton, L. 1974 Laser spectroscopy of supersonic molecular beams: application to the NO₂ spectrum. J. Chem. Phys. 61, 4363–4364.
Snoek, L. C., van Mourik, T., Carçabal, P. & Simons, J. P. 2003a Neurotransmitters in the gas phase: hydrated noradrenaline. Phys. Chem. Chem. Phys. 5, 4519–4526.
Snoek, L. C., van Mourik, T. & Simons, J. P. 2003b Neurotransmitters in the gas phase: a computational and spectroscopic study of noradrenaline. Mol. Phys. 101, 1239–1248.
Topol, I. A., Burt, S. K., Deretey, E., Tang, T.-H., Perczel, A., Rashin, A. & Csizmadia, A. G. 2001 α- and 3₁₀-helix interconversion: a quantum-chemical study on polyalanine systems in the gas phase and in aqueous solvent. J. Am. Chem. Soc. 123, 6054–6060.
Vaidehi, N., Floriano, W. B., Trabanino, R., Hall, S. E., Freddolino, P., Choi, E. J., Zamanakos, G. & Goddard III, W. A. 2002 Prediction of structure and function of G protein-coupled receptors. Proc. Natl Acad. Sci. USA 99, 12 622–12 627.
van Alsenoy, C., Yu, C.-H., Peeters, A., Martin, J. M. L. & Schäfer, L. 1998 Ab initio geometry determination of proteins. I. Crambin. J. Phys. Chem. A 102, 2246–2251.
van Mourik, T. 2004 The shape of neurotransmitters in the gas phase: a theoretical study of adrenaline, pseudoadrenaline, and hydrated adrenaline. Phys. Chem. Chem. Phys. 6, 2827–2837.
van Mourik, T. & Dingley, A. J. 2005 (In preparation.)
van Mourik, T. & Emson, L. E. V. 2002 A theoretical study of the conformational landscape of serotonin. Phys. Chem. Chem. Phys. 4, 5863–5871.
van Mourik, T. & Früchtl, H. A. 2005 The potential energy landscape of noradrenaline. An electronic structure study. (Submitted.)
van Mourik, T. & Gdanitz, R. J. 2002 A critical note on density functional theory studies on rare-gas dimers. J. Chem. Phys. 116, 9620–9623.
van Mourik, T., Price, S. L. & Clary, D. C. 2001 Diffusion Monte Carlo simulations on uracil–water using an anisotropic atom–atom potential model. Faraday Disc. 118, 95–108.
Watson, J. D. & Crick, F. H. C. 1953 A structure for deoxyribose nucleic acid. Nature 171, 737–738.
Wieczorek, R. & Dannenberg, J. J. 2003 H-bonding cooperativity and energetics of α-helix formation of five 17-amino acid peptides. J. Am. Chem. Soc. 125, 8124–8129.
Zhang, D. W., Xiang, Y. & Zhang, J. Z. H. 2003 New advance in computational chemistry: full quantum mechanical ab initio computation of streptavidin–biotin interaction energy. J. Phys. Chem. B 107, 12 039–12 041.
Zwier, T. S. 2001 Laser spectroscopy of jet-cooled biomolecules and their water-containing clusters: water bridges and molecular conformation. J. Phys. Chem. A 105, 8827–8839.

AUTHOR PROFILE

Tanja van Mourik

Born in 1966 in Vlissingen, a small city on the Dutch coast, Tanja van Mourik studied chemistry at the University of Utrecht, the Netherlands, with a nine-month intermezzo at the Ruhr University of Bochum, Germany. She returned to Utrecht in July 1989, where in the same year she graduated cum laude. She obtained her PhD in the field of theoretical chemistry from the University of Utrecht in 1994, after which she spent three years as a Postdoctoral Associate at the Pacific Northwest National Laboratory in Richland, WA, USA. In 1997 she came to the UK to take up a postdoctoral research position at University College London. She was awarded a Royal Society University Fellowship in 2000, and has since been working as a University Research Fellow at the Department of Chemistry, University College London. Her research interests include the accurate quantum chemical computation of molecular properties in general, and more specifically the application of these methods to the study of small molecules of biological interest.