0% found this document useful (0 votes)
11 views15 pages

ML_electronic

Uploaded by

tanatswagendere1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views15 pages

ML_electronic

Uploaded by

tanatswagendere1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

MIT Open Access Articles

Approaching coupled-cluster accuracy for molecular


electronic structures with multi-task learning (preprint)

The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.

Citation: Tang, Hao, Xiao, Brian, He, Wenhao, Subasic, Pero, Harutyunyan, Avetik R. et al. 2024.
"Approaching coupled-cluster accuracy for molecular electronic structures with multi-task
learning (preprint)." Nature Computational Science.

As Published: 10.1038/s43588-024-00747-9

Publisher: Springer Science and Business Media LLC

Persistent URL: https://hdl.handle.net/1721.1/157988

Version: Author's final manuscript: final author's manuscript post peer review, without
publisher's formatting or copy editing

Terms of use: Creative Commons Attribution-Noncommercial-Share Alike


Approaching coupled-cluster accuracy for molecular electronic structures with
multi-task learning

Hao Tang,1 Brian Xiao,2 Wenhao He,3 Pero Subasic,4 Avetik R.


Harutyunyan,4 Yao Wang,5 Fang Liu,5 Haowei Xu,6 and Ju Li1, 6
1
Department of Materials Science and Engineering,
Massachusetts Institute of Technology, MA 02139, USA
2
Department of Physics, Massachusetts Institute of Technology, MA 02139, USA
3
The Center for Computational Science and Engineering,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4
Honda Research Institute USA, Inc., San Jose, CA 95134, USA
5
Department of Chemistry, Emory University, Atlanta, GA 30322, USA
6
Department of Nuclear Science and Engineering,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(Dated: January 3, 2025)
(Corresponding authors: Haowei Xu: haoweixu@mit.edu; Ju Li: liju@mit.edu)
Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate
predictive models for various properties of molecules. However, most existing ML models for molec-
ular electronic properties use density functional theory (DFT) databases as ground truth in training,
and their prediction accuracy cannot surpass that of DFT. In this work, we developed a unified ML
method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations
as training data. Tested on hydrocarbon molecules, our model outperforms DFT with several widely-
used hybrid and double hybrid functionals in both computational costs and prediction accuracy of
various quantum chemical properties. As case studies, we apply the model to aromatic compounds
and semiconducting polymers on both ground state and excited state properties, demonstrating its
accuracy and generalization capability to complex systems that are inaccessible to calculate using
CCSD(T)-level methods due to scaling.

Computational methods for molecular and condensed- properties [19, 20], as well as the electronic band struc-
matter systems play essential roles in physics, chemistry, ture of condensed matter [21, 22]. Most of these meth-
and materials science, which can reveal underlying mech- ods take the density functional theory (DFT) results as
anisms of diverse physical phenomena and accelerate ma- the training data, using neural networks (NNs) to fit the
terials design [1]. Among various types of computational single-configurational representation (either the Kohn-
methods, quantum chemistry calculations of electronic Sham Hamiltonian or molecular orbitals) of the DFT
structure are usually the bottleneck, limiting the compu- calculations [15, 19, 21, 23]. Along with the rapid ad-
tational speed and scalability [2]. In recent years, ma- vances of ML techniques, the NN predictions match the
chine learning (ML) methods have been successfully ap- DFT results increasingly well, approaching the chemical
plied to accelerate molecular dynamics simulations and accuracy [8, 16]. However, as a mean-field level theory,
to improve their accuracy in many application scenar- DFT calculations themselves induce a systematic error,
ios [3]. Particularly, ML inter-atomic potential can pre- which is usually several times larger than the chemical
dict energy and force of molecular systems with sub- accuracy [24], limiting the overall accuracy of the ML
stantially lower computational costs compared to quan- model trained on DFT datasets.
tum chemistry methods [4–7]. Indeed, recent advances In comparison, the correlated wavefunction method
in ”universal” ML potential enable large-scale molecu- CCSD(T) is considered the gold-standard in quantum
lar dynamics simulation with the complexity of realistic chemistry [25]. It provides high accuracy predictions on
physical systems [8–11]. Besides ML inter-atomic poten- various molecular properties. Unfortunately, the compu-
tial, rapid advances also appear in another promising di- tational cost of CCSD(T) calculations has a rather un-
rection, namely the ML density functional, which focuses favorable scaling with system size. Hence, it can only
on further improving the energy prediction towards the handle small molecules with up to hundreds of electrons.
chemical accuracy (1 kcal/mol) [12, 13]. This urges the combination of CCSD(T) and ML meth-
Besides energy and force, other electronic proper- ods, which can potentially have both high accuracy and
ties that explicitly involve the electron degrees of free- low computational cost. However, the above-mentioned
dom are also essential in molecular simulations [14]. In ML methods that directly fit the single-configurational
the past few years, ML methods have also been ex- representation of the DFT calculations cannot be di-
tended to electronic structure of molecules, predicting rectly applied to the CCSD(T) training data. This is
various electronic properties such as electric multipole because CCSD(T) does not provide either Kohn-Sham
moments [15–17], electron population [18], excited states (KS) Hamiltonians or single-body electronic wavefunc-
2

tions due to the many-body quantum entanglement na- for the non-local exchange-correlation effects. Generally,
ture of its representation. the non-local exchange-correlation effects can be cap-
In this work, we develop a unified multi-task ML tured in CCSD(T) calculations. However, as mentioned
method for molecular electronic structures. Instead of before, the computational costs of CCSD(T) methods are
focusing solely on energy, our method also provides ac- formidably high for large system. The essence of our ML
curate predictions for various electronic properties; com- method is to obtain the non-local exchange-correlation
pared with ML models trained on DFT datasets, our effects from a NN, whose computational cost scales only
method learns from CCSD(T)-accuracy training data. linearly with system size.
The method incorporates the E3-equivariant neural net- To obtain the ML correction term, we build a NN
work (E3NN) [4, 26], where vectors and tensors are in- model to predict Vθ . The workflow consists of the input
volved in the message-passing step. For brevity we re- layer, the convolutional layer, and the output layer. The
fer to our method as ”Multi-task Electronic Hamilto- input layer takes atomic configurations as input, encod-
nian network” (MEHnet). Using hydrocarbon organic ing them into the node features xI,in for atom informa-
molecules as a testbed, our method predicts molecu- tion and edge features fIJ,in for bond information (I, J
lar energy within chemical accuracy as compared with are indices of atoms). The E3NN framework is employed
both CCSD(T) calculation and experiments. It also pre- for the convolutional layer (shown in Fig. 1b, see Meth-
dicts various properties, including electric dipole and ods for details) because of its good performance in pre-
quadrupole moments, atomic charge, bond order, energy dicting molecular properties [4]. The convolutional layer
gap, and electric polarizability with better accuracy than outputs xI,out and fIJ,out that encode E3 equivariant fea-
B3LYP, one of the most widely used hybrid DFT func- tures of atoms and bonds as well as their atomic envi-
tionals [27]. Our trained model shows robust general- ronment. Subsequently, the equivariant ML correction
ization capability from small molecules in the training Hamiltonian Vθ is constructed using xI,out and fIJ,out in
dataset (molecular weight < 100) to larger molecules the output layer. The effective electronic structure of a
such as Naphthalene and even semiconducting polymers molecule is obtained by solving the eigenvalue equations
(molecular weight up to several thousands). Systemat- of the total Hamiltonian Heff = F′ + Vθ , giving ϵi , the
ically predicting multiple electronic properties using a i-th energy level, and ci , the corresponding molecular
single model with local DFT computational speed, the orbital represented on atomic orbital basis set.
method provides a high-performance tool for computa- Multiple Learning Tasks.
tional chemistry and a promising framework for ML elec- Our scheme aims to predict multiple observable molec-
tronic structure calculations. ular properties (more than just energy). In order to
achieve reduced computational costs, we do not include
information about the entire electronic Hilbert space as
RESULTS learning targets. Instead, the MEHnet is trained on a
series of molecular properties in order to capture their
Computational Workflow. shared underlying representation, the effective single-
In this section, we briefly describe the theoretical back- body Hamiltonian Heff . The corresponding single-body
ground and model architecture of the MEHnet method. energy levels and molecular orbitals are used to evaluate
Detailed information can be found in the Methods sec- a series of ground state properties Og according to the
tion. Basically, we use a NN to simulate the non-local rules of quantum mechanics:
exchange-correlation interactions of a many-body sys-
tem. Then, a physics-informed approach is used to pre- OgMEHnet = fOg ({ϵi }, {ci }), Og = E, p⃗, Q, CI , BIJ ,
dict multiple properties from the output of a single NN. (1)
Given an input atomic configuration, our goal is to ac- where properties Og goes through the ground state en-
quire an effective single-body Hamiltonian matrix that is ergy (E), electric dipole (⃗ p) and quadrupole (Q) mo-
then used to predict quantum chemical properties from ments, Mulliken atomic charge [28] of each atom CI , and
physics principles, as shown in Fig. 1a. First, a fast-to- Mayer bond order [29] of each pair of atoms BIJ . We
evaluate single-configurational method, such as DFT or also evaluate the energy gap (first excitation energy, Eg )
Hartree-Fock method, is used to obtain a mean-field ef- and static electric polarizability α:
fective Hamiltonian, F′ . Note that F′ is easy and fast to
compute, but its accuracy is relatively low. We will use EgMEHnet = fEg ({ϵi }, {ci }, G),
(2)
F′ as the starting point of our ML model, and the total αMEHnet = fα ({ϵi }, {ci }, T).
effective Hamiltonian of the system Heff = F′ + Vθ is
obtained by adding the ML correction term Vθ . In the In principle, the ground state electronic structure does
current formalism, F′ is obtained from a local DFT cal- not contain the information of the energy gap and electric
culation, and it contains only local exchange-correlation polarizability. Therefore, we use the model-output cor-
contribution, and the correction term Vθ would account rection terms G (energy gap correction) and T (dielectric
3

FIG. 1. Schematic of the MEHnet electronic structure workflow. (a) Computation graph of the MEHnet method that predict
multiple quantum chemical properties from atomic configurations inputs. The computational graph consists of input layer
(green blocks), convolutional layer (blue block), and output layer (orange blocks). (b) Model architecture of the MEHnet with
two layers of graph convolution. The output contains both node feature xI,out and edge feature fIJ,out . (c) Training and
testing dataset generation. Each dot represents molecules with the same chemical formula, and is plotted as a function of
number of electrons and atoms. The blue and orange colors correspond to molecules in the training and generalization domains
respectively. The model is trained with small molecules, and is subsequently tested with large molecules. The dot size reflects
the number of conformers and/or vibrational configurations with the same chemical formula in the dataset (from 1 to 500, in
log scale).

screening matrix) to account for the excited states infor- molecule. The weights wV and wO are hyperparame-
mation and the linear response information, respectively. ters whose values are listed in Methods. The weights are
The function forms of fOg , fEg , and fα are elaborated chosen to balance the training tasks so that the training
in Methods D. Note that these properties are all derived errors of all tasks decrease to satisfactory levels. Mini-
from the underlying electronic structure, so they are in- mizing LTotal requires the back-propagation through the
ternally related. Therefore, multi-task learning meth- diagonalization of Heff (that is , calculating ∂ϵi /∂Heff
ods can utilize these relations to mutually enhance the and ∂ci /∂Heff ), which is numerically unstable with di-
model’s generalization capability. rect numerical differentiation. To overcome this issue,
The goal of our multi-task learning is to predict the we derive customized back-propagation schemes for each
properties listed above with coupled-cluster accuracy. property using perturbation theory in quantum mechan-
Hence, the total loss function LTotal constructed as fol- ics (see Methods for details), giving
lows:
X ∇θ ϵi = (ci )† (∇θ Vθ )ci
LTotal = lV + lO ,
X (cp )† (∇θ Vθ )ci (4)
O∈Og ∪{Eg ,α}
∇ θ ci = cp .
lO = wO × MSEloss(O MEHnet
,O label
), (3) ϵi − ϵp
p̸=i
wV X θ
lV = 2 |VIµ,Jν |2 . When evaluating the gradients of properties in Eqs. (1)
Nbasis
Iµ,Jν
and (2) using the chain rule, terms that analytically can-
Here for each property O, lO is the the mean-square er- cel each other are removed in numerical evaluation, mak-
ror (MSE) loss between OMEHnet and Olabel , the MEHnet ing the scheme numerically stable.
predictions (Eq. (1)(2)) and coupled-cluster labels in the Atomic configurations of molecules in our training
training dataset, respectively. Meanwhile, lV is a reg- dataset are generated by the workflow shown in Fig. 1c.
ularization that penalizes large correction matrix Vθ , Our dataset covers various classes of hydrocarbons (sat-
and Nbasis is the total number of basis functions in the urated, unsaturated, alicyclic, and aromatic) and molec-
4

FIG. 2. Benchmark of the model performance on the testing dataset. (a) Testing root-mean-square errors (RMSE) of different
quantities as a function of training dataset size. (b) Computational costs of different methods plotted against number of
electrons. The computational costs is measured as the calculation time (node hour) on a single Intel Xeon Platinum 8260 CPU
node with 48 cores with sufficient memory for all calculations. The scaling deviates from the theoretical asymptotic scaling N 7
for CCSD(T), because the parallelization efficiency is higher for larger molecules. In principle, the N 7 scaling for CCSD(T)
would appear in the large N limit. (c) Prediction RMSE of the energy (E per atom, reference to separate atoms), electric
dipole moment (⃗ p), electric quadrupole moment (Q), Mulliken atomic charge (C), Mayer bond order (B), energy gap (1st
excitation energy, Eg ), and static electric polarizability (α, a.u. means atomic unit) with respect to the coupled-cluster results.
The MEHnet method is compared with the B3LYP hybrid functional, DSD-PBEP86 double hybrid functional [36], DM21 ML
functional [12], and AIQM1 ML potential [11]. A representative atomic configuration of each chemical formula is plotted for
illustration.

ular structures (linear, branched, and cyclic), contain- systems of practical importance. The following discus-
ing both stable and metastable conformers with diverse sions focus on close-shell hydrocarbon molecules (except
types of carbon-carbon bonds (single, double, triple, for the QM9 version of MEHnet that we will describe
and conjugated π-bond, see Supplementary Section 1, later).
the Supplementary Information contained Ref. [30–34]).
The model’s generalization capability from small
Coupled cluster calculations are implemented for var-
molecules to large molecules is essential for its useful-
ious hydrocarbon molecules. The MEHnet model is
ness on complex systems where coupled cluster calcula-
trained on small-molecules training dataset (training do-
tions cannot be implemented on current computational
main, Fig. 1c). The model is then tested on both small
platforms, due to their formidable computational costs.
molecules in the training domain but outside the train-
To test the generalization capability and data efficiency
ing dataset (in-domain validation) and larger molecules
of our model, we train the model with varied train-
outside the training domain (out-of-domain validation).
ing dataset size Ntrain , ranging from 10 to 7440 atomic
Benchmark of Model Performance. configurations of hydrocarbon molecules. The testing
Then, we benchmark the performance of the MEHnet root-mean-square error (RMSE, absolute error in atomic
model and display potential applications of the model in units) of different trained properties exhibit a decreasing
5

TABLE I. Benchmark of MEHnet model’s RMSE in predicting different quantum chemical properties on the in-domain (ID)
testing dataset and out-of-domain (OOD) validation dataset with respect to the coupled cluster calculations. The numbers in
the table are ID/OOD RMSD. Other DFT and machine learning methods are compared. We leave some of the spaces blank
when the method does not directly output the quantity for fair comparison.
RMSE (ID/OOD) Hybrid Double Hybrid ML
Unit B3LYP B3PW91 DSD- PWPB95 DM21 AIQM1 MEHnet (ours)
PBEP86
Energy (/atom) kcal/mol 2.20/2.41 2.03/2.73 0.94/1.20 1.64/1.98 0.22/0.11 0.13/0.06 0.11/0.10
Dipole Debye 0.06/0.06 0.06/0.04 0.03/0.03 0.07/0.05 0.04/0.04 – 0.03/0.04
Quadrupole ea20 0.12/0.21 0.32/0.51 0.11/0.18 0.10/0.14 – – 0.03/0.12
Atomic Charge e 0.19/0.20 0.16/0.16 0.04/0.05 0.05/0.05 0.05/0.04 – 0.04/0.03
Bond Order – 0.05/0.03 0.06/0.04 0.04/0.02 0.06/0.03 0.06/0.03 – 0.02/0.02
Bandgap eV 0.59/0.63 0.65/0.54 3.71/3.26 2.19/1.98 1.71/1.47 – 0.26/0.31
Polarizability a.u. 2.22/4.32 2.53/4.72 4.74/8.05 – – – 1.85/3.91

trend when the training dataset size increases, as shown The MEHnet predictions consistently exhibit smaller
in Fig. 2a, indicating effective model generalization. No- RMSE than the hybrid (B3LYP and B3PW91 [40]), dou-
tably, the energy error has the fastest drops with a slope ble hybrid (DSD-PBEP86 [36] and PWPB95 [41]), and
−0.38
of −0.38 (that means the testing error ∝ Ntrain ). In DM21 [12] functionals on most molecular properties (Ta-
comparison, some of the recently developed advanced ML ble I, except electric dipole moment on the OOD dataset,
potentials (that directly learn energies and their deriva- where the DSD-PBEP86 gives the smallest RMSE). Re-
tives, such as the potentials in Ref. [4, 8]) exhibit lower markably, the RMSE of the combination energy predicted
slopes of about -0.25. This implies potential advantage by the MEHnet is about 0.1 kcal/mol (about 4 meV) per
of the multi-task method: as a multi-task method learns atom in both the ID and OOD dataset. Our method
different molecular properties through a shared represen- exhibits similar RMSE of combination energy compared
tation (the electronic structure), the domain information to the AIQM1 ML potential that features energy pre-
learned from one property can help the model’s gener- dictions within chemical accuracy. These results confirm
alization on predicting other properties [35], providing that the MEHnet’s predictions on reaction energies can
improved data efficiency. approach the quantum chemical accuracy (assuming that
Then, we benchmark the computational costs and pre- on average 1 mol molecules in reactants contain ∼10 mol
diction accuracy of our model trained on 7440 atomic atoms). Note that the B3LYP functional (with the def2-
configurations with 70 different molecules, which will be SVP basis set) exhibits large RMSEs for Mulliken charge
used in the rest of this paper. The MEHnet method mainly because of the basis set error [42]. Although using
exhibits smaller computational cost and slower scaling a large basis set for the B3LYP Mulliken charge gives a
with system size, as compared with the hybrid func- much smaller error, the MEHnet model still gives better
tional, double hybrid functional [36], and CCSD(T) overall accuracy (Supplementary Figure 3).
method (Fig. 2b). Compared to the hybrid functional, Besides the ground-state properties, MEHnet also pro-
our method avoids the expensive calculation of the ex- vides excited-state property Eg and linear response prop-
act exchange thus substantially reducing computational erty α with better overall accuracy than other methods
costs [27]. Using the gold-standard CCSD(T) calcu- in comparison (Table I). For intensive quantities (E per
lation as a reference, the prediction accuracy of the atom, C, B, and Eg ), the errors are on a similar level for
MEHnet method on various molecular properties is com- molecules with different sizes; for extensive quantities (p,
pared with that of several popular functionals and exist- Q, α), there is a trend of increasing error with increasing
ing ML methods, as shown in Fig. 2c and Table I. The system size, because the absolute values of these quanti-
comparison is implemented on both the in-domain (ID) ties themselves increase with system size. In addition, the
and out-of-domain (OOD) testing dataset of hydrocar- MEHnet model gives similar prediction accuracy among
bon molecules. Note that although the B3LYP hybrid different classes of hydrocarbons such as alkane, alkene,
functional is widely used, it is known to exhibit certain alkyne, and arene (Supplementary Figure 1), suggesting
failure modes in hydrocarbon molecules [37], so several consistent generalizability in the hydrocarbon chemical
other high-performance hybrid and double hybrid func- space.
tionals [37, 38] with the DFT-D3 correction [39] are in- Aromatic Molecules.
cluded into comparison (Supplementary Section 2). Hydrocarbon molecules have a vast structural space,
6

FIG. 3. Validation of the MEHnet’s predictions on gas phase aromatic hydrocarbon molecules, as compared with experimental
results. (a) Standard enthalpy of formation. The MEHnet predictions and experimental values from Ref. [43] (right axis)
are compared for 11 molecules (see Supplementary Table II for details). The difference between the MEHnet method and
experimental values are shown by the orange line, and the experimental uncertainty is shown by the red line (left axis).
(b) Infrared spectrum of benzene. The experimental data is from the NIST Chemistry WebBook [57]. Vibration modes
corresponding to the peaks are labeled following the convention in Ref. [58].

including various types of local atomic environments. To Besides small molecules, we also apply MEHnet
further examine the model’s generalization capability in to semiconducting polymers consisting of hundreds of
more complex structures, we apply MEHnet model to atoms, which are difficult to calculate by rigorous cor-
a series of aromatic hydrocarbon molecules synthesized related methods such as CCSD(T). The essential elec-
in experiments [43]. The gas phase standard enthalpy tronic properties of semiconducting polymers originates
of formation Hf is an essential thermochemical property from the conjugated π-bonds with delocalized molecu-
of molecules that can be accurately measured in experi- lar orbitals. As the delocalized molecular orbitals ex-
ments. In this regard, we use the MEHnet model to pre- tend through the whole molecule (Fig. 4a), the polymers’
dict Hf of various aromatic molecules in a comprehensive electronic properties also involve long-range correlation,
experimental review paper Ref. [43]. The MEHnet pre- making it challenging for ML methods. Therefore, it
dictions on Hf are well consistent with experiments on all is important to examine whether MEHnet can capture
molecules, and their difference is only around 0.1 ∼ 0.2 semiconducting polymers’ electronic properties involving
kcal/mol per atom, as shown in Fig. 3a. Note that the delocalized molecular orbitals.
MEHnet prediction error is on the same order of magni- Three kinds of semiconducting polymers, trans-
tude as the experimental error bar (though numerically polyacetylene (t-PA), cyclic polyacetylene (c-PA), and
larger), indicating high prediction accuracy. polyphenylene (PPP), are studied using the MEHnet
Besides thermochemical properties, MEHnet can also model. The model correctly captures the delocalized
predict spectral properties, as shown in Fig. 3b and Sup- π-bond feature of frontier orbitals, i.e., highest occu-
plementary Figure 4. Especially, infrared (IR) spec- pied molecular orbital (HOMO) and lowest unoccupied
trum reflects essential information of molecular vibra- molecular orbital (LUMO), as shown in Fig. 4a. Various
tional modes and their interaction with light. In previ- important electronic properties of semiconducting poly-
ous work on ML electronic structure [16], the predicted mers depend on the chain length, including the energy
peak intensity are usually inconsistent with the experi- gap Eg and polarizability α. We calculate such chain-
ment. In comparison, the MEHnet predictions on both length dependence (up to more than 400 atoms) using
the peak positions and intensity agree well with experi- the MEHnet method, as shown in Fig. 4b,c. One can
mental results in several common hydrocarbon molecules, see that Eg is larger for shorter oligomers and converges
and it also provides both the fundamental bands and to a smaller value for long chains. This is in analogy
combination bands known as “benzene fingers” in the IR to the size effect on the energy gap of quantum dots
spectrum. The good consistency of peak intensity is at- and quantum wells. The converged energy gap for long
tributed to accurate predictions on the transition dipole t-PA and PPP polymers calculated by MEHnet are in
moments that determine the intensity of light-matter in- reasonable agreement with the experimental values (rel-
teraction. Details on calculating the IR spectrum is elab- ative errors within 10%) [44, 45], which are shown as
orated in Supplementary Section 3, where we also show squares in Fig. 4b. The longitudinal static electric polar-
the calculated IR spectra for several other molecules. izability αxx (per monomer) is positively related to the
Large-scale Semiconducting Polymers. polymer chain length. This is because in longer chains,
7

TABLE II. RMSE of the QM9 version of MEHnet model on


the testing dataset of 4,000 randomly sampled configurations
in the QM9 dataset. The dash means that the bond order B
has no unit.
Property E/atom p Q C B Eg α
Unit kcal/mol Debye a.u. e – eV a.u.
RMSE 0.07 0.03 0.04 0.03 0.04 0.25 1.19

provide their values as a prediction to be examined by


future work.
QM9 version of MEHnet
Although the presented results in this paper mainly fo-
cus on hydrocarbons, our method is readily applicable to
systems with different elements. To examine the general-
ity of our method to the chemical space beyond hydrocar-
bons, we trained an MEHnet model on 10,000 molecules
randomly sampled from the QM9 dataset [48], a com-
mon quantum chemistry database including molecules
consisted of H, C, N, O, and F atoms. The model is
then tested on 4,000 other molecules randomly sampled
from the QM9 dataset, as shown in Table II and Supple-
mentary Figure 5. The prediction accuracy on the QM9
testing dataset is even better than that in the case of hy-
drocarbons (Table I), suggesting that our method can be
applied to more general cases with various types of ele-
ments. Details about the benchmark of the QM9 version
is elaborated in Supplementary Section 4.

DISCUSSION

The current MEHnet scheme also has several limita-


FIG. 4. MEHnet predictions for the electronic properties of tions: it is not readily applicable to periodic crystals,
semiconducting polymers. (a) Atomic structure and HOMO
open shell molecules, or molecules with strong multiref-
wavefunctions of t-PA, polyphenylene PPP, and c-PA. The
HOMO wavefunctions are visualized by isosurfaces at the level erence character.
of ±0.04 Å−3/2 (positive isosurface colored blue and nega- In principle, our approach can also be generalized to
tive isosurface colored yellow). (b) Energy gap and (c) static extended systems, where the periodic boundary condi-
electric polarizability of t-PA (blue lines), PPP (green lines), tion (PBC) is applied. The band structure and Bloch
and c-PA (orange lines) with different polymer chain length. wavefunction can then be obtained by solving the eigen-
Longitudinal polarizability αxx , horizontal polarizability αyy , value problem for each wave vector ⃗k after a Fourier
and vertical polarizability αzz are shown as solid, dashed, and
transformation from the real space to the reciprocal
dotted lines, respectively. Squares (blue for t-PA and green
for PPP) represent literature values for polymers in experi- space. Although the CCSD(T) method for training data
ments [44, 45] and correlated calculations [46], and blue dots generation does not directly support PBC, one can use
represent literature values for t-PA oligomers from the MP2 CCSD(T) calculations for finite atom clusters (that is,
correlated calculations [47]. a truncated and possibly passivated supercell) to train
the model and subsequently use the model to predict the
properties of extended systems. Alternatively, the train-
more delocalized electron distributions can have larger ing data of extended systems can be generated by high-
displacements under external electric field. The predicted accuracy methods other than CCSD(T), such as double-
αxx for t-PA oligomers and PPP polymers are in perfect hybrid DFT, that allows for PBC. Besides CCSD(T), our
agreement with previous correlated calculations using the scheme can also use other high-level quantum chemistry
high-accuracy MP2 method [46, 47], as shown in Fig. 4c. methods to generate the training labels of molecule prop-
The chain length-dependent Eg and α of cyclic PA, to erties. Quantum chemistry methods can be selected ac-
the best of our knowledge, have not been reported. We cording to the desired accuracy and the character of sys-
8

tems under consideration. trix SIµ,Jν ≡ ⟨ϕI,µ |ϕJ,ν ⟩ in the non-orthogonal atomic-
Note that the results of CCSD(T) calculations may not orbital representation, where Iµ is the row index and Jν
be consistently accurate for all molecules. Some of the is the column index. The S and F matrices are evaluated
polyaromatic hydrocarbons are more multi-reference in by the ORCA quantum chemistry program package [51]
nature (though it is rare in molecules studied in this pa- version 5.0.4 with the fast-to-evaluate BP86 local den-
per, see Supplementary Section 5), so that the CCSD(T) sity functional [52] and the medium-sized cc-pVDZ basis
calculations themselves exhibit larger errors for these set [50]. As the hydrocarbon molecules we study are all
molecules than for other molecules. As all our training close-shell molecules, we use spin-restricted DFT calcu-
and testing data takes CCSD(T) as the ground truth, lations to obtain F. We also assume the NN correction
our model cannot capture the strong multi-reference ef- term Vθ is spin-independent as well. Namely, the spin-up
fects that are not captured by CCSD(T). One possible and spin-down molecular orbitals and energy levels are
way to adapt the workflow to systems with strong mul- the same, and all molecular orbitals are either doubly
tireference nature is by using the multi-reference con- occupied or vacant.
figuration interaction (MRCI) method [49] to generate The total BP86 energy EBP86 equals the molecular or-
the training dataset. It is also possible to include one- Pne /2
bital energy 2 i=1 ϵi (where ϵi is the i-th molecular
particle reduced density matrix as an output descrip- orbital energy level and ne is the number of electrons)
tor of the MEHnet model to better describe electronic plus a many-body energy EMB :
structure with strong multi-reference nature, as demon-
strated in Ref. [16, 33]. The 1-rdm contains full infor-
ne /2
mation of ground-state single-body properties for both X
EBP86 = 2 ϵi + EMB (5)
single-reference and multi-reference systems. Adapting
i=1
the MEHnet method to more comprehensive datasets can
produce a general-purpose electronic structure predictor,
which is left for future work. EMB originates from the double-counting of electron-
electron interaction in the band structure energy and can
be obtained from the output of the ORCA BP86 DFT
METHODS calculation. Then, the Lowdin-symmetrized KS Hamil-
tonian [53] is obtained as
Graph encoding of atomic configuration
EMB
F′ ≡ S−1/2 FS−1/2 + I, (6)
The input layer takes atomic configurations as in- ne
put, including the information of atomic numbers
(Z1 , Z2 , · · · , Zn ) and atomic coordinates (⃗r1 , ⃗r2 , · · · , ⃗rn ) where the last term is an identity shift to account for the
of a n-atom system. A molecular graph is constructed, many-body energy term. In this case, the direct summa-
where atoms are mapped to graph nodes, while bonds be- tion of molecular orbital energies given by F′ equals:
tween atoms (neighboring atoms within a cut-off radius
rcut = 2 Å) are mapped to graph edges. The atomic num- ne /2 ne /2
X

X EMB
bers ZI of input elements are encoded as node features 2 eigi (F ) = 2 eigi (S−1/2 FS−1/2 + I)
ne
xI,in by one-hot embedding. The atomic coordinates i=1 i=1
are encoded as edge h features fIJ,ini ≡ [fc (rIJ ), Ylm (⃗eIJ )],
ne /2  
X
−1/2 −1/2 EMB
where fc (r) ≡ 1 r
cos (π rcut ) + 1 is a smooth cut-off =2 eigi (S FS )+
2
i=1
ne
function reflecting the bond length rIJ ≡ |⃗rI − ⃗rJ |, and
ne /2
Ylm (⃗eIJ ) is the spherical harmonic functions acting on X
−⃗ =2 ϵi + EMB ,
the unit vector ⃗eIJ ≡ |⃗⃗rrII −⃗
rJ
rJ | representing the bond ori- i=1
entation [26]. We include Ylm tensors up to l = 2. The (7)
electron wavefunction is represented using an atomic or- where eigi is a function that returns the i-th lowest eigen-
bital basis set {|ϕI,µ ⟩} [50], where I is the index of atom value of a matrix, and we use the fact that the en-
and µ is the index of atomic orbital basis function. ergy level ϵi is the eigenvalue of the Lowdin-symmetrized
Hamiltonian eigi (S −1/2 F S −1/2 ) [53]. After this transfor-
mation, the KS effective Hamiltonian F′ already includes
BP86 single-body effective Hamiltonian the many-body energy EMB , and the total electronic en-
ergy is just the summation of molecular orbital energies.
Then, a quantum chemistry calculation [51] (the Or- Adding the EMB term does not change the eigenfunction
bital Integrator block in Fig. 1a) is used to evaluate the and relative energy levels, so that all other properties are
single-body effective Hamiltonian FIµ,Jν and overlap ma- not changed by adding the EMB term.
9

Architecture of the convolutional layer to output dimension Irreps(I)⊗2 , where:

In the convolutional layer, the input feature first goes


through a Nspecies × Nspecies linear transformation (the
(
(2 × 0e + 1 × 1o) if I is H
first Self-Interaction block, Nspecies is the number of dif- Irreps(I) = (9)
ferent elements in the system) and an activation layer (3 × 0e + 2 × 1o + 1 × 2e) if I is C
(the first non-Linearity block). All activation layers in
MEHnet are realized by tanh function acting on scalar
features. Then, the input features go through the first- The output dimension corresponds to the irreducible rep-
step convolution: a fully connected tensor product (the resentation of the block diagonal terms of the Hamilto-
first Tensor Product block) of node feature xJ and the nian of the cc-pVDZ basis set. The output is then ar-
spherical Harmonic components of all connected edge fea- ranged into the NI × NI matrix form, VI,out , according
ture fIJ mapping to an irreducible representation ”8x0e to the Wigner-Eckart theorem [22], and symmetrized to
+ 8x1o + 8x2e” (denoted as Irreps1), meaning 8 even obtain Vnode (xI,out ) = λ2V (VI,out + VI,out T
). λV is a con-
scalar, 8 odd vector, and 8 even rank-2 tensor. Weights in stant hyperparameter set as 0.2 for our model. Similarly,
the fully connected tensor product are from a multilayer the off-diagonal term Vedge (fIJ,out ) in Eq. (8) applies a
perceptron (the first MLP block) taking fc (rIJ ) as input. linear layer from input dimension of Irreps2 to output
All MLP blocks in Fig. 1b has a 1×16×16×16×Nw struc- dimension Irreps(I, J) that equals the direct product of
ture and tanh activation function, where Nw is the num- Irreps(I) and Irreps(J). The outputs are then arranged
ber of weights in the tensor product. Then, in the Con- into the NI × NJ matrix and multiplied by λV , giving
catenation block, tensor products from different edges fIJ Vedge (fIJ,out ).
connected to the node I are summed to a new node fea-
ture on I. The new node features then go through a linear In addition, the energy gap correction term G is ob-
transformation (Self-Interaction block) that mapps to Ir- tained from a 8 × 32 × 3 MLP that takes the even scalars
reps1. In all Self-Interaction layers, linear combinations of xI,out as input and output a 3-component scalar array,
are only applied to features with the same tensor order. gI;0,1,2 , with tanh activation. The first component is for
The new node features are added to the original node attention pooling:
features undergoes a linear transformation, complete the
first-step convolution. The second step convolution has
the same architecture. The only difference is that the X egI,0
second Tensor Product block takes input node features GK = P g gI,K , K = 1, 2, (10)
J e
J,0

of Irreps1 and output features of ”8x0e + 8x0o + 8x1e I

+ 8x1o + 8x2e + 8x2o” (denoted as Irreps2), meaning


8 even and odd scalar, vector, and tensor, respectively.
The output of Self-Interaction blocks are also Irreps2. giving the two-component bandgap correction term G.
After another activation function, the node features are The polarizability correction term, the screening matrix
output as xI,out . Another tensor product acting on the T is obtained from the edge features fIJ,out going through
node features of two endpoints of each edge is applied to a Irreps2 to 32×0e+1×2e linear layer, an tanh activation
get the output bond feature fIJ,out , also with a dimension layer, and a 32 × 0e + 1 × 2e to 1 × 0e + 1 × 2e linear
of Irreps2 and weight parameters from the MLP taking layer. The 1 × 0e + 1 × 2e array is then multiplied by a
fc (rIJ ) as input. factor λT (set as 0.01 in our case) and rearranged into
Finally, the output features are used to construct the the symmetric matrix T’s 6 independent components.
correction matrix Vθ . The neural network correction ma-
trix is as follow:
(
θ [Vnode (xI,out )]µ,ν if I = J
VIµ,Jν = 1 1
2 [Vedge (fIJ,out )]µ,ν + 2 [Vedge (fJI,out )]ν,µ if I ̸= J
(8)
where Vnode (xI,out ) is a NI × NI symmetric matrix rear- Evaluating molecular properties
ranged from node features xI,out , while Vedge (fIJ,out ) is a
NI ×NJ matrix obtained from edge features fJI,out . Here
NI , NJ are the numbers of basis functions of the atom Using Heff , the electronic structure is evaluated by
θ eff i
I, J. Note that the output matrices V are Hermitian Schordinger Equation HP c = ϵi ci , and the molecu-
i i −1/2 i
and equivariant under rotation according to the trans- lar orbitals are |ψi ⟩ = I,µ c̃I,µ |ϕI,µ ⟩, c̃ = S c.
formation rule of the basis set {|ϕI,µ ⟩} . Vnode (xI,out ) The ground state properties in Eq. (1) is evaluated from
first apply a linear layer from input dimension of Irreps2 the electronic structure from physics principles, that is,
10

[28, 29]: BP86 functional with cc-pVDZ basis set implemented in


ORCA. The ZPE is corrected by the optimal scaling fac-
ne /2
X tor of 1.0393 according to Ref. [54]. Summing all energy
E MEHnet = ENN + 2 ϵi terms give the inner energy U , and the enthalpy is eval-
i=1
uated as H ≃ U + kB T (kB is the Boltzmann constant),
ne /2
X X where we use the ideal gas law. To obtain the standard
p⃗M EHnet = −2e (c̃iI,µ )∗ c̃iJ,ν ⟨ϕI,µ |⃗rˆ|ϕJ,ν ⟩ enthalpy of formation, we subtract the reference state en-
i=1 Iµ,Jν
thalpy of graphite and hydrogen gas at standard condi-
ne /2
X X tion. The reference enthalpy for each carbon and hydro-
QMEHnet = −2e (c̃iI,µ )∗ c̃iJ,ν ⟨ϕI,µ |⃗rˆ⃗rˆ|ϕJ,ν ⟩ gen atom are determined as -38.04639 a.u. and -0.57550
i=1 Iµ,Jν a.u., respectively, using CCSD(T) calculation with cc-
 
ne /2
XX pVTZ basis set combined with measured standard en-
CIMEHnet = e ZI − 2 (c̃iI,µ )∗ c̃iJ,ν SIµ,Jν  thalpy of formation of atomic carbon, atomic hydrogen,
i=1 Jµν and benzene. Atomic configurations of semiconducting
ne /2 polymers in Fig. 4 are relaxed using the PreFerred Po-
tential v5.0.0 [6, 7].
X X
MEHnet
BIJ =4 (c̃iK,λ )∗ c̃iI,µ SKλ,Jν (c̃jL,σ )∗ c̃jJ,ν SLσ,Iµ
i,j=1 KLµνλσ
(11)
where ENN is the Coulomb repulsion energy between nu- Perturbation theory-based back-propagation
clei, and e and ⃗rˆ are the electron charge and position
operator, respectively. In the MEHnet training, gradient of the loss function
Besides, using the ground state electronic structure, Eg to the model parameters needs to be calculated. Gradi-
can be roughly estimated as ϵne /2+1 − ϵne /2 , the HOMO- ent back-propagation schemes are well-developed for all
LUMO gap. However, in principle, the ground state elec- computation steps except solving the Schrodinger equa-
tronic structure (ϵn , cn ) does not contain the information tion. The gradients are numerically unstable when there
of excited states (once a electron is excited, ϵn and cn are near-degenerate energy levels, which is usually the
undergo relaxation and become different). Therefore, we case in molecules. Here, we first use quantum pertur-
use MEHnet to output two correction terms G1 and G2 . bation theory to obtain the first-order change of energy
Eg is then evaluated as a linear transformation of the levels and molecular orbitals:
HOMO-LUMO gap using G1 and G2 as the coefficients:
δϵi = (ci )† δH eff ci
EgMEHnet = (1 + G1 )(ϵne /2+1 − ϵne /2 ) + G2 (12) X (cp )† δH eff ci (15)
δci = cp
Evaluation of the static electric polarizability is done in ϵi − ϵp
p̸=i
two steps. First, we evaluate the single-particle polariz-
ability α0 using perturbation theory: Then, we have the gradients to model parameters as
Eq. 4. Using these equations, we derive the gradients
NX
basis ne /2
X ⃗rai⃗ria of each molecule properties in Eq. 11 as follow:
α0 = 2e2 (13)
ϵ − ϵi
i=1 a
a=ne /2+1
ne /2
where Nbasis is the number P of basis functions of the
X
∇θ f E = 2 ∇Vii
molecule, and ⃗rai ≡ a ∗ i
rˆ|ϕJ,ν ⟩.
Iµ,Jν I,µ c̃J,ν ⟨ϕI,µ |⃗
(c̃ ) i=1
However, the single-particle approximation used in ne /2 NX
basis
Eq. (13) does not consider the electric screening effect X ∇Vai
∇θ fp⃗ = −4e Re ⟨ψi |⃗rˆ|ψa ⟩
from electron-electron interaction. We use MEHnet to i=1 a=ne /2+1
ϵi − ϵa
output a screening matrix T and evaluate the corrected
ne /2 NX
basis
polarizability α as follow: X ∇Vai
∇θ fQ = −4e Re ⟨ψi |⃗rˆ⃗rˆ|ψa ⟩
ϵi − ϵa (16)
αMEHnet = (I + α0 T)−1 α0 . (14) i=1 a=ne /2+1
ne /2 NX
We evaluate the gas phase standard enthalpy of for- X basis
∇Vai (c̃i )† (II S)c̃a
∇θ fCI = −4e Re
mation of molecules in Fig. 3 using atomic configura-
i=1 a=ne /2+1
ϵi − ϵa
tions relaxed by the BP86 functional with cc-pVDZ basis
ne /2 NX
set. The total energy at the relaxed atomic configura- X basis
∇Vai
tion is then calculated by the MEHnet. The zero-point ∇θ fBIJ = 4 Re
i=1 a=ne /2+1
ϵi − ϵa
energy (ZPE) and thermal vibration, rotation, and trans-
i †
lation energy at T = 298.15 K are also calculated by the × (c̃ ) (SIJ PSII + SII PSIJ )c̃a
11

where ∇Vai ≡ (ca )† (∇θ Vθ )ci , the Nbasis × Nbasis ma- molecule structure to sample an ensemble of atomic con-
trix IJ is identity in the block diagonal part of atom figurations. The MD simulation uses PreFerred Potential
J and zero elsewhere. Meanwhile, we define P ≡ v4.0.0 [10] at a temperature of 2000 K that enables large
Pne /2 i i †
2 i=1 c̃ (c̃ ) . The essential method to avoid numerical bond distortion but does not break the bonds. Initial ve-
instability is to remove terms that can be proved to can- locity is set as Maxwell Boltzmann distribution with the
cel each other. Taking ∇θ fp⃗ as an example: in Eq. (4), same temperature. Langevin NVT dynamics is used with
the summation over p goes through all states except i. the friction factor of 0.001 fs−1 and timestep of 2 fs. The
But as the summed formula in Eq. (16) is antisymmetric TeaNet potential run for 100,000 steps for each chemical
to i and a, the terms that a goes from 1 to ne /2 cancel formula, and one atomic configuration is sampled every
each other. Only terms that a goes from ne /2 + 1 to 200 timesteps in the MD trajectory. 500 configurations
Nbasis have a non-zero contribution to the final gradient. (including the initial equilibrium configuration) are sam-
Therefore, i is always occupied, and a is always unoc- pled for each chemical formula in the training domain.
cupied in the summation. As close-shell molecules have 3/4 of the 10,000 configurations are sampled to form the
a finite bandgap, ϵi and ϵa are not close to each other training dataset, and the left 1/4 forms the in-domain
in any term of the summation, so evaluating Eq. 16 is testing dataset. The out-of-domain testing dataset con-
numerically stable. tains 500 configurations. Note that as we aim to include
Similarly, the gradients of Eg and α are as follow: structures out of equilibrium positions, geometric relax-
  ation is not needed before CCSD(T) calculation (oth-
∇θ fEg = (1 + G1 ) ∇Vne /2+1,ne /2+1 − ∇Vne /2,ne /2 erwise all structures will relax back to the equilibrium
+ (ϵne /2+1 − ϵne /2 )∇θ G1 + ∇θ G2 positions).
(17) Then, a CCSD(T) calculation with the cc-pVTZ basis
To calculate the gradient of α, we first evaluate the gra- set is implemented in ORCA [51] for each selected config-
dient of α0 , and then derive ∇θ fα using the chain rule: uration, giving the training labels of total energy, electric
NXbasis ne /2  dipole and quadrupole moment, Mulliken atomic charge,
2
X ⃗rai⃗ria (∇Vii − ∇Vaa ) and Mayer bond order. An EOM-CCSD calculation [56]
∇θ α0 = 2e Re
(ϵa − ϵi )2 with the cc-pVDZ basis set is then implemented to ob-
a=ne /2+1 i=1


 tain the first excitation energy (energy gap). Finally, we
X ⃗rai ⃗rip ∇Vpa ⃗rpa ∇Vai conduct a polarizability calculation with the CCSD and
−2 +
(ϵa − ϵi ) (ϵp − ϵa ) (ϵp − ϵi )  cc-pVDZ basis set. The overlap matrix S and starting-
p̸=a,i
point effective Hamiltonian F is obtained from a BP86
∇θ fα = (I + α0 T)−1 (∇θ α0 )(I − Tα) DFT calculations with the cc-pVDZ basis set.
− α0 (∇θ T)(I + α0 T)−1 α
(18)
The above equations give gradients of all terms in the loss Model training
function expressed by gradients to the direct outputs of
the MEHnet, ∇θ Vθ , ∇θ G, and ∇θ T. The weight parameters in the loss function is listed
as follow: wV = 0.1, wE = 1, wp⃗ = 0.2, wQ = 0.01,
wC = 0.01, wB = 0.02, wEg = 0.2, wα = 3 × 10−5 . All
Dataset generation quantities are in atomic unit. The model training is im-
plemented by full gradient descend (FGD) with Adam
First, 85 small hydrocarbon molecule structures are optimizer. For the finally deployed model (7,440 train-
collected from the PubChem database [55]. The train- ing data points), it is first trained on 1240 data points
ing domain and generalization domain include 20 and 3 sampled from the whole training dataset for 5000 FGD
different chemical formula shown as the horizontal axis steps with initial learning rate of 0.01. The learning rate
labels of the first 20 and last 3 columns in Fig. 1c, re- is decayed by a constant factor γ1 = 10−1/10 per 500
spectively. Each chemical formula includes up to 5 dif- steps. Then, the model is trained on the whole dataset
ferent molecules (conformers) taken from the PubChem with 7,440 data points for 6,000 FGD steps with a learn-
database. The total number of molecules (conformers) in ing rate of 0.001. For other models trained on smaller
the training domain and out-of-domain testing dataset is dataset in Fig. 2a in the main text, the model is trained
70 and 15, respectively. The full list of molecules and for 3,000 FGD steps with initial learning rate of 0.01
the number of atomic configurations for each molecule decayed by γ2 = 10−1/6 per 500 steps. As the model
are listed in Supplementary Table I. Principles of select- trained on 640 data points do not converge in the 3000-
ing these molecules and their diversity are discussed in step training, we implement a 10,000-step training, with
Supplementary Section 1. an initial learning rate of 0.01 that decays by γ1 every
Molecular dynamics (MD) simulation with TeaNet in- 500 steps in the first 5,000 steps and keeps constant at
teratomic potential [6, 7] is then performed for each the last 5,000 steps.
12

DATA AVAILABILITY [2] Kulik, H. J. et al. Roadmap on machine learning in elec-


tronic structure. Electronic Structure 4, 023004 (2022).
[3] Dral, P. O. Quantum Chemistry in the Age of Machine
Source data for Figures 1c, Figure 2, Figure 3, and
Learning (Elsevier, 2022).
Figure 4 are availabel with this manuscript as Supple- [4] Batzner, S. et al. E (3)-equivariant graph neural net-
mentary Data file. The training and testing dataset are works for data-efficient and accurate interatomic poten-
available with this manuscript through Figshare. tials. Nature communications 13, 2453 (2022).
[5] Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep
potential molecular dynamics: a scalable model with the
CODE AVAILABILITY accuracy of quantum mechanics. Physical review letters
120, 143001 (2018).
[6] Takamoto, S., Izumi, S. & Li, J. Teanet: Universal neu-
The source code to generate the training dataset, ral network interatomic potential inspired by iterative
train the MEHnet model, and apply the trained electronic relaxations. Comput. Mater. Sci. 207, 111280
MEHnet model to hydrocarbon molecules has been (2022).
deposited into a publicly available GitHub repository [7] Takamoto, S., Okanohara, D., Li, Q. & Li, J. Towards
https://github.com/htang113/Multi-task-electronic and universal neural network interatomic potential. J. Mate-
is available as Supplementary Software. The repository riomics 9, 447–454 (2023).
contains two branches: the branch v1.6 is for all results [8] Merchant, A. et al. Scaling deep learning for materials
discovery. Nature 624, 80–85 (2023).
of hydrocarbon molecules in this paper, and the branch [9] Chen, C. & Ong, S. P. A universal graph deep learn-
v2.0 is for the benchmark on the QM9 dataset. ing interatomic potential for the periodic table. Nature
Computational Science 2, 718–728 (2022).
[10] Takamoto, S. et al. Towards universal neural network
ACKNOWLEDGEMENTS potential for material discovery applicable to arbitrary
combination of 45 elements. Nature Communications 13,
This work was supported by Honda Research Insti- 2991 (2022).
tute (HRI-USA). H.T. acknowledges support from the [11] Zheng, P., Zubatyuk, R., Wu, W., Isayev, O. & Dral,
P. O. Artificial intelligence-enhanced quantum chemical
Mathworks Engineering Fellowship. The calculations method with broad applicability. Nature communications
in this work were performed in part on the Matlantis 12, 7022 (2021).
high-speed universal atomistic simulator, the Texas Ad- [12] Kirkpatrick, J. et al. Pushing the frontiers of density
vanced Computing Center (TACC), the MIT Super- functionals by solving the fractional electron problem.
Cloud, and the National Energy Research Scientific Com- Science 374, 1385–1389 (2021).
puting (NERSC). [13] Bogojeski, M., Vogt-Maranto, L., Tuckerman, M. E.,
Müller, K.-R. & Burke, K. Quantum chemical accu-
racy from density functional approximations via machine
AUTHOR CONTRIBUTIONS STATEMENT learning. Nature communications 11, 5223 (2020).
[14] Helgaker, T., Jorgensen, P. & Olsen, J. Molecular
electronic-structure theory (John Wiley & Sons, 2013).
All authors contributed to the discussions of theory [15] Schütt, K. T., Gastegger, M., Tkatchenko, A., Müller,
and results and to writing the manuscript. J. L. de- K.-R. & Maurer, R. J. Unifying machine learning and
signed and guided the project and formulated the re- quantum chemistry with a deep neural network for molec-
search goals; H. T., H. X., B. X., and W. H. designed and ular wavefunctions. Nature communications 10, 5024
(2019).
developed the computational method and code package,
[16] Shao, X., Paetow, L., Tuckerman, M. E. & Pavanello, M.
generated the quantum chemistry dataset, implemented Machine learning electronic structure methods based on
the ML training and applications, and did data analysis the one-electron reduced density matrix. Nature commu-
and visualization. A. H. initiated the theme and formu- nications 14, 6281 (2023).
lated the research goals; Y. W., F. L., and P. S. provided [17] Feng, C., Xi, J., Zhang, Y., Jiang, B. & Zhou, Y. Ac-
important comments for the manuscript. curate and interpretable dipole interaction model-based
machine learning for molecular polarizability. Journal
of Chemical Theory and Computation 19, 1207–1217
COMPETING INTERESTS STATEMENT (2023).
[18] Fan, G., McSloy, A., Aradi, B., Yam, C.-Y. & Frauen-
heim, T. Obtaining electronic properties of molecules
The authors declare no competing interests. through combining density functional tight binding with
machine learning. The Journal of Physical Chemistry
Letters 13, 10132–10139 (2022).
[19] Cignoni, E. et al. Electronic excited states from physi-
cally constrained machine learning. ACS Central Science
[1] Carter, E. A. Challenges in modeling materials proper- (2023).
ties without experimental input. Science 321, 800–803 [20] Dral, P. O. & Barbatti, M. Molecular excited states
(2008). through a machine learning lens. Nature Reviews Chem-
13

istry 5, 388–405 (2021). chemistry, kinetics, and noncovalent interactions. Physi-


[21] Li, H. et al. Deep-learning density functional theory cal Chemistry Chemical Physics 13, 6670–6688 (2011).
hamiltonian for efficient ab initio electronic-structure [39] Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A con-
calculation. Nature Computational Science 2, 367–377 sistent and accurate ab initio parametrization of density
(2022). functional dispersion correction (dft-d) for the 94 ele-
[22] Gong, X. et al. General framework for e (3)-equivariant ments h-pu. The Journal of chemical physics 132 (2010).
neural network representation of density functional the- [40] Becke, A. D. Density-functional thermochemistry. III.
ory hamiltonian. Nature Communications 14, 2848 The role of exact exchange. The Journal of Chemical
(2023). Physics 98, 5648–5652 (1993). URL https://doi.org/
[23] Unke, O. et al. Se (3)-equivariant prediction of molecu- 10.1063/1.464913.
lar wavefunctions and electronic densities. Advances in [41] Goerigk, L. & Grimme, S. Efficient and accurate double-
Neural Information Processing Systems 34, 14434–14447 hybrid-meta-gga density functionals evaluation with the
(2021). extended gmtkn30 database for general main group
[24] Mardirossian, N. & Head-Gordon, M. Thirty years of thermochemistry, kinetics, and noncovalent interactions.
density functional theory in computational chemistry: an Journal of chemical theory and computation 7, 291–309
overview and extensive assessment of 200 density func- (2011).
tionals. Molecular physics 115, 2315–2372 (2017). [42] Jablonnski, M. & Palusiak, M. Basis set and method de-
[25] Bartlett, R. J. & Musial, M. Coupled-cluster theory in pendence in atoms in molecules calculations. The Journal
quantum chemistry. Reviews of Modern Physics 79, 291 of Physical Chemistry A 114, 2240–2244 (2010).
(2007). [43] Slayden, S. W. & Liebman, J. F. The energetics of aro-
[26] Geiger, M. & Smidt, T. e3nn: Euclidean neural networks. matic hydrocarbons: an experimental thermochemical
arXiv preprint arXiv:2207.09453 (2022). perspective. Chemical reviews 101, 1541–1566 (2001).
[27] Tirado-Rives, J. & Jorgensen, W. L. Performance of [44] Heeger, A. J. Nobel lecture: Semiconducting and metallic
b3lyp density functional methods for a large set of or- polymers: The fourth generation of polymeric materials.
ganic molecules. Journal of chemical theory and compu- Reviews of Modern Physics 73, 681 (2001).
tation 4, 297–306 (2008). [45] Grem, G., Leditzky, G., Ullrich, B. & Leising, G. Re-
[28] Mulliken, R. S. Electronic population analysis on lcao– alization of a blue-light-emitting device using poly (p-
mo molecular wave functions. i. The Journal of chemical phenylene). Advanced Materials 4, 36–37 (1992).
physics 23, 1833–1840 (1955). [46] Otto, P., Piris, M., Martinez, A. & Ladik, J. Dynamic
[29] Mayer, I. Bond order and valence indices: A personal (hyper) polarizability calculations for polymers with lin-
account. Journal of computational chemistry 28, 204– ear and cyclic π-conjugated elementary cells. Synthetic
221 (2007). metals 141, 277–280 (2004).
[30] Sun, Q. et al. Recent developments in the PySCF [47] Champagne, B. et al. Assessment of conventional den-
program package. The Journal of Chemical Physics sity functional schemes for computing the polarizabilities
153, 024109 (2020). URL https://doi.org/10.1063/ and hyperpolarizabilities of conjugated oligomers: An ab
5.0006074. initio investigation of polyacetylene chains. The Journal
[31] Hess Jr, B. A., Schaad, L. J., Carsky, P. & Zahradnik, of chemical physics 109, 10489–10498 (1998).
R. Ab initio calculations of vibrational spectra and their [48] Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilien-
use in the identification of unusual molecules. Chemical feld, O. A. Quantum chemistry structures and properties
Reviews 86, 709–730 (1986). of 134 kilo molecules. Scientific data 1, 1–7 (2014).
[32] Maslen, P. E., Handy, N. C., Amos, R. D. & Jayatilaka, [49] Szalay, P. G., Muller, T., Gidofalvi, G., Lischka, H. &
D. Higher analytic derivatives. iv. anharmonic effects in Shepard, R. Multiconfiguration self-consistent field and
the benzene spectrum. The Journal of chemical physics multireference configuration interaction methods and ap-
97, 4233–4254 (1992). plications. Chemical reviews 112, 108–181 (2012).
[33] Hazra, S., Patil, U. & Sanvito, S. Predicting the one- [50] Dunning Jr, T. H. Gaussian basis sets for use in corre-
particle density matrix with machine learning. Journal lated molecular calculations. i. the atoms boron through
of Chemical Theory and Computation (2024). neon and hydrogen. The Journal of chemical physics 90,
[34] Lee, T. J. & Taylor, P. R. A diagnostic for determining 1007–1023 (1989).
the quality of single-reference electron correlation meth- [51] Neese, F., Wennmohs, F., Becker, U. & Riplinger, C. The
ods. International Journal of Quantum Chemistry 36, orca quantum chemistry program package. The Journal
199–207 (1989). of chemical physics 152 (2020).
[35] Caruana, R. Multitask learning. Machine learning 28, [52] Perdew, J. P. Density-functional approximation for the
41–75 (1997). correlation energy of the inhomogeneous electron gas.
[36] Kozuch, S. & Martin, J. M. Spin-component-scaled dou- Physical review B 33, 8822 (1986).
ble hybrids: an extensive search for the best fifth-rung [53] Löwdin, P.-O. On the non-orthogonality problem con-
functionals blending dft and perturbation theory. Jour- nected with the use of atomic wave functions in the the-
nal of Computational Chemistry 34, 2327–2344 (2013). ory of molecules and crystals. The Journal of Chemical
[37] Karton, A. How reliable is dft in predicting relative en- Physics 18, 365–375 (1950).
ergies of polycyclic aromatic hydrocarbon isomers? com- [54] Kesharwani, M. K., Brauer, B. & Martin, J. M. Fre-
parison of functionals from different rungs of jacob’s lad- quency and zero-point vibrational energy scale factors
der. Journal of Computational Chemistry 38, 370–382 for double-hybrid density functionals (and other selected
(2017). methods): can anharmonic force fields be avoided? The
[38] Goerigk, L. & Grimme, S. A thorough benchmark of den- Journal of Physical Chemistry A 119, 1701–1714 (2015).
sity functional methods for general main group thermo-
14

[55] Kim, S. et al. PubChem 2023 update. Nucleic Acids [57] Linstrom, P. & W.G. Mallard, E. NIST Chemistry Web-
Research 51, D1373–D1380 (2022). URL https://doi. Book, NIST Standard Reference Database Number 69,
org/10.1093/nar/gkac956. Ch. Infrared Spectra (National Institute of Standards
[56] Krylov, A. I. Equation-of-motion coupled-cluster meth- and Technology, Gaithersburg MD, 20899, 2024).
ods for open-shell and electronically excited species: The [58] Wilson Jr, E. B. The normal modes and frequencies of
hitchhiker’s guide to fock space. Annu. Rev. Phys. Chem. vibration of the regular plane hexagon model of the ben-
59, 433–462 (2008). zene molecule. Physical Review 45, 706 (1934).

You might also like