ML_electronic
ML_electronic
The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.
Citation: Tang, Hao, Xiao, Brian, He, Wenhao, Subasic, Pero, Harutyunyan, Avetik R. et al. 2024.
"Approaching coupled-cluster accuracy for molecular electronic structures with multi-task
learning (preprint)." Nature Computational Science.
As Published: 10.1038/s43588-024-00747-9
Version: Author's final manuscript: final author's manuscript post peer review, without
publisher's formatting or copy editing
Computational methods for molecular and condensed- properties [19, 20], as well as the electronic band struc-
matter systems play essential roles in physics, chemistry, ture of condensed matter [21, 22]. Most of these meth-
and materials science, which can reveal underlying mech- ods take the density functional theory (DFT) results as
anisms of diverse physical phenomena and accelerate ma- the training data, using neural networks (NNs) to fit the
terials design [1]. Among various types of computational single-configurational representation (either the Kohn-
methods, quantum chemistry calculations of electronic Sham Hamiltonian or molecular orbitals) of the DFT
structure are usually the bottleneck, limiting the compu- calculations [15, 19, 21, 23]. Along with the rapid ad-
tational speed and scalability [2]. In recent years, ma- vances of ML techniques, the NN predictions match the
chine learning (ML) methods have been successfully ap- DFT results increasingly well, approaching the chemical
plied to accelerate molecular dynamics simulations and accuracy [8, 16]. However, as a mean-field level theory,
to improve their accuracy in many application scenar- DFT calculations themselves induce a systematic error,
ios [3]. Particularly, ML inter-atomic potential can pre- which is usually several times larger than the chemical
dict energy and force of molecular systems with sub- accuracy [24], limiting the overall accuracy of the ML
stantially lower computational costs compared to quan- model trained on DFT datasets.
tum chemistry methods [4–7]. Indeed, recent advances In comparison, the correlated wavefunction method
in ”universal” ML potential enable large-scale molecu- CCSD(T) is considered the gold-standard in quantum
lar dynamics simulation with the complexity of realistic chemistry [25]. It provides high accuracy predictions on
physical systems [8–11]. Besides ML inter-atomic poten- various molecular properties. Unfortunately, the compu-
tial, rapid advances also appear in another promising di- tational cost of CCSD(T) calculations has a rather un-
rection, namely the ML density functional, which focuses favorable scaling with system size. Hence, it can only
on further improving the energy prediction towards the handle small molecules with up to hundreds of electrons.
chemical accuracy (1 kcal/mol) [12, 13]. This urges the combination of CCSD(T) and ML meth-
Besides energy and force, other electronic proper- ods, which can potentially have both high accuracy and
ties that explicitly involve the electron degrees of free- low computational cost. However, the above-mentioned
dom are also essential in molecular simulations [14]. In ML methods that directly fit the single-configurational
the past few years, ML methods have also been ex- representation of the DFT calculations cannot be di-
tended to electronic structure of molecules, predicting rectly applied to the CCSD(T) training data. This is
various electronic properties such as electric multipole because CCSD(T) does not provide either Kohn-Sham
moments [15–17], electron population [18], excited states (KS) Hamiltonians or single-body electronic wavefunc-
2
tions due to the many-body quantum entanglement na- for the non-local exchange-correlation effects. Generally,
ture of its representation. the non-local exchange-correlation effects can be cap-
In this work, we develop a unified multi-task ML tured in CCSD(T) calculations. However, as mentioned
method for molecular electronic structures. Instead of before, the computational costs of CCSD(T) methods are
focusing solely on energy, our method also provides ac- formidably high for large system. The essence of our ML
curate predictions for various electronic properties; com- method is to obtain the non-local exchange-correlation
pared with ML models trained on DFT datasets, our effects from a NN, whose computational cost scales only
method learns from CCSD(T)-accuracy training data. linearly with system size.
The method incorporates the E3-equivariant neural net- To obtain the ML correction term, we build a NN
work (E3NN) [4, 26], where vectors and tensors are in- model to predict Vθ . The workflow consists of the input
volved in the message-passing step. For brevity we re- layer, the convolutional layer, and the output layer. The
fer to our method as ”Multi-task Electronic Hamilto- input layer takes atomic configurations as input, encod-
nian network” (MEHnet). Using hydrocarbon organic ing them into the node features xI,in for atom informa-
molecules as a testbed, our method predicts molecu- tion and edge features fIJ,in for bond information (I, J
lar energy within chemical accuracy as compared with are indices of atoms). The E3NN framework is employed
both CCSD(T) calculation and experiments. It also pre- for the convolutional layer (shown in Fig. 1b, see Meth-
dicts various properties, including electric dipole and ods for details) because of its good performance in pre-
quadrupole moments, atomic charge, bond order, energy dicting molecular properties [4]. The convolutional layer
gap, and electric polarizability with better accuracy than outputs xI,out and fIJ,out that encode E3 equivariant fea-
B3LYP, one of the most widely used hybrid DFT func- tures of atoms and bonds as well as their atomic envi-
tionals [27]. Our trained model shows robust general- ronment. Subsequently, the equivariant ML correction
ization capability from small molecules in the training Hamiltonian Vθ is constructed using xI,out and fIJ,out in
dataset (molecular weight < 100) to larger molecules the output layer. The effective electronic structure of a
such as Naphthalene and even semiconducting polymers molecule is obtained by solving the eigenvalue equations
(molecular weight up to several thousands). Systemat- of the total Hamiltonian Heff = F′ + Vθ , giving ϵi , the
ically predicting multiple electronic properties using a i-th energy level, and ci , the corresponding molecular
single model with local DFT computational speed, the orbital represented on atomic orbital basis set.
method provides a high-performance tool for computa- Multiple Learning Tasks.
tional chemistry and a promising framework for ML elec- Our scheme aims to predict multiple observable molec-
tronic structure calculations. ular properties (more than just energy). In order to
achieve reduced computational costs, we do not include
information about the entire electronic Hilbert space as
RESULTS learning targets. Instead, the MEHnet is trained on a
series of molecular properties in order to capture their
Computational Workflow. shared underlying representation, the effective single-
In this section, we briefly describe the theoretical back- body Hamiltonian Heff . The corresponding single-body
ground and model architecture of the MEHnet method. energy levels and molecular orbitals are used to evaluate
Detailed information can be found in the Methods sec- a series of ground state properties Og according to the
tion. Basically, we use a NN to simulate the non-local rules of quantum mechanics:
exchange-correlation interactions of a many-body sys-
tem. Then, a physics-informed approach is used to pre- OgMEHnet = fOg ({ϵi }, {ci }), Og = E, p⃗, Q, CI , BIJ ,
dict multiple properties from the output of a single NN. (1)
Given an input atomic configuration, our goal is to ac- where properties Og goes through the ground state en-
quire an effective single-body Hamiltonian matrix that is ergy (E), electric dipole (⃗ p) and quadrupole (Q) mo-
then used to predict quantum chemical properties from ments, Mulliken atomic charge [28] of each atom CI , and
physics principles, as shown in Fig. 1a. First, a fast-to- Mayer bond order [29] of each pair of atoms BIJ . We
evaluate single-configurational method, such as DFT or also evaluate the energy gap (first excitation energy, Eg )
Hartree-Fock method, is used to obtain a mean-field ef- and static electric polarizability α:
fective Hamiltonian, F′ . Note that F′ is easy and fast to
compute, but its accuracy is relatively low. We will use EgMEHnet = fEg ({ϵi }, {ci }, G),
(2)
F′ as the starting point of our ML model, and the total αMEHnet = fα ({ϵi }, {ci }, T).
effective Hamiltonian of the system Heff = F′ + Vθ is
obtained by adding the ML correction term Vθ . In the In principle, the ground state electronic structure does
current formalism, F′ is obtained from a local DFT cal- not contain the information of the energy gap and electric
culation, and it contains only local exchange-correlation polarizability. Therefore, we use the model-output cor-
contribution, and the correction term Vθ would account rection terms G (energy gap correction) and T (dielectric
3
FIG. 1. Schematic of the MEHnet electronic structure workflow. (a) Computation graph of the MEHnet method that predict
multiple quantum chemical properties from atomic configurations inputs. The computational graph consists of input layer
(green blocks), convolutional layer (blue block), and output layer (orange blocks). (b) Model architecture of the MEHnet with
two layers of graph convolution. The output contains both node feature xI,out and edge feature fIJ,out . (c) Training and
testing dataset generation. Each dot represents molecules with the same chemical formula, and is plotted as a function of
number of electrons and atoms. The blue and orange colors correspond to molecules in the training and generalization domains
respectively. The model is trained with small molecules, and is subsequently tested with large molecules. The dot size reflects
the number of conformers and/or vibrational configurations with the same chemical formula in the dataset (from 1 to 500, in
log scale).
screening matrix) to account for the excited states infor- molecule. The weights wV and wO are hyperparame-
mation and the linear response information, respectively. ters whose values are listed in Methods. The weights are
The function forms of fOg , fEg , and fα are elaborated chosen to balance the training tasks so that the training
in Methods D. Note that these properties are all derived errors of all tasks decrease to satisfactory levels. Mini-
from the underlying electronic structure, so they are in- mizing LTotal requires the back-propagation through the
ternally related. Therefore, multi-task learning meth- diagonalization of Heff (that is , calculating ∂ϵi /∂Heff
ods can utilize these relations to mutually enhance the and ∂ci /∂Heff ), which is numerically unstable with di-
model’s generalization capability. rect numerical differentiation. To overcome this issue,
The goal of our multi-task learning is to predict the we derive customized back-propagation schemes for each
properties listed above with coupled-cluster accuracy. property using perturbation theory in quantum mechan-
Hence, the total loss function LTotal constructed as fol- ics (see Methods for details), giving
lows:
X ∇θ ϵi = (ci )† (∇θ Vθ )ci
LTotal = lV + lO ,
X (cp )† (∇θ Vθ )ci (4)
O∈Og ∪{Eg ,α}
∇ θ ci = cp .
lO = wO × MSEloss(O MEHnet
,O label
), (3) ϵi − ϵp
p̸=i
wV X θ
lV = 2 |VIµ,Jν |2 . When evaluating the gradients of properties in Eqs. (1)
Nbasis
Iµ,Jν
and (2) using the chain rule, terms that analytically can-
Here for each property O, lO is the the mean-square er- cel each other are removed in numerical evaluation, mak-
ror (MSE) loss between OMEHnet and Olabel , the MEHnet ing the scheme numerically stable.
predictions (Eq. (1)(2)) and coupled-cluster labels in the Atomic configurations of molecules in our training
training dataset, respectively. Meanwhile, lV is a reg- dataset are generated by the workflow shown in Fig. 1c.
ularization that penalizes large correction matrix Vθ , Our dataset covers various classes of hydrocarbons (sat-
and Nbasis is the total number of basis functions in the urated, unsaturated, alicyclic, and aromatic) and molec-
4
FIG. 2. Benchmark of the model performance on the testing dataset. (a) Testing root-mean-square errors (RMSE) of different
quantities as a function of training dataset size. (b) Computational costs of different methods plotted against number of
electrons. The computational costs is measured as the calculation time (node hour) on a single Intel Xeon Platinum 8260 CPU
node with 48 cores with sufficient memory for all calculations. The scaling deviates from the theoretical asymptotic scaling N 7
for CCSD(T), because the parallelization efficiency is higher for larger molecules. In principle, the N 7 scaling for CCSD(T)
would appear in the large N limit. (c) Prediction RMSE of the energy (E per atom, reference to separate atoms), electric
dipole moment (⃗ p), electric quadrupole moment (Q), Mulliken atomic charge (C), Mayer bond order (B), energy gap (1st
excitation energy, Eg ), and static electric polarizability (α, a.u. means atomic unit) with respect to the coupled-cluster results.
The MEHnet method is compared with the B3LYP hybrid functional, DSD-PBEP86 double hybrid functional [36], DM21 ML
functional [12], and AIQM1 ML potential [11]. A representative atomic configuration of each chemical formula is plotted for
illustration.
ular structures (linear, branched, and cyclic), contain- systems of practical importance. The following discus-
ing both stable and metastable conformers with diverse sions focus on close-shell hydrocarbon molecules (except
types of carbon-carbon bonds (single, double, triple, for the QM9 version of MEHnet that we will describe
and conjugated π-bond, see Supplementary Section 1, later).
the Supplementary Information contained Ref. [30–34]).
The model’s generalization capability from small
Coupled cluster calculations are implemented for var-
molecules to large molecules is essential for its useful-
ious hydrocarbon molecules. The MEHnet model is
ness on complex systems where coupled cluster calcula-
trained on small-molecules training dataset (training do-
tions cannot be implemented on current computational
main, Fig. 1c). The model is then tested on both small
platforms, due to their formidable computational costs.
molecules in the training domain but outside the train-
To test the generalization capability and data efficiency
ing dataset (in-domain validation) and larger molecules
of our model, we train the model with varied train-
outside the training domain (out-of-domain validation).
ing dataset size Ntrain , ranging from 10 to 7440 atomic
Benchmark of Model Performance. configurations of hydrocarbon molecules. The testing
Then, we benchmark the performance of the MEHnet root-mean-square error (RMSE, absolute error in atomic
model and display potential applications of the model in units) of different trained properties exhibit a decreasing
5
TABLE I. Benchmark of MEHnet model’s RMSE in predicting different quantum chemical properties on the in-domain (ID)
testing dataset and out-of-domain (OOD) validation dataset with respect to the coupled cluster calculations. The numbers in
the table are ID/OOD RMSD. Other DFT and machine learning methods are compared. We leave some of the spaces blank
when the method does not directly output the quantity for fair comparison.
RMSE (ID/OOD) Hybrid Double Hybrid ML
Unit B3LYP B3PW91 DSD- PWPB95 DM21 AIQM1 MEHnet (ours)
PBEP86
Energy (/atom) kcal/mol 2.20/2.41 2.03/2.73 0.94/1.20 1.64/1.98 0.22/0.11 0.13/0.06 0.11/0.10
Dipole Debye 0.06/0.06 0.06/0.04 0.03/0.03 0.07/0.05 0.04/0.04 – 0.03/0.04
Quadrupole ea20 0.12/0.21 0.32/0.51 0.11/0.18 0.10/0.14 – – 0.03/0.12
Atomic Charge e 0.19/0.20 0.16/0.16 0.04/0.05 0.05/0.05 0.05/0.04 – 0.04/0.03
Bond Order – 0.05/0.03 0.06/0.04 0.04/0.02 0.06/0.03 0.06/0.03 – 0.02/0.02
Bandgap eV 0.59/0.63 0.65/0.54 3.71/3.26 2.19/1.98 1.71/1.47 – 0.26/0.31
Polarizability a.u. 2.22/4.32 2.53/4.72 4.74/8.05 – – – 1.85/3.91
trend when the training dataset size increases, as shown The MEHnet predictions consistently exhibit smaller
in Fig. 2a, indicating effective model generalization. No- RMSE than the hybrid (B3LYP and B3PW91 [40]), dou-
tably, the energy error has the fastest drops with a slope ble hybrid (DSD-PBEP86 [36] and PWPB95 [41]), and
−0.38
of −0.38 (that means the testing error ∝ Ntrain ). In DM21 [12] functionals on most molecular properties (Ta-
comparison, some of the recently developed advanced ML ble I, except electric dipole moment on the OOD dataset,
potentials (that directly learn energies and their deriva- where the DSD-PBEP86 gives the smallest RMSE). Re-
tives, such as the potentials in Ref. [4, 8]) exhibit lower markably, the RMSE of the combination energy predicted
slopes of about -0.25. This implies potential advantage by the MEHnet is about 0.1 kcal/mol (about 4 meV) per
of the multi-task method: as a multi-task method learns atom in both the ID and OOD dataset. Our method
different molecular properties through a shared represen- exhibits similar RMSE of combination energy compared
tation (the electronic structure), the domain information to the AIQM1 ML potential that features energy pre-
learned from one property can help the model’s gener- dictions within chemical accuracy. These results confirm
alization on predicting other properties [35], providing that the MEHnet’s predictions on reaction energies can
improved data efficiency. approach the quantum chemical accuracy (assuming that
Then, we benchmark the computational costs and pre- on average 1 mol molecules in reactants contain ∼10 mol
diction accuracy of our model trained on 7440 atomic atoms). Note that the B3LYP functional (with the def2-
configurations with 70 different molecules, which will be SVP basis set) exhibits large RMSEs for Mulliken charge
used in the rest of this paper. The MEHnet method mainly because of the basis set error [42]. Although using
exhibits smaller computational cost and slower scaling a large basis set for the B3LYP Mulliken charge gives a
with system size, as compared with the hybrid func- much smaller error, the MEHnet model still gives better
tional, double hybrid functional [36], and CCSD(T) overall accuracy (Supplementary Figure 3).
method (Fig. 2b). Compared to the hybrid functional, Besides the ground-state properties, MEHnet also pro-
our method avoids the expensive calculation of the ex- vides excited-state property Eg and linear response prop-
act exchange thus substantially reducing computational erty α with better overall accuracy than other methods
costs [27]. Using the gold-standard CCSD(T) calcu- in comparison (Table I). For intensive quantities (E per
lation as a reference, the prediction accuracy of the atom, C, B, and Eg ), the errors are on a similar level for
MEHnet method on various molecular properties is com- molecules with different sizes; for extensive quantities (p,
pared with that of several popular functionals and exist- Q, α), there is a trend of increasing error with increasing
ing ML methods, as shown in Fig. 2c and Table I. The system size, because the absolute values of these quanti-
comparison is implemented on both the in-domain (ID) ties themselves increase with system size. In addition, the
and out-of-domain (OOD) testing dataset of hydrocar- MEHnet model gives similar prediction accuracy among
bon molecules. Note that although the B3LYP hybrid different classes of hydrocarbons such as alkane, alkene,
functional is widely used, it is known to exhibit certain alkyne, and arene (Supplementary Figure 1), suggesting
failure modes in hydrocarbon molecules [37], so several consistent generalizability in the hydrocarbon chemical
other high-performance hybrid and double hybrid func- space.
tionals [37, 38] with the DFT-D3 correction [39] are in- Aromatic Molecules.
cluded into comparison (Supplementary Section 2). Hydrocarbon molecules have a vast structural space,
6
FIG. 3. Validation of the MEHnet’s predictions on gas phase aromatic hydrocarbon molecules, as compared with experimental
results. (a) Standard enthalpy of formation. The MEHnet predictions and experimental values from Ref. [43] (right axis)
are compared for 11 molecules (see Supplementary Table II for details). The difference between the MEHnet method and
experimental values are shown by the orange line, and the experimental uncertainty is shown by the red line (left axis).
(b) Infrared spectrum of benzene. The experimental data is from the NIST Chemistry WebBook [57]. Vibration modes
corresponding to the peaks are labeled following the convention in Ref. [58].
including various types of local atomic environments. To Besides small molecules, we also apply MEHnet
further examine the model’s generalization capability in to semiconducting polymers consisting of hundreds of
more complex structures, we apply MEHnet model to atoms, which are difficult to calculate by rigorous cor-
a series of aromatic hydrocarbon molecules synthesized related methods such as CCSD(T). The essential elec-
in experiments [43]. The gas phase standard enthalpy tronic properties of semiconducting polymers originates
of formation Hf is an essential thermochemical property from the conjugated π-bonds with delocalized molecu-
of molecules that can be accurately measured in experi- lar orbitals. As the delocalized molecular orbitals ex-
ments. In this regard, we use the MEHnet model to pre- tend through the whole molecule (Fig. 4a), the polymers’
dict Hf of various aromatic molecules in a comprehensive electronic properties also involve long-range correlation,
experimental review paper Ref. [43]. The MEHnet pre- making it challenging for ML methods. Therefore, it
dictions on Hf are well consistent with experiments on all is important to examine whether MEHnet can capture
molecules, and their difference is only around 0.1 ∼ 0.2 semiconducting polymers’ electronic properties involving
kcal/mol per atom, as shown in Fig. 3a. Note that the delocalized molecular orbitals.
MEHnet prediction error is on the same order of magni- Three kinds of semiconducting polymers, trans-
tude as the experimental error bar (though numerically polyacetylene (t-PA), cyclic polyacetylene (c-PA), and
larger), indicating high prediction accuracy. polyphenylene (PPP), are studied using the MEHnet
Besides thermochemical properties, MEHnet can also model. The model correctly captures the delocalized
predict spectral properties, as shown in Fig. 3b and Sup- π-bond feature of frontier orbitals, i.e., highest occu-
plementary Figure 4. Especially, infrared (IR) spec- pied molecular orbital (HOMO) and lowest unoccupied
trum reflects essential information of molecular vibra- molecular orbital (LUMO), as shown in Fig. 4a. Various
tional modes and their interaction with light. In previ- important electronic properties of semiconducting poly-
ous work on ML electronic structure [16], the predicted mers depend on the chain length, including the energy
peak intensity are usually inconsistent with the experi- gap Eg and polarizability α. We calculate such chain-
ment. In comparison, the MEHnet predictions on both length dependence (up to more than 400 atoms) using
the peak positions and intensity agree well with experi- the MEHnet method, as shown in Fig. 4b,c. One can
mental results in several common hydrocarbon molecules, see that Eg is larger for shorter oligomers and converges
and it also provides both the fundamental bands and to a smaller value for long chains. This is in analogy
combination bands known as “benzene fingers” in the IR to the size effect on the energy gap of quantum dots
spectrum. The good consistency of peak intensity is at- and quantum wells. The converged energy gap for long
tributed to accurate predictions on the transition dipole t-PA and PPP polymers calculated by MEHnet are in
moments that determine the intensity of light-matter in- reasonable agreement with the experimental values (rel-
teraction. Details on calculating the IR spectrum is elab- ative errors within 10%) [44, 45], which are shown as
orated in Supplementary Section 3, where we also show squares in Fig. 4b. The longitudinal static electric polar-
the calculated IR spectra for several other molecules. izability αxx (per monomer) is positively related to the
Large-scale Semiconducting Polymers. polymer chain length. This is because in longer chains,
7
DISCUSSION
tems under consideration. trix SIµ,Jν ≡ ⟨ϕI,µ |ϕJ,ν ⟩ in the non-orthogonal atomic-
Note that the results of CCSD(T) calculations may not orbital representation, where Iµ is the row index and Jν
be consistently accurate for all molecules. Some of the is the column index. The S and F matrices are evaluated
polyaromatic hydrocarbons are more multi-reference in by the ORCA quantum chemistry program package [51]
nature (though it is rare in molecules studied in this pa- version 5.0.4 with the fast-to-evaluate BP86 local den-
per, see Supplementary Section 5), so that the CCSD(T) sity functional [52] and the medium-sized cc-pVDZ basis
calculations themselves exhibit larger errors for these set [50]. As the hydrocarbon molecules we study are all
molecules than for other molecules. As all our training close-shell molecules, we use spin-restricted DFT calcu-
and testing data takes CCSD(T) as the ground truth, lations to obtain F. We also assume the NN correction
our model cannot capture the strong multi-reference ef- term Vθ is spin-independent as well. Namely, the spin-up
fects that are not captured by CCSD(T). One possible and spin-down molecular orbitals and energy levels are
way to adapt the workflow to systems with strong mul- the same, and all molecular orbitals are either doubly
tireference nature is by using the multi-reference con- occupied or vacant.
figuration interaction (MRCI) method [49] to generate The total BP86 energy EBP86 equals the molecular or-
the training dataset. It is also possible to include one- Pne /2
bital energy 2 i=1 ϵi (where ϵi is the i-th molecular
particle reduced density matrix as an output descrip- orbital energy level and ne is the number of electrons)
tor of the MEHnet model to better describe electronic plus a many-body energy EMB :
structure with strong multi-reference nature, as demon-
strated in Ref. [16, 33]. The 1-rdm contains full infor-
ne /2
mation of ground-state single-body properties for both X
EBP86 = 2 ϵi + EMB (5)
single-reference and multi-reference systems. Adapting
i=1
the MEHnet method to more comprehensive datasets can
produce a general-purpose electronic structure predictor,
which is left for future work. EMB originates from the double-counting of electron-
electron interaction in the band structure energy and can
be obtained from the output of the ORCA BP86 DFT
METHODS calculation. Then, the Lowdin-symmetrized KS Hamil-
tonian [53] is obtained as
Graph encoding of atomic configuration
EMB
F′ ≡ S−1/2 FS−1/2 + I, (6)
The input layer takes atomic configurations as in- ne
put, including the information of atomic numbers
(Z1 , Z2 , · · · , Zn ) and atomic coordinates (⃗r1 , ⃗r2 , · · · , ⃗rn ) where the last term is an identity shift to account for the
of a n-atom system. A molecular graph is constructed, many-body energy term. In this case, the direct summa-
where atoms are mapped to graph nodes, while bonds be- tion of molecular orbital energies given by F′ equals:
tween atoms (neighboring atoms within a cut-off radius
rcut = 2 Å) are mapped to graph edges. The atomic num- ne /2 ne /2
X
′
X EMB
bers ZI of input elements are encoded as node features 2 eigi (F ) = 2 eigi (S−1/2 FS−1/2 + I)
ne
xI,in by one-hot embedding. The atomic coordinates i=1 i=1
are encoded as edge h features fIJ,ini ≡ [fc (rIJ ), Ylm (⃗eIJ )],
ne /2
X
−1/2 −1/2 EMB
where fc (r) ≡ 1 r
cos (π rcut ) + 1 is a smooth cut-off =2 eigi (S FS )+
2
i=1
ne
function reflecting the bond length rIJ ≡ |⃗rI − ⃗rJ |, and
ne /2
Ylm (⃗eIJ ) is the spherical harmonic functions acting on X
−⃗ =2 ϵi + EMB ,
the unit vector ⃗eIJ ≡ |⃗⃗rrII −⃗
rJ
rJ | representing the bond ori- i=1
entation [26]. We include Ylm tensors up to l = 2. The (7)
electron wavefunction is represented using an atomic or- where eigi is a function that returns the i-th lowest eigen-
bital basis set {|ϕI,µ ⟩} [50], where I is the index of atom value of a matrix, and we use the fact that the en-
and µ is the index of atomic orbital basis function. ergy level ϵi is the eigenvalue of the Lowdin-symmetrized
Hamiltonian eigi (S −1/2 F S −1/2 ) [53]. After this transfor-
mation, the KS effective Hamiltonian F′ already includes
BP86 single-body effective Hamiltonian the many-body energy EMB , and the total electronic en-
ergy is just the summation of molecular orbital energies.
Then, a quantum chemistry calculation [51] (the Or- Adding the EMB term does not change the eigenfunction
bital Integrator block in Fig. 1a) is used to evaluate the and relative energy levels, so that all other properties are
single-body effective Hamiltonian FIµ,Jν and overlap ma- not changed by adding the EMB term.
9
where ∇Vai ≡ (ca )† (∇θ Vθ )ci , the Nbasis × Nbasis ma- molecule structure to sample an ensemble of atomic con-
trix IJ is identity in the block diagonal part of atom figurations. The MD simulation uses PreFerred Potential
J and zero elsewhere. Meanwhile, we define P ≡ v4.0.0 [10] at a temperature of 2000 K that enables large
Pne /2 i i †
2 i=1 c̃ (c̃ ) . The essential method to avoid numerical bond distortion but does not break the bonds. Initial ve-
instability is to remove terms that can be proved to can- locity is set as Maxwell Boltzmann distribution with the
cel each other. Taking ∇θ fp⃗ as an example: in Eq. (4), same temperature. Langevin NVT dynamics is used with
the summation over p goes through all states except i. the friction factor of 0.001 fs−1 and timestep of 2 fs. The
But as the summed formula in Eq. (16) is antisymmetric TeaNet potential run for 100,000 steps for each chemical
to i and a, the terms that a goes from 1 to ne /2 cancel formula, and one atomic configuration is sampled every
each other. Only terms that a goes from ne /2 + 1 to 200 timesteps in the MD trajectory. 500 configurations
Nbasis have a non-zero contribution to the final gradient. (including the initial equilibrium configuration) are sam-
Therefore, i is always occupied, and a is always unoc- pled for each chemical formula in the training domain.
cupied in the summation. As close-shell molecules have 3/4 of the 10,000 configurations are sampled to form the
a finite bandgap, ϵi and ϵa are not close to each other training dataset, and the left 1/4 forms the in-domain
in any term of the summation, so evaluating Eq. 16 is testing dataset. The out-of-domain testing dataset con-
numerically stable. tains 500 configurations. Note that as we aim to include
Similarly, the gradients of Eg and α are as follow: structures out of equilibrium positions, geometric relax-
ation is not needed before CCSD(T) calculation (oth-
∇θ fEg = (1 + G1 ) ∇Vne /2+1,ne /2+1 − ∇Vne /2,ne /2 erwise all structures will relax back to the equilibrium
+ (ϵne /2+1 − ϵne /2 )∇θ G1 + ∇θ G2 positions).
(17) Then, a CCSD(T) calculation with the cc-pVTZ basis
To calculate the gradient of α, we first evaluate the gra- set is implemented in ORCA [51] for each selected config-
dient of α0 , and then derive ∇θ fα using the chain rule: uration, giving the training labels of total energy, electric
NXbasis ne /2 dipole and quadrupole moment, Mulliken atomic charge,
2
X ⃗rai⃗ria (∇Vii − ∇Vaa ) and Mayer bond order. An EOM-CCSD calculation [56]
∇θ α0 = 2e Re
(ϵa − ϵi )2 with the cc-pVDZ basis set is then implemented to ob-
a=ne /2+1 i=1
tain the first excitation energy (energy gap). Finally, we
X ⃗rai ⃗rip ∇Vpa ⃗rpa ∇Vai conduct a polarizability calculation with the CCSD and
−2 +
(ϵa − ϵi ) (ϵp − ϵa ) (ϵp − ϵi ) cc-pVDZ basis set. The overlap matrix S and starting-
p̸=a,i
point effective Hamiltonian F is obtained from a BP86
∇θ fα = (I + α0 T)−1 (∇θ α0 )(I − Tα) DFT calculations with the cc-pVDZ basis set.
− α0 (∇θ T)(I + α0 T)−1 α
(18)
The above equations give gradients of all terms in the loss Model training
function expressed by gradients to the direct outputs of
the MEHnet, ∇θ Vθ , ∇θ G, and ∇θ T. The weight parameters in the loss function is listed
as follow: wV = 0.1, wE = 1, wp⃗ = 0.2, wQ = 0.01,
wC = 0.01, wB = 0.02, wEg = 0.2, wα = 3 × 10−5 . All
Dataset generation quantities are in atomic unit. The model training is im-
plemented by full gradient descend (FGD) with Adam
First, 85 small hydrocarbon molecule structures are optimizer. For the finally deployed model (7,440 train-
collected from the PubChem database [55]. The train- ing data points), it is first trained on 1240 data points
ing domain and generalization domain include 20 and 3 sampled from the whole training dataset for 5000 FGD
different chemical formula shown as the horizontal axis steps with initial learning rate of 0.01. The learning rate
labels of the first 20 and last 3 columns in Fig. 1c, re- is decayed by a constant factor γ1 = 10−1/10 per 500
spectively. Each chemical formula includes up to 5 dif- steps. Then, the model is trained on the whole dataset
ferent molecules (conformers) taken from the PubChem with 7,440 data points for 6,000 FGD steps with a learn-
database. The total number of molecules (conformers) in ing rate of 0.001. For other models trained on smaller
the training domain and out-of-domain testing dataset is dataset in Fig. 2a in the main text, the model is trained
70 and 15, respectively. The full list of molecules and for 3,000 FGD steps with initial learning rate of 0.01
the number of atomic configurations for each molecule decayed by γ2 = 10−1/6 per 500 steps. As the model
are listed in Supplementary Table I. Principles of select- trained on 640 data points do not converge in the 3000-
ing these molecules and their diversity are discussed in step training, we implement a 10,000-step training, with
Supplementary Section 1. an initial learning rate of 0.01 that decays by γ1 every
Molecular dynamics (MD) simulation with TeaNet in- 500 steps in the first 5,000 steps and keeps constant at
teratomic potential [6, 7] is then performed for each the last 5,000 steps.
12
[55] Kim, S. et al. PubChem 2023 update. Nucleic Acids [57] Linstrom, P. & W.G. Mallard, E. NIST Chemistry Web-
Research 51, D1373–D1380 (2022). URL https://doi. Book, NIST Standard Reference Database Number 69,
org/10.1093/nar/gkac956. Ch. Infrared Spectra (National Institute of Standards
[56] Krylov, A. I. Equation-of-motion coupled-cluster meth- and Technology, Gaithersburg MD, 20899, 2024).
ods for open-shell and electronically excited species: The [58] Wilson Jr, E. B. The normal modes and frequencies of
hitchhiker’s guide to fock space. Annu. Rev. Phys. Chem. vibration of the regular plane hexagon model of the ben-
59, 433–462 (2008). zene molecule. Physical Review 45, 706 (1934).