Quantum Hamiltonian-Based Models & The Variational Quantum Thermalizer Algorithm
quantum processors. In addition, we introduce the Variational Quantum Thermalizer (VQT) for gen-
erating the thermal state of a given Hamiltonian and target temperature, a task for which QHBMs
are naturally well-suited. The VQT can be seen as a generalization of the Variational Quantum
Eigensolver (VQE) to thermal states: we show that the VQT converges to the VQE in the zero
temperature limit. We provide numerical results demonstrating the efficacy of these techniques in
illustrative examples. We use QHBMs and the VQT on Heisenberg spin systems, we apply QHBMs
to learn entanglement Hamiltonians and compression codes in simulated free Bosonic systems, and
finally we use the VQT to prepare thermal Fermionic Gaussian states for quantum simulation.
We will leverage this in our training of the QHBM, where efficient computation or estimation of the latent-state entropy will give us knowledge of the visible-state entropy.

B. Structure of the Latent Space

In this section we discuss the form of the latent model ρ̂_θ. We consider a good choice of a latent space

where Z_{θ_j} = tr[e^{−K̂_j(θ_j)}] is the jth subsystem partition function, and K̂_j(θ_j) the jth subsystem modular Hamiltonian. This sum decomposition becomes useful when estimating expectation values of the modular Hamiltonian, as it becomes a sum of expectation values of the subsystems' modular Hamiltonians,

⟨K̂_θ⟩ = ∑_j ⟨K̂_j(θ_j)⟩.    (7)
From the above expression, we see that the partition function is a product of the subsystem partition functions, Z_θ = ∏_j Z_{θ_j}, and hence the logarithm of the partition function becomes a sum:

log(Z_θ) = ∑_{j=1}^{N} log(Z_{θ_j}).    (8)

Furthermore, the entropy of the latent state (and hence of the visible state, via equation (4)) becomes additive over the entropies of the subsystems:

S(ρ̂_θ) = ∑_{j=1}^{N} S(ρ̂_j(θ_j)).    (9)

This is convenient, as estimating N entropies of states in a d-dimensional Hilbert space is generally much simpler than computing the entropy of a state in a d^N-dimensional space.

Another feature of a factorized state is that the number of parameters used to describe such a distribution is linear in the number of subsystems N. The precise number of parameters depends on the structure of the states within each subsystem. There are many possibilities, and we present some concrete examples below.

Finally, by learning a completely decorrelated (in terms of both entanglement and classical correlations) representation in latent space, we are effectively learning a representation which has a natural orthogonal basis for its latent space, allowing for latent-space interpolation, independent component analysis, compression-code learning, principal component analysis, and many other spin-off applications. In classical machine learning, this is known as a disentangled representation [51], and there have been several recent works addressing this machine learning task [52].

general N-outcome multinoulli [29] distribution. The total number of parameters for such a latent-space parameterization with factorized subsystems scales as the sum of the subsystem dimensions, ∑_{j=1}^{N} d_j, which is much smaller than the most general (non-factorized) latent distribution, whose number of parameters scales as the product ∏_{j=1}^{N} d_j, which is the total dimension of the system's Hilbert space.

The second type of modular Hamiltonian we use, in Section IV B, is the number operator of a continuous-variable quantum mode (qumode), or harmonic oscillator,

K̂_j(θ_j) = (θ_j/2)(x̂² + p̂²) = θ_j (â_j† â_j + 1/2).    (11)

The exponential distribution of such modular Hamiltonians then becomes a single-mode thermal state [55], which has a Wigner phase-space representation as a symmetric Gaussian. The single parameter per mode, θ_j, modulates the variance of the Gaussian. This single-mode thermal state is the closest thing we have to a latent product of Gaussians, which is the standard choice of latent distribution in several variational inference models in classical machine learning, including variational autoencoders [30]. Gaussian states of such quantum modes are very natural to prepare on continuous-variable quantum computers [56], and are also emulatable on digital quantum computers [13].

Finally, as will be explored in Section IV C, our third type of latent subsystem modular Hamiltonian is the particle number operator for Fermions, K̂_j(θ_j) = θ_j ĉ_j† ĉ_j.
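To make the bookkeeping of Eqs. (8) and (9) concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of a factorized latent distribution built from single-parameter subsystem modular Hamiltonians of the form K̂_j(θ_j) = θ_j n̂_j; the subsystem dimensions and parameter values are arbitrary choices for the example.

```python
import numpy as np

def subsystem_quantities(theta_j, dim_j):
    """Spectrum-level quantities for one subsystem with K_j = theta_j * diag(0, ..., dim_j - 1)."""
    energies = theta_j * np.arange(dim_j)        # eigenvalues of the subsystem modular Hamiltonian
    z_j = np.sum(np.exp(-energies))              # subsystem partition function Z_{theta_j}
    probs = np.exp(-energies) / z_j              # exponential (thermal) distribution of K_j
    s_j = -np.sum(probs * np.log(probs))         # subsystem entropy S(rho_j)
    return np.log(z_j), s_j

# A factorized latent space over three subsystems (dimensions chosen arbitrarily).
thetas = [0.7, 1.3, 0.4]
dims = [2, 3, 2]

# Eq. (8): log Z is the sum of subsystem log-partition functions.
log_z_total = sum(subsystem_quantities(t, d)[0] for t, d in zip(thetas, dims))
# Eq. (9): the latent (and hence visible) entropy is the sum of subsystem entropies.
entropy_total = sum(subsystem_quantities(t, d)[1] for t, d in zip(thetas, dims))

print(log_z_total, entropy_total)
```

The point of the factorization is visible in the code: each quantity is a sum of O(d_j)-sized computations rather than a computation over the exponentially large global Hilbert space.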
3. General Classical Latent Space Models

In the general case, the latent state takes the form ρ̂_θ = ∑_{x∈Ω} p_θ(x)|x⟩⟨x|, where the summation is formal and runs over the index set Ω of some basis in Hilbert space, which we call the computational basis. Here the probability distribution p_θ(x) is generally a parameterized class of variational probability distributions over the computational basis domain Ω. Since a general categorical (multinoulli) distribution would need a number of parameters which scales as |Ω|, the dimension of the global Hilbert space, which is generally exponentially large (e.g., |Ω| = 2^N for N qubits), one needs a more efficient parameterization of the latent distribution. This is where classical algorithms for probabilistic and variational inference come into play.

The classical computer is thus tasked with the variational learning of this classical distribution. There are a plethora of techniques for variational inference to choose from, as were listed above. The key features we need from such classical algorithms are the ability to estimate the log-likelihood of the model distribution, the gradients of the (log) partition function, and the entropy. These quantities become useful in various generative learning tasks; see Section III for more details.

The most natural fit for our needs are modern variants of classical energy-based models (EBMs) [49]. In EBMs, a neural network E_θ : Ω → ℝ parameterizes the energy map from the computational basis to a real value, i.e., for any value x ∈ Ω it can produce the corresponding energy E_θ(x). Furthermore, due to the easy differentiability of neural networks, one can readily compute gradients of the energy function, ∇_θ E_θ(x) and ∇_x E_θ(x). To leverage these gradients, one uses Langevin dynamics or Hamiltonian Monte Carlo to generate samples x ∼ p_θ(x) according to the exponential distribution of the energy function,

p_θ(x) = (1/Z_θ) e^{−E_θ(x)},    (13)

where Z_θ ≡ ∑_{x∈Ω} e^{−E_θ(x)} is the partition function. As a straightforward feature of this approach, given an ensemble of points sampled from a different distribution x ∼ q(x), one can use the neural network to evaluate expectation values of the energy with respect to this distribution, E_{x∼q(x)}[E_θ(x)]. Additionally, this approach can provide an estimate and/or gradients of the entropy of the distribution, S(p_θ) = −∑_{x∈Ω} p_θ(x) log p_θ(x).

Given the abilities of the classical model above, one can use the classical algorithm to generate the latent mixed state ρ̂(θ) = ∑_{x∈Ω} p_θ(x)|x⟩⟨x| by sampling x ∼ p_θ and preparing the computational basis state |x⟩⟨x| after each sample. Furthermore, we can define the modular Hamiltonian for this latent variational model as

K̂_θ = ∑_{x∈Ω} E_θ(x)|x⟩⟨x|.    (14)

Thus one can estimate expectation values of this operator by feeding standard-basis measurement results through the neural network function E_θ(x) and averaging. We have now outlined all the ingredients needed for inference and training of this type of hybrid QHBM.

Generally, given a target mixed state σ̂_D which we want to generatively model, we know that there exists a diagonal representation of this mixed state:

∀ σ̂_D, ∃ Ŵ_D :  Ŵ_D† D̂_D Ŵ_D = σ̂_D,   D̂_D = ∑_x λ_x |x⟩⟨x|,    (15)

where tr(D̂_D) = 1 and Ŵ_D is a unitary operator. Thus the approach outlined above has, in principle, the capacity to represent any mixed state. The challenge remains to pick a proper prior for both the unitary quantum neural network Û(φ) and for the parameterization of the classical latent distribution p_θ(x) (or equivalently, the latent energy function E_θ(x)). A good ansatz is one which uses knowledge about the physics of the system to form a prior for both the latent space and the unitary transformation. Although it may be tempting to use a general multinoulli distribution for the latent distribution combined with a universal random quantum neural network for the unitary ansatz, the model capacity in this case is far too large. One encounters not only exponential overhead for parameter estimation of the high-dimensional multinoulli distribution, but also the quantum version of the no-free-lunch theorem [4]. In Section IV, we see several scenarios where the structure of the latent space and the unitary transformation are well-adapted to the complexity of the situation at hand.

III. QUANTUM GENERATIVE MODELLING OF MIXED & THERMAL STATES

In this section, we introduce two types of generative tasks for quantum Hamiltonian-based models which are also valid for more general quantum-probabilistic models. The first task is quantum modular Hamiltonian learning (QMHL), where one effectively learns the logarithm (modular Hamiltonian) of an unknown data mixed state. Once trained, one is then able to reproduce copies of this unknown mixed state via an exponential (thermal) distribution of the learned modular Hamiltonian. The second task is dual to the first: given a known Hamiltonian, learn to produce copies of the thermal state of this Hamiltonian. While both tasks can be considered part of the broad class of generative modelling, the first task is a quantum machine learning task, while the second may be considered a quantum simulation task.

As we have already detailed how to construct several variants of the QHBM with a variety of latent space structures in Section II — including details on how to compute expectations of the modular Hamiltonian, generate samples of the latent state, and compute quantities such as the partition function and entropy of the model — we will focus on the general framework of QMHL and the VQT without too many specifics about how to compute the various quantities involved. For more details on gradients of the loss functions introduced here, see Appendix B.
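As an illustration of the last point, namely estimating ⟨K̂_θ⟩ by pushing standard-basis measurement outcomes through the energy function and averaging, here is a small self-contained sketch (our own example; the two-layer energy network and the random bitstrings are placeholders, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder energy function E_theta: a tiny two-layer network acting on n-bit strings.
n_bits, hidden = 4, 8
w1 = rng.normal(size=(n_bits, hidden))
w2 = rng.normal(size=(hidden,))

def energy(bitstring):
    """E_theta(x) for a single measurement outcome x (a 0/1 array of length n_bits)."""
    h = np.tanh(bitstring @ w1)
    return float(h @ w2)

# Standard-basis measurement outcomes, e.g. read back from the quantum device after
# applying U^dagger(phi) to copies of the data state (placeholder random samples here).
measurements = rng.integers(0, 2, size=(1000, n_bits))

# Estimate of <K_theta>: average the energies of the observed bitstrings, per Eq. (14).
modular_expectation = np.mean([energy(x) for x in measurements])
print(modular_expectation)
```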
A. Quantum Modular Hamiltonian Learning of Mixed Quantum States

Analogous to the classical case, quantum generative models seek to learn to replicate the statistics and correlation structure of a given quantum data distribution from which we have a finite number of samples. QHBMs make this learning process feasible by positing a form that inherently separates the tasks of learning the classical and the quantum correlations in the learned representation of a quantum distribution.

Typical classical datasets are a set D of samples drawn from some underlying distribution, d ∼ p_D. The sampled dataset distribution is simply the categorical distribution over the datapoints d ∈ D. A quantum dataset can generally be a similar mix of various datapoints, in this case quantum states σ̂_d observed at different frequencies (probabilities p_d). The effective mixed state representing this dataset is then σ̂_D = ∑_{d∈D} p_d σ̂_d, where we obtain the mixed state σ̂_d with probability p_d from our dataset D. As modelling a mixture of states is effectively equivalent to the task of modelling one mixed state, we will focus on simply learning to approximate a single data mixed state σ̂_D using a variational quantum-probabilistic model. We assume that we have access to several copies of this data mixed state directly as quantum data available in quantum memory.

FIG. 2. Information flow for the process of training the QHBM applied to Modular Hamiltonian Learning. Grayscale indicates classical information processing happening on a classical device, while colored registers and operations are stored and executed on the quantum device. The θ parameters determine the latent space distribution and thus the modular Hamiltonian K̂_θ. From the known latent distribution, one can compute estimates of the parameterized partition function log Z_θ on the classical device. One then applies the inverse unitary quantum neural network Û†(φ) and estimates the expectation value of the modular Hamiltonian K̂_θ at the output via several runs of inference on the quantum device and measurement. The partition function and modular expectation are then combined to yield the quantum variational cross entropy loss function (20).

Thus, in complete generality, we consider the problem of learning a mixed state σ̂_D, and let ρ̂_{θφ} be our model's candidate approximation to σ̂_D, where θ, φ are variational parameters. In classical probabilistic machine learning, one minimizes the Kullback-Leibler divergence (relative entropy), an approach known as "expectation propagation" [29]. We directly generalize this method to quantum probabilistic machine learning, now aiming to minimize the quantum relative entropy between the quantum data distribution and our quantum model's candidate distribution, subject to variations of the parameters. Thus we aim to find

argmin_{θ,φ} D(σ̂_D ‖ ρ̂_{θφ}),    (16)

where D(σ̂‖ρ̂) = tr(σ̂ log σ̂) − tr(σ̂ log ρ̂) is the quantum relative entropy. Due to the positivity of relative entropy, the optimum of the above cost function is achieved if and only if the variational distribution of our model is equal to the quantum data distribution:

D(σ̂_D ‖ ρ̂_{θφ}) ≥ 0,   D(σ̂_D ‖ ρ̂_{θφ}) = 0  ⟺  σ̂_D = ρ̂_{θφ}.    (17)

We can use this as a variational principle: the relative entropy is our loss function, and we can find optimal parameters of the model {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_D. As we will see below, the use of QHBMs becomes crucial for the tractability of evaluating the relative entropy, making its use as a loss function possible. Details on the gradient computation of the loss are discussed in Appendix B.

Now, for a target quantum data mixed state σ̂_D and a QHBM variational state ρ̂_{θφ} of the form described in Equation (3), our goal is to minimize the forward relative entropy,

D(σ̂_D ‖ ρ̂_{θφ}) = −S(σ̂_D) + tr(σ̂_D K̂_{θφ}) + log Z_θ.    (18)

Notice that the first term, S(σ̂_D) = −tr(σ̂_D log σ̂_D), the entropy of the data distribution, is independent of our variational parameters and hence, for optimization purposes, irrelevant to include. This is convenient, as the entropy of the dataset is a priori unknown. We can thus use the last two terms as our loss function for QMHL. These terms are known as the quantum cross entropy between our data state and our model,

L_qmhl(θ, φ) ≡ −tr(σ̂_D log ρ̂_{θφ}) = tr(σ̂_D K̂_{θφ}) + log(Z_θ).    (19)

We call this loss function the quantum variational cross entropy loss. The first term can be understood as the expectation value of the model's modular Hamiltonian with respect to the data mixed state. In order to estimate this modular energy expectation, it is first useful to express the modular Hamiltonian of the model in terms of the latent modular Hamiltonian: K̂_{θ,φ} = Û(φ) K̂_θ Û†(φ). Then, by the cyclicity of the trace, we can rewrite the loss as

L_qmhl(θ, φ) = tr([Û†(φ) σ̂_D Û(φ)] K̂_θ) + log(Z_θ).    (20)
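To make Eq. (20) concrete, here is a minimal exact-simulation sketch (our own NumPy illustration with an arbitrary two-qubit example, not the paper's implementation) that evaluates the quantum variational cross entropy for a factorized latent state and a given trial unitary.

```python
import numpy as np
from scipy.linalg import expm

def latent_modular_hamiltonian(thetas):
    """Factorized latent K_theta = sum_j theta_j * |1><1|_j on two qubits."""
    n1, eye = np.diag([0.0, 1.0]), np.eye(2)
    return thetas[0] * np.kron(n1, eye) + thetas[1] * np.kron(eye, n1)

def qmhl_loss(thetas, unitary, sigma_data):
    """Eq. (20): tr([U^dag sigma_D U] K_theta) + log Z_theta."""
    k_latent = latent_modular_hamiltonian(thetas)
    pulled_back = unitary.conj().T @ sigma_data @ unitary          # pulled-back data state
    modular_expectation = np.real(np.trace(pulled_back @ k_latent))
    log_z = sum(np.log(1.0 + np.exp(-t)) for t in thetas)          # additive over subsystems, Eq. (8)
    return modular_expectation + log_z

# Toy data: a thermal-like two-qubit mixed state and an arbitrary real trial unitary.
rng = np.random.default_rng(1)
h_data = rng.normal(size=(4, 4)); h_data = (h_data + h_data.T) / 2
sigma = expm(-h_data); sigma /= np.trace(sigma)

generator = rng.normal(size=(4, 4)); generator = generator - generator.T   # anti-symmetric
u_phi = expm(generator)                                                    # orthogonal (real unitary)

print(qmhl_loss([0.5, 1.2], u_phi, sigma))
```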
To better understand this, we define the pulled-back data state σ̂_{D,φ} ≡ Û†(φ) σ̂_D Û(φ), which is the state obtained by feeding our quantum data through the unitary quantum neural network circuit in reverse (or rather, inverse). Thus, this term in the loss is equivalent to the expectation value of the latent modular Hamiltonian with respect to the pulled-back data, ⟨K̂_θ⟩_{σ̂_{D,φ}}. The flow of quantum and classical information for the task of Modular Hamiltonian Learning is depicted in Figure 2. See Section II for details on estimating the partition function, depending on the chosen latent space structure.

Finally, an interesting feature to note about this choice of cross-entropy loss function is that in the limit where the model approximates the data state, i.e., when the relative entropy in (17) converges to zero, the loss function itself converges to the entropy of the data:

L_qmhl(θ, φ)  →  S(σ̂_D)   as   ρ̂_{θφ} → σ̂_D.    (21)

This means that, as a by-product of the learning process, after convergence of the training of our model, we are automatically provided with an estimate of the entropy of our data distribution simply by observing the value to which the loss has converged. This has wide-ranging implications for the potential use of QHBMs and QMHL, combined with quantum simulation, to estimate entropies and various information-theoretic quantities using quantum computers. More on this in the discussion in Section V.

[Figure: information-flow diagram for the Variational Quantum Thermalizer]

The reason it is called a relative free energy is that it is the difference of free energies, F(ξ̂) ≡ tr(Ĥ ξ̂) − (1/β) S(ξ̂), up to a factor of β, between our ansatz state and the true thermal state,

D(ρ̂_{θφ} ‖ σ̂_β) = β F(ρ̂_{θφ}) − β F(σ̂_β).

Further note that the positivity of relative entropy implies that the minimum of free energy is achieved by the thermal state:

D(ρ̂_{θφ} ‖ σ̂_β) = 0  ⟹  F(ρ̂_{θφ}) = F(σ̂_β)  and  ρ̂_{θφ} = σ̂_β.

Thus, given a variational ansatz for the thermal state, by minimizing the free energy as our loss function,

L_vqt(θ, φ) = β F(ρ̂_{θφ}) = β tr(ρ̂_{θφ} Ĥ) − S(ρ̂_{θφ}),

we find optimal parameters {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_β.
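As an illustration of the VQT objective above, the following is a small exact-simulation sketch (our own NumPy example with an arbitrary two-qubit Hamiltonian, not the paper's code) that evaluates L_vqt(θ, φ) = β tr(ρ̂_{θφ} Ĥ) − S(ρ̂_{θφ}) for a factorized latent state pushed through a trial unitary.

```python
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]]); I = np.eye(2)
kron = np.kron

# Target Hamiltonian (arbitrary two-qubit example) and inverse temperature.
H = kron(Z, Z) + 0.5 * (kron(X, I) + kron(I, X))
beta = 1.3

def model_state(thetas, unitary):
    """rho_{theta,phi} = U rho_latent(theta) U^dag, with a factorized latent state."""
    def qubit_latent(t):                       # single-qubit thermal state of K = t |1><1|
        p1 = np.exp(-t) / (1.0 + np.exp(-t))
        return np.diag([1.0 - p1, p1])
    rho_latent = kron(qubit_latent(thetas[0]), qubit_latent(thetas[1]))
    return unitary @ rho_latent @ unitary.conj().T, rho_latent

def vqt_loss(thetas, unitary):
    rho, rho_latent = model_state(thetas, unitary)
    energy = np.real(np.trace(rho @ H))
    # The entropy is evaluated on the diagonal, factorized latent state (Eq. (9));
    # it equals the entropy of the visible state, since the unitary preserves entropy.
    probs = np.diag(rho_latent)
    entropy = -np.sum(probs * np.log(probs))
    return beta * energy - entropy

u_phi = expm(1j * 0.3 * (kron(X, X) + kron(Z, I)))   # arbitrary trial unitary
print(vqt_loss([0.8, 1.1], u_phi))
```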
of runs required for the Variational Quantum Eigensolver and other variational algorithms whose loss only depends on expectation values [15].

Note that, in the case where our state converges to the true thermal state upon convergence of the training, the value of our loss function gives us the free energy of the thermal state, which is proportional to the logarithm of the thermal partition function:

L_vqt(θ, φ)  →  β F(σ̂_β) = −log Z_β   as   ρ̂_{θφ} → σ̂_β.

This is effectively a variational free energy principle, and the basis of the VQT.

As a final note, let us see how we recover the Variational Quantum Eigensolver and its variational principle in the limit of low temperature. If we divide our loss function by β, i.e., directly minimize the free energy, L̃(θ, φ) = (1/β) L_vqt(θ, φ) = F(ρ̂_{θφ}), then in the limit of zero temperature,

L̃(θ, φ)  →  ⟨Ĥ⟩_{θφ}   as   β → ∞,

we recover the loss function for the ground-state Variational Quantum Eigensolver (VQE), and thus we recover the ground-state variational principle. Note that in the limit of zero temperature there is no need for latent-space parameters, as there is no entropy and the latent state is unitarily equivalent to any starting pure state. Thus, the VQT is truly the most natural generalization of the VQE to non-zero temperature states.

IV. APPLICATIONS & EXPERIMENTS

Our framework is general enough to apply in many situations of interest in quantum computing [57], quantum communication [58], quantum sensing [59], and quantum simulation [60–75]. In this section we focus on applications to quantum simulation and quantum communication, which illustrate the QHBM framework for spin/qubit, Bosonic, and Fermionic systems.

In the Bosonic and Fermionic examples, we restrict ourselves to Gaussian states for two reasons: first, these highlight the ways in which our framework is well-suited to utilizing problem structure; and second, classical methods exist for simulating quantum systems restricted to these types of states, so they serve as a verification and benchmark of our methods. This allows us to reach system sizes which would not be simulable on classical computers were we to simulate the wavefunction in the Hilbert space directly. However, we emphasize that Quantum Hamiltonian-Based Models apply to a much broader class of quantum states.

Gaussian states are the class of thermal (and ground) states of all Hamiltonians that are quadratic in creation and annihilation operators (in second quantization). On the Fermionic side, this class contains a variety of tight-binding models, including topological models like the Su-Schrieffer-Heeger (SSH) model [76–78], and mean-field BCS superconducting states. Bosonic Gaussian states include coherent and squeezed states, and already enable (when coupled with measurement) a plethora of applications in quantum communication, including quantum teleportation and quantum key distribution. In addition, Gaussian states are interesting in their own right, and play a significant role in quantum information theory [79–81], as well as in quantum field theory in curved space-time [82] and quantum cosmology [83].

Gaussian states are special in that our latent-space factorization postulate is justified physically in addition to holding numerically. As a consequence of Williamson's theorem [84], we are guaranteed that for any thermal state there exists a Gaussian transformation that decouples all subsystems in latent space (a procedure often referred to as normal mode decomposition [85]). For Gaussian systems, we can also work directly with 2N × 2N covariance matrices rather than density matrices.

A. Variational Quantum Thermalizer for the 2D Heisenberg Model

As a demonstration of the effectiveness of the VQT for spin systems (qubits), we learn a thermal state of the two-dimensional Heisenberg model, which exhibits spin frustration despite its simplicity. To be precise, we consider the model

Ĥ_heis = J_h ∑_{⟨ij⟩_h} Ŝ_i · Ŝ_j + J_v ∑_{⟨ij⟩_v} Ŝ_i · Ŝ_j,    (24)

where h (v) denotes horizontal (vertical) bonds, and ⟨·⟩ represents nearest-neighbor pairs. As laid out in Section II, we use a factorized latent-space distribution. Even though we are considering a two-dimensional model, we simulate the VQT using an imagined one-dimensional array of qubits. As such, we parameterize our quantum neural network with only single-qubit rotations and two-qubit rotations between adjacent qubits in our simulated one-dimensional quantum computer.

In particular, our parameterized unitary is composed of three layers of gates. Each layer consists of an arbitrary single-qubit rotation of the form exp[i(φ_{1j} X̂_j + φ_{2j} Ŷ_j + φ_{3j} Ẑ_j)] applied to each qubit, and a two-qubit rotation exp[i(φ_{4j} X̂_j X̂_{j+1} + φ_{5j} Ŷ_j Ŷ_{j+1} + φ_{6j} Ẑ_j Ẑ_{j+1})] applied to a complete set of neighboring qubits. The neighbors acted upon by two-qubit gates are staggered in sequential layers. No parameters are shared within or between layers. So in the first layer the first and second qubits, third and fourth, and so on, are coupled, while in the second layer it is the second and third qubits, fourth and fifth, etc., which are coupled. In addition, we do not assume periodic boundary conditions for our qubits, and as such do not include any gates between the first and last qubits.

While we use the relative entropy as our cost function, we also look at the trace distance and state fidelity of the
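The following is a minimal Cirq sketch of the layered ansatz described above (our own illustration under the stated structure; the qubit count, the random parameter values, and the use of separate X/Y/Z rotations in place of the simultaneous single-qubit exponential are simplifying assumptions, not the paper's exact implementation).

```python
import numpy as np
import cirq

def ansatz(qubits, params, layers=3):
    """Layered ansatz: single-qubit rotations plus staggered two-qubit XX/YY/ZZ rotations."""
    circuit = cirq.Circuit()
    p = iter(params)
    for layer in range(layers):
        # Single-qubit rotations (compiled here as separate X, Y and Z rotations).
        for q in qubits:
            circuit.append([cirq.rx(next(p))(q), cirq.ry(next(p))(q), cirq.rz(next(p))(q)])
        # Two-qubit rotations on staggered neighboring pairs, open boundary conditions.
        start = layer % 2
        for i in range(start, len(qubits) - 1, 2):
            a, b = qubits[i], qubits[i + 1]
            circuit.append([
                # Exponent t corresponds (up to global phase and normalization) to a rotation angle t*pi.
                cirq.XXPowGate(exponent=next(p) / np.pi)(a, b),
                cirq.YYPowGate(exponent=next(p) / np.pi)(a, b),
                cirq.ZZPowGate(exponent=next(p) / np.pi)(a, b),
            ])
    return circuit

qubits = cirq.LineQubit.range(4)
n_params = 3 * (3 * len(qubits) + 3 * (len(qubits) - 1))   # upper bound on parameter count
rng = np.random.default_rng(0)
circuit = ansatz(qubits, rng.uniform(0, 2 * np.pi, n_params))
print(circuit)
```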
diagonalizes Γ_B,

S Γ_B Sᵀ = ⊕_{j=1}^{N_b} diag(ν_j, ν_j).    (26)

used to model impurities in metals [91, 92], which are key to understanding much of the collective phenomena in strongly-correlated materials [93–95].

In analogy with the Bosonic Gaussian case considered above, we remind the reader that for Fermionic Gaussian systems, the real anti-symmetric covariance matrix Γ_F associated to a mixed state ρ̂ is given by the matrix elements

Γ_F^{ab} = (i/2) Tr(ρ̂ [ξ̂^a, ξ̂^b]).    (28)

[Figure: training loss versus target value, with training snapshots]
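As a side illustration of the symplectic diagonalization in Eq. (26), the symplectic eigenvalues ν_j of a Bosonic covariance matrix can be obtained classically from the ordinary eigenvalues of iΩΓ_B, which come in pairs ±ν_j. The sketch below is our own example (a single thermal mode with an arbitrary mean occupation, in the ħ = 1 convention where the vacuum covariance is the identity), and is one way to carry out the normal-mode decomposition mentioned in Section IV.

```python
import numpy as np

def symplectic_eigenvalues(gamma):
    """Symplectic eigenvalues nu_j of a 2N x 2N Bosonic covariance matrix (x1..xN, p1..pN ordering)."""
    n = gamma.shape[0] // 2
    omega = np.block([[np.zeros((n, n)), np.eye(n)],
                      [-np.eye(n), np.zeros((n, n))]])      # symplectic form Omega
    eigs = np.linalg.eigvals(1j * omega @ gamma)             # eigenvalues come in pairs +/- nu_j
    nus = np.sort(np.abs(eigs.real))
    return nus[n:]                                            # keep one copy of each pair

# Example: a single thermal mode with mean occupation n_bar has Gamma = (2*n_bar + 1) * I.
n_bar = 0.35
print(symplectic_eigenvalues((2 * n_bar + 1) * np.eye(2)))   # ~ [1.7]
```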
physical quantum systems [97], this novel application of quantum-classical hybrid computing could be quite significant.

Latent space factorization, while not obligatory for QHBM, leads to widespread applications based on learning decorrelated (factorized) quantum representations, akin to classical disentangled representations [51]. In the context of unsupervised learning, learning a disentangled latent space representation could be useful for tasks such as latent space interpolation [98] and latent space vector arithmetic, quantum principal component analysis [99], clustering [100], and more generally entropy-based quantum compression, as demonstrated in Section IV B. Such compression could also be used for unsupervised pre-training in discriminative learning [101]. Progressively learning to compress layer-wise while growing the depth of the unitary quantum neural network would yield a procedure akin to layer-wise pretraining in classical machine learning [101]. Compression could also be used for denoising quantum data, something previous approaches to quantum autoencoders were not suitable for [102].

As for the case of using a diagonal latent space Hamiltonian for QMHL, we are then effectively learning the eigenbasis of the data density matrix. The great advantage over previously proposed variational quantum algorithms for quantum state diagonalization is that our method employs the relative entropy rather than a Hilbert-Schmidt metric, thus allowing us to learn to diagonalize much higher-rank quantum states. Furthermore, learning to diagonalize a quantum density matrix is essentially related to the Quantum Principal Component Analysis algorithm [103] for classical data, and other related quantum machine learning algorithms [8, 104]. Our approach could be seen as a variational alternative for these algorithms, circumventing the need for a very long quantum circuit for the quantum state exponentiation [105], which has been deemed intractable even for far-term quantum computers when compiled [106]. This does not remove the requirement of the state preparation (which requires QRAM or a special oracle [107]). Though, in the case of quantum data, this method does not need access to such exotic components, and has the potential to demonstrate a quantum advantage for learning the unitary which diagonalizes either a quantum Hamiltonian or a quantum density matrix.

Future iterations on this work could modify the loss function, potentially introducing a quantum variant of the Evidence Lower Bound (ELBO) for the quantum relative entropy loss. This would, in turn, yield a quantum form of Variational Autoencoders (VAEs) [30]. Both QHBMs and VAEs could potentially be used to learn an effective mixed state representation of an unknown quantum state from partial quantum tomographic data [108]. We aim to tackle this Quantum Neural State Tomography with QHBMs in future work.

In a near-term future iteration, we plan to fully combine classical neural network-based Energy-Based Models [49] with our QHBM. Recent papers have demonstrated the power of hybridized quantum-classical neural networks trained using hybrid quantum-classical backpropagation, originally introduced in [13], then later iterated upon in [109]. Given previous successes in combining classical differentiable models and quantum variational circuits, we anticipate this implementation to be straightforward given the framework laid out in this paper. Since such a full hybridization would allow us to go beyond full-rank multinoulli estimation for the general classical latent distribution QHBM, this would be the piece which would unlock the scalability of this algorithm to highly non-trivial, large-scale quantum systems. We plan to explore this in upcoming work.

QHBM present several key advantages relative to other forms of unsupervised quantum learning with quantum neural networks, such as quantum GANs [110]. GANs (both quantum and classical) are notoriously difficult to train, and once trained, difficult to extract physical quantities from. QHBM represent physical quantities more directly, and as such are much more suitable for applications that involve physical quantum data. From our preliminary numerics, our framework also appears to train very robustly and with few iterations. Furthermore, QHBM require less quantum circuit depth during training than quantum GANs, which require both a quantum generator and a quantum discriminator. The latter is a key consideration for possible implementation on near-term intermediate-scale quantum devices [2].

A key point is that in Quantum Modular Hamiltonian Learning (QMHL) and the VQT, we are not just learning a distribution for the modular Hamiltonian and thermal state, respectively. Rather, we are learning efficient approximate parameterizations of these quantities with our quantum circuit. In the context of the VQT, this gives us the ability to directly prepare a thermal state on our quantum computer using polynomial resources, from knowledge of a corresponding Hamiltonian. For Modular Hamiltonian Learning, once an efficient ansatz has learned the optimal value of parameters such that its output approximates our data mixed state, this gives us the ability to reproduce as many copies of the learned mixed state distribution as desired. In addition, the modular Hamiltonian itself provides invaluable information related to topological properties, thermalization, and non-equilibrium dynamics.

In particular, Modular Hamiltonian Learning gives one access to the eigenvalues of the density matrix and the unitary that diagonalizes the Modular Hamiltonian. Applying the QNN to a quantum state brings it into the eigenbasis of the Modular Hamiltonian. In this basis, an exponentiation of the diagonal latent modular Hamiltonian (of which we have a classical description) implements modular time evolution. We can apply the inverse QNN to return to the original computational basis.

In the same vein, QMHL provides the ability to probe a system at different temperatures, something that mixed state learning on its own does not. Given access to samples from a thermal state at some temperature, one can
thus generate typical samples from the same system at another temperature by learning the modular Hamiltonian and systematically changing the latent space parameters.

Finally, we underscore the generality of our framework. Just as classical EBMs are inspired by physics but can be applied to problems far abstracted from real physical systems, so too can our QHBM be employed for any task which involves learning a quantum distribution.

VI. ACKNOWLEDGEMENTS

Numerical experiments in this paper were executed using a custom combination of Cirq [111], TensorFlow [112], and OpenFermion [90]. GV, JM, and SN would like to thank the team at X for the hospitality and support during their respective Quantum@X residencies, where this work was completed. X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company). GV acknowledges funding from NSERC.
[1] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, classical algorithms, New Journal of Physics 18, 023023
N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and (2016).
H. Neven, Characterizing quantum supremacy in near- [16] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-
term devices, Nature Physics 14, 595 (2018). Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. Obrien,
[2] J. Preskill, Quantum computing in the nisq era and be- A variational eigenvalue solver on a photonic quantum
yond, Quantum 2, 79 (2018). processor, Nature communications 5, 4213 (2014).
[3] E. Farhi and H. Neven, Classification with quantum [17] A. Kandala, A. Mezzacapo, K. Temme, M. Takita,
neural networks on near term processors, arXiv preprint M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-
arXiv:1802.06002 (2018). efficient variational quantum eigensolver for small
[4] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Bab- molecules and quantum magnets, Nature 549, 242
bush, and H. Neven, Barren plateaus in quantum neural (2017).
network training landscapes, Nature communications 9, [18] O. Higgott, D. Wang, and S. Brierley, Variational quan-
4812 (2018). tum computation of excited states, Quantum 3, 156
[5] G. Verdon, M. Broughton, and J. Biamonte, A quantum (2019).
algorithm to train neural networks using low-depth cir- [19] J. R. McClean, Z. Jiang, N. Rubin, R. Babbush, and
cuits, arXiv preprint arXiv:1712.05304 (2017). H. Neven, Decoding errors using quantum subspace ex-
[6] P.-L. Dallaire-Demers and N. Killoran, Quantum gen- pansions, (2019).
erative adversarial networks, Physical Review A 98, [20] R. M. Parrish, E. G. Hohenstein, P. L. McMahon,
012324 (2018). and T. J. Martı́nez, Quantum computation of elec-
[7] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fu- tronic transitions using a variational quantum eigen-
jii, Quantum circuit learning, Physical Review A 98, solver, Physical Review Letters 122, 230401 (2019).
032309 (2018). [21] E. Farhi, J. Goldstone, and S. Gutmann, A quantum
[8] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, approximate optimization algorithm, arXiv preprint
N. Wiebe, and S. Lloyd, Quantum machine learning, arXiv:1411.4028 (2014).
Nature 549, 195 (2017). [22] E. Farhi and A. W. Harrow, Quantum supremacy
[9] G. Verdon, M. Broughton, J. R. McClean, K. J. through the quantum approximate optimization algo-
Sung, R. Babbush, Z. Jiang, H. Neven, and rithm, arXiv preprint arXiv:1602.07674 (2016).
M. Mohseni, Learning to learn with quantum neural [23] Z. Jiang, E. Rieffel, and Z. Wang, A qaoa-inspired cir-
networks via classical neural networks, arXiv preprint cuit for grover’s unstructured search using a transverse
arXiv:1907.05415 (2019). field, arXiv preprint arXiv:1702.0257 (2017).
[10] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. [24] G. Verdon, J. M. Arrazola, K. Brádler, and N. Killo-
Sornborger, and P. J. Coles, Quantum-assisted quantum ran, A quantum approximate optimization algorithm for
compiling, Quantum 3, 140 (2019). continuous problems, arXiv preprint arXiv:1902.00409
[11] R. LaRose, A. Tikku, É. ONeel-Judy, L. Cincio, and (2019).
P. J. Coles, Variational quantum state diagonalization, [25] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel,
npj Quantum Information 5, 8 (2019). xy-mixers: analytical and numerical results for qaoa,
[12] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe, arXiv preprint arXiv:1904.09314 (2019).
Circuit-centric quantum classifiers, arXiv preprint [26] F. G. Brandao, M. Broughton, E. Farhi, S. Gutmann,
arXiv:1804.00633 (2018). and H. Neven, For fixed control parameters the quan-
[13] G. Verdon, J. Pye, and M. Broughton, A universal train- tum approximate optimization algorithm’s objective
ing algorithm for quantum deep learning, arXiv preprint function value concentrates for typical instances, arXiv
arXiv:1806.09729 (2018). preprint arXiv:1812.04170 (2018).
[14] M. Benedetti, E. Lloyd, and S. Sack, Parameterized [27] Z. Jiang, E. G. Rieffel, and Z. Wang, Near-optimal
quantum circuits as machine learning models, arXiv quantum circuit for grover’s unstructured search using
preprint arXiv:1906.07682 (2019). a transverse field, Physical Review A 95, 062317 (2017).
[15] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru- [28] L. Li, M. Fan, M. Coram, P. Riley, and S. Leichenauer,
Guzik, The theory of variational hybrid quantum- Quantum optimization with a novel gibbs objective
function and ansatz architecture search, arXiv preprint [46] I. Goodfellow J., J. Pouget-Abadie, M. Mirza, B. Xu,
arXiv:1909.07621 (2019). D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio,
[29] K. P. Murphy, Machine learning: a probabilistic perspec- Generative adversarial nets, in Proceedings of the Inter-
tive (MIT press, 2012). national Conference on Neural Information Processing
[30] D. P. Kingma and M. Welling, Auto-encoding varia- Systems (2014) p. 26722680.
tional bayes, arXiv preprint arXiv:1312.6114 (2013). [47] A. Razavi, A. v. d. Oord, and O. Vinyals, Generating di-
[31] R. M. Neal, Probabilistic inference using Markov chain verse high-fidelity images with vq-vae-2, arXiv preprint
Monte Carlo methods (Department of Computer Sci- arXiv:1906.00446 (2019).
ence, University of Toronto Toronto, Ontario, Canada, [48] A. Brock, J. Donahue, and K. Simonyan, Large scale
1993). gan training for high fidelity natural image synthesis,
[32] M. Betancourt, A conceptual introduction to hamil- arXiv preprint arXiv:1809.11096 (2018).
tonian monte carlo, arXiv preprint arXiv:1701.02434 [49] Y. Du and I. Mordatch, Implicit generation and gen-
(2017). eralization in energy-based models, arXiv preprint
[33] D. J. Rezende and S. Mohamed, Variational inference arXiv:1903.08689 (2019).
with normalizing flows, arXiv preprint arXiv:1505.05770 [50] M. Welling and Y. Whye Teh, Bayesian learning via
(2015). stochastic gradient langevin dynamics, in ICML’11 Pro-
[34] C. Louizos and M. Welling, Multiplicative normalizing ceedings of the 28th International Conference on In-
flows for variational bayesian neural networks, in Pro- ternational Conference on Machine Learning (2011) p.
ceedings of the 34th International Conference on Ma- 681688.
chine Learning-Volume 70 (JMLR. org, 2017) pp. 2218– [51] Y. Bengio, A. Courville, and P. Vincent, Representation
2227. learning: A review and new perspectives, IEEE trans-
[35] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. actions on pattern analysis and machine intelligence 35,
Duvenaud, Neural ordinary differential equations, in 1798 (2013).
Advances in neural information processing systems [52] C. P. Burgess, I. Higgins, A. Pal, L. Matthey,
(2018) pp. 6571–6583. N. Watters, G. Desjardins, and A. Lerchner, Un-
[36] M. M. Wilde, Quantum information theory (Cambridge derstanding disentangling in beta-vae, arXiv preprint
University Press, 2013). arXiv:1804.03599 (2018).
[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, [53] J. Marks, T. Jochym-O’Connor, and V. Gheorghiu,
nature 521, 436 (2015). Comparison of memory thresholds for planar qudit ge-
[38] J. Schmidhuber, Deep learning in neural networks: An ometries, New Journal of Physics 19, 113022 (2017).
overview, Neural networks 61, 85 (2015). [54] D. Koller, U. Lerner, and D. Angelov, A general algo-
[39] W. J. Huggins, P. Patil, B. Mitchell, K. B. Whaley, rithm for approximate inference and its application to
and M. Stoudenmire, Towards quantum machine learn- hybrid bayes nets, in Proceedings of the Fifteenth con-
ing with tensor networks, Quantum Science and tech- ference on Uncertainty in artificial intelligence (Morgan
nology (2018). Kaufmann Publishers Inc., 1999) pp. 324–333.
[40] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, [55] A. Serafini, Quantum Continuous Variables: A Primer
B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Le- of Theoretical Methods (CRC Press, 2017).
ichenauer, Tensornetwork: A library for physics and ma- [56] S. Lloyd and S. L. Braunstein, Quantum computation
chine learning, arXiv preprint arXiv:1905.01330 (2019). over continuous variables, in Quantum Information with
[41] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, A Continuous Variables (Springer, 1999) pp. 9–17.
theoretical framework for back-propagation, in Proceed- [57] M. A. Nielsen and I. Chuang, Quantum computation
ings of the 1988 connectionist models summer school, and quantum information (2002).
Vol. 1 (CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988) [58] N. Gisin and R. Thew, Quantum communication, Na-
pp. 21–28. ture photonics 1, 165 (2007).
[42] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, [59] C. L. Degen, F. Reinhard, and P. Cappellaro, Quantum
M. Hasegawa-Johnson, and M. N. Do, Semantic image sensing, Reviews of modern physics 89, 035002 (2017).
inpainting with deep generative models, in Proceedings [60] I. Kassal, S. P. Jordan, P. J. Love, M. Mohseni, and
of the IEEE Conference on Computer Vision and Pat- A. Aspuru-Guzik, Polynomial-time quantum algorithm
tern Recognition (2017) pp. 5485–5493. for the simulation of chemical dynamics, Proceedings
[43] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Man- of the National Academy of Sciences 105, 1868118686
zagol, Extracting and composing robust features with (2008).
denoising autoencoders, in Proceedings of the 25th inter- [61] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H.
national conference on Machine learning (ACM, 2008) Yung, and A. Aspuru-Guzik, Simulating chemistry us-
pp. 1096–1103. ing quantum computers, Annual Review of Physical
[44] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunning- Chemistry 62, 185207 (2011).
ham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, [62] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik,
et al., Photo-realistic single image super-resolution us- Simulation of electronic structure hamiltonians using
ing a generative adversarial network, in Proceedings of quantum computers, Molecular Physics 109, 735750
the IEEE conference on computer vision and pattern (2011).
recognition (2017) pp. 4681–4690. [63] H. Wang, S. Kais, A. Aspuru-Guzik, and M. R. Hoff-
[45] L. Theis, W. Shi, A. Cunningham, and F. Huszár, mann, Quantum algorithm for obtaining the energy
Lossy image compression with compressive autoen- spectrum of molecular systems, Phys. Chem. Chem.
coders, arXiv preprint arXiv:1703.00395 (2017). Phys. 10, 5388 (2008).
[98] P. Bojanowski, A. Joulin, D. Lopez-Paz, and A. Szlam, [115] E. Campbell, Random compiler for fast hamiltonian
Optimizing the latent space of generative networks, simulation, Physical review letters 123, 070503 (2019).
arXiv preprint arXiv:1707.05776 (2017). [116] M. Abramowitz and I. Stegun, 1965, handbook of math-
[99] S. Wold, K. Esbensen, and P. Geladi, Principal compo- ematical functions, Appl. Math. Ser 55 (2006).
nent analysis, Chemometrics and intelligent laboratory [117] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and
systems 2, 37 (1987). N. Killoran, Evaluating analytic gradients on quantum
[100] T. Hofmann, Unsupervised learning by probabilistic la- hardware, arXiv preprint arXiv:1811.11184 (2018).
tent semantic analysis, Machine learning 42, 177 (2001). [118] A. Harrow and J. Napp, Low-depth gradient mea-
[101] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, surements can improve convergence in variational
P. Vincent, and S. Bengio, Why does unsupervised pre- hybrid quantum-classical algorithms, arXiv preprint
training help deep learning?, Journal of Machine Learn- arXiv:1901.05374 (2019).
ing Research 11, 625 (2010). [119] Z. Jiang, K. J. Sung, K. Kechedzhi, V. N. Smelyanskiy,
[102] J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum and S. Boixo, Quantum algorithms to simulate many-
autoencoders for efficient compression of quantum data, body physics of correlated fermions, Phys. Rev. Applied
Quantum Science and Technology 2, 045001 (2017). 9, 044036 (2018).
[103] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum [120] M. A. de Gosson, Symplectic Methods in Harmonic
principal component analysis, Nature Physics 10, 631 Analysis and in Mathematical Physics (Springer, 2011).
(2014). [121] G. Ortiz, J. E. Gubernatis, E. Knill, and R. Laflamme,
[104] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum sup- Quantum algorithms for fermionic simulations, Phys.
port vector machine for big data classification, Physical Rev. A 64, 022319 (2001).
review letters 113, 130503 (2014). [122] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik,
[105] T. R. Bromley and P. Rebentrost, Batched quantum Simulation of electronic structure hamiltonians us-
state exponentiation and quantum hebbian learning, ing quantum computers, Molecular Physics 109, 735
Quantum Machine Intelligence 1, 31 (2019). (2011).
[106] A. Scherer, B. Valiron, S.-C. Mau, S. Alexander, [123] J. T. Seeley, M. J. Richard, and P. J. Love, The bravyi-
E. Van den Berg, and T. E. Chapuran, Concrete re- kitaev transformation for quantum computation of elec-
source analysis of the quantum linear-system algorithm tronic structure, The Journal of Chemical Physics 137,
used to compute the electromagnetic scattering cross 224109 (2012).
section of a 2d target, Quantum Information Process- [124] S. B. Bravyi and A. Y. Kitaev, Fermionic quantum com-
ing 16, 60 (2017). putation, Annals of Physics 298, 210 (2002).
[107] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum [125] R. C. Ball, Fermions without fermion fields, Phys. Rev.
random access memory, Physical review letters 100, Lett. 95, 176407 (2005).
160501 (2008). [126] F. Verstraete and J. I. Cirac, Mapping local hamiltoni-
[108] G. D’Ariano and P. L. Presti, Quantum tomography for ans of fermions to local hamiltonians of spins, Journal
measuring experimentally the matrix elements of an ar- of Statistical Mechanics: Theory and Experiment 2005,
bitrary quantum operation, Physical review letters 86, P09012 (2005).
4195 (2001). [127] J. D. Whitfield, V. c. v. Havlı́ček, and M. Troyer, Local
[109] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and spin operators for fermion simulations, Phys. Rev. A 94,
N. Killoran, Evaluating analytic gradients on quantum 030301 (2016).
hardware, Physical Review A 99, 032331 (2019).
[110] S. Lloyd and C. Weedbrook, Quantum generative adversarial networks for learning and loading random distributions, arXiv preprint arXiv:1804.09139 (2018).
[111] Cirq: A python framework for creating, editing, and invoking noisy intermediate scale quantum (nisq) circuits, https://github.com/quantumlib/Cirq.
[112] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.
[113] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).
[114] M. Suzuki, Fractal decomposition of exponential operators with applications to many-body theories and monte carlo simulations, Physics Letters A 146, 319 (1990).

Appendix A: Quantum Neural Networks and Gradients

A Quantum Neural Network can generally be written as a product of layers of unitaries in the form

Û(φ) = ∏_{ℓ=1}^{L} V̂^ℓ Û^ℓ(φ^ℓ),    (A1)

where the ℓth layer of the QNN consists of the product of V̂^ℓ, a non-parametric unitary, and Û^ℓ(φ^ℓ), a unitary with (possibly) multiple variational parameters (note that the superscripts here represent indices rather than exponents). The multi-parametric unitary of a given layer can itself generally be comprised of multiple unitaries {Û_j^ℓ(φ_j^ℓ)}_{j=1}^{M_ℓ} applied in parallel:

Û^ℓ(φ^ℓ) ≡ ⨂_{j=1}^{M_ℓ} Û_j^ℓ(φ_j^ℓ).    (A2)

Here I_ℓ represents the set of indices corresponding to the ℓth layer.
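As a concrete, simplified illustration of the layered structure in Eqs. (A1)-(A2), and of the first-order central-difference gradient estimate discussed later in this appendix, here is a small self-contained NumPy sketch; the two-qubit Pauli generators, the fixed entangling layer, and the cost Hamiltonian are arbitrary choices for the example, not the construction used in the paper's experiments.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
I2 = np.eye(2, dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)                   # fixed, non-parametric layer V^l

# Parametric generators g_j^l for each layer (arbitrary single-qubit Paulis here).
GENERATORS = [np.kron(X, I2), np.kron(I2, X), np.kron(Z, I2), np.kron(I2, Z)]

def qnn_unitary(phi, layers=2):
    """U(phi) = prod_l V^l U^l(phi^l), with U^l a product of single-parameter exponentials."""
    u = np.eye(4, dtype=complex)
    p = iter(phi)
    for _ in range(layers):
        for g in GENERATORS:
            u = expm(-1j * next(p) * g) @ u
        u = CZ @ u
    return u

def cost(phi, hamiltonian, psi0):
    psi = qnn_unitary(phi) @ psi0
    return np.real(np.vdot(psi, hamiltonian @ psi))

def central_difference_gradient(phi, hamiltonian, psi0, eps=1e-4):
    """O(eps^2) gradient estimate requiring 2M cost evaluations for M parameters."""
    grad = np.zeros_like(phi)
    for k in range(len(phi)):
        shift = np.zeros_like(phi); shift[k] = eps
        grad[k] = (cost(phi + shift, hamiltonian, psi0)
                   - cost(phi - shift, hamiltonian, psi0)) / (2 * eps)
    return grad

H_cost = np.kron(Z, Z) + 0.3 * np.kron(X, I2)
psi0 = np.zeros(4, dtype=complex); psi0[0] = 1.0
phi = np.random.default_rng(0).uniform(0, 2 * np.pi, 8)        # 2 layers x 4 generators
print(central_difference_gradient(phi, H_cost, psi0))
```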
Finally, each of these unitaries Û_j^ℓ can be expressed as the exponential of some generator ĝ_j^ℓ, which itself can be any Hermitian operator on n qubits (thus expressible as a linear combination of n-qubit Paulis),

Û_j^ℓ(φ_j^ℓ) = e^{−i φ_j^ℓ ĝ_j^ℓ},    ĝ_j^ℓ = ∑_{k=1}^{K_j^ℓ} β_k^{jℓ} P̂_k,    (A3)

where P̂_k ∈ P_n (the Paulis on n qubits [113]) and β_k^{jℓ} ∈ ℝ for all k, j, ℓ. For a given j and ℓ, in the case where all the Pauli terms commute, i.e., [P̂_k, P̂_m] = 0 for all m, k such that β_m^{jℓ}, β_k^{jℓ} ≠ 0, one can simply decompose the unitary into a product of exponentials of each term,

Û_j^ℓ(φ_j^ℓ) = ∏_k e^{−i φ_j^ℓ β_k^{jℓ} P̂_k}.    (A4)

Otherwise, in instances where the various terms do not commute, one may apply a Trotter-Suzuki decomposition of this exponential [114], or other quantum simulation methods [115].

In order to optimize the parameters of an ansatz of the form of equation (A1), we need a cost function to optimize. In the case of standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian,

f(φ) = ⟨Ĥ⟩_φ ≡ ⟨Ψ_0| Û†(φ) Ĥ Û(φ) |Ψ_0⟩,    (A5)

where |Ψ_0⟩ is the input state to the parametric circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators, e.g. in the form

Ĥ = ∑_{k=1}^{N} α_k ĥ_k ≡ α · ĥ,    (A6)

where we have defined a vector of coefficients α ∈ ℝ^N and a vector of N operators ĥ. Often this decomposition is chosen such that each of these sub-Hamiltonians is in the n-qubit Pauli group, ĥ_k ∈ P_n. The expectation value of this Hamiltonian is then generally evaluated via quantum expectation estimation, i.e., by taking the linear combination of expectation values of each term,

f(φ) = ⟨Ĥ⟩_φ = ∑_{k=1}^{N} α_k ⟨ĥ_k⟩_φ ≡ α · h_φ.    (A7)

One may estimate the gradient of this objective with a central-difference rule, ∂f/∂φ_k ≈ [f(φ + ε∆_k) − f(φ − ε∆_k)]/(2ε), which, in the case where there are M continuous parameters, involves 2M evaluations of the objective function, each evaluation varying the parameters by ε in some direction, thereby giving us an estimate of the gradient of the function with a precision O(ε²). Here ∆_k is a unit-norm perturbation vector in the kth direction of parameter space, (∆_k)_j = δ_{jk}. In general, one may use lower-order methods, such as forward differences with O(ε) error from M + 1 objective queries [3], or higher-order methods, such as a five-point stencil method, with O(ε⁴) error from 4M queries [116].

As recently pointed out in various works [117, 118], given knowledge of the form of the ansatz (e.g. as in (A3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (A3) (or, alternatively, if the Hamiltonian has a spectrum {±λ} for some positive λ). For multi-term Hamiltonians, a method to obtain the analytic gradients using a linear combination of unitaries is proposed in [117]. Recent methods use a stochastic estimation rule to reduce the number of runs needed and keep the optimization efficient [118]. In the numerics featured in this paper, we simply used first-order finite (central) difference methods for our gradients.

Appendix B: Gradients of QHBMs for generative modelling

1. Gradients of model parameters

Using our notation for the pulled-back data state σ̂_{D,φ} ≡ Û†(φ) σ̂_D Û(φ), the gradient with respect to the model parameters φ is given by

∂_{φ_j} L(θ, φ) = ∂_{φ_j} tr(K̂_θ σ̂_{D,φ}) = ∂_{φ_j} ⟨K̂_θ⟩_{σ̂_{D,φ}},    (B1)

which is simply the gradient of the expectation value of the latent modular Hamiltonian with respect to the data state pushed through the reverse quantum neural network Û†(φ). Taking gradients of the expectation value of a multi-term observable is the typical scenario encountered in regular VQE and quantum neural networks; there exist multiple strategies to obtain these gradients (see Appendix A).

∂_{θ_m} ⟨K̂_θ⟩_{σ̂_{D,φ}} = ⟨∂_{θ_m} K̂_θ⟩_{σ̂_{D,φ}} = ⟨∂_{θ_m} K̂_m(θ_m)⟩_{σ̂_{D,φ}};    (B2)

it is simply the expectation value of the gradient. Note that here we assumed the latent modular Hamiltonian was of the form

K̂_θ = ∑_m K̂_m(θ_m),    (B3)

which is valid both for a diagonal latent multinoulli variable (K̂_m = |m⟩⟨m|) and for a factorized latent distribution, in which case each K̂_m can be understood as a subsystem Hamiltonian.

For simple cases where the parameterized modular Hamiltonian is simply a scalar parameter times a fixed operator M̂, i.e., K̂_m(θ_m) = θ_m M̂_m, the above simplifies to

∂_{θ_m} ⟨K̂_θ⟩_{σ̂_{D,φ}} = ⟨M̂_m⟩_{σ̂_{D,φ}} = tr(M̂_m σ̂_{D,φ}),    (B4)

which is a simple expectation value.

Now, to get the gradient of the partition function, given our assumed knowledge of each of the simple modular Hamiltonians, we should have an analytical expression for the partition function which we can use. Notice,

log Z_θ = ∑_j log Z_j(θ_j) = ∑_j log[tr(e^{−K̂_j(θ_j)})] = ∑_j log( ∑_{k_j} e^{−ε_{k_j}(θ_j)} ),    (B5)

where ε_{k_j}(θ_j) are the eigenvalues of the jth parameterized modular Hamiltonian K̂_j(θ_j).

We can use this form to derive the gradient of this term with respect to the model parameters:

∂_{θ_m} log Z_θ = ∂_{θ_m} log[tr(e^{−K̂_m(θ_m)})] = (1/Z_m(θ_m)) ∂_{θ_m} tr(e^{−K̂_m(θ_m)}) = −(1/Z_m(θ_m)) tr(e^{−K̂_m(θ_m)} ∂_{θ_m} K̂_m(θ_m)).    (B6)

Once again, for simple parameterized modular Hamiltonians of the form K̂_m(θ_m) = θ_m M̂_m, this formula becomes

∂_{θ_m} log Z_θ = −(1/Z_m(θ_m)) tr(e^{−θ_m M̂_m} M̂_m),    (B7)

which is straightforward to evaluate analytically and classically, given knowledge of the spectrum of M̂_m. Technically, there is no need to use the quantum computer to evaluate this part of the gradient.

Thus, we have seen how to take gradients of all terms of the loss function. By simply training on the quantum cross entropy between the data and our model, we can therefore fully train our generative model using gradient-based techniques.

The trace distance is defined to be

T(ρ̂, σ̂) := (1/2) Tr[ √((ρ̂ − σ̂)†(ρ̂ − σ̂)) ],    (C1)

and the generic mixed-state fidelity is calculated as

F(ρ̂, σ̂) := ( Tr[ √( √ρ̂ σ̂ √ρ̂ ) ] )².    (C2)

Appendix D: Investigation of Ansatz Performance

In the case of Gaussian states, we are guaranteed that a factorized latent space solution exists. For qubit systems with inter-qubit coupling, this is not the case. Of course, fully parameterizing both the unitary and the classical distribution should allow for perfect reconstruction, and hybrid quantum-classical algorithms should enable the efficient extraction of typical samples. That said, it is interesting to see just how much representational power the product ansatz actually has when applied to non-Gaussian systems. To this end, we investigate its performance both as a function of temperature and in comparison to the actual performance of the general ansatz, which has a fully parameterizable diagonal distribution. Our simulations are small in scale and restricted in the classes of models considered, but promising nonetheless.

In Figure 9, we show trace distance and fidelity for a one-dimensional Heisenberg model with transverse and longitudinal magnetic fields,

Ĥ = −∑_{⟨ij⟩} Ŝ_i · Ŝ_j + ∑_j (h_x S_j^x + h_z S_j^z),    (D1)

where the presence of perpendicular field terms induces quantum fluctuations. In the limit β → 0, the thermal fluctuations completely dominate the quantum ones, and the resulting thermal state is completely mixed. The essentially perfect reconstruction in that limit, and with little variance, reflects the high approximate degeneracy of the solution space. In the opposite limit, β → ∞, the thermal state approaches the ground state. Classical correlations vanish and the state becomes pure. In this limit, we achieve asymptotically ideal reconstruction with high probability, but occasionally the variational procedure gets stuck in local minima, resulting in the high variance around the mean.

In the intermediate regime, both quantum and classical correlations are important, and the lack of generality of the factorized latent space results in slight performance losses with respect to a generic ansatz. For large systems, the fidelity and trace distance will not necessarily perform well on models trained to optimize relative entropy.

In the reconstruction tables for fidelity, II(a), and trace distance, II(b), QMHL was performed at inverse temperature β = 1.3, which is in the general region where we
expect the most drastic deviation from the factorized latent space assumption. We see that the performance loss as compared to the generic ansatz is relatively small. Random models were chosen from the class

Ĥ = ∑_{⟨ij⟩} (α_{ij} X̂_i X̂_j + β_{ij} Ŷ_i Ŷ_j + γ_{ij} Ẑ_i Ẑ_j) + ∑_j h_j · Ŝ_j,    (D2)

where α_{ij}, β_{ij}, γ_{ij}, h_j were all sampled independently from the uniform distribution on the interval [−1, 1] ⊂ ℝ, and allowed to differ on each site and edge.

⟨ξ̂_{a_1} ⋯ ξ̂_{a_n}⟩ = ∑_{pairings} ∏_j ⟨ξ̂_{a_j} ξ̂_{b_j}⟩,    (E1)

Sᵀ Ω S = Ω.    (E6)

The uncertainty principle (encoded in the commutation relations) constrains the covariance matrix to Γ_B ≥ iΩ.

In symplectic diagonal form, we now have independent effective oscillators, and the entropy takes the simplified form

S_B(σ) = ∑_j [ ω_j / (e^{ω_j} − 1) − log(1 − e^{−ω_j}) ],    (E7)

where ω_j is the frequency of the jth effective oscillator, and β, ℏ, and k_B are set to unity.
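A small NumPy sketch of our own, evaluating the normal-mode formulas of this appendix, the entropy in Eq. (E7) and the log-partition function in Eq. (E8) below, from a set of effective oscillator frequencies (the frequencies are arbitrary example values):

```python
import numpy as np

def bosonic_thermal_entropy_and_logz(omegas):
    """Entropy (E7) and log-partition function (E8) of decoupled thermal oscillators.

    `omegas` are the effective normal-mode frequencies; beta, hbar and k_B are set
    to one, matching the conventions used in Appendix E.
    """
    omegas = np.asarray(omegas, dtype=float)
    entropy = np.sum(omegas / (np.exp(omegas) - 1.0) - np.log(1.0 - np.exp(-omegas)))
    log_z = np.sum(-omegas / 2.0 - np.log(1.0 - np.exp(-omegas)))
    return entropy, log_z

# Example: three normal modes with arbitrary frequencies.
print(bosonic_thermal_entropy_and_logz([0.5, 1.0, 2.3]))
```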
The logarithm of the partition function also takes a simplified form:

log Z_θ = ∑_j [ −ω_j/2 − log(1 − e^{−ω_j}) ].    (E8)

Appendix F: Gaussian Model Parameterizations

In Section IV, we utilize the Gaussian nature of the problems to better parameterize the unitary transformations. Here, we explain those parameterizations and how one may operationally implement such parameterized transformations on various quantum computing platforms.

1. Fermions

G(φ) = ( cos(φ)  −sin(φ) ; sin(φ)  cos(φ) ).    (F1)

Such a rotation can be generalized to include complex phases, but that is unnecessary when staying within the confines of the special orthogonal group. The space of such orthogonal transformations can be systematically decomposed into sequences of Givens rotations which act on pairs of modes, as detailed in [119].

2. Bosons

By the Bloch-Messiah decomposition [120], any symplectic matrix S of size 2N can be brought into the form

S = O_1 D O_2,    (F2)

where O_1 and O_2 are real orthogonal and symplectic, and D is a diagonal matrix of the form diag(d_1, d_2, ..., d_N, d_1^{−1}, d_2^{−1}, ..., d_N^{−1}).

For classical simulation of QHBM, it is useful to decompose this further. Every real orthogonal symplectic matrix,

O = ( X  Y ; −Y  X ) ∈ ℝ^{2N×2N},

can be associated to a unitary U = X + iY ∈ ℂ^{N×N}. The space of such unitaries is spanned by exponentiating anti-Hermitian matrices. Each Hermitian matrix is described by N(N − 1)/2 complex parameters. Finally, we can generate Hermitian matrices by anti-symmetrizing (with Hermitian conjugate) complex-valued lower-diagonal matrices. We can thus classically parameterize the space of symplectic transformations by reading the sequence of decompositions backwards, starting from lower-diagonal matrices. Altogether, a symplectic matrix for an N-mode system may be described by 2N² − N real parameters.

In quantum optical setups, O_1 and O_2 represent passive transformations, such as those performed by beam splitters and phase shifters, and D can be implemented via a sequence of single-mode squeezers, as explained in [85].

Appendix G: Gaussian Fermionic Quantum Systems

â_k = (1/2)(X̂_k + iŶ_k) Ẑ_1 ⋯ Ẑ_{k−1},
â_k† = (1/2)(X̂_k − iŶ_k) Ẑ_1 ⋯ Ẑ_{k−1},    (G1)

where X̂_k, Ŷ_k, and Ẑ_k are Pauli operators on qubit k. In this language, it is natural to consider the occupation of the kth effective Fermionic mode,

n̂_k ≡ ĉ_k† ĉ_k = (1/2)(1 − Ẑ_k),    (G2)

and we see that we can 'measure' the occupation of the mode just by measuring the Pauli-Z operator on the corresponding qubit.

In making this prescription, we are implicitly defining an ordering on the qubits. For one-dimensional systems, this is natural, and all nearest-neighbor effective Fermionic hopping gates are local in qubit operators:

ĉ_k† ĉ_{k+1} + ĉ_{k+1}† ĉ_k = (1/2)(X̂_k X̂_{k+1} + Ŷ_k Ŷ_{k+1}).    (G3)

For higher-dimensional systems, methods have been developed that handle the extra phases involved.
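Since the paper's numerics use OpenFermion [90], the qubit representations in Eqs. (G1)-(G3) can be checked directly with it; the following short sketch is our own, and the operators chosen are simply the number and hopping terms above.

```python
from openfermion import FermionOperator, jordan_wigner

# Occupation of mode 2, Eq. (G2): expect (1 - Z_2) / 2 under the Jordan-Wigner mapping.
number_op = FermionOperator('2^ 2')
print(jordan_wigner(number_op))

# Nearest-neighbor hopping between modes 2 and 3, Eq. (G3):
# expect (X_2 X_3 + Y_2 Y_3) / 2 (no Z string, since the modes are adjacent).
hopping_op = FermionOperator('2^ 3') + FermionOperator('3^ 2')
print(jordan_wigner(hopping_op))
```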
For each pair of physical Fermionic creation and annihilation operators, â_j, â_j†, we can associate a pair of virtual Majorana modes, ĉ_{2j−1}, ĉ_{2j}, according to

ĉ_{2j−1} = â_j + â_j†,    (G4)
ĉ_{2j} = i(â_j − â_j†).    (G5)

The canonical transformation from physical Fermionic operators to Majorana operators is a local rotation in operator space for each Fermion individually. While the defining equations for the Majoranas are cosmetically identical (up to a multiplicative constant) to those defining the Bosonic conjugate variables p̂ and q̂, these Majorana Fermions satisfy the canonical Fermionic anti-commutation relations, {ĉ_k, ĉ_l} = 2δ_{kl}.

In the Majorana basis, any Fermionic Gaussian Hamiltonian takes the generic form

Ĥ_F = i ∑_{i,j}^{2N_f} h_{ij} ĉ_i ĉ_j + E,    (G6)

where N_f is the total number of Fermions, and E is some constant energy shift associated with the transformation to Majorana modes.

In this basis, we can approach the Fermionic problem in a similar manner to the Bosonic problem: we want to decompose the system (now written in terms of Majoranas) into independent effective Fermionic oscillators. Said in another way, we want to find a set of Fermionic operators {c̃_1, ..., c̃_{2N_f}} and a corresponding set of energies {ε_1, ..., ε_{2N_f}} such that

Ĥ_F = ∑_{j=1}^{2N_f} ε_j c̃_j† c̃_j + const.    (G7)

As mentioned in the main text, this diagonalization can be performed by an orthogonal transformation on the covariance matrix associated with the thermal state. In general, an orthogonal transformation on Fermionic modes must also contain particle-hole transformations, but working in the Majorana basis these are not necessary. A generic orthogonal matrix requires no more than O(N_f²) such Givens rotations.

In one-dimensional systems, the JW transformation makes for a remarkably simple implementation of Givens rotations, including two CNOT gates and one controlled-phase rotation. For higher-dimensional systems, techniques exist for dealing with the phases that arise in the JW mapping. Moreover, Givens rotations are incredibly close to the physical operations performed on real quantum devices.

Finally, the derivative of a Givens rotation is analytically

G′(φ) = G(φ + π/2),    (G8)

meaning that gradients can be computed easily using exact methods, by physically measuring expectation values with parameters shifted by π/2.

Appendix H: D-wave Superconductivity

The mean-field BCS Hamiltonian has become a paradigmatic phenomenological model in the study of superconductors. The generic model is described by the Hamiltonian

Ĥ_{d_{x²−y²}} = −t ∑_{⟨i,j⟩,σ} (â_{i,σ}† â_{j,σ} + â_{j,σ}† â_{i,σ}) + µ ∑_{i,σ} â_{i,σ}† â_{i,σ} − ∑_{⟨i,j⟩} ∆_{ij} (â_{i,↑}† â_{j,↓}† − â_{i,↓}† â_{j,↑}† + â_{j,↓} â_{i,↑} − â_{j,↑} â_{i,↓}),    (H1)

defined for Fermions (electrons) on a two-dimensional lattice of size N_x by N_y sites, where â_j (â_j†) is a Fermionic annihilation (creation) operator on lattice site j. Each site hosts two spin-orbitals (one for spin up, and one for spin down), giving a total of 2 × N_x × N_y Fermions. Here, µ represents a chemical potential that sets the filling density, t denotes the nearest-neighbor hopping strength, and ∆_{ij} parameterizes the superconducting gap (less formally, it incentivizes the formation of Cooper pairs).