Quantum Hamiltonian-Based Models & The Variational Quantum Thermalizer Algorithm
quantum processors. In addition, we introduce the Variational Quantum Thermalizer (VQT) for gen-
erating the thermal state of a given Hamiltonian and target temperature, a task for which QHBMs
are naturally well-suited. The VQT can be seen as a generalization of the Variational Quantum
Eigensolver (VQE) to thermal states: we show that the VQT converges to the VQE in the zero
temperature limit. We provide numerical results demonstrating the efficacy of these techniques in
illustrative examples. We use QHBMs and the VQT on Heisenberg spin systems, we apply QHBMs
to learn entanglement Hamiltonians and compression codes in simulated free Bosonic systems, and
finally we use the VQT to prepare thermal Fermionic Gaussian states for quantum simulation.
We will leverage this in our training of the QHBM, where efficient computation or estimation of the latent-state entropy will give us knowledge of the visible-state entropy.

B. Structure of the Latent Space

In this section we discuss the form of the latent model ρ̂_θ. We consider a good choice of a latent space

where Z_{θ_j} = tr[e^{−K̂_j(θ_j)}] is the jth subsystem partition function, and K̂_j(θ_j) the jth subsystem modular Hamiltonian. This sum decomposition becomes useful when estimating expectation values of the modular Hamiltonian, as it becomes a sum of expectation values of the subsystems' modular Hamiltonians,

⟨K̂_θ⟩ = ∑_j ⟨K̂_j(θ_j)⟩.    (7)
From the above expression, we see that the partition function is a product of the subsystem partition functions, Z_θ = ∏_j Z_{θ_j}, and hence the logarithm of the partition function becomes a sum:

log(Z_θ) = ∑_{j=1}^{N} log(Z_{θ_j}).    (8)

Furthermore, the entropy of the latent state (and hence of the visible state, via equation (4)) becomes additive over the entropies of the subsystems:

S(ρ̂_θ) = ∑_{j=1}^{N} S(ρ̂_j(θ_j)).    (9)

This is convenient, as estimating N entropies of states in a d-dimensional Hilbert space is generally much simpler than computing the entropy of a state in a d^N-dimensional space.

Another feature of a factorized state is that the number of parameters used to describe such a distribution is linear in the number of subsystems N. The precise number of parameters depends on the structure of the states within each subsystem. There are many possibilities, and we present some concrete examples below.

Finally, by learning a completely decorrelated (in terms of both entanglement and classical correlations) representation in latent space, we are effectively learning a representation which has a natural orthogonal basis for its latent space, allowing for latent-space interpolation, independent component analysis, compression-code learning, principal component analysis, and many other spin-off applications. In classical machine learning, this is known as a disentangled representation [51], and there have been several recent works addressing this machine learning task [52].

general N-outcome multinoulli [29] distribution. The total number of parameters for such a latent-space parameterization with factorized subsystems scales as the sum of the subsystem dimensions, ∑_{j=1}^{N} d_j, which is much smaller than the most general (non-factorized) latent distribution, whose number of parameters scales as the product ∏_{j=1}^{N} d_j, which is the total dimension of the system's Hilbert space.

The second type of modular Hamiltonian we use, in Section IV B, is the number operator of a continuous-variable quantum mode (qumode), or harmonic oscillator,

K̂_j(θ_j) = (θ_j/2)(x̂² + p̂²) = θ_j (â_j† â_j + 1/2).    (11)

The exponential distribution of such modular Hamiltonians then becomes a single-mode thermal state [55], which has a Wigner phase-space representation as a symmetric Gaussian. The single parameter per mode, θ_j, modulates the variance of the Gaussian. This single-mode thermal state is the closest thing we have to a latent product of Gaussians, which is the standard choice of latent distribution in several variational inference models in classical machine learning, including variational autoencoders [30]. Gaussian states of such quantum modes are very natural to prepare on continuous-variable quantum computers [56], and are also emulatable on digital quantum computers [13].

Finally, as will be explored in Section IV C, our third type of latent subsystem modular Hamiltonian is the particle number operator for Fermions, K̂_j(θ_j) = θ_j ĉ_j† ĉ_j.
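To make the bookkeeping of Eqs. (8) and (9) concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of a factorized latent distribution built from single-parameter subsystem modular Hamiltonians of the form K̂_j(θ_j) = θ_j n̂_j; the subsystem dimensions and parameter values are arbitrary choices for the example.

```python
import numpy as np

def subsystem_quantities(theta_j, dim_j):
    """Spectrum-level quantities for one subsystem with K_j = theta_j * diag(0, ..., dim_j - 1)."""
    energies = theta_j * np.arange(dim_j)        # eigenvalues of the subsystem modular Hamiltonian
    z_j = np.sum(np.exp(-energies))              # subsystem partition function Z_{theta_j}
    probs = np.exp(-energies) / z_j              # exponential (thermal) distribution of K_j
    s_j = -np.sum(probs * np.log(probs))         # subsystem entropy S(rho_j)
    return np.log(z_j), s_j

# A factorized latent space over three subsystems (dimensions chosen arbitrarily).
thetas = [0.7, 1.3, 0.4]
dims = [2, 3, 2]

# Eq. (8): log Z is the sum of subsystem log-partition functions.
log_z_total = sum(subsystem_quantities(t, d)[0] for t, d in zip(thetas, dims))
# Eq. (9): the latent (and hence visible) entropy is the sum of subsystem entropies.
entropy_total = sum(subsystem_quantities(t, d)[1] for t, d in zip(thetas, dims))

print(log_z_total, entropy_total)
```

The point of the factorization is visible in the code: each quantity is a sum of O(d_j)-sized computations rather than a computation over the exponentially large global Hilbert space.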
3. General Classical Latent Space Models

In the general case, the latent state takes the form ρ̂_θ = ∑_{x∈Ω} p_θ(x)|x⟩⟨x|, where the summation is formal and runs over the index set Ω of some basis in Hilbert space, which we call the computational basis. Here the probability distribution p_θ(x) is generally a parameterized class of variational probability distributions over the computational basis domain Ω. Since a general categorical (multinoulli) distribution would need a number of parameters which scales as |Ω|, the dimension of the global Hilbert space, which is generally exponentially large (e.g., |Ω| = 2^N for N qubits), one needs a more efficient parameterization of the latent distribution. This is where classical algorithms for probabilistic and variational inference come into play.

The classical computer is thus tasked with the variational learning of this classical distribution. There are a plethora of techniques for variational inference to choose from, as were listed above. The key features we need from such classical algorithms are the ability to estimate the log-likelihood of the model distribution, the gradients of the (log) partition function, and the entropy. These quantities become useful in various generative learning tasks; see Section III for more details.

The most natural fit for our needs are modern variants of classical energy-based models (EBMs) [49]. In EBMs, a neural network E_θ : Ω → ℝ parameterizes the energy map from the computational basis to a real value, i.e., for any value x ∈ Ω it can produce the corresponding energy E_θ(x). Furthermore, due to the easy differentiability of neural networks, one can readily compute gradients of the energy function, ∇_θ E_θ(x) and ∇_x E_θ(x). To leverage these gradients, one uses Langevin dynamics or Hamiltonian Monte Carlo to generate samples x ∼ p_θ(x) according to the exponential distribution of the energy function,

p_θ(x) = (1/Z_θ) e^{−E_θ(x)},    (13)

where Z_θ ≡ ∑_{x∈Ω} e^{−E_θ(x)} is the partition function. As a straightforward feature of this approach, given an ensemble of points sampled from a different distribution x ∼ q(x), one can use the neural network to evaluate expectation values of the energy with respect to this distribution, E_{x∼q(x)}[E_θ(x)]. Additionally, this approach can provide an estimate and/or gradients of the entropy of the distribution, S(p_θ) = −∑_{x∈Ω} p_θ(x) log p_θ(x).

Given the abilities of the classical model above, one can use the classical algorithm to generate the latent mixed state ρ̂(θ) = ∑_{x∈Ω} p_θ(x)|x⟩⟨x| by sampling x ∼ p_θ and preparing the computational basis state |x⟩⟨x| after each sample. Furthermore, we can define the modular Hamiltonian for this latent variational model as

K̂_θ = ∑_{x∈Ω} E_θ(x)|x⟩⟨x|.    (14)

Thus one can estimate expectation values of this operator by feeding standard-basis measurement results through the neural network function E_θ(x) and averaging. We have now outlined all the ingredients needed for inference and training of this type of hybrid QHBM.

Generally, given a target mixed state σ̂_D which we want to generatively model, we know that there exists a diagonal representation of this mixed state:

∀ σ̂_D, ∃ Ŵ_D :  Ŵ_D† D̂_D Ŵ_D = σ̂_D,   D̂_D = ∑_x λ_x |x⟩⟨x|,    (15)

where tr(D̂_D) = 1 and Ŵ_D is a unitary operator. Thus the approach outlined above has, in principle, the capacity to represent any mixed state. The challenge remains to pick a proper prior for both the unitary quantum neural network Û(φ) and for the parameterization of the classical latent distribution p_θ(x) (or equivalently, the latent energy function E_θ(x)). A good ansatz is one which uses knowledge about the physics of the system to form a prior for both the latent space and the unitary transformation. Although it may be tempting to use a general multinoulli distribution for the latent distribution combined with a universal random quantum neural network for the unitary ansatz, the model capacity in this case is far too large. One encounters not only exponential overhead for parameter estimation of the high-dimensional multinoulli distribution, but also the quantum version of the no-free-lunch theorem [4]. In Section IV, we see several scenarios where the structure of the latent space and the unitary transformation are well-adapted to the complexity of the situation at hand.

III. QUANTUM GENERATIVE MODELLING OF MIXED & THERMAL STATES

In this section, we introduce two types of generative tasks for quantum Hamiltonian-based models which are also valid for more general quantum-probabilistic models. The first task is quantum modular Hamiltonian learning (QMHL), where one effectively learns the logarithm (modular Hamiltonian) of an unknown data mixed state. Once trained, one is then able to reproduce copies of this unknown mixed state via an exponential (thermal) distribution of the learned modular Hamiltonian. The second task is dual to the first: given a known Hamiltonian, learn to produce copies of the thermal state of this Hamiltonian. While both tasks can be considered part of the broad class of generative modelling, the first task is a quantum machine learning task, while the second may be considered a quantum simulation task.

As we have already detailed how to construct several variants of the QHBM with a variety of latent space structures in Section II — including details on how to compute expectations of the modular Hamiltonian, generate samples of the latent state, and compute quantities such as the partition function and entropy of the model — we will focus on the general framework of QMHL and the VQT without too many specifics about how to compute the various quantities involved. For more details on gradients of the loss functions introduced here, see Appendix B.
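As an illustration of the last point, namely estimating ⟨K̂_θ⟩ by pushing standard-basis measurement outcomes through the energy function and averaging, here is a small self-contained sketch (our own example; the two-layer energy network and the random bitstrings are placeholders, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder energy function E_theta: a tiny two-layer network acting on n-bit strings.
n_bits, hidden = 4, 8
w1 = rng.normal(size=(n_bits, hidden))
w2 = rng.normal(size=(hidden,))

def energy(bitstring):
    """E_theta(x) for a single measurement outcome x (a 0/1 array of length n_bits)."""
    h = np.tanh(bitstring @ w1)
    return float(h @ w2)

# Standard-basis measurement outcomes, e.g. read back from the quantum device after
# applying U^dagger(phi) to copies of the data state (placeholder random samples here).
measurements = rng.integers(0, 2, size=(1000, n_bits))

# Estimate of <K_theta>: average the energies of the observed bitstrings, per Eq. (14).
modular_expectation = np.mean([energy(x) for x in measurements])
print(modular_expectation)
```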
A. Quantum Modular Hamiltonian Learning of Mixed Quantum States

Analogous to the classical case, quantum generative models seek to learn to replicate the statistics and correlation structure of a given quantum data distribution from which we have a finite number of samples. QHBMs make this learning process feasible by positing a form that inherently separates the tasks of learning the classical and the quantum correlations in the learned representation of a quantum distribution.

Typical classical datasets are a set D of samples drawn from some underlying distribution, d ∼ p_D. The sampled dataset distribution is simply the categorical distribution over the datapoints d ∈ D. A quantum dataset can generally be a similar mix of various datapoints, in this case quantum states σ̂_d observed at different frequencies (probabilities p_d). The effective mixed state representing this dataset is then σ̂_D = ∑_{d∈D} p_d σ̂_d, where we obtain the mixed state σ̂_d with probability p_d from our dataset D. As modelling a mixture of states is effectively equivalent to the task of modelling one mixed state, we will focus on simply learning to approximate a single data mixed state σ̂_D using a variational quantum-probabilistic model. We assume that we have access to several copies of this data mixed state directly as quantum data available in quantum memory.

FIG. 2. Information flow for the process of training the QHBM applied to Modular Hamiltonian Learning. Grayscale indicates classical information processing happening on a classical device, while colored registers and operations are stored and executed on the quantum device. The θ parameters determine the latent space distribution and thus the modular Hamiltonian K̂_θ. From the known latent distribution, one can compute estimates of the parameterized partition function log Z_θ on the classical device. One then applies the inverse unitary quantum neural network Û†(φ) and estimates the expectation value of the modular Hamiltonian K̂_θ at the output via several runs of inference on the quantum device and measurement. The partition function and modular expectation are then combined to yield the quantum variational cross entropy loss function (20).

Thus, in complete generality, we consider the problem of learning a mixed state σ̂_D, and let ρ̂_{θφ} be our model's candidate approximation to σ̂_D, where θ, φ are variational parameters. In classical probabilistic machine learning, one minimizes the Kullback-Leibler divergence (relative entropy), an approach known as "expectation propagation" [29]. We directly generalize this method to quantum probabilistic machine learning, now aiming to minimize the quantum relative entropy between the quantum data distribution and our quantum model's candidate distribution, subject to variations of the parameters. Thus we aim to find

argmin_{θ,φ} D(σ̂_D ‖ ρ̂_{θφ}),    (16)

where D(σ̂‖ρ̂) = tr(σ̂ log σ̂) − tr(σ̂ log ρ̂) is the quantum relative entropy. Due to the positivity of relative entropy, the optimum of the above cost function is achieved if and only if the variational distribution of our model is equal to the quantum data distribution:

D(σ̂_D ‖ ρ̂_{θφ}) ≥ 0,   D(σ̂_D ‖ ρ̂_{θφ}) = 0  ⟺  σ̂_D = ρ̂_{θφ}.    (17)

We can use this as a variational principle: the relative entropy is our loss function, and we can find optimal parameters of the model {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_D. As we will see below, the use of QHBMs becomes crucial for the tractability of evaluating the relative entropy, making its use as a loss function possible. Details on the gradient computation of the loss are discussed in Appendix B.

Now, for a target quantum data mixed state σ̂_D and a QHBM variational state ρ̂_{θφ} of the form described in Equation (3), our goal is to minimize the forward relative entropy,

D(σ̂_D ‖ ρ̂_{θφ}) = −S(σ̂_D) + tr(σ̂_D K̂_{θφ}) + log Z_θ.    (18)

Notice that the first term, S(σ̂_D) = −tr(σ̂_D log σ̂_D), the entropy of the data distribution, is independent of our variational parameters and hence, for optimization purposes, irrelevant to include. This is convenient, as the entropy of the dataset is a priori unknown. We can thus use the last two terms as our loss function for QMHL. These terms are known as the quantum cross entropy between our data state and our model,

L_qmhl(θ, φ) ≡ −tr(σ̂_D log ρ̂_{θφ}) = tr(σ̂_D K̂_{θφ}) + log(Z_θ).    (19)

We call this loss function the quantum variational cross entropy loss. The first term can be understood as the expectation value of the model's modular Hamiltonian with respect to the data mixed state. In order to estimate this modular energy expectation, it is first useful to express the modular Hamiltonian of the model in terms of the latent modular Hamiltonian: K̂_{θ,φ} = Û(φ) K̂_θ Û†(φ). Then, by the cyclicity of the trace, we can rewrite the loss as

L_qmhl(θ, φ) = tr([Û†(φ) σ̂_D Û(φ)] K̂_θ) + log(Z_θ).    (20)
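To make Eq. (20) concrete, here is a minimal exact-simulation sketch (our own NumPy illustration with an arbitrary two-qubit example, not the paper's implementation) that evaluates the quantum variational cross entropy for a factorized latent state and a given trial unitary.

```python
import numpy as np
from scipy.linalg import expm

def latent_modular_hamiltonian(thetas):
    """Factorized latent K_theta = sum_j theta_j * |1><1|_j on two qubits."""
    n1, eye = np.diag([0.0, 1.0]), np.eye(2)
    return thetas[0] * np.kron(n1, eye) + thetas[1] * np.kron(eye, n1)

def qmhl_loss(thetas, unitary, sigma_data):
    """Eq. (20): tr([U^dag sigma_D U] K_theta) + log Z_theta."""
    k_latent = latent_modular_hamiltonian(thetas)
    pulled_back = unitary.conj().T @ sigma_data @ unitary          # pulled-back data state
    modular_expectation = np.real(np.trace(pulled_back @ k_latent))
    log_z = sum(np.log(1.0 + np.exp(-t)) for t in thetas)          # additive over subsystems, Eq. (8)
    return modular_expectation + log_z

# Toy data: a thermal-like two-qubit mixed state and an arbitrary real trial unitary.
rng = np.random.default_rng(1)
h_data = rng.normal(size=(4, 4)); h_data = (h_data + h_data.T) / 2
sigma = expm(-h_data); sigma /= np.trace(sigma)

generator = rng.normal(size=(4, 4)); generator = generator - generator.T   # anti-symmetric
u_phi = expm(generator)                                                    # orthogonal (real unitary)

print(qmhl_loss([0.5, 1.2], u_phi, sigma))
```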
To better understand this, we define the pulled-back data state σ̂_{D,φ} ≡ Û†(φ) σ̂_D Û(φ), which is the state obtained by feeding our quantum data through the unitary quantum neural network circuit in reverse (or rather, inverse). Thus, this term in the loss is equivalent to the expectation value of the latent modular Hamiltonian with respect to the pulled-back data, ⟨K̂_θ⟩_{σ̂_{D,φ}}. The flow of quantum and classical information for the task of Modular Hamiltonian Learning is depicted in Figure 2. See Section II for details on estimating the partition function, depending on the chosen latent space structure.

Finally, an interesting feature to note about this choice of cross-entropy loss function is that in the limit where the model approximates the data state, i.e., when the relative entropy in (17) converges to zero, the loss function itself converges to the entropy of the data:

L_qmhl(θ, φ)  →  S(σ̂_D)   as   ρ̂_{θφ} → σ̂_D.    (21)

This means that, as a by-product of the learning process, after convergence of the training of our model, we are automatically provided with an estimate of the entropy of our data distribution simply by observing the value to which the loss has converged. This has wide-ranging implications for the potential use of QHBMs and QMHL, combined with quantum simulation, to estimate entropies and various information-theoretic quantities using quantum computers. More on this in the discussion in Section V.

[Figure: information-flow diagram for the Variational Quantum Thermalizer]

The reason it is called a relative free energy is that it is the difference of free energies, F(ξ̂) ≡ tr(Ĥ ξ̂) − (1/β) S(ξ̂), up to a factor of β, between our ansatz state and the true thermal state,

D(ρ̂_{θφ} ‖ σ̂_β) = β F(ρ̂_{θφ}) − β F(σ̂_β).

Further note that the positivity of relative entropy implies that the minimum of free energy is achieved by the thermal state:

D(ρ̂_{θφ} ‖ σ̂_β) = 0  ⟹  F(ρ̂_{θφ}) = F(σ̂_β)  and  ρ̂_{θφ} = σ̂_β.

Thus, given a variational ansatz for the thermal state, by minimizing the free energy as our loss function,

L_vqt(θ, φ) = β F(ρ̂_{θφ}) = β tr(ρ̂_{θφ} Ĥ) − S(ρ̂_{θφ}),

we find optimal parameters {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_β.
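As an illustration of the VQT objective above, the following is a small exact-simulation sketch (our own NumPy example with an arbitrary two-qubit Hamiltonian, not the paper's code) that evaluates L_vqt(θ, φ) = β tr(ρ̂_{θφ} Ĥ) − S(ρ̂_{θφ}) for a factorized latent state pushed through a trial unitary.

```python
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]]); I = np.eye(2)
kron = np.kron

# Target Hamiltonian (arbitrary two-qubit example) and inverse temperature.
H = kron(Z, Z) + 0.5 * (kron(X, I) + kron(I, X))
beta = 1.3

def model_state(thetas, unitary):
    """rho_{theta,phi} = U rho_latent(theta) U^dag, with a factorized latent state."""
    def qubit_latent(t):                       # single-qubit thermal state of K = t |1><1|
        p1 = np.exp(-t) / (1.0 + np.exp(-t))
        return np.diag([1.0 - p1, p1])
    rho_latent = kron(qubit_latent(thetas[0]), qubit_latent(thetas[1]))
    return unitary @ rho_latent @ unitary.conj().T, rho_latent

def vqt_loss(thetas, unitary):
    rho, rho_latent = model_state(thetas, unitary)
    energy = np.real(np.trace(rho @ H))
    # The entropy is evaluated on the diagonal, factorized latent state (Eq. (9));
    # it equals the entropy of the visible state, since the unitary preserves entropy.
    probs = np.diag(rho_latent)
    entropy = -np.sum(probs * np.log(probs))
    return beta * energy - entropy

u_phi = expm(1j * 0.3 * (kron(X, X) + kron(Z, I)))   # arbitrary trial unitary
print(vqt_loss([0.8, 1.1], u_phi))
```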
of runs required for the Variational Quantum Eigensolver and other variational algorithms whose loss only depends on expectation values [15].

Note that, in the case where our state converges to the true thermal state upon convergence of the training, the value of our loss function gives us the free energy of the thermal state, which is proportional to the logarithm of the thermal partition function:

L_vqt(θ, φ)  →  β F(σ̂_β) = −log Z_β   as   ρ̂_{θφ} → σ̂_β.

This is effectively a variational free energy principle, and the basis of the VQT.

As a final note, let us see how we recover the Variational Quantum Eigensolver and its variational principle in the limit of low temperature. If we divide our loss function by β, i.e., directly minimize the free energy, L̃(θ, φ) = (1/β) L_vqt(θ, φ) = F(ρ̂_{θφ}), then in the limit of zero temperature,

L̃(θ, φ)  →  ⟨Ĥ⟩_{θφ}   as   β → ∞,

we recover the loss function for the ground-state Variational Quantum Eigensolver (VQE), and thus we recover the ground-state variational principle. Note that in the limit of zero temperature there is no need for latent-space parameters, as there is no entropy and the latent state is unitarily equivalent to any starting pure state. Thus, the VQT is truly the most natural generalization of the VQE to non-zero temperature states.

IV. APPLICATIONS & EXPERIMENTS

Our framework is general enough to apply in many situations of interest in quantum computing [57], quantum communication [58], quantum sensing [59], and quantum simulation [60–75]. In this section we focus on applications to quantum simulation and quantum communication, which illustrate the QHBM framework for spin/qubit, Bosonic, and Fermionic systems.

In the Bosonic and Fermionic examples, we restrict ourselves to Gaussian states for two reasons: first, these highlight the ways in which our framework is well-suited to utilizing problem structure; and second, classical methods exist for simulating quantum systems restricted to these types of states, so they serve as a verification and benchmark of our methods. This allows us to reach system sizes which would not be simulable on classical computers were we to simulate the wavefunction in the Hilbert space directly. However, we emphasize that Quantum Hamiltonian-Based Models apply to a much broader class of quantum states.

Gaussian states are the class of thermal (and ground) states of all Hamiltonians that are quadratic in creation and annihilation operators (in second quantization). On the Fermionic side, this class contains a variety of tight-binding models, including topological models like the Su-Schrieffer-Heeger (SSH) model [76–78], and mean-field BCS superconducting states. Bosonic Gaussian states include coherent and squeezed states, and already enable (when coupled with measurement) a plethora of applications in quantum communication, including quantum teleportation and quantum key distribution. In addition, Gaussian states are interesting in their own right, and play a significant role in quantum information theory [79–81], as well as in quantum field theory in curved space-time [82] and quantum cosmology [83].

Gaussian states are special in that our latent-space factorization postulate is justified physically in addition to holding numerically. As a consequence of Williamson's theorem [84], we are guaranteed that for any thermal state there exists a Gaussian transformation that decouples all subsystems in latent space (a procedure often referred to as normal mode decomposition [85]). For Gaussian systems, we can also work directly with 2N × 2N covariance matrices rather than density matrices.

A. Variational Quantum Thermalizer for the 2D Heisenberg Model

As a demonstration of the effectiveness of the VQT for spin systems (qubits), we learn a thermal state of the two-dimensional Heisenberg model, which exhibits spin frustration despite its simplicity. To be precise, we consider the model

Ĥ_heis = J_h ∑_{⟨ij⟩_h} Ŝ_i · Ŝ_j + J_v ∑_{⟨ij⟩_v} Ŝ_i · Ŝ_j,    (24)

where h (v) denotes horizontal (vertical) bonds, and ⟨·⟩ represents nearest-neighbor pairs. As laid out in Section II, we use a factorized latent-space distribution. Even though we are considering a two-dimensional model, we simulate the VQT using an imagined one-dimensional array of qubits. As such, we parameterize our quantum neural network with only single-qubit rotations and two-qubit rotations between adjacent qubits in our simulated one-dimensional quantum computer.

In particular, our parameterized unitary is composed of three layers of gates. Each layer consists of an arbitrary single-qubit rotation of the form exp[i(φ_{1j} X̂_j + φ_{2j} Ŷ_j + φ_{3j} Ẑ_j)] applied to each qubit, and a two-qubit rotation exp[i(φ_{4j} X̂_j X̂_{j+1} + φ_{5j} Ŷ_j Ŷ_{j+1} + φ_{6j} Ẑ_j Ẑ_{j+1})] applied to a complete set of neighboring qubits. The neighbors acted upon by two-qubit gates are staggered in sequential layers. No parameters are shared within or between layers. So in the first layer the first and second qubits, third and fourth, and so on, are coupled, while in the second layer it is the second and third qubits, fourth and fifth, etc., which are coupled. In addition, we do not assume periodic boundary conditions for our qubits, and as such do not include any gates between the first and last qubits.

While we use the relative entropy as our cost function, we also look at the trace distance and state fidelity of the
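The following is a minimal Cirq sketch of the layered ansatz described above (our own illustration under the stated structure; the qubit count, the random parameter values, and the use of separate X/Y/Z rotations in place of the simultaneous single-qubit exponential are simplifying assumptions, not the paper's exact implementation).

```python
import numpy as np
import cirq

def ansatz(qubits, params, layers=3):
    """Layered ansatz: single-qubit rotations plus staggered two-qubit XX/YY/ZZ rotations."""
    circuit = cirq.Circuit()
    p = iter(params)
    for layer in range(layers):
        # Single-qubit rotations (compiled here as separate X, Y and Z rotations).
        for q in qubits:
            circuit.append([cirq.rx(next(p))(q), cirq.ry(next(p))(q), cirq.rz(next(p))(q)])
        # Two-qubit rotations on staggered neighboring pairs, open boundary conditions.
        start = layer % 2
        for i in range(start, len(qubits) - 1, 2):
            a, b = qubits[i], qubits[i + 1]
            circuit.append([
                # Exponent t corresponds (up to global phase and normalization) to a rotation angle t*pi.
                cirq.XXPowGate(exponent=next(p) / np.pi)(a, b),
                cirq.YYPowGate(exponent=next(p) / np.pi)(a, b),
                cirq.ZZPowGate(exponent=next(p) / np.pi)(a, b),
            ])
    return circuit

qubits = cirq.LineQubit.range(4)
n_params = 3 * (3 * len(qubits) + 3 * (len(qubits) - 1))   # upper bound on parameter count
rng = np.random.default_rng(0)
circuit = ansatz(qubits, rng.uniform(0, 2 * np.pi, n_params))
print(circuit)
```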
diagonalizes Γ_B,

S Γ_B Sᵀ = ⊕_{j=1}^{N_b} diag(ν_j, ν_j).    (26)

used to model impurities in metals [91, 92], which are key to understanding much of the collective phenomena in strongly-correlated materials [93–95].

In analogy with the Bosonic Gaussian case considered above, we remind the reader that for Fermionic Gaussian systems, the real anti-symmetric covariance matrix Γ_F associated to a mixed state ρ̂ is given by the matrix elements

Γ_F^{ab} = (i/2) Tr(ρ̂ [ξ̂^a, ξ̂^b]).    (28)

[Figure: training loss versus target value, with training snapshots]
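As a side illustration of the symplectic diagonalization in Eq. (26), the symplectic eigenvalues ν_j of a Bosonic covariance matrix can be obtained classically from the ordinary eigenvalues of iΩΓ_B, which come in pairs ±ν_j. The sketch below is our own example (a single thermal mode with an arbitrary mean occupation, in the ħ = 1 convention where the vacuum covariance is the identity), and is one way to carry out the normal-mode decomposition mentioned in Section IV.

```python
import numpy as np

def symplectic_eigenvalues(gamma):
    """Symplectic eigenvalues nu_j of a 2N x 2N Bosonic covariance matrix (x1..xN, p1..pN ordering)."""
    n = gamma.shape[0] // 2
    omega = np.block([[np.zeros((n, n)), np.eye(n)],
                      [-np.eye(n), np.zeros((n, n))]])      # symplectic form Omega
    eigs = np.linalg.eigvals(1j * omega @ gamma)             # eigenvalues come in pairs +/- nu_j
    nus = np.sort(np.abs(eigs.real))
    return nus[n:]                                            # keep one copy of each pair

# Example: a single thermal mode with mean occupation n_bar has Gamma = (2*n_bar + 1) * I.
n_bar = 0.35
print(symplectic_eigenvalues((2 * n_bar + 1) * np.eye(2)))   # ~ [1.7]
```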
physical quantum systems [97], this novel application of quantum-classical hybrid computing could be quite significant.

Latent space factorization, while not obligatory for QHBM, leads to widespread applications based on learning decorrelated (factorized) quantum representations, akin to classical disentangled representations [51]. In the context of unsupervised learning, learning a disentangled latent space representation could be useful for tasks such as latent space interpolation [98] and latent space vector arithmetic, quantum principal component analysis [99], clustering [100], and more generally entropy-based quantum compression, as demonstrated in Section IV B. Such compression could also be used for unsupervised pre-training in discriminative learning [101]. Progressively learning to compress layer-wise while growing the depth of the unitary quantum neural network would yield a procedure akin to layer-wise pretraining in classical machine learning [101]. Compression could also be used for denoising quantum data, something previous approaches to quantum autoencoders were not suitable for [102].

As for the case of using a diagonal latent space Hamiltonian for QMHL, we are then effectively learning the eigenbasis of the data density matrix. The great advantage over previously proposed variational quantum algorithms for quantum state diagonalization is that our method employs the relative entropy rather than a Hilbert-Schmidt metric, thus allowing us to learn to diagonalize much higher-rank quantum states. Furthermore, learning to diagonalize a quantum density matrix is essentially related to the Quantum Principal Component Analysis algorithm [103] for classical data, and other related quantum machine learning algorithms [8, 104]. Our approach could be seen as a variational alternative for these algorithms, circumventing the need for a very long quantum circuit for the quantum state exponentiation [105], which has been deemed intractable even for far-term quantum computers when compiled [106]. This does not remove the requirement of the state preparation (which requires QRAM or a special oracle [107]). Though, in the case of quantum data, this method does not need access to such exotic components, and has the potential to demonstrate a quantum advantage for learning the unitary which diagonalizes either a quantum Hamiltonian or a quantum density matrix.

Future iterations on this work could modify the loss function, potentially introducing a quantum variant of the Evidence Lower Bound (ELBO) for the quantum relative entropy loss. This would, in turn, yield a quantum form of Variational Autoencoders (VAEs) [30]. Both QHBMs and VAEs could potentially be used to learn an effective mixed state representation of an unknown quantum state from partial quantum tomographic data [108]. We aim to tackle this Quantum Neural State Tomography with QHBMs in future work.

In a near-term future iteration, we plan to fully combine classical neural network-based Energy-Based Models [49] with our QHBM. Recent papers have demonstrated the power of hybridized quantum-classical neural networks trained using hybrid quantum-classical backpropagation, originally introduced in [13], then later iterated upon in [109]. Given previous successes in combining classical differentiable models and quantum variational circuits, we anticipate this implementation to be straightforward given the framework laid out in this paper. Since such a full hybridization would allow us to go beyond full-rank multinoulli estimation for the general classical latent distribution QHBM, this would be the piece which would unlock the scalability of this algorithm to highly non-trivial, large-scale quantum systems. We plan to explore this in upcoming work.

QHBM present several key advantages relative to other forms of unsupervised quantum learning with quantum neural networks, such as quantum GANs [110]. GANs (both quantum and classical) are notoriously difficult to train, and once trained, difficult to extract physical quantities from. QHBM represent physical quantities more directly, and as such are much more suitable for applications that involve physical quantum data. From our preliminary numerics, our framework also appears to train very robustly and with few iterations. Furthermore, QHBM require less quantum circuit depth during training than quantum GANs, which require both a quantum generator and a quantum discriminator. The latter is a key consideration for possible implementation on near-term intermediate-scale quantum devices [2].

A key point is that in Quantum Modular Hamiltonian Learning (QMHL) and the VQT, we are not just learning a distribution for the modular Hamiltonian and thermal state, respectively. Rather, we are learning efficient approximate parameterizations of these quantities with our quantum circuit. In the context of the VQT, this gives us the ability to directly prepare a thermal state on our quantum computer using polynomial resources, from knowledge of a corresponding Hamiltonian. For Modular Hamiltonian Learning, once an efficient ansatz has learned the optimal value of parameters such that its output approximates our data mixed state, this gives us the ability to reproduce as many copies of the learned mixed state distribution as desired. In addition, the modular Hamiltonian itself provides invaluable information related to topological properties, thermalization, and non-equilibrium dynamics.

In particular, Modular Hamiltonian Learning gives one access to the eigenvalues of the density matrix and the unitary that diagonalizes the Modular Hamiltonian. Applying the QNN to a quantum state brings it into the eigenbasis of the Modular Hamiltonian. In this basis, an exponentiation of the diagonal latent modular Hamiltonian (of which we have a classical description) implements modular time evolution. We can apply the inverse QNN to return to the original computational basis.

In the same vein, QMHL provides the ability to probe a system at different temperatures, something that mixed state learning on its own does not. Given access to samples from a thermal state at some temperature, one can
thus generate typical samples from the same system at another temperature by learning the modular Hamiltonian and systematically changing the latent space parameters.

Finally, we underscore the generality of our framework. Just as classical EBMs are inspired by physics but can be applied to problems far abstracted from real physical systems, so too can our QHBM be employed for any task which involves learning a quantum distribution.

VI. ACKNOWLEDGEMENTS

Numerical experiments in this paper were executed using a custom combination of Cirq [111], TensorFlow [112], and OpenFermion [90]. GV, JM, and SN would like to thank the team at X for the hospitality and support during their respective Quantum@X residencies, where this work was completed. X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company). GV acknowledges funding from NSERC.
[1] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, classical algorithms, New Journal of Physics 18, 023023
N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and (2016).
H. Neven, Characterizing quantum supremacy in near- [16] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-
term devices, Nature Physics 14, 595 (2018). Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. Obrien,
[2] J. Preskill, Quantum computing in the nisq era and be- A variational eigenvalue solver on a photonic quantum
yond, Quantum 2, 79 (2018). processor, Nature communications 5, 4213 (2014).
[3] E. Farhi and H. Neven, Classification with quantum [17] A. Kandala, A. Mezzacapo, K. Temme, M. Takita,
neural networks on near term processors, arXiv preprint M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-
arXiv:1802.06002 (2018). efficient variational quantum eigensolver for small
[4] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Bab- molecules and quantum magnets, Nature 549, 242
bush, and H. Neven, Barren plateaus in quantum neural (2017).
network training landscapes, Nature communications 9, [18] O. Higgott, D. Wang, and S. Brierley, Variational quan-
4812 (2018). tum computation of excited states, Quantum 3, 156
[5] G. Verdon, M. Broughton, and J. Biamonte, A quantum (2019).
algorithm to train neural networks using low-depth cir- [19] J. R. McClean, Z. Jiang, N. Rubin, R. Babbush, and
cuits, arXiv preprint arXiv:1712.05304 (2017). H. Neven, Decoding errors using quantum subspace ex-
[6] P.-L. Dallaire-Demers and N. Killoran, Quantum gen- pansions, (2019).
erative adversarial networks, Physical Review A 98, [20] R. M. Parrish, E. G. Hohenstein, P. L. McMahon,
012324 (2018). and T. J. Martı́nez, Quantum computation of elec-
[7] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fu- tronic transitions using a variational quantum eigen-
jii, Quantum circuit learning, Physical Review A 98, solver, Physical Review Letters 122, 230401 (2019).
032309 (2018). [21] E. Farhi, J. Goldstone, and S. Gutmann, A quantum
[8] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, approximate optimization algorithm, arXiv preprint
N. Wiebe, and S. Lloyd, Quantum machine learning, arXiv:1411.4028 (2014).
Nature 549, 195 (2017). [22] E. Farhi and A. W. Harrow, Quantum supremacy
[9] G. Verdon, M. Broughton, J. R. McClean, K. J. through the quantum approximate optimization algo-
Sung, R. Babbush, Z. Jiang, H. Neven, and rithm, arXiv preprint arXiv:1602.07674 (2016).
M. Mohseni, Learning to learn with quantum neural [23] Z. Jiang, E. Rieffel, and Z. Wang, A qaoa-inspired cir-
networks via classical neural networks, arXiv preprint cuit for grover’s unstructured search using a transverse
arXiv:1907.05415 (2019). field, arXiv preprint arXiv:1702.0257 (2017).
[10] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. [24] G. Verdon, J. M. Arrazola, K. Brádler, and N. Killo-
Sornborger, and P. J. Coles, Quantum-assisted quantum ran, A quantum approximate optimization algorithm for
compiling, Quantum 3, 140 (2019). continuous problems, arXiv preprint arXiv:1902.00409
[11] R. LaRose, A. Tikku, É. ONeel-Judy, L. Cincio, and (2019).
P. J. Coles, Variational quantum state diagonalization, [25] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel,
npj Quantum Information 5, 8 (2019). xy-mixers: analytical and numerical results for qaoa,
[12] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe, arXiv preprint arXiv:1904.09314 (2019).
Circuit-centric quantum classifiers, arXiv preprint [26] F. G. Brandao, M. Broughton, E. Farhi, S. Gutmann,
arXiv:1804.00633 (2018). and H. Neven, For fixed control parameters the quan-
[13] G. Verdon, J. Pye, and M. Broughton, A universal train- tum approximate optimization algorithm’s objective
ing algorithm for quantum deep learning, arXiv preprint function value concentrates for typical instances, arXiv
arXiv:1806.09729 (2018). preprint arXiv:1812.04170 (2018).
[14] M. Benedetti, E. Lloyd, and S. Sack, Parameterized [27] Z. Jiang, E. G. Rieffel, and Z. Wang, Near-optimal
quantum circuits as machine learning models, arXiv quantum circuit for grover’s unstructured search using
preprint arXiv:1906.07682 (2019). a transverse field, Physical Review A 95, 062317 (2017).
[15] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru- [28] L. Li, M. Fan, M. Coram, P. Riley, and S. Leichenauer,
Guzik, The theory of variational hybrid quantum- Quantum optimization with a novel gibbs objective
function and ansatz architecture search, arXiv preprint [46] I. Goodfellow J., J. Pouget-Abadie, M. Mirza, B. Xu,
arXiv:1909.07621 (2019). D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio,
[29] K. P. Murphy, Machine learning: a probabilistic perspec- Generative adversarial nets, in Proceedings of the Inter-
tive (MIT press, 2012). national Conference on Neural Information Processing
[30] D. P. Kingma and M. Welling, Auto-encoding varia- Systems (2014) p. 26722680.
tional bayes, arXiv preprint arXiv:1312.6114 (2013). [47] A. Razavi, A. v. d. Oord, and O. Vinyals, Generating di-
[31] R. M. Neal, Probabilistic inference using Markov chain verse high-fidelity images with vq-vae-2, arXiv preprint
Monte Carlo methods (Department of Computer Sci- arXiv:1906.00446 (2019).
ence, University of Toronto Toronto, Ontario, Canada, [48] A. Brock, J. Donahue, and K. Simonyan, Large scale
1993). gan training for high fidelity natural image synthesis,
[32] M. Betancourt, A conceptual introduction to hamil- arXiv preprint arXiv:1809.11096 (2018).
tonian monte carlo, arXiv preprint arXiv:1701.02434 [49] Y. Du and I. Mordatch, Implicit generation and gen-
(2017). eralization in energy-based models, arXiv preprint
[33] D. J. Rezende and S. Mohamed, Variational inference arXiv:1903.08689 (2019).
with normalizing flows, arXiv preprint arXiv:1505.05770 [50] M. Welling and Y. Whye Teh, Bayesian learning via
(2015). stochastic gradient langevin dynamics, in ICML’11 Pro-
[34] C. Louizos and M. Welling, Multiplicative normalizing ceedings of the 28th International Conference on In-
flows for variational bayesian neural networks, in Pro- ternational Conference on Machine Learning (2011) p.
ceedings of the 34th International Conference on Ma- 681688.
chine Learning-Volume 70 (JMLR. org, 2017) pp. 2218– [51] Y. Bengio, A. Courville, and P. Vincent, Representation
2227. learning: A review and new perspectives, IEEE trans-
[35] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. actions on pattern analysis and machine intelligence 35,
Duvenaud, Neural ordinary differential equations, in 1798 (2013).
Advances in neural information processing systems [52] C. P. Burgess, I. Higgins, A. Pal, L. Matthey,
(2018) pp. 6571–6583. N. Watters, G. Desjardins, and A. Lerchner, Un-
[36] M. M. Wilde, Quantum information theory (Cambridge derstanding disentangling in beta-vae, arXiv preprint
University Press, 2013). arXiv:1804.03599 (2018).
[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, [53] J. Marks, T. Jochym-O’Connor, and V. Gheorghiu,
nature 521, 436 (2015). Comparison of memory thresholds for planar qudit ge-
[38] J. Schmidhuber, Deep learning in neural networks: An ometries, New Journal of Physics 19, 113022 (2017).
overview, Neural networks 61, 85 (2015). [54] D. Koller, U. Lerner, and D. Angelov, A general algo-
[39] W. J. Huggins, P. Patil, B. Mitchell, K. B. Whaley, rithm for approximate inference and its application to
and M. Stoudenmire, Towards quantum machine learn- hybrid bayes nets, in Proceedings of the Fifteenth con-
ing with tensor networks, Quantum Science and tech- ference on Uncertainty in artificial intelligence (Morgan
nology (2018). Kaufmann Publishers Inc., 1999) pp. 324–333.
[40] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, [55] A. Serafini, Quantum Continuous Variables: A Primer
B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Le- of Theoretical Methods (CRC Press, 2017).
ichenauer, Tensornetwork: A library for physics and ma- [56] S. Lloyd and S. L. Braunstein, Quantum computation
chine learning, arXiv preprint arXiv:1905.01330 (2019). over continuous variables, in Quantum Information with
[41] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, A Continuous Variables (Springer, 1999) pp. 9–17.
theoretical framework for back-propagation, in Proceed- [57] M. A. Nielsen and I. Chuang, Quantum computation
ings of the 1988 connectionist models summer school, and quantum information (2002).
Vol. 1 (CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988) [58] N. Gisin and R. Thew, Quantum communication, Na-
pp. 21–28. ture photonics 1, 165 (2007).
[42] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, [59] C. L. Degen, F. Reinhard, and P. Cappellaro, Quantum
M. Hasegawa-Johnson, and M. N. Do, Semantic image sensing, Reviews of modern physics 89, 035002 (2017).
inpainting with deep generative models, in Proceedings [60] I. Kassal, S. P. Jordan, P. J. Love, M. Mohseni, and
of the IEEE Conference on Computer Vision and Pat- A. Aspuru-Guzik, Polynomial-time quantum algorithm
tern Recognition (2017) pp. 5485–5493. for the simulation of chemical dynamics, Proceedings
[43] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Man- of the National Academy of Sciences 105, 1868118686
zagol, Extracting and composing robust features with (2008).
denoising autoencoders, in Proceedings of the 25th inter- [61] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H.
national conference on Machine learning (ACM, 2008) Yung, and A. Aspuru-Guzik, Simulating chemistry us-
pp. 1096–1103. ing quantum computers, Annual Review of Physical
[44] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunning- Chemistry 62, 185207 (2011).
ham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, [62] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik,
et al., Photo-realistic single image super-resolution us- Simulation of electronic structure hamiltonians using
ing a generative adversarial network, in Proceedings of quantum computers, Molecular Physics 109, 735750
the IEEE conference on computer vision and pattern (2011).
recognition (2017) pp. 4681–4690. [63] H. Wang, S. Kais, A. Aspuru-Guzik, and M. R. Hoff-
[45] L. Theis, W. Shi, A. Cunningham, and F. Huszár, mann, Quantum algorithm for obtaining the energy
Lossy image compression with compressive autoen- spectrum of molecular systems, Phys. Chem. Chem.
coders, arXiv preprint arXiv:1703.00395 (2017). Phys. 10, 5388 (2008).
[98] P. Bojanowski, A. Joulin, D. Lopez-Paz, and A. Szlam, [115] E. Campbell, Random compiler for fast hamiltonian
Optimizing the latent space of generative networks, simulation, Physical review letters 123, 070503 (2019).
arXiv preprint arXiv:1707.05776 (2017). [116] M. Abramowitz and I. Stegun, 1965, handbook of math-
[99] S. Wold, K. Esbensen, and P. Geladi, Principal compo- ematical functions, Appl. Math. Ser 55 (2006).
nent analysis, Chemometrics and intelligent laboratory [117] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and
systems 2, 37 (1987). N. Killoran, Evaluating analytic gradients on quantum
[100] T. Hofmann, Unsupervised learning by probabilistic la- hardware, arXiv preprint arXiv:1811.11184 (2018).
tent semantic analysis, Machine learning 42, 177 (2001). [118] A. Harrow and J. Napp, Low-depth gradient mea-
[101] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, surements can improve convergence in variational
P. Vincent, and S. Bengio, Why does unsupervised pre- hybrid quantum-classical algorithms, arXiv preprint
training help deep learning?, Journal of Machine Learn- arXiv:1901.05374 (2019).
ing Research 11, 625 (2010). [119] Z. Jiang, K. J. Sung, K. Kechedzhi, V. N. Smelyanskiy,
[102] J. Romero, J. P. Olson, and A. Aspuru-Guzik, Quantum and S. Boixo, Quantum algorithms to simulate many-
autoencoders for efficient compression of quantum data, body physics of correlated fermions, Phys. Rev. Applied
Quantum Science and Technology 2, 045001 (2017). 9, 044036 (2018).
[103] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum [120] M. A. de Gosson, Symplectic Methods in Harmonic
principal component analysis, Nature Physics 10, 631 Analysis and in Mathematical Physics (Springer, 2011).
(2014). [121] G. Ortiz, J. E. Gubernatis, E. Knill, and R. Laflamme,
[104] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum sup- Quantum algorithms for fermionic simulations, Phys.
port vector machine for big data classification, Physical Rev. A 64, 022319 (2001).
review letters 113, 130503 (2014). [122] J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik,
[105] T. R. Bromley and P. Rebentrost, Batched quantum Simulation of electronic structure hamiltonians us-
state exponentiation and quantum hebbian learning, ing quantum computers, Molecular Physics 109, 735
Quantum Machine Intelligence 1, 31 (2019). (2011).
[106] A. Scherer, B. Valiron, S.-C. Mau, S. Alexander, [123] J. T. Seeley, M. J. Richard, and P. J. Love, The bravyi-
E. Van den Berg, and T. E. Chapuran, Concrete re- kitaev transformation for quantum computation of elec-
source analysis of the quantum linear-system algorithm tronic structure, The Journal of Chemical Physics 137,
used to compute the electromagnetic scattering cross 224109 (2012).
section of a 2d target, Quantum Information Process- [124] S. B. Bravyi and A. Y. Kitaev, Fermionic quantum com-
ing 16, 60 (2017). putation, Annals of Physics 298, 210 (2002).
[107] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum [125] R. C. Ball, Fermions without fermion fields, Phys. Rev.
random access memory, Physical review letters 100, Lett. 95, 176407 (2005).
160501 (2008). [126] F. Verstraete and J. I. Cirac, Mapping local hamiltoni-
[108] G. D’Ariano and P. L. Presti, Quantum tomography for ans of fermions to local hamiltonians of spins, Journal
measuring experimentally the matrix elements of an ar- of Statistical Mechanics: Theory and Experiment 2005,
bitrary quantum operation, Physical review letters 86, P09012 (2005).
4195 (2001). [127] J. D. Whitfield, V. c. v. Havlı́ček, and M. Troyer, Local
[109] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and spin operators for fermion simulations, Phys. Rev. A 94,
N. Killoran, Evaluating analytic gradients on quantum 030301 (2016).
hardware, Physical Review A 99, 032331 (2019).
[110] S. Lloyd and C. Weedbrook, Quantum generative adversarial networks for learning and loading random distributions, arXiv preprint arXiv:1804.09139 (2018).
[111] Cirq: A python framework for creating, editing, and invoking noisy intermediate scale quantum (nisq) circuits, https://github.com/quantumlib/Cirq.
[112] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems (2015), software available from tensorflow.org.
[113] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).
[114] M. Suzuki, Fractal decomposition of exponential operators with applications to many-body theories and monte carlo simulations, Physics Letters A 146, 319 (1990).

Appendix A: Quantum Neural Networks and Gradients

A Quantum Neural Network can generally be written as a product of layers of unitaries in the form

Û(φ) = ∏_{ℓ=1}^{L} V̂^ℓ Û^ℓ(φ^ℓ),    (A1)

where the ℓth layer of the QNN consists of the product of V̂^ℓ, a non-parametric unitary, and Û^ℓ(φ^ℓ), a unitary with (possibly) multiple variational parameters (note that the superscripts here represent indices rather than exponents). The multi-parametric unitary of a given layer can itself generally be comprised of multiple unitaries {Û_j^ℓ(φ_j^ℓ)}_{j=1}^{M_ℓ} applied in parallel:

Û^ℓ(φ^ℓ) ≡ ⨂_{j=1}^{M_ℓ} Û_j^ℓ(φ_j^ℓ).    (A2)

Here I_ℓ represents the set of indices corresponding to the ℓth layer.
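As a concrete, simplified illustration of the layered structure in Eqs. (A1)-(A2), and of the first-order central-difference gradient estimate discussed later in this appendix, here is a small self-contained NumPy sketch; the two-qubit Pauli generators, the fixed entangling layer, and the cost Hamiltonian are arbitrary choices for the example, not the construction used in the paper's experiments.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
I2 = np.eye(2, dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)                   # fixed, non-parametric layer V^l

# Parametric generators g_j^l for each layer (arbitrary single-qubit Paulis here).
GENERATORS = [np.kron(X, I2), np.kron(I2, X), np.kron(Z, I2), np.kron(I2, Z)]

def qnn_unitary(phi, layers=2):
    """U(phi) = prod_l V^l U^l(phi^l), with U^l a product of single-parameter exponentials."""
    u = np.eye(4, dtype=complex)
    p = iter(phi)
    for _ in range(layers):
        for g in GENERATORS:
            u = expm(-1j * next(p) * g) @ u
        u = CZ @ u
    return u

def cost(phi, hamiltonian, psi0):
    psi = qnn_unitary(phi) @ psi0
    return np.real(np.vdot(psi, hamiltonian @ psi))

def central_difference_gradient(phi, hamiltonian, psi0, eps=1e-4):
    """O(eps^2) gradient estimate requiring 2M cost evaluations for M parameters."""
    grad = np.zeros_like(phi)
    for k in range(len(phi)):
        shift = np.zeros_like(phi); shift[k] = eps
        grad[k] = (cost(phi + shift, hamiltonian, psi0)
                   - cost(phi - shift, hamiltonian, psi0)) / (2 * eps)
    return grad

H_cost = np.kron(Z, Z) + 0.3 * np.kron(X, I2)
psi0 = np.zeros(4, dtype=complex); psi0[0] = 1.0
phi = np.random.default_rng(0).uniform(0, 2 * np.pi, 8)        # 2 layers x 4 generators
print(central_difference_gradient(phi, H_cost, psi0))
```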
Finally, each of these unitaries Û_j^ℓ can be expressed as the exponential of some generator ĝ_j^ℓ, which itself can be any Hermitian operator on n qubits (thus expressible as a linear combination of n-qubit Paulis),

Û_j^ℓ(φ_j^ℓ) = e^{−i φ_j^ℓ ĝ_j^ℓ},    ĝ_j^ℓ = ∑_{k=1}^{K_j^ℓ} β_k^{jℓ} P̂_k,    (A3)

where P̂_k ∈ P_n (the Paulis on n qubits [113]) and β_k^{jℓ} ∈ ℝ for all k, j, ℓ. For a given j and ℓ, in the case where all the Pauli terms commute, i.e., [P̂_k, P̂_m] = 0 for all m, k such that β_m^{jℓ}, β_k^{jℓ} ≠ 0, one can simply decompose the unitary into a product of exponentials of each term,

Û_j^ℓ(φ_j^ℓ) = ∏_k e^{−i φ_j^ℓ β_k^{jℓ} P̂_k}.    (A4)

Otherwise, in instances where the various terms do not commute, one may apply a Trotter-Suzuki decomposition of this exponential [114], or other quantum simulation methods [115].

In order to optimize the parameters of an ansatz of the form of equation (A1), we need a cost function to optimize. In the case of standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian,

f(φ) = ⟨Ĥ⟩_φ ≡ ⟨Ψ_0| Û†(φ) Ĥ Û(φ) |Ψ_0⟩,    (A5)

where |Ψ_0⟩ is the input state to the parametric circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators, e.g. in the form

Ĥ = ∑_{k=1}^{N} α_k ĥ_k ≡ α · ĥ,    (A6)

where we have defined a vector of coefficients α ∈ ℝ^N and a vector of N operators ĥ. Often this decomposition is chosen such that each of these sub-Hamiltonians is in the n-qubit Pauli group, ĥ_k ∈ P_n. The expectation value of this Hamiltonian is then generally evaluated via quantum expectation estimation, i.e., by taking the linear combination of expectation values of each term,

f(φ) = ⟨Ĥ⟩_φ = ∑_{k=1}^{N} α_k ⟨ĥ_k⟩_φ ≡ α · h_φ.    (A7)

One may estimate the gradient of this objective with a central-difference rule, ∂f/∂φ_k ≈ [f(φ + ε∆_k) − f(φ − ε∆_k)]/(2ε), which, in the case where there are M continuous parameters, involves 2M evaluations of the objective function, each evaluation varying the parameters by ε in some direction, thereby giving us an estimate of the gradient of the function with a precision O(ε²). Here ∆_k is a unit-norm perturbation vector in the kth direction of parameter space, (∆_k)_j = δ_{jk}. In general, one may use lower-order methods, such as forward differences with O(ε) error from M + 1 objective queries [3], or higher-order methods, such as a five-point stencil method, with O(ε⁴) error from 4M queries [116].

As recently pointed out in various works [117, 118], given knowledge of the form of the ansatz (e.g. as in (A3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (A3) (or, alternatively, if the Hamiltonian has a spectrum {±λ} for some positive λ). For multi-term Hamiltonians, a method to obtain the analytic gradients using a linear combination of unitaries is proposed in [117]. Recent methods use a stochastic estimation rule to reduce the number of runs needed and keep the optimization efficient [118]. In the numerics featured in this paper, we simply used first-order finite (central) difference methods for our gradients.

Appendix B: Gradients of QHBMs for generative modelling

1. Gradients of model parameters

Using our notation for the pulled-back data state σ̂_{D,φ} ≡ Û†(φ) σ̂_D Û(φ), the gradient with respect to the model parameters φ is given by

∂_{φ_j} L(θ, φ) = ∂_{φ_j} tr(K̂_θ σ̂_{D,φ}) = ∂_{φ_j} ⟨K̂_θ⟩_{σ̂_{D,φ}},    (B1)

which is simply the gradient of the expectation value of the latent modular Hamiltonian with respect to the data state pushed through the reverse quantum neural network Û†(φ). Taking gradients of the expectation value of a multi-term observable is the typical scenario encountered in regular VQE and quantum neural networks; there exist multiple strategies to obtain these gradients (see Appendix A).

∂_{θ_m} ⟨K̂_θ⟩_{σ̂_{D,φ}} = ⟨∂_{θ_m} K̂_θ⟩_{σ̂_{D,φ}} = ⟨∂_{θ_m} K̂_m(θ_m)⟩_{σ̂_{D,φ}};    (B2)

it is simply the expectation value of the gradient. Note that here we assumed the latent modular Hamiltonian was of the form

K̂_θ = ∑_m K̂_m(θ_m),    (B3)

which is valid both for a diagonal latent multinoulli variable (K̂_m = |m⟩⟨m|) and for a factorized latent distribution, in which case each K̂_m can be understood as a subsystem Hamiltonian.

For simple cases where the parameterized modular Hamiltonian is simply a scalar parameter times a fixed operator M̂, i.e., K̂_m(θ_m) = θ_m M̂_m, the above simplifies to

∂_{θ_m} ⟨K̂_θ⟩_{σ̂_{D,φ}} = ⟨M̂_m⟩_{σ̂_{D,φ}} = tr(M̂_m σ̂_{D,φ}),    (B4)

which is a simple expectation value.

Now, to get the gradient of the partition function, given our assumed knowledge of each of the simple modular Hamiltonians, we should have an analytical expression for the partition function which we can use. Notice,

log Z_θ = ∑_j log Z_j(θ_j) = ∑_j log[tr(e^{−K̂_j(θ_j)})] = ∑_j log( ∑_{k_j} e^{−ε_{k_j}(θ_j)} ),    (B5)

where ε_{k_j}(θ_j) are the eigenvalues of the jth parameterized modular Hamiltonian K̂_j(θ_j).

We can use this form to derive the gradient of this term with respect to the model parameters:

∂_{θ_m} log Z_θ = ∂_{θ_m} log[tr(e^{−K̂_m(θ_m)})] = (1/Z_m(θ_m)) ∂_{θ_m} tr(e^{−K̂_m(θ_m)}) = −(1/Z_m(θ_m)) tr(e^{−K̂_m(θ_m)} ∂_{θ_m} K̂_m(θ_m)).    (B6)

Once again, for simple parameterized modular Hamiltonians of the form K̂_m(θ_m) = θ_m M̂_m, this formula becomes

∂_{θ_m} log Z_θ = −(1/Z_m(θ_m)) tr(e^{−θ_m M̂_m} M̂_m),    (B7)

which is straightforward to evaluate analytically and classically, given knowledge of the spectrum of M̂_m. Technically, there is no need to use the quantum computer to evaluate this part of the gradient.

Thus, we have seen how to take gradients of all terms of the loss function. By simply training on the quantum cross entropy between the data and our model, we can therefore fully train our generative model using gradient-based techniques.

The trace distance is defined to be

T(ρ̂, σ̂) := (1/2) Tr[ √((ρ̂ − σ̂)†(ρ̂ − σ̂)) ],    (C1)

and the generic mixed-state fidelity is calculated as

F(ρ̂, σ̂) := ( Tr[ √( √ρ̂ σ̂ √ρ̂ ) ] )².    (C2)

Appendix D: Investigation of Ansatz Performance

In the case of Gaussian states, we are guaranteed that a factorized latent space solution exists. For qubit systems with inter-qubit coupling, this is not the case. Of course, fully parameterizing both the unitary and the classical distribution should allow for perfect reconstruction, and hybrid quantum-classical algorithms should enable the efficient extraction of typical samples. That said, it is interesting to see just how much representational power the product ansatz actually has when applied to non-Gaussian systems. To this end, we investigate its performance both as a function of temperature and in comparison to the actual performance of the general ansatz, which has a fully parameterizable diagonal distribution. Our simulations are small in scale and restricted in the classes of models considered, but promising nonetheless.

In Figure 9, we show trace distance and fidelity for a one-dimensional Heisenberg model with transverse and longitudinal magnetic fields,

Ĥ = −∑_{⟨ij⟩} Ŝ_i · Ŝ_j + ∑_j (h_x S_j^x + h_z S_j^z),    (D1)

where the presence of perpendicular field terms induces quantum fluctuations. In the limit β → 0, the thermal fluctuations completely dominate the quantum ones, and the resulting thermal state is completely mixed. The essentially perfect reconstruction in that limit, and with little variance, reflects the high approximate degeneracy of the solution space. In the opposite limit, β → ∞, the thermal state approaches the ground state. Classical correlations vanish and the state becomes pure. In this limit, we achieve asymptotically ideal reconstruction with high probability, but occasionally the variational procedure gets stuck in local minima, resulting in the high variance around the mean.

In the intermediate regime, both quantum and classical correlations are important, and the lack of generality of the factorized latent space results in slight performance losses with respect to a generic ansatz. For large systems, the fidelity and trace distance will not necessarily perform well on models trained to optimize relative entropy.

In the reconstruction tables for fidelity, II(a), and trace distance, II(b), QMHL was performed at inverse temperature β = 1.3, which is in the general region where we
expect the most drastic deviation from the factorized latent space assumption. We see that the performance loss as compared to the generic ansatz is relatively small. Random models were chosen from the class

Ĥ = ∑_{⟨ij⟩} (α_{ij} X̂_i X̂_j + β_{ij} Ŷ_i Ŷ_j + γ_{ij} Ẑ_i Ẑ_j) + ∑_j h_j · Ŝ_j,    (D2)

where α_{ij}, β_{ij}, γ_{ij}, h_j were all sampled independently from the uniform distribution on the interval [−1, 1] ⊂ ℝ, and allowed to differ on each site and edge.

⟨ξ̂_{a_1} ⋯ ξ̂_{a_n}⟩ = ∑_{pairings} ∏_j ⟨ξ̂_{a_j} ξ̂_{b_j}⟩,    (E1)

Sᵀ Ω S = Ω.    (E6)

The uncertainty principle (encoded in the commutation relations) constrains the covariance matrix to Γ_B ≥ iΩ.

In symplectic diagonal form, we now have independent effective oscillators, and the entropy takes the simplified form

S_B(σ) = ∑_j [ ω_j / (e^{ω_j} − 1) − log(1 − e^{−ω_j}) ],    (E7)

where ω_j is the frequency of the jth effective oscillator, and β, ℏ, and k_B are set to unity.
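A small NumPy sketch of our own, evaluating the normal-mode formulas of this appendix, the entropy in Eq. (E7) and the log-partition function in Eq. (E8) below, from a set of effective oscillator frequencies (the frequencies are arbitrary example values):

```python
import numpy as np

def bosonic_thermal_entropy_and_logz(omegas):
    """Entropy (E7) and log-partition function (E8) of decoupled thermal oscillators.

    `omegas` are the effective normal-mode frequencies; beta, hbar and k_B are set
    to one, matching the conventions used in Appendix E.
    """
    omegas = np.asarray(omegas, dtype=float)
    entropy = np.sum(omegas / (np.exp(omegas) - 1.0) - np.log(1.0 - np.exp(-omegas)))
    log_z = np.sum(-omegas / 2.0 - np.log(1.0 - np.exp(-omegas)))
    return entropy, log_z

# Example: three normal modes with arbitrary frequencies.
print(bosonic_thermal_entropy_and_logz([0.5, 1.0, 2.3]))
```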
The logarithm of the partition function also takes a simplified form:

log Z_θ = ∑_j [ −ω_j/2 − log(1 − e^{−ω_j}) ].    (E8)

Appendix F: Gaussian Model Parameterizations

In Section IV, we utilize the Gaussian nature of the problems to better parameterize the unitary transformations. Here, we explain those parameterizations and how one may operationally implement such parameterized transformations on various quantum computing platforms.

1. Fermions

G(φ) = ( cos(φ)  −sin(φ) ; sin(φ)  cos(φ) ).    (F1)

Such a rotation can be generalized to include complex phases, but that is unnecessary when staying within the confines of the special orthogonal group. The space of such orthogonal transformations can be systematically decomposed into sequences of Givens rotations which act on pairs of modes, as detailed in [119].

2. Bosons

By the Bloch-Messiah decomposition [120], any symplectic matrix S of size 2N can be brought into the form

S = O_1 D O_2,    (F2)

where O_1 and O_2 are real orthogonal and symplectic, and D is a diagonal matrix of the form diag(d_1, d_2, ..., d_N, d_1^{−1}, d_2^{−1}, ..., d_N^{−1}).

For classical simulation of QHBM, it is useful to decompose this further. Every real orthogonal symplectic matrix,

O = ( X  Y ; −Y  X ) ∈ ℝ^{2N×2N},

can be associated to a unitary U = X + iY ∈ ℂ^{N×N}. The space of such unitaries is spanned by exponentiating anti-Hermitian matrices. Each Hermitian matrix is described by N(N − 1)/2 complex parameters. Finally, we can generate Hermitian matrices by anti-symmetrizing (with Hermitian conjugate) complex-valued lower-diagonal matrices. We can thus classically parameterize the space of symplectic transformations by reading the sequence of decompositions backwards, starting from lower-diagonal matrices. Altogether, a symplectic matrix for an N-mode system may be described by 2N² − N real parameters.

In quantum optical setups, O_1 and O_2 represent passive transformations, such as those performed by beam splitters and phase shifters, and D can be implemented via a sequence of single-mode squeezers, as explained in [85].

Appendix G: Gaussian Fermionic Quantum Systems

â_k = (1/2)(X̂_k + iŶ_k) Ẑ_1 ⋯ Ẑ_{k−1},
â_k† = (1/2)(X̂_k − iŶ_k) Ẑ_1 ⋯ Ẑ_{k−1},    (G1)

where X̂_k, Ŷ_k, and Ẑ_k are Pauli operators on qubit k. In this language, it is natural to consider the occupation of the kth effective Fermionic mode,

n̂_k ≡ ĉ_k† ĉ_k = (1/2)(1 − Ẑ_k),    (G2)

and we see that we can 'measure' the occupation of the mode just by measuring the Pauli-Z operator on the corresponding qubit.

In making this prescription, we are implicitly defining an ordering on the qubits. For one-dimensional systems, this is natural, and all nearest-neighbor effective Fermionic hopping gates are local in qubit operators:

ĉ_k† ĉ_{k+1} + ĉ_{k+1}† ĉ_k = (1/2)(X̂_k X̂_{k+1} + Ŷ_k Ŷ_{k+1}).    (G3)

For higher-dimensional systems, methods have been developed that handle the extra phases involved.
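Since the paper's numerics use OpenFermion [90], the qubit representations in Eqs. (G1)-(G3) can be checked directly with it; the following short sketch is our own, and the operators chosen are simply the number and hopping terms above.

```python
from openfermion import FermionOperator, jordan_wigner

# Occupation of mode 2, Eq. (G2): expect (1 - Z_2) / 2 under the Jordan-Wigner mapping.
number_op = FermionOperator('2^ 2')
print(jordan_wigner(number_op))

# Nearest-neighbor hopping between modes 2 and 3, Eq. (G3):
# expect (X_2 X_3 + Y_2 Y_3) / 2 (no Z string, since the modes are adjacent).
hopping_op = FermionOperator('2^ 3') + FermionOperator('3^ 2')
print(jordan_wigner(hopping_op))
```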
For each pair of physical Fermionic creation and annihilation operators, â_j, â_j†, we can associate a pair of virtual Majorana modes, ĉ_{2j−1}, ĉ_{2j}, according to

ĉ_{2j−1} = â_j + â_j†,    (G4)
ĉ_{2j} = i(â_j − â_j†).    (G5)

The canonical transformation from physical Fermionic operators to Majorana operators is a local rotation in operator space for each Fermion individually. While the defining equations for the Majoranas are cosmetically identical (up to a multiplicative constant) to those defining the Bosonic conjugate variables p̂ and q̂, these Majorana Fermions satisfy the canonical Fermionic anti-commutation relations, {ĉ_k, ĉ_l} = 2δ_{kl}.

In the Majorana basis, any Fermionic Gaussian Hamiltonian takes the generic form

Ĥ_F = i ∑_{i,j}^{2N_f} h_{ij} ĉ_i ĉ_j + E,    (G6)

where N_f is the total number of Fermions, and E is some constant energy shift associated with the transformation to Majorana modes.

In this basis, we can approach the Fermionic problem in a similar manner to the Bosonic problem: we want to decompose the system (now written in terms of Majoranas) into independent effective Fermionic oscillators. Said in another way, we want to find a set of Fermionic operators {c̃_1, ..., c̃_{2N_f}} and a corresponding set of energies {ε_1, ..., ε_{2N_f}} such that

Ĥ_F = ∑_{j=1}^{2N_f} ε_j c̃_j† c̃_j + const.    (G7)

As mentioned in the main text, this diagonalization can be performed by an orthogonal transformation on the covariance matrix associated with the thermal state. In general, an orthogonal transformation on Fermionic modes must also contain particle-hole transformations, but working in the Majorana basis these are not necessary. A generic orthogonal matrix requires no more than O(N_f²) such Givens rotations.

In one-dimensional systems, the JW transformation makes for a remarkably simple implementation of Givens rotations, including two CNOT gates and one controlled-phase rotation. For higher-dimensional systems, techniques exist for dealing with the phases that arise in the JW mapping. Moreover, Givens rotations are incredibly close to the physical operations performed on real quantum devices.

Finally, the derivative of a Givens rotation is analytically

G′(φ) = G(φ + π/2),    (G8)

meaning that gradients can be computed easily using exact methods, by physically measuring expectation values with parameters shifted by π/2.

Appendix H: D-wave Superconductivity

The mean-field BCS Hamiltonian has become a paradigmatic phenomenological model in the study of superconductors. The generic model is described by the Hamiltonian

Ĥ_{d_{x²−y²}} = −t ∑_{⟨i,j⟩,σ} (â_{i,σ}† â_{j,σ} + â_{j,σ}† â_{i,σ}) + µ ∑_{i,σ} â_{i,σ}† â_{i,σ} − ∑_{⟨i,j⟩} ∆_{ij} (â_{i,↑}† â_{j,↓}† − â_{i,↓}† â_{j,↑}† + â_{j,↓} â_{i,↑} − â_{j,↑} â_{i,↓}),    (H1)

defined for Fermions (electrons) on a two-dimensional lattice of size N_x by N_y sites, where â_j (â_j†) is a Fermionic annihilation (creation) operator on lattice site j. Each site hosts two spin-orbitals (one for spin up, and one for spin down), giving a total of 2 × N_x × N_y Fermions. Here, µ represents a chemical potential that sets the filling density, t denotes the nearest-neighbor hopping strength, and ∆_{ij} parameterizes the superconducting gap (less formally, it incentivizes the formation of Cooper pairs).