Hybrid Quantum Algorithms For Flow Problems: Sachin - Bharadwaj@nyu - Edu Katepalli - Sreenivasan@nyu - Edu
Hybrid Quantum Algorithms For Flow Problems: Sachin - Bharadwaj@nyu - Edu Katepalli - Sreenivasan@nyu - Edu
Hybrid Quantum Algorithms For Flow Problems: Sachin - Bharadwaj@nyu - Edu Katepalli - Sreenivasan@nyu - Edu
and high precision Quantum Linear Systems Algorithms (QLSA) for simulating such flows. The
utility of this simulator is shown by extracting error estimates and a power law scaling that relates
T0 (a parameter crucial to Hamiltonian simulations) to the condition number κ of the simulations
matrix, and allows the prediction of an optimal scaling parameter for accurate eigenvalue estimation.
Further, we append two speedup preserving algorithms for (a) the functional form or sparse quantum
state preparation and (b) in situ quantum post-processing to compute a nonlinear function of the
velocity field, namely the the viscous dissipation rate, resulting in an end-to-end complexity of
O(poly log(N/ϵ)κ/ϵQP P ), where N is the size of the linear system of equations, ϵ is the accuracy of
the solution and ϵQP P is the accuracy of post processing. This work demonstrates a possible way
towards quantum simulation of fluid flows, and highlights the special considerations needed at the
gate level implementation of QC.
I. INTRODUCTION this nascent era, which has been called the Noisy Inter-
mediate Scale Quantum (NISQ) era, QC’s applications
omputer simulations of nonlinear physical systems— already extend [10, 11] across finance, chemistry, biol-
such as turbulent flows, glassy systems, climate physics, ogy, communication and cryptography, but not as much
molecular dynamics and protein folding—are formidably in areas that are predominantly nonlinear, such as fluid
hard to perform on even the most powerful supercom- dynamics.
puters of today or of foreseeable future. In particular, This work attempts to pave the way for utilizing QC in
the state-of-the-art Direct Numerical Simulations (DNS) Computational Fluid Dynamics (CFD) research, which
of turbulent flows [1–3] governed by the Navier-Stokes we have termed [12] Quantum Computation of Fluid Dy-
equations, or of turbulent reacting flow problems and namics (QCFD). An initial comprehensive survey of var-
combustion [4], both of which involve massive simula- ious possible directions of QCFD was made in [12]. Real-
tions with high grid resolutions, not only reveal fine de- istic CFD simulations with quantum advantages require
tails of the flow physics [5, 6], but also constantly contend one to quantumly solve general nonlinear PDEs such as
with the limits of supercomputers on which the codes run the Navier-Stokes equations. However, it is worth noting
[7]. However, simulation sizes required to settle funda- that the fundamental linearity of quantum mechanics it-
mental asymptotic theories, or simulate turbulent sys- self blockades encoding of nonlinear terms, thus forcing a
tems such as the Sun or cyclones, or to simulate flows linearization of some kind [13–15], which typically results
around complex geometries of practical interest, would in an infinite dimensional linear system. In such cases the
require computing power that is several orders of mag- inaccessibility to the required large number of qubits (and
nitude higher than is currently available. Reaching such thus exponentially large vector spaces) leads to inevitable
computational prospects calls for a paradigm shift in the truncation errors, limiting the focus to weakly nonlinear
computing technology. problems [13]. Therefore the ability to solve high dimen-
One such potential candidate is Quantum Computing sional linear systems in an end-to-end manner [16] while
(QC)[8], which has striven to establish its advantage over capturing the flow physics is crucial to simulating non-
classical counterparts by promising polynomial or expo- linear flow problems. Our goal here is to present various
nential speedups [9]. Even though QC has been around steps involved in the process of solving simple and ide-
for the last two decades, the subject is still nascent. In alized problems, including providing estimates of scaling
and errors involved.
To this end, we unveil here a high performance quan-
tum simulator which we call QFlowS (Quantum Flow
∗ sachin.bharadwaj@nyu.edu Simulator), designed particularly to simulate fluid flows.
† katepalli.sreenivasan@nyu.edu Built on a C++ platform, it offers both QC and CFD
2
tools in one place. With QFlowS we implement a modi- retical upper bounds of their complexities. The general
fied version of the class of algorithms, now termed Quan- form of the governing PDEs considered here (assuming
tum Linear Systems Algorithms (QLSA). Under some no body forces or source terms) are given by the momen-
caveats, these algorithms promise to solve a linear sys- tum conservation and continuity relations of the kind
tem of equations given by the matrix inversion problem
Ax = b, with up to an exponential speedup compared ∂u 1 2
+ C · ∇u = ∇ u − ∇p, (1)
to known classical algorithms. In recent years a num- ∂t Re
ber of efforts based on continuum methods using QLSA ∂u 1 2
= ∇ u − ∇p, (2)
[17, 18], variational quantum algorithms [19, 20], ampli- ∂t Re
tude estimation methods [21, 22], and quantum-inspired
methods [23], have been undertaken to solve linear and
∇ · u = 0. (3)
nonlinear PDEs. However most of these efforts have been
theoretical, lacking gate-level quantum numerical simu- where u = (u, v, w) is the velocity field, C is a constant
lations and analysis of the resulting flow field, or proper advection velocity, p is pressure field, Re = U D/ν is the
estimates of the actual errors involved. Reynolds number, U is the characteristic velocity, ν is the
In particular, a full-gate level quantum simulation is kinematic viscosity and D is the separation between the
implemented on QFlowS to solve the unsteady Poiseuille boundaries. Eq. (3) enforces the incompressibility con-
and Couette flow problems, extendable with little change dition while eq. (2) describes the well-known unsteady
to the advection diffusion problem with constant advec- channel (or Poiseuille/Couette flow) (with C = 0 in eq.
tion velocity. We implement both the fundamental form (1)) which in the 1D case (running example for all dis-
of QLSA, called the Harrow-Hassidim-Lloyd (HHL) algo- cussions from here on) reduces to
rithm [24] and its more recent counterpart, [25] based on
the linear combination of unitaries (LCU). In addition, ∂u 1 ∂ 2 u ∂p
we prescribe suitable quantum state preparation proto- = − , (4)
∂t Re ∂y 2 ∂x
cols and propose a novel quantum post-processing (QPP)
protocol to compute in situ nonlinear functions of the where the velocity varies only along y (wall-normal di-
resulting flow solution. In particular we obtain the vis- ∂p
rection), and the pressure gradient ∂x is set to be a con-
2
cous dissipation rate ε = ν⟨( ∂u∂y ) ⟩ of the flow field u and stant. The boundary conditions are no-slip with u(0, t) =
viscosity ν. Together, this forms an end-to-end imple- u(D, t) = 0 for the Poiseuille flow and u(0, t) = 0 and
mentation, which alleviates, to some extent, the caveats u(D, t) = 1 for the Couette flow. The initial condition
of both quantum state preparation and the measure- for the temporal evolution is set to be a uniform flow
ment of qubits—which are otherwise the major limiters u(y, 0) = uin = 1.
of the theoretical quantum advantage [11, 17, 24, 26]. We reiterate that this problem is simple from the
Although the proposed algorithms are far from realistic standpoint of the sophisticated advances of classical
fluid simulations, they make quantum implementations CFD. However, this is an excellent starting point for
more amenable for small systems, and inform the running demonstrating the viability of quantum algorithms for
of codes on near-term NISQ machines while attempting CFD, which is the spirit in which this work is presented.
to preserve the quantum advantage.
25
QFlowS
optimal
20
15
Speedup
10
0
0 4 8 12 16 20 24
Number of threads
FIG. 1: (a) shows the modified QLSA circuit with QSP, forming an oracle UV that is employed recursively in the
QPP protocol for computing viscous dissipation rate by a combination of Quantum Analog-Digital converters. (b)
The working flowchart of the hybrid quantum-classical algorithm. (c) The strong scaling performance of QFlowS with
super-linear speedup when run on single node, parallelized with OpenMP up to a total of 24 threads on NYU’s Greene
supercomputer. The performance is for simulations of a 20 qubit circuit of depth 422, performing a Quantum Fourier
Transform (QFT) and inverse QFT algorithm. The standard deviations are smaller than the blue square symbols
around the mean,computed over an ensemble of 8000 simulations with random initial quantum states.
the elements of the matrices Abe1 , Abe2 , Af e and vec- sical solvers (and with additional errors due to quantum
tors bbe1 , bbe2 , bf e are done classically. Certain param- measurements). In any case, it is still worthwhile estab-
eters required (as elucidated later) for quantum state lishing how the method fares as a plausible alternative to
preparation (e.g., rotation angles and decision trees) and classical simulations.
for Hamiltonian simulation (time T0∗ ) are pre-computed The final solution is either: (a) simply read by quan-
classically as well. N from here on refers to the di- tum measurements for post-processing on a classical de-
mension of final matrix system that results from these vice, or (b) the solution is post-processed using the QPP
considerations. With this on hand, the inputs are first protocol introduced here in situ for a quantum device.
loaded on the QC by the quantum state preparation al- The former, at the level of a simulator, allows one to val-
gorithms (QSP-1,2) and the resulting linear system of idate the correctness of solutions and redesign the circuit
equations is then solved by QLSA. In the case of it- as required. While in the latter case, only a single tar-
erative BE, Abe1 ū = bbe1 is solved for velocities ū, get qubit and few ancillas are measured, which outputs
at every time step until convergence (residue reaching one observable—which is a real-valued nonlinear func-
a tolerance ≤ ϵtol = 10−6 ), which is checked classi- tion of the velocity field. Apart from computing nonlin-
cally. In a contrasting setup, BE and FE are used to ear functions, this circumvents expensive and noisy mea-
set up, respectively, Abe2 ũ = bbe2 and Af e ũ = bf e , surements of entire quantum states and more importantly
giving ũ = [u(y, 0), u(y, dt), · · · , u(y, T )], in one shot, preserves quantum advantage (to the extent possible).
∀t ∈ [0, T ]. It is important to note that even the BE
method can be setup such that the solution is computed
∀t in one go. However, in the absence of efficient state
QUANTUM FLOW SIMULATOR - QFLOWS
preparation and measurement protocols, measuring the
solution and re-preparing the state for the next time step
are O(Ng ) operations that eliminate any quantum advan- In [12, 29] several commercially available quantum
tage, making the overall algorithm no better than clas- simulation packages are listed. Most of them, for in-
stance Qiskit (IBM), Quipper [30] and QuEST [31], are
4
constructed for general purpose quantum simulations more precise class of QLSA methods based on the lin-
and are highly optimized for such operations, making ear combination of unitaries (LCU) [25] which we shall
it hard to customize the fundamental subroutines and refer to as QLSA-2. In [25] it was shown that under
data structures for CFD calculations. On the other hand, similar caveats of QLSA-1, wen have: n
For a hermitian n
there are softwares such as ANSYS and OpenFOAM that and invertible matrix A ∈ C2 ×2 , vector b ∈ C2 ,
perform solely classical CFD simulations. With the mo- given oracles to prepare A and b in O(polylog(N )), and
tivation of having a single bespoke quantum simulator a prescribed precision of ϵ > 0, there exists an algorithm
for CFD, we unveil here a high performance, gate level that computes a solution x such that |||x⟩ − |A−1 b⟩|| ≤ ϵ
quantum-simulation toolkit, which we call QFlowS ; it is in O(polylog(N/ϵ)κ). This work implements algorithms
based on a C++ core and designed to be used both in- due to both these methods [24, 25]. In the QCFD con-
dependently or as part of other software packages. It has text, we now explore methods suitable for preparing
a current capability of 30+ qubit simulation of custom {bbe1 , bbe2 , bf e } and the matrices {Abe1 , Abe2 , Af e }, to
quantum circuits. It also has several built-in gates and enable the post-processing of the solution ũ, in order to
quantum circuits that could be used readily, while also construct an end-to-end method.
being able to probe different quantum state metrics (such
as the norm, density matrix and entanglement). Along
with these, it includes basic CFD tools needed to set up Quantum State Preparation
flow problems making it versatile for QCFD simulations.
Noise modelling is in progress and forms the major part of
future software development. QFlowS is also continually To prepare quantum states that encode bbe1 , bbe2 and
being parallelized for optimal performance on supercom- bf e , we implement two different methods, both offering
puters. For instance, figure 1(c) shows the strong scaling sub-exponential circuit depth complexity:
performance using OpenMP. The performance is mea- (1) In the case of iterative BE, the vector bbe1 , pre-
sured while running on NYU’s Greene supercomputing pared at every time step, is generally fully dense with
facility. On a single medium-memory computer node (48 sparsity sb ∼ O(Nbe1 ). In the specific cases of Poiseuille
cores: 2x Intel Xeon Platinum 8268 24C 205W 2.9GHz and Couette flows, and for the specific initial conditions
Processor) and for a choice of 20 qubits, we measure the considered here, the state prepared at every time step
2
log(b)
run-time (by omitting the one-time initial overhead pro- forms a discrete log-concave distribution (i.e., ∂ ∂y 2 <
cesses) of a QFT-IQFT circuit action on an ensemble of 0 for ∀t ≥ 0 ), which could also be confirmed from the
randomly initialized quantum states. We observe near analytical solution given by eq. (5) known for this case
optimal and at times super-optimal scaling with increas- as
ing number of threads up to 24. (Super-optimality arises
∞
"
when the quantum circuit is sparse, causing lesser quan- X 2(1 − (−1)k ) ∂p Re
tum entanglement. Every single circuit layer operation is u(y, t) = 1+ (5)
kπ ∂x (kπ)2
distributed over many worker threads, whose cache size k=1
#
exceeds the size of quantum state subspace being han- kπy −t kπy 2
( ) Re ∂p
dled, thus making them closely parallel). SI Appendix, sin e Re D − y(1 − y).
D 2 ∂x
Section 2, summarizes features of QFlowS.
Even if the exact solution is not known, provided the ini-
tial condition is the only state preparation involved in the
QUANTUM LINEAR SYSTEMS ALGORITHM algorithm, flexibility exists for most flow simulations in
(QLSA)
choosing initial conditions that are log-concave, so that
one could invoke a Grover-Rudolph state preparation [38]
One of the first quantum protocols for solving equa- technique (or its more evolved off-springs [39, 40]) to of-
tions of the form A⃗x = ⃗b is the HHL algorithm [24] which fer an efficient way to encode data. Two comments are
we refer to here as QLSA-1. In [24] it was shown nthat: n
useful. (i) Though this method could be used for ar-
For a hermitian and non-singular matrix A ∈ C2 ×2 , bitrary state-vectors (at the cost of exponential circuit
2n n
vector b ∈ C (N = 2 ), given oracles to prepare A and depth), for an efficient state preparation, some informa-
b in O(polylog(N )), and a prescribed precision of ϵ > 0, tion on the functional form of the state needs to be known
there exists an algorithm that computes a solution x such a priori —from analytical solutions, classical CFD, or
that |||x⟩ − |A−1 b⟩|| ≤ ϵ in O(polylog(N )s2 κ2 /ϵ), where by the measurement of the quantum circuit at intermit-
κ is the condition number of the matrix. This shows tent time-steps, peeking into its instantaneous functional
that the algorithm is exponentially faster than classi- form. Here, we implement a similar method, which we
cal alternatives, but there are important caveats [26]). shall refer to as QSP-1, based on [38, 41], where it was
Later works [32, 33] attempted to address these caveats, shown that: Given a vector bbe1 ∈ RNbe1 ,state b′ be1 can
while some others [34–36] fundamentally improved the be prepared such that, ||bbe1 ⟩−|b′ be1 ⟩| < O(1/ploy(Nbe1 ))
method by reducing error complexity from poly(1/ϵ) to in O(log(Nbe1 )) steps. (ii) Measuring all qubits of the
poly(log(1/ϵ)). Consequently refs. [17, 35, 37] led to a register (∼ O(Nbe1 )) at every time step compromises
5
the exponential speed-up and could introduce measure- is nearly the same, (i) We see some spurious oscillation
ment errors. However, such a method of recursive state like error in the velocity profile for the BE case as seen
preparation and measurement could still prove to be use- in figure 2(d) for higher QP E . (ii) The BE case how-
ful with quantum advantage for a very small number of ever has no stability based restrictions as the FE case
qubits [42]. In any case, we implement this method here making it more flexible on the choice of dt. (iii) When
to explore if such a BE scheme gives accurate results with accuracy in temporal discretization is of the concern the
or without quantum advantage. one shot FE fares better. The performance can also be
(2) In the case of the one shot methods, an alternative measured by computing the fidelity of the solution as
quantum state preparation method can be considered shown in the inset of figure 3(a), which shows the BE
since bf e is generally larger in size ∼ O((m + p)Ng ) (this to perform better than FE. However fidelity might not
discussion applies similarly to bbe2 ). When considered always be a good indicator to performance as illustrated
together with all other registers that are initially set to in SI Appendix Section 1.B. Both QLSA-1 and QLSA-2
|0⟩, bf e is a highly sparse state vector with sb ∼ O(Ng m). rely on a variant of the phase estimation algorithm which
For such states, we implement a sparse state preparation generally contributes most to the total error QPE; in
protocol [43], which we shall refer to as QSP-2. It pro- QLSA-2, the Gapped Phase Estimation (GPE) is rather
vides an optimal circuit depth that scales only polyno- computationally inexpensive and less erronous than in
mially with vector size. This method involves construct- QLSA-1[17]). In the case of phase estimation, the opera-
ing decision trees forming an alternative way to represent tor/matrix under consideration is exponentiated first as
quantum states. Careful optimization on the structure of eiAT0 , where T0 is the Hamiltonian simulation time. An
these trees leads to efficient state preparation whose com- optimal choice T0∗ (unknown a priori) scales on the eigen-
plexity depends on the number of continuous pathways values λj that are spectrally decomposed in the basis of
∗
in the resulting tree structure. Thus, rephrasing here the A as j eiλj T0 |uj ⟩⟨uj | to produce the best QQP E -bit bi-
P
result in [43], we have: Given an n-qubit initial state of nary representation |λ̃j ⟩QQP E . It also minimizes possible
size N = 2n , all set to |0⟩, except for a sparse vector sub- truncation errors and any spurious quantum numerical
space bf e ∈ RNf e (Nf e = O((m + p)Ng )), with sparsity diffusion.
sb = m ≪ N , then with only single qubit and CNOT In case of QLSA-1 it is important to note, since we
gates, one can prepare such a state with in O(2kn) time, are interested in estimating λ−1 eventually, the smallest
k × O(n) CNOT gates and using 1 ancillary qubit, where eigenvalue will contribute the most to the error. There-
k(≤ m) is the number of branch paths of the decision fore ensuring that the smallest value representable (least
tree.
count) with QP E qubits = 2−QP E is ≤ λ̃min is essential.
Both QSP-1 & 2 are elucidated with an example in SI The error for all cases shown in figure 3(a) has a grad-
Appendix, Section 3. ual step like decay because increasing QP E in small steps
(of O(1)) does not lower the least count appreciably (in
log10 or loge ) as QP E gets larger. In case of QLSA-2,
though it evades a full blown QPE, the right choice of T0
FLOW SIMULATION RESULTS (for the Fourier approach [17]) and the LCU coefficients
is still crucial for better accuracy.
We construct and solve the system given by eqs. (4) At the level of the flow field when we probed further,
for Ng = 10 and Re = 10. We observe that the quan- the choice of T0 seemed to exhibit non-trivial effects; for
tum solutions for the velocity field capture the physics instance, in the iterative BE case, when gradually in-
both qualitatively and quantitatively. To discuss closely creasing QP E qubits, the converged solution either un-
the utility of QFlowS we consider results from QLSA- dershot or overshot the analytical solution initially. This
1. As shown in figure 2(a) the converged steady state is captured in figure 3(b), where the error ϵ (with respect
solution (using iterative BE) undershoots the analytic to the analytical solution uan of the center line velocity
solution for 7 qubits, and performs better with a higher solution) oscillates around ϵ = 0 before converging to it
number of qubits (QP E ≥ 14) that are allocated for the for QP E > 12. For the specific case shown here, a choice
Quantum Phase Estimation (QPE) algorithm, which in of T0∗ = 1.75 (dotted black line) has the least oscillation
turn decides the quantum numerical precision. Similarly, of the error and best accuracy.
the converged solution for the one shot FE and BE cases We can now ask: what combination T Q = (T0 , QP E )
also become more accurate with respect to both analyti- gives the least error? To answer this better, we take
cal and classical CFD solutions, with increasing number a sample matrix equation system (of size 8 × 8 and
of qubits, as seen in figure 2(c) and (d) respectively. κ = 18.8795) and solve it for different T Q. We then make
Our experience is that, between the three schemes, the a contour plot of the QLSA error ϵQLSA = ||uQ − uC ||,
one shot FE and BE turns out to be more accurate than as shown in figure 3(c) and trace the path of least er-
iterative BE for increasing number of qubits, as shown ror ϵmin for each T Q. Further, the range of the T0
in figure 3(a), where the error ϵrms is computed with scan can be reduced with some initial estimates to the
respect to the analytical solution. Among the two, one lower and upper
√ bounds of λmin and λmax √ [44] such
shot schemes, though quantitatively their error behavior as, β1 − β2 N − 1 ≤ λmin ≤ β1 − β2 / N − 1 and
6
7 qubits (3 qubit QPE)
1 1
0.9 0.9
0.8 0.8
t=0.049383
0.7 0.7 t=1.037
t=2.0247
t=0.049383 t=3.0123
0.6 0.6
t=0.54321 t=4
t=1.037 t=4.9877
y
0.5 0.5
y
t=1.5309 t=5.9753
t=2.0247 t=6.963
0.4 0.4
analytical t=7.9506
t=8.9383
0.3 0.3 t=9.9259
analytical
0.2 0.2
0.1 0.1
0 0
0 0.5 1 1.5 2 2.5 0 0.5 1 1.5 2 2.5
u u
1 1
0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6
0.5 0.5
y
y
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 0 0.2 0.4 0.6 0.8 1 1.2 1.4
u u
FIG. 2: (a) and (b) show the quantum simulation of the flow field evolving forward in time towards steady state
(analytical parabolic solution shown as solid black line) using BE scheme that uses 7 and 14 qubits (3 and 10 QPE
qubits (QP E ), respectively), for N = 10, Re = 10, ∂p/∂x = −2 and dt = 0.01. The accuracy of the converged solution
improves for higher QP E . Here the velocity field is solved for at every time step; (c) shows increasingly accurate
converged solutions with increasing QP E ∈ {3, 4, 5, 6, 7}, but solved using the FE scheme where the velocity field is
solved for all time steps in one shot, and only the final solution is extracted. Here α = 0.5 is set to meet the the von
Neumann stability criterion and the parameter T0∗ = 5.0 is fixed. (d) also shows the one shot method, solved with
a BE scheme. Though α = 0.5 is set to maintain the same time step size as (c) for comparison), there is no hard
constraint given that the method is unconditionally stable. Here T0∗ = 8.5.
√ √
β1 + β2 / N − 1 ≤ λmax ≤ β1 + β2 N − 1, where qubits, which lowers the overall L2 error ≤ 2−QP E . This
β1 = Tr(A)/N and β2 = (Tr(A2 )/N −β12 )1/2 . We observe means the minimum number of qubits needed to attain
that the optimal T0∗ for all combinations lies in a fairly an error ϵ grows as a power law (QP E )min ∼ 2.92ϵ−0.1158 ,
small range ∆T0 ∼ 0.1. This is a unique value lying along as shown in the inset of figure 3(d). If not for the right
the median of this range, T0∗ ≈ 1.3 for which the system choice of T0∗ , for QP E > 3, one would spend more numer-
performs best. This means that all or most eigenvalues ous qubits O(1.44 log(ϵ−1 )) to lower the overall error. We
are best represented in binary form with QQP E qubits note that ϵ as computed here suppresses the error from
(one or some of the eigenvalues could also turn out to be finite differences, which is ϵf d ∼ O(∆y 2 , ∆t) and it’s im-
represented exactly). Further, given T0∗ , with increasing portant to note that this error plagues both quantum
number of qubits, the minimum error exhibits a power and classical solutions.The quantum solution, however,
law decay ϵmin ∼ Q−6.81
PE as shown in figure 3(d), reach- gets closer to the classical solution with increasing qubits
ing ∼ 10−5 at around 13 qubits. The exponent becomes (QP E = 9, yellow curve).
increasingly negative with decreasing κ, since the range
of eigenvalues becomes smaller and more eigenvalues tend Thus, being able to estimate T0∗ fairly accurately re-
to be easily representable with a given number of qubits. duces the overall computational resources required as
The thick horizontal red line shows the least count for well as the error, making it amenable for NISQ devices.
QP E = 13; here, ϵmin < 2−13 . This favorable possibility Though there have been several analytical, asymptotic
arises because a subspace of the solution set could have prescriptions for the choice of T0∗ [17, 24, 45], the ex-
had near exact representation using the given number of act choice remains elusive. To shed better light, QFlowS
is equipped with QPE optimizer subroutine which, on
7
3 10 -2
1 1
10 -4
3 4 5 6 7
0 0
-1
-1
-2
-2 T 0 =0.5 T 0 =1.5 T 0 =2.0
-3 T 0 =1.0 T 0 =1.75 error = 0
-3
3 3.5 4 4.5 5 5.5 6 6.5 7 7 8 9 10 11 12 13 14
13 -1
10 0 14
-2 12
12
10
11 -3 8
10 -1
6
-4
10
-5 4
9 10 -2 -5 -4 -3 -2 -1
10 10 10 10 10
-6
8
-7 10 -3
7
-8
6
-9 10 -4
5
-10
4
0.5 1 1.5 2 10 -5
4 5 6 7 8 9 10 11 12 13
FIG. 3: (a) shows the value of absolute root-mean-square (RMS) error for the BE and FE cases shown in 2((a)-(c))
against QPE qubits of both the step-by-step and one shot methods. To compare directly the one shot BE and FE
methods, we show in the inset the quantity (1-fidelity) as a function of QP E . (b) shows the absolute error computed
with respect to the analytical solution for varying T0 . The black dotted line joining upright triangles shows the case for
which the error magnitude and oscillation around the zero line are the least. (c) shows the contour plot of the QLSA
toot-mean-square error ϵQLSA computed with respect to the classical inversion solution for different T Q = (QP E , T0 )
pairs. The dotted black line traces the locus of the least/minimum error for each (T, Q) pair in that range, which
shows an oscillation around a unique median value T0∗ specific to a given matrix (or κ). (d) shows the decay in the
(ϵQLSA )min , extracted from (c) with respect to only QP E for a fixed κ = 18.8795 and T0∗ = 1.3. The inset plots the
QP E (ordinate) required to achieve a specified ϵQLSA (abscissa).
the basis of the nature of the flow problem and the nu- matrix size m = ⌈log2 (T /dt)⌉ and viscosity ν = 1/Re.
merical method (finite differences) used, estimates T0∗ by Figure 4(a) shows for a specific case of T = 1, dt =
very minimal classical pre-processing. Since κ decides 0.001, κ grows as a stretched exponential with decreasing
the range of eigenvalues and the invertibility of the ma- (increasing) ν (Re) and saturates for very low ν, while for
trix, it forms a common link that characterizes matrices a fixed ν = 0.1, κ grows exponentially with m as shown
for different systems with similar sparsity. Therefore, a in the inset of figure 4(a). However, the overall behavior
relation that uniquely connects κ with T0∗ would make of κ with both ν and the system size m is shown in figure
a reasonable basis for prescribing T0∗ for different system 4(b), where the κ − ν curves transition from exponential
configurations. We provide such a relation which, though to stretched exponential fits as plotted for increasing m
generalizable in behavior, is specific to: (1) the class of obtaining the relation κ ≤ m(e−0.02mν +2). This relation
elliptic and parabolic linear PDEs considered here; (2) fi- confirms that κ is bounded and is not exponentially large.
nite difference based numerical formulations that give rise Here we explicitly highlight the dependence of κ on ν and
to either sparse, band diagonal, lower triangular, Toeplitz note that it is also bounded from above by κ = 3(m +
or circulant matrices. p + 1), as given in [13] for nonlinear PDE systems that
generally have higher κ than linear PDE problems. In
Since the one shot FE and BE seems to perform better both cases, κ increases with Re and m.
than BE, we take the matrix system of the former for
Ng = 10 and characterize how κ varies as a function of With this relation in hand, we proceed to compute
8
m=4 2.2 1
m=5 Analytical
m=6 Classical N =64
m=7 2 0
Classical N=32
m=8 Classical N=8
10 3 m=9
m = 10
-1 Quantum N=8
1.8
-2
1.6
-3
1.4
10 2 -4
1.2
-5
1 -6
increasing N
increasing m
1
10 -7
0 0.5 1 1.5 2 2.5 3 3.5 2 3 4 5 6 7 8 9 10
10 -4 10 -2 10 0
FIG. 4: (a) shows the behavior of κ with increasing m and decreasing ν. For ν < 0.01, κ saturates to a constant
that is decided largely by m. This is expected since for Poiseuille flow, by appropriately scaling time with ν (using
t∗ = νt/L2 ), ν can be absorbed into the equation. However, this is not the case when C ̸= 0 in eq. 1, in which case
ν would have a much stronger effect on κ. (b) Using QFlowS, a small sample space of low dimensional matrices for
varying κ is solved to compute T0∗ in each case, as shown in figure 2(c), which yields a power law scaling of T0∗ against
κ, as marked in the figure. This feature can be used to predict T0∗ for higher dimensional matrices. T0∗ is insensitive
to κ below a threshold value of ν for these problems. Thin dashed lines indicate 99% levels. (c) shows the mean
viscous dissipation rate ε with respect to increasing Re (decreasing ν) computed from quantum solutions.
T0∗ for increasing κ, by extracting, as before, the T Q vantage and also introduce measurement errors. Here,
phase diagram and finally obtaining the relation T0∗ ∼ we examine a QPP that produces just one real-valued
−0.363 log(κ) + 0.918. In practice, only a small zone of output of the average viscous dissipation rate per unit
2
the phase diagram is explored. The effect of ν is as one volume, ε = ν⟨( ∂u∂y ) ⟩ (by requiring only a very few mea-
would expect, with T0∗ decreasing and saturating for very surements). Given that we are equipped with an oracle
low values. Now this relation between T0∗ − κ obtained UV (QLSA-2) that prepares a quantum register with the
via simulations on QFlowS serves as an ideal source for velocity field solution, we append to it a derivative mod-
choosing T0∗ appropriately to perform accurate simula- ule that computes first ∂u ∂y of the solution depicted in the
tions for bigger circuit sizes as well. The earlier relation circuit diagram shown in figure 1(a). This is done by ei-
(despite a slight variation with matrix structure) forms a ther (1) the LCU method where a finite difference matrix
reasonable approximation for T0∗ for all systems consid- of first derivative is decomposed as linear combination of
ered here. This process predicts optimal T0∗ for Hamilto- unitaries, or (2) a spectral method in which an IQFT is
nian simulation algorithms for both QLSA-1 and QLSA- first applied to enter the conjugate space and the first
2. In case of QLSA-2, QFlowS also efficiently generates derivative is now a simple scalar multiplication with the
the set of the LCU coefficients for optimal performance. corresponding wavenumber, k. Finally the application of
Further, even for other class of problems (different PDEs QFT transforms it back into real space.
and discretization schemes), QFlowS’s QPE optimizer
could be employed to perform similar low cost classical At this point, if one is interested in general nonlinear
pre-processing to suggest optimal T0∗ for accurate and functions such as trigonometric, logarithmic, square-root
efficient fluid flow simulations. Further, barring minor or higher powers, we implement the following procedure.
quantitative differences, on performing a similar analysis The derivatives that are stored as quantum amplitudes
on the Couette flow case we find that the qualitative out- are first converted into an nm -bit binary representation
come and inferences drawn are nearly the same as that using Quantum Analog-Digital Converter (QADC) [47].
of the Poiseuille flow case as seen here. The correspond- Following this step, a direct squaring algorithm outlined
ing velocity profiles for the Couette flow are shown in SI in SI Appendix Section 4, or a binary quantum arith-
Appendix Section 1.B. metic squaring circuit (an inverse of the square-root algo-
rithm, which is more expensive than the former, see [48])
2
is used to compute ( ∂u ∂y ) , which are finally converted
back into amplitude encoding using Quantum Digital-
QUANTUM POST-PROCESSING PROTOCOL Analog Converter (QDAC) [47]. This algorithm requires
(QPP) O(1/ϵQP P ) calls to the controlled-UV oracles (that has a
complexity of O(polylogg(N/ϵ)κ), thus an overall com-
Once the velocity field is obtained, measuring it by plexity of O(polylog(N/ϵ)κ/ϵQP P )), one query to the
repeated execution (excluding the requirements of quan- bit-squaring algorithm with complexity O((log2 N )2 ) and
tum amplitude amplification [46]) of the quantum circuit O((log2 N )2 /ϵQP P ) single- and two-qubit gates. The out-
(O(N ) complexity) will compromise any quantum ad- line of the QPP algorithm introduced here along with
9
circuit implementation is given in circuit SI Appendix vantage in precision O(poly log(N/ϵ)κ), other methods
Section 4. based on adiabatic QC [49] could have potentially simpler
As a final step we apply a matrix Uavg that computes implementations, while offering a similar performance,
the sum of derivatives at all points into one qubit, mea- motivating further investigations.
suring which, along with a few ancillas, outputs the de- To keep the discussion compact, the data reported
sired ε (after some normalization and multiplication by here are taken mainly from QLSA-1, but a similarly
ν). Applying QPP on the quantum solution yields a be- detailed discussion of simulation results with QLSA-2
havior of ε with Re, computed at T=0.2, as shown in forms the bulk of the upcoming work. In QLSA-2 the
figure 3(c). Since the initial condition is a uniform flow, critical factors computed are the coefficients for LCU,
in the beginning there will be sharp gradients near the GPE parameters and a comparison between the Fourier
wall. To capture them, one would need a large number of and Chebyshev approaches. Further, we have intro-
grid points since the error in ∂u/∂y is ϵ ≈ O(∆y). This duced a QPP protocol, where we propose the compu-
effect is seen clearly from figure 3(c), where the classical tation of the viscous dissipation rate ε, using a spe-
dissipation computed for Ng = 8, 16 and 64 shows im- cific combination of QFT, IQFT, QADC, QDAC and
proving trends; for high enough N, it begins to closely bit-arithmetic, with an overall complexity that scales as
follow the analytical result. We pick the case of Ng = 8 O(poly log(N/ϵ)κ/ϵQP P + ((log2 N )2 /ϵQP P )). The QPP
for the quantum case and see that it follows closely the introduces an extra O(1/ϵQP P ) scaling that brings down
corresponding classical solution for a total of 13 qubits. the performance of QLSA-2 to the level of QLSA-1 (if
This could be made more accurate with more of the QPE ϵ ≈ ϵQP P ). Added to this, for purposes of quantum
qubits and the resolution Ng , as seen before. In essence, amplitude amplification, QFlowS is capable of repeat-
this demonstrates the possibility for computing quanti- ing circuit runs in parallel (currently tested up to ∼8000
ties such as ε effectively as a quantum post processing shots).
step.
We should point out that this method avoids measur-
ing the entire velocity field, thus protecting it from com-
DISCUSSION promising the quantum advantage and escalating possi-
ble measurement errors. We observe that ε computed
from the resulting quantum simulations captures the
We have demonstrated here a possible quantum algo-
known analytical results. This method can be extended
rithmic strategy and its full implementation using gate-
to compute other nonlinear functions of the velocity field.
based quantum circuits on QFlowS, to simulate Poiseuille
and Couette flows in an end-to-end manner. First, we In summary, along with introducing a new quantum
identify suitable quantum state preparation algorithms simulator package QFlowS—designed mainly for CFD
by considering the sparsity and functional forms of the applications—we also demonstrate a complete implemen-
initial velocity data being encoded. In CFD, it is gen- tation of an end-to-end algorithm to perform fluid flow
erally admissible to choose a relatively simple form and simulations using QC, which paves the way for future
a sparse initial condition, which would result in a rel- QCFD simulations of both linear and nonlinear flows.
atively low cost of state preparation. QSP-1,2 could However, it is important to note that we have not ad-
both be used as shown here, by assessing the form of in- dressed other key challenges such as noise and quan-
put to encode initial and boundary conditions with sub- tum error correction, which we emphasize are critical to
exponential complexity (O(log(Nbe )) and O(kn), respec- simulations on near term quantum devices. Also, the
tively), as well as to re-initialize instantaneous velocity complexities of the algorithms provided here are esti-
fields. However, data that are dense with no functional mates; along with investigating higher order finite dif-
form would force an exponential circuit depth. ference schemes, a detailed error and complexity anal-
Second, using finite difference schemes, the governing ysis along with computing exact gate counts and cir-
equations were discretized to form linear system of equa- cuit depths form an important part of ongoing efforts.
tions, and solved by implementing QLSA-1 and -2, which Extending these methods and tools presented to nonlin-
are state-of-the-art, high precision algorithms with expo- ear systems such as Burgers equations and Navier-Stokes
nential advantage compared to classical schemes. Here, equations also part of the ongoing efforts.
we have made a detailed analysis of the behavior of the
velocity solutions and the attendant errors, which have
revealed that FE outperforms BE. Further, we proposed
the role of T0 and discussed algorithms to prescribe the
optimal value T0∗ . The power-law and exponential form Data availability
relations of (ϵmin −QP E ) and (T0∗ −κ), respectively, given
by QFlowS, forms a well-informed basis to choose T0∗
and minimum required qubits to perform accurate (up to All study data are included in the main text. The pack-
ϵmin ) and qubit-efficient fluid flow simulations. Though age QFlowS will shortly be made available as an open-
QLSA-2 evades QPE and can provide exponential ad- source package on GitHub.
10
We wish to thank Dhawal Buaria (NYU), Yigit Subaśi We also set up an alternative matrix equation
(LANL), Jörg Schumacher (TU Ilmenau), Balu Nadiga
(LANL), Patrick Rebentrost (CQT), Stefan Wörner Abe2 ū = bbe2 , (A5)
(IBM) and Philipp Pfeffer (TU Ilmenau) for insightful
discussions. S.S.B acknowledges the computational re-
for all time steps as shown in eq. A8. (Abe2 )ij =
sources provided by the NYU Greene supercomputing
−(I + A∆t) ∀ i = j. However, ∀ i ≤ m, (Abe2 )ij =
facility on which these simulations were performed.
−I for j = i − 1 and ∀ i ≥ m, (Abe2 )ij = −I for j =
i − 1. Further, (bbe2 )i = {uin ∀ i = 0; = −f ∆t ∀ 0 <
i ≤ m; and = 0 ∀ i > m}.
Appendix A: Numerical method
The well known 2nd order central difference scheme uj+1 − uji
i
= Auji + f , (A6)
is used to discretize the flow domain into Ng equidis- ∆t
tant grid points, as shown in figure S1(a) (u =
[u1 , u2 , · · · , uNg ]), with grid spacing ∆y = h = 1/(Ng + which leads to the matrix equation,
1), admitting a discretization error ∼ O(∆y 2 ). Since
the velocity is known at the boundaries, one solves only
for the Ng − 2 unknown internal grid points. Thus the
Laplacian operator can be written as Af e ũ = bf e , (A7)
∂u
= Au + f . (A2) Equations A8 and A9 thus unroll all the time steps into
∂t one big matrix of dimensions (m+p+1)Ng ×(m+p+1)Ng ,
thus solving for the velocity ũ = [u0 , u1 , · · · , um+p ] at all
b. Temporal discretization: times in one shot, where every uj is the full field at all grid
points. The total time T is discretized into m + p time
steps, where one chooses a large enough p, such that every
To integrate in time, the temporal domain t ∈ [0, T ] is uji = uj+1 for j ∈ [m+ 1, m+p], which implies that, after
i
discretized into m = T /∆t time steps using two different the attainment of a steady state, the solution produces
schemes both admitting an error ∼ O(∆t): p copies of the the steady state solution. This is done
1. Backward Euler or Implicit method : This dis- only to improve the measurement probability of the post-
cretizes the time derivative as selected state [13] and does not affect the solution itself.
This method is stable only for an appropriate choice of
uj+1 − uj Courant number (von Neumann stability criteria), α =
= Auj+1 + f , (A3) ∆t
∆t (∆y)2 < 0.5, making it conditionally stable and specific to
the PDE under discussion, which therefore also decides
and gives the matrix equation the upper bound on the largest admissible ∆t.
FIG. 5: (a) Shows the computational flow field setup: The flow is confined between two no-slip boundaries separated
by a distance D = 1, with x and y as the streamwise and wall-normal directions, respectively. The left and right
panels in (a) show the schematics of steady state velocity profiles of the Poiseuille and Couette flows, respectively.
The finite difference scheme discretizes the flow domain along y into N grid points with separation ∆y, at which the
velocities are computed. (b) shows the 2nd order central difference stencil in 2D. For the 1D case, only the points in
the y-direction are considered and the resulting approximation to the Laplacian operator estimated at a given point
yi using the two points on either side of it is given in eq. A1.
I 0 ···
u0
in
u
−I (I − A∆t) 1
u b0
.. ..
. ..
. .
0 .
. .
. m
.. −I (I − A∆t) u =
bm−1
(A8)
m+1 0
u
−I I .
.
. .. ...
.
. .
.
m+p
−I I u 0
I 0 ···
0
u uin
−(I + A∆t) I u 1 b0
.. ..
. ..
. .
0 .
. .
.. m
u
= bm−1 (A9)
. −(I + A∆t) I
m+1
u 0
−I I
. .
.. .. . .
. . . .
m+p
−I I u 0
3. Couette flow fidelity does not necessarily indicate better ϵrms , show-
ing that fidelity might not be the most robust measure
of performance when solving physical, fluid mechanical
Following the same procedure as for Poiseuille flow, the problems. Higher fidelity only indicates larger overlap of
Couette flow (u(0, t) = 0, u(1, t) = 1) can be captured ac- the quantum solutions with respect to the classical inver-
curately as seen in the flow profiles of figure 6(a-d). Simi- sion solution, which itself is erroneous due to finite dis-
lar inferences as discussed in the main text are applicable cretization and truncation errors. Even if fidelity=1, the
in this case as well. We wish to highlight two possible error is still bounded by ∼ O((∆y)2 , ∆t). The solution
measures of accuracy: (i) the fidelity = |uQ · uC | and (ii) for the one shot case is a wavefunction that encodes all
RMS error (with respect to analytical solution), given by time steps, but we extract only the final time step. The
ϵrms = (⟨uQ − uan ⟩)1/2 , which are both plotted in figure fidelity is an overall measure of this solution but does
6(e) for a one shot FE scheme with a nearly accurate not quantify whether the vector subspace corresponding
estimate of T0∗ . We can clearly observe that with in- to the final time step is more accurate or not. Also, the
creasing QP E , the fidelity increases, though the ϵrms has fidelity in the iterative BE case drops largely for every
a weakly increasing trend. This indicates the that higher
12
time step, since a dynamical T0∗ is not chosen for every 2. Quantum gates and circuits: Quantum gates are
time step. However, higher fidelity can be seen as only given by unitary operators U (U U † = U † U = I),
an indicator of whether the chosen quantum parameters that are essentially rotation matrices, which col-
for QLSA are at least in the reasonable ballpark. lectively form a quantum circuit that manipulates
quantum information in a specific way. The quan-
tum circuit can be viewed as a tensor product of
Appendix B: QFlowS - A brief overview of the all the single qubit gates that forms a matrix of the
package size 2n × 2n for an n-qubit circuit. On QFlowS,
the quantum gates are not implemented as matrix
QFlowS is a specialized high performance quantum operations, but as algebraic operations. The exact
simulator that enables setting up CFD problems in the transformation caused by different gates is trans-
QC format seamlessly. We summarize here briefly the lated into vector operations that affects the specific
various features of QFlowS as schematically depicted in coefficients, but done in parallel on multiple cores.
figure 5(a). This makes the simulator more efficient with re-
spect to both memory and speed. For example, a
1. Qubits and quantum states: Qubits which are quan- Hadamard gate and a NOT gate acting on a two
tum analogues of classical bits, form the fundamen- qubit state shown in figure S4, is given by
tal units of information storage and are represented
by quantum states that follow rules of quantum me-
The action of such a circuit brings about a trans-
chanics. Mathematically, they form elements of a
formation on the state, as given by eq. 11. Such a
complex vector space (Hilbert space H (∈ Cn )). An
transformation could be algebraically dealt as fol-
n-qubit state of the quantum computer, formed by
lows. In an n-qubit circuit, a Hadamard gate act-
taking tensor products of single qubit states |ψ⟩, is
ing on the qubit q is Hq , for which ui 7→ √12 (ui ±
given by
ui+2n−q−1 ). This operation is performed on vec-
n
2
X tor elements by the parallel processing of causally
⊗n
|ψ⟩ = ci |ui ⟩, ci ∈ C, (B1) disconnected (unentangled) gates and vector sub-
i=1 spaces without having to store or multiply the large
which encodes 2n complex values ci , in the basis 2n × 2n matrices. QFlowS can successfully handle
|ui ⟩, that are stored as 1-dimensional arrays on circuits with ∼ 10 million multi-controlled and two-
QFlowS. The memory required should ideally scale level gates even on a standard workstation with 10
linearly with wavefunction size (≈ 16 × 2n bytes core CPU, 1TB memory and 32GB RAM.
with double precision), but there is an overhead
due to the need to store quantum circuit instruc-
tions. Currently, QFlowS offers simulation capa- 3. Algorithm library and portability: QFlowS is
bilities with up to 30+ qubits, which span vector docked with several standard quantum subroutines
spaces with dimensions of the order ∼ 109 . For or algorithms such as Quantum Fourier Transform
performing parallelized simulations, these quantum (QFT) and Quantum Phase Estimation (QPE)
states are either (a) loaded on a large memory sin- that can be readily used in any new circuit. Along
gle node architecture (with or without GPU) dealt with that, QFlowS has classical CFD tools such fi-
with an OpenMP style parallelization, or (b) dis- nite difference (FDM) , finite volume (FVM) and
tributed onto different processors for an MPI style boundary element methods (BEM), implicit and
execution. Both methods were tested initially but explicit time stepping methods, predictor-corrector
the former method was favored because of the sim- methods and linearization methods such as Ho-
plicity of implementation and lower communication motopy Analysis Method (HAM) and Carleman
overheads. Some key operations on quantum states method—to highlight a few. These classical sub-
that can be done with QFlowS include: routines generate appropriate matrices and vectors
in formats that can seamlessly be imported by the
(a) Quantum State Preparation (QSP) - To ini- quantum algorithms. This package will offer easy
tialize any arbitrary states or states with spe- portability onto both local workstations and super-
cial features. This is detailed further in SI computers.
Appendix, Section 3;
(b) Quantum state tomography and amplitude es-
timation - To estimate the amplitude of the fi- 4. Visualization: With QFlowS, the quantum states
nal quantum state by reconstructing the state and quantum circuits can also be visualized with an
using different tomography techniques [8]; in-built state histogram builder and circuit drawer,
(c) Quantum state characteristics - To compute with which algorithms and states can be visualized
other useful properties of the state such as and verified for correctness to simulate and test the
density matrices, entanglement and norm. circuits.
13
1 1 1
y
0.5 0.5 0.5
y
y
0.4 0.4 0.4
t=0 t=0
0.3 0.3 0.3
t=0.49383 t=2.4691
0.2 t=0.98765 0.2 t=4.9383 0.2
t=1.4815 t=7.4074
0.1 analytical 0.1 analytical 0.1
0 0 0
-0.2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 -0.2 0 0.2 0.4 0.6 0.8 1
u u
u
1
0.9
0.8 10-2
0.7
0.6
0.03
y
0.5
0.025
0.4
0.3 0.02
10-3
0.2
0.015
0.1 3 4 5 6 7
0
-0.2 0 0.2 0.4 0.6 0.8 1 3 4 5 6 7
u
FIG. 6: (a) and (b) shows the quantum simulation of the flow field evolving forward in time towards steady state
(analytical solution shown as solid black line) using BE scheme that use 7 and 12 qubits (3 and 8 QPE qubits (QP E ),
respectively), with u(y, 0)) = 0, N = 10, Re = 10, ∂p/∂x = 0 and dt = 0.01. The accuracy of the converged solution
improves for higher QP E similar to the Poiseuille flow case. Here the velocity field is solved for at every time step;
(c) shows increasingly accurate converged solutions with increasing QP E , but solved using the FE scheme where the
velocity field is solved for all time steps in one shot, and only the final solution is extracted. Here α = 0.5 is set to
meet the the von Neumann stability criterion and the parameter T0∗ = 5.0 is fixed. It is important to note that, the
same T0∗ works for both Poiseuille flow and Couette flow independent of the boundary conditions, since it depends
only on κ and form of the matrix. (d) shows the one-shot method, but it is solved with a BE scheme for T0∗ = 6.5.
(e) shows both (1-fidelity) and ϵrms (inset) with increasing QP E from 3 to 7 computed for the one-shot FE case.
Appendix C: Quantum State Preparation respond to certain classes of states with specific prop-
erties. We briefly elucidate two such algorithms used
QFlowS is equipped with a Quantum State Prepara- here. When the state that is being initialized on a quan-
tion (QSP) library that includes algorithms that cor-
14
1 1
|0i H √ |0i + √ |1i
2 2
0 1 0 1 u0 u1 + u3
1 1 0 1 0 =⇒ u1 7→ √1 u0 + u2
|0i |1i
X U = H1 ⊗ X0 = √
2 0 1 0 −1 u2 2 u 1 − u 3
(B2)
1 0 −1 0 u3 u0 − u2
FIG. 8: 2 qubit circuit
where pi is the i-th region of discretely sampled ele- Now repeat this O(n) times to generate an n-qubit
ments/regions from a log-concave probability distribu- state with the distribution sampled over 2n regions.
tion function p(ω). With the existence of an efficient Though the complexity of such a method could be seen
classical subroutine to perform partial sums given by as O(n), the quantum arithmetic circuits are generally
expensive and hence one could alternatively follow an
Z i
ωR Z i
(ωR i
−ωL )/2 improvement as proposed in [50]. To construct a circuit
pi = pω dω, piL = pω dω, (C2) based on the above method described here, we employ
i i
ωL ωL the binary tree formulation as shown in [41], which we
shall call QSP-1. Let us consider loading 4 arbitrary
where pi and piL are probabilities of the point lying in non-zero values onto a 2 qubit state as
the entire region i and the left half of the region i, re-
spectively, we can construct a circuit that prepares an |ψ⟩ = u0 |00⟩ + u1 |01⟩ + u2 |10⟩ + u3 |11⟩. (C8)
n-qubit state for k < m as
To construct the circuit with QSP-1, we create a binary
k
2X −1 q tree as shown on the left panel of figure 9(a), where
(k)
|ψ (k) ⟩ = pi |i⟩. (C3) we start with the values to be loaded as the terminal
i=0 nodes at the base of the tree. These values are pair-
wise squared (since probabilities are squares of ampli-
Now we can further discretize this to yield the state tudes) and summed to build the tree upwards. Finally,
all nodes should converge to 1, as one should expect.
2k+1
X−1 q
(k+1) (k+1) Now, to prepare the quantum states, we traverse the
|ψ ⟩= pi |i⟩ (C4) tree downwards starting from the vertex, which corre-
i=0
sponds to the initial state |ψ⟩ = |0⟩ ⊗ 0⟩. Every node
has two children: the zero-child and the one-child which
by the following steps. Given the ability to compute
have amplitudes corresponding to either the |0⟩ or |1⟩
the quantities in eq. C2, we can compute the conditional
state, respectively. Then we compute at every node,
probability function fk (i) q
the angle θ = arccos( zero-child
one-child ) and apply a controlled
piL Ry (θ) gate, where the control sequence is the bit se-
fk (i) = . (C5)
pi quence of the corresponding node. In the example shown
in figure S4, to compute θ2 , we look at the two children
With this we now compute the next level of discretiza- and obtain 0.25(zero-child) and 0.5(one-child). Therefore
tion by constructing a quantum arithmetic circuit that q
performs θ2 = arccos( 0.250.5 ) = π/4. Here the parent node, 0.75
corresponds to a zero-child branch. So the controlled-Ry
|i⟩|0⟩ 7→ |i⟩|θi ⟩, (C6) gate will operate
√ when qubit
√ 1 is set to 0. Thus, after
Ry (θ1 ) we get 0.75|0⟩+ 0.25|1⟩)⊗|0⟩.Further with suc-
cessive applications of controlled gates Ry (θ2 ) and Ry (θ3 )
p
where θi = arccos( fk (i)). Further, by adding an ancil-
lary qubit ((k+1)-th qubit) we perform controlled Ry (θi ) we obtain
rotation gates (controlled on θ); uncomputing the second Ry (θ1 ) √ √
register we get |0⟩ ⊗ 0⟩ −−−−→ 0.75|0⟩ + 0.25|1⟩) ⊗ |0⟩ (C9)
15
FIG. 9: (a) Computing the Decision Trees for QSP -1,2 (b) Quantum circuits for QSP -1 and -2 corresponding (a)
to load four real values
min{O(poly log(N/ϵ)κ/ϵQP P ), O(n), O(kn)}. The TABLE I: Summary of time complexity of quantum
complexity of the entire algorithm presented in subroutines
this work is summarized in Table I (where n =
log2 (N )). We caution the readers that these com- Quantum subroutine Complexity
plexities are only estimates and warrant a more de- QSP-1 O(n)
tailed analysis of the space, time and gate complex- QSP-2 O(2kn)
ity of the algorithms presented here. QLSA-1 O(poly log(N )s2 κ2 /ϵ)
QLSA-2 O(poly log(N/ϵ)κ)
QPP O(UV ) + O((log N )2 /ϵQP P )
End-to-end min{O(poly log(N/ϵ)κ/ϵQP P ), O(kn/ϵQP P )}
+O((log N )2 /ϵQP P ))
≈ O(poly log(N/ϵ)κ/ϵQP P )
[1] K. P. Iyer, J. D. Scheel, J. Schumacher, and K. R. Sreeni- [13] J.-P. Liu, H. Ø. Kolden, H. K. Krovi, N. F. Loureiro,
vasan, Classical 1/3 scaling of convection holds up to K. Trivisa, and A. M. Childs, Efficient quantum al-
Ra = 1015 , Proc. Nat. Acad. Sci. 117, 7594 (2020). gorithm for dissipative nonlinear differential equations,
[2] P. K. Yeung and K. Ravikumar, Advancing understand- Proc. Nat. Acad. Sci. 118 (2021).
ing of turbulence through extreme-scale computation: In- [14] Y. T. Lin, R. B. Lowrie, D. Aslangil, Y. Subaşı, and
termittency and simulations at large problem sizes, Phys. A. T. Sornborger, Koopman von Neumann mechanics
Rev. Fluids 5, 110517 (2020). and the Koopman representation: A perspective on solv-
[3] P.-K. Yeung, K. Ravikumar, and S. Nichols, Turbulence ing nonlinear dynamical systems with quantum comput-
simulations on the verge of exascale: Gpu algorithms and ers, arXiv:2202.02188 (2022).
an alternative to long simulations at high resolutions, [15] D. Giannakis, A. Ourmazd, P. Pfeffer, J. Schumacher,
Bull. Am. Phys. Soc., Q49.004 (2022). and J. Slawinska, Embedding classical dynamics in a
[4] J. S. Rood, M. T. Henry de Frahan, M. S. Day, H. Sitara- quantum computer, Phys. Rev. A 105, 052404 (2022).
man, S. Yellapantula, B. A. Perry, R. W. Grout, A. Alm- [16] By this we mean an algorithm that efficiently prepares
gren, W. Zhang, and J. Chen, Enabling Combustion Sci- a quantum state, processes it and outputs a result by
ence Simulations for Future Exascale Machines, Tech. measurement while retaining all or some net quantum
Rep. (National Renewable Energy Lab. (NREL), Golden, advantage.
CO, 2021). [17] A. M. Childs, J.-P. Liu, and A. Ostrander, High-precision
[5] K. P. Iyer, S. S. Bharadwaj, and K. R. Sreenivasan, The quantum algorithms for partial differential equations,
area rule for circulation in three-dimensional turbulence, Quantum 5, 574 (2021).
Proc. Nat. Acad. Sci. 118 (2021). [18] G. T. Balducci, B. Chen, M. Möller, M. Gerritsma, and
[6] D. Buaria and K. R. Sreenivasan, Scaling of accelera- R. De Breuker, Review and perspectives in quantum com-
tion statistics in high Reynolds number turbulence, Phys. puting for partial differential equations in structural me-
Rev. Lett. 128, 234502 (2022). chanics, Frontiers in Mechanical Engineering , 75 (2022).
[7] G. Bell, D. H. Bailey, J. Dongarra, A. H. Karp, and [19] F. Y. Leong, W.-B. Ewe, and D. E. Koh, Variational
K. Walsh, A look back on 30 years of the gordon bell quantum evolution equation solver, Sci. Rep. 12, 10817
prize, J. High Perform. Comput. Appl. 31, 469 (2017). (2022).
[8] M. A. Nielsen and I. L. Chuang, Quantum Computation [20] M. Lubasch, J. Joo, P. Moinier, M. Kiffner, and
and Quantum Information: 10th Anniversary Edition D. Jaksch, Variational quantum algorithms for nonlinear
(Cambridge University Press, 2010). problems, Phys. Rev. A 101, 010301 (2020).
[9] S. Jordan, Quantum algorithm zoo, https://quantum al- [21] F. Gaitan, Finding flows of a Navier–Stokes fluid through
gorithmzoo.org/ (2022). quantum computing, npj Quantum Information 6, 61
[10] D. D. Awschalom, H. Bernien, R. Brown, A. Clerk, (2020).
E. Chitambar, A. Dibos, J. Dionne, M. Eriksson, B. Fef- [22] F. Oz, O. San, and K. Kara, An efficient quantum partial
ferman, G. D. Fuchs, et al., A Roadmap for Quantum In- differential equation solver with chebyshev points, Sci.
terconnects, Tech. Rep. (Argonne National Lab.(ANL), Rep. 13, 7767 (2023).
Argonne, IL (United States), 2022). [23] N. Gourianov, M. Lubasch, S. Dolgov, Q. Y. van den
[11] Y. Alexeev, D. Bacon, K. R. Brown, R. Calderbank, L. D. Berg, H. Babaee, P. Givi, M. Kiffner, and D. Jaksch, A
Carr, F. T. Chong, B. DeMarco, D. Englund, E. Farhi, quantum-inspired approach to exploit turbulence struc-
B. Fefferman, A. V. Gorshkov, A. Houck, J. Kim, S. Kim- tures, Nat. Comput. Sci. 2, 30 (2022).
mel, M. Lange, S. Lloyd, M. D. Lukin, D. Maslov, [24] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum al-
P. Maunz, C. Monroe, J. Preskill, M. Roetteler, M. J. gorithm for linear systems of equations, Phys. Rev. Lett.
Savage, and J. Thompson, Quantum computer systems 103, 150502 (2009).
for scientific discovery, PRX Quantum 2, 017001 (2021). [25] A. M. Childs, R. Kothari, and R. D. Somma, Quantum
[12] S. S. Bharadwaj and K. R. Sreenivasan, Quantum com- algorithm for systems of linear equations with exponen-
putation of fluid dynamics, Ind. Acad. Sci. Conf. Ser. 3, tially improved dependence on precision, SIAM Journal
77 (2020). Computing 46, 1920 (2017).
19
[26] S. Aaronson, Read the fine print, Nat. Phys. 11, 291 [39] A. G. Rattew and B. Koczor, Preparing arbitrary con-
(2015). tinuous functions in quantum registers with logarithmic
[27] Y. Cao, A. Papageorgiou, I. Petras, J. Traub, and S. Kais, complexity, arXiv:2205.00519 (2022).
Quantum algorithm and circuit design solving the pois- [40] A. C. Vazquez, R. Hiptmair, and S. Woerner, Enhanc-
son equation, New. J. Phys. 15, 013021 (2013). ing the quantum linear systems algorithm using Richard-
[28] A. Montanaro and S. Pallister, Quantum algorithms and son extrapolation, ACM Trans. Quantum Comput. 3, 1
the finite element method, Phys. Rev. A 93, 032324 (2022).
(2016). [41] A. Prakash, Quantum algorithms for linear algebra and
[29] Quantiki, List of quantum simulators, machine learning. PhD Thesis (University of California,
https://quantiki.org/wiki/list-qc-simulators (2023). Berkeley) (2014).
[30] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and [42] P. Pfeffer, F. Heyder, and J. Schumacher, Hybrid
B. Valiron, Quipper: a scalable quantum programming quantum-classical reservoir computing of thermal con-
language, in Proceedings of the 34th ACM SIGPLAN vection flow, Phys. Rev. Research 4, 033176 (2022).
Conference on Programming Language Design and Im- [43] F. Mozafari, G. De Micheli, and Y. Yang, Efficient de-
plementation (2013) pp. 333–342. terministic preparation of quantum states using decision
[31] T. Jones, A. Brown, I. Bush, and S. C. Benjamin, Quest diagrams, Phys. Rev. A 106, 022617 (2022).
and high performance simulation of quantum computers, [44] H. Wolkowicz and G. P. Styan, More bounds for elgen-
Sci. Rep. 9, 1 (2019). values using traces, Linear Algebra Appl. 31, 1 (1980).
[32] A. C. Vazquez and S. Woerner, Efficient state preparation [45] A. Scherer, B. Valiron, S.-C. Mau, S. Alexander,
for quantum amplitude estimation, Phys. Rev. Appl. 15, E. Van den Berg, and T. E. Chapuran, Concrete resource
034027 (2021). analysis of the quantum linear-system algorithm used to
[33] V. Giovannetti, S. Lloyd, and L. Maccone, Architectures compute the electromagnetic scattering cross section of
for a quantum random access memory, Phys. Rev. A 78, a 2d target, Quantum Inf. Process. 16, 1 (2017).
052310 (2008). [46] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp, Quantum
[34] I. M. Georgescu, S. Ashhab, and F. Nori, Quantum sim- amplitude amplification and estimation, Contemporary
ulation, Rev. Mod. Phys. 86, 153 (2014). Mathematics 305, 53 (2002).
[35] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and [47] K. Mitarai, M. Kitagawa, and K. Fujii, Quantum analog-
R. D. Somma, Exponential improvement in precision digital conversion, Phys. Rev. A 99, 012301 (2019).
for simulating sparse hamiltonians, in Proceedings of the [48] S. Wang, Z. Wang, W. Li, L. Fan, G. Cui, Z. Wei, and
Forty-Sixth Annual ACM Symposium on Theory of Com- Y. Gu, Quantum circuits design for evaluating transcen-
puting (2014) pp. 283–292. dental functions based on a function-value binary expan-
[36] D. W. Berry, A. M. Childs, and R. Kothari, Hamilto- sion method, Quantum Inf. Process. 19, 1 (2020).
nian simulation with nearly optimal dependence on all [49] Y. Subasi, R. D. Somma, and D. Orsucci, Quantum algo-
parameters, in 2015 IEEE 56th Annual Symposium on rithms for systems of linear equations inspired by adia-
Foundations of Computer Science (IEEE, 2015) pp. 792– batic quantum computing, Phys. Rev. Lett. 122, 060504
809. (2019).
[37] A. Ambainis, Variable time amplitude amplification and [50] Y. R. Sanders, G. H. Low, A. Scherer, and D. W.
a faster quantum algorithm for solving systems of linear Berry, Black-box quantum state preparation without
equations, arXiv:1010.4458 (2010). arithmetic, Phys. Rev. Lett. 122, 020502 (2019).
[38] L. Grover and T. Rudolph, Creating superpositions that [51] ANGEL, GITHUB REPOSITORY
correspond to efficiently integrable probability distribu- https://github.com/fmozafari/angel (2022).
tions, arXiv:quant-ph/0208112 (2002).