Abstract
Classical Physics-informed neural networks (PINNs) approximate solutions to PDEs with the help of deep neural networks trained to satisfy the differential operator and the relevant boundary conditions. We revisit this idea in the quantum computing realm, using parameterised random quantum circuits as trial solutions. We further adapt recent PINN-based techniques to our quantum setting, in particular Gaussian smoothing. Our analysis concentrates on the Poisson, the Heat and the Hamilton–Jacobi–Bellman equations, which are ubiquitous in most areas of science. On the theoretical side, we develop a complexity analysis of this approach, and show numerically that random quantum networks can outperform more traditional quantum networks as well as random classical networks.
1 Introduction
Partial Differential Equations (PDEs) describe continuous phenomena, such as fluid flows or the propagation of waves. They correspond to the mathematical translation of observable physical, chemical or biological processes. Unfortunately, these equations rarely admit analytical solutions and need to be discretised on some mesh. This process can be computationally expensive, especially for high-dimensional problems and when unstructured meshes are required, for example to account for local irregular behaviours. The discretised scheme can then be solved using a variety of numerical methods, such as finite elements (FEM), finite differences (FDM) or finite volumes (FVM). However, even these methods can be inefficient for large and complex problems. For example, solving the Navier–Stokes equations, which describe the motion of a fluid, can require millions of hours of CPU or GPU time on a supercomputer. Another example is the Poisson equation, one of the most important PDEs in engineering, with applications including heat conduction, gravitation and electrodynamics. Solving it numerically in high dimension is only tractable with iterative methods, which often do not scale well with dimension and/or require specialist knowledge when dealing with boundary conditions or when generating the discretisation mesh.
Neural networks (NNs) are well positioned to solve such complicated PDEs and are already used in various areas of engineering and applied mathematics for complex regression and image-to-image translation tasks. The scientific computing community has applied them to PDE solving as early as the 1980s [20], yet interest has exploded over recent years, due in part to significant advances in computational techniques and in the formulation of such networks, as detailed and highlighted for example in [4, 21, 32].
Quantum computing is a transformative new paradigm which takes advantage of the quantum phenomena seen at microscopic physical scales. While significantly more challenging to design, quantum computers can run specialised algorithms that scale better than their classical counterparts, sometimes exponentially so. Quantum computers are made of quantum bits (or qubits) which, unlike the bits in conventional digital computers, store and process data based on two key principles of quantum physics: quantum superposition and quantum entanglement. They characteristically suffer from quantum errors, related to the quantum nature of their qubits. Even if quantum computers of sufficient complexity are not yet available, there is a clear need to understand which tasks we can hope to perform on them and to design methods to mitigate the effects of quantum errors [29].
Quantum neural networks form a new class of machine learning networks and leverage quantum mechanical principles such as superposition and entanglement, with the potential to deal with complicated problems and/or high-dimensional spaces. Architectures for quantum neural networks have been suggested in [7, 11, 34], with potential advantages including faster training. Preliminary theoretical research into quantum machine learning shows that quantum networks can produce a more trainable model [1]. This is particularly relevant to solving PDEs with machine learning, as techniques which produce a more favourable loss landscape can drastically improve the performance of these models [13, 18].
In the present work, we suggest a new way of formulating a quantum neural network, translate some classical machine learning techniques to the quantum setting and develop a complexity analysis in the context of specific PDEs (the Heat, the Poisson and an HJB equation). This provides a framework to demonstrate the potential and the versatility of quantum neural networks as PDE solvers.
The paper is organised as follows: Sect. 2 introduces the PINN algorithm and reviews the basics of classical and quantum networks. In Sect. 3, we introduce a novel quantum network to solve specific PDEs and provide a complexity analysis. Finally, we confirm the quality of the scheme numerically in Sect. 4.
Notations. We denote by \({\mathcal {C}}^{n}(\Omega ,{\mathbb {R}})\) the space of n times differentiable functions from \(\Omega \) to \({\mathbb {R}}\), by \(L^p(\Omega )\) the space of functions with finite \(L^p\) norm and define the Sobolev space \({\mathcal {W}}^{\alpha ,\beta }(\Omega ): = \left\{ f \in L^\beta (\Omega ): \nabla ^s f \in L^\beta (\Omega ) \text { for all }|s| \le \alpha \right\} \), where \(\nabla ^s f\) refers to the weak derivative of order s. Similarly, we let \({\mathcal {W}}_0^{\alpha ,\beta }\) be the subspace of functions in \({\mathcal {W}}^{\alpha ,\beta }(\Omega )\) that vanish in the trace sense on the boundary of \(\Omega \). We use \(\Vert \cdot \Vert \) to refer to the Euclidean norm.
2 Main tools
2.1 PINN algorithm
The Physics-informed neural network (PINN) algorithm relies on the expressive power of neural networks to solve PDEs. Let \(\Omega \subset {\mathbb {R}}^{d}\) be a bounded Lipschitz domain with boundary \(\partial \Omega \), \({\mathscr {F}}: {\mathcal {C}}^{K}(\Omega ,{\mathbb {R}}) \rightarrow {\mathcal {C}}(\Omega ,{\mathbb {R}})\) a differential operator of order at most K, \(E \subset \partial \Omega \) and \(h: E \rightarrow {\mathbb {R}}\). The goal is to estimate the solution \(u: \Omega \rightarrow {\mathbb {R}}\) to the PDE
Let \(u_\Theta : \Omega \rightarrow {\mathbb {R}}\) be a neural network at least K times continuously differentiable parameterised by some set \(\Theta \). We assume access to two datasets: independent and identically distributed (iid) random vectors \(\{X^{(e)}_i\}_{i=1,\ldots ,n_e}\) with known distribution \(\mu _E\) on E and iid random vectors \(\{X^{(r)}_i\}_{i=1,\ldots , n_r}\) with known distribution \(\mu _\Omega \) on \(\Omega \). We then minimise the empirical loss function
over the set \(\Theta \) using a hybrid quantum-classical training loop, where \(\lambda _e >0\) is a hyper-parameter allowing one to balance the two loss components. This training loop evaluates all \(u_\Theta (\cdot )\) values on a quantum computer before feeding them to a classical computer for use in classical optimisation routines. As shown in [5] (Proposition 3.2 and the associated discussion), minimising (2.1) does not necessarily imply anything about the mean squared error \(\Vert u - u_\Theta \Vert ^2_2\).
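To fix ideas, the empirical loss can be sketched as follows, assuming the standard PINN form of (2.1), namely a mean-squared PDE residual over the interior samples plus a boundary penalty weighted by \(\lambda _e\); the functions `residual`, `u` and `h` are placeholders, and each evaluation of \(u_\Theta \) (and of the derivatives inside the residual) is returned by the quantum device.

```julia
using Statistics

# Hypothetical sketch of the empirical PINN loss (2.1), assuming its standard form:
# `residual(x)` evaluates the differential operator applied to the trial solution at x
# (its derivatives would come from the parameter-shift rules of Sect. 2.4.3), `u` is the
# trial solution and `h` the boundary data.
function pinn_loss(residual, u, h, Xr, Xe; λe = 1.0)
    interior = mean(abs2, residual.(Xr))       # residual term over the interior samples
    boundary = mean(abs2, u.(Xe) .- h.(Xe))    # boundary term over the boundary samples
    return interior + λe * boundary
end
```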
2.2 Gaussian smoothing
In [16], the authors investigated Gaussian smoothing of the output of a classical neural network for use as a PDE trial solution, providing a simpler expression (as an expectation) for all derivatives. Consider indeed a trial solution of the form
for some \(\sigma >0\), where u is the output of a neural network. Assuming u is measurable, all derivatives can then be written as follows:
Theorem 2.2.1
([16, Theorem 1]) For any measurable function \(u: {\mathbb {R}}^{d} \rightarrow {\mathbb {R}}\), the function f defined in (2.2) is differentiable and the following holds for all \(x\in {\mathbb {R}}^d\):
Theorem 2.2.1 implies that an (unbiased) estimate of the gradient can be computed by Monte Carlo, for example as
using K iid Gaussian \({\mathcal {N}}(0,\textrm{I}\sigma ^{2})\) samples \((\delta _k)_{k=1,\dots ,K}\). This can be improved using a combination of antithetic variable and control variate techniques (see for example [9, Chapter 4] for a thorough overview) resulting in the new estimator
This method can easily be extended by recursion to derivatives of any order, for example to the Laplacian; a Monte Carlo sketch of these estimators is given below.
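The following sketch assumes the convention \(f(x)={\mathbb {E}}_{\delta \sim {\mathcal {N}}(0,\textrm{I}\sigma ^{2})}[u(x+\delta )]\) and shows the plain estimator of the gradient, its antithetic/control-variate refinement, and the analogous variance-reduced estimator for the Laplacian. The function `u` is a placeholder for the network output, and the function names and the choice \(K=1024\) are illustrative.

```julia
using Random

# Plain estimator: ∇f(x) ≈ (1/K) Σ_k δ_k u(x + δ_k) / σ²
function smoothed_grad(u, x, σ; K = 1024, rng = Random.default_rng())
    g = zeros(length(x))
    for _ in 1:K
        δ = σ .* randn(rng, length(x))
        g .+= δ .* (u(x .+ δ) / σ^2)
    end
    return g ./ K
end

# Antithetic + control-variate refinement: δ_k (u(x+δ_k) - u(x-δ_k)) / (2σ²)
function smoothed_grad_av(u, x, σ; K = 1024, rng = Random.default_rng())
    g = zeros(length(x))
    for _ in 1:K
        δ = σ .* randn(rng, length(x))
        g .+= δ .* ((u(x .+ δ) - u(x .- δ)) / (2 * σ^2))
    end
    return g ./ K
end

# Laplacian by recursion: Δf(x) ≈ E[(‖δ‖² - dσ²) u(x+δ)] / σ⁴, variance-reduced here
# with the same antithetic / control-variate trick
function smoothed_laplacian(u, x, σ; K = 1024, rng = Random.default_rng())
    d, acc = length(x), 0.0
    for _ in 1:K
        δ = σ .* randn(rng, length(x))
        acc += (sum(abs2, δ) - d * σ^2) * (u(x .+ δ) + u(x .- δ) - 2 * u(x)) / (2 * σ^4)
    end
    return acc / K
end
```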
In fact, the function f defined in (2.2) is Lipschitz continuous:
Theorem 2.2.2
Let \(u:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) be measurable. The map \({\mathbb {E}}_{\delta \sim {\mathcal {N}}(0,\textrm{I}\sigma ^{2})}[u(\cdot + \delta )]\) is Lipschitz with respect to any arbitrary norm. In particular with the 2-norm, it is \(\frac{{\mathfrak {u}}}{\sigma }\sqrt{\frac{2}{\pi }}\) Lipschitz, with \({\mathfrak {u}}= \sup _{x \in {\mathbb {R}}^d} |u(x)|\).
Proof
This statement is proved in [16] for the \(\ell _2\)-norm, and it is easy to extend to any arbitrary norm. Since f is differentiable by Theorem 2.2.1, it is well known that its Lipschitz constant \(L^{\alpha }(f)\) in the \(\alpha \)-norm can be obtained as
with \(\Vert z\Vert _{\alpha }^{*}:= \sup _{\Vert y\Vert _{\alpha }\le 1}y^\top z\) the dual norm of \(\Vert \cdot \Vert _\alpha \). Using Theorem 2.2.1, we can write
When \(\alpha = 2\), the upper bound becomes \(\frac{{\mathfrak {u}}}{\sigma } \sqrt{\frac{2}{\pi }}\) and the theorem follows. \(\square \)
As concluded in [16], this Lipschitz constant restriction is not particularly limiting in the classical setting since small values of \(\sigma \) can be used. However, in the quantum setting small values of \(\sigma \) introduce a large error if not enough shots are used. Specifically, consider the parameterised expectation (detailed in (2.3))
On actual quantum hardware we can only obtain an estimate \({\widetilde{g}}(x)\) of g(x) using a finite number of shots, and we denote
the (pointwise) inaccuracy, which is a random variable since \({\widetilde{g}}(x)\) is an empirical estimator. This framework allows for error from both noisy circuits and the estimation of expectations using a finite number of shots. Since the map g is bounded and the numbers of shots and qubits are finite, there exists a constant \(\varepsilon _f > 0\) such that \(| \varepsilon (x)| \le \varepsilon _f\) uniformly over \(x \in {\mathbb {R}}^d\). Define \(G(x):= {\mathbb {E}}[g(x+ \delta )]\) and \({\widetilde{G}}(x):= {\mathbb {E}}\left[ {\widetilde{g}}(x+ \delta )\right] \), where \(\delta \sim {\mathcal {N}}(0,\textrm{I}\sigma ^2)\). The following lemma provides a bound for the distance between the gradients of these two functions:
Lemma 2.2.3
The following bound holds for the Euclidean norm:
Proof
From Theorem 2.2.1, we can write, for any \(x\in {\mathbb {R}}^d\),
We use Jensen's inequality to take the norm inside the expectation. The penultimate inequality follows since the Euclidean norm of a Gaussian vector follows a chi distribution, and the last inequality is Gautschi's [31, Equation (6)]. \(\square \)
Therefore we see that decreasing the parameter \(\sigma \) increases the impact of quantum-induced sampling error. Similar reasoning can be applied to finite-difference-based methods, and we refer the reader to [2, Section 2.1] for a general review of finite differences for gradients with error.
2.3 Random classical networks
We call ‘random classical network’ a single-hidden-layer feedforward neural network with randomly generated internal weights, where only the last layer of weights and hyperparameters is optimised over. Such random networks have previously been successfully applied to solving different types of (high-dimensional) PDEs [10, 17, 24, 30]. For the Black-Scholes-type PDE (similar to the heat equation), Gonon [10] provided a full error analysis of the prediction error of random neural networks. Specifically, let \(N \in {\mathbb {N}}\) be the number of hidden nodes, \({\varvec{B}}= (B_1,\cdots ,B_N)\) an iid sequence of real-valued random variables and \({\varvec{E}}=(E_1,\cdots ,E_N)\) another iid sequence of random variables in \({\mathbb {R}}^{d}\), independent of \({\varvec{B}}\). For a vector \({\varvec{W}}\in {\mathbb {R}}^N\), define the random function
where we consider the nonlinear activation function \(\varrho (x) = \max (0,x)\). The vector \({\varvec{W}}\) then represents the trainable parameters, while \({\varvec{E}}\) and \({\varvec{B}}\) are sampled from some prior distribution and frozen. These random networks are considerably easier to train than traditional fully connected deep neural networks, especially in a supervised learning context, where training reduces to a linear regression. The PINN algorithm is not a supervised learning approach, so this simplification does not carry over directly; the random-weight structure does, however, reduce the number of trainable parameters, and hence the computational burden. In [12], Gonon, Grigoryeva and Ortega proved that, as long as the unknown function is sufficiently regular, it is possible to draw the internal weights of the random network from a generic distribution (not depending on the unknown object) and to quantify the error in terms of the network architecture.
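For illustration, such a random feature network can be sketched as follows; the sampling distributions match those used in Sect. 4 (a multivariate \(t_5(0,\textrm{I}_d)\) for the internal weights and a Student t with two degrees of freedom for the biases), the function names are illustrative, and the least-squares fit shown at the end corresponds to the supervised case mentioned above (in the PINN setting, \({\varvec{W}}\) is instead trained by minimising (2.1)).

```julia
using Random, LinearAlgebra

# One multivariate t_ν(0, I_d) draw: Z * sqrt(ν / s) with Z ~ N(0, I_d), s ~ χ²_ν
rand_t(rng, d, ν) = randn(rng, d) .* sqrt(ν / sum(abs2, randn(rng, ν)))

# Build H_W(x) = Σ_i W_i ρ(E_i ⋅ x + B_i) with ρ = ReLU and frozen (E, B);
# only the output weights W are trained.
function random_feature_net(d, N; rng = Random.default_rng())
    E = [rand_t(rng, d, 5) for _ in 1:N]          # internal weights ~ t_5(0, I_d)
    B = [first(rand_t(rng, 1, 2)) for _ in 1:N]   # biases ~ Student t, 2 dof
    features(x) = [max(dot(E[i], x) + B[i], 0.0) for i in 1:N]
    H(W, x) = dot(W, features(x))
    return H, features
end

# Supervised-learning case: fit W by linear regression on data {(x_k, y_k)}
function fit_weights(features, xs, ys)
    Φ = reduce(vcat, (features(x)' for x in xs))  # design matrix, one row per sample
    return Φ \ ys                                 # least-squares solution for W
end
```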
2.4 Quantum neural networks
2.4.1 Quantum neural networks
Using a quantum network for the PDE trial solution in the PINN algorithm was first proposed by Kyriienko, Paine and Elfving [19]. Essentially, the spatial variable x is embedded into a quantum state via a unitary operator \(\texttt{U}(x)\) (referred to as the feature map), then a parameterised unitary operator \(\texttt{U}_\Theta \) (independent of x) is applied before producing the output of the network by taking the expectation of a Hermitian cost operator \({\mathcal {C}}\):
and the parameters \(\Theta \) are found by minimising some loss function, for example (2.1). In [19], the authors suggested that increasing the number of qubits or using cost operators with more complex Pauli decompositions could produce more expressive networks. Preliminary research [26,27,28] has shown the potential of parameterising the feature maps and of repeatedly applying unitary operators, leading to the more general formulation
for some positive integer L, where \(\Theta = \{ \theta _1,\omega _1,\cdots ,\theta _L,\omega _L\}\) is the set of all hyperparameters. In particular, the authors in [28] showed that one-qubit networks can act as universal approximators for bounded complex continuous functions or integrable functions with a finite number of finite discontinuities.
2.4.2 Random quantum networks
Consider a system with n qubits. Let \(\varvec{A}: {\Omega } \rightarrow {\mathbb {C}}^{2^n \times 2^n}\) be a random function which maps the spatio-temporal variable x to a unitary matrix and \(\varvec{\Lambda } \in {\mathbb {C}}^{2^n \times 2^n}\) a random unitary matrix distributed according to the Haar measure. Then for a suitable set of parameters \(\Theta \), we define the random function \(u^{\varvec{\Lambda },\varvec{A}}_\Theta : \Omega \rightarrow {\mathbb {R}}\),
where \({\mathcal {C}}\) is a Hermitian cost operator. When using this random quantum neural network to approximate u we generate \(\varvec{A}\) and \(\varvec{\Lambda }\), consider them fixed and train the parameters \(\Theta \). Specific examples of \(\varvec{A}\) are given in Sect. 3.1.
Remark 2.4.1
In practice, one may encode the data x only through \(\varvec{A}\) and leave \(\texttt{U}_{\Theta }\) independent of x. We leave the current formulation as is, allowing for more generality.
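For concreteness, a dense statevector sketch of this construction reads as follows: the Haar-random \(\varvec{\Lambda }\) is drawn via the QR decomposition of a complex Ginibre matrix with the standard phase fix, while `A`, `UΘ` and the cost operator `C` are placeholders (a concrete choice is given in Sect. 3.1). Function names, and the ordering of the encoding, Haar and trainable blocks, are illustrative assumptions; such a dense simulation is of course limited to small n.

```julia
using LinearAlgebra, Random

# Haar-random unitary on n qubits via QR of a complex Ginibre matrix
function haar_unitary(n; rng = Random.default_rng())
    N = 2^n
    Z = (randn(rng, N, N) .+ im .* randn(rng, N, N)) ./ sqrt(2)
    F = qr(Z)
    return Matrix(F.Q) * Diagonal(sign.(diag(F.R)))   # phase fix so the law is Haar
end

# u(Θ, x): expectation of the Hermitian cost C in the state obtained by applying the
# encoding A(x), the fixed Haar unitary Λ and the trainable block UΘ(Θ) to |0…0⟩
# (this ordering is assumed here for illustration).
function random_qnn(A, UΘ, C, Λ)
    function u(Θ, x)
        N = size(Λ, 1)
        ψ = zeros(ComplexF64, N); ψ[1] = 1            # |0…0⟩
        ψ = UΘ(Θ) * (Λ * (A(x) * ψ))
        return real(dot(ψ, C * ψ))
    end
    return u
end
```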
2.4.3 Optimised parameter shift
When differentiating parameterised expectation values we apply the family of parameter shift rules discussed by Mari, Bromley and Killoran [23]. In quantum computing, one-qubit rotation gates can be written as \(\exp \left\{ -\frac{\textrm{i}x}{2} \texttt{G}\right\} \) for some unitary matrix \(\texttt{G}\) and some real number x. We shall require here a slightly modified version:
Definition 2.4.2
For \(x\in {\mathbb {R}}\), the matrix \(\texttt{M}(x)\) is a single-qubit rotation-like gate if
for some complex-valued involutory generator matrix \(\texttt{G}\) (satisfying \(\texttt{G}^2 = \textrm{I}\)).
The matrix \(\texttt{M}(x)\), for x real, represents a rotation of angle x around the axis spanned by \(\texttt{G}\). While an arbitrary single-qubit gate can be written using a Hermitian generator matrix, here we restrict ourselves to a smaller set of quantum gates.
We shall use these single-qubit rotation-like gates to construct an approximation \(u^{\varvec{\Lambda },\varvec{A}}_\Theta \). The following lemma allows us to decompose the unitary conjugation \(\texttt{M}(x)^{\dagger }\, {\mathcal {C}}\, \texttt{M}(x)\) into the sum of three easy-to-compute terms, as stated in [23, Equation (5)] without proof:
Lemma 2.4.3
For any \(x\in {\mathbb {R}}\), the identity \(\texttt{M}(x)^{\dagger }{\mathcal {C}}\texttt{M}(x) = \texttt{A}+ \texttt{B}\cos (x) + \texttt{C}\sin (x)\) holds for any single-qubit rotation-like gate \(\texttt{M}(x)\) with
Proof
Let \(x\in {\mathbb {R}}\). Since \(\texttt{M}(x)\) is a rotation-like gate as in Definition 2.4.2 with involutory generator \(\texttt{G}\), then it is trivial to show that
Therefore
and the lemma follows. \(\square \)
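For concreteness, the computation can be written out explicitly: since \(\texttt{G}^2 = \textrm{I}\), the gate and its adjoint expand as half-angle rotations, and the explicit forms of \(\texttt{A}\), \(\texttt{B}\) and \(\texttt{C}\) below follow directly (they are stated here for completeness and are consistent with the decomposition used in [23]):
$$
\texttt{M}(x) = \cos\frac{x}{2}\,\textrm{I} - \textrm{i}\,\sin\frac{x}{2}\,\texttt{G},
\qquad
\texttt{M}(x)^{\dagger} = \cos\frac{x}{2}\,\textrm{I} + \textrm{i}\,\sin\frac{x}{2}\,\texttt{G},
$$
so that
$$
\texttt{M}(x)^{\dagger}\,{\mathcal {C}}\,\texttt{M}(x)
= \cos^2\frac{x}{2}\,{\mathcal {C}}
+ \sin^2\frac{x}{2}\,\texttt{G}{\mathcal {C}}\texttt{G}
+ \textrm{i}\,\sin\frac{x}{2}\cos\frac{x}{2}\left(\texttt{G}{\mathcal {C}} - {\mathcal {C}}\texttt{G}\right)
= \texttt{A} + \texttt{B}\cos(x) + \texttt{C}\sin(x),
$$
using the half-angle identities \(\cos^2\frac{x}{2} = \frac{1+\cos x}{2}\), \(\sin^2\frac{x}{2} = \frac{1-\cos x}{2}\), \(\sin\frac{x}{2}\cos\frac{x}{2} = \frac{\sin x}{2}\), with
$$
\texttt{A} = \frac{{\mathcal {C}} + \texttt{G}{\mathcal {C}}\texttt{G}}{2},
\qquad
\texttt{B} = \frac{{\mathcal {C}} - \texttt{G}{\mathcal {C}}\texttt{G}}{2},
\qquad
\texttt{C} = \frac{\textrm{i}\left(\texttt{G}{\mathcal {C}} - {\mathcal {C}}\texttt{G}\right)}{2}.
$$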
Consider the function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) defined as
given some single-qubit rotation-like matrix \(\texttt{M}\) as in Definition 2.4.2. Clearly g is infinitely differentiable, and it is furthermore periodic by Lemma 2.4.3. Denote its partial derivatives
Applying the family of parameter shift rules [23, Equation (9)] results in
for any \(s \in {\mathbb {R}}\setminus \{k\pi :k\in {\mathbb {Z}}\}\), where \({\varvec{e}}_j\) is a unit vector along the \(x_j\) axis. Iteratively applying this rule with the same shift results in
for any \(x\in {\mathbb {R}}\). For \(j_1 = j_2\) and using the value \(s=\frac{\pi }{2}\) reduces this to
and for \(s = \frac{\pi }{4}\) we obtain
While (2.5) only requires the evaluation of two expectation values compared to the three of (2.6), the latter has the distinct advantage of providing both the derivatives \(g_{j,j}\) and \(g_{j}\). For the complexity analysis we apply either (2.5) or (2.6), depending on which one is more efficient for the chosen PDE. The optimised parameter shift rules above are covered in [23], but simple yet tedious computations can provide higher-order derivatives if needed, as shown in the following:
Theorem 2.4.4
Let g be any function of the form (2.3) with \(\texttt{M}\) a single-qubit rotation-like gate and \({\mathcal {C}}\) a Hermitian cost operator. For any \(d \ge 2\) and all \(x \in {\mathbb {R}}\),
Proof
Repeated applications of the parameter shift rule (2.4), with the same basis vector and the same shift magnitude for each shift, result in
The first and last terms in the expansion are
By periodicity of our expectation function, for d even this reduces to \(\frac{2g(x+\pi {\varvec{e}}_j)}{(2\sin (\pi /d))^d}\), whereas when d is odd the first and last terms in the expansion cancel. \(\square \)
We therefore have the ability to compute a \(d^{\text {th}}\)-order unmixed partial derivative with just d evaluations of the parameterised expectation. Quantum automatic differentiation engines can be built using a combination of the chain rule and the above formulae: the argument of each expectation is calculated on a classical computer, the expectation itself is evaluated on a quantum computer, and the result is fed back to the classical computer, which combines the different values to construct the derivative. This is in sharp contrast to the automatic differentiation procedures carried out on classical neural networks, where different orders of derivatives can only be calculated sequentially: traditional automatic differentiation first builds the computational graph for the first-order derivatives and then performs back-propagation on that graph for the second-order derivatives, repeating the process for each further order. Note that previous publications such as [19] suggested repeated applications of the basic parameter shift rule, which does not benefit from the computational acceleration discussed above.
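As a sanity check, the parameter-shift rules of this section can be verified on a toy single-qubit example, where the generator \(\texttt{G}\) is the Pauli X gate, the cost operator is the Pauli Z gate, and hence \(g(x)=\cos (x)\); the matrices, shift values and function names are illustrative.

```julia
using LinearAlgebra

# Toy single-qubit example: G = X (involutory), cost C = Z, so g(x) = cos(x)
const Gx = ComplexF64[0 1; 1 0]
const Cz = ComplexF64[1 0; 0 -1]
M(x) = exp(-im * x / 2 .* Gx)                 # rotation-like gate of Definition 2.4.2

function g(x)
    ψ = M(x) * ComplexF64[1, 0]               # state M(x)|0⟩
    return real(dot(ψ, Cz * ψ))               # ⟨ψ|C|ψ⟩
end

# First derivative from two shifted evaluations (2.4)
dg(x; s = π / 2) = (g(x + s) - g(x - s)) / (2 * sin(s))

# Second derivative from two evaluations: shift π/2 applied twice plus 2π-periodicity (2.5)
d2g_two(x) = (g(x + π) - g(x)) / 2

# Second derivative from three evaluations with shift π/4 applied twice (2.6);
# the same shifted values g(x ± π/2) also deliver the first derivative via (2.4)
d2g_three(x) = (g(x + π / 2) - 2 * g(x) + g(x - π / 2)) / 2

x0 = 0.3
@assert isapprox(dg(x0), -sin(x0); atol = 1e-10)
@assert isapprox(d2g_two(x0), -cos(x0); atol = 1e-10)
@assert isapprox(d2g_three(x0), -cos(x0); atol = 1e-10)
```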
3 Random network architecture and complexity analysis
3.1 Random network architecture
We assume the input data x is scaled to the interval [0, 1]. By repeated applications of the so-called UAT (Universal Approximation) gate, one-qubit circuits can approximate any bounded complex function on the unit hypercube [28]. This UAT gate is defined as
with \(\Theta = (\varphi ,\gamma ,\alpha ) \in {\mathbb {R}}\times {\mathbb {R}}^d\times {\mathbb {R}}\). This was extended in [26] to multiple qubits by applying the UAT gate to each qubit followed by an entangling layer. We create the random network’s trainable layer using this UAT gate as well as an entangling layer, which can be repeated several times. For the entangling layer we choose the sequence
of controlled NOT gates, where \(\texttt{C}_{\texttt{NOT}}(q_i,q_j)\) denotes the controlled NOT gate acting on qubit \(q_j\) with control qubit \(q_i\). The whole trainable M-layer circuit then reads
and we write \(\Theta = \{\Theta _{i,j}\}_{0 \le i \le n-1, 1 \le j \le M }\). For the ‘encoding’ matrix \(\varvec{A}(x)\) introduced in Sect. 2.4.2, we use
where \(({\mathcal {X}}_j)_{j=1,\ldots ,n}\) are iid \({\mathcal {U}}(0,1)\) random variables and \({\mathfrak {g}}_{j}:= 1 + (j+1 \mod d)\) is an index, with nonlinear activation function \({{\,\mathrm{\textrm{acos}}\,}}(\cdot )\). The constants inside \({{\,\mathrm{\textrm{acos}}\,}}(\cdot )\) and the constant \(\pi \) inside \(\texttt{R}_{z}\) are justified experimentally, as arguments of \(\texttt{R}_{z}\) away from \(\pm \pi \) lead to better results. This choice of encoding matrix is inspired by [19], where it is shown that the resulting matrix \(\varvec{A}(x)\) can be written in terms of Chebyshev polynomials, whose order increases with n and which are well known to approximate nonlinear functions well. Alternative choices for the activation function can be considered with minor modifications, as is standard in the classical NN literature [6]. However, based on the theoretical reasoning provided above and corroborated by numerical experiments, we found that \({{\,\mathrm{\textrm{acos}}\,}}\) typically produced better results when compared to other approaches.
Finally, we use a specific Ising Hamiltonian with transverse and longitudinal magnetic fields and homogeneous Ising couplings for the cost operator:
where \(\texttt{X}\) and \(\texttt{Z}\) are the usual one-qubit Pauli gates and the index refers to the qubit they act upon. This is a relatively complex cost operator, allowing us to approximate a wide class of functions, better than the simpler cost operator \(\sum _j \widehat{\texttt{Z}}_j\). The output of the quantum network is therefore the expectation of this cost operator in the state prepared by \(\varvec{A}(x)\), \(\varvec{\Lambda }\) and \(\texttt{U}_{\Theta }\), as defined in Sect. 2.4.2.
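To make the architecture concrete, the following sketch builds the entangling layer and an Ising-type cost operator as dense matrices. The unit field and coupling coefficients, the nearest-neighbour coupling pattern and the CNOT ordering are illustrative assumptions (the exact constants in (3.3) and the precise gate sequence are as specified above); they are chosen so that \(\sup |\langle {\mathcal {C}}\rangle | \le 3n\), consistent with Sect. 3.7.

```julia
using LinearAlgebra

const I2 = Matrix{ComplexF64}(I, 2, 2)
const Xg = ComplexF64[0 1; 1 0]
const Zg = ComplexF64[1 0; 0 -1]

# Embed a one-qubit operator `op` on qubit q of an n-qubit register
embed(op, q, n) = reduce(kron, (i == q ? op : I2 for i in 1:n))

# CNOT with control qc and target qt, via the projector decomposition P0⊗I + P1⊗X
function cnot(qc, qt, n)
    P0 = ComplexF64[1 0; 0 0]
    P1 = ComplexF64[0 0; 0 1]
    return reduce(kron, (i == qc ? P0 : I2 for i in 1:n)) +
           reduce(kron, (i == qc ? P1 : (i == qt ? Xg : I2) for i in 1:n))
end

# Entangling layer: a chain of CNOTs on consecutive qubits (one possible ordering)
entangling_layer(n) = prod(cnot(q, q + 1, n) for q in 1:n-1)

# Ising-type cost operator with transverse and longitudinal fields and nearest-neighbour
# couplings; the unit coefficients are an assumption made for illustration
function ising_cost(n)
    C = sum(embed(Xg, q, n) + embed(Zg, q, n) for q in 1:n)
    C += sum(embed(Zg, q, n) * embed(Zg, q + 1, n) for q in 1:n-1)
    return Hermitian(C)
end
```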
3.2 Complexity analysis
Define the auxiliary variables \(\alpha _i\) and \(\alpha _{i,j}\) as the number of single-qubit rotation-like gates respectively with \(x_i\) dependence and with \(x_i\) and \(x_j\) dependence in the circuit for \(u_\Theta \). These are not uniquely defined values since the decomposition into single-qubit rotation-like gates is not unique. Given the loss function (2.1), define the quantity \(\xi ({\mathcal {L}}_{n_e,n_r})\), which returns the total number of \(u_\Theta \) evaluations needed to calculate the loss function \({\mathcal {L}}_{n_e,n_r}\), and which we will use as our complexity metric. Let \(N_{\Theta }\) be the total number of components in \(\Theta \). It may seem pertinent to include in this metric the cost of computing all the derivatives \((\partial _{\Theta _i}{\mathcal {L}})_{i=1,\ldots ,N_{\Theta }}\) (for example for the optimisation part); this is however unnecessary since the exact number of \(u_\Theta \) evaluations to do so is given by \(2N_{\Theta }\xi ({\mathcal {L}})\).
Note that the application of an efficient automatic differentiation engine to a classical neural network will almost certainly result in a more efficient scaling of \(\xi ({\mathcal {L}})\) with respect to the number of trainable parameters. For example, given a function mapping \({\mathbb {R}}^n\) to \({\mathbb {R}}\), it takes at most c evaluations of the function to calculate the entire Jacobian matrix, where c is guaranteed to be under 6 and is typically between 2 and 3 [14].
3.3 The p-Laplace equation
Consider the p-Laplace (\(1< p < \infty \)) equation with Dirichlet boundary conditions:
where \(u \in {\mathcal {W}}_0^{1, p}(\Omega ) \cap L^p(\Omega )\), \(f \in L^{\frac{p}{p-1}}(\Omega )\) and h is the trace of some \({\mathcal {W}}_0^{1, p}(\Omega ) \cap L^p(\Omega )\) function. The p-Laplace operator reads
For the variational formulation we have [3, Lemma (2.3)] (see Footnote 1)
We refer the reader to [8] for a reference on the weak formulation of PDEs. To translate the variational statement (3.7) to a loss function, we add a penalisation term for the boundary conditions and approximate the integral along the sample points, resulting in the empirical loss function
This idea of minimising a functional to solve PDEs using neural networks has previously been studied for Poisson equations [25, 33]. The cases \(p\ne 2\) and \(p=2\) need to be studied separately since the mixed second-order partial derivatives of the p-Laplace operator (3.6) cancel when \(p=2\).
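As an illustration, the empirical loss (3.8) can be sketched as follows, assuming the Dirichlet-energy form \(\frac{1}{p}\int _\Omega \Vert \nabla u\Vert ^p - \int _\Omega f\,u\) associated with the convention \(-\Delta _p u = f\) (the sign of the source term depends on this convention), together with a squared boundary penalty weighted by \(\lambda _e\); the gradient \(\nabla u_\Theta \) would in practice be obtained from the parameter-shift rules of Sect. 2.4.3 or the Gaussian-smoothing estimators of Sect. 2.2.

```julia
using LinearAlgebra, Statistics

# Empirical variational (deep-Ritz-style) loss for the p-Laplace problem:
#   (1/n_r) Σ_i [ (1/p) ‖∇u_Θ(X_i)‖^p − f(X_i) u_Θ(X_i) ]
#     + λ_e (1/n_e) Σ_i |u_Θ(X̃_i) − h(X̃_i)|²
# `u` and `∇u` are placeholders for the trial solution and its gradient estimator.
function variational_loss(u, ∇u, f, h, Xr, Xe; p = 2, λe = 1.0)
    energy = mean(norm(∇u(x))^p / p - f(x) * u(x) for x in Xr)
    boundary = mean(abs2(u(x) - h(x)) for x in Xe)
    return energy + λe * boundary
end
```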
3.3.1 The \(p \ne 2\) case
Lemma 3.3.1
For the prototypical PINN loss function (2.1), we have
provided the second-order partial derivatives all commute, namely when
Without any assumptions on the partial derivatives, we have
Proof
Consider decomposing the quantum circuit (3.1) responsible for producing \(u_\Theta (x)\) into one with just single qubit rotation-like gates and CNOTs. Assuming p single-qubit rotation gates and with a slight abuse of notation,
where each \(\phi _i\) is the rotation angle of a particular single-qubit rotation-like gate. Note that this decomposition is clearly not unique; one could, for instance, choose the decomposition with the lowest \(\xi \) value. Basic applications of the chain rule then yield
We then split the sum over a and b, resulting in
In the first summation there are \((\alpha _i \alpha _j - \alpha _{i,j})\) terms, so applying the standard parameter shift rule equation (2.4) twice results in \(4(\alpha _i \alpha _j - \alpha _{i,j})\) evaluations of \(u_\Theta \). In the second summation there are \(\alpha _{a,b}\) terms, so the optimised parameter shift rule (2.6) results in \(3\alpha _{i,j}\) evaluations of \(u_\Theta \). We use this parameter shift rule as it provides all of the quantum gate partial derivatives needed for the third summation too.
For the p-Laplace operator (3.6), we require all mixed partial derivatives, that is the derivatives for all pairs (i, j); as a result the number of operators needed to evaluate all these derivatives of \(u_\Theta \) is equal to
where we have assumed the derivatives do not necessarily commute. Assuming they do, we then only need to sum over pairs with \(i \le j\), as
Note that the \(n_r\) factor comes from the number of times the residual must be evaluated; since boundary data appears with no derivatives, each sample then only requires one \(u_\Theta \) evaluation. \(\square \)
Lemma 3.3.2
For the variational loss function formulation (3.8),
Proof
The minimisation statement that arises from the variational formulation (3.7) involves both the gradient and the function value at each sample point. Using the same decomposition as before (3.9), the chain rule (3.10a) and the basic parameter shift rule (2.4), the gradient involves exactly \(2\sum _{i=1}^{d} \alpha _i\) evaluations of \(u_\Theta \), while the function value results in only one \(u_\Theta \) evaluation. This is done for each sample point inside the domain, resulting in
The extra \(n_e\) term in (3.11) is a result of the boundary penalisation term in (3.8). \(\square \)
Lemma 3.3.3
If the PDE trial solution is the Gaussian smoothed output of a quantum network then, using K classical samples of Gaussian noise,
Proof
When the output of the quantum network is Gaussian smoothed (2.2), the Hessian matrix \(\varvec{H} f_\Theta \) can be calculated using
with \(\delta \sim {\mathcal {N}}\left( 0, \sigma ^2 \textrm{I}\right) \), as proven in [16]. The general variance of this estimator can be reduced by applying the control variate and antithetic variable method. For the additive control variate we use \(u_\Theta (x)\) as a baseline which turns the estimate (3.13) into
just as the authors do in [16]. Notice that the estimators in (3.13) and (3.14) keep the same expectation when \(\delta \) is substituted with \(-\delta \); averaging this new estimator with the one in (3.14) results in
The same technique can be applied to the estimator for the gradient resulting in
There are a total of five \(u_\Theta \) terms across the two expressions, while the boundary term features a single \(f_\Theta \) term for each sample point. Assuming we approximate each classical expectation with K iid Gaussian samples, the expression (3.12) follows. \(\square \)
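For illustration, these variance-reduced estimators can be sketched as follows, sharing the same Gaussian draws between the Hessian and the gradient; the network output `u` is a placeholder evaluated on the quantum device, the value of K is illustrative, and the normalisations follow the standard Gaussian-smoothing identities.

```julia
using LinearAlgebra, Random

# Variance-reduced (antithetic + control-variate) Monte Carlo estimators of the
# smoothed Hessian and gradient at a point x, sharing the same Gaussian draws.
function smoothed_hessian_grad(u, x, σ; K = 1024, rng = Random.default_rng())
    d = length(x)
    H = zeros(d, d)
    g = zeros(d)
    ux = u(x)                               # control-variate baseline
    for _ in 1:K
        δ = σ .* randn(rng, d)
        up, um = u(x .+ δ), u(x .- δ)       # shifted evaluations, reused below
        H .+= (δ * δ' .- σ^2 .* I(d)) .* ((up + um - 2 * ux) / (2 * σ^4))
        g .+= δ .* ((up - um) / (2 * σ^2))
    end
    return H ./ K, g ./ K
end
```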
Lemma 3.3.4
We can combine a Gaussian smoothed trial solution with the variational loss function reformulation (3.8) and (3.7) resulting in
Proof
For each sample point the variational formulation (3.7) only involves a gradient term and the function value and, proceeding as in the proof of Lemma 3.3.3, this produces 3K evaluations of \(u_\Theta \), where we again assume K iid Gaussian samples. Once more the boundary penalisation element shown in (3.8) produces the \(Kn_e\) term. \(\square \)
3.3.2 The \(p=2\) case
The \(p=2\) Laplace equation reduces to the Poisson equation, and the following holds:
Lemma 3.3.5
For the Poisson equation, the complexity metrics for the prototypical loss formulation (2.1) and the loss function using the variational form (3.8) read
Proof
The proof for \(\xi \left( {\mathcal {L}}^{\text {Var}}_{n_e,n_r}\right) \) is exactly the same as in the case \(p \ne 2\). The first statement's proof is very similar to that of Lemma 3.3.1; however, we include it for the sake of completeness. Using the same decomposition as in Lemma 3.3.1, we have
For the first term we apply the basic parameter shift rule twice to the \(\alpha ^2_i - \alpha _i\) terms resulting in \(4(\alpha ^2_i - \alpha _i)\) evaluations of \(u_\Theta \). For the second term we apply the optimised parameter shift rule given in (2.6), producing \(3\alpha _i\) evaluations of \(u_\Theta \) and also providing all of the quantum derivatives needed for the third term. Summing over all i,
We then add the boundary term and multiply by the relevant sample sizes to produce the original statement. \(\square \)
Lemma 3.3.6
If the PDE trial solution is Gaussian smoothed, then
Proof
The Gaussian Laplacian term is given by [16, Equation (11)]
There are a total of three \(u_\Theta \) terms in (3.16), while the boundary term features a single \(f_\Theta \) term for each sample point. Assuming we approximate each classical expectation with K iid Gaussian samples, the expression (3.15) follows. \(\square \)
3.4 Heat equation
We now consider the heat equation, whose solution satisfies the system
Similarly to above, we define \(\alpha _t\) to be the number of single-qubit rotation-like gates with time dependence.
Lemma 3.4.1
The complexity with the loss function (2.1) reads
If the network is Gaussian smoothed then
Proof
For the first statement we apply the same reasoning as in the proof for the p-Laplace equation with \(p=2\), followed by the basic parameter shift rule for \(\partial _{t}u_\Theta \), which produces the \(2\alpha _t\) factor. For the Gaussian statement we use the Laplacian term in (3.16), and the corresponding variance-reduced equation reads
for the time derivative. Counting terms we arrive at (3.17). \(\square \)
Since the other components of the Gaussian noise in (3.18) all return first-order derivatives, we obtain
3.5 Hamilton–Jacobi equation
Consider the classical linear-quadratic Gaussian (LQG) control problem in d dimensions, with the associated HJB equation given by [15]
with \(\mu \in {\mathbb {R}}\). Similarly to the p-Laplace case above, we have the following:
Lemma 3.5.1
In this HJB framework, the complexity with the standard loss function (2.1) reads
which simplifies, for a Gaussian smoothed network, to
Proof
We apply the same decomposition and parameter shift applications as in Lemma 3.3.5, noting that doing so also provides all of the quantum expectations needed to calculate \(\nabla _x u_\Theta \). The basic parameter shift rule is applied to find \(\partial _{t}u_\Theta \); after multiplying each term by the number of relevant samples and adding the boundary element, we recover (3.20). For the Gaussian expression we apply (3.20) and (3.16); counting terms, adding the boundary term and multiplying by the relevant sample sizes produces (3.21). \(\square \)
3.6 Explicit comparison
With the random network introduced in Sect. 3.1, assuming that the number n of qubits is a multiple of the problem dimension d, we then have, for each \(i, j \in \{1,\cdots ,d\}\),
For this particular class of networks, all second-order partial derivatives commute. For the p-Laplace equation with \(p \ne 2\) and without Gaussian smoothing the complexity in Lemma 3.3.1 reads
Let \({\mathcal {L}}^{\text {Var, Smooth}}_{n_e,n_r}\) represent the loss function (3.8) using Gaussian smoothing (2.2) in its variational form (3.7). In this case, we have
Fix two final UAT trainable layers (\(M=2\)) and let the number of qubits be three times the problem’s dimension (\(n=3d\)). The number of evaluations then reduces to
While still polynomial, the variational formulation clearly scales better with the dimension; in particular, \(\xi ({\mathcal {L}}_{n_e,n_r}) > \xi \left( {\mathcal {L}}^{\text {Var}}_{n_e,n_r}\right) \) for all values of \(d \in {\mathbb {N}}\). In [16], the authors find good experimental results using values of \(K\in [256,2048]\) for classical PINNs, so we now fix \(K = 1024\). For \(n_r = n_e\), we have
For the p-Laplace equation, Gaussian smoothing is thus more efficient as the dimension grows.
Remark 3.6.1
Using the complexity analysis in the previous sections, a similar comparison analysis is straightforward for the p-Laplace equation with \(p=2\) as well as for the HJB and the Heat equations.
3.7 Expressive power of smoothed networks
From Theorem 2.2.2, when the output of the quantum network is Gaussian smoothed, the resulting PDE trial solution is \(\frac{{\mathfrak {u}}}{\sigma }\sqrt{\frac{2}{\pi }}\)-Lipschitz with respect to the 2-norm, where \({\mathfrak {u}}= \sup _{x \in {\mathbb {R}}^d} |u_\Theta (x)|\). To derive an upper bound on \({\mathfrak {u}}\) valid for any parameter set, it suffices to consider the range of the expectation of the Hermitian cost operator (3.4)
using the Ising Hamiltonian (3.3)
resulting in a Lipschitz constant of \(\frac{3n}{\sigma }\sqrt{\frac{2}{\pi }}\) for the PDE trial solution, which is unlikely to be the tightest possible constant. For example, for a single qubit, the expectation of \(\texttt{Z}\) cannot be 1 when the expectation of \(\texttt{X}\) is 1.
3.8 Lipschitz constant for the heat equation
3.8.1 Heat equation with small Lipschitz constant
Consider the heat equation defined as
with \({\mathcal {B}}_{0,1}\), the unit ball in \({\mathbb {R}}^d\). The solution is \(u(t,x) = t + \frac{\Vert x\Vert ^2}{2d}\), with Lipschitz constant
In our formulation, \(n \le d\), where n is the number of qubits and the data lies in \({\mathbb {R}}^d\). Therefore, the Lipschitz constant \(L^{\alpha }(u)\) is smaller than that of the PDE trial solution, \(\frac{3n}{\sigma }\sqrt{\frac{2}{\pi }}\), obtained in Sect. 3.7 as long as \(\sigma \le 3n\sqrt{\frac{2}{\pi }}(1+\frac{1}{d})^{-1}\). Good experimental results with \(\sigma \le 0.1\) were outlined in [16], which clearly satisfies the condition for all integers n and d.
3.8.2 Heat equation with large Lipschitz constant
For an example of PDE where this Gaussian method is intractable with quantum networks, consider the heat equation
for \(x \in [0,1]^d\) and \(t \in [0,T)\), with Dirichlet boundary conditions, and consider the solution given by
for \(x=(x_1, \ldots , x_d)\). With \(\nabla _{t,x}\) denoting the gradient with respect to both \(t \text { and } x\), we have the Lipschitz constant upper bound
where \(\varvec{e}_{j}\) denotes the unit vector in \({\mathbb {R}}^d\) with 1 for component j and 0 otherwise. With \(d = 50\) and \(a = 1\), the resulting Lipschitz constant is greater than 22360. Comparing this to the Lipschitz restriction found earlier, if we use a value of \(\sigma = 0.1\) we would need more than 900 qubits for the smoothed PDE trial solution.
4 Numerical examples
The quantum state simulation is performed using the Yao package for Julia [22]. We use the random quantum network introduced in Sect. 3.1 with a varying number of qubits, which we compare to the random classical network developed by Gonon [10]: let \(E_1\) be a \(t_5(0, \textrm{I}_d)\) (multivariate t-distribution) random variable and \(B_1\) a Student t random variable with two degrees of freedom. At each training iteration we uniformly sample new points from \(\Omega \) and \(\partial \Omega \). We train solutions using both classical and quantum networks. Due to the random nature of the networks, we repeat each training process five times, and all training statistics reported below are mean values.
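Schematically, the training loop has the following structure; `loss`, `grad_loss` and the samplers are placeholders (the gradient is assembled from quantum evaluations of \(u_\Theta \), e.g. via the parameter-shift rules of Sect. 2.4.3 combined with the classical chain rule), and plain gradient descent stands in for the Adam optimiser used in the experiments.

```julia
# Schematic hybrid training loop: new interior/boundary points are drawn at every
# iteration, u_Θ values are evaluated on the quantum device inside `loss` and
# `grad_loss`, and the parameter update itself is classical.
function train!(Θ, loss, grad_loss, sample_interior, sample_boundary;
                iterations = 1000, nr = 128, ne = 64, η = 1e-3)
    history = Float64[]
    for _ in 1:iterations
        Xr = [sample_interior() for _ in 1:nr]
        Xe = [sample_boundary() for _ in 1:ne]
        Θ .-= η .* grad_loss(Θ, Xr, Xe)
        push!(history, loss(Θ, Xr, Xe))
    end
    return Θ, history
end
```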
4.1 Poisson equation
Consider the Poisson equation (3.5) with \(p=2\), \(\Omega = (0,1)^2\) and
so that the explicit solution reads
Alongside quantum and classical networks, we also investigate the two loss functions \({\mathcal {L}}^{\text {Var}}\) and \({\mathcal {L}}\). To demonstrate the effectiveness of the Haar random operator, we also train solutions using \(\varvec{\Lambda } = \textrm{I}\). We use six qubits; detailed training information and network settings are shown in Appendix A. Final relative \(L_2\) errors of the trial solutions are shown in Table 1, alongside other metrics. Regardless of the loss function used, the full random quantum networks outperform all of the random classical networks. In Table 1 we see that the performance of the classical networks improves when the number of nodes increases; notably, the full random quantum network has 24 trainable parameters yet outperforms the classical network with more than four times as many trainable parameters. We also see that, for both classical and quantum networks, the variational formulation loss function produces trial solutions with approximately the same final \(L_2\) relative error.
The addition of the Haar random operator \(\varvec{\Lambda }\) has little impact on the training complexity since it is fixed. However, it greatly improves the final \(L_2\) relative error, for example reducing the error for the variational loss from 0.575 to 0.040. Figure 1 shows the squared error \(|u_\Theta -u|^2\) over the domain of \(x_1\) for \(x_2 \in \{0.25,0.5,0.75\}\). The solid blue line indicates the average value for the quantum network with the shaded blue area representing all error values from the five runs. For the classical networks we plot the network with the lowest overall error of the five. Figure 2 shows snapshots of the trial solution against the analytic solution. The five quantum networks display similar behaviour over the intervals when compared to each other.
Squared error values \(|u_\Theta (x) -u(x)|^2\) of the trial solutions for the Poisson equation in \({\mathbb {R}}^2\) with \(x_1\) on the horizontal axis and for \(x_2 \in \{0.25,0.5,0.75\}\). The parameter N refers to the number of nodes in the random classical network and \(n=6\) is the number of qubits in the random quantum network. For the classical networks we plot the solution with the lowest overall mean squared error, whereas for the quantum networks we plot the average squared error with a ribbon indicating the range of error for the five different training runs
4.2 p-Laplace with \(p=3\)
Consider the p-Laplace equation (3.5) with \(p=3\), \(\Omega = (0,1)^2\) and
so that the explicit solution reads
Alongside quantum and classical networks, we also investigate the two loss functions \({\mathcal {L}}^{\text {Var}}\) and \({\mathcal {L}}\). We use six qubits, with detailed training information and network settings shown in Appendix A. The quantum network has 18 trainable parameters. Training results are shown in Table 2. The quantum-network-based solutions are considerably more accurate than the classical ones for either \({\mathcal {L}}^{\text {Var}}\) or \({\mathcal {L}}\). Note, however, that the classical networks' loss function values were still decreasing at the end of training.
Figure 3 shows the squared error \(|u_\Theta -u|^2\) over the domain of \(x_1\) for \(x_2 \in \{0.25,0.5,0.75\}\). The solid blue line indicates the average value for the quantum network with the shaded blue area representing all error values from five runs. For the classical networks we plot the network with the lowest overall error of the five runs. Figure 4 shows selected snapshots of the trial solution against the analytic solution.
Squared error values \(|u_\Theta (x) -u(x)|^2\) of the trial solutions for the p-Laplace equation with \(p=3\) in \({\mathbb {R}}^2\) with \(x_1\) on the horizontal axis and for \(x_2 \in \{0.25,0.5,0.75\}\). The parameter N refers to the number of nodes in the random classical network and \(n=6\) is the number of qubits in the random quantum network. For the classical networks we plot the solution with the lowest overall mean squared error, whereas for the quantum networks we plot the average squared error with a ribbon indicating the range of error for the five different training runs
4.3 Heat equation
We consider the heat equation (3.22) with the solution (3.23) and \(T = 1\), \(a=0.25\). We solve with \(d=1,2\) using 4 and 6 qubits, respectively. Specific training settings are shown in Appendix B and detailed training results can be found in Tables 3 and 4. For \(d=1\), the quantum network has a lower average MSE than the average values for the classical networks. Specifically, the classical network with the same number of trainable parameters has an MSE an order of magnitude larger. We also train classical networks with both more and fewer trainable parameters and see that the quantum networks outperform all the classical ones. For \(d=2\), the quantum network has 60 trainable parameters and outperforms all the random classical networks with fewer than 70 parameters. There is a large improvement in final MSE when the classical random network has more than 80 nodes; compared to the previous two examples, this could suggest that our quantum network lacks the expressivity needed, or that more training iterations are required for accurate quantum networks. In Fig. 5, we plot the squared error values \(|u_\Theta -u|^2\) over the domain of \(x_1\) and at values of \(x_2 \in \{0.25,0.5,0.75\}\). The solid blue line indicates the average value for the quantum network, with the shaded blue area being a ribbon covering all error values from the five runs. For the classical networks we plot the network with the lowest overall error of the five. In Fig. 6 we plot snapshots of the trial solution against the analytic solution.
4.4 Hamilton–Jacobi equation
We use the specific HJB equation (3.19) with the unique solution [15]
Since the domain of x is \({\mathbb {R}}^{d}\), we require a different form of the encoding matrix; as a result, we change definition (3.2) to
Squared error values \(|u_\Theta (x) -u(x)|^2\) of the trial solutions for the Heat equation with \(d=2\) and \(x_1\) on the horizontal axis and for \(x_2 \in \{0.25,0.5,0.75\}\). The parameter N refers to the number of nodes in the random classical network and \(n=4\) is the number of qubits in the random quantum network. For the classical networks we plot the solution with the lowest overall mean squared error, whereas for the quantum networks we plot the average squared error with a ribbon indicating the range of error for the five different training runs
We use the activation function \(\tanh \) due to its performance in traditional machine learning applications. We solve the specific HJB equation (3.19) with \(d = 2\), \(\mu = 1\), \(T=1\), \(h(x) = \log \left( \frac{1+\Vert x\Vert ^2}{2}\right) \) and 9 qubits alongside 1 trainable layer. Training results are shown in Table 5 and training settings in Appendix C. We see relatively poor performance for both sets of random networks, which is due to hardware limitations. Indeed, the MSE during training did not plateau for any of the models, showing that more training iterations should be used. Calculating the derivatives needed for the HJB equation using a network architecture of 9 qubits requires much larger computational resources, which we leave to further study.
5 Conclusion
This paper develops parameterised quantum circuits to solve widely used PDEs in any dimension. It further provides a precise complexity study of these algorithms and compares them to their classical counterparts, highlighting their potential advantages and limitations. In future work, we will investigate the impact of quantum noise on these quantum circuit-based PDE solutions, using both quantum noise models and actual quantum hardware.
Data availability
No datasets were generated or analysed during the current study.
Notes
This is for the case with the zero boundary condition. However any problem with prescribed nonzero boundary values can easily be transformed into this setting [8, Section (6.1.2)].
References
Abbas, A., Sutter, D., Zoufal, C., et al.: The power of quantum neural networks. Nat. Comput. Sci. 1(6), 403–409 (2021)
Berahas, A.S., Cao, L., Choromanski, K., et al.: A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 22(2), 507–560 (2022)
Bhuvaneswari, V., Lingeshwaran, S., Balachandran, K.: Weak solutions for \(p\)-Laplacian equation. Adv. Nonlinear Anal. 1, 319–334 (2012)
Cuomo, S., Di Cola, V.S., Giampaolo, F., et al.: Scientific machine learning through Physics-informed neural networks: where we are and what’s next. J. Sci. Comput. 92(3), 88 (2022)
Doumèche, N., Biau, G., Boyer, C.: Convergence and error analysis of PINNs. (2023)
Dubey, S.R., Singh, S.K., Chaudhuri, B.B.: Activation functions in deep learning: a comprehensive survey and benchmark. Neurocomputing 503, 92–108 (2022)
Dunjko, V., Briegel, H.J.: Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep. Prog. Phys. 81(7), 074001 (2018)
Evans, L.C.: Partial differential equations, vol. 19. American Mathematical Society, New York (2022)
Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, Berlin (2004)
Gonon, L.: Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality. J. Mach. Learn. Res. 24(189), 1–51 (2023)
Gonon, L., Jacquier, A.: Universal approximation theorem and error bounds for quantum neural networks and quantum reservoirs (2023)
Gonon, L., Grigoryeva, L., Ortega, J.P.: Approximation bounds for random neural networks and reservoir systems. Ann. Appl. Probab. 33(1), 28–69 (2023)
Gopakumar, V., Pamela, S., Samaddar, D.: Loss landscape engineering via data regulation on PINNs. Mach. Learn. Appl. 12, 100464 (2023)
Griewank, A., Walther, A.: Evaluating Derivatives, 2nd edn. Society for Industrial and Applied Mathematics, https://doi.org/10.1137/1.9780898717761 (2008)
Han, J., Jentzen, A.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018)
He, D., Li, S., Shi, W., et al.: Learning Physics-informed neural networks without stacked back-propagation. In: International Conference on AI and Statistics, pp 3034–3047 (2023)
Jacquier, A., Zuric, Z.: Random neural networks for rough volatility (2023)
Krishnapriyan, A., Gholami, A., Zhe, S., et al.: Characterizing possible failure modes in physics-informed neural networks. Adv. Neural. Inf. Process. Syst. 34, 26548–26560 (2021)
Kyriienko, O., Paine, A.E., Elfving, V.E.: Solving nonlinear differential equations with differentiable quantum circuits. Phys. Rev. A 103, 052416 (2021)
Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)
Lu, L., Meng, X., Mao, Z., et al.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021)
Luo, X.Z., Liu, J.G., et al.: Yao.jl: extensible, efficient framework for quantum algorithm design. Quantum 4, 341 (2020)
Mari, A., Bromley, T.R., Killoran, N.: Estimating the gradient and higher-order derivatives on quantum hardware. Phys. Rev. A 103(1), 012405 (2021)
Mattheakis, M., Joy, H., Protopapas, P.: Unsupervised reservoir computing for solving ordinary differential equations (2021)
Müller, J., Zeinhofer, M.: Deep Ritz revisited (2019)
Pérez-Salinas, A., Cervera-Lierta, A., Gil-Fuster, E., et al.: Data re-uploading for a universal quantum classifier. Quantum 4, 226 (2020)
Pérez-Salinas, A., Cruz-Martinez, J., Alhajri, A.A., et al.: Determining the proton content with a quantum computer. Phys. Rev. D 103(3), 034027 (2021)
Pérez-Salinas, A., López-Núñez, D., García-Sáez, A., et al.: One qubit as a universal approximant. Phys. Rev. A 104(1), 012405 (2021)
Preskill, J.: Quantum computing 40 years later. In: Feynman Lectures on Computation. CRC Press, p 193–244 (2023)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Wendel, J.G.: Note on the Gamma function. Am. Math. Mon. 55(9), 563–564 (1948)
Wenshu, Z., Daolun, L., Luhang, S., et al.: Review of neural network-based methods for solving partial differential equations. Chin. J. Theor. Appl. Mech. 54(3), 543–556 (2022)
Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
Zoufal, C., Lucchi, A., Woerner, S.: Quantum generative adversarial networks for learning and loading random distributions. Npj Quantum Inf. 5(1), 103 (2019)
Acknowledgements
The authors are grateful to the Department of Aeronautics at Imperial College London for supporting this work with a doctoral studentship. SL gratefully acknowledges financial support from the EPSRC grant EP/W032643/1 and AJ that of the EPSRC grants EP/W032643/1 and EP/T032146/1. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Author information
Contributions
S.L. wrote the manuscript introduction, J.D. wrote the remaining manuscript text, and S.L. and A.J. edited and reviewed the manuscript text. Numerical results and figures were prepared by J.D.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Appendices
p-Laplace equation training details
For \(p=2\) and \(p=3\), we use \(n_r\) iid sample points \(\left\{ x^{(i)} \right\} _{i=1,\ldots , n_r}\) drawn uniformly in \((0,1)^{d}\), and \(n_e\) iid sample points \(\left\{ {\tilde{x}}^{(i)}\right\} _{i=1,\ldots , n_e}\) drawn uniformly from the boundary \(\partial (0,1)^{d}\). For the \(L_2\) relative error statistics we use 1000 sample points of the form \(\left\{ x^{(i)} \right\} _{i=1,\ldots , 1000}\).
Specific training details for Poisson equation.

| Quantum model configuration | |
| --- | --- |
| Trainable layers | 1 |
| Number of qubits | 6 |
| Trainable parameters | 24 |
| Classical model configuration | |
| Nodes / trainable parameters | 10, 16, 24, 50, 100 |
| Hyperparameters | |
| Total iterations | 1000 |
| Domain batch size \(n_r\) | 128 |
| Boundary batch size \(n_e\) | 64 |
| Boundary weighting \(\lambda _e\) | 1 |
| Optimiser | Adam |
| Learning rate | \({1E-3}\) |
| Adam \(\varepsilon \) | \({1E-8}\) |
| Adam \(\left( \beta _1, \beta _2\right) \) | (0.9, 0.999) |
Specific training details for the p-Laplace equation with \(p=3\).

| Quantum model configuration | |
| --- | --- |
| Trainable layers | 1 |
| Number of qubits | 6 |
| Trainable parameters | 18 |
| Classical model configuration | |
| Nodes / trainable parameters | 10, 18, 24, 50, 100 |
| Hyperparameters | |
| Total iterations | 1000 |
| Domain batch size \(n_r\) | 512 |
| Boundary batch size \(n_e\) | 64 |
| Boundary weighting \(\lambda _e\) | 500 |
| Gradient normalised to | 1 |
| Optimiser | Adam |
| Learning rate | \(2.5E-3\) |
| Adam \(\varepsilon \) | \({1E-8}\) |
| Adam \(\left( \beta _1, \beta _2\right) \) | (0.9, 0.999) |
Heat equation training settings
We use iid sample points \(\left\{ (t^{(i)}, x^{(i)}) \right\} _{i=1,\ldots , n_r}\) drawn uniformly in \([0,1]\times (0,1)^{d}\) and \(n_e\) iid sample points \(\left\{ {\tilde{x}}^{(i)}\right\} _{i=1,\ldots , n_e}\) drawn uniformly from \(\partial (0,1)^{d}\). For the MSE statistics we use 1000 sample points of the form \(\left\{ (t^{(i)}, x^{(i)}) \right\} _{i=1,\ldots , 1000}\).
Specific training details for the heat equation in 2 dimensions.

| Quantum model configuration | |
| --- | --- |
| Trainable layers | 1 |
| Number of qubits | 4 |
| Trainable parameters | 16 |
| Classical model configuration | |
| Nodes / trainable parameters | 10, 16, 24, 50, 100 |
| Hyperparameters | |
| Total iterations | 1000 |
| Domain batch size \(n_r\) | 128 |
| Boundary batch size \(n_e\) | 64 |
| Boundary weighting \(\lambda _e\) | 500 |
| Gradient clipping | ± 1 |
| Optimiser | Adam |
| Learning rate | \(5E-3\) |
| Adam \(\varepsilon \) | \({1E-8}\) |
| Adam \(\left( \beta _1, \beta _2\right) \) | (0.9, 0.999) |
Specific training details for the heat equation in 3 dimensions.

| Quantum model configuration | |
| --- | --- |
| Trainable layers | 2 |
| Number of qubits | 6 |
| Trainable parameters | 60 |
| Classical model configuration | |
| Nodes / trainable parameters | 30, 40, 50, 60, 70, 80, 100 |
| Hyperparameters | |
| Total iterations | 2000 |
| Domain batch size \(n_r\) | 64 |
| Boundary batch size \(n_e\) | 64 |
| Boundary weighting \(\lambda _e\) | 500 |
| Gradient clipping | ± 1 |
| Optimiser | Adam |
| Learning rate | \(5E-3\) |
| Adam \(\varepsilon \) | \({1E-8}\) |
| Adam \(\left( \beta _1, \beta _2\right) \) | (0.9, 0.999) |
HJB equation training settings
We use iid sample points \(\left\{ (t^{(i)}, x^{(i)}) \right\} _{i=1,\ldots , n_r}\) and \(\left\{ {\tilde{x}}^{(i)}\right\} _{i=1,\ldots , n_e}\), where \(\left\{ t^{(i)}\right\} _{i=1,\ldots , n_r}\) are drawn uniformly in [0, 1], and \(\left\{ x^{(i)}\right\} _{i=1,\ldots , n_r}\) and \(\left\{ {\tilde{x}}^{(i)}\right\} _{i=1,\ldots , n_e}\) are drawn from centred normalised Gaussian distributions in \({\mathbb {R}}^d\). For the MSE statistics we use 1000 sample points of the form \(\left\{ (t^{(i)}, x^{(i)}) \right\} _{i=1,\ldots , 1000}\).
Specific training details for the HJB equation in 3 dimensions.

| Quantum model configuration | |
| --- | --- |
| Trainable layers | 1 |
| Number of qubits | 9 |
| Trainable parameters | 45 |
| Classical model configuration | |
| Nodes / trainable parameters | 50, 75 |
| Hyperparameters | |
| Total iterations | 750 |
| Domain batch size \(n_r\) | 64 |
| Boundary batch size \(n_e\) | 64 |
| Boundary weighting \(\lambda _e\) | 500 |
| Gradient clipping | ± 1 |
| Optimiser | Adam |
| Learning rate | \({5E-3}\) |
| Adam \(\varepsilon \) | \({1E-8}\) |
| Adam \(\left( \beta _1, \beta _2\right) \) | (0.9, 0.999) |
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dees, J., Jacquier, A. & Laizet, S. Unsupervised random quantum networks for PDEs. Quantum Inf Process 23, 325 (2024). https://doi.org/10.1007/s11128-024-04537-0
Keywords
- Physics-informed neural networks
- Partial differential equations
- Random quantum networks
- Gaussian smoothing