[PDF] Quantum Finite Volume Method for Computational Fluid Dynamics with Classical Input and Output

Abstract

Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods to solve fluid flows. The finite volume method (FVM) is an important one. In FVM, space is discretized to many grid cells. When the number of grid cells grows, massive computing resources are needed correspondingly. Recently, quantum computing has been proven to outperform a classical computer on specific computational tasks. However, the quantum CFD (QCFD) solver remains a challenge because the conversion between the classical and quantum data would become the bottleneck for the time complexity. Here we propose a QCFD solver with exponential speedup over classical counterparts and focus on how a quantum computer handles classical input and output. By utilizing quantum random access memory, the algorithm realizes sublinear time at every iteration step. The QCFD solver could allow new frontiers in the CFD area by allowing a finer mesh and faster calculation.

Full PDF

QQuantum Finite Volume Method for Computational Fluid Dynamicswith Classical Input and Output

Zhao-Yun Chen, Cheng Xue, Si-Ming Chen, Bing-Han Lu, and Yu-Chun Wu ∗ Key Laboratory of Quantum Information, CAS andUniversity of Science and Technology of China

Ju-Chun Ding and Sheng-Hong Huang † Department of Modern Mechanics, USTC

Guo-Ping Guo

Key Laboratory of Quantum Information, CASUniversity of Science and Technology of China andOrigin Quantum Computing, Hefei ‡ (Dated: February 9, 2021)Computational ﬂuid dynamics (CFD) is a branch of ﬂuid mechanics that uses numerical methodsto solve ﬂuid ﬂows. The ﬁnite volume method (FVM) is an important one. In FVM, space isdiscretized to many grid cells. When the number of grid cells grows, massive computing resourcesare needed correspondingly. Recently, quantum computing has been proven to outperform a classicalcomputer on speciﬁc computational tasks. However, the quantum CFD (QCFD) solver remainsa challenge because the conversion between the classical and quantum data would become thebottleneck for the time complexity. Here we propose a QCFD solver with exponential speedup overclassical counterparts and focus on how a quantum computer handles classical input and output. Byutilizing quantum random access memory, the algorithm realizes sublinear time at every iterationstep. The QCFD solver could allow new frontiers in the CFD area by allowing a ﬁner mesh andfaster calculation. I. INTRODUCTION

Computational ﬂuid dynamics (CFD) is the area thatutilizes numerical methods to obtain the physical prop-erties of ﬂuids. It has many applications, such as aid indesigning aircraft or automobile. CFD is often related tosolving a series of partial diﬀerential equations (PDEs)and can compute the evolution of physical characteristicsof ﬂuid at a given space, including density, momentum,and energy. These characteristics would provide us essen-tial references for the properties of the ﬂuid in the compu-tational space. There are three typical physical governingequations of the CFD: Navier–Stokes (NS), Euler, andReynold–Averaged Navier–Stokes (RANS) equations.The ﬁnite volume method (FVM) is a typical numer-ical method to discretize these physical equations. InFVM, the computational space is discretized into smallcells by dense grid points, separately solving every cell’sevolution at a small timestep and ﬁnally integrating alltime steps. The PDE will be converted to a sparse linearequation at every time step, whose dimension N has alinear dependency on the number of cells. In practice,sparse matrix linear solvers such as the conjugate gra-dient method are available. The best time complexityof the conjugate gradient method is O ( N sκ log 1 /(cid:15) ) timecomplexity where s is the sparsity number (the maximum ∗ [email protected] † [email protected] ‡ [email protected] number non-zero elements in each row or column), κ isthe condition number, and (cid:15) is the precision [1]. A typicalproblem for the FVM is that when the problem size growslarge, the computing resources will become expensive.Instead of using classical computers, quantum com-puting is a promising computing paradigm that oﬀersexponential acceleration over classical computing ap-proaches. Many quantum algorithms, including quantumfactorization[2], quantum simulation[3–6], and the linearsystem solvers, [7–9] have already appeared to prove thisidea. Thus, we try to accelerate a CFD solver with quan-tum computing. There were some works about solvinglinear PDEs using a quantum computer[10–13]. How-ever, these methods cannot be directly applied to solveCFD because the Navier–Stokes equation is a non-linearPDE, which is not covered by these previous results.This paper introduces a quantum solver for CFD prob-lems (QCFD solver) based on classical FVM. We showthat with only classical input, the time cost of each it-eration step can be reduced to polylogarithmic depen-dency on the problem size. This provides an exponentialspeedup of the FVM. The QCFD solver can fully repro-duce the result of the FVM. With the output given clas-sically at every time step, the QFVM is capable of thesteady or unsteady problem with similar conﬁgurations.To apply quantum algorithms to practical problems,the conversion between the classical and the quantumdata could become a bottleneck, especially when usingthe quantum linear solver (QLS) as a submodule[14].In our algorithm, the input and output are all classicaldata. To achieve this, we design a quantum memory lay- a r X i v : . [ qu a n t - ph ] F e b out based on quantum random access memory (QRAM)[15, 16]. At the input stage, the memory layout helps toimplement subprocedures required by the QLS. At theoutput stage, we sample the output state and update thememory classically. We show that these two processes,which act as the interface between classical and quantumdata, can both run in polylogarithmic time. They enableus to integrate the quantum linear solver submodule intothe classical FVM to achieve speedup. The time com-plexity of our algorithm is calculated by scaling the timebetween two iteration step, which is O (cid:18) ( s + log N ) sκ log N(cid:15) polylog( sκ/(cid:15) ) (cid:19) . (1)Our algorithm can be compared to the classical algo-rithm directly. The algorithm has classical input and out-put and uses the same deﬁnition of condition number andthe error threshold. For the condition number problem,we implement a quantum version of the Jacobi precondi-tioner and integrate it into the memory layout. It has thesame eﬀect as applying a Jacobi preconditioner, which iscommon in the classical FVM. Therefore, both quantumand the classical algorithm have a linear dependency (ifwe ignore the polylogarithmic term) on the same condi-tion number. We also provide evidence that the quantumand the classical error threshold are the same under thesense of l ∞ norm.As a result, compared to the time complexity of theclassical counterpart, the quantum solver runs faster interms of the problem size N , but slower in the depen-dency on the precision with a quadratic term. Our al-gorithm will have better performance when the problemsize is large enough, i.e., N (cid:29) /(cid:15) . We performed anumerical simulation on the Onera M6 test case show-ing that the algorithm can output correctly with such acondition. II. PRELIMINARIESA. Discretization and Linearization of the PhysicalGoverning Equation

The typical physical governing equations (Euler, NS,RANS) have to be linearized to apply to the FVM. Inthis paper, we do not focus on the detail of the lineariza-tion. Instead, we apply the identical linearization methodto the classical algorithm and analyze the relationshipsamong the equation variables.Here we take a two dimensional NS equation with com-pressible ﬂow as an example. First write down the dif-ferential form of the NS equation: ∂∂t (cid:90) Ω U dV + (cid:73) ∂ Ω F · d S = 0 , (2) FIG. 1. The grid cell around i th point. F ∂ Ω is the ﬂux at thecertain boundary; ∆ S ∂ Ω is the area. Ω i is the volume of thiscell. where U =  ρρuρvρE  F x =  ρuρu + pρuvρuH  F y =  ρvρuvρv + pρvH  , (3)for any volumne Ω and its boundary ∂ Ω.To discretize it spatially and timely, we split the spaceand time into small grid cells. At the cell i and time stepn, the NS equation can be discretized toΩ i ∆ t (cid:0) U n +1 i − U ni (cid:1) = − (cid:88) ∂ Ω F n +1 i,∂ Ω · ∆ S n +1 i,∂ Ω , (4)where implicit Euler method is applied. We deﬁne theright hand side of the equation (4) as the residual of thispoint, denoted by R n +1 i . The F i is deﬁned by the dif-ference scheme, which is calculated by variables (cid:126)U in thesurrounding cells. The diﬀerence scheme gives a relationbetween nodes. In this paper we deﬁne a matrix C whichhas C i,i (cid:48) = 1 (5)if i and i (cid:48) are related in the diﬀerence scheme. In otherwords, C i,i (cid:48) = 1 means calculateing the residual at i th node uses the variables in j th node. Specially, we alwayshave C i,i = 1.Let ∆ (cid:126)U n +1 = (cid:126)U n +1 − (cid:126)U n , we have (cid:18) Ω i ∆ t δ i,i (cid:48) + ∂R i,k ∂U i (cid:48) ,k (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) U = U n (cid:19) ∆ U ni (cid:48) ,k (cid:48) = − R ni,k . (6)Simply replacing A = (cid:16) Ω i ∆ t δ i,i (cid:48) + ∂R i,k ∂U i (cid:48) ,k (cid:48) (cid:12)(cid:12)(cid:12) U = U n (cid:17) , weobtain a linear equation whose solution implies the timeevolution of the physical variable (cid:126)U .The coeﬃcient matrix A is a sparse matrix. Fromequation (4), the A i,k,i (cid:48) ,k (cid:48) is non-zero when i and i (cid:48) arerelated in the diﬀerence scheme ( C i,i (cid:48) = 1). The sparsenumber (number of nonzero element in a row or column)is ﬁxed by how we select the diﬀerence scheme, denotedby s .Regardless of the physical governing equation, the dis-cretization and the linearization following the classicalFVM method do not change. We will ﬁnally show thatthe spatial or time diﬀerence scheme does not aﬀect howthis algorithm works, and only the constant-coeﬃcientwill change in the analysis of the time complexity. B. Quantum Algorithm with Classical Input andOutput

When we use a quantum computer to cope with apractical problem, we should always expect that the in-put and output are classical. Many quantum algorithmshave been proposed and claimed to be faster (exponen-tially or polynomially) than their classical counterparts.However, a large portion of them only beats classical al-gorithm under some theoretical limitations. A typicalexample is the famous quantum linear system algorithm:Harrow-Haddism-Lloyd (HHL) algorithm[7], which canprepare the state | x (cid:105) encoding the solution of the linearequation Ax = b . This algorithm uses O (log N ) calls tolinear equation oracles, where the classical counterparthas to perform at least O ( N ) calls. Based on the abilityto accelerate solving a linear equation, many quantummachine learning algorithms were proposed and accel-erated exponentially over the classical algorithm. How-ever, most of these algorithms did not answer how todeal with real-world data or obtain a classical output. Inthis paper[14], Aaronson raised a series of obstacles forapplying the HHL algorithm on quantum machine learn-ing algorithms with real-world data. The main problemsinclude how to input the classical data into the quan-tum computer and extract information from the outputstate given by the HHL algorithm. If we hope to pre-serve quantum speedup, two operations are forbidden.One is to prepare the input state | b (cid:105) with an encodedquantum circuit, where even reading all data entries re-quires O ( N ) time; the other is to perform sampling onthe output state to extract the state to a classical vectorwith O ( N ) times measurement.We believe the obstacles that appeared in “HHL-based” quantum machine learning algorithms are evenmore challenging if we want to accelerate the FVM forCFD problems quantumly. In the classical method, thephysical governing equation (e.g., Navier–Stokes equa-tion) will be discretized in both time and space. At everytime step, we linearize the physical governing equation toa linear equation, and its solution represents the evolu-tion of the physical variables. Finally, we can integrate alltime steps to obtain a stable solution. The time complex-ity of the FVM mainly depends on the time complexity ofthe linear solver, which is at least O ( N ). It is straight-forward to suppose that if we change the linear solver to a quantum version, we will have a quantum acceler-ated CFD solver. However, the two problems mentionedabove exist in every step of the time integration. Oneis how to generate the input from the physical variablesat one time step; the other is how to update the phys-ical variables from the quantum linear solver’s output.In conclusion, the quantum algorithm will fail to demon-strate quantum advantage if time complexity requires anextra O ( N ) multiplier.Our proposed QFVM algorithm will consider these ob-stacles. We assume the input and output of this algo-rithm are all classical, ensuring that this algorithm canbe run in the quantum computer without providing moreinput than the classical algorithm. C. Quantum Random Access Memory

Quantum random access memory is the storage devicefor the quantum computer. As the quantum analog ofRAM, QRAM allows a quantum computer to obtain clas-sical data with given addresses in quantum parallel. Inother words, QRAM could perform such unitary trans-formation: U QRAM | i (cid:105) A | (cid:105) D = | i (cid:105) A | d i (cid:105) D , (7)where A and D denote the address and the data registers. d i is a classical data entry stored at the address i .A seminal architecture called “bucket-brigade” pro-vides an eﬃcient way for querying. There have beenmany proposed physical implementations of such archi-tecture, such as optical system [15], acoustics system [17],and circuit quantum electrodynamics[18]. Our work isbased on the QRAM with architecture implemented byany of the physical systems. To eliminate the diﬀerencein understanding the availability of the QRAM, we list allassumptions when we apply the QRAM to our algorithm.First, the QRAM is general to all input addresses andtheir superpositions, namely (cid:80) c i | i (cid:105) . The QRAM shouldbe an arbitrary data loader rather than only allowing toprepare the (cid:80) | i (cid:105)| d i (cid:105) state.Second, if the address register has been prepared, per-forming one query costs O (log N ) time where the fulldata length is N.Third, we assume a QRAM has at least a classicalRAM capability, enabling access to a single entry or over-laying it to another value with constant time. Meanwhile,the QRAM should be compatible with a classical com-puter. A classical computer can read the data in QRAMwithout extra cost.Even though a real physical implementation of QRAMis hard, these assumptions are reasonable because theydo not exceed the capabilities of the previous physicalimplementations.We claim our algorithm as “classically input and out-put” under the sense that the input and output of thealgorithm are stored in the QRAM. Because we believein the compatibility of QRAM and classical computer,the problem deﬁnition, data initialization, and post-processing of the calculation results can all be performedin a connected classical computer. III. QUANTUM FINITE VOLUME METHOD

This section introduces the quantum ﬁnite volumemethod (QFVM) for CFD problems based on the im-plicit Euler method and classical FVM. As described inthe preliminary section (II A), the Euler, NS, or RANSequation can be discretized timely and spatially and ﬁ-nally linearized following speciﬁc classical methods. Wedo not focus on these methods but only transplant theminto a quantum version based on the theory that anyclassical function can be implemented in a quantum com-puter.Take a two-dimensional Euler equation with compress-ible gas (density, X- or Y-directional momentum and en-ergy) as the example. The physical variable (cid:126)U containsthe physical properties of all grid cells. We discretize thecomputing space into N grid cells, then the size of (cid:126)U is4N. We use a two-level subscript ( i, k ) to identify a singleelement in (cid:126)U by U i,k , where i denotes the i th grid cell and k denotes the k th physical characteristics.The time evolution of (cid:126)U is realized by solving the lin-earized NS equation using implicit Euler time iterationscheme. The calculation is iterative, and we use a sub-script n to denote the n th iteration step. At any timestep, the equation has a general form: A n ( (cid:126)U n )∆ (cid:126)U n +1 = − (cid:126)R n ( (cid:126)U n ) , (8)where A is the Jacobian matrix and (cid:126)R is the residualvector. These two parts are determined by the physicalvariable (cid:126)U at n th step. The unknown is ∆ (cid:126)U n +1 . Thesystem evolves one step by calculating (cid:126)U n +1 = (cid:126)U n + ∆ (cid:126)U n +1 . (9)Similar to the deﬁnition of U i,k , the matrix element of A is denoted by A i (cid:48) ,k (cid:48) i,k . Element in (cid:126)R is denoted by R i,k .The initial state of the physical variable (cid:126)U should havebeen stored in QRAM. It is given as the input of thealgorithm. Another part of the input is about the spatialgrid points, which split the space into cells. These datashould also be stored in the QRAM, ready for quantumquery.Our work mainly concentrates on how to bridge thegap between quantum processes and classical processes.On top of our work, we design a QRAM-based memorylayout inspired by previous works[19]. The details aboutthe memory layout are introduced in section IV.Alike the classical program, the QRAM stores thephysical variable (cid:126)U n to construct the linear equation atthe step n. Besides this, we also prepare the residual vec-tor (cid:126)R and a sum tree With the memory layout design, one can construct three quantum subprocedures O A , O b , and O l required by the QLS. They encode the linear equationas unitary transforms, that is: O A | i, k, i (cid:48) , k (cid:48) (cid:105) = | i, k, i (cid:48) , k (cid:48) (cid:105)| A i (cid:48) ,k (cid:48) i,k (cid:105) , (10)which encodes the Jacobian matrix’s element, and O b | i, k (cid:105) = | i, k (cid:105)| R i,k (cid:105) , (11)which encodes the residual vector’s element, and O l | i, p (cid:105) = | i, C i ( p ) (cid:105) , (12)which encodes the p th related cell in the diﬀerencescheme. O l can be implemented directly by the geometry deﬁni-tion input with constant queries. O A also requires query-ing the geometry deﬁnition to obtain the O ( s ) numberof U i at the related cells. Implementation of O b is intro-duced in section IV B.With these three quantum subprocedures, QLS out-puts a solution | (cid:126)u (cid:105) = | ∆ (cid:126)U n +1 (cid:107) ∆ (cid:126)U n +1 (cid:107) (cid:105) , a normalized solutionof the linear equation. Now we are able to constructa procedure P which can prepare the | (cid:126)u with sublineartime.Taking P as the input of the l ∞ tomography algorithm,we can obtain a classical vector ˜ u which is (cid:15) -close to thequantum solution (cid:126)u . This algorithm requires to run P and its controlled version by O ( log N(cid:15) ) many times. As aresult, obtaining ˜ u requires sublinear time.Meanwhile, we can apply amplitude estimation[20] to P to obtain the normalization factor c l = (cid:107) ∆ (cid:126)U n +1 (cid:107) .we use ˜ u and c l to update the sum tree and ﬁnish thisiteration step.For a steady problem, the computing stops in twocases. One is when the residual is smaller than the con-vergence limit (cid:15) , which can be extracted from the top ofthe tree. Another is when reaching the maximum itera-tion steps. After stopping, the output of this algorithmis stored in the physical variable area. IV. QUANTUM MEMORY LAYOUT FORQFVM ALGORITHM

To reduce data transfer costs between quantum andclassical data, we design a memory layout to eﬃcientlyprepare the residual vector state and prepare the oracularinput of the quantum linear solver.The QRAM stores three kinds of data. The ﬁrst is thegeometry data of the problem deﬁnition. From this, wecan query the connection relation between cells (geome-try deﬁnition) in quantum parallel. These are constantduring the calculation and used for constructing the lin-ear equation. Querying this part is equivalent to thisunitary transform: P s | i (cid:105)| j (cid:105) = | i (cid:105)| i j (cid:105) . (13) FIG. 2. Schematic of the quantum memory layout. (a)Three parts of the QRAM memory layout: geometry deﬁnition area,which holds the input of the problem; physical variable area holding the (cid:126)U n ; and the residual sum tree. (b)The linear structureof the geometry deﬁnition area. This area is formed with N blocks. The i th block holds s related indices where each of i (cid:48) = i k (0 (cid:54) k < s ) satisﬁes C i,i (cid:48) = 1. (c)The linear structure of the physical variable area. Each block the physical variables (cid:126)U i at thecell i. (d)The binary tree structure of the residual sum tree. The tree’s leaves are the components of the residual vector (cid:126)R n .Then for each level, we sum up the square of every two nodes. The tree root is (cid:107) (cid:126)R (cid:107) . The second is the vector of the physical variable at thestep n, namely (cid:126)U n . From this part we can implement P U | i, k (cid:105)| (cid:105) = | i, k (cid:105)| U i,k (cid:105) . (14)The third is the residual vector (cid:126)R n and its pre-computed sum. We call it the sum tree. From the bot-tom to the top of the binary tree, every node stored anintegration necessary to prepare the residual state. Wedeﬁne each node’s address as a r ( p ) from the top of thesum tree. p is a binary string where every digit repre-sents the left/right branch with 0/1. For example, a r (0)is the address of the left child of the root; a r (0 ,

1) is theright child of the node at a r (0). Specially, we directly use a r to represent the root’s address. We place all nodes inthe QRAM linearly, where each node’s address a r ( p ) canbe computed eﬃciently. The data contained in address a r ( p ) is denoted by S R ( p ). Thus we can perform suchunitary transform: P R | a r ( p ) (cid:105)| (cid:105) = | a r ( p ) (cid:105)| S R ( p ) (cid:105) . (15)The second and the third part will be initialized onceat the beginning of the calculation. They will be continu-ously updated through the calculation. The ﬁnal steady result, which is (cid:126)U at the last iteration step, is also storedin them.The diagram of the memory layout is demonstrated inﬁgure IV. The functions of this memory layout are asfollows: • Initialize the sum tree will classically access theQRAM O ( N ) times; • With access to the sum tree, one can prepare | R (cid:105) with O (log N ) times of quantum queries to theQRAM; • If the sparsity of the Jacobian matrix is s , updatinga single entry of (cid:126)U will classical access the QRAM O ( s log N ) times to update the sum tree; • The normalization factor between the residual vec-tor and its corresponding quantum state can be ob-tained with one quantum query; • Evolution from U n to U n +1 will cost O ( C log N/(cid:15) ) time.We clearly state the diﬀerence between “classical ac-cess” and “quantum query”. Classical accessing meansonly one data entry is read and modify at a time. Quan-tum query means a unitary transform is performed,simultaneously extracting many data queries into onequantum register using the superposition addresses.In the following part, we will show how these featuresare realized. A. Initialization

The initialization process ﬁlls the QRAM as the mem-ory layout scheme shows. The ﬁrst step is to write inthe initial physical variable (cid:126)U and ﬁll the tree with thewanted sum.The initialization is entirely a classical process. Alongwith writing the data, we should also record the memorylayout to quickly obtain the memory address of everydata entry in constant time.There may be some concentrations about whether such O ( N ) preparation time will cause the vanishment of thisalgorithm’s speedup. We believe after considering thetime consumption of initialization, the quantum speedupstill preserves. We analyze the three things that con-tribute to initialization time.One is the calculation of the residual vector (cid:126)R . Thecalculation of this vector exists at every step of the classi-cal FVM in a CFD problem. Even in classical algorithms,this part is not the bottleneck of the time. Our algorithmonly calculates the residual once initially, which will con-sume much less than a classical algorithm does.The second is the ﬁll of the sum tree. To ﬁll a sum treeonly requires repeatedly adding the sum of the square ofthe residual vector. It is natural to think this process iseasier than the calculation of the residual vector.The third is about the cost of accessing the QRAMclassically. As we have mentioned in section II C, we as-sume the QRAM has the near capability of RAM, whichallows the access to be performed in constant time. B. Preparation of the residual vector state | (cid:126)R (cid:105) According to the method described in [21], the state | (cid:126)R (cid:105) could be prepared eﬃciently because we have accessto all wanted sums of the vector. We pre-compute themin the sum tree, so the preparation can be realized byquerying the sum tree P R .The ﬁrst step is to query the tree root and its left childnode, then calculate the rotating angle at this step: | a r (0) (cid:105)| S R (0) (cid:105)| a r (cid:105)| S R (cid:105)| θ (cid:105)| (cid:105) , (16)where S R (0) = (cid:80) i ∈ [0 ..N/ − R i , S R = (cid:80) i ∈ [0 ..N − R i , θ = arccos S R (0) S R .Now perform a conditional rotation and uncompute,we have: cos θ | (cid:105) + sin θ | (cid:105) . (17) Add another qubit, perform Hadamard gate on it, wehave (cos θ | (cid:105) + sin θ | (cid:105) ) 1 √ | (cid:105) + | (cid:105) ) (18)= 1 √ θ | (cid:105) + sin θ | (cid:105) ) (19)Then we iteratively perform the query, computing therotating angles and conditional rotation. At k th step, wehave the state (cid:88) c ki | i (cid:105) √ | (cid:105) + | (cid:105) ) . (20)Computing the addresses of | i, (cid:105) , | i, (cid:105) , we obtain thereal addresses a r ( i,

0) and a r ( i,

1) in the QRAM, that is (cid:88) c ki | i (cid:105) √ | (cid:105)| a r ( i, (cid:105) + | (cid:105)| a r ( i, (cid:105) , (21)then query to the P R to obtain the rotating angles anduncompute extra registers. Finally, after performing con-ditional rotation, we step to (cid:88) c k +1 i | i (cid:105) (22)Repeatedly performing this process, we caneﬃciently prepare the residual state | (cid:126)R (cid:105) = − (cid:80) R i | i (cid:105) / (cid:113)(cid:80) j ∈ [0 ..N − R j with the help of thesum tree. C. Updating a single entry of the physical variable

From step n to n+1, the physical variable (cid:126)U should beupdated. Same as the computing (cid:126)R and its sum from (cid:126)U ,only the residual on the related cells would change. Fromthe tree leaves, we change the all residual R i (cid:48) ,k (cid:48) relatedto the changed U i,k with C i,i (cid:48) = 1. After these residualvector entries change, we again compute the sum treefrom the leaves to the root and update correspondingly.The number of updated nodes will not exceed the numberof the multiplication of the related residual entries O ( s )and the number of layers of the sum tree log N .As a result, the cost of update one entry of (cid:126)U is lessthan O ( s log N ). D. Sampling the solution state and update theQRAM

The QLS outputs the solution as a quantum state.However, we cannot directly extract it to a classical vec-tor to update (cid:126)U . We need to cope with two problems:ﬁrst is to decide the normalization factor of the solution;the second is to convert the quantum state to a classicalvector.The tomography algorithm only produces a normalizedvector ∆ ˜ U n +1 . We should also obtain all the normalizedfactor in the algorithm to get a real update vector of (cid:126)U n +1 . QLS produces two factors. First is c b , which isgenerated when preparing | b (cid:105) and can be obtained fromthe data structure described above. The second is c l which is generated from the QLS, because matrix inver-sion is usually not unitary and the raw solution A − b is not normalized. With amplitude estimation, we cancompute the probability p l and then obtain the factor by c l = α √ p l , where α is a constant in the QLS. Obtainingthe normalization factors will not aﬀect the asymptotictime complexity of the algorithm.Combining these two factors c = c b c l , we obtain thenorm of solution (cid:107) ∆ (cid:126)U (cid:107) , which implies the variation up-dated on the target vector (cid:126)U in the CFD solver. Whenthe norm of the variation is smaller than a preset thresh-old, we can stop the iteration and return the result.With l ∞ tomography[22] we could eﬃciently producean l ∞ -close sample ∆ ˜ U n +1 of a real-valued quantumstate | ∆ (cid:126)U n +1 (cid:105) with O ( log N(cid:15) ). This sampling algorithmhas a logarithmic dependency on N, enabling each itera-tion step of our algorithm to run at polylogarithmic timecomplexity over the input size N.Updating the QRAM from the sampled vector is alsoeﬃcient. The l ∞ tomography algorithm produces asparse classical vector with not more than O (log N/(cid:15) )non-zero elements, which means the updating will beperformed for less than O (log N/(cid:15) ) times to update theQRAM P U in one iteration step. This results in eﬃciencyin both sampling and updating. V. IMPLEMENTATION OF QUANTUMPRECONDITIONER

The condition number of the linear equation representsto what extent the solution can be aﬀected by the per-tubation on the right-hand-side vector. The conditionnumber is deﬁned as: κ ( A ) = | λ max || λ min | , (23)where | λ max | and | λ min | is the maximum/minimumabsolute of the eigenvalues of A . When the conditionnumber is large, we say the equation is ill-conditioned,requiring high precision and time complexity to solve.The time complexity of the classical sparse linear solverhas a dependency on the condition number. For example,the time complexity of the conjugate gradient method is O ( κsN log 1 /(cid:15) ). The QLS used in our algorithm also hasa linear dependency on the condition number.Preconditioner is a pre-processing method that can re-duce the condition number of the equation. If we havea matrix P such that κ ( P A ) < κ ( A ), we can transform this equation as: A (cid:126)x = (cid:126)b ⇔ P A (cid:126)x = P (cid:126)b. (24)Preconditioners are constructed from the raw equation,and there have been many types of preconditioners. How-ever, not all classical preconditioners could be directlytransplanted to quantum versions. First, the matrix mul-tiplication by the preconditioner should be computed ef-ﬁciently, namely within O (polylog( N )) time. Second, thepreconditioned matrix should also be sparse; otherwise,it cannot be eﬃciently solved by the QLS. Some precon-ditioners suitable for QLS have already proposed in [23]and [24]. We here display an example preconditioner: theblockwise Jacobi preconditioner, which is widely used inthe classical CFD solver. We implement the blockwiseJacobi preconditioner in our algorithm without aﬀectingthe asymptotic complexity on the problem size N. Apply Jacobi preconditioner to subprocedures

Jacobi preconditioner uses the inverse of the diagonalblock. For the raw linear equation, we construct sub-procedures as the input of the QLS. The preconditionedequation has a diﬀerent matrix and vector; therefore,these subprocedures should be modiﬁed.Let ˜ A = P A and (cid:126)R (cid:48) = P (cid:126)R , where P is the Jacobipreconditioner of the matrix A . The element of the P is P i (cid:48) ,k (cid:48) i,k = δ i,i (cid:48) B i (cid:48) ,k (cid:48) i,k , (25)where B i (cid:48) ,i, represents the inverse of the block A i (cid:48) ,i, .The element of ˜ A is˜ A i (cid:48) ,k (cid:48) i,k = (cid:88) j ∈ [0 ..N − l ∈ [0 ..n var − P i (cid:48) ,k (cid:48) j,l A j,li (cid:48) ,k (cid:48) . (26)The Jacobi preconditioner P is blockwise diagonal. Wecan simplify the equation (26) as:˜ A i (cid:48) ,k (cid:48) i,k = (cid:88) l ∈ [0 ..n var − ] P i,k (cid:48) i,l A i,li (cid:48) ,k (cid:48) . (27)This implies that computing a single element of ˜ A re-quires to queries n var elements of A . Another fact is thatthe sparsity matrices of ˜ A and A are the same when theyare symmetric to the diagonal line. This is often true be-cause in the diﬀerence scheme, i and i (cid:48) are related so that A i (cid:48) ,k (cid:48) i,k and A i,k (cid:48) i (cid:48) ,k are all non-zero elements.When it is eﬃcient to implement O A , O (cid:48) A will also beeﬃcient to implement. That is O (cid:48) A | i, k, i (cid:48) , k (cid:48) (cid:105)| (cid:105) = | i, k, i (cid:48) , k (cid:48) (cid:105)| ˜ A i (cid:48) ,k (cid:48) i,k (cid:105) . (28)First we query all elements required for computing theinverse at ( i, i (cid:48) ) block and compute the inverse, we have | i (cid:105)| B i,i, (cid:105) regs(A) . (29)We use B i (cid:48) ,i, to represent a matrix block with n ele-ments. The subscript “regs(A)” mean we require a groupof quantum registers to hold this matrix, marked by A.The corresponding block in A is also queried, | i, i (cid:48) (cid:105)| A i (cid:48) ,i, (cid:105) regs(B) , (30)Combining two register group A and B, we obtain thewanted element A i,k,i (cid:48) ,k (cid:48) . Computing one element re-quires n times of calls to the O A , namely propotionalto O ( s ).From the above derivation, the sparsity of ˜ A is samewith the A . Therefore O l remains unchanged.The subprocedure O b should also be modiﬁed. In theoriginal description of the memory layout, the sum treestores the pre-computed residual vector. In the precon-ditioned version, the (cid:126)R is replaced by (cid:126)R (cid:48) . When the (cid:126)U changes according to the sampling results, we need tocompute the preconditioned residual vector and updatethe sum tree. Computing any element of R (cid:48) i,k is still re-lated to all connected cells, which is R (cid:48) i,k = (cid:88) k (cid:48) ∈ [0 ..n var − P i,ki,k (cid:48) R i,k (cid:48) (31)The complexity of this process is also contributed by com-puting the diagonal block’s inverse of A .From the modiﬁed sum tree, constructing such O (cid:48) b issimilar to O (cid:48) A . VI. RUN TIME ANALYSIS

The time cost for the QCFD algorithm has two maincontributions. One is the cost of the initialization of thememory layout (initialization cost); the other is the timecomplexity between two iteration steps (evolution cost).

Initialization cost

In section IV A, we estimate thetime cost of the initialization stage. The conclusion isthat the initialization stage has O ( N ) time complexity.However, under the assumption about the QRAM’s ca-pability (see section II C), the initialization cost wouldnot cost much more than the preprocessing stage of theclassical FVM. Because the initialization of QCFD onlyprocesses once, but classical FVM has to preprocess it asmany times as the iteration steps, we believe this costwould not become the bottleneck of the QFVM algo-rithm. Evolution cost

Every evolution stage. In [9], Childset al. provide a linear solver algorithm with logarith-mic dependence on precision. They show that the querycomplexity of this algorithm of O A , O l and O b are O (cid:0) sκ polylog( sκ(cid:15) ) (cid:1) . Now we start to analyze the timecomplexity of constructing these subprograms from theinitial problem settings. According to the results in the previous sections, thenumber of queries to QRAM for implementing O A , O l ,and O b is O ( s ), O (1), and O (log N ), correspondingly.The time complexity of preconditioned O (cid:48) A has a multi-plier of O ( s ) contributed by computing the inverse ofthe diagonal blocks of ( A ). The preconditioned O (cid:48) b hasthe same complexity as O b .Now consider the time cost of sampling and updating.We run the QLS with O ( log N(cid:15) ) times to obtain an l ∞ -close classical vector. This becomes another multiplierto the time complexity of the quantum procedure.The last multiplier is the cost of querying the QRAM.As we have assumed, the QRAM use O (log N ) time toperform one query. By composing these results, the timecomplexity of the quantum procedure is O (cid:18) ( s + log N ) sκ log N(cid:15) polylog( sκ/(cid:15) ) (cid:19) . (32)The ﬁnal step is to update the sum tree. Updat-ing a preconditioned residual tree has two steps. Oneis to compute the preconditioned residual, where eachterm will involve in another inversion of the matrix A ,which is O ( s ); the other is to update the tree frombottom to the top, this involves O (log N ) times for onechange in the bottom of the tree. While at most O ( log N(cid:15) )terms of (cid:126)U changes, the time cost of updating the treeis O ( s log N/(cid:15) ). The total time complexity is the ad-dition of the quantum and the classical procedure. Be-cause the quantum procedure’s complexity is asymptot-ically greater than the classical’s, we conclude that theevolution time cost has the time complexity shown inequation (32).The classical counterpart’s time complexity is O ( N sκ log 1 /(cid:15) ) when using CG as the linear solver. Ouralgorithm outperforms the classical algorithm on theproblem size’s dependency but has worse performancewhen the problem requires high precision. When theproblem size N and the requirement of the precision (cid:15) has such relation N (cid:29) /(cid:15) , the quantum algorithm willpotentially have better performance on time. VII. NUMERICAL SIMULATION

To prove our algorithm’s eﬀectiveness, we performedsimulation on a test case to ﬁnd whether a big size prob-lem with low error sensitivity exists.An open-source classical CFD simulation software,SU2[25], is selected as the example. We appended quan-tum error in the simulation process, output the evolutionhistory, and compared it to the classical solver. The er-ror is implemented by biasing the solution vector with arandomized error to emulate the sampling process of the l ∞ tomography. Except for the error, we preserved allphysical problem conﬁgurations, including the mesh set-ting and physical parameters like temperature or Machnumber. The multigrid option was turned oﬀ so that thelinear equation will only be solved once in every iterationstep. The linear solver parameters are chosen to be asprecise as possible to emulate the case that the quantumlinear solver produces the result accurately.We select the inviscid ﬂow around Onera M6 airfoilas the test case, which has 108396 grid points, and thephysical governing equation is the three-dimensional Eu-ler equation. We compare the classical result with dif-ferent error settings: from 5e-2 to 1e-5. The results aredisplayed in ﬁgure 2. When the error is set to 5e-2 quicklydiverges. Except that, all cases converge correctly. Theblack line is the classical baseline. The maximum con-vergence error (cid:15) of this case is between 5e-2 and 3e-2,where quantum advantage preserves.To simulate the quantum eﬀect, we biased the solu-tion vector with a speciﬁc error to emulate the samplingprocess of the l ∞ tomography. This is implemented as apostprocessor of the linear solver.We focus on the convergence history of test cases anddeﬁne two kinds of error thresholds. The ﬁrst is “maxi-mum convergence error (cid:15) ”, the maximal value which al-lows the convergence. The problem will quickly diverge ifthe error is larger than the ﬁrst threshold. The second is“maximum stable error (cid:15) ”, where the problem will havethe same convergence history if the error is less than thisthreshold. The convergence speed will be gradually slowwhen the error gets larger between the ﬁrst and the sec-ond threshold. If the maximum stable error is large, wedeﬁne this kind of problem as “quantum friendly” be-cause the quantum algorithm is much likely to run fasterthan the classical. In this case, the problem size is large,satisfying N (cid:29) /(cid:15) . This provides evidence that quan-tum advantage can be realized in CFD problems. VIII. ERROR ANALYSIS

The time complexity of the QFVM has better perfor-mance on the number of grid cells N but worse on theprecision (cid:15) , which implies that the problem size shouldbe large enough to show the quantum advantage. On theother side, the numerical simulation shows that the pre-cision should be small. Otherwise, the time integrationwill not converge. There is the problem: if the precisionrequirement has some dependency on the problem size,the quantum acceleration will decrease or even vanish. Inthis section, we will provide evidence that the precisionwill not grow with the problem size.First, we calculate the total error generated by thequantum sampling with error bound (cid:15) speciﬁed. At onestep, we deﬁne the physical variable (cid:126)U and its update∆ (cid:126)U . In QFVM, the quantum process outputs a quantumstate | u (cid:105) which is propotional to ∆ (cid:126)U ,∆ (cid:126)U = (cid:107) ∆ (cid:126)U (cid:107) u. (33)The l ∞ tomography outputs a classical vector ˜ u which is (cid:15) -close to u . At any index i , we have u i = ˜ u i + e i , (34)where the error term | e i | < (cid:15) .Now we consider the amplitude of the e i . When per-forming l ∞ tomography, the output vector is a samplefrom the multinomial distribution where the samplingnumber M = C log N/(cid:15) and the probability distribution( | u | , | u | , ... | u N − | ). At any term, the standard er-ror of such sample is: σ i = (cid:112) M | u i | (1 − | u i | ). When | u i | is small enough, we have σ i ∼ | u i |√ N . Now we as-sume the error e i is approximatedly linear dependent onthe standard error σ i , thus we have e i = O ( σ i ) ∼ O ( | u i |√ N ) . (35)The update vector output by the QFVM should bemultiplied by (cid:107) ∆ (cid:126)U (cid:107) . As a result, the total error will beampliﬁed by this coeﬃcient. E i = (cid:107) ∆ U (cid:107) e i . (36)Compare two cases describing the same problem whereone has N cells and the other has kN (mark the variableswith extra prime, e.g. u (cid:48) ). We can assume the distribu-tion of ∆ (cid:126)U and ∆ (cid:126)U (cid:48) is the same because the physicalcharacteristic does not change. From this, we have (cid:107) ∆ U (cid:48) (cid:107) = k (cid:107) ∆ U (cid:107) , (37)because only the vector size changes to k times. From thedeﬁnition of u (equation (33)), this results in the decreaseof the amplitude of the u , i.e. u (cid:48) i = 1 √ k u i . (38)Combining equation (35), (36) and (38), we obtain that E i = O ( U i ). This result implies that the total errorgenerated by the quantum sampling will not change overthe problem size N . IX. CONCLUSION

This paper developed a quantum version of the ﬁnitevolume method (QFVM) for solving the CFD problemwith substantial speedup. Identical to the classical FVM,the QFVM is iterative and can output the evolution his-tory of the ﬂuid ﬂow in the computing space. The inputand output of the algorithm are both classical data. Inconclusion, we give the time complexity of between eachstep is O (cid:16) ( s +log N ) sκ log N(cid:15) polylog( sκ/(cid:15) ) (cid:17) , where N isthe number of cells (the problem size), κ is the conditionnumber of the linear equation, s is the sparsity of theJacobian matrix, and (cid:15) is the precision threshold of theoutput. Compared to the classical solver’s best time com-plexity, which has a linear dependency on the grid size0 FIG. 3. Results of the simulation. The test case is the three-dimensional inviscid ﬂow around the Onera M6 airfoil. a)Convergence history of the test case of the quantum solver with diﬀerent error setting, compared to the classical solver asthe baseline (use BCGStab as the linear solver). The error is set from 5e-2 to 1e-5. Except (cid:15) =5e-2, all other cases convergecorrectly. The maximum convergence error is approximately 3e-2. b) Flow density around the airfoil at quantum error (cid:15) =3e-2.This result is the same as the classical solver’s result displayed in subﬁgure d). c) Bad solution at (cid:15) =3e-2. The ﬂow density at20 iteration steps is not correctly computed. d) The solution output by the classical solver. N , this algorithm is exponentially faster. The speedupwould be signiﬁcant when we choose an extremely large N , which allows the quantum computer to solve complexCFD problems, such as tackling a larger space or ﬁnermesh.We use the QLS to accelerate the solution of the lin-ear equation. However, previous works often ignore thecost of transferring data between quantum and classicalcomputers. To achieve an eﬃcient transfer, we design amemory layout based on the QRAM. The memory layoutstores the problem conﬁguration and the internal result.With a quantum parallel query, one can implement quan-tum subprocedures necessary for the QLS; it is also up-dated eﬃciently when the algorithm iterates over steps.Numerical simulations are conducted to check whetherthe ﬁnal result is aﬀected by the quantum error. Theproblem size of the test case is 1 e

5, and the QCFD solverconverges correctly at (cid:15) =3e-2, after around 200 steps.This result shows that CFD could have strong error tol- erance with large problem size; therefore, the quantumadvantage preserves.Our future work will focus on how precision aﬀects theﬁnal result and the convergence history, and how to opti-mize it. We believe that the quantum computer will showits advantage in solving a more complex CFD problemshortly.

ACKNOWLEDGMENTS

This work was supported by the National Natural Sci-ence Foundation of China (Grants Nos. 11625419), theNational Key Research and Development Program ofChina (Grant No. 2016YFA0301700), the Strategic Pri-ority Research Program of the Chinese Academy of Sci-ences (Grant No. XDB24030600), and the Anhui Initia-tive in Quantum Information Technologies (Grants No.AHY0800000). [1] J. R. Shewchuk,

An Introduction to the Conjugate Gradi-ent Method Without the Agonizing Pain (Carnegie Mel- lon University, 1994). [2] P. W. Shor, Polynomial-time algorithms for prime factor-ization and discrete logarithms on a quantum computer,SIAM review , 303 (1999).[3] I. M. Georgescu, S. Ashhab, and F. Nori, Quantum sim-ulation, Reviews of Modern Physics , 153 (2014).[4] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, andR. D. Somma, Exponential improvement in precisionfor simulating sparse hamiltonians, in Proceedings of theForty-sixth Annual ACM Symposium on Theory of Com-puting , STOC ’14 (ACM, New York, NY, USA, 2014) pp.283–292.[5] D. W. Berry, A. M. Childs, and R. Kothari, Hamilto-nian simulation with nearly optimal dependence on allparameters, in (IEEE, 2015) pp. 792–809.[6] P. J. O’Malley, R. Babbush, I. D. Kivlichan, J. Romero,J. R. McClean, R. Barends, J. Kelly, P. Roushan,A. Tranter, N. Ding, et al. , Scalable quantum simula-tion of molecular energies, Physical Review X , 031007(2016).[7] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum al-gorithm for linear systems of equations, Physical reviewletters , 150502 (2009).[8] A. Ambainis, Variable time amplitude ampliﬁcation anda faster quantum algorithm for solving systems of linearequations, arXiv preprint arXiv:1010.4458 (2010).[9] A. M. Childs, R. Kothari, and R. D. Somma,Quantum algorithm for systems of linear equationswith exponentially improved dependence on preci-sion, SIAM Journal on Computing , 1920 (2017),https://doi.org/10.1137/16M1087072.[10] A. M. Childs, J.-P. Liu, and A. Ostrander, High-precisionquantum algorithms for partial diﬀerential equations,arXiv preprint arXiv:2002.07868 (2020).[11] F. Fillion-Gourdeau and E. Lorin, Simple digital quan-tum algorithm for symmetric ﬁrst order linear hyperbolicsystems, arXiv preprint arXiv:1705.09361 (2017).[12] P. C. S. Costa, S. Jordan, and A. Ostrander, Quantum al-gorithm for simulating the wave equation, arXiv preprintarXiv:1711.05394 10.1103/PhysRevA.99.012323 (2017).[13] Y. Cao, A. Papageorgiou, I. Petras, J. Traub, andS. Kais, Quantum algorithm and circuit design solv- ing the poisson equation, arXiv preprint arXiv:1207.248510.1088/1367-2630/15/1/013021 (2012).[14] S. Aaronson, Read the ﬁne print, Nature Physics , 291(2015).[15] V. Giovannetti, S. Lloyd, and L. Maccone, Architecturesfor a quantum random access memory, Physical ReviewA , 052310 (2008).[16] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum ran-dom access memory, Physical review letters , 160501(2008).[17] C. T. Hann, C.-L. Zou, Y. Zhang, Y. Chu,R. J. Schoelkopf, S. M. Girvin, and L. Jiang,Hardware-eﬃcient quantum random access memorywith hybrid quantum acoustic systems, arXiv preprintarXiv:1906.11340 (2019).[18] R. Naik, N. Leung, S. Chakram, P. Groszkowski, Y. Lu,N. Earnest, D. McKay, J. Koch, and D. Schuster, Ran-dom access quantum information processors using multi-mode circuit quantum electrodynamics, Nature commu-nications , 1904 (2017).[19] I. Kerenedis and A. Prakash, Quantum recommendationsystems, (2016).[20] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp, Quan-tum amplitude ampliﬁcation and estimation, (2000),arXiv:quant-ph/0005055.[21] L. Grover and T. Rudolph, Creating superpositions thatcorrespond to eﬃciently integrable probability distribu-tions, (2002).[22] K. Iordanis, L. Jonas, and A. Prakash, quantum algo-rithms for deep convolutional neural networks, arXivpreprint arXiv:1911.01117 (2019).[23] B. D. Clader, B. C. Jacobs, and C. R. Sprouse, Precon-ditioned quantum linear system algorithm 10.1103/Phys-RevLett.110.250504 (2013), arXiv:1301.2340.[24] C. Shao and H. Xiang, Quantum circulant precon-ditioner for linear system of equations 10.1103/Phys-RevA.98.062321 (2018), arXiv:1807.04563.[25] T. D. Economon, F. Palacios, S. R. Copeland, T. W.Lukaczyk, and J. J. Alonso, SU2: An Open-Source Suitefor Multiphysics Simulation and Design, AIAA Journal54