Volterra filters for quantum estimation and detection
Mankei Tsang (1, 2)
(1) Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583
(2) Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117551
(Dated: September 12, 2018)

The implementation of optimal statistical inference protocols for high-dimensional quantum systems is often computationally expensive. To avoid the difficulties associated with optimal techniques, here I propose an alternative approach to quantum estimation and detection based on Volterra filters. Volterra filters have a clear hierarchy of computational complexities and performances, depend only on finite-order correlation functions, and are applicable to systems with no simple Markovian model. These features make Volterra filters appealing alternatives to optimal nonlinear protocols for the inference and control of complex quantum systems. Applications of the first-order Volterra filter to continuous-time quantum filtering, the derivation of a Heisenberg-picture uncertainty relation, quantum state tomography, and qubit readout are discussed.
I. INTRODUCTION
The advance of quantum technologies relies on our ability to measure and control complex quantum systems. An important task in quantum control is to infer unknown variables from the noisy measurements of a quantum system. Examples include the prediction of quantum dynamics for measurement-based feedback control [1–5] and the estimation and detection of weak signals [6–23]. To implement the signal processing for such tasks, a Bayesian decision-theoretic formulation of optimal quantum statistical inference is now well established [1–7, 17–19]. The quantum filtering theory pioneered by Belavkin [24, 25] for the optimal prediction of quantum dynamics has especially been hailed as a seminal achievement in quantum control theory; its applications to measurement-based cooling [26], squeezing [27], state preparation [28], quantum error correction [29, 30], qubit readout [20–22], and quantum state tomography [11–14] in atomic, optical, optomechanical, condensed-matter, and superconducting-microwave-circuit systems [1] have been studied extensively in the literature.

Although optimal quantum inference has been successful experimentally for low-dimensional systems, such as qubits [31] and few-photon systems [32], as well as near-Gaussian systems, such as optical phase estimation [23] and optomechanics [33], its implementation for high-dimensional non-Gaussian quantum systems is beset with difficulties in practice. An exact implementation of the quantum Bayes rule [2] for optimal inference requires numerical updates of the posterior density matrix based on the measurement record. Except for special cases such as Gaussian systems [1], the number of elements needed to keep track of the density matrix scales exponentially with the degrees of freedom, making the implementation prohibitive for many-body non-Gaussian systems. This problem, known as the curse of dimensionality, means that approximations must often be sought [26, 29, 30, 34–37].
Current approximation techniques for dynamical systems include Gaussian approximations [13, 26, 34], phase-space particle filters [36], Hilbert-space truncation [30, 35], and manifold learning [37], but these techniques provide little assurance about their actual errors and often remain too expensive to compute for real-time control of high-dimensional systems. Another problem with optimal inference and the associated stochastic-master-equation approach is its reliance on a Markovian model, which is difficult to use for many complex systems, especially those with $1/f$ or fractional noise statistics. With the ongoing trend of increasing complexity in quantum experiments, not only with condensed matter but also with optomechanics [38], atomic ensembles [39], and superconducting circuits [40], optimal inference is becoming an unattainable goal in practice.

Against this backdrop, here I propose an alternative approach to quantum estimation and detection based on Volterra filters. Volterra filters do not seek absolute optimality; they are a class of polynomial estimators with a clear hierarchy of computational complexities and estimation errors [41]. Their applications to quantum estimation and detection promise to solve many of the practical problems associated with optimal quantum inference, including the curse of dimensionality, the lack of error assurances upon approximations, and the need for a Markovian model. The filter errors also provide a set of upper error bounds on the Bayesian quantum Cramér-Rao [6, 7, 42], Ziv-Zakai [43], and Helstrom [6, 7, 44] bounds; these form novel hierarchies of fundamental uncertainty relations that may be of independent foundational interest. The Volterra series has recently been used to model the input-output relations of a quantum system [45], but my focus here is different and concerns the estimation of hidden observables and hypothesis testing given the output measurement record.

II. QUANTUM ESTIMATION
A. Formalism
Consider a quantum system in the Heisenberg picture with initial density operator $\rho$. Let
$$y = \begin{pmatrix} y(1) \\ y(2) \\ \vdots \\ y(K) \end{pmatrix} \tag{2.1}$$
be a column vector of observables under measurement. For example, $y$ can be the observables of an output optical field under homodyne, heterodyne, or photon-counting measurements. Given a measurement record of $y$, the goal of quantum estimation is to infer a column vector of hidden observables
$$x \equiv \begin{pmatrix} x(1) \\ x(2) \\ \vdots \\ x(J) \end{pmatrix}. \tag{2.2}$$
For example, $x$ can be the observables of a quantum system that has interacted with the optical field, such as the position of a quantum mechanical oscillator or a spin operator of an atomic ensemble, and the goal of the estimation is to infer $x$ given the measurement record. Quantum estimation is usually framed in the Schrödinger picture via the concept of posterior density operator [1, 2], but it can be shown to be equivalent to the Heisenberg-picture approach adopted here [4]. This task is especially important for measurement-based feedback control [1], such as measurement-based cooling and squeezing, to gain real-time information about quantum degrees of freedom and to reduce their uncertainties via feedback control. Experiments that implement quantum estimation have been reported in Refs. [31–33] for example.

The estimation error has a well-defined decision-theoretic meaning if all the $x$ and $y$ operators commute with one another, such that $x$ and $y$ can be jointly measured and treated as classical random variables in the same probability space [4, 7, 46]. This assumption is applicable to a wide range of scenarios, including quantum filtering [4, 46] and the estimation of any classical parameter or waveform coupled to a quantum system [17, 47]. Since $x$ and $y$ are compatible observables, the rest of the estimation theory is identical to the classical treatment [41].
Let $\check{x}(j|y)$ be an estimator of $x(j)$ given $y$, and assume that the estimator is given by the truncated Volterra series, viz.,
$$\check{x}(j|y) = \sum_{p=0}^{P} \sum_{1 \le k_1 \le k_2 \le \dots \le k_p \le K} h_p(j, k_1, k_2, \dots, k_p|\theta)\, y(k_1) y(k_2) \dots y(k_p), \tag{2.3}$$
where $\theta$ is a vector of tunable parameters, $P$ is the order of the series and quantifies the complexity of the filter, and the zeroth-order term is simply a constant $h_0(j)$ and does not depend on $y$. For $P \to \infty$, the series can be regarded as the Taylor series for an arbitrary estimator, although I will focus on finite $P$.

A useful trick to simplify the notation is to define the set of all products of $y$ elements up to order $P$ as
$$y^{(P)} \equiv \left\{ 1, y, y^{\otimes 2}, \dots, y^{\otimes P} \right\}, \tag{2.4}$$
where
$$y^{\otimes p} \equiv \left\{ y(k_1) y(k_2) \dots y(k_p);\ 1 \le k_1 \le k_2 \le \dots \le k_p \le K \right\} \tag{2.5}$$
is the set of all $p$th-order products of $y$ elements. Then the Volterra series in Eq. (2.3) can be rewritten as
$$\check{x}(j|y) = \sum_{\mu} h^{(P)}(j, \mu|\theta)\, y^{(P)}(\mu), \tag{2.6}$$
where $h^{(P)}$ is a linear filter with respect to $y^{(P)}$ but equivalent to the Volterra filter that is nonlinear with respect to $y$, and $\mu$ is a composite index that goes through all elements in $y^{(P)}$.

Define
$$\langle f(x, y) \rangle \equiv \operatorname{tr}\left[ \rho f(x, y) \right] \tag{2.7}$$
as the expectation of any function of $x$ and $y$, with tr denoting the operator trace. Let the error covariance matrix be
$$\Sigma(j, k) \equiv \left\langle \left[ x(j) - \check{x}(j|y) \right] \left[ x(k) - \check{x}(k|y) \right] \right\rangle. \tag{2.8}$$
The absolute minimum mean-square error for arbitrary estimators is achieved by the conditional expectation of $x$ given $y$ [4].
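To make the composite index $\mu$ concrete, the following is a minimal Python sketch that enumerates the products in Eqs. (2.4) and (2.5) and evaluates Eq. (2.6) for a single $j$. The function names `volterra_features` and `volterra_estimate` and the ordering of the products are illustrative assumptions, not part of the formalism; the measurement outcomes are treated as ordinary numbers, which is legitimate because the elements of $y$ commute.

```python
from itertools import combinations_with_replacement
from math import prod


def volterra_features(y, P):
    """Return y^(P) of Eq. (2.4) as a flat list, constant term first.

    The ordering of the products (by order p, then lexicographic in the
    indices) is an assumption made for this sketch.
    """
    feats = []
    for p in range(P + 1):
        # Index tuples with k1 <= k2 <= ... <= kp, as in Eq. (2.5).
        # For p = 0 there is one empty tuple, whose product is 1.
        for ks in combinations_with_replacement(range(len(y)), p):
            feats.append(prod(y[k] for k in ks))
    return feats


def volterra_estimate(h, y, P):
    """Evaluate Eq. (2.6) for one j: sum over mu of h(mu) * y^(P)(mu)."""
    feats = volterra_features(y, P)
    if len(h) != len(feats):
        raise ValueError("filter and feature vector lengths differ")
    return sum(c * f for c, f in zip(h, feats))
```

For $K$ measured observables the feature list has $\binom{K+P}{P}$ entries, which is the polynomial (rather than exponential) growth that makes a low-order filter tractable.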
For the optimal filtering and prediction of quantum observables for example, the usual method is to compute the posterior density operator $\rho(y)$ conditioned on the measurement record $y$ in the Schrödinger picture using the Kraus operators that characterize the measurements [1, 2], and then take the conditional expectation given by $\check{x}(j|y) = \operatorname{tr}[x_S(j) \rho(y)]$, with $x_S(j)$ being the Schrödinger picture of $x(j)$. If the continuous-time limit is taken, the posterior density operator obeys the celebrated stochastic master equation [1–4] first proposed by Belavkin [24, 25]. The computation of $\rho(y)$ suffers from the curse of dimensionality, however. To restrict the complexity, consider here instead the error of the $P$th-order Volterra filter given by
$$\Sigma^{(P)}(j, k|\theta) = \left\langle \left[ x(j) - \sum_{\mu} h^{(P)}(j, \mu|\theta) y^{(P)}(\mu) \right] \left[ x(k) - \sum_{\mu} h^{(P)}(k, \mu|\theta) y^{(P)}(\mu) \right] \right\rangle \tag{2.9}$$
$$= C_x(j, k) - \sum_{\mu} h^{(P)}(j, \mu|\theta) C_{xy^{(P)}}(k, \mu) - \sum_{\mu} h^{(P)}(k, \mu|\theta) C_{xy^{(P)}}(j, \mu) + \sum_{\mu, \nu} h^{(P)}(j, \mu|\theta) h^{(P)}(k, \nu|\theta) C_{y^{(P)}}(\mu, \nu), \tag{2.10}$$
where
$$C_x(j, k) \equiv \langle x(j) x(k) \rangle, \tag{2.11}$$
$$C_{xy^{(P)}}(j, \mu) \equiv \left\langle x(j) y^{(P)}(\mu) \right\rangle, \tag{2.12}$$
$$C_{y^{(P)}}(\mu, \nu) \equiv \left\langle y^{(P)}(\mu) y^{(P)}(\nu) \right\rangle. \tag{2.13}$$
To optimize the Volterra filter, one can seek the parameters $\theta$ that minimize any desired component of $\Sigma^{(P)}(j, k|\theta)$ in Eq. (2.10), which has the remarkable feature of depending only on finite-order correlations. Specifically, $C_{xy^{(P)}}(j, \mu)$ depends on the correlation between $x(j)$ and products of $y$ elements up to the $P$th order, and $C_{y^{(P)}}$ depends on the correlations among $y$ up to the $2P$th order. Stationarity assumptions and frequency-domain techniques can further simplify the expressions.

Quantum mechanics comes into the problem through the correlations.
They must obey uncertainty relations with other incompatible observables [7, 48]. They can violate Bell [49] and Leggett-Garg [50] inequalities, requiring different probability spaces for different experimental settings. They may result from nontrivial internal quantum dynamics with no classical correspondence; the promise of quantum computation and simulation [51] is in fact based on the difficulty of reproducing quantum dynamical statistics using any hidden-variable model. This difficulty also means that attempts to simplify quantum filters via classical models [26, 34, 36] are likely to be inaccurate for highly nonclassical systems. The Volterra filters sidestep the issue via a manifestly non-Markovian approach that does not require an online simulation of the internal quantum dynamics. The identification of the correlations and the filter synthesis, though nontrivial, can be done offline for control applications.

A challenge for classical applications of Volterra filters is that the correlations are often difficult to model or measure in practice, but it is less problematic for quantum systems: computing and measuring correlation functions is already a major endeavor in condensed-matter physics [52] and early quantum optics [53] with an extensive literature. The Volterra-series approach to input-output analysis [45] should also help their simulation. Compared with the stochastic-master-equation approach [1–4], the use of correlation functions has the advantage of not requiring a Markovian model or stochastic calculus, although the Volterra filters may require a longer memory depending on the time scales of the correlation functions and the signal-to-noise properties. An empirical alternative to prior system identification is to train the filter directly using experimental or simulated data to minimize the sample errors.

I now consider the ideal case where arbitrary Volterra filters can be implemented, such that the tunable parameters $\theta$ are all elements of $h^{(P)}$.
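As an illustration of the empirical alternative mentioned above, the sketch below fits all elements of $h^{(P)}$ to paired samples of $x$ and $y$ by ordinary least squares, which minimizes the sample mean-square error. It assumes NumPy is available and that jointly sampled records of $x$ and $y$ exist (for example from a simulation); the names `features` and `train_volterra` are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement


def features(Y, P):
    """Map samples Y (N x K) to Volterra features (N x M), constant term first."""
    N, K = Y.shape
    cols = []
    for p in range(P + 1):
        for ks in combinations_with_replacement(range(K), p):
            # Product of the selected columns; an empty selection gives ones.
            cols.append(np.prod(Y[:, list(ks)], axis=1))
    return np.column_stack(cols)


def train_volterra(X, Y, P):
    """Least-squares fit of h (J x M) so that features(Y, P) @ h.T approximates X (N x J)."""
    F = features(Y, P)
    coef, *_ = np.linalg.lstsq(F, X, rcond=None)
    return coef.T
```

With enough samples the fitted coefficients approach the solution obtained from the exact correlations, so the sample-based route and the correlation-based route are two ways of arriving at the same filter.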
Since $\Sigma^{(P)}$ is quadratic with respect to $h^{(P)}$, the minimization can be performed analytically. Define the risk function [54] to be minimized as
$$R(\theta) \equiv \sum_{j, k} u(j) \Sigma^{(P)}(j, k|\theta) u(k), \tag{2.14}$$
where $u$ is an arbitrary real vector. The optimal Volterra filter
$$\tilde{h}^{(P)} \equiv \arg\min_{h^{(P)}} R(h^{(P)}) \tag{2.15}$$
for arbitrary $u$ satisfies the equation
$$C_{xy^{(P)}}(j, \nu) = \sum_{\mu} \tilde{h}^{(P)}(j, \mu) C_{y^{(P)}}(\mu, \nu), \tag{2.16}$$
which is a system of linear equations with respect to $\tilde{h}^{(P)}$ and can be solved by conventional methods, and the resulting error covariance matrix is
$$\tilde{\Sigma}^{(P)}(j, k) \equiv \Sigma^{(P)}(j, k|\tilde{h}^{(P)}) \tag{2.17}$$
$$= C_x(j, k) - \sum_{\mu} \tilde{h}^{(P)}(j, \mu) C_{xy^{(P)}}(k, \mu). \tag{2.18}$$
This error can be computed offline to evaluate the optimal performance of a Volterra filter and the trade-off between the error and the filter complexity $P$. Going to a higher order is guaranteed not to increase the error, since $\tilde{\Sigma}^{(P)} \le \tilde{\Sigma}^{(Q)}$ if $P > Q$ (a higher-order filter can always achieve the performance of a lower-order filter by ignoring the higher-order terms in $y^{(P)}$). As the infinite-order Volterra filter can be regarded as the Taylor series for an arbitrary function, $\tilde{h}^{(\infty)}$ will be optimal among arbitrary estimators and $\tilde{\Sigma}^{(\infty)}$ will coincide with the absolutely optimal error. $\tilde{\Sigma}^{(P)}$ thus provides a hierarchy of increasingly tight upper error bounds for optimal quantum inference. Most importantly, a finite-order Volterra filter can still enjoy a performance given by Eq. (2.18) for any statistics, even if it is not optimal in the absolute sense. On a fundamental level, it is interesting to note that, if $x$ is classical, the upper error bounds also apply to the Bayesian quantum Cramér-Rao [6, 7, 42] and Ziv-Zakai [43] lower error bounds, forming a novel set of operationally motivated uncertainty relations; an example is shown in Sec. II C.

The optimal $P = 0$ Volterra filter does not process the measurement and is simply given by the prior expectation $\langle x \rangle$. The $P = 1$ Volterra filter is a linear filter with respect to $y$ and deserves special attention, as it is the simplest Volterra filter beyond the trivial zeroth-order case and will likely become the most popular. If $x$ and $y$ are jointly Gaussian, the optimal linear filter is also optimal among arbitrary estimators and equivalent to the Kalman filter when applied to the prediction of Markovian dynamical systems [55], but the linear filter can still be used for any non-Gaussian or non-Markovian statistics and depends only on the second-order correlations in terms of $x$ and $y$.

B. Continuous-time quantum filtering
For example, consider the continuous-time quantum filtering and prediction problem, which is to estimate a Heisenberg-picture observable $x(t)$ given the past measurement record $\{ y(\tau);\ t_0 \le \tau \le T < t \}$ [4]. It can be shown that all the Heisenberg-picture operators under consideration commute with one another under rather general conditions for filtering and prediction [4, 46]. If $t < T$ is desired for smoothing [17], care should be taken in the modeling to ensure that $x(t)$ still commutes with $y$ and an operational meaning of the estimation error exists. For example, a c-number signal, such as a classical force, commutes with all operators by definition.

To transition from the discrete formalism to continuous time, define a discrete time given by
$$t_j = t_0 + j \delta t, \tag{2.19}$$
with initial time $t_0$, integer $j$, and time interval $\delta t$. For infinitesimal $\delta t$, the linear $P = 1$ estimator in the continuous-time limit becomes
$$\check{x}(t|y) = h_0(t) + \int_{t_0}^{T} d\tau\, h_1(t, \tau) y(\tau), \tag{2.20}$$
where $\check{x}(t|y)$, $h_0(t)$, $h_1(t, \tau)$, and $y(\tau)$ are continuous-time versions of $\check{x}(j|y)$, $h_0(j)$, $h_1(j, k)/\delta t$, and $y(k)$, respectively. Eq. (2.20) is a continuous-time limit of the Volterra series in Eq. (2.3) for $P = 1$. Assuming zero-mean $x$ and $y$ for simplicity and using Eqs. (2.16) and (2.18), the optimal linear filter $\tilde{h}_1(t, \tau)$ and the corresponding mean-square error $\tilde{\Sigma}^{(1)}(t, t)$ can be expressed as
$$C_{xy}(t, \tau) = \int_{t_0}^{T} ds\, \tilde{h}_1(t, s) C_y(s, \tau), \tag{2.21}$$
$$\tilde{\Sigma}^{(1)}(t, t) = C_x(t, t) - \int_{t_0}^{T} d\tau\, \tilde{h}_1(t, \tau) C_{xy}(t, \tau), \tag{2.22}$$
where
$$C_x(t, t) \equiv \left\langle x^2(t) \right\rangle, \tag{2.23}$$
$$C_{xy}(t, \tau) \equiv \langle x(t) y(\tau) \rangle, \tag{2.24}$$
$$C_y(t, \tau) \equiv \langle y(t) y(\tau) \rangle \tag{2.25}$$
are the only correlation functions needed to compute both the filter and the error.
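In discrete time, Eqs. (2.21) and (2.22) reduce to one linear solve. The following is a minimal NumPy sketch under the zero-mean assumption used above; the function name `linear_filter` and the array layout are assumptions made for illustration. The returned weights play the role of $\delta t\, \tilde{h}_1(t_j, \tau_k)$, so that the estimate is a plain matrix-vector product with the sampled record.

```python
import numpy as np


def linear_filter(C_x, C_xy, C_y):
    """Discrete form of Eqs. (2.21)-(2.22) for zero-mean x and y.

    C_x  : (J, J) covariance of the hidden variables
    C_xy : (J, K) cross-covariance between x and the record y
    C_y  : (K, K) covariance of the record, assumed positive definite
    Returns the weights W (J, K), with x_hat = W @ y, and the error covariance.
    """
    # Solve W C_y = C_xy without forming an explicit inverse.
    W = np.linalg.solve(C_y, C_xy.T).T
    Sigma = C_x - W @ C_xy.T
    return W, Sigma
```

As a sanity check, for a unit-variance scalar $x$ observed $K$ times in independent noise of variance $\sigma^2$, the weights are all $1/(\sigma^2 + K)$ and the error is $\sigma^2/(\sigma^2 + K)$, which is the familiar result for averaging repeated noisy observations with a prior.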
Although this form of the optimal linear estimator is known in the classical context [55], its applicability to quantum systems with any nonlinear dynamics and non-Gaussian statistics is hitherto unappreciated. Compared with the stochastic master equation, the linear filter can be more easily implemented using fast digital electronics or even analog electronics in practice [23, 56] for measurement-based feedback control, while the implementation of higher-order filters is more involved but can leverage existing digital-signal-processing techniques [41].

C. Heisenberg-picture uncertainty relation
To demonstrate a side consequence of the Volterra-filter formalism, here I use the analytic error expression for the first-order Volterra filter to derive a quantum uncertainty relation for Heisenberg-picture operators. Consider the Hamiltonian $H(t) = H_0(t) - q x(t)$, where $q$ is a canonical position operator, $x(t)$ is a classical force, and $H_0$ is the rest of the Hamiltonian. Suppose that $H_0$ is at most quadratic with respect to canonical position and momentum operators, such that the equations of motion for those operators in the Heisenberg picture are linear. The initial density operator $\rho$, on the other hand, can have any non-Gaussian statistics.

Consider an output field quadrature operator $y(t)$ that commutes with itself at different times in the Heisenberg picture [4]. For example, it can model the homodyne measurement of an output optical field in optomechanics. It can be shown that
$$y(t) = y_0(t) + \int_0^{T} d\tau\, g(t, \tau) x(\tau), \tag{2.26}$$
where
$$g(t, \tau) = \begin{cases} \dfrac{i}{\hbar} \left[ y_0(t), q_0(\tau) \right], & t > \tau, \\ 0, & t \le \tau, \end{cases} \tag{2.27}$$
is the causal c-number commutator and the subscript 0 denotes the interaction picture with respect to the Hamiltonian $H_0$.

Without loss of generality, assume that $x(t)$, $y(t)$, and $q(t)$ are zero-mean processes. Consider the estimation of $x(t)$ using the record $\{ y(\tau);\ 0 < \tau \le T \}$. If $y(t)$ has non-Gaussian statistics, the optimal nonlinear estimator is difficult to derive, but the first-order Volterra filter given by
$$\check{x}(t|y) = \int_0^{T} d\tau\, h_1(t, \tau) y(\tau) \tag{2.28}$$
can be analyzed more easily. To proceed, it is more convenient to consider discrete time as defined in Eq. (2.19). Regarding $x$, $y$, $y_0$, and $\check{x}$ as column vectors and $g$ and $h_1$ as matrices, Eqs. (2.26) and (2.28) can be rewritten in matrix form as
$$y = y_0 + \delta t\, g x, \tag{2.29}$$
$$\check{x} = \delta t\, h_1 y. \tag{2.30}$$
With covariance matrices defined as
$$C_x \equiv \left\langle x x^\top \right\rangle, \tag{2.31}$$
$$C_{y_0} \equiv \left\langle y_0 y_0^\top \right\rangle, \tag{2.32}$$
$$C_{xy} \equiv \left\langle x y^\top \right\rangle = \delta t\, C_x g^\top, \tag{2.33}$$
$$C_y \equiv \left\langle y y^\top \right\rangle = \delta t^2 g C_x g^\top + C_{y_0}, \tag{2.34}$$
where $\top$ denotes the matrix transpose, the optimal linear filter becomes
$$\delta t\, \tilde{h}_1 = C_{xy} C_y^{-1} = \delta t\, C_x g^\top \left( \delta t^2 g C_x g^\top + C_{y_0} \right)^{-1}, \tag{2.35}$$
and the error covariance matrix becomes
$$\tilde{\Sigma}^{(1)} \equiv \left\langle \left( x - \delta t\, \tilde{h}_1 y \right) \left( x - \delta t\, \tilde{h}_1 y \right)^\top \right\rangle \tag{2.36}$$
$$= C_x - \delta t\, \tilde{h}_1 C_{xy}^\top \tag{2.37}$$
$$= C_x - \delta t^2 C_x g^\top \left( \delta t^2 g C_x g^\top + C_{y_0} \right)^{-1} g C_x \tag{2.38}$$
$$= \left( C_x^{-1} + \delta t^2 g^\top C_{y_0}^{-1} g \right)^{-1}, \tag{2.39}$$
where the last line uses the matrix inversion lemma [57].

The error covariance can be compared with the Bayesian quantum Cramér-Rao bound derived in Ref. [42]. The quantum bound for Gaussian $x$ results in a matrix inequality given by
$$\tilde{\Sigma}^{(1)} \ge \left( C_x^{-1} + \frac{4 \delta t^2}{\hbar^2} C_q \right)^{-1}, \tag{2.40}$$
where
$$C_q(t_j, t_k) \equiv \frac{1}{2} \left\langle q(t_j) q(t_k) + q(t_k) q(t_j) \right\rangle. \tag{2.41}$$
Unlike $x(t)$ and $y(t)$, $q(t)$ may not self-commute at different times, and the symmetric ordering in the covariance function [58] arises naturally from the derivation of the quantum bound in Ref. [42]. Comparing Eq. (2.39) and Eq. (2.40), it can be seen that the inequality holds only if
$$g^\top C_{y_0}^{-1} g \le \frac{4}{\hbar^2} C_q, \tag{2.42}$$
which is a matrix uncertainty relation between two quantum processes in the Heisenberg picture involving their causal commutator $g$. Note that $y$ and $q$ are canonical phase-space coordinate operators with linear dynamics but need not have Gaussian statistics. The end result does not involve $x$ and can be applied to any quantum system that satisfies the stated assumptions beyond the estimation scenario. The estimation procedure nonetheless gives the relation a clear operational meaning.

Eq. (2.42) can be further simplified by assuming linear-time-invariant dynamics and stationary statistics. The result in the continuous long-time limit is a spectral uncertainty relation given by
$$S_{y_0}(\omega) S_q(\omega) \ge \frac{\hbar^2}{4} |G(\omega)|^2, \tag{2.43}$$
with the frequency-domain quantities defined by
$$C_{y_0}(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} S_{y_0}(\omega) \exp\left[ i \omega (t - \tau) \right], \tag{2.44}$$
$$C_q(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} S_q(\omega) \exp\left[ i \omega (t - \tau) \right], \tag{2.45}$$
$$g(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} G(\omega) \exp\left[ i \omega (t - \tau) \right]. \tag{2.46}$$
The spectral relation imposes a lower bound on the noise floor of an output operator $y(t)$ in terms of the spectrum of a noncommuting operator $q(t)$. For example, the relation can be used to determine the fundamental limit to the noise floor of optical homodyne detection as a function of the mechanical-position power spectral density for a gravitational-wave detector [8, 59]. The inequality can be saturated if the quantum statistics are Gaussian [42].

D. Quantum state tomography
For an application in quantum information processing, consider the estimation of parameters in a density matrix, also known as quantum state tomography [9–15]. Assume a $d \times d$ density matrix of the form
$$\rho_z = \frac{I}{d} + \sum_{\alpha = 1}^{d^2 - 1} z_\alpha E_\alpha, \tag{2.47}$$
where $I$ is the identity matrix, $E_\alpha$ is a set of Hermitian, traceless, and orthonormal matrices that satisfy
$$E_\alpha = E_\alpha^\dagger, \quad \operatorname{tr} E_\alpha = 0, \quad \operatorname{tr} E_\alpha E_\beta = \delta_{\alpha\beta}, \tag{2.48}$$
and $z$ is a column vector of real unknown parameters. $\rho_z$ is Hermitian and $\operatorname{tr} \rho_z = 1$ by construction, and the density matrix describes a physical quantum state only if $\rho_z$ is positive-semidefinite [11]. Measurements can often be modeled as [11]
$$y = A z + y_0, \tag{2.49}$$
where $y$ is a column vector, $A$ is a known measurement matrix, and $y_0$ is a zero-mean noise vector. The main difficulty with the Bayesian estimation protocol [10, 13, 15] is that, owing to the physical-state requirement, the prior for $z$ is highly non-Gaussian, while the statistics of $y_0$ may also be non-Gaussian. With the non-Gaussian statistics and $d$ scaling exponentially with the degrees of freedom, exact Bayesian estimation of $z$ would suffer from the curse of dimensionality. Existing approximation techniques include Gaussian approximations [13] and particle filters [15], but their actual estimation errors remain unclear.

The Volterra filters can be used despite the non-Gaussianity of $z$ or $y_0$. Let
$$x = B z \tag{2.50}$$
be a column vector of parameters to be estimated for a given sampling matrix $B$. Note that $B$ can be a non-square matrix and the number of elements in $x$ can be much smaller than that in $z$ if the dimensionality of the latter is a concern. For example, the fidelity between the density matrix and a target pure state [60] can be expressed in this way, in which case $B$ is a row vector and $x$ is a scalar.
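A small worked instance of the parametrization in Eqs. (2.47)–(2.50) may help for $d = 2$. The sketch below assumes the basis $E_\alpha = \sigma_\alpha / \sqrt{2}$ built from the Pauli matrices, which satisfies Eq. (2.48); the helper names `rho_from_z`, `is_physical`, and `fidelity_row` are hypothetical. For a target pure state $|\psi\rangle$, the row vector $B_\alpha = \langle \psi | E_\alpha | \psi \rangle$ gives the fidelity as $1/d + Bz$, so $x = Bz$ is the fidelity up to the known constant $1/d$.

```python
import numpy as np

# Assumed basis for d = 2: Pauli matrices scaled by 1/sqrt(2), so that
# tr(E_a E_b) = delta_ab as required by Eq. (2.48).
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
E = [s / np.sqrt(2) for s in PAULI]


def rho_from_z(z):
    """Build rho_z of Eq. (2.47) for d = 2 from a real 3-vector z."""
    d = 2
    return np.eye(d) / d + sum(za * Ea for za, Ea in zip(z, E))


def is_physical(rho, tol=1e-12):
    """True if rho is positive-semidefinite up to a numerical tolerance."""
    return bool(np.all(np.linalg.eigvalsh(rho) >= -tol))


def fidelity_row(psi):
    """Row vector B with B_a = <psi|E_a|psi>; the fidelity is 1/d + B @ z."""
    return np.array([np.real(np.vdot(psi, Ea @ psi)) for Ea in E])
```

In this basis the state is physical when $|z| \le 1/\sqrt{2}$, which is the Bloch-ball condition rescaled, and illustrates why the prior on $z$ cannot be Gaussian.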
The optimal first-order filter can be expressed as
$$\check{x} = B \langle z \rangle + \tilde{h}_1 \left( y - A \langle z \rangle \right), \tag{2.51}$$
$$\tilde{h}_1 = B C_z A^\top \left( A C_z A^\top + C_{y_0} \right)^{-1}, \tag{2.52}$$
$$C_z \equiv \left\langle z z^\top \right\rangle - \langle z \rangle \langle z \rangle^\top. \tag{2.53}$$
The filter is guaranteed to offer an error covariance matrix given by
$$\tilde{\Sigma}^{(1)} = B \left( C_z^{-1} + A^\top C_{y_0}^{-1} A \right)^{-1} B^\top. \tag{2.54}$$
The linear complexity and the error guarantee are the main advantages of the Volterra filter. A shortcoming is that, due to noise and the lack of a constraint in the algorithm, the estimate $\check{x}$ may not lead to a positive-semidefinite density matrix. If this is a problem, an obvious remedy is to find the physical $x$ closest to $\check{x}$ with respect to a distance measure. A more sophisticated way is to compute the posterior distribution over a region near $\check{x}$ with a volume suggested by $\tilde{\Sigma}^{(1)}$. If the noise is low enough or the number of trials is large enough such that $\tilde{\Sigma}^{(1)}$ is small, the region needs to cover a small parameter subspace only, and the curse of dimensionality can be avoided.

The remaining issue is the choice of the prior moments $\langle z \rangle$ and $C_z$ in an objective manner. One option is to take one of the commonly used objective priors for $z$ [11, 15] and compute its moments. For $d = 2$ and $z$ being the Bloch vector, the prior moments can be easily calculated by taking advantage of the Bloch spherical symmetry. The computation seems nontrivial for $d \ge 3$, but for each $d$ it needs to be done just once and for all.

The most conservative and arguably paranoid option is to choose a prior that is least favorable to the Volterra filter. Given a prior probability measure $\pi_z$ on $z$, one can define a risk function, such as the Hilbert-Schmidt distance given by
$$R(\pi_z) = \operatorname{tr} \tilde{\Sigma}^{(1)}(\pi_z). \tag{2.55}$$
Then the least favorable prior is one that maximizes the risk while still observing the physical constraint on $\rho_z$, that is,
$$\arg\max_{\pi_z;\ \rho_z \ge 0} R(\pi_z). \tag{2.56}$$
Note that this prior depends in general on the measurement matrix $A$ as well as the sampling matrix $B$. Without the physical constraint, the least favorable $C_z$ would be infinite, giving
$$\tilde{\Sigma}^{(1)} \le B \left( A^\top C_{y_0}^{-1} A \right)^{-1} B^\top, \tag{2.57}$$
and the Volterra filter would become equivalent to the unconstrained maximum-likelihood estimator for Gaussian $y_0$. The effect of a finite $C_z$ is to pull the estimate from the maximum-likelihood value towards the prior $\langle x \rangle$ via the weighted average given by Eq. (2.51).

III. QUANTUM DETECTION
A. Formalism
Assume two hypotheses denoted by $\mathcal{H}_0$ and $\mathcal{H}_1$. These hypotheses can be about the initial density operator as well as the dynamics and measurements of the quantum system [18]. As before, let the measured Heisenberg-picture observables be $y$ with commuting elements under both hypotheses. The goal of detection is equivalent to binary hypothesis testing, which is to make a decision on $\mathcal{H}_0$ or $\mathcal{H}_1$ based on $y$. Applications include force detection [8, 18, 44], fundamental tests of quantum mechanics [18, 19, 38, 61], quantum error correction [1, 29], and qubit readout [20–22]. Prior work on the use of Volterra filters for classical detection focuses on the heuristic deflection criterion [62, 63], but it does not seem to have any decision-theoretic meaning or relationship with the more rigorous criteria of error probabilities [63]. Here I propose a similar performance criterion that is able to provide an upper bound on the average error probability, while still offering a simple design rule for the Volterra filters. To my knowledge the proposed design rule is new also in the context of classical detection theory.

Let $\lambda(y)$ be a test statistic as a polynomial function of $y$ similar to Eq. (2.3). For later notational convenience, I will rewrite it as
$$\lambda(y) = h_0 + H^\top Y, \tag{3.1}$$
where the zeroth-order term $h_0$ is written separately, $Y$ is a column vector with the elements in $\{ y, y^{\otimes 2}, \dots, y^{\otimes P} \}$ without the constant term 1, $H$ is a column vector with the corresponding elements in $h^{(P)}$, and $\top$ denotes the transpose. Let $\langle f(y) \rangle_0$ be the expectation of a function of $y$ given hypothesis $\mathcal{H}_0$, and $\langle f(y) \rangle_1$ be the expectation given hypothesis $\mathcal{H}_1$. Note that the hypotheses can be about the initial density operator, the dynamics, and the definition of $y$.

I require the test statistic to have different expectations for the two hypotheses, viz.,
$$\langle \lambda \rangle_0 \ne \langle \lambda \rangle_1. \tag{3.2}$$
This means that the order $P$ cannot be arbitrary but must be high enough to result in different expectations. I further require the expectations to be symmetric around 0, viz.,
$$\langle \lambda \rangle_0 + \langle \lambda \rangle_1 = 0. \tag{3.3}$$
This is accomplished by setting
$$h_0 = -H^\top \bar{Y}, \tag{3.4}$$
$$\bar{Y} \equiv \frac{1}{2} \left( \langle Y \rangle_1 + \langle Y \rangle_0 \right), \tag{3.5}$$
resulting in
$$\langle \lambda \rangle_1 = -\langle \lambda \rangle_0 = H^\top \Delta, \tag{3.6}$$
$$\Delta \equiv \frac{1}{2} \left( \langle Y \rangle_1 - \langle Y \rangle_0 \right). \tag{3.7}$$
Without loss of generality, I assume $\langle \lambda \rangle_1 = H^\top \Delta > 0$ and consider a threshold test that decides on $\mathcal{H}_0$ if $\lambda < 0$ and $\mathcal{H}_1$ if $\lambda \ge 0$. This is commonly expressed as [55]
$$\lambda(y) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} 0. \tag{3.8}$$
The average error probability becomes
$$P_e(H) = \pi_0 \left\langle 1_{\lambda \ge 0}(y) \right\rangle_0 + \pi_1 \left\langle 1_{\lambda < 0}(y) \right\rangle_1, \tag{3.9}$$
where $\pi_0$ and $\pi_1$ are the prior probabilities for the hypotheses and $1_{\lambda \ge 0}$ and $1_{\lambda < 0}$ are indicator functions. Since $P_e(H)$ in general depends on infinite orders of $\lambda$ moments, I appeal to the Cantelli inequality [64] to obtain
$$\left\langle 1_{\lambda \ge 0}(y) \right\rangle_0 \le \frac{\left\langle \lambda^2 \right\rangle_0 - \langle \lambda \rangle_0^2}{\left\langle \lambda^2 \right\rangle_0}, \tag{3.10}$$
and similarly for $\left\langle 1_{\lambda < 0}(y) \right\rangle_1$. This leads to upper bounds on $P_e$ given by
$$P_e(H) \le Q(H) \le R(H), \tag{3.11}$$
$$Q(H) \equiv \frac{\pi_0}{1 + (H^\top \Delta)^2 / (H^\top C_0 H)} + \frac{\pi_1}{1 + (H^\top \Delta)^2 / (H^\top C_1 H)}, \tag{3.12}$$
$$R(H) \equiv \frac{H^\top (\pi_0 C_0 + \pi_1 C_1) H}{(H^\top \Delta)^2}, \tag{3.13}$$
where
$$C_0 \equiv \left\langle Y Y^\top \right\rangle_0 - \langle Y \rangle_0 \langle Y \rangle_0^\top, \tag{3.14}$$
$$C_1 \equiv \left\langle Y Y^\top \right\rangle_1 - \langle Y \rangle_1 \langle Y \rangle_1^\top \tag{3.15}$$
are the conditional covariance matrices. $1/R$ can be regarded as an output signal-to-noise ratio and has a similar form to the deflection criterion [62, 63], although $R$ has a clearer decision-theoretic meaning as an upper error bound.

The purpose of using $R$ rather than $P_e$ or $Q$ is to define an easy-to-optimize criterion in terms of finite-order correlations. To find the $R$-optimal filter, consider the Cauchy-Schwarz inequality
$$\left( H^\top \Delta \right)^2 \le \left( H^\top M H \right) \left( \Delta^\top M^{-1} \Delta \right) \tag{3.16}$$
for any positive-definite matrix $M$.
The inequality is saturated if and only if $H = \alpha M^{-1} \Delta$ for any constant $\alpha$. Setting $M = \pi_0 C_0 + \pi_1 C_1$, I obtain
$$\tilde{R} \equiv \min_H R(H) = \frac{1}{\Delta^\top (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta}, \tag{3.17}$$
$$\tilde{H} \equiv \arg\min_H R(H) = \alpha (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta, \tag{3.18}$$
and the $R$-optimal test statistic $\tilde{\lambda}(y) \equiv h_0 + \tilde{H}^\top Y$, taking $\alpha = 1$ without loss of generality, becomes
$$\tilde{\lambda}(y) = \Delta^\top (\pi_0 C_0 + \pi_1 C_1)^{-1} \left( Y - \bar{Y} \right), \tag{3.19}$$
which can then be used in a threshold test. The merits of this approach are similar to those in the estimation scenario: dependence of $\tilde{\lambda}(y)$ on finite-order correlations $\Delta$, $C_0$, and $C_1$ without relying on a Markovian model, a performance guaranteed by upper bounds $P_e(\tilde{H}) \le Q(\tilde{H}) \le \tilde{R}$ (the actual $P_e$ may be much lower), and a hierarchy of decreasing $\tilde{R}$ versus increasing complexity. For the study of fundamental quantum metrology, $P_e(\tilde{H})$, $Q(\tilde{H})$, and $\tilde{R}$ also provide a set of upper bounds on the Helstrom bound [6, 7, 44].

It is not difficult to show that, if the hypotheses are about the mean of a Gaussian $y$ and $C_0 = C_1$, $\tilde{\lambda}(y)$ for $P = 1$ coincides with the well-known matched filter, and the threshold test of $\tilde{\lambda}(y)$ against 0 leads to the optimal $P_e$ among all decision rules if $\pi_0 = \pi_1$ [55]. The derivation of the $R$-optimal Volterra filter here in fact resembles the historic derivation of the linear matched filter via maximizing an output signal-to-noise ratio [55]. The crucial differences are that here $\tilde{\lambda}(y)$ can include higher-order products of $y$ elements and the upper error bounds provide performance guarantees even for non-Gaussian statistics.

B. Qubit readout
For an application of the detection theory, consider the qubit readout problem described in Refs. [20–22]. The goal is to infer the initial state of the qubit in one of the two possibilities from noisy measurements. The two hypotheses can be modeled as
\[ \mathcal{H}_0: y(t_k) = y_0(t_k), \qquad \mathcal{H}_1: y(t_k) = S x(t_k) + y_0(t_k), \tag{3.20} \]
where $x$ is a hidden qubit observable that can undergo spontaneous decay or excitation in time, $S$ is a positive signal amplitude, and $y_0$ is a zero-mean noise process. To perform hypothesis testing given a record of $y$, consider the first-order $R$-optimal decision rule given by
\[ \tilde{H} = (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta, \tag{3.21} \]
\[ \tilde{\lambda}(y) = \tilde{H}^\top (y - \bar{y}) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} 0, \tag{3.22} \]
where
\[ \Delta = \frac{1}{2} \left( \langle y \rangle_1 - \langle y \rangle_0 \right), \tag{3.23} \]
\[ \bar{y} = \frac{1}{2} \left( \langle y \rangle_0 + \langle y \rangle_1 \right), \tag{3.24} \]
\[ C_0 = \langle y y^\top \rangle_0 - \langle y \rangle_0 \langle y \rangle_0^\top, \tag{3.25} \]
\[ C_1 = \langle y y^\top \rangle_1 - \langle y \rangle_1 \langle y \rangle_1^\top, \tag{3.26} \]
and the upper error bounds are given by Eqs. (3.11)–(3.13).

$\tilde{\lambda}(y)$ for $P = 1$ is a linear filter with respect to $y$ and similar to the linear filters proposed in Ref. [20]. An advantage of the $R$-optimal rule here is that the filter $\tilde{H}$ depends only on the first-order moments $\Delta(k)$ and $\bar{y}(k)$ and the second-order correlations $C_0$ and $C_1$. All these moments can be simulated or measured directly in an experiment without the assumptions of continuous time, white Gaussian noise, and uncorrelated signal and noise made in prior work. The calculation of $\tilde{H}$ is relatively straightforward compared with the numerical optimization procedure in Ref. [20], while $Q$ and $\tilde{R}$ provide theoretical performance guarantees.
The upper bounds may be conservative, and a more precise comparison of $P_e(\tilde{H})$ with other linear or nonlinear filters [20–22] will require further numerical simulations and experimental tests.

To proceed further, consider the continuous-time limit. For the two-level $x \in \{0, 1\}$ process with initial value $x(0) = 1$ and spontaneous decay time $T_1$ studied in Refs. [20, 22], it is not difficult [65] to show that the mean is
\[ \langle x(t) \rangle = \exp\left( -\frac{t}{T_1} \right), \tag{3.27} \]
and the covariance function is
\[ C_x(t, \tau) \equiv \langle x(t) x(\tau) \rangle - \langle x(t) \rangle \langle x(\tau) \rangle \tag{3.28} \]
\[ = \exp\left[ -\frac{\max(t, \tau)}{T_1} \right] - \exp\left( -\frac{t + \tau}{T_1} \right). \tag{3.29} \]
For a zero-mean white Gaussian noise with noise power $\Pi$,
\[ \langle y_0(t) y_0(\tau) \rangle = \Pi \delta(t - \tau). \tag{3.30} \]
The test statistic becomes
\[ \tilde{\lambda} = \int_0^T dt\, \tilde{h}(t) \left[ y(t) - \frac{S}{2} \langle x(t) \rangle \right], \tag{3.31} \]
and a continuous-time limit of Eq. (3.21) leads to a Fredholm integral equation of the second kind [55] given by
\[ \frac{S}{2} \langle x(t) \rangle = \Pi \tilde{h}(t) + \pi_1 S^2 \int_0^T d\tau\, C_x(t, \tau) \tilde{h}(\tau). \tag{3.32} \]
Further analytic simplifications may be possible for $T \to \infty$ using the Laplace transform, but a numerical solution of the Fredholm equation can easily be sought, as it is linear with respect to $\tilde{h}$ and can be inverted in discrete time using, for example, the mldivide function in Matlab.

Define the input signal-to-noise ratio (SNR) as $S^2 T_1 / \Pi$. Fig. 1 plots some numerical examples of the filter for $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$. The Matlab computation of all the filters shown with $\delta t = 0.[\ldots] T_1$ takes seconds to complete on a desktop PC. Fig. 2 plots the upper error bounds versus the input SNR. The upper bounds turn out to be conservative here, as a numerical investigation of $P_e$ later will demonstrate.

FIG. 1. (Color online). The normalized $R$-optimal filters $2\Pi \tilde{h}(t)/S$ in log scale versus normalized time $t/T_1$ for different input SNR $\equiv S^2 T_1 / \Pi = 1, [\ldots]$. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. The different plots can be distinguished by the reducing correlation times for increasing SNR.

FIG. 2. (Color online). Upper bounds $Q(\tilde{H})$ and $\tilde{R}$ on the average error probability $P_e$ for the first-order Volterra filter versus input SNR from 10 dB to 30 dB in log-log scale. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. $P_e$ is guaranteed to be in the shaded region below the curves.

The proposed decision rule can be compared with the optimal likelihood-ratio test (LRT) [55]. For the given problem, there exists an analytic expression for the log-likelihood ratio given by [22]
\[ \lambda_o(y) = \frac{S}{\Pi} \int_0^T d\eta(t)\, \check{x}(t) - \frac{S^2}{2\Pi} \int_0^T dt\, \check{x}^2(t), \tag{3.33} \]
\[ d\eta(t) = y(t)\, dt, \tag{3.34} \]
\[ \check{x}(t) = \frac{p_1(t)}{p_0(t) + p_1(t)}, \tag{3.35} \]
\[ p_1(t) = \exp\left[ \frac{S}{\Pi} \int_0^t d\eta(\tau) - t \left( \frac{S^2}{2\Pi} + \frac{1}{T_1} \right) \right], \tag{3.36} \]
\[ p_0(t) = \frac{1}{T_1} \int_0^t d\tau\, p_1(\tau), \tag{3.37} \]
where the $d\eta$ integrals are in the Itô sense. The optimal decision rule is thus
\[ \lambda_o(y) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \ln \frac{\pi_0}{\pi_1}. \tag{3.38} \]
Although the LRT will achieve the lowest $P_e$, the highly nonlinear dependence of $\lambda_o$ on $y$ makes its exact implementation difficult in real-time applications or for a large number of qubits.

The average error probabilities for both the $R$-optimal rule and the LRT are estimated numerically using Monte Carlo simulations and plotted in Fig. 3. The errors are close at lower input SNR values. Considering the simplicity of the $R$-optimal rule, the divergence at higher SNR is expected and indeed slight. At the input SNR of $10^{[\ldots]}$, $P_e$ for the LRT is $6.[\ldots] \times 10^{-[\ldots]}$, while that for the $R$-optimal rule is only around a factor of 2 higher at $1.[\ldots] \times 10^{-[\ldots]}$. A further optimization of $P_e$ beyond the results shown in Fig. 3 can be done by fine-tuning the threshold of the $R$-optimal rule. For example, a numerical search for the optimal threshold brings its error probability at input SNR $= 10^{[\ldots]}$ down to $8.[\ldots] \times 10^{-[\ldots]}$. A higher-order filter is hardly necessary for the SNRs considered here.

FIG. 3. (Color online). Numerically computed average error probabilities $P_e$ for the $R$-optimal rule and the likelihood-ratio test (LRT) versus the input SNR from 10 dB to 30 dB in log-log scale. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. Also shown are parts of the upper bounds $Q(\tilde{H})$ and $\tilde{R}$ for comparison.

The upper bounds depend only on low-order moments and apply equally to all problems with the same low-order moments, regardless of their higher-order statistics. It is not surprising that such indiscriminate bounds are loose for this particular example, as shown in Fig. 3. What is surprising is the near-optimal performance of a decision rule based on a loose upper bound.
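The discrete-time inversion of the Fredholm equation (3.32) mentioned earlier can be sketched as follows. This is a minimal illustration, not the Matlab computation used for the figures: it assumes units in which $T_1 = \Pi = 1$ (so that $S^2$ equals the input SNR), a midpoint grid, and a hand-written Gaussian elimination in place of mldivide. The names `solve_linear` and `fredholm_filter` and the grid size are hypothetical choices. The returned bound uses $\tilde{R} = 1 / \int_0^T dt\, \Delta(t) \tilde{h}(t)$ with $\Delta(t) = (S/2) \langle x(t) \rangle$, which follows from Eqs. (3.17) and (3.21).

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x


def fredholm_filter(snr, t_total=5.0, n=50, pi1=0.5):
    """Discretize Eq. (3.32) on a midpoint grid, with T_1 = Pi = 1.

    Returns the grid, the normalized filter 2 Pi h / S, and the bound R.
    """
    s = math.sqrt(snr)  # S, since SNR = S^2 T_1 / Pi and T_1 = Pi = 1
    dt = t_total / n
    t = [(k + 0.5) * dt for k in range(n)]

    def cov_x(u, v):  # Eq. (3.29)
        return math.exp(-max(u, v)) - math.exp(-(u + v))

    # Left-hand side of Eq. (3.32): (S/2) <x(t_j)>, using Eq. (3.27).
    rhs = [0.5 * s * math.exp(-tj) for tj in t]
    # Pi * identity plus pi1 * S^2 * C_x * dt approximates the integral operator.
    a = [[(1.0 if j == k else 0.0) + pi1 * snr * cov_x(t[j], t[k]) * dt
          for k in range(n)] for j in range(n)]
    h = solve_linear(a, rhs)
    normalized = [2.0 * hk / s for hk in h]
    r_bound = 1.0 / sum(rhs[k] * h[k] * dt for k in range(n))
    return t, normalized, r_bound
```

In the low-SNR limit the kernel term is negligible and the normalized filter reduces to $\langle x(t) \rangle = \exp(-t/T_1)$, which is a convenient sanity check on any discretization.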
The log-likelihood ratio is given analytically for the problem considered here, so one may compare it with the $R$-optimal test statistic to see how the two resemble each other. In general, however, the log-likelihood ratio is difficult or even impossible to compute if the full probability models are more complicated or simply unidentified. The $R$-optimal rule requires only low-order moments to be known, and is hence more convenient to implement in practice.

IV. CONCLUSION
I have proposed the use of Volterra filters for quantum estimation and detection. The importance of the proposal lies in its promise to solve many of the practical problems associated with existing optimal quantum inference techniques, including the curse of dimensionality, the lack of performance assurances upon approximations, and the need for a Markovian model. Beyond the examples of quantum state tomography and qubit readout discussed in this paper, diverse applications in quantum information processing [1, 38, 51], including cooling [26], squeezing [27], state preparation [28], metrology [6–8, 16–18, 42–44], fundamental tests of quantum mechanics [18, 19, 38, 61], and error correction [29, 30], are expected to benefit. Potential extensions of the theory include adaptive, recursive, and coherent generalizations for feedback control [1] and noise cancellation [66], filter training via machine learning [67], robustness analysis, the use of other performance criteria for improved robustness [68] or multi-hypothesis testing [18, 19], a connection with Shannon information theory through the relations between filtering errors and entropic information [69], and a study of fundamental uncertainty relations in conjunction with quantum lower error bounds [6, 7, 16, 42–44].
ACKNOWLEDGMENTS
This work is supported by the Singapore National Research Foundation under NRF Grant No. NRF-NRFF2011-07.

[1] H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge University Press, Cambridge, 2010).
[2] C. W. Gardiner and P. Zoller, Quantum Noise (Springer-Verlag, Berlin, 2004).
[3] K. Jacobs, Quantum Measurement Theory and its Applications (Cambridge University Press, Cambridge, 2014).
[4] L. Bouten, R. Van Handel, and M. James, SIAM Journal on Control and Optimization, 2199 (2007); L. Bouten, R. van Handel, and M. R. James, SIAM Review, 239 (2009).
[5] S. Haroche and J. M. Raimond, Exploring the Quantum: Atoms, Cavities, and Photons (Oxford University Press, Oxford, 2006).
[6] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976).
[7] A. S. Holevo, Statistical Structure of Quantum Theory (Springer-Verlag, Berlin, 2001).
[8] V. B. Braginsky and F. Y. Khalili, Quantum Measurement (Cambridge University Press, Cambridge, 1992).
[9] M. G. A. Paris and J. Řeháček, eds., Quantum State Estimation (Springer-Verlag, Berlin, 2004).
[10] R. Blume-Kohout, New Journal of Physics, 043034 (2010); C. Ferrie, New Journal of Physics, 093035 (2014).
[11] C. A. Riofrío, Continuous Measurement Quantum State Tomography of Atomic Ensembles, Ph.D. thesis, University of New Mexico, Albuquerque (2014).
[12] R. L. Cook, C. A. Riofrío, and I. H. Deutsch, Phys. Rev. A, 032113 (2014).
[13] K. M. R. Audenaert and S. Scheel, New Journal of Physics, 023028 (2009).
[14] P. Six, P. Campagne-Ibarcq, I. Dotsenko, A. Sarlette, B. Huard, and P. Rouchon, ArXiv e-prints (2015), arXiv:1510.01726 [quant-ph].
[15] C. Granade, J. Combes, and D. G. Cory, ArXiv e-prints (2015), arXiv:1509.03770 [quant-ph].
[16] V. Giovannetti, S. Lloyd, and L. Maccone, Nature Photon., 222 (2011).
[17] M. Tsang, Phys. Rev. Lett., 250403 (2009); Phys. Rev. A, 033840 (2009); Phys. Rev. A, 013824 (2010); S. Gammelmark, B. Julsgaard, and K. Mølmer, Phys. Rev. Lett., 160401 (2013); I. Guevara and H. Wiseman, ArXiv e-prints (2015), arXiv:1503.02799 [quant-ph].
[18] M. Tsang, Phys. Rev. Lett., 170502 (2012).
[19] M. Tsang, Quantum Meas. Quantum Metr., 84 (2013).
[20] J. Gambetta, W. A. Braff, A. Wallraff, S. M. Girvin, and R. J. Schoelkopf, Phys. Rev. A, 012325 (2007).
[21] B. D'Anjou and W. A. Coish, Phys. Rev. A, 012313 (2014); B. D'Anjou, L. Kuret, L. Childress, and W. A. Coish, ArXiv e-prints (2015), arXiv:1507.06846 [quant-ph].
[22] S. Ng and M. Tsang, Phys. Rev. A, 022325 (2014).
[23] T. A. Wheatley, D. W. Berry, H. Yonezawa, D. Nakane, H. Arao, D. T. Pope, T. C. Ralph, H. M. Wiseman, A. Furusawa, and E. H. Huntington, Phys. Rev. Lett., 093601 (2010); H. Yonezawa, D. Nakane, T. A. Wheatley, K. Iwasawa, S. Takeda, H. Arao, K. Ohki, K. Tsumura, D. W. Berry, T. C. Ralph, H. M. Wiseman, E. H. Huntington, and A. Furusawa, Science, 1514 (2012); K. Iwasawa, K. Makino, H. Yonezawa, M. Tsang, A. Davidovic, E. Huntington, and A. Furusawa, Phys. Rev. Lett., 163602 (2013).
[24] V. P. Belavkin, Physics Letters A, 355 (1989).
[25] V. P. Belavkin, ArXiv Mathematical Physics e-prints (2007), arXiv:math-ph/0702079, and references therein.
[26] D. A. Steck, K. Jacobs, H. Mabuchi, T. Bhattacharya, and S. Habib, Phys. Rev. Lett., 223004 (2004); D. A. Steck, K. Jacobs, H. Mabuchi, S. Habib, and T. Bhattacharya, Phys. Rev. A, 012322 (2006).
[27] L. K. Thomsen, S. Mancini, and H. M. Wiseman, Phys. Rev. A, 061801 (2002).
[28] M. Yanagisawa, Phys. Rev. Lett., 190201 (2006); A. Negretti, U. V. Poulsen, and K. Mølmer, Phys. Rev. Lett., 223601 (2007).
[29] C. Ahn, A. C. Doherty, and A. J. Landahl, Phys. Rev. A, 042301 (2002); M. Sarovar, C. Ahn, K. Jacobs, and G. J. Milburn, Phys. Rev. A, 052324 (2004).
[30] B. A. Chase, A. J. Landahl, and J. M. Geremia, Phys. Rev. A, 032304 (2008).
[31] D. B. Hume, T. Rosenband, and D. J. Wineland, Phys. Rev. Lett., 120502 (2007).
[32] C. Sayrin, I. Dotsenko, X. Zhou, B. Peaudecerf, T. Rybarczyk, S. Gleyzes, P. Rouchon, M. Mirrahimi, H. Amini, M. Brune, J.-M. Raimond, and S. Haroche, Nature (London), 73 (2011).
[33] W. Wieczorek, S. G. Hofer, J. Hoelscher-Obermaier, R. Riedinger, K. Hammerer, and M. Aspelmeyer, Phys. Rev. Lett., 223601 (2015).
[34] I. G. Vladimirov and I. R. Petersen, ArXiv e-prints (2012), arXiv:1202.0946 [quant-ph].
[35] H. Amini, R. A. Somaraju, I. Dotsenko, C. Sayrin, M. Mirrahimi, and P. Rouchon, Automatica, 2683 (2013).
[36] M. R. Hush, S. S. Szigeti, A. R. R. Carvalho, and J. J. Hope, New Journal of Physics, 113060 (2013).
[37] A. E. B. Nielsen, A. S. Hopkins, and H. Mabuchi, New Journal of Physics, 105043 (2009).
[38] M. Aspelmeyer, T. J. Kippenberg, and F. Marquardt, Rev. Mod. Phys., 1391 (2014).
[39] I. Bloch, J. Dalibard, and W. Zwerger, Rev. Mod. Phys., 885 (2008).
[40] A. A. Houck, H. E. Tureci, and J. Koch, Nature Phys., 292 (2012).
[41] V. J. Mathews and G. L. Sicuranza, Polynomial Signal Processing (Wiley, New York, 2000).
[42] M. Tsang, H. M. Wiseman, and C. M. Caves, Phys. Rev. Lett., 090401 (2011).
[43] M. Tsang, Phys. Rev. Lett., 230401 (2012); D. W. Berry, M. Tsang, M. J. W. Hall, and H. M. Wiseman, Phys. Rev. X, 031018 (2015).
[44] M. Tsang and R. Nair, Phys. Rev. A, 042115 (2012); M. Tsang, New Journal of Physics, 073005 (2013).
[45] J. Zhang, Y.-X. Liu, R.-B. Wu, K. Jacobs, S. Kaya Ozdemir, L. Yang, T.-J. Tarn, and F. Nori, ArXiv e-prints (2014), arXiv:1407.8108 [quant-ph].
[46] V. P. Belavkin, Foundations of Physics, 685 (1994).
[47] Q. Gao, D. Dong, and I. R. Petersen, ArXiv e-prints (2015), arXiv:1504.06780 [math-ph].
[48] M. Ozawa, Phys. Rev. A, 042105 (2003).
[49] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics (Cambridge University Press, Cambridge, 1987); R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys., 865 (2009).
[50] A. J. Leggett and A. Garg, Phys. Rev. Lett., 857 (1985); C. Emary, N. Lambert, and F. Nori, Reports on Progress in Physics, 016001 (2014).
[51] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2011).
[52] U. Weiss, Quantum Dissipative Systems (World Scientific, Singapore, 2008); S. Datta, Electronic Transport in Mesoscopic Systems (Cambridge University Press, Cambridge, 1995); H. Bruus and K. Flensberg, Introduction to Many-Body Quantum Theory in Condensed Matter Physics (Oxford University Press, Oxford, 2004).
[53] L. Mandel and E. Wolf, Optical Coherence and Quantum Optics (Cambridge University Press, Cambridge, 1995).
[54] J. O. Berger, Statistical Decision Theory and Bayesian Analysis (Springer-Verlag, New York, 1985).
[55] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. (John Wiley & Sons, New York, 2001).
[56] J. Stockton, M. Armen, and H. Mabuchi, J. Opt. Soc. Am. B, 3019 (2002).
[57] D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (Wiley, Hoboken, 2006).
[58] A. A. Clerk, M. H. Devoret, S. M. Girvin, F. Marquardt, and R. J. Schoelkopf, Rev. Mod. Phys., 1155 (2010).
[59] H. Miao, Y. Ma, C. Zhao, and Y. Chen, ArXiv e-prints (2015), arXiv:1506.00117 [quant-ph].
[60] S. T. Flammia and Y.-K. Liu, Phys. Rev. Lett., 230501 (2011).
[61] Y. Chen, Journal of Physics B: Atomic, Molecular and Optical Physics, 104001 (2013).
[62] B. Picinbono and P. Duvaut, IEEE Transactions on Information Theory, 1061 (1990).
[63] B. Picinbono, IEEE Transactions on Aerospace and Electronic Systems, 1072 (1995).
[64] P. Billingsley, Probability and Measure (Wiley, New York, 1995).
[65] C. W. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences (Springer, Berlin, 2010).
[66] M. Tsang and C. M. Caves, Phys. Rev. Lett., 123601 (2010); Phys. Rev. X, 031016 (2012); N. Yamamoto, Phys. Rev. X, 041029 (2014).
[67] A. Hentschel and B. C. Sanders, Phys. Rev. Lett., 063603 (2010); E. Magesan, J. M. Gambetta, A. D. Córcoles, and J. M. Chow, Phys. Rev. Lett., 200501 (2015).
[68] M. R. James, Phys. Rev. A, 032108 (2004).
[69] A. Barchielli and G. Lupieri, "Information gain in quantum continual measurements," in Quantum Stochastics and Information, edited by V. P. Belavkin and M. Gu (World Scientific, Singapore, 2008) Chap. 15, pp. 325–345, arXiv:quant-ph/0612010; M. Tsang, in 2014 IEEE International Symposium on Information Theory (ISIT)