Volterra filters for quantum estimation and detection

Abstract

The implementation of optimal statistical inference protocols for high-dimensional quantum systems is often computationally expensive. To avoid the difficulties associated with optimal techniques, here I propose an alternative approach to quantum estimation and detection based on Volterra filters. Volterra filters have a clear hierarchy of computational complexities and performances, depend only on finite-order correlation functions, and are applicable to systems with no simple Markovian model. These features make Volterra filters appealing alternatives to optimal nonlinear protocols for the inference and control of complex quantum systems. Applications of the first-order Volterra filter to continuous-time quantum filtering, the derivation of a Heisenberg-picture uncertainty relation, quantum state tomography, and qubit readout are discussed.

Mankei Tsang

1 Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583
2 Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117551
(Dated: September 12, 2018)

I. INTRODUCTION

The advance of quantum technologies relies on our ability to measure and control complex quantum systems. An important task in quantum control is to infer unknown variables from the noisy measurements of a quantum system. Examples include the prediction of quantum dynamics for measurement-based feedback control [1–5] and the estimation and detection of weak signals [6–23]. To implement the signal processing for such tasks, a Bayesian decision-theoretic formulation of optimal quantum statistical inference is now well established [1–7, 17–19]. The quantum filtering theory pioneered by Belavkin [24, 25] for the optimal prediction of quantum dynamics has especially been hailed as a seminal achievement in quantum control theory; its applications to measurement-based cooling [26], squeezing [27], state preparation [28], quantum error correction [29, 30], qubit readout [20–22], and quantum state tomography [11–14] in atomic, optical, optomechanical, condensed-matter, and superconducting-microwave-circuit systems [1] have been studied extensively in the literature.

Although optimal quantum inference has been successful experimentally for low-dimensional systems, such as qubits [31] and few-photon systems [32], as well as near-Gaussian systems, such as optical phase estimation [23] and optomechanics [33], its implementation for high-dimensional non-Gaussian quantum systems is beset with difficulties in practice. An exact implementation of the quantum Bayes rule [2] for optimal inference requires numerical updates of the posterior density matrix based on the measurement record. Except for special cases such as Gaussian systems [1], the number of elements needed to keep track of the density matrix scales exponentially with the degrees of freedom, making the implementation prohibitive for many-body non-Gaussian systems. This problem, known as the curse of dimensionality, means that approximations must often be sought [26, 29, 30, 34–37].
Current approximation techniques for dynamical systems include Gaussian approximations [13, 26, 34], phase-space particle filters [36], Hilbert-space truncation [30, 35], and manifold learning [37], but these techniques provide little assurance about their actual errors and often remain too expensive to compute for real-time control of high-dimensional systems. Another problem with optimal inference and the associated stochastic-master-equation approach is its reliance on a Markovian model, which is difficult to use for many complex systems, especially those with $1/f$ or fractional noise statistics. With the ongoing trend of increasing complexity in quantum experiments, not only with condensed matter but also with optomechanics [38], atomic ensembles [39], and superconducting circuits [40], optimal inference is becoming an unattainable goal in practice.

Against this backdrop, here I propose an alternative approach to quantum estimation and detection based on Volterra filters. Instead of seeking absolute optimality, Volterra filters are a class of polynomial estimators with a clear hierarchy of computational complexities and estimation errors [41]. Their applications to quantum estimation and detection promise to solve many of the practical problems associated with optimal quantum inference, including the curse of dimensionality, the lack of error assurances upon approximations, and the need for a Markovian model. The filter errors also provide a set of upper error bounds on the Bayesian quantum Cramér-Rao [6, 7, 42], Ziv-Zakai [43], and Helstrom [6, 7, 44] bounds, forming novel hierarchies of fundamental uncertainty relations that may be of independent foundational interest. The Volterra series has recently been used to model the input-output relations of a quantum system [45], but my focus here is different and concerns the estimation of hidden observables and hypothesis testing given the output measurement record.

II. QUANTUM ESTIMATION

A. Formalism

Consider a quantum system in the Heisenberg picture with initial density operator $\rho$. Let

$$y = \begin{pmatrix} y(1) \\ y(2) \\ \vdots \\ y(K) \end{pmatrix} \tag{2.1}$$

be a column vector of observables under measurement. For example, $y$ can be the observables of an output optical field under homodyne, heterodyne, or photon-counting measurements. Given a measurement record of $y$, the goal of quantum estimation is to infer a column vector of hidden observables

$$x \equiv \begin{pmatrix} x(1) \\ x(2) \\ \vdots \\ x(J) \end{pmatrix}. \tag{2.2}$$

For example, $x$ can be the observables of a quantum system that has interacted with the optical field, such as the position of a quantum mechanical oscillator or a spin operator of an atomic ensemble, and the goal of the estimation is to infer $x$ given the measurement record. Quantum estimation is usually framed in the Schrödinger picture via the concept of posterior density operator [1, 2], but it can be shown to be equivalent to the Heisenberg-picture approach adopted here [4]. This task is especially important for measurement-based feedback control [1], such as measurement-based cooling and squeezing, to gain real-time information about quantum degrees of freedom and to reduce their uncertainties via feedback control. Experiments that implement quantum estimation have been reported in Refs. [31–33] for example.

The estimation error has a well-defined decision-theoretic meaning if all the $x$ and $y$ operators commute with one another, such that $x$ and $y$ can be jointly measured and treated as classical random variables in the same probability space [4, 7, 46]. This assumption is applicable to a wide range of scenarios, including quantum filtering [4, 46] and the estimation of any classical parameter or waveform coupled to a quantum system [17, 47]. Since $x$ and $y$ are compatible observables, the rest of the estimation theory is identical to the classical treatment [41].
Let $\check{x}(j|y)$ be an estimator of $x(j)$ given $y$, and assume that the estimator is given by the truncated Volterra series, viz.,

$$\check{x}(j|y) = \sum_{p=0}^{P} \sum_{1 \le k_1 \le k_2 \le \cdots \le k_p \le K} h_p(j, k_1, k_2, \ldots, k_p|\theta)\, y(k_1) y(k_2) \cdots y(k_p), \tag{2.3}$$

where $\theta$ is a vector of tunable parameters, $P$ is the order of the series and quantifies the complexity of the filter, and the zeroth-order term is simply a constant $h_0(j)$ and does not depend on $y$. For $P \to \infty$, the series can be regarded as the Taylor series for an arbitrary estimator, although I will focus on finite $P$.

A useful trick to simplify the notation is to define the set of all products of $y$ elements up to order $P$ as

$$y^{(P)} \equiv \left\{ 1, y, y^{\otimes 2}, \ldots, y^{\otimes P} \right\}, \tag{2.4}$$

where

$$y^{\otimes p} \equiv \left\{ y(k_1) y(k_2) \cdots y(k_p);\ 1 \le k_1 \le k_2 \le \cdots \le k_p \le K \right\} \tag{2.5}$$

is the set of all $p$th-order products of $y$ elements. Then the Volterra series in Eq. (2.3) can be rewritten as

$$\check{x}(j|y) = \sum_{\mu} h^{(P)}(j, \mu|\theta)\, y^{(P)}(\mu), \tag{2.6}$$

where $h^{(P)}$ is a linear filter with respect to $y^{(P)}$ but equivalent to the Volterra filter that is nonlinear with respect to $y$, and $\mu$ is a composite index that goes through all elements in $y^{(P)}$.

Define

$$\langle f(x, y) \rangle \equiv \operatorname{tr}\left[ \rho f(x, y) \right] \tag{2.7}$$

as the expectation of any function of $x$ and $y$, with $\operatorname{tr}$ denoting the operator trace. Let the error covariance matrix be

$$\Sigma(j, k) \equiv \left\langle \left[ x(j) - \check{x}(j|y) \right] \left[ x(k) - \check{x}(k|y) \right] \right\rangle. \tag{2.8}$$

The absolute minimum mean-square error for arbitrary estimators is achieved by the conditional expectation of $x$ given $y$ [4].
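To make the composite index $\mu$ of Eq. (2.6) concrete, the following minimal Python sketch enumerates the elements of $y^{(P)}$ for a numerical record. The function names `volterra_features` and `n_features` are hypothetical, and the ordering of the elements (by order $p$, then lexicographically in the indices) is only one possible convention.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def volterra_features(y, P):
    """List the elements of y^(P): all products y(k1)...y(kp) with
    k1 <= ... <= kp and p = 0, ..., P (the p = 0 element is the constant 1)."""
    feats = []
    for p in range(P + 1):
        for idx in combinations_with_replacement(range(len(y)), p):
            feats.append(prod(y[k] for k in idx))  # empty product gives 1
    return feats


def n_features(K, P):
    """Number of elements in y^(P) for K measurements: binomial(K + P, P)."""
    return comb(K + P, P)


y = [2.0, 3.0]
print(volterra_features(y, 2))  # [1, 2.0, 3.0, 4.0, 6.0, 9.0]
print(n_features(len(y), 2))    # 6
```

The count grows polynomially in $K$ for fixed $P$, which is the sense in which $P$ quantifies the complexity of the filter.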
For the optimal filtering and prediction of quantum observables for example, the usual method is to compute the posterior density operator $\rho(y)$ conditioned on the measurement record $y$ in the Schrödinger picture using the Kraus operators that characterize the measurements [1, 2], and then take the conditional expectation given by $\check{x}(j|y) = \operatorname{tr}[x_S(j) \rho(y)]$, with $x_S(j)$ being the Schrödinger picture of $x(j)$. If the continuous-time limit is taken, the posterior density operator obeys the celebrated stochastic master equation [1–4] first proposed by Belavkin [24, 25]. The computation of $\rho(y)$ suffers from the curse of dimensionality, however. To restrict the complexity, consider here instead the error of the $P$th-order Volterra filter given by

$$\Sigma^{(P)}(j, k|\theta) = \left\langle \left[ x(j) - \sum_{\mu} h^{(P)}(j, \mu|\theta) y^{(P)}(\mu) \right] \left[ x(k) - \sum_{\mu} h^{(P)}(k, \mu|\theta) y^{(P)}(\mu) \right] \right\rangle \tag{2.9}$$

$$= C_x(j, k) - \sum_{\mu} h^{(P)}(j, \mu|\theta) C_{xy^{(P)}}(k, \mu) - \sum_{\mu} h^{(P)}(k, \mu|\theta) C_{xy^{(P)}}(j, \mu) + \sum_{\mu, \nu} h^{(P)}(j, \mu|\theta) h^{(P)}(k, \nu|\theta) C_{y^{(P)}}(\mu, \nu), \tag{2.10}$$

where

$$C_x(j, k) \equiv \langle x(j) x(k) \rangle, \tag{2.11}$$

$$C_{xy^{(P)}}(j, \mu) \equiv \left\langle x(j) y^{(P)}(\mu) \right\rangle, \tag{2.12}$$

$$C_{y^{(P)}}(\mu, \nu) \equiv \left\langle y^{(P)}(\mu) y^{(P)}(\nu) \right\rangle. \tag{2.13}$$

To optimize the Volterra filter, one can seek the parameters $\theta$ that minimize any desired component of $\Sigma^{(P)}(j, k|\theta)$ in Eq. (2.10), which has the remarkable feature of depending only on finite-order correlations. Specifically, $C_{xy^{(P)}}(j, \mu)$ depends on the correlation between $x(j)$ and products of $y$ elements up to the $P$th order, and $C_{y^{(P)}}$ depends on the correlations among $y$ up to the $2P$th order. Stationarity assumptions and frequency-domain techniques can further simplify the expressions.

Quantum mechanics comes into the problem through the correlations.
They must obey uncertainty relations with other incompatible observables [7, 48]. They can violate Bell [49] and Leggett-Garg [50] inequalities, requiring different probability spaces for different experimental settings. They may result from nontrivial internal quantum dynamics with no classical correspondence; the promise of quantum computation and simulation [51] is in fact based on the difficulty of reproducing quantum dynamical statistics using any hidden-variable model. This difficulty also means that attempts to simplify quantum filters via classical models [26, 34, 36] are likely to be inaccurate for highly nonclassical systems. The Volterra filters sidestep the issue via a manifestly non-Markovian approach that does not require an online simulation of the internal quantum dynamics. The identification of the correlations and the filter synthesis, though nontrivial, can be done offline for control applications.

A challenge for classical applications of Volterra filters is that the correlations are often difficult to model or measure in practice, but it is less problematic for quantum systems: computing and measuring correlation functions is already a major endeavor in condensed-matter physics [52] and early quantum optics [53] with an extensive literature. The Volterra-series approach to input-output analysis [45] should also help their simulation. Compared with the stochastic-master-equation approach [1–4], the use of correlation functions has the advantage of not requiring a Markovian model or stochastic calculus, although the Volterra filters may require a longer memory depending on the time scales of the correlation functions and the signal-to-noise properties. An empirical alternative to prior system identification is to train the filter directly using experimental or simulated data to minimize the sample errors.

I now consider the ideal case where arbitrary Volterra filters can be implemented, such that the tunable parameters $\theta$ are all elements of $h^{(P)}$.
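As a sample-based counterpart of this ideal case, the sketch below fits all elements of $h^{(P)}$ by ordinary least squares on simulated data, which is one way to realize the empirical training mentioned above. The toy model (a binary hidden variable observed in Gaussian noise), the function names, and the sample size are assumptions made purely for illustration, and the reported errors are in-sample errors.

```python
import numpy as np
from itertools import combinations_with_replacement


def monomials(Y, P):
    """Map N samples of y (array N x K) to the N x M matrix of y^(P) elements."""
    n_samples, K = Y.shape
    cols = [np.ones(n_samples)]  # zeroth-order element 1
    for p in range(1, P + 1):
        for idx in combinations_with_replacement(range(K), p):
            cols.append(np.prod(Y[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_volterra(x, Y, P):
    """Least-squares fit of h^(P) for a scalar x; returns (h, sample MSE)."""
    Phi = monomials(Y, P)
    h, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    return h, float(np.mean((x - Phi @ h) ** 2))


# Hypothetical toy data: x = +/-1 with equal probability, y = x + Gaussian noise
rng = np.random.default_rng(0)
n = 20000
x = rng.choice([-1.0, 1.0], size=n)
Y = (x + 0.5 * rng.standard_normal(n))[:, None]  # K = 1 measurement

errors = [fit_volterra(x, Y, P)[1] for P in range(4)]
for P, err in enumerate(errors):
    print(f"P = {P}: sample mean-square error = {err:.4f}")
```

Because the monomial sets are nested, the in-sample error cannot increase with $P$; how much it decreases depends on how non-Gaussian the statistics are.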
Since $\Sigma^{(P)}$ is quadratic with respect to $h^{(P)}$, the minimization can be performed analytically. Define the risk function [54] to be minimized as

$$R(\theta) \equiv \sum_{j,k} u(j) \Sigma^{(P)}(j, k|\theta) u(k), \tag{2.14}$$

where $u$ is an arbitrary real vector. The optimal Volterra filter

$$\tilde{h}^{(P)} \equiv \arg\min_{h^{(P)}} R(h^{(P)}) \tag{2.15}$$

for arbitrary $u$ satisfies the equation

$$C_{xy^{(P)}}(j, \nu) = \sum_{\mu} \tilde{h}^{(P)}(j, \mu) C_{y^{(P)}}(\mu, \nu), \tag{2.16}$$

which is a system of linear equations with respect to $\tilde{h}^{(P)}$ and can be solved by conventional methods, and the resulting error covariance matrix is

$$\tilde{\Sigma}^{(P)}(j, k) \equiv \Sigma^{(P)}(j, k|\tilde{h}^{(P)}) \tag{2.17}$$

$$= C_x(j, k) - \sum_{\mu} \tilde{h}^{(P)}(j, \mu) C_{xy^{(P)}}(k, \mu). \tag{2.18}$$

This error can be computed offline to evaluate the optimal performance of a Volterra filter and the trade-off between the error and the filter complexity $P$. Going to a higher order is guaranteed not to increase the error, since $\tilde{\Sigma}^{(P)} \le \tilde{\Sigma}^{(Q)}$ if $P > Q$ (a higher-order filter can always achieve the performance of a lower-order filter by ignoring the higher-order terms in $y^{(P)}$). As the infinite-order Volterra filter can be regarded as the Taylor series for an arbitrary function, $\tilde{h}^{(\infty)}$ will be optimal among arbitrary estimators and $\tilde{\Sigma}^{(\infty)}$ will coincide with the absolutely optimal error. $\tilde{\Sigma}^{(P)}$ thus provides a hierarchy of increasingly tight upper error bounds for optimal quantum inference. Most importantly, a finite-order Volterra filter can still enjoy a performance given by Eq. (2.18) for any statistics, even if it is not optimal in the absolute sense. On a fundamental level, it is interesting to note that, if $x$ is classical, the upper error bounds also apply to the Bayesian quantum Cramér-Rao [6, 7, 42] and Ziv-Zakai [43] lower error bounds, forming a novel set of operationally motivated uncertainty relations; an example is shown in Sec. II C.

The optimal $P = 0$ Volterra filter does not process the measurement and is simply given by the prior expectation $\langle x \rangle$. The $P = 1$ Volterra filter is a linear filter with respect to $y$ and deserves special attention, as it is the simplest Volterra filter beyond the trivial zeroth-order case and will likely become the most popular. If $x$ and $y$ are jointly Gaussian, the optimal linear filter is also optimal among arbitrary estimators and equivalent to the Kalman filter when applied to the prediction of Markovian dynamical systems [55], but the linear filter can still be used for any non-Gaussian or non-Markovian statistics and depends only on the second-order correlations in terms of $x$ and $y$.

B. Continuous-time quantum filtering

For example, consider the continuous-time quantum filtering and prediction problem, which is to estimate a Heisenberg-picture observable $x(t)$ given the past measurement record $\{ y(\tau);\ t_0 \le \tau \le T < t \}$ [4]. It can be shown that all the Heisenberg-picture operators under consideration commute with one another under rather general conditions for filtering and prediction [4, 46]. If $t < T$ is desired for smoothing [17], care should be taken in the modeling to ensure that $x(t)$ still commutes with $y$ and an operational meaning of the estimation error exists. For example, a c-number signal, such as a classical force, commutes with all operators by definition.

To transition from the discrete formalism to continuous time, define a discrete time given by

$$t_j = t_0 + j \delta t, \tag{2.19}$$

with initial time $t_0$, integer $j$, and time interval $\delta t$. For infinitesimal $\delta t$, the linear $P = 1$ estimator in the continuous-time limit becomes

$$\check{x}(t|y) = h_0(t) + \int_{t_0}^{T} d\tau\, h_1(t, \tau) y(\tau), \tag{2.20}$$

where $\check{x}(t|y)$, $h_0(t)$, $h_1(t, \tau)$, and $y(\tau)$ are continuous-time versions of $\check{x}(j|y)$, $h_0(j)$, $h_1(j, k)/\delta t$, and $y(k)$, respectively. Eq. (2.20) is a continuous-time limit of the Volterra series in Eq. (2.3) for $P = 1$. Assuming zero-mean $x$ and $y$ for simplicity and using Eqs. (2.16) and (2.18), the optimal linear filter $\tilde{h}_1(t, \tau)$ and the corresponding mean-square error $\tilde{\Sigma}^{(1)}(t, t)$ can be expressed as

$$C_{xy}(t, \tau) = \int_{t_0}^{T} ds\, \tilde{h}_1(t, s) C_y(s, \tau), \tag{2.21}$$

$$\tilde{\Sigma}^{(1)}(t, t) = C_x(t, t) - \int_{t_0}^{T} d\tau\, \tilde{h}_1(t, \tau) C_{xy}(t, \tau), \tag{2.22}$$

where

$$C_x(t, t) \equiv \left\langle x^2(t) \right\rangle, \tag{2.23}$$

$$C_{xy}(t, \tau) \equiv \langle x(t) y(\tau) \rangle, \tag{2.24}$$

$$C_y(t, \tau) \equiv \langle y(t) y(\tau) \rangle \tag{2.25}$$

are the only correlation functions needed to compute both the filter and the error.
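A minimal numerical sketch of Eqs. (2.21) and (2.22) follows. It assumes, purely for illustration, a measurement of the form $y = x$ plus white noise of power `noise_power`, so that $C_{xy} = C_x$ and $C_y = C_x$ plus a delta function, and it takes an exponentially decaying $C_x$ as a stand-in for a measured or computed correlation function. The function names and parameter values are hypothetical.

```python
import numpy as np


def linear_filter(C_x, taus, t, noise_power):
    """Discretize Eqs. (2.21)-(2.22) on the uniform grid `taus`,
    assuming y = x + white noise with power `noise_power`."""
    dt = taus[1] - taus[0]
    # C_y(s, tau) = C_x(s, tau) + noise_power * delta(s - tau); on the grid,
    # dt * C_y becomes dt * C_x + noise_power * identity.
    A = dt * C_x(taus[:, None], taus[None, :]) + noise_power * np.eye(taus.size)
    c = C_x(t, taus)                  # C_xy(t, tau) = C_x(t, tau) for this model
    h = np.linalg.solve(A, c)         # samples of the optimal kernel h_1(t, tau)
    err = C_x(t, t) - dt * (h @ c)    # Eq. (2.22)
    return h, float(err)


def exp_cov(t, s):
    """Assumed stationary correlation function with unit variance and unit decay time."""
    return np.exp(-np.abs(t - s))


taus = np.linspace(0.0, 10.0, 1001)   # record from t_0 = 0 to T = 10
h, err = linear_filter(exp_cov, taus, t=taus[-1], noise_power=0.1)
print(f"mean-square error of the linear estimate of x(T): {err:.4f}")
```

Only the two correlation functions enter the computation; no state-space or Markovian model of the signal is needed.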
Although this form of the optimal linear estimator is known in the classical context [55], its applicability to quantum systems with any nonlinear dynamics and non-Gaussian statistics has hitherto been unappreciated. Compared with the stochastic master equation, the linear filter can be more easily implemented using fast digital electronics or even analog electronics in practice [23, 56] for measurement-based feedback control, while the implementation of higher-order filters is more involved but can leverage existing digital-signal-processing techniques [41].

C. Heisenberg-picture uncertainty relation

To demonstrate a side consequence of the Volterra-filter formalism, here I use the analytic error expression for the first-order Volterra filter to derive a quantum uncertainty relation for Heisenberg-picture operators. Consider the Hamiltonian $H(t) = H_0(t) - q x(t)$, where $q$ is a canonical position operator, $x(t)$ is a classical force, and $H_0$ is the rest of the Hamiltonian. Suppose that $H_0$ is at most quadratic with respect to canonical position and momentum operators, such that the equations of motion for those operators in the Heisenberg picture are linear. The initial density operator $\rho$, on the other hand, can have any non-Gaussian statistics.

Consider an output field quadrature operator $y(t)$ that commutes with itself at different times in the Heisenberg picture [4]. For example, it can model the homodyne measurement of an output optical field in optomechanics. It can be shown that

$$y(t) = y_0(t) + \int_0^T d\tau\, g(t, \tau) x(\tau), \tag{2.26}$$

where

$$g(t, \tau) = \begin{cases} \dfrac{i}{\hbar} \left[ y_0(t), q_0(\tau) \right], & t > \tau, \\ 0, & t \le \tau, \end{cases} \tag{2.27}$$

is the causal c-number commutator and the subscript 0 denotes the interaction picture with respect to the Hamiltonian $H_0$.

Without loss of generality, assume that $x(t)$, $y(t)$, and $q(t)$ are zero-mean processes. Consider the estimation of $x(t)$ using the record $\{ y(\tau);\ 0 < \tau \le T \}$. If $y(t)$ has non-Gaussian statistics, the optimal nonlinear estimator is difficult to derive, but the first-order Volterra filter given by

$$\check{x}(t|y) = \int_0^T d\tau\, h_1(t, \tau) y(\tau) \tag{2.28}$$

can be analyzed more easily. To proceed, it is more convenient to consider discrete time as defined in Eq. (2.19). Regarding $x$, $y$, $y_0$, and $\check{x}$ as column vectors and $g$ and $h_1$ as matrices, Eqs. (2.26) and (2.28) can be rewritten in matrix form as

$$y = y_0 + \delta t\, g x, \tag{2.29}$$

$$\check{x} = \delta t\, h_1 y. \tag{2.30}$$

With covariance matrices defined as

$$C_x \equiv \left\langle x x^\top \right\rangle, \tag{2.31}$$

$$C_{y_0} \equiv \left\langle y_0 y_0^\top \right\rangle, \tag{2.32}$$

$$C_{xy} \equiv \left\langle x y^\top \right\rangle = \delta t\, C_x g^\top, \tag{2.33}$$

$$C_y \equiv \left\langle y y^\top \right\rangle = \delta t^2 g C_x g^\top + C_{y_0}, \tag{2.34}$$

where $\top$ denotes the matrix transpose, the optimal linear filter becomes

$$\delta t\, \tilde{h}_1 = C_{xy} C_y^{-1} = \delta t\, C_x g^\top \left( \delta t^2 g C_x g^\top + C_{y_0} \right)^{-1}, \tag{2.35}$$

and the error covariance matrix becomes

$$\tilde{\Sigma}^{(1)} \equiv \left\langle \left( x - \delta t\, \tilde{h}_1 y \right) \left( x - \delta t\, \tilde{h}_1 y \right)^\top \right\rangle \tag{2.36}$$

$$= C_x - \delta t\, \tilde{h}_1 C_{xy}^\top \tag{2.37}$$

$$= C_x - \delta t^2 C_x g^\top \left( \delta t^2 g C_x g^\top + C_{y_0} \right)^{-1} g C_x \tag{2.38}$$

$$= \left( C_x^{-1} + \delta t^2 g^\top C_{y_0}^{-1} g \right)^{-1}, \tag{2.39}$$

where the last line uses the matrix inversion lemma [57].

The error covariance can be compared with the Bayesian quantum Cramér-Rao bound derived in Ref. [42]. The quantum bound for Gaussian $x$ results in a matrix inequality given by

$$\tilde{\Sigma}^{(1)} \ge \left( C_x^{-1} + \frac{4 \delta t^2}{\hbar^2} C_q \right)^{-1}, \tag{2.40}$$

where

$$C_q(t_j, t_k) \equiv \frac{1}{2} \left\langle q(t_j) q(t_k) + q(t_k) q(t_j) \right\rangle. \tag{2.41}$$

Unlike $x(t)$ and $y(t)$, $q(t)$ may not self-commute at different times, and the symmetric ordering in the covariance function [58] arises naturally from the derivation of the quantum bound in Ref. [42]. Comparing Eq. (2.39) and Eq. (2.40), it can be seen that the inequality holds only if

$$g^\top C_{y_0}^{-1} g \le \frac{4}{\hbar^2} C_q, \tag{2.42}$$

which is a matrix uncertainty relation between two quantum processes in the Heisenberg picture involving their causal commutator $g$. Note that $y$ and $q$ are canonical phase-space coordinate operators with linear dynamics but need not have Gaussian statistics. The end result does not involve $x$ and can be applied to any quantum system that satisfies the stated assumptions beyond the estimation scenario. The estimation procedure nonetheless gives the relation a clear operational meaning.

Eq. (2.42) can be further simplified by assuming linear-time-invariant dynamics and stationary statistics. The result in the continuous long-time limit is a spectral uncertainty relation given by

$$S_{y_0}(\omega) S_q(\omega) \ge \frac{\hbar^2}{4} |G(\omega)|^2, \tag{2.43}$$

with the frequency-domain quantities defined by

$$C_{y_0}(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} S_{y_0}(\omega) \exp\left[ i\omega (t - \tau) \right], \tag{2.44}$$

$$C_q(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} S_q(\omega) \exp\left[ i\omega (t - \tau) \right], \tag{2.45}$$

$$g(t, \tau) = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi} G(\omega) \exp\left[ i\omega (t - \tau) \right]. \tag{2.46}$$

The spectral relation imposes a lower bound on the noise floor $S_{y_0}(\omega)$ of an output operator $y(t)$ in terms of the spectrum of a noncommuting operator $q(t)$. For example, the relation can be used to determine the fundamental limit to the noise floor of optical homodyne detection as a function of the mechanical-position power spectral density for a gravitational-wave detector [8, 59]. The inequality can be saturated if the quantum statistics are Gaussian [42].

D. Quantum state tomography

For an application in quantum information processing, consider the estimation of parameters in a density matrix, also known as quantum state tomography [9–15]. Assume a $d \times d$ density matrix of the form

$$\rho_z = \frac{I}{d} + \sum_{\alpha=1}^{d^2 - 1} z_\alpha E_\alpha, \tag{2.47}$$

where $I$ is the identity matrix, $E_\alpha$ is a set of Hermitian, traceless, and orthonormal matrices that satisfy

$$E_\alpha = E_\alpha^\dagger, \quad \operatorname{tr} E_\alpha = 0, \quad \operatorname{tr} E_\alpha E_\beta = \delta_{\alpha\beta}, \tag{2.48}$$

and $z$ is a column vector of real unknown parameters. $\rho_z$ is Hermitian and $\operatorname{tr} \rho_z = 1$ by construction, and the density matrix describes a physical quantum state only if $\rho_z$ is positive-semidefinite [11]. Measurements can often be modeled as [11]

$$y = A z + y_0, \tag{2.49}$$

where $y$ is a column vector, $A$ is a known measurement matrix, and $y_0$ is a zero-mean noise vector. The main difficulty with the Bayesian estimation protocol [10, 13, 15] is that, owing to the physical-state requirement, the prior for $z$ is highly non-Gaussian, while the statistics of $y$ may also be non-Gaussian. With the non-Gaussian statistics and $d$ scaling exponentially with the degrees of freedom, exact Bayesian estimation of $z$ would suffer from the curse of dimensionality. Existing approximation techniques include Gaussian approximations [13] and particle filters [15], but their actual estimation errors remain unclear.

The Volterra filters can be used despite the non-Gaussianity of $z$ or $y$. Let

$$x = B z \tag{2.50}$$

be a column vector of parameters to be estimated for a given sampling matrix $B$. Note that $B$ can be a non-square matrix and the number of elements in $x$ can be much smaller than that in $z$ if the dimensionality of the latter is a concern. For example, the fidelity between the density matrix and a target pure state [60] can be expressed in this way, in which case $B$ is a row vector and $x$ is a scalar.
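The fidelity example can be checked numerically for a single qubit. The sketch below assumes $d = 2$ with the Pauli matrices divided by $\sqrt{2}$ as the basis $E_\alpha$, a target state $|0\rangle$, and an arbitrary parameter vector; these choices and the function names are illustrative only. With these conventions the fidelity equals $Bz$ plus the constant offset $1/d$, where $B(\alpha) = \langle\psi|E_\alpha|\psi\rangle$.

```python
import numpy as np

d = 2
# Orthonormal, traceless, Hermitian basis for d = 2: Pauli matrices / sqrt(2)
pauli = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
E = [s / np.sqrt(2) for s in pauli]


def density_matrix(z):
    """rho_z of Eq. (2.47) for d = 2."""
    return np.eye(d) / d + sum(z_a * E_a for z_a, E_a in zip(z, E))


def fidelity_row(psi):
    """Row vector B with B(alpha) = <psi|E_alpha|psi>."""
    return np.array([np.vdot(psi, E_a @ psi).real for E_a in E])


psi = np.array([1, 0], dtype=complex)   # hypothetical target pure state
z = np.array([0.1, -0.2, 0.3])          # hypothetical parameter vector
B = fidelity_row(psi)

fidelity_direct = np.vdot(psi, density_matrix(z) @ psi).real
fidelity_linear = 1 / d + B @ z         # x = B z plus the constant 1/d
print(fidelity_direct, fidelity_linear)
```

Since the offset $1/d$ is known, estimating the scalar $x = Bz$ is equivalent to estimating the fidelity.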
The optimal first-order filter can be expressed as

$$\check{x} = B \langle z \rangle + \tilde{h}_1 \left( y - A \langle z \rangle \right), \tag{2.51}$$

$$\tilde{h}_1 = B C_z A^\top \left( A C_z A^\top + C_{y_0} \right)^{-1}, \tag{2.52}$$

$$C_z \equiv \left\langle z z^\top \right\rangle - \langle z \rangle \langle z \rangle^\top. \tag{2.53}$$

The filter is guaranteed to offer an error covariance matrix given by

$$\tilde{\Sigma}^{(1)} = B \left( C_z^{-1} + A^\top C_{y_0}^{-1} A \right)^{-1} B^\top. \tag{2.54}$$

The linear complexity and the error guarantee are the main advantages of the Volterra filter. A shortcoming is that, due to noise and the lack of a constraint in the algorithm, the estimate $\check{x}$ may not lead to a positive-semidefinite density matrix. If this is a problem, an obvious remedy is to find the physical $x$ closest to $\check{x}$ with respect to a distance measure. A more sophisticated way is to compute the posterior distribution over a region near $\check{x}$ with a volume suggested by $\tilde{\Sigma}^{(1)}$. If the noise is low enough or the number of trials is large enough such that $\tilde{\Sigma}^{(1)}$ is small, the region needs to cover a small parameter subspace only, and the curse of dimensionality can be avoided.

The remaining issue is the choice of prior $\langle z \rangle$ and $C_z$ in an objective manner. One option is to take one of the commonly used objective priors for $z$ [11, 15] and compute its moments. For $d = 2$ and $z$ being the Bloch vector, the prior moments can be easily calculated by taking advantage of the Bloch spherical symmetry. The computation seems nontrivial for $d \ge 3$, but for each $d$ it needs to be done just once and for all.

The most conservative and arguably paranoid option is to choose a prior that is least favorable to the Volterra filter. Given a prior probability measure $\pi_z$ on $z$, one can define a risk function, such as the Hilbert-Schmidt distance given by

$$R(\pi_z) = \operatorname{tr} \tilde{\Sigma}^{(1)}(\pi_z). \tag{2.55}$$

Then the least favorable prior is one that maximizes the risk while still observing the physical constraint on $\rho_z$, that is,

$$\arg\max_{\pi_z;\ \rho_z \ge 0} R(\pi_z). \tag{2.56}$$

Note that this prior depends in general on the measurement matrix $A$ as well as the sampling matrix $B$. Without the physical constraint, the least favorable $C_z$ would be infinite, giving

$$\tilde{\Sigma}^{(1)} \le B \left( A^\top C_{y_0}^{-1} A \right)^{-1} B^\top, \tag{2.57}$$

and the Volterra filter would become equivalent to the unconstrained maximum-likelihood estimator for Gaussian $y$. The effect of a finite $C_z$ is to pull the estimate from the maximum-likelihood value towards the prior $\langle x \rangle$ via the weighted average given by Eq. (2.51).

III. QUANTUM DETECTION

A. Formalism

Assume two hypotheses denoted by $\mathcal{H}_0$ and $\mathcal{H}_1$. These hypotheses can be about the initial density operator as well as the dynamics and measurements of the quantum system [18]. As before, let the measured Heisenberg-picture observables be $y$ with commuting elements under both hypotheses. The goal of detection is equivalent to binary hypothesis testing, which is to make a decision on $\mathcal{H}_0$ or $\mathcal{H}_1$ based on $y$. Applications include force detection [8, 18, 44], fundamental tests of quantum mechanics [18, 19, 38, 61], quantum error correction [1, 29], and qubit readout [20–22]. Prior work on the use of Volterra filters for classical detection focuses on the heuristic deflection criterion [62, 63], but it does not seem to have any decision-theoretic meaning or relationship with the more rigorous criteria of error probabilities [63]. Here I propose a similar performance criterion that is able to provide an upper bound on the average error probability, while still offering a simple design rule for the Volterra filters. To my knowledge the proposed design rule is new also in the context of classical detection theory.

Let $\lambda(y)$ be a test statistic as a polynomial function of $y$ similar to Eq. (2.3). For later notational convenience, I will rewrite it as

$$\lambda(y) = h_0 + H^\top Y, \tag{3.1}$$

where the zeroth-order term $h_0$ is written separately, $Y$ is a column vector with the elements in $\{ y, y^{\otimes 2}, \ldots, y^{\otimes P} \}$ without the constant term 1, $H$ is a column vector with the corresponding elements in $h^{(P)}$, and $\top$ denotes the transpose. Let $\langle f(y) \rangle_0$ be the expectation of a function of $y$ given hypothesis $\mathcal{H}_0$, and $\langle f(y) \rangle_1$ be the expectation given hypothesis $\mathcal{H}_1$. Note that the hypotheses can be about the initial density operator, the dynamics, and the definition of $y$.

I demand the test statistic to have different expectations for the two hypotheses, viz.,

$$\langle \lambda \rangle_0 \neq \langle \lambda \rangle_1. \tag{3.2}$$

This means that the order $P$ cannot be arbitrary but must be high enough to result in different expectations. I further demand the expectations to be symmetric around 0, viz.,

$$\langle \lambda \rangle_0 + \langle \lambda \rangle_1 = 0. \tag{3.3}$$

This is accomplished by setting

$$h_0 = -H^\top \bar{Y}, \tag{3.4}$$

$$\bar{Y} \equiv \frac{1}{2} \left( \langle Y \rangle_0 + \langle Y \rangle_1 \right), \tag{3.5}$$

resulting in

$$\langle \lambda \rangle_1 = -\langle \lambda \rangle_0 = H^\top \Delta, \tag{3.6}$$

$$\Delta \equiv \frac{1}{2} \left( \langle Y \rangle_1 - \langle Y \rangle_0 \right). \tag{3.7}$$

Without loss of generality, I assume $\langle \lambda \rangle_1 = H^\top \Delta > 0$, and the decision rule chooses $\mathcal{H}_0$ if $\lambda < 0$ and $\mathcal{H}_1$ if $\lambda \ge 0$. This is commonly expressed as [55]

$$\lambda(y) \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} 0. \tag{3.8}$$

The average error probability becomes

$$P_e(H) = \pi_0 \left\langle 1_{\lambda \ge 0}(y) \right\rangle_0 + \pi_1 \left\langle 1_{\lambda < 0}(y) \right\rangle_1, \tag{3.9}$$

where $\pi_0$ and $\pi_1$ are the prior probabilities for the hypotheses and $1_{\lambda \ge 0}$ and $1_{\lambda < 0}$ are indicator functions. Since $P_e(H)$ in general depends on infinite orders of $\lambda$ moments, I appeal to the Cantelli inequality [64] to obtain

$$\left\langle 1_{\lambda \ge 0}(y) \right\rangle_0 \le \frac{\left\langle \lambda^2 \right\rangle_0 - \langle \lambda \rangle_0^2}{\left\langle \lambda^2 \right\rangle_0}, \tag{3.10}$$

and similarly for $\left\langle 1_{\lambda < 0}(y) \right\rangle_1$. This leads to upper bounds on $P_e$ given by

$$P_e(H) \le Q(H) \le R(H), \tag{3.11}$$

$$Q(H) \equiv \frac{\pi_0}{1 + (H^\top \Delta)^2 / (H^\top C_0 H)} + \frac{\pi_1}{1 + (H^\top \Delta)^2 / (H^\top C_1 H)}, \tag{3.12}$$

$$R(H) \equiv \frac{H^\top (\pi_0 C_0 + \pi_1 C_1) H}{(H^\top \Delta)^2}, \tag{3.13}$$

where

$$C_0 \equiv \left\langle Y Y^\top \right\rangle_0 - \langle Y \rangle_0 \langle Y \rangle_0^\top, \tag{3.14}$$

$$C_1 \equiv \left\langle Y Y^\top \right\rangle_1 - \langle Y \rangle_1 \langle Y \rangle_1^\top \tag{3.15}$$

are the conditional covariance matrices. $1/R$ can be regarded as an output signal-to-noise ratio and has a similar form to the deflection criterion [62, 63], although $R$ has a clearer decision-theoretic meaning as an upper error bound.

The purpose of using $R$ rather than $P_e$ or $Q$ is to define an easy-to-optimize criterion in terms of finite-order correlations. To find the $R$-optimal filter, consider the Cauchy-Schwarz inequality

$$\left( H^\top \Delta \right)^2 \le \left( H^\top M H \right) \left( \Delta^\top M^{-1} \Delta \right) \tag{3.16}$$

for any positive-definite matrix $M$.
The inequality is saturated if and only if $H = \alpha M^{-1} \Delta$ for any constant $\alpha$. Setting $M = \pi_0 C_0 + \pi_1 C_1$, I obtain

$$\tilde{R} \equiv \min_H R(H) = \frac{1}{\Delta^\top (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta}, \tag{3.17}$$

$$\tilde{H} \equiv \arg\min_H R(H) = \alpha (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta, \tag{3.18}$$

and the $R$-optimal test statistic $\tilde{\lambda}(y) \equiv h_0 + \tilde{H}^\top Y$, taking $\alpha = 1$ without loss of generality, becomes

$$\tilde{\lambda}(y) = \Delta^\top (\pi_0 C_0 + \pi_1 C_1)^{-1} \left( Y - \bar{Y} \right), \tag{3.19}$$

which can then be used in a threshold test. The merits of this approach are similar to those in the estimation scenario: dependence of $\tilde{\lambda}(y)$ on finite-order correlations $\Delta$, $C_0$, and $C_1$ without relying on a Markovian model, a performance guaranteed by upper bounds $P_e(\tilde{H}) \le Q(\tilde{H}) \le \tilde{R}$ (the actual $P_e$ may be much lower), and a hierarchy of decreasing $\tilde{R}$ versus increasing complexity. For the study of fundamental quantum metrology, $P_e(\tilde{H})$, $Q(\tilde{H})$, and $\tilde{R}$ also provide a set of upper bounds on the Helstrom bound [6, 7, 44].

It is not difficult to show that, if the hypotheses are about the mean of a Gaussian $y$ and $C_0 = C_1$, $\tilde{\lambda}(y)$ for $P = 1$ coincides with the well-known matched filter, and the threshold test of $\tilde{\lambda}(y)$ against 0 leads to the optimal $P_e$ among all decision rules if $\pi_0 = \pi_1$ [55]. The derivation of the $R$-optimal Volterra filter here in fact resembles the historic derivation of the linear matched filter via maximizing an output signal-to-noise ratio [55]. The crucial differences are that here $\tilde{\lambda}(y)$ can include higher-order products of $y$ elements and the upper error bounds provide performance guarantees even for non-Gaussian statistics.

B. Qubit readout

For an application of the detection theory, consider the qubit readout problem described in Refs. [20–22]. The goal is to infer, from noisy measurements, which of two possible initial states the qubit was in. The two hypotheses can be modeled as

$$H_0: y(t_k) = y_0(t_k), \qquad H_1: y(t_k) = S x(t_k) + y_0(t_k), \tag{3.20}$$

where $x$ is a hidden qubit observable that can undergo spontaneous decay or excitation in time, $S$ is a positive signal amplitude, and $y_0$ is a zero-mean noise process. To perform hypothesis testing given a record of $y$, consider the first-order $R$-optimal decision rule given by

$$\tilde H = (\pi_0 C_0 + \pi_1 C_1)^{-1} \Delta, \tag{3.21}$$

$$\tilde\lambda(y) = \tilde H^\top (y - \bar y) \underset{H_0}{\overset{H_1}{\gtrless}} 0, \tag{3.22}$$

where

$$\Delta = \frac{1}{2} \left( \langle y \rangle_1 - \langle y \rangle_0 \right), \tag{3.23}$$

$$\bar y = \frac{1}{2} \left( \langle y \rangle_1 + \langle y \rangle_0 \right), \tag{3.24}$$

$$C_0 = \langle y y^\top \rangle_0 - \langle y \rangle_0 \langle y \rangle_0^\top, \tag{3.25}$$

$$C_1 = \langle y y^\top \rangle_1 - \langle y \rangle_1 \langle y \rangle_1^\top, \tag{3.26}$$

and the upper error bounds are given by Eqs. (3.11)–(3.13).

$\tilde\lambda(y)$ for $P = 1$ is a linear filter with respect to $y$ and similar to the linear filters proposed in Ref. [20]. An advantage of the $R$-optimal rule here is that the filter $\tilde H$ depends only on the first-order moments $\Delta(k)$ and $\bar y(k)$ and the second-order correlations $C_0$ and $C_1$. All these moments can be simulated or measured directly in an experiment without the assumptions of continuous time, white Gaussian noise, and uncorrelated signal and noise made in prior work. The calculation of $\tilde H$ is relatively straightforward compared with the numerical optimization procedure in Ref. [20], while $Q$ and $\tilde R$ provide theoretical performance guarantees.
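To make the first-order rule of Eqs. (3.21)–(3.26) concrete, the following is a minimal Python sketch, not the author's implementation. It assumes the conditional means and covariances are already available (simulated or measured), and the function names `solve`, `quad_form`, `r_optimal_rule`, and `decide`, as well as the toy moments at the bottom, are hypothetical choices made for illustration. A small dependency-free linear solver is included; in practice a library routine such as NumPy's `numpy.linalg.solve` would be the usual choice.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def quad_form(h, c):
    """Return the quadratic form h^T c h."""
    n = len(h)
    return sum(h[i] * c[i][j] * h[j] for i in range(n) for j in range(n))


def r_optimal_rule(mean0, mean1, cov0, cov1, pi0=0.5, pi1=0.5):
    """First-order R-optimal filter and its error bounds from low-order moments."""
    n = len(mean0)
    delta = [(mean1[k] - mean0[k]) / 2 for k in range(n)]    # Eq. (3.23)
    ybar = [(mean1[k] + mean0[k]) / 2 for k in range(n)]     # Eq. (3.24)
    mixed = [[pi0 * cov0[i][j] + pi1 * cov1[i][j] for j in range(n)]
             for i in range(n)]
    h = solve(mixed, delta)                                   # Eq. (3.21)
    signal = sum(h[k] * delta[k] for k in range(n))           # H^T Delta
    q_bound = (pi0 / (1 + signal**2 / quad_form(h, cov0))
               + pi1 / (1 + signal**2 / quad_form(h, cov1)))  # Eq. (3.12)
    r_bound = quad_form(h, mixed) / signal**2                 # Eq. (3.13)
    return h, ybar, q_bound, r_bound


def decide(y, h, ybar):
    """Threshold test of Eq. (3.22): return 1 for H1 and 0 for H0."""
    stat = sum(hk * (yk - yb) for hk, yk, yb in zip(h, y, ybar))
    return 1 if stat >= 0 else 0


# Toy moments (illustrative values only, not taken from the paper).
identity = [[1.0, 0.0], [0.0, 1.0]]
h, ybar, q_bound, r_bound = r_optimal_rule([0.0, 0.0], [2.0, 2.0],
                                           identity, identity)
print(h, q_bound, r_bound)
```

The sketch assumes the two conditional means differ, so that $H^\top \Delta \neq 0$; otherwise the bounds are undefined. Replacing the toy moments with sample averages over recorded traces would give a data-driven version of the same rule.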
The upper bounds may be conservative, and a more precise comparison of $P_e(\tilde H)$ with other linear or nonlinear filters [20–22] will require further numerical simulations and experimental tests.

To proceed further, consider the continuous-time limit. For the two-level $x \in \{0, 1\}$ process with initial value $x(0) = 1$ and spontaneous decay time $T_1$ studied in Refs. [20, 22], it is not difficult [65] to show that the mean is

$$\langle x(t) \rangle = \exp\left( -\frac{t}{T_1} \right), \tag{3.27}$$

and the covariance function is

$$C_x(t, \tau) \equiv \langle x(t) x(\tau) \rangle - \langle x(t) \rangle \langle x(\tau) \rangle \tag{3.28}$$

$$= \exp\left[ -\frac{\max(t, \tau)}{T_1} \right] - \exp\left( -\frac{t + \tau}{T_1} \right). \tag{3.29}$$

For a zero-mean white Gaussian noise with noise power $\Pi$,

$$\langle y_0(t) y_0(\tau) \rangle = \Pi \delta(t - \tau). \tag{3.30}$$

The test statistic becomes

$$\tilde\lambda = \int_0^T dt\, \tilde h(t) \left[ y(t) - \frac{S}{2} \langle x(t) \rangle \right], \tag{3.31}$$

and a continuous-time limit of Eq. (3.21) leads to a Fredholm integral equation of the second kind [55] given by

$$\frac{S}{2} \langle x(t) \rangle = \Pi \tilde h(t) + \pi_1 S^2 \int_0^T d\tau\, C_x(t, \tau) \tilde h(\tau). \tag{3.32}$$

Further analytic simplifications may be possible for $T \to \infty$ using the Laplace transform, but a numerical solution of the Fredholm equation can easily be sought, as it is linear with respect to $\tilde h$ and can be inverted in discrete time using, for example, the mldivide function in Matlab.

Define the input signal-to-noise ratio (SNR) as $S^2 T_1 / \Pi$. Fig. 1 plots some numerical examples of the filter for $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$. The Matlab computation of all the filters shown with $\delta t = 0.[\ldots]\, T_1$ takes seconds to complete on a desktop PC. Fig. 2 plots the upper error bounds versus the input SNR. The upper bounds turn out to be conservative here, as a numerical investigation of $P_e$ later will demonstrate.

[Figure 1: plot titled "Filter shapes"; horizontal axis $t/T_1$, vertical axis $2\Pi \tilde h(t)/S$ (log scale).]

FIG. 1. (Color online). The normalized $R$-optimal filters $2\Pi \tilde h(t)/S$ in log scale versus normalized time $t/T_1$ for different input SNR $\equiv S^2 T_1 / \Pi = 1, [\ldots], [\ldots], \ldots, [\ldots]$. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. The different plots can be distinguished by the reducing correlation times for increasing SNR.

FIG. 2. (Color online). Upper bounds $Q(\tilde H)$ and $\tilde R$ on the average error probability $P_e$ for the first-order Volterra filter versus input SNR from 10 dB to 30 dB in log-log scale. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. $P_e$ is guaranteed to be in the shaded region below the curves.

The proposed decision rule can be compared with the optimal likelihood-ratio test (LRT) [55]. For the given problem, there exists an analytic expression for the log-likelihood ratio given by [22]

$$\lambda_o(y) = \frac{S}{\Pi} \int_0^T d\eta(t)\, \check{x}(t) - \frac{S^2}{2\Pi} \int_0^T dt\, \check{x}^2(t), \tag{3.33}$$

$$d\eta(t) = y(t)\, dt, \tag{3.34}$$

$$\check{x}(t) = \frac{p_1(t)}{p_1(t) + p_0(t)}, \tag{3.35}$$

$$p_1(t) = \exp\left[ \frac{S}{\Pi} \int_0^t d\eta(\tau) - t \left( \frac{S^2}{2\Pi} + \frac{1}{T_1} \right) \right], \tag{3.36}$$

$$p_0(t) = \frac{1}{T_1} \int_0^t d\tau\, p_1(\tau), \tag{3.37}$$

where the $d\eta$ integrals are in the Itō sense. The optimal decision rule is thus

$$\lambda_o(y) \underset{H_0}{\overset{H_1}{\gtrless}} \ln \frac{\pi_0}{\pi_1}. \tag{3.38}$$

Although the LRT will achieve the lowest $P_e$, the highly nonlinear dependence of $\lambda_o$ on $y$ makes its exact implementation difficult in real-time applications or for a large number of qubits.

The average error probabilities for both the $R$-optimal rule and the LRT are estimated numerically using Monte Carlo simulations and plotted in Fig. 3. The errors are close at lower input SNR values. Considering the simplicity of the $R$-optimal rule, the divergence at higher SNR is expected and indeed slight. At the input SNR of $10^{[\ldots]}$, $P_e$ for the LRT is $6.[\ldots] \times 10^{-[\ldots]}$, while that for the $R$-optimal rule is only around a factor of 2 higher at $1.[\ldots] \times 10^{-[\ldots]}$. A further optimization of $P_e$ beyond the results shown in Fig. 3 can be done by fine-tuning the threshold of the $R$-optimal rule. For example, a numerical search for the optimal threshold brings its error probability at input SNR $= 10^{[\ldots]}$ down to $8.[\ldots] \times 10^{-[\ldots]}$. A higher-order filter is hardly necessary for the SNRs considered here.

[Figure 3: plot titled "Numerical error probabilities"; horizontal axis: input SNR; vertical axis: average error probability $P_e$; curves: $Q(\tilde H)$, $\tilde R$, LRT, $R$-optimal.]

FIG. 3. (Color online). Numerically computed average error probabilities $P_e$ for the $R$-optimal rule and the likelihood-ratio test (LRT) versus the input SNR from 10 dB to 30 dB in log-log scale. $\pi_0 = \pi_1 = 1/2$ and $T = 5 T_1$ are assumed. Also shown are parts of the upper bounds $Q(\tilde H)$ and $\tilde R$ for comparison.

The upper bounds depend only on low-order moments and apply equally to all problems with the same low-order moments, regardless of their higher-order statistics. It is not surprising that such indiscriminate bounds are loose for this particular example, as shown in Fig. 3. What is surprising is the near-optimal performance of a decision rule based on a loose upper bound.
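Returning to the Fredholm equation (3.32), its discrete-time inversion can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Matlab code used for the figures: the rectangle-rule grid $t_k = k\,\delta t$, the default parameter values (including the step size), the unit choice $\Pi = 1$, and the names `solve` and `qubit_r_optimal_filter` are all hypothetical. The mean and covariance come from Eqs. (3.27) and (3.29), and the bound is evaluated as $\tilde R = 1 / \int_0^T dt\, \Delta(t) \tilde h(t)$ with $\Delta(t) = S \langle x(t) \rangle / 2$, the continuous-time form of Eq. (3.17).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def qubit_r_optimal_filter(snr, t1=1.0, t_obs=5.0, n=100, pi1=0.5,
                           noise_power=1.0):
    """Rectangle-rule discretization of Eq. (3.32) on the grid t_k = k * dt."""
    s = math.sqrt(snr * noise_power / t1)      # from SNR = S^2 T_1 / Pi
    dt = t_obs / n
    t = [k * dt for k in range(n)]
    mean_x = [math.exp(-u / t1) for u in t]                        # Eq. (3.27)
    cov_x = [[math.exp(-max(u, v) / t1) - math.exp(-(u + v) / t1)  # Eq. (3.29)
              for v in t] for u in t]
    # Discretized Eq. (3.32): (Pi * I + pi1 * S^2 * dt * C_x) h = (S / 2) <x>
    lhs = [[(noise_power if i == j else 0.0) + pi1 * s**2 * dt * cov_x[i][j]
            for j in range(n)] for i in range(n)]
    rhs = [0.5 * s * m for m in mean_x]
    h = solve(lhs, rhs)
    # Upper bound R~ = 1 / integral of Delta(t) h(t) dt
    r_tilde = 1.0 / (dt * sum(d * hk for d, hk in zip(rhs, h)))
    return t, h, r_tilde, lhs, rhs


# Example: input SNR of 20 dB with T = 5 T_1 and equal priors.
t, h, r_tilde, _, _ = qubit_r_optimal_filter(snr=100.0)
print(r_tilde)
```

Because $C_x(0, \tau) = 0$, the solution satisfies $2\Pi \tilde h(0)/S = 1$, which matches the normalization plotted in Fig. 1 and gives a quick sanity check on any discretization.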
The log-likelihood ratio is given analytically for the problem considered here, so one may compare it with the $R$-optimal test statistic to see how the two resemble each other. In general, however, the log-likelihood ratio is difficult or even impossible to compute if the full probability models are more complicated or simply unidentified. The $R$-optimal rule requires only low-order moments to be known, and is hence more convenient to implement in practice.

IV. CONCLUSION

I have proposed the use of Volterra filters for quantum estimation and detection. The importance of the proposal lies in its promise to solve many of the practical problems associated with existing optimal quantum inference techniques, including the curse of dimensionality, the lack of performance assurances upon approximations, and the need for a Markovian model. Beyond the examples of quantum state tomography and qubit readout discussed in this paper, diverse applications in quantum information processing [1, 38, 51], including cooling [26], squeezing [27], state preparation [28], metrology [6–8, 16–18, 42–44], fundamental tests of quantum mechanics [18, 19, 38, 61], and error correction [29, 30], are expected to benefit. Potential extensions of the theory include adaptive, recursive, and coherent generalizations for feedback control [1] and noise cancellation [66], filter training via machine learning [67], robustness analysis, the use of other performance criteria for improved robustness [68] or multi-hypothesis testing [18, 19], a connection with Shannon information theory through the relations between filtering errors and entropic information [69], and a study of fundamental uncertainty relations in conjunction with quantum lower error bounds [6, 7, 16, 42–44].

ACKNOWLEDGMENTS

This work is supported by the Singapore National Research Foundation under NRF Grant No. NRF-NRFF2011-07.

[1] H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge University Press, Cambridge, 2010).
[2] C. W. Gardiner and P. Zoller, Quantum Noise (Springer-Verlag, Berlin, 2004).
[3] K. Jacobs, Quantum Measurement Theory and its Applications (Cambridge University Press, Cambridge, 2014).
[4] L. Bouten, R. Van Handel, and M. James, SIAM Journal on Control and Optimization , 2199 (2007); L. Bouten, R. van Handel, and M. R. James, SIAM Review , 239 (2009).
[5] S. Haroche and J. M. Raimond, Exploring the Quantum: Atoms, Cavities, and Photons (Oxford University Press, Oxford, 2006).
[6] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976).
[7] A. S. Holevo, Statistical Structure of Quantum Theory (Springer-Verlag, Berlin, 2001).
[8] V. B. Braginsky and F. Y. Khalili, Quantum Measurement (Cambridge University Press, Cambridge, 1992).
[9] M. G. A. Paris and J. Řeháček, eds., Quantum State Estimation (Springer-Verlag, Berlin, 2004).
[10] R. Blume-Kohout, New Journal of Physics , 043034 (2010); C. Ferrie, New Journal of Physics , 093035 (2014).
[11] C. A. Riofrío, Continuous Measurement Quantum State Tomography of Atomic Ensembles, Ph.D. thesis, University of New Mexico, Albuquerque (2014).
[12] R. L. Cook, C. A. Riofrío, and I. H. Deutsch, Phys. Rev. A , 032113 (2014).
[13] K. M. R. Audenaert and S. Scheel, New Journal of Physics , 023028 (2009).
[14] P. Six, P. Campagne-Ibarcq, I. Dotsenko, A. Sarlette, B. Huard, and P. Rouchon, ArXiv e-prints (2015), arXiv:1510.01726 [quant-ph].
[15] C. Granade, J. Combes, and D. G. Cory, ArXiv e-prints (2015), arXiv:1509.03770 [quant-ph].
[16] V. Giovannetti, S. Lloyd, and L. Maccone, Nature Photon. , 222 (2011).
[17] M. Tsang, Phys. Rev. Lett. , 250403 (2009); Phys. Rev. A , 033840 (2009); Phys. Rev. A , 013824 (2010); S. Gammelmark, B. Julsgaard, and K. Mølmer, Phys. Rev. Lett. , 160401 (2013); I. Guevara and H. Wiseman, ArXiv e-prints (2015), arXiv:1503.02799 [quant-ph].
[18] M. Tsang, Phys. Rev. Lett. , 170502 (2012).
[19] M. Tsang, Quantum Meas. Quantum Metr. , 84 (2013).
[20] J. Gambetta, W. A. Braff, A. Wallraff, S. M. Girvin, and R. J. Schoelkopf, Phys. Rev. A , 012325 (2007).
[21] B. D'Anjou and W. A. Coish, Phys. Rev. A , 012313 (2014); B. D'Anjou, L. Kuret, L. Childress, and W. A. Coish, ArXiv e-prints (2015), arXiv:1507.06846 [quant-ph].
[22] S. Ng and M. Tsang, Phys. Rev. A , 022325 (2014).
[23] T. A. Wheatley, D. W. Berry, H. Yonezawa, D. Nakane, H. Arao, D. T. Pope, T. C. Ralph, H. M. Wiseman, A. Furusawa, and E. H. Huntington, Phys. Rev. Lett. , 093601 (2010); H. Yonezawa, D. Nakane, T. A. Wheatley, K. Iwasawa, S. Takeda, H. Arao, K. Ohki, K. Tsumura, D. W. Berry, T. C. Ralph, H. M. Wiseman, E. H. Huntington, and A. Furusawa, Science , 1514 (2012); K. Iwasawa, K. Makino, H. Yonezawa, M. Tsang, A. Davidovic, E. Huntington, and A. Furusawa, Phys. Rev. Lett. , 163602 (2013).
[24] V. P. Belavkin, Physics Letters A , 355 (1989).
[25] V. P. Belavkin, ArXiv Mathematical Physics e-prints (2007), arXiv:math-ph/0702079, and references therein.
[26] D. A. Steck, K. Jacobs, H. Mabuchi, T. Bhattacharya, and S. Habib, Phys. Rev. Lett. , 223004 (2004); D. A. Steck, K. Jacobs, H. Mabuchi, S. Habib, and T. Bhattacharya, Phys. Rev. A , 012322 (2006).
[27] L. K. Thomsen, S. Mancini, and H. M. Wiseman, Phys. Rev. A , 061801 (2002).
[28] M. Yanagisawa, Phys. Rev. Lett. , 190201 (2006); A. Negretti, U. V. Poulsen, and K. Mølmer, Phys. Rev. Lett. , 223601 (2007).
[29] C. Ahn, A. C. Doherty, and A. J. Landahl, Phys. Rev. A , 042301 (2002); M. Sarovar, C. Ahn, K. Jacobs, and G. J. Milburn, Phys. Rev. A , 052324 (2004).
[30] B. A. Chase, A. J. Landahl, and J. M. Geremia, Phys. Rev. A , 032304 (2008).
[31] D. B. Hume, T. Rosenband, and D. J. Wineland, Phys. Rev. Lett. , 120502 (2007).
[32] C. Sayrin, I. Dotsenko, X. Zhou, B. Peaudecerf, T. Rybarczyk, S. Gleyzes, P. Rouchon, M. Mirrahimi, H. Amini, M. Brune, J.-M. Raimond, and S. Haroche, Nature (London) , 73 (2011).
[33] W. Wieczorek, S. G. Hofer, J. Hoelscher-Obermaier, R. Riedinger, K. Hammerer, and M. Aspelmeyer, Phys. Rev. Lett. , 223601 (2015).
[34] I. G. Vladimirov and I. R. Petersen, ArXiv e-prints (2012), arXiv:1202.0946 [quant-ph].
[35] H. Amini, R. A. Somaraju, I. Dotsenko, C. Sayrin, M. Mirrahimi, and P. Rouchon, Automatica , 2683 (2013).
[36] M. R. Hush, S. S. Szigeti, A. R. R. Carvalho, and J. J. Hope, New Journal of Physics , 113060 (2013).
[37] A. E. B. Nielsen, A. S. Hopkins, and H. Mabuchi, New Journal of Physics , 105043 (2009).
[38] M. Aspelmeyer, T. J. Kippenberg, and F. Marquardt, Rev. Mod. Phys. , 1391 (2014).
[39] I. Bloch, J. Dalibard, and W. Zwerger, Rev. Mod. Phys. , 885 (2008).
[40] A. A. Houck, H. E. Tureci, and J. Koch, Nature Phys. , 292 (2012).
[41] V. J. Mathews and G. L. Sicuranza, Polynomial Signal Processing (Wiley, New York, 2000).
[42] M. Tsang, H. M. Wiseman, and C. M. Caves, Phys. Rev. Lett. , 090401 (2011).
[43] M. Tsang, Phys. Rev. Lett. , 230401 (2012); D. W. Berry, M. Tsang, M. J. W. Hall, and H. M. Wiseman, Phys. Rev. X , 031018 (2015).
[44] M. Tsang and R. Nair, Phys. Rev. A , 042115 (2012); M. Tsang, New Journal of Physics , 073005 (2013).
[45] J. Zhang, Y.-X. Liu, R.-B. Wu, K. Jacobs, S. Kaya Ozdemir, L. Yang, T.-J. Tarn, and F. Nori, ArXiv e-prints (2014), arXiv:1407.8108 [quant-ph].
[46] V. P. Belavkin, Foundations of Physics , 685 (1994).
[47] Q. Gao, D. Dong, and I. R. Petersen, ArXiv e-prints (2015), arXiv:1504.06780 [math-ph].
[48] M. Ozawa, Phys. Rev. A , 042105 (2003).
[49] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics (Cambridge University Press, Cambridge, 1987); R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. , 865 (2009).
[50] A. J. Leggett and A. Garg, Phys. Rev. Lett. , 857 (1985); C. Emary, N. Lambert, and F. Nori, Reports on Progress in Physics , 016001 (2014).
[51] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2011).
[52] U. Weiss, Quantum Dissipative Systems (World Scientific, Singapore, 2008); S. Datta, Electronic Transport in Mesoscopic Systems (Cambridge University Press, Cambridge, 1995); H. Bruus and K. Flensberg, Introduction to Many-Body Quantum Theory in Condensed Matter Physics (Oxford University Press, Oxford, 2004).
[53] L. Mandel and E. Wolf, Optical Coherence and Quantum Optics (Cambridge University Press, Cambridge, 1995).
[54] J. O. Berger, Statistical Decision Theory and Bayesian Analysis (Springer-Verlag, New York, 1985).
[55] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. (John Wiley & Sons, New York, 2001).
[56] J. Stockton, M. Armen, and H. Mabuchi, J. Opt. Soc. Am. B , 3019 (2002).
[57] D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (Wiley, Hoboken, 2006).
[58] A. A. Clerk, M. H. Devoret, S. M. Girvin, F. Marquardt, and R. J. Schoelkopf, Rev. Mod. Phys. , 1155 (2010).
[59] H. Miao, Y. Ma, C. Zhao, and Y. Chen, ArXiv e-prints (2015), arXiv:1506.00117 [quant-ph].
[60] S. T. Flammia and Y.-K. Liu, Phys. Rev. Lett. , 230501 (2011).
[61] Y. Chen, Journal of Physics B: Atomic, Molecular and Optical Physics , 104001 (2013).
[62] B. Picinbono and P. Duvaut, IEEE Transactions on Information Theory , 1061 (1990).
[63] B. Picinbono, IEEE Transactions on Aerospace and Electronic Systems , 1072 (1995).
[64] P. Billingsley, Probability and Measure (Wiley, New York, 1995).
[65] C. W. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences (Springer, Berlin, 2010).
[66] M. Tsang and C. M. Caves, Phys. Rev. Lett. , 123601 (2010); Phys. Rev. X , 031016 (2012); N. Yamamoto, Phys. Rev. X , 041029 (2014).
[67] A. Hentschel and B. C. Sanders, Phys. Rev. Lett. , 063603 (2010); E. Magesan, J. M. Gambetta, A. D. Córcoles, and J. M. Chow, Phys. Rev. Lett. , 200501 (2015).
[68] M. R. James, Phys. Rev. A , 032108 (2004).
[69] A. Barchielli and G. Lupieri, "Information gain in quantum continual measurements," in Quantum Stochastics and Information, edited by V. P. Belavkin and M. Gu (World Scientific, Singapore, 2008) Chap. 15, pp. 325–345, arXiv:quant-ph/0612010; M. Tsang, in 2014 IEEE International Symposium on Information Theory (ISIT)