Power spectrum analysis with least-squares fitting: Amplitude bias and its elimination, with application to optical tweezers and atomic force microscope cantilevers
Simon F. Nørrelykke
Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany, and Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA.

Henrik Flyvbjerg
Department of Micro- and Nanotechnology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark.
(Dated: October 22, 2018)

Optical tweezers and AFM cantilevers are often calibrated by fitting their experimental power spectra of Brownian motion. We demonstrate here that if this is done with typical weighted least-squares methods, the result is a bias of relative size between −2/n and +1/n on the value of the fitted diffusion coefficient. Here n is the number of power spectra averaged over, so typical calibrations contain 10–20% bias. Both the sign and the size of the bias depend on the weighting scheme applied. Hence, so do length-scale calibrations based on the diffusion coefficient. The fitted value for the characteristic frequency is not affected by this bias. For the AFM, then, force measurements are not affected provided an independent length-scale calibration is available. For optical tweezers there is no such luck, since the spring constant is found as the ratio of the characteristic frequency and the diffusion coefficient.

We give analytical results for the weight-dependent bias for the wide class of systems whose dynamics is described by a linear (integro-)differential equation with additive noise, white or colored. Examples are optical tweezers with hydrodynamic self-interaction and aliasing, calibration of Ornstein-Uhlenbeck models in finance, models for cell migration in biology, etc. Because the bias takes the form of a simple multiplicative factor on the fitted amplitude (e.g. the diffusion coefficient), it is straightforward to remove, and the user will need minimal modifications to his or her favorite least-squares fitting programs.

Results are demonstrated and illustrated using synthetic data, so we can compare fits with known true values. We also fit some commonly occurring power spectra once and for all, in the sense that we give their parameter values and associated error bars as explicit functions of experimental power-spectral values.
PACS numbers:
Keywords: least-squares fitting, maximum likelihood estimation, power-spectral analysis, bias, optical tweezers, AFM, atomic force microscope, colored noise, gamma distribution, error-bars, goodness of fit
I. INTRODUCTION
Optical tweezers (OTs) are often calibrated by fitting the power spectrum of a trapped bead's Brownian motion with a Lorentzian [1–5]. Similarly, atomic force microscopes (AFMs) are sometimes calibrated by fitting the power spectrum of the cantilever's Brownian motion with the power spectrum of a damped harmonic oscillator [6–8]. These fits are routinely done by least-squares (LSQ) minimization, the premises of which are rarely satisfied in practice. Here we do Maximum Likelihood Estimation (MLE) of parameters, the premises of which are satisfied, and give the parameter values, with error bars, as explicit functions of the experimental power spectrum in an excellent approximation. These closed-formula results should be useful for on-line calibration, since that requires high-speed determination of parameters. They also demonstrate that typical calibrations using least-squares fitting contain 10–20% systematic errors, also known as bias. We find quite generally that both the sign and the size of the bias depend on the details of how a least-squares fit is implemented through the choice of how the data points are weighted. Analytical expressions for the bias are given, and examples of the behavior of the stochastic fit errors are examined numerically.

These results apply beyond the examples given here, since the same bias occurs in all systems described by a linear (integro-)differential equation with additive noise. Thus, using the correction factors given in Eqs. (32) and (41), it is possible to obtain correct, unbiased estimates for the fit parameters without performing a computationally expensive MLE. Instead, one simply adjusts the results of an ordinary least-squares fit.

We also show that the bias found for least-squares fits appears under fairly general conditions, independent of the details of the function that is fitted, but depending only on the distribution of the error on the data.
Analytical correction factors are given for the examples of Gaussian, exponential, and gamma-distributed errors. Finally, a general criterion is given for when one may expect a least-squares fit to be biased: when the experimental mean is correlated with the experimental variance. Conversely, if they are uncorrelated, the fit is unbiased.

The paper is organized as a main text followed by six appendices that contain proofs and other technical details.

II. DYNAMICS
The equation of motion for a massive particle moving in a harmonic potential under the influence of thermal forces is

m ẍ(t) + γ ẋ(t) + κ x(t) = F_therm(t) .   (1)

Here x(t) is the coordinate of the particle as a function of time t, m its inertial mass, γ its friction coefficient, κ is Hooke's constant, and F_therm is the thermal force on the particle. This force is random, and assumed to have white-noise statistical properties,

⟨F_therm(t)⟩ = 0 ,
⟨F_therm(t) F_therm(t′)⟩ = 2 k_B T γ δ(t − t′) , for all t, t′,   (2)

where δ is Dirac's delta function, k_B T the Boltzmann energy, and ⟨·⟩ is the expectation value with respect to the noise. This theory's power spectrum of thermal motion is derived in [9].

III. POWER SPECTRA
Optical tweezers should be calibrated using the hydrodynamically correct power spectrum given in [9, 10], possibly taking into account a number of effects listed there and further detailed in [11–13]. These effects include: (i) the frequency dependence of the friction coefficient, due to hydrodynamic self-coupling; (ii) dependence of the friction coefficient on distance to nearby surfaces, due to hydrodynamic coupling; (iii) extra 1/f power at low frequencies, caused by the laser pointing stability, hydrodynamic self-coupling, etc.; and (iv) optical interference effects, caused by a standing wave between the trapped object and the nearby microscope cover-slip surface, when determining the displacement sensitivity (volt to nanometer conversion factor). Generally, the most crucial step, where the largest systematic errors are likely to appear, is the determination of the displacement sensitivity [10]. However, as also described in [9], there are situations in which an acceptable approximation is achieved by fitting a Lorentzian,

P_f = (D/2π²) / (f_c² + f²) ,   (3)

to an average P̄^(ex)_f of, say, n two-sided experimental power spectra for the Brownian motion of a trapped microsphere. (Occasionally confusion arises over extra or missing factors of two in PSDs; this is down to the use of one-sided, f > 0, versus two-sided, f ≷ 0, PSDs. As long as the total power is contained in the chosen frequency range, the two approaches are equally correct.) The notation used is essentially the same as in [9], where the aliased Lorentzian is also derived, i.e., f_c = κ/(2πγ) and D = k_B T/γ.

Similarly, AFM cantilevers are sometimes calibrated by fitting a frequency interval around the resonance peak [8] using

P_f = (D/2π²) / [ (2πm/γ)² (f_0² − f²)² + f² ]   (4)
    = (D/2π²) / [ (Q/f_0)² (f_0² − f²)² + f² ] ,   (5)

to a similar average of experimental power spectra for a cantilever's Brownian motion. Here, the characteristic frequency f_0² = κ/(4π²m) and the quality factor Q = √(mκ)/γ. Describing the AFM cantilever as a simple harmonic oscillator is rigorously correct for high quality factors Q ≫ 1, e.g. in air where dissipative forces are small [8]. In water, this description also works, at least for short stiff cantilevers, but now the drag coefficient is frequency dependent [7]. It is not our aim here to review the expansive literature on AFM calibration; the only point we seek to make is that if the amplitude of the power spectrum is involved, systematic fitting errors are typically present, as shown below.

Both these theoretical power spectra follow from the Einstein-Ornstein-Uhlenbeck theory for the Brownian motion of a damped harmonic oscillator in one dimension; see Eq. (1) and [9], where the parameters in Eqs. (3) and (4) are also defined. In Appendix B we give the results for the aliased AFM power spectral density (PSD).

Least-squares fits of power spectra are biased, as shown below in Section V, irrespective of whether one applies the above simplified theory or a more complete one that takes into account aliasing, hydrodynamics, electronic filters, etc. This bias will be on the amplitude only and will not affect shape parameters, i.e., D will be biased, whereas f_c, f_0, and Q will not.

IV. STATISTICAL PROPERTIES OF EXPERIMENTAL POWER SPECTRUM
Here, and throughout the rest of the paper, we differentiate between theoretical expectation values, indicated by brackets, ⟨·⟩, and experimental averages, indicated by a bar. Following the notation of [9], we write the thermal force in Eq. (2) in terms of a normalized white-noise process η(t),

F_therm(t) = √(2 k_B T γ) η(t) ,   (6)

whose Fourier transform obeys

⟨η̃_k⟩ = 0 ;  ⟨η̃*_k η̃_ℓ⟩ = t_msr δ_{k,ℓ} ,   (7)

where k, ℓ = −N/2, …, N/2. Since η(t) is a white-noise process, hence temporally uncorrelated, Re η̃_k and Im η̃_k are two mutually independent random variables, uncorrelated for different k > 0, with Gaussian distributions by virtue of the Central Limit Theorem, or, equivalently, by virtue of η(t) being the first derivative of a Wiener process with respect to time. Consequently, the sum of their squares, |η̃_k|², is a non-negative random variable, uncorrelated for different k > 0, with exponential distribution [17]. Hence, so are the experimental values P^(ex)_f of the power spectrum for f = k/t_msr > 0.

The theory predicts not only the expectation value P_f for the experimental spectrum P^(ex)_f, as given in Eq. (4). It predicts also the distribution from which the experimental power spectrum is "drawn": the power-spectral value P^(ex)_f at each frequency f is an independent random number, drawn from an exponential distribution with expectation value P_f,

p(P^(ex)_f ; P_f) = (1/P_f) exp(−P^(ex)_f / P_f) .   (8)

Consequently,

⟨P^(ex)_f⟩ = P_f ,   (9)
σ(P^(ex)_f) = ⟨(P^(ex)_f − P_f)²⟩^{1/2} = P_f ,   (10)

and the signal-to-noise ratio ⟨P^(ex)_f⟩/σ(P^(ex)_f) equals one. This is why we average over n experimental spectra,

P̄^(ex)_f ≡ (1/n) Σ_{i=1}^n P^(ex)_{f,i} ,   (11)

before plotting and fitting: to reduce noise.

If the n spectra are statistically independent, as is the case if they are computed from data taken in non-overlapping time intervals, then we have, unchanged, that

⟨P̄^(ex)_f⟩ = P_f ,   (12)

but

σ(P̄^(ex)_f) = σ(P^(ex)_f)/√n = P_f/√n ,   (13)

and P̄^(ex)_f is distributed according to a distribution that is the convolution of n identical exponential distributions, viz. the gamma distribution:

p_n(P̄^(ex)_f ; P_f) = (P̄^(ex)_f)^{n−1} (n/P_f)^n / Γ(n) · exp(−n P̄^(ex)_f / P_f) ,   (14)

where n is the shape parameter, P_f/n the scale parameter, and Γ(n) = (n−1)!. The mean, P_f, is the product of the shape and the scale parameters, the mode is P_f (n−1)/n, the variance, P_f²/n, is the product of the mean and the scale, and the skewness is 2/√n.
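The exponential statistics of Eqs. (8)–(13) are easy to verify numerically. The sketch below is not from the paper; it assumes Gaussian white noise, NumPy's FFT conventions, and arbitrary illustrative values for N and n. It checks that a single periodogram has signal-to-noise ratio one, and that averaging n spectra improves it by √n.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4096, 16              # samples per record, number of averaged spectra

# n independent records of Gaussian white noise, one periodogram each
x = rng.standard_normal((n, N))
P = np.abs(np.fft.rfft(x, axis=1))**2 / N    # periodogram, arbitrary units
P = P[:, 1:-1]               # drop DC and Nyquist bins (different statistics)

# Single spectrum: exponential statistics, so <P>/sigma(P) = 1, Eq. (10)
snr1 = P[0].mean() / P[0].std()

# Average of n spectra: sigma drops by sqrt(n), Eq. (13)
Pbar = P.mean(axis=0)
snrn = Pbar.mean() / Pbar.std()

print(round(snr1, 2), round(snrn / np.sqrt(n), 2))   # both close to 1.0
```

Because the true spectrum of white noise is flat, all frequency bins are identically distributed and can be pooled to estimate the per-bin mean and standard deviation.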
From the known distribution it is easy to show that the q-th moment of P̄^(ex)_f is

⟨(P̄^(ex)_f)^q⟩ = Γ(n + q)/(n^q Γ(n)) · P_f^q ,   (15)

a result that we shall need later.

In the limit n → ∞ this distribution approaches a Gaussian by courtesy of the Central Limit Theorem, but it is a slow approach, because the starting point, the exponential distribution, is highly skewed. For moderate values of n, p_n(P̄^(ex)_f ; P_f) is far from Gaussian, see Fig. 1. It is also quite skewed, but this is not the source of the bias. The bias is caused by the correlation between the experimental mean, P̄^(ex)_f, and the experimental variance,

s_n²(P̄^(ex)_f) ≡ (1/(n−1)) Σ_{i=1}^n (P^(ex)_{f,i} − P̄^(ex)_f)² ,   (16)

as discussed in greater detail below. The latter is easily shown to satisfy ⟨s_n²(P̄^(ex)_f)⟩ = n σ²(P̄^(ex)_f) = P_f².

FIG. 1: The gamma distribution, Eq. (14), as a function of P̄^(ex)_f/P_f for various values of the shape parameter n: thick full line, n = 1; full line, n = 4; thin full line, n = 16. Also shown is the Gaussian limit value, plotted here with a variance of 1/n = 1/16 (dashed line).
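Eq. (15) can be spot-checked by Monte Carlo. A minimal sketch of our own (not part of the paper), assuming only that single-spectrum values are exponentially distributed with mean P_f, and using arbitrary values n = 16, P_f = 2:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
n, Pf = 16, 2.0

# many realizations of the n-spectrum average at a single frequency
Pbar = rng.exponential(Pf, size=(200_000, n)).mean(axis=1)

for q in (-2, -1, 1, 2):
    mc = np.mean(Pbar**q)                                # Monte Carlo moment
    theory = gamma(n + q) / (n**q * gamma(n)) * Pf**q    # Eq. (15)
    print(q, round(mc / theory, 3))                      # each ratio near 1.0
```

The negative moments q = −1, −2 are the ones that enter the bias factors derived in Section V.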
A. Ubiquity of the gamma-distribution
As a matter of fact, all systems described by a linear (integro-)differential equation with additive noise, be it white or colored, have power-spectral values that are described by the statistics in Eq. (14). To see that white and colored noise lead to the same statistics, all we need to realize is that the power spectral density of a colored noise is the product of the power spectral density of a white noise and some function F(f) that describes the color of the noise (a filter function): quite generally, if the dynamics of x is given by a polynomial Q in d/dt with constant coefficients,

Q(d/dt) x(t) = ψ(t) ,   (17)

where ψ is a colored noise, then the power spectrum

P^(ex)_f(x) = |ψ̃_f|² / |Q(−i2πf)|² = |η̃_f|² F(f) / |Q(−i2πf)|²   (18)

is seen to be the product of the power spectrum of a white noise η and some function of f. Thus, for a given frequency f, the statistics of the power-spectral value is determined by the statistics of white noise.

V. LEAST-SQUARE FITTING
As alluded to in the previous section, one effect of the exponential distribution of P^(ex)_{f,i} is that weighted least-squares fitting to P̄^(ex)_f does not follow the textbook behavior and returns a result that is systematically wrong, or biased. In the following we remind the reader of the rather narrow set of circumstances under which least-squares fitting works. We then relax the requirements and see what effect this has on the fit results.

Weighted least-squares (WLS) fitting minimizes

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² w_i²   (19)

with respect to fit parameters θ. Here, y_i = f_i + e_i are the experimental data (dependent variable), e_i errors, f_i the fit function evaluated at i (the independent variable), and w_i are weights. It is assumed that the experimental data are uncorrelated and that the weights are uncorrelated with the experimental data. If, in addition, the weights are also independent of i, and thus constant, weighted least squares reduces to ordinary least squares (OLS). When the theoretical model, f_i, is a linear function of the fit parameters, θ, both OLS and WLS minimization are known to return parameter estimates that are BLUE (best linear unbiased estimates). The former was first shown by Gauss in 1829, whereas the latter was shown by Aitken in 1935 [14]. Specifically, in WLS the errors do not have to be independent, just uncorrelated, and in OLS they do not have to come from the same distribution; they are only assumed to have the same variance (homoscedasticity).

Things turn particularly simple when the experimental data are Gaussian distributed. We can then choose y_i, e.g., as the average of several measurements, and the weights, w_i, as the reciprocal of the experimental standard deviation. With this choice, y_i and w_i are independent, because the sample mean and sample variance calculated from independent, identically Gaussian distributed random variables are statistically independent. In this Gaussian scenario it is possible to assign meaningful confidence intervals to the fit parameters and calculate the goodness of the fit, also known as its support or p-value, and this is why one is sometimes tempted to assume that data are Gaussian even when that is not quite the case. Finally, with Gaussian data MLE and WLS are mathematically identical, as is easily seen by inserting a Gaussian in p_n's place in Eq. (50) and deriving the corresponding cost function, which is simply χ².

We now ask what the effect is on the estimate of the fit parameters if the data are not Gaussian, the weights are not independent of the experimental values, and the model is a non-linear function of the fit parameters. All of these circumstances are realized when fitting power spectra with the statistics given in the previous section.
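The contrast between the Gaussian case and the power-spectral case can be made concrete with a small simulation of our own making: for Gaussian samples the sample mean and sample variance are statistically independent, while for exponential samples, the single-spectrum statistics of Eq. (8), they are strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 16, 100_000

def mean_var_corr(samples):
    # correlation between sample mean and sample variance across repetitions
    m = samples.mean(axis=1)
    v = samples.var(axis=1, ddof=1)
    return np.corrcoef(m, v)[0, 1]

c_gauss = mean_var_corr(rng.standard_normal((reps, n)))
c_expo = mean_var_corr(rng.exponential(1.0, size=(reps, n)))

print(round(c_gauss, 2))   # near 0.0: uncorrelated (in fact independent)
print(round(c_expo, 2))    # clearly positive: strongly correlated
```

This correlation is exactly the ingredient that, per the criterion derived below, turns weighting by experimental scatter into a biased fit.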
A. Some analytical results for bias in least-squares fitting
An estimator, θ̂, is biased if its expectation value differs from the true value, ⟨θ̂⟩ ≠ θ*, of the quantity it is an estimator for. Below, we show that some common least-squares estimators for power spectra are biased.

At the minimum of the χ² function given in Eq. (19), the first derivative with respect to the fit parameters is zero, and the stationarity condition thus reads

Σ_{i=1}^N [y_i − f_i] w_i² ∂_θ f_i = Σ_{i=1}^N [y_i − f_i]² w_i ∂_θ w_i .   (20)

Notice that, in order to be completely general, we have allowed the weights to depend on the fit parameters. We now treat a few specific, but universal, scenarios one at a time.
1. Experimental standard deviations as weights
Proceeding as usual with classic WLS, we pick the weights to be inversely proportional to the experimental estimate for the standard deviation. This estimate could be the known uncertainty from the experimental apparatus, but often it is estimated simply as

s_{n,i} = √[ (1/(n−1)) Σ_{j=1}^n (y_{i,j} − ȳ_i)² ] ,   (21)

where n is the number of times the experiment is repeated with the independent variable set to its i-th value. The experimental average (sample mean)

ȳ_i = (1/n) Σ_{j=1}^n y_{i,j}   (22)

is our estimate for the expectation value of y_i. Doing this for each i,

χ²(θ) = Σ_{i=1}^N [ȳ_i − f_i(θ)]² s_{n,i}^{−2} .   (23)

The stationarity conditions then reduce to

Σ_{i=1}^N (ȳ_i / s²_{n,i}) ∂_θ f_i = Σ_{i=1}^N (1/s²_{n,i}) f_i(θ) ∂_θ f_i ,   (24)

the solution of which gives us our estimate, θ̂[y], for θ for a given data set y = (y_{i,j})_{i=1,…,N; j=1,…,n}. Note, because N and n are finite, the estimate for θ resulting from Eq. (24) is a stochastic quantity and will vary from one experimental realization to another.

To calculate the bias of the estimator for given n, we need to find its expectation value for that n. We do this in two steps: first, we let the number of data points N → ∞, while keeping n and the number of fit parameters fixed, as in an infinite experiment. In this limit the fit returns an estimate θ*_n of θ that is no longer a fluctuating quantity, but may still depend on n. So, for N = ∞, Eq. (24) reads

Σ_{i=1}^∞ (ȳ_i / s²_{n,i}) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ (1/s²_{n,i}) f_i(θ*_n) ∂_θ f_i(θ*_n) .   (25)

Note that the experimental values, ȳ_i and s_{n,i}, are still fluctuating quantities described by the same statistics as before, whereas f_i(θ*_n) and ∂_θ f_i(θ*_n) are not fluctuating, because they are functions of non-fluctuating variables. So, when in the next step we take the expectation value of Eq. (25), only ȳ_i and s_{n,i} are affected,

Σ_{i=1}^∞ ⟨ȳ_i / s²_{n,i}⟩ ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨1/s²_{n,i}⟩ f_i(θ*_n) ∂_θ f_i(θ*_n) ,   (26)

which is solved by

f_i(θ*_n) = ⟨ȳ_i / s²_{n,i}⟩ / ⟨1/s²_{n,i}⟩ , for all i = 1, …, ∞ ,   (27)

if a parameter set θ*_n exists which solves all these many equations simultaneously. Provided that we know the distribution of y_i, we can calculate the expectation values on the right-hand side of this equation and thus determine whether the fit is biased. For a start, note that if the sample mean and the reciprocal sample variance are uncorrelated, then the numerator on the right-hand side factorizes to ⟨ȳ_i⟩⟨1/s²_{n,i}⟩. Thus the right-hand side equals ⟨ȳ_i⟩, i.e., the fit is unbiased. So it is sufficient that mean and reciprocal variance are uncorrelated to ensure an unbiased fit.

It turns out that it is also a necessary condition, if there is no redundancy in the parameterization of f_i by θ, i.e., if their relationship is one-to-one. This is the only sensible way to parameterize a function, and easily achieved by eliminating a possible redundancy through reparameterization. Assuming this one-to-one relationship, we use that f_i(θ*) = ⟨ȳ_i⟩ and Eq. (27) to write

f_i(θ*) = [ ⟨ȳ_i⟩ ⟨1/s²_{n,i}⟩ / ⟨ȳ_i / s²_{n,i}⟩ ] f_i(θ*_n) , for all i = 1, …, ∞ .   (28)

We now see that the fit is unbiased if and only if the term in square brackets in Eq. (28) equals unity.

In summary, a necessary and sufficient criterion for least-squares fitting to be unbiased is that the sample mean and reciprocal sample variance are uncorrelated. Note that 'uncorrelated' is a weaker requirement than 'independent', and although the latter implies the former, the reverse is not generally true.
Naturally, when the sample mean and sample variance are independent, the sample mean and reciprocal sample variance are also uncorrelated.

Notice that we nowhere used what s_{n,i} is, and in fact a more general version of Eq. (28) is

f_i(θ*) = [ ⟨y_i⟩ ⟨w_i²⟩ / ⟨y_i w_i²⟩ ] f_i(θ*_n) ,   (29)

where w_i is assumed independent of θ but is otherwise unconstrained. The criterion for an unbiased estimator is now simply that the data, y_i, and the squared weights, w_i², are uncorrelated.

Under some circumstances it is possible to determine the bias of the fit parameters from the bias of the fit function given in Eq. (29). If the function is invertible, this is trivially the case. But it is also possible if the amplitude of the fit function is determined by just one of the fit parameters and the term in the square brackets is independent of i. In this case, bias can be removed from the parameter estimate, θ̂, by simply multiplying the amplitude parameter by the square bracket and leaving the other parameters unchanged. In the next section we give an example of this.
2. Experimental averages as weights
If the sample standard deviation is proportional to the sample mean, s_{n,i} ∝ ȳ_i for each i, the mean and variance are clearly not uncorrelated, and we expect the parameter estimate to be biased. From Eq. (28) we find

f_i(θ*) = [ ⟨ȳ_i⟩ ⟨ȳ_i^{−2}⟩ / ⟨ȳ_i^{−1}⟩ ] f_i(θ*_n) .   (30)

As a real-world example, consider gamma-distributed experimental data. We already mentioned that the gamma distribution describes all power spectra resulting from linear dynamical equations driven by an additive noise, white or colored. Carrying on as before, we use the standard deviation to weight the data, w_f = 1/s_n(P̄^(ex)_f), where f is the frequency. Since the standard deviation and the mean are proportional, see Eq. (13), we have w_f ∝ 1/P̄^(ex)_f. Using P̄^(ex)_f's known gamma distribution, see Eq. (14), we can compute ⟨(P̄^(ex)_f)^{−q}⟩ for q = 1, 2, … and find the value of the bracketed bias term in Eq. (30),

[ ⟨P̄^(ex)_f⟩ ⟨(P̄^(ex)_f)^{−2}⟩ / ⟨(P̄^(ex)_f)^{−1}⟩ ] = n/(n − 2) .   (31)

That is, weighted least-squares fitting of this wide class of power spectra has a built-in, frequency-independent, multiplicative bias of (n − 2)/n for n > 2. The true power-spectral form can then be obtained from the least-squares fit as

P_f(θ*) = n/(n − 2) · P_f(θ*_n) .   (32)

In this scenario, the theoretical N → ∞ limit that we took in going from Eq. (24) to Eq. (25) corresponds to letting the measurement time (or sampling frequency) become infinite while keeping the sampling frequency (or measurement time) and all other experimental factors fixed.

In the simple case of a Lorentzian, Eq. (3), we have θ* = (D, f_c) and

P_f(θ*) ≡ (D/2π²) / (f_c² + f²) = n/(n − 2) · (D*_n/2π²) / ((f*_{c,n})² + f²) ,   (33)

from which we see that f*_{c,n} is unbiased. In other words, the true values of the fit parameters can be obtained from the WLS estimates as

D = n/(n − 2) · D*_n ≈ n/(n − 2) · D^(lsq) ,   (34)
f_c = f*_{c,n} ≈ f_c^(lsq) ,   (35)

where D^(lsq) and f_c^(lsq) are the stochastically fluctuating values returned by a least-squares fit to a finite-N data set. This is a general feature of the power spectra: they can be written as the product of an overall multiplicative scale factor and a shape function depending on f, where only the scale factor is influenced by the bias.
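The n/(n − 2) factor is easy to reproduce in a simulation of our own making: exponentially distributed synthetic "spectra" with a flat true spectrum, a one-parameter amplitude fit, and the experimental average as weight. All parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, Nfreq, Pf = 8, 5000, 1.0

# averaged spectrum: n exponential draws per frequency bin, flat true spectrum
ybar = rng.exponential(Pf, size=(n, Nfreq)).mean(axis=0)

# WLS fit of a single amplitude A with weight w = 1/ybar (experimental
# average as weight): minimizing sum((ybar - A)^2 / ybar^2) over A gives
A_fit = np.sum(1.0 / ybar) / np.sum(1.0 / ybar**2)

print(round(A_fit, 2))                 # biased low, near (n - 2)/n = 0.75
print(round(A_fit * n / (n - 2), 2))   # corrected as in Eq. (32), near Pf = 1
```

With n = 8 the raw fit underestimates the amplitude by about 25%, and multiplying by n/(n − 2) recovers the true value to within the stochastic error.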
3. Theoretical values as weights
If the sample standard deviation is known to be proportional to f_i(θ*), it seems reasonable to weight the data by the theoretical value, w_i = 1/f_i. After inserting w_i ∝ 1/f_i in Eq. (19), the quantity to minimize is

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² f_i^{−2}(θ) ,   (36)

and the stationarity equations, Eq. (20), become

Σ_{i=1}^N y_i² f_i^{−3} ∂_θ f_i = Σ_{i=1}^N y_i f_i^{−2} ∂_θ f_i .   (37)

Repeating the arguments from the previous section, the expectation value of the stationarity conditions in the limit N → ∞ is

Σ_{i=1}^∞ ⟨y_i²⟩ f_i^{−3}(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨y_i⟩ f_i^{−2}(θ*_n) ∂_θ f_i(θ*_n) ,   (38)

which is solved by

f_i(θ*_n) = ⟨y_i²⟩/⟨y_i⟩ , for all i = 1, …, ∞ ,   (39)

if a parameter set θ*_n exists which solves all these many equations simultaneously. The true value, f_i(θ*), can in that case be written as

f_i(θ*) = [ ⟨y_i⟩²/⟨y_i²⟩ ] f_i(θ*_n) ,   (40)

and θ* can be determined from θ*_n if ⟨y_i⟩²/⟨y_i²⟩ is independent of i, constant, and can be absorbed in θ*_n in a simple manner.

If we again use the gamma-distributed power-spectral data as an example, we see that

P_f(θ*) = n/(n + 1) · P_f(θ*_n) ,   (41)

i.e., we once more have a bias that scales with the number n of spectra averaged over, although this time the bias is half the size and positive.

Closed-form expressions for f*_{c,n} and D*_n are straightforward to obtain when P_f is a Lorentzian [9] (reference [9] contains a typo: D T_msr should simply read D). The expression for f*_{c,n} given in [9] is unbiased, whereas the expression for D*_n has its bias removed by multiplying by n/(n + 1). Note that, apart from the n/(n + 1) factor on D, the results derived in [9] are mathematically identical to those derived using an MLE approach in the next section.
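A companion check of Eq. (41), again our own sketch with arbitrary values: the same kind of synthetic data, but weighted by the theoretical value, so that the stationarity condition Eq. (39) gives A = Σȳ²/Σȳ for a flat spectrum of amplitude A.

```python
import numpy as np

rng = np.random.default_rng(4)
n, Nfreq, Pf = 8, 5000, 1.0

ybar = rng.exponential(Pf, size=(n, Nfreq)).mean(axis=0)

# theoretical weight w = 1/f_i = 1/A for a flat spectrum: minimizing
# sum((ybar - A)^2 / A^2) over A gives A = sum(ybar^2)/sum(ybar), cf. Eq. (39)
A_fit = np.sum(ybar**2) / np.sum(ybar)

print(round(A_fit, 3))                 # biased high, near (n + 1)/n = 1.125
print(round(A_fit * n / (n + 1), 2))   # corrected as in Eq. (41), near Pf = 1
```

Note the sign flip relative to the previous scheme: theoretical weights overestimate the amplitude by +1/n, experimental ones underestimate it by −2/n.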
4. Constant values as weights (“un-weighted”)
When all weights are assumed to be equal, the function to minimize,

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² ,   (42)

has stationarity equations which, for N → ∞, give

Σ_{i=1}^∞ f_i(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ y_i ∂_θ f_i(θ*_n) .   (43)

Taking the expectation value on both sides, we have

Σ_{i=1}^∞ f_i(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨y_i⟩ ∂_θ f_i(θ*_n) .   (44)

This equation may have the solution

f_i(θ*_n) = ⟨y_i⟩ = f_i(θ*) .   (45)

In other words, this estimator is unbiased. However, it is not a precise estimator: the stochastic errors on the values it returns tend to be larger than those obtained with a weighted fit. Figures 2 and 3 illustrate this for the case of fitting an aliased Lorentzian to power spectra from optical tweezers.
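The bias/precision trade-off can be illustrated with a toy model of our own: a Lorentzian-shaped spectrum whose corner frequency is treated as known, so that only the amplitude is fitted and both estimators have closed forms. The grid, f_c, n, and the number of repetitions are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, fc = 16, 300, 500.0
f = np.arange(1.0, 8001.0, 4.0)
g = 1.0 / (fc**2 + f**2)      # known Lorentzian shape; amplitude A is fitted

ols, wls = [], []
for _ in range(reps):
    ybar = rng.exponential(g, size=(n, f.size)).mean(axis=0)  # true A = 1
    ols.append(np.sum(ybar * g) / np.sum(g**2))         # constant weights
    A = np.sum(ybar**2 / g**2) / np.sum(ybar / g)       # theoretical weights, Eq. (39)
    wls.append(A * n / (n + 1))                         # Eq. (41) correction
ols, wls = np.array(ols), np.array(wls)

print(round(ols.mean(), 2), round(wls.mean(), 2))  # both near 1
print(ols.std() > wls.std())                       # constant weights scatter more
```

The constant-weight estimate is dominated by the few low-frequency bins where g is large, which is why its scatter exceeds that of the (bias-corrected) weighted estimate.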
FIG. 2: Stochastic error on fit parameters as a function of cut-off frequency, f_max, for least-squares fitting of aliased Lorentzians using various weights, Eq. (19). For all fits the errors are substantial for low values of f_max. Notice the logarithmic frequency scale: for large f_max, WLS returns a variance nearly an order of magnitude smaller than for OLS (fitting with constant weights). The total number of acquired data points was held fixed at N = 262,144, and n = 16 power spectra were generated with f_c = 500 Hz, D = 0. µm²/s, and f_sample = 16,384 Hz. Symbols show the coefficient of variation, s_n(D)/D̄, with D measured over 100 independent stochastic simulations. The coefficients of variation for f_c show similar trends and are roughly two times larger at these settings (data not shown). For OLS (triangles, Eq. (42)) the error is independent of f_max for large frequencies because the information there is de-emphasized by the fitting algorithm. Using the experimental standard deviation or mean as weights (filled circles, Eq. (23)) leads to stochastic errors nearly as small as when using theoretical weights (empty circles, Eq. (36)); the systematic errors (biases) are, however, twice as big and of the opposite sign, see Fig. 4.
5. Iterated values as weights
When weights are determined iteratively, a fit is per-formed and the value returned for θ is used in the weight − n C oe ff i c i en t o f v a r i a t i on i n % D, LSQ, theo. stdD, LSQ, no weightsD, LSQ, exp.stdD, LSQ, exp. mean
FIG. 3: Stochastic error on fit parameters as a function ofnumber of power spectra averaged over, n , for least squaresfitting of aliased Lorentzians using various weights, Eq. (19):Constant weights (triangles); experimental standard devia-tion (black circles); experimental average (grey circles); theo-retical weights (white circles). Thick black line through whitecircles show the theoretical prediction Eq. (E2). The numberof acquired data points was held fixed at N = 262 , f max = f Nyq , for power spec-tra generated with f c = 500 Hz and D = 0 . µ m/s . Sym-bols show the coefficient of variation, s n ( D ) / ¯ D , with D mea-sured over 1,000 independent stochastic simulations. Usingtheoretical weights clearly outperforms all the other weighingschemes for n <
32; for n >
32, the experimental weighingscheme performs comparably well—use of the experimentalstandard deviation as weight leads to slightly larger variationthan use of the experimental average, as expected, and bothshow more variation than results obtained with theoreticalweights. Results obtained with constant weights—sometimesreferred to as “no weights” or “un-weighted”—show roughlyten times larger variation than does results obtained with the-oretical weights, for all n . of a new fit and so on, until some convergence criterionhas been met. The function to minimize with respect to θ for given θ (iter) reads χ ( θ ) = N (cid:88) i =1 [ y i − f i ( θ )] w i ( θ (iter) ) . (46)It has the stationarity equations N (cid:88) i =1 f i w i ( θ (iter) ) ∂ θ f i = N (cid:88) i =1 y i w i ( θ (iter) ) ∂ θ f i . (47)We now use the same arguments as above together withthe observation that at the fixed-point solution to thisiterative scheme, θ (iter) = θ n , and both equal θ ∗ n in thelimit N → ∞ . Taking the expectation value of the equa-tion after taking the limit N → ∞ , we see that θ ∗ n maysatisfy f i ( θ ∗ n ) = (cid:104) y i (cid:105) = f i ( θ ∗ ) , (48)in which case this estimator is unbiased. The estima-tion scheme just described is also known as “iteratively − − B i a s i n % D, LSQ, theo. stdD, LSQ, no weightsD, LSQ, exp. stdD, LSQ, exp. mean
FIG. 4: Systematic error (bias) on fit parameters as a function of the number of power spectra averaged over, n, for least-squares fitting of aliased Lorentzians using various weights, Eq. (19). The corner frequency is unbiased (data not shown), independent of n, whereas the diffusion coefficient shows a strong weighting-scheme-dependent bias. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the averaged power spectra generated with N = 262,144 (t_msr = 16 s), f_c = 500 Hz, and D = 0. µm²/s. Symbols show the signed error on the mean of D measured over 1,000 independent stochastic simulations. For least-squares fitting with constant weights (triangles) the bias on D is zero. Least-squares fitting with experimental weights (black circles: standard deviation; grey circles: average value) has a systematic error on D with −2/n dependence (thin black line) for n > 2. Least-squares fitting with theoretical weights shows a +1/n dependence (thick black line) in the systematic error on D.

reweighted least squares" and convergence is not guaranteed.
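The iteratively reweighted scheme of Eqs. (46)-(47) can be sketched in a few lines. The sketch below is our own illustration, not the authors' code: it assumes the simple Lorentzian model P_f = 1/(a + b f²), freezes the weights at w_i = 1/P_i of the previous fit in the outer loop, and uses a Gauss-Newton inner solver of our own choosing.

```python
import numpy as np

def fit_lorentzian_irls(freqs, Pbar, a, b, sweeps=30, gn_steps=5):
    """Sketch of iteratively reweighted least squares for P_f = 1/(a + b f^2).

    Outer loop: freeze the weights at w_i = 1/P_i(theta_prev), as in Eq. (46).
    Inner loop: Gauss-Newton minimization of sum_i [Pbar_i - P_i(theta)]^2 w_i^2
    at fixed weights.  Convergence is not guaranteed; start close to the truth.
    """
    theta = np.array([a, b], dtype=float)
    for _ in range(sweeps):
        w = theta[0] + theta[1] * freqs**2          # w_i = 1/P_i at previous fit
        for _ in range(gn_steps):
            P = 1.0 / (theta[0] + theta[1] * freqs**2)
            r = (Pbar - P) * w                      # weighted residuals
            J = np.column_stack([P**2 * w, P**2 * w * freqs**2])  # dr/dtheta
            theta -= np.linalg.solve(J.T @ J, J.T @ r)
    return theta  # (a, b); f_c = sqrt(a/b), D = 2*pi^2/b
```

A fixed point of the outer loop satisfies the stationarity equations (47); as the text notes, a poor starting guess can make the iteration diverge.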
6. Error-bars
Error bars on the parameter estimate θ̂ for θ* can be calculated from the error bars on the estimator θ̂_n for θ*_n and the simple relationship, in terms of n, between θ* and θ*_n, by propagation of errors (see Eq. (73)), provided the latter exists. The error bars on θ̂_n can be those returned by the least-squares fitting routine used to determine θ̂_n, or they may be known theoretically. In that connection, note that the error bars given in [9] for WLS with theoretical weights are correct only in the limit n = ∞, but by replacing D with nD/(n+1) they are correct for all n.

When θ̂ is found from a θ̂_n which itself is found by minimizing χ², the precision with which χ² is known determines the precision of the estimate θ̂. Because the various weights in χ² have different stochastic fluctuations, we expect that theoretical weights, Eq. (36), will give smaller variation in θ̂ than experimental weights, Eq. (23). A small calculation shows that the variance of the stationarity equations, Eq. (24), around zero includes a term

\langle (\bar P^{(\mathrm{ex})} - f)^2 \rangle \propto \{ (n-1)(n-2)(n-3)(n-4) \}^{-1} ,

i.e., the error bars are expected to be large for n ∼ 4 if experimental weights are used. When theoretical weights are used, the error bars are small and well-defined for all n. These results are verified by example; see Fig. 3.

B. Results for power spectra
We verified the above theoretical results by Monte Carlo simulations of the OT and AFM power spectra and found perfect agreement. Below, we list the specific results.

For the diffusion coefficient D, we found that the outcome of least-squares fitting, D^(lsq), depends on the number, n, of power spectra averaged over in the following manner for the aliased (see Fig. 4) and non-aliased Lorentzians, as well as the aliased and non-aliased AFM expressions (data not shown):

1. D^(lsq) = D(n−2)/n, for n > 2; if data points are weighted with their experimental standard deviation or mean, w_f = 1/s_n(P̄^(ex)_f) or w_f = √n/P̄^(ex)_f. This is the weighting choice with the largest bias, and with large stochastic errors for n ≤ 4. Here, D is underestimated.

2. D^(lsq) = D(n+1)/n; if data points are weighted using the theoretical expectation value for the standard deviation, w_f = √n/P_f. This is the least-squares fitting of [9]; a hybrid of MLE and LSQ, analytically tractable and thus very robust. It also has the smallest stochastic errors. Here, D^(lsq) overestimates D.

3. D^(lsq) = D; if the weights are updated iteratively to the fitted value, w_f = 1/P_f^(iter). This is a practically correct but computationally intensive and somewhat unstable approach; the initial guess must be close to the true value, or the algorithm will not converge.

4. D^(lsq) = D; if all data points are given the same weight, w_f = constant. This method de-emphasizes information available at higher frequencies: power-spectral values have constant relative error, hence rapidly decreasing absolute error beyond f_c or f_0, yet are treated as having constant absolute error. Consequently, this estimator has low precision, typically with two to ten times larger stochastic errors than Cases 1-3 above. This is OLS fitting applied far outside its range of validity.

Figures 2, 3, and 4 show examples of the size of these stochastic errors and biases, respectively, for fits of the aliased Lorentzian to data.

We found that the expectation values of the fitted values for f_c, f_0, and Q were independent of the weighting scheme and unbiased, whereas the standard deviations varied by up to an order of magnitude depending on the weighting scheme applied. Furthermore, fitting of non-aliased PSDs to aliased data introduces large systematic errors for the trivial reason that they fail to capture the shape of the PSD near the Nyquist frequency (data not shown).

VI. MAXIMUM LIKELIHOOD ESTIMATION OF FIT PARAMETERS
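The origin of the bias factors in Cases 1, 2, and 4 can be seen already in a toy problem we set up for illustration (it is not from the paper): fitting a single amplitude A to K data points that are each the average of n exponentially distributed "spectral values" with mean A, so each point is gamma distributed with shape n. The three weighting choices then have closed-form estimators whose biases mirror those on D.

```python
import numpy as np

# Toy illustration (our construction): one-parameter 'spectrum' P_f = A.
rng = np.random.default_rng(1)
A_true, n, K = 1.0, 8, 200_000
# each point = average of n exponentials => gamma, shape n, mean A_true
y = rng.gamma(shape=n, scale=A_true / n, size=K)

A_const = y.mean()                        # Case 4: constant weights, unbiased
A_expw  = (1/y).sum() / (1/y**2).sum()    # Case 1: experimental weights 1/y_i
A_theow = (y**2).sum() / y.sum()          # Case 2: theoretical weights 1/A

print(A_expw / A_true)    # close to (n-2)/n = 0.75
print(A_theow / A_true)   # close to (n+1)/n = 1.125
```

The experimental-weight estimator is pulled down because small fluctuations of y_i get large weights; the theoretical-weight estimator is pulled up because the fitted amplitude itself appears in the weights. Both effects are removed by the multiplicative corrections discussed in the text.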
We now proceed to determine the fit parameters by the method of maximum likelihood estimation. We also derive closed-form expressions for the fit parameters in terms of the experimental data values. They should be useful for speedy online calibration.

Both power spectra, given in Eqs. (3) and (4), are consequences of a linear, time-invariant dynamics driven by a white noise, Eq. (1). They are consequently of the form

P_f = \frac{1}{a + b f^2 + c f^4} , \quad (49)

with c = 0 in the case of the Lorentzian, and (a, b) or (a, b, c) parameters to be fitted. This simple form, combined with the simple statistical properties of the experimental power-spectral values, makes rigorous MLE of the parameters a straightforward numerical optimization problem, as shown below.

When fitting, we fit only to the positive-frequency part of the power spectrum, or to a subset of it. So here we consider only that part of the spectrum. Since P̄^(ex)_f and P̄^(ex)_f' are uncorrelated for f ≠ f', the probability density for the experimental spectrum P̄^(ex), given its expectation value P, is

p(\bar P^{(\mathrm{ex})} \mid P) = \prod_f p_n(\bar P^{(\mathrm{ex})}_f ; P_f) , \quad (50)

where p_n is given in Eq. (14). Thus maximum likelihood estimation of the theory's parameter values consists in choosing these parameters so they maximize p(P̄^(ex) | P) for given P̄^(ex), or, equivalently, minimize the negative logarithm

F' \equiv n \sum_f \left( \bar P^{(\mathrm{ex})}_f / P_f + \ln P_f \right) + \mathrm{constant} , \quad (51)

where

\mathrm{constant} = \sum_f \left[ \ln \Gamma(n) - n \ln n - (n-1) \ln \bar P^{(\mathrm{ex})}_f \right] \quad (52)

is constant with respect to (a, b, c), so it can be ignored when minimizing. We are then left with the task of finding the values of (a, b, c) that minimize the cost function

F(a, b, c) \equiv \sum_f \left( \bar P^{(\mathrm{ex})}_f / P_f + \ln P_f \right) , \quad (53)

which is an uncomplicated optimization problem that can be solved numerically with standard programs. Good starting values are given in Eqs. (63) and (64).

A. From non-linear to linear stationarity equations via a simple trick
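The cost function of Eq. (53) is a one-liner, and any standard minimizer can be applied to it. A minimal sketch (function names are ours):

```python
import numpy as np

def F_cost(Pbar, P_model):
    """MLE cost function of Eq. (53): F = sum_f (Pbar_f / P_f + ln P_f)."""
    return np.sum(Pbar / P_model + np.log(P_model))

def lorentzian(f, a, b):
    """P_f of Eq. (49) with c = 0."""
    return 1.0 / (a + b * f**2)
```

Handing F_cost to a numerical optimizer over (a, b), or (a, b, c) for the AFM case, implements the rigorous MLE; in expectation, each term P̄_f/P_f + ln P_f is minimized at P_f = ⟨P̄_f⟩, so the minimum sits at the true parameters up to stochastic scatter.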
Although we gave the solution to the full MLE problem implicitly above, as the minimum of F in Eq. (53), we can speed up the fitting process substantially (numerical optimization can be slow when the data sets are large) and gain more insight into the problem by taking a few more analytical steps before turning to numerics.
1. Results for Optical Tweezers
For P_f given in Eq. (3), written as in Eq. (49) with c = 0, F in Eq. (53) reads

F(a, b) = \sum_f \left[ (a + b f^2)\, \bar P^{(\mathrm{ex})}_f - \ln(a + b f^2) \right] . \quad (54)

It is minimized with respect to a and b when these parameters satisfy the stationarity conditions

\sum_f \bar P^{(\mathrm{ex})}_f = \sum_f P_f = \sum_f \frac{1}{a + b f^2} , \qquad
\sum_f f^2 \bar P^{(\mathrm{ex})}_f = \sum_f f^2 P_f = \sum_f \frac{f^2}{a + b f^2} . \quad (55)

These are non-linear equations for a and b. However, from Eq. (15) we know that we can write

P_f = \frac{n \langle \bar P^{(\mathrm{ex})\,2}_f \rangle}{(n+1)\, P_f} = \frac{n}{n+1}\, \langle \bar P^{(\mathrm{ex})\,2}_f \rangle\, (a + b f^2) . \quad (56)

Substituting for P_f in Eq. (55), we find

\sum_f \bar P^{(\mathrm{ex})}_f = \frac{n}{n+1} \sum_f \langle \bar P^{(\mathrm{ex})\,2}_f \rangle (a + b f^2) , \qquad
\sum_f f^2 \bar P^{(\mathrm{ex})}_f = \frac{n}{n+1} \sum_f f^2 \langle \bar P^{(\mathrm{ex})\,2}_f \rangle (a + b f^2) , \quad (57)

which is linear in a and b.

Before solving for a and b we introduce the statistic

S_{p,q} \equiv \frac{1}{K} \sum_f f^p\, \bar P^{(\mathrm{ex})\,q}_f , \quad (58)

with K the number of terms in the sum; the statistic obeys lim_{K→∞} S_{p,q} = ⟨S_{p,q}⟩.

The sums should only include those frequencies the user deems relevant, i.e., frequencies corresponding to mechanical or electronic resonances can be excluded, and high/low-frequency cut-offs can be applied. That is, the statistics can be trimmed iteratively if required: power-spectral values too far from a fit can be identified and excluded from the sums, after which a new fit is found, et cetera, until a steady state is reached and all power-spectral values satisfy the user-defined acceptance criterion. When no frequencies are excluded, nK = N = t_msr f_sample, the total number of data acquired. In this latter case, one fits the power spectrum all the way out to the Nyquist frequency, and aliasing should be taken into account (see appendices), unless aliasing is eliminated by over-sampling data-acquisition electronics.

With this notation, Eq. (57) can be written in matrix form

\begin{pmatrix} \langle S_{0,2}\rangle & \langle S_{2,2}\rangle \\ \langle S_{2,2}\rangle & \langle S_{4,2}\rangle \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \frac{n+1}{n} \begin{pmatrix} S_{0,1} \\ S_{2,1} \end{pmatrix} , \quad (59)

with solution

\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1 + 1/n}{\langle S_{0,2}\rangle \langle S_{4,2}\rangle - \langle S_{2,2}\rangle^2} \begin{pmatrix} S_{0,1}\langle S_{4,2}\rangle - S_{2,1}\langle S_{2,2}\rangle \\ S_{2,1}\langle S_{0,2}\rangle - S_{0,1}\langle S_{2,2}\rangle \end{pmatrix} . \quad (60)

In this expression we know S_{p,q} from the experiment, but not ⟨S_{p,q}⟩. However, we can always write S_{p,q} = ⟨S_{p,q}⟩ + δS_{p,q} and substitute for ⟨S_{p,q}⟩ in Eq. (60). About δS_{p,q} it is easy to show that ⟨δS_{p,q}⟩ = 0 and

\langle (\delta S_{p,q})^2 \rangle = \frac{1}{K^2} \left[ \frac{\Gamma(n+2q)}{n^{2q}\Gamma(n)} - \left( \frac{\Gamma(n+q)}{n^{q}\Gamma(n)} \right)^2 \right] \sum_f f^{2p} P_f^{2q} = \frac{1}{K}\, [\text{weak } n\text{-dep.}]\; \tilde S_{2p,2q} , \quad (61)

where S̃ is defined as in Eq. (58), except with P_f = ⟨P̄^(ex)_f⟩ replacing P̄^(ex)_f. Above, q = 2 in the relevant expressions. Thus, apart from some weak n-dependence, the variance of S_{p,q} scales as 1/K, which is a very small number in a typical experiment, where K = N/n is of order 10⁴-10⁵. It is therefore an excellent approximation to replace ⟨S_{p,q}⟩ with S_{p,q} in Eq. (60),

\begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} \equiv \frac{1 + 1/n}{S_{0,2} S_{4,2} - S_{2,2}^2} \begin{pmatrix} S_{0,1} S_{4,2} - S_{2,1} S_{2,2} \\ S_{2,1} S_{0,2} - S_{0,1} S_{2,2} \end{pmatrix} , \quad (62)

where now (â, b̂) are our estimates of (a, b). By comparing Eqs. (3) and (49) we then have

f_c = \left( \frac{a}{b} \right)^{1/2} \approx \left( \frac{\hat a}{\hat b} \right)^{1/2} = \left( \frac{S_{0,1} S_{4,2} - S_{2,1} S_{2,2}}{S_{2,1} S_{0,2} - S_{0,1} S_{2,2}} \right)^{1/2} \quad (63)

D = \frac{2\pi^2}{b} \approx \frac{2\pi^2}{\hat b} = 2\pi^2\, \frac{n}{n+1}\, \frac{S_{0,2} S_{4,2} - S_{2,2}^2}{S_{2,1} S_{0,2} - S_{0,1} S_{2,2}} . \quad (64)
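Eqs. (58) and (62)-(64) translate directly into a few lines of code. The sketch below (function name ours) uses np.mean for the 1/K normalization, which cancels between numerator and denominator anyway:

```python
import numpy as np

def lorentzian_mle_trick(freqs, Pbar, n):
    """Closed-form estimates of (f_c, D) via Eqs. (62)-(64).

    Pbar is the average of n experimental power spectra at the fitted
    frequencies; S(p, q) implements the statistic of Eq. (58).  Rescaling
    frequencies (e.g. to kHz) can improve numerical conditioning."""
    def S(p, q):
        return np.mean(freqs**p * Pbar**q)
    pref = (1.0 + 1.0 / n) / (S(0, 2) * S(4, 2) - S(2, 2)**2)
    a_hat = pref * (S(0, 1) * S(4, 2) - S(2, 1) * S(2, 2))
    b_hat = pref * (S(2, 1) * S(0, 2) - S(0, 1) * S(2, 2))
    return np.sqrt(a_hat / b_hat), 2.0 * np.pi**2 / b_hat   # (f_c, D)
```

No iteration is involved, so this is suitable for the "speedy online calibration" mentioned above.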
2. Results for AFM cantilevers
Identical reasoning leads, in the case of c ≠ 0, to three coupled linear equations for a, b, and c,

\underbrace{\begin{pmatrix} S_{0,2} & S_{2,2} & S_{4,2} \\ S_{2,2} & S_{4,2} & S_{6,2} \\ S_{4,2} & S_{6,2} & S_{8,2} \end{pmatrix}}_{S} \underbrace{\begin{pmatrix} a \\ b \\ c \end{pmatrix}}_{\vec v} \approx (1 + 1/n) \underbrace{\begin{pmatrix} S_{0,1} \\ S_{2,1} \\ S_{4,1} \end{pmatrix}}_{\vec s} , \quad (65)

which are inverted to give

\begin{pmatrix} a \\ b \\ c \end{pmatrix} \approx \begin{pmatrix} \hat a \\ \hat b \\ \hat c \end{pmatrix} \equiv \frac{n+1}{n}\, S^{-1} \begin{pmatrix} S_{0,1} \\ S_{2,1} \\ S_{4,1} \end{pmatrix} . \quad (66)

Comparing Eqs. (4) and (49) then gives

f_0 = \left( \frac{a}{c} \right)^{1/4} \approx \left( \frac{\hat a}{\hat c} \right)^{1/4} \quad (67)

D = \frac{2\pi^2}{b + 2(ac)^{1/2}} \approx \frac{2\pi^2}{\hat b + 2(\hat a \hat c)^{1/2}} \quad (68)

\left( \frac{2\pi m}{\gamma} \right)^2 = \frac{c}{b + 2(ac)^{1/2}} \approx \frac{\hat c}{\hat b + 2(\hat a \hat c)^{1/2}} . \quad (69)

For completeness and ease of implementation we give the inversion formulas: S^{-1} = C/\mathcal{D}, with determinant

\mathcal{D} = S_{0,2} S_{4,2} S_{8,2} - S_{0,2} S_{6,2}^2 - S_{2,2}^2 S_{8,2} + 2 S_{2,2} S_{4,2} S_{6,2} - S_{4,2}^3 , \quad (70)

C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{12} & C_{22} & C_{23} \\ C_{13} & C_{23} & C_{33} \end{pmatrix} , \quad (71)

and

C_{11} = S_{4,2} S_{8,2} - S_{6,2}^2 , \quad C_{12} = S_{4,2} S_{6,2} - S_{2,2} S_{8,2} , \quad C_{13} = S_{2,2} S_{6,2} - S_{4,2}^2 ,
C_{22} = S_{0,2} S_{8,2} - S_{4,2}^2 , \quad C_{23} = S_{2,2} S_{4,2} - S_{0,2} S_{6,2} , \quad C_{33} = S_{0,2} S_{4,2} - S_{2,2}^2 . \quad (72)

Of course, it is also possible to invert Eq. (65) numerically and insert the resulting values for â, b̂, ĉ in Eqs. (63), (64), (67), (68), (69). The above results, however, should resolve potential issues with numerical stability arising from the inversion of near-singular matrices.

B. Examples of Fitting the power spectra
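A numerical solve of the 3×3 system in Eqs. (65)-(69) can look as follows (our sketch; np.linalg.solve stands in for the explicit inversion formulas):

```python
import numpy as np

def afm_mle_trick(freqs, Pbar, n):
    """Closed-form estimates (f0, D, G), G = (2 pi m / gamma)^2, via
    Eqs. (65)-(69), for P_f = 1/(a + b f^2 + c f^4).

    Rescale frequencies (e.g. to kHz) if the 3x3 matrix of statistics
    is near-singular in the original units."""
    def S(p, q):
        return np.mean(freqs**p * Pbar**q)
    M = np.array([[S(0, 2), S(2, 2), S(4, 2)],
                  [S(2, 2), S(4, 2), S(6, 2)],
                  [S(4, 2), S(6, 2), S(8, 2)]])
    s = np.array([S(0, 1), S(2, 1), S(4, 1)])
    a, b, c = (1.0 + 1.0/n) * np.linalg.solve(M, s)
    f0 = (a / c) ** 0.25
    denom = b + 2.0 * np.sqrt(a * c)
    return f0, 2.0 * np.pi**2 / denom, c / denom   # (f0, D, G)
```

Note that b may legitimately come out negative when the resonance is pronounced; only the combination b + 2√(ac) must be positive.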
Figure 5A shows the average of n = 16 power spectra for the Brownian motion of a micro-sphere in an optical trap. The data are synthetic, computer generated; see [9]. Thus, the values for f_c and D are known exactly, and can be used as benchmarks for results obtained by fitting, with no worry about any of the complicating circumstances that can affect data in the real world. The red line is the expectation value of the aliased Lorentzian, taking as input the exactly known values of f_c and D. The yellow line is the aliased Lorentzian corresponding to the stochastically realized parameter values for f_c and D, determined by rigorous MLE of (a, b) with no use of our simplifying trick. The dashed blue line is the result of MLE with the use of our simplifying trick. All three of these lines plot virtually on top of each other. Finally, the black lines show the result of least-squares fitting of an aliased Lorentzian to the data, using as weights: (i) the standard deviations on the n = 16 spectra, with a resulting −2/n = −12.5% systematic error on D, see Eq. (32); and (ii) theoretical weights, resulting in a +1/n = 6.25% systematic error on D, see Eq. (41).

Note that for clarity of presentation we discuss the non-aliased cases in the main text and give closed-form aliased results in Appendices B and C. Numerical tests of the theory are done using the aliased theory in order to utilize all the available data: when fitting a non-aliased expression to aliased data it is necessary to introduce a cut-off frequency f_max ≪ f_Nyq and discard all data above it.

Figure 5B shows the same data and fits in a normalized residual plot, i.e., data or fits minus the true expected value (the residue), divided by the true expected value. Thus, deviation from zero in this plot shows by how much data and fits differ from the true expected value, measured in units of the true expected value. The data should scatter about 0 with standard deviation n^{−1/2} = 16^{−1/2} = 25%, whereas the fits should simply trace zero for all frequencies. Here, the true expected value is known because we use synthetic data; we normalize by it to show the results of all fits in a single figure. In an experiment, the true expected value is not known and the fitted value is used instead; when a bias is present the residues will consequently differ from zero in a systematic manner.

This bias always hides in the scatter of the data in a figure like Fig. 5A, because n^{−1} ≤ n^{−1/2} for n = 1, 2, .... But the bias is substantial for small to intermediate values of n and will reveal itself in a plot like Fig. 5B.

Figure 6 is similar to Fig. 5, except its power spectrum describes the confined Brownian motion of a massive particle, e.g. an AFM cantilever in air, and the fits do not take aliasing into account. The data are synthetic, computer generated; see Appendix A.

C. Error-bars on fit parameters
Because we have derived closed-form expressions for the fit parameters, we can also calculate expressions for the expected error bars on the fit parameters. We do

FIG. 5: Various fits of aliased Lorentzian to synthetic power spectrum for a microsphere in an optical trap. Data are artificial, with known parameter values f_c = 500 Hz and D = 0. µm²/s, f_sample = 16,384 Hz, and t_msr = 16 s. A: Data and aliased Lorentzians. PSD values (grey line) are averages over 16 spectra. Note how the three Lorentzians corresponding to, respectively, the exactly known values of f_c and D (red line), rigorous numerical ML-fit (yellow line), and analytical ML-fit using the simplifying trick (dashed blue line) all plot on top of each other. In contrast, the Lorentzians from numerical least-squares fits with experimental or theoretical weights (thick and thin black lines, respectively) are offset by −12.5% and +6.25% due to their systematic errors on D. B: Residual plots, i.e., same data and fits as shown in Panel A, but divided by the known true expectation value, then unity subtracted. The agreement between data and the ML-fits is practically perfect, whereas the least-squares fits clearly under/over-estimate the PSD.

that and compare to the theoretical limit for how small the error bars can be.

Quite generally, irrespective of whether we study the aliased or non-aliased Lorentzian or the AFM PSD, the error bars on the fit parameters are found by propagating the errors on a, b, and c using the generic formula for a function z (calculating the differential Δz and squaring

FIG. 6: Various fits of Eq. (4) to the average of 16 power spectra for an AFM cantilever. Data are synthetic with known parameter values f_0 = 15 kHz, D = 0. µm²/s, and (2πm/γ)² = 6169 µs², f_sample = 65,
536 Hz, and t_msr = 8 s. A: Data and fits. Note how the three curves corresponding to, respectively, the exactly known parameters (red line), rigorous numerical ML-fit (yellow line), and analytical ML-fit using the simplifying trick (dashed blue line) all plot on top of each other. In contrast, the results of numerical least-squares fits with experimental or theoretical weights (thick and thin black lines, respectively) are offset by −12.5% and +6.25% due to their systematic errors on D. In all the fits, only frequencies below 20 kHz were included. B: Residual plots, i.e., same as shown in Panel A, but divided by the exactly known expectation value, then unity subtracted. At the highest frequencies, where the non-aliased fits start to deviate from the aliased PSD, a slight departure from a straight line is seen in all the fits.

it):

\sigma^2(z[a,b,c]) = \langle (\Delta z)^2 \rangle = (\partial_a z)^2 \langle (\Delta a)^2 \rangle + (\partial_b z)^2 \langle (\Delta b)^2 \rangle + (\partial_c z)^2 \langle (\Delta c)^2 \rangle \\ + 2\, \partial_a z\, \partial_b z\, \langle \Delta a \Delta b \rangle + 2\, \partial_a z\, \partial_c z\, \langle \Delta a \Delta c \rangle + 2\, \partial_b z\, \partial_c z\, \langle \Delta b \Delta c \rangle , \quad (73)

with ∂_a z = ∂z/∂a, and similarly for b and c; z = f_c, f_0, D, ..., and ⟨ΔaΔb⟩ etc. are elements of the covariance matrix

\mathrm{cov}(a,b,c) \equiv \begin{pmatrix} \langle (\Delta a)^2 \rangle & \langle \Delta a \Delta b \rangle & \langle \Delta a \Delta c \rangle \\ \langle \Delta a \Delta b \rangle & \langle (\Delta b)^2 \rangle & \langle \Delta b \Delta c \rangle \\ \langle \Delta a \Delta c \rangle & \langle \Delta b \Delta c \rangle & \langle (\Delta c)^2 \rangle \end{pmatrix} . \quad (74)

This covariance matrix can be written in terms of the expectation values for the statistics, see Appendix D,

\mathrm{cov}(a,b,c) \approx \mathrm{cov}(\hat a, \hat b, \hat c) = \frac{1}{N}\, \frac{n+3}{n}\, \langle S \rangle^{-1} \quad (75)

with

\langle S \rangle = \frac{n+1}{n}\, \tilde S , \quad (76)

where S̃ is a matrix of the same form as S in Eq. (65), but with expectation values P_f = ⟨P̄^(ex)_f⟩ replacing the experimental values P̄^(ex)_f in the statistics of Eq. (58); see Eq. (F3).

To find out how much of the available information we are putting to use, we can express the covariance matrix in terms of the Fisher information matrix I = N S̃ for the full (not the trick) MLE problem,

\mathrm{cov}(a,b,c) \approx \mathrm{cov}(\hat a, \hat b, \hat c) = \frac{n+3}{n+1}\, I^{-1} , \quad (77)

from which we see that as n increases, we approach the Cramér-Rao bound [15] on the variance for an unbiased estimator, which is just I^{-1}. The main thing to notice, however, is that all the covariances in Eq. (75) are very small because N is very large; the weak n-dependence (see Fig. 7) is of no consequence in comparison. In an experiment, simply insert the fitted value for the power spectrum as a best estimate for P_f in S̃.
1. Results for Optical Tweezers
For the non-aliased Lorentzian in Eq. (3) we then have, using the generic Eq. (73),

\sigma^2(f_c) = \frac{f_c^2}{4} \left[ \frac{\langle (\Delta a)^2 \rangle}{a^2} + \frac{\langle (\Delta b)^2 \rangle}{b^2} - \frac{2 \langle \Delta a \Delta b \rangle}{a b} \right] \quad (78)

and

\sigma^2(D) = D^2\, \frac{\langle (\Delta b)^2 \rangle}{b^2} , \quad (79)

irrespective of the method used to estimate a, b, and c. Actual numbers are found by inserting the estimates (â, b̂, ĉ). The results for the aliased Lorentzian are given in Appendix E.
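Eqs. (78)-(79) are plain error propagation and can be coded directly (our sketch; the covariance matrix would come from Eq. (75) in practice, with the fitted spectrum inserted for P_f):

```python
import numpy as np

def sigma_fc_D(a, b, cov):
    """Error bars on f_c = sqrt(a/b) and D = 2 pi^2 / b via Eqs. (78)-(79).

    cov is the 2x2 covariance matrix of (a, b)."""
    fc = np.sqrt(a / b)
    D = 2.0 * np.pi**2 / b
    var_fc = fc**2 / 4.0 * (cov[0, 0]/a**2 + cov[1, 1]/b**2
                            - 2.0*cov[0, 1]/(a*b))
    var_D = D**2 * cov[1, 1] / b**2
    return np.sqrt(var_fc), np.sqrt(var_D)
```

The formulas are first-order (linearized) propagation, so they are accurate when the relative errors on a and b are small, which is the case here since they scale as 1/√N.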
FIG. 7: Stochastic error on fit parameters as a function of the number, n, of PSDs averaged over, for ML-fits of aliased Lorentzians using the simplifying trick. Compared to the dependence on the measurement time (Fig. 8), the error is only weakly dependent on n. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the averaged power spectra generated with N = 262,144 (t_msr = 16 s), f_c = 500 Hz, and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols show the theoretical expectation values from Eqs. (E1) and (E2). Dashed lines show the \sqrt{(n+3)/(n+1)} ∈ [1 : √2] scaling predicted from Eqs. (77) and (E5).
2. Results for AFM cantilevers
For the AFM, the variances of the three fit parameters are:

\sigma^2(f_0) = \frac{f_0^2}{16} \left[ \frac{\langle (\Delta a)^2 \rangle}{a^2} + \frac{\langle (\Delta c)^2 \rangle}{c^2} - \frac{2 \langle \Delta a \Delta c \rangle}{a c} \right] \quad (80)

\sigma^2(D) = \frac{D^4}{4\pi^4} \big[ \langle (\Delta a)^2 \rangle f_0^{-4} + \langle (\Delta b)^2 \rangle + \langle (\Delta c)^2 \rangle f_0^{4} \\ + 2 \langle \Delta a \Delta b \rangle f_0^{-2} + 2 \langle \Delta a \Delta c \rangle + 2 \langle \Delta b \Delta c \rangle f_0^{2} \big] \quad (81)

\sigma^2(G) = \frac{G^4}{c^2} \big[ \langle (\Delta a)^2 \rangle f_0^{-4} + \langle (\Delta b)^2 \rangle + \langle (\Delta c)^2 \rangle (G^{-1} - f_0^{2})^2 \\ + 2 \langle \Delta a \Delta b \rangle f_0^{-2} + 2 \langle \Delta a \Delta c \rangle (1 - G^{-1} f_0^{-2}) + 2 \langle \Delta b \Delta c \rangle (f_0^{2} - G^{-1}) \big] , \quad (82)

where

G = (2\pi m/\gamma)^2 . \quad (83)
3. How good are the fits?
The goodness of fit, i.e., the support for the hypothesis that the fitted theory is correct, is [16] the probability that a repetition of the experiment yields a data set with a smaller value for p. A calculation shows that for K ≫ 1
FIG. 8: 1/√N dependence of the stochastic error on fit parameters for ML-fits of aliased Lorentzians using the simplifying trick. The expected 1/√N behavior is easily made out on the double-logarithmic scale. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the average of n = 16 power spectra generated with f_c = 500 Hz and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols connected by lines show the theoretical expectation values from Eqs. (E1) and (E2).

the support, or backing, is

B(s) = \mathrm{erfc}\!\left( |s - K| \sqrt{n/(2K)} \right) , \quad (84)

where erfc is the complementary error function, s ≡ Σ_f P̄^(ex)_f / P_f, and K is the number of terms in the sum s. We have assumed above that this number is much larger than the number of parameters fitted, hence equal to the number of degrees of freedom. It is of order 10⁴-10⁵ in our case, while 2-3 parameters are fitted. When the sum s = K, the expectation value for s, the backing is one, but it rapidly drops to zero as s becomes larger or smaller than K. The backings calculated for the fits shown in Fig. 5 were, respectively, 0 (lsq fits; zero within the numerical precision of MatLab), 0.87 (ML-fit with simplifying trick), and 1.00 (rigorous numerical ML-fit; the first deviation from unity is in the 7th decimal place). These numbers are stochastically varying because the PSD values are. The backing for the rigorous numerical ML-fit will always be close to one because the stationarity conditions, see Eq. (53), from which the fit parameters are determined, are virtually the same as s = K.

VII. SUMMARY AND CONCLUSION
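Eq. (84) needs only the complementary error function from the standard library (function name ours):

```python
import math

def backing(s, K, n):
    """Goodness-of-fit 'backing' of Eq. (84): B(s) = erfc(|s - K| sqrt(n/(2K))).

    Here s = sum over the K fitted frequencies of Pbar_f / P_f.  Under the
    fitted theory, each term has mean 1 and variance 1/n, so s has mean K
    and variance K/n, and B is the two-sided tail probability of the
    observed deviation."""
    return math.erfc(abs(s - K) * math.sqrt(n / (2.0 * K)))
```

B equals one when s hits its expectation value K exactly and falls off rapidly as s moves a few standard deviations away, as described in the text.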
1. A time series of N coordinate values for an optically trapped microsphere or an AFM cantilever doing Brownian motion gives rise to N power spectral values (N/2 at positive frequencies). The parameters characterizing the spectrum can be determined from these with stochastic errors of order 1/√N in the aliased case (Section VI C and Appendix E). For the non-aliased case see also Fig. 2.

2. For the purpose of displaying the experimental power spectrum, the scatter of its values is reduced by a factor √n by dividing the original data, the time series of N coordinates, into n equally long, non-overlapping subseries (Section IV). From these, n experimental power spectra are calculated and averaged over. This noise-reduced power spectrum covers the same frequency interval, but the separation Δf between consecutive points has increased by a factor n.

3. Noise reduction trades resolution on the frequency axis for resolution on the power axis in a manner that loses no information about the parameters characterizing the power spectrum.

4. The 1/√n scatter of experimental power-spectral values should not be confused with the expected stochastic error on parameter values characterizing the power spectrum; the latter are of order 1/√N with only a weak n-dependence, as shown in Eq. (75) and Figs. 8 and 7.

5. Formulas given above, based on maximum likelihood estimates, eliminate this bias and all further fitting for the cases of the non-aliased and aliased Lorentzian power spectrum and for the non-aliased damped harmonic oscillator as model for an AFM cantilever (Section VI A). Fitting has been done once and for all, and the results, including error bars and a goodness-of-fit measure, are given by the formulas in Section VI and Appendix C.

6. For other systems described by a linear Langevin equation driven by white or colored noise (Section IV A), one can determine parameters of the theoretical power spectrum by fitting it to an experimental spectrum, using weighted least-squares fitting and experimental or theoretical error bars as weights, which is computationally faster and more robust than MLE. The resulting value for the diffusion coefficient D should be corrected as described in Eqs. (32) and (41), respectively.

7. Quite generally, beyond optical traps and AFM cantilevers, if the dependent variable (the data) and the squared weights are independent, then weighted least squares yields unbiased results. If not, the least-squares fit is typically biased, but this bias can often be removed altogether by a simple re-scaling of the fit results (Section V).

8. To minimize the stochastic error on fit parameters we suggest setting n = 8 or larger if using ML-fits with our simplifying trick or WLS with theoretical weights, see Eqs. (77) and (E5) and Fig. 7. If WLS with experimental weights is used, we suggest n = 16 or larger, see Fig. 3. Also, for given f_c, n, and N, f_sample = 8 f_c will minimize the stochastic error, see Fig. 9. Typically, however, t_msr and f_c are fixed, and f_sample should simply be set as high as meaningfully possible, as N is the most important factor in increasing the precision, see Fig. 8.

9. For optical tweezers, the highest precision is obtained by fitting over as much of the frequency range as can be captured by theory, including modifications to the PSD due to hydrodynamics, optics, and instrumentation [9, 10, 13]. If a certain level of precision is required, it then becomes a matter of measuring long enough with the sampling frequency as high as experimentally possible (the minimum in Fig. 9 is for fixed N).

10. When doing least-squares fitting we recommend the use of theoretical weights, Eq. (36), if the standard deviation is known to be proportional to the expected value, because this minimizes the stochastic errors on the fit parameters, see Figs. 2 and 3. If any bias is present, it will also be smaller than the bias resulting from experimental weights, and can often be eliminated with the help of Eq. (40) or (41).
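The bias corrections recommended above amount to a single multiplicative factor. A minimal helper (our sketch, assuming the bias factors (n−2)/n for experimental weights and (n+1)/n for theoretical weights quoted in Section V B; Eqs. (32) and (41) give the authoritative statements):

```python
def unbias_D(D_fit, n, scheme):
    """Remove the weighting-scheme bias from a least-squares fitted D.

    'experimental' weights: D_fit = D (n-2)/n  =>  multiply by n/(n-2)
    'theoretical'  weights: D_fit = D (n+1)/n  =>  multiply by n/(n+1)
    """
    if scheme == "experimental":
        if n <= 2:
            raise ValueError("experimental-weight fit requires n > 2")
        return D_fit * n / (n - 2)
    if scheme == "theoretical":
        return D_fit * n / (n + 1)
    raise ValueError("unknown weighting scheme: " + scheme)
```

Only D needs this correction; the fitted corner frequency is unbiased for all the weighting schemes considered.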
FIG. 9: Stochastic error on fit parameters as a function of the sampling frequency, f_sample, for ML-fits of aliased Lorentzians using the simplifying trick. A non-monotonic behavior is seen, with minimum around f_sample = 8 f_c. The number of acquired data points was held fixed at N = 262,144, and the fit was done to the average of n = 16 power spectra generated with f_c = 500 Hz and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols connected by lines show the theoretical expectation values from Eqs. (E1) and (E2).

VIII. ACKNOWLEDGEMENTS
We are thankful to Erik Schäffer for a critical reading of the manuscript. SFN gratefully acknowledges financial support from the Carlsberg Foundation and the Lundbeck Foundation. HF gratefully acknowledges financial support from the Human Frontier Science Program, GP0054/2009-C.
Appendix A: Monte Carlo simulation of the Einstein-Ornstein-Uhlenbeck theory of Brownian motion in a harmonic potential
In the case of non-negligible mass m, Eq. (1) is rewritten as two coupled first-order equations,

\frac{d}{dt} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = -M \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} + (2D)^{1/2}\, \frac{\gamma}{m} \begin{pmatrix} 0 \\ \eta(t) \end{pmatrix} , \quad (A1)

where we have introduced the 2 × 2 matrix

M = \begin{pmatrix} 0 & -1 \\ \kappa/m & \gamma/m \end{pmatrix} . \quad (A2)

Equation (A1) has the solution

\begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = (2D)^{1/2}\, \frac{\gamma}{m} \int_{-\infty}^{t} dt'\, e^{-M(t-t')} \begin{pmatrix} 0 \\ \eta(t') \end{pmatrix} , \quad (A3)

from which follows that

\begin{pmatrix} x_{j+1} \\ v_{j+1} \end{pmatrix} = e^{-M \Delta t} \begin{pmatrix} x_j \\ v_j \end{pmatrix} + \begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} . \quad (A4)

Here

e^{-M \Delta t} = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} -\lambda_- c_+ + \lambda_+ c_- & -c_+ + c_- \\ \lambda_+ \lambda_- (c_+ - c_-) & \lambda_+ c_+ - \lambda_- c_- \end{pmatrix} , \quad (A5)

where

\lambda_\pm \equiv \frac{\gamma}{2m} \pm \sqrt{ \frac{\gamma^2}{4m^2} - \frac{\kappa}{m} } , \qquad u_\pm \equiv \begin{pmatrix} \mp 1 \\ \pm \lambda_\pm \end{pmatrix} \quad (A6)

are the two eigenvalues and corresponding eigenvectors of M,

c_\pm = \exp(-\lambda_\pm \Delta t) , \quad (A7)

and we have introduced the notation

\begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} \equiv \Delta x_{+,j}\, u_+ + \Delta x_{-,j}\, u_- \quad (A8)

with

\Delta x_{\pm,j} \equiv (2D)^{1/2}\, \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \int_{t_j}^{t_{j+1}} dt'\, e^{-\lambda_\pm (t_{j+1} - t')}\, \eta(t') . \quad (A9)

From Eq. (2) it follows that Δx_{±,j} are random lengths drawn from Gaussian distributions with vanishing expectation value and known variances:

\langle \Delta x_{\pm,j} \Delta x_{\pm,k} \rangle = \sigma_\pm^2\, \delta_{j,k} , \qquad \sigma_\pm^2 \equiv D \left( \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \right)^2 \frac{1 - c_\pm^2}{\lambda_\pm} . \quad (A10)

It also follows that Δx_{+,j} and Δx_{-,j} are correlated with each other, but uncorrelated with all Δx_{±,k} for j ≠ k:

\langle \Delta x_{+,j} \Delta x_{-,k} \rangle = \sigma_{+-}^2\, \delta_{j,k} , \qquad \sigma_{+-}^2 \equiv 2 D \left( \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \right)^2 \frac{1 - c_+ c_-}{\lambda_+ + \lambda_-} . \quad (A11)

From their known correlation it follows, after some calculation, that they can be expressed in terms of two uncorrelated Gaussian distributed random numbers with unit variance, η_j^(a) and η_j^(b), as

\begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} = \left( A_+ \begin{pmatrix} -1 \\ \lambda_+ \end{pmatrix} + A_- \begin{pmatrix} 1 \\ -\lambda_- \end{pmatrix} \right) \left( \frac{1 + \alpha}{2} \right)^{1/2} \eta_j^{(a)} + \left( A_+ \begin{pmatrix} -1 \\ \lambda_+ \end{pmatrix} - A_- \begin{pmatrix} 1 \\ -\lambda_- \end{pmatrix} \right) \left( \frac{1 - \alpha}{2} \right)^{1/2} \eta_j^{(b)} , \quad (A12)

where we have introduced the notation

A_\pm = \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \sqrt{ \frac{(1 - c_\pm^2)\, D}{\lambda_\pm} } \quad (A13)

and

\alpha = \frac{2 \sqrt{\lambda_+ \lambda_-}}{\lambda_+ + \lambda_-} \cdot \frac{1 - c_+ c_-}{\sqrt{(1 - c_+^2)(1 - c_-^2)}} . \quad (A14)

So iteration of Eq. (A4) with use of Eq. (A12) generates a time series of positions x_j, which is sampled equidistantly in time with separation Δt from the continuous-time solution to Eq. (1). Since we use the exact analytical solution of Eq. (1) in the generation of this series, the finite value of Δt causes no discretization error. The only numerical errors associated with our solution are associated with the representation of real numbers on a computer and, rather hypothetically, with the use of pseudo-random numbers.

Appendix B: Aliased AFM Power Spectrum
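The update of Eqs. (A4)-(A14) can be sketched as follows (our implementation, assuming the overdamped case with two real eigenvalues λ₊ > λ₋ > 0; function names are ours):

```python
import numpy as np

def propagator_and_noise(lam_p, lam_m, D, dt):
    """Exact one-step propagator e^{-M dt} (Eq. (A5)) and the two noise
    vectors multiplying eta_j^(a), eta_j^(b) in Eq. (A12)."""
    cp, cm = np.exp(-lam_p * dt), np.exp(-lam_m * dt)
    pref = (lam_p + lam_m) / (lam_p - lam_m)
    Ap = pref * np.sqrt((1.0 - cp**2) * D / lam_p)        # Eq. (A13)
    Am = pref * np.sqrt((1.0 - cm**2) * D / lam_m)
    alpha = (2.0 * np.sqrt(lam_p * lam_m) / (lam_p + lam_m)
             * (1.0 - cp * cm) / np.sqrt((1.0 - cp**2) * (1.0 - cm**2)))
    E = np.array([[-lam_m * cp + lam_p * cm, -cp + cm],
                  [lam_p * lam_m * (cp - cm), lam_p * cp - lam_m * cm]])
    E /= (lam_p - lam_m)
    up, um = np.array([-1.0, lam_p]), np.array([1.0, -lam_m])
    va = (Ap * up + Am * um) * np.sqrt((1.0 + alpha) / 2.0)
    vb = (Ap * up - Am * um) * np.sqrt((1.0 - alpha) / 2.0)
    return E, va, vb

def simulate(lam_p, lam_m, D, dt, steps, rng):
    """Iterate Eq. (A4): exact sampling, no discretization error."""
    E, va, vb = propagator_and_noise(lam_p, lam_m, D, dt)
    xv = np.zeros(2)
    x = np.empty(steps)
    eta = rng.standard_normal((steps, 2))
    for j in range(steps):
        xv = E @ xv + va * eta[j, 0] + vb * eta[j, 1]
        x[j] = xv[0]
    return x
```

Because the sampling is exact, the stationary covariance of the discrete chain must coincide with the continuous-time values ⟨x²⟩ = D(λ₊+λ₋)/(λ₊λ₋), ⟨v²⟩ = D(λ₊+λ₋), ⟨xv⟩ = 0, which provides a sharp consistency check on the implementation.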
For the AFM, using the results from Appendix A, we get for the aliased power spectrum:

\langle P^{(\mathrm{ex})}_k \rangle = \langle |\hat x_k|^2 \rangle / t_{\mathrm{msr}} = \frac{ \sigma_+^2 \alpha_- + \sigma_-^2 \alpha_+ - 2 \sigma_{+-}^2 \alpha_{+-} }{ \alpha_+ \alpha_- }\, \Delta t , \quad (B1)

where

\alpha_+ = 1 + c_+^2 - 2 c_+ \cos(2\pi k/N) \quad (B2)
\alpha_- = 1 + c_-^2 - 2 c_- \cos(2\pi k/N) \quad (B3)
\alpha_{+-} = 1 + c_+ c_- - (c_+ + c_-) \cos(2\pi k/N) , \quad (B4)

and we have used that the discrete Fourier transforms of η_j^(a) and η_j^(b) have the following characteristics:

\langle \hat\eta^{(a)*}_k \hat\eta^{(b)}_l \rangle = 0 \quad (B5)
\langle \hat\eta^{(a)*}_k \hat\eta^{(a)}_l \rangle = \langle \hat\eta^{(b)*}_k \hat\eta^{(b)}_l \rangle = \frac{t_{\mathrm{msr}}}{\Delta t}\, \delta_{k,l} . \quad (B6)

For completeness we note that the expression in Eq. (B1) has as limiting expression Eq. (C1) when the mass vanishes:

\lim_{m \to 0} \langle P^{(\mathrm{ex})}_k \rangle = \frac{\sigma_-^2}{\alpha_-}\, \Delta t , \quad (B7)

as is seen by inspection.

Appendix C: Maximum Likelihood Estimation for aliased power spectra
In an OT experiment, the time series of bead positions is obtained by sampling the continuous output from the photodiode at discrete times t_j = j\Delta t, \Delta t = 1/f_{\mathrm{sample}}. Applying the discrete Fourier transform to this time series, we find [9] that the expectation value of the aliased power spectrum can be written in the form

P_k^{\mathrm{alias}} = \frac{1}{A + B\cos(2\pi k/N)},   (C1)

where A and B are related to f_c and D through

f_c = \frac{f_{\mathrm{sample}}}{2\pi}\,u,   (C2)
D = \frac{f_{\mathrm{sample}}^2}{A}\,\frac{u}{\tanh(u)},   (C3)
u = \cosh^{-1}(-A/B).   (C4)

By inserting Eq. (C1) in Eq. (53), the stationarity conditions (\partial_A F = \partial_B F = 0) are seen to be

\sum_k \bar P_k^{(\mathrm{ex})} = \sum_k P_k,   (C5)
\sum_k \cos(2\pi k/N)\,\bar P_k^{(\mathrm{ex})} = \sum_k \cos(2\pi k/N)\,P_k.   (C6)

We now repeat the trick introduced in Section VI A to turn Eqs. (C5) and (C6) into expressions linear in A and B,

\begin{pmatrix} R_{0,2} & R_{1,2}\\ R_{1,2} & R_{2,2}\end{pmatrix}\begin{pmatrix} A\\ B\end{pmatrix} \approx (1 + 1/n)\begin{pmatrix} R_{0,1}\\ R_{1,1}\end{pmatrix},   (C7)

which defines the matrix \mathbf{R} and the vectors \vec u = (A, B)^T and \vec r. These equations are solved to give

A \approx \hat A \equiv \frac{n+1}{n}\,\frac{R_{0,1}R_{2,2} - R_{1,1}R_{1,2}}{R_{0,2}R_{2,2} - R_{1,2}^2},   (C8)
B \approx \hat B \equiv \frac{n+1}{n}\,\frac{R_{1,1}R_{0,2} - R_{0,1}R_{1,2}}{R_{0,2}R_{2,2} - R_{1,2}^2},   (C9)

where we have introduced the (aliased) statistics

R_{p,q} = \frac{1}{K}\sum_k \cos^p(2\pi k/N)\left(\bar P_k^{(\mathrm{ex})}\right)^q.   (C10)

We do not attempt here to give the aliased results for f_0, D, and G for the AFM case: to avoid aliasing of high-frequency noise into the lower frequencies of interest, a high sampling frequency is often used when acquiring AFM data. However, only the region around f_0 is well captured by the damped-harmonic-oscillator theory, and therefore no more than this region is fitted. Since the aliased expressions deviate substantially from the non-aliased ones only at high frequencies, and because the non-aliased expressions are much simpler, we treat only the non-aliased MLE for the AFM here.

Appendix D: Covariance Matrix
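Before deriving the covariance of the estimators \hat A and \hat B, it is instructive to see them in action. The following sketch (our own construction, with hypothetical parameter choices) simulates a massless bead in a trap as an exactly sampled Ornstein-Uhlenbeck process, averages n periodograms, and applies Eqs. (C8)-(C10):

```python
import numpy as np

def fit_aliased_lorentzian(Pbar, n, N):
    """Fit the averaged, aliased Lorentzian of Eq. (C1) with the linearized
    MLE of Eqs. (C7)-(C10).  Pbar[k-1] is the n-fold averaged spectrum at
    k = 1..K (K < N/2); frequencies are returned in units of f_sample."""
    K = len(Pbar)
    cosk = np.cos(2*np.pi*np.arange(1, K + 1)/N)
    R = lambda p, q: np.mean(cosk**p*Pbar**q)               # Eq. (C10)
    det = R(0, 2)*R(2, 2) - R(1, 2)**2
    A = (n + 1)/n*(R(0, 1)*R(2, 2) - R(1, 1)*R(1, 2))/det   # Eq. (C8)
    B = (n + 1)/n*(R(1, 1)*R(0, 2) - R(0, 1)*R(1, 2))/det   # Eq. (C9)
    u = np.arccosh(-A/B)                                    # Eq. (C4)
    return u/(2*np.pi), u/(A*np.tanh(u))                    # f_c, D; Eqs. (C2), (C3)

# Synthetic data: exact OU updates x_{j+1} = c*x_j + noise, n periodograms.
rng = np.random.default_rng(0)
N, n, fc_true, D_true = 2**13, 16, 0.05, 1.0                # fc in units of f_sample
c = np.exp(-2*np.pi*fc_true)
sig = np.sqrt((1 - c**2)*D_true/(2*np.pi*fc_true))
x = np.empty((n, N))
xj = rng.standard_normal(n)*np.sqrt(D_true/(2*np.pi*fc_true))  # stationary start
for j in range(N):
    xj = c*xj + sig*rng.standard_normal(n)
    x[:, j] = xj
P = np.abs(np.fft.rfft(x, axis=1))**2/N                     # periodograms (dt = 1)
fc_hat, D_hat = fit_aliased_lorentzian(P[:, 1:N//2].mean(axis=0), n, N)
```

With these settings, fc_hat and D_hat recover the input values to within the stochastic scatter quantified in Appendix E.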
This appendix derives the results used in Section VI C and Appendix E. To calculate the covariance matrix, we consider the response of the estimated fit parameters to fluctuations in the statistics S_{p,q}. The calculations go through unchanged for the aliased Lorentzian (see above) with the statistics R_{p,q}. First, we note that we can write each term in Eq. (65),

\mathbf{S}\,\vec v \approx \mathbf{S}\,\hat{\vec v} \equiv \mathbf{S}\begin{pmatrix}\hat a\\ \hat b\\ \hat c\end{pmatrix} \equiv (1 + 1/n)\,\vec s,   (D1)

as the sum of its "true underlying" value and a fluctuation (\Delta\mathbf{S} and \Delta\vec s) or a response (\Delta\vec v) to fluctuations,

\mathbf{S} = \langle\mathbf{S}\rangle + \Delta\mathbf{S},   (D2)
\hat{\vec v} = \vec v + \Delta\vec v,   (D3)
\vec s = \langle\vec s\rangle + \Delta\vec s,   (D4)

where the elements of \Delta\mathbf{S} are

\Delta S_{p,q} = \frac{1}{K}\sum_f f^p\left[\left(\bar P_f^{(\mathrm{ex})}\right)^q - \left\langle\left(\bar P_f^{(\mathrm{ex})}\right)^q\right\rangle\right].   (D5)

To first order in the fluctuations we thus have

\Delta\vec v = \begin{pmatrix}\Delta a\\ \Delta b\\ \Delta c\end{pmatrix} = \langle\mathbf{S}\rangle^{-1}\left(\frac{n+1}{n}\,\Delta\vec s - \Delta\mathbf{S}\,\vec v\right),   (D6)

where we notice that to first order \langle\Delta\vec v\rangle = 0. This does not mean that the estimator is unbiased, however, as shown below. The covariance matrix \langle\Delta\vec v\otimes\Delta\vec v\rangle as given in Eq. (74) then follows after some calculation using Eqs. (49) and (15).

We emphasize that the above calculations were done to first order in 1/\sqrt{K}, with K the number of terms in the statistics S_{p,q}: whereas the relative size of an individual fluctuation \Delta\bar P^{(\mathrm{ex})} in a power-spectral value is independent of K, the relative sizes of the overall fluctuations in the sums \Delta S_{p,q} go to zero as 1/\sqrt{K}. Since K is typically a very large number, this first-order approximation is very good.

Appendix E: Error Bars for the Aliased Lorentzian
The error bars on f_c and D are calculated as before, using Eq. (73) for the variance:

\sigma^2(f_c) = \frac{f_{\mathrm{sample}}^2\,A^2}{4\pi^2(A^2-B^2)}\left[\frac{\langle(\Delta A)^2\rangle}{A^2} + \frac{\langle(\Delta B)^2\rangle}{B^2} - \frac{2\,\langle\Delta A\,\Delta B\rangle}{AB}\right]   (E1)

and

\sigma^2(D) = (\partial_A D)^2\langle(\Delta A)^2\rangle + (\partial_B D)^2\langle(\Delta B)^2\rangle + 2\,\partial_A D\,\partial_B D\,\langle\Delta A\,\Delta B\rangle,   (E2)

where

\partial_A D = \frac{D}{A}\left(\frac{A}{u\sqrt{A^2-B^2}} - 1 - \frac{B^2}{A^2-B^2}\right),   (E3)
\partial_B D = \frac{D}{B}\left(\frac{B^2}{A^2-B^2} - \frac{A}{u\sqrt{A^2-B^2}}\right),   (E4)

and u is given in Eq. (C4). The covariance matrix is calculated as before, giving

\mathrm{cov}(A,B) \equiv \begin{pmatrix}\langle(\Delta A)^2\rangle & \langle\Delta A\,\Delta B\rangle\\ \langle\Delta A\,\Delta B\rangle & \langle(\Delta B)^2\rangle\end{pmatrix} \approx \frac{n+3}{n(n+1)K}\,\tilde{\mathbf{R}}^{-1},   (E5)

where \tilde{\mathbf{R}} is a matrix with the same structure as \mathbf{R}, see Eq. (C7), but with the experimental values \bar P^{(\mathrm{ex})} replaced by the theoretical (in practice, the fitted) values P_k in all the statistics of Eq. (C10).

We tested the above analytical results against simulations: multiple independent position time series for a massless particle diffusing in a harmonic potential were created using the methods given in [9]. The resulting power spectra were fitted using Eqs. (C2)-(C10). Since we simulate a stochastic process, there is scatter in the fitted parameters, and it is this scatter that we compare to the results given in Eqs. (E1) and (E2). The results are shown in Figs. 7 and 8: Fig. 8 shows the expected 1/\sqrt{N} scaling, whereas Fig. 7 shows a \sqrt{n+3}/\sqrt{n+1} scaling.

Figure 9 shows that the optimal tradeoff between precision and amount of data acquired seems to manifest itself at a sampling frequency roughly eight times the corner frequency. Sampling any slower than this leads to large errors in both parameters, because the PSD then essentially reduces to the ratio of D to f_c^2, i.e., two parameters are used to fit a single constant.
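The factor (n+3)/(n+1) above originates in the moments of the gamma-distributed averaged spectrum (see note [17]). The underlying variance identity, in the normalization of our derivation, is that for \bar P \sim \Gamma(n, P/n) the fluctuation X = \bar P - n\bar P^2/((n+1)P) entering the linearized estimators has zero mean and variance P^2(n+3)/(n(n+1)). A quick Monte Carlo check:

```python
import numpy as np

# For an n-fold averaged, exponentially distributed spectral value,
# Pbar ~ Gamma(shape=n, scale=P/n), the fluctuation entering the linearized
# estimators, X = Pbar - n*Pbar**2/((n+1)*P), has zero mean and variance
# P**2*(n+3)/(n*(n+1)) -- the source of the (n+3)/(n+1) dependence above.
rng = np.random.default_rng(1)
n, P = 4, 1.0
Pbar = rng.gamma(shape=n, scale=P/n, size=2_000_000)
X = Pbar - n*Pbar**2/((n + 1)*P)
```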
Sampling much faster than f_c, while keeping the number of acquired data points fixed, has no effect on the error on D, but it is detrimental for f_c, since there is progressively less information about f_c at larger frequencies. The effect on the precision of increasing n is small but positive; compared to the effects of N and f_{\mathrm{sample}}, it can be ignored (after it has been accounted for as described in Eqs. (C8) and (C9)).

For comparison, the error bars on the fit parameters from a least-squares fit, as functions of the cut-off frequency f_{\mathrm{max}}, are shown in Fig. 2: the average of n = 16 synthetic PSDs was fitted by minimizing Eq. (23), with the data points weighted by the standard deviation of the n PSDs. Also shown is an LSQ fit in which the weights are kept constant; this is the kind of fitting performed by primitive LSQ routines. For a detailed discussion of how the stochastic error depends on the fitting range [f_{\mathrm{min}}, f_{\mathrm{max}}], the reader is referred to Section VIII of [9].

Appendix F: Bias of the MLEs
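The bias analyzed in this appendix, that of the linearized MLE, should not be confused with the much larger weighting-induced bias of least-squares fitting discussed in the main text. The latter is easy to reproduce; the following minimal illustration (our own construction, not an equation from the paper) fits a flat spectrum of true level 1 by least squares with the common "weight by the data" recipe:

```python
import numpy as np

# A flat spectrum of true level 1, averaged over n periodograms, fitted by
# least squares with data-derived weights: minimizing
# sum((Pbar_k - C)**2 / Pbar_k**2) over the level C gives
# C = sum(1/Pbar)/sum(1/Pbar**2), whose expectation is (n - 2)/n, not 1.
rng = np.random.default_rng(3)
n, K = 10, 100_000
Pbar = rng.gamma(shape=n, scale=1.0/n, size=K)   # n-fold averaged spectral values
C = np.sum(1/Pbar)/np.sum(1/Pbar**2)             # data-weighted LSQ level estimate
print(C)   # close to (n - 2)/n = 0.8, i.e. biased low by 2/n
```

For n = 10 averaged spectra this particular weighting underestimates the amplitude by about 20 percent, which is the order of magnitude of calibration bias quoted in the abstract.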
Obviously, we must be paying a price somewhere for turning a non-linear problem into a linear one with our little trick; otherwise we would have turned a non-linear problem into an exactly solvable mathematical problem. That is sometimes done, but not here: the trick works through an approximation, and the resulting approximate estimator is biased, which means that its expectation value differs from the true value of the quantity it estimates. So on average it misses the correct result; bias is systematic error on averages. That is the nature of the price we pay. Fortunately, the bias is negligible in size, as we now demonstrate.

For the non-aliased Lorentzian and the AFM we find, by expanding Eq. (D1) to first order in \Delta\vec v and to second order in \Delta S_{p,q},

\langle\Delta\vec v\rangle = \begin{pmatrix}\langle\Delta a\rangle\\ \langle\Delta b\rangle\\ \langle\Delta c\rangle\end{pmatrix} = \langle\mathbf{S}\rangle^{-1}\left[\left\langle\Delta\mathbf{S}\,\langle\mathbf{S}\rangle^{-1}\Delta\mathbf{S}\right\rangle\vec v - \frac{n+1}{n}\left\langle\Delta\mathbf{S}\,\langle\mathbf{S}\rangle^{-1}\Delta\vec s\right\rangle\right].   (F1)

For the non-aliased Lorentzian this expression can be reduced to

\begin{pmatrix}\langle\Delta a\rangle\\ \langle\Delta b\rangle\end{pmatrix} = \frac{2(n+2)}{n(n+1)\,K\left(\tilde S_{0,2}\tilde S_{4,2} - \tilde S_{2,2}^2\right)^2}\begin{pmatrix}\tilde S_{4,2} & -\tilde S_{2,2}\\ -\tilde S_{2,2} & \tilde S_{0,2}\end{pmatrix}\begin{pmatrix}\tilde S_{4,2}\tilde S_{0,3} - 2\tilde S_{2,2}\tilde S_{2,3} + \tilde S_{0,2}\tilde S_{4,3}\\ \tilde S_{4,2}\tilde S_{2,3} - 2\tilde S_{2,2}\tilde S_{4,3} + \tilde S_{0,2}\tilde S_{6,3}\end{pmatrix},   (F2)

where

\tilde S_{p,q} = \frac{1}{K}\sum_f f^p\left\langle\bar P^{(\mathrm{ex})}\right\rangle^q = \frac{1}{K}\sum_f f^p P_f^q.   (F3)

That is, the bias is inversely proportional to the number K of fitted spectral values, hence very small, and it displays only a weak n-dependence. From these expressions we find the bias on f_c and D to be

\langle\Delta f_c\rangle = \frac{f_c}{2}\left(\frac{\langle\Delta a\rangle}{a} - \frac{\langle\Delta b\rangle}{b}\right),   (F4)
\langle\Delta D\rangle = -D\,\frac{\langle\Delta b\rangle}{b}.   (F5)

We only know the true values of f_c and D in simulations; in an experiment, the best estimate of the true value would be the fitted value.

For the aliased Lorentzian we find, in an analogous manner,

\langle\Delta f_c\rangle = \frac{f_{\mathrm{sample}}\,A}{2\pi\sqrt{A^2-B^2}}\left[\frac{\langle\Delta A\rangle}{A} - \frac{\langle\Delta B\rangle}{B}\right],   (F6)

\langle\Delta D\rangle = -D\left[\left(\frac{B^2}{A^2-B^2} - \frac{A}{u\sqrt{A^2-B^2}}\right)\left(\frac{\langle\Delta A\rangle}{A} - \frac{\langle\Delta B\rangle}{B}\right) + \frac{\langle\Delta A\rangle}{A}\right],   (F7)

where \langle\Delta A\rangle and \langle\Delta B\rangle are found from Eq. (F1) by everywhere replacing \tilde S with \tilde R. The result of a numerical test of the above relations is shown in Fig. 10.
FIG. 10: Bias of the parameter estimates for f_c (grey squares) and D (white circles), from ML fits, with our trick, of the aliased Lorentzian. Error bars are standard errors on the mean. The point of this figure is only to show that the bias is indeed very small and can be completely ignored. The theoretical expectation values for the bias, as given by Eq. (F6) (grey line) and Eq. (F7) (black line), are less than 1/1,000 of a percent for these settings. Bias was measured as the difference between the average of 1,000 determinations of the fit parameters and the known input values f_c = 500 Hz and D = 0. \mu m^2/s. Simulations were run with N = 2 and f_sample = 2 Hz, values chosen to minimize the stochastic errors. Data were treated with n non-overlapping Hann windows before calculation of the PSD.

[1] K. C. Neuman and S. M. Block, Rev. Sci. Instrum. 75, 2787 (2004).
[2] K. C. Neuman, T. Lionnet, and J.-F. Allemand, Annu. Rev. Mater. Res. 37, 33 (2007).
[3] K. C. Neuman and A. Nagy, Nat. Methods 5, 491 (2008).
[4] J. R. Moffitt, Y. R. Chemla, S. B. Smith, and C. Bustamante, Annu. Rev. Biochem. 77, 205 (2008).
[5] T. T. Perkins, Laser & Photon. Rev. 3, 203 (2009).
[6] J. L. Hutter and J. Bechhoefer, Rev. Sci. Instrum. 64, 1868 (1993).
[7] D. Walters et al., Rev. Sci. Instrum. 67, 3583 (1996).
[8] J. Sader, J. Appl. Phys. 84, 64 (1998).
[9] K. Berg-Sørensen and H. Flyvbjerg, Rev. Sci. Instrum. 75, 594 (2004).
[10] S. F. Tolić-Nørrelykke et al., Rev. Sci. Instrum. 77, 103101 (2006).
[11] K. Berg-Sørensen, L. Oddershede, E.-L. Florin, and H. Flyvbjerg, J. Appl. Phys. 93, 3167 (2003).
[12] K. Berg-Sørensen et al., Rev. Sci. Instrum. 77, 063106 (2006).
[13] E. Schäffer, S. F. Nørrelykke, and J. Howard, Langmuir 23, 3654 (2007).
[14] A. C. Aitken, Proc. R. Soc. Edinb. 55, 42 (1935).
[15] C. R. Rao, Linear Statistical Inference and Its Applications (Wiley, New York, 1973).
[16] N. C. Barford,
Experimental Measurements: Precision, Error and Truth, 2nd ed. (John Wiley & Sons, 1990).
[17] Here and below we use the following results, which are easy to verify by direct calculation or by consulting a statistics textbook: if two independent stochastic variables are normally distributed, X, Y \sim N(0, \sigma^2), then X^2, Y^2 \sim \Gamma(1/2, 2\sigma^2), and the sum of their squares is exponentially distributed, Z = X^2 + Y^2 \sim \Gamma(1, 2\sigma^2) = E(2\sigma^2). Finally, averaging over n independent, exponentially distributed variables returns a gamma-distributed variable, \frac{1}{n}\sum_{i=1}^{n} Z_i \sim \Gamma(n, 2\sigma^2/n).