Power spectrum analysis with least-squares fitting: Amplitude bias and its elimination, with application to optical tweezers and atomic force microscope cantilevers
Simon F. Nørrelykke
Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany, and Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA.

Henrik Flyvbjerg
Department of Micro- and Nanotechnology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark.
(Dated: October 22, 2018)

Optical tweezers and AFM cantilevers are often calibrated by fitting their experimental power spectra of Brownian motion. We demonstrate here that if this is done with typical weighted least-squares methods, the result is a bias of relative size between −2/n and +1/n on the value of the fitted diffusion coefficient. Here n is the number of power spectra averaged over, so typical calibrations contain 10–20% bias. Both the sign and the size of the bias depend on the weighting scheme applied. Hence, so do length-scale calibrations based on the diffusion coefficient. The fitted value for the characteristic frequency is not affected by this bias. For the AFM, then, force measurements are not affected provided an independent length-scale calibration is available. For optical tweezers there is no such luck, since the spring constant is found as the ratio of the characteristic frequency and the diffusion coefficient.

We give analytical results for the weight-dependent bias for the wide class of systems whose dynamics is described by a linear (integro-)differential equation with additive noise, white or colored. Examples are optical tweezers with hydrodynamic self-interaction and aliasing, calibration of Ornstein-Uhlenbeck models in finance, models for cell migration in biology, etc. Because the bias takes the form of a simple multiplicative factor on the fitted amplitude (e.g. the diffusion coefficient), it is straightforward to remove, and the user will need minimal modifications to his or her favorite least-squares fitting programs.

Results are demonstrated and illustrated using synthetic data, so we can compare fits with known true values. We also fit some commonly occurring power spectra once and for all, in the sense that we give their parameter values and associated error bars as explicit functions of experimental power-spectral values.
PACS numbers:
Keywords: least-squares fitting, maximum likelihood estimation, power-spectral analysis, bias, optical tweezers, AFM, atomic force microscope, colored noise, gamma distribution, error-bars, goodness of fit
I. INTRODUCTION
Optical tweezers (OTs) are often calibrated by fitting the power spectrum of a trapped bead's Brownian motion with a Lorentzian [1–5]. Similarly, atomic force microscopes (AFMs) are sometimes calibrated by fitting the power spectrum of the cantilever's Brownian motion with the power spectrum of a damped harmonic oscillator [6–8]. These fits are routinely done by least-squares (LSQ) minimization, the premises of which are rarely satisfied in practice. Here we do Maximum Likelihood Estimation (MLE) of parameters, the premises of which are satisfied, and give the parameter values, with error bars, as explicit functions of the experimental power spectrum in an excellent approximation. These closed-formula results should be useful for on-line calibration, since that requires high-speed determination of parameters. They also demonstrate that typical calibrations using least-squares fitting contain 10–20% systematic errors, also known as bias. We find quite generally that both the sign and the size of the bias depend on the details of how a least-squares fit is implemented through the choice of how the data points are weighted. Analytical expressions for the bias are given, and examples of the behavior of the stochastic fit errors are examined numerically.

These results apply beyond the examples given here, since the same bias occurs in all systems described by a linear (integro-)differential equation with additive noise. Thus, using the correction factors given in Eqs. (32) and (41), it is possible to obtain correct, unbiased estimates for the fit parameters without performing a computationally expensive MLE. Instead, one simply adjusts the results of an ordinary least-squares fit.

We also show that the bias found for least-squares fits appears under fairly general conditions, independent of the details of the function that is fitted, but depending only on the distribution of the error on the data.
Analytical correction factors are given for the examples of Gaussian, exponential, and gamma-distributed errors. Finally, a general criterion is given for when one may expect a least-squares fit to be biased: when the experimental mean is correlated with the experimental variance. Conversely, if they are uncorrelated, the fit is unbiased.

The paper is organized as a main text followed by six appendices that contain proofs and other technical details.

II. DYNAMICS
The equation of motion for a massive particle moving in a harmonic potential under the influence of thermal forces is

m ẍ(t) + γ ẋ(t) + κ x(t) = F_therm(t) .   (1)

Here x(t) is the coordinate of the particle as a function of time t, m its inertial mass, γ its friction coefficient, κ is Hooke's constant, and F_therm is the thermal force on the particle. This force is random, and assumed to have white-noise statistical properties,

⟨F_therm(t)⟩ = 0 ,
⟨F_therm(t) F_therm(t′)⟩ = 2 k_B T γ δ(t − t′) , for all t, t′,   (2)

where δ is Dirac's delta function, k_B T the Boltzmann energy, and ⟨·⟩ is the expectation value with respect to the noise. This theory's power spectrum of thermal motion is derived in [9].

III. POWER SPECTRA
Optical tweezers should be calibrated using the hydrodynamically correct power spectrum given in [9, 10], possibly taking into account a number of effects listed there and further detailed in [11–13]. These effects include: (i) the frequency dependence of the friction coefficient, due to hydrodynamic self-coupling; (ii) dependence of the friction coefficient on distance to nearby surfaces, due to hydrodynamic coupling; (iii) extra 1/f power at low frequencies, caused by the laser pointing stability, hydrodynamic self-coupling, etc.; and (iv) optical interference effects, caused by a standing wave between the trapped object and the nearby microscope cover-slip surface, when determining the displacement sensitivity (volt to nanometer conversion factor). Generally, the most crucial step, where the largest systematic errors are likely to appear, is the determination of the displacement sensitivity [10]. However, as also described in [9], there are situations in which an acceptable approximation is achieved by fitting a Lorentzian,

P_f = (D/2π²) / (f_c² + f²) ,   (3)

to an average P̄^(ex)_f of, say, n two-sided experimental power spectra for the Brownian motion of a trapped microsphere. (Occasionally confusion arises over extra or missing factors of two in PSDs; this is down to the use of one-sided, f > 0, versus two-sided, f ≷ 0, PSDs. As long as the total power is contained in the chosen frequency range, the two approaches are equally correct.) The notation used is essentially the same as in [9], where the aliased Lorentzian is also derived, i.e., f_c = κ/(2πγ) and D = k_B T/γ.

Similarly, AFM cantilevers are sometimes calibrated by fitting a frequency interval around the resonance peak [8] using

P_f = (D/2π²) / [ (2πm/γ)² (f_0² − f²)² + f² ]   (4)
    = (D/2π²) / [ (Q/f_0)² (f_0² − f²)² + f² ] ,   (5)

to a similar average of experimental power spectra for a cantilever's Brownian motion. Here, the characteristic frequency f_0² = κ/(4π²m) and the quality factor Q = √(mκ)/γ. Describing the AFM cantilever as a simple harmonic oscillator is rigorously correct for high quality factors Q ≫ 1, e.g. in air where dissipative forces are small [8]. In water, this description also works, at least for short stiff cantilevers, but now the drag coefficient is frequency dependent [7]. It is not our aim here to review the expansive literature on AFM calibration; the only point we seek to make is that if the amplitude of the power spectrum is involved, systematic fitting errors are typically present, as shown below.

Both these theoretical power spectra follow from the Einstein-Ornstein-Uhlenbeck theory for the Brownian motion of a damped harmonic oscillator in one dimension; see Eq. (1) and [9], where the parameters in Eqs. (3) and (4) are also defined. In Appendix B we give the results for the aliased AFM power spectral density (PSD).

Least-squares fits of power spectra are biased, as shown below in Section V, irrespective of whether one applies the above simplified theory or a more complete one that takes into account aliasing, hydrodynamics, electronic filters, etc. This bias will be on the amplitude only and will not affect shape parameters, i.e., D will be biased, whereas f_c, f_0, and Q will not.

IV. STATISTICAL PROPERTIES OF EXPERIMENTAL POWER SPECTRUM
Here, and throughout the rest of the paper, we differentiate between theoretical expectation values, indicated by brackets, ⟨·⟩, and experimental averages, indicated by a bar. Following the notation of [9], we write the thermal force in Eq. (2) in terms of a normalized white-noise process η(t),

F_therm(t) = √(2 k_B T γ) η(t) ,   (6)

whose Fourier transform obeys

⟨η̃_k⟩ = 0 ;  ⟨η̃*_k η̃_ℓ⟩ = t_msr δ_{k,ℓ} ,   (7)

where k, ℓ = −N/2, …, N/2. Since η(t) is a white-noise process, hence temporally uncorrelated, Re η̃_k and Im η̃_k are two mutually independent random variables, uncorrelated for different k > 0, with Gaussian distributions by virtue of the Central Limit Theorem, or, equivalently, by virtue of η(t) being the first derivative of a Wiener process with respect to time. Consequently, the sum of their squares, |η̃_k|², is a non-negative random variable, uncorrelated for different k > 0, with exponential distribution [17]. Hence, so are the experimental values P^(ex)_f of the power spectrum for f = k/t_msr > 0.

The theory predicts not only the expectation value P_f for the experimental spectrum P^(ex)_f, as given in Eq. (4). It predicts also the distribution from which the experimental power spectrum is "drawn": the power-spectral value P^(ex)_f at each frequency f is an independent random number, drawn from an exponential distribution with expectation value P_f,

p(P^(ex)_f ; P_f) = (1/P_f) exp(−P^(ex)_f / P_f) .   (8)

Consequently,

⟨P^(ex)_f⟩ = P_f ,   (9)
σ(P^(ex)_f) = ⟨(P^(ex)_f − P_f)²⟩^{1/2} = P_f ,   (10)

and the signal-to-noise ratio ⟨P^(ex)_f⟩/σ(P^(ex)_f) equals one. This is why we average over n experimental spectra,

P̄^(ex)_f ≡ (1/n) Σ_{i=1}^n P^(ex)_{f,i} ,   (11)

before plotting and fitting: to reduce noise.

If the n spectra are statistically independent, as is the case if they are computed from data taken in non-overlapping time intervals, then we have, unchanged, that

⟨P̄^(ex)_f⟩ = P_f ,   (12)

but

σ(P̄^(ex)_f) = σ(P^(ex)_f)/√n = P_f/√n ,   (13)

and P̄^(ex)_f is distributed according to a distribution that is the convolution of n identical exponential distributions, viz. the gamma distribution:

p_n(P̄^(ex)_f ; P_f) = (P̄^(ex)_f)^{n−1} (n/P_f)^n / Γ(n) · exp(−n P̄^(ex)_f / P_f) ,   (14)

where n is the shape parameter, P_f/n the scale parameter, and Γ(n) = (n−1)!. The mean, P_f, is the product of the shape and the scale parameters, the mode is P_f (n−1)/n, the variance, P_f²/n, is the product of the mean and the scale, and the skewness is 2/√n.
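The exponential statistics of Eqs. (8)–(13) are easy to verify numerically. The sketch below is not from the paper; it assumes Gaussian white noise, NumPy's FFT conventions, and arbitrary illustrative values for N and n. It checks that a single periodogram has signal-to-noise ratio one, and that averaging n spectra improves it by √n.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4096, 16              # samples per record, number of averaged spectra

# n independent records of Gaussian white noise, one periodogram each
x = rng.standard_normal((n, N))
P = np.abs(np.fft.rfft(x, axis=1))**2 / N    # periodogram, arbitrary units
P = P[:, 1:-1]               # drop DC and Nyquist bins (different statistics)

# Single spectrum: exponential statistics, so <P>/sigma(P) = 1, Eq. (10)
snr1 = P[0].mean() / P[0].std()

# Average of n spectra: sigma drops by sqrt(n), Eq. (13)
Pbar = P.mean(axis=0)
snrn = Pbar.mean() / Pbar.std()

print(round(snr1, 2), round(snrn / np.sqrt(n), 2))   # both close to 1.0
```

Because the true spectrum of white noise is flat, all frequency bins are identically distributed and can be pooled to estimate the per-bin mean and standard deviation.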
From the known distribution it is easy to show that the q-th moment of P̄^(ex)_f is

⟨(P̄^(ex)_f)^q⟩ = Γ(n + q)/(n^q Γ(n)) · P_f^q ,   (15)

a result that we shall need later.

In the limit n → ∞ this distribution approaches a Gaussian by courtesy of the Central Limit Theorem, but it is a slow approach, because the starting point, the exponential distribution, is highly skewed. For moderate values of n, p_n(P̄^(ex)_f ; P_f) is far from Gaussian, see Fig. 1. It is also quite skewed, but this is not the source of the bias. The bias is caused by the correlation between the experimental mean, P̄^(ex)_f, and the experimental variance,

s_n²(P̄^(ex)_f) ≡ (1/(n−1)) Σ_{i=1}^n (P^(ex)_{f,i} − P̄^(ex)_f)² ,   (16)

as discussed in greater detail below. The latter is easily shown to satisfy ⟨s_n²(P̄^(ex)_f)⟩ = n σ²(P̄^(ex)_f) = P_f².

FIG. 1: The gamma distribution, Eq. (14), as a function of P̄^(ex)_f/P_f for various values of the shape parameter n: thick full line, n = 1; full line, n = 4; thin full line, n = 16. Also shown is the Gaussian limit value, plotted here with a variance of 1/n = 1/16 (dashed line).
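Eq. (15) can be spot-checked by Monte Carlo. A minimal sketch of our own (not part of the paper), assuming only that single-spectrum values are exponentially distributed with mean P_f, and using arbitrary values n = 16, P_f = 2:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
n, Pf = 16, 2.0

# many realizations of the n-spectrum average at a single frequency
Pbar = rng.exponential(Pf, size=(200_000, n)).mean(axis=1)

for q in (-2, -1, 1, 2):
    mc = np.mean(Pbar**q)                                # Monte Carlo moment
    theory = gamma(n + q) / (n**q * gamma(n)) * Pf**q    # Eq. (15)
    print(q, round(mc / theory, 3))                      # each ratio near 1.0
```

The negative moments q = −1, −2 are the ones that enter the bias factors derived in Section V.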
A. Ubiquity of the gamma-distribution
As a matter of fact, all systems described by a linear (integro-)differential equation with additive noise, be it white or colored, have power-spectral values that are described by the statistics in Eq. (14). To see that white and colored noise lead to the same statistics, all we need to realize is that the power spectral density of a colored noise is the product of the power spectral density of a white noise and some function F(f) that describes the color of the noise (a filter function): quite generally, if the dynamics of x is given by a polynomial Q in d/dt with constant coefficients,

Q(d/dt) x(t) = ψ(t) ,   (17)

where ψ is a colored noise, then the power spectrum

P^(ex)_f(x) = |ψ̃_f|² / |Q(−i2πf)|² = |η̃_f|² F(f) / |Q(−i2πf)|²   (18)

is seen to be the product of the power spectrum of a white noise η and some function of f. Thus, for a given frequency f, the statistics of the power-spectral value is determined by the statistics of white noise.

V. LEAST-SQUARE FITTING
As alluded to in the previous section, one effect of the exponential distribution of P^(ex)_{f,i} is that weighted least-squares fitting to P̄^(ex)_f does not follow the textbook behavior and returns a result that is systematically wrong, or biased. In the following we remind the reader of the rather narrow set of circumstances under which least-squares fitting works. We then relax the requirements and see what effect this has on the fit results.

Weighted least-squares (WLS) fitting minimizes

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² w_i²   (19)

with respect to fit parameters θ. Here, y_i = f_i + e_i are the experimental data (dependent variable), e_i errors, f_i the fit function evaluated at i (the independent variable), and w_i are weights. It is assumed that the experimental data are uncorrelated and that the weights are uncorrelated with the experimental data. If, in addition, the weights are also independent of i, and thus constant, weighted least squares reduces to ordinary least squares (OLS). When the theoretical model, f_i, is a linear function of the fit parameters, θ, both OLS and WLS minimization are known to return parameter estimates that are BLUE (best linear unbiased estimates). The former was first shown by Gauss in 1829, whereas the latter was shown by Aitken in 1935 [14]. Specifically, in WLS the errors do not have to be independent, just uncorrelated, and in OLS they do not have to come from the same distribution; they are only assumed to have the same variance (homoscedasticity).

Things turn particularly simple when the experimental data are Gaussian distributed. We can then choose y_i, e.g., as the average of several measurements, and the weights, w_i, as the reciprocal of the experimental standard deviation. With this choice, y_i and w_i are independent, because the sample mean and sample variance calculated from independent, identically Gaussian distributed random variables are statistically independent. In this Gaussian scenario it is possible to assign meaningful confidence intervals to the fit parameters and calculate the goodness of the fit, also known as its support or p-value, and this is why one is sometimes tempted to assume that data are Gaussian even when that is not quite the case. Finally, with Gaussian data MLE and WLS are mathematically identical, as is easily seen by inserting a Gaussian in p_n's place in Eq. (50) and deriving the corresponding cost function, which is simply χ².

We now ask what the effect is on the estimate of the fit parameters if the data are not Gaussian, the weights are not independent of the experimental values, and the model is a non-linear function of the fit parameters. All of these circumstances are realized when fitting power spectra with the statistics given in the previous section.
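The contrast between the Gaussian case and the power-spectral case can be made concrete with a small simulation of our own making: for Gaussian samples the sample mean and sample variance are statistically independent, while for exponential samples, the single-spectrum statistics of Eq. (8), they are strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 16, 100_000

def mean_var_corr(samples):
    # correlation between sample mean and sample variance across repetitions
    m = samples.mean(axis=1)
    v = samples.var(axis=1, ddof=1)
    return np.corrcoef(m, v)[0, 1]

c_gauss = mean_var_corr(rng.standard_normal((reps, n)))
c_expo = mean_var_corr(rng.exponential(1.0, size=(reps, n)))

print(round(c_gauss, 2))   # near 0.0: uncorrelated (in fact independent)
print(round(c_expo, 2))    # clearly positive: strongly correlated
```

This correlation is exactly the ingredient that, per the criterion derived below, turns weighting by experimental scatter into a biased fit.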
A. Some analytical results for bias in least-squares fitting
An estimator, θ̂, is biased if its expectation value differs from the true value, ⟨θ̂⟩ ≠ θ*, of the quantity it is an estimator for. Below, we show that some common least-squares estimators for power spectra are biased.

At the minimum of the χ² function given in Eq. (19), the first derivative with respect to the fit parameters is zero, and the stationarity condition thus reads

Σ_{i=1}^N [y_i − f_i] w_i² ∂_θ f_i = Σ_{i=1}^N [y_i − f_i]² w_i ∂_θ w_i .   (20)

Notice that, in order to be completely general, we have allowed the weights to depend on the fit parameters. We now treat a few specific, but universal, scenarios one at a time.
1. Experimental standard deviations as weights
Proceeding as usual with classic WLS, we pick the weights to be inversely proportional to the experimental estimate for the standard deviation. This estimate could be the known uncertainty from the experimental apparatus, but often it is estimated simply as

s_{n,i} = √[ (1/(n−1)) Σ_{j=1}^n (y_{i,j} − ȳ_i)² ] ,   (21)

where n is the number of times the experiment is repeated with the independent variable set to its i-th value. The experimental average (sample mean)

ȳ_i = (1/n) Σ_{j=1}^n y_{i,j}   (22)

is our estimate for the expectation value of y_i. Doing this for each i,

χ²(θ) = Σ_{i=1}^N [ȳ_i − f_i(θ)]² s_{n,i}^{−2} .   (23)

The stationarity conditions then reduce to

Σ_{i=1}^N (ȳ_i / s²_{n,i}) ∂_θ f_i = Σ_{i=1}^N (1/s²_{n,i}) f_i(θ) ∂_θ f_i ,   (24)

the solution of which gives us our estimate, θ̂[y], for θ for a given data set y = (y_{i,j})_{i=1,…,N; j=1,…,n}. Note, because N and n are finite, the estimate for θ resulting from Eq. (24) is a stochastic quantity and will vary from one experimental realization to another.

To calculate the bias of the estimator for given n, we need to find its expectation value for that n. We do this in two steps: first, we let the number of data points N → ∞, while keeping n and the number of fit parameters fixed, as in an infinite experiment. In this limit the fit returns an estimate θ*_n of θ that is no longer a fluctuating quantity, but may still depend on n. So, for N = ∞, Eq. (24) reads

Σ_{i=1}^∞ (ȳ_i / s²_{n,i}) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ (1/s²_{n,i}) f_i(θ*_n) ∂_θ f_i(θ*_n) .   (25)

Note that the experimental values, ȳ_i and s_{n,i}, are still fluctuating quantities described by the same statistics as before, whereas f_i(θ*_n) and ∂_θ f_i(θ*_n) are not fluctuating, because they are functions of non-fluctuating variables. So, when in the next step we take the expectation value of Eq. (25), only ȳ_i and s_{n,i} are affected,

Σ_{i=1}^∞ ⟨ȳ_i / s²_{n,i}⟩ ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨1/s²_{n,i}⟩ f_i(θ*_n) ∂_θ f_i(θ*_n) ,   (26)

which is solved by

f_i(θ*_n) = ⟨ȳ_i / s²_{n,i}⟩ / ⟨1/s²_{n,i}⟩ , for all i = 1, …, ∞ ,   (27)

if a parameter set θ*_n exists which solves all these many equations simultaneously. Provided that we know the distribution of y_i, we can calculate the expectation values on the right-hand side of this equation and thus determine whether the fit is biased. For a start, note that if the sample mean and the reciprocal sample variance are uncorrelated, then the numerator on the right-hand side factorizes to ⟨ȳ_i⟩⟨1/s²_{n,i}⟩. Thus the right-hand side equals ⟨ȳ_i⟩, i.e., the fit is unbiased. So it is sufficient that mean and reciprocal variance are uncorrelated to ensure an unbiased fit.

It turns out that it is also a necessary condition, if there is no redundancy in the parameterization of f_i by θ, i.e., if their relationship is one-to-one. This is the only sensible way to parameterize a function, and easily achieved by eliminating a possible redundancy through reparameterization. Assuming this one-to-one relationship, we use that f_i(θ*) = ⟨ȳ_i⟩ and Eq. (27) to write

f_i(θ*) = [ ⟨ȳ_i⟩ ⟨1/s²_{n,i}⟩ / ⟨ȳ_i / s²_{n,i}⟩ ] f_i(θ*_n) , for all i = 1, …, ∞ .   (28)

We now see that the fit is unbiased if and only if the term in square brackets in Eq. (28) equals unity.

In summary, a necessary and sufficient criterion for least-squares fitting to be unbiased is that the sample mean and reciprocal sample variance are uncorrelated. Note that 'uncorrelated' is a weaker requirement than 'independent', and although the latter implies the former, the reverse is not generally true.
Naturally, when the sample mean and sample variance are independent, the sample mean and reciprocal sample variance are also uncorrelated.

Notice that we nowhere used what s_{n,i} is, and in fact a more general version of Eq. (28) is

f_i(θ*) = [ ⟨y_i⟩ ⟨w_i²⟩ / ⟨y_i w_i²⟩ ] f_i(θ*_n) ,   (29)

where w_i is assumed independent of θ but is otherwise unconstrained. The criterion for an unbiased estimator is now simply that the data, y_i, and the squared weights, w_i², are uncorrelated.

Under some circumstances it is possible to determine the bias of the fit parameters from the bias of the fit function given in Eq. (29). If the function is invertible, this is trivially the case. But it is also possible if the amplitude of the fit function is determined by just one of the fit parameters and the term in the square brackets is independent of i. In this case, bias can be removed from the parameter estimate, θ̂, by simply multiplying the amplitude parameter by the square bracket and leaving the other parameters unchanged. In the next section we give an example of this.
2. Experimental averages as weights
If the sample standard deviation is proportional to the sample mean, s_{n,i} ∝ ȳ_i for each i, the mean and variance are clearly not uncorrelated, and we expect the parameter estimate to be biased. From Eq. (28) we find

f_i(θ*) = [ ⟨ȳ_i⟩ ⟨ȳ_i^{−2}⟩ / ⟨ȳ_i^{−1}⟩ ] f_i(θ*_n) .   (30)

As a real-world example, consider gamma-distributed experimental data. We already mentioned that the gamma distribution describes all power spectra resulting from linear dynamical equations driven by an additive noise, white or colored. Carrying on as before, we use the standard deviation to weight the data, w_f = 1/s_n(P̄^(ex)_f), where f is the frequency. Since the standard deviation and the mean are proportional, see Eq. (13), we have w_f ∝ 1/P̄^(ex)_f. Using P̄^(ex)_f's known gamma distribution, see Eq. (14), we can compute ⟨(P̄^(ex)_f)^{−q}⟩ for q = 1, 2, … and find the value of the bracketed bias term in Eq. (30),

[ ⟨P̄^(ex)_f⟩ ⟨(P̄^(ex)_f)^{−2}⟩ / ⟨(P̄^(ex)_f)^{−1}⟩ ] = n/(n − 2) .   (31)

That is, weighted least-squares fitting of this wide class of power spectra has a built-in, frequency-independent, multiplicative bias of (n − 2)/n for n > 2. The true power-spectral form can then be obtained from the least-squares fit as

P_f(θ*) = n/(n − 2) · P_f(θ*_n) .   (32)

In this scenario, the theoretical N → ∞ limit that we took in going from Eq. (24) to Eq. (25) corresponds to letting the measurement time (or sampling frequency) become infinite while keeping the sampling frequency (or measurement time) and all other experimental factors fixed.

In the simple case of a Lorentzian, Eq. (3), we have θ* = (D, f_c) and

P_f(θ*) ≡ (D/2π²) / (f_c² + f²) = n/(n − 2) · (D*_n/2π²) / ((f*_{c,n})² + f²) ,   (33)

from which we see that f*_{c,n} is unbiased. In other words, the true values of the fit parameters can be obtained from the WLS estimates as

D = n/(n − 2) · D*_n ≈ n/(n − 2) · D^(lsq) ,   (34)
f_c = f*_{c,n} ≈ f_c^(lsq) ,   (35)

where D^(lsq) and f_c^(lsq) are the stochastically fluctuating values returned by a least-squares fit to a finite-N data set. This is a general feature of the power spectra: they can be written as the product of an overall multiplicative scale factor and a shape function depending on f, where only the scale factor is influenced by the bias.
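The n/(n − 2) factor is easy to reproduce in a simulation of our own making: exponentially distributed synthetic "spectra" with a flat true spectrum, a one-parameter amplitude fit, and the experimental average as weight. All parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, Nfreq, Pf = 8, 5000, 1.0

# averaged spectrum: n exponential draws per frequency bin, flat true spectrum
ybar = rng.exponential(Pf, size=(n, Nfreq)).mean(axis=0)

# WLS fit of a single amplitude A with weight w = 1/ybar (experimental
# average as weight): minimizing sum((ybar - A)^2 / ybar^2) over A gives
A_fit = np.sum(1.0 / ybar) / np.sum(1.0 / ybar**2)

print(round(A_fit, 2))                 # biased low, near (n - 2)/n = 0.75
print(round(A_fit * n / (n - 2), 2))   # corrected as in Eq. (32), near Pf = 1
```

With n = 8 the raw fit underestimates the amplitude by about 25%, and multiplying by n/(n − 2) recovers the true value to within the stochastic error.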
3. Theoretical values as weights
If the sample standard deviation is known to be proportional to f_i(θ*), it seems reasonable to weight the data by the theoretical value, w_i = 1/f_i. After inserting w_i ∝ 1/f_i in Eq. (19), the quantity to minimize is

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² f_i^{−2}(θ) ,   (36)

and the stationarity equations, Eq. (20), become

Σ_{i=1}^N y_i² f_i^{−3} ∂_θ f_i = Σ_{i=1}^N y_i f_i^{−2} ∂_θ f_i .   (37)

Repeating the arguments from the previous section, the expectation value of the stationarity conditions in the limit N → ∞ is

Σ_{i=1}^∞ ⟨y_i²⟩ f_i^{−3}(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨y_i⟩ f_i^{−2}(θ*_n) ∂_θ f_i(θ*_n) ,   (38)

which is solved by

f_i(θ*_n) = ⟨y_i²⟩/⟨y_i⟩ , for all i = 1, …, ∞ ,   (39)

if a parameter set θ*_n exists which solves all these many equations simultaneously. The true value, f_i(θ*), can in that case be written as

f_i(θ*) = [ ⟨y_i⟩²/⟨y_i²⟩ ] f_i(θ*_n) ,   (40)

and θ* can be determined from θ*_n if ⟨y_i⟩²/⟨y_i²⟩ is independent of i, constant, and can be absorbed in θ*_n in a simple manner.

If we again use the gamma-distributed power-spectral data as an example, we see that

P_f(θ*) = n/(n + 1) · P_f(θ*_n) ,   (41)

i.e., we once more have a bias that scales with the number n of spectra averaged over, although this time the bias is half the size and positive.

Closed-form expressions for f*_{c,n} and D*_n are straightforward to obtain when P_f is a Lorentzian [9] (reference [9] contains a typo: D T_msr should simply read D). The expression for f*_{c,n} given in [9] is unbiased, whereas the expression for D*_n has its bias removed by multiplying by n/(n + 1). Note that, apart from the n/(n + 1) factor on D, the results derived in [9] are mathematically identical to those derived using an MLE approach in the next section.
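A companion check of Eq. (41), again our own sketch with arbitrary values: the same kind of synthetic data, but weighted by the theoretical value, so that the stationarity condition Eq. (39) gives A = Σȳ²/Σȳ for a flat spectrum of amplitude A.

```python
import numpy as np

rng = np.random.default_rng(4)
n, Nfreq, Pf = 8, 5000, 1.0

ybar = rng.exponential(Pf, size=(n, Nfreq)).mean(axis=0)

# theoretical weight w = 1/f_i = 1/A for a flat spectrum: minimizing
# sum((ybar - A)^2 / A^2) over A gives A = sum(ybar^2)/sum(ybar), cf. Eq. (39)
A_fit = np.sum(ybar**2) / np.sum(ybar)

print(round(A_fit, 3))                 # biased high, near (n + 1)/n = 1.125
print(round(A_fit * n / (n + 1), 2))   # corrected as in Eq. (41), near Pf = 1
```

Note the sign flip relative to the previous scheme: theoretical weights overestimate the amplitude by +1/n, experimental ones underestimate it by −2/n.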
4. Constant values as weights (“un-weighted”)
When all weights are assumed to be equal, the function to minimize,

χ²(θ) = Σ_{i=1}^N [y_i − f_i(θ)]² ,   (42)

has stationarity equations which, for N → ∞, give

Σ_{i=1}^∞ f_i(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ y_i ∂_θ f_i(θ*_n) .   (43)

Taking the expectation value on both sides, we have

Σ_{i=1}^∞ f_i(θ*_n) ∂_θ f_i(θ*_n) = Σ_{i=1}^∞ ⟨y_i⟩ ∂_θ f_i(θ*_n) .   (44)

This equation may have the solution

f_i(θ*_n) = ⟨y_i⟩ = f_i(θ*) .   (45)

In other words, this estimator is unbiased. However, it is not a precise estimator: the stochastic errors on the values it returns tend to be larger than those obtained with a weighted fit. Figures 2 and 3 illustrate this for the case of fitting an aliased Lorentzian to power spectra from optical tweezers.
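The bias/precision trade-off can be illustrated with a toy model of our own: a Lorentzian-shaped spectrum whose corner frequency is treated as known, so that only the amplitude is fitted and both estimators have closed forms. The grid, f_c, n, and the number of repetitions are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, fc = 16, 300, 500.0
f = np.arange(1.0, 8001.0, 4.0)
g = 1.0 / (fc**2 + f**2)      # known Lorentzian shape; amplitude A is fitted

ols, wls = [], []
for _ in range(reps):
    ybar = rng.exponential(g, size=(n, f.size)).mean(axis=0)  # true A = 1
    ols.append(np.sum(ybar * g) / np.sum(g**2))         # constant weights
    A = np.sum(ybar**2 / g**2) / np.sum(ybar / g)       # theoretical weights, Eq. (39)
    wls.append(A * n / (n + 1))                         # Eq. (41) correction
ols, wls = np.array(ols), np.array(wls)

print(round(ols.mean(), 2), round(wls.mean(), 2))  # both near 1
print(ols.std() > wls.std())                       # constant weights scatter more
```

The constant-weight estimate is dominated by the few low-frequency bins where g is large, which is why its scatter exceeds that of the (bias-corrected) weighted estimate.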
FIG. 2: Stochastic error on fit parameters as a function of cut-off frequency, f_max, for least-squares fitting of aliased Lorentzians using various weights, Eq. (19). For all fits the errors are substantial for low values of f_max. Notice the logarithmic frequency scale: for large f_max, WLS returns a variance nearly an order of magnitude smaller than for OLS (fitting with constant weights). The total number of acquired data points was held fixed at N = 262,144, and n = 16 power spectra were generated with f_c = 500 Hz, D = 0. µm²/s, and f_sample = 16,384 Hz. Symbols show the coefficient of variation, s_n(D)/D̄, with D measured over 100 independent stochastic simulations. The coefficients of variation for f_c show similar trends and are roughly two times larger at these settings (data not shown). For OLS (triangles, Eq. (42)) the error is independent of f_max for large frequencies because the information there is de-emphasized by the fitting algorithm. Using the experimental standard deviation or mean as weights (filled circles, Eq. (23)) leads to stochastic errors nearly as small as when using theoretical weights (empty circles, Eq. (36)); the systematic errors (biases) are, however, twice as big and of the opposite sign, see Fig. 4.
5. Iterated values as weights
When weights are determined iteratively, a fit is per-formed and the value returned for θ is used in the weight − n C oe ff i c i en t o f v a r i a t i on i n % D, LSQ, theo. stdD, LSQ, no weightsD, LSQ, exp.stdD, LSQ, exp. mean
FIG. 3: Stochastic error on fit parameters as a function ofnumber of power spectra averaged over, n , for least squaresfitting of aliased Lorentzians using various weights, Eq. (19):Constant weights (triangles); experimental standard devia-tion (black circles); experimental average (grey circles); theo-retical weights (white circles). Thick black line through whitecircles show the theoretical prediction Eq. (E2). The numberof acquired data points was held fixed at N = 262 , f max = f Nyq , for power spec-tra generated with f c = 500 Hz and D = 0 . µ m/s . Sym-bols show the coefficient of variation, s n ( D ) / ¯ D , with D mea-sured over 1,000 independent stochastic simulations. Usingtheoretical weights clearly outperforms all the other weighingschemes for n <
32; for n >
32, the experimental weighingscheme performs comparably well—use of the experimentalstandard deviation as weight leads to slightly larger variationthan use of the experimental average, as expected, and bothshow more variation than results obtained with theoreticalweights. Results obtained with constant weights—sometimesreferred to as “no weights” or “un-weighted”—show roughlyten times larger variation than does results obtained with the-oretical weights, for all n . of a new fit and so on, until some convergence criterionhas been met. The function to minimize with respect to θ for given θ (iter) reads χ ( θ ) = N (cid:88) i =1 [ y i − f i ( θ )] w i ( θ (iter) ) . (46)It has the stationarity equations N (cid:88) i =1 f i w i ( θ (iter) ) ∂ θ f i = N (cid:88) i =1 y i w i ( θ (iter) ) ∂ θ f i . (47)We now use the same arguments as above together withthe observation that at the fixed-point solution to thisiterative scheme, θ (iter) = θ n , and both equal θ ∗ n in thelimit N → ∞ . Taking the expectation value of the equa-tion after taking the limit N → ∞ , we see that θ ∗ n maysatisfy f i ( θ ∗ n ) = (cid:104) y i (cid:105) = f i ( θ ∗ ) , (48)in which case this estimator is unbiased. The estima-tion scheme just described is also known as “iteratively − − B i a s i n % D, LSQ, theo. stdD, LSQ, no weightsD, LSQ, exp. stdD, LSQ, exp. mean
FIG. 4: Systematic error (bias) on fit parameters as a function of the number of power spectra averaged over, n, for least-squares fitting of aliased Lorentzians using various weights, Eq. (19). The corner frequency is unbiased (data not shown), independent of n, whereas the diffusion coefficient shows a strong weighting-scheme-dependent bias. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the averaged power spectra generated with N = 262,144 (t_msr = 16 s), f_c = 500 Hz, and D = 0. µm²/s. Symbols show the signed error on the mean of D measured over 1,000 independent stochastic simulations. For least-squares fitting with constant weights (triangles) the bias on D is zero. Least-squares fitting with experimental weights (black circles: standard deviation; grey circles: average value) has a systematic error on D with −2/n dependence (thin black line) for n > 2. Least-squares fitting with theoretical weights shows a +1/n dependence (thick black line) in the systematic error on D.

reweighted least squares" and convergence is not guaranteed.
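The iteratively reweighted scheme of Eqs. (46)-(47) can be sketched in a few lines. The sketch below is our own illustration, not the authors' code: it assumes the simple Lorentzian model P_f = 1/(a + b f²), freezes the weights at w_i = 1/P_i of the previous fit in the outer loop, and uses a Gauss-Newton inner solver of our own choosing.

```python
import numpy as np

def fit_lorentzian_irls(freqs, Pbar, a, b, sweeps=30, gn_steps=5):
    """Sketch of iteratively reweighted least squares for P_f = 1/(a + b f^2).

    Outer loop: freeze the weights at w_i = 1/P_i(theta_prev), as in Eq. (46).
    Inner loop: Gauss-Newton minimization of sum_i [Pbar_i - P_i(theta)]^2 w_i^2
    at fixed weights.  Convergence is not guaranteed; start close to the truth.
    """
    theta = np.array([a, b], dtype=float)
    for _ in range(sweeps):
        w = theta[0] + theta[1] * freqs**2          # w_i = 1/P_i at previous fit
        for _ in range(gn_steps):
            P = 1.0 / (theta[0] + theta[1] * freqs**2)
            r = (Pbar - P) * w                      # weighted residuals
            J = np.column_stack([P**2 * w, P**2 * w * freqs**2])  # dr/dtheta
            theta -= np.linalg.solve(J.T @ J, J.T @ r)
    return theta  # (a, b); f_c = sqrt(a/b), D = 2*pi^2/b
```

A fixed point of the outer loop satisfies the stationarity equations (47); as the text notes, a poor starting guess can make the iteration diverge.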
6. Error-bars
Error bars on the parameter estimate θ̂ for θ* can be calculated from the error bars on the estimator θ̂_n for θ*_n and the simple relationship, in terms of n, between θ* and θ*_n, by propagation of errors (see Eq. (73)), provided the latter exists. The error bars on θ̂_n can be those returned by the least-squares fitting routine used to determine θ̂_n, or they may be known theoretically. In that connection, note that the error bars given in [9] for WLS with theoretical weights are correct only in the limit n = ∞, but by replacing D with nD/(n+1) they are correct for all n.

When θ̂ is found from a θ̂_n which itself is found by minimizing χ², the precision with which χ² is known determines the precision of the estimate θ̂. Because the various weights in χ² have different stochastic fluctuations, we expect that theoretical weights, Eq. (36), will give smaller variation in θ̂ than experimental weights, Eq. (23). A small calculation shows that the variance of the stationarity equations, Eq. (24), around zero includes a term

\langle (\bar P^{(\mathrm{ex})} - f)^2 \rangle \propto \{ (n-1)(n-2)(n-3)(n-4) \}^{-1} ,

i.e., the error bars are expected to be large for n ∼ 4 if experimental weights are used. When theoretical weights are used, the error bars are small and well-defined for all n. These results are verified by example; see Fig. 3.

B. Results for power spectra
We verified the above theoretical results by Monte Carlo simulations of the OT and AFM power spectra and found perfect agreement. Below, we list the specific results.

For the diffusion coefficient D, we found that the outcome of least-squares fitting, D^(lsq), depends on the number, n, of power spectra averaged over in the following manner for the aliased (see Fig. 4) and non-aliased Lorentzians, as well as the aliased and non-aliased AFM expressions (data not shown):

1. D^(lsq) = D(n−2)/n, for n > 2; if data points are weighted with their experimental standard deviation or mean, w_f = 1/s_n(P̄^(ex)_f) or w_f = √n/P̄^(ex)_f. This is the weighting choice with the largest bias, and with large stochastic errors for n ≤ 4. Here, D is underestimated.

2. D^(lsq) = D(n+1)/n; if data points are weighted using the theoretical expectation value for the standard deviation, w_f = √n/P_f. This is the least-squares fitting of [9]; a hybrid of MLE and LSQ, analytically tractable and thus very robust. It also has the smallest stochastic errors. Here, D^(lsq) overestimates D.

3. D^(lsq) = D; if the weights are updated iteratively to the fitted value, w_f = 1/P_f^(iter). This is a practically correct but computationally intensive and somewhat unstable approach; the initial guess must be close to the true value, or the algorithm will not converge.

4. D^(lsq) = D; if all data points are given the same weight, w_f = constant. This method de-emphasizes information available at higher frequencies: power-spectral values have constant relative error, hence rapidly decreasing absolute error beyond f_c or f_0, yet are treated as having constant absolute error. Consequently, this estimator has low precision, typically with two to ten times larger stochastic errors than Cases 1-3 above. This is OLS fitting applied far outside its range of validity.

Figures 2, 3, and 4 show examples of the size of these stochastic errors and biases, respectively, for fits of the aliased Lorentzian to data.

We found that the expectation values of the fitted values for f_c, f_0, and Q were independent of the weighting scheme and unbiased, whereas the standard deviations varied by up to an order of magnitude depending on the weighting scheme applied. Furthermore, fitting of non-aliased PSDs to aliased data introduces large systematic errors for the trivial reason that they fail to capture the shape of the PSD near the Nyquist frequency (data not shown).

VI. MAXIMUM LIKELIHOOD ESTIMATION OF FIT PARAMETERS
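The origin of the bias factors in Cases 1, 2, and 4 can be seen already in a toy problem we set up for illustration (it is not from the paper): fitting a single amplitude A to K data points that are each the average of n exponentially distributed "spectral values" with mean A, so each point is gamma distributed with shape n. The three weighting choices then have closed-form estimators whose biases mirror those on D.

```python
import numpy as np

# Toy illustration (our construction): one-parameter 'spectrum' P_f = A.
rng = np.random.default_rng(1)
A_true, n, K = 1.0, 8, 200_000
# each point = average of n exponentials => gamma, shape n, mean A_true
y = rng.gamma(shape=n, scale=A_true / n, size=K)

A_const = y.mean()                        # Case 4: constant weights, unbiased
A_expw  = (1/y).sum() / (1/y**2).sum()    # Case 1: experimental weights 1/y_i
A_theow = (y**2).sum() / y.sum()          # Case 2: theoretical weights 1/A

print(A_expw / A_true)    # close to (n-2)/n = 0.75
print(A_theow / A_true)   # close to (n+1)/n = 1.125
```

The experimental-weight estimator is pulled down because small fluctuations of y_i get large weights; the theoretical-weight estimator is pulled up because the fitted amplitude itself appears in the weights. Both effects are removed by the multiplicative corrections discussed in the text.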
We now proceed to determine the fit parameters by the method of maximum likelihood estimation. We also derive closed-form expressions for the fit parameters in terms of the experimental data values. They should be useful for speedy online calibration.

Both power spectra, given in Eqs. (3) and (4), are consequences of a linear, time-invariant dynamics driven by a white noise, Eq. (1). They are consequently of the form

P_f = \frac{1}{a + b f^2 + c f^4} , \quad (49)

with c = 0 in the case of the Lorentzian, and (a, b) or (a, b, c) parameters to be fitted. This simple form, combined with the simple statistical properties of the experimental power-spectral values, makes rigorous MLE of the parameters a straightforward numerical optimization problem, as shown below.

When fitting, we fit only to the positive-frequency part of the power spectrum, or to a subset of it. So here we consider only that part of the spectrum. Since P̄^(ex)_f and P̄^(ex)_f' are uncorrelated for f ≠ f', the probability density for the experimental spectrum P̄^(ex), given its expectation value P, is

p(\bar P^{(\mathrm{ex})} \mid P) = \prod_f p_n(\bar P^{(\mathrm{ex})}_f ; P_f) , \quad (50)

where p_n is given in Eq. (14). Thus maximum likelihood estimation of the theory's parameter values consists in choosing these parameters so they maximize p(P̄^(ex) | P) for given P̄^(ex), or, equivalently, minimize the negative logarithm

F' \equiv n \sum_f \left( \bar P^{(\mathrm{ex})}_f / P_f + \ln P_f \right) + \mathrm{constant} , \quad (51)

where

\mathrm{constant} = \sum_f \left[ \ln \Gamma(n) - n \ln n - (n-1) \ln \bar P^{(\mathrm{ex})}_f \right] \quad (52)

is constant with respect to (a, b, c), so it can be ignored when minimizing. We are then left with the task of finding the values of (a, b, c) that minimize the cost function

F(a, b, c) \equiv \sum_f \left( \bar P^{(\mathrm{ex})}_f / P_f + \ln P_f \right) , \quad (53)

which is an uncomplicated optimization problem that can be solved numerically with standard programs. Good starting values are given in Eqs. (63) and (64).

A. From non-linear to linear stationarity equations via a simple trick
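The cost function of Eq. (53) is a one-liner, and any standard minimizer can be applied to it. A minimal sketch (function names are ours):

```python
import numpy as np

def F_cost(Pbar, P_model):
    """MLE cost function of Eq. (53): F = sum_f (Pbar_f / P_f + ln P_f)."""
    return np.sum(Pbar / P_model + np.log(P_model))

def lorentzian(f, a, b):
    """P_f of Eq. (49) with c = 0."""
    return 1.0 / (a + b * f**2)
```

Handing F_cost to a numerical optimizer over (a, b), or (a, b, c) for the AFM case, implements the rigorous MLE; in expectation, each term P̄_f/P_f + ln P_f is minimized at P_f = ⟨P̄_f⟩, so the minimum sits at the true parameters up to stochastic scatter.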
Although we gave the solution to the full MLE problem implicitly above, as the minimum of F in Eq. (53), we can speed up the fitting process substantially (numerical optimization can be slow when the data sets are large) and gain more insight into the problem by taking a few more analytical steps before turning to numerics.
1. Results for Optical Tweezers
For P_f given in Eq. (3), written as in Eq. (49) with c = 0, F in Eq. (53) reads

F(a, b) = \sum_f \left[ (a + b f^2)\, \bar P^{(\mathrm{ex})}_f - \ln(a + b f^2) \right] . \quad (54)

It is minimized with respect to a and b when these parameters satisfy the stationarity conditions

\sum_f \bar P^{(\mathrm{ex})}_f = \sum_f P_f = \sum_f \frac{1}{a + b f^2} , \qquad
\sum_f f^2 \bar P^{(\mathrm{ex})}_f = \sum_f f^2 P_f = \sum_f \frac{f^2}{a + b f^2} . \quad (55)

These are non-linear equations for a and b. However, from Eq. (15) we know that we can write

P_f = \frac{n \langle \bar P^{(\mathrm{ex})\,2}_f \rangle}{(n+1)\, P_f} = \frac{n}{n+1}\, \langle \bar P^{(\mathrm{ex})\,2}_f \rangle\, (a + b f^2) . \quad (56)

Substituting for P_f in Eq. (55), we find

\sum_f \bar P^{(\mathrm{ex})}_f = \frac{n}{n+1} \sum_f \langle \bar P^{(\mathrm{ex})\,2}_f \rangle (a + b f^2) , \qquad
\sum_f f^2 \bar P^{(\mathrm{ex})}_f = \frac{n}{n+1} \sum_f f^2 \langle \bar P^{(\mathrm{ex})\,2}_f \rangle (a + b f^2) , \quad (57)

which is linear in a and b.

Before solving for a and b we introduce the statistic

S_{p,q} \equiv \frac{1}{K} \sum_f f^p\, \bar P^{(\mathrm{ex})\,q}_f , \quad (58)

with K the number of terms in the sum; the statistic obeys lim_{K→∞} S_{p,q} = ⟨S_{p,q}⟩.

The sums should only include those frequencies the user deems relevant, i.e., frequencies corresponding to mechanical or electronic resonances can be excluded, and high/low-frequency cut-offs can be applied. That is, the statistics can be trimmed iteratively if required: power-spectral values too far from a fit can be identified and excluded from the sums, after which a new fit is found, et cetera, until a steady state is reached and all power-spectral values satisfy the user-defined acceptance criterion. When no frequencies are excluded, nK = N = t_msr f_sample, the total number of data acquired. In this latter case, one fits the power spectrum all the way out to the Nyquist frequency, and aliasing should be taken into account (see appendices), unless aliasing is eliminated by over-sampling data-acquisition electronics.

With this notation, Eq. (57) can be written in matrix form

\begin{pmatrix} \langle S_{0,2}\rangle & \langle S_{2,2}\rangle \\ \langle S_{2,2}\rangle & \langle S_{4,2}\rangle \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \frac{n+1}{n} \begin{pmatrix} S_{0,1} \\ S_{2,1} \end{pmatrix} , \quad (59)

with solution

\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1 + 1/n}{\langle S_{0,2}\rangle \langle S_{4,2}\rangle - \langle S_{2,2}\rangle^2} \begin{pmatrix} S_{0,1}\langle S_{4,2}\rangle - S_{2,1}\langle S_{2,2}\rangle \\ S_{2,1}\langle S_{0,2}\rangle - S_{0,1}\langle S_{2,2}\rangle \end{pmatrix} . \quad (60)

In this expression we know S_{p,q} from the experiment, but not ⟨S_{p,q}⟩. However, we can always write S_{p,q} = ⟨S_{p,q}⟩ + δS_{p,q} and substitute for ⟨S_{p,q}⟩ in Eq. (60). About δS_{p,q} it is easy to show that ⟨δS_{p,q}⟩ = 0 and

\langle (\delta S_{p,q})^2 \rangle = \frac{1}{K^2} \left[ \frac{\Gamma(n+2q)}{n^{2q}\Gamma(n)} - \left( \frac{\Gamma(n+q)}{n^{q}\Gamma(n)} \right)^2 \right] \sum_f f^{2p} P_f^{2q} = \frac{1}{K}\, [\text{weak } n\text{-dep.}]\; \tilde S_{2p,2q} , \quad (61)

where S̃ is defined as in Eq. (58), except with P_f = ⟨P̄^(ex)_f⟩ replacing P̄^(ex)_f. Above, q = 2 in the relevant expressions. Thus, apart from some weak n-dependence, the variance of S_{p,q} scales as 1/K, which is a very small number in a typical experiment, where K = N/n is of order 10⁴-10⁵. It is therefore an excellent approximation to replace ⟨S_{p,q}⟩ with S_{p,q} in Eq. (60),

\begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} \hat a \\ \hat b \end{pmatrix} \equiv \frac{1 + 1/n}{S_{0,2} S_{4,2} - S_{2,2}^2} \begin{pmatrix} S_{0,1} S_{4,2} - S_{2,1} S_{2,2} \\ S_{2,1} S_{0,2} - S_{0,1} S_{2,2} \end{pmatrix} , \quad (62)

where now (â, b̂) are our estimates of (a, b). By comparing Eqs. (3) and (49) we then have

f_c = \left( \frac{a}{b} \right)^{1/2} \approx \left( \frac{\hat a}{\hat b} \right)^{1/2} = \left( \frac{S_{0,1} S_{4,2} - S_{2,1} S_{2,2}}{S_{2,1} S_{0,2} - S_{0,1} S_{2,2}} \right)^{1/2} \quad (63)

D = \frac{2\pi^2}{b} \approx \frac{2\pi^2}{\hat b} = 2\pi^2\, \frac{n}{n+1}\, \frac{S_{0,2} S_{4,2} - S_{2,2}^2}{S_{2,1} S_{0,2} - S_{0,1} S_{2,2}} . \quad (64)
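Eqs. (58) and (62)-(64) translate directly into a few lines of code. The sketch below (function name ours) uses np.mean for the 1/K normalization, which cancels between numerator and denominator anyway:

```python
import numpy as np

def lorentzian_mle_trick(freqs, Pbar, n):
    """Closed-form estimates of (f_c, D) via Eqs. (62)-(64).

    Pbar is the average of n experimental power spectra at the fitted
    frequencies; S(p, q) implements the statistic of Eq. (58).  Rescaling
    frequencies (e.g. to kHz) can improve numerical conditioning."""
    def S(p, q):
        return np.mean(freqs**p * Pbar**q)
    pref = (1.0 + 1.0 / n) / (S(0, 2) * S(4, 2) - S(2, 2)**2)
    a_hat = pref * (S(0, 1) * S(4, 2) - S(2, 1) * S(2, 2))
    b_hat = pref * (S(2, 1) * S(0, 2) - S(0, 1) * S(2, 2))
    return np.sqrt(a_hat / b_hat), 2.0 * np.pi**2 / b_hat   # (f_c, D)
```

No iteration is involved, so this is suitable for the "speedy online calibration" mentioned above.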
2. Results for AFM cantilevers
Identical reasoning leads, in the case of c ≠ 0, to three coupled linear equations for a, b, and c,

\underbrace{\begin{pmatrix} S_{0,2} & S_{2,2} & S_{4,2} \\ S_{2,2} & S_{4,2} & S_{6,2} \\ S_{4,2} & S_{6,2} & S_{8,2} \end{pmatrix}}_{S} \underbrace{\begin{pmatrix} a \\ b \\ c \end{pmatrix}}_{\vec v} \approx (1 + 1/n) \underbrace{\begin{pmatrix} S_{0,1} \\ S_{2,1} \\ S_{4,1} \end{pmatrix}}_{\vec s} , \quad (65)

which are inverted to give

\begin{pmatrix} a \\ b \\ c \end{pmatrix} \approx \begin{pmatrix} \hat a \\ \hat b \\ \hat c \end{pmatrix} \equiv \frac{n+1}{n}\, S^{-1} \begin{pmatrix} S_{0,1} \\ S_{2,1} \\ S_{4,1} \end{pmatrix} . \quad (66)

Comparing Eqs. (4) and (49) then gives

f_0 = \left( \frac{a}{c} \right)^{1/4} \approx \left( \frac{\hat a}{\hat c} \right)^{1/4} \quad (67)

D = \frac{2\pi^2}{b + 2(ac)^{1/2}} \approx \frac{2\pi^2}{\hat b + 2(\hat a \hat c)^{1/2}} \quad (68)

\left( \frac{2\pi m}{\gamma} \right)^2 = \frac{c}{b + 2(ac)^{1/2}} \approx \frac{\hat c}{\hat b + 2(\hat a \hat c)^{1/2}} . \quad (69)

For completeness and ease of implementation we give the inversion formulas: S^{-1} = C/\mathcal{D}, with determinant

\mathcal{D} = S_{0,2} S_{4,2} S_{8,2} - S_{0,2} S_{6,2}^2 - S_{2,2}^2 S_{8,2} + 2 S_{2,2} S_{4,2} S_{6,2} - S_{4,2}^3 , \quad (70)

C = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{12} & C_{22} & C_{23} \\ C_{13} & C_{23} & C_{33} \end{pmatrix} , \quad (71)

and

C_{11} = S_{4,2} S_{8,2} - S_{6,2}^2 , \quad C_{12} = S_{4,2} S_{6,2} - S_{2,2} S_{8,2} , \quad C_{13} = S_{2,2} S_{6,2} - S_{4,2}^2 ,
C_{22} = S_{0,2} S_{8,2} - S_{4,2}^2 , \quad C_{23} = S_{2,2} S_{4,2} - S_{0,2} S_{6,2} , \quad C_{33} = S_{0,2} S_{4,2} - S_{2,2}^2 . \quad (72)

Of course, it is also possible to invert Eq. (65) numerically and insert the resulting values for â, b̂, ĉ in Eqs. (63), (64), (67), (68), (69). The above results, however, should resolve potential issues with numerical stability arising from the inversion of near-singular matrices.

B. Examples of Fitting the power spectra
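A numerical solve of the 3×3 system in Eqs. (65)-(69) can look as follows (our sketch; np.linalg.solve stands in for the explicit inversion formulas):

```python
import numpy as np

def afm_mle_trick(freqs, Pbar, n):
    """Closed-form estimates (f0, D, G), G = (2 pi m / gamma)^2, via
    Eqs. (65)-(69), for P_f = 1/(a + b f^2 + c f^4).

    Rescale frequencies (e.g. to kHz) if the 3x3 matrix of statistics
    is near-singular in the original units."""
    def S(p, q):
        return np.mean(freqs**p * Pbar**q)
    M = np.array([[S(0, 2), S(2, 2), S(4, 2)],
                  [S(2, 2), S(4, 2), S(6, 2)],
                  [S(4, 2), S(6, 2), S(8, 2)]])
    s = np.array([S(0, 1), S(2, 1), S(4, 1)])
    a, b, c = (1.0 + 1.0/n) * np.linalg.solve(M, s)
    f0 = (a / c) ** 0.25
    denom = b + 2.0 * np.sqrt(a * c)
    return f0, 2.0 * np.pi**2 / denom, c / denom   # (f0, D, G)
```

Note that b may legitimately come out negative when the resonance is pronounced; only the combination b + 2√(ac) must be positive.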
Figure 5A shows the average of n = 16 power spectra for the Brownian motion of a micro-sphere in an optical trap. The data are synthetic, computer generated; see [9]. Thus, the values for f_c and D are known exactly, and can be used as benchmarks for results obtained by fitting, with no worry about any of the complicating circumstances that can affect data in the real world. The red line is the expectation value of the aliased Lorentzian, taking as input the exactly known values of f_c and D. The yellow line is the aliased Lorentzian corresponding to the stochastically realized parameter values for f_c and D, determined by rigorous MLE of (a, b) with no use of our simplifying trick. The dashed blue line is the result of MLE with the use of our simplifying trick. All three of these lines plot virtually on top of each other. Finally, the black lines show the result of least-squares fitting of an aliased Lorentzian to the data, using as weights: (i) the standard deviations on the n = 16 spectra, with a resulting −2/n = −12.5% systematic error on D, see Eq. (32); and (ii) theoretical weights, resulting in a +1/n = 6.25% systematic error on D, see Eq. (41).

Note that for clarity of presentation we discuss the non-aliased cases in the main text and give closed-form aliased results in Appendices B and C. Numerical tests of the theory are done using the aliased theory in order to utilize all the available data: when fitting a non-aliased expression to aliased data it is necessary to introduce a cut-off frequency f_max ≪ f_Nyq and discard all data above it.

Figure 5B shows the same data and fits in a normalized residual plot, i.e., data or fits minus the true expected value (the residue), divided by the true expected value. Thus, deviation from zero in this plot shows by how much data and fits differ from the true expected value, measured in units of the true expected value. The data should scatter about 0 with standard deviation n^{−1/2} = 16^{−1/2} = 25%, whereas the fits should simply trace zero for all frequencies. Here, the true expected value is known because we use synthetic data; we normalize by it to show the results of all fits in a single figure. In an experiment, the true expected value is not known and the fitted value is used instead; when a bias is present the residues will consequently differ from zero in a systematic manner.

This bias always hides in the scatter of the data in a figure like Fig. 5A, because n^{−1} ≤ n^{−1/2} for n = 1, 2, .... But the bias is substantial for small to intermediate values of n and will reveal itself in a plot like Fig. 5B.

Figure 6 is similar to Fig. 5, except its power spectrum describes the confined Brownian motion of a massive particle, e.g. an AFM cantilever in air, and the fits do not take aliasing into account. The data are synthetic, computer generated; see Appendix A.

C. Error-bars on fit parameters
Because we have derived closed-form expressions for the fit parameters, we can also calculate expressions for the expected error bars on the fit parameters. We do

FIG. 5: Various fits of aliased Lorentzian to synthetic power spectrum for a microsphere in an optical trap. Data are artificial, with known parameter values f_c = 500 Hz and D = 0. µm²/s, f_sample = 16,384 Hz, and t_msr = 16 s. A: Data and aliased Lorentzians. PSD values (grey line) are averages over 16 spectra. Note how the three Lorentzians corresponding to, respectively, the exactly known values of f_c and D (red line), rigorous numerical ML-fit (yellow line), and analytical ML-fit using the simplifying trick (dashed blue line) all plot on top of each other. In contrast, the Lorentzians from numerical least-squares fits with experimental or theoretical weights (thick and thin black lines, respectively) are offset by −12.5% and +6.25% due to their systematic errors on D. B: Residual plots, i.e., same data and fits as shown in Panel A, but divided by the known true expectation value, then unity subtracted. The agreement between data and the ML-fits is practically perfect, whereas the least-squares fits clearly under/over-estimate the PSD.

that and compare to the theoretical limit for how small the error bars can be.

Quite generally, irrespective of whether we study the aliased or non-aliased Lorentzian or the AFM PSD, the error bars on the fit parameters are found by propagating the errors on a, b, and c using the generic formula for a function z (calculating the differential Δz and squaring

FIG. 6: Various fits of Eq. (4) to the average of 16 power spectra for an AFM cantilever. Data are synthetic with known parameter values f_0 = 15 kHz, D = 0. µm²/s, and (2πm/γ)² = 6169 µs², f_sample = 65,
536 Hz, and t_msr = 8 s. A: Data and fits. Note how the three curves corresponding to, respectively, the exactly known parameters (red line), rigorous numerical ML-fit (yellow line), and analytical ML-fit using the simplifying trick (dashed blue line) all plot on top of each other. In contrast, the results of numerical least-squares fits with experimental or theoretical weights (thick and thin black lines, respectively) are offset by −12.5% and +6.25% due to their systematic errors on D. In all the fits, only frequencies below 20 kHz were included. B: Residual plots, i.e., same as shown in Panel A, but divided by the exactly known expectation value, then unity subtracted. At the highest frequencies, where the non-aliased fits start to deviate from the aliased PSD, a slight departure from a straight line is seen in all the fits.

it):

\sigma^2(z[a,b,c]) = \langle (\Delta z)^2 \rangle = (\partial_a z)^2 \langle (\Delta a)^2 \rangle + (\partial_b z)^2 \langle (\Delta b)^2 \rangle + (\partial_c z)^2 \langle (\Delta c)^2 \rangle \\ + 2\, \partial_a z\, \partial_b z\, \langle \Delta a \Delta b \rangle + 2\, \partial_a z\, \partial_c z\, \langle \Delta a \Delta c \rangle + 2\, \partial_b z\, \partial_c z\, \langle \Delta b \Delta c \rangle , \quad (73)

with ∂_a z = ∂z/∂a, and similarly for b and c; z = f_c, f_0, D, ..., and ⟨ΔaΔb⟩ etc. are elements of the covariance matrix

\mathrm{cov}(a,b,c) \equiv \begin{pmatrix} \langle (\Delta a)^2 \rangle & \langle \Delta a \Delta b \rangle & \langle \Delta a \Delta c \rangle \\ \langle \Delta a \Delta b \rangle & \langle (\Delta b)^2 \rangle & \langle \Delta b \Delta c \rangle \\ \langle \Delta a \Delta c \rangle & \langle \Delta b \Delta c \rangle & \langle (\Delta c)^2 \rangle \end{pmatrix} . \quad (74)

This covariance matrix can be written in terms of the expectation values for the statistics, see Appendix D,

\mathrm{cov}(a,b,c) \approx \mathrm{cov}(\hat a, \hat b, \hat c) = \frac{1}{N}\, \frac{n+3}{n}\, \langle S \rangle^{-1} \quad (75)

with

\langle S \rangle = \frac{n+1}{n}\, \tilde S , \quad (76)

where S̃ is a matrix of the same form as S in Eq. (65), but with expectation values P_f = ⟨P̄^(ex)_f⟩ replacing the experimental values P̄^(ex)_f in the statistics of Eq. (58); see Eq. (F3).

To find out how much of the available information we are putting to use, we can express the covariance matrix in terms of the Fisher information matrix I = N S̃ for the full (not the trick) MLE problem,

\mathrm{cov}(a,b,c) \approx \mathrm{cov}(\hat a, \hat b, \hat c) = \frac{n+3}{n+1}\, I^{-1} , \quad (77)

from which we see that as n increases, we approach the Cramér-Rao bound [15] on the variance for an unbiased estimator, which is just I^{-1}. The main thing to notice, however, is that all the covariances in Eq. (75) are very small because N is very large; the weak n-dependence (see Fig. 7) is of no consequence in comparison. In an experiment, simply insert the fitted value for the power spectrum as a best estimate for P_f in S̃.
1. Results for Optical Tweezers
For the non-aliased Lorentzian in Eq. (3) we then have, using the generic Eq. (73),

\sigma^2(f_c) = \frac{f_c^2}{4} \left[ \frac{\langle (\Delta a)^2 \rangle}{a^2} + \frac{\langle (\Delta b)^2 \rangle}{b^2} - \frac{2 \langle \Delta a \Delta b \rangle}{a b} \right] \quad (78)

and

\sigma^2(D) = D^2\, \frac{\langle (\Delta b)^2 \rangle}{b^2} , \quad (79)

irrespective of the method used to estimate a, b, and c. Actual numbers are found by inserting the estimates (â, b̂, ĉ). The results for the aliased Lorentzian are given in Appendix E.
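Eqs. (78)-(79) are plain error propagation and can be coded directly (our sketch; the covariance matrix would come from Eq. (75) in practice, with the fitted spectrum inserted for P_f):

```python
import numpy as np

def sigma_fc_D(a, b, cov):
    """Error bars on f_c = sqrt(a/b) and D = 2 pi^2 / b via Eqs. (78)-(79).

    cov is the 2x2 covariance matrix of (a, b)."""
    fc = np.sqrt(a / b)
    D = 2.0 * np.pi**2 / b
    var_fc = fc**2 / 4.0 * (cov[0, 0]/a**2 + cov[1, 1]/b**2
                            - 2.0*cov[0, 1]/(a*b))
    var_D = D**2 * cov[1, 1] / b**2
    return np.sqrt(var_fc), np.sqrt(var_D)
```

The formulas are first-order (linearized) propagation, so they are accurate when the relative errors on a and b are small, which is the case here since they scale as 1/√N.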
FIG. 7: Stochastic error on fit parameters as a function of the number, n, of PSDs averaged over, for ML-fits of aliased Lorentzians using the simplifying trick. Compared to the dependence on the measurement time (Fig. 8), the error is only weakly dependent on n. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the averaged power spectra generated with N = 262,144 (t_msr = 16 s), f_c = 500 Hz, and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols show the theoretical expectation values from Eqs. (E1) and (E2). Dashed lines show the \sqrt{(n+3)/(n+1)} ∈ [1 : √2] scaling predicted from Eqs. (77) and (E5).
2. Results for AFM cantilevers
For the AFM, the variances of the three fit parameters are:

\sigma^2(f_0) = \frac{f_0^2}{16} \left[ \frac{\langle (\Delta a)^2 \rangle}{a^2} + \frac{\langle (\Delta c)^2 \rangle}{c^2} - \frac{2 \langle \Delta a \Delta c \rangle}{a c} \right] \quad (80)

\sigma^2(D) = \frac{D^4}{4\pi^4} \big[ \langle (\Delta a)^2 \rangle f_0^{-4} + \langle (\Delta b)^2 \rangle + \langle (\Delta c)^2 \rangle f_0^{4} \\ + 2 \langle \Delta a \Delta b \rangle f_0^{-2} + 2 \langle \Delta a \Delta c \rangle + 2 \langle \Delta b \Delta c \rangle f_0^{2} \big] \quad (81)

\sigma^2(G) = \frac{G^4}{c^2} \big[ \langle (\Delta a)^2 \rangle f_0^{-4} + \langle (\Delta b)^2 \rangle + \langle (\Delta c)^2 \rangle (G^{-1} - f_0^{2})^2 \\ + 2 \langle \Delta a \Delta b \rangle f_0^{-2} + 2 \langle \Delta a \Delta c \rangle (1 - G^{-1} f_0^{-2}) + 2 \langle \Delta b \Delta c \rangle (f_0^{2} - G^{-1}) \big] , \quad (82)

where

G = (2\pi m/\gamma)^2 . \quad (83)
3. How good are the fits?
The goodness of fit, i.e., the support for the hypothesis that the fitted theory is correct, is [16] the probability that a repetition of the experiment yields a data set with a smaller value for p. A calculation shows that for K ≫ 1
FIG. 8: 1/√N dependence of the stochastic error on fit parameters for ML-fits of aliased Lorentzians using the simplifying trick. The expected 1/√N behavior is easily made out on the double-logarithmic scale. The sampling frequency was held fixed at f_sample = 16,384 Hz, and the fit was done to the average of n = 16 power spectra generated with f_c = 500 Hz and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols connected by lines show the theoretical expectation values from Eqs. (E1) and (E2).

the support, or backing, is

B(s) = \mathrm{erfc}\!\left( |s - K| \sqrt{n/(2K)} \right) , \quad (84)

where erfc is the complementary error function, s ≡ Σ_f P̄^(ex)_f / P_f, and K is the number of terms in the sum s. We have assumed above that this number is much larger than the number of parameters fitted, hence equal to the number of degrees of freedom. It is of order 10⁴-10⁵ in our case, while 2-3 parameters are fitted. When the sum s = K, the expectation value for s, the backing is one, but it rapidly drops to zero as s becomes larger or smaller than K. The backings calculated for the fits shown in Fig. 5 were, respectively, 0 (lsq fits; zero within the numerical precision of MatLab), 0.87 (ML-fit with simplifying trick), and 1.00 (rigorous numerical ML-fit; the first deviation from unity is in the 7th decimal place). These numbers are stochastically varying because the PSD values are. The backing for the rigorous numerical ML-fit will always be close to one because the stationarity conditions, see Eq. (53), from which the fit parameters are determined, are virtually the same as s = K.

VII. SUMMARY AND CONCLUSION
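Eq. (84) needs only the complementary error function from the standard library (function name ours):

```python
import math

def backing(s, K, n):
    """Goodness-of-fit 'backing' of Eq. (84): B(s) = erfc(|s - K| sqrt(n/(2K))).

    Here s = sum over the K fitted frequencies of Pbar_f / P_f.  Under the
    fitted theory, each term has mean 1 and variance 1/n, so s has mean K
    and variance K/n, and B is the two-sided tail probability of the
    observed deviation."""
    return math.erfc(abs(s - K) * math.sqrt(n / (2.0 * K)))
```

B equals one when s hits its expectation value K exactly and falls off rapidly as s moves a few standard deviations away, as described in the text.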
1. A time series of N coordinate values for an optically trapped microsphere or an AFM cantilever doing Brownian motion gives rise to N power spectral values (N/2 at positive frequencies). The parameters characterizing the spectrum can be determined from these with stochastic errors of order 1/√N in the aliased case (Section VI C and Appendix E). For the non-aliased case see also Fig. 2.

2. For the purpose of displaying the experimental power spectrum, the scatter of its values is reduced by a factor √n by dividing the original data, the time series of N coordinates, into n equally long, non-overlapping subseries (Section IV). From these, n experimental power spectra are calculated and averaged over. This noise-reduced power spectrum covers the same frequency interval, but the separation Δf between consecutive points has increased by a factor n.

3. Noise reduction trades resolution on the frequency axis for resolution on the power axis in a manner that loses no information about the parameters characterizing the power spectrum.

4. The 1/√n scatter of experimental power-spectral values should not be confused with the expected stochastic error on parameter values characterizing the power spectrum; the latter are of order 1/√N with only a weak n-dependence, as shown in Eq. (75) and Figs. 8 and 7.

5. Formulas given above, based on maximum likelihood estimates, eliminate this bias and all further fitting for the cases of the non-aliased and aliased Lorentzian power spectrum and for the non-aliased damped harmonic oscillator as model for an AFM cantilever (Section VI A). Fitting has been done once and for all, and the results, including error bars and a goodness-of-fit measure, are given by the formulas in Section VI and Appendix C.

6. For other systems described by a linear Langevin equation driven by white or colored noise (Section IV A), one can determine parameters of the theoretical power spectrum by fitting it to an experimental spectrum, using weighted least-squares fitting and experimental or theoretical error bars as weights, which is computationally faster and more robust than MLE. The resulting value for the diffusion coefficient D should be corrected as described in Eqs. (32) and (41), respectively.

7. Quite generally, beyond optical traps and AFM cantilevers, if the dependent variable (the data) and the squared weights are independent, then weighted least squares yields unbiased results. If not, the least-squares fit is typically biased, but this bias can often be removed altogether by a simple re-scaling of the fit results (Section V).

8. To minimize the stochastic error on fit parameters we suggest setting n = 8 or larger if using ML-fits with our simplifying trick or WLS with theoretical weights, see Eqs. (77) and (E5) and Fig. 7. If WLS with experimental weights is used, we suggest n = 16 or larger, see Fig. 3. Also, for given f_c, n, and N, f_sample = 8 f_c will minimize the stochastic error, see Fig. 9. Typically, however, t_msr and f_c are fixed, and f_sample should simply be set as high as meaningfully possible, as N is the most important factor in increasing the precision, see Fig. 8.

9. For optical tweezers, the highest precision is obtained by fitting over as much of the frequency range as can be captured by theory, including modifications to the PSD due to hydrodynamics, optics, and instrumentation [9, 10, 13]. If a certain level of precision is required, it then becomes a matter of measuring long enough with the sampling frequency as high as experimentally possible (the minimum in Fig. 9 is for fixed N).

10. When doing least-squares fitting we recommend the use of theoretical weights, Eq. (36), if the standard deviation is known to be proportional to the expected value, because this minimizes the stochastic errors on the fit parameters, see Figs. 2 and 3. If any bias is present, it will also be smaller than the bias resulting from experimental weights, and can often be eliminated with the help of Eq. (40) or (41).
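The bias corrections recommended above amount to a single multiplicative factor. A minimal helper (our sketch, assuming the bias factors (n−2)/n for experimental weights and (n+1)/n for theoretical weights quoted in Section V B; Eqs. (32) and (41) give the authoritative statements):

```python
def unbias_D(D_fit, n, scheme):
    """Remove the weighting-scheme bias from a least-squares fitted D.

    'experimental' weights: D_fit = D (n-2)/n  =>  multiply by n/(n-2)
    'theoretical'  weights: D_fit = D (n+1)/n  =>  multiply by n/(n+1)
    """
    if scheme == "experimental":
        if n <= 2:
            raise ValueError("experimental-weight fit requires n > 2")
        return D_fit * n / (n - 2)
    if scheme == "theoretical":
        return D_fit * n / (n + 1)
    raise ValueError("unknown weighting scheme: " + scheme)
```

Only D needs this correction; the fitted corner frequency is unbiased for all the weighting schemes considered.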
FIG. 9: Stochastic error on fit parameters as a function of the sampling frequency, f_sample, for ML-fits of aliased Lorentzians using the simplifying trick. A non-monotonic behavior is seen, with minimum around f_sample = 8 f_c. The number of acquired data points was held fixed at N = 262,144, and the fit was done to the average of n = 16 power spectra generated with f_c = 500 Hz and D = 0. µm²/s. Filled symbols show the coefficient of variation, σ(X)/⟨X⟩, with X = f_c, D determined using Eqs. (C2) and (C3), from 100 independent stochastic simulations. Empty symbols connected by lines show the theoretical expectation values from Eqs. (E1) and (E2).

VIII. ACKNOWLEDGEMENTS
We are thankful to Erik Schäffer for a critical reading of the manuscript. SFN gratefully acknowledges financial support from the Carlsberg Foundation and the Lundbeck Foundation. HF gratefully acknowledges financial support from the Human Frontier Science Program, GP0054/2009-C.
Appendix A: Monte Carlo simulation of the Einstein-Ornstein-Uhlenbeck theory of Brownian motion in a harmonic potential
In the case of non-negligible mass m, Eq. (1) is rewritten as two coupled first-order equations,

\frac{d}{dt} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = -M \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} + (2D)^{1/2}\, \frac{\gamma}{m} \begin{pmatrix} 0 \\ \eta(t) \end{pmatrix} , \quad (A1)

where we have introduced the 2 × 2 matrix

M = \begin{pmatrix} 0 & -1 \\ \kappa/m & \gamma/m \end{pmatrix} . \quad (A2)

Equation (A1) has the solution

\begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = (2D)^{1/2}\, \frac{\gamma}{m} \int_{-\infty}^{t} dt'\, e^{-M(t-t')} \begin{pmatrix} 0 \\ \eta(t') \end{pmatrix} , \quad (A3)

from which follows that

\begin{pmatrix} x_{j+1} \\ v_{j+1} \end{pmatrix} = e^{-M \Delta t} \begin{pmatrix} x_j \\ v_j \end{pmatrix} + \begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} . \quad (A4)

Here

e^{-M \Delta t} = \frac{1}{\lambda_+ - \lambda_-} \begin{pmatrix} -\lambda_- c_+ + \lambda_+ c_- & -c_+ + c_- \\ \lambda_+ \lambda_- (c_+ - c_-) & \lambda_+ c_+ - \lambda_- c_- \end{pmatrix} , \quad (A5)

where

\lambda_\pm \equiv \frac{\gamma}{2m} \pm \sqrt{ \frac{\gamma^2}{4m^2} - \frac{\kappa}{m} } , \qquad u_\pm \equiv \begin{pmatrix} \mp 1 \\ \pm \lambda_\pm \end{pmatrix} \quad (A6)

are the two eigenvalues and corresponding eigenvectors of M,

c_\pm = \exp(-\lambda_\pm \Delta t) , \quad (A7)

and we have introduced the notation

\begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} \equiv \Delta x_{+,j}\, u_+ + \Delta x_{-,j}\, u_- \quad (A8)

with

\Delta x_{\pm,j} \equiv (2D)^{1/2}\, \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \int_{t_j}^{t_{j+1}} dt'\, e^{-\lambda_\pm (t_{j+1} - t')}\, \eta(t') . \quad (A9)

From Eq. (2) it follows that Δx_{±,j} are random lengths drawn from Gaussian distributions with vanishing expectation value and known variances:

\langle \Delta x_{\pm,j} \Delta x_{\pm,k} \rangle = \sigma_\pm^2\, \delta_{j,k} , \qquad \sigma_\pm^2 \equiv D \left( \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \right)^2 \frac{1 - c_\pm^2}{\lambda_\pm} . \quad (A10)

It also follows that Δx_{+,j} and Δx_{-,j} are correlated with each other, but uncorrelated with all Δx_{±,k} for j ≠ k:

\langle \Delta x_{+,j} \Delta x_{-,k} \rangle = \sigma_{+-}^2\, \delta_{j,k} , \qquad \sigma_{+-}^2 \equiv 2 D \left( \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \right)^2 \frac{1 - c_+ c_-}{\lambda_+ + \lambda_-} . \quad (A11)

From their known correlation it follows, after some calculation, that they can be expressed in terms of two uncorrelated Gaussian distributed random numbers with unit variance, η_j^(a) and η_j^(b), as

\begin{pmatrix} \Delta x_j \\ \Delta v_j \end{pmatrix} = \left( A_+ \begin{pmatrix} -1 \\ \lambda_+ \end{pmatrix} + A_- \begin{pmatrix} 1 \\ -\lambda_- \end{pmatrix} \right) \left( \frac{1 + \alpha}{2} \right)^{1/2} \eta_j^{(a)} + \left( A_+ \begin{pmatrix} -1 \\ \lambda_+ \end{pmatrix} - A_- \begin{pmatrix} 1 \\ -\lambda_- \end{pmatrix} \right) \left( \frac{1 - \alpha}{2} \right)^{1/2} \eta_j^{(b)} , \quad (A12)

where we have introduced the notation

A_\pm = \frac{\lambda_+ + \lambda_-}{\lambda_+ - \lambda_-} \sqrt{ \frac{(1 - c_\pm^2)\, D}{\lambda_\pm} } \quad (A13)

and

\alpha = \frac{2 \sqrt{\lambda_+ \lambda_-}}{\lambda_+ + \lambda_-} \cdot \frac{1 - c_+ c_-}{\sqrt{(1 - c_+^2)(1 - c_-^2)}} . \quad (A14)

So iteration of Eq. (A4) with use of Eq. (A12) generates a time series of positions x_j, which is sampled equidistantly in time with separation Δt from the continuous-time solution to Eq. (1). Since we use the exact analytical solution of Eq. (1) in the generation of this series, the finite value of Δt causes no discretization error. The only numerical errors associated with our solution are associated with the representation of real numbers on a computer and, rather hypothetically, with the use of pseudo-random numbers.

Appendix B: Aliased AFM Power Spectrum
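The update of Eqs. (A4)-(A14) can be sketched as follows (our implementation, assuming the overdamped case with two real eigenvalues λ₊ > λ₋ > 0; function names are ours):

```python
import numpy as np

def propagator_and_noise(lam_p, lam_m, D, dt):
    """Exact one-step propagator e^{-M dt} (Eq. (A5)) and the two noise
    vectors multiplying eta_j^(a), eta_j^(b) in Eq. (A12)."""
    cp, cm = np.exp(-lam_p * dt), np.exp(-lam_m * dt)
    pref = (lam_p + lam_m) / (lam_p - lam_m)
    Ap = pref * np.sqrt((1.0 - cp**2) * D / lam_p)        # Eq. (A13)
    Am = pref * np.sqrt((1.0 - cm**2) * D / lam_m)
    alpha = (2.0 * np.sqrt(lam_p * lam_m) / (lam_p + lam_m)
             * (1.0 - cp * cm) / np.sqrt((1.0 - cp**2) * (1.0 - cm**2)))
    E = np.array([[-lam_m * cp + lam_p * cm, -cp + cm],
                  [lam_p * lam_m * (cp - cm), lam_p * cp - lam_m * cm]])
    E /= (lam_p - lam_m)
    up, um = np.array([-1.0, lam_p]), np.array([1.0, -lam_m])
    va = (Ap * up + Am * um) * np.sqrt((1.0 + alpha) / 2.0)
    vb = (Ap * up - Am * um) * np.sqrt((1.0 - alpha) / 2.0)
    return E, va, vb

def simulate(lam_p, lam_m, D, dt, steps, rng):
    """Iterate Eq. (A4): exact sampling, no discretization error."""
    E, va, vb = propagator_and_noise(lam_p, lam_m, D, dt)
    xv = np.zeros(2)
    x = np.empty(steps)
    eta = rng.standard_normal((steps, 2))
    for j in range(steps):
        xv = E @ xv + va * eta[j, 0] + vb * eta[j, 1]
        x[j] = xv[0]
    return x
```

Because the sampling is exact, the stationary covariance of the discrete chain must coincide with the continuous-time values ⟨x²⟩ = D(λ₊+λ₋)/(λ₊λ₋), ⟨v²⟩ = D(λ₊+λ₋), ⟨xv⟩ = 0, which provides a sharp consistency check on the implementation.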
For the AFM, using the results from Appendix A, we get for the aliased power spectrum:

\langle P^{(\mathrm{ex})}_k \rangle = \langle |\hat x_k|^2 \rangle / t_{\mathrm{msr}} = \frac{ \sigma_+^2 \alpha_- + \sigma_-^2 \alpha_+ - 2 \sigma_{+-}^2 \alpha_{+-} }{ \alpha_+ \alpha_- }\, \Delta t , \quad (B1)

where

\alpha_+ = 1 + c_+^2 - 2 c_+ \cos(2\pi k/N) \quad (B2)
\alpha_- = 1 + c_-^2 - 2 c_- \cos(2\pi k/N) \quad (B3)
\alpha_{+-} = 1 + c_+ c_- - (c_+ + c_-) \cos(2\pi k/N) , \quad (B4)

and we have used that the discrete Fourier transforms of η_j^(a) and η_j^(b) have the following characteristics:

\langle \hat\eta^{(a)*}_k \hat\eta^{(b)}_l \rangle = 0 \quad (B5)
\langle \hat\eta^{(a)*}_k \hat\eta^{(a)}_l \rangle = \langle \hat\eta^{(b)*}_k \hat\eta^{(b)}_l \rangle = \frac{t_{\mathrm{msr}}}{\Delta t}\, \delta_{k,l} . \quad (B6)

For completeness we note that the expression in Eq. (B1) has as limiting expression Eq. (C1) when the mass vanishes:

\lim_{m \to 0} \langle P^{(\mathrm{ex})}_k \rangle = \frac{\sigma_-^2}{\alpha_-}\, \Delta t , \quad (B7)

as is seen by inspection.

Appendix C: Maximum Likelihood Estimation for aliased power spectra
In an OT experiment, the time series of bead positions is obtained by sampling the continuous output from the photodiode at discrete times t_j = j\Delta t, \Delta t = 1/f_{\mathrm{sample}}. Applying the discrete Fourier transform to this time series, we find [9] that the expectation value of the aliased power spectrum can be written in the form

P_k^{\mathrm{alias}} = \frac{1}{A + B\cos(2\pi k/N)},   (C1)

where A and B are related to f_c and D through

f_c = \frac{f_{\mathrm{sample}}}{2\pi}\,u,   (C2)
D = \frac{f_{\mathrm{sample}}^2}{A}\,\frac{u}{\tanh(u)},   (C3)
u = \cosh^{-1}(-A/B).   (C4)

By inserting Eq. (C1) in Eq. (53), the stationarity conditions (\partial_A F = \partial_B F = 0) are seen to be

\sum_k \bar P_k^{(\mathrm{ex})} = \sum_k P_k,   (C5)
\sum_k \cos(2\pi k/N)\,\bar P_k^{(\mathrm{ex})} = \sum_k \cos(2\pi k/N)\,P_k.   (C6)

We now repeat the trick introduced in Section VI A to turn Eqs. (C5) and (C6) into expressions linear in A and B,

\begin{pmatrix} R_{0,2} & R_{1,2}\\ R_{1,2} & R_{2,2}\end{pmatrix}\begin{pmatrix} A\\ B\end{pmatrix} \approx (1 + 1/n)\begin{pmatrix} R_{0,1}\\ R_{1,1}\end{pmatrix},   (C7)

which defines the matrix \mathbf{R} and the vectors \vec u = (A, B)^T and \vec r. These equations are solved to give

A \approx \hat A \equiv \frac{n+1}{n}\,\frac{R_{0,1}R_{2,2} - R_{1,1}R_{1,2}}{R_{0,2}R_{2,2} - R_{1,2}^2},   (C8)
B \approx \hat B \equiv \frac{n+1}{n}\,\frac{R_{1,1}R_{0,2} - R_{0,1}R_{1,2}}{R_{0,2}R_{2,2} - R_{1,2}^2},   (C9)

where we have introduced the (aliased) statistics

R_{p,q} = \frac{1}{K}\sum_k \cos^p(2\pi k/N)\left(\bar P_k^{(\mathrm{ex})}\right)^q.   (C10)

We do not attempt here to give the aliased results for f_0, D, and G for the AFM case: to avoid aliasing of high-frequency noise into the lower frequencies of interest, a high sampling frequency is often used when acquiring AFM data. However, only the region around f_0 is well captured by the damped-harmonic-oscillator theory, and therefore no more than this region is fitted. Since the aliased expressions deviate substantially from the non-aliased ones only at high frequencies, and because the non-aliased expressions are much simpler, we treat only the non-aliased MLE for the AFM here.

Appendix D: Covariance Matrix
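Before deriving the covariance of the estimators \hat A and \hat B, it is instructive to see them in action. The following sketch (our own construction, with hypothetical parameter choices) simulates a massless bead in a trap as an exactly sampled Ornstein-Uhlenbeck process, averages n periodograms, and applies Eqs. (C8)-(C10):

```python
import numpy as np

def fit_aliased_lorentzian(Pbar, n, N):
    """Fit the averaged, aliased Lorentzian of Eq. (C1) with the linearized
    MLE of Eqs. (C7)-(C10).  Pbar[k-1] is the n-fold averaged spectrum at
    k = 1..K (K < N/2); frequencies are returned in units of f_sample."""
    K = len(Pbar)
    cosk = np.cos(2*np.pi*np.arange(1, K + 1)/N)
    R = lambda p, q: np.mean(cosk**p*Pbar**q)               # Eq. (C10)
    det = R(0, 2)*R(2, 2) - R(1, 2)**2
    A = (n + 1)/n*(R(0, 1)*R(2, 2) - R(1, 1)*R(1, 2))/det   # Eq. (C8)
    B = (n + 1)/n*(R(1, 1)*R(0, 2) - R(0, 1)*R(1, 2))/det   # Eq. (C9)
    u = np.arccosh(-A/B)                                    # Eq. (C4)
    return u/(2*np.pi), u/(A*np.tanh(u))                    # f_c, D; Eqs. (C2), (C3)

# Synthetic data: exact OU updates x_{j+1} = c*x_j + noise, n periodograms.
rng = np.random.default_rng(0)
N, n, fc_true, D_true = 2**13, 16, 0.05, 1.0                # fc in units of f_sample
c = np.exp(-2*np.pi*fc_true)
sig = np.sqrt((1 - c**2)*D_true/(2*np.pi*fc_true))
x = np.empty((n, N))
xj = rng.standard_normal(n)*np.sqrt(D_true/(2*np.pi*fc_true))  # stationary start
for j in range(N):
    xj = c*xj + sig*rng.standard_normal(n)
    x[:, j] = xj
P = np.abs(np.fft.rfft(x, axis=1))**2/N                     # periodograms (dt = 1)
fc_hat, D_hat = fit_aliased_lorentzian(P[:, 1:N//2].mean(axis=0), n, N)
```

With these settings, fc_hat and D_hat recover the input values to within the stochastic scatter quantified in Appendix E.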
This appendix derives the results used in Section VI C and Appendix E. To calculate the covariance matrix, we consider the response of the estimated fit parameters to fluctuations in the statistics S_{p,q}. The calculations go through unchanged for the aliased Lorentzian (see above) with the statistics R_{p,q}. First, we note that we can write each term in Eq. (65),

\mathbf{S}\,\vec v \approx \mathbf{S}\,\hat{\vec v} \equiv \mathbf{S}\begin{pmatrix}\hat a\\ \hat b\\ \hat c\end{pmatrix} \equiv (1 + 1/n)\,\vec s,   (D1)

as the sum of its "true underlying" value and a fluctuation (\Delta\mathbf{S} and \Delta\vec s) or a response (\Delta\vec v) to fluctuations,

\mathbf{S} = \langle\mathbf{S}\rangle + \Delta\mathbf{S},   (D2)
\hat{\vec v} = \vec v + \Delta\vec v,   (D3)
\vec s = \langle\vec s\rangle + \Delta\vec s,   (D4)

where the elements of \Delta\mathbf{S} are

\Delta S_{p,q} = \frac{1}{K}\sum_f f^p\left[\left(\bar P_f^{(\mathrm{ex})}\right)^q - \left\langle\left(\bar P_f^{(\mathrm{ex})}\right)^q\right\rangle\right].   (D5)

To first order in the fluctuations we thus have

\Delta\vec v = \begin{pmatrix}\Delta a\\ \Delta b\\ \Delta c\end{pmatrix} = \langle\mathbf{S}\rangle^{-1}\left(\frac{n+1}{n}\,\Delta\vec s - \Delta\mathbf{S}\,\vec v\right),   (D6)

where we notice that to first order \langle\Delta\vec v\rangle = 0. This does not mean that the estimator is unbiased, however, as shown below. The covariance matrix \langle\Delta\vec v\otimes\Delta\vec v\rangle as given in Eq. (74) then follows after some calculation using Eqs. (49) and (15).

We emphasize that the above calculations were done to first order in 1/\sqrt{K}, with K the number of terms in the statistics S_{p,q}: whereas the relative size of an individual fluctuation \Delta\bar P^{(\mathrm{ex})} in a power-spectral value is independent of K, the relative sizes of the overall fluctuations in the sums \Delta S_{p,q} go to zero as 1/\sqrt{K}. Since K is typically a very large number, this first-order approximation is very good.

Appendix E: Error Bars for the Aliased Lorentzian
The error bars on f_c and D are calculated as before, using Eq. (73) for the variance:

\sigma^2(f_c) = \frac{f_{\mathrm{sample}}^2\,A^2}{4\pi^2(A^2-B^2)}\left[\frac{\langle(\Delta A)^2\rangle}{A^2} + \frac{\langle(\Delta B)^2\rangle}{B^2} - \frac{2\,\langle\Delta A\,\Delta B\rangle}{AB}\right]   (E1)

and

\sigma^2(D) = (\partial_A D)^2\langle(\Delta A)^2\rangle + (\partial_B D)^2\langle(\Delta B)^2\rangle + 2\,\partial_A D\,\partial_B D\,\langle\Delta A\,\Delta B\rangle,   (E2)

where

\partial_A D = \frac{D}{A}\left(\frac{A}{u\sqrt{A^2-B^2}} - 1 - \frac{B^2}{A^2-B^2}\right),   (E3)
\partial_B D = \frac{D}{B}\left(\frac{B^2}{A^2-B^2} - \frac{A}{u\sqrt{A^2-B^2}}\right),   (E4)

and u is given in Eq. (C4). The covariance matrix is calculated as before, giving

\mathrm{cov}(A,B) \equiv \begin{pmatrix}\langle(\Delta A)^2\rangle & \langle\Delta A\,\Delta B\rangle\\ \langle\Delta A\,\Delta B\rangle & \langle(\Delta B)^2\rangle\end{pmatrix} \approx \frac{n+3}{n(n+1)K}\,\tilde{\mathbf{R}}^{-1},   (E5)

where \tilde{\mathbf{R}} is a matrix with the same structure as \mathbf{R}, see Eq. (C7), but with the experimental values \bar P^{(\mathrm{ex})} replaced by the theoretical (in practice, the fitted) values P_k in all the statistics of Eq. (C10).

We tested the above analytical results against simulations: multiple independent position time series for a massless particle diffusing in a harmonic potential were created using the methods given in [9]. The resulting power spectra were fitted using Eqs. (C2)-(C10). Since we simulate a stochastic process, there is scatter in the fitted parameters, and it is this scatter that we compare to the results given in Eqs. (E1) and (E2). The results are shown in Figs. 7 and 8: Fig. 8 shows the expected 1/\sqrt{N} scaling, whereas Fig. 7 shows a \sqrt{n+3}/\sqrt{n+1} scaling.

Figure 9 shows that the optimal tradeoff between precision and amount of data acquired seems to manifest itself at a sampling frequency roughly eight times the corner frequency. Sampling any slower than this leads to large errors in both parameters, because the PSD then essentially reduces to the ratio of D to f_c^2, i.e., two parameters are used to fit a single constant.
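The factor (n+3)/(n+1) above originates in the moments of the gamma-distributed averaged spectrum (see note [17]). The underlying variance identity, in the normalization of our derivation, is that for \bar P \sim \Gamma(n, P/n) the fluctuation X = \bar P - n\bar P^2/((n+1)P) entering the linearized estimators has zero mean and variance P^2(n+3)/(n(n+1)). A quick Monte Carlo check:

```python
import numpy as np

# For an n-fold averaged, exponentially distributed spectral value,
# Pbar ~ Gamma(shape=n, scale=P/n), the fluctuation entering the linearized
# estimators, X = Pbar - n*Pbar**2/((n+1)*P), has zero mean and variance
# P**2*(n+3)/(n*(n+1)) -- the source of the (n+3)/(n+1) dependence above.
rng = np.random.default_rng(1)
n, P = 4, 1.0
Pbar = rng.gamma(shape=n, scale=P/n, size=2_000_000)
X = Pbar - n*Pbar**2/((n + 1)*P)
```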
Sampling much faster than f_c, while keeping the number of acquired data points fixed, has no effect on the error on D, but it is detrimental for f_c, since there is progressively less information about f_c at larger frequencies. The effect on the precision of increasing n is small but positive; compared to the effects of N and f_{\mathrm{sample}}, it can be ignored (after it has been accounted for as described in Eqs. (C8) and (C9)).

For comparison, the error bars on the fit parameters from a least-squares fit, as functions of the cut-off frequency f_{\mathrm{max}}, are shown in Fig. 2: the average of n = 16 synthetic PSDs was fitted by minimizing Eq. (23), with the data points weighted by the standard deviation of the n PSDs. Also shown is an LSQ fit in which the weights are kept constant; this is the kind of fitting performed by primitive LSQ routines. For a detailed discussion of how the stochastic error depends on the fitting range [f_{\mathrm{min}}, f_{\mathrm{max}}], the reader is referred to Section VIII of [9].

Appendix F: Bias of the MLEs
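The bias analyzed in this appendix, that of the linearized MLE, should not be confused with the much larger weighting-induced bias of least-squares fitting discussed in the main text. The latter is easy to reproduce; the following minimal illustration (our own construction, not an equation from the paper) fits a flat spectrum of true level 1 by least squares with the common "weight by the data" recipe:

```python
import numpy as np

# A flat spectrum of true level 1, averaged over n periodograms, fitted by
# least squares with data-derived weights: minimizing
# sum((Pbar_k - C)**2 / Pbar_k**2) over the level C gives
# C = sum(1/Pbar)/sum(1/Pbar**2), whose expectation is (n - 2)/n, not 1.
rng = np.random.default_rng(3)
n, K = 10, 100_000
Pbar = rng.gamma(shape=n, scale=1.0/n, size=K)   # n-fold averaged spectral values
C = np.sum(1/Pbar)/np.sum(1/Pbar**2)             # data-weighted LSQ level estimate
print(C)   # close to (n - 2)/n = 0.8, i.e. biased low by 2/n
```

For n = 10 averaged spectra this particular weighting underestimates the amplitude by about 20 percent, which is the order of magnitude of calibration bias quoted in the abstract.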
Obviously, we must be paying a price somewhere for turning a non-linear problem into a linear one with our little trick; otherwise we would have turned a non-linear problem into an exactly solvable mathematical problem. That is sometimes done, but not here: the trick works through an approximation, and the resulting approximate estimator is biased, which means that its expectation value differs from the true value of the quantity it estimates. So on average it misses the correct result; bias is systematic error on averages. That is the nature of the price we pay. Fortunately, the bias is negligible in size, as we now demonstrate.

For the non-aliased Lorentzian and the AFM we find, by expanding Eq. (D1) to first order in \Delta\vec v and to second order in \Delta S_{p,q},

\langle\Delta\vec v\rangle = \begin{pmatrix}\langle\Delta a\rangle\\ \langle\Delta b\rangle\\ \langle\Delta c\rangle\end{pmatrix} = \langle\mathbf{S}\rangle^{-1}\left[\left\langle\Delta\mathbf{S}\,\langle\mathbf{S}\rangle^{-1}\Delta\mathbf{S}\right\rangle\vec v - \frac{n+1}{n}\left\langle\Delta\mathbf{S}\,\langle\mathbf{S}\rangle^{-1}\Delta\vec s\right\rangle\right].   (F1)

For the non-aliased Lorentzian this expression can be reduced to

\begin{pmatrix}\langle\Delta a\rangle\\ \langle\Delta b\rangle\end{pmatrix} = \frac{2(n+2)}{n(n+1)\,K\left(\tilde S_{0,2}\tilde S_{4,2} - \tilde S_{2,2}^2\right)^2}\begin{pmatrix}\tilde S_{4,2} & -\tilde S_{2,2}\\ -\tilde S_{2,2} & \tilde S_{0,2}\end{pmatrix}\begin{pmatrix}\tilde S_{4,2}\tilde S_{0,3} - 2\tilde S_{2,2}\tilde S_{2,3} + \tilde S_{0,2}\tilde S_{4,3}\\ \tilde S_{4,2}\tilde S_{2,3} - 2\tilde S_{2,2}\tilde S_{4,3} + \tilde S_{0,2}\tilde S_{6,3}\end{pmatrix},   (F2)

where

\tilde S_{p,q} = \frac{1}{K}\sum_f f^p\left\langle\bar P^{(\mathrm{ex})}\right\rangle^q = \frac{1}{K}\sum_f f^p P_f^q.   (F3)

That is, the bias is inversely proportional to the number K of fitted spectral values, hence very small, and it displays only a weak n-dependence. From these expressions we find the bias on f_c and D to be

\langle\Delta f_c\rangle = \frac{f_c}{2}\left(\frac{\langle\Delta a\rangle}{a} - \frac{\langle\Delta b\rangle}{b}\right),   (F4)
\langle\Delta D\rangle = -D\,\frac{\langle\Delta b\rangle}{b}.   (F5)

We only know the true values of f_c and D in simulations; in an experiment, the best estimate of the true value would be the fitted value.

For the aliased Lorentzian we find, in an analogous manner,

\langle\Delta f_c\rangle = \frac{f_{\mathrm{sample}}\,A}{2\pi\sqrt{A^2-B^2}}\left[\frac{\langle\Delta A\rangle}{A} - \frac{\langle\Delta B\rangle}{B}\right],   (F6)

\langle\Delta D\rangle = -D\left[\left(\frac{B^2}{A^2-B^2} - \frac{A}{u\sqrt{A^2-B^2}}\right)\left(\frac{\langle\Delta A\rangle}{A} - \frac{\langle\Delta B\rangle}{B}\right) + \frac{\langle\Delta A\rangle}{A}\right],   (F7)

where \langle\Delta A\rangle and \langle\Delta B\rangle are found from Eq. (F1) by everywhere replacing \tilde S with \tilde R. The result of a numerical test of the above relations is shown in Fig. 10.
FIG. 10: Bias of the parameter estimates for f_c (grey squares) and D (white circles), from ML fits, with our trick, of the aliased Lorentzian. Error bars are standard errors on the mean. The point of this figure is only to show that the bias is indeed very small and can be completely ignored. The theoretical expectation values for the bias, as given by Eq. (F6) (grey line) and Eq. (F7) (black line), are less than 1/1,000 of a percent for these settings. Bias was measured as the difference between the average of 1,000 determinations of the fit parameters and the known input values f_c = 500 Hz and D = 0. \mu m^2/s. Simulations were run with N = 2 and f_sample = 2 Hz, values chosen to minimize the stochastic errors. Data were treated with n non-overlapping Hann windows before calculation of the PSD.

[1] K. C. Neuman and S. M. Block, Rev. Sci. Instrum. 75, 2787 (2004).
[2] K. C. Neuman, T. Lionnet, and J.-F. Allemand, Annu. Rev. Mater. Res. 37, 33 (2007).
[3] K. C. Neuman and A. Nagy, Nat. Methods 5, 491 (2008).
[4] J. R. Moffitt, Y. R. Chemla, S. B. Smith, and C. Bustamante, Annu. Rev. Biochem. 77, 205 (2008).
[5] T. T. Perkins, Laser & Photon. Rev. 3, 203 (2009).
[6] J. L. Hutter and J. Bechhoefer, Rev. Sci. Instrum. 64, 1868 (1993).
[7] D. Walters et al., Rev. Sci. Instrum. 67, 3583 (1996).
[8] J. Sader, J. Appl. Phys. 84, 64 (1998).
[9] K. Berg-Sørensen and H. Flyvbjerg, Rev. Sci. Instrum. 75, 594 (2004).
[10] S. F. Tolić-Nørrelykke et al., Rev. Sci. Instrum. 77, 103101 (2006).
[11] K. Berg-Sørensen, L. Oddershede, E.-L. Florin, and H. Flyvbjerg, J. Appl. Phys. 93, 3167 (2003).
[12] K. Berg-Sørensen et al., Rev. Sci. Instrum. 77, 063106 (2006).
[13] E. Schäffer, S. F. Nørrelykke, and J. Howard, Langmuir 23, 3654 (2007).
[14] A. C. Aitken, Proc. R. Soc. Edinb. 55, 42 (1935).
[15] C. R. Rao, Linear Statistical Inference and Its Applications (Wiley, New York, 1973).
[16] N. C. Barford,
Experimental Measurements: Precision, Error and Truth, 2nd ed. (John Wiley & Sons, 1990).
[17] Here and below we use the following results, which are easy to verify by direct calculation or by consulting a statistics textbook: if two independent stochastic variables are normally distributed, X, Y \sim N(0, \sigma^2), then X^2, Y^2 \sim \Gamma(1/2, 2\sigma^2), and the sum of their squares is exponentially distributed, Z = X^2 + Y^2 \sim \Gamma(1, 2\sigma^2) = E(2\sigma^2). Finally, averaging over n independent, exponentially distributed variables returns a gamma-distributed variable, \frac{1}{n}\sum_{i=1}^{n} Z_i \sim \Gamma(n, 2\sigma^2/n).