Bayesian estimates of free energies from nonequilibrium work data in the presence of instrument noise

Abstract

The Jarzynski equality and the fluctuation theorem relate equilibrium free energy differences to non-equilibrium measurements of the work. These relations extend to single-molecule experiments that have probed the finite-time thermodynamics of proteins and nucleic acids. The effects of experimental error and instrument noise have not previously been considered. Here, we present a Bayesian formalism for estimating free-energy changes from non-equilibrium work measurements that compensates for instrument noise and combines data from multiple driving protocols. We reanalyze a recent set of experiments in which a single RNA hairpin is unfolded and refolded using optical tweezers at three different rates. Interestingly, the fastest and farthest-from-equilibrium measurements contain the least instrumental noise, and therefore provide a more accurate estimate of the free energies than a few slow, more noisy, near-equilibrium measurements. The methods we propose here will extend the scope of single-molecule experiments; they can be used in the analysis of data from measurements with AFM, optical, and magnetic tweezers.


arXiv [cond-mat.stat-mech] Jul; LBNL-62739

Bayesian estimates of free energies from nonequilibrium work data in the presence of instrument noise

Paul Maragakis

Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA ∗

Felix Ritort

Departament de Física Fonamental, Facultat de Física, Universitat de Barcelona, 08028 Barcelona, Spain and CIBER-BBN, Networking centre on Bioengineering, Biomaterials and Nanomedicine

Carlos Bustamante

Howard Hughes Medical Institute and Departments of Physics and Molecular & Cell Biology, University of California, Berkeley, California, 94720, USA

Martin Karplus

Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA and Laboratoire de Chimie Biophysique, Institut de Science et d'Ingénierie Supramoléculaires, Université Louis Pasteur, F-67083 Strasbourg Cedex, France

Gavin E. Crooks †

Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

(Dated: November 2, 2018)

I. INTRODUCTION

A central endeavor of thermodynamics is the measurement of entropy and free energy changes, for which the principal experimental methods are based on the Clausius inequality. One starts with a system equilibrated in one thermodynamic state, $A$, and then perturbs the system, following some explicit protocol, until the control parameter corresponds to a new thermodynamic state, $B$. If the temperature $T$ of the surroundings is fixed, the change in entropy, $\Delta S = S_B - S_A$, is related to the flow of heat $Q$ into the system:

$$\Delta S \geq \beta \langle Q \rangle, \tag{1}$$

where $\beta = 1/k_B T$, and $k_B$ is the Boltzmann constant. Equivalently, the free energy difference $\Delta F = F_B - F_A = \Delta \langle U \rangle - \Delta S/\beta$ is related to the work $W$ done on the system:

$$\Delta F \leq \langle W \rangle. \tag{2}$$

Here we use the sign convention $\Delta U = Q + W$. The angle brackets indicate an average over many repetitions of the same experiment. In macroscopic systems individual observations do not differ significantly from the mean. But for a microscopic system the fluctuations from the mean can be large and the inequality only holds on average (i.e., not for individual measurements).

∗ Present address: D. E. Shaw Research, New York, New York 10036, USA; electronic address: [email protected]

It was recently discovered that equilibrium free energy differences can also be determined by measuring the work performed during irreversible transformations, using the Jarzynski and work fluctuation relations. These theoretical insights have been used to determine the unfolding free energy of an RNA hairpin from finite-time, non-equilibrium experiments, as described in Fig. 1.

FIG. 1: Non-equilibrium work measurements for folding and unfolding an RNA hairpin. A single RNA molecule is attached between two beads via hybrid DNA/RNA linkers. One bead is captured in an optical laser trap that can measure the applied force on the bead. The other bead is attached to a piezoelectric actuator, which is used to irreversibly unfold and refold the hairpin.

We consider a protocol (labeled $\Lambda$) that starts with an equilibrated system, and then transforms an external control parameter from an initial value $A$ to a final value $B$ in a finite time. (In the RNA hairpin unfolding experiments, the control parameter is the distance between the center of the optical trap and the center of the fixed bead.) This perturbation drives the system out of equilibrium. Once the protocol ends, the control parameter is again fixed, and the system can relax back to thermal equilibrium. One can also run the protocol in reverse, starting with a system equilibrated with the control parameter at $B$, and then transform the system through the reverse sequence of intermediate control parameters, to $A$. We label this conjugate protocol $\tilde\Lambda$. Due to the reversibility of the microscopic dynamics, the probability $P(W \mid \Delta F_\Lambda, \Lambda)$ of measuring a particular value of the work during protocol $\Lambda$ is related to the work probability density of the conjugate protocol, $\tilde\Lambda$, by the following work fluctuation symmetry:

$$\frac{P(+W \mid \Delta F_\Lambda, \Lambda)}{P(-W \mid \Delta F_{\tilde\Lambda}, \tilde\Lambda)} = e^{+\beta W - \beta \Delta F_\Lambda}, \tag{3}$$

with $\Delta F_\Lambda$ $(= -\Delta F_{\tilde\Lambda})$ the change in free energy associated with the change of the external control parameter in protocol $\Lambda$ ($\tilde\Lambda$). This relation immediately implies the Jarzynski equality

$$\left\langle e^{-\beta W} \right\rangle = \int dW\, P(+W \mid \Delta F_\Lambda, \Lambda)\, e^{-\beta W} = \int dW\, P(-W \mid \Delta F_{\tilde\Lambda}, \tilde\Lambda)\, e^{-\beta \Delta F} = e^{-\beta \Delta F}. \tag{4}$$

In other words, a Boltzmann weighted average of the irreversible work recovers the equilibrium free energy difference from a non-equilibrium transformation. The Clausius relation follows by an application of Jensen's inequality, $\ln \langle \exp(x) \rangle \geq \langle x \rangle$.

FIG. 2: Typical force extension curves in the unfolding (solid lines) and folding (dashed lines) of a 20-base-pair RNA hairpin. Different colors correspond to different unfolding-folding cycles. The rip in force observed around 15 pN corresponds to the cooperative unfolding/folding transition. The area below the force-extension curve is equal to the mechanical work done on the RNA hairpin. Because the transformations are irreversible, the work performed varies from one unfolding or refolding measurement to the next. Drift effects observed in force extension curves arise from different causes, including air currents, mechanical vibrations and temperature changes.

Given the thermodynamics preamble, we can rephrase the problem of measuring the free energy as follows: How do we calculate the most accurate, least biased, estimate of the free energy, given a finite number of irreversible work measurements? We consider both the statistical error due to limited data and, for real experiments, the additional error due to measurement noise. Furthermore, we may wish to simultaneously combine the data from multiple protocols connecting the same thermodynamic states. For example, in the single-molecule experiment described in Fig. 1, the same RNA hairpin was unfolded at three different rates, with each dataset providing a different compromise between statistical and experimental errors.

The Clausius relations are exact equalities only for infinitely slow, thermodynamically reversible transformations, where the irreversible dissipation is zero. A transformation that occurs in a finite time provides only an upper bound to the free energy and a lower bound to the entropy change. (Since entropy and free energy are state variables, the reverse transformation, from thermodynamic state $B$ back to $A$, provides a lower (upper) bound to the same free energy (entropy) change.)
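As a rough numerical illustration of Eqs. (2) and (4), the following Python sketch draws work values from a hypothetical Gaussian work distribution (an assumption made only for this example; it is not the RNA data). For Gaussian work, Eq. (4) gives $\Delta F = \langle W \rangle - \beta \sigma_W^2 / 2$ exactly, so the exponential average can be compared with a known answer. The function name `jarzynski_free_energy` and all parameter values are illustrative.

```python
import math
import random


def jarzynski_free_energy(works, beta=1.0):
    """Estimate Delta F from Eq. (4): Delta F = -(1/beta) ln <exp(-beta W)>."""
    # Shift by the minimum work before exponentiating to avoid underflow.
    w_min = min(works)
    mean_exp = sum(math.exp(-beta * (w - w_min)) for w in works) / len(works)
    return w_min - math.log(mean_exp) / beta


# Hypothetical Gaussian work distribution, in units of k_B T (beta = 1).
rng = random.Random(0)
mean_work, sd_work = 5.0, 1.0
delta_f_exact = mean_work - 0.5 * sd_work ** 2  # exact for Gaussian work

works = [rng.gauss(mean_work, sd_work) for _ in range(200_000)]
average_work = sum(works) / len(works)
delta_f_jarzynski = jarzynski_free_energy(works)

print(f"<W>                = {average_work:.3f}  (upper bound of Eq. 2)")
print(f"Jarzynski estimate = {delta_f_jarzynski:.3f}")
print(f"exact Delta F      = {delta_f_exact:.3f}")
```

The average work sits above the free energy, as Eq. (2) requires, while the exponential average approaches the exact value only because a very large number of samples is used here.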
One approach to analyzing irreversible transformations is to directly apply the Jarzynski relation. However, this identity strictly holds only in the limit of an infinite number of repeated experiments. For a finite number of measurements, we again obtain an inequality that only holds on average, and the free energy estimates tend to be strongly biased. Because the magnitude of the bias depends on the protocol, one cannot reliably combine data from different protocols. Moreover, the Jarzynski relation is sensitive to measurement noise and variations in the experimental setup (e.g., heterogeneity in the attachments and variable length of tethers). Broadening of the work distribution leads to a bias in the estimated free energy, since smaller work values contribute more than larger work values in the exponential average of Eq. (4).

Bennett laid the foundations for the solution to this problem in his development of the acceptance ratio method for free energy perturbation calculations (a technique for computing free energy changes by simulating infinitely fast transformations). He realized that an optimal solution requires combination of work measurements from both forward and reverse switches. The acceptance ratio method was later extended to finite-time switches, shown to be a maximum-likelihood free energy method, related to the problem of logistic regression, and extended to a network of thermodynamic states connected with many protocols. In this paper, we develop a Bayesian formalism that extends these results to provide not only a reliable estimate of the free energy, but also reliable estimates of the statistical uncertainty. In this formalism, it is straightforward to incorporate additional prior information about the experiment into the analysis. In particular, we show how to allow for experimental measurement noise. The magnitude of the noise can be determined from the data and an error-corrected free energy estimate recovered. We use this approach to reanalyze a recent experiment in which a single RNA hairpin was unfolded and refolded at three different rates using optical tweezers.

II. POSTERIOR FREE ENERGY ESTIMATE

Formally, we require the probability that the free energy change $\Delta F$ has a particular value, given a collection of work measurements $\mathbf{W}$, the protocol used for each measurement (either $\Lambda$ or $\tilde\Lambda$), and the (fixed) temperature of the environment $T$. Initially, we consider the simplest case, in which there are two protocols that are conjugate to each other, so that the work distributions are related by the fluctuation relation Eq. (3). We also assume, for now, that the measurements are error free.

The essential element in solving this problem is to treat both the work and the protocol as random variables that are uncorrelated from one observation to the next. We rewrite the free energy probability density given a single measurement in terms of these variables using Bayes' rule, $P(A \mid B) = P(B \mid A)\, P(A) / P(B)$:

$$P(\Delta F_\Lambda \mid W, \Lambda) = \frac{P(W, \Lambda \mid \Delta F_\Lambda)\, P(\Delta F_\Lambda)}{P(W, \Lambda)}. \tag{5}$$

Since a priori the free energy could be positive or negative and of any magnitude, the prior distribution of free energy $P(\Delta F_\Lambda)$ can be reasonably taken as uniform (see Kass and Wasserman for an in-depth discussion of priors). The denominator, which does not depend on $\Delta F_\Lambda$, can be absorbed into a normalization constant.

The distribution $P(W, \Lambda \mid \Delta F_\Lambda)$ is the final undetermined factor on the right-hand side of Eq. (5). In the absence of detailed knowledge about the work likelihood for the system under investigation, we should choose a maximally uninformative, system independent distribution. If the work were not conditional on the free energy we could again assign a uniform distribution, since a single work measurement could be positive or negative and of any magnitude. But we expect that the work will probably (but not certainly) be larger than the value of the free energy. Concretely, any work probability distribution must satisfy the work fluctuation symmetry, Eq. (3). We can satisfy this constraint by first considering the symmetrized distribution $P(W, \Lambda \mid \Delta F_\Lambda) + P(-W, \tilde\Lambda \mid \Delta F_{\tilde\Lambda})$. This averaged distribution does not need to satisfy any symmetry and therefore we can again assign a maximally uninformative improper prior:

$$P(W, \Lambda \mid \Delta F_\Lambda) + P(-W, \tilde\Lambda \mid \Delta F_{\tilde\Lambda}) = \text{constant}. \tag{6}$$

However, the work fluctuation relation implies that

$$\frac{P(+W, \Lambda \mid \Delta F_\Lambda)}{P(-W, \tilde\Lambda \mid \Delta F_{\tilde\Lambda})} = e^{\beta W - \beta \Delta F_\Lambda + M_\Lambda}, \tag{7}$$

where $M_\Lambda = \ln \left[ P(\Lambda \mid \Delta F_\Lambda) / P(\tilde\Lambda \mid \Delta F_{\tilde\Lambda}) \right]$. It follows that

$$P(W, \Lambda \mid \Delta F_\Lambda) \propto \frac{1}{1 + e^{-(\beta W - \beta \Delta F_\Lambda + M_\Lambda)}}. \tag{8}$$

Together with an uninformative free energy prior, we finally obtain

$$P(\Delta F_\Lambda \mid W, \Lambda) \propto P(W, \Lambda \mid \Delta F_\Lambda) \propto f\left(\beta W - \beta \Delta F_\Lambda + M_\Lambda\right), \tag{9}$$

where $f(x)$ is the logistic function (Fig. 3), the cumulative distribution function of the standard logistic distribution (see appendix, Fig. 7):

$$f(x) = \frac{1}{1 + e^{-x}}. \tag{10}$$

FIG. 3: The standard logistic function, $f(x) = 1/(1 + e^{-x})$.

Essentially, each measurement of the work provides a soft upper bound to the free energy change. Measurements made on the conjugate protocol provide soft lower bounds to the same free energy. Therefore, combining measurements from conjugate protocol pairs provides reliable, but fuzzy, free energy bounds. This is in contrast to the Clausius inequality [Eq. (2)], where the average work provides a hard bound to the free energy change.

FIG. 4: Posterior free energy given two work measurements, one from each of two conjugate protocols with values $\beta W = \pm\delta/2$. The posterior variance, $\pi^2/3 + \delta^2/12$, is minimized when the rectified work variables coincide, and increases quadratically with separation.

Figure 4 illustrates the posterior distribution resulting from combining two work measurements, one from each of a conjugate protocol pair, where the measured values are $\beta W = \pm\delta/2$. If the work values are widely separated, then the posterior free energy distribution is broad and flat. We only obtain a tight constraint on the free energy if the separation is less than about $4\,k_B T$. The minimum uncertainty for a single pair of measurements is $\sigma = \pi/\sqrt{3} \approx 1.8\,k_B T$, which occurs when $\delta = 0$.

Assuming that each measurement of the work is independent, we can combine measurements by multiplying the separate posterior distributions together. So far, we have been considering a single pair of conjugate protocols switching between two thermodynamic states. However, it was recently demonstrated that we can combine measurements from many different protocols connecting many different thermodynamic states in a network of transformations. Each measurement provides a single soft constraint [Eq. (9)], which we can combine by multiplying the different posterior distributions:

$$P(\mathbf{F} \mid \mathbf{W}, \boldsymbol{\Lambda}) = \frac{1}{C} \prod_{k=1}^{N} f\left(\beta W_k - \beta \Delta F_{\Lambda_k} + M_{\Lambda_k}\right), \tag{11}$$

where $W_k$ is the work measured in the $k$th experiment, performed with protocol $\Lambda_k$, $\Delta F_{\Lambda_k}$ is the free energy change associated with that protocol, $C$ is a normalization constant and $N$ is the total number of measurements. In the simplest case we have only a single conjugate protocol pair, forward and reverse. In general, we can have many different protocols (for example, pulling a molecule apart at different loading rates), and different protocols could connect different thermodynamic states. In the equation above, $\mathbf{F} = \{F_1, F_2, F_3, \ldots\}$ are the free energies of the initial and final states of these transformations. At least one free energy $F_i$ is fixed at zero, or some other convenient reference point, since only differences in free energy are significant.

The $M_{\Lambda_k}$ terms compensate for a difference in the probability of observing a forward or reverse protocol from a conjugate protocol pair. In the absence of detailed prior information about the work distributions, it is best to pick each member of a conjugate pair equally often. However, the difficulties of real world experiments may result in unequal numbers of forward and reverse measurements. In such cases, we can estimate a reasonable value for $M_{\Lambda_k}$ from the number of observations, $N_\Lambda$, obtained from each protocol:

$$M_\Lambda = \ln \frac{P(\Lambda \mid \Delta F_\Lambda)}{P(\tilde\Lambda \mid \Delta F_{\tilde\Lambda})} \approx \ln \frac{N_\Lambda + 1}{N_{\tilde\Lambda} + 1}. \tag{12}$$

The additional '+1' is a pseudocount which regularizes the frequency estimate. It can be justified as a Laplace prior on the probabilities. Note that without this regularization, Eq. (12), and thus also Eq. (11), would become invalid in the single sample limit. With the addition of the pseudocount, the probability distribution in Eq. (11) may still only produce one-sided bounds (for example, when there is no protocol that ends in a certain state, one has at best an upper bound for the free energy of that state). However, we could recover a finite free energy posterior distribution if we were to use a more informative free energy prior in Eq. (11).

The experimental measurements of the work values can typically be considered to be uncorrelated. However, when the measurements, or simulation results, are correlated, the maximum likelihood, or Bayesian estimates, may need to be modified to result in an optimal estimate of the free energy. In the absence of a general-purpose formulation for correlated work measurements, the estimators discussed in this paper are likely to underestimate the errors.

The Bayesian free energy posterior is an optimal estimate in the sense that it uses all of the available data and makes the fewest possible assumptions.
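The posterior of Eq. (11) for a single conjugate protocol pair can be evaluated directly on a grid. The Python sketch below is only an illustration under stated assumptions: works are given in units of $k_B T$ (so $\beta = 1$), reverse works are those measured along the reverse protocol (whose free energy change is $-\Delta F$ and whose offset is $-M$), and the names `free_energy_posterior` and `mean_and_variance` are hypothetical helpers, not part of any published code. The example data reproduce the two-measurement case of Fig. 4, with forward and reverse soft bounds separated by $\delta$.

```python
import math


def log_logistic(x):
    """Numerically stable ln f(x), with f(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def free_energy_posterior(forward_works, reverse_works, grid):
    """Posterior of beta*Delta F on `grid` for one conjugate pair (Eq. 11)."""
    # Eq. (12): pseudocount-regularized log ratio of protocol frequencies.
    m = math.log((len(forward_works) + 1) / (len(reverse_works) + 1))
    log_post = []
    for df in grid:
        # Forward measurements give soft upper bounds on Delta F ...
        total = sum(log_logistic(w - df + m) for w in forward_works)
        # ... and reverse measurements give soft lower bounds.
        total += sum(log_logistic(w + df - m) for w in reverse_works)
        log_post.append(total)
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


def mean_and_variance(grid, probs):
    mean = sum(x * p for x, p in zip(grid, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(grid, probs))
    return mean, var


# One forward and one reverse measurement whose soft bounds are delta apart.
grid = [-40.0 + 0.01 * i for i in range(8001)]
delta = 4.0
probs = free_energy_posterior([delta / 2], [delta / 2], grid)
mean, var = mean_and_variance(grid, probs)
print(f"posterior mean = {mean:.4f}, variance = {var:.4f}")
print(f"pi^2/3 + delta^2/12 = {math.pi ** 2 / 3 + delta ** 2 / 12:.4f}")
```

Working with the logarithm of the logistic function keeps the product in Eq. (11) stable when many measurements are combined.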
We can, in principle, improve the estimate by incorporating additional information, either by using more informative priors, or by adding additional assumptions, for example, by assuming that the work distribution is smoothly varying, or that it can be parameterized in terms of a particular functional form.

In many practical cases, the posterior distribution of $\Delta F$ quickly converges to a normal distribution as a consequence of the central limit theorem. We can summarize this posterior distribution with a point estimate and reasonable error bounds, for example the posterior mean free energy and 95% confidence intervals. The posterior mean will coincide with the maximum likelihood, and the confidence interval will be ± […].

III. EXPERIMENTAL ERRORS

The preceding analysis does not include the possibility of experimental errors, an omission that we now address, since real experiments are not ideal and real measurements can be inaccurate.

FIG. 5: (a) Histograms of work measurements for folding and unfolding an RNA hairpin at three different rates. Observations are binned into $1\,k_B T$ intervals centered on integer values. This data corresponds to Fig. 2 of Collin et al. Note that Eq. (3) predicts that the folding and unfolding work distributions cross at the free energy change. (b) The posterior distribution of the error correction factor $\gamma$ [Eq. (16)]. (c) Posterior free energy derived from the data in (a), both with [solid line, Eq. (16)] and without [dashed line, Eq. (11)] correction for measurement noise. Notice that the correction is substantial for the slowest experiment (1.5 pN/s), minor for the intermediate rate, and the corrected and uncorrected posteriors are indistinguishable (at this scale) for the fastest rate. The most reliable free energy estimate is obtained by combining the three separate noise corrected free energy posterior distributions.

We initially assume that the instrument error can be adequately described as additive white noise with zero mean and standard deviation $\sigma$. Since we do not know the magnitude of the noise, we estimate the joint distribution of the free energy and the noise, then integrate out the noise to obtain a final free energy estimate:

$$P(\Delta F_\Lambda \mid W, \Lambda) = \int P(\Delta F_\Lambda, \sigma \mid W, \Lambda)\, d\sigma. \tag{13}$$

Let us write $W = w + \epsilon$, where $W$ is the observed work value, $w$ is the true work and $\epsilon$ is the measurement error. Using Eq. (9) we get

$$P(\Delta F_\Lambda, \sigma \mid W, \Lambda) \propto \int_{-\infty}^{+\infty} f\left(\beta W - \beta\epsilon - \beta \Delta F_\Lambda + M_\Lambda\right) N(\epsilon; 0, \sigma)\, d\epsilon. \tag{14}$$

Here, $N(x; \mu, \sigma)$ is a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$ [see Eq. (20)].

This convolution of a logistic function and a Gaussian distribution generates a new sigmoidal function, illustrated in Fig. 6.
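The convolution in Eq. (14) is straightforward to evaluate by numerical quadrature. The Python sketch below does so with a trapezoid rule and compares the result against a rescaled logistic function $f(y/\gamma)$ with $\gamma = \sqrt{1 + \pi\sigma^2/8}$, which is the form obtained from Eq. (24) of the Appendix when $\sigma$ is expressed in units of $k_B T$. The helper names, the noise level, and the quadrature settings are assumptions made for this illustration only.

```python
import math


def logistic(x):
    # tanh form of the logistic function (see Eq. 19); cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def gaussian(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def noisy_logistic(y, sigma, n=2001, width=8.0):
    """Logistic function convolved with zero-mean Gaussian noise (Eq. 14).

    `y` is the dimensionless argument beta*W - beta*Delta F + M and `sigma`
    is the noise standard deviation in units of k_B T. The integral is
    approximated by the trapezoid rule on [-width*sigma, +width*sigma].
    """
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        eps = -width * sigma + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * logistic(y - eps) * gaussian(eps, sigma)
    return total * h


sigma = 2.0  # hypothetical noise level, in k_B T
gamma = math.sqrt(1.0 + math.pi * sigma ** 2 / 8.0)
max_error = max(
    abs(noisy_logistic(0.1 * k, sigma) - logistic(0.1 * k / gamma))
    for k in range(-100, 101)
)
print(f"gamma = {gamma:.3f}, largest deviation on [-10, 10] = {max_error:.4f}")
```

The comparison gives a direct feel for how closely a single scale factor captures the effect of Gaussian noise on the soft bound.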
This function does not have a simple closed form, but fortunately it can be closely approximated by a reparametrized logistic distribution

$$P(\Delta F_\Lambda, \sigma \mid W, \Lambda) \propto f\left(\tfrac{1}{\gamma}\left(\beta W - \beta \Delta F_\Lambda + M_\Lambda\right)\right), \tag{15}$$

where the parameter $\gamma = \sqrt{1 + \pi\beta^2\sigma^2/8}$. We treat $\gamma$ as the principal experimental error factor directly, without reference to an explicit error model or to the standard deviation of the noise, $\sigma$. For example, a systematic miscalibration of the work measurement or an incorrect thermostat would also result in a non-unit $\gamma$. In such cases $\gamma$ could be less than 1. Therefore, we allow $\gamma$ to be any positive number. We introduce an uninformative prior for $\gamma$, $P(\gamma) = 1/\gamma$. This distribution is scale invariant and follows given only that $\gamma$ is positive and a priori of unknown magnitude. We can now average over the free energy to obtain the posterior distribution of the error correction factor $\gamma$, or average over the error correction factor to obtain the posterior free energy estimate corrected for instrument error:

$$P(\Delta F_\Lambda \mid \mathbf{W}, \Lambda) = \frac{1}{C'} \int_{0}^{+\infty} \frac{1}{\gamma} \prod_k f\left(\tfrac{1}{\gamma}\left(\beta W_k - \beta \Delta F_\Lambda + M_\Lambda\right)\right) d\gamma, \tag{16}$$

where $C'$ is a normalization constant. Note that instrument error, and thus the distribution of $\gamma$, will vary with the protocol. One could construct a complex hierarchical prior for the experimental error factors that would feed information about the typical scale of the errors from one protocol to the next. In this work, we find it sufficient to estimate $\gamma$ independently for each protocol, and obtain a final posterior:

$$P(\mathbf{F} \mid \mathbf{W}, \boldsymbol{\Lambda}) = \prod_{\Lambda} P(\Delta F_\Lambda \mid \mathbf{W}, \Lambda). \tag{17}$$

Here, as in Eq. (11), $\mathbf{F} = \{F_1, F_2, F_3, \ldots\}$ are the free energies of the initial and final thermodynamic states.

TABLE I: Columns: $N_U$, $N_R$, $\Delta F$ (uncorrected), $\Delta F$ (corrected), $\gamma$, with one row per pulling rate. [Numerical entries not recoverable from the source.] $N_U$ and $N_R$: number of unfolding and refolding work measurements at each pulling rate, respectively. $\Delta F$: posterior mean free energy estimate with 95% confidence intervals, both corrected and uncorrected for measurement error. $\gamma$: posterior mean estimate of the noise correction factor, with 95% confidence intervals.

Another potential source of errors arises from unintended variations of the experimental procedure from one measurement to the next. For example, we may intend to forcibly unfold an RNA hairpin in a particular time, but each experimental run may be slightly faster or slower than another. Instead of an experiment being described by a single protocol, each measurement is made with a similar, but slightly different procedure (e.g., due to hysteresis effects in the mechanical response of the actuators). However, if a protocol variation has the same probability both forward and reverse, then the factor $M_\Lambda$ [Eq. (12)] does not change. Consequently, if the variations in protocol are statistically the same for the conjugate forward and reverse protocol pairs, then that variation has no effect on the free energy estimate.

IV. APPLICATION AND DISCUSSION
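Before turning to the data, it may help to see how the noise-corrected estimate of Eq. (16) can be approximated numerically. The Python sketch below is not the authors' implementation. It assumes the rescaled logistic likelihood $f(y/\gamma)$ with the scale-invariant prior $1/\gamma$, works in units of $k_B T$ ($\beta = 1$), and truncates the $\gamma$ integral to a finite, hypothetical range because a numerical grid cannot cover $(0, \infty)$. The work values are made up for illustration and are mirror symmetric about $10\,k_B T$, so the posterior mean is known in advance.

```python
import math


def log_logistic(x):
    """Numerically stable ln f(x), with f(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def noise_corrected_posterior(forward_works, reverse_works, df_grid, gamma_grid):
    """Posterior of beta*Delta F on `df_grid`, summed over a log-spaced gamma grid."""
    m = math.log((len(forward_works) + 1) / (len(reverse_works) + 1))  # Eq. (12)
    log_joint = []
    for df in df_grid:
        row = []
        for gamma in gamma_grid:
            total = sum(log_logistic((w - df + m) / gamma) for w in forward_works)
            total += sum(log_logistic((w + df - m) / gamma) for w in reverse_works)
            row.append(total)
        log_joint.append(row)
    peak = max(max(row) for row in log_joint)
    # On a grid uniform in ln(gamma) the 1/gamma prior is a constant weight,
    # so the gamma integral reduces to a plain sum over the grid points.
    weights = [sum(math.exp(v - peak) for v in row) for row in log_joint]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical works (k_B T). Reverse works are measured along the reverse
# protocol; here they mirror the forward works about Delta F = 10 k_B T.
forward_works = [8.5, 9.5, 10.5, 11.0, 12.0, 13.0, 14.0]
reverse_works = [w - 20.0 for w in forward_works]
df_grid = [0.05 * i for i in range(401)]                  # 0 to 20 k_B T
gamma_grid = [0.2 * 25.0 ** (j / 40) for j in range(41)]  # 0.2 to 5, log-spaced
posterior = noise_corrected_posterior(forward_works, reverse_works, df_grid, gamma_grid)
posterior_mean = sum(x * p for x, p in zip(df_grid, posterior))
print(f"posterior mean of beta*Delta F = {posterior_mean:.3f}")
```

Summing the joint weights over the free energy grid instead of over $\gamma$ would give the corresponding posterior for the error correction factor.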

Figure 5 shows the result of applying the Bayesian free energy estimate to data from the single-molecule RNA pulling experiments reported by Collin et al., both with and without noise correction. This data set is particularly useful to illustrate the previous analysis, since it represents three distinct protocols; i.e., the same RNA hairpin is unfolded at three different rates: slow, medium, and fast. The free energy change is the same in each case; we can see that this is qualitatively true by noting that the forward-reverse work histograms all cross at roughly the same value of the work. The experimental noise is expected to accumulate during a single experiment, and so we expect the data from the fastest pulling rate to be contaminated with the least measurement error. This is indeed what the Bayesian error analysis finds: $\gamma$ approaches 1 as the pulling rate increases.

Qualitatively, the effect of instrument noise is to broaden both the forward and reverse work distributions. This broadening tends not to significantly change the crossing point, but it does increase the overlap between the conjugate distributions. Therefore, ironically, the instrument error does not greatly change the free energy estimate, but it does significantly (and erroneously) reduce the calculated error bars. Fortunately, the noise invalidates the fluctuation theorem, and the magnitude of that violation allows us to estimate the magnitude of the instrument errors and to extract noise-corrected free energy estimates with meaningful error bounds.

A useful feature of this error analysis is that we can use the parameter $\gamma$ as a measure of how well the experiments have confirmed the work fluctuation relation [Eq. (3)]. For the fastest pulling, highest quality data, we find that $\gamma = 1 \pm 0.14$; in other words, the fluctuation relation is confirmed to within 14% at the 95% confidence limit. Although more accurate constraints can be obtained by performing experiments on systems with simple potentials, this is the best available experimental data for irreversibly switching a complex system. We can also use the interrelation between the noise and the correction factor ($\gamma = \sqrt{1 + \pi\beta^2\sigma^2/8}$) […] $\approx$ […] $k_B T$ accuracy, which is well within the limits of modern optical tweezer instruments.

The quantitative effect of the noise corrections to $\Delta F$ can be seen in Fig. 5c and Table I. The noise correction makes a substantial difference to the free energy confidence interval for the slowest data, but very little difference to the posterior mean free energy or the error bounds for the faster data. Note that the free energy considered in this analysis includes unfolding the RNA hairpin and stretching the DNA/RNA handles; deconvoluting the contributions of the handles introduces additional uncertainty not considered here. Having applied the instrument noise correction, we can safely combine the posterior free energy estimates from the three different protocols to obtain a combined estimate of $\Delta F = 110.[…] \pm […]\,k_B T$. This result is a substantial improvement over the best single-protocol maximum likelihood estimate, $\Delta F = 110.[…] \pm […]\,k_B T$, extracted from the same data.

In summary, we have presented a Bayesian formalism for estimating free-energy changes from non-equilibrium work measurements. The formalism compensates for instrument noise and combines results from multiple experimental protocols. The method is widely applicable and could be used in the analysis of single-molecule experimental data from optical tweezers, AFM, or magnetic tweezers. Together with advances in single-molecule traps and use of multiple experimental setups (e.g., changing bead sizes, trap power, or the length of the handles), it will aid in extending the scope of single-molecule experiments.

Acknowledgments

This research was supported by the U.S. Dept. of Energy, under contract DE-AC02-05CH11231. The research of F.R. was supported by the Spanish and Catalan research councils FIS2004-3454, NAN2004-09348, and SGR05-00688. The research of C.B. was supported by NIH Grant GM 32543 and U.S. Dept. of Energy grant AC0376Sf00098. The research of M.K. at Harvard was supported in part by a grant from the NIH.

FIG. 6: The approximation of the sigmoidal function $g(x; \alpha, \sigma)$ [Eq. (18)] by the logistic function $f(x; \gamma) = 1/(1 + \exp(-x/\gamma))$, where $\gamma = \sqrt{1 + \pi\sigma^2/8}$ […] $\sqrt{8/\pi}$.

Appendix: Approximate convolution of a logistic function with a Gaussian distribution

We are interested in the function

$$g(x; \alpha, \sigma) = \int_{-\infty}^{+\infty} f(x + \epsilon; \alpha)\, N(\epsilon; 0, \sigma)\, d\epsilon, \tag{18}$$

the convolution of a logistic (or Fermi) function

$$f(x; \alpha) = \frac{1}{1 + e^{-x/\alpha}} = \frac{1}{2} + \frac{1}{2} \tanh\frac{x}{2\alpha}, \tag{19}$$

with a Gaussian (or normal) distribution with zero mean and standard deviation $\sigma$:

$$N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \tag{20}$$

The function $g(x; \alpha, \sigma)$ does not have a simple, closed form. However, as is illustrated in the figure, it can be reasonably approximated by a reparameterized logistic function:

$$g(x; \alpha, \sigma) \approx f(x; \gamma), \tag{21}$$

where $\gamma$ is a function of $\alpha$ and $\sigma$. We fix $\gamma$ by requiring equality of the derivative at the origin, since, for our purposes, it is more important to minimize the errors around the origin than elsewhere. The value of $g(x; \alpha, \sigma)$ at the origin is $1/2$, the same as $f(0; \gamma)$. Note that

$$\left.\frac{d}{dx} f(x; \gamma)\right|_{x=0} = \left.\frac{1}{2\gamma + 2\gamma \cosh(x/\gamma)}\right|_{x=0} = \frac{1}{4\gamma}, \tag{22}$$

and therefore

$$\gamma^{-1} = 4 \left.\frac{d}{dx} g(x; \alpha, \sigma)\right|_{x=0} = 4 \int_{-\infty}^{+\infty} \left( \left.\frac{d}{dx} f(x + \epsilon; \alpha)\right|_{x=0} \right) N(\epsilon; 0, \sigma)\, d\epsilon = 4 \int_{-\infty}^{+\infty} \left( \frac{1}{2\alpha + 2\alpha \cosh(\epsilon/\alpha)} \right) N(\epsilon; 0, \sigma)\, d\epsilon. \tag{23}$$

The expression inside the bracket is a logistic distribution, which is closely approximated by the Gaussian distribution $N(\epsilon; 0, \alpha\sqrt{8/\pi})$ (see Fig. 7). These parameters ensure that the two distributions agree exactly at the origin. Therefore, our problem reduces to a straightforward Gaussian integral:

$$\gamma^{-1} \approx 4 \int_{-\infty}^{+\infty} N\!\left(\epsilon; 0, \alpha\sqrt{8/\pi}\right) N(\epsilon; 0, \sigma)\, d\epsilon, \qquad \gamma = \sqrt{\alpha^2 + \frac{\pi\sigma^2}{8}}. \tag{24}$$

For $\alpha = -1/\beta$ we recover the case of white noise discussed in the main text.

† Electronic address: [email protected]

R. Clausius, Annalen der Physik und Chemie, 352 (1865).
D. Collin, F. Ritort, C. Jarzynski, S. B. Smith, I. Tinoco Jr., and C. Bustamante, Nature, 231 (2005).
G. Hummer and A. Szabo, Proc. Natl. Acad. Sci. USA, 3658 (2001).
C. Bustamante, J. Liphardt, and F. Ritort, Phys. Today, 43 (2005).
G. Hummer and A. Szabo, Acc. Chem. Res., 504 (2005).
A. Dhar, Phys. Rev. E, 036126 (2005).
A. Imparato and L. Peliti, Europhys. Lett., 643 (2005).
F. Ritort, Pramana-J. Phys., 1135 (2005).
F. Ritort, J. Phys.: Condens. Matter, R531 (2006).
O. Braun, A. Hanke, and U. Seifert, Phys. Rev. Lett., 158105 (2004).
O. Braun and U. Seifert, Europhys. Lett., 746 (2004).
C. Jarzynski, Phys. Rev. E, 5018 (1997).
C. Jarzynski, Phys. Rev. Lett., 2690 (1997).
C. Jarzynski, Acta. Phys. Pol. B, 1609 (1998).
G. E. Crooks, J. Stat. Phys., 1481 (1998).
G. E. Crooks, Phys. Rev. E, 2721 (1999).
G. E. Crooks, Phys. Rev. E, 2361 (2000).
J. Liphardt, S. Dumont, S. B. Smith, I. Tinoco Jr., and C. Bustamante, Science, 1832 (2002).
S. K. Blau, Phys. Today, 19 (2002).
D. J. Evans and D. J. Searles, Adv. Phys., 1529 (2002).
D. J. Evans, Mol. Phys., 1551 (2003).
J. C. Reid, E. M. Sevick, and D. J. Evans, Europhys. Lett., 726 (2005).
C. Jarzynski, J. Stat. Mech.: Theor. Exp. p. P09005 (2004).
D. A. Hendrix and C. Jarzynski, J. Chem. Phys., 5974 (2001).
G. Hummer, J. Chem. Phys., 7330 (2001).
G. Hummer, Mol. Simul., 81 (2002).
D. M. Zuckerman and T. B. Woolf, Chem. Phys. Lett., 445 (2002).
M. R. Shirts, E. Bair, G. Hooker, and V. S. Pande, Phys. Rev. Lett., 140601 (2003).
S. Park, F. Khalili-Araghi, E. Tajkhorshid, and K. Schulten, J. Chem. Phys., 3559 (2003).
J. Gore, F. Ritort, and C. Bustamante, Proc. Natl. Acad. Sci. USA, 12564 (2003).
S. X. Sun, J. Chem. Phys., 5769 (2003).
D. Wu and D. A. Kofke, J. Chem. Phys., 8742 (2004).
F. M. Ytreberg and D. M. Zuckerman, J. Comput. Chem., 1749 (2004).
D. Wu and D. A. Kofke, J. Chem. Phys., 204104 (2005).
M. de Koning, J. Chem. Phys., 104106 (2005).
W. Lechner, H. Oberhofer, C. Dellago, and P. L. Geissler, J. Chem. Phys., 044113 (2006).
C. Jarzynski, Phys. Rev. E, 046105 (2006).
P. Maragakis, M. Spichty, and M. Karplus, Phys. Rev. Lett., 100602 (2006).
F. Douarche, S. Ciliberto, A. Petrosyan, and I. Rabbiosi, Europhys. Lett., 593 (2005).
A. Imparato and L. Peliti, Phys. Rev. E, 046114 (2005).
D. A. Kofke, Mol. Phys., 3701 (2006).
R. C. Lua and A. Y. Grosberg, J. Phys. Chem. B, 6805 (2005).
G. E. Crooks and C. Jarzynski, Phys. Rev. E, 021116 (2007).
D. D. L. Minh, Phys. Rev. E, 061120 (2006).
C. H. Bennett, J. Comput. Phys., 245 (1976).
M. R. Shirts and V. S. Pande, J. Chem. Phys., 144107 (2005).
J. A. Anderson, Biometrika, 19 (1972).
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis (Chapman & Hall/CRC, New York, 2004), 2nd ed.
R. E. Kass and L. Wasserman, J. Amer. Statist. Assoc., 1343 (1996).
E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, Cambridge, 2003).
R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis (Cambridge University Press, Cambridge, 1998).
H. Nanda, N. Lu, and T. B. Woolf, J. Chem. Phys., 134110 (2005).
D. M. Carberry, J. C. Reid, G. M. Wang, E. M. Sevick, D. J. Searles, and D. J. Evans, Phys. Rev. Lett., 140601 (2004).
S. Schuler, T. Speck, C. Tietz, J. Wrachtrup, and U. Seifert, Phys. Rev. Lett., 180602 (2005).
G. M. Wang, E. M. Sevick, E. Mittag, D. J. Searles, and D. J. Evans, Phys. Rev. Lett., 050601 (2002).
G. M. Wang, J. C. Reid, D. M. Carberry, D. R. M. Williams, E. M. Sevick, and D. J. Evans, Phys. Rev. E, 046142 (2005).
G. M. Wang, D. M. Carberry, J. C. Reid, E. M. Sevick, and D. J. Evans, J. Phys.: Condens. Matter, S3239 (2005).
E. H. Trepagnier, C. Jarzynski, F. Ritort, G. E. Crooks, C. J. Bustamante, and J. Liphardt, Proc. Natl. Acad. Sci. USA 101