Featured Researches

Data Analysis Statistics And Probability

Easy computation of the Bayes Factor to fully quantify Occam's razor

The Bayes factor is the gold-standard figure of merit for comparing fits of models to data, for hypothesis selection and for parameter estimation. However, it is little used because it is computationally very intensive. Here it is shown how Bayes factors can be calculated accurately and easily, so that any least-squares or maximum-likelihood fit may be routinely followed by the calculation of Bayes factors to guide the best choice of model and hence the best estimates of the parameters. Approximations to the Bayes factor, such as the Bayesian Information Criterion (BIC), are increasingly used. Occam's razor expresses a primary intuition, that parameters should not be multiplied unnecessarily, and it is this intuition that the BIC quantifies. The Bayes factor quantifies two further intuitions: models with physically meaningful parameters are preferable to models with physically meaningless parameters, and models that could fail to fit the data, yet do fit, are preferable to models that span the data space and are therefore guaranteed to fit. The outcomes of using Bayes factors are often very different from those of traditional statistical tests and from the BIC. Three examples are given. In two of them the easy calculation of the Bayes factor is exact; the third illustrates the rare conditions under which it has some error and shows how to diagnose and correct it.
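As a minimal, self-contained illustration of the approximation the abstract contrasts against, the sketch below (pure Python; the data, seed and degrees are all invented for illustration) compares a straight-line fit with a quartic fit of the same synthetic linear data using the BIC, n ln(RSS/n) + k ln n. The quartic always achieves a lower residual sum of squares, but the complexity penalty makes the simpler model win:

```python
import math
import random

rng = random.Random(42)
n = 100
xs = [0.1 * i for i in range(n)]                         # x = 0.0 .. 9.9
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]   # truly linear data

def polyfit_rss(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations;
    returns the residual sum of squares of the best fit."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(x ** i * y for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):                                 # Gaussian elimination
        piv = max(range(col, k), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, k):
            f = ata[r][col] / ata[col][col]
            for c in range(col, k):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):                       # back substitution
        coef[r] = (aty[r] - sum(ata[r][c] * coef[c]
                                for c in range(r + 1, k))) / ata[r][r]
    return sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))

def bic(rss, n, n_params):
    """BIC for Gaussian errors with the noise variance profiled out."""
    return n * math.log(rss / n) + n_params * math.log(n)

bic_linear = bic(polyfit_rss(xs, ys, 1), n, 2)
bic_quartic = bic(polyfit_rss(xs, ys, 4), n, 5)          # lower RSS, bigger penalty
```

The full Bayes factor discussed in the abstract would additionally account for the prior ranges of the parameters; the BIC captures only the parameter-count part of Occam's razor.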

Effect of centrality bin width corrections on two-particle number and transverse momentum differential correlation functions

Two-particle number and transverse momentum differential correlation functions are powerful tools for unveiling the detailed dynamics and particle production mechanisms at play in relativistic heavy-ion collisions. Measurements of the transverse momentum correlators P_2 and G_2, in particular, provide information not readily accessible with the better known number correlation function R_2. However, the R_2 and G_2 correlators are somewhat sensitive to the details of the experimental procedure used to measure them. In particular, they exhibit a dependence on the collision-centrality bin width, which may have a rather detrimental impact on their physical interpretation. A technique to correct these correlators for collision-centrality bin-width averaging is presented. The technique is based on the hypothesis that the shapes of the single- and pair-probability densities vary more slowly with collision centrality than the corresponding integrated yields. The technique is tested with Pb-Pb simulations based on the HIJING and ultrarelativistic quantum molecular dynamics models and is shown to achieve a precision better than 1% for particles in the kinematic range 0.2 ≤ p_T ≤ 2.0 GeV/c.
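The paper's correction is specific to the P_2 and G_2 correlators, but the basic problem of centrality bin-width averaging can be illustrated with the standard event-weighted sub-bin average applied to a toy fluctuation observable (all multiplicities and bin counts below are invented). Pooling sub-bins with different mean multiplicities inflates the scaled variance of a pure Poisson source, while the sub-bin-weighted average recovers the Poisson value of 1:

```python
import math
import random

rng = random.Random(9)

def poisson(lam, rng):
    """Knuth's Poisson sampler (the stdlib has none)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= thresh:
            return k
        k += 1

# 10 narrow centrality sub-bins whose mean multiplicity rises from 10 to 28;
# within each sub-bin the multiplicity is pure Poisson (scaled variance = 1).
sub_bins = [[poisson(10.0 + 2.0 * b, rng) for _ in range(2000)] for b in range(10)]

def scaled_variance(sample):
    m = sum(sample) / len(sample)
    var = sum((x - m) ** 2 for x in sample) / len(sample)
    return var / m

# Naive wide-bin analysis: pooling sub-bins mixes different mean yields and
# inflates the apparent fluctuations.
wide = [x for sub in sub_bins for x in sub]
omega_naive = scaled_variance(wide)

# Bin-width correction: compute the observable in each narrow sub-bin first,
# then take the event-weighted average.
omega_corrected = (sum(len(sub) * scaled_variance(sub) for sub in sub_bins)
                   / sum(len(sub) for sub in sub_bins))
```

For a correlator such as G_2 the correction additionally relies on the slow centrality variation of the probability-density shapes, as described above; this toy shows only the bin-width-averaging mechanism itself.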

Effects of hidden nodes on the reconstruction of bidirectional networks

Much research effort has been devoted to developing methods for reconstructing the links of a network from the dynamics of its nodes. Many current methods require that the dynamics of all the nodes be measured. In real-world problems, however, it is common that some nodes of the network of interest are unknown, or that measurements of some nodes are unavailable. Such nodes, either unknown or unmeasured, are called hidden nodes. In this paper, we derive analytical results that explain the effects of hidden nodes on the reconstruction of bidirectional networks. These theoretical results and their implications are verified by numerical studies.
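A minimal sketch of why hidden nodes distort reconstruction, assuming a stationary Gaussian (linear) network in which the inverse covariance of the observed nodes is the natural reconstruction target (the matrices are invented for illustration): marginalizing over a hidden middle node induces a spurious effective link between its neighbours, with the strength the Schur complement predicts:

```python
def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# True interaction (precision) matrix of a 3-node chain 0 -- 1 -- 2:
# nodes 0 and 2 are NOT directly coupled (J[0][2] = 0).
J = [[1.5, -0.5, 0.0],
     [-0.5, 1.5, -0.5],
     [0.0, -0.5, 1.5]]
C = mat_inv(J)                   # covariance of the stationary Gaussian state

# Node 1 is hidden: only the marginal covariance of nodes 0 and 2 is observed.
C_obs = [[C[0][0], C[0][2]],
         [C[2][0], C[2][2]]]
J_eff = mat_inv(C_obs)           # "network" reconstructed from visible nodes

# Schur complement: J_eff offdiag = 0 - (-0.5)(1/1.5)(-0.5) = -1/6, a
# spurious link between the two neighbours of the hidden node.
spurious_link = J_eff[0][1]
```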

Efficiency correction for cumulants of multiplicity distributions based on track-by-track efficiency

We propose a simplified procedure for the experimental application of efficiency corrections to higher-order cumulants in heavy-ion collisions. By using the track-by-track efficiency, we can eliminate possible bias arising from average efficiencies calculated within an arbitrary binning of the phase space. Furthermore, corrected particle spectra are no longer necessary for estimating the average efficiency, and the computational cost of the bootstrap statistical errors can be significantly reduced.
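The key identity behind a track-by-track correction, weighting each measured track by the inverse of its own known efficiency, can be checked on the first moment with a toy model; the paper's procedure extends this to higher-order cumulants. All efficiencies and multiplicities below are invented:

```python
import math
import random

rng = random.Random(3)

def poisson(lam, rng):
    """Knuth's Poisson sampler (the stdlib has none)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= thresh:
            return k
        k += 1

n_events = 5000
truth, naive, corrected = [], [], []
for _ in range(n_events):
    n_true = poisson(20.0, rng)          # true multiplicity of the event
    n_meas, q1 = 0, 0.0
    for _ in range(n_true):
        eff = rng.uniform(0.6, 0.9)      # known track-by-track efficiency
        if rng.random() < eff:           # binomial detection of each track
            n_meas += 1
            q1 += 1.0 / eff              # weight the measured track by 1/eff
    truth.append(n_true)
    naive.append(n_meas)
    corrected.append(q1)

avg = lambda v: sum(v) / len(v)
mean_true, mean_naive, mean_corr = avg(truth), avg(naive), avg(corrected)
# The raw measured mean is biased low; the 1/eff-weighted mean is unbiased.
```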

Efficient Bayesian inversion for shape reconstruction of lithography masks

Background: Scatterometry is a fast, indirect and non-destructive optical method for quality control in the production of lithography masks. To solve the inverse problem in compliance with the upcoming need for improved accuracy, a computationally expensive forward model has to be defined which maps geometry parameters to diffracted light intensities. Aim: To quantify the uncertainties in the reconstruction of the geometry parameters, a fast-to-evaluate surrogate for the forward model has to be introduced. Approach: We use a non-intrusive polynomial-chaos-based approximation of the forward model, which increases speed and thus enables the exploration of the posterior through direct Bayesian inference. Additionally, this surrogate allows for a global sensitivity analysis at no additional computational overhead. Results: This approach yields information about the complete distribution of the geometry parameters of a silicon line grating, which in turn allows us to quantify the reconstruction uncertainties in the form of means, variances and higher-order moments of the parameters. Conclusion: The polynomial chaos surrogate allows us to quantify both parameter influences and reconstruction uncertainties. The approach is easy to use, since no adaptation of the expensive forward model is required.
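A toy version of the surrogate idea, with a cheap stand-in for the expensive forward model and simple Chebyshev collocation in place of a full polynomial chaos expansion (the function, node count, observation and noise level are all hypothetical): the surrogate is built from a handful of forward-model evaluations and then used for grid-based Bayesian inversion, reproducing the posterior mean of the exact model to high accuracy while calling the "expensive" model only at the collocation nodes:

```python
import math

def forward(theta):
    """Stand-in for an expensive forward model (e.g. a rigorous Maxwell solver)."""
    return math.sin(2.0 * theta) + 0.5 * theta

# Degree-10 polynomial surrogate from 11 forward-model runs at Chebyshev nodes,
# built with Newton divided differences.
m = 11
nodes = [math.cos((2 * j + 1) * math.pi / (2 * m)) for j in range(m)]
coef = [forward(x) for x in nodes]
for j in range(1, m):
    for i in range(m - 1, j - 1, -1):
        coef[i] = (coef[i] - coef[i - 1]) / (nodes[i] - nodes[i - j])

def surrogate(x):
    """Evaluate the Newton-form interpolant by Horner's scheme."""
    acc = coef[-1]
    for c, xi in zip(coef[-2::-1], nodes[-2::-1]):
        acc = acc * (x - xi) + c
    return acc

# Grid-based Bayesian inversion: uniform prior on [-1, 1], Gaussian likelihood.
y_obs, sigma = forward(0.3), 0.1
grid = [-1.0 + i / 200.0 for i in range(401)]

def posterior_mean(model):
    w = [math.exp(-0.5 * ((y_obs - model(t)) / sigma) ** 2) for t in grid]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)

mean_exact = posterior_mean(forward)      # 401 expensive calls
mean_surr = posterior_mean(surrogate)     # expensive model called only 11 times
```

A real polynomial chaos expansion would use an orthogonal-polynomial basis matched to the prior, which is also what makes the global sensitivity (Sobol) indices available at no extra cost.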

Efficient description of experimental effects in amplitude analyses

Amplitude analysis is a powerful technique for studying hadron decays. A significant complication in these analyses is the treatment of instrumental effects, such as background and selection-efficiency variations, across the multidimensional kinematic phase space. This paper reviews conventional methods for estimating efficiency and background distributions and outlines density-estimation methods based on Gaussian processes and artificial neural networks. Such techniques see widespread use elsewhere but have not yet gained popularity in amplitude analyses. Finally, novel applications of these models are proposed: estimating the background density in the signal region from the sidebands in multiple dimensions, and a more general method for model-assisted density estimation using artificial neural networks.
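The multidimensional, model-assisted estimators proposed in the paper are well beyond a snippet, but the underlying sideband idea can be shown in one dimension with a deliberately simple density model (a flat background; all masses, windows and yields are invented): the background density measured in the sidebands predicts the background yield under the signal peak:

```python
import random

rng = random.Random(11)

# Toy 1-D mass spectrum: flat background on [5.0, 5.6] GeV plus a narrow
# Gaussian signal peak at 5.28 GeV.
masses = [rng.uniform(5.0, 5.6) for _ in range(2000)]    # background events
masses += [rng.gauss(5.28, 0.01) for _ in range(500)]    # signal events

# Ground truth for validation: background events inside the signal window.
n_bkg_true = sum(1 for m in masses[:2000] if 5.24 <= m < 5.32)

# Sidebands well away from the peak (> 8 sigma), so they are signal-free.
lo_side = [m for m in masses if 5.00 <= m < 5.20]
hi_side = [m for m in masses if 5.36 <= m < 5.56]
side_width = 0.20 + 0.20                                 # total sideband width, GeV

bkg_density = (len(lo_side) + len(hi_side)) / side_width  # events per GeV
predicted_bkg = bkg_density * (5.32 - 5.24)               # interpolate under peak
```

The Gaussian-process and neural-network estimators discussed in the abstract replace this flat-density assumption with a flexible model that remains tractable in many dimensions.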

Emergent limits of an indirect measurement from phase transitions of inference

Measurements are inseparable from inference, and the estimation of signals of interest from other observations is called an indirect measurement. While a variety of measurement limits have been defined by the physical constraints of each setup, the fundamental limit of an indirect measurement is essentially the limit of inference. Here, we propose the concept of statistical limits on indirect measurement: the bounds of distinction between signals and noise, and between one signal and another. By developing the asymptotic theory of Bayesian regression, we investigate the phenomenology of a typical indirect measurement and demonstrate the existence of these limits. Based on the connection between inference and statistical physics, we also provide a unified interpretation in which these limits emerge from phase transitions of inference. Our results could pave the way for novel experimental design, enabling assessment of the required quality of observations, given an assumed ground truth, before the indirect measurement is actually performed.

End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC

This paper describes the construction of novel end-to-end image-based classifiers that directly leverage low-level simulated detector data to discriminate signal and background processes in pp collision events at the Large Hadron Collider at CERN. To better understand what end-to-end classifiers are capable of learning from the data and to address a number of associated challenges, we distinguish the decay of the standard model Higgs boson into two photons from its leading background sources using high-fidelity simulated CMS Open Data. We demonstrate the ability of end-to-end classifiers to learn from the angular distribution of the photons recorded as electromagnetic showers, their intrinsic shapes, and the energy of their constituent hits, even when the underlying particles are not fully resolved, delivering a clear advantage in such cases over purely kinematics-based classifiers.

Enhancing noise-induced switching times in systems with distributed delays

The paper addresses the problem of calculating noise-induced switching rates in systems with distributed-delay kernels and Gaussian noise. A general variational formulation for the switching rate is derived for an arbitrary distribution kernel, and the resulting equations of motion and boundary conditions represent the most probable, or optimal, path, which maximizes the probability of escape. Explicit analytical results for the switching rates at small mean time delays are obtained for the uniform and bi-modal (two-peak) distributions. They suggest that increasing the width of the distribution increases the switching times, even for larger mean time delays, for both examples of the distribution kernel, with the stronger increase found for the two-peak distribution. Analytical predictions are compared with direct numerical simulations and show excellent agreement between theory and numerical experiment.
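The paper treats distributed delays variationally; as a delay-free numerical baseline, the sketch below estimates noise-induced switching times in a double-well system by direct Euler-Maruyama simulation (the potential, noise strengths and thresholds are invented for illustration), showing the expected exponential slow-down of switching as the noise strength decreases:

```python
import math
import random

rng = random.Random(1)

def mean_switch_time(noise_d, n_escapes=20, dt=0.01):
    """Mean first-passage time from the left well (x = -1) of the double-well
    potential U(x) = x**4/4 - x**2/2, i.e. dx = (x - x**3) dt + sqrt(2 D) dW,
    over the barrier at x = 0 (an escape is counted once x reaches 0.5)."""
    total = 0.0
    amp = math.sqrt(2.0 * noise_d * dt)
    for _ in range(n_escapes):
        x, t = -1.0, 0.0
        while x < 0.5:
            x += (x - x ** 3) * dt + amp * rng.gauss(0.0, 1.0)
            t += dt
        total += t
    return total / n_escapes

t_strong = mean_switch_time(0.25)   # stronger noise -> faster switching
t_weak = mean_switch_time(0.10)     # weaker noise  -> exponentially slower
```

Adding a distributed-delay term to the drift is what the paper's variational formulation handles analytically; the optimal escape path it derives is exactly the trajectory that dominates simulations like this one in the weak-noise limit.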

Enhancing the accuracy of a data-driven reconstruction of bivariate jump-diffusion models with corrections for higher orders of the sampling interval

We evaluate the significance of a recently proposed bivariate jump-diffusion model for the data-driven characterization of interactions between complex dynamical systems. For various coupled and non-coupled jump-diffusion processes, we find that the inevitably finite sampling interval of time-series data degrades the reconstruction accuracy of the higher-order conditional moments required to reconstruct the underlying jump-diffusion equations. We derive correction terms for the conditional moments at higher orders of the sampling interval and demonstrate that they strongly enhance the accuracy of the data-driven reconstruction.
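For the linear-drift (Ornstein-Uhlenbeck) special case, the first-order finite-sampling-interval correction of the drift estimate can be written down and tested directly; the paper derives analogous corrections for general bivariate jump-diffusions. The process below is sampled exactly, so the finite sampling interval is the only source of bias in the naive estimate (all parameters are invented):

```python
import math
import random

rng = random.Random(5)

# Ornstein-Uhlenbeck process dx = -theta*x dt + sqrt(2 D) dW, sampled exactly
# at interval dt via its discrete transition x' = a x + s * N(0, 1).
theta, diff_d, dt, n = 1.0, 0.5, 0.2, 200_000
a = math.exp(-theta * dt)
s = math.sqrt(diff_d / theta * (1.0 - a * a))
xs = [0.0]
for _ in range(n):
    xs.append(a * xs[-1] + s * rng.gauss(0.0, 1.0))

# Naive drift slope from the first conditional moment E[dx | x] / dt:
# regression of x_{t+dt} on x_t gives a, hence theta_naive = (1 - a) / dt,
# which systematically underestimates theta at finite dt.
num = sum(x1 * x0 for x0, x1 in zip(xs, xs[1:]))
den = sum(x0 * x0 for x0 in xs[:-1])
theta_naive = (1.0 - num / den) / dt

# First-order correction in the sampling interval (linear-drift case):
# theta = theta_naive + (dt/2) * theta_naive**2 + O(dt**2).
theta_corr = theta_naive + 0.5 * dt * theta_naive ** 2
```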
