Featured Research

Data Analysis, Statistics and Probability

Functional Decomposition: A new method for search and limit setting

In the analysis of High-Energy Physics data, it is frequently desired to separate resonant signals from a smooth, non-resonant background. This paper introduces a new technique - functional decomposition (FD) - to accomplish this task. It is universal and readily able to describe often-problematic effects such as sculpting and trigger turn-ons. Functional decomposition models a dataset as a truncated series expansion in a complete set of orthonormal basis functions, using a process analogous to Fourier analysis. A new family of orthonormal functions is presented, which has been expressly designed to accomplish this in a succinct way. A consistent signal extraction methodology based on linear signal estimators is also detailed, as is an automated method for selecting the method's (few) hyperparameters and preventing over-fitting. The full collection of algorithms described in this paper has been implemented in an easy-to-use software package, which is also briefly described.
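
As a rough illustration of modelling a dataset as a truncated series in orthonormal functions, the sketch below estimates each coefficient as the sample mean of the corresponding basis function. It assumes a plain cosine basis on [0, 1] as a stand-in; this is not the new function family introduced in the paper, and the function names are hypothetical.

```python
import math

def cosine_basis(n, x):
    """n-th orthonormal cosine function on [0, 1] (assumed stand-in basis)."""
    if n == 0:
        return 1.0
    return math.sqrt(2.0) * math.cos(n * math.pi * x)

def series_coefficients(samples, n_terms):
    """Estimate c_n = E[phi_n(x)] by the sample mean of each basis function."""
    count = len(samples)
    return [sum(cosine_basis(n, x) for x in samples) / count
            for n in range(n_terms)]

def series_density(coefficients, x):
    """Evaluate the truncated expansion sum_n c_n * phi_n(x)."""
    return sum(c * cosine_basis(n, x) for n, c in enumerate(coefficients))

# Evenly spread points stand in for a flat, background-only dataset.
samples = [(i + 0.5) / 200 for i in range(200)]
coefficients = series_coefficients(samples, n_terms=6)
```

For this flat toy dataset only the constant term survives, so the truncated series reproduces a uniform density. A resonant signal would show up as structure in the higher-order coefficients.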

Fusion of laser diffraction and chord length distribution data for estimation of particle size distribution using multi-objective optimisation

The in situ measurement of the particle size distribution (PSD) of a suspension of particles presents huge challenges. Various effects from the process could introduce noise to the data from which the PSD is estimated. This in turn could lead to the occurrence of artificial peaks in the estimated PSD. Limitations in the models used in the PSD estimation could also lead to the occurrence of these artificial peaks. This could pose a significant challenge to in situ monitoring of particulate processes, as there will be no independent estimate of the PSD to allow a discrimination of the artificial peaks to be carried out. Here, we present an algorithm which is capable of discriminating between artificial and true peaks in PSD estimates based on fusion of multiple data streams. In this case, chord length distribution and laser diffraction data have been used. The data fusion is done by means of multi-objective optimisation using the weighted sum approach. The algorithm is applied to two different particle suspensions. The estimated PSDs from the algorithm are compared with offline estimates of PSD from the Malvern Mastersizer and Morphologi G3. The results show that the algorithm is capable of eliminating an artificial peak in a PSD estimate when this artificial peak is sufficiently displaced from the true peak. However, when the artificial peak is too close to the true peak, it is only suppressed but not completely eliminated.
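
The weighted sum approach mentioned above can be sketched in a few lines: each data stream contributes its own misfit function, and the fused estimate minimises their weighted sum. The two misfit functions, the candidate grid and the equal weights below are hypothetical toy choices, not the models used in the paper.

```python
def weighted_sum_minimise(candidates, objectives, weights):
    """Return the candidate minimising sum_k weights[k] * objectives[k](x)."""
    def combined(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return min(candidates, key=combined)

def misfit_stream_a(size):
    # Hypothetical misfit: true minimum at 50, slightly deeper artificial minimum at 120.
    return min((size - 50) ** 2, (size - 120) ** 2 - 5)

def misfit_stream_b(size):
    # Hypothetical misfit from a second data stream: single minimum at 52.
    return (size - 52) ** 2

sizes = range(0, 201)
alone = weighted_sum_minimise(sizes, [misfit_stream_a], [1.0])
fused = weighted_sum_minimise(sizes, [misfit_stream_a, misfit_stream_b], [0.5, 0.5])
```

In this toy setup the first stream on its own selects the artificial minimum, while the fused objective selects a value between the two true minima. In practice the objectives would need comparable scales before the weights are meaningful.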

GPS Fit Method for Paths of Non Drunken Sailors and its Connection to Entropy

Estimating the elevation gain (the metres climbed) of a cyclist from noisy GPS data is a challenging problem. In this article a method is proposed that assumes that a person locally takes the shortest path. This results in an algorithm that does not need smoothing parameters. Moreover, it turns out that this assumption allows one to find a similarity between entropy and likelihood, which leads to the introduction of an entropic force.

Gaining insight from large data volumes with ease

Efficient handling of large data volumes has become a necessity in today's world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends, which can be transformed into economic incentives (profits, cost reduction, and optimization of data workflows and pipelines). In this paper, we discuss how modern technologies are transforming well-established patterns in HEP communities. New data insight can be achieved by embracing Big Data tools for a variety of use cases, from analytics and monitoring to training Machine Learning models on a terabyte scale. We provide concrete examples within the context of the CMS experiment where Big Data tools are already playing or would play a significant role in daily operations.

Galerkin Approximation of Dynamical Quantities using Trajectory Data

Understanding chemical mechanisms requires estimating dynamical statistics such as expected hitting times, reaction rates, and committors. Here, we present a general framework for calculating these dynamical quantities by approximating boundary value problems using dynamical operators with a Galerkin expansion. A specific choice of basis set in the expansion corresponds to estimation of dynamical quantities using a Markov state model. More generally, the boundary conditions impose restrictions on the choice of basis sets. We demonstrate how an alternative basis can be constructed using ideas from diffusion maps. In our numerical experiments, this basis gives results of accuracy comparable to or better than Markov state models. Additionally, we show that delay embedding can reduce the information lost when projecting the system's dynamics for model construction; this improves estimates of dynamical statistics considerably over the standard practice of increasing the lag time.
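
To make the Markov state model special case concrete, the sketch below counts transitions in a discretised trajectory and then solves the committor boundary value problem, q_i = sum_j T_ij q_j outside the reactant and product sets, with q = 0 on the reactant set and q = 1 on the product set. It uses a simple fixed-point iteration and a toy five-state random walk; the function names and the toy chain are assumptions made for illustration only.

```python
def count_transition_matrix(trajectory, n_states, lag=1):
    """Row-normalised transition counts between states separated by `lag` steps."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for start, end in zip(trajectory[:-lag], trajectory[lag:]):
        counts[start][end] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix

def committor(matrix, reactant, product, tol=1e-12, max_iter=100000):
    """Iterate q_i = sum_j T_ij q_j on states outside the reactant and product sets."""
    n = len(matrix)
    q = [1.0 if i in product else 0.0 for i in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            if i in reactant or i in product:
                continue  # boundary values stay fixed at 0 or 1
            new = sum(matrix[i][j] * q[j] for j in range(n))
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q

# Toy symmetric random walk on five states; the end states are the two sets.
walk = [[0.0] * 5 for _ in range(5)]
walk[0][0] = walk[4][4] = 1.0
for i in (1, 2, 3):
    walk[i][i - 1] = walk[i][i + 1] = 0.5
q_walk = committor(walk, reactant={0}, product={4})
```

For the symmetric walk the committor rises linearly from the reactant state to the product state, which gives a quick sanity check on the solver.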

Gamma-ray intensities in multi-gated spectra

The level structure of nuclei offers a large amount and variety of information to improve our knowledge of the strong interaction and of mesoscopic quantum systems. Gamma spectroscopy is a powerful tool for such studies: modern gamma multi-detectors offer ever-increasing sensitivity and efficiency, extending our ability to observe and characterize abundant nuclear states. For instance, the high-spin part of level schemes often reflects intriguing nuclear shape phenomena: this behaviour is unveiled by high-fold experimental data analysed through multi-coincidence spectra, in which long deexcitation cascades become observable. Determining the intensity of newly discovered transitions is important to characterize the nuclear structure and formation mechanism of the emitting levels. However, it is not trivial to relate the apparent intensity observed in multi-gated spectra to the actual transition intensity. In this work, we introduce the basis of a formalism related to graph theory: we have obtained analytic expressions from which data-analysis methods can eventually be derived to recover this link in a rigorous way.

Gaussian Process Accelerated Feldman-Cousins Approach for Physical Parameter Inference

The unified approach of Feldman and Cousins allows for exact statistical inference of small signals that commonly arise in high energy physics. It has gained widespread use, for instance, in measurements of neutrino oscillation parameters in long-baseline experiments. However, the approach relies on the Neyman construction of the classical confidence interval and is computationally intensive as it is typically done in a grid-based fashion over the entire parameter space. In this letter, we propose an efficient algorithm for the Feldman-Cousins approach using Gaussian processes to construct confidence intervals iteratively. We show that in the neutrino oscillation context, one can obtain confidence intervals 5 times faster in one dimension and 10 times faster in two dimensions, while maintaining an accuracy above 99.5%.
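
For orientation, the grid-based baseline that makes the approach expensive can be sketched for the textbook case of a Poisson count with known background. For every signal strength on the grid, counts are ranked by likelihood ratio and accepted until the requested coverage is reached. This is a minimal sketch of the standard construction, not the Gaussian-process-accelerated algorithm of the paper; the grid range, step size and function names are assumptions.

```python
import math

def poisson_pmf(n, mean):
    """Probability of observing n counts for a Poisson mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

def accepted_counts(mu, background, cl=0.90, n_max=60):
    """Acceptance region for signal mu, built by likelihood-ratio ordering."""
    ranked = []
    for n in range(n_max + 1):
        mu_best = max(0.0, n - background)  # best physically allowed signal
        ratio = (poisson_pmf(n, mu + background)
                 / poisson_pmf(n, mu_best + background))
        ranked.append((ratio, n))
    ranked.sort(reverse=True)
    accepted, coverage = set(), 0.0
    for _, n in ranked:
        accepted.add(n)
        coverage += poisson_pmf(n, mu + background)
        if coverage >= cl:
            break
    return accepted

def feldman_cousins_interval(n_observed, background, cl=0.90,
                             mu_max=15.0, step=0.01):
    """Scan a grid in mu and keep every value whose region contains n_observed."""
    n_steps = int(round(mu_max / step))
    grid = [k * step for k in range(n_steps + 1)]
    inside = [mu for mu in grid
              if n_observed in accepted_counts(mu, background, cl)]
    return min(inside), max(inside)

lower, upper = feldman_cousins_interval(n_observed=0, background=0.0)
```

With zero observed events and zero background, the scan gives a 90% interval from zero up to roughly 2.4, in line with the commonly quoted value. The cost grows quickly once the grid spans several parameters, which is the motivation for accelerating the construction.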

Gaussian processes for data fulfilling linear differential equations

A method to reconstruct fields, source strengths and physical parameters based on Gaussian process regression is presented for the case where data are known to fulfil a given linear differential equation with localized sources. The approach is applicable to a wide range of data from physical measurements and numerical simulations. It is based on the well-known invariance of the Gaussian under linear operators, in particular differentiation. Instead of using a generic covariance function to represent data from an unknown field, the space of possible covariance functions is restricted to allow only Gaussian random fields that fulfil the homogeneous differential equation. The resulting tailored kernel functions lead to more reliable regression than a generic kernel and make some hyperparameters directly interpretable. For differential equations representing laws of physics, such a choice limits realizations of random fields to physically possible solutions. Source terms are added by superposition and their strength is estimated in a probabilistic fashion, together with possibly unknown hyperparameters with physical meaning in the differential operator.

General Resolution Enhancement Method in Atomic Force Microscopy (AFM) Using Deep Learning

This paper develops a resolution enhancement method for post-processing images from Atomic Force Microscopy (AFM). The method applies deep learning neural networks to AFM topography measurements. In this study, a very deep convolutional neural network is developed to derive a high-resolution topography image from a low-resolution one. AFM images measured on various materials are tested. The derived high-resolution AFM images are comparable with experimentally measured high-resolution images taken at the same locations. The results suggest that this method can be developed into a general post-processing method for AFM image analysis.

Generalized asymptotic formulae for estimating statistical significance in high energy physics analyses

Within the framework of likelihood-based statistical tests for high energy physics measurements, we derive generalized expressions for estimating the statistical significance of discovery using the asymptotic approximations of Wilks and Wald for a variety of measurement models. These models include arbitrary numbers of signal regions, control regions, and Gaussian constraints. We extend our expressions to use the representative or "Asimov" dataset proposed by Cowan et al. such that they are made Monte Carlo-free. While many of the generalized expressions are complicated and often involve solving systems of coupled, multivariate equations, we show these expressions reduce to closed-form results under simplifying assumptions. We also validate the predicted significance using Monte Carlo toys in select cases.
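
As a point of reference, the simplest special case of such asymptotic formulae is a single counting region with expected signal s and exactly known background b, for which the Asimov discovery significance from Cowan et al. is Z = sqrt(2((s + b) ln(1 + s/b) - s)). The sketch below evaluates only this special case, not the generalized expressions derived in the paper; the function name is hypothetical.

```python
import math

def asimov_significance(signal, background):
    """Asimov discovery significance for one counting region, background known exactly."""
    if background <= 0 or signal < 0:
        raise ValueError("requires background > 0 and signal >= 0")
    log_term = (signal + background) * math.log1p(signal / background)
    return math.sqrt(2.0 * (log_term - signal))

# For signal much smaller than background this approaches signal / sqrt(background).
z_small_signal = asimov_significance(10.0, 10000.0)
```

No Monte Carlo toys are needed to evaluate the expression, which is the sense in which results based on the Asimov dataset are Monte Carlo-free.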
