Featured Research

Data Analysis Statistics And Probability

Learning representations of irregular particle-detector geometry with distance-weighted graph networks

We explore the use of graph networks to deal with irregular-geometry detectors in the context of particle reconstruction. Thanks to their representation-learning capabilities, graph networks can exploit the full detector granularity, while natively managing the event sparsity and arbitrarily complex detector geometries. We introduce two distance-weighted graph network architectures, dubbed GarNet and GravNet layers, and apply them to a typical particle reconstruction task. The performance of the new architectures is evaluated on a data set of simulated particle interactions on a toy model of a highly granular calorimeter, loosely inspired by the endcap calorimeter to be installed in the CMS detector for the High-Luminosity LHC phase. We study the clustering of energy depositions, which is the basis for calorimetric particle reconstruction, and provide a quantitative comparison to alternative approaches. The proposed algorithms provide an interesting alternative to existing methods, offering equally performing or less resource-demanding solutions with fewer underlying assumptions on the detector geometry and, consequently, the possibility to generalize to other detectors.
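
Below is a minimal sketch in PyTorch of what a distance-weighted graph layer in the spirit of GravNet/GarNet can look like: hits are projected into a learned coordinate space, nearest neighbours are found there, and neighbour features are aggregated with weights that decay with distance. The layer sizes, number of neighbours and weighting function are illustrative assumptions, not the published architecture.

```python
# Minimal sketch (not the authors' implementation) of a distance-weighted
# graph layer in the spirit of GravNet: hits are projected into a learned
# coordinate space, k nearest neighbours are found there, and neighbour
# features are aggregated with weights that fall off with distance.
# Tensor shapes and hyperparameters (k, dimensions) are illustrative assumptions.
import torch
import torch.nn as nn

class DistanceWeightedGraphLayer(nn.Module):
    def __init__(self, in_features, coord_dim=4, feat_dim=16, out_features=32, k=8):
        super().__init__()
        self.coord = nn.Linear(in_features, coord_dim)   # learned "space" coordinates
        self.feat = nn.Linear(in_features, feat_dim)     # features to be exchanged
        self.out = nn.Linear(in_features + 2 * feat_dim, out_features)
        self.k = k

    def forward(self, x):                 # x: (n_hits, in_features), one event
        s = self.coord(x)                 # (n_hits, coord_dim)
        f = self.feat(x)                  # (n_hits, feat_dim)
        d2 = torch.cdist(s, s) ** 2       # squared pairwise distances
        d2_k, idx = torch.topk(d2, self.k + 1, largest=False)   # self + k neighbours
        d2_k, idx = d2_k[:, 1:], idx[:, 1:]                     # drop self
        w = torch.exp(-10.0 * d2_k).unsqueeze(-1)               # distance weights
        neigh = f[idx]                                          # (n_hits, k, feat_dim)
        agg_mean = (w * neigh).mean(dim=1)
        agg_max = (w * neigh).max(dim=1).values
        return self.out(torch.cat([x, agg_mean, agg_max], dim=-1))

hits = torch.randn(128, 10)               # 128 calorimeter hits, 10 input features each
print(DistanceWeightedGraphLayer(10)(hits).shape)   # torch.Size([128, 32])
```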

Data Analysis Statistics And Probability

Learning to Identify Electrons

We investigate whether state-of-the-art classification features commonly used to distinguish electrons from jet backgrounds in collider experiments are overlooking valuable information. A deep convolutional neural network analysis of electromagnetic and hadronic calorimeter deposits is compared to the performance of typical features, revealing a ≈5% gap which indicates that these lower-level data do contain untapped classification power. To reveal the nature of this unused information, we use a recently developed technique to map the deep network into a space of physically interpretable observables. We identify two simple calorimeter observables which are not typically used for electron identification, but which mimic the decisions of the convolutional network and nearly close the performance gap.
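
As a rough illustration of the kind of network referred to above, the following sketch shows a small convolutional classifier acting on two-channel calorimeter images (electromagnetic and hadronic deposits). The image size, depth and layer widths are assumptions for demonstration and do not reproduce the network studied in the paper.

```python
# Illustrative sketch only: a small convolutional classifier over calorimeter
# images (one channel for electromagnetic and one for hadronic deposits).
# The 32x32 image size and architecture are assumptions, not the paper's network.
import torch
import torch.nn as nn

class CaloCNN(nn.Module):
    def __init__(self, n_channels=2):     # ECAL + HCAL deposit images
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1),              # electron-vs-background logit
        )

    def forward(self, x):                  # x: (batch, 2, 32, 32) calorimeter images
        return self.head(self.conv(x))

batch = torch.randn(4, 2, 32, 32)
print(CaloCNN()(batch).shape)              # torch.Size([4, 1])
```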

Data Analysis Statistics And Probability

Learning to Isolate Muons

Distinguishing between prompt muons produced in heavy boson decay and muons produced in association with heavy-flavor jet production is an important task in the analysis of collider physics data. We explore whether there is information available in calorimeter deposits that is not captured by the standard approach of isolation cones. We find that convolutional networks and particle-flow networks accessing the calorimeter cells surpass the performance of isolation cones, suggesting that the radial energy distribution and the angular structure of the calorimeter deposits surrounding the muon contain unused discrimination power. We assemble a small set of high-level observables which summarize the calorimeter information and partially close the performance gap with networks which analyze the calorimeter cells directly. These observables are theoretically well-defined and can be applied to studies of collider data. The remaining performance gap suggests the need for a new class of calorimeter-based observables.
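
The standard isolation-cone baseline mentioned above can be sketched in a few lines: sum the transverse energy of calorimeter cells within a cone in eta-phi around the muon and normalise by the muon pT. The cone radius of 0.4 and the toy cell values below are assumptions.

```python
# A minimal sketch of a standard isolation-cone observable: sum the transverse
# energy of calorimeter cells within Delta-R < cone of the muon, divided by the
# muon pT. The cone radius and the toy cell data are illustrative assumptions.
import numpy as np

def isolation(muon_eta, muon_phi, muon_pt, cell_eta, cell_phi, cell_et, cone=0.4):
    dphi = np.angle(np.exp(1j * (cell_phi - muon_phi)))      # wrap phi into (-pi, pi]
    deta = cell_eta - muon_eta
    dr = np.hypot(deta, dphi)
    return cell_et[dr < cone].sum() / muon_pt

rng = np.random.default_rng(0)
cell_eta = rng.uniform(-2.5, 2.5, 1000)
cell_phi = rng.uniform(-np.pi, np.pi, 1000)
cell_et = rng.exponential(0.5, 1000)                          # GeV, toy values
print(isolation(0.3, 1.2, 40.0, cell_eta, cell_phi, cell_et))
```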

Data Analysis Statistics And Probability

Lectures on Statistics in Theory: Prelude to Statistics in Practice

This is a writeup of lectures on "statistics" that have evolved from the 2009 Hadron Collider Physics Summer School at CERN to the forthcoming 2018 school at Fermilab. The emphasis is on foundations, using simple examples to illustrate the points that are still debated in the professional statistics literature. The three main approaches to interval estimation (Neyman confidence, Bayesian, likelihood ratio) are discussed and compared in detail, with and without nuisance parameters. Hypothesis testing is discussed mainly from the frequentist point of view, with pointers to the Bayesian literature. Various foundational issues are emphasized, including the conditionality principle and the likelihood principle.
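
As a small worked illustration of the three interval-estimation approaches compared in the lectures, the snippet below treats a single Poisson count n = 5 with unknown mean and no nuisance parameters. The observed count and the 68.3% level are chosen for demonstration; the code is not taken from the lectures themselves.

```python
# A small worked illustration of likelihood-ratio, Bayesian (flat prior), and
# Neyman central intervals for a Poisson mean mu given one observed count n.
import numpy as np
from scipy import stats, optimize

n, cl = 5, 0.683

# Likelihood-ratio interval: mu where -2 ln[L(mu)/L(mu_hat)] <= 1 (68.3% asymptotic).
def lam(mu):
    return 2 * (mu - n + (n * np.log(n / mu) if n > 0 else 0.0))
lo = optimize.brentq(lambda mu: lam(mu) - 1.0, 1e-6, n)
hi = optimize.brentq(lambda mu: lam(mu) - 1.0, n, 10 * n + 10)
print(f"likelihood-ratio interval: [{lo:.2f}, {hi:.2f}]")

# Bayesian central credible interval with a flat prior: posterior is Gamma(n+1, 1).
post = stats.gamma(n + 1)
print("Bayesian (flat prior) central interval:",
      post.ppf((1 - cl) / 2), post.ppf(1 - (1 - cl) / 2))

# Neyman central confidence interval (Garwood), expressed via Gamma quantiles.
print("Neyman central interval:",
      stats.gamma(n).ppf((1 - cl) / 2), stats.gamma(n + 1).ppf(1 - (1 - cl) / 2))
```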

Data Analysis Statistics And Probability

Local and Global Perspectives on Diffusion Maps in the Analysis of Molecular Systems

Diffusion maps approximate the generator of Langevin dynamics from simulation data. They afford a means of identifying the slowly evolving principal modes of high-dimensional molecular systems. When combined with a biasing mechanism, diffusion maps can accelerate the sampling of the stationary Boltzmann-Gibbs distribution. In this work, we contrast the local and global perspectives on diffusion maps, based on whether or not the data distribution has been fully explored. In the global setting, we use diffusion maps to identify metastable sets and to approximate the corresponding committor functions of transitions between them. We also discuss the use of diffusion maps within the metastable sets, formalising the locality via the concept of the quasi-stationary distribution and justifying the convergence of diffusion maps within a local equilibrium. This perspective allows us to propose an enhanced sampling algorithm. We demonstrate the practical relevance of these approaches both for simple models and for molecular dynamics problems (alanine dipeptide and deca-alanine).
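
For readers unfamiliar with the construction, the following is a minimal sketch of a standard diffusion map: a Gaussian kernel on pairwise distances, a density normalisation, and an eigendecomposition whose leading non-trivial eigenvectors serve as slow collective coordinates. The bandwidth and the toy two-cluster data are assumptions.

```python
# Minimal sketch of a standard diffusion-map construction (not the paper's code):
# Gaussian kernel, density normalisation, row-stochastic transition matrix,
# and the leading non-trivial eigenvectors as slow coordinates.
import numpy as np

def diffusion_map(X, eps=0.5, n_coords=2):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * eps))                 # Gaussian kernel
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q, q)                # density normalisation
    d = K_tilde.sum(axis=1)
    P = K_tilde / d[:, None]                    # row-stochastic transition matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    return evals.real[order][1:n_coords + 1], evecs.real[:, order][:, 1:n_coords + 1]

X = np.concatenate([np.random.randn(100, 3) - 2, np.random.randn(100, 3) + 2])
evals, coords = diffusion_map(X)
print(evals, coords.shape)                      # two slow modes, (200, 2) embedding
```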

Data Analysis Statistics And Probability

Long-timescale predictions from short-trajectory data: A benchmark analysis of the trp-cage miniprotein

Elucidating physical mechanisms with statistical confidence from molecular dynamics simulations can be challenging owing to the many degrees of freedom that contribute to collective motions. To address this issue, we recently introduced a dynamical Galerkin approximation (DGA) [Thiede et al. J. Chem. Phys. 150, 244111 (2019)], in which chemical kinetic statistics that satisfy equations of dynamical operators are represented by a basis expansion. Here, we reformulate this approach, clarifying (and reducing) the dependence on the choice of lag time. We present a new projection of the reactive current onto collective variables and provide improved estimators for rates and committors. We also present simple procedures for constructing suitable smoothly varying basis functions from arbitrary molecular features. To evaluate estimators and basis sets numerically, we generate and carefully validate a dataset of short trajectories for the unfolding and folding of the trp-cage miniprotein, a well-studied system. Our analysis demonstrates a comprehensive strategy for characterizing reaction pathways quantitatively.
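
The basic DGA idea of expanding a committor in a basis and solving a linear system built from short-trajectory pairs can be sketched as follows. The sketch is deliberately simplified (in particular, the stopping of trajectories at the reactant and product sets is omitted), and the double-well potential, indicator basis and lag are illustrative assumptions rather than the setup used in the paper.

```python
# Schematic illustration of estimating a committor by a basis expansion from
# many short trajectories (simplified: trajectory stopping at A and B omitted).
import numpy as np

rng = np.random.default_rng(1)
beta, dt, lag, n_traj, n_steps = 3.0, 1e-3, 10, 2000, 50

def force(x):                       # V(x) = (x^2 - 1)^2
    return -4 * x * (x ** 2 - 1)

# Generate short overdamped Langevin trajectories.
x = rng.uniform(-1.5, 1.5, n_traj)
traj = [x.copy()]
for _ in range(n_steps):
    x = x + force(x) * dt + np.sqrt(2 * dt / beta) * rng.standard_normal(n_traj)
    traj.append(x.copy())
traj = np.array(traj)               # (n_steps + 1, n_traj)

x0, x1 = traj[:-lag].ravel(), traj[lag:].ravel()     # pairs separated by the lag time

# Basis: indicators of bins covering the transition region (-1, 1); they vanish
# in the reactant set A = {x < -1} and the product set B = {x > 1}.
edges = np.linspace(-1, 1, 21)
def basis(y):
    phi = np.array([(y >= a) & (y < b) for a, b in zip(edges[:-1], edges[1:])], float)
    return phi.T                    # (n_samples, n_basis)

g = lambda y: (y >= 1).astype(float)     # guess function: indicator of B
P0, P1 = basis(x0), basis(x1)
C = P0.T @ (P1 - P0)
b = -P0.T @ (g(x1) - g(x0))
coef = np.linalg.lstsq(C, b, rcond=None)[0]

grid = np.linspace(-1, 1, 200)
committor = g(grid) + basis(grid) @ coef
print(committor[::50])              # approximate committor across the transition region
```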

Data Analysis Statistics And Probability

Low-dimensional offshore wave input for extreme event quantification

In offshore engineering design, nonlinear wave models are often used to propagate stochastic waves from an input boundary to the location of an offshore structure. Each wave realization is typically characterized by a high-dimensional input time series, and a reliable determination of the extreme events is associated with substantial computational effort. As the sea depth decreases, extreme events become more difficult to evaluate. Here we construct a low-dimensional characterization of the candidate input time series to circumvent the search for extreme wave events in a high-dimensional input probability space. Each wave input is represented by a unique low-dimensional set of parameters for which standard surrogate approximations, such as Gaussian processes, can estimate the short-term exceedance probability efficiently and accurately. We demonstrate the advantages of the new approach with a simple shallow-water wave model based on the Korteweg-de Vries equation for which we can provide an accurate reference solution based on the simple Monte Carlo method. We furthermore apply the method to a fully nonlinear wave model for wave propagation over a sloping seabed. The results demonstrate that the Gaussian process can accurately learn the tail of the heavy-tailed distribution of the maximum wave crest elevation based on only 1.7% of the required Monte Carlo evaluations.
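
The surrogate strategy described above can be sketched as: run the expensive wave model a modest number of times at points of the low-dimensional parameterisation, fit a Gaussian process, and then estimate the exceedance probability by cheap Monte Carlo sampling of the surrogate. The toy "simulator", kernel and threshold below are assumptions, not the paper's wave model.

```python
# Minimal sketch of a Gaussian-process surrogate for short-term exceedance
# probability over a low-dimensional input parameterisation (toy problem only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def expensive_simulator(theta):                  # stand-in for the nonlinear wave model
    return 1.0 + 0.8 * theta[:, 0] ** 2 + 0.3 * np.sin(3 * theta[:, 1])

theta_train = rng.normal(size=(50, 2))           # a few expensive runs, low-dim inputs
crest_train = expensive_simulator(theta_train)   # maximum crest elevation per run

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(theta_train, crest_train)

theta_mc = rng.normal(size=(200_000, 2))         # cheap Monte Carlo on the surrogate
crest_pred = gp.predict(theta_mc)
threshold = 2.5
print("exceedance probability:", np.mean(crest_pred > threshold))
```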

Data Analysis Statistics And Probability

MJOLNIR: A Software Package for Multiplexing Neutron Spectrometers

Novel multiplexing triple-axis neutron scattering spectrometers yield significant improvements over common triple-axis instruments. While the planar scattering geometry retains compatibility with complex sample environments, the simultaneous detection of scattered neutrons at various angles and energies leads to tremendous improvements in the data acquisition rate. Here we report on the software package MJOLNIR that we have developed to handle the resulting increase in data complexity. Using data from the new CAMEA spectrometer of the Swiss Spallation Neutron Source at the Paul Scherrer Institut, we show how the software reduces, visualises and treats observables measured on multiplexing spectrometers. The software package has been generalised into a unified framework, allowing for collaborations across multiplexing instruments at different facilities and further facilitating new developments in data treatment, such as fitting routines and modelling of multi-dimensional data.

Data Analysis Statistics And Probability

MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks

In general-purpose particle detectors, the particle-flow algorithm may be used to reconstruct a comprehensive particle-level view of the event by combining information from the calorimeters and the trackers, significantly improving the detector resolution for jets and the missing transverse momentum. In view of the planned high-luminosity upgrade of the CERN Large Hadron Collider (LHC), it is necessary to revisit existing reconstruction algorithms and ensure that both the physics and computational performance are sufficient in an environment with many simultaneous proton-proton interactions (pileup). Machine learning may offer a prospect for computationally efficient event reconstruction that is well-suited to heterogeneous computing platforms, while significantly improving the reconstruction quality over rule-based algorithms for granular detectors. We introduce MLPF, a novel, end-to-end trainable, machine-learned particle-flow algorithm based on parallelizable, computationally efficient, and scalable graph neural networks optimized using a multi-task objective on simulated events. We report the physics and computational performance of the MLPF algorithm on a Monte Carlo dataset of top quark-antiquark pairs produced in proton-proton collisions in conditions similar to those expected for the high-luminosity LHC. The MLPF algorithm improves the physics response with respect to a rule-based benchmark algorithm and demonstrates computationally scalable particle-flow reconstruction in a high-pileup environment.
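
A schematic sketch of the kind of per-element, multi-task objective described above follows: each input element (track or calorimeter cluster) receives a particle-class prediction and a momentum regression, and the two losses are combined. The class count, feature sizes, network body and loss weighting are illustrative assumptions, not the MLPF implementation.

```python
# Schematic sketch of a per-element multi-task particle-flow objective:
# classification of each input element plus momentum regression for matched
# elements. This is an illustration, not the MLPF model or its training code.
import torch
import torch.nn as nn

n_classes, in_feat = 6, 12            # e.g. none / charged hadron / neutral hadron / photon / e / mu
node_net = nn.Sequential(             # stand-in for the graph neural network body
    nn.Linear(in_feat, 64), nn.ELU(),
    nn.Linear(64, n_classes + 3),     # class logits + (pT, eta, phi) regression
)

def multitask_loss(pred, target_cls, target_mom, w_reg=1.0):
    logits, mom = pred[:, :n_classes], pred[:, n_classes:]
    cls_loss = nn.functional.cross_entropy(logits, target_cls)
    mask = target_cls > 0             # regress momentum only for true particles
    reg_loss = nn.functional.mse_loss(mom[mask], target_mom[mask]) if mask.any() else 0.0
    return cls_loss + w_reg * reg_loss

elems = torch.randn(1000, in_feat)                       # one event's input elements
target_cls = torch.randint(0, n_classes, (1000,))
target_mom = torch.randn(1000, 3)
print(multitask_loss(node_net(elems), target_cls, target_mom))
```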

Data Analysis Statistics And Probability

MODULO: A software for Multiscale Proper Orthogonal Decomposition of data

In the era of the Big Data revolution, methods for the automatic discovery of regularities in large datasets are becoming essential tools in applied sciences. This article presents an open software package, named MODULO (MODal mULtiscale pOd), to perform the Multiscale Proper Orthogonal Decomposition (mPOD) of numerical and experimental data. This novel decomposition combines Multi-resolution Analysis (MRA) and standard Proper Orthogonal Decomposition (POD) to allow for the optimal compromise between decomposition convergence and spectral purity of its modes. The software is equipped with a Graphical User Interface (GUI) and enriched by numerous examples and video tutorials (see the YouTube channel MODULO mPOD). The MATLAB source codes and an executable for Windows users can be downloaded at \url{this https URL}; a collection of exercises in Matlab and Python is provided in \url{this https URL}.
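
The standard POD step that mPOD builds on can be sketched via an SVD of the snapshot matrix, as below. The multi-resolution filtering stage and the MODULO interface itself are not reproduced here, and the toy snapshot data are an assumption.

```python
# Minimal sketch of standard (non-multiscale) POD via the SVD of a snapshot
# matrix; the mPOD multi-resolution filtering and the MODULO GUI are not shown.
import numpy as np

rng = np.random.default_rng(0)
n_space, n_time = 500, 200
t = np.linspace(0, 10, n_time)
x = np.linspace(0, 1, n_space)[:, None]

# Toy dataset: two oscillating spatial structures plus noise.
D = (np.sin(2 * np.pi * x) * np.cos(2 * np.pi * t)
     + 0.5 * np.sin(6 * np.pi * x) * np.cos(10 * np.pi * t)
     + 0.01 * rng.standard_normal((n_space, n_time)))

D = D - D.mean(axis=1, keepdims=True)            # subtract the temporal mean
phi, sigma, psi_t = np.linalg.svd(D, full_matrices=False)
energy = sigma ** 2 / np.sum(sigma ** 2)
print("energy captured by first 3 POD modes:", energy[:3].sum())
# phi[:, :r] are spatial modes, sigma[:r] their amplitudes, psi_t[:r] temporal structures.
```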

