Featured Research

Data Analysis Statistics And Probability

Generative Models for Fast Calorimeter Simulation: The LHCb Case

Simulation is one of the key components in high energy physics. Historically, it has relied on Monte Carlo methods, which require a tremendous amount of computational resources. These methods may struggle to meet the demands expected at the High-Luminosity Large Hadron Collider (HL-LHC), so the experiment is in urgent need of new fast simulation techniques. We introduce a new deep learning framework based on Generative Adversarial Networks which can be faster than traditional simulation methods by five orders of magnitude with reasonable simulation accuracy. This approach will allow physicists to produce the large amounts of simulated data needed by the upcoming HL-LHC experiments using limited computing resources.
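As a purely illustrative sketch of the kind of model the abstract describes, the PyTorch snippet below trains a tiny GAN on stand-in "shower" vectors; the cell-grid size, network widths, learning rate, and the random training batch are assumptions made for illustration, not the LHCb framework itself.

# Minimal GAN sketch for calorimeter-like energy deposits (illustrative only).
import torch
import torch.nn as nn

N_CELLS, LATENT = 30 * 30, 64   # e.g. a 30x30 calorimeter cell grid, flattened

generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, N_CELLS), nn.ReLU(),        # cell energies are non-negative
)
discriminator = nn.Sequential(
    nn.Linear(N_CELLS, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_showers):
    """One adversarial update; real_showers is a (batch, N_CELLS) tensor."""
    batch = real_showers.size(0)
    real_lbl, fake_lbl = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: separate full-simulation showers from generated ones.
    fake = generator(torch.randn(batch, LATENT))
    d_loss = bce(discriminator(real_showers), real_lbl) + \
             bce(discriminator(fake.detach()), fake_lbl)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to fool the discriminator.
    fake = generator(torch.randn(batch, LATENT))
    g_loss = bce(discriminator(fake), real_lbl)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Random stand-in batch; in practice this would come from full Monte Carlo simulation.
print(train_step(torch.rand(128, N_CELLS)))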

Data Analysis Statistics And Probability

Generic predictions of output probability based on complexities of inputs and outputs

For a broad class of input-output maps, arguments based on the coding theorem from algorithmic information theory (AIT) predict that simple (low Kolmogorov complexity) outputs are exponentially more likely to occur upon uniform random sampling of inputs than complex outputs are. Here, we derive probability bounds that are based on the complexities of the inputs as well as the outputs, rather than just on the complexities of the outputs. The more an output deviates from the coding theorem bound, the lower the complexity of its inputs. Our new bounds are tested for an RNA sequence-to-structure map, a finite-state transducer, and a perceptron. These results open avenues for AIT to be more widely used in physics.
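For context, the output-only coding-theorem-style bound that these arguments build on can be written schematically as below; the constants a and b and the complexity approximation follow the usual simplicity-bias notation and are stated here as background, while the new input-dependent bounds are derived in the paper itself.

P(x) \;\lesssim\; 2^{-a\,\tilde{K}(x) + b}

where P(x) is the probability of obtaining output x under uniform random sampling of inputs and \tilde{K}(x) is a computable approximation to the Kolmogorov complexity of x.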

Data Analysis Statistics And Probability

Ghost Imaging with Optimal Binary Sampling

To extract the maximum information about the object from a series of binary samples in ghost imaging applications, we propose and demonstrate a framework for optimizing the performance of ghost imaging with binary sampling so that it approaches the results obtained without binarization. The method is based on maximizing the information content of the signal-arm detection by formulating and solving the appropriate parameter estimation problem: finding the binarization threshold that yields the reconstructed image with optimal Fisher information properties. Applying 1-bit quantized Poisson statistics to a ghost-imaging model with pseudo-thermal light, we derive the fundamental limit, i.e., the Cramér-Rao lower bound, as the benchmark for evaluating the accuracy of the estimator. Our theoretical model and experimental results suggest that, with the optimal binarization threshold, which coincides with the statistical mean of all bucket samples, and a large number of measurements, the performance of binary-sampling ghost imaging can approach that of ordinary ghost imaging without binarization.
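A toy numerical illustration of the central claim, assuming a simple computational ghost-imaging model with random speckle patterns: the bucket signal is binarized at its mean and the usual correlation reconstruction is compared with the unbinarized one. The object, pattern statistics, and sample sizes below are invented for the sketch and are not the authors' setup.

# Correlation-based ghost imaging with the bucket signal binarized at its mean.
import numpy as np

rng = np.random.default_rng(0)
N, M = 32, 5000                        # image side, number of speckle patterns
obj = np.zeros((N, N)); obj[10:22, 10:22] = 1.0   # simple square "object"

patterns = rng.random((M, N, N))       # pseudo-thermal speckle patterns
bucket = np.einsum('mij,ij->m', patterns, obj)    # bucket (single-pixel) signal

binary = (bucket > bucket.mean()).astype(float)   # 1-bit quantization at the mean

# Correlation reconstruction: <(B - <B>) * (I(x,y) - <I(x,y)>)>
recon_full = np.tensordot(bucket - bucket.mean(), patterns - patterns.mean(0), axes=1) / M
recon_bin  = np.tensordot(binary - binary.mean(), patterns - patterns.mean(0), axes=1) / M

# With enough measurements the binary reconstruction approaches the full one.
corr = np.corrcoef(recon_full.ravel(), recon_bin.ravel())[0, 1]
print(f"correlation between full and 1-bit reconstructions: {corr:.3f}")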

Data Analysis Statistics And Probability

Good and bad predictions: Assessing and improving the replication of chaotic attractors by means of reservoir computing

The prediction of complex nonlinear dynamical systems with the help of machine learning techniques has become increasingly popular. In particular, reservoir computing has turned out to be a very promising approach, especially for reproducing the long-term properties of a nonlinear system. Yet a thorough statistical analysis of the forecast results has been missing. Using the Lorenz and Rössler systems, we statistically analyze the quality of prediction for different parametrizations - both the exact short-term prediction and the reproduction of the long-term properties (the "climate") of the system as estimated by the correlation dimension and the largest Lyapunov exponent. We find that both short- and long-term predictions vary significantly among the realizations. Thus special care must be taken in selecting the good predictions, as realizations which deliver better short-term predictions also tend to better resemble the long-term climate of the system. Instead of using only purely random Erdős-Rényi networks, we also investigate the benefit of alternative network topologies such as small-world or scale-free networks and show what effect they have on the prediction quality. Our results suggest that the overall performance with respect to reproducing the climate of both the Lorenz and Rössler systems is worst for scale-free networks. For the Lorenz system there seems to be a slight benefit to using small-world networks, while for the Rössler system small-world and Erdős-Rényi networks performed equally well. In general, the observation is that reservoir computing works for all network topologies investigated here.
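For readers unfamiliar with reservoir computing, the sketch below builds a minimal echo-state network for one-step prediction of the Lorenz system; the reservoir size, spectral radius, ridge parameter, and the dense random coupling matrix are illustrative assumptions rather than the parametrizations and network topologies studied in the paper.

# Minimal echo-state-network sketch for one-step prediction of the Lorenz system.
import numpy as np

rng = np.random.default_rng(1)

def lorenz(n, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    """Lorenz trajectory via simple Euler integration."""
    x = np.empty((n, 3)); x[0] = (1.0, 1.0, 1.0)
    for i in range(n - 1):
        X, Y, Z = x[i]
        x[i + 1] = x[i] + dt * np.array([s * (Y - X), X * (r - Z) - Y, X * Y - b * Z])
    return x

data = lorenz(6000)
train, target = data[:-1], data[1:]

# Random reservoir: dense random matrix rescaled to spectral radius 0.9.
N_RES = 300
A = rng.normal(size=(N_RES, N_RES))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
W_in = rng.uniform(-0.5, 0.5, size=(N_RES, 3))

states = np.zeros((len(train), N_RES))
r_state = np.zeros(N_RES)
for t, u in enumerate(train):
    r_state = np.tanh(A @ r_state + W_in @ u)   # reservoir state update
    states[t] = r_state

# Ridge-regression readout mapping reservoir states to the next system state.
beta = 1e-6
W_out = np.linalg.solve(states.T @ states + beta * np.eye(N_RES), states.T @ target).T

pred = states @ W_out.T
print("one-step RMSE:", np.sqrt(np.mean((pred - target) ** 2)))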

Data Analysis Statistics And Probability

Gradient Profile Estimation Using Exponential Cubic Spline Smoothing in a Bayesian Framework

Attaining reliable profile gradients is of utmost relevance for many physical systems. In most situations, the estimation of the gradient can be inaccurate due to noise. It is common practice to first estimate the underlying system and then compute the profile gradient by taking the subsequent analytic derivative. The underlying system is often estimated by fitting or smoothing the data. Taking the analytic derivative of an estimated function can be ill-posed, and the ill-posedness worsens as the noise in the system increases. As a result, the uncertainty in the gradient estimate grows. In this paper, a theoretical framework for a method to estimate the profile gradient of discrete noisy data is presented. The method is developed within a Bayesian framework. Comprehensive numerical experiments are conducted on synthetic data at different levels of random noise, and the accuracy of the proposed method is quantified. Our findings suggest that the proposed gradient profile estimation method outperforms state-of-the-art methods.
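The snippet below illustrates the "common practice" baseline the abstract refers to: smooth the noisy data with an ordinary cubic smoothing spline and differentiate the fit analytically. It is not the paper's Bayesian exponential cubic spline method, and the test function and noise level are assumptions.

# Baseline gradient estimation: smooth first, then take the analytic derivative.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y_true = np.sin(x)
y_noisy = y_true + rng.normal(scale=0.1, size=x.size)

# Cubic smoothing spline with smoothing factor matched roughly to the noise level.
spline = UnivariateSpline(x, y_noisy, k=3, s=len(x) * 0.1**2)
grad_est = spline.derivative()(x)     # analytic derivative of the smooth fit
grad_true = np.cos(x)

print("RMS gradient error:", np.sqrt(np.mean((grad_est - grad_true) ** 2)))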

Data Analysis Statistics And Probability

Graph Generative Adversarial Networks for Sparse Data Generation in High Energy Physics

We develop a graph generative adversarial network to generate sparse data sets like those produced at the CERN Large Hadron Collider (LHC). We demonstrate this approach by training on and generating sparse representations of MNIST handwritten digit images and of jets of particles in proton-proton collisions like those at the LHC. We find that the model successfully generates sparse MNIST digits and particle jet data. We quantify the agreement between real and generated data with a graph-based Fréchet Inception distance for the MNIST dataset and with particle- and jet-feature-level 1-Wasserstein distances for the jet dataset.
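As a small illustration of the feature-level evaluation metric mentioned above, the snippet below computes a one-dimensional 1-Wasserstein distance between toy "real" and "generated" particle-feature distributions; the exponential pT-like spectra are assumptions, not LHC data or the paper's datasets.

# 1-Wasserstein distance between real and generated feature distributions.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
pt_real = rng.exponential(scale=10.0, size=5000)       # stand-in "real" pT spectrum
pt_gen_good = rng.exponential(scale=10.5, size=5000)   # generator close to the data
pt_gen_bad = rng.exponential(scale=20.0, size=5000)    # generator far from the data

print("W1(real, good generator):", wasserstein_distance(pt_real, pt_gen_good))
print("W1(real, bad generator): ", wasserstein_distance(pt_real, pt_gen_bad))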

Data Analysis Statistics And Probability

Graph neural network for 3D classification of ambiguities and optical crosstalk in scintillator-based neutrino detectors

Deep learning tools are being used extensively in high energy physics and are becoming central to the reconstruction of neutrino interactions in particle detectors. In this work, we report on the performance of a graph neural network in assisting with particle-flow event reconstruction. The three-dimensional reconstruction of particle tracks produced in neutrino interactions can be subject to ambiguities due to high-multiplicity signatures in the detector or leakage of signal between neighboring active detector volumes. Graph neural networks potentially have the capability to identify all these features and boost the reconstruction performance. As an example case study, we tested a graph neural network, inspired by the GraphSAGE algorithm, on a novel 3D-granular plastic-scintillator detector that will be used to upgrade the near detector of the T2K experiment. The developed neural network has been trained and tested on diverse neutrino interaction samples, showing very promising results: the classification of particle track voxels produced in the detector can be done with efficiencies and purities of 94-96% per event, and most of the ambiguities can be identified and rejected, while the network remains robust against systematic effects.
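A schematic of the GraphSAGE-style mean aggregation underlying such a network is sketched below: each detector voxel (node) updates its embedding from its own features and the average of its neighbours' features. Node counts, feature dimensions, and weights are toy values, not the collaboration's implementation.

# One GraphSAGE-style mean-aggregation layer over a small voxel graph.
import numpy as np

rng = np.random.default_rng(4)

def sage_layer(h, adj, W_self, W_neigh):
    """h: (n_nodes, d_in) node features; adj: (n_nodes, n_nodes) 0/1 adjacency."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    h_neigh = adj @ h / deg                      # mean of neighbour features
    out = h @ W_self + h_neigh @ W_neigh         # combine self and neighbourhood
    return np.maximum(out, 0.0)                  # ReLU nonlinearity

n_voxels, d_in, d_hid = 6, 3, 8                  # e.g. 3 features per scintillator voxel
h = rng.normal(size=(n_voxels, d_in))
adj = (rng.random((n_voxels, n_voxels)) < 0.4).astype(float)
np.fill_diagonal(adj, 0.0)

h1 = sage_layer(h, adj, rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_in, d_hid)))
print(h1.shape)   # (6, 8) updated voxel embeddings, fed to a per-node classifier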

Data Analysis Statistics And Probability

HEPLike: an open source framework for experimental likelihood evaluation

We present a computer framework to store and evaluate likelihoods coming from high energy physics experiments. Thanks to its flexibility, it can be interfaced with existing fitting codes and allows the interpretation of experimental results to be made uniform among users. The code comes with a large open database containing the experimental measurements. It is intended for users who perform phenomenological studies, global fits, or experimental averages.
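Purely as a schematic of the underlying idea, and not the HEPLike API, the snippet below evaluates a simple Gaussian log-likelihood of a theory prediction against a stored measurement; the observable and the numbers are invented for illustration.

# Generic Gaussian log-likelihood of a theory prediction given a measurement.
import numpy as np

def gaussian_loglike(prediction, measured, sigma):
    """Log-likelihood of a prediction given a measurement +/- sigma."""
    return -0.5 * ((prediction - measured) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)

# Toy measurement of a branching-ratio-like observable.
measured, sigma = 3.0e-9, 0.6e-9
for pred in (2.5e-9, 3.0e-9, 4.5e-9):
    print(pred, gaussian_loglike(pred, measured, sigma))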

Data Analysis Statistics And Probability

HMCF - Hamiltonian Monte Carlo Sampling for Fields - A Python framework for HMC sampling with NIFTy

HMCF "Hamiltonian Monte Carlo for Fields" is a software add-on for the NIFTy "Numerical Information Field Theory" framework implementing Hamiltonian Monte Carlo (HMC) sampling in Python. HMCF as well as NIFTy are designed to address inference problems in high-dimensional spatially correlated setups such as image reconstruction. HMCF adds an HMC sampler to NIFTy that automatically adjusts the many free parameters steering the HMC sampling machinery. A wide variety of features ensure efficient full-posterior sampling for high-dimensional inference problems. These features include integration step size adjustment, evaluation of the mass matrix, convergence diagnostics, higher order symplectic integration and simultaneous sampling of parameters and hyperparameters in Bayesian hierarchical models.

Data Analysis Statistics And Probability

Harmonizing discovery thresholds and reporting two-sided confidence intervals: a modified Feldman & Cousins method

When searching for new physics effects, collaborations will often wish to publish upper limits and intervals with a lower confidence level than the threshold they would set to claim an excess or a discovery. However, confidence intervals are typically constructed to provide constant coverage, i.e., a constant probability of containing the true value, with possible overcoverage if the random variable is discrete. In particular, this means that the confidence interval will contain the zero-signal case with the same frequency as the confidence level. This paper details a modification of the Feldman-Cousins method that allows a different, higher significance for reporting an excess than the confidence level of the interval.
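For reference, the sketch below implements the standard, unmodified Feldman-Cousins construction for a Poisson signal with known background, which is the starting point the paper modifies; the grid ranges, background, and 90% confidence level are illustrative choices.

# Standard Feldman-Cousins construction for a Poisson signal with known background.
import numpy as np
from scipy.stats import poisson

def fc_acceptance(mu, b, cl=0.90, n_max=100):
    """Acceptance region of observed counts n for true signal mu (FC ordering)."""
    n = np.arange(n_max)
    p = poisson.pmf(n, mu + b)
    mu_best = np.maximum(n - b, 0.0)                 # best-fit non-negative signal
    rank = p / poisson.pmf(n, mu_best + b)           # likelihood-ratio ordering
    order = np.argsort(-rank)
    accepted, cum = [], 0.0
    for i in order:                                  # add n values until coverage >= CL
        accepted.append(n[i]); cum += p[i]
        if cum >= cl:
            break
    return set(accepted)

def fc_interval(n_obs, b, cl=0.90, mu_grid=np.linspace(0, 30, 601)):
    """Confidence interval: all mu whose acceptance region contains n_obs."""
    inside = [mu for mu in mu_grid if n_obs in fc_acceptance(mu, b, cl)]
    return min(inside), max(inside)

print(fc_interval(n_obs=4, b=3.0))   # lower and upper 90% CL limits on the signal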

