Featured Research

Data Analysis, Statistics and Probability

DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC

We introduce a new high-dimensional algorithm for efficiency-corrected, maximally Monte Carlo event-generator-independent fiducial measurements at the LHC and beyond. The approach is driven probabilistically, using a Deep Neural Network on an event-by-event basis, trained on detector simulation and even on pure phase-space-distributed events alone. This approach also gives a glimpse into the future of high energy physics, where experiments publish new types of measurements in a radically multidimensional way.
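
A minimal sketch of the core idea, under assumptions made for illustration: a classifier is trained on simulated events labelled pass/fail by the detector simulation, its predicted probability serves as the per-event efficiency ε(x), and observed events are weighted by 1/ε(x). The toy detector model, the four-feature events, and the network size below are invented and are not the paper's actual setup.

```python
# Sketch of per-event efficiency estimation and inversion (illustrative only).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_sim = rng.uniform(-1, 1, size=(20000, 4))        # toy 4D event kinematics
# Toy detector: acceptance falls off towards the edges of phase space
eff_true = 0.9 * np.exp(-np.sum(X_sim**2, axis=1))
passed = rng.uniform(size=len(X_sim)) < eff_true

# Learn the per-event efficiency eps(x) as a class probability
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
net.fit(X_sim, passed)

# Efficiency inversion: weight each *observed* (passed) event by 1/eps(x)
X_obs = X_sim[passed]
eps = np.clip(net.predict_proba(X_obs)[:, 1], 1e-3, 1.0)   # guard small eps
weights = 1.0 / eps

# The weighted observed sample estimates the fiducial (pre-detector) yield
print(f"generated: {len(X_sim)}, corrected estimate: {weights.sum():.0f}")
```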

Data Analysis, Statistics and Probability

DeepRICH: Learning Deeply Cherenkov Detectors

Imaging Cherenkov detectors are widely used for particle identification (PID) in nuclear and particle physics experiments, where developing fast reconstruction algorithms is becoming of paramount importance to allow for near-real-time calibration and data quality control, as well as to speed up offline analysis of large amounts of data. In this paper we present DeepRICH, a novel deep learning algorithm for fast reconstruction which can be applied to different imaging Cherenkov detectors. The core of our architecture is a generative model which leverages a custom Variational Auto-encoder (VAE) combined with Maximum Mean Discrepancy (MMD), with a Convolutional Neural Network (CNN) extracting features from the space of latent variables for classification. A thorough comparison with the simulation/reconstruction package FastDIRC is discussed in the text. DeepRICH has the advantage of bypassing the low-level details needed to build a likelihood, allowing for a significant improvement in computation time at potentially the same reconstruction performance as other established reconstruction algorithms. In the conclusions, we address the implications and potential of this work, discussing possible future extensions and generalizations.
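
A minimal sketch of the Maximum Mean Discrepancy term mentioned above, the kind of distribution-matching loss that can be combined with a VAE; the RBF kernel, its bandwidth, and the latent dimension are illustrative choices rather than details taken from the paper.

```python
# Biased estimator of MMD^2 between two samples with an RBF kernel.
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)] (biased estimate)."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 8))        # e.g. encoder outputs (latent vectors)
prior = rng.normal(size=(500, 8))    # samples from the target prior
print(f"MMD^2(latents, prior) = {mmd2(z, prior):.4f}")   # near 0 if matched
```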

Data Analysis, Statistics and Probability

Delay Parameter Selection in Permutation Entropy Using Topological Data Analysis

Permutation Entropy (PE) is a powerful tool for quantifying the predictability of a sequence, including measuring the regularity of a time series. Despite its successful application in a variety of scientific domains, PE requires a judicious choice of the delay parameter τ. The other parameter of interest in PE is the motif dimension n, but n is typically selected between 4 and 8, with 5 or 6 giving optimal results for the majority of systems. Therefore, in this work we focus solely on choosing the delay parameter. Selecting τ is often accomplished by trial and error guided by the expertise of domain scientists. However, in this paper, we show that persistent homology, the flagship tool of the Topological Data Analysis (TDA) toolset, provides an approach for the automatic selection of τ. We evaluate the successful identification of a suitable τ by our TDA-based approach by comparing our results to a variety of examples in the published literature.
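
For reference, a minimal sketch of the PE computation itself, with the motif dimension n and delay τ discussed above as the two free parameters; the test signals and parameter values are only examples.

```python
# Permutation Entropy of a time series, normalized to [0, 1] by log(n!).
import math
from collections import Counter
import numpy as np

def permutation_entropy(x, n=5, tau=1):
    """Shannon entropy of ordinal patterns of length n sampled at delay tau."""
    patterns = [tuple(np.argsort(x[i:i + n * tau:tau]))
                for i in range(len(x) - (n - 1) * tau)]
    counts = np.array(list(Counter(patterns).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p)) / math.log(math.factorial(n))

t = np.linspace(0, 20 * np.pi, 2000)
print(permutation_entropy(np.sin(t), n=5, tau=10))   # regular signal: low PE
print(permutation_entropy(np.random.default_rng(0).normal(size=2000)))  # ~1
```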

Data Analysis, Statistics and Probability

Denoising scheme based on singular-value decomposition for one-dimensional spectra and its application in precision storage-ring mass spectrometry

This work concerns noise reduction for one-dimensional spectra in the case that the signal is corrupted by additive white noise. The proposed method starts by mapping the noisy spectrum to a partial circulant matrix. By virtue of the singular-value decomposition of the matrix, components belonging to the signal are determined by inspecting the total variations of the left singular vectors. Afterwards, a smoothed spectrum is reconstructed from the low-rank approximation of the matrix consisting of the signal components only. The denoising performance of the proposed method is shown to be highly competitive with other existing nonparametric methods, including moving average, wavelet shrinkage, and total variation. Furthermore, its applicable scenarios in precision storage-ring mass spectrometry are demonstrated to be rather diverse and appealing.
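
A minimal sketch of this pipeline under assumptions made for illustration: the embedding depth, the total-variation threshold, and the toy spectrum are all invented, and the paper's actual component-selection rule may differ in detail.

```python
# SVD denoising of a 1D spectrum via a partial circulant embedding.
import numpy as np

def denoise(s, m=None, tv_factor=0.5):
    N = len(s)
    m = m or N // 2
    # Partial circulant embedding: row i is the spectrum cyclically shifted by i
    M = np.stack([np.roll(s, -i) for i in range(m)])
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Total variation of each left singular vector; noise components oscillate
    tv = np.abs(np.diff(U, axis=0)).sum(axis=0)
    keep = tv < tv_factor * tv.mean()
    M_hat = (U[:, keep] * S[keep]) @ Vt[keep]
    # Invert the embedding: average all matrix entries that map back to s[k]
    out, cnt = np.zeros(N), np.zeros(N)
    for i in range(m):
        idx = (np.arange(N) + i) % N
        out[idx] += M_hat[i]
        cnt[idx] += 1
    return out / cnt

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 400)
clean = np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)        # one spectral peak
noisy = clean + 0.1 * rng.normal(size=x.size)
print(np.std(denoise(noisy) - clean) < np.std(noisy - clean))  # expect True
```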

Data Analysis, Statistics and Probability

Density estimation on small datasets

How might a smooth probability distribution be estimated, with accurately quantified uncertainty, from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a non-perturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.
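
The posterior described above is exact and nonperturbative, which the following is not; as a crude illustration of the general flavour only, here is a simple MAP estimate of a log-density field φ on a grid under a smoothness prior on its second derivative. The grid, sample, and penalty strength lam are arbitrary choices for this sketch.

```python
# Crude MAP sketch: smoothed log-density field on a grid (illustrative only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.5, size=60)                 # limited sample
G = 100
grid = np.linspace(-1.0, 3.0, G)
dx = grid[1] - grid[0]
counts, _ = np.histogram(data, bins=G, range=(grid[0] - dx/2, grid[-1] + dx/2))
N, lam = len(data), 5.0
D2 = np.diff(np.eye(G), n=2, axis=0)                 # second-difference operator

def neg_log_posterior(phi):
    Z = np.sum(np.exp(phi)) * dx                     # normalization of e^phi
    return -counts @ phi + N * np.log(Z) + lam * np.sum((D2 @ phi) ** 2)

def gradient(phi):
    w = np.exp(phi) * dx
    return -counts + N * w / w.sum() + 2.0 * lam * D2.T @ (D2 @ phi)

phi = minimize(neg_log_posterior, np.zeros(G), jac=gradient, method="L-BFGS-B").x
density = np.exp(phi) / (np.exp(phi).sum() * dx)     # smooth normalized estimate
print(f"integral = {density.sum() * dx:.3f}")        # ~1 by construction
```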

Data Analysis, Statistics and Probability

Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid. Here we give solid quantitative evidence for the validity of such a scaling law, using both careful statistical tests and analytical arguments based on the generalized central-limit theorem applied to the moments of the distribution (obtaining a novel derivation of Heaps' law as a by-product). We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [Yan and Minnhagen, Physica A 444, 828 (2016)] does not stand up to rigorous statistical analysis. Instead, we show that the distributions are perfectly described by power-law tails with stable exponents whose values are close to 2, in agreement with the classical Zipf's law. Some misconceptions about scaling are also clarified.
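
As a sketch of the kind of stability check involved, the continuous maximum-likelihood (Hill-type) estimator of a power-law tail exponent, α̂ = 1 + n / Σ ln(x_i / x_min), can be applied to nested samples of growing length; the cutoff and the synthetic Zipf-like data below are illustrative.

```python
# Tail-exponent stability under growing sample size (synthetic data).
import numpy as np

def tail_exponent(freqs, x_min):
    """MLE for a continuous power-law tail: 1 + n / sum(log(x / x_min))."""
    tail = freqs[freqs >= x_min]
    return 1.0 + len(tail) / np.sum(np.log(tail / x_min))

# Pareto sample with true exponent 2 (inverse-CDF sampling)
rng = np.random.default_rng(0)
freqs = (1.0 - rng.uniform(size=50000)) ** (-1.0 / (2.0 - 1.0))
# Estimates on nested samples of growing length should stay near 2
for n in (5000, 20000, 50000):
    print(n, round(tail_exponent(freqs[:n], x_min=5.0), 3))
```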

Data Analysis, Statistics and Probability

Designing compact training sets for data-driven molecular property prediction

In this paper, we consider the problem of designing a training set, using the most informative molecules from a specified library, for building data-driven molecular property models. Using (i) sparse generalized group additivity and (ii) kernel ridge regression as two representative classes of models, we propose a method that combines rigorous model-based design of experiments with cheminformatics-based diversity-maximizing subset selection within an epsilon-greedy framework to systematically minimize the amount of data needed to train these models. We demonstrate the effectiveness of the algorithm on subsets of various databases, including QM7, NIST, and a catalysis dataset. For sparse group-additive models, a balance between exploration (diversity-maximizing selection) and exploitation (D-optimality selection) leads to learning with a fraction (sometimes as little as 15%) of the data while achieving accuracy similar to five-fold cross-validation on the entire set. Kernel ridge regression, on the other hand, prefers diversity-maximizing selections.
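
A minimal sketch of such an epsilon-greedy loop, with greedy D-optimality as the exploitation step and max-min-distance diversity as the exploration step; the random descriptor matrix, the ridge stabilizer, and all parameter values are invented for this sketch.

```python
# Epsilon-greedy training-set design: D-optimality vs. diversity.
import numpy as np

def select(X, k, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    while len(chosen) < k:
        rest = [i for i in range(len(X)) if i not in chosen]
        if rng.uniform() < epsilon:
            # Exploration: candidate farthest from the current set (max-min)
            d = np.min(np.linalg.norm(X[rest][:, None] - X[chosen][None],
                                      axis=2), axis=1)
            pick = rest[int(np.argmax(d))]
        else:
            # Exploitation: greedy D-optimality, maximize log det(X^T X + ridge)
            ridge = 1e-6 * np.eye(X.shape[1])
            scores = [np.linalg.slogdet(np.vstack([X[chosen], X[i]]).T
                                        @ np.vstack([X[chosen], X[i]]) + ridge)[1]
                      for i in rest]
            pick = rest[int(np.argmax(scores))]
        chosen.append(pick)
    return chosen

X = np.random.default_rng(1).normal(size=(200, 6))   # candidate descriptors
print(select(X, k=12))                               # indices of training set
```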

Data Analysis, Statistics and Probability

Detecting Directed Interactions of Networks by Random Variable Resetting

We propose a novel method for detecting directed interactions in a general dynamic network from measured data. By repeatedly resetting the state variable of a target node to random values and appropriately averaging over the measurable data, the pairwise coupling function between the target and a response node can be inferred. The method is applicable to a wide class of networks with nonlinear dynamics, hidden variables, and strong noise. Numerical results fully verify the validity of the theoretical derivation.
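
A toy sketch of the resetting idea on invented sinusoidal dynamics (not the paper's general setting): reset the target node to a random value at every step, record the one-step response of another node, and bin-average so that every term not driven by the reset collapses to a constant offset.

```python
# Inferring a pairwise coupling by random state-variable resetting (toy model).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.0, 0.5, 0.3],
              [0.8, 0.0, 0.4],
              [0.2, 0.6, 0.0]])           # true coupling strengths
dt, T = 0.01, 50000
x = rng.normal(size=3)
resets, responses = [], []
for _ in range(T):
    u = rng.uniform(-np.pi, np.pi)
    x[0] = u                              # reset the target node (node 0)
    dx = dt * (-x + A @ np.sin(x)) + 0.05 * np.sqrt(dt) * rng.normal(size=3)
    x = x + dx
    resets.append(u)
    responses.append(dx[1] / dt)          # response of node 1

resets, responses = np.array(resets), np.array(responses)
bins = np.linspace(-np.pi, np.pi, 21)
idx = np.digitize(resets, bins) - 1
curve = [responses[idx == b].mean() for b in range(20)]
centers = 0.5 * (bins[:-1] + bins[1:])
# The binned response traces A[1,0] * sin(u) up to a constant offset
slope = np.polyfit(np.sin(centers), curve, 1)[0]
print(f"estimated coupling A[1,0] = {slope:.2f} (true value 0.8)")
```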

Data Analysis, Statistics and Probability

Detecting dynamic spatial correlation patterns with generalized wavelet coherence and non-stationary surrogate data

Time series measured from real-world systems are generally noisy, complex, and display statistical properties that evolve continuously over time. Here, we present a method that combines wavelet analysis and non-stationary surrogates to detect short-lived spatially coherent patterns in multivariate time series. In contrast with standard methods, the surrogate data used here are realisations of a non-stationary stochastic process, preserving both the amplitude and time-frequency distributions of the original data. We evaluate this framework on synthetic and real-world time series, and we show that it can provide useful insights into the time-resolved structure of spatially extended systems.
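
For contrast, here is a sketch of the standard stationary alternative, phase-randomized Fourier surrogates, which preserve the global amplitude spectrum but, unlike the non-stationary surrogates used in this work, scramble the time-frequency structure.

```python
# Stationary phase-randomized Fourier surrogate of a real-valued series.
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Randomize Fourier phases while keeping the amplitude spectrum."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=len(X))
    phases[0] = 0.0                       # keep the DC term real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist term real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
x = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.normal(size=t.size)
s = phase_randomized_surrogate(x, rng)
# Same amplitude spectrum, scrambled temporal structure:
print(np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(s))))
```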

Data Analysis, Statistics and Probability

Detecting new signals under background mismodelling

Searches for new astrophysical phenomena often involve several sources of non-random uncertainty which can lead to highly misleading results. Among these, model uncertainty arising from background mismodelling can dramatically compromise the sensitivity of the experiment under study. Specifically, overestimating the background distribution in the signal region increases the chance of missing new physics. Conversely, underestimating the background outside the signal region leads to an artificially enhanced sensitivity and a higher likelihood of claiming false discoveries. The aim of this work is to provide a unified statistical strategy for modelling, estimation, inference, and signal characterization under background mismodelling. The proposed method allows the (partial) scientific knowledge available on the background distribution to be incorporated, and provides a data-updated version of it in a purely nonparametric fashion, without requiring the specification of prior distributions on the parameters. Applications in the context of dark matter searches and radio surveys show how the tools presented in this article can be used to incorporate non-stochastic uncertainty due to instrumental noise and to overcome violations of classical distributional assumptions in stacking experiments.
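
A small numeric illustration (not the paper's method) of the trade-off described above, using the familiar approximate significance Z = (n − b)/√b of n observed events over an assumed background expectation b:

```python
# How a mis-specified background shifts the apparent significance.
import math

n = 130                                  # observed counts in the signal region
for b_assumed in (90.0, 100.0, 110.0):   # true expectation is 100
    z = (n - b_assumed) / math.sqrt(b_assumed)
    print(f"assumed b = {b_assumed:5.1f}  ->  Z = {z:+.2f}")
# Overestimating b hides a real excess (Z drops from 3.0 to 1.9);
# underestimating b inflates it (Z rises to 4.2), inviting false discoveries.
```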

