Featured Researches

Data Analysis Statistics And Probability

Machine Learning Peeling and Loss Modelling of Time-Domain Reflectometry

A fundamental pursuit of microwave metrology is the determination of the characteristic impedance profile of microwave systems. Among other methods, this can be practically achieved by means of time-domain reflectometry (TDR), which measures the reflections from a device due to an applied stimulus. Conventional TDR allows for the measurement of systems comprising a single impedance. However, real systems typically feature impedance variations that obscure the determination of all impedances subsequent to the first one. This problem has been studied previously and is generally known as scattering inversion or, in the context of microwave metrology, time-domain "peeling". In this article, we demonstrate the implementation of a space-time efficient peeling algorithm that corrects for the effect of prior impedance mismatch in a nonuniform lossless transmission line, regardless of the nature of the stimulus. We generalize TDR measurement analysis by introducing two tools: a stochastic machine-learning clustering tool and an arbitrary lossy transmission-line modeling tool. The former mitigates many of the imperfections typically plaguing TDR measurements (except for dispersion) and allows for efficient processing of large datasets; the latter allows for a complete transmission line characterization including both conductor and dielectric loss.
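As a toy illustration of the peeling idea (not the authors' algorithm), the sketch below models a lossless stepped transmission line to first order, neglecting multiple reflections: each interface reflection is attenuated by the two-way transmission through the interfaces before it, and peeling undoes that accumulated attenuation to recover the impedance profile step by step. All function names are hypothetical.

```python
def forward_tdr(impedances):
    """First-order TDR response of a lossless stepped line: the
    reflection seen at each interface, attenuated by the two-way
    transmission through all earlier interfaces (multiple
    reflections are neglected in this sketch)."""
    reflections, t2 = [], 1.0
    for z0, z1 in zip(impedances, impedances[1:]):
        gamma = (z1 - z0) / (z1 + z0)   # local reflection coefficient
        reflections.append(t2 * gamma)
        t2 *= 1.0 - gamma ** 2          # two-way transmission factor
    return reflections

def peel(z_ref, reflections):
    """Recover the impedance profile from measured reflections by
    undoing the accumulated transmission loss, interface by interface."""
    z, t2, profile = z_ref, 1.0, [z_ref]
    for r in reflections:
        gamma = r / t2                  # correct for earlier interfaces
        z *= (1.0 + gamma) / (1.0 - gamma)
        t2 *= 1.0 - gamma ** 2
        profile.append(z)
    return profile
```

Running `peel` on the output of `forward_tdr` reproduces the original profile exactly within this first-order model; a full peeling algorithm additionally accounts for multiple internal reflections.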

Read more
Data Analysis Statistics And Probability

Machine Learning scientific competitions and datasets

A number of scientific competitions have been organised in the last few years with the objective of discovering innovative techniques for typical High Energy Physics tasks, such as event reconstruction, classification, and new physics discovery. Four of these competitions are summarised in this chapter, and guidelines on organising such events are derived from them. In addition, a selection of competition platforms and available datasets is described.

Read more
Data Analysis Statistics And Probability

Machine and Deep Learning Applications in Particle Physics

The many ways in which machine and deep learning are transforming the analysis and simulation of data in particle physics are reviewed. The main methods based on boosted decision trees and various types of neural networks are introduced, and cutting-edge applications in the experimental and theoretical/phenomenological domains are highlighted. After describing the challenges in the application of these novel analysis techniques, the review concludes by discussing the interactions between physics and machine learning as a two-way street enriching both disciplines and helping to meet the present and future challenges of data-intensive science at the energy and intensity frontiers.

Read more
Data Analysis Statistics And Probability

Machine and deep learning techniques in heavy-ion collisions with ALICE

Over the last few years, machine learning tools have been successfully applied to a wealth of problems in high-energy physics. A typical example is the classification of physics objects. Supervised machine learning methods allow for significant improvements in classification problems by taking into account observable correlations and by learning the optimal selection from examples, e.g. from Monte Carlo simulations. Even more promising is the usage of deep learning techniques. Methods like deep convolutional networks might be able to catch features from low-level parameters that are not exploited by default cut-based methods. These ideas could be particularly beneficial for measurements in heavy-ion collisions, because of the very large multiplicities. Indeed, machine learning methods potentially perform much better than cut-based methods in systems with a large number of degrees of freedom. Moreover, many key heavy-ion observables are most interesting at low transverse momentum, where the underlying event is dominant and the signal-to-noise ratio is quite low. In this work, recent developments of machine- and deep-learning applications in heavy-ion collisions with ALICE are presented, with a focus on a deep learning-based b-jet tagging approach and the measurement of low-mass dielectrons. While the b-jet tagger is based on a mixture of shallow fully-connected and deep convolutional networks, the low-mass dielectron measurement uses gradient boosting and shallow neural networks. Both methods are very promising compared to default cut-based methods.

Read more
Data Analysis Statistics And Probability

Machine learning technique to improve anti-neutrino detection efficiency for the ISMRAN experiment

The Indian Scintillator Matrix for Reactor Anti-Neutrino detection (ISMRAN) experiment aims to detect electron anti-neutrinos (ν̄_e) emitted from a reactor via the inverse beta decay (IBD) reaction. The setup, consisting of a 1-ton segmented array of gadolinium-foil-wrapped plastic scintillator bars, is planned for remote reactor monitoring and sterile-neutrino searches. The detection of the prompt positron and the delayed neutron from IBD provides the signature of a ν̄_e event in ISMRAN. The number of segments with an energy deposit (N_bars) and the sum of these deposited energies are used as discriminants for identifying prompt positron and delayed neutron-capture events. However, a simple cut-based selection on these variables leads to a low ν̄_e detection efficiency, because the N_bars and sum-energy distributions of the prompt and delayed events overlap. Multivariate analysis (MVA) tools, employing variables suitably tuned for discrimination, can be useful in such scenarios. In this work we report the results of applying an artificial neural network, the multilayer perceptron (MLP), and in particular its Bayesian extension, MLPBNN, to simulated signal and background events in ISMRAN. We report the performance of the MLP in separating prompt positron events from delayed neutron-capture events on hydrogen and gadolinium nuclei, as well as from typical reactor γ-ray and fast-neutron backgrounds. Using the MLPBNN classifier, we achieve an enhanced efficiency of ∼91% with a background rejection of ∼73% for the prompt selection, and an efficiency of ∼89% with a background rejection of ∼71% for the delayed-capture selection.
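As a schematic stand-in for the multilayer perceptron described above (not the ISMRAN implementation, which works on the N_bars and sum-energy observables of simulated events), here is a minimal one-hidden-layer network with manual backpropagation, trained on toy two-feature data; all names and numbers are illustrative.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.5, epochs=3000, seed=0):
    """One-hidden-layer perceptron with sigmoid units, trained by
    plain gradient descent on the cross-entropy loss (a toy stand-in
    for the MLP/MLPBNN classifiers discussed in the text)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, hidden); b2 = 0.0
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W1 + b1)                 # hidden-layer activations
        p = sig(H @ W2 + b2)                 # network output in (0, 1)
        g = (p - y) / len(y)                 # d(loss)/d(output preactivation)
        gh = np.outer(g, W2) * H * (1 - H)   # back-propagated hidden gradient
        W2 -= lr * (H.T @ g); b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)
    return lambda Xn: sig(sig(np.asarray(Xn, float) @ W1 + b1) @ W2 + b2)
```

The returned closure scores new events in (0, 1); thresholding at 0.5 gives the class decision, analogous to cutting on an MVA response.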

Read more
Data Analysis Statistics And Probability

Making ordinary least squares linear classifiers more robust

In statistics and machine learning, the sum-of-squares cost, commonly referred to as ordinary least squares, is a convenient choice of cost function because of its many nice analytical properties, though it is not always the best choice. In particular, it has long been known that ordinary least squares is not robust to outliers. Several attempts to resolve this problem have led to alternative methods that either did not fully resolve the outlier problem or were computationally expensive. In this paper, we provide a very simple remedy that makes ordinary least squares less sensitive to outliers in data classification: scaling the augmented input vector by its length. We give a mathematical exposition of the outlier problem using approximations and geometrical arguments, and we present numerical results supporting the efficacy of our method.
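The stated remedy is concrete enough to sketch: augment each input as [1, x], scale that augmented vector to unit length, then solve the ordinary least-squares problem against ±1 labels. A minimal illustration of the idea (illustrative code, not the authors' implementation):

```python
import numpy as np

def fit_ols_classifier(X, y, normalize=True):
    """Least-squares linear classifier on the augmented input [1, x].
    With normalize=True, each augmented row is scaled to unit length,
    which damps the influence of far-away (outlier) points on the fit."""
    A = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])
    if normalize:
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
    w, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return w

def predict(w, X):
    """Classify by the sign of the affine score w0 + w . x."""
    A = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])
    return np.sign(A @ w)
```

Note that normalization changes only the fit, not the decision rule: prediction uses the raw augmented input, since scaling by a positive length does not change the sign of the score.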

Read more
Data Analysis Statistics And Probability

Managing Many Simultaneous Systematic Uncertainties

Recent statistical evaluations for High-Energy Physics measurements, in particular those at the Large Hadron Collider, require the careful evaluation of many sources of systematic uncertainty at the same time. While the fundamental aspects of the statistical treatment are now consolidated, in both the frequentist and Bayesian approaches, managing many sources of uncertainty and their corresponding nuisance parameters in analyses that combine multiple control regions and decay channels can in practice pose challenging implementation issues. These make the analysis infrastructure complex and hard to maintain, eventually resulting in simplifications in the treatment of systematics and in limitations on the interpretation of results. Typical cases are discussed with the most popular implementation tool, RooStats, in mind, along with possible ideas for improving the management of such cases in future software implementations.
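To make the nuisance-parameter machinery concrete, here is a minimal profile-likelihood sketch for a single counting experiment with one Gaussian-constrained background nuisance, written in plain Python rather than RooStats; all numbers and names are illustrative, and a real analysis would have many such parameters across channels.

```python
import numpy as np

def nll(s, b, n, b0, sigma_b):
    """Negative log-likelihood (constants dropped) for a Poisson
    counting experiment: signal s, background nuisance b constrained
    by an auxiliary measurement b0 +/- sigma_b."""
    mu = s + b
    return mu - n * np.log(mu) + 0.5 * ((b - b0) / sigma_b) ** 2

def profile_nll(s, n, b0, sigma_b):
    """Profile out the nuisance: minimize the NLL over b at fixed s
    (brute-force grid scan, for illustration only)."""
    b = np.linspace(max(1e-6, b0 - 5 * sigma_b), b0 + 5 * sigma_b, 2001)
    return nll(s, b, n, b0, sigma_b).min()
```

Scanning `profile_nll` over the signal strength traces out the profile-likelihood curve from which intervals are read off; with n = 15 observed and b0 = 10 expected background, its minimum sits at s ≈ 5, as expected.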

Read more
Data Analysis Statistics And Probability

Manifold Learning for Organizing Unstructured Sets of Process Observations

Data mining is routinely used to organize ensembles of short temporal observations so as to reconstruct useful, low-dimensional realizations of an underlying dynamical system. In this paper, we use manifold learning to organize unstructured ensembles of observations ("trials") of a system's response surface. We have no control over where every trial starts; and during each trial operating conditions are varied by turning "agnostic" knobs, which change system parameters in a systematic but unknown way. As one (or more) knobs "turn" we record (possibly partial) observations of the system response. We demonstrate how such partial and disorganized observation ensembles can be integrated into coherent response surfaces whose dimension and parametrization can be systematically recovered in a data-driven fashion. The approach can be justified through the Whitney and Takens embedding theorems, allowing reconstruction of manifolds/attractors through different types of observations. We demonstrate our approach by organizing unstructured observations of response surfaces, including the reconstruction of a cusp bifurcation surface for Hydrogen combustion in a Continuous Stirred Tank Reactor. Finally, we demonstrate how this observation-based reconstruction naturally leads to informative transport maps between input parameter space and output/state variable spaces.
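A minimal sketch of the kind of manifold-learning step involved: a basic diffusion-map embedding that recovers a one-dimensional parametrization of points sampled along a curve. This is a generic illustration of the technique, not the paper's specific pipeline, and the kernel scale `eps` is a tuning assumption.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=1):
    """Minimal diffusion-map embedding: Gaussian kernel, density
    normalization (alpha = 1), and the leading nontrivial eigenvectors
    of the resulting Markov matrix as intrinsic coordinates."""
    X = np.asarray(X, float)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-D2 / eps)
    q = K.sum(1)
    K = K / np.outer(q, q)               # remove sampling-density effects
    d = K.sum(1)
    M = K / np.sqrt(np.outer(d, d))      # symmetric conjugate of Markov matrix
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]       # sort eigenpairs by decreasing eigenvalue
    phi = vecs[:, order] / np.sqrt(d)[:, None]
    return phi[:, 1:1 + n_coords]        # drop the trivial constant mode
```

On an unordered set of points along a one-dimensional curve, the first nontrivial coordinate varies monotonically with arclength, which is what lets disorganized trials be stitched into a coherent parametrization.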

Read more
Data Analysis Statistics And Probability

MatDRAM: A pure-MATLAB Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo Sampler

Markov Chain Monte Carlo (MCMC) algorithms are widely used for stochastic optimization, sampling, and integration of mathematical objective functions, in particular in the context of Bayesian inverse problems and parameter estimation. For decades, the algorithm of choice in MCMC simulations has been the Metropolis-Hastings (MH) algorithm. An advancement over the traditional MH-MCMC sampler is the Delayed-Rejection Adaptive Metropolis (DRAM) algorithm. In this paper, we present MatDRAM, a stochastic optimization, sampling, and Monte Carlo integration toolbox in MATLAB that implements a variant of the DRAM algorithm for exploring mathematical objective functions of arbitrary dimension, in particular the posterior distributions of Bayesian models in data science, machine learning, and scientific inference. The design goals of MatDRAM include nearly-full automation of MCMC simulations, user-friendliness, fully deterministic reproducibility, and restart functionality for simulations. We also discuss the implementation details of a technique to automatically monitor and ensure the diminishing adaptation of the proposal distribution of the DRAM algorithm, as well as a method of efficiently storing the resulting simulated Markov chains. The MatDRAM library is open-source, MIT-licensed, and permanently located and maintained as part of the ParaMonte library at this https URL.
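MatDRAM itself is pure MATLAB; purely to illustrate the delayed-rejection idea (without the adaptive-proposal component), here is a minimal Python sketch with one narrower second-stage proposal and the standard two-stage acceptance ratio for Gaussian proposals. Everything here is a simplified assumption, not MatDRAM's implementation.

```python
import numpy as np

def dram_lite(logpdf, x0, n_steps, scale=1.0, seed=0):
    """Minimal 1-D Metropolis-Hastings sampler with one delayed-rejection
    stage: after a rejection, retry with a narrower proposal, accepting
    via the two-stage DR ratio.  (DRAM additionally adapts the proposal
    covariance on the fly; that part is omitted here.)"""
    rng = np.random.default_rng(seed)
    x, lp = float(x0), logpdf(x0)
    chain = []
    for _ in range(n_steps):
        y1 = x + scale * rng.normal()            # stage-1 proposal
        lp1 = logpdf(y1)
        if np.log(rng.uniform()) < lp1 - lp:
            x, lp = y1, lp1
        else:                                    # delayed rejection
            y2 = x + 0.5 * scale * rng.normal()  # narrower stage-2 proposal
            lp2 = logpdf(y2)
            # Two-stage acceptance ratio: the exponential corrections are
            # the Gaussian densities of proposing y1 from y2 and from x.
            num = np.exp(lp2 - 0.5 * ((y1 - y2) / scale) ** 2) \
                * (1.0 - min(1.0, np.exp(lp1 - lp2)))
            den = np.exp(lp - 0.5 * ((y1 - x) / scale) ** 2) \
                * (1.0 - min(1.0, np.exp(lp1 - lp)))
            if den > 0.0 and rng.uniform() * den < num:
                x, lp = y2, lp2
        chain.append(x)
    return np.array(chain)
```

Run on a standard-normal log-density, the chain's sample mean and standard deviation converge to 0 and 1, giving a quick sanity check of the sampler.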

Read more
Data Analysis Statistics And Probability

Maximum Entropy competes with Maximum Likelihood

The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach to its limits of validity is currently missing. Here we study MAXENT in a Bayesian decision-theory setup, i.e. assuming that there exists a well-defined prior Dirichlet density for the unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. This setup allows us to evaluate the relevance of various MAXENT constraints, check the method's general applicability, and compare MAXENT with estimators having various degrees of dependence on the prior, viz. the regularized maximum likelihood (ML) and the Bayesian estimators. We show that MAXENT applies in sparse-data regimes but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.
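As a small generic illustration of a MAXENT estimate (not the paper's Bayesian setup): the maximum-entropy distribution on a finite alphabet subject to a mean constraint is exponential in the value, p_i ∝ exp(λ v_i), with the Lagrange multiplier λ fixed by the constraint. A bisection on λ suffices, since the constrained mean is monotone in λ:

```python
import numpy as np

def maxent_given_mean(values, target_mean, tol=1e-10):
    """Maximum-entropy distribution on a finite set of values subject
    to a mean constraint: p_i proportional to exp(lam * v_i), with lam
    found by bisection on the (monotone) tilted mean."""
    v = np.asarray(values, float)

    def mean_at(lam):
        w = np.exp(lam * (v - v.max()))   # shift exponent for stability
        p = w / w.sum()
        return p @ v, p

    lo, hi = -50.0, 50.0                  # bracket for the multiplier
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        m, _ = mean_at(mid)
        if m < target_mean:
            lo = mid
        else:
            hi = mid
    return mean_at(0.5 * (lo + hi))[1]
```

For a symmetric target mean the multiplier vanishes and MAXENT returns the uniform distribution; skewing the constraint tilts the distribution exponentially toward the matching side.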

Read more
