Featured Research

Data Analysis Statistics And Probability

A new quantity for statistical analysis: "Scaling invariable Benford distance"

For the first time, we introduce the "Scaling invariable Benford distance" and the "Benford cyclic graph", which can be used to analyze any data set. Using this quantity and this graph, we analyze data sets with common distributions, such as the normal and exponential distributions, and find that different data sets have markedly different values of the "Scaling invariable Benford distance" and different features in their "Benford cyclic graphs". We also explore the influence of data size on the "Scaling invariable Benford distance", and find that it first decreases as the data size increases, then approaches a fixed value once the size is large enough.
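
The abstract does not spell out how the "Scaling invariable Benford distance" is defined, so the sketch below only illustrates the underlying ingredient: comparing a data set's leading-digit frequencies with the Benford distribution, here via a simple Euclidean distance (an assumed stand-in, not the paper's definition).

```python
# Minimal sketch: distance between a data set's leading-digit frequencies and
# the Benford distribution. The Euclidean form is an illustrative assumption.
import numpy as np

def leading_digit(x):
    """First significant digit (1-9) of each nonzero value."""
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]
    exponent = np.floor(np.log10(x))
    return (x / 10.0 ** exponent).astype(int)

def benford_distance(x):
    """Euclidean distance between observed digit frequencies and Benford's law."""
    digits = leading_digit(x)
    freq = np.bincount(digits, minlength=10)[1:10] / digits.size
    benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
    return np.sqrt(np.sum((freq - benford) ** 2))

rng = np.random.default_rng(0)
data = rng.exponential(scale=3.0, size=10_000)
print(benford_distance(data))        # distance for the raw data
print(benford_distance(data * 7.3))  # distance after rescaling by an arbitrary factor
```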

Read more
Data Analysis Statistics And Probability

A note on causation versus correlation

Recently, it has been shown that the causality and information flow between two time series can be inferred in a rigorous and quantitative sense and, moreover, that the resulting causality can be normalized. A corollary that follows is that, in the linear limit, causation implies correlation, while correlation does not imply causation. Now suppose there is an event A taking a harmonic form (sine/cosine), and it generates through some process another event B so that B always lags A by a phase of π/2. Here the causality is obvious, yet by computation the correlation is zero. This apparent contradiction is rooted in the fact that a harmonic system always leaves a single point on the Poincaré section; it does not add information. That is to say, though the absolute information flow from A to B is zero, i.e., T_{A→B} = 0, the total information increase of B is also zero, so the normalized T_{A→B}, denoted as τ_{A→B}, takes the indeterminate form 0/0. By slightly perturbing the system with some noise, solving a stochastic differential equation, and letting the perturbation go to zero, it can be shown that τ_{A→B} approaches 100%, just as one would expect.
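
A quick numerical check of the phase-lag example described above (a minimal sketch, not taken from the paper): B is completely determined by A, yet their sample correlation over whole periods vanishes.

```python
# B lags A by pi/2; causation is perfect, yet the linear correlation is ~0.
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi * 100, 100_000, endpoint=False)  # 100 full periods
A = np.sin(t)
B = np.sin(t - np.pi / 2.0)   # B lags A by pi/2, i.e. B = -cos(t)

print(np.corrcoef(A, B)[0, 1])   # ~0: no correlation despite deterministic causation
```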

Read more
Data Analysis Statistics And Probability

A novel approach to the localization and the estimate of radioactivity in contaminated waste packages via imaging techniques

Dismantling nuclear power plants entails the production of a large amount of contaminated (or potentially contaminated) material whose disposal is of crucial importance. Most of the end products have to be stored in special repositories, but some of them may be only slightly contaminated, or not contaminated at all, making it possible to free release them. One possible approach to free-release measurements uses Large Clearance Monitors, chambers surrounded by plastic scintillation detectors that allow a decision about the clearance of waste packages up to 1000 kg. Because of the composite nature of the detectors in a Large Clearance Monitor, 3D imaging algorithms can naturally be applied to localize radioactive sources inside a waste package. In this work we show how a special algorithm that maximizes the conditional information entropy allows decisions about the clearance of portions of the sample.

Read more
Data Analysis Statistics And Probability

A practical way to regularize unfolding of sharply varying spectra with low data statistics

Unfolding is a well-established tool in particle physics. However, a naive application of the standard regularization techniques to unfold the momentum spectrum of protons ejected in the process of negative muon nuclear capture led to a result exhibiting unphysical artifacts. A finite data sample limited the range in which unfolding could be performed, thus introducing a cutoff. A sharply falling "true" distribution led to low data statistics near the cutoff, which exacerbated the regularization bias and produced an unphysical spike in the resulting spectrum. An improved approach has been developed to address these issues and is illustrated using a toy model. The approach uses the full Poisson likelihood of the data and produces a continuous, physically plausible, unfolded distribution. The new technique has broad applicability, since spectra with similar features, such as sharply falling spectra, are common.
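
The paper's toy model is not given in the abstract; the sketch below only illustrates the general ingredient of unfolding with a full Poisson likelihood plus a smoothness penalty, using an assumed Gaussian response matrix, binning, and regularization strength.

```python
# Toy unfolding sketch: maximize a Poisson likelihood with a curvature penalty.
# Response matrix, spectrum shape and tau are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
nbins = 20
edges = np.linspace(0.0, 1.0, nbins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

true = 5000.0 * np.exp(-8.0 * centers)                        # sharply falling "true" spectrum
R = norm.pdf(centers[:, None] - centers[None, :], scale=0.08)  # Gaussian smearing
R /= R.sum(axis=0, keepdims=True)                              # column j: true bin j -> reco bins
observed = rng.poisson(R @ true)                               # toy data

def neg_log_posterior(mu, tau=1.0):
    lam = R @ mu
    nll = np.sum(lam - observed * np.log(lam + 1e-12))          # Poisson -log likelihood
    curvature = np.sum(np.diff(np.log(mu + 1e-12), n=2) ** 2)   # smoothness (regularization) term
    return nll + tau * curvature

res = minimize(neg_log_posterior, x0=observed.astype(float) + 1.0,
               bounds=[(1e-6, None)] * nbins, method="L-BFGS-B")
unfolded = res.x   # continuous, non-negative unfolded spectrum
```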

Read more
Data Analysis Statistics And Probability

A proposed solution for analysis management in high energy physics

This paper presents an architecture for analysis management in high energy physics experiments. Some new concepts for data analysis are introduced, and a protocol for organizing and operating an analysis is proposed. A toolkit following this architecture has been developed, providing a solution for analysis management with both flexibility and reproducibility. Foreseen developments of this toolkit are also discussed.

Read more
Data Analysis Statistics And Probability

A robust principal component analysis for outlier identification in messy microcalorimeter data

A principal component analysis (PCA) of clean microcalorimeter pulse records can be a first step beyond statistically optimal linear filtering of pulses towards a fully non-linear analysis. For PCA to be practical on spectrometers with hundreds of sensors, an automated identification of clean pulses is required. Robust forms of PCA are the subject of active research in machine learning. We examine a version known as coherence pursuit that is simple, fast, and well matched to the automatic identification of outlier records, as needed for microcalorimeter pulse analysis.
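
Coherence pursuit is simple enough to sketch: records lying in the common low-dimensional pulse subspace are mutually coherent, while outlier records are not. The toy pulse shape, data sizes, and cut below are illustrative assumptions, not the microcalorimeter pipeline itself.

```python
# Sketch of coherence-pursuit-style outlier identification for pulse records.
import numpy as np

def coherence_scores(X):
    """X: (n_records, n_samples). Low scores indicate likely outlier records."""
    V = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm records
    G = V @ V.T                                        # pairwise coherences
    np.fill_diagonal(G, 0.0)
    return np.abs(G).sum(axis=1)                       # l1 coherence per record

rng = np.random.default_rng(2)
template = np.exp(-np.linspace(0.0, 5.0, 200))                    # idealized clean pulse shape
clean = np.outer(1.0 + 0.1 * rng.standard_normal(95), template)   # 95 clean records
messy = rng.standard_normal((5, 200))                             # 5 corrupted records
X = np.vstack([clean, messy]) + 0.01 * rng.standard_normal((100, 200))

scores = coherence_scores(X)
outliers = np.argsort(scores)[:5]       # the five least coherent records
print(sorted(outliers))                 # expected to be the messy rows (indices 95-99)
```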

Read more
Data Analysis Statistics And Probability

A shadowing-based inflation scheme for ensemble data assimilation

Artificial ensemble inflation is a common technique in ensemble data assimilation, whereby the ensemble covariance is periodically increased in order to prevent the ensemble from drifting away from the observations and possibly collapsing. This manuscript introduces a new form of covariance inflation for ensemble data assimilation based upon shadowing ideas from dynamical systems theory. We present results from a low-order nonlinear chaotic system that support using shadowing inflation, demonstrating that shadowing inflation is more robust to parameter tuning than standard multiplicative covariance inflation, outperforming it in observation-sparse scenarios and often leading to longer forecast shadowing times.
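
The shadowing-based scheme itself is not spelled out in the abstract; for reference, the standard multiplicative covariance inflation it is compared against looks like the sketch below (ensemble size, state dimension, and inflation factor are illustrative).

```python
# Standard multiplicative covariance inflation: spread members away from the mean.
import numpy as np

def multiplicative_inflation(ensemble, factor=1.05):
    """ensemble: (n_members, n_state). Returns the inflated ensemble."""
    mean = ensemble.mean(axis=0, keepdims=True)
    return mean + factor * (ensemble - mean)

rng = np.random.default_rng(3)
ens = rng.standard_normal((20, 3))
inflated = multiplicative_inflation(ens, factor=1.1)
print(ens.std(axis=0), inflated.std(axis=0))   # ensemble spread grows by the factor
```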

Read more
Data Analysis Statistics And Probability

A simple decomposition of European temperature variability capturing the variance from days to a decade

We analyze European temperature variability from station data with the method of detrended fluctuation analysis. This method is known to give a scaling exponent indicating long-range correlations in time for temperature anomalies. However, by taking a more careful look at the fluctuation function, we are able to explain the emergent scaling behaviour by short-time relaxation, the yearly cycle, and one additional process. It turns out that for many stations this interannual variability is an oscillatory mode with a period of approximately 7-8 years, which is consistent with results of other methods. We discuss the spatial patterns in all parameters and validate the finding of the 7-8 year period by comparing stations with and without this mode.
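
Detrended fluctuation analysis itself is standard; below is a minimal sketch of the fluctuation function F(n), whose log-log slope gives the scaling exponent discussed above. Window sizes and detrending order are illustrative choices, not the paper's exact settings.

```python
# First-order DFA: integrate the anomalies, detrend in windows, compute F(n).
import numpy as np

def dfa(anomalies, window_sizes, order=1):
    """Return the fluctuation function F(n) for each window size n."""
    profile = np.cumsum(anomalies - np.mean(anomalies))
    F = []
    for n in window_sizes:
        n_windows = len(profile) // n
        segments = profile[:n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        fluct = [np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
                 for seg in segments]
        F.append(np.sqrt(np.mean(fluct)))
    return np.array(F)

rng = np.random.default_rng(4)
x = rng.standard_normal(10_000)                       # white noise: exponent ~ 0.5
sizes = np.unique(np.logspace(1, 3, 20).astype(int))
F = dfa(x, sizes)
alpha = np.polyfit(np.log(sizes), np.log(F), 1)[0]    # scaling exponent from log-log slope
print(alpha)                                          # close to 0.5 for white noise
```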

Read more
Data Analysis Statistics And Probability

A surrogate-based optimal likelihood function for the Bayesian calibration of catalytic recombination in atmospheric entry protection materials

This work deals with the inference of catalytic recombination parameters from plasma wind tunnel experiments for reusable thermal protection materials. One of the critical factors affecting the performance of such materials is the contribution to the heat flux of the exothermic recombination reactions at the vehicle surface. The main objective of this work is to develop a dedicated Bayesian framework that allows us to compare uncertain measurements with model predictions that depend on the catalytic parameter values. Our framework accounts for uncertainties involved in the model definition and incorporates all measured variables with their respective uncertainties. The physical model used for the estimation consists of a 1D boundary layer solver along the stagnation line. The chemical production term included in the surface mass balance depends on the catalytic recombination efficiency. As not all the quantities needed to simulate a reacting boundary layer can be measured or known (such as the flow enthalpy at the inlet boundary), we propose an optimization procedure built on the construction of the likelihood function to determine their most likely values based on the available experimental data. This procedure avoids the need to introduce a priori estimates of the nuisance quantities, namely the boundary layer edge enthalpy, the wall temperatures, and the static and dynamic pressures, which would entail the use of very wide priors. We substitute the optimal likelihood of the experimental data with a surrogate model to make the inference procedure both faster and more robust. We show that the resulting Bayesian formulation yields meaningful and accurate posterior distributions of the catalytic parameters, with a reduction of more than 20% in the standard deviation with respect to previous works. We also study the implications of an extension of the experimental procedure.
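
As a cartoon of the "optimal likelihood" construction, the sketch below profiles a single nuisance quantity (a stand-in for the boundary-layer edge enthalpy) out of a Gaussian likelihood at each value of the catalytic efficiency. The forward model, numbers, and units are placeholders, not the paper's 1D boundary-layer solver or experimental setup.

```python
# Hedged sketch: set nuisance quantities to their most likely values by
# optimization at fixed catalytic efficiency, instead of assigning wide priors.
import numpy as np
from scipy.optimize import minimize_scalar

def heat_flux_model(gamma, enthalpy):
    """Placeholder forward model for wall heat flux (made-up functional form)."""
    return enthalpy * (0.1 + 0.9 * gamma / (gamma + 0.05))

# Illustrative "measurements" with uncertainties (arbitrary units).
q_meas, sigma_q = 2.0e6, 2.0e5     # wall heat flux
h_meas, sigma_h = 1.0e7, 1.0e6     # edge enthalpy (treated as a nuisance quantity)

def optimal_neg_log_like(gamma):
    """Profile the nuisance enthalpy out of the likelihood at fixed gamma."""
    def nll(h):
        return (0.5 * ((heat_flux_model(gamma, h) - q_meas) / sigma_q) ** 2
                + 0.5 * ((h - h_meas) / sigma_h) ** 2)
    return minimize_scalar(nll, bounds=(1e6, 1e8), method="bounded").fun

gammas = np.logspace(-3, 0, 60)
profile = np.array([optimal_neg_log_like(g) for g in gammas])
print(gammas[np.argmin(profile)])   # most likely catalytic efficiency in this toy setup
```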

Read more
Data Analysis Statistics And Probability

A tail-regression estimator for heavy-tailed distributions of known tail indices and its application to continuum quantum Monte Carlo data

Standard statistical analysis is unable to provide reliable confidence intervals on expectation values of probability distributions that do not satisfy the conditions of the central limit theorem. We present a regression-based estimator of an arbitrary moment of a probability distribution with power-law heavy tails that exploits knowledge of the exponents of its asymptotic decay to bypass this issue entirely. Our method is applied to synthetic data and to energy and atomic force data from variational and diffusion quantum Monte Carlo calculations, whose distributions have known asymptotic forms [J. R. Trail, Phys. Rev. E 77, 016703 (2008); A. Badinski et al., J. Phys.: Condens. Matter 22, 074202 (2010)]. We obtain convergent, accurate confidence intervals on the variance of the local energy of an electron gas and on the Hellmann-Feynman force on an atom in the all-electron carbon dimer. In each of these cases the uncertainty on our estimator is 45% and 60 times smaller, respectively, than the nominal (ill-defined) standard error.
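
A heavily simplified stand-in for the idea (not the paper's regression estimator): when the tail exponent is known, only the tail amplitude needs to be estimated from the data, and the tail's contribution to the moment can then be added analytically. The single-threshold scheme and the Pareto test case below are assumptions for illustration.

```python
# Mean estimate for heavy-tailed samples with a known tail index.
import numpy as np

def tail_aware_mean(x, mu, x_m):
    """Mean of samples x whose density behaves as c * x**-(mu+1) beyond x_m."""
    x = np.asarray(x, dtype=float)
    bulk = np.sum(x[x <= x_m]) / x.size        # sample estimate of the bulk contribution
    tail_frac = np.mean(x > x_m)               # P(X > x_m) fixes the tail amplitude c
    c = mu * tail_frac * x_m ** mu
    tail = c / (mu - 1.0) * x_m ** (1.0 - mu)  # analytic integral of x * c x**-(mu+1) beyond x_m
    return bulk + tail

rng = np.random.default_rng(5)
mu = 1.5                                       # known tail index; the variance is infinite
samples = rng.pareto(mu, size=200_000) + 1.0   # classical Pareto(mu), true mean = mu/(mu-1) = 3
print(tail_aware_mean(samples, mu, x_m=10.0))  # close to 3
print(samples.mean())                          # naive mean, whose error bar is ill-defined here
```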

Read more
