Featured Research

Data Analysis Statistics And Probability

Biased bootstrap sampling for efficient two-sample testing

The so-called 'energy test' is a frequentist technique used in experimental particle physics to decide whether two samples are drawn from the same distribution. Its usage requires a good understanding of the distribution of the test statistic, T, under the null hypothesis. We propose a technique which allows the extreme tails of the T-distribution to be determined more efficiently than is possible with present methods. This allows quick evaluation of (for example) 5-sigma confidence intervals that would otherwise require prohibitively costly computation or additional approximations. Furthermore, we comment on other ways that T computations could be sped up using established results from the statistics community. Beyond two-sample testing, the proposed biased bootstrap method may provide a benefit anywhere extreme values are currently obtained with bootstrap sampling.
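
As a rough illustration of the quantities involved (not code from the paper), here is a minimal Python sketch of a two-sample energy-test statistic T with a Gaussian weighting kernel, plus uniform permutation resampling of its null distribution, which is the baseline that biased bootstrap sampling is designed to speed up. The kernel choice and the sigma parameter are assumptions made for this sketch.

import numpy as np

def energy_test_statistic(x, y, sigma=1.0):
    # x: (n, d) array, y: (m, d) array; psi(d) = exp(-d^2 / 2 sigma^2)
    # is a common kernel choice -- sigma here is an illustrative value.
    def ksum(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.sum(np.exp(-d2 / (2.0 * sigma ** 2)))
    n, m = len(x), len(y)
    t_xx = (ksum(x, x) - n) / (2.0 * n * n)  # drop self-pairs, psi(0) = 1
    t_yy = (ksum(y, y) - m) / (2.0 * m * m)
    return t_xx + t_yy - ksum(x, y) / (n * m)

def null_T_samples(x, y, n_draws=1000, seed=0):
    # Uniform permutation resampling of the null T-distribution; the
    # biased bootstrap replaces this uniform sampling so as to reach the
    # extreme tail with far fewer draws.
    rng = np.random.default_rng(seed)
    pooled, n = np.vstack([x, y]), len(x)
    draws = np.empty(n_draws)
    for k in range(n_draws):
        rng.shuffle(pooled)
        draws[k] = energy_test_statistic(pooled[:n], pooled[n:])
    return draws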

Read more
Data Analysis Statistics And Probability

BlurRing

A code package, BlurRing, is developed to allow multi-dimensional likelihood visualisation. Additional information about the likelihood can be extracted from the BlurRing visualisation: the spread in any direction of the overlaid likelihood curves gives information about the uncertainty on the confidence intervals presented in the two-dimensional likelihood plots.
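
As an illustration of the general idea only (this is not the BlurRing API): overlaying the 68% contour of a two-dimensional Gaussian likelihood for many draws of a hypothetical nuisance parameter, so that the spread of the overlaid curves visualises the extra uncertainty on the confidence region.

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0.0, 2.0 * np.pi, 200)
rng = np.random.default_rng(3)
for nuisance in rng.normal(1.0, 0.1, size=20):
    # ~sqrt(2.30) is the 68% radius for a 2D Gaussian likelihood; the
    # nuisance draw rescales the contour, blurring the overlaid curves.
    r68 = 1.51 * nuisance
    plt.plot(r68 * np.cos(theta), 0.5 * r68 * np.sin(theta),
             color="k", alpha=0.15)
plt.xlabel("parameter 1")
plt.ylabel("parameter 2")
plt.show()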

Read more
Data Analysis Statistics And Probability

Border effect corrections for diagonal line based recurrence quantification analysis measures

Recurrence Quantification Analysis (RQA) defines a number of quantifiers which are based on diagonal line structures in the recurrence plot (RP). Due to the finite size of an RP, these lines can be cut by the borders of the RP, which biases the length distribution of diagonal lines and, consequently, the line-based RQA measures. In this letter we investigate the impact of these border effects, and of the thickening of diagonal lines in an RP caused by tangential motion, on the estimation of the diagonal line length distribution, quantified by its entropy. Although a relation to the Lyapunov spectrum is theoretically expected, this entropy yields contradictory results in many studies. Here we summarize correction schemes for both the border effects and the tangential motion, and systematically compare them to methods from the literature. We show that these corrections lead to the expected behavior of the diagonal line length entropy, namely zero values for regular motion and positive values for chaotic motion. Moreover, we test these methods under noisy conditions, in order to supply practical tools for applied statistical research.
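
A minimal sketch of the uncorrected quantities that the correction schemes target: a recurrence plot, its diagonal line length distribution (including lines cut by the border), and the entropy of that distribution. The threshold eps and the minimum line length are illustrative assumptions.

import numpy as np

def recurrence_plot(x, eps):
    # Binary recurrence matrix: R[i, j] = 1 if |x_i - x_j| < eps.
    return (np.abs(x[:, None] - x[None, :]) < eps).astype(int)

def diagonal_line_lengths(R):
    # Lengths of diagonal lines of 1s above the main diagonal. Lines
    # that touch the RP border are counted as-is here; this truncation
    # is precisely the bias the correction schemes address.
    lengths = []
    for k in range(1, len(R)):
        run = 0
        for v in np.diagonal(R, offset=k):
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)  # line cut by the border
    return np.array(lengths)

def line_entropy(lengths, lmin=2):
    # Shannon entropy of the diagonal line length distribution.
    lengths = lengths[lengths >= lmin]
    _, counts = np.unique(lengths, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))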

Read more
Data Analysis Statistics And Probability

Boundary conditions for similarity index

Recent developments show that the Bray-Curtis formula for the similarity index (1957) has been applied in various fields such as ecology and astrophysics. In this paper, we find the boundary conditions for this formula, i.e. the numerical range in which the formula becomes ineffective and fails to give the expected result. We simulate real-world data in the form of normally distributed random numbers, which directly shows the range (or conditions) in which the formula gives an unambiguous similarity result.
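
A minimal sketch of the computation described, assuming the common form of the Bray-Curtis index (similarity = 1 - sum|x_i - y_i| / sum(x_i + y_i)); the simulated inputs are hypothetical stand-ins for the paper's data.

import numpy as np

def bray_curtis_similarity(x, y):
    # Similarity = 1 - sum|x_i - y_i| / sum(x_i + y_i). The index was
    # defined for non-negative abundance data; with signed inputs the
    # denominator can approach zero and the result becomes unreliable.
    return 1.0 - np.sum(np.abs(x - y)) / np.sum(x + y)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=1000)  # hypothetical simulated data
y = rng.normal(loc=5.0, scale=1.0, size=1000)
print(bray_curtis_similarity(x, y))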

Read more
Data Analysis Statistics And Probability

Breaking Symmetries of the Reservoir Equations in Echo State Networks

Reservoir computing has repeatedly been shown to be extremely successful in the prediction of nonlinear time series. However, there is as yet no complete understanding of the proper design of a reservoir. We find that the simplest popular setup has a harmful symmetry, which leads to the prediction of what we call the mirror-attractor. We prove this analytically. Similar problems can arise in a general context, and we use them to explain the success or failure of some designs. The symmetry is a direct consequence of the hyperbolic tangent activation function. Further, four ways to break the symmetry are compared numerically: a bias in the output, a shift in the input, a quadratic term in the readout, and a mixture of even and odd activation functions. First, we test their susceptibility to the mirror-attractor. Second, we evaluate their performance on the task of predicting Lorenz data with the mean shifted to zero. The short-time prediction is measured with the forecast horizon, while the largest Lyapunov exponent and the correlation dimension are used to represent the climate. Finally, the same analysis is repeated on a combined dataset of the Lorenz attractor and the Halvorsen attractor, which we designed to reveal potential problems with symmetry. We find that all methods except the output bias are able to fully break the symmetry, with the input shift and quadratic readout performing best overall.
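
A minimal sketch of the symmetry in question and of one of the four remedies (the quadratic readout); the reservoir here is a toy stand-in, not the paper's setup.

import numpy as np

rng = np.random.default_rng(1)
N, D = 100, 3                            # toy reservoir size and input dimension
W = rng.normal(size=(N, N)) / np.sqrt(N)
W_in = rng.normal(size=(N, D))

def step(r, u):
    # tanh is odd, so step(-r, -u) == -step(r, u): states and inputs can
    # be negated simultaneously without changing the dynamics, which is
    # the symmetry behind the mirror-attractor.
    return np.tanh(W @ r + W_in @ u)

r, u = rng.normal(size=N), rng.normal(size=D)
assert np.allclose(step(-r, -u), -step(r, u))

def features(r):
    # Quadratic readout: regress targets on [r, r**2] instead of r alone.
    # The even component r**2 is invariant under r -> -r, so the feature
    # vector no longer transforms consistently and the symmetry is broken.
    return np.concatenate([r, r * r])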

Read more
Data Analysis Statistics And Probability

Bryan's Maximum Entropy Method -- diagnosis of a flawed argument and its remedy

The Maximum Entropy Method (MEM) is a popular data analysis technique based on Bayesian inference, which has found various applications in the research literature. While the MEM itself is well-grounded in statistics, I argue that its state-of-the-art implementation, suggested originally by Bryan, artificially restricts its solution space. This restriction leads to a systematic error often unaccounted for in contemporary MEM studies. The goal of this paper is to carefully revisit Bryan's train of thought, point out its flaw in applying linear algebra arguments to an inherently nonlinear problem, and suggest possible ways to overcome it.
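
For orientation, a sketch of the restriction being criticised, assuming a hypothetical discretized kernel: Bryan's parametrization confines the log-deviation of the spectrum from the default model to the singular (row) space of the kernel.

import numpy as np

n_tau, n_omega = 16, 200                           # hypothetical problem sizes
tau = np.linspace(0.0, 1.0, n_tau)
omega = np.linspace(0.0, 10.0, n_omega)
K = np.exp(-np.outer(tau, omega))                  # hypothetical kernel
m = np.full(n_omega, 1.0 / n_omega)                # default model

U, s, Vt = np.linalg.svd(K, full_matrices=False)   # Vt has only n_tau rows

def bryan_spectrum(b):
    # Bryan's parametrization: the log-deviation from the default model
    # is confined to the row space of K (at most n_tau = 16 directions,
    # while the spectrum lives in an n_omega = 200 dimensional space).
    # The paper argues that this restriction of the nonlinear extremum
    # problem artificially cuts off admissible solutions.
    return m * np.exp(Vt.T @ b)                    # b has n_tau components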

Read more
Data Analysis Statistics And Probability

Burst-tree decomposition of time series reveals the structure of temporal correlations

Comprehensive characterization of non-Poissonian, bursty temporal patterns observed in various natural and social processes is crucial to understanding the underlying mechanisms behind such temporal patterns. Among these, bursty event sequences have been studied mostly in terms of interevent times (IETs), while the higher-order correlation structure between IETs has gained very little attention due to the lack of a proper characterization method. In this paper we propose a method of decomposing an event sequence into a set of IETs and a burst tree, which exactly captures the structure of temporal correlations that is entirely missing in the analysis of IET distributions. We apply the burst-tree decomposition method to various datasets and analyze the structure of the revealed burst trees. In particular, we observe that event sequences show a similar burst-tree structure, such as heavy-tailed burst size distributions, despite having very different IET distributions. The burst trees allow us to directly characterize the preferential and assortative mixing structure of bursts responsible for the higher-order temporal correlations. We also show how to use the decomposition method for the systematic investigation of such higher-order correlations captured by the burst trees in the framework of randomized reference models. Finally, we devise a simple kernel-based model for generating event sequences showing appropriate higher-order temporal correlations. Our method makes the otherwise overwhelming analysis of higher-order correlations in bursty time series tractable by turning it into the analysis of a tree structure.
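
A minimal sketch of one way to realise such a decomposition (the paper's exact construction may differ): merge adjacent events, and then bursts, in order of increasing interevent time, recording each merge as an internal node of a binary tree.

import numpy as np

def burst_tree(event_times):
    # Decompose an event sequence into its IETs and a burst merge tree:
    # processing the gaps in order of increasing IET, merge the bursts
    # to the left and right of each gap; every merge becomes an internal
    # node, so leaves are events and node sizes are burst sizes.
    t = np.sort(np.asarray(event_times, dtype=float))
    iets = np.diff(t)
    parent = list(range(len(t)))       # union-find over events
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    node = list(range(len(t)))         # tree node id of each burst
    size = [1] * len(t)
    merges, next_id = [], len(t)
    for g in np.argsort(iets, kind="stable"):
        a, b = find(g), find(g + 1)    # bursts left/right of gap g
        merges.append((iets[g], node[a], node[b], next_id, size[a] + size[b]))
        parent[b] = a
        size[a] += size[b]
        node[a] = next_id
        next_id += 1
    return iets, merges                # (iet, left, right, new id, burst size)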

Read more
Data Analysis Statistics And Probability

Calculating p-values and their significances with the Energy Test for large datasets

The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for differences between samples that arise from CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
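
For context, a sketch of the generalised-extreme-value baseline that the paper shows to be insufficient in the far tail, next to the direct empirical p-value; the stand-in null sample is hypothetical.

import numpy as np
from scipy import stats

def gev_p_value(t_obs, t_null):
    # Fit a generalised extreme value distribution to null T-values and
    # return the upper-tail p-value P(T >= t_obs) -- the baseline the
    # paper finds insufficient in its simple test case.
    shape, loc, scale = stats.genextreme.fit(t_null)
    return stats.genextreme.sf(t_obs, shape, loc=loc, scale=scale)

rng = np.random.default_rng(2)
t_null = rng.gumbel(0.0, 1.0, size=5000)  # hypothetical stand-in null sample
print(gev_p_value(2.5, t_null))           # GEV-based p-value
print(np.mean(t_null >= 2.5))             # direct empirical p-value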

Read more
Data Analysis Statistics And Probability

Calculating permutation entropy without permutations

A method for analyzing sequential data sets, similar to permutation entropy, is discussed. Its characteristic features are as follows: it preserves information about equal values, if any, in the embedding vectors; it avoids combinatorics; and it delivers the same entropy value as the permutation method provided the embedding vectors have no equal components, in which case it can be used in place of the permutation method. If the embedding vectors do have equal components, this method can be more precise in discriminating between similar data sets.
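
One way such a scheme can be realised (the paper's exact encoding may differ): map each embedding vector to the tuple of ranks of its components among the distinct values in that vector, so equal values share a rank; without ties this is a relabelling of the ordinal patterns and gives the same entropy as the permutation method.

import numpy as np
from collections import Counter

def rank_pattern_entropy(x, d=3, tau=1):
    # Rank patterns with ties preserved: components are mapped to their
    # rank among the distinct values in the embedding vector, so equal
    # values receive equal ranks and no permutations are enumerated.
    x = np.asarray(x)
    n = len(x) - (d - 1) * tau
    symbols = []
    for i in range(n):
        v = x[i:i + (d - 1) * tau + 1:tau]
        vals = np.unique(v)                     # sorted distinct values
        symbols.append(tuple(np.searchsorted(vals, v)))
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))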

Read more
Data Analysis Statistics And Probability

Causal network discovery by iterative conditioning: comparison of algorithms

Estimating causal interactions in complex dynamical systems is an important problem encountered in many fields of current science. While a theoretical solution for detecting causal interactions has previously been formulated in the framework of prediction improvement, it generally requires the computation of high-dimensional information functionals -- a situation invoking the curse of dimensionality with increasing network size. Recently, several methods have been proposed to alleviate this problem, based on iterative procedures for the assessment of conditional (in)dependences. In the current work, we present a comparison of several such prominent approaches. This is done both by a theoretical comparison of the algorithms, using a formulation in a common framework, and by numerical simulations including realistic complex coupling patterns. The theoretical analysis highlights the key similarities and differences between the algorithms, hinting at their comparative strengths and weaknesses. The methods' assumptions and specific properties, such as false positive control and order dependence, are discussed. Numerical simulations suggest that while the accuracy of most of the algorithms is almost indistinguishable, there are substantial differences in their computational demands, ranging theoretically from polynomial to exponential complexity and leading to substantial differences in computation time in realistic scenarios, depending on the density and size of the networks. Based on the analysis of the algorithms and the numerical simulations, we propose a hybrid approach providing competitive accuracy with improved computational efficiency.
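
A compact sketch of the iterative-conditioning idea shared by these algorithms, in the style of a PC skeleton search with a Gaussian partial-correlation test; the test, the significance level, and the removal order are illustrative choices, not any one paper's algorithm.

import itertools
import numpy as np
from scipy import stats

def partial_corr_pvalue(data, i, j, cond):
    # Conditional independence test via partial correlation (Gaussian
    # assumption): regress out the conditioning set, correlate the
    # residuals, and apply the Fisher z-transform.
    x, y = data[:, i], data[:, j]
    if cond:
        Z = np.column_stack([data[:, list(cond)], np.ones(len(data))])
        x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    z = np.sqrt(len(data) - len(cond) - 3) * np.arctanh(r)
    return 2.0 * stats.norm.sf(abs(z))

def pc_skeleton(data, alpha=0.01):
    # Iterative conditioning: start fully connected and remove an edge
    # (i, j) as soon as some subset of i's current neighbours, of
    # gradually growing size, renders i and j conditionally independent.
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    level = 0
    while any(len(adj[i] - {j}) >= level for i in adj for j in adj[i]):
        for i in range(p):
            for j in sorted(adj[i]):
                for cond in itertools.combinations(adj[i] - {j}, level):
                    if partial_corr_pvalue(data, i, j, cond) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return adj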

Read more
