Featured Research

Data Analysis Statistics And Probability

Fitting very flexible models: Linear regression with large numbers of parameters

There are many uses for linear fitting; the context here is interpolation and denoising of data, as when you have calibration data and you want to fit a smooth, flexible function to those data. Or you want to fit a flexible function to de-trend a time series or normalize a spectrum. In these contexts, investigators often choose a polynomial basis, or a Fourier basis, or wavelets, or something equally general. They also choose an order, or number of basis functions to fit, and (often) some kind of regularization. We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof. We emphasize that it is often valuable to choose far more parameters than data points, despite folk rules to the contrary: Suitably regularized models with enormous numbers of parameters generalize well and make good predictions for held-out data; over-fitting is not (mainly) a problem of having too many parameters. It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process. We recommend cross-validation as a good empirical method for model selection (for example, setting the number of parameters and the form of the regularization), and jackknife resampling as a good empirical method for estimating the uncertainties of the predictions made by the model. We also give advice for building stable computational implementations.
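As a minimal illustration of the over-parameterized regime described above, the sketch below fits N = 25 noisy points with P = 1000 ridge-regularized Fourier features, using the dual (kernel) form of the normal equations, which is cheap and numerically stable when P >> N. The basis, the 1/k amplitude scaling, and the regularization strength are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy calibration-style data: N points, to be fit with P >> N parameters.
N, P, lam = 25, 1000, 1e-3
x = np.sort(rng.uniform(0.0, 1.0, N))
y = np.sin(7.0 * x) + 0.1 * rng.normal(size=N)

def fourier_features(x, P):
    """P sine/cosine basis functions with 1/k amplitude decay (illustrative)."""
    k = np.arange(1, P // 2 + 1)
    return np.hstack([np.cos(np.outer(x, k)) / k, np.sin(np.outer(x, k)) / k])

X = fourier_features(x, P)                      # shape (N, P), P >> N
# Dual (kernel) form of ridge regression: w = X^T (X X^T + lam I)^{-1} y.
# Solving an N x N system instead of a P x P one is cheap when P >> N.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(N), y)
w = X.T @ alpha

x_test = np.linspace(0.0, 1.0, 200)
y_pred = fourier_features(x_test, P) @ w        # smooth interpolant/denoiser
```

With amplitudes decaying like 1/k, adding further features changes the predictions less and less, which is the practical face of the infinite-parameter Gaussian-process limit mentioned in the abstract.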

Data Analysis Statistics And Probability

Fluorescence decay data analysis correcting for detector pulse pile-up at very high count rates

Using Time-Correlated Single Photon Counting (TCSPC) for fluorescence lifetime measurements is usually limited in speed by pile-up. With modern instrumentation this limitation can be lifted significantly, but artefacts due to frequent merging of closely spaced detector pulses (detector pulse pile-up) remain an issue to be addressed. We propose here a data analysis method that corrects for this type of artefact and the resulting systematic errors. It physically models the photon losses due to detector pulse pile-up and incorporates this loss into the decay fit model used to obtain fluorescence lifetimes and the relative amplitudes of the decay components. Comparison of results with and without this correction shows a significant reduction of systematic errors at count rates approaching the excitation rate. This allows quantitatively accurate fluorescence lifetime imaging (FLIM) at very high frame rates.
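The published loss model is not reproduced here, but the general pattern of folding an intensity-dependent loss factor into the decay fit can be sketched as follows; the mono-exponential decay and the paralyzable-dead-time-style exp(-w * rate) attenuation are stand-in assumptions, not the authors' model.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay_with_pileup(t, A, tau, w):
    """Mono-exponential decay attenuated by an intensity-dependent loss term.

    The exp(-w * rate) factor is a paralyzable-dead-time-style stand-in for
    detector pulse pile-up losses; the published model may differ.
    """
    rate = A * np.exp(-t / tau)
    return rate * np.exp(-w * rate)

# Simulated "measured" histogram at a high count rate (illustrative numbers).
t = np.linspace(0.0, 10.0, 256)                  # time bins (ns)
true = decay_with_pileup(t, A=500.0, tau=2.5, w=1e-3)
data = np.random.default_rng(1).poisson(true)

# Fitting the loss-corrected model recovers the lifetime despite the losses.
popt, _ = curve_fit(decay_with_pileup, t, data, p0=(400.0, 2.0, 1e-4))
print("fitted lifetime (ns):", popt[1])
```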

Data Analysis Statistics And Probability

Formation of Regression Model for Analysis of Complex Systems Using Methodology of Genetic Algorithms

This study presents an approach to analyzing the evolution of an arbitrary complex system whose behavior is characterized by a set of time-dependent factors. The only requirement on these factors is that they carry information about the system; the nature (physical, biological, social, economic, etc.) of the complex system does not matter at all. Within this framework, we solve the problem of searching for non-linear regression models that express the relationships between these factors for the complex system under study. We show that this problem can be solved using the methodology of genetic (evolutionary) algorithms. The resulting regression models make it possible to predict the most probable evolution of the system, to determine the significance of individual factors, and thereby to formulate recommendations for steering the system. We demonstrate that the approach can be used to analyze data characterizing the teaching of physics in secondary school and to develop strategies for improving academic performance in this subject.
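A compact sketch of the idea follows: candidate regression models are encoded as bit masks over a library of nonlinear terms, and a genetic algorithm evolves the masks against a least-squares fitness with a complexity penalty. The term library, the factors x1 and x2, and all GA settings below are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Library of candidate nonlinear terms built from two observed factors
# x1, x2 (illustrative; real factors would come from the system's data).
def term_matrix(x1, x2):
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2,
                            x1**2, x2**2, np.sin(x1), np.exp(-x2)])

# Synthetic "complex system" data with a hidden nonlinear relationship.
x1, x2 = rng.uniform(-1, 1, (2, 200))
y = 1.5 * x1 * x2 - 0.7 * np.sin(x1) + 0.05 * rng.normal(size=200)
T = term_matrix(x1, x2)

def fitness(mask):
    """Least-squares error of the masked model plus a complexity penalty."""
    if not mask.any():
        return np.inf
    coef, *_ = np.linalg.lstsq(T[:, mask], y, rcond=None)
    resid = y - T[:, mask] @ coef
    return np.mean(resid**2) + 1e-3 * mask.sum()

pop = rng.integers(0, 2, (40, T.shape[1])).astype(bool)
for gen in range(60):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[:20]]            # truncation selection
    cut = rng.integers(1, T.shape[1], 20)             # one-point crossover
    kids = np.array([np.concatenate([parents[i][:c], parents[-1 - i][c:]])
                     for i, c in enumerate(cut)])
    kids ^= rng.random(kids.shape) < 0.05             # bit-flip mutation
    pop = np.vstack([parents, kids])

best = pop[np.argmin([fitness(m) for m in pop])]
print("selected terms:", np.flatnonzero(best))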

Data Analysis Statistics And Probability

Fragment Graphical Variational AutoEncoding for Screening Molecules with Small Data

In the majority of molecular optimization tasks, predictive machine learning (ML) models are limited by the unavailability and cost of generating large experimental datasets for the specific task. To circumvent this limitation, ML models are trained on large theoretical datasets or on experimental indicators of molecular suitability that are either publicly available or inexpensive to acquire. These approaches produce a set of candidate molecules that must then be ranked using limited experimental data or expert knowledge. Under the assumption that structure is related to functionality, here we use a molecular fragment-based graphical autoencoder to generate unique structural fingerprints for efficiently searching through the candidate set. We demonstrate that fragment-based graphical autoencoding reduces the error in predicting physical characteristics such as solubility and partition coefficient in the small-data regime, compared to extended circular fingerprints and string-based approaches. We further demonstrate that this approach can provide insight into real-world molecular optimization problems, such as the search for stabilization additives in organic semiconductors, by accurately predicting 92% of test molecules given 69 training examples. This task is a model example of black-box molecular optimization, as there is minimal theoretical and experimental knowledge with which to accurately predict the suitability of the additives.
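The graph autoencoder itself is beyond a short sketch, but the downstream small-data step it enables, fitting a strongly regularized regressor in the learned fingerprint space and ranking the candidate set, can be outlined as below. The 64-dimensional embeddings are placeholders standing in for the trained encoder's output, and the model choice is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# Stand-in for latent fingerprints produced by a fragment-based graph VAE;
# in practice these would come from the trained encoder (64-dim assumed).
z_labelled = rng.normal(size=(69, 64))    # small labelled set, as in the paper
y_labelled = rng.normal(size=69)          # e.g. a measured suitability score
z_candidates = rng.normal(size=(5000, 64))

# With ~69 examples, a strongly regularized linear model in the latent
# space is a reasonable choice; the alpha value is illustrative.
model = Ridge(alpha=10.0).fit(z_labelled, y_labelled)
ranking = np.argsort(-model.predict(z_candidates))
print("top candidate indices:", ranking[:10])
```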

Data Analysis Statistics And Probability

From Logistic Growth to Exponential Growth in a Population Dynamical Model

The dynamics between central sources (hubs) providing a resource and a large number of components drawing on and contributing to this resource describes many real-life situations. Modeling, controlling, and balancing such dynamics is a general problem that arises in many scientific disciplines. We analyze a stochastic dynamical system exhibiting these dynamics with multiplicative noise. We show that this model can be solved exactly by passing to variables that describe the mass ratio between the components and the hub. We derive a deterministic equation for the average mass ratio; this equation describes logistic growth. We derive the full phase diagram of the model and identify three regimes by calculating the sample and moment Lyapunov exponents of the system. The first regime describes full balance between the non-hub components and the hub; in the second regime the resource is concentrated mainly in the hub; and in the third regime the resource is localized on a few non-hub components and the hub. Surprisingly, in the limit of a large number of components the transition values do not depend on the amount of resource provided by the hub. This model has interesting applications in the context of the analysis of porous media using Magnetic Resonance (MR) techniques.
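For reference, a generic logistic equation of the kind the abstract attributes to the average mass ratio can be integrated in a few lines; the growth rate and carrying capacity below are illustrative, and the paper's derived equation may carry different coefficients.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Generic logistic equation dm/dt = r m (1 - m/K) for the average mass
# ratio m(t); r and K are illustrative, not the paper's values.
r, K = 1.0, 1.0
sol = solve_ivp(lambda t, m: r * m * (1.0 - m / K),
                t_span=(0.0, 12.0), y0=[1e-3], dense_output=True)

t = np.linspace(0.0, 12.0, 200)
m = sol.sol(t)[0]
# While m << K the growth is effectively exponential (~ m0 * exp(r t)),
# the crossover named in the title; it then saturates at K.
```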

Data Analysis Statistics And Probability

From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling

Copula modeling consists of finding a probability distribution, called a copula, whose coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work uses this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jump distribution of the Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions by infinite ergodic theory and statistical inference techniques, respectively. We thereby verify that the characteristic tail distribution of subdiffusion is an extreme value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions when the statistical distributions of the weakly chaotic dynamics and the deterministic subdiffusion are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.
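By Sklar's theorem, the joint distribution is the copula evaluated at the marginal CDFs. The sketch below evaluates a Gumbel copula, a standard extreme value copula, with exponential marginals standing in for the Mittag-Leffler marginals identified in the paper; the parameter theta is illustrative.

```python
import numpy as np

def gumbel_copula(u, v, theta=2.0):
    """Gumbel copula C(u, v), an extreme value copula (requires theta >= 1)."""
    return np.exp(-((-np.log(u))**theta + (-np.log(v))**theta)**(1.0 / theta))

# Sklar's theorem: joint CDF H(x, y) = C(F(x), G(y)). Exponential marginals
# are a simple stand-in for the paper's Mittag-Leffler marginals.
F = lambda x: 1.0 - np.exp(-x)
x, y = 1.2, 0.4
joint_cdf = gumbel_copula(F(x), F(y), theta=2.0)
print("H(x, y) =", joint_cdf)
```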

Data Analysis Statistics And Probability

From periodic sampling to irregular sampling through PNS (Periodic Nonuniform Sampling)

Resampling is an operation that is costly in both computation time and accuracy. It regularizes irregular sampling, replacing N data points with N estimates on a periodic grid. This stage can be avoided by using formulas built directly on the incoming data and completed by sequences of terms whose influence decreases as the number of data points increases. Of course, certain spectral properties (for the processes) and asymptotic properties (for the sampling sequence) have to be fulfilled. In this paper, we show that a coherent theory can be developed, starting from ordinary periodic sampling and generalizing through PNS (Periodic Nonuniform Sampling), to treat more general irregular sampling schemes. The "baseband spectrum" hypothesis linked to the "Nyquist bound" (or Shannon bound) is generalized to spectra supported on a finite number of intervals, as suited to the "Landau condition".
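Classical Shannon reconstruction, the starting point that PNS generalizes, can be sketched directly; the signal and sampling period below are illustrative, and the interleaved-grid PNS formulas themselves are not reproduced here.

```python
import numpy as np

# Shannon reconstruction from periodic samples of a baseband signal; PNS
# generalizes this to several interleaved periodic grids (not shown here).
T = 0.05                                    # sampling period, below Nyquist
n = np.arange(-100, 101)
signal = lambda t: np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
samples = signal(n * T)

t = np.linspace(-1.0, 1.0, 500)
# x(t) = sum_n x(nT) sinc((t - nT)/T); np.sinc is the normalized sinc.
x_rec = (samples * np.sinc((t[:, None] - n * T) / T)).sum(axis=1)
```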

Data Analysis Statistics And Probability

From time-series to complex networks: Application to the cerebrovascular flow patterns in atrial fibrillation

A network-based approach is presented to investigate cerebrovascular flow patterns during atrial fibrillation (AF) with respect to normal sinus rhythm (NSR). AF, the most common cardiac arrhythmia, with faster and irregular beating, has recently and independently been associated with an increased risk of dementia. However, the underlying hemodynamic mechanisms relating the two pathologies remain largely undetermined, so the contribution of modeling and refined statistical tools is valuable. Pressure and flow rate time series in NSR and AF are evaluated along representative cerebral sites (from the carotid arteries to the capillary brain circulation), exploiting reliable artificial signals recently obtained from an in silico approach. The complex network analysis reveals, in a compact and original way, a dramatic signal variation towards the distal/capillary cerebral regions during AF which has no counterpart in NSR conditions. At the large-artery level, networks obtained from both AF and NSR hemodynamic signals exhibit elongated and chained features, which are typical of pseudo-periodic series. These aspects are almost completely lost towards the microcirculation during AF, where the networks are topologically more circular and present random-like characteristics. As a consequence, all the physiological phenomena at the microcerebral level ruled by periodicity - such as regular perfusion, mean pressure per beat, and average nutrient supply at the cellular level - can be strongly compromised, since the AF hemodynamic signals assume irregular behaviour and random-like features. Through a powerful approach that is complementary to classical statistical tools, the present findings further strengthen the potential link between AF hemodynamics and cognitive decline.
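The abstract does not specify the network construction, but one common time-series-to-network mapping, the natural visibility graph, illustrates how pseudo-periodic signals give chained, path-like networks while irregular signals give denser, random-like ones; the paper's actual mapping may differ.

```python
import numpy as np

def visibility_graph(y):
    """Natural visibility graph: one standard time-series-to-network mapping
    (the paper may use a different construction)."""
    n = len(y)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            k = np.arange(i + 1, j)
            # Nodes (i, j) are linked if every intermediate sample lies
            # below the straight line joining (i, y[i]) and (j, y[j]).
            line = y[i] + (y[j] - y[i]) * (k - i) / (j - i)
            if np.all(y[k] < line):
                edges.add((i, j))
    return edges

# A pseudo-periodic signal yields an elongated, chain-like edge structure.
t = np.linspace(0, 8 * np.pi, 200)
print("edges:", len(visibility_graph(np.sin(t))))
```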

Data Analysis Statistics And Probability

Full Automation for Rapid Modulator Characterization and Accurate Analysis Using SciPy

Modulator testing involves complex biasing conditions, hardware connections, and data analysis. In addition, any optical signal distortion due to the grating coupler effect can make it harder to set the correct bias condition for an accurate measurement of modulator performance. In this paper, we propose using SciPy, an open-source scientific computing library, to automate bias setting and data analysis in silicon modulator testing.
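As an example of the kind of bias-setting analysis SciPy enables, the sketch below fits a generic raised-cosine modulator transfer curve to a simulated bias sweep and extracts a quadrature bias point; the cos^2 model and its parameters are assumptions about the device, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def transfer(v, p_max, v_pi, v0, floor):
    """Generic Mach-Zehnder-style raised-cosine transfer curve (assumed)."""
    return floor + p_max * np.cos(np.pi * (v - v0) / (2 * v_pi))**2

# Simulated bias sweep standing in for instrument readings.
bias = np.linspace(-4.0, 4.0, 81)                   # swept bias voltage (V)
rng = np.random.default_rng(4)
power = transfer(bias, 1.0, 3.0, 0.2, 0.05) + 0.01 * rng.normal(size=81)

popt, _ = curve_fit(transfer, bias, power, p0=(1.0, 2.5, 0.0, 0.0))
p_max, v_pi, v0, floor = popt
quad_bias = v0 + v_pi / 2                           # half-transmission point
print("quadrature bias (V):", round(quad_bias, 2))
```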

Data Analysis Statistics And Probability

Fully Bayesian Unfolding with Regularization

Fully Bayesian Unfolding differs from other unfolding methods by providing the full posterior probability of the unfolded spectrum in each bin. We extend the method with regularization, which can be helpful for unfolding non-smooth, over-binned, or generally non-standard-shaped spectra. To decrease the computation time, an iterative procedure is presented.
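A toy version of the idea, a Poisson likelihood over a response-matrix-smeared spectrum with a curvature (second-difference) regularization prior, sampled by plain Metropolis, might look as follows; the response matrix, proposal scale, and prior strength are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy setup: truth spectrum, tridiagonal response matrix R, and observed
# data D ~ Poisson(R @ truth). All numbers are illustrative.
n_bins = 8
R = 0.7 * np.eye(n_bins) + 0.15 * (np.eye(n_bins, k=1) + np.eye(n_bins, k=-1))
truth = np.array([50, 80, 120, 150, 140, 100, 60, 30], dtype=float)
data = rng.poisson(R @ truth)

def log_post(T, alpha=0.1):
    """Poisson log-likelihood plus a curvature (second-difference) prior."""
    if np.any(T <= 0):
        return -np.inf
    mu = R @ T
    loglike = np.sum(data * np.log(mu) - mu)
    curvature = np.sum(np.diff(T, n=2)**2)
    return loglike - alpha * curvature

# Plain Metropolis sampling of the posterior over unfolded bin contents.
T = data.astype(float).clip(min=1.0)
lp = log_post(T)
samples = []
for step in range(20000):
    prop = T + rng.normal(scale=3.0, size=n_bins)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        T, lp = prop, lp_prop
    if step > 5000:                      # discard burn-in
        samples.append(T.copy())

posterior = np.array(samples)
print("posterior mean:", posterior.mean(axis=0).round(1))
```

The per-bin histograms of `posterior` are exactly the "full posterior probability of the unfolded spectrum for each bin" that the method advertises.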
