Featured Research

Other Statistics

Assessing the association between pre-course metrics of student preparation and student performance in introductory statistics: Results from early data on simulation-based inference vs. non-simulation-based inference

The recent simulation-based inference (SBI) movement in algebra-based introductory statistics courses (Stat 101) has provided preliminary evidence of improved student conceptual understanding and retention. However, little is known about whether these positive effects are preferentially distributed across the types of students entering the course. We consider how two metrics of Stat 101 student preparation (pre-course performance on a concept inventory and math ACT score) may or may not be associated with end-of-course student performance on concept inventories. Students across all preparation levels tended to show improvement in Stat 101, but more improvement was observed, across all preparation levels, in early versions of an SBI course. Furthermore, students' gains tended to be similar regardless of whether they entered the course with more preparation or less. Recent data on a sample of students using a current version of an SBI course showed similar results, though direct comparison with non-SBI students was not possible. Overall, our analysis provides additional evidence that SBI curricula are effective at improving students' post-course conceptual understanding of statistical ideas regardless of student preparation. Further work is needed to better understand nuances of student improvement based on other student demographics and prior coursework, as well as on instructor and institutional variables.

Read more
Other Statistics

Asymmetry approach to the study of chemotherapy treatment and device failure-time data using a modified power function distribution with some modified estimators

In order to improve on existing models that are used extensively in bioscience and applied-science research, a new class of weighted power function distributions (WPFD) is proposed, along with various properties and modifications that make it more applicable in real life. We provide the mathematical derivations for the new distribution, including moments, incomplete moments, conditional moments, inverse moments, the mean residual function, the vitality function, order statistics, the Mills ratio, the information function, Shannon entropy, the Bonferroni and Lorenz curves, and the quantile function. We also characterize the WPFD based on the doubly truncated mean. The aim of the study is to widen the applicability of the power function distribution. The main feature of the proposed distribution is that no additional parameters are introduced, in contrast to other generalizations of the distribution, which are complex and carry many parameters. We use R to estimate the parameters of the new class of WPFD using the maximum likelihood method (MLM), percentile estimators (PE), and their modified counterparts. After analyzing the data, we conclude that the proposed WPFD model performs better on the data sets when compared to competing models.
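
For reference, the general weighted-distribution construction underlying proposals of this kind (a sketch in our notation, not the authors' exact derivation) combines a baseline density f with a nonnegative weight function w:

\[
f_w(x) \;=\; \frac{w(x)\,f(x)}{\mathbb{E}[w(X)]},
\qquad
f(x;\alpha,\beta) \;=\; \frac{\alpha\,x^{\alpha-1}}{\beta^{\alpha}},
\quad 0 < x < \beta,
\]

where the second display is the standard power function baseline; substituting it into the first yields a weighted power function density whose normalizing constant is the baseline expectation of w.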

Read more
Other Statistics

BFDA: A MATLAB Toolbox for Bayesian Functional Data Analysis

We provide a MATLAB toolbox, BFDA, that implements a Bayesian hierarchical model to smooth multiple functional data under the assumptions of a shared underlying Gaussian process distribution, a Gaussian process prior for the mean function, and an Inverse-Wishart process prior for the covariance function. This model-based approach can borrow strength from all functional data to increase smoothing accuracy, and it estimates the mean and covariance functions simultaneously. An option to approximate the Bayesian inference process using cubic B-spline basis functions is integrated in BFDA, which allows high-dimensional functional data to be handled efficiently. Examples of using BFDA in various scenarios and of conducting follow-up functional regression are provided. The advantages of BFDA include: (1) it simultaneously smooths multiple functional data and estimates the mean and covariance functions in a nonparametric way; (2) it flexibly deals with sparse and high-dimensional functional data, with stationary and nonstationary covariance functions, and without requiring common observation grids; (3) it provides accurately smoothed functional data for follow-up analysis.
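
A minimal sketch of the hierarchy described above, in our own notation (the toolbox's internal parameterization may differ):

\[
y_i(t) \;=\; Z_i(t) + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0,\sigma^2),
\qquad
Z_i \;\sim\; \mathrm{GP}(\mu, \Sigma),
\qquad
\mu \;\sim\; \mathrm{GP}(\mu_0, \Sigma/c),
\qquad
\Sigma \;\sim\; \mathrm{IWP}(\delta, \Psi).
\]

All curves Z_i share the same Gaussian process law, so smoothing any one curve borrows strength from the others through the common mean function \mu and covariance function \Sigma.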

Read more
Other Statistics

BNSP: an R Package for Fitting Bayesian Semiparametric Regression Models and Variable Selection

The R package BNSP provides a unified framework for semiparametric location-scale regression and stochastic search variable selection. The statistical methodology that the package is built upon utilizes basis function expansions to represent semiparametric covariate effects in the mean and variance functions, and spike-slab priors to perform selection and regularization of the estimated effects. In addition to the main function that performs posterior sampling, the package includes functions for assessing convergence of the sampler, summarizing model fits, visualizing covariate effects and obtaining predictions for new responses or their means given feature/covariate vectors.
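
One common way to write the location-scale model the package targets (our notation; see the package documentation for the exact formulation):

\[
y_i \;=\; \mu(x_i) + \sigma(x_i)\,\varepsilon_i,
\qquad
\mu(x) \;=\; \sum_{k} \beta_k\,\phi_k(x),
\qquad
\log \sigma^2(x) \;=\; \sum_{k} \alpha_k\,\psi_k(x),
\]

with basis functions \phi_k, \psi_k and spike-slab priors on the coefficients, e.g. \beta_k \sim (1-\pi)\,\delta_0 + \pi\,N(0,\tau^2), so that irrelevant basis terms are shrunk exactly to zero.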

Read more
Other Statistics

Baby Morse Theory in Data Analysis

A methodology is proposed for inferring the topology underlying point cloud data. The approach employs basic elements of Morse theory and is capable not only of producing point estimates of various topological quantities (e.g., genus), but also of assessing their sampling uncertainty in a probabilistic fashion. Several examples of point cloud data in three dimensions demonstrate how the method yields interval estimates for the topology of the data viewed as a 2-dimensional surface embedded in R^3.
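
The Morse-theoretic identity that makes such inference possible is worth recording (the abstract does not state the paper's exact route, but this is the standard connection): for a Morse function on a closed orientable surface S,

\[
\chi(S) \;=\; \#\{\text{minima}\} \;-\; \#\{\text{saddles}\} \;+\; \#\{\text{maxima}\},
\qquad
g \;=\; \frac{2-\chi(S)}{2},
\]

so counts of critical points determine the Euler characteristic \chi and hence the genus g, and sampling uncertainty in those counts propagates to interval estimates for the topology.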

Read more
Other Statistics

Batch Self-Organizing Maps for distributional data using adaptive distances

The paper deals with a batch Self-Organizing Map algorithm (DBSOM) for data described by distributional-valued variables. Such variables take as values one-dimensional probability or frequency distributions on a numeric support. The objective function optimized by the algorithm depends on the choice of the distance measure. Given the nature of the data, the L2 Wasserstein distance is proposed as one of the most suitable metrics for comparing distributions, and it is widely used in several contexts of distributional data analysis. Conventional batch SOM algorithms treat all variables as equally important for training the SOM. However, it is well known that some variables are less relevant than others for this task. In order to take the differing contributions of the variables into account, we propose an adaptive version of the DBSOM algorithm that tackles this problem with an additional step: a relevance weight is automatically learned for each distributional-valued variable. Moreover, since the L2 Wasserstein distance admits a decomposition into two components, one related to the means and one related to the size and shape of the distributions, relevance weights are also learned automatically for each of these components, emphasizing the importance of the different estimated features of the distributions. Examples on real and synthetic distributional data sets illustrate the usefulness of the proposed DBSOM algorithms.
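
The decomposition invoked above is the standard one for the L2 Wasserstein distance between univariate distributions, written via quantile functions:

\[
d_W^2(F,G) \;=\; \int_0^1 \bigl(F^{-1}(t)-G^{-1}(t)\bigr)^2\,dt
\;=\; (\mu_F-\mu_G)^2 \;+\; \int_0^1 \bigl[(F^{-1}(t)-\mu_F)-(G^{-1}(t)-\mu_G)\bigr]^2\,dt,
\]

where the first term compares the means and the second compares the centered quantile functions, i.e., the size and shape of the two distributions.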

Read more
Other Statistics

Bayes' Theorem under Conditional Independence

In this article we provide a substantial discussion of the statistical concept of conditional independence, which is not routinely mentioned in most elementary statistics and mathematical statistics textbooks. Under the assumption of conditional independence, an extended version of Bayes' Theorem is proposed, with illustrations from both hypothetical and real-world examples of disease diagnosis.
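
In the disease-diagnosis setting the abstract describes, the extended form presumably resembles the following standard identity (our rendering): if test results T_1, ..., T_k are conditionally independent given disease status D, then

\[
P(D \mid T_1,\ldots,T_k)
\;=\;
\frac{P(D)\prod_{i=1}^{k} P(T_i \mid D)}
{P(D)\prod_{i=1}^{k} P(T_i \mid D) \;+\; P(\bar{D})\prod_{i=1}^{k} P(T_i \mid \bar{D})},
\]

so the k likelihood terms factor test by test instead of requiring the joint distribution of all test results.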

Read more
Other Statistics

BayesVarSel: Bayesian Testing, Variable Selection and Model Averaging in Linear Models using R

This paper introduces the R package BayesVarSel, which implements objective Bayesian methodology for hypothesis testing and variable selection in linear models. The package computes posterior probabilities of the competing hypotheses/models and provides a suite of tools, specifically proposed in the literature, to properly summarize the results. Additionally, BayesVarSel is armed with functions to compute several types of model averaging estimations and predictions with weights given by the posterior probabilities. BayesVarSel contains exact algorithms to perform fast computations in problems of small to moderate size and heuristic sampling methods to solve large problems. The software is intended to appeal to a broad spectrum of users, so the interface has been carefully designed to be highly intuitive and is inspired by the well-known lm function. The issue of prior inputs is carefully addressed. In the default usage (fully automatic for the user), BayesVarSel implements the criteria-based priors proposed by Bayarri et al. (2012), but the advanced user can instead use several other popular priors from the literature. The package is available through the Comprehensive R Archive Network, CRAN. We illustrate the use of BayesVarSel with several data examples.
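
The generic identities behind such computations (standard Bayesian model averaging, not package-specific formulas) are:

\[
P(M_j \mid y) \;=\; \frac{B_{j0}\,P(M_j)}{\sum_k B_{k0}\,P(M_k)},
\qquad
\hat{\Delta}_{\mathrm{BMA}} \;=\; \sum_j P(M_j \mid y)\,\mathrm{E}[\Delta \mid M_j, y],
\]

where B_{j0} is the Bayes factor of model M_j against a fixed reference model M_0 and \Delta is a quantity of interest such as a regression coefficient or a prediction.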

Read more
Other Statistics

Bayesian prediction for physical models with application to the optimization of the synthesis of pharmaceutical products using chemical kinetics

Quality control in industrial processes is increasingly making use of prior scientific knowledge, often encoded in physical models that require numerical approximation. Statistical prediction, and subsequent optimization, is key to ensuring the process output meets a specification target. However, the numerical expense of approximating the models poses computational challenges to the identification of combinations of the process factors where there is confidence in the quality of the response. Recent work in Bayesian computation and statistical approximation (emulation) of expensive computational models is exploited to develop a novel strategy for optimizing the posterior probability of a process meeting specification. The ensuing methodology is motivated by, and demonstrated on, a chemical synthesis process to manufacture a pharmaceutical product, within which an initial set of substances evolve according to chemical reactions, under certain process conditions, into a series of new substances. One of these substances is a target pharmaceutical product and two are unwanted by-products. The aim is to determine the combinations of process conditions and amounts of initial substances that maximize the probability of obtaining sufficient target pharmaceutical product whilst ensuring unwanted by-products do not exceed a given level. The relationship between the factors and amounts of substances of interest is theoretically described by the solution to a system of ordinary differential equations incorporating temperature dependence. Using data from a small experiment, it is shown how the methodology can approximate the multivariate posterior predictive distribution of the pharmaceutical target and by-products, and therefore identify suitable operating values. Materials to replicate the analysis can be found at this http URL.
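
As an illustration of the kind of temperature-dependent kinetic model meant here (a generic second-order reaction A + B -> C with an Arrhenius rate constant; the paper's actual reaction network will differ), consider

\[
\frac{d[A]}{dt} \;=\; \frac{d[B]}{dt} \;=\; -\,k(T)\,[A][B],
\qquad
\frac{d[C]}{dt} \;=\; k(T)\,[A][B],
\qquad
k(T) \;=\; A_0 \exp\!\Bigl(-\frac{E_a}{R\,T}\Bigr),
\]

whose numerical solution maps process conditions (temperature T, initial amounts) to the substance amounts over which the posterior predictive distribution is built.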

Read more
Other Statistics

Benchmarking in cluster analysis: A white paper

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have frequently been seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in unsupervised settings, with clustering as an important subdomain. To address this problem, we discuss the theoretical and conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

Read more
