Featured Researches

Other Statistics

Fisher, Neyman-Pearson or NHST? A Tutorial for Teaching Data Testing

Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social and biomedical teaching and research. Little change seems possible once the procedure becomes well ingrained in the minds and current practice of researchers; thus, the optimal opportunity for change is at the time the procedure is taught, whether at the undergraduate or postgraduate level. This paper presents a tutorial for teaching data testing procedures, often referred to as hypothesis testing theories. The first procedure introduced is the approach to data testing followed by Fisher (tests of significance); the second is the approach followed by Neyman and Pearson (tests of acceptance); the final procedure is the incongruent combination of the previous two theories into the current approach (NHST). For those researchers sticking with the latter, two compromise solutions on how to improve NHST conclude the tutorial.
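
As a rough illustration of the contrast the tutorial draws (this sketch is mine, not code from the paper), the snippet below runs the same one-sample z-test in both styles: Fisher reports an exact p-value as graded evidence, while Neyman-Pearson fixes alpha and an alternative in advance and returns a binary decision with known long-run error rates. The sample and parameter values are made up.

```python
# Hypothetical sketch contrasting Fisher's tests of significance with
# Neyman-Pearson's tests of acceptance on a one-sample z-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.4, scale=1.0, size=30)   # made-up sample
mu0, sigma = 0.0, 1.0                            # H0 mean, known sd
z = (data.mean() - mu0) / (sigma / np.sqrt(len(data)))

# Fisher: report the exact p-value as graded evidence against H0.
p_value = 2 * stats.norm.sf(abs(z))
print(f"Fisher: z = {z:.2f}, p = {p_value:.4f}")

# Neyman-Pearson: pre-specify alpha and an alternative, then make a
# binary accept/reject decision with known error rates and power.
alpha, mu1 = 0.05, 0.5
z_crit = stats.norm.ppf(1 - alpha / 2)
power = stats.norm.sf(z_crit - (mu1 - mu0) / (sigma / np.sqrt(len(data))))
decision = "reject H0" if abs(z) > z_crit else "accept H0"
print(f"Neyman-Pearson: {decision} at alpha = {alpha}, power = {power:.2f}")
```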


Fitting A Mixture Distribution to Data: Tutorial

This paper is a step-by-step tutorial for fitting a mixture distribution to data. It assumes only a background in calculus and linear algebra; any other required background is briefly reviewed before the main algorithm is explained. The tutorial first details fitting a mixture of two distributions, with examples of two Gaussians and two Poissons for the continuous and discrete cases, respectively. It then explains fitting several distributions in the general case, again with Gaussian (the Gaussian Mixture Model) and Poisson examples. Model-based clustering, one application of mixture distributions, is also introduced, and numerical simulations for both the Gaussian and Poisson examples are provided for further clarification.
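
For readers who want the shape of the algorithm before reading the tutorial, here is a minimal expectation-maximization (EM) sketch for the two-Gaussian case; the initialisation and notation are mine and may differ from the paper's.

```python
# Minimal EM sketch for a mixture of two univariate Gaussians.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

# Initial guesses for the weights, means, and variances.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibilities r[i, k] = P(component k | x_i).
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = w * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from weighted sufficient statistics.
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)  # should approach the generating values above
```

The Poisson mixture case proceeds by the same E/M alternation, with Poisson likelihoods in the E-step and a weighted-mean update for each rate in the M-step.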


Fourier Analysis and Benford Random Variables

This paper has several major purposes. The central purpose is to describe the "Benford analysis" of a positive random variable and to summarize some results from investigations into base dependence of Benford random variables. The principal tools used to derive these results are Fourier series and Fourier transforms, and a second major purpose of this paper is to present an introductory exposition about these tools. My motivation for writing this paper is twofold. First, I think the theory of Benford random variables and the Benford analysis of a positive random variable are interesting and deserve to be better known. Second, I think that Benford analysis provides a really excellent illustration of the utility of Fourier series and transforms, and reveals certain interconnections between series and transforms that are not obvious from the usual way these subjects are introduced.
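
As background for readers new to the topic, Benford's law in base 10 assigns first-digit probabilities log10(1 + 1/d); a quick way to see an approximately Benford variable is a lognormal with large sigma, whose logarithm is spread over many unit periods. The check below is my own illustration, not taken from the paper.

```python
# Empirical first-digit frequencies of a wide lognormal variable
# versus Benford's law P(D = d) = log10(1 + 1/d).
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=5.0, size=200_000)  # positive r.v.

first_digit = (x / 10 ** np.floor(np.log10(x))).astype(int)
empirical = np.bincount(first_digit, minlength=10)[1:10] / len(x)
benford = np.log10(1 + 1 / np.arange(1, 10))

for d in range(1, 10):
    print(f"digit {d}: empirical {empirical[d - 1]:.4f}, Benford {benford[d - 1]:.4f}")
```

The Fourier connection the paper develops starts from the standard fact that a positive X is Benford in base 10 exactly when log10(X) mod 1 is uniform on [0, 1), a property naturally analyzed through the Fourier coefficients of that circular distribution.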


Frequentism-as-model

Most statisticians are aware that probability models interpreted in a frequentist manner are not really true in objective reality, but only idealisations. I argue that this is often ignored when frequentist methods are actually applied and their results interpreted, and that maintaining awareness of the essential difference between reality and models can lead to a more appropriate use and interpretation of frequentist models and methods, here called frequentism-as-model. This is elaborated by showing connections to existing work; appreciating the special role of i.i.d. models and subject matter knowledge; giving an account of how, and under what conditions, models that are not true can be useful; giving detailed interpretations of tests and confidence intervals; confronting their implicit compatibility logic with the inverse probability logic of Bayesian inference; re-interpreting the role of model assumptions; and appreciating robustness and the role of "interpretative equivalence" of models. Epistemic (often referred to as Bayesian) probability shares the issue that its models are only idealisations and are not really true as models of reasoning about uncertainty, so it does not have an essential advantage over frequentism, as is often claimed. Bayesian statistics can be combined with frequentism-as-model, leading to what Gelman and Hennig (2017) call "falsificationist Bayes".
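
As a small illustration of the "not true but still useful" point (my example, not the paper's): the 95% t-interval below is derived under an i.i.d. normal model, yet when applied to skewed exponential data its coverage remains close to nominal.

```python
# Coverage of the normal-theory 95% t-interval under a misspecified
# (exponential) data-generating process.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, cover = 50, 10_000, 0
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=n)       # true mean is 1.0
    half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
    cover += (x.mean() - half) <= 1.0 <= (x.mean() + half)
print(f"empirical coverage: {cover / reps:.3f} (nominal 0.95)")
```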


Frequentist Inference without Repeated Sampling

Frequentist inference is typically described in terms of hypothetical repeated sampling, but there are advantages to an interpretation that uses a single random sample. Contemporary examples are given indicating that probabilities for random phenomena can be interpreted as classical probabilities, and this interpretation is applied to statistical inference using urn models. Both the classical and the limiting relative frequency interpretations can be used to communicate statistical inference, and the effectiveness of each is discussed. Recent descriptions of p-values, confidence intervals, and power are viewed through the lens of classical probability based on a single random sample from the population.
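
The urn framing can be made concrete with a short simulation (my sketch; the paper's examples may differ): a single simple random sample is drawn from a finite population, and the classical probability that a confidence interval covers the population proportion is the fraction of equally likely samples for which it does.

```python
# Classical-probability reading of a confidence interval via an urn model.
import numpy as np

rng = np.random.default_rng(4)
urn = np.array([1] * 300 + [0] * 700)    # finite population: 30% "red"
N, n, p_true = len(urn), 100, urn.mean()

reps, cover = 10_000, 0
for _ in range(reps):
    sample = rng.choice(urn, size=n, replace=False)   # one random sample
    p_hat = sample.mean()
    fpc = np.sqrt((N - n) / (N - 1))                  # finite-population correction
    half = 1.96 * fpc * np.sqrt(p_hat * (1 - p_hat) / n)
    cover += (p_hat - half) <= p_true <= (p_hat + half)
print(f"fraction of samples whose interval covers p: {cover / reps:.3f}")
```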


From Curriculum Guidelines to Learning Objectives: A Survey of Five Statistics Programs

The 2000 ASA Guidelines for Undergraduate Statistics Majors aimed to provide programs offering undergraduate degrees in statistics with guidance on the content and skills that statistics majors should be learning. With new guidelines forthcoming, it is important to help programs develop an assessment cycle: How do we know that students are learning what we want them to learn? How do we improve the program over time? The first step in this process is to translate the broad Guidelines into institution-specific, measurable learning outcomes. This paper provides examples of how five programs did so for the 2000 Guidelines. We hope they serve as illustrative examples for programs moving forward with the new guidelines.


From Ordinary Differential Equations to Structural Causal Models: the deterministic case

We show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models.
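
A toy instance (mine, not the paper's worked example) shows the idea: each equilibrium equation of a first-order ODE system, set to zero, can be read as a deterministic structural assignment.

```python
# Equilibrium of dx/dt = -2x + y + 1, dy/dt = -3y + 2, first solved as a
# linear system and then read off as structural assignments
#   y := 2/3,  x := (y + 1) / 2.
import numpy as np

A = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
b = np.array([1.0, 2.0])
z_eq = np.linalg.solve(A, -b)        # solve 0 = A z + b for z = (x, y)

y = 2.0 / 3.0                        # structural equation for y (no parents)
x = (y + 1.0) / 2.0                  # structural equation for x (parent: y)
print(z_eq, (x, y))                  # both give x = 5/6, y = 2/3
```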


From Statistician to Data Scientist

According to a recent report from the European Commission, the world generates 1.7 million billion bytes of data every minute, the equivalent of about 360,000 DVDs, and companies that build their decision-making processes on these data increase their productivity. The processing and exploitation of massive data have consequences for the employment of graduates in statistics. Which additional skills do students trained in statistics need to acquire to become data scientists? How should training evolve so that future graduates can adapt to rapid changes in this area, without neglecting traditional jobs and the fundamental, lasting foundation of the training? After considering the notion of big data and questioning the emergence of a "new" science, Data Science, we present current developments in the training of engineers in Mathematical Engineering and Modeling at INSA Toulouse.
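
For what it is worth, the abstract's equivalence checks out at roughly 4.7 GB per single-layer DVD:

```python
# Back-of-the-envelope check of the data-volume figure.
bytes_per_minute = 1.7e15            # 1.7 million billion bytes
dvd_bytes = 4.7e9                    # single-layer DVD capacity
print(f"{bytes_per_minute / dvd_bytes:,.0f} DVDs per minute")  # ~361,702
```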


Game time: statistical contests in the classroom

We describe a contest in variable selection that was part of a statistics course for graduate students. In particular, the opportunity to create a contest themselves offered an additional challenge for more advanced students. Since working with data is becoming ever more important in teaching statistics, we greatly encourage other instructors to try the same.
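
One plausible way to run such a contest (a hypothetical sketch; the course's actual setup is not described in the abstract) is to hide a sparse linear signal in noisy data and score each submitted variable subset by held-out prediction error:

```python
# Scoring a variable-selection contest by held-out mean squared error.
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[2, 7, 11]] = [1.5, -2.0, 1.0]          # hidden "true" variables
y = X @ beta + rng.normal(size=n)
train, test = slice(0, 150), slice(150, n)

def score(subset):
    """Held-out MSE of least squares restricted to the chosen columns."""
    coef, *_ = np.linalg.lstsq(X[train][:, subset], y[train], rcond=None)
    resid = y[test] - X[test][:, subset] @ coef
    return float(np.mean(resid ** 2))

print(score([2, 7, 11]), score([0, 1, 2]))   # lower is better
```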


Generalised Reichenbachian Common Cause Systems

The principle of the common cause claims that if an improbable coincidence has occurred, there must exist a common cause. This is generally taken to mean that positive correlations between non-causally related events should disappear when conditioning on the action of some underlying common cause. The extended interpretation of the principle, by contrast, urges that common causes should be called for in order to explain positive deviations between the estimated correlation of two events and the expected value of their correlation. The aim of this paper is to provide the extended reading of the principle with a general probabilistic model, capturing the simultaneous action of a system of multiple common causes. To this end, two distinct models are elaborated, and the necessary and sufficient conditions for their existence are determined.
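
The standard (non-extended) reading of the principle is easy to simulate (my illustration, not the paper's): two events that depend only on a common cause C are positively correlated marginally, and the correlation vanishes once we condition on C, i.e. P(A,B|C) = P(A|C)P(B|C).

```python
# Screening-off: a common cause induces a marginal correlation that
# disappears conditional on the cause.
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
C = rng.random(n) < 0.5                      # common cause
A = rng.random(n) < np.where(C, 0.8, 0.2)    # A depends only on C
B = rng.random(n) < np.where(C, 0.7, 0.1)    # B depends only on C

print("marginal:", (A & B).mean() - A.mean() * B.mean())   # clearly > 0
for c in (True, False):
    m = C == c                                             # condition on C
    print(f"given C={c}:", (A[m] & B[m]).mean() - A[m].mean() * B[m].mean())
```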
