Featured Research

Other Statistics

Statistical methods research done as science rather than mathematics

This paper is about how we study statistical methods. As an example, it uses the random regressions model, in which the intercept and slope of cluster-specific regression lines are modeled as a bivariate random effect. Maximizing this model's restricted likelihood often gives a boundary value for the random effect correlation or variances. We argue that this is a problem; that it is a problem because our discipline has little understanding of how contemporary models and methods map data to inferential summaries; that we lack such understanding, even for models as simple as this, because of a near-exclusive reliance on mathematics as a means of understanding; and that math alone is no longer sufficient. We then argue that as a discipline, we can and should break open our black-box methods by mimicking the five steps that molecular biologists commonly use to break open Nature's black boxes: design a simple model system, formulate hypotheses using that system, test them in experiments on that system, iterate as needed to reformulate and test hypotheses, and finally test the results in an "in vivo" system. We demonstrate this by identifying conditions under which the random-regressions restricted likelihood is likely to be maximized at a boundary value. Resistance to this approach seems to arise from a view that it lacks the certainty or intellectual heft of mathematics, perhaps because simulation experiments in our literature rarely do more than measure a new method's operating characteristics in a small range of situations. We argue that such work can make useful contributions including, as in molecular biology, the findings themselves and sometimes the designs used in the five steps; that these contributions have as much practical value as mathematical results; and that therefore they merit publication as much as the mathematical results our discipline esteems so highly.
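
As a toy illustration of the kind of simulation experiment advocated above, the sketch below generates data from a random-regressions model and counts how often REML drives a random-effect variance or the correlation to a boundary value. The design (10 clusters of 5 observations, true correlation 0.3) and the use of statsmodels are my own illustrative assumptions, not the paper's experiment.

```python
# Sketch: how often does REML put the random-effect correlation or a
# variance at the boundary in a random-regressions (random intercept
# and slope) model? All design values here are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_clusters, n_per, n_reps = 10, 5, 100
boundary = 0

for rep in range(n_reps):
    X_list, y_list, g_list = [], [], []
    for g in range(n_clusters):
        # Cluster-specific intercept/slope deviations: bivariate normal, corr 0.3.
        a, b = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]])
        x = rng.uniform(-1, 1, n_per)
        y = (2.0 + a) + (1.0 + b) * x + rng.normal(0, 1, n_per)
        X_list.append(x)
        y_list.append(y)
        g_list.extend([g] * n_per)
    X = sm.add_constant(np.concatenate(X_list))  # fixed intercept + slope
    y = np.concatenate(y_list)
    try:
        fit = sm.MixedLM(y, X, groups=np.asarray(g_list), exog_re=X).fit(reml=True)
    except Exception:
        continue  # occasional numerical failures in small samples
    v = fit.cov_re.values  # estimated random-effect covariance matrix
    if min(v[0, 0], v[1, 1]) < 1e-6:
        boundary += 1  # a variance collapsed to ~0
    elif abs(v[0, 1]) / np.sqrt(v[0, 0] * v[1, 1]) > 0.999:
        boundary += 1  # correlation pinned at +/-1

print(f"boundary REML estimates: {boundary}/{n_reps}")
```

Varying the number of clusters, observations per cluster, or error variance in a design like this is one way to probe the conditions under which boundary estimates become likely, in the spirit of the five steps described above.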

Other Statistics

Statistical testing in a Linear Probability Space

Imagine that you could calculate posttest probabilities, i.e., apply Bayes' theorem, with simple addition. This is possible if we stop thinking of probabilities as ranging from 0 to 1.0. There is a naturally occurring linear probability space when data are transformed into the logarithm of the odds (log10 odds). In this space, probabilities are replaced by W (Weight), where W = log10(probability / (1 - probability)). I would like to argue for the multiple benefits of performing statistical testing in a linear probability space: 1) Statistical testing is accurate in linear probability space but not in other spaces. 2) Effect size, called Impact (I), is the difference in means between two treatments (I = W_mean2 - W_mean1). 3) Bayes' theorem is simply W_posttest = W_pretest + I_test. 4) Significance (the p value) is replaced by Certainty (C), which is the W of the p value. Methods to transform data into and out of linear probability space are described.
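
To make the proposed arithmetic concrete, here is a minimal sketch of the transformation and its inverse; the function names and numbers are mine, for illustration, and are not taken from the paper.

```python
# Sketch of the proposed transformation: map probabilities to
# W = log10(p / (1 - p)), do additive arithmetic there, and map back.
import math

def to_w(p):
    """Probability -> W (base-10 log odds)."""
    return math.log10(p / (1.0 - p))

def from_w(w):
    """W (base-10 log odds) -> probability."""
    return 10.0 ** w / (1.0 + 10.0 ** w)

# Bayes' theorem becomes addition: W_posttest = W_pretest + I_test.
w_pretest = to_w(0.20)   # 20% pretest probability
i_test = 1.0             # illustrative test impact, in W units
w_posttest = w_pretest + i_test
print(round(from_w(w_posttest), 3))  # posttest probability ~0.714
```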

Other Statistics

Statistical witchhunts: Science, justice & the p-value crisis

We provide accessible insight into the current 'replication crisis' in 'statistical science' by revisiting the old metaphor of 'court trial as hypothesis test'. Inter alia, we define and diagnose harmful statistical witch-hunting in both justice and science, which extends to the replication crisis itself, where a hunt on p-values is currently underway.

Other Statistics

Statistics Educational Challenge in the 21st Century

What do we teach and what should we teach? An honest answer to this question is painful, very painful: what we teach lags decades behind what we practice. How can we reduce this 'gap' to prepare a data science workforce of trained next-generation statisticians? This is a challenging open problem that requires many well-thought-out experiments before finding the secret sauce. My goal in this article is to lay out some basic principles and guidelines (rather than creating a pseudo-curriculum based on cherry-picked topics) to expedite this process of finding an 'objective' solution.

Other Statistics

Statistics students' identification of inferential model elements within contexts of their own invention

Statistical thinking partially depends upon an iterative process by which essential features of a problem setting are identified and mapped onto an abstract model or archetype, and then translated back into the context of the original problem setting (Wild and Pfannkuch 1999). Assessment in introductory statistics often relies on tasks that present students with data in context and expect them to choose and describe an appropriate model. This study explores post-secondary student responses to an alternative task that prompts students to clearly identify a sample, population, statistic, and parameter using a context of their own invention. The data include free-text narrative responses from a random sample of 500 students, drawn from more than 1600 introductory statistics students. Results suggest that students' responses often portrayed sample and population accurately. Portrayals of statistic and parameter were less reliable and were associated with descriptions of a wide variety of other concepts. Responses frequently attributed a variable of some kind to the statistic, or a study design detail to the parameter. Implications for instruction and research are discussed, including a call for emphasis on a modeling paradigm in introductory statistics.

Other Statistics

Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones [Statistics and Big Data Analytics; Data Volume, Attack of the Clones]

This article assumes the reader already has the skills and expertise of a statistician in unsupervised (NMF, k-means, SVD) and supervised learning (regression, CART, random forests). What skills and knowledge must a statistician acquire to reach the "Volume" scale of big data? After a quick overview of the different strategies available, especially those imposed by Hadoop, the algorithms of some available learning methods are outlined in order to understand how they are adapted to the severe constraints of the Map-Reduce framework.
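
To illustrate the kind of adaptation the article surveys, here is a minimal sketch of how one iteration of k-means decomposes into a map step and a reduce step. The pure-Python mapper/reducer stand-ins are my own simplification; a real Hadoop job would express the same logic through its own API.

```python
# Sketch: one k-means iteration as a Map-Reduce pass. Mappers emit
# (nearest-centroid-id, (point, 1)); reducers sum and average per key.
import numpy as np

def mapper(points, centroids):
    """Map step: assign each point to its nearest centroid."""
    for p in points:
        k = int(np.argmin(((centroids - p) ** 2).sum(axis=1)))
        yield k, (p, 1)

def reducer(pairs, n_clusters, dim):
    """Reduce step: sum points and counts per key, then average."""
    sums = np.zeros((n_clusters, dim))
    counts = np.zeros(n_clusters)
    for k, (p, c) in pairs:
        sums[k] += p
        counts[k] += c
    return sums / np.maximum(counts, 1)[:, None]  # updated centroids

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 2))
centroids = data[:3].copy()
for _ in range(10):  # each iteration is a full Map-Reduce pass over the data
    centroids = reducer(mapper(data, centroids), n_clusters=3, dim=2)
print(centroids)
```

Note that every iteration requires a complete pass over the data, which is precisely the kind of constraint that makes iterative learning algorithms awkward under Map-Reduce.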

Other Statistics

Stem-ming the Tide: Predicting STEM attrition using student transcript data

Science, technology, engineering, and math (STEM) fields play growing roles in national and international economies by driving innovation and generating high-salary jobs. Yet the US is lagging behind other highly industrialized nations in terms of STEM education and training. Furthermore, many economic forecasts predict a rising shortage of domestic STEM-trained professionals in the US for years to come. One potential solution to this deficit is to decrease the rates at which students leave STEM-related fields in higher education, as currently over half of all students intending to graduate with a STEM degree eventually attrite. However, little quantitative research at scale has looked at causes of STEM attrition, let alone used machine learning to examine how well this phenomenon can be predicted. In this paper, we detail our efforts to model and predict dropout from STEM fields using one of the largest known datasets used for research on students in a traditional campus setting. Our results suggest that attrition from STEM fields can be accurately predicted with data that are routinely collected at universities, using only information from students' first academic year. We also propose a method to model student STEM intentions for each academic term to better understand the timing of STEM attrition events. We believe these results show great promise for using machine learning to improve STEM retention in traditional and non-traditional campus settings.
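
As a rough sketch of the prediction task described above (not the authors' data or model), the following fits a simple classifier to synthetic first-year transcript features; the feature names, effect sizes, and choice of logistic regression are invented for illustration.

```python
# Sketch: predict STEM attrition from synthetic first-year transcript
# features. All features and the data-generating effects are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
gpa_stem = rng.uniform(0.0, 4.0, n)    # first-year GPA in STEM courses
credits_stem = rng.integers(0, 30, n)  # first-year STEM credits attempted
withdrawals = rng.poisson(0.5, n)      # first-year course withdrawals

# Synthetic ground truth: lower STEM GPA and more withdrawals -> more attrition.
logit = 2.0 - 1.2 * gpa_stem + 0.6 * withdrawals - 0.03 * credits_stem
attrite = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([gpa_stem, credits_stem, withdrawals])
X_tr, X_te, y_tr, y_te = train_test_split(X, attrite, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```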

Other Statistics

Sterrett Procedure for the Generalized Group Testing Problem

Group testing is a useful method with broad applications in medicine, engineering, and even airport security control. Consider a finite population of N items, where item i has probability p_i of being defective. The goal is to identify all items by means of group testing; this is the generalized group testing problem. The optimal procedure, with respect to the expected total number of tests, is unknown even in the case when all p_i are equal. Hwang (1975) proved that an ordered partition (with respect to the p_i) is optimal for the Dorfman procedure (procedure D) and obtained an optimal solution (i.e., an optimal partition) by dynamic programming. In this paper, we investigate the Sterrett procedure (procedure S). We provide a closed-form expression for the expected total number of tests, which allows us to find the optimal arrangement of the items within a particular group. We also show that an ordered partition is not optimal for procedure S, or even for a slightly modified Dorfman procedure (procedure D′). This discovery implies that finding an optimal procedure S appears to be a hard computational problem. However, by using an optimal ordered partition for all procedures, we show that procedure D′ is uniformly better than procedure D and, based on numerical comparisons, that procedure S is uniformly and significantly better than procedures D and D′.
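
For orientation, the sketch below computes the expected total number of tests under the classical Dorfman procedure D for a given partition, which is the quantity these procedures compete on; the closed-form Sterrett expressions derived in the paper are more involved and are not reproduced here. The example probabilities are illustrative.

```python
# Sketch: expected number of tests under the Dorfman procedure D.
# Each group gets one pooled test; if it is positive, every item in the
# group is retested individually.
import math

def dorfman_expected_tests(groups):
    """groups: list of lists of defect probabilities p_i."""
    total = 0.0
    for g in groups:
        if len(g) == 1:
            total += 1.0  # a singleton is just tested directly
        else:
            p_all_good = math.prod(1.0 - p for p in g)
            total += 1.0 + len(g) * (1.0 - p_all_good)
    return total

# Ordered partition (items grouped after sorting by p_i) vs a mixed one.
p = [0.01, 0.02, 0.05, 0.10, 0.15, 0.20]
ordered = [p[:3], p[3:]]
mixed = [[0.01, 0.10, 0.20], [0.02, 0.05, 0.15]]
print(dorfman_expected_tests(ordered))  # ~3.40
print(dorfman_expected_tests(mixed))    # ~3.49, worse, as Hwang's result suggests
```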

Other Statistics

Stop the tests: Opinion bias and statistical tests

When statisticians quarrel about hypothesis testing, the debate usually focuses on which method is the correct one. The fundamental question of whether we should test hypotheses at all tends to be forgotten. This lack of debate has its roots in our desire to have ideas that we believe in and defend. But cognitive experiments have shown that, when we choose ideas, we become prey to a large number of biases. Several of our biases can be grouped together under a single description, an opinion bias. This opinion bias is nothing more than our desire to believe in something and to defend it. Also, despite our feelings, believing has no solid logical or philosophical grounds. In this paper, I will show that if we combine the fact that even logic can never prove an idea right or wrong with the problems our brains cause when we pick ideas, hypothesis testing and its terminology are a recipe for disaster. Testing should have no place when we are thinking about hypotheses.

Other Statistics

Studies on properties and estimation problems for modified extension of exponential distribution

The present paper considers a modified extension of the exponential distribution with three parameters. We study the main properties of this new distribution, with special emphasis on its median, mode, and moments, and on some characteristics related to reliability studies. For the modified extension of the exponential distribution (MEXED), we obtain Bayes estimators of the scale and shape parameters using Lindley's approximation (L-approximation) under a squared error loss function. However, this approximation technique does not yield interval estimates of the parameters, so we also propose a Gibbs sampling method to generate samples from the posterior distribution. On the basis of the generated posterior sample, we compute Bayes estimates of the unknown parameters and construct 95% highest posterior density credible intervals. A Monte Carlo simulation study is carried out to compare the performance of the Bayes estimators with the corresponding classical estimators in terms of their simulated risk. A real data set is considered for illustration.
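
As a sketch of the post-sampling step described above: given posterior draws of a parameter, the Bayes estimate under squared error loss is the posterior mean, and a 95% HPD interval is the shortest interval covering 95% of the sorted draws. The synthetic gamma draws below stand in for an actual MEXED posterior sample, which would require the paper's density and Gibbs sampler.

```python
# Sketch: Bayes estimate (posterior mean) and 95% HPD interval from
# posterior draws, using the shortest-interval method on sorted samples.
import numpy as np

def bayes_estimate_and_hpd(draws, cred=0.95):
    draws = np.sort(np.asarray(draws))
    n = len(draws)
    m = int(np.floor(cred * n))
    # Shortest interval containing m of the n sorted draws.
    widths = draws[m:] - draws[: n - m]
    i = int(np.argmin(widths))
    return draws.mean(), (draws[i], draws[i + m])

# Synthetic stand-in for a Gibbs-sampled posterior of a positive parameter.
draws = np.random.default_rng(0).gamma(shape=2.0, scale=1.5, size=10_000)
est, (lo, hi) = bayes_estimate_and_hpd(draws)
print(f"Bayes estimate {est:.3f}, 95% HPD ({lo:.3f}, {hi:.3f})")
```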

