Featured Research

Other Statistics

Do Reichenbachian Common Cause Systems of Arbitrary Finite Size Exist?

The principle of common cause asserts that positive correlations between causally unrelated events ought to be explained through the action of some shared causal factors. Reichenbachian common cause systems are probabilistic structures aimed at accounting for cases where correlations of this sort cannot be explained through the action of a single common cause. The existence of Reichenbachian common cause systems of arbitrary finite size for each pair of correlated but causally unrelated events was allegedly demonstrated by Hofer-Szabó and Rédei in 2006. This paper shows that their proof is logically deficient, and we propose an improved proof.
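
For context, a sketch of the standard definition (following Hofer-Szabó and Rédei, as usually stated; a reference sketch, not material from the paper itself): a partition {C_1, ..., C_n} of the event space is a Reichenbachian common cause system of size n for a correlated pair (A, B) when every cell screens off the correlation and distinct cells are statistically relevant in the same direction for both events.

```latex
% Correlation to be explained:
P(A \cap B) > P(A)\,P(B)
% Screening off: each cell renders A and B independent:
P(A \cap B \mid C_i) = P(A \mid C_i)\,P(B \mid C_i), \qquad i = 1, \dots, n
% Statistical relevance: distinct cells differ in the same direction for A and B:
\bigl[ P(A \mid C_i) - P(A \mid C_j) \bigr]\bigl[ P(B \mid C_i) - P(B \mid C_j) \bigr] > 0, \qquad i \neq j
```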

Other Statistics

Does preregistration improve the credibility of research findings?

Preregistration entails researchers registering their planned research hypotheses, methods, and analyses in a time-stamped document before they undertake their data collection and analyses. This document is then made available with the published research report to allow readers to identify discrepancies between what the researchers originally planned to do and what they actually ended up doing. This historical transparency is supposed to facilitate judgments about the credibility of the research findings. The present article provides a critical review of 17 of the reasons behind this argument. The article covers issues such as HARKing, multiple testing, p-hacking, forking paths, optional stopping, researchers' biases, selective reporting, test severity, publication bias, and replication rates. It is concluded that preregistration's historical transparency does not facilitate judgments about the credibility of research findings when researchers provide contemporary transparency in the form of (a) clear rationales for current hypotheses and analytical approaches, (b) public access to research data, materials, and code, and (c) demonstrations of the robustness of research conclusions to alternative interpretations and analytical approaches.

Other Statistics

Does the specification of uncertainty hurt the progress of scientometrics?

In "Caveats for using statistical significance tests in research assessments,"--Journal of Informetrics 7(1)(2013) 50-62, available at arXiv:1112.2516 -- Schneider (2013) focuses on Opthof & Leydesdorff (2010) as an example of the misuse of statistics in the social sciences. However, our conclusions are theoretical since they are not dependent on the use of one statistics or another. We agree with Schneider insofar as he proposes to develop further statistical instruments (such as effect sizes). Schneider (2013), however, argues on meta-theoretical grounds against the specification of uncertainty because, in his opinion, the presence of statistics would legitimate decision-making. We disagree: uncertainty can also be used for opening a debate. Scientometric results in which error bars are suppressed for meta-theoretical reasons should not be trusted.

Other Statistics

Dynamic Data in the Statistics Classroom

The call for using real data in the classroom has long meant using datasets which are culled, cleaned, and wrangled before any student works with the observations. However, an important part of teaching statistics should include actually retrieving data from the Internet. Nowadays, many different data sources are continually updated by the organizations hosting them. The R tools for downloading such dynamic data have improved to the point that accessing the data is possible even in an introductory statistics class. We provide five full analyses of dynamic data, as well as an additional nine sources of dynamic data that can be brought into the classroom. The goal of our work is to demonstrate that using dynamic data can have a short learning curve, even for introductory students or faculty unfamiliar with the landscape. The examples provided are unlikely to create expert data scrapers, but they should help motivate students and faculty toward more engaged use of online data sources.
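
The paper's examples use R tools; as a language-neutral illustration of how little code "dynamic data" requires, here is a minimal Python sketch that re-fetches a live CSV on every run. The URL is a hypothetical placeholder, not one of the paper's nine sources.

```python
# Minimal sketch of pulling continually updated data from the web.
# DATA_URL is a hypothetical placeholder; substitute any live CSV endpoint.
import pandas as pd

DATA_URL = "https://example.org/daily_measurements.csv"

df = pd.read_csv(DATA_URL)  # each run fetches the current version of the data
print(df.shape)             # the row count grows as the host updates the file
print(df.describe())        # quick numeric summary for classroom discussion
```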

Other Statistics

Dynamic Question Ordering in Online Surveys

Online surveys have the potential to support adaptive questions, where later questions depend on earlier responses. Past work has taken a rule-based approach, applied uniformly across all respondents. We envision a richer interpretation of adaptive questions, which we call dynamic question ordering (DQO), where the question order is personalized. Such an approach could increase engagement, and therefore response rates, as well as imputation quality. We present a DQO framework to improve survey completion and imputation. In the general survey-taking setting, we want to maximize survey completion, and so we focus on ordering questions to engage the respondent and collect, ideally, all of the information, or at least the information that best characterizes the respondent, so that imputations are accurate. In another scenario, our goal is to provide a personalized prediction. Since it is possible to give reasonable predictions with only a subset of questions, we are not concerned with motivating users to answer every question. Instead, we want to order questions so as to obtain information that reduces prediction uncertainty, while not being too burdensome. We illustrate this framework with an example of providing energy estimates to prospective tenants. We also discuss DQO for national surveys and consider connections between our statistics-based question-ordering approach and cognitive survey methodology.
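
As a sketch of the prediction-oriented setting (an illustration, not the authors' algorithm): greedily ask next whichever question's plausible answers move the prediction the most. The `predict` function and the candidate answer sets below are hypothetical stand-ins.

```python
# Illustrative greedy question ordering: rank unanswered questions by how
# much the prediction varies over each question's plausible answers.
import statistics

def next_question(answers, unanswered, candidate_values, predict):
    """Return the unanswered question whose plausible answers spread the
    prediction the most; a wider spread means a more informative answer."""
    def spread(q):
        preds = [predict({**answers, q: v}) for v in candidate_values[q]]
        return statistics.pvariance(preds)
    return max(unanswered, key=spread)

# Usage: predicting a monthly energy bill from home features (toy model).
candidate_values = {"sqft": [500, 900, 1400], "occupants": [1, 2, 4]}
predict = lambda a: 0.1 * a.get("sqft", 900) + 15 * a.get("occupants", 2)
print(next_question({}, ["sqft", "occupants"], candidate_values, predict))
# -> "sqft": its plausible answers move the predicted bill the most
```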

Other Statistics

Dynamics of ternary statistical experiments with equilibrium state

We study scenarios for the dynamics of ternary statistical experiments, modeled by means of difference equations. The important features are a balance condition and the existence of a steady (equilibrium) state. We give a classification of the scenarios of the model's evolution, which differ significantly from one another depending on the domain of values of the model's basic parameters.
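
The abstract does not reproduce the paper's equations; the following is a purely illustrative ternary difference equation (a generic replicator-style reweighting, not the authors' model) showing how parameter values determine which steady state the frequency vector settles into.

```python
# Purely illustrative: iterate a ternary frequency vector p = (p1, p2, p3)
# under a replicator-style difference equation. The renormalization step
# plays the role of a balance condition: the frequencies always sum to 1.
def step(p, w):
    """One update: reweight each frequency by w[i], then renormalize."""
    q = [pi * wi for pi, wi in zip(p, w)]
    total = sum(q)
    return [qi / total for qi in q]

p = [0.2, 0.3, 0.5]
w = [1.05, 1.00, 0.95]  # the parameter domain determines the scenario
for _ in range(500):
    p = step(p, w)
print([round(pi, 3) for pi in p])  # -> [1.0, 0.0, 0.0], an equilibrium state
```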

Other Statistics

Econométrie et Machine Learning

Econometrics and machine learning seem to share a common goal: to construct a predictive model for a variable of interest using explanatory variables (or features). However, the two fields developed in parallel, creating two different cultures, to paraphrase Breiman (2001). The first built probabilistic models to describe economic phenomena. The second uses algorithms that learn from their mistakes, most often with the aim of classifying (sounds, images, etc.). Recently, however, learning models have proven to be more effective than traditional econometric techniques (at the price of reduced explanatory power), and, above all, they can handle much larger data. In this context, it becomes necessary for econometricians to understand what these two cultures are, what separates them, and, especially, what brings them closer together, in order to adopt the tools developed by the statistical learning community and integrate them into econometric models.

Other Statistics

Effective Degrees of Freedom: A Flawed Metaphor

To most applied statisticians, a fitting procedure's degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, it is often used to parameterize the bias-variance tradeoff in model selection. We argue that, contrary to folk intuition, model complexity and degrees of freedom are not synonymous and may correspond very poorly. We exhibit and theoretically explore various examples of fitting procedures for which the degrees of freedom is not monotonic in the model complexity parameter and can exceed the total dimension of the response space. Even in very simple settings, the degrees of freedom can exceed the dimension of the ambient space by an arbitrarily large amount. We show that the degrees of freedom of any non-convex projection method can be unbounded.
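
For reference, the degrees-of-freedom notion at issue is the standard covariance form for a fitting procedure \hat{y}(y) under homoskedastic Gaussian noise:

```latex
\mathrm{df}(\hat{y}) \;=\; \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \operatorname{Cov}\bigl(\hat{y}_i,\, y_i\bigr),
\qquad y \sim \mathcal{N}(\mu,\ \sigma^{2} I_n).
```

For a linear smoother \hat{y} = Hy this reduces to \mathrm{tr}(H), which is why the quantity is commonly read as a complexity measure; the article's point is that the covariance form can behave very differently once the fit is nonlinear.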

Other Statistics

Elements of the Kopula (eventological copula) theory

The concept of a Kopula (eventological copula), new to probability theory and eventology, is introduced. A theorem characterizing sets of events by their Kopula is proved; it serves as the eventological pre-image of Sklar's well-known theorem on copulas (1959). The Kopulas of doublets and triplets of events are given, as well as those of some N-sets of events.
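
For reference, the classical result to which the characterization is analogized, Sklar's theorem (1959):

```latex
% For any joint distribution H with margins F and G there is a copula C with
H(x, y) = C\bigl(F(x),\, G(y)\bigr),
% and C is unique on \mathrm{Ran}\,F \times \mathrm{Ran}\,G; in particular,
% it is unique when F and G are continuous.
```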

Other Statistics

Emanuel Parzen: A Memorial, and a Model With the Two Kernels That He Championed

Manny Parzen passed away in February 2016, and this article is written partly as a memorial and appreciation. Manny made important contributions to several areas, but the two that influenced me most were his contributions to kernel density estimation and to Reproducing Kernel Hilbert Spaces, the two kernels of the title. Some fond memories of Manny as a PhD advisor begin this memorial, followed by a discussion of Manny's influence on density estimation and RKHS methods. A picture gallery of trips comes next, followed by the technical part of the article. Here our goal is to show how risk models can be built using RKHS penalized likelihood methods where subjects have personal (sample) densities which can be used as attributes in such models.
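
For the record, the title's two kernels (standard formulas, not taken from the article itself):

```latex
% Kernel density estimate (Parzen, 1962), sample X_1, \dots, X_n, bandwidth h > 0:
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} \right)
% Reproducing property of an RKHS \mathcal{H} with kernel K:
f(x) = \langle f,\, K(x, \cdot) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H}
```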

