Featured Research

Other Statistics

Bringing Order to the Chaos in the Brickyard

An allegory published in 1963, titled Chaos in the Brickyard, spoke to the decline in the quality of research. In the intervening time, greater awareness of the issues has emerged, along with actions to improve research endeavors. Still, problems persist. This paper is intended to clarify some of the challenges, particularly with respect to quantitative research, and then to suggest ways to improve the quality of published research. The paper highlights where refinements in analytical techniques can feasibly be made and provides a guide to fundamental principles of data analysis in research.

Read more
Other Statistics

Bringing Visual Inference to the Classroom

In the classroom, we traditionally visualize inferential concepts using static graphics or interactive apps. For example, there is a long history of using apps to visualize sampling distributions. Recent developments in statistical graphics have created an opportunity to bring additional visualizations into the classroom to hone student understanding. Specifically, the lineup protocol for visual inference provides a framework that helps students see the difference between signal and noise by embedding a plot of the observed data in a field of null (noise) plots. Lineups have proven valuable in visualizing randomization/permutation tests, diagnosing models, and even conducting valid inference when distributional assumptions break down. This paper provides an overview of how the lineup protocol for visual inference can be used to hone understanding of key statistical topics throughout the statistics curriculum.
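
As a rough illustration of the protocol (a minimal sketch, not the authors' implementation; in R the nullabor package serves this role), the following Python snippet hides the observed scatterplot at a random position among permutation-null panels:

```python
# A minimal sketch of the lineup protocol, assuming a permutation null:
# the observed scatterplot is hidden at a random position among panels
# in which the y-variable has been permuted to destroy any signal.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)   # observed data with a weak signal

n_panels = 20
true_pos = rng.integers(n_panels)   # where the real plot is hidden

fig, axes = plt.subplots(4, 5, figsize=(10, 8), sharex=True, sharey=True)
for i, ax in enumerate(axes.flat):
    y_i = y if i == true_pos else rng.permutation(y)  # null: permute y
    ax.scatter(x, y_i, s=8)
    ax.set_title(str(i + 1), fontsize=8)
plt.tight_layout()
plt.show()
print(f"The observed data are in panel {true_pos + 1}")
```

If viewers can reliably pick out the true panel, the data carry signal distinguishable from noise; if not, the apparent structure is consistent with the null.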

Read more
Other Statistics

Building Communication Skills in a Theoretical Statistics Course

The traditional theoretical statistics course, which develops the theoretical underpinnings of the discipline (usually following a probability course), is undergoing near-continuous revision in the statistics community. In particular, recent versions of this course have incorporated more and more computation. We look at a different aspect of the revision: building student communication skills, in both written and verbal forms, so that students can demonstrate their ability to explain statistical concepts. Two separate projects are discussed, both undertaken by a class of 17 students in Spring 2015. The first project had a computational aspect (performed using R), a statistical theory component, and a writing component, and was based on the historical German tank problem. The second project involved a class presentation and a written report summarizing, critiquing, and/or explaining an article selected from The American Statistician.
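
For readers unfamiliar with the German tank problem mentioned above, here is a minimal simulation sketch (the parameters are illustrative, not the course project's actual specification) comparing the maximum-likelihood estimator with the classical unbiased estimator:

```python
# German tank problem: given k serial numbers drawn without replacement
# from 1..N, estimate N. The MLE is the sample maximum m (biased low);
# the minimum-variance unbiased estimator is m + m/k - 1.
import numpy as np

rng = np.random.default_rng(0)
N, k, reps = 1000, 5, 20_000

mle, mvue = [], []
for _ in range(reps):
    sample = rng.choice(np.arange(1, N + 1), size=k, replace=False)
    m = sample.max()
    mle.append(m)                  # E[m] = k(N+1)/(k+1), about 834 here
    mvue.append(m + m / k - 1)     # unbiased: mean is about 1000

print("MLE  mean:", np.mean(mle))
print("MVUE mean:", np.mean(mvue))
```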

Read more
Other Statistics

CMS Sematrix: A Tool to Aid the Development of Clinical Quality Measures (CQMs)

As part of the effort to improve quality and to reduce national healthcare costs, the Centers for Medicare and Medicaid Services (CMS) is responsible for creating and maintaining an array of clinical quality measures (CQMs) for assessing healthcare structure, process, outcome, and patient experience across various conditions, clinical specialties, and settings. The development and maintenance of CQMs involves substantial and ongoing evaluation of the evidence on a measure's properties: importance, reliability, validity, feasibility, and usability. As such, CMS conducts monthly environmental scans of the published clinical and health services literature. Conducting time-consuming, exhaustive evaluations of the ever-changing healthcare literature presents one of the largest challenges to an evidence-based approach to healthcare quality improvement. Thus, it is imperative to leverage automated techniques, such as CMS Sematrix, to aid CMS in the identification of clinical and health services literature relevant to CQMs. The estimated labor and cost savings of using CMS Sematrix compared with a traditional literature review are roughly 818 hours and 122,000 dollars for a single monthly environmental scan.
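
The abstract does not describe the internals of CMS Sematrix, so as a generic, hypothetical illustration of automated literature screening, here is a minimal relevance classifier built from TF-IDF features and logistic regression (all texts and labels are invented):

```python
# A generic sketch of automated literature triage (not CMS Sematrix's
# actual method, which the abstract does not specify): rank new
# abstracts by predicted relevance to clinical quality measures.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled abstracts from a prior environmental scan
train_texts = ["readmission rates after cardiac surgery",
               "celebrity news and gossip roundup",
               "validity of a patient experience survey measure",
               "weekend sports scores and highlights"]
train_labels = [1, 0, 1, 0]   # 1 = relevant to a CQM, 0 = not

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

new_texts = ["reliability of a hospital outcome measure"]
print(model.predict_proba(new_texts)[:, 1])  # relevance score for triage
```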

Read more
Other Statistics

Can everyday AI be ethical? Fairness of Machine Learning Algorithms

Combining big data and machine learning algorithms, the power of automatic decision tools induces as much hope as fear. Recently enacted European legislation (the GDPR) and French laws attempt to regulate the use of these tools. Leaving aside the well-identified problems of data confidentiality and impediments to competition, we focus on the risks of discrimination, the problems of transparency, and the quality of algorithmic decisions. The detailed perspective of the legal texts, set against the complexity and opacity of learning algorithms, reveals the need for significant technological disruptions for the detection or reduction of discrimination risk, and for addressing the right to obtain an explanation of an automatic decision. Since the trust of developers and, above all, of users (citizens, litigants, customers) is essential, algorithms exploiting personal data must be deployed within a strict ethical framework. In conclusion, to answer this need, we list some forms of control to be developed: institutional control, an ethical charter, and external audits attached to the issuing of a label.
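
As one concrete, minimal example of the kind of discrimination-risk check alluded to above (my illustration, not the paper's method), the following sketch computes the disparate impact ratio of algorithmic decisions between two groups:

```python
# Demographic parity check via the disparate impact ratio between two
# groups. The data and the 0.8 threshold (the US "four-fifths rule")
# are illustrative assumptions, not taken from the paper.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])  # algorithmic decisions
group  = np.array(["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"])

rate_a = y_pred[group == "A"].mean()   # favourable-outcome rate, group A
rate_b = y_pred[group == "B"].mean()   # favourable-outcome rate, group B
ratio = min(rate_a, rate_b) / max(rate_a, rate_b)

print(f"P(decision=1 | A) = {rate_a:.2f}, P(decision=1 | B) = {rate_b:.2f}")
print(f"disparate impact ratio = {ratio:.2f} (flag if below 0.8)")
```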

Read more
Other Statistics

Can rational choice guide us to correct de se beliefs?

Significant controversy remains about what constitutes correct self-locating beliefs in scenarios such as the Sleeping Beauty problem, with proponents on both the "halfer" and "thirder" sides. One natural approach to settling the issue consists in creating decision variants of the problem, determining what actions the various candidate beliefs prescribe, and assessing whether those actions are reasonable when we step back. Dutch book arguments are a special case of this approach, but other Sleeping Beauty games have also been constructed to make similar points. Building on a recent article (James R. Shaw. De se belief and rational choice. Synthese, 190(3):491-508, 2013), I show that in general we should be wary of such arguments, because unintuitive actions may result for reasons unrelated to the beliefs. On the other hand, I show that when we restrict our attention to additive games, a thirder will necessarily maximize her ex ante expected payout, while a halfer in some cases will not (assuming causal decision theory). I conclude that this does not necessarily settle the issue and speculate about what might.
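
To make the additive-game claim concrete, here is a minimal sketch of one such betting game (my own construction, not necessarily one of the paper's): at each awakening Beauty may pay price c for a ticket worth 1 if the coin landed tails (heads: one awakening; tails: two). Ex ante, always accepting yields 0.5(-c) + 0.5*2(1-c) = 1 - 1.5c, positive exactly when c < 2/3, the thirder's acceptance threshold; a halfer declines for c in (1/2, 2/3) and forgoes positive ex ante value.

```python
# Simulate the additive Sleeping Beauty bet described above (my own
# illustrative construction). A policy accepts the bet iff the price c
# is below that credence-implied threshold.
import numpy as np

rng = np.random.default_rng(0)

def ex_ante_profit(threshold, c, reps=100_000):
    """Mean total profit when Beauty accepts the bet iff c < threshold."""
    tails = rng.random(reps) < 0.5
    awakenings = np.where(tails, 2, 1)          # tails -> two awakenings
    accept = c < threshold
    payoff = np.where(tails, 1.0, 0.0) - c      # per-awakening profit
    return np.mean(accept * awakenings * payoff)

c = 0.6  # a price between the halfer (1/2) and thirder (2/3) thresholds
print("thirder:", ex_ante_profit(2/3, c))  # about 1 - 1.5*0.6 = 0.10
print("halfer :", ex_ante_profit(1/2, c))  # 0.0: declines, loses value
```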

Read more
Other Statistics

Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect

Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CIs), have been reported to be prone to dichotomous interpretation, especially within the null hypothesis significance testing framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo do not overlap, scientists tend to claim significant differences while often disregarding the magnitudes of, and absolute differences in, the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations, but their effectiveness has also been challenged. We ran two experiments with researchers experienced in statistical analysis to compare several alternative representations of confidence intervals, and we used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked respondents for their opinions on and preferences among the representation styles. Our results suggest that adding visual information to the classic CI representation can decrease the tendency towards dichotomous interpretation - measured as the 'cliff effect', the sudden drop in confidence around a p-value of 0.05 - compared with the classic CI visualization and a textual representation of the CI with p-values. All data and analyses are publicly available at this https URL.
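
As a hypothetical illustration of one alternative representation style in this line of work (not the paper's exact stimuli), the following sketch draws a classic 95% CI next to a gradient interval whose opacity fades with the sampling density around the estimate:

```python
# A classic, hard-edged 95% CI invites the "does it overlap zero?"
# dichotomy; a gradient interval de-emphasises the arbitrary cutoff.
# The estimate and standard error here are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mean, se = 1.0, 0.5
fig, ax = plt.subplots(figsize=(5, 2))

# Classic 95% CI as a solid bar
lo, hi = stats.norm.interval(0.95, loc=mean, scale=se)
ax.hlines(1.0, lo, hi, lw=4)

# Gradient CI: opacity proportional to the normal sampling density
xs = np.linspace(mean - 4 * se, mean + 4 * se, 200)
dens = stats.norm.pdf(xs, mean, se)
for x0, x1, d in zip(xs[:-1], xs[1:], dens):
    ax.hlines(0.5, x0, x1, lw=4, alpha=float(d / dens.max()))

ax.axvline(0, ls=":")
ax.set_yticks([0.5, 1.0])
ax.set_yticklabels(["gradient", "classic"])
plt.show()
```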

Read more
Other Statistics

Canadian Crime Rates in the Penalty Box

Over the 1962 to 2016 period, the Canadian violent crime rate has remained strongly correlated with National Hockey League (NHL) penalties. The Canadian property crime rate was similarly correlated with stolen base attempts in Major League Baseball (MLB). Of course, correlation does not imply causation or prove association; it is simply presented here as an observation. Curious readers might be tempted to conduct additional research and ask questions in order to enhance the conversation, transition away from a state of confusion, clarify the situation, prevent false attribution, and possibly solve a problem that economists call identification.

Read more
Other Statistics

Casting Multiple Shadows: High-Dimensional Interactive Data Visualisation with Tours and Embeddings

Non-linear dimensionality reduction (NLDR) methods such as t-distributed stochastic neighbour embedding (t-SNE) are ubiquitous in the natural sciences; however, their appropriate use is difficult because of their complex parameterisations, and analysts must make trade-offs in order to identify structure in the visualisation an NLDR technique produces. We present visual diagnostics for the pragmatic use of NLDR methods by combining them with a technique called the tour. A tour is a sequence of interpolated linear projections of multivariate data onto a lower-dimensional space. The sequence is displayed as a dynamic visualisation, allowing a user to see the shadows the high-dimensional data cast in a lower-dimensional view. By linking the tour to an NLDR view, we can preserve global structure and, through user interactions like linked brushing, observe where the NLDR view may be misleading. We present several case studies, from both simulations and single-cell transcriptomics, that show our approach is useful for cluster orientation tasks.
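
To make the idea of a tour concrete, here is a minimal sketch (not the authors' software, which provides proper geodesic interpolation between frames) that steps between random orthonormal 2-D projection bases of high-dimensional data:

```python
# A crude tour: interpolate between two random orthonormal projection
# frames, re-orthonormalising with QR at each step. Each projected
# dataset Y would be one animation frame of the dynamic visualisation.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))   # toy high-dimensional data (p = 10)

def random_frame(p, d=2):
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q                     # p x d orthonormal projection basis

A, B = random_frame(10), random_frame(10)
for t in np.linspace(0, 1, 5):
    F, _ = np.linalg.qr((1 - t) * A + t * B)  # interpolated frame
    Y = X @ F                                 # 500 x 2 "shadow" of X
    print(t, Y.shape)
```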

Read more
Other Statistics

Causal influence in linear response models

The intuition of causation is so fundamental that almost every research study in the life sciences refers to this concept. However, a widely accepted formal definition of causal influence between observables is still missing. In the framework of linear Langevin networks without feedback (linear response models), we develop a measure of causal influence based on a decomposition of information flows over time. We discuss its main properties and compare it with other information measures, such as transfer entropy. Finally, we outline some difficulties of extending it to a general definition of causal influence for complex systems.
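
The abstract does not give the measure's formula; as an adjacent, minimal illustration (not the authors' measure), the following sketch simulates a two-node linear Langevin system discretised to a VAR(1) with coupling x -> y, and computes Granger causality, which for Gaussian processes equals twice the transfer entropy:

```python
# Granger causality from one-step prediction errors: compare y's
# residual variance with and without x's past in the regression.
import numpy as np

rng = np.random.default_rng(3)
T, a, c = 50_000, 0.9, 0.4
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t-1] + rng.normal()               # driver node
    y[t] = a * y[t-1] + c * x[t-1] + rng.normal()  # driven by x

def resid_var(target, *preds):
    Z = np.column_stack(preds)
    beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return np.var(target - Z @ beta)

v_self = resid_var(y[1:], y[:-1])
v_full = resid_var(y[1:], y[:-1], x[:-1])
print("GC x->y:", np.log(v_self / v_full))   # > 0: x's past predicts y

v_self2 = resid_var(x[1:], x[:-1])
v_full2 = resid_var(x[1:], x[:-1], y[:-1])
print("GC y->x:", np.log(v_self2 / v_full2)) # ~ 0: no back-coupling
```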

Read more
