Featured Research

Other Statistics

InSilicoVA: A Method to Automate Cause of Death Assignment for Verbal Autopsy

Verbal autopsies (VAs) are widely used to provide cause-specific mortality estimates in developing-world settings where vital registration does not function well. VAs assign cause(s) to a death using information about the events leading up to the death, provided by caregivers. Typically, physicians read VA interviews and assign causes using their expert knowledge. Physician coding is often slow, and individual physicians bring bias to the coding process that results in non-comparable cause assignments. These problems significantly limit the utility of physician-coded VAs. A solution to both is an algorithmic approach that formalizes the cause-assignment process. This ensures that assigned causes are comparable and requires far fewer person-hours, so that cause assignment can be conducted quickly without disrupting the normal work of physicians. Peter Byass' InterVA method is the most widely used algorithmic approach to VA coding and is aligned with the WHO 2012 standard VA questionnaire. The statistical model underpinning InterVA can be improved: uncertainty needs to be quantified, and the link between the population-level cause-specific mortality fractions (CSMFs) and the individual-level cause assignments needs to be statistically rigorous. Addressing these theoretical concerns also provides an opportunity to create new software, written in modern languages, that can run on multiple platforms and be widely shared. Building on the overall framework pioneered by InterVA, our work creates a statistical model for automated VA cause assignment.
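
As a rough sketch of the hierarchical link such a model formalizes (our notation, not the paper's specification): each death with interview responses receives a posterior over causes, with the population-level CSMFs entering as the prior, and the CSMFs in turn must be consistent with the individual assignments.

```latex
% Illustrative notation only, not the paper's model specification.
% Individual-level assignment, with the CSMF \pi_c acting as the prior:
\[
  P(y_i = c \mid s_i) \;\propto\; P(s_i \mid y_i = c)\,\pi_c .
\]
% Population-level consistency: the CSMFs aggregate the individual posteriors:
\[
  \pi_c \;=\; \frac{1}{N} \sum_{i=1}^{N} P(y_i = c \mid s_i).
\]
```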

Read more
Other Statistics

Incertitudes et mesures (Uncertainties and Measurements)

An educational guide focused on the statistical treatment of measurement uncertainties. The conditions under which standard practices apply are detailed and made precise: mean values, the central limit theorem, and linear regression. The last two chapters are devoted to an introduction to Bayesian inference and a series of applied case studies: estimating a machine's failure date, eliminating background noise, and linear fitting with elimination of outliers.
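
As a minimal illustration of the first topic (our example, not taken from the guide): estimating a quantity and its standard uncertainty from repeated measurements, with the central limit theorem justifying the standard error of the mean.

```python
import numpy as np

# Hypothetical repeated measurements of the same quantity (arbitrary units).
measurements = np.array([9.82, 9.79, 9.85, 9.81, 9.78, 9.84])

mean = measurements.mean()
# Standard uncertainty of the mean: sample standard deviation (ddof=1)
# divided by sqrt(n). The central limit theorem makes the mean approximately
# Gaussian for independent measurements, so this interval is interpretable.
u_mean = measurements.std(ddof=1) / np.sqrt(len(measurements))

print(f"estimate = {mean:.3f} +/- {u_mean:.3f}")
```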

Read more
Other Statistics

Incomplete Reparameterizations and Equivalent Metrics

Reparameterizing a probabilistic system is common advice for improving the performance of a statistical algorithm like Markov chain Monte Carlo, even though in theory such reparameterizations should leave the system, and the performance of any algorithm, invariant. In this paper I show how the reparameterizations common in practice are only incomplete reparameterizations, which result in different interactions between a target probabilistic system and a given algorithm. I then consider how these changing interactions manifest in the context of Markov chain Monte Carlo algorithms defined on Riemannian manifolds. In particular, I show that any incomplete reparameterization is equivalent to modifying the metric geometry directly.
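
The canonical example of such a reparameterization (a standard illustration, not necessarily the one used in the paper) is the centered versus non-centered parameterization of a hierarchical Gaussian:

```latex
% Centered parameterization:
\[
  \theta \sim \mathcal{N}(\mu, \tau^2).
\]
% Non-centered parameterization: same joint distribution, different geometry:
\[
  \tilde{\theta} \sim \mathcal{N}(0, 1), \qquad \theta = \mu + \tau\,\tilde{\theta}.
\]
```

Both define the same probabilistic system, but a Markov chain Monte Carlo sampler interacts very differently with the two, which is the kind of changing interaction the paper formalizes.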

Read more
Other Statistics

Information vs. Uncertainty as the Foundation for a Science of Environmental Modeling

Information accounting provides a better foundation for hypothesis testing than does uncertainty quantification. A quantitative account of science is derived under this perspective that alleviates the need for epistemic bridge principles, solves the problem of ad hoc falsification criteria, and deals with verisimilitude by facilitating a general approach to process-level diagnostics. Our argument is that the well-known inconsistencies of both Bayesian and classical statistical hypothesis tests are due to the fact that probability theory is an insufficient logic of science. Information theory, as an extension of probability theory, is required to provide a complete logic on which to base quantitative theories of empirical learning. The organizing question in this case becomes not whether our theories or models are more or less true, or about how much uncertainty is associated with a particular model, but instead whether there is any information available from experimental data that might allow us to improve the model. This becomes a formal hypothesis test, provides a theory of model diagnostics, and suggests a new approach to building dynamical systems models.
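
As a sketch of the kind of formal test this suggests (our rendering, not the paper's exact formulation): if X denotes model inputs and R the model's residuals, the mutual information

```latex
\[
  I(X; R) \;=\; \sum_{x,\,r} p(x, r)\,\log \frac{p(x, r)}{p(x)\,p(r)}
\]
```

is zero only when the residuals carry no information about the inputs; a positive value indicates that the data contain information the model fails to exploit, i.e., that the model can be improved.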

Read more
Other Statistics

Inhomogeneous K-function for germ-grain models

In this paper, we propose a generalization of the inhomogeneous K-function for point processes to germ-grain models. We apply it to a sample of images of peripheral blood smears obtained from patients with sickle cell disease, in order to decide whether a sample belongs to the thin, thick, or morphological region.
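
For reference, the point-process quantity being generalized (the inhomogeneous K-function in the sense of Baddeley, Møller, and Waagepetersen) can be written, for a point process X with intensity function λ and any bounded Borel set B with |B| > 0, as:

```latex
\[
  K_{\mathrm{inhom}}(r)
  \;=\;
  \frac{1}{|B|}\,
  \mathbb{E}\!\left[
    \sum_{x \in X \cap B} \;\sum_{y \in X \setminus \{x\}}
    \frac{\mathbf{1}\{\lVert x - y \rVert \le r\}}{\lambda(x)\,\lambda(y)}
  \right].
\]
```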

Read more
Other Statistics

Integrating computing in the statistics and data science curriculum: Creative structures, novel skills and habits, and ways to teach computational thinking

Nolan and Temple Lang (2010) argued for the fundamental role of computing in the statistics curriculum. In the intervening decade the statistics education community has acknowledged that computational skills are as important to statistics and data science practice as mathematics. There remains a notable gap, however, between our intentions and our actions. In this special issue of the *Journal of Statistics and Data Science Education* we have assembled a collection of papers that (1) suggest creative structures to integrate computing, (2) describe novel data science skills and habits, and (3) propose ways to teach computational thinking. We believe that it is critical for the community to redouble our efforts to embrace sophisticated computing in the statistics and data science curriculum. We hope that these papers provide useful guidance for the community to move these efforts forward.

Read more
Other Statistics

Integrating data science ethics into an undergraduate major

We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for weaving ethics into the curriculum from top to bottom, as our majors progress from introductory courses to the senior capstone course, as well as from side to side, through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable, such that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We conclude with next steps and preliminary assessments.

Read more
Other Statistics

Inter-Rater: Software for analysis of inter-rater reliability by permutating pairs of multiple users

Inter-Rater quantifies the reliability between multiple raters who evaluate a group of subjects. It calculates the group-level statistic, Fleiss' kappa, and improves on existing software by keeping information about each user and quantifying how well each user agreed with the rest of the group. This is accomplished through permutations of user pairs. The software was written in Python, can be run on Linux, and the code is deposited in Zenodo and GitHub. It can be used to evaluate inter-rater reliability in systematic reviews, medical diagnosis algorithms, educational applications, and other settings.
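
The per-user idea can be sketched as follows (a minimal illustration of the approach, not the package's actual code; the data and function names here are ours): score each rater by their average pairwise agreement, using Cohen's kappa over all pairs of raters.

```python
from itertools import combinations
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa between two raters' labels for the same subjects."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.unique(np.concatenate([a, b]))
    po = np.mean(a == b)  # observed proportion of agreement
    # Chance agreement from the two raters' marginal label frequencies.
    pe = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)
    return (po - pe) / (1 - pe)

# Hypothetical ratings: rows = raters, columns = subjects.
ratings = np.array([
    [1, 0, 2, 1, 0],
    [1, 0, 2, 0, 0],
    [1, 1, 2, 1, 0],
])

# Score each rater by average pairwise kappa with every other rater.
scores = {i: [] for i in range(len(ratings))}
for i, j in combinations(range(len(ratings)), 2):
    k = cohen_kappa(ratings[i], ratings[j])
    scores[i].append(k)
    scores[j].append(k)
for i, ks in scores.items():
    print(f"rater {i}: mean pairwise kappa = {np.mean(ks):.2f}")
```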

Read more
Other Statistics

Interactive graphics for functional data analyses

Although there are established graphics that accompany the most common functional data analyses, generating these graphics for each dataset and analysis can be cumbersome and time-consuming. Often, the barriers to visualization inhibit useful exploratory data analyses and prevent the development of intuition for a method and its application to a particular dataset. The refund.shiny package was developed to address these issues for several of the most common functional data analyses. After conducting an analysis, the plot_shiny() function is used to generate an interactive visualization environment that contains several distinct graphics, many of which are updated in response to user input. These visualizations reduce the burden of exploratory analyses and can serve as a useful tool for communicating results to non-statisticians.
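
A typical session might look like the following (a usage sketch, not taken from the paper; it assumes the refund package's bundled DTI example data and the documented fpca.sc interface):

```r
# Usage sketch: fit a functional principal component analysis with refund,
# then launch the interactive display from refund.shiny.
library(refund)
library(refund.shiny)

fit <- fpca.sc(Y = DTI$cca)  # DTI is an example dataset shipped with refund
plot_shiny(fit)              # opens the interactive visualization environment
```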

Read more
Other Statistics

Introducing Bayesian Analysis with m&m's®: an active-learning exercise for undergraduates

We present an active-learning strategy for undergraduates that applies Bayesian analysis to candy-covered chocolate m&m's®. The exercise is best suited for small class sizes and tutorial settings, after students have been introduced to the concepts of Bayesian statistics. The exercise takes advantage of the non-uniform distribution of m&m's® colours, and the difference in distributions between candies made at two different factories. In this paper, we provide the intended learning outcomes, a lesson plan and step-by-step guide for instruction, and open-source teaching materials. We also suggest an extension to the exercise for the graduate level, which incorporates hierarchical Bayesian analysis.
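
The core computation has roughly this form (our notation; the uniform prior here is a placeholder, not necessarily the exercise's actual setup): with two candidate factories, known colour proportions for each factory, and observed colour counts in a bag,

```latex
% F ranges over the two factories, p_{c,F} is factory F's proportion of
% colour c, and n_c is the observed count of colour c in the bag.
\[
  P(F \mid \text{data})
  \;\propto\;
  P(F)\,\prod_{c} p_{c,F}^{\,n_c},
  \qquad
  P(F_1) = P(F_2) = \tfrac{1}{2}.
\]
```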

Read more
