Featured Research

Other Statistics

Data learning from big data

Technology is generating a huge and growing supply of observations of diverse nature. This big data is establishing data learning as a central scientific discipline. It includes the collection, storage, preprocessing, visualization and, essentially, statistical analysis of enormous batches of data. In this paper, we discuss the role of statistics regarding some of the issues raised by big data in this new paradigm and also propose the name data learning to describe all the activities that make it possible to obtain relevant knowledge from this new source of information.

Data scraping, ingestation, and modeling: bringing data from cars.com into the intro stats class

New tools have made it much easier for students to develop skills to work with interesting data sets as they begin to extract meaning from data. To fully appreciate the statistical analysis cycle, students benefit from repeated experiences collecting, ingesting, wrangling, and analyzing data and communicating results. How can we bring such opportunities into the classroom? We describe a classroom activity, originally developed by Danny Kaplan (Macalester College), in which students can expand upon statistical problem solving by hand-scraping data from cars.com, ingesting these data into R, then carrying out analyses of the relationships between price, mileage, and model year for a selected type of car.
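
As a rough illustration of the final analysis step, the sketch below fits an ordinary least-squares model of price on mileage and model year. The activity itself uses R; Python with NumPy is used here only for illustration. The six listings are hypothetical hand-entered values, not real cars.com data, and the variable names are assumptions of this sketch.

```python
import numpy as np

# Hypothetical hand-entered listings (NOT real cars.com data).
# Columns: price in USD, mileage in miles, model year.
listings = np.array([
    [18500, 22000, 2019],
    [16200, 41000, 2018],
    [14900, 58000, 2017],
    [12300, 76000, 2016],
    [10800, 98000, 2015],
    [19900, 15000, 2020],
], dtype=float)

price, mileage, year = listings.T

# Design matrix: intercept, mileage in thousands of miles, years since 2015.
X = np.column_stack([np.ones_like(price), mileage / 1000.0, year - 2015])

# Ordinary least squares fit of price on mileage and model year.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
fitted = X @ coef

# Share of price variance explained by the two predictors.
r_squared = 1 - np.sum((price - fitted) ** 2) / np.sum((price - price.mean()) ** 2)

print("intercept, per-1000-miles, per-year:", coef)
print("R^2:", r_squared)
```

With so few, strongly related observations the individual coefficients are unstable, which is itself a useful discussion point when students compare results across car types.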

Data-Mining Research in Education

As an interdisciplinary field, data mining (DM) is popular in education, especially for examining students' learning performance. It focuses on analyzing education-related data to develop models for improving learners' experiences and enhancing institutional effectiveness. DM thereby helps educational institutions provide high-quality education for their learners. The application of data mining in education, also known as educational data mining (EDM), makes it possible to better understand how students learn and to identify how to improve educational outcomes. The present paper is designed to justify the capabilities of data mining approaches in the field of education. The latest trends in EDM research are introduced in this review. Several specific algorithms, methods, and applications, along with gaps in the current literature and future insights, are discussed.

Data-driven discovery of coordinates and governing equations

The discovery of governing equations from scientific data has the potential to transform data-rich fields that lack well-characterized quantitative descriptions. Advances in sparse regression are currently enabling the tractable identification of both the structure and parameters of a nonlinear dynamical system from data. The resulting models have the fewest terms necessary to describe the dynamics, balancing model complexity with descriptive ability, and thus promoting interpretability and generalizability. This provides an algorithmic approach to Occam's razor for model discovery. However, this approach fundamentally relies on an effective coordinate system in which the dynamics have a simple representation. In this work, we design a custom autoencoder to discover a coordinate transformation into a reduced space where the dynamics may be sparsely represented. Thus, we simultaneously learn the governing equations and the associated coordinate system. We demonstrate this approach on several example high-dimensional dynamical systems with low-dimensional behavior. The resulting modeling framework combines the strengths of deep neural networks for flexible representation and sparse identification of nonlinear dynamics (SINDy) for parsimonious models. It is the first method of its kind to place the discovery of coordinates and models on an equal footing.
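
The sparse-regression step mentioned in the abstract can be sketched with sequentially thresholded least squares, a common way to fit SINDy models. This is a minimal illustration only: it omits the autoencoder entirely, assumes the coordinates are already known, and uses exact derivatives from a toy damped oscillator chosen for this example. The function names `library` and `stlsq`, the threshold, and the test system are assumptions of the sketch, not the authors' implementation.

```python
import numpy as np

def library(states):
    """Candidate terms [1, x, y, x^2, x*y, y^2] evaluated at each sample."""
    x, y = states[:, 0], states[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                # Refit only the terms that survived thresholding.
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Toy system (an assumption of this sketch): x' = -0.1 x + 2 y, y' = -2 x - 0.1 y,
# sampled along its closed-form trajectory from (x, y) = (1, 0).
t = np.linspace(0.0, 10.0, 500)
decay = np.exp(-0.1 * t)
states = np.column_stack([decay * np.cos(2 * t), -decay * np.sin(2 * t)])

# Exact derivatives from the known right-hand side; real data would need
# numerical differentiation and would be noisy.
dxdt = np.column_stack([
    -0.1 * states[:, 0] + 2.0 * states[:, 1],
    -2.0 * states[:, 0] - 0.1 * states[:, 1],
])

xi = stlsq(library(states), dxdt)
print(xi)  # rows: 1, x, y, x^2, x*y, y^2; columns: x', y'
```

Only the rows for x and y should remain nonzero, which is the sense in which the recovered model has the fewest terms needed to describe the dynamics.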

Dealing with multiple testing: To adjust or not to adjust

Multiple testing problems arise naturally in scientific studies because of the need to capture or convey more information with more variables. The literature is enormous, but the emphasis is primarily methodological, providing numerous methods with their mathematical justification and practical implementation. Our aim is to highlight the logical issues involved in the application of multiple testing adjustment.
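
To make concrete what is at stake in the decision to adjust, the sketch below compares unadjusted p-values with two standard adjustments, Bonferroni and Benjamini-Hochberg. The eight p-values are hypothetical and chosen only for illustration; the paper does not prescribe these particular methods or this data.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical p-values from eight tests (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
alpha = 0.05

n_raw = int((np.asarray(pvals) < alpha).sum())
n_bonf = int((bonferroni(pvals) < alpha).sum())
n_bh = int((benjamini_hochberg(pvals) < alpha).sum())

print("rejections: unadjusted", n_raw, "Bonferroni", n_bonf, "BH", n_bh)
```

On these values the number of rejections at the 0.05 level drops from five without adjustment to one under Bonferroni and two under Benjamini-Hochberg, so the choice of whether and how to adjust changes the conclusions drawn.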

Decision Making for Inconsistent Expert Judgments Using Negative Probabilities

In this paper we provide a simple random-variable example of inconsistent information, and analyze it using three different approaches: Bayesian, quantum-like, and negative probabilities. We then show that, at least for this particular example, both the Bayesian and the quantum-like approaches have less normative power than the negative-probability approach.

Defending the P-value

Attacks on the P-value are nothing new, but the recent attacks are increasingly serious. They come from more mainstream sources, with widening targets such as a call to retire significance testing altogether. While well meaning, I believe these attacks are nevertheless misdirected: they blame the P-value for the naturally tentative trial-and-error process of scientific discovery, and presume that banning the P-value would make the process cleaner and less error-prone. However tentative, skeptical scientists still have to form unambiguous opinions, proximately to move forward in their investigations and ultimately to present results to the wider community. For obvious reasons, they constantly need to balance false-positive and false-negative errors. How would banning the P-value or significance tests help in this balancing act? It seems trite to say that this balance will always depend on the relative costs or the trade-off between the errors. These costs are highly context specific, varying by area of application or by stage of investigation. A calibrated but tunable knob, such as that given by the P-value, is needed for controlling this balance. This paper presents detailed arguments in support of the P-value.

Degrees of Equivalence in a Key Comparison

For interlaboratory key comparisons, the CIPM proposed and recommended a data analysis procedure [1, 2, 3] that introduced the degrees of equivalence of the measurement standards of the participating laboratories, as well as the degrees of equivalence between each pair of laboratories, but did not give a corresponding clear and plausible measurement model. The authors of [4] offered possible measurement models for a given comparison and selected a suitable model after rigorously analyzing the expectation values of these degrees of equivalence. The systematic laboratory-effects model was then selected as the right one in this report. Those models were all based on the assumption that a single true value exists. However, in 2008 a new version of the International Vocabulary of Metrology (VIM) [7] was issued, in which the true value of a given measurement standard should now be regarded as multiple true values following a given statistical distribution. Using this view of the true values of a measurement standard together with the steps in [4], measurement models were developed and degrees of equivalence were analyzed. The results show that, even under the new definition, the systematic laboratory-effects model is still the reasonable one for a given key comparison.

Demographic perspectives in research on global environmental change

Human population is at the centre of research on global environmental change. On the one hand, population dynamics influence the environment and the global climate system through consumption-based carbon emissions. On the other hand, the health and wellbeing of the population are already being affected by climate change. Knowledge of population dynamics and population heterogeneity is thus fundamental to improving our understanding of how population size, composition and distribution influence global environmental change and how these changes affect subgroups of the population differentially by demographic characteristics and spatial distribution. Existing theoretical concepts and methodological tools in demography can be readily applied to the study of population and global environmental change. In the past couple of decades, demographic research has enriched climate change research both in the analysis of the impact of population dynamics on the global climate system and in the analysis of the impact of climate change on human population. What is missing in the literature is research that investigates how global environmental change affects current and future demographic processes and, consequently, population trends. If global environmental change does influence fertility, mortality and migration, the three key demographic components underlying population change, then population estimates and forecasts need to account for this climate feedback in population projections. Indisputably, this is a new area of research that directly requires expertise in population science and contributions from demographers.

Designing Modular Software: A Case Study in Introductory Statistics

Modular programming is a development paradigm that emphasizes self-contained, flexible, and independent pieces of functionality. This practice allows new features to be seamlessly added when desired, and unwanted features to be removed, thus simplifying the user-facing view of the software. The recent rise of web-based software applications has presented new challenges for designing an extensible, modular software system. In this paper, we outline a framework for designing such a system, with a focus on reproducibility of the results. As a case study, we present a Shiny-based web application called intRo, which allows the user to perform basic data analyses and statistical routines. Finally, we highlight some challenges we encountered, and how to address them, when combining modular programming concepts with reactive programming as used by Shiny.
