Featured Research

Other Statistics

Karl Pearson and the Logic of Science: Renouncing Causal Understanding (the Bride) and Inverted Spinozism

Karl Pearson is the leading figure of twentieth-century statistics. He and his co-workers crafted the core of the theory, methods and language of frequentist or classical statistics, the prevalent inductive logic of contemporary science. However, before working in statistics, K. Pearson had other interests in life, namely, in this order, philosophy, physics, and biological heredity. Key concepts of his philosophical and epistemological system of anti-Spinozism (a form of transcendental idealism) are carried over to his subsequent works on the logic of scientific discovery. This article's main goal is to analyze K. Pearson's early philosophical and theological ideas and to investigate how the same ideas came to influence contemporary science, either directly or indirectly, through the use of variant theories, methods and dialects of statistics, corresponding to variant statistical inference procedures and their specific belief calculi.

Key Factor Not to Drop Out is to Attend the Lecture

In addition to the results of the learning check tests given at each lecture, we examined further factors in order to find the key factors behind dropping out. Among them are the number of successes in the learning check tests and the number of attendances at the follow-up program classes. We found the following key factors to be strongly related to students at risk. 1) Badly failing students (final examination score of 0-39) tend to be absent from the regular classes, fail the learning check tests even when they attend, and are very reluctant to attend the follow-up program classes. 2) Successful students (final examination score of 60-100) attend classes and score well on every learning check test. 3) Students who fail, but not so badly (final examination score of 40-59), show features of both the 0-39 group and the 60-100 group. Therefore, attending the lectures is crucial in order not to drop out. Students who failed more than half of the learning check tests almost always failed the final examination, which could lead to dropping out. In addition, students who passed more than two thirds of the learning check tests scored better in the final examination.

Kill The Math and Let the Introductory Course Be Born

Our introductory classes in statistics and data science use too much mathematics. The key causal effect which our students want our classes to have is to improve their future performance and opportunities. The more professional their computing skills (in the context of data analysis), the greater their likely success. Introductory courses should feature almost no mathematical/statistical formulas beyond simple algebra.

$L_p$-nested symmetric distributions

Tractable generalizations of the Gaussian distribution play an important role for the analysis of high-dimensional data. One very general super-class of Normal distributions is the class of $\nu$-spherical distributions whose random variables can be represented as the product $\mathbf{x} = r\cdot \mathbf{u}$ of a uniformly distributed random variable $\mathbf{u}$ on the $1$-level set of a positively homogeneous function $\nu$ and an arbitrary positive radial random variable $r$. Prominent subclasses of $\nu$-spherical distributions are spherically symmetric distributions ($\nu(\mathbf{x})=\|\mathbf{x}\|_2$), which have been further generalized to the class of $L_p$-spherically symmetric distributions ($\nu(\mathbf{x})=\|\mathbf{x}\|_p$). Both of these classes contain the Gaussian as a special case. In general, however, $\nu$-spherical distributions are computationally intractable since, for instance, the normalization constant or fast sampling algorithms are unknown for an arbitrary $\nu$. In this paper we introduce a new subclass of $\nu$-spherical distributions by choosing $\nu$ to be a nested cascade of $L_p$-norms. This class is still computationally tractable, but includes all the aforementioned subclasses as special cases. We derive a general expression for $L_p$-nested symmetric distributions as well as the uniform distribution on the $L_p$-nested unit sphere, including an explicit expression for the normalization constant. We state several general properties of $L_p$-nested symmetric distributions, investigate their marginals and maximum likelihood fitting, and discuss their tight links to well-known machine learning methods such as Independent Component Analysis (ICA), Independent Subspace Analysis (ISA) and mixed norm regularizers. Finally, we derive a fast and exact sampling algorithm for arbitrary $L_p$-nested symmetric distributions, and introduce the Nested Radial Factorization algorithm (NRF), which is a form of non-linear ICA.
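To make the phrase "nested cascade of $L_p$-norms" concrete, the sketch below evaluates one such function on a small tree and checks positive homogeneity numerically. The tree encoding, the function name `lp_nested`, and the particular exponents are assumptions made for illustration only; they are not taken from the paper.

```python
def lp_nested(x, node):
    """Evaluate a nested cascade of L_p-norms at the point x.

    A node is either an int (a leaf: the index of one coordinate of x)
    or a pair (p, children), meaning the L_p-norm of the children's values.
    This encoding is a hypothetical choice made for this sketch.
    """
    if isinstance(node, int):
        return abs(x[node])
    p, children = node
    return sum(lp_nested(x, child) ** p for child in children) ** (1.0 / p)


# Hypothetical tree: (|x0|^2 + (|x1|^1.5 + |x2|^1.5)^(2/1.5))^(1/2)
tree = (2.0, [0, (1.5, [1, 2])])
x = [1.0, -2.0, 3.0]

value = lp_nested(x, tree)
scaled = lp_nested([3.0 * xi for xi in x], tree)

# Positive homogeneity: scaling the input by 3 scales the value by 3.
print(value, scaled / value)
```

With a single node over all coordinates the function reduces to the ordinary $L_p$-norm, which is how the $L_p$-spherically symmetric case appears as a special case.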

Le Her and Other Problems in Probability Discussed by Bernoulli, Montmort and Waldegrave

Part V of the second edition of Pierre Rémond de Montmort's Essay d'analyse sur les jeux de hazard, published in 1713, contains correspondence on probability problems between Montmort and Nicolaus Bernoulli. This correspondence begins in 1710. The last published letter, dated November 15, 1713, is from Montmort to Nicolaus Bernoulli. There is some discussion of the strategy of play in the card game Le Her and a bit of news that Montmort's friend Waldegrave in Paris was going to take care of the printing of the book. From earlier correspondence between Bernoulli and Montmort, it is apparent that Waldegrave had also analyzed Le Her and had come up with a mixed strategy as a solution. He had also suggested working on the "problem of the pool," or what is often called Waldegrave's problem. The Universitätsbibliothek Basel contains an additional forty-two letters between Bernoulli and Montmort written after 1713, as well as two letters between Bernoulli and Waldegrave. The letters are all in French, and here we provide translations of key passages. The trio continued to discuss probability problems, particularly Le Her, which was still under discussion when the Essay d'analyse went to print. We describe the probability content of this body of correspondence and put it in its historical context. We also provide a proper identification of Waldegrave based on manuscripts in the Archives nationales de France in Paris.

Learning as We Go: An Examination of the Statistical Accuracy of COVID19 Daily Death Count Predictions

This paper provides a formal evaluation of the predictive performance of a model (and its various updates) developed by the Institute for Health Metrics and Evaluation (IHME) for predicting daily deaths attributed to COVID19 for each state in the United States. The IHME models have received extensive attention in social and mass media, and have influenced policy makers at the highest levels of the United States government. For effective policy making, accurate assessment of uncertainty, as well as accurate point predictions, is necessary because the risks inherent in a decision must be taken into account, especially in the present setting of a novel disease affecting millions of lives. To assess the accuracy of the IHME models, we examine both forecast accuracy and the predictive performance of the 95% prediction intervals provided by the IHME models. We find that the initial IHME model substantially underestimates the uncertainty surrounding the number of daily deaths. Specifically, the true number of next-day deaths fell outside the IHME prediction intervals as much as 70% of the time, in comparison to the expected value of 5%. In addition, we note that the performance of the initial model does not improve with shorter forecast horizons. Regarding the updated models, our analyses indicate that the later models do not show any improvement in the accuracy of the point estimate predictions. In fact, there is some evidence that this accuracy has actually decreased relative to the initial models. Moreover, when considering the updated models, while we observe a larger percentage of states having actual values lying inside the 95% prediction intervals (PI), our analysis suggests that this observation may be attributed to the widening of the PIs. The width of these intervals calls into question the usefulness of the predictions to drive policy making and resource allocation.
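The two quantities the evaluation turns on, how often the true count lands inside a 95% prediction interval and how wide those intervals are, can be computed as in the sketch below. The function names and all numbers are made up for illustration; they are not IHME forecasts or observed death counts.

```python
def empirical_coverage(actual, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Made-up next-day values for four hypothetical states.
actual = [10, 25, 40, 31]
lower = [8, 10, 20, 33]
upper = [12, 20, 35, 50]

coverage = empirical_coverage(actual, lower, upper)
width = mean_interval_width(lower, upper)
print(coverage, width)
```

A well-calibrated 95% interval should give coverage near 0.95. Coverage can always be raised by widening the intervals, which is why the two numbers are best read together.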

Lessons from the German Tank Problem

During World War II the German army used tanks to devastating advantage. The Allies needed accurate estimates of their tank production and deployment. They used two approaches to find these values: spies, and statistics. This note describes the statistical approach. Assuming the tanks are labeled consecutively starting at 1, if we observe $k$ serial numbers from an unknown number $N$ of tanks, with the maximum observed value $m$, then the best estimate for $N$ is $m(1+1/k) - 1$. This is now known as the German Tank Problem, and is a terrific example of the applicability of mathematics and statistics in the real world. The first part of the paper reproduces known results, specifically deriving this estimate and comparing its effectiveness to that of the spies. The second part presents a result we have not found in print elsewhere, the generalization to the case where the smallest value is not necessarily 1. We emphasize in detail why we are able to obtain such clean, closed-form expressions for the estimates, and conclude with an appendix highlighting how to use this problem to teach regression and how statistics can help us find functional relationships.
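As a quick illustration of the estimate $m(1+1/k) - 1$, the sketch below applies it to one sample and then averages it over many simulated samples drawn without replacement from serial numbers $1, \dots, N$. The function names, the sample values, and the simulation settings are assumptions chosen for the example, not figures from the paper.

```python
import random


def estimate_total(serials):
    """Estimate N from observed serial numbers via m * (1 + 1/k) - 1."""
    k = len(serials)
    m = max(serials)
    return m * (1 + 1 / k) - 1


def average_estimate(n_tanks, k, trials, seed=0):
    """Average the estimate over repeated samples of k distinct serials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_tanks + 1), k)
        total += estimate_total(sample)
    return total / trials


# One hypothetical sample of four serial numbers: 60 * 1.25 - 1 = 74.
single = estimate_total([19, 40, 42, 60])

# With a true N of 300, the long-run average should sit close to 300.
average = average_estimate(n_tanks=300, k=5, trials=20000)
print(single, average)
```

Any single estimate can be far off, since only the largest serial number enters the formula; the averaging step only illustrates that the estimate is centered on the true $N$.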

Leveraging Big Data Analytics in Healthcare Enhancement: Trends, Challenges and Opportunities

Clinicians' decisions are becoming more and more evidence-based, meaning that in no other field is big data analytics as promising as in healthcare. Due to the sheer size and availability of healthcare data, big data analytics has revolutionized this industry and promises us a world of opportunities. It promises us the power of early detection, prediction and prevention, and helps us to improve the quality of life. Researchers and clinicians are working to ensure that big data has a positive impact on health in the future. Different tools and techniques are being used to analyze, process, accumulate, assimilate and manage large amounts of healthcare data in either structured or unstructured form. In this paper, we address the need for big data analytics in healthcare: why and how can it help to improve life? We present the emerging landscape of big data and analytical techniques in five sub-disciplines of healthcare, i.e., medical image analysis and imaging informatics, bioinformatics, clinical informatics, public health informatics and medical signal analytics. We present the different architectures, advantages and repositories of each discipline, which together draw an integrated depiction of how distinct healthcare activities are accomplished in the pipeline to serve individual patients from multiple perspectives. Finally, the paper ends with notable applications and challenges in the adoption of big data analytics in healthcare.

Likelihood-based solution to the Monty Hall puzzle and a related 3-prisoner paradox

The Monty Hall puzzle has been solved and dissected in many ways, but always using probabilistic arguments, so it is considered a probability puzzle. In this paper the puzzle is set up as an orthodox statistical problem involving an unknown parameter, a probability model and an observation. This means we can compute a likelihood function, and the decision to switch corresponds to choosing the maximum likelihood solution. One advantage of the likelihood-based solution is that the reasoning applies to a single game, unaffected by the future plan of the host. I also describe an earlier version of the puzzle in terms of three prisoners: two to be executed and one released. Unlike the goats and the car, these prisoners have consciousness, so they can think about exchanging punishments. When two of them do that, however, we have a paradox, where it is advantageous for both to exchange their punishment with each other. Overall, the puzzle and the paradox are useful examples of statistical thinking, so they are excellent teaching topics.
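The likelihood setup can be sketched as follows: the unknown parameter is the door hiding the car, and the observation is the door the host opens after the player's pick. The function names and the parameter `q` (the probability that the host opens the observed door when the car is behind the player's own pick) are assumptions for this illustration and may differ from the paper's notation.

```python
def likelihood(theta, pick, opened, q=0.5):
    """P(host opens `opened` | car behind `theta`, player chose `pick`).

    q is a hypothetical parameter: the chance the host opens this particular
    door when he is free to choose, i.e. when the car is behind `pick`.
    """
    if opened == pick or opened == theta:
        return 0.0  # the host never opens the player's door or the car's door
    if theta == pick:
        return q    # the host had two goat doors to choose from
    return 1.0      # the host was forced to open the only remaining goat door


def max_likelihood_door(pick, opened, q=0.5, doors=(1, 2, 3)):
    """Door with the largest likelihood given the observation."""
    return max(doors, key=lambda theta: likelihood(theta, pick, opened, q))


# The player picks door 1 and the host opens door 3.
values = [likelihood(theta, pick=1, opened=3) for theta in (1, 2, 3)]
best = max_likelihood_door(pick=1, opened=3)
print(values, best)
```

For any `q` below 1 the likelihood is largest at the unopened door the player did not pick, so the maximum likelihood choice is to switch; at `q = 1` the two remaining doors tie.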

Limits on Inferring the Past

Here we define and study the properties of retrodictive inference. We derive equations relating retrodiction entropy and thermodynamic entropy, and as a special case, show that under equilibrium conditions, the two are identical. We demonstrate relations involving the KL-divergence and retrodiction probability, and bound the time rate of change of retrodiction entropy. As a specific case, we invert various Langevin processes, inferring the initial condition of \(N\) particles given their final positions at some later time. We evaluate the retrodiction entropy for Langevin dynamics exactly for special cases, and find that one's ability to infer the initial state of a system can exhibit two possible qualitative behaviors depending on the potential energy landscape, either decreasing indefinitely, or asymptotically approaching a fixed value. We also study how well we can retrodict points that evolve based on the logistic map. We find singular changes in the retrodictivity near bifurcations. Counterintuitively, the transition to chaos is accompanied by maximal retrodictability.
