Featured Research

Other Statistics

Benford or not Benford: new results on digits beyond the first

In this paper, we will see that the proportion of d as the p-th digit, where p > 1 and d ∈ {0, ..., 9}, in data obtained through the model developed below, is more likely to follow a law whose probability distribution is determined by a specific upper bound than to follow the generalization of Benford's law to digits beyond the first. These probability distributions fluctuate around the theoretical values determined by Hill in 1995. Knowing the value of the upper bound beforehand can be a way to find a better-fitting law than Hill's.
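
For reference, the generalization of Benford's law to digits beyond the first (Hill, 1995) has a closed form against which such data can be compared. The following minimal Python sketch is our illustration, not code from the paper; it computes the probability that the p-th significant digit equals d:

```python
import math

def hill_pth_digit_prob(d, p):
    """Probability that the p-th significant digit equals d (p >= 2),
    under the generalization of Benford's law (Hill, 1995)."""
    return sum(math.log10(1 + 1 / (10 * k + d))
               for k in range(10 ** (p - 2), 10 ** (p - 1)))

# Second-digit probabilities decay slowly, from about 0.120 for d = 0
# down to about 0.085 for d = 9, far flatter than the first-digit law.
print([round(hill_pth_digit_prob(d, 2), 4) for d in range(10)])
```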

Other Statistics

Benford's law: A theoretical explanation for base 2

In this paper, we present a possible theoretical explanation for Benford's law. We develop a recursive relation between the probabilities, using simple intuitive ideas. We first use numerical solutions of this recursion and verify that the solutions converge to Benford's law. Finally, we solve the recursion analytically to yield Benford's law for base 2.
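
The abstract does not reproduce the recursion itself, but the limiting law is easy to check numerically. The sketch below is our own illustration under an assumption of our choosing (the geometric sequence 3^n, whose base-2 significands are known to equidistribute in the Benford sense); it verifies that P(significand ≤ s) ≈ log2(s) for s in [1, 2):

```python
import math

# Numerical check of Benford's law in base 2: for the sequence 3^n,
# the base-2 significand is 2 ** frac(n * log2(3)), which avoids
# computing the huge powers directly.
n_samples = 100_000
log2_3 = math.log2(3)
mantissas = [2 ** ((n * log2_3) % 1) for n in range(n_samples)]

for s in (1.25, 1.5, 1.75):
    empirical = sum(m <= s for m in mantissas) / n_samples
    print(f"P(m <= {s}): empirical {empirical:.4f}, Benford {math.log2(s):.4f}")
```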

Other Statistics

Bernoulli Trials With Skewed Propensities for Certification and Validation

The impetus for writing this paper is the well-publicized media reports that software failure was the cause of the two recent mishaps of the Boeing 737 Max aircraft. The problem considered here, though, is a specific one: it addresses the conditions under which an item such as a drug, a material specimen, or a complex system can be certified for use based on a large number of Bernoulli trials, all successful. More broadly, the paper is an attempt to answer an old and honorable philosophical question, namely, "when can empirical testing on its own validate a law of nature?" Our message is that the answer depends on what one starts with: what one's prior distribution is, what unknown this prior distribution endows, and what has been observed as data. The paper is expository in that it begins with a historical overview and ends with some new ideas and proposals for addressing the question posed. It also articulates Popper's notion of "propensity" and its role in providing a proper framework for Bayesian inference under Bernoulli trials, as well as the need to engage with posterior distributions that are subjectively specified, that is, without recourse to the usual Bayesian prior-to-posterior iteration.
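
The flavor of the question can be conveyed by the textbook Beta-Bernoulli calculation; this is a hedged sketch only, not the propensity-based framework or the subjectively specified posteriors the paper develops. Under a uniform Beta(1, 1) prior, n consecutive successes yield a Beta(n + 1, 1) posterior, so P(p > p0 | data) = 1 - p0^(n+1):

```python
def certify(n_successes, p0, confidence=0.99):
    """Under a uniform Beta(1, 1) prior, n consecutive successes give a
    Beta(n + 1, 1) posterior for the success propensity p, so the
    posterior probability that p exceeds p0 is 1 - p0 ** (n + 1).
    Certify the item once that probability reaches the required level."""
    posterior_prob = 1 - p0 ** (n_successes + 1)
    return posterior_prob >= confidence, posterior_prob

# How many all-success trials before we certify p > 0.999 at 99% credibility?
n = 0
while not certify(n, 0.999)[0]:
    n += 1
print(n)  # roughly 4,600 trials: purely empirical validation is expensive
```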

Other Statistics

Beyond subjective and objective in statistics

We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification.

Other Statistics

Biologists meet statisticians: A workshop for young scientists to foster interdisciplinary team work

Life science and statistics have necessarily become essential partners. The need to plan complex, structured experiments involving elaborate designs, and the need to analyse datasets in the era of systems biology and high-throughput technologies, have to build upon professional statistical expertise. On the other hand, conducting such analyses, and developing improved or new methods (including for novel kinds of data), has to build upon solid biological understanding and practice. However, the meeting of scientists of both fields is often hampered by a variety of communicative hurdles, rooted in field-specific working languages and cultural differences. As a step towards a better mutual understanding, we developed a workshop concept bringing together young experimental biologists and statisticians to work as pairs, learn to value each other's competences, and practise interdisciplinary communication in a casual atmosphere. The first implementation of our concept was a cooperation of the German Region of the International Biometric Society and the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures (short: DSMZ), Braunschweig, Germany. We collected feedback in the form of three questionnaires and oral comments, and gathered experiences for the improvement of this concept. The long-term challenge for both disciplines is the establishment of systematic schedules and strategic partnerships which use the proposed workshop concept to foster mutual understanding, to seed the necessary interdisciplinary cooperation network, and to start training the indispensable communication skills at the earliest possible phase of education.

Other Statistics

Boundary detection in disease mapping studies

In disease mapping, the aim is to estimate the spatial pattern in disease risk over an extended geographical region, so that areas with elevated risks can be identified. A Bayesian hierarchical approach is typically used to produce such maps; it models the risk surface with a set of spatially smooth random effects. However, in complex urban settings there are likely to be boundaries in the risk surface, which separate populations that are geographically adjacent but have very different risk profiles. Therefore, this paper proposes an approach for detecting such risk boundaries and tests its effectiveness by simulation. Finally, the model is applied to lung cancer incidence data in Greater Glasgow, Scotland, between 2001 and 2005.
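
As a toy illustration of what boundary detection means operationally (and not the paper's Bayesian hierarchical machinery), one can flag a boundary wherever two adjacent areas have markedly different estimated risks; the adjacency pairs, risk values, and threshold below are hypothetical:

```python
# Hypothetical post-hoc sketch: flag a boundary between two adjacent
# areas when their estimated relative risks differ by more than a
# chosen threshold.  All numbers here are illustrative assumptions.
adjacency = [(0, 1), (1, 2), (2, 3)]  # pairs of neighbouring areas
risk = [0.8, 0.9, 1.7, 1.6]           # estimated relative risks per area
threshold = 0.5                        # risk gap that defines a boundary

boundaries = [(i, j) for i, j in adjacency if abs(risk[i] - risk[j]) > threshold]
print(boundaries)  # [(1, 2)]: the jump from 0.9 to 1.7 is flagged
```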

Other Statistics

Bounds on Bayes Factors for Binomial A/B Testing

Bayes factors, in many cases, have been proven to bridge classic p-value based significance testing and Bayesian analysis of posterior odds. This paper discusses this phenomenon within the binomial A/B testing setup (applicable, for example, to conversion testing). It is shown that the Bayes factor is controlled by the Jensen-Shannon divergence of the success ratios in the two tested groups, which can be further bounded by the Welch statistic. As a result, Bayesian sample bounds almost match frequentist sample bounds. The link between the Jensen-Shannon divergence and Welch's test, as well as the derivation, is an elegant application of tools from information geometry.
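
Both quantities named in the abstract are straightforward to compute; the helper functions below are an illustrative sketch of ours, not the paper's notation:

```python
import math

def binary_entropy(p):
    """Shannon entropy of a Bernoulli(p) variable, in nats."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def js_divergence(p1, p2):
    """Jensen-Shannon divergence between Bernoulli(p1) and Bernoulli(p2)."""
    m = (p1 + p2) / 2
    return binary_entropy(m) - (binary_entropy(p1) + binary_entropy(p2)) / 2

def welch_statistic(p1, n1, p2, n2):
    """Welch's t statistic for the difference of two sample proportions."""
    return (p1 - p2) / math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Conversion rates of 3% vs 4% in two groups of 10,000 users each:
print(js_divergence(0.03, 0.04))                     # ~3.6e-4 nats
print(welch_statistic(0.03, 10_000, 0.04, 10_000))   # ~ -3.85
```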

Other Statistics

Box-Cox symmetric distributions and applications to nutritional data

We introduce the Box-Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed data. The new class includes the Box-Cox t, Box-Cox Cole-Green, and Box-Cox power exponential distributions, as well as the class of log-symmetric distributions, as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes. Additionally, it provides enough flexibility to handle outliers. The usefulness of the Box-Cox symmetric models is illustrated in applications to nutritional data.
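
The defining transformation of the class is simple to state: y follows a Box-Cox symmetric distribution when the transformed value z below follows a standard symmetric density (normal gives Box-Cox Cole-Green, Student-t gives Box-Cox t, and so on). A minimal sketch, with parameter names chosen here for illustration:

```python
import math

def box_cox_transform(y, mu, sigma, lam):
    """Core transformation of the Box-Cox symmetric class: mu acts as a
    median-type scale parameter, sigma as a relative dispersion, and
    lam as the skewness-correcting Box-Cox power."""
    if lam == 0:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1) / (lam * sigma)

# A mildly right-skewed observation relative to mu = 10, sigma = 0.2, lam = 0.5:
print(box_cox_transform(14.0, 10.0, 0.2, 0.5))  # ~1.83 on the symmetric scale
```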

Other Statistics

Breaking Monotony with Meaning: Motivation in Crowdsourcing Markets

We conduct the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort. We employed about 2,500 workers from Amazon's Mechanical Turk (MTurk), an online labor market, to label medical images. Although given an identical task, we experimentally manipulated how the task was framed: subjects in the meaningful treatment were told that they were labeling tumor cells in order to assist medical researchers; subjects in the zero-context condition (the control group) were not told the purpose of the task; and, in stark contrast, subjects in the shredded treatment were not given context and were additionally told that their work would be discarded. We found that when a task was framed more meaningfully, workers were more likely to participate. We also found that the meaningful treatment increased the quantity of output (with an insignificant change in quality), while the shredded treatment decreased the quality of output (with no change in quantity). We believe these results will generalize to other short-term labor markets. Our study also discusses MTurk as an exciting platform for running natural field experiments in economics.

Other Statistics

Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.

