Featured Research

Other Statistics

On new data sources for the production of official statistics

In recent years we have witnessed the rise of new data sources for the potential production of official statistics, which, by and large, can be classified as survey, administrative, and digital data. Apart from the differences in how they are generated and collected, we claim that their lack of statistical metadata, their economic value, and their lack of ownership by data holders pose several entangled challenges that hinder the incorporation of new data into the routine production of official statistics. We argue that every challenge must be duly overcome by the international community before new statistical products based on these sources can be delivered. These challenges naturally fall into several entangled issues concerning data access, statistical methodology, quality, information technologies, and management. We identify the most relevant ones, which must be tackled before new data sources can be considered fully incorporated into the production of official statistics.

On optimal policy in the group testing with incomplete identification

Consider a very large (infinite) population of items, where each item, independently of the others, is defective with probability p or good with probability q = 1 - p. The goal is to identify N good items as quickly as possible. The following group testing policy (policy A) is considered: test items together in groups; if the test outcome of group i of size n_i is negative, accept all items in this group as good, otherwise discard the group. Then move to the next group and continue until exactly N good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain N good items is minimal. Recently, Gusev (2012) found an optimal group testing configuration under the assumptions of constant group size and N = ∞. In this note, an optimal solution under policy A for finite N is provided. Keywords: Dynamic programming; Optimal design; Partition problem; Schur-convexity
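One way to see the partition structure is sketched below, assuming that each test takes one time unit and that no group is larger than the number of good items still needed (so that exactly N are collected). A group of size n is then negative with probability q^n, so groups of that size are tested a geometric number of times with mean q^(-n), and the expected waiting time for sizes n_1, ..., n_k with n_1 + ... + n_k = N is q^(-n_1) + ... + q^(-n_k). The Python sketch minimizes this sum by dynamic programming; the cost model and the name optimal_partition are our assumptions for illustration, not necessarily the note's exact formulation.

```python
def optimal_partition(N, p):
    """Minimize sum(q**-n_i) over group sizes n_i >= 1 with sum(n_i) == N.

    Assumed cost model (illustrative only): each test takes one time unit and
    a group is never larger than the number of good items still needed.
    """
    if not 0 < p < 1 or N < 1:
        raise ValueError("need 0 < p < 1 and N >= 1")
    q = 1.0 - p
    best = [0.0] + [float("inf")] * N   # best[m]: min expected tests for m good items
    choice = [0] * (N + 1)              # choice[m]: size of the next group in state m
    for m in range(1, N + 1):
        for n in range(1, m + 1):
            # A group of size n is all-good with probability q**n, so groups of
            # that size are tested a geometric number of times with mean q**-n.
            cost = q ** -n + best[m - n]
            if cost < best[m]:
                best[m], choice[m] = cost, n
    sizes, m = [], N
    while m > 0:                        # walk the stored choices back to a partition
        sizes.append(choice[m])
        m -= choice[m]
    return best[N], sizes
```

Because q^(-n) is convex in n, the optimal sizes come out nearly equal. For p = 0.1 and N = 20 the sketch returns two groups of 10, close to -1/ln(q) ≈ 9.5, the group size that maximizes the expected number of good items per test, n q^n.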

On some properties of the new Sine-skewed Cardioid Distribution

The new Sine-Skewed Cardioid (ssc) distribution has just been introduced and characterized by Ahsanullah (2018). Here, we study the asymptotic behavior of its tails by determining its extreme value domain, and we derive its characteristic function, its moments, the moment and likelihood estimators of its two parameters, the asymptotic normality of the moment estimators, and a method for the random generation of data from the ssc distribution. Finally, we carry out a simulation study to show the performance of the random generation method and the quality of the moment estimates of the parameters.
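For experimentation, the sketch below assumes the standard sine-skewed form built on a cardioid base centered at 0, f(θ) = (1 + 2ρ cos θ)(1 + λ sin θ)/(2π) on [-π, π] with |ρ| ≤ 1/2 and |λ| ≤ 1; Ahsanullah's parametrization may differ. Under this assumed density E[cos θ] = ρ and E[sin θ] = λ/2, which gives simple moment estimators. Sampling uses plain acceptance-rejection from the uniform distribution, which is not necessarily the generation method studied in the paper, and the function names are hypothetical.

```python
import math
import random

def ssc_pdf(theta, rho, lam):
    """Assumed sine-skewed cardioid density on [-pi, pi], location fixed at 0."""
    return (1 + 2 * rho * math.cos(theta)) * (1 + lam * math.sin(theta)) / (2 * math.pi)

def ssc_sample(n, rho, lam, rng):
    """Draw n values by acceptance-rejection from the uniform density on [-pi, pi]."""
    if abs(rho) > 0.5 or abs(lam) > 1:
        raise ValueError("need |rho| <= 1/2 and |lam| <= 1 for a valid density")
    # Upper bound of 2*pi*pdf, so every acceptance probability is at most 1.
    bound = (1 + 2 * abs(rho)) * (1 + abs(lam))
    draws = []
    while len(draws) < n:
        theta = rng.uniform(-math.pi, math.pi)
        if rng.random() < 2 * math.pi * ssc_pdf(theta, rho, lam) / bound:
            draws.append(theta)
    return draws

def ssc_moment_estimates(draws):
    """Moment estimators under the assumed density: E[cos] = rho, E[sin] = lam / 2."""
    mean_cos = sum(math.cos(t) for t in draws) / len(draws)
    mean_sin = sum(math.sin(t) for t in draws) / len(draws)
    return mean_cos, 2 * mean_sin
```

The expected number of proposals per accepted draw equals the bound (1 + 2|ρ|)(1 + |λ|), at most 4, so the sampler stays cheap over the whole assumed parameter range.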

On statistical deficiency: Why the test statistic of the matching method is hopelessly underpowered and uniquely informative

The random variate m is, in combinatorics, a basis for comparing permutations, as well as the solution to a centuries-old riddle involving the mishandling of hats. In statistics, m is the test statistic of a disused null hypothesis statistical test (NHST) of association, the matching method. In this paper, I show that the matching method has an absolute, and relatively low, limit on its statistical power. I do so first by reinterpreting Rae's theorem, which describes the joint distributions of m with several rank correlation statistics under a true null. I then derive this property solely from m's unconditional sampling distribution, on which basis I develop the concept of a deficient statistic: a statistic that is insufficient, inconsistent, and inefficient with respect to its parameter. Finally, I demonstrate an application for m that makes use of its deficiency to qualify the sampling error in a jointly estimated sample correlation.
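Under the null of no association, m is distributed as the number of fixed points of a uniformly random permutation of n items, so P(m = k) = C(n, k) D(n - k) / n!, where D(j) counts derangements of j items. The Python sketch below computes this null distribution exactly and a one-sided p-value, assuming the test rejects for large m; the function names are hypothetical. It also makes one elementary limitation concrete: even a perfect match of all n items cannot give a p-value below 1/n!.

```python
from fractions import Fraction
from math import comb, factorial

def derangements(j):
    """Number of permutations of j items with no fixed point (D(0) = 1, D(1) = 0)."""
    d = [1, 0]
    for i in range(2, j + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[j]

def matching_null_pmf(n):
    """Exact null distribution of m: P(m = k) = C(n, k) * D(n - k) / n!, k = 0..n."""
    total = factorial(n)
    return [Fraction(comb(n, k) * derangements(n - k), total) for k in range(n + 1)]

def matching_p_value(m_obs, n):
    """One-sided p-value P(m >= m_obs) under the null (rejecting for many matches)."""
    return sum(matching_null_pmf(n)[m_obs:], Fraction(0))
```

For n = 3, for example, the exact null probabilities of m = 0, 1, 2, 3 are 1/3, 1/2, 0, and 1/6, so three correct matches out of three is still only significant at the 1/6 level.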

On the Virtues of Automated QSAR: The New Kid on the Block

Quantitative Structure-Activity Relationship (QSAR) modeling has proved an invaluable tool in medicinal chemistry. The unprecedented availability of data through various databases has contributed to a resurgence of interest in QSAR. In this context, the rapid generation of high-quality predictive models is highly desirable for hit identification and lead optimization. We showcase the application of an automated QSAR approach, AutoQSAR, which randomly selects multiple training/test sets and uses machine-learning algorithms to generate predictive models. The results demonstrate that AutoQSAR produces models of similar or better quality than those generated by practitioners in the field, in just a fraction of the time. Despite its potential benefit to the community, the AutoQSAR opportunity has been largely undervalued.
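The core loop described here, repeated random training/test splits with several candidate learners scored on held-out data, can be sketched in plain Python. This is a toy illustration only: the candidate models (a mean baseline and k-nearest-neighbor regressors), the R² score, the split settings, and the function names are our assumptions and do not reproduce AutoQSAR's actual model library or molecular descriptors.

```python
import random
import statistics

def mean_model(X_train, y_train):
    """Baseline: always predict the training mean."""
    m = statistics.fmean(y_train)
    return lambda x: m

def knn_model(k):
    """Return a fit function for k-nearest-neighbor regression (Euclidean distance)."""
    def fit(X_train, y_train):
        def predict(x):
            order = sorted(range(len(X_train)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
            return statistics.fmean(y_train[i] for i in order[:k])
        return predict
    return fit

def r_squared(y_true, y_pred):
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def auto_select(X, y, candidates, n_splits=10, test_frac=0.25, seed=0):
    """Average each candidate's test-set R^2 over random splits; return (best name, scores)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    scores = {name: [] for name in candidates}
    for _ in range(n_splits):
        rng.shuffle(idx)                      # a fresh random training/test split
        n_test = max(2, int(test_frac * len(idx)))
        test, train = idx[:n_test], idx[n_test:]
        for name, fit in candidates.items():
            predict = fit([X[i] for i in train], [y[i] for i in train])
            preds = [predict(X[i]) for i in test]
            scores[name].append(r_squared([y[i] for i in test], preds))
    means = {name: statistics.fmean(s) for name, s in scores.items()}
    return max(means, key=means.get), means
```

Averaging over several random splits, rather than trusting a single one, reduces the chance that a model wins only because of a lucky partition of the data.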

On the asymptotics of Ajtai-Komlós-Tusnády statistics

Wasserstein distances between theoretical and empirical measures are now widely analyzed. One of the first investigations of the topic is the 1984 paper by Ajtai, Komlós and Tusnády. Interestingly, all the neighboring questions posed by that paper have already been settled, but not the original one. In this paper we delineate the limit behavior of the original statistic with the help of computer simulations, while keeping an eye on a theoretical understanding of the problem. Based on our simulations, we conjecture that the limit distribution is Gaussian.
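To make the simulation setting concrete, the sketch below assumes one common form of the statistic: the minimal total Euclidean distance of a perfect matching between n = k² independent uniform points in the unit square and the centers of a regular k × k grid. The exact statistic and normalization used in the paper may differ, and the function names are hypothetical. The optimal matching is computed with the classical O(n³) Hungarian algorithm, and a brute-force checker is included for tiny inputs.

```python
import itertools
import math
import random

def min_cost_matching(cost):
    """Hungarian algorithm (O(n^3)) for a square cost matrix; returns the minimal total cost."""
    n = len(cost)
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # dual potentials for rows and columns
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:                           # grow an alternating tree from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))

def brute_force_matching(cost):
    """Exhaustive check for tiny matrices (n! permutations)."""
    n = len(cost)
    return min(sum(cost[i][perm[i]] for i in range(n))
               for perm in itertools.permutations(range(n)))

def akt_statistic(k, rng):
    """Assumed AKT-type statistic: optimal matching of k*k uniform points to a k x k grid."""
    grid = [((a + 0.5) / k, (b + 0.5) / k) for a in range(k) for b in range(k)]
    points = [(rng.random(), rng.random()) for _ in range(k * k)]
    cost = [[math.dist(pt, g) for g in grid] for pt in points]
    return min_cost_matching(cost)
```

Repeated calls to akt_statistic(k, random.Random(seed)) for increasing k give a Monte Carlo sample that can be standardized and compared with a normal density.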

On the mathematics of the free-choice paradigm

Chen and Risen pointed out a logical flaw affecting the conclusions of a number of past experiments that used the free-choice paradigm to measure choice-induced attitude change. They went on to design and implement a free-choice experiment that used a novel type of control group in order to avoid this logical pitfall. In this paper, we describe a method by which a free-choice experiment can be correctly conducted even without a control group.

Online Statistics Teaching and Learning

For statistics courses at all levels, teaching and learning online poses a variety of challenges. Particular online challenges include how to conduct exploratory data analyses effectively and interactively, how to incorporate statistical programming, how to include individual or team projects, and how to present mathematical derivations efficiently and effectively. This article draws on the authors' experience with seven different online statistics courses to address some of these challenges. One course is an online exploratory data analysis course taught at Bowling Green State University. A second is an upper-level Bayesian statistics course taught at Vassar College and shared among 10 liberal arts colleges through a hybrid model. We also describe a five-course MOOC specialization on Coursera, offered by Duke University.

Online detection of cascading change-points

We propose an online procedure for detecting cascading failures in a network from sequential data, where a cascade is modeled as multiple correlated change-points occurring within a short period. We consider a temporal diffusion network model to capture the temporal dynamic structure of multiple change-points, and we develop a sequential Shewhart procedure based on generalized likelihood ratio (GLR) statistics under the diffusion network model, assuming unknown post-change distribution parameters. We also tackle the computational complexity posed by the unknown propagation. Numerical experiments demonstrate good performance in detecting cascading failures.
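The generalized likelihood ratio idea can be illustrated in a much simplified setting that ignores the diffusion network: each node emits independent N(0, 1) observations before its change and N(μ, 1) after, with μ unknown. For the last k observations of a node with sum S, maximizing the log-likelihood ratio kμ(S/k) - kμ²/2 over μ gives S²/(2k). The sketch maximizes this over change times inside a sliding window and raises an alarm when the sum over nodes exceeds a threshold. The window, threshold, independence assumption, and function names are hypothetical, and this window-limited statistic is a stand-in rather than the authors' Shewhart procedure.

```python
def node_glr(xs):
    """Max over change points of S**2 / (2k), where S sums the last k values of xs."""
    best, tail_sum = 0.0, 0.0
    for k, x in enumerate(reversed(xs), start=1):
        tail_sum += x
        best = max(best, tail_sum ** 2 / (2 * k))
    return best

def detect_cascade(streams, window, threshold):
    """Return the first time index at which the summed node GLRs exceed threshold, else None."""
    horizon = len(streams[0])
    for t in range(horizon):
        # Each node only looks at its most recent `window` observations up to time t.
        stat = sum(node_glr(s[max(0, t + 1 - window):t + 1]) for s in streams)
        if stat > threshold:
            return t
    return None
```

Summing per-node statistics lets several moderate, nearly simultaneous shifts trigger an alarm that no single node would raise on its own, which is the intuition behind treating a cascade as correlated change-points.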

Open data, open review and open dialogue in making social sciences plausible

Nowadays, protecting trust in the social sciences also means engaging in open community dialogue, which helps to safeguard the robustness and improve the efficiency of research methods. The combination of open data, open review and open dialogue may sound simple, but implementing it in the real world will not be straightforward. However, in view of Begley and Ellis's (2012) statement that "the scientific process demands the highest standards of quality, ethics and rigour," these practices are worth implementing. More importantly, they are feasible and will likely help to restore plausibility to social sciences research. I therefore expect that the triplet of open data, open review and open dialogue will gradually become a policy requirement regardless of the research funding source.
