Featured Research

Other Statistics

A fresh look at introductory data science

The proliferation of large, complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to effectively plan, acquire, manage, analyze, and communicate the findings of such data. To keep up with this demand, attracting students to data science early on and giving them a solid introduction to the field is increasingly important. We present a case study of an introductory undergraduate course in data science that is designed to address these needs. Offered at Duke University, this course has no prerequisites and serves a wide audience of aspiring statistics and data science majors as well as humanities, social sciences, and natural sciences students. We discuss the unique set of challenges posed by offering such a course and, in light of these challenges, present a detailed discussion of the pedagogical design elements, content, structure, computational infrastructure, and assessment methodology of the course. We also offer a repository containing all teaching materials, which are open-source, along with supplemental materials and the R code for reproducing the figures found in the paper.

Other Statistics

A generalization of the symmetrical and optimal probability-to-possibility transformations

Possibility and probability theories are alternative and complementary ways to deal with uncertainty, which in recent years has motivated interest in ways to transform probability distributions into possibility distributions and conversely. This paper studies the advantages and shortcomings of two well-known discrete probability-to-possibility transformations, the optimal transformation and the symmetrical transformation, and presents a novel parametric family of probability-to-possibility transformations which generalizes them and alleviates their shortcomings, showing considerable potential for practical application. The paper also introduces a novel fuzzy measure of specificity for probability distributions based on the concept of fuzzy subsethood and presents an empirical validation of the generalized transformation's usefulness by applying it to the text authorship attribution problem.
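As a rough illustration of the two baseline transformations named in this abstract, the sketch below uses their textbook forms: the optimal transformation assigns each outcome the total probability of all outcomes no more probable than it, and the symmetrical transformation assigns each outcome the sum of min(p_i, p_j) over all outcomes j. The function names are hypothetical, and the paper's generalized parametric family is not specified in the abstract, so it is not reproduced here.

```python
def optimal_transformation(p):
    """Textbook optimal transformation: pi_i = sum of all p_j with p_j <= p_i."""
    return [sum(q for q in p if q <= pi) for pi in p]


def symmetrical_transformation(p):
    """Textbook symmetrical transformation: pi_i = sum over j of min(p_i, p_j)."""
    return [sum(min(pi, q) for q in p) for pi in p]


if __name__ == "__main__":
    # Illustrative distribution (an assumption, not data from the paper).
    p = [0.5, 0.3, 0.2]
    print([round(x, 6) for x in optimal_transformation(p)])      # [1.0, 0.5, 0.2]
    print([round(x, 6) for x in symmetrical_transformation(p)])  # [1.0, 0.8, 0.6]
```

In both cases the most probable outcome receives possibility 1, and the symmetrical transformation never assigns a lower possibility than the optimal one, so it yields the less specific of the two distributions.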

Other Statistics

A geometer's view of the Cramér-Rao bound on estimator variance

The classical Cramér-Rao inequality gives a lower bound for the variance of an unbiased estimator of an unknown parameter, in some statistical model of a random process. In this note we rewrite the statement and proof of the bound using contemporary geometric language.
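For reference, the classical scalar form of the inequality can be written as follows, under the usual regularity conditions. The notation (density f, Fisher information I) is an assumption made here, since the abstract does not fix any; the geometric restatement given in the note itself is not reproduced.

```latex
\[
  \operatorname{Var}_\theta\bigl(\hat\theta\bigr) \;\ge\; \frac{1}{I(\theta)},
  \qquad
  I(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\,\log f(X;\theta)\right)^{2}\right],
\]
% Here \hat\theta is any unbiased estimator of \theta based on the observation X,
% and f(\cdot\,;\theta) is the density of X under the model.
```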

Other Statistics

A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

Other Statistics

A method for comparing chess openings

A quantitative method is described for comparing chess openings. Test openings and baseline openings are run through chess engines under controlled conditions and compared to evaluate the effectiveness of the test openings. The results are intuitively appealing and in some cases they agree with expert opinion. The specific contribution of this work is the development of an objective measure that may be used for the evaluation and refutation of chess openings, a process that had been left to thought experiments and subjective conjectures and thereby to a large variety of opinion and a great deal of debate.

Other Statistics

A mobile web for enhancing statistics and mathematics education

A freely available educational application (a mobile website) is presented. It provides access to educational material and drills on selected topics within mathematics and statistics, with an emphasis on tablets and mobile phones. The application adapts to the student's performance, selecting from easy to difficult questions, or older material, etc. These adaptations are based on statistical models and analyses of data from testing precursors of the system within several courses, from calculus and introductory statistics through multiple linear regression. The application can be used in both on-line and off-line modes. The behavior of the application is determined by parameters, the effects of which can be estimated statistically. Results presented include analyses of how the internal algorithms relate to passing a course and general incremental improvement in knowledge during a semester.

Other Statistics

A multi-dimensional stream and its signature representation

The signature of a path is an essential object in the theory of rough paths. The signature representation of the data stream can recover standard statistics, e.g. the moments of the data stream. The classification of random walks indicates the advantages of using the signature of a stream as the feature set for machine learning.
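To make the notion of a signature slightly more concrete, here is a minimal sketch that computes the first two levels of the signature of a piecewise-linear path, treating a sampled data stream as such a path. It relies on two standard facts: a straight segment with increment d has level-2 term d⊗d/2, and Chen's identity combines consecutive segments. The function name and the truncation at depth 2 are choices made for illustration and are not taken from the paper.

```python
def signature_depth2(path):
    """Return (s1, s2): level-1 and level-2 signature terms of a piecewise-linear path.

    `path` is a sequence of points, each a sequence of `dim` coordinates.
    s1[i] is the total increment in coordinate i; s2[i][j] is the iterated
    integral of coordinate i against coordinate j.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(path, path[1:]):
        d = [c - p for p, c in zip(prev, curr)]  # increment of this segment
        # Chen's identity: new level 2 = old level 2 + s1 (x) d + (d (x) d) / 2
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(dim):
            s1[i] += d[i]
    return s1, s2


if __name__ == "__main__":
    # Illustrative path: one step right, then one step up.
    s1, s2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
    print(s1)  # [1.0, 1.0]
    print(s2)  # [[0.5, 1.0], [0.0, 0.5]]
```

The antisymmetric part of the level-2 term, (s2[0][1] - s2[1][0]) / 2, is the Lévy area of the path, which is one example of order-dependent information that the level-1 increments alone do not capture.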

Other Statistics

A network flow approach to visualising the roles of covariates in random forests

We propose novel applications of parallel coordinates plots and Sankey diagrams to represent the hierarchies of interacting covariate effects in random forests. Each visualisation summarises the frequencies of all of the paths through all of the trees in a random forest. Visualisations of the roles of covariates in random forests include: ranked bar or dot charts depicting scalar metrics of the contributions of individual covariates to the predictive accuracy of the random forest; line graphs depicting various summaries of the effect of varying a particular covariate on the predictions from the random forest; heatmaps of metrics of the strengths of interactions between all pairs of covariates; and parallel coordinates plots for each response class depicting the distributions of the values of all covariates among the observations most representative of those predicted to belong to that class. Together these visualisations facilitate substantial insights into the roles of covariates in a random forest but do not communicate the frequencies of the hierarchies of covariate effects across the random forest or the orders in which covariates occur in these hierarchies. Our visualisations address these gaps. We demonstrate our visualisations using a random forest fitted to publicly available data and provide a software implementation in the form of an R package.

Other Statistics

A new approach of chain sampling inspection plan

The purpose of an acceptance sampling inspection plan is to develop decision rules regarding acceptance or rejection of production lots based on sample data. Dependent sampling procedures cumulate results from several preceding production lots when testing is expensive or destructive. This chaining of past lots reduces the sample sizes required for acceptance or rejection of production lots. In this article, a new approach for chaining past lot results is proposed, named the modified chain group acceptance sampling inspection plan. It requires a smaller sample size than commonly used sampling inspection plans, such as the group acceptance sampling inspection plan and the single acceptance sampling inspection plan. A comparison study has been done between the proposed plan and both the group acceptance sampling inspection plan and the single acceptance sampling inspection plan. An example is given to illustrate the proposed plan.
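The idea of chaining past lots can be illustrated with the classical chain sampling plan ChSP-1 due to Dodge, which is not the modified plan proposed in this article. Under ChSP-1, a lot is accepted if its sample of size n contains no defectives, or if it contains exactly one defective and the preceding i samples contained none. The sketch below assumes a binomial model for the number of defectives; the function names and the example values are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k defectives in a sample of n, defect rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chsp1_acceptance_probability(n, i, p):
    """Acceptance probability of the classical ChSP-1 plan (binomial model).

    Accept on zero defectives, or on one defective provided the
    preceding i samples each had zero defectives.
    """
    p0 = binom_pmf(0, n, p)
    p1 = binom_pmf(1, n, p)
    return p0 + p1 * p0**i


if __name__ == "__main__":
    # Illustrative values only: sample size 10, chain length 2, 5% defectives.
    print(round(chsp1_acceptance_probability(n=10, i=2, p=0.05), 4))
```

For any chain length i, the result lies between the acceptance probabilities of the single sampling plans with acceptance numbers 0 and 1 at the same sample size, which is how chaining trades information from past lots against sample size.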

Other Statistics

A note on Bayesian logistic regression for spatial exponential family Gibbs point processes

Recently, a very attractive logistic regression inference method for exponential family Gibbs spatial point processes was introduced. We combined it with the technique of quadratic tangential variational approximation and derived a new Bayesian technique for analysing spatial point patterns. The technique is described in detail, and demonstrated on numerical examples.

