Featured Research

Other Statistics

An introduction to Bent Jorgensen's ideas

We briefly expose some key aspects of the theory and use of dispersion models, for which Bent Jorgensen played a crucial role as a driving force and a source of inspiration. Starting with the general notion of dispersion models, built using minimalistic mathematical assumptions, we specialize to two classes of families of distributions with different statistical flavors: exponential dispersion and proper dispersion models. The construction of dispersion models involves the solution of integral equations that are, in general, intractable. These difficulties disappear when more mathematical structure is assumed: the problem reduces to the calculation of a moment generating function or of a Riemann-Stieltjes integral for the exponential dispersion and the proper dispersion models, respectively. A new technique for constructing dispersion models based on characteristic functions is introduced, turning the integral equations above into a tractable convolution equation and yielding examples of dispersion models that are neither proper dispersion nor exponential dispersion models. A corollary is that the classes of regular and non-regular dispersion models are both large. Some selected applications are discussed, including exponential family non-linear models (of which generalized linear models are particular cases) and several models for clustered and dependent data based on a latent Lévy process.
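
The abstract uses these families without stating their forms; as a reminder, the block below sketches the standard textbook definitions in Jørgensen-style notation (which may differ from the paper's): the general dispersion model with its unit deviance, the proper dispersion special case, the exponential dispersion form, and the normalisation ("integral") equation that the general construction must solve.

```latex
% Standard textbook forms (Jorgensen-style notation; the paper's notation may differ).
\begin{align*}
% Dispersion model DM(mu, sigma^2), with unit deviance d:
f(y;\mu,\sigma^2) &= a(y;\sigma^2)\,
  \exp\!\Big\{-\tfrac{1}{2\sigma^2}\, d(y;\mu)\Big\}, \\
% Proper dispersion model: the normalising factor separates as
a(y;\sigma^2) &= c(\sigma^2)\, b(y), \\
% Exponential dispersion model ED(mu, sigma^2), with mu = kappa'(theta) and sigma^2 = 1/lambda:
f(y;\theta,\lambda) &= c^{*}(y;\lambda)\,
  \exp\{\lambda\,[\theta y - \kappa(\theta)]\}, \\
% The "integral equation" of the general construction is the normalisation requirement on a:
\int a(y;\sigma^2)\,
  \exp\!\Big\{-\tfrac{1}{2\sigma^2}\, d(y;\mu)\Big\}\,\mathrm{d}y &= 1
  \quad \text{for every } \mu .
\end{align*}
```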

Other Statistics

An overview and perspective on social network monitoring

In this expository paper we give an overview of some statistical methods for the monitoring of social networks. We discuss the advantages and limitations of various methods as well as some relevant issues. One of our primary contributions is to describe the relationships between network monitoring methods and monitoring methods in engineering statistics and public health surveillance. We encourage researchers in the industrial process monitoring area to work on developing and comparing the performance of social network monitoring methods. We also discuss some open issues in social network monitoring and offer a number of research ideas.

Other Statistics

Analyzing Factors Associated with Fatal Road Crashes: A Machine Learning Approach

Road traffic injury accounts for a substantial human and economic burden globally. Understanding the risk factors contributing to fatal injuries is of paramount importance. In this study, we propose a hybrid ensemble machine learning classifier built from sequential minimal optimization (SMO) and decision trees to identify risk factors contributing to fatal road injuries. The model was constructed, trained, tested, and validated using the Lebanese Road Accidents Platform (LRAP) database of 8482 road crash incidents, with fatality occurrence as the outcome variable. A sensitivity analysis was conducted to examine the influence of multiple factors on fatality occurrence. Seven of the nine selected independent variables were significantly associated with fatality occurrence, among them crash type, injury severity, spatial cluster-ID, and crash time (hour). Evidence gained from the analysis can help policymakers and key stakeholders gain insight into the major contributing factors associated with fatal road crashes and translate that knowledge into safety programs and enhanced road policies.
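
The LRAP data are not reproduced here, so the following is only a hypothetical sketch of a hybrid ensemble of the kind described, pairing scikit-learn's SVC (which is trained with an SMO-type solver) with a decision tree in a soft-voting ensemble on synthetic stand-in data; the dataset, features, and hyperparameters are placeholders, not the paper's pipeline.

```python
# Hypothetical sketch of a hybrid SMO + decision-tree ensemble (not the paper's exact model).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the LRAP records: X = nine crash attributes, y = fatality occurrence (0/1).
X, y = make_classification(n_samples=8482, n_features=9, n_informative=7,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

smo = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
tree = DecisionTreeClassifier(max_depth=6, random_state=0)

# Soft-voting hybrid of the two base learners.
hybrid = VotingClassifier(estimators=[("smo", smo), ("tree", tree)], voting="soft")
hybrid.fit(X_train, y_train)
print(classification_report(y_test, hybrid.predict(X_test)))
```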

Other Statistics

Anna Karenina and The Two Envelopes Problem

The Anna Karenina principle is named after the opening sentence of the eponymous novel: "Happy families are all alike; every unhappy family is unhappy in its own way." The Two Envelopes Problem (TEP) is a much-studied paradox in probability theory, mathematical economics, logic, and philosophy. Time and again a new analysis is published in which an author claims finally to explain what actually goes wrong in this paradox. Each author (the present author included) emphasizes what is new in their approach and concludes that earlier approaches did not get to the root of the matter. We observe that though a logical argument is only correct if every step is correct, an apparently logical argument which goes astray can be thought of as going astray at different places. This leads to a comparison between the literature on TEP and a successful movie franchise: it generates a succession of sequels, and even prequels, each with a different director who approaches the same basic premise in a personal way. We survey resolutions in the literature with a view to synthesis, correct common errors, and give a new theorem on the order properties of an exchangeable pair of random variables, which lies at the heart of most TEP variants and interpretations. A theorem on asymptotic independence between the amount in your envelope and the question of whether it is smaller or larger shows that the pathological situation of improper priors or infinite expectation values already has consequences as we merely approach such a situation.
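
As a minimal illustration of the exchangeable-pair setting (not the paper's theorem), the toy simulation below uses a proper prior with finite mean for the smaller amount and checks that, unconditionally, switching envelopes yields no expected gain; the exponential prior and sample size are arbitrary choices made for this sketch.

```python
# Toy Two Envelopes simulation under a proper prior with finite mean (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.exponential(scale=1.0, size=n)   # smaller amount X ~ Exp(1)
pair = np.stack([x, 2 * x], axis=1)      # the two envelopes contain (X, 2X)
pick = rng.integers(0, 2, size=n)        # choose one envelope uniformly at random

mine = pair[np.arange(n), pick]
other = pair[np.arange(n), 1 - pick]

# (mine, other) is an exchangeable pair, so the unconditional expected gain from switching is zero.
print("E[mine]  =", mine.mean())
print("E[other] =", other.mean())
print("E[gain from switching] =", (other - mine).mean())
```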

Other Statistics

Apocalypse Now? Reviving the Doomsday Argument

Whether the fate of our species can be forecast from its past has been the topic of considerable controversy. One refutation of the so-called Doomsday Argument is based on the premise that we are more likely to exist in a universe containing a greater number of observers. Here we present a Bayesian reformulation of the Doomsday Argument which is immune to this effect. By marginalising over the spatial configuration of observers, we find that any preference for a larger total number of observers has no impact on the inferred local number. Our results remain unchanged whether we adopt the Self-Indexing Assumption (SIA) or the Self-Sampling Assumption (SSA). Furthermore, the median value of our posterior distribution is found to be in agreement with the frequentist forecast. Humanity's prognosis for the coming century is well approximated by a global catastrophic risk of 0.2% per year.
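
The paper's marginalisation over spatial configurations of observers is not reproduced here; the sketch below only shows the textbook Bayesian Doomsday calculation it refines, i.e. a posterior over the total number of humans N given a birth rank r with p(r | N) = 1/N under the Self-Sampling Assumption, using an illustrative log-uniform prior and stand-in numbers.

```python
# Textbook Bayesian Doomsday posterior over the total number of observers N (illustrative only).
import numpy as np

r = 1.0e11                        # assumed birth rank of a present-day human (stand-in value)
N = np.logspace(11, 14, 5000)     # grid of possible total population sizes, truncated at 1e14
dN = np.gradient(N)               # grid spacing for numerical integration

prior = 1.0 / N                                 # illustrative log-uniform prior on N
likelihood = np.where(N >= r, 1.0 / N, 0.0)     # Self-Sampling Assumption: rank uniform on 1..N

post = prior * likelihood
post /= np.sum(post * dN)                       # normalise on the grid

cdf = np.cumsum(post * dN)                      # approximate posterior CDF
print("posterior median of N: %.3g" % N[np.searchsorted(cdf, 0.5)])   # roughly 2 * r
```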

Other Statistics

Application of Bayesian Networks for Estimation of Individual Psychological Characteristics

An accurate, qualitative, and comprehensive assessment of human potential is one of the most important challenges for any company or collective. We apply Bayesian networks to develop more accurate overall estimates of an individual's psychological characteristics, based on psychological test results that indicate how strongly the individual possesses a certain trait. Examples of such traits are stress resistance, the readiness to take risks, and the ability to concentrate on complicated work. The most common way of studying an individual's psychological characteristics is testing. Additionally, the overall estimate is usually based on the personal experience and subjective perception of a psychologist, or group of psychologists, regarding the personality traits under investigation.
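
To make the idea concrete, here is a minimal, hypothetical sketch (plain NumPy rather than the paper's network): one latent binary trait, say stress resistance, with two conditionally independent test results as child nodes; the prior and the conditional probability tables are invented for illustration only.

```python
# Minimal two-test Bayesian network for one latent trait (all numbers hypothetical).
# Structure: Trait -> Test1, Trait -> Test2 (tests conditionally independent given the trait).
import numpy as np

p_trait = np.array([0.7, 0.3])       # P(trait = low), P(trait = high) -- illustrative prior

# P(test passed | trait level), one row per test.
p_pass = np.array([[0.2, 0.8],       # test 1: P(pass | low), P(pass | high)
                   [0.4, 0.9]])      # test 2

def posterior(trait_prior, pass_probs, results):
    """Posterior over the trait given binary test results (1 = passed, 0 = failed)."""
    post = trait_prior.copy()
    for test_idx, outcome in enumerate(results):
        likelihood = pass_probs[test_idx] if outcome == 1 else 1.0 - pass_probs[test_idx]
        post = post * likelihood
    return post / post.sum()

# An individual who passed test 1 but failed test 2:
print("P(trait = high | data) =", posterior(p_trait, p_pass, [1, 0])[1])
```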

Other Statistics

Application of Robust Estimators in Shewhart S-Charts

Maintaining the quality of manufactured products at a desired level is known to increase customer satisfaction and profitability. The Shewhart control chart is the most widely used statistical process control (SPC) technique for monitoring the quality of products and controlling process variability. Under the assumption of independent and normally distributed data, the sample mean and standard deviation are known to be the most efficient conventional estimators of process location and scale, respectively. On the other hand, there is no guarantee that real-world process data will be normally distributed: outliers may exist, and/or the sampled population may be contaminated. In such cases, the efficiency of the conventional estimators is significantly reduced, and the power of the Shewhart charts may be undesirably low; for example, occasional outliers in the rational subgroups (Phase I dataset) may drastically affect the sample mean and standard deviation, resulting in a serious delay in the detection of inferior products (Phase II procedure). For more efficient analyses, robust estimators that withstand contamination are required. Consequently, robust estimators are found to be more efficient against both diffuse and localized, and both symmetric and asymmetric, contaminations, and to have higher power in detecting disturbances than the conventional methods.
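
As a hedged illustration of the Phase I problem described above (not the paper's specific estimators or chart constants), the snippet below contaminates a few rational subgroups with outliers and compares the classical average subgroup standard deviation with a MAD-based robust scale estimate; limits built from the inflated classical estimate would be too wide to flag Phase II shifts promptly.

```python
# Classical vs. MAD-based scale estimates on contaminated Phase I subgroups (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
m, n = 25, 5                                    # 25 rational subgroups of size 5
phase1 = rng.normal(loc=10.0, scale=1.0, size=(m, n))
phase1[3, 0] += 8.0                             # occasional outliers in a few subgroups
phase1[17, 2] -= 7.0

def mad_sigma(x):
    """Median absolute deviation rescaled to be consistent for the normal standard deviation."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

s = phase1.std(axis=1, ddof=1)                  # subgroup standard deviations
s_bar = s.mean()                                # classical centre-line estimate for an S-chart
robust_scale = np.median([mad_sigma(row) for row in phase1])   # a robust counterpart

print("classical s-bar estimate:       ", round(s_bar, 3))
print("robust MAD-based scale estimate:", round(robust_scale, 3))
# The outliers inflate s-bar, widening the control limits and delaying Phase II detection.
```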

Other Statistics

Applications of band-limited extrapolation to forecasting of weather and financial time series

This paper describes the practical application of causal extrapolation of sequences for the purpose of forecasting. The methods and proofs have been applied to simulations to measure the range over which data can be accurately extrapolated. Real-world data from the Australian Stock Exchange and the Australian Bureau of Meteorology have been tested and compared with simple linear extrapolation of the same data. In a majority of the tested scenarios, causal extrapolation proved to be the more effective forecaster.
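
The paper's causal band-limited extrapolation method is not reproduced here; the snippet below is only a hypothetical contrast of the two baselines mentioned, fitting a low-frequency trigonometric model versus a straight line to an observed window of a toy series and extending both forward.

```python
# Toy comparison: low-frequency (band-limited) fit vs. simple linear extrapolation.
# Illustrative only -- not the causal extrapolation algorithm studied in the paper.
import numpy as np

t = np.arange(120)
series = np.sin(2 * np.pi * t / 40) + 0.05 * t          # slow oscillation plus trend
t_obs, t_fut = t[:100], t[100:]
observed, future = series[:100], series[100:]

def design(tt, n_harmonics=3, period=40.0):
    """Regression design matrix: constant, trend, and a few low-frequency harmonics."""
    cols = [np.ones_like(tt, dtype=float), tt.astype(float)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * tt / period), np.sin(2 * np.pi * k * tt / period)]
    return np.column_stack(cols)

coef, *_ = np.linalg.lstsq(design(t_obs), observed, rcond=None)
band_limited_forecast = design(t_fut) @ coef

slope, intercept = np.polyfit(t_obs, observed, deg=1)   # simple linear baseline
linear_forecast = slope * t_fut + intercept

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print("band-limited RMSE:", round(rmse(band_limited_forecast, future), 3))
print("linear RMSE:      ", round(rmse(linear_forecast, future), 3))
```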

Other Statistics

Approaching Ethical Guidelines for Data Scientists

The goal of this article is to inspire data scientists to participate in the debate on the impact that their professional work has on society, and to become active in public debates on the digital world as data science professionals. How do ethical principles (e.g., fairness, justice, beneficence, and non-maleficence) relate to our professional lives? What responsibility do we bear as professionals, given our expertise in the field? More specifically, this article makes an appeal to statisticians to join that debate, and to be part of the community that establishes data science as a proper profession in the sense of Airaksinen, a philosopher working on professional ethics. As we will argue, data science has one of its roots in statistics and extends beyond it. To shape the future of statistics, and to take responsibility for the statistical contributions to data science, statisticians should actively engage in the discussions. First, the term data science is defined, and the technical changes that have led to a strong influence of data science on society are outlined. Next, the systematic approach from CNIL is introduced. Prominent examples are given of ethical issues arising from the work of data scientists. Further, we provide reasons why data scientists should engage in shaping morality around data science and in formulating codes of conduct and codes of practice for the field. Next, we present established ethical guidelines for the related fields of statistics and computing machinery. Thereafter, necessary steps within the community to develop professional ethics for data science are described. Finally, we give our starting statement for the debate: data science is at the focal point of current societal development. Without becoming a profession with professional ethics, data science will fail to build trust in its interactions with, and its much-needed contributions to, society!

Other Statistics

Arbitrariness of peer review: A Bayesian analysis of the NIPS experiment

The principle of peer review is central to the evaluation of research, ensuring that only high-quality items are funded or published. But peer review has also received criticism, as the selection of reviewers may introduce biases into the system. In 2014, the organizers of the "Neural Information Processing Systems" conference conducted an experiment in which 10% of submitted manuscripts (166 items) went through the review process twice. Arbitrariness was measured as the conditional probability for an accepted submission to be rejected if examined by the second committee. This number was equal to 60%, for a total acceptance rate equal to 22.5%. Here we present a Bayesian analysis of those two numbers, by introducing a hidden parameter which measures the probability that a submission meets basic quality criteria. The standard quality criteria usually include novelty, clarity, reproducibility, correctness, and the absence of misconduct, and are met by a large proportion of submitted items. The Bayesian estimate of the hidden parameter was 56% (95% CI: (0.34, 0.83)), and has a clear interpretation. The result suggests that the total acceptance rate should be increased in order to decrease arbitrariness in future review processes.
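
The paper's full Bayesian model is not reproduced here, but a deliberately simplified toy version of the same reasoning already recovers a point estimate close to the quoted 56%: assume a submission meets the basic quality criteria with probability theta, non-qualifying papers are always rejected, and each committee independently accepts a qualifying paper with probability a. These simplifying assumptions are ours, not necessarily the paper's.

```python
# Toy point-estimate version of the NIPS-experiment analysis (simplifying assumptions ours).
# Model: P(meets quality criteria) = theta; a qualifying paper is accepted with probability a;
# a non-qualifying paper is always rejected; the two committees act independently.

acceptance_rate = 0.225        # overall acceptance rate reported for the conference
p_reject_given_accept = 0.60   # P(committee 2 rejects | committee 1 accepted)

# Under these assumptions an accepted paper is qualifying with certainty, so the second
# committee rejects it with probability 1 - a, and the overall acceptance rate is theta * a.
a = 1.0 - p_reject_given_accept
theta = acceptance_rate / a

print(f"a (accept a qualifying paper)  = {a:.3f}")
print(f"theta (meets quality criteria) = {theta:.3f}")   # ~0.563, close to the quoted 56%
```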

