Featured Research

Other Statistics

Generalized Labeled Multi-Bernoulli Approximation of Multi-Object Densities

In multi-object inference, the multi-object probability density captures the uncertainty in the number and the states of the objects as well as the statistical dependence between the objects. Exact computation of the multi-object density is generally intractable and tractable implementations usually require statistical independence assumptions between objects. In this paper we propose a tractable multi-object density approximation that can capture statistical dependence between objects. In particular, we derive a tractable Generalized Labeled Multi-Bernoulli (GLMB) density that matches the cardinality distribution and the first moment of the labeled multi-object distribution of interest. It is also shown that the proposed approximation minimizes the Kullback-Leibler divergence over a special tractable class of GLMB densities. Based on the proposed GLMB approximation we further demonstrate a tractable multi-object tracking algorithm for generic measurement models. Simulation results for a multi-object Track-Before-Detect example using radar measurements in low signal-to-noise ratio (SNR) scenarios verify the applicability of the proposed approach.
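
The cardinality distribution that this approximation matches is straightforward to compute in the simplest special case, a multi-Bernoulli density, where it is the Poisson-binomial distribution of the existence probabilities. A minimal sketch in Python, with hypothetical existence probabilities and no connection to the authors' implementation:

    import numpy as np

    def multi_bernoulli_cardinality(r):
        """Cardinality pmf of a multi-Bernoulli density with existence
        probabilities r = [r_1, ..., r_M]: the Poisson-binomial
        distribution, built up by convolving one-component pmfs."""
        pmf = np.array([1.0])                    # no components: P(n = 0) = 1
        for ri in r:
            new = np.zeros(len(pmf) + 1)
            new[:-1] += (1.0 - ri) * pmf         # this component absent
            new[1:] += ri * pmf                  # this component present
            pmf = new
        return pmf

    # Three hypothetical tracks with existence probabilities 0.9, 0.5, 0.2
    print(multi_bernoulli_cardinality([0.9, 0.5, 0.2]))   # P(n=0), ..., P(n=3)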

Read more
Other Statistics

Generalized probabilities in statistical theories

In this review article we present different formal frameworks for the description of generalized probabilities in statistical theories. We discuss the particular cases of probabilities appearing in classical and quantum mechanics, possible generalizations of the approaches of A. N. Kolmogorov and R. T. Cox to non-commutative models, and the approach to generalized probabilities based on convex sets.
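
For orientation, the classical baseline that these generalizations relax is Kolmogorov's axioms; a standard statement (ours, not quoted from the article) is

    \[
    \mu(A) \ge 0, \qquad \mu(\Omega) = 1, \qquad
    \mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i)
    \quad \text{for pairwise disjoint } A_i \in \Sigma,
    \]

for a probability space (Ω, Σ, μ). In the quantum case the events become projection operators on a Hilbert space, and probabilities arise via the Born rule, μ(P) = Tr(ρP) for a density operator ρ, which is the non-commutative setting the review addresses.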

Read more
Other Statistics

Governance on Social Media Data: Different Focuses between Government and Internet Company

How governments and Internet companies regulate user data on social media attracts public attention. This study addresses two questions: What kinds of countries send more requests for Facebook user data? And what kinds of countries receive more replies to those requests from Facebook? We aim to determine how a country's economic, political, and social characteristics affect its government's requests for user data and Facebook's response rate to those requests. Results show that countries with a higher GDP per capita, a higher level of human freedom, and a lower level of rule of law send more requests for user data, while Facebook tends to reply to government requests from countries with a higher level of human freedom and a lower level of political stability. In conclusion, governments and Facebook focus on different aspects of the governance of social media data.

Read more
Other Statistics

Greater data science at baccalaureate institutions

Donoho's JCGS (in press) paper is a spirited call to action for statisticians, who, he points out, are losing ground in the field of data science by refusing to accept that data science is its own domain (or, at least, a domain that is becoming distinctly defined). He calls on writings by John Tukey, Bill Cleveland, and Leo Breiman, among others, to remind us that statisticians have been dealing with data science for years, and he encourages acceptance of the direction of the field while also ensuring that statistics is tightly integrated into it. As faculty at baccalaureate institutions (where the growth of undergraduate statistics programs has been dramatic), we are keen to ensure that statistics has a place in data science and data science education. Donoho's paper focuses primarily on graduate education; at our undergraduate institutions, we are grappling with many of the same questions.

Read more
Other Statistics

Helix modelling through the Mardia-Holmes model framework and an extension of the Mardia-Holmes model

For noisy two-dimensional data that are approximately uniformly distributed near the circumference of an ellipse, Mardia and Holmes (1980) developed a model to fit the ellipse. In this paper we adapt their methodology to the analysis of helix data in three dimensions. If the helix axis is known, the Mardia-Holmes model for the circular case can be fitted after projecting the helix data onto the plane normal to the helix axis. If the axis is unknown, we develop an iterative algorithm to estimate it. The methodology is illustrated using simulated protein alpha-helices. We also give a multivariate version of the Mardia-Holmes model, applicable to fitting an ellipsoid and, in particular, a cylinder.
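
When the axis is known, the projection step described above is a single linear-algebra operation. A minimal sketch, with hypothetical variable names rather than the authors' code:

    import numpy as np

    def project_to_normal_plane(points, axis):
        """Project 3-D points onto the plane through the origin that is
        normal to the helix axis; the circular Mardia-Holmes model can
        then be fitted to the projected data."""
        u = np.asarray(axis, dtype=float)
        u = u / np.linalg.norm(u)                # ensure a unit axis
        pts = np.asarray(points, dtype=float)
        return pts - np.outer(pts @ u, u)        # p - (p . u) u, per point

    # Hypothetical helix with axis along z: projections lie near a circle
    t = np.linspace(0, 4 * np.pi, 50)
    helix = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
    print(project_to_normal_plane(helix, [0, 0, 1])[:3])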

Read more
Other Statistics

High-dimensional nonparametric monotone function estimation using BART

For the estimation of a regression relationship between Y and a large set of potential predictors x1, ..., xp, the flexible nature of a nonparametric approach such as BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than a more restrictive parametric approach. However, subject-matter considerations may often suggest that the relationship is monotone in one or more of the predictors. For such situations, we propose monotone BART, a constrained version of BART that uses the monotonicity information to improve function estimation without needing a parametric form. Imposing monotonicity, when appropriate, results in (i) function estimates that are smoother and more interpretable, (ii) better out-of-sample predictive performance, (iii) less uncertainty, and (iv) less sensitivity to prior choice. While some of the key aspects of the unconstrained BART model carry over directly to monotone BART, the imposition of the monotonicity constraints necessitates a fundamental rethinking of how the model is implemented. In particular, the Markov chain Monte Carlo algorithm of the original BART implementation relies on a conditional conjugacy that is no longer available in a high-dimensional, constrained space.
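
Monotone BART itself needs the reworked MCMC sampler the abstract mentions, but the effect of a monotonicity constraint can be previewed in one dimension with a far simpler tool. The sketch below uses isotonic regression as a deliberately crude stand-in (it is not BART): the constrained fit is monotone by construction, whereas an unconstrained estimate need not be.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 10, 200))
    y = np.log1p(x) + rng.normal(scale=0.3, size=x.size)  # monotone truth + noise

    # Constrained fit: nondecreasing by construction
    y_hat = IsotonicRegression(increasing=True).fit_transform(x, y)
    print(np.all(np.diff(y_hat) >= 0))                    # True everywhere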

Read more
Other Statistics

Hotelling's test for highly correlated data

This paper is motivated by the analysis of gene expression sets, and especially by finding differentially expressed gene sets between two phenotypes. Gene log2 expression levels are highly correlated and, very likely, approximately normally distributed. It therefore seems reasonable to use the two-sample Hotelling's test for such data. We discover some unexpected properties of the test that distinguish it from the majority of tests previously used for such data. It appears that Hotelling's test does not always reach maximal power when all marginal distributions are differentially expressed. For highly correlated data, its maximal power is attained when about half of the marginal distributions are essentially different. When the correlation coefficient is greater than 0.5, the test is more powerful if only one marginal distribution is shifted than if all marginal distributions are equally shifted. Moreover, as the correlation coefficient increases, the power of Hotelling's test increases as well.
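
For concreteness, the statistic under discussion is the two-sample Hotelling's T^2 with a pooled covariance matrix, referred to an F distribution. A minimal sketch (synthetic data, not the paper's code), set up so that only one of five highly correlated margins is shifted:

    import numpy as np
    from scipy import stats

    def hotelling_two_sample(X, Y):
        """Two-sample Hotelling's T^2 test; returns (T^2, p-value)."""
        n1, p = X.shape
        n2, _ = Y.shape
        d = X.mean(axis=0) - Y.mean(axis=0)
        S = ((n1 - 1) * np.cov(X, rowvar=False) +
             (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
        t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)
        f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
        return t2, stats.f.sf(f, p, n1 + n2 - p - 1)

    rng = np.random.default_rng(1)
    cov = 0.7 * np.ones((5, 5)) + 0.3 * np.eye(5)        # correlation 0.7
    X = rng.multivariate_normal(np.zeros(5), cov, size=30)
    Y = rng.multivariate_normal(np.r_[1.0, np.zeros(4)], cov, size=30)
    print(hotelling_two_sample(X, Y))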

Read more
Other Statistics

How and Why Did Probability Theory Come About?

This paper is a top-down historical perspective on the several phases in the development of probability, from its prehistoric origins to its modern-day evolution as one of the key methodologies in artificial intelligence, data science, and machine learning. It is written in honor of Barry Arnold's birthday, for his many contributions to statistical theory and methodology. Although much of Barry's work is technical, a descriptive document marking his achievements should not be viewed as out of line. Barry's dissertation adviser at Stanford (where he received a Ph.D. in Statistics) was a philosopher of science who dug deep into the foundations and roots of probability, and it is this breadth of perspective that Barry has inherited. The paper is based on lecture materials compiled by the first author from various published sources over a long period of time. The material below gives a limited list of references, because the cast of characters is large, and their contributions are part of the historical heritage of those of us who are interested in probability, statistics, and the many topics they have spawned.

Read more
Other Statistics

How sure are we? Two approaches to statistical inference

Suppose you are told that taking a statin will reduce your risk of a heart attack or stroke by 3% in the next ten years, or that women have better emotional intelligence than men. You may wonder how accurate the 3% is, or how confident we should be in the assertion about women's emotional intelligence, bearing in mind that these conclusions are based only on samples of data. My aim here is to present two statistical approaches to questions like these. Approach 1 is often called null hypothesis testing, but I prefer the phrase "baseline hypothesis": this is the standard approach in many areas of inquiry but is fraught with problems. Approach 2 can be viewed as a generalisation of the idea of confidence intervals, or as the application of Bayes' theorem. Unlike Approach 1, Approach 2 provides a tentative estimate of the probability of hypotheses of interest. For both approaches, I explain from first principles, building only on "common sense" statistical concepts like averages and randomness, both how to derive answers and the rationale behind them. This is achieved using computer simulation methods (resampling and bootstrapping with a spreadsheet available on the web), which avoid the use of probability distributions (t, normal, etc.). Such a minimalist, but reasonably rigorous, analysis is particularly useful in a discipline like statistics, which is widely used by people who are not specialists. My intended audience includes both statisticians and users of statistical methods who are not statistical experts.
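
The resampling machinery behind Approach 2 fits in a few lines outside a spreadsheet as well. A minimal sketch of a percentile bootstrap interval for a difference in group means, with made-up numbers standing in for the paper's examples:

    import numpy as np

    rng = np.random.default_rng(42)
    women = rng.normal(52, 10, size=120)   # hypothetical EI scores
    men = rng.normal(49, 10, size=120)

    # Resample each group with replacement, many times
    diffs = np.array([
        rng.choice(women, women.size).mean() - rng.choice(men, men.size).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    print(f"95% bootstrap interval for the mean difference: ({lo:.2f}, {hi:.2f})")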

Read more
Other Statistics

How to read probability distributions as statements about process

Probability distributions can be read as simple expressions of information. Each continuous probability distribution describes how information changes with magnitude. Once one learns to read a probability distribution as a measurement scale of information, opportunities arise to understand the processes that generate the commonly observed patterns. Probability expressions may be parsed into four components: the dissipation of all information, except the preservation of average values, taken over the measurement scale that relates changes in observed values to changes in information, and the transformation from the underlying scale on which information dissipates to alternative scales on which probability pattern may be expressed. Information invariances set the commonly observed measurement scales and the relations between them. In particular, a measurement scale for information is defined by its invariance to specific transformations of underlying values into measurable outputs. Essentially all common distributions can be understood within this simple framework of information invariance and measurement scale.
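
The "dissipation of all information, except the preservation of average values" is the maximum-entropy principle. In generic notation (ours, not quoted from the paper), maximizing entropy subject to a fixed average of a measurement scale T(y) gives

    \[
    p(y) \propto e^{-\lambda T(y)},
    \]

where λ is set by the constraint E[T(y)] = \bar{T}; the choice T(y) = y yields the exponential distribution, T(y) = y^2 the Gaussian, and T(y) = log y a power law, illustrating how measurement scales index the common families.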

Read more
