Featured Research

Other Statistics

Continuously Updated Data Analysis Systems

When doing data science, it's important to know what you're building. This paper describes an idealized final product of a data science project, called a Continuously Updated Data-Analysis System (CUDAS). The CUDAS concept synthesizes ideas from a range of successful data science projects, such as Nate Silver's FiveThirtyEight. A CUDAS can be built for any context, such as the state of the economy, the state of the climate, and so on. To demonstrate, we build two such systems. The first provides continuously updated ratings for soccer players, based on the newly developed Augmented Adjusted Plus-Minus statistic. The second creates a large dataset of synthetic ecosystems, which is used for agent-based modeling of infectious diseases.


Cooperative spectrum sensing over unreliable reporting channel

This article analyzes a centralized cooperative spectrum sensing scheme with an unreliable reporting channel. Spectrum sensing is applied to a cognitive radio system in which each cognitive radio performs simple energy detection and sends its decision to a fusion center through a reporting channel. Once the decisions are available at the fusion center, an n-out-of-K rule is applied. The impact of the choice of the parameter n on the performance of the cognitive radio system is analyzed for the case where the reporting channel introduces errors.
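
As a rough illustration of the n-out-of-K rule with reporting errors, the sketch below computes global detection and false-alarm probabilities. It rests on assumptions that are not taken from the article: all K radios are independent with identical local probabilities, and the reporting channel is modeled as a binary symmetric channel that flips each one-bit decision with probability `pe`. The function names are hypothetical.

```python
from math import comb


def effective_prob(p, pe):
    """Probability that the fusion center receives a '1' when the radio
    sends '1' with probability p and the channel flips bits with
    probability pe (assumed binary symmetric channel)."""
    return p * (1 - pe) + (1 - p) * pe


def n_out_of_k(p, n, k):
    """Probability that at least n of k independent reports equal '1',
    each being '1' with probability p."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(n, k + 1))


def global_probs(pd, pf, pe, n, k):
    """Global detection and false-alarm probabilities at the fusion center.

    pd, pf: local detection and false-alarm probabilities (assumed
    identical for every radio).
    """
    qd = n_out_of_k(effective_prob(pd, pe), n, k)
    qf = n_out_of_k(effective_prob(pf, pe), n, k)
    return qd, qf
```

Sweeping `n` from 1 (OR rule) to `k` (AND rule) for a fixed `pe` is one way to see how the choice of n trades detection against false alarms under this simplified model.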


Cramer-Rao-Induced Bounds for CANDECOMP/PARAFAC tensor decomposition

This paper presents a Cramer-Rao lower bound (CRLB) on the variance of unbiased estimates of factor matrices in Canonical Polyadic (CP), or CANDECOMP/PARAFAC, decompositions of a tensor from noisy observations (i.e., the tensor plus a random Gaussian i.i.d. tensor). A novel expression is derived for a bound on the mean square angular error of factors along a selected dimension of a tensor of arbitrary dimension. Computing the bound with this expression needs fewer operations, O(NR^6), than the best existing state-of-the-art algorithm, which needs O(N^3R^6) operations, where N and R are the tensor order and the tensor rank, respectively. Insightful expressions are derived for tensors of rank 1 and rank 2 of arbitrary dimension and for tensors of arbitrary dimension and rank, where two factor matrices have orthogonal columns. The results can be used to gauge the performance of different approximate CP decomposition algorithms, to predict their accuracy, and to check the stability of a given decomposition of a tensor (by checking whether the CRLB is finite). A novel expression is also derived for the Hessian matrix needed in the popular damped Gauss-Newton method for solving the CP decomposition of tensors with missing elements. Besides computing the CRLB for these tensors, the expression may serve in the design of a damped Gauss-Newton algorithm for the decomposition.
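
To make the observation model concrete, here is a minimal sketch of a noisy CP tensor and of a squared angular error between a true factor and its estimate. It does not compute the CRLB and is not the paper's method. It assumes an order-3 tensor for brevity, and the function names are hypothetical.

```python
import numpy as np


def cp_tensor(a, b, c):
    """Order-3 CP model: sum over r of the outer products a[:, r], b[:, r], c[:, r].

    a, b, c are factor matrices with the same number of columns R (the rank).
    """
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def add_gaussian_noise(tensor, sigma, rng):
    """Noisy observation: the tensor plus an i.i.d. Gaussian tensor with std sigma."""
    return tensor + sigma * rng.standard_normal(tensor.shape)


def squared_angular_error(u, v):
    """Squared angle (radians^2) between a factor column u and its estimate v.

    The absolute value makes the error insensitive to the sign and scale
    ambiguity of CP factors.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, 0.0, 1.0)) ** 2)
```

Averaging `squared_angular_error` over many noise realizations for a given decomposition algorithm gives an empirical mean square angular error, which is the kind of quantity a bound like the one described above would be compared against.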


Creating, Automating, and Assessing Online Homework in Introductory Statistics and Mathematics Classes

Although textbook publishers offer course management systems, they do so to promote brand loyalty, and while an open source tool such as WeBWorK is promising, it requires administrative and IT buy-in. Supported in part by a College Access Challenge Grant from the Department of Education, we therefore collaborated with other instructors to create online homework sets for three classes: Elementary Algebra, Intermediate Algebra, and Statistics for Behavioral Sciences I. After experimentation, some of these question pools are now created by Mathematica programs that can generate data sets from specified distributions, generate random polynomials that factor in a given way, and create image files of histograms, scatterplots, and so forth. These programs produce files that can be read by the software package Respondus, which then uploads the questions into Blackboard Learn, the course management system used by the Connecticut State University system. Finally, we summarize five classes' worth of student performance data along with lessons learned while working on this project.
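
The authors' generators are written in Mathematica and are not shown here. As a hedged illustration of the idea of producing a random polynomial that factors in a given way, the Python sketch below picks two integer roots and expands the product. The function name and the `max_root` parameter are hypothetical.

```python
import random


def random_factorable_quadratic(rng, max_root=9):
    """Return ((a, b, c), (r1, r2)) with a*x^2 + b*x + c = (x - r1)(x - r2).

    The roots are drawn uniformly from the integers in [-max_root, max_root],
    so the quadratic always factors over the integers.
    """
    r1 = rng.randint(-max_root, max_root)
    r2 = rng.randint(-max_root, max_root)
    # Expand (x - r1)(x - r2) = x^2 - (r1 + r2) x + r1 * r2.
    return (1, -(r1 + r2), r1 * r2), (r1, r2)
```

Passing a seeded `random.Random` instance makes each question pool reproducible, which helps when an answer key has to be regenerated later.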


Curriculum Guidelines for Undergraduate Programs in Data Science

The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science.


Data Science in Biomedicine

We highlight the role of Data Science in Biomedicine. Our manuscript goes from the general to the particular, presenting a global definition of Data Science and showing the trend for this discipline alongside the terms cloud computing and big data. In addition, since Data Science is mostly associated with areas like economics or business, we describe its importance in biomedicine. Biomedical Data Science (BDS) presents the challenge of dealing with data coming from a range of biological and medical research, focusing on methodologies to advance biomedical science discoveries in an interdisciplinary context.


Data Science in Statistics Curricula: Preparing Students to "Think with Data"

A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to utilize databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this paper is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science.


Data Science vs. Statistics: Two Cultures?

Data science is the business of learning from data, which is traditionally the business of statistics. Data science, however, is often understood as a broader, task-driven and computationally-oriented version of statistics. Both the term data science and the broader idea it conveys have origins in statistics and are a reaction to a narrower view of data analysis. Expanding upon the views of a number of statisticians, this paper encourages a big-tent view of data analysis. We examine how evolving approaches to modern data analysis relate to the existing discipline of statistics (e.g. exploratory analysis, machine learning, reproducibility, computation, communication and the role of theory). Finally, we discuss what these trends mean for the future of statistics by highlighting promising directions for communication, education and research.


Data Science: A Three Ring Circus or a Big Tent?

This is part of a collection of discussion pieces on David Donoho's paper 50 Years of Data Science, appearing in Volume 26, Issue 4 of the Journal of Computational and Graphical Statistics (2017).


Data Visualization on Day One: Bringing Big Ideas into Intro Stats Early and Often

In a world awash with data, the ability to think and compute with data has become an important skill for students in many fields. For that reason, inclusion of some level of statistical computing in many introductory-level courses has grown more common in recent years. Existing literature has documented multiple success stories of teaching statistics with R, bolstered by the capabilities of R Markdown. In this article, we present an in-class data visualization activity intended to expose students to R and R Markdown during the first week of an introductory statistics class. The activity begins with a brief lecture on exploratory data analysis in R. Students are then placed in small groups tasked with exploring a new dataset to produce three visualizations that describe particular insights that are not immediately obvious from the data. Upon completion, students will have produced a series of univariate and multivariate visualizations on a real dataset and practiced describing them.
