Benjamin Baumer
Smith College
Publication
Featured research published by Benjamin Baumer.
The American Statistician | 2015
Johanna Hardin; Roger Hoerl; Nicholas J. Horton; Deborah Nolan; Benjamin Baumer; O. Hall-Holt; Paul Murrell; Roger D. Peng; P. Roback; D. Temple Lang; Mark Daniel Ward
A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to use databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this article is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of undergraduates with data and data science. [Received November 2014. Revised July 2015.]
The American Statistician | 2015
Benjamin Baumer
Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be nontraditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean datasets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Supplementary materials for this article are available online. [Received June 2014. Revised July 2015.]
Annual Review of Statistics and Its Application | 2017
Richard D. De Veaux; Mahesh Agarwal; Maia Averett; Benjamin Baumer; Andrew Bray; Thomas C. Bressoud; Lance Bryant; Lei Z. Cheng; Amanda Francis; Robert G. Gould; Albert Y. Kim; Matt Kretchmar; Qin Lu; Ann Moskol; Deborah Nolan; Roberto Pelayo; Sean Raleigh; Ricky J. Sethi; Mutiara Sondjaja; Neelesh Tiruviluamala; Paul X. Uhlig; Talitha M. Washington; Curtis L. Wesley; David White; Ping Ye
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science.
Journal of Quantitative Analysis in Sports | 2015
Benjamin Baumer; Shane T. Jensen; Gregory J. Matthews
Within sports analytics, there is substantial interest in comprehensive statistics intended to capture overall player performance. In baseball, one such measure is wins above replacement (WAR), which aggregates the contributions of a player in each facet of the game: hitting, pitching, baserunning, and fielding. However, current versions of WAR depend upon proprietary data, ad hoc methodology, and opaque calculations. We propose a competitive aggregate measure, openWAR, that is based on public data, a methodology with greater rigor and transparency, and a principled standard for the nebulous concept of a “replacement” player. Finally, we use simulation-based techniques to provide interval estimates for our openWAR measure that are easily portable to other domains.
The American Statistician | 2018
Benjamin Baumer
Many current and future data scientists will be “isolated”: working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters likely to be encountered by newly minted data scientists and offer advice about how to navigate them. Neither the issues raised nor the advice given are particular to sports and should be applicable to a wide range of knowledge domains.
Journal of Computational and Graphical Statistics | 2018
Benjamin Baumer
Many interesting datasets available on the Internet are of a medium size: too big to fit into a personal computer’s memory, but not so large that they would not fit comfortably on its hard disk. In the coming years, datasets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer-review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable database architecture and query language) to make reproducible research on medium data a painless reality. Supplementary material for this article is available online.
Journal of Computational and Graphical Statistics | 2017
Amelia McNamara; Nicholas J. Horton; Benjamin Baumer
Donoho’s JCGS (in press) paper is a spirited call to action for statisticians, who he points out are losing ground in the field of data science by refusing to accept that data science is its own domain. (Or, at least, a domain that is becoming distinctly defined.) He calls on writings by John Tukey, Bill Cleveland, and Leo Breiman, among others, to remind us that statisticians have been dealing with data science for years, and encourages acceptance of the direction of the field while also ensuring that statistics is tightly integrated. As faculty at baccalaureate institutions (where the growth of undergraduate statistics programs has been dramatic), we are keen to ensure statistics has a place in data science and data science education. In his paper, Donoho is primarily focused on graduate education. At our undergraduate institutions, we are considering many of the same questions.
Discussiones Mathematicae Graph Theory | 2016
Benjamin Baumer; Yijin Wei; Gary S. Bloom
Suppose that G is a simple, vertex-labeled graph and that S is a multiset. Then if there exists a one-to-one mapping between the elements of S and the vertices of G, such that edges in G exist if and only if the absolute difference of the corresponding vertex labels exists in S, then G is an autograph, and S is a signature for G. While it is known that many common families of graphs are autographs, and that infinitely many graphs are not autographs, a non-autograph has never been exhibited. In this paper, we identify the smallest non-autograph: a graph with 6 vertices and 11 edges. Furthermore, we demonstrate that the infinite family of graphs on n vertices consisting of the complement of two non-intersecting cycles contains only non-autographs for n ≥ 8.
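The autograph definition in this abstract can be illustrated with a minimal Python sketch. It assumes integer labels and a fixed assignment in which vertex i carries the i-th element of the signature. The function names edges_from_signature and is_signature_for are hypothetical, chosen for illustration, and are not taken from the paper. The sketch only checks one given labeling. It does not search over all possible signatures, which is what deciding whether a graph is a non-autograph would require.

```python
from itertools import combinations


def edges_from_signature(signature):
    """Return the edge set of the graph determined by a signature.

    Assumption: vertex i is labeled signature[i]. Two distinct vertices
    are adjacent exactly when the absolute difference of their labels
    is itself an element of the signature.
    """
    labels = list(signature)
    members = set(labels)  # membership test ignores multiplicity
    return {
        (u, v)
        for u, v in combinations(range(len(labels)), 2)
        if abs(labels[u] - labels[v]) in members
    }


def is_signature_for(signature, edges):
    """Check whether the given edges match the graph the signature induces."""
    normalized = {tuple(sorted(edge)) for edge in edges}
    return normalized == edges_from_signature(signature)
```

For example, under these assumptions the signature [1, 2, 3] induces a triangle, since every pairwise difference (1, 2, 1) is in the signature, while [1, 2, 4] induces a path, since the difference 3 between labels 1 and 4 is absent.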
Archive | 2013
Benjamin Baumer; Andrew Zimbalist
arXiv: Computation | 2015
Nicholas J. Horton; Benjamin Baumer; Hadley Wickham