Larry A. Pace
University of Georgia
Publications
Featured research published by Larry A. Pace.
Archive | 2015
Joshua F. Wiley; Larry A. Pace
We have obviously been working with graphics since the beginning of this book, as it is not easy to separate statistics and graphics, and perhaps it is impossible to do so. In Chapter 9, we will fill in some gaps and introduce the ggplot2 package as an effective alternative to the graphics package distributed with R.
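As a brief illustration (not drawn from the chapter itself), the same scatterplot can be drawn with base graphics and with ggplot2, here using the built-in mtcars data:

# Base graphics scatterplot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# The same plot with ggplot2 (install.packages("ggplot2") first if needed)
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")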
Archive | 2015
Joshua F. Wiley; Larry A. Pace
Throughout the book we have used quite a few graphs to help you visualize and understand data, but we never went through them systematically. In this cookbook chapter, we will show you how to make a number of common kinds of graphs in R (most of which you have not seen yet, though for completeness we duplicate one or two).
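For instance, several common graph types can be produced in base R with single calls on built-in data sets (an illustrative sketch, not the chapter's own examples):

hist(faithful$waiting, main = "Waiting times", xlab = "Minutes")        # histogram
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "MPG")     # grouped boxplots
barplot(table(mtcars$gear), xlab = "Gears", ylab = "Count")             # bar chart of counts
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance") # scatterplot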
Archive | 2015
Joshua F. Wiley; Larry A. Pace
There are compelling reasons to use R. Enthusiastic users, programmers, and contributors support R and its development. A dedicated core team of R experts maintains the language. R is accurate, produces excellent graphics, has a variety of built-in functions, and is both a functional language and an object-oriented one. There are (literally) thousands of contributed packages available to R users for specialized data analyses.
Archive | 2015
Joshua F. Wiley; Larry A. Pace
The world of data and data analytics is changing rapidly. Data analysts are facing major issues related to the use of larger datasets, including cloud computing and the creation of so-called data lakes: enterprise-wide data management platforms that hold vast amounts of data in their original format in a single unmanaged, unstructured location available to the entire organization. This flies in the face of the carefully structured and highly managed data most of us have come to know and love.
Archive | 2015
Joshua F. Wiley; Larry A. Pace
The longer one programs, the easier it becomes to think like a programmer. You learn that the best way to solve a problem is to solve it once in such a way that the adjustments you need to make when the problem changes slightly are very small ones. It is better to use variables and even other functions in your code so that you can change a single value once rather than many times. This is the essence of the pragmatic programmer who writes with purpose. Programmers who come to R from other languages such as C++ or Python tend to think in loops. You are probably convinced by now that R’s vectorization allows us to avoid loops in many situations. As you saw in Chapter 4, looping is possible when it is needed. Efficient code allows us to automate as many tasks as we can so that we don’t repeat ourselves, and to avoid looping as much as possible.
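A small sketch of the difference in style (not taken from the chapter): a loop-based sum of squares and its vectorized equivalent.

# Loop-style accumulation, as one might write it coming from C++ or Python
x <- rnorm(1e5)
total <- 0
for (value in x) {
  total <- total + value^2
}

# The vectorized equivalent: shorter, clearer, and typically much faster
total_vec <- sum(x^2)
all.equal(total, total_vec)  # TRUE, up to floating-point rounding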
Archive | 2015
Joshua F. Wiley; Larry A. Pace
The linear regressions we have been exploring have some underlying assumptions. First and foremost is that the response and predictor variables should have a linear relationship. Buried in that assumption is the idea that these variables are quantitative. However, what if the response variable is qualitative or discrete? If it is binary, such as measuring whether participants are satisfied or not satisfied, we could perhaps dummy-code satisfaction as 1 and no satisfaction as 0. In that case, while a linear regression may provide some guidance, it will also likely produce predictions well beyond the range of [0, 1], which is clearly not right. Should we desire to predict more than two responses (e.g., not satisfied, mostly satisfied, and satisfied), the approach breaks down even more.
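As an illustrative sketch (with simulated data and made-up variable names, not the chapter's own example), compare a linear model fit to a 0/1 outcome with a logistic model fit via glm():

set.seed(1)
hours <- runif(100, 0, 10)
satisfied <- rbinom(100, 1, plogis(-2 + 0.5 * hours))   # simulated 0/1 outcome

lm_fit  <- lm(satisfied ~ hours)                        # linear model on a binary response
glm_fit <- glm(satisfied ~ hours, family = binomial)    # logistic regression

range(predict(lm_fit))                       # may fall below 0 or above 1
range(predict(glm_fit, type = "response"))   # always stays between 0 and 1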
Archive | 2015
Joshua F. Wiley; Larry A. Pace
The idea of multiple regression is that rather than having just a single predictor, a model built on multiple predictors may well allow us to understand a particular system more accurately. As in Chapter 13, we will still focus overall on linear models; we will simply have a new goal of increasing our number of predictors. Also as before, we leave the behind-the-scenes mathematics to other texts; our goal is to explore conceptually how to use and understand multiple regression. Having offered this caveat, however, doesn't free us from taking at least a brief look at some of the math that powers these models.
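A minimal sketch of the idea, using the built-in mtcars data rather than the chapter's own examples: add a second predictor to a simple linear model and ask whether it improves the fit.

fit_one  <- lm(mpg ~ wt, data = mtcars)        # single predictor
fit_many <- lm(mpg ~ wt + hp, data = mtcars)   # weight plus horsepower

summary(fit_many)           # coefficients, R-squared, and so on
anova(fit_one, fit_many)    # does adding hp significantly improve the model?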
Archive | 2015
Joshua F. Wiley; Larry A. Pace
Tables are very useful for summarizing data. We can use tables for all kinds of data, ranging from nominal to ratio. In Chapter 7, you will learn how to use tables to create frequency distributions and cross-tabulations as well as how to conduct chi-square tests to determine whether the frequencies are distributed according to some null hypothesis.
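For example (an illustrative sketch on built-in data, not the chapter's own examples), frequency tables, cross-tabulations, and a chi-square test can all be produced with a few calls:

table(mtcars$cyl)                       # frequency distribution
xtabs(~ cyl + gear, data = mtcars)      # cross-tabulation

# Chi-square test of independence (expect a warning here: some cell counts are small)
chisq.test(table(mtcars$cyl, mtcars$gear))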
Archive | 2015
Joshua F. Wiley; Larry A. Pace
Statistics is an evolving, growing field. Consider that, at this moment, there are hundreds of scholars working on their graduate degrees in statistics. Each of those scholars must make an original contribution to the field of statistics, either in theory or in application. This is not to mention the statistics faculty and other faculty members in various research fields who are working on the cutting edge of statistical applications. Add to this total the statistical innovators in government, business, the biomedical field, and other organizations. You get the picture.
Archive | 2015
Joshua F. Wiley; Larry A. Pace
Statistics benefited greatly from the introduction of the modern digital computer in the middle of the 20th century. Simulations and other analyses that once required laborious and error-prone hand calculations could be programmed into the computer, saving time and increasing accuracy. We have already used simulations for some demonstrations. In this chapter, we will discuss modern robust alternatives to the standard statistical techniques we discussed in Chapter 10.
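As a small illustrative simulation (not from the chapter), compare the sample mean with one robust alternative, the 20% trimmed mean, when a few outliers contaminate the data:

set.seed(42)
results <- replicate(1000, {
  x <- c(rnorm(27), rnorm(3, mean = 10))   # mostly standard normal, with a few outliers
  c(mean = mean(x), trimmed = mean(x, trim = 0.2))
})
rowMeans(results)   # the trimmed mean stays much closer to the true center of 0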