Publication


Featured research published by Joshua F. Wiley.


Archive | 2016

Every Cloud has a Shiny Lining

Matt Wiley; Joshua F. Wiley

When serving data to the public, easy access and interactive information make a real difference in comprehension. In this chapter, our goal is to provide access to, and some interesting uses for, a more recent application, shiny (Chang, Cheng, Allaire, Xie, and McPherson, 2016).


Archive | 2016

Data Munging with data.table

Matt Wiley; Joshua F. Wiley

We already introduced the data.table package (Dowle, Srinivasan, Short, and Lianoglou, 2015). The data.table package is the heart of this chapter, covering the basics of accessing, editing, and manipulating data under the broad term data management. Although not glamorous, data management is a critical first step to data visualization or analysis. Furthermore, the majority of the time on a particular analysis project is often spent on data management. For example, running a linear model in R takes one line of code, once the data is clean and in the expected format. Data management is challenging because raw data comes in all types, shapes, and formats, and missing data is common. In addition, you may have to combine or merge separate data sources. In this chapter, we go beyond the basic use of data.table to more-complex data management tasks.


Archive | 2016

Introduction to Data Management Using data.table

Matt Wiley; Joshua F. Wiley

We already briefly introduced the data.table package. This package is the heart of this chapter, which covers the basics of accessing, editing, and manipulating data under the broad term data management. Although not glamorous, data management is a critical first step to data visualization or analysis. Furthermore, the majority of the time on a particular analysis project may be spent on data management. For example, running a linear model in R can take one line of code, once the data is clean and in the format that the lm() function in R expects. Data management can be challenging because raw data comes in all types, shapes, and formats; missing data is common; and you may also have to combine or merge separate data sources. In this chapter, we introduce both mechanical and philosophical techniques to approach data management. All packages used in this chapter are already in our checkpoint.R file. Thus, you need only source the file to get started.


Archive | 2016

Shiny Dashboard Sampler

Matt Wiley; Joshua F. Wiley

This chapter is not required in order to understand other chapters in this book. While we introduce some new techniques, what you primarily find here is one entire dashboard sample ready to be modified to suit your needs.


Archive | 2016

Writing a Package

Matt Wiley; Joshua F. Wiley

Packages are the fundamental way to document, share, and distribute R code and data. Our goal is to write our own packages, and our fair warning is that this chapter is particularly complex because of the many software tools that are employed during R package development.


Archive | 2016

Reading Big Data(bases)

Matt Wiley; Joshua F. Wiley

Now that you understand how to manage data inside R, let’s consider where data is found. While smaller data is found in comma-separated values (CSV) files or files easily converted to such, larger data tends to live in other places. This chapter deals with big data, or at least data that may be big. What is the challenge with big data? R works in random access memory (RAM), not on hard drives. A quick check of your system settings should reveal the amount of memory you have. We, the authors, use between 4 and 32 gigabytes in our real-world systems, with the larger number being a somewhat expensive habit. R cannot analyze data larger than available RAM.


Archive | 2016

Writing Classes and Methods

Matt Wiley; Joshua F. Wiley

It is often helpful to have a function behave differently depending on the type of object passed. For example, when summarizing a variable, it makes sense to create a different summary for numeric or string data. It is possible to have a different function for every type of object, but then users would have to remember many function names, and to remain unique, function names may be longer. Object-oriented programming (OOP) is based on objects and is implemented in R (as in most programming languages) by using two concepts: classes and methods. A class defines a template, or blueprint, describing the variables and features of an object as well as determining what methods work for it. For example, a house may be defined as having a floor, four walls, a roof, and a door. Specific data represents these properties, such as the dimensions and color of each wall. The methods are behaviors or actions that can be performed on a particular object type. For instance, a house can be painted, which changes its color, but a house cannot be eaten. R has three object-oriented systems: S3, S4, and R5 (reference classes). This chapter covers the S3 and S4 systems, which are the most common.
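
As a rough illustration of one function name with class-specific methods, here is a minimal sketch written in Python rather than R (the chapter itself covers R's S3 and S4 systems). It uses the standard-library functools.singledispatch, which is only loosely analogous to S3 generic dispatch. The names NumericColumn, TextColumn, and summarize are hypothetical and do not come from the book.

```python
from dataclasses import dataclass
from functools import singledispatch
from statistics import mean


# Hypothetical classes standing in for "numeric" and "string" data.
@dataclass
class NumericColumn:
    values: list


@dataclass
class TextColumn:
    values: list


@singledispatch
def summarize(column):
    """Generic function: fallback when no method matches the class."""
    raise TypeError(f"no summarize method for {type(column).__name__}")


@summarize.register
def _(column: NumericColumn):
    # Method for numeric data: report range and average.
    return {
        "min": min(column.values),
        "mean": mean(column.values),
        "max": max(column.values),
    }


@summarize.register
def _(column: TextColumn):
    # Method for string data: report counts instead of averages.
    return {"n": len(column.values), "unique": len(set(column.values))}


if __name__ == "__main__":
    print(summarize(NumericColumn([1.0, 2.0, 6.0])))
    print(summarize(TextColumn(["a", "b", "a"])))
```

The caller only needs to remember the single name summarize; the class of the argument decides which method runs, and an unsupported class fails with a clear error.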


Archive | 2016

Dynamic Reports and the Cloud

Matt Wiley; Joshua F. Wiley

In the preceding chapters, you saw how Shiny delivers interactive environments based on dynamic data. Dynamic reports and live dashboards have many similarities. However, because the reports in this chapter end up as PDFs, they are less interactive. Reports are a fact of life in many fields, and stakeholders tend to require snapshots in time rather than fully interactive environments. Through the knitr and rmarkdown packages, we create documents (for example, PDF, HTML, or Microsoft Word) based on data input. For regular reports that build on continuously changing data, yet have the same overall structure, this is a great time-saver.


Archive | 2016

Getting a Cloud

Matt Wiley; Joshua F. Wiley

Depending on your needs and uses for R, it can be convenient to have a reasonable amount of memory and processor capability. Often this power is necessary only on occasion, and it may not be cost-effective to own the hardware. This is where hosting R on the cloud may be helpful. Cloud instances bring on-demand resources that are readily scaled up or down as each situation requires. These days, there are several tolerable outfits that provide such services at very reasonable prices. The challenge we face is threefold, and we walk through those steps over the next three chapters. We need to get a cloud, we need to administer our cloud’s operating system, and we need to do some fun things with R.


Archive | 2016

Other Tools for Data Management

Matt Wiley; Joshua F. Wiley

Comparing data frames and data tables leads to an interesting question. What if there were more types of data? Particularly, what if there were different ways to store data that are all, at their heart, tables of some sort? In addition to data frames or tables, there are many ways to store data, many of which are just tables. The idea behind dplyr (Wickham and Francois, 2015) is that regardless of what the data back end might be, our experience should be the same. To allow this, dplyr implements generic functions for common data management tasks. For each of these generic functions, specific methods are written that translate the generic operation into whatever code or language is required for a specific back end. Using a layer of abstraction ensures that users get a consistent experience, regardless of the specific data format, or back end, being used. It also makes dplyr extensible, in that support for a new format can be added by simply writing additional functions or methods. The user experience need not change. For this chapter, our checkpoint header needs to have both the tibble and the dplyr packages installed and added.
