Comparison of plotting system outputs in beginner analysts

Abstract

The R programming language is built on an ecosystem of packages, some that allow analysts to accomplish the same tasks. For example, there are at least two clear workflows for creating data visualizations in R: using the base graphics package (referred to as "base R") and the ggplot2 add-on package based on the grammar of graphics. Here we perform an empirical study of the quality of scientific graphics produced by beginning R users. In our experiment, learners taking a data science course on the Coursera platform were randomized to complete identical plotting exercises in either the base R or the ggplot2 system. Learners were then asked to evaluate their peers in terms of visual characteristics key to scientific cognition. We observed that graphics created with the two systems rated similarly on many characteristics. However, ggplot2 graphics were generally judged to be more visually pleasing and, in the case of faceted scientific plots, easier to understand. Our results suggest that while both graphic systems are useful in the hands of beginning users, ggplot2's natural faceting system may be easier to use by beginning users for displaying more complex relationships.


Leslie Myint, Aboozar Hadavand, Leah Jager, and Jeffrey Leek

Department of Mathematics, Statistics, and Computer Science, Macalester College, 1600 Grand Ave, Saint Paul, MN 55105

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe St, Baltimore, MD 21212


Key Words:

Data Visualization; Statistical Perception; R; Randomized Trial

The R programming language is one of the most popular means of introducing computing into data science, data analytics, and statistics curricula (Çetinkaya-Rundel and Rundel, 2018). An advantage of the R ecosystem is the powerful set of add-on packages that can be used to perform a range of tasks from experimental design (Groemping, 2018), to data cleaning (Ross et al., 2017; Grolemund and Wickham, 2017), to visualization (Wickham, 2009), and modeling or machine learning (Kuhn, 2008).

While these packages make it possible for users of R to accomplish a wide range of tasks, they also mean there are often multiple workflows for accomplishing the same data analytic goal. These competing workflows often lead to strong opinions and debates in the literature, on social media, and on blogs (Leek, 2016; Robinson, 2016). But we have collected relatively little information about the way that these tools are used in the hands of end users.

One of the more commonly debated aspects of data science education within the R community is the plotting system used to introduce learners to statistical graphics. Generally, the two main systems under consideration are the base graphics package in R (called “base R”) and the ggplot2 graphics system based on the grammar of graphics (Wickham, 2009). There has been some online and informal debate about the general strengths and weaknesses of these two systems for both research and teaching (Leek, 2016; Robinson, 2016). More recently there has been discussion of the relative merits of the two plotting systems in teaching the specific student population of beginner analysts (Robinson, 2014) and some investigation of learning outcomes when using base R and ggplot2 in the classroom (Stander and Dalla Valle, 2017). In the latter investigation, Stander and Dalla Valle provide instruction in both plotting systems in the classroom but do not formally compare the systems in terms of student learning outcomes.

There has also been a surge in the creation of resources that focus on ggplot2, and more broadly, the encompassing tidyverse framework (Grolemund and Wickham, 2017). The Modern Dive open source introductory textbook for data science education with R is one such example (Ismay and Kim, 2017). Ross et al. (2017) describe a full data analytic workflow in this framework. Generally, proponents of the ggplot2 system cite the harmonization of the tidy data mindset and ggplot2 syntax for mapping between variables and visual plot elements. They also appeal to the modular nature of the syntax that gives rise to the ability to build plots in layers. Proponents of the base R system cite its power to create nearly any imaginable graphic by acting on individual plot elements such as points and lines. Making plots in the base R system also increases exposure to for-loops (and related ideas), which can be helpful to students in other aspects of their data science training.

A primary goal in statistical education is giving students the ability to communicate effectively with data. Of course, statistical graphics are a major part of effective data communication, and a large body of literature on the visual display of scientific research and human cognition highlights the need for thinking critically about statistical graphics education. For one, Tversky et al. (2002) argue that a visual display must be accurately perceived to be effective and refer to this as the “apprehension principle of visual displays”. In the context of presentation of physical processes, they show that animations are not more effective than static graphics. Smallman and John (2005) provide a similar analysis in the context of visual dimensions. They find that people misperceive distances in depth and, therefore, 3D displays are not ideal for presenting absolute distances. Kosslyn (2006) argues that graphics should not present information beyond what is needed by the user. Rosenholtz et al. (2007) and Wickens and Carswell (1995) find that presenting too much information in the display can lead to visual distraction in non-expert audiences of scientific research. Therefore, it is important for statistical educators to teach graphics systems that aid students’ creation of effective data displays, that is, data displays that enhance scientific cognition.

Here we seek to better understand differences in the visual display and perception of plots made in the base R and ggplot2 systems. We study this in a group of beginner learners within the Coursera platform. Specifically, we report results from a randomized experiment in which learners were randomized to complete identical plotting exercises in either the base R or the ggplot2 system. Learners were then asked to evaluate plots from their peers in terms of visual characteristics key to scientific cognition.

We hypothesized that plots made with ggplot2 would generally rate higher on aesthetics and clarity due to the relative ease of the syntax and the default layout. That is, we believed that it would be syntactically easier for students to create “correct” or effective plots in ggplot2. At the same time, we hypothesized that plots in base R would show clearer labels due to its undesirable default labels. We suspected that students’ direct modification of labels (as opposed to accepting defaults) would result in higher clarity labeling.

We find that, for the specific exercises given to the students, the aesthetic differences between the two plotting systems (as measured by the peer review) are generally small. However, we find that the plots made with ggplot2 are generally of higher clarity than those made in the base R system, particularly when the students were asked to make a complex, multi-panel plot.
We also observe differences between the systems in the number of panels used in this complex, multi-panel plot, suggesting different cognitive interactions with the R syntax.

We ran a randomized experiment from July 2016 to September 2017 within the Reproducible Research course in the Johns Hopkins Data Science Specialization on Coursera. This course covers the basics of RMarkdown, literate programming, and the principles of reproducible research. This course follows Exploratory Data Analysis, a course that covers the base R and ggplot2 systems as well as concepts involved in thorough exploratory analysis. Since the launch of Reproducible Research, 187,617 learners have enrolled, of whom 29,534 have completed the course. Demographic information summaries are available in Table 1. This demographic information is specific to this offering of the Reproducible Research course on Coursera, but not necessarily specific to the students who participated in the experiment.

Table 1: Learner demographics in the Reproducible Research course

Characteristics     Shares
Gender              Male: 76%; Female: 24%
Student status      Non-degree student: 68%; Full-time student: 24%; Part-time student: 8%
Education           College (no degree): 4%; Bachelor’s degree: 34%; Master’s degree: 46%; Doctorate degree: 11%; Other: 5%
Employment status   Full-time: 68%; Part-time: 4%; Unemployed (looking for work): 16%; Other: 12%
Language            English: 89%; Chinese: 3%; Other: 8%
Country             United States: 36%; India: 12%; Great Britain: 4%; Canada: 3%; Germany: 3%; China: 3%; Other: 39%

Note: The demographic information is for all students who took the course Reproducible Research as part of the Johns Hopkins Data Science Specialization on Coursera. It is not necessarily specific to the students who participated in our experiment.

To practice the plotting techniques you have learned so far, you will be making a graphic that explores relationships between variables. You will be looking at a subset of a United States medical expenditures dataset with information on costs for different medical conditions and in different areas of the country.

You should do the following:

1. Make a plot that answers the question: what is the relationship between mean covered charges (Average.Covered.Charges) and mean total payments (Average.Total.Payments) in New York?
2. Make a plot (possibly multi-panel) that answers the question: how does the relationship between mean covered charges (Average.Covered.Charges) and mean total payments (Average.Total.Payments) vary by medical condition (DRG.Definition) and the state in which care was received (Provider.State)?

Use only the base graphics system to make your figure. Please submit to the peer assessment two PDF files, one for each of the two plots. You will be graded on whether you answered the questions and a number of features describing the clarity of the plots including axis labels, figure legends, figure captions, and plots. For guidelines on how to create production quality plots see Chapter 10 of the Elements of Data Analytic Style.

In the ggplot2 arm of the experiment, learners instead saw the sentence:

Use only the ggplot2 graphics system to make your figure.

Figure 1 shows sample submissions for both arms of the study and for both the simple and complex questions. We emphasize that as part of the assignment prompt, we tell students how their plots will be evaluated. We tell them that they will be evaluated on answering the questions, including axis labels, figure legends, figure captions, and on the type of plot created. We also point them to a resource from a previous course in the specialization that discusses exactly these points. That all students are aware of this before submitting means that students in both arms had opportunities to specifically work on these aspects of their plots. Thus differences between arms are not attributable to lack of awareness about assessment criteria but more so to students’ skill with the two plotting systems.

Figure 1: Sample student submissions. The top panels show example student submissions for the simple plot that are of low (a) and high (b) quality. The bottom panels show example submissions for the complex plot that are of low (c) and high (d) quality. The left hand plots in each section were made in base R, and the right hand plots were made in ggplot2. For privacy reasons, none of these figures were actually made by students. These figures are recreations that show general types of figures that were commonly made by students.

After completing the assignment, students were asked to review one or more assignments from their peers. Students reviewed assignments that used the same plotting system in which they completed their assignment. The review rubric is shown in Table 2. For the question, “Did they upload a plot?”, we provide three response choices for the simple plot to determine if the student uploaded a plot using the correct plotting system. Because the peer review rubric starts with assessments for the simple plot, we operated under the assumption that this compliance status also applied to the complex plot. For this reason, there are only two answer choices for this question for the complex plot.

The complex plot involved visualizing a relationship between two continuous variables in the 36 (states/medical conditions) subgroups. For the complex plots only, we manually annotated certain visual features of the uploaded plots. These annotations included 3 pieces of information:

1. Our own judgment of whether the plot was made with the correct plotting system. (Correct/Incorrect)
2. The number of panels present in the plot (corresponding to subgroups). Most common values were 1, 2, 6, 12, and 36.
3. An indication (yes/no) of whether the plot had some other visual grouping that was not a panel. For example, points within one panel could be colored by medical condition to satisfy this criterion.

Table 2: Peer review rubric

Plot aspect: Presence
  Plot 1: Did they upload a plot? Answer choices: No / Yes / Yes, and they made it with (base R)(ggplot2).
  Plot 2: Did they upload a plot? Answer choices: No / Yes
Plot aspect: Content
  Plot 1: Does the plot clearly show the relationship between mean covered charges (Average.Covered.Charges) and mean total payments (Average.Total.Payments) in New York? Answer choices: No / Yes
  Plot 2: Does the plot clearly show the relationship between mean covered charges (Average.Covered.Charges) and mean total payments (Average.Total.Payments) vary by medical condition (DRG.Definition) and the state in which care was received (Provider.State)? Answer choices: No / Yes
Plot aspect: General aesthetics
  Plots 1 and 2: Is the plot visually pleasing? Answer choices: No / Yes
Plot aspect: Clarity
  Plots 1 and 2: Can the plot be understood without a figure caption? Answer choices: No / Yes
Plot aspect: Annotation
  Plots 1 and 2: Are the legends and labels sufficient to explain what the plot is showing? Answer choices: No / Yes
Plot aspect: Display
  Plots 1 and 2: Are the plot text and labels large enough to read? Answer choices: No / Yes
Plot aspect: Annotation
  Plots 1 and 2: Do the plot text and labels use full words instead of abbreviations? Answer choices: No / Yes

A total of 1078 students participated in the trial. In the base R arm, 436 students submitted a plot, and in the ggplot2 arm, 642 students participated. This differential participation could be due to students feeling less comfortable with base R graphics than ggplot2, but in the absence of information on characteristics and course outcomes for these non-participants, we analyzed data for these 1078 students who completed the assignment and peer review. Among these students, there was a 100 percent response rate for all items on the review rubric. Each student was asked to review the submissions of at least one and possibly multiple other students. There were 1267 total peer reviews in the base R arm and 1440 in the ggplot2 arm. In the following results, we remove peer review responses for which the reviewer answered “No” to the “Did they upload a plot?” question.

Based on our manual annotation of the complex plots in both the base R and ggplot2 arms, we were able to compute the exact percentage of complex plot submissions that were made in the correct plotting system. In the base R arm, 433 of the 436 submitted plots could be annotated. (The remaining 3 plots were completely empty files.) Of the 433 annotated plots, 375 (86.6%) were made in the base R system. In the ggplot2 arm, 637 of the 642 submitted plots could be annotated. (The remaining 5 plots were completely empty files.) Of the 637 annotated plots, 636 (99.8%) were made in the ggplot2 system. The higher rate of compliance for the complex plot in the ggplot2 arm was expected given the more concise syntax of the ggplot2 system.

We did not annotate the simple plots, but we expect that rates of compliance would be similar to those for the complex plots. We are still able to estimate compliance rates for the simple plot through the peer review question “Did they upload a plot?” Student reviewers were able to choose from “No”, “Yes”, and “Yes, and they made it with base R (ggplot2).” In each arm, we estimate the compliance rate to be the fraction of the time the third response was chosen in all peer reviews. We estimate the compliance rate for the simple plot to be 92.9% in the base R arm and 97.3% in the ggplot2 arm.
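The complex-plot compliance percentages follow directly from the annotation counts reported above. The short Python sketch below reproduces that arithmetic; the function name `compliance_rate` and the dictionary layout are illustrative choices, not part of the original analysis code.

```python
def compliance_rate(n_correct_system, n_annotated):
    """Fraction of annotated plots made in the assigned plotting system."""
    return n_correct_system / n_annotated


# (plots made in the assigned system, plots that could be annotated),
# taken from the counts reported in the text.
annotated_counts = {
    "base R": (375, 433),
    "ggplot2": (636, 637),
}

for arm, (n_correct, n_annotated) in annotated_counts.items():
    rate = compliance_rate(n_correct, n_annotated)
    print(f"{arm}: {100 * rate:.1f}% compliant")
```

The simple-plot estimates (92.9% and 97.3%) cannot be recomputed this way because the underlying peer review counts for the third response choice are not reported in the text.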

Peer review outcomes for all students are displayed in Table 3. Review outcomes for visual characteristics were similar between the base R and ggplot2 systems. For most characteristics, the systems differed by only a few percentage points, but positive plot qualities were more likely to be seen in plots made in ggplot2. Further, positive qualities were more likely to be seen in the simple plot than in the complex plot for both systems.

In terms of general aesthetics, plots made in ggplot2 were more likely to be viewed as visually pleasing, and this difference was more pronounced in the simple plot than in the complex plot.

Ratings of overall clarity (“Does the plot clearly show the relationship?”) were higher for figures made in ggplot2 for both the simple and complex plots, and the difference between the systems was larger for the complex plot. We also assessed plot clarity through the two questions: “Can the plot be understood without a figure caption?” and “Are the legends and labels sufficient to explain what the plot is showing?”. For these two questions, the difference between the two systems is more pronounced for the complex plot. For the complex plot, submissions made in ggplot2 were more likely to be perceived as being sufficiently clear as a standalone figure.

For both the simple and complex plots, there is no indication of differences in tendencies to use full words versus abbreviations between the two plotting systems. This is sensible given that users create the text of plot annotations in nearly the same way in both systems. Interestingly, for the complex plot, graphics made in ggplot2 were less likely to have plot text and labels that were large enough to read. This may be due to the nature of text resizing when plotting with facets in ggplot2 and to a lack of instructional time spent on fine tuning such visual aspects within the course.

We also examined peer review outcomes on the subset of students that complied with their assigned plotting system, and we find that results are almost identical to the results discussed above for the full set of reviews (Table 4).

Table 3: Comparison of peer review responses in the base R and ggplot2 arms (all student submissions).

Plot     Prompt                             Response  Base R  ggplot2  ggplot2 - base R
simple   Clearly shows relationship?        Yes       86.2%   89.7%    3.5% (1%, 6.1%)*
simple   Is the plot visually pleasing?     Somewhat  23.1%   18.3%    -4.8% (-8%, -1.7%)*
simple   Is the plot visually pleasing?     Yes       73.7%   80.5%    6.9% (3.6%, 10.1%)*
simple   Understandable without caption?    Yes       90.9%   91%      0.1% (-2.1%, 2.4%)
simple   Legends and labels sufficient?     Yes       89.4%   90.8%    1.4% (-1%, 3.7%)
simple   Text and labels large enough?      Yes       97.8%   99%      1.2% (0.2%, 2.2%)*
simple   Use full words vs. abbreviations?  Yes       95.4%   96.1%    0.7% (-0.9%, 2.3%)
complex  Clearly shows relationships?       Yes       72.3%   83.6%    11.4% (8.1%, 14.6%)*
complex  Is the plot visually pleasing?     Somewhat  30%     30.8%    0.8% (-2.8%, 4.3%)
complex  Is the plot visually pleasing?     Yes       59.5%   60.6%    1% (-2.8%, 4.8%)
complex  Understandable without caption?    Yes       76.8%   81.5%    4.7% (1.5%, 7.9%)*
complex  Legends and labels sufficient?     Yes       77.9%   82.4%    4.5% (1.4%, 7.6%)*
complex  Text and labels large enough?      Yes       89.8%   86.3%    -3.5% (-6%, -1%)*
complex  Use full words vs. abbreviations?  Yes       83.6%   85.4%    1.8% (-1%, 4.6%)

Note: For each rubric item and response, the percentage of reviews indicating that response is shown. The last column gives the difference between the ggplot2 and base R arms and the 95% confidence interval for that difference.
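The text does not state how the intervals in the last column of Table 3 were computed, nor the exact number of reviews left after “No” responses were removed. As a rough Python sketch only, assuming a simple Wald interval for the difference of two independent proportions and using the total review counts reported earlier (1267 for base R, 1440 for ggplot2) as approximate denominators, the first row can be approximated as follows. The function name `wald_diff_ci` is illustrative.

```python
from math import sqrt


def wald_diff_ci(p_base, n_base, p_gg, n_gg, z=1.96):
    """Wald interval for p_gg - p_base, treating reviews as independent.

    Assumption: this is NOT necessarily the method used in the paper.
    """
    diff = p_gg - p_base
    se = sqrt(p_base * (1 - p_base) / n_base + p_gg * (1 - p_gg) / n_gg)
    return diff, diff - z * se, diff + z * se


# Simple plot, "Clearly shows relationship?": 86.2% (base R) vs 89.7% (ggplot2).
# Denominators are the total review counts, an approximation of the true ones.
diff, lower, upper = wald_diff_ci(0.862, 1267, 0.897, 1440)
print(f"{100 * diff:.1f}% ({100 * lower:.1f}%, {100 * upper:.1f}%)")
```

Under these assumptions the interval lands close to the published 3.5% (1%, 6.1%). Small discrepancies are expected, since the true denominators are somewhat smaller and multiple reviews of the same plot are not independent.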

Table 4: Comparison of peer review responses in the base R and ggplot2 arms (compliant submissions).

Plot     Prompt                             Response  Base R  ggplot2  ggplot2 - base R
simple   Clearly shows relationship?        Yes       85.7%   89.7%    4% (1.3%, 6.7%)*
simple   Is the plot visually pleasing?     Somewhat  23.1%   18.3%    -4.8% (-8.1%, -1.5%)*
simple   Is the plot visually pleasing?     Yes       73.6%   80.5%    6.9% (3.5%, 10.3%)*
simple   Understandable without caption?    Yes       90.6%   90.9%    0.3% (-2.1%, 2.6%)
simple   Legends and labels sufficient?     Yes       89.3%   90.7%    1.3% (-1.1%, 3.8%)
simple   Text and labels large enough?      Yes       97.7%   99%      1.3% (0.2%, 2.4%)*
simple   Use full words vs. abbreviations?  Yes       95.4%   96.2%    0.7% (-0.9%, 2.4%)
complex  Clearly shows relationships?       Yes       72.5%   83.7%    11.2% (7.8%, 14.5%)*
complex  Is the plot visually pleasing?     Somewhat  30.5%   31%      0.5% (-3.2%, 4.3%)
complex  Is the plot visually pleasing?     Yes       59.7%   60.5%    0.7% (-3.2%, 4.6%)
complex  Understandable without caption?    Yes       76.3%   81.4%    5.1% (1.8%, 8.4%)*
complex  Legends and labels sufficient?     Yes       77.6%   82.4%    4.8% (1.6%, 8%)*
complex  Text and labels large enough?      Yes       90.5%   86.3%    -4.2% (-6.8%, -1.6%)*
complex  Use full words vs. abbreviations?  Yes       83.8%   85.5%    1.7% (-1.2%, 4.6%)

Note: For each rubric item and response, the percentage of reviews indicating that response is shown. The last column gives the difference between the ggplot2 and base R arms and the 95% confidence interval for that difference.

Types of complex plots made

Through our manual annotation of the complex plots, we were able to categorize the different types of student plots. Examples of the different types of plots are shown in Figure 2. The prevalence of these plot types in the base R and ggplot2 arms is shown in Table 5. In these results, we count base R and ggplot2 plots according to our manual annotation of the plotting system used, not by the actual experimental arm in which the student was enrolled.

Before completing the annotations, we hypothesized that the percentage of students making the full 6-by-6 panel of 36 scatterplots would be much higher in the ggplot2 arm because of the ease of syntax within the facet_grid() function used for creating panels by categorical variables. This was indeed the case, as 54.3% of the ggplot2 submissions were 36 panel plots, compared to 31.9% for base R plots (Table 5). The 6-by-6 panel of scatterplots is of pedagogical interest because this figure allows students to fully explore the interaction between the two categorical variables (medical condition and state). Although a 36 panel plot is the most concise formulation in the ggplot2 system, a 12 or 6 panel scatterplot that colors points by the remaining categorical variable (the one not used to define the panel) is perhaps more effective for making visual comparisons (Figure 2b). Such a figure places trends to be compared on the same plot, which facilitates comparisons more than the 36 panel plot. We see that all plot types aside from the 36 panel plot were more likely to be made in the base R submissions (Table 5). This may suggest interesting differences between the systems in how students process or approach the syntax needed to create such figures. We also see that the 36 panel, the colored 12 panel, and colored 6 panel plots were the most likely to be rated as clearly showing the intended relationship, as measured by the overall clarity rubric item (Table 6). Ratings of clarity in these plot type subgroups are uniformly higher for plots made in ggplot2.

Table 5: Types of complex plots. For the 6 most common plot types, the percentage of submissions in which that type of plot was made is shown. The last column gives the difference between the ggplot2 and base R arms and the 95% confidence interval for that difference.

Plot type            Base R  ggplot2  ggplot2 - base R
36 panels            31.9%   54.3%    22.3% (16.1%, 28.5%)*
12 panels, no color  8.2%    3.2%     -5.1% (-8.3%, -1.8%)*
12 panels, color     1.9%    0.1%     -1.7% (-3.3%, -0.1%)*
6 panels, no color   4%      2%       -2% (-4.4%, 0.5%)
6 panels, color      42%     36.7%    -5.4% (-11.7%, 1%)
2 panels, color      5.6%    1.7%     -3.9% (-6.6%, -1.1%)*

Figure 2: Examples of types of complex plots. (a) A typical 36 panel plot. (b) A typical 12 panel plot with coloring has panels for the 6 states colored by medical condition and panels for the 6 medical conditions colored by state. A typical 6 panel plot would show one of these two rows. (c) A typical 2 panel plot has one panel colored by the 6 states and a second panel colored by the 6 medical conditions.

Table 6: Clarity of the different types of complex plots. For the 6 most common plot types, we show the percentage of submissions of each type that were judged to clearly show the intended relationship. Numbers in parentheses indicate sample sizes.

Plot type            Base R           ggplot2
36 panels            81.3% (279/343)  88.9% (735/827)
12 panels, no color  64.4% (56/87)    73.5% (25/34)
12 panels, color     81.8% (9/11)
6 panels, no color   46.2% (18/39)    60% (15/25)
6 panels, color      79.3% (391/493)  78.7% (470/597)
2 panels, color      45.6% (31/68)    56.4% (22/39)

We performed a randomized trial in a group of beginner learners to understand their perceptions of statistical graphics made in the base R and ggplot2 systems. We find that for displaying bivariate relationships at an aggregate level and across strata, students using the ggplot2 system create graphics with slightly higher aesthetic appeal and greater scientific clarity, where clarity is measured by the questions “Does the plot clearly show the relationship?”, “Can the plot be understood without a figure caption?”, and “Are the legends and labels sufficient to explain what the plot is showing?”. The clarity increase is greater when students attempted the more complex task of trying to depict a bivariate relationship across strata. We also observed that students were more likely to create plots in the assigned system when using ggplot2, suggesting a preference for ggplot2 due to factors we have not measured in our experiment.

We also find that students are more likely to explore more complex interactions between variables when using ggplot2 than base R. Specifically, we saw a higher rate of students creating a full grid of scatterplots to answer the complex question when using ggplot2 than when using base R. This is in line with the relatively straightforward syntax for creating faceted plots within the ggplot2 framework. For the same figure to be made in base R, the students would have to use two nested for-loops, which may be an idea with which they are less comfortable. Despite the increased programming skill required, the most common type of plot made in the base R arm was a 6 panel figure that did require the use of a single for-loop.

Our results indicate that ggplot2 may slightly outperform base R, particularly as students move to faceted plots across multiple conditions. This provides evidence in favor of those advocating for the use of ggplot2 in introductory classes. The relatively small differences also suggest that both plotting systems can be capably used by beginning users to display scientific information.

The scope of our plotting assignment is limited in terms of the breadth of statistical graphics that are used in practice, but it does cover the concepts of bivariate relationships and stratification, which are core ideas in data analysis in introductory statistics courses. The observed increase in reported scientific clarity for ggplot2 figures suggests that students have more favorable evaluations of these plots than plots made in base R. It is unclear whether this perceived increase in clarity is actually a result of more favorable aesthetic evaluations, but even if this is the case, students may be able to extract more scientific meaning from these plots simply because they are more comfortable with this plotting style.

The strongest effect we observed in the data was in the type of plots made for faceted analysis. Students were significantly more likely to make the full 36 panel faceted figure in the complex case using the ggplot2 system. While this figure does allow students to fully explore the interaction between the two categorical variables, it is not necessarily the most effective visualization because it requires the viewer to jump their eyes back and forth between panels to compare trends. A more effective plot collapses some of the panels by adding color, which was a more frequently made plot in the base R arm. These observations suggest student ease with the syntax used to create scatterplot grids in ggplot2. However, this ease may come at a price in terms of encouraging conscious efforts at the most effective visualizations. Although students seem to favor ggplot2 in terms of clarity and aesthetics, educators should be careful to continually emphasize the principles behind effective data visualization for communicating informative results.
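As a cross-check on the per-type clarity ratings in Table 6, the percentages can be recomputed from the counts given in parentheses. The Python sketch below does this for the five plot types that report counts for both systems; the “12 panels, color” row is left out because only one cell is reported. The names `table6_counts`, `pct`, and `clarity_gap` are illustrative, and the differences are simple point estimates with no interval attached.

```python
# (reviews rating the plot as clear, total reviews) for base R and ggplot2,
# transcribed from Table 6.
table6_counts = {
    "36 panels": ((279, 343), (735, 827)),
    "12 panels, no color": ((56, 87), (25, 34)),
    "6 panels, no color": ((18, 39), (15, 25)),
    "6 panels, color": ((391, 493), (470, 597)),
    "2 panels, color": ((31, 68), (22, 39)),
}


def pct(cell):
    """Percentage of reviews that judged the plot type to be clear."""
    n_clear, n_total = cell
    return 100 * n_clear / n_total


def clarity_gap(base_r_cell, ggplot2_cell):
    """ggplot2 minus base R, in percentage points."""
    return pct(ggplot2_cell) - pct(base_r_cell)


for plot_type, (base_r_cell, ggplot2_cell) in table6_counts.items():
    print(
        f"{plot_type}: base R {pct(base_r_cell):.1f}%, "
        f"ggplot2 {pct(ggplot2_cell):.1f}%, "
        f"gap {clarity_gap(base_r_cell, ggplot2_cell):+.1f} points"
    )
```

The recomputed percentages agree with the table. Four of the five gaps favor ggplot2; for the colored 6 panel plot the two systems are within one percentage point of each other.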

Data and code availability:

Code and data to reproduce the analyses here are available at https://github.com/lmyint/ggplot_base.

Funding:

This work was supported by National Institutes of Health grant R01GM115440.

Ethics:

We received approval to analyze this data from the Johns Hopkins Bloomberg School of Public Health: IRB number 00005988.

References

Çetinkaya-Rundel, M. and Rundel, C. (2018). Infrastructure and tools for teaching computing throughout the statistical curriculum. Am. Stat., 72(1):58–65.

Groemping, U. (2018). CRAN task view: Design of experiments (DoE) & analysis of experimental data. https://CRAN.R-project.org/view=ExperimentalDesign. Accessed: 2018-2-8.

Grolemund, G. and Wickham, H. (2017). R for data science. https://r4ds.had.co.nz/. Accessed: 2019-2-8.

Ismay, C. and Kim, A. Y. (2017). An introduction to statistical and data sciences via R. Accessed: 2019-02-08.

Kosslyn, S. M. (2006). Graph design for the eye and mind. OUP USA.

Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, Articles, 28(5):1–26.

Leek, J. (2016). Why I don’t use ggplot2. Accessed: 2019-02-08.

Robinson, D. (2014). Don’t teach built-in plotting to beginners (teach ggplot2). Accessed: 2019-02-08.

Robinson, D. (2016). Why I use ggplot2. Accessed: 2019-02-08.

Rosenholtz, R., Li, Y., and Nakano, L. (2007). Measuring visual clutter. Journal of vision, 7(2):17–17.

Ross, Z., Wickham, H., and Robinson, D. (2017). Declutter your R workflow with tidy tools. PeerJ PrePrints; San Diego.

Smallman, H. S. and John, M. S. (2005). Naïve realism: Misplaced faith in realistic displays. Ergonomics in design, 13(3):6–13.

Stander, J. and Dalla Valle, L. (2017). On enthusing students about big data and social media visualization and analysis using R, RStudio, and RMarkdown. J. Stat. Educ., 25(2):60–67.

Tversky, B., Morrison, J. B., and Betrancourt, M. (2002). Animation: can it facilitate? International journal of human-computer studies, 57(4):247–262.

Wickens, C. D. and Carswell, C. M. (1995). The proximity compatibility principle: its psychological foundation and relevance to display design. Human factors, 37(3):473–494.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis.