Teachers' perception of Jupyter and R Shiny as digital tools for open education and science

Abstract

Over the last ten years, advances in open-source digital technology, used especially in data science, have made it very easy to obtain, store, process, analyze or share data in almost every human activity. Data science tools not only bring transparency, accessibility and reproducibility to open science, but also benefit open education as learning tools that improve the effectiveness of instruction. Together with a pedagogical introduction to and review of Jupyter as an interactive multimedia learning tool, we present our three-year research, set in the framework of a complex mixed-methods approach, which examines physics teachers' perception of Jupyter technology in three groups: Ph.D. candidates in physics education research (PER) (N = 9), pre-service physics teachers (N = 33) and in-service physics teachers (N = 40). Although open-source Jupyter notebooks are as natural and easy to use as email or the web, the results suggest that in-service teachers are not prepared for Jupyter technology and open analysis. They do, however, positively accept open education data presented via another open-source data science tool, the R Shiny interactive web application, as an important form of immediate feedback and of learning about the quality of their instruction. At the same time, the results of our instruction within the flipped learning framework indicate that beginning PER researchers and pre-service physics teachers can master the key digital skills needed to work with Jupyter technology and appreciate its large impact on their learning, data and statistical literacy, and professional development. All results support the ongoing worldwide effort to implement Jupyter in traditional education as a promising free, open-source, interactive learning tool that fosters the learning process, especially for the upcoming young generation.


Hanč et al.

RESEARCH

Teachers’ perception of Jupyter and R Shiny as digital tools for open education and science

Jozef Hanč, Peter Štrauch, Eva Paňková and Martina Hančová

* Correspondence: [email protected]
Institute of Physics, Faculty of Science, Pavol Jozef Šafárik University, Košice, Slovakia
Full list of author information is available at the end of the article
† Equal contributor

Keywords: data science tools; learning tools; interactive multimedia; data visualization; data analysis; user experience questionnaire; flipped learning

Introduction

During the last decade, new digital technologies such as mobile devices, cloud infrastructure, open data, artificial intelligence, and decentralized and social networks (Downes, 2019) have brought us into a digital and data-intensive age. Ninety percent of all data has been created in the last two years (Boaler, 2020; Marr, 2018). Data touches all aspects of our lives. The world economy, our jobs, our health, our environment and our roles as citizens increasingly depend on the knowledge, skills and technology required to work with, understand, and effectively use data (Boisvert et al, 2016).

In April 2020 GitHub (GitHub, Inc., 2020; Warner, 2018), the world's largest web cloud platform for storing, social coding and collaborating on any open code or digital content, reached a further milestone: 50 million developers working on over 100 million repositories, 3,000 times more than after its starting year 2008.

Simultaneously, in recent years open-source software and data science tools, namely the programming languages Python and R together with environments for working with them, especially Jupyter and RStudio, have conquered the data science world (kaggle, 2020), providing everybody with free, open, revolutionary and very accessible ways to gather, store, process, analyze, present or share data in almost every human activity.

As for science, in the near future the European Open Science Cloud (EOSC, eosc-portal.eu), officially launched at the end of 2018, will offer millions of European researchers, professionals and university students in science, technology, the humanities and social sciences a virtual environment with open and seamless services for storage, management, analysis and re-use of open research data, interoperable across all scientific domains and EU member states. By federating, the EOSC will connect existing and emerging scientific data infrastructures, currently dispersed and isolated across disciplines and the borders of EU states. One of the key and easily accessible open digital research environments in EOSC (see e.g. the EOSC infrastructure project OpenDreamKit, 2015-2019, opendreamkit.org) should become the Jupyter Notebook technology mentioned above. The highest ambition of the EOSC is to change the way we do science: to open it.

As for education, in 2015 a worldwide Call for Action to promote data literacy was launched in the US (Boisvert et al, 2016; EDC Oceans of Data Institute, 2015). Its signatories call for “a revolution in education, placing data literacy at its core, integrated throughout K-16 education nationwide and around the world. By enabling learners to use data more effectively, we prepare them to make better decisions and to lead more secure, better-informed and productive lives.”

In February 2020, Jo Boaler, a professor of mathematics education at Stanford, invited a group of fifty mathematicians, data scientists, teachers and education policy leaders to start a movement, a YouCubed initiative, to modernize the K-12 math curriculum and prepare high school students for the data age (Boaler, 2020; Spector, 2020).

Jupyter in science and education

One of the main current data science tools for open science is the Jupyter technology of interactive multimedia notebooks, officially called Jupyter Notebooks for short. From the viewpoint of users, the technology is as natural and easy as email or the web. It was released in 2014 within Project Jupyter (jupyter.org, Kluyver et al, 2016; Project Jupyter et al, 2018) as a free, open alternative to the well-known Mathematica Notebooks from the commercial software Wolfram Mathematica (wolfram.com). As for the technology platform, Project Jupyter evolved from IPython, a personal side project of Fernando Pérez, a co-founder of Project Jupyter and former nuclear physicist.

Jupyter is a free, open-source interactive web computing environment, accessible through any modern web browser, that enables users to use, modify or create interactive multimedia web documents which mix live code (over 100 computer languages, provided by so-called kernels, with a focus on Python), interactive computations, equations, simulations, plots, images, narrative texts, annotations, audio or video.

Over the last 5 years, Jupyter Notebooks have become the most widely used environment for performing scientific calculations, open analysis, data processing and scientific reporting (Frederickson, 2019). The data and analysis of the recent Nobel prize-winning detection of gravitational waves (2015), predicted by Einstein in 1915, were released as a Jupyter Notebook. At the same time we are witnessing an explosion of scientific and technical articles dealing with Jupyter's application in science, technology and education (Fig. 1).

Figure 1. Annual numbers of scientific publications dealing with Jupyter in four scholarly databases during 2015-2019. Our review in the form of bar plots was created using Scientific Python in a Jupyter notebook (data analysis library pandas).

As we can see from hundreds of scholarly papers connected to education (Fig. 1), e.g. Weiss (2017) in chemistry, Koehler and Kim (2018) in mathematics, Odden et al (2019) in physics, Wright et al (2020) in biology and Cardoso et al (2019) in engineering, Jupyter is widely applied in open STEM and STEAM education (Khine and Areepattamannil, 2019), supporting data literacy and also including programming, statistics, data science, cognitive science, computer science, machine learning, digital humanities, scientific computation or robotics (Barba et al, 2019).

Today, Jupyter notebooks, as living interactive “storytelling” documents, are the technology behind many innovative educational programs and have also become a platform of choice for tutorials, workshops, online lessons, and even books. One of the most thoroughly elaborated book examples is the open GitHub handbook Teaching and Learning with Jupyter (Barba et al, 2019), covering topics such as why and how to use Jupyter in education, pedagogical instruction designs and case studies drawn from the authors' real experience, and technical details of implementing Jupyter in practice. Another example is Coursera (coursera.org), a worldwide MOOC platform offering many free courses and guided projects that use Jupyter as a key learning tool.

The revolutionary significance of Jupyter for open science and education has also become the subject of philosophical reflections, e.g. by the writer and programmer J. Somers (Somers, 2018), the Nobel prize laureate P. Romer (Romer, 2018) and the Canadian philosopher and open education visionary S. Downes (Downes, 2019).

Jupyter as a tool of interactive teaching methods

Finally in this introductory section, we would like to point out three important but less well-known Jupyter features in connection with three modern interactive instruction approaches, which are “designed to promote conceptual understanding through interactive engagement of students in heads-on (always) and hands-on (usually) activities which yield immediate feedback through discussion with peers and/or instructors” (Fraser et al, 2014; Hake, 1998; Paňková et al, 2016; Redish, 2014).

Inquiry based science education.

Jupyter allows educators to narrate a “conversation” between the student, concepts and data with goals such as building a model, carrying out a virtual experiment (simulation) or visualizing any data or process, all with or without programming. Pedagogically, such activities can be designed in the sense of inquiry-based science education, IBSE (Constantinou et al, 2018; Heering et al, 2012).

To be more specific, Jupyter has so-called widget functionality, which gives the notebook user access to sliders, toggle buttons or text boxes. Such elements can hide the code and allow us to create a notebook app or simulation whose primary goal is to explore or visualize a model, computation or data. Another very useful feature of a Jupyter Notebook as a multimedia web document is its embedding functionality, which makes it possible to embed any available digital content from the Web (via the IPython commands IFrame or HTML). Regarding IBSE virtual experimentation and modeling, the authors of this paper embed their own or other available interactive cloud GeoGebra simulations (see our example in Fig. 2) [1] and mix them with Jupyter calculations in their university math or physics subjects (Bu and Schoen, 2011; Hall and Lingefjärd, 2016; Hanč et al, 2011).
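The embedding feature described above can be illustrated with a minimal sketch. It uses only the Python standard library to build the `<iframe>` markup; inside a notebook the resulting string could be passed to `IPython.display.HTML`, or `IPython.display.IFrame(src, width, height)` could be used directly. The helper name `iframe_html` and the GeoGebra address are hypothetical placeholders, not taken from the authors' notebooks.

```python
from html import escape


def iframe_html(src, width=800, height=600):
    """Return HTML markup for an <iframe> that embeds the page at `src`."""
    # Escape the URL so that characters such as & or " cannot break the attribute.
    safe_src = escape(src, quote=True)
    return (
        f'<iframe src="{safe_src}" width="{width}" height="{height}" '
        'frameborder="0" allowfullscreen></iframe>'
    )


# Hypothetical address of a cloud GeoGebra simulation (placeholder only).
markup = iframe_html("https://www.geogebra.org/material/iframe/id/EXAMPLE", 640, 480)

# In a Jupyter notebook one would typically render the markup with:
#   from IPython.display import HTML
#   HTML(markup)
print(markup)
```

Keeping the markup generation separate from the display call means the same helper can be reused for any embeddable web content, not only GeoGebra.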

Peer instruction and Question driven instruction.

In physics education, Peer instruction, PI (Crouch and Mazur, 2001; Fraser et al, 2014; Mazur and Watkins, 2009) and Question driven instruction, QDI (Beatty and Gerace, 2009; Beatty et al, 2006) are two well-known and very similar interactive methods [2] which promote active student learning based on constructivist ideas, formative assessment and cooperative learning. The key technological element of PI or QDI is e-voting, which can be realized effectively with virtual clickers (e.g. via the cloud service polleverywhere.com) embedded in a Jupyter notebook, or very easily within Jupyter notebooks themselves using the Activity extension (Barba et al, 2019; Blank and Silvester, 2020).

[1] GeoGebra (geogebra.org, Hohenwarter et al, 2018) is the leading dynamic mathematics software for STEM education, allowing users to create their own interactive simulations without programming knowledge. It now offers over 1 million free activities, simulations, exercises, lessons and games for supporting STEM education and innovations worldwide.

[2] Both methods were established at the beginning of the 1990s during the testing of clickers in real school conditions at US universities, and they are based on repeated peer-instruction or question cycles: (1) posing a question (problem) by the instructor; (2) small-group work of students on solutions, i.e. peer instruction; (3) collecting answers of students by e-voting; (4) displaying the answers without revealing the correct answer; (5) class-wide discussion; (6) closure (e.g. summarizing the key points or giving an explanation).

Figure 2. A Jupyter notebook example with an embedded interactive 3D GeoGebra simulation and SageMath code, which can be applied as part of virtual experimentation within IBSE.

Flipped learning.

Interactive methods like IBSE, PI and QDI can be integral parts of group space activities [3] in flipped learning (Bergmann and Sams, 2012; Nouri, 2016; Talbert and Bergmann, 2017), which is, according to Talbert, “a pedagogical framework in which the first contact with new concepts moves from group learning to individual learning space in the form of structured activity, and the resulting group space is transformed into a dynamic, interactive learning environment where the educator guides students as they apply concepts and engage creatively in subject matter”. As for screencast technology for creating the teaching materials used in the first-contact, pre-class work of students (their home preparation), Jupyter offers the Graffiti extension (Downes, 2019; Kessler, 2020). This extension makes it possible to create interactive screencasts or “live videos” inside Jupyter Notebooks that students can watch and pause at any time. During any pause, a student can interactively “play” with the instructor's recorded work and combine it with their own work and ideas.

[3] All these methods are regularly used by the authors during group space activities in flipped math and physics at secondary school and college levels (Hanč, 2013; Paňková and Hanč, 2019b; Paňková et al, 2016).
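Steps (3) and (4) of the peer-instruction cycle listed in footnote [2], collecting votes and displaying their distribution without revealing the correct answer, can be sketched in a few lines of standard-library Python. This is neither the Activity extension nor any clicker service mentioned above; the function names `tally_votes` and `show_distribution` and the sample votes are hypothetical.

```python
from collections import Counter


def tally_votes(votes, choices="ABCD"):
    """Count votes per choice and return (choice, count, percent) rows."""
    counts = Counter(v.strip().upper() for v in votes)
    # Votes outside the offered choices are ignored in the total.
    total = sum(counts[c] for c in choices)
    rows = []
    for c in choices:
        percent = 100.0 * counts[c] / total if total else 0.0
        rows.append((c, counts[c], percent))
    return rows


def show_distribution(rows, width=30):
    """Print a text bar chart; the correct answer is deliberately not marked."""
    for choice, count, percent in rows:
        bar = "#" * round(width * percent / 100)
        print(f"{choice}: {bar:<{width}} {count:2d} ({percent:.0f}%)")


# Hypothetical votes collected from a class of ten students.
votes = ["a", "B", "b", "C", "b", "A", "b", "d", "B", "c"]
rows = tally_votes(votes)
show_distribution(rows)
```

Running the two functions again after the class-wide discussion gives a second distribution that can be compared with the first one.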

Research purpose and design

Typically, many of the scholarly papers found (Fig. 1) are informal case studies, generally based on the best teaching practices and wisdom of their authors. According to the distinguished cognitive psychologist Richard Mayer (Mayer, 2008), such an applied approach to instruction usually leads to a set of empirically based design principles with limited applicability, since they are only weakly connected, or not connected at all, to a cognitive or psychological theory of learning that would provide solid grounds for how or why they work.

From the perspective of cognitive learning theory (Mayer, 2014a), a Jupyter notebook as a learning tool belongs to (online) multimedia, which combines words and pictures in static and dynamic form (Mayer, 2019). At the same time, Jupyter notebooks are key elements of computer-based or online learning (Mayer, 2019), in which instruction is delivered on a digital device with the intention of supporting learning. This means that any successful and generally applicable use of Jupyter must respect the cognitive theory of multimedia and online learning, whose principles and processes have been discovered during the last thirty years and can be found in Mayer (2014b, 2019) [4]. Our basic review of the research literature in Fig. 1 demonstrates that to date only a few scholarly papers refer to the mentioned cognitive principles, or try to apply them or connect them with their educational research on Jupyter.

Other caveats of the considered empirical research are connected to the fact that the current cognitive theories of multimedia and online learning would themselves benefit from a stronger incorporation of affect, motivation and metacognition, where research is in its infancy (Mayer, 2019).

[4] It seems that The Cambridge Handbook of Multimedia Learning (Mayer, 2014a) is the only comprehensive research-based reference monograph on multimedia and online learning.

Research purpose.

Therefore the purpose of our research was to explore teachers' perception of Jupyter as a digital technology during and after their higher education, with a focus on affective aspects of learning. We examined teachers' perceptions from two viewpoints. The first viewpoint concentrated on the teachers' personal experience and reflections on how the use of Jupyter contributed to their understanding of the learning content, what progress they made themselves and how transferable their own application of the technology was to other contexts. The second viewpoint focused on the teachers' overall, comprehensive impression and satisfaction from the user (“customer”) experience, representing affective aspects: their own emotions and attitudes when experiencing Jupyter. In particular, we were interested in perceptions of Jupyter's attractiveness, the difficulty of getting familiar with it, efficiency when working with it, motivation to use it and its ability to capture the user's attention.

Research design.

Our three-year research was set in the framework of a complex mixed-methods approach with a convergent design, in which we combined the results of three parallel research studies providing all our available complementary sources of quantitative and qualitative data, in order to best understand the research problem (Creswell and Clark, 2017; Johnson and Christensen, 2016). We describe the studies, conducted from June 2017 to June 2020, in the next three sections, where we also address in detail for each study its background, context, particular design, participants, procedure, methods and results with the corresponding discussion.

Data collection and analysis tools.

Diagnostic, data collection and analysis methods are described within the individual studies. As for technology, all statistical analyses in all three studies, together with data visualization, manipulation and processing, were carried out in open-source data science tools: Jupyter Notebooks with the kernels (1) scientific Python (SciPy; Jones et al, 2001; Oliphant, 2007) and (2) R (R Development Core Team, 2020), using the freely available Python libraries numpy (Walt et al, 2011), pandas (McKinney, 2010) and matplotlib (Hunter, 2007), and the R libraries RcmdrMisc (Fox et al, 2020b), sjstats (Lüdecke, 2020), dplyr, readxl, scales (Wickham et al, 2019, 2020a,b), cluster (Maechler et al, 2019), factoextra (Kassambara and Mundt, 2020), hmisc (Harrell Jr and others, 2020), psychometric (Fletcher, 2010), repr (Angerer et al, 2020) and shiny (Chang et al, 2019).

Our open data analysis in the form of Jupyter notebooks, together with all the tools and data files used, is stored and freely available in our GitHub repository devoted to this paper (Hanč et al, 2020b), within our GitHub research project Jupyter in Physics Education and Research (Hanč et al, 2020a).

Study I: Ph.D. candidates in physics education research

Background and context

In 2017, during one of the last preparatory Ph.D. seminars before the thesis defense, we observed from their presentations that only one of our six Ph.D. candidates in Physics Education Research (PER) had applied the data science tool R in their own data analysis. All of them were our first students to complete the course, taught by JH in collaboration with MH from a statistical department at our university, whose main goal was the use of advanced statistical methods in R Commander (Fox, 2016; Fox et al, 2020a), a simple point-and-click graphical interface for R.

The finding was really disturbing, since the course was pedagogically designed within flipped learning, applying interactive and active-learning methods. It became the starting motive of our research: to investigate the main reasons for the failure, and why and how to change it.

Participants

The study involved PER graduate students (N = 9) with a master's degree in teaching physics combined with another science subject, in this case math or biology. At our department of physics education the students were enrolled in the Ph.D. course Statistical methods in Educational Research, where they experienced R with R Commander and Excel (up to 2017) or, later, R as a kernel in Jupyter. The left part of Table 1 in the results section shows the sample characteristics.

Design and methods

The qualitative case study had two sequential phases. In the first phase (May-June 2017), we retrospectively analyzed the use of statistical methods and digital technologies applied by 6 participants in their Ph.D. theses or projects (Table 1). Our research methods included a content analysis of the Ph.D. theses (or Ph.D. projects if the student had not yet completed the thesis) and a short qualitative interview containing two questions on the key obstacles in using R with R Commander and the reasons for choosing an alternative (for the exact question wording see the Appendix).

In the second phase (June 2017-June 2020), based on the results of the first one, we decided on an intervention. In the same year (2017) we chose interactive Jupyter notebooks as the main tool for teaching and learning advanced statistical methods in PER. Moreover, the three remaining participants, after completing the revised course, reported and discussed their progress in applying the chosen methods and tools more regularly. We interacted with them and observed them longitudinally throughout the period 2017-2020.

Results and discussion

The right part of Table 1 presents a summary of our content analysis, focusing on the application of advanced statistical methods beyond basic descriptive and inferential univariate statistics (column stats) and of new digital technologies (cloud, special tools) beyond Excel spreadsheets. Excel remained the main analysis tool and, if necessary, students used special Excel add-ons: a trial version of XLSTAT (Addinsoft, 2020) and RSRPS (Zaiontz, 2019). Thanks to collaboration in her Ph.D. research, one student had access, with professional statistical help, to SPSS (IBM Corp., 2015), a very powerful but also expensive commercial statistical package [5]. One student also processed his pilot research data in the online web service Data Explorer at PhysPort (McKagan et al, 2019, physport.org), a website of the AAPT intended to empower physics faculty to use effective research-based physics teaching.

Table 1. Using digital tools for data processing and analysis in Ph.D. projects and theses (N = 9). The columns phase to date give the sample characteristics; the columns stats to special sw give the content analysis results.

phase  nick  g  phd      date  stats  cloud?  spreadsheets   special sw
I      PhD1  F  thesis   2017  basic  local   Excel          none
I      PhD2  F  thesis   2017  basic  local   Excel          R
I      PhD3  F  thesis   2017  adv.   local   Excel          SPSS
I      PhD4  F  thesis   2017  basic  local   Excel          RSRP
I      PhD8  M  project  2017  basic  cloud   Excel          PhysPort
I      PhD5  F  thesis   2019  adv.   local   Excel          XLSTAT
II     PhD9  M  project  2019  basic  local   Excel          none
II     PhD6  F  thesis   2020  adv.   cloud   Excel, Google  R, Jupyter
II     PhD7  M  thesis   2020  adv.   cloud   Excel, Google  R, PhysPort, Jupyter

Regarding the qualitative interview, here is a summary of the main results representing participants' responses, with typical examples:

• one-time use: “Since I met R and R Commander only in this one-semester long course, and after one year of not using, I simply forgot it.”
• installation problems: “I started to apply R but after reinstallation, I was not able to run it.”
• difficult reproducibility: “During the course, I understood all the important things and everything looked so simple. But when it came time to analyze my data, I could not fit it to my data.”
• steep learning curve: “I was trying really hard to use some R scripts from Web to apply more advanced methods, but it was difficult to understand.”
• easier way: “Excel is for me still more simple and natural than R Commander.”; “I found a great online tool [PhysPort] for my data and during real analysis I saw that I really did not need R for my Ph.D.”
• ignoring advanced methods: “My Ph.D. research did not need methods shown in the course.”

[5] In the first decade of this century, SPSS was one of the most popular statistical packages in psychology, social sciences, market research, business and government (Salkind, 2010, e. SPSS). Today, it is still very widespread, providing relatively easy access to modern and advanced statistical methods.

After our intervention the situation changed. Two of the three Ph.D. candidates (the paper's coauthors PS and EP) fully relied on Jupyter and R for collecting, processing and analyzing data [6]. In the previous Ph.D. works, students used methods and tools for analysis in an almost purely pragmatic way, only reproducing existing procedures. Now, thanks to a better and “more durable” understanding, which naturally resulted from working with Jupyter and its interactive and “storytelling” features, both students used advanced data visualization or analysis, even in new creative ways. One such example, from the Ph.D. thesis of EP (Paňková and Hanč, 2019a), is illustrated in Fig. 3. We explain the corresponding details in the following discussion of the study results. The second example, a weighted benchmark analysis and plots of our data collected by the UEQ (User Experience Questionnaire), is presented in our parallel studies (Study III comes from the Ph.D. thesis of PS).

[6] The third one is planning to use data science tools in his research, but is currently on a study break for personal reasons.

The second phase of our qualitative longitudinal study brought us further valuable information. Our three-year longitudinal research outlined a process of adapting Jupyter technology suitable for a teacher or a researcher in education. Using data science tools, we mapped the adaptation of Jupyter technology in our research and teaching practice via a longitudinal, time-series representation (Brockwell and Davis, 2016), graphically summarized and visualized in Fig. 4. We analyzed all Jupyter notebooks created by all authors of the paper by type, kernel and date using SciPy with pandas, a Python data analysis package originally developed in the context of financial time series modeling, with an extensive set of tools for working with time series data (VanderPlas, 2016).

The pandas stacked area plots in Fig. 4 display the evolution of the daily average number of notebooks created by the authors (the overwhelming majority of Jupyter notebooks with the R kernel were created by PS, those with the SageMath and Python kernels by JH) along a timeline. The timeline shows a list of important events: signing up for an online service or software installations. It runs from the moment we started to use Jupyter, June 2017, when we signed up for a CoCalc account (cocalc.com), to the submission of this paper (June 2020).

To avoid the problems with local installations mentioned by participants in the first phase, the best choice seems to be to start with Jupyter using one of the current cloud services with zero setup. We chose CoCalc (event: CoCalc account in Fig. 4) [7]. After a few months, we subscribed to a basic paid plan, since the free trial running on free servers can sometimes be very slow (event: CoCalc subscription).

[7] CoCalc is a virtual online cloud workspace for calculations, research, collaboration and authoring documents, combining the best free mathematical software and document editors. The service allows e.g. running Jupyter technology with many kernels (SageMath, Python, R, Julia etc.) and with real-time collaboration and communication tools directly in a web browser with zero setup. However, there are also other very similar and easy cloud ways to run Jupyter without any software installation (Data School, 2019), e.g. Binder (binder.org) from Project Jupyter (Project Jupyter et al, 2018), Kaggle kernels (kaggle.com/kernels), Microsoft Azure notebooks (notebooks.azure.com) or Google Colab (colab.research.google.com).
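The notebook inventory behind Fig. 4 was built by the authors with pandas. As a rough standard-library sketch of the same idea, assuming that each notebook is a JSON file whose `metadata.kernelspec.name` field records its kernel (as in the nbformat 4 layout) and taking the file modification time as a stand-in for the creation date, notebooks can be counted per month and kernel as follows. The function names are hypothetical and are not taken from the authors' repository.

```python
import json
from collections import Counter
from datetime import datetime
from pathlib import Path


def notebook_kernel(path):
    """Read the kernel name stored in a notebook's JSON metadata."""
    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)
    return nb.get("metadata", {}).get("kernelspec", {}).get("name", "unknown")


def count_by_month_and_kernel(root):
    """Count *.ipynb files under `root` per (YYYY-MM, kernel) pair."""
    counts = Counter()
    for path in Path(root).rglob("*.ipynb"):
        # Modification time is only a proxy for the date a notebook was created.
        month = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m")
        counts[(month, notebook_kernel(path))] += 1
    return counts
```

Calling `count_by_month_and_kernel` on a folder of notebooks returns a `Counter` whose sorted items form a small month-by-kernel table, which could then be passed to a plotting library to draw stacked area plots.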

Figure 3. An example of an advanced multivariate analysis method using R in Jupyter: hierarchical cluster analysis AGNES based on the concentration factor (Bao and Redish, 2001) and score, computed and visualized using R software (R packages cluster, factoextra) and supplemented by a continuous density estimate for the distribution of students' mental images behind quantum physics concepts (R package likert) in the cluster.
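The concentration factor of Bao and Redish (2001) named in the caption measures how strongly the answers to one question pile up on a few choices. The authors computed it in R; the following is only a minimal Python sketch of the published formula, with a hypothetical function name and made-up response counts. The score S used alongside it in the figure is not reproduced here, since its definition is not given in this section.

```python
from math import sqrt


def concentration_factor(counts):
    """Concentration factor C of Bao and Redish (2001) for one question.

    `counts[i]` is the number of students who selected choice i.
    C = sqrt(m)/(sqrt(m) - 1) * (sqrt(sum n_i^2)/N - 1/sqrt(m)),
    with m choices and N students; C is 0 for evenly spread answers
    and 1 when every student picks the same choice.
    """
    m = len(counts)
    n_total = sum(counts)
    if m < 2 or n_total == 0:
        raise ValueError("need at least two choices and one response")
    norm = sqrt(sum(n * n for n in counts)) / n_total
    return sqrt(m) / (sqrt(m) - 1) * (norm - 1 / sqrt(m))


# Hypothetical response counts for a five-choice item answered by 20 students.
print(round(concentration_factor([20, 0, 0, 0, 0]), 3))  # all on one choice
print(round(concentration_factor([4, 4, 4, 4, 4]), 3))   # evenly spread
print(round(concentration_factor([12, 5, 2, 1, 0]), 3))  # mostly one choice
```

Each question then contributes one value of C, which can serve as one coordinate of the points that a cluster analysis groups.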

With improving skills and better understanding we dared to try local installations (Win-10 events). Having become more familiar with the technology, we found that gaining further benefits, such as more comfortable installation and work or better open-source packages, means a transition to an open-source Linux operating system, in our case Fedora (event: Linux). In the end, our experience showed that leaving Windows is not an option for us. After two years we found a solution that perfectly fits the requirements of effective research and teaching work: a Linux installation in a virtual machine (via Oracle VM VirtualBox), which allows us to run Linux on Windows without shutting down Windows and to combine the best features of the Windows and Linux worlds (VM events: locally installing different Jupyter kernels such as Sage, R, Julia, Octave, Fricas, PARI/GP).

Figure 4. An example of advanced data visualization: timeline and time-series plots, so-called stacked area plots, created in Scientific Python (data analysis package pandas), mapping the adaptation of Jupyter technology by the authors of the paper during the last three years (2017–2020).

Discussion.

Results of our small qualitative case study suggest that a one-semesterlong course in using data science tools for Ph.D. PER candidates in the context ofadvanced statistical methods is not enough to start using them as main tools intheir own research and analysis. Possible reasons indicated by the results include ashort time (before our intervention for our students it was the ﬁrst and only coursein this ﬁeld of expertise), steep learning curve and diﬃcult reproducibility if we usethe point-and-click environment.Students typically return to Excel as a natural and rescue option. This appears inaccordance with the general notion that Excel is probably still the most widespreaddigital tool for collecting, storing and processing educational data (Heiberger andNeuwirth, 2009; Wilcox, 2017). However, young researchers use Excel, frequentlywith special add-ons allowing readily apply many advanced and modern methods,despite or unaware of the fact that overwhelming majority of their spreadsheetsusually appear poorly or very hardly reproducible, with always present errors as it isin any point-and-click graphical interface depending on human factor(Baumer et al,2017; Panko, 1998) [8] . This way of working with research data has been seen in widercircumstances and still belongs to not-negligible problems of current education-research publications (van der Zee and Reich, 2018). As for the shortness of learningtime, our conclusion agrees with McKiernan (2017) who also pointed out that onesemester for mastering skills with data tools is not enough.Only after our intervention consisting in the exchange of learning tools, from thepoint-and-click environment (R Commander) to interactive multimedia documents(Jupyter Notebooks), and the following continual advising, watching progress ofour participants, we saw the full acceptance of data science tools which led alsoto creativity of participants. 
This result is connected with the findings in Odden and Caballero (2019) and Odden et al (2019), where writing computational essays using Jupyter supports creative thinking.

As for the significance of participants' results, we can comment on the case in Fig. 3. Let us briefly explain the green cluster with higher concentration C (≥ …) and higher score S (≥ …). A higher C means that students' answers are concentrated on fewer question choices (in the density visualization, it means peaks). A higher score S says that the concentration is on the disagreement side of the statement. Therefore, questions in the green cluster have an approximately similar, higher concentration of students' answers on one choice representing disagreement.

In PER, cluster analysis is still rare and perceived as difficult and "magic". EP was able to implement the hierarchical cluster analysis, generally described in EMC Education Services (2015) and Kassambara (2017), in her Ph.D. research only after the Jupyter adaptation. We believe that the main reason why only several PER groups have applied and reported this method up to now (e.g. Ireson (1999, SPSS), Ding and Beichner (2009, SAS), Battaglia et al (2019, own C code), Springuel et al (2019, Python code)) is that all these publications are missing key components of open science (van der Zee and Reich, 2018): the analysis is done in expensive commercial software, or there are no publicly available data, or there are no details of the data analysis in a readable, easily reproducible code.

[8] We must say that R Commander has a built-in Markdown system for writing easily reproducible and transparent reports. However, under the influence of the guidebook (Heiberger and Neuwirth, 2009), we did not pay attention to this feature, which became evident and very natural in Jupyter.

To complete our comments on cluster analysis, both "convinced" Ph.D. candidates strongly appreciated the transparency and reproducibility of Jupyter notebooks, consisting in the possibility to make their own notes along with computational procedures, analysis, plots, equations or algorithms. According to their experience, there was practically no problem in retrieving ideas developed in a notebook even a few months or a year after the end of the work. Another highly valued feature was the flexibility of open data science tools like R, which is also a generally accepted view when they are compared with commercial software like SPSS (Wilcox, 2017).

Before getting our first knowledge about Jupyter in June 2017 from the mentioned EOSC project OpenDreamKit, all the paper's authors, who are also teachers by their master's degree (Math-Phys), had basic algorithmic skills and digital experience mainly with point-and-click statistical or mathematical environments. During three years, we successfully started with Jupyter, then naturally switched from the cloud to local work, from Windows to Linux, and from two key kernels (SageMath, Python) to more kernels (R, Julia, Octave, etc.), a mode which is also typical and effective for the majority of current data scientists (kaggle, 2020). We apply Jupyter notebooks as an education tool and also as a scientific tool in our statistical, time-series research (Hančová et al, 2020) and educational research (Ph.D. theses of EP and PS).
Finally, it is important to say that the observed time-series seasonal pattern (local maxima during summer months, minima during winter months) appears very specific and is strongly determined by the nature of the university academic workflow and environment (July, August: vacations; December, January: exam periods).

Study 2: Pre-service physics teachers

Background and context

The previous case study gave us valuable, deep qualitative insights and detailed experience with the use and perception of Jupyter technology from the viewpoint of ordinary educational researchers, from their first contact with the technology to its more advanced application. We were fully aware that the type of the study, the size of the sample and the specific study circumstances would not allow valid generalizations. Therefore, using the results of the first phase of Study I, in the same year (2017) we grounded our approach in the following longitudinal quantitative study, where the main goal was to explore teachers' perception of Jupyter in more standard conditions with a larger available sample of participants. For that purpose, we decided to use the standardized, reliable quantitative diagnostic instrument called the User Experience Questionnaire (Laugwitz et al, 2008; Schrepp et al, 2017), UEQ for short. This marketing tool is widely used to measure the subjective overall impression and satisfaction of a user or "customer" with an interactive digital product. According to its originators, the UEQ is the first tool covering user experience with the product that satisfies three important requirements: a quick assessment, coverage of the comprehensive impression, and a simple and immediate way to express affective attitudes. In particular, the tool measures the overall attractiveness of the product, its usability aspects (efficiency, perspicuity, dependability) and its motivational aspects (stimulation, novelty). Perspicuity expresses how easy it is to get familiar with the product and to learn how to use it. Efficiency stands for efficient and quick interaction with the product. Dependability expresses how much the user feels in control of the interaction. Stimulation means the motivation to use the product, and novelty measures how the innovativeness and creativeness of the product capture the user's attention.

Participants

The participants were all future, pre-service teachers of physics, who always study physics in combination with another science or humanities subject, taking the first-year compulsory course of their bachelor study program, called Fundamentals of calculus for physicists, at the Institute of Physics at P. J. Šafárik University in Košice, Slovakia. We collected data after the course during three consecutive years (2018-2020) from a total of N = 33 students (19 females, 14 males). Table 2 gives a numerical summary of participant demographics with respect to year, gender and study field. The fourth factor, high or low achiever (Table 3), was determined by the grade average (Nouri, 2016): high means an average from A to B, low represents an average from C to F.

Table 2

Student demographics (N = 33) in quantitative longitudinal study II.

                    study field
year   g   Math-Phys  Phys-Bio  Phys-Chem  Phys-Comp  Phys-Geo  all
2017   F       1         6          0          0          0       7
       M       4         2          1          0          0       7
2018   F       3         2          0          1          0       6
       M       4         1          0          1          0       6
2019   F       3         2          0          0          1       6
       M       1         0          0          0          0       1
total         16        13          1          2          1      33

Table 3

The longitudinal distribution of the sample as high or low achievers.

study field   Math-Phys    Phys-Bio     Phys-Chem  Phys-Comp  Phys-Geo  all
achiever      high  low    high  low    high       low        low
2017            4    1       2    6       1          0          0       14
2018            4    3       0    3       0          2          0       12
2019            3    1       1    1       0          0          1        7
total          11    5       3   10       1          2          1       33
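The marginal totals quoted in the text (19 females, 14 males, N = 33) can be cross-checked against the cell counts of Table 2. The following is only a minimal sketch: the counts are transcribed from Table 2, while the names `TABLE2`, `FIELDS` and `margins` are hypothetical helpers introduced for illustration and are not part of the authors' analysis code.

```python
# Cell counts transcribed from Table 2: (year, gender) -> counts per study field.
# FIELDS, TABLE2 and margins() are illustrative names, not the authors' code.
FIELDS = ["Math-Phys", "Phys-Bio", "Phys-Chem", "Phys-Comp", "Phys-Geo"]

TABLE2 = {
    (2017, "F"): [1, 6, 0, 0, 0],
    (2017, "M"): [4, 2, 1, 0, 0],
    (2018, "F"): [3, 2, 0, 1, 0],
    (2018, "M"): [4, 1, 0, 1, 0],
    (2019, "F"): [3, 2, 0, 0, 1],
    (2019, "M"): [1, 0, 0, 0, 0],
}


def margins(table):
    """Return (totals per study field, totals per gender) for a count table."""
    # Column sums: one total for each study field.
    by_field = dict(zip(FIELDS, (sum(col) for col in zip(*table.values()))))
    # Row sums accumulated by gender.
    by_gender = {}
    for (_year, gender), row in table.items():
        by_gender[gender] = by_gender.get(gender, 0) + sum(row)
    return by_field, by_gender


field_totals, gender_totals = margins(TABLE2)
grand_total = sum(field_totals.values())
```

The same pattern applies to Table 3 if the achiever columns are transcribed in the same way.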

Design and methods

Instructional design of the course.

The course, a bridge between secondary school and college level mathematics and its study, focuses on conceptual understanding, getting clear ideas and mastering basic computational and application skills connected with the fundamental concepts of calculus: function, derivative, integral, differential equation and complex number in one dimension, and especially their more sophisticated versions in more dimensions. Our approach conceptually applies ideas of the Calculus reform (Haver, 1999), which took place during the 1980s and 1990s and completely re-thought the calculus curriculum for non-majors in math, leading to innovative college textbooks such as Hughes-Hallett et al (2016).

After completing all the necessary preparation, since 2012 we have taught the course in the frame of Flipped Learning (Bergmann and Sams, 2012; Talbert and Bergmann, 2017) using the mentioned interactive teaching methods engaging student participation and active learning (Freeman et al, 2014). As for technology, we teach the face-to-face, collaborative group-space activities in a PC room, whereas home preparation, the structured individual-space activities mainly based on interactive video lessons, is managed principally via Google Classroom as an LMS and a set of other supporting technologies, e.g. Edpuzzle (edpuzzle.com) or smartphones. Students explored, invented and applied mathematical concepts with the help of two mathematical software packages: Geogebra in the visual and graphical domain and Maxima in the analytical domain. More details about our approach from the content and pedagogical viewpoint can be found in Hanč (2013, 2016); Paňková et al (2016).

Our major intervention, made during June 2017-September 2017, consisted in replacing the digital learning tools for active learning. In particular, we transformed the majority of learning materials, PDFs created in the LaTeX typesetting system, directly into one uniform format: online, interactive or static, Jupyter notebooks (Kueppers, 2017). The most important change was our transition from the point-and-click mathematical software Maxima to SageMath [9] as a Maxima replacement.

Data collection.

The basic available demographics and background information were obtained from our academic information system at the university. All personal information was deleted in accordance with GDPR rules, and basic identification was coded by random codes.

Since the flipped math depends heavily on digital technologies and requires some basic level of digital literacy from students, at the beginning of the course they filled out our self-developed Digital Experience Questionnaire as an online Google form, giving us contact information and their digital experience background. The questions concern the digital devices of their own that they can bring and use in collaborative activities during the face-to-face instruction, the devices they can use in home preparation, and the digital background in math education they bring from secondary school (the exact wording of the questions is in the Appendix and our GitHub storage).

At the end of the course, the mentioned UEQ tool was administered, again electronically as an online Google form. As recommended by the UEQ originators, we translated the UEQ into Slovak [10]. Its long version, used by us, contains 26 items grouped in the six dimensions described above. In the UEQ, responses are in the form of the semantic differential, where a respondent expresses via a seven-point scale to what extent he or she agrees with the given characteristic of the tool described by two opposite adjectives, e.g. annoying/enjoyable.

To address the first personal viewpoint of how Jupyter contributed to the understanding of learning content, what self-progress students made and how transferable their own application of the technology was, we extended the UEQ by three 5-point Likert-scale questions (see the Appendix). The data collection was done three times with respect to the given conditions.

[9] SageMath is a free open-source Python-based mathematics software containing Maxima as one of its parts (Beezer et al, 2013; Stein and others, 2020; Zimmermann et al, 2018), and it is now one of the kernels in Jupyter. Originally SageMath was created as a free open-source, viable alternative to the popular commercial computer algebra systems or scientific computing environments like Wolfram Mathematica (wolfram.com), Maple (maplesoft.com) and MATLAB (mathworks.com).

[10] To date, there are more than 30 language versions of the original German version, including our Slovak version, all available at ueq-online.org.
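The scoring of semantic-differential answers described above can be illustrated by a short sketch. It assumes, as is usual for the UEQ but not stated explicitly in this section, that the seven-point answers are rescaled to the range from -3 to +3 and that items whose positive adjective stands on the left are sign-reversed before averaging. The function names and the item-to-dimension mapping passed in by the caller are hypothetical; the actual assignment of the 26 items to the six dimensions should be taken from the UEQ materials.

```python
from statistics import mean


def rescale(answer):
    """Map a seven-point answer (1..7) to the range -3..+3 (assumed convention)."""
    if not 1 <= answer <= 7:
        raise ValueError("answer must be between 1 and 7")
    return answer - 4


def scale_scores(answers, scale_items, reversed_items=frozenset()):
    """Average the rescaled item answers within each dimension.

    answers        -- dict: item number -> answer on the 1..7 scale
    scale_items    -- dict: dimension name -> list of item numbers (caller-supplied)
    reversed_items -- items whose polarity must be flipped (caller-supplied)
    """
    scores = {}
    for scale, items in scale_items.items():
        values = []
        for item in items:
            value = rescale(answers[item])
            if item in reversed_items:
                value = -value  # flip so that higher always means more positive
            values.append(value)
        scores[scale] = mean(values)
    return scores


# Toy usage with a made-up three-item questionnaire (not the real UEQ layout).
toy = scale_scores({1: 7, 2: 1, 3: 4}, {"A": [1, 2], "B": [3]}, reversed_items={2})
```

Keeping the item mapping as an argument avoids hard-coding any particular questionnaire layout into the scoring routine.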

Data analysis.

In statistical data analysis we applied exploratory analysis based on basic descriptive numerical and advanced graphical summaries enabled by data science tools (R and Python), combined with standard ANOVA procedures (Maxwell et al, 2017) to test their statistical significance. The UEQ data has a special analysis procedure described in Laugwitz et al (2008), with the following benchmark evaluation of a digital product for each dimension:
• excellent, your product is in the range of the 10% best products,
• good, 10% of the products are rated better and 75% are rated worse than your product,
• above average, 25% are better and 50% worse than your product,
• below average, 50% are better and 25% worse than your product,
• bad, only 25% of the products are worse than your product.

For the UEQ data analysis we have also created our own R package containing functions suitable for our analysis. It can be found with all data, analysis and explanations in the corresponding Jupyter notebooks at our GitHub repository (Hanč et al, 2020b).

Results and discussion

Initial Digital experience.

The longitudinal summary results of the Digital Experience Questionnaire data from 2017-2020 are depicted in Fig. 5 [11]. We can see that all students have access to several digital devices. Every student has their own smartphone (mainly Android). At home they can work with a notebook (almost 100%; 80% have their own) or at least with a PC. The dominant operating system is Windows 10. A small proportion of students can use or bring a tablet. From the educational viewpoint, teaching and learning in secondary school math still rely on classical means (calculators, paper, chalk, blackboard). During secondary school math instruction, half of the students met PC-oriented technology (computers, notebooks, projectors, interactive whiteboards, IWB), only 1/3 of them used web-based technology (LMS, email, videos or simulations) and a small number (15%) used touch-sensitive technology (smartphones and tablets).

User Experience Questionnaire.

From the original total of N = 33 students, 15% did not finish the first semester in their original bachelor study program. Specifically, 5 females: 2 (2017, low ach., Physics-Biology), 2 (2018, low ach., Physics-Biology, Physics-Computers), 1 (2018, high ach., Math-Physics) transferred to another study field (1 from Math-Physics to Math) or left our university (remaining 4). Therefore the UEQ data contains answers from N* = 28 students (14 females, 14 males). Due to department policy, we always interview each student about why he or she is leaving our study program, and nobody saw our course as the key reason. Typical reasons were different expectations or too difficult introductory physics.

The first viewpoint of participants' perception, represented by our three extra questions added to the UEQ (see the graphical summary in Fig. 6), shows positive personal reflections of students on the meaning of Jupyter for their learning. Explicitly, more than 70% agreed that Jupyter Notebooks in our educational approach helped them a lot in understanding what they have learned. An overwhelming 90% reported great improvement in the used digital technology. Almost 80% see a high potential for Jupyter use in other subjects. The longitudinal distributions of students' answers are qualitatively similar, although in the understanding and improvement plots the 2019 mode is closer to the neutral level in comparison with the modes of the previous years (2017, 2018).

[11] Unlike other figures in the paper, the total percentages in the plots do not add up to 100% due to possible multiple student responses. Responses displayed by each bar were considered independently with respect to the total possible number N = 33.

Figure 5

Pre-service teachers' digital experience: their own digital devices that they can bring and use in the course, the devices used in home preparation, and the digital technology experienced in secondary school math education. Stacked bar plots were created in Scientific Python (data analysis package pandas).

Figure 6

Pre-service teachers' perception of the role of technology in understanding the learning content, in improving their own digital skills and in other subjects (possible transfer).

The collected UEQ data, capturing students' impressions, feelings and affective attitudes, led to the following benchmark results and interpretation summarized in Table 4. The means of the dimensions are also displayed in Fig. 7 (solid line all), which presents the UEQ benchmark results with respect to four factors: year, gender, study program and achievement.

Table 4

The UEQ benchmark results: average (M), standard error (SE) and interpretation.

dimension        M (SE)        benchmark       interpretation
Attractiveness   0.92 (0.19)   below average   50% better, 25% worse
Perspicuity      0.78 (0.16)   below average   50% better, 25% worse
Efficiency       1.47 (0.17)   good            10% better, 75% worse
Dependability    1.24 (0.16)   above average   25% better, 50% worse
Stimulation      1.08 (0.20)   above average   25% better, 50% worse
Novelty          1.43 (0.15)   excellent       range of the best 10%
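To make the structure of Table 4 concrete, the sketch below shows how a dimension mean M, its standard error SE and a benchmark label could be obtained from per-student dimension scores. It is an illustration only: the cut points in `HYPOTHETICAL_CUTS` are invented placeholders, not the published UEQ benchmark intervals (which differ from dimension to dimension), and the function names are not those of the authors' R package.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(scores):
    """Return the sample mean and the standard error of the mean."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


def benchmark_category(m, cuts):
    """Classify a dimension mean using four descending lower bounds.

    cuts -- lower bounds for (excellent, good, above average, below average);
            anything below the last bound is labelled 'bad'.
    """
    labels = ["excellent", "good", "above average", "below average"]
    for label, bound in zip(labels, cuts):
        if m >= bound:
            return label
    return "bad"


# Placeholder cut points for illustration only; NOT the real UEQ benchmark values.
HYPOTHETICAL_CUTS = (1.6, 1.3, 1.0, 0.6)

m, se = mean_and_se([1.0, 2.0, 3.0])            # toy scores on the -3..+3 range
label = benchmark_category(m, HYPOTHETICAL_CUTS)
```

In practice one tuple of cut points per dimension would be supplied, taken from the benchmark data set distributed with the UEQ.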

These results demonstrate that no affective aspect of Jupyter is bad. Jupyter technology leaves the best personal impression in the motivational aspect Novelty. The second most valued aspect for students is efficient and quick work with Jupyter. On the contrary, Perspicuity, the usability aspect of how easy it is to learn Jupyter, appears as the dimension with the weakest personal affective attitude. A very similar below-average impression is connected with the overall attractiveness of Jupyter as a digital product.

Here it is worth saying that we aggregated all non-mathematical combinations into one level, called Phys for short. This step is dictated not only by the small numbers of students in Physics-Chemistry, Physics-Computers and Physics-Geography, but is also justified conceptually, since all non-math major programs have practically the same math and physics education, which differs from the Math-Physics program with its much wider and more rigorous math education.


Figure 7

The UEQ benchmark plots with respect to four factors: year, gender, study program and achievement. Plots were created in R (own UEQ package).

Multiway ANOVA [12] was used to test the significance of all four mentioned factors. As the graphical representation in Fig. 7 indicates, ANOVA confirmed the statistical significance of the factor "achievement" at Attractiveness (p = 0.…, η = 0.…), […] (p = 0.…, η = 0.142, medium effect size), Stimulation (p = 0.…, η = 0.135, medium effect size) and Novelty (p = 0.…, η = 0.159, large effect size). The factor "gender" has a significant large effect at Novelty (p = 0.…, η = 0.…). […] α = 0.90 for the whole questionnaire, and with respect to the given dimensions the alphas were between 0.55 and 0.84. The only dimension with a questionable alpha was Dependability (α = 0.…) […] α > …

Discussion.

Our results, based on the questionnaires and our teaching observations, show that our future physics teachers, as incoming secondary school students, did not experience any digital technology as a key active learning tool in secondary school math education. Moreover, only a small proportion of them have the digital skills required for effective and successful university study, especially in the mode of Flipped Learning. These findings explain why 90% of students reported feelings of great improvement in digital technology after our course.

Together with another implication from Study I, that a one-semester mathematical course with new technology can be rejected by students, the results also tell us that we have to be very careful with any digital innovation. Students' improper instructional expectations and motivation, together with wrong meta-cognitive models about the role of learning tools (digital technology in our case), can be one of the decisive factors in low learning gains or failure (Hattie, 2015, 2009).

To avoid such a situation and to change students' expectations in the right direction as much as possible, from the beginning of our research we took several supporting steps, which we believe became important determinants of our successful innovations. We prepared a new version of our supplementary supporting course called Students' digital literacy. The original goal of the course was to provide a sufficient level of digital literacy in today's modern technologies (smartphone, tablet, social media via LMS, online web technologies like Google Drive) for better and more effective learning, active life in higher education and later professional activity. From 2017, for all university students who are future science teachers, including the given physics teachers, we incorporated in the course a special module for work and training with Jupyter and Geogebra. Since Jupyter does not have a special intuitive menu like point-and-click interfaces, one of the digital literacy goals also became developing very effective skills in searching for and working with practical cheat sheets for Jupyter and its kernels, which we collected and prepared in Slovak.

Although creators and long-term Jupyter users perceive Jupyter technology as being as simple as email or the web (Project Jupyter et al, 2018), our results suggest that there is a need to devote special attention to low achievers, for whom Jupyter can be yet another unattractive obstacle in their study. Our interaction with such students led us to the finding that this is probably a problem of expectations. Low achievers typically have a negative attitude to math and automatically transfer it to everything connected with math.

Concerning technological aspects, before 2017 our digital learning tools Maxima and Geogebra were treated by us and by students as two independent tools. Now we have integrated SageMath with embedded Geogebra simulations and Jupyter widget apps (in SageMath called Interacts) in Jupyter Notebooks as a unifying interactive multimedia environment [13]. An interesting experience of our students showed that, on sufficiently large screens (at least 5 inches; such smartphones are also called phablets), interactive Jupyter notebooks can be actively read and watched as study materials in home preparation.

Finally, as for generalizations, this single-case intervention study without a control group again has limits in the external validity of its results. However, our comparison of the physics and math placement test results of our future physics teachers with results from the secondary school population (see our third research study) suggests that our conclusion should represent an upper limit of what can be achieved.

To complete our study report, we would like to mention that in our previous publication (Štrauch and Hanč, 2017) we showed the high effectiveness of our flipped math course in the cognitive learning dimension, especially in the conceptual understanding of the taught math concepts. From the content viewpoint, the form of the course was also essential. It provided the form, tools and time for us and for students to get clear and transparent ideas about those math concepts which, in such an early course with the traditional instructional model, were considered impossible to present, understand or learn.

[12] As for the inferential statistical analysis of the UEQ data, where each student evaluation is represented by a six-dimensional vector, MANOVA appears to be a more suitable and powerful method. However, we could not apply it due to the insufficient size of our sample.

[13] Technologically, we run Jupyter with SageMath in three complementary modes: Binder for individual explorations, CoCalc for cooperative activities and SageMathCell (sagecell.sagemath.org) for very quick calculations. At home, students usually install a local Windows version of Jupyter with the SageMath kernel.


Study 3: In-service physics teachers

Background and context

The third parallel study, one of the main fresh, unpublished results of the Ph.D. research of PS, deals with in-service physics teachers and their perception of the meaning of data and data science tools. Generally, as for in-service teachers, there is a substantial body of research papers on how quantitative and qualitative data can foster and improve effective teacher decisions and behavior (see the special issue, Vol. 60, November 2016 of the journal Teaching and Teacher Education, esp. Lai and McNaughton, 2016; Mandinach and Gummer, 2016; Poortman et al, 2016). At the same time, data use belongs to the common and key characteristics of high-performing schools (Ebbeler et al, 2016).

Data literacy for teaching, the ability to transform, collect, control and understand all types of educational data for better instruction, is becoming essential. Every teacher should also be an action researcher, equipped with at least a basic level of methodological knowledge and data literacy in his or her work for lifelong learning (Johnson and Christensen, 2016). Such action research will lead not only to a better understanding and improvement of one's own teaching practice, but above all to greater satisfaction from work and a positive attitude towards becoming better at one's own profession.

Therefore the central goal of the Ph.D. thesis was to examine, verify and provide such data science tools, with Jupyter as our first choice, for in-service teachers in real practice, which would help them collect, process and use their own educational data easily and effectively. Specifically, we focused on data about students' preconceptions, misconceptions and mental models in physics understanding as one of the decisive factors in the quality of physics teaching (Fraser et al, 2014; Redish, 2014).

Participants

Participants were a random sample of n = 40 in-service physics secondary school teachers. Random sampling was a part of our national cross-sectional survey (May 2018-June 2018, 919 students) dealing with student conceptual understanding in mechanics, with a stratified two-stage cluster sampling design according to an applied algorithm used in TIMSS (PISA) surveys (see the scheme in Fig. 8). In particular, we took a random sample from all eligible Slovak grammar schools (N = 284), from H = 3 layers (strata), the natural geographical Slovak regions: western, central and eastern. From the collected students' data we knew that all teachers successfully used our data science tool; however, only n* = 33 teachers provided feedback in the online questionnaire (83% response rate). The left middle histogram in Fig. 10 in the result section shows demographic information about the sample of teachers (9 males and 24 females).

Design and methods

Pilot study.

We carried out a pilot study on using Jupyter with a few small samples of our active physics teachers during physics teacher clubs. It showed negative feedback. Today, the many requirements and the complexity of teaching as a profession often lead to a very busy teacher, so in-service teachers can rarely concentrate on any lengthy adaptation to a new technology, even a simple one. Therefore, we left Jupyter and tried to find an even simpler data science tool.


Figure 8

A scheme of our complex survey, a stratified two-stage cluster sampling design, according to our applied version of the TIMSS (PISA) algorithm (LaRoche et al, 2016, pp. 3.1-3.37). For each stratum we independently took a probability proportional to size (PPS) random sample of schools and a simple random sample (SRS) of classes at each chosen school.
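The two-stage design in Fig. 8 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TIMSS algorithm itself and not the authors' code: schools are drawn by systematic PPS sampling, the number of classes is used as the measure of size (a real survey would more likely use enrolment), and all names (`pps_systematic`, `two_stage_sample`, the toy strata) are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic sampling with probability proportional to size.

    sizes -- dict: unit -> measure of size (positive numbers)
    n     -- number of selections
    A unit larger than the sampling interval may be selected more than once.
    """
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval            # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    chosen, cumulative, idx = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        # Select the unit for every sampling point falling into its size segment.
        while idx < n and points[idx] < cumulative:
            chosen.append(unit)
            idx += 1
    return chosen


def two_stage_sample(strata, n_schools, n_classes, rng):
    """Stage 1: PPS sample of schools per stratum; stage 2: SRS of classes.

    strata -- dict: stratum -> dict: school -> list of class labels
    """
    sample = {}
    for stratum, schools in strata.items():
        sizes = {school: len(classes) for school, classes in schools.items()}
        for school in pps_systematic(sizes, n_schools, rng):
            classes = schools[school]
            k = min(n_classes, len(classes))
            sample[(stratum, school)] = rng.sample(classes, k)
    return sample


# Toy sampling frame (made-up schools and classes).
toy_strata = {
    "western": {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"],
                "C": ["c1", "c2"], "D": ["d1", "d2", "d3"]},
    "eastern": {"E": ["e1", "e2"], "F": ["f1", "f2", "f3"],
                "G": ["g1", "g2"], "H": ["h1", "h2", "h3"]},
}
toy_sample = two_stage_sample(toy_strata, n_schools=2, n_classes=1, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which fits the open-analysis spirit of the study.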

Data science tool: R Shiny.

One of the revolutionary web presentation tools of data science created in the R language is the R Shiny library (Chang et al, 2019), which was released in 2012 by the RStudio team (shiny.rstudio.com). In open data and analysis, Shiny became the gold standard in the interactive web presentation of data and statistical analyses [14]. Therefore, using the R Shiny library, we created our own tailor-made web application which helps a teacher automatically collect his or her educational data from a Google form and transform it into an interactive web presentation showing statistical analysis and graphical and numerical summaries with interpretation. A screenshot from this web app [15], with a very intuitive point-and-click interface requiring no setup and zero adaptation learning time, is shown in Fig. 9.

Data collection and analysis.

According to our instructions, during June 2018 the teachers themselves collected educational data from their physics classes in school PC rooms via an online Google form offered by us, and subsequently interactively "played" with the data in our R Shiny app on their own devices. After that, the teachers filled out our semi-closed self-developed attitude questionnaire (two items deal with the evaluation of the R Shiny app, the other ones are not directly relevant to this study; see the Appendix) and the UEQ evaluating our Shiny web app. In the statistical analysis we applied the same methods as in Study II, whereas in the UEQ analysis, to get unbiased estimates, we incorporated calculations with weights according to the following general formula (Lohr, 2009):

$$\bar{S} = \frac{\sum_{h=1}^{H} \sum_{i \in S_h} \sum_{j \in S_{hi}} w^{*}_{hi} \cdot S_{hij}}{\sum_{h=1}^{H} \sum_{i \in S_h} \sum_{j \in S_{hi}} w^{*}_{hi}}, \qquad w^{*}_{hi} \sim \frac{n^{*}_{h} \cdot U_h}{c^{*}_{hi}} \sim \frac{n^{*}_{h} \cdot M_h}{c^{*}_{hi}} \tag{1}$$

where $S_{hij}$ is a response $S$ (or its measure) from teacher $j$ from school $i$ in stratum $h$ ($h = 1, 2, 3$) and $w^{*}_{hi}$ is the corresponding weight given by the stratum properties: $n^{*}_{h}$ is the real number of participating teachers in the stratum and $c^{*}_{hi}$ is the real number of participating classes from school $i$. $U_h$ is the current total number of teachers and $M_h$ the current total number of students in the given stratum. All these numbers are available in data files at our GitHub storage.

[14] Shiny web apps have found application in many areas, from statistics through education, healthcare, genetics, pharmacy, geography and agriculture to economics, marketing or tourism; see e.g. the official New Zealand tourism webpage ( https://mbienz.shinyapps.io/tourism_dashboard_prod ).

[15] The current version of our Shiny web app (now running only in Slovak after some extension) can be found at https://odfufv.shinyapps.io/hodnotiaci-nastroj/ . It can be freely explored via the host code CrbVUA. Using Google Translate, a Google Chrome extension, almost all parts of the web app are translated into English correctly.

Figure 9

A screenshot of our R Shiny web app for open educational data analysis and visualization, tested by in-service physics teachers randomly chosen from all eligible Slovak grammar schools.
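The weighted mean in Eq. (1) is a ratio of a weighted sum of responses to the sum of the weights, with one weight per school shared by all teachers of that school. A minimal sketch of this computation follows; it assumes the school weights have already been derived from the stratum properties, and the names `weighted_mean`, `toy_weights` and `toy_responses` as well as the toy numbers are hypothetical.

```python
def weighted_mean(responses, weights):
    """Weighted mean of teacher responses, in the spirit of Eq. (1).

    responses -- iterable of (stratum, school, score) triples, one per teacher
    weights   -- dict: (stratum, school) -> precomputed weight for that school
    """
    numerator = 0.0
    denominator = 0.0
    for stratum, school, score in responses:
        w = weights[(stratum, school)]
        numerator += w * score       # sum of w_hi * S_hij
        denominator += w             # sum of w_hi over the same teachers
    if denominator == 0:
        raise ValueError("sum of weights is zero")
    return numerator / denominator


# Toy data: two teachers from one school, one teacher from another (made up).
toy_weights = {("western", "A"): 2.0, ("eastern", "B"): 1.0}
toy_responses = [("western", "A", 1.0), ("western", "A", 2.0), ("eastern", "B", 3.0)]
toy_mean = weighted_mean(toy_responses, toy_weights)
```

Because the weights appear in both numerator and denominator, only their relative sizes matter, which is why Eq. (1) states them up to proportionality.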

Results and discussion

The main results of our cross-sectional study, connected to sample demographics, UEQ data for the R Shiny app and attitude questionnaire data, are summarized graphically in Fig. 10. It contains a UEQ benchmark plot with original and weighted data (upper left plot) and a bar chart with M and SE for all six UEQ dimensions (upper right plot). In the middle you can see the demographics and a heat map of affective UEQ perceptions with respect to teachers' years of practice. The bottom left heat map displays teachers' attitude to using our web app in the future during their own instruction, and the bottom right heat map displays teachers' attitude to the next extension of the web app.

Figure 10

Graphical visualization of teachers' online feedback on our R Shiny web application as a necessary replacement of Jupyter. The left middle histogram shows demographic information about the sample of in-service teachers (n* = 33, 9 males and 24 females).

The UEQ benchmark shows that the statistically correcting weights led to almost identical results: differences between weighted and unweighted values ranged from 0.001 to 0.054. The most positive perception was expressed for Novelty (originality) of our digital product, rated as good, which means that only 10% of products are better and 75% are worse. Attractiveness, Perspicuity and Stimulation were perceived and rated as above average (50% of other digital apps are worse and only 25% leave a better impression). Only Dependability (confidence in working with or controlling the product), with the weakest impression, is on the border between above and below average. Cronbach's alphas were 0.96 for the whole questionnaire and between 0.75 and 0.… […] α > …
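For readers who want to reproduce a reliability coefficient such as the Cronbach's alpha reported above, the standard formula is α = k/(k − 1) · (1 − Σ var(item) / var(total score)), where k is the number of items. The sketch below implements this textbook formula with the Python standard library; it is not taken from the authors' R package, and the function name and toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows -- list of respondents, each a list of k item scores (k >= 2)
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equally long rows")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, three perfectly consistent items.
toy_alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For a single UEQ dimension, `rows` would hold only the items belonging to that dimension, after any polarity reversal.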

As for willingness to use the app again in the future, there was no negative attitude and more than 80% want to use it again (see Tab. 5). In the case of the future extension of our app to more tests and more subjects, the situation is even more positive, with more than 90% in favor. According to the corresponding heat map, more positive attitudes are expressed by more experienced teachers.

Table 5

The teachers' attitudes to the future use of our app and its extension.

question             strongly agree   agree   neutral
future app use            45%          36%      18%
next app extension        58%          36%       6%

Discussion.

Regarding the failure of Jupyter in our case, we later realized that this finding is probably connected to the general implication (Mandinach, 2016) that, in the case of data literacy, teachers must start with continuous lifelong learning as teacher candidates. For teachers who are already in practice it is usually too late. A similar experience in helping physics teachers is presented in the PhysPort recommendations (McKagan et al, 2016, 2019).

The very positive acceptance of the R Shiny web app among our in-service teachers is really promising. Since the Shiny app also provides overall summaries from all teachers using the application, a teacher can compare the results of his or her students with those of others, which appears to be one of the big motivational benefits. Considering R Shiny not only as an open data tool but also as a learning tool, we noticed many papers reporting positive experience with its application and its educational potential (e.g. Doi et al, 2016; Fawcett, 2018). However, teachers' comments in the open attitude questionnaire and our interaction with teachers signal problems in the clear and deeper conceptual understanding of data visualization and its statistical interpretation (statistical literacy overlapping with data literacy, Gould (2017)).

Conclusions

Key ﬁndings

Over the past 10 years, open-source digital data science tools, especially Jupyter and R, have provided us with revolutionary, free, highly accessible ways to collect, store, process, or share data. In light of these changes, not only science and economics but also education is changing.

Our theoretical analysis and research pointed out that there is a vast body of publications dealing with the application of Jupyter as a web-based interactive educational digital tool in different types of instructional intervention. Our pedagogical review also demonstrates the unifying nature of Jupyter technology. Jupyter notebooks interactively combine all forms of information together as one multimedia tool. Jupyter can also combine different educational tools under one hood, providing important means for current modern interactive teaching approaches (PI, QDI, IBSE, FL), e.g. immediate feedback via e-voting (Jupyter activity extension), virtual active experimentation (Jupyter widget and embedding functionality) or interactive video-lessons via screencasting (Jupyter Graffiti extension).

Empirical educational research dealing with Jupyter implementation has only recently started to apply general principles of the cognitive psychological theory of multimedia and online learning, which themselves need a stronger incorporation of the affective and motivational dimension of learning.


In this paper we presented our three empirical studies combined in the convergent design of a three-year-long complex mixed-methods research project whose goal was to examine affective aspects and perception of Jupyter implementation during and after higher (lifelong) education in three groups of physics teachers: PER candidates and researchers, future (pre-service) teachers and current in-service teachers. Using self-developed qualitative and both self-developed and standardized quantitative diagnostic tools, we uncovered findings that are not only consistent with other, closely connected studies, but also highlight specific, additional implications resulting from the application of Jupyter as a learning tool. Here we would like to draw attention to the standardized, reliable quantitative diagnostic instrument called the User Experience Questionnaire (UEQ) (Laugwitz et al, 2008), which is widely used to measure a user's subjective overall impression of, and satisfaction with, any digital product ("customer" experience). Our contribution is our own freely available R implementation of the UEQ analysis, extended by the possibility of using weighted data.

Our results of Study I and II indicate that young beginning PER researchers and pre-service physics teachers can master key digital skills for work with Jupyter technology, positively appreciating its big impact on their learning, data and statistical literacy or professional development. These results support the ongoing worldwide effort to implement Jupyter in traditional education as a promising free open-source interactive learning tool to foster the learning process, especially for the young generation.

Despite the fact that open-source Jupyter notebooks are as natural and easy as email or the web, the results of Study III suggest that in-service teachers are not prepared for Jupyter technology as a tool supporting their more effective work and instructional steps. This finding is in accordance with the general conclusion about acquiring data literacy for teaching. However, we found that they can work with, and very positively accept, open education data presented via another open-source data science tool, a self-developed R Shiny interactive web application, as an important form of immediate feedback and learning about the quality of their instruction. Younger teachers appreciate the affective aspects of the R Shiny web app, while older teachers appreciate the immediate data feedback.

Implications for practice and further research

Here we highlight several important findings to avoid failures. Overcoming possible barriers in adopting data science tools like Jupyter with the appropriate kernel (R or Python) consists in using today's free cloud services with zero setup at the beginning, then applying just-in-time local installations of the technology, and ending with the workflow typical of a current data scientist. Concerning technological installation aspects, a very effective and time-saving Jupyter implementation is in a Linux virtual machine running on Win10. Moreover, it seems that Jupyter features like easy sharing, reproducibility, transparency and flexibility can also help a beginner to master advanced research methods of data collection, processing, analysis and visualization, e.g. time series visualization or cluster analysis. Through its graphical and statistical results, our paper demonstrates what can be done in this direction.

From the educational viewpoint, a one-time or short adaptation to Jupyter can be unsuccessful. We recommend paying special attention to low achievers and females, who under wrong expectations can perceive Jupyter technology as more difficult than it really is. Such special attention can be realized e.g. via an additional digital literacy course supporting effective learning with Jupyter and changing wrong expectations in the right direction.

In connection with R Shiny and the mentioned European open science cloud service, we see not only educational researchers and university students but also every modern teacher as an active and effective participant in using open data, technologies and e-infrastructure mediated by the EOSC on a daily basis. Regarding the data and digital literacy skills of teachers and the meaning of data for effective teaching (Van den Hurk et al, 2016; Mandinach, 2016; Mandinach and Gummer, 2016; Ndukwe and Daniel, 2020; Sergis et al, 2019), the EOSC will allow teachers to process their own educational data or find data from all publicly available research in Europe in a few clicks, and to combine them in new, sophisticated ways without a significant loss of time and without expensive or special hardware, operating in any modern Internet browser without the need for installation or complex administration.

Finally, it is necessary to say that the results of our research (all three studies were intervention single-case or observational studies) cannot be fully generalized and must be confirmed or disproved by future wider experimental quantitative research. On the other hand, educational research in this field is still in its infancy, so there is still an urgent need to collect and summarize any available knowledge and experience, which can later be important in realizing more valid and general experimental studies.

Abbreviations

STEM: science, technology, engineering and mathematics; STEAM: science, technology, engineering, arts and mathematics; MOOC: massive open online courses; IBSE: inquiry based science education; PI: peer instruction; QDI: question driven instruction; FL: flipped learning; PER: physics education research; SPSS: Statistical Package for the Social Sciences; RSRP: Real Statistics Resource Pack; AAPT: The American Association of Physics Teachers; VM: virtual machine software; UEQ: User Experience Questionnaire; PPS: random probability proportional sampling; SRS: simple random sampling.

Acknowledgments

The authors would like to thank Andrej Gajdoš for his advice in using R in advanced multivariate statistical analysis. In connection with applications of cluster analysis in physics education research, our special thanks go to R. Padraic Springuel and Onofrio R. Battaglia for communication dealing with their own cluster analysis code.

Funding

This work was supported by the Slovak Research and Development Agency under the contract No. APVV-17-0568.

Author details

Institute of Physics, Faculty of Science, Pavol Jozef Šafárik University, Košice, Slovakia. Institute of Mathematics, Faculty of Science, Pavol Jozef Šafárik University, Košice, Slovakia.

References

Addinsoft (2020) XLSTAT | Statistical Software for Excel. URL

Angerer P, Kluyver T, Schulz J, abielr, Sa DFd, Hester J, karldw, Foster D, Sievert C (2020) repr: Serializable Representations. URL https://CRAN.R-project.org/package=repr

Bao L, Redish EF (2001) Concentration analysis: A quantitative assessment of student states. Am J Phys 69(7):45–53

Barba LA, Barker LJ, Blank DS, Brown J, Downey AB, George T, Heagy LJ, Mandli KT, Moore JK, Lippert D, Niemeyer KE, Watkins RR, West RH, Wickes E, Willing C, Zingale M (2019) Teaching and Learning with Jupyter. GitHub, Creative Commons Attribution CC-BY 4.0 International license, URL https://jupyter4edu.github.io/jupyter-edu-book/

Battaglia OR, Di Paola B, Fazio C (2019) Unsupervised quantitative methods to analyze student reasoning lines: Theoretical aspects and examples. Phys Rev Phys Educ Res 15(2):020,112

Baumer BS, Kaplan DT, Horton NJ (2017) Modern Data Science with R, 1st edn. Chapman and Hall/CRC, Boca Raton

Beatty ID, Gerace WJ (2009) Technology-Enhanced Formative Assessment: A Research-Based Pedagogy for Teaching Science with Classroom Response Technology. Journal of Science Education and Technology 18(2):146–162


Beatty ID, Gerace WJ, Leonard WJ, Dufresne RJ (2006) Question driven instruction: Teaching science (well) with an audience response system, chap. 7. In: Banks DA (ed) Audience Response Systems in Higher Education: Applications and Cases, Information Science Publishing, London, pp 96–115

Beezer RA, Bradshaw R, Grout J, Stein WA (2013) Sage. In: Hogben L (ed) Handbook of Linear Algebra, 2nd edn, Chapman and Hall/CRC, Boca Raton, pp 91–1–91–26

Bergmann J, Sams A (2012) Flip Your Classroom: Reach Every Student in Every Class Every Day. International Society for Technology in Education, New York

Blank DS, Silvester S (2020) Calysto/metakernel. URL https://github.com/Calysto/metakernel

Boaler J (2020) Data Science Initiative Video. URL

Boisvert D, Born K, Bowen M, Gould R, Kapaun R, Konold C, Lavista Ferres JM, Merchant O, Shaffner A, Whitney H, Williams M (2016) Building Global Interest in Data Literacy: A Dialogue. Workshop report, Oceans of Data Institute, Education Development Center, Inc., Waltham, URL http://oceansofdata.org/our-work/building-global-interest-data-literacy-dialogue-workshop-report

https://CRAN.R-project.org/package=shiny

Constantinou CP, Tsivitanidou OE, Rybska E (2018) What Is Inquiry-Based Science Teaching and Learning? In: Tsivitanidou OE, Gray P, Rybska E, Louca L, Constantinou CP (eds) Professional Development for Inquiry-Based Science Teaching and Learning, Contributions from Science Education Research, Springer International Publishing, Cham, pp 1–23

Creswell JW, Clark VLP (2017) Designing and Conducting Mixed Methods Research, 3rd edn. SAGE Publications, Inc, London

Crouch CH, Mazur E (2001) Peer Instruction: Ten years of experience and results. American Journal of Physics 69(9):970–977

Data School (2019) Six easy ways to run your Jupyter Notebook in the cloud. URL

Ding L, Beichner R (2009) Approaches to data analysis of multiple-choice questions. Phys Rev ST Phys Educ Res 5(2):020,103.1–020,103.17

Doi J, Potter G, Wong J, Alcaraz I, Chi P (2016) Web Application Teaching Tools for Statistics Using R and Shiny. Technology Innovations in Statistics Education 9(1), URL https://escholarship.org/uc/item/00d4q8cp

Downes S (2019) A Look at the Future of Open Educational Resources. International Journal of Open Educational Resources 1(2):33–49

Ebbeler J, Poortman CL, Schildkamp K, Pieters JM (2016) Effects of a data use intervention on educators' use of knowledge and skills. Studies in Educational Evaluation 48:19–31

EDC Oceans of Data Institute (2015) Building Global Interest in Data Literacy: A Dialogue | Oceans of Data. URL http://oceansofdata.org/projects/building-global-interest-data-literacy-dialogue

EMC Education Services (2015) Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, 1st edn. Wiley, Indianapolis

Fawcett L (2018) Using Interactive Shiny Applications to Facilitate Research-Informed Learning and Teaching. Journal of Statistics Education 26(1):2–16

Fletcher TD (2010) psychometric: Applied Psychometric Theory. URL https://CRAN.R-project.org/package=psychometric

Fox J (2016) Using the R Commander: A Point-and-Click Interface for R, 1st edn. Chapman and Hall/CRC, Milton

Fox J, Bouchet-Valat M, Andronic L, Ash M, Boye T, Calza S, Chang A, Grosjean P, Heiberger R, Pour KK, Kerns GJ, Lancelot R, Lesnoff M, Ligges U, Messad S, Maechler M, Muenchen R, Murdoch D, Neuwirth E, Putler D, Ripley B, Ristic M, Wolf P, Wright K (2020a) Rcmdr: R Commander. URL https://CRAN.R-project.org/package=Rcmdr

Fox J, Muenchen R, Putler D (2020b) RcmdrMisc: R Commander Miscellaneous Functions. URL https://CRAN.R-project.org/package=RcmdrMisc

Fraser JM, Timan AL, Miller K, Dowd JE, Tucker L, Mazur E (2014) Teaching and physics education research: bridging the gap. Reports on Progress in Physics 77(3):032,401

Frederickson B (2019) Ranking Programming Languages by GitHub Users. URL

Freeman S, Eddy SL, McDonough M, Smith MK, Okoroafor N, Jordt H, Wenderoth MP (2014) Active learning increases student performance in science, engineering, and mathematics. PNAS 111(23):8410–8415

GitHub, Inc (2020) GitHub Milestones. URL https://github.com/about/milestones

Gould R (2017) Data Literacy is Statistical Literacy. Statistics Education Research Journal 16(1):22–25

Hake RR (1998) Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics 66(1):64–74

Hall J, Lingefjärd T (2016) Mathematical Modeling: Applications with GeoGebra. John Wiley & Sons, Hoboken


Hanč J (2013) Application of the flipped classroom model in science and math education in Slovakia. In: HSCI 2013: Proceedings of the 10th International conference on Hands-on Science (1st-5th July 2013, Košice), P.J. Šafárik University, Košice, Slovakia, pp 229–234

Hanč J (2016) What is going on in Slovakia? Current trends and flipped learning. In: Santiago R (ed) Actas del II Congreso de Flipped Classroom: Comunicaciones y posters presentados, Edita MT Servicios Educativos, Zaragoza, Spain, pp 328–344

Hanč J, Lukáč S, Sekerák J, Šveda D (2011) Geogebra - A complex digital tool for highly effective math and science teaching. In: 2011 9th International Conference on Emerging eLearning Technologies and Applications (ICETA), Elfa, Košice, pp 131–136

Hanč J, Štrauch P, Hančová M (2020a) JupyterPer. URL https://github.com/JupyterPER

Hanč J, Štrauch P, Hančová M (2020b) JupyterPER: Open-Education-Science. URL https://github.com/JupyterPER/Open-Education-Science

Hančová M, Gajdoš A, Hanč J, Vozáriková G (2020) Estimating variances in time series kriging using convex optimization and empirical BLUPs. Stat Papers, URL https://doi.org/10.1007/s00362-020-01165-5

Harrell Jr FE, et al (2020) Hmisc: Harrell Miscellaneous. URL https://CRAN.R-project.org/package=Hmisc

Hattie J (2015) The applicability of Visible Learning to higher education. Scholarship of Teaching and Learning in Psychology 1(1):79–91

Hattie JAC (2009) Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Routledge, London

Haver WE (ed) (1999) Calculus: Catalyzing a National Community for Reform: Awards 1987-1995. The Mathematical Association of America

Heering P, Grapí P, Bruneau O (eds) (2012) Innovative Methods for Science Education: History of Science, ICT and Inquiry Based Science Teaching. Frank & Timme GmbH, Berlin

Heiberger RM, Neuwirth E (2009) R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics. Springer, New York

Hohenwarter M, Borcherds M, Ancsin G, Bencze B, Blossier M, Éliás J, Frank K, Gál L, Hofstätter A, Jordan F, Konečný Z, Kovács Z, Lettner E, Lizelfelner S, Parisse B, Solyom-Gecse C, Tomaschko M, Kuellinger W, Karacsony B (2018) GeoGebra - Dynamic Mathematics for Everyone, ver. 6.0.507.0. URL

Hughes-Hallett D, McCallum WG, Gleason AM, Flath DE, Lock PF, Gordon SP, Lomen DO, Lovelock D, Osgood BG, Pasquale A, Quinney D, Tecosky-Feldman J, Thrash J, Rhea KR, Tucker TW (2016) Calculus: Single and Multivariable, 7th edn. Wiley, New York

Hunter JD (2007) Matplotlib: A 2D Graphics Environment. Comput Sci Eng 9(3):90–95

Van den Hurk H, Houtveen A, Van de Grift W (2016) Fostering effective teaching behavior through the use of data-feedback. Teaching and Teacher Education 60:444–451

IBM Corp (2015) IBM SPSS Statistics for Windows, version 23.0

Ireson G (1999) A multivariate analysis of undergraduate physics students' conceptions of quantum phenomena. Eur J Phys 20(3):193–199

Johnson RB, Christensen L (2016) Educational Research: Quantitative, Qualitative, and Mixed Approaches, 6th edn. SAGE Publications, London

Jones E, Oliphant T, Peterson P, others (2001) SciPy: Open source scientific tools for Python. URL

kaggle (2020) Kaggle's State of Data Science and Machine Learning 2019, Enterprise Executive Summary. URL

https://CRAN.R-project.org/package=factoextra

Kessler W (2020) jupytergraffiti: Create interactive screencasts inside Jupyter Notebook that anybody can play back. URL https://github.com/willkessler/jupytergraffiti

Khine MS, Areepattamannil S (2019) STEAM Education: Theory and Practice. Springer, Cham

Kluyver T, Ragan-Kelley B, Perez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C (2016) Jupyter Notebooks - a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds) Positioning and Power in Academic Publishing: Players, Agents and Agendas. Proceedings of the 20th International Conference on Electronic Publishing, IOS Press, Amsterdam, pp 87–90

Koehler JF, Kim S (2018) Interactive Classrooms with Jupyter and Python. The Mathematics Teacher 111(4):304–308

Kueppers B (2017) From Latex to Jupyter: Converting Traditional to Modern. In: Chova LG, Martinez AL, Torres IC (eds) Inted2017: 11th International Technology, Education and Development Conference, Iated-Int Assoc Technology Education & Development, Valencia, pp 1592–1596

Lai M, McNaughton S (2016) The impact of data use professional development on student achievement. Teaching and Teacher Education 60:434–443

LaRoche S, Joncas M, Foy P (2016) Sample Design in TIMSS 2015. In: Martin MO, Mullis IV, Hooper M (eds) Methods and Procedures in TIMSS 2015, Boston College, TIMSS & PIRLS International Study Center, pp 3.1–3.37

Laugwitz B, Held T, Schrepp M (2008) Construction and Evaluation of a User Experience Questionnaire. In: Holzinger A (ed) HCI and Usability for Education and Work, proceedings, no. 5298 in Lecture Notes in Computer Science, Springer Berlin Heidelberg, pp 63–76

Lohr SL (2009) Sampling: Design and Analysis. Cengage Learning, Boston

Lüdecke D (2020) sjstats: Collection of Convenient Functions for Common Statistical Computations. URL https://CRAN.R-project.org/package=sjstats


Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K, Studer M, Roudier P, Gonzalez J, Kozlowski K, Schubert E, Murphy K (2019) cluster: "Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al. URL https://CRAN.R-project.org/package=cluster

Mandinach EB (2016) Teachers learning how to use data: A synthesis of the issues and what is known. Teaching and Teacher Education p 6

Mandinach EB, Gummer ES (2016) What does it mean for teachers to be data literate: Laying out the skills, knowledge, and dispositions. Teaching and Teacher Education 60:366–376

Marr B (2018) How much data do we create every day? The mind-blowing stats everyone should read. URL

Maxwell SE, Delaney HD, Kelley K (2017) Designing Experiments and Analyzing Data: A Model Comparison Perspective, 3rd edn. Routledge

Mayer RE (2008) Applying the science of learning: Evidence-based principles for the design of multimedia instruction. Am Psychol 63(8):760–769

Mayer RE (ed) (2014a) The Cambridge Handbook of Multimedia Learning, 2nd edn. Cambridge University Press, Cambridge

Mayer RE (2014b) Cognitive Theory of Multimedia Learning, chap. 3. In: Mayer RE (ed) The Cambridge Handbook of Multimedia Learning, 2nd edn, Cambridge University Press, Cambridge, pp 67–108

Mayer RE (2019) Thirty years of research on online learning. Applied Cognitive Psychology 33(2):152–159

Mazur E, Watkins J (2009) Just-in-Time Teaching and Peer Instruction, chap. 3. In: Simkins S, Maier M (eds) Just in Time Teaching: Across the Disciplines, and Across the Academy, Stylus Publishing, Sterling, pp 39–62

McKagan S, Madsen A, Barbato L, Mason B, Sayre E, Cunningham B, Hilborn B, Riggsbee M, Martinuk S, Bell A (2016) Physport: Supporting physics teaching with research-based resources. URL

McKagan SB, Strubbe LE, Barbato LJ, Madsen AM, Sayre EC, Mason BA (2019) PhysPort use and growth: Supporting physics teaching with research-based resources since 2011. arXiv:1905.03745 [physics]

McKiernan EC (2017) Imagining the "open" university: Sharing scholarship to improve research and education. PLoS Biol 15(10):e1002,614

McKinney W (2010) Data Structures for Statistical Computing in Python. In: Walt Svd, Millman J (eds) Proceedings of the 9th Python in Science Conference, pp 51–56

Ndukwe IG, Daniel BK (2020) Teaching analytics, value and tools for teacher data literacy: a systematic and tripartite approach. Int J Educ Technol High Educ 17(1):22

Nouri J (2016) The flipped classroom: for active, effective and increased learning – especially for low achievers. Int J Educ Technol High Educ 13(1):33

Odden TOB, Caballero MD (2019) Computational Essays: An Avenue for Scientific Creativity in Physics. In: Cao Y, Wolf S, Bennet M (eds) 2019 Physics Education Research Conference Proceedings [Provo, UT, July 27-25, 2019], available also as arXiv:1909.12697

Odden TOB, Lockwood E, Caballero MD (2019) Physics computational literacy: An exploratory case study using computational essays. Phys Rev Phys Educ Res 15(2):020,152

Oliphant TE (2007) Python for Scientific Computing. Comput Sci Eng 9(3):10–20

Panko RR (1998) What We Know About Spreadsheet Errors. Journal of End User Computing's 10(Special issue on Scaling Up End User Development):15–21

Paňková E, Hanč J (2019a) Flipped learning and interactive methods with smartphones in modern physics at secondary schools. AIP Conference Proceedings 2152(1):030,025

Paňková E, Hanč J (2019b) Teaching Feynman's quantum physics at secondary schools using current digital technologies. AIP Conference Proceedings 2152(1):030,026

Paňková E, Štrauch P, Hanč J (2016) Practical strategies in formative and summative assessment of the flipped math and physics education. In: Santiago Campión R (ed) Actas del II Congreso de Flipped Classroom: Comunicaciones y posters presentados, Edita MT Servicios Educativos, Zaragoza, pp 308–327

Poortman CL, Schildkamp K, Lai MK (2016) Professional development in data use: An international perspective on conditions, models, and effects. Teaching and Teacher Education 60:363–365

Project Jupyter, Bussonnier M, Forde J, Freeman J, Granger B, Head T, Holdgraf C, Kelley K, Nalvarte G, Osheroff A, Pacer M, Panda Y, Perez F, Ragan-Kelley B, Willing C (2018) Binder 2.0 - Reproducible, interactive, sharable environments for science at scale. Proceedings of the 17th Python in Science Conference pp 113–120

R Development Core Team (2020) R: A language and environment for statistical computing. URL

Redish EF (2014) Oersted Lecture 2013: How should we think about how our students think? American Journal of Physics 82(6):537–551

Romer P (2018) Jupyter, Mathematica, and the Future of the Research Paper. URL https://paulromer.net/jupyter-mathematica-and-the-future-of-the-research-paper/

Salkind NJ (ed) (2010) Encyclopedia of Research Design, 1st edn. SAGE Publications, Inc

Schrepp M, Hinderks A, Thomaschewski J (2017) Design and Evaluation of a Short Version of the User Experience Questionnaire (UEQ-S). International Journal of Interactive Multimedia and Artificial Intelligence 4(6):103–108

Sergis S, Sampson DG, Rodríguez-Triana MJ, Gillet D, Pelliccione L, de Jong T (2019) Using educational data from teaching and learning to inform teachers' reflective educational design in inquiry-based STEM education. Computers in Human Behavior 92:724–738

Somers J (2018) The Scientific Paper Is Obsolete. The Atlantic URL

Spector C (2020) Bringing math class into the data age. URL https://ed.stanford.edu/news/bringing-math-class-data-age


Springuel RP, Wittmann MC, Thompson JR (2019) Reconsidering the encoding of data in physics education research. Phys Rev Phys Educ Res 15(2):020,103

Stein WA, others (2020) Sage Mathematics Software, ver. 9.1. URL

Štrauch P, Hanč J (2017) Quantitative diagnostics of misconceptions in science education [in Slovak]. Edukácia 2(2):11 pp., URL

Talbert R, Bergmann J (2017) Flipped Learning: A Guide for Higher Education Faculty. Stylus Publishing, Sterling, Virginia

VanderPlas J (2016) Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc., Boston

Walt Svd, Colbert SC, Varoquaux G (2011) The NumPy Array: A Structure for Efficient Numerical Computation. Comput Sci Eng 13(2):22–30

Warner J (2018) Thank you for 100 million repositories. URL https://github.blog/2018-11-08-100m-repos/

Weiss CJ (2017) Scientific Computing for Chemists: An Undergraduate Course in Simulations, Data Processing, and Visualization. J Chem Educ 94(5):592–597

Wickham H, Bryan J, RStudio, Kalicinski M, Komarov V, Leitienne C, Colbert B, Hoerl D, Miller E (2019) readxl: Read Excel Files. URL https://CRAN.R-project.org/package=readxl

Wickham H, François R, Henry L, Müller K, RStudio (2020a) dplyr: A Grammar of Data Manipulation. URL https://CRAN.R-project.org/package=dplyr

Wickham H, Seidel D, RStudio (2020b) scales: Scale Functions for Visualization. URL https://CRAN.R-project.org/package=scales

Wilcox RR (2017) Understanding and Applying Basic Statistical Methods Using R, 1st edn. Wiley, New York

Wright AM, Schwartz RS, Oaks JR, Newman CE, Flanagan SP (2020) The why, when, and how of computing in biology classrooms. F1000Res 8, URL

Zaiontz C (2019) Real Statistics Resource Pack | Real Statistics Using Excel. URL

van der Zee T, Reich J (2018) Open Education Science. AERA Open 4(3):233285841878,746

Zimmermann P, Casamayou A, Cohen N, Connan G, Dumont T, Fousse L, Maltey F, Meulien M, Mezzarobba M, Pernet C, Thiéry NM, Bray E, Cremona J, Forets M, Ghitza A, Thomas H (2018) Computational Mathematics with SageMath. SIAM, Philadelphia


Appendix

Study I: Two open questions of the qualitative interview
• Explain your personal reasons for choosing statistical methods and digital tools.
• Describe the key obstacles why the data science tool R with R Commander did not fit your research needs and requirements.

Study II: Digital Experience Questionnaire
• What mobile device with a Wi-Fi connection to the Internet can you bring to our course if we are not in a PC room?
• What device can you use in home preparation, i.e. to study and view the learning e-content (course website, videos, presentations, interactive documents, simulations)?
• From which secondary school do you come?
• The most frequent half-year or end-year final grade from secondary school math:
• What digital tools did you regularly use in math lessons at your secondary school?

Study II: Three additional questions in the UEQ.
The complete 26 items of the UEQ questionnaire in English can be found at ueq-online.org.
• The use of digital technology has helped me a lot to understand ideas we have learned.
• I have greatly improved in the digital technology we used in learning.
• The technology would also be very helpful in other subjects.

Study III: Teacher's Attitude Questionnaire.
The complete Slovak and English version of our short semi-closed, self-developed questionnaire is available at our GitHub storage.
• Future app use in own instruction: Would you like to use our R Shiny app in the future?
• Next app extension to more tests and subjects: How do you perceive further extension of our web app?

Additional Files

Our open data analysis in the form of Jupyter notebooks, together with all used tools and data files, is stored and freely available in one of our GitHub repositories devoted to this paper (Hanč et al, 2020b) in the frame of our GitHub research project