Effective Feedback for Introductory CS Theory: A JFLAP Extension and Student Persistence
Ivona Bezáková∗, Kimberly Fluet†, Edith Hemaspaandra∗, Hannah Miller‡, David E. Narváez‡

Abstract
Computing theory analyzes abstract computational models to rigorously study the computational difficulty of various problems. Introductory computing theory can be challenging for undergraduate students, and the main goal of our research is to help students learn these computational models. The most common pedagogical tool for interacting with these models is the Java Formal Languages and Automata Package (JFLAP). We developed a JFLAP server extension, which accepts homework submissions from students, evaluates each submission as correct or incorrect, and provides a witness string when the submission is incorrect. Our extension currently provides witness feedback for deterministic finite automata, nondeterministic finite automata, regular expressions, context-free grammars, and pushdown automata. In Fall 2019, we ran a preliminary investigation on two sections (Control and Study) of the required undergraduate course Introduction to Computer Science Theory. The Study section used our extension for five targeted homework questions, and the Control section solved and submitted these problems using traditional means. Our results show that on these five questions, the Study section performed better on average than the Control section. Moreover, the Study section persisted in submitting attempts until correct, and from this finding, our preliminary conclusion is that minimal (not detailed or grade-based) witness feedback helps students to truly learn the concepts. We describe the results that support this conclusion as well as a related hypothesis conjecturing that with witness feedback and an unlimited number of submissions, partial credit is both unnecessary and ineffective.
Computing theory is difficult for beginning students since the concepts are abstract. In introductory theory courses, students construct computational models such as deterministic finite automata, nondeterministic finite automata, regular expressions, context-free grammars, and pushdown automata (DFAs, NFAs, RegExs, CFGs, and PDAs, respectively). The most popular graphical interface for students to interact with these concepts [8] is the Java Formal Languages and Automata Package (JFLAP) [19, 20, 21].

For our Automated Feedback in Undergraduate Computing Theory [3] research, we developed the Didactic And Visual Interface for Development (DAVID) extension to JFLAP. The DAVID extension accepts introductory computer science theory homework submissions from students and sends each submission to a feedback server, which automatically checks a student's submission against the instructor's correct solution; the server then immediately reports to the student whether the submission is correct. Figure 1 shows JFLAP with the DAVID extension.

In Fall 2019, we conducted a successful preliminary investigation of the beta version of the DAVID extension. In this investigation, we compared a Control section and a Study section of students in Introduction to Computer Science Theory. The Study section was required to use the DAVID extension to submit five targeted homework questions. Our research questions about students and instructors were exploratory and deliberately broad, so that we would not artificially limit ourselves. We discuss our most interesting finding in Section 4.3.

∗ Rochester Institute of Technology, Department of Computer Science, Rochester, New York, USA
† University of Rochester, Center for Professional Development and Education Reform, Rochester, New York, USA
‡ Rochester Institute of Technology, Golisano College of Computing and Information Sciences, Rochester, New York, USA
RQ1 How do students use the DAVID extension? (Results in Section 4.1.)
RQ2 What were students' experiences with the extension? (Results in Section 4.2.)
RQ3 How do instructors benefit from the extension? (Results in Section 4.3.)

A student's submission to the DAVID extension is either incorrect or correct. If a student's submission is incorrect, then the server provides immediate feedback to the student via a witness string, which is a string that the incorrect submission accepts (or rejects) but the correct solution should reject (or accept). For the models used in introductory theory, these witness strings are usually just a few symbols long. Naturally, if the submission is correct, then the extension reports a simple "Correct!" to the student. We call this feedback of a witness string witness feedback, and we found that when given only witness feedback, students tended to persist in submitting attempts until correct.

Our preliminary investigation had multiple promising outcomes, including verifying the set-up of the DAVID extension, learning what additional telemetry would be valuable for our full investigation, understanding the practical workflow to make the extension easy for instructors to use, and analyzing the collected data of homework submissions, student surveys, and students' grades.
Figure 1. JFLAP with the DAVID extension "Submit" option.
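The witness computation for DFAs can be pictured as a breadth-first search of the product automaton: the shortest string that reaches a state pair on which the two machines disagree is a shortest witness. Below is a minimal sketch of this idea; it is our illustration, not the DAVID extension's actual code, and the DFA encoding is an assumption.

```python
from collections import deque

def dfa_witness(dfa_a, dfa_b, alphabet):
    """Return a shortest string accepted by exactly one of the two DFAs,
    or None if the DFAs are equivalent.

    Each DFA is encoded (for this sketch) as (start_state, accepting_set,
    delta), where delta maps (state, symbol) -> state.
    """
    start = (dfa_a[0], dfa_b[0])
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (qa, qb), prefix = queue.popleft()
        # A witness is found when exactly one DFA accepts the prefix.
        if (qa in dfa_a[1]) != (qb in dfa_b[1]):
            return prefix
        for sym in alphabet:
            nxt = (dfa_a[2][(qa, sym)], dfa_b[2][(qb, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt, ) if False else queue.append((nxt, prefix + sym))
    return None
```

For example, comparing a DFA for "even number of a's" against a DFA accepting everything yields the witness "a".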
Norton [17] studied near-linear time algorithms [16, 14] for DFA equivalence and wrote JFLAP-compatible code to prove DFA equivalence and to produce a witness string if two DFAs are not equivalent. Since CFG equivalence is an undecidable problem, Sorrell used CFGAnalyzer [2] to experimentally show that checking all strings up to length k = 10 suffices for typical homework assignments. Sorrell's work included integrating the PicoSAT solver [5] into CFGAnalyzer; this work is the CFGSolver software [23] used by the DAVID extension. To be conservative, our extension checks all strings up to length k = 15 for strings generated by a CFG submission. We announced the setup of our preliminary investigation in [4].

Automata Tutor is an alternative feedback tool for computing theory. Version 2 [12] provides a graphical user interface with roughly the functionality of JFLAP as well as additional features related to checking equivalence for regular languages, including witness feedback and automated grading of DFAs [1, 11]. Presented in May 2020, Version 3 [10] has many added features, including those that we implemented on top of JFLAP.

While the tools are similar, the focus of the Automata Tutor study [10] and the focus of our investigation are very different. Automata Tutor focuses on automation, including partial grading. We focus on educational research, attempting to measure the educational benefits of the DAVID extension, which provides a witness string for incorrect solutions and does not venture into partial credit. In fact, the most interesting finding of our preliminary investigation is that the majority of students persisted in submitting attempts until the DAVID extension reported correct.
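A bounded check for CFGs can be sketched by enumerating each grammar's strings up to length k and comparing the two sets. The grammar encoding below is hypothetical, and the sketch assumes no empty right-hand sides so that sentential forms longer than k can be pruned; this is an illustration of the idea, not CFGSolver's implementation.

```python
from collections import deque

def bounded_language(grammar, start, k):
    """All terminal strings of length <= k derivable from `grammar`.

    `grammar` maps a nonterminal to a list of right-hand sides, each a
    tuple of terminals/nonterminals.  With no empty right-hand sides,
    a sentential form never shrinks, so forms longer than k are pruned.
    """
    language = set()
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal, if any; otherwise the form is a string.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            language.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= k and new not in seen:
                seen.add(new)
                queue.append(new)
    return language

def bounded_witness(g1, s1, g2, s2, k):
    """A shortest string of length <= k in exactly one language, or None."""
    diff = bounded_language(g1, s1, k) ^ bounded_language(g2, s2, k)
    return min(diff, key=len) if diff else None
```

For instance, comparing S → aSb | ab against the near-miss S → aSb | a produces the short witness "a".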
We discuss the implications of the persistence result in Section 4.3, from which we hypothesize (Section 5.1) that witness feedback is the appropriate type of feedback (Section 5.2) and that partial credit is not needed in our setting (Section 5.3).

For this paper, we use the term feedback, which is defined as "information provided by an agent (e.g., teacher, peer, book, parent, experience) regarding aspects of one's performance or understanding" [15]. In particular, we are interested in intermediate feedback (Section 5.2), which is a type of formative feedback [22].

In Summer 2019, we developed the DAVID extension. In Fall 2019, we ran a preliminary investigation on approximately 65 students enrolled in Introduction to Computer Science Theory, and in Spring 2020, we analyzed the investigation results. These results and the implications for our future work are discussed in this paper.

For our Fall 2019 preliminary study with the DAVID extension, two synchronized sections of the Introduction to Computer Science Theory course shared the same instructor, course delivery style, homework assignments, similar midterm exams, and an identical final exam. The majority of students in both sections were majoring in Computer Science; the second-most popular major was Software Engineering.

For both sections, we collected homework and grade data, surveyed the students about their experiences in the course, and asked the Study section additional survey questions about their experiences with the DAVID extension, their perception of the extension, and their thoughts on automated feedback in general. We monitored how the Study students used the DAVID extension. We also interviewed the instructor. At the end of the course, there were 35 students in the Control section and 29 students in the Study section who gave consent for their data to be used in our investigation.
There were eleven homework assignments for the semester, with an average of five questions per homework. Out of these eleven assignments, we compared the performance between the two sections on five targeted homework questions (one each on a DFA, NFA, RegEx, CFG, and PDA), which were identical between the sections. For these five targeted homework questions, the Control section submitted their solutions using traditional means, while the Study section was required to use the DAVID extension. The text of the five targeted homework questions is below. The CFG question, which was especially challenging, had by far the most submissions to the DAVID extension.

1. DFA. Draw the state diagram of a finite automaton that accepts the language of all strings over {a, b} that contain at least 2 b's and do not contain the substring bb. In other words, a string is accepted only if both conditions hold. Your finite automaton should not be overly complicated.

2. NFA. Draw the state diagram of an NFA accepting the language of all strings over {a, b} that either start or end with the substring aba. For full credit, you must use nondeterminism where possible to make your state diagram as simple as possible.

3. RegEx. Give a regular expression for the language of all strings over {0, 1} that have neither the substring 000 nor the substring 111. In other words, the language contains all strings for which neither symbol ever appears more than twice consecutively.

4. CFG. Give a CFG that generates the language of all strings over {0, 1} that have more consecutive 0's at the beginning of the string than consecutive 1's at the end of the string. (For example, the following strings are all in the language: {0, 001, 00001010101010111, 0111111110}. The following strings are all not in the language: {ε, 01, 10, 0011, 0010000000111}.)

5. PDA. Let L = { w ∈ {a, b}* | w has more a's than b's }. Draw the state diagram of a PDA that accepts the language L. Your PDA should not be overly complicated.

For
RQ1 (student use), for both the final exam grade and the overall course grade, the Study section scored lower than the Control section, and both lower scores were statistically significant (Table 2); these differences are not due to chance, which tells us that the two sections had different levels of academic strength. However, on the five targeted homework questions where the Study section used the DAVID extension for feedback, the Study section's average grade was always higher than the Control section's average grade (Figure 2).
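The section-level comparisons in this paper use one-way ANOVA. As a rough illustration of the statistic involved, here is a minimal pure-Python sketch; the grade lists in the test are hypothetical, and in practice one would use a library such as scipy.stats.f_oneway, which also returns the p value from the F distribution.

```python
def one_way_anova_F(*groups):
    """F statistic for a one-way ANOVA across the given groups of scores.

    Illustrative only: the p value would come from the F distribution
    with (k - 1, N - k) degrees of freedom, where k is the number of
    groups and N the total number of scores.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F (relative to the F distribution's critical value) indicates that the between-section variation is unlikely to be due to chance.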
Figure 2. Homework grade average and standard deviation for the five targeted homework questions. "C" is the Control section; "S" is the Study section. The y-axis limits are the same among the subplots.

The Study section strongly outperformed the Control section with respect to the percent of perfect homework grades for the targeted homework questions (Figure 3). (The Study section NFA percentage is low compared to the other questions because the NFA submissions through the DAVID extension are not checked for "sufficient" nondeterminism, which means that submissions counted as correct by the extension did not necessarily earn a perfect grade.) We saw high engagement with the extension from the Study students: on average, students submitted 9 times per homework question.
Figure 3. The percentage of students who earned a perfect grade from an experienced professor on the five targeted questions. "C" is the Control section; "S" is the Study section. The y-axis limits are the same among the subplots.

We used one-way ANOVA to compare the differences between the Control section and the Study section. Table 1 summarizes the statistical results. On three of the five targeted homework questions, the Study section's higher score was statistically significant; for the other two targeted homework questions, there was no statistically significant difference between the sections. Because the Study section was academically less proficient than the Control section, it is particularly noteworthy that the Study section scored significantly higher than the Control section on the DFA, RegEx, and PDA questions.

                  p value   Significance
DFA               0.001     Study scored higher than Control
NFA               –         no statistical difference
RegEx             0.030     Study scored higher than Control
CFG               –         no statistical difference
PDA               0.000     Study scored higher than Control
Final exam grade  0.014     Study scored lower than Control
Course grade      0.011     Study scored lower than Control

Table 1.
Statistical results. A dash indicates no statistically significant p value. All analysis used the same significance threshold.

To determine the relative academic strength of the Control vs. Study sections, we used the final exam grade, the overall course grade, and the instructor's perception. The final exam was identical for both sections, and the Control section's final exam mean (over 35 students) was higher than the Study section's (over 29 students). This difference was statistically significant (p = 0.014) and therefore is not due to chance. For the overall course grade (Table 2), the Control section performed better than the Study section, with respective means of 84.9 ± 9.9 and 78.0 ± 11.0, which was statistically significant (p = 0.011).

                            Quartiles
          n    µ     σ    min   25%   50%   75%   max
Control   35  84.9   9.9  54.3  80.0  86.3  92.9  96.7
Study     29  78.0  11.0  39.0  76.0  78.3  86.4  91.5

Table 2.
Overall course grades for both sections. The number of students is n, the mean grade is µ, and the standard deviation of the grades is σ.

Additionally, the course instructor said, "I do think it's probably correct that just kind of as an average performance, the Control section was a little sharper [than the Study section]." This evidence from the final exam and the overall course grade, as well as the instructor's perception, supports our claim that the Study section was academically less successful than the Control section. Therefore, the statistically significant higher scores by the Study section on three of the five targeted homework questions are even more striking, since they outperformed the academically stronger Control section.

For
RQ2 (student thoughts), we surveyed students in both sections twice about their experiences in the course. On each survey, the Study section responded positively about the DAVID extension. On both surveys, we asked the Study section to rate the following two statements for each model:

Q1 The DAVID extension helped me solve the {DFA, NFA, RegEx, CFG, PDA} questions.
Q2 The DAVID extension helped me understand the {DFA, NFA, RegEx, CFG, PDA} questions.

From the survey responses, the Study students did feel helped by the DAVID extension (Table 3). For survey Q1 (DAVID extension helped the student solve), on the DFA, RegEx, and PDA questions where the Study section did better than the Control section, the majority of Study students strongly agreed or agreed that the extension helped them to solve that problem (54.9%, 60.7%, and 42.9%, respectively). For survey Q2 (DAVID extension helped the student understand), the majority of Study students strongly agreed or agreed that the extension helped them understand the concepts (41.9% for DFA and 46.4% for RegEx). For PDAs, the percentage of students who responded neutral (42.9%) was nearly equal to those who responded strongly agree or agree (39.3%); however, the percent of students who strongly agreed or agreed was more than double those who disagreed or strongly disagreed (17.8%).

       Q1: Solve             Q2: Understand
       SA+A   N   D+SD       SA+A   N   D+SD
Table 3.
Study section survey responses. Bold percentages indicate the homework questions where the Study section performed statistically better than the Control section.

On the second survey, we also asked the Study students about resubmission and partial credit. The survey questions are below, and the Study students' responses are in Table 4.

Q3 The DAVID extension should allow users to resubmit until correct.
Q4 Assignments submitted via the DAVID extension should be graded to allow partial credit.
Q5 Assignments submitted via the DAVID extension should be graded as correct/incorrect (i.e., without partial credit).

Students overwhelmingly agreed that the DAVID extension should allow users to resubmit until correct, and they also agreed that assignments should be graded to allow partial credit, but they disagreed that assignments should be graded as strictly correct or incorrect. In Section 4.3, we discuss the implications of these results for student learning and for the optimal design of homework feedback.

Table 4. Survey response percentages from 28 students in the Study section about resubmission and partial credit. Bold shows the highest percentage response for each question.
For
RQ3 (instructor benefit), we saw that with the immediate feedback from the DAVID extension, more students in the Study section did eventually solve the homework problems correctly (Figure 3), which benefits an instructor because grading correct submissions is faster and easier than grading incorrect submissions. We have named this benefit of our extension grading triage.

Recall that the focus of our work is developing a homework feedback server for students; our work is not about automatic grading. In Figure 4, we see the percentage of students who continued to submit attempts until the extension reported "Correct." We call this behavioral phenomenon persistence.

Figure 4.
Persistence of the Study section until the DAVID extension reported "Correct." In this figure, we omit the occasional student who did not get meaningful feedback because of syntax errors; for example, mismatched parentheses in regular expressions did not display an error message to the student.
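Persistence as plotted in Figure 4 is simply the fraction of students whose submission history for a question ends with a correct verdict. A sketch over a hypothetical telemetry format (not the DAVID server's actual log schema):

```python
def persistence_rate(submission_logs):
    """Percentage of students who kept submitting until a correct attempt.

    `submission_logs` maps a student id to the chronological list of
    verdicts ("correct"/"incorrect") for one question.  The log format
    is an assumption made for this illustration.
    """
    persisted = sum(1 for log in submission_logs.values()
                    if log and log[-1] == "correct")
    return 100.0 * persisted / len(submission_logs)
```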
Figure 5. One student's 26 unique submissions for a DFA for the homework question: "Draw a DFA that accepts the language of all strings over {a, b} that contain at least 2 b's and do not contain the substring bb." The thin gray lines are fewer similar submissions, and the gray lines are many similar submissions. The thick, dark gray line is the correct final submission.

For our exploratory and preliminary Fall 2019 investigation, our research questions were deliberately broad so that we could find interesting directions for our future work. Our most interesting result was student persistence: with only the short witness string as feedback, students persisted in submitting attempts until the DAVID extension reported "Correct" (Figure 4). Our preliminary investigation led us to these hypotheses for our full investigation.

H1 In our setting, witness feedback is the appropriate type of feedback.
H2 In our setting, partial credit is not needed.
Witness feedback is the minimal reasonable feedback. The educational literature calls this minimal intervention, which has been found to promote better learning than more detailed feedback [24]. Minimal reasonable feedback convincingly shows that the submission is wrong, but the feedback does not give any hints for fixing the submission. Creighton et al. say, "If feedback attempts to provide too much guidance, there is nothing left for the student to do or learn" [9]. Similar ideas are also in [13, 18].

Witness feedback, which is very minimal, gives the student a reason why a submission is wrong, but it does not tell the student how to correct the mistakes. Thus, witness feedback will not lead a student to the correct solution; instead, the student must think independently. In fact, this sort of minimal witness feedback mimics the feedback that an instructor or a tutor would give to a student seeking help: a short witness string showing where the student's attempt is incorrect. Feedback of a single, short witness string requires the student to actively learn in order to solve the question.

One problem with allowing students to submit as many times as they like is that students may try to random-walk to the correct solution. Because the witness does not give information about how to change the submission, the potential of randomly converging on the correct solution is not an issue. For example, Figure 5 shows a student who clearly has the right idea and is refining the solution based on the minimal feedback of the DAVID extension.

However, when giving more detailed feedback, encouraging "random walks to a solution" can be an issue. In addition, there is a real risk of "over-helping" and leading the student to the solution step by step in such a way that the student contributes very little (even though the student may not realize this).
Finally, more detailed feedback may encourage students to make local fixes that create more and more bloated submissions.

As a related but distinct point, giving more detailed feedback is hard. For example, if there is more than one way to approach a problem, feedback can easily steer students in a direction that does not correlate with the student's approach. For a simple, standard example, there are two ways to approach designing a CFG for the language { a^i b^k | i ≠ k }. The first one is to cross off a's and b's until you are left with just a's or just b's. This corresponds to a CFG with the following rules.

S → aSb | A | B
A → aA | a
B → bB | b

The second one is to view this language as the union of two cases: "i > k" and "i < k." This corresponds to a CFG with the following rules.

S → A | B
A → aAb | aA | a
B → aBb | Bb | b

Telling a student who is following the first approach to "think of the language as a union" is not helpful at best. Of course, an instructor could add both approaches to a feedback system, and for regular expressions, Automata Tutor uses this technique of adding multiple "reasonable" approaches to the grading system. In general, it will be hard for an instructor to come up with all reasonable approaches, and it seems impossible to do this with an automated program.
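As an empirical sanity check, the two approaches to { a^i b^k | i ≠ k } can be compared by enumerating each grammar's strings up to a bounded length and checking that the sets coincide. This is a sketch under the assumption of no empty right-hand sides; the grammar encoding is ours, for illustration only.

```python
from collections import deque

def bounded_language(grammar, start, k):
    """All terminal strings of length <= k derivable from `grammar`.

    Assumes no empty right-hand sides, so sentential forms never shrink
    and forms longer than k can be pruned.
    """
    language, seen = set(), {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            language.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= k and new not in seen:
                seen.add(new)
                queue.append(new)
    return language

# The "cross off" grammar and the "union of two cases" grammar.
crossing = {"S": [("a", "S", "b"), ("A",), ("B",)],
            "A": [("a", "A"), ("a",)],
            "B": [("b", "B"), ("b",)]}
union = {"S": [("A",), ("B",)],
         "A": [("a", "A", "b"), ("a", "A"), ("a",)],
         "B": [("a", "B", "b"), ("B", "b"), ("b",)]}
```

Up to any fixed bound, the two grammars generate exactly the same strings, even though their derivation structures differ.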
Our investigation focuses on automated feedback for intermediate student submissions. Of course, grading is a form of feedback as well. Automata Tutor uses the current grade as (part of) the feedback on intermediate student submissions. Automated grading is a very interesting topic in its own right, particularly given the large numbers of CS majors.

For incorrect attempts, using partial grade credit as feedback in addition to the witness suffers from the problems described in Section 5.2. If students are chasing more partial credit, then they may be randomly trying to converge on the wrong thing (more points) rather than the right thing (the correct language). In [7], Cain and Babar paraphrase (Skinner 2014), saying, "Attaching marks to an assessment task means that, from the student's perspective, the task will play a summative role and feedback is not seen as formative." They continue, "Interestingly, it has been reported that students pay more careful attention to feedback when there are no associated marks [6] or put another way 'marks' reduced student attention to formative feedback." In other words, it is difficult to design a good scoring system that really drives toward the correct solution.

Automated partial credit has other problems and drawbacks. One obvious problem is that different instructors may want to assign different amounts of partial credit; although in practice, instructors would probably accept "reasonable" partial credit (for example, no grade "inversions," meaning that better solutions should not get less credit).

However, it is hard if not impossible to automatically assign reasonable partial credit. For example, in Automata Tutor, the fraction of points assigned for a CFG is computed as an estimate of |A ∩ B| / |A ∪ B|, where A is the language generated by the submitted CFG and B is the correct language.
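This language-similarity estimate can be sketched over bounded-length strings. The grammar encoding below is hypothetical and this is not Automata Tutor's implementation; it merely illustrates how a Jaccard-style score behaves on the kind of near-miss grammars discussed next.

```python
from collections import deque

def bounded_language(grammar, start, k):
    # All terminal strings of length <= k; assumes no empty right-hand
    # sides, so sentential forms longer than k can be pruned.
    out, seen = set(), {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            out.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= k and new not in seen:
                seen.add(new)
                queue.append(new)
    return out

def jaccard_score(submitted, correct, start, k):
    """Estimate |A ∩ B| / |A ∪ B| on all strings of length <= k."""
    a = bounded_language(submitted, start, k)
    b = bounded_language(correct, start, k)
    return len(a & b) / len(a | b) if a | b else 1.0
```

For instance, a submission whose language is disjoint from the correct language scores 0.0 under this metric no matter how "close" the grammar looks.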
If we consider a simple language like { a^{2i} b^i | i ≥ 1 } with the standard solution

S → aaSb | aab,

then the four solutions in Table 5, which are all fairly close, get no credit at all! This is not meant to be negative about Automata Tutor; any language-based partial credit metric will have similar problems.

CFG               language
S → aSbb | abb    { a^i b^{2i} | i ≥ 1 }
S → bbSa | bba    { b^{2i} a^i | i ≥ 1 }
S → aSb | ab      { a^i b^i | i ≥ 1 }
S → aaSb | ab     { a^{2i+1} b^{i+1} | i ≥ 0 }

Table 5.
Solutions that are close to the grammar S → aaSb | aab, but whose languages do not intersect with that language at all.

Indeed, Automata Tutor [10] for RegExs looks at the distance from a few "sensible" RegExs supplied by the instructor (if the submission is correct, the student always gets full credit), stating that "... This is preferable to comparing the languages, because a small careless mistake in the RE [RegEx] can have a large impact on the language." We agree (and of course the same argument holds for CFGs as well), but it may be hard for an instructor to list all "sensible" RegExs, particularly for complicated RegExs. For a simple example, if a student writes

a* + b* + (a + b)*

instead of

(a + b)+

(where * is the Kleene star, the infix operator + is the union operator, and the raised + is the Kleene plus), then the student will lose a lot of points, even though the submission's language differs from the correct language only by the empty string.

On a final note, students and instructors may feel that not giving partial credit is overly harsh. Indeed, when we asked our students on the survey (Table 4), they were overwhelmingly in favor of the DAVID extension giving partial credit. However, there are other ways to give students partial credit. For example, we can give five CFG questions and ask the students to submit four, which is a stress-decreasing approach that works well in many situations, including exams. With our approach of minimal reasonable feedback, not only will the students have an unlimited number of retries with immediate witness feedback, but also they can seek help from the instructor or tutors.

The DAVID extension is successfully providing feedback for DFAs, NFAs, RegExs, CFGs, and PDAs. We have promising initial results: the Study section performed better on the five targeted homework questions than the Control section (Figure 2). The Study students persisted (Figure 4), from which we conjecture that witness feedback is the right feedback.

Our future work will continue the educational focus of
RQ1 (student use), RQ2 (student thoughts), and RQ3 (instructor benefit). Since our Fall 2019 investigation was preliminary and our extension targeted only five of the homework questions in the course, we did not see (and did not expect to see) knowledge transfer as measured by the students' performance on related but unfamiliar questions on exams. In our full investigation, where students will use our extension on more homework questions, we expect to see knowledge transfer.

We are most excited about our unexpected result of student persistence. We believe that as students persist in solving problems via our extension (RQ1), students will learn the material (RQ2), which benefits not only the students but also their instructors (RQ3). From our finding of student persistence, we will examine our additional hypotheses about witness feedback and partial credit as discussed in Section 5.1:

H1 In our setting, witness feedback is the appropriate type of feedback.
H2 In our setting, partial credit is not needed.

As we prepare for our full investigation, we look forward to studying our preliminary conclusion that minimal witness feedback is both necessary and sufficient for students to learn effectively.
Acknowledgments
We thank the SIGCSE 2021 anonymous referees for helpful comments. We thank Aaron Deever and his students for participating. We thank our advisory board: Douglas Baldwin, Joan Lucas, and Susan Rodger. Research supported in part by NSF grant DUE-1819546.
References

[1] Rajeev Alur, Loris D'Antoni, Sumit Gulwani, Dileep Kini, and Mahesh Viswanathan. 2013. Automated Grading of DFA Constructions. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (Beijing, China) (IJCAI '13). AAAI Press, 1976–1982.
[2] Roland Axelsson, Keijo Heljanko, and Martin Lange. 2008. Analyzing Context-Free Grammars Using an Incremental SAT Solver. In Automata, Languages, and Programming, Luca Aceto, Ivan Damgård, Leslie Ann Goldberg, Magnús M. Halldórsson, Anna Ingólfsdóttir, and Igor Walukiewicz (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 410–422. https://link.springer.com/chapter/10.1007%2F978-3-540-70583-3_34
[3] Ivona Bezáková and Edith Hemaspaandra. [n.d.]. Automated Feedback in Undergraduate Computing Theory Courses (Award Abstract DUE-1819546).
[4] Ivona Bezáková, Edith Hemaspaandra, Aryeh Lieberman, Hannah Miller, and David E. Narváez. 2020. Prototype of an Automated Feedback Tool for Intro CS Theory. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education. Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3328778.3372598
[5] Armin Biere. 2008. PicoSAT Essentials. J. Satisf. Boolean Model. Comput.
[6] Assessment in Education.
[7] Cain and Babar. 2016. In Proceedings of the 38th International Conference on Software Engineering Companion (Austin, Texas) (ICSE '16). Association for Computing Machinery, New York, NY, USA, 336–345. https://doi.org/10.1145/2889160.2889185
[8] Pinaki Chakraborty, P. C. Saxena, and C. P. Katti. 2011. Fifty Years of Automata Simulation: A Review. ACM Inroads 2, 4 (Dec. 2011), 59–70. https://doi.org/10.1145/2038876.2038893
[9] Susan Janssen Creighton, Cheryl Rose Tobey, Eric E. Karnowski, and Emily Roche Fagan. 2015. Providing and using formative feedback. In Bringing math students into the formative assessment equation. Corwin, Chapter 4, 109–152.
[10] Loris D'Antoni, Martin Helfrich, Jan Kretinsky, Emanuel Ramneantu, and Maximilian Weininger. 2020. Automata Tutor v3. CoRR abs/2005.01419 (2020). arXiv:2005.01419 https://arxiv.org/abs/2005.01419
[11] Loris D'Antoni, Dileep Kini, Rajeev Alur, Sumit Gulwani, Mahesh Viswanathan, and Björn Hartmann. 2015. How Can Automatic Feedback Help Students Construct Automata? ACM Trans. Comput.-Hum. Interact. 22, 2, Article 9 (March 2015), 24 pages. https://doi.org/10.1145/2723163
[12] Loris D'Antoni and Alexander Weinert. [n.d.]. Automata Tutor v2. http://automatatutor.com/index
[13] Jeanne D. Day and Luis A. Cordón. 1993. Static and dynamic measures of ability: An experimental comparison. Journal of Educational Psychology 85, 1 (Mar 1993), 75–82.
[14] David Gries. 1972. Describing an algorithm by Hopcroft. Technical Report TR 72-151. Cornell University.
[15] John Hattie and Helen Timperley. 2007. The Power of Feedback. Review of Educational Research 77, 1 (2007), 81–112. https://doi.org/10.3102/003465430298487
[16] John Hopcroft. 1971. An n log n algorithm for minimizing states in a finite automaton. Technical Report STAN-CS-71-190. Stanford University.
[17] Daphne A. Norton. 2009. Algorithms for Testing Equivalence of Finite Automata, with a Grading Tool for JFLAP. Master's thesis. Rochester Institute of Technology. https://scholarworks.rit.edu/theses/6939/
[18] Katrin Rakoczy, Petra Pinger, Jan Hochweber, Eckhard Klieme, Birgit Schütze, and Michael Besser. 2019. Formative assessment in mathematics: Mediated by feedback's perceived usefulness and students' self-efficacy. Learning and Instruction 60 (2019), 154–165. https://doi.org/10.1016/j.learninstruc.2018.01.004
[19] Susan H. Rodger. 1996. JFLAP: The Java Formal Languages and Automata Package.
[20] Susan H. Rodger and Thomas W. Finley. 2006. JFLAP: An Interactive Formal Languages and Automata Package. Jones and Bartlett Publishers, Inc., USA.
[21] Susan H. Rodger, Eric Wiebe, Kyung Min Lee, Chris Morgan, Kareem Omar, and Jonathan Su. 2009. Increasing Engagement in Automata Theory with JFLAP. In Proceedings of the 40th ACM Technical Symposium on Computer Science Education (Chattanooga, TN, USA) (SIGCSE '09). Association for Computing Machinery, New York, NY, USA, 403–407. https://doi.org/10.1145/1508865.1509011
[22] D. Royce Sadler. 1989. Formative assessment and the design of instructional systems. Instructional Science 18 (1989), 119–144. Issue 2. https://doi.org/10.1007/BF00117714
[23] Jessica Sorrell. 2015. CFGSolver. https://github.com/hatgirl/CFGSolver
[24] Dylan Wiliam. 1999. Formative Assessment in Mathematics Part 2: Feedback.