Gradeer: An Open-Source Modular Hybrid Grader

Benjamin S. Clegg∗, Maria-Cruz Villa-Uriol∗, Phil McMinn∗ and Gordon Fraser†
∗University of Sheffield, †University of Passau
Abstract—Automated assessment has been shown to greatly simplify the process of assessing students' programs. However, manual assessment still offers benefits to both students and tutors. We introduce Gradeer, a hybrid assessment tool, which allows tutors to leverage the advantages of both automated and manual assessment. The tool features a modular design, allowing new grading functionality to be added. Gradeer directly assists manual grading, by automatically loading code inspectors, running students' programs, and allowing grading to be stopped and resumed in place at a later time. We used Gradeer to assess an end-of-year assignment for an introductory Java programming course, and found that its hybrid approach offers several benefits.
I. INTRODUCTION
The demand for Computer Science and Software Engineering education has continued to increase over recent years, with educational institutions seeing larger cohorts of students enrolled in such courses [1]. As technology further advances, future generations of students will drive this demand further, with universities and schools facing several challenges in teaching a growing number of students. One of these challenges is the assessment of a large number of students' solutions to programming tasks. Assessment is particularly important, since it both has the ability to further students' development through the provision of detailed feedback, and serves to measure a student's understanding of a topic.

Automated grading and feedback techniques offer several benefits in assessing large numbers of students. Their automated nature allows users to perform other tasks while grading is executed. It is also often much quicker to run a series of automated processes than to manually assess individual students' solution programs. This is especially important for courses with large numbers of students, where manual assessment would consume too much time, and manual feedback could be provided too late to be of relevance to students' learning. In addition, automated feedback allows for a large amount of feedback to be generated, and providing more pieces of automated feedback has been shown to improve students' performance [2]. Automated grading is also more consistent than manual grading, especially if students' solutions are assessed manually by multiple people [3], which would likely be necessary to improve assessment times.

There are, however, some issues with the use of automated assessment alone. There is a significant initial time cost of using automated assessment, with the need to either develop or configure a tool before assessment can be performed. Additionally, with the exception of test-based systems, tutors may find it difficult to adapt an automated assessment system to meet their requirements [4]. Similarly, there is a wide range of unique automated assessment approaches [5]–[10], some of which may be suited to certain tasks, but which would require a significant degree of effort to combine into one grading tool. Automated assessment also lacks some of the benefits of manual approaches. Manual assessment has the ability to capture aspects of grading that are hard to automate, such as the usefulness of variable names, or the appearance of a GUI. There is also evidence that manually provided feedback is of greater benefit to students' performance than automatically generated feedback [11].

In this paper, we introduce Gradeer, a hybrid modular grading system, with the goal of providing the benefits of both approaches while mitigating their challenges (Section II). We used Gradeer to assess an end-of-year assignment for an introductory programming course (Section III). We found that the tool's hybrid approach allowed for the use of a large number of consistent automated assessment criteria, and aided in the provision of detailed manual feedback to students. Gradeer also provides a degree of automation to assist tutors in manual assessment, such as automatically launching students' programs and code inspectors. We found that these features saved us a considerable amount of time when manually assessing students' solutions. The modular nature of grading components allows a variety of automated grading techniques to be used in conjunction with one another, while minimising the effort required to combine their results. Gradeer is available on GitHub under the GPLv3 license, which allows users to write their own extensions and integrations for the tool [12].

II. THE Gradeer GRADING TOOL
Gradeer is an assessment tool which provides tutors with the benefits of both automated and manual assessment in a single package. The tool achieves this using a modular design, allowing a user to choose how to assess a programming task using simple configuration files, or even to define their own modules for specific purposes. To allow for manual assessment, Gradeer is designed to be used by tutors on personal computers, where the user can interact with the program via a CLI. It is, however, possible for Gradeer to be integrated with a GUI or web interface. Gradeer is implemented in Java, and allows for the assessment of Java programs. Wider language support is planned for future versions of the tool. This section describes our design of Gradeer, alongside some of its benefits.
A. Checks
We designed Gradeer with a focus on modular grading components, called checks, each of which represents a single grading criterion. Different types of checks are currently implemented, defining how a criterion's base score (a decimal value between zero and one) can be determined for a given process and student's solution. Various checks of different types can be used together in a single run of Gradeer, constructing a mark scheme to assess several learning outcomes. For example, users can configure Gradeer to use multiple checks to run various test suites, perform static analysis, and manually assess several aspects of a solution. Users configure their checks in JSON files. Users can also implement new checks to add the functionality of unique and domain-specific grading tools.
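As an illustration of such a bespoke check, consider the following minimal sketch. Note that the Check supertype, the Solution type, and the method names shown are placeholder assumptions for the purpose of illustration, not necessarily Gradeer's actual extension API.

    // Illustrative sketch of a user-defined check. Check, Solution, and
    // getSourceFiles() are assumed names, not necessarily Gradeer's real API.
    public class TodoCommentCheck extends Check {
        // Awards a full base score only if no source file still contains
        // a leftover TODO comment.
        @Override
        public double calculateBaseScore(Solution solution) {
            boolean hasTodo = solution.getSourceFiles().stream()
                    .anyMatch(file -> file.readContents().contains("TODO"));
            return hasTodo ? 0.0 : 1.0;
        }
    }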
One currently implemented type of check is the TestSuiteCheck, which executes a given JUnit test class on a student's solution via Apache Ant [13], then calculates a score as the proportion of tests that pass. Tutors can assess individual learning outcomes by grouping tests that evaluate the same outcome into one class.

We also implemented check types for two static analysis tools, Checkstyle and PMD [14], [15], in order to automatically assess the code quality of students' solutions. Such checks search the output of their respective tool for a user-defined rule violation. The number of violations in each source file of a solution is recorded and used to compute a base score. Users can also define a minimum and maximum number of violations, which yield base scores of one and zero, respectively.
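Concretely, this mapping can be sketched as follows; the linear interpolation between the two user-defined bounds is our reading of the behaviour described above, rather than Gradeer's documented implementation.

    // Sketch: convert a violation count into a base score, given the
    // user-defined bounds. Counts at or below minViolations yield 1.0,
    // counts at or above maxViolations yield 0.0; the linear scaling
    // in between is an assumption.
    public static double baseScore(int violations, int minViolations, int maxViolations) {
        if (violations <= minViolations)
            return 1.0;
        if (violations >= maxViolations)
            return 0.0;
        return 1.0 - (double) (violations - minViolations)
                   / (maxViolations - minViolations);
    }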
To support manual assessment, we have implemented a ManualCheck type, which displays a user-defined prompt and score limit to the user when executed. This check then parses numeric input from the user and normalises it to a score in the range of zero to one. For example, the following response would produce a base score of 0.6:

    How informative are the variable names?
    (0 = very poor, 10 = excellent)
    > 6
Each check has an associated weight: a score multiplication factor that allows a check to have a greater or smaller impact on each solution's overall grade, as discussed in Section II-B4. This weight can be defined by the user. In addition, each check has associated feedback to provide to a student for their solution. For most checks, this feedback is determined by mapping a base score to one of several feedback values that have been pre-defined by the user. For example, the above manual check may provide students with feedback for the base score bs:
• 0.8 ≤ bs ≤ 1.0: "Your variable names are informative."
• 0.5 ≤ bs < 0.8: "Some of your variable names could be more informative."
• 0.0 ≤ bs < 0.5: "Most of your variable names could be more informative."
Manual checks can also read text input from the user, allowing for additional feedback to be provided on an individual basis. For example, a tutor may enter "a is not an informative variable name; leftMotor would be better."
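This mapping amounts to a lookup over user-defined score bands. A minimal sketch follows, using the illustrative thresholds above; the FeedbackBands class and its methods are hypothetical, not part of Gradeer's API.

    import java.util.TreeMap;

    // Sketch: map a base score to one of several pre-defined feedback
    // messages. The band thresholds mirror the example above.
    public class FeedbackBands {
        // Maps the lower bound of each band to its feedback message.
        private final TreeMap<Double, String> bands = new TreeMap<>();

        public FeedbackBands() {
            bands.put(0.0, "Most of your variable names could be more informative.");
            bands.put(0.5, "Some of your variable names could be more informative.");
            bands.put(0.8, "Your variable names are informative.");
        }

        public String feedbackFor(double baseScore) {
            // floorEntry selects the band whose lower bound is closest
            // below (or equal to) the base score.
            return bands.floorEntry(baseScore).getValue();
        }
    }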
B. Execution

Figure 1 shows an overview of Gradeer's execution process.
1) Compilation & Check Loading: First, Gradeer compiles every student's solution and every model solution (Section II-B2). At this stage, any solutions which do not compile are flagged as such. These solutions are reported to the tutor for review, and are excluded from further execution. Next, Gradeer loads every check defined in the JSON files. The tool also compiles the test classes that are provided by the user. If enabled, Gradeer automatically generates a test suite check for each test class which does not have a matching check already defined by the user.
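The compile-and-flag step can be approximated with the standard javax.tools API, as in the following sketch; this is an illustrative stand-in, not Gradeer's actual implementation (which, for tests at least, drives execution through Apache Ant).

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    // Sketch: compile a solution's source files, returning false so the
    // solution can be flagged and excluded if compilation fails.
    public static boolean compiles(String... sourcePaths) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // run(...) returns 0 on success; non-zero indicates a compile error.
        return compiler.run(null, null, null, sourcePaths) == 0;
    }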
2) Model Solution Execution: The user can supply a set of one or more model solutions: entirely correct solutions to the programming task being assessed. Users can choose to use multiple model solutions to define different correct implementations of the programming task. In order to identify and remove invalid checks, Gradeer executes every check on each provided model solution. Checks which attain a base score of less than one on any of the model solutions are considered to be invalid, and are removed; they falsely claim that a model solution is partly or completely incorrect. This prevents invalid checks from being used in the assessment of students' solutions, preventing students from unfairly losing or gaining grades, or being given inaccurate feedback. For example, uncompilable test classes will not pass on any solutions, so their checks are removed. The names of invalid checks are stored in a file for the tutor to review and correct.
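This validation rule amounts to a simple filter over the configured checks, sketched below with Check and Solution again as placeholder types rather than Gradeer's real API.

    import java.util.List;
    import java.util.stream.Collectors;

    // Sketch: keep only checks that award a full base score on every
    // model solution; all others are invalid and must be reviewed.
    public static List<Check> validChecks(List<Check> checks,
                                          List<Solution> modelSolutions) {
        return checks.stream()
                .filter(c -> modelSolutions.stream()
                        .allMatch(m -> c.calculateBaseScore(m) >= 1.0))
                .collect(Collectors.toList());
    }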
3) Solution Grading (for each Student's Solution):

a) Pre-checks: In order for some checks to function properly, a series of pre-checks are executed on each solution. For example, checks for Checkstyle and PMD require pre-checks which execute their corresponding static analysis tool on the solution under test and store its output in memory.

b) Solution Inspection: To support effective manual grading, Gradeer includes a solution inspector which can perform two processes, as configured by the user. The first executes a student's solution in a separate thread before running any manual checks. This allows the user to interact with the solution, and to observe its user interface, which may be relevant to the rubric of manual checks. The solution execution thread is closed following the completion of every manual check on a given solution. The second opens each of the solution's source files in an external user-defined text editor, such as Atom. This allows the user to perform manual code inspection, for example to determine the quality of variable names or comments. The solution inspector removes the need for the user to manually run a student's solution to interact with it, or open its source files to inspect it, saving time.
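Both inspector processes can be sketched with Java's standard process and thread APIs; the entry-point class name and the editor command below are assumptions for illustration.

    import java.io.File;
    import java.io.IOException;

    // Sketch of the solution inspector: run the student's program on a
    // background thread, and open its sources in an external editor.
    // The Main class name and the "atom" command are assumed placeholders.
    public static void inspect(File solutionDir) throws IOException {
        // 1. Launch the solution so the tutor can interact with it.
        new Thread(() -> {
            try {
                new ProcessBuilder("java", "-cp", solutionDir.getPath(), "Main")
                        .inheritIO().start().waitFor();
            } catch (IOException | InterruptedException ignored) {
            }
        }).start();

        // 2. Open each source file in the configured text editor.
        File[] sources = solutionDir.listFiles((dir, name) -> name.endsWith(".java"));
        if (sources != null)
            for (File src : sources)
                new ProcessBuilder("atom", src.getPath()).start();
    }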
c) Checks: The final step of a solution's grading process is to run every check on it. In order to improve execution time, Gradeer runs automated checks in parallel by default. Manual checks are only executed in the main thread, however, as they require user input, and running them in parallel could otherwise result in race conditions. In order to prevent some JUnit checks from taking too long to execute, Gradeer has a user-configurable global test timeout; any tests that take longer than this time are treated as failing. This is particularly important, since some students' solutions may contain bugs that prevent them from halting, such as incorrect loop conditions.
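The combination of parallel execution and a global timeout maps naturally onto java.util.concurrent, as in the sketch below; Check, Solution, and recordScore are placeholder names, and treating a timed-out check as failing follows the behaviour described above.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Sketch: run automated checks in parallel, treating any check that
    // exceeds the global timeout as failing (base score zero).
    public static void runChecks(List<Check> checks, Solution solution,
                                 long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Double>> results = new ArrayList<>();
        for (Check check : checks)
            results.add(pool.submit(() -> check.calculateBaseScore(solution)));
        for (int i = 0; i < checks.size(); i++) {
            try {
                checks.get(i).recordScore(solution,
                        results.get(i).get(timeoutSeconds, TimeUnit.SECONDS));
            } catch (TimeoutException | ExecutionException | InterruptedException e) {
                results.get(i).cancel(true);
                checks.get(i).recordScore(solution, 0.0); // treat as failing
            }
        }
        pool.shutdown();
    }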
Gradeer has a userconfigurable global test timeout, where any tests that takelonger than this time are treated as failing. This is particularly odelSolutionsUnit TestsCheckConfigsStudents'Solutions CompilerCompilerCompiler PreprocessorsPre-checksPreprocessorsPreprocessorsPreprocessorsCheckGenerators CheckExecutor ValidChecks SolutionInspectorStateRestoration StoredCheckResults Grade &FeedbackCSVsGradeCalculator& OutputWriterChecks CheckExecutorCheckResultsPreprocessorsPre-checksStudents'SolutionsModelSolutionsCompilation & Check Loading Model Solution Execution Solution Grading Output
Fig. 1: Overview of
Gradeer’s flow of execution. The dotted areas indicate different phases of the execution. Waved boxes arefiles, parallelograms are internal data, and regular boxes are processes.important, since some students’ solutions may contain bugs thatprevent them from halting, such as incorrect loop conditions.
4) Output: After executing every check on every solution, Gradeer stores the appropriate grades and feedback for each solution in various CSV files. The final grade of each solution is stored in one CSV file. This grade is calculated by:
\[
\text{Grade}(s) = \frac{\sum_{c \in C} w(c) \cdot \text{BaseScore}(c, s)}{\sum_{c \in C} w(c)}
\]

where s is the student's solution, C is the set of enabled checks, and w(c) is the weight of check c.
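As a hypothetical worked example, consider a solution assessed by a test suite check with weight 3 and base score 0.5, and a manual check with weight 1 and base score 0.9:

\[
\text{Grade}(s) = \frac{3 \cdot 0.5 + 1 \cdot 0.9}{3 + 1} = \frac{2.4}{4} = 0.6
\]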
Similarly, the combined feedback of each solution across all checks is also stored in a CSV file. Gradeer also stores a CSV file with the individual base scores and feedback of every check for each solution. This file also includes the weight of each check. This allows for final changes to be made in spreadsheet software if absolutely necessary. For example, the user can post-process the students' grades by adjusting the checks' weights, and recalculating the final grades in the same manner as Gradeer. Users can also gather valuable information on students' performance for the grading criteria, facilitating the provision of group feedback to the entire student cohort.
C. State Restoration
Following the completion of checks on a solution, Gradeer stores the results and feedback for every check in a JSON file. When Gradeer is executed with such files present, it uses them to restore these check results for every applicable solution, and skips the corresponding checks when processing these solutions again. This has numerous advantages:
• A tutor can effectively pause the grading process and come back to it at a later time. This is particularly advantageous when using manual checks, as programming tasks with many students' solutions can take hours to manually assess. State restoration allows this arduous process to be split into more manageable marking sessions.
• Assessment tasks can be allocated to multiple users, such as TAs. Tutors can adjust users' Gradeer configurations to use different solutions or checks. By allocating different manual checks to different users, grading can be completed more quickly without reducing consistency. By merging the users' JSON files and re-running Gradeer, the final grades and feedback can be generated.
• If Gradeer halts unexpectedly, perhaps due to a wider system error, the user's grading progress is not lost.
• Tutors can either directly modify the result files to adjust the results of individual checks, or delete them outright to re-assess a solution. Running Gradeer again will update the final output files (as described in Section II-B4). Tutors can also choose to add new checks after initial executions of the tool to capture additional assessment requirements.

III. CASE STUDY
In this section, we discuss our application of Gradeer in an end-of-year introductory Java programming assignment with 171 students' solutions.
A. The Assignment
The assignment required students to parse a series of structured input files into a provided data structure, then implement a set of methods that query this data. The assignment also required students to plot graphs using this data in a GUI using Java's Swing library. A primary goal of the assignment was to provide students with experience in working on a multi-faceted project with codependent systems, which are more akin to real software than the simpler introductory programs used earlier in the course. As an end-of-year assessment, the assignment had a fairly wide span of learning outcomes. These learning outcomes included the use of polymorphism, bespoke data structures, the choice and use of various Java Collections, text manipulation, GUI programming, algorithm design, and the use of good quality code and programming style.

We first determined the overall assignment specification, then focused on creating a model solution that captured this specification. We then created a set of grading unit tests, ensuring that they were valid and that the model solution passed each of them. Following this, we duplicated the model solution to create a skeleton project, from which we removed the classes and methods that students were to implement.
B. Release
We distributed the skeleton project to students. We also provided the students with a set of input data files that were to be read by their implemented parsers. These data files were a subset of those that we later used when grading the assignment. Around a week after we released the assignment, we also provided students with a set of public tests. We designed these tests to ensure that students' code included the basic functionality of the assignment. This provided students with a degree of feedback as they worked on the assignment, and dissuaded students from submitting solutions which were not compatible with our grading environment, such as those including incorrect class names.
C. Check Configuration
We configured Gradeer to use 45 checks:
• 26 test suite checks (each check executed one unit test),
• six PMD checks,
• six Checkstyle checks, and
• seven manual checks (for GUI functionality and subjective aspects of code review, such as variable names).
By using these checks together, we were able to use Gradeer to assess all of our learning outcomes. The manual checks were important in this regard, since the design of the GUI and some aspects of code quality cannot be fully graded automatically.
D. Assessment
While Gradeer supports the use of all types of checks in a single execution, we split the checks across two separate execution configurations: one for automated checks and one for manual checks. This was necessary since we anticipated that some solutions would be problematic, containing issues that would prevent compilation or execution. As such, running manual checks on some of these solutions would have been a waste of effort if the solutions could not be executed properly. By splitting the checks we were able to first compile the students' solutions and run the automated checks to identify any problematic solutions, and to assess the working solutions. We identified 48 problematic solutions. We repaired these solutions where possible so that they could still be graded with Gradeer, but added a penalty for doing so when post-processing the grades. We repeated the automated grading for these repaired solutions. However, 11 of the solutions could not be repaired due to severe issues. We wrote individual feedback for each of these solutions to explain the nature of these problems. Finally, we re-executed Gradeer with only the manual checks on every working and repaired solution. Table I shows the average amount of time that various aspects of running the assessment with Gradeer took for each applicable solution, alongside the time taken to manage problematic solutions.

TABLE I: Average time to perform each assessment task on each applicable solution.
Assessment Task              Average Time Per Solution
Compilable Solutions
  Compilation                ∼…
  …                          ∼…
Problematic Solutions
  Problem Identification     ∼…
  …                          ∼10 minutes

Once we completed grading the assignments, we performed some post-processing on the results. In particular, we added some more specific feedback and adjusted the weights of some of the checks. Providing the additional feedback revealed the possible benefit of being able to add specific feedback when running Gradeer, leading us to later implement the ability to add user-entered feedback for manual checks. We also provided more detailed and general feedback to the entire student cohort using the distribution of solutions' base scores for individual checks. In addition, we used this check performance data to adjust the checks' weights. For example, we found that the scores of some checks would vary considerably between solutions, such as a PMD check for cyclomatic complexity, for which approximately half of the solutions achieved a low base score. In such cases, we increased the check's weight, as it better differentiated students' solutions. However, we attempted to maintain similar total weights between the broader groups of learning outcomes, such as overall correctness and code quality, to assess students in a well-rounded manner.

E. Benefits of Gradeer
We found that Gradeer's hybrid grading approach provided several benefits when assessing this programming assignment:

1) Fast Manual Assessment: Gradeer provides a particular benefit in allowing for quick manual assessment. This is mostly due to Gradeer's solution inspector, which automatically executes students' solutions, and displays their source files in a text editor. Without this feature, a tutor must manually open the correct directory, enter a command to run the solution, and open the source files, before beginning the manual assessment. By removing the need to follow these steps for every solution, Gradeer removes a significant bottleneck in manual grading.
2) Automated Grading: By using automated grading wherever possible, we were able to reduce the number of manual checks. For example, we used some static analysis checks to evaluate the style of students' solution programs, such as ensuring that they used camel case formatting in variable names. By using these checks, the tutor did not have to look for these issues when performing the manual code inspection. Similarly, the use of unit tests to assess the correctness of some elements of the program removed the need for the tutor to identify faults in these elements manually. An additional benefit of automated grading is that the checks are applied consistently across solutions: any two students' solutions which have the same faults will be assessed in exactly the same way.
3) High Quality Feedback: We found that Gradeer was capable of providing useful feedback to students. While automated checks only provide simple feedback, the large number of these checks gave students a very wide range of feedback; they could gain a good understanding of where they succeeded and where they can improve. This is supported by Falkner et al.'s findings that students' performance improves as more pieces of automated feedback are provided [2]. This feedback is further augmented by Gradeer's support for manual checks, the scores of which we used to determine which of several pieces of feedback to give to a student. The ability to provide manual feedback at runtime in the current version of Gradeer supports this even further.
4) Reusability: In the past, we typically used unique autograding scripts for each assessment. Developing these scripts is a time-consuming process, and may involve the repeated effort of implementing similar functionality across multiple assessments. Conversely, Gradeer can be reused in different assessments, only requiring modifications to simple configuration files.
F. Challenges
When assessing the assignment, we found that uncompilable solutions introduced the greatest time cost. Around 48 of the 171 solutions initially could not be compiled or executed, due to missing files, syntax errors, or modifications to files that should have been left unmodified. It is possible that such problems could be mitigated by preventing students from uploading broken solutions, such as by integrating Gradeer with the solution upload system, and reporting to students if an issue is detected.

Running the automated checks did take a considerable amount of time, partly because the version of Gradeer that we used for this assessment did not support multithreading. After implementing multithreading, we observed a considerably reduced execution time. Finally, although configuring Gradeer requires less effort than writing a unique grading script, some tutors may be dissuaded by not understanding its internal functionality. Providing tests may increase tutors' confidence in such tools.

IV. RELATED WORK
Some existing automated grading tools also feature modular assessment elements [16]. For example, Nexus's assessment components are implemented as Docker micro-services [17], and Web-CAT uses modular plug-ins [18], [19]. JACK and ArTEMiS both use multiple software components that can be split across multiple servers, and interchanged to support different grading functionalities [20], [21]. These tools are designed to be used as scalable web services, which can be beneficial for large courses and MOOCs. Such approaches do have considerable advantages, and may allow tutors to view students' source code, but tutors cannot run and interact with students' solutions directly, which limits their ability to perform manual assessment. By contrast, Gradeer specifically accommodates manual assessment.

It is not uncommon for assessment tools to take a "semi-automatic" approach, with support for user intervention and manual assessment alongside automated processes [22]. Web-CAT allows tutors to manually inspect students' source code, and provide feedback or additional grades [19]. Praktomat grants TAs the ability to provide manual feedback by adding comments to students' code [23]. It also allows TAs to add manual scores for learning outcomes. JACK enables tutors to provide manual corrections for generated grades, and manual feedback [20]. Jackson's grading tool displays the contents of a solution's files before reading the user's input to determine the scores of a series of manual assessment elements [24]. While these tools have provisions for manual assessment, none of them automate the process of launching students' programs for tutors to interact with them. This may be problematic, as the bottleneck of manually running each solution is still present when evaluating user interaction. Gradeer's solution inspector removes this bottleneck entirely. Gradeer also combines the results of automated and manual checks into a single grade, without additional user intervention.

V. CONCLUSIONS AND FUTURE WORK
In this paper we have presented Gradeer, a modular grading tool to support both the automated and manual assessment of students' programs. We have also discussed our experiences in using the tool to assess an end-of-year assignment for an introductory programming course. We find that Gradeer can effectively support tutors in providing quality feedback to students, while maintaining a low time cost of assessment. Gradeer also provides tutors with detailed data on students' performance, which can be used to inform and improve teaching quality, future assessment design, and feedback. Gradeer is available at https://github.com/ben-clegg/gradeer [12].

In our future work, we will extend our evaluation of Gradeer, by comparing the time saved using our solution inspector versus manually running each solution, and by surveying more end users. We plan to improve Gradeer by enhancing its modularity, further separating check modules from the rest of the system, and modularising other components (such as pre-checks and language-specific functionality). We also intend to add web integration to the tool, to inform students when they have submitted solutions with significant problems.
REFERENCES

[2] N. Falkner, R. Vivian, D. Piper, and K. Falkner, "Increasing the effectiveness of automated assessment by increasing marking granularity and feedback units," in SIGCSE 2014 – Proceedings of the 45th ACM Technical Symposium on Computer Science Education, pp. 9–14, 2014.
[3] I. Albluwi, "A Closer Look at the Differences between Graders in Introductory Computer Science Exams," IEEE Transactions on Education, vol. 61, pp. 253–260, Aug. 2018.
[4] H. Keuning, J. Jeuring, and B. Heeren, "Towards a systematic review of automated feedback generation for programming exercises — Extended Version," tech. rep., Utrecht University, 2016.
[5] X. Liu, S. Wang, P. Wang, and D. Wu, "Automatic Grading of Programming Assignments: An Approach Based on Formal Semantics," in Proceedings of the 41st International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), pp. 126–137, 2019.
[6] D. Insa and J. Silva, "Automatic assessment of Java code," Computer Languages, Systems and Structures, vol. 53, pp. 59–72, 2018.
[7] R. Singh, S. Gulwani, and A. Solar-Lezama, "Automated feedback generation for introductory programming assignments," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), vol. 48, pp. 15–26, Jun. 2013.
[8] S. Parihar, Z. Dadachanji, P. K. Singh, R. Das, A. Karkare, and A. Bhattacharya, "Automatic Grading and Feedback using Program Repair for Introductory Programming Courses," in Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE, 2017.
[9] B. C. Wünsche, T. Suselo, W. Van Der Mark, Z. Chen, K. C. Leung, A. Luxton-Reilly, L. Shaw, D. Dimalen, and R. Lobb, "Automatic assessment of OpenGL computer graphics assignments," in Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE, pp. 81–86, 2018.
[10] S. Sridhara, B. Hou, J. Lu, and J. DeNero, "Fuzz Testing Projects in Massive Courses," in Proceedings of the Third (2016) ACM Conference on Learning @ Scale – L@S '16, pp. 361–367, ACM Press, 2016.
[11] A. Leite and S. A. Blanco, "Effects of human vs. automatic feedback on students' understanding of AI concepts and programming style," in Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE '20), pp. 44–50, Association for Computing Machinery, Feb. 2020.
[12] B. S. Clegg, "Gradeer Repository." [Online; accessed 2020-10-18] https://github.com/ben-clegg/gradeer.
[13] The Apache Software Foundation, "Apache Ant." [Online; accessed 2020-10-16] https://ant.apache.org/.
[14] Checkstyle, "Checkstyle." [Online; accessed 2020-10-16] https://checkstyle.sourceforge.io/.
[15] PMD, "PMD." [Online; accessed 2020-10-16] https://pmd.github.io/.
[16] S. Zschaler, S. White, K. Hodgetts, and M. Chapman, "Modularity for Automated Assessment: A Design-Space Exploration," in Software Engineering 18, 2018.
[18] S. H. Edwards and M. A. Pérez-Quiñones, "Web-CAT: Automatically grading programming assignments," in Proceedings of the Conference on Integrating Technology into Computer Science Education, ITiCSE, (New York, New York, USA), p. 328, ACM Press, 2008.
[19] S. H. Edwards, "What is Web-CAT? – Web-CAT." [Online; accessed 2020-10-15] http://web-cat.org/projects/Web-CAT/WhatIsWebCat.html.
[20] M. Goedicke, M. Striewe, and M. Balz, "Computer aided assessments and programming exercises with JACK," tech. rep., 2008.
[21] S. Krusche and A. Seitz, "ArTEMiS – An Automatic Assessment Management System for Interactive Learning," in Proceedings of the 49th ACM Technical Symposium on Computer Science Education – SIGCSE '18, (New York, New York, USA), pp. 284–289, ACM Press, Feb. 2018.
[22] D. M. Souza, K. R. Felizardo, and E. F. Barbosa, "A systematic literature review of assessment tools for programming assignments," Proceedings – 2016 IEEE 29th Conference on Software Engineering Education and Training, CSEE&T 2016, pp. 147–156, Apr. 2016.
[23] J. Breitner, M. Hecker, and G. Snelting, "Der Grader Praktomat," Autom. Bewertung der Program. Digit., 2017.
[24] D. Jackson, "A Semi-Automated Approach to Online Assessment," in Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE, 2000.