Check Your (Students') Proofs-With Holes

Abstract

Cyp (Check Your Proofs) (Durner and Noschinski 2013; Traytel 2019) verifies proofs about Haskell-like programs. We extended Cyp with a pattern matcher for programs and proof terms, and a type checker. This makes it possible to use Cyp for auto-grading exercises where the goal is to complete programs and proofs that are partially given by the instructor, as terms with holes. Since this allows holes in programs, type-checking becomes essential. Before, Cyp assumed that the program was written by a type-correct instructor, and therefore omitted type-checking of proofs. Cyp gracefully handles incomplete student submissions. It accepts holes temporarily, and checks complete subtrees fully. We present basic design decisions, make some remarks on implementation, and include example exercises from a recent course that used Cyp as part of the Leipzig Autotool auto-grading system.


Dennis Renz, Sibylle Schwarz, and Johannes Waldmann

HTWK Leipzig


When teaching programming, it is important to underline that each (sub)program has a specification, and requires a proof of actually meeting it, and that the best way to write a program is to write its proof first.

For instance, here is a programming-by-proving exercise. Assume that Peano numbers are known:

    data N = Z | S N
    doubleN :: N -> N
    doubleN Z = Z
    doubleN (S x) = S (S (doubleN x))

The topic is binary numbers now:

    data B = Zero | Even B | Odd B

Their semantics is given by translation to Peano numbers:

    value :: B -> N
    value Zero = Z
    value (Even x) = doubleN (value x)
    value (Odd x) = S (doubleN (value x))

The goal is to implement the successor function for the binary representation:

    succB :: B -> B
    succB Zero = _
    succB (Even x) = _
    succB (Odd x) = _

and to prove it correct:

    Lemma succ : forall b :: B : value (succB b) .=. S (value b)
    Proof by induction on b :: B
    ...
    QED

The goal here is to derive the program from the specification (lemma succ) by writing the proof (replacing the dots “...”) and filling holes (underscores) in the program to make the proof work.

For instance, for case b = Odd x of the inductive proof, we expect the following chain of reasoning, starting with the instantiated right-hand side of the lemma. We already use Cyp notation for a proof by rewriting:

                         S (value (Odd x))
    (by def value)   .=. S (S (doubleN (value x)))
    (by def doubleN) .=. doubleN (S (value x))
    (by IH)          .=. doubleN (value (succB x))
    (by def value)   .=. value (Even (succB x))    -- P
    (by def succB)   .=. value (succB (Odd x))

Up to point P, all steps are straightforward. The induction hypothesis is denoted by IH. From point P to the final expression (the left-hand side of the lemma), we have the creative step: succB should have equation

    succB (Odd x) = Even (succB x)

to make the proof work. The all too common approach of writing the program first, and the proof later, is characterized as “putting the cart before the horse” by Dijkstra [8]. Development of the proof is the horse that pulls forward the development of the program.

The above example exercise is taken from the “Fortgeschrittene Programmierung” lecture, taught by one of the authors (Schwarz) in Sommersemester 2020. Student solutions for this exercise are automatically checked with our extension [18,11] of the Cyp system [9,10], integrated into Leipzig Autotool [17].

In the present paper, we briefly present our existing auto-grader for programming exercises (Section 2), then summarize Cyp (Section 3). We then explain the design of our extensions: holes (Section 4) and types (Section 5). These extensions were developed in the master’s thesis [18] of one of the authors (Renz). Finally we report on our experience of using Cyp (Section 6). Some implementation detail is discussed in the Appendix.

Autotool [17] is an auto-grader that started out in 1999 as a CGI script written by one of the authors (Waldmann) to assist students executing derivations in Hilbert calculus. Since then it has grown into a system supporting different exercise types. It consists of web front-ends for students and instructors, a semantics back-end that does the actual grading, and a persistence component. Autotool’s primary functionality is:

– The instructor configures and publishes exercises. The students submit solutions. Autotool checks solutions semantically, as opposed to just checking syntactically whether some known good solution was met. This allows for detailed error messages that still do not give away the correct solution.
– Exercises can be solved during a certain time period (set by the instructor). For students to pass the exercise, they have to submit a correct solution at least once in that period. The number of submissions is unlimited, and experimentation is encouraged, as students will practice reading error messages.
– Students can compete on, and check, a (pseudonymized) leaderboard, ordered by size of submission, or some other problem-specific parameter. This motivates the search for better or more elegant solutions.
– For some exercise types, Autotool generates individual problem instances for each student. This does not apply to the programming and proving exercises described here.

We are already using Autotool for auto-grading of Haskell programming exercises [20]. An instance of an exercise is a program text with holes for terms. The student has to fill in the holes. The text contains an expression test :: Bool (without holes, of course). Autotool will check that the submission is type-correct, and that test evaluates to True. The implementation uses the API of the Glasgow Haskell Compiler [19] via the Mueval library [3]. For notating and evaluating tests, we use the Leancheck library [14].

The introductory example can be framed as a program-and-test exercise if we replace the lemma with

    test :: Bool
    test = holds 1000 $ \ (x :: B) -> value (succB x) == S (value x)

Note the syntactic and semantic similarity: “lambda” takes the role of forall. This is exactly the point of property-based testing [6]. For actually running the exercise, we add these declarations {-

We need Eq for N to evaluate the condition in the test, Generic for B to generate test data, and Show for B to print arguments for failed tests.

Type-directed generation of test cases is an impressive showcase for the power of typeclasses in Haskell. We make a point of including the complete source code in the exercise. That way students can run this on their own machine, and can also copy the approach to other code. This would not be possible with a hidden implementation of testing. We abhor hidden extra test cases even more.

In the above code, it is sad that the natural declaration (x :: B) can only be written after enabling language extension PatternSignatures, which is deprecated, and will be replaced by ScopedTypeVariables. This cannot be comprehended. Where is the type variable?
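The property of the program-and-test framing can also be checked without any testing library. The following is a minimal sketch, not the exercise text used in the course. It assumes one possible completion of succB (the equations for Zero and Even are one way to fill the holes, the equation for Odd is the one derived in the introduction), and it uses a hand-written enumeration upTo (a hypothetical helper) in place of Leancheck's type-directed generation of test data.

```haskell
data N = Z | S N deriving (Eq, Show)

doubleN :: N -> N
doubleN Z = Z
doubleN (S x) = S (S (doubleN x))

data B = Zero | Even B | Odd B deriving (Eq, Show)

value :: B -> N
value Zero = Z
value (Even x) = doubleN (value x)
value (Odd x) = S (doubleN (value x))

-- One possible completion of the holes (an assumption of this sketch).
succB :: B -> B
succB Zero = Odd Zero
succB (Even x) = Odd x
succB (Odd x) = Even (succB x)

-- Hypothetical helper: all terms built with at most n uses of Even/Odd.
upTo :: Int -> [B]
upTo 0 = [Zero]
upTo n = Zero : concat [ [Even b, Odd b] | b <- upTo (n - 1) ]

-- The body of the lemma, read as a Boolean property.
propSucc :: B -> Bool
propSucc b = value (succB b) == S (value b)

test :: Bool
test = all propSucc (upTo 8)
```

Such a test only covers finitely many terms. The proof by induction covers all of them, which is the reason for moving from testing to proving.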

For student exercises (and elsewhere), testing is good, and widely used. Proving is better, but rarely done. There seems to be a disconnect between proving (student writes homework proof on blackboard) and “actual programming” (student solves programming exercise that is auto-graded).

Cyp is a proof checker that verifies proofs about Haskell-like programs. It was developed by Dominik Durner and Lars Noschinski at TU Munich [9], and later by Dmitriy Traytel at ETH Zurich [10]. We now present the basics of Cyp by example, and show our extensions in later sections.

Cyp was first mentioned in 2014 in a report [2] on teaching programming with Haskell to a large audience: “We developed tool support for simple proofs by induction, in the form of a lightweight proof assistant and will integrate it into our homework submission system”.

Why not use a full proof checker, like Agda, Coq, Isabelle, PVS? There is no publication on Cyp by the original authors, so we have to guess a little. A full proof checker may appear to require too much investment of instructors’ and students’ time, to learn techniques and technicalities that are perceived as “not programming”. Agda is perhaps closest to Haskell, but was not widely available at the time. Also, a full proof checker may be hard to integrate in an auto-grader.

Isabelle had been used for auto-grading, e.g., with Praktomat [5], but that required heavy scaffolding and sandboxing and even then did not allow for full integration: “we currently do not check what the students actually proved, for that we still manually check their submission, but this way any sorries or other problems are detected.” [4]

Cyp’s design was influenced by the Isar structured proof language [21] for Isabelle [15]. The main features of Cyp are:

– a program is a collection of
  • algebraic data types (ADT) (Haskell: data declarations), and
  • functions defined by a term rewriting system (Haskell: oriented equations),
– a specification (lemma or axiom) is an equation,
– a proof may use term rewriting, extensionality for functions, case distinction (for the constructors of an ADT), and induction (following the ADT).

We give examples for Cyp’s proof language. These are sometimes incomplete but the semantics should be obvious. Full example texts and syntax and semantics specification are available in the source repository. In the following, all typing judgments (:: Bool, etc.) are only available (and then, required) in Cyp-Leipzig. We discuss typing in Section 5.

Proof by Rewriting.

A sequence of steps, where each step connects two terms by applying a rule that comes from a function definition, or from a lemma.

    Lemma succ_eq_plus_one: forall a :: N: S a .=. plus (S Z) a
    Proof by rewriting
                        S a
      (by def plus) .=. S (plus Z a)
      (by def plus) .=. plus (S Z) a
    QED

Cyp is not fully explicit here: the student just names the function (here, plus). Cyp will determine the actual rule to use (here, plus has two rules), and the location and direction of its application, via pattern-matching all rules and subterms. We guess that the authors wanted to avoid extra syntax for locations. We consider introducing syntax for naming a rule and specifying direction.
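The search for rule, location, and direction can be illustrated by a small stand-alone program. This is a sketch under our own assumptions, not Cyp's implementation: the types Term and Rule and all function names are hypothetical. It tries every rule at every position of the term. The direction is handled by symmetry: using a rule backwards from s to t is the same as using it forwards from t to s.

```haskell
import Control.Monad (foldM)

-- First-order terms. In a rule, V is a pattern variable; in a goal term,
-- V is a fixed variable of the lemma.
data Term = V String | F String [Term]
  deriving (Eq, Show)

type Rule  = (Term, Term)       -- oriented equation: (left, right)
type Subst = [(String, Term)]

-- Match a pattern against a term, extending a substitution.
match :: Term -> Term -> Subst -> Maybe Subst
match (V x) t s = case lookup x s of
  Nothing -> Just ((x, t) : s)
  Just t' -> if t == t' then Just s else Nothing
match (F f ps) (F g ts) s
  | f == g && length ps == length ts =
      foldM (\acc (p, t) -> match p t acc) s (zip ps ts)
match _ _ _ = Nothing

-- Instantiate the variables of a rule side.
inst :: Subst -> Term -> Term
inst s (V x)    = maybe (V x) id (lookup x s)
inst s (F f ts) = F f (map (inst s) ts)

-- All results of applying the rule left-to-right at exactly one position.
rewrites :: Rule -> Term -> [Term]
rewrites (l, r) t = atRoot ++ below
  where
    atRoot = [ inst s r | Just s <- [match l t []] ]
    below  = case t of
      V _    -> []
      F f ts -> [ F f (take i ts ++ [u] ++ drop (i + 1) ts)
                | (i, ti) <- zip [0 ..] ts, u <- rewrites (l, r) ti ]

-- A step s .=. t is justified if some rule turns s into t, or t into s.
stepOk :: [Rule] -> Term -> Term -> Bool
stepOk rules s t =
  or [ t `elem` rewrites rl s || s `elem` rewrites rl t | rl <- rules ]

chainOk :: [Rule] -> [Term] -> Bool
chainOk rules ts = and (zipWith (stepOk rules) ts (drop 1 ts))

z :: Term
z = F "Z" []

suc :: Term -> Term
suc t = F "S" [t]

plus :: Term -> Term -> Term
plus a b = F "plus" [a, b]

-- The two defining equations of plus, i.e. what (by def plus) may use.
plusRules :: [Rule]
plusRules =
  [ (plus z (V "y"), V "y")
  , (plus (suc (V "x")) (V "y"), suc (plus (V "x") (V "y"))) ]

-- The chain of lemma succ_eq_plus_one: S a, S (plus Z a), plus (S Z) a
example :: [Term]
example = [ suc a, suc (plus z a), plus (suc z) a ]
  where a = V "a"
```

In this sketch, chainOk plusRules example holds: both steps use a rule of plus from right to left, the first one below the root, the second one at the root.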

Proof by Extensionality.

Add a symbolic argument to prove equality of functions.

    Lemma: not . not .=. id
    Proof by extensionality with x :: Bool
      Show: (not . not) x .=. id x
      ...

Proof by Case Analysis.

A complete case analysis for the constructor in the root of the term. Continuing the previous example,

    Proof by case analysis on x :: Bool
      Case True
        Assume AS: x .=. True
        Then Proof ...

Proof by Induction.

Structural induction on terms. The induction hypothesis has to be written explicitly.

    Lemma symdiff_sym: forall x :: N, y :: N: symdiff x y .=. symdiff y x
    Proof by induction on x :: N generalizing y :: N
      Case Z
        For fixed y :: N
        Show: symdiff Z y .=. symdiff y Z
        Proof ... QED
      Case S x
        Fix x :: N
        Assume IH: forall y :: N: symdiff x y .=. symdiff y x
        Then for fixed y :: N
        Show: symdiff (S x) y .=. symdiff y (S x)
        Proof ... QED
    QED

The notation Proof ... QED is valid in our extension, see Section 4.

Restrictions.

Cyp is not a full proof checker, but a tool to teach equational logic and structural induction. With that goal, it keeps both programming and proving as simple as possible, and we do not intend to change that.

– Definitions are global. There are no let or where bindings, and no case or lambda expressions. This allows for a straightforward application of rewrite rules.
– Propositions (called lemmas) are equations, with universal quantification over their variables. Cyp has no propositional logic or predicate logic. We did introduce mandatory type declarations for variables in lemmas, see Section 5, making explicit the universal quantification.
– All proof steps have to be written down. This is in contrast to interactive proof assistants that fill in proof steps, e.g., as Isabelle does with “by auto”.

The next two restrictions could possibly be lifted, as we discuss in Section 7.

– Cyp programs, like Haskell programs, have no guarantee of termination. Still, equational reasoning for such programs is “morally correct” [7].
– The semantics of pattern matching in Cyp programs is different from the operational semantics of Haskell. If a function f is defined by several equations, Haskell semantics is to try them one after another, until a matching left-hand side is found. In a Cyp proof by rewriting, (by def f) is correct if any of the defining equations of f can be used.

The following properties of Cyp allow for an easy integration into systems for automated grading of homework:

– Proof checking is pure, in the Haskell sense: it does not involve IO. Running Cyp in an auto-grader cannot change the system state (on the disk, in the database), and cannot leak secrets, so it does not require sandboxing.
– Proof checking is strict, in the sense of “strong”: any attempt to cheat, by leaving holes in the submission, will be detected, see Section 4.

We mentioned earlier (Section 2) that we already have auto-grading for “blueprint” programming exercises where students replace holes by expressions. For auto-grading Cyp exercises, we copy that idea, and we also allow holes in proofs. We describe the resulting syntax and semantics.

Design Goals for Holes.

The syntactic function of a hole is that it marks the place in a problem instance where the student has to fill in something. See A.4 for details on syntactic matching.

Holes do have semantics as well: Cyp should not reject them, but accept them temporarily, by assuming the hole can be filled, and continue checking other parts of the proof. This allows for incremental development.

If holes were rejected outright, then the student would first have to replace each such hole with some expression (or proof) that is syntactically valid, but (most probably) semantically wrong. By accepting the hole, we relieve the student of such rote edits.

We have a similar design for our Haskell exercises: the (expression) hole is actually undefined, which GHC accepts and type-checks. (We should switch to underscore, which recent GHCs accept with -fdefer-type-errors.)

Holes for Expressions, Names of Rules, and Rewrite Steps.

An underscore (_) may appear instead of a sub-expression: in the right-hand side of a function definition, or in an equational proof. An underscore can also be used in (by _) as a placeholder for the name of a lemma, axiom, or definition to be applied in a rewrite step. In a proof by rewriting, three dots (...) denote any chain of rewrite steps where neither rules nor expressions are specified.

Expression holes have type forall a. a (as does undefined in Haskell) when typechecking. When checking the validity of a chain of equations, Cyp will assume that anything can be rewritten into a hole, and out of a hole.

Any rewrite step t (by _) t' will be accepted. Cyp will not check that there is indeed a rule that can be substituted for the hole.

Example 1.

This is similar to one problem we used in a lecture, see Section 6.

    e :: T
    axiom assoc: forall x :: T, y :: T, z :: T :
      times x (times y z) .=. times (times x y) z
    axiom neutral_right: forall x :: T : times x e .=. x
    axiom neutral_left: forall x :: T : times e x .=. x
    axiom square: forall x :: T : times x x .=. e
    Lemma : forall x :: T, y :: T : times x y .=. times y x
    Proof by rewriting
                      times x y
      (by _ )     .=. times (times x y) e
      (by square) .=. _
      (by assoc)  .=. _
      (by assoc)  .=. _
      (by assoc)  .=. times (times x (times (times y y) x)) (times y x)
      ...         .=. times y x
    QED

We expect the student to infer (by neutral_right) for the first rewrite step, as no other axiom matches. Then, (by square) can only match if e is rewritten to some square, which must be guessed to be times (times y x) (times y x), and a strong hint is given by the expression that has to be reached via associativity. From looking at (times y y), the student should guess that (by square) comes next, and continue with the greedy approach of always simplifying when possible. The exercise is solved by applying neutral_left, square, neutral_left.

By formulating the exercise with the given hints, we have removed the “group theoretic” aspects and reduced it to a pattern matching drill, on purpose.

Holes for Proofs.

In proofs, three dots (...) denote any number of missing lemmas, missing proofs (also sub-proofs) or missing cases in proofs by case analysis, and proofs by induction. In practice (see Section 6), we did not use exercises with missing lemmas. For proofs by case analysis and by induction, we did make heavy use of the option of giving one case nearly completely, and leaving the other(s) for the student to fill in.

Example 2.

The case for constructor S is missing in the following:

    data N = Z | S N
    plus :: N -> N -> N
    plus Z y = y
    plus (S x) y = S ( plus x y )
    Lemma plus_Z: forall x :: N : plus x Z .=. x
    Proof by induction on x :: N
      Case Z
        Show: plus Z Z .=. Z
        Proof ... QED
      ...
    QED

When Cyp encounters a missing proof, it is treated as if it were correct, much like Isabelle treats sorry. When a proof by case analysis or by induction has a hole for missing cases, Cyp will assume that the present cases are complete.
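The rule "accept holes temporarily, check everything else fully" can be modelled for chains of rewrite steps as follows. This is a sketch under our own assumptions and not Cyp's data model: the three-valued Verdict, the Checker type, and toyCheck (which only knows neutral_right, applied at the root of a term) are hypothetical. The sketch handles a hole that stands for a whole term, and does not model holes nested inside a term.

```haskell
-- Terms with an explicit constructor for the hole "_".
data Term = Hole | V String | F String [Term]
  deriving (Eq, Show)

-- Justification of a step: a named rule, or the hole in "(by _)".
data By = ByHole | By String
  deriving (Eq, Show)

-- Ordered from worst to best, so that 'minimum' gives the overall verdict.
data Verdict = Wrong | Incomplete | Correct
  deriving (Eq, Ord, Show)

-- Decides a step without holes: rule name, source term, target term.
type Checker = String -> Term -> Term -> Bool

-- A step that touches a hole is accepted for now, but not counted as done.
stepVerdict :: Checker -> Term -> By -> Term -> Verdict
stepVerdict _ Hole _ _ = Incomplete
stepVerdict _ _ _ Hole = Incomplete
stepVerdict _ _ ByHole _ = Incomplete
stepVerdict check s (By r) t
  | check r s t = Correct
  | otherwise   = Wrong

-- A chain is a start term followed by justified steps.
chainVerdict :: Checker -> Term -> [(By, Term)] -> Verdict
chainVerdict check start steps =
  minimum (Correct : zipWith verdict (start : map snd steps) steps)
  where verdict s (by, t) = stepVerdict check s by t

e :: Term
e = F "e" []

times :: Term -> Term -> Term
times a b = F "times" [a, b]

-- Toy checker (an assumption): neutral_right at the root, in either direction.
toyCheck :: Checker
toyCheck "neutral_right" s t = s == times t e || t == times s e
toyCheck _ _ _ = False

xy :: Term
xy = times (V "x") (V "y")

-- The first two steps of Example 1 as given, a correct first step,
-- and a wrong rule name next to a remaining hole.
blueprint, attempt, mistake :: [(By, Term)]
blueprint = [ (ByHole, times xy e), (By "square", Hole) ]
attempt   = [ (By "neutral_right", times xy e) ]
mistake   = [ (By "neutral_left", times xy e), (By "square", Hole) ]
```

With these definitions, the blueprint is Incomplete, the attempt is Correct, and the mistake is Wrong even though a hole remains in the same chain. A grader would only accept a submission whose verdict is Correct.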

Originally, Cyp did not have types for programs or for proofs, except for type judgments to determine what constructors need to be handled in proofs by case analysis and by induction. These judgments were checked only implicitly: if one constructor was used, all others from the same data declaration had to be handled as well.

Contrary to what the student should expect from Haskell, there was no explicit type checking, and the notation was confusing: the type defined by data List a = Nil | Cons a (List a) had to appear in the proof without the type argument: Proof by induction on List xs. This looks like medieval (i.e., pre-1.5) Java and the corresponding xs :: List is a kind-error in Haskell.

Types for programs.

Cyp did not type-check the program since it was assumed to be given by the instructor, who could ensure correct typing by extra means (divination, or GHC perhaps), and the student could not change the program.

Our version of Cyp allows holes in the program that the student can fill with arbitrary expressions. For sanity, and for consistency with Haskell, we type-check programs. We use the “Typing Haskell in Haskell” library [13], restricted to Cyp’s Haskell subset (e.g., without type classes). An expression hole is treated as _ :: forall a . a. This makes it possible to accept programs with holes.

Types for lemmas.

Cyp did not type-check proofs since it was assumed that nothing bad could happen when applying rules (from a program that was known to be type-correct) to terms from goals (also type-correct).

Example 3.

This assumption, however, is wrong, as shown by the following:

    data U = U
    Lemma eek: x .=. y
    Proof by case analysis on x :: U
      Case U
        Assume AS_x: x .=. U
        Then Proof by case analysis on y :: U
          Case U
            Assume AS_y: y .=. U
            Then Proof by rewriting
                          x
              (by AS_x) .=. U
              (by AS_y) .=. y
            QED
        QED
    QED

The problem is that Lemma eek is implicitly assumed to be polymorphic in x and y, while it was only proved for monomorphic x and y of type U. We can then prove False .=. True by applying eek.

We solve this problem by requiring every lemma to be explicit about the types of its variables, and then we type-check its proof in that context. The problematic proof

    data U = U
    Lemma eek: forall x :: a, y :: a : x .=. y
    Proof by case analysis on x :: U
    ...
    QED

will now get rejected, since the type (U) of x in the discriminant is different from the (polymorphic) type of x in the lemma.

Local names in lemmas.

Explicit types also solve the following problem: is e a variable or a constant in:

    axiom right_neutral : times x e .=. x

In original Cyp, this e is a local variable, unless there is some globally defined e; then the name will refer to that, as in:

    e = e ; axiom right_neutral : times x e .=. x

This is inconsistent with local declarations shadowing global declarations in Haskell. We now write

    axiom right_neutral : forall x :: T : times x e .=. x

Since e is not declared to be local in the axiom, it is a reference to a global name.

Types in equational proofs.

We check that all terms in an equational proof have identical type, under the typing assumptions for the free variables of the lemma. This makes it possible to give precise and early error messages, where otherwise the error would likely have manifested in a more obscure way, e.g., by there not being a fitting rewrite rule.

Typed application of rules.

When a rule from a lemma or from an axiom is applied in a proof by rewriting, Cyp determines, by syntactic matching, what subterm t is rewritten to what t', and under what substitution of the lemma’s variables. We use our type-checking mechanism to see if the variables’ and the terms’ types unify.

Example 4.

Assume we proved eek for the correct type:

    data U = U
    Lemma eek: forall x :: U, y :: U: x .=. y
    Proof ... QED

Then the following attempt to apply eek

    data Bool = False | True
    Lemma contradiction: False .=. True
    Proof by rewriting
                   False
      (by eek) .=. True
    QED

will get rejected, because we obtain the substitution that maps x to False and y to True for rewriting the left- to the right-hand side using eek, and the types of x :: U and False :: Bool, as well as the types of y :: U and True :: Bool, do not unify.
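The unification test behind this rejection can be sketched with a textbook first-order unifier on types. This is a stand-alone illustration, not the code of the “Typing Haskell in Haskell” library and not Cyp's type checker. The names Type, unify, and ruleApplies are our own, and the list of (declared, actual) pairs stands for the substitution found by matching, with each lemma variable's declared type paired with the type of the term it is mapped to.

```haskell
import Control.Monad (foldM)
import Data.Maybe (isJust)

data Type = TVar String | TCon String [Type]
  deriving (Eq, Show)

type Subst = [(String, Type)]

-- Apply a (triangular) substitution to a type.
apply :: Subst -> Type -> Type
apply s (TVar a)    = maybe (TVar a) (apply s) (lookup a s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

occurs :: String -> Type -> Bool
occurs a (TVar b)    = a == b
occurs a (TCon _ ts) = any (occurs a) ts

-- Extend substitution s so that both types become equal, if possible.
unify :: Subst -> Type -> Type -> Maybe Subst
unify s t1 t2 = go (apply s t1) (apply s t2)
  where
    go (TVar a) (TVar b) | a == b = Just s
    go (TVar a) t = bind a t
    go t (TVar a) = bind a t
    go (TCon c ts) (TCon d us)
      | c == d && length ts == length us =
          foldM (\acc (x, y) -> unify acc x y) s (zip ts us)
      | otherwise = Nothing
    bind a t
      | occurs a t = Nothing          -- occurs check
      | otherwise  = Just ((a, t) : s)

-- A rule may be applied if all declared types unify with the actual ones.
ruleApplies :: [(Type, Type)] -> Bool
ruleApplies pairs =
  isJust (foldM (\s (declared, actual) -> unify s declared actual) [] pairs)

tyU, tyBool :: Type
tyU    = TCon "U" []
tyBool = TCon "Bool" []

-- Example 4: x :: U is mapped to False :: Bool, y :: U to True :: Bool.
eekOnBools :: Bool
eekOnBools = ruleApplies [(tyU, tyBool), (tyU, tyBool)]

-- Applying eek to two terms of type U is fine.
eekOnU :: Bool
eekOnU = ruleApplies [(tyU, tyU), (tyU, tyU)]
```

Here eekOnBools is False and eekOnU is True. The sketch treats every type variable as unifiable, so it does not model the rejection in Example 3, where the type variable a of the lemma must stay fixed inside the proof.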

Cyp, with our extensions, was used in the “Fortgeschrittene Programmierung” (Advanced Programming) lecture for Computer Science B.Sc. students in their 4th semester. (The recent addition of typed lemmas was not yet available.) The lecture follows [20] and was taught by one of the authors (Schwarz) in 2020. While auto-grading for Haskell exercises was used before, this was the first instance of using Cyp as well. Because of the Corona lockdown, the lecture was virtual. This made auto-grading of part of the homework all the more important. (There were other homework and discussions via internet platforms, shared documents, and audio conferencing.)

How Cyp was used.

We installed GHC (with Leancheck) and Cyp in the computer pool (on Debian GNU/Linux machines) and students logged in via SSH. Students were also encouraged to install GHC and compile Cyp on their home computers. The instructor used the pool installation in tutorial presentations on BBB (Big Blue Button) conferences, on a server hosted by our department: share the view of one terminal window, SSH to a computer in the pool, start tmux and split the window, run an editor in one half, and show GHCi or Cyp output in the other half.

While we have integrated Cyp in Autotool, we always maintain a command line version of Cyp with identical semantics. It can be called with one argument (a module to be checked) or two (a blueprint module, and a solution module). The recommended “IDE for Cyp” is the command

    while (inotifywait -q *.[bc]yp); do cyp -m ex.byp ex.cyp; done

where ex.byp (blueprint for cyp) is the problem instance, with the holes, downloaded from Autotool; and ex.cyp is the student’s working file. Each time this file is saved from inside the editor, Cyp will check it (locally). Finally, the student uploads the solution to Autotool, where it is checked again, and the outcome is stored.

Types of exercise used.

Over 14 weeks of teaching, we posed Cyp exercises on

– proof by rewriting (only), cf. Example 1
– proof by case analysis (and rewriting), e.g.,

Example 5.

    data Bool = False | True
    xor :: Bool -> Bool -> Bool
    xor False False = False ; xor False True = True
    xor True False = True ; xor True True = False
    Lemma xor_sym : forall x :: Bool, y :: Bool : xor x y .=. xor y x
    Proof by case analysis on x :: Bool
      Case False
        Assume AX: x .=. False
        Then Proof by case analysis on y :: Bool
          Case False
            Assume AY: y .=. False
            Then Proof by rewriting
                          xor x y
              (by AX) .=. _
              (by AY) .=. _
              (by AY) .=. _
              (by AX) .=. _
            QED
          Case True
            Assume AY: y .=. True
            Then Proof by rewriting
              xor x y
              ... .=. xor y x
            QED
        QED
      ...
    QED

– proofs by induction on ...
  • Peano numbers, cf. Example 2, and another on associativity and commutativity of addition, with implementation of addition given, and assuming

        axiom plus_Z: plus x Z .=. x
        axiom plus_S: plus x ( S y ) .=. S ( plus x y )

  • singly linked lists: len ( append xs ys ) .=. plus ( len xs ) ( len ys )
  • binary trees, prove mirror (mirror t) .=. t
  • a representation of binary numbers, see Section 1.

Statistical data on auto-graded exercises.

In total, 39 mandatory problems were assigned in Autotool; twelve of them contained proofs in Cyp and 15 were Haskell programs with holes.

Two problems were given with complete solutions beforehand. They were also used by the instructor during an online session to demonstrate the usage of the Cyp installation in our computer pools.

Most of the problems require just one proof, some up to three. In problems that contain more than one proof, the first proof was sometimes given with a near-solution and the others without further hints.

Two problems required proofs by rewriting, three by case analysis, and seven by induction on Peano numbers, binary numbers, lists, or trees.

Near-solutions containing only _ holes for expressions were provided for three proofs. The other problems contained ... holes for whole proofs. In around half of them, the proof method (rewriting, case analysis, induction) was required.

Some problems were given in two versions. On the one hand, the students have to fill holes for expressions in a Haskell program to satisfy the specification that is tested by Leancheck. On the other hand, they have to prove in Cyp that their code satisfies this specification. In these cases, the solution to the first problem should not be given in the setting of the Cyp program. Therefore, we use _ holes in programs in the Cyp problem in addition to the holes in proofs.

Student performance and evaluation.

Among the 95 participants of the lecture, 71 solved at least 19 Autotool problems correctly. Out of these 71, all solved at least two Cyp problems and most solved at least half of all Cyp problems.

In addition to Autotool, students had to solve exercise sheets and discussed their solutions online. Some of these problems also contained proofs. We observed that many student solutions were given in Cyp format and had been checked by Cyp, even if we did not explicitly require this.

During the last lecture, the instructor asked the 43 participants via an online poll: “Did Cyp exercises in Autotool help you to understand how to write proofs?” Out of 28 answers, 27 said yes and one said no.

Therefore, we now consider using Cyp examples and exercises earlier in the curriculum, maybe even before a thorough introduction to the Haskell programming language.

We discuss two topics for further development of Cyp.

Operational Semantics of Cyp Programs.

An interesting connection is that while we want to replace a property-based test by a proof of the property, the Isabelle IDE can automatically derive and execute such a test from a lemma.

Could we do the same for Cyp, as the example from the introduction suggests? We can certainly add deriving (Eq, Show, Generic) to every data declaration, and mechanically generate a testable property from a lemma.

But what is the relation between lemmas that are “operationally true” in Haskell and lemmas that we can prove? Cyp will accept

    f :: Bool -> Bool ; f False = False ; f False = True
    Lemma False .=. True
    Proof by rewriting
                     False
      (by def f) .=. f False
      (by def f) .=. True
    QED

but the second proof step does not have an operational equivalent in Haskell.

We could enforce operational semantics of top-down matching. This would complicate rule application in equational proofs, as we have to check that earlier rules don’t match. Given this function declaration

    f (S x) y = _
    f x (S y) = _

we cannot rewrite the expression f a (S b) (in some equational proof) with either rule, since a needs to be elaborated first. Cyp could rewrite patterns by removing cases that are covered earlier. The second rule then comes out as f Z (S y). A compiler will do this [1] when generating code for an abstract machine, but Cyp generates code that must be used by students.

It seems better to avoid this problem by enforcing that patterns of each function definition are disjoint. Students can get the same behaviour from GHC via -Woverlapping-patterns.

Termination.

Totality of functions is a subject of teaching. Cyp’s functions are not total. (The previous paragraph shows that they aren’t even functions.) One reason for non-totality is incomplete patterns in function definitions, which we could check, as GHC already does with -Wincomplete-patterns. Another reason is non-termination, coming from recursion, which is unrestricted both in Haskell and in Cyp.

– We could prohibit recursion altogether. This is a drastic measure, and then we lose the motivation for induction. It still can be useful, e.g., to force the student to implement a specification by using fold (with given axioms), and not by ad-hoc recursion.
– We could restrict recursion to those cases where a well-founded monotone order of function calls can be determined by Cyp, or is provided by the student.

The Cyp-style solution would certainly be to make this simple (no complicated order, just the subterm relation) and explicit (annotating the decreasing argument).
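Of the checks proposed in this section, disjointness of the patterns of a function definition is the easiest to state. The following is a minimal sketch, assuming linear and well-typed patterns (as Haskell requires); Pat, overlap, and disjointDef are hypothetical names and not part of Cyp. Two linear patterns overlap exactly if they agree wherever both have a constructor.

```haskell
-- Patterns: variables and constructor applications.
data Pat = PVar String | PCon String [Pat]
  deriving (Eq, Show)

-- Two linear patterns overlap if some term is an instance of both.
overlap :: Pat -> Pat -> Bool
overlap (PVar _) _ = True
overlap _ (PVar _) = True
overlap (PCon c ps) (PCon d qs) = c == d && overlapAll ps qs

-- Argument lists overlap if they overlap in every position.
overlapAll :: [Pat] -> [Pat] -> Bool
overlapAll ps qs = length ps == length qs && and (zipWith overlap ps qs)

-- A definition is given by the argument patterns of its equations.
-- It is accepted if no two different equations overlap.
disjointDef :: [[Pat]] -> Bool
disjointDef eqs =
  and [ not (overlapAll p q) | (i, p) <- ixs, (j, q) <- ixs, i < j ]
  where ixs = zip [0 :: Int ..] eqs

pz :: Pat
pz = PCon "Z" []

ps' :: Pat -> Pat
ps' p = PCon "S" [p]

-- plus Z y = ... ; plus (S x) y = ...
plusDef :: [[Pat]]
plusDef = [ [pz, PVar "y"], [ps' (PVar "x"), PVar "y"] ]

-- f (S x) y = _ ; f x (S y) = _
fDef :: [[Pat]]
fDef = [ [ps' (PVar "x"), PVar "y"], [PVar "x", ps' (PVar "y")] ]

-- f False = False ; f False = True
fBoolDef :: [[Pat]]
fBoolDef = [ [PCon "False" []], [PCon "False" []] ]
```

In this sketch, plusDef is accepted, while fDef and fBoolDef are rejected: f (S a) (S b) is an instance of both equations of fDef, and the two equations of fBoolDef have the same left-hand side.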

Conclusion.

We extended the Cyp proof checker with a pattern matcher for program and proof terms, and with a type checker. This makes it possible to auto-grade student exercises on completing implementations and proofs of Haskell-like programs. Cyp can be used stand-alone (on the command line), and in our auto-grading system Autotool. We used Cyp successfully in a lecture.

References

1. Lennart Augustsson. Compiling Pattern Matching. In Jean-Pierre Jouannaud, editor, Functional Programming Languages and Computer Architecture, FPCA 1985, Nancy, France, September 16-19, 1985, Proceedings, volume 201 of Lecture Notes in Computer Science, pages 368–381. Springer, 1985.
2. Jasmin Christian Blanchette, Lars Hupel, Tobias Nipkow, Lars Noschinski, and Dmitriy Traytel. Experience report: the next 1100 Haskell programmers. In Wouter Swierstra, editor, Proceedings of the 2014 ACM SIGPLAN symposium on Haskell, Gothenburg, Sweden, September 4-5, 2014, pages 25–30. ACM, 2014.
3. Gwern Branwen. mueval: Safely evaluate pure Haskell expressions. https://hackage.haskell.org/package/mueval, 2008.
4. Joachim Breitner. Re: [isabelle] automatically grade Isabelle homework? https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2016-July/msg00035.html, 2016.
5. Joachim Breitner, Martin Hecker, and Gregor Snelting. Der Grader Praktomat. In Oliver J. Bott, Peter Fricke, Uta Priss, and Michael Striewe, editors, Automatisierte Bewertung in der Programmierausbildung, number 6 in Digitale Medien in der Hochschullehre, pages 159–172. Waxmann Verlag GmbH, 2017.
6. Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Martin Odersky and Philip Wadler, editors, Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP ’00), Montreal, Canada, September 18-21, 2000, pages 268–279. ACM, 2000.
7. Nils Anders Danielsson, John Hughes, Patrik Jansson, and Jeremy Gibbons. Fast and loose reasoning is morally correct. In J. Gregory Morrisett and Simon L. Peyton Jones, editors, Proceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2006, Charleston, South Carolina, USA, January 11-13, 2006, pages 206–217. ACM, 2006.
8. Edsger W. Dijkstra. Answers to questions from students of Software Engineering. November 2000.
9. Dominik Durner and Lars Noschinski. Cyp - Checker for "morally correct" induction proofs about Haskell programs. https://github.com/noschinl/cyp, 2013.
10. Dominik Durner, Lars Noschinski, and Dmitriy Traytel. Cyp - Check Your Proof. https://github.com/dtraytel/cyp, 2019.
11. Dominik Durner, Lars Noschinski, Dmitriy Traytel, Dennis Renz, and Johannes Waldmann. Cyp - Check Your Proof. https://gitlab.imn.htwk-leipzig.de/waldmann/cyp, 2020.
12. Bertram Felgenhauer. haskell-src-exts-simple: A simplified view on the haskell-src-exts AST. https://hackage.haskell.org/package/haskell-src-exts-simple, 2016.
13. Mark P. Jones. Typing Haskell in Haskell. https://web.cecs.pdx.edu/~mpj/thih/, 2000.
14. Rudy Matela. Leancheck: Enumerative property-based testing. https://hackage.haskell.org/package/leancheck, 2016.
15. Tobias Nipkow. Term Rewriting and Beyond - Theorem Proving in Isabelle. Formal Asp. Comput., 1(4):320–338, 1989.
16. Matthew Pickering, Gergo Érdi, Simon Peyton Jones, and Richard A. Eisenberg. Pattern Synonyms. In Geoffrey Mainland, editor, Proceedings of the 9th International Symposium on Haskell, Haskell 2016, Nara, Japan, September 22-23, 2016, pages 80–91. ACM, 2016.
17. Mirko Rahn, Alf Richter, and Johannes Waldmann. The Leipzig Autotool system for automatic grading student homework in (theoretical) computer science. https://gitlab.imn.htwk-leipzig.de/autotool/all0, 2000.
18. Dennis Renz. Erweiterung eines Proof Checkers und Integration in ein E-Learning-System. Master’s thesis, HTWK Leipzig, 2020.
19. Various. The Glasgow Haskell Compiler. 2002–2020.
20. Johannes Waldmann. How I Teach Functional Programming. In WFLP, Würzburg, 2017.
21. Markus Wenzel and Lawrence C. Paulson. Isabelle/Isar. In Freek Wiedijk, editor, The Seventeen Provers of the World, Foreword by Dana S. Scott, volume 3600 of Lecture Notes in Computer Science, pages 41–49. Springer, 2006.

A Technicalities

We comment on some design decisions in the implementations of Cyp. Since there is no formal publication on the original, we can only make educated guesses. We still present it, as an exercise in understanding design decisions in real-life Haskell software, and also as basic documentation for some changes that we made.
In very general terms, the design of a proof checker, or any other static analyzer, is: “parse, then analyze, then print result or error message”. The semantics of matching, type checking, and proof checking are well understood. The result is just the empty tuple (), as we only want to know whether the input was accepted. But some details of parsing in Cyp are interesting, and so is printing of error messages.

A.1 Parsing

Since Cyp is about proving properties of programs in (a subset of) Haskell, the authors decided (we guess) to use a full Haskell parser [12] for theories, and a hand-written combinator parser for proofs (only).
Using the full Haskell parser avoids duplication of work, and ensures that Cyp programs are syntactically valid Haskell. But it also creates problems:
– Error messages from the full Haskell parser sometimes refer to concepts that are (syntactically) forbidden in Cyp. This confuses the student.
– These parts are not completely separated: (Haskell) expressions appear in equational proofs.
– Mixing parsers is hard, since this full Haskell parser is not composable.
We expand the last item: We expect a parser to have type String -> m (result,String), returning the result, and the part of the input that was not consumed. These parsers can be composed. But the full Haskell parser is not composable (probably because it is based on a shift/reduce model). It wants to completely read a string that contains a valid expression (or declaration, etc.), so we have to trim the string on the outside.
Cyp uses a simple algorithm for trimming: each (Haskell) expression extends to end-of-line, or to .=. in a lemma. This makes the implementation look awkward, as it contains a lot of actual string processing with jumping back and forth looking for the separator symbols \n and .=.
We decided to rip out the full Haskell parser, and replace it with a hand-made combinator parser for the “Cyp subset of Haskell”. This is not much extra work (the subset is small), and it gives us uniform, composable parsers with better error messages and reduced compile time, as haskell-src-exts, the basis of haskell-src-exts-simple, is hard for GHC to compile.
For backward compatibility, we still use “expression to end-of-line” syntax, and we implemented combinators

    to_eol :: Parser a -> Parser a
    to_eol p = inline (p <* eof)

    inline :: Parser a -> Parser a
    inline p = do
      src <- getInput
      let (pre, post) = span (\c -> not $ elem c "\n\r") src
      setInput pre
      x <- p
      pre' <- getInput ; setInput $ pre' <> post ; whiteSpace
      return x

with typical usage

    eqnSeqFromList <$> to_eol eterm
      <*> ( many $ (,) <$> link <* reservedOp ".=." <*> to_eol eterm )
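To make the composability argument concrete, here is a minimal, self-contained sketch of a parser of the shape String -> m (result, String), with m = Maybe, together with an end-of-line combinator in the spirit of to_eol. It uses only the base library rather than Parsec; the names Parser, word, and toEol are illustrative assumptions and not Cyp's actual definitions.

```haskell
import Data.Char (isAlpha)

-- A composable parser: it returns the result and the unconsumed input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

-- Succeeds only if no input is left.
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((), s) else Nothing

-- A non-empty word of letters; surrounding blanks (not line breaks) are skipped.
word :: Parser String
word = Parser $ \s ->
  let blank c   = c == ' ' || c == '\t'
      (w, rest) = span isAlpha (dropWhile blank s)
  in if null w then Nothing else Just (w, dropWhile blank rest)

-- Run p on the rest of the current line only, require that it consumes
-- the line completely, then continue after the line break(s).
toEol :: Parser a -> Parser a
toEol p = Parser $ \s ->
  let (pre, post) = break (`elem` "\n\r") s
  in case runParser (p <* eof) pre of
       Nothing     -> Nothing
       Just (a, _) -> Just (a, dropWhile (`elem` "\n\r") post)
```

With these definitions, runParser ((,) <$> toEol word <*> toEol word) "foo\nbar\n" yields Just (("foo","bar"),""), while runParser (toEol word) "foo bar\nbaz" yields Nothing, because the word does not extend to the end of its line. The point of the sketch is that line trimming becomes an ordinary combinator, so no string processing is needed outside the parser.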

The move away from haskell-src-exts(-simple) was also suggested by the fact that haskell-src-exts is “on life support”. Maintenance was deemed too costly, as the library duplicates functionality that is already in GHC. Instead, GHC’s Haskell parser is available in a library (ghc-lib-parser). Switching Cyp to that would also create work.

A.2 Printing

Another aspect of syntactical processing is (pretty) printing of parts of the input, in error messages.
In the naive approach, a parser produces an AST (abstract syntax tree). Later, if some semantic analysis detects some error in some subtree, this subtree should be printed. But if the tree is truly abstract, we forgot how it actually looked in the input! So the pretty-printer needs to do some work, e.g., to introduce parentheses according to precedence and associativity of operators. Indeed, Cyp has all these pretty-printers (called “unparsers”) for ASTs.
There is an alternative: we keep enough information in the AST nodes so that we can always show them in their original form, and never need to compute a “pretty” layout. It is, in fact, confusing for the student if Cyp (or any compiler) complains about code that is not recognizable as part of the source.
We could attach the actual input slice to each AST node. But it is customary to keep the full input elsewhere, and attach location (range) information of type Span to AST nodes, by implementing the HasSpan type class:

    import Text.Parsec

    data Span = Span SourcePos SourcePos

    class HasSpan a where
      mspan :: a -> Maybe Span
      set_span :: Maybe Span -> a -> a

In the parser, we typically have code like

    prop :: Parser RawProp
    prop = with_span $ Prop <$> term <*> ( reservedOp ".=." *> term )

using a function

    with_span :: HasSpan a => Parser a -> Parser a
    with_span p = do
      start <- getPosition ; res <- p ; end <- getPosition
      return $ set_span (Just $ Span start end) res

Then the pretty-printer can exactly re-create the input. But then another problem appears: the printer needs access to the input. This is solved by our (re)design of the “Error monad” where all semantic analysis takes place: it is enhanced by a Reader monad for the input string. This monad originally had a function

    err :: Doc -> Err a

to show an error message and stop processing. We can now use a version with source spans (extracted from ASTs)

    errs :: [Span] -> Doc -> Err a

and the corresponding source slices will be rendered with the message.
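The combination of an error monad with read access to the input can be sketched as follows. This is a simplified illustration under stated assumptions, not Cyp's code: a Span is taken to be a pair of character offsets (instead of two Parsec SourcePos values), messages are plain Strings (instead of Doc), and the helper slice is a hypothetical name.

```haskell
-- Simplified stand-ins (assumptions): a Span is a pair of character
-- offsets, and messages are plain Strings instead of Doc.
data Span = Span Int Int
  deriving (Eq, Show)

-- An error monad that can also read the full input string.
newtype Err a = Err { runErr :: String -> Either String a }

instance Functor Err where
  fmap f (Err g) = Err (fmap f . g)

instance Applicative Err where
  pure a = Err (const (Right a))
  Err f <*> Err g = Err $ \src -> f src <*> g src

instance Monad Err where
  Err g >>= k = Err $ \src -> case g src of
    Left e  -> Left e
    Right a -> runErr (k a) src

-- The slice of the source that is covered by a span.
slice :: Span -> String -> String
slice (Span from to) = take (to - from) . drop from

-- Stop processing with a message only.
err :: String -> Err a
err msg = Err (const (Left msg))

-- Stop processing with a message, followed by the source slices
-- that the given spans refer to.
errs :: [Span] -> String -> Err a
errs spans msg = Err $ \src ->
  Left (unlines (msg : [ "  " ++ slice sp src | sp <- spans ]))
```

For example, runErr (errs [Span 0 5] "type error in") "x + y .=. y" gives Left "type error in\n  x + y\n". The analysis code only passes spans around; the input string is supplied once, when the computation is run.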

A.3 Hiding Annotations by Pattern Synonyms created automatically

We now need to carry around the source spans (extra syntactical information) without destroying legacy code (for semantics). For example, Cyp has this type (not all constructors shown)

    data AbsTerm a
      = TermHole
      | Application (AbsTerm a) (AbsTerm a)
      | Const Name
      | ...

With source spans, it would be

    type AbsTerm a = AbsTermP (Maybe Span) a

    data AbsTermP p a
      = TermHoleP p
      | ApplicationP p (AbsTermP p a) (AbsTermP p a)
      | ConstP p Name
      | ...

but now all pattern matches on AbsTerm a would be broken by the extra argument. Perhaps we should have written each constructor with named arguments, but we’d have to touch all code, and the result would be questionable, since constructor application would then look much less like the function application that it really is.
GHC Haskell has “pattern synonyms” [16] to solve exactly this problem: We can write

    pattern Application :: AbsTerm a -> AbsTerm a -> AbsTerm a
    pattern Application f a <- ApplicationP _ f a
      where Application = ApplicationP Nothing

and similarly for the other constructors. In fact, haskell-src-exts has an AST type like AbsTermP (with source span), and haskell-src-exts-simple provides pattern synonyms without.
We are using exactly that idea, made easier by the following tool: for all AST types (Terms, Types, Proofs), we derive the annotated type, and the synonyms, from the “plain” definition (original AbsTerm in the example) by a Template Haskell program (it creates extra source code, in memory, during compilation, resulting in actual object code). This creates the interesting situation that Cyp now has modules (containing the original AST definitions) that only need to be compiled so that they can be analyzed when Template Haskell runs, but that will not be part of the executable.
In all, this design allows us to keep intact all Cyp code that pattern-matches on ASTs. Still, the extra source-span information is available when we need it (not by pattern matching, but by calling mspan).

A.4 Generic Traversal for Matching Holes

We mention another example of generic programming in Cyp: the matching of a student’s submission with an instructor’s blueprint (with the holes) is done by a generic traversal. The basic idea for matching term t against blueprint b is: if b is a hole, we accept; and if b is not a hole, we check that the roots of b and t are identical, and we match corresponding children recursively.
While original Cyp had type-specific matchers (and pretty-printers), we use the capabilities of Data.Data for run-time reflection on (names of) constructors. We still use Haskell’s static typing, so we can be sure that b and t are always of the same type, and have the same number of children when we recurse.
We need to adapt this basic idea for (proof) holes that may be replaced by a list of things. If we allow this in full generality, we need to decide, for example, whether

    [ ... , b1, ..., b2, ... ] matches [ t1, t2, t3, t4, t5, t6 ]

Now, a given b matches t_i for what i? This is similar to the longest common subsequence problem, and can be solved similarly, in polynomial time, but it requires search. If the search fails, what should we print as an error message? So we are inclined to severely restrict multi-holes in lists, allowing them to occur at most once.
Our implementation uses code originally written by Bertram Felgenhauer for Autotool’s Haskell Blueprint exercise. It uses a monad M for combining results from comparing subtrees. A computation in M a can succeed, fail, or decline.

Failure is final (it ends all processing). Declination leads to the next alternative (of msum in the following) to be tried. There are two essential combinators:
– sequential composition ((>>=)): if left-hand side succeeds, evaluate right-hand side
– alternative composition (mplus): if left-hand side declines, evaluate right-hand side.
The monad also holds a state of the pair of last locations that we have seen (from the root). This is useful: not every node has location info, and for them, we will use the source span of the parent node.
The main function is

    match :: forall a b . (Data a, Data b) => a -> b -> Match b
    match x y = do
      msum
        [ processSpanSub x y
        ...
        , matchList @ParseLemma (cast x) (cast y)
        , matchHole @RawTerm (cast x) (cast y)
        , matchHole @ParseProof (cast x) (cast y)
        ...
        , if toConstr x == toConstr y
            then void $ gzipWithM' match x y else continue
        , failLoc $ "abstract syntax sub-trees of type "
            <> (text $ dataTypeName $ dataTypeOf x)
            <> " have different roots: "
            <> (text $ show $ toConstr x) <> ", " <> (text $ show $ toConstr y)
        ]
      return y

Here, we are using generic helper functions ( matchHole ) instantiated at certain types ( @RawTerm@RawTerm