Automatized Evaluation of Formalization Exercises in Mathematics
Merlin Carl
July 10, 2020
Learning the correct use of mathematical language frequently poses a challenge for beginner students. At the same time, it is a basic skill, required both for understanding mathematical texts and for presenting one’s own work.

In mathematical lectures and typical textbooks, this is rarely discussed explicitly, though some offer a brief discussion, along with some formalization exercises (see, e.g., [Ha]).

In this note, we present two pieces of software that aim to support beginner students in learning the use of formal language.

The first one, called “Math Dictations” (a term that we learned from M. Junk, who used the concept, but no automatization thereof, in introductory courses at the University of Konstanz), challenges students to translate a proposition given in natural language, such as “the real function f is strictly increasing”, into a quantifier formula such as ∀x∀y(x < y → f(x) < f(y)). It is similar to the formalization exercises that form part of the “Mathematical Logic Tutor” by A. Moreno (see [BM]), but goes beyond this in (i) allowing first-order logic rather than propositional logic and (ii) using a restricted automated theorem prover for evaluating solutions, so that many solutions, rather than a single one, are recognized as correct answers. After the “Math Dictations” had been implemented and the first version of this article had been posted, we were made aware of the fact that this kind of formalization exercise is available in the Edukera system (see [Edukera]). The Edukera formalization exercises work with an automated theorem prover (ATP) in the background in full first-order logic and offer various contexts for exercises, among them also real functions with inequalities. Our description of “Math Dictations” should thus not be seen as a claim to priority of the concept, but serves as an explanation of the program and in particular the syntax of the input language.
However, we point out two differences between the “Math Dictations” and the formalization exercises in Edukera: First, the input in Edukera leaves little room for entering formulas that are not well-formed, while the “Math Dictations” allow free input. Thus, in Edukera, there is more guidance for the user, while the “Math Dictations” offer more opportunities to make mistakes. Didactically, both approaches may well have complementary advantages and disadvantages. Second, while Edukera only returns “right-or-wrong” feedback, the “Math Dictations” differentiate between (i) correct solutions, (ii) inputs that are necessary, but not sufficient, (iii) inputs that are sufficient, but not necessary, and (iv) inputs that are neither necessary nor sufficient for the condition in question, which may help the user in improving a solution.

The second one, which, with a bow to the legacy of J. Conway and his “Game of Life”, we call “Game of Def”, has exercises that ask students to give descriptions of graphically depicted sets in a specified logical language with words such as “right”, “above”, “neighbour” or “equal distance”.

Both programs are written in Prolog and form a part of the Diproche system, which is a proof checker for natural language proofs specifically adapted to the area of beginner exercises. The Diproche system is modelled on the Naproche system due to P. Koepke, B. Schroeder, M. Cramer and others (see, e.g., [Cr1] or [CFKKSV]). The current Diproche version covers the topics of propositional calculus, Boolean set theory, sets and functions, elementary number theory, induction proofs and axiomatic geometry. Presentations of the checking mechanism and further components of Diproche can be found in [CK] or [C].

The idea of “math dictations” is simple: The student is given a natural language expression, which she or he is then to translate into a quantifier formula. The quantifier formula is then checked for correctness. As mentioned above, we first learned this concept from M.
Junk in Konstanz.

The automatization is rather straightforward: A dictation problem (Id,Nat,Formal,FreeVars) consists of an identifier Id, a natural language sentence (i.e., a string) Nat, a list Formal of formal expressions in the internal Prolog list format and a list FreeVars of free variables that should occur in a solution. Here, Formal is a list of possible formalizations of the sentence given in Nat. The reason we use a […]

Footnote: For the “Game of Life”, see, e.g., https://bitstorm.org/gameoflife/ .

• Small Latin letters are used for variables and constants; both variables and constants are terms.
• Each natural number (written as a finite string of decimal digits) is a term.
• If a and b are terms and a is not a number, then a(b) is a term which describes the application of a to b (clearly, this only makes sense when a is a function).
• When a and b are terms, then a < b, a ≤ b, a > b, a ≥ b and a = b are formulas.
• When φ and ψ are formulas, then so are (φ&ψ), (φvψ), (φ->ψ), (φ<->ψ) and ∼φ.
• When φ is a formula and x is a small Latin letter, then Ax:φ and Ex:φ are formulas.

All of these expressions have their usual meaning; as a convention, quantifiers range over real numbers. This language is sufficient to express, in the realm of real numbers, statements like the following:

1. Strictly between any two distinct real numbers, there is a third one.
2. f is a strictly increasing function.
3. f has a zero whenever g has a zero.
4. f globally dominates g.
5. f converges to 0.

Thus, this language is already sufficient for a variety of formalization exercises.

In the program, the natural language formulation is displayed to the user, who also has a text window for entering a formula; on clicking the “check” button for the respective program, the checking is initiated and feedback is provided.

The checking works as follows: First, it is checked whether the input is a well-formed formula in which the right free variables appear (i.e., the same ones that appear in the natural language formulation).
If not, an error message is displayed and no further processing takes place. Otherwise, the given expression φ is converted into an internal Prolog list format and a Prolog Tableau prover (as described, e.g., in [Fi]) is used to check, for each ψ among ψ_1, ..., ψ_n belonging to the list Formal in the specification of the problem, whether φ → ψ and whether ψ → φ. If there are i, j ∈ {1, 2, ..., n} such that both φ → ψ_i and ψ_j → φ can be verified, then the input is considered correct and the user is congratulated on solving the problem. If there is i ∈ {1, 2, ..., n} such that φ → ψ_i, but no j ∈ {1, 2, ..., n} with ψ_j → φ, then a message is returned saying that φ is sufficient, but not necessary, and that the condition should be loosened. If there is i ∈ {1, 2, ..., n} such that ψ_i → φ, but no j ∈ {1, 2, ..., n} with φ → ψ_j, then a message is returned saying that φ is necessary, but not sufficient, and that the input should be made more restrictive. If there is neither such an i nor such a j, the user is told that φ is neither sufficient nor necessary and that she or he should try again.

Of course, the Tableau prover needs to be restricted in some way: First, due to the undecidability of first-order logic, the checking might not terminate. Second, logical equivalence is a rather poor criterion for the adequacy of a formalization. To take an extreme example, we should certainly not accept the statement of Fermat’s last theorem as a formalization of example (1) claiming the density of the real numbers, just because both are provable! In our case, propositional equivalence is accepted without restriction, but the number of instantiations of universally quantified statements that can be used is restricted to 3.

Math dictations as above only give a “right” or “wrong” answer, differentiated only by “sufficient” and “necessary”. This is of little help in refining a wrong solution.
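To make this limitation concrete, note that the verdict of a math dictation depends on just two yes/no facts about the input. The following Python sketch is only an illustration and not the system's Prolog code: the names `verdict`, `proves` and `toy_proves` are hypothetical, and the toy stand-in for the prover (a formula is represented by the set of situations in which it holds, and an implication counts as verified when one set is included in the other) is an assumption made purely so that the example runs.

```python
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")  # type of formulas; left abstract in this sketch


def verdict(phi: F, targets: Sequence[F], proves: Callable[[F, F], bool]) -> str:
    """Classify the input phi against the accepted formalizations in `targets`.

    `proves(a, b)` is a hypothetical stand-in for the restricted prover:
    it should return True when the implication a -> b could be verified.
    """
    sufficient = any(proves(phi, psi) for psi in targets)  # phi -> psi_i for some i
    necessary = any(proves(psi, phi) for psi in targets)   # psi_j -> phi for some j
    if sufficient and necessary:
        return "correct"
    if sufficient:
        return "sufficient, but not necessary"
    if necessary:
        return "necessary, but not sufficient"
    return "neither sufficient nor necessary"


def toy_proves(a: frozenset, b: frozenset) -> bool:
    """Toy prover (assumption): a -> b is verified exactly when a is a subset of b."""
    return a <= b


target = frozenset({1, 2})
print(verdict(frozenset({1, 2}), [target], toy_proves))     # correct
print(verdict(frozenset({1}), [target], toy_proves))        # sufficient, but not necessary
print(verdict(frozenset({1, 2, 3}), [target], toy_proves))  # necessary, but not sufficient
print(verdict(frozenset({3}), [target], toy_proves))        # neither sufficient nor necessary
```

Whatever the input, the user sees only one of these four outcomes.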
It would be better if one could see what one actually defined, in contrast to what one was supposed to define. A good teacher could respond by giving examples that match the given solution but are not intended or that are wrongly not covered by an attempted formalization. However, automating this in general is quite difficult. For this reason, the “Game of Def” was designed.

Footnote on the Tableau prover: Unfortunately, the current version of the Tableau prover has a bug. It will be corrected soon.

Footnote on the bound of 3 instantiations: This value was not chosen for any particular reason, but experience so far shows that it is sufficient for all cases attempted and does not yield unacceptably long running times.

The language L_GD accepted by the system is as follows:

• Small Latin letters denote variables and constants.
• When a, b, x, y are variables or constants, then rechts(a,b), links(a,b), ueber(a,b), unter(a,b), nachbar(a,b) and dist(a,b)=dist(x,y) are formulas. (The meaning of these German terms will be explained below when we specify the semantics.)
• When φ and ψ are formulas, then (φ&ψ), (φvψ), (φ->ψ), (φ<->ψ) and ∼φ are formulas.
• When φ is a formula and x is a small Latin letter, then Ex:φ and Ax:φ are formulas.

This syntax is adhered to strictly. No omission of brackets, e.g. by priority rules, and no addition of extra brackets etc. is allowed. Though it would not be difficult to loosen these rules somewhat, the strictness is in line with the didactical goal of helping students get used to expressing themselves within the bounds of a formalism. The somewhat odd notation for the existential and universal quantifiers and the logical junctors is due to the implementation in Prolog. An improved interface with a more appealing input format is certainly desirable, though it should be kept in mind that beginners should not be expected to be familiar with LaTeX.

The semantics now works as follows: The domain on which the game is played is a 21 × 21 grid G, with the middle marked with “u”.
Variables and constants refer to squares in this grid. Then:

• a = b means that a and b denote the same square.
• rechts(a,b) means that the square b is somewhere to the right of, but in the same row as, a; i.e., if one were to use coordinates (which the game syntax does not), we would say that the x-coordinate of b is larger than that of a, while the y-coordinates agree.
• links(a,b) means that the square b is somewhere to the left of, but in the same row as, a.
• ueber(a,b) means that the square b is somewhere above, but in the same column as, a.
• unter(a,b) means that the square b is somewhere below, but in the same column as, a.
• nachbar(a,b) means that a and b are neighbours, i.e. share exactly one common border line. In coordinates, that means that they have one common coordinate, while they differ by 1 in the other.
• dist(a,b)=dist(x,y) means that a and b lie in the same row or column, that x and y lie in the same row or column, and that the distance from a to b is the same as the distance from x to y.

Footnote on the strict syntax: As it turns out, some of the advanced levels also raised the interest of advanced mathematicians, who took them as a kind of puzzle game. If this interest persists, loosening the syntactic rules will be reconsidered.

Junctors and quantifiers have their usual meaning; note that universal and existential quantifiers only quantify over squares in the grid, not some infinite extension thereof. Thus, there are squares with no right neighbours etc. Formulas that contain more than 2 nested quantifiers are accepted syntactically, but their semantic evaluation, which is based on an exhaustive search whenever nested quantifiers are involved, takes too long for all practical purposes. Thus, nesting more than two quantifiers should be avoided and is also not required for any solution.

The “Game of Def” now works as follows: In each exercise, one is given an image of the grid, with some squares marked yellow.
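Before turning to the details of the exercises, the semantics specified above can be illustrated by a small executable sketch. It is written in Python purely for illustration and is not the system's Prolog implementation. Assumptions of the sketch: squares are represented as coordinate pairs (x, y) with y growing upwards, the grid is square with side length `SIZE`, and the helper names `aligned`, `same_dist` and `described_set` are hypothetical. Quantifiers are evaluated by exhaustive search over the finite grid.

```python
from itertools import product

SIZE = 21  # assumed side length of the (assumed square) grid
GRID = list(product(range(SIZE), repeat=2))  # squares as (x, y) pairs


def rechts(a, b):
    """b is to the right of a, in the same row."""
    return a[1] == b[1] and b[0] > a[0]


def links(a, b):
    """b is to the left of a, in the same row."""
    return a[1] == b[1] and b[0] < a[0]


def ueber(a, b):
    """b is above a, in the same column (assumes y grows upwards)."""
    return a[0] == b[0] and b[1] > a[1]


def unter(a, b):
    """b is below a, in the same column (assumes y grows upwards)."""
    return a[0] == b[0] and b[1] < a[1]


def nachbar(a, b):
    """One coordinate agrees and the other differs by exactly 1."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def aligned(a, b):
    """a and b lie in the same row or column."""
    return a[0] == b[0] or a[1] == b[1]


def same_dist(a, b, x, y):
    """Models dist(a,b)=dist(x,y): both pairs aligned, equal distances."""
    def d(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])
    return aligned(a, b) and aligned(x, y) and d(a, b) == d(x, y)


def described_set(phi):
    """The set of all squares x in the grid for which phi(x) holds."""
    return {s for s in GRID if phi(s)}


u = (SIZE // 2, SIZE // 2)  # the middle square

# ~Ex... in game syntax: ~Ey:rechts(x,y), "x has no square to its right"
no_right = described_set(lambda x: not any(rechts(x, y) for y in GRID))
# nachbar(u,x), "x is a neighbour of u"
around_u = described_set(lambda x: nachbar(u, x))
print(len(no_right), len(around_u))
```

Because quantifiers range only over the grid, the first formula describes exactly the rightmost column. Each additional nested quantifier multiplies the search by the number of squares, which is consistent with the remark above that deeply nested formulas take too long to evaluate.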
Some of the squares may be labeled by letters, which means that those letters are constant letters that can be used as parameters. In addition, one is given an informal description of the set Y of yellow squares in natural language (currently German). The task is then to write down an L_GD-formula φ(x) with exactly one free variable x (the choice of the variable is up to the user, with the only restriction that constant letters used in the exercise description cannot be used) such that {x ∈ G : φ(x)} = Y.

Users can write a string into an input window and press the “check” button. If the input is not an L_GD-formula or does not have exactly one free variable, an error message is displayed and no further processing takes place. Otherwise, let us denote by φ(x) the input formula and by U_φ the set described by it. The system then does the following:

• Squares in U_φ ∩ Y are colored green.
• Squares in U_φ \ Y are colored red.
• Squares in Y \ U_φ remain yellow.

Furthermore, the user receives the following text feedback:

• When Y = U_φ, she or he is congratulated that the solution is correct.
• When Y ⊊ U_φ, a message is returned saying that the given condition is necessary, but not sufficient, and that further restrictions should be imposed.
• When U_φ ⊊ Y, a message is returned saying that the given condition is sufficient, but not necessary, and that it should be made more inclusive.
• When none of the above cases holds, the user is told to try again.

Here is an example of an exercise with the feedback as it is returned to the user:

[Figure: example exercise with feedback]

The interested reader may now want to entertain her- or himself with the following exercises, which are part of the current version of the system:

[Figures: Problem 1 to Problem 12]

Further Work
Clearly, the possibilities of using automated theorem provers and truth predicate evaluation in supporting formalization exercises are endless. In particular, it is easy to extend the syntax of the math dictation program to cover other areas of mathematics, like number theory or geometry. Concerning the Game of Def, it would be desirable to lift the limit on the number of nested quantifiers by improving the running time of the evaluation algorithm.

There is a more general topic in the background here, which we plan to take up in future work: namely, to look systematically for theories that are both simple in terms of model theory and complexity theory (o-minimality, quantifier elimination and decidability (see, e.g., [Ma]) seem to be particularly relevant properties) and didactically suitable in that their realm of objects is either known to or easy to explain to beginner students and that they allow for many non-trivial, but realistically solvable formalization exercises, preferably those with a visualizable aspect. The theories of Presburger arithmetic and real closed fields may be suitable candidates, provided that the complexity issues (Presburger arithmetic has a double-exponential lower time bound on a decision algorithm, see [FR]; however, the situation is considerably less bad in the case of real closed fields, see, e.g., [Gr]) turn out to be irrelevant for the intended application (simple formalization exercises). We hope for a stimulating interaction of mathematical logic (in particular model theory), computer science and the didactics of mathematics.

References

[C] M. Carl. Using Automated Theorem Provers for Mistake Diagnosis in the Didactics of Mathematics. arXiv:2002.05083v1 (2020)
[CK] M. Carl, R. Krapf. Das Diproche-System – ein automatisierter Tutor für den Einstieg ins Beweisen. Submitted (2019)
[Cr1] M. Cramer. Proof-checking mathematical texts in controlled natural language. PhD thesis (2013)
[CFKKSV] M. Cramer, B. Fisseni, P. Koepke, D. Kühlwein, B. Schröder and J. Veldman. The Naproche Project – Controlled Natural Language Proof Checking of Mathematical Texts. Proceedings of the Controlled Natural Language (CNL) Workshop (2009)
[BM] N. Budesca, A. Moreno. Mathematical Logic Tutor - Propositional Calculus. Available online: (2000)
[Edukera] Edukera Homepage.
[Fi] M. Fitting. First-Order Logic and Automated Theorem Proving. Springer New York (1996)
[FR] M. Fischer, M. Rabin. Super-Exponential Complexity of Presburger Arithmetic. In: Caviness B.F., Johnson J.R. (eds) Quantifier Elimination and Cylindrical Algebraic Decomposition. Texts and Monographs in Symbolic Computation (A Series of the Research Institute for Symbolic Computation, Johannes-Kepler-University, Linz, Austria). Springer, Vienna (1998)
[Gr] D. Grigor’ev. Complexity of Deciding the First-Order Theory of Real Closed Fields. Journal of Soviet Mathematics, vol. 55 (1991)
[Ha] R. Hammack. Book of Proof. Available online: