Practical Online Assessment of Mathematical Proof
Bickerton, R. and Sangwin, C.J.
April 2020
Abstract
We discuss a practical method for assessing mathematical proof online. We examine the use of faded worked examples and reading comprehension questions to understand proof. By breaking down a given proof, we formulate a checklist that can be used to generate comprehension questions which can be assessed automatically online. We then provide some preliminary results of deploying such questions.
Mathematical proof is central to the discipline of mathematics, indeed it is a hallmark which differentiates mathematics from other subjects. Mathematical proof is also very difficult to learn. At the start of a university course, where proof becomes more complex and more important, students are commonly taught through an explicit introduction to proof course. A typical mode of teaching is to use lectures to introduce particular types of proof, and then have students solve problems and prove conjectures of a similar type. Ultimately, students will be expected to select the type of proof, and correctly write a complete proof of a conjecture which will be assessed.
Assessment is a very expensive use of staff time and it is increasingly common for universities to use online assessment systems, particularly to support methods-based parts of the syllabus [23]. Online assessment of complete arguments, including proof, is much more difficult than assessment of a final answer. That said, it is not clear that students are best-served by the traditional mode of teaching, particularly when regular and detailed feedback on their written work is not available. Instead, we are investigating alternatives. In particular, we consider the following practical approaches to developing online assessments designed to support formative assessments in proof-based courses.
1. Faded worked examples.
2. Explicit assessment of separated concerns.
3. Reading comprehension.
The last topic is the largest, with a number of separate aspects. These include (i) preparation for a proof, (ii) reading comprehension of a particular proof, and (iii) proof followup. This, we believe, will better serve students than the existing common practice of demonstrating a whole proof and then expecting students to imitate something very similar.
Prior research of traditional assessment has generated useful insight into students’ understanding of proof.
[3] suggests students’ problems include content knowledge [16]; overall strategy [28] and an over-reliance on inappropriate argument forms [4]. Reading a proof, per se, is not an active process. Learning needs carefully structured, conscious activity with effort. Proof comprehension tasks provide such activity and are likely to help students focus their attention. [5] found that students who were trained to provide self-explanations performed better on a proof comprehension test than those who were not. However, [5] suggested that making key ideas visible, e.g. with explicit layout, is not as important as having students engage with a proof to make sure important aspects in the proof become clear to them. [5] has shown that the self-explanation strategy substantially improves students’ comprehension of mathematical proofs. Asking students for warrants, or missing steps, appears to require very similar activities on the part of the student.
Previous research that compared the performance of experts and non-experts has revealed some counter-intuitive results. For example, the expertise reversal effect refers to the finding that instruction techniques can have opposite levels of effectiveness with expert and non-expert learners [8, 9]. Related is the worked-example effect, which refers to the observation that novices benefit more from studying worked examples than from independent problem-solving [22]. Both of these findings have implications for how we choose to introduce students to the complex task of writing mathematical proofs.
This is a practical paper, and so we have to consider the tools we have available to us to assess students’ answers online. Our ultimate goal is to help students write a complete proof of a conjecture. We cannot currently automatically assess students’ free-form proof, indeed this looks like a particularly difficult task which will remain unsolved in the foreseeable future.
That said, we give a strong preference, where possible, to asking students to provide an answer of their own rather than using multiple choice questions (MCQ) or similar question types. Some issues associated with the differences between multiple choice and constructed response were discussed in detail in [25].
What tools do we have available to us? We assume we have an online assessment system which is capable of accepting algebraic expressions from a student as an answer and establishing objective mathematical properties of those answers. Such a system is commonplace [23]. We also assume we are able to automatically assess algebraic derivations, such as reasoning by equivalence. This technology is less common than accepting a final answer, but is increasingly being used in calculus and algebra courses. A specific example is given in Figure 2 in which the algebra of the induction step is entered free-form by the student and automatically assessed. For more details of this technology see [24].
Multiple choice questions will certainly have their place in online assessment associated with proof, but they are very difficult to write and normally require testing and refining. We are aware of software which will assess students’ short answers, typically consisting of single sentences [1]. We have not, yet, used the “pattern match” question type developed by [1], but the kinds of questions we would like to ask might be automatically assessed with this technology. Note that the pattern match question type needs a data set, and requires expertise from staff to implement patterns to be used. So, these questions may prove to be as “expensive” to develop as MCQ.
According to [22] a “worked example” consists of three components: a problem formulation, the solution steps, and the final solution itself.
“Classic faded worked examples” refers to a progressive sequence of worked examples in which steps within a worked example are systematically removed, requiring students to take increasing responsibility for completing the problem. The use of faded worked examples is a form of scaffolding, and there are many choices of what can be faded. For example, removing steps from the end of the problem, i.e. first removing the last step, has been found to be most favourable for learning [10]. For supporting cognitive skill acquisition in well-structured domains, [22] found that it is useful to use classic faded worked examples before starting to solve problems independently. This is particularly useful where there is a “model worked solution” which a student is expected to learn.
This technique can be applied when teaching particular types of proof, e.g. when teaching proof by induction. However, this form of scaffolding intentionally removes much of the decision making from students. That is to say, the decision to provide a template removes the responsibility from the student to decide to use a proof by induction. This will reduce the cognitive load on a student, which might prove very helpful in an early formative situation.
Figure 1 shows an induction question in the STACK online assessment system, in which there is a lot of scaffolding. The left hand figure shows the blank question initially presented to the student. Note that one of the input boxes contains a “syntax hint” to reduce the difficulty of turning sigma notation such as $\sum_{k=1}^{n} k^2$ into a linear typed expression sum(k^2,k,1,n). The right hand figure shows a completed student response, without any feedback about correctness. Typed expressions are displayed to the student in traditional notation, before they submit any response.
Here, almost all decisions have been made for the student, and the structure of the proof is essentially complete.
In this case the student is only responsible for some of the algebraic steps, and for justifying one step with a multiple choice option. It is, within current online assessment technology, possible to assess line-by-line algebraic working. For this type of proof by induction, a student needs to know that they should be aiming to algebraically manipulate the formula to make it clear we have the right hand side of $P(n+1)$ when written as a function of $n+1$. In Figure 2, the student can take any algebraic steps they please, and the software will establish (i) algebraic equivalence of adjacent steps, and (ii) that the last step is written in the correct form needed to show $P(n+1)$ really does hold. While the ultimate goal of most university proof courses would be to have the student write the whole proof, we believe there is real merit in practice in more structured situations for novice students.
Figure 1: An induction problem with scaffolding, blank (left) and with a student’s answers (right)
A mathematical proof is typically a rather complex mix of formal statements, definitions, logical relationships and calculations. In a faded worked example, a partial worked solution is provided, which students are expected to complete. An alternative is to provide explicit practice of specific concerns which will very shortly occur in a proof. Much of this practice can be done online, even when students will be writing proofs on paper in a traditional way.
For example, novice students typically struggle with $\sum$-notation in fundamentally mathematical ways. More specifically, students confuse and confound $\sum_{k=1}^{n} n$ with $\sum_{k=1}^{n} k$, or they do not notice the differences between $\sum_{k=1}^{n} a_k$, $\sum_{k=0}^{n} a_k$, $\sum_{k=1}^{n+1} a_k$ and $\sum_{k=0}^{n+1} a_k$, where the action takes place in the limits of the summation.
A further problem is a general confusion of the status of local and global variables, and a lack of confidence over whether $\sum_{k=1}^{n} a_k = \sum_{m=1}^{n} a_m = \sum_{m=0}^{n-1} a_{m+1}$.
Other examples of specific concerns which can be separated include the following.
• Negation of logical statements, including quantified statements, e.g. those with $\forall$ or $\exists$.
• Re-writing expressions to make $n+1$ the variable in proof by induction, including just writing $P(n+1)$.
• Algebraic manipulation of expressions with inequalities and absolute values in preparation for $\epsilon/\delta$-arguments in analysis. “Secret working”, e.g. reverse engineering the $\epsilon/\delta$-argument.
Figure 2: An induction problem in which students are responsible for the algebraic decision making
An alternative to using extensive scaffolding in examples, such as those shown in Figure 1, is to explicitly separate concerns. Separating concerns refers to explicitly identifying, teaching and assessing specific topics in relative isolation in anticipation of their imminent need. Indeed, [18] found that mathematics did not appear to be the most dominant factor affecting reading comprehension. Instead, the use of symbols was more relevant, suggesting explicit attention to symbolism should be given particularly when any new symbolism is introduced.
An obvious danger with separated concerns is the lack of obvious motivation for a topic, or particular type of calculation. Indeed, one difficulty is knowing which concerns to separate in preparation for a particular topic.
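
Two of the separated concerns listed in this section, summation limits and re-writing an expression in terms of $n+1$, lend themselves to a quick mechanical check. The following is a minimal sketch in Python and is not how STACK or any particular assessment system establishes equivalence. The helper `agree_on_samples`, the sequence `a`, and the choice of the statement $P(n)$: $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$ are all assumptions made for illustration. Agreement on sample values is evidence rather than proof in general, although two polynomials of degree at most 3 which agree at 19 points must be identical.

```python
from fractions import Fraction


def agree_on_samples(lhs, rhs, samples):
    """Return True if lhs(n) == rhs(n) for every n in samples."""
    return all(lhs(n) == rhs(n) for n in samples)


def a(k):
    """A hypothetical sequence a_k, chosen only for illustration."""
    return Fraction(1, k + 1)


def sum_k(n):
    # sum_{k=1}^{n} a_k
    return sum(a(k) for k in range(1, n + 1))


def sum_shifted(n):
    # sum_{m=0}^{n-1} a_{m+1}: dummy variable renamed and index shifted
    return sum(a(m + 1) for m in range(0, n))


def sum_of_n(n):
    # sum_{k=1}^{n} n: the constant n added n times
    return sum(n for _ in range(1, n + 1))


def sum_of_k(n):
    # sum_{k=1}^{n} k
    return sum(k for k in range(1, n + 1))


def closed_form(n):
    # Assumed right-hand side of P(n): n(n+1)(2n+1)/6
    return Fraction(n * (n + 1) * (2 * n + 1), 6)


def induction_step(n):
    # Right-hand side of P(n) plus the next term (n+1)^2
    return closed_form(n) + (n + 1) ** 2


def target(n):
    # Right-hand side of P(n+1), written as a function of n+1
    return closed_form(n + 1)


samples = range(1, 20)
print(agree_on_samples(sum_k, sum_shifted, samples))      # True
print(agree_on_samples(sum_of_n, sum_of_k, samples))      # False
print(agree_on_samples(induction_step, target, samples))  # True
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding, so a reported disagreement reflects a genuine difference between the two expressions.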
Having names for things is important, especially when talking about them. For example, the word “ansatz” is widely used as “an educated guess or an additional assumption made to help solve a problem, and which is later verified to be part of the solution by its results” (Wikipedia). Many simple proofs by contradiction could be a contrapositive instead, see [11] for a discussion. Having a word “contrapositive” is very helpful when discussing such proofs, and explaining to students the difference between contradicting the hypothesis, and a general external contradiction.
Mathematical theorems can be divided into two classes: specific and general. Specific theorems concern one object, or a unique situation. For example, the following classic proofs are all specific: (i) there are infinitely many prime numbers, (ii) the real numbers are uncountable, and (iii) $\sqrt{2} \notin \mathbb{Q}$. General theorems have hypotheses which a range of examples do/do not satisfy, e.g.
Theorem 4.1. If $(a_n)$ is a bounded and increasing sequence then $\lim_{n \to \infty} a_n$ exists.
In the classic proof that there are infinitely many primes, we suppose the primes are $p_1, \cdots, p_n$ and then consider $n = p_1 \times \cdots \times p_n + 1$. We then establish properties of $n$ which lead to a contradiction. There does not appear to be a commonly used general name for such objects. In this paper we will use the word gadget, or proof-gadget for emphasis, for a particular object constructed as a device within a proof, built to establish that certain conditions must hold. The word gadget is often used to refer to a device, e.g. mechanical or electrical, with ingenious, novel and practical aspects.
Similarly, some methods and techniques make use of what [19] termed facilitator objects. For example, in an $\epsilon/\delta$-argument the goal is to show that $\forall \epsilon > 0 \; \exists \delta : \cdots$. Essentially, the proof must start “Let $\epsilon > 0$ and take $\delta = f(\epsilon)$, then...”. We consider defining the facilitator object $f$ in $\delta = f(\epsilon)$ to be a particular form of proof-gadget, worthy of the separate name.
While the proof is typically presented in a finished form, to write the proof the function $f$ has to be found by reverse-engineering, using hidden working. For proofs and techniques which make use of hidden working the teacher could choose to separate this concern as an explicit exercise, prior to the formal proof being written.
It is common to talk about steps in the proof in a somewhat loose sense. A mathematical proof is written as an ordered sequence of statements. English sentences contain one or more statements, and statements can be highly abbreviated by containing mathematical symbols. For example, the symbol = was consciously introduced as a synonym “to avoid the tedious repetition of the words ‘is equals to’.” [21]. Nearly all modern notation abbreviates, and nearly all notation can be read as part of a sentence. This does not help us understand what a mathematical “step” might be.
A commonly used model for general arguments was devised by Toulmin [27] and used in mathematics education [7]. Toulmin’s scheme has six components. Data (D) is evidence on which the claim is based. The conclusion (C) is the actual claim which is being put forward. The warrant (W) gives the justification for deriving the conclusion from the data. The backing (B) are “other assurances, without which the warrants themselves would posses[sic] neither authority nor currency” [27, p. 96]. A qualifier (Q) gives the degree of confidence in the claim. Lastly, a rebuttal (R) gives circumstances in which the claim might not hold. These items are arranged in what Toulmin calls the “argumentation pattern” shown in Figure 3.
Toulmin’s scheme is not applied to a whole proof, and we do not propose to apply the scheme only to adjacent statements within a proof. Rather we acknowledge that a typical mathematical proof is a nested recursive structure. Toulmin’s scheme will be applied both to adjacent statements and to “blocks” which make up the internal structure of the proof.
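
One way to picture this nested use of Toulmin’s scheme is as a small recursive data structure. The sketch below is an illustration only: the class `Argument`, the function `missing_warrants` and the example proof (that the square of an odd integer is odd) are all hypothetical and do not appear in the sources cited. The qualifier and rebuttal are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    """One application of Toulmin's scheme, possibly containing sub-arguments."""
    data: str
    conclusion: str
    warrant: Optional[str] = None  # None models a warrant the reader must supply
    backing: Optional[str] = None
    parts: List["Argument"] = field(default_factory=list)


def missing_warrants(arg: Argument) -> List[str]:
    """Collect, in reading order, the conclusions whose warrant is not stated."""
    found = [] if arg.warrant is not None else [arg.conclusion]
    for part in arg.parts:
        found.extend(missing_warrants(part))
    return found


# Hypothetical example: a direct proof built from three adjacent steps.
proof = Argument(
    data="n is an odd integer",
    conclusion="n^2 is odd",
    warrant="chain the three steps below",
    backing="legitimacy of direct proof",
    parts=[
        Argument("n is odd", "n = 2k + 1 for some integer k",
                 warrant="definition of odd"),
        Argument("n = 2k + 1", "n^2 = 2(2k^2 + 2k) + 1"),  # algebra left unstated
        Argument("n^2 = 2(2k^2 + 2k) + 1", "n^2 is odd",
                 warrant="definition of odd"),
    ],
)

print(missing_warrants(proof))  # ['n^2 = 2(2k^2 + 2k) + 1']
```

The same structure covers both levels: each inner `Argument` links adjacent statements, while the outer one treats the whole sequence as a single block.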
Consider the following theorem, and its proof.
Theorem 4.2. If $a + b\sqrt{2} = c + d\sqrt{2}$ and $a, b, c, d \in \mathbb{Q}$ then $a = c$ and $b = d$.
Proof. Suppose (for a contradiction) that $b \neq d$. If $a + b\sqrt{2} = c + d\sqrt{2}$ then, rearranging, we have $(a - c) = (d - b)\sqrt{2}$. Dividing gives $\sqrt{2} = \frac{a-c}{d-b} \in \mathbb{Q}$. But [as previously proved] $\sqrt{2} \notin \mathbb{Q}$. This is a contradiction, so $b = d$. Then setting $b = d$ in $a + b\sqrt{2} = c + d\sqrt{2}$ it follows $a = c$.
The following proof has a more structured presentation.
Proof. Assume $a + b\sqrt{2} = c + d\sqrt{2}$ and $a, b, c, d \in \mathbb{Q}$. Then $a + b\sqrt{2} = c + d\sqrt{2} \Leftrightarrow (a - c) = (d - b)\sqrt{2}$.
1. If $b \neq d$ then $\sqrt{2} = \frac{a-c}{d-b}$. Note that since $a, b, c, d \in \mathbb{Q}$ it follows $\frac{a-c}{d-b} \in \mathbb{Q}$. But [as previously proved] $\sqrt{2} \notin \mathbb{Q}$. This contradicts the assumption $b \neq d$.
2. If $b = d$ then $(a - c) = 0$, i.e. $a = c$, and the theorem holds.
The only case which holds is $b = d$ and so $a = c$.
Notice the second proof has the following nested structure.
Equivalence reasoning.
Cases:
• $b \neq d$: Contradiction.
• $b = d$: Direct proof.
The proof contains some direct reasoning by equivalence before exhaustive cases on the equality of $b$ and $d$. The case $b \neq d$ contains a contradiction, so cannot occur, leaving only the case $b = d$. However, the contradiction is to the hypothesis of the sub-proof $b \neq d$. An alternative proof by contradiction of the overall conclusion needs the hypothesis “$a \neq c$ or $b \neq d$” and still requires exhaustive cases.
There is still considerable philosophical discussion about the nature of legitimate mathematical arguments, and how to analyse proof. We will use the phrases “data”, “conclusion” and “warrant” as in the Toulmin scheme. Warrants will apply to adjacent statements, e.g. explicitly using a re-write rule such as $a = b \Leftrightarrow a + x = b + x$ when reasoning by equivalence. Warrants will apply more generally to statements within a proof, e.g. using properties of a definition. Backing will refer to more general properties external to a proof, e.g. the legitimacy of a type of proof, such as the equivalence of a statement and its contrapositive, or to statements it would be inappropriate to justify in detail in this context, such as that every integer is either odd or even. Note that the qualifier and rebuttal are often omitted when discussing formal mathematical arguments, see [7].
Traditional assessment requires students to write proofs of their own, and it appears to us that it is only comparatively rarely that students are asked reading comprehension questions.
In response, [14] identified the following seven aspects of proof comprehension. The first three are local and the last four consider global concerns.
1. Logical status of statements and proof framework. The phrase proof framework was also used by [26].
2. Meaning of terms and statements within the proof. This specifically includes understanding, and being able to use, mathematical definitions [16].
3. Justification of claims.
4. Summarizing via high-level ideas.
5. Identifying the modular structure. Mathematical proof often has a recursive structure, with local arguments within a larger global argument.
6. Transferring the general ideas or methods to another context.
7. Illustrating with examples.
However, these all refer to reading comprehension of a particular proof. In practice, we suspect that a slightly broader view is needed. Indeed, this is acknowledged in the heading “Transferring the general ideas or methods to another context”, which looks outside the current proof.
[29] investigated reading strategies that students can use to productively engage with a proof. Derived from interviews with students he reports the following strategies.
1a Understanding the theorem by rephrasing it in one’s own words
1b Understanding the theorem by expressing it in logical notation
2 Trying to prove a theorem before reading its proof
3 Considering the proof framework used in the proof
4 Partitioning the proof into parts or sub-proofs
5 Checking confusing inferences with examples
6 Comparing the method used in the proof with one’s own approach
[29] claims that a contribution of his paper is “the suggestion that students’ understanding of the proofs that they read can be improved if they can be taught to apply the strategies” he reports. Weber draws an interesting comparison with attempts, such as those of [20], to teach problem-solving heuristics.
Based on the seven aspects of proof comprehension identified by [14], on the strategies reported in [29], our own experience as mathematics researchers and teachers, and other sources we propose a proof understanding baseline checklist, shown in Figure 4.
Unlike a pilot’s pre-flight checklist, the items listed in the proof understanding baseline checklist are not intended to be used precisely in the order they are written. Indeed, we expect that typically a proof will need to be read a number of times [6]. Conversely, in some proofs, the items might be rather trivial and so will not need detailed comment. This is an example of where the expertise reversal effect might come into play. An experienced mathematician will bring examples to mind, will instinctively search for warrants and will be confident using formal definitions. Novices will need prompting, and the proof understanding baseline checklist might be able to help here. One hallmark of expertise is knowing the appropriate level of detail, e.g. being able to highlight the distinctive aspects of a particular proof compared to a typical proof in a particular field of mathematics. Providing comprehensive examples at various stages (hypotheses, counter example to the converse) might only be appropriate once the overall structure has been appreciated. So, the checklist is not intended to be used in a linear fashion by students.
The proof understanding baseline checklist is about a particular proof, and so the checklist consciously neglects some things identified by [14], such as transferring the general ideas to another context, or comparing different proofs.
We conjecture that an efficient way to remember many proofs is to identify the overall modular recursive structure of the proof, and how to make use of any gadgets. A hallmark of competence would be the ability to reconstruct a substantially complete proof from a statement containing just (i) the overall modular recursive structure of the proof, and (ii) definition and brief statement of how to use any gadgets. Indeed, this observation might provide an opportunity for faded worked examples of proofs at a more structural level. That is, developing tasks in which students are provided with a proposed structure and the details of a gadget. Their task would be to complete a proof.
1. Which formal definitions are relevant to the proof?
(a) What specialist notation is used, and what does it mean?
(b) Write out definitions which occur in the hypotheses, conclusion and proof, adopting the current notation.
(c) What examples do you know which do/do not satisfy the definitions?
2. Describe the overall modular recursive structure of the proof. E.g. (i) direct, (ii) definition-chasing, (iii) if and only if, (iv) exhaustive cases, (v) induction, (vi) contrapositive, (vii) contradiction.
(a) Identify each structural part of the proof separately. (e.g. an “if and only if” proof must have both directions, induction must have (i) a clear hypothesis statement, (ii) base case, (iii) induction step, (iv) conclusion).
(b) Recursively apply the proof understanding baseline checklist to each separate sub-part.
3. Hypotheses
(a) Where is each hypothesis used in the proof?
(b) In a general proof, which examples do/do not satisfy the hypotheses? If there is more than one hypothesis, do we have examples which satisfy each logical combination?
4. Is a correct warrant justifying each step in the proof given? If not then provide one.
5. Does the proof make use of previously known theorems or results? If so, what are they and how are they used?
6. Does the proof make use of proof-gadgets? If so, what are they and how are they used?
7. For an if ... then proof, is the converse true or false? Do we have counter-examples?
8. In a general proof, can you follow the proof steps through with a simple specific example, including any proof-gadgets?
Figure 4: Proof understanding baseline checklist
We anticipate that the proof understanding baseline checklist could be used by students independently. However for many proofs, items within the checklist will be rather trivial, or irrelevant. When a teacher creates a set of proof-comprehension questions we expect there will be some craft and artistry in choosing and sequencing the questions which highlight distinctive and interesting features unique to a particular proof.
[14] developed a proof-comprehension framework, and applied this in [15] to develop and validate three multiple choice reading comprehension tests. They argue that their model “could also be helpful for professors who teach advanced mathematics courses.”
Their process involved the following six stages, see [15, Table 1].
1. Generate open-ended items, covering all aspects of the proof-comprehension framework.
2. Conduct pre-test interviews with 12 students, asking them to answer each item generated in stage 1, to generate the choices for multiple-choice tests.
3. Expert review, and re-draft, of items generated after stage 2.
4. Conduct a pilot with a further 12 students, asking them to solve problems and think aloud.
5. Administer the test to a large population (approximately 200 students) to verify that these tests had high internal reliability, to identify problematic items with poor discriminatory power, and to identify items that can be removed to generate shorter multiple-choice tests.
6. Conduct validating interviews with 12 students, asking them to answer each item generated in stage 5, to verify that the final, shorter, multiple-choice tests accurately measured students’ understanding.
This process represents a “gold standard” for test development, but the effort and resources needed are probably only available in a relatively limited range of situations such as in research projects, or for high-stakes national examinations. A similar process is used to develop and validate concept inventories, see [2] and [12].
Typical university teachers do not have the resources available to research projects and normally need to generate items on a week-by-week basis. So, our original goal was to develop a “practical taxonomy of question types” for generating questions which support learning mathematical proof, including reading comprehension. We want to equip a thoughtful, practical, teacher with concrete ways to develop assessments of proof, for use as online assessments, of any proof they might want to teach as a routine part of their teaching. So, our starting point was the following, abbreviated, process.
1. A member of staff takes the proof to be taught and develops a reading comprehension test.
2. The test is taken by students, as part of weekly teaching processes.
3. The teacher evaluates the test, qualitatively and perhaps using some statistics such as item response theory, refines the questions and weeds out some of the poorly performing items.
Note that stage 3 of our process is typically not formally undertaken even with relatively large university groups. Our process is ambitious and can’t possibly result in the kind of quality instruments developed for research by [15]. But, we have good reason to think it could result in a substantial improvement on current practice. The goal of this section is to detail a streamlined, practical process which university teachers can follow in order to quickly create proof comprehension questions for any proof they wish to teach.
Since we expect students to use the proof understanding baseline checklist shown in Figure 4, an obvious starting point is to create questions directly testing aspects of the checklist. For example, related to the logical status of statements and proof framework we can always consider asking the following.
1. What is the type of proof?
2. Identify the lines in the proof in which each hypothesis is used.
3. Provide a justification for a particular statement/step.
4. Identify the lines which play particular roles in the proof structure, e.g. where is the induction hypothesis used in the induction step? E.g. where is the conclusion of the “if then” direction, and hypothesis of the “only if”?
Similarly, we expect to ask about definitions, and examples of objects which may satisfy the definitions.
1. Can students identify the correct definition?
2. Can students choose examples (from a list, say) which satisfy one of the relevant definitions? Can we find examples for which each combination of hypotheses holds or fails to hold? If there are $n$ properties (hypotheses) then there will be $2^n$ examples.
3. Can students illustrate a statement in a particular case? We asked our students to write out $P(3)$ in the induction proof shown in Figure 1. Of the 350 attempts online, 26% of students wrote only an expression for the value of the sum, confusing the equation $P(3)$ with the value of the sum of the series. Illustrating specific statements with an example helps to discover any confusion about the meaning of notation, terms and statements in the proof.
To develop our thinking of practical proof assessment, we set ourselves the goal of writing at least one proof comprehension question sequence each week in a “proofs and problem solving” course. “Proofs and problem solving” is a year 1 course taken by approximately 400 undergraduate students, most of whom are taking mathematics degrees. In this section we illustrate the ideas discussed by recording our development and use of an example proof comprehension exercise for Theorem 4.1. The following is the proof of Theorem 4.1 given in [13, p. 213], the course textbook for our year 1 course.
Proof. Since $(a_n)$ is bounded, the set $S = \{a_n \mid n \in \mathbb{N}\}$ has an upper bound. Hence by the Completeness Axiom for $\mathbb{R}$ (see Chapter 22), $S$ has a least upper bound, which we call $l$. We prove that $l$ is the limit of $(a_n)_{n=1}^{\infty}$. Let $\varepsilon > 0$. Then $l - \varepsilon$ is not an upper bound for the set $S$, so there exists an $N$ such that $a_N \geq l - \varepsilon$. As $(a_n)$ is increasing this implies that $a_n \geq a_N \geq l - \varepsilon$ for all $n \geq N$. Also, $l$ is an upper bound for $S$, so $a_n \leq l$ for all $n$. We conclude that $l - \varepsilon \leq a_n \leq l$ for all $n \geq N$, which means that $|a_n - l| \leq \varepsilon$ for all $n \geq N$. This shows that $a_n \to l$.
Based on our experiences thus far, we suggest the following factors should be considered when turning such a proof into a reading comprehension exercise.
Selecting a Proof
There are many reasons why a theorem might be chosen for a proof comprehension question. Mathematicians do on occasion talk about the importance of particular proofs relative to the discipline as a whole, e.g. [?]. [15] suggested that their proofs were chosen to maximize the utility of the corresponding proof comprehension tests. More specifically we considered the following factors. We require a theorem that is suitable for being taught in our course and is something we would like students to understand. Is the proof particularly important in some absolute cultural sense? For example, speaking personally, we would like our students to remember the proofs of the following theorems (from our course) long after they graduate, (i) proof of infinitely many primes, (ii) Cantor’s diagonal argument and (iii) the proof that $\sqrt{2}$ is irrational. In recalling the proof we would like students to fully understand the details of these proofs. It is interesting to note that [15] selected the first two of these theorems as material for their proof comprehension tests.
Is the proof in a style which is applicable in many situations? For example, our goal is that students should learn induction, contradiction and so on. Perhaps the proof makes use of important definitions, which are applicable in many situations? For example, $\epsilon/\delta$ arguments appear to require concerted effort for most people to learn.
When writing materials for online assessment we are also driven, in part, by practicality. For example, we have a preference for relatively short proofs which will comfortably fit onto a single screen for most users. The types of comprehension questions being asked should also be considered when making such a choice. Not every proof will naturally yield questions that assess all aspects of proof comprehension, indeed if they did we would have far too many questions. For example, specific theorems appear to have fewer opportunities to ask about related examples.
Since a proof can be written in different ways, see [17] for examples, the style of proof should also be considered. Different styles of writing can affect the suitability of the proof for various types of proof comprehension.
Our choice of Theorem 4.1 was made by following these principles. It is a proof that students meet in our course. The proof is short, and the result is useful in a practical sense. The proof involves important definitions, and the structure of this proof is used elsewhere in analysis. Theorem 4.1 is a general theorem, involving a number of definitions, with scope for interesting use of examples.
Since our goal is to write proof comprehension questions that can be marked using an online assessment system, it is useful to separate out the steps of the proof and to number them. This has the effect of simplifying the layout of the proof whilst providing a way that specific steps of the proof can be referred to and marked online. For example, we can re-write the proof of Theorem 4.1 as follows.
Proof.
1. Since the sequence $(a_n)$ is bounded, the set $S = \{a_n \mid n \in \mathbb{N}\}$ has an upper bound.
2. Note that $S$ has a least upper bound, which we call $l$.
3. We prove that $l$ is the limit of $(a_n)_{n=1}^{\infty}$.
4. Let $\varepsilon > 0$; then $l - \varepsilon$ is not an upper bound for the set $S$, so there exists an $N$ such that $a_N \geq l - \varepsilon$.
5. As $(a_n)$ is increasing this implies that $a_n \geq a_N \geq l - \varepsilon$ for all $n \geq N$.
6. Also, $l$ is an upper bound for $S$, so $a_n \leq l$ for all $n$.
7. Thus, $l - \varepsilon \leq a_n \leq l$ for all $n \geq N$, which means that $|a_n - l| \leq \varepsilon$ for all $n \geq N$.
8. Therefore this shows that $a_n \to l$.

Numbered lines allow us to ask students about specific statements in the proof without ambiguity. Writing statements separately has the added advantage that it somewhat compartmentalises the arguments in the proof and makes it easier to pick out specific items that make up the entire argument. If we wish to test students’ understanding of a proof we often mean that we wish to test whether they have understood a more specific idea within the proof. For example in our choice of proof, do we wish to know if a student remembers the technique of transferring the argument from sequences to sets as the primary way to apply the completeness hypothesis? Is this an exercise in using the formal definition of convergence in a general setting, rather than proving a particular sequence converges?

Breaking down the given argument line by line helps us identify the specific items in the proof. It also helps to identify the areas of the proof which are likely to be suitable for online proof comprehension questions. Our process was to go through the proof understanding baseline checklist shown in Figure 4 and pick out specific components from the checklist which applied to the proof of Theorem 4.1.
We then authored a comprehension question relating to each of the relevant components.

For example we may be interested in whether a student recognises the notation for the $\varepsilon$-arguments and if they understand the difference between a result holding “$\forall n$” and “for $n \geq N$”. In addition we may wish to test if a student knows the correct definition for the least upper bound $l$ or if a student can justify why line 2 in the proof holds. We can proceed in this way to build up a selection of potential proof comprehension questions that test the students’ knowledge on different aspects of the proof.

It seems natural to us to start by considering questions relating to the local aspects of the proof, namely the meaning of terms and statements, the logical status of statements and proof framework, and the justification of claims. Perhaps the easiest questions to write are those which test the notation or definitions at play in the proof. In the proof of Theorem 4.1 we have an increasing sequence, a bounded sequence and the least upper bound $l$. We also need to use the definition of convergence of a sequence to a limit. Each of these definitions readily leads to multiple choice questions which can be marked online.

We can proceed to considering questions relating to the logical status of statements in the proof. Again, this provides a wide scope for proof comprehension questions. For Theorem 4.1 we could ask the following example questions.

In which statements do we use the assumption that $(a_n)$ is increasing?
In which statements do we use the assumption that $(a_n)$ is bounded?

However, a teacher may not only be interested in testing students’ understanding of the statements in the proof, but also interested in testing whether students can provide warrants in moving from one statement to the next. These warrants could be contained within the proof or could be external.
For our proof we could ask the following questions.

Why does $S$ have an upper bound in statement 2?
Why can we proceed from statement 4 to statement 5 in the proof?

Once again, these questions may be turned into suitable MCQs. Notice that the second sentence of the original proof starts “Hence by the Completeness Axiom...”, whereas we have deliberately dropped this in our proof to provide an opportunity to ask students for the warrant for this step.

We could also ask questions about steps in a proof which rely on a previous theorem or result. For example, in Theorem 4.1 we could ask what result we must appeal to in statement 2 in order to see that the set $S$ has a least upper bound. Note that questions of this sort are subtly different to those discussing warrants between steps. In this case we are asking the student to provide some backing for the step, following Toulmin’s terminology [27]. Whether questions of this sort are suitable is ultimately a practical teaching decision depending on the context in which the student is seeing the question.

This proof makes use of the proof-gadget $S$. The only way to apply the completeness axiom directly is via a set with an upper bound. In constructing $S$ we turn a sequence into a set of values. The resulting least upper bound turns out to be the limit of the sequence. We could ask students about the motivation for constructing the set $S$ in this way.

Another potentially fruitful area for proof comprehension is to create questions associated with some of the global aspects of the proof. Asking students to illustrate concepts within the proof by providing examples is one such way to test understanding and limitations of the proof.

Theorem 4.1 and its proof make use of four definitions. We potentially have eight situations to consider, covering all possible combinations of the three definitions in the hypotheses and conclusion. The initial goal is to either provide an example which satisfies the combination or to explain why such an example does not exist.
This is shown in Table 1. The first example, $a_n = 1 - 1/n$, is increasing and bounded, and so converges. This exemplifies the theorem. An example which is increasing and bounded, but does not converge, would be a counter example to the theorem. Given our proof, such an example cannot occur, but do all students understand this? If a sequence converges then it is bounded, so two rows in the table correspond to potential counter examples to this separate theorem (Note A). Hence these examples cannot exist. Do students understand the difference in status between a hypothesis which is necessary for the truth of the theorem (bounded), and a hypothesis which is used to enable the proof-gadget to “work”? That is to say, the increasing assumption is used to show the supremum of the proof-gadget $S$ is indeed the limit of the sequence. The remaining combinations, i.e. other rows in the table, can be exemplified.

Increasing?  Bounded?  Converges?  Example
T            T         T           Exemplify theorem: $a_n = 1 - 1/n$
T            T         F           Counter example to theorem!
T            F         T           Note A.
T            F         F           $a_n = n$
F            T         T           $a_n = 1/n$
F            T         F           $a_n = (-1)^n$
F            F         T           Note A.
F            F         F           $a_n = (-n)^n$

Table 1: Example combinations for Theorem 4.1

In a general theorem, examining the table of example combinations provides a potentially rich source of questions for students. Students can be asked to
1. identify which properties a particular example satisfies;
2. provide examples satisfying certain properties;
3. justify why certain examples do/do not exist.
For general theorems this appears to be something which could be done in a potentially systematic way.
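The systematic enumeration suggested here can be sketched in code. The following is a minimal illustration only, not part of the question materials described in this paper: the function names `classify` and `combination_table` are hypothetical, and the only mathematical facts assumed are Theorem 4.1 and the separate theorem that a convergent sequence is bounded (Note A).

```python
from itertools import product


def classify(increasing: bool, bounded: bool, converges: bool) -> str:
    """Say whether a sequence with this combination of properties can exist."""
    if converges and not bounded:
        # Every convergent sequence is bounded (Note A in Table 1).
        return "impossible: convergent sequences are bounded (Note A)"
    if increasing and bounded and not converges:
        # Such a sequence would contradict Theorem 4.1.
        return "impossible: would be a counter example to Theorem 4.1"
    if increasing and bounded and converges:
        return "possible: exemplifies Theorem 4.1"
    return "possible: an example can be given"


def combination_table():
    """List all eight True/False combinations in the row order of Table 1."""
    rows = []
    for inc, bdd, conv in product([True, False], repeat=3):
        rows.append((inc, bdd, conv, classify(inc, bdd, conv)))
    return rows


if __name__ == "__main__":
    for inc, bdd, conv, status in combination_table():
        flags = " ".join("T" if flag else "F" for flag in (inc, bdd, conv))
        print(flags, "-", status)
```

A question author could use such a listing as a checklist: each "possible" row calls for an example, and each "impossible" row calls for a justification naming the theorem that rules it out.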
Linked to the idea of illustrating by examples is the notion of taking techniques used in the proof and applying them to other situations. We note that this is somewhat different than applying the proof to other situations. For example, in our proof we may wish to ask a student if they can use the $\varepsilon$-arguments when applied to a different proof rather than asking for an example of an increasing, bounded sequence. Similarly, we could ask the following questions.

Would the proof still work if we considered a bounded, decreasing sequence? If not, why not?
Could we write an alternative proof in order to prove that a bounded increasing sequence converges and, if so, what are the relative merits of each proof?
What is the statement of the contrapositive of the theorem? Are there other equivalent formulations?

These ideas relate to the ability of students to understand the precise scope of the proof under consideration. Indeed, many proofs are introduced in undergraduate courses because the theorem which is proved is required for the use of several examples. This process often involves the application of the conclusion of the proof, rather than any technique used within the proof. In addition to applying the result of the theorem, we may also be interested to see if students can adapt the methods or structure of the proof itself in a practical way. For example, we may wish to assess if students can run through a general proof on a specific example or if students understand what happens to any proof-gadgets when the proof is applied to a specific example. We could also assess students’ understanding of ways in which a given proof could break. That is, if we apply the proof to an example where the theorem is false, then where does the proof break? E.g. in the classic proof that $\sqrt{2}$ is irrational, what goes wrong when we try to apply the proof to show that $\sqrt{n}$ is irrational for a perfect square $n$? In this manner, we can create potential comprehension questions around the global concerns of the proof.
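As an illustration of running the general proof on a specific example, here is a sketch for the sequence $a_n = 1 - 1/n$. The choice of sequence is an assumption made for this illustration (any increasing, bounded sequence would serve), and the numbering follows the numbered steps of the proof of Theorem 4.1.

```latex
\begin{enumerate}
  \item The set $S = \{\, 1 - \tfrac{1}{n} \mid n \in \mathbb{N} \,\}$ has an upper bound, for example $1$.
  \item The least upper bound of $S$ is $l = 1$.
  \item We check that $1$ is the limit of $(a_n)_{n=1}^{\infty}$.
  \item Let $\varepsilon > 0$. Then $1 - \varepsilon$ is not an upper bound for $S$:
        choosing an integer $N > 1/\varepsilon$ gives
        $a_N = 1 - \tfrac{1}{N} > 1 - \varepsilon$.
  \item As $(a_n)$ is increasing, $a_n \geq a_N > 1 - \varepsilon$ for all $n \geq N$.
  \item Also, $a_n = 1 - \tfrac{1}{n} \leq 1$ for all $n$.
  \item Thus $1 - \varepsilon < a_n \leq 1$, so $|a_n - 1| < \varepsilon$ for all $n \geq N$.
  \item Therefore $a_n \to 1$.
\end{enumerate}
```

In this instance the proof-gadget $S$ becomes a concrete set, and the bare existence of $N$ in step 4 becomes an explicit choice, which is the kind of contrast a comprehension question could ask students to explain.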
A simple proof comprehension question asking five basic questions about the proof of Theorem 4.1 is shown in Figure 5.

We asked the question shown in Figure 5 during semester 2 of the 2019-20 session, to a year 1 group of undergraduate students. The question was included in an online pre-lecture test as part of a flipped-classroom cycle. Students had one attempt at the question, and feedback was provided after the test had closed. We had 344 responses to the question, although not all students answered every part of the question.

(a) In which step of the proof is the Completeness Axiom for $\mathbb{R}$ used? This question was answered correctly (line 2) by 243 (70.64%) and incorrectly by 95 (27.62%). Common incorrect responses were line 1: 56 (16.57%), line 4: 23 (6.80%), and line 5: 6 (1.78%). Line 1 refers to the upper bound of $S$, as does the completeness axiom, and so is understandable as a choice. However, we only use the completeness axiom when we apply the conclusion, i.e. the existence of the least upper bound. Similarly, lines 4 and 6 also refer to upper bounds, suggesting some students are focusing on surface terminology.

(b) In which step of the proof is the assumption that the sequence is bounded first used? This question was answered correctly (line 1) by 261 (75.87%) and incorrectly by 78 (22.67%). Complete results are as follows.

Line  Responses
1     261 (76.99%)
2     19 (5.60%)
3     14 (4.13%)
4     34 (10.03%)
6     11 (3.24%)

We do not have a hypothesis for the incorrect responses to this question, since boundedness of the sequence (rather than the set $S$) is not used anywhere else.

(c) In which step of the proof is the assumption that the sequence is increasing used? This question was answered correctly (line 5) by 303 (89.38%) and incorrectly by 32 (9.30%). Common incorrect answers were line 4: 18 (5.31%) or line 3: 8 (2.36%).
We do not have a hypothesis for the incorrect responses to this question, since the hypothesis that the sequence is increasing is not used anywhere else.

Figure 5: A proof comprehension question

(d) We require the sequence $(a_n)_{n=1}^{\infty}$ to be bounded. What properties does the sequence have?
A. There exists $M$ such that $|a_n| < M$.
B. For all $M$ there exists $n$ such that $|a_n| < M$.
C. Given $\varepsilon > 0$ and $M \in \mathbb{R}$ there is an integer $N$ such that $|a_n - M| < \varepsilon$ for all $n \geq N$.
D. There exists $M$ such that $a_n < M$.
E. Given any $\varepsilon > 0$ there is an integer $N$ such that $\varepsilon < a_n < N$ for all $n \in \mathbb{N}$.

Students were asked to choose all those which applied. 16 students failed to respond at all. Only 85 (24.71%) of students chose the correct two responses (A and D). In addition to the 85 (24.71%) correct selections, students chose the following.

Choice  Correct?  Selected      Not selected
A       Y         133 (38.66%)  109 (31.69%)
B                 43 (12.50%)   199 (57.85%)
C                 197 (57.27%)  45 (13.08%)
D       Y         147 (42.73%)  95 (27.62%)
E                 63 (18.31%)   179 (52.03%)

Hence, only 63.37% of students correctly selected A (the 85 fully correct responses plus the 133 in the table, out of 344), and only 67.44% of students correctly selected D (85 plus 147, out of 344). C looks plausible, but potentially confuses boundedness with convergence (although C is deliberately not the correct definition of convergence). A majority of students chose C.

(e) Which of the following statements are not equivalent to saying the sequence $(a_n)_{n=1}^{\infty}$ converges to $l$?
A. $a_n$ has a limit $l$.
B. $a_n \to n$ as $l \to \infty$.
C. Given an $\varepsilon > 0$ there is an integer $N$ such that $|a_n - l| < \varepsilon$ for all $n \geq N$.
D. $a_n \to l$ as $n \to \infty$.
E. Given an $\varepsilon > 0$ there is an integer $N$ such that $a_n < N$ for all $n \in \mathbb{N}$.

Students were asked to choose all those which applied. 14 students failed to respond at all. In this case, 208 (60.47%) of students chose the correct two responses (B and E). In addition to the 208 (60.47%) correct selections, students chose the following.

Choice  Correct?  Selected     Not selected
A                 52 (15.12%)  70 (20.35%)
B       Y         75 (21.80%)  47 (13.66%)
C                 61 (17.73%)  61 (17.73%)
D                 57 (16.57%)  65 (18.90%)
E       Y         35 (10.17%)  87 (25.29%)

Hence, 82.27% of students correctly selected B (208 plus 75, out of 344), and only 70.64% of students correctly selected E (208 plus 35, out of 344).

We expect pre-lecture test questions to have high success rates, and in many cases our instincts appear to have been validated by the response rates of students. These questions were easy, but still, not all students got the correct answers. In the case of the question shown in Figure 5, one criticism from a colleague which we feel is somewhat legitimate is that a “test-wise” student could use their knowledge of “the game” to guess answers. For example, the correct answer to part (b) (line 1) is merely the first time the word “bounded” is used in the proof. All assessment formats (MCQ, open-ended, reading comprehension) have their own strengths and weaknesses, and some formats have unintended format effects. These results are an honest report of the question in Figure 5, which is just one of our relatively early attempts at developing a sequence of proof comprehension questions. No doubt it can be improved. In the next section we discuss general issues raised by our attempts at developing proof comprehension questions.

We wish to understand to what extent our proof comprehension exercises have been worthwhile: to what extent have they led to a greater understanding of proof in our students? An examination of the students’ most common responses to each question allows us to identify the most common mistakes, and to identify any floor/ceiling effects with impossible/trivial questions. A more thorough long-term evaluation has been disrupted by COVID-19 during March 2020.
Given the sudden increase in the use of online learning and assessments as a response to this global health crisis, and our cautious belief that practical proof comprehension has a modest contribution to make to this, we have chosen to publish these preliminary findings now, rather than wait a full year for experimental data from a more controlled setting.

The STACK online assessment system we are using allows a question setter to provide immediate tailored feedback to a student, based on predefined rules. However, the question writer must predict common student errors and provide suitable feedback for each of these. We have been highly pragmatic in not trying to second-guess too many possible incorrect responses, but rather to examine actual student responses. For example, for part (a) in the question in Figure 5, 16.57% of students incorrectly believed the completeness axiom was used in line 1. Therefore we later added the feedback “Both line 1 and the completeness axiom refer to the upper bound for $S$ however we only use the completeness axiom in line 2 to find the least upper bound.” This gives the students who answered line 1 specific feedback correcting their error after the test, and this feedback will be in place for subsequent years. We can repeat this procedure with all common errors for all questions after the materials have been used. We accept there is a delay in developing this feedback for students who may not (this year) go back to review their responses.

Figure 6: A proof fallacy question

In some questions we asked students to give their reasoning as free-form text. Free-form text answers are not automatically assessed and these responses were not assigned any marks in the quiz. For example, in the question shown in Figure 6 students were asked for the statement in the proof that contained an error. This was the answer that was automatically assessed; however, the students were also asked to explain their reasoning and this explanation was not marked automatically.
Often, students would enter notes that we did not anticipate. For example, when completing the question in Figure 6 one response was that the proof fails because “You cannot take the square root of a negative number.” As a result, we altered our feedback to this question for students who chose line 2. In many cases students had the correct answer but gave incorrect reasoning or vice versa. For this question the most common response was the correct answer (line 3), selected by 47.68% of students. However, other common responses were line 4 (22.60%) and line 2 (16.41%), suggesting students were having difficulty in correctly identifying the error in the proof. The main difficulty encountered by the students here appeared to be the consequences of defining $i^2 = -1$ rather than defining $i = \sqrt{-1}$. Taking the comments in addition to the common question responses allowed for a more accurate identification of the precise areas of the proof where students appear to be struggling. This helped us to develop more accurate feedback, addressing the common errors for this question. By applying this process to each question we can provide tailored feedback which can be supplemented each time the question materials are used. Based on our experiences of gathering free-form answers we are confident that we could gather students’ responses in year 1, and from these develop multiple choice questions for subsequent student cohorts. Students in year 1 would not get any benefit from feedback from the automatically scored MCQ, and either we would have to accept that or have a human assess students’ justifications provided in year 1.

To what extent does the proof style inform the types of comprehension question that can be developed from it? In [17] various styles of proof writing are discussed in a general way. This is of interest here because, for the purposes of reading comprehension, it is possible to alter the style of a given proof in order to increase the viability of developing comprehension questions.
For example, the induction proof in Figure 2 uses subtle colour and style which is intended to highlight structural components of the induction proof, providing cues to the student. This style has the potential to highlight the modular structure of a proof which may be otherwise obscured.

When re-writing the proof of Theorem 4.1 using numbered steps, we purposely omitted the explicit reference to the Completeness Axiom in statement 2; this is a deliberate difference between the proof provided by [13] and our proof with numbered lines. By consciously choosing to omit a warrant we can then ask students to provide this information as an answer to a comprehension question. See Figure 5, statement 2 and question (a). In a similar way, it is possible to purposely omit particular algebraic expressions in a proof for the purpose of asking students to provide that information by filling in gaps. We note the subtle difference between deliberately obfuscating logical steps of a proof and obfuscating calculations within a proof. Another possible style choice would be to provide the proof to the student in a two column format and to obscure some of the working or the justification steps we wish students to provide.

The complex fallacy question shown in Figure 6 could ask students to provide a warrant for each line, and to identify any lines which are incorrect. This has the advantage that we do not deliberately make the proof more difficult to understand. In addition, this style allows for the precise identification of the warrant we wish students to provide.

While the self-explanation training of [5] has proved to be effective, explicitly asking students specific questions about a proof might (i) encourage them to ask such questions about any proof they encounter, and (ii) where carefully designed might prompt students to consider aspects of the proof they might otherwise not consider.

We wish to insert a small note of caution at this point.
One unexpected difficulty encountered during the development of our proof comprehension questions was that we found, in certain cases, subtleties in either the structure or reasoning of a proof which were not necessarily straightforward for us to follow. This was despite the fact many of these proofs are well known arguments. We hypothesise that this was a result of our very familiarity with these proofs. Over time we have become trained to subconsciously internalise some of their structure as we have gained expertise. This can have the effect that we under-appreciate some of the subtleties of an argument. Appropriate proof comprehension development therefore requires the “rediscovery” of every level of the proof.

What has perhaps surprised us the most in developing and trialing our questions is the lack of floor/ceiling effects, even with apparently trivial questions. Despite teaching high-achieving students on a mathematics degree at a leading university, many of our students are not able to identify lines in a proof correctly. The inability of many students to correctly identify lines of a proof strongly suggests the value of using such questions to gauge students’ understanding.

Conclusion
Developing proof comprehension tasks has allowed us to test students’ understanding of specific aspects of important proofs. In addition, it has helped us to more accurately identify the specific areas of each proof where students appear to be struggling. Once we had developed, and gained confidence in using, the proof understanding baseline checklist shown in Figure 4 we were able to readily and efficiently produce adequate proof comprehension question sequences on a week-by-week basis. By testing each appropriate item of the checklist, we were able to create at least one suitable proof comprehension question sequence per week. The self-imposed requirement that all of the questions we developed were suitable for marking online, and the lack of resources to develop high-quality MCQ, were restrictions on what we could do. This has been significantly simpler than the gold-standard process described in [15]. Each week we have been able to review the results of each question. Overall there was a lack of floor/ceiling effects in most of our questions. Based on preliminary examination of students’ attempts we believe these questions have been acceptable, or better, tests of important aspects of students’ understanding of these proofs.

At times developing proof comprehension questions has been a very enjoyable process, which has deepened our understanding of the proofs. At other times it has proved rather frustrating, with a growing confusion over some proofs as currently written in the core text. We suspect a more comprehensive examination of standard proofs through the attempt to apply the proof understanding baseline checklist can only lead to improvements (indeed perhaps correction of errors/omissions) in the proofs we provide to students. However, we have been able to write a variety of comprehension questions, including reading comprehension, fading questions and generating examples questions.
Using each of these techniques, we have been able to test a range of different aspects of proof understanding. We hope that the methods described here can aid university teachers to produce comprehension tasks across a wide variety of subjects within mathematics, and that such questions are a genuine aid to developing students’ understanding of proof itself.
References

[1] P. G. Butcher and S. E. Jordan. A comparison of human and computer marking of short free-text student responses. Computers and Education, 55(2):489–499, September 2010.
[2] M. Carlson, M. Oehrtman, and N. Engelke. The precalculus concept assessment: A tool for assessing students’ reasoning abilities and understandings. Cognition and Instruction, 28(2):113–145, 2010.
[3] B. Davies. Comparative Judgement and Proof. PhD thesis, Loughborough University, 2019.
[4] G. Harel and L. Sowder. Second handbook of research on mathematics teaching and learning, chapter Towards comprehensive perspectives on the learning and teaching of proof, pages 805–842. National Council of Teachers of Mathematics, 2007.
[5] M. Hodds, L. Alcock, and M. Inglis. Self-explanation training improves proof comprehension. Journal for Research in Mathematics Education, 45(1):62–101, 2014.
[6] M. Inglis and L. Alcock. Expert and novice approaches to reading mathematical proofs. Journal for Research in Mathematics Education, 43:358–390, 2012.
[7] M. Inglis, J. P. Mejia-Ramos, and A. Simpson. Modelling mathematical argumentation: the importance of qualification. Educational Studies in Mathematics, 66:3–21, 2007.
[8] S. Kalyuga, P. Ayres, P. Chandler, and J. Sweller. The expertise reversal effect. Educational Psychologist, 38(1):23–31, 2003.
[9] S. Kalyuga, R. Rikers, and F. Paas. Educational implications of expertise reversal effects in learning and performance of complex cognitive and sensorimotor skills. Educational Psychology Review, 24(2):313–337, 2012.
[10] G. Kinnear. Delivering an online course using STACK. In Contributions to the 1st International STACK conference 2018 in Fürth, Germany. Zenodo, 2019.
[11] G. Kinnear and C. J. Sangwin. Contradiction and contrapositive, what is the difference? Scottish Mathematics Council Journal, 48:84–90, December 2018.
[12] S. Lane-Getaz. Development of a reliable measure of students’ inferential reasoning ability. Statistics Education Research Journal, 12(1):20–47, 2013.
[13] M. Liebeck. A Concise Introduction to Pure Mathematics. Chapman Hall, second edition, 2000.
[14] J. P. Mejía-Ramos, E. Fuller, K. Weber, K. Rhoads, and A. Samkoff. An assessment model for proof comprehension in undergraduate mathematics. Educational Studies in Mathematics, 79(1):3–18, 2012.
[15] J. P. Mejía-Ramos, K. Lew, J. de la Torre, and K. Weber. Developing and validating proof comprehension tests in undergraduate mathematics. Research in Mathematics Education, 19(2):130–146, 2017.
[16] R. C. Moore. Making the transition to formal proof. Educational Studies in Mathematics, 27:249–266, 1994.
[17] P. Ording. 99 Variations on a Proof. Princeton University Press, Princeton and Oxford, 2019.
[18] M. Österholm. Characterizing reading comprehension of mathematical texts. Educational Studies in Mathematics, 63:325–346, 2006.
[19] A. Pointon and C. J. Sangwin. An analysis of undergraduate core material in the light of hand held computer algebra systems. International Journal of Mathematical Education in Science and Technology, 34(5):671–686, September 2003.
[20] G. Polya. Mathematics and Plausible Reasoning. Vol. 1: Induction and Analogy in Mathematics. Vol. 2: Patterns of Plausible Inference. Princeton University Press, 1954.
[21] R. Recorde. The Whetstone of Witte. London by I. Kyngston, 1557.
[22] A. Renkl, R. K. Atkinson, and C. S. Gross. How fading worked solution steps works – a cognitive load perspective. Instructional Science, 32:59–82, 2004.
[23] C. J. Sangwin. Computer Aided Assessment of Mathematics. Oxford University Press, Oxford, UK, 2013.
[24] C. J. Sangwin. Proof Technology in Mathematics Research and Teaching, chapter Reasoning by equivalence: the potential contribution of an automatic proof checker. Mathematics education in the digital era. Springer International, 2019.
[25] C. J. Sangwin and I. Jones. Asymmetry in student achievement on multiple choice and constructed response items in reversible mathematics processes. Educational Studies in Mathematics, 94:205–222, 2017.
[26] A. Selden and J. Selden. Research perspectives on concepts of functions. Educational Studies in Mathematics, 29(2):123–151, 1995.
[27] S. E. Toulmin. The Uses of Argument, updated edition. Cambridge University Press, Cambridge, United Kingdom, 2003.
[28] K. Weber. Student difficulties in constructing proofs: The need for strategic knowledge. Educational Studies in Mathematics, 48(1):101–119, 2001.
[29] K. Weber. Effective proof reading strategies for comprehending mathematical proofs.