A Greedy Algorithm for Dropping Digits (Functional Pearl)

Richard Bird, Department of Computer Science, University of Oxford
Shin-Cheng Mu, Institute of Information Science, Academia Sinica
Abstract
Consider the puzzle: given a number, remove k digits such that the resulting number is as large as possible. Various techniques were employed to derive a linear-time solution to the puzzle: predicate logic was used to justify the structure of a greedy algorithm, a dependently typed proof assistant was used to give a constructive proof of the greedy condition, and equational reasoning was used to calculate the greedy step as well as the final, linear-time optimisation.

Greedy algorithms abound in computing. Well-known examples include Huffman coding, minimum-cost spanning trees, and the coin-changing problem; but there are many others. This pearl adds yet another problem to this collection. However, as has been said before, greedy algorithms can be tricky things. The trickiness is not in the algorithm itself, which is usually quite short and easy to understand, but in the proof that it does produce a best possible result. The mathematical theory of matroids, see Lawler [1976], and its generalisation to greedoids, see Korte et al. [1991], have been developed to explain why and when many greedy algorithms work, although the theory does not cover all possible cases. In practice, greedy algorithms are usually verified directly rather than by extracting the underlying matroid or greedoid. Curtis [2003] discusses four basic ways in which a greedy algorithm can be proved to work, one of which will be followed with our problem.

The problem is to remove k digits from a number containing at least k digits, so that the result is as large as possible. For example, removing one digit from the number "6782334" gives "782334" as the largest possible result, while removing three digits yields "8334". Given that a number can be seen as a list of digits, the problem can be generalised to removing, from a list whose elements are drawn from a linearly ordered type, k elements so that the result is largest under lexicographic ordering.
While the problem was invented out of curiosity rather than for a pressing application, it has apparently been used as an interview question for candidates seeking jobs in computing. The hope is that we can discover an algorithm that takes linear time in the number of elements. (The problem is listed on LeetCode as https://leetcode.com/problems/remove-k-digits/, where the objective is to find the smallest number rather than the largest, but the principles are the same.)

A Greedy Algorithm

The first task is to give a formal specification of the problem. Consider the function drops that removes a single element from a non-empty list in all possible ways:

  drops :: List a → List (List a)
  drops [x]      = [[]]
  drops (x : xs) = xs : map (x :) (drops xs)

For example, drops "abcd" = ["bcd", "acd", "abd", "abc"]. The function solve for computing a solution can be defined by a simple exhaustive search:

  solve :: Ord a ⇒ Nat → List a → List a
  solve k = maximum · apply k step · wrap ,

  step :: List (List a) → List (List a)
  step = concat · map drops .

The function solve converts a given input into a singleton list, applies the function step exactly k times to produce all possible candidates, and computes the lexical maximum of the result. Nat is the type of natural numbers. The function step is drops lifted to a list of candidates. It computes, for each candidate, all the ways to drop 1 element. Functions wrap and apply are respectively defined by

  wrap :: a → List a
  wrap x = [x] ,

  apply :: Nat → (a → a) → a → a
  apply 0       f = id
  apply (1 + k) f = apply k f · f .

For brevity, for the rest of the pearl we will write apply k f as f^k. Since a sequence of length n has n drops, and computing the larger of two lists of length n − k takes O(n − k) steps, this method for computing the answer takes O(n^k) steps. We aim to do better.
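The specification above is essentially executable. Here is a direct Haskell transcription, a sketch in which Nat is rendered as Int and the built-in list type is used:

```haskell
-- All ways of dropping exactly one element from a non-empty list.
drops :: [a] -> [[a]]
drops [_]      = [[]]
drops (x : xs) = xs : map (x :) (drops xs)

-- 'drops' lifted to a list of candidates.
step :: [[a]] -> [[a]]
step = concat . map drops

wrap :: a -> [a]
wrap x = [x]

-- Apply a function k times.
apply :: Int -> (a -> a) -> a -> a
apply 0 f = id
apply k f = apply (k - 1) f . f

-- Exhaustive search: all ways of removing k elements, then the maximum.
solve :: Ord a => Int -> [a] -> [a]
solve k = maximum . apply k step . wrap
```

For instance, solve 1 "6782334" yields "782334" and solve 3 "6782334" yields "8334", matching the examples above.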
(We use notations similar to Haskell, with slight variations. For example, the type for lists is denoted by List, and we allow (1 + k) as a pattern in function definitions, to match our inductive proofs. Laziness is not needed.)

To obtain a greedy algorithm, one would wish that the best way to remove k digits can be computed by removing 1 digit k times, each time greedily removing the digit that makes the current result as large as possible. That is, letting

  gstep :: Ord a ⇒ List a → List a
  gstep = maximum · drops ,

we wish to have

  maximum · step^k · wrap = gstep^k .   (1)

One cannot just claim that this strategy works without proper reasoning, however. It can be shown that (1) is true if the following monotonicity condition holds (we denote lexicographic ordering on lists by (⊴), and ordering on individual elements by (≤)):

  xs ⊴ ys ⇒ (∀ xs′ ∈ drops xs : (∃ ys′ ∈ drops ys : xs′ ⊴ ys′)) ,   (2)

where (∈) is overloaded to denote membership for lists. That is, if ys is no worse than xs, then whatever candidate we can obtain from xs, we can obtain a candidate from ys that is no worse either.

Unfortunately, (2) does not hold for our problem. Consider xs = "1934" ◁ "4234" = ys: "934" is a possible result of drops xs, but the best we can do by removing one digit from ys is "434". Note that (2) does not hold even if we restrict xs and ys to lists that can be obtained from the same source — certainly "1934" and "4234" are both results of removing two digits from, say, "194234".

In the terminology of Curtis [2003], the Better-Global principle — which says that if one first step is no worse than another, then there is a global solution using the former that is no worse than one using the latter — does not hold for this problem. What does hold is a weaker property, the
Best-Global principle: a globally optimal solution can be obtained by starting out with the best possible step. Formally, what we do have is that for all xs and k:

  (∀ xs′ ∈ step^(1+k) [xs] : (∃ zs ∈ (step^k · wrap · gstep) xs : xs′ ⊴ zs)) .   (3)

That is, letting xs′ be an arbitrary result of dropping 1 + k elements from xs, one can always obtain a result zs that is no worse than xs′ by greedily dropping the best element (by gstep) and then dropping arbitrary k elements. Property (3) will be proved later, in the section on the greedy condition. For now, let us see how (3) helps to prove (1), that is, maximum · step^k · wrap = gstep^k. The proof proceeds by induction on k. For k = 0, both sides reduce to id. For the inductive case we need the universal property of maximum: for all s :: a → b and p :: a → List b, and for a total order (⊴) on b:

  s = maximum · p  ≡  (∀ x : s x ∈ p x) ∧ (∀ x, y : y ∈ p x ⇒ y ⊴ s x) .

To prove maximum · step^(1+k) · wrap = gstep^(1+k) we need to show that

1. for all xs, gstep^(1+k) xs is a member of step^(1+k) [xs], which is a routine proof, and
2. for all xs and for all ys ∈ step^(1+k) [xs], we have ys ⊴ gstep^(1+k) xs.

For the latter, we reason:

    ys ⊴ gstep^(1+k) xs
  ≡   ys ⊴ gstep^k (gstep xs)
  ≡   { induction hypothesis }
      ys ⊴ maximum (step^k [gstep xs])
  ≡   { since ys ⊴ maximum xss ≡ (∃ zs ∈ xss : ys ⊴ zs) }
      (∃ zs ∈ step^k [gstep xs] : ys ⊴ zs)
  ⇐   { by (3) }
      ys ∈ step^(1+k) [xs] ,

which is our assumption. We have thus proved (1).

Remark: the proof above was carried out using predicate logic. There is a relational counterpart, in the style of Bird and de Moor [1997], that is slightly more concise and more general, which we unfortunately cannot present without adding a section introducing the notations and rules.

(We restrict our discussion to total orders to ensure that maximum returns one unique result. More general scenarios are discussed in Bird and de Moor [1997].)
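The counterexample to (2) can be replayed directly; a small self-contained check, with drops and the exhaustive gstep as defined above:

```haskell
-- All ways of dropping exactly one element from a non-empty list.
drops :: [a] -> [[a]]
drops [_]      = [[]]
drops (x : xs) = xs : map (x :) (drops xs)

-- The greedy step as specified: the best result of one deletion.
gstep :: Ord a => [a] -> [a]
gstep = maximum . drops

-- "1934" is lexicographically below "4234", yet its best single
-- deletion "934" beats "434", the best deletion from "4234":
-- monotonicity (2) fails.
counterexample :: Bool
counterexample = "1934" < "4234" && gstep "1934" > gstep "4234"
```

Here the derived Ord instance on Haskell strings gives exactly the lexicographic ordering the text uses.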
Refining the Greedy Step

We will prove the greedy condition (3) in the next section. It will turn out that the proof makes use of properties of gstep that will be evident from its inductive definition. Therefore we calculate an inductive definition of gstep in this section.

It is easy to derive gstep [x] = []. For the inductive case we reason:

    gstep (x : y : xs)
  =   { definition of gstep }
      maximum (drops (x : y : xs))
  =   { definition of drops }
      maximum ((y : xs) : map (x :) (drops (y : xs)))
  =   { definition of maximum }
      max (y : xs) (maximum (map (x :) (drops (y : xs))))
  =   { since maximum (map (x :) xss) = x : maximum xss, provided xss is non-empty }
      max (y : xs) (x : maximum (drops (y : xs)))
  =   { definition of gstep }
      max (y : xs) (x : gstep (y : xs))
  =   { definition of max and lexicographic ordering }
      if x < y then y : xs
      else if x == y then x : max xs (gstep (y : xs))
      else x : gstep (y : xs)
  =   { since xs ⊴ gstep (y : xs) }
      if x < y then y : xs else x : gstep (y : xs) .

Hence we have

  gstep [x]          = []
  gstep (x : y : xs) = if x < y then y : xs else x : gstep (y : xs) .

It turns out that gstep xs deletes the last element of the longest descending prefix of xs. For easy reference, we will refer to this element as the hill foot of the list. For example, gstep "8766678" = "876678", where the hill foot, the element deleted, is the third 6.

Proving the Greedy Condition

In this section we aim to prove (3), recited here:

  (∀ xs′ ∈ step^(1+k) [xs] : (∃ zs ∈ (step^k · wrap · gstep) xs : xs′ ⊴ zs)) .

Proving a proposition containing universal and existential quantification can be thought of as playing a game. The opponent challenges us by providing xs and a way of removing 1 + k elements to obtain xs′. We win by presenting a way of removing k elements from gstep xs, such that the result zs satisfies xs′ ⊴ zs.
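As an aside, the inductive definition of gstep calculated above runs directly in Haskell, and — assuming equation (1) — iterating it greedily solves the puzzle:

```haskell
-- Calculated inductive greedy step: delete the hill foot, i.e. the
-- last element of the longest descending prefix.
gstep :: Ord a => [a] -> [a]
gstep [_] = []
gstep (x : y : xs)
  | x < y     = y : xs
  | otherwise = x : gstep (y : xs)
```

Here gstep "8766678" is "876678" (deleting the third 6), and three applications of gstep to "6782334" yield "8334", agreeing with the exhaustive solve of the introduction.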
Equivalently, we present a way of removing 1 + k elements from xs, while making sure that the hill foot of xs is among the 1 + k elements removed. To prove (3) is to come up with a strategy to always win the game.

We could just invent the strategy and argue for its correctness. However, we experimented with another approach: could a proof assistant offer some help? Can we conjecture the existence of a function that, given the opponent's input, computes zs, and try to develop the function and the proof that xs′ ⊴ zs at the same time, letting their developments mutually guide each other? It would be a modern realisation of Dijkstra's belief that a program and its proof should be developed hand-in-hand [Dijkstra, 1974].

The datatypes. We modeled the problem in the dependently typed language/proof assistant Agda. For the setting up, we need to define a number of datatypes. Firstly, given a type a with a total ordering (≤) (which derives a strict ordering (<)), lexicographic ordering for List a is defined by:

  data (⊴) :: List a → List a → Set where
    []⊴ :: [] ⊴ xs
    <⊴  :: x < y → (x : xs) ⊴ (y : ys)
    ≡⊴  :: xs ⊴ ys → (x : xs) ⊴ (x : ys) ,

that is, [] is no larger than any list, (x : xs) ⊴ (y : ys) if x < y, and two lists having the same head are compared by their tails.

Secondly, rather than actually deleting elements of a list, in proofs it helps to remember which elements are deleted. The following datatype Dels k xs can be seen as instructions on how k elements are deleted from xs:

  data Dels :: Nat → List a → Set where
    end  :: Dels 0 []
    keep :: Dels k xs → Dels k (x : xs)
    del  :: Dels k xs → Dels (1 + k) (x : xs) .

For example, letting xs = "abcde",

  ds = keep (del (keep (del (keep end)))) :: Dels 2 xs

says that the 1st and 3rd elements of xs (counting from 0) are to be deleted.
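Readers without Agda at hand can get a feel for Dels from a simply-typed Haskell sketch, in which the count and length indices are not tracked by the types; the names Instr, Keep, Del, and runDels are ours:

```haskell
-- Un-indexed Haskell sketch of deletion instructions.  In Agda the
-- type Dels k xs guarantees that the deletion count and list length
-- match; here a mismatch fails at run time instead.
data Instr = Keep | Del

runDels :: [Instr] -> [a] -> [a]
runDels []          []       = []
runDels (Keep : ds) (x : xs) = x : runDels ds xs
runDels (Del  : ds) (_ : xs) = runDels ds xs
runDels _           _        = error "instruction and list lengths differ"
```

With this rendering, runDels [Keep, Del, Keep, Del, Keep] "abcde" is "ace", mirroring the example above.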
The function dels actually carries out the instruction:

  dels :: (xs :: List a) → Dels k xs → List a
  dels []       end       = []
  dels (x : xs) (del ds)  = dels xs ds
  dels (x : xs) (keep ds) = x : dels xs ds .

For example, dels xs ds = "ace".

Thirdly, the predicate HFoot i xs holds if the i-th element in xs is the hill foot, that is, the element that would be removed by gstep xs:

  data HFoot :: Nat → List a → Set where
    last :: HFoot 0 (x : [])
    this :: x < y → HFoot 0 (x : y : xs)
    next :: x ≥ y → HFoot i (y : xs) → HFoot (1 + i) (x : y : xs) .

For example, next (next (next (next this))) may have type HFoot 4 "8766678", since the 4th element is the last in the descending prefix "87666".

Finally, we define a datatype IsDel :: Nat → Dels k xs → Set such that, for all ds :: Dels k xs, the relation IsDel i ds holds if ds instructs that the i-th element of xs is to be deleted. Its definition is repetitive and thus omitted.

(To be consistent with earlier parts of this pearl, we use Haskell-like notations for the Agda code. The typing relation is denoted by (::) and list cons by (:). The two constructors of Nat are 0 and (1 +). Universally quantified implicit arguments in constructor and function declarations are omitted.)

The function and the proofs. The aim is to construct the following function alter:

  alter :: Dels (1 + k) xs → HFoot i xs → Dels (1 + k) xs .

It takes an instruction, given by the opponent, that deletes 1 + k elements from xs, and evidence that the i-th element of xs is its hill foot, and produces a possibly altered instruction that also deletes 1 + k elements. Recalling the discussion in the beginning of this section, alter should satisfy two properties:

  mono   :: (ds :: Dels (1 + k) xs) → (ft :: HFoot i xs) →
            dels xs ds ⊴ dels xs (alter ds ft) ,
  unfoot :: (ds :: Dels (1 + k) xs) → (ft :: HFoot i xs) →
            IsDel i (alter ds ft) .

Given ds and ft, the property mono says that alter ds ft always produces a list that is not worse than that produced by ds, while unfoot says that alter ds ft does delete the hill foot.

The goal now is to develop alter, mono, and unfoot together. The reader is invited to give it a try — it is more fun trying it yourself! In most of the steps, the type and proof constraints leave us with only one reasonable choice, while in one case we are led to discover a lemma. The cases to consider are:

1. alter {xs = x : y : ys} (keep ds) (this x<y) — the opponent keeps x, which is the hill foot because x < y. Due to unfoot, we have to delete x; a simple way to satisfy mono is to keep y. Thus we return del (keep ds′), where ds′ can be any instruction that deletes k elements in ys — it doesn't matter how ds′ does it!
2. alter {xs = x : y : ys} (keep ds) (next x≥y ft) — the opponent keeps x, and we have not reached the hill foot yet. In this case it is safe to imitate the opponent and keep x too, before recursively calling alter to generate the rest of the instruction.

3. alter {xs = [x]} (del end) last — the opponent deletes the sole element x. In this case we delete x too, returning del end.

4. alter {xs = x : y : ys} (del ds) (this x<y) — the element x is the hill foot, and is deleted by the opponent. In this case, since unfoot is satisfied, we can do exactly the same. We end up returning the same instruction as the opponent's, but that is fine, since both mono and unfoot are satisfied.

5. alter {xs = x : y : ys} (del ds) (next x≥y ft) — the opponent deletes x, which is in the descending prefix but is not the hill foot. This turns out to be the most complex case. One may try to imitate and delete x as well, returning del ds′ for some ds′. However, ds′, having type Dels k (y : ys), cannot be produced by a recursive call to alter, whose return type is Dels (1 + k) ... . It could be the case that k is 0 and, since we have not deleted the hill foot yet, returning a Dels 0 (y : ys) would violate unfoot. The lesson learnt from the type is that we can only delete 1 + k elements, and we have to save at least one del for the hill foot, which is yet to come. We thus have to further distinguish between two cases:

(a) k = 1 + k′ for some k′. In this case we still have room to delete more elements, thus we can safely imitate the opponent, delete x, and recursively call alter.

(b) k = 0. In this case we keep x, returning keep (delfoot ft) :: Dels 1 (x : y : ys), where delfoot :: HFoot i zs → Dels 1 zs computes an instruction that deletes exactly one element, the hill foot. What is left to prove to establish mono for this case can be extracted as a lemma:

  monoAux :: x ≥ y → (ft :: HFoot i (y : ys)) →
             (y : ys) ⊴ (x : dels (y : ys) (delfoot ft)) ,

whose proof is an induction on ys, keeping x ≥ y as an invariant. If x < y or y is the hill foot, we are done. Otherwise x = y and we inductively inspect the tail ys. Without Agda, it would not be easy to discover this lemma.

(The Agda code can be downloaded from https://scm.iis.sinica.edu.tw/home/2020/dropping-digits/. Curly brackets are used in Agda to mention implicit arguments; in each case above we pattern match {xs = ...} so that the readers know what input list we are dealing with.)

In summary, the function alter we have constructed is shown below, together with the content of its accompanying graphical summary:

  alter :: Dels (1 + k) xs → HFoot i xs → Dels (1 + k) xs
  alter (keep ds) (this x<y)    = del (keep (deleteAny ds))
  alter (keep ds) (next x≥y ft) = keep (alter ds ft)
  alter (del end) last          = del end
  alter (del ds)  (this x<y)    = del ds
  alter {k = 0}     (del ds) (next x≥y ft) = keep (delfoot ft)
  alter {k = 1 + k} (del ds) (next x≥y ft) = del (alter ds ft)

Figure: The function alter, where deleteAny ds generates a Dels k xs. In the graphical summary (a picture in the original, rendered here as text), elements with dotted outlines are those already considered; the one with a thick outline is the current element, which is the hill foot if it is smaller than the element to its right, or if it is the last; deleted elements are marked with a cross. Case by case: (1) the opponent keeps the hill foot; we drop the hill foot and drop k elements in the tail, then stop. (2) the opponent keeps the current element, which is not the hill foot; we keep it and recurse. (3) the opponent drops the last element; we do the same and stop. (4) the opponent drops the current element, which is the hill foot; we do the same till the end. (5a) the opponent drops the current element, which is not the hill foot, and the deletion quota is not used up yet (k = 1 + k′); we drop it and recurse. (5b) the opponent drops the current element, which is not the hill foot, but we can drop only one more element (k = 0); we keep it and drop the hill foot.

Remark: We may also tuple alter and the properties together, and try to construct:

  alter′ :: (ds :: Dels (1 + k) xs) → (ft :: HFoot i xs) →
            ∃ (λ (ds′ :: Dels (1 + k) xs) → (dels xs ds ⊴ dels xs ds′) × IsDel i ds′) .

An advantage is that the code of each case of alter is next to its proof. A disadvantage is that having to pattern-match on the result of alter′ psychologically discourages one from making a recursive call when needing a Dels (1 + k) ... . It is up to personal preference which style one prefers.

Improving Efficiency

Back to our code. We have proved that solve k = gstep^k, with gstep given by:

  gstep [x]          = []
  gstep (x : y : xs) = if x < y then y : xs else x : gstep (y : xs) .

Each time gstep is called, it takes O(n) steps to go through the descending prefix and find the hill foot, before the next invocation of gstep starts from the beginning of the list again.
Therefore, solve k takes O(kn) steps overall. This is certainly not necessary — to find the next hill foot, the next gstep could start from where the previous one left off.

The way to implement this idea is to bring in an accumulating parameter. Suppose we generalise solve to a function gsolve, defined by

  gsolve k xs ys = solve k (xs ++ ys) ,

with the proviso that the argument xs is constrained to be a descending sequence. In particular, solve k xs = gsolve k [] xs. We aim to develop a recursive definition of gsolve. Clearly, gsolve 0 xs ys = xs ++ ys. Recalling that gstep drops the last element of a descending list, we know that k repetitions of gstep on a descending list will drop its last k elements. Hence

  gsolve k xs [] = dropLast k xs ,

where dropLast k drops the last k elements of a list. We will not give a formal definition of dropLast as it will be replaced by another function in a moment. That deals with the two base cases. For the recursive case, it is easy to prove the following property of gstep:

  gstep (xs ++ y : ys)
    | null xs ∨ last xs ≥ y = gstep ((xs ++ [y]) ++ ys)
    | otherwise             = init xs ++ y : ys ,

which can be used to construct the following case of gsolve:

  gsolve (1 + k) xs (y : ys)
    | null xs ∨ last xs ≥ y = gsolve (1 + k) (xs ++ [y]) ys
    | otherwise             = gsolve k (init xs) (y : ys) .

The second optimisation is simply to replace the list xs in the definition of gsolve by reverse xs, to avoid adding elements at the end of a list. That leads to our final algorithm:

  solve k xs = gsolve k [] xs ,

  gsolve 0 xs ys = reverse xs ++ ys
  gsolve k xs [] = reverse (drop k xs)
  gsolve k xs (y : ys)
    | null xs ∨ head xs ≥ y = gsolve k (y : xs) ys
    | otherwise             = gsolve (k − 1) (tail xs) (y : ys) ,

where drop k is a standard Haskell function that drops the first k elements from a list.
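The final program runs as-is in Haskell (with Nat rendered as Int); as a sanity check against the examples from the introduction:

```haskell
-- Final linear-time algorithm: (xs, y:ys) forms a zipper, with xs the
-- reversed, descending, already-traversed part of the input.
solve :: Ord a => Int -> [a] -> [a]
solve k xs = gsolve k [] xs

gsolve :: Ord a => Int -> [a] -> [a] -> [a]
gsolve 0 xs ys = reverse xs ++ ys       -- no more deletions to make
gsolve k xs [] = reverse (drop k xs)    -- delete the last k elements
gsolve k xs (y : ys)
  | null xs || head xs >= y = gsolve k (y : xs) ys               -- keep scanning
  | otherwise               = gsolve (k - 1) (tail xs) (y : ys)  -- hill foot found
```

Here solve 1 "6782334" gives "782334" and solve 3 "6782334" gives "8334", now in linear time.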
For an operational explanation, gsolve traverses the list, looking for the next hill foot to delete — the otherwise case is when a hill foot is found. The list xs is the traversed part; (xs, ys) forms a zipper. The head of xs is a possible candidate for the hill foot. While the algorithm looks simple once understood, without calculation it is not easy to get the details right. The authors came up with several wrong versions before sitting down to calculate it!

To time the program, note that at each step either k is reduced to k − 1 or y : ys is reduced to ys. Hence solve k xs takes O(k + n) = O(n) steps, where n = length xs.

Conclusion

To construct a linear-time algorithm for solving the puzzle, various techniques were employed. The structure of the greedy algorithm was proved using predicate logic, and the proof was simplified from a relational program calculus. Agda was used to give a constructive proof of the greedy condition, and equational reasoning was used to derive the greedy step as well as the final, linear-time optimisation.

References
R. S. Bird and O. de Moor. Algebra of Programming. International Series in Computer Science. Prentice Hall, 1997.
S. Curtis. The classification of greedy algorithms. Science of Computer Programming, 2003.
E. W. Dijkstra. Programming as a discipline of mathematical nature. American Mathematical Monthly, May 1974. EWD361.
B. Korte, L. Lovász, and R. Schrader. Greedoids. Springer-Verlag, 1991.
E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, 1976.