A Typo in the Paterson-Wegman-de Champeaux algorithm
Valeriu Motroi and Ștefan Ciobâcă
Alexandru Ioan Cuza University, Iași, Romania
{motroival, stefan.ciobaca}@gmail.com

Abstract

We investigate the Paterson-Wegman-de Champeaux linear-time unification algorithm. We show that there is a small mistake in the de Champeaux presentation of the algorithm and we provide a fix.
In this paper we investigate the Paterson-Wegman algorithm [3], as improved by de Champeaux [1]. The algorithm has linear-time complexity. In Figure 1 we present the pseudo-code proposed by de Champeaux. We add line numbers and we make some cosmetic changes, which do not affect the algorithm logic. For example, we omit the else branch of an if statement whose then branch ends with an exit statement. The de Champeaux presentation of the algorithm ends with a post-processing step, described in Figure 6.

The issue we identify is that the post-processing step can enter an infinite loop. The infinite loop is caused by a bug in the occurs-check test. We give an input producing an infinite loop in the next section. The bug can be fixed syntactically by indenting an assignment statement, i.e., moving it inside the inner code block.

This issue was noticed and fixed by Erik Jacobsen [2] (see footnote on Page 34). However, in this paper we present and analyze a troublesome input in detail.

We show how the de Champeaux algorithm works when trying to unify the terms X and f(X).

The algorithm starts with the DAG representation of the two terms, which we show in Figure 2. As the two terms have maximal sharing between them, there is only one node labeled X. There are two roots, each corresponding to one of the terms to be unified. We use simple arrows to denote the relation between parent and child nodes of the DAG.

The algorithm creates links (undirected edges) between nodes that should be in the same equivalence class. We use dashed lines to denote the links created by the algorithm. The algorithm also maintains stacks (shown graphically on the right) and a set of pointers from nodes to nodes, which are represented by two-headed arrows.

The algorithm starts by creating a link between X and f(X) (Figure 3).

The next step is to call Finish on all function nodes (line 3). In this example we have only one function node, f. At this step, we have r = f(X).
Because complete(r) is false and pointer(r) is NIL, we jump straight to line 12, where we set pointer(r) to r and push r onto the stack (Figure 4).

At the first iteration of the while loop, at line 15, we have s = r. As s and r have the same function symbol, we do not enter the if statement at line 16. As s does not have any parent, the loop at line 18 has nothing to do. The node s has a link to X and, as a result, at line 21 we have r = f(X), s = f(X), t = X. The node t is not marked complete and is not equal to r, so we enter the else-if branch, set pointer(t) to r at line 24 and push t onto the stack (Figure 5).

     1  Procedure Solver(u, v):
     2      Create link (u, v)
     3      While there is a function node r, Finish(r)
     4      While there is a variable node r, Finish(r)
     5      BUILD-SIGMA(SIGMA)
     6  Procedure Finish(r):
     7      if complete(r) then
     8          Exit
     9      if pointer(r) ≠ NIL then
    10          Exit with failure
    11      Create new pushdown stack with operations Push(*) and Pop
    12      pointer(r) := r
    13      Push(r)
    14      while stack ≠ NIL do
    15          s := Pop
    16          if r, s have different function symbols then
    17              Exit with failure
    18          FOR-EACH parent t of s do
    19              Finish(t)
    20          FOR-EACH link (s, t) do
    21              if Complete(t) or t = r then
    22                  Ignore t
    23              else if pointer(t) = NIL then
    24                  pointer(t) := r
    25                  Push(t)
    26              else if pointer(t) ≠ r then
    27                  Exit with failure
    28              else
    29                  Ignore t  // (since t is already on STACK)
    30          if s ≠ r then
    31              if Variable(s) then
    32                  Subs(s) := r
    33                  Add s to SIGMA (input to BUILD-SIGMA)
    34              else
    35                  Create links { (j-th son(r), j-th son(s)) | 1 ≤ j ≤ q }
    36          Complete(s) := true
    37      end
    38      Complete(r) := true

Figure 1: Paterson-Wegman algorithm as presented by de Champeaux. We add line numbers and we make some cosmetic changes.

[Figure 2: The data structures at the start of the algorithm, with nodes f and X and the two roots, Root 1 and Root 2.]

[Figure 3: The data structures representation after adding the first link.]

[Figure 4: The data structures after pushing the first functional node to the stack.]

After this step, we jump straight to line 30, because there is only one link. We do not enter the if statement at line 30 because s equals r. Then we set complete(s) to true at line 36. Note that s is still f. In the next iteration of the while loop at line 14 we have s = X. Because of the shared structure of common variables, f is a parent of X, so we call Finish(f) at line 19, but complete(f) is true, so we exit this function call at line 8. Next follows the loop at line 20. We have the initial link between X and f(X), so in this case t = f(X), but complete(t) is true and the node t is ignored (line 22). Moving on to line 30, we enter the if statement and jump to line 32, because s = X, which is a variable. At line 32 we set Subs(X) = f(X) and at line 33 we add X to SIGMA. Then, at line 36, we set complete(s) to true.
The stack is now empty, so we go to line 38, where we set complete(r) to true (this is the second time this node is marked complete, since line 36 already did so in the first iteration, when s = r). The execution of Finish is done and we call Finish on all variable nodes. We have only one variable, X, which has complete(X) set to true, so we immediately return. Now we call BUILD-SIGMA. One important observation is that we finished the main algorithm and the occurs-check at line 9 never fired.

[Figure 5: The data structures after adding the variable X to the stack.]

In Figure 6 we show the implementation of BUILD-SIGMA. The function BUILD-SIGMA creates a substitution from an ordered substitution in linear time. By running the algorithm, we conclude that it enters an infinite loop.

     1  Procedure BUILD-SIGMA(list-of-variables):
     2      FOR-EACH variable x_i in list-of-variables do
     3          Add to final substitution x_i → EXPLORE-VARIABLE(x_i)
     4  Function EXPLORE-VARIABLE(x_i):
     5      if Ready(x_i) ≠ NIL then
     6          Exit with Ready(x_i)
     7      out := DESCEND(Subs(x_i))
     8      if out = NIL then
     9          out := x_i
    10      Ready(x_i) := out
    11      Exit with out
    12  Function DESCEND(u_i):
    13      if u_i = NIL then
    14          Exit with NIL
    15      if Variable(u_i) then
    16          Exit with EXPLORE-VARIABLE(u_i)
    17      if Constant(u_i) then
    18          Exit with u_i
    19      if Ready(u_i) then
    20          Exit with Ready(u_i)
    21      out := EXPLORE-ARGUMENTS(arguments-of(u_i))
    22      if out = arguments-of(u_i) then
    23          Ready(u_i) := out
    24      else
    25          // Cons gets as first argument a node and as a second argument
    26          // a pointer to a list of nodes and will return a pointer to
    27          // a list of nodes with the first argument in front
    28          // of the second argument.
    29          Ready(u_i) := Cons(Head-of(u_i), out)
    30      Exit with Ready(u_i)
    31  Function EXPLORE-ARGUMENTS(list-of-arguments):
    32      if list-of-arguments = NIL then
    33          Exit with NIL
    34      1st-new := DESCEND(1st(list-of-arguments))
    35      tail-new := EXPLORE-ARGUMENTS(tail(list-of-arguments))
    36      if 1st-new ≠ 1st(list-of-arguments) or tail-new ≠ tail(list-of-arguments) then
    37          Exit with Cons(1st-new, tail-new)
    38      Exit with list-of-arguments

Figure 6: Post-processing step described by de Champeaux.

In short, the order of the function calls is as follows.

1. BUILD-SIGMA(list(X)) - at line 1
2. EXPLORE-VARIABLE(X) - at line 3
3. DESCEND(f(X)) - at line 7
4. EXPLORE-ARGUMENTS(list(X)) - at line 21
5. DESCEND(X) - at line 34
6. EXPLORE-VARIABLE(X) - at line 16

The Ready variable is not used: Ready(X) is assigned only at line 10, after DESCEND returns, so it is still NIL when EXPLORE-VARIABLE(X) is called the second time. The same sequence of calls then repeats and, as a result, we enter an infinite loop.
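The non-termination of the post-processing step can be reproduced with a small Python sketch. This is an illustration under assumptions, not de Champeaux's code: the class `Node` and the function names are hypothetical, argument lists are plain Python lists rather than shared cons-lists, EXPLORE-ARGUMENTS is folded into a list comprehension, and Python's recursion limit stands in for the infinite loop.

```python
class Node:
    """Term node; a variable may carry Subs, every node may carry Ready."""
    def __init__(self, symbol, args=(), is_var=False):
        self.symbol = symbol
        self.args = list(args)
        self.is_var = is_var
        self.subs = None   # Subs(x): binding recorded by Finish
        self.ready = None  # Ready(u): memoised result of the traversal


def explore_variable(x):
    if x.ready is not None:
        return x.ready
    out = descend(x.subs)
    if out is None:
        out = x
    x.ready = out  # assigned only after descend returns
    return out


def descend(u):
    if u is None:
        return None
    if u.is_var:
        return explore_variable(u)
    if not u.args:  # constant
        return u
    if u.ready is not None:
        return u.ready
    # Simplified stand-in for EXPLORE-ARGUMENTS (no list sharing).
    new_args = [descend(a) for a in u.args]
    unchanged = all(n is o for n, o in zip(new_args, u.args))
    u.ready = u if unchanged else Node(u.symbol, new_args)
    return u.ready


def build_sigma(variables):
    return {x.symbol: explore_variable(x) for x in variables}


def run_on_cyclic_binding():
    """Feed build_sigma the binding X -> f(X) left behind by the buggy Finish."""
    x = Node("X", is_var=True)
    x.subs = Node("f", [x])
    try:
        build_sigma([x])
    except RecursionError:
        return "does not terminate"
    return "terminated"
```

Calling `run_on_cyclic_binding()` hits the recursion limit, because `x.ready` is still unset each time `explore_variable(x)` is re-entered. On an acyclic binding such as Y -> g(a), the same functions return normally.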
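The issue with the pseudo-code presented by de Champeaux is on line 36 in the Finish procedure. Based on the pseudo-code by Paterson-Wegman, Complete(s) should be set to true inside the if statement that starts at line 30, not at the level of the while loop. With the assignment misplaced, the node r is marked complete in the first iteration of the while loop (when s = r), so a later recursive call Finish(r) exits at line 8 instead of failing the occurs-check at line 9. We propose a fixed version in Figure 7. This change fixes the pseudo-code: the algorithm remains linear-time and there are no further issues.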
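The effect of moving the assignment can be checked with a minimal Python sketch of Solver and Finish. It is written under assumptions rather than taken from either paper: the names `Node`, `link`, `finish`, `solve` and the `fixed` flag are hypothetical, a Python list plays the role of the pushdown stack, a function symbol is assumed to determine its arity, and the post-processing step is left out.

```python
class UnificationFailure(Exception):
    pass


class Node:
    def __init__(self, symbol, children=(), is_var=False):
        self.symbol = symbol
        self.children = list(children)
        self.is_var = is_var
        self.parents = []
        self.links = []
        self.pointer = None
        self.complete = False
        self.subs = None
        for child in self.children:
            child.parents.append(self)


def link(a, b):
    a.links.append(b)
    b.links.append(a)


def finish(r, sigma, fixed):
    if r.complete:
        return
    if r.pointer is not None:
        raise UnificationFailure("occurs-check")
    r.pointer = r
    stack = [r]
    while stack:
        s = stack.pop()
        if not (r.is_var or s.is_var) and r.symbol != s.symbol:
            raise UnificationFailure("symbol clash")
        for t in s.parents:
            finish(t, sigma, fixed)
        for t in s.links:
            if t.complete or t is r:
                continue
            if t.pointer is None:
                t.pointer = r
                stack.append(t)
            elif t.pointer is not r:
                raise UnificationFailure("pointer mismatch")
        if s is not r:
            if s.is_var:
                s.subs = r
                sigma.append(s)
            else:
                for a, b in zip(r.children, s.children):
                    link(a, b)
            if fixed:
                s.complete = True  # placement inside the if block
        if not fixed:
            s.complete = True      # placement at while-loop level
    r.complete = True


def solve(u, v, nodes, fixed):
    """Return the recorded bindings, or None if unification fails."""
    sigma = []
    link(u, v)
    try:
        for r in nodes:
            if not r.is_var:
                finish(r, sigma, fixed)
        for r in nodes:
            if r.is_var:
                finish(r, sigma, fixed)
    except UnificationFailure:
        return None
    return [(x.symbol, x.subs.symbol) for x in sigma]


def unify_x_with_fx(fixed):
    x = Node("X", is_var=True)
    fx = Node("f", [x])
    return solve(x, fx, [x, fx], fixed)
```

With `fixed=False` the sketch accepts the input and records the cyclic binding of X to the node f; with `fixed=True` the recursive call on the parent of X reaches the occurs-check and `unify_x_with_fx` returns `None`.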
We investigate the Paterson-Wegman linear-time unification algorithm as improved by de Champeaux. We show an example where the occurs-check test fails to work as expected and results in an infinite loop in the post-processing step. We show that the issue is caused by a misindented statement (line 36) in the pseudo-code. Once the statement is properly indented, the algorithm is correct and works in linear time as claimed.
References

[1] Dennis de Champeaux. About the Paterson-Wegman linear unification algorithm. J. Comput. Syst. Sci., 32(1):79–90, February 1986.
[2] Erik Jacobsen. Unification and anti-unification. Technical report, 1991 (accessed: June 2020). http://erikjacobsen.com/pdf/unification.pdf.
[3] M. S. Paterson and M. N. Wegman. Linear unification. In Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, STOC '76, pages 181–186, New York, NY, USA, 1976. ACM.

Finish as presented by de Champeaux:

     6  Procedure Finish(r):
     7      if complete(r) then
     8          Exit
     9      if pointer(r) ≠ NIL then
    10          Exit with failure
    11      Create new pushdown stack with operations Push(*) and Pop
    12      pointer(r) := r
    13      Push(r)
    14      while stack ≠ NIL do
    15          s := Pop
    16          if r, s have different function symbols then
    17              Exit with failure
    18          FOR-EACH parent t of s do
    19              Finish(t)
    20          FOR-EACH link (s, t) do
    21              if Complete(t) or t = r then
    22                  Ignore t
    23              else if pointer(t) = NIL then
    24                  pointer(t) := r
    25                  Push(t)
    26              else if pointer(t) ≠ r then
    27                  Exit with failure
    28              else
    29                  Ignore t
    30          if s ≠ r then
    31              if Variable(s) then
    32                  Subs(s) := r
    33                  Add s to SIGMA
    34              else
    35                  Create links { (j-th son(r), j-th son(s)) | 1 ≤ j ≤ q }
    36          Complete(s) := true
    37      end
    38      Complete(r) := true

Finish with line 36 moved inside the if statement:

     6  Procedure Finish(r):
     7      if complete(r) then
     8          Exit
     9      if pointer(r) ≠ NIL then
    10          Exit with failure
    11      Create new pushdown stack with operations Push(*) and Pop
    12      pointer(r) := r
    13      Push(r)
    14      while stack ≠ NIL do
    15          s := Pop
    16          if r, s have different function symbols then
    17              Exit with failure
    18          FOR-EACH parent t of s do
    19              Finish(t)
    20          FOR-EACH link (s, t) do
    21              if Complete(t) or t = r then
    22                  Ignore t
    23              else if pointer(t) = NIL then
    24                  pointer(t) := r
    25                  Push(t)
    26              else if pointer(t) ≠ r then
    27                  Exit with failure
    28              else
    29                  Ignore t
    30          if s ≠ r then
    31              if Variable(s) then
    32                  Subs(s) := r
    33                  Add s to SIGMA
    34              else
    35                  Create links { (j-th son(r), j-th son(s)) | 1 ≤ j ≤ q }
    36              Complete(s) := true
    37      end
    38      Complete(r) := true

Figure 7