Ties between Parametrically Polymorphic Type Systems and Finite Control Automata
Extended Abstract
JOSEPH (YOSSI) GIL and ORI ROTH, The Technion

We present a correspondence and bisimulation between variants of parametrically polymorphic type systems and variants of finite control automata, such as FSA, PDA, tree automata and Turing machines. Within this correspondence we show that two recent celebrated results on automatic generation of fluent APIs are optimal in certain senses, present new results on the studied type systems, formulate open problems, and present potential software engineering applications, other than fluent API generation, which may benefit from judicious use of type theory.

CCS Concepts: • Software and its engineering → General programming languages; API languages; Polymorphism.

Additional Key Words and Phrases: type systems, automata, computational complexity, fluent API
Computational complexity of type checking is a key aspect of any type system. Several classical results characterize this complexity in type systems where the main type constructor is function application: Type checking in the Simply Typed Lambda Calculus (STLC), in which function application is the sole type constructor, is carried out in linear time. In the Hindley-Milner (HM) type system [Damas and Milner 1982; Hindley 1969; Milner 1978], obtained by augmenting the STLC with parametric polymorphism with unconstrained type parameters, type checking is harder, and was found to be deterministic exponential (DEXP) time complete [Kfoury et al. 1990]. However, type checking in the Girard–Reynolds type system [Girard 1971, 1972; Reynolds 1974] (System-F), which generalizes HM, is undecidable [Wells 1999].

In contrast, our work focuses on type systems where the main type constructor is pair (or tuple), i.e., no higher order functions. This type constructor models object based programming, including concepts such as records, classes and methods, but not inheritance. In particular, we investigate the computational complexity of such systems in the presence of parametric polymorphism, also called genericity, allowing generic classes and generic functions.

We acknowledge significant past work on more general systems modeling the combination of genericity with the object oriented programming paradigm, i.e., classes with single and even multiple inheritance. Type checking in these is particularly challenging, since inheritance may be used to place sub-type and super-type constraints on the parameters to generics. In fact, Kennedy and Pierce [2007] showed that, in general, such type systems are undecidable. Their work carefully analyzed the factors that may lead to undecidability, and identified three decidable fragments, but without analyzing their complexity.
In fact, the presumed decidability of C# […] 𝔗 of type systems, and variants of finite control automata, such as FSA, PDA, tree automata and Turing machines, organized in another conceptual lattice 𝔄.

With this correspondence we determine the exact computational complexity class of type checking of many, but not all, type systems in 𝔗; for other type systems, we provide upper and lower bounds, leaving the precise characterizations as open problems. We also show that two celebrated results on the fluent API problem are optimal in certain senses. The research also has practical applications for language design, e.g., Thm. 5.1 below shows that introducing functions whose return type is declared by keyword auto to C# […]

Recall that a fluent API generator transforms ℓ, the formal language that specifies the API, into L = L(ℓ), a library of type definitions in some target programming language, e.g., Java, Haskell, or C++. Library L(ℓ) is the fluent API library of ℓ if an expression e type checks (in the target language) against L(ℓ) if and only if word w = w(e) is in the API language ℓ,

    w(e) ∈ ℓ ⇔ e type checks against L(ℓ).

It is required that expression e is in the form of a chain of method invocations. The word w = w(e) is obtained by enumerating, in order, the names of methods in the chain of e, e.g., a fluent API generator for C++ receives a language ℓ over alphabet Σ = {a, b}, i.e., ℓ ⊆ {a, b}*, and generates as output a set of C++ definitions in which an expression such as

    new Begin().a().b().a().a().end()    (1.1)

type checks if, and only if, word abaa belongs to language ℓ.

The most recent such generator is TypelevelLR due to Yamazaki, Nakamaru, Ichikawa and Chiba [2019].
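The requirement on L(ℓ) can be illustrated by a small hand-written library. The following Java sketch is only an illustration under stated assumptions: the language (words over {a, b} with no two consecutive b's), the class names FluentNoBB and AfterB, and the string returned by end() are hypothetical and are not the output of any of the generators discussed here. Each class plays the role of one state of a two-state DFSA, and each method plays the role of a transition.

```java
// Hypothetical fluent API library for the language of words over {a, b}
// that contain no two consecutive b's (an assumption made for illustration).
final class FluentNoBB {
    /** Initial state: the last letter read, if any, was not b. */
    static class Begin {
        Begin a() { return new Begin(); }
        AfterB b() { return new AfterB(); }
        // Every state of this automaton is accepting, so end() is available.
        String end() { return "accepted"; }
    }

    /** State reached right after a b: there is deliberately no method b(). */
    static class AfterB {
        Begin a() { return new Begin(); }
        String end() { return "accepted"; }
    }

    public static void main(String[] args) {
        // The word abaa is in the language, so this chain type checks.
        System.out.println(new Begin().a().b().a().a().end());
        // The word abba is not; uncommenting the next line is a compile-time error,
        // because class AfterB has no method b():
        // new Begin().a().b().b().a().end();
    }
}
```

Membership of a word is thus decided by the Java type checker rather than at run time; a regular language needs nothing beyond plain classes, whereas the richer languages discussed in this paper require genericity.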
TypelevelLR compiles an LR language ℓ into a fluent API library L(ℓ) in either Scala, C++, or Haskell (augmented with “four GHC extensions: MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, and UndecidableInstances”), but neither Java nor C# […]

TypelevelLR makes it possible to switch between different front-ends to translate a context free grammar specification of ℓ into an intermediate representation. Such front-ends include SLR, LALR, and LR(1) grammar processors. Similarly to traditional multi-language compilers, the front-ends compile the input specification into a library in Fluent, an intermediate language invented for this purpose; the back-ends of TypelevelLR translate L_Fluent into an equivalent library L = L(L_Fluent) in the target languages.

TypelevelLR strikes a sweet spot in terms of front-ends: It is a common belief that most programming languages are LR, so there is no reason for a fluent API generator to support any wider class of formal languages for the purpose of the mini programming language of an API. On the other hand, TypelevelLR’s client may tune down the generality of TypelevelLR, by selecting the most efficient front-end for the grammar of the particular language of the fluent API.

We show that in terms of computational complexity, TypelevelLR strikes another sweet spot in selecting Fluent, specifically, that Fluent = LR (Thm. 5.3). Equality here is understood in terms of classes of computational complexity, i.e., for every set of definitions L in Fluent, there exists an equivalent formal language ℓ = ℓ(L) ∈ LR, and for every ℓ ∈ LR there exists an equivalent library L = L(ℓ). Also, the term Fluent in the equality refers to the computational complexity class defined by the Fluent language. This abuse of notation is freely used henceforth.

Is there a similar sweet spot in the back-ends of TypelevelLR? Why did TypelevelLR include neither Fluent-into-Java nor Fluent-into-C# […]
It follows from Thm. 5.3 that Fluent can be compiled into a type system T only if T is (computationally wise) sufficiently expressive, i.e., LR ⊆ T. But which are the features of T that make it this expressive? Motivated by questions such as these, we offer in Sect. 3 a taxonomy, reminiscent of the λ-cube [Barendregt 1991], for the classification of parametrically polymorphic type systems. The difference is that the λ-cube is concerned with parametric polymorphism where the main type constructor is function application; our taxonomy classifies type systems built around the pairing type constructor, as found in traditional imperative and object oriented languages.

The taxonomy is a partially ordered set, specifically a lattice, 𝔗 of points spanned by six, mostly orthogonal characteristics. (See Table 3.1 below.) A point T ∈ 𝔗 is a combination of features (values of a characteristic) that specify a type system, e.g., Fluent is defined by a combination of three non-default features, monadic-polymorphism, deep-argument-type, and rudimentary-typeof, of three characteristics; features of the three other characteristics take their default (lowest) value: linear-patterns, unary-functions, and one-type.

We say that T₁ is less potent than T₂ if T₁ is strictly smaller in the lattice order than T₂, i.e., any program (including type definitions and optionally an expression to type check) of T₁ is also a program of T₂. In writing T₁ = T₂ (T₁ ⊊ T₂) we mean that the computational complexity class of T₁ is the same as (strictly contained in) that of T₂.

Footnote 1: Sect. B gives more precise definitions and motivation for the fluent API problem.
Footnote 2: LR stands for Left-to-right, Rightmost derivation [Knuth 1965].

The Ppp Type System.
We employ 𝔗 to analyze Fling, yet another API generator [Gil and Roth 2019] (henceforth G&R), capable of producing output for Java and C# […] Fluent, type definitions produced by Fling belong to a very distinct fragment of type systems of Java and C# […] unbounded unspecialized parametric polymorphism”, and we call Ppp (see footnote 3) here.

In plain words, Ppp refers to a type system in which genericity occurs only in no-parameters methods occurring in generic classes (or interfaces) that take one or more unconstrained type arguments, as in, e.g., List. 1.1. In terms of lattice 𝔗, type system Ppp is defined by feature polyadic-parametric-polymorphism of the “number of type arguments” characteristic (and the default, least-potent feature value of all other characteristics).

Listing 1.1 An example non-sense program in type system Ppp
class Program {
  // Type definitions
  interface γ γ
  interface γ γ γ γ
  interface γ γ γ γ γ γ
  {
    // Initializer with expression(s) to check.
    (( γ γ γ // Type checks
    (( γ γ γ // Type check error
  }
}

We prove that Ppp = DCFL (Thm. 4.1), i.e., the computational complexity of Ppp is the same as that of Fluent. Further, we will see that type systems less potent than Ppp have lower computational complexity. Other results (e.g., Thm. 4.2 and Thm. 4.3) show that making it more potent would have increased its computational complexity.

Combining Theory and Practice: Fling+TypelevelLR architecture.
As Yamazaki et al. noticed, translating Fluent into a mainstream programming language is not immediate. Curiously, the type systems of all target languages of TypelevelLR are undecidable. However, it follows from Thm. 5.3 that the target language, from the theoretical complexity perspective, is only required to be at least as expressive as DCFL, as is the case in languages such as Java, ML, (vanilla) Haskell, and C# […]

Footnote 3: read “plain parametric polymorphism”, or, “polyadic parametric polymorphism” here.
Footnote 4: Although not intended to be executable, Java (and C++) code in these examples can be copied and pasted as is (including Unicode characters such as γ) into, and then compiled on, contemporary IDEs. Exceptions are expressions included for demonstrating type checking failure.

To bring theory into practice, notice that all these languages contain the Ppp type system. We envision a software artifact whose architecture combines TypelevelLR and Fling, making it possible to compile from the variety of LR grammar processors into any programming language which supports code such as List. 1.1. Front ends of this “ultimate fluent API generator” are the same as those of TypelevelLR. However, instead of directly translating Fluent, we introduce a (rather straightforward) implementation, e.g., in Java, of the algorithm behind the proof of Thm. 5.3, plugging it in as a back end of TypelevelLR. Concretely, the artifact compiles Fluent into a specification of a DPDA (deterministic pushdown automaton) as in the said proof. We then invoke (a component of) Fling to translate the DPDA specification into a set of Ppp type definitions. The back-ends of Fling are then employed to translate these type definitions to the desired target programming language.

Outline.
Sect. 2 presents the lattice 𝔄 of finite control automata on strings and trees, ranging from FSAs to Turing machines, and reminds the reader of the computational complexity classes of automata in this lattice, e.g., in terms of families of formal languages in the Chomsky hierarchy. The lattice of parametrically polymorphic type systems 𝔗 is then presented in Sect. 3. The presentation makes clear the bisimulation between the runs of certain automata and type checking in certain type systems, thereby obtaining complexity results for these type systems.

Sect. 4, concentrating on parallels between real-time automata and type systems, derives further complexity results. In particular, this section shows that Fling is optimal in the sense that no wider class of formal languages can be expressed in Ppp, the type system it uses. Non real-time automata, and their relations to type systems which admit the typeof keyword, are the subject of Sect. 5. In particular, this section sets the computational complexity of Fluent and several variants. Sect. 6 then turns to discussing the ties between non-deterministic automata and type systems that allow an expression to have multiple types.

Sect. 7 concludes with a discussion, open problems and directions for future research.

While reading this paper, readers should notice extensive overloading of notation, made in an attempt to highlight the ties between automata and type systems. The list of symbols in Sect. A should help in obtaining a full grasp of this overloading. Appendices also include some of the more technical proofs and other supplementary material.
This section presents a unifying framework of finite control automata and formal languages, intended to establish common terminology and foundation for the description in the forthcoming Sect. 3 of parametrically polymorphic type systems and their correspondence to automata.

Definitions here are largely self-contained, but the discussion is brief; it is tacitly assumed that the reader is familiar with fundamental concepts of automata and formal languages, which we only re-present here.
We think of automata of finite control as organized in a conceptual lattice 𝔄. The lattice (strictly speaking, a Boolean algebra) is spanned by seven (mostly) orthogonal characteristics, such as the kind of input that an automaton expects, the kind of auxiliary storage it may use, etc. Overall, lattice 𝔄 includes automata ranging from finite state automata to Turing machines, going through most automata studied in the classics of automata and formal languages (see, e.g., Hopcroft, Motwani and Ullman [2007]).

Concretely, Table 2.1 defines lattice 𝔄, by row-wise enumeration of the said characteristics and the values that each may take. We call these values properties of the lattice.

Characteristic | Values in increasing potence
No. states (Def. 2.2) | […]
Aux. storage (Sect. 2.3) | […]
Recognizer kind (Def. 2.1, Def. 2.9) | […]
ε-transitions (Def. 2.5) | […] ε-transitions
Determinism (Def. 2.6) | […]
Rewrite multiplicity (Def. 2.7) | […]
Rewrite depth (Def. 2.8) | […]
Table 2.1. Seven characteristics and 18 properties spanning lattice 𝔄 of finite control automata

Values of a certain characteristic are mutually exclusive: For example, the first row in the table states that the first characteristic, number of states, can be either stateless (the finite control of the automaton does not include any internal states) or stateful (finite control may depend on and update an internal state). An automaton cannot be both stateful and stateless.

An automaton in 𝔄 is specified by selecting a value for each of the characteristics.

The table enumerates properties in each characteristic in increasing potence order. For example, in the “number of states” characteristic, stateful automata are more potent than stateless automata, in the sense that any computation carried out by A, a certain stateless automaton in 𝔄, can be carried out by automaton A′ ∈ 𝔄, where the only difference between A and A′ is that A′ is stateful.

Each automaton in the lattice might be fully specified as a set of seven properties ⟨p₁, …, p₇⟩. In the abbreviated notation we use, a property of a certain characteristic is mentioned only if it is not the weakest (least potent) in this characteristic. For example, the notation “⟨⟩” is short for the automaton with the least-potent property of all characteristics,

    A⊥ = ⟨⟩ = ⟨stateless, no-store, language, real-time, determ, shallow, linear⟩,    (2.1)

the bottom of lattice 𝔄. Table 2.2 offers additional shorthand using acronyms of familiar kinds of automata and their mapping to lattice points. For example, the second table row maps FSAs to lattice point ⟨stateful, non-deter⟩.
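The abbreviated notation amounts to overriding defaults. The Java sketch below expands an abbreviated lattice point into a full seven-property description, using the defaults of Eq. (2.1). It is an illustration only: the class name LatticePoint and the string keys chosen for the characteristics are assumptions, not notation from this paper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LatticePoint {
    // Least potent property of each characteristic, following Eq. (2.1).
    // The keys ("states", "storage", ...) are hypothetical names for the characteristics.
    static Map<String, String> bottom() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("states", "stateless");
        p.put("storage", "no-store");
        p.put("recognizer", "language");
        p.put("epsilon", "real-time");
        p.put("determinism", "deterministic");
        p.put("depth", "shallow");
        p.put("multiplicity", "linear");
        return p;
    }

    // Expands an abbreviated point: only non-default properties are supplied,
    // every other characteristic keeps its least potent value.
    static Map<String, String> expand(Map<String, String> nonDefault) {
        Map<String, String> p = bottom();
        for (Map.Entry<String, String> e : nonDefault.entrySet()) {
            if (!p.containsKey(e.getKey()))
                throw new IllegalArgumentException("unknown characteristic: " + e.getKey());
            p.put(e.getKey(), e.getValue());
        }
        return p;
    }

    public static void main(String[] args) {
        // The abbreviated point <stateful, non-deterministic>, i.e., an FSA.
        Map<String, String> fsa = new LinkedHashMap<>();
        fsa.put("states", "stateful");
        fsa.put("determinism", "non-deterministic");
        System.out.println(expand(fsa).values());
    }
}
```

Calling expand with an empty map yields the bottom point of Eq. (2.1).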
Acronym | Common name | Lattice point | Complexity
DFSA | Deterministic Finite State Automaton | ⟨stateful⟩ | REG
FSA | Finite State Automaton | non-deterministic-DFSA = ⟨stateful, non-deterministic⟩ | REG
SRDPDA | Stateless Real-time Deterministic PushDown Automaton | ⟨pushdown, stateless, real-time, deterministic, shallow, linear⟩ | ⊊ DCFL
RDPDA | Real-time Deterministic PushDown Automaton | stateful-SRDPDA = ⟨pushdown, stateful, real-time, deterministic, shallow, linear⟩ | ⊊ DCFL
DPDA | Deterministic PushDown Automaton | ε-transitions-RDPDA = ⟨pushdown, stateful, ε-transitions, deterministic, shallow, linear⟩ | DCFL
TA | Tree Automaton | ⟨tree-store, stateless, real-time, shallow, linear⟩ | DCFL
PDA | PushDown Automaton | non-deterministic-DPDA = ⟨pushdown, stateful, ε-transitions, non-deterministic, shallow, linear⟩ | CFL
RTM | Real-time Turing Machine | ⟨linearly-bounded-tape, stateful, real-time, deterministic, shallow, linear⟩ | ⊊ CSL
LBA | Linear Bounded Automaton | ⟨linearly-bounded-tape, shallow, linear⟩ ∨ FSA = ⟨linearly-bounded-tape, deterministic, shallow, linear⟩ | CSL
TM | Turing Machine | unbounded-tape-LBA = ⟨unbounded-tape, stateful, deterministic, shallow, linear⟩ | RE
REG ⊊ DCFL ⊊ CFL ⊊ CSL ⊊ PR ⊊ R ⊊ RE
⟨ε-transitions, non-deterministic⟩ ∨ FSA = FSA [Autebert et al. 1997, Example 5.3] deep-DPDA = DPDA, deep-PDA = PDA, non-deterministic-TM = TM
Table 2.2. Selected automata in the lattice 𝔄 and their computational complexity classes

Observe that just as the term pushdown automaton refers to an automaton that employs a pushdown auxiliary storage, we use the term tree automaton for an automaton that employs a tree auxiliary storage. Some authors use the term for automata that receive a hierarchical tree rather than a string as input. In our vocabulary, the distinction is found in the language-recognizer vs. forest-recognizer properties of the “recognizer kind” characteristic.

The final column of Table 2.2 also specifies the computational complexity class of the automaton defined in the respective row. In certain cases, this class is a set of formal languages found in the Chomsky hierarchy. From the first two rows of the table we learn that even though DFSAs are less potent than FSAs, they are able to recognize exactly the same set of formal languages, namely the set of regular languages denoted REG. By writing, e.g., DFSA = FSA = REG, we employ the convention of identifying an automaton in the lattice by its computational complexity class. We notice that a less potent automaton is not necessarily computationally weaker.
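The equality DFSA = FSA rests on the classical subset construction, which the following Java sketch illustrates by running a non-deterministic FSA deterministically on the set of states it may currently be in. The automaton (words over {a, b} that end in ab), its state numbering, and the class name SubsetSimulation are assumptions chosen for illustration; the sketch computes the subsets on the fly rather than building the DFSA up front.

```java
import java.util.HashSet;
import java.util.Set;

final class SubsetSimulation {
    // Hypothetical non-deterministic FSA over {a, b} for words ending in "ab".
    // States: 0 (initial), 1 (guessed that the final "ab" has just started), 2 (accepting).
    static Set<Integer> step(int q, char c) {
        Set<Integer> next = new HashSet<>();
        if (q == 0) {
            next.add(0);                 // keep scanning
            if (c == 'a') next.add(1);   // guess: this 'a' starts the final "ab"
        } else if (q == 1 && c == 'b') {
            next.add(2);
        }
        return next;                     // an empty set means this branch hangs
    }

    // Deterministic run: the set of reachable states acts as a single DFSA state.
    static boolean accepts(String w) {
        Set<Integer> current = new HashSet<>();
        current.add(0);
        for (char c : w.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int q : current) next.addAll(step(q, c));
            current = next;
        }
        return current.contains(2);      // some branch ended in the accepting state
    }

    public static void main(String[] args) {
        System.out.println(accepts("abab"));
        System.out.println(accepts("aba"));
    }
}
```

Since a finite automaton has finitely many subsets of states, the simulating device is itself finite state, which is why the extra potence of non-determinism adds no recognition power at this point of the lattice.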
As usual, let Σ be a finite alphabet, and let Σ* denote the set of all strings (also called words) over Σ, including ε, the empty string. A (formal) language ℓ is a (typically infinite) set of such strings, i.e., ℓ ⊆ Σ*.

Definition 2.1. A recognizer of language ℓ ⊆ Σ* is a device that takes as input a word w ∈ Σ* and determines whether w ∈ ℓ.

Let A be a finite control automaton for language recognition. (Automata for recognizing forests are discussed below in Sect. 2.4.) Then, A is specified by four finitely described components: states, storage, consuming transition function, and ε-transition function:

(1) States.
The specification of these includes (i) a finite set Q of internal states (or states), (ii) a designated initial state q₀ ∈ Q, and, (iii) a set F ⊆ Q of accepting states.

Definition 2.2. A is stateful if |Q| > 1; it is stateless if |Q| = 1, in which case F = Q = {q₀}.

(2) Storage.
Unlike internal state, the amount of data in auxiliary storage is input dependent, hence unbounded. The pieces of information that can be stored are specified as a finite alphabet Γ of storage symbols, which is not necessarily disjoint from Σ.

The organization of these symbols depends on the auxiliary-storage characteristic of A: In pushdown-store automata, Γ is known as the set of stack symbols, and the storage layout is sequential. In tree-store automata, the organization is hierarchical and Γ is a ranked alphabet. In tape automata, Γ is called the set of tape symbols, and they are laid out sequentially in a uni-directional tape.

Let 𝚪 denote the set of possible contents of the auxiliary storage. In pushdown automata 𝚪 = Γ*; in tape automata, the storage contents include the position of the head: Specifically, in unbounded-tape-store (employed by Turing machines), 𝚪 = ℕ × Γ*. We set 𝚪 = ℕ × Γ* also for the case of linearly bounded automata. For tree-store automata, 𝚪 = Γ△, where Γ△ is defined below as the set of trees whose internal nodes are drawn from Γ.

Definition 2.3. An Instantaneous Description (ID, often denoted ι) of A running on input word w ∈ Σ* includes three components: (i) a string u ∈ Σ*, where u is a suffix of w, specifying the remainder of input to read; (ii) the current state q ∈ Q, and, (iii) 𝜸 ∈ 𝚪, the current contents of the auxiliary storage.

The auxiliary storage is initialized by a designated value 𝜸₀ ∈ 𝚪. Any run of A on input w ∈ Σ* begins with ID ι₀ = ⟨w, q₀, 𝜸₀⟩, and then proceeds as dictated by the transition functions.

Definition 2.4. A is no-store if |Γ| = 0, in which case 𝚪 is degenerate, 𝚪 = {𝜸₀}.

(3) Consuming transition function.
Denoted by δ, this partial, possibly multi-valued function defines how A proceeds from a certain ID to a subsequent ID in response to the consumption of a single input letter.
• Function δ depends on (i) σ ∈ Σ, the current input symbol, being the first letter of u, i.e., u = σu′, u′ ∈ Σ*, (ii) q ∈ Q, the current state, and, (iii) 𝜸 ∈ 𝚪, the current contents of the auxiliary storage.
• Given these, δ returns a new internal state q′ ∈ Q and the new storage contents 𝜸′ for the subsequent ID. The “remaining input” component of the subsequent ID is set to u′.

(4) ε-transition function. A partial, multi-valued function ξ specifies how A moves from a certain ID to a subsequent ID, without consuming any input. Function ξ depends on the current state q ∈ Q and 𝜸, the storage’s contents, but not on the current input symbol. Just like δ, function ξ returns a new internal state q′ ∈ Q and storage contents 𝜸′ for the subsequent ID. However, the remaining input component of IDs is unchanged by ξ.

Automaton A accepts w if there exists a run ι₀, ι₁, …, ι_m that begins with the initial ID ι₀ = ⟨w, q₀, 𝜸₀⟩ and ends with an ID ι_m = ⟨ε, q, 𝜸⟩ in which all the input was consumed, the internal state q is accepting, i.e., q ∈ F, and no further ε-transitions are possible, i.e., ξ(ι_m) is not defined.

On each input letter, automaton A carries out one transition defined by δ, followed by any number of ε-transitions defined by ξ, including none at all. A real-time automaton is one which carries out precisely one transition for each input symbol.

Definition 2.5. A is real-time if there is no ID ι for which ξ(ι) is defined.

Real-time and non-real-time automata are, by the above definitions, non-deterministic. Since both δ and ξ are multi-valued, an ID does not uniquely determine the subsequent ID.

Definition 2.6.
A is deterministic if (i) partial functions ξ and δ are single valued, and, (ii) there is no ID ι for which both ξ(ι) and δ(ι) are defined.

Both deterministic and non-deterministic automata may hang, i.e., they might reach an ID ι for which neither ξ(ι) nor δ(ι) is defined. If all runs of a non-deterministic automaton A on a given input w either hang or reach a non-accepting state, A rejects w. Alternatively, if the only run of a deterministic automaton A on w hangs, automaton A rejects w. Hanging is the only way a stateless automaton can reject. A stateful automaton rejects also in the case it reaches a non-accepting state q ∈ Q \ F after consuming all input.

Since functions δ and ξ are finitely described, they are specified as two finite sets, Δ and Ξ, of input-to-output items, e.g., the requirement in Def. 2.5 can be written as |Ξ| = 0. Since the transformation of auxiliary storage 𝜸 to 𝜸′ by these functions must be finitely described, only a bounded portion of 𝜸 can be examined by A. The transformation of 𝜸 to 𝜸′, which we call a rewrite of auxiliary storage, must be finitely described in terms of this portion.

Tape initialization, rewrite, and head overflow.
The literature often defines tape automata with no consuming transitions, by making the assumption that they receive their input on the tape store, which allows bidirectional movements. Our lattice 𝔄 specifies that the input word w is consumed one letter at a time. No generality is lost, since with the following definitions tape automaton A ∈ 𝔄 may begin its run by consuming the input while copying it to the tape, and only then process it with as many ε-transitions as are necessary.

The contents 𝜸 of tape auxiliary storage is a pair (h, γ₀γ₁⋯γ_{m−1}), where integer h ≥ 0 is the position of the head and γ₀γ₁⋯γ_{m−1} ∈ Γ* is the tape’s content. Let 𝜸₀ = (0, ε), i.e., the tape is initially empty and the head is at location 0. Rewrites of tape are the standard read and replace of symbol-under-head, along with the move-left and move-right instructions to the head: Tape rewrite γ → γ′₊ (respectively, tape rewrite γ → γ′₋) means that if γ_h = γ then replace it with a, not necessarily distinct, symbol γ′ ∈ Γ and increment (respectively, decrement) h. A third kind of rewrite is ⊥ → γ, which means that if the current cell is undefined, i.e., h ∉ {0, …, m − 1}, replace it with γ ∈ Γ.

The automaton hangs if h becomes negative, or if h exceeds n, the input’s length, in the case of a linear bounded automaton.

Rewrites of a pushdown.
Rewrites of a pushdown auxiliary storage are the usual push and pop operations; we will see that these can be regarded as tree rewrites.
Trees.
A finite alphabet Γ is a signature if each γ ∈ Γ is associated with an integer r = r(γ) ≥ 1 (its arity). A tree over Γ is either a leaf, denoted by the symbol 𝜺, or a (finite) structure in the form γ(𝒕), where γ ∈ Γ of arity r is the root and 𝒕 = t₁, …, t_r is a multi-tree, i.e., a sequence of r (inductively constructed) trees over Γ. Let Γ△ denote the set of trees over Γ.

Let Depth(t) be the depth of tree t (for leaves, let Depth(𝜺) = 0). We use a monadic tree abbreviation by which tree γ₁(γ₂(𝜺), γ₃(γ₄(𝜺))) is written as γ₁(γ₂, γ₃γ₄), and tree γ₁(γ₂(⋯(γ_n(𝜺)))) is written as γ₁γ₂⋯γ_n. If the rank of all γ ∈ Γ is 1, then Γ△ is essentially the set Γ*, and every tree t ∈ Γ△ can be viewed as a stack whose top is the root of t and whose depth is Depth(t).

In this perspective, a pushdown automaton is a tree automaton in which the auxiliary tree is monadic. We set 𝜸₀ in tree automata to the leaf 𝜺 ∈ Γ△, i.e., in the special case of a pushdown automaton, the run starts with an empty stack.

Terms.
Let X = {x₁, x₂, …} be an unbounded set of variables disjoint from all alphabets. Then, a pattern (also called term) over Γ is either some variable x ∈ X or a structure in the form γ(τ₁, …, τ_r), where the arity of γ ∈ Γ is r and each of τ₁, …, τ_r is, recursively, a term over Γ. Let Γ⊳ denote the set of terms over Γ. Thus, Γ△ ⊊ Γ⊳, i.e., all trees are terms. Trees are also called grounded terms; ungrounded terms are members of Γ⊳ \ Γ△. A term is linear if no x ∈ X occurs in it more than once, e.g., γ(x, x) is not linear while γ(x₁, γ(x₂, x₃)) is linear.

Terms match trees.
Atomic term x matches all trees in Γ△; a compound linear term τ = γ(τ₁, …, τ_r) matches tree t = γ(t₁, …, t_r) if for all i = 1, …, r, τ_i recursively matches t_i, e.g., γ(x₁, γ(x₂, x₃)) matches γ(𝜺, γ(𝜺, γ(𝜺, 𝜺))). To define matching of non-linear terms, define a tree substitution s (substitution for short) as a mapping of variables to terms, s = {x₁ → τ₁, …, x_r → τ_r}. Substitution s is grounded if all terms τ₁, …, τ_r are grounded. An application of substitution s to term τ, denoted τ/s, replaces each variable x_i with term τ_i if and only if x_i → τ_i ∈ s. The notation τ′ ⊑ τ is to say that term τ matches a term τ′, which happens if there exists a substitution s such that τ′ = τ/s.

Tree rewrites. A tree rewrite rule ρ (rewrite for short) is a pair of two terms written as ρ = τ₁ → τ₂. Rewrite ρ is applicable to a (typically grounded) term τ′ if τ′ = τ₁/s for some substitution s. If rewrite ρ matches term τ′ then τ′/ρ, the application of ρ to τ′ (also written τ′/τ₁ → τ₂), yields the term τ₂/s.

The definition of rewrites does not exclude a rewrite γ₁(x₁) → γ₂(x₁, γ₃(x₂)), whose right-hand-side term introduces variables that do not occur in the left-hand-side term. Applying such a rewrite to a tree will always convert it to an ungrounded term. Since the primary intention of rewrites is the manipulation of trees, we tacitly assume here and henceforth that this is never the case; a rewrite τ₁ → τ₂ is valid only if Vars(τ₂) ⊆ Vars(τ₁).

Manipulation of tree and pushdown auxiliary storage is defined with rewrites. For example, the rewrite γ₁(γ₂(x)) → γ₃(x), or in abbreviated form γ₁γ₂x → γ₃x, is, in terms of stack operations: if the top of the stack is symbol γ₁ followed by symbol γ₂, then pop these two symbols and then push symbol γ₃ onto the stack.
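Matching, substitution and rewriting as defined above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not code from any of the tools discussed: the class TreeTerm, its method names, and the ASCII symbol names g1, g2, ... (standing for the symbols of the signature) and leaf (standing for the leaf symbol) are all hypothetical. Rewrites are applied at the root of the storage tree only, and repeated variables in a pattern are required to bind equal trees, which covers non-linear terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TreeTerm {
    final String name;           // symbol of the signature, or variable name
    final boolean isVariable;
    final List<TreeTerm> kids;   // empty for variables and for the leaf

    private TreeTerm(String name, boolean isVariable, List<TreeTerm> kids) {
        this.name = name; this.isVariable = isVariable; this.kids = kids;
    }
    static TreeTerm var(String name) { return new TreeTerm(name, true, new ArrayList<TreeTerm>()); }
    static TreeTerm node(String symbol, TreeTerm... kids) { return new TreeTerm(symbol, false, Arrays.asList(kids)); }
    static TreeTerm leaf() { return node("leaf"); }

    // Structural equality of two terms.
    boolean sameAs(TreeTerm o) {
        if (isVariable != o.isVariable || !name.equals(o.name) || kids.size() != o.kids.size()) return false;
        for (int i = 0; i < kids.size(); i++) if (!kids.get(i).sameAs(o.kids.get(i))) return false;
        return true;
    }

    // Extends substitution s so that pattern/s equals tree; returns false if impossible.
    static boolean match(TreeTerm pattern, TreeTerm tree, Map<String, TreeTerm> s) {
        if (pattern.isVariable) {
            TreeTerm bound = s.get(pattern.name);
            if (bound == null) { s.put(pattern.name, tree); return true; }
            return bound.sameAs(tree);   // a repeated variable must bind equal trees
        }
        if (tree.isVariable || !pattern.name.equals(tree.name) || pattern.kids.size() != tree.kids.size()) return false;
        for (int i = 0; i < pattern.kids.size(); i++)
            if (!match(pattern.kids.get(i), tree.kids.get(i), s)) return false;
        return true;
    }

    // Applies substitution s to this term.
    TreeTerm apply(Map<String, TreeTerm> s) {
        if (isVariable) return s.getOrDefault(name, this);
        TreeTerm[] out = new TreeTerm[kids.size()];
        for (int i = 0; i < out.length; i++) out[i] = kids.get(i).apply(s);
        return node(name, out);
    }

    // Applies the rewrite lhs -> rhs at the root of tree; null if it is not applicable.
    static TreeTerm rewrite(TreeTerm lhs, TreeTerm rhs, TreeTerm tree) {
        Map<String, TreeTerm> s = new HashMap<>();
        return match(lhs, tree, s) ? rhs.apply(s) : null;
    }

    @Override public String toString() {
        if (kids.isEmpty()) return name;
        StringBuilder b = new StringBuilder(name).append("(");
        for (int i = 0; i < kids.size(); i++) b.append(i == 0 ? "" : ", ").append(kids.get(i));
        return b.append(")").toString();
    }

    public static void main(String[] args) {
        TreeTerm x = var("x");
        // Stack g1 g2 g4, i.e., the monadic tree g1(g2(g4(leaf))).
        TreeTerm stack = node("g1", node("g2", node("g4", leaf())));
        // The rewrite g1(g2(x)) -> g3(x): pop g1 and g2, then push g3.
        System.out.println(rewrite(node("g1", node("g2", x)), node("g3", x), stack));
    }
}
```

With monadic symbols the tree is a stack, so the rewrite in main plays the role of the pop-pop-push example given in the text; with symbols of higher arity the same code manipulates tree-store contents.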
With these definitions:
• Each member of set Δ is in the form ⟨σ, q, ρ, q′⟩ meaning: if the current input symbol is σ, the current state is q and auxiliary storage t matches ρ, then consume σ, move to state q′ and set the storage to t/ρ.
• Each member of set Ξ is in the form ⟨q, ρ, q′⟩ meaning: if the current state is q and auxiliary storage t matches ρ, then move to state q′ and set the storage to t/ρ.

A tree rewrite ρ = τ₁ → τ₂ is linear if τ₁ is linear, e.g., rewrites γ(x) → γ′(x, x, x) and γ₁(x₁, x₂) → γ₂(γ₃(x₂, x₁), 𝜺) are linear, but γ(x, x) → 𝜺 is not. Notice that rewrites of tape and pushdown auxiliary storage are linear: the transition functions of these never depend on the equality of two tape or pushdown symbols.

Definition 2.7. A is linear-rewrite if all rewrites in Δ and Ξ are linear.

Let Depth(ρ), ρ = τ₁ → τ₂, be Depth(τ₁), where the depth of terms is defined like tree depth, with a variable x ∈ X considered a leaf. A term (rewrite) is shallow if its depth is at most one, e.g., x, γ(x), and γ(x, x) are shallow, while γ(γ(x)) is not. Rewrites of tape storage are shallow by definition, since only the symbol under the head is inspected.

Definition 2.8. A is shallow-rewrite if all rewrites in Δ and Ξ are shallow.

In the case that the set of input symbols Σ is a signature rather than a plain alphabet, the input to a finite control automaton is then a tree t ∈ Σ△ rather than a plain word. We use the term forest for what some call tree language, i.e., a (typically infinite) set of trees. Generalizing Def. 2.1 we define:

Definition 2.9. A recognizer of forest £ ⊆ Σ△ is a device that takes as input a tree t ∈ Σ△ and determines whether t ∈ £.

As explained in Sect. 2.2 a language-recognizer automaton scans the input left-to-right.
However, this order is not mandatory, and there is no essential difference between left-to-right and right-to-left automata. This symmetry does not necessarily apply to a forest-recognizer automaton: there is much research work on comparing and differentiating bottom-up and top-down traversal strategies of finite control automata (e.g., Coquidé et al. [1994] focus on bottom-up automata, Guessarian [1983] on top-down, while Comon et al. [2007] present several cases in which the two traversal strategies are equivalent).

Our interest in parametrically polymorphic type systems sets the focus here on the bottom-up traversal strategy only. Most of the description of language-recognizer automata above in Sect. 2.2 remains unchanged. The state and storage specification are the same in the two kinds of recognizers, just as the definitions of deterministic and real-time automata. Even the specification of ξ, the ε-transition function, is the same, since the automaton does not change its position on the input tree during an ε-transition.

However, input consumption in forest recognizers is different than in language recognizers, and can be thought of as visitation. A bottom-up forest-recognizer consumes an input tree node labeled σ of rank r by visiting it after its r children were visited. Let q₁, q₂, …, q_r be the states of the automaton in the visit to these children, and let 𝒒 be the multi-state of the r children, i.e., 𝒒 = q₁, q₂, …, q_r. Then, the definition of δ is modified by letting it depend on multi-state 𝒒 ∈ Q^r rather than on a single state q ∈ Q. More precisely, each input-to-output item in Δ takes the form ⟨σ, 𝒒, ρ, q′⟩, meaning: if (i) the automaton is in a node labeled σ, and (ii) it has reached states q₁, q₂, …, q_r in the r children of this node, and if storage rewrite rule ρ is applicable, then select state q′ for the current node and apply rewrite ρ.

Consider ρ, the rewrite component of an input-output item.
As it turns out, only tree auxiliary storage makes sense for bottom-up forest recognizers. (In top-down forest recognizers, pushdown auxiliary storage is also admissible.) Let t_1, …, t_r be the trees representing the contents of auxiliary storage in the r children of the current node. Rewrite rule ρ should produce a new tree t of unbounded size from a finite inspection of the r trees, whose size is also unbounded. We say that ρ is a many-input tree rewrite rule (for short, rewrite, when the context is clear) if it is in the form ρ = τ_1, …, τ_r → τ′. Rule ρ = τ_1, …, τ_r → τ′ is applied to all children, with the straightforward generalization of the notions of matching and applicability of a single-input rewrite: A multi-term 𝝉 is a sequence of terms 𝝉 = τ_1, …, τ_r, and a multi-tree 𝐭 is a sequence of trees, 𝐭 = t_1, …, t_r. Then, rule ρ = 𝝉 → τ′ applies to (also, matches) 𝐭 if there is a single substitution s such that τ_i/s = t_i for all i = 1, …, r. The application of ρ to 𝐭 is 𝐭/ρ.

This section offers a unifying framework for parametrically polymorphic type systems. Definitions reuse notations and symbols introduced in Sect. 2 in the definition of automata, but with different meaning. For example, the Greek letter σ above denoted an input letter, but will be used here to denote the name of a function defined in a certain type system. This, and all other cases of overloading of notation, are intentional, with the purpose of highlighting the correspondence between the two unifying frameworks.

Characteristic
Values in increasing order
C_1  Number of type arguments (Sect. 3.1.2)
C_2  Type pattern depth (Sect. 3.1.3)
C_3  Type pattern multiplicity (Sect. 3.1.4)
C_4  Arity of functions (Sect. 3.1.5)
C_5  Type capturing (Sect. 3.1.6)
C_6  Overloading (Sect. 3.1.7)
Table 3.1. Six characteristics and 17 properties spanning lattice 𝔗 of parametrically polymorphic type systems.

Examine Table 3.1 describing 𝔗, the lattice (Boolean algebra) of parametrically polymorphic type systems. This table is the equivalent of Table 2.1 depicting 𝔄, the lattice of finitely controlled automata. We use the terms potence, characteristics, and properties as before, along with the same conventions for writing lattice points and using abbreviations.

Table 3.1 gives means for economic specification of different variations of parametrically polymorphic type systems. For example, inspecting Yamazaki et al.'s work we see that the type system of the Fluent intermediate language is
Fluent = ⟨monadic-parametric-polymorphism, deep-type-pattern, rudimentary⟩, (3.1)
i.e., (i) it allows only one-parameter generics, e.g., interface γ … (ii) it allows generic functions to be defined for deeply nested generic parameter types, such as static …
The example shows (i) definitions of two classes, A and B, (ii) methods in different classes that have the same name but different return types, and (iii) an expression whose type correctness depends on these definitions.

Fig. 3.1 presents the abstract syntax, notational conventions, and typing rules of T_⊥. The subsequent description of type systems in 𝔗 is by additions and modifications to the figure.

Fig. 3.1
The bottom of lattice 𝔗: the type system ⟨⟩ modeling the object-based paradigm

(a) Abstract syntax
P ::= Δ e
Δ ::= δ*
δ ::= σ : γ → γ′
  ::= σ : 𝜺 → γ′
e ::= ε | e.σ | σ(e)

(b) Typing rules
Function Application: from e : t and σ : t → t′, infer e.σ : t′
One Type Only: from e : t_1, e : t_2, and t_1 ≠ t_2, infer e : ⊥

(c) Variables and notations
P  Program
e  Expression
Δ  Set of function definitions
δ  A function definition
σ  Function name, drawn from alphabet Σ
γ  Class names, drawn from alphabet Γ disjoint to Σ
t, t′, t_1, t_2  Grounded (non-generic) types
𝜺  The unit type
ε  The single value of the unit type
⊥  The error type
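As an illustration of Fig. 3.1 (a sketch, not one of the paper's listings), the following Java fragment encodes a two-state finite automaton. The class names Q0 and Q1 and the method names a, b, and done are hypothetical: each class plays the role of a state γ, and each parameterless method plays the role of a definition σ : γ → γ′. Under these assumptions, a chain of calls type checks exactly when the corresponding word is in (ab)*.

```java
// Classes are states; a parameterless method s in class G returning G2
// stands for the function definition s : G -> G2. All names are hypothetical.
final class Q0 {
    // Transition Q0 --a--> Q1
    Q1 a() { return new Q1(); }

    // Only the accepting state offers the end-of-word marker.
    String done() { return "accepted"; }
}

final class Q1 {
    // Transition Q1 --b--> Q0
    Q0 b() { return new Q0(); }
}

final class FsaDemo {
    // new Q0() plays the role of the bootstrapping expression.
    static String run() {
        return new Q0().a().b().a().b().done();
    }

    public static void main(String[] args) {
        System.out.println(run());
        // new Q0().a().a();     // rejected by the compiler: Q1 has no method a()
        // new Q0().a().done();  // rejected by the compiler: Q1 has no method done()
    }
}
```

The rejected chains in the comments correspond to the automaton hanging: no function definition matches the type of the receiver.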
A type in T_⊥ is either drawn from Γ, or is the designated bottom type 𝜺. The atomic expression, bootstrapping expression e, is denoted by ε, and its type is 𝜺.

The figure defines program P in ⟨⟩ as a set Δ of function definitions δ followed by an expression e to type check. For σ drawn from set Σ of function names, and type names γ_1, γ_2 drawn from set Γ of class names, we can think of a function definition of the form σ : γ_1 → γ_2 as either
• a method named σ in class γ_1 taking no parameters and returning class γ_2, or,
• an external function taking a single parameter of type γ_1, and returning a value of type γ_2.
With the first perspective, the recursive description of expressions is the Polish convention, e ::= e.σ, best suited for making APIs fluent. With the latter perspective, this recursive definition should be made in prefix notation, i.e., e ::= σ(e). Fig. 3.1 uses both variants, and we will use these interchangeably. Indeed, the distinction between methods and functions is, in our perspective, only a syntactical matter.

The special case of a function taking the unit type as argument, σ : 𝜺 → γ, can be thought of as an instantiation of the return type, new γ. The function name, σ, is not essential in this case, but is kept for consistency. Also in the figure is the standard Function Application typing rule. Overloading on the parameter type is intentionally allowed, i.e., methods defined in different classes may use the same name. The One Type Only rule excludes overloading based on the return type.

Let Pp p be short for lattice point ⟨polyadic-parametric-polymorphism⟩, as demonstrated in List. 1.1 above. Pp p is the type system behind LINQ, the first theoretical treatise of fluent API [Gil and Levy 2016], Fling, and other fluent API generators, e.g., of [Xu 2010] and [Nakamaru et al. 2017].

The definition of Pp p relies on the definitions of trees, terms and rewrites in Sect. 2.3. Notice that in T_⊥, types were drawn from set Γ.
In allowing generic types, the type repertoire is extended to Γ^△, the set of trees over signature Γ. A type γ ∈ Γ of rank r ≥ 1 is a generic with r type parameters; the only leaf, of rank 0, is the unit type 𝜺. Pp p also admits "terms", i.e., trees including formal variables, drawn from the set Γ^△. We refer to terms of Pp p as "ungrounded types"; an ungrounded type is also viewed in Pp p as a type pattern that typically matches "grounded types" (trees in Γ^△), but can also be used for matching over ungrounded types.

(Footnotes: We ignore the somewhat idiosyncratic distinction between classes and interfaces. LINQ: https://docs.microsoft.com/en-us/dotnet/api/system.linq)

Fig. 3.2 summarizes the changes in Pp p's definitions with respect to those of T_⊥ in Fig. 3.1.

Fig. 3.2
The type system Pp p

(a) Abstract syntax (same as Fig. 3.1 (a) and...)
δ ::= σ : γ(𝐱) → τ   (term γ(𝐱) is linear)
  ::= σ : 𝜺 → t
𝐱 ::= x_1, …, x_r
τ ::= γ(𝝉) | x | t
𝝉 ::= τ_1, …, τ_r
t ::= γ(𝐭) | 𝜺
𝐭 ::= t_1, …, t_r

(b) Typing rules (same as Fig. 3.1 (b) and...)
Generic Function Application: from e : t, σ : τ → τ′, and t = τ/s, infer e.σ : τ′/s

(c) Variables and notations (same as Fig. 3.1 (c) and...)
τ, τ′  Type patterns, drawn from Γ^△
𝝉  Multi-pattern, i.e., a sequence of type patterns τ
x  Type variables, drawn from set X disjoint to all alphabets
𝐱  Multi-variable, i.e., a sequence of type variables
s  Tree substitution
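A minimal Java sketch of the definitions in Fig. 3.2, under stated assumptions: the class names Bottom and Open and the method names open, close, and done are hypothetical and do not come from the paper. The parameter of each method is the shallow, linear pattern Open(x) given by the enclosing generic class, while the return type of open is the deeper pattern Open(Open(x)). The grounded type of a partial expression therefore acts as a stack, and a chain type checks only for balanced sequences of open and close.

```java
// Bottom of the type-level stack; it is the only class offering done().
final class Bottom {
    // open : Bottom -> Open(Bottom)
    Open<Bottom> open() { return new Open<>(this); }

    String done() { return "balanced"; }
}

// One stack frame; the type argument T is the rest of the stack.
final class Open<T> {
    private final T below;

    Open(T below) { this.below = below; }

    // open : Open(x) -> Open(Open(x)), a return pattern deeper than the parameter
    Open<Open<T>> open() { return new Open<>(this); }

    // close : Open(x) -> x, popping one frame
    T close() { return below; }
}

final class DyckDemo {
    static String run() {
        // Types along the chain: Bottom, Open<Bottom>, Open<Open<Bottom>>,
        // Open<Bottom>, Open<Open<Bottom>>, Open<Bottom>, Bottom.
        return new Bottom().open().open().close().open().close().close().done();
    }

    public static void main(String[] args) {
        System.out.println(run());
        // new Bottom().open().done();   // rejected: Open<Bottom> has no done()
        // new Bottom().close();         // rejected: Bottom has no close()
    }
}
```

The sketch uses a single type parameter, so it also falls within the monadic restriction discussed in the text that follows.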
The main addition of Pp p to T_⊥ is allowing function definition δ to take also the form σ : γ(𝐱) → τ, where 𝐱 = x_1, …, x_r here is a sequence of r distinct type variables:
• The single parameter to functions is a multi-variable, yet shallow and linear, type pattern γ(𝐱). This requirement models the definition of methods in List. 1.1, i.e., in generic classes with r independent type variables. The structure of this pattern implicitly models the Java/C …
• Also, as demonstrated by List. 1.1, τ, the return type of a function in this form, is a type pattern of any depth constructed from the variables that occur in 𝐱 but also from any other types in Γ.
The figure also shows how the Function Application typing rule is generalized by employing the notions of matching and tree substitution from Sect. 2.3.

The definition of a dyadic-parametric-polymorphism type system adds to Fig. 3.2 the requirement that r(γ) ≤ 2. In monadic-parametric-polymorphism, used for fluent API generation by Nakamaru et al. [2017] and Yamazaki et al. [2019], the requirement becomes r(γ) = 1, with t ::= γ(t) instead of t ::= γ(𝐭), τ ::= γ(τ) instead of τ ::= γ(𝝉), and δ ::= σ : γ(x) → τ instead of δ ::= σ : γ(𝐱) → τ.

Java, C … f defined by static …
Java program in type system T = ⟨n-ary, deep, non-linear⟩ requiring over five minutes of compilation time by ecj executing on contemporary hardware
class S2 { interface … {} interface C …
1. The details are in Fig. 3.3.
Fig. 3.3
The type system ⟨n-ary-functions, deep⟩

(a) Abstract syntax (same as Fig. 3.2 (a) and...)
δ ::= σ : 𝝉 → τ   (Vars(τ) ⊆ Vars(𝝉))
𝝉 ::= τ_1 × τ_2 × ··· × τ_n
e ::= ε | 𝐞.σ | σ(𝐞)
𝐞 ::= e_1, e_2, …, e_n

(b) Typing rules (same as Fig. 3.2 (b) and...)
Multiple Arguments: from e_1 : t_1, e_2 : t_2, …, e_n : t_n, σ : τ_1 × τ_2 × ··· × τ_n → τ, and t_1 = τ_1/s, t_2 = τ_2/s, …, t_n = τ_n/s, infer e_1, e_2, …, e_n.σ : τ/s

(c) Variables and notations (same as Fig. 3.2 (c) and...)
e_1, e_2, …, e_n  Expressions
t_1, t_2, …, t_n  Grounded types
τ_1, τ_2, …, τ_n  Generic types
𝐞  Multi-expression, i.e., a sequence of expressions e_1, …, e_n

Comparing the figure to Fig. 3.2 above we notice the introduction of notation 𝐞 for a sequence of expressions. With this notation, a call to an n-ary function can be written 𝐞.σ (Polish, fluent API like, convention) or as σ(𝐞) (traditional convention). As might be expected, the figure also extends the function application typing rule to non-unary functions.

Note that languages embedded in n-ary-Pp p are no longer languages of words, but rather forests, i.e., languages of trees. Indeed, an expression in n-ary-Pp p is a tree of method calls, and the set Δ in an n-ary-Pp p program defines the set of tree-like expressions that type-check against it.

A primary motivation for introducing keyword decltype to C++ was streamlining the definition of wrapper functions: functions whose return type is the same as the wrapped function, e.g., template …
Note that neither Java nor C … auto functions; it appears that the designers of the languages made a specific effort to block loopholes that permit piecemeal definition of functions' return type.

Fig. 3.4 presents abstract modeling of C++'s decltype; for readability we use the more familiar typeof keyword. The figure describes n-ary-functions; for unary-functions let n = 1.

Fig. 3.4
Type system β¨ full-typeof,deep,n-ary-functions β© (same as Fig. 3.3 (a) and...) (same as Fig. 3.3 (b) and...) (same as Fig. 3.3 (c) and...) π :: = Ξ Ξ π Ξ :: = π β π :: = π : π β typeof π :: = π : π β π Vars ( π )β Vars ( π ) πΏ :: = π : π β typeof π Vars ( π )β Vars ( π ) π :: = π .π | π .πΏ | π π :: = π , . . . , π π (cid:18) TypeofExpression (cid:19) π = π or π = ππ : π Γ Β· Β· Β· Γ π π β typeof ππ : π‘ Β· Β· Β· π π : π‘ π π‘ = π / π Β· Β· Β· π‘ π = π π / π π / π : π‘ π , . . . , π π .π : π‘ The Multiple Arguments typing rule of Fig. 3.3 isalso generalized for auxiliary functions ( π ). Ξ Set of auxiliary function def-initions, used only in typeof clause π An auxiliary function definition π Auxiliary function names,drawn from alphabet Ξ¦ disjointto Ξ£ π Pseudo expression, an ex-pression whose type is notgrounded π Sequence of pseudo-expressions (a) Abstract syntax (b) Typing rules (c) Variables and notations
The figure uses two syntactical categories for defining functions: δ ∈ Δ, which, as before, defines a function named σ ∈ Σ that may occur in expression e (more generally 𝐞); the similarly structured ξ ∈ Ξ uses a distinct namespace, φ ∈ Φ, and is for functions that may occur in a typeof clause.

Pseudo-expressions.
Compare 𝝉 → typeof ϑ (the format of a definition of function named σ in the figure) with 𝝉 → τ (the format of this definition in the n-ary-function type system (Fig. 3.3)). Without type capturing, σ's return type is determined by a tree rewrite of the argument type(s). With type capturing, the return type is determined by subjecting type τ to other function(s). To see this, expand the recursive abstract syntax definition of ϑ, assuming for simplicity that n = 1,
δ ::= σ : τ → typeof τ.φ_1.···.φ_k, (3.8)
i.e., the pseudo-expression ϑ in this case is ϑ = τ.φ_1.···.φ_k. If n > 1, typeof is specified by hierarchical structure ϑ, for which the figure coins the term pseudo-expression. Notice that a plain expression is a tree whose leaves (type instantiations) are drawn from Γ and internal nodes (function calls) are drawn from Σ. Pseudo-expressions are more general in allowing type variables in their leaves. As emphasized in the figure, these variables must be drawn from 𝝉, the multi-pattern defining the types of arguments to σ.

A full-typeof type system allows any number of function calls in pseudo-expression ϑ, as in (3.8). In contrast, a rudimentary-typeof type system allows at most one function symbol in pseudo-expressions. This restriction is obtained by replacing the abstract syntax rule for ϑ in Fig. 3.4 with a simpler, non-recursive variant, ϑ ::= 𝝉.φ | τ.

To describe the semantics of typeof, we need to extend the notion of tree substitution to pseudo-expressions as well. The application of function σ of (3.8) to a multi-expression 𝐞 with multi-type 𝐭 requires first that 𝐭 matches 𝝉, where the matching uses a grounded substitution s.
Then, ϑ/s, the application of s to pseudo-expression ϑ, is the plain expression obtained by replacing the type variables in ϑ with the ground types defined by s.

The Typeof Expression typing rule employs this notion as follows: in typing expression 𝐞.σ with function σ : 𝝉 → typeof ϑ and arguments 𝐞 : 𝐭, we (i) match the argument types with the parameter types, 𝐭 = 𝝉/s, deducing substitution s, (ii) type ϑ/s : t (using an appropriate typing rule), and finally (iii) type 𝐞.σ : t. Since an application of the Typeof Expression rule requires an additional typing, that of ϑ/s, its definition is recursive.

The one-type property means that expressions must have exactly one type (as defined in Fig. 3.1). With the more potent multi-type property, expressions are allowed multiple types, by disposing of the One Type Only type inference rule of Fig. 3.1. With eventually-one-type, the semantics of the Ada programming language [Persch et al. 1980] apply: sub-expressions are allowed to have multiple types, but upper level expressions are still required to be singly typed. For example, while the upper level expression e = σ_3(σ_2(σ_1())) can be assigned at most one type, both σ_1() and σ_2(σ_1()) may have many types.

The notation used in this section highlights ties between tree automata and type systems, e.g., a tree t = γ_1(γ_2(γ_3), γ_4) can be understood as an instantiated generic type, γ_1<γ_2<γ_3>, γ_4>, to use Java syntax. Likewise, the tree rewrite ρ = γ_1(γ_2(x_1), x_2) → γ_2(x_2) can be interpreted as a Java function static …
To be convinced, notice the natural bisimulation of automata and type systems, obtained by a one-to-one correspondence between, e.g.,
• a run of an automaton, and the type checking process as dictated by the type checking rules,
• the hanging of an automaton, and failure of type checking,
• the input word or tree, and the type-checked expression,
• input-output items in Δ_A, and function definitions in Δ_T,
• the contents of auxiliary storage, and the type of the checked expression.
Observe however that states of an automaton do not easily find their parallel in the typing world (except for T_⊥ = FSA, in which classes correspond to states). Luckily, the expressive power of many of the automata we deal with does not depend on the presence of states, e.g., it is easy to see that deep-TA = ⟨deep, stateful⟩ ∨ TA.
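The reading of a tree rewrite as a generic function can be sketched in Java as follows. This is an illustration only: the class names Unit, G1, G2, G3 and the function name f are hypothetical stand-ins for tree node labels and a function name, and the rewrite shown, G1(G2(x1), x2) → G3(x1), is chosen for the example rather than taken from the paper. The parameter type of f is a deep pattern, and the Java type checker performs the matching and the substitution.

```java
// Generic types act as tree node labels; all names are hypothetical.
final class Unit {}
final class G1<A, B> {}
final class G2<A> {}
final class G3<A> {}

final class RewriteDemo {
    // The rewrite G1(G2(x1), x2) -> G3(x1): the deep pattern is the
    // parameter type, the right-hand side is the return type.
    static <X1, X2> G3<X1> f(G1<G2<X1>, X2> e) {
        return new G3<>();
    }

    static G3<Unit> run() {
        // Storage contents: the grounded type (tree) G1(G2(Unit), Unit).
        G1<G2<Unit>, Unit> t = new G1<>();
        // Matching binds X1 = Unit and X2 = Unit, so the result is G3<Unit>.
        return f(t);
    }

    public static void main(String[] args) {
        System.out.println(run() != null);
        // G1<Unit, Unit> u = new G1<>();
        // f(u);  // rejected by the compiler: Unit does not match the pattern G2<X1>
    }
}
```

The commented-out call is the type-level counterpart of a rewrite rule that is not applicable to the current storage contents.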
The following result employs the type-automata correspondence to characterize the complexity class of type system Pp p.

Theorem 4.1. Pp p = TA = DCFL
Recalling the equivalence Pp p = TA (Obs. 1), the gist of the theorem is the claim TA = DCFL. Towards the proof we draw attention to G&R's "tree encoding", which is essentially a reduction by which every DPDA is converted to an equivalent tree automaton. Their work then proceeds to show how this tree automaton is emulated in the Pp p type system they use (and that the emulation does not incur exponential space (and time) overhead). Hence, by G&R [2019],
DCFL = DPDA ⊆ TA = Pp p. (4.1)
A similar result is described by Guessarian [1983]. In fact, we note that Guessarian's contribution is more general; specifically, she achieves the result that augmenting tree automata with ε-transitions and multiple states does not increase their computational class.

Fact 4.1 ([Guessarian 1983, Corollary 1.(i)]). ⟨ε-transitions, stateful⟩ ∨ TA = TA

Fact 4.1 generalizes (4.1), since DPDAs are instances of ⟨ε-transitions, stateful⟩ ∨ TA, where the tree store is linear. The proof of Thm. 4.1 is completed by showing the inverse of (4.1).

Lemma 4.1. TA ⊆ DPDA.
Proof. The proof is constructed by employing Theorem 3 of Guessarian [1983]. (Notice that she uses the term "pushdown tree automaton" (PDTA) for top-down tree automata. However, for the purpose of the reduction, we concentrate on input trees that are in the form of a string, i.e., the tree traversal order is immaterial.) □

Observe that Lem. 4.1 means that G&R's result is the best possible in the following sense: It is impossible to extend Fling to support any wider family of fluent API languages within the limits of the fragment of the Java type system that Fling uses. Moreover, as shown by Grigore [2017], allowing the fluent API generator a larger type system fragment makes type-checking undecidable if the larger fragment includes the common Java idiom of employing super in signatures, as in, e.g., method boolean removeIf(Predicate<? super E> filter) found in the standard java.util.Collection class.

Combining Obs. 1 (5), known results (Table 2.2) and Thm. 4.1, we have
⟨monadic⟩ = SRDPDA ⊊ DCFL = Pp p = ⟨polyadic⟩, (4.2)
i.e., had Pp p been weakened to allow only monadic generics, its expressive power would have been reduced. Conversely, we would like to check the changes to complexity when Pp p is made more potent. Consider now allowing generic functions (on top of methods of generic classes) by adding the deep-type-pattern feature to Pp p.

Theorem 4.2. DCFL ⊊ deep-TA = deep-Pp p

Again, recall the equivalence deep-Pp p = deep-TA from Obs. 1. The set containment, DCFL ⊆ deep-TA, follows from (4.1). It remains to show that this containment is proper.

The proof of Thm. 4.2 in Sect. C.1 is by encoding the context sensitive language a^n b^n c^n ⊆ {a, b, c}* in type system deep-Pp p, and relying on the following definition: For an integer n ≥ 0, let u_n, the unary type encoding of n, be a grounded type in Pp p, u_0 = Zero, and u_n = Succ<u_{n−1}>, with types (in Java syntax) interface Zero{} and interface Succ …
… TA automaton A, an equivalent binary deep-TA A′. Let γ be a tree node in A of rank r > 2: Replace γ with nodes γ_1, γ_2, …, γ_{r−1} of rank two, and γ_r of rank one. Tree nodes appear in both sides of tree rewrite rules, and in the initial auxiliary storage tree: Replace every occurrence of γ in A, γ(τ_1, τ_2, …, τ_r), with γ_1(τ_1, γ_2(τ_2, … γ_{r−1}(τ_{r−1}, γ_r(τ_r)) …)). □

… ε-TRANSITIONS

In the previous section we showed that the addition of the deep-type-pattern property, as found in generic, non-method functions of (say) Java, to the Pp p type system, increases its computational complexity, but does not render it undecidable. We now prove that the addition of even rudimentary typeof to Pp p makes it undecidable.

Theorem 5.1. ⟨deep, rudimentary-typeof⟩ ∨ Pp p = RE.
The following reduction is pertinent to the proof of Thm. 5.1.

Lemma 5.1.
A Turing machine π can be simulated by a deep-rewrite, stateful tree automaton π΄ which is allowed π -transitions. Proof. As explained in Sect. 2, we can assume that π accepts its input on the tape with thehead on the first letter, and then engages in π -transitions only. Also, w.l.o.g., π βs tape is extendedinfinitely in both directions by an infinite sequences of a designated blank symbol β . Fig. 5.1
Turing machine acceptingthe language π π π π π π π π start π π β β + β β β β π β β β β β β + β β β + π β π + π β π + π β π β π β π β Fig. 5.1 is an example of such a machine π with internalstates π through π , single accepting state π , and, tape alpha-bet Ξ = { π, π, β } . The machine terminates in an accepting stateif and only if the tape is initialized with a word π π π π , π β₯ π fromthe beginning of the word and its counterpart letter π from thewordβs end by β , until no more π βs or π βs are left. The conven-tion of depicting transitions over edges in the graph of statesis standard, e.g., the arrow and label rendered in purple (goingfrom state π to state π ) is the π -transition item β¨ π , π β β + , π β© , (5.1)which states that if the Turing machine is in internal state π ,and, the symbol under head is π , then (i) replace π by β , (ii) increment β , and and, (iii) change internal state to π .The encoding of π in π΄ includes the following components: oseph (Yossi) Gil and Ori Roth (1) Adopting the set of states π , set of accepting states πΉ , and initial state π of π .(2) A rank-1 tree symbol for each of the tape symbols, including β .(3) Employing the designated leaf symbol πΊ β Ξ to encode the infinite sequences of β at the endsof the tape.(4) Introducing a rank-3 tree symbol β¦ for encoding the tape itself. The center child of a nodelabeled β¦ encodes of a β¦ node encodes the cell under the head; its left (resp. right) childencodes the tape to the left (resp. to the right) of the head. 
For example, the tape con-tents Β· Β· Β· βββ πππππ βββ Β· Β· Β· is encoded by a certain tree π‘ = β¦( π ( πΊ )) , π ( πΊ ) , π ( π ( π ( πΊ ))) .For the sake of readability we write β¦ nodes in infix notation, e.g., π‘ = π ( πΊ ))/ π ( πΊ )/ π ( π ( π ( πΊ ))) ,or even more concisely π‘ = π / π / πππ .(5) Setting πΈ = πΊ / π / π Β· Β· Β· π π , i.e., letting the initial state of auxiliary storage encode the inputword π π Β· Β· Β· π π .(6) Introducing | Ξ£ | + π΄ for each of π βs transitions: A single transition for dealingwith the πΊ leaf denoting an infinite sequence of blanks, and a transition for each tape symbol.In demonstration, transition β¨ π , π β β + , π β© (5.1) is encoded in four π -transitions of π΄ whichdiffer only in their tree rewrite rule. β¨ π , π₯ / π / ππ₯ β β π₯ / π / π₯ , π β© β¨ π , π₯ / π / ππ₯ β β π₯ / π / π₯ , π β©β¨ π , π₯ / π / β π₯ β β π₯ / β / π₯ , π β© β¨ π , π₯ / π / πΊ β β π₯ / β / πΊ , π β© . (5.2)The rules above distinguish between the values the right child of node β¦ , i.e., the symbol tothe right of the head: For example, the first rule, π₯ / π / ππ₯ β β π₯ / π / π₯ , deals with the casethis child is π followed by some tape suffix captured in variable π₯ . The rule rewrites thenode, making π the center child.Notice that with the encoding, the input to π΄ is encoded in its transitions rules. β‘ Relying on Lem. 5.1, the proof of Thm. 5.1 is completed by encoding the automaton π΄ of thelemma in the appropriate type system.Proof of Thm. 5.1. We encode automaton π΄ = π΄ ( π ) as a program π = ΞΞ π in type system β¨ deep,rudimentary β©β¨ Pp p . In this encoding, set Ξ is empty, and there is a function π β Ξ for every π -transition item in set Ξ of π΄ . Expression π type checks against Ξ , if, and only if, machine π (automaton π΄ ) halts.In the encoding, the tree vocabulary of π΄ incarnates as generic types: A three parameter generictype β¦ , and generic one-parameter type πΎ for each tape symbol, including β . 
Also the argument toevery function π β Ξ function is a deep pattern over possible instantiations of β¦ .Also, introduce a function symbol π π for every π β π , and let every transition β¨ π, π β π β² , π β² β© of π΄ add an overloaded definition π π : π β typeof π β² .π β² π to this symbol. Thus, function π π emulates π΄ instate π with tape π : It applies the rewrite π β π β² to the type, and employs the resolution of typeof to continue the computation in function π β² π which corresponds to the destination state π β² .For example, the Turing machine transition shown in (5.1), encoded by the tree automatontransitions of (5.2), is embedded in C++ using decltype , as depicted in List. 5.1. Listing 5.1
Proof. Yamazaki et al. [2019] showed that DCFL ⊆ Fluent, i.e., that any LR language, alternatively, any DCFL, can be encoded in a Fluent program. It remains to show the converse, Fluent ⊆ DCFL. We prove Fluent ⊆ deep-DPDA, noting the folklore equality
deep-DPDA = DPDA. (5.3)
The encoding of a Fluent program in a deep-DPDA is reminiscent of the encoding of a program in the rudimentary-Pp p type system in a vanilla tree automaton in the proof of Thm. 5.2 just above. The full proof of the current theorem is in Sect. C.3. □

Having seen that Fluent is not more expressive than it was intended to be, it is interesting to check whether its expressive power would increase if it allowed unrestricted typeof clauses.

Theorem 5.4. full-typeof-Fluent ⊋ DCFL
The proof is by showing that type system full-typeof-Fluent is expressive enough to encode the language ww, known to be context sensitive. The full proof is in Sect. C.4.

Most previous work concentrated on recognition of deterministic languages [Gil and Levy 2016; Gil and Roth 2019; Grigore 2017; Nakamaru et al. 2017]. We show here that a type system with Ada-like overloading can encode non-deterministic context free languages as well. The proof relies on creating a direct correspondence between the type system and context free grammars (CFGs).

Theorem 6.1. UCFL ⊆ ⟨monadic, eventually-one-type⟩

Proof. Given an unambiguous context free grammar G, we encode it as Δ, a set of function definitions in ⟨monadic, eventually-one-type⟩ such that G derives word σ_1···σ_n if, and only if, expression ε.σ_1.···.σ_n.$ ($ being a dedicated function symbol) type checks against Δ.

We redefine CFGs using a notation more consistent with this manuscript: Context free grammar G is a specification of a formal language over alphabet Σ in the form of a quadruple ⟨Σ, Γ, 𝜺, R⟩ where Σ is the set of G's terminals, Γ is the set of grammar variables, 𝜺 ∈ Γ is the start symbol, and R is a set of derivation rules. Each derivation rule ρ ∈ R is either in the form 𝜺 → ω, or in the form γ → ω, where γ ∈ Γ and where ω is a possibly empty sequence of terminals and grammar variables, i.e., ω ∈ (Σ ∪ Γ)*.

Recall that a grammar is in Greibach Normal Form (GNF) if every rule ρ ∈ R is in one of three forms: (i) the usual form, ρ = γ → σ𝜸, where σ ∈ Σ is a terminal and 𝜸 ∈ Γ* is a sequence of variables, (ii) the initialization form, ρ = 𝜺 → σ𝜸, or, (iii) the ε-form, ρ = 𝜺 → ε, present only if the grammar derives the empty word ε ∈ Σ*.

For the encoding, first convert unambiguous grammar G into an equivalent unambiguous grammar in GNF.
This is done using the algorithm of Nijholt [1979] (also presented in moreaccessible form by Salomaa and Soittola [1978]).The type encoding of GNF grammar πΊ uses a monadic generic type πΎ for every symbol πΎ β Ξ , anadditional monadic generic type $ , and, one non-generic type πΊ , also known as the unit type.For each derivation rule π β π introduces a function πΏ β Ξ that uses these types: β’ Suppose π includes the π -form rule πΊ β π $, introduce (one overloaded) definition of func-tion $ : πΊ β πΊ . Then, π. $, the expression corresponding to the empty word, type-checks totype πΊ . (Recall that π is the single type of the unit type πΊ .) β’ If π is in the initialization form πΊ β π πΈ then πΏ = π : πΊ β πΈ $ . For such a rule introduce alsofunction $ : β $ πΊ β πΊ . β’ If π is in the usual form πΎ β π πΈ , then πΏ = π : πΎπ₯ β πΈ π₯ .We show by induction on π = , . . . , π the following claim on the partial expression π π = π.π . Β· Β· Β· .π π : The set of types assigned by the type checker to π π includes a type πΈ $ , πΈ β Ξ + , ifand only if, there exists a l eft m ost d erivation (LMD) that yields the sentential form π Β· Β· Β· π π πΈ .For the inductive base observe that π = π and that the set of types of π includes only the unittype πΊ ; indeed there is a (trivial) LMD of the degenerate sentential form π πΊ = πΊ .Consider an LMD of π Β· Β· Β· π π π π + πΈ β² $ , where π < π , πΈ β² β Ξ + and π π + is the terminal π β Ξ£ , π β $.We show that πΈ β² is a type of π π + = π ( π π ) . The said LMD can only be obtained by applying arule π = πΎ β π πΈ β to the sentential form π Β· Β· Β· π π πΈ $ , where πΎ is the first symbol of πΈ .By examining the kind of functions in Ξ , one can similarly show that every type πΈ β² of π π + is anevidence of an LMD of a sentential form π Β· Β· Β· π π π π + πΈ β² . ies between Type Systems and Automata The proof is completed by manually checking that a full expression, ending with the . 
$ invocationcan only type check to a single type, πΊ , and this can happen only if the type of π.π . Β· Β· Β· .π π is πΈ ,where πΈ occurs in an initialization rule πΊ β π π πΈ . β‘ Sect. D.2 demonstrates the proof by presenting a fluent API of the non-deterministic context freelanguage of even length palindromes.If final expressions are also allowed to be multi-typed, then we can construct fluent API for allcontext free languages.Theorem 6.2. β¨ monadic, multiple-type β© = CFL
Proof. The construction in the proof of Thm. 6.1 works here as well. Note that here the transition from a plain CFG to GNF does not have to preserve unambiguity. □

Perspective.
Revisiting Table 3.1, we see that in total it has |C_1|·|C_2|·|C_3|·|C_4|·|C_5|·|C_6| = 5·2·2·2·3·3 = 360 lattice points. Accounting for the fact that in a nyladic type system the values of C_2 (type pattern depth) and C_3 (type pattern multiplicity) are meaningless, we see that lattice 𝔗 spans |C_4|·|C_5|·|C_6| = 2·3·3 = 18 monomorphic type systems (T_⊥ among them), and (|C_1| − 1)·|C_2|·|C_3|·|C_4|·|C_5|·|C_6| = 4·2·2·2·3·3 = 288 potential polymorphic type systems (Pp p and Fluent among them). To make the count more exact, account for C_3 being irrelevant in a monadic type system, obtaining |C_2|·|C_4|·|C_5|·|C_6| = 2·2·3·3 = 36 monadic, yet polymorphic, type systems, and (|C_1| − 2)·|C_2|·|C_3|·|C_4|·|C_5|·|C_6| = 3·2·2·2·3·3 = 216 non-monadic polymorphic type systems.

Beyond the implicit mention that the type-automata correspondence applies to monomorphic type systems, these were not considered here. Our study also invariably assumed unary-function, ignoring in characteristic C_4 the n-ary-functions type systems, which comprise half of the type systems of 𝔗. (The ignored n-ary-functions correspond to the forest-recognizer brand of automata; however, forest-recognizer automata were used in the construction, e.g., in Lem. 5.1.)

Even though most of this work was in characterizing the complexity classes of type systems, it could not have covered even the (36 + 216)/2 = 126 type systems remaining in scope. The study rather focused on those systems which we thought are more interesting: We gave an exact characterization of the complexity classes of two central type systems, Pp p (Thm. 4.1) and Fluent (Thm. 5.3), and investigated how this complexity changes if the type systems are made more or less potent along 𝔗's characteristics (with the exception of C_4, the function arity characteristic).

Comparing (3.1) with Table 3.1 we see that Fluent can be made more potent along C_1, C_5, or C_6, and, as follows from our results, its complexity class increases in all three cases:
(1) In C_1, Fluent ⊊ dyadic-Fluent = RE, by combining Thm. 4.4 and Thm. 5.1.
(2) In C_6, Fluent ⊊ eventually-one-type-Fluent (Thm. 6.1).
(3) In C_5, Fluent ⊊ full-typeof-Fluent (Thm. 5.4).
Conversely, Fluent can be made less potent along characteristics C_1, C_2 and C_5:
(1) In C_1 complexity decreases, Fluent − monadic = FSA ⊊ Fluent (Obs. 1).
(2) In C_2, (5.3) makes us believe that complexity does not change, Fluent − deep + shallow = Fluent.
(3) In C_5, by Obs. 1 and (5.3), Fluent − rudimentary = deep-RDPDA. We believe complexity decreases but are unsure.
Type system Pp p can be made more potent along characteristics C_2, C_3, C_5 and C_6:
(1) In C_2 complexity increases, Pp p ⊊ deep-Pp p (Thm. 4.2).
(2) In C_3 complexity increases, Pp p ⊊ non-linear-Pp p (Thm. 4.3).
(3) In C_5 complexity does not change, Pp p = rudimentary-typeof-Pp p (Thm. 5.2).
(4) In C_6 complexity increases, Pp p ⊊ eventually-one-type-Pp p (Thm. 6.1).
Type system Pp p can be made less potent only along characteristic C_1. From Obs. 1 and Thm. 4.1,
FSA = ⟨nyladic⟩ ⊊ SRDPDA = ⟨monadic⟩ ⊆ ⟨dyadic⟩ ⊆ ⟨polyadic⟩ ⊆ ⟨polyadic⟩ = DCFL, (7.1)
i.e., it is not known whether decreasing Pp p along C_1 to dyadic reduces its complexity, but decreasing it further to monadic certainly does.

This work should also be viewed as a study of the type-automata correspondence: (i) The results in Sect. 4 revolve around the correspondence between tree-store automata employing tree rewrites, and type systems in which the signature of functions employs type patterns to match its argument. (ii) Sect. 5 explored the correspondence between the typeof clause in the signature of functions and ε-transitions of automata. (iii) The correspondence between non-deterministic runs and allowing multiple types of expressions, at least as a partial step during resolution of overloading, was the subject of Sect. 6. Overall, our study confirmed that the type-automata correspondence is a significant aid in the characterization of complexity classes, either by a direct bisimulation between the two, or by employing and adapting (sometimes ancient) contributions in the decades-old research of automata.
Open Problems.
Technically, we leave open the problem of characterizing the complexity class of each of the 126 type systems that were not considered at all, or were considered but not fully characterized. However, many of these can be trivially solved, e.g., since $T = \langle \text{deep}, \text{rudimentary}, \text{polyadic} \rangle = \text{RE}$ (Thm. 5.1), $T' = \text{RE}$ for all $T' \in \mathfrak{T}$, $T' > T$. We draw attention to four type systems for which we are able to set a lower and an upper bound, but still miss a precise characterization, e.g., in terms of familiar computational complexity classes.
(1) deep-Ppp, for which we have DCFL $\subsetneq$ deep-Ppp $\subseteq$ CSL by Thm. 4.2.
(2) non-linear-Ppp, for which we also have DCFL $\subsetneq$ non-linear-Ppp $\subseteq$ CSL by Thm. 4.3.
(3) $\langle$deep, non-linear$\rangle \vee$ Ppp, for which we have again DCFL $\subsetneq \langle$deep, non-linear$\rangle \vee$ Ppp $\subseteq$ CSL by Thms. 4.2 and 4.3.
(4) full-typeof-Fluent, for which we have DCFL $\subsetneq$ full-typeof-Fluent $\subseteq$ RE by Thm. 5.4.
Also, we do not know yet how these relate to each other in terms of computational complexity, beyond what can be trivially inferred by $\mathfrak{T}$'s partial order. Sect. D.3 may offer some insights.

Expression Trees vs. Expression Words. Language recognizers, i.e., automata which take trees as inputs, were defined and used in the proofs. Still, this study does not offer much on n-ary-functions, the type counterpart of language recognizers. There is potential in exploring the theory of polymorphic types of tree shaped expressions. In particular, it is interesting to study the type systems $\langle$n-ary, deep$\rangle$ and $\langle$n-ary, deep, non-linear$\rangle$, both modeling static generic multi-argument functions; the latter adds the power, and predicament (see List. 3.1), of non-linear type patterns. In the type-automata perspective, both correspond to the forest-recognizer real-time tree-store brand of automata, which received little attention in the literature. We see two potential applications of type theory for which (say) Ppp is insufficient, and which could serve as motivation for resolving the open problems above and for the study of these two type systems.

Types for linear algebra
The matrix product $A \times B$ is defined if matrix $A$ is $m_1 \times m_2$ and matrix $B$ is $m_2 \times m_3$, in which case the result is an $m_1 \times m_3$ matrix. The matrix addition $A + B$ is defined only if both $A$ and $B$ are $m_1 \times m_2$, in which case the result is also $m_1 \times m_2$. The unary encoding of integers and their comparison in one step in the proof of Thm. 4.3 seem to be sufficient for developing a decidable type system that enforces such constraints.

However, unlike type systems for checking fluent API, types for linear algebra implemented this way are impractical: matrices whose dimensions are in the range of thousands are common, e.g., in image processing. Programmers cannot be expected to encode integers this large in unary, not to mention the fact that such types tend to challenge compilers' stability. The problem is ameliorated in $\langle$n-ary, deep, non-linear$\rangle$, in which a decimal (say) representation of integers is feasible. A more precise design is left for future research.

A more difficult challenge is type system support for, and checking of, operations which involve integer arithmetic. A prime example is numpy's (https://numpy.org/) reshape operation, which converts, e.g., an $m_1 \times m_2$ matrix to an $m_3 \times m_4$ matrix, where correctness is contingent on the equality $m_1 \cdot m_2 = m_3 \cdot m_4$. Indeed, we are not aware of any decidable type system that can do integer multiplication.

Dimensional types
A similar challenge is the support of physical dimensions, i.e., the design of a type system allowing, e.g., the division of a distance quantity by a time quantity to obtain a speed quantity, and the addition and comparison of distance quantities, but forbidding, e.g., addition and comparison of time and distance quantities. To do so, the type system should probably encode $\prod_{i=1}^{r} x_i^{n_i}$, $n_i \in \mathbb{Z}$, the general form of a physical dimension (in, say, MKS), as a tuple of $r$ signed integers.

To enforce the rules of addition and comparison of physical dimensions, the type system should be able to compare (typically very small) integers, as done in Thm. 4.3, although the implementation should be tweaked to support negative integers. For multiplying and dividing physical quantities, the type system should be able to add (small) integers. We do not know whether this is possible in $\langle$n-ary, deep$\rangle$ or $\langle$n-ary, deep, non-linear$\rangle$.

Modeling type erasure.
Finally, we draw attention to the fact that Java's type erasure is not accurately modeled by our system. In particular, Java forbids function overloading if the types of the overloaded functions become identical after type erasure. We propose this type inference rule for type erasure

$$\text{TypeErasure}\quad \frac{\sigma : \gamma(\boldsymbol{\tau}) \to \tau \qquad \sigma : \gamma(\boldsymbol{\tau}') \to \tau'}{\sigma : \bot} \qquad (7.2)$$

and leave the problem of studying type systems with type erasure to future research.

REFERENCES
Nada Amin and Ross Tate. 2016. Java and Scala's type systems are unsound: The existential crisis of null pointers. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). Association for Computing Machinery, New York, NY, USA, 838–848. https://doi.org/10.1145/2983990.2984004
Jean-Michel Autebert, Jean Berstel, and Luc Boasson. 1997. Context-free languages and pushdown automata. Springer, Berlin, Heidelberg, 111–174. https://doi.org/10.1007/978-3-642-59136-5_3
Henk Barendregt. 1991. Introduction to generalized type systems. J. Functional Programming
On the power of real-time Turing machines: $k$ tapes are more powerful than $k - 1$ tapes. Theoretical Comp. Science
Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '82). Association for Computing Machinery, New York, NY, USA, 207–212. https://doi.org/10.1145/582153.582176
Alan A.A. Donovan and Brian W. Kernighan. 2015. The Go Programming Language. Addison-Wesley Professional, Boston, MA, USA.
Yossi Gil and Tomer Levy. 2016. Formal language recognition with the Java type checker. In , Shriram Krishnamurthi and Benjamin S. Lerner (Eds.), Vol. 56. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 10:1–10:27. https://doi.org/10.4230/LIPIcs.ECOOP.2016.10
Yossi Gil and Ori Roth. 2019. Fling - a fluent API generator. In , Alastair F. Donaldson (Ed.), Vol. 134. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 13:1–13:25. https://doi.org/10.4230/LIPIcs.ECOOP.2019.13
Jean-Yves Girard. 1971. Une Extension De L'Interpretation De Gödel a L'Analyse, Et Son Application a L'Elimination Des Coupures Dans L'Analyse Et La Theorie Des Types. In Proceedings of the Second Scandinavian Logic Symposium, J.E. Fenstad (Ed.). Studies in Logic and the Foundations of Mathematics, Vol. 63. Elsevier, 63–92. https://doi.org/10.1016/S0049-237X(08)70843-7
Jean-Yves Girard. 1972. Interprétation fonctionnelle et élimination des coupures de l'arithmétique d'ordre supérieur. Ph.D. Dissertation. Université Paris.
Radu Grigore. 2017. Java generics are Turing complete. SIGPLAN Not. 52, 1 (Jan. 2017), 73–85. https://doi.org/10.1145/3093333.3009871
Irène Guessarian. 1983. Pushdown tree automata. Math. Syst. Theory 16, 1 (1983), 237–263.
R. Hindley. 1969. The principal type-scheme of an object in combinatory logic. Trans. Amer. Math. Soc.
Introduction to automata theory, languages, and computation (3rd ed.). Pearson Addison Wesley, Boston, MA.
Richard M. Karp. 1972. Reducibility among combinatorial problems. In Proc. Symp. Complex. Comp., Raymond E. Miller, James W. Thatcher, and Jean D. Bohlinger (Eds.). Springer, Yorktown Heights, NY, 85–103. https://doi.org/10.1007/978-1-4684-2001-2_9
Andrew Kennedy and Benjamin Pierce. 2007. On decidability of nominal subtyping with variance. In Int. Workshop Found. & Devel. OO Lang. (FOOL/WOOD'07). Nice, France. http://foolwood07.cs.uchicago.edu/program/kennedy-abstract.html
A. J. Kfoury, J. Tiuryn, and P. Urzyczyn. 1990. ML typability is DEXPTIME-complete. In CAAP '90, A. Arnold (Ed.). Springer, New York, 206–220.
Donald E. Knuth. 1965. On the translation of languages from left to right. Info & Comp. 8, 6 (1965), 607–639. https://doi.org/10.1016/S0019-9958(65)90426-2
Robin Milner. 1978. A theory of type polymorphism in programming. J. Comput. System Sci. 17, 3 (1978), 348–375. https://doi.org/10.1016/0022-0000(78)90014-4
Tomoki Nakamaru and Shigeru Chiba. 2020. Generating a generic fluent API in Java. The Art, Science, and Eng. of Prog. 4, 3 (Feb. 2020). https://doi.org/10.22152/programming-journal.org/2020/4/9
Tomoki Nakamaru, Kazuhiro Ichikawa, Tetsuro Yamazaki, and Shigeru Chiba. 2017. Silverchain: a fluent API generator. In Proc. 16th ACM SIGPLAN Int. Conf. Generative Prog. (GPCE'17). ACM, Vancouver, BC, Canada, 199–211.
Anton Nijholt. 1979. Grammar functors and covers: From non-left-recursive to Greibach normal form grammars. BIT Numerical Mathematics 19, 1 (01 March 1979), 73–78. https://doi.org/10.1007/BF01931223
Guido Persch, Georg Winterstein, Manfred Dausmann, and Sophia Drossopoulou. 1980. Overloading in Preliminary Ada. SIGPLAN Not. 15, 11 (Nov. 1980), 47–56. https://doi.org/10.1145/947783.948640
Michael O. Rabin. 1963. Real time computation. Israel J. Math.
Programming Symposium, B. Robinet (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 408–425.
A. Salomaa and M. Soittola. 1978. Automata-Theoretic Aspects of Formal Power Series. Springer-Verlag, NY.
J.B. Wells. 1999. Typability and type checking in System F are equivalent and undecidable. Annals of Pure and Applied Logic 98, 1 (1999), 111–156. https://doi.org/10.1016/S0168-0072(98)00047-5
Hao Xu. 2010. EriLex: an embedded domain specific language generator. In Objects, Models, Components, Patterns, Jan Vitek (Ed.). Springer, Berlin, Heidelberg, 192–212.
Tetsuro Yamazaki, Tomoki Nakamaru, Kazuhiro Ichikawa, and Shigeru Chiba. 2019. Generating a fluent API with syntax checking from an LR grammar. Proc. ACM Program. Lang. 3, Article 134 (Oct. 2019), 24 pages. https://doi.org/10.1145/3360560
A ABBREVIATIONS, ACRONYMS, AND NOTATION

Acronyms
G&R: Gil and Roth [2019]
Ppp: plain parametric polymorphism, or, polyadic parametric polymorphism, (3.2) and Fig. 3.2
API: application programming interface
CFG: context free grammar
CFL: context free language
CSL: context sensitive language
DCFL: deterministic context free language
DEXP: deterministic exponential
DFSA: deterministic finite state automaton
FSA: finite state automaton
GHC: Glasgow Haskell Compiler
GNF: Greibach normal form (of CFG)
HM: Hindley-Milner (type system)
IDE: interactive development environment
LBA: linear bounded automaton
LMD: left-most derivation
LR: left-to-right, right-most derivation
MKS: meter-kilogram-second (system of physical units)
ML: the ML ("meta-language") programming language
PDA: pushdown automaton
PDTA: pushdown tree automaton
RDPDA: real-time deterministic pushdown automaton
REG: the set of regular languages
RTM: real-time Turing machine
SRDPDA: stateless real-time deterministic pushdown automaton
STLC: simply typed lambda calculus
TA: tree automaton, i.e., an automaton employing a tree store
UCFL: unambiguous context free language

List of Symbols
1. Latin Letters Like (upper case) Β£ forest, or language of trees, Β£ β Ξ£ β³ , page 9 π΄ a finite control automaton, page 6 π΄ a two-dimensional matrix, page 22 π΅ a two-dimensional matrix, page 22 πΆ π a characteristic of lattice π , see Table 3.1, page 10 πΆ number of type arguments (characteristic of lattice π ), see Table 3.1, page 10 πΆ type pattern depth (characteristic of lattice π ), see Table 3.1, page 10 πΆ type pattern multiplicity (characteristic of lattice π ), see Table 3.1, page 10 πΆ arity of functions (characteristic of lattice π ), see Table 3.1, page 10 πΆ type capturing (characteristic of lattice π ), see Table 3.1, page 10 πΆ overloading (characteristic of lattice π ), see Table 3.1, page 10 πΉ the set of accepting states in a finite control automaton, page 6 oseph (Yossi) Gil and Ori Roth πΊ context free grammar, page 2 πΏ library of type definitions in a programming language, page 2 π a Turing machine, page 17 π program (abstract syntax start symbol), see Fig. 3.1, page 11 π the set of internal states of a finite control automaton, page 6 π set of derivation rules of CFG, page 20 π β¨ n-ary , deep β© (type system in π ), page 22 π β¨ n-ary , deep , non-linear β© (type system in π ) List. 3.1, page 22 π a type system in lattice π , page 2 π unary type encoding of 0, base of the π π recursion, page 16 π π unary type encoding of integer π β N , defined recursively, page 16 π unbounded set of variables disjoint to all alphabets, page 8 β formal language, page 2 N the set of non-negative integers { , , , . . . } , page 6 Z set of signed integers, {Β· Β· Β· , β , β , , , , . . . } , page 23 π lattice of finite control automata, see Table 2.1, page 1 π β₯ bottom of lattice π , see (2.1), page 5 π lattice of parametrically polymorphic type systems, see Table 3.1, page 1
2. Latin Letters Like (lower case) π an example letter in alphabet, page 2 π an example letter in alphabet, page 2 π an example letter in alphabet, page 16 π expression (abstract syntax category), see Fig. 3.1, page 11 β position of the read/write head on tape auxiliary storage, page 7 π a dimension of a matrix, page 22 π exponent of certain physical unit in a physical dimension such as kilogram/meter-squared,page 23 π length of word input to finite control automaton, page 8 π π value of the lattice property π , page 5 π a state of a finite control automaton, page 6 π the initial internal state of a finite control automaton, page 6 π number of physical units in a system of physical units such as MKS, page 23 π rank/number of children in a node of a tree in Ξ β³ , page 8 π ( πΎ ) rank of symbol πΎ drawn from a signature, page 8 π tree substitution { π₯ β π , . . . , π₯ π β π π } , page 8 π‘ a tree in Ξ β³ , page 8 π‘ grounded type (abstract syntax category), see Fig. 3.1, page 11 π’ the word denoting the remainder of input to a language recognizer, page 6 π€ the input word to language recognizer, page 6 π₯ type variable (abstract syntax category), see Fig. 3.2, page 12 π₯ variable used in a term, page 8 π₯ π a physical unit such as centimeter, second, gram, and coulomb, page 23 β formal language of strings, β β Ξ£ β , page 6 β designated blank symbol occupying uninitialized cells of tape auxiliary storage, page 17 π multi-expression (abstract syntax category), see Fig. 3.3, page 13 π multi-state π , π , . . . , π π , π determined by context, page 9 π multi-tree π‘ , . . . , π‘ π , π determined by context, page 10 ies between Type Systems and Automata π multi-variable (abstract syntax category), see Fig. 3.2, page 12
3. Greek Letters Like (upper case) Ξ alphabet of symbols used in auxiliary storage, page 6 Ξ set of variables of CFG, page 20 Ξ β³ set of all trees over signature Ξ , page 8 Ξ β³ set of all terms over signature Ξ , page 8 Ξ set of input-output items of consuming transition function πΏ , page 7 Ξ set of primary function definitions (abstract syntax category), see Fig. 3.1, page 11 Ξ set of auxiliary function definitions (abstract syntax category), see Fig. 3.4, page 14 Ξ set of input-output items of π -transition function π , page 7 Ξ£ finite alphabet of symbols, page 6 Ξ£ set of terminals of CFG, page 20 Ξ£ β set of all strings (words) over Ξ£ , including the empty string, page 6 Ξ¦ set of auxiliary function names, disjoint to Ξ£ , see Fig. 3.4, page 14 πͺ set of possible contents of auxiliary storage, page 6
4. Greek Letters Like (lower case) πΎ variable (non-terminal) of CFG, page 20 πΏ a definition of primary function (abstract syntax category), see Fig. 3.1, page 11 πΏ the consuming transition function of a finite control automaton, page 7 π an instantaneous description of a finite control automaton, see Def. 2.3, page 6 π initial instantaneous description of a finite control automaton, page 6 π auxiliary function definition (abstract syntax category), see Fig. 3.4, page 14 π the π -transition function of a finite control automaton, page 7 π derivation rule of CFG, page 20 π tree rewrite rule, page 8 π A terminal of a CFG, or the special symbol $, page 20 π a letter in alphabet Ξ£ , page 7 π class name (abstract syntax category), see Fig. 3.1, page 11 π name of primary function (abstract syntax category), see Fig. 3.1, page 11 π terminal of CFG, page 20 π term in set Ξ β³ , page 8 π type pattern, i.e., ungrounded type (abstract syntax category), see Fig. 3.2, page 12 π sentential form, i.e., a sequence of terminals and variables of a CFG, π β ( Ξ£ βͺ Ξ ) β , page 20 π pseudo expression, an expression whose type is ungrounded (abstract syntax category),see Fig. 3.4, page 14 π auxiliary function name, drawn from set Ξ¦ (abstract syntax category), see Fig. 3.4, page 14 πΈ a string of symbols drawn from alphabet Ξ , page 32 πΈ entire contents of auxiliary storage, page 6 πΈ sequence of CFG variables, πΈ β Ξ β , page 20 πΈ initial contents of auxiliary storage, page 6 π multi-term, π , . . . , π π , π determined by context, page 10 π multi-type pattern, i.e., multi ungrounded type (abstract syntax category), see Fig. 3.2,page 12 πΊ degenerate tree, also denoting a leaf in any tree in Ξ β³ , page 8 πΊ designated stack symbol denoting the bottom of the stack, page 32 πΊ start symbol of CFG, page 20 oseph (Yossi) Gil and Ori Roth πΊ the unit type (terminal of abstract syntax), see Fig. 3.1, page 11 π multi-pseudo expression (abstract syntax category), see Fig. 
3.4, page 14 π the empty string, page 6 π the single value of the unit type (terminal of abstract syntax), see Fig. 3.1, page 11
5. Other
Depth ( π‘ ) depth of tree π‘ β Ξ β³ , Depth ( πΊ ) =
0, page 8Depth ( π ) depth of pattern π β Ξ β³ , page 9Depth ( π ) depth of term π‘ β Ξ β³ , Depth ( π₯ ) =
0, page 9Vars ( π ) set of variables in rewrite π , page 8Vars ( π ) set of variables in term π , page 8 β₯ the error type (terminal of abstract syntax), see Fig. 3.1, page 11Fling a fluent API generator contributed by G&R , page 3 Fluent intermediate language used in the implementation of
TypelevelLR , page 2
TypelevelLR a fluent API generator due to Yamazaki, Nakamaru, Ichikawa and Chiba [2019], page 2

B FLUENT API: FROM PRACTICE TO THEORY

An application programming interface (API) provides the means to interact with an application via a computer program. For example, using a file system API we can open, read, and close files from within C code:

    open();  // Open file
    read();  // Read line
    read();  // Read another line
    close(); // Close file

An API is accompanied by a protocol of use, defining rules for good API practice. A protocol is usually conveyed in internal and external documentation, leaving its enforcement to the programmer. For instance, a typical file system API protocol disallows read() to be called before open(), and close() to be called twice in a row. Although breaking the protocol may result in erroneous run time behavior, it nonetheless yields coherent, runnable programs.

With object oriented programming (OOP), functions (methods) are defined within classes. (Strictly speaking, we need only "object based" programming, which admits classes and objects, but no class inheritance.) To invoke a method, it must be sent as a message to an object of the defining class. Methods of an OO fluent API yield objects that accept other API methods:

Listing B.1 Fluent file system API implemented in Java

    class ClosedFile {
        OpenedFile open() {...}
    }
    class OpenedFile {
        OpenedFile read() {...}
        ClosedFile close() {...}
    }

In this OO file system API there are two classes, ClosedFile and OpenedFile. Every API call returns either an object of class ClosedFile or an object of class OpenedFile, and thus may immediately be followed by a successive API call:

Listing B.2 Chain of fluent API method calls

    closedFile.open().read().read().close();

This expression conducts multiple API calls: Invoking open on a ClosedFile object yields an OpenedFile object. Calling read on the OpenedFile yields itself, but a close invocation returns a ClosedFile.

The main advantage of fluent APIs is their ability to enforce a protocol at compile time: The object returned from API call $\sigma_i()$ is missing method $\sigma()$ if calling $\sigma$ at that location (as the next call, $\sigma_{i+1}$) breaks the protocol. Consider, for instance, finishing the method chain of List. B.2 with a second close() call, thereby breaking the file system protocol, which forbids double closing: This call fails at compile time, raising a compilation error, as the first close call returns a ClosedFile object, defined in List. B.1, which lacks a close method.

Fluent APIs grew in fame due to their application to domain specific languages (DSLs). In contrast to general purpose programming languages, such as Java and C++, DSLs employ syntax and semantics designed for a specific component. Structured Query Language (SQL), for example, is a DSL for writing database queries. To make use of an application in a general software library, its DSL has to be replaced by an API. Making the API fluent is then ideal: it makes it possible to embed DSL programs in code as chains of method calls that preserve and enforce the original syntax of the DSL. Additional details on DSLs and fluent APIs may be found in [Gil and Roth 2019].
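The compile-time enforcement described for the file system API can be tried out with a small runnable variant of List. B.1. This is only a sketch: the `log` field, the constructors, and the `FluentFileDemo` class are illustrative additions that do not appear in the original listing, and the method bodies are one possible filling of the elided `{...}` parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runnable variant of the two-class API of List. B.1.
// The log of performed calls is an illustrative addition.
final class ClosedFile {
    final List<String> log;
    ClosedFile() { this(new ArrayList<>()); }
    ClosedFile(List<String> log) { this.log = log; }
    // The only method available on a closed file is open().
    OpenedFile open() { log.add("open"); return new OpenedFile(log); }
}

final class OpenedFile {
    final List<String> log;
    OpenedFile(List<String> log) { this.log = log; }
    // Reading keeps the file opened, so the same type is returned.
    OpenedFile read() { log.add("read"); return this; }
    // Closing moves back to the type that offers open() only.
    ClosedFile close() { log.add("close"); return new ClosedFile(log); }
}

class FluentFileDemo {
    public static void main(String[] args) {
        ClosedFile f = new ClosedFile().open().read().read().close();
        System.out.println(f.log); // prints [open, read, read, close]
        // f.close();               // rejected by the compiler: ClosedFile has no close()
        // new ClosedFile().read(); // rejected by the compiler: ClosedFile has no read()
    }
}
```

Uncommenting either of the last two lines turns a protocol violation into a compilation error, which is exactly the property the regular protocol needs; no generics are required at this level.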
A protocol or a DSL may be described by a formal language $\ell$: Then, the fluent API problem is to compile $\ell$ into a fluent API that enforces the protocol. The fluent API problem is parameterized by the complexity of the input language, and the capabilities of the host type system. The file system protocol, for instance, is described by a regular expression, $(\text{open} \cdot \text{read}^* \cdot \text{close})^*$, and therefore defines a regular language. Given a class of formal languages $\mathcal{L}$, we seek a minimal set of type system features required to embed $\mathcal{L}$ languages.

As many programming languages and DSLs are not regular, practical interest lies with stronger language classes. A popular approach is to use parametric polymorphism, yet another common OOP feature. A fixed number of polymorphic classes define an infinite number of types (A, A<A>, A<A<A>>, ...): Intuitively, these types can be used to simulate an unbounded storage, required to accept non-regular languages.

Consider, for example, the following Java definitions: With these definitions, an expression of
Listing B.3
Fluent stack API implemented in Java using (monadic) polymorphism class Empty { Stack
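The idea that a fixed set of polymorphic classes yields unboundedly many types, and that these types can act as storage, can be illustrated with the following independent sketch. It is not the original listing: the `Depth` interface, the `depth()` method, the constructor, and the `FluentStackDemo` class are hypothetical names introduced here for illustration, and the method bodies are assumptions.

```java
// Hypothetical sketch of a monadic polymorphic stack API.
// Depth and depth() exist only to make the type nesting observable at run time.
interface Depth { int depth(); }

final class Empty implements Depth {
    // Empty offers push() but no pop(): popping an empty stack cannot be written.
    Stack<Empty> push() { return new Stack<>(this); }
    public int depth() { return 0; }
}

final class Stack<T extends Depth> implements Depth {
    private final T below;
    Stack(T below) { this.below = below; }
    // Each push wraps the current type in one more Stack<...> layer.
    Stack<Stack<T>> push() { return new Stack<>(this); }
    // Each pop peels one layer off, exposing the type parameter.
    T pop() { return below; }
    public int depth() { return 1 + below.depth(); }
}

class FluentStackDemo {
    public static void main(String[] args) {
        Empty balanced = new Empty().push().push().pop().pop();
        Stack<Stack<Empty>> pending = new Empty().push().push();
        System.out.println(balanced.depth() + " " + pending.depth()); // prints 0 2
        // new Empty().push().pop().pop(); // rejected by the compiler: Empty has no pop()
    }
}
```

The static type of a call chain records how many pushes are still unmatched, so a chain with more pops than pushes is rejected at compile time. A finite set of non-generic classes, as in the file system example, cannot express this, since the number of pending pushes is unbounded.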
C PROOFS

C.1 Proof of Thm. 4.2
Recall that $a^n b^n c^n \in$ CSL, and that DCFL $\subseteq$ CSL. We show that $a^n b^n c^n \in$ deep-Ppp. The details are in List. C.1, which employs Java syntax to show a set of definitions that recognizes the language $a^n b^n c^n$.
Listing C.1
Definitions in type system deep- Pp p (using Java syntax) for the language π π π π π π interface πΎ // Type after reading π π is πΎ π’ π , π’ π > interface πΎ // Type after reading π π π π is πΎ π’ π β π , π’ π > interface πΎ // Type after reading π π π π π π is πΎ π’ π β π > static πΎ // chain start static
2. The first argument of πΎ π encountered. The second argument remains however unchangedduring these encounters. β’ This second argument is then passed to generic πΎ π is encountered. It is thendecremented for each π encountered. β’ Function end type-checks only if this argument is π’ . oseph (Yossi) Gil and Ori Roth C.2 Proof of Thm. 4.3
The Java definitions in List. C.2 realize the language $a^n b^n c^n \in$ CSL.
Listing C.2
Definitions in type system non-linear- Pp p (using Java syntax) for the language π π π π π π interface πΎ // Type after reading π π is πΎ < π’ π ,π’ ,π’ > πΎ // No phase change: increment the first type argument πΎ // First π seen: change phase, and increment second argument } interface πΎ // Type after reading π π π π is πΎ < π’ π ,π’ π ,π’ > πΎ // No phase change: increment the second type argument πΎ // First π seen: change phase, and increment third argument } interface πΎ // Type after reading π π π π π π is πΎ < π’ π ,π’ π ,π’ π > πΎ // No phase change: increment the third type argument } static πΎ // Start with type πΎ < π’ ,π’ ,π’ > static
Given is a fluent program π = ΞΞ π . We construct from the definitions Ξ and Ξ deep- DPDAautomaton π΄ . Let π = π.π . Β· Β· Β· .π π . Then, π΄ accepts π€ = π Β· Β· Β· π π if and only if π is type-correct.The construction maintains the invariant that after π΄ consumes π π and conducting all (if any) sub-sequent π -transitions, its stack contents encodes π‘ π , the type of the partial expression π = π.π . Β· Β· Β· .π π .Concretely, since Fluent is a monadic type system, π‘ π must be in the (full) form πΎ ( πΎ (Β· Β· Β· πΎ π ( πΊ ) Β· Β· Β· )) .The stack encoding of π‘ π is πΎ πΎ Β· Β· Β· πΎ π πΊ , i.e., the monadic abbreviation of the full form augmentedwith a designated symbol πΊ for denoting the stackβs bottom. For this reason, the set of stack symbolsof π΄ includes a symbol πΎ for every type name used in Ξ βͺ Ξ , and the extra symbol πΊ .The set of internal states of π΄ includes an initial and accepting state π . The automaton will be instate π initially, and then whenever it exhausted all possible π -transitions after consuming a letter,and is ready to consume the next input symbol. Also, π΄ has an internal (not-accepting) state π π for every auxiliary function name π used in Ξ . These states are used while executing π -transitions,which emulate the resolution of the rudimentary typeof clauses allowed in Fluent .As in the proof of Thm. 5.2, the rudimentary-typeof property of the type systems makes itpossible to classify any function definition in Ξ βͺ Ξ as either direct , if its type signature is π β π β² ,or as forwarding , in case it is π β typeof π β² .π .Every Fluent function is encoded in one (consuming- or π -) transition item of π΄ . In this encoding,the function type signature uniquely determines the stack rewrite rule π , but unlike in the proof ofThm. 
5.2, π is not identical to the type signature.To see why, recall first that since Fluent is monadic , we can write any term π as πΈ π₯ where πΈ β Ξ β (in the case π is a proper term) or as πΈ (in the case it is a grounded). If a functionβs type is πΈ π₯ β πΈ β² ,then to maintain the invariant, π΄ needs to push the string πΈ β² πΊ to stack after emptying it, by poppingfirst the πΈ fixed portion, and then the π₯ variable portion which may of unbounded length . Alas, this π₯ portion cannot be cleared with the single stack rewrite allowed in the single transition encoding a Fluent function. ies between Type Systems and Automata For this reason, we use instead a stack rewrite π = πΈ π₯ β πΈ β² πΊ π₯ in this case, i.e., emulating stackemptying by pushing another copy of πΊ , the bottom of the stack symbol. Automaton π΄ is obliviousto the trick, since none of the rewrites in its transitions of removes a πΊ symbol off the stack.With the definition of π ( π β π β² ) by π ( π β π β² ) =  πΈ π₯ β πΈ β² π₯ if π = πΈ π₯ and π β² = πΈ β² π₯ πΈ πΊ β πΈ β² πΊ if π = πΈ and π β² = πΈ β² πΈ π₯ β πΈ β² πΊ π₯ if π = πΈ π₯ and π β² = πΈ β² (C.1)we can describe the transition encoding of each of the four kinds of functions that may occur in π .(1) Primary function definitions , found in Ξ , are encoded as consuming transitions of π΄ :(a) Direct definition π : π β π β² as β¨ π, π , π ( π β π β² ) , π β© ,(b) Forwarding definition π : π β typeof π β² .π as β¨ π, π , π ( π β π β² ) , π π β© .(2) Auxiliary function definitions , found in Ξ , are encoded as π transitions of π΄ :(a) Direct defintion π : π β π β² as β¨ π π , π ( π β π β² ) , π β© .(b) Forwarding definition π : π β typeof π β² .π β² as β¨ π π , π ( π β π β² ) , π π β² β© .We can now verify that automaton π΄ iteratively computes the type of the word-encoded inputexpression: Consuming transitions correspond to type checking of primary function invocation,while π -transitions make the detour required to compute the type 
of functions defined by a typeof clause. If the input expression fails type checking, then automaton π΄ hangs (whereby rejecting theinput), due to failure to find an appropriate transition for the current stack contents, internal state(and the current input symbol, when appropriate). C.4 Proof of Thm. 5.4
We present a set of full-typeof-Fluent definitions that encodes the language π€ π€ β CSL.
Listing C.3
C++, full-typeof-Fluent program recognizing the CSL π€ π€ struct E {}; // Bottom type template
A>>>> , type E is the bottom type (C++ indentifiers π and π stand for $ , which terminate all expressions.Function $ first traverses the first π€ of π€ π€ , while replacing types A and B with calls to match_a and match_b respectively. Upon reaching type S , encoding $ encodes the second π€ as atype, and reverses it; then functions match_a and match_b proceed to match the words in the correctorder. For example, expression $(a(b(s(a(b() Β· Β· Β· ) changes first into match_a(match_b($(s(a(b() Β· Β· Β· ) ,and then into match_a(match_b(B>)) ; next the match functions match π and then π , and return thebottom type E , successfully terminating the typing process. If the word before the reverse . Function reverse appends the currenttype A (resp. B ) to the end of the type, recursively, using function append2end_a ( append2end_b ). Function append2end_a examines its argument A
C++program encoding the Turing machine of Fig. 5.1 template
Here we demonstrate Thm. 6.1 and its proof, by constructing a fluent API library for palindromes inan Ada like type system, i.e., a type system with eventually-one-type style of overloading resolutions. oseph (Yossi) Gil and Ori Roth Consider the formal language of even length palindromes over alphabet { π, π } , as defined by thefollowing context free grammar πΊ β π πΊ π β π πΊ π β π. (D.1)It is well known that the language (D.1) is not-deterministic yet unambiguous. Rewriting itsgrammar in Greibach normal form gives πΊ β ππΎ β ππΎ πΎ β ππΎ πΎ β ππΎ πΎ β ππΎ β ππΎ πΎ β ππΎ πΎ β ππΎ β ππΎ β π. (D.2)Applying the construction in the proof of Thm. 6.1 to the grammar (D.2) gives the program inList. D.2, that realizes a fluent API for (D.1). Listing D.2
Definitions in type system β¨ monadic, eventually-one-type β© (using Java-like syntax)encoding the language of even lengthed palindromes interface π { πΎ πΎ } interface πΎ πΎ πΎ πΎ πΎ T a(); // Java error, overloaded functions cannot differ only by return type } interface πΎ πΎ πΎ πΎ πΎ T b(); // Java error, overloaded functions cannot differ only by return type } interface πΎ T a(); } interface πΎ T b(); } interface $ { void $(); } new π ().a().a().b().b().a().a().$(); Note that even though the program in the listing uses Java syntax, it would not provide thedesired result if compiled by a Java compiler. The reason is that Java does not permit multiple typesfor sub-expressions.Expression new π ().a().a().b().b().a().a() in List. D.2 is phrased as ππππππ βwith this prefix,the center of the word (denoted by β Β· β), separating π€ from π€ π , can be in three places: πππ Β· πππ ,in case π€ = πππ , πππππ Β· π , in case π€ = πππππ , or ππππππ Β· , in case π€ = πππππππ β . These three ies between Type Systems and Automata possibilities correspond to three types deduced for the expression. Yet, when reaching method $() ,the type checker settles the ambiguity to the favor of the first option, as only after reading π€π€ π type $ with method $() is returned. As there is exactly one way to type the entire expression, typechecking is successful. D.3 On the Complexity of Deep Polyadic Parametric Polymorphism
We take particular interest in type system deep-Ppp, since it models generic non-method functions. Also, this type system might be applicable to the software engineering applications mentioned in Sect. 7.

We do not know the exact complexity class of deep-Ppp, but here are a few comments and observations that might be useful towards characterizing it.

(1) A tree automaton with ε-transitions is even more potent than a two-pushdown automaton, which is equivalent to a Turing machine. This equivalence does not hold for the tree automaton in question, which is real-time.

(2) A direct comparison of our real-time (and hence linear-time) tree automata to real-time (or linear-time) Turing machines is not possible, since an elementary operation of tree automata may involve transformations of trees whose size may be exponential.

(3) We can still describe an emulation of the computation of a real-time Turing machine (RTM, see Table 2.1 above) by a deep tree automaton, by breaking the machine's tape into two stacks and storing these stacks as branches of the same tree; hence RTM ⊆ deep-TA. Let RTM_k be an RTM equipped with k ≥ = RTM from RTM , showing | π π π | β RTM . Subsequently, Bruda and Akl [1999] generalized Rabin's result to any number of tapes, showing that RTM_k ⊊ RTM_{k+1}, for all k ≥ 1. Extending the tree automaton emulation of RTMs to run concurrently on any (fixed) number of tapes, we obtain that the entire non-collapsing hierarchy of RTMs is contained in deep-TA, i.e., that RTM_k ⊆ deep-TA for all k ≥ 1.

(4) deep-TA ⊆ CSL.

(5) A hint to the complexity of class deep-Ppp may be found in the fact that it is closed under finite intersection and finite union. (The proof is by merging the respective tree automata, running their rewrites in tandem on two distinct branches of the same tree. The merged automaton recognizes the intersection if both branches accept; it recognizes the union if at least one of the branches accepts.)

(6) On the other hand, we claim that deep-TA is not closed under complement (equivalently, set difference): Consider (yet again) the language π π π π π π ∈ deep-TA. If there were an automaton that recognizes the complement of the language, it should accept the word ππππ, but reject its prefix πππ. Alas, a stateless automaton such as ours can only reject by reaching a configuration where there are no further legal transitions, and hence cannot recover from the rejection of this prefix.

We are however able to show that stateful-