Combinators and the Story of Computation

Abstract

We discuss the role of combinators in the development of the modern conception of computation over the course of the past century. We describe how ideas about formalism and mathematical logic led to the introduction of combinators in 1920 as an extension of the discovery of Nand as a basis for basic logic. We then discuss how combinators informed lambda calculus and symbolic computation, and their relationship to the development of practical computation. We finally describe recent views of combinators in terms of the computational universe of possible programs, and a recent approach to the fundamental theory of physics.


Stephen Wolfram

A version of this document with immediately executable code is available at https://writings.stephenwolfram.com/2020/12/combinators-and-the-story-of-computation/

The Abstract Representation of Things

“In principle you could use combinators,” some footnote might say. But the implication tends to be “But you probably don’t want to.” And, yes, combinators are deeply abstract, and in many ways hard to understand. But tracing their history over the hundred years since they were invented, I’ve come to realize just how critical they’ve actually been to the development of our modern conception of computation, and indeed to my own contributions to it.

The idea of representing things in a formal, symbolic way has a long history. In antiquity there was Aristotle’s logic and Euclid’s geometry. Later there was algebra, and in the 1800s Boolean algebra. Each of these was a formal system that allowed one to make deductions purely within the system. But each, in a sense, ultimately viewed itself as being set up to model something specific. Logic was for modeling the structure of arguments, Euclid’s geometry the properties of space, algebra the properties of numbers; Boolean algebra aspired to model the “laws of thought”.

But was there perhaps some more general and fundamental infrastructure: some kind of abstract system that could ultimately model or represent anything? Today we understand that’s what computation is. And it’s becoming clear that the modern conception of computation is one of the single most powerful ideas in all of intellectual history, one whose implications are only just beginning to unfold.

But how did we finally get to it? Combinators had an important role to play, woven into a complex tapestry of ideas stretching across more than a century.

The main part of the story begins in the 1800s. By then, mathematics had developed a more and more elaborate formal structure that seemed to be reaching ever further. But what really was mathematics? Was it a formal way of describing the world, or was it something else, perhaps something that could exist without any reference to the world?

Developments like non-Euclidean geometry, group theory and transfinite numbers made it seem as if meaningful mathematics could indeed be done just by positing abstract axioms from scratch and then following a process of deduction. But could all of mathematics actually just be a story of deduction, perhaps even ultimately derivable from something seemingly lower level, like logic?

But if so, what would things like numbers and arithmetic be? Somehow they would have to be “constructed out of pure logic”. Today we would recognize these efforts as “writing programs” for numbers and arithmetic in a “machine code” based on certain “instructions of logic”. But back then, everything about this and the ideas around it had to be invented.

What Is Mathematics, and Logic, Made Of?

Before one could really dig into the idea of “building mathematics from logic” one had to have ways to “write mathematics” and “write logic”. At first, everything was just words and ordinary language. But by the end of the 1600s mathematical notation like +, =, > had been established. For a while new concepts (like Boolean algebra) tended to just piggyback on existing notation. By the end of the 1800s, however, there was a clear need to extend and generalize how one wrote mathematics.

In addition to algebraic variables like x, there was the notion of symbolic functions f, as in f(x). In logic, there had long been the idea of letters (p, q, ...) standing for propositions (“it is raining now”). But now there needed to be notation for quantifiers (“for all x such-and-such”, or “there exists x such that ...”). In addition, in analogy to symbolic functions in mathematics, there were symbolic logical predicates: not just explicit statements like x > y but also ones like p(x, y) for symbolic p.
The first full effort to set up the necessary notation and come up with an actual scheme for constructing arithmetic from logic was Gottlob Frege’s 1879 Begriffsschrift (“concept script”). And, yes, it was not so easy to read, or to typeset, and at first it didn’t make much of an impression. But the notation got more streamlined with Giuseppe Peano’s Formulario project in the 1890s, which wasn’t so concerned with starting from logic as starting from some specified set of axioms (the “Peano axioms”).

And then in 1910 Alfred Whitehead and Bertrand Russell began publishing their Principia Mathematica, which pretty much by its sheer weight and ambition (and notwithstanding what I would today consider grotesque errors of language design) popularized the possibility of building up “the complexity of mathematics” from “the simplicity of logic”.

It was one thing to try to represent the content of mathematics, but there was also the question of representing the infrastructure and processes of mathematics. Let’s say one picks some axioms. How can one know if they’re consistent? What’s involved in proving everything one can prove from them?

In the 1890s David Hilbert began to develop ideas about this, particularly in the context of tightening up the formalism of Euclid’s geometry and its axioms. And after Principia Mathematica, Hilbert turned more seriously to the use of logic-based ideas to develop “metamathematics”, notably leading to the formulation of things like the “decision problem” (Entscheidungsproblem) of asking whether, given an axiom system, there’s a definite procedure to prove or disprove any statement with respect to it.

But while connections between logic and mathematics were of great interest to people concerned with the philosophy of mathematics, a more obviously mathematical development was universal algebra, in which axioms for different areas of mathematics were specified just by giving appropriate algebraic-like relations. (As it happens, universal algebra was launched under that name by the book A Treatise on Universal Algebra by Alfred Whitehead, later of Principia Mathematica fame.)

But there was one area where ideas about algebra and logic intersected: the tightening up of Boolean algebra, and in particular the finding of simpler foundations for it. Logic had pretty much always been formulated in terms of And, Or and Not. But in 1913 Henry Sheffer, attempting to simplify Principia Mathematica, showed that just Nand (or Nor) was sufficient. (It turned out that Charles Peirce had already noted the same thing in the 1880s.)

So that established that the notation of logic could be made basically as simple as one could imagine. But what about its actual structure, and axioms? Sheffer talked about needing five “algebra-style” axioms. But by going to axioms based on logical inferences Jean Nicod managed in 1917 to get it down to just one axiom. (And, as it happens, I finally finished the job in 2000 by finding the very simplest “algebra-style” axioms for logic: the single axiom ((p·q)·r)·(p·((p·r)·p)) = r.)

The big question had in a sense been “What is mathematics ultimately made of?”. Well, now it was known that ordinary propositional logic could be built up from very simple elements. So what about the other things used in mathematics, like functions and predicates? Was there a simple way of building these up too?

People like Frege, Whitehead and Russell had all been concerned with constructing specific things (like sets or numbers) that would have immediate mathematical meaning. But Hilbert’s work in the late 1910s began to highlight the idea of looking instead at metamathematics and the “mechanism of mathematics”, and in effect at how the pure symbolic infrastructure of mathematics fits together (through proofs, etc.), independent of any immediate “external” mathematical meaning.

Much as Aristotle and subsequent logicians had used (propositional) logic to define a “symbolic structure” for arguments, independent of their subject matter, so too did Hilbert’s program imagine a general “symbolic structure” for mathematics, independent of particular mathematical subject matter.

And this is what finally set the stage for the invention of combinators.

Combinators Arrive

We don’t know how long it took Moses Schönfinkel to come up with combinators. From what we know of his personal history, it could have been as long as a decade. But it could also have been as short as a few weeks.

There’s no advanced math or advanced logic involved in defining combinators. But to drill through the layers of technical detail of mathematical logic to realize that it’s even conceivable that everything can be defined in terms of them is a supreme achievement of a kind of abstract reductionism.

There is much we don’t know about Schönfinkel as a person. But the paper he wrote on the basis of his December 1920 talk in which he introduced combinators is extremely clear.

The paper is entitled “On the Building Blocks of Mathematical Logic” (in the original German, “Über die Bausteine der mathematischen Logik”). In other words, its goal is to talk about “atoms” from which mathematical logic can be built. Schönfinkel explains that it’s “in the spirit of” Hilbert’s axiomatic method to build everything from as few notions as possible; then he says that what he wants to do is to “seek out those notions from which we shall best be able to construct all other notions of the branch of science in question”.

His first step is to explain that Hilbert, Whitehead, Russell and Frege all set up mathematical logic in terms of standard And, Or, Not, etc. connectives, but that Sheffer had recently been able to show that just a single connective (indicated by a stroke “|”, and what we would now call Nand) was sufficient.

But in addition to the “content” of these relations, I think Schönfinkel was trying to communicate by example something else: that all these logical connectives can ultimately be thought of just as examples of “abstract symbolic structures” with a certain “function of arguments” (i.e. f[x, y]) form.

The next couple of paragraphs talk about how the quantifiers “for all” (∀) and “there exists” (∃) can also be simplified in terms of the Sheffer stroke (i.e. Nand).
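Sheffer’s observation that a single connective suffices is easy to check by brute force. What follows is a minimal sketch in Python, chosen here only for illustration (the text itself uses Wolfram Language notation). The helper names nand, not_, and_ and or_ are hypothetical and do not come from Sheffer’s or Schönfinkel’s papers. It writes Not, And and Or using nothing but Nand and compares each with the ordinary connective on every truth assignment.

```python
from itertools import product


def nand(p, q):
    """The Sheffer stroke: false only when both inputs are true."""
    return not (p and q)


# Each standard connective, written using nand alone.
def not_(p):
    return nand(p, p)


def and_(p, q):
    return nand(nand(p, q), nand(p, q))


def or_(p, q):
    return nand(nand(p, p), nand(q, q))


# Compare against Python's built-in connectives on all four assignments.
for p, q in product([False, True], repeat=2):
    assert not_(p) == (not p)
    assert and_(p, q) == (p and q)
    assert or_(p, q) == (p or q)
```

The loop covers all four combinations of truth values, so for propositional connectives this exhaustive check amounts to a proof.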

But then comes the rallying cry: “The successes that we have encountered thus far ... encourage us to attempt further progress.” And then he’s ready for the big idea, which he explains “at first glance certainly appears extremely bold”. He proposes to “eliminate by suitable reduction the remaining fundamental concepts of proposition, function and variable”.

He explains that this only makes sense for “arbitrary, logically general propositions”, or, as we’d say now, for purely symbolic constructs without specific meanings yet assigned. In other words, his goal is to create a general framework for operating on arbitrary symbolic expressions independent of their interpretation.

He explains that this is valuable both from a “methodological point of view” in achieving “the greatest possible conceptual uniformity”, but also from a certain philosophical or perhaps aesthetic point of view.

And in a sense what he was explaining, back in 1920, was something that’s been a core part of the computational language design that I’ve done for decades: that everything can be represented as a symbolic expression, and that there’s tremendous value to this kind of uniformity.

But as a “language designer” Schönfinkel was an ultimate minimalist. He wanted to get rid of as many notions as possible, and in particular he didn’t want variables, which he explained were “nothing but tokens that characterize certain argument places and operators as belonging together”; “mere auxiliary notions”.

Today we have all sorts of mathematical notation that’s at least somewhat “variable free” (think coordinate-free notation, category theory, etc.). But in 1920 mathematics as it was written was full of variables. And it needed a serious idea to see how to get rid of them. And that’s where Schönfinkel starts to go “even more symbolic”.

He explains that he’s going to make a kind of “functional calculus” (Funktionalkalkül). He says that normally functions just define a certain correspondence between the domain of their arguments, and the domain of their values. But he says he’s going to generalize that, and allow (“disembodied”) functions to appear as arguments and values of functions. In other words, he’s inventing what we’d now call higher-order functions, where functions can operate “symbolically” on other functions.

In the context of traditional calculus-and-algebra-style mathematics it’s a bizarre idea. But really it’s an idea about computation and computational structures, one that’s more abstract and ultimately much more general than the mathematical objectives that inspired it.

But back to Schönfinkel’s paper. His next step is to explain that once functions can have other functions as arguments, functions only ever need to take a single argument. In modern (Wolfram Language) notation he says that you never need f[x, y]; you can always do everything with f[x][y].

In something of a sleight of hand, he sets up his notation so that fxyz (which might look like a function of three arguments f[x, y, z]) actually means (((fx)y)z) (i.e. f[x][y][z]). (In other words, somewhat confusingly with respect to modern standard functional notation, he takes function application to be left associative.)

Again, it’s a bizarre idea, though actually Frege had had a similar idea many years earlier (and now the idea is usually called currying, after Haskell Curry, who we’ll be talking about later). But with his “functional calculus” set up, and all functions needing to take only one argument, Schönfinkel is ready for his big result.

He’s effectively going to argue that by combining a small set of particular functions he can construct any possible symbolic function, or at least anything needed for predicate logic. He calls them a “sequence of particular functions of a very general nature”. Initially there are five of them: the identity function (Identitätsfunktion) I, the constancy function (Konstanzfunktion) C (which we now call K), the interchange function (Vertauschungsfunktion) T, the composition function (Zusammensetzungsfunktion) Z, and the fusion function (Verschmelzungsfunktion) S.

And then he’s off and running defining what we now call combinators. The definitions look simple and direct. But to get to them Schönfinkel effectively had to cut away all sorts of conceptual baggage that had come with the historical development of logic and mathematics.

Even talking about the identity combinator isn’t completely straightforward.

Schönfinkel carefully explains that in I x = x, equality is direct symbolic or structural equality, or as he puts it “the equal sign is not to be taken to represent logical equivalence as it is ordinarily defined in the propositional calculus of logic but signifies that the expressions on the left and on the right mean the same thing, that is, that the function value I x is always the same as the argument value x, whatever we may substitute for x.” He then adds parenthetically, “Thus, for instance, I I would be equal to I”. And, yes, to someone used to the mathematical idea that a function takes values like numbers, and gives back numbers, this is a bit mind-blowing.

Next he explains the constancy combinator, that he called C (even though the German word for it starts with K), and that we now call K. He says “let us assume that the argument value is again arbitrary without restriction, while, regardless of what this value is, the function value will always be the fixed value a”. And when he says “arbitrary” he really means it: it’s not just a number or something; it’s what we would now think of as any symbolic expression.

First he writes (C a) y = a, i.e. the value of the “constancy function C a operating on any y is a”, then he says to “let a be variable too”, and defines (C x) y = x or C x y = x. Helpfully, almost as if he were writing computer documentation, he adds: “In practical applications C serves to permit the introduction of a quantity x as a ‘blind’ variable.”

Then he’s on to T. In modern notation the definition is T[f][x][y] = f[y][x] (i.e. T is essentially ReverseApplied). (He wrote the definition as (Tϕ)xy = ϕyx, explaining that the parentheses can be omitted.)
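The three definitions so far translate almost directly into curried, one-argument functions. The following is a rough sketch in Python rather than the Wolfram Language notation used in the text, with f[x][y] written as f(x)(y). The function subtract is a hypothetical example, introduced only to show the argument swap performed by T.

```python
def I(x):
    """Identity: I x = x."""
    return x


def K(x):
    """Constancy (Schönfinkel's C): (K x) y = x."""
    return lambda y: x


def T(f):
    """Interchange: T f x y = f y x."""
    return lambda x: lambda y: f(y)(x)


def subtract(x):
    """A two-argument function taken one argument at a time."""
    return lambda y: x - y


print(I(I) is I)                # True: "I I would be equal to I"
print(K("a")("anything else"))  # a
print(T(subtract)(2)(10))       # 8, i.e. subtract(10)(2)
```

Because every function takes exactly one argument and returns either a value or another function, nothing beyond ordinary function application is needed, which is the point of Schönfinkel’s “functional calculus”.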
He justifies the idea of T by saying that “The function T makes it possible to alter the order of the terms of an expression, and in this way it compensates to a certain extent for the lack of a commutative law.”

Next comes the composition combinator Z. He explains that “In [mathematical] analysis, as is well known, we speak loosely of a ‘function of a function’...”, by which he meant that it was pretty common then (and now) to write something like f(g(x)). But then he “went symbolic”, and defined a composition function that could symbolically act on any two functions f and g: Z[f][g][x] = f[g[x]]. He explains that Z allows one to “shift parentheses” in an expression: i.e. whatever the objects in an expression might be, Z allows one to transform [][][] to [[]] etc. But in case this might have seemed too abstract and symbolic, he then attempted to explain in a more “algebraic” way that the effect of Z is “somewhat like that of the associative law” (though, he added, the actual associative law is not satisfied).

Finally comes the pièce de résistance: the S combinator (that Schönfinkel calls the “fusion function”). He doesn’t take too long to define it. He basically says: consider (fx)(gx) (i.e. f[x][g[x]]). This is really just “a function of x”. But what function? It’s not a composition of f and g; he calls it a “fusion”, and he defines the S combinator to create it: S[f][g][x] = f[x][g[x]].

It’s pretty clear Schönfinkel knew this kind of “symbolic gymnastics” would be hard for people to understand. He continues: “It will be advisable to make this function more intelligible by means of a practical example.” He says to take fxy (i.e. f[x][y]) to be log_x y (i.e. Log[x, y]), and gz (i.e. g[z]) to be 1 + z. Then Sfgx = (fx)(gx) = log_x (1 + x) (i.e. S[f][g][x] = f[x][g[x]] = Log[x, 1 + x]). And, OK, it’s not obvious why one would want to do that, and I’m not rushing to make S a built-in function in the Wolfram Language.

But Schönfinkel explains that for him “the practical use of the function S will be to enable us to reduce the number of occurrences of a variable—and to some extent also of a particular function—from several to a single one”.

Setting up everything in terms of five basic objects I, C (now K), T, Z and S might already seem impressive and minimalist enough. But Schönfinkel realized that he could go even further.

First, he says that actually I = SCC (or, in modern notation, s[k][k]). In other words, s[k][k][x] for symbolic x is just equal to x (since s[k][k][x] becomes k[x][k[x]] by using the definition of S, and this becomes x by using the definition of C). He notes that this particular reduction was communicated to him by a certain Alfred Boskowitz (who we know to have been a student at the time); he says that Paul Bernays (who was more of a colleague) had “some time before” noted that I = (SC)(CC) (i.e. s[k][k[k]]). Today, of course, we can use a computer to just enumerate all possible combinator expressions of a particular size, and find what the smallest reduction is. But in Schönfinkel’s day, it would have been more like solving a puzzle by hand.

Schönfinkel goes on, and proves that Z can also be reduced: Z = S(CS)C (i.e. s[k[s]][k]). And, yes, a very simple Wolfram Language program can verify in a few milliseconds that that is the simplest form.

OK, what about T? Schönfinkel gives a series of reduction steps to prove that T = S(ZZS)(CC) (i.e. s[s[k[s]][k][s[k[s]][k]][s]][k[k]]). But is this the simplest possible form for T? Well, no.
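The reductions of I, Z and T to S and K can be checked mechanically, and for the two smaller ones the “simplest form” can be found by exhaustive search. The sketch below is in Python rather than the Wolfram Language the author used, and it is not his program. The representation (nested pairs for application), the helper names app, step, normalize, terms and smallest, and the choice to measure size as the number of S and K symbols are all assumptions made for this illustration.

```python
from functools import reduce
from itertools import product


def app(*terms):
    """Left-associative application: app(f, x, y) is ((f x) y)."""
    return reduce(lambda f, x: (f, x), terms)


def step(t):
    """Apply one leftmost-outermost rewrite; return (new_term, changed)."""
    if isinstance(t, str):
        return t, False
    f, x = t
    if isinstance(f, tuple):
        if f[0] == "K":                       # K a b -> a
            return f[1], True
        if isinstance(f[0], tuple) and f[0][0] == "S":
            a, b = f[0][1], f[1]              # S a b c -> a c (b c)
            return ((a, x), (b, x)), True
    f2, changed = step(f)
    if changed:
        return (f2, x), True
    x2, changed = step(x)
    return (f, x2), changed


def normalize(t, max_steps=500):
    """Rewrite until nothing changes; None if the step budget runs out."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    return None


def terms(n):
    """Yield every expression built from exactly n copies of S and K."""
    if n == 1:
        yield from ("S", "K")
        return
    for i in range(1, n):
        yield from product(list(terms(i)), list(terms(n - i)))


def smallest(args, target, max_size=5):
    """Smallest S/K expressions c such that c applied to args gives target."""
    for n in range(1, max_size + 1):
        hits = [c for c in terms(n) if normalize(app(c, *args)) == target]
        if hits:
            return n, hits
    return None


S, K = "S", "K"
I = app(S, K, K)                     # I = S K K
Z = app(S, app(K, S), K)             # Z = S (K S) K
T = app(S, app(Z, Z, S), app(K, K))  # T = S (Z Z S) (K K)

assert normalize(app(I, "x")) == "x"
assert normalize(app(Z, "f", "g", "x")) == ("f", ("g", "x"))
assert normalize(app(T, "f", "x", "y")) == app("f", "y", "x")

print(smallest(["x"], "x")[0])                            # 3
print(smallest(["f", "g", "x"], ("f", ("g", "x")))[0])    # 4
```

With size counted this way, no expression smaller than S K K acts as the identity on a symbolic x, and none smaller than S (K S) K composes f and g. The same search could in principle be pushed to larger sizes to look for shorter forms of T, but the number of candidates grows quickly and this sketch makes no attempt to do that efficiently.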

But (with the very straightforward Wolfram Language program I wrote) it did take my modern computer a number of minutes to determine what the simplest form is. The answer is that it isn’t as large as Schönfinkel’s, but rather has size 9. Actually, there are six cases of size 9 that all work: s[s[k[s]][s[k[k]][s]]][k[k]] (S(S(KS)(S(KK)S))(KK)) and five others. And, yes, it takes a few steps of reduction to prove that they work (the other size-9 cases S(SSK(K(SS(KK))))S, S(S(K(S(KS)K))S)(KK), S(K(S(S(KS)K)(KK)))S, S(K(SS(KK)))(S(KK)S), S(K(S(K(SS(KK)))K))S all have more complicated reductions).

But, OK, what did Schönfinkel want to do with these objects he’d constructed? As the title of his paper suggests, he wanted to use them as building blocks for mathematical logic. He begins: “Let us now apply our results to a special case, that of the calculus of logic in which the basic elements are individuals and the functions are propositional functions.” I consider this sentence significant. Schönfinkel didn’t have a way to express it (the concept of universal computation hadn’t been invented yet), but he seems to have realized that what he’d done was quite general, and went even beyond being able to represent a particular kind of logic.

Still, he went on to give his example. He’d explained at the beginning of the paper that the quantifiers we now call ∀ and ∃ could both be represented in terms of a kind of “quantified Nand” that he wrote |^x. But now he wanted to “combinator-ify” everything. So he introduced a new combinator U, and defined it to represent his “quantified Nand”: U f g = f x |^x g x (he called U the “incompatibility function”, an interesting linguistic description of Nand).

“It is a remarkable fact”, he says, “that every formula of logic can now be expressed by means... solely of C, S and U.” So he’s saying that any expression from mathematical logic can be written out as some combinator expression in terms of S, C (now K) and U. He says that when there are quantifiers like “for all x ...” it’s always possible to use combinators to get rid of the “bound variables” x, etc. He says that he “will not give the complete demonstration here”, but rather content himself with an example. (Unfortunately, for reasons of the trajectory of his life that are still quite unclear, he never published his “complete demonstration”.)

But, OK, so what had he achieved? He’d basically shown that any expression that might appear in predicate logic (with logical connectives, quantifiers, variables, etc.) could be reduced to an expression purely in terms of the combinators S, C (now K) and U.

Did he need the U? Not really.

But he had to have some way to represent the object with mathematical or logical “meaning” on which his combinators would be acting. Today the obvious thing to do would be to have a representation for true and false. And what’s more, to represent these purely in terms of combinators. For example, if we took K to represent true, and SK (s[k]) to represent false, then And can be represented as SSK (s[s][k]), Or as S(SS)S(SK) (s[s[s]][s][s[k]]) and Nand as S(S(K(S(SS(K(KK))))))S (s[s[k[s[s[s][k[k[k]]]]]]][s]).

Schönfinkel got amazingly far in reducing everything to his “building blocks”. But, yes, he missed this final step.

But given that he’d managed to reduce everything to S, C and U he figured he should try to go further. So he considered an object J that would be a single building block of S and C: JJ = S and J(JJ) = C.

With S and K one can just point to any piece of an expression and see if it reduces. With J it’s a bit more complicated. In modern Wolfram Language terms one can state the rules as {j[j][x_][y_][z_] → x[z][y[z]], j[j[j]][x_][y_] → x} (where order matters) but to apply these requires pattern matching “clusters of J’s” rather than just looking at single S’s and K’s at a time.

But even though, as Schönfinkel observed, this “final reduction” to J didn’t work out, getting everything down to S and K was already amazing. At the beginning of the paper, Schönfinkel had described his objectives. And then he says “It seems to me remarkable in the extreme that the goal we have just set can be realized also; as it happens, it can be done by a reduction to three fundamental signs.” (The paper does say three fundamental signs, presumably counting U as well as S and K.)

I’m sure Schönfinkel expected that to reproduce all the richness of mathematical logic he’d need quite an elaborate set of building blocks. And certainly people like Frege, Whitehead and Russell had used what were eventually very complicated setups.
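As a concrete illustration of how little machinery is involved, the truth-value encoding mentioned earlier (K for true, S K for false, with And, Or and Nand as the stated combinator expressions) can be checked by direct reduction. This is a sketch in Python under the same kind of assumptions as any such illustration: the nested-pair representation and the helper names app, step, normalize and encode are hypothetical, and only the S and K rewrite rules and the combinator expressions themselves come from the text.

```python
from functools import reduce
from itertools import product


def app(*terms):
    """Left-associative application: app(f, x, y) is ((f x) y)."""
    return reduce(lambda f, x: (f, x), terms)


def step(t):
    """Apply one leftmost-outermost rewrite; return (new_term, changed)."""
    if isinstance(t, str):
        return t, False
    f, x = t
    if isinstance(f, tuple):
        if f[0] == "K":                       # K a b -> a
            return f[1], True
        if isinstance(f[0], tuple) and f[0][0] == "S":
            a, b = f[0][1], f[1]              # S a b c -> a c (b c)
            return ((a, x), (b, x)), True
    f2, changed = step(f)
    if changed:
        return (f2, x), True
    x2, changed = step(x)
    return (f, x2), changed


def normalize(t, max_steps=500):
    """Rewrite until nothing changes; None if the step budget runs out."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    return None


S, K = "S", "K"
TRUE, FALSE = K, app(S, K)                    # K and S K
AND = app(S, S, K)                            # S S K
OR = app(S, app(S, S), S, app(S, K))          # S (S S) S (S K)
# S (S (K (S (S S (K (K K)))))) S
NAND = app(S, app(S, app(K, app(S, app(S, S, app(K, app(K, K)))))), S)


def encode(b):
    """Map a Python bool to its combinator representation."""
    return TRUE if b else FALSE


# Check every connective on all four combinations of truth values.
for p, q in product([False, True], repeat=2):
    a, b = encode(p), encode(q)
    assert normalize(app(AND, a, b)) == encode(p and q)
    assert normalize(app(OR, a, b)) == encode(p or q)
    assert normalize(app(NAND, a, b)) == encode(not (p and q))
```

Each connective is applied in prefix form, so And applied to a and b is the expression S S K a b, and its normal form is exactly K or S K.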

Schönfinkel managed to cut through all the complexity to show that simple building blocks were all that was needed. But then he found something else: that actually just two building blocks (S and K) were enough.

In modern terms, we’d say that Schönfinkel managed to construct a system capable of universal computation. And that’s amazing in itself. But even more amazing is that he found he could do it with such a simple setup.

I’m sure Schönfinkel was extremely surprised. And here I personally feel a certain commonality with him. Because in my own explorations of the computational universe, what I’ve found over and over again is that it takes only remarkably simple systems to be capable of highly complex behavior, and of universal computation. And even after exploring the computational universe for four decades, I’m still continually surprised at just how simple the systems can be.

For me, this has turned into a general principle (the Principle of Computational Equivalence) and a whole conceptual framework around it. Schönfinkel didn’t have anything like that to think in terms of. But he was in a sense a good enough scientist that he still managed to discover what he discovered, something that many decades later we can see fits in as another piece of evidence for the Principle of Computational Equivalence.

Looking at Schönfinkel’s paper a century later, it’s remarkable not only for what it discovers, but also for the clarity and simplicity with which it is presented. A little of the notation is now dated (and of course the original paper is written in German, which is no longer the kind of leading language of scholarship it once was). But for the most part, the paper still seems perfectly modern. Except, of course, that now it could be couched in terms of symbolic expressions and computation, rather than mathematical logic.

What Is Their Mathematics?

Combinators are hard to understand, and it’s not clear how many people understood them when they were ﬁ rst introduced—let alone understood their implications. It’s not a good sign that when Schön ﬁ nkel’s paper appeared in the person who helped prepare it for ﬁ nal publication ( Heinrich B e h m a n n ) added his own three paragraphs at the end, that were quite confused. And

Schön ﬁ nkel’s sole other published paper —coauthored with Paul

Bernays even mention combinators, even though they could have very profitably been used to discuss the subject at hand (decision problems in mathematical logic).

But then combinators (if not perhaps Schönfinkel’s recognition for them) had a remarkable piece of good fortune. Schönfinkel’s paper was discovered by a certain Haskell Curry, who would then devote many decades to studying what he named “combinators”, and to spreading the word about them.

At some level I think one can view the main thrust of what Curry and his disciples did with combinators as an effort to “mathematicize” them. Schönfinkel had presented combinators in a rather straightforward “structural” way. But what was the mathematical interpretation of what he did, and of how combinators work in general? What mathematical formalism could capture Schönfinkel’s structural idea of substitution? Just what, for example, was the true notion of equality for combinators?

In the end, combinators are fundamentally computational constructs, full of all the phenomena of “unbridled computation”, like undecidability and computational irreducibility. And it’s inevitable that mathematics as normally conceived can only go so far in “cracking” them.

But back then the concept and power of computation was not yet understood, and it was assumed that the ideas and tools of mathematics would be the ones to use in analyzing a formal system like combinators.

And it wasn’t that mathematical methods got absolutely nowhere with combinators. Unlike cellular automata, or even Turing machines, there’s a certain immediate structural complexity to combinators, with their elaborate tree structures, equivalences and so on. And so there was progress to be made (and years of work to be done) in untangling this, without having to face the raw features of full-scale computation, like computational irreducibility.

In the end, combinators are full of computational irreducibility. But they also have layers of computational reducibility, some of which are aligned with the kinds of things mathematics and mathematical logic have been set up to handle. And in this there’s a curious resonance with our recent Physics Project.

In our models based on hypergraph rewriting there’s also a kind of bedrock of computational irreducibility. But as with combinators, there’s a certain immediate structural complexity to what our models do. And there are layers of computational reducibility associated with this. But the remarkable thing with our models is that some of those layers, and the formalisms one can build to understand them, have an immediate interpretation: they are basically the core theories of twentieth-century physics, namely general relativity and quantum mechanics.

Combinators work sufficiently differently that they don’t immediately align with that kind of interpretation. But it’s still true that one of the important properties discovered in combinators (namely confluence, related to our idea of causal invariance) turns out to be crucial to our models, their correspondence with physics, and in the end our whole ability to perceive regularity in the universe, even in the face of computational irreducibility.

But let’s get back to the story of combinators as it played out after Schönfinkel’s paper. Schönfinkel had basically set things up in a novel, very direct, structural way. But Curry wanted to connect with more traditional ideas in mathematical logic, and mathematics in general. And after a first paper, which pretty much just recorded his first thoughts and his efforts to understand what Schönfinkel had done, Curry then started to do things like formulate axioms for combinators, hoping to prove general theorems about mathematical properties like equality. Without an understanding of universal computation, and of how combinators relate to it, it wasn’t yet clear how complicated it might ultimately be to deal with combinators.

Curry pushed forward, publishing more papers and trying to do things like define set theory using his axioms for combinators. But then disaster struck. It wasn’t something about computation or undecidability; instead it was that Stephen Kleene and J. Barkley Rosser showed that the axioms Curry had come up with to try and “tighten up Schönfinkel” were just plain inconsistent. To Kleene and Rosser it provided more evidence of the need for Russell’s (originally quite hacky) idea of types, and it led them to more complicated axiom systems, and away from combinators. But Curry was undeterred. He revised his axiom system and continued, ultimately for many decades, to see what could be proved about combinators and things like them using mathematical methods.

But already there were bigger things afoot around mathematical logic, which would soon intersect with combinators.

Gödel’s Theorem and Computability

How should one represent the fundamental constructs of mathematics? Back then nobody thought seriously about using combinators. And instead there were basically three “big brands”: Principia Mathematica, set theory and Hilbert’s program.

Relations were being found, details were being filled in, and issues were being found. But there was a general sense that progress was being made. Quite where the boundaries might lie wasn’t clear.

For example, could one specify a way to “construct any function” from lower-level primitives? The basic idea of recursion was very old (think: Fibonacci). And there was by this point a fairly well-formalized notion of “primitive recursion”, in which functions always found their values from earlier values. But could all “mathematical” functions be constructed this way? It became known that this wouldn’t work: the Ackermann function was a reasonable “mathematical” function, but it wasn’t primitive recursive. It meant that definitions had to be generalized (e.g. to “general recursive functions” that didn’t just look back at earlier values, but could “look forward until...” as well). But there didn’t seem to be any fundamental problem with the idea that mathematics could just “mechanistically” be built out forever from appropriate primitives.

But then came Gödel’s theorem. There’d been a long tradition of identifying paradoxes and inconsistencies, and finding ways to patch them by changing axioms. But Gödel’s theorem was based on Peano’s by-then-standard axioms for arithmetic (branded by Gödel as a fragment of Principia Mathematica). And it showed there was a fundamental problem.

In essence, Gödel took the paradoxical statement “this statement is unprovable” and showed that it could be expressed purely as a statement of arithmetic: roughly a statement about the existence of solutions to appropriate integer equations. And basically what Gödel had to do to achieve this was to create a “compiler” capable of compiling things like “this statement is unprovable” into arithmetic.

In his paper one can basically see him building up different capabilities (e.g. representing arbitrary expressions as numbers through Gödel numbering, checking conditions using general recursion, etc.), eventually getting to a “high enough level” to represent the statement he wanted.

What did Gödel’s theorem mean? For the foundations of mathematics it meant that the idea of mechanically proving “all true theorems of mathematics” wasn’t going to work. Because it showed that there was at least one statement that by its own admission couldn’t be proved, but was still a “statement about arithmetic”, in the sense that it could be “compiled into arithmetic”.

That was a big deal for the foundations of mathematics. But actually there was something much more significant about Gödel’s theorem, even though it wasn’t recognized at the time. Gödel had used the primitives of number theory and logic to build what amounted to a computational system, in which one could take things like “this statement is unprovable”, and “run them in arithmetic”.

What Gödel had, though, wasn’t exactly a streamlined general system (after all, it only really needed to handle one statement). But the immediate question then was: if there’s a problem with this statement in arithmetic, what about Hilbert’s general “decision problem” (Entscheidungsproblem) for any axiom system?

To discuss the “general decision problem”, though, one needed some kind of general notion of how one could decide things. What ultimate primitives should one use?

Schönfinkel (with Paul Bernays), in his sole other published paper, wrote about a restricted case of the decision problem, but doesn’t seem to have had the idea of using combinators to study it.

Gödel, meanwhile, was talking about general recursiveness (i.e. definability through general recursion). And Alonzo Church and Stephen Kleene were introducing λ definability. Then Alan Turing introduced Turing machines. All these approaches involved setting up certain primitives, then showing that a large class of things could be “compiled” to those primitives, and then showing (in effect by thinking about having the system compile itself) that Hilbert’s Entscheidungsproblem couldn’t be solved.

Perhaps no single result along these lines would have been so significant. But it was soon established that all three kinds of systems were exactly equivalent: the set of computations they could represent was the same, as established by showing that one system could emulate another. And from that discovery eventually emerged the modern notion of universal computation, and all its implications for technology and science.

In the early days, though, there was actually a fourth equivalent kind of system, based on string rewriting, that had been invented by Emil Post. Oh, and then there were combinators.

Lambda Calculus

What was the right “language” to use for setting up mathematical logic? There’d been gradual improvement since the complexities of Principia Mathematica. But Alonzo Church wanted a new and cleaner setup. And he needed to have a way (as Frege and Principia Mathematica had done before him) to represent “pure functions”. And that’s how he came to invent λ.

Today in the Wolfram Language we have Function[x, f[x]] or x ↦ f[x] (or various shorthands). Church originally had λx[M].

But what’s perhaps most notable is that on the very first page where he defines λ, he’s referencing Schönfinkel’s combinator paper. (Well, specifically, he’s referencing it because he wants to use the device Schönfinkel invented that we now call currying, f[x][y] in place of f[x, y], though ironically he doesn’t mention Curry.) In his paper (apparently based on earlier work) λ is almost a sideshow, the main event being the introduction of formal postulates for mathematical logic.

By the next year J. Barkley Rosser is trying to retool Curry’s “combinatory logic” with combinators of his own, and showing how they correspond to lambda expressions.

Then lambda calculus has its big “coming out” in Church’s “An Unsolvable Problem of Elementary Number Theory”, in which he introduces the idea that any “effectively calculable” function should be “λ definable”, then defines integers in terms of λ’s (“Church numerals”), and then shows that the problem of determining equivalence for λ expressions is undecidable.

Very soon thereafter Turing publishes his “On Computable Numbers, with an Application to the Entscheidungsproblem”, in which he introduces his much more manifestly mechanistic Turing machine model of computation. In the main part of the paper there are no lambdas (or combinators) to be seen. But Turing then went to Princeton to be a student with Church, and added a note showing the correspondence between his Turing machines and Church’s lambda calculus.

By the next year, when Turing is writing his rather abstruse “Systems of Logic Based on Ordinals”, he’s using lambda calculus all over the place.

Early in the document he writes I → λx[x], and soon he’s mixing lambdas and combinators with wild abandon. In fact he’d already published a one-page paper that introduced the fixed-point combinator Θ (and, yes, the K in the title refers to Schönfinkel’s K combinator).

When Church summarized the state of lambda calculus in his “The Calculi of Lambda-Conversion” he again made extensive use of combinators. Schönfinkel’s K is prominent. But Schönfinkel’s S is nowhere to be seen, and in fact Church has his own S combinator, S[n][f][x] → f[n[f][x]], which implements successors in Church’s numeral system.
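To make the successor rule concrete, here is a minimal sketch in Python (a stand-in chosen for this illustration, not the notation Church or the Wolfram Language uses). It assumes the usual reading of a Church numeral n as "apply f to x n times"; the names `zero`, `succ` and `to_int` are hypothetical helpers introduced only for this example.

```python
# A Church numeral n is modeled as a curried function: n(f)(x) applies f to x n times.

# Zero applies f no times at all.
zero = lambda f: lambda x: x

# Successor, following the rule S[n][f][x] -> f[n[f][x]]:
# apply f once more on top of whatever n already does.
succ = lambda n: lambda f: lambda x: f(n(f)(x))


def to_int(numeral):
    """Decode a Church numeral by counting how many times it applies f."""
    return numeral(lambda k: k + 1)(0)


three = succ(succ(succ(zero)))
print(to_int(three))        # prints 3
print(to_int(succ(three)))  # prints 4
```

The decoding step is only for inspection: inside the calculus a numeral is nothing but a function, and `succ` never looks at an integer.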

And he also has a few other “basic combinators” that he routinely uses.

In the end, combinators and lambda calculus are completely equivalent, and it’s quite easy to convert between them. But there’s a curious tradeoff. In lambda calculus one names variables, which is good for human readability, but can lead to problems at a formal level. In combinators, things are formally much cleaner, but the expressions one gets can be completely incomprehensible to humans.

The point is that in a lambda expression like λx λy x[y] one’s naming the variables (here x and y), but really these names are just placeholders: what they are doesn’t matter; they’re just showing where different arguments go. And in a simple case like this, everything is fine. But what happens if one substitutes for y another lambda expression, say λx f[x]? What is that x? Is it the same x as the one outside, or something different? In practice, there are all sorts of renaming schemes that can be used, but they tend to be quite hacky, and things can quickly get tangled up. And if one wants to make formal proofs about lambda calculus, this can potentially be a big problem, and indeed at the beginning it wasn’t clear it wouldn’t derail the whole idea of lambda calculus.

And that’s part of why the correspondence between lambda calculus and combinators was important. With combinators there are no variables, and so no variable names to get tangled up. So if one can show that something can be converted to combinators (even if one never looks at the potentially very long and ugly combinator expression that’s generated) one knows one’s safe from issues about variable names.

There are still plenty of other complicated issues, though. Prominent among them are questions about when combinator expressions can be considered equal. Let’s say you have a combinator expression, like s[s[s[s][k]]][k]. Well, you can repeatedly apply the rules for combinators to transform and reduce it.
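The process of repeatedly applying the combinator rules can be sketched in a few lines of Python. This is only an illustrative model, assuming the standard rules s[x][y][z] → x[z][y[z]] and k[x][y] → x. The nested-pair representation and the helper names `app`, `step`, `normalize` and `show` are assumptions made for this example, and each rewrite is applied at the leftmost-outermost position, which is just one possible choice of order. The test expression is the one this essay later applies to a, b and c.

```python
# A term is either a string ("s", "k", or an inert symbol such as "a")
# or a pair (f, x) meaning "f applied to x", i.e. f[x].

def app(*terms):
    """Left-associated application: app(f, a, b) stands for f[a][b]."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result


def step(term):
    """Do one rewrite at the leftmost-outermost position; return None if no rule applies."""
    # Unwind the application spine so that term == head[args[0]][args[1]]...
    args = []
    head = term
    while isinstance(head, tuple):
        head, arg = head
        args.append(arg)
    args.reverse()
    if head == "k" and len(args) >= 2:   # k[x][y] -> x
        return app(args[0], *args[2:])
    if head == "s" and len(args) >= 3:   # s[x][y][z] -> x[z][y[z]]
        x, y, z = args[:3]
        return app(x, z, (y, z), *args[3:])
    # No rule applies at the head, so look inside the arguments, left to right.
    for i, a in enumerate(args):
        reduced = step(a)
        if reduced is not None:
            return app(head, *args[:i], reduced, *args[i + 1:])
    return None


def normalize(term, max_steps=1000):
    """Rewrite until no rule applies, giving up after max_steps rewrites."""
    for _ in range(max_steps):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no fixed point reached within max_steps")


def show(term):
    """Render a term in the bracket notation used in the text."""
    if isinstance(term, tuple):
        return f"{show(term[0])}[{show(term[1])}]"
    return term


# s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]
A = app("s", ("k", "s"), app("s", ("k", "k"), app("s", ("k", "s"), "k")))
B = app("s", ("k", ("s", app("s", "k", "k"))), "k")
example = app("s", A, B, "a", "b", "c")
print(show(normalize(example)))  # prints a[b[a][c]]
```

The `max_steps` guard is there because nothing guarantees that a given expression ever stops changing; in this sketch, running out of steps is reported as an error rather than looping forever.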
And it’ll often end up at a fixed point, where no rules apply anymore. But a basic question is whether it matters in which order the rules are applied. And Church and Rosser proved it doesn’t. Actually, what they specifically proved was the analogous result for lambda calculus. They drew a picture to indicate different possible orders in which lambdas could be reduced out, and showed it didn’t matter which path one takes.

This all might seem like a detail. But it turns out that generalizations of their result apply to all sorts of systems. In doing computations (or automatically proving theorems) it’s all about “it doesn’t matter what path you take; you’ll always get the same result”. And that’s important. But recently there’s been another important application that’s shown up. It turns out that a generalization of the “Church–Rosser property” is what we call causal invariance in our Physics Project.

And it’s causal invariance that leads in our models to relativistic invariance, general covariance, objective reality in quantum mechanics, and other central features of physics.

Practical Computation

In retrospect, one of the great achievements of this period was the inception of what ended up being the idea of universal computation. But at the time what was done was couched in terms of mathematical logic, and it was far from obvious that any of the theoretical structures being built would have any real application beyond thinking about the foundations of mathematics. But even as people like Hilbert were talking in theoretical terms about the mechanization of mathematics, more and more there were actual machines being built for doing mathematical calculations.

We know that even in antiquity (at least one) simple gear-based mechanical calculational device existed. In the mid-1600s arithmetic calculators started being constructed, and they later came into widespread use. At first they were mechanical, but in time most were electromechanical, and there started to be systems where units for carrying out different arithmetic operations could be chained together. And eventually fairly elaborate such systems based on electronics were being built.

Long before, Charles Babbage had imagined an “analytical engine” which could do different operations depending on a “program” specified by punch cards, and Ada Lovelace had realized that such a machine had broad “computational” potential. But a century had passed and nothing like this was connected to the theoretical developments that were going on, and the actual engineering of computational systems was done without any particular overarching theoretical framework.

Still, as electronic devices got more complicated and scientific interest in psychology intensified, something else happened: there started to be the idea (sometimes associated with the name cybernetics) that somehow electronics might reproduce how things like brains work. In the mid-1930s Claude Shannon had shown that Boolean algebra could represent how switching circuits work, and later Warren McCulloch and Walter Pitts proposed a model of idealized neural networks formulated in something close to mathematical logic terms.

Meanwhile by the mid-1940s John von Neumann, who had worked extensively on mathematical logic, had started suggesting math-like specifications for practical electronic computers, including the way their programs might be stored electronically. At first he made lots of brain-like references to “organs” and “inhibitory connections”, and essentially no mention of ideas from mathematical logic. But later von Neumann was talking at least conceptually about connections to Gödel’s theorem and Turing machines, Alan Turing had become involved with actual electronic computers, and there was the beginning of widespread understanding of the notion of general-purpose computers and universal computation.

Then there was an explosion of interest in what would now be called the theory of computation, and great optimism about its relevance to artificial intelligence. There was all sorts of “interdisciplinary work” on fairly “concrete” models of computation, like finite automata, Turing machines, cellular automata and idealized neural networks.

More “abstract” approaches, like recursive functions, lambda calculus and combinators, remained, however, pretty much restricted to researchers in mathematical logic.

When early programming languages started to appear, thinking about practical computers began to become a bit more abstract. It was understood that the grammars of languages could be specified recursively, and actual recursion (of functions being able to call themselves) just snuck into the specification of ALGOL. But what about the structures on which programs operated? Most of the concentration was on arrays (sometimes rather elegantly, as in APL) and, occasionally, character strings.

But a notable exception was LISP, described in John McCarthy’s paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” (Part II was not written). There was lots of optimism about AI at the time, and the idea was to create a language to “implement AI”, and do things like “mechanical theorem proving”. A key idea, which McCarthy described as being based on “recursive function formalism”, was to have tree-structured symbolic expressions (“S expressions”). (In the original paper, what’s now Wolfram Language–style f[g[x]] “M expression” notation, complete with square brackets, was used as part of the specification, but the quintessential-LISP-like (f (g x)) notation won out when LISP was actually implemented.)

An issue in LISP was how to take “expressions” (which were viewed as representing things) and turn them into functions (which do things). And the basic plan was to use Church’s idea of λ notation. But when it came time to implement this, there was, of course, trouble with name collisions, which ended up getting handled in quite hacky ways. So did McCarthy know about combinators?

The answer is yes, as his paper shows.

I actually didn’t know until just now that McCarthy had ever even considered combinators, and in the years I knew him I don’t think I ever personally talked to him about them. But it seems that for McCarthy, as for Church, combinators were a kind of “comforting backstop” that ensured that it was OK to use lambdas, and that if things went too badly wrong with variable naming, there was at least in principle always a way to untangle everything.

In the practical development of computers and computer languages, even lambdas (let alone combinators) weren’t really much heard from again (except in a small AI circle) for quite some time. And even then it didn’t help that, in an effort variously to stay close to hardware and to structure programs, there tended to be a desire to give everything a “data type”, which was at odds with the “consume any expression” approach of standard combinators and lambdas. But with the progressive rise of functional programming, lambdas, at least, have steadily gained in visibility and practical application.

What of combinators? Occasionally as a proof of principle there’ll be a hardware system developed that natively implements Schönfinkel’s combinators. Or, particularly in modern times, there’ll be an esoteric language that uses combinators in some kind of purposeful effort at obfuscation. Still, a remarkable cross-section of notable people concerned with the foundations of computing have, at one time or another, taught about combinators or written a paper about them. And in recent years the term “combinator” has become more popular as a way to describe a “purely applicative” function.

But by and large the important ideas that first arose with combinators ended up being absorbed into practical computing by quite circuitous routes, without direct reference to their origins, or to the specific structure of combinators.

Combinators in Culture

For years combinators have mostly been an obscure academic topic, studied particularly in connection with lambda calculus, at borders between theoretical computer science, mathematical logic and to some extent mathematical formalisms like category theory. Much of the work that’s been done can be traced in one way or another to the influence of Haskell Curry or Alonzo Church, particularly through their students, grandstudents, great-grandstudents, etc. Particularly in the early years, most of the work was centered in the US, but later there was a strong migration to Europe and especially the Netherlands.

But even with all their abstractness and obscurity, on a few rare occasions combinators have broken into something closer to the mainstream. One such time was with the popular logic-puzzle book To Mock a Mockingbird, by Raymond Smullyan, a former student of Alonzo Church’s. It begins: “A certain enchanted forest is inhabited by talking birds” and goes on to tell a story that’s basically about combinators “dressed up” as birds calling each other (S is the “starling”, K the “kestrel”), with a convenient “bird who’s who” at the end. The book is dedicated “To the memory of Haskell Curry”, described as “an early pioneer in combinatory logic and an avid bird-watcher”.

And then there’s Y Combinator. The original Y combinator arose out of work that Curry did on the consistency of axiom systems for combinators, and it appeared explicitly in his classic book.

He called it the “paradoxical combinator” because it was recursively defined in a kind of self-referential way analogous to various paradoxes. Its explicit form is SSK(S(K(SS(S(SSK))))K) and its most immediately notable feature is that under Schönfinkel’s combinator transformation rules it never settles down to a particular “value” but just keeps growing forever.

Well, Paul Graham, who had long been an enthusiast of functional programming and LISP, decided to name his new (and now very famous) startup accelerator “Y Combinator”. I remember asking him why he’d called it that. “Because,” he said, “nobody understands the Y combinator”.

Looking in my own archives from that time I find an email I sent to a combinator enthusiast who was working with me, followed by, basically, “Yes our theorem prover can prove the basic property of the Y combinator” (the version from back then sounds so ancient; we’re now just about to release V12.2).

I had another unexpected encounter with combinators last year. I had been given a book that was once owned by Alan Turing, and in it I found a piece of paper that I recognized as being covered with none other than lambdas and combinators (though what appears on it is not the Y combinator).

It took quite a bit of sleuthing (that I wrote extensively about), but I eventually discovered that the piece of paper was written by Turing’s student Robin Gandy. But I never figured out why he was doing combinators....

Designing Symbolic Language

I think I first found out about combinators by seeing Schönfinkel’s original paper in a book called From Frege to Gödel: A Source Book in Mathematical Logic (by a certain Jean van Heijenoort). How Schönfinkel’s paper ended up being in that book is an interesting question, which I’ll write about elsewhere. The spine of my copy of the book has long been broken at the location of Schönfinkel’s paper, and at different times I’ve come back to the paper, always thinking there was more to understand about it.

But why was I even studying things like this back then? I guess in retrospect I can say I was engaged in an activity that goes back to Frege or even Leibniz: I was trying to find a fundamental framework for representing mathematics and beyond. But my goal wasn’t a philosophical one; it was a very practical one: I was trying to build a computer language that could do general computations in mathematics and beyond.

My immediate applications were in physics, and it was from physics that my main methodological experience came. And the result was that, like trying to understand the world in terms of elementary particles, I wanted to understand computation in terms of its most fundamental elements. But I also had lots of practical experience in using computers to do mathematical computation. And I soon developed a theory about how I thought computation could fundamentally be done.

It started from the practical issue of transformations on algebraic expressions (turn sin(2x) into 2 sin(x) cos(x), etc.). But it soon became a general idea: compute by doing transformations on symbolic expressions. Was this going to work? I wanted to understand as fundamentally as possible what computation really was, and from that I was led to its history in mathematical logic. Much of what I saw in books and papers about mathematical logic I found abstruse and steeped in sometimes horrendous notational complexity. But what were these people really doing? It made it much easier that I had a definite theory, against which I could essentially do reductionist science. That stuff in Principia Mathematica? Those ideas about rewriting systems?

Yup, I could see how to represent them as rules for transformations on symbolic expressions.

And so it was that I came to design SMP: “A Symbolic Manipulation Program”, all based on transformation rules for symbolic expressions. It was easy to represent mathematical relations ($x is a pattern variable that would now in the Wolfram Language be x_ on the left-hand side only). Or basic logic. Or, for that matter, predicate logic of the kind Schönfinkel wanted to capture. And, yes, it could emulate a Turing machine (using a tape-as-transformation-rules representation).

But the most important thing I realized is that it really worked to represent basically anything in terms of symbolic expressions, and transformation rules on them. Yes, it was quite often useful to think of “applying functions to things” (and SMP had its version of lambda, for example), but it was much more powerful to think about symbolic expressions as just “being there” (“x doesn’t have to have a value”), like things in the world, with the language being able to define how things should transform.

In retrospect this all seems awfully like the core idea of combinators, but with one important exception: that instead of everything being built from “purely structural elements” with names like S and K, there was a whole collection of “primitive objects” that were intended to have direct understandable meanings (like Plus, Times, etc.). And indeed I saw a large part of my task in language design as being to think about computations one might want to do, and then try to “drill down” to find the “elementary particles” (or primitive objects) from which these computations might be built up.

Over time I’ve come to realize that doing this is less about what one can in principle use to construct computations, and more about making a bridge to the way humans think about things. It’s crucial that there’s an underlying structure, symbolic expressions, that can represent anything. But increasingly I’ve come to realize that what we need from a computational language is to have a way to encapsulate in precise computational form the kinds of things we humans think about, in a way that we humans can understand. And a crucial part of being able to do that is to leverage what has ultimately been at the core of making our whole intellectual development as a species possible: the idea of human language.

Human language has given us a way to talk symbolically about the world: to give symbolic names to things, and then to build things up using these. In designing a computational language the goal is to leverage this: to use what humans already know and understand, but be able to represent it in a precise computational way that is amenable to actual computation that can be done automatically by computer.

It’s probably no coincidence that the tree structure of symbolic expressions that I have found to be such a successful foundation for computational language is a bit like an idealized version of the kind of tree structure (think parse trees or sentence diagramming) that one can view human language as following. There are other ways to set up universal computation, but this is the one that seems to fit most directly with our way of thinking about things.

And, yes, in the end all those symbolic expressions could be constructed like combinators from objects, like S and K, with no direct human meaning. But that would be like having a world without nouns, a world where there’s no name for anything, and the representation of everything has to be built from scratch. But the crucial idea that’s central to human language, and now to computational language, is to be able to have layers of abstraction, where one can name things and then refer to them just by name without having to think about how they’re built up “inside”.

In some sense one can see the goal of people like Frege, and Schönfinkel, as being to “reduce out” what exists in mathematics (or the world) and turn it into something like “pure logic”. And the structural part of that is exactly what makes computational language possible. But in my conception of computational language the whole idea is to have content that relates to the world and the way we humans think about it.

And over the decades I’ve continually been amazed at just how strong and successful the idea of representing things in terms of symbolic expressions and transformations on them is. Underneath everything that’s going on in the Wolfram Language, and in all the many systems that now use it, it’s all ultimately just symbolic expressions being transformed according to particular rules, and reaching fixed points that represent results of computations, just like in those examples in Schönfinkel’s original paper.

One important feature of Schönfinkel’s setup is the idea that one doesn’t just have “functions” like f[x], or even just nested functions, like f[g[x]]. Instead one can have constructs where instead of the “name of a function” (like f) one can have a whole complex symbolic structure. And while this was certainly possible in SMP, not too much was built around it.

But when I came to start designing what’s now the Wolfram Language in 1986 I made sure that the “head” (as I called it) of an expression could itself be an arbitrary expression.

And when Mathematica was first launched in 1988 I was charmed to see more than one person from mathematical logic immediately think of implementing combinators. Make the definitions:

    s[x_][y_][z_] := x[z][y[z]]
    k[x_][y_] := x

Then combinators “just work” (at least if they reach a fixed point):

    s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]

    a[b[a][c]]

But what about the idea of “composite symbolic heads”? Already in SMP I’d used them to do simple things like represent derivatives (and in Wolfram Language f'[x] is Derivative[1][f][x]). But something that’s been interesting to me to see is that as the decades have gone by, more and more gets done with “composite heads”. Sometimes one thinks of them as some kind of nesting of operations, or nesting of modifiers to a symbolic object. But increasingly they end up being a way to represent “higher-order constructs”—in effect things that produce things that produce things etc. that eventually give a concrete object one wants.

I don’t think most of us humans are particularly good at following this kind of chain of abstraction, at least without some kind of “guide rails”. And it’s been interesting for me to see over the years how we’ve been able to progressively build up guide rails for longer and longer chains of abstraction. First there were things like Function, Apply, Map. Then Nest, Fold, FixedPoint, MapThread. But only quite recently NestGraph, FoldPair, SubsetMap, etc. Even from the beginning there were direct “head manipulation” functions like Operate and Through. But unlike more “array-like” operations for list manipulation they’ve been slow to catch on.

In a sense combinators are an ultimate story of “symbolic head manipulation”: everything can get applied to everything before it’s applied to anything. And, yes, it’s very hard to keep track of what’s going on—which is why “named guide rails” are so important, and also why they’re challenging to devise. But it seems as if, as we progressively evolve our understanding, we’re slowly able to get a little further, in effect building towards the kind of structure and power that combinators—in their very non-human-relatable way—first showed us was possible a century ago.

Combinators in the Computational Universe

Combinators were invented for a definite purpose: to provide building blocks, as Schönfinkel put it, for logic.
It was the same kind of thing with other models of what we now know of as computation. All of them were “constructed for a purpose”.

But in the end computation—and programs—are abstract things, that can in principle be studied without reference to any particular purpose. One might have some particular reason to be looking at how fast programs of some kind can run, or what can be proved about them. But what about the analog of pure natural science: of studying what programs just “naturally do”?

At the beginning of the 1980s I got very interested in what one can think of as the “natural science of programs”. My interest originally arose out of a question about ordinary natural science. One of the very noticeable features of the natural world is how much in it seems to us highly complex. But where does this complexity really come from? Through what kind of mechanism does nature produce it? I quickly realized that in trying to address that question, I needed as general a foundation for making models of things as possible. And for that I turned to programs, and began to study just what “programs in the wild” might do.

Ever since the time of Galileo and Newton mathematical equations had been the main way that people ultimately imagined making models of nature. And on the face of it—with their real numbers and continuous character—these seemed quite different from the usual setup for computation, with its discrete elements and discrete choices. But perhaps in part through my own experience in doing mathematics symbolically on computers, I didn’t see a real conflict, and I began to think of programs as a kind of generalization of the traditional approach to modeling in science.

But what kind of programs might nature use? I decided to just start exploring all the possibilities: the whole “computational universe” of programs—starting with the simplest. I came up with a particularly simple setup involving a row of cells with values 0 or 1 updated in parallel based on the values of their neighbors. I soon learned that systems like this had actually been studied under the name “cellular automata” in the 1950s (particularly in 2D) as potential models of computation, though had fallen out of favor mainly through not having seemed very “human programmable”.

My initial assumption was that with simple programs I’d only see simple behavior. But with my cellular automata it was very easy to do actual computer experiments, and to visualize the results. And though in many cases what I saw was simple behavior, I also saw something very surprising: that in some cases—even though the rules were very simple—the behavior that was generated could be immensely complex.

It took me years to come to terms with this phenomenon, and it’s gradually informed the way I think about science, computation and many other things. At first I studied it almost exclusively in cellular automata. I made connections to actual systems in nature that cellular automata could model. I tried to understand what existing mathematical and other methods could say about what I’d seen.
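The row-of-cells setup described above can be sketched in a few lines of Wolfram Language. This is only an illustration: the text does not say which rule was involved, so the specific update rule (rule 30 in the standard numbering), the name caStep, the cyclic boundary and the seven-cell starting row are all assumptions made for the example.

```wolfram
(* One parallel update: each cell is replaced by
   left XOR (center OR right), with cyclic boundaries.
   This is rule 30, chosen here purely as an example. *)
caStep[cells_List] := MapThread[
  BitXor[#1, BitOr[#2, #3]] &,
  {RotateRight[cells], cells, RotateLeft[cells]}]

(* Start from a single 1 in a row of 0s and apply the step three times *)
init = {0, 0, 0, 1, 0, 0, 0};
manual = NestList[caStep, init, 3];

(* The built-in function gives the same rows for this rule and start *)
builtin = CellularAutomaton[30, {{1}, 0}, 3];

same = (manual === builtin)   (* True *)

(* A longer run, shown as a picture of cell values over time *)
ArrayPlot[CellularAutomaton[30, {{1}, 0}, 200]]
```

The hand-written step is there only to show how little the rule contains; the picture from the final line is where the complexity becomes visible.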

And slowly I began to formulate general ideas to explain what was going on—like computational irreducibility and the Principle of Computational Equivalence.

But at the beginning of the 1990s—armed with what would become the Wolfram Language—I decided I should try to see just how the phenomenon I had found in cellular automata would play out in other kinds of computational systems. And my archives record that in April I started looking at combinators. I seem to have come back to them several times, but in a notebook from July (which, yes, still runs just fine), there it is: a randomly chosen combinator made of Schönfinkel’s S’s and K’s starting to show complex behavior.

I seem to have a lot of notebooks that start with the simple combinator definitions—and then start exploring. There are what seem like they could be pages from a “computational naturalist’s field notebook”. Then there are attempts to visualize combinators in the same kind of way as cellular automata.

But the end result was that, yes, like Turing machines, string substitution systems and all the other systems I explored in the computational universe, combinators did exactly the same kinds of things I’d originally discovered in cellular automata. Combinators weren’t just systems that could be set up to do things. Even “in the wild” they could spontaneously do very interesting and complex things.

I included a few pages on what I called “symbolic systems” (essentially lambdas) at the end of my chapter on “The World of Simple Programs” in A New Kind of Science (and, yes, reading particularly the notes again now, I realize there are still many more things to explore...).

Later in the book I talk specifically about Schönfinkel’s combinators in connection with the threshold of computation universality. But before showing examples of what they do, I remark:

“Originally intended as an idealized way to represent structures of functions defined in logic, combinators were actually first introduced in 1920, 16 years before Turing machines.
But although they have been investigated somewhat over the past eighty years, they have for the most part been viewed as rather obscure and irrelevant constructs.”

How “irrelevant” should they be seen as being? Of course it depends on what for. As things to explore in the computational universe, cellular automata have the great advantage of allowing immediate visualization. With combinators it’s a challenge to find any way to translate their behavior at all faithfully into something suitable for human perception. And since the Principle of Computational Equivalence implies that general computational features won’t depend on the particulars of different systems, there’s a tendency to feel that even in studying the computational universe, combinators “aren’t worth the trouble”.

Still, one thing that’s been prominently on display with cellular automata over the years is the idea that any sufficiently simple system will eventually end up being a useful model for something. Mollusc pigmentation. Catalysis processes. Road traffic flow. There are simple cellular automaton models for all of these. What about combinators?
Without good visualization it’s harder to say “that looks like combinator behavior”. And even after 100 years they’re still a bit too unfamiliar. But when it comes to capturing some large-scale expression or tree behavior of some system, I won’t be surprised if combinators are a good fit.

When one looks at the computational universe, one of the important ideas is “mining” it not just for programs that can serve as models for things, but also for programs that are somehow useful for some technological purpose. Yes, one can imagine specifically “compiling” some known program to combinators. But the question is whether “naturally occurring combinators” can somehow be identified as useful for some particular purpose. Could they deliver some new kind of distributed cryptographic protocol? Could they be helpful in mapping out distributed computing systems? Could they serve as a base for setting up molecular-scale computation, say with tree-like molecules? I don’t know. But it will be interesting to find out. And as combinators enter their second century they provide a unique kind of “computational raw material” to mine from the computational universe.

Combinators All the Way Down?

What is the universe fundamentally made of? For a long time the assumption was that it must be described by something fundamentally mathematical. And indeed right around the time combinators were being invented the two great theories of general relativity and quantum mechanics were just developing. And in fact it seemed as if both physics and mathematics were going so well that people like David Hilbert imagined that perhaps both might be completely solved—and that there might be a mathematics-like axiomatic basis for physics that could be “mechanically explored” as he imagined mathematics could be.

But it didn’t work out that way. Gödel’s theorem appeared to shatter the idea of a “complete mechanical exploration” of mathematics. And while there was immense technical progress in working out the consequences of general relativity and quantum mechanics little was discovered about what might lie underneath. Computers (including things like Mathematica) were certainly useful in exploring the existing theories of physics. But physics didn’t show any particular signs of being “fundamentally computational”, and indeed the existing theories seemed structurally not terribly compatible with computational processes.

But as I explored the computational universe and saw just what rich and complex behavior could arise even from very simple rules, I began to wonder whether maybe, far below the level of existing physics, the universe might be fundamentally computational. I began to make specific models in which space and time were formed from an evolving network of discrete points. And I realized that some of the ideas that had arisen in the study of things like combinators and lambda calculus might have direct relevance.

Like combinators (or lambda calculus) my models had the feature that they allowed many possible paths of evolution. And like combinators (or lambda calculus) at least some of my models had the remarkable feature that in some sense it didn’t matter what path one took; the final result would always be the same. For combinators this “Church–Rosser” or “confluence” feature was what allowed one to have a definite fixed point that could be considered the result of a computation.
In my models of the universe that doesn’t just stop—things are a bit more subtle—but the generalization to what I call causal invariance is precisely what leads to relativistic invariance and the validity of general relativity.

For many years my work on fundamental physics languished—a victim of other priorities and the uphill effort of introducing new paradigms into a well-established field. But just over a year ago—with help from two very talented young physicists—I started again, with unexpectedly spectacular results.

I had never been quite satisfied with my idea of everything in the universe being represented as a particular kind of giant graph. But now I imagined that perhaps it was more like a giant symbolic expression, or, specifically, like an expression consisting of a huge collection of relations between elements—in effect, a certain kind of giant hypergraph. It was, in a way, a very combinator-like concept.

At a technical level, it’s not the same as a general combinator expression: it’s basically just a single layer, not a tree. And in fact that’s what seems to allow the physical universe to consist of something that approximates uniform (manifold-like) space, rather than showing some kind of hierarchical tree-like structure everywhere.

But when it comes to the progression of the universe through time, it’s basically just like the transformation of combinator expressions. And what’s become clear is that the existence of different paths—and their ultimate equivalences—is exactly what’s responsible not only for the phenomena of relativity, but also for quantum mechanics. And what’s remarkable is that many of the concepts that were first discovered in the context of combinators and lambda calculus now directly inform the theory of physics. Normal forms (basically fixed points) are related to black holes where “time stops”. Critical pair lemmas are related to measurement in quantum mechanics. And so on.
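The idea that different paths of evolution can end at the same fixed point can be checked by hand on a small combinator expression. The sketch below is an illustration under stated assumptions, not the construction used in the physics models: the names sRule, kRule, expr, pathA and pathB are hypothetical, and the two combinator rules are written as explicit replacement rules (rather than definitions) so that the order in which they are applied can be controlled.

```wolfram
ClearAll[s, k, a, b]

(* The two combinator rules as explicit replacement rules,
   so that nothing is rewritten automatically *)
sRule = s[x_][y_][z_] :> x[z][y[z]];
kRule = k[x_][y_] :> x;

(* An expression with two places where a rule could apply:
   the inner s[k][k][a] and the outer k[...][b] *)
expr = k[s[k][k][a]][b];

(* Path A: rewrite the inner s first, then the k's *)
pathA = FoldList[ReplaceAll, expr, {sRule, kRule, kRule}]

(* Path B: rewrite the outer k first, then s, then k *)
pathB = FoldList[ReplaceAll, expr, {kRule, sRule, kRule}]

(* Both paths end at the same fixed point, which is also what
   repeated replacement with both rules gives *)
ends = {Last[pathA], Last[pathB], expr //. {sRule, kRule}}   (* {a, a, a} *)
```

The intermediate expressions differ (path A passes through k[k[a][k[a]]][b], path B through s[k][k][a]), but both arrive at a. This checks only two paths for one expression; it is not a proof of confluence, and an expression that never reaches a fixed point has no final result to compare.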
In practical computing, and in the creation of computational language, it was the addition of “meaningful names” to the raw structure of combinators that turned them into the powerful symbolic expressions we use. But in understanding the “data structure of the universe” we’re in a sense going back to something much more like “raw combinators”. Because now all those “atoms of space” that make up the universe don’t have meaningful names; they’re more like S’s and K’s in a giant combinator expression, distinct but yet all the same.

In the traditional, mathematical view of physics, there was always some sense that by “appropriately clever mathematics” it would be possible to “figure out what will happen” in any physical system. But once one imagines that physics is fundamentally computational, that’s not what one can expect.

And just like combinators—with their capability for universal computation—can’t in a sense be “cracked” using mathematics, so also that’ll be true of the universe. And indeed in our model that’s what the progress of time is about: it’s the inexorable, irreducible process of computation, associated with the repeated transformation of the symbolic expression that represents the universe.

When Hilbert first imagined that physics could be reduced to mathematics he probably thought that meant that physics could be “solved”. But with Gödel’s theorem—which is a reflection of universal computation—it became clear that mathematics itself couldn’t just be “solved”. But now in effect we have a theory that “reduces physics to mathematics”, and the result of the Gödel’s theorem phenomenon is something very important in our universe: it’s what leads to a meaningful notion of time.

Moses Schönfinkel imagined that with combinators he was finding “building blocks for logic”. And perhaps the very simplicity of what he came up with makes it almost inevitable that it wasn’t just about logic: it was something much more general. Something that can represent computations.
Something that has the germ of how we can represent the “machine code” of the physical universe.

It took in a sense “humanizing” combinators to make them useful for things like computational language whose very purpose is to connect with humans. But there are other places where inevitably we’re dealing with something more like large-scale “combinators in the raw”. Physics is one of them. But there are others. In distributed computing. And perhaps in biology, in economics and in other places.

There are specific issues of whether one’s dealing with trees (like combinators), or hypergraphs (like our model of physics), or something else. But what’s important is that many of the ideas—particularly around what we call multiway systems—show up with combinators. And yes, combinators often aren’t the easiest places for us humans to understand the ideas in. But the remarkable fact is that they exist in combinators—and that combinators are now a century old.

I’m not sure if there’ll ever be a significant area where combinators alone will be the dominant force. But combinators have—for a century—had the essence of many important ideas. Maybe as such they are at some level destined forever to be footnotes. But in a sense they are also seeds or roots—from which remarkable things have grown. And as combinators enter their second century it seems quite certain that there is still much more that will grow from them.

Note: References are given as links in the body of this document.