Combinators and the Story of Computation

Stephen Wolfram *

We discuss the role of combinators in the development of the modern conception of computation over the course of the past century. We describe how ideas about formalism and mathematical logic led to the introduction of combinators in as an extension of the discovery of Nand as a basis for basic logic. We then discuss how combinators informed lambda calculus and symbolic computation, and their relationship to the development of practical computation. We finally describe recent views of combinators in terms of the computational universe of possible programs, and a recent approach to the fundamental theory of physics.

The Abstract Representation of Things

“In principle you could use combinators,” some footnote might say. But the implication tends to be “But you probably don’t want to.” And, yes, combinators are deeply abstract—and in many ways hard to understand. But tracing their history over the hundred years since they were invented,
I’ve come to realize just how critical they’ve actually been to the development of our modern conception of computation—and indeed my own contributions to it.

The idea of representing things in a formal, symbolic way has a long history. In antiquity there was Aristotle’s logic and Euclid’s geometry. By the there was algebra, and in the Boolean algebra. Each of these was a formal system that allowed one to make deductions purely within the system. But each, in a sense, ultimately viewed itself as being set up to model something specific. Logic was for modeling the structure of arguments, Euclid’s geometry the properties of space, algebra the properties of numbers; Boolean algebra aspired to model the “laws of thought”.

But was there perhaps some more general and fundamental infrastructure: some kind of abstract system that could ultimately model or represent anything? Today we understand that’s what computation is. And it’s becoming clear that the modern conception of computation is one of the single most powerful ideas in all of intellectual history—whose implications are only just beginning to unfold.

But how did we finally get to it? Combinators had an important role to play, woven into a complex tapestry of ideas stretching across more than a century.

A version of this document with immediately executable code is available at https://writings.stephenwolfram.com/2020/12/combinators-and-the-story-of-computation/

* Email: [email protected]

The main part of the story begins in the Through the course of the and mathematics had developed a more and more elaborate formal structure that seemed to be reaching ever further. But what really was mathematics? Was it a formal way of describing the world, or was it something else—perhaps something that could exist without any reference to the world?

Developments like non-Euclidean geometry, group theory and transfinite numbers made it seem as if meaningful mathematics could indeed be done just by positing abstract axioms from scratch and then following a process of deduction. But could all of mathematics actually just be a story of deduction, perhaps even ultimately derivable from something seemingly lower level—like logic?

But if so, what would things like numbers and arithmetic be? Somehow they would have to be “constructed out of pure logic”. Today we would recognize these efforts as “writing programs” for numbers and arithmetic in a “machine code” based on certain “instructions of logic”. But back then, everything about this and the ideas around it had to be invented.

What Is Mathematics—and Logic—Made Of?

Before one could really dig into the idea of “building mathematics from logic” one had to have ways to “write mathematics” and “write logic”. At first, everything was just words and ordinary language. But by the end of the mathematical notation like +, =, > had been established. For a while new concepts—like Boolean algebra—tended to just piggyback on existing notation. By the end of the however, there was a clear need to extend and generalize how one wrote mathematics.

In addition to algebraic variables like x, there was the notion of symbolic functions f, as in f(x). In logic, there had long been the idea of letters (p, q, ...) standing for propositions (“it is raining now”). But now there needed to be notation for quantifiers (“for all x such-and-such”, or “there exists x such that ...”). In addition, in analogy to symbolic functions in mathematics, there were symbolic logical predicates: not just explicit statements like x > y but also ones like p(x, y) for symbolic p.
The first full effort to set up the necessary notation and come up with an actual scheme for constructing arithmetic from logic was Gottlob Frege’s Begriffsschrift (“concept script”):

And, yes, it was not so easy to read, or to typeset—and at first it didn’t make much of an impression. But the notation got more streamlined with Giuseppe Peano’s Formulario project in the wasn’t so concerned with starting from logic as starting from some specified set of axioms (the “Peano axioms”):

And then in Alfred Whitehead and Bertrand Russell began publishing their Principia Mathematica—which pretty much by its sheer weight and ambition (and notwithstanding what I would today consider grotesque errors of language design)—popularized the possibility of building up “the complexity of mathematics” from “the simplicity of logic”:

It was one thing to try to represent the content of mathematics, but there was also the question of representing the infrastructure and processes of mathematics. Let’s say one picks some axioms.
How can one know if they’re consistent? What’s involved in proving everything one can prove from them?

In the David Hilbert began to develop ideas about this, particularly in the context of tightening up the formalism of Euclid’s geometry and its axioms. And after Principia Mathematica, Hilbert turned more seriously to the use of logic-based ideas to develop “metamathematics”—notably leading to the formulation of things like the “decision problem” (Entscheidungsproblem) of asking whether, given an axiom system, there’s a definite procedure to prove or disprove any statement with respect to it.

But while connections between logic and mathematics were of great interest to people concerned with the philosophy of mathematics, a more obviously mathematical development was universal algebra—in which axioms for different areas of mathematics were specified just by giving appropriate algebraic-like relations. (As it happens, universal algebra was launched under that name by the book A Treatise on Universal Algebra by Alfred Whitehead, later of Principia Mathematica fame.)
But there was one area where ideas about algebra and logic intersected: the tightening up of Boolean algebra, and in particular the finding of simpler foundations for it. Logic had pretty much always been formulated in terms of And, Or and Not. But in Henry Sheffer—attempting to simplify Principia Mathematica—showed that just Nand (or Nor) were sufficient. (It turned out that Charles Peirce had already noted the same thing in the .)

So that established that the notation of logic could be made basically as simple as one could imagine. But what about its actual structure, and axioms? Sheffer talked about needing five “algebra-style” axioms. But by going to axioms based on logical inferences Jean Nicod managed in to get it down to just one axiom. (And, as it happens, I finally finished the job in by finding the very simplest “algebra-style” axioms for logic—the single axiom: ((p·q)·r)·(p·((p·r)·p)) = r.)

The big question had in a sense been “What is mathematics ultimately made of?”. Well, now it was known that ordinary propositional logic could be built up from very simple elements. So what about the other things used in mathematics—like functions and predicates? Was there a simple way of building these up too?

People like Frege, Whitehead and Russell had all been concerned with constructing specific things—like sets or numbers—that would have immediate mathematical meaning. But Hilbert’s work in the late began to highlight the idea of looking instead at metamathematics and the “mechanism of mathematics”—and in effect at how the pure symbolic infrastructure of mathematics fits together (through proofs, etc.), independent of any immediate “external” mathematical meaning.

Much as Aristotle and subsequent logicians had used (propositional) logic to define a “symbolic structure” for arguments, independent of their subject matter, so too did Hilbert’s program imagine a general “symbolic structure” for mathematics, independent of particular mathematical subject matter.

And this is what finally set the stage for the invention of combinators.

Combinators Arrive

We don’t know how long it took Moses
Schönfinkel to come up with combinators. From what we know of his personal history, it could have been as long as a decade. But it could also have been as short as a few weeks. There’s no advanced math or advanced logic involved in defining combinators. But to drill through the layers of technical detail of mathematical logic to realize that it’s even conceivable that everything can be defined in terms of them is a supreme achievement of a kind of abstract reductionism.

There is much we don’t know about Schönfinkel as a person. But the paper he wrote on the basis of his December talk in which he introduced combinators is extremely clear.

The paper is entitled “On the Building Blocks of Mathematical Logic” (in the original German, “Über die Bausteine der mathematischen Logik”.) In other words, its goal is to talk about “atoms” from which mathematical logic can be built. Schönfinkel explains that it’s “in the spirit of” Hilbert’s axiomatic method to build everything from as few notions as possible; then he says that what he wants to do is to “seek out those notions from which we shall best be able to construct all other notions of the branch of science in question”.

His first step is to explain that Hilbert, Whitehead, Russell and Frege all set up mathematical logic in terms of standard And, Or, Not, etc. connectives—but that Sheffer had recently been able to show that just a single connective (indicated by a stroke “|”—and what we would now call Nand) was sufficient:

But in addition to the “content” of these relations, I think Schönfinkel was trying to communicate by example something else: that all these logical connectives can ultimately be thought of just as examples of “abstract symbolic structures” with a certain “function of arguments” (i.e. f[x, y]) form.

The next couple of paragraphs talk about how the quantifiers “for all” (∀) and “there exists” (∃) can also be simplified in terms of the Sheffer stroke (i.e. Nand). But then comes the rallying cry: “The successes that we have encountered thus far ... encourage us to attempt further progress.” And then he’s ready for the big idea—which he explains “at first glance certainly appears extremely bold”. He proposes to “eliminate by suitable reduction the remaining fundamental concepts of proposition, function and variable”.

He explains that this only makes sense for “arbitrary, logically general propositions”, or, as we’d say now, for purely symbolic constructs without specific meanings yet assigned. In other words, his goal is to create a general framework for operating on arbitrary symbolic expressions independent of their interpretation.

He explains that this is valuable both from a “methodological point of view” in achieving “the greatest possible conceptual uniformity”, but also from a certain philosophical or perhaps aesthetic point of view.

And in a sense what he was explaining—back in something that’s been a core part of the computational language design that I’ve done for the past years: that everything can be represented as a symbolic expression, and that there’s tremendous value to this kind of uniformity.

But as a “language designer” Schönfinkel was an ultimate minimalist. He wanted to get rid of as many notions as possible—and in particular he didn’t want variables, which he explained were “nothing but tokens that characterize certain argument places and operators as belonging together”; “mere auxiliary notions”.

Today we have all sorts of mathematical notation that’s at least somewhat “variable free” (think coordinate-free notation, category theory, etc.) But in mathematics as it was written was full of variables. And it needed a serious idea to see how to get rid of them. And that’s where
Schönfinkel starts to go “even more symbolic”.

He explains that he’s going to make a kind of “functional calculus” (Funktionalkalkül). He says that normally functions just define a certain correspondence between the domain of their arguments, and the domain of their values. But he says he’s going to generalize that—and allow (“disembodied”) functions to appear as arguments and values of functions. In other words, he’s inventing what we’d now call higher-order functions, where functions can operate “symbolically” on other functions.

In the context of traditional calculus-and-algebra-style mathematics it’s a bizarre idea. But really it’s an idea about computation and computational structures—that’s more abstract and ultimately much more general than the mathematical objectives that inspired it.

But back to Schönfinkel’s paper. His next step is to explain that once functions can have other functions as arguments, functions only ever need to take a single argument. In modern (Wolfram Language) notation he says that you never need f[x, y]; you can always do everything with f[x][y].

In something of a sleight of hand, he sets up his notation so that f x y z (which might look like a function of three arguments f[x, y, z]) actually means (((f x) y) z) (i.e. f[x][y][z]). (In other words—somewhat confusingly with respect to modern standard functional notation—he takes function application to be left associative.)

Again, it’s a bizarre idea—though actually Frege had had a similar idea many years earlier (and now the idea is usually called currying, after Haskell Curry, who we’ll be talking about later). But with his “functional calculus” set up, and all functions needing to take only one argument, Schönfinkel is ready for his big result.

He’s effectively going to argue that by combining a small set of particular functions he can construct any possible symbolic function—or at least anything needed for predicate logic. He calls them a “sequence of particular functions of a very general nature”. Initially there are five of them: the identity function (Identitätsfunktion) I, the constancy function (Konstanzfunktion) C (which we now call K), the interchange function (Vertauschungsfunktion) T, the composition function (Zusammensetzungsfunktion) Z, and the fusion function (Verschmelzungsfunktion) S.

And then he’s off and running defining what we now call combinators. The definitions look simple and direct. But to get to them Schönfinkel effectively had to cut away all sorts of conceptual baggage that had come with the historical development of logic and mathematics.

Even talking about the identity combinator isn’t completely straightforward. Schönfinkel carefully explains that in I x = x, equality is direct symbolic or structural equality, or as he puts it “the equal sign is not to be taken to represent logical equivalence as it is ordinarily defined in the propositional calculus of logic but signifies that the expressions on the left and on the right mean the same thing, that is, that the function value I x is always the same as the argument value x, whatever we may substitute for x.” He then adds parenthetically, “Thus, for instance, I I would be equal to I”. And, yes, to someone used to the mathematical idea that a function takes values like numbers, and gives back numbers, this is a bit mind-blowing.

Next he explains the constancy combinator, that he called C (even though the German word for it starts with K), and that we now call K. He says “let us assume that the argument value is again arbitrary without restriction, while, regardless of what this value is, the function value will always be the fixed value a”. And when he says “arbitrary” he really means it: it’s not just a number or something; it’s what we would now think of as any symbolic expression.

First he writes (C a) y = a, i.e. the value of the “constancy function C a operating on any y is a”, then he says to “let a be variable too”, and defines (C x) y = x or C x y = x. Helpfully, almost as if he were writing computer documentation, he adds: “In practical applications C serves to permit the introduction of a quantity x as a ‘blind’ variable.”

Then he’s on to T. In modern notation the definition is T[f][x][y] = f[y][x] (i.e. T is essentially ReverseApplied). (He wrote the definition as (T ϕ) x y = ϕ y x, explaining that the parentheses can be omitted.)
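As a rough illustration (not something taken from Schönfinkel’s paper or from the Wolfram Language), the currying convention and the first three combinators can be mirrored with Python closures. The helper name `curry2` and the choice of Python’s built-in `pow` as a sample two-argument function are assumptions made for this sketch, and `K` stands for Schönfinkel’s C.

```python
# Curried (one-argument-at-a-time) versions of the first three combinators.
# A sketch only: Python closures stand in for "disembodied" functions.

def curry2(f):
    """Turn a two-argument function f(x, y) into the curried form f(x)(y)."""
    return lambda x: lambda y: f(x, y)

I = lambda x: x                             # identity:    I x = x
K = lambda x: lambda y: x                   # constancy:   C x y = x (now called K)
T = lambda f: lambda x: lambda y: f(y)(x)   # interchange: T f x y = f y x

power = curry2(pow)       # power(x)(y) computes x ** y

print(power(2)(5))        # 2 ** 5
print(T(power)(2)(5))     # arguments swapped, so this is 5 ** 2
print(I(I) is I)          # "I I would be equal to I"
print(K("a")("y"))        # the second argument is ignored
```

Because every function here takes exactly one argument, `T(power)` is itself an ordinary value that can be passed around, which is the point of allowing functions as arguments and values.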
He justifies the idea of T by saying that “The function T makes it possible to alter the order of the terms of an expression, and in this way it compensates to a certain extent for the lack of a commutative law.”

Next comes the composition combinator Z. He explains that “In [mathematical] analysis, as is well known, we speak loosely of a ‘function of a function’...”, by which he meant that it was pretty common then (and now) to write something like f(g(x)). But then he “went symbolic”—and defined a composition function that could symbolically act on any two functions f and g: Z[f][g][x] = f[g[x]]. He explains that Z allows one to “shift parentheses” in an expression: i.e. whatever the objects in an expression might be, Z allows one to transform [][][] to [[]] etc. But in case this might have seemed too abstract and symbolic, he then attempted to explain in a more “algebraic” way that the effect of Z is “somewhat like that of the associative law” (though, he added, the actual associative law is not satisfied).

Finally comes the pièce de résistance: the S combinator (that Schönfinkel calls the “fusion function”):

He doesn’t take too long to define it. He basically says: consider (f x)(g x) (i.e. f[x][g[x]]). This is really just “a function of x”. But what function? It’s not a composition of f and g; he calls it a “fusion”, and he defines the S combinator to create it: S[f][g][x] = f[x][g[x]].

It’s pretty clear Schönfinkel knew this kind of “symbolic gymnastics” would be hard for people to understand. He continues: “It will be advisable to make this function more intelligible by means of a practical example.” He says to take f x y (i.e. f[x][y]) to be log x y (i.e. Log[x, y]), and g z (i.e. g[z]) to be + z. Then S f g x = (f x)(g x) = log x (+ x) (i.e. S[f][g][x] = f[x][g[x]] = Log[x, + x]). And, OK, it’s not obvious why one would want to do that, and I’m not rushing to make S a built-in function in the Wolfram Language.

But Schönfinkel explains that for him “the practical use of the function S will be to enable us to reduce the number of occurrences of a variable—and to some extent also of a particular function—from several to a single one”.

Setting up everything in terms of five basic objects I, C (now K), T, Z and S might already seem impressive and minimalist enough. But Schönfinkel realized that he could go even further:

First, he says that actually I = S C C (or, in modern notation, s[k][k]). In other words, s[k][k][x] for symbolic x is just equal to x (since s[k][k][x] becomes k[x][k[x]] by using the definition of S, and this becomes x by using the definition of C). He notes that this particular reduction was communicated to him by a certain Alfred Boskowitz (who we know to have been a student at the time); he says that Paul Bernays (who was more of a colleague) had “some time before” noted that I = (S C)(C C) (i.e. s[k][k[k]]). Today, of course, we can use a computer to just enumerate all possible combinator expressions of a particular size, and find what the smallest reduction is. But in Schönfinkel’s day, it would have been more like solving a puzzle by hand.

Schönfinkel goes on, and proves that Z can also be reduced: Z = S (C S) C (i.e. s[k[s]][k]). And, yes, a very simple Wolfram Language program can verify in a few milliseconds that that is the simplest form.

OK, what about T? Schönfinkel gives steps of reduction to prove that T = S (Z Z S) (C C) (i.e. s[s[k[s]][k][s[k[s]][k]][s]][k[k]]). But is this the simplest possible form for T? Well, no.
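Schönfinkel’s three reductions can be checked mechanically with a small symbolic rewriter. The following is a minimal Python sketch, not the Wolfram Language program mentioned in the text: the tuple representation and the helper names `app`, `step` and `normalize` are assumptions of this example, and `K` stands for Schönfinkel’s C. It only verifies the identities on symbolic arguments; it does not search for the smallest forms.

```python
# Terms: a string is an atom ("S", "K", or a free symbol such as "f");
# a pair (f, a) is the application of f to a.

def app(*terms):
    """Left-associative application: app(f, x, y) means ((f x) y)."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def step(term):
    """Do one leftmost-outermost reduction step; return None if none applies."""
    head, args = term, []
    while isinstance(head, tuple):       # unwind the spine: head a1 a2 ...
        head, arg = head
        args.insert(0, arg)
    if head == "K" and len(args) >= 2:   # K x y -> x
        return app(args[0], *args[2:])
    if head == "S" and len(args) >= 3:   # S f g x -> f x (g x)
        f, g, x = args[:3]
        return app(f, x, (g, x), *args[3:])
    for i, a in enumerate(args):         # otherwise reduce inside an argument
        reduced = step(a)
        if reduced is not None:
            return app(head, *args[:i], reduced, *args[i + 1:])
    return None

def normalize(term, max_steps=1000):
    """Reduce until no rule applies (or give up after max_steps)."""
    for _ in range(max_steps):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no normal form found within max_steps")

S, K = "S", "K"
I = app(S, K, K)                     # I = S C C
Z = app(S, (K, S), K)                # Z = S (C S) C
T = app(S, app(Z, Z, S), (K, K))     # T = S (Z Z S) (C C)

print(normalize(app(I, "x")) == "x")                            # I x = x
print(normalize(app(Z, "f", "g", "x")) == ("f", ("g", "x")))    # Z f g x = f (g x)
print(normalize(app(T, "f", "x", "y")) == app("f", "y", "x"))   # T f x y = f y x
```

The step limit in `normalize` is there because combinator expressions in general need not reach a normal form; for the three identities above the reduction finishes after a handful of steps.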
But (with the very straightforward Wolfram Language program I wrote) it did take my modern computer a number of minutes to determine what the simplest form is. The answer is that it doesn’t have size like Schönfinkel’s, but rather size Actually, there are cases of size that all work: s[s[k[s]][s[k[k]][s]]][k[k]] (S(S(KS)(S(KK)S))(KK)) and five others. And, yes, it takes a few steps of reduction to prove that they work (the other size-9 cases S(SSK(K(SS(KK))))S, S(S(K(S(KS)K))S)(KK), S(K(S(S(KS)K)(KK)))S, S(K(SS(KK)))(S(KK)S), S(K(S(K(SS(KK)))K))S all have more complicated reductions):

But, OK, what did Schönfinkel want to do with these objects he’d constructed? As the title of his paper suggests, he wanted to use them as building blocks for mathematical logic. He begins: “Let us now apply our results to a special case, that of the calculus of logic in which the basic elements are individuals and the functions are propositional functions.” I consider this sentence significant. Schönfinkel didn’t have a way to express it (the concept of universal computation hadn’t been invented yet), but he seems to have realized that what he’d done was quite general, and went even beyond being able to represent a particular kind of logic.

Still, he went on to give his example. He’d explained at the beginning of the paper that the quantifiers we now call ∀ and ∃ could both be represented in terms of a kind of “quantified Nand” that he wrote |^x:

But now he wanted to “combinator-ify” everything. So he introduced a new combinator U, and defined it to represent his “quantified Nand”: U f g = f x |^x g x (he called U the “incompatibility function”—an interesting linguistic description of Nand):

“It is a remarkable fact”, he says, “that every formula of logic can now be expressed by means... solely of C, S and U.” So he’s saying that any expression from mathematical logic can be written out as some combinator expression in terms of S, C (now K) and U. He says that when there are quantifiers like “for all x ...” it’s always possible to use combinators to get rid of the “bound variables” x, etc. He says that he “will not give the complete demonstration here”, but rather content himself with an example. (Unfortunately—for reasons of the trajectory of his life that are still quite unclear—he never published his “complete demonstration”.)

But, OK, so what had he achieved? He’d basically shown that any expression that might appear in predicate logic (with logical connectives, quantifiers, variables, etc.) could be reduced to an expression purely in terms of the combinators S, C (now K) and U.

Did he need the U? Not really.
But he had to have some way to represent the object with mathematical or logical “meaning” on which his combinators would be acting. Today the obvious thing to do would be to have a representation for true and false. And what’s more, to represent these purely in terms of combinators. For example, if we took K to represent true, and S K (s[k]) to represent false, then And can be represented as S S K (s[s][k]), Or as S(SS)S(SK) (s[s[s]][s][s[k]]) and Nand as S(S(K(S(SS(K(KK))))))S (s[s[k[s[s[s][k[k[k]]]]]]][s]).

Schönfinkel got amazingly far in reducing everything to his “building blocks”. But, yes, he missed this final step.

But given that he’d managed to reduce everything to S, C and U he figured he should try to go further. So he considered an object J that would be a single building block of S and C: J J = S and J (J J) = C.

With S and K one can just point to any piece of an expression and see if it reduces. With J it’s a bit more complicated. In modern Wolfram Language terms one can state the rules as {j[j][x_][y_][z_] → x[z][y[z]], j[j[j]][x_][y_] → x} (where order matters) but to apply these requires pattern matching “clusters of J’s” rather than just looking at single S’s and K’s at a time.

But even though—as Schönfinkel observed—this “final reduction” to J didn’t work out, getting everything down to S and K was already amazing. At the beginning of the paper, Schönfinkel had described his objectives. And then he says “It seems to me remarkable in the extreme that the goal we have just set can be realized also; as it happens, it can be done by a reduction to three fundamental signs.” (The paper does say three fundamental signs, presumably counting U as well as S and K.)

I’m sure Schönfinkel expected that to reproduce all the richness of mathematical logic he’d need quite an elaborate set of building blocks. And certainly people like Frege, Whitehead and Russell had used what were eventually very complicated setups. Schönfinkel managed to cut through all the complexity to show that simple building blocks were all that was needed. But then he found something else: that actually just two building blocks (S and K) were enough.

In modern terms, we’d say that Schönfinkel managed to construct a system capable of universal computation. And that’s amazing in itself. But even more amazing is that he found he could do it with such a simple setup.

I’m sure Schönfinkel was extremely surprised. And here I personally feel a certain commonality with him. Because in my own explorations of the computational universe, what I’ve found over and over again is that it takes only remarkably simple systems to be capable of highly complex behavior—and of universal computation. And even after exploring the computational universe for four decades, I’m still continually surprised at just how simple the systems can be.

For me, this has turned into a general principle—the Principle of Computational Equivalence—and a whole conceptual framework around it. Schönfinkel didn’t have anything like that to think in terms of. But he was in a sense a good enough scientist that he still managed to discover what he discovered—that many decades later we can see fits in as another piece of evidence for the Principle of Computational Equivalence.
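As a concrete check that S and K alone can carry ordinary logic, the truth-value encoding described earlier (K as true, S K as false, with the quoted forms for And, Or and Nand) can be run with Python closures. This is only a sketch under stated assumptions: the helper names `encode` and `decode` are hypothetical, and `decode` reads a combinator truth value by applying it to two throwaway marker functions and seeing which one comes back.

```python
# S and K as curried Python closures.
S = lambda f: lambda g: lambda x: f(x)(g(x))   # S f g x = f x (g x)
K = lambda x: lambda y: x                      # K x y = x

TRUE = K          # K represents true
FALSE = S(K)      # S K represents false

# The combinator forms quoted in the text.
AND = S(S)(K)                                  # S S K
OR = S(S(S))(S)(S(K))                          # S (S S) S (S K)
NAND = S(S(K(S(S(S)(K(K(K)))))))(S)            # S (S (K (S (S S (K (K K)))))) S

def encode(flag):
    """Python bool -> combinator truth value."""
    return TRUE if flag else FALSE

def decode(value):
    """Combinator truth value -> Python bool.

    True picks its first argument and false its second. The markers are
    callables because Python evaluates arguments eagerly, so one marker
    may get applied to the other along the way.
    """
    yes = lambda _: None
    no = lambda _: None
    return value(yes)(no) is yes

for a in (True, False):
    for b in (True, False):
        print(a, b,
              decode(AND(encode(a))(encode(b))),
              decode(OR(encode(a))(encode(b))),
              decode(NAND(encode(a))(encode(b))))
```

Each printed row gives the two inputs followed by the decoded And, Or and Nand results, which can be compared against the usual truth tables.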
Looking at Schönfinkel’s paper a century later, it’s remarkable not only for what it discovers, but also for the clarity and simplicity with which it is presented. A little of the notation is now dated (and of course the original paper is written in German, which is no longer the kind of leading language of scholarship it once was). But for the most part, the paper still seems perfectly modern. Except, of course, that now it could be couched in terms of symbolic expressions and computation, rather than mathematical logic.

What Is Their Mathematics?
Combinators are hard to understand, and it’s not clear how many people understood them when they were first introduced—let alone understood their implications. It’s not a good sign that when Schönfinkel’s paper appeared in the person who helped prepare it for final publication (Heinrich Behmann) added his own three paragraphs at the end, that were quite confused. And Schönfinkel’s sole other published paper—coauthored with Paul Bernays in even mention combinators, even though they could have very profitably been used to discuss the subject at hand (decision problems in mathematical logic).

But in combinators (if not perhaps Schönfinkel’s recognition for them) had a remarkable piece of good fortune. Schönfinkel’s paper was discovered by a certain Haskell Curry—who would then devote more than years to studying what he named “combinators”, and to spreading the word about them.

At some level I think one can view the main thrust of what Curry and his disciples did with combinators as an effort to “mathematicize” them. Schönfinkel had presented combinators in a rather straightforward “structural” way. But what was the mathematical interpretation of what he did, and of how combinators work in general? What mathematical formalism could capture Schönfinkel’s structural idea of substitution? Just what, for example, was the true notion of equality for combinators?

In the end, combinators are fundamentally computational constructs, full of all the phenomena of “unbridled computation”—like undecidability and computational irreducibility. And it’s inevitable that mathematics as normally conceived can only go so far in “cracking” them.

But back in the and the concept and power of computation was not yet understood, and it was assumed that the ideas and tools of mathematics would be the ones to use in analyzing a formal system like combinators. And it wasn’t that mathematical methods got absolutely nowhere with combinators.

Unlike cellular automata, or even Turing machines, there’s a certain immediate structural complexity to combinators, with their elaborate tree structures, equivalences and so on. And so there was progress to be made—and years of work to be done—in untangling this, without having to face the raw features of full-scale computation, like computational irreducibility.

In the end, combinators are full of computational irreducibility. But they also have layers of computational reducibility, some of which are aligned with the kinds of things mathematics and mathematical logic have been set up to handle. And in this there’s a curious resonance with our recent Physics
Project . In our models based on hypergraph rewriting there’s also a kind of bedrock of computa - tional irreducibility. But as with combinators, there’s a certain immediate structural complex - ity to what our models do. And there are layers of computational reducibility associated with this. But the remarkable thing with our models is that some of those layers—and the for - malisms one can build to understand them—have an immediate interpretation: they are basically the core theories of twentieth-century physics , namely general relativity and quantum mechanics. Combinators work su ffi ciently di ff erently that they don’t immediately align with that kind of interpretation. But it’s still true that one of the important properties discovered in combina - tors (namely con fl uence, related to our idea of causal invariance ) turns out to be crucial to our models, their correspondence with physics, and in the end our whole ability to perceive regularity in the universe, even in the face of computational irreducibility. But let’s get back to the story of combinators as it played out a ft er Schön fi nkel’s paper. Schön fi nkel had basically set things up in a novel, very direct, structural way. But
Curry wanted to connect with more traditional ideas in mathematical logic, and mathematics in general. And a ft er a fi rst paper (published in which pretty much just recorded his fi rst thoughts, and his e ff orts to understand what Schön fi nkel had done, Curry was by starting to do things like formulate axioms for combinators, and hoping to prove general theorems about mathematical properties like equality. Without the understanding of universal computation and their relationship to it, it wasn’t clear yet how complicated it might ultimately be to deal with combinators. And
Curry pushed forward, publishing more papers and trying to do things like de fi ne set theory using his axioms for combinators. But in disaster struck. It wasn’t something about computa - tion or undecidability; instead it was that Stephen
Kleene and J. Barkley
Rosser showed the axioms
Curry had come up to try and “tighten up Schön fi nkel” with were just plain i n c o n s i s t e n t . To Kleene and
Rosser it provided more evidence of the need for Russell’s (originally quite hacky) idea of types—and led them to more complicated axiom systems, and away from combinators. But
Curry was undeterred. He revised his axiom system and continued— C o m b i n a t o r s a n d t h e S t o r y o f C o m p u t a t i o n | ltimately for many decades—to see what could be proved about combinators and things like them using mathematical methods. But already at the beginning of the there were bigger things afoot around mathemati - cal logic—which would soon intersect with combinators. Gödel’ s T heore m a n d C omputability How should one represent the fundamental constructs of mathematics? Back in the nobody thought seriously about using combinators. And instead there were basically three “big brands”:
Principia
Mathematica , set theory and Hilbert’s program.
Relations were being found, details were being filled in, and issues were being found. But there was a general sense that progress was being made.

Quite where the boundaries might lie wasn’t clear. For example, could one specify a way to “construct any function” from lower-level primitives? The basic idea of recursion was very old (think: Fibonacci). But by the early 1920s there was a fairly well-formalized notion of “primitive recursion” in which functions always found their values from earlier values. But could all “mathematical” functions be constructed this way?

By the late 1920s it was known that this wouldn’t work: the Ackermann function was a reasonable “mathematical” function, but it wasn’t primitive recursive. It meant that definitions had to be generalized (e.g. to “general recursive functions” that didn’t just look back at earlier values, but could “look forward until...” as well). But there didn’t seem to be any fundamental problem with the idea that mathematics could just “mechanistically” be built out forever from appropriate primitives.

But in 1931 came Gödel’s theorem. There’d been a long tradition of identifying paradoxes and inconsistencies, and finding ways to patch them by changing axioms. But Gödel’s theorem was based on Peano’s by-then-standard axioms for arithmetic (branded by Gödel as a fragment of Principia Mathematica). And it showed there was a fundamental problem.

In essence, Gödel took the paradoxical statement “this statement is unprovable” and showed that it could be expressed purely as a statement of arithmetic—roughly a statement about the existence of solutions to appropriate integer equations. And basically what Gödel had to do to achieve this was to create a “compiler” capable of compiling things like “this statement is unprovable” into arithmetic.

In his paper one can basically see him building up different capabilities (e.g. representing arbitrary expressions as numbers through Gödel numbering, checking conditions using general recursion, etc.)—eventually getting to a “high enough level” to represent the statement he wanted:

What did Gödel’s theorem mean? For the foundations of mathematics it meant that the idea of mechanically proving “all true theorems of mathematics” wasn’t going to work. Because it showed that there was at least one statement that by its own admission couldn’t be proved, but was still a “statement about arithmetic”, in the sense that it could be “compiled into arithmetic”.

That was a big deal for the foundations of mathematics. But actually there was something much more significant about Gödel’s theorem, even though it wasn’t recognized at the time. Gödel had used the primitives of number theory and logic to build what amounted to a computational system—in which one could take things like “this statement is unprovable”, and “run them in arithmetic”.

What Gödel had, though, wasn’t exactly a streamlined general system (after all, it only really needed to handle one statement). But the immediate question then was: if there’s a problem with this statement in arithmetic, what about Hilbert’s general “decision problem” (Entscheidungsproblem) for any axiom system?

To discuss the “general decision problem”, though, one needed some kind of general notion of how one could decide things. What ultimate primitives should one use?
Schönfinkel (with Paul Bernays)—in his sole other published paper—wrote about a restricted case of the decision problem in the late 1920s, but doesn’t seem to have had the idea of using combinators to study it.

By 1934 Gödel was talking about general recursiveness (i.e. definability through general recursion). And Alonzo Church and Stephen Kleene were introducing λ definability. Then in 1936 Alan Turing introduced Turing machines. All these approaches involved setting up certain primitives, then showing that a large class of things could be “compiled” to those primitives. And then showing that—in effect by thinking about having it compile itself—Hilbert’s Entscheidungsproblem couldn’t be solved.

Perhaps no single result along these lines would have been so significant. But it was soon established that all three kinds of systems were exactly equivalent: the set of computations they could represent was the same, as established by showing that one system could emulate another. And from that discovery eventually emerged the modern notion of universal computation—and all its implications for technology and science.

In the early days, though, there was actually a fourth equivalent kind of system—based on string rewriting—that had been invented by Emil Post in the early 1920s.

Oh, and then there were combinators.
Lambda Calculus
What was the right “language” to use for setting up mathematical logic? There’d been gradual improvement since the complexities of Principia Mathematica. But around 1930 Alonzo Church wanted a new and cleaner setup. And he needed to have a way (as Frege and Principia Mathematica had done before him) to represent “pure functions”. And that’s how he came to invent λ.

Today in the Wolfram Language we have Function[x, f[x]] or x ↦ f[x] (or various shorthands). Church originally had λx[M]:

But what’s perhaps most notable is that on the very first page he defines λ, he’s referencing Schönfinkel’s combinator paper. (Well, specifically, he’s referencing it because he wants to use the device Schönfinkel invented that we now call currying—f[x][y] in place of f[x, y]—though ironically he doesn’t mention Curry.) In his 1932 paper (apparently based on earlier work) λ is almost a sideshow—the main event being the introduction of formal postulates for mathematical logic:

By the next year J. Barkley Rosser is trying to retool Curry’s “combinatory logic” with combinators of his own—and showing how they correspond to lambda expressions:

Then in the mid-1930s lambda calculus has its big “coming out” in Church’s “An Unsolvable Problem of Elementary Number Theory”, in which he introduces the idea that any “effectively calculable” function should be “λ definable”, then defines integers in terms of λ’s (“Church numerals”) and then shows that the problem of determining equivalence for λ expressions is undecidable.

Very soon thereafter Turing publishes his “On Computable Numbers, with an Application to the Entscheidungsproblem” in which he introduces his much more manifestly mechanistic Turing machine model of computation. In the main part of the paper there are no lambdas—or combinators—to be seen. But by late 1936 Turing had gone to Princeton to be a student with Church—and added a note showing the correspondence between his Turing machines and Church’s lambda calculus.

By the next year, when Turing is writing his rather abstruse “Systems of Logic Based on Ordinals” he’s using lambda calculus all over the place.
Early in the document he writes I → λx[x], and soon he’s mixing lambdas and combinators with wild abandon—and in fact he’d already published a one-page paper which introduced the fixed-point combinator Θ (and, yes, the K in the title refers to Schönfinkel’s K combinator):

When Church summarized the state of lambda calculus in 1941 in his “The Calculi of Lambda-Conversion” he again made extensive use of combinators. Schönfinkel’s K is prominent. But Schönfinkel’s S is nowhere to be seen—and in fact Church has his own S combinator S[n][f][x] → f[n[f][x]] which implements successors in Church’s numeral system.
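To make the successor rule concrete, here is a minimal sketch in Python (an illustrative choice; the essay's own snippets are in Wolfram Language). It assumes the usual encoding in which the numeral n is "apply f to x, n times". The names zero, succ, church and to_int are hypothetical helpers introduced only for this sketch.

```python
# Church numerals: the numeral n is the function that applies f to x n times.
# All names here (zero, succ, church, to_int) are illustrative assumptions.

def zero(f):
    # Zero applies f no times at all.
    return lambda x: x

def succ(n):
    # Mirrors S[n][f][x] -> f[n[f][x]]: apply f one more time than n does.
    return lambda f: lambda x: f(n(f)(x))

def church(k):
    """Build the Church numeral for a non-negative Python int k."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    """Read a numeral back by counting how many times it applies +1 to 0."""
    return n(lambda i: i + 1)(0)

three = church(3)
print(to_int(succ(three)))  # 4
```

Nothing in succ inspects the numeral it is given. It only wraps one more application of f around it, which is why the same definition works for every n.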
And he also has a few other “basic combinators” that he routinely uses.

In the end, combinators and lambda calculus are completely equivalent, and it’s quite easy to convert between them—but there’s a curious tradeoff. In lambda calculus one names variables, which is good for human readability, but can lead to problems at a formal level. In combinators, things are formally much cleaner, but the expressions one gets can be completely incomprehensible to humans.

The point is that in a lambda expression like λx λy x[y] one’s naming the variables (here x and y), but really these names are just placeholders: what they are doesn’t matter; they’re just showing where different arguments go. And in a simple case like this, everything is fine. But what happens if one substitutes for y another lambda expression, say λx f[x]? What is that x? Is it the same x as the one outside, or something different? In practice, there are all sorts of renaming schemes that can be used, but they tend to be quite hacky, and things can quickly get tangled up. And if one wants to make formal proofs about lambda calculus, this can potentially be a big problem, and indeed at the beginning it wasn’t clear it wouldn’t derail the whole idea of lambda calculus.

And that’s part of why the correspondence between lambda calculus and combinators was important. With combinators there are no variables, and so no variable names to get tangled up. So if one can show that something can be converted to combinators—even if one never looks at the potentially very long and ugly combinator expression that’s generated—one knows one’s safe from issues about variable names.

There are still plenty of other complicated issues, though. Prominent among them are questions about when combinator expressions can be considered equal. Let’s say you have a combinator expression, like s[s[s[s][k]]][k]. Well, you can repeatedly apply the rules for combinators to transform and reduce it.
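As a rough illustration of what "repeatedly applying the rules" means, here is a small Python sketch of a reducer for the two standard rules s[x][y][z] → x[z][y[z]] and k[x][y] → x. The term representation (strings for atoms, pairs for applications), the leftmost-outermost strategy, and the names app, step and normalize are all assumptions made for this sketch, not anything taken from the historical papers.

```python
# Terms: a string is an atom ("s", "k", or a free symbol such as "a");
# a pair (f, x) stands for the application f[x].

def app(head, *args):
    """Build head[a1][a2]... as left-nested application pairs."""
    for a in args:
        head = (head, a)
    return head

def step(t):
    """Apply one rule at the leftmost-outermost position; None if no rule applies."""
    if isinstance(t, str):
        return None
    f, z = t
    # k[x][y] -> x
    if isinstance(f, tuple) and f[0] == "k":
        return f[1]
    # s[x][y][z] -> x[z][y[z]]
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "s":
        x, y = f[0][1], f[1]
        return ((x, z), (y, z))
    # No rule at the top: look inside the head first, then the argument.
    r = step(f)
    if r is not None:
        return (r, z)
    r = step(z)
    if r is not None:
        return (f, r)
    return None

def normalize(t, max_steps=1000):
    """Rewrite until no rule applies; give up after max_steps, since
    combinator reduction is not guaranteed to terminate."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no fixed point reached within max_steps")

# s[k][k] behaves as the identity: s[k][k][a] -> k[a][k[a]] -> a
print(normalize(app("s", "k", "k", "a")))  # a
```

The step limit is a practical guard only. Some expressions never settle down, so a reducer like this can report that it gave up but cannot decide in general whether a fixed point exists.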
And it’ll often end up at a fixed point, where no rules apply anymore. But a basic question is whether it matters in which order the rules are applied. And in 1936 Church and Rosser proved it doesn’t.

Actually, what they specifically proved was the analogous result for lambda calculus. They drew a picture to indicate different possible orders in which lambdas could be reduced out, and showed it didn’t matter which path one takes:

This all might seem like a detail. But it turns out that generalizations of their result apply to all sorts of systems. In doing computations (or automatically proving theorems) it’s all about “it doesn’t matter what path you take; you’ll always get the same result”. And that’s important. But recently there’s been another important application that’s shown up. It turns out that a generalization of the “Church–Rosser property” is what we call causal invariance in our Physics Project.

And it’s causal invariance that leads in our models to relativistic invariance, general covariance, objective reality in quantum mechanics, and other central features of physics.

Practical Computation

In retrospect, one of the great achievements of the 1930s was the inception of what ended up being the idea of universal computation. But at the time what was done was couched in terms of mathematical logic and it was far from obvious that any of the theoretical structures being built would have any real application beyond thinking about the foundations of mathematics. But even as people like Hilbert were talking in theoretical terms about the mechanization of mathematics, more and more there were actual machines being built for doing mathematical calculations.

We know that even in antiquity (at least one) simple gear-based mechanical calculational devices existed. In the mid-1600s arithmetic calculators started being constructed, and by the late 1800s they were in widespread use. At first they were mechanical, but by the 1930s most were electromechanical, and there started to be systems where units for carrying out different arithmetic operations could be chained together. And by the end of the 1940s fairly elaborate such systems based on electronics were being built.

Already in the 1830s Charles Babbage had imagined an “analytical engine” which could do different operations depending on a “program” specified by punch cards—and Ada Lovelace had realized that such a machine had broad “computational” potential. But by the 1930s a century had passed and nothing like this was connected to the theoretical developments that were going on—and the actual engineering of computational systems was done without any particular overarching theoretical framework.

Still, as electronic devices got more complicated and scientific interest in psychology intensified, something else happened: there started to be the idea (sometimes associated with the name cybernetics) that somehow electronics might reproduce how things like brains work. In the mid-1930s Claude Shannon had shown that Boolean algebra could represent how switching circuits work, and in 1943 Warren McCulloch and Walter Pitts proposed a model of idealized neural networks formulated in something close to mathematical logic terms.

Meanwhile by the mid-1940s John von Neumann—who had worked extensively on mathematical logic—had started suggesting math-like specifications for practical electronic computers, including the way their programs might be stored electronically. At first he made lots of brain-like references to “organs” and “inhibitory connections”, and essentially no mention of ideas from mathematical logic. But by the end of the 1940s von Neumann was talking at least conceptually about connections to Gödel’s theorem and Turing machines, Alan Turing had become involved with actual electronic computers, and there was the beginning of widespread understanding of the notion of general-purpose computers and universal computation.

In the 1950s there was an explosion of interest in what would now be called the theory of computation—and great optimism about its relevance to artificial intelligence. There was all sorts of “interdisciplinary work” on fairly “concrete” models of computation, like finite automata, Turing machines, cellular automata and idealized neural networks.
More “abstract” approaches, like recursive functions, lambda calculus—and combinators—remained, however, pretty much restricted to researchers in mathematical logic.

When early programming languages started to appear in the latter part of the 1950s, thinking about practical computers began to become a bit more abstract. It was understood that the grammars of languages could be specified recursively—and actual recursion (of functions being able to call themselves) just snuck into the specification of ALGOL. But what about the structures on which programs operated? Most of the concentration was on arrays (sometimes rather elegantly, as in APL) and, occasionally, character strings.

But a notable exception was LISP, described in John McCarthy’s 1960 paper “Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I” (Part II was not written). There was lots of optimism about AI at the time, and the idea was to create a language to “implement AI”—and do things like “mechanical theorem proving”. A key idea—that McCarthy described as being based on “recursive function formalism”—was to have tree-structured symbolic expressions (“S expressions”). (In the original paper, what’s now Wolfram Language–style f[g[x]] “M expression” notation, complete with square brackets, was used as part of the specification, but the quintessential-LISP-like (f (g x)) notation won out when LISP was actually implemented.)

An issue in LISP was how to take “expressions” (which were viewed as representing things) and turn them into functions (which do things). And the basic plan was to use Church’s idea of λ notation. But when it came time to implement this, there was, of course, trouble with name collisions, which ended up getting handled in quite hacky ways. So did McCarthy know about combinators?
The answer is yes, as his paper shows:

I actually didn’t know until just now that McCarthy had ever even considered combinators, and in the years I knew him I don’t think I ever personally talked to him about them. But it seems that for McCarthy—as for Church—combinators were a kind of “comforting backstop” that ensured that it was OK to use lambdas, and that if things went too badly wrong with variable naming, there was at least in principle always a way to untangle everything.

In the practical development of computers and computer languages, even lambdas—let alone combinators—weren’t really much heard from again (except in a small AI circle) until much later. And even then it didn’t help that in an effort variously to stay close to hardware and to structure programs there tended to be a desire to give everything a “data type”—which was at odds with the “consume any expression” approach of standard combinators and lambdas. But more recently—with the progressive rise of functional programming—lambdas, at least, have steadily gained in visibility and practical application.

What of combinators? Occasionally as a proof of principle there’ll be a hardware system developed that natively implements Schönfinkel’s combinators. Or—particularly in modern times—there’ll be an esoteric language that uses combinators in some kind of purposeful effort at obfuscation. Still, a remarkable cross-section of notable people concerned with the foundations of computing have—at one time or another—taught about combinators or written a paper about them. And in recent years the term “combinator” has become more popular as a way to describe a “purely applicative” function.

But by and large the important ideas that first arose with combinators ended up being absorbed into practical computing by quite circuitous routes, without direct reference to their origins, or to the specific structure of combinators.

Combinators in Culture

For a hundred years combinators have mostly been an obscure academic topic, studied particularly in connection with lambda calculus, at borders between theoretical computer science, mathematical logic and to some extent mathematical formalisms like category theory. Much of the work that’s been done can be traced in one way or another to the influence of Haskell Curry or Alonzo Church—particularly through their students, grandstudents, great-grandstudents, etc. Partly in the early years, most of the work was centered in the US, but later there was a strong migration to Europe and especially the Netherlands.
But even with all their abstractness and obscurity, on a few rare occasions combinators have broken into something closer to the mainstream. One such time was with the popular logic-puzzle book To Mock a Mockingbird, published in 1985 by Raymond Smullyan—a former student of Alonzo Church’s. It begins: “A certain enchanted forest is inhabited by talking birds” and goes on to tell a story that’s basically about combinators “dressed up” as birds calling each other (S is the “starling”, K the “kestrel”)—with a convenient “bird who’s who” at the end. The book is dedicated “To the memory of Haskell Curry—an early pioneer in combinatory logic and an avid bird-watcher”.

And then there’s Y Combinator. The original Y combinator arose out of work that Curry did on the consistency of axiom systems for combinators, and it appeared explicitly in his classic book:

He called it the “paradoxical combinator” because it was recursively defined in a kind of self-referential way analogous to various paradoxes. Its explicit form is SSK(S(K(SS(S(SSK))))K) and its most immediately notable feature is that under Schönfinkel’s combinator transformation rules it never settles down to a particular “value” but just keeps growing forever.

Well, in 2005 Paul Graham—who had long been an enthusiast of functional programming and LISP—decided to name his new (and now very famous) startup accelerator “Y Combinator”. I remember asking him why he’d called it that. “Because,” he said, “nobody understands the Y combinator”.

Looking in my own archives from that time I find an email I sent a combinator enthusiast who was working with me:

Followed by, basically, “Yes our theorem prover can prove the basic property of the Y combinator” (the version number sounds so ancient; we’re now just about to release V12.2):

I had another unexpected encounter with combinators last year. I had been given a book that was once owned by Alan Turing, and in it I found a piece of paper—that I recognized as being covered with none other than lambdas and combinators (but that’s not the Y combinator):

It took quite a bit of sleuthing (that I wrote extensively about)—but I eventually discovered that the piece of paper was written by Turing’s student Robin Gandy. But I never figured out why he was doing combinators....

Designing Symbolic Language

I think I first found out about combinators around 1979 by seeing Schönfinkel’s original paper in a book called From Frege to Gödel: A Source Book in Mathematical Logic (by a certain Jean van Heijenoort). How Schönfinkel’s paper ended up being in that book is an interesting question, which I’ll write about elsewhere. The spine of my copy of the book has long been broken at the location of Schönfinkel’s paper, and at different times I’ve come back to the paper, always thinking there was more to understand about it.

But why was I even studying things like this back in 1979? I guess in retrospect I can say I was engaged in an activity that goes back to Frege or even Leibniz: I was trying to find a fundamental framework for representing mathematics and beyond. But my goal wasn’t a philosophical one; it was a very practical one: I was trying to build a computer language that could do general computations in mathematics and beyond.

My immediate applications were in physics, and it was from physics that my main methodological experience came. And the result was that—like trying to understand the world in terms of elementary particles—I wanted to understand computation in terms of its most fundamental elements. But I also had lots of practical experience in using computers to do mathematical computation. And I soon developed a theory about how I thought computation could fundamentally be done.

It started from the practical issue of transformations on algebraic expressions (turn sin(2x) into 2 sin(x) cos(x), etc.). But it soon became a general idea: compute by doing transformations on symbolic expressions. Was this going to work? I wanted to understand as fundamentally as possible what computation really was—and from that I was led to its history in mathematical logic.

Much of what I saw in books and papers about mathematical logic I found abstruse and steeped in sometimes horrendous notational complexity. But what were these people really doing? It made it much easier that I had a definite theory, against which I could essentially do reductionist science. That stuff in Principia Mathematica? Those ideas about rewriting systems?
Yup, I could see how to represent them as rules for transformations on symbolic expressions.

And so it was that I came to design SMP: “A Symbolic Manipulation Program”—all based on transformation rules for symbolic expressions. It was easy to represent mathematical relations ($x is a pattern variable that would now in the Wolfram Language be x_ on the left-hand side only):

Or basic logic:

Or, for that matter, predicate logic of the kind Schönfinkel wanted to capture:

And, yes, it could emulate a Turing machine (note the tape-as-transformation-rules representation that appears at the end):

But the most important thing I realized is that it really worked to represent basically anything in terms of symbolic expressions, and transformation rules on them. Yes, it was quite often useful to think of “applying functions to things” (and SMP had its version of lambda, for example), but it was much more powerful to think about symbolic expressions as just “being there” (“x doesn’t have to have a value”)—like things in the world—with the language being able to define how things should transform.

In retrospect this all seems awfully like the core idea of combinators, but with one important exception: that instead of everything being built from “purely structural elements” with names like S and K, there was a whole collection of “primitive objects” that were intended to have direct understandable meanings (like Plus, Times, etc.). And indeed I saw a large part of my task in language design as being to think about computations one might want to do, and then try to “drill down” to find the “elementary particles”—or primitive objects—from which these computations might be built up.

Over time I’ve come to realize that doing this is less about what one can in principle use to construct computations, and more about making a bridge to the way humans think about things. It’s crucial that there’s an underlying structure—symbolic expressions—that can represent anything. But increasingly I’ve come to realize that what we need from a computational language is to have a way to encapsulate in precise computational form the kinds of things we humans think about—in a way that we humans can understand. And a crucial part of being able to do that is to leverage what has ultimately been at the core of making our whole intellectual development as a species possible: the idea of human language.

Human language has given us a way to talk symbolically about the world: to give symbolic names to things, and then to build things up using these. In designing a computational language the goal is to leverage this: to use what humans already know and understand, but be able to represent it in a precise computational way that is amenable to actual computation that can be done automatically by computer.

It’s probably no coincidence that the tree structure of symbolic expressions that I have found to be such a successful foundation for computational language is a bit like an idealized version of the kind of tree structure (think parse trees or sentence diagramming) that one can view human language as following. There are other ways to set up universal computation, but this is the one that seems to fit most directly with our way of thinking about things.

And, yes, in the end all those symbolic expressions could be constructed like combinators from objects—like S and K—with no direct human meaning. But that would be like having a world without nouns—a world where there’s no name for anything—and the representation of everything has to be built from scratch. But the crucial idea that’s central to human language—and now to computational language—is to be able to have layers of abstraction, where one can name things and then refer to them just by name without having to think about how they’re built up “inside”.

In some sense one can see the goal of people like Frege—and Schönfinkel—as being to “reduce out” what exists in mathematics (or the world) and turn it into something like “pure logic”. And the structural part of that is exactly what makes computational language possible. But in my conception of computational language the whole idea is to have content that relates to the world and the way we humans think about it.

And over the decades I’ve continually been amazed at just how strong and successful the idea of representing things in terms of symbolic expressions and transformations on them is. Underneath everything that’s going on in the Wolfram Language—and in all the many systems that now use it—it’s all ultimately just symbolic expressions being transformed according to particular rules, and reaching fixed points that represent results of computations, just like in those examples in Schönfinkel’s original paper.

One important feature of Schönfinkel’s setup is the idea that one doesn’t just have “functions” like f[x], or even just nested functions, like f[g[x]]. Instead one can have constructs where instead of the “name of a function” (like f) one can have a whole complex symbolic structure. And while this was certainly possible in SMP, not too much was built around it.
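The picture of computation as "symbolic expressions transformed by rules until a fixed point" can be sketched in a few lines of Python. This is only a toy under stated assumptions: expressions are nested tuples whose first element is the head, the single rule is the double-angle identity sin(2x) → 2 sin(x) cos(x), and the names rule_double_angle, rewrite_once and fixed_point are hypothetical. It is not how SMP or the Wolfram Language is implemented.

```python
# Expressions are nested tuples (head, arg1, arg2, ...); atoms are str or int.

def rule_double_angle(e):
    """sin[times[2, x]] -> times[2, sin[x], cos[x]]; None if e does not match."""
    if (isinstance(e, tuple) and len(e) == 2 and e[0] == "sin"
            and isinstance(e[1], tuple) and len(e[1]) == 3
            and e[1][0] == "times" and e[1][1] == 2):
        x = e[1][2]
        return ("times", 2, ("sin", x), ("cos", x))
    return None

def rewrite_once(e, rules):
    """Rewrite every part of e (head included), then try each rule on the result."""
    if isinstance(e, tuple):
        e = tuple(rewrite_once(part, rules) for part in e)
    for rule in rules:
        out = rule(e)
        if out is not None:
            return out
    return e

def fixed_point(e, rules, max_passes=100):
    """Repeat rewriting passes until the expression stops changing."""
    for _ in range(max_passes):
        nxt = rewrite_once(e, rules)
        if nxt == e:
            return e
        e = nxt
    raise RuntimeError("no fixed point within max_passes")

expr = ("plus", ("sin", ("times", 2, "x")), 1)
print(fixed_point(expr, [rule_double_angle]))
# ('plus', ('times', 2, ('sin', 'x'), ('cos', 'x')), 1)
```

Note that the symbol "x" never needs a value. The expression is simply a tree, and the rule says how one shape of tree turns into another.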
But when I came to start designing what’s now the Wolfram Language in 1986 I made sure that the “head” (as I called it) of an expression could itself be an arbitrary expression.

And when Mathematica was first launched in 1988 I was charmed to see more than one person from mathematical logic immediately think of implementing combinators. Make the definitions:

    s[x_][y_][z_] := x[z][y[z]]
    k[x_][y_] := x

Then combinators “just work” (at least if they reach a fixed point):

    s[s[k[s]][s[k[k]][s[k[s]][k]]]][s[k[s[s[k][k]]]][k]][a][b][c]

    a[b[a][c]]

But what about the idea of “composite symbolic heads”? Already in SMP I’d used them to do simple things like represent derivatives (and in Wolfram Language f'[x] is Derivative[1][f][x]). But something that’s been interesting to me to see is that as the decades have gone by, more and more gets done with “composite heads”. Sometimes one thinks of them as some kind of nesting of operations, or nesting of modifiers to a symbolic object. But increasingly they end up being a way to represent “higher-order constructs”—in effect things that produce things that produce things etc. that eventually give a concrete object one wants.

I don’t think most of us humans are particularly good at following this kind of chain of abstraction, at least without some kind of “guide rails”. And it’s been interesting for me to see over the years how we’ve been able to progressively build up guide rails for longer and longer chains of abstraction. First there were things like Function, Apply, Map. Then Nest, Fold, FixedPoint, MapThread. But only quite recently NestGraph, FoldPair, SubsetMap, etc. Even from the beginning there were direct “head manipulation” functions like Operate and Through. But unlike more “array-like” operations for list manipulation they’ve been slow to catch on.

In a sense combinators are an ultimate story of “symbolic head manipulation”: everything can get applied to everything before it’s applied to anything. And, yes, it’s very hard to keep track of what’s going on—which is why “named guide rails” are so important, and also why they’re challenging to devise. But it seems as if, as we progressively evolve our understanding, we’re slowly able to get a little further, in effect building towards the kind of structure and power that combinators—in their very non-human-relatable way—first showed us was possible a century ago.

Combinators in the Computational Universe

Combinators were invented for a definite purpose: to provide building blocks, as Schönfinkel put it, for logic.
It was the same kind of thing with other models of what we now know as computation. All of them were “constructed for a purpose”. But in the end computation—and programs—are abstract things, that can in principle be studied without reference to any particular purpose. One might have some particular reason to be looking at how fast programs of some kind can run, or what can be proved about them. But what about the analog of pure natural science: of studying what programs just “naturally do”?

At the beginning of the 1980s I got very interested in what one can think of as the “natural science of programs”. My interest originally arose out of a question about ordinary natural science. One of the very noticeable features of the natural world is how much in it seems to us highly complex. But where does this complexity really come from?
Through what kind of mechanism does nature produce it? I quickly realized that in trying to address that question, I needed as general a foundation for making models of things as possible. And for that I turned to programs, and began to study just what “programs in the wild” might do.

Ever since the time of Galileo and Newton mathematical equations had been the main way that people ultimately imagined making models of nature. And on the face of it—with their real numbers and continuous character—these seemed quite different from the usual setup for computation, with its discrete elements and discrete choices. But perhaps in part through my own experience in doing mathematics symbolically on computers, I didn’t see a real conflict, and I began to think of programs as a kind of generalization of the traditional approach to modeling in science.

But what kind of programs might nature use? I decided to just start exploring all the possibilities: the whole “computational universe” of programs—starting with the simplest. I came up with a particularly simple setup involving a row of cells with values 0 or 1 updated in parallel based on the values of their neighbors. I soon learned that systems like this had actually been studied under the name “cellular automata” in the 1950s (particularly in 2D) as potential models of computation, though they had fallen out of favor, mainly through not having seemed very “human programmable”.

My initial assumption was that with simple programs I’d only see simple behavior.
But with my cellular automata it was very easy to do actual computer experiments, and to visualize the results. And though in many cases what I saw was simple behavior, I also saw something very surprising: that in some cases—even though the rules were very simple—the behavior that was generated could be immensely complex.

It took me years to come to terms with this phenomenon, and it’s gradually informed the way I think about science, computation and many other things. At first I studied it almost exclusively in cellular automata. I made connections to actual systems in nature that cellular automata could model. I tried to understand what existing mathematical and other methods could say about what I’d seen.
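The kind of setup described above (a row of cells with values 0 or 1, all updated in parallel from their neighbors) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `step` and `evolve`, the wrap-around boundary, and the choice of rule 30 as the default are assumptions made for the example, not details taken from the text.

```python
def step(cells, rule=30):
    """Apply one parallel update of an elementary cellular automaton.

    `rule` is the standard 0-255 rule number (an assumption of this sketch);
    the row is treated as cyclic, so the edges wrap around.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbor values form a number 0..7 that selects one bit of the rule
        index = left * 4 + center * 2 + right
        new_cells.append((rule >> index) & 1)
    return new_cells


def evolve(width=31, steps=15, rule=30):
    """Return the list of rows obtained from a single 1 in the middle of a row of 0s."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    # Print each row as text, with '#' for 1 and '.' for 0
    for row in evolve():
        print("".join("#" if c else "." for c in row))
```

Changing the `rule` argument gives the other 255 rules of this type, which makes it easy to compare rules whose behavior stays simple with rules whose behavior does not.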
And slowly I began to formulate general ideas to explain what was going on—like computational irreducibility and the Principle of Computational Equivalence.

But at the beginning of the 1990s—armed with what would become the Wolfram Language—I decided I should try to see just how the phenomenon I had found in cellular automata would play out in other kinds of computational systems. And my archives record that on April I started looking at combinators. I seem to have come back to them several times, but in a notebook from July (which, yes, still runs just fine), there it is: a randomly chosen combinator made of Schönfinkel’s S’s and K’s starting to show complex behavior.

I seem to have a lot of notebooks that start with the simple combinator definitions—and then start exploring. There are what seem like they could be pages from a “computational naturalist’s field notebook”. Then there are attempts to visualize combinators in the same kind of way as cellular automata. But the end result was that, yes, like
Turing machines, string substitution systems and all the other systems I explored in the computational universe, combinators did exactly the same kinds of things I’d originally discovered in cellular automata. Combinators weren’t just systems that could be set up to do things. Even “in the wild” they could spontaneously do very interesting and complex things.

I included a few pages on what I called “symbolic systems” (essentially lambdas) at the end of my chapter on “The World of Simple Programs” in A New Kind of Science (and, yes, reading particularly the notes again now, I realize there are still many more things to explore...). Later in the book I talk specifically about Schönfinkel’s combinators in connection with the threshold of computation universality. But before showing examples of what they do, I remark:

“Originally intended as an idealized way to represent structures of functions defined in logic, combinators were actually first introduced in 1920—sixteen years before Turing machines.
But although they have been investigated somewhat over the past eighty years, they have for the most part been viewed as rather obscure and irrelevant constructs.”

How “irrelevant” should they be seen as being? Of course it depends on what for. As things to explore in the computational universe, cellular automata have the great advantage of allowing immediate visualization. With combinators it’s a challenge to find any way to translate their behavior at all faithfully into something suitable for human perception. And since the Principle of Computational Equivalence implies that general computational features won’t depend on the particulars of different systems, there’s a tendency to feel that even in studying the computational universe, combinators “aren’t worth the trouble”.

Still, one thing that’s been prominently on display with cellular automata over the past several decades is the idea that any sufficiently simple system will eventually end up being a useful model for something. Mollusc pigmentation. Catalysis processes. Road traffic flow. There are simple cellular automaton models for all of these. What about combinators?
Without good visualization it’s harder to say “that looks like combinator behavior”. And even after 100 years they’re still a bit too unfamiliar. But when it comes to capturing some large-scale expression or tree behavior of some system, I won’t be surprised if combinators are a good fit.

When one looks at the computational universe, one of the important ideas is “mining” it not just for programs that can serve as models for things, but also for programs that are somehow useful for some technological purpose. Yes, one can imagine specifically “compiling” some known program to combinators. But the question is whether “naturally occurring combinators” can somehow be identified as useful for some particular purpose. Could they deliver some new kind of distributed cryptographic protocol? Could they be helpful in mapping out distributed computing systems? Could they serve as a base for setting up molecular-scale computation, say with tree-like molecules? I don’t know. But it will be interesting to find out. And as combinators enter their second century they provide a unique kind of “computational raw material” to mine from the computational universe.

Combinators All the Way Down?

What is the universe fundamentally made of? For a long time the assumption was that it must be described by something fundamentally mathematical. And indeed right around the time combinators were being invented the two great theories of general relativity and quantum mechanics were just developing. And in fact it seemed as if both physics and mathematics were going so well that people like David Hilbert imagined that perhaps both might be completely solved—and that there might be a mathematics-like axiomatic basis for physics that could be “mechanically explored” as he imagined mathematics could be.

But it didn’t work out that way. Gödel’s theorem appeared to shatter the idea of a “complete mechanical exploration” of mathematics. And while there was immense technical progress in working out the consequences of general relativity and quantum mechanics, little was discovered about what might lie underneath. Computers (including things like Mathematica) were certainly useful in exploring the existing theories of physics. But physics didn’t show any particular signs of being “fundamentally computational”, and indeed the existing theories seemed structurally not terribly compatible with computational processes.

But as I explored the computational universe and saw just what rich and complex behavior could arise even from very simple rules, I began to wonder whether maybe, far below the level of existing physics, the universe might be fundamentally computational. I began to make specific models in which space and time were formed from an evolving network of discrete points. And I realized that some of the ideas that had arisen in the study of things like combinators and lambda calculus from the 1920s and 1930s might have direct relevance.

Like combinators (or lambda calculus) my models had the feature that they allowed many possible paths of evolution. And like combinators (or lambda calculus) at least some of my models had the remarkable feature that in some sense it didn’t matter what path one took; the final result would always be the same. For combinators this “Church–Rosser” or “confluence” feature was what allowed one to have a definite fixed point that could be considered the result of a computation.
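The confluence property just described can be checked directly on a small S, K combinator expression. The Python sketch below represents an application f[x] as the pair `(f, x)` and uses the standard rules S[x][y][z] → x[z][y[z]] and K[x][y] → x. The nested-tuple encoding, all the function names, and the example expression are assumptions made for illustration, and the exhaustive search is only practical for small terminating expressions.

```python
S, K = "S", "K"


def app(*terms):
    """Left-associated application: app(f, x, y) stands for f[x][y]."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result


def is_redex(term):
    """True if term has the form K[a][b] or S[a][b][c]."""
    if not isinstance(term, tuple) or not isinstance(term[0], tuple):
        return False
    f = term[0]
    return f[0] == K or (isinstance(f[0], tuple) and f[0][0] == S)


def redexes(term, path=()):
    """Yield the position of every redex, outermost and leftmost first."""
    if isinstance(term, tuple):
        if is_redex(term):
            yield path
        yield from redexes(term[0], path + (0,))
        yield from redexes(term[1], path + (1,))


def rewrite_at(term, path):
    """Apply the S or K rule to the redex found at the given position."""
    if path:
        f, x = term
        if path[0] == 0:
            return (rewrite_at(f, path[1:]), x)
        return (f, rewrite_at(x, path[1:]))
    f, z = term
    if f[0] == K:                 # K[a][b] -> a
        return f[1]
    (_, x), y = f                 # S[x][y][z] -> x[z][y[z]]
    return ((x, z), (y, z))


def normalize(term, pick, max_steps=1000):
    """Rewrite the redex selected by `pick` until no redex is left."""
    for _ in range(max_steps):
        found = list(redexes(term))
        if not found:
            return term
        term = rewrite_at(term, pick(found))
    raise RuntimeError("no normal form reached within max_steps")


def reachable_normal_forms(term, max_states=10000):
    """Follow every possible rewriting path and collect the fixed points reached."""
    seen, frontier, normal = {term}, [term], set()
    while frontier:
        current = frontier.pop()
        found = list(redexes(current))
        if not found:
            normal.add(current)
        for path in found:
            nxt = rewrite_at(current, path)
            if nxt not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("too many states to explore")
                seen.add(nxt)
                frontier.append(nxt)
    return normal


if __name__ == "__main__":
    identity = app(S, K, K)                    # S[K][K] behaves like an identity
    term = app(identity, app(identity, K))
    print(normalize(term, lambda r: r[0]))     # always rewrite the first redex
    print(normalize(term, lambda r: r[-1]))    # always rewrite the last redex
    print(reachable_normal_forms(term))        # every path, not just these two
```

For this example both strategies, and every other path, end at the same fixed point. Confluence does not promise that every path terminates; it only says that paths which do reach a fixed point reach the same one.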
In my models of the universe that doesn’t just stop—things are a bit more subtle—but the generalization to what I call causal invariance is precisely what leads to relativistic invariance and the validity of general relativity.

For many years my work on fundamental physics languished—a victim of other priorities and the uphill effort of introducing new paradigms into a well-established field. But just over a year ago—with help from two very talented young physicists—I started again, with unexpectedly spectacular results.

I had never been quite satisfied with my idea of everything in the universe being represented as a particular kind of giant graph. But now I imagined that perhaps it was more like a giant symbolic expression, or, specifically, like an expression consisting of a huge collection of relations between elements—in effect, a certain kind of giant hypergraph. It was, in a way, a very combinator-like concept.

At a technical level, it’s not the same as a general combinator expression: it’s basically just a single layer, not a tree. And in fact that’s what seems to allow the physical universe to consist of something that approximates uniform (manifold-like) space, rather than showing some kind of hierarchical tree-like structure everywhere.

But when it comes to the progression of the universe through time, it’s basically just like the transformation of combinator expressions. And what’s become clear is that the existence of different paths—and their ultimate equivalences—is exactly what’s responsible not only for the phenomena of relativity, but also for quantum mechanics. And what’s remarkable is that many of the concepts that were first discovered in the context of combinators and lambda calculus now directly inform the theory of physics. Normal forms (basically fixed points) are related to black holes where “time stops”. Critical pair lemmas are related to measurement in quantum mechanics. And so on.
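To make the “single layer” of relations concrete, here is a minimal Python sketch in which a hypergraph is just a flat list of relations between numbered elements, and time advances by rewriting pieces of that list. The particular rule {{x,y},{x,z}} → {{x,z},{x,w},{y,w},{z,w}}, the choice to rewrite only the first match found at each step, and the names `rewrite_once` and `evolve_hypergraph` are all assumptions made for illustration; none of them are specified in this text.

```python
def rewrite_once(edges, next_node):
    """Rewrite the first pair of relations (x, y), (x, z) that share a first element.

    The matched pair is replaced by (x, z), (x, w), (y, w), (z, w), where w is a
    newly created element. Returns the new relation list and the next unused label.
    """
    for i, (x1, y) in enumerate(edges):
        for j, (x2, z) in enumerate(edges):
            if i != j and x1 == x2:
                w = next_node
                # Keep every relation that was not part of the match
                rest = [e for k, e in enumerate(edges) if k not in (i, j)]
                return rest + [(x1, z), (x1, w), (y, w), (z, w)], next_node + 1
    return edges, next_node  # no match: the hypergraph is left unchanged


def evolve_hypergraph(edges, steps):
    """Apply the rule `steps` times, one match per step (a simplifying assumption)."""
    next_node = max(max(e) for e in edges) + 1
    for _ in range(steps):
        edges, next_node = rewrite_once(edges, next_node)
    return edges


if __name__ == "__main__":
    state = [(1, 2), (1, 3)]
    for t in range(4):
        print(t, evolve_hypergraph(state, t))
```

Unlike a combinator expression, nothing here is nested: the whole state is one flat collection of relations. Choosing a different match at each step gives a different path of evolution, which is the freedom that the discussion of confluence and causal invariance is about.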
In practical computing, and in the creation of computational language, it was the addition of “meaningful names” to the raw structure of combinators that turned them into the powerful symbolic expressions we use. But in understanding the “data structure of the universe” we’re in a sense going back to something much more like “raw combinators”. Because now all those “atoms of space” that make up the universe don’t have meaningful names; they’re more like S’s and K’s in a giant combinator expression, distinct but yet all the same.

In the traditional, mathematical view of physics, there was always some sense that by “appropriately clever mathematics” it would be possible to “figure out what will happen” in any physical system. But once one imagines that physics is fundamentally computational, that’s not what one can expect. And just like combinators—with their capability for universal computation—can’t in a sense be “cracked” using mathematics, so also that’ll be true of the universe. And indeed in our model that’s what the progress of time is about: it’s the inexorable, irreducible process of computation, associated with the repeated transformation of the symbolic expression that represents the universe.

When Hilbert first imagined that physics could be reduced to mathematics he probably thought that meant that physics could be “solved”. But with Gödel’s theorem—which is a reflection of universal computation—it became clear that mathematics itself couldn’t just be “solved”. But now in effect we have a theory that “reduces physics to mathematics”, and the result of the Gödel’s theorem phenomenon is something very important in our universe: it’s what leads to a meaningful notion of time.

Moses Schönfinkel imagined that with combinators he was finding “building blocks for logic”. And perhaps the very simplicity of what he came up with makes it almost inevitable that it wasn’t just about logic: it was something much more general. Something that can represent computations.
Something that has the germ of how we can represent the “machine code” of the physical universe.

It took in a sense “humanizing” combinators to make them useful for things like computational language whose very purpose is to connect with humans. But there are other places where inevitably we’re dealing with something more like large-scale “combinators in the raw”. Physics is one of them. But there are others. In distributed computing. And perhaps in biology, in economics and in other places.

There are specific issues of whether one’s dealing with trees (like combinators), or hypergraphs (like our model of physics), or something else. But what’s important is that many of the ideas—particularly around what we call multiway systems—show up with combinators. And yes, combinators often aren’t the easiest places for us humans to understand the ideas in. But the remarkable fact is that they exist in combinators—and that combinators are now a century old.

I’m not sure if there’ll ever be a significant area where combinators alone will be the dominant force. But combinators have—for a century—had the essence of many important ideas. Maybe as such they are at some level destined forever to be footnotes. But in a sense they are also seeds or roots—from which remarkable things have grown. And as combinators enter their second century it seems quite certain that there is still much more that will grow from them.

Note: References are given as links in the body of this document.