[PDF] Fundamentals of Computing

Abstract

These are notes for the course CS-172 I first taught in the Fall 1986 at UC Berkeley and subsequently at Boston University. The goal was to introduce the undergraduates to basic concepts of Theory of Computation and to provoke their interest in further study. Model-dependent effects were systematically ignored. Concrete computational problems were considered only as illustrations of general principles. The notes are skeletal: they do have (terse) proofs, but exercises, references, intuitive comments, examples are missing or inadequate. The notes can be used for designing a course or by students who want to refresh the known material or are bright and have access to an instructor for questions. Each subsection takes about a week of the course.

Full PDF

aa r X i v : . [ c s . CC ] J u l Fundamentals of Computing

Leonid A. Levin:

Abstract

These are notes for the course CS-172 I ﬁrst taught in the Fall 1986 at UC Berkeley and subsequentlyat Boston University. The goal was to introduce the undergraduates to basic concepts of Theory ofComputation and to provoke their interest in further study. Model-dependent eﬀects were systematicallyignored. Concrete computational problems were considered only as illustrations of general principles.The notes are skeletal: they do have (terse) proofs, but exercises, references, intuitive comments andexamples are missing or inadequate. The notes can be used by an instructor designing a course or bystudents who either know the material and want to refresh the memory or are exceptionally bright andhave access to an instructor for questions. Each subsection takes about a week of the course. Versions ofthese notes appeared in [Levin 91].

Acknowledgments.

I am grateful to the University of California at Berkeley, its MacKey Professorship fundand Manuel Blum who made possible for me to teach this course. The opportunity to attend lectures of M. Blumand Richard Karp and many ideas of my colleagues at BU and MIT were very beneﬁcial for my lectures. I am alsograteful to the California Institute of Technology for a semester with light teaching load in a stimulating environmentenabling me to rewrite the students’ notes. NSF grants

Contents

Sections 1,2 study deterministic computations. Non-deterministic aspects of computations (inputs, interac-tion, errors, randomization, etc.) are crucial and challenging in advanced theory and practice. Deﬁning themas an extension of deterministic computations is simple. The latter, however, while simpler conceptually,require elaborate models for deﬁnition. These models may be sophisticated if we need a precise measure-ment of all required resources. However, if we only need to deﬁne what is computable and get a very roughmagnitude of the needed resources, all reasonable models turn out equivalent, even to the simplest ones. Wewill pay signiﬁcant attention to this surprising and important fact. The simplest models are most useful forproving negative results and the strongest ones for positive results.We start with terminology common to all models, gradually making it more speciﬁc to those we actuallystudy. We represent computations as graphs: the edges reﬂect various relations between nodes ( events ).Nodes, edges have attributes: labels, states, colors, parameters, etc. (aﬀecting the computation or its analysis).

Causal edges run from each event to all events essential for its occurrence or attributes. They form a directedacyclic graph (though cycles may be added artiﬁcially to mark the external input parts of the computation).We will study only synchronous computations. Their nodes have a time parameter. It reﬂects logicalsteps, not necessarily a precise value of any physical clock. Causal edges only span short (typically, ≤ parent . Pointer edges connectthe parent of each event to all its other possible causes and reﬂect connections that allow simultaneousevents to interact and have a joint eﬀect. Pointers with the same source have diﬀerent labels. The (labeled)subgraph of events/edges at a given time is an instant memory conﬁguration of the model.Each non-terminal conﬁguration has active nodes/edges around which it may change. The models withonly a small active area at any step of the computation are sequential . Others are called parallel . Complexity.

The following measures of computing resources of a machine A on input x will be used throughout: Time : The greatest depth D A ( x ) of causal chains is the number of computation steps. The volume V A ( x ) is the combined number of active edges during all steps. Time T A ( x ) is used (depending on the context) aseither depth or volume, which are close for sequential models. Note that time complexity is robust only upto a constant factor: a machine can be modiﬁed into a new one with a larger alphabet of labels, representingseveral locations in one. It would produce identical results in a fraction of time and space (provided that thetime limits suﬃce for transforming the input and output into the other alphabet). Space : S A ( x ) or S A ( x ) of a synchronous computation is the greatest (over time) size of its conﬁgurations.Sometimes excluded are nodes/edges unchanged since the input. Growth Rates (typically expressed as functions of bit length n = k x, y k of input/output x / y ): O, Ω: f ( n ) = O ( g ( n )) ⇐⇒ g ( n ) = Ω( f ( n )) ⇐⇒ sup n f ( n ) g ( n ) < ∞ . o, ω : f ( n ) = o ( g ( n )) ⇐⇒ g ( n ) = ω ( f ( n )) ⇐⇒ lim n →∞ f ( n ) g ( n ) = 0.Θ : f ( n ) = Θ( g ( n )) ⇐⇒ ( f ( n ) = O ( g ( n )) and g ( n ) = O ( f ( n ))).Here are a few examples of frequently appearing growth rates: negligible (log n ) O (1) ; moderate n Θ(1) (called polynomial or P, like in P-time); infeasible: 2 n Ω(1) , also n ! = ( n/e ) n p π (2 n +1 /

3) + ε/n , ε ∈ [0 , . The reason for ruling out exponential (and neglecting logarithmic) rates is that the visible Universe istoo small to accommodate exponents. Its radius is about 46.5 giga-light-years ∼ Plank units. A systemof ≫ R . atoms packed in R Plank Units radius collapses rapidly, be it Universe-sized or a neutron star. Sothe number of atoms is < ≪ ≪ This is a customary but somewhat misleading notation. The clear notations would be like f ( n ) ∈ O ( g ( n )) A rougher estimate follows by computing ln n ! = ( n + .

5) ln n − n + O (1) using that | P ni =2 g ( i ) − R n + . . g ( t ) d t | ≤ v , where v is the total variation of g ′ /

8. So v < /

12 for monotone g ′ ( t ) = ln ′ ( t ) = 1 /t and the O (1) is 1 . − ln 1 .

5) + ε , ε ∈ [0 , . eonid A. Levin Fundamentals of Computing Rigid computations have another node parameter: location or cell . Combined with time, it designates theevent uniquely. Locations have structure or proximity edges between them. They (or their short chains)indicate all neighbors of a node to which pointers may be directed. Cellular Automata (CA).

CA are a parallel rigid model. Its sequential restriction is the

Turing Machine (TM) . The conﬁgurationof CA is a (possibly multi-dimensional) grid with a ﬁnite, independent of the grid size, alphabet of states tolabel the events. The states include, among other values, pointers to the grid neighbors. At each step of thecomputation, the state of each cell can change as prescribed by a transition function (also called program)applied to the previous states of the cell and its pointed-to neighbors. The initial state of the cells is theinput for the CA. All subsequent states are determined by the transition function.An example of a possible application of CA is a VLSI (very large scale integration) chip represented asa grid of cells connected by wires (chains of cells) of diﬀerent lengths. The propagation of signals along thewires is simulated by changing the state of the wire cells step by step. The clock interval can be set to thetime the signals propagate through the longest wire. This way the delays aﬀect the simulation implicitly.

An example: the Game of Life (GL).

GL is a plane grid of cells, each holds a 1-bit state (dead/alive)and pointers to the 8 adjacent cells. A cell remains dead or alive if the number i of its live neighbors is 2.It becomes (or stays) alive if i =3. In all other cases it dies (of overcrowding or loneliness) or stays dead.A simulation of a machine M by M is a correspondence between memory conﬁgurations of M and M which is preserved during the computation (may be with some time dilation). Such constructions showthat the computation of M on any input x can be performed by M as well. GL can simulate any CA (seea sketch of an ingenious proof in the last section of [Berlekamp, Conway, Guy 82]) in this formal sense:We ﬁx space and time periods a, b . Cells ( i, j ) of GL are mapped to cell ( ⌊ i/a ⌋ , ⌊ j/a ⌋ ) of CA M (com-pressing a × a blocks). We represent cell states of M by states of a × a blocks of GL . This correspondenceis preserved after any number t steps of M and bt steps of GL regardless of the starting conﬁguration. Turing Machines (TMs).

TM is a minimal CA . Its conﬁguration – tape – is an inﬁnite to the right chain of cells.Each state of a cell has a pointer to one of the cell’s two adjacent neighbors. No adjacent cells can bothpoint away from each other. Only the two cells pointing at each other are active , i.e. can change state.The cell that just turned its pointer is the TM’s moving head working on the tape symbol - its target.The input is an array of non-blank cells (only one is rightward) followed by blanks at the right.Another type of CA represents a TM A with several non-communicating heads. At most O (1) heads ﬁtin a cell. They can vanish, split, or merge only in the ﬁrst cell (which, thus, controls the number of activecells). The input x makes an unchangeable “ink” part of each cell’s state. The rest of the cell’s state is in“pencil” and can be changed by A . The computation halts when all heads drop oﬀ. The output A ( x ) isthe pencil part of the tape’s state. This model has convenient theoretical features. E.g. with linear (in T )number ( k p k T ) of state changes (volume) one can solve the Bounded Halting Problem H ( p, x, T ): ﬁndout whether the machine with a program p stops on an input x within volume T of computation (see 2.3). Problem:

Find a method to transform any given multi-head TM A into another one B such that the valueof the output of B ( x ) (as a binary integer) and the volumes of computation of A ( x ) and of B ( x ) are allequal within a constant factor (for all inputs x ). Hint: B -cells may have a ﬁeld to simulate A and maintain(in other ﬁelds) two binary counters h (with Θ(1) density of heads) for the number of heads of A and v for A ’s volume. Their least signiﬁcant digits are at the leftmost cell. h adds its most signiﬁcant digit to thesame position in v at each step of A . To move the carry 1 on v a head is borrowed from h . These 1-headsmove right in v till they empty their carry into its 0 digit. Then empty 0-heads move back to h in a separateﬁeld/track, possibly ﬁrst continuing right to ﬁnd a free slot in this return track. (The heads area in v extendsto k -th cell only by dropping the carry there, with frequency O (2 − k ). Then it shrinks to O (1) in O ( k ) stepssince heads enter it slower than they move away.) Borrowed or returned heads make low or high head-densityareas in h which shift left until absorbed at the leftmost cell. Fundamentals of Computing Leonid A. Levin

The memory conﬁguration of a

Pointer Machine (PM) , called pointer graph , is a ﬁnite directed labeledmultigraph. One node R is marked as root and has directed paths to all nodes. Nodes can see and changethe conﬁguration of their out-neighborhood of constant (2 suﬃces) depth. Edges ( pointers ) are labeled with colors from a ﬁnite alphabet common to all graphs handled by a given program. The pointers coming outof a node must have diﬀerent colors (which bounds the outdegree). Some colors are designated as working and not used in inputs/outputs. One of them is called active , as also are pointers carrying it and nodesseeing them. Active pointers must have inverses, form a tree to the root, and can be dropped only in leaves.All active nodes each step execute an identical program . At its ﬁrst pulling stage, node A acquirescopies of all pointers of its children using “composite” colors: e.g., for a two-pointer path ( A, B, C ) colored x, y , the new pointer (

A, C ) is colored xy , or an existing z -colored pointer ( A, C ) is recolored { z, xy } . A alsospawns a new node with pointers to and from it. Next, A transforms the colors of its set of pointers, dropsthe pointers left with composite colors, and vanishes if no pointers are left. Nodes with no path from the rootare forever invisible and considered dropped. The computation is initiated by inserting an active loop-edgeinto the root. When no active pointers remain, the graph, with all working colors dropped, is the output. Problem:

Design a PM transforming the input graph into the same one with two extra pointers fromeach node: to its parent in a BFS spanning tree and to the root. Hint: Nodes with no path to the root cannever be activated. but can be copied with pointers, copies connected to the root, the original input removed.PM can be parallel, PPM [Barzdin’, Kalnin’s 74] or sequential, SPM . SPM diﬀer in that only pointersto the root, their sources, and nodes that have pointers with inverses to these sources can be active.A Kolmogorov or Kolmogorov-Uspenskii

Machine (KM) [Kolmogorov, Uspenskii 58], is a specialcase of Pointer Machine [Schoenhage 80] with the restriction that all pointers have inverses. This implies thebounded in/out-degree of the graph which we further assume to be constant.

Fixed Connection

Machine (FCM) is a variant of the PKM with the restriction that pointers oncecreated cannot be removed, only re-colored. So when the memory limits are reached, the pointer structurefreezes, and the computation can be continued only by changing the colors of the pointers.PPM is the most powerful model we consider: it can simulate the others in the same space/time. E.g.,cellular automata make a simple special case of a PPM which restricts the Pointer Graph to be a grid.

Problem.

Design a machine of each model (TM, CA, KM, PPM) which determines if an input string x has a form ww , w ∈ { a, b } ∗ . Analyze time (depth) and space. KM/PPM takes input x in the form of colorsof edges in a chain of nodes, with root linked to both ends. The PPM nodes also have pointers to the root.Below are hints for TM,SPM,CA. The space is O ( k x k ) in all three cases. Turing and Pointer Machines.

TM ﬁrst ﬁnds the middle of ww by capitalizing the letters at both endsone by one. Then it compares letter by letter the two halves, lowering their case. The complexity is: T ( x ) = O ( k x k ). SPM acts similarly, except that the root keeps and updates the pointers to the borders betweenthe upper and lower case substrings. This allows constant time access to these borders. So, T ( x )= O ( k x k ). Cellular Automata.

The computation starts with the leftmost cell sending right two signals. Reachingthe end the ﬁrst signal turns back. The second signal propagates three times slower, so they meet in themiddle of ww and disappear. While alive, the second signal copies the input ﬁeld i of each cell into a specialﬁeld c . The c symbols will try to move right whenever the next cell’s c ﬁeld is blank. So the chain of thesesymbols alternating with blanks will start moving right from the middle of ww . Upon reaching the end theywill push the blanks out and pack themselves back into a copy of the left half of ww shifted right. When a c symbol does not have a blank at the right to move to, it compares itself with the i ﬁeld of the same cell. Ifthey diﬀer, a signal is generated which halts all activity and rejects x . If all comparisons are successful, thelast c generates the accepting signal. The depth is: T ( x ) = O ( k x k ). eonid A. Levin Fundamentals of Computing We have considered several models of computation. We will see now how the simplest of them – TuringMachine – can simulate all others: these powerful machines can compute no more functions than TM.

Church-Turing Thesis is a generalization of this conclusion: TMs can compute every function computablein any thinkable physical model of computation. This is not a math theorem because the notion of modelis not formally speciﬁed. But the long history of studying ways to design real and ideal computing devicesmakes it very convincing. Moreover, this Thesis has a stronger

Polynomial Time version which boundsthe volume of computation required by that TM simulation by a polynomial of the volume used by the othermodels. Both forms of the Thesis play a signiﬁcant role in foundations of Computer Science.

PKM Simulation of PPM.

For convenience, we assume all PPM nodes have pointers to root. PPMconﬁguration is represented in PKM with extra colors l, r, u used in a u -colored binary tree added to eachnode X so that all (unlimited in number) PPM pointers to X are reconnected to its leaves, and inverses,colored l, r , added to all pointers. The number of pointers increases at most 4 times. To simulate PPM, X gets a binary name formed by the l, r colors on its path through the root tree, and broadcasts it downits own tree. For pulling stage X extends its tree to double depth and merges (with combined colors) itsown pointers to nodes with identical names. Then X re-colors its pointers as PPM program requires andrebalances its tree. This simulation of a PPM step takes polylogarithmic parallel time. TM Simulation of PKM.

We assume the PKM keeps a constant degree and a roughly balanced roottree (to yield short node names as described above). TM tape reﬂects its conﬁguration as the list of allpointers sorted by source name, then by color. The TM’s transition table reﬂects the PKM program. Tosimulate PKM’s pulling stage TM creates a copy of each pointer and sorts copies by their sinks. Now eachpointer, located at source, has its copy near its sink. So both components of 2-pointer paths are nearby: thespecial double-colored pointers can be created and moved to their sources by resorting on the source names.The re-coloring stage is straightforward, as all relevant pointers having the same source are located together.Once the root has no active pointers, the Turing machine stops and its tape represents the PKM output. If aPPM computes a function f ( x ) in t ( x ) steps, using s ( x ) nodes, the simulating TM uses space S = O ( s log s ),( O (log s ) bits for each of O ( s ) pointers) and time T = O ( S ) t , as TM sorting takes quadratic time. Squaring matters !

TM cannot outperform Bubble Sort. Is its quadratic overhead a big deal? In a short time allsilicon gates on your PC run, say, X =10 ∼ . clock cycles combined. Silicon parameters double almost annually.Decades may bring micron-thin things that can sail sunlight in space in clouds of great computing and physical (lightbeam) power. Centuries may turn them into a Dyson Sphere enveloping the solar system. Still, the power of suchan ultimate computer is limited by the number of photons the Sun emits per second: Y ∼ . = X . Giga-years mayturn much of the known universe into a computer, but its might is still limited by its total entropy 2 . = Y . Faster PPM Simulations.

Parallel Bubble-Sort on CA or Merge-Sort on sequential FCM take nearlylinear time. Parallel FCM can do much better [Ofman 65]. It represents and updates pointer graphs as theabove TM. All steps are straightforward to do locally in parallel polylog time except sorting of pointers. Weneed to create a ﬁxed connection sorting network. Sophisticated networks sort arbitrary arrays of n integersin O (log n ) parallel steps. We need only a simpler polylog method. Merge-Sort splits an array of two or moreentries in two halves and sorts each recursively. Batcher-Merge combines two sorted lists in O (log n ) steps. Batcher Merge. A bitonic cycle is the combination of two sorted arrays (one may be shorter),connected by max-to-max and min-to-min entries. Entries in a contiguous half ( high-half ) of the cycleare ≥ than all entries in the other ( low ) half. Each half (with its ends connected) forms a bitonic cycle itself.A ﬂip of an array entry is one with the highest address bit ﬂipped. Its shift has the highest bit of itsaddress cycle-shifted to the end. Linking nodes in a 2 k -nodes array to their ﬂips and shifts forms a ShuﬄeExchange graph. We merge-sort two sorted arrays given as a bitonic cycle on such a graph as follows.Comparing each entry with its ﬂip (half-a-cycle away), and switching if wrongly ordered, ﬁts the high andlow halves into respectively the ﬁrst and last halves of the array by shifting the dislocated segment of each(thus rotating each cycle). This repeats for each half recursively (decrementing k via graph’s shift edges). Fundamentals of Computing Leonid A. Levin

The ﬁrst computers were hardware-programmable. To change the function computed, one had to reconnectthe wires or even build a new computer. John von Neumann suggested using Turing’s Universal Algorithm.The function computed can be then speciﬁed by just giving its description (program) as part of the inputrather than by changing the hardware. This was a radical idea, since in the classical mathematics universalfunctions do not exist (as we will see in Sec. 2.2).Let R be the class of all TM-computable functions: total (deﬁned for all inputs) and partial (which maydiverge). Surprisingly, there is a universal function u in R . For any Turing Machine M that computes f ∈ R in time T and space S , u uses a program m of length c listing the commands and initial head state of M .Then u ( mx ) simulates M ( x ) in time c T and space S + c . It operates in cycles, each simulating one step of M ( x ). After i steps of M ( x ), let s i be the head’s state, l i be the left from it part of the tape; r i be the rest ofthe tape. After i cycles u ( mx ) has the tape conﬁguration t i = l i ms i r i and looks up m to ﬁnd the commandcorresponding to the state s i and the ﬁrst symbol of r i . It modiﬁes t i accordingly. When M ( x ) halts, u ( mx )erases the (penciled) ms i from the tape and halts too. Universal Multi-head TM works similarly but canalso determine in time O ( t ( x )) whether it halts in t steps (given x, t ( x ) and an appropriate program). Problem.

Design a universal multi-head TM with a constant factor overhead.Hint: When heads split or merge in the ﬁrst cell, the room u needs for their programscreates sparse or dense content regions that propagate right (sparse faster).We now describe in detail a simpler but slower universal [Ikeno 58] TM U . It simulates any other TM M that uses only 0 , M lacks the blanksymbol that usually marks the end of input. Thus input needs to be given in some preﬁxless form, e.g. witha padding that encodes input length l = k x k as a binary string preceded by a string of 2 k l k zeros. In the cellscarrying this padding, two counters are initiated that monitor the distance to both ends of the used partof M ’s tape (initially the input). M ’s head moving on the tape pulls these counters along and keeps themupdated. When the right end of the used tape is reached, any subsequent characters are treated as blanks. U has 11 states + 6 symbols; its transition table is at the right.It shows the states and tape digits only when changed, exceptthat the prime is always shown. The head is on the tape: lowercase states look left, upper case – right. The external choice, halt,etc. commands are special states for M ; for U they are shownas A/B or =. U works in cycles, simulating one transition of M each. The tape is inﬁnite to the right (the leftward head in theleftmost cell halts). It consist of 0 / M ’s tape. The other segmentsdescribe one transition each: a command ( s, b ) → ( s ′ , b ′ , d ) for M to change state s to s ′ , tape bit b to b ′ and turn left or right. 1 0 * 1’ 0’ *’A f f e0B F F e1f,F b* a* F cD d’ – e’E = – ’ ’ ’ e’d ’ ’ ’ ’ Db ’ ’ ’ a’ Da ’ ’ ’ b’ F E’c ’ ’ = F E’e ’ ’ ’ B A A/BThe transition segments are sorted in order of ( s, b ) and never change, except for priming. Each transitionis represented as ∗ Sdb , where b is the bit to write, d the direction R =0 /L =1 to turn. S points to the nextstate represented as 1 k , if it is k segments to the left, or 0 k (if to the right). Each cycle starts with U ’s headin state F or f , located at the site of M ’s head. Primed are the digits of S in the prior command and alldigits to their left. An example of the conﬁguration: ∗ ′ ′ ′ ′ ′ ′ ∗ ′ ′ ′ ′ ′ ∗ ∗ . . . ∗

00 head 00 . . .U ﬁrst reads the bit of an M ’s cell changing the state from F or f to a/b , puts a * there, moves left tothe primed state segment S , ﬁnds from it the command segment and moves there. It does this by repeatedlypriming nearest unprimed * and 1s of S (or unpriming 0s) while alternating the states c/F or d/D . When S is exhausted, the target segment k S k + b stars away is reached. Then U reads (changing state from e to A or B ) the rightmost symbol b ′ of the command, copies it at the * in the M area, goes back, reads the nextsymbol d , returns to the just overwritten (and ﬁrst unprimed) cell of M area and turns left or right. As CA, M and U have in each cell three standard bits: present and previous pointer directions and a “content” bitto store M’s symbol. In addition U needs just one “trit” of its own! See its simulator at eonid A. Levin Fundamentals of Computing Universal and Complete Functions.

Notations: Let us choose a special mark and after its k -th occurrence, break any string x into Preﬁx k ( x ) andSuﬃx k ( x ). Let f + ( x ) be f (Preﬁx k ( x ) x ) and f − ( x ) be f (Suﬃx k ( x )). We say u k - simulates f iﬀ for some p =Preﬁx k ( p ) and all s , u ( ps ) = f ( s ). The preﬁx can be intuitively viewed as a program which simulatingfunction u applies to the suﬃx (input). We also consider a symmetric variant of relation “ k -simulate” whichmakes some proofs easier. Namely, u k - intersects f iﬀ u ( ps ) = f ( ps ) for some preﬁx k p and all s . E.g.,length preserving functions can intersect but cannot simulate one another.We call universal for a class F , any u which k -simulates all functions in F for a ﬁxed k . When F contains f − , f + for each f ∈ F , universality is equivalent to (or implies, if only f + ∈ F ) completeness : uk -intersects all f ∈ F . Indeed, u k -simulates f iﬀ it k -intersects f − ; u k -intersects f if it k -simulates f + . Problem:

Describe explicitly a function, complete for the class of all linear (e.g., 5 x or 23 x ) functions.A negation of a (partial or total) function f is the total predicate ¬ f which yields 1 iﬀ f ( x )=0 andyields 0 otherwise. Obviously, no closed under negation class of functions contains a complete one. So, thereis no universal function in the class of all (computable or not) predicates. This is the well known CantorTheorem that the set of all sets of strings (as well as the sets of all functions, reals etc.) is not countable. Goedel’s Theorem.

There is no complete function among the total computable ones, as this class is closed under negation.So the universal in R function u (and u = ( u mod 2)) has no total computable extensions.Formal proof systems are computable functions A ( P ) which check if P is an acceptable proof and outputthe proven statement. ⊢ s means s = A ( P ) for some P . A is rich iﬀ it allows computable translations s x of statements “ u ( x ) = 0,” provable whenever true, and refutable ( ⊢ ¬ s x ), whenever u ( x ) = 1. A is consistent iﬀ at most one of any such pair s x , ¬ s x is provable, and complete iﬀ at least one of themalways (even when u ( x ) diverges) is. Rich consistent and complete formal systems cannot exist, since theywould provide an obvious total extension u A of u (by exhaustive search for P to prove or refute s x ). Thisis the famous Goedel’s Theorem – one of the shocking surprises of the 20th century science. (Here A is anyextension of the formal Peano Arithmetic; we skip the details of its formalization and proof of richness.) Recursive Functions.

Another byproduct is that the Halting (of u ( x )) Problem would yield a totalextension of u and, thus, is not computable. This is the source of many other uncomputability results.Another source is an elegant Fixed Point

Theorem by S. Kleene: any total computable transformation A of programs (preﬁxes) maps some program into an equivalent one. Indeed, the complete/universal u ( ps )intersects computable u ( A ( p ) s ). This implies, e.g., Rice theorem: the only computable invariant (i.e. thesame on programs computing the same functions) property of programs is constant ( exercise ).Computable (partial and total) functions are also called recursive (due to an alternative deﬁnition).Their ranges (and, equivalently, domains) are called (recursively) enumerable or r.e. sets. An r.e. set withan r.e. complement is called recursive (as is its yes/no characteristic function) or decidable . A function isrecursive iﬀ its graph is r.e. An r.e. graph of a total function is recursive. Each inﬁnite r.e. set is the rangeof an injective total recursive function (“enumerating” it, hence the name r.e.).We can reduce membership problem of a set A to the one of a set B by ﬁnding a recursive function f s.t. x ∈ A ⇐⇒ f ( x ) ∈ B . Then A is called m- (or many-to-1- ) reducible to B . A more complex Turing reduction is given by an algorithm which, starting from input x , interacts with B by generating strings s and receiving answers to s ∈ ? B questions. Eventually it stops and tells if x ∈ A . R.e. sets (like HaltingProblem) to which all r.e. sets can be m-reduced are called r.e.-complete. One can show a set r.e.-complete(and, thus, undecidable) by reducing the Halting Problem to it. So Ju.Matijasevich proved r.e.-completenessof Diophantine Equations Problem: given a multivariate polynomial of degree 4 with integer coeﬃcients, ﬁndif it has integer roots. The above (and related) concepts and facts are broadly used in Theory of Algorithmsand should be learned from any standard text, e.g., [Rogers 67]. A closer look at this proof reveals another famous Goedel theorem: Consistency C of A (expressible in A as divergence ofthe search for contradictions) is itself an example of unprovable ¬ s x . Indeed, u intersects 1 − u A for some preﬁx a . C impliesthat u A extends u and, thus, u ( a ) , u A ( a ) both diverge. So, C ⇒ ¬ s a . This proof can be formalized in Peano Arithmetic, thus ⊢ C ⇒ ⊢ ¬ s a . But ⊢ ¬ s a implies u A ( a ) converges, so ⊢ C contradicts C : Consistency of A is provable in A if and only if false ! Fundamentals of Computing Leonid A. Levin

The t -restriction u t of u aborts and outputs 1 if u ( x ) does not halt within t ( x ) steps, i.e. u t computesthe t -Bounded Halting Problem ( t -BHP) . It remains complete for the closed under negation class offunctions computable in o ( t ( x )) steps. ( O ( k p k ) overhead is absorbed by o (1) and padding p .) So, u t is not inthe class, i.e. cannot be computed in time o ( t ( x )) [Tseitin 56]. (And neither can be any function agreeing with t -BHP on a dense (i.e. having strings with each preﬁx) subset.) E.g. 2 k x k -BHP requires exponential time.However for some trivial input programs the BHT can obviously be answered by a fast algorithm. Thefollowing theorem provides another function P f ( x ) (which can be made a predicate) for which there is onlya ﬁnite number of such trivial inputs. We state the theorem for the volume of computation of Multi-HeadTuring Machine. It can be reformulated in terms of time of Pointer Machine and space (or, with smalleraccuracy, time) of regular Turing Machine. Deﬁnition:

A function f ( x ) is constructible if it can be computed in volume V ( x ) = O ( f ( x )).Here are two examples: 2 k x k is constructible, as V ( x ) = O ( k x k log k x k ) ≪ k x k .Yet, 2 k x k + h ( x ), where h ( x ) is 0 or 1, depending on whether U ( x ) halts within 3 k x k steps, is not. Compression Theorem [Rabin 59].

For any constructible function f , there exists a function P f suchthat for all functions t , the following two statements are equivalent:1. There exists an algorithm A such that A ( x ) computes P f ( x ) in volume t ( x ) for all inputs x .2. t is constructible and f ( x ) = O ( t ( x )). Proof.

Let t -bounded Kolmogorov Complexity K t ( i | x ) of i given x be the length of the shortest pro-gram p for the Universal Multi-Head Turing Machine transforming x into i with < t volume of computation.Let P f ( x ) be the smallest i , with 2 K t ( i | x ) > log( f ( x ) | t ) for all t . P f is computed in volume f by generatingall i of low complexity, sorting them and taking the ﬁrst missing. It satisﬁes the Theorem, since computing i = P f ( x ) faster would violate the complexity bound deﬁning it. (Some extra eﬀorts can make P boolean.) Speed-up Theorem [Blum 67].

There exists a total computable predicate P such that for any algorithmcomputing P ( x ) in volume t ( x ), there exists another algorithm doing it in volume O (log t ( x )).Though stated here for exponential speed-up, this theorem remains true with log replaced by any computableunbounded monotone function. In other words, there is no even nearly optimal algorithm to compute P . The general case.

So, the complexity of some predicates P cannot be characterized by a single con-structible function f , as in Compression Theorem. However, the Compression Theorem remains true (withharder proof) if the requirement that f is constructible is dropped (replaced with being computable). In this form it is general enough so that every computable predicate (or function) P satisﬁes the statementof the theorem with an appropriate computable function f . There is no contradiction with Blum’s Speed-up,since the complexity f (not constructible itself) cannot be reached. See a review in [Seiferas, Meyer 95]. The proof stands if constructibility of f is weakened to being semi-constructible , i.e. one with an algorithm A ( n, x ) runningin volume O ( n ) and such that A ( n, x )= f ( x ) if n>f ( x ). The sets of programs t whose volumes (where ﬁnite) satisfy either(1) or (2) of the Theorem (for computable P, f ) are in Σ (i.e. deﬁned with 2 quantiﬁers). Both generate monotone classes ofconstructible functions closed under min( t , t ) /

2. Then any such class is shown to be the Ω( f ) for some semi-constructible f . eonid A. Levin Fundamentals of Computing In this section we consider a more interesting provably intractable problem: playing games with full informa-tion, two players and zero sum. We will see that even for some simple games there cannot be a much moreeﬃcient algorithm than exhaustive search through all possible conﬁgurations.The rules of an n -player game G are set by families f, v of information and value functions anda transition rule r . Each player i ∈ I at each step participates in transforming a conﬁguration (gameposition) x ∈ C into the new conﬁguration r ( x, m ) , m : I → M by choosing a move m i = m ( i ) based onlyon his knowledge f i ( x ) of x . The game proceeds until a terminal conﬁgurations t ∈ T ⊂ C is reached. Then v i ( t ) is the loss or gain of the i -th player. Our games will have zero sum P v i ( t ) = 0 and full information: f i ( x ) = x , r ( x, m ) = r ′ ( x, m a ( x ) ), where a ( x ) points to the active player. We consider binary, two-players,no-draw games, taking C ⊂ Z , M ⊂ Z , I = {± } , a ( x ) = sign ( x ), v i ( t ) = a ( t ) i , and | r ( x, m ) | < | x | .An example of such games is chess. Examples of games without full information are card games, whereonly a part f i ( x ) (player’s own hand) of the position x is known. Each player may have a strategy providinga move for each position. A strategy S is winning if it guarantees victory whatever the opponent does, evenif he knows S . We can extend v on T to V on all positions with a winning strategy for one side so that a ( x ) V ( x ) = sup m { a ( x ) V ( r ( x, m )) } . (sup {} taken as − Evaluating or solving a game, means computing V . This ability is close to the ability to ﬁnd a goodmove in a modiﬁed game. Indeed, modify a game G into G ′ by adding a preliminary stage to it.At this stage the player A oﬀers a starting position for G and her opponent B chooses which side to play.Then A may either start playing G or decrement the counter of unused positions and oﬀer another one.Obviously, B wins if he can determine the winning side of every position. If he cannot while A can, A wins.Also, any game can be modiﬁed into one with two moves: M ⊂ { , } by breaking a string move into severalbit-moves. (A position of the new game consists of a position x of the old one and a preﬁx y of a move.The active player keeps adding bits to y until m is complete and the next position generated by r ( x, m ).)Evaluating such games is obviously suﬃcient for choosing the right move. Theorem.

Each position of any full information game has a winning strategy for one side. (This theorem [Neumann, Morgenstern 44] fails for games with partial information: either playermay lose if his strategy is known to the adversary. E.g.: 1. Blackjack (21); 2. Each player picks a bit;their equality determines the winner.) The game can be solved by playing all strategies against each other.There are 2 n positions of length n , (2 n ) n = 2 n × n strategies and 2 n × n +1 pairs of them. For a 5-bit gamethat is 2 . The proof of this Theorem gives a much faster (but still exponential time!) strategy. Proof.

Make the graph of all ≤k x k -bit positions and moves; Set V = 0; reset V = v on T .Repeat until idle: If V ( x ) = 0, set V ( x ) = a ( x ) sup m { a ( x ) V ( r ( x, m )) } .The procedure stops with empty V − (0) since | r ( x, m ) | < | x | in our games keep decreasing.Games may be categorized by the diﬃculty to compute r . We will consider only r computable inlinear space O ( k x k ). Then, the 2 k x k possible moves can be computed in exponential time, say 2 k x k .The algorithm tries each move in each step. Thus, its total running time is 2 k x k +1 : extremely slow(2 for a 13-byte game) but still much faster than the previous (double exponential) algorithm. Problem: the Match Game.

Consider 3 boxes with 3 matches each: ! ! ! ! ! ! ! ! ! .The players alternate turns taking any positive number of matches from a single box. One cannot leave thetable empty. Use the above algorithm to evaluate all positions and list the evaluations after each its cycle.

Problem:

Modify the chess game by giving one side the right to make (if it chooses to) an extramove out of turn during the ﬁrst 10 moves. Prove that this side have a non-loosing strategy.0

Fundamentals of Computing Leonid A. Levin

A simple example of a full information game is

Linear Chess , played on a ﬁnite linear board. Each piecehas a 1-byte type, including loyalty to one of two sides: W (weak) or S (shy), gender

M/F and a 6-bit rank .All cells of the board are ﬁlled and all W’s are always on the left of all S’s. Changes occur only at the active border where W and S meet (and ﬁght). The winner of a ﬁght is determined by the following Gender Rules:1. If S and W are of the same sex, W (being weaker) loses.2. If S and W are of diﬀerent sexes, S gets confused and loses.The party of a winning piece A replaces the loser’s piece B by its own piece C. The choice of C isrestricted by the table of rules listing all allowed triples (ABC). We will see that this game cannot be solvedin a subexponential time. We ﬁrst prove that (see [Chandra, Kozen, Stockmeyer 81]) for an artiﬁcial game.Then we reduce this

Halting Game to Linear Chess showing that any fast algorithm to solve Linear Chess,could be used to solve Halting Game, thus requiring exponential time. For Exp-Time Completeness of regular(but n × n ) Chess, Go, Checkers see: [Fraenkel, Lichtenstein 81, Robson 83, 84]. Exptime Complete Halting Game.

We use a universal Turing Machine u (deﬁned as 1-pointer cellular automata) which halts only by its headrolling oﬀ of the tape’s left end, leaving a blank. Bounded Halting Problem BHP( x ) determines if u ( x ) stops(i.e. the leftmost tape cell points left) within 2 k x k steps. This cannot be done in o (2 k x k ) steps.We now convert u into the Halting Game.The players are: L claiming u ( x ) halts in time (and shouldhave winning strategy iﬀ this is true); His opponent is S . The board has four parts: the diagram, the input x to u , positiveintegers p (position) and t (time in the execution of u ( x )): p t A t +1 x B − B B +1 tp − p p +1The diagram shows the states A of cell p at time t +1 and B s , s ∈{ , ± } of cells p + s , at time t . A, B include the pointer direction; B may be replaced by “?”. Some board conﬁgurations are illegal: if (1) two of B s point away from each other, or (2) A diﬀers from the result prescribed by the transition rules for B s , or(3) t = 1, while ( B s ) = x p + s . (At t = 1, u ( x ) is just starting, so its tape has the input x at the left, the headin the initial state at the end with blanks leading oﬀ to the right.) Here are the Game Rules:

The game starts in the conﬁguration shown below. L moves ﬁrst, replacing the ?s with symbols thatclaim to reﬂect the state of cells p + s at step t of u ( x ). S in its move chooses s , copies B s into A , ﬁlls all B with ?s, adds s to p , and decrements t .Start: p = 0 t = 2 k x k ← input x ? ? ? L puts: a t +1 b − b b +1 t S puts: b s t ? ? ? t − L may lie (i.e ﬁll in “?” distorting the actual computation of u ( x )), as long as he is consistentwith the above “local” rules. All S can do is to check the two consecutive board conﬁgurations. He cannotrefer to past moves or to actual computation of u ( x ) as an evidence of L ’s violation. Strategy: If u ( x ) does indeed halt within 2 k x k steps, then the initial conﬁguration is true to the compu-tation of u ( x ). Then L has an obvious (though hard to compute) winning strategy: just tell truly (and thusalways consistently) what actually happens in the computation. S will lose when t =1 and cannot decreaseany more. If the initial conﬁguration is a lie, S can force L to lie all the way down to t = 1. How?If the upper box a of a legal conﬁguration is false then the lower boxes b s cannot all be true, since therules of u determine a uniquely from them. If S correctly points the false b s and brings it to the top on hismove, then L is forced to keep on lying. At time t =1 the lie is exposed: the conﬁguration doesn’t match theactual input string x , i.e. is illegal.Solving this game amounts to deciding correctness of the initial conﬁguration, i.e. u ( x ) halting in 2 k x k steps: impossible in time o (2 k x k ). This Halting Game is artiﬁcial, still has a BHP ﬂavor, though it does notrefer to exponents. We now reduce it to a nicer game (Linear Chess) to prove it exponentially hard, too. eonid A. Levin Fundamentals of Computing To reduce (see deﬁnition in sec. 2.2) Halting game to Linear Chess we introduce a few concepts.A non-deterministic Turing Machine (NTM) is a TM that sometimes oﬀers a (restricted) transitionchoice, made by a driver , a function (of the TM conﬁguration) of unrestricted complexity. A deterministic(ordinary) TM M accepts a string x if M ( x )=yes; an NTM M does if there exists a driver d s.t. M d ( x )=yes.NTM represent single player games – puzzles – with a simple transition rule, e.g., Rubik’s Cube.One can compute the winning strategy in exponential time by exhaustive search of all d . Home Work:

Prove all such games have P-time winning strategies, or show some have not.Will get you grade A for the course, $1,000,000 Award and a senior faculty rank at a school of your choice.The alternating

TM (ATM) is a variation of the NTM driven by two alternating drivers (players) l, r . A string is accepted if there is l such that for any r : M l,r ( x )= yes. Our games could be viewed asATM returning the result of the game in linear space but possibly exponential time, M prompts l and r alternatingly to choose their moves (in several steps if the move is speciﬁed by several bits) and computesthe resulting position, until a winner emerges. Accepted strings describe winning positions. Linear Chess Simulation of TM-Games.

The simulation ﬁrst represents the Halting Game as an ATMcomputation simulated by the Ikeno TM (2.1) (using the “A/B” command for players’ input). The UTM isviewed as an array of 1-pointer cellular automata: Weak cells as rightward, Shy leftward. Upon termination,the TM head is set to move to the end of the tape, eliminating all loser pieces.This is viewed as a game of , a variant of Linear Chess, where the table, not the “GenderRule,” determine the victorious piece, and not only the vanquished piece is replaced, but also the winningpiece may be “promoted to” another type of the same side. The types are states of Ikeno TM showing Loyalty(pointer direction) d ∈{ W, S } , gender g (= previous d ), and 6/6/6/5 ranks (trit t ∈{ , , ∗} with ′ bit p ).Each 1dC transition is simulated in several Linear Chess stages. Let L,R be the two pieces active in 1dC.In odd stages L (in even stages R) changes gender while turning pointer twice. The last stage turns pointeronly once and possibly changes gender. In the ﬁrst stage L appends its rank with R’s p bit. All other stagesreplace old 1dC rank with the new one. R appends its old t bit (only if t = ∗ ) to its new rank. Subsequentstages drop both old bits, marking L instead if it is the new 1dC head. Up to 4 more stages are used to exitany mismatch with 1dC new d, g bits. Space-Time Trade-oﬀ.

Deterministic linear space computations are games where any position has atmost one (and easily computable) move. We know no general superlinear lower bound or subexponentialupper bound for time required to determine their outcome. This is a big open problem.Recall that on a parallel machine: time is the number of steps untilthe last processor halts; space is the amount of memory used; vol-ume is the combined number of steps of all processors. “

Small ” willrefer to values bounded by a polynomial of the input length; “ large ”to exponential. Let us call computations narrow if either time or space are polynomial, and compact if both (and, thus, volume too)are. An open question:

Do all exponential volume algorithms (e.g.,one solving Linear Chess) allow an equivalent narrow computation? ✲ space ❄ time largetime,smallspace small time, large spacenarrow computations Alternatively, can every narrow computation be converted into a compact one?

This isequivalent to the existence of a P-time algorithm for solving any fast game, i.e. a game with a P-timetransition rule and a move counter limiting the number of moves to a polynomial. The sec. 3.1 algorithmcan be implemented in parallel P-time for such games. Converse also holds, similarly to the Halting Game.[Stockmeyer, Meyer 73] solve compact games in P-space: With M ⊂{ , } run depth-ﬁrst search on thetree of all games – sequences of moves. On exiting each node it is marked as the active player’s win ifsome move leads to a child so marked; else as his opponent’s. Children’s marks are then erased. Conversely,compact games can simulate any P-space algorithms. Player A declares the result of the space- k , time-2 k computation. If he lies, player B asks him to declare the memory state in the middle of that time interval,and so by a k-step binary search catches A’s lie on a mismatch of states at two adjacent times. This hassome ﬂavor of trade-oﬀs such as saving time at the expense of space in dynamic programming.Thus, fast games (i.e. compact alternating computations) correspond to narrow deterministic computa-tions; general games (i.e. narrow alternating computations) correspond to large deterministic ones.2 Fundamentals of Computing Leonid A. Levin

Consider a P-time function F . For convenience, assume k F ( x ) k = k x k , (often k F ( x ) k = k x k Θ(1) suﬃces).Inverting F means ﬁnding, for a given y , at least one x ∈ F − ( y ), i.e. such that F ( x ) = y .We may try all possible x for F ( x ) = y . Assume F runs in linear time on a Pointer Machine. What is thecost of inverting F ? The space used is k x k + k y k +space F ( x ) = O ( k x k ). But time is O ( k x k k x k ): absolutelyinfeasible. No method is currently proven much better in the worst case. And neither could we prove someinversion problems to require super-linear time. This is the sad present state of Computer Science! An Example: Factoring.

Let F ( x , x ) = x x be the product of integers. For simplicity, assume x , x are primes. A fast algorithm in sec. 5.1 determines if an integer is prime. If not, no factor is given, only itsexistence. To invert F means to factor F ( x ). The density of n -bit primes is ≈ / ( n ln 2). So, factoring byexhaustive search takes exponential time! In fact, even the best known algorithms for this ancient problemrun in time about 2 √ k y k , despite centuries of eﬀorts by most brilliant people. The task is now commonlybelieved infeasible and the security of many famous cryptographic schemes depends on this unproven faith. One-Way Functions: F : x → y are those easy to compute ( x y ) and hard to invert ( y x ) formost x . Even their existence is sort of a religious belief in Computer Theory. It is unproven, though manyfunctions seem to be one-way. Some functions, however, are proven to be one-way, IFF one-way functionsEXIST. Many theories and applications are based on this hypothetical existence. Search and NP Problems.

Let us compare the inversion problems with another type – the search problems speciﬁed by computable intime k x k O (1) relations P ( x, w ): given x , ﬁnd w s.t. P ( x, w ). Any inversion problem is a search problem andany search problem can be restated as an inversion problem. E.g., ﬁnding a Hamiltonian cycle C in a graph G , can be stated as inverting a f ( G, C ), which outputs G, . . . C is in fact a Hamiltonian cycle of G .Otherwise, f ( G, C ) = 0 . . . w (called witness ) exist, and (b) aconstructive problem: actually ﬁnd w . A time bound for solving one of these types of problems gives a similarbound for the other. Decision from construction is obvious. Conversely, each relation P can be extended to P ′ (( x, y ) , w ) = P ( x, w )&( w

Turing Machine (sec. 3.3). All three classes of languages– search, inversion and NP – coincide (NP ⇐⇒ search is straightforward).Interestingly, polynomial space bounded deterministic and non-deterministic TMs have equivalent power.It is easy to modify TM to have a unique accepting conﬁguration. Any acceptable string will be acceptedin time 2 s , where s is the space bound. Then we need to check A ( x, w, s, k ): whether the TM can be drivenfrom the conﬁguration x to w in time < k and space s . For this we need for every z , to check A ( x, z, s, k − A ( z, w, s, k − t k ≤ t k − + k z k + O (1). So, t k = O ( sk ) = O ( s ) [Savitch 70].Search problems are games with P-time transition rules and one move duration. A great hierarchy ofproblems results from allowing more moves and/or other complexity bounds for transition rules. eonid A. Levin Fundamentals of Computing We discussed the (equivalent) inversion, search, and NP types of problems. Nobody knows whether all suchproblems are solvable in P-time (i.e. belong to P). This question (called P=?NP) is probably the most famousone in Theoretical Computer Science. All such problems are solvable in exponential time but it is unknownwhether any better algorithms generally exist. For many problems the task of ﬁnding an eﬃcient algorithmmay seem hopeless, while similar or slightly modiﬁed problems have been solved. Examples:1. Linear Programming: Given integer n × m matrix A and vector b , ﬁnd a rational vector x with Ax < b .Note, if n and entries in A have ≤ k -bits and x exists then an O ( nk )-bit x exists, too.Solution: The Dantzig’s Simplex algorithm ﬁnds x quickly for many A .Some A , however, take exponential time. After long frustrating eﬀorts, a worst caseP-time Ellipsoid Algorithm was ﬁnally found in [Yudin and A.S. Nemirovsky 76].2. Primality test: Determine whether a given integer p has a factor?Solution: A bad (exponential time) way is to try all 2 k p k possible integer factors of p .More sophisticated algorithms, however, run fast (see section 5.1).3. Graph Isomorphism Problem: Are two given graphs G , G , isomorphic?I.e., can the vertices of G be re-numbered so that it becomes equal G ?Solution: Checking all n ! enumerations of vertices is impractical(for n = 100, this exceeds the number of atoms in the known Universe).[Luks 80] found an O ( n d ) steps algorithm where d is the degree. This is a P-time for d = O (1).4. Independent Edges (Matching):Find a given number of independent (i.e., not sharing nodes) edges in a given graph.Solution: Max ﬂow algorithm solves a bipartite graph case.The general case is solved with a more sophisticated algorithm by J. Edmonds.Many other problems have been battled for decades or centuries and no P-time solution has been found.Even modiﬁcations of the previous four examples have no known answers:1. Linear Programming: All known solutions produce rational x .No reasonable algorithm is known to ﬁnd integer x .2. Factoring: Given an integer, ﬁnd a factor. Can be done in about exponential time n √ n .Seems very hard: Centuries of quest for fast algorithm were unsuccessful.3. Sub-graph isomorphism: In a more general case of ﬁnding isomorphisms of a graphto a part of another, no P-time solution has been found, even for O (1)-degree graphs.4. Independent Nodes: Find k independent (i.e., not sharing edges) nodes in a given graph.No P-time solution is known.We learned the proofs that Linear Chess and some other games have exponential complexity. None ofthe above or any other search/inversion/NP problem, however, have been proven to require super-P-time.When, therefore, do we stop looking for an eﬃcient solution? NP-Completeness theory is an attempt to answer this question.See results by S.Cook, R.Karp, L.Levin, and others surveyed in [Garey, Johnson 79, Trakhtenbrot 84].A P-time function f reduces one NP-predicate p ( x ) to p ( x ) iﬀ p ( x ) = p ( f ( x )), for all x . p is NP-complete if all NP problems can be reduced to it. Thus, each NP-complete problem is at least as worst-casehard as all other NP problems. This may be a good reason to give up on fast algorithms for it. Any P-timealgorithm for one NP-complete problem would yield one for all other NP (or inversion, or search) problems.No such solution has been discovered yet and this is left as a homework (10 years deadline).Faced with an NP-complete problem we can sometimes restate it, ﬁnd a similar one which is easier(possibly with additional tools) but still gives the information we really want. We will do this in Sec. 5.1 forfactoring. Now we proceed with an example of NP-completeness.4

Fundamentals of Computing Leonid A. Levin

Tiling Problem.

Invert the function which, given a tiled square, outputs its ﬁrst rowand the list of tiles used. A tile is one of the 26 possible squares containing a Latin letterat each corner. Two tiles may be placed next to each other if the letters on the sharedside match. (See an example at the right.) We now reduce to Tiling any search problem:given x , ﬁnd w satisfying a P-time computable property P ( x, w ). a xm r x cr zm rn s r zs z Padding Argument.

First, we need to reduce it to some “standard” NP problem. An obvious candidateis the problem “Is there w : U ( v, w ) ?”, where U is the universal Turing Machine, simulating P ( x, w ) for v = px . But U does not run in P-time, so we must restrict U to u which stops within some P-time limit.How to make this ﬁxed degree limit suﬃcient to simulate any polynomial (even of higher degree) time P ?Let the TM u ( v, w ) for v =00 . . . px simulate k v k steps of U ( px, w )= P ( x, w ). If the padding of 0’s in v is suﬃciently long, u will have enough time to simulate P , even though u runs in quadratic time, while P ’stime limit may be, say, cube (of a shorter “un-padded” string). So the NP problem P ( x, w ) is reduced to u ( v, w ) by mapping instances x into f ( x ) = 0 . . . px = v , with k v k determined by the time limit for P .Notice that program p for P is ﬁxed. So, if some NP problem cannot be solved in P-time then neither canbe the problem ∃ ? w : u ( v, w ). Equivalently, if the latter IS solvable in P-time then so is any search problem.We do not know which of these alternatives is true. It remains to reduce the search problem u to Tiling. The Reduction.

We compute u ( v, w ) (where v = 00 . . . px ) by a TM represented as an array of 1-pointer cellular automata that runs for k v k steps and stops if w does NOT solve the relation P . Otherwiseit enters an inﬁnite loop. An instance x has a solution iﬀ u ( v, w ) runs forever for some w and v = 0 . . . px .Here is the space-time diagram of computation of u ( v, w ). We set n to u ’s time (and space) k v k . Each row in this table represents theconﬁguration of u in the respective moment of time. The solution w is ﬁlled in at the second step below a special symbol ”?”. If a table isﬁlled in wrongly, i.e. doesn’t reﬂect any actual computation, then itmust have four cells sharing a corner that couldn’t possibly appearin the computation of u on any input. ✲ space: n = k v k ❄ time v ?. . . ? v w T ... ... ... T n Proof.

As the input v and the guessed solution w are the same in both the right and the wrong tables,the ﬁrst 2 rows agree. The actual computation starts on the third row. Obviously, in the ﬁrst mismatchingrow a transition of some cell from the previous row is wrong. This is visible from the state in both rows ofthis cell and the cell it points to, resulting in an impossible combination of four cells sharing a corner.For a given x , the existence of w satisfying P ( x, w ) is equivalent to the existence ofa table with the prescribed ﬁrst row, no halting state, and permissible patterns of eachfour adjacent squares (cells). Converting the table into the Tiling Problem :The cells in the table are separated by “—” ; the tiles by “...”; Cut each cell into 4 partsby a vertical and a horizontal lines through its center and copy cell’s content in eachpart. Combine into a tile each four parts sharing a corner of 4 cells. If these cells arepermissible in the table, then so is the respective tile. u vv xSo, any P-time algorithm extending a given ﬁrst row to the whole table of matching tiles from a givenset could be used to solve any NP problem by converting it to Tiling as shown.

Problem:

Find a polynomial time algorithm for n × log n Tiling Problem. eonid A. Levin Fundamentals of Computing The factoring problem seems very hard. But to test if a number has factors turns out to be much easier thanto ﬁnd them. It also helps if we supply the computer with a coin-ﬂipping device. See: [Rabin 80, Miller 76,Solovay, Strassen 77]. We now consider a Monte Carlo algorithm, i.e. one that with high probability rejectsany composite number, but never a prime.

Residue Arithmetic. p | x means p divides x . x ≡ y (mod p ) means p | ( x − y ). y = ( x mod p ) denotes theresidue of x when divided by p , i.e. x ≡ y ∈ [0 , p − , p −

1] via shifting by an appropriate multiple of p . E.g., − x means p − x for residues mod p . We use ± x to mean either x or − x .The Euclidean Algorithm ﬁnds gcd( x, y ) – the greatest (and divisible by any other) common divisor of x and y : gcd( x,

0) = x ; gcd( x, y ) = gcd( y, ( x mod y )), for y >

0. By induction, g = gcd( x, y )= A ∗ x − B ∗ y ,where integers A =( g/x mod y ) and B =( g/y mod x ) are produced as a byproduct of Euclid’s Algorithm. Thisallows division ( mod p ) by any r coprime with p , (i.e. gcd( r, p )=1), and operations + , − , ∗ , / obey all usualarithmetical laws. We will need to compute ( x q mod p ) in polynomial time. We cannot do q> k q k multipli-cations. Instead we compute all numbers x i = ( x i − mod p ) = ( x i mod p ) , i < k q k . Then we represent q inbinary, i.e. as a sum of powers of 2 and multiply mod p the needed x i ’s. Fermat Test.

The Little Fermat Theorem for every prime p and x ∈ [1 , p −

1] says: x ( p − ≡ p ).Indeed, the sequence ( xi mod p ) is a permutation of 1 , . . . , p −

1. So, 1 ≡ ( Q i

For each y and prime p, x ≡ y (mod p ) has at most one pair of solutions ± x . Proof.

Let x, x ′ be two solutions: y ≡ x ≡ x ′ (mod p ). Then x − x ′ = ( x − x ′ )( x + x ′ ) ≡ p ).So, p divides ( x − x ′ )( x + x ′ ) and, if prime, must divide either ( x − x ′ ) or ( x + x ′ ).(Thus either ( x ≡ x ′ ) or ( x ≡ − x ′ ).) Otherwise p is composite, and gcd( p, x + x ′ ) actually gives its factor. Miller-Rabin Test T ( x, p ) completes the Fermat Test: it factors a composite p , given d that kills Z ∗ p (i.e. x d ≡ gcd( x, p ) d (mod p ) for all x ) and a random choice of x . For prime p , d = p − d =2 k q , with odd q . T sets x = ( x q mod p ), x i = ( x i − mod p ) = ( x i q mod p ), i ≤ k . If x k =1 thengcd( x, p ) =1 factors p (if d killed x , else Fermat test rejects p = d +1). If x =1, or one of x i is − T gives upfor this x . Otherwise x i / ∈{± } for some i x =(1+ p/a ) works for Fermat Test: (1+ p/a ) p − =1+( p/a )( p − p/a ) ( p − p − / . . . ≡ − p/a p ), since p | ( p/a ) . Otherwise p = ab, gcd( a, b )=1

6≡ ± p ).Now, T ( y, p ) succeeds with most y i , as it does with x i (or x ′ i ): the function y xy is 1-1 and T cannotfail with both y and xy . This test can be repeated for many randomly chosen y . Each time T fails, we aretwice more sure that p is prime. The probability of 300 failures on a composite p is < − , its inverseexceeds the number of atoms in the known Universe.6 Fundamentals of Computing Leonid A. Levin

Las-Vegas algorithms, unlike Monte-Carlo, never give wrong answers. Unlucky coin-ﬂips just make themrun longer than expected. Quick-Sort is a simple example. It is about as fast as deterministic sorters, but ispopular due to its simplicity. It sorts an array a [1 . . . n ] of n > pivot ,splitting the remaining array in two by comparing with the pivot, and calling itself recursively on each half.For easy reference, rename the array entries with their positions 1 , . . . , n in the sorted output (no eﬀecton the algorithm). Denote t ( i ) the (random) time i is chosen as a pivot. Then i will ever be compared with j iﬀ either t ( i ) or t ( j ) is the smallest among t ( i ) , . . . , t ( j ). This has 2 out of | j − i | + 1 chances. So, the expectednumber of comparisons is P i,j>i / (1+ j − i ) = − n + ( n +1) P nk =1 /k = 2 n (ln n − O (1)). Note, that theexpectation of the sum of variables is the sum of their expectations (not true, say, for product).The above Monte-Carlo and Las-Vegas algorithms require choosing strings at random with uniformdistribution. We mentally picture that as ﬂipping a coin. (Computers use pseudo-random generators rather than coins in hope, rarely supported by proofs, that their outputs have all the statistical propertiesof truly random coin ﬂips needed for the analysis of the algorithm.) Random Inputs to Deterministic Algorithms are analyzed similarly to algorithms that ﬂip coins them-selves and the two should not be confused. Consider an example: Someone is interested in knowing whetheror not certain graphs contain Hamiltonian Cycles. He oﬀers graphs and pays $100 if we show either that thegraph has or that it has not

Hamiltonian Cycles. Hamiltonian Cycle problem is NP-Complete, so it should bevery hard for some , but not necessarily for most graphs. In fact, if our patron chooses the graphs uniformly,a fast algorithm can earn us the $100 most of the time ! Let all graphs have n nodes and, say, d < ln n/ No Hamiltonian Cycles” and collect the $100, if the graph has an isolated node. Otherwise, pass onthat graph and the money. Now, how often do we get our $100. The probability that a given node A of thegraph is isolated is (1 − /n ) dn > (1 − O (1 /n )) / √ n . Thus, the probability that none of n nodes is isolated(and we lose our $100) is O ((1 − / √ n ) n ) = O (1) /e √ n and vanishes fast. Similar calculations can be madewhenever r = lim( d/ ln n ) <

1. If r >

1, other fast algorithms can actually ﬁnd a Hamiltonian Cycle.See: [Johnson 84, Karp 76, Gurevich 85]. See also [Levin Venkatesan 18] for a proof that another graphproblem is NP-complete even on average. How do this HC algorithm and the above primality test diﬀer? • The primality algorithm works for all instances. It tosses the coin itself and can repeat it for a morereliable answer. The HC algorithms only work for most instances (with isolated nodes or generic HC). • In the HC algorithms, we must trust the customer to follow the presumed random procedure.If he cheats and produces rare graphs often, the analysis breaks down.

Symmetry Breaking.

Randomness comes into Computer Science in many other ways besides those weconsidered. Here is a simple example: avoiding conﬂicts for shared resources.

Dining Philosophers.

They sit at a circular table. Between each pair is either a knife or a fork,alternating. The problem is, neighboring diners must share the utensils, cannot eat at the same time. Howcan the philosophers complete the dinner given that all of them must act in the same way without anycentral organizer? Trying to grab the knives and forks at once may turn them into ﬁghting philosophers.Instead they could each ﬂip a coin, and sit still if it comes up heads, otherwise try to grab the utensils.If two diners try to grab the same utensil, neither succeeds. If they repeat this procedure enough times,most likely each philosopher will eventually get both a knife and a fork without interference.We have no time to actually analyze this and many other scenaria, where randomness is crucial.Instead we will take a look into the concept of Randomness itself. eonid A. Levin Fundamentals of Computing Intuitively, a random sequence is one that has the same properties as a sequence of coin ﬂips. But thisdeﬁnition leaves the question, what are these properties? Kolmogorov resolved these problems with a newdeﬁnition of random sequences: those with no description noticeably shorter than their full length. See surveyand history in [Kolmogorov, V.A.Uspenskii 87, Li, Vitanyi 19].

Kolmogorov Complexity K A ( x | y ) of the string x given y is the length of the shortest program p whichlets algorithm A transform y into x : min { ( k p k ) : A ( p, y ) = x } . There exists a Universal Algorithm U suchthat, K U ( x ) ≤ K A ( x ) + O (1), for every algorithm A . This constant O (1) is bounded by the length of theprogram U needs to simulate A . We abbreviate K U ( x | y ) as K ( x | y ), or K ( x ) for empty y .An example: For A : x x , K A ( x ) = k x k , so K ( x ) < K A ( x ) + O (1) < k x k + O (1).Can we compute K ( x ) by trying all programs p, k p k < k x k + O (1) to ﬁnd the shortest one generating x ?This does not work because some programs diverge, and the halting problem is unsolvable.In fact, no algorithm can compute K or even any its lower bounds except O (1).Consider the Berry Paradox expressed in the phrase: “The smallest integer which cannotbe uniquely and clearly deﬁned by an English phrase of less than two hundred characters.”There are < English phrases of <

200 characters. So there must be integers not expressibleby such phrases and the smallest one among them. But isn’t it described by the above phrase?A similar argument proves that K is not computable. Suppose an algorithm L ( x ) = O (1) computes a lowerbound for K ( x ). We can use it to compute f ( n ) that ﬁnds x with n < L ( x ) ≤ K ( x ), but K ( x ) < K f ( x )+ O (1)and K f ( f ( n )) ≤ k n k , so n < K ( f ( n )) < k n k + O (1) = log O ( n ) ≪ n : a contradiction.So, K and its non-constant lower bounds are not computable.An important application of Kolmogorov Complexity measures the Mutual Information: I ( x : y ) = K ( x ) + K ( y ) − K ( x, y ). It has many uses which we cannot consider here. Deﬁciency of Randomness.

Some upper bounds of K ( x ) are close in some important cases. One such case is of x generated at random.Deﬁne its rarity for uniform on { , } n distribution as d ( x ) = n − K ( x | n ) ≥ − O (1).What is the probability of d ( x ) > i , for uniformly random n -bit x ? There are 2 n strings x of length n .If d ( x ) > i , then K ( x | n ) < n − i . There are < n − i programs of such length, generating < n − i strings.So, the probability of such strings is < n − i / n = 2 − i (regardless of n )! Even for n = 1 , , d ( x ) >

300 is absolutely negligible (provided x was indeed generated by fair coin ﬂips).Small rarity implies all other enumerable properties of random strings. Indeed, let such property “ x P ”have a negligible probability and S n be the number of n -bit strings violating P , so s n = log( S n ) is small.To generate x , we need only the algorithm enumerating S n and the s n -bit position of x in that enumeration.Then the rarity d ( x ) > n − ( s n + O (1)) is large. Each x violating P will thus also violate the “small rarity”requirement. In particular, the small rarity implies unpredictability of bits of random strings: A short al-gorithm with high prediction rate would assure large d ( x ). However, the randomness can only be refuted,cannot be conﬁrmed: we saw, K and its lower bounds are not computable. Rectiﬁcation of Distributions.

We rarely have a source of randomness with precisely known distribution.But there are very eﬃcient ways to convert “roghly” random sources into perfect ones. Assume, we have sucha sequence with weird unknown distribution. We only know that its long enough ( m bits) segments havemin-entropy > k + i , i.e. probability < / k + i , given all previous bits. (Without such m we would not knowa segment needed to extract even one not fully predictable bit.) No relation is required between n, m, i, k ,but useful are small m, i, k and huge n = o (2 k /i ). We can fold X into an n × m matrix. We also need a small m × i matrix Z , independent of X and really uniformly random (or random Toeplitz, i.e. with restriction Z a +1 ,b +1 = Z a,b ). Then the n × i product XZ has uniform with accuracy O ( p ni/ k ) distribution. Thisfollows from [Goldreich, Levin 89], which uses earlier ideas of U. and V. Vazirani.8 Fundamentals of Computing Leonid A. Levin

The above deﬁnition of randomness is very robust, if not practical. True random generators are rarely used incomputing. The problem is not that making a true random generator is impossible: we just saw eﬃcient waysto perfect the distributions of biased random sources. The reason lies in many extra beneﬁts provided bypseudorandom generators. E.g., when experimenting with, debugging, or using a program one often needs torepeat the exact same sequence. With a truly random generator, one actually has to record all its outcomes:long and costly. The alternative is to generate pseudo-random strings from a short seed. Such methodswere justiﬁed in [Blum, Micali 84, Yao 82]:First, take any one-way permutation F n ( x ) (see sec. 5.5) with a hard-core bit (see below) B p ( x ) whichis easy to compute from x, p , but infeasible to guess from p, n, F n ( x ) with any noticeable correlation.Then take a random seed of three k -bit parts x , p, n and Repeat: ( S i ← B p ( x i ); x i +1 ← F n ( x i ); i ← i +1).We will see how distinguishing outputs S of this generator from strings of coin ﬂips would imply theability to invert F . This is infeasible if F is one-way. But if P=NP (a famous open problem), no one-way F ,and no pseudorandom generators could exist.By Kolmogorov’s standards, pseudo-random strings are not random: let G be the generator; s be theseed, G ( s ) = S , and k S k ≫ k = k s k . Then K ( S ) ≤ O (1) + k ≪ k S k , thus violating Kolmogorov’s deﬁnition.We can distinguish between truly random and pseudo-random strings by simply trying all short seeds.However this takes time exponential in the seed length. Realistically, pseudo-random strings will be as goodas a truly random ones if they can’t be distinguished in feasible time. Such generators we call perfect . Theorem: [Yao 82] Let G ( s ) = S ∈ { , } n run in time t G . Let a probabilistic algorithm A in expected(over internal coin ﬂips) time t A accept G ( s ) and truly random strings with diﬀerent by d probabilities.Then, for random i , one can use A to guess S i from S i +1 , S i +2 , . . . in time t A + t G with correlation d/O ( n ). Proof.

Let p i be the probability that A accepts S = G ( s ) modiﬁed by replacing its ﬁrst i digitswith truly random bits. Then p is the probability of accepting G ( s ) and must diﬀer by d fromthe probability p n of accepting random string. Then p i − − p i = d/n , for randomly chosen i .Let P and P be the probabilities of accepting r x and r x for x = S i +1 , S i +2 , . . . , and random ( i − r .Then ( P + P ) / p i , while P S i = P +( P − P ) S i averages to p i − and( P − P )( S i − /

2) to p i − − p i = d/n . So, P − P has the stated correlation with S i . If the above generator was not perfect, one could guess S i from the sequence S i +1 , S i +2 , . . . with a polynomial (in 1 / k s k ) correlation. But, S i +1 , S i +2 . . . can be produced from p, n, x i +1 .So, one could guess B p ( x i ) from p, n, F ( x i ) with correlation d/n , which cannot be done for hard-core B . Hard Core.

The key to constructing a pseudorandom generator is ﬁnding a hard core for a one-way F .The following B is hard-core for any one-way F , e.g., for Rabin’s OWF in sec. 5.5.[Knuth 97] has more details and references.Let B p ( x ) = ( x · p ) = ( P i x i p i mod 2). [Goldreich, Levin 89] converts any method g of guessing B p ( x )from p, n, F ( x ) with correlation ε into an algorithm of ﬁnding x , i.e. inverting F (slower ε times than g ). Proof. (Simpliﬁed with some ideas of Charles Rackoﬀ.) Take k = k x k = k y k , j = log(2 k/ε ), v i = 0 i k − i .Let B p ( x ) = ( x · p ) and b ( x, p ) = ( − B p ( x ) . Assume, for y = F n ( x ), g ( y, p, w ) ∈ {± } guesses B p ( x ) withcorrelation P p −k p k b ( x, p ) g p > ε , where g p abbreviates g ( y, p, w ), since w, y are ﬁxed throughout the proof.( − ( x · p ) g p averaged over > k/ε random pairwise independent p deviates from its mean (over all p ) by <ε (and so is >

0) with probability > − / k . The same for ( − ( x · [ p + v i ]) g p + v i = ( − ( x · p ) g p + v i ( − x i .Take a random k × j binary matrix P . The vectors P r , r ∈{ , } j \ { j } are pairwise independent. So, fora fraction ≥ − / k of P , sign( P r ( − xP r g P r + v i ) = ( − x i . We could thus ﬁnd x i for all i with probability > / z = xP . But z is short: we can try all its 2 j possible values and check y = F n ( x ) for each !So the inverter, for a random P and all i, r , computes G i ( r ) = g P r + v i . It uses Fast Fourier on G i tocompute h i ( z ) = P r b ( z, r ) G i ( r ). The sign of h i ( z ) is the i -th bit for the z -th member of output list. eonid A. Levin Fundamentals of Computing Rabin’s One-way Function.

Pick random prime numbers p, q, k p k = k q k with two last bits =1, i.e. withodd ( p − q − /

4. Then n = pq is called a Blum number. Its length should make factoring infeasible.Let Q n = ( Z ∗ n ) be the set of squares, i.e. quadratic residues (all residues are assumed (mod n )). Lemma.

Let n = pq be a Blum number, F : x x ∈ Q n . Then (1) F is a permutation on Q n and (2) The ability to invert F on random x is equivalent to that of factoring n . Proof. (1) t =( p − q − / u =( t +1) / x = F ( z ). Both p − q − t .So, by Fermat’s little theorem, both p , q (and, thus n ) divide x t − ≡ z t −

1. Then F ( x ) u ≡ x u = xx t ≡ x .(2) The above y u inverts F . Conversely, let F ( A ( y )) = y for a fraction ε of y ∈ Q n .Each y ∈ Q n has x, x ′ = ± x with F ( x )= F ( x ′ )= y , both with equal chance to be chosen at random.If F ( x ) generates y while A ( y ) = x ′ the Square Root Test (5.1) has both x, x ′ for factoring n. Such one-way permutations, called “trap-door,” have many applications; we look at cryptography below.Picking random primes is easy: they have density 1 /O ( k p k ). Indeed, one can see that (cid:0) nn (cid:1) is divisible byevery prime p ∈ [ n, n ] but by no prime p ∈ [ n, n ] or prime power p i > n . So, (log (cid:0) nn (cid:1) ) / log n = 2 n/ log n − O (1)is an upper bound on the number of primes in [ n, n ] and a lower bound on that in [1 , n ] (and in [3 n, n ]as a simple calculation shows). And fast VLSI exist to multiply long numbers and check primality. Public Key Encryption.

A perfect way to encrypt a message m is to add it mod 2 bit by bit to a randomstring S of the same length k . The resulting encryption m ⊕ S has the same uniform probability distribution,no matter what m is. So it is useless for the adversary who wants to learn something about m , withoutknowing S . A disadvantage is that the communicating parties must share a secret S as large as all messagesto be exchanged, combined. Public Key

Cryptosystems use two keys. One key is needed to encrypt themessages and may be completely disclosed to the public. The decryption key must still be kept secret, butneed not be sent to the encrypting party. The same keys may be used repeatedly for many messages.Such cryptosystem can be obtained [Blum, Goldwasser 82] by replacing the above random S by pseudo-random S i = ( s i · x ); s i +1 = ( s i mod n ). Here a Blum number n = pq is chosen by the Decryptor and ispublic, but p, q are kept secret. The Encryptor chooses x ∈ Z k n k , s ∈ Z n at random and sends x, s k , m ⊕ S .Assuming factoring is intractable for the adversary, S should be indistinguishable from random strings (evenwith known x, s k ). Then this scheme is as secure as if S were random. The Decryptor knows p, q and cancompute u, t (see above) and v = ( u k − mod t ). So, he can ﬁnd s = ( s vk mod n ), and then S and m .Another use of the intractability of factoring is digital signatures [Rivest, Shamir, Adleman 78, Rabin 79].Strings x can be released as authorizations of y = ( x mod n ). Verifying x , is easy but the ability of forgingit for generic y is equivalent to that of factoring n . Go On!

You noticed that most of our burning questions are still open. Take them on!Start with reading recent results (FOCS/STOC is a good source). See where you can improve them.Start writing, ﬁrst notes just for your friends, then the real papers. Here is a little writing advice:A well written paper has clear components: skeleton, muscles, etc.The skeleton is an acyclic digraph of basic deﬁnitions and statements, with cross-references.The meat consists of proofs (muscles) each separately veriﬁable by competent graduate students having toread no other parts but statements and deﬁnitions cited. Intuitive comments, examples and other comfortitems are fat and skin: a lack or excess will not make the paper pretty. Proper scholarly references constituteclothing, no paper should ever appear in public without! Trains of thought which led to the discovery areblood and guts: keep them hidden. Metaphors for other vital parts, like open problems, I skip out of modesty.

Writing Contributions.

Section 1 was originally prepared by Elena Temin, Yong Gao and Imre Kifor (BU),others by Berkeley students: 2.3 by Mark Sullivan, 3.1 by Eric Herrmann and Elena Eliashberg, 3.2 by Wayne Fentonand Peter Van Roy, 3.3 by Carl Ludewig, Sean Flynn, and Francois Dumas, 4.1 by Jeﬀ Makaiwi, Brian Jones andCarl Ludewig, 4.2 by David Leech and Peter Van Roy, 4.3 by Johnny and Siu-Ling Chan, 5.2 by Deborah Kordon,5.3 by Carl Ludewig, 5.4 by Sean Flynn, Francois Dumas, Eric Herrmann, 5.5 by Brian Jones. Fundamentals of Computing Leonid A. Levin [Levin 91] L.Levin. Fundamentals of Computing: a Cheat-List.

SIGACT News; Education Forum.

Special 100-th issue, 27(3):89-110, 1996. Errata: ibid. 28(2):80. Earlier version: ibid. 22(1), 1991.[Kleinberg, Tardos 06] Jon Kleinberg, Eva Tardos.

Algorithm design.

The Art of Computer Programming.

Vol. 1-3. Addison-Wesley, 3d ed., 1997.New to 3d ed. Sec.3.5.F of v.2 is also on pp. 10, 29-36 of [Feller 68] William Feller.

An Introduction to Probability Theory and Its Applications.

Wiley & Sons, 1968.[Lang 93] S.Lang.

Algebra.

Theory of Recursive Functions and Eﬀective Computability.

McGraw-Hill, 1967.[] References for section 1:[Barzdin’, Kalnin’s 74] Ja.M. Barzdin’, Ja.Ja. Kalnin’s. A Universal Automaton with Variable Structure.

Automatic Control and Computing Sciences.

Winning Ways.

Sec.25. 1982.[Kolmogorov, Uspenskii 58] A.N. Kolmogorov, V.A. Uspenskii. On the Deﬁnition of an Algorithm.

UspekhiMat. Nauk

SIAM J. on Computing

Trans. of the Moscow Math. Soc. , pp.200-215, 1965.[] Section 2:[Blum 67] M. Blum. A machine-independent theory of the complexity of recursive functions.

JACM

14, 1967.[Davis 65] M. Davis, ed.

The Undecidable.

Hewlett, N.Y. Raven Press, 1965.(The reprints of the original papers of K.Goedel, A.Turing, A.Church, E.Post and others).[Ikeno 58] Shinichi Ikeno. A 6-symbol 10-state Universal Turing Machine.

Proceedings, Institute of Electrical Communications,

Tokyo, 1958.[Seiferas, Meyer 95] Joel I. Seiferas, Albert R. Meyer. Characterization of Realizable Space Complexities.

Annals of Pure and Applied Logic

Third Con-vention of Sci.Soc.

Israel, 1959, 1-2. Abst.: Bull. of the Research Council of Israel, 8F:69-70, 1959.[Tseitin 56] G.S. Tseitin. Talk: seminar on math. logic, Moscow university, 11/14, 11/21, 1956.Also pp. 44-45 in: S.A. Yanovskaya, Math. Logic and Foundations of Math.,

Math. in the USSR for 40 Years,

Theory of Games and Economic Behavior .Princeton Univ. Press, 1944.[Stockmeyer, Meyer 73] L.Stockmeyer, A.Meyer. Word problems requiring exponential time.

STOC -1973[Chandra, Kozen, Stockmeyer 81] Ashok K. Chandra, Dexter C. Kozen, Larry J. Stockmeyer. Alternation.

J. ACM , 28(1):114-133, 1981.[Robson 83, 84] J.M. Robson. N by N checkers is EXPTIME-complete.

SIAM J. Comput

Proc. 1983 IFIP World Computer Congress , p. 413-417.[Fraenkel, Lichtenstein 81] A.S. Fraenkel, D. Lichtenstein. Computing a perfect strategy for n × n chessrequires time exponential in n . J. Combin. Theory (Ser. A) 31:199-214. ICALP-1981. eonid A. Levin Fundamentals of Computing

J.Comput. Syst. Sci.

Economica i Mat. Metody

MatEcon

FOCS -1980.[Garey, Johnson 79] M.R.Garey, D.S.Johnson.

Computers and Intractability.

W.H.Freeman & Co. 1979.[Trakhtenbrot 84] B.A.Trakhtenbrot. A survey of Russian approaches to

Perebor (brute-force search) algo-rithms.

Annals of the History of Computing,

J. Number Theory , 12: 128-138, 1980.[Miller 76] G.L.Miller. Riemann’s Hypothesis and tests for Primality.

J. Comp. Sys. Sci.

SIComp

Communicationof the ACM , 29(2):98-109, 1986.[Johnson 84] David S. Johnson. The NP-Completeness Column.

J. of Algorithms

Algorithms andComplexity. (J.F.Traub, ed.) pp. 1-19. Academic Press, NY 1976.[Gurevich 85] Y. Gurevich, Average Case Complexity.

Internat. Symp. on Information Theory, IEEE,

Combinatorics, Probability, and Computing , 27(5):808-828, 2018.https://arxiv.org/abs/cs/0112001[Kolmogorov, V.A.Uspenskii 87] A.N.Kolmogorov, V.A.Uspenskii. Algorithms and Randomness.

TheoriaVeroyatnostey i ee Primeneniya = Theory of Probability and its Applications , 3(32):389-412, 1987.[Li, Vitanyi 19] M. Li, P.M.B. Vitanyi.

Introduction to Kolmogorov Complexity and its Applications.

SpringerVerlag, New York, 2019.[Blum, Micali 84] M.Blum, S.Micali. How to generate Cryptographically Strong Sequences.

SICOMP , 13,1984.[Yao 82] A. C. Yao. Theory and Applications of Trapdoor Functions.

FOCS -1982.[Goldreich, Levin 89] O.Goldreich, L.Levin. A Hard-Core Predicate for all One-Way Functions.

STOC -1989.[Rivest, Shamir, Adleman 78] R.Rivest, A.Shamir, L.Adleman. A Method for Obtaining Digital Signatureand Public-Key Cryptosystems.

Comm. ACM , 21:120-126, 1978.[Blum, Goldwasser 82] M. Blum, S. Goldwasser. An Eﬃcient Probabilistic Encryption Scheme Hiding AllPartial Information.

Crypto -1982.[Rabin 79] M. Rabin.