The Hardy space from an engineer's perspective

Nicola Arcozzi, Richard Rochberg

September 25, 2020

Abstract

We give an overview of parts of the theory of Hardy spaces from the viewpoint of signals and systems theory. There are books on this topic, which dates back to Bode, Nyquist, and Wiener, and which eventually led to the development of $H^\infty$ optimal control. Our modest goal here is to give a beginner's dictionary for mathematicians and engineers who know little of either systems or $H^2$ spaces.

The theory of Hardy spaces is a nice example of the "unreasonable effectiveness of mathematics" in providing a conceptual and computational framework for the applied sciences. The theory itself lives comfortably in pure mathematics. It had its inception in Privalov's study of the boundary behavior of bounded holomorphic functions, some years before Hardy defined the spaces which go under his name. For many years the Hardy spaces $H^p$ and the operators acting on them were studied in great depth, and an elegant and profound theory was developed.

A notable breakthrough was C. Fefferman's discovery, in 1971, that the dual of the Hardy space $H^1$ is the space $BMO$ of functions having bounded mean oscillation.
This result contained the definitive solution of the problem of characterizing the symbols for which the corresponding Hankel operator is bounded on $H^2$, developing a line of investigation in which Nehari had been a primary figure. One of the unexpected features of Fefferman's result is that $BMO$ had been defined earlier by Fritz John, and developed by him and Louis Nirenberg, in the distant realm of elasticity theory ("the unreasonable effectiveness of mathematics" of the applied sort in providing tools for the pure ones).

While the pure mathematicians were developing the theory of the Hardy spaces, engineers found out that they were a very useful tool in signal processing, then in linear control theory. The basic idea is that signals and systems can be extended, in frequency space, to holomorphic functions, whose poles and zeros provide crucial information. This was the beginning of $H^\infty$ control theory. The use of frequency methods was pioneered by Bode, Black, and Nyquist at Bell Labs in the 1930s. Soon after, Wiener entered the picture, designing optimal filtering. Helton, Francis, and many others developed the contemporary theory and applications between the 1970s and the 1990s.

Our goal here is to provide an overview of some rather classical parts of Hardy space theory, highlighting the interpretation in terms of signals and systems. We hope this helps the pure mathematician, especially the one who is new to the topic, to develop an intuition for it. Partial as they are, intuitions are a necessary part of understanding. On the other side, we aim at convincing the engineer who eventually reads these notes that there are interesting things in Hardy theory to be learned, interpreted, and used.

The frontier between these theories is so vast that we do not even try to make a list of what we are not covering. For the topics we do cover we will not give specific references to the literature.
We do however include at the end a list of some of the many books and surveys in the area, with the hope they will help the interested reader who wants to learn more. We restrict to signals in discrete time. The case of continuous time is not much different, except for technical headaches. We do not even mention the matrix valued case; that is, what we say concerns SISO (single input/single output) systems, not SIMO or MIMO ones.

The Hardy space theory functions as a model for those studying holomorphic function spaces, and often the first questions asked when studying a different function space are "do things work here as in the Hardy space?" In the final section we discuss that question and others for closely related function spaces, including the Dirichlet space.

We will work all along with complex valued signals in discrete time, i.e. $\varphi:\mathbb{Z}\to\mathbb{C}$, the space of which is denoted by $\ell(\mathbb{Z})$. It will soon be clear that the complex field is best suited for dealing with linear systems, and real valued signals can be treated, with some care, as a special case. In doing preliminary calculations we consider signals $\varphi$ with finite support, $\varphi(n)=0$ for $|n|$ large, and write $\varphi\in\ell_c(\mathbb{Z})$. A single input/single output system (SISO) is simply a map $T:\ell(\mathbb{Z})\to\ell(\mathbb{Z})$, defined on some subset of allowable signals.

Some properties a system is often required to satisfy are the following.

• Linearity: $T(a\varphi+b\psi)=aT(\varphi)+bT(\psi)$, in which case we write $T(\varphi)=T\varphi$;

• Time (or shift) invariance: let $\tau\varphi(n)=\varphi(n-1)$ be the forward shift by one unit of time; then $T(\tau\varphi)=\tau(T(\varphi))$;

• Causality: if $\varphi(n)=\psi(n)$ for all $n\le m$, then $T(\varphi)(m)=T(\psi)(m)$;

• $p$-Stability: for a linear system, it can be phrased as
$$|||T|||_{B(\ell^p)}=\sup_{\varphi\ne 0}\frac{\|T\varphi\|_{\ell^p}}{\|\varphi\|_{\ell^p}}<\infty,\qquad\text{where}\qquad \|\varphi\|_{\ell^p}=\begin{cases}\sup_n|\varphi(n)| & \text{if } p=\infty,\\ \left(\sum_n|\varphi(n)|^p\right)^{1/p} & \text{if } 1\le p<\infty,\end{cases}$$
is a measure of the size of the signal, the choices $p=1,2,\infty$ being the most important in applications.

The meaning of time invariance is clear: the system works the same way at all times; if the input $\varphi$ is delayed by one time unit, $\tau\varphi$, then the output $T(\varphi)$ is delayed by one unit of time. Causality means that the output $T(\varphi)(m)$ at time $m$ only depends on inputs up to time $m$, not on future information. In other words, the time scale for input and output is the same: if we process a signal in its entirety, as is done for instance when denoising an old musical record, causality is not an issue; but if we denoise a broadcast in real time, then causality is an obvious requirement.

Stability is a requirement of systems (bounds on energy, on size, ...), or, often, a law of nature, if the system describes a phenomenon. The assumption of linearity simplifies the mathematics and is a very good approximation to many systems of interest. We will not consider the nonlinear theory here.

It is an easy and instructive exercise using the definitions to show that a linear, time invariant system is causal if and only if $\varphi(n)=0$ for negative $n$ implies $T\varphi(n)=0$ for negative $n$.
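The properties listed above can be checked by hand on toy examples. The following is a minimal sketch, not part of the original notes: finitely supported signals are stored as Python dictionaries `{n: value}`, and the two systems `average` and `advance` are hypothetical examples chosen only for illustration.

```python
from collections import defaultdict

def clean(phi):
    """Drop (numerically) zero entries so the support is explicit."""
    return {n: v for n, v in phi.items() if abs(v) > 1e-12}

def shift(phi):
    """Forward shift: (tau phi)(n) = phi(n - 1)."""
    return {n + 1: v for n, v in phi.items()}

def combine(a, phi, b, psi):
    """Linear combination a*phi + b*psi of finitely supported signals."""
    out = defaultdict(complex)
    for n, v in phi.items():
        out[n] += a * v
    for n, v in psi.items():
        out[n] += b * v
    return clean(out)

def average(phi):
    """Hypothetical causal system: (T phi)(n) = (phi(n) + phi(n-1)) / 2."""
    return combine(0.5, phi, 0.5, shift(phi))

def advance(phi):
    """Hypothetical non-causal system: (V phi)(n) = phi(n + 1)."""
    return {n - 1: v for n, v in phi.items()}

def close(phi, psi):
    """Compare two signals entry by entry, treating missing entries as 0."""
    keys = set(phi) | set(psi)
    return all(abs(phi.get(n, 0) - psi.get(n, 0)) < 1e-9 for n in keys)

phi = {0: 1.0, 1: 2.0 + 1j, 3: -1.0}   # supported in nonnegative time
psi = {-1: 0.5, 2: 4.0}

# Linearity: T(a phi + b psi) = a T(phi) + b T(psi)
linear_ok = close(average(combine(2, phi, 3j, psi)),
                  combine(2, average(phi), 3j, average(psi)))

# Time invariance: T(tau phi) = tau(T phi)
shift_ok = close(average(shift(phi)), shift(average(phi)))

# Causality test from the exercise: input supported on n >= 0
# must give an output supported on n >= 0.
average_causal = min(average(phi)) >= 0
advance_causal = min(advance(phi)) >= 0
```

Here `average` passes all three checks, while `advance` is linear and time invariant but moves the value at time 0 to time -1, so it fails the causality test.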
We will denote by $\ell(\mathbb{N})$ the subspace of those $\varphi$ in $\ell(\mathbb{Z})$ for which $\varphi(n)=0$ for negative $n$, and we set $\ell_c(\mathbb{N})=\ell_c(\mathbb{Z})\cap\ell(\mathbb{N})$. Causality can then be rephrased as saying that $T:\ell(\mathbb{N})\to\ell(\mathbb{N})$.

The characterization of linear, shift invariant systems acting on $\ell_c(\mathbb{Z})$ is purely algebraic, as is that of the subclass of causal ones. We recall that the convolution of $\varphi,\psi:\mathbb{Z}\to\mathbb{C}$ is $\varphi*\psi:\mathbb{Z}\to\mathbb{C}$,
$$\varphi*\psi(m)=\sum_n\varphi(m-n)\psi(n)=\psi*\varphi(m),$$
whenever the sum is defined (e.g. if $\varphi$ or $\psi$ belongs to $\ell_c(\mathbb{Z})$).

Theorem 1

Let $T$ be a linear system defined on $\ell_c(\mathbb{Z})$. Then, $T$ is shift invariant if and only if there is a function $k:\mathbb{Z}\to\mathbb{C}$ such that
$$T\varphi=k*\varphi.$$
Moreover $k$, the unit impulse response, is uniquely determined by $k=T\delta_0$, where
$$\delta_m(n)=\begin{cases}1 & \text{if } n=m,\\ 0 & \text{if } n\ne m.\end{cases}$$
The system is also causal if and only if $k(n)=0$ for $n<0$.

Let $\tau_m\varphi(n)=\varphi(n-m)=\tau^{\circ m}\varphi(n)$, $m=\sigma|m|\in\mathbb{Z}$ with $\sigma=\pm1$, where $f^{\circ m}=f^{\sigma}\circ\dots\circ f^{\sigma}$, $|m|$ times. In particular, $\tau_m\delta_n=\delta_{n+m}$. Then, using linearity in the second equality and time invariance of $T$ in the third,
$$T\varphi(n)=T\Big(\sum_m\varphi(m)\delta_m\Big)(n)=\sum_m\varphi(m)\,T(\tau_m\delta_0)(n)=\sum_m\varphi(m)\,\tau_mT(\delta_0)(n)=\sum_m\varphi(m)\,T(\delta_0)(n-m)=\varphi*T(\delta_0)(n).$$
That the system $\varphi\mapsto k*\varphi$ is time invariant is easy to check. If $T$ is also causal, then $k(m)=T\delta_0(m)=0$ for all $m<0$, since $\delta_0(m)=0$ for negative $m$.

In the causal case, the action of $T$ on $\varphi\in\ell(\mathbb{N})$ is a finite sum:
$$k*\varphi(m)=\sum_{n=0}^{m}k(m-n)\varphi(n).$$
Although the algebraic analysis is straightforward, the analytic details are subtle. The problem lies in establishing stability. We consider here the case $p=2$, which will take us to the Hardy spaces, but we first mention $p=\infty$, leading to Wiener's algebra.

For a linear system (operator) $T:X\to Y$ between two Banach function spaces $X$ and $Y$ we write
$$|||T|||_{B(X,Y)}=\sup_{v\in X,\,v\ne0}\frac{\|Tv\|_Y}{\|v\|_X},$$
and we shorten $B(X,X)=B(X)$.

Theorem 2

A linear, time invariant system is $\infty$-stable if and only if $k\in\ell^1(\mathbb{Z})$, in which case $|||T|||_{B(\ell^\infty(\mathbb{Z}))}=\|k\|_{\ell^1}$.

The elementary estimate $|k*\varphi(n)|\le\|k\|_{\ell^1}\cdot\|\varphi\|_{\ell^\infty}$ gives us $|||T|||_{B(\ell^\infty(\mathbb{Z}))}\le\|k\|_{\ell^1}$. In the other direction, set
$$\varphi(n)=\frac{\overline{k(-n)}}{|k(-n)|}\,\chi(n:\ k(-n)\ne0)$$
to have $k*\varphi(0)=\|k\|_{\ell^1}$ and $\|\varphi\|_{\ell^\infty}=1$.

We leave it to the reader to show that in the causal case, $k\in\ell^1(\mathbb{N})$, we could consider an extremal sequence $\varphi_m\in\ell^\infty(\mathbb{N})$ to show that
$$\sup_{\varphi\in\ell^\infty(\mathbb{N})}\frac{\|k*\varphi\|_{\ell^\infty(\mathbb{N})}}{\|\varphi\|_{\ell^\infty(\mathbb{N})}}=\|k\|_{\ell^1(\mathbb{N})},$$
i.e. the $\infty$-norm of a causal system can be estimated by considering signals in positive time.

The space $\ell^1(\mathbb{Z})$ with the multiplication given by convolution is a Banach algebra. Using Fourier series, the algebra is isomorphic to the Banach algebra of continuous functions on the circle which have absolutely convergent Fourier series, now with multiplication given by the pointwise product of functions. Both versions are called the Wiener algebra.

The case of 2-stability is richer.

Theorem 3

We have $|||T|||_{B(\ell^2(\mathbb{Z}))}\le\|k\|_{\ell^1}$, with equality if $k\ge0$. In the causal case, we have
$$|||T|||_{B(\ell^2(\mathbb{Z}))}=\sup_{\varphi\in\ell^2(\mathbb{N})}\frac{\|T\varphi\|_{\ell^2(\mathbb{N})}}{\|\varphi\|_{\ell^2(\mathbb{N})}}.$$
However there are systems, even stable ones, for which $\|k\|_{\ell^1}=\infty$.

The estimate $|||T|||_{B(\ell^2(\mathbb{Z}))}\le\|k\|_{\ell^1}$ follows from an easy instance of Young's inequality,
$$\|k*\varphi\|_{\ell^p}\le\|k\|_{\ell^1}\cdot\|\varphi\|_{\ell^p},$$
which holds for $1\le p\le\infty$. If $T$ is causal, to compute its norm we can just test on $\varphi\in\ell^2(\mathbb{N})$; this will be easily proved using holomorphic functions. Using holomorphic theory, examples with $|||T|||_{B(\ell^2(\mathbb{N}))}<\infty$ and $\|k\|_{\ell^1}=\infty$ will naturally come to mind. Using that approach we will find necessary and sufficient conditions on $k$ for $T$ to be stable.

A reasonable problem is designing a causal system $T$ that is as close as possible to a given non causal system $V$: $V$ is what we would like to do, while $T$ is what we can do remaining in the causal class. A quantitative way to state the problem is the following. For given $V$ with $|||V|||_{B(\ell^2(\mathbb{Z}))}<\infty$, we want to find a causal $T$ which achieves
$$\min_{T\ \text{causal}}\ \sup_{\varphi\in\ell^2(\mathbb{N})}\frac{\|V\varphi-T\varphi\|_{\ell^2}}{\|\varphi\|_{\ell^2}}.$$
We will see later that the problem has a solution within Nehari's theory of Hankel operators, which will be sketched below.

Another important problem is having the complete library of time-invariant features of signals; that is, those features which remain unchanged if the signal is anticipated or delayed. One such quality is the frequency spectrum, which we will define more rigorously below.

Each feature might be identified with the set

$H\subseteq\ell^2$ of the functions $\varphi$ having that feature. The time invariance of the feature can be meant in a strong sense (bi-invariance):
$$\varphi\in H\iff\tau\varphi\in H,$$
or in a weaker sense ([forward] invariance):
$$\varphi\in H\implies\tau\varphi\in H,$$
in which a signal might acquire a feature it did not possess before. This is especially meaningful in the causal case, where the only bi-invariant (linear) features are trivial: all or none.

As we are dealing with linear theory, we will assume that $H$ is a closed, linear subspace of $\ell^2$, and that $H\ne0,\ell^2$ is not trivial. We will say in this case that $H$ is a bi-invariant, resp. invariant, subspace of $\ell^2$.

In this section we review the $L^2$ Fourier theory on $\mathbb{Z}$, which might be read as Fourier series upside-down. The first motivation comes from invariant subspaces. Suppose $\varphi\ne0$ is an eigenfunction of the shift, $\tau\varphi=\lambda\varphi$ (with, by necessity, $\lambda\ne0$). Then, $\operatorname{span}\{\varphi\}$ is a 1-dimensional bi-invariant subspace, provided that $\varphi\in\ell^2$.

A little calculation gives
$$\varphi(n)=\lambda^{-1}\tau\varphi(n)=\lambda^{-1}\varphi(n-1)=\lambda^{-2}\tau\varphi(n-1)=\dots=\lambda^{-n}\varphi(0),$$
a formula which holds for negative $n$ as well. After normalizing $\varphi(0)=1$, we see that (i) $\varphi\notin\ell^2(\mathbb{Z})$, and (ii) $\varphi$ is bounded if and only if $\lambda=e^{it}$ for some $t\in(0,2\pi]=\mathbb{T}$, in which case $\varphi(n)=e_t(n)=e^{-int}$. It is natural to assign to the signal $e_t$ the period $2\pi/t\ge1$, a time interval which is a fortiori at least as large as the gap between successive integers, and then a frequency $\omega=t/2\pi$.

To each signal $\varphi\in\ell^2$ assign its Fourier transform $\hat\varphi(e^{it})=\sum_n\varphi(n)e^{int}$, a function in $L^2=L^2(\mathbb{T},d\theta/2\pi)$ with $\|\varphi\|_{\ell^2}=\|\hat\varphi\|_{L^2}$. Then,
$$\|\varphi\|_{\ell^2}^2=\int_{\mathbb{T}}|\hat\varphi(e^{it})|^2\,\frac{dt}{2\pi},$$
$$\varphi(n)=\frac{1}{2\pi}\int_{\mathbb{T}}\hat\varphi(e^{it})e^{-int}\,dt,$$
$$(\varphi*\psi)^{\wedge}(t)=\hat\varphi(t)\hat\psi(t).$$
This is all we need from Fourier theory.
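The three Fourier relations above can be sanity-checked numerically for finitely supported signals. This is a sketch under stated assumptions, not part of the original notes: the signals are random and supported on $n=0,\dots,L-1$, the helper name `fourier` is ours, and the integrals over $\mathbb{T}$ are replaced by averages over `M` equispaced points, which is exact for trigonometric polynomials of degree smaller than `M`. The sign convention $\hat\varphi(e^{it})=\sum_n\varphi(n)e^{int}$ of the text is used, which is the opposite of the one in `numpy.fft.fft`.

```python
import numpy as np

def fourier(phi, t):
    """phi_hat(e^{it}) = sum_n phi(n) e^{int}, phi supported on n = 0..len(phi)-1."""
    n = np.arange(len(phi))
    return np.exp(1j * np.outer(t, n)) @ phi

rng = np.random.default_rng(0)
phi = rng.normal(size=5) + 1j * rng.normal(size=5)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)

M = 64                                   # number of sample points on the circle
t = 2 * np.pi * np.arange(M) / M

phi_hat = fourier(phi, t)
psi_hat = fourier(psi, t)

# Plancherel: ||phi||^2 in l^2 equals the mean of |phi_hat|^2 over the circle.
energy_time = np.sum(np.abs(phi) ** 2)
energy_freq = np.mean(np.abs(phi_hat) ** 2)

# Inversion: phi(n) = (1/2pi) * integral of phi_hat(e^{it}) e^{-int} dt.
phi_back = np.array([np.mean(phi_hat * np.exp(-1j * n * t))
                     for n in range(len(phi))])

# Convolution theorem: (phi * psi)^ = phi_hat * psi_hat pointwise.
conv_hat = fourier(np.convolve(phi, psi), t)
```

In this sketch `energy_time` and `energy_freq` agree up to rounding, `phi_back` reproduces `phi`, and `conv_hat` matches `phi_hat * psi_hat`.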

From these relations, it is easy to characterize time invariant operators on $\ell^2(\mathbb{Z})$.

Theorem 4 The time-invariant system $T\varphi=k*\varphi$ is 2-stable if and only if $\hat k=b\in L^\infty(\mathbb{T})$. Moreover,
$$|||T|||_{B(\ell^2(\mathbb{Z}))}=\sup_{h\ne0}\frac{\|bh\|_{L^2(\mathbb{T})}}{\|h\|_{L^2(\mathbb{T})}}.$$

Denote by $M_b:h\mapsto bh$ the operator of multiplication by $b$. Then, $|||T|||_{B(\ell^2(\mathbb{Z}))}=|||M_b|||_{B(L^2(\mathbb{T}))}$, where the latter refers to the norm as a bounded operator on $L^2(\mathbb{T})$.

The proof is easy. First, $k=T\delta_0$ is a priori in $\ell^2$, hence $b$ is in $L^2$, and
$$\sum_n|k*\varphi(n)|^2=\frac{1}{2\pi}\int_{\mathbb{T}}|(k*\varphi)^{\wedge}(t)|^2\,dt=\frac{1}{2\pi}\int_{\mathbb{T}}|b(t)\hat\varphi(t)|^2\,dt\le\|b\|_{L^\infty}^2\,\frac{1}{2\pi}\int_{\mathbb{T}}|\hat\varphi(t)|^2\,dt=\|b\|_{L^\infty}^2\|\varphi\|_{\ell^2}^2,$$
hence $|||T|||_{B(\ell^2(\mathbb{Z}))}\le\|b\|_{L^\infty(\mathbb{T})}$, and choosing $\hat\varphi(t)$ supported where $|b(t)|$ is close to its supremum it is easy to show that $|||T|||_{B(\ell^2(\mathbb{Z}))}\ge\|b\|_{L^\infty(\mathbb{T})}-\epsilon$ for all positive $\epsilon$. The function $b=\hat k$ is the transfer function of the system $T\varphi=k*\varphi$.

Similarly simple is the characterization of the bi-invariant subspaces: the invariant features are the sets of frequencies. First, on the frequency side we look for subspaces $\hat H$ of $L^2(\mathbb{T})$ such that $S\hat H=\hat H$, where $Sh(t)=e^{it}h(t)$ is the shift on the frequency side. We still call them "invariant subspaces for the shift".

Theorem 5 $M$ is a closed doubly invariant subspace of $L^2=L^2(\mathbb{T})$ if and only if $M=\eta L^2$ for some $\eta$ which is the characteristic function of some measurable $E\subset\mathbb{T}$.

That $\eta L^2$ is doubly invariant is straightforward. Conversely, suppose $M$ is a closed doubly invariant subspace. Let $P$ be the orthogonal projection of $L^2$ onto $M$ and let $\eta=P(1)$. Let $\gamma(t)=e^{it}$. By definition of the projection, $1-\eta\perp M$, hence $1-\eta\perp\eta\gamma^n$ for all $n\in\mathbb{Z}$:
$$0=\langle1-\eta,\eta\gamma^n\rangle=\frac{1}{2\pi}\int_{\mathbb{T}}(\bar\eta-|\eta|^2)\,\overline{\gamma^n}\,dt,$$
so all the Fourier coefficients of $\bar\eta-|\eta|^2$ are zero.
Hence $\bar\eta=|\eta|^2$ a.e., so $\eta$ is real valued with $\eta=\eta^2$; that is, $\eta$ is the characteristic function of some set. Hence $N=\eta L^2$ is an invariant subspace contained in $M$. If $\lambda\in M\ominus N$, then $\lambda$ is orthogonal to $\eta L^2$ and hence, by computing Fourier coefficients, $\lambda\bar\eta$ is identically zero. Also
$$1-\eta\perp M\supseteq\{\gamma^n\lambda\}_{n\in\mathbb{Z}},$$
so, computing Fourier coefficients, we find that $(1-\bar\eta)\lambda$ is identically zero. Combining these two shows that $\lambda$ is the zero function, hence $M=N$, and the theorem is proved.

Clearly, two sets identify the same subspace if and only if their symmetric difference has zero measure. The Boolean structure of the Borel $\sigma$-algebra $\mathcal{B}$ makes the set of the bi-invariant subspaces a lattice which is isomorphic to $\mathcal{B}$.

We state the characterization of the invariant subspaces of $L^2(\mathbb{T})$, and sketch its proof.

Theorem 6

The invariant, non-bi-invariant, subspaces of $L^2(\mathbb{T})$ have the form $\psi H^2(\mathbb{D})$, where $\psi$ is measurable and $|\psi(e^{it})|=1$ a.e. The function $\psi$ is unique up to a multiplicative, unimodular constant.

How do we extract $\psi$ from $K$? For a given invariant subspace $K$ such that $SK\subsetneq K$, let $\psi\ne0$ be in $K\ominus SK\subseteq K\ominus S^nK$. Then,
$$\frac{1}{2\pi}\int_{\mathbb{T}}|\psi(e^{it})|^2e^{int}\,dt=\langle e^{int}\psi,\psi\rangle_{L^2(\mathbb{T})}=\langle S^n\psi,\psi\rangle_{L^2(\mathbb{T})}=0$$
for $n\ge1$. Similarly $\int_{\mathbb{T}}|\psi(e^{it})|^2e^{int}\,dt=0$ for $n\le-1$, and so $|\psi|$ is a constant, which can be normalized to $|\psi|=1$.

The reader who is familiar with the spectral theorem can view some of these results as a special instance of it. The shift is a normal operator, $\tau^*\tau=\tau^{-1}\tau=I=\tau\tau^{-1}$ (this implies, more, that $\tau$ is a unitary operator on $\ell^2(\mathbb{Z})$). Its spectrum is $\sigma(\tau)=\mathbb{T}$, and the shift can be identified with the identity map $z\mapsto z$ on $\mathbb{T}$. The measurable calculus for $\tau$ identifies each bounded and Borel measurable $b$ on $\mathbb{T}$ with the operator $b(\tau)$ on $\ell^2(\mathbb{Z})$; $\sigma(b(\tau))=\operatorname{ess\text{-}range}(b)$, and $\|b\|_{L^\infty}=|||b(\tau)|||$, the operator norm of $b(\tau)$. The bi-invariant subspaces of $\tau$ correspond to measurable subsets of the spectrum.

For $\varphi\in\ell^2(\mathbb{N})$, define its $Z$-transform $Z\varphi$ to be
$$Z\varphi(z)=\sum_{n=0}^{\infty}\varphi(n)z^n.$$
The series converges in the unit disc $\mathbb{D}=\{z:|z|<1\}$:
$$\Big|\sum_{n=M+1}^{N}\varphi(n)z^n\Big|^2\le\sum_{n=M+1}^{N}|\varphi(n)|^2\cdot\sum_{n=M+1}^{N}|z|^{2n}\le\sum_{n=M+1}^{N}|\varphi(n)|^2\cdot\frac{|z|^{2M+2}}{1-|z|^2},$$
which tends to zero uniformly for $|z|\le r<1$. In holomorphic control theory the $Z$-transform is usually defined as $Z\varphi(z)=\sum_{n=0}^{\infty}\varphi(n)z^{-n}$, and the exterior of the unit disc plays the role which in these notes is played by the unit disc. What we are doing is extending the notion of "frequency" from $\mathbb{T}$ to $\mathbb{D}\cup\mathbb{T}$, and the use of the notation $\hat\varphi(z)=Z\varphi(z)$ is justified.

The old $\hat\varphi(e^{it})$ can be recovered as the $L^2$ limit of $e^{it}\mapsto\hat\varphi(re^{it})$ as $r\to1$:
$$\frac{1}{2\pi}\int_{\mathbb{T}}|\hat\varphi(e^{it})-\hat\varphi(re^{it})|^2\,dt=\sum_{n=0}^{\infty}|\varphi(n)|^2(1-r^n)^2\to0\quad\text{as } r\to1.$$

The Hardy space $H^2(\mathbb{D})$ is the image of $\ell^2(\mathbb{N})$ under the $Z$-transform. Alternatively, it can be defined as the space of the functions $f$ which are holomorphic in $\mathbb{D}$, for which
$$\|f\|_{H^2}^2=\sup_{r<1}\frac{1}{2\pi}\int_{\mathbb{T}}|f(re^{it})|^2\,dt=\lim_{r\to1}\frac{1}{2\pi}\int_{\mathbb{T}}|f(re^{it})|^2\,dt<\infty.$$
Or, it can be characterized as the space of those $f^*(e^{it})$ in $L^2(\mathbb{T})$, $f^*(e^{it})=\sum_{n=-\infty}^{+\infty}\varphi(n)e^{int}$, for which $\varphi(n)=0$ for all negative $n$ and $\{\varphi(n)\}\in\ell^2(\mathbb{Z})$, that is, $\{\varphi(n)\}\in\ell^2(\mathbb{N})$. The function $f^*:\mathbb{T}\to\mathbb{C}$ is the boundary function of $f(z)=f(re^{it})=\sum_{n=0}^{+\infty}\varphi(n)r^ne^{int}$, which we identify with $f$, $f=f^*$.

On the frequency side we have the points of $\mathbb{D}$, and the value of functions in $H^2$ can be computed at those points, and not just a.e. In fact, it can be computed in a rather quantitative way:
$$f(z)=\sum_{n=0}^{\infty}a_nz^n=\Big\langle\sum_{n=0}^{\infty}a_nw^n,\sum_{n=0}^{\infty}\bar z^nw^n\Big\rangle_{H^2}=\Big\langle f(w),\frac{1}{1-\bar zw}\Big\rangle_{H^2}=\langle f,k_z\rangle_{H^2},$$
where $k(w,z)=k_z(w)=\frac{1}{1-\bar zw}$, $k:\mathbb{D}\times\mathbb{D}\to\mathbb{C}$, is the reproducing kernel of $H^2$.

The theory of Hilbert function spaces with a reproducing kernel (RKHS) is old, and it had its inception in work of Bergman and Aronszajn in the early '40s. Much of what is written in these notes can be proved, or posed as a problem, for general RKHS's. We will see instances of that in the final section.
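The reproducing property $f(z)=\langle f,k_z\rangle_{H^2}$ can be checked numerically in two equivalent ways: on the coefficient side, where the $H^2$ inner product is the $\ell^2(\mathbb{N})$ pairing of Taylor coefficients, and on the boundary, where it is an average over the circle. The sketch below is only an illustration: the polynomial `f`, the point `z`, and the number of sample points `M` are arbitrary choices of ours, and the boundary integral is approximated by an equispaced average (the aliasing error is of order $|z|^M$).

```python
import numpy as np

# Taylor coefficients of a polynomial f(w) = sum_n a_n w^n (arbitrary example).
a = np.array([1.0, -2.0 + 1j, 0.5, 3j])
z = 0.3 - 0.4j                      # a point of the unit disc, |z| = 0.5

def f(w):
    """Evaluate f(w); np.polyval expects the highest degree first."""
    return np.polyval(a[::-1], w)

def kernel(w, z):
    """Reproducing kernel of H^2: k_z(w) = 1 / (1 - conj(z) w)."""
    return 1.0 / (1.0 - np.conj(z) * w)

# Coefficient side: k_z has Taylor coefficients conj(z)^n, and the inner
# product is linear in the first slot, conjugate linear in the second.
n = np.arange(len(a))
coeff_pairing = np.sum(a * np.conj(np.conj(z) ** n))

# Boundary side: <f, k_z> = (1/2pi) * integral of f(e^{it}) conj(k_z(e^{it})) dt.
M = 256
t = 2 * np.pi * np.arange(M) / M
w = np.exp(1j * t)
boundary_pairing = np.mean(f(w) * np.conj(kernel(w, z)))

value = f(z)
```

Both `coeff_pairing` and `boundary_pairing` agree with `value` up to rounding, which is the statement that point evaluation at `z` is represented by `k_z`.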

To deal with causal systems, we need $H^\infty(\mathbb{D})$, the space of the bounded analytic functions on the unit disc.

Theorem 7

The causal, time-invariant, linear, 2-stable systems $T$ are those having the form
$$(T\varphi)^{\wedge}(z)=b(z)\hat\varphi(z),$$
with $b$ in $H^\infty$. Moreover,
$$|||T|||_{B(H^2)}:=\sup_{h\ne0}\frac{\|bh\|_{H^2(\mathbb{T})}}{\|h\|_{H^2(\mathbb{T})}}=\|b\|_{H^\infty}.$$

Using the maximum principle, it is easy to see that if the transfer function $b$ is given by the boundary values of a function in $H^\infty$, which we continue to call $b$, then $\sup\frac{\|bh\|_{H^2(\mathbb{T})}}{\|h\|_{H^2(\mathbb{T})}}\le\|b\|_{H^\infty}$. In the other direction, let $M_b:H^2\to H^2$ be the multiplication operator $f\mapsto bf$, and let $M_b^*$ be its adjoint. Then, using the reproducing property of $k_z$,
$$M_b^*k_z(w)=\langle M_b^*k_z,k_w\rangle=\langle k_z,M_bk_w\rangle=\overline{\langle M_bk_w,k_z\rangle}=\overline{\langle bk_w,k_z\rangle}=\overline{b(z)k_w(z)}=\overline{b(z)}\,k_z(w),$$
i.e. $M_b^*k_z=\overline{b(z)}\,k_z$: the kernel functions are eigenvectors of the adjoint of the multiplication operator, having the conjugates of the values of $b$ as eigenvalues. This fact holds for general RKHS and we will encounter it again. We use it now to show the opposite inequality in the theorem above:
$$\sup_{h\ne0}\frac{\|bh\|_{H^2(\mathbb{T})}}{\|h\|_{H^2(\mathbb{T})}}=\|M_b\|_{B(H^2)}=\|M_b^*\|_{B(H^2)}\ge\sup_{z\in\mathbb{D}}\frac{\|M_b^*k_z\|_{H^2(\mathbb{T})}}{\|k_z\|_{H^2(\mathbb{T})}}=\sup_{z\in\mathbb{D}}|b(z)|=\|b\|_{H^\infty}.$$

Hidden behind this rather straightforward proof there is a curious fact. There are $f_\epsilon$ in $H^2$ such that
$$(\|b\|_{H^\infty}-\epsilon)^2\,\frac{1}{2\pi}\int_{\mathbb{T}}|f_\epsilon(e^{it})|^2\,dt\le\frac{1}{2\pi}\int_{\mathbb{T}}|b(e^{it})f_\epsilon(e^{it})|^2\,dt,$$
that is, $|f_\epsilon(e^{it})|$ is rather concentrated on the set where $|b(e^{it})|$ is largest. It is an interesting exercise to show that the functions $f_\epsilon$ can be chosen among kernel functions.
(Hint: use the nonintegrability of $t\mapsto\frac{1}{|1-e^{it}|^2}$.)

The theorem above applies to causal systems having input $\varphi$ in $\ell^2(\mathbb{N})$:
$$T_b\varphi(n)=\sum_{m=0}^{n}\check b(n-m)\varphi(m),$$
where $\check b(n)$ is the $n$th coefficient in the series expansion of $b$ with center at 0. The same conclusion applies to $T_b$ having input in the larger space $\ell^2(\mathbb{Z})$. Passing to the frequency side,
$$\sup_{\psi\in L^2(\mathbb{T})}\frac{\|b\psi\|_{L^2(\mathbb{T})}}{\|\psi\|_{L^2(\mathbb{T})}}=\sup_{f\in H^2}\frac{\|bf\|_{H^2}}{\|f\|_{H^2}}.$$
In fact, as we have proved, both sides have value $\|b\|_{H^\infty}=\|b\|_{L^\infty(\mathbb{T})}$.

We can now give an example of $k\notin\ell^1(\mathbb{N})$ such that $\varphi\mapsto k*\varphi$ is bounded on $\ell^2(\mathbb{N})$. If $k$ were summable, then $b(z)=\sum_{n=0}^{\infty}k(n)z^n$ would extend to a function which is continuous on $\bar{\mathbb{D}}$. We only have, then, to find a bounded holomorphic $b$ which does not admit a continuous extension to the closed unit disc. As an example, let
$$b(z)=\exp\left(-\frac{1+z}{1-z}\right).$$
We will see below (and it can be easily verified) that $b$ is inner: bounded and with boundary values of unit modulus a.e. The boundary values are in fact
$$b(e^{it})=\exp\left(\frac{e^{it}+1}{e^{it}-1}\right)=\exp(-i\cot(t/2)),$$
which has no limit as $t\to0$.

This theorem was given a far reaching generalization by von Neumann.

Theorem 8

Let $T$ be a linear contraction on a Hilbert space $\mathcal{H}$, $\|Tx\|\le\|x\|$, and let $p$ be a complex polynomial. Then,
$$|||p(T)|||\le\|p\|_{H^\infty},$$
with equality (for any given polynomial $p$) when $\mathcal{H}=H^2$ and $T=S$ is the shift.

This result exemplifies a general trend, of reducing (when possible) questions concerning a large family of abstract operators to the corresponding question for a shift-related operator on $H^2$, which works as a model for the general theory. A nice reading on these topics is the monograph of Sz.-Nagy and Foias (see references).

Observe that the equality $|||p(S)|||_{B(H^2)}=\|p\|_{H^\infty}$ holds without restrictions on $p\in H^\infty$. In the general operator theoretic framework this is no longer true.

4.3 The characterization of the invariant spaces for $\ell^2(\mathbb{N})$

An inner function $\Theta$ is a nonconstant function in $H^\infty$ such that $|\Theta(e^{it})|=1$ a.e. Such functions play a preeminent role in Hardy theory.

Theorem 9 [Beurling]

The invariant subspaces of $H^2$ have the form $\Theta H^2$. The representation is unique up to unimodular constants.

Since $H^2(\mathbb{D})$ is closed in $L^2(\mathbb{T})$, Beurling's Theorem easily follows from the characterization of the invariant subspaces for the shift on $L^2(\mathbb{T})$. However, the direct approach to the problem is of interest.

It is clear that each space having the form $\Theta H^2$ is invariant under multiplication by $z$. In the opposite direction, we only mention how to find $\Theta$ if an invariant subspace $K$ is given. The key point is showing that $M_zK\subsetneq K$, so we can pick $\Theta\in K\ominus M_zK$ (which will be normalized if necessary). Let $n\ge0$ be the largest integer such that $z^n$ divides all $f$ in $K$. Then the corresponding integer for $M_zK$ is $n+1$, so $M_zK\ne K$.

This simple reasoning, based on the mere existence of an "order of zero" for holomorphic functions, rules out the existence of bi-invariant spaces for the shift: there are no bi-invariant linear features for signals in positive time. This is somehow intuitive (the backward shift destroys some of the information carried by the signal), but it is nonetheless worth mentioning.

The operator $M_\Theta$, mapping $H^2$ onto $\Theta H^2$, is an isometry (but not a unitary operator):
$$\|\Theta f\|_{H^2}=\|f\|_{H^2}.$$

4.4 The characterization of inner functions

Since the class of inner functions is the library of "invariant features", it is interesting to have a more concrete characterization for them. There are two main building blocks we have to consider. The first, the Blaschke products, are determined by the points at which the functions vanish; the second, the singular inner factors, are determined by the rate at which the function tends to zero along various radii.

Let $a\ne0$ be a point in $\mathbb{D}$. The Blaschke factor
$$\varphi_a(z)=\frac{|a|}{a}\,\frac{a-z}{1-\bar az}$$
maps $\mathbb{D}$, respectively $\mathbb{T}$, onto itself, holomorphically and one-to-one, hence it is an inner function. We normalize it so that $\varphi_a(a)=0$ and $\varphi_a(0)=|a|>0$. Then, the finite Blaschke product
$$B(z)=\lambda z^m\prod_{j=1}^{n}\frac{|a_j|}{a_j}\,\frac{a_j-z}{1-\bar a_jz},$$
where $n,m$ are nonnegative integers ($n+m>0$), $a_1,\dots,a_n\in\mathbb{D}$ (repetition being allowed), and $|\lambda|=1$, is also inner. It is clear that $B(z)=0$ if and only if $z=a_1,\dots,a_n$ or, if $m>0$, $z=0$. In applications to engineering, finite Blaschke products are especially important, for reasons that will be clear in Section 6. See also the lecture notes of Francis in the reference list.

We can pass to the limit to infinite Blaschke products.

Theorem 10 Let $m$ be a nonnegative integer, let $\{a_j\}_{j=1}^{\infty}$ be a sequence in $\mathbb{D}$ (repetition being allowed), and let $|\lambda|=1$. Then,
$$B(z)=\lambda z^m\prod_{j=1}^{\infty}\frac{|a_j|}{a_j}\,\frac{a_j-z}{1-\bar a_jz}$$
converges to a nonzero holomorphic function in $\mathbb{D}$ if and only if the Blaschke condition holds,
$$\sum_{j=1}^{\infty}(1-|a_j|)<\infty.$$
Convergence is uniform on compact subsets of $\mathbb{D}$, and $B(z)=0$ if and only if $z=a_j$ for some $j$ or, if $m>0$, $z=0$.

Given a nonconstant, inner function $\Theta$, let $\{a_j\}_{j=1}^{\infty}$ be the sequence of its zeros $a_j\ne0$ in $\mathbb{D}$ (repetition being allowed if the zero has higher order) and let $m\ge0$ be the order of the zero of $\Theta(z)$ at $z=0$. Then,
$$\Theta(z)=\lambda B(z)S(z),$$
where $|\lambda|=1$, $B(z)=z^m\prod_{j=1}^{\infty}\frac{|a_j|}{a_j}\frac{a_j-z}{1-\bar a_jz}$ is the Blaschke factor of $\Theta$, normalized to have $B(0)>0$, and $S$ is an inner function with no zero inside $\mathbb{D}$, the singular inner factor of $\Theta$, $S(0)>0$.

Consider the Cayley map $\psi(z)=\frac{1+z}{1-z}$, mapping $\mathbb{D}$ one-to-one and onto the right half-plane $\mathbb{C}_+=\{x+iy:x>0\}$. For any $\mu>0$, the function $S_{0,\mu}(z)=e^{-\mu\psi(z)}$ is then an inner function, and an $\infty$-to-one map of $\mathbb{D}$ into $\mathbb{D}$ with no zero inside $\mathbb{D}$. It tends to zero rapidly as $z=1-\varepsilon$ approaches 1 along the real axis: $S_{0,\mu}(1-\varepsilon)=\exp(-\mu(2-\varepsilon)/\varepsilon)$, which decays like $\exp(-2\mu/\varepsilon)$. We might take products of factors $S_{\alpha,\mu}(z)=S_{0,\mu}(e^{-i\alpha}z)$ and obtain other such singular inner functions. We might think of taking infinite products, or even "continuous products". It turns out that such products could well be "continuous", but not too much.

Theorem 11

The singular factor has the form:
$$S(z)=\exp\left(-\int_{\mathbb{T}}\frac{1+e^{-it}z}{1-e^{-it}z}\,d\mu(t)\right),$$
where $\mu\ge0$ is a Borel measure on $\mathbb{T}$ which is mutually singular with respect to arclength measure.

When $\mu=\sum_j\mu_j\delta_{\alpha_j}$ is a finite, positive linear combination of Dirac deltas, then
$$S(z)=\prod_je^{-\mu_j\psi(e^{-i\alpha_j}z)}.$$

At this point we can describe the lattice of (singly) invariant subspaces of $H^2$. For invariant subspaces generated by Blaschke products the lattice structure is determined by the lattice of zero sets with the operations $\cap$ and $\cup$. For the subspaces generated by singular functions the lattice is determined by the lattice of positive singular measures with the operations $\wedge$ and $\vee$. The full lattice is described by combining these two.

4.5 Inner/outer factorization

The multiplication operator $M_\Theta$ takes $H^2$ onto the invariant subspace $\Theta H^2$. It turns out that all multiplication operators we have seen in the analysis of causal systems admit a canonical factorization through an operator of this sort. Actually, it is convenient to look at things in more generality.

A function $u$ in $H^2$ is outer if
$$u(z)=\exp\left(\frac{1}{2\pi}\int_{\mathbb{T}}\frac{1+e^{-it}z}{1-e^{-it}z}\,k(e^{it})\,dt\right),$$
for some real valued, integrable $k$ on $\mathbb{T}$. The function $k$ can be easily recovered from $u$: $k(e^{it})=\log|u(e^{it})|$, a.e. We have chosen a normalization for which $u(0)>0$.

Theorem 12

Let $b$ be in $H^2$. Then, there are a unique outer function $u$ and inner function $\Theta$ such that
$$b=u\Theta.$$
Moreover, $\|b\|_{H^p}=\|u\|_{H^p}$ for $1\le p\le\infty$.

Outer functions $u\in H^\infty(\mathbb{D})$ can be characterized as those which are invertible in the weak sense that $uH^2(\mathbb{D})$ is dense in $H^2(\mathbb{D})$. In fact, more can be said.

Theorem 13

Let $f$ be in $H^2$ and let $[f]$ be the smallest invariant subspace of $H^2$ containing $f$. Then, with $\Theta u$ as in the inner/outer factorization of $f$, we have
$$[f]=\Theta H^2.$$

Hence if $f$ is outer then $[f]=H^2$ and in particular $1\in[f]$. Thus $f$ is invertible in $H^2$ in the weak sense that there is a sequence $\{g_n\}\subset H^2$ such that $g_nf\to1$ in $H^2$. However $1/f$ need not be in $H^2$; for instance $f(z)=1-z$ is outer (as is most easily seen by computing $[1-z]^\perp$, i.e. showing that $H^2(\mathbb{D})\ominus(1-z)H^2(\mathbb{D})=0$). Inner functions are not invertible in $H^\infty$; further, if $\Theta$ is inner then $[\Theta]=\Theta H^2\subsetneq H^2$ and thus $\Theta$ does not even have an inverse in the weak sense we just saw.

Thus if $b$ has the inner/outer factorization $b=\Theta u$ then we can write the operator $M_b$ as a product of two commuting operators: the isometric map $M_\Theta$, which imposes "features" on the signal, and $M_u$, which is a (roughly) invertible operator on the space of functions with specified features.

Another consequence of the inner/outer factorization is the following.

Lemma 1

For $h\in H^1$ we have
$$\|h\|_{H^1}=\inf\{\|f\|_{H^2}\|g\|_{H^2}:\ h=fg\}.$$

The $\le$ direction is just Cauchy-Schwarz. In the other direction, we can write $h=u\Theta$ with $u$ outer, hence zero free in $\mathbb{D}$: $h=(u^{1/2})(u^{1/2}\Theta)=fg$, with $\|h\|_{H^1}=\|f\|_{H^2}\|g\|_{H^2}$.

Approximating noncausal systems by causal ones: Hankel operators and Nehari theory

Given a function $\varphi \in L^\infty(\mathbb{T})$, here identified with the invariant operator $\psi \mapsto M_\varphi\psi = \varphi\psi$ on $L^2(\mathbb{T})$, what is the best approximation of $M_\varphi$ by causal operators $M_b$ with $b \in H^\infty$? Namely, we look for
$$\inf_{b\in H^\infty}\sup_{f\in H^2}\frac{\|\varphi f - bf\|_{L^2}}{\|f\|_{H^2}} = \inf_{b\in H^\infty}\|\varphi - b\|_{L^\infty} = \mathrm{dist}(\varphi, H^\infty).$$
Indeed, one would also like to know if a minimizing $b$ exists (yes), if it is unique (sometimes, in many relevant cases), if there is a way to construct it (again, yes in many cases of interest).

In the passage from first to second member the $\leq$ direction is obvious. For the opposite direction, note that the $L^\infty$ norm of $\varphi - b$ requires testing on $L^2$ functions, while on the left we only test on $H^2$ functions. It suffices to carry out the argument for a generic bounded $\varphi$ and then apply it to $\varphi - b$. We use the shift invariance of the $L^2(\mathbb{T})$ norm. For $\epsilon > 0$, let $\psi \in L^2(\mathbb{T})$ be such that $\|\psi\|_{L^2} = 1$ and $\|\varphi\psi\|_{L^2} \geq \|\varphi\|_{L^\infty} - \epsilon$. Find $N$ such that for $|z| = 1$, $\psi_N(z) = \sum_{n=-N}^{\infty}\hat\psi(n)z^n$ satisfies $\|\psi - \psi_N\|_{L^2} < \epsilon$. Then,
$$\begin{aligned}
\|\varphi\|_{L^\infty} - \epsilon &\leq \|\varphi\psi\|_{L^2} \leq \|\varphi\psi_N\|_{L^2} + \|\varphi(\psi - \psi_N)\|_{L^2} \\
&\leq \left(\frac{1}{2\pi}\int_{\mathbb{T}}|\varphi(e^{it})\psi_N(e^{it})|^2\,dt\right)^{1/2} + \|\varphi\|_{L^\infty}\cdot\epsilon \\
&= \left(\frac{1}{2\pi}\int_{\mathbb{T}}|\varphi(e^{it})e^{iNt}\psi_N(e^{it})|^2\,dt\right)^{1/2} + \|\varphi\|_{L^\infty}\cdot\epsilon \\
&= \left(\frac{1}{2\pi}\int_{\mathbb{T}}|\varphi(e^{it})f(e^{it})|^2\,dt\right)^{1/2} + \|\varphi\|_{L^\infty}\cdot\epsilon,
\end{aligned}$$
where $f(z) = z^N\psi_N(z)$ is holomorphic and $1 \geq \|f\|_{H^2} = \|\psi_N\|_{L^2}$. Thus,
$$\frac{\|\varphi f\|_{L^2}}{\|f\|_{L^2}} \geq \|\varphi\|_{L^\infty}(1-\epsilon) - \epsilon,$$
and the $\geq$ direction in the equality is proved. A shorter proof can be derived using Toeplitz operators.
The approximation problem just described, finding $b$, the optimal $H^\infty$ approximation to $\varphi$, can be stated in the language of Hankel operators, and Nehari's theorem characterizing the norm of Hankel operators gives information about $b$. We begin with some definitions.

The Hankel matrix operator $\Gamma_\alpha$ induced by a complex valued sequence $\alpha = \{\alpha_n\}_{n=0}^\infty$ is defined on sequences $a = \{a_n\}_{n=0}^\infty$ (in $\ell_c(\mathbb{N})$, to start with) by
$$(\Gamma_\alpha a)(m) = \sum_{n=0}^\infty \alpha_{m+n}a_n, \quad\text{or}\quad \langle\Gamma_\alpha a, b\rangle_{\ell^2} = \sum_{m,n\geq 0}\alpha_{m+n}a_n\overline{b_m}.$$
A famous example of a Hankel matrix is Hilbert's matrix $[(i+j+1)^{-1}]_{i,j=0}^\infty$.

We have already seen how useful it is to pass to the frequency side by the $Z$-transform. Let $P_+$ be the orthogonal projection of $L^2(\mathbb{T})$ onto $H^2$ and for any $g \in L^2(\mathbb{T})$ write $g_+ = P_+g$ and $g_- = g - g_+$. Hence $g_-$ is the projection of $g$ onto $L^2 \ominus H^2$, and the $g_-$ obtained this way are exactly the functions $\overline{zj}$ for $j \in H^2$. For $\varphi \in L^2(\mathbb{T})$ we define the Hankel bilinear form $B_\varphi$ associated to $\varphi$, a bilinear map $H^2 \times H^2 \to \mathbb{C}$, and the Hankel operator with symbol $\varphi$, $H_\varphi$, a linear map of $H^2$ to $L^2 \ominus H^2$, by
$$B_\varphi(f,g) := \langle fg, \bar z\bar\varphi\rangle_{L^2} =: \langle H_\varphi f, \overline{zg}\rangle_{L^2}.$$
In particular $H_\varphi f = (\varphi f)_-$.

The relation between Hankel forms and Hankel matrices is the following:
$$\begin{aligned}
B_\varphi(f,g) &= \Big\langle\sum_{n=0}^\infty\hat f(n)z^n\sum_{m=0}^\infty\hat g(m)z^m,\ \sum_{k=-\infty}^{\infty}\overline{\hat\varphi(k)}\,z^{-k-1}\Big\rangle_{L^2} \\
&= \sum_{k\leq -1}\hat\varphi(k)\sum_{m+n=-k-1}\hat f(n)\hat g(m) \\
&= \sum_{m=0}^\infty\hat g(m)\sum_{n=0}^\infty\hat\varphi(-m-n-1)\hat f(n) \\
&= \langle\Gamma_\alpha\hat f, \overline{\hat g}\rangle_{\ell^2},
\end{aligned}$$
where $\alpha(j) = \hat\varphi(-j-1)$. We set
$$[B_\varphi] := \sup_{f,g\in H^2}\frac{|B_\varphi(f,g)|}{\|f\|_{H^2}\|g\|_{H^2}} = \|H_\varphi\|_{\mathrm{operator}} = \|\Gamma_\alpha\|_{B(\ell^2)}.$$
If $\gamma$ is bounded then
$$|B_\gamma(f,g)| = |\langle fg, \bar z\bar\gamma\rangle_{L^2}| = \left|\frac{1}{2\pi}\int_{\mathbb{T}}f(e^{it})g(e^{it})e^{it}\gamma(e^{it})\,dt\right| \leq \|\gamma\|_{L^\infty}\|f\|_{H^2}\|g\|_{H^2},$$
and hence $[B_\gamma] \leq \|\gamma\|_{L^\infty}$. Also, clearly, for any $b \in H^2$ we have $B_\varphi = B_{\varphi-b}$. Combining these facts we have
$$[B_\varphi] \leq \inf\{\|\varphi - h\|_{L^\infty} : h \in H^2\} = \mathrm{dist}(\varphi, H^\infty).$$
For $\varphi \in L^2$ let $b \in H^2$ be that function, if there is one, such that $\|\varphi - b\|_{L^\infty} = \mathrm{dist}(\varphi, H^\infty)$. If $\varphi$ is bounded then $b$ is in $H^\infty$ and is the function we discussed earlier, the best approximation to $\varphi$ in the $L^\infty$ norm. To complete the story we show the opposite inequality, and will then know that the norm of the Hankel operator, or of the Hankel form, equals the distance of the symbol from $H^\infty$. That result is Nehari's theorem.

Theorem 14.

Given $\varphi \in L^2$,
$$[B_\varphi] = \|H_\varphi\|_{\mathrm{operator}} = \|\Gamma_\alpha\|_{B(\ell^2)} = \mathrm{dist}(\varphi, H^\infty).$$

The previous discussion shows that the expression on the right is larger. To finish we must show that there is a holomorphic function $b$ so that $\|\varphi - b\|_\infty = [B_\varphi]$. Starting with the formula $B_\varphi(f,g) := \langle fg, \bar z\bar\varphi\rangle_{L^2}$ and taking note of Lemma 1, which shows that $fg$ is a generic element of $H^1$, we see that $[B_\varphi]$ is equal to the norm of the functional $h \mapsto \langle h, \bar z\bar\varphi\rangle_{L^2}$ acting on $H^1$. By the Hahn-Banach theorem that functional extends in a norm preserving way to a functional on $L^1$. That functional on $L^1$ will be of the form $k \mapsto \langle k, j\rangle_{L^2}$ for a bounded $j$ with $\|j\|_\infty = [B_\varphi]$, and $j$ will satisfy
$$\langle h, \bar z\bar\varphi\rangle_{L^2} = \langle h, j\rangle_{L^2} \quad \forall h \in H^1.$$
In particular $j$ and $\bar z\bar\varphi$ have the same nonnegative Fourier coefficients and thus $j_+ = (\bar z\bar\varphi)_+$.

We now want to find $b$ so that $\|\varphi - b\|_\infty = \|j\|_\infty$. We have
$$\bar z\bar\varphi = (\bar z\bar\varphi)_+ + (\bar z\bar\varphi)_- = j_+ + (\bar z\bar\varphi)_- = j - j_- + (\bar z\bar\varphi)_-.$$
Rearranging gives $\bar z\bar\varphi - \left(-j_- + (\bar z\bar\varphi)_-\right) = j$. The function in parentheses has only negative frequencies, so its complex conjugate is of the form $zb$ with $b$ holomorphic. Taking conjugates, one quickly shows that $\varphi - b = \bar z\bar j$, and that is enough to give what we want, because $\|\varphi - b\|_{L^\infty} = \|j\|_{L^\infty} = [B_\varphi] \leq \|\varphi - b\|_{L^\infty}$.

On Hankel operators, for the mathematical side a good starting point is Peller's survey; their use in control theory is in Francis' lecture notes.

For $\psi \in L^2(\mathbb{T})$ given, the Toeplitz operator $T_\psi$ with symbol $\psi$ is defined for $f \in H^2$ by $T_\psi f = M_\psi f - H_\psi f = P_+(\psi f)$, where $P_+ : L^2 \to H^2$ is orthogonal projection. The Toeplitz operator coincides with the multiplication operator $M_\psi$ if $\psi \in H^2$ is holomorphic. The adjoint of $T_\psi$ is $T_\psi^* = T_{\bar\psi}$.

In signal theory Toeplitz operators naturally appear in connection with an alternative definition of time invariance on $\ell^2(\mathbb{N})$. Recall that $\tau\varphi(n) = \varphi(n-1)$ defines the shift on $\ell^2(\mathbb{N})$. Its adjoint, the backward shift, is the operator $\tau^*\varphi(n) = \varphi(n+1)$, $\tau^* : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$. It is readily verified that $\tau^*\tau\varphi = \varphi$ and that $\tau\tau^*\varphi = \varphi - \varphi(0)\delta_0$. A linear system $T$ on $\ell^2(\mathbb{N})$ is called time invariant if $\tau^*T\tau = T$: if we shift the input forward, feed it to $T$, then shift backward, we have the same as just applying $T$.

The rationale for this new definition of invariant system for signals in positive time is that the previous definition assumed, in order to be verified, that all the past values of the signal have been stored and are accessible, a requirement which is not practical.

We now see how invariant systems lead to Toeplitz operators. We pass to the frequency side: a signal $\varphi = \{a_j\}_{j=0}^\infty$ has $Z$-transform $h(z) = \sum_{j=0}^\infty a_jz^j$. A (linear) system $T$ on $\ell^2(\mathbb{N})$, represented by a matrix $[F_{ij}]_{i,j=0}^\infty$ (where $F_{ij} = \langle T(z^j), z^i\rangle_{H^2}$ are the matrix elements of $T$ with respect to the basis $\{z^n\}_{n=0}^\infty$ of $H^2$), is invariant if
$$\begin{aligned}
\sum_{i=0}^\infty\sum_{j=0}^\infty F_{ij}a_jz^i &= (T\varphi)^\wedge(z) \\
&= (\tau^*T\tau\varphi)^\wedge(z) \\
&= \bar z\left((T\tau\varphi)^\wedge(z) - (T\tau\varphi)^\wedge(0)\right) \\
&= \bar z\left(\sum_{i=0}^\infty\sum_{j=1}^\infty F_{ij}a_{j-1}z^i - \sum_{j=1}^\infty F_{0j}a_{j-1}\right) \\
&= \sum_{i=1}^\infty\sum_{j=1}^\infty F_{ij}a_{j-1}z^{i-1} \\
&= \sum_{i=0}^\infty\sum_{j=0}^\infty F_{i+1,j+1}a_jz^i,
\end{aligned}$$
i.e. $F_{i+1,j+1} = F_{i,j}$: $T$ is represented, w.r.t. the basis $\{z^n\}_{n=0}^\infty$, by a Toeplitz matrix $F_{i,j} = f_{i-j}$.
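The contrast between Hankel and Toeplitz matrices can be checked on finite sections. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the Toeplitz entries $f_k = 2^{-|k|}$ are an arbitrary choice, and finite sections only approximate the operators on $\ell^2(\mathbb{N})$. It tests the relation $F_{i+1,j+1} = F_{i,j}$, and it compares the norm of a section of Hilbert's matrix with the bound $[B_\varphi] \leq \|\varphi\|_{L^\infty}$ for the symbol $\varphi(e^{it}) = i(t-\pi)$, $0 \leq t < 2\pi$, whose coefficients satisfy $\hat\varphi(-j-1) = 1/(j+1)$ and whose sup norm is $\pi$.

```python
import numpy as np

def hankel_section(alpha, n):
    """n x n section of the Hankel matrix with entries alpha(i + j)."""
    i, j = np.indices((n, n))
    return alpha(i + j)

def toeplitz_section(f, n):
    """n x n section of the Toeplitz matrix with entries f(i - j)."""
    i, j = np.indices((n, n))
    return f(i - j)

def is_shift_invariant(mat):
    """Finite-section form of tau* T tau = T, i.e. F[i+1, j+1] == F[i, j]."""
    return bool(np.allclose(mat[1:, 1:], mat[:-1, :-1]))

def negative_fourier_coefficient(m, points=20000):
    """Midpoint-rule value of the Fourier coefficient at frequency -m
    of the symbol phi(e^{it}) = i (t - pi), 0 <= t < 2 pi."""
    t = 2.0 * np.pi * (np.arange(points) + 0.5) / points
    return np.mean(1j * (t - np.pi) * np.exp(1j * m * t))

n = 200
hilbert = hankel_section(lambda k: 1.0 / (k + 1.0), n)      # Hilbert's matrix
toeplitz = toeplitz_section(lambda k: 0.5 ** np.abs(k), n)  # illustrative choice

toeplitz_invariant = is_shift_invariant(toeplitz)
hilbert_invariant = is_shift_invariant(hilbert)

# Spectral norm of the section; it cannot exceed the sup norm pi of the symbol.
hilbert_norm = np.linalg.norm(hilbert, 2)

# Entries alpha(j) = phi^(-j-1) recovered numerically from the symbol.
alpha_from_symbol = np.array([negative_fourier_coefficient(j + 1) for j in range(5)])
```

The Toeplitz section passes the invariance test and the Hankel section does not, while the norm of the Hilbert section stays below $\pi$, in agreement with the upper bound for Hankel forms with bounded symbol.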
Recall that in a Hankel matrix the $i,j$ entry is a function of $i+j$.

Inserting this back in the expression for $T$ in frequency space,
$$(T\varphi)^\wedge(z) = \sum_{i=0}^\infty\sum_{j=0}^\infty f_{i-j}a_jz^i = P_+(\psi(z)h(z)),$$
where $\psi(z) = \sum_{n=-\infty}^{+\infty}f_nz^n$ and $h(z) = \sum_{j=0}^\infty a_jz^j$ is the $Z$-transform of the input. When $\psi$ is holomorphic, the matrix $[f_{i-j}]$ is lower triangular.

As with Hankel operators, it is clear that $\|T_\varphi\|_{B(H^2)} \leq \|\varphi\|_{L^\infty}$:
$$\|P_+(\varphi f)\|_{H^2} \leq \|\varphi f\|_{L^2} \leq \|\varphi\|_{L^\infty}\|f\|_{H^2}.$$
Contrary to the Hankel case, there is no way to improve this estimate: $\|T_\varphi\|_{B(H^2)} = \|\varphi\|_{L^\infty}$. Let $k_a(z) = \frac{1}{1-\bar az}$ be the reproducing kernel at $a$:
$$\begin{aligned}
|\langle T_\varphi k_a, k_a\rangle| &= |\langle P_+(\varphi k_a), k_a\rangle| = |\langle\varphi k_a, k_a\rangle| \\
&= \left|\frac{1}{2\pi}\int_{-\pi}^{\pi}\varphi(e^{it})|k_a(e^{it})|^2\,dt\right| \\
&= \frac{1}{1-|a|^2}\left|\frac{1}{2\pi}\int_{-\pi}^{\pi}\varphi(e^{it})\frac{1-|a|^2}{|1-\bar ae^{it}|^2}\,dt\right| \\
&= \|k_a\|_{H^2}^2\,|P\varphi(a)|,
\end{aligned}$$
where $P\varphi$ is the Poisson integral of $\varphi$ at $a$, because $P(a, e^{it}) = \frac{1}{2\pi}\frac{1-|a|^2}{|1-\bar ae^{it}|^2}$ is the Poisson kernel in the unit disc. Hence,
$$\|T_\varphi\|_{B(H^2)} \geq \sup_{a\in\mathbb{D}}\frac{|\langle T_\varphi k_a, k_a\rangle|}{\|k_a\|_{H^2}^2} = \|P\varphi\|_{L^\infty(\mathbb{D})} = \|\varphi\|_{L^\infty(\mathbb{T})}.$$

$H^1$ and BMO

We will not touch here Nehari's problem; that is, how to find the best approximant of $\varphi$ in $H^\infty$. Even the estimates we have found, however, are of little use unless we have tools for estimating $\|b\|_{(H^1)^*}$. Contrary to a first, naive guess, the dual of $H^1$ contains, but is larger than, $H^\infty$.

Shortly after Nehari's article on Hankel forms, Fritz John introduced, in connection to problems in elasticity theory, the space BMO of functions with Bounded Mean Oscillation, which he further studied together with Louis Nirenberg. Restricted to functions on $\mathbb{T}$, the definition is as follows. For each arc $I \subset \mathbb{T}$, denote by $\varphi_I = \frac{1}{|I|}\int_I\varphi(e^{it})\,dt$ the average of $\varphi$ over $I$. The mean oscillation of $\varphi$ over $I$ is $\frac{1}{|I|}\int_I|\varphi(e^{it}) - \varphi_I|\,dt$, and the BMO norm of $\varphi$ is
$$\|\varphi\|_{L^1(\mathbb{T})} + \sup_I\frac{1}{|I|}\int_I|\varphi(e^{it}) - \varphi_I|\,dt.$$
In 1971 C. Fefferman made the surprising discovery that $(H^1)^* = \mathrm{BMOA}$, the space of the BMO functions which extend holomorphically to the unit disc. Duality is with respect to the $H^2$ inner product. It is not difficult to see that this result implies that if $\varphi$ is bounded, then $H\varphi$, its Hilbert transform, belongs to BMO.

On his way to the proof, Fefferman proved that the BMO norm of a function can be characterized in terms of Carleson measures. Let $\mu \geq 0$ be a Borel measure on $\mathbb{D}$. We say that it is a Carleson measure for $H^2$ if there is a positive constant $[\mu]_{CM}$ such that
$$\int_{\mathbb{D}}|f|^2\,d\mu \leq [\mu]_{CM}\|f\|_{H^2}^2.$$
The concept itself had been introduced by Carleson in connection to the problem of interpolating functions in $H^\infty$. Fefferman showed that $b \in \mathrm{BMOA}$ if and only if $d\mu_b(z) = (1-|z|^2)|b'(z)|^2\,dxdy$ is a Carleson measure.

The appearance of such measures is easily explained. An equivalent norm for $H^2$ is
$$[f]_{H^2}^2 = |f(0)|^2 + \int_{\mathbb{D}}(1-|z|^2)|f'(z)|^2\,dxdy.$$
If $d\mu_b$ is a Carleson measure for $H^2$, then (assuming momentarily that $b(0) = 0$ and using the equivalent norm to define the inner product), by the Cauchy-Schwarz inequality,
$$\begin{aligned}
|\langle fg, b\rangle_{H^2}| &= \left|\int_{\mathbb{D}}(fg)'\,\overline{b'}\,(1-|z|^2)\,dxdy\right| \\
&\leq \left|\int_{\mathbb{D}}fg'\,\overline{b'}\,(1-|z|^2)\,dxdy\right| + \left|\int_{\mathbb{D}}gf'\,\overline{b'}\,(1-|z|^2)\,dxdy\right| \\
&\leq \left(\int_{\mathbb{D}}|f|^2\,d\mu_b\right)^{1/2}\|g\|_{H^2} + \left(\int_{\mathbb{D}}|g|^2\,d\mu_b\right)^{1/2}\|f\|_{H^2} \\
&\leq 2[\mu_b]_{CM}^{1/2}\|f\|_{H^2}\|g\|_{H^2}.
\end{aligned}$$
Recalling Section 5.1, this shows that if $\mu_b$ is Carleson, then the Hankel form $B_b$, hence the Hankel operator $H_b$, is bounded. By Nehari's theorem, $b \in (H^1)^*$. The delicate point is proving the opposite implication.

The short and dense monograph of Sarason well explains the connections between Hankel operators, basic questions of operator theory, and harmonic analysis.

We summarize part of what we have seen in a diagram:
$$\mathrm{Mult}(H^2) = H^\infty \hookrightarrow \mathrm{BMOA} = (H^1)^* \hookrightarrow H^2 \hookrightarrow H^1 = H^2\cdot H^2.$$
We see here, as it often happens, that analysis on a function Hilbert space requires introducing a number of other Banach function spaces.
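The definition of mean oscillation given earlier in this section can be explored numerically. The sketch below is an illustration, not part of the original argument: it uses the local model $t \mapsto \log|t|$ on intervals $[-h, h]$ (an assumption standing in for arcs of $\mathbb{T}$ centered at the point $1$) and a midpoint rule with a hypothetical helper name. The function is unbounded near $t = 0$, yet its mean oscillation over $[-h, h]$ is the same at every scale, namely $2/e$.

```python
import numpy as np

def mean_oscillation(phi, a, b, n=200_000):
    """Midpoint-rule approximation of (1/|I|) * integral over I = [a, b]
    of |phi - phi_I|, where phi_I is the average of phi over I."""
    h = (b - a) / n
    t = a + h * (np.arange(n) + 0.5)   # midpoints never hit t = 0 when a = -b
    values = phi(t)
    average = values.mean()
    return float(np.abs(values - average).mean())

def log_abs(t):
    """Local model of an unbounded function with bounded mean oscillation."""
    return np.log(np.abs(t))

half_widths = [1.0, 1e-2, 1e-4]
oscillations = [mean_oscillation(log_abs, -h, h) for h in half_widths]

# Smallest sampled value on each interval: it decreases without bound as h shrinks.
minima = [float(log_abs(np.linspace(-h, h, 1000)).min()) for h in half_widths]
```

The oscillations stay near $2/e \approx 0.736$ while the sampled minima keep decreasing, which is the sense in which BMO is strictly larger than $L^\infty$.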

A typical device (a plant) can be modeled by a linear, time invariant, causal, stable operator $P$, which acts in frequency as $M_b$, with $b \in H^\infty$, and which we assume to be free of feedback loops. Generally the output $Pa(n)$ only depends on finitely many values $a_{n-m+1}, \dots, a_n$ of the input (which have to be stored), and it is easy to verify that this holds if and only if $b$ is a polynomial of degree at most $m-1$. This property is sometimes expressed saying that transient inputs produce transient outputs, and it is clear that it suffices to verify this for the unit impulse $\delta_0$.

A feedback system is one in which the output of $P$ is "fed back" into $P$, possibly after having been processed by a different plant $C$. For instance:

[Figure: block diagram of the feedback system with input $u$, plant $P$, output $y$, and plant $C$ in the feedback path.]

We use the same symbols for signals and plants, and for their $Z$-transforms and transfer functions; the letter $n$ stands for time and $\omega$ for frequency. In a real situation, the output $y(n)$ cannot immediately affect the input $u(n)$ at time $n$. In order to have this, $C(\omega)$ must include a delay by at least a time unit; i.e. the polynomial $C$ satisfies $C(0) = 0$.

The system represented by the diagram is:
$$\begin{cases} y = Pv \\ v = u - z \\ z = Cy \end{cases}$$
Overall, $y = P(u - Cy)$, i.e. $y = \frac{P}{1+PC}u$. Observe that the rational function $\frac{P}{1+PC}$ is not a polynomial, hence the system with feedback gives a persistent signal as output if the input is the unit impulse (the feedback produces an "echo").

This easy example shows how nontrivial conclusions can be drawn by elementary algebra in frequency space. Hardy space theory leads system theory much further. We give here just one example, giving us the opportunity of mentioning Pick theory, a topic of current research.
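The closed loop relation $y = P(u - Cy)$ can be simulated directly in time and compared with the Taylor coefficients of $P/(1+PC)$. The following is a minimal sketch under stated assumptions: $P$ and $C$ are polynomials given by coefficient lists, $C(0) = 0$, and the function names are hypothetical. With $P = 1$ and $C(z) = z/2$ the impulse response is $(-1/2)^n$, which never vanishes: the "echo".

```python
import numpy as np

def closed_loop_impulse_response(p, c, n_steps):
    """Simulate y = P v, v = u - z, z = C y in time, with u the unit impulse.
    p, c are the coefficient lists of the polynomials P and C; c[0] must be 0."""
    if c[0] != 0:
        raise ValueError("C must contain a delay: C(0) = 0")
    u = np.zeros(n_steps)
    u[0] = 1.0
    v = np.zeros(n_steps)
    y = np.zeros(n_steps)
    for n in range(n_steps):
        # Feedback signal z(n) = sum_k c_k y(n - k); it uses past outputs only.
        z_n = sum(c[k] * y[n - k] for k in range(1, min(len(c) - 1, n) + 1))
        v[n] = u[n] - z_n
        y[n] = sum(p[k] * v[n - k] for k in range(0, min(len(p) - 1, n) + 1))
    return y

def taylor_coefficients_closed_loop(p, c, n_steps):
    """First n_steps Taylor coefficients at 0 of P / (1 + P C),
    computed by power series division."""
    p = np.asarray(p, dtype=float)
    q = np.convolve(p, np.asarray(c, dtype=float))
    q[0] += 1.0                              # q = 1 + P C, with q[0] = 1
    num = np.zeros(n_steps)
    den = np.zeros(n_steps)
    num[:min(len(p), n_steps)] = p[:n_steps]
    den[:min(len(q), n_steps)] = q[:n_steps]
    y = np.zeros(n_steps)
    for n in range(n_steps):
        acc = sum(den[k] * y[n - k] for k in range(1, n + 1))
        y[n] = (num[n] - acc) / den[0]
    return y

# P = 1 and C(z) = z / 2: a unit impulse produces a persistent output.
echo = closed_loop_impulse_response([1.0], [0.0, 0.5], 8)

# A second, arbitrary pair of polynomials, used to compare the two computations.
time_domain = closed_loop_impulse_response([1.0, 1.0], [0.0, 0.5, 0.25], 12)
frequency_domain = taylor_coefficients_closed_loop([1.0, 1.0], [0.0, 0.5, 0.25], 12)
```

The agreement of `time_domain` and `frequency_domain` is the elementary algebra in frequency space mentioned above, carried out coefficient by coefficient.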

Let $T$ be an ideal plant we want to best approximate by a cascade $UCV$, where $U$ and $V$ are given plants, and $C$ is a plant we can design. That is, we want to find $C$ which minimizes $\|T - UCV\|_{H^\infty}$. This is the Model Matching Problem with data $T, U, V$.

Consider the inner/outer factorization $UV = A_iA_o$. Since $U$ and $V$ are rational (we allow feedbacks), $A_i$ is a finite Blaschke product, with zeros $\lambda_1, \dots, \lambda_n$ in $\mathbb{D}$. We can then write $H := T - UCV = T - A_iF$, with $F = A_oC$. Since $H(\lambda_j) = \mu_j := T(\lambda_j)$, we have that (assuming a minimizer exists):
$$\min\{\|T - A_iF\|_{H^\infty} : F \in H^\infty\} = \min\{\|H\|_{H^\infty} : H(\lambda_j) = \mu_j\}.$$
In fact, if $H$ is a minimizer for the right hand side, then the equation $T - A_iF = H$ has the solution $F = A_i^{-1}(T - H)$, which is well defined in $H^\infty$ because $T - H$ vanishes at $\lambda_1, \dots, \lambda_n$. Since $A_o$ is outer, we can then reconstruct $C$ from $F$.

Finding a function $H$ of minimal norm satisfying the interpolation constraints $H(\lambda_j) = \mu_j$ is the Pick problem with data $\{\lambda_1, \dots, \lambda_n\}$ and $\{\mu_1, \dots, \mu_n\}$.

Suppose that the minimal norm of $H$ is not larger than $R$. We have sequences $\{\lambda_1, \dots, \lambda_n\}$ and $\{\mu_1/R, \dots, \mu_n/R\}$ in $\mathbb{D}$ and we have an interpolating $H/R$ of norm at most one. A necessary condition for this to hold is the Pick property. For any choice of complex $a_1, \dots, a_n$, denoting by $k_{\lambda_j}$ the reproducing function at $\lambda_j$ and using $M_H^*(k_{\lambda_j}) = \overline{H(\lambda_j)}\,k_{\lambda_j}$:
$$\begin{aligned}
0 &\leq \Big\|\sum_{j=1}^n a_jk_{\lambda_j}\Big\|^2 - \Big\|\frac{1}{R}M_H^*\sum_{j=1}^n a_jk_{\lambda_j}\Big\|^2 \\
&= \Big\|\sum_{j=1}^n a_jk_{\lambda_j}\Big\|^2 - \Big\|\frac{1}{R}\sum_{j=1}^n a_j\overline{H(\lambda_j)}\,k_{\lambda_j}\Big\|^2 \\
&= \Big\|\sum_{j=1}^n a_jk_{\lambda_j}\Big\|^2 - \Big\|\sum_{j=1}^n a_j\frac{\overline{\mu_j}}{R}\,k_{\lambda_j}\Big\|^2 \\
&= \sum_{i,j=1}^n a_i\overline{a_j}\,\langle k_{\lambda_i}, k_{\lambda_j}\rangle_{H^2}\left(1 - \frac{\overline{\mu_i}\mu_j}{R^2}\right) \\
&= \sum_{i,j=1}^n a_i\overline{a_j}\,k_{\lambda_i}(\lambda_j)\left(1 - \frac{\overline{\mu_i}\mu_j}{R^2}\right).
\end{aligned}$$
That is, the Pick matrix
$$\left[k_{\lambda_i}(\lambda_j)\left(1 - \frac{\overline{\mu_i}\mu_j}{R^2}\right)\right]_{i,j=1}^n$$
is positive semidefinite. Pick's Theorem says that the converse is true.

Theorem 15.

Given points $\lambda_1, \dots, \lambda_n$ in the unit disc and values $\mu_1, \dots, \mu_n$ in the unit disc, there exists a function $H$ in $H^\infty$ having norm at most one interpolating them, $H(\lambda_j) = \mu_j$, if and only if the matrix $[k_{\lambda_i}(\lambda_j)(1 - \overline{\mu_i}\mu_j)]_{i,j=1}^n$ is positive semidefinite. Moreover, the interpolating function $H$ of minimal norm is a rational function.

Pick's Theorem holds, with natural modifications, for infinite sequences of points and values.

Extensions and applications of Pick theory are one of the most active areas of current research at the frontier between operator theory and function spaces.
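Pick's criterion is easy to test numerically. The sketch below is an illustration with hypothetical function names and arbitrarily chosen data: it builds the Pick matrix with the Hardy space kernel $k_\lambda(z) = 1/(1 - \bar\lambda z)$, checks positive semidefiniteness through the smallest eigenvalue, and finds the minimal norm $R$ by bisection, using the fact that the Pick condition becomes easier to satisfy as $R$ grows.

```python
import numpy as np

def pick_matrix(lams, mus, R=1.0):
    """Pick matrix [k_{lam_i}(lam_j) (1 - conj(mu_i) mu_j / R^2)] for the
    Hardy space kernel k_lam(z) = 1 / (1 - conj(lam) z)."""
    lams = np.asarray(lams, dtype=complex)
    mus = np.asarray(mus, dtype=complex) / R
    kernel = 1.0 / (1.0 - np.conj(lams)[:, None] * lams[None, :])
    return kernel * (1.0 - np.conj(mus)[:, None] * mus[None, :])

def is_positive_semidefinite(mat, tol=1e-12):
    """Check the smallest eigenvalue of a Hermitian matrix, up to a tolerance."""
    return bool(np.linalg.eigvalsh(mat).min() >= -tol)

def minimal_interpolation_norm(lams, mus, tol=1e-10):
    """Smallest R for which the Pick matrix with data mu / R is positive
    semidefinite, located by bisection."""
    lo, hi = 0.0, 1.0
    while not is_positive_semidefinite(pick_matrix(lams, mus, hi)):
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_positive_semidefinite(pick_matrix(lams, mus, mid)):
            hi = mid
        else:
            lo = mid
    return hi

# Data interpolated by H(z) = z^2, which has norm one: the criterion holds.
solvable = is_positive_semidefinite(pick_matrix([0.0, 0.5], [0.0, 0.25]))

# Data violating the Schwarz lemma (H(0) = 0 but |H(0.5)| > 0.5): it fails.
unsolvable = is_positive_semidefinite(pick_matrix([0.0, 0.5], [0.0, 0.9]))

# For the second data set the minimal norm is attained by H(z) = 1.8 z.
minimal_norm = minimal_interpolation_norm([0.0, 0.5], [0.0, 0.9])
```

In the Model Matching Problem the same computation, with $\mu_j = T(\lambda_j)$, would give the optimal value of $\|T - UCV\|_{H^\infty}$ without constructing $C$ itself.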

Up to this point this article could be seen as a bus tour of an interesting city. The bus goes from place to place, the tour guide offers enthusiastic description and commentary, and at a few of the places the passengers have a chance to get off the bus and look in detail at some of the sights. That tour is over now and what comes next can be seen as the airplane ride home. We fly over a landscape and a voice on the speaker points out some interesting features below; just a quick glance at them, perhaps enough to whet the appetite.

The Hardy space lives in the intersection of several powerful mathematical technologies. Hardy space functions are holomorphic functions in the disk and can be analyzed using tools from function theory. The boundary values of Hardy space functions are in the Lebesgue space $L^2$ of the circle and hence the machinery of Fourier analysis can be used to study them. In fact they form a closed subspace of $L^2$, hence there is an associated projection operator, and that lets questions about Hardy space functions be formulated and studied in the language of linear operators on Hilbert space. We have seen bits of all of these approaches.

A point of view we are emphasizing here is that the Hardy space is a Hilbert space with reproducing kernel, RKHS. That is, it is a Hilbert space whose elements are functions on a set $X$ (in this case $X = \mathbb{D}$), the evaluations of the functions at points $x \in X$ are continuous linear functionals, and hence each of those evaluations is given by taking the inner product of $f$ with some distinguished element $k_x$ in the space: $f(x) = \langle f, k_x\rangle$. The $k_x$ are the reproducing kernels and, in some sense, the collection of them, $\{k_x\}_{x\in X}$, plays the role in this theory that an orthonormal basis plays for finite dimensional inner product spaces.

In the next three sections we take a look, from great height, at three other examples of RKHS.
The first is the Paley-Wiener space, a space somewhat similar to the Hardy space (but the Hardy space of the half plane rather than the disk) that is of great interest in the theory of sampling and reconstructing band limited signals such as speech and music. The second example is the Dirichlet space. It is a variation of the Hardy space with some similarities and some differences, and it is dear to the authors. The third example is the dyadic Dirichlet space. That space is a simplified model of the Dirichlet space, useful in analyzing the Dirichlet space. It is also a space which makes explicit the parameter space for the "phase space analysis" of signals, of which wavelets are the most prominent example.

The Paley-Wiener space, $PW$, is the subspace of $L^2(\mathbb{R})$ of all functions $f$ whose Fourier transform $\hat f$ is supported on the interval $[-\pi, \pi]$. The space is often used in signal analysis: $f \in PW$ is a signal, $f(t)$ is its value at time $t \in \mathbb{R}$, and $\hat f$, its Fourier transform, is the frequency space representation of the signal. The fact that $\hat f$ is supported in $[-\pi, \pi]$ is a statement that the signal contains no frequencies outside this range: the signal is "band limited". The norm of $f$ in $PW$, which is the same as the norm of $f$ in $L^2$ and (with our normalization) the same as the norm of $\hat f$ in $L^2(-\pi, \pi)$, is the energy of the signal. In short, $PW$ is a space of finite energy band limited signals. This can be compared with the Hardy space of the upper half plane; the boundary values of those functions are exactly the functions $f$ with $\hat f \in L^2(0, \infty)$.

(The same space of functions can also be defined by restricting to the real axis a certain class of entire functions defined by their growth at infinity. The equivalence between the two definitions uses the fundamental ideas developed by Paley and Wiener in the 1930's relating the smoothness of functions and the decay of their Fourier transforms.)

To see that $PW$ is an RKHS we want to know that the evaluations at points of $\mathbb{R}$ are continuous functionals. Consider first evaluation at $t = 0$. We now describe the picture from our very high altitude. The value of $f$ at $0$ is obtained by using the pairing $(f, g) \mapsto \int f\bar g$ to pair $f$ with the point mass at $t = 0$. Fourier transform theory tells us that the same value is obtained by pairing their Fourier transforms. The Fourier transform of the point mass is the constant function $1$, but because we know $\hat f$ is supported in $[-\pi, \pi]$ we can replace $1$ with $1\cdot\chi_{[-\pi,\pi]}$, the characteristic function of $[-\pi, \pi]$. The function $1\cdot\chi_{[-\pi,\pi]}$ is the Fourier transform of some function $k_0$ in $PW$, and this discussion suggests, correctly, that that function is the $PW$ reproducing kernel for evaluating at $t = 0$:
$$k_0(t) = (1\cdot\chi_{[-\pi,\pi]})^\vee(t) = \frac{\sin\pi t}{\pi t} = \operatorname{sinc} t, \qquad f(0) = \langle f, k_0\rangle \ \text{ for all } f \in PW.$$

Here $\vee$ is the inverse Fourier transform, the second equality is an elementary Fourier transform computation, and the third is the definition of the function sinc.

This gives $k_0$, the reproducing kernel for evaluating at the origin. By translation invariance $k_x$, the reproducing kernel for evaluating at $x$, is $k_x(t) = \operatorname{sinc}(t - x)$ and its Fourier transform is $(k_x)^\wedge(\xi) = e^{-ix\xi}\chi_{[-\pi,\pi]}(\xi)$. In particular the functions $\{(k_n)^\wedge\}_{n\in\mathbb{Z}}$ are an orthonormal basis of the space of Fourier transforms of functions in $PW$. Performing the inverse Fourier transform we see that $\{k_n\}_{n\in\mathbb{Z}}$ is an orthonormal basis of $PW$. Hence we have

Theorem 16 (Shannon sampling theorem). If $f(t)$ is a finite energy band limited signal with spectrum contained in $[-\pi, \pi]$ then $f \in PW$ (by definition) and

1. the sequence of sample values $\{f(n)\}$ is a square summable sequence and,

2. $f$ can be reconstructed from those values using the formula
$$f(t) = \sum_n\langle f, k_n\rangle k_n(t) = \sum_n f(n)\operatorname{sinc}(t - n). \qquad (1)$$

3. Conversely, given any square summable sequence $\{a_n\}$ there is a function $f$ in $PW$ with $f(n) = a_n$ for all $n$, and the value of $f$ at all points is given by (1).

(The previous result has many names; we retreat behind the Wikipedia entry on Stigler's law.)

This result describes the type of values obtained by regular sampling of the function $f$ and gives a scheme for reconstructing $f$ from those sample values. Think of an electronic device which samples an audio signal at a rate of 100 kHz, and then a device which reconstructs the signal from the sample data; think about digital music.

More generally, the space $PW$ and its variations provide the mathematical framework in which to study sampling and reconstruction of band limited signals.
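The reconstruction formula (1) can be tried on a concrete signal. The sketch below is an illustration under stated assumptions: the test signal $f(t) = \operatorname{sinc}(t/2)^2$ is our own choice (its spectrum lies in $[-\pi, \pi]$ and it decays like $1/t^2$, so a truncated sum already converges well), and the function name is hypothetical. NumPy's `np.sinc(x)` computes $\sin(\pi x)/(\pi x)$, which matches the normalization used above.

```python
import numpy as np

def reconstruct(samples, sample_times, t):
    """Truncated version of formula (1): sum over n of f(n) sinc(t - n)."""
    return float(np.sum(samples * np.sinc(t - sample_times)))

def signal(t):
    """Band limited test signal: sinc(t/2)^2 has spectrum inside [-pi, pi]."""
    return np.sinc(t / 2.0) ** 2

# Integer sample times n = -2000, ..., 2000 and the corresponding samples f(n).
sample_times = np.arange(-2000, 2001)
samples = signal(sample_times)

# Reconstruct the signal at a point which is not a sample time.
t0 = 0.37
reconstructed_value = reconstruct(samples, sample_times, t0)
reconstruction_error = abs(reconstructed_value - float(signal(t0)))
```

The only error here comes from truncating the sum to finitely many samples; for a signal with slower decay the truncation would have to be longer.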

In this section we compare the answers to some questions for the Hardy space with the answers to the analogous questions for the closely related Dirichlet space. Some answers are very similar, some are not. Each space has a story of its own, and we consider the Dirichlet space because much is known about it and also, on the contrary, much is still open. We will see that sometimes the same object of the Hardy theory has, like in a broken mirror, more than one analog in Dirichlet theory.

The Dirichlet space $\mathcal{D}$ is a Hilbert space of holomorphic functions on the disk: $f(z) = \sum_{n=0}^\infty a_nz^n$ is in $\mathcal{D}$ exactly if, with $\alpha = 1$, the following norm is finite:
$$(*)\qquad \|f\|_{\mathcal{D}}^2 = \sum_{n=0}^\infty(n+1)^\alpha|a_n|^2 \approx |f(0)|^2 + \frac{1}{\pi}\iint_{\mathbb{D}}|f'(z)|^2(1-|z|^2)^{1-\alpha}\,dxdy,$$
where $\approx$ denotes equivalence of norms.

We wrote the norm in this form to emphasize the analogy with the Hardy space: the parameter $\alpha$ in (*) highlights the close relationship between the two spaces, since with $\alpha = 0$ the formula describes the Hardy space norm. (With $\alpha = -1$ it describes the Bergman space norm.) The reproducing kernel of $\mathcal{D}$ is
$$k_z(w) = \frac{1}{\bar zw}\log\left(\frac{1}{1-\bar zw}\right).$$
These kernel functions, as well as the kernel functions for the Hardy space, have the property that the region where $k_z$ is relatively large is roughly the region between $z$ and the unit circle. More specifically, if $z = re^{i\theta}$ then the region where $k_z$ is large is, roughly, $T_z$, the intersection of the unit disk with a disk centered at $e^{i\theta}$ of radius $2(1-r)$. In particular the boundary value function has its mass concentrated near a particular point with a specific scale of dispersion. We will discuss the two parameter phase space described by position and scale further in the next section.

The Dirichlet space has not so far found a place in signal theory. We discuss it here because it helps illuminate the Hardy space and, truth be told, because the authors are very fond of it.

The operator $M_z$ of multiplication by $z$ acts boundedly on $\mathcal{D}$. This operator, called the Dirichlet shift, has the same action on the sequence of Taylor coefficients of a function as the Hardy space shift does for Hardy space functions: it shifts each entry of that sequence one place to the right. The shift on the Hardy space is isometric, and that is the starting point of an analysis which eventually leads to the theory of inner-outer factorization of functions and a characterization of the invariant subspaces of the shift operator acting on $H^2$. The analysis of the invariant subspaces of the shift operator on $\mathcal{D}$ is more complicated and less complete than for $H^2$.
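The kernel formula and the reproducing property can be verified on Taylor coefficients. The sketch below is an illustration with hypothetical function names; it assumes the coefficient form $\sum_n (n+1)^\alpha a_n\overline{b_n}$ of the inner product and nonzero points $z, w$ in the disk. For $\alpha = 1$ the kernel $k_z$ has Taylor coefficients $\bar z^n/(n+1)$ in the variable $w$, and pairing a polynomial with it returns the value of the polynomial at $z$.

```python
import numpy as np

def dirichlet_kernel_closed_form(z, w):
    """k_z(w) = (1 / (conj(z) w)) log(1 / (1 - conj(z) w)), for z, w nonzero."""
    x = complex(np.conj(z) * w)
    return np.log(1.0 / (1.0 - x)) / x

def dirichlet_kernel_series(z, w, terms=400):
    """The same kernel as the power series sum_n (conj(z) w)^n / (n + 1)."""
    x = complex(np.conj(z) * w)
    n = np.arange(terms)
    return np.sum(x ** n / (n + 1))

def weighted_inner_product(a, b, alpha):
    """Coefficient form sum_n (n + 1)^alpha a_n conj(b_n) of the inner product;
    alpha = 1 is the Dirichlet space and alpha = 0 the Hardy space."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    n = np.arange(len(a))
    return np.sum((n + 1.0) ** alpha * a * np.conj(b))

z, w = 0.3 + 0.4j, -0.2 + 0.5j
closed_form_value = dirichlet_kernel_closed_form(z, w)
series_value = dirichlet_kernel_series(z, w)

# A polynomial f with arbitrary Taylor coefficients a_0, ..., a_3.
coefficients = np.array([1.0, 2.0 - 1.0j, 0.5, 3.0j])
n = np.arange(len(coefficients))
kernel_coefficients = np.conj(z) ** n / (n + 1)   # Taylor coefficients of k_z

value_by_pairing = weighted_inner_product(coefficients, kernel_coefficients, 1.0)
value_direct = np.polyval(coefficients[::-1], z)
```

Replacing the weight exponent by $\alpha = 0$ and the kernel coefficients by $\bar z^n$ gives the corresponding check for the Hardy space kernel.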
The Dirichlet shift is bounded and it is easy to see that it has lots of invariant subspaces. In particular the structure of the invariant subspaces of finite codimension is exactly the same as for $H^2$; they are the subspaces of functions which vanish on a given finite point set. Some other properties of the shift invariant subspaces of $H^2$ which follow easily from Beurling's theorem are also true for spaces invariant under the Dirichlet shift, but with proofs that are less straightforward and more subtle. Two examples are the fact that any invariant subspace contains a bounded function and the fact that the intersection of any two nontrivial invariant subspaces contains a third.

There is not yet a description of the shift invariant subspaces of $\mathcal{D}$. In fact it is not yet known how to characterize the functions with the property that the smallest closed invariant subspace containing them is the whole space. For the Hardy space those functions are exactly the outer functions. For the Dirichlet space the functions must be Hardy space outer functions and the set on which their boundary values are zero must be a Dirichlet space null set. (A Hardy space function is the zero function if its boundary values are zero on a set of positive Lebesgue measure. The analogous statement for the Dirichlet space holds for smaller sets, those of logarithmic capacity zero.) It was conjectured by Brown and Shields in 1984 that those two conditions characterize the Dirichlet space analogs of outer functions.

Multiplication by the coordinate function is a bounded operator on $\mathcal{D}$ and it follows that multiplication by a polynomial is a bounded operator on $\mathcal{D}$. It is then natural to ask what are the multipliers of $\mathcal{D}$, the functions $b$ such that $M_b$, multiplication by $b$, is a bounded map of $\mathcal{D}$ into itself.
(Elements in a RKHS are functions on a set and hence there is a natural way to multiply two of them. The question of characterizing the multipliers makes sense on any RKHS.)

If $M_b$ is a bounded multiplier on the Dirichlet space then $b$ must be a bounded function; in fact the argument is the same as for the Hardy space multipliers, an argument that works for any RKHS. Also $b$ must be holomorphic. Those conditions, $b \in H^\infty$, are the full story for the Hardy space but not for the Dirichlet space. To see why not, select $f \in \mathcal{D}$ and consider the requirement that $bf \in \mathcal{D}$. By definition we must have that $(bf)' = b'f + bf'$ is square integrable. Because $f \in \mathcal{D}$ and $b$ is bounded, the second term is. Requiring the first term to be square integrable, for every $f \in \mathcal{D}$, leads to the definition of Carleson measure for $\mathcal{D}$.

A measure $\mu$ on $\mathbb{D}$ is a Carleson measure for $\mathcal{D}$ if
$$[\mu]_{CM(\mathcal{D})} = \sup_{f\in\mathcal{D}}\frac{\iint|f|^2\,d\mu}{\|f\|_{\mathcal{D}}^2} = \|\mathrm{Id}\|_{B(\mathcal{D}, L^2(\mu))}^2 < \infty.$$
We define $X$ to be the space of holomorphic functions $b$ defined on the disk such that $|b'|^2\,dxdy$ is a Carleson measure for $\mathcal{D}$. Considering $f = 1$ in the previous definition we see that $X \subset \mathcal{D}$.

Our analysis to this point shows that if $M_b$ is a bounded multiplication operator then $b \in X \cap H^\infty$. The argument is easily reversed and we have the full story.

Theorem 17. $M_b$ is a bounded multiplication operator on $\mathcal{D}$ if and only if $b \in X \cap H^\infty$.

Although this does not look like our description of bounded multiplication operators for the Hardy space, it is in fact very similar. Using the description of the Hardy space given by (*) with $\alpha = 0$ and then following the ideas in that section will lead to the conclusion that $M_b$ is a bounded multiplication operator on $H^2$ if and only if $b \in \mathrm{BMO} \cap H^\infty$, which is the analog of the previous theorem. However that last statement can be simplified because $H^\infty \subset \mathrm{BMO}$; the analogous simplification is not possible for the Dirichlet space because $H^\infty \not\subset X$.
Of course our understanding of the space $X$ is limited by how well we understand Dirichlet space Carleson measures. There are several known characterizations of those measures; some are measure theoretic ("local T1" conditions) and others are in terms of capacity. The functions in $\mathcal{D}$ are defined by a Sobolev norm, and capacity has a role in the study of Sobolev spaces somewhat similar to the role of measure theory in studying Lebesgue spaces. However even with those results the space $X$ and the Dirichlet space Carleson measures are much less well understood than their more classical cousins, BMO and "classical" Carleson measures.

Having gone this far with our analysis of multipliers for the Dirichlet space we can consider the analog of Pick's question: given a finite set of points in the disk, what are the necessary and sufficient conditions on a set of target values which ensure that there is a Dirichlet space multiplier of norm at most one which takes the target values at the points of the given set?

When we looked at the similar question in the Hardy space we started by showing that the kernel functions were eigenfunctions of the operator $M_b^*$, the adjoint of $M_b$, and the associated eigenvalues were the conjugates of the values of the multiplier at the given point set. This was enough to generate a condition involving finite matrices which was necessary in order for there to be a multiplier of the desired sort. That argument holds for any RKHS and the matrix produced this way is called the Pick matrix of the problem. Pick's theorem was that in the Hardy space the condition on the Pick matrix was also sufficient for a solution to the interpolation problem. It is now understood that there is a class of RKHS for which an analog of Pick's theorem holds, as well as a matricial version: spaces with the complete Pick property. In recent decades it has become clear that those RKHS have a very rich additional structure. One of the reasons for recent interest in the Dirichlet space is that it is one of the simplest spaces other than the Hardy space with this fundamental property.
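The Pick matrix condition described above can be tried out numerically. The sketch below treats the Hardy space case only, where the reproducing kernel is the Szegő kernel $1/(1 - z_i \bar z_j)$ and the Pick matrix has entries $(1 - w_i \bar w_j)/(1 - z_i \bar z_j)$. The function names are hypothetical, and positive semidefiniteness is tested through principal minors, which is only reasonable for a handful of points.

```python
from itertools import combinations


def det(m):
    """Determinant of a small complex square matrix (Gaussian elimination)."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1 + 0j
    for c in range(n):
        # Partial pivoting: pick the largest entry in column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) == 0:
            return 0j
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d


def pick_matrix(z, w):
    """Hardy space Pick matrix for nodes z (in the open disk) and targets w."""
    n = len(z)
    return [[(1 - w[i] * w[j].conjugate()) / (1 - z[i] * z[j].conjugate())
             for j in range(n)] for i in range(n)]


def is_positive_semidefinite(m, tol=1e-12):
    """A Hermitian matrix is PSD iff every principal minor is nonnegative."""
    n = len(m)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub).real < -tol:
                return False
    return True


def pick_solvable(z, w):
    """True when the Hardy space Pick condition holds for the data (z, w)."""
    return is_positive_semidefinite(pick_matrix(z, w))
```

For instance, the data $z = (0, 1/2)$, $w = (0, 1/4)$ are interpolated by $b(z) = z/2$, so the condition holds, while $w = (0, 9/10)$ would violate the Schwarz lemma and the condition fails. For another RKHS one would replace the Szegő kernel in `pick_matrix` by the kernel of that space; the condition is then always necessary, and sufficient when the space has the Pick property.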

On the Hardy space we considered the following bilinear Hankel form. Select a holomorphic symbol function $b$ and define the Hankel form on the Hardy space with symbol $b$ to be the bilinear form on $H^2$ given by, for $f, g \in H^2$,
$$H_b(f, g) = \langle fg, b \rangle_{H^2}.$$
We can define a Hankel form on the Dirichlet space for $f, g \in \mathcal{D}$ using the same formula but, of course, with the $\mathcal{D}$ inner product.

When we looked at Hankel forms on the Hardy space it was straightforward to see that if $|b'|^2 \, dxdy$ was a Hardy space Carleson measure then $H_b$ was bounded on the Hardy space. It then follows that having $b$ in $BMO$ will be a sufficient condition for boundedness. The same analysis shows that having $b$ in $\mathcal{X}$ is sufficient for $H_b$ to be bounded on the Dirichlet space. In fact, as with the Hardy space, that is the full story.

Theorem 18

The Dirichlet space Hankel form $H_b$ is bounded if and only if $b \in \mathcal{X}$.

(The definition of Hankel operators and forms for the Dirichlet space is a place where there is more than one natural extension of the Hardy space ideas. Emphasizing different analogies between the Dirichlet space and Hardy space can lead to the conjugate linear map from $\mathcal{D}$ to itself given by
$$f \mapsto \int P(b' \bar{f}) = \mathcal{H}_b f$$
as the natural generalization of Hankel operators to the Dirichlet space. (Here $P$ is the orthogonal projection associated with the Bergman space.) The condition $b \in \mathcal{X}$ is also necessary and sufficient for the boundedness of $\mathcal{H}_b$, and the proof of the easy half of the result is the same as for $H_b$. However the full proof is different.)

The proof of the Hardy space version of the previous theorem exploited the fact that every function in $H^1$ is the product of two functions in $H^2$ and the duality between $H^1$ and $BMO$. Starting with the previous theorem one can try to reverse those arguments to find out what the space $\mathcal{X}$ is the dual of. That leads to the notion of weakly factored spaces. We define the weakly factored space $\mathcal{D} \odot \mathcal{D}$ to be the space of those $f$ holomorphic on $\mathbb{D}$ for which
$$\|f\|_{\mathcal{D} \odot \mathcal{D}} = \inf \Big\{ \sum_j \|g_j\|_{\mathcal{D}} \|h_j\|_{\mathcal{D}} : \sum_j g_j h_j = f \Big\} < \infty.$$
A consequence of the previous theorem is the duality relation

Corollary 1 $(\mathcal{D} \odot \mathcal{D})^* = \mathcal{X}$.

Using the factorization of $H^1$ functions described in Lemma 1 it is straightforward to see that $H^1 = H^2 \odot H^2$. Hence the previous corollary is the Dirichlet space analog of Fefferman's classical $(H^1)^* = BMO$.

Using interpolation of Banach spaces, real or complex, it is possible to start from the spaces $H^1$ and $BMO$ and recover the full range of Hardy spaces $H^p$, $1 < p < \infty$, with the starting Hilbert space $H^2$ in the middle of the scale. Similarly one can construct the scale of spaces connecting $\mathcal{D} \odot \mathcal{D}$ and $\mathcal{X}$ which has the Hilbert space $\mathcal{D}$ in the middle. Very little is known about those spaces.

Let $T$ be the vertex set of the dyadic tree, which we choose to also call $T$. Thus $T$ is a connected, simply connected, rooted graph with two edges at the root vertex $o$ and three edges at all the other vertices. We put a partial order, $\preceq$, on the vertices by saying $\alpha \preceq \beta$ exactly if $\alpha$ is a vertex on the geodesic path connecting $o$ and $\beta$. For any $\beta \in T \setminus \{o\}$ we let $\beta^-$ be its predecessor, the maximal $\alpha$ such that $\alpha \preceq \beta$ and $\alpha \neq \beta$.

We use two functions, $I$ and $\Delta$, acting on functions defined on $T$:
$$If(\beta) = \sum_{o \preceq \alpha \preceq \beta} f(\alpha), \qquad \Delta f(\beta) = \begin{cases} 0 & \beta = o, \\ f(\beta) - f(\beta^-) & \text{otherwise.} \end{cases}$$
These operators are models for integration and differentiation. If $f$ is a function on $T$ with $f(o) = 0$ then $I \Delta f = \Delta I f = f$. We define the dyadic Dirichlet space, $\mathcal{D}_{dyad}$, to be the Hilbert space of functions $f$ defined on $T$ for which $\Delta f \in \ell^2(T)$. The space is normed by
$$\|f\|^2_{\mathcal{D}_{dyad}} = |f(o)|^2 + \|\Delta f\|^2_{\ell^2(T)}.$$
This space is a RKHS; the reproducing kernel for evaluation at $\alpha \in T$ is $k_\alpha = I(\chi_{[o,\alpha]})$.

$\mathcal{D}_{dyad}$ is a model for $\mathcal{D}$

One of the reasons for considering the space $\mathcal{D}_{dyad}$ is that it is a simple model for $\mathcal{D}$. The analogy is best understood by regarding $T$ as a point set in the unit disk.
Informally, the root is placed at the origin, and the $2^n$ vertices connected to the origin by geodesics of length $n$ are spaced evenly on the circle of radius $1 - 2^{-n}$. The edge between an $\alpha$ on that circle and its predecessor $\alpha^-$ is represented by an almost radial line segment connecting the two.

In this picture the values of an $f \in \mathcal{D}_{dyad}$ at points of the abstract tree are a model for the values of some unspecified function $\tilde{f} \in \mathcal{D}$. In fact, starting with any $g \in \mathcal{D}$ and restricting to the points of the realization of $T$ inside the disk will give an element of $\mathcal{D}_{dyad}$. Continuing the analogy, if $f \in \mathcal{D}_{dyad}$ then $\Delta f$ is a model for $\tilde{f}'$, and the fact that $\Delta f$ is required to be square summable models the fact that $\tilde{f}'$ must be square integrable. (Our view from great height is ignoring scaling: $\Delta f$ is actually a model of the invariant derivative $\delta f(z) = (1 - |z|^2) f'(z)$.)

The analogies just described are relatively superficial. More interesting is that the analogies extend to subtle aspects of the Dirichlet space theory. There are natural extensions of the definitions of multipliers, of Carleson measures, of Hankel forms, etc. from the Dirichlet space to the dyadic Dirichlet space. For all of the results we have discussed (and many others) the results for the two spaces are "the same", that is, they continue the pattern suggested by the analogy. Generally the proofs in the dyadic case are easier and sometimes those proofs provide road maps for the more difficult proofs for the classical space. Carleson measures are a particularly interesting case. The measure theoretic characterization of Carleson measures for $\mathcal{D}$ is most simply obtained by first solving the analogous problem in $\mathcal{D}_{dyad}$ and then using the fact mentioned before, that the restriction of functions in $\mathcal{D}$ produces functions in $\mathcal{D}_{dyad}$, to lift the result to $\mathcal{D}$.
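The operators $I$ and $\Delta$ on the dyadic tree, the reproducing kernel $k_\alpha = I(\chi_{[o,\alpha]})$, and the placement of the tree in the disk can be checked on a finite truncation of $T$. The following is a minimal sketch with several conventions that are assumptions made here, not taken from the text: vertices are encoded as binary strings with the root $o$ as the empty string, functions are real valued, $\Delta f(o)$ is taken to be $0$, and the angular positions in `disk_point` are one arbitrary way of spacing the points evenly.

```python
import cmath
import math
from itertools import product


def vertices(depth):
    """Vertices of the dyadic tree down to `depth`; the root o is ''."""
    return [''.join(bits)
            for n in range(depth + 1)
            for bits in product('01', repeat=n)]


def path(beta):
    """The geodesic [o, beta]: all alpha with o <= alpha <= beta."""
    return [beta[:k] for k in range(len(beta) + 1)]


def I(f):
    """Discrete integration: sum of f along the geodesic from the root."""
    return {b: sum(f[a] for a in path(b)) for b in f}


def Delta(f):
    """Discrete differentiation: f(beta) - f(predecessor), and 0 at the root."""
    return {b: (0.0 if b == '' else f[b] - f[b[:-1]]) for b in f}


def inner(f, g):
    """Inner product giving the norm |f(o)|^2 + ||Delta f||^2 (real case)."""
    df, dg = Delta(f), Delta(g)
    return f[''] * g[''] + sum(df[b] * dg[b] for b in f)


def kernel(alpha, verts):
    """Reproducing kernel k_alpha = I(indicator of the geodesic [o, alpha])."""
    chi = {b: (1.0 if alpha.startswith(b) else 0.0) for b in verts}
    return I(chi)


def disk_point(beta):
    """Place a vertex of depth n on the circle of radius 1 - 2^(-n).

    The choice of angles (midpoints of dyadic arcs) is an assumption.
    """
    n = len(beta)
    if n == 0:
        return 0j
    j = int(beta, 2)
    theta = 2 * math.pi * (j + 0.5) / 2 ** n
    return (1 - 2.0 ** (-n)) * cmath.exp(1j * theta)
```

With these conventions, `inner(g, kernel(alpha, verts))` returns `g[alpha]` for any function `g` on the truncated tree, which is the reproducing property, and `I(Delta(f))` agrees with `f` whenever `f['']` is zero. Restricting a function defined on the disk to the points `disk_point(beta)` gives a function on the tree in the sense described above.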
A number of interrelated ideas form the general category of phase space analysis. The RKHS we have discussed are in this category and the dyadic Dirichlet space is a particularly simple instance. We will say a few words about the general theme but, even by the standards of what has gone before, we will be very informal. Our main point is that some of the ideas we have seen here are instances of general themes.

Suppose we wanted to analyze a function $f$ in the Dirichlet space. We know there are reproducing kernels $\{k_z\}_{z \in \mathbb{D}}$ and hence $f(z) = \langle f, k_z \rangle$. We mentioned that reproducing kernels were a substitute for an orthonormal basis. If they were an orthogonal basis we would have a representation
$$f = \sum_z \left\langle f, \frac{k_z}{\|k_z\|} \right\rangle \frac{k_z}{\|k_z\|} \qquad (2)$$
but that is not true. A possible path forward is to replace the sum by an integral and hope for a representation
$$f = \int \langle f, k_z \rangle \, k_z \, d\mu(z). \qquad (3)$$
Here we have absorbed the normalizing factors into the measure but we are intentionally vague about the details. This does not hold, but a formula of this type is true for the Bergman space ("Bergman reproducing formula") and in a number of spaces of interest in quantum theory ("coherent state representations"). Another way to try to go forward is to use a subset of the $\{k_z\}$ and obtain a summation formula of the type (2), for instance using only those $z$ which correspond to the vertices of $T$. That set is still not an orthogonal basis but it is close enough so that (2), while not literally true, is a good enough approximation, both analytically and conceptually, to be a useful starting point. That fact is the heart of the relation between $\mathcal{D}_{dyad}$ and $\mathcal{D}$.
It is also the starting point for obtaining representations of functions in various function spaces as linear combinations of reproducing kernels associated with points in a set such as $T$.

When we discussed the Hardy space there were different viewpoints; Hardy space functions can be viewed as holomorphic functions in the disk or as boundary value functions on the circle, and it is possible to pass back and forth between those viewpoints with no loss of information. The same is true for many other spaces of functions on the disk. Consider now how that interacts with formulas such as (2) and (3) and their various refinements. We could start with a boundary function $f_{boundary}$, pass to the associated function inside the disk, $f_{inside}$, use the analytical tools to represent $f_{inside}$ as a sum or integral of simple pieces, and then pass back to the boundary function. This would realize $f_{boundary}$ as a sum (or integral) of boundary values of a set of well understood functions. If the coefficient corresponding to $z$ in the representations is built by taking the inner product of $f_{inside}$ with some $h_z$, a function concentrated on the set $T_z$ we introduced earlier, then it will be mainly responsive to the values of $f_{inside}$ inside $T_z$ and hence presumably to the values of $f_{boundary}$ near the part of the unit circle cut off by $T_z$. Furthermore the boundary values of the function in the representation, perhaps again $h_z$, will also be concentrated on that same interval. In sum, the representation of a function on the boundary uses analysis and reconstruction tools parametrized by two real parameters. The parameters can be understood as position and scale, the center of the boundary interval and its length, and those parameters form points in "phase space".
For the Hardy space the points $z = re^{i\theta}$ parameterize the disk $\mathbb{D}$, which is the phase space; $re^{i\theta}$ is the complex parameter describing the interval on the circle with center $e^{i\theta}$ and radius $1 - r$.

Without examples the previous paragraph is idle talk. However there are examples. Many RKHS of holomorphic functions in one and several complex variables fit this pattern, or they do after minor modifications. The Bergman spaces are fundamental examples. Also there is an important class of examples not related to holomorphic functions. It is possible to start with a general function on the circle, or on the line, or on $n$-space and form an associated phase space, a space of one higher dimension whose new coordinate is scale. There are systematic ways to extend a function $f$ on the space to a function $f_{inside}$ defined on the phase space, to introduce functions $\{k_\zeta\}$ for $\zeta$ in the phase space, and to proceed exactly as described. With the appropriate details filled in, the result is an exact formula in the style of (3). The functions $k_\zeta$ are each associated with a point in phase space and their boundary values, their traces on the starting space, are concentrated in the associated ball, the ball whose center and radius are the coordinates in phase space. In fact all this can be done with the $k_\zeta$ all translates and dilates of a single function, a "mother wavelet". The resulting formula is the "Calderón reproducing formula" or the "continuous wavelet transform". There is a striking refinement of these ideas. It is possible to arrange the details so that the set of normalized $k_\zeta$ with $\zeta$ in a discrete subset of phase space, shaped like $T$, is an orthonormal basis of the Lebesgue space of the starting manifold. In that case there is a discrete representation, a formula of the form (2) for representing any function. In that formula the coefficients and the summands, the analysis and the reconstruction, respect the description of the function in terms of the phase space parameters of location and scale.
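A finite-dimensional illustration of such a discrete representation is sketched below. It uses the Haar system on vectors of length $2^n$ as one concrete orthonormal family indexed by scale and location; the text does not single out a particular wavelet, so this choice, like the function names, is an assumption made for illustration.

```python
import math


def haar_analysis(x):
    """Orthonormal Haar coefficients of a list whose length is a power of two.

    Returns (c, details), where c is the single coarsest average coefficient
    and details[j] holds the 2**j coefficients at scale j (coarsest first),
    so the index set has the shape of a truncated dyadic tree.
    """
    details = []
    approx = list(x)
    while len(approx) > 1:
        half = len(approx) // 2
        s = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.insert(0, d)  # coarser scales end up earlier in the list
        approx = s
    return approx[0], details


def haar_synthesis(c, details):
    """Rebuild the signal from its Haar coefficients (inverse of haar_analysis)."""
    approx = [c]
    for d in details:
        finer = []
        for a, b in zip(approx, d):
            finer.append((a + b) / math.sqrt(2))
            finer.append((a - b) / math.sqrt(2))
        approx = finer
    return approx
```

Each coefficient in `details[j][k]` depends only on the samples in the `k`-th dyadic block at scale `j`, which is the location and scale bookkeeping described above. Because the family is orthonormal, `haar_synthesis(*haar_analysis(x))` recovers `x` up to rounding, and the sum of squares of the coefficients equals the sum of squares of the samples.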
The resulting formula is the "wavelet representation" of the function which is fundamental in large areas of signal analysis.

Further reading

• A lovely and quick introduction to some of the topics we have discussed is the self-contained, expository article
John McCarthy, Pick's theorem - what's the big deal?, American Mathematical Monthly, Vol. 110 No. 1 (2003), 36-45,
where in a few pages the route from the Hardy space to control theory to Pick's theory is covered.

• The pure mathematician who wants to painlessly understand what signal theory and the related control theory are about can watch the old, but clear and enjoyable, 1987 MIT lectures of Alan Oppenheim,
https://ocw.mit.edu/resources/res-6-007-signals-and-systems-spring-2011/video-lectures/
where some surprisingly effective practical applications are shown.

• A very nice introduction to $H^\infty$ [...]

• A largely overlapping body of knowledge, but from the viewpoint of the pure mathematician, is in the monograph
Jonathan R. Partington, Linear operators and linear systems: An analytical approach to control theory, Cambridge University Press, 2004,
which also works as a comprehensive introduction to Hardy space theory.

• An excellent survey (with proofs) on Hankel operators and Nehari theory is
Vladimir Peller, An Excursion into the Theory of Hankel Operators, Holomorphic Spaces, MSRI Publications Volume 33, 1998,
which can be found here: http://mathscinet.ru/files/PellerV.pdf

• An excellent, self-contained, and easy to read monograph on reproducing kernel Hilbert spaces and Pick theory, also providing an introduction to Hardy space theory, is
Jim Agler, John McCarthy, Pick Interpolation and Hilbert Function Spaces, American Mathematical Society, 2002.

• The discourse on Nehari, Hankel, Toeplitz, Hilbert transform, and BMO is the subject of the short and dense
Donald Sarason, Function Theory on the Unit Disc, Virginia Polytechnic Institute and State University, 1978.

• To move deeper in hard-analysis Hardy space theory, our standard reference is still
John Garnett, Bounded analytic functions, Springer, Revised 1st ed. 2007.

• A standard text of Functional Analysis which is fully adequate for the subject is
Peter Lax, Functional Analysis, Wiley, 2002.

• A chapter on the Paley-Wiener space, with a thorough discussion of sampling results (which are crucial in applications to engineering), is in
Kristian Seip, Interpolation and Sampling in Spaces of Analytic Functions, American Mathematical Society, 2004.

• There are two recent monographs on the Dirichlet space:
Omar El-Fallah, Karim Kellay, Javad Mashreghi, Thomas Ransford, A primer on the Dirichlet Space, Cambridge Tracts in Mathematics, 2014,
and
Nicola Arcozzi, Richard Rochberg, Eric T. Sawyer, Brett D. Wick, The Dirichlet Space and Related Function Spaces, American Mathematical Society, 2019.
The former develops the theory from a classical point of view, the latter from the viewpoint of Reproducing Kernel Hilbert Spaces.

• An excellent way to become acquainted with time-frequency analysis is
Ingrid Daubechies, Ten lectures on Wavelets, SIAM, 1994,
by one of the pioneers of wavelet theory.

• Specific operators on specific Hilbert function spaces can "model" general classes of operators acting on Hilbert spaces. This line of investigation has one of its milestones in:
B. Sz. Nagy and C. Foias, Harmonic Analysis of Operators on Hilbert Space. VIII + 387 S. Budapest/Amsterdam/London 1970. North Holland Publishing Company.