Using Householder Matrices to Establish Mixing Test Critical Values

Abstract

A measure-preserving dynamical system can be approximated by a Markov shift with a bistochastic matrix. This leads to using empirical stochastic matrices to measure and estimate properties of stirring protocols. Specifically, the second largest eigenvalue can be used to statistically decide if a stirring protocol is weak-mixing, ergodic, or nonergodic. Such hypothesis tests require appropriate probability distributions. In this paper, we propose using Monte Carlo empirical probability distributions from unistochastic matrices to establish critical values. These unistochastic matrices arise from randomly constructed Householder matrices.


Aaron Carl [email protected]

Department of Mathematics, University of Central Florida, 4000 Central Florida Blvd, P.O. Box 161364, Orlando, FL 32816-1364, (407) 823-0538, (407) 823-6253 (fax)


AMS 2000 subject classifications: Primary 37A25, 62P30; secondary 37A05.

Keywords and phrases: weak-mixing, measure-preserving, stirring protocol.

arXiv [math.DS], Oct. imsart-generic ver. 2011/11/15 file: UsingHouseholder.tex date: November 2, 2018

1. Introduction

If a dynamical system has a probability measure and the system is measure-preserving, then partitioning the domain into $n$ states of equal measure leads to a Markov shift whose measure is defined by a bistochastic matrix and the length $n$ row vector

$$\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right). \tag{1.1}$$

This partition approximation is called Ulam's method [21]. After partitioning the space, data point movement from one iteration of the function provides a stochastic matrix that approximates the bistochastic matrix from Ulam's method. Hence a Markov shift with an empirical stochastic matrix and the $1/n$ row vector approximates the dynamical system [see [17, Chapter 1, Chapter 9] for procedure and convergence rate].

We may model a stirring protocol's effect on a compression-resistant fluid with a measure-preserving dynamical system. In this paper we are interested in discrete iterations of a stirring protocol where the fluid at the beginning is the same fluid at the end. We 'look' at the fluid before and after stirring, but not during.

Properties of an empirical Markov shift can measure and evaluate properties of a measure-preserving dynamical system [8, 10, 13]. The second largest eigenvalue of an empirical stochastic matrix arising from Ulam's method may be used to statistically decide if a measure-preserving dynamical system is weak-mixing, ergodic, or nonergodic [6, 7, 9, 14, 17]. To statistically test if the dynamical system is ergodic, we need to have some knowledge of

$$P\left(|\widehat{\lambda} - 1| > k : \lambda = 1\right); \tag{1.2}$$

to statistically test if the dynamical system is weak-mixing, we need to have some knowledge of

$$P\left(|\widehat{\lambda}| > k : |\lambda| = 1\right). \tag{1.3}$$

We use $\lambda$ to denote the second largest eigenvalue of a bistochastic matrix arising from Ulam's method, and $\widehat{\lambda}$ denotes the second largest eigenvalue of a corresponding empirical stochastic matrix (bistochastic and unistochastic matrices will be defined shortly). The utility of $\widehat{\lambda}$ as a test statistic arises directly from the relationship between stochastic matrix eigenvalues and the ergodic and mixing properties of Markov shifts. Since the closed unit disk contains all eigenvalues of stochastic matrices, there are no reasonable probability distributions of $\widehat{\lambda}$ with 1 as the mean or median of either $\widehat{\lambda}$ or $|\widehat{\lambda}|$. So for hypothesis testing, we should use a probability distribution that has significant mass near $\widehat{\lambda} = 1$ or $|\widehat{\lambda}| = 1$.

In this paper, we show that it is reasonable to approximate the conditional probability distributions with Monte Carlo probability distributions when the equal-measure partition sets are small. These Monte Carlo probability distributions are constructed using randomly generated Householder matrices.

Stirring protocols of compression-resistant fluids, such as chocolate and water, provide examples of nearly measure-preserving dynamical systems. The need for confidence in the mixing of food items and in the mixing of pharmaceuticals highlights the utility of such probability distributions.

We propose the following construction: use randomly generated, nonzero, independent, identically distributed real numbers to generate Householder matrices; take products of permutation matrices with the Householder matrices; then square the magnitude of the products' entries to get unistochastic matrices. From these unistochastic matrices, construct a Monte Carlo approximation of a desired probability distribution. From the Monte Carlo probability distribution, establish the critical value for rejecting the null hypothesis. The primary focus of this paper is to establish a method for determining hypothesis test critical values. Deciding which specific probability distribution to use in a hypothesis test depends on properties of the dynamical system; we will only show that the presented Monte Carlo methods are reasonable and leave probability distribution selection for the future.

There are several ways to use Monte Carlo methods to generate bistochastic matrices; unfortunately, many techniques lead to empirical probability distributions where the central tendency of $\widehat{\lambda}$ is close to zero [17, Chapter 12]. Such distributions provide little utility for a weak-mixing, ergodic, or nonergodic hypothesis test. The methods presented here lead to Householder matrices that, in a Frobenius norm sense, are likely to be close to the identity matrix. Squaring the magnitude of entries from these unitary matrices gives unistochastic matrices. If we want a unistochastic matrix close to a particular permutation matrix, we may multiply the Householder matrix by the desired permutation matrix. The main advantage of this method is that it provides probability distributions based on observed unistochastic matrices.

2. Bistochastic Matrices

Definition 2.1. An $n \times n$ bistochastic matrix is a stochastic matrix whose transpose is also a stochastic matrix.

By the Birkhoff-von Neumann theorem, the set of $n \times n$ bistochastic matrices forms a convex set with permutation matrices as extreme points. We refer to this set as Birkhoff's polytope [2, 3]. Bistochastic matrices are also referred to as doubly stochastic.

Definition 2.2. An $n \times n$ bistochastic matrix is called unistochastic if each entry is equal to the squared magnitude of the corresponding entry of some unitary matrix.

The set of $n \times n$ unistochastic matrices forms a proper subset of Birkhoff's polytope [2, page 307, section 1]. Since the set of unistochastic matrices is a proper subset, the proposed method should only be used when the Ulam's method bistochastic matrix is approximately unistochastic.

Definition 2.3. An $n \times n$ Householder matrix is of the form

$$H = I - 2\vec{v}\vec{v}^{\,*} \tag{2.1}$$

where $\vec{v}$ is a unit vector.

Every Householder matrix is a unitary matrix [11, Chapter 5]. Since the set of $n \times n$ unitary matrices is closed under multiplication, taking the squared magnitude of the entries of a Householder matrix-permutation matrix product results in a unistochastic matrix.
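The construction in this section can be checked numerically. The following is a minimal sketch, assuming NumPy and a real unit vector; the helper names `householder` and `unistochastic_from` are illustrative and not part of the paper.

```python
import numpy as np

def householder(v):
    """Return H = I - 2 v v^T for a real vector v (normalized internally)."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)

def unistochastic_from(U):
    """Square the magnitude of each entry of a unitary matrix U."""
    return np.abs(U) ** 2

rng = np.random.default_rng(0)           # seed chosen only for reproducibility
H = householder(rng.standard_normal(5))  # random real Householder matrix
Q = np.eye(5)[rng.permutation(5)]        # random permutation matrix
M = unistochastic_from(Q @ H)

print(np.allclose(H @ H.T, np.eye(5)))   # H is orthogonal, hence unitary
print(M.sum(axis=0), M.sum(axis=1))      # column sums and row sums of M
```

Both sum vectors should consist of ones up to rounding, which is the bistochastic property of Definition 2.1.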

3. Modeling Dynamical Systems

Consider running a stirring protocol on a compression-resistant fluid. Let's model this with a measure-preserving dynamical system $(D, \mathcal{B}, \mu, f)$, where

1. $D$ represents the compression-resistant fluid,
2. $\mathcal{B}$ is the Borel $\sigma$-algebra,
3. $\mu$ is rescaled Lebesgue measure so that $\mu(D) = 1$,
4. $f : D \to D$ models fluid movement during stirring.

The Monte Carlo method we will outline uses $n \times n$ unistochastic matrices arising from Householder matrix-permutation matrix products. These unistochastic matrices are close to the permutation matrices when $n$ is large. It is reasonable to use the described Monte Carlo distribution for hypothesis testing when the dynamical system has the following property: for any $A, B \in \mathcal{B}$ where $P(A \cap B) = 0$, if $f$ is perturbed so that

$$P(f(x) \in A \mid x \in B) \text{ increases (decreases)}, \tag{3.1}$$

then for all Borel sets $C \subseteq B^c$,

$$P(f(x) \in A \mid x \in C) \text{ decreases (increases) proportionally}. \tag{3.2}$$

The $1/n$ row vector and unistochastic matrices arising from Householder matrix-permutation matrix products provide Markov shifts that reflect Ulam's method with an equal measure partition applied to such dynamical systems. This is not saying that all such dynamical systems lead to unistochastic matrices, but that the Householder constructed unistochastic matrices reflect these properties.

Let $\vec{v}$ be a real unit vector and $H$ the corresponding Householder matrix,

$$\vec{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad H = I - 2\vec{v}\vec{v}^{\,t}. \tag{3.3}$$

Then squaring the entries of $H$ gives a unistochastic matrix $M$,

$$m_{ij} = \begin{cases} (1 - 2v_i^2)^2 & \text{if } i = j, \\ 4v_i^2 v_j^2 & \text{if } i \neq j. \end{cases} \tag{3.4}$$

If $D_1, D_2, \ldots, D_n$ are our equal measure partition sets, and $M$ arose from Ulam's method, then the entries of $M$ provide conditional probabilities,

$$m_{ij} = P(f(x) \in D_j \mid x \in D_i). \tag{3.5}$$

Increasing (decreasing) $v_i$ leads to nearly proportional decreases (increases) in

$$m_{ij} = P(f(x) \in D_j \mid x \in D_i) \quad \text{when } i \neq j. \tag{3.6}$$

Because of these observations, we propose using unistochastic Ulam matrices arising from squaring the entries of real Householder matrix-permutation matrix products to model weak-mixing stirring protocols of such dynamical systems.
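As a quick check of the entry formula for a real Householder matrix, the sketch below compares the entrywise square of $H = I - 2\vec{v}\vec{v}^{\,t}$ with the closed form $(1 - 2v_i^2)^2$ on the diagonal and $4v_i^2v_j^2$ off the diagonal. The particular vector is an arbitrary illustration, and NumPy is assumed.

```python
import numpy as np

# Arbitrary example vector, normalized to a unit vector.
v = np.array([0.1, 0.2, 0.3, 0.4])
v = v / np.linalg.norm(v)

# Entrywise square of the Householder matrix.
H = np.eye(4) - 2.0 * np.outer(v, v)
M = H ** 2

# Closed form: 4 v_i^2 v_j^2 off the diagonal, (1 - 2 v_i^2)^2 on it.
M_formula = 4.0 * np.outer(v**2, v**2)
np.fill_diagonal(M_formula, (1.0 - 2.0 * v**2) ** 2)

print(np.allclose(M, M_formula))
print(M.sum(axis=1))  # each row is a conditional probability distribution
```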

4. Mixing Hypothesis Test Procedure

In this section we will discuss and outline our procedure for testing a compression-resistant fluid stirring protocol.

There are many techniques to measure a stirring protocol's ability to mix, such as decay of correlations [5], Fourier analysis, Artin braid patterns [1, 20], chaotic advection [18, 19], and other topological methods [4, 12]. Unfortunately, it is typical for different protocols to lend themselves to different analytical methods. Thus comparing mixing quality between mechanically dissimilar stirring protocols is difficult. The primary advantage of the Ulam method approximation is that it can be used to evaluate any incompressible fluid stirring pattern. This allows one to compare and evaluate the mixing of stirring protocols by comparing and evaluating eigenvalues. The main disadvantage is that the method is statistical and does not prove the results. Another significant advantage of our method is that it requires only one iteration of stirring, in contrast to other techniques that call for iterated experiments.

Since our method only approximates the dynamical system, we make no inferences regarding strong-mixing when we conclude that the stirring protocol is weak-mixing. If the protocol is not ergodic, then it is not weak-mixing. If the protocol is not weak-mixing, then it is not strong-mixing.

Our test hypotheses are

1. $H_0$: $(D, \mathcal{B}, \mu, f)$ is not ergodic (and hence not weak-mixing).
2. $H_{a1}$: $(D, \mathcal{B}, \mu, f)$ is ergodic but not weak-mixing.
3. $H_{a2}$: $(D, \mathcal{B}, \mu, f)$ is weak-mixing (and hence ergodic).

We partition the fluid into connected, equal volume regions, and use these partition sets to generate a new $\sigma$-algebra contained in $\mathcal{B}$. If our data strongly indicate that the stirring protocol is weak-mixing or ergodic over the generated $\sigma$-algebra, we will conclude the same about the original dynamical system. If the stirring protocol is nonmixing or nonergodic over the generated $\sigma$-algebra, then the original dynamical system is nonmixing or nonergodic. The procedure evaluates stirring over a smaller $\sigma$-algebra, thus the test is inherently more reliable for detecting if a protocol is nonmixing or nonergodic.

The null hypothesis is that the stirring protocol is nonergodic. It is better to reject a protocol that mixes well than to produce poorly mixed product. With this null hypothesis, the repercussions of a testing error are as follows:

1. Type I Error (rejecting a true null hypothesis): produce a product that is insufficiently mixed.
2. Type II Error (failing to reject a false null hypothesis): discard a desirable stirring protocol for a different stirring method.

The stirring protocol's purpose determines the tolerable risks of error and the number of partition states. If poorly mixed fluid would have only minor consequences, or small-scale mixing is not a concern, then the number of partition regions may be relatively small. If poorly mixed fluid could result in severe consequences, then the number of partition regions must be large and the partition volume small. For example, poorly mixed batter from a kitchen could result in unpalatable food; poorly mixed pharmaceuticals with a low LD50 could lead to overdose and death. A mixing test for a kitchen could use a relatively coarse partition, while a pharmaceutical company would use a fine partition.

If we know an upper bound for the stirring protocol's entropy, call it $h$, then Froyland's entropy estimate and expected values show that the number of states should be greater than $e^h$ [8].

Data point movement from one iteration of stirring leads to our empirical stochastic matrix, $\widehat{P}$. The proportion of the points starting in region $i$ that end in region $j$ gives us $\widehat{p}_{ij}$. We model the entries of $\widehat{P}$ as nonindependent binomial random variables, whose probabilities come from the Ulam stochastic matrix. Our test statistic is $\widehat{\lambda} = \lambda(\widehat{P})$. Some dynamical systems have measure zero sets with atypical properties. In an attempt to avoid such difficulties, we randomly select data points rather than select points from a grid.

If the data points are independent and uniformly distributed within each region, then the empirical matrix will converge to a bistochastic matrix in a Frobenius norm sense (the proof of this follows from extending a standard Monte Carlo argument [16]). We approximate the stirring protocol with a one-sided Markov shift, $\left(\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right), \widehat{P}\right)$. The pair will not define a Markov shift if

$$\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)\widehat{P} \neq \left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right). \tag{4.1}$$

The $1/n$ row vector is a stationary distribution for any bistochastic matrix. Since Ulam's method partitions our fluid into $n$ equal volume regions, it is reasonable to use the $1/n$ vector as the stationary distribution. If we do not use equal volume partitions, the stationary distribution will be the probability vector corresponding to the rescaled volume of each region. Because many of the results regarding convergence, expected values, convergence rates, etc. depend on equal measure partitions, we should use equal volume regions if appropriate [17].

Mixing Hypothesis Test Procedure:

1. Set the type I error significance levels for both alternative hypotheses, $\alpha_1$ and $\alpha_2$.
2. Set $n$ to be the number of partition regions.
3. Decide which conditional probability distribution(s) for $|\lambda(P)| = 1$ and $\lambda(P) = 1$ to use.
4. Establish the critical values for $H_{a1}$ and $H_{a2}$, $c_1$ and $c_2$. The purpose of this paper is to propose using Householder matrix-permutation matrix products to estimate $c_1$ and $c_2$.
5. Partition the fluid into $n$ connected equal volume regions, $D_1, D_2, \ldots, D_n$.
6. Randomly select data points in each partition region. These points should be independent and uniformly distributed.
7. Run the stirring protocol one time.
8. Use data point movement between regions to construct an empirical stochastic matrix, $\widehat{P}$.
9. Determine the hypothesis test result. The test statistic is $\lambda(\widehat{P})$; compare $|\lambda(\widehat{P}) - 1|$ to $c_1$; compare $|\lambda(\widehat{P})|$ to $c_2$.
10. Use Froyland's entropy estimate to estimate the dynamical system's entropy,
$$-\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \widehat{p}_{ij}\log\widehat{p}_{ij} \tag{4.2}$$
(we define $0 \log 0$ to equal 0).
11. If the null hypothesis is rejected in favor of weak-mixing, let the rate at which
$$\binom{N}{n-1}\left(\lambda(\widehat{P})\right)^{N-n+1} \to 0 \quad \text{as } N \to \infty \tag{4.3}$$
be our estimate of the rate of mixing.
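Steps 8 through 10 of the procedure can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, region labels are integers $0, \ldots, n-1$, every region contains at least one starting point, "second largest" is taken to mean second largest in modulus, and the toy data are made up.

```python
import numpy as np

def empirical_matrix(start, end, n):
    """Row-normalized counts of moves from region start[k] to region end[k]."""
    counts = np.zeros((n, n))
    np.add.at(counts, (start, end), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def second_largest_eigenvalue(P):
    """Eigenvalue of P with the second largest modulus (assumed ordering)."""
    eig = np.linalg.eigvals(P)
    return eig[np.argsort(-np.abs(eig))[1]]

def froyland_entropy(P):
    """-(1/n) * sum_ij p_ij log p_ij, with 0 log 0 taken to be 0."""
    n = P.shape[0]
    logs = np.log(P, where=P > 0, out=np.zeros_like(P))
    return -(P * logs).sum() / n

# Toy data: 3 regions, 4 tracked points per region (made-up movements).
start = np.repeat(np.arange(3), 4)
end = np.array([0, 0, 1, 2,  1, 1, 2, 0,  2, 2, 0, 1])

P_hat = empirical_matrix(start, end, 3)
lam_hat = second_largest_eigenvalue(P_hat)
print(P_hat)
print(abs(lam_hat), froyland_entropy(P_hat))
```

In practice `start` and `end` would come from locating each tracked point's region before and after one run of the stirring protocol.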

5. Constructing the Monte Carlo Matrices

Let $n$ (5.1) be the number of equal measure states that we partition the measure-preserving dynamical system into while using Ulam's method [see [17, Chapter 1] for the procedure]. Let

$$\{u_1, u_2, \ldots, u_n\} \tag{5.2}$$

be real independent, identically distributed random variables such that $u_i \neq 0$ almost surely, and

$$E\left(u_i^{-4}\right),\ E\left(u_i^{4}\right),\ E\left(u_i^{-8}\right),\ E\left(u_i^{8}\right) < \infty. \tag{5.3}$$

Let

$$\vec{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \qquad \vec{v} = \frac{\vec{u}}{|\vec{u}|}. \tag{5.4}$$

We may use a unit vector to construct a Householder matrix. Let $H = (h_{ij})$ be the Householder matrix corresponding to $\vec{v}$,

$$H = I - 2\vec{v}\vec{v}^{\,T}. \tag{5.5}$$

The entries of $H$ are

$$h_{ij} = \begin{cases} 1 - \dfrac{2u_i^2}{u_1^2 + u_2^2 + \cdots + u_n^2} & \text{if } i = j, \\[2ex] -\dfrac{2u_i u_j}{u_1^2 + u_2^2 + \cdots + u_n^2} & \text{if } i \neq j. \end{cases} \tag{5.6}$$

Let $Q$ be a permutation matrix that we want our random unistochastic matrix to be proximal to. Set $U = QH$. Now, let $M = (m_{ij})$ be the matrix defined by

$$m_{ij} = u_{ij}^2, \tag{5.7}$$

where $u_{ij}$ denotes the $(i,j)$ entry of $U$. Since $H$ is a Householder matrix and $Q$ is a permutation matrix, $U$ is a unitary matrix. It follows that $M$ is unistochastic. Notice that, up to the row permutation induced by $Q$,

$$m_{ij} = \begin{cases} \left(1 - \dfrac{2u_i^2}{u_1^2 + u_2^2 + \cdots + u_n^2}\right)^2 & \text{if } i = j, \\[2ex] \dfrac{4u_i^2 u_j^2}{\left(u_1^2 + u_2^2 + \cdots + u_n^2\right)^2} & \text{if } i \neq j. \end{cases} \tag{5.8}$$

Since $u_i \neq 0$ almost surely for all $i$, all entries of $M$ are positive almost surely; by the Perron-Frobenius theorem, all but one of $M$'s eigenvalues are of magnitude strictly less than one [15, Chapter 8]. It follows that any Markov shift with stationary distribution

$$\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) \tag{5.9}$$

will be strong-mixing (for Markov shifts, weak-mixing is equivalent to strong-mixing). Our hypothesis test for weak-mixing (ergodic) requires a probability distribution over $[0, 1]$ (the unit circle) with significant mass near 1. In the next two sections, we will see that the expected values of $M$'s eigenvalues converge to one as $n$ goes to infinity.

How does our dynamical system relate to $n$? Generally speaking, finer partitions are more apt to detect nonmixing (nonergodicity). If we are confident in mixing, we will use a coarse partition to reduce effort; if our confidence in mixing is poor, we will use a fine partition.

By the Birkhoff-von Neumann theorem, bistochastic matrices are convex combinations of permutation matrices [2, 3]. Our unistochastic matrices will tend to be near the 'corners' of the set of bistochastic matrices. For any statistic of unistochastic matrices we are interested in, we may use such matrices to generate a Monte Carlo empirical probability distribution.
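The Perron-Frobenius consequence described in this section can be observed numerically. The sketch below is illustrative only: it assumes NumPy, draws the $u_i$ from a uniform distribution on $(0.5, 1.5)$ (an arbitrary choice that keeps $u_i \neq 0$ and all required moments finite), and takes $Q$ to be a cyclic shift; `monte_carlo_matrix` is a hypothetical helper name.

```python
import numpy as np

def monte_carlo_matrix(u, Q):
    """Square the entries of Q H, where H = I - 2 v v^T and v = u / |u|."""
    v = u / np.linalg.norm(u)
    H = np.eye(u.size) - 2.0 * np.outer(v, v)
    return (Q @ H) ** 2

rng = np.random.default_rng(1)        # seed chosen only for reproducibility
n = 6
u = rng.uniform(0.5, 1.5, size=n)     # assumed distribution, nonzero almost surely
Q = np.roll(np.eye(n), 1, axis=0)     # cyclic shift permutation matrix
M = monte_carlo_matrix(u, Q)

# Eigenvalue moduli in decreasing order: one equals 1, the rest are below 1.
moduli = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
print((M > 0).all())
print(moduli)
```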

6. Establishing Critical Values

In this section, we will outline the procedure we propose for establishing critical values for a weak-mixing, ergodic, nonergodic hypothesis test.

Our test hypotheses are

1. $H_0$: $(D, \mathcal{B}, \mu, f)$ is not ergodic (and hence not weak-mixing).
2. $H_{a1}$: $(D, \mathcal{B}, \mu, f)$ is ergodic but not weak-mixing.
3. $H_{a2}$: $(D, \mathcal{B}, \mu, f)$ is weak-mixing (and hence ergodic).

After partitioning the space into $n$ equal measure connected subsets, Ulam's method approximates the dynamical system with a Markov shift. We will approximate the bistochastic matrix defining the Markov shift's measure, $P$, with an empirical stochastic matrix, $\widehat{P}$. So we approximate

$$(D, \mathcal{B}, \mu, f) \quad \text{with} \quad \left(\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right), \widehat{P}\right). \tag{6.1}$$

Remark 6.1. The pair

$$\left(\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right), \widehat{P}\right) \tag{6.2}$$

will not define a Markov shift if

$$\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)\widehat{P} \neq \left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right), \tag{6.3}$$

but if our data points are uniform random variables within each state, then

$$E\left(\|P - \widehat{P}\|_F\right) \to 0 \tag{6.4}$$

as the minimum number of points in a state goes towards infinity. It follows that for each eigenvalue

$$|\lambda_i(P) - \lambda_i(\widehat{P})| \to 0 \text{ in probability } \forall i \tag{6.5}$$

in the Hausdorff topology when our data points are uniform random variables within each state and the minimum number of points in a state goes towards infinity [17, Chapter 8].

Our test statistic is the second largest eigenvalue of $\widehat{P}$. Let $\alpha_1, \alpha_2 \in (0, 1)$ be the alpha values for the hypothesis test; let $c_1, c_2 \in (0, 1)$ be the corresponding critical values, $c_2 \leq 1 - c_1$,

$$P\left(|\lambda(\widehat{P}) - 1| \geq c_1 : \lambda(P) = 1\right) < \alpha_1, \tag{6.6}$$

$$P\left(|\lambda(\widehat{P})| \leq c_2 : |\lambda(P)| = 1\right) < \alpha_2. \tag{6.7}$$

Our goal is to use Householder matrices to estimate $c_1$ and $c_2$. The probability distribution used to establish $c_1$, $c_2$ should reflect properties of a class of dynamical systems containing our stirring protocol.

Argand diagram of mixing hypothesis test criteria:

1. If $\lambda(\widehat{P})$ is in the region containing 1 (yellow), fail to reject the null hypothesis; conclude that the dynamical system is nonergodic.
2. If $\lambda(\widehat{P})$ is in the outer region away from 1 (green), reject the null hypothesis in favor of the first alternative hypothesis; conclude that the dynamical system is ergodic, but not weak-mixing.
3. If $\lambda(\widehat{P})$ is in the center region (red), reject the null hypothesis in favor of the second alternative hypothesis; conclude that the dynamical system is weak-mixing (and hence ergodic).

Establishing Critical Values for the Test:

1. Partition $D$ into $n$ equal measure connected subsets. If an upper bound of the dynamical system's entropy is known, call the upper bound $h$, set $n$ greater than $e^h$ [8].
2. Select a random variable with which to construct unit vectors. Let $u_1, u_2, \ldots, u_n$ be independent, identically distributed random variables, $\vec{v}_i = \dfrac{\vec{u}}{\|\vec{u}\|}$.
3. Set $N \in \mathbb{N}$ so that our empirical probability distributions will be sufficiently accurate.
4. Select permutation matrices $\{Q_i\}_{i=1}^{N}$ near which we want the probability distribution to have significant mass.
5. Randomly generate $N$ Householder matrices, $H_i = I - 2\vec{v}_i\vec{v}_i^{\,T}$. Then square the entries of $Q_i H_i$ to get the matrix $M_i$.
6. Use $\{\lambda(M_i)\}_{i=1}^{N}$ to approximate $P(|\lambda(\widehat{P}) - 1| \geq k : \lambda(P) = 1)$. Use the approximation to estimate $c_1$.
7. Use $\{|\lambda(M_i)|\}_{i=1}^{N}$ to approximate $P(|\lambda(\widehat{P})| < k : |\lambda(P)| = 1)$. Use the approximation to estimate $c_2$.
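The steps above can be sketched as a short Monte Carlo routine. Everything specific here is an assumption made for illustration: standard normal $u_i$, $n = 20$, $N = 500$, $\alpha_1 = \alpha_2 = 0.05$, identity permutations for the ergodic critical value, uniformly random permutations for the weak-mixing critical value, "second largest" meaning second largest modulus, and empirical quantiles as the estimates of $c_1$ and $c_2$.

```python
import numpy as np

def second_largest_eigenvalue(M):
    """Eigenvalue of M with the second largest modulus (assumed ordering)."""
    eig = np.linalg.eigvals(M)
    return eig[np.argsort(-np.abs(eig))[1]]

def simulate_eigenvalues(n, N, rng, random_Q):
    """Second largest eigenvalues of N matrices built from squared entries of Q H."""
    lams = np.empty(N, dtype=complex)
    for k in range(N):
        u = rng.standard_normal(n)            # assumed distribution for the u_i
        v = u / np.linalg.norm(u)
        H = np.eye(n) - 2.0 * np.outer(v, v)  # Householder matrix
        Q = np.eye(n)[rng.permutation(n)] if random_Q else np.eye(n)
        lams[k] = second_largest_eigenvalue((Q @ H) ** 2)
    return lams

rng = np.random.default_rng(2)                # seed chosen only for reproducibility
n, N = 20, 500
alpha1 = alpha2 = 0.05

# Ergodic test: Q = I has eigenvalue 1 with multiplicity at least two.
lam_ergodic = simulate_eigenvalues(n, N, rng, random_Q=False)
c1 = np.quantile(np.abs(lam_ergodic - 1.0), 1.0 - alpha1)

# Weak-mixing test: uniformly random permutation matrices.
lam_mixing = simulate_eigenvalues(n, N, rng, random_Q=True)
c2 = np.quantile(np.abs(lam_mixing), alpha2)
print(c1, c2)
```

The quantiles only approximate the strict inequalities that define the critical values, and the sketch does not enforce $c_2 \leq 1 - c_1$; both points would need attention in a real application.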

7. Matrix Convergence

In this section, we will show that $M_i$ from the previous section converges to the permutation matrix $Q_i$ as $n$ increases. Since permutation matrix eigenvalues are on the unit circle, the second largest eigenvalue of one of our unistochastic matrices, viewed as a random variable, is likely to have magnitude near one. Because of the likely proximity to one, it is reasonable to use a probability distribution from such an eigenvalue to establish critical values for our weak-mixing, ergodic, nonergodic hypothesis test.

Our proofs take advantage of the Frobenius norm. After using Jensen's inequality to remove the square root from consideration, finding an expected value upper bound is similar to finding a second moment. A permutation matrix acting on a matrix does not change the magnitude of the entries, so without loss of generality we prove the results when $Q$ is the identity matrix and focus on $M$'s convergence to the identity matrix.

Proposition 7.1. If $M$ is a matrix constructed in Section 5 with $Q = I$, $n \in \{3, 4, 5, \ldots\}$, $\{u_i\}_{i=1}^{n}$ are identically distributed, $u_i \neq 0$ a.s. and

$$E(u_i^{4}),\ E(u_i^{8}),\ E\left(u_i^{-4}\right),\ E\left(u_i^{-8}\right) < \infty, \tag{7.1}$$

then $E(\|M - I\|_F) \to 0$ as $n \to \infty$. Moreover,

$$E\left(\|M - I\|_F^2\right) \leq \frac{16n}{(n-1)^4} E\left(u_i^{-8}\right) E\left(u_i^{8}\right) \tag{7.2}$$
$$\qquad + \frac{16n}{(n-1)^2} E\left(u_i^{-4}\right) E\left(u_i^{4}\right) \tag{7.3}$$
$$\qquad + \frac{16n(n-1)}{(n-2)^4} E\left(u_i^{-8}\right) \left(E\left(u_i^{4}\right)\right)^2. \tag{7.4}$$

Proof. First, by Jensen's inequality

$$E(\|M - I\|_F) \leq \sqrt{E\left(\|M - I\|_F^2\right)}. \tag{7.6}$$

So it is sufficient to show the second part of the proposition. Let's look at the entries of $M - I$; by computation we see that

$$(M - I)_{ij} = \begin{cases} -\dfrac{4u_i^2}{u_1^2 + \cdots + u_n^2}\left(1 - \dfrac{u_i^2}{u_1^2 + \cdots + u_n^2}\right) & \text{if } i = j, \\[2ex] \dfrac{4u_i^2 u_j^2}{\left(u_1^2 + \cdots + u_n^2\right)^2} & \text{if } i \neq j. \end{cases} \tag{7.7}$$

It follows that

$$(M - I)_{ij}^2 = \begin{cases} \dfrac{16u_i^8}{\left(u_1^2 + \cdots + u_n^2\right)^4} - \dfrac{32u_i^6}{\left(u_1^2 + \cdots + u_n^2\right)^3} + \dfrac{16u_i^4}{\left(u_1^2 + \cdots + u_n^2\right)^2} & \text{if } i = j, \\[2ex] \dfrac{16u_i^4 u_j^4}{\left(u_1^2 + \cdots + u_n^2\right)^4} & \text{if } i \neq j. \end{cases} \tag{7.8}$$

If we expand the addends and remove the negative terms, it follows that almost surely

$$\|M - I\|_F^2 < \sum_{i=1}^{n} \frac{16u_i^8}{\left(u_1^2 + \cdots + u_n^2\right)^4} \tag{7.9}$$
$$\qquad + \sum_{i=1}^{n} \frac{16u_i^4}{\left(u_1^2 + \cdots + u_n^2\right)^2} \tag{7.10}$$
$$\qquad + \sum_{i \neq j} \frac{16u_i^4 u_j^4}{\left(u_1^2 + \cdots + u_n^2\right)^4}. \tag{7.11}$$

Almost surely, all of the terms in the denominators are positive; if we subtract terms from the denominators, we get an upper bound on the fractions. Thus almost surely

$$\|M - I\|_F^2 < \sum_{i=1}^{n} \frac{16u_i^8}{\left(u_1^2 + \cdots + u_n^2 - u_i^2\right)^4} \tag{7.12}$$
$$\qquad + \sum_{i=1}^{n} \frac{16u_i^4}{\left(u_1^2 + \cdots + u_n^2 - u_i^2\right)^2} \tag{7.13}$$
$$\qquad + \sum_{i \neq j} \frac{16u_i^4 u_j^4}{\left(u_1^2 + \cdots + u_n^2 - u_i^2 - u_j^2\right)^4}. \tag{7.14}$$

Notice that the numerator and denominator in each fraction in the upper bound are independent.

The subtraction of terms in the denominators removed some positive terms, so the denominators are still sums of positive terms. Therefore, we may use the harmonic-arithmetic means inequality,

$$\|M - I\|_F^2 < 16\sum_{i=1}^{n} \frac{1}{(n-1)^8}\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2}\right)^4 u_i^8 \tag{7.15}$$
$$\qquad + 16\sum_{i=1}^{n} \frac{1}{(n-1)^4}\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2}\right)^2 u_i^4 \tag{7.16}$$
$$\qquad + 16\sum_{i \neq j} \frac{1}{(n-2)^8}\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2} - u_j^{-2}\right)^4 u_i^4 u_j^4. \tag{7.17}$$

Now let's take expected values; since the $u_i$'s are independent,

$$\frac{E\left(\|M - I\|_F^2\right)}{16} \leq \sum_{i=1}^{n} \frac{E\left(\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2}\right)^4\right) E\left(u_i^8\right)}{(n-1)^8} \tag{7.18}$$
$$\qquad + \sum_{i=1}^{n} \frac{E\left(\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2}\right)^2\right) E\left(u_i^4\right)}{(n-1)^4} \tag{7.19}$$
$$\qquad + \sum_{i \neq j} \frac{E\left(\left(u_1^{-2} + \cdots + u_n^{-2} - u_i^{-2} - u_j^{-2}\right)^4\right) E\left(u_i^4\right) E\left(u_j^4\right)}{(n-2)^8}. \tag{7.20}$$

Next we use Minkowski's inequality and the fact that the $u_i$'s are identically distributed,

$$E\left(\|M - I\|_F^2\right) \leq \frac{16n}{(n-1)^4} E\left(u_i^{-8}\right) E\left(u_i^{8}\right) \tag{7.21}$$
$$\qquad + \frac{16n}{(n-1)^2} E\left(u_i^{-4}\right) E\left(u_i^{4}\right) \tag{7.22}$$
$$\qquad + \frac{16n(n-1)}{(n-2)^4} E\left(u_i^{-8}\right) \left(E\left(u_i^{4}\right)\right)^2. \tag{7.23}$$

Since $E(u_i^{-8})$, $E(u_i^{-4})$, $E(u_i^{4})$, and $E(u_i^{8})$ are all finite, it follows that $E(\|M - I\|_F^2) \to 0$ as $n \to \infty$. Hence by Jensen's inequality, $E(\|M - I\|_F) \to 0$ as $n \to \infty$.

Since the second largest eigenvalue gives a test statistic to decide if a measure-preserving dynamical system is weak-mixing, ergodic, or nonergodic, we need a conditional probability distribution of eigenvalues to conduct hypothesis tests. To statistically test if a measure-preserving dynamical system is weak-mixing, we could randomly select permutation matrices

$$\{Q_k\}_{k=1}^{N} \tag{7.27}$$

and generate $\{M_k\}_{k=1}^{N}$ with our Householder method, then use the empirical probability distribution from

$$\{|\lambda(M_k)|\}_{k=1}^{N} \tag{7.28}$$

to establish the critical value for the weak-mixing hypothesis test.

To statistically test if a measure-preserving dynamical system is ergodic, we could randomly select permutation matrices

$$\{Q_k : \text{the multiplicity of } \lambda = 1 \text{ is at least two}\}_{k=1}^{N} \tag{7.29}$$

and use Householder matrices to generate $\{M_k\}_{k=1}^{N}$, then use the empirical probability distribution from

$$\{\lambda(M_k)\}_{k=1}^{N} \tag{7.30}$$

to establish a critical value for the ergodic hypothesis test.
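The convergence of $M$ to the identity can be illustrated by simulation. The sketch below estimates $E(\|M - I\|_F)$ for several $n$ with $Q = I$. It assumes NumPy and standard normal $u_i$, which is the case treated in the next section rather than the moment conditions of this one; the trial count and the values of $n$ are arbitrary, and `mean_frobenius_distance` is a hypothetical helper name.

```python
import numpy as np

def mean_frobenius_distance(n, trials, rng):
    """Monte Carlo estimate of E||M - I||_F for Q = I and standard normal u_i."""
    total = 0.0
    for _ in range(trials):
        u = rng.standard_normal(n)
        v = u / np.linalg.norm(u)
        M = (np.eye(n) - 2.0 * np.outer(v, v)) ** 2
        total += np.linalg.norm(M - np.eye(n), "fro")
    return total / trials

rng = np.random.default_rng(3)   # seed chosen only for reproducibility
for n in (5, 20, 80, 320):
    print(n, mean_frobenius_distance(n, 200, rng))
```

The printed averages should shrink as $n$ grows, in line with the proposition.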

8. Using Speciﬁc Random Variables

In this section, we find more precise upper bounds for specific random variables. These upper bounds give better estimates of convergence rate than the results in the previous section. The first two proofs in this section start out the same way as the first proof in the previous section, then the arguments take advantage of the distribution properties.

Let's find a more precise upper bound when the $u_i$'s are independent standard normal random variables. The proof is similar to the first convergence proof; the difference is that we take advantage of the relationship between normal random variables and $\chi^2$-distributions.

Proposition 8.1. If the $u_i$'s in the construction of a unistochastic matrix are independent standard normal random variables and $n \in \{11, 12, 13, \ldots\}$, then

$$E\left(\|M - I\|_F^2\right) < \frac{1680\,n}{(n-3)(n-5)(n-7)(n-9)} \tag{8.1}$$
$$\qquad + \frac{48\,n}{(n-3)(n-5)} \tag{8.2}$$
$$\qquad + \frac{144\,n(n-1)}{(n-4)(n-6)(n-8)(n-10)}. \tag{8.3}$$

Proof. From the previous proof, we know that almost surely

$$\|M - I\|_F^2 < \sum_{i=1}^{n} \frac{16u_i^8}{\left(u_1^2 + \cdots + u_n^2 - u_i^2\right)^4} \tag{8.4}$$
$$\qquad + \sum_{i=1}^{n} \frac{16u_i^4}{\left(u_1^2 + \cdots + u_n^2 - u_i^2\right)^2} \tag{8.5}$$
$$\qquad + \sum_{i \neq j} \frac{16u_i^4 u_j^4}{\left(u_1^2 + \cdots + u_n^2 - u_i^2 - u_j^2\right)^4}. \tag{8.6}$$

Since the $u_i$'s are independent standard normal random variables, we may replace the sums of squares in the denominators with $\chi^2$-random variables when we take expected values,

$$E\left(\|M - I\|_F^2\right) < 16\sum_{i=1}^{n} E\left(\gamma_{n-1}^{-4}\right) E\left(u_i^8\right) \tag{8.7}$$
$$\qquad + 16\sum_{i=1}^{n} E\left(\gamma_{n-1}^{-2}\right) E\left(u_i^4\right) \tag{8.8}$$
$$\qquad + 16\sum_{i \neq j} E\left(\gamma_{n-2}^{-4}\right) E\left(u_i^4\right) E\left(u_j^4\right). \tag{8.9}$$

We use $\gamma_j$ to denote a $\chi^2$-random variable with $j$ degrees of freedom. Evaluating the expected values with $E(u_i^4) = 3$, $E(u_i^8) = 105$, $E(\gamma_j^{-2}) = \frac{1}{(j-2)(j-4)}$, and $E(\gamma_j^{-4}) = \frac{1}{(j-2)(j-4)(j-6)(j-8)}$, it follows that

$$E\left(\|M - I\|_F^2\right) < \frac{1680\,n}{(n-3)(n-5)(n-7)(n-9)} \tag{8.10}$$
$$\qquad + \frac{48\,n}{(n-3)(n-5)} \tag{8.11}$$
$$\qquad + \frac{144\,n(n-1)}{(n-4)(n-6)(n-8)(n-10)}. \tag{8.12}$$

Now let's consider gamma random variables. A finite sum of independent gamma random variables with the same scale parameter is a new gamma random variable with the same scale parameter, but the shape parameter is the sum of the addend shape parameters. In the next proof, we look at independent and identically distributed gamma random variables.

Proposition 8.2.

If the u i ’s in the construction of our unistochastic matrixare independent Γ( α, β ) random variables and α + 2 < n , then E ( (cid:107) M − I (cid:107) F ) < n (cid:81) i =0 ( α + i ) (cid:81) i =1 [( n − α − i ] (8.13)+ n (cid:81) i =0 ( α + i ) (cid:81) i =1 [( n − α − i ] (8.14)+ n ( n − (cid:81) i =0 ( α + i ) (cid:81) i =1 [( n − α − i ] . (8.15) Proof.

Previously we showed that almost surely (cid:107) M − I (cid:107) F < n (cid:88) i =1 16( u + u + ... + u n − u i ) u i (8.16)+ n (cid:88) i =1 16( u + u + ... + u n − u i ) u i (8.17)+ (cid:88) i (cid:54) = j u + u + ... + u n − u i − u j ) u i u j . (8.18)Using the Cauchy-Schwarz inequality and the fact that 0 < u i almost surelyfor all i , we get (cid:107) M − I (cid:107) F < n (cid:88) i =1 16 n ( u + u + ... + u n − u i ) u i (8.19)+ n (cid:88) i =1 16 n ( u + u + ... + u n − u i ) u i (8.20)+ (cid:88) i (cid:54) = j n ( u + u + ... + u n − u i − u j ) u i u j . (8.21)If we take expected values, and take advantage of the independent and iden-tically distributed u i ’s, imsart-generic ver. 2011/11/15 file: UsingHouseholder.tex date: November 2, 2018 E ( (cid:107) M − I (cid:107) F )

Assume true for $n$. Using the fact that interchanging any two rows or any two columns of a real matrix changes the sign of the determinant, we see that
\begin{align}
\det(D_{n+1}) &= \alpha \det(D_n) - \beta n \det(S_n) \quad \text{and} \tag{8.33} \\
\det(S_{n+1}) &= \beta \det(D_n) - \beta n \det(S_n). \tag{8.34}
\end{align}
Using the induction hypothesis, these equations become
\begin{align}
\det(D_{n+1}) &= \alpha \left( (\alpha - \beta)^{n-1} (\alpha + (n-1)\beta) \right) - \beta n \left( (\alpha - \beta)^{n-1} \beta \right) \quad \text{and} \tag{8.35} \\
\det(S_{n+1}) &= \beta \left( (\alpha - \beta)^{n-1} (\alpha + (n-1)\beta) \right) - \beta n \left( (\alpha - \beta)^{n-1} \beta \right). \tag{8.36}
\end{align}
Factoring out the $(\alpha - \beta)$ terms gives us the results: $\det(D_{n+1}) = (\alpha - \beta)^{n} (\alpha + n\beta)$ and $\det(S_{n+1}) = (\alpha - \beta)^{n} \beta$.

Proposition 8.4. If $M$ is the $n \times n$ matrix
\begin{equation}
M = \left( \frac{n-4}{n} \right) I + \begin{pmatrix} \frac{4}{n^2} & \cdots & \frac{4}{n^2} \\ \vdots & \ddots & \vdots \\ \frac{4}{n^2} & \cdots & \frac{4}{n^2} \end{pmatrix}, \tag{8.37}
\end{equation}
then $\det(M) = \left( \frac{n-4}{n} \right)^{n-1}$, $\operatorname{trace}(M) = \frac{(n-2)^2}{n}$, and the Jordan canonical form of $M$ is the diagonal matrix with entries $\left( 1, \frac{n-4}{n}, \ldots, \frac{n-4}{n} \right)$.

Proof.

The trace of $M$ follows from the definition. The determinant follows from the previous lemma by setting $\alpha = \frac{(n-2)^2}{n^2}$ and $\beta = \frac{4}{n^2}$.

Now, the matrix is a symmetric real matrix; hence it is diagonalizable. The eigenvalues follow from the previous lemma by setting $\alpha = \frac{(n-2)^2}{n^2} - \lambda$ and $\beta = \frac{4}{n^2}$ to get the characteristic polynomial of $M$.

Since $M$ is diagonalizable, if the Markov shift $((1/n, 1/n, \ldots, 1/n), M)$ approximates a measure-preserving dynamical system that is mixing, the estimate of the mixing rate is the rate at which
\begin{equation}
\left( \frac{n-4}{n} \right)^{N} \to 0 \quad \text{as } N \to \infty, \tag{8.38}
\end{equation}
instead of the estimate given in [17, Chapter 4],
\[
\binom{N}{n-1} \left( \frac{n-4}{n} \right)^{N-n+1} \to 0 \quad \text{as } N \to \infty.
\]
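The determinant identity from the previous lemma can be checked exactly on a small case. The sketch below assumes that $D_n$ denotes the $n \times n$ matrix with $\alpha$ on the diagonal and $\beta$ elsewhere, so that $\det(D_n) = (\alpha - \beta)^{n-1}(\alpha + (n-1)\beta)$. As an illustrative choice (an assumption, not taken from the text), $\alpha$ and $\beta$ are the entries of the entrywise square of the Householder matrix of the constant unit vector $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$. Function names are hypothetical, and exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row interchange flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def constant_diagonal_matrix(n, alpha, beta):
    """n x n matrix with alpha on the diagonal and beta elsewhere."""
    return [[alpha if i == j else beta for j in range(n)] for i in range(n)]


def closed_form_det(n, alpha, beta):
    """Lemma's formula: (alpha - beta)^(n-1) * (alpha + (n-1) * beta)."""
    return (alpha - beta) ** (n - 1) * (alpha + (n - 1) * beta)


n = 6
# Illustrative assumption: squared entries of I - (2/n) * (all-ones matrix).
alpha = Fraction((n - 2) ** 2, n ** 2)
beta = Fraction(4, n ** 2)
M = constant_diagonal_matrix(n, alpha, beta)

det_M = det(M)
trace_M = sum(M[i][i] for i in range(n))
second_eigenvalue = alpha - beta  # eigenvalue of multiplicity n - 1
print(det_M, closed_form_det(n, alpha, beta), trace_M, second_eigenvalue)
```

Substituting $\alpha - \lambda$ for $\alpha$ in the same formula gives the characteristic polynomial, whose roots are $\alpha + (n-1)\beta$ (once) and $\alpha - \beta$ (with multiplicity $n - 1$).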

9. Two Region Partitions

There are few instances of interest where one would use our method with two partition regions. We look at this special case as an example to help develop understanding. A potential application is equal ratio mixing of items with minimal consequences of poor mixing, such as combining blends of coffee. When combining two equal volumes of coffee, poor mixing would result in inconsistent taste. Only the most serious baristas would say that inconsistent cup-of-Joe flavor is worse than poorly mixing pharmaceuticals.

Say that our unit vector is
\begin{equation}
\vec{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}. \tag{9.1}
\end{equation}
Since the vector has norm one, we may write its Householder matrix as
\begin{equation}
H = \begin{bmatrix} 1 - 2v_1^2 & \pm 2 v_1 \sqrt{1 - v_1^2} \\ \pm 2 v_1 \sqrt{1 - v_1^2} & 2v_1^2 - 1 \end{bmatrix}. \tag{9.2}
\end{equation}
There are two possible doubly-stochastic matrices arising from our method,
\begin{equation}
\begin{bmatrix} (1 - 2v_1^2)^2 & 4v_1^2 (1 - v_1^2) \\ 4v_1^2 (1 - v_1^2) & (1 - 2v_1^2)^2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 4v_1^2 (1 - v_1^2) & (1 - 2v_1^2)^2 \\ (1 - 2v_1^2)^2 & 4v_1^2 (1 - v_1^2) \end{bmatrix}, \tag{9.3}
\end{equation}
whose characteristic polynomials are
\begin{equation}
(1 - \lambda)(8v_1^4 - 8v_1^2 + 1 - \lambda) \quad \text{and} \quad (1 - \lambda)(-8v_1^4 + 8v_1^2 - 1 - \lambda). \tag{9.4}
\end{equation}
The second largest eigenvalue depends on $v_1$. Let's graph the relationship.

[Figure: graphs of the relationship between $v_1$ and $\lambda$ when $n = 2$.]

When $n = 2$, the relationship between $v_1$ and the second largest eigenvalue defines a function from $[-1, 1]$ to $[-1, 1]$ [...]
\begin{equation}
P(|\widehat{\lambda} - 1| > k : \lambda = 1) \quad \text{and} \quad P(|\widehat{\lambda}| > k : |\lambda| = 1). \tag{9.5}
\end{equation}
If $v_1$ is a beta random variable with parameters $\alpha$ and $\beta$, then
\begin{align}
E(\lambda) &= \pm \frac{8\, \alpha(\alpha+1)(\alpha+2)(\alpha+3)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)(\alpha+\beta+3)} \tag{9.6} \\
&\quad \mp \frac{8\, \alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} \pm 1, \tag{9.7}
\end{align}
where the sign of the addends depends on the permutation matrix used.

References

[1]

G. Band and P. Boyland, The Burau estimate for the entropy of a braid, Algebr. Geom. Topol., 7 (2007), pp. 1345–1378.

[2] I. Bengtsson, Å. Ericsson, M. Kuś, W. Tadej, and K. Życzkowski, Birkhoff's polytope and unistochastic matrices, N = 3 and N = 4, Comm. Math. Phys., 259 (2005), pp. 307–324.

[3] G. Birkhoff, Three observations on linear algebra, Univ. Nac. Tucumán. Revista A., 5 (1946), pp. 147–151.

[4] P. Boyland and J. Harrington, The entropy efficiency of point-push mapping classes on the punctured disk, Arxiv preprint arXiv:1103.1829, (2011).

[5] G. Casati, G. Comparin, and I. Guarneri, Decay of correlations in certain hyperbolic systems, Phys. Rev. A, 26 (1982), pp. 717–719.

[6] M. Dellnitz, G. Froyland, and S. Sertl, On the isolated spectrum of the Perron-Frobenius operator, Nonlinearity, 13 (2000), pp. 1171–1188.

[7] J. Ding and A. Zhou, Finite approximations of Frobenius-Perron operators. A solution of Ulam's conjecture to multi-dimensional transformations, Phys. D, 92 (1996), pp. 61–68.

[8] G. Froyland, Using Ulam's method to calculate entropy and other dynamical invariants, Nonlinearity, 12 (1999), pp. 79–101.

[9] G. Froyland, On Ulam approximation of the isolated spectrum and eigenfunctions of hyperbolic maps, Discrete Contin. Dyn. Syst., 17 (2007), pp. 671–689 (electronic).

[10] G. Froyland and K. Aihara, Ulam formulae for random and forced systems, Thinking, 1, p. 1.

[11] G. H. Golub and C. F. Van Loan, Matrix computations, Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, third ed., 1996.

[12] J. Harrington, Topological efficiency of stirring with obstacles, PhD thesis, University of Florida, 2011.

[13] F. Y. Hunt, Unique ergodicity and the approximation of attractors and their invariant measures using Ulam's method, Nonlinearity, 11 (1998), pp. 307–317.

[14] T. Y. Li, Finite approximation for the Frobenius-Perron operator. A solution to Ulam's conjecture, J. Approximation Theory, 17 (1976), pp. 177–186.

[15] C. Meyer, Matrix analysis and applied linear algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000. With 1 CD-ROM (Windows, Macintosh and UNIX) and a solutions manual (iv+171 pp.).

[16] C. Robert and G. Casella, Introducing Monte Carlo Methods with R, Use R!, Springer, 2009.

[17] A. C. Smith, Using Ulam's method to test for mixing, ProQuest LLC, Ann Arbor, MI, 2010. Thesis (Ph.D.)–University of Florida.

[18] M. A. Stremler, Fluid mixing, chaotic advection, and microarray analysis, in Analysis and control of mixing with an application to micro and macro flow processes, vol. 510 of CISM Courses and Lectures, SpringerWienNewYork, Vienna, 2009, pp. 323–337.

[19] M. A. Stremler, F. R. Haselton, and H. Aref, Designing for chaos: applications of chaotic advection at the microscale, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 362 (2004), pp. 1019–1036.

[20] J.-L. Thiffeault and M. D. Finn, Topology, braids and mixing in fluids, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 364 (2006), pp. 3251–3266.

[21] S. M. Ulam, Problems in modern mathematics, Science Editions John Wiley & Sons, Inc., New York, 1964.