The Kadison-Singer Problem for Strongly Rayleigh Measures and Applications to Asymmetric TSP
Nima Anari∗   Shayan Oveis Gharan†

Abstract
Marcus, Spielman, and Srivastava in their seminal work [MSS13b] resolved the Kadison-Singer conjecture by proving that for any set of finitely supported, independently distributed random vectors $v_1, \dots, v_n$ which have "small" expected squared norm and are in isotropic position (in expectation), there is a positive probability that the sum $\sum_i v_i v_i^\intercal$ has small spectral norm. Their proof crucially employs real stability of polynomials, which is the natural generalization of real-rootedness to multivariate polynomials.

Strongly Rayleigh distributions are families of probability distributions whose generating polynomials are real stable [BBL09]. As independent distributions are just special cases of strongly Rayleigh measures, it is a natural question whether the main theorem of [MSS13b] can be extended to families of vectors assigned to the elements of a strongly Rayleigh distribution. In this paper we answer this question affirmatively: we show that for any homogeneous strongly Rayleigh distribution whose marginal probabilities are upper bounded by $\epsilon_1$, and any isotropic set of vectors assigned to the underlying elements whose norms are at most $\sqrt{\epsilon_2}$, there is a set in the support of the distribution such that the spectral norm of the sum of the natural quadratic forms of the vectors assigned to the elements of that set is at most $O(\epsilon_1 + \epsilon_2)$.

We employ our theorem to provide a sufficient condition for the existence of spectrally thin trees. This, together with a recent work of the authors [AO14], provides an improved upper bound on the integrality gap of the natural LP relaxation of the Asymmetric Traveling Salesman Problem.

1 Introduction

Marcus, Spielman and Srivastava [MSS13b] in a breakthrough work proved the Kadison-Singer conjecture [KS59] by proving Weaver's conjecture $KS_2$ [Wea04] and Akemann and Anderson's Paving conjecture [AA91]. The following is their main technical contribution.

Theorem 1.1. If $\epsilon > 0$ and $v_1, \dots$
$, v_m$ are independent random vectors in $\mathbb{R}^d$ with finite support such that
\[ \sum_{i=1}^m \mathbb{E}\big[v_i v_i^\intercal\big] = I, \]
and for all $i$, $\mathbb{E}\,\|v_i\|^2 \le \epsilon$, then
\[ \mathbb{P}\left[\Big\|\sum_{i=1}^m v_i v_i^\intercal\Big\| \le (1+\sqrt{\epsilon})^2\right] > 0. \]

∗Computer Science Division, UC Berkeley. Email: [email protected].
†Department of Computer Science and Engineering, University of Washington. This work was partly done while the author was a postdoctoral Miller fellow at UC Berkeley. Email: [email protected].

In this paper, we prove an extension of the above theorem to families of vectors assigned to the elements of a not necessarily independent distribution. Let $\mu : 2^{[m]} \to \mathbb{R}_+$ be a probability distribution on the subsets of the set $[m] = \{1, 2, \dots, m\}$. In particular, we assume that $\mu(\cdot)$ is nonnegative and
\[ \sum_{S \subseteq [m]} \mu(S) = 1. \]
We assign a multiaffine polynomial in the variables $z_1, \dots, z_m$ to $\mu$,
\[ g_\mu(z) = \sum_{S \subseteq [m]} \mu(S) \cdot z^S, \]
where for a set $S \subseteq [m]$, $z^S = \prod_{i\in S} z_i$. The polynomial $g_\mu$ is also known as the generating polynomial of $\mu$. We say $\mu$ is a homogeneous probability distribution if $g_\mu$ is a homogeneous polynomial, and we say that $\mu$ is a strongly Rayleigh distribution if $g_\mu$ is a real stable polynomial; see Subsection 2.2 for the definition of real stability. Strongly Rayleigh measures were introduced and deeply studied in the seminal work of Borcea, Brändén and Liggett [BBL09]. They are natural generalizations of product distributions and cover several interesting families of probability distributions, including determinantal measures and random spanning tree distributions. We refer interested readers to [OSS11, PP14] for applications of these probability measures.

Our main theorem extends Theorem 1.1 to families of vectors assigned to the elements of a strongly Rayleigh distribution. This can be seen as a generalization because independent distributions are special classes of strongly Rayleigh measures.
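As a concrete illustration (our own sketch, not from the paper), the generating polynomial of a small strongly Rayleigh measure can be written down explicitly, and the marginal probability of each element can be read off by differentiating $g_\mu$ at the all-ones point. The example below uses the uniform spanning tree distribution of a triangle, a standard homogeneous strongly Rayleigh measure:

```python
import sympy as sp

# Uniform spanning-tree distribution of a triangle on edges {1, 2, 3}:
# the three spanning trees are {1,2}, {1,3}, {2,3}, each with probability 1/3.
z1, z2, z3 = sp.symbols('z1 z2 z3')
g = sp.Rational(1, 3) * (z1 * z2 + z1 * z3 + z2 * z3)  # generating polynomial

# Marginal of element i: the partial derivative of g at z1 = z2 = z3 = 1.
marginals = [sp.diff(g, z).subs({z1: 1, z2: 1, z3: 1}) for z in (z1, z2, z3)]

# Each edge lies in 2 of the 3 spanning trees, so every marginal is 2/3.
assert marginals == [sp.Rational(2, 3)] * 3
```

Note that $g_\mu$ is homogeneous of degree 2 here, matching the fact that every sample (spanning tree) has exactly two edges.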
To state the main theorem we need another definition. The marginal probability of an element $i$ with respect to a probability distribution $\mu$ is the probability that $i$ is in a sample of $\mu$,
\[ \mathbb{P}_{S\sim\mu}[i \in S] = \partial_{z_i} g_\mu(z) \big|_{z_1=\dots=z_m=1}. \tag{1} \]

Theorem 1.2 (Main). Let $\mu$ be a homogeneous strongly Rayleigh probability distribution on $[m]$ such that the marginal probability of each element is at most $\epsilon_1$, and let $v_1,\dots,v_m \in \mathbb{R}^d$ be vectors in isotropic position,
\[ \sum_{i=1}^m v_i v_i^\intercal = I, \]
such that for all $i$, $\|v_i\|^2 \le \epsilon_2$. Then,
\[ \mathbb{P}_{S\sim\mu}\left[\Big\|\sum_{i\in S} v_i v_i^\intercal\Big\| \le 4(\epsilon_1+\epsilon_2) + 2(\epsilon_1+\epsilon_2)^2\right] > 0. \]

The above theorem does not directly generalize Theorem 1.1, but it can be seen as a variant of Theorem 1.1 for the case where the vectors $v_1,\dots,v_m$ are negatively dependent. We expect to see several applications of our main theorem that are not realizable by the original proof of [MSS13b]. In the following subsections we describe our main motivation for studying the above statement, which is to design approximation algorithms for the Asymmetric Traveling Salesman Problem (ATSP). Let us conclude this part by proving a simple application of the above theorem: a proof of Weaver's conjecture $KS_r$ for $r \ge 2$.

Corollary 1.3.
Given a set of vectors $v_1,\dots,v_m \in \mathbb{R}^d$ in isotropic position,
\[ \sum_{i=1}^m v_i v_i^\intercal = I, \]
if for all $i$, $\|v_i\|^2 \le \epsilon$, then for any integer $r \ge 2$ there is an $r$-partitioning of $[m]$ into $S_1,\dots,S_r$ such that for any $j \le r$,
\[ \Big\|\sum_{i\in S_j} v_i v_i^\intercal\Big\| \le 4(1/r+\epsilon) + 2(1/r+\epsilon)^2. \]

Proof.
The proof is inspired by the lifting idea in [MSS13b]. For $i\in[m]$ and $j\in[r]$ let $w_{i,j} \in \mathbb{R}^{d\cdot r}$ be the direct sum of $r$ vectors, all of which are $0_d$ except the $j$-th one, which is $v_i$; i.e.,
\[ w_{i,1} = \begin{pmatrix} v_i \\ 0_d \\ \vdots \\ 0_d \end{pmatrix}, \qquad w_{i,2} = \begin{pmatrix} 0_d \\ v_i \\ \vdots \\ 0_d \end{pmatrix}, \]
and so on. Let $E = \{(i,j) : i\in[m],\ j\in[r]\}$ and let $\mu : 2^E \to \mathbb{R}_+$ be the product distribution that, independently for each $i\in[m]$, selects exactly one pair $(i,j)$ uniformly at random. Observe that there are $r^m$ sets in the support of $\mu$, each of size exactly $m$ and each of probability $1/r^m$. Therefore, $\mu$ is a homogeneous probability distribution and the marginal probability of each element of $E$ is exactly $1/r$. In addition, since product distributions are strongly Rayleigh, $\mu$ is strongly Rayleigh. Therefore, by Theorem 1.2, there is a set $S$ in the support of $\mu$ such that
\[ \Big\|\sum_{(i,j)\in S} w_{i,j} w_{i,j}^\intercal\Big\| \le \alpha, \qquad\text{for } \alpha = 4(1/r+\epsilon) + 2(1/r+\epsilon)^2. \]
Now, let $S_j = \{i : (i,j)\in S\}$. It follows that for any $j\in[r]$,
\[ \Big\|\sum_{i\in S_j} v_i v_i^\intercal\Big\| \le \alpha, \]
as desired. □

1.1 A Thin Basis Problem

In this section we use the main theorem to prove the existence of a thin basis among a given set of isotropic vectors. In the next section, we will use this theorem to prove the existence of thin trees in graphs, i.e., trees which are "sparse" in all cuts of a given graph. Given a set of vectors $v_1,\dots,v_m\in\mathbb{R}^d$ in isotropic position,
\[ \sum_{i=1}^m v_i v_i^\intercal = I, \]
we want to find a sufficient condition for the existence of a thin basis. Recall that a set $T \subseteq [m]$ is a basis if $|T| = d$ and all vectors indexed by $T$ are linearly independent. We say $T$ is $\alpha$-thin if
\[ \Big\|\sum_{i\in T} v_i v_i^\intercal\Big\| \le \alpha. \]
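The last step in the proof of Corollary 1.3 above rests on the observation that the lifted sum $\sum_{(i,j)\in S} w_{i,j}w_{i,j}^\intercal$ is block diagonal, so its spectral norm is the maximum of the spectral norms over the $r$ groups. That observation is easy to check numerically; the sketch below (our own, with arbitrary random data and an arbitrary assignment of elements to groups) builds the vectors $w_{i,j}$ and compares the two quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, r = 3, 8, 2

# Random vectors and an arbitrary assignment of each i in [m] to one of r groups.
V = rng.standard_normal((m, d))
groups = rng.integers(0, r, size=m)

# Lifted vectors w_{i,j} in R^{d*r}: v_i occupies block j, zeros elsewhere.
W = np.zeros((m, d * r))
for i in range(m):
    j = groups[i]
    W[i, j * d:(j + 1) * d] = V[i]

# The lifted sum is block diagonal, so its spectral norm equals the largest
# spectral norm among the r groups: the step that finishes the proof.
lifted = sum(np.outer(w, w) for w in W)
block_norms = [
    np.linalg.norm(
        sum((np.outer(V[i], V[i]) for i in range(m) if groups[i] == j),
            np.zeros((d, d))), 2)
    for j in range(r)
]
assert np.isclose(np.linalg.norm(lifted, 2), max(block_norms))
```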
An obvious necessary condition for the existence of an $\alpha$-thin basis is that the set
\[ V(\alpha) := \{v_i : \|v_i\|^2 \le \alpha\} \]
contains a basis. We show that there exist universal constants $C_1, C_2 > 0$ such that the existence of $C_1/\alpha$ disjoint bases in $V(C_2\cdot\alpha)$ is a sufficient condition.

Theorem 1.4.
Given a set of vectors $v_1,\dots,v_m\in\mathbb{R}^d$ in sub-isotropic position,
\[ \sum_{i=1}^m v_i v_i^\intercal \preceq I, \]
if for all $1\le i\le m$, $\|v_i\|^2 \le \epsilon$, and the set $\{v_1,\dots,v_m\}$ contains $k$ disjoint bases, then there exists an $O(\epsilon + 1/k)$-thin basis $T \subseteq [m]$.

We will use Theorem 1.2 to prove the above theorem. To apply Theorem 1.2 we need to define a strongly Rayleigh distribution on $[m]$ with small marginal probabilities; this is achieved in the following proposition.

Proposition 1.5.
Given a set of vectors $v_1,\dots,v_m\in\mathbb{R}^d$ that contains $k$ disjoint bases, there is a strongly Rayleigh probability distribution $\mu : 2^{[m]}\to\mathbb{R}_+$ supported on the bases such that the marginal probability of each element is at most $O(1/k)$.

Theorem 1.4 follows simply from the above proposition. Letting $\mu$ be defined as above, we may take $\epsilon_2 = \epsilon$ and $\epsilon_1 = O(1/k)$ in Theorem 1.2, which implies the existence of a basis $T\subseteq[m]$ such that
\[ \Big\|\sum_{i\in T} v_i v_i^\intercal\Big\| \le O(\epsilon + 1/k), \]
as desired.

In the rest of this section we prove the above proposition. In our proof $\mu$ will in fact be a homogeneous determinantal probability distribution. We say $\mu : 2^{[m]}\to\mathbb{R}_+$ is a determinantal probability distribution if there is a PSD matrix $M\in\mathbb{R}^{m\times m}$ such that for any set $T\subseteq[m]$,
\[ \mathbb{P}_{S\sim\mu}[T\subseteq S] = \det(M_{T,T}), \]
where $M_{T,T}$ is the principal submatrix of $M$ whose rows and columns are indexed by $T$. It is proved in [BBL09] that any determinantal probability distribution is a strongly Rayleigh measure, so this is sufficient for our purpose. In fact, we will find nonnegative weights $\lambda : [m]\to\mathbb{R}_+$ and for any basis $T$ we will let
\[ \mu_\lambda(T) \propto \det\Big(\sum_{i\in T}\lambda_i v_i v_i^\intercal\Big). \tag{2} \]
It follows by the Cauchy-Binet identity that for any $\lambda$, such a distribution is determinantal with respect to the Gram matrix
\[ M(i,j) = \sqrt{\lambda_i\lambda_j}\,\Big\langle B^{-1/2} v_i,\ B^{-1/2} v_j \Big\rangle, \]
where $B = \sum_{i=1}^m \lambda_i v_i v_i^\intercal$. So, all we need to do is to find weights $\{\lambda_i\}_{1\le i\le m}$ such that the marginal probability of each element in $\mu_\lambda$ is $O(1/k)$.

For any basis $T\subseteq[m]$ let $\mathbf{1}_T\in\mathbb{R}^m$ be the indicator vector of the set $T$. Let $P$ be the convex hull of the bases' indicator vectors,
\[ P := \operatorname{conv}\{\mathbf{1}_T : T \text{ is a basis}\}. \]
Recall that a point $x$ is in the relative interior of $P$, $x\in\operatorname{relint} P$, if and only if $x$ can be written as a convex combination of all of the vertices of $P$ with strictly positive coefficients.

We find the weights in two steps.
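Before carrying out the two steps, the determinantal formula (2) itself can be sanity checked numerically. The sketch below (our own; it assumes the random vectors are in generic position so that every $d$-subset is a basis) enumerates all bases, normalizes the determinant weights, and verifies that the marginal of element $i$ equals the diagonal Gram entry $M(i,i) = \lambda_i\, v_i^\intercal B^{-1} v_i$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, m = 2, 4
V = rng.standard_normal((m, d))
lam = rng.uniform(0.5, 2.0, size=m)

# mu_lambda(T) proportional to det(sum_{i in T} lam_i v_i v_i^T) over bases T.
bases, weights = [], []
for T in itertools.combinations(range(m), d):
    w = np.linalg.det(sum(lam[i] * np.outer(V[i], V[i]) for i in T))
    if w > 1e-12:
        bases.append(T)
        weights.append(w)
weights = np.array(weights) / np.sum(weights)

# Marginal of each element by direct enumeration ...
marg = np.array([sum(w for T, w in zip(bases, weights) if i in T)
                 for i in range(m)])

# ... agrees with the diagonal of the Gram matrix:
# M(i, i) = lam_i * v_i^T B^{-1} v_i, where B = sum_i lam_i v_i v_i^T.
B = sum(lam[i] * np.outer(V[i], V[i]) for i in range(m))
pred = np.array([lam[i] * V[i] @ np.linalg.solve(B, V[i]) for i in range(m)])
assert np.allclose(marg, pred)
```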
First, we show that for any point $x \in \operatorname{relint} P$, there exist weights $\lambda : [m] \to \mathbb{R}_+$ such that for any $i$,
\[ \mathbb{P}_{S\sim\mu_\lambda}[i \in S] = x(i), \]
where $x(i)$ is the $i$-th coordinate of $x$ and $\mu_\lambda$ is defined as in (2). Then, we show that there exists a point $x \in \operatorname{relint} P$ such that for all $i$, $x(i) \le O(1/k)$.

Lemma 1.6.
For any $x \in \operatorname{relint} P$ there exist weights $\lambda : [m] \to \mathbb{R}_+$ such that the marginal probability of each element $i$ in $\mu_\lambda$ is $x(i)$.

Proof. Let $\mu^* := \mu_{\mathbf{1}}$ be the (determinantal) distribution obtained by setting $\lambda_i = 1$ for all $i$. The idea is to find the distribution $p(\cdot)$ that minimizes the relative entropy with respect to $\mu^*$ while preserving $x$ as the vector of marginal probabilities. This is analogous to the recent applications of maximum entropy distributions in approximation algorithms [AGM+
10, SV14]. Consider the following entropy-maximization convex program:
\[ \begin{aligned} \min\quad & \sum_T p(T)\cdot\log\frac{p(T)}{\mu^*(T)} \\ \text{s.t.}\quad & \sum_{T:\, i\in T} p(T) = x(i) \quad \forall i, \\ & p(T) \ge 0 \quad \forall T. \end{aligned} \tag{3} \]
Note that any feasible solution satisfies $\sum_T p(T) = 1$, so we do not need to add this as a constraint. First of all, since $x\in\operatorname{relint} P$, there exists a distribution $p(\cdot)$ such that for all bases $T$, $p(T) > 0$; hence strong duality holds. For each $i$, let $\gamma_i$ be the Lagrange dual variable of the corresponding marginal constraint. The Lagrange dual function $L(\gamma)$ is defined as follows:
\[ L(\gamma) = \inf_{p\ge 0}\ \sum_T p(T)\cdot\log\frac{p(T)}{\mu^*(T)} - \sum_i \gamma_i \Big(\sum_{T:\, i\in T} p(T) - x(i)\Big). \]
Let $p^*$ be the optimum $p$; setting the gradient (in $p(T)$) of the expression inside the infimum to zero, we obtain, for any basis $T$,
\[ \log\frac{p^*(T)}{\mu^*(T)} + 1 = \sum_{i\in T}\gamma_i. \]
For all $i$, let $\lambda_i = \exp(\gamma_i - 1/d)$, where $d$ is the dimension of the $v_i$'s (so every basis has size $d$). Then we get
\[ p^*(T) = \prod_{i\in T}\lambda_i \cdot \mu^*(T) = \prod_{i\in T}\lambda_i \cdot \det\Big(\sum_{i\in T} v_i v_i^\intercal\Big) = \det\Big(\sum_{i\in T}\lambda_i v_i v_i^\intercal\Big). \]
Therefore $p^* \equiv \mu_\lambda$. Since the duality gap is zero, the above $p^*$ is indeed an optimal solution of the convex program (3). Therefore, the marginal probability of every element $i$ with respect to $p^*$ (that is, with respect to $\mu_\lambda$) is equal to $x(i)$. □

Lemma 1.7. If $\{v_1,\dots,v_m\}$ contains $k$ disjoint bases, then there exists a point $x \in \operatorname{relint} P$ such that $x(i) = O(1/k)$ for all $i$.

Proof. Let $T_1,\dots,T_k$ be the promised disjoint bases. Let
\[ x_0 = \frac{\mathbf{1}_{T_1} + \dots + \mathbf{1}_{T_k}}{k}. \]
The above is a convex combination of vertices of $P$, so $x_0 \in P$. We now perturb $x_0$ by a small amount to find a point in the relative interior of $P$. Let $x_1$ be an arbitrary point in $\operatorname{relint} P$ (such as the average of all vertices). For any $\epsilon >$
0, the point $x = (1-\epsilon)x_0 + \epsilon x_1$ is in $\operatorname{relint} P$. If $\epsilon$ is small enough, we get $x(i) = O(1/k)$ for all $i$, which proves the claim. □

This completes the proof of Proposition 1.5.

1.2 Thin Trees and Asymmetric TSP

For a graph $G = (V,E)$, the Laplacian of $G$, $L_G$, is defined as follows. For a vertex $i\in V$, let $\mathbf{1}_i\in\mathbb{R}^V$ be the vector that is one at $i$ and zero everywhere else. Fix an arbitrary orientation on the edges of $E$ and let $b_e = \mathbf{1}_i - \mathbf{1}_j$ for an edge $e$ oriented from $i$ to $j$. Then,
\[ L_G = \sum_{e\in E} b_e b_e^\intercal. \]
We use $L_G^\dagger$ to denote the pseudo-inverse of $L_G$. Also, for a set $T\subseteq E$, we write
\[ L_T = \sum_{e\in T} b_e b_e^\intercal. \]
We say a spanning tree $T\subseteq E$ is $\alpha$-thin with respect to $G$ if for any set $S\subset V$,
\[ |T(S,\overline{S})| \le \alpha\cdot|E(S,\overline{S})|, \]
where $T(S,\overline{S})$ and $E(S,\overline{S})$ are the sets of edges crossing the cut $(S,\overline{S})$ in $T$ and $G$, respectively. We say a spanning tree $T$ is $\alpha$-spectrally thin with respect to $G$ if
\[ L_T \preceq \alpha\cdot L_G. \]
It is easy to see that spectral thinness is a generalization of combinatorial thinness; i.e., if $T$ is $\alpha$-spectrally thin, then it is also $\alpha$-thin. We say a graph $G$ is $k$-edge connected if it has at least $k$ edges in every cut. In recent works on Asymmetric TSP [AGM+
10, OS11] it was shown that the existence of (combinatorially) thin trees in $k$-edge connected graphs plays an important role in bounding the integrality gap of the natural linear programming relaxation of the Asymmetric TSP [AO14].

It turns out that the existence of spectrally thin trees is significantly easier to prove than that of combinatorially thin trees, thanks to Theorem 1.1 of [MSS13b]. Given a graph $G=(V,E)$, Harvey and Olver [HO14] employ a recursive application of [MSS13b] and show that if for all edges $e\in E$, $b_e^\intercal L_G^\dagger b_e \le \alpha$, then $G$ has an $O(\alpha)$-spectrally thin tree. The quantity $b_e^\intercal L_G^\dagger b_e$ is the effective resistance between the endpoints of $e$ when we replace every edge of $G$ with a resistor of resistance 1 [LP13, Ch. 2]. Unfortunately, $k$-edge connectivity is a significantly weaker property than $\max_e b_e^\intercal L_G^\dagger b_e \le \alpha$ [AO14], so this does not resolve the thin tree problem.

The main idea of [AO14] is to slightly change the graph $G$ in order to decrease the effective resistances of edges while keeping the sizes of the cuts essentially intact. More specifically, one adds a "few" edges $E'$ to $G$ such that in the new graph $G' = (V, E\cup E')$, the effective resistance of every edge of $E$ is small and the size of every cut of $G'$ is at most twice that of the same cut in $G$. If we can prove that $G'$ has a spectrally thin tree $T\subseteq E$, then such a tree is combinatorially thin with respect to $G$, because $G$ and $G'$ have the same cut structure up to a factor of two. To show that $G'$ has a spectrally thin tree we need to answer the following question.

Problem 1.8.
Given a graph $G = (V,E)$, suppose there is a set $F \subseteq E$ such that $(V,F)$ is $k$-edge connected, and that for all $e\in F$, $b_e^\intercal L_G^\dagger b_e \le \alpha$. Can we say that $G$ has a $C\cdot\max\{\alpha, 1/k\}$-spectrally thin tree for a universal constant $C$?

We use Theorem 1.4 to answer the above question affirmatively. Note that the above question cannot be answered by Theorem 1.1 alone. One can use Theorem 1.1 to show that the set $F$ can be partitioned into two sets $F_1, F_2$ such that each $F_i$ is $(1/2 + O(\sqrt{\alpha}))$-spectrally thin, but Theorem 1.1 gives no guarantee on the connectivity of the $F_i$'s. On the other hand, once we apply our main theorem to a strongly Rayleigh distribution supported on connected subgraphs of $G$, e.g. the spanning trees of $G$, we get connectivity for free.

Corollary 1.9.
Given a graph $G = (V,E)$ and a set $F\subseteq E$ such that $(V,F)$ is $k$-edge connected, if for some $\epsilon > 0$ and every edge $e\in F$, $b_e^\intercal L_G^\dagger b_e \le \epsilon$, then $G$ has an $O(1/k + \epsilon)$-spectrally thin tree.

Proof. Let $L_G^{\dagger/2}$ be the square root of $L_G^\dagger$. Note that since $L_G^\dagger \succeq 0$, its square root is well defined. For all $e\in F$, let $v_e = L_G^{\dagger/2} b_e$. Then, by the corollary's assumption, for each $e\in F$,
\[ \|v_e\|^2 = b_e^\intercal L_G^\dagger b_e \le \epsilon, \]
and the vectors $\{v_e\}_{e\in F}$ are in sub-isotropic position,
\[ \sum_{e\in F} v_e v_e^\intercal = L_G^{\dagger/2}\Big(\sum_{e\in F} b_e b_e^\intercal\Big) L_G^{\dagger/2} = L_G^{\dagger/2}\, L_F\, L_G^{\dagger/2} \preceq I. \]
In addition, we can show that $\{v_e\}_{e\in F}$ contains $\lfloor k/2\rfloor$ disjoint bases. Every basis of $\{v_e\}_{e\in F}$ corresponds to a spanning tree of the graph $(V,F)$. Nash-Williams [NW61] proved that any $k$-edge connected graph has $\lfloor k/2\rfloor$ edge-disjoint spanning trees. Since $(V,F)$ is $k$-edge connected, it has $\lfloor k/2\rfloor$ edge-disjoint spanning trees, so $\{v_e\}_{e\in F}$ contains $\lfloor k/2\rfloor$ disjoint bases. Therefore, by Theorem 1.4, there is a basis, i.e., a spanning tree, $T\subseteq F$ such that
\[ \Big\|\sum_{e\in T} v_e v_e^\intercal\Big\| \le \alpha, \tag{4} \]
for $\alpha = O(\epsilon + 1/k)$. Fix an arbitrary vector $y\in\mathbb{R}^V$. We show that
\[ y^\intercal L_T\, y \le \alpha\cdot y^\intercal L_G\, y, \tag{5} \]
and this completes the proof. By (4), for any $x\in\mathbb{R}^V$,
\[ x^\intercal\Big(\sum_{e\in T} v_e v_e^\intercal\Big) x \le \alpha\cdot\|x\|^2. \]
Letting $x = L_G^{1/2} y$, we get
\[ y^\intercal L_G^{1/2}\bigg(L_G^{\dagger/2}\Big(\sum_{e\in T} b_e b_e^\intercal\Big) L_G^{\dagger/2}\bigg) L_G^{1/2}\, y \le \alpha\cdot y^\intercal L_G\, y. \]
The above is the same as (5), and we are done. □

The above corollary completely answers Problem 1.8, but it is not enough for our purpose in [AO14]; we need a slightly stronger statement. For a matrix $D\in\mathbb{R}^{V\times V}$ we say $D \preceq_\square L_G$ if for any set $S\subset V$,
\[ \mathbf{1}_S^\intercal\, D\, \mathbf{1}_S \le \mathbf{1}_S^\intercal\, L_G\, \mathbf{1}_S, \]
where as usual $\mathbf{1}_S\in\mathbb{R}^V$ is the indicator vector of the set $S$. In the main theorem of [AO14] we show that for any $k$-edge-connected graph $G$ (for $k \ge 7\log n$) there is a positive definite (PD) matrix $D \preceq_\square L_G$ and a set $F\subseteq E$ such that $(V,F)$ is $\Omega(k)$-edge-connected and
\[ \max_{e\in F}\ b_e^\intercal D^{-1} b_e \le \frac{\operatorname{polylog}(k)}{k}. \]
To show that $G$ has a combinatorially thin tree it is enough to show that there is a tree $T\subseteq E$ that is $\alpha$-spectrally thin w.r.t. $L_G + D$ for $\alpha = \operatorname{polylog}(k)/k$, i.e.,
\[ L_T \preceq \frac{\operatorname{polylog}(k)}{k}\,(L_G + D). \]
Such a tree is $2\alpha$-combinatorially thin w.r.t. $G$ because $D \preceq_\square L_G$. Note that the above corollary does not imply that $L_G + D$ has a spectrally thin tree, because $D$ is not necessarily a Laplacian matrix.
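The mechanics of the proof of Corollary 1.9 are easy to reproduce numerically. The sketch below (our own; the graph and the spanning tree are arbitrary choices) forms $v_e = L_G^{\dagger/2} b_e$, checks that $\|v_e\|^2$ is the effective resistance $b_e^\intercal L_G^\dagger b_e$ (which equals $2/n$ in the complete graph $K_n$), verifies the sub-isotropic position, and confirms that $\alpha = \|\sum_{e\in T} v_e v_e^\intercal\|$ certifies spectral thinness of the tree:

```python
import numpy as np

# Complete graph K5; T = the star spanning tree centered at vertex 0.
n = 5
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
tree = [(0, j) for j in range(1, n)]

def b(e):
    vec = np.zeros(n); vec[e[0]], vec[e[1]] = 1.0, -1.0
    return vec

L_G = sum(np.outer(b(e), b(e)) for e in edges)

# L_G^{dagger/2} via the eigendecomposition, zeroing out the all-ones kernel.
w, U = np.linalg.eigh(L_G)
inv_sqrt = np.array([1.0 / np.sqrt(x) if x > 1e-9 else 0.0 for x in w])
L_half = U @ np.diag(inv_sqrt) @ U.T

v = {e: L_half @ b(e) for e in edges}

# ||v_e||^2 equals the effective resistance b_e^T L_G^+ b_e (= 2/n in K_n).
assert np.allclose([v[e] @ v[e] for e in edges], 2 / n)

# Sub-isotropic position: sum_e v_e v_e^T = L^{+/2} L_G L^{+/2} <= I.
S = sum(np.outer(v[e], v[e]) for e in edges)
assert np.all(np.linalg.eigvalsh(S) <= 1 + 1e-9)

# alpha = ||sum_{e in T} v_e v_e^T|| certifies that T is alpha-spectrally thin,
# i.e. alpha * L_G - L_T is PSD, exactly as in the proof of (5).
alpha = np.linalg.norm(sum(np.outer(v[e], v[e]) for e in tree), 2)
L_T = sum(np.outer(b(e), b(e)) for e in tree)
assert np.all(np.linalg.eigvalsh(alpha * L_G - L_T) >= -1e-9)
```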
Nonetheless, we can prove the existence of a spectrally thin tree with another application of Theorem 1.4.
Given a graph $G = (V,E)$, a PD matrix $D$, and $F\subseteq E$ such that $(V,F)$ is $k$-edge connected, if for any edge $e\in F$,
\[ b_e^\intercal D^{-1} b_e \le \epsilon, \]
then $G$ has a spanning tree $T\subseteq F$ such that
\[ L_T \preceq O(\epsilon + 1/k)\cdot(L_G + D). \]

Proof.
The proof is very similar to that of Corollary 1.9. For any edge $e\in F$, let $v_e = (D + L_G)^{-1/2}\, b_e$. Note that since $D$ is PD, $D + L_G$ is PD and $(D + L_G)^{-1/2}$ is well defined. By the assumption,
\[ \|v_e\|^2 = b_e^\intercal (D + L_G)^{-1} b_e \le b_e^\intercal D^{-1} b_e \le \epsilon, \]
where the first inequality uses Lemma 2.14. In addition, the vectors are in sub-isotropic position,
\[ \sum_{e\in F} v_e v_e^\intercal = (D + L_G)^{-1/2}\, L_F\, (D + L_G)^{-1/2} \preceq I. \]
The matrix PSD inequality uses that $L_F \preceq L_G \preceq D + L_G$. Furthermore, every basis of $\{v_e\}_{e\in F}$ is a spanning tree of $(V,F)$, and by the $k$-edge connectivity of $(V,F)$ there are $\Omega(k)$ edge-disjoint bases. Therefore, by Theorem 1.4, there is a tree $T\subseteq F$ such that
\[ \Big\|\sum_{e\in T} v_e v_e^\intercal\Big\| \le \alpha, \]
for $\alpha = O(\epsilon + 1/k)$. As in the proof of Corollary 1.9, this tree satisfies
\[ L_T \preceq \alpha\cdot(L_G + D), \]
and this completes the proof. □

1.3 Our Approach

We build on the method of interlacing polynomials of [MSS13a, MSS13b]. Recall that an interlacing family of polynomials has the property that there is always a polynomial in the family whose largest root is at most the largest root of the sum of the polynomials in the family. First, we show that for any set of vectors assigned to the elements of a homogeneous strongly Rayleigh measure, the characteristic polynomials of the natural quadratic forms associated with the samples of the distribution form an interlacing family. This implies that there is a sample of the distribution such that the largest root of its characteristic polynomial is at most the largest root of the average of the characteristic polynomials over all samples of $\mu$. Then, we use the multivariate barrier argument of [MSS13b] to upper-bound the largest root of our expected characteristic polynomial.

Our proof has two main ingredients.
The first one is the construction of a new class of expected characteristic polynomials, namely the weighted averages of the characteristic polynomials of the natural quadratic forms associated to the samples of the strongly Rayleigh distribution, where the weight of each polynomial is proportional to the probability of the corresponding sample set in the distribution. To show that the expected characteristic polynomial is real rooted we appeal to the theory of real stability: we show that our expected characteristic polynomial can be realized by applying the operator $\prod_{i=1}^m (1 - \partial_{z_i}^2)$ to the real stable polynomial $g_\mu(x+z)\cdot\det(xI + \sum_{i=1}^m z_i v_i v_i^\intercal)$ and then setting $z_1 = \dots = z_m = 0$.

Our second ingredient is the extension of the multivariate barrier argument. Unlike [MSS13b], here we need to prove an upper bound on the largest root of the mixed characteristic polynomial which is very close to zero. It turns out that the original idea of [BSS14], which studies the behavior of the roots of a (univariate) polynomial $p(x)$ under the operator $1 - \partial_x$, cannot establish upper bounds that are less than one. Fortunately, here we need to study the behavior of the roots of a (multivariate) polynomial $p(z)$ under the operators $1 - \partial_{z_i}^2$. These operators allow us to impose very small shifts on the multivariate upper barrier, assuming the barrier functions are sufficiently small. The intuition is that, since
\[ 1 - \partial_{z_i}^2 = \big(1 - \partial_{z_i}\big)\cdot\big(1 + \partial_{z_i}\big), \]
we expect $(1 - \partial_{z_i})$ to shift the upper barrier by $1 + \Theta(\delta)$ (for some $\delta$ depending on the value of the $i$-th barrier function), as proved in [MSS13b], and $(1 + \partial_{z_i})$ to shift the upper barrier by $-1 + \Theta(\delta)$. Therefore, applying both operators, the upper barrier moves by no more than $\Theta(\delta)$.

2 Preliminaries

We adopt notation similar to [MSS13b]. We write $\binom{[m]}{k}$ to denote the collection of subsets of $[m]$ with exactly $k$ elements.
We write $2^{[m]}$ to denote the family of all subsets of the set $[m]$. We write $\partial_{z_i}$ to denote the operator that performs partial differentiation with respect to $z_i$. We use $\|v\|$ to denote the Euclidean 2-norm of a vector $v$. For a matrix $M$, we write $\|M\| = \max_{\|x\|=1} \|Mx\|$ to denote the operator norm of $M$. We use $\mathbf{1}$ to denote the all-ones vector.

2.1 Interlacing Families of Polynomials

We recall the definition of interlacing families of polynomials from [MSS13a], and its main consequence.
Definition 2.1.
We say that a real-rooted polynomial $g(x) = \alpha_0\prod_{i=1}^{m-1}(x - \alpha_i)$ interlaces a real-rooted polynomial $f(x) = \beta_0\prod_{i=1}^{m}(x - \beta_i)$ if
\[ \beta_1 \le \alpha_1 \le \beta_2 \le \alpha_2 \le \dots \le \alpha_{m-1} \le \beta_m. \]
We say that polynomials $f_1,\dots,f_k$ have a common interlacing if there is a polynomial $g$ such that $g$ interlaces all $f_i$.

The following lemma is proved in [MSS13a].

Lemma 2.2.
Let $f_1,\dots,f_k$ be polynomials of the same degree that are real rooted and have positive leading coefficients. Define
\[ f_\emptyset = \sum_{i=1}^k f_i. \]
If $f_1,\dots,f_k$ have a common interlacing, then there is an $i$ such that the largest root of $f_i$ is at most the largest root of $f_\emptyset$.

Definition 2.3.
Let $\mathcal{F} \subseteq 2^{[m]}$ be nonempty. For any $S\in\mathcal{F}$, let $f_S(x)$ be a real-rooted polynomial of degree $d$ with a positive leading coefficient. For $s_1,\dots,s_k\in\{0,1\}$ with $k < m$, let
\[ \mathcal{F}_{s_1,\dots,s_k} := \{S\in\mathcal{F} : i\in S \Leftrightarrow s_i = 1 \text{ for all } 1\le i\le k\}. \]
Note that $\mathcal{F} = \mathcal{F}_\emptyset$. Define
\[ f_{s_1,\dots,s_k} = \sum_{S\in\mathcal{F}_{s_1,\dots,s_k}} f_S, \qquad\text{and}\qquad f_\emptyset = \sum_{S\in\mathcal{F}} f_S. \]
We say the polynomials $\{f_S\}_{S\in\mathcal{F}}$ form an interlacing family if for all $0\le k < m$ and all $s_1,\dots,s_k\in\{0,1\}$ the following holds: if both of $\mathcal{F}_{s_1,\dots,s_k,0}$ and $\mathcal{F}_{s_1,\dots,s_k,1}$ are nonempty, then $f_{s_1,\dots,s_k,0}$ and $f_{s_1,\dots,s_k,1}$ have a common interlacing.

The following is analogous to [MSS13b, Thm 3.4].
Theorem 2.4.
Let $\mathcal{F} \subseteq 2^{[m]}$ and let $\{f_S\}_{S\in\mathcal{F}}$ be an interlacing family of polynomials. Then there exists $S\in\mathcal{F}$ such that the largest root of $f_S$ is at most the largest root of $f_\emptyset$.

Proof. We prove the claim by induction. Assume that for some choice of $s_1,\dots,s_k\in\{0,1\}$ (possibly with $k = 0$), $\mathcal{F}_{s_1,\dots,s_k}$ is nonempty and the largest root of $f_{s_1,\dots,s_k}$ is at most the largest root of $f_\emptyset$. If $\mathcal{F}_{s_1,\dots,s_k,0} = \emptyset$, then $f_{s_1,\dots,s_k} = f_{s_1,\dots,s_k,1}$, so we let $s_{k+1} = 1$ and the inductive step is complete. Similarly, if $\mathcal{F}_{s_1,\dots,s_k,1} = \emptyset$, then we let $s_{k+1} = 0$ and we are done with the induction. If both of these sets are nonempty, then $f_{s_1,\dots,s_k,0}$ and $f_{s_1,\dots,s_k,1}$ have a common interlacing. So, by Lemma 2.2, for some choice of $s_{k+1}\in\{0,1\}$, the largest root of $f_{s_1,\dots,s_{k+1}}$ is at most the largest root of $f_{s_1,\dots,s_k}$, and hence at most the largest root of $f_\emptyset$. After $m$ steps we arrive at a single set $S\in\mathcal{F}$ with the desired property. □

We use the following lemma, which appeared as Theorem 2.1 of [Ded92], to prove that a certain family of polynomials that we construct in Section 3 forms an interlacing family.

Lemma 2.5.
Let $f_1,\dots,f_k$ be univariate polynomials of the same degree with positive leading coefficients. Then $f_1,\dots,f_k$ have a common interlacing if and only if $\sum_{i=1}^k \lambda_i f_i$ is real rooted for all convex combinations $\lambda_i \ge 0$, $\sum_{i=1}^k \lambda_i = 1$.

2.2 Stable Polynomials

Stable polynomials are natural multivariate generalizations of real-rooted univariate polynomials. For a complex number $z$, let $\operatorname{Im}(z)$ denote the imaginary part of $z$. We say a polynomial $p(z_1,\dots,z_m)\in\mathbb{C}[z_1,\dots,z_m]$ is stable if $p(z_1,\dots,z_m) \ne 0$ whenever $\operatorname{Im}(z_i) > 0$ for all $1\le i\le m$. We say $p(\cdot)$ is real stable if it is stable and all of its coefficients are real. It is easy to see that a univariate polynomial is real stable if and only if it is real rooted.

One of the most interesting classes of real stable polynomials is the class of determinant polynomials, as observed by Borcea and Brändén [BB08].

Theorem 2.6. For any set of positive semidefinite matrices $A_1,\dots,A_m$, the following polynomial is real stable:
\[ \det\Big(\sum_{i=1}^m z_i A_i\Big). \]

Perhaps the most important property of stable polynomials is that they are closed under several elementary operations like multiplication, differentiation, and substitution. We will use these operations to generate new stable polynomials from the determinant polynomial. The following is proved in [MSS13b].
Lemma 2.7. If $p\in\mathbb{R}[z_1,\dots,z_m]$ is real stable, then so are the polynomials $(1 - \partial_{z_1})\,p(z_1,\dots,z_m)$ and $(1 + \partial_{z_1})\,p(z_1,\dots,z_m)$.

The following corollary simply follows from the above lemma.
Corollary 2.8. If $p\in\mathbb{R}[z_1,\dots,z_m]$ is real stable, then so is $(1 - \partial_{z_1}^2)\,p(z_1,\dots,z_m)$.

Proof.
First, observe that
\[ (1 - \partial_{z_1}^2)\,p(z_1,\dots,z_m) = (1 - \partial_{z_1})(1 + \partial_{z_1})\,p(z_1,\dots,z_m). \]
So, the conclusion follows from two applications of Lemma 2.7. □

The following closure properties are elementary.
Lemma 2.9. If $p\in\mathbb{R}[z_1,\dots,z_m]$ is real stable, then so is $p(\lambda_1\cdot z_1,\dots,\lambda_m\cdot z_m)$ for real-valued $\lambda_1,\dots,\lambda_m > 0$.

Proof. Say $(z_1,\dots,z_m)\in\mathbb{C}^m$ is a root of $p(\lambda_1 z_1,\dots,\lambda_m z_m)$. Then $(\lambda_1 z_1,\dots,\lambda_m z_m)$ is a root of $p(z_1,\dots,z_m)$. Since $p$ is real stable, there is an $i$ such that $\operatorname{Im}(\lambda_i z_i) \le 0$. But, since $\lambda_i > 0$, we get $\operatorname{Im}(z_i) \le 0$, as desired. □
Lemma 2.10. If $p\in\mathbb{R}[z_1,\dots,z_m]$ is real stable, then so is $p(z_1+x,\dots,z_m+x)$ for a new variable $x$.

Proof. Say $(z_1,\dots,z_m,x)\in\mathbb{C}^{m+1}$ is a root of $p(z_1+x,\dots,z_m+x)$. Then $(z_1+x,\dots,z_m+x)$ is a root of $p(z_1,\dots,z_m)$. Since $p$ is real stable, there is an $i$ such that $\operatorname{Im}(z_i + x) \le 0$. But then either $\operatorname{Im}(x) \le 0$ or $\operatorname{Im}(z_i) \le 0$, as desired. □
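Looking back at Lemma 2.5, the equivalence is easy to probe numerically (our own sketch, with hand-picked quadratics): a pair with a common interlacing keeps every convex combination real rooted, while a pair without one can produce complex roots.

```python
import numpy as np

# f1 = (x-1)(x-3) and f2 = (x-2)(x-4): g(x) = x - 2.5 interlaces both,
# so every convex combination must be real rooted ("if" direction).
f1 = np.poly([1, 3])   # coefficient vector of (x-1)(x-3)
f2 = np.poly([2, 4])
for lam in np.linspace(0, 1, 11):
    roots = np.roots(lam * f1 + (1 - lam) * f2)
    assert np.allclose(roots.imag, 0)

# g1 = (x-1)(x-2) and g2 = (x-3)(x-4) have NO common interlacing: no single
# root can separate {1,2} and {3,4}. Indeed the midpoint combination,
# x^2 - 5x + 7, has negative discriminant and complex roots ("only if").
g1, g2 = np.poly([1, 2]), np.poly([3, 4])
mid_roots = np.roots(0.5 * g1 + 0.5 * g2)
assert not np.allclose(mid_roots.imag, 0)
```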
2.3 Characteristic Polynomials

For a Hermitian matrix $M\in\mathbb{C}^{d\times d}$, we write the characteristic polynomial of $M$ in terms of a variable $x$ as
\[ \chi[M](x) = \det(xI - M). \]
We also write the characteristic polynomial in terms of the square of $x$ as
\[ \chi[M](x^2) = \det(x^2 I - M). \]
For $0\le k\le d$, we write $\sigma_k(M)$ to denote the sum of all principal $k\times k$ minors of $M$; in particular,
\[ \chi[M](x) = \sum_{k=0}^{d} x^{d-k}\,(-1)^k\,\sigma_k(M). \]
The following lemma follows from the Cauchy-Binet identity. See [MSS13b] for the proof.
Lemma 2.11.
For vectors $v_1,\dots,v_m\in\mathbb{R}^d$ and scalars $z_1,\dots,z_m$,
\[ \det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big) = \sum_{k=0}^{d} x^{d-k} \sum_{S\in\binom{[m]}{k}} z^S\, \sigma_k\Big(\sum_{i\in S} v_i v_i^\intercal\Big). \]
In particular, for $z_1 = \dots = z_m = -1$,
\[ \det\Big(xI - \sum_{i=1}^m v_i v_i^\intercal\Big) = \sum_{k=0}^{d} x^{d-k}\,(-1)^k \sum_{S\in\binom{[m]}{k}} \sigma_k\Big(\sum_{i\in S} v_i v_i^\intercal\Big). \]
The following is Jacobi's formula for the derivative of the determinant of a matrix.
Theorem 2.12.
For an invertible matrix $A$ which is a differentiable function of $t$,
\[ \partial_t \det(A) = \det(A)\cdot\operatorname{Tr}\big(A^{-1}\,\partial_t A\big). \]

Lemma 2.13.
For an invertible matrix $A$ which is a differentiable function of $t$,
\[ \frac{\partial A^{-1}}{\partial t} = -A^{-1}\,(\partial_t A)\,A^{-1}. \]

Proof.
Differentiating both sides of the identity $A^{-1}A = I$ with respect to $t$, we get
\[ A^{-1}\,\frac{\partial A}{\partial t} + \frac{\partial A^{-1}}{\partial t}\,A = 0. \]
Rearranging the terms and multiplying on the right by $A^{-1}$ gives the lemma's conclusion. □

The following two standard facts about the trace will be used throughout the paper. First, for $A\in\mathbb{R}^{k\times n}$ and $B\in\mathbb{R}^{n\times k}$,
\[ \operatorname{Tr}(AB) = \operatorname{Tr}(BA). \]
Second, for positive semidefinite matrices $A, B$ of the same dimension,
\[ \operatorname{Tr}(AB) \ge 0. \]
Also, we use the fact that for any positive semidefinite matrix $A$ and any Hermitian matrix $B$, $BAB$ is positive semidefinite.
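Both trace facts are quick to sanity check (our own sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Cyclic property of the trace for rectangular A, B.
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Tr(PQ) >= 0 for positive semidefinite P and Q.
X = rng.standard_normal((4, 4)); P = X @ X.T
Y = rng.standard_normal((4, 4)); Q = Y @ Y.T
assert np.trace(P @ Q) >= 0
```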
Lemma 2.14. If $A, B\in\mathbb{R}^{n\times n}$ are PD matrices and $A \preceq B$, then $B^{-1} \preceq A^{-1}$.

Proof.
Since $A \preceq B$,
\[ B^{-1/2}\,A\,B^{-1/2} \preceq B^{-1/2}\,B\,B^{-1/2} = I. \]
So,
\[ B^{1/2}\,A^{-1}\,B^{1/2} = \big(B^{-1/2}\,A\,B^{-1/2}\big)^{-1} \succeq I. \]
Multiplying both sides of the above by $B^{-1/2}$, we get
\[ A^{-1} = B^{-1/2}\,B^{1/2}\,A^{-1}\,B^{1/2}\,B^{-1/2} \succeq B^{-1/2}\,I\,B^{-1/2} = B^{-1}. \]
□

3 The Mixed Characteristic Polynomial
For a probability distribution $\mu$, let $d_\mu$ be the degree of the polynomial $g_\mu$.

Theorem 3.1.
For $v_1,\dots,v_m\in\mathbb{R}^d$ and a homogeneous probability distribution $\mu : 2^{[m]}\to\mathbb{R}_+$,
\[ x^{d_\mu - d}\;\mathbb{E}_{S\sim\mu}\;\chi\Big[2\sum_{i\in S} v_i v_i^\intercal\Big](x^2) = \prod_{i=1}^m \big(1 - \partial_{z_i}^2\big)\,\bigg(g_\mu(x+z)\cdot\det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big)\bigg)\bigg|_{z_1=\dots=z_m=0}. \tag{6} \]
We call the polynomial $\mathbb{E}_{S\sim\mu}\,\chi[2\sum_{i\in S} v_i v_i^\intercal](x^2)$ the mixed characteristic polynomial and we denote it by $\mu[v_1,\dots,v_m](x)$.

Proof.
For $S\subseteq[m]$, let $z^S = \prod_{i\in S} z_i$. Expanding the product of operators gives
\[ \prod_{i=1}^m \big(1 - \partial_{z_i}^2\big) = \sum_{S\subseteq[m]} (-1)^{|S|}\prod_{i\in S}\partial_{z_i}^2, \]
so it suffices to evaluate, for each $S\subseteq[m]$ with $k = |S|$, the quantity $\prod_{i\in S}\partial_{z_i}^2\,\big(g_\mu(x+z)\cdot\det(xI + \sum_{i=1}^m z_i v_i v_i^\intercal)\big)\big|_{z_1=\dots=z_m=0}$. Each of the two polynomials $g_\mu(x+z)$ and $\det(xI + \sum_{i=1}^m z_i v_i v_i^\intercal)$ is multiaffine in $z_1,\dots,z_m$, so each $\partial_{z_i}^2$ must send exactly one derivative to each factor, and the above is equal to
\[ 2^k\cdot\Big(\prod_{i\in S}\partial_{z_i}\Big)\,g_\mu(x+z)\bigg|_{z_1=\dots=z_m=0}\cdot\Big(\prod_{i\in S}\partial_{z_i}\Big)\det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big)\bigg|_{z_1=\dots=z_m=0}. \tag{7} \]
Since $g_\mu$ is a homogeneous polynomial of degree $d_\mu$, the first factor following $2^k$ in the above is equal to
\[ x^{d_\mu - k}\;\mathbb{P}_{T\sim\mu}[S\subseteq T], \]
and, by Lemma 2.11, the second factor of (7) is equal to
\[ x^{d-k}\,\sigma_k\Big(\sum_{i\in S} v_i v_i^\intercal\Big). \]
Applying the above identities for all $S\subseteq[m]$ (terms with $k > d$ vanish since $\sigma_k = 0$ for $k > d$),
\[ \begin{aligned} \prod_{i=1}^m \big(1 - \partial_{z_i}^2\big)\,\bigg(g_\mu(x+z)\cdot\det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big)\bigg)\bigg|_{z_1=\dots=z_m=0} &= \sum_{k=0}^{d} (-1)^k\,2^k\,x^{d_\mu + d - 2k}\sum_{S\in\binom{[m]}{k}}\mathbb{P}_{T\sim\mu}[S\subseteq T]\cdot\sigma_k\Big(\sum_{i\in S} v_i v_i^\intercal\Big) \\ &= x^{d_\mu - d}\;\mathbb{E}_{S\sim\mu}\;\chi\Big[2\sum_{i\in S} v_i v_i^\intercal\Big](x^2). \end{aligned} \]
The last identity uses Lemma 2.11, i.e., the Cauchy-Binet identity. □

Corollary 3.2. If $\mu$ is a strongly Rayleigh probability distribution, then the mixed characteristic polynomial is real rooted.

Proof. First, by Theorem 2.6,
\[ \det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big) \]
is real stable. Since $\mu$ is strongly Rayleigh, $g_\mu(z)$ is real stable; so, by Lemma 2.10, $g_\mu(x+z)$ is real stable. The product of two real stable polynomials is also real stable, so
\[ g_\mu(x+z)\cdot\det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big) \]
is real stable. Corollary 2.8 implies that
\[ \prod_{i=1}^m \big(1 - \partial_{z_i}^2\big)\,\bigg(g_\mu(x+z)\cdot\det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big)\bigg) \]
is real stable as well.
Wagner [Wag11, Lemma 2.4(d)] tells us that real stability is preserved under setting variables to real numbers, so
\[ \prod_{i=1}^m \big(1 - \partial_{z_i}^2\big) \bigg( g_\mu(x+z) \cdot \det\Big(xI + \sum_{i=1}^m z_i v_i v_i^\intercal\Big) \bigg) \bigg|_{z_1 = \cdots = z_m = 0} \]
is a univariate real-rooted polynomial. The mixed characteristic polynomial is equal to the above polynomial up to a factor of $x^{d_\mu - d}$. So, the mixed characteristic polynomial is also real-rooted. $\qed$

Now, we use the real-rootedness of the mixed characteristic polynomial to show that the characteristic polynomials of the sets of vectors assigned to the sets with nonzero probability in $\mu$ form an interlacing family. For a homogeneous strongly Rayleigh measure $\mu$, let $\mathcal{F} = \{S : \mu(S) > 0\}$, and for $s_1, \ldots, s_k \in \{0,1\}$ let $\mathcal{F}_{s_1,\ldots,s_k}$ be as defined in Definition 2.3. For any $S \in \mathcal{F}$, let
\[ q_S(x) = \mu(S) \cdot \chi\!\Big[\sum_{i \in S} 2 v_i v_i^\intercal\Big](x^2). \]

Theorem 3.3.
The polynomials $\{q_S\}_{S \in \mathcal{F}}$ form an interlacing family.

Proof. For $1 \le k \le m$ and $s_1, \ldots, s_k \in \{0,1\}$, let $\mu_{s_1,\ldots,s_k}$ be $\mu$ conditioned on the sets $S \in \mathcal{F}_{s_1,\ldots,s_k}$, i.e., $\mu$ conditioned on $i \in S$ for all $i \le k$ where $s_i = 1$ and $i \notin S$ for all $i \le k$ where $s_i = 0$. We inductively write the generating polynomial of $\mu_{s_1,\ldots,s_k}$ in terms of $g_\mu$. Say we have written $g_{\mu_{s_1,\ldots,s_k}}$ in terms of $g_\mu$. Then, we can write
\[ g_{\mu_{s_1,\ldots,s_k,1}}(z) = \frac{z_{k+1} \cdot \partial_{z_{k+1}} g_{\mu_{s_1,\ldots,s_k}}(z)}{\partial_{z_{k+1}} g_{\mu_{s_1,\ldots,s_k}}(z) \big|_{z_i = 1 \;\forall i}}, \tag{8} \]
\[ g_{\mu_{s_1,\ldots,s_k,0}}(z) = \frac{g_{\mu_{s_1,\ldots,s_k}}(z) \big|_{z_{k+1} = 0}}{g_{\mu_{s_1,\ldots,s_k}}(z) \big|_{z_{k+1} = 0,\; z_i = 1 \text{ for } i \ne k+1}}. \tag{9} \]
Note that the denominators of both equations are just normalizing constants. The above polynomials are well defined if the normalizing constants are nonzero, i.e., if the set $\mathcal{F}_{s_1,\ldots,s_k,s_{k+1}}$ is nonempty. Since real stable polynomials are closed under differentiation and substitution, for any $1 \le k \le m$ and $s_1, \ldots, s_k \in \{0,1\}$, if $g_{\mu_{s_1,\ldots,s_k}}$ is well defined, it is real stable, so $\mu_{s_1,\ldots,s_k}$ is a strongly Rayleigh distribution.

Now, for $s_1, \ldots, s_k \in \{0,1\}$, let
\[ q_{s_1,\ldots,s_k}(x) = \sum_{S \in \mathcal{F}_{s_1,\ldots,s_k}} q_S(x). \]
Since $\mu_{s_1,\ldots,s_k}$ is strongly Rayleigh, by Corollary 3.2, $q_{s_1,\ldots,s_k}(x)$ is real-rooted. By Lemma 2.5, to prove the theorem it is enough to show that if $\mathcal{F}_{s_1,\ldots,s_k,0}$ and $\mathcal{F}_{s_1,\ldots,s_k,1}$ are nonempty, then for any $0 < \lambda < 1$,
\[ \lambda \cdot q_{s_1,\ldots,s_k,1}(x) + (1-\lambda) \cdot q_{s_1,\ldots,s_k,0}(x) \]
is real-rooted. Equivalently, by Corollary 3.2, it is enough to show that for any $0 < \lambda < 1$,
\[ \lambda \cdot g_{\mu_{s_1,\ldots,s_k,1}}(z) + (1-\lambda) \cdot g_{\mu_{s_1,\ldots,s_k,0}}(z) \tag{10} \]
is real stable. Let us write
\[ g_{\mu_{s_1,\ldots,s_k}}(z) = z_{k+1} \cdot \partial_{z_{k+1}} g_{\mu_{s_1,\ldots,s_k}}(z) + g_{\mu_{s_1,\ldots,s_k}}(z) \big|_{z_{k+1} = 0} = \alpha \cdot g_{\mu_{s_1,\ldots,s_k,1}}(z) + \beta \cdot g_{\mu_{s_1,\ldots,s_k,0}}(z), \]
for some $\alpha, \beta > 0$. The second identity follows by (8) and (9). Let $\lambda_{k+1} > 0$ be such that
\[ \lambda_{k+1} \cdot \alpha \cdot (1 - \lambda) = \beta \cdot \lambda. \tag{11} \]
Since $g_{\mu_{s_1,\ldots,s_k}}$ is real stable, by Lemma 2.9,
\[ g_{\mu_{s_1,\ldots,s_k}}(z_1, \ldots, z_k, \lambda_{k+1} \cdot z_{k+1}, z_{k+2}, \ldots, z_m) \]
is real stable. But, by (11), the above polynomial is just a positive multiple of (10). So, (10) is real stable. $\qed$

In this section we upper-bound the roots of the mixed characteristic polynomial in terms of the marginal probabilities of the elements of $[m]$ in $\mu$ and the maximum squared norm of the vectors $v_1, \ldots, v_m$.

Theorem 4.1.
Given vectors $v_1, \ldots, v_m \in \mathbb{R}^d$ and a homogeneous strongly Rayleigh probability distribution $\mu : 2^{[m]} \to \mathbb{R}_+$ such that the marginal probability of each element $i \in [m]$ is at most $\epsilon_1$, $\sum_{i=1}^m v_i v_i^\intercal = I$, and $\|v_i\|^2 \le \epsilon_2$ for all $i$, the largest root of $\mu[v_1, \ldots, v_m](x)$ is at most $2\sqrt{2\epsilon + \epsilon^2}$, where $\epsilon = \epsilon_1 + \epsilon_2$.

First, similar to [MSS13b], we derive a slightly different expression for the mixed characteristic polynomial.

Lemma 4.2.
For any probability distribution $\mu$ and vectors $v_1, \ldots, v_m \in \mathbb{R}^d$ such that $\sum_{i=1}^m v_i v_i^\intercal = I$,
\[ x^{d_\mu - d} \cdot \mu[v_1, \ldots, v_m](x) = \prod_{i=1}^m \big(1 - \partial_{y_i}^2\big) \bigg( g_\mu(y) \cdot \det\Big(\sum_{i=1}^m y_i v_i v_i^\intercal\Big) \bigg) \bigg|_{y_1 = \cdots = y_m = x}. \]

Proof.
This is because for any differentiable function $f$,
\[ \partial_{y_i} f(y_i) \big|_{y_i = z_i + x} = \partial_{z_i} f(z_i + x). \qquad\qed \]
Let
\[ Q(y_1, \ldots, y_m) = \prod_{i=1}^m \big(1 - \partial_{y_i}^2\big) \bigg( g_\mu(y) \cdot \det\Big(\sum_{i=1}^m y_i v_i v_i^\intercal\Big) \bigg). \]
Then, by the above lemma, the maximum root of $Q(x, \ldots, x)$ is the same as the maximum root of $\mu[v_1, \ldots, v_m](x)$. In the rest of this section we upper-bound the maximum root of $Q(x, \ldots, x)$.

It directly follows from the proof of Theorem 5.1 in [MSS13b] that the maximum root of $Q(x, \ldots, x)$ is at most $(1 + \sqrt{\epsilon})^2$. But, in our setting, any upper bound that is more than 1 obviously holds, as for any $S \subseteq [m]$,
\[ \Big\|\sum_{i \in S} v_i v_i^\intercal\Big\| \le \Big\|\sum_{i=1}^m v_i v_i^\intercal\Big\| = 1. \]
The main difficulty that we are facing is to prove an upper bound of $O(\sqrt{\epsilon})$ on the maximum root of $Q(x, \ldots, x)$.

We use an extension of the multivariate barrier argument of [MSS13b] to upper-bound the maximum root of $Q$. We manage to prove a significantly smaller upper bound because we apply $1 - \partial_{y_i}^2$ operators as opposed to the $1 - \partial_{y_i}$ operators used in [MSS13b]. This allows us to impose significantly smaller shifts on the barrier upper bound in our inductive argument.

Definition 4.3.
For a multivariate polynomial $p(z_1, \ldots, z_m)$, we say $z \in \mathbb{R}^m$ is above all roots of $p$ if for all $t \in \mathbb{R}^m_+$,
\[ p(z + t) > 0. \]
We use $\mathrm{Ab}_p$ to denote the set of points which are above all roots of $p$.

We use the same barrier function defined in [MSS13b].
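As a numerical illustration of the definition (a sketch with hypothetical data, not part of the formal development), one can spot-check membership in $\mathrm{Ab}_p$ for the kind of polynomial used later in this section, $p(y) = g_\mu(y) \cdot \det(\sum_i y_i v_i v_i^\intercal)$: since $g_\mu$ has nonnegative coefficients and the matrix is positive definite for positive coordinates, any positive point should be above all roots. The measure and vectors below are toy choices.

```python
import numpy as np

# A numerical sketch of Definition 4.3 on a toy instance (hypothetical data):
# p(y) = g_mu(y) * det(sum_i y_i v_i v_i^T), with mu uniform over 2-subsets of
# {1,2,3} and the v_i a tight frame in R^2 (so sum_i v_i v_i^T = I). A point z
# with positive coordinates should satisfy p(z + t) > 0 for all t >= 0; we
# spot-check this on random nonnegative shifts t.
rng = np.random.default_rng(1)
theta = 2 * np.pi * np.arange(3) / 3
V = np.sqrt(2.0 / 3) * np.column_stack([np.cos(theta), np.sin(theta)])
support = [(0, 1), (0, 2), (1, 2)]

def p(y):
    g = sum(y[i] * y[j] for i, j in support) / 3          # g_mu(y)
    M = sum(y[i] * np.outer(V[i], V[i]) for i in range(3))
    return g * np.linalg.det(M)

z = np.array([0.1, 0.1, 0.1])                             # positive coordinates
vals = [p(z + rng.uniform(0, 10, size=3)) for _ in range(1000)]
print(p(z) > 0 and min(vals) > 0)                         # True: z is (numerically) in Ab_p
```

This is only a sampled check, not a certificate; the formal claim "for every $t > 0$, $t \cdot \mathbf{1} \in \mathrm{Ab}_p$" is proved in the text.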
Definition 4.4.
For a real stable polynomial $p$, and $z \in \mathrm{Ab}_p$, the barrier function of $p$ in direction $i$ at $z$ is
\[ \Phi^i_p(z) := \frac{\partial_{z_i} p(z)}{p(z)} = \partial_{z_i} \log p(z). \]

To analyze the rate of change of the barrier function with respect to the $1 - \partial_{z_i}^2$ operator, we need to work with the second derivative of $p$ as well. We define
\[ \Psi^i_p(z) := \frac{\partial_{z_i}^2 p(z)}{p(z)}. \]
Equivalently, for the univariate restriction $q_{z,i}(t) = p(z_1, \ldots, z_{i-1}, t, z_{i+1}, \ldots, z_m)$, with real roots $\lambda_1, \ldots, \lambda_r$, we can write
\[ \Phi^i_p(z) = \frac{q'_{z,i}(z_i)}{q_{z,i}(z_i)} = \sum_{j=1}^r \frac{1}{z_i - \lambda_j}, \qquad \Psi^i_p(z) = \frac{q''_{z,i}(z_i)}{q_{z,i}(z_i)} = \sum_{1 \le j \ne \ell \le r} \frac{1}{(z_i - \lambda_j)(z_i - \lambda_\ell)}. \]

Lemma 4.5. If $p$ is real stable and $z \in \mathrm{Ab}_p$, then for all $i \le m$, $\Psi^i_p(z) \le \Phi^i_p(z)^2$.

Proof. Since $z \in \mathrm{Ab}_p$, $z_i > \lambda_j$ for all $1 \le j \le r$, so
\[ \Phi^i_p(z)^2 - \Psi^i_p(z) = \Big(\sum_{j=1}^r \frac{1}{z_i - \lambda_j}\Big)^2 - \sum_{j \ne \ell} \frac{1}{(z_i - \lambda_j)(z_i - \lambda_\ell)} = \sum_{j=1}^r \frac{1}{(z_i - \lambda_j)^2} \ge 0. \qquad\qed \]

Lemma 4.6. Suppose $p(\cdot)$ is a real stable polynomial and $z \in \mathrm{Ab}_p$. Then, for all $i, j \le m$ and $\delta \ge 0$ (writing $1_j$ for the $j$-th standard basis vector),
\[ \Phi^i_p(z + \delta \cdot 1_j) \le \Phi^i_p(z) \quad \text{(monotonicity)}, \tag{12} \]
\[ \Phi^i_p(z + \delta \cdot 1_j) \le \Phi^i_p(z) + \delta \cdot \partial_{z_j} \Phi^i_p(z + \delta \cdot 1_j) \quad \text{(convexity)}. \tag{13} \]

Recall that the purpose of the barrier functions $\Phi^i_p$ is to allow us to reason about the relationship between $\mathrm{Ab}_p$ and $\mathrm{Ab}_{p - \partial^2_{z_i} p}$; the monotonicity property and Lemma 4.5 imply the following lemma.

Lemma 4.7. If $p$ is real stable and $z \in \mathrm{Ab}_p$ is such that $\Phi^i_p(z) < 1$, then $z \in \mathrm{Ab}_{p - \partial^2_{z_i} p}$.

Proof. Fix a nonnegative vector $t$. Since $\Phi^i_p$ is nonincreasing in each coordinate,
\[ \Phi^i_p(z + t) \le \Phi^i_p(z) < 1. \]
Since $z + t \in \mathrm{Ab}_p$, by Lemma 4.5,
\[ \Psi^i_p(z + t) \le \Phi^i_p(z + t)^2 < 1. \]
Therefore,
\[ \partial^2_{z_i} p(z + t) < p(z + t) \;\Longrightarrow\; \big(1 - \partial^2_{z_i}\big) p(z + t) > 0, \]
as desired. $\qed$

We use an inductive argument similar to [MSS13b]. We argue that when we apply each operator $(1 - \partial^2_{z_j})$, the barrier functions $\Phi^i_p(z)$ do not increase, provided that we shift the upper bound along direction $j$.
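The univariate content of Lemmas 4.5 and 4.7 is easy to check numerically. The polynomial below is an arbitrary hypothetical example; the check is a sketch, not part of the formal argument.

```python
import numpy as np

# A univariate numerical sketch of the quantities above: for the (hypothetical)
# polynomial q(t) = prod_j (t - lambda_j), Phi = q'/q and Psi = q''/q. We check
# Lemma 4.5 (Psi <= Phi^2 above all roots) and the conclusion of Lemma 4.7:
# if Phi(z) < 1, then z lies above all roots of (1 - d^2/dt^2) q = q - q''.
lam = np.array([-1.0, 0.5, 2.0, 3.0])
q = np.poly(lam)                               # monic polynomial with roots lam
dq, ddq = np.polyder(q), np.polyder(np.polyder(q))

z = 6.0                                        # above all roots, and Phi(z) < 1
Phi = np.polyval(dq, z) / np.polyval(q, z)
Psi = np.polyval(ddq, z) / np.polyval(q, z)

r = np.poly1d(q) - np.poly1d(ddq)              # the polynomial q - q''
max_root = np.max(np.roots(r.coeffs).real)
print(Phi < 1, Psi <= Phi**2, max_root < z)    # True True True
```

Note that $q - q''$ is again real-rooted, since $1 - \partial^2 = (1 - \partial)(1 + \partial)$ is a product of operators that preserve real-rootedness.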
As we would like to prove a significantly smaller upper bound on the maximum root of the mixed characteristic polynomial, we may only shift along direction $j$ by a small amount. In the following lemma we show that when we apply the $(1 - \partial^2_{z_j})$ operator, we only need to shift the upper bound proportionally to $\Phi^j_p(z)$ along direction $j$.

Lemma 4.8. Suppose that $p(z_1, \ldots, z_m)$ is real stable and $z \in \mathrm{Ab}_p$. If for $\delta > 0$,
\[ \frac{2}{\delta} \Phi^j_p(z) + \Phi^j_p(z)^2 \le 1, \]
then, for all $i$,
\[ \Phi^i_{p - \partial^2_{z_j} p}(z + \delta \cdot 1_j) \le \Phi^i_p(z). \]

To prove the above lemma we first need a technical lemma that upper-bounds the ratio $\frac{\partial_{z_i} \Psi^j_p(z)}{\partial_{z_i} \Phi^j_p(z)}$. We use the following characterization of bivariate real stable polynomials proved by Lewis, Parrilo, and Ramana [LPR05]. The following form is stated in [BB10, Cor 6.7].

Lemma 4.9. If $p(z_1, z_2)$ is a bivariate real stable polynomial of degree $d$, then there exist $d \times d$ positive semidefinite matrices $A, B$ and a Hermitian matrix $C$ such that
\[ p(z_1, z_2) = \pm \det(z_1 A + z_2 B + C). \]

Lemma 4.10. Suppose that $p$ is real stable and $z \in \mathrm{Ab}_p$. Then for all $i, j \le m$,
\[ \frac{\partial_{z_i} \Psi^j_p(z)}{\partial_{z_i} \Phi^j_p(z)} \le 2 \Phi^j_p(z). \]

Proof. If $i = j$, then we consider the univariate restriction $q_{z,i}(z_i) = \prod_{k=1}^r (z_i - \lambda_k)$. Then,
\[ \frac{\partial_{z_i} \sum_{k \ne \ell} \frac{1}{(z_i - \lambda_k)(z_i - \lambda_\ell)}}{\partial_{z_i} \sum_{k=1}^r \frac{1}{z_i - \lambda_k}} = \frac{\sum_{k \ne \ell} \frac{-2}{(z_i - \lambda_k)^2 (z_i - \lambda_\ell)}}{\sum_{k=1}^r \frac{-1}{(z_i - \lambda_k)^2}} \le 2 \sum_{\ell=1}^r \frac{1}{z_i - \lambda_\ell} = 2 \Phi^j_p(z). \]
The inequality uses the assumption that $z \in \mathrm{Ab}_p$.

If $i \ne j$, we fix all variables other than $z_i, z_j$ and consider the bivariate restriction
\[ q_{z,ij}(z_i, z_j) = p(z_1, \ldots, z_m). \]
By Lemma 4.9, there are positive semidefinite matrices $B_i, B_j$ and a Hermitian matrix $C$ such that
\[ q_{z,ij}(z_i, z_j) = \pm \det(z_i B_i + z_j B_j + C). \]
Let $M = z_i B_i + z_j B_j + C$. Marcus, Spielman, and Srivastava [MSS13b, Lem 5.7] observed that the sign is always positive and that one may assume $B_i + B_j$ is positive definite.
In addition, $M$ is positive definite since $B_i + B_j$ is positive definite and $z \in \mathrm{Ab}_p$. By Theorem 2.12, the barrier function in direction $j$ can be expressed as
\[ \Phi^j_p(z) = \frac{\partial_{z_j} \det(M)}{\det(M)} = \frac{\det(M) \, \mathrm{Tr}(M^{-1} B_j)}{\det(M)} = \mathrm{Tr}(M^{-1} B_j). \tag{14} \]
By another application of Theorem 2.12,
\begin{align*}
\Psi^j_p(z) = \frac{\partial^2_{z_j} \det(M)}{\det(M)} &= \frac{\partial_{z_j} \big(\det(M) \, \mathrm{Tr}(M^{-1} B_j)\big)}{\det(M)} \\
&= \frac{\det(M) \, \mathrm{Tr}(M^{-1} B_j)^2}{\det(M)} + \frac{\det(M) \, \mathrm{Tr}\big((\partial_{z_j} M^{-1}) B_j\big)}{\det(M)} \\
&= \mathrm{Tr}(M^{-1} B_j)^2 + \mathrm{Tr}\big({-M^{-1} B_j M^{-1} B_j}\big) \\
&= \mathrm{Tr}(M^{-1} B_j)^2 - \mathrm{Tr}\big((M^{-1} B_j)^2\big).
\end{align*}
The second-to-last identity uses Lemma 2.13. Next, we calculate $\partial_{z_i} \Phi^j_p$ and $\partial_{z_i} \Psi^j_p$. First, by another application of Lemma 2.13,
\[ \partial_{z_i} \big(M^{-1} B_j\big) = -M^{-1} B_i M^{-1} B_j =: L. \]
Therefore,
\[ \partial_{z_i} \Phi^j_p(z) = \partial_{z_i} \mathrm{Tr}(M^{-1} B_j) = \mathrm{Tr}(L), \]
and
\begin{align*}
\partial_{z_i} \Psi^j_p(z) &= \partial_{z_i} \mathrm{Tr}(M^{-1} B_j)^2 - \partial_{z_i} \mathrm{Tr}\big((M^{-1} B_j)^2\big) \\
&= 2 \, \mathrm{Tr}(M^{-1} B_j) \, \mathrm{Tr}(L) - \mathrm{Tr}\big(L (M^{-1} B_j) + (M^{-1} B_j) L\big) \\
&= 2 \, \mathrm{Tr}(M^{-1} B_j) \, \mathrm{Tr}(L) - 2 \, \mathrm{Tr}(L M^{-1} B_j).
\end{align*}
Putting the above equations together, we get
\[ \frac{\partial_{z_i} \Psi^j_p(z)}{\partial_{z_i} \Phi^j_p(z)} = \frac{2 \, \mathrm{Tr}(M^{-1} B_j) \, \mathrm{Tr}(L) - 2 \, \mathrm{Tr}(L M^{-1} B_j)}{\mathrm{Tr}(L)} = 2 \, \mathrm{Tr}(M^{-1} B_j) - \frac{2 \, \mathrm{Tr}(L M^{-1} B_j)}{\mathrm{Tr}(L)} = 2 \Phi^j_p(z) - \frac{2 \, \mathrm{Tr}(L M^{-1} B_j)}{\mathrm{Tr}(L)}, \]
where we used (14). To prove the lemma it is enough to show that $\frac{\mathrm{Tr}(L M^{-1} B_j)}{\mathrm{Tr}(L)} \ge 0$. We show that both the numerator and the denominator are nonpositive. First,
\[ \mathrm{Tr}(L) = -\mathrm{Tr}(M^{-1} B_i M^{-1} B_j) \le 0, \]
where we used that $M^{-1} B_i M^{-1}$ and $B_j$ are positive semidefinite and the fact that the trace of the product of two positive semidefinite matrices is nonnegative. Secondly,
\[ \mathrm{Tr}(L M^{-1} B_j) = \mathrm{Tr}(-M^{-1} B_i M^{-1} B_j M^{-1} B_j) = -\mathrm{Tr}(B_i M^{-1} B_j M^{-1} B_j M^{-1}) \le 0, \]
where we again used that $M^{-1} B_j M^{-1} B_j M^{-1}$ and $B_i$ are positive semidefinite and the trace of the product of two positive semidefinite matrices is nonnegative. $\qed$

Proof of Lemma 4.8. We write $\partial_i$ instead of $\partial_{z_i}$ for ease of notation.
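Before continuing, the two determinant–trace identities just derived can be checked numerically; the matrices below are arbitrary hypothetical choices ($B_i, B_j$ positive semidefinite, $C$ symmetric), and the derivatives are approximated by finite differences.

```python
import numpy as np

# A numerical sketch (hypothetical data) of the identities above for
# p(z_i, z_j) = det(z_i B_i + z_j B_j + C):
#   Phi^j = (d/dz_j det M)/det M  = Tr(M^{-1} B_j)
#   Psi^j = (d^2/dz_j^2 det M)/det M = Tr(M^{-1} B_j)^2 - Tr((M^{-1} B_j)^2).
rng = np.random.default_rng(0)
d = 4
def psd():
    A = rng.standard_normal((d, d))
    return A @ A.T + np.eye(d)      # PSD, bounded away from singular for safety
Bi, Bj = psd(), psd()
C = rng.standard_normal((d, d)); C = (C + C.T) / 2

zi, zj = 10.0, 10.0                 # large enough that M is positive definite
M = zi * Bi + zj * Bj + C
p = lambda a, b: np.linalg.det(a * Bi + b * Bj + C)
p0 = p(zi, zj)

MBj = np.linalg.solve(M, Bj)        # M^{-1} B_j
Phi_j = np.trace(MBj)
Psi_j = np.trace(MBj) ** 2 - np.trace(MBj @ MBj)

h = 1e-4                            # central finite differences in z_j
dj = (p(zi, zj + h) - p(zi, zj - h)) / (2 * h)
djj = (p(zi, zj + h) - 2 * p0 + p(zi, zj - h)) / h ** 2
print(dj / p0, Phi_j)               # these agree (first identity)
print(djj / p0, Psi_j)              # these agree (second identity)
```

The first identity is Jacobi's formula for the derivative of a determinant; the second follows from it together with the derivative of the matrix inverse, as in the proof above.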
First, we write $\Phi^i_{p - \partial^2_j p}$ in terms of $\Phi^i_p$, $\Psi^j_p$, and $\partial_i \Psi^j_p$:
\[ \Phi^i_{p - \partial^2_j p} = \frac{\partial_i (p - \partial^2_j p)}{p - \partial^2_j p} = \frac{\partial_i \big((1 - \Psi^j_p) p\big)}{(1 - \Psi^j_p) p} = \frac{(1 - \Psi^j_p)(\partial_i p)}{(1 - \Psi^j_p) p} + \frac{\big(\partial_i (1 - \Psi^j_p)\big) p}{(1 - \Psi^j_p) p} = \Phi^i_p - \frac{\partial_i \Psi^j_p}{1 - \Psi^j_p}. \]
We would like to show that $\Phi^i_{p - \partial^2_j p}(z + \delta \cdot 1_j) \le \Phi^i_p(z)$. Equivalently, it is enough to show that
\[ \frac{-\partial_i \Psi^j_p(z + \delta \cdot 1_j)}{1 - \Psi^j_p(z + \delta \cdot 1_j)} \le \Phi^i_p(z) - \Phi^i_p(z + \delta \cdot 1_j). \]
By (13) of Lemma 4.6, it is enough to show that
\[ \frac{-\partial_i \Psi^j_p(z + \delta \cdot 1_j)}{1 - \Psi^j_p(z + \delta \cdot 1_j)} \le \delta \cdot \big({-\partial_j \Phi^i_p(z + \delta \cdot 1_j)}\big). \]
By (12) of Lemma 4.6, $\delta \cdot (-\partial_j \Phi^i_p(z + \delta \cdot 1_j)) \ge 0$, so it suffices to show that
\[ \frac{-\partial_i \Psi^j_p(z + \delta \cdot 1_j)}{-\delta \cdot \partial_i \Phi^j_p(z + \delta \cdot 1_j) \cdot \big(1 - \Psi^j_p(z + \delta \cdot 1_j)\big)} \le 1, \]
where we also used $\partial_j \Phi^i_p = \partial_i \Phi^j_p$. By Lemma 4.10, $\frac{\partial_i \Psi^j_p}{\partial_i \Phi^j_p} \le 2 \Phi^j_p$. So, it suffices to show that
\[ \frac{2}{\delta} \Phi^j_p(z + \delta \cdot 1_j) \cdot \frac{1}{1 - \Psi^j_p(z + \delta \cdot 1_j)} \le 1. \]
By Lemma 4.5 and (12) of Lemma 4.6,
\[ \Phi^j_p(z + \delta \cdot 1_j) \le \Phi^j_p(z), \qquad \Psi^j_p(z + \delta \cdot 1_j) \le \Phi^j_p(z + \delta \cdot 1_j)^2 \le \Phi^j_p(z)^2. \]
So, it is enough to show that
\[ \frac{2}{\delta} \Phi^j_p(z) \cdot \frac{1}{1 - \Phi^j_p(z)^2} \le 1, \]
which, after multiplying both sides by $1 - \Phi^j_p(z)^2 > 0$, is exactly the assumption
\[ \frac{2}{\delta} \Phi^j_p(z) + \Phi^j_p(z)^2 \le 1, \]
as desired. $\qed$

Now, we are ready to prove Theorem 4.1.

Proof of Theorem 4.1. Let
\[ p(y_1, \ldots, y_m) = g_\mu(y) \cdot \det\Big(\sum_{i=1}^m y_i v_i v_i^\intercal\Big). \]
Set $\epsilon = \epsilon_1 + \epsilon_2$ and $\delta = t = \sqrt{2\epsilon + \epsilon^2}$. For any $z \in \mathbb{R}^m$ with positive coordinates, $g_\mu(z) > 0$, and additionally
\[ \det\Big(\sum_{i=1}^m z_i v_i v_i^\intercal\Big) > 0. \]
Therefore, for every $t > 0$, $t \cdot \mathbf{1} \in \mathrm{Ab}_p$. Now, by Theorem 2.12,
\[ \Phi^i_p(y) = \frac{(\partial_i g_\mu(y)) \cdot \det(\sum_{j=1}^m y_j v_j v_j^\intercal)}{g_\mu(y) \cdot \det(\sum_{j=1}^m y_j v_j v_j^\intercal)} + \frac{g_\mu(y) \cdot \big(\partial_i \det(\sum_{j=1}^m y_j v_j v_j^\intercal)\big)}{g_\mu(y) \cdot \det(\sum_{j=1}^m y_j v_j v_j^\intercal)} = \frac{\partial_i g_\mu(y)}{g_\mu(y)} + \mathrm{Tr}\bigg(\Big(\sum_{j=1}^m y_j v_j v_j^\intercal\Big)^{-1} v_i v_i^\intercal\bigg). \]
Therefore, since $g_\mu$ is homogeneous,
\[ \Phi^i_p(t \cdot \mathbf{1}) = \frac{1}{t} \cdot \frac{\partial_i g_\mu(\mathbf{1})}{g_\mu(\mathbf{1})} + \frac{\|v_i\|^2}{t} = \frac{\mathbb{P}_{S \sim \mu}[i \in S]}{t} + \frac{\|v_i\|^2}{t} \le \frac{\epsilon_1}{t} + \frac{\epsilon_2}{t} = \frac{\epsilon}{t}. \]
The second identity uses (1). Let $\phi = \epsilon/t$. Using $t = \delta$, it follows that
\[ \frac{2}{\delta} \phi + \phi^2 = \frac{2\epsilon}{t^2} + \frac{\epsilon^2}{t^2} = 1. \]
For $k \in [m]$ define
\[ p_k(y_1, \ldots, y_m) = \prod_{i=1}^k \big(1 - \partial^2_{y_i}\big) \bigg(g_\mu(y) \cdot \det\Big(\sum_{i=1}^m y_i v_i v_i^\intercal\Big)\bigg), \]
and note that $p_m = Q$. Let $x_0$ be the all-$t$ vector and $x_k$ be the vector that is $t + \delta$ in the first $k$ coordinates and $t$ in the rest. By inductively applying Lemma 4.7 and Lemma 4.8, for any $k \in [m]$, $x_k$ is above all roots of $p_k$ and for all $i$,
\[ \Phi^i_{p_k}(x_k) \le \phi \;\Longrightarrow\; \frac{2}{\delta} \Phi^i_{p_k}(x_k) + \Phi^i_{p_k}(x_k)^2 \le 1. \]
Therefore, the largest root of $\mu[v_1, \ldots, v_m](x)$ is at most $t + \delta = 2\sqrt{2\epsilon + \epsilon^2}$. $\qed$

Proof of Theorem 1.2. Let $\epsilon = \epsilon_1 + \epsilon_2$ as always. Theorem 4.1 implies that the largest root of the mixed characteristic polynomial $\mu[v_1, \ldots, v_m](x)$ is at most $2\sqrt{2\epsilon + \epsilon^2}$. Theorem 3.3 tells us that the polynomials $\{q_S\}_{S : \mu(S) > 0}$ form an interlacing family. So, by Theorem 2.4, there is a set $S \subseteq [m]$ with $\mu(S) > 0$ such that the largest root of
\[ \chi\!\Big[\sum_{i \in S} 2 v_i v_i^\intercal\Big](x^2) = \det\Big(x^2 I - \sum_{i \in S} 2 v_i v_i^\intercal\Big) \]
is at most $2\sqrt{2\epsilon + \epsilon^2}$. This implies that the largest root of
\[ \det\Big(xI - \sum_{i \in S} 2 v_i v_i^\intercal\Big) \]
is at most $\big(2\sqrt{2\epsilon + \epsilon^2}\big)^2$. Therefore,
\[ \Big\|\sum_{i \in S} v_i v_i^\intercal\Big\| = \frac{1}{2} \Big\|\sum_{i \in S} 2 v_i v_i^\intercal\Big\| \le \frac{1}{2} \big(2\sqrt{2\epsilon + \epsilon^2}\big)^2 = 4\epsilon + 2\epsilon^2. \qquad\qed \]

Similar to [MSS13b], our main theorem is not algorithmic; i.e., we are not aware of any polynomial-time algorithm that, given a homogeneous strongly Rayleigh distribution with small marginal probabilities and a set of small-norm vectors assigned to the underlying elements, finds a sample of the distribution with spectral norm bounded away from 1. Such an algorithm could lead to improved approximation algorithms for the Asymmetric Traveling Salesman Problem.

Although our main theorem can be seen as a generalization of [MSS13b], the bound that we prove on the maximum root of the mixed characteristic polynomial is incomparable to the bound of Theorem 1.1. In Corollary 1.3 we used our main theorem to prove Weaver's KS$_r$ conjecture [Wea04] for $r > 4$.
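The statements of Theorems 4.1 and 1.2 can be exercised end-to-end on a toy instance. The data below is hypothetical, and at this scale $\epsilon$ is large, so the bounds exceed 1 and are trivially satisfied; the point of the sketch is only to make the quantities concrete.

```python
import numpy as np
from itertools import combinations

# A toy end-to-end check (hypothetical data, not from the paper): mu is uniform
# over 2-subsets of {1,2,3} (homogeneous, strongly Rayleigh) and the v_i form a
# tight frame in R^2, so eps1 = eps2 = 2/3 and eps = 4/3.
theta = 2 * np.pi * np.arange(3) / 3
V = np.sqrt(2.0 / 3) * np.column_stack([np.cos(theta), np.sin(theta)])
assert np.allclose(V.T @ V, np.eye(2))            # isotropic position

eps = 4.0 / 3
root_bound = 2 * np.sqrt(2 * eps + eps**2)        # Theorem 4.1
norm_bound = 4 * eps + 2 * eps**2                 # Theorem 1.2

subsets = list(combinations(range(3), 2))
mixed = np.zeros(5)                               # E_S chi[2 sum_{i in S} v_i v_i^T](x^2)
for S in subsets:
    A = 2 * sum(np.outer(V[i], V[i]) for i in S)
    c = np.poly(np.linalg.eigvalsh(A))            # det(y I - A)
    cx = np.zeros(5); cx[::2] = c                 # substitute y = x^2
    mixed += cx / 3

max_root = np.max(np.roots(mixed).real)
best_norm = min(np.linalg.norm(sum(np.outer(V[i], V[i]) for i in S), 2)
                for S in subsets)
print(max_root <= root_bound, best_norm <= norm_bound)   # True True
```

Here every 2-subset sums to $I - v_k v_k^\intercal$, so the mixed characteristic polynomial is exactly $(x^2-2)(x^2-\tfrac{2}{3})$ with largest root $\sqrt{2}$, well below the Theorem 4.1 bound for this (large) $\epsilon$.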
It is an interesting question to see if the dependency on $\epsilon$ in our multivariate barrier argument can be improved, and if one can reprove KS$_2$ using our machinery.

Acknowledgement

We would like to thank Adam Marcus, Dan Spielman, and Nikhil Srivastava for stimulating discussions regarding the main obstacles in generalizing their proof of the Kadison-Singer problem.

References

[AA91] Charles A. Akemann and Joel Anderson. Lyapunov theorems for operator algebras, volume 94. Memoirs of the American Mathematical Society, 1991.

[AGM+10] Arash Asadpour, Michel X. Goemans, Aleksander Madry, Shayan Oveis Gharan, and Amin Saberi. An O(log n/log log n)-approximation algorithm for the Asymmetric Traveling Salesman Problem. In SODA, pages 379–389, 2010.

[AO14] Nima Anari and Shayan Oveis Gharan. Effective-Resistance-Reducing Flows and Asymmetric TSP. 2014.

[BB08] Julius Borcea and Petter Brändén. Applications of stable polynomials to mixed determinants: Johnson's conjectures, unimodality, and symmetrized Fischer products. Duke Math. Journal, 143(2):205–223, 2008.

[BB10] Julius Borcea and Petter Brändén. Multivariate Pólya-Schur classification problems in the Weyl algebra. Proceedings of the London Mathematical Society, 101(3):73–104, 2010.

[BBL09] Julius Borcea, Petter Brändén, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22:521–567, 2009.

[BSS14] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM Review, 56(2):315–334, 2014.

[Ded92] Jean-Pierre Dedieu. Obreschkoff's theorem revisited: what convex sets are contained in the set of hyperbolic polynomials? Journal of Pure and Applied Algebra, 81(3):269–278, 1992.

[HO14] Nicholas J. A. Harvey and Neil Olver. Pipage rounding, pessimistic estimators and matrix concentration. In SODA, pages 926–945, 2014.

[KS59] Richard V. Kadison and I. M. Singer. Extensions of pure states. American Journal of Mathematics, 81(2):383–400, 1959.

[LP13] Russell Lyons and Yuval Peres. Probability on Trees and Networks. Cambridge University Press, 2013. To appear.

[LPR05] A. Lewis, P. Parrilo, and M. Ramana. The Lax conjecture is true. Proceedings of the American Mathematical Society, 133(9), 2005.

[MSS13a] Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families I: Bipartite Ramanujan graphs of all degrees. In FOCS, pages 529–537, 2013.

[MSS13b] Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II: Mixed characteristic polynomials and the Kadison-Singer problem. 2013.

[NW61] C. St. J. A. Nash-Williams. Edge disjoint spanning trees of finite graphs. J. London Math. Soc., 36:445–450, 1961.

[OS11] Shayan Oveis Gharan and Amin Saberi. The Asymmetric Traveling Salesman Problem on graphs with bounded genus. In SODA, pages 967–975, 2011.

[OSS11] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. A randomized rounding approach to the Traveling Salesman Problem. In FOCS, pages 550–559, 2011.

[PP14] Robin Pemantle and Yuval Peres. Concentration of Lipschitz functionals of determinantal and other strong Rayleigh measures. Combinatorics, Probability and Computing, 23:140–160, 2014.

[SV14] Mohit Singh and Nisheeth K. Vishnoi. Entropy, optimization and counting. In STOC, pages 50–59, 2014.

[Wag11] David G. Wagner. Multivariate stable polynomials: theory and applications. Bulletin of the American Mathematical Society, 48(1):53–84, January 2011.

[Wea04] Nik Weaver. The Kadison-Singer problem in discrepancy theory.