Generalized Gapped-kmer Filters for Robust Frequency Estimation
aa r X i v : . [ c s . D M ] F e b Generalized Gapped-kmer Filters for RobustFrequency Estimation
M. Mohammad-Noori a,c,* , N. Ghareghani b,c , M. Ghandi d, ∗ a School of Mathematics, Statistics and Computer Science, College of Science, University of TehranP.O. Box 14155-6455, Tehran, Iran b Department of Engineering Science, College of Engineering, University of Tehran,P.O. Box 11165-4563, Tehran, Iran c School of Mathematics, Institute for Research in Fundamental Sciences (IPM),
P.O.Box: 19395-5746, Tehran, Iran d Broad Institute of MIT and Harvard 7 Cambridge Center, 4034C,Cambridge, MA 02142, United States of America
Emails: [email protected], [email protected] , [email protected], [email protected] , [email protected] Abstract
In this paper, we study the generalized gapped k-mer filters and derive a closed formsolution for their coefficients. We consider nonnegative integers ℓ and k , with k ≤ ℓ ,and an ℓ -tuple B = ( b , . . . , b ℓ ) of integers b i ≥ i = 1 , . . . , ℓ . We introduce and studyan incidence matrix A = A ℓ,k ; B . We develop a M¨obius-like function ν B which helps usto obtain closed forms for a complete set of mutually orthogonal eigenvectors of A ⊤ A as well as a complete set of mutually orthogonal eigenvectors of AA ⊤ corresponding tononzero eigenvalues. The reduced singular value decomposition of A and combinatorialinterpretations for the nullity and rank of A , are among the consequences of thisapproach. We then combine the obtained formulas, some results from linear algebra,and combinatorial identities of elementary symmetric functions and ν B , to provide theentries of the Moore-Penrose pseudo-inverse matrix A + and the Gapped k-mer filtermatrix A + A . ∗ Corresponding authors Introduction
Sequences of length k , commonly referred to as k -mers, are used in many computationalbiology algorithms. We previously showed that robust frequency estimation of k -mersusing gapped k -mer features could profoundly improve the performance of algorithmsused for sequence classification in computational biology [7, 6]. The method described inthese previous publications was based on analytically deriving the coefficients of a gapped k -mer filter that could be used to find the robust frequency estimates of k -mers. Althoughthis filter could be applied to datasets consisting of DNA or Protein sequences, it was notapplicable to complex datasets that included sequences defined on more heterogeneousfeature spaces. Here, we provide the closed-form solution for a generalized gapped k -merfilter matrix, by relaxing the constraint that all the features are defined on a fixed-sizealphabet.In order to introduce the main object of this introduction, we briefly mention fewdefinitions and notations here; These are presented in more extent and details in the bodyof the paper. Given two integers ℓ and k with 0 ≤ k ≤ ℓ and a sequence B = ( b , . . . , b ℓ )of integers b i ≥ i = 1 , . . . , ℓ , we associate to them two sets of sequences, U ℓ ; B and V ℓ,k ; B , a match relation between the elements of these two sets and a corresponding (0 , A ℓ,k ; B as below. The set U ℓ ; B consists of all sequences x · · · x ℓ of integers x i satisfying 0 ≤ x i < b i for i = 1 , . . . , ℓ . The set V ℓ,k ; B consists of all sequences y · · · y ℓ ,where each y i is either an integer satisfying 0 ≤ y i < b i or an additional gap symboldenoted as g ; Furthermore, there are exactly ℓ − k occurrences of the gap symbol in any y · · · y ℓ ∈ V ℓ,k ; B . Two sequences x · · · x ℓ ∈ U ℓ ; B and y · · · y ℓ ∈ V ℓ,k ; B are then matchableif for any i , 1 ≤ i ≤ ℓ , we have y i = x i or y i = g . In other words, the gap symbol g acts asa wildcard and can match to any symbol. The corresponding (0 ,
1) matrix A ℓ,k ; B is thenobtained by indexing its columns and rows respectively by the elements of U ℓ ; B and V ℓ,k ; B and setting A ℓ,k ; B ( v, u ) = 1 if and only if u and v are matchable.When b = · · · = b ℓ = b , we have a fixed b -letter alphabet Σ b and we use the name A ℓ,k ; b instead of A ℓ,k ; B . In computational biology for DNA sequences, we have b = 4,and Σ = { A,C,G,T } is the set of four DNA bases. Then the set of column and rowindexes have special names: The set of column indexes, Σ ℓ is non-gapped oligomers oflength ℓ , briefly called non-gapped ℓ -mers and the set of row indexes is gapped oligomerswith k non-gapped positions and length ℓ , briefly called gapped k -mers (of length ℓ ).For amino acid sequences, b = 20, Σ is the set of the 20 amino acids, and the column2ndexes and row indexes are the ungapped and gapped polypeptide sequences of length ℓ .Apart from some previous studies of A ℓ,k ; b in mathematics (see [3, 14, 4]), this matrix hasrecently found profound applications in the field of computational biology and machinelearning [7, 6]. Specifically, the inherent symmetry in matrix A ℓ,k ; b allowed finding simpleclosed-form solutions for two related matrices: W ℓ,k ; b and H ℓ,k ; b , where W ℓ,k ; b = A + ℓ,k ; b is the Moore-Penrose pseudo-inverse of A ℓ,k ; b , and H ℓ,k ; b is the idempotent matrix givenby H ℓ,k ; b = W ℓ,k ; b A ℓ,k ; b . In [7] the matrix W ℓ,k ; b was derived and used to find robustestimates for ℓ -mer counts; This led to significant improvement to predict the binding ofcertain transcription factors to DNA sequences. This work was then extended in [6] andthe matrix H ℓ,k ; b was used to develop a method to efficiently compute the ℓ -mer countestimates and to compute a string kernel based on these robust count estimates to identifyenhancer sequences. 
Beyond modeling enhancer sequences in mammalian genomes, thismethod has been widely applied to several problems in computational biology includingprediction of the effect of non-coding variants [9], identification of local sequence featuresinfluencing cis-regulatory activity [15], identification of accessible chromatin regions [12],and estimation of evolutionary distances for phylogeny reconstruction [13].In all the above applications, the features were defined over a fixed alphabet length( b = 4 for DNA/RNA and b = 20 for amino acids). Here, we show that this constraintcould be relaxed to allow generalizing this method to cases with mixture of features thatare defined over alphabets of different sizes. For example, in addition to the DNA sequencethat is defined over the alphabet { A,C,G,T } , one can also add DNA methylation statuswhich is defined over { methylated , unmethylated } alphabet or other discrete features. Thena similar methodology described in [7] and [6] can be applied to find a robust estimateof the joint distribution of the features using a limited training data. To achieve this,we take a similar approach as was used in [7]. We introduce a M¨obius-like function ν B and use the related identities to obtain eigenvalues of A ℓ,k ; B A ⊤ ℓ,k ; B in terms of elementarysymmetric functions.Then we provide a complete set of mutually orthogonal eigenvectorsof A ℓ,k ; B A ⊤ ℓ,k ; B as well as a complete set of mutually orthogonal eigenvectors of A ℓ,k ; B A ⊤ ℓ,k ; B corresponding to the nonzero eigenvalues. This gives the reduced SVD (reduced singularvalue decomposition) of A ℓ,k ; B . We also give a combinatorial interpretations for the nullityand rank of A ℓ,k ; B via finding concrete bases for the null space and row space of thismatrix. 
Finally, we derive an equation for the entries of matrices W ℓ,k ; B and H ℓ,k ; B , where W ℓ,k ; B = A + ℓ,k ; B is the Moore-Penrose pseudo-inverse of A ℓ,k ; B and H ℓ,k ; B = W ℓ,k ; B A ℓ,k ; B .Deriving an explicit formula for matrices A ℓ,k ; B and H ℓ,k ; B allows efficient computation of3obust count estimates from a given training data. In practice, even with modest values of ℓ and k , these matrices have exponentially large dimensions which makes the applicationof numeric methods unfeasible.The rest of the paper is organized as following: Introduction of notation and prelimi-naries is given in Section 2: General notations and definitions for sets, strings, sequences,relations and some symmetric polynomials are presented in Section 2.1; Some prelimi-naries from linear algebra are discussed in Section 2.2. The function ν B and some of itsproperties is defined and studied in Section 3; The main results of this section, that is theidentities given in Propositions 1, 2 and 3, are used in later sections. Using the definitionof function ν B and also the elementary symmetric polynomials, we propose an orthonor-mal basis for the eigenspaces of the matrix A ℓ,k ; B A ⊤ ℓ,k ; B in Section 4. Concrete bases forthe null space and the row space of A ℓ,k ; B are presented in Section 5. Finally, in Section6 we compute the entries of W ℓ,k ; B and H ℓ,k ; B . Definition 1.
Let ℓ be a positive integer. The set [ ℓ ] is defined as [ ℓ ] = { , . . . , ℓ } . Fora set X and a nonnegative integer n , by (cid:0) Xn (cid:1) , we mean the set of all n -element subsets of X . Thus | (cid:0) Xn (cid:1) | = (cid:0) | X | n (cid:1) and | (cid:0) [ ℓ ] n (cid:1) | = (cid:0) ℓn (cid:1) . Definition 2.
A word x on a finite alphabet Σ , is a sequence x = x · · · x ℓ whose elements x i belong to the set Σ . As in [7] for a given integer b ≥ , the sets Σ b , ∆ b , Γ b are definedas follows Σ b = { , , · · · , b − } , ∆ b = Σ b ∪ { g } , Γ b = ∆ b \ { } , where g stands for the gap symbol. Definition 3.
Let B = ( b , b , . . . , b ℓ ) be an ℓ -tuple of integers b i ≥ . Define the sets B , ∆ B , Γ B , U ℓ ; B and V ℓ,k ; B as follows Σ B = Σ b × · · · × Σ b ℓ , ∆ B = ∆ b × · · · × ∆ b ℓ , Γ B = Γ b × · · · × Γ b ℓ , U ℓ ; B = Σ B ,V ℓ,k ; B = { v ∈ ∆ B : | v | g = ℓ − k } , V ′ ℓ,k ; B = { w ∈ Γ B : | w | g = ℓ − k } ,V ℓ, ≤ k ; B = k [ m =0 V ℓm , V ′ ℓ, ≤ k ; B = k [ m =0 V ′ ℓm A weak partial order on a set S is a binary relation (cid:22) on S which is reflexive, transitiveand antisymmetric. A set equipped with a weak partial order is called a partially orderedset or briefly a poset . If a (cid:22) b and a = b we write a ≺ b ; Then ≺ is nonreflexive, transitiveand nonsymmetric; Such a relation is called a strong partial order on S . If (cid:22) (resp. (cid:22) )is a weak partial order on S (resp. T ), then (cid:22) × (cid:22) is a weak partial order on S × T . Iffor any a and b in S , either a ≺ b or b ≺ a , then the partial order is called a total order ,or a linear order. This notation is used in the following definition. Definition 4.
Let B = ( b , b , . . . , b ℓ ) . We define a partial order on the set ∆ B . For thispurpose, firstly for any ≤ i ≤ ℓ , we consider the order ≺ i on the set ∆ b i given by ≺ i ≺ i . . . ≺ i b i − ≺ i g and consider the order (cid:22) B := ( (cid:22) × . . . × (cid:22) ℓ ) on ∆ B . Remark 1.
As it is clear from the definitions of U ℓ ; B and V ℓ,k ; B , when we use thesenotations we specially emphasize on parameters ℓ and k . Definition 5.
For any word v ∈ ∆ B we set G v = { i : 1 ≤ i ≤ ℓ, v i = g } and G v = [ ℓ ] \ G v .If X = { x , · · · , x n } is a subset of { , · · · , ℓ } with x < x < . . . < x n then by B ( X ) wemean ( b x , . . . , b x n ) . Especially, if v ∈ ∆ B and v ′ ∈ Γ B , then B ( G v ) = ( b i ) i ∈ G v and B ( G v ′ ) = ( b i ) i ∈ G v ′ . Definition 6.
Let B = ( b , . . . , b ℓ ) . We say elements u ∈ Σ B and v ∈ ∆ B match (or u and v are matchable) if for any ≤ i ≤ ℓ with v i = g we have u i = v i ; We denotethis by v ∼ u . The set of the elements v ∈ V ℓ,k ; B which are matchable with u ∈ Σ B , isdenoted by M ℓ,k ; B ( u ) . The set of elements u ∈ Σ B which are matchable with v , is denotedby N ℓ,k ; B ( v ) . efinition 7. The matrix A ℓ,k ; B is defined as a (0 , matrix whose rows and columnsare indexed respectively by the elements of V ℓ,k ; B and Σ B and A ℓ,k ; B ( v, u ) = 1 if and onlyif u and v are matchable. Remark 2.
Considering the definition 7, if we identify each row index v ∈ V ℓ,k ; B with N ℓ,k ; B ( v ) , then the matrix A ℓ,k ; B is seen as an incidence matrix, in which the points andblocks are row indexes and column indexes, respectively. Definition 8.
The matrix A ℓ, ≤ k ; B is defined as the (0 , matrix obtained by stacking thematrices A ℓ,i ; B ( i = 0 , . . . , k ), one on top of the other; Thus the rows and columns of A ℓ, ≤ k ; B are indexed respectively by the elements of V ℓ, ≤ k ; B and U ℓ,B . Elementary symmetric polynomials are well-studied objects in the study of polynomialsring k [ x , x , . . . , x n ] (see Chapter 7 of [2]). Below we formally mention their definitions;Then we define another symmetric polynomial which is useful in our work. This is followedby an example demonstrating their applications in our work. Definition 9.
Let i and n be nonnegative integers and let X = ( x , x , . . . , x n ) be a finitesequence of variables. The i -th elementary symmetric polynomial, denoted as S i ( X ) , isdefined as S i ( X ) := P I ∈ ( Xi ) Q i ∈ I x i . Notation.
Let X = ( x , . . . , x n ) be a finite sequence of numbers and α and β be arbitrarynumbers. Then we show the sequence ( βx + α, . . . , βx n + α ) by βX + α . Definition 10.
Let i and n be nonnegative integers and let X = ( x , x , . . . , x n ) be afinite sequence of variables. The expression R i ( X ) is then defined as follows: R i ( X ) = i X j =0 S j ( X −
1) (1)
Example 1.
Let ≤ k ≤ ℓ be integers and B = ( b , . . . , b ℓ ) , u ∈ Σ B and v ∈ V ℓk ; B . Thenwe have | Σ B | = | Γ B | = ℓ Y i =1 b i , | V ℓ,k ; B | = S k ( B ) , | V ′ ℓ,k ; B | = S k ( B − , | V ℓ, ≤ k ; B | = R k ( B + 1) , | V ′ ℓ, ≤ k ; B | = R k ( B ) , | M ℓ,k ( u ) | = (cid:18) ℓk (cid:19) , | N ℓ,k ( v ) | = Y i ∈ G v b i , .2 Notation and preliminaries from Linear Algebra All matrices we concern in this paper are real matrices. The row space of a A is denotedas row( A ), the column space of A is denoted as col( A ), and the dimension of the row spaceof A is denoted as rank( A ). The kernel of A , denoted as ker( A ) and the nullity of A anddenoted as null( A ). The matrix A is called diagonalizable in the field of real numbers ifthere exists a nonsingular real matrix P such that A = P Λ P − for some diagonal realmatrix Λ . If A is diagonalizable, then all eigenvalues of A appear on the main diagonalof Λ and the columns of P are the corresponding eigenvectors. The set of column vectorsof P is called a complete set of eigenvectors of A ; The set of column vectors of P whichcorrespond to nonzero eigenvalues is called a complete set of nonzero eigenvectors of A .If eigenvectors belonging to distinct eigenvalues of the matrix A are mutually orthogonal,then there exists an eigendecomposition A = P Λ P − with P − = P ⊤ , we call sucha decomposition an orthogonal eigendecomposition . Let A = P Λ P ⊤ be a orthogonaleigendecomposition for the matrix A and P = [ Q N ] where col( N ) = ker( A ). Then A = Q Λ Q ⊤ , where the matrix Q is obtained by deleting the columns of P which are inker( A ), and Λ is obtained by deleting the zero columns and zero rows of Λ ; we call thisdecomposition an orthonormal nonzero eigendecomposition .It is known that any symmetric real matrix A is diagonalizable on the field of real num-bers and eigenvectors corresponding to distinct eigenvalues of A are orthogonal. 
Hence,every symmetric real matrix A has an orthonormal nonzero eigendecomposition of theform A = Q Λ Q ⊤ with real matrices Λ and Q . A real symmetric matrix A of order n is positive definite (resp. positive semi-definite) if x ⊤ A x > x ⊤ A x ≥
0) forall nonzero x ∈ R n . For any matrix A , the matrix A ⊤ A is positive semidefinite, andrank( A ) = rank( AA ⊤ ). Conversely, any positive semidefinite matrix M can be written as M = A ⊤ A ; this is the Cholesky decomposition. If A is a real matrix, then both A ⊤ A and AA ⊤ are diagonalizable over the field of real numbers.A singular value decomposition (SVD) of a matrix A ∈ R n × m is a factorization A = U Σ V ⊤ with Σ = diag( σ , σ , . . . , σ p ), p = min { n, m } and σ ≥ σ ≥ . . . ≥ σ p ≥ , such that the set of columns of both matrices U = [ u , u , . . . , u n ] ∈ R n × n and V =[ v , v , . . . , v m ] ∈ R m × m are orthonormal. The diagonal entries of Σ are called singularvalues of A . If rank( A ) = r < p , then the reduced singular value decomposition (reducedSVD) of A is a factorization A = ˆ U ˆΣ ˆ V ⊤ with ˆΣ = diag( σ , σ , . . . , σ r ) ∈ R r × r and σ ≥ σ ≥ . . . ≥ σ r >
0, such that the matrices U = [ u , u , . . . , u r ] ∈ R n × r and7 = [ v , v , . . . , v r ] ∈ R m × r are both orthonormal. The following lemma gives the relationbetween the SVD of matrix A and eigendecomposition of the matrices AA ⊤ and A ⊤ A . Lemma 1. ([5], Section 5.6, Facts 8,9)
Let A ∈ R n × m , then the following facts holds: (i) The nonzero singular values of A are the square roots of nonzero eigenvalues of A ⊤ A or AA ⊤ . (ii) if U Σ V ⊤ is a reduced SVD of A , then columns of V are eigenvectors of A ⊤ A andcolumns of U are eigenvectors of AA ⊤ . The
Moore-Penrose pseudo-inverse of a matrix A , denoted by A + , is defined as amatrix that satisfies all the following four conditions: AA + A = A, A + AA + = A + , ( AA + ) ⊤ = AA + , ( A + A ) ⊤ = A + A The Moore-Penrose pseudo-inverse exists and is unique for any given matrix A . Wehave A + = ( A ⊤ A ) + A ⊤ = A ⊤ ( AA ⊤ ) + . For further properties of the Moore-Penrosepseudo-inverse see for instance [5]. The two following Lemmas provide the Moore-Penrosepseudo-inverse of the matrix A based on some nonzero eigendecomposition of AA ⊤ . Theproof of the first one is straight forward and left to the readers.The following lemmas provide the Moore-Penrose pseudo-inverse of a matrix A basedon some nonzero eigendecomposition of the matrix AA ⊤ . Lemma 2.
Let B be a positive semi-definite real matrix. Then B admits an orthonormalnonzero eigendecomposition of the form B = Q Λ Q ⊤ . Where Q ⊤ Q = I . Moreover let B = AA ⊤ , then we have A ⊤ QQ ⊤ = A ⊤ . Proof.
Using the previous notation, let AA ⊤ = P Λ P ⊤ be an orthonormal decompo-sition for AA ⊤ and P = [ Q N ] where the columns of N are in ker( AA ⊤ ). The equation Q ⊤ Q = I is concluded from the orthonormality of the columns of Q . If y denotes a col-umn of N , by A ℓk A ⊤ ℓk y = 0 we obtain y ⊤ A ℓk A ⊤ ℓk y = 0, hence || A ⊤ ℓk y || = 0, which yields A ⊤ ℓk y = 0. Thus A ⊤ N = 0 . Now, from
P P ⊤ = I we obtain QQ ⊤ + N N ⊤ = I ; Multiplyingfrom left by A ⊤ and using A ⊤ N = 0, we provide A ⊤ QQ ⊤ = A ⊤ . (cid:3) emma 3. Let A n × m be a real matrix and let AA ⊤ = Q Λ Q ⊤ be a nonzero orthonormaleigendecomposition of AA ⊤ . Then the Moore-Penrose pseudo-inverse of A is given by W = A ⊤ Q Λ − Q ⊤ . Moreover, if the all one column vector j = [1 1 . . . ⊤ is an eigenvectorof A ⊤ A , then W A j = j . Proof.
The proof is easily obtained by using Lemma 2. (cid:3)
Lemma 4.
Let A n × m be a real matrix and suppose that the columns of Υ are a completeset of eigenvectors corresponding to nonzero eigenvalues of AA ⊤ . Let the columns of Υ be c , . . . , c n corresponding to the nonzero eigenvalues λ , . . . , λ n . (i) An orthonormal nonzero eigendecomposition AA ⊤ = Q Λ Q ⊤ is obtained by setting Q = Υ E , where E = diag( k c i k ) ≤ i ≤ n . (ii) If we denote Moore-Penrose pseudo-inverse of A by W , then W = A ⊤ Υ D Υ ⊤ , where D = diag( || c i || λ i ) ≤ i ≤ n . Consequently, W = A ⊤ C where the entries of C are givenby C ij = P k Υ ik Υ jk || c k || λ k . Proof. (i) In order to obtain normal eigenvectors, it is enough to divide column c i ofΥ by its norm, that is to multiply the matrix Υ from right by the diagonal matrix E = diag( || c i || ) ≤ i ≤ n to get Q = Υ E .(ii) By Lemma 3, W = A ⊤ Q Λ − Q ⊤ = A ⊤ Υ E Λ − E ⊤ Υ ⊤ . Since both E and Λ arediagonal, so is E Λ − E ⊤ ; Setting D = E Λ − E ⊤ , we obtain W = A ⊤ Υ D Υ ⊤ , where D = diag( || c i || λ i ) ≤ i ≤ n ; Setting C = Υ D Υ ⊤ we obtain W = A ⊤ C and C ij = P k Υ ik Υ jk || c k || λ k , as required. (cid:3) ν B and some of its properties In this section, we consider an order on the set ∆ B and based on this define a function ν B on the set ∆ B × ∆ B and inspect some of its properties. For an integer b i ≥
2, thefollowing linear order makes ∆ b i a totally ordered set:0 ≺ i ≺ i . . . ≺ i b i − ≺ i g B = ∆ b × · · · × ∆ b ℓ , a poset isobtained; More precisely, for two elements x = x · · · x ℓ and y = y · · · y ℓ with x i , y i ∈ ∆ b i ,(1 ≤ i ≤ ℓ ), we have x (cid:22) B y if and only if x i (cid:22) i y i holds for i = 1 , . . . , ℓ . Below ispresented the definition of a useful function on ∆ B × ∆ B . Definition 11.
Consider the ℓ -tuple B = ( b , b , . . . , b ℓ ) , where b i ≥ is integer for i = 1 , . . . , ℓ . For any i , (1 ≤ i ≤ ℓ ) , we define the function ν i on ∆ b i × ∆ b i as ν i ( x, y ) = − b i if x = y = g, − y if x = y = g, if x ≺ y, if x ≻ y. Now the function ν B is defined on the product set ∆ B × ∆ B by the following product rule ν B ( x · · · x ℓ , y · · · y ℓ ) = ℓ Y i =1 ν i ( x i , y i ) (2) Remark 3.
The function ν satisfies the property “ ν B ( x, y ) = 0 unless x (cid:22) B y ”; Thismeans that it is an element of the incidence algebra of the poset ∆ B (For the definitionand some examples of this concept, see for instance Chapter 8 of [1]). It is observed that ν B satisfies X x (cid:22) z (cid:22) y ν B ( x, z ) = (Q ℓi =1 ( y ′ i − x ′ i ) if x (cid:22) y, otherwise. (3) where the values x ′ i , (1 ≤ i ≤ ℓ ) , are defined x ′ i = ( b i if x i = g,x i otherwise.and y ′ i ’s are defined similarly. The equation (3) shows similarities between the function ν B and the M¨obius function of the poset ∆ B . Some useful identities about ν B are stated in Proposition 1, but before stating thisproposition we need some definitions and lemmas. Definition 12.
Let ℓ be a positive integer, B = ( b , . . . , b ℓ ) and let v ′ , v ′′ ∈ ∆ B . Let m, n be integers with ≤ m, n ≤ ℓ such that | G v ′ | = ℓ − n and | G v ′′ | = ℓ − m . Define the sets A , A , A and A by A = G v ′ ∩ G v ′′ , A = G v ′′ \ G v ′ , A = G v ′ \ G v ′′ and A = G v ′ ∩ G v ′′ . emma 5. Let v ′ , v ′′ ∈ ∆ B and the sets A , A , A and A be as in Definition 12. (i) The sets A , A , A and A are mutually disjoint and A ∪ A ∪ A ∪ A = [ ℓ ] .Moreover A = [ ℓ ] unless v ′ = v ′′ = g ℓ . (ii) If A = A = ∅ , then A = G v = G v ′ and G v ′ = G v ′′ = A ; If furthermore v ′ = v ′′ ,then there exists i ∈ A such that v ′ i = v ′′ i Proof.
The proof is straightforward. (cid:3)
Lemma 6.
Let w, v ′ , v ′′ ∈ ∆ B . (i) If ν B ( w, v ′ ) ν B ( w, v ′′ ) = 0 , then G w ⊆ A . (ii) If G w ⊆ A , then ν B ( w, v ′ ) ν B ( w, v ′′ ) = p p p p , where p = Y i ∈ G w b i , p = Y i ∈ A ν i ( w i , v ′′ i ) ,p = Y i ∈ A ν i ( w i , v ′ i ) , p = Y i ∈ A ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) Proof.
The proof of part (i) is straightforward. The proof of part (ii) is obtained using ν B ( w, v ′ ) ν B ( w, v ′′ ) = ℓ Y i =1 ν i ( w i , v ′ i ) ν i ( w i , v ′′ i )= Y j =0 Y i ∈ A j ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) , and the definition of ν i . (cid:3) Proposition 1.
Let v ′ , v ′′ ∈ Γ B . Then (i) X w ∈ V ℓk ν B ( w, v ′ ) = ( − ℓ − k S ℓ − k ( B ) , if v ′ = g ℓ , , otherwise. (ii) X w ∈ V ℓk ν B ( w, v ′ ) ν B ( w, v ′′ ) = S ℓ − k ( B ( G v ′ )) Y i ∈ G v ′ b i Y i ∈ G v ′ ( v ′ i + v ′ i ) , if v ′ = v ′′ , , otherwise. roof. (i) For w ∈ V ℓ,k we have ν B ( w, g ℓ ) = Q i ∈ G w ( − b i ), hence we obtain X w ∈ V ℓk ν B ( w, g ℓ ) = ( − ℓ − k S ℓ − k ( B )which proves part(i) in the case v ′ = g ℓ .Now suppose that v ′ ∈ V ℓ ; ≤ k ; B \ { g ℓ } , hence for some 1 ≤ j ≤ ℓ , v ′ j = g . Then from X w ∈ V ℓk ν B ( w, v ′ ) = b i − X w i =0 ℓ Y i =1 ν i ( w i , v ′ i )= ℓ Y i =1 b i − X w i =0 ν i ( w i , v ′ i ) , by using P b j − w j =0 ν j ( w j , v ′ j ) = 0, the right side is simplified to 0, as required.(ii) To prove this part, observe that if | A | < ℓ − k , each summand in the left, is zero andthere is nothing to prove. So, let | A | ≥ ℓ − k ; Setting X ℓk ( B, G ) = { w ∈ V ℓk ( B ) : G w = G } we obtain X w ∈ V ℓk ( B ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = X G ∈ ( A ℓ − k ) X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ ) (4)First we compute the summand P w ∈ X ℓk ( B,G ) ν ( w, v ′ ) ν ( w, v ′′ ), for a fixed G ∈ (cid:0) A ℓ − k (cid:1) .For this, without loss of generality, let A = { , . . . , a } , A = { a + 1 , . . . , a + a } , A = { a + a +1 , . . . , a + a + a } and A = { a + a + a +1 , . . . , ℓ } , where a , a , a are non-negative integers. Moreover, without loss of generality, let G = { k +1 , . . . , ℓ } .Now w ∈ X ℓk ( B, G ) can be factorized in the form w = qrstg ℓ − k , with | q | = a , | r | = a , | s | = a and | t | = a − ( ℓ − k ) and when w runs over X ℓk ( B, G ), each ofthe words q, r, s and t runs over a proper set accordingly. 
By part (ii) of Lemma 6we obtain X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = P P P P , (5)where P = Y i ∈ A b i Y i ∈ G b i , P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′ i ) ,P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′′ i ) , P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) Case 1. v ′ = v ′′ ; In this case A = G v ′ and A = A = ∅ , hence P = P = 1and P w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) = P P . In this case, P = Y i ∈ G v ′ ( v ′ i + v ′ i ) and P = Y i ∈ G v ′ b i Y i ∈ G b i , so X w ∈ V ℓk ( B ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = X G ∈ ( Gv ′ ℓ − k ) X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ )= X G ∈ ( Gv ′ ℓ − k ) Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i Y i ∈ G b i = Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i X G ∈ ( A ℓ − k ) Y i ∈ G b i = Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i S ℓ − k ( B ( G v ′ )) Case 2. v ′ = v ′′ ; If A = ∅ then P = 0 and if A = ∅ then P = 0. Otherwise, if A = A = ∅ , then by Lemma 5 (ii), A = ∅ and there exists i ∈ A with v ′ i = v ′′ i ;For this i , P w i ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) = 0 thus P = 0. Hence, the hypothesis v ′ = v ′′ implies that the right side of (5) is zero in either case, and we get the result by (4). (cid:3) Proposition 2.
Let v ′ ∈ Γ B . Then (i) For any u ∈ U ℓ we have X y ∈ M ℓ,k ; B ( u ) ν B ( y, v ′ ) = ( − ℓ − k S ℓ − k ( B ( G v ′ )) ν B ( u, v ′ ) , (6)(ii) For any v ∈ V ℓk we have X u ∈ N ℓ,k ; B ( v ) ν B ( u, v ′ ) = ( − ℓ − k ν B ( v, v ′ ) (7) Proof. (i) If the summand ν ( y, v ′ ) is nonzero, then G y ⊆ G v ′ , on the other hand thenon-gaped positions of all such y ’s are the same as u . Let G y = { x , x , . . . , x ℓ − k } ,13hen ν B ( y, v ′ ) = ( − ℓ − k b x b x · · · b x ℓ − k ν B ( u, v ′ ). Therefore X y ∈ M ℓ,k ; B ( u ) ν B ( y, v ′ ) =( − ℓ − k ν B ( u, v ′ ) X { x ,x ,...,x ℓ − k }⊆ G v ′ b x b x . . . b x ℓ − k =( − ℓ − k S ℓ − k ( B ( G v ′ )) ν B ( u, v ′ ) , as required.(ii) We distinguish two cases: Case (a).
Suppose that G v ⊆ G v ′ . Then for any u ∈ N ℓ,k ; B ( v ) we have ν B ( u, v ′ ) = Q i ∈ G v ′ ν i ( u i , v ′ i ) = Q i ∈ G v ′ ν i ( v i , v ′ i ) and since there are totally Q i ∈ G v ′ b i such words u , the left side of equation (7) equals Y i ∈ G v b i Y i ∈ G v ′ ν i ( v i , v ′ i ) = ( − | G v | Y i ∈ G v ν i ( g, g ) Y i ∈ G v ′ \ G v ν i ( v i , g ) Y i ∈ G v ′ ν i ( v i , v ′ i )= ( − ℓ − k ℓ Y i =1 ν i ( v i , v ′ i )which equals ( − ℓ − k ν B ( v, v ′ ) as required. Case (b).
Suppose that G v G v ′ , consequently G v \ G v ′ = ∅ . Now for any i ∈ G v \ G v ′ we have ν ( v i , v ′ i ) = 0, thus the right side of (7) is 0; The followingargument shows that the left side is 0 as well: The nonzero summands in the leftside of (7) are obtained from elements u ∈ X where the subset X ⊆ Σ B is given by X = { u ∈ Σ B : u i ≤ v i for i ∈ G v \ G v ′ and u i = v i for i ∈ G v } . Thus we obtain X u ∈ N ℓk ; B ( v ) ν B ( u, v ′ ) = X u ∈ X ν B ( u, v ′ )= X u ∈ X ℓ Y i =1 ν i ( u i , v ′ i )= X u ∈ X Y i ∈ G v ν i ( u i , v ′ i ) Y i ∈ G v \ G v ′ ν i ( u i , v ′ i ) Y i ∈ G v ∩ G v ′ ν i ( u i , g ) = Y i ∈ G v ν i ( v i , v ′ i ) Y i ∈ G v \ G v ′ v ′ i X u i =0 ν i ( u i , v ′ i ) Y i ∈ G v ∩ G v ′ b i i ∈ G v \ G v ′ we have P v ′ i u i =0 ν i ( u i , v ′ i ) = P v ′ i − u i =0 − v ′ i = 0 . Thus (7) is true in either case. (cid:3)
Definition 13.
Let u ∈ Σ B and v ∈ ∆ B . Then P ( u, v ) and Q ( u, v ) are defined as below P ( u, v ) = { i : 1 ≤ i ≤ ℓ, v i = g, v i = u i } Q ( u, v ) = { i : 1 ≤ i ≤ ℓ, v i = g, v i = u i } We denote P ( u, v ) and Q ( u, v ) by P and Q , respectively, if there is no danger of confusion. With the above definition, it is obvious that | P ( u, v ) | + | Q ( u, v ) | = | G v | . Particularly, if v ∈ V ℓ,k and | P | = p then | Q | = k − p . Proposition 3.
Let u ∈ Σ B and v ∈ ∆ B . Recall the notation of Definition 13. (i) For ≤ i ≤ ℓ , let φ i ( u, v ) = b i − X j =max { ,v i } ν i ( v i , j ) ν i ( u i , j ) j ( j + 1) . Then we have φ i ( u, v ) = b i − b i , if i ∈ P ; − b i , otherwise, i.e. if i ∈ Q. (ii) Let G be a given subset of [ ℓ ] with G v ⊆ G . Then the following identity holds X v ′ ∈ Γ B ,G v ′ = G ν B ( v, v ′ ) ν B ( u, v ′ ) Y i ∈ G v ′ ( v ′ i + v ′ i ) = ( − | Q \ G | + | G v | Y i ∈ G v b i Y i ∈ P \ G ( b i − Y i ∈ G b i . (8) Proof. (i) The proof if this part is easy and left to the reader.(ii) Note that if G v ⊆ G v ′ = G , it is easily obtained that ν B ( v, v ′ ) ν B ( u, v ′ ) = ( − | G v | Y i ∈ G v b i Y i ∈ G ν i ( v i , v ′ i ) ν i ( u i , v ′ i )15ence if we denote by S the left side of (8), we obtain S = ( − | G v | Y i ∈ G v b i X v ′ ∈ Γ B ,G v ′ = G Y i ∈ G ν i ( v i , v ′ i ) ν i ( u i , v ′ i ) v ′ i ( v ′ i + 1)= ( − | G v | Y i ∈ G v b i Y i ∈ G b i − X j =max { ,v i } ν i ( v i , j ) ν i ( u i , j ) j ( j + 1)= ( − | G v | Y i ∈ G v b i Y i ∈ G φ i ( u, v )Thus by part (i), we get S = ( − | G v | Y i ∈ G v b i Y i ∈ G ∩ Q − b i Y i ∈ G ∩ P b i − b i = ( − | Q \ G | + | G v | Y i ∈ G v b i Y i ∈ P \ G ( b i − Y i ∈ G b i . (cid:3) A ℓk ; B A ⊤ ℓk ; B and A ⊤ ℓk ; B A In this section we give an orthonormal nonzero eigendecomposition of the matrices A ℓk ; B A ⊤ ℓk ; B and A ⊤ ℓk ; B A . Eigenvectors of these matrices are the elementary symmetric polynomials andthe entries of the corresponding eigenvectors are given in terms of the function ν B . Usingthe properties of ν B we show that these eigenvectors are mutually orthogonal. Definition 14.
Let n, k ≤ ℓ be integers. Given B = ( b , . . . , b ℓ ) and v ′ ∈ V ′ ℓ,n ; B , wedefine the column vector x ℓ,k,nv ′ as a vector whose rows are indexed by the elements of V ℓ,k ; B with entries x ℓ,k,nv ′ ( w ) = ( − ℓ − k ν B ( w, v ′ ) . The column vector z ℓ,nv ′ is then defined as z ℓ,nv ′ = x ℓ,ℓ,nv ′ ; In other words, z ℓ,nv ′ is a column vector whose rows are indexed by elements u of Σ B with entries z ℓ,nv ′ ( u ) = ν B ( u, v ′ ) . When there is no need to emphasize on theparameters ℓ , k and n , we simply write x v ′ and z v ′ . roposition 4. Let v ′ ∈ Γ B and n = | G v ′ | . Then the following identity holds: k x ℓ,k,nv ′ k = S ℓ − k ( B ( G v ′ )) Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i Proof.
See proposition 1 (ii) (cid:3)
The following proposition contains a generalization of Proposition 2 of [7]:
Proposition 5.
Let ≤ k ≤ ℓ , ≤ n ≤ ℓ and v ′ ∈ V ′ ℓn . The following matrix identitieshold. (i) A ⊤ ℓk ; B x ℓknv ′ = S ℓ − k ( B ( G v ′ )) z ℓnv ′ . (ii) A ℓk ; B z ℓnv ′ = x ℓknv ′ . (iii) A ℓk ; B A ⊤ ℓk ; B x ℓknv ′ = S ℓ − k ( B ( G v ′ )) x ℓknv ′ . (iv) A ⊤ ℓk ; B A ℓk ; B z ℓnv ′ = S ℓ − k ( B ( G v ′ )) z ℓnv ′ . (v) For any two distinct words v ∈ V ′ ℓn and u ∈ V ′ ℓn , the vectors x ℓkn v and x ℓkn u areorthogonal. (vi) For any two distinct words v ∈ V ′ ℓn and u ∈ V ′ ℓn , the vectors z ℓn v and z ℓn u areorthogonal. Proof.
The proofs of (i) and (ii) are concluded from definitions of x v ′ and z v ′ and Propo-sition 2. Combining (i) and (ii) yields (iii) and (iv). The proofs of (v) is concluded fromProposition 1(ii). The same proposition yields part (vi) by setting k = ℓ . (cid:3) Theorem 1.
Let ≤ k ≤ ℓ . Then (i) The set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ }} consists of all eigenvalues ofthe matrix A ⊤ ℓ,k ; B A ℓ,k ; B and the set { z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ } is a completeset of eigenvectors corresponding to eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . Moreover, theseeigenvectors are pairwise orthogonal. The set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ, ≤ k ; B } consists of all non-zero eigenvalues of thematrix A ℓ,k ; B A ⊤ ℓ,k ; B . The set { x ℓknv ′ : v ′ ∈ V ′ ℓ, ≤ k ; B } is a complete set of eigenvectorscorresponding to aforementioned nonzero eigenvalues. Moreover, these eigenvectorsare pairwise orthogonal. Proof. (i) First we note that for any v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ , z ℓnv ′ is a nonzero vector. ByProposition 5 (iv), for any v ′ ∈ Γ B , z ℓnv ′ is an eigenvector of A ⊤ ℓ,k ; B A ℓ,k ; B correspondingto the eigenvalue S ℓ − k ( B ( G v ′ )). By Proposition 5 (vi), these eigenvectors are pair-wise orthogonal. Since |{ z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ }| = | Γ B | and | Γ B | = | Σ B | equalsthe size of the matrix A ⊤ ℓ,k ; B A ℓ,k ; B , we conclude that { z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ } is a complete set of eigenvectors corresponding to eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B , asrequired.(ii) The set of all non-zero eigenvalues of A ℓ,k ; B A ⊤ ℓ,k ; B is the same as the set of all non-zeroeigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . In part (i), we obtained the complete set of eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . On the other hand, if 0 ≤ n ≤ ℓ and v ′ ∈ V ′ ℓ,n ; B , then S ℓ − k ( B ( G v ′ )) = 0holds if and only if k < n ≤ ℓ . Hence, the set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ, ≤ k ; B } consists of all non-zero eigenvalues of the matrix A ℓ,k ; B A ⊤ ℓ,k ; B as well as the matrix A ⊤ ℓ,k ; B A ℓ,k ; B . Moreover, using Proposition 5 (iii), the corresponding eigenvectors are { x ℓknv ′ : v ′ ∈ V ′ ℓ, ≤ k ; B } and these vectors are pairwise orthogonal. 
The following corollary is a direct consequence of Theorem 1.
Corollary 1.
The set $\{z^{\ell n}_{v'} : v' \in V'_{\ell,n;B},\ k < n \le \ell\}$ is a basis for the null space of the matrix $A^{\top}_{\ell,k;B}A_{\ell,k;B}$.

Using Fact 3 in Section 5.6 of [5] and Theorem 1, the following corollary gives the reduced SVD of $A_{\ell,k;B}$.

Corollary 2.
Let $r = \operatorname{rank}(A^{\top}_{\ell,k;B}A_{\ell,k;B}) = |V'_{\ell,\le k;B}|$, $\{u_1, u_2, \ldots, u_r\} = \{S_{\ell-k}(B(G_{v'}))^{-1/2}\, x^{\ell k n}_{v'} : v' \in V'_{\ell,\le k;B}\}$ and $\{v_1, v_2, \ldots, v_r\} = \{z^{\ell n}_{v'} : v' \in V'_{\ell,\le k;B}\}$. Then the reduced SVD of $A_{\ell,k;B}$ is given by $A_{\ell,k;B} = U \Sigma V^{\top}$, where the columns of $U$ are the vectors $u_i$ ($1 \le i \le r$), the columns of $V$ are the vectors $v_i$ ($1 \le i \le r$), and $\Sigma$ is an $r \times r$ diagonal matrix whose diagonal entries are the square roots of the corresponding nonzero eigenvalues of the matrix $A_{\ell,k;B}A^{\top}_{\ell,k;B}$.

Definition 15. Given $B = (b_1, \ldots, b_\ell)$, we define $\Upsilon_{\ell,k;B}$ as the matrix whose rows and columns are indexed by the elements of $V_{\ell,k;B}$ and $V'_{\ell,\le k;B}$, respectively, and whose entries are given by $\Upsilon_{\ell,k;B}(w, v') = (-1)^{|w|-|w|_g}\, \nu_B(w, v')$. In other words, the columns of $\Upsilon_{\ell,k;B}$ are exactly the vectors $x_{v'}$ for $v' \in V'_{\ell,\le k;B}$. Also, we define the matrix $\Lambda$ by $\Lambda = \operatorname{diag}(S_{\ell-k}(B(G_{v'})))_{v' \in V'_{\ell,\le k;B}}$.

Remark 4.
Using the preceding lemma and Theorem 1, we obtain an orthonormal nonzero eigendecomposition for $A_{\ell,k;B}A^{\top}_{\ell,k;B}$.

Bases for the null space and the row space of $A_{\ell,k;B}$

In this section we give concrete bases for the null space and the row space of $A_{\ell,k;B}$. The basis for the null space is obtained as a corollary of the arguments of Section 4. The basis for the row space is obtained by a proper selection of rows of $A_{\ell,k;B}$; this gives a combinatorial interpretation of the earlier formula for the rank of $A_{\ell,k;B}$. The process of finding a basis for the row space of the matrix $A_{\ell,k;B}$ is similar to the one for incidence matrices presented in [16].

Theorem 2.
Let $0 \le k \le \ell$. Then

(i) The set $\{z^{\ell n}_{v'} : v' \in V'_{\ell,n;B},\ k < n \le \ell\}$ is a basis for the null space of the matrix $A_{\ell,k;B}$.

(ii) The matrix $A_{\ell,\le k;B}$ has the same row space as $A_{\ell,k;B}$ and $\operatorname{rank}(A_{\ell,k;B}) = R_k(B)$. For $w \in V_{\ell,\le k;B}$ denote by $r_w$ the row of $A_{\ell,\le k;B}$ indexed by $w$. Then the set $\{r_w : w \in V'_{\ell,\le k;B}\}$ is a basis for the row space of $A_{\ell,k;B}$.

Proof. (i) First we observe that the vector $x^{\ell k n}_{v'}$ is the zero vector if and only if $k < n \le \ell$. Since the null space of $A_{\ell,k;B}$ coincides with that of $A^{\top}_{\ell,k;B}A_{\ell,k;B}$, the claim then follows from Corollary 1.

Theorem 3. Let $u \in \Sigma_B$, $v \in V_{\ell,k;B}$. Moreover, with the notation of Definition 13, let $P = P(u,v)$ and $Q = Q(u,v)$. Then the entry $W_{\ell,k;B}(u,v)$ of the Moore-Penrose pseudo-inverse of $A_{\ell,k;B}$ is given as below:
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i \in \overline{G_v}} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))}. \qquad (11)$$

Proof. For $v' \in V'_{\ell,\le k}$, let $d_{v'} = \|x_{v'}\|^2 \lambda_{v'}$. By Lemma 4 and the definitions of $A_{\ell,k;B}$ and $\Upsilon_{\ell,k;B}$ we obtain
$$W_{\ell,k;B}(u,v) = \sum_{y \in M_{\ell,k;B}(u)} \sum_{v' \in V'_{\ell,\le k}} \frac{\nu_B(v,v')\,\nu_B(y,v')}{d_{v'}} = \sum_{v' \in V'_{\ell,\le k}} \frac{\nu_B(v,v')}{d_{v'}} \sum_{y \in M_{\ell,k;B}(u)} \nu_B(y,v') = \sum_{v' \in V'_{\ell,\le k}} (-1)^{\ell-k}\, \frac{\nu_B(v,v')}{d_{v'}}\, S_{\ell-k}(B(G_{v'}))\, \nu_B(u,v') \quad \text{(by (6))}.$$
Replacing $d_{v'}$ by its value from Proposition 4, grouping the sum over $v'$ according to the common gap set $G = G_{v'}$, and evaluating the inner sum $\sum_{v' \in \Gamma_B,\ G_{v'} = G} \nu_B(v,v')\,\nu_B(u,v')$ by identity (8), we obtain
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i \in \overline{G_v}} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))},$$
which is (11). □

Theorem 4. The sum of the entries of any row (column) of the matrix $H_{\ell,k;B} := W_{\ell,k;B} A_{\ell,k;B}$ equals $1$.
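The closed form (11) for the pseudo-inverse, and the unit row sums of the filter matrix $H = WA$, can be sanity-checked against a generic pseudo-inverse routine. A minimal sketch, assuming numpy; the helper names are ours, we read $P(u,v)$ and $Q(u,v)$ as the non-gap positions of $v$ where $u$ agrees, respectively disagrees, with $v$ (Definition 13 is not reproduced here), and we read the prefactor of (11) as a product over the complement $\overline{G_v} = [\ell] \setminus G_v$:

```python
import itertools
import numpy as np

def elem_sym(vals, d):
    # Elementary symmetric polynomial e_d(vals), i.e. S_d(B(G)).
    e = [1.0] + [0.0] * d
    for x in vals:
        for j in range(d, 0, -1):
            e[j] += x * e[j - 1]
    return e[d]

def build_A(ell, k, B):
    # Rows: gapped sequences with exactly ell-k gaps; columns: full sequences.
    U = list(itertools.product(*[range(b) for b in B]))
    V = [v for gaps in itertools.combinations(range(ell), ell - k)
         for v in itertools.product(*[['g'] if i in gaps else list(range(B[i]))
                                      for i in range(ell)])]
    A = np.array([[float(all(c == 'g' or c == x for c, x in zip(v, u)))
                   for u in U] for v in V])
    return A, U, V

def W_closed_form(ell, k, B, U, V):
    # Entries of the pseudo-inverse via formula (11).
    W = np.zeros((len(U), len(V)))
    for a, u in enumerate(U):
        for c, v in enumerate(V):
            Gv = [i for i in range(ell) if v[i] == 'g']
            P = [i for i in range(ell) if v[i] != 'g' and v[i] == u[i]]
            Q = [i for i in range(ell) if v[i] != 'g' and v[i] != u[i]]
            rest = [i for i in range(ell) if i not in Gv]  # complement of G_v
            total = 0.0
            for r in range(len(rest) + 1):       # all G with G_v <= G <= [ell]
                for extra in itertools.combinations(rest, r):
                    G = set(Gv) | set(extra)
                    term = (-1.0) ** sum(1 for i in Q if i not in G)
                    term *= np.prod([B[i] - 1.0 for i in P if i not in G])
                    total += term / elem_sym([B[i] for i in G], ell - k)
            # prefactor: product over the non-gap positions of v
            W[a, c] = total / np.prod([B[i] for i in rest])
    return W

ell, k, B = 2, 1, (2, 3)
A, U, V = build_A(ell, k, B)
W = W_closed_form(ell, k, B, U, V)

print(np.allclose(W, np.linalg.pinv(A)))       # closed form matches pinv
print(np.allclose((W @ A).sum(axis=1), 1.0))   # Theorem 4: unit row sums
```

With these conventions the closed form agrees with `numpy.linalg.pinv` on the small cases we tried, and the row and column sums of $H = WA$ are all $1$; the cost of the direct evaluation grows as $|U| \cdot |V| \cdot 2^{\ell}$, so this is only a correctness check, not an efficient implementation.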
Furthermore, for any $u, w \in \Sigma_B$, the entry $H_{\ell,k;B}(u,w)$ is given as below, where, using Definition 13, $P = P(u,w)$ and $Q = Q(u,w)$:
$$H_{\ell,k;B}(u,w) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G \subseteq [\ell],\ \ell-k \le |G|} (-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1). \qquad (12)$$

Proof. To compute the sum of the entries of each row (column) of the matrix $H$, we observe that $z_{gg\cdots g} = \mathbf{j}$, the all-ones vector. Now, using Proposition 5(iv) and Lemma 3, we have $H\mathbf{j} = \mathbf{j}$, as desired.

To calculate the entry $H_{\ell,k;B}(u,w)$ of $H$, first note that if $v \sim w$ then for any subset $G$ with $G_v \subseteq G \subseteq [\ell]$ we have $P(u,v) \setminus G = P(u,w) \setminus G$ and $Q(u,v) \setminus G = Q(u,w) \setminus G$. Second, by $H_{\ell,k;B} = W_{\ell,k;B} A_{\ell,k;B}$ we have
$$H_{\ell,k;B}(u,w) = \sum_{v \in V_{\ell,k;B},\ v \sim w} W_{\ell,k;B}(u,v).$$
Third, the result of Theorem 3 can be rewritten as
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q(u,v) \setminus G|} \prod_{i \in G_v} b_i \prod_{i \in P(u,v) \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))}.$$
Thus, by the last two formulas, and interchanging the order of summation,
$$H_{\ell,k;B}(u,w) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G \subseteq [\ell],\ \ell-k \le |G|} \frac{(-1)^{|Q(u,w) \setminus G|} \prod_{i \in P(u,w) \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))} \sum_{v \sim w,\ G_v \subseteq G} \prod_{i \in G_v} b_i.$$
Since the innermost sum equals $S_{\ell-k}(B(G))$, we obtain (12). □

Acknowledgement. M. Mohammad-Noori would like to thank the University of Tehran for its support during his sabbatical leave at IPM. The research of N. Ghareghani and M. Mohammad-Noori was supported in part by grants from IPM (No. 94050016 and No. 95050129).

References

[1] ∼pjc/notes/counting.pdf. Accessed 25 January 2012.

[2] D. A. Cox, J. Little, D.
O'Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd edition, Springer, 2007.

[3] Ph. Delsarte, Beyond the orthogonal array concept, European Journal of Combinatorics (2004), 187-198.

[4] Ph. Delsarte, Association schemes and t-designs in regular semilattices, Journal of Combinatorial Theory, Series A (1976), 230-243.

[5] L. Han and M. Neumann, Inner product spaces, orthogonal projection, least squares and singular value decomposition, in: L. Hogben (ed.), Handbook of Linear Algebra, 2nd edition (Discrete Mathematics and Its Applications), CRC Press, Boca Raton, FL, 2013.

[6] M. Ghandi, D. Lee, M. Mohammad-Noori, and M. A. Beer, Enhanced regulatory sequence prediction using gapped k-mer features, PLoS Computational Biology (2014), (7): e1003711.

[7] M. Ghandi, M. Mohammad-Noori, and M. A. Beer, Robust k-mer frequency estimation using gapped k-mers, Journal of Mathematical Biology (2014), Issue 2, 469-500.

[8] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, 2nd edition, Addison-Wesley, 1994.

[9] D. Lee, D. U. Gorkin, M. Baker, B. J. Strober, A. L. Asoni, A. S. McCallion and M. A. Beer, A method to predict the impact of regulatory variants from DNA sequence, Nature Genetics (August 2015), no. 8: 955-961. doi:10.1038/ng.3331.

[10] M. Marcus and H. Minc, Introduction to Linear Algebra, Dover, New York, p. 182, 1988.

[11] M. Marcus and H. Minc, "Positive definite matrices," §4.12 in A Survey of Matrix Theory and Matrix Inequalities, Dover, New York, p. 69, 1992.

[12] A. Mo, Ch. Luo, F. P. Davis, E. A. Mukamel, G. L. Henry, J. R. Nery, M. A. Urich, et al., Epigenomic landscapes of retinal rods and cones, eLife (March 7, 2016), e11613. doi:10.7554/eLife.11613.

[13] B. Morgenstern, B. Zhu, S. Horwege and Ch. A.
Leimeister, Estimating evolutionary distances between genomic sequences from spaced-word matches, Algorithms for Molecular Biology (2015), 5. doi:10.1186/s13015-015-0032-x.

[14] P. Terwilliger, The incidence algebra of a uniform poset, in: Coding Theory and Design Theory, Part I: Coding Theory, IMA Volumes in Mathematics and its Applications, vol. 20, Springer, New York, 1990, 193-212.

[15] H. G. Chaudhari and B. A. Cohen, Local sequence features that influence AP-1 cis-regulatory activity, Genome Research 2018 Feb; 28(2): 171-181. doi:10.1101/gr.226530.117. Epub 2018 Jan 5.

[16] R. M. Wilson, A diagonal form for the incidence matrices of t-subsets vs. k-subsets, European Journal of Combinatorics 11 (1990).