Generalized Gapped-kmer Filters for Robust Frequency Estimation
aa r X i v : . [ c s . D M ] F e b Generalized Gapped-kmer Filters for RobustFrequency Estimation
M. Mohammad-Noori a,c,* , N. Ghareghani b,c , M. Ghandi d, ∗ a School of Mathematics, Statistics and Computer Science, College of Science, University of TehranP.O. Box 14155-6455, Tehran, Iran b Department of Engineering Science, College of Engineering, University of Tehran,P.O. Box 11165-4563, Tehran, Iran c School of Mathematics, Institute for Research in Fundamental Sciences (IPM),
P.O.Box: 19395-5746, Tehran, Iran d Broad Institute of MIT and Harvard 7 Cambridge Center, 4034C,Cambridge, MA 02142, United States of America
Emails: [email protected], [email protected] , [email protected], [email protected] , [email protected] Abstract
In this paper, we study the generalized gapped k-mer filters and derive a closed formsolution for their coefficients. We consider nonnegative integers ℓ and k , with k ≤ ℓ ,and an ℓ -tuple B = ( b , . . . , b ℓ ) of integers b i ≥ i = 1 , . . . , ℓ . We introduce and studyan incidence matrix A = A ℓ,k ; B . We develop a M¨obius-like function ν B which helps usto obtain closed forms for a complete set of mutually orthogonal eigenvectors of A ⊤ A as well as a complete set of mutually orthogonal eigenvectors of AA ⊤ corresponding tononzero eigenvalues. The reduced singular value decomposition of A and combinatorialinterpretations for the nullity and rank of A , are among the consequences of thisapproach. We then combine the obtained formulas, some results from linear algebra,and combinatorial identities of elementary symmetric functions and ν B , to provide theentries of the Moore-Penrose pseudo-inverse matrix A + and the Gapped k-mer filtermatrix A + A . ∗ Corresponding authors Introduction
Sequences of length k , commonly referred to as k -mers, are used in many computationalbiology algorithms. We previously showed that robust frequency estimation of k -mersusing gapped k -mer features could profoundly improve the performance of algorithmsused for sequence classification in computational biology [7, 6]. The method described inthese previous publications was based on analytically deriving the coefficients of a gapped k -mer filter that could be used to find the robust frequency estimates of k -mers. Althoughthis filter could be applied to datasets consisting of DNA or Protein sequences, it was notapplicable to complex datasets that included sequences defined on more heterogeneousfeature spaces. Here, we provide the closed-form solution for a generalized gapped k -merfilter matrix, by relaxing the constraint that all the features are defined on a fixed-sizealphabet.In order to introduce the main object of this introduction, we briefly mention fewdefinitions and notations here; These are presented in more extent and details in the bodyof the paper. Given two integers ℓ and k with 0 ≤ k ≤ ℓ and a sequence B = ( b , . . . , b ℓ )of integers b i ≥ i = 1 , . . . , ℓ , we associate to them two sets of sequences, U ℓ ; B and V ℓ,k ; B , a match relation between the elements of these two sets and a corresponding (0 , A ℓ,k ; B as below. The set U ℓ ; B consists of all sequences x · · · x ℓ of integers x i satisfying 0 ≤ x i < b i for i = 1 , . . . , ℓ . The set V ℓ,k ; B consists of all sequences y · · · y ℓ ,where each y i is either an integer satisfying 0 ≤ y i < b i or an additional gap symboldenoted as g ; Furthermore, there are exactly ℓ − k occurrences of the gap symbol in any y · · · y ℓ ∈ V ℓ,k ; B . Two sequences x · · · x ℓ ∈ U ℓ ; B and y · · · y ℓ ∈ V ℓ,k ; B are then matchableif for any i , 1 ≤ i ≤ ℓ , we have y i = x i or y i = g . In other words, the gap symbol g acts asa wildcard and can match to any symbol. The corresponding (0 ,
1) matrix A ℓ,k ; B is thenobtained by indexing its columns and rows respectively by the elements of U ℓ ; B and V ℓ,k ; B and setting A ℓ,k ; B ( v, u ) = 1 if and only if u and v are matchable.When b = · · · = b ℓ = b , we have a fixed b -letter alphabet Σ b and we use the name A ℓ,k ; b instead of A ℓ,k ; B . In computational biology for DNA sequences, we have b = 4,and Σ = { A,C,G,T } is the set of four DNA bases. Then the set of column and rowindexes have special names: The set of column indexes, Σ ℓ is non-gapped oligomers oflength ℓ , briefly called non-gapped ℓ -mers and the set of row indexes is gapped oligomerswith k non-gapped positions and length ℓ , briefly called gapped k -mers (of length ℓ ).For amino acid sequences, b = 20, Σ is the set of the 20 amino acids, and the column2ndexes and row indexes are the ungapped and gapped polypeptide sequences of length ℓ .Apart from some previous studies of A ℓ,k ; b in mathematics (see [3, 14, 4]), this matrix hasrecently found profound applications in the field of computational biology and machinelearning [7, 6]. Specifically, the inherent symmetry in matrix A ℓ,k ; b allowed finding simpleclosed-form solutions for two related matrices: W ℓ,k ; b and H ℓ,k ; b , where W ℓ,k ; b = A + ℓ,k ; b is the Moore-Penrose pseudo-inverse of A ℓ,k ; b , and H ℓ,k ; b is the idempotent matrix givenby H ℓ,k ; b = W ℓ,k ; b A ℓ,k ; b . In [7] the matrix W ℓ,k ; b was derived and used to find robustestimates for ℓ -mer counts; This led to significant improvement to predict the binding ofcertain transcription factors to DNA sequences. This work was then extended in [6] andthe matrix H ℓ,k ; b was used to develop a method to efficiently compute the ℓ -mer countestimates and to compute a string kernel based on these robust count estimates to identifyenhancer sequences. 
Beyond modeling enhancer sequences in mammalian genomes, thismethod has been widely applied to several problems in computational biology includingprediction of the effect of non-coding variants [9], identification of local sequence featuresinfluencing cis-regulatory activity [15], identification of accessible chromatin regions [12],and estimation of evolutionary distances for phylogeny reconstruction [13].In all the above applications, the features were defined over a fixed alphabet length( b = 4 for DNA/RNA and b = 20 for amino acids). Here, we show that this constraintcould be relaxed to allow generalizing this method to cases with mixture of features thatare defined over alphabets of different sizes. For example, in addition to the DNA sequencethat is defined over the alphabet { A,C,G,T } , one can also add DNA methylation statuswhich is defined over { methylated , unmethylated } alphabet or other discrete features. Thena similar methodology described in [7] and [6] can be applied to find a robust estimateof the joint distribution of the features using a limited training data. To achieve this,we take a similar approach as was used in [7]. We introduce a M¨obius-like function ν B and use the related identities to obtain eigenvalues of A ℓ,k ; B A ⊤ ℓ,k ; B in terms of elementarysymmetric functions.Then we provide a complete set of mutually orthogonal eigenvectorsof A ℓ,k ; B A ⊤ ℓ,k ; B as well as a complete set of mutually orthogonal eigenvectors of A ℓ,k ; B A ⊤ ℓ,k ; B corresponding to the nonzero eigenvalues. This gives the reduced SVD (reduced singularvalue decomposition) of A ℓ,k ; B . We also give a combinatorial interpretations for the nullityand rank of A ℓ,k ; B via finding concrete bases for the null space and row space of thismatrix. 
Finally, we derive an equation for the entries of matrices W ℓ,k ; B and H ℓ,k ; B , where W ℓ,k ; B = A + ℓ,k ; B is the Moore-Penrose pseudo-inverse of A ℓ,k ; B and H ℓ,k ; B = W ℓ,k ; B A ℓ,k ; B .Deriving an explicit formula for matrices A ℓ,k ; B and H ℓ,k ; B allows efficient computation of3obust count estimates from a given training data. In practice, even with modest values of ℓ and k , these matrices have exponentially large dimensions which makes the applicationof numeric methods unfeasible.The rest of the paper is organized as following: Introduction of notation and prelimi-naries is given in Section 2: General notations and definitions for sets, strings, sequences,relations and some symmetric polynomials are presented in Section 2.1; Some prelimi-naries from linear algebra are discussed in Section 2.2. The function ν B and some of itsproperties is defined and studied in Section 3; The main results of this section, that is theidentities given in Propositions 1, 2 and 3, are used in later sections. Using the definitionof function ν B and also the elementary symmetric polynomials, we propose an orthonor-mal basis for the eigenspaces of the matrix A ℓ,k ; B A ⊤ ℓ,k ; B in Section 4. Concrete bases forthe null space and the row space of A ℓ,k ; B are presented in Section 5. Finally, in Section6 we compute the entries of W ℓ,k ; B and H ℓ,k ; B . Definition 1.
Let ℓ be a positive integer. The set [ ℓ ] is defined as [ ℓ ] = { , . . . , ℓ } . Fora set X and a nonnegative integer n , by (cid:0) Xn (cid:1) , we mean the set of all n -element subsets of X . Thus | (cid:0) Xn (cid:1) | = (cid:0) | X | n (cid:1) and | (cid:0) [ ℓ ] n (cid:1) | = (cid:0) ℓn (cid:1) . Definition 2.
A word x on a finite alphabet Σ , is a sequence x = x · · · x ℓ whose elements x i belong to the set Σ . As in [7] for a given integer b ≥ , the sets Σ b , ∆ b , Γ b are definedas follows Σ b = { , , · · · , b − } , ∆ b = Σ b ∪ { g } , Γ b = ∆ b \ { } , where g stands for the gap symbol. Definition 3.
Let B = ( b , b , . . . , b ℓ ) be an ℓ -tuple of integers b i ≥ . Define the sets B , ∆ B , Γ B , U ℓ ; B and V ℓ,k ; B as follows Σ B = Σ b × · · · × Σ b ℓ , ∆ B = ∆ b × · · · × ∆ b ℓ , Γ B = Γ b × · · · × Γ b ℓ , U ℓ ; B = Σ B ,V ℓ,k ; B = { v ∈ ∆ B : | v | g = ℓ − k } , V ′ ℓ,k ; B = { w ∈ Γ B : | w | g = ℓ − k } ,V ℓ, ≤ k ; B = k [ m =0 V ℓm , V ′ ℓ, ≤ k ; B = k [ m =0 V ′ ℓm A weak partial order on a set S is a binary relation (cid:22) on S which is reflexive, transitiveand antisymmetric. A set equipped with a weak partial order is called a partially orderedset or briefly a poset . If a (cid:22) b and a = b we write a ≺ b ; Then ≺ is nonreflexive, transitiveand nonsymmetric; Such a relation is called a strong partial order on S . If (cid:22) (resp. (cid:22) )is a weak partial order on S (resp. T ), then (cid:22) × (cid:22) is a weak partial order on S × T . Iffor any a and b in S , either a ≺ b or b ≺ a , then the partial order is called a total order ,or a linear order. This notation is used in the following definition. Definition 4.
Let B = ( b , b , . . . , b ℓ ) . We define a partial order on the set ∆ B . For thispurpose, firstly for any ≤ i ≤ ℓ , we consider the order ≺ i on the set ∆ b i given by ≺ i ≺ i . . . ≺ i b i − ≺ i g and consider the order (cid:22) B := ( (cid:22) × . . . × (cid:22) ℓ ) on ∆ B . Remark 1.
As it is clear from the definitions of U ℓ ; B and V ℓ,k ; B , when we use thesenotations we specially emphasize on parameters ℓ and k . Definition 5.
For any word v ∈ ∆ B we set G v = { i : 1 ≤ i ≤ ℓ, v i = g } and G v = [ ℓ ] \ G v .If X = { x , · · · , x n } is a subset of { , · · · , ℓ } with x < x < . . . < x n then by B ( X ) wemean ( b x , . . . , b x n ) . Especially, if v ∈ ∆ B and v ′ ∈ Γ B , then B ( G v ) = ( b i ) i ∈ G v and B ( G v ′ ) = ( b i ) i ∈ G v ′ . Definition 6.
Let B = ( b , . . . , b ℓ ) . We say elements u ∈ Σ B and v ∈ ∆ B match (or u and v are matchable) if for any ≤ i ≤ ℓ with v i = g we have u i = v i ; We denotethis by v ∼ u . The set of the elements v ∈ V ℓ,k ; B which are matchable with u ∈ Σ B , isdenoted by M ℓ,k ; B ( u ) . The set of elements u ∈ Σ B which are matchable with v , is denotedby N ℓ,k ; B ( v ) . efinition 7. The matrix A ℓ,k ; B is defined as a (0 , matrix whose rows and columnsare indexed respectively by the elements of V ℓ,k ; B and Σ B and A ℓ,k ; B ( v, u ) = 1 if and onlyif u and v are matchable. Remark 2.
Considering the definition 7, if we identify each row index v ∈ V ℓ,k ; B with N ℓ,k ; B ( v ) , then the matrix A ℓ,k ; B is seen as an incidence matrix, in which the points andblocks are row indexes and column indexes, respectively. Definition 8.
The matrix A ℓ, ≤ k ; B is defined as the (0 , matrix obtained by stacking thematrices A ℓ,i ; B ( i = 0 , . . . , k ), one on top of the other; Thus the rows and columns of A ℓ, ≤ k ; B are indexed respectively by the elements of V ℓ, ≤ k ; B and U ℓ,B . Elementary symmetric polynomials are well-studied objects in the study of polynomialsring k [ x , x , . . . , x n ] (see Chapter 7 of [2]). Below we formally mention their definitions;Then we define another symmetric polynomial which is useful in our work. This is followedby an example demonstrating their applications in our work. Definition 9.
Let i and n be nonnegative integers and let X = ( x , x , . . . , x n ) be a finitesequence of variables. The i -th elementary symmetric polynomial, denoted as S i ( X ) , isdefined as S i ( X ) := P I ∈ ( Xi ) Q i ∈ I x i . Notation.
Let X = ( x , . . . , x n ) be a finite sequence of numbers and α and β be arbitrarynumbers. Then we show the sequence ( βx + α, . . . , βx n + α ) by βX + α . Definition 10.
Let i and n be nonnegative integers and let X = ( x , x , . . . , x n ) be afinite sequence of variables. The expression R i ( X ) is then defined as follows: R i ( X ) = i X j =0 S j ( X −
1) (1)
Example 1.
Let ≤ k ≤ ℓ be integers and B = ( b , . . . , b ℓ ) , u ∈ Σ B and v ∈ V ℓk ; B . Thenwe have | Σ B | = | Γ B | = ℓ Y i =1 b i , | V ℓ,k ; B | = S k ( B ) , | V ′ ℓ,k ; B | = S k ( B − , | V ℓ, ≤ k ; B | = R k ( B + 1) , | V ′ ℓ, ≤ k ; B | = R k ( B ) , | M ℓ,k ( u ) | = (cid:18) ℓk (cid:19) , | N ℓ,k ( v ) | = Y i ∈ G v b i , .2 Notation and preliminaries from Linear Algebra All matrices we concern in this paper are real matrices. The row space of a A is denotedas row( A ), the column space of A is denoted as col( A ), and the dimension of the row spaceof A is denoted as rank( A ). The kernel of A , denoted as ker( A ) and the nullity of A anddenoted as null( A ). The matrix A is called diagonalizable in the field of real numbers ifthere exists a nonsingular real matrix P such that A = P Λ P − for some diagonal realmatrix Λ . If A is diagonalizable, then all eigenvalues of A appear on the main diagonalof Λ and the columns of P are the corresponding eigenvectors. The set of column vectorsof P is called a complete set of eigenvectors of A ; The set of column vectors of P whichcorrespond to nonzero eigenvalues is called a complete set of nonzero eigenvectors of A .If eigenvectors belonging to distinct eigenvalues of the matrix A are mutually orthogonal,then there exists an eigendecomposition A = P Λ P − with P − = P ⊤ , we call sucha decomposition an orthogonal eigendecomposition . Let A = P Λ P ⊤ be a orthogonaleigendecomposition for the matrix A and P = [ Q N ] where col( N ) = ker( A ). Then A = Q Λ Q ⊤ , where the matrix Q is obtained by deleting the columns of P which are inker( A ), and Λ is obtained by deleting the zero columns and zero rows of Λ ; we call thisdecomposition an orthonormal nonzero eigendecomposition .It is known that any symmetric real matrix A is diagonalizable on the field of real num-bers and eigenvectors corresponding to distinct eigenvalues of A are orthogonal. 
Hence,every symmetric real matrix A has an orthonormal nonzero eigendecomposition of theform A = Q Λ Q ⊤ with real matrices Λ and Q . A real symmetric matrix A of order n is positive definite (resp. positive semi-definite) if x ⊤ A x > x ⊤ A x ≥
0) forall nonzero x ∈ R n . For any matrix A , the matrix A ⊤ A is positive semidefinite, andrank( A ) = rank( AA ⊤ ). Conversely, any positive semidefinite matrix M can be written as M = A ⊤ A ; this is the Cholesky decomposition. If A is a real matrix, then both A ⊤ A and AA ⊤ are diagonalizable over the field of real numbers.A singular value decomposition (SVD) of a matrix A ∈ R n × m is a factorization A = U Σ V ⊤ with Σ = diag( σ , σ , . . . , σ p ), p = min { n, m } and σ ≥ σ ≥ . . . ≥ σ p ≥ , such that the set of columns of both matrices U = [ u , u , . . . , u n ] ∈ R n × n and V =[ v , v , . . . , v m ] ∈ R m × m are orthonormal. The diagonal entries of Σ are called singularvalues of A . If rank( A ) = r < p , then the reduced singular value decomposition (reducedSVD) of A is a factorization A = ˆ U ˆΣ ˆ V ⊤ with ˆΣ = diag( σ , σ , . . . , σ r ) ∈ R r × r and σ ≥ σ ≥ . . . ≥ σ r >
0, such that the matrices U = [ u , u , . . . , u r ] ∈ R n × r and7 = [ v , v , . . . , v r ] ∈ R m × r are both orthonormal. The following lemma gives the relationbetween the SVD of matrix A and eigendecomposition of the matrices AA ⊤ and A ⊤ A . Lemma 1. ([5], Section 5.6, Facts 8,9)
Let A ∈ R n × m , then the following facts holds: (i) The nonzero singular values of A are the square roots of nonzero eigenvalues of A ⊤ A or AA ⊤ . (ii) if U Σ V ⊤ is a reduced SVD of A , then columns of V are eigenvectors of A ⊤ A andcolumns of U are eigenvectors of AA ⊤ . The
Moore-Penrose pseudo-inverse of a matrix A , denoted by A + , is defined as amatrix that satisfies all the following four conditions: AA + A = A, A + AA + = A + , ( AA + ) ⊤ = AA + , ( A + A ) ⊤ = A + A The Moore-Penrose pseudo-inverse exists and is unique for any given matrix A . Wehave A + = ( A ⊤ A ) + A ⊤ = A ⊤ ( AA ⊤ ) + . For further properties of the Moore-Penrosepseudo-inverse see for instance [5]. The two following Lemmas provide the Moore-Penrosepseudo-inverse of the matrix A based on some nonzero eigendecomposition of AA ⊤ . Theproof of the first one is straight forward and left to the readers.The following lemmas provide the Moore-Penrose pseudo-inverse of a matrix A basedon some nonzero eigendecomposition of the matrix AA ⊤ . Lemma 2.
Let B be a positive semi-definite real matrix. Then B admits an orthonormalnonzero eigendecomposition of the form B = Q Λ Q ⊤ . Where Q ⊤ Q = I . Moreover let B = AA ⊤ , then we have A ⊤ QQ ⊤ = A ⊤ . Proof.
Using the previous notation, let AA ⊤ = P Λ P ⊤ be an orthonormal decompo-sition for AA ⊤ and P = [ Q N ] where the columns of N are in ker( AA ⊤ ). The equation Q ⊤ Q = I is concluded from the orthonormality of the columns of Q . If y denotes a col-umn of N , by A ℓk A ⊤ ℓk y = 0 we obtain y ⊤ A ℓk A ⊤ ℓk y = 0, hence || A ⊤ ℓk y || = 0, which yields A ⊤ ℓk y = 0. Thus A ⊤ N = 0 . Now, from
P P ⊤ = I we obtain QQ ⊤ + N N ⊤ = I ; Multiplyingfrom left by A ⊤ and using A ⊤ N = 0, we provide A ⊤ QQ ⊤ = A ⊤ . (cid:3) emma 3. Let A n × m be a real matrix and let AA ⊤ = Q Λ Q ⊤ be a nonzero orthonormaleigendecomposition of AA ⊤ . Then the Moore-Penrose pseudo-inverse of A is given by W = A ⊤ Q Λ − Q ⊤ . Moreover, if the all one column vector j = [1 1 . . . ⊤ is an eigenvectorof A ⊤ A , then W A j = j . Proof.
The proof is easily obtained by using Lemma 2. (cid:3)
Lemma 4.
Let A n × m be a real matrix and suppose that the columns of Υ are a completeset of eigenvectors corresponding to nonzero eigenvalues of AA ⊤ . Let the columns of Υ be c , . . . , c n corresponding to the nonzero eigenvalues λ , . . . , λ n . (i) An orthonormal nonzero eigendecomposition AA ⊤ = Q Λ Q ⊤ is obtained by setting Q = Υ E , where E = diag( k c i k ) ≤ i ≤ n . (ii) If we denote Moore-Penrose pseudo-inverse of A by W , then W = A ⊤ Υ D Υ ⊤ , where D = diag( || c i || λ i ) ≤ i ≤ n . Consequently, W = A ⊤ C where the entries of C are givenby C ij = P k Υ ik Υ jk || c k || λ k . Proof. (i) In order to obtain normal eigenvectors, it is enough to divide column c i ofΥ by its norm, that is to multiply the matrix Υ from right by the diagonal matrix E = diag( || c i || ) ≤ i ≤ n to get Q = Υ E .(ii) By Lemma 3, W = A ⊤ Q Λ − Q ⊤ = A ⊤ Υ E Λ − E ⊤ Υ ⊤ . Since both E and Λ arediagonal, so is E Λ − E ⊤ ; Setting D = E Λ − E ⊤ , we obtain W = A ⊤ Υ D Υ ⊤ , where D = diag( || c i || λ i ) ≤ i ≤ n ; Setting C = Υ D Υ ⊤ we obtain W = A ⊤ C and C ij = P k Υ ik Υ jk || c k || λ k , as required. (cid:3) ν B and some of its properties In this section, we consider an order on the set ∆ B and based on this define a function ν B on the set ∆ B × ∆ B and inspect some of its properties. For an integer b i ≥
2, thefollowing linear order makes ∆ b i a totally ordered set:0 ≺ i ≺ i . . . ≺ i b i − ≺ i g B = ∆ b × · · · × ∆ b ℓ , a poset isobtained; More precisely, for two elements x = x · · · x ℓ and y = y · · · y ℓ with x i , y i ∈ ∆ b i ,(1 ≤ i ≤ ℓ ), we have x (cid:22) B y if and only if x i (cid:22) i y i holds for i = 1 , . . . , ℓ . Below ispresented the definition of a useful function on ∆ B × ∆ B . Definition 11.
Consider the ℓ -tuple B = ( b , b , . . . , b ℓ ) , where b i ≥ is integer for i = 1 , . . . , ℓ . For any i , (1 ≤ i ≤ ℓ ) , we define the function ν i on ∆ b i × ∆ b i as ν i ( x, y ) = − b i if x = y = g, − y if x = y = g, if x ≺ y, if x ≻ y. Now the function ν B is defined on the product set ∆ B × ∆ B by the following product rule ν B ( x · · · x ℓ , y · · · y ℓ ) = ℓ Y i =1 ν i ( x i , y i ) (2) Remark 3.
The function ν satisfies the property “ ν B ( x, y ) = 0 unless x (cid:22) B y ”; Thismeans that it is an element of the incidence algebra of the poset ∆ B (For the definitionand some examples of this concept, see for instance Chapter 8 of [1]). It is observed that ν B satisfies X x (cid:22) z (cid:22) y ν B ( x, z ) = (Q ℓi =1 ( y ′ i − x ′ i ) if x (cid:22) y, otherwise. (3) where the values x ′ i , (1 ≤ i ≤ ℓ ) , are defined x ′ i = ( b i if x i = g,x i otherwise.and y ′ i ’s are defined similarly. The equation (3) shows similarities between the function ν B and the M¨obius function of the poset ∆ B . Some useful identities about ν B are stated in Proposition 1, but before stating thisproposition we need some definitions and lemmas. Definition 12.
Let ℓ be a positive integer, B = ( b , . . . , b ℓ ) and let v ′ , v ′′ ∈ ∆ B . Let m, n be integers with ≤ m, n ≤ ℓ such that | G v ′ | = ℓ − n and | G v ′′ | = ℓ − m . Define the sets A , A , A and A by A = G v ′ ∩ G v ′′ , A = G v ′′ \ G v ′ , A = G v ′ \ G v ′′ and A = G v ′ ∩ G v ′′ . emma 5. Let v ′ , v ′′ ∈ ∆ B and the sets A , A , A and A be as in Definition 12. (i) The sets A , A , A and A are mutually disjoint and A ∪ A ∪ A ∪ A = [ ℓ ] .Moreover A = [ ℓ ] unless v ′ = v ′′ = g ℓ . (ii) If A = A = ∅ , then A = G v = G v ′ and G v ′ = G v ′′ = A ; If furthermore v ′ = v ′′ ,then there exists i ∈ A such that v ′ i = v ′′ i Proof.
The proof is straightforward. (cid:3)
Lemma 6.
Let w, v ′ , v ′′ ∈ ∆ B . (i) If ν B ( w, v ′ ) ν B ( w, v ′′ ) = 0 , then G w ⊆ A . (ii) If G w ⊆ A , then ν B ( w, v ′ ) ν B ( w, v ′′ ) = p p p p , where p = Y i ∈ G w b i , p = Y i ∈ A ν i ( w i , v ′′ i ) ,p = Y i ∈ A ν i ( w i , v ′ i ) , p = Y i ∈ A ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) Proof.
The proof of part (i) is straightforward. The proof of part (ii) is obtained using ν B ( w, v ′ ) ν B ( w, v ′′ ) = ℓ Y i =1 ν i ( w i , v ′ i ) ν i ( w i , v ′′ i )= Y j =0 Y i ∈ A j ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) , and the definition of ν i . (cid:3) Proposition 1.
Let v ′ , v ′′ ∈ Γ B . Then (i) X w ∈ V ℓk ν B ( w, v ′ ) = ( − ℓ − k S ℓ − k ( B ) , if v ′ = g ℓ , , otherwise. (ii) X w ∈ V ℓk ν B ( w, v ′ ) ν B ( w, v ′′ ) = S ℓ − k ( B ( G v ′ )) Y i ∈ G v ′ b i Y i ∈ G v ′ ( v ′ i + v ′ i ) , if v ′ = v ′′ , , otherwise. roof. (i) For w ∈ V ℓ,k we have ν B ( w, g ℓ ) = Q i ∈ G w ( − b i ), hence we obtain X w ∈ V ℓk ν B ( w, g ℓ ) = ( − ℓ − k S ℓ − k ( B )which proves part(i) in the case v ′ = g ℓ .Now suppose that v ′ ∈ V ℓ ; ≤ k ; B \ { g ℓ } , hence for some 1 ≤ j ≤ ℓ , v ′ j = g . Then from X w ∈ V ℓk ν B ( w, v ′ ) = b i − X w i =0 ℓ Y i =1 ν i ( w i , v ′ i )= ℓ Y i =1 b i − X w i =0 ν i ( w i , v ′ i ) , by using P b j − w j =0 ν j ( w j , v ′ j ) = 0, the right side is simplified to 0, as required.(ii) To prove this part, observe that if | A | < ℓ − k , each summand in the left, is zero andthere is nothing to prove. So, let | A | ≥ ℓ − k ; Setting X ℓk ( B, G ) = { w ∈ V ℓk ( B ) : G w = G } we obtain X w ∈ V ℓk ( B ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = X G ∈ ( A ℓ − k ) X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ ) (4)First we compute the summand P w ∈ X ℓk ( B,G ) ν ( w, v ′ ) ν ( w, v ′′ ), for a fixed G ∈ (cid:0) A ℓ − k (cid:1) .For this, without loss of generality, let A = { , . . . , a } , A = { a + 1 , . . . , a + a } , A = { a + a +1 , . . . , a + a + a } and A = { a + a + a +1 , . . . , ℓ } , where a , a , a are non-negative integers. Moreover, without loss of generality, let G = { k +1 , . . . , ℓ } .Now w ∈ X ℓk ( B, G ) can be factorized in the form w = qrstg ℓ − k , with | q | = a , | r | = a , | s | = a and | t | = a − ( ℓ − k ) and when w runs over X ℓk ( B, G ), each ofthe words q, r, s and t runs over a proper set accordingly. 
By part (ii) of Lemma 6we obtain X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = P P P P , (5)where P = Y i ∈ A b i Y i ∈ G b i , P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′ i ) ,P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′′ i ) , P = Y i ∈ A b i − X w i =0 ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) Case 1. v ′ = v ′′ ; In this case A = G v ′ and A = A = ∅ , hence P = P = 1and P w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) = P P . In this case, P = Y i ∈ G v ′ ( v ′ i + v ′ i ) and P = Y i ∈ G v ′ b i Y i ∈ G b i , so X w ∈ V ℓk ( B ) ν B ( w, v ′ ) ν B ( w, v ′′ ) = X G ∈ ( Gv ′ ℓ − k ) X w ∈ X ℓk ( B,G ) ν B ( w, v ′ ) ν B ( w, v ′′ )= X G ∈ ( Gv ′ ℓ − k ) Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i Y i ∈ G b i = Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i X G ∈ ( A ℓ − k ) Y i ∈ G b i = Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i S ℓ − k ( B ( G v ′ )) Case 2. v ′ = v ′′ ; If A = ∅ then P = 0 and if A = ∅ then P = 0. Otherwise, if A = A = ∅ , then by Lemma 5 (ii), A = ∅ and there exists i ∈ A with v ′ i = v ′′ i ;For this i , P w i ν i ( w i , v ′ i ) ν i ( w i , v ′′ i ) = 0 thus P = 0. Hence, the hypothesis v ′ = v ′′ implies that the right side of (5) is zero in either case, and we get the result by (4). (cid:3) Proposition 2.
Let v ′ ∈ Γ B . Then (i) For any u ∈ U ℓ we have X y ∈ M ℓ,k ; B ( u ) ν B ( y, v ′ ) = ( − ℓ − k S ℓ − k ( B ( G v ′ )) ν B ( u, v ′ ) , (6)(ii) For any v ∈ V ℓk we have X u ∈ N ℓ,k ; B ( v ) ν B ( u, v ′ ) = ( − ℓ − k ν B ( v, v ′ ) (7) Proof. (i) If the summand ν ( y, v ′ ) is nonzero, then G y ⊆ G v ′ , on the other hand thenon-gaped positions of all such y ’s are the same as u . Let G y = { x , x , . . . , x ℓ − k } ,13hen ν B ( y, v ′ ) = ( − ℓ − k b x b x · · · b x ℓ − k ν B ( u, v ′ ). Therefore X y ∈ M ℓ,k ; B ( u ) ν B ( y, v ′ ) =( − ℓ − k ν B ( u, v ′ ) X { x ,x ,...,x ℓ − k }⊆ G v ′ b x b x . . . b x ℓ − k =( − ℓ − k S ℓ − k ( B ( G v ′ )) ν B ( u, v ′ ) , as required.(ii) We distinguish two cases: Case (a).
Suppose that G v ⊆ G v ′ . Then for any u ∈ N ℓ,k ; B ( v ) we have ν B ( u, v ′ ) = Q i ∈ G v ′ ν i ( u i , v ′ i ) = Q i ∈ G v ′ ν i ( v i , v ′ i ) and since there are totally Q i ∈ G v ′ b i such words u , the left side of equation (7) equals Y i ∈ G v b i Y i ∈ G v ′ ν i ( v i , v ′ i ) = ( − | G v | Y i ∈ G v ν i ( g, g ) Y i ∈ G v ′ \ G v ν i ( v i , g ) Y i ∈ G v ′ ν i ( v i , v ′ i )= ( − ℓ − k ℓ Y i =1 ν i ( v i , v ′ i )which equals ( − ℓ − k ν B ( v, v ′ ) as required. Case (b).
Suppose that G v G v ′ , consequently G v \ G v ′ = ∅ . Now for any i ∈ G v \ G v ′ we have ν ( v i , v ′ i ) = 0, thus the right side of (7) is 0; The followingargument shows that the left side is 0 as well: The nonzero summands in the leftside of (7) are obtained from elements u ∈ X where the subset X ⊆ Σ B is given by X = { u ∈ Σ B : u i ≤ v i for i ∈ G v \ G v ′ and u i = v i for i ∈ G v } . Thus we obtain X u ∈ N ℓk ; B ( v ) ν B ( u, v ′ ) = X u ∈ X ν B ( u, v ′ )= X u ∈ X ℓ Y i =1 ν i ( u i , v ′ i )= X u ∈ X Y i ∈ G v ν i ( u i , v ′ i ) Y i ∈ G v \ G v ′ ν i ( u i , v ′ i ) Y i ∈ G v ∩ G v ′ ν i ( u i , g ) = Y i ∈ G v ν i ( v i , v ′ i ) Y i ∈ G v \ G v ′ v ′ i X u i =0 ν i ( u i , v ′ i ) Y i ∈ G v ∩ G v ′ b i i ∈ G v \ G v ′ we have P v ′ i u i =0 ν i ( u i , v ′ i ) = P v ′ i − u i =0 − v ′ i = 0 . Thus (7) is true in either case. (cid:3)
Definition 13.
Let u ∈ Σ B and v ∈ ∆ B . Then P ( u, v ) and Q ( u, v ) are defined as below P ( u, v ) = { i : 1 ≤ i ≤ ℓ, v i = g, v i = u i } Q ( u, v ) = { i : 1 ≤ i ≤ ℓ, v i = g, v i = u i } We denote P ( u, v ) and Q ( u, v ) by P and Q , respectively, if there is no danger of confusion. With the above definition, it is obvious that | P ( u, v ) | + | Q ( u, v ) | = | G v | . Particularly, if v ∈ V ℓ,k and | P | = p then | Q | = k − p . Proposition 3.
Let u ∈ Σ B and v ∈ ∆ B . Recall the notation of Definition 13. (i) For ≤ i ≤ ℓ , let φ i ( u, v ) = b i − X j =max { ,v i } ν i ( v i , j ) ν i ( u i , j ) j ( j + 1) . Then we have φ i ( u, v ) = b i − b i , if i ∈ P ; − b i , otherwise, i.e. if i ∈ Q. (ii) Let G be a given subset of [ ℓ ] with G v ⊆ G . Then the following identity holds X v ′ ∈ Γ B ,G v ′ = G ν B ( v, v ′ ) ν B ( u, v ′ ) Y i ∈ G v ′ ( v ′ i + v ′ i ) = ( − | Q \ G | + | G v | Y i ∈ G v b i Y i ∈ P \ G ( b i − Y i ∈ G b i . (8) Proof. (i) The proof if this part is easy and left to the reader.(ii) Note that if G v ⊆ G v ′ = G , it is easily obtained that ν B ( v, v ′ ) ν B ( u, v ′ ) = ( − | G v | Y i ∈ G v b i Y i ∈ G ν i ( v i , v ′ i ) ν i ( u i , v ′ i )15ence if we denote by S the left side of (8), we obtain S = ( − | G v | Y i ∈ G v b i X v ′ ∈ Γ B ,G v ′ = G Y i ∈ G ν i ( v i , v ′ i ) ν i ( u i , v ′ i ) v ′ i ( v ′ i + 1)= ( − | G v | Y i ∈ G v b i Y i ∈ G b i − X j =max { ,v i } ν i ( v i , j ) ν i ( u i , j ) j ( j + 1)= ( − | G v | Y i ∈ G v b i Y i ∈ G φ i ( u, v )Thus by part (i), we get S = ( − | G v | Y i ∈ G v b i Y i ∈ G ∩ Q − b i Y i ∈ G ∩ P b i − b i = ( − | Q \ G | + | G v | Y i ∈ G v b i Y i ∈ P \ G ( b i − Y i ∈ G b i . (cid:3) A ℓk ; B A ⊤ ℓk ; B and A ⊤ ℓk ; B A In this section we give an orthonormal nonzero eigendecomposition of the matrices A ℓk ; B A ⊤ ℓk ; B and A ⊤ ℓk ; B A . Eigenvectors of these matrices are the elementary symmetric polynomials andthe entries of the corresponding eigenvectors are given in terms of the function ν B . Usingthe properties of ν B we show that these eigenvectors are mutually orthogonal. Definition 14.
Let n, k ≤ ℓ be integers. Given B = ( b , . . . , b ℓ ) and v ′ ∈ V ′ ℓ,n ; B , wedefine the column vector x ℓ,k,nv ′ as a vector whose rows are indexed by the elements of V ℓ,k ; B with entries x ℓ,k,nv ′ ( w ) = ( − ℓ − k ν B ( w, v ′ ) . The column vector z ℓ,nv ′ is then defined as z ℓ,nv ′ = x ℓ,ℓ,nv ′ ; In other words, z ℓ,nv ′ is a column vector whose rows are indexed by elements u of Σ B with entries z ℓ,nv ′ ( u ) = ν B ( u, v ′ ) . When there is no need to emphasize on theparameters ℓ , k and n , we simply write x v ′ and z v ′ . roposition 4. Let v ′ ∈ Γ B and n = | G v ′ | . Then the following identity holds: k x ℓ,k,nv ′ k = S ℓ − k ( B ( G v ′ )) Y i ∈ G v ′ ( v ′ i + v ′ i ) Y i ∈ G v ′ b i Proof.
See proposition 1 (ii) (cid:3)
The following proposition contains a generalization of Proposition 2 of [7]:
Proposition 5.
Let ≤ k ≤ ℓ , ≤ n ≤ ℓ and v ′ ∈ V ′ ℓn . The following matrix identitieshold. (i) A ⊤ ℓk ; B x ℓknv ′ = S ℓ − k ( B ( G v ′ )) z ℓnv ′ . (ii) A ℓk ; B z ℓnv ′ = x ℓknv ′ . (iii) A ℓk ; B A ⊤ ℓk ; B x ℓknv ′ = S ℓ − k ( B ( G v ′ )) x ℓknv ′ . (iv) A ⊤ ℓk ; B A ℓk ; B z ℓnv ′ = S ℓ − k ( B ( G v ′ )) z ℓnv ′ . (v) For any two distinct words v ∈ V ′ ℓn and u ∈ V ′ ℓn , the vectors x ℓkn v and x ℓkn u areorthogonal. (vi) For any two distinct words v ∈ V ′ ℓn and u ∈ V ′ ℓn , the vectors z ℓn v and z ℓn u areorthogonal. Proof.
The proofs of (i) and (ii) are concluded from definitions of x v ′ and z v ′ and Propo-sition 2. Combining (i) and (ii) yields (iii) and (iv). The proofs of (v) is concluded fromProposition 1(ii). The same proposition yields part (vi) by setting k = ℓ . (cid:3) Theorem 1.
Let ≤ k ≤ ℓ . Then (i) The set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ }} consists of all eigenvalues ofthe matrix A ⊤ ℓ,k ; B A ℓ,k ; B and the set { z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ } is a completeset of eigenvectors corresponding to eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . Moreover, theseeigenvectors are pairwise orthogonal. The set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ, ≤ k ; B } consists of all non-zero eigenvalues of thematrix A ℓ,k ; B A ⊤ ℓ,k ; B . The set { x ℓknv ′ : v ′ ∈ V ′ ℓ, ≤ k ; B } is a complete set of eigenvectorscorresponding to aforementioned nonzero eigenvalues. Moreover, these eigenvectorsare pairwise orthogonal. Proof. (i) First we note that for any v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ , z ℓnv ′ is a nonzero vector. ByProposition 5 (iv), for any v ′ ∈ Γ B , z ℓnv ′ is an eigenvector of A ⊤ ℓ,k ; B A ℓ,k ; B correspondingto the eigenvalue S ℓ − k ( B ( G v ′ )). By Proposition 5 (vi), these eigenvectors are pair-wise orthogonal. Since |{ z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ }| = | Γ B | and | Γ B | = | Σ B | equalsthe size of the matrix A ⊤ ℓ,k ; B A ℓ,k ; B , we conclude that { z ℓnv ′ : v ′ ∈ V ′ ℓ,n ; B , ≤ n ≤ ℓ } is a complete set of eigenvectors corresponding to eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B , asrequired.(ii) The set of all non-zero eigenvalues of A ℓ,k ; B A ⊤ ℓ,k ; B is the same as the set of all non-zeroeigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . In part (i), we obtained the complete set of eigenvalues of A ⊤ ℓ,k ; B A ℓ,k ; B . On the other hand, if 0 ≤ n ≤ ℓ and v ′ ∈ V ′ ℓ,n ; B , then S ℓ − k ( B ( G v ′ )) = 0holds if and only if k < n ≤ ℓ . Hence, the set { S ℓ − k ( B ( G v ′ )) : v ′ ∈ V ′ ℓ, ≤ k ; B } consists of all non-zero eigenvalues of the matrix A ℓ,k ; B A ⊤ ℓ,k ; B as well as the matrix A ⊤ ℓ,k ; B A ℓ,k ; B . Moreover, using Proposition 5 (iii), the corresponding eigenvectors are { x ℓknv ′ : v ′ ∈ V ′ ℓ, ≤ k ; B } and these vectors are pairwise orthogonal. 
The following corollary is a direct consequence of Theorem 1.
Corollary 1.
The set $\{z^{\ell n}_{v'} : v' \in V'_{\ell,n;B},\ k < n \le \ell\}$ is a basis for the null space of the matrix $A^{\top}_{\ell,k;B}A_{\ell,k;B}$.

Using Fact 3 in Section 5.6 of [5] and Theorem 1, the following corollary gives the reduced SVD of $A_{\ell,k;B}$.

Corollary 2.
Let $r = \operatorname{rank}(A^{\top}_{\ell,k;B}A_{\ell,k;B}) = |V'_{\ell,\le k;B}|$, $\{u_1, u_2, \ldots, u_r\} = \{S_{\ell-k}(B(G_{v'}))^{-1/2}\, x^{\ell k n}_{v'} : v' \in V'_{\ell,\le k;B}\}$ and $\{v_1, v_2, \ldots, v_r\} = \{z^{\ell n}_{v'} : v' \in V'_{\ell,\le k;B}\}$. Then the reduced SVD of $A_{\ell,k;B}$ is given by $A_{\ell,k;B} = U \Sigma V^{\top}$, where the columns of $U$ are the vectors $u_i$ ($1 \le i \le r$), the columns of $V$ are the vectors $v_i$ ($1 \le i \le r$), and $\Sigma$ is an $r \times r$ diagonal matrix whose diagonal entries are the square roots of the corresponding nonzero eigenvalues of the matrix $A_{\ell,k;B}A^{\top}_{\ell,k;B}$.

Definition 15. Given $B = (b_1, \ldots, b_\ell)$, we define $\Upsilon_{\ell,k;B}$ as the matrix whose rows and columns are indexed by the elements of $V_{\ell,k;B}$ and $V'_{\ell,\le k;B}$, respectively, and whose entries are given by $\Upsilon_{\ell,k;B}(w, v') = (-1)^{|w|-|w|_g}\, \nu_B(w, v')$. In other words, the columns of $\Upsilon_{\ell,k;B}$ are exactly the vectors $x_{v'}$ for $v' \in V'_{\ell,\le k;B}$. Also, we define the matrix $\Lambda$ by $\Lambda = \operatorname{diag}(S_{\ell-k}(B(G_{v'})))_{v' \in V'_{\ell,\le k;B}}$.

Remark 4.
Using the preceding lemma and Theorem 1, we obtain an orthonormal nonzero eigendecomposition for $A_{\ell,k;B}A^{\top}_{\ell,k;B}$.

Bases for the null space and the row space of $A_{\ell,k;B}$

In this section we give concrete bases for the null space and the row space of $A_{\ell,k;B}$. The basis for the null space is obtained as a corollary of the arguments of Section 4. The basis for the row space is obtained by a proper selection of rows of $A_{\ell,k;B}$; this gives a combinatorial interpretation of the earlier formula for the rank of $A_{\ell,k;B}$. The process of finding a basis for the row space of the matrix $A_{\ell,k;B}$ is similar to the one for incidence matrices presented in [16].

Theorem 2.
Let $0 \le k \le \ell$. Then

(i) The set $\{z^{\ell n}_{v'} : v' \in V'_{\ell,n;B},\ k < n \le \ell\}$ is a basis for the null space of the matrix $A_{\ell,k;B}$.

(ii) The matrix $A_{\ell,\le k;B}$ has the same row space as $A_{\ell,k;B}$ and $\operatorname{rank}(A_{\ell,k;B}) = R_k(B)$. For $w \in V_{\ell,\le k;B}$ denote by $r_w$ the row of $A_{\ell,\le k;B}$ indexed by $w$. Then the set $\{r_w : w \in V'_{\ell,\le k;B}\}$ is a basis for the row space of $A_{\ell,k;B}$.

Proof. (i) First we observe that the vector $x^{\ell k n}_{v'}$ is the zero vector if and only if $k < n \le \ell$. Since the null space of $A_{\ell,k;B}$ coincides with that of $A^{\top}_{\ell,k;B}A_{\ell,k;B}$, the claim then follows from Corollary 1.

Theorem 3. Let $u \in \Sigma_B$, $v \in V_{\ell,k;B}$. Moreover, with the notation of Definition 13, let $P = P(u,v)$ and $Q = Q(u,v)$. Then the entry $W_{\ell,k;B}(u,v)$ of the Moore-Penrose pseudo-inverse of $A_{\ell,k;B}$ is given as below:
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i \in \overline{G_v}} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))}. \qquad (11)$$

Proof. For $v' \in V'_{\ell,\le k}$, let $d_{v'} = \|x_{v'}\|^2 \lambda_{v'}$. By Lemma 4 and the definitions of $A_{\ell,k;B}$ and $\Upsilon_{\ell,k;B}$ we obtain
$$W_{\ell,k;B}(u,v) = \sum_{y \in M_{\ell,k;B}(u)} \sum_{v' \in V'_{\ell,\le k}} \frac{\nu_B(v,v')\,\nu_B(y,v')}{d_{v'}} = \sum_{v' \in V'_{\ell,\le k}} \frac{\nu_B(v,v')}{d_{v'}} \sum_{y \in M_{\ell,k;B}(u)} \nu_B(y,v') = \sum_{v' \in V'_{\ell,\le k}} (-1)^{\ell-k}\, \frac{\nu_B(v,v')}{d_{v'}}\, S_{\ell-k}(B(G_{v'}))\, \nu_B(u,v') \quad \text{(by (6))}.$$
Replacing $d_{v'}$ by its value from Proposition 4, grouping the sum over $v'$ according to the common gap set $G = G_{v'}$, and evaluating the inner sum $\sum_{v' \in \Gamma_B,\ G_{v'} = G} \nu_B(v,v')\,\nu_B(u,v')$ by identity (8), we obtain
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i \in \overline{G_v}} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))},$$
which is (11). □

Theorem 4. The sum of the entries of any row (column) of the matrix $H_{\ell,k;B} := W_{\ell,k;B} A_{\ell,k;B}$ equals $1$.
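The closed form (11) for the pseudo-inverse, and the unit row sums of the filter matrix $H = WA$, can be sanity-checked against a generic pseudo-inverse routine. A minimal sketch, assuming numpy; the helper names are ours, we read $P(u,v)$ and $Q(u,v)$ as the non-gap positions of $v$ where $u$ agrees, respectively disagrees, with $v$ (Definition 13 is not reproduced here), and we read the prefactor of (11) as a product over the complement $\overline{G_v} = [\ell] \setminus G_v$:

```python
import itertools
import numpy as np

def elem_sym(vals, d):
    # Elementary symmetric polynomial e_d(vals), i.e. S_d(B(G)).
    e = [1.0] + [0.0] * d
    for x in vals:
        for j in range(d, 0, -1):
            e[j] += x * e[j - 1]
    return e[d]

def build_A(ell, k, B):
    # Rows: gapped sequences with exactly ell-k gaps; columns: full sequences.
    U = list(itertools.product(*[range(b) for b in B]))
    V = [v for gaps in itertools.combinations(range(ell), ell - k)
         for v in itertools.product(*[['g'] if i in gaps else list(range(B[i]))
                                      for i in range(ell)])]
    A = np.array([[float(all(c == 'g' or c == x for c, x in zip(v, u)))
                   for u in U] for v in V])
    return A, U, V

def W_closed_form(ell, k, B, U, V):
    # Entries of the pseudo-inverse via formula (11).
    W = np.zeros((len(U), len(V)))
    for a, u in enumerate(U):
        for c, v in enumerate(V):
            Gv = [i for i in range(ell) if v[i] == 'g']
            P = [i for i in range(ell) if v[i] != 'g' and v[i] == u[i]]
            Q = [i for i in range(ell) if v[i] != 'g' and v[i] != u[i]]
            rest = [i for i in range(ell) if i not in Gv]  # complement of G_v
            total = 0.0
            for r in range(len(rest) + 1):       # all G with G_v <= G <= [ell]
                for extra in itertools.combinations(rest, r):
                    G = set(Gv) | set(extra)
                    term = (-1.0) ** sum(1 for i in Q if i not in G)
                    term *= np.prod([B[i] - 1.0 for i in P if i not in G])
                    total += term / elem_sym([B[i] for i in G], ell - k)
            # prefactor: product over the non-gap positions of v
            W[a, c] = total / np.prod([B[i] for i in rest])
    return W

ell, k, B = 2, 1, (2, 3)
A, U, V = build_A(ell, k, B)
W = W_closed_form(ell, k, B, U, V)

print(np.allclose(W, np.linalg.pinv(A)))       # closed form matches pinv
print(np.allclose((W @ A).sum(axis=1), 1.0))   # Theorem 4: unit row sums
```

With these conventions the closed form agrees with `numpy.linalg.pinv` on the small cases we tried, and the row and column sums of $H = WA$ are all $1$; the cost of the direct evaluation grows as $|U| \cdot |V| \cdot 2^{\ell}$, so this is only a correctness check, not an efficient implementation.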
Furthermore, for any $u, w \in \Sigma_B$, the entry $H_{\ell,k;B}(u,w)$ is given as below, where, using Definition 13, $P = P(u,w)$ and $Q = Q(u,w)$:
$$H_{\ell,k;B}(u,w) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G \subseteq [\ell],\ \ell-k \le |G|} (-1)^{|Q \setminus G|} \prod_{i \in P \setminus G}(b_i - 1). \qquad (12)$$

Proof. To compute the sum of the entries of each row (column) of the matrix $H$, we observe that $z_{gg\cdots g} = \mathbf{j}$, the all-ones vector. Now, using Proposition 5(iv) and Lemma 3, we have $H\mathbf{j} = \mathbf{j}$, as desired.

To calculate the entry $H_{\ell,k;B}(u,w)$ of $H$, first note that if $v \sim w$ then for any subset $G$ with $G_v \subseteq G \subseteq [\ell]$ we have $P(u,v) \setminus G = P(u,w) \setminus G$ and $Q(u,v) \setminus G = Q(u,w) \setminus G$. Second, by $H_{\ell,k;B} = W_{\ell,k;B} A_{\ell,k;B}$ we have
$$H_{\ell,k;B}(u,w) = \sum_{v \in V_{\ell,k;B},\ v \sim w} W_{\ell,k;B}(u,v).$$
Third, the result of Theorem 3 can be rewritten as
$$W_{\ell,k;B}(u,v) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G,\ G_v \subseteq G \subseteq [\ell]} \frac{(-1)^{|Q(u,v) \setminus G|} \prod_{i \in G_v} b_i \prod_{i \in P(u,v) \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))}.$$
Thus, by the last two formulas, and interchanging the order of summation,
$$H_{\ell,k;B}(u,w) = \frac{1}{\prod_{i=1}^{\ell} b_i} \sum_{G \subseteq [\ell],\ \ell-k \le |G|} \frac{(-1)^{|Q(u,w) \setminus G|} \prod_{i \in P(u,w) \setminus G}(b_i - 1)}{S_{\ell-k}(B(G))} \sum_{v \sim w,\ G_v \subseteq G} \prod_{i \in G_v} b_i.$$
Since the innermost sum equals $S_{\ell-k}(B(G))$, we obtain (12). □

Acknowledgement. M. Mohammad-Noori would like to thank the University of Tehran for its support during his sabbatical leave at IPM. The research of N. Ghareghani and M. Mohammad-Noori was supported in part by grants from IPM (No. 94050016 and No. 95050129).

References

[1] ∼pjc/notes/counting.pdf. Accessed 25 January 2012.

[2] D. A. Cox, J. Little, D.
O'Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd edition, Springer, 2007.

[3] Ph. Delsarte, Beyond the orthogonal array concept, European Journal of Combinatorics (2004), 187-198.

[4] Ph. Delsarte, Association schemes and t-designs in regular semilattices, Journal of Combinatorial Theory, Series A (1976), 230-243.

[5] L. Han and M. Neumann, Inner product spaces, orthogonal projection, least squares and singular value decomposition, in: L. Hogben (ed.), Handbook of Linear Algebra, 2nd edition (Discrete Mathematics and Its Applications), CRC Press, Boca Raton, FL, 2013.

[6] M. Ghandi, D. Lee, M. Mohammad-Noori, and M. A. Beer, Enhanced regulatory sequence prediction using gapped k-mer features, PLoS Computational Biology (2014), (7): e1003711.

[7] M. Ghandi, M. Mohammad-Noori, and M. A. Beer, Robust k-mer frequency estimation using gapped k-mers, Journal of Mathematical Biology (2014), Issue 2, 469-500.

[8] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, 2nd edition, Addison-Wesley, 1994.

[9] D. Lee, D. U. Gorkin, M. Baker, B. J. Strober, A. L. Asoni, A. S. McCallion and M. A. Beer, A method to predict the impact of regulatory variants from DNA sequence, Nature Genetics (August 2015), no. 8: 955-961. doi:10.1038/ng.3331.

[10] M. Marcus and H. Minc, Introduction to Linear Algebra, Dover, New York, p. 182, 1988.

[11] M. Marcus and H. Minc, "Positive definite matrices," §4.12 in A Survey of Matrix Theory and Matrix Inequalities, Dover, New York, p. 69, 1992.

[12] A. Mo, Ch. Luo, F. P. Davis, E. A. Mukamel, G. L. Henry, J. R. Nery, M. A. Urich, et al., Epigenomic landscapes of retinal rods and cones, eLife (March 7, 2016), e11613. doi:10.7554/eLife.11613.

[13] B. Morgenstern, B. Zhu, S. Horwege and Ch. A.
Leimeister, Estimating evolutionary distances between genomic sequences from spaced-word matches, Algorithms for Molecular Biology (2015), 5. doi:10.1186/s13015-015-0032-x.

[14] P. Terwilliger, The incidence algebra of a uniform poset, in: Coding Theory and Design Theory, Part I: Coding Theory, IMA Volumes in Mathematics and its Applications, vol. 20, Springer, New York, 1990, 193-212.

[15] H. G. Chaudhari and B. A. Cohen, Local sequence features that influence AP-1 cis-regulatory activity, Genome Research 2018 Feb; 28(2): 171-181. doi:10.1101/gr.226530.117. Epub 2018 Jan 5.

[16] R. M. Wilson, A diagonal form for the incidence matrices of t-subsets vs. k-subsets, European Journal of Combinatorics 11 (1990).