On the weight and density bounds of polynomial threshold functions

Abstract

In this report, we show that every n-variable Boolean function can be represented as a polynomial threshold function (PTF) with at most $0.75 \times 2^n$ non-zero integer coefficients, and we give an upper bound on the absolute value of these coefficients. To our knowledge, this provides the best known bound on both the PTF density (number of monomials) and weight (sum of the coefficient magnitudes) of general Boolean functions. The special case of bent functions is also analyzed, and it is shown that any n-variable bent function can be represented with integer coefficients less than $2^n$ while also obeying the aforementioned density bound. Finally, sparse Boolean functions, which are constant except for $m \ll 2^n$ variable assignments, are shown to have small-weight PTFs with density at most $m + 2^{n-1}$.



Erhan Oztop, Minoru Asada

Osaka University, Japan, Ozyegin University, Turkey

Keywords. Boolean function, polynomial threshold function, bent function, sparse function, polynomial sign representation

Introduction

We consider Boolean functions $f : \{-1,1\}^n \to \{-1,1\}$ and study their polynomial threshold function (PTF) representations in terms of the number of monomials and the magnitude of the polynomial coefficients. A PTF representation is also called a polynomial sign-representation, and in the literature abstract units computing PTFs appear under the names of PTF units, higher-order neurons, product units, or sigma-pi units [1-3]. As it is known that the use of higher-order units increases the computational power and storage capacity of neural networks [4-6], past research in neural networks has focused on developing algorithms for finding a small set of monomials to sign-represent a given Boolean function (BF) without suffering from the combinatorial growth problem [e.g. 6, 7-9]. Besides the practical application of PTFs, it is of theoretical importance to know the extremal properties of PTF representations of Boolean functions. The theoretical studies on this front come from extremal combinatorics [e.g. 10, 11, 12] and circuit complexity [e.g. 13, 14].

The three PTF complexity measures studied in the literature are the degree, density and weight of BFs [10, 15]. The PTF degree of a BF $f$ is the minimum degree of a polynomial that can sign-represent $f$. The PTF density of $f$ is defined as the minimum number of monomials with which one can sign-represent $f$. Finally, the PTF weight of $f$ is the smallest sum of the absolute values of the weights in an integer-weight PTF representation of $f$. This paper focuses on the latter two measures.

In an earlier report we derived a non-asymptotic upper bound, $0.75 \times 2^n$, on the PTF density of n-variable BFs, which, to our knowledge, is still the best upper bound known [16]. In this report, we show that it is possible to obtain an integer-weight PTF representation with the same density bound for any BF, and we derive an upper bound on the weight of such a representation. We then direct our attention to bent functions and prove that they admit integer-coefficient PTF representations with surprisingly low absolute values, i.e. at most $2^n$, while still having a density of at most $0.75 \times 2^n$. Finally, we derive PTF density and weight upper bounds for sparse Boolean functions, which are constant except for $m \ll 2^n$ variable assignments.

Basic definitions and results

We use bold lower-case letters to indicate vectors and bold upper case for matrices. The vectors are column vectors unless otherwise stated. Table 1 lists the acronyms and notations used in the paper. In some cases, the precise meaning of the terms used in the table will become clear with their full definition given subsequently in the text.

Table 1. The acronyms and notations used in the paper.

Symbol | Meaning
BF | Boolean Function
PTF | Polynomial Threshold Function
$\mathbf{a}$ | A real column vector
$\mathbf{A}$ | A real matrix
$\mathbf{I}$ | The identity matrix (sometimes size is given as a subscript)
$\mathcal{H}_n$ | Sylvester type Hadamard matrix of order $2^n$
$\mathcal{H}$ | A Sylvester type Hadamard matrix (when order is clear from the text)
$\lvert\mathbf{a}\rvert$ | Component-wise absolute value of the real vector $\mathbf{a}$
$\lceil\mathbf{a}\rceil$ | The maximum component in $\lvert\mathbf{a}\rvert$
$\operatorname{sgn}(\mathbf{a})$ | The sign function applied to each component of $\mathbf{a}$
$\operatorname{diag}(\mathbf{a})$ | The square matrix where the diagonal elements are taken from $\mathbf{a}$
$\log(x)$ | The base-2 logarithm of $x$
$\mathbf{1}$ | All ones column vector
$\mathbf{0}$ | All zeros column vector
$\mathbf{f}$ | If $f$ is an n-variable BF, then $\mathbf{f}$ is its component vector representation
$\#\mathbf{A}$ | The number of rows in the matrix $\mathbf{A}$
$D_f$ | PTF density of $f$
$W_f$ | PTF weight of $f$

In this study, we consider Boolean functions over the domain $\{-1,1\}^n$, where $-1$ and $1$ correspond to the binary values 1 (True) and 0 (False) respectively. A binary variable $b \in \{0,1\}$ can be converted to the $\{-1,1\}$ domain with the transformation $(-1)^b$. By adopting a fixed ordering for the truth assignments of the input variables, an n-variable Boolean function $f$ can be represented as a $2^n$ sized $\pm 1$ vector $\mathbf{f}$. A BF has several other useful representations. In particular, in this report, the coefficients of multilinear polynomials either interpolating or matching the sign of $f$ at every variable assignment will be utilized. We describe these representations concretely in the following definitions.

Definition (Spectrum). Any n-variable Boolean function has a unique representation as the weighted sum of monomials

$$f(x_1, x_2, \cdots, x_n) = \sum_{S} s_S \prod_{i \in S} x_i, \qquad S \subseteq \{1, 2, \ldots, n\}.$$

This can be seen for example by direct application of Lagrange interpolation. The coefficient vector $\mathbf{s}$ is called the spectrum, or the Fourier coefficients, of $f$.

Definition (Sign Representation/Polynomial Threshold Function). A multilinear polynomial $p$ is said to sign-represent an n-variable Boolean function $f$ if $f(x_1, x_2, \cdots, x_n) = \operatorname{sgn}(p(x_1, x_2, \cdots, x_n))$ for all assignments $[x_1, x_2, \cdots, x_n] \in \{-1,1\}^n$. In this case we say $p$ is a Polynomial Threshold Function (PTF) representation of $f$, or $p$ sign-represents $f$.

Definition (Walsh spectrum, Walsh polynomial). The exact interpolating polynomial of a BF $f$ scaled by $2^n$ is called the Walsh polynomial of $f$. Consequently, the spectrum scaled by $2^n$ is called the Walsh spectrum or Walsh coefficients of $f$.

Definition (Sylvester type Hadamard matrix). A Sylvester type Hadamard matrix of order $2^n$ ($n \ge 0$) can be defined as [see e.g. 13, 17]

$$\mathcal{H}_n = \begin{cases} [1] & \text{if } n = 0 \\ \begin{bmatrix} \mathcal{H}_{n-1} & \mathcal{H}_{n-1} \\ \mathcal{H}_{n-1} & -\mathcal{H}_{n-1} \end{bmatrix} & \text{if } n > 0 \end{cases} \qquad \text{(Eq. 1)}$$
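As an illustration of the recursion in Eq. 1, the following Python sketch builds the matrix and checks three properties that the text relies on: symmetry, $\mathcal{H}\mathcal{H} = 2^n\mathbf{I}$, and the fact that each column lists the values of one monomial over all assignments. The function names and the bit-indexing convention (bit $k$ of a row index selects the value of variable $k$) are assumptions made for this example and are not notation from the report.

```python
def hadamard(n):
    """Sylvester-type Hadamard matrix of order 2**n, built by the recursion in Eq. 1."""
    H = [[1]]
    for _ in range(n):
        top = [row + row for row in H]                   # [H  H]
        bottom = [row + [-v for v in row] for row in H]  # [H -H]
        H = top + bottom
    return H


def monomial_table(n):
    """Entry (i, j): value of the monomial indexed by j at the assignment indexed by i.

    Assumed convention: variable k takes the value (-1)**(bit k of i), and
    monomial j is the product of the variables whose bit is set in j.
    """
    size = 2 ** n
    return [[(-1) ** bin(i & j).count("1") for j in range(size)] for i in range(size)]


n = 3
size = 2 ** n
H = hadamard(n)

is_symmetric = all(H[i][j] == H[j][i] for i in range(size) for j in range(size))
product = [[sum(H[i][k] * H[k][j] for k in range(size)) for j in range(size)]
           for i in range(size)]
is_orthogonal = product == [[size if i == j else 0 for j in range(size)]
                            for i in range(size)]
matches_monomials = H == monomial_table(n)

print(is_symmetric, is_orthogonal, matches_monomials)
```

Under this convention the interpolation condition for a function becomes a single matrix-vector product with the Hadamard matrix, which is the form used in the remainder of the section.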

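Lemma 1. A Sylvester type Hadamard matrix $\mathcal{H}_n$ is symmetric, orthogonal and has $2^{-n}\mathcal{H}_n$ as the inverse.

Proof. Symmetry is evident from the definition. Orthogonality and the expression for the inverse can be obtained by applying induction on the following identity.

$$\mathcal{H}_n\mathcal{H}_n = \begin{cases} 1 & \text{if } n = 0 \\ 2\begin{bmatrix} \mathcal{H}_{n-1}\mathcal{H}_{n-1} & \mathbf{0} \\ \mathbf{0} & \mathcal{H}_{n-1}\mathcal{H}_{n-1} \end{bmatrix} & \text{if } n > 0 \end{cases}$$ □

Definition (Exact polynomial representation in vector form). One can choose a natural ordering for the truth assignments of the variables and the monomials such that the condition $f(x_1, x_2, \cdots, x_n) = p(x_1, x_2, \cdots, x_n)$ can be precisely expressed with $\mathbf{f} = \mathcal{H}\mathbf{s}$, where $\mathcal{H}$ is a $2^n \times 2^n$ Sylvester type Hadamard matrix [13, 16]. Note that here $\mathbf{f}$ is the vector representation of $f$ while $\mathbf{s}$ is its spectrum.

Lemma 2. The spectrum $\mathbf{s}$ of the polynomial representation of a Boolean function $f$ is given by $\mathbf{s} = 2^{-n}\mathcal{H}\mathbf{f}$. Consequently, the Walsh spectrum is given by $\mathbf{w} = \mathcal{H}\mathbf{f}$.

Proof. Exact interpolation at each assignment means $\mathbf{f} = \mathcal{H}\mathbf{s}$. Multiplying both sides with the inverse of $\mathcal{H}$ gives the desired result. □

Lemma 3. The norm of the spectrum ($\mathbf{s}$) of any BF is 1. Consequently, the norm of the Walsh spectrum ($\mathbf{w}$) of any BF is $2^n$.

Proof. Use Lemma 2 for the expression of the spectrum and compute $\|\mathbf{s}\|^2$:

$$\|\mathbf{s}\|^2 = (2^{-n}\mathcal{H}\mathbf{f})^T (2^{-n}\mathcal{H}\mathbf{f}) = 2^{-2n}\,\mathbf{f}^T\mathcal{H}\mathcal{H}\mathbf{f} = 2^{-n}\,\mathbf{f}^T\mathbf{f} = 1$$

which says $\|\mathbf{s}\| = 1$ and $\|\mathbf{w}\| = 2^n$. □

Definition (The vector form of sign-representation). Using the natural ordering introduced above, the sign-representation condition can be expressed as $\mathbf{F}\mathcal{H}\mathbf{a} > \mathbf{0}$ where $\mathbf{F} = \operatorname{diag}(\mathbf{f})$, which is called the vector form of sign-representation.

By using the vector form of sign-representation it is easy to see that there is a one-to-one correspondence between sign representations and the positive orthant of $\mathbb{R}^{2^n}$, due to the orthogonality of $\mathcal{H}$. Below we state this as a theorem.

Theorem 1. For a given BF $f$, let $\mathbf{F} = \operatorname{diag}(\mathbf{f})$. Then $\mathbf{F}\mathcal{H}\mathbf{a} > \mathbf{0}$ if and only if $\mathbf{a} \in \{\mathcal{H}\mathbf{F}\mathbf{k} \mid \mathbf{k} \in \mathbb{R}^{2^n} \text{ and } \mathbf{k} > \mathbf{0}\}$.

Proof. $\mathbf{F}\mathcal{H}\mathbf{a} > \mathbf{0} \iff \mathbf{F}\mathcal{H}\mathbf{a} = \mathbf{k}$ for some $\mathbf{k} \in \mathbb{R}^{2^n}$ with $\mathbf{k} > \mathbf{0}$. Multiplying the equation with the inverse of $\mathbf{F}\mathcal{H}$, which is $2^{-n}\mathcal{H}\mathbf{F}$, and absorbing the positive factor $2^{-n}$ into $\mathbf{k}$ gives the desired result. □

Contributions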

To present the contributions of this study concretely we make the following definitions.

Definition (Density/length). The length or density of a PTF is the number of non-zero weight input lines. Equivalently, it is the number of monomials in a given PTF representation of a BF.

Definition (Weight). The weight of a PTF (representing some BF $f$) is the sum of the absolute values of the coefficients.

Definition (PTF density, $D_f$). The PTF density of a BF $f$ ($D_f$) is the minimum number of monomials with which one can sign-represent it. Equivalently, the density of $f$ is the length of its shortest PTF representation. When the function is clear from the context or when all BFs are implied, the subscript may be dropped.

Definition (PTF weight, $W_f$). The PTF weight of a BF $f$ ($W_f$) is the minimum sum of the absolute values of the coefficients over all possible PTF representations of $f$ with integer coefficients. When we wish to constrain the weight to the set of PTFs satisfying a specific condition, we write the condition in square brackets, as in $W_f[D_f \le 0.75 \times 2^n]$. When the function is clear from the context or when all BFs are implied, the subscript may be dropped.
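To make the two per-representation measures concrete, here is a small Python sketch that computes the length and weight of a given integer-coefficient PTF and checks the sign-representation condition by enumeration. The dictionary encoding of polynomials, the helper names, and the choice of the 3-variable function $\operatorname{sgn}(x_1 + x_2 + x_3)$ as the example are assumptions for illustration only. The numbers it reports are the length and weight of two particular representations, so they are upper bounds on the PTF density and PTF weight of that function, not the minima themselves.

```python
from itertools import product


def evaluate(poly, x):
    """Evaluate a multilinear polynomial given as {tuple of variable indices: coefficient}."""
    total = 0
    for variables, coeff in poly.items():
        term = coeff
        for i in variables:
            term *= x[i]
        total += term
    return total


def sign_represents(poly, f, n):
    """True if sgn(poly(x)) equals f(x) on every assignment in {-1, 1}^n."""
    for x in product((1, -1), repeat=n):
        value = evaluate(poly, x)
        if value == 0 or (value > 0) != (f(x) > 0):
            return False
    return True


def length(poly):
    """Number of monomials with a non-zero coefficient."""
    return sum(1 for c in poly.values() if c != 0)


def weight(poly):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in poly.values())


def majority(x):
    """sgn(x1 + x2 + x3) on {-1, 1}^3 (the sum is never zero for three inputs)."""
    return 1 if sum(x) > 0 else -1


linear = {(0,): 1, (1,): 1, (2,): 1}                       # x1 + x2 + x3
scaled_exact = {(0,): 1, (1,): 1, (2,): 1, (0, 1, 2): -1}  # twice the interpolating polynomial

for name, poly in (("linear", linear), ("scaled exact", scaled_exact)):
    print(name, sign_represents(poly, majority, 3), length(poly), weight(poly))
```

Both polynomials sign-represent the same function, but the linear one uses three monomials where the scaled interpolating polynomial uses four, which is the kind of gap the density measure captures.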

We can now concisely state the contributions of the current study and give the related results (Table 2):

• Analyze the proof [16] that establishes $D_f \le 0.75 \times 2^n$ for all $f$.
• Derive an upper bound on $W_f[D_f \le 0.75 \times 2^n]$ over all BFs $f$.
• Derive an upper bound on $W_f[D_f \le 0.75 \times 2^n]$ over all bent functions $f$.
• Derive an upper bound on $D_f$ over all m-sparse functions $f$.
• Derive an upper bound on $W_f[D_f \le m + 2^{n-1}]$ over all m-sparse functions $f$.

Table 2. The summary of the results presented/obtained. $D$ and $W$ represent PTF density and PTF weight respectively.

PTF measure | Function class | Upper bound | Source
$D$ | All BFs | $0.75 \times 2^n$ | Theorem 2 (Oztop [16])
$W[D \le 0.75 \times 2^n]$ | All BFs | see Eq. 19 | Theorem 3, corollary
$W[D \le 0.75 \times 2^n]$ | Bent functions | $0.75 \times 2^{2n}$ | Theorem 4, corollary
$D$ | m-sparse functions | $2^{n-1} + \min(m, 2^{n-2})$ | Theorem 5
$W[D \le m + 2^{n-1}]$ | m-sparse functions | $(m + 2^{n-1})\,2^{\,n + 0.5m\log m - m + 1.5\log m + 2}$ | Theorem 6, corollary

The PTF weight bounds derived in this report utilize the well-known Hadamard inequality together with the logic of the proof followed in our earlier work [16]. We state the former as a lemma without proof and give a sketch of the proof for the latter in the next section.

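Lemma 4 (Hadamard). If $\mathbf{A}$ is an $n \times n$ matrix, then

$$|\det\mathbf{A}| \le \prod_{i=1}^{n}\Bigl(\sum_{j=1}^{n} a_{ij}^2\Bigr)^{1/2}$$

where $a_{ij}$ is the matrix entry at row $i$ and column $j$ of $\mathbf{A}$. Furthermore, the equality is attained if and only if $\mathbf{A}$ is an orthogonal matrix.

A simple but useful result we borrow for weight bound estimations is due to Håstad on the divisibility of the determinants of $\pm 1$ matrices [18]:

Lemma 5 (Håstad). The determinant of an $n \times n$ matrix $\mathbf{A}$ with entries $\pm 1$ is divisible by $2^{n-1}$.

Proof. Adding row 1 to the other rows in $\mathbf{A}$ produces a new matrix $\mathbf{B}$ where rows $2, \ldots, n$ have only zeros and $\pm 2$ entries. Since adding a scaled version of a row to another row does not change the determinant, $\det\mathbf{A} = \det\mathbf{B}$. Factoring out the 2's from rows $2, \ldots, n$ of $\mathbf{B}$ shows that $\det\mathbf{A} = \det\mathbf{B}$ is divisible by $2^{n-1}$. □

PTF representation with length at most $0.75 \times 2^n$

We use the recipe given in [16] to construct PTFs with $0.75 \times 2^n$ or a smaller number of monomials; but while doing so we ensure that the weights found are integers, and then bound these integers from above. Therefore, a review of the theorem given in [16] is in order.

Theorem 2 (Oztop). For any n-variable Boolean function, there exists a sign-representing polynomial with $0.75 \times 2^n$ or a smaller number of monomials.

Proof. (Sketch) Remembering that all the PTF representations of a Boolean function $f$ are characterized by the inequality system $\operatorname{diag}(\mathbf{f})\mathcal{H}_n\mathbf{a} > \mathbf{0}$, we represent the upper and lower halves of the coefficient vector $\mathbf{a}$ with $\mathbf{a}_{up}$ and $\mathbf{a}_{lo}$, and expand $\mathcal{H}_n$ by noting that $\mathcal{H}_n = \begin{bmatrix} \mathcal{H}_{n-1} & \mathcal{H}_{n-1} \\ \mathcal{H}_{n-1} & -\mathcal{H}_{n-1} \end{bmatrix}$ and writing out the rows of $\mathcal{H}_{n-1}$ as row vectors $\mathbf{h}_i$:

$$\operatorname{diag}(\mathbf{f}) \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_1 \\ \vdots & \vdots \\ \mathbf{h}_{2^{n-1}} & \mathbf{h}_{2^{n-1}} \\ \mathbf{h}_1 & -\mathbf{h}_1 \\ \vdots & \vdots \\ \mathbf{h}_{2^{n-1}} & -\mathbf{h}_{2^{n-1}} \end{bmatrix} \begin{bmatrix} \mathbf{a}_{up} \\ \mathbf{a}_{lo} \end{bmatrix} > \mathbf{0} \qquad \text{(Eq. 2)}$$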

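Now, we construct two matrices $\mathbf{U}$ and $\mathbf{G}$ according to whether the function output flips sign when the last variable flips sign. With the ordering adopted for the rows of $\mathcal{H}_n$, this corresponds to comparing $f_i$ with $f_{i+2^{n-1}}$ for $i = 1, 2, \ldots, 2^{n-1}$. With slight abuse of notation, we can write $\mathbf{U} = \{f_i\mathbf{h}_i \mid f_i = f_{i+2^{n-1}}\}$ and $\mathbf{G} = \{f_i\mathbf{h}_i \mid f_i \ne f_{i+2^{n-1}}\}$. Thus $\mathbf{U}$ and $\mathbf{G}$ are made up of disjoint rows of $\mathcal{H}_{n-1}$ which may have been scaled by $-1$. It can be shown that the original inequality system can be written in terms of $\mathbf{U}$ and $\mathbf{G}$ as follows [16]:

$$\mathbf{U}\mathbf{a}_{up} > \mathbf{U}\mathbf{a}_{lo} > -\mathbf{U}\mathbf{a}_{up}, \qquad \mathbf{G}\mathbf{a}_{lo} > \mathbf{G}\mathbf{a}_{up} > -\mathbf{G}\mathbf{a}_{lo}$$

Further, this system can be converted to an equivalent system for $\mathbf{a}_{up} - \mathbf{a}_{lo}$ and $\mathbf{a}_{up} + \mathbf{a}_{lo}$:

$$\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}(\mathbf{a}_{up} - \mathbf{a}_{lo}) > \mathbf{0}, \qquad \begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}(\mathbf{a}_{up} + \mathbf{a}_{lo}) > \mathbf{0}$$

The solution space of these inequality systems can be conveniently found since $\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$ and $\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$ are almost Sylvester type Hadamard matrices, where some rows may be negated and/or permuted with respect to our natural row order. So, we can apply Theorem 1 to write the expressions for $\mathbf{a}_{up} - \mathbf{a}_{lo}$ and $\mathbf{a}_{up} + \mathbf{a}_{lo}$, since row permutations do not change the fact that all the solutions are spanned by positive vectors from $\mathbb{R}^{2^{n-1}}$. Having the expressions for $\mathbf{a}_{up} - \mathbf{a}_{lo}$ and $\mathbf{a}_{up} + \mathbf{a}_{lo}$ as functions of arbitrary positive vectors (and rows of $\mathbf{U}$ and $\mathbf{G}$), we can eventually write $\mathbf{a}_{up}$ and $\mathbf{a}_{lo}$ individually. This in turn gives the full solution space for the original inequality in terms of positive (row) vectors $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}'$ of appropriate size as follows [16]:

$$\mathbf{a}^T = [\,\mathbf{a}_{up}^T \;\; \mathbf{a}_{lo}^T\,] = [\,(\boldsymbol{\alpha} + \boldsymbol{\alpha}')\mathbf{U} + (-\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G} \;\;\; (\boldsymbol{\alpha} - \boldsymbol{\alpha}')\mathbf{U} + (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}\,], \qquad \boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0} \qquad \text{(Eq. 3)}$$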
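The parametrization in Eq. 3 can be exercised numerically. The Python sketch below builds the two row sets from a $\pm 1$ function vector, draws random positive integer row vectors, and confirms that every resulting coefficient vector sign-represents the function. The names `split_rows`, `coefficients` and `sign_represents`, the symbols U and G for the two matrices, and the use of random integers from 1 to 9 are choices made for this example only; it is a check of the formula on small cases, not a proof.

```python
import random


def hadamard(n):
    """Sylvester-type Hadamard matrix of order 2**n."""
    H = [[1]]
    for _ in range(n):
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def split_rows(f):
    """Rows f_i * h_i, split by whether f_i equals f_{i + 2**(n-1)}."""
    h = len(f) // 2
    H = hadamard(h.bit_length() - 1)
    U = [[f[i] * v for v in H[i]] for i in range(h) if f[i] == f[i + h]]
    G = [[f[i] * v for v in H[i]] for i in range(h) if f[i] != f[i + h]]
    return U, G


def row_times(v, M, width):
    """Row vector v times matrix M (a zero vector when M has no rows)."""
    return [sum(v[r] * M[r][c] for r in range(len(M))) for c in range(width)]


def coefficients(U, G, alpha, alpha2, beta, beta2):
    """Coefficient vector [a_up, a_lo] of Eq. 3 for positive row vectors."""
    h = len(U) + len(G)
    plus = lambda p, q: [x + y for x, y in zip(p, q)]
    minus = lambda p, q: [x - y for x, y in zip(p, q)]
    a_up = plus(row_times(plus(alpha, alpha2), U, h), row_times(minus(beta2, beta), G, h))
    a_lo = plus(row_times(minus(alpha, alpha2), U, h), row_times(plus(beta, beta2), G, h))
    return a_up + a_lo


def sign_represents(a, f):
    """Check diag(f) H a > 0 row by row."""
    H = hadamard(len(f).bit_length() - 1)
    return all(fi * sum(x * y for x, y in zip(row, a)) > 0 for fi, row in zip(f, H))


def positive(k):
    return [random.randint(1, 9) for _ in range(k)]


random.seed(0)
n = 4
all_ok = True
for _ in range(200):
    f = [random.choice((1, -1)) for _ in range(2 ** n)]
    U, G = split_rows(f)
    a = coefficients(U, G, positive(len(U)), positive(len(U)),
                     positive(len(G)), positive(len(G)))
    all_ok = all_ok and sign_represents(a, f)
print(all_ok)
```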

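From this expression it can be shown that either the $\mathbf{a}_{up}$ part or the $\mathbf{a}_{lo}$ part of the solution can be forced to have zeros as outlined next. The theorem proceeds symmetrically according to whether the number of rows in $\mathbf{U}$ ($𝓊 = \#\mathbf{U}$) is larger than the number of rows in $\mathbf{G}$ ($ℊ = \#\mathbf{G}$). Let's assume $ℊ \ge 𝓊$; then $ℊ$ elements of the $\mathbf{a}_{up}$ part of the solution vector can be zeroed by arbitrarily setting $\boldsymbol{\alpha}, \boldsymbol{\alpha}'$ and computing $\boldsymbol{\beta}, \boldsymbol{\beta}'$ accordingly. To see this, note that $\mathbf{G}$ is of full rank, and thus $-\boldsymbol{\beta} + \boldsymbol{\beta}'$ can be used to generate arbitrary real numbers at $ℊ$ component positions. These positions can be identified by finding a full rank column subset of $\mathbf{G}$ by, for example, row echelon reduction. So, $ℊ$ components of $\mathbf{a}_{up}$ can be forced to become zero since $\mathbf{a}_{up}^T = (\boldsymbol{\alpha} + \boldsymbol{\alpha}')\mathbf{U} + (-\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}$. The case $𝓊 > ℊ$ is proven symmetrically (exchange the roles of $\boldsymbol{\alpha}, \boldsymbol{\alpha}'$ with $\boldsymbol{\beta}, \boldsymbol{\beta}'$). Therefore, since $\max(𝓊, ℊ) \ge 2^{n-2}$, we can always zero at least $2^{n-2}$ components of $\mathbf{a}$. □

Weight upper bound on PTFs with length at most $0.75 \times 2^n$

Theorem 3. Any n-variable BF can be represented by a PTF with length at most $0.75 \times 2^n$ and integer weights whose magnitudes are bounded above by a quantity that depends only on $n$. In terms of the order $h = 2^{n-1}$ of the Hadamard matrix involved, this bound on $\lceil\mathbf{a}\rceil$ (Eq. 4) is the one derived at the end of the proof as Eq. 18.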
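The zeroing step in the proof sketch of Theorem 2 can be followed on small cases with exact rational arithmetic. The sketch below is one possible reading of the recipe under stated assumptions: it fixes the freely chosen pair of positive vectors to 0.5 in every component, picks the pivot columns of the larger row set by Gauss-Jordan elimination, and solves for the difference of the remaining pair. The symbols U and G, all function names, and the particular free choices are assumptions for illustration; the coefficients produced here are rational and are not the integer weights constructed in Theorem 3.

```python
from fractions import Fraction
from itertools import product


def hadamard(n):
    H = [[1]]
    for _ in range(n):
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def eliminate(A):
    """Gauss-Jordan elimination over the rationals; returns (reduced rows, pivot columns)."""
    A = [[Fraction(v) for v in row] for row in A]
    pivots, r = [], 0
    for c in range(len(A[0]) if A else 0):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                A[i] = [x - A[i][c] * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots


def solve_left(B, b):
    """Solve x B = b for the row vector x (B square and invertible)."""
    k = len(B)
    augmented = [[B[r][c] for r in range(k)] + [b[c]] for c in range(k)]
    reduced, _ = eliminate(augmented)
    return [row[k] for row in reduced]


def short_ptf(f):
    """Coefficient vector [a_up, a_lo] with at least max(#U, #G) zeros."""
    h = len(f) // 2
    H = hadamard(h.bit_length() - 1)
    U = [[f[i] * v for v in H[i]] for i in range(h) if f[i] == f[i + h]]
    G = [[f[i] * v for v in H[i]] for i in range(h) if f[i] != f[i + h]]
    zero_upper = len(G) >= len(U)             # which half receives the zeros
    big, small = (G, U) if zero_upper else (U, G)
    small_sum = [sum(row[c] for row in small) for c in range(h)]  # free pair fixed to 0.5
    _, pivots = eliminate(big)                # columns on which `big` is invertible
    B1 = [[row[c] for c in pivots] for row in big]
    d = solve_left(B1, [-small_sum[c] for c in pivots])   # difference of the solved pair
    mixed = [small_sum[c] + sum(d[r] * big[r][c] for r in range(len(big)))
             for c in range(h)]               # vanishes at the pivot columns
    other = [sum((abs(d[r]) + 1) * big[r][c] for r in range(len(big)))
             for c in range(h)]               # sum of the solved pair is |d| + 1
    return mixed + other if zero_upper else other + mixed


def sign_represents(a, f):
    H = hadamard(len(f).bit_length() - 1)
    return all(fi * sum(x * y for x, y in zip(row, a)) > 0 for fi, row in zip(f, H))


n = 3
worst = 0
for f in product((1, -1), repeat=2 ** n):
    a = short_ptf(list(f))
    assert sign_represents(a, f)
    worst = max(worst, sum(1 for v in a if v != 0))
print(worst, 0.75 * 2 ** n)
```

The loop runs over all 256 three-variable functions and reports the largest number of non-zero coefficients produced, to be compared with the bound of the theorem.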

To prove Theorem 3, we first prove a statement about determinants of the submatrices of Hadamard matrices.

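Lemma 6. Given a Hadamard matrix $\mathcal{H}$ in block form $\mathcal{H} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}$ with $\mathbf{B}$ and $\mathbf{C}$ square, $\mathbf{B}$ is invertible if and only if $\mathbf{C}$ is invertible.

Proof. Assume $\mathbf{C}$ is invertible, and let $\mathbf{B}$ be $p \times p$ and $\mathbf{C}$ be $q \times q$. Take an arbitrary vector $\mathbf{x} \in \mathbb{R}^p$. Clearly there exists a vector $\mathbf{y} \in \mathbb{R}^q$ such that $[\mathbf{x}^T \; \mathbf{y}^T]\begin{bmatrix} \mathbf{A} \\ \mathbf{C} \end{bmatrix} = \mathbf{0}$ (i.e. take $\mathbf{y}^T = -\mathbf{x}^T\mathbf{A}\mathbf{C}^{-1}$). This means that $\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}$ is a linear combination of the columns of $\begin{bmatrix} \mathbf{B} \\ \mathbf{D} \end{bmatrix}$ because $\mathcal{H}$ is orthogonal. Therefore, there exists $\mathbf{k} \in \mathbb{R}^p$ with $\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = \begin{bmatrix} \mathbf{B} \\ \mathbf{D} \end{bmatrix}\mathbf{k}$. In particular, this means that for arbitrary $\mathbf{x} \in \mathbb{R}^p$ there is always a linear combination of the columns of $\mathbf{B}$ such that $\mathbf{x} = \mathbf{B}\mathbf{k}$, i.e. $\mathbf{B}$ spans $\mathbb{R}^p$. Thus $\mathbf{B}$ is of full rank. Since it is square, it must be invertible. To see the double implication, exchange the roles of $\mathbf{B}$ with $\mathbf{C}$ and $\mathbf{A}$ with $\mathbf{D}$. □

Corollary. Consider a submatrix $\mathbf{S}$ of an $N \times N$ Hadamard matrix $\mathcal{H}$ obtained by deleting the k rows indexed by $r \in R$ and the k columns indexed by $c \in C$, where $R$ and $C$ are k-sized subsets of $\{1, 2, \ldots, N\}$. Then $\mathbf{S}$ is invertible if and only if the matrix formed by the intersection of the deleted rows and columns, i.e. $\mathcal{H}_{R,C}$, is invertible.

Proof. By pre- and post-multiplying $\mathcal{H}$ with appropriate permutation matrices, $\mathbf{S}$ can be brought to the lower left of the permuted matrix, taking the role of the block $\mathbf{C}$ of Lemma 6, whereas $\mathcal{H}_{R,C}$ takes the role of the block $\mathbf{B}$. Since a row and column permutation of a Hadamard matrix is still Hadamard, the corollary follows. □

Proof of Theorem 3. By Theorem 2 we know that a given n-variable BF induces two matrices $\mathbf{U}$ and $\mathbf{G}$ with $2^{n-1}$ columns. Let $𝓊 = \#\mathbf{U}$ and $ℊ = \#\mathbf{G}$, i.e. the number of rows in $\mathbf{U}$ and $\mathbf{G}$. These matrices allow us to construct a polynomial sign-representation with at most $0.75 \times 2^n$ monomials.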

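In fact, all sign-representing polynomials (i.e. their coefficient vectors $\mathbf{a}$) are determined by arbitrarily chosen positive (appropriately sized) row vectors $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}'$ (Eq. 3). According to Theorem 2, if $ℊ \ge 𝓊$ [$𝓊 > ℊ$], we know that $ℊ$ [$𝓊$] elements of the $\mathbf{a}_{up}$ [$\mathbf{a}_{lo}$] part of the coefficient vector can be zeroed by arbitrarily selecting $\boldsymbol{\alpha}, \boldsymbol{\alpha}'$ [$\boldsymbol{\beta}, \boldsymbol{\beta}'$] and appropriately computing $\boldsymbol{\beta}, \boldsymbol{\beta}'$ [$\boldsymbol{\alpha}, \boldsymbol{\alpha}'$] based on the selection. Without loss of generality let's assume that $ℊ \ge 𝓊$, and further assume that the elements that will be zeroed are the first $ℊ$ elements, so that we can write $\mathbf{U}$ and $\mathbf{G}$ as $[\mathbf{U}_1 \; \mathbf{U}_2]$ and $[\mathbf{G}_1 \; \mathbf{G}_2]$ respectively, where $\mathbf{U}_1$ and $\mathbf{G}_1$ have $ℊ$ columns. Then Eq. 3 becomes:

$$\mathbf{a}^T = [\,\mathbf{a}_{up}^T \;\; \mathbf{a}_{lo}^T\,] = [\,(\boldsymbol{\alpha} + \boldsymbol{\alpha}')[\mathbf{U}_1 \; \mathbf{U}_2] + (-\boldsymbol{\beta} + \boldsymbol{\beta}')[\mathbf{G}_1 \; \mathbf{G}_2] \;\;\; (\boldsymbol{\alpha} - \boldsymbol{\alpha}')[\mathbf{U}_1 \; \mathbf{U}_2] + (\boldsymbol{\beta} + \boldsymbol{\beta}')[\mathbf{G}_1 \; \mathbf{G}_2]\,], \qquad \boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0} \qquad \text{(Eq. 5)}$$

Since $\mathbf{G}_1$ is invertible, for any $\boldsymbol{\alpha}, \boldsymbol{\alpha}' > \mathbf{0}$ we can always find $\boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0}$ such that

$$[\,\boldsymbol{\alpha} + \boldsymbol{\alpha}' \;\;\; -\boldsymbol{\beta} + \boldsymbol{\beta}'\,]\begin{bmatrix} \mathbf{U}_1 \\ \mathbf{G}_1 \end{bmatrix} = \mathbf{0} \qquad \text{(Eq. 6)}$$

This means that the vector $[\,\boldsymbol{\alpha} + \boldsymbol{\alpha}' \;\;\; -\boldsymbol{\beta} + \boldsymbol{\beta}'\,]$ is a linear combination of the columns of $\begin{bmatrix} \mathbf{U}_2 \\ \mathbf{G}_2 \end{bmatrix}$ since $\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$ is orthogonal. Therefore, there exists a real vector $\mathbf{k}$ such that

$$[\,\boldsymbol{\alpha} + \boldsymbol{\alpha}' \;\;\; -\boldsymbol{\beta} + \boldsymbol{\beta}'\,] = \mathbf{k}^T[\,\mathbf{U}_2^T \;\; \mathbf{G}_2^T\,] \qquad \text{(Eq. 7)}$$

Conversely, for any $\mathbf{k}$ satisfying $\mathbf{k}^T\mathbf{U}_2^T > \mathbf{0}$ we can find $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0}$ such that Eq. 6 holds and thus generate a sign representation with at most $0.75 \times 2^n$ monomials with $\boldsymbol{\alpha} + \boldsymbol{\alpha}' = \mathbf{k}^T\mathbf{U}_2^T$ and $-\boldsymbol{\beta} + \boldsymbol{\beta}' = \mathbf{k}^T\mathbf{G}_2^T$. In the above derivation, observe that we can freely choose to require $\boldsymbol{\alpha} = \boldsymbol{\alpha}'$. Substituting $\boldsymbol{\alpha} + \boldsymbol{\alpha}'$ and $-\boldsymbol{\beta} + \boldsymbol{\beta}'$ in the original solution (Eq. 5) with the $\boldsymbol{\alpha} = \boldsymbol{\alpha}'$ choice, we obtain the following expression for the coefficient vector $\mathbf{a}$:

$$\mathbf{a}^T = [\,\mathbf{a}_{up}^T \;\; \mathbf{a}_{lo}^T\,] = [\,\mathbf{k}^T(\mathbf{U}_2^T\mathbf{U}_1 + \mathbf{G}_2^T\mathbf{G}_1) \;\;\; \mathbf{k}^T(\mathbf{U}_2^T\mathbf{U}_2 + \mathbf{G}_2^T\mathbf{G}_2) \;\;\; (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}_1 \;\;\; (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}_2\,] \qquad \text{(Eq. 8)}$$

Since $\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$ is orthogonal we see that

$$\mathbf{a}^T = [\,\mathbf{a}_{up}^T \;\; \mathbf{a}_{lo}^T\,] = [\,\mathbf{0} \;\;\; 2^{n-1}\mathbf{k}^T \;\;\; (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}_1 \;\;\; (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}_2\,] \qquad \text{(Eq. 9)}$$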

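Now let's find bounds for $\lceil\mathbf{k}\rceil$ and $\lceil\boldsymbol{\beta} + \boldsymbol{\beta}'\rceil$ in the following lemmas.

Lemma 7 (upper bound of $\lceil\mathbf{k}\rceil$). $\lceil\mathbf{k}\rceil$ is upper bounded by $2^{\,n - 2^{n-2}}(2^{n-2} - 1)^{(2^{n-2}-1)/2}$.

Proof. Let's consider the case $ℊ \ge 𝓊$ as assumed in the main theorem. From Lemma 6 we know that $\mathbf{U}_2$ is invertible. Hence for any $\boldsymbol{\alpha} + \boldsymbol{\alpha}'$ there is some $\mathbf{k}$ satisfying $\mathbf{U}_2\mathbf{k} = [\boldsymbol{\alpha} + \boldsymbol{\alpha}']^T$. Choose $\boldsymbol{\alpha} = \boldsymbol{\alpha}' = 0.5\,c\,\mathbf{1}^T$ so that we have the system of linear equations $\mathbf{U}_2\mathbf{k} = c\mathbf{1}$, where $c$ is a positive constant integer to be substituted later. Following the technique of Håstad [18], the solution to this equation can be found by Cramer's rule. Accordingly, the solution to $\mathbf{A}\mathbf{x} = \mathbf{b}$ is given by $x_j = \det\mathbf{A}_j / \det\mathbf{A}$, where $\mathbf{A}_j$ is the matrix obtained by taking $\mathbf{A}$ and replacing the j-th column with $\mathbf{b}$. In our case, we have $\mathbf{A} = \mathbf{U}_2$ and $\mathbf{b} = c\mathbf{1}$. Since $\mathbf{A}$ is a $𝓊 \times 𝓊$ matrix with $\pm 1$ entries, according to Hadamard's inequality (Lemma 4), $|\det\mathbf{A}| \le 𝓊^{𝓊/2}$. Writing the determinant $\det\mathbf{A}_j$ by using the cofactor expansion along the replaced column we have:

$$\det\mathbf{A}_j = \sum_{i=1}^{𝓊} (-1)^{i+j}\, c\, M_{i,j} \qquad \text{(Eq. 10)}$$

where $M_{i,j}$ corresponds to the (i, j) minor, i.e. the determinant of $\mathbf{A}$ where the j-th column and the i-th row are deleted. So, $M_{i,j}$ is the determinant of a $(𝓊 - 1) \times (𝓊 - 1)$ matrix with entries $\pm 1$. Thus $|M_{i,j}| \le (𝓊 - 1)^{(𝓊-1)/2}$, and hence $|\det\mathbf{A}_j| \le c\,𝓊(𝓊 - 1)^{(𝓊-1)/2}$. On the other hand, according to Lemma 5, $\det\mathbf{A}$ and $M_{i,j}$ are integers that are divisible by $2^{𝓊-1}$ and $2^{𝓊-2}$ respectively. Therefore by letting $c = |\det\mathbf{A}| / 2^{𝓊-2}$ we are guaranteed that $k_j = \det\mathbf{A}_j / \det\mathbf{A}$ will be an integer and upper bounded by $2^{-𝓊+2}\,𝓊(𝓊 - 1)^{(𝓊-1)/2}$ in magnitude, i.e.

$$\lceil\mathbf{k}\rceil \le 2^{-𝓊+2}\,𝓊(𝓊 - 1)^{(𝓊-1)/2} \qquad \text{(Eq. 11)}$$

It is easy to show that when $𝓊 > ℊ$ this becomes

$$\lceil\mathbf{k}\rceil \le 2^{-ℊ+2}\,ℊ(ℊ - 1)^{(ℊ-1)/2} \qquad \text{(Eq. 12)}$$

Since the smaller of $𝓊$ and $ℊ$ is at most $2^{n-2}$, combining both cases yields

$$\lceil\mathbf{k}\rceil \le 2^{\,n - 2^{n-2}}(2^{n-2} - 1)^{(2^{n-2}-1)/2} \qquad \text{(Eq. 13)}$$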
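The integrality argument in the proof of Lemma 7 can be checked on a small random instance. The Python sketch below draws an invertible 5 by 5 matrix with entries $\pm 1$, applies Cramer's rule with the constant $c$ chosen as in the proof, and tests the divisibility property of Lemma 5 together with the bound of Eq. 11. The matrix size, the random seed, and the function names are arbitrary choices for this example, and a single random instance is only a sanity check rather than evidence for the general statement.

```python
from fractions import Fraction
import random


def det(M):
    """Exact determinant of an integer matrix by fraction-based elimination."""
    A = [[Fraction(v) for v in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if A[i][c] != 0), None)
        if p is None:
            return 0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for i in range(c + 1, n):
            factor = A[i][c] / A[c][c]
            A[i] = [x - factor * y for x, y in zip(A[i], A[c])]
    return int(d)


def cramer_integer_solution(A):
    """Solve A k = c*1 by Cramer's rule with c = |det A| / 2**(u-2), for u >= 2."""
    u = len(A)
    D = det(A)
    c = abs(D) // 2 ** (u - 2)
    k = []
    for j in range(u):
        A_j = [row[:j] + [c] + row[j + 1:] for row in A]  # column j replaced by c*1
        k.append(Fraction(det(A_j), D))
    return c, k


random.seed(1)
u = 5
while True:
    A = [[random.choice((1, -1)) for _ in range(u)] for _ in range(u)]
    if det(A) != 0:
        break

c, k = cramer_integer_solution(A)
divisible = det(A) % 2 ** (u - 1) == 0                    # Lemma 5
integral = all(v.denominator == 1 for v in k)
solves = all(sum(a * v for a, v in zip(row, k)) == c for row in A)
# Eq. 11, compared after squaring so that only integers are involved:
# (|k_j| * 2**(u-2))**2 <= u**2 * (u-1)**(u-1)
bounded = all((abs(v) * 2 ** (u - 2)) ** 2 <= u ** 2 * (u - 1) ** (u - 1) for v in k)
print(c, [int(v) for v in k], divisible, integral, solves, bounded)
```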

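Lemma 8 (upper bound on $\lceil\mathbf{a}\rceil$). $\lceil\mathbf{a}\rceil$ is bounded above by $\lceil\mathbf{k}\rceil$ times a factor that depends only on $n$ (Eq. 16).

Proof. Let's find upper bounds for the upper and lower halves of the solution vector separately:

i. $\mathbf{a}_{up}$ part: Since $\mathbf{a}_{up}$ is given by $\mathbf{a}_{up}^T = [\,\mathbf{0} \;\; 2^{n-1}\mathbf{k}^T\,]$ (Eq. 9) it is clearly integer and bounded by $2^{n-1}\lceil\mathbf{k}\rceil$, i.e. $\lceil\mathbf{a}_{up}\rceil \le 2^{n-1}\lceil\mathbf{k}\rceil$.

ii. $\mathbf{a}_{lo}$ part: Since $\mathbf{a}_{lo}^T = (\boldsymbol{\beta} + \boldsymbol{\beta}')[\mathbf{G}_1 \; \mathbf{G}_2]$ (Eq. 9), we seek an upper bound for $\lceil\boldsymbol{\beta} + \boldsymbol{\beta}'\rceil$. We know that $-\boldsymbol{\beta} + \boldsymbol{\beta}' = \mathbf{k}^T\mathbf{G}_2^T$ must hold; but we are free to choose $\boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0}$ to satisfy this. Let $\mathbf{z} = \mathbf{k}^T\mathbf{G}_2^T$ for brevity and choose $\boldsymbol{\beta}, \boldsymbol{\beta}'$ component-wise as follows:

$$\beta_i = \begin{cases} -z_i + 0.5, & \text{if } z_i < 0 \\ 0.5, & \text{else} \end{cases} \qquad \beta'_i = \begin{cases} 0.5, & \text{if } z_i < 0 \\ z_i + 0.5, & \text{else} \end{cases}$$

Observe that the required equality $-\boldsymbol{\beta} + \boldsymbol{\beta}' = \mathbf{z}$ is satisfied and the components of $\boldsymbol{\beta} + \boldsymbol{\beta}'$ are positive integers when $\mathbf{z}$ is integer. Furthermore, due to the construction of $\boldsymbol{\beta}, \boldsymbol{\beta}'$ we have $\lceil\boldsymbol{\beta} + \boldsymbol{\beta}'\rceil = \lceil\mathbf{z}\rceil + 1$, which at once yields $\lceil\mathbf{a}_{lo}\rceil \le (\lceil\mathbf{z}\rceil + 1)ℊ$ where $ℊ = \#\mathbf{G}$. Noting that $\lceil\mathbf{z}\rceil \le 𝓊\lceil\mathbf{k}\rceil$, because each component of $\mathbf{z}$ is a signed sum of the $𝓊$ components of $\mathbf{k}$, we get an expression for an upper bound on $\mathbf{a}_{lo}$: $\lceil\mathbf{a}_{lo}\rceil \le 𝓊ℊ\lceil\mathbf{k}\rceil + ℊ$. Combining (i) and (ii), and using $\lceil\mathbf{k}\rceil \ge 1$, we establish an upper bound for the coefficients:

$$\lceil\mathbf{a}\rceil \le \max\{𝓊ℊ + ℊ,\; 2^{n-1}\}\,\lceil\mathbf{k}\rceil \qquad \text{(Eq. 14)}$$

It is possible to show that when $𝓊 > ℊ$ this becomes

$$\lceil\mathbf{a}\rceil \le \max\{𝓊ℊ + 𝓊,\; 2^{n-1}\}\,\lceil\mathbf{k}\rceil$$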

Eq. 15 Since ≤ 2 [ ℊ ≤ 2 ] and + ℊ = 2 we have ⌈⌉ ≤ 2 () + 2 ⌈⌉ , so assuming n>1 we can write ⌈⌉ ≤ 2 ⌈⌉ Eq. 16 □ Finally, combining the last two results given in Eq. 13Eq. 11 and Eq. 16 we get ⌈⌉ ≤ 2 (2 − 1) replacing − 1 with for simplicity we get ⌈⌉ ≤ 2 Eq. 17

Substituting ℎ = 2 we reach the desired result: ⌈⌉ ≤ 2 . . Eq. 18 □ Corollary to Theorem 3 . The PTF weight of n-variable Boolean functions with PTF representations with monomials are bounded above by . ≤ 0.75 × 2 ≤ 0.75 × 2 −3 −2 −1 +5 Eq. 19

Proof. Since we have at most $0.75 \times 2^n$ coefficients, we multiply the right-hand side of Eq. 18 with $0.75 \times 2^n$ to get the given upper bound. □

Bent functions with short PTFs and small weights

Bent functions are an interesting set of BFs introduced by Rothaus [19], which have complicated combinatorial properties with significance in cryptanalysis [20, 21]. In this report, we are interested in their PTF representation in terms of weight and length. Semi-bent functions are another related class of functions with important cryptographic properties [22], which we also use in this section. We first give the definitions of these function classes.

Definition (Bent function). An n-variable Boolean function is called bent if and only if the Walsh coefficients of the function are all $\pm 2^{n/2}$.

Definition (Semi-bent function). If the Walsh spectrum of an n-variable Boolean function $f$ takes only the values 0 or $\pm 2^{(n+1)/2}$ then $f$ is a semi-bent function.

Remark. Bent functions exist only in even dimensions, whereas semi-bent functions exist only in odd dimensions.
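The two definitions can be tried out on a standard example. The Python sketch below computes the Walsh spectrum of the 4-variable function $(-1)^{b_0 b_1 \oplus b_2 b_3}$, a well-known bent function, and then of the 3-variable function obtained by fixing its last variable. The indexing convention (assignment index $i = b_0 + 2b_1 + 4b_2 + 8b_3$, with the first half of the vector corresponding to $b_3 = 0$) and the helper names are assumptions made for this example.

```python
def hadamard(n):
    """Sylvester-type Hadamard matrix of order 2**n."""
    H = [[1]]
    for _ in range(n):
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def walsh_spectrum(f):
    """Walsh coefficients H f of a +-1 vector f of length 2**n."""
    H = hadamard(len(f).bit_length() - 1)
    return [sum(h * v for h, v in zip(row, f)) for row in H]


def bit(i, k):
    return (i >> k) & 1


n = 4
f = [(-1) ** ((bit(i, 0) & bit(i, 1)) ^ (bit(i, 2) & bit(i, 3))) for i in range(2 ** n)]
w = walsh_spectrum(f)
is_bent = all(abs(v) == 2 ** (n // 2) for v in w)

# Fixing the last variable leaves a function of n - 1 = 3 variables.
f_up = f[: 2 ** (n - 1)]
w_up = walsh_spectrum(f_up)
is_semi_bent = all(v in (0, 4, -4) for v in w_up)   # 2**((3 + 1) / 2) = 4

print(is_bent, is_semi_bent, w_up.count(0))
```

For this example every Walsh coefficient of the 4-variable function has magnitude 4, and the restricted function has a spectrum with values in $\{0, \pm 4\}$, half of which are zero.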

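Lemma 9. Let $f$ be an n-variable bent function and $p$ be its Walsh polynomial. Then $f$ can be decomposed into two semi-bent functions $f_{up}$ and $f_{lo}$ with Walsh polynomials $p_{up}$ and $p_{lo}$ with the following relation:

$$p(x_1, \cdots, x_n) = 0.5(x_n + 1)\,p_{up}(x_1, \cdots, x_{n-1}) + 0.5(-x_n + 1)\,p_{lo}(x_1, \cdots, x_{n-1})$$

Proof. It is clear that if the polynomials $p_{up}$ and $p_{lo}$ are Walsh polynomials for $f_{up}$ and $f_{lo}$ then $p$ is a Walsh polynomial for $f$, so let's look at the semi-bentness. Let $\begin{bmatrix} \mathbf{w}_{up} \\ \mathbf{w}_{lo} \end{bmatrix}$ be the coefficients of $p$. Note that as $f(x_1, \cdots, x_{n-1}, 1) = f_{up}(x_1, \cdots, x_{n-1})$ and $f(x_1, \cdots, x_{n-1}, -1) = f_{lo}(x_1, \cdots, x_{n-1})$, the upper and lower halves of $\mathbf{f}$ coincide with $\mathbf{f}_{up}$ and $\mathbf{f}_{lo}$. Applying Lemma 2 and Eq. 1 we have:

$$\begin{bmatrix} \mathbf{w}_{up} \\ \mathbf{w}_{lo} \end{bmatrix} = \begin{bmatrix} \mathcal{H} & \mathcal{H} \\ \mathcal{H} & -\mathcal{H} \end{bmatrix}\begin{bmatrix} \mathbf{f}_{up} \\ \mathbf{f}_{lo} \end{bmatrix}$$

By multiplying both sides by $[\mathbf{I} \;\; \mathbf{I}]$ and $[\mathbf{I} \; -\mathbf{I}]$ we get

$$\mathcal{H}\mathbf{f}_{up} = (\mathbf{w}_{up} + \mathbf{w}_{lo})/2, \qquad \mathcal{H}\mathbf{f}_{lo} = (\mathbf{w}_{up} - \mathbf{w}_{lo})/2$$

Note that the left-hand sides are exactly the (Walsh) coefficients of $f_{up}$ and $f_{lo}$ (due to Lemma 2). Further, observe that they can only take values of 0 or $\pm 2^{n/2}$ since the components of $\mathbf{w}_{up}$ and $\mathbf{w}_{lo}$ are $\pm 2^{n/2}$. Thus, $f_{up}$ and $f_{lo}$ are semi-bent functions of order $n - 1$. □

Lemma 10. An n-variable semi-bent function has $2^{n-1}$ zeros in its Walsh spectrum.

Proof. Let $k$ be the number of non-zero Walsh coefficients of an n-variable semi-bent function. Then, the norm of the Walsh spectrum will be $k^{1/2}\,2^{(n+1)/2}$, which must be equal to $2^n$, as the norm of the Walsh spectrum of all n-variable BFs is $2^n$ (due to Lemma 3). Solving $k^{1/2}\,2^{(n+1)/2} = 2^n$ we see that the number of non-zero coefficients is $k = 2^{n-1}$. So, the number of zeros is also $2^{n-1}$ since we have $2^n$ coefficients in total. □

Theorem 4. Any n-variable bent function can be represented as a polynomial threshold function by using $0.75 \times 2^n$ monomials with integer coefficients less than or equal to $2^n$ in absolute value.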
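The statement of Theorem 4 can be checked on one concrete bent function. The Python sketch below uses the parameter choice that the proof in this section adopts (the first pair of positive vectors set to 0.5, the second pair set to 0.5 and 1.5 in either order), under which the upper half of the coefficient vector is the column sum of the two row sets added or subtracted and the lower half is twice the column sum of the second row set. The example function, the symbols U and G, and the helper names are assumptions for illustration; one example does not establish the theorem.

```python
def hadamard(n):
    """Sylvester-type Hadamard matrix of order 2**n."""
    H = [[1]]
    for _ in range(n):
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    return H


def bit(i, k):
    return (i >> k) & 1


n = 4
h = 2 ** (n - 1)
# Bent example: f = (-1)^(b0*b1 XOR b2*b3), indexed by i = b0 + 2*b1 + 4*b2 + 8*b3.
f = [(-1) ** ((bit(i, 0) & bit(i, 1)) ^ (bit(i, 2) & bit(i, 3))) for i in range(2 ** n)]

H_half = hadamard(n - 1)
U = [[f[i] * v for v in H_half[i]] for i in range(h) if f[i] == f[i + h]]
G = [[f[i] * v for v in H_half[i]] for i in range(h) if f[i] != f[i + h]]
sum_U = [sum(row[c] for row in U) for c in range(h)]   # 1^T U
sum_G = [sum(row[c] for row in G) for c in range(h)]   # 1^T G


def bent_ptf(sign):
    """a_up = 1^T U + sign * 1^T G and a_lo = 2 * 1^T G, with sign = +1 or -1."""
    a_up = [x + sign * y for x, y in zip(sum_U, sum_G)]
    a_lo = [2 * y for y in sum_G]
    return a_up + a_lo


def sign_represents(a, f):
    H = hadamard(len(f).bit_length() - 1)
    return all(fi * sum(x * y for x, y in zip(row, a)) > 0 for fi, row in zip(f, H))


for sign in (1, -1):
    a = bent_ptf(sign)
    nonzero = sum(1 for v in a if v != 0)
    print(sign, sign_represents(a, f), nonzero, max(abs(v) for v in a))
```

For this function both parameter choices give 12 non-zero coefficients out of 16 and a largest magnitude of 4, within the density and coefficient bounds of the theorem.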

To prove Theorem 4, we first prove a lemma concerning the inner workings of Theorem 2.

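Lemma 11. If $f$ is a bent function then the row-sums of $\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$ and $\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$ have $2^{n-2}$ zeros, where $\mathbf{U}$ and $\mathbf{G}$ are the matrices formed by the application of Theorem 2 to $f$.

Proof. Write $\mathbf{f}$ as partitioned into upper and lower halves so that $\mathbf{f} = \begin{bmatrix} \mathbf{f}_{up} \\ \mathbf{f}_{lo} \end{bmatrix}$, and observe that the rows making up the $\mathbf{U}$ and $\mathbf{G}$ matrices are unique rows of the Sylvester type Hadamard matrix $\mathcal{H}_{n-1}$, each scaled by the corresponding component of $\mathbf{f}_{up}$, with row index $i \in \{1, 2, \ldots, 2^{n-1}\}$. Thus $\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$ is a Hadamard matrix which is almost Sylvester type, except that some rows might be permuted and/or negated due to the scaling by the components of $\mathbf{f}_{up}$. Luckily, we can recover the natural order by appropriate row permutations, and further we can pull the scaling factor outside to obtain a Sylvester type Hadamard matrix. The same arguments are also true for $\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$, where this time the scaling is done with the lower half $\mathbf{f}_{lo}$. Thus, for some permutation matrices $\mathbf{P}_1$ and $\mathbf{P}_2$, we have the following relations:

$$\operatorname{diag}(\mathbf{f}_{up})\,\mathcal{H} = \mathbf{P}_1\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}, \qquad \operatorname{diag}(\mathbf{f}_{lo})\,\mathcal{H} = \mathbf{P}_2\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$$

Now, taking the row-sum of both sides of the equations (i.e. multiplying by $\mathbf{1}^T$), we see that the left sides become the Walsh coefficients of the functions $f_{up}$ and $f_{lo}$, which are semi-bent according to Lemma 9. Also noting that $\mathbf{1}^T\mathbf{P}\mathbf{M} = \mathbf{1}^T\mathbf{M}$ for any arbitrary matrix $\mathbf{M}$ and any permutation matrix $\mathbf{P}$, we have:

$$(\mathcal{H}\mathbf{f}_{up})^T = \mathbf{1}^T\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix} \quad \text{and} \quad (\mathcal{H}\mathbf{f}_{lo})^T = \mathbf{1}^T\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$$

Since the Walsh coefficients $\mathcal{H}\mathbf{f}_{up}$ and $\mathcal{H}\mathbf{f}_{lo}$ of these $(n-1)$-variable semi-bent functions have $2^{n-2}$ zeros due to Lemma 10, the proof of the Lemma is complete. □

Proof of Theorem 4. From Theorem 2, we know that for a given BF $f$, once the $\mathbf{U}$ and $\mathbf{G}$ matrices are constructed, any $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0}$ assignment produces a polynomial threshold function representation of $f$ with the weights given below (replicated from Eq. 5):

$$\mathbf{a}^T = [\,\mathbf{a}_{up}^T \;\; \mathbf{a}_{lo}^T\,] = [\,(\boldsymbol{\alpha} + \boldsymbol{\alpha}')\mathbf{U} + (-\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G} \;\;\; (\boldsymbol{\alpha} - \boldsymbol{\alpha}')\mathbf{U} + (\boldsymbol{\beta} + \boldsymbol{\beta}')\mathbf{G}\,], \qquad \boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}' > \mathbf{0} \qquad \text{(Eq. 20)}$$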

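To show that at least one fourth of the elements of $\mathbf{a}$ can always be zeroed, Theorem 3 creates zeros in either the $\mathbf{a}_{up}$ or the $\mathbf{a}_{lo}$ part of the solution by finding a full rank square submatrix (by row echelon reduction) in $\mathbf{U}$ or $\mathbf{G}$, and uses its inverse to eliminate some of the terms appearing in $\mathbf{a}_{up}$ or $\mathbf{a}_{lo}$. In the case of bent functions, we are lucky: we neither need to apply echelon reduction to detect the elements to be zeroed, nor need to take an inverse. By using Lemma 11 we can choose $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}'$ in a straightforward manner to readily get $2^{n-2}$ zeros in the coefficient vector $\mathbf{a}$. To be concrete, let $\boldsymbol{\alpha} = \boldsymbol{\alpha}' = 0.5 \times \mathbf{1}^T$ and $\boldsymbol{\beta} = 0.5 \times \mathbf{1}^T$, $\boldsymbol{\beta}' = 1.5 \times \mathbf{1}^T$, so that $\mathbf{a}_{up}^T = \mathbf{1}^T\begin{bmatrix} \mathbf{U} \\ \mathbf{G} \end{bmatrix}$, which has $2^{n-2}$ zeros according to Lemma 11. Another alternative is to let $\boldsymbol{\alpha} = \boldsymbol{\alpha}' = 0.5 \times \mathbf{1}^T$ and $\boldsymbol{\beta} = 1.5 \times \mathbf{1}^T$, $\boldsymbol{\beta}' = 0.5 \times \mathbf{1}^T$, in which case we would have $\mathbf{a}_{up}^T = \mathbf{1}^T\begin{bmatrix} \mathbf{U} \\ -\mathbf{G} \end{bmatrix}$, which again would have $2^{n-2}$ zeros. Thus, due to Lemma 11, in either case we are guaranteed to have at least $2^{n-2}$ zero coefficients in the polynomial threshold representation of $f$. So, with the aforementioned choice of $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}'$ we obtain the same density bound that is given in Theorem 2 for general BFs. However, now we can greatly improve the weight upper bound with this choice of parameters. To see this, plug the proposed choices of $\boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\beta}, \boldsymbol{\beta}'$ into Eq. 20 to get:

$$\mathbf{a}^T = [\,\mathbf{1}^T\mathbf{U} \pm \mathbf{1}^T\mathbf{G} \;\;\; 2 \times \mathbf{1}^T\mathbf{G}\,] \qquad \text{(Eq. 21)}$$

Since $\mathbf{U}$ and $\mathbf{G}$ have $\pm 1$ entries and at most $2^{n-1}$ rows in total, this immediately tells us that $\lceil\mathbf{a}\rceil \le 2^n$. □

Corollary to Theorem 4. The PTF weight of n-variable bent functions with PTF representations with at most $0.75 \times 2^n$ monomials is bounded above by $0.75 \times 2^{2n}$:

$$W[D \le 0.75 \times 2^n] \le 0.75 \times 2^{2n} \qquad \text{(Eq. 22)}$$

Proof. Each coefficient is upper bounded by $2^n$ and we have at most $0.75 \times 2^n$ of those coefficients, thus the result follows. □

PTF density and weight of sparse Boolean functions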

In this section, we address PTF representations of BFs that are constant except for some small number of variable assignments. Let's define this concretely.

Definition. A BF $f$ is called m-sparse if $m = \min(|f^{-1}(1)|, |f^{-1}(-1)|) \ll 2^n$.

An upper bound on the PTF degree of m-sparse BFs was obtained as $\lfloor\log m\rfloor + 1$ in a recent report [10]; however, to our knowledge there is no PTF density bound better than the one known for general BFs, i.e. $0.75 \times 2^n$ [16]. Here we show that an m-sparse BF can always be represented with $m + 2^{n-1}$ monomials for small $m$. We state this as a theorem.

Theorem 5. Let $f$ be an m-sparse BF with $m \le 2^{n-1}$; then $f$ has a PTF representation with length $\le 2^{n-1} + \min(m, 2^{n-2})$.

Proof. To prove this claim, we can start from a constant BF $f$, say $f = 1$, and assess the effect of transforming $f$ into an m-sparse function on the $\mathbf{U}$ and $\mathbf{G}$ matrices produced through the application of Theorem 2. Note that when $m \ge 2^{n-2}$ the theorem statement is already satisfied by Theorem 2. So, let's consider the case $m < 2^{n-2}$, recall the sign-representation condition, and write $\mathbf{f}$ as partitioned into upper and lower halves:

$$\operatorname{diag}\!\left(\begin{bmatrix} \mathbf{f}_{up} \\ \mathbf{f}_{lo} \end{bmatrix}\right)\begin{bmatrix} \mathcal{H} & \mathcal{H} \\ \mathcal{H} & -\mathcal{H} \end{bmatrix}\begin{bmatrix} \mathbf{a}_{up} \\ \mathbf{a}_{lo} \end{bmatrix} > \mathbf{0}$$

When we consider the constant function $f = 1$ [$f = -1$], $\mathbf{f}_{up} = \mathbf{f}_{lo} = \mathbf{1}$ [$\mathbf{f}_{up} = \mathbf{f}_{lo} = -\mathbf{1}$], and thus according to the construction in Theorem 2, we will have $\mathbf{U} = \mathcal{H}$ [$\mathbf{U} = -\mathcal{H}$] and $\mathbf{G} = [\;]$, the empty matrix. Now to convert this constant function to an m-sparse function, one needs to pick m components of $\mathbf{f}$ and negate them. There are $\binom{2^n}{m}$ possibilities, each of which generates certain $\mathbf{U}$ and $\mathbf{G}$ matrices. For our purposes, only their sizes are relevant, as according to Theorem 2, the number of zeros obtained in the sign representation is at least $\max(𝓊, 2^{n-1} - 𝓊)$. Let's call this quantity $z$. Remember that $𝓊$ is equal to the number of matches between the upper and the lower half of $\mathbf{f}$, i.e. $𝓊 = \#\{i \mid f_i = f_{i+2^{n-1}}\}$, and is equal to $2^{n-1}$ when we start with the constant $f$. Thus, we can imagine an adversarial negation strategy to change $\mathbf{f}$ into an m-sparse vector so as to minimize $z$, thereby minimizing the number of zeros in the final solution. Since $z \ge 𝓊$, what the adversarial choice can do is to reduce $𝓊$ from $2^{n-1}$ as much as possible without increasing $2^{n-1} - 𝓊$ beyond $𝓊$. The latter case is not possible since $m < 2^{n-2}$, so the adversarial choice does not need to worry about it. Thus, the adversarial choice minimizes $z$ by creating $m$ upper-lower mismatches (i.e. applying m negations such that $m = \#\{i \mid f_i \ne f_{i+2^{n-1}}\}$). This strategy yields $z = 2^{n-1} - m$ zeros, which is the minimum possible for m negations. Thus, we conclude that $z \ge 2^{n-1} - m$ for any m-sparse function. Therefore, for any m-sparse function it is possible to construct a PTF representation with $2^{n-1} + m$ or a smaller number of monomials. □

Theorem 6. Let $f$ be an m-sparse BF with $m \le 2^{n-1}$; then $f$ has an integer-coefficient PTF representation with length $\le 2^{n-1} + \min(m, 2^{n-2})$ and with coefficient magnitudes $\le 2^{\,n + 0.5m\log m - m + 1.5\log m + 2}$.

Proof.

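Note that in our m-sparse construction $𝓊 \ge ℊ$, where $𝓊 = \#\mathbf{U}$ and $ℊ = \#\mathbf{G}$. From Lemma 7 and Lemma 8 (Eq. 12 and Eq. 15) we know that for a general BF (with $𝓊 \ge ℊ$), an upper bound on the absolute value of the PTF weights is given by

$$\lceil\mathbf{a}\rceil \le \max\{𝓊ℊ + 𝓊,\; 2^{n-1}\}\,\lceil\mathbf{k}\rceil \quad \text{where} \quad \lceil\mathbf{k}\rceil \le 2^{-ℊ+2}\,ℊ(ℊ - 1)^{(ℊ-1)/2}$$

So, to obtain a weight upper bound for the m-sparse function case, what remains to be done is to plug in the upper bounds of $𝓊ℊ + 𝓊$ and $ℊ$ for m-sparse functions, which are attained at $𝓊 = 2^{n-1} - m$ and $ℊ = m$. So, we have:

(i) $\lceil\mathbf{a}\rceil \le (𝓊ℊ + 𝓊)\lceil\mathbf{k}\rceil = (2^{n-1} - m)(m + 1)\lceil\mathbf{k}\rceil$

(ii) $\lceil\mathbf{k}\rceil \le 2^{\,2-m}\, m^{\,0.5m + 0.5}$

where to get (ii) we substituted $m$ instead of $m - 1$ for $ℊ - 1$ in the base of the power expression. Now, plugging (ii) into (i) and taking the log of both sides we have

$$\log\lceil\mathbf{a}\rceil \le \log(2^{n-1} - m) + \log(m + 1) + (2 - m) + (0.5m + 0.5)\log m$$

For the sake of a compact expression, we remove $-m$ from the argument of the first log, and replace $m + 1$ with $2m$ in the argument of the second log to obtain:

$$\log\lceil\mathbf{a}\rceil \le n + 0.5m\log m - m + 1.5\log m + 2$$

Raising 2 to the power of both sides we get $\lceil\mathbf{a}\rceil \le 2^{\,n + 0.5m\log m - m + 1.5\log m + 2}$. □

Corollary to Theorem 6. For any n-variable m-sparse Boolean function with $m \le 2^{n-2}$ there exists a PTF representation with length at most $m + 2^{n-1}$ and PTF weight bounded by $(m + 2^{n-1})\,2^{\,n + 0.5m\log m - m + 1.5\log m + 2}$. In other words, we have:

$$W[D \le m + 2^{n-1}] \le (m + 2^{n-1})\,2^{\,n + 0.5m\log m - m + 1.5\log m + 2}$$

Proof. Multiply the upper bound found for $\lceil\mathbf{a}\rceil$ by $m + 2^{n-1}$, as it is an upper bound on the number of non-zero coefficients in $\mathbf{a}$. □

Conclusion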

In an earlier report we have shown that any BF can be represented as a PTF with at most $0.75 \times 2^n$ monomials (i.e. nonzero weight input lines) [16]. However, no upper bound on the absolute value of the coefficients was known when the coefficients were constrained to integers. In this report, we fill this gap by establishing an upper bound on the PTF weight of general Boolean functions when they are represented with at most $0.75 \times 2^n$ monomials. In addition, we study m-sparse and bent BFs. For the former we obtain new density and weight bounds, which indicate that sparse BFs with low $m$ can be represented with low density and low weight. For the bent functions, we show that they admit a surprisingly low weight PTF representation when they are represented with $0.75 \times 2^n$ monomials. When the PTF representation of a bent function with its Walsh coefficients is considered, the number of monomials involved is $2^n$ and the absolute values of the coefficients are $2^{n/2}$. When we wish to sign-represent a bent function without using $0.25 \times 2^n$ of the monomials (which of them depends on the function), the coefficient magnitudes increase, but merely become at most $2^n$. We think that this bound is tight; however, the bound established for general BFs is far from being tight. Thus, future work is needed to tighten the weight bound for general BFs.

Acknowledgements

Support for this work is provided by the International Joint Research Promotion Program, Osaka University under the project “Developmentally and biologically realistic modeling of perspective invariant action understanding”.

References

[1] Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, PDP Research Group, editors. Parallel Distributed Processing. 1986. p. 151-93.
[2] Giles CL, Maxwell T. Learning, invariance, and generalization in high-order neural networks. Applied Optics. 1987;26:4972-8.
[3] Schmitt M. On the capabilities of higher-order neurons: a radial basis function approach. Neural Computation. 2005;17:715-29.
[4] Schmitt M. On the complexity of computing and learning with multiplicative neural networks. Neural Computation. 2002;14:241-301.
[5] Abbott LF, Arian Y. Storage capacity of generalized networks. Physical Review A. 1987;36:5091-4.
[6] Guler M. A model with an intrinsic property of learning higher order correlations. Neural Networks. 2001;14:495-504.
[7] Fahner G, Eckmiller R. Structural adaptation of parsimonious higher-order neural classifiers. Neural Networks. 1994;7:279-89.
[8] Guler M, Sahin E. A binary-input supervised neural unit that forms input dependent higher-order synaptic correlations. World Congress on Neural Networks, III. San Diego; 1994. p. 730-5.
[9] Sezener CE, Oztop E. Minimal Sign Representation of Boolean Functions: Algorithms and Exact Results for Low Dimensions. Neural Computation. 2015;27:1796-823.
[10] O'Donnell R, Servedio R. Extremal properties of polynomial threshold functions. Eighteenth Annual Conference on Computational Complexity; 2003. p. 3-12.
[11] Gotsman C. On boolean functions, polynomials and algebraic threshold functions. Technical Report TR-89-18: Department of Computer Science, Hebrew University; 1989.
[12] Gotsman C, Linial N. Spectral properties of threshold functions. Combinatorica. 1994;14:35-50.
[13] Bruck J. Harmonic Analysis of Polynomial Threshold Functions. SIAM Journal on Discrete Mathematics. 1990;3:168-77.
[14] Roychowdhury V, Siu KY, Orlitsky A, Kailath T. Vector Analysis of Threshold Functions. Information and Computation. 1995;120:22-31.
[15] Saks ME. Slicing the Hypercube. In: Walker K, editor. London Mathematical Society Lecture Note Series 187: Surveys in Combinatorics: Cambridge University Press; 1993. p. 211-55.
[16] Oztop E. An Upper Bound on the Minimum Number of Monomials Required to Separate Dichotomies of {-1, 1}^n