Practical graph signal sampling with log-linear size scaling
Ajinkya Jayawant∗ ([email protected]), Antonio Ortega ([email protected])
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California
3740 McClintock Ave, Los Angeles, CA 90089, USA
∗ Corresponding author
Abstract—Graph signal sampling is the problem of selecting a subset of representative graph vertices whose values can be used to interpolate missing values on the remaining graph vertices. Optimizing the choice of sampling set can help minimize the effect of noise in the input signal. While many existing sampling set selection methods are computationally intensive because they require an eigendecomposition, existing eigendecomposition-free methods are still much slower than random sampling algorithms for large graphs. In this paper, we propose a sampling algorithm that can achieve speeds similar to random sampling, while reaching accuracy similar to existing eigendecomposition-free methods for a broad range of graph types.
Keywords—Graph, signal, sampling, D-optimal, volume, coherence.
I. INTRODUCTION
Graphs are a convenient way to represent and analyze data having irregular relationships between data points [1] and can be useful in a variety of different scenarios, such as characterizing the Web [2], semi-supervised learning [3], community detection [4], or traffic analysis [5]. We call graph signal the data associated with the nodes of a graph. Similar to traditional signals, a smooth graph signal can be sampled by making observations on a few nodes of the graph, so that the signal at the remaining (non-observed) nodes can be estimated [6], [7], [8]. For this we need to choose a set of vertices, S, called the sampling set, on which we observe the signal values in order to predict the signal values on the other vertices (the complement of S, S^c). In the presence of noise, some sampling sets lead to better signal reconstructions than others: the goal of sampling set selection is to find the best such sampling set. However, unlike traditional signals, the structure of graph signals is not fixed. Thus, for unknown graphs, the best sampling set is also unknown. The concept of sampling signals usually relies on the assumption that the underlying signal is smooth, which is a reasonable expectation in a variety of scenarios, such as sensor networks modelling temperature distribution, graph signals representing labels in semi-supervised learning, or preferences in social networks. This makes it possible for us to reconstruct them by knowing a few signal values [9].

A common model for smooth graph signals assumes that most of their energy lies in a specific set of eigenvectors of the graph Laplacian or other graph operator [1]. Thus, the problem of selecting the best sampling set naturally translates to the problem of selecting a subset of the eigenvector matrix of the graph Laplacian [7]. Specifically, the problem reduces to a row/column subset selection similar to the linear measurement sensor selection problem [10]. One way to look at the matrix subset selection is as a design of experiments problem, for which there exists a rich literature [11]. In the graph signal sampling context, several papers leverage this knowledge to propose novel algorithms [12], [6], [13], [14], [15], [16]. We refer the reader to [8] for a recent comprehensive review of the literature on this topic.

However, to solve the graph sampling set selection problem, row/column selection needs to be applied to the matrix of eigenvectors of the graph Laplacian (or those of some other suitable graph operator). The corresponding eigendecomposition is an O(n³) operation for an n × n matrix. (In practice only the first f eigenvectors are needed if the signal is bandlimited to the first f frequencies, but even this can be a complex problem, e.g., a signal bandlimited to the top 10% frequencies of a graph with millions of nodes. For simplicity we describe these as full decomposition methods, even though in practice only a subset of eigenvectors is needed.) This makes it impractical for large graphs in machine learning, social networks, and other applications, for which the cost of eigendecomposition would be prohibitive. Thus, methods that solve this subset selection problem without explicitly requiring eigendecomposition are valuable.

For the purpose of this discussion, we classify sampling set selection methods into two main types of approaches, based on whether or not they require eigendecomposition. Some methods compute the full eigendecomposition [12], [6], [13], or instead require a sequential eigendecomposition, where one eigenvector is computed at each step [7]. Alternatively, eigendecomposition-free methods do not make use of an eigendecomposition of the Laplacian matrix [17], [16], [14], [18] and are usually faster. Weighted Random Sampling (WRS) [17] is the fastest method, but provides only guarantees on average performance, which means that it may exhibit poor reconstruction accuracy for specific instances.
It also needs more samples to match the reconstruction accuracy of other eigendecomposition-free methods. Among the eigendecomposition-free methods discussed in [8], Neumann series based sampling [16] has a higher computational complexity, Binary Search with Gershgorin Disc Alignment (BS-GDA) [19] has low computational complexity for smaller graphs but cannot compete with WRS for large graphs, and Localization operator based Sampling Set Selection (LSSS) [18] achieves good performance but requires some parameter tuning to achieve optimal performance. Our proposed method can overcome these limitations: similar to [16], [19], [18] it is eigendecomposition-free, but it has complexity closer to WRS, while requiring fewer parameters to tune than WRS.

To motivate our methods, consider first WRS, where vertices are sampled with a probability proportional to their squared local coherence [17]. However, selecting vertices having the highest coherence may not result in the best sampling set, because some vertices may be "redundant" (e.g., if they are close to each other on the graph). Other sampling algorithms [18] improve performance by selecting vertices based on importance but avoid the redundancy by minimizing a notion of overlapped area between functions centered on the sampled vertices.

In our preliminary work [20], we proposed the Distance-Coherence (DC) algorithm, which mitigates the effect of redundancy between vertices by adding new vertices to the sampling set only if they are at a sufficient distance on the graph from the previously selected nodes. While this can eliminate redundancy, it has a negative impact on computation cost, since distance computation is expensive. As an alternative, in this paper we propose a novel Approximate Volume Maximization (AVM) algorithm that replaces the distance computation with a filtering operation. Loosely speaking, our proposed scheme precomputes squared coherences, as in [17], with an additional criterion to estimate overlap between vertices. This additional criterion results in computations that are proportional to the graph size, and as a consequence AVM can be shown theoretically to require only a linear increase in complexity as compared to WRS. AVM can also be viewed as an efficient approximation to the D-optimality criterion [11]. In this paper we review the main concepts in DC and introduce AVM, showing that these methods can improve upon existing algorithms in various ways. Our main contributions are:
1) We describe our distance-based sampling DC algorithm (Section III) and use its vertex domain interpretation as a stepping stone to understand the problem and develop our improved AVM algorithm.
2) We introduce a new algorithm, AVM (Algorithm 2), which can be used for any graph size or topology while requiring few parameters to tune. Moreover, the accuracy of the reconstruction is a monotonic function of those parameters.
This eliminates the need to search for the right parameter, as we only evolve a parameter unidirectionally for a better reconstruction.
3) Using the framework of volume based sampling (Section III), we interpret a series of algorithms (exact greedy [13], WRS, Spectral Proxies (SP) [7], LSSS, DC, and our proposed AVM) as variations of the volume maximization problem formulation (Section IV), and explain critical differences between existing methods and AVM.
4) AVM provides competitive reconstruction performance on a variety of graphs and sampling scenarios, improving reconstruction SNR over WRS by at least 0.6 dB and frequently significantly more (e.g., 2 dB); see Section V. The practicality of AVM is apparent for larger graph sizes (e.g., of the order of several thousand nodes), where it can be at least 10 times faster than state-of-the-art algorithms such as SP, LSSS and BS-GDA, while sacrificing less than 0.1 dB of reconstruction SNR; see Section VI.

In summary, our proposed AVM sampling can achieve speeds comparable to WRS, with significantly better reconstruction accuracy, without requiring any prior knowledge of the signal bandwidth, and can be used for different graphs while requiring a few easy-to-tune parameters.

II. PROBLEM SETUP
A. Notation
In this paper, we represent sets using calligraphic uppercase, e.g., X; vectors using bold lowercase, x; matrices using bold uppercase, A; and scalars using plain uppercase or lowercase, as X or x. Table I lists additional notation.

A graph is defined as the pair (V, E), where V is the set of nodes or vertices and E is the set of edges [21]. The set of edges E is a subset of the set of unordered pairs of elements of V. A graph signal is a real-valued function defined on the vertices of the graph, f : V → R. We index the vertices v ∈ V with the set {1, · · · , n} and define w_ij as the weight of the edge between vertices i and j. The (i, j)th entry of the adjacency matrix A of the graph is w_ij, with w_ii = 0. The degree matrix D of a graph is a diagonal matrix with diagonal entries d_ii = Σ_j w_ij. In this paper we consider weighted undirected graphs, without self loops and with non-negative edge weights.

The combinatorial Laplacian of the graph is given by L = D − A, with its corresponding eigendecomposition defined as L = U Σ U^T, since the Laplacian matrix is symmetric and positive semidefinite. The eigenvalues of the Laplacian matrix are Σ = diag(λ_1, · · · , λ_n), with λ_1 ≤ · · · ≤ λ_n representing the frequencies. The column vectors of U provide a frequency representation for graph signals, so that the operator U^T is usually called the graph Fourier transform (GFT). The eigenvectors u_i of L associated with larger eigenvalues λ_i correspond to higher frequencies, and the ones associated with lower eigenvalues correspond to lower frequencies [1].

The sampling set S is defined as a subset of V where the values of the graph signal f are known, leading to a vector of known values f_S. The problem we consider here is that of finding the set S such that the error in interpolating f_{S^c} from f_S is minimized. Here, different error metrics are possible, and the actual error depends on assumptions made about the signal. When comparing algorithms we assume they all operate with the same sampling set size, s. For the sake of convenience, without loss of generality, for a given algorithm the vertices are relabeled after sampling, so that their labels correspond to the order in which they were chosen, S = {1, 2, · · · }.

For reconstruction, we will often work with sub-matrices of U corresponding to different frequency or vertex localizations. The cardinality of the set of frequencies, |F|, is the bandwidth of the signal, whereas the set F is the bandwidth support. Letting F be the set {1, · · · , f}, where f = |F|, the matrix constructed by selecting the first f columns of U will be denoted U_VF, or simply U_F. The matrix constructed by further selecting the rows of U_F indexed by S (corresponding to selected nodes) will be written U_SF.
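As a quick illustration of this notation, the following Python sketch (an arbitrary toy graph, not one from the paper) builds L = D − A, computes its eigendecomposition, and applies the GFT:

```python
import numpy as np

# Adjacency matrix of a small weighted, undirected graph (w_ii = 0).
A = np.array([[0., 1., 0., 2.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [2., 0., 1., 0.]])

D = np.diag(A.sum(axis=1))   # degree matrix, d_ii = sum_j w_ij
L = D - A                    # combinatorial Laplacian

# L = U Sigma U^T; eigh returns eigenvalues in ascending order, so the
# columns of U go from the lowest to the highest graph frequency.
lam, U = np.linalg.eigh(L)

f = np.array([1.0, 0.8, 0.9, 1.1])  # a graph signal, one value per vertex
f_tilde = U.T @ f                   # graph Fourier transform (GFT)
assert np.allclose(U @ f_tilde, f)  # inverse GFT recovers the signal
```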
TABLE I: LINEAR ALGEBRA NOTATION IN THIS PAPER

Notation | Description
X_i | X after iteration i
|X| | Cardinality of the set X
A_XY or A_{X,Y} | Submatrix of A indexed by sets X and Y
A_ij | (i, j)th element of A
A_X | A_{:,X}, selection of the columns of A
A_i | A after iteration i
x_i or x(i) | ith element of the vector x
x_X or x(X) | Subset of the vector x corresponding to the indices X
x_v | Vector corresponding to a vertex v among a sequence of vectors indexed over the set of vertices V
‖·‖ | Two/Euclidean norm of a matrix or vector
|x| | Entry-wise absolute value of the scalar x or the vector x

B. Problem formulation
We focus on sampling bandlimited signals x that can be written as

x = U_F x̃_F

and selecting a sampling set that satisfies the following two conditions: i) the number of samples is greater than the bandwidth, |S| ≥ f, and ii) the sampling set S is a uniqueness set [9] corresponding to the bandwidth support F. Under these conditions, given the observed samples x_S, the least squares reconstruction given by

x̂ = U_F (U_SF^T U_SF)^{-1} U_SF^T x_S    (1)

allows us to recover x exactly. In this paper we consider the widely studied scenario of bandlimited signals with added noise, and choose sampling rates that satisfy Condition i) for the underlying noise-free signal. (We do not consider cases where signals are not bandlimited but can still be sampled and reconstructed; refer to [8] and references therein. Exploring more general models for signal sampling is left for future work.) While Condition ii) is difficult to verify without computing the eigendecomposition of the Laplacian, it is likely to be satisfied if Condition i) holds. Indeed, for most graphs, except those that are either disconnected or have some symmetries (e.g., unweighted path or grid graphs), any set such that |S| ≥ f is a uniqueness set. Thus, similar to most practical sampling methods [7], [17], [18], [19], our sampling algorithms are not designed to return uniqueness sets satisfying Condition ii), thus providing exact recovery; instead we just assume that Condition i) is sufficient to guarantee exact recovery.

In practice signals are never exactly bandlimited, and we may sample a signal f = x + n, where x is bandlimited and n is a noise vector. Then, given the noise affecting the observed samples, n_S, the error in the reconstructed signal is:

f̂ − x = U_F (U_SF^T U_SF)^{-1} U_SF^T n_S.

The expected value of the corresponding error matrix is

E[(f̂ − x)(f̂ − x)^T] = U_F (U_SF^T U_SF)^{-1} U_SF^T E[n_S n_S^T] U_SF (U_SF^T U_SF)^{-1} U_F^T.    (2)

If we assume individual noise entries to be independent with zero mean and equal variance, sampling set selection can be framed as optimizing a function f(·) of (2), leading to an objective function of the form:

S = argmax_{S⊂V, |S|=s} f( U_F (U_SF^T U_SF)^{-1} U_F^T ).    (3)

Note that the set S achieving optimality under general criteria in the form of (3) is a function of F, so that S is optimized for reconstruction with that particular bandwidth support F. While typically we do not know the bandwidth of the original signal, in what follows we assume that a specific bandwidth for reconstructing the signal has been given.

In this paper we choose the D-optimality criterion from design of experiments [11], where the goal is to find:

S = argmax_{S⊂V, |S|=s} det(U_SF^T U_SF).    (4)

Maximizing the determinant leads to minimizing the confidence interval of the solution [11], as will be seen in Appendix B. As a further advantage, the D-optimal objective leads to a novel unified view of different types of sampling algorithms proposed in the literature (see Section IV-C). Moreover, for our goal of an eigendecomposition-free subset selection, D-optimality has favorable properties that lead to fast algorithms.

Sampling algorithms are designed to implicitly or explicitly optimize the sampling set for a particular bandwidth support. In this paper, we denote by R the bandwidth support assumed by a sampling algorithm, which can be equal to the reconstruction bandwidth support F, for which the objective (4) can be rewritten as:

S = argmax_{S⊂V, |S|=s} det(U_SR^T U_SR), with R = F.    (5)

However, there are advantages to choosing a different R for optimization than F. For example, if we consider R = {1, · · · , s}, so that |R| = |S|, we can rewrite the objective function (5) without changing its value, by permuting the order of the matrices:

S = argmax_{S⊂V, |S|=s} det(U_SR U_SR^T).    (6)

As we will see, this new form is easier to interpret and use. However, since choosing |R| = |S| is required, it raises concerns about the optimality of our sampling set for the original objective function. This issue will be discussed in Appendix B.
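As a minimal illustration on a synthetic stand-in graph, the following sketch evaluates the D-optimality objective (4) by exhaustive search and numerically confirms that the two determinant forms in (5) and (6) agree when |R| = |S|:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, f, s = 8, 3, 3

# A random weighted graph standing in for the ones in our experiments.
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)

F = np.arange(f)  # reconstruction bandwidth support F = {1, ..., f}
best_set, best_det = None, -np.inf
for S in combinations(range(n), s):
    U_SF = U[np.ix_(S, F)]
    val = np.linalg.det(U_SF.T @ U_SF)  # D-optimality criterion (4)
    if val > best_det:
        best_set, best_det = S, val

# With |R| = |S|, U_SR is square, so det(U_SR^T U_SR) = det(U_SR U_SR^T),
# which is the rewriting used in (6).
U_SR = U[np.ix_(best_set, np.arange(s))]
assert np.isclose(np.linalg.det(U_SR.T @ U_SR),
                  np.linalg.det(U_SR @ U_SR.T))
```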
C. Solving D-optimal objectives

D-optimal subsets of matrices are determinant maximizing subsets. The determinant measures volume, and selecting a maximum volume submatrix is an NP-hard problem [22]. Nearly optimal methods have been proposed in the literature [23], [24], but these are based on selecting a subset of rows or columns of given input matrices. Similarly, in the graph signal processing literature, several contributions [13], [15] develop algorithms for D-optimal selection assuming that U is available. In contrast, the main novelty of our work is to develop greedy algorithms for approximate D-optimality, i.e., solving (4) without requiring explicit eigendecomposition to obtain U. This is made possible by specific characteristics of our problem, to be studied next.

Among graph signal sampling approaches that solve the D-optimal objective, the closest to our work is the application of Wilson's algorithm for Determinantal Point Processes (WDPP) of [14], which similarly does not require explicitly computing U. However, our proposed technique, AVM, achieves this goal in a different way and leads to better performance. Specifically, WDPP avoids eigendecomposition while approximating the bandlimited kernel using Wilson's marginal kernel [14] upfront. This is a one-time approximation, which does not have to be updated each time nodes are added to the sampling set. This approach relies on a relation between Wilson's marginal kernel and random walks on the graph, leading to a probability of choosing sampling sets that is proportional to the determinant [14]. In contrast, AVM solves an approximate optimization at each iteration, i.e., each time a new vertex is added to the existing sampling set. Thus, AVM optimizes the cost function (4) at every iteration, as opposed to WDPP, which aims to achieve the expected value of the cost function.

The WDPP and AVM algorithms differ in their performance as well. AVM is a greedy algorithm, and the performance of greedy determinant maximization algorithms is known to lie within a factor of the maximum determinant [22]. In contrast, WDPP samples with probabilities proportional to the determinants, so that its average performance depends on the distribution of the determinants. In fact, for certain graph types in [14], we observe that WDPP has worse average performance than WRS. In comparison, in our experiments, for a wide variety of graph topologies and sizes, AVM consistently outperforms WRS [17] in terms of average reconstruction error.
III. PROPOSED ALGORITHMS

In what follows we assume that the conditions for equivalence between the two objective function forms (5) and (6) are verified, so that we focus on solving (6).
A. Incremental subset selection
The bandwidth support for the purpose of sampling is assumed to be R = {1, · · · , s}. Let us start by defining d_v, the low pass filtered Kronecker delta δ_v at vertex v:

d_v = U_R U_R^T δ_v.    (7)

With this definition, the objective in (6) can be written as:

det(U_SR U_SR^T) = det(U_SR U_R^T U_R U_SR^T)
= det(I_S^T U_R U_R^T U_R U_R^T I_S)
= det( [d_1 · · · d_s]^T [d_1 · · · d_s] )
= Vol²(d_1, · · · , d_s).    (8)

Thus, maximizing the determinant det(U_SR U_SR^T) is equivalent to maximizing Vol²(d_1, · · · , d_s), and as a consequence the set maximizing (8) also maximizes (6).

In an iterative algorithm where the goal is to select s samples, consider a point where m < s samples have been selected and we have to choose the next sample from among the remaining vertices. Throughout the rest of the paper, we denote the sampling set at the end of the mth iteration of an algorithm by S_m. Given the first m chosen samples, we define the matrix D_m = [d_1 · · · d_m] and the space spanned by its columns, 𝒟_m = span(d_1, · · · , d_m); D_m and 𝒟_m always refer to the state at the end of the mth iteration. Note that both D_m and 𝒟_m are a function of the choice of the sampling bandwidth support R. Next, the best column d_v to be added to D_m should maximize:

det( [D_m d_v]^T [D_m d_v] )    (9a)
= det( [ D_m^T D_m , D_m^T d_v ; d_v^T D_m , d_v^T d_v ] )    (9b)
= det(D_m^T D_m) det(d_v^T d_v − d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v)    (9c)
= det(D_m^T D_m) ( ‖d_v‖² − d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v )    (9d)
= det(D_m^T D_m) ( ‖d_v‖² − ‖P_{𝒟_m} d_v‖² ).    (9e)

The effect on the determinant of adding a column to D_m can be represented according to a multiplicative update (Section 11.2 [11]) in our D-optimal design. (9c) follows from [25] (Section 0.8.5 in the second edition), while (9e) follows because P_{𝒟_m} = D_m (D_m^T D_m)^{-1} D_m^T is a projection onto the space 𝒟_m. Direct greedy determinant maximization requires selecting a vertex that maximizes the update term in (9d):

v* = argmax_{v ∈ S_m^c} ( ‖d_v‖² − d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v )    (10)

over all possible vertices v ∈ S_m^c, which requires the expensive computation of (D_m^T D_m)^{-1}.

The first step towards a greedy incremental vertex selection is estimating the two components, ‖d_v‖² and d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v, of the multiplicative update. The first term, ‖d_v‖², is the squared coherence introduced in [17], which is estimated here using the same techniques as in [17]. For the second term, the projection interpretation of (9e) will be useful to develop approximations to reduce complexity.
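For reference, a direct Python sketch of the exact greedy update (10); it assumes access to U_R, which is precisely the expensive requirement our later approximations remove:

```python
import numpy as np

def greedy_volume_sampling(U_R, s):
    """Direct greedy D-optimal selection via the update (10).

    U_R: n x |R| matrix of the first |R| eigenvectors (assumed known
    here; AVM's whole point is to avoid forming it explicitly).
    """
    n = U_R.shape[0]
    d = U_R @ U_R.T              # column v is the filtered delta d_v (7)
    S = []
    for _ in range(s):
        best_v, best_gain = None, -np.inf
        for v in range(n):
            if v in S:
                continue
            dv = d[:, v]
            gain = dv @ dv       # ||d_v||^2, the squared coherence
            if S:
                Dm = d[:, S]     # D_m = [d_1 ... d_m]
                # Subtract ||P_{D_m} d_v||^2 as in (9e)/(10); assumes
                # D_m^T D_m is invertible, which holds generically.
                gain -= dv @ (Dm @ np.linalg.solve(Dm.T @ Dm, Dm.T @ dv))
            if gain > best_gain:
                best_v, best_gain = v, gain
        S.append(best_v)
    return S
```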
Additionally, we will make use of the following property of our bandlimited space to develop an approximation.

Lemma 1. The space of bandlimited signals span(U_R), equipped with the dot product, is a reproducing kernel Hilbert space (RKHS).

Proof.
Defining the inner product for signals f, g ∈ span(U_R) as ⟨f, g⟩ = Σ_i f_i g_i, span(U_R) is a Hilbert space. A Hilbert space further needs an existing reproducing kernel to be an RKHS. Towards that end, consider a mapping to our bandlimited space, φ : R^n → span(U_R), given as:

φ(x) = U_R U_R^T x.

A function K : R^n × R^n → R that uses that mapping and the scalar product in our Hilbert space is:

∀ x, y ∈ R^n, K(x, y) = ⟨φ(x), φ(y)⟩.

Now, using Theorem 4 from [26], K is a reproducing kernel for our Hilbert space, and using Theorem 1 from [26] we conclude that our bandlimited space of signals is an RKHS.
Corollary 1. The dot product of a bandlimited signal f ∈ span(U_R) with a filtered delta d_v is f(v), the entry at node v of the signal f:

⟨f, d_v⟩ = f(v).    (11)

Proof.
The dot product ⟨f, d_v⟩ in our RKHS can be seen as the evaluation functional of f at the point v. Using the definition of the reproducing kernel K, the evaluation functional for a signal f in the bandlimited space at a point x ∈ R^n is ⟨f, φ(x)⟩ (using the Section 2 definition and Theorem 1, Property b, from [26]). This definition provides us the required interpretation for ⟨f, d_v⟩:

⟨f, d_v⟩ = ⟨f, φ(δ_v)⟩.    (12)

An evaluation functional ⟨f, φ(x)⟩ in our bandlimited space can be simplified as:

⟨f, φ(x)⟩ = ⟨f, U_R U_R^T x⟩ = ⟨U_R U_R^T f, x⟩ = ⟨f, x⟩.    (13)

Thus, from (12) and (13):

⟨f, d_v⟩ = ⟨f, φ(δ_v)⟩ = ⟨f, δ_v⟩ = f(v).

As a consequence of (11), if f = d_w we have:

⟨d_w, d_v⟩ = d_v(w) = d_w(v).    (14)
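Both properties (11) and (14) are easy to check numerically on a small random graph (an illustrative setup, not one from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)

U_R = U[:, :3]                      # bandlimited space span(U_R)
P = U_R @ U_R.T                     # ideal low pass filter
f = U_R @ rng.standard_normal(3)    # a random bandlimited signal

for v in range(n):
    d_v = P[:, v]                   # filtered Kronecker delta (7)
    assert np.isclose(f @ d_v, f[v])      # property (11): <f, d_v> = f(v)

d_w, d_v = P[:, 2], P[:, 4]               # w = 2, v = 4
assert np.isclose(d_w @ d_v, d_v[2])      # property (14): <d_w, d_v> = d_v(w)
assert np.isclose(d_w @ d_v, d_w[4])      # ... and = d_w(v)
```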
B. Approximation through distances

We start by proposing a distance based algorithm (DC) based on the updates we derived in (9). While in principle those updates are valid only when s = f, in DC we apply them even when s > f. We assume f is known to us. We take the bandwidth support for the purpose of sampling to be R = {1, · · · , f}, which is the same as the signal reconstruction bandwidth support F. To maximize the expression in (9e), we would like to select nodes that have:
1) a large squared local graph coherence ‖d_v‖² with respect to the f frequencies (the first term in (9e), which is a property of each node and independent of S_m), and
2) a small squared magnitude of the projection onto the subspace 𝒟_m (which does depend on S_m), ‖P_{𝒟_m} d_v‖², which thus would increase (9e).
Algorithm 1 Distance-coherence (DC)

procedure DC(L, s, f)
    S ← ∅; choose ∆ ∈ [0, 1]; R ← {1, · · · , f}
    for all v ∈ V do
        Compute the squared coherence ‖d_v‖²
    end for
    while |S| < s do
        V_d(S) ← {v ∈ S^c | d(S, v) > ∆ · max_u d(S, u)}
        v* ← argmax_{v ∈ V_d(S)} ‖d_v‖²
        S ← S ∪ v*
    end while
    return S
end procedure

The norm ‖d_v‖² = ‖U_R^T δ_v‖² is the squared local graph coherence of that vertex; it varies between 0 and 1, taking the largest values at vertices that are poorly connected to the rest of the graph [17]. On the other hand, the subspace 𝒟_m is spanned by linear combinations of the filtered delta signals corresponding to the vertices in S_m. A filtered delta signal at a vertex v is expected to decay as a function of distance from v. Therefore, for a particular energy ‖d_v‖², a vertex whose overlap is minimum with the filtered delta signals corresponding to the vertices in S_m will have a small ‖P_{𝒟_m} d_v‖². A filtered delta signal d_v for a vertex which is farther apart from the sampled vertices will have less overlap with the filtered delta signals corresponding to the sampled vertices, which span the space 𝒟_m. Therefore, for a vertex v ∈ S_m^c whose "distance" to the vertices in S_m is large, the corresponding filtered delta signal d_v will have a small projection on the space 𝒟_m.

Our proposed algorithm (see Algorithm 1) consists of two stages: it first identifies vertices that are at a sufficiently large distance from the already chosen vertices in S_m. This helps in reducing the set size, by including only those v ∈ V_d(S_m) that are expected to have a small ‖P_{𝒟_m} d_v‖². From among those selected vertices, it chooses the one with the largest value of ‖d_v‖².

The nodes with sufficiently large distance from S_m are defined as follows:

V_d(S_m) = {v ∈ S_m^c | d(S_m, v) > ∆ · max_u d(S_m, u)},

where ∆ ∈ [0, 1], d(S_m, v) = min_{u ∈ S_m} d(u, v), and d is the geodesic distance on the graph. The distance between two adjacent vertices i, j is given by d(i, j) = 1/w(i, j).

The parameter ∆ is used to control how many nodes can be included in V_d(S_m). With a small ∆, more nodes will be considered, at the cost of increased computation; with a large ∆, fewer nodes will be considered, with the benefit of reduced computation. For small ∆, the DC algorithm becomes similar to WRS, except that the vertices are picked in the order of their squared coherence, rather than randomly with probability proportional to their squared coherence as in [17].
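A compact sketch of the DC procedure, assuming the squared coherences have already been estimated (e.g., by the random projection method of [17]); the function and argument names are ours, for illustration:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def dc_sampling(W, coh2, s, delta):
    """Sketch of Algorithm 1 (DC). W: dense adjacency matrix;
    coh2[v]: estimated squared coherence ||d_v||^2; delta in [0, 1]."""
    n = W.shape[0]
    # Geodesic distances with edge lengths d(i, j) = 1 / w(i, j);
    # np.inf marks non-edges in scipy's dense graph convention.
    lengths = np.full_like(W, np.inf, dtype=float)
    mask = W > 0
    lengths[mask] = 1.0 / W[mask]
    dist = shortest_path(lengths, method='D', directed=False)

    S = [int(np.argmax(coh2))]        # seed with the highest coherence
    while len(S) < s:
        d_S = dist[:, S].min(axis=1)  # d(S, v) = min_{u in S} d(u, v)
        d_S[S] = -np.inf              # never re-select sampled vertices
        in_Vd = d_S > delta * d_S.max()    # the candidate set V_d(S)
        scores = np.where(in_Vd, coh2, -np.inf)
        S.append(int(np.argmax(scores)))
    return S
```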
Algorithm 2 Approximate volume maximization (AVM)

procedure AVM(L, s)
    S ← ∅; R ← {1, · · · , s}
    for all v ∈ V do
        Compute the squared coherence ‖d_v‖²
    end for
    while |S| < s do
        v* ← argmax_{v ∈ S^c} ‖d_v‖² − Σ_{w ∈ S} d_w(v)²/‖d_w‖²
        S ← S ∪ v*
    end while
    return S
end procedure

C. Approximate volume maximization (AVM) through inner products
Algorithm 1 requires obtaining geodesic distances on the graph, which is a computationally expensive task. In this section we use a more efficient technique, based on filtering, as a substitute for distances between nodes. In contrast to DC, when selecting the samples we assume the bandwidth support is R = {1, · · · , s}, rather than corresponding to the target bandwidth f. However, for the final reconstruction of the signal from its samples, we will still use the target bandwidth, f. This has several advantages:
• It models the real world scenario better, in the sense that we may not know the perfect bandwidth on which to reconstruct our signal;
• We can use the optimization framework we defined in Section III;
• For our chosen set of samples we do not have to limit ourselves to one reconstruction bandwidth.
AVM successively simplifies the greedy volume maximization step (10) in three stages.
1) Approximate squared coherence: Algorithm 2 estimates the squared coherences ‖d_v‖² using the random projections method from Section 4.1 in [17], in the same way as Algorithm 1. This approach avoids explicitly finding d_v to compute ‖d_v‖².
2) Approximate inner product matrix: We know that the volume of the parallelepiped formed by two fixed length vectors is maximized when the vectors are orthogonal to each other. Now, since vectors that optimize (10) also approximately maximize the volume, we expect them to be close to orthogonal. Thus, we approximate D_m^T D_m by a diagonal matrix (Appendix C). That is, assuming that the filtered delta signals corresponding to the previously selected vertices are approximately orthogonal, we can write:

D_m^T D_m ≈ diag(‖d_1‖², · · · , ‖d_m‖²),
(D_m^T D_m)^{-1} ≈ diag(1/‖d_1‖², · · · , 1/‖d_m‖²),

which leads to an approximation of the determinant:

det( [ D_m^T D_m , D_m^T d_v ; d_v^T D_m , d_v^T d_v ] ) ≈ det(D_m^T D_m) det(d_v^T d_v − d_v^T D̂_m D̂_m^T d_v),    (15)

where D̂_m is obtained from D_m by normalizing the columns: D̂_m = D_m diag(1/‖d_1‖, · · · , 1/‖d_m‖). The second term in (15) can be written as:

d_v^T D̂_m D̂_m^T d_v = ⟨d_v, d_1⟩²/‖d_1‖² + · · · + ⟨d_v, d_m⟩²/‖d_m‖²,    (16)

which would be the energy of the projection of d_v onto span(d_1, · · · , d_m) if the vectors d_1, · · · , d_m were mutually orthogonal. Note that this is consistent with our assumption that D_m^T D_m is diagonal, which would only hold if the vectors form an orthogonal set.
3) Fast inner product computations: Maximization of (15) requires evaluating the inner products ⟨d_v, d_i⟩ in (16) for all i ∈ S_m and all vertices v outside S_m. Suppose we knew d_i for the sampled vertices, i ∈ S_m, and the inner products from past iterations. The current (m+1)th iteration would still need to compute n − m new inner products.

To avoid this computation we use the inner product property (14), which allows us to simplify (16) as follows:

d_v^T D̂_m D̂_m^T d_v = d_1(v)²/‖d_1‖² + · · · + d_m(v)²/‖d_m‖².

Thus, there is no need to compute n − m new inner products, while we also avoid computing d_v, v ∈ S_m^c. With this, our greedy optimization step becomes:

v* ← argmax_{v ∈ S_m^c} ‖d_v‖² − Σ_{w ∈ S_m} d_w(v)²/‖d_w‖².

In summary, by virtue of this series of approximations, we do not need to compute distances and no longer rely on the choice of a parameter ∆, as in Algorithm 1. Algorithm 2 only requires the following inputs:
1) the number of samples requested, s;
2) the constant c specifying cs log s random projections;
3) the scalar ǫ specifying the convergence criterion for the random projection iterations while computing squared coherences.
The last two inputs are specifically needed by Algorithm 1 in [17], which we use in Step 1 (Section III-C1) to compute squared coherences.

While the inner product property is defined based on the assumption that we use an "ideal" low pass filter for reconstruction, it can also be used to maximize the volume formed by the samples of more generic kernels (see Appendix D).
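Combining the three stages, the AVM selection loop can be sketched as follows; here coh2 (estimated squared coherences) and lowpass (a polynomial approximation of the filter U_R U_R^T) are assumed to be supplied by Step 1's machinery from [17]:

```python
import numpy as np

def avm_sampling(coh2, lowpass, s):
    """Sketch of the AVM selection loop (Algorithm 2).

    coh2[v]: estimated squared coherence ||d_v||^2 for R = {1, ..., s};
    lowpass(x): applies (an approximation of) U_R U_R^T to a signal x,
    typically via a degree-d polynomial of L. Both are assumed inputs.
    """
    n = len(coh2)
    score = coh2.astype(float).copy()
    S = []
    for _ in range(s):
        masked = score.copy()
        masked[S] = -np.inf
        v = int(np.argmax(masked))   # greedy step of Section III-C3
        S.append(v)
        delta_v = np.zeros(n)
        delta_v[v] = 1.0
        d_v = lowpass(delta_v)       # one filtering operation per sample
        # By (14), d_v(w) = d_w(v), so subtracting d_v(.)^2 / ||d_v||^2
        # updates the term sum_w d_w(.)^2 / ||d_w||^2 at every vertex.
        score -= d_v ** 2 / max(d_v @ d_v, 1e-12)
    return S
```

Note the cost structure: one filtering operation and one vector update per selected sample, which is what yields execution times close to WRS.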
IV. VOLUME MAXIMIZATION RELATED WORK

We next study how existing graph signal sampling methods are related to volume maximization. We start by focusing on the SP algorithm and show how it can be seen as a volume maximization method. This idea is developed in Section IV-A and Section IV-B. Section IV-C then considers other eigendecomposition-free methods and draws parallels with our volume maximization approach.
A. SP algorithm as Gaussian elimination
The SP algorithm of [7] is based on the following theorem.
Theorem 1 ([7]). Let L be the combinatorial Laplacian of an undirected graph. For a set S_m of size m, let U_{S_m, m} be full rank. Let ψ*_k be zero over S_m and a minimizing signal in the Rayleigh quotient of L^k for a positive integer k:

ψ*_k = argmin_{ψ, ψ(S_m)=0} (ψ^T L^k ψ) / (ψ^T ψ).    (17)

Let the signal ψ* be a linear combination of the first m+1 eigenvectors such that ψ*(S_m) = 0. Now, if there is a gap between the singular values, σ_{m+2} > σ_{m+1}, then ‖ψ*_k − ψ*‖ → 0 as k → ∞.

Proof. See [7]; for the ℓ₂ convergence see Appendix A.

The theorem leads to the main step in the SP algorithm, namely

v* = argmax_{v ∈ S_m^c} |ψ*_k(v)|,

where ψ*_k is from (17). Consider first the ideal SP algorithm, where k → ∞ and the solution tends to the ideal bandlimited solution. In the ideal case, given a full rank U_{S_m, m}, from Theorem 1 we can always get another vertex v such that U_{S_m ∪ v, m+1} is also full rank. Thus, at all iterations any submatrix U_{S_m, m} will have full rank.

When k → ∞ and the S_m vertices are selected, ψ* is given by the (m+1)th column of U', where U' is obtained by applying Gaussian elimination to the columns of U such that the (m+1)th column has zeros at the indexes given by S_m [7]. U' can be written as:

U' = [ u_1  u'_2  · · ·  u'_{m+1}  u_{m+2}  · · ·  u_n ],    (18)

where the eliminated column u'_{m+1} is zero on its first m entries and carries the entries u'_{m+1}(m+1), · · · , u'_{m+1}(n) below them; the remaining entries of the eliminated columns are arbitrary, and the columns from m+2 to n remain intact. Because we have non-zero pivots, u_1(1) to u'_{m+1}(m+1), U_{S_m, m} is full rank.

The next section explains how an iteration in the ideal SP algorithm can be seen as a volume maximization step.
B. SP algorithm as volume maximization

Consider a single stage in the SP algorithm, where the current sampling set is S_m, |S_m| = m is the number of sampled vertices, and the bandwidth support is R_m = {1, · · · , m+1}. At this stage, choosing a vertex v that maximizes det(U_{S_m ∪ v, R_m} U_{S_m ∪ v, R_m}^T) is equivalent to choosing a vertex that maximizes |det(U_{S_m ∪ v, R_m})|. We state our results in terms of |det(U_{S_m ∪ v, R_m})|, as this gives us the additional advantage of making the connection with the Gaussian elimination concept we discussed in Section IV-A. Now, focusing on the selection of the (m+1)th sample, we can state the following result.
Proposition 1. The sample v* selected in the (m+1)th iteration of the ideal SP algorithm is the vertex v from S_m^c that maximizes |det(U_{S_m ∪ v, R_m})|.

Proof. The ideal SP algorithm selects the vertex corresponding to the maximum value among |u'_{m+1}(m+1)|, · · · , |u'_{m+1}(n)|. Since S_m is given and U'_{S_m ∪ v, R_m} is a diagonal matrix, this also corresponds to selecting v such that the magnitude of det(U'_{S_m ∪ v, R_m}) is the maximum among all possible selections of v. But because U'_{S_m ∪ v, R_m} is obtained from U by doing Gaussian elimination, the two determinants are equal, i.e., |det(U_{S_m ∪ v, R_m})| = |det(U'_{S_m ∪ v, R_m})|, and since the current (m+1)th iteration chooses the maximum absolute value pivot, given S_m that sample maximizes |det(U_{S_m ∪ v, R_m})|.

We now show that the vertex v* selected in the (m+1)th iteration follows the rule:

v* = argmax_{v ∈ S_m^c} dist(d_v, span(d_1, · · · , d_m)),

where dist(·, ·) is the distance between a vector and its orthogonal projection onto a vector subspace. Thus, this optimization is equivalent to selecting a vertex v that maximizes the volume of the contained parallelepiped, Vol(d_1, · · · , d_m, d_v).

Let h be a unit vector along the direction of the (m+1)th column of U' in (18). We are interested in finding the vertex v that maximizes |h(v)|.
Proposition 2. The signal value h(v) is the length of the projection of d_v on h.

Proof. The signal h belongs to the bandlimited space, h ∈ span(U_{R_m}). Thus, using (11), we have that h(v) = ⟨h, d_v⟩. Since h is a unit vector, the last expression is the projection length of d_v on h.

So |h(v)| is maximized when |⟨d_v, h⟩| is maximized.
Proposition 3. The signal h is such that h ∈ span(d_1, · · · , d_m)^⊥ ∩ span(U_{R_m}).

Proof. All pivots in the Gaussian elimination of U_{S_m ∪ v, R_m} are non-zero, as seen in (18), so the following equivalent statements hold:
• U_{S_m, m} is full rank.
• U_{R_m} U_{R_m}^T I_{S_m} has full column rank.
• span(d_1, · · · , d_m) has dimension m.
The second statement follows from the first (0.4.6 (b) [25]) because U_{R_m} has full column rank and U_{S_m, m} is nonsingular. Given that span(d_1, · · · , d_m) has dimension m, we can proceed to the orthogonality arguments.

Fig. 1. Geometry of SP: (a) orthogonality of h with 𝒟̃_m; (b) components of d_v, namely P_{𝒟̃_m} d_v and |⟨h, d_v⟩|.

By definition, h obtained from (18) is zero over the set S_m, so that, using (11):

h(1) = 0 ⟹ ⟨h, d_1⟩ = 0,
⋮
h(m) = 0 ⟹ ⟨h, d_m⟩ = 0,

and therefore h is orthogonal to each of the vectors d_1, · · · , d_m. We call the space spanned by those vectors 𝒟̃_m, defined as 𝒟̃_m = span(d_1, · · · , d_m). Since the dimension of span(U_{R_m}) is m+1 and h is orthogonal to 𝒟̃_m, which has dimension m, span(h) is the orthogonal complement subspace of 𝒟̃_m within span(U_{R_m}) (see Fig. 1a for an illustration).

For this particular algorithm, R changes with the number of samples in the sampling set. At the end of the mth iteration the bandwidth support can be represented as R_m = {1, · · · , m+1}, where m is the number of samples in the current sampling set. We use D̃_m and 𝒟̃_m to denote the dependence of D_m and 𝒟_m on R_m in addition to S_m.
Proposition 4. The sample v* selected in the (m+1)th iteration of SP maximizes the distance between d_v and its orthogonal projection on 𝒟̃_m.

Proof. Since d_v ∈ span(U_{R_m}), it can be resolved into two orthogonal components with respect to the two orthogonal (Prop. 3) spaces 𝒟̃_m and span(h):

d_v = P_{𝒟̃_m} d_v + ⟨d_v, h⟩ h,

where h has unit magnitude and P_{𝒟̃_m} is the projection matrix onto the subspace 𝒟̃_m. Maximizing |h(v)| is equivalent to maximizing |⟨d_v, h⟩|, which can be expressed in terms of the magnitude of d_v and the magnitude of its projection on 𝒟̃_m:

argmax_v ⟨d_v, h⟩² = argmax_v ( ‖d_v‖² − ‖P_{𝒟̃_m} d_v‖² ).    (19)

Fig. 1b shows this orthogonality relation between d_v, |⟨h, d_v⟩|, and P_{𝒟̃_m} d_v.
Algorithm 3 Volume interpretation of the SP algorithm as k → ∞

procedure SP(L, s)
    S ← ∅; R ← {1}
    while |S| < s do
        Compute d_v, v ∈ V, for the current R
        𝒟̃ ← span({d_v | v ∈ S})
        v* ← argmax_v ‖d_v‖² − ‖P_𝒟̃ d_v‖²
        S ← S ∪ v*
        R ← R ∪ {|R| + 1}
    end while
    return S
end procedure

So the v* chosen is the one which maximizes the volume of the parallelepiped spanned by the filtered delta signals:

v* = argmax_{v ∈ S_m^c} Vol(d_1, · · · , d_m, d_v).

This follows from the definition of the volume of a parallelepiped [27]. Note that, although Proposition 4 could have been derived from the determinant property in Proposition 1 using (8), the approach using the orthogonal vector to the subspace in Propositions 2, 3 and 4 makes the geometry of the problem more explicit. We summarize this SP algorithm interpretation from the volume maximization point of view in Algorithm 3. This new understanding helps us relate volume maximization to existing sampling methods in the literature.
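This equivalence can be checked numerically; the sketch below (arbitrary random graph, arbitrary current S_m) confirms that the determinant criterion of Proposition 1 and the volume gain (19) pick the same vertex:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)

S = [0, 3]                       # an arbitrary current sampling set S_m
m = len(S)
R = np.arange(m + 1)             # bandwidth support R_m = {1, ..., m+1}
d = U[:, R] @ U[:, R].T          # filtered deltas for this R_m
P = d[:, S] @ np.linalg.pinv(d[:, S])   # projector onto span(d_1..d_m)

dets, gains = np.full(n, -np.inf), np.full(n, -np.inf)
for v in set(range(n)) - set(S):
    dets[v] = abs(np.linalg.det(U[np.ix_(S + [v], R)]))
    dv = d[:, v]
    gains[v] = dv @ dv - dv @ (P @ dv)   # right-hand side of (19)

assert np.argmax(dets) == np.argmax(gains)
```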
C. Eigendecomposition-free methods as volume maximization
So far we have covered the existing literature on D-optimality, both in general and as it relates to graphs, and proposed two algorithms towards that goal. From (9e), note that the greedy update for approximate volume maximization is

v* = argmax_{v ∈ S_m^c} ‖d_v‖² − ‖P_{𝒟_m} d_v‖².

Based on this, we can revisit some eigendecomposition-free algorithms for graph signal sampling from the perspective of volume maximization. For each of these algorithms, we consider the criterion used to add a vertex to the sampling set in the (m+1)th iteration. First, the WRS algorithm can be seen as neglecting the projection term and sampling based only on ‖d_v‖². Alternatively, the SP algorithm approximates this by

v* = argmax_{v ∈ S_m^c} ‖d_v‖² − ‖P_{𝒟̃_m} d_v‖²    (20)

for a finite value of the parameter k and a varying 𝒟̃_m in place of 𝒟_m (see Section IV-B). The generalization to the eigendecomposition-free approach of [28] (V2) proposes to maximize (using the greedy selection in Equation (31)):

v* = argmax_{v ∈ S_m^c} ‖d_v‖² − Σ_{w ∈ S_m} ⟨|d_w|, |d_v|⟩,

but in practice maximizes a different expression (see (32) in [18]).

The crucial difference between our proposed method and [18] is that we obtain a specific expression to be maximized through D-optimality. Whereas [18] clearly shows the relation between various experiment design objective functions and their corresponding localization operators, the relation between the algorithm proposed in [18], LSSS, and the experiment design objective functions is unclear.

The similarities in the optimization objective functions of various eigendecomposition-free sampling methods are summarized in Table II.

V. EXPERIMENTAL SETTINGS
To evaluate our sampling algorithms, we assess their sampling performance on different graph topologies at different sampling rates.
A. Signal, Graph Models and Sampling setups
With a perfectly bandlimited signal, most sampling schemes achieve similar performance in terms of reconstruction error. However, in practice signals are rarely perfectly bandlimited and noise-free. Therefore, it is necessary to compare the performance of the sampling methods on non-ideal signals.
1) Signal smoothness and graph topologies: Consider first a synthetic noisy signal model. The signal f is bandlimited with added noise n. The resulting signal is f = x + n, which can be expressed as U_F f̃_F + n, with the frequency components of x and the noise being random variables distributed as multivariate normal distributions: f̃_F ∼ N(0, c₁ I_FF), n ∼ N(0, c₂ I_VV). The constants c₁ and c₂ are chosen to fix the expected signal power and the expected noise power. Since our main objective is to study the effect of the varying number of samples, graph topologies, and graph size on DC and AVM, we keep the bandwidth of the signal fixed.

We compare our algorithms against three established algorithms: WRS, SP, and LSSS. All methods except WRS return unique samples. For a fair comparison, all sampling methods are evaluated under conditions where the same number of samples is obtained, irrespective of whether the returned ones are unique or not (which could occur in the case of [17]).

We use the combinatorial Laplacian for our sampling and reconstruction experiments, except for the classification experiment, where the normalized Laplacian of the nearest neighbors graph is used, as it achieves overall better classification accuracy.
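A sketch of this synthetic signal model, with the target powers passed in as parameters rather than hard-coded (the scaling below is one way to achieve the stated expected powers):

```python
import numpy as np

def synthetic_signal(U, f_bw, sig_pow, noise_pow, rng):
    """f = U_F ftilde_F + n with Gaussian frequency coefficients and
    noise, scaled so that E||x||^2 = sig_pow and E||n||^2 = noise_pow."""
    n = U.shape[0]
    c1 = sig_pow / f_bw              # per-coefficient signal variance
    c2 = noise_pow / n               # per-vertex noise variance
    ft = rng.normal(0.0, np.sqrt(c1), f_bw)
    noise = rng.normal(0.0, np.sqrt(c2), n)
    return U[:, :f_bw] @ ft + noise
```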
2) Sampling set sizes: We use two graph sizes, 500 and 1000. Except for the Erdős Rényi graph model, we use the Grasp [29] and GSPBox [30] MATLAB toolboxes to generate the graph instances (see Table III).

We use sampling set sizes from 60 samples to 200 samples to compare the variation of the reconstruction error. For comparing algorithms in this setting, we do not show the full range of reconstruction SNRs from WRS, because its SNR is usually 5-10 dB lower than the other methods at the starting sampling size of 60 (see Tables VI and VII for performance at higher sampling rates).
3) Classification on real world datasets: In this experiment we evaluate the sampling algorithms in a transductive semi-supervised learning setting for a digit classification task (USPS dataset). We randomly select 10 smaller datasets of size 1000 from the original dataset, such that each smaller dataset contains 100 elements from each category, i.e., the 10 digits. Using those smaller datasets, we construct a nearest neighbors graph with 10 neighbors. This setup is the same as in [7]. The graph sampling and reconstruction algorithms then select samples over the same range of sampling set sizes as above. Using a one-vs-all strategy, we reconstruct the class signals, and then classify by selecting the class which gives the maximum reconstruction magnitude at a vertex. We then report the average classification accuracy over the 10 smaller sets.
4) Effect of scaling graph sizes: One of our primary goals is to develop fast and scalable algorithms. To evaluate these properties, we use a random sensor nearest neighbors graph with 10 nearest neighbors, with the graph size varying over 500, 1000, 2000, 4000, and 8000. For each graph size we take 150 samples. We report the SNR of the reconstruction and the time required to sample.

We will now describe how the algorithms are initialized.
B. Initialization details
We wish to evaluate all the algorithms on an equal footing. So, for evaluating the graph squared coherences, we use the same number of random vectors, O(10 log(n)), and an order 30 polynomial wherever filtering is needed, for the WRS, DC and AVM methods. The degree of the polynomial is selected so as to be larger than the diameter of most graphs we consider.

The various algorithms we consider have some hard-coded parameter values. SP has just one parameter, k, to which we assign k = 4. LSSS has a few more parameters to tune, such as ν and η. In [18] the parameter ν is chosen experimentally to be 75, but in our experiments we run the LSSS algorithm on a wide range of five ν values around 75, spanning several orders of magnitude, and select the value of ν that maximizes SNR. We chose this wide range of values experimentally, as we observed cases where the optimal reconstruction SNR was achieved at ν values differing from the proposed 75 by several orders of magnitude when considering different topologies, graph sizes and Laplacians. As for the sampling times, we choose the sampling time corresponding to the ν chosen as per the maximum SNR. We determine η experimentally, the same way as in the original implementation. For the DC algorithm, we choose a fixed value of ∆.
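For reference, a degree-30 polynomial low-pass filter of the kind used for these coherence computations can be realized with a Chebyshev approximation of the ideal step response; this sketch is a generic stand-in, not the exact filter construction of [17]:

```python
import numpy as np

def cheb_lowpass(L, lam_max, lam_cut, degree=30):
    """Return a function x -> p(L) x, where p is a degree-`degree`
    Chebyshev approximation of the step 1{lam <= lam_cut} on
    [0, lam_max]. Works with L as a dense array or scipy sparse matrix."""
    K = degree + 1
    theta = (np.arange(K) + 0.5) * np.pi / K
    x = np.cos(theta)                      # Chebyshev nodes on [-1, 1]
    lam = 0.5 * lam_max * (x + 1.0)        # map nodes to [0, lam_max]
    y = (lam <= lam_cut).astype(float)     # ideal response at the nodes
    c = 2.0 / K * np.array([np.sum(y * np.cos(k * theta)) for k in range(K)])
    c[0] /= 2.0

    def apply(v):
        # Three-term Chebyshev recurrence on the shifted operator
        # x(L) = (2 / lam_max) L - I.
        Lt = lambda z: (2.0 / lam_max) * (L @ z) - z
        t_prev, t_cur = v, Lt(v)
        out = c[0] * t_prev + c[1] * t_cur
        for k in range(2, K):
            t_prev, t_cur = t_cur, 2.0 * Lt(t_cur) - t_prev
            out += c[k] * t_cur
        return out
    return apply
```

Applying the returned function to δ_v then approximates the filtered delta d_v without any eigendecomposition.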
TABLE II: APPROXIMATIONS TO GREEDY MAXIMIZATION OF THE DETERMINANT. FOR LSSS IMPLEMENTATION DETAILS REFER TO SECTION IV-C.

Sampling method | Selection process | Approximation
Exact greedy | argmax_{v∈S_m^c} ‖d_v‖² − ‖P_{𝒟_m} d_v‖² | -
WRS | p(v) ∝ ‖d_v‖² | No projection.
SP | argmax_{v∈S_m^c} ‖d_v‖² − ‖P_{𝒟̃_m} d_v‖² | Projection space approximate and increasing in size.
LSSS | argmax_{v∈S_m^c} ‖d_v‖² − Σ_{w∈S_m} ⟨|d_w|, |d_v|⟩ | Inner product approximation for projection.
AVM | argmax_{v∈S_m^c} ‖d_v‖² − Σ_{w∈S_m} d_w(v)²/‖d_w‖² | Assumption of orthogonality.
TABLE III: TYPES OF GRAPHS IN THE EXPERIMENTS

Graph model | Instance
Random sensor knn | grasp_plane_knn(n,8)
Scale-free | grasp_barabasi_albert(n,8)
Community | param.Nc = 5; gsp_community(n, param)
WS/Small world | grasp_watts_strogatz(n, 5, 0.2)
Erdős Rényi | erdos_renyi(n,0.02)
C. Reconstruction techniques
We denote the sampled signal by f_S and the lowpass frequencies of the original signal by f̃_F = U_F^T f. The ideal reconstruction, which minimizes the mean square error using the sampled signal, is given by the least squares solution to ‖U_SF f̃_F − f_S‖². Other existing methods of reconstruction are: using a linear combination of tailored kernels, as seen in LSSS, or solving a regularized least squares problem, as seen in BS-GDA. However, since we are primarily interested in comparing the sampling sets generated by the various algorithms on an even footing, we use the least squares solution for reconstruction, which we compute by assuming that we know the graph Fourier basis. To achieve the best results for WRS, instead of the plain least squares solution we use the recommended weighted least squares [17], although it is slightly different from what we use for all other algorithms. We use the Moore-Penrose pseudoinverse for all our least squares solutions.
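The pseudoinverse-based least squares reconstruction, together with the SNR metric reported in Section VI (the specific SNR normalization below is our assumption of the usual signal-to-error ratio), can be sketched as:

```python
import numpy as np

def reconstruct(U, f_bw, S, f_S):
    """Least squares reconstruction: solve min ||U_SF ftilde - f_S||^2
    with the Moore-Penrose pseudoinverse, then synthesize on all of V."""
    U_F = U[:, :f_bw]
    ft = np.linalg.pinv(U_F[S, :]) @ f_S   # estimate of ftilde_F
    return U_F @ ft

def reconstruction_snr(f, f_hat):
    """SNR in dB, with the error ||f_hat - f||^2 measured against the
    observed (noisy) signal f, as in Section VI-A."""
    err = f_hat - f
    return 10.0 * np.log10((f @ f) / (err @ err))
```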
VI. RESULTS

We now evaluate the performance of our algorithm based on its speed and on how well it can reconstruct the original signal.
A. Reconstruction error
In this paper we look at the mean squared error in reconstructing the signal f. So we measure the error ‖f̂ − f‖² in the reconstructed signal with respect to the original noisy signal, where f̂ is the reconstructed signal.

For our experiments, we plot the SNR averaged over 50 different graph instances of each graph model, with a new signal generated for each graph instance (Fig. 2). The two graph models where we observe a lower SNR for the AVM algorithm are the Random sensor nearest neighbors and the Community graph models, which we discuss next. For the remaining graph models, the reconstruction SNR from DC and AVM sampling is comparable to other algorithms, such as SP and LSSS. In fact, we find the sampling from AVM to be comparatively satisfactory considering that we are reporting the maximum SNRs for LSSS over 5 different parameter values.

In Fig. 2f we notice that for Random sensor nearest neighbors graphs of size 500, we need more samples to achieve competitive performance. To better understand this, consider a graph consisting of two communities. In the original volume maximization, a sampling set from only one community would give a volume of zero, and that sampling set would never be selected by a greedy exact volume maximization. However, because AVM is only an approximation to the greedy volume maximization, sampling from only one community is possible, although unlikely. More generally, this approximation affects weakly connected graphs, such as random sensor graphs with a knn construction with a small number of nearest neighbors. More specifically, this approximation affects community graphs at low sampling rates, as we can see in Figures 2c and 2h.

The issue, however, is no longer critical for larger graphs: as we see when we increase the number of samples to 150 (Table VII), our algorithm's performance is comparable to that of other state-of-the-art algorithms. Note that we do not face this issue for the Erdős Rényi graph instances in our experiments, since these are almost surely connected, as the probability of connection exceeds the sharp threshold [31].

For the USPS dataset classification, we do observe a significant drop in the classification accuracy for the samples chosen using the DC algorithm. However, the classification accuracy for AVM on the USPS dataset is on par with the remaining algorithms.

Next we evaluate how the complexity of AVM scales with the graph size.
B. Speed

Using the setup from Section V-A, we compare the sampling times for the WRS, SP, LSSS, BS-GDA and AVM algorithms. We exclude the DC algorithm from these comparisons because the distance evaluations in DC, which provide good intuition, make DC significantly slower than AVM. We include BS-GDA since it is one of the lowest complexity approaches among eigendecomposition-free algorithms. We use the Random sensor graph from the GSPbox [30] with 20 nearest neighbors.

In our comparison, we use the implementations of WRS, SP, LSSS and BS-GDA distributed by their respective authors, and run them on MATLAB 2019b along with our proposed
Fig. 2. Comparison of eigendecomposition-free methods in the literature. Panels: (a) Random sensor knn graph, size 1000; (b) Scale free graph, size 1000; (c) Community graph, size 1000; (d) WS graph, size 1000; (e) Erdős Rényi graph, size 1000; (f) Random sensor knn graph, size 500; (g) Scale free graph, size 500; (h) Community graph, size 500; (i) WS graph, size 500; (j) Erdős Rényi graph, size 500; (k) Classification accuracies. Methods: WRS, SP, LSSS, DC, AVM. x-axis: number of samples selected. y-axis: average SNR of the reconstructed signals. We do not include the entire range of SNR from WRS-based reconstruction because of its comparatively wider range.

algorithm. We could possibly improve on the existing implementations using specialized packages for functionalities such as eigendecomposition, but to remain faithful to the original papers we use their code with minimal changes. Wherever the theoretical algorithms in the papers conflict with the provided implementations, we go with the implementation, since that was presumably what the algorithms in the papers were timed on.

To minimize the effect of other processes running at different times, we run the sampling algorithms in a round robin fashion. We do this process for multiple iterations and different graph topologies. We time the implementations on an Ubuntu HP Z840 Workstation, which naturally has plenty of background processes running. The changes in their resource consumption affect our timing. It is impossible to

Fig. 3. Average sampling times of the five algorithms (WRS, SP, LSSS, BS-GDA, AVM) over 50 iterations: (a) random sensor graphs with 20 nearest neighbour connections; (b) community graphs with 10 communities. x-axis: number of nodes in the graph; y-axis: execution time (secs.). Execution times for LSSS are averaged over executions for different parameter values.

Fig. 4. Scatter plots of SNR vs execution time for graph size 8000: (a) random sensor graphs with 20 nearest neighbour connections; (b) community graphs with 10 communities. The execution time axis is reversed, so results on the top right are desirable.
TABLE IV: EXECUTION TIME (SECS.) FOR SAMPLING, RANDOM SENSOR GRAPHS WITH 20 NEAREST NEIGHBORS

|V| | WRS | SP | LSSS | BS-GDA | AVM | Overhead(AVM)
500 | 0.21 | 2.73 | 0.62 | 0.15 | 0.53 | 2.52
1,000 | 0.46 | 7.42 | 2.xx | x.64 | 0.91 | 1.98
2,000 | 1.xx | xx.56 | 9.91 | 3.32 | 1.87 | 1.xx
4,000 | 2.55 | 65.09 | 37.91 | 18.13 | 4.21 | 1.65
8,000 | 6.16 | 165.85 | 113.87 | 96.48 | 9.49 | 1.54

(Entries marked with x are digits that did not survive extraction.)
stop virtually all background processes, so we try to reduce their impact in two ways. First, we iterate over each sampling scheme 50 times and report the averages. Secondly, instead of completing the iterations over the sampling schemes one by one, we call all the different sampling schemes in the same iteration. These minor precautions help us mitigate any effects of background processes on our timing.

TABLE V: EXECUTION TIME (SECS.) FOR SAMPLING, COMMUNITY GRAPHS WITH 10 COMMUNITIES

|V| | WRS | SP | LSSS | BS-GDA | AVM | Overhead(AVM)
500 | 0.18 | 3.91 | 0.61 | 0.14 | 0.47 | 2.61
1,000 | 0.45 | 21.67 | 2.99 | 0.66 | 0.85 | 1.89
2,000 | 1.03 | 125.95 | 13.67 | 3.xx | 1.78 | 1.73
4,000 | 2.57 | 597.21 | 66.47 | 28.57 | 4.27 | 1.66
8,000 | 6.79 | 2,xxx.92 | 270.25 | 188.81 | 9.38 | 1.38

(Entries marked with x are digits that did not survive extraction.)

We observe that as the size of the graph increases, AVM is only slightly slower than WRS. It is orders of magnitude faster than SP, LSSS and the BS-GDA algorithm (see Tables IV and V), while having a very small impact on the SNR of the reconstructed signal (Tables VI and VII). The execution time also scales very well with respect to the graph size (Fig. 3a).
TABLE VI: SNRS (dB) OF THE RECONSTRUCTED SIGNALS ON RANDOM SENSOR GRAPHS

|V| | WRS | SP | LSSS | BS-GDA | AVM
500 | 6.xx | x.78 | 9.83 | 8.96 | 9.xx
1,000 | 7.89 | 9.69 | 9.81 | 9.25 | 9.xx
2,000 | 8.23 | 9.38 | 9.39 | 9.24 | 9.xx
4,000 | 7.99 | 9.54 | 9.51 | 9.36 | 9.xx
8,000 | 8.11 | 8.94 | 8.98 | 8.94 | 8.xx

(Entries marked with x are digits that did not survive extraction.)
TABLE VII: SNRS (dB) OF THE RECONSTRUCTED SIGNALS ON COMMUNITY GRAPHS

|V| | WRS | SP | LSSS | BS-GDA | AVM
500 | 6.41 | 10.14 | 10.xx | −x.61 | 9.xx
1,000 | 7.09 | 9.xx | x.62 | 7.13 | 8.xx
2,000 | 7.16 | 9.61 | 9.65 | 9.13 | 9.xx
4,000 | 7.53 | 9.43 | 9.44 | 9.06 | 9.xx
8,000 | 8.04 | 9.58 | 9.56 | 9.16 | 9.xx

(Entries marked with x are digits that did not survive extraction.)

We also report the relative execution times using the overhead rate, the ratio of the execution times of two algorithms. We compute this overhead for the AVM vs. WRS pair:

Overhead(AVM) = (Execution time of the proposed AVM algorithm) / (Execution time of WRS).

Most existing graph sampling algorithms consider WRS the fastest sampling algorithm and benchmark against it. By specifying our overhead rates with respect to WRS, we can indirectly compare our algorithm with myriad others without doing so one by one. We report these factors for various graph sizes in Tables IV and V.

To justify the increase in the speed of execution against the slight decrease in SNR, we plot the SNR versus the execution time for the different algorithms we compared. Ideally we want an algorithm with fast execution and good SNR. From Figs. 4a and 4b we see that our algorithm fits that requirement very well.

Of course, with different graph types the SNR vs. execution time performance of AVM may vary. But what we always expect this algorithm to deliver is execution times similar to WRS with a significant improvement in SNR. In a way, this algorithm bridges the gap between the existing eigendecomposition-free algorithms and WRS.

Sometimes the system requirement is not to select s samples fast, but rather to go from an existing selection of s samples to s + 1 samples fast. For our AVM algorithm, going from s samples to s + 1 samples requires sampling all s + 1 samples over again. In this paper we do not evaluate its performance for that scenario.
1) Coherence computations:
In our AVM algorithm, Stage 1 is the bottleneck because it involves T iterations, each with a cost proportional to |E| and |V|. We think that the size of the graph has an indirect impact on the number of iterations T needed and a direct impact on the number of computations through |E| and |V|. Very large graphs can place an upper limit on the number of computations we can do. In this situation, we note that Stage 1, which computes the squared coherences and λ_s, is itself an approximation, and we can settle for a coarser approximation instead.

C. Complexity
In this section we denote the total number of samples to take by s. The complexity of the proposed Algorithm 2 is O((|E| + |V|)dT log |V|) + O(s|V| + s(|E| + |V|)d). The first term corresponds to the squared coherence computations, and the second term corresponds to the inner products and the search for the next vertex. For a connected graph we know that |E| ≥ |V| − 1, so in that case the complexity is simply O(|E|dT log |V|) + O(s|E|d).

The cost of finding the squared coherences is O((|E| + |V|)dT log |V|), where d is the polynomial degree and T is the number of iterations needed to converge to the right λ_s. Given s samples at some point during the iterations of the algorithm, finding and normalizing the filtered signal takes O(d(|E| + |V|)). Subtracting and finding the maximum takes O(|V|) steps. We do this s times, which makes the total computational complexity O(s|V| + s(|E| + |V|)d).
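For a connected graph, |E| ≥ |V| − 1 gives |E| + |V| = O(|E|), and the total complexity combines as the following short simplification shows. The last step treats d, T, and s as constants independent of the graph size; this is the sense in which the proposed method achieves the log-linear size scaling of the title:

O((|E| + |V|)dT log |V|) + O(s|V| + s(|E| + |V|)d)
    = O(|E|dT log |V|) + O(s|E|d)
    = O(|E| log |V|).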
VII. CONCLUSION

Most sampling schemes perform reasonably well when dealing with perfectly bandlimited signals. However, in the presence of noise, or when the signal is not perfectly bandlimited, some schemes perform much better than others. When only a limited number of samples can be chosen, we would like an algorithm that performs well without requiring expensive operations.

The algorithms presented in this paper rely on the intuition of viewing the problem as maximizing the volume of the parallelepiped formed by the low-pass signals corresponding to the sampled vertices. This helps us develop intuitive and fast graph signal sampling algorithms. The volume maximization framework also helps connect various existing algorithms.

The sampling algorithm we developed reaches speeds achieved by WRS, but with a large improvement in reconstruction accuracy. The accuracy is comparable with that of other contemporary algorithms, while at the same time providing significant improvements in speed.
VIII. ACKNOWLEDGEMENTS

This work is supported in part by NSF under grants CCF-1410009, CCF-1527874, and CCF-2009032, and by a gift from Tencent.

APPENDIX A
PROOF OF EIGENVECTOR CONVERGENCE
Lemma 2. There exists a signal φ in the orthogonal subspace to ψ∗ with φ(S_m) = 0 and ‖φ‖ = 1 whose out-of-bandwidth energy is a minimum value c ≠ 0.
Proof. The set of signals {φ : φ(S_m) = 0, ‖φ‖ = 1} is a closed set. To see this, let (x_1, ..., x_n)^T be in the set and take any ε > 0. Then (x_1 + ε/2, x_2, ..., x_n)^T is in the ε-neighborhood (the distance exists because we are in a normed vector space), and that vector does not have norm 1, so it is not in the set. So every ε-neighborhood contains a point not in the set, every point of the set is a limit point, and the set is closed.

The out-of-bandwidth energy is a continuous function on our set. Let v_1, v_2 be such that v_1, v_2 ⊥ ψ∗ and v_1(S_m) = 0, v_2(S_m) = 0, and let the Fourier coefficients of v_1 and v_2 be (α_1, ..., α_n)^T and (β_1, ..., β_n)^T. For continuity we want

∑_{i=m+2}^{n} (α_i − β_i)² < ε    (A.1)

whenever ‖v_1 − v_2‖ < δ for a suitable δ; we can show that (A.1) holds when δ = ε/2.

Since the set is closed and bounded and the function is continuous on it, the function attains a minimum value. The minimum value cannot be zero, because there is a unique signal ψ∗ with that property and we are looking in a space orthogonal to ψ∗. So there is a signal with a minimum out-of-bandwidth energy of c, where c > 0.

Next, this appendix shows the proof of the ℓ₂ convergence from Theorem 1.
Proof. Let us look at a particular step, where we have already selected the vertices in S_m. The solutions to the following optimization problems are equivalent:

ψ∗_k = arg min_ψ (ψ^T L^k ψ)/(ψ^T ψ) = arg min_{ψ : ‖ψ‖ = 1} ψ^T L^k ψ.

Therefore, we will consider solutions with ‖ψ‖ = 1.

Let us consider the space of our signals: {φ ∈ ℝ^n : φ(S_m) = 0} is a vector space of dimension n − m. For any k, let us represent our solution as ψ = α_1 ψ∗ + α_2 ψ⊥, where ψ⊥ is a vector in the subspace orthogonal to ψ∗; we can do this because we have a finite-dimensional vector space. The conditions on our signal are α_1² + α_2² = 1, ‖ψ∗‖ = 1, and ‖ψ⊥‖ = 1. Furthermore, we know the Fourier transforms of the two signal components:

U^T ψ∗ = (γ_1, ..., γ_{m+1}, 0, ..., 0)^T = γ,
U^T ψ⊥ = (β_1, ..., β_n)^T = β.

Our signal can be written as [ψ∗ ψ⊥](α_1, α_2)^T = [ψ∗ ψ⊥]α, so the objective function becomes

α^T [ψ∗ ψ⊥]^T L^k [ψ∗ ψ⊥] α = α^T [γ β]^T Σ^k [γ β] α
= α^T [ ∑_{i=1}^{m+1} γ_i² σ_i^k      ∑_{i=1}^{m+1} γ_i β_i σ_i^k
        ∑_{i=1}^{m+1} γ_i β_i σ_i^k   ∑_{i=1}^{n} β_i² σ_i^k ] α
= α^T [ a  b
        b  d ] α.

In the last equation, a, b, and d are just convenient notations for the scalar entries of the symmetric 2 × 2 matrix. Note that a, d > 0, because σ_i > 0 and each of them is then a sum of positive quantities.

We want to minimize the objective function subject to the constraint ‖α‖ = 1, and we solve this optimization problem by a standard application of the KKT conditions [32]. The Lagrangian corresponding to our constrained minimization problem is

L(α, λ) = α^T [ a  b ; b  d ] α + λ(α^T α − 1).

The minimizer is the eigenvector of the matrix [ a  b ; b  d ] with the minimum eigenvalue. To see this, take the gradient with respect to α and set it to 0, which gives two first-order equations:

a α_1 + b α_2 + λ α_1 = 0,
b α_1 + d α_2 + λ α_2 = 0.

Unless b = 0, α_1 = 0 implies α_2 = 0 and vice versa. Both α_1 and α_2 cannot be zero at the same time, otherwise our solution would not lie in the domain of unit-length vectors; so either of them can be 0 only if b = 0. If b = 0, for large k the solution is given by α_2 = 0, α_1 = 1, because, as we show next, a/d can be made less than 1/2 for k > k_0.

We now analyze the case b ≠ 0, in which a + λ ≠ 0 and d + λ ≠ 0. Writing α_2 in terms of α_1 in both equations, we get

α_2 = −((a + λ)/b) α_1,    α_2 = −(b/(d + λ)) α_1.

Equating both expressions for α_2 (and assuming α_1 ≠ 0) gives a quadratic in λ with two solutions. Since a, d > 0, the positive sign gives the λ with the lower magnitude:

λ = ( −(a + d) + √((a − d)² + 4b²) ) / 2.

We now use the condition that the solution has norm one, which gives a value for α_2:

α_2 = ∓ (a − d + √((a − d)² + 4b²)) / √( (a − d + √((a − d)² + 4b²))² + 4b² ).

We look at the absolute value of α_2:

|α_2| = 2|b| / √( (−(a − d) + √((a − d)² + 4b²))² + (2b)² ).

Let us find a k_0 such that |b|/d < ε/2 (for a given ε > 0) for all k > k_0; this will also make a/d < 1/2, and together these make the entire expression less than ε for k > k_0 (see (A.3)). We upper bound b in the following way:

|b| = | ∑_{i=1}^{m+1} β_i γ_i σ_i^k | ≤ √( ∑_{i=1}^{m+1} β_i² γ_i² ) · √( ∑_{i=1}^{m+1} σ_i^{2k} ) ≤ 1 · √(m+1) σ_{m+1}^k = b̄.
Now, the least possible value of d is c σ_{m+2}^k. So, when k > k_0, we get the following upper bound for |b|/d in terms of ε:

|b|/d ≤ √(m+1) σ_{m+1}^k / (c σ_{m+2}^k).

We want this to be less than ε/2, which gives us our condition on k:

√(m+1) σ_{m+1}^k / (c σ_{m+2}^k) < ε/2,
that is, k > ⌈ ( log(m+1)/2 + log(2/ε) + log(2/c) ) / log(σ_{m+2}/σ_{m+1}) ⌉.   (A.2)

The ratio a/d admits a similar analysis:

a/d ≤ σ_{m+1}^k / (c σ_{m+2}^k),    k > ⌈ log(2/c) / log(σ_{m+2}/σ_{m+1}) ⌉.   (A.3)

Since this value of k is equal to or less than the value of k required for |b|/d < ε/2, for our theorem we take the value in (A.2). Dividing the numerator and the denominator of the expression for |α_2| by d gives us the expression we need in terms of a/d and |b|/d:

|α_2| = 2|b/d| / √( (1 − a/d + √((a/d − 1)² + 4(b/d)²))² + (2b/d)² )
      ≤ 2|b/d| / (2(1 − a/d))
      ≤ 2|b/d| < ε,

where the second step uses √((a/d − 1)² + 4(b/d)²) ≥ 1 − a/d and the third uses a/d < 1/2. This implies that as k increases, the coefficient of the out-of-bandwidth component goes to zero. Because the out-of-bandwidth signal has finite energy, the corresponding signal energy goes to zero as α_2 → 0. Whether ψ∗_k converges to ψ∗ or −ψ∗ is a matter of convention. Hence, as k → ∞, ψ∗_k → ψ∗.
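As a quick numerical sanity check of the closed-form |α_2| above (our illustration, not part of the original proof), one can compare it against the unit-norm minimizer returned by an eigensolver:

import numpy as np

rng = np.random.default_rng(7)
a, d = rng.uniform(0.5, 3.0, size=2)   # a, d > 0, as in the proof
b = rng.uniform(0.1, 1.0)              # the b != 0 case

disc = np.sqrt((a - d) ** 2 + 4 * b ** 2)
alpha2 = 2 * abs(b) / np.sqrt((-(a - d) + disc) ** 2 + (2 * b) ** 2)

# The unit-norm minimizer of alpha^T M alpha is the eigenvector of M
# associated with its smallest eigenvalue.
M = np.array([[a, b], [b, d]])
_, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
print(alpha2, abs(vecs[1, 0]))         # the two values agree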
APPENDIX B
JUSTIFICATION FOR IGNORING THE TARGET BANDWIDTH WHILE SAMPLING

We know that selecting the right U_SF matrix is essential to prevent a blow-up of the error when reconstructing using (1). In practice, the reconstruction bandwidth is f ≤ s. However, for our AVM sampling algorithm we chose the bandwidth to be |R| = s instead. We next address why that is a logical choice with respect to D-optimality.

The matrix U_SS^T U_SS is positive definite, which follows from our initial condition that the set S is a uniqueness set. This provides us with the needed relations between determinants. We can see that U_SF^T U_SF is a submatrix of U_SS^T U_SS:

U_SS^T U_SS = [ U_SF^T U_SF           U_SF^T U_{S,f+1:s}
                U_{S,f+1:s}^T U_SF    U_{S,f+1:s}^T U_{S,f+1:s} ].

This helps us relate the determinants of the matrix and its submatrix using Fischer's inequality, Theorem 7.8.5 in [25]:

det(U_SS^T U_SS) ≤ det(U_SF^T U_SF) det(U_{S,f+1:s}^T U_{S,f+1:s}).   (B.1)

The determinant of the matrix U_{S,f+1:s}^T U_{S,f+1:s} can be bounded above. Its eigenvalues are the same as the non-zero eigenvalues of U_{S,f+1:s} U_{S,f+1:s}^T, by Theorem 1.2.22 in [25]. Using the eigenvalue interlacing theorem, Theorem 8.1.7 in [33], the eigenvalues of the matrix U_{S,f+1:s} U_{S,f+1:s}^T are less than or equal to 1, because it is a submatrix of U_{V,f+1:s} U_{V,f+1:s}^T, whose non-zero eigenvalues are all 1. As the determinant of a matrix is the product of its eigenvalues, the following bound applies:

det(U_{S,f+1:s}^T U_{S,f+1:s}) ≤ 1.   (B.2)

Using (B.1), (B.2), and the positive definiteness of the matrices, we now have a simple lower bound for the criterion under consideration:

|det(U_SS^T U_SS)| ≤ |det(U_SF^T U_SF)|.   (B.3)

Thus, for example, it is impossible for |det(U_SS^T U_SS)| to be equal to some positive value while |det(U_SF^T U_SF)| is half of that value.

To summarize, instead of aiming to maximize |det(U_SF^T U_SF)|, we aimed to maximize |det(U_SS^T U_SS)|. Intuitively, this works because optimizing a D-optimal criterion for the full matrix indirectly ensures controlled performance of its submatrix. In this way, thanks to relation (B.3), we avoid having to know the precise bandwidth f and still manage to sample using the AVM algorithm.
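As a small numerical illustration of (B.1)-(B.3), here is a sketch in which a random orthogonal matrix stands in for the eigenvector matrix U, and the sizes n, s, and f are arbitrary choices of ours:

import numpy as np

rng = np.random.default_rng(3)
n, s, f = 50, 10, 6                                # |V|, samples s, bandwidth f <= s
U = np.linalg.qr(rng.standard_normal((n, n)))[0]   # orthogonal stand-in for U
S = rng.choice(n, size=s, replace=False)

U_SS = U[np.ix_(S, np.arange(s))]                  # rows S, first s columns
U_SF = U[np.ix_(S, np.arange(f))]                  # rows S, first f columns

# (B.3): det(U_SS^T U_SS) <= det(U_SF^T U_SF) holds for any such choice
print(np.linalg.det(U_SS.T @ U_SS) <= np.linalg.det(U_SF.T @ U_SF))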
APPENDIX C
APPROXIMATING THE GRAM MATRIX BY A DIAGONAL MATRIX

Here we try to estimate how close our approximation of D_m^T D_m by a diagonal matrix is. Towards this goal, we define a simple metric for a general matrix A:

Fraction of energy in diagonal = ( ∑_i A_ii² ) / ( ∑_i ∑_j A_ij² ).   (C.1)

Since this can be a property dependent on the graph topology, we take 5 different types of graphs with 1000 vertices: scale-free (Barabási–Albert), random sensor nearest neighbour, Erdős–Rényi, grid, and line graphs. Using AVM we take a varying number of samples, ranging from 1 to 50. With the bandwidth f taken to be 50, we average the fraction of energy (C.1) over 10 instances of each graph and present it in Fig. C.1.

We observe more than a 0.75 fraction of the energy in the diagonal of the matrix D_m^T D_m, which justifies this approximation. According to our experiments, which are not presented here, the inverse (D_m^T D_m)^{-1} is not as close to a diagonal matrix as D_m^T D_m is. Nevertheless, in place of (D_m^T D_m)^{-1} we still use diag(1/‖d_1‖², ..., 1/‖d_m‖²) for what it is: an approximation.
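A minimal sketch of metric (C.1); the function name is ours:

import numpy as np

def diag_energy_fraction(A: np.ndarray) -> float:
    """Metric (C.1): share of the total squared-entry energy of A
    that lies on its diagonal."""
    return float(np.sum(np.diag(A) ** 2) / np.sum(A ** 2))

# Example: Gram matrices of nearly orthogonal columns score close to 1.
rng = np.random.default_rng(0)
D = rng.standard_normal((1000, 50))
print(diag_energy_fraction(D.T @ D))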
Fig. C.1. Closeness to diagonal at each iteration: fraction of energy in the diagonal, (C.1), versus the number of samples, for the BA (scale-free), random sensor k-NN, ER, grid, and line graphs.

Note, however, that the approximation does not hold in general for arbitrary sample sets. It holds when the samples are selected in a determinant-maximizing way by Algorithm 2. This approximation is suited to AVM because of its choice of sampling bandwidth, |R|. As the number of samples requested increases, so does the sampling bandwidth. The higher bandwidth causes the filtered delta signals to become more localized, concentrating energy in the diagonal and keeping the diagonal approximation reasonable and applicable.
APPENDIX D
D-OPTIMAL SAMPLING FOR GENERIC KERNELS
Another graph signal model is a probabilistic distribution instead of a bandlimited model [34], [3]. In such cases, a covariance matrix is our kernel, and the subset selection problem is defined as a submatrix selection of the covariance matrix. Framing the problem as entropy maximization naturally leads to a determinant maximization approach [35].

To define our problem more formally, let us restrict the space of all possible kernels to those that can be written as K = g(L), with g defined on matrices and induced from a function from the non-negative reals to the positive reals, g : ℝ≥0 → ℝ>0. An example of such a function of L is (L + δI)^{-1}. Such a function can be written as a function on the eigenvalues of the Laplacian, K = U g(Σ) U^T. Motivated by entropy maximization in the probabilistic graph signal model, suppose we wish to select a set S that maximizes the determinant magnitude |det(K_SS)|.

There are a few differences in solving the new problem, although most of Algorithm 2 translates well. We now wish to maximize |det(U_S g(Σ) U_S^T)|. The expression for the determinant update remains the same as before:

det( [ D_m^T D_m    D_m^T d_v
       d_v^T D_m    d_v^T d_v ] ) ≈ det(D_m^T D_m) det( d_v^T d_v − d_v^T D_m (D_m^T D_m)^{-1} D_m^T d_v ).

Only now we have to maximize the volume of the parallelepiped formed by the vectors d_v = U g^{1/2}(Σ) U^T δ_v for v ∈ S. The squared coherences with respect to our new kernel, d_v^T d_v, are computed in the same way as before, by random projections. The diagonal of our new kernel matrix now approximates the matrix D_m^T D_m:

D_m^T D_m ≈ diag( (U g(Σ) U^T)_{11}, ..., (U g(Σ) U^T)_{nn} ).

The other difference is that the approximate update stage is given by

v∗ ← arg max_{v ∈ S^c}  ‖d_v‖² − ∑_{w ∈ S} ( (U g^{1/2}(Σ) U^T d_w)(v) )² / ‖d_w‖²,

with the difference resulting from the kernel not being a projection operator. So, for a generic kernel with a determinant maximization objective, Algorithm 2 works in the same way, with the minor modifications discussed here.
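Since U is orthogonal, U g^{1/2}(Σ) U^T d_w = U g(Σ) U^T δ_w = K δ_w, the w-th column of K, and ‖d_v‖² = K_vv. The sketch below uses these identities to run the modified greedy selection under the simplifying assumption that K is available explicitly (the algorithm proper estimates these quantities with random projections and polynomial filters, as noted above); the function name and the example kernel are ours:

import numpy as np

def generic_kernel_sampling(K: np.ndarray, s: int) -> list:
    """Greedy determinant-maximizing sampling for a generic kernel
    K = g(L), using the approximate update from this appendix."""
    score = np.diag(K).astype(float)       # squared coherences ||d_v||^2 = K[v, v]
    selected = []
    for _ in range(s):
        v = int(np.argmax(score))
        selected.append(v)
        # approximate update: (U g^(1/2)(Sigma) U^T d_v)(w) = K[w, v]
        score = score - K[:, v] ** 2 / K[v, v]
        score[selected] = -np.inf          # restrict the argmax to S^c
    return selected

# Example with the kernel K = (L + delta I)^(-1) on a small path graph:
n, delta = 8, 0.1
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1
K = np.linalg.inv(L + delta * np.eye(n))
print(generic_kernel_sampling(K, 3))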
REFERENCES

[1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013, arXiv:1211.0053.
[2] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal, "The web as a graph," in Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 1–10, ACM, 2000.
[3] X. Zhu, Z. Ghahramani, and J. D. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919, 2003.
[4] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[5] M. Crovella and E. Kolaczyk, "Graph wavelets for spatial traffic analysis," in IEEE INFOCOM 2003, Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3, pp. 1848–1857, IEEE, 2003.
[6] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Transactions on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015, arXiv:1503.05432.
[7] A. Anis, A. Gadde, and A. Ortega, "Efficient sampling set selection for bandlimited graph signals using graph spectral proxies," IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3775–3789, 2016, arXiv:1510.00297.
[8] Y. Tanaka, Y. C. Eldar, A. Ortega, and G. Cheung, "Sampling signals on graphs: From theory to applications," IEEE Signal Processing Magazine, vol. 37, no. 6, pp. 14–30, 2020, arXiv:2003.03957.
[9] I. Pesenson, "Sampling in Paley-Wiener spaces on combinatorial graphs," Transactions of the American Mathematical Society, vol. 360, no. 10, pp. 5603–5627, 2008, arXiv:1111.5896.
[10] S. Joshi and S. Boyd, "Sensor selection via convex optimization," IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, 2009.
[11] A. Atkinson, A. Donev, and R. Tobias, Optimum Experimental Designs, with SAS, vol. 34. Oxford University Press, 2007.
[12] H. Shomorony and A. S. Avestimehr, "Sampling large data on graphs," in 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 933–936, IEEE, 2014, arXiv:1411.3017.
[13] M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, "Signals on graphs: Uncertainty principle and sampling," IEEE Transactions on Signal Processing, vol. 64, no. 18, pp. 4845–4860, 2016, arXiv:1507.08822.
[14] N. Tremblay, P.-O. Amblard, and S. Barthelmé, "Graph sampling with determinantal processes," in 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1674–1678, IEEE, 2017, arXiv:1703.01594.
[15] L. F. Chamon and A. Ribeiro, "Greedy sampling of graph signals," IEEE Transactions on Signal Processing, vol. 66, no. 1, pp. 34–47, 2018, arXiv:1704.01223.
[16] F. Wang, Y. Wang, and G. Cheung, "A-optimal sampling and robust reconstruction for graph signals via truncated Neumann series," IEEE Signal Processing Letters, vol. 25, no. 5, pp. 680–684, 2018.
[17] G. Puy, N. Tremblay, R. Gribonval, and P. Vandergheynst, "Random sampling of bandlimited signals on graphs," Applied and Computational Harmonic Analysis, 2016, arXiv:1511.05118.
[18] A. Sakiyama, Y. Tanaka, T. Tanaka, and A. Ortega, "Eigendecomposition-free sampling set selection for graph signals," IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2679–2692, 2019.
[19] Y. Bai, F. Wang, G. Cheung, Y. Nakatsukasa, and W. Gao, "Fast graph sampling set selection using Gershgorin disc alignment," IEEE Transactions on Signal Processing, vol. 68, pp. 2419–2434, 2020, arXiv:1907.06179.
[20] A. Jayawant and A. Ortega, "A distance-based formulation for sampling signals on graphs," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6318–6322, IEEE, 2018.
[21] B. Bollobás, Modern Graph Theory, vol. 184. Springer Science & Business Media, 2013.
[22] A. Çivril and M. Magdon-Ismail, "On selecting a maximum volume sub-matrix of a matrix and related problems," Theoretical Computer Science, vol. 410, no. 47-49, pp. 4801–4811, 2009.
[23] S. A. Goreinov, I. V. Oseledets, D. V. Savostyanov, E. E. Tyrtyshnikov, and N. L. Zamarashkin, "How to find a good submatrix," in Matrix Methods: Theory, Algorithms and Applications: Dedicated to the Memory of Gene Golub, pp. 247–256, World Scientific, 2010.
[24] A. Deshpande and L. Rademacher, "Efficient volume sampling for row/column subset selection," in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), pp. 329–338, IEEE, 2010, arXiv:1004.4057.
[25] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 2012.
[26] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, 2011.
[27] B. Peng, "The determinant: A means to calculate volume," Recall, vol. 21, p. a22, 2007.
[28] A. Sakiyama, Y. Tanaka, T. Tanaka, and A. Ortega, "Eigendecomposition-free sampling set selection for graph signals," arXiv preprint arXiv:1809.01827, 2018.
[29] B. Girault, S. Narayanan, P. Gonçalves, A. Ortega, and E. Fleury, "GraSP: A MATLAB toolbox for graph signal processing," 2017.
[30] N. Perraudin, J. Paratte, D. Shuman, L. Martin, V. Kalofolias, P. Vandergheynst, and D. K. Hammond, "GSPBOX: A toolbox for signal processing on graphs," 2016, arXiv:1408.5781.
[31] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hung. Acad. Sci., vol. 5, no. 1, pp. 17–60, 1960.
[32] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[33] G. H. Golub and C. F. Van Loan, Matrix Computations, vol. 3. JHU Press, 2012.
[34] A. Gadde and A. Ortega, "A probabilistic interpretation of sampling theory of graph signals," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3257–3261, IEEE, 2015, arXiv:1503.06629.
[35] M. C. Shewry and H. P. Wynn, "Maximum entropy sampling," Journal of Applied Statistics, vol. 14, no. 2, pp. 165–170, 1987.