Featured Researches

Data Structures And Algorithms

CountSketches, Feature Hashing and the Median of Three

In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector v to a vector of dimension (2t−1)s, where t, s > 0 are integer parameters. It is known that even for t = 1, a CountSketch allows estimating coordinates of v with variance bounded by ‖v‖₂²/s. For t > 1, the estimator takes the median of 2t−1 independent estimates, and the probability that the estimate is off by more than 2‖v‖₂/√s is exponentially small in t. This suggests choosing t to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant t. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to O(min{‖v‖₁²/s², ‖v‖₂²/s}) when t > 1. That is, the variance decreases proportionally to s⁻², asymptotically for large enough s. We also study the variance in the setting where an inner product is to be estimated from two CountSketches. This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not use the median estimator, can be made more reliable at a small cost in settings where using a median estimator is possible. We confirm our theoretical findings in experiments and thereby help justify why a small constant number of estimates often suffices in practice. Our improved variance bounds are based on new general theorems about the variance and higher moments of the median of i.i.d. random variables that may be of independent interest.
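As a concrete illustration of the estimator discussed above, here is a minimal Python sketch of CountSketch with the median-of-rows estimator (2t−1 rows, so three rows for t = 2). It uses explicit random tables rather than the pairwise-independent hash functions a real implementation would use; all names are ours, not the paper's.

```python
import random
import statistics

def countsketch(v, s, rows, seed=0):
    """Build `rows` independent CountSketch tables of s buckets each.
    Each table pairs a bucket hash h and a sign hash g per coordinate."""
    rng = random.Random(seed)
    tables = []
    for _ in range(rows):
        h = [rng.randrange(s) for _ in range(len(v))]      # bucket choice
        g = [rng.choice((-1, 1)) for _ in range(len(v))]   # random sign
        buckets = [0.0] * s
        for i, x in enumerate(v):
            buckets[h[i]] += g[i] * x
        tables.append((h, g, buckets))
    return tables

def estimate_coord(tables, i):
    """Median-of-rows estimator for coordinate v[i] ('median of three'
    when rows == 3)."""
    return statistics.median(g[i] * buckets[h[i]] for h, g, buckets in tables)
```

With a single heavy coordinate among many small ones, the median over three rows suppresses the occasional row whose bucket suffers collisions.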


Counting Short Vector Pairs by Inner Product and Relations to the Permanent

Given as input two n-element sets A, B ⊆ {0,1}^d with d = c log n ≤ (log n)²/(log log n)⁴ and a target t ∈ {0,1,…,d}, we show how to count the number of pairs (x,y) ∈ A×B with integer inner product ⟨x,y⟩ = t deterministically, in n²/2^{Ω(√(log n · log log n/(c log² c)))} time. This demonstrates that one can solve this problem in deterministic subquadratic time almost up to log²n dimensions, nearly matching the dimension bound of a subquadratic randomized detection algorithm of Alman and Williams [FOCS 2015]. We also show how to modify their randomized algorithm to count the pairs w.h.p., to obtain a fast randomized counting algorithm. Our deterministic algorithm builds on a novel technique of reconstructing a function from sum-aggregates by prime residues, which can be seen as an additive analog of the Chinese Remainder Theorem. As our second contribution, we relate the fine-grained complexity of counting vector pairs by inner product to the task of computing a zero-one matrix permanent over the integers.
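For reference, the naive quadratic baseline that the abstract's algorithms improve upon can be sketched as follows (our own illustration; the paper's subquadratic methods are far more involved):

```python
def count_pairs(A, B, t):
    """Naive O(|A|*|B|*d) baseline: count pairs (x, y) in A x B of 0/1
    vectors with inner product exactly t."""
    return sum(1 for x in A for y in B
               if sum(a & b for a, b in zip(x, y)) == t)
```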


Cut Sparsification of the Clique Beyond the Ramanujan Bound: A Separation of Cut Versus Spectral Sparsification

We prove that a random d-regular graph, with high probability, is a cut sparsifier of the clique with approximation error at most (2√(2/π) + o_{n,d}(1))/√d, where 2√(2/π) = 1.595… and o_{n,d}(1) denotes an error term that depends on n and d and goes to zero if we first take the limit n→∞ and then the limit d→∞. This is established by analyzing linear-size cuts using techniques of Jagannath–Sen '17 derived from ideas from statistical physics and analyzing small cuts via martingale inequalities. We also prove that every spectral sparsifier of the clique having average degree d and a certain high "pseudo-girth" property has an approximation error that is at least the "Ramanujan bound" (2 − o_{n,d}(1))/√d, which is met by d-regular Ramanujan graphs, generalizing a lower bound of Srivastava–Trevisan '18. Together, these results imply a separation between spectral sparsification and cut sparsification. If G is a random log n-regular graph on n vertices, we show that, with high probability, G admits a (weighted subgraph) cut sparsifier of average degree d and approximation error at most (1.595… + o_{n,d}(1))/√d, while every (weighted subgraph) spectral sparsifier of G having average degree d has approximation error at least (2 − o_{n,d}(1))/√d.
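To make the notion of cut-sparsification error concrete, here is a toy brute-force checker (our own illustration, feasible only for tiny n) that measures the worst relative deviation of a weighted graph's cut weights from the clique's cut values |S|(n−|S|):

```python
from itertools import combinations

def cut_weight(edges, S):
    """Total weight of edges crossing the cut (S, V \\ S)."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def max_cut_error(n, edges):
    """Worst relative deviation of H's cut weights from the unweighted
    clique's cut values |S|(n-|S|), over all nontrivial cuts.
    Exponential in n -- only a sanity check for tiny graphs."""
    worst = 0.0
    for r in range(1, n // 2 + 1):
        for S in map(set, combinations(range(n), r)):
            clique_cut = len(S) * (n - len(S))
            worst = max(worst, abs(cut_weight(edges, S) / clique_cut - 1))
    return worst
```

As a sanity check, the unit-weight clique is an exact sparsifier of itself, with error 0.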


Cut-Equivalent Trees are Optimal for Min-Cut Queries

Min-Cut queries are fundamental: preprocess an undirected edge-weighted graph to quickly report a minimum-weight cut that separates a query pair of nodes s, t. The best data structure known for this problem simply builds a cut-equivalent tree, discovered 60 years ago by Gomory and Hu, who also showed how to construct it using n−1 minimum st-cut computations. Using state-of-the-art algorithms for minimum st-cut (Lee and Sidford, FOCS 2014, arXiv:1312.6713), one can construct the tree in time Õ(mn^{3/2}), which is also the preprocessing time of the data structure. (Throughout, we focus on polynomially-bounded edge weights, noting that faster algorithms are known for small/unit edge weights.) Our main result shows the following equivalence: cut-equivalent trees can be constructed in near-linear time if and only if there is a data structure for Min-Cut queries with near-linear preprocessing time and polylogarithmic (amortized) query time, even if the queries are restricted to a fixed source. That is, cut-equivalent trees are an essentially optimal solution for Min-Cut queries. This equivalence holds even for every minor-closed family of graphs, such as bounded-treewidth graphs, for which a two-decade-old data structure (Arikati et al., J. Algorithms 1998) implies the first near-linear-time construction of cut-equivalent trees. Moreover, unlike all previous techniques for constructing cut-equivalent trees, ours is robust to using approximation algorithms. In particular, using the almost-linear-time algorithm for (1+ϵ)-approximate minimum st-cut (Kelner et al., SODA 2014), we can construct a (1+ϵ)-approximate flow-equivalent tree (a slightly weaker notion) in time n^{2+o(1)}. This leads to the first (1+ϵ)-approximation for All-Pairs Max-Flow that runs in time n^{2+o(1)}, almost-optimally matching the output size.
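Once a cut-equivalent tree is built, answering a Min-Cut query is simple: the s-t min-cut value equals the minimum edge weight on the unique s-t path in the tree. A minimal sketch (our own, assuming the tree is given as an adjacency map):

```python
def min_cut_query(tree, s, t):
    """Answer a Min-Cut query from a cut-equivalent (Gomory-Hu) tree:
    the s-t min-cut equals the lightest edge on the unique s-t tree path.
    `tree` maps each node to a list of (neighbor, weight) pairs."""
    def dfs(u, parent, path_min):
        if u == t:
            return path_min
        for v, w in tree[u]:
            if v != parent:
                found = dfs(v, u, min(path_min, w))
                if found is not None:
                    return found
        return None
    return dfs(s, None, float("inf"))
```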


Data stream fusion for accurate quantile tracking and analysis

UDDSketch is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSketch algorithm. UDDSketch provides accuracy guarantees covering the full range of quantiles, independently of the input distribution, and greatly improves accuracy over DDSketch. In this paper we show how to compress and fuse data streams (or datasets) using UDDSketch summaries: two summaries are fused into a new summary corresponding to the union of the streams (or datasets) they processed, while preserving both the error and size guarantees provided by UDDSketch. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSketch is fully mergeable and introduce a parallel version of UDDSketch suitable for message-passing architectures. We formally prove its correctness and compare it to a parallel version of DDSketch, showing through extensive experimental results that our parallel algorithm almost always outperforms the parallel DDSketch algorithm in overall accuracy when determining quantiles.
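To illustrate the mergeability property, here is a toy DDSketch-style quantile summary with logarithmic buckets and bucket-wise additive merging. This is a simplified stand-in of our own, not the paper's UDDSketch (which additionally bounds the sketch size via uniform bucket collapsing):

```python
import math
from collections import Counter

class TinySketch:
    """A minimal DDSketch-style quantile sketch for positive values:
    bucket i counts values x with ceil(log_gamma(x)) == i, giving
    relative error about alpha. Merging is bucket-wise addition."""
    def __init__(self, alpha=0.01):
        self.gamma = (1 + alpha) / (1 - alpha)
        self.buckets = Counter()
        self.count = 0

    def add(self, x):
        self.buckets[math.ceil(math.log(x, self.gamma))] += 1
        self.count += 1

    def merge(self, other):
        self.buckets += other.buckets   # mergeability: union = bucket sums
        self.count += other.count

    def quantile(self, q):
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # midpoint of bucket range (gamma^(i-1), gamma^i]
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("empty sketch")
```

Two sketches built on disjoint halves of a stream, once merged, answer quantile queries over the whole stream.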


Delay Optimization of Combinational Logic by And-Or Path Restructuring

We propose a dynamic programming algorithm that constructs delay-optimized circuits for alternating And-Or paths with prescribed input arrival times. Our algorithm matches the best-known approximation guarantees and empirically outperforms earlier methods by exploring a significantly larger portion of the solution space. It is the core of a new timing optimization framework that replaces critical paths of arbitrary length by logically equivalent realizations with less delay. Our framework allows revising early decisions on the logical structure of the netlist in a late step of an industrial physical design flow. Experiments demonstrate the effectiveness of our tool on 7nm real-world instances.
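To give a feel for the delay model, the sketch below (ours, assuming unit gate delays and 2-input gates, and not the paper's DP) computes the delay of the trivial linear chain realization and the classical logarithmic lower bound for any binary-tree realization with prescribed arrival times, which together show how much restructuring can gain:

```python
import math

def chain_delay(arrivals):
    """Delay of the trivial linear chain realization under unit gate
    delays: each 2-input gate finishes one time unit after its latest
    input becomes available."""
    d = arrivals[0]
    for a in arrivals[1:]:
        d = 1 + max(d, a)
    return d

def tree_lower_bound(arrivals):
    """Classical lower bound on the delay of any binary-tree realization
    with prescribed integer arrival times: ceil(log2(sum of 2^a_i))."""
    return math.ceil(math.log2(sum(2 ** a for a in arrivals)))
```

For eight inputs arriving at time 0, the chain needs delay 7, while a restructured tree can approach the lower bound of 3.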


Deletion to Induced Matching

In the Deletion to Induced Matching problem, we are given a graph G on n vertices and m edges together with a non-negative integer k, and are asked whether there exists a set of vertices S ⊆ V(G) such that |S| ≤ k and every connected component of G − S has size exactly 2. In this paper, we provide a fixed-parameter tractable (FPT) algorithm running in time O*(1.748^k) for the Deletion to Induced Matching problem, using a branch-and-reduce strategy and path decomposition. We also extend our work to the exact-exponential version of the problem.
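For small instances the problem can be decided by brute force, which also clarifies the definition; the sketch below (ours) tries all vertex subsets of size at most k, in contrast to the paper's O*(1.748^k) branch-and-reduce algorithm:

```python
from itertools import combinations

def is_induced_matching(n, edges, removed):
    """True iff every connected component of G - removed has exactly
    2 vertices (i.e., the remainder is an induced matching)."""
    left = [v for v in range(n) if v not in removed]
    adj = {v: set() for v in left}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    for v in left:
        if v in seen:
            continue
        comp, stack = {v}, [v]          # DFS to collect v's component
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        if len(comp) != 2:
            return False
        seen |= comp
    return True

def deletion_to_induced_matching(n, edges, k):
    """Brute-force baseline: try every deletion set of size at most k."""
    return any(is_induced_matching(n, edges, set(S))
               for r in range(k + 1)
               for S in combinations(range(n), r))
```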


Density Sketches for Sampling and Estimation

We introduce Density Sketches (DS): a succinct online summary of the data distribution. DS can accurately estimate pointwise probability density. Interestingly, DS also provides the capability to sample unseen novel data from the underlying data distribution. Thus, analogously to popular generative models, DS allows us to succinctly replace the real data in almost all machine learning pipelines with synthetic examples drawn from the same distribution as the original data. However, unlike generative models, which do not come with statistical guarantees, DS yields theoretically sound, asymptotically convergent, consistent estimators of the underlying density function. Density Sketches also have many appealing properties making them ideal for large-scale distributed applications. DS construction is an online algorithm. The sketches are additive: the sum of two sketches is the sketch of the combined data. These properties allow data to be collected from distributed sources, compressed into a density sketch, efficiently transmitted in sketch form to a central server, merged, and re-sampled into a synthetic database for modeling applications. Thus, density sketches can potentially revolutionize how we store, communicate, and distribute data.
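A toy illustration of the additivity property, using a plain histogram as a stand-in of our own for the paper's density sketch (the actual DS construction is different and comes with the consistency guarantees described above):

```python
import random
from collections import Counter

class HistSketch:
    """A toy additive density summary over a fixed 1-D grid: supports
    online updates, bin-wise merging, pointwise density estimates, and
    sampling synthetic data from the summarized distribution."""
    def __init__(self, lo, hi, bins):
        self.lo, self.bins = lo, bins
        self.width = (hi - lo) / bins
        self.counts = Counter()
        self.n = 0

    def add(self, x):
        b = min(int((x - self.lo) / self.width), self.bins - 1)
        self.counts[b] += 1
        self.n += 1

    def merge(self, other):
        self.counts += other.counts   # additivity: sum of two sketches
        self.n += other.n

    def density(self, x):
        b = min(int((x - self.lo) / self.width), self.bins - 1)
        return self.counts[b] / (self.n * self.width)

    def sample(self, rng=random):
        keys = sorted(self.counts)
        b = rng.choices(keys, [self.counts[k] for k in keys])[0]
        return self.lo + (b + rng.random()) * self.width
```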


Detecting and Enumerating Small Induced Subgraphs in c-Closed Graphs

Fox et al. [SIAM J. Comp. 2020] introduced a new parameter, called c-closure, for a parameterized study of clique enumeration problems. A graph G is c-closed if every pair of vertices with at least c common neighbors is adjacent. The c-closure of G is the smallest c such that G is c-closed. We systematically explore the impact of c-closure on the computational complexity of detecting and enumerating small induced subgraphs. More precisely, for each graph H on three or four vertices, we investigate parameterized polynomial-time algorithms for detecting H and for enumerating all occurrences of H in a given c-closed graph.
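The c-closure itself is easy to compute directly from the definition; a minimal sketch (ours):

```python
from itertools import combinations

def c_closure(n, edges):
    """Smallest c such that the graph is c-closed: a non-adjacent pair
    with M common neighbors violates c-closure for every c <= M, so the
    answer is 1 + (max common neighbors over non-adjacent pairs)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0
    for u, v in combinations(range(n), 2):
        if v not in adj[u]:
            best = max(best, len(adj[u] & adj[v]))
    return best + 1
```

For example, the 4-cycle has c-closure 3 (its two diagonal pairs share 2 common neighbors but are non-adjacent), while any complete graph has c-closure 1.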


Deterministic Algorithms for Decremental Shortest Paths via Layered Core Decomposition

In the decremental single-source shortest paths (SSSP) problem, the input is an undirected graph G=(V,E) with n vertices and m edges undergoing edge deletions, together with a fixed source vertex s∈V. The goal is to maintain a data structure that supports shortest-path queries: given a vertex v∈V, quickly return an (approximate) shortest path from s to v. The decremental all-pairs shortest paths (APSP) problem is defined similarly, but now the shortest-path queries are allowed between any pair of vertices of V. Both problems have been studied extensively since the 1980s, and algorithms with near-optimal total update time and query time have been discovered for them. Unfortunately, all these algorithms are randomized and, more importantly, they need to assume an oblivious adversary. Our first result is a deterministic algorithm for the decremental SSSP problem on weighted graphs with O(n^{2+o(1)}) total update time that supports (1+ϵ)-approximate shortest-path queries, with query time O(|P|·n^{o(1)}), where P is the returned path. This is the first (1+ϵ)-approximation algorithm against an adaptive adversary that supports shortest-path queries in time below O(n) and breaks the O(mn) total update time bound of the classical algorithm of Even and Shiloach from 1981. Our second result is a deterministic algorithm for the decremental APSP problem on unweighted graphs that achieves total update time O(n^{2.5+δ}), for any constant δ>0, and supports approximate distance queries in O(log log n) time; the algorithm achieves an O(1)-multiplicative and n^{o(1)}-additive approximation of the path length. All previous algorithms for APSP either assume an oblivious adversary or have Ω(n³) total update time when m=Ω(n²).
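For contrast, the trivial decremental SSSP baseline simply recomputes a BFS after every deletion, spending O(n+m) per deletion; the classical Even–Shiloach algorithm and the results in this abstract improve on this. A minimal sketch (ours, unweighted):

```python
from collections import deque

def bfs_dist(adj, s):
    """Single-source BFS distances in an unweighted graph."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

class RecomputeSSSP:
    """Naive decremental SSSP: rerun BFS from the source after every
    edge deletion -- the trivial baseline, O(n + m) per deletion."""
    def __init__(self, n, edges, s):
        self.adj = {v: set() for v in range(n)}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)
        self.s = s
        self.dist = bfs_dist(self.adj, s)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.dist = bfs_dist(self.adj, self.s)

    def query(self, v):
        return self.dist.get(v, float("inf"))
```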

