Featured Research

Data Structures And Algorithms

Computing nearest neighbour interchange distances between ranked phylogenetic trees

Many popular algorithms for searching the space of leaf-labelled trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with thousands of leaves (and likely hundreds of thousands if implemented efficiently). We also connect the problem of computing distances in our graph of ranked trees with the well-known version of this problem on unranked trees by introducing a parameter for the weight difference between move types. We propose to study a family of shortest path problems indexed by this parameter with computational complexity varying from quadratic to NP-hard.
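As a point of reference for the graph-search formulation above, the following minimal Python sketch computes distances in an abstract rearrangement graph by Dijkstra's algorithm, with separate (assumed) weights for rank moves and edge moves. The neighbors callback, the move labels, and the tree encoding are illustrative assumptions, and this brute-force search is exponential in general; it is not the paper's polynomial-time algorithm for ranked nearest neighbour interchange.

import heapq

def move_graph_distance(start, target, neighbors, move_weights):
    # Dijkstra over an abstract tree-rearrangement graph.
    # neighbors(tree) must yield (neighbour_tree, move_type) pairs, where
    # move_type is a key of move_weights (e.g. 'rank' or 'edge').
    # Trees must be hashable and comparable (e.g. encoded as strings).
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, tree = heapq.heappop(heap)
        if tree == target:
            return d
        if d > dist.get(tree, float("inf")):
            continue  # stale heap entry
        for nbr, move_type in neighbors(tree):
            nd = d + move_weights[move_type]
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None  # target not reachable from start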

Read more
Data Structures And Algorithms

Computing the Largest Bond and the Maximum Connected Cut of a Graph

The cut-set ∂(S) of a graph G=(V,E) is the set of edges with one endpoint in S ⊂ V and the other in V∖S, and whenever G[S] is connected, the cut [S, V∖S] of G is called a connected cut. A bond of a graph G is an inclusion-wise minimal disconnecting set of G, i.e., bonds are cut-sets that determine cuts [S, V∖S] of G such that G[S] and G[V∖S] are both connected. In contrast to the large number of studies on maximum cuts, very few results exist regarding the largest bond of general graphs. In this paper, we aim to reduce this gap by studying the complexity of computing the largest bond and the maximum connected cut of a graph. Although cuts and bonds are similar, we remark that computing the largest bond and the maximum connected cut of a graph tends to be harder than computing its maximum cut. We show that there is no constant-factor approximation algorithm for computing the largest bond, unless P = NP. Also, we show that Largest Bond and Maximum Connected Cut are NP-hard even for planar bipartite graphs, whereas Maximum Cut is trivial on bipartite graphs and polynomial-time solvable on planar graphs. In addition, we show that Largest Bond and Maximum Connected Cut are NP-hard on split graphs and, restricted to graphs of clique-width w, cannot be solved in time f(w)·n^{o(w)} unless the Exponential Time Hypothesis fails, but can be solved in time f(w)·n^{O(w)}. Finally, we show that both problems are fixed-parameter tractable when parameterized by the size of the solution, the treewidth, and the twin-cover number.
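To make the definitions above concrete, here is a small brute-force sketch (exponential in the number of vertices, so only suitable for tiny graphs) that enumerates vertex subsets S, keeps those for which both G[S] and G[V∖S] are connected, and returns the largest resulting cut-set. The graph encoding and function names are assumptions for illustration only.

from itertools import combinations

def is_connected(vertices, adj):
    # Depth-first connectivity check on the subgraph induced by `vertices`.
    vs = set(vertices)
    if not vs:
        return False
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vs and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vs

def largest_bond(vertices, edges):
    # Return (size, S) of a largest bond: a cut-set [S, V - S] such that
    # both induced subgraphs G[S] and G[V - S] are connected.
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best, best_side = -1, None
    for r in range(1, len(vertices)):
        for S in combinations(vertices, r):
            S = set(S)
            T = set(vertices) - S
            if is_connected(S, adj) and is_connected(T, adj):
                cut = sum(1 for u, v in edges if (u in S) != (v in S))
                if cut > best:
                    best, best_side = cut, S
    return best, best_side

# On the 4-cycle a-b-c-d-a the largest bond has size 2.
print(largest_bond(['a', 'b', 'c', 'd'],
                   [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]))

On the 4-cycle, for instance, the largest bond has size 2 while the maximum cut has size 4, since the bipartition achieving the maximum cut leaves both sides disconnected.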

Read more
Data Structures And Algorithms

Connected Components in Undirected Set-Based Graphs. Applications in Object-Oriented Model Manipulation

This work introduces a novel algorithm for finding the connected components of a graph whose vertices and edges are grouped into sets, defining a Set-Based Graph. Under certain restrictions on those sets, the algorithm has the remarkable property that its computational cost is constant with respect to the number of vertices and edges. These restrictions are related to the possibility of representing the sets of vertices by intension and the sets of edges using a particular type of map. While the restrictions may seem strong in a general context, they are usually satisfied in the problem of transforming connections into equations in object-oriented models, which is the main application of the proposed algorithm. Besides describing the new algorithm and studying its computational cost, the work describes a prototype implementation and shows its application in different examples.
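For contrast with the set-based approach described above, the sketch below is a conventional union-find computation of connected components over explicitly enumerated vertices and edges; its cost necessarily grows with the number of vertices and edges, which is precisely what the Set-Based Graph algorithm avoids by operating on sets represented by intension. The encoding and names are assumptions.

class DisjointSet:
    # Standard union-find with path halving and union by size.
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def connected_components(vertices, edges):
    # Group the explicitly listed vertices into components.
    dsu = DisjointSet()
    for v in vertices:
        dsu.find(v)
    for u, v in edges:
        dsu.union(u, v)
    components = {}
    for v in vertices:
        components.setdefault(dsu.find(v), []).append(v)
    return list(components.values())

print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# -> [[1, 2, 3], [4, 5]]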

Read more
Data Structures And Algorithms

Consistent k-Median: Simpler, Better and Robust

In this paper we introduce and study the online consistent k-clustering with outliers problem, generalizing the non-outlier version of the problem studied in [Lattanzi-Vassilvitskii, ICML17]. We show that a simple local-search based online algorithm can give a bicriteria constant approximation for the problem with O(k^2 log^2(nD)) swaps of medians (recourse) in total, where D is the diameter of the metric. When restricted to the problem without outliers, our algorithm is simpler and deterministic, and gives a better approximation ratio and recourse than that of [Lattanzi-Vassilvitskii, ICML17].
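The algorithm above builds on local search with single-median swaps. As a rough illustration, the sketch below shows only the basic offline, outlier-free swap step (replace one open median by one point if that lowers the k-median cost); the online/consistent bookkeeping, the outlier handling, and the recourse accounting from the paper are not reflected here, and all names are assumptions.

def kmedian_cost(points, medians, dist):
    # Sum over all points of the distance to the nearest open median.
    return sum(min(dist(p, m) for m in medians) for p in points)

def local_search_step(points, medians, dist):
    # Try every single swap (close one median, open one point) and return
    # the best improved set of medians, or the original set if no swap helps.
    best_cost = kmedian_cost(points, medians, dist)
    best = list(medians)
    for out in medians:
        for cand in points:
            if cand in medians:
                continue
            trial = [m for m in medians if m != out] + [cand]
            cost = kmedian_cost(points, trial, dist)
            if cost < best_cost:
                best_cost, best = cost, trial
    return best

# One-dimensional example with k = 2 and absolute-value distance.
pts = [0, 1, 2, 10, 11, 12]
print(local_search_step(pts, [0, 2], dist=lambda a, b: abs(a - b)))
# -> [2, 11] (cost drops from 28 to 5)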

Read more
Data Structures And Algorithms

Constant Amortized Time Enumeration of Eulerian trails

In this paper, we consider enumeration problems for edge-distinct and vertex-distinct Eulerian trails. Here, two Eulerian trails are edge-distinct if their edge sequences are not identical, and vertex-distinct if their vertex sequences are not identical. As the main result, we propose optimal enumeration algorithms for both problems, that is, algorithms that run in O(N) total time, where N is the number of solutions. Our algorithms are based on the reverse search technique introduced by [Avis and Fukuda, DAM 1996] and the push out amortization technique introduced by [Uno, WADS 2015].
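The contribution above is enumerating all edge-distinct or vertex-distinct Eulerian trails in constant amortized time per solution. As background only, the sketch below finds a single Eulerian trail with the classical Hierholzer algorithm; it is not the reverse-search enumeration algorithm, and the edge-list encoding is an assumption.

from collections import defaultdict

def eulerian_trail(edges):
    # Hierholzer's algorithm: return one Eulerian trail as a vertex sequence,
    # assuming the multigraph given by `edges` is connected. Returns None if
    # more than two vertices have odd degree (no Eulerian trail exists).
    if not edges:
        return []
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    odd = [v for v in adj if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None
    start = odd[0] if odd else next(iter(adj))
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()  # drop edges already traversed from the other side
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)
        else:
            trail.append(stack.pop())
    return trail[::-1]

# A triangle has an Eulerian circuit.
print(eulerian_trail([('a', 'b'), ('b', 'c'), ('c', 'a')]))  # e.g. ['a', 'c', 'b', 'a']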

Read more
Data Structures And Algorithms

Constructing Large Matchings via Query Access to a Maximal Matching Oracle

Multi-pass streaming algorithms for Maximum Matching have been studied for more than 15 years, and various algorithmic results are known today, including 2-pass streaming algorithms that break the 1/2-approximation barrier, and (1−ϵ)-approximation streaming algorithms that run in O(poly(1/ϵ)) passes in bipartite graphs and in O((1/ϵ)^{1/ϵ}) or O(poly(1/ϵ)·log n) passes in general graphs, where n is the number of vertices of the input graph. However, proving impossibility results for such algorithms has so far been elusive; for example, even the existence of 2-pass small-space streaming algorithms with approximation factor 0.999 has not yet been ruled out. The key building block of all multi-pass streaming algorithms for Maximum Matching is the Greedy matching algorithm. Our aim is to understand the limitations of this approach: how many passes are required if the algorithm solely relies on invocations of the Greedy algorithm? In this paper, we initiate the study of lower bounds for restricted families of multi-pass streaming algorithms for Maximum Matching. We focus on the simple yet powerful class of algorithms that in each pass run Greedy on a vertex-induced subgraph of the input graph. For bipartite graphs, we show that 3 passes are necessary and sufficient to improve on the trivial approximation factor of 1/2: we give a lower bound of 0.6 on the approximation ratio of such algorithms, which is optimal. We further show that Ω(1/ϵ) passes are required for computing a (1−ϵ)-approximation, even in bipartite graphs. Last, the considered class of algorithms is not well suited to general graphs: we show that Ω(n) passes are required in order to improve on the trivial approximation factor of 1/2.
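The building block discussed above, one pass of Greedy restricted to a vertex-induced subgraph, can be made concrete as in the sketch below; how the passes are composed, and hence where the upper and lower bounds come from, is not shown, and the names and stream encoding are assumptions.

def greedy_induced_matching(edge_stream, allowed):
    # One streaming pass of Greedy on the subgraph induced by `allowed`:
    # take an edge whenever both endpoints are allowed and still unmatched.
    matched, matching = set(), []
    for u, v in edge_stream:
        if u in allowed and v in allowed and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# A single pass over the full vertex set is the classical 1/2-approximate Greedy.
edges = [(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')]
print(greedy_induced_matching(edges, allowed={1, 2, 3, 'a', 'b', 'c'}))
# -> [(1, 'a'), (3, 'c')], while the maximum matching has size 3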

Read more
Data Structures And Algorithms

Constructing a Distance Sensitivity Oracle in O(n^{2.5794} M) Time

We continue the study of distance sensitivity oracles (DSOs). Given a directed graph G with n vertices and edge weights in {1, 2, ..., M}, we want to build a data structure such that, given any source vertex u, any target vertex v, and any failure f (which is either a vertex or an edge), it outputs the length of the shortest path from u to v not going through f. Our main result is a DSO with preprocessing time O(n^{2.5794} M) and constant query time. Previously, the best preprocessing time of DSOs for directed graphs was O(n^{2.7233} M), and even in the easier case of undirected graphs, the best preprocessing time was O(n^{2.6865} M) [Ren, ESA 2020]. One drawback of our DSO, though, is that it only supports distance queries but not path queries. Our main technical ingredient is an algorithm that computes the inverse of a degree-d polynomial matrix (i.e., a matrix whose entries are degree-d univariate polynomials) modulo x^r. The algorithm is adapted from [Zhou, Labahn and Storjohann, Journal of Complexity, 2015], and we replace some of its intermediate steps with faster rectangular matrix multiplication algorithms. We also show how to compute unique shortest paths in a directed graph with edge weights in {1, 2, ..., M} in O(n^{2.5286} M) time. This algorithm is crucial in the preprocessing algorithm of our DSO. Our solution improves the O(n^{2.6865} M) time bound of [Ren, ESA 2020] and matches the current best time bound for computing all-pairs shortest paths.
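To fix the interface: given a source u, a target v, and a failed vertex or edge f, a DSO reports the length of the shortest u-to-v path avoiding f. The naive sketch below answers each query with a fresh Dijkstra run, i.e. it trades the paper's heavy preprocessing and constant query time for no preprocessing and a slow per-query cost; the adjacency-list encoding and names are assumptions.

import heapq

def distance_avoiding(n, adj, u, v, failure):
    # Naive distance-sensitivity query on a directed graph with n vertices.
    # adj[x] is a list of (y, w) arcs; `failure` is either a vertex or an arc (a, b).
    bad_vertex = None if isinstance(failure, tuple) else failure
    bad_arc = failure if isinstance(failure, tuple) else None
    INF = float("inf")
    if u == bad_vertex or v == bad_vertex:
        return INF
    dist = [INF] * n
    dist[u] = 0
    heap = [(0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        if x == v:
            return d
        for y, w in adj[x]:
            if y == bad_vertex or (x, y) == bad_arc:
                continue
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist[v]

# Path 0 -> 1 -> 2 has length 2; avoiding vertex 1 forces the direct arc of weight 5.
adj = [[(1, 1), (2, 5)], [(2, 1)], []]
print(distance_avoiding(3, adj, 0, 2, failure=1))  # -> 5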

Read more
Data Structures And Algorithms

Contiguous Graph Partitioning For Optimal Total Or Bottleneck Communication

Graph partitioning schedules parallel calculations like sparse matrix-vector multiply (SpMV). We consider contiguous partitions, where the m rows (or columns) of a sparse matrix with N nonzeros are split into K parts without reordering. We propose the first near-linear time algorithms for several graph partitioning problems in the contiguous regime. Traditional objectives such as the simple edge cut, hyperedge cut, or hypergraph connectivity minimize the total cost of all parts under a balance constraint. Our total partitioners use O(Km + N) space. They run in O((Km log(m) + N) log(N)) time, a significant improvement over the prior O(K(m^2 + N)) time algorithms due to Kernighan and Grandjean et al. Bottleneck partitioning minimizes the maximum cost of any part. We propose a new bottleneck cost which reflects the sum of communication and computation on each part. Our bottleneck partitioners use linear space. The exact algorithm runs in linear time when K^2 is O(N^C) for C < 1. Our (1+ϵ)-approximate algorithm runs in linear time when K log(c_high/(c_low ϵ)) is O(N^C) for C < 1, where c_high and c_low are upper and lower bounds on the optimal cost. We also propose a simpler (1+ϵ)-approximate algorithm which runs within a factor of log(c_high/(c_low ϵ)) of linear time. We empirically demonstrate that our algorithms efficiently produce high-quality contiguous partitions on a test suite of 42 test matrices. When K = 8, our hypergraph connectivity partitioner achieved a speedup of up to 53× (mean 15.1×) over prior algorithms. The mean runtime of our bottleneck partitioner was 5.15 SpMVs.
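As a small simplification of contiguous bottleneck partitioning: if the cost of a part were just its total row work (e.g. its nonzero count), the optimal bottleneck for a contiguous split into K parts could be found by binary search over a greedy feasibility probe, as sketched below. This toy cost model ignores the communication-plus-computation bottleneck cost and the near-linear-time machinery from the paper; the names are assumptions.

def probe(work, K, cap):
    # Greedy feasibility check: can the rows be split into at most K
    # contiguous parts, each with total work <= cap?
    parts, current = 1, 0
    for w in work:
        if w > cap:
            return False
        if current + w > cap:
            parts += 1
            current = 0
        current += w
    return parts <= K

def bottleneck_split(work, K):
    # Binary search on the optimal bottleneck value (integer work assumed).
    lo, hi = max(work), sum(work)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(work, K, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Nonzeros per row, split into K = 3 contiguous parts.
print(bottleneck_split([5, 1, 1, 1, 6, 2, 3], 3))  # -> 7, e.g. [5,1,1] | [1,6] | [2,3]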

Read more
Data Structures And Algorithms

Convergence of Gibbs Sampling: Coordinate Hit-and-Run Mixes Fast

The Gibbs Sampler is a general method for sampling high-dimensional distributions, dating back to Turchin, 1971. In each step of the Gibbs Sampler, we pick a random coordinate and re-sample that coordinate from the distribution induced by fixing all other coordinates. While it has become widely used over the past half-century, guarantees of efficient convergence have been elusive. We show that for a convex body K in R^n with diameter D, the mixing time of the Coordinate Hit-and-Run (CHAR) algorithm on K is polynomial in n and D. We also give a lower bound on the conductance of CHAR, showing that it is strictly worse than hit-and-run or the ball walk in the worst case.
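A minimal sketch of one Coordinate Hit-and-Run step for the uniform distribution on a convex body: pick a coordinate uniformly at random and resample it uniformly on the chord of the body through the current point along that coordinate. The chord_interval oracle and the unit-ball example are assumptions used for illustration; the paper's contribution is the mixing-time analysis, not the step itself.

import math
import random

def char_step(x, chord_interval, rng=random):
    # One Coordinate Hit-and-Run step: resample a random coordinate uniformly
    # on the chord of the body through x along that coordinate direction.
    i = rng.randrange(len(x))
    lo, hi = chord_interval(x, i)
    y = list(x)
    y[i] = rng.uniform(lo, hi)
    return y

def ball_chord(x, i):
    # Chord of the unit Euclidean ball through x along coordinate i.
    rest = sum(v * v for j, v in enumerate(x) if j != i)
    half = math.sqrt(max(0.0, 1.0 - rest))
    return -half, half

x = [0.0, 0.0, 0.0]
for _ in range(1000):
    x = char_step(x, ball_chord)
print(x)  # after many steps, approximately uniform on the ball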

Read more
Data Structures And Algorithms

Coordinate Methods for Matrix Games

We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form min_{x∈X} max_{y∈Y} y^⊤ Ax, which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures leveraging Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators using dynamic sampling distributions that depend on both the iterates and the magnitude of the matrix entries. Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the m×n matrix A. For example, when rows and columns have constant ℓ_1/ℓ_2 norm ratios, we offer improvements by a factor of m+n in the fully stochastic setting and √(m+n) in the variance-reduced setting. We apply our methods to computational geometry problems, namely minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of √(nnz(A)/(m+n)).
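For orientation, the sketch below is a classical fully stochastic multiplicative-weights scheme for a matrix game over two probability simplices (sample a row index from y and a column index from x, then update log-weights); it is a baseline of the kind the paper improves upon, not the paper's method with Taylor-approximation data structures, binomial heaps, or dynamic sampling. The step size, iteration count, and names are assumptions.

import math
import random

def stochastic_matrix_game(A, iters=20000, eta=0.05, rng=random):
    # Fully stochastic multiplicative weights for min_x max_y y^T A x with x, y
    # ranging over probability simplices. Each iteration samples one row index
    # from y and one column index from x to build unbiased gradient estimates.
    m, n = len(A), len(A[0])
    log_x = [0.0] * n
    log_y = [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m

    def softmax(logw):
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        s = sum(w)
        return [v / s for v in w]

    def sample(p):
        r, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    for _ in range(iters):
        x, y = softmax(log_x), softmax(log_y)
        i = sample(y)  # sampled row: A[i] estimates A^T y
        j = sample(x)  # sampled column: A[.][j] estimates A x
        for k in range(n):
            log_x[k] -= eta * A[i][k]   # x moves to decrease the payoff
        for k in range(m):
            log_y[k] += eta * A[k][j]   # y moves to increase the payoff
        for k in range(n):
            avg_x[k] += x[k] / iters
        for k in range(m):
            avg_y[k] += y[k] / iters
    return avg_x, avg_y

# Matching pennies: the value is 0 and the equilibrium plays both actions equally.
A = [[1.0, -1.0], [-1.0, 1.0]]
print(stochastic_matrix_game(A))  # both averaged strategies close to [0.5, 0.5]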

Read more
