Computer Science Data Structures And Algorithms - Researchain

Featured Researches

Locality in Online Algorithms

Online algorithms make decisions based on past inputs. In general, the decision may depend on the entire history of inputs. If many computers run the same online algorithm with the same input stream but are started at different times, they do not necessarily make consistent decisions. In this work we introduce time-local online algorithms. These are online algorithms where the output at a given time only depends on the $T = O(1)$ latest inputs. The use of (deterministic) time-local algorithms in a distributed setting automatically leads to globally consistent decisions. Our key observation is that time-local online algorithms (in which the output at a given time only depends on local inputs in the temporal dimension) are closely connected to local distributed graph algorithms (in which the output of a given node only depends on local inputs in the spatial dimension). This makes it possible to interpret prior work on distributed graph algorithms from the perspective of online algorithms. We describe an algorithm synthesis method that one can use to design optimal time-local online algorithms for small values of $T$. We demonstrate the power of the technique in the context of a variant of the online file migration problem, and show that, for example, for two nodes and unit migration costs there exists a $3$-competitive time-local algorithm with horizon $T = 4$, while no deterministic online algorithm (in the classic sense) can do better. We also derive upper and lower bounds for a more general version of the problem; we show that there is a $6$-competitive deterministic time-local algorithm and a $2.62$-competitive randomized time-local algorithm for any migration cost $\alpha \ge 1$.
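The consistency property described above can be illustrated with a minimal Python sketch. The decision rule here (place the file at the most frequently requested node in the window, ties broken toward the smaller node id) is a hypothetical stand-in chosen for illustration; it is not the synthesized algorithm from the paper and carries no competitive-ratio guarantee. The names `time_local_decision` and `run` are likewise assumptions.

```python
from collections import Counter

def time_local_decision(window):
    """Decide where the file should live, using only the last T requests.

    Hypothetical rule for illustration: choose the most requested node in
    the window, breaking ties toward the smaller node id so that the rule
    is deterministic.
    """
    counts = Counter(window)
    best = max(counts.values())
    return min(node for node, c in counts.items() if c == best)

def run(requests, T, start=0):
    """Simulate a machine that starts reading the stream at index `start`.

    An output for time t exists only once the machine has seen T inputs.
    """
    outputs = {}
    for t in range(start + T - 1, len(requests)):
        outputs[t] = time_local_decision(requests[t - T + 1 : t + 1])
    return outputs

requests = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
early = run(requests, T=4, start=0)  # machine started at the beginning
late = run(requests, T=4, start=3)   # machine started three steps later

# Because each output depends only on the last T inputs, the two machines
# agree at every time step where both have produced an output.
agree = all(early[t] == late[t] for t in late)
```

An algorithm whose output depends on the full history would not generally pass this check, since the late machine has seen a different prefix.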

Data Structures And Algorithms

Localized Topological Simplification of Scalar Data

This paper describes a localized algorithm for the topological simplification of scalar data, an essential pre-processing step of topological data analysis (TDA). Given a scalar field $f$ and a selection of extrema to preserve, the proposed localized topological simplification (LTS) derives a function $g$ that is close to $f$ and only exhibits the selected set of extrema. Specifically, sub- and superlevel set components associated with undesired extrema are first locally flattened and then correctly embedded into the global scalar field, such that these regions are guaranteed -- from a combinatorial perspective -- to no longer contain any undesired extrema. In contrast to previous global approaches, LTS processes only those regions of the domain that actually need to be simplified, and processes them independently, which already results in a noticeable speedup. Moreover, due to the localized nature of the algorithm, LTS can utilize shared-memory parallelism to simplify regions simultaneously with a high parallel efficiency (70%). Hence, LTS significantly improves interactivity for the exploration of simplification parameters and their effect on subsequent topological analysis. For such exploration tasks, LTS brings the overall execution time of a plethora of TDA pipelines from minutes down to seconds, with an average observed speedup over state-of-the-art techniques of up to 36x. Furthermore, in the special case where preserved extrema are selected based on topological persistence, an adapted version of LTS partially computes the persistence diagram and simultaneously simplifies features below a predefined persistence threshold. The effectiveness of LTS, its parallel efficiency, and its resulting benefits for TDA are demonstrated on several simulated and acquired datasets from different application domains, including physics, chemistry, and biomedical imaging.

Data Structures And Algorithms

Longest Common Subsequence in Sublinear Space

We present the first $o(n)$-space polynomial-time algorithm for computing the length of a longest common subsequence. Given two strings of length $n$, the algorithm runs in $O(n^3)$ time with $O\left(\frac{n \log^{1.5} n}{2^{\sqrt{\log n}}}\right)$ bits of space.
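For contrast, the textbook dynamic program for the LCS length can be sketched as below. It is not the algorithm of the paper: it keeps one row of $n+1$ counters, so it uses $O(n \log n)$ bits of space and $O(n^2)$ time, which is the linear-space baseline that the sublinear-space result improves on. The function name `lcs_length` is an assumption made for this sketch.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence via the classic row-by-row DP.

    Baseline only: keeps a single previous row, i.e. O(n) counters of
    O(log n) bits each, not the o(n)-space method described above.
    """
    # Keep the shorter string on the inner axis to minimise the row length.
    if len(b) > len(a):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the best subsequence of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last character of one string or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```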

Data Structures And Algorithms

Low-Congestion Shortcuts for Graphs Excluding Dense Minors

We prove that any $n$-node graph $G$ with diameter $D$ admits shortcuts with congestion $O(\delta D \log n)$ and dilation $O(\delta D)$, where $\delta$ is the maximum edge-density of any minor of $G$. Our proof is simple, elementary, and constructive, featuring a $\tilde{\Theta}(\delta D)$-round distributed construction algorithm. Our results are tight up to $\tilde{O}(1)$ factors and generalize, simplify, unify, and strengthen several prior results. For example, for graphs excluding a fixed minor, i.e., graphs with constant $\delta$, only an $\tilde{O}(D^2)$ bound was known, based on a very technical proof that relies on the Robertson-Seymour Graph Structure Theorem. A direct consequence of our result is that many graph families, including any minor-excluded ones, have near-optimal $\tilde{\Theta}(D)$-round distributed algorithms for many fundamental communication primitives and optimization problems including minimum spanning tree, minimum cut, and shortest-path approximations.

Data Structures And Algorithms

Low-Depth Parallel Algorithms for the Binary-Forking Model without Atomics

The binary-forking model is a parallel computation model, formally defined by Blelloch et al. very recently, in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of $\Theta(\log n)$ to spawn or synchronize $n$ tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model, and the non-constant synchronization cost makes the task challenging. Though the binary-forking model allows the use of atomic test-and-set (TS) instructions to reduce some synchronization overhead, assuming the availability of such instructions puts a stronger requirement on the hardware and may limit the portability of the algorithms using them. In this paper, we avoid the use of locks and atomic instructions in our algorithms, except possibly inside the join operation, which is implemented by the runtime system. We design efficient parallel algorithms in the binary-forking model without atomics for three fundamental problems: Strassen's (and Strassen-like) matrix multiplication (MM), comparison-based sorting, and the Fast Fourier Transform (FFT). All our results improve over known results for the corresponding problem in the binary-forking model both with and without atomics.

Data Structures And Algorithms

MSPP: A Highly Efficient and Scalable Algorithm for Mining Similar Pairs of Points

The closest pair of points problem, or closest pair problem (CPP), is an important problem in computational geometry in which we have to find the pair of points with the smallest distance between them in a set of points in a metric space. This problem arises in a number of applications, including clustering, graph partitioning, image processing, pattern identification, and intrusion detection. For example, in air-traffic control, we must monitor aircraft that come too close together, since this may indicate a possible collision. Numerous algorithms have been presented for solving the CPP. The algorithms that are employed in practice have a worst-case quadratic run time complexity. In this article we present an elegant approximation algorithm for the CPP called MSPP: Mining Similar Pairs of Points. It is faster than the currently best known algorithms while maintaining very good accuracy. The proposed algorithm also detects a set of closely similar pairs of points in Euclidean and Pearson metric spaces and can be adapted to numerous real-world applications, such as clustering, dimension reduction, and constructing and analyzing gene/transcript co-expression networks, among others.
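The quadratic worst case mentioned above is easiest to see in the exhaustive baseline, sketched here in Python for Euclidean distance. This is the brute-force reference method, not MSPP, whose details are not given in this abstract; the function name `closest_pair` is an assumption.

```python
from itertools import combinations
from math import dist

def closest_pair(points):
    """Return the pair of points at the smallest Euclidean distance.

    Exhaustive baseline: examines all n*(n-1)/2 pairs, hence quadratic time.
    Requires at least two points.
    """
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

points = [(0, 0), (5, 5), (1, 1), (9, 0)]
p, q = closest_pair(points)
```

An approximation algorithm trades the exactness of this scan for speed, so a baseline like this one is the natural yardstick for measuring accuracy.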

Data Structures And Algorithms

Maximizing Agreements for Ranking, Clustering and Hierarchical Clustering via MAX-CUT

In this paper, we study a number of well-known combinatorial optimization problems that fit in the following paradigm: the input is a collection of (potentially inconsistent) local relationships between the elements of a ground set (e.g., pairwise comparisons, similar/dissimilar pairs, or ancestry structure of triples of points), and the goal is to aggregate this information into a global structure (e.g., a ranking, a clustering, or a hierarchical clustering) in a way that maximizes agreement with the input. Well-studied problems such as rank aggregation, correlation clustering, and hierarchical clustering with triplet constraints fall in this class of problems. We study these problems on stochastic instances with a hidden embedded ground truth solution. Our main algorithmic contribution is a unified technique that uses the maximum cut problem in graphs to approximately solve these problems. Using this technique, we can often get approximation guarantees in the stochastic setting that are better than the known worst case inapproximability bounds for the corresponding problem. On the negative side, we improve the worst case inapproximability bound on several hierarchical clustering formulations through a reduction to related ranking problems.

Data Structures And Algorithms

Maximizing approximately k-submodular functions

We introduce the problem of maximizing approximately $k$-submodular functions subject to size constraints. In this problem, one seeks to select $k$ disjoint subsets of a ground set with bounded total size or individual sizes, and maximum utility, given by a function that is "close" to being $k$-submodular. The problem finds applications in tasks such as sensor placement, where one wishes to install $k$ types of sensors whose measurements are noisy, and influence maximization, where one seeks to advertise $k$ topics to users of a social network whose level of influence is uncertain. To deal with the problem, we first provide two natural definitions for approximately $k$-submodular functions and establish a hierarchical relationship between them. Next, we show that simple greedy algorithms offer approximation guarantees for different types of size constraints. Last, we demonstrate experimentally that the greedy algorithms are effective in sensor placement and influence maximization problems.
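A greedy scheme of the kind mentioned above can be sketched as follows for the total-size constraint. This is a generic greedy over (element, type) pairs written for illustration, not necessarily the exact variant analysed in the paper; the function `greedy_k_assign`, the toy `covers` table, and the coverage-style `utility` are all assumptions.

```python
def greedy_k_assign(ground, k, budget, utility):
    """Greedily assign at most `budget` elements to one of k types each.

    `utility` maps a dict {element: type} to a number. In every round the
    (element, type) pair with the largest marginal gain is added.
    """
    assignment = {}
    for _ in range(budget):
        base = utility(assignment)
        best_gain, best_choice = None, None
        for e in ground:
            if e in assignment:
                continue  # the k subsets must stay disjoint
            for i in range(k):
                gain = utility({**assignment, e: i}) - base
                if best_gain is None or gain > best_gain:
                    best_gain, best_choice = gain, (e, i)
        if best_choice is None:
            break  # every element is already assigned
        assignment[best_choice[0]] = best_choice[1]
    return assignment

# Toy instance (hypothetical): placing sensor type i at location e
# observes the items in covers[(e, i)].
covers = {
    ("a", 0): {1, 2, 3}, ("a", 1): {1},
    ("b", 0): {3},       ("b", 1): {4, 5},
    ("c", 0): {2},       ("c", 1): {6},
}

def utility(assignment):
    """Number of distinct items observed by the chosen placements."""
    return len(set().union(*(covers[(e, i)] for e, i in assignment.items())))

result = greedy_k_assign(["a", "b", "c"], k=2, budget=2, utility=utility)
```

With a noisy utility, the same loop runs unchanged; only the quality of the guarantee depends on how close the function is to being $k$-submodular.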

Data Structures And Algorithms

Maximum Coverage in the Data Stream Model: Parameterized and Generalized

We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems consists of $m$ subsets of a universe of size $n$ and a value $k \in [m]$. In Max-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are as follows. If the sets have size at most $d$, there exist single-pass algorithms using $\tilde{O}(d^{d+1} k^d)$ space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant $d$. If each element appears in at most $r$ sets, we present single-pass algorithms using $\tilde{O}(k^2 r/\epsilon^3)$ space that return a $1+\epsilon$ approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., $\tilde{O}(k^3 r/\epsilon^4)$ space, that $1+\epsilon$ approximates Max-Unique-Cover. In contrast to the above results, when $d$ and $r$ are arbitrary, any constant-pass $1+\epsilon$ approximation algorithm for either problem requires $\Omega(\epsilon^{-2} m)$ space, but a single-pass $O(\epsilon^{-2} m k)$ space algorithm exists. In fact, any constant-pass algorithm with an approximation better than $e/(e-1)$ and $e^{1-1/k}$ for Max-Cover and Max-Unique-Cover, respectively, requires $\Omega(m/k^2)$ space when $d$ and $r$ are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem.
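The two objectives defined above differ only in how an element covered more than once is counted, which a small offline Python sketch makes concrete. The exhaustive search is exponential and stores the whole input, so it is not one of the streaming algorithms from the paper; all function names and the example instance are assumptions.

```python
from collections import Counter
from itertools import combinations

def coverage(chosen):
    """Max-Cover objective: elements covered by at least one chosen set."""
    return len(set().union(*chosen))

def unique_coverage(chosen):
    """Max-Unique-Cover objective: elements covered by exactly one chosen set."""
    counts = Counter(x for s in chosen for x in s)
    return sum(1 for c in counts.values() if c == 1)

def best_by_enumeration(sets, k, objective):
    """Try every collection of at most k sets and keep the best one.

    Offline and exponential in k; useful only as a reference on tiny inputs.
    """
    candidates = (c for size in range(k + 1) for c in combinations(sets, size))
    return max(candidates, key=objective)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 2, 3, 4}]
best_cover = best_by_enumeration(sets, k=2, objective=coverage)
best_unique = best_by_enumeration(sets, k=2, objective=unique_coverage)
```

On the pair `{1, 2, 3}` and `{3, 4}`, for instance, element 3 counts toward coverage but not toward unique coverage, so the two objectives give 4 and 3.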

Data Structures And Algorithms

Maximum Weight Disjoint Paths in Outerplanar Graphs via Single-Tree Cut Approximators

Since 1997 there has been a steady stream of advances for the maximum disjoint paths problem. Achieving tractable results has usually required focusing on relaxations such as: (i) to allow some bounded edge congestion in solutions, (ii) to only consider the unit weight (cardinality) setting, (iii) to only require fractional routability of the selected demands (the all-or-nothing flow setting). For the general form (no congestion, general weights, integral routing) of edge-disjoint paths (EDP), even the case of unit capacity trees which are stars generalizes the maximum matching problem, for which Edmonds provided an exact algorithm. For general capacitated trees, Garg, Vazirani, and Yannakakis showed the problem is APX-hard, and Chekuri, Mydlarz, and Shepherd provided a $4$-approximation. This is essentially the only setting where a constant approximation is known for the general form of EDP. We extend their result by giving a constant-factor approximation algorithm for general-form EDP in outerplanar graphs. A key component of the algorithm is to find a single-tree $O(1)$ cut approximator for outerplanar graphs. Previously, $O(1)$ cut approximators were only known via distributions on trees, and these were based implicitly on the results of Gupta, Newman, Rabinovich and Sinclair for distance tree embeddings combined with results of Anderson and Feige.
