Featured Research

Data Structures And Algorithms

A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

The hierarchical overlap graph (HOG) is a graph that encodes overlaps from a given set P of n strings, as the overlap graph does. The best known algorithm constructs the HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of the lengths of the strings in P. In this paper we present a new algorithm that constructs the HOG in O(||P||) time and space. Hence, the construction time and space of the HOG are better than those of the overlap graph, which are O(||P|| + n^2).
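
To make concrete what the (hierarchical) overlap graph encodes, here is a minimal, quadratic-time Python sketch that computes the longest suffix-prefix overlap for every ordered pair of strings. The function names and the toy string set are invented for illustration, and this is not the paper's O(||P||)-time HOG construction.

```python
# Naive illustration of the pairwise overlaps that an overlap graph (and hence a
# HOG) encodes. This quadratic-time sketch is NOT the paper's O(||P||)
# construction; names and the example set P are made up.

def longest_overlap(u: str, v: str) -> int:
    """Length of the longest string that is a proper suffix of u and a prefix of v."""
    for k in range(min(len(u) - 1, len(v)), 0, -1):
        if u[len(u) - k:] == v[:k]:
            return k
    return 0

def overlap_graph(strings):
    """Map each ordered pair (u, v), u != v, to its longest suffix-prefix overlap."""
    return {(u, v): longest_overlap(u, v)
            for u in strings for v in strings if u != v}

if __name__ == "__main__":
    P = ["aacaa", "caaca", "acaac"]
    for (u, v), k in sorted(overlap_graph(P).items()):
        print(f"{u} -> {v}: overlap length {k}")
```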

Read more
Data Structures And Algorithms

A Neighborhood-preserving Graph Summarization

In this paper we introduce a new summarization method for large graphs. Our approach retains only a user-specified proportion of the neighbors of each node in the graph. The main aim is to simplify large graphs so that they can be analyzed and processed effectively, while preserving as many of the node neighborhood properties as possible. Since many graph algorithms are based on the neighborhood information available for each node, the idea is to produce a smaller graph that allows these algorithms to handle large graphs and run faster while providing good approximations. Moreover, our compression lets users control the size of the compressed graph by adjusting the amount of information loss that can be tolerated. Experiments conducted on various real and synthetic graphs show that our compression considerably reduces the size of the graphs. We also conducted several experiments on the obtained summaries using various graph algorithms and applications, such as node embedding, graph classification, and shortest path approximation. The results show interesting trade-offs between algorithm runtime speed-up and precision loss.
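
A toy sketch of the core idea, keeping only a user-specified proportion of each node's neighbors, is given below. The uniform random choice of which neighbors to keep, the `alpha` parameter name, and the dictionary-of-sets graph representation are assumptions made purely for illustration; they are not the paper's selection criterion.

```python
import random

# Toy sketch of neighborhood-preserving summarization: keep only a fraction
# `alpha` of each node's neighbors. Which neighbors to keep is the crux of the
# paper; here they are sampled uniformly at random, an assumption made only to
# keep the sketch short.

def summarize(adj: dict, alpha: float, seed: int = 0) -> dict:
    rng = random.Random(seed)
    kept = {u: set() for u in adj}
    for u, neighbors in adj.items():
        k = max(1, round(alpha * len(neighbors))) if neighbors else 0
        for v in rng.sample(sorted(neighbors), k):
            kept[u].add(v)
            kept[v].add(u)          # keep the summary undirected
    return kept

if __name__ == "__main__":
    g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
    print(summarize(g, alpha=0.5))
```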

Read more
Data Structures And Algorithms

A New Approach to Capacity Scaling Augmented With Unreliable Machine Learning Predictions

Modern data centers suffer from immense power consumption. The erratic behavior of internet traffic forces data centers to maintain excess capacity in the form of idle servers in case the workload suddenly increases. As an idle server still consumes a significant fraction of the peak energy, data center operators have heavily invested in capacity scaling solutions. In simple terms, these aim to deactivate servers when demand is low and to activate them again when the workload increases. To do so, an algorithm needs to strike a delicate balance between power consumption, flow-time, and switching costs. Over the last decade, the research community has developed competitive online algorithms with worst-case guarantees. In the presence of historic data patterns, prescriptions from Machine Learning (ML) predictions typically outperform such competitive algorithms. This, however, comes at the cost of sacrificing robustness, since unpredictable surges in the workload are not uncommon. The current work builds on the emerging paradigm of augmenting unreliable ML predictions with online algorithms to develop novel robust algorithms that enjoy the benefits of both worlds. We analyze a continuous-time model for capacity scaling, where the goal is to minimize the weighted sum of flow-time, switching cost, and power consumption in an online fashion. We propose a novel algorithm, called Adaptive Balanced Capacity Scaling (ABCS), that has access to black-box ML predictions but is completely oblivious to the accuracy of these predictions. In particular, if the predictions turn out to be accurate in hindsight, we prove that ABCS is (1+ε)-competitive. Moreover, even when the predictions are inaccurate, ABCS guarantees a bounded competitive ratio. The performance of the ABCS algorithm on a real-world dataset positively supports the theoretical results.
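
The toy, discrete-time sketch below is only meant to make the objective concrete: it prices power, flow-time, and switching cost, and compares a naive prediction-following policy against a naive reactive one. The weights, demand trace, and both policies are invented for illustration; neither policy is the ABCS algorithm.

```python
# Toy discrete-time sketch of the capacity-scaling objective described above:
# power for active servers + flow-time penalty when demand exceeds capacity +
# switching cost for turning servers on/off. The two policies are illustrative
# baselines only; they are NOT the ABCS algorithm from the paper.

def total_cost(capacity, demand, power=1.0, delay=5.0, switch=2.0):
    cost, prev = 0.0, 0
    for c, d in zip(capacity, demand):
        cost += power * c                    # energy for active servers
        cost += delay * max(0, d - c)        # queueing / flow-time penalty
        cost += switch * abs(c - prev)       # cost of (de)activating servers
        prev = c
    return cost

def follow_prediction(predicted):
    return list(predicted)                   # trust the ML forecast completely

def lazy_online(demand, step=1):
    cap, plan = 0, []
    for d in demand:                         # react slowly, ignore predictions
        cap += step if d > cap else (-step if d < cap else 0)
        plan.append(max(cap, 0))
    return plan

if __name__ == "__main__":
    demand    = [2, 3, 8, 8, 2, 1, 6, 6, 2]
    predicted = [2, 3, 7, 8, 2, 1, 2, 2, 2]  # forecast misses the second surge
    print("prediction-following:", total_cost(follow_prediction(predicted), demand))
    print("lazy online:         ", total_cost(lazy_online(demand), demand))
```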

Read more
Data Structures And Algorithms

A Normal Sequence Compressed by PPM* but not by Lempel-Ziv 78

In this paper we compare the performance of two compressors from the Prediction by Partial Matching (PPM) family (PPM* and the original Bounded PPM algorithm) with that of the Lempel-Ziv 78 (LZ) algorithm. We construct an infinite binary sequence whose worst-case compression ratio for PPM* is 0, while Bounded PPM's and LZ's best-case compression ratios are at least 1/2 and 1, respectively. This sequence is an enumeration of all binary strings in order of length, i.e., all strings of length 1 followed by all strings of length 2, and so on. It is therefore normal, and it is built using repetitions of de Bruijn strings of increasing order.
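
The sequence itself is easy to generate from the description above (all binary strings concatenated in order of length). The short generator below reproduces only that enumeration; it does not follow the paper's de Bruijn-string construction.

```python
from itertools import count, islice, product

# Generator for the sequence described above: every binary string of length 1,
# then every string of length 2, and so on, concatenated into one infinite
# binary sequence.

def enumeration_sequence():
    for n in count(1):
        for bits in product("01", repeat=n):
            yield from bits

if __name__ == "__main__":
    prefix = "".join(islice(enumeration_sequence(), 30))
    print(prefix)   # 0 1 00 01 10 11 000 001 ... (spaces added for readability)
```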

Read more
Data Structures And Algorithms

A Note on a Recent Algorithm for Minimum Cut

Given an undirected edge-weighted graph G = (V, E) with m edges and n vertices, the minimum cut problem asks to find a subset of vertices S such that the total weight of all edges between S and V∖S is minimized. Karger's longstanding O(m log^3 n) time randomized algorithm for this problem was very recently improved in two independent works to O(m log^2 n) [ICALP'20] and to O(m log^2 n + n log^5 n) [STOC'20]. These two algorithms use different approaches and techniques. In particular, while the former is faster, the latter has the advantage that it can be used to obtain efficient algorithms in the cut-query and in the streaming models of computation. In this paper, we show how to simplify and improve the algorithm of [STOC'20] to O(m log^2 n + n log^3 n). We obtain this by replacing a randomized algorithm that, given a spanning tree T of G, finds in O(m log n + n log^4 n) time a minimum cut of G that 2-respects (cuts two edges of) T with a simple O(m log n + n log^2 n) time deterministic algorithm for the same problem.
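
To make the notion of a cut that 2-respects a spanning tree concrete, the brute-force sketch below enumerates every cut crossing at most two edges of a given spanning tree T and returns the lightest one. It uses networkx for convenience and is far slower than the paper's near-linear-time algorithm; it only illustrates the definition.

```python
import itertools
import networkx as nx

# Brute-force illustration of "2-respecting": given a spanning tree T of G,
# enumerate every cut (S, V \ S) that cuts at most two edges of T and return
# the lightest one. This is NOT the paper's algorithm.

def min_2_respecting_cut(G: nx.Graph, T: nx.Graph):
    best = (float("inf"), None)
    tree_edges = list(T.edges())
    # Removing one or two tree edges splits T into 2 or 3 components; any union
    # of those components defines a side S whose cut crosses at most the
    # removed tree edges.
    for r in (1, 2):
        for removed in itertools.combinations(tree_edges, r):
            T2 = T.copy()
            T2.remove_edges_from(removed)
            comps = [set(c) for c in nx.connected_components(T2)]
            for k in range(1, len(comps)):
                for chosen in itertools.combinations(comps, k):
                    S = set().union(*chosen)
                    w = sum(d.get("weight", 1)
                            for u, v, d in G.edges(data=True)
                            if (u in S) != (v in S))
                    if 0 < w < best[0]:
                        best = (w, S)
    return best

if __name__ == "__main__":
    G = nx.cycle_graph(6)                      # unit-weight 6-cycle: min cut = 2
    T = nx.minimum_spanning_tree(G)
    weight, side = min_2_respecting_cut(G, T)
    print("lightest 2-respecting cut:", weight, "side:", side)
```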

Read more
Data Structures And Algorithms

A Novel Method for Inference of Acyclic Chemical Compounds with Bounded Branch-height Based on Artificial Neural Networks and Integer Programming

Analysis of chemical graphs is a major research topic in computational molecular biology due to its potential applications to drug design. One approach is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a framework has been proposed for inverse QSAR/QSPR using artificial neural networks (ANNs) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector f(G) of a chemical graph G is introduced and a prediction function ψ on a chemical property π is constructed with an ANN. In the second phase, given a target value y* of property π, a feature vector x* is inferred by solving an MILP formulated from the trained ANN so that ψ(x*) is close to y*, and then a set of chemical structures G* such that f(G*) = x* is enumerated by a graph search algorithm. The framework has been applied to the case of chemical compounds with cycle index up to 2. Computational results on instances with n non-hydrogen atoms show that a feature vector x* can be inferred for up to around n = 40, whereas graphs G* can be enumerated for up to n = 15. When applied to the case of acyclic chemical graphs, the maximum computable diameter of G* was up to around 8. We introduce a new characterization of graph structure, "branch-height," based on which an MILP formulation and a graph search algorithm are designed for acyclic chemical graphs. The results of computational experiments using properties such as octanol/water partition coefficient, boiling point, and heat of combustion suggest that the proposed method can infer acyclic chemical graphs G* with n = 50 and diameter 30.
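
A heavily simplified sketch of the inverse-prediction phase is shown below: a hand-made linear predictor stands in for the trained ANN, and PuLP is used to search for an integer feature vector whose predicted value is close to a target y*. The feature names, bounds, weights, and target value are all invented for illustration; the paper's MILP encodes a real ReLU network and chemical-graph constraints.

```python
import pulp

# Toy sketch of the "inverse prediction" phase: given a predictor psi and a
# target value y*, search for an integer feature vector x* with psi(x*) close
# to y*. Here psi is an assumed LINEAR function standing in for the trained
# ANN, and all names/bounds are illustrative.

weights = {"num_C": 0.8, "num_O": -0.5, "num_branches": 1.2}   # assumed predictor
bias, y_target = 2.0, 12.0

prob = pulp.LpProblem("inverse_prediction", pulp.LpMinimize)
x = {f: pulp.LpVariable(f, lowBound=0, upBound=50, cat="Integer") for f in weights}
err = pulp.LpVariable("abs_error", lowBound=0)

pred = pulp.lpSum(weights[f] * x[f] for f in weights) + bias
prob += err                                  # minimise |psi(x) - y*|
prob += pred - y_target <= err
prob += y_target - pred <= err

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({f: int(v.value()) for f, v in x.items()}, "error:", err.value())
```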

Read more
Data Structures And Algorithms

A Potential Reduction Inspired Algorithm for Exact Max Flow in Almost Õ(m^(4/3)) Time

We present an algorithm for computing s-t maximum flows in directed graphs in Õ(m^(4/3+o(1)) U^(1/3)) time. Our algorithm is inspired by potential reduction interior point methods for linear programming. Instead of using scaled gradient/Newton steps of a potential function, we take the step which maximizes the decrease in the potential value subject to advancing a certain amount on the central path, which can be computed efficiently. This allows us to trace the central path with our progress depending only on ℓ∞ norm bounds on the congestion vector (as opposed to the ℓ4 norm required by previous works), and runs in O(√m) iterations. To improve the number of iterations by establishing tighter bounds on the ℓ∞ norm, we then consider the weighted central path framework of Madry [M13, M16, CMSV17] and Liu-Sidford [LS20]. Instead of changing weights to maximize energy, we consider finding weights which maximize the maximum decrease in potential value. Finally, similar to finding weights which maximize energy as done in [LS20], this problem can be solved by the iterative refinement framework for smoothed ℓ2-ℓp norm flow problems [KPSW19], completing our algorithm. We believe our potential reduction based viewpoint provides a versatile framework which may lead to faster algorithms for max flow.

Read more
Data Structures And Algorithms

A Refined Analysis of Submodular Greedy

Many algorithms for maximizing a monotone submodular function subject to a knapsack constraint rely on the natural greedy heuristic. We present a novel refined analysis of this greedy heuristic which enables us to: (1) reduce the enumeration in the tight (1 − e^(−1))-approximation of [Sviridenko 04] from subsets of size three to two; (2) present an improved upper bound of 0.42945 for the classic algorithm which returns the better between a single element and the output of the greedy heuristic.
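
For reference, a minimal sketch of the classic heuristic mentioned in (2) follows: greedy selection by marginal gain per unit cost under the budget, returning the better of the greedy solution and the best single feasible element. Set coverage serves as an example monotone submodular function, and the instance is made up.

```python
# Minimal sketch of the classic heuristic from (2): natural greedy by
# marginal-gain-per-unit-cost under a knapsack (budget) constraint, returning
# the better of its output and the best single feasible element. The coverage
# objective and instance are illustrative only.

def greedy_or_single(sets, costs, budget):
    chosen, covered, spent = [], set(), 0.0
    while True:
        best_i, best_ratio = None, 0.0
        for i in range(len(sets)):
            if i in chosen or spent + costs[i] > budget:
                continue
            gain = len(sets[i] - covered)
            if gain > 0 and gain / costs[i] > best_ratio:
                best_i, best_ratio = i, gain / costs[i]
        if best_i is None:
            break
        chosen.append(best_i)
        covered |= sets[best_i]
        spent += costs[best_i]
    # compare with the best single feasible element
    singles = [i for i in range(len(sets)) if costs[i] <= budget]
    best_single = max(singles, key=lambda i: len(sets[i]), default=None)
    if best_single is not None and len(sets[best_single]) > len(covered):
        return [best_single]
    return chosen

if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
    costs = [2.0, 1.0, 3.0, 1.0]
    print(greedy_or_single(sets, costs, budget=4.0))
```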

Read more
Data Structures And Algorithms

A Simple Deterministic Algorithm for Edge Connectivity

We show a deterministic algorithm for computing the edge connectivity of a simple graph with m edges in m^(1+o(1)) time. Although the fastest deterministic algorithm, by Henzinger, Rao, and Wang [SODA'17], has a faster running time of O(m log^2 m log log m), we believe that our algorithm is conceptually simpler. The key tool for this simplification is the expander decomposition. We exploit it in a very straightforward way compared to how it has previously been used in the literature.
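
As a point of reference only, the snippet below computes the edge connectivity of a small simple graph with networkx's built-in routine; it shows the quantity being computed, not the paper's expander-decomposition-based algorithm.

```python
import networkx as nx

# Baseline only: edge connectivity of a small simple graph via networkx.
# This is NOT the paper's algorithm, just the quantity it computes.

G = nx.petersen_graph()                 # 3-regular simple graph
print(nx.edge_connectivity(G))          # -> 3
```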

Read more
Data Structures And Algorithms

A Simple Sublinear Algorithm for Gap Edit Distance

We study the problem of estimating the edit distance between two n-character strings. While exact computation in the worst case is believed to require near-quadratic time, previous work showed that in certain regimes it is possible to solve the following "gap edit distance" problem in sub-linear time: distinguish between inputs of distance ≤ k and > k^2. Our main result is a very simple algorithm for this benchmark that runs in time Õ(n/√k), and in particular settles the open problem of obtaining a truly sublinear time for the entire range of relevant k. Building on the same framework, we also obtain a k-vs-k^2 algorithm for the one-sided preprocessing model with Õ(n) preprocessing time and Õ(n/k) query time (improving over a recent Õ(n/k + k^2)-query time algorithm for the same problem [GRS'20]).
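
The paper's algorithm is sublinear and randomized and is not reproduced here. To make the k-vs-k^2 gap question concrete, the sketch below is a standard O(nk)-time banded dynamic program (in the spirit of Ukkonen's method) that decides whether the edit distance is at most k.

```python
# Standard O(n*k) banded dynamic program deciding whether the edit distance of
# two strings is at most k. This makes the "distance <= k vs > k^2" question
# concrete, but it is NOT the paper's sublinear-time algorithm.

def edit_distance_at_most(a: str, b: str, k: int) -> bool:
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return False
    INF = k + 1                                  # any value > k behaves the same
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,      # substitute / match
                         prev[j] + 1,             # delete from a
                         cur[j - 1] + 1,          # insert into a
                         INF)                     # cap values outside the band
        prev = cur
    return prev[m] <= k

if __name__ == "__main__":
    print(edit_distance_at_most("kitten", "sitting", 3))   # True  (distance 3)
    print(edit_distance_at_most("kitten", "sitting", 2))   # False (distance 3)
```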

Read more
