Featured Research

Data Structures And Algorithms

Improved approximation schemes for early work scheduling on identical parallel machines with common due date

We study the early work scheduling problem on identical parallel machines, where the goal is to maximize the total early work, i.e., the parts of non-preemptive jobs executed before a common due date. By preprocessing the input and constructing an auxiliary instance with several useful properties, we obtain an efficient polynomial time approximation scheme with running time O(n), improving the result of [Györgyi, P., Kis, T. (2020). A common approximation framework for early work, late work, and resource leveling problems. European Journal of Operational Research, 286(1), 129-137]. When the number of machines is fixed, we further obtain a fully polynomial time approximation scheme with running time O(n), improving the result of [Chen, X., Liang, Y., Sterna, M., Wang, W., Błażewicz, J. (2020). Fully polynomial time approximation scheme to maximize early work on parallel machines with common due date. European Journal of Operational Research, 284(1), 67-74]. Here n is the number of jobs, and the hidden constant depends on the desired accuracy.
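
To illustrate the objective function (not the approximation schemes themselves), here is a minimal Python sketch that computes the total early work of a given assignment of jobs to machines; all names are hypothetical:

```python
def total_early_work(machine_jobs, due_date):
    """Total early work: on each machine, jobs run back to back
    (non-preemptively), and only the parts finished before the
    common due date count.

    machine_jobs: one list of job processing times per machine,
    in execution order.
    """
    early = 0
    for jobs in machine_jobs:
        t = 0  # current completion time on this machine
        for p in jobs:
            # part of this job executed before the due date
            early += max(0, min(t + p, due_date) - t)
            t += p
    return early

# Two machines, common due date 10: machine 0 runs jobs 4, 5, 3;
# its third job contributes only 1 unit of early work (t = 9 to 10).
print(total_early_work([[4, 5, 3], [6, 6]], 10))  # 10 + 10 = 20
```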

Data Structures And Algorithms

Improved lower and upper bounds on the tile complexity of uniquely self-assembling a thin rectangle non-cooperatively in 3D

We investigate a fundamental question regarding a benchmark class of shapes in one of the simplest, yet most widely utilized abstract models of algorithmic tile self-assembly. Specifically, we study the directed tile complexity of a k×N thin rectangle in Winfree's abstract Tile Assembly Model, assuming that cooperative binding cannot be enforced (temperature-1 self-assembly) and that tiles may be placed at most one step into the third dimension (just-barely 3D). While the directed tile complexities of a square and of a scaled-up version of any algorithmically specified shape at temperature 1 in just-barely 3D are both asymptotically the same as they are (respectively) at temperature 2 in 2D, the bounds on the directed tile complexity of a thin rectangle at temperature 2 in 2D are not known to hold at temperature 1 in just-barely 3D. Motivated by this discrepancy, we establish new lower and upper bounds on the directed tile complexity of a thin rectangle at temperature 1 in just-barely 3D. We develop a new, more powerful type of Window Movie Lemma that lets us upper bound the number of "sufficiently similar" ways to assign glues to a set of fixed locations. Consequently, our lower bound, Ω(N^{1/k}), is an asymptotic improvement over the previous best lower bound and is more aesthetically pleasing, since it eliminates the factor of k that used to divide N^{1/k}. The proof of our upper bound is based on a just-barely 3D, temperature-1 counter, organized according to "digit regions", which affords roughly fifty percent more digits for the same target rectangle compared to the previous best counter. This increase in digit density yields an upper bound of O(N^{1/⌊k/2⌋} + log N), which is an asymptotic improvement over the previous best upper bound and roughly the square of our lower bound.
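
To see in what sense the upper bound is roughly the square of the lower bound, ignore the additive log N term and square the lower bound:

(N^{1/k})^2 = N^{2/k} ≈ N^{1/⌊k/2⌋},

and for even k the exponents 2/k and 1/⌊k/2⌋ coincide exactly.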

Data Structures And Algorithms

Improving Run Length Encoding by Preprocessing

The Run Length Encoding (RLE) compression method is a long-standing, simple lossless compression scheme which is easy to implement and achieves good compression on input data that contains repeating consecutive symbols. In its pure form, RLE is not applicable to natural text or other input data with short sequences of identical symbols. We present a combination of preprocessing steps that turn arbitrary byte-wise input data into a bit-string which is highly suitable for RLE compression. The main idea is to first read the most significant bit of every byte of the input, then the second most significant bit of every byte, and so on. We combine this approach with a dynamic byte remapping as well as a Burrows-Wheeler-Scott transform on the byte level. Finally, we apply a Huffman encoding to the output of the bit-wise RLE to allow for more flexible code word lengths for the encoded runs. With our technique we achieve lossless compression that is better than standard RLE by a factor of 8 on average.
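
The bit-reordering step can be made precise with a small Python sketch; it implements only the bit-plane transposition and a bit-wise run-length encoding, omitting the dynamic byte remapping, the Burrows-Wheeler-Scott transform, and the final Huffman stage, and all names are illustrative:

```python
def bit_planes(data: bytes) -> str:
    """Read the most significant bit of every byte, then the second
    most significant bit of every byte, and so on, producing one long
    bit-string in which long runs become likely."""
    return "".join(
        str((byte >> (7 - plane)) & 1)
        for plane in range(8)
        for byte in data
    )

def rle_bits(bits: str) -> list[tuple[str, int]]:
    """Bit-wise run-length encoding: (bit value, run length) pairs."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

# ASCII text keeps the top bit plane constant at 0, so the
# transposed bit-string starts with a long run of zeros.
print(rle_bits(bit_planes(b"hello world"))[:4])
```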

Data Structures And Algorithms

Inapproximability of Diameter in super-linear time: Beyond the 5/3 ratio

We show, assuming the Strong Exponential Time Hypothesis, that for every ε > 0, approximating directed Diameter on m-arc graphs within ratio 7/4 − ε requires m^{4/3 − o(1)} time. Our construction uses nonnegative edge weights, but the bound holds even for sparse digraphs, i.e., digraphs in which the number of vertices n and the number of arcs m satisfy m = n·log^{O(1)} n. This is the first result that conditionally rules out a near-linear time 5/3-approximation for Diameter.
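
For concreteness, an algorithm approximates Diameter within ratio α if, on a graph of diameter D, it outputs an estimate D̂ with D/α ≤ D̂ ≤ D. The result above therefore rules out, under SETH, any algorithm running in m^{4/3 − δ} time for constant δ > 0 (in particular, in near-linear time) with approximation ratio below 7/4.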

Data Structures And Algorithms

Inference under Information Constraints III: Local Privacy Constraints

We study goodness-of-fit and independence testing of discrete distributions in a setting where samples are distributed across multiple users. The users wish to preserve the privacy of their data while enabling a central server to perform the tests. Under the notion of local differential privacy, we propose simple, sample-optimal, and communication-efficient protocols for these two questions in the noninteractive setting, where users may or may not additionally share a common random seed. In particular, we show that the availability of shared (public) randomness greatly reduces the sample complexity. Underlying our public-coin protocols are privacy-preserving mappings which, when applied to the samples, minimally contract the distance between their respective probability distributions.
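
As background, the simplest example of such a privacy-preserving mapping is k-ary randomized response; the Python sketch below shows this standard mechanism, which is illustrative only and not necessarily the mapping used in the paper:

```python
import math
import random

def randomized_response(x: int, k: int, eps: float) -> int:
    """k-ary randomized response: report the true symbol x (from
    {0, ..., k-1}) with probability e^eps / (e^eps + k - 1),
    otherwise report a uniformly random other symbol. This
    satisfies eps-local differential privacy."""
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return x
    # choose uniformly among the k - 1 other symbols
    y = random.randrange(k - 1)
    return y if y < x else y + 1

# Each user privatizes their own sample before sending it.
reports = [randomized_response(x, k=10, eps=1.0) for x in [3] * 5]
print(reports)
```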

Data Structures And Algorithms

Information Theoretic Limits of Cardinality Estimation: Fisher Meets Shannon

Estimating the cardinality (number of distinct elements) of a large multiset is a classic problem in streaming and sketching, dating back to Flajolet and Martin's Probabilistic Counting (PCSA) algorithm from 1983. In this paper we study the intrinsic tradeoff between the space complexity of the sketch and its estimation error in the random oracle model. We define a new measure of efficiency for cardinality estimators called the Fisher-Shannon (Fish) number H/I. It captures the tension between the limiting Shannon entropy (H) of the sketch and its normalized Fisher information (I), which characterizes the variance of a statistically efficient, asymptotically unbiased estimator. Our results are as follows. We prove that all base-q variants of Flajolet and Martin's PCSA sketch have Fish number H_0/I_0 ≈ 1.98016, and that every base-q variant of (Hyper)LogLog has Fish number worse than H_0/I_0 but tending to H_0/I_0 in the limit as q → ∞; here H_0 and I_0 are precisely defined constants. We describe a sketch called Fishmonger that is based on a smoothed, entropy-compressed variant of PCSA with a different estimator function. We prove that, with high probability, Fishmonger processes a multiset of [U] such that at all times its space is O(log² log U) + (1 + o(1))(H_0/I_0)b ≈ 1.98b bits and its standard error is 1/√b. We give circumstantial evidence that H_0/I_0 is the optimum Fish number of mergeable sketches for cardinality estimation: we define a class of linearizable sketches and prove that no member of this class can beat H_0/I_0, and the popular mergeable sketches are, in fact, also linearizable.
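
To make the space-error tradeoff concrete using only the numbers stated above: a target standard error of 1% means 1/√b = 0.01, i.e., b = 10^4, so a Fishmonger sketch occupies roughly 1.98 × 10^4 bits ≈ 2.4 KB, plus the lower-order O(log² log U) term.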

Data Structures And Algorithms

Inset Edges Effect and Average Distance of Trees

An edge added to a graph is called an inset edge. Finding k inset edges that minimize the average distance of a graph is known to be NP-hard, while for k = 1 the problem is polynomial. In this paper we go further and find the single inset edge(s) of a tree whose effect on the average distance is closest to a given target value. To do so, we need the effect of each possible inset edge on the average distance. For this we propose an algorithm whose time complexity lies between O(m) and O(m√m), with an average below O(m·log m), where m is the number of possible inset edges. Afterwards, it takes at most O(log m) time to find the inset edges matching a desired change of the average distance. The algorithm strictly avoids recalculating the distances that do not change after adding a new edge to the tree, and it reduces the cost of computing the remaining distances using the matrix tools first introduced in [8] together with one additional technique. The resulting running time therefore depends on the input tree and is proportional to its Wiener index.
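
To make the quantity concrete, the brute-force sketch below computes the effect of every single inset edge on the average distance of a tree by recomputing all pairwise distances; the paper's algorithm avoids exactly this recomputation. All names are illustrative.

```python
from collections import deque
from itertools import combinations

def avg_distance(adj):
    """Average distance over all vertex pairs, by BFS from each vertex."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # total sums over ordered pairs, of which there are n(n-1)
    return total / (n * (n - 1))

# A path on 5 vertices: 0-1-2-3-4.
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
base = avg_distance(tree)

# Effect of every possible inset edge (non-adjacent vertex pair).
for u, v in combinations(tree, 2):
    if v not in tree[u]:
        tree[u].append(v); tree[v].append(u)
        print((u, v), round(avg_distance(tree) - base, 4))
        tree[u].remove(v); tree[v].remove(u)
```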

Data Structures And Algorithms

Interactive Inference under Information Constraints

We study the role of interactivity in distributed statistical inference under information constraints, e.g., communication constraints and local differential privacy. We focus on the tasks of goodness-of-fit testing and estimation of discrete distributions. From prior work, these tasks are well understood under noninteractive protocols. Extending these approaches directly to interactive protocols is difficult due to correlations that can build up through interactivity; in fact, gaps can be found in prior claims of tight bounds on distribution estimation using interactive protocols. We propose a new approach to handle this correlation and use it to give a unified method for establishing lower bounds for both tasks. As an application, we obtain optimal bounds for both estimation and testing under local differential privacy and communication constraints. We also provide an example of a natural testing problem where interactivity helps.

Data Structures And Algorithms

Internal Quasiperiod Queries

Internal pattern matching requires one to answer queries about factors of a given string. Many results are known on answering internal period queries, asking for the periods of a given factor. In this paper we investigate (for the first time) internal queries asking for covers (also known as quasiperiods) of a given factor. We propose a data structure that answers such queries in O(log n log log n) time for the shortest cover and in O(log n (log log n)²) time for a representation of all the covers, after O(n log n) time and space preprocessing.
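
To recall the notion being queried: a string c is a cover (quasiperiod) of a string s if occurrences of c, possibly overlapping, cover every position of s. A naive Python check, for illustration only (the paper's data structure answers such queries for arbitrary factors in polylogarithmic time after preprocessing):

```python
def is_cover(c: str, s: str) -> bool:
    """True if every position of s lies inside some occurrence of c.
    Naive O(|s| * |c|) check, for illustration only."""
    covered = [False] * len(s)
    for i in range(len(s) - len(c) + 1):
        if s[i:i + len(c)] == c:
            for j in range(i, i + len(c)):
                covered[j] = True
    return all(covered)

# "aba" covers "ababa": its occurrences at positions 0 and 2 overlap
# and together cover every position.
print(is_cover("aba", "ababa"))  # True
print(is_cover("ab", "ababa"))   # False (last position uncovered)
```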

Data Structures And Algorithms

Jiffy: A Lock-free Skip List with Batch Updates and Snapshots

In this paper we introduce Jiffy, the first lock-free, linearizable ordered key-value index that offers both (1) batch updates, i.e., put and remove operations that are executed atomically, and (2) consistent snapshots used by, e.g., range scan operations. Jiffy is built as a multiversioned lock-free skip list and relies on the CPU's Time Stamp Counter register to generate version numbers at minimal cost. For faster skip list traversals and better utilization of the CPU caches, key-value entries are grouped into immutable objects called revisions. Moreover, by changing the size of revisions, and thus the synchronization granularity, our index can adapt to varying contention levels (smaller revisions are better suited for write-heavy workloads, whereas large revisions benefit read-dominated workloads, especially when they feature many range scan operations). Structure modifications to the index, which change the size of revisions, happen through (lock-free) skip list node split and merge operations that are carefully coordinated with the update operations. Despite its rich semantics, Jiffy offers highly scalable performance that is comparable to or exceeds the performance of state-of-the-art lock-free ordered indices featuring linearizable range scan operations. Compared to its (lock-based) rivals that also support batch updates, Jiffy can execute large batch updates up to 7.4x more efficiently.
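
The observable semantics of batch updates and snapshots can be illustrated with a deliberately simplified, single-threaded multiversioned map in Python; this is a sketch of the semantics only, not of Jiffy's lock-free implementation, and all names are hypothetical:

```python
import itertools

class VersionedMap:
    """Single-threaded sketch of the multiversioning idea behind
    snapshots: every update is tagged with a version number, and a
    snapshot reads as of a fixed version, ignoring later updates.
    Jiffy itself is a lock-free skip list that uses the CPU's Time
    Stamp Counter for versions; this toy uses a plain counter."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._data = {}  # key -> list of (version, value or None)

    def batch_update(self, puts, removes=()):
        """Apply all puts and removes atomically under one version."""
        v = next(self._clock)
        for key, value in puts.items():
            self._data.setdefault(key, []).append((v, value))
        for key in removes:
            self._data.setdefault(key, []).append((v, None))
        return v

    def snapshot(self, version):
        """Consistent view as of `version`: latest write per key."""
        view = {}
        for key, history in self._data.items():
            live = [val for ver, val in history if ver <= version]
            if live and live[-1] is not None:
                view[key] = live[-1]
        return view

m = VersionedMap()
v1 = m.batch_update({"a": 1, "b": 2})
m.batch_update({"b": 3}, removes=["a"])
print(m.snapshot(v1))  # {'a': 1, 'b': 2} -- unaffected by later updates
```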
