Mathias Bæk Tejs Knudsen

Publication

Featured research published by Mathias Bæk Tejs Knudsen.

scandinavian workshop on algorithm theory | 2014

Additive Spanners: A Simple Construction

Mathias Bæk Tejs Knudsen

We consider additive spanners of unweighted undirected graphs. Let G be a graph and H a subgraph of G. The most naive way to construct an additive k-spanner of G is the following: As long as H is not an additive k-spanner repeat: Find a pair (u,v) ∈ H that violates the spanner-condition and a shortest path from u to v in G. Add the edges of this path to H.
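
The naive procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the construction analysed in the paper: it assumes that H starts with no edges, that vertices are hashable labels, and that a violating pair is any (u, v) whose distance in H exceeds its distance in G by more than k. The names `bfs` and `naive_additive_spanner` are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Unweighted BFS; returns (dist, parent) dictionaries for reachable vertices."""
    dist = {source: 0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def naive_additive_spanner(vertices, edges, k):
    """Greedily add shortest paths of G to H until H is an additive k-spanner.

    Assumption: H starts empty. Returns the edge set of H as frozensets.
    """
    g_adj = {v: set() for v in vertices}
    for u, v in edges:
        g_adj[u].add(v)
        g_adj[v].add(u)
    h_adj = {v: set() for v in vertices}

    changed = True
    while changed:
        changed = False
        for u in vertices:
            dist_g, parent_g = bfs(g_adj, u)
            dist_h, _ = bfs(h_adj, u)
            for v, d in dist_g.items():
                if dist_h.get(v, float("inf")) > d + k:
                    # (u, v) violates the spanner condition:
                    # add a shortest u-v path of G to H.
                    x = v
                    while parent_g[x] is not None:
                        p = parent_g[x]
                        h_adj[x].add(p)
                        h_adj[p].add(x)
                        x = p
                    changed = True
                    dist_h, _ = bfs(h_adj, u)  # distances in H may have shrunk
    return {frozenset((u, v)) for u in h_adj for v in h_adj[u]}
```

Every violation adds at least one new edge to H, so the loop terminates after at most |E| path insertions. The sketch recomputes BFS trees freely and makes no attempt to be efficient.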

combinatorial pattern matching | 2015

Longest Common Extensions in Sublinear Space

Philip Bille; Inge Li Gørtz; Mathias Bæk Tejs Knudsen; Moshe Lewenstein; Hjalte Wedel Vildhøj

The longest common extension problem (LCE problem) is to construct a data structure for an input string \(T\) of length \(n\) that supports \(\mathrm{LCE}(i,j)\) queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions \(i\) and \(j\) in \(T\). This classic problem has a well-known solution that uses \(\mathcal{O}(n)\) space and \(\mathcal{O}(1)\) query time. In this paper we show that for any trade-off parameter \(1 \le \tau \le n\), the problem can be solved in \(\mathcal{O}(\frac{n}{\tau})\) space and \(\mathcal{O}(\tau)\) query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.
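
To make the query concrete, here is a minimal sketch of an LCE query answered by direct character comparison. It is not the data structure from the paper: it stores nothing beyond the string itself and takes time proportional to the answer, so it sits at the slow end of the time-space trade-off. The function name `lce_naive` and the use of 0-based positions are assumptions made for illustration.

```python
def lce_naive(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:].

    Assumption: i and j are 0-based positions in text.
    """
    n = len(text)
    length = 0
    while (i + length < n and j + length < n
           and text[i + length] == text[j + length]):
        length += 1
    return length


# The suffixes "abracadabra" and "abra" share the prefix "abra".
print(lce_naive("abracadabra", 0, 7))  # prints 4
```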

foundations of computer science | 2017

Fast Similarity Sketching

Søren Dahlgaard; Mathias Bæk Tejs Knudsen; Mikkel Thorup

We consider the Similarity Sketching problem: Given a universe [u] = {0,..., u-1} we want a random function S mapping subsets A of [u] into vectors S(A) of size t, such that similarity is preserved. More precisely: Given subsets A,B of [u], define X_i = [S(A)[i] = S(B)[i]] and X = sum_{i in [t]} X_i. We want to have E[X] = t*J(A,B), where J(A,B) = |A intersect B|/|A union B| and furthermore to have strong concentration guarantees (i.e. Chernoff-style bounds) for X. This is a fundamental problem which has found numerous applications in data mining, large-scale classification, computer vision, similarity search, etc. via the classic MinHash algorithm. The vectors S(A) are also called sketches. The seminal t x MinHash algorithm uses t random hash functions h_1,..., h_t, and stores (min_{a in A} h_1(a),..., min_{a in A} h_t(a)) as the sketch of A. The main drawback of MinHash is, however, its O(t*|A|) running time, and finding a sketch with similar properties and faster running time has been the subject of several papers. Addressing this, Li et al. [NIPS12] introduced one permutation hashing (OPH), which creates a sketch of size t in O(t + |A|) time, but with the drawback that possibly some of the t entries are empty when |A| = O(t). One could argue that sketching is not necessary in this case; however, the desire in most applications is to have one sketching procedure that works for sets of all sizes. Therefore, filling out these empty entries is the subject of several follow-up papers initiated by Shrivastava and Li [ICML14]. However, these densification schemes fail to provide good concentration bounds exactly in the case |A| = O(t), where they are needed. In this paper we present a new sketch which obtains essentially the best of both worlds. That is, a fast O(t log t + |A|) expected running time while getting the same strong concentration bounds as MinHash. Our new sketch can be seen as a mix between sampling with replacement and sampling without replacement.
We demonstrate the power of our new sketch by considering popular applications in large-scale classification with linear SVM as introduced by Li et al. [NIPS11] as well as approximate similarity search using the LSH framework of Indyk and Motwani [STOC98]. In particular, for the j_1, j_2-approximate similarity search problem on a collection of n sets we obtain a data-structure with space usage O(n^{1+rho} + sum_{A in C} |A|) and O(n^rho * log n + |Q|) expected time for querying a set Q compared to a O(n^rho * log n * |Q|) expected query time of the classic result of Indyk and Motwani.
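
For orientation, the classic t x MinHash baseline that the abstract compares against can be sketched as follows. This is not the new sketch from the paper. As a loudly stated assumption, the "random hash functions" are replaced by simple linear functions modulo a prime, which are only 2-independent and do not give the guarantees of truly random functions; all names are hypothetical, and keys are assumed to be non-negative integers below the prime.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus; keys must lie in [0, PRIME)


def make_hash_functions(t, seed=0):
    """Return t linear hash functions x -> (a*x + b) mod PRIME (a stand-in only)."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(t)]
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def minhash_sketch(items, hash_functions):
    """t x MinHash: the i-th entry is the minimum of h_i over the (non-empty) set.

    Runs in O(t * |A|) time, the cost the abstract identifies as the drawback.
    """
    return [min(h(x) for x in items) for h in hash_functions]


def estimate_jaccard(sketch_a, sketch_b):
    """X / t, where X counts the coordinates on which the two sketches agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


hashes = make_hash_functions(t=256)
a = set(range(0, 1000))
b = set(range(500, 1500))
estimate = estimate_jaccard(minhash_sketch(a, hashes), minhash_sketch(b, hashes))
# The true Jaccard similarity is 500/1500; with these stand-in hash
# functions the estimate is only a rough approximation.
print(estimate)
```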

european symposium on algorithms | 2015

Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

Mathias Bæk Tejs Knudsen; Morten Stöckel

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how “random” a hash function or a random number generator is, is its independence: a sequence of random variables is said to be k-independent if every variable is uniform and every size k subset is independent.
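
One standard way to obtain a k-independent family is to evaluate a random polynomial of degree k-1 over a prime field. The sketch below illustrates that textbook construction; it is not taken from the paper. The modulus, the function name `make_k_independent_hash`, and the restriction of keys to integers in [0, p) are assumptions for illustration.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # assumed field size p; keys must lie in [0, p)


def make_k_independent_hash(k, seed=None):
    """Draw h(x) = c_0 + c_1*x + ... + c_{k-1}*x^(k-1) mod p with random c_i.

    For distinct keys in [0, p), any k hash values are independent and
    uniform over [0, p).
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(MERSENNE_PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % MERSENNE_PRIME
        return acc

    return h


h = make_k_independent_hash(k=4, seed=42)
print(h(17), h(18))
```

Reducing the output further (for example modulo a table size) makes the values only approximately uniform, which is why the sketch returns values in the full range [0, p).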

international symposium on algorithms and computation | 2014

Dynamic and Multi-Functional Labeling Schemes

Søren Dahlgaard; Mathias Bæk Tejs Knudsen; Noy Rotbart

We investigate labeling schemes supporting adjacency, ancestry, sibling, and connectivity queries in forests. In the course of more than 20 years, the existence of \(\log n + O(\log \log n)\) labeling schemes supporting each of these functions was proven, with the most recent being ancestry [Fraigniaud and Korman, STOC ’10]. Several multi-functional labeling schemes also enjoy lower or upper bounds of \(\log n + \Omega (\log \log n)\) or \(\log n + O(\log \log n)\) respectively. Notable examples are an upper bound of \(\log n + 2\log \log n\) for adjacency+siblings and a lower bound of \(\log n + \log \log n\) for each of the functions siblings, ancestry, and connectivity [Alstrup et al., SODA ’03]. We improve the constants hidden in the \(O\)-notation, where our main technical contribution is a \(\log n + 2 \log \log n\) lower bound for connectivity+ancestry and connectivity+siblings.

string processing and information retrieval | 2016

Maximal Unbordered Factors of Random Strings

Patrick Hagge Cording; Mathias Bæk Tejs Knudsen

A border of a string is a non-empty proper prefix of the string that is also a suffix of the string, and a string is unbordered if it has no border. Loptev, Kucherov, and Starikovskaya [CPM 2015] conjectured the following: If we pick a string of length n from a fixed alphabet uniformly at random, then the expected length of the maximal unbordered factor is \(n - O(1)\). We prove that this conjecture is true by proving that the expected value is in fact \(n - \Theta(\sigma^{-1})\), where \(\sigma\) is the size of the alphabet. We discuss some of the consequences of this theorem.
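
The definitions can be checked on small inputs with a brute-force sketch. It is meant only to illustrate what a border and a maximal unbordered factor are; it takes polynomial time with a large exponent and has nothing to do with the probabilistic argument of the paper. A border is taken here to be shorter than the string itself, and the function names are hypothetical.

```python
def has_border(s):
    """True if some non-empty prefix of s, shorter than s, is also a suffix of s."""
    return any(s[:length] == s[-length:] for length in range(1, len(s)))


def maximal_unbordered_factor_length(s):
    """Length of the longest substring (factor) of s that has no border."""
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            if not has_border(s[start:start + length]):
                return length
    return 0  # only reached for the empty string


# "abab" has the border "ab"; its longest unbordered factors are "ab" and "ba".
print(maximal_unbordered_factor_length("abab"))  # prints 2
```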

foundations of computer science | 2015