Zsuzsanna Lipták | Researchain

Publication

Featured research published by Zsuzsanna Lipták.

International Journal of Foundations of Computer Science | 2012

Algorithms for Jumbled Pattern Matching in Strings

Péter Burcsi; Ferdinando Cicalese; Gabriele Fici; Zsuzsanna Lipták

The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a₁, …, aσ} is defined as the vector of multiplicities of the characters, p(s) = (p₁, …, pσ), where pᵢ = |{j | sⱼ = aᵢ}|. Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n²) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of , where m = ∑ᵢ qᵢ is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to , i.e., by a factor of log m.
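To make the two settings in this abstract concrete, here is a minimal Python sketch, not the authors' code. `find_occurrences` is the plain sliding-window search (each step compares σ counters, so it is linear for a constant-size alphabet). `build_binary_index` and `binary_query` illustrate a quadratic-time, linear-size index for binary texts, assuming the text is a string over the characters '0' and '1'; it relies on the fact that shifting a window of length m by one position changes its number of 1s by at most one, so every count between the minimum and the maximum is attained. All function names are hypothetical.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Parikh vector of s: the multiplicity of each letter of the ordered alphabet."""
    counts = Counter(s)
    return tuple(counts[a] for a in alphabet)


def find_occurrences(s, q, alphabet):
    """Start positions of all substrings t of s with p(t) == q (sliding window)."""
    m, n = sum(q), len(s)
    if m == 0 or m > n:
        return []
    index = {a: i for i, a in enumerate(alphabet)}
    target = list(q)
    window = [0] * len(alphabet)
    for ch in s[:m]:                       # counts of the first window s[0:m]
        window[index[ch]] += 1
    hits = [0] if window == target else []
    for i in range(m, n):
        window[index[s[i]]] += 1           # letter entering on the right
        window[index[s[i - m]]] -= 1       # letter leaving on the left
        if window == target:               # O(sigma) comparison per step
            hits.append(i - m + 1)
    return hits


def build_binary_index(s):
    """For a string over {'0', '1'}: min and max number of '1's among all windows
    of each length m. Linear size, quadratic construction time."""
    n = len(s)
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "1"))
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for m in range(1, n + 1):
        ones = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(ones), max(ones)
    return lo, hi


def binary_query(index, x, y):
    """Does s contain a substring with x '1's and y '0's? O(1) per query,
    using the interval property of window counts described above."""
    lo, hi = index
    m = x + y
    if m == 0 or m >= len(lo):             # only non-empty queries of length <= n
        return False
    return lo[m] <= x <= hi[m]
```

For example, `find_occurrences("abcab", (1, 1, 1), "abc")` returns `[0, 1, 2]`.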

Fun with Algorithms | 2012

On Approximate Jumbled Pattern Matching in Strings

Péter Burcsi; Ferdinando Cicalese; Gabriele Fici; Zsuzsanna Lipták

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t)=q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u,v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u≤q≤v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem in sub-linear expected time using a wavelet tree of s, which can be computed in time O(n) in a preprocessing phase. We then discuss a Scrabble-like variation of the problem, in which a weight function on the letters of s is given and one has to find all occurrences in s of a substring t with maximum weight having Parikh vector p(t)≤v. For the case of a binary alphabet, we present an algorithm which solves the decision version of the Approximate Jumbled Pattern Matching problem in constant time, by indexing the string in subquadratic time.
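As a hedged illustration of the Scrabble-like variant, the sketch below finds one maximum-weight substring t with p(t) ≤ v using a two-pointer scan. It is a simple linear-time baseline, not the wavelet-tree algorithm of the paper, and it returns a single optimal occurrence rather than all of them. It assumes non-negative letter weights: under that assumption every substring of a feasible window is feasible, so for each right end the longest feasible window is also the heaviest. The names `best_scrabble_window`, `weight`, and the tuple encoding of v are hypothetical.

```python
def best_scrabble_window(s, weight, v, alphabet):
    """Return (w, start, end) for one maximum-weight substring s[start:end] whose
    Parikh vector is componentwise <= v. Assumes non-negative letter weights."""
    index = {a: i for i, a in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    best = (0, 0, 0)                      # the empty substring is always feasible
    left = current = 0
    for right, ch in enumerate(s):
        counts[index[ch]] += 1
        current += weight[ch]
        # only the letter just added can exceed its bound; shrink until it fits again
        while counts[index[ch]] > v[index[ch]]:
            out = s[left]
            counts[index[out]] -= 1
            current -= weight[out]
            left += 1
        # with non-negative weights, the longest feasible window ending here is the heaviest
        if current > best[0]:
            best = (current, left, right + 1)
    return best
```

For instance, with weights a = 1, b = 3, c = 5 and bound v = (1, 1, 1), the text "abcab" yields `(9, 0, 3)`, i.e. the substring "abc".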

Workshop on Algorithms in Bioinformatics | 2006

Decomposing metabolomic isotope patterns

Sebastian Böcker; Matthias C. Letzel; Zsuzsanna Lipták; Anton Pervukhin

We present a method for determining the sum formula of metabolites solely from their mass and isotope pattern. Metabolites, such as sugars or lipids, participate in almost all cellular processes, but the majority still remains uncharacterized. Our input is a measured isotope pattern from a high resolution mass spectrometer, and we want to find those molecules that best match this pattern. Determination of the sum formula is a crucial step in the identification of an unknown metabolite, as it reduces its possible structures to a hopefully manageable set. Our method is computationally efficient, and first results on experimental data indicate good identification rates for chemical compounds up to 700 Dalton. Above 1000 Dalton, the number of molecules with a certain mass increases rapidly. To efficiently analyze mass spectra of such molecules, we define several additive invariants extracted from the input and then propose to solve a joint decomposition problem.
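The basic matching step, comparing a measured isotope pattern with the theoretical patterns of candidate sum formulas, can be sketched in a few lines. This toy version is an assumption-laden illustration, not the authors' method (which relies on additive invariants and a joint decomposition): it restricts candidates to CHNO formulas, works with nominal masses only, ignores chemical validity rules, uses approximate natural isotope abundances hard-coded below, and ranks candidates by the squared distance between normalized peak intensities. All names are hypothetical.

```python
# Approximate natural isotope abundances (assumed values), listed by nominal mass
# offset from the lightest isotope of each element.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(p, q, peaks):
    """Distribution of the sum of two nominal-mass offsets, truncated to `peaks` entries."""
    out = [0.0] * peaks
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < peaks:
                out[i + j] += a * b
    return out


def isotope_pattern(formula, peaks=4):
    """Theoretical isotope pattern of a formula such as {"C": 6, "H": 12, "O": 6}."""
    pattern = [1.0] + [0.0] * (peaks - 1)
    for element, count in formula.items():
        for _ in range(count):             # one convolution per atom
            pattern = convolve(pattern, ABUNDANCE[element], peaks)
    return pattern


def candidates(nominal_mass):
    """All CHNO formulas with the given nominal mass (C=12, N=14, O=16, H=1)."""
    found = []
    for c in range(nominal_mass // 12 + 1):
        for n in range((nominal_mass - 12 * c) // 14 + 1):
            for o in range((nominal_mass - 12 * c - 14 * n) // 16 + 1):
                h = nominal_mass - 12 * c - 14 * n - 16 * o   # hydrogen fills the rest
                found.append({"C": c, "H": h, "N": n, "O": o})
    return found


def rank(measured, nominal_mass):
    """Candidates sorted by squared distance between normalized patterns, best first."""
    peaks = len(measured)
    total = sum(measured)
    target = [x / total for x in measured]

    def score(formula):
        pattern = isotope_pattern(formula, peaks)
        s = sum(pattern)
        return sum((a / s - b) ** 2 for a, b in zip(pattern, target))

    return sorted(candidates(nominal_mass), key=score)
```

The exhaustive enumeration is only workable for small masses; the abstract's remark that the number of candidate molecules grows rapidly above 1000 Dalton is exactly why such brute force does not scale.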

Discrete Applied Mathematics | 2004

Algorithmic complexity of protein identification: Combinatorics of weighted strings

Mark Cieliebak; Thomas Erlebach; Zsuzsanna Lipták; Jens Stoye; Emo Welzl

We investigate a problem which arises in computational biology: Given a constant-size alphabet A with a weight function μ : A → ℕ, find an efficient data structure and query algorithm solving the following problem: For a string σ over A and a weight M ∈ ℕ, decide whether σ contains a substring with weight M, where the weight of a string is the sum of the weights of its letters (One-String Mass Finding Problem). If the answer is yes, then we may in addition require a witness, i.e., indices i ⩽ j such that the substring beginning at position i and ending at position j has weight M. We allow preprocessing of the string and measure efficiency in two parameters: storage space required for the preprocessed data and running time of the query algorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length n of the input string σ. Among others, we present two non-trivial efficient algorithms: LOOKUP solves the problem with O(n) storage space and O(n / log n) time; INTERVAL solves the problem for binary alphabets with O(n) storage space in O(log n) query time. We introduce other variants of the problem and sketch how our algorithms may be extended for these variants. Finally, we discuss combinatorial properties of weighted strings.
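Two simple baselines help frame the space/time trade-off discussed above; neither is LOOKUP nor INTERVAL. Storing the set of all substring weights answers the decision question in O(1) expected time but needs quadratic preprocessing and storage, while a two-pointer scan needs no index but takes O(n) per query. The scan is only valid for non-negative weights, which holds for μ : A → ℕ, and it assumes M ≥ 1. The function names are hypothetical.

```python
def all_substring_weights(s, mu):
    """Trivial index: the set of weights of all non-empty substrings of s.
    Quadratic construction time and (worst-case) quadratic storage."""
    weights = set()
    for i in range(len(s)):
        total = 0
        for j in range(i, len(s)):
            total += mu[s[j]]
            weights.add(total)
    return weights


def find_mass(s, mu, M):
    """Return a witness (i, j), 0-based and inclusive, with weight(s[i..j]) == M,
    or None. Two-pointer scan in O(n); assumes non-negative weights and M >= 1."""
    left = total = 0
    for right, ch in enumerate(s):
        total += mu[ch]
        # drop letters on the left while the window is too heavy
        while total > M and left <= right:
            total -= mu[s[left]]
            left += 1
        if total == M and left <= right:
            return (left, right)
    return None
```

With μ(a) = 1, μ(b) = 2 and σ = "abba", `find_mass("abba", {"a": 1, "b": 2}, 4)` returns the witness `(1, 2)`, the substring "bb".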

ACM Symposium on Applied Computing | 2005

Efficient mass decomposition

Sebastian Böcker; Zsuzsanna Lipták

We study the problem of decomposing a positive integer M over a (fixed and finite) weighted alphabet Σ: We want to find non-negative integers cᵢ such that M = c₁a₁ + … + cₖaₖ, where the aᵢ are the positive integer weights of the individual characters and |Σ| = k. We refer to the vector (c₁, …, cₖ) as a witness (of M over Σ), and denote by γ(M) the number of distinct witnesses of M. We present a data structure of size O(k a₁) that allows finding all witnesses of any query M in time O(k a₁ · γ(M)). To the best of our knowledge, this is the first algorithm for the problem with runtime independent of the size of the query M. Construction of the data structure requires O(k a₁) time and constant additional space, and is very easy to implement. The problem is motivated by mass spectrometry experiments, where peaks need to be mapped to sample molecules whose mass they could represent. Our simulations show that the algorithm presented performs well on relevant applications.
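The following Python sketch shows one way to realize a table of size O(k a₁) and a witness enumeration guided by it. It is our reading of the idea, not the paper's code: `rows[i][r]` (a hypothetical name) stores the smallest integer congruent to r modulo a₁ that is decomposable over the first i + 1 weights, each row is built with a round-robin pass around the residue cycles, and the backtracking only enters branches that the table certifies as decomposable. Subtracting lcm(a₁, aᵢ) keeps the residue modulo a₁ fixed, and decomposability is monotone within a residue class, so each inner loop stops as soon as a value drops below its table entry.

```python
from math import gcd, inf


def extended_residue_table(weights):
    """rows[i][r]: smallest integer congruent to r mod a1 that is decomposable over
    weights[0..i] (inf if none), where a1 = weights[0]. O(k * a1) time."""
    a1 = weights[0]
    row = [inf] * a1
    row[0] = 0
    rows = [row[:]]
    for ai in weights[1:]:
        d = gcd(a1, ai)
        for p in range(d):                 # residues p, p + d, ... form one cycle under +ai
            n = min(row[q] for q in range(p, a1, d))
            if n == inf:
                continue
            for _ in range(a1 // d - 1):   # walk once around the cycle from its minimum
                n += ai
                r = n % a1
                n = min(n, row[r])
                row[r] = n
        rows.append(row[:])
    return rows


def find_all(M, weights):
    """All witnesses (c1, ..., ck) with c1*a1 + ... + ck*ak == M."""
    rows = extended_residue_table(weights)
    a1, k = weights[0], len(weights)
    results, c = [], [0] * k

    def recurse(m, i):
        if i == 0:
            c[0] = m // a1                 # the table guarantees m % a1 == 0 here
            results.append(tuple(c))
            return
        ai = weights[i]
        step = a1 * ai // gcd(a1, ai)      # lcm(a1, ai): subtracting it keeps m mod a1
        for j in range(step // ai):
            m_j = m - j * ai
            # stop as soon as m_j is below the smallest decomposable value of its class
            while m_j >= 0 and m_j >= rows[i - 1][m_j % a1]:
                c[i] = (m - m_j) // ai
                recurse(m_j, i - 1)
                m_j -= step
        c[i] = 0

    if M >= 0 and M >= rows[k - 1][M % a1]:
        recurse(M, k - 1)
    return results
```

For weights (2, 3, 5) and M = 10, the sketch yields the four witnesses (5, 0, 0), (2, 2, 0), (1, 1, 1) and (0, 0, 2), in some order.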

String Processing and Information Retrieval | 2013

Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

Ferdinando Cicalese; Travis Gagie; Emanuele Giaquinta; Eduardo Sany Laber; Zsuzsanna Lipták; Romeo Rizzi; Alexandru I. Tomescu

We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

Bioinformatics | 2011

KABOOM! A new suffix array based algorithm for clustering expression data

Scott Hazelhurst; Zsuzsanna Lipták

MOTIVATION: Second-generation sequencing technology has reinvigorated research using expression data, and clustering such data remains a significant challenge, with much larger datasets and with different error profiles. Algorithms that rely on all-versus-all comparison of sequences are not practical for large datasets. RESULTS: We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive both with respect to quality and run time. AVAILABILITY: Source code and binaries available under GPL at http://code.google.com/p/wcdest. Runs on Linux and MacOS X. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
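The spaced exact-match filter can be sketched as follows. This is a simplified, hypothetical version, not the wcd-express implementation: it uses a hash set of fixed-length k-mers instead of modified suffix arrays, it only constrains the spacing of match positions in the first string, and the parameters k (match length), t (required number of matches) and d (minimum distance between match starts) are illustrative assumptions. Scanning candidate positions from left to right and greedily keeping every match at least d past the last kept one maximizes the number of sufficiently spaced matches.

```python
def spaced_match_filter(s1, s2, k, t, d):
    """Hypothetical filter: True if at least t k-mers of s1 also occur in s2 and
    their start positions in s1 can be chosen pairwise at least d apart."""
    if t <= 0:
        return True
    shared = {s2[i:i + k] for i in range(len(s2) - k + 1)}
    positions = [i for i in range(len(s1) - k + 1) if s1[i:i + k] in shared]
    count, last = 0, None
    for p in positions:                   # ascending order; greedy choice is optimal
        if last is None or p - last >= d:
            count += 1
            last = p
            if count >= t:
                return True
    return False
```

Only pairs that pass such a cheap test would then go on to a full (and expensive) similarity computation, which is how a filter avoids all-versus-all comparison.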

Developments in Language Theory | 2011

On prefix normal words

Gabriele Fici; Zsuzsanna Lipták

Journal of Discrete Algorithms | 2012

A linear algorithm for string reconstruction in the reverse complement equivalence model

Ferdinando Cicalese; Péter L. Erdős; Zsuzsanna Lipták

Computing and Combinatorics Conference | 2005

The money changing problem revisited: computing the Frobenius number in time O(k a₁)

Sebastian Böcker; Zsuzsanna Lipták

The Money Changing Problem (also known as Equality Constrained Integer Knapsack Problem) is as follows: Let a₁ < a₂ < … < aₖ be fixed positive integers with gcd(a₁, …, aₖ) = 1. Given some integer n, are there non-negative integers x₁, …, xₖ such that ∑ᵢ aᵢxᵢ = n? The Frobenius number g(a₁, …, aₖ) is the largest integer n that has no decomposition of the above form. There exist algorithms that, for fixed k, compute the Frobenius number in time polynomial in log aₖ. For variable k, one can compute a residue table of a₁ words which, in turn, allows one to determine the Frobenius number. The best known algorithm for computing the residue table has runtime O(k a₁ log a₁) using binary heaps, and O(a₁ (k + log a₁)) using Fibonacci heaps. In both cases, O(a₁) extra memory in addition to the residue table is needed. Here, we present an intriguingly simple algorithm to compute the residue table in time O(k a₁) and extra memory O(1). In addition to computing the Frobenius number, we can use the residue table to solve the given instance of the Money Changing Problem in constant time, for any n.
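Given a residue table N, where N[r] is the smallest decomposable integer congruent to r modulo a₁, the Frobenius number is max N − a₁, and n is decomposable exactly when n ≥ N[n mod a₁], which is the constant-time test mentioned above. The sketch below builds N with a round-robin pass over each residue cycle, using only a few scalar variables beyond the table. It is a hedged illustration of that idea in Python, not the authors' implementation, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd, inf


def residue_table(weights):
    """N[r]: smallest integer congruent to r mod a1 (a1 = weights[0]) that is
    decomposable over all weights; inf if there is none."""
    a1 = weights[0]
    N = [inf] * a1
    N[0] = 0
    for ai in weights[1:]:
        d = gcd(a1, ai)
        for p in range(d):
            n = min(N[q] for q in range(p, a1, d))   # start the cycle at its minimum
            if n == inf:
                continue
            for _ in range(a1 // d - 1):
                n += ai
                r = n % a1
                n = min(n, N[r])
                N[r] = n
    return N


def frobenius_number(weights):
    """Largest integer that is not decomposable; requires gcd(weights) == 1."""
    if reduce(gcd, weights) != 1:
        raise ValueError("the weights must have gcd 1")
    return max(residue_table(weights)) - weights[0]


def is_decomposable(n, weights, N):
    """Constant-time test once the residue table N is available."""
    return n >= N[n % weights[0]]
```

With weights (6, 9, 20), the sketch gives a Frobenius number of 43.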
