Marina Barsky | Researchain

Publication

Featured research published by Marina Barsky.

Conference on Information and Knowledge Management | 2008

A new method for indexing genomes using on-disk suffix trees

Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton

We propose a new method to build persistent suffix trees for indexing genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work by reducing random access to the input string and by performing only two passes over the data on disk. DiGeST is based on the two-phase multi-way merge sort paradigm and uses a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than has been managed before.
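The two ideas named in this abstract, a compact binary encoding of the DNA alphabet and a two-phase multi-way merge sort, can be illustrated with a minimal in-memory sketch. It is not the DiGeST implementation: the 2-bit assignment in `CODE`, the function names, and the `run_size` parameter are all assumptions made for illustration, and the result is a sorted order of suffix start positions rather than a suffix tree on disk.

```python
import heapq

# Hypothetical 2-bit code (assumption): the abstract does not give the actual bit assignment.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack(seq):
    """Pack a DNA string into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover the DNA string of the given length from its packed form."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(LETTERS[(value >> shift) & 3])
    return "".join(bases)


def sorted_runs(text, run_size):
    """Phase 1: sort suffix start positions in chunks small enough to fit in memory."""
    runs = []
    for start in range(0, len(text), run_size):
        chunk = list(range(start, min(start + run_size, len(text))))
        chunk.sort(key=lambda i: text[i:])
        runs.append(chunk)
    return runs


def merge_runs(text, runs):
    """Phase 2: multi-way merge of the sorted runs into one global suffix order."""
    return list(heapq.merge(*runs, key=lambda i: text[i:]))


text = "GATTACAGATTACA$"  # "$" is a terminal symbol that makes every suffix unique
order = merge_runs(text, sorted_runs(text, run_size=4))
```

Because the codes follow alphabetical order and the first base is most significant, comparing two packed strings of equal length as integers gives the same result as comparing the strings lexicographically, which is one reason such an encoding suits sort-based construction.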

Conference on Information and Knowledge Management | 2009

Suffix trees for very large genomic sequences

Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton

Software - Practice and Experience | 2010

A survey of practical algorithms for suffix tree construction in external memory

Marina Barsky; Ulrike Stege; Alex Thomo

We present ADvanced Artefact Management System (ADAMS), a web-based system that integrates project management features, such as work-breakdown structure definition, resource allocation, and schedule management as well as artefact management features, such as artefact versioning, traceability management, and artefact quality management. In this article we focus on the fine-grained artefact management approach adopted in ADAMS, which is a valuable support to high-level documentation and traceability management. In particular, the traceability layer in ADAMS is used to propagate events concerning changes to an artefact to the dependent artefacts, thus also increasing the context-awareness in the project. We also present the results of experimenting with the system in software projects developed at the University of Salerno.

Information Systems | 2011

Suffix trees for inputs larger than main memory

Marina Barsky; Ulrike Stege; Alex Thomo

A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to the use of suffix trees in real-life applications, the current methods for constructing suffix trees do not scale to large inputs. As suffix trees are larger than the input sequences and quickly outgrow the main memory, the first attempts at building large suffix trees focused on algorithms which avoid massive random access to the trees being built. However, all the existing practical algorithms perform random access to the input string, thus requiring in essence that the input be small enough to be kept in main memory. The constantly growing pool of string data, especially biological sequences, requires us to build suffix trees for much larger strings. We are the first to present an algorithm which is able to construct suffix trees for input sequences significantly larger than the size of the available main memory. Both the input string and the suffix tree are kept on disk, and the algorithm is designed to avoid multiple random I/Os to both of them. As a proof of concept, we show that our method makes it possible to build the suffix tree for 12 GB of real DNA sequences in 26 hours on a single machine with 2 GB of RAM. This input is four times the size of the Human Genome, and the construction of suffix trees for inputs of such magnitude has never been reported before.
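To make the role of the data structure concrete, here is a minimal sketch of a naive, uncompressed suffix trie held in nested dictionaries. It is an assumption-laden illustration, not the external-memory algorithm of the paper: the function names are hypothetical, construction takes quadratic time and space, and everything stays in main memory. It does show substring search in time proportional to the pattern length, and why the index is larger than its input.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts (naive, quadratic)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(trie):
    """Count the trie nodes, including the root."""
    return 1 + sum(count_nodes(child) for child in trie.values())


text = "GATTACA$"  # "$" is a terminal symbol that makes every suffix unique
trie = build_suffix_trie(text)
found = contains(trie, "TTA")
missing = contains(trie, "TAG")
size = count_nodes(trie)
```

A real suffix tree compresses chains of single-child nodes into labeled edges, which brings the node count down to linear in the input length, but the index still occupies several times the space of the input string.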

ACM Journal of Experimental Algorithms | 2008

A graph approach to the threshold all-against-all substring matching problem

Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton

We present a novel graph model and an efficient algorithm for solving the “threshold all against all” problem, which involves searching two strings (with length M and N, respectively) for all maximal approximate substring matches of length at least S, with up to K differences. Our algorithm solves the problem in time O(MNK^3), which is a considerable improvement over the previously known bound for this problem. We also provide experimental evidence that, in practice, our algorithm performs better than its worst-case running time suggests.

Viruses | 2010

JaPaFi: A Novel Program for the Identification of Highly Conserved DNA Sequences

Aliya Sadeque; Marina Barsky; Francesco Marass; Peter Kruczkiewicz; Chris Upton

We describe the use of Java Pattern Finder (JaPaFi) to identify short (<100 nt) highly conserved sequences in a series of poxvirus genomes. The algorithm utilizes pattern matching to identify approximate matches appearing at least once in each member of a set of genomes; a key feature is that the genomes do not need to be aligned. The user simply specifies the genomes to search, minimum length of sequences to find and the maximum number of mismatches and indels allowed. Many of the most highly conserved segments contain poxvirus promoter elements.
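The search described here can be illustrated with a simplified brute-force sketch. It is not JaPaFi's algorithm: the function names are hypothetical, candidate sequences are taken only from fixed-length windows of the first genome, and only mismatches (substitutions) are counted, so indels are not handled. What it keeps from the description is the requirement that a sequence match approximately at least once in every genome, without any alignment of the genomes.

```python
def within_mismatches(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ in at most max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def occurs_approximately(pattern, genome, max_mismatches):
    """Return True if some window of genome matches pattern within the mismatch budget."""
    n = len(pattern)
    return any(
        within_mismatches(pattern, genome[i:i + n], max_mismatches)
        for i in range(len(genome) - n + 1)
    )


def conserved_windows(genomes, length, max_mismatches):
    """List distinct windows of the first genome found approximately in all other genomes."""
    reference, others = genomes[0], genomes[1:]
    found = []
    for i in range(len(reference) - length + 1):
        window = reference[i:i + length]
        if window in found:
            continue  # skip windows already reported
        if all(occurs_approximately(window, g, max_mismatches) for g in others):
            found.append(window)
    return found


# Toy sequences, invented for illustration only
genomes = ["ACGTACGT", "TTACGAAC", "GGACGTTT"]
exact = conserved_windows(genomes, length=4, max_mismatches=0)
relaxed = conserved_windows(genomes, length=4, max_mismatches=1)
```

With these toy inputs no 4-mer is shared exactly by all three sequences, while allowing one mismatch recovers several, which shows why a mismatch budget matters when looking for conserved elements across diverged genomes.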

String Processing and Information Retrieval | 2006

A new algorithm for fast all-against-all substring matching

Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton

We present a new and efficient algorithm to solve the “threshold all vs. all” problem, which involves searching two strings (with length N and M, respectively) for all maximal approximate matches of length at least S with up to K differences. The algorithm is based on a novel graph model, and it solves the problem in time O(NMK^2).

Very Large Data Bases | 2015