Marina Barsky
University of Victoria
Publications
Featured research published by Marina Barsky.
Conference on Information and Knowledge Management | 2008
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
We propose a new method to build persistent suffix trees for indexing genomic data. Our algorithm, DiGeST (Disk-Based Genomic Suffix Tree), improves significantly over previous work by reducing random access to the input string and performing only two passes over the disk data. DiGeST is based on the two-phase multi-way merge sort paradigm and uses a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than previous methods have handled.
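The two ingredients named in the DiGeST abstract, a compact binary encoding of the DNA alphabet and two-phase multi-way merge sorting, can be illustrated with a small in-memory Python sketch. It assumes a hypothetical 2-bit code (A=00, C=01, G=10, T=11) and that each sorted run would, in a real disk-based setting, be written to disk after phase one. The helper names `pack_dna`, `unpack_dna`, and `two_phase_suffix_sort` are illustrative, not taken from the paper, and the sketch sorts suffix start positions rather than building a tree, so it is not DiGeST itself.

```python
import heapq

# Hypothetical 2-bit code per nucleotide; assumes the input contains only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack_dna(seq: str) -> bytes:
    """Pack a DNA string into bytes, four nucleotides per byte (first base in the high bits)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (3 - i % 4))
    return bytes(out)


def unpack_dna(data: bytes, n: int) -> str:
    """Recover the first n nucleotides from a packed byte string."""
    return "".join(LETTERS[(data[i // 4] >> (2 * (3 - i % 4))) & 3] for i in range(n))


def two_phase_suffix_sort(seq: str, run_size: int) -> list[int]:
    """Sort suffix start positions with a two-phase multi-way merge sort.

    Phase 1: split positions into runs of at most run_size and sort each run
             (in an external-memory version each run would be written to disk).
    Phase 2: merge all sorted runs in a single k-way pass.
    """
    positions = range(len(seq))
    runs = [
        sorted(positions[i:i + run_size], key=lambda p: seq[p:])
        for i in range(0, len(seq), run_size)
    ]
    return list(heapq.merge(*runs, key=lambda p: seq[p:]))
```

In an external-memory version, phase one reads the data once to create the runs and phase two reads each run once during the merge, giving two passes overall; `heapq.merge` stands in for that streaming merge here.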
Conference on Information and Knowledge Management | 2009
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to using suffix trees in real-life applications, current methods for constructing suffix trees do not scale to large inputs. All existing practical algorithms perform random access to the input string, thus requiring that the input be small enough to be kept in main memory. We present the first algorithm able to construct suffix trees for input sequences significantly larger than the available main memory. As a proof of concept, we show that our method can build the suffix tree for 12GB of real DNA sequences in 26 hours on a single machine with 2GB of RAM. This input is four times the size of the Human Genome, and the construction of suffix trees for inputs of such magnitude has never been reported before.
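To make the string-searching role concrete, the following Python sketch builds an uncompressed suffix trie (a suffix tree is the path-compressed form of this structure) and answers substring queries by walking down from the root. The function names are hypothetical, and the naive construction uses quadratic time and space, which illustrates on a small scale why the size of the index, and not only the input, becomes a concern for large genomes.

```python
def build_suffix_trie(text: str) -> dict:
    """Naive suffix trie: insert every suffix of text + '$' as a path of nested dicts."""
    text += "$"  # terminator so that every suffix ends at its own leaf
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """A pattern occurs in the text iff it spells a path starting at the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

For example, `contains(build_suffix_trie("GATTACA"), "TTAC")` is `True`, while a query for `"TTT"` fails at the third character.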
Software: Practice and Experience | 2010
Marina Barsky; Ulrike Stege; Alex Thomo
We present ADvanced Artefact Management System (ADAMS), a web-based system that integrates project management features, such as work-breakdown structure definition, resource allocation, and schedule management, with artefact management features, such as artefact versioning, traceability management, and artefact quality management. In this article we focus on the fine-grained artefact management approach adopted in ADAMS, which provides valuable support for high-level documentation and traceability management. In particular, the traceability layer in ADAMS propagates events concerning changes to an artefact to the dependent artefacts, thus also increasing context awareness in the project. We also present the results of experimenting with the system in software projects developed at the University of Salerno.
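The event propagation performed by the traceability layer can be pictured as a traversal of a dependency graph. The sketch below assumes a hypothetical representation in which each artefact maps to the list of artefacts that depend on it, and it notifies every transitively dependent artefact exactly once, in breadth-first order. The function name and data layout are illustrative; this is not ADAMS code.

```python
from collections import deque


def propagate_change(dependents: dict[str, list[str]], changed: str) -> list[str]:
    """Return the artefacts to notify, in breadth-first order, when `changed` is modified.

    `dependents` maps an artefact to the artefacts that depend on it (hypothetical layout).
    The `seen` set ensures each artefact is notified once, even if the links form a cycle.
    """
    notified: list[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                notified.append(dep)
                queue.append(dep)
    return notified
```

With links `{"requirements": ["design"], "design": ["code", "tests"], "code": ["tests"]}`, a change to `"requirements"` notifies `"design"`, `"code"`, and `"tests"`, each exactly once.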
Information Systems | 2011
Marina Barsky; Ulrike Stege; Alex Thomo
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to using suffix trees in real-life applications, current methods for constructing suffix trees do not scale to large inputs. Because suffix trees are larger than the input sequences and quickly outgrow main memory, the first attempts at building large suffix trees focused on algorithms that avoid massive random access to the trees being built. However, all existing practical algorithms perform random access to the input string, thus requiring, in essence, that the input be small enough to be kept in main memory. The constantly growing pool of string data, especially biological sequences, requires us to build suffix trees for much larger strings. We present the first algorithm able to construct suffix trees for input sequences significantly larger than the available main memory. Both the input string and the suffix tree are kept on disk, and the algorithm is designed to avoid multiple random I/Os to both of them. As a proof of concept, we show that our method can build the suffix tree for 12GB of real DNA sequences in 26 hours on a single machine with 2GB of RAM. This input is four times the size of the Human Genome, and the construction of suffix trees for inputs of such magnitude has never been reported before.
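One general way to keep the tree side of the problem manageable, used in earlier disk-based work, is to partition suffixes by a fixed-length prefix so that the subtree for each partition can be built independently in memory. The Python sketch below shows only that partitioning step. The names `partition_suffixes`, `choose_prefix_len`, and the `max_group` budget are hypothetical, the input is held in memory for simplicity, and the sketch does not represent the algorithm of this paper, whose point is precisely to avoid keeping the input string in memory.

```python
from collections import defaultdict


def partition_suffixes(text: str, prefix_len: int) -> dict[str, list[int]]:
    """Group suffix start positions by their first `prefix_len` characters.

    Suffixes shorter than prefix_len are keyed by the whole remaining suffix.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(text)):
        groups[text[pos:pos + prefix_len]].append(pos)
    return dict(groups)


def choose_prefix_len(text: str, max_group: int) -> int:
    """Smallest prefix length whose largest group has at most max_group suffixes.

    Terminates for max_group >= 1: once prefix_len reaches len(text), every key is a
    distinct full suffix, so every group has size 1.
    """
    k = 1
    while max(len(g) for g in partition_suffixes(text, k).values()) > max_group:
        k += 1
    return k
```

Suffixes in the same group share their first `prefix_len` characters, so each full-length group corresponds to a separate subtree of the suffix tree; increasing `prefix_len` shrinks the largest group at the cost of more partitions.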
ACM Journal of Experimental Algorithms | 2008
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
We present a novel graph model and an efficient algorithm for solving the “threshold all against all” problem, which involves searching two strings (with lengths M and N, respectively) for all maximal approximate substring matches of length at least S with up to K differences. Our algorithm solves the problem in time $O(MNK^3)$, which is a considerable improvement over the previously known bound for this problem. We also provide experimental evidence that, in practice, our algorithm performs better than its worst-case running time suggests.
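The "up to K differences" condition is an edit-distance bound. As a building block only, and not the graph model or algorithm of the paper, the Python sketch below tests whether two candidate substrings are within K differences using the standard banded dynamic program: it fills only the cells with |i - j| <= K, so it runs in O(LK) time for strings of length about L. The function name is hypothetical.

```python
def within_k_differences(a: str, b: str, k: int) -> bool:
    """True if the edit distance between a and b is at most k (banded DP)."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return False  # the length difference alone needs more than k indels
    INF = k + 1  # any value above k is treated as "too many differences"
    # Row 0: transforming the empty prefix of a into b[:j] costs j insertions.
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)  # cells outside the band stay at INF
        if i <= k:
            cur[0] = i
        lo, hi = max(1, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion from a
                         cur[j - 1] + 1,      # insertion into a
                         INF)                 # cap so values never grow unbounded
        prev = cur
    return prev[m] <= k
```

Cells outside the band can be skipped safely because reaching cell (i, j) requires at least |i - j| insertions or deletions, which already exceeds K there.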
Viruses | 2010
Aliya Sadeque; Marina Barsky; Francesco Marass; Peter Kruczkiewicz; Chris Upton
We describe the use of Java Pattern Finder (JaPaFi) to identify short (<100 nt) highly conserved sequences in a series of poxvirus genomes. The algorithm uses pattern matching to identify approximate matches appearing at least once in each member of a set of genomes; a key feature is that the genomes do not need to be aligned. The user simply specifies the genomes to search, the minimum length of the sequences to find, and the maximum number of mismatches and indels allowed. Many of the most highly conserved segments contain poxvirus promoter elements.
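The alignment-free check behind "appearing at least once in each member of a set of genomes" can be sketched with Sellers' approximate string matching algorithm, a dynamic program whose first row is zero so that an occurrence may start anywhere in the genome. This Python sketch only verifies a given candidate sequence against each genome, allowing mismatches and indels up to a hypothetical `max_diffs`; how JaPaFi enumerates candidates is not described in the abstract, and the function names are illustrative.

```python
def min_edit_occurrence(pattern: str, text: str) -> int:
    """Sellers' algorithm: smallest edit distance between pattern and any substring of text.

    Processes text column by column; col[i] is the best cost of matching pattern[:i]
    against some substring of text ending at the current position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # before any text: pattern[:i] needs i deletions
    best = col[m]
    for ch in text:
        new = [0] * (m + 1)  # new[0] = 0: an occurrence may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i - 1] + cost,  # match or mismatch
                         col[i] + 1,         # extra character in the text
                         new[i - 1] + 1)     # pattern character missing from the text
        col = new
        best = min(best, col[m])
    return best


def conserved_in_all(pattern: str, genomes: list[str], max_diffs: int) -> bool:
    """True if pattern occurs with at most max_diffs differences in every genome."""
    return all(min_edit_occurrence(pattern, g) <= max_diffs for g in genomes)
```

For instance, `"ACGT"` occurs exactly in `"TTACGTT"` and with one substitution (as `"ACCT"`) in `"GGACCTG"`, so it passes the check for `max_diffs=1` but not for `max_diffs=0`.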
String Processing and Information Retrieval | 2006
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
We present a new and efficient algorithm to solve the “threshold all vs. all” problem, which involves searching two strings (with lengths N and M, respectively) for all maximal approximate matches of length at least S with up to K differences. The algorithm is based on a novel graph model, and it solves the problem in time $O(NMK^2)$.
Very Large Data Bases | 2015
Wissam Khaouid; Marina Barsky; Venkatesh Srinivasan; Alex Thomo
Very Large Data Bases | 2011
Marina Barsky; Sangkyum Kim; Tim Weninger; Jiawei Han
Bioinformatics and Bioengineering | 2007
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton