Yelena Frid
University of California, Davis
Publication
Featured research published by Yelena Frid.
Algorithms for Molecular Biology | 2010
Yelena Frid; Dan Gusfield
Background: The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum-cardinality, non-crossing matching of complementary nucleotides in an RNA sequence of length n has an O(n^3)-time dynamic programming solution that is widely applied. It is known that an o(n^3) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n^3) worst-case time bound when n is the only problem parameter used in the bound. Surprisingly, the most widely used general technique to achieve a worst-case (and often practical) speedup of dynamic programming, the Four-Russians technique, has not previously been applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA folding. Results: In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time bound of O(n^3/log(n)). Conclusions: We show that this time bound can also be obtained for richer nucleotide-matching scoring schemes, and that the method achieves consistent speedups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.
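For illustration, the cubic-time dynamic program for the basic RNA-folding problem can be sketched in Python as follows. This is a minimal sketch of the textbook recurrence, not the Four-Russians algorithm from the paper. The function names and the pairing rule (Watson-Crick pairs A-U and C-G only, no minimum loop length) are assumptions made for the example.

```python
# Assumed pairing rule for this sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs(seq):
    """Return the maximum number of non-crossing complementary pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the best score for seq[i..j] (inclusive); it stays 0 when j <= i.
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # span = j - i, shortest intervals first
        for i in range(n - span):
            j = i + span
            # Case 1: position i is matched with position j, if they are complementary.
            inner = S[i + 1][j - 1] if span > 1 else 0
            best = inner + (1 if (seq[i], seq[j]) in PAIRS else 0)
            # Case 2: split the interval into two independent parts at k.
            # This inner loop is what makes the method cubic.
            for k in range(i, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAACCC"))
```

The three nested loops over span, i, and k give the cubic running time; the Four-Russians approach targets the innermost loop over k.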
computing and combinatorics conference | 2007
Dan Gusfield; Yelena Frid; Daniel G. Brown
Several central and well-known combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data, but lack efficient solutions when the input is incomplete or consists of genotype data, or when the problems are generalized from decision questions to optimization questions. Unfortunately, in biological applications, these harder problems arise very often. Previous research has shown that integer linear programming can sometimes be used to solve hard problems in practice on a range of data that is realistic for current biological applications. Here, we describe a set of related integer linear programming (ILP) formulations for several additional problems, most of which are known to be NP-hard. These ILP formulations address either the issue of missing data, or solve Haplotype Inference Problems with objective functions that model more complex biological phenomena than previous formulations. These ILP formulations solve efficiently on data whose composition reflects a range of data of current biological interest. We also assess the biological quality of the ILP solutions: some of the problems, although not all, solve with excellent quality. These results give a practical way to solve instances of some central, hard biological problems, and give practical ways to assess how well certain natural objective functions reflect complex biological phenomena. Perl code to generate the ILPs (for input to CPLEX) is on the web at wwwcsif.cs.ucdavis.edu/~gusfield.
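To make the gap between haplotype and genotype input concrete, the following Python sketch enumerates the haplotype pairs that could explain a single genotype. It is only an illustration of why genotype data is ambiguous, not one of the ILP formulations from the paper. The site encoding ('0' and '1' for homozygous sites, '2' for heterozygous sites) and the function name are assumptions chosen for the example.

```python
from itertools import product


def explaining_pairs(genotype):
    """Return the sorted list of unordered haplotype pairs that explain a genotype.

    Assumed encoding: '0' or '1' means both haplotypes carry that value at the
    site; '2' means the two haplotypes differ at the site.
    """
    het_sites = [idx for idx, g in enumerate(genotype) if g == "2"]
    # Homozygous sites are fixed; heterozygous sites get a placeholder for now.
    base = [g if g != "2" else "0" for g in genotype]
    pairs = set()
    for choice in product("01", repeat=len(het_sites)):
        h1, h2 = base[:], base[:]
        for idx, bit in zip(het_sites, choice):
            h1[idx] = bit
            h2[idx] = "1" if bit == "0" else "0"   # the partner carries the other value
        # Sort so that (h1, h2) and (h2, h1) count as the same unordered pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in explaining_pairs("0212"):
        print(pair)
```

A genotype with k heterozygous sites (k at least 1) has 2^(k-1) explaining pairs under this encoding, which is one reason inference over many genotypes at once becomes a hard optimization problem.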
workshop on algorithms in bioinformatics | 2013
Balaji Venkatachalam; Dan Gusfield; Yelena Frid
The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in \(O(n^3)\) time using Nussinov’s dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time for certain dynamic programming algorithms by a multiplicative factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an \(O(\frac{n^3}{\log n})\) algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation.
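One property that makes fixed-size subproblems enumerable here is that adjacent entries in a row of the dynamic programming table differ by 0 or 1, so a block of q consecutive entries can be stored as one starting value plus q-1 bits. The Python sketch below checks and uses that property on a table filled by the plain cubic recurrence. It is an illustration under assumptions (Watson-Crick pairing only, hypothetical function names), not the Frid and Gusfield algorithm itself.

```python
# Assumed pairing rule for this sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_table(seq):
    """Fill the full table S, where S[i][j] is the best score for seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            inner = S[i + 1][j - 1] if span > 1 else 0
            best = inner + (1 if (seq[i], seq[j]) in PAIRS else 0)
            for k in range(i, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


def encode_row_block(S, i, start, q):
    """Encode S[i][start:start+q] as (first value, integer holding q-1 difference bits)."""
    bits = 0
    for t in range(q - 1):
        step = S[i][start + t + 1] - S[i][start + t]
        assert step in (0, 1)          # extending the interval by one base adds at most one pair
        bits |= step << t
    return S[i][start], bits


def decode_row_block(first, bits, q):
    """Rebuild the q table values from the first value and the difference bits."""
    values = [first]
    for t in range(q - 1):
        values.append(values[-1] + ((bits >> t) & 1))
    return values


if __name__ == "__main__":
    table = nussinov_table("GGGAAAUCCCAU")
    first, bits = encode_row_block(table, 0, 4, 4)
    print(first, format(bits, "03b"), decode_row_block(first, bits, 4))
```

Because only 2^(q-1) distinct difference patterns exist for a block of length q, all of them can be enumerated in advance when q is on the order of log n.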
workshop on algorithms in bioinformatics | 2010
Yelena Frid; Dan Gusfield
The computational formulation for finding the optimal simultaneous alignment and fold (optimal Co-fold) of RNA sequences was first introduced by Sankoff in 1985. Since then the importance of Co-folding has grown as conservation of structure and its relationship to function have been widely observed in RNA. For two sequences, the computation time of Sankoff's algorithm is Θ(N^6). Existing literature on Co-folding attempts to improve efficiency by simplifying the original problem formulation. We present here a practical and worst-case speedup using the Four-Russians method, without placing any added constraints on the types of alignments or folds allowed. Our algorithm, Fast Cofold, finds the optimal Co-fold in O(N^6/log(N^2)) time, a speedup which is observed in practice. Because the solution matrix produced by our algorithm is identical to the one produced by the Sankoff algorithm, the contribution of the algorithm lies not only in its standalone practicality but also in the ability to implement it alongside heuristic speedups, leading to even greater reductions in time.
Algorithms for Molecular Biology | 2014
Balaji Venkatachalam; Dan Gusfield; Yelena Frid
Background: The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using Nussinov’s dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time for certain dynamic programming algorithms by a multiplicative factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an O(n^3/log n) algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation. Theoretical results: We simplify the algorithm and the analysis by doing the preprocessing once prior to the algorithm computation. We call this the two-vector method. We also show variants where instead of exhaustive preprocessing, we only solve the subproblems encountered in the main algorithm once and memoize the results. We give a simple proof of correctness and explore the practical advantages over the earlier method. The Nussinov algorithm admits an O(n^2) time parallel algorithm. We show a parallel algorithm using the two-vector idea that improves the time bound to O(n^2/log n). Practical results: We have implemented the parallel algorithm on graphics processing units using the CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running times. The ideas to organize the data structures also help in improving the running time of the serial algorithms. For sequences of length up to 6000 bases the parallel algorithm takes only about 2.5 seconds and the two-vector serial method takes about 57 seconds on a desktop and 15 seconds on a server. Among the serial algorithms, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and are faster than Nussinov by up to a factor of 20.
The source-code for the algorithms is available at http://github.com/ijalabv/FourRussiansRNAFolding.
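The idea of preprocessing once and then answering block queries from two difference vectors can be illustrated with a small Python sketch. It builds a lookup table indexed by a row-difference vector and a column-difference vector, then checks on an already filled table that one lookup reproduces the maximum over a block of q split points. This is a sketch of the underlying identity only: the real algorithm maintains the vectors while the table is being filled, and the function names, the pairing rule, and the exact vector layout here are assumptions rather than the published implementation.

```python
# Assumed pairing rule for this sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_table(seq):
    """Fill S with the plain cubic recurrence; S[i][j] is the score of seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            inner = S[i + 1][j - 1] if span > 1 else 0
            best = inner + (1 if (seq[i], seq[j]) in PAIRS else 0)
            for k in range(i, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


def build_lookup(q):
    """T[v][w] = max over t in 0..q-1 of (ones in first t bits of v) - (ones in first t bits of w)."""
    size = 1 << (q - 1)
    T = [[0] * size for _ in range(size)]
    for v in range(size):
        for w in range(size):
            best = run = 0                      # t = 0 contributes 0
            for t in range(q - 1):
                run += ((v >> t) & 1) - ((w >> t) & 1)
                best = max(best, run)
            T[v][w] = best
    return T


def row_bits(S, i, k0, q):
    """Bit s is the increase from S[i][k0+s] to S[i][k0+s+1] (always 0 or 1)."""
    return sum((S[i][k0 + s + 1] - S[i][k0 + s]) << s for s in range(q - 1))


def col_bits(S, j, k0, q):
    """Bit s is the decrease from S[k0+1+s][j] to S[k0+2+s][j] (always 0 or 1)."""
    return sum((S[k0 + 1 + s][j] - S[k0 + 2 + s][j]) << s for s in range(q - 1))


def block_max(S, T, i, j, k0, q):
    """Max of S[i][k] + S[k+1][j] over k in k0..k0+q-1, using a single table lookup."""
    return S[i][k0] + S[k0 + 1][j] + T[row_bits(S, i, k0, q)][col_bits(S, j, k0, q)]


def direct_max(S, i, j, k0, q):
    """The same maximum computed by scanning all q split points."""
    return max(S[i][k] + S[k + 1][j] for k in range(k0, k0 + q))


if __name__ == "__main__":
    S = nussinov_table("GGGAAAUCCCAUGCAU")
    T = build_lookup(3)
    print(block_max(S, T, 0, 10, 2, 3), direct_max(S, 0, 10, 2, 3))
```

A memoized variant in the spirit of the abstract would fill entries of T lazily, the first time a given pair of vectors is encountered, instead of enumerating all pairs up front.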
workshop on algorithms in bioinformatics | 2009
Yelena Frid; Dan Gusfield
The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum-cardinality, non-crossing matching of complementary nucleotides in an RNA sequence of length n has an O(n^3)-time dynamic programming solution that is widely applied. It is known that an o(n^3) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n^3) worst-case time bound when n is the only problem parameter used in the bound. Surprisingly, the most widely used general technique to achieve a worst-case (and often practical) speedup of dynamic programming, the Four-Russians technique, has not previously been applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA folding.
conference on combinatorial optimization and applications | 2012
Yelena Frid; Dan Gusfield
While secondary pseudoknotted structure prediction is computationally challenging, such structures appear to play biologically important roles in both cells and viral RNA [1]. Restricting the class of possible structures and then finding the optimal structure for that restricted class is a common method employed to deal with the computational complexity.
Algorithms for Molecular Biology | 2016
Yelena Frid; Dan Gusfield
Background: The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known
workshop on algorithms in bioinformatics | 2015
Yelena Frid; Dan Gusfield
Archive | 2010
Yelena Frid; Dan Gusfield
O(n^3)