
Publication


Featured research published by Svetlana Ekisheva.


International Journal of Bioinformatics Research and Applications | 2006

Probabilistic models for biological sequences: selection and Maximum Likelihood estimation

Svetlana Ekisheva; Mark Borodovsky

Probabilistic models for biological sequences (DNA and proteins) are frequently used in bioinformatics. We describe statistical tests designed to detect the order of dependency among elements of the sequence and to select the most appropriate probabilistic model for an experimental biological sequence. For a model of a given type (the independence model, the first-order Markov chain, or the hidden Markov model (HMM)), we derive the uniform lower bound for the rate of decay of the errors of the maximum likelihood (ML) estimates of the model parameters and, subsequently, the uniform confidence intervals for the parameters.
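As a rough illustration of the ML estimation mentioned in this abstract, the sketch below computes the textbook maximum likelihood estimates of first-order Markov chain transition probabilities as ratios of observed pair counts. It does not reproduce the paper's statistical tests or its uniform bounds; the function name `ml_transition_probs` and the toy DNA string are assumptions made for this example only.

```python
from collections import Counter

def ml_transition_probs(seq, alphabet="ACGT"):
    """ML estimates P(b | a) = n_ab / n_a for a first-order Markov chain.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count every adjacent pair (a, b) observed in the sequence.
    pair_counts = Counter(zip(seq, seq[1:]))
    probs = {}
    for a in alphabet:
        row_total = sum(pair_counts[(a, b)] for b in alphabet)
        if row_total == 0:
            continue  # symbol a is never followed by anything: row left undefined
        probs[a] = {b: pair_counts[(a, b)] / row_total for b in alphabet}
    return probs

# Toy sequence, invented for the example.
probs = ml_transition_probs("ACGTACGGTA")
for a, row in probs.items():
    print(a, row)
```

Each estimated row sums to one by construction. Short sequences give noisy estimates, which is the kind of estimation error the abstract's bounds are concerned with.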


Archive | 2006

Problems and Solutions in Biological Sequence Analysis: Profile HMMs for sequence families

Mark Borodovsky; Svetlana Ekisheva

We want to analyze sequences to identify the relationship of an individual sequence to a sequence family. If we already have a set of sequences belonging to a family, we can perform a database search for more members using pairwise alignment with one of the known family members as the query sequence. We could also search with all the known members one by one. However, pairwise searching with any one of the members may not find sequences distantly related to the ones we already have. An alternative approach is to use statistical features of the whole set of sequences in the search. Similarly, even when family membership is clear, accurate alignment can often be improved significantly by concentrating on features that are conserved in the whole family.
A multiple alignment can show how the sequences in a family relate to each other. Figure 5.1 shows a multiple alignment of seven sequences from the large globin family. The three-dimensional structure has been obtained for each protein in the alignment shown, and the sequences have been aligned on the basis of aligning the eight alpha helices of the conserved globin fold, and also on the basis of aligning certain key residues in the sequences, such as two conserved histidines (H), which are the residues interacting with an oxygen-binding heme prosthetic group in the globin active site. It is clear that some positions in the globin alignment are more conserved than others. In general, the helices are more conserved than the loop regions between them, and certain residues are particularly strongly conserved. When identifying a new sequence as a globin, it would be desirable to concentrate on checking that these more conserved features are present. We will develop a particular type of hidden Markov model well suited to modeling multiple alignments. We call these profile HMMs after standard profiles, which were introduced previously for multiple alignment but with non-probabilistic structures.
Profile HMMs are probably the most popular application of hidden Markov models in molecular biology at the moment. We will assume for the purpose of this chapter that we have been given a correct multiple alignment, from which we will build a model that can be used to find and score potential matches to new sequences. The multiple alignment could be built from structural information, like the globin alignment shown here, or it could come from a sequence-based alignment procedure, such as those discussed …
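To make the idea of building a model from a given multiple alignment more concrete, here is a minimal sketch of one ingredient only: estimating match-state emission probabilities column by column. The half-gaps rule for choosing match columns, the add-one pseudocounts, the function name `match_emissions`, and the toy DNA alignment are all assumptions for illustration; transition probabilities and insert or delete states are left out.

```python
from collections import Counter

def match_emissions(alignment, alphabet, pseudocount=1.0, max_gap_fraction=0.5):
    """Per-column emission estimates for the match states of a profile HMM.

    Hypothetical helper: columns whose gap fraction reaches max_gap_fraction
    are treated as insert columns and skipped in this sketch.
    """
    n_rows = len(alignment)
    emissions = []
    for column in zip(*alignment):  # iterate over alignment columns
        gap_fraction = column.count("-") / n_rows
        if gap_fraction >= max_gap_fraction:
            continue
        counts = Counter(c for c in column if c != "-")
        total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
        emissions.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return emissions

# Toy alignment of four equal-length rows, invented for the example.
toy_alignment = ["AC-T", "AC-T", "AG-T", "ACGT"]
emissions = match_emissions(toy_alignment, "ACGT")
for i, row in enumerate(emissions, start=1):
    print(f"match state {i}: {row}")
```

Strongly conserved columns yield sharply peaked emission distributions, which is how a profile model can concentrate on the conserved features of a family when scoring a new sequence.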


Archive | 2006

Markov chains and hidden Markov models

Mark Borodovsky; Svetlana Ekisheva

The chapter in Biological Sequence Analysis (BSA) that introduces Markov chains and hidden Markov models plays a critical role in that book. The sequence comparison algorithms described in Chapter 2 could not have been developed without the introduction of theoretically justified similarity scores and a statistical theory of similarity score distributions. These developments, in turn, are not feasible without rational choices of probabilistic models for DNA and protein sequences. Both Markov chains and hidden Markov models are often remarkably good candidates for the sequence models. Moreover, hidden Markov models (HMMs) are potentially a more flexible means for biological sequence analysis because they allow simultaneous modeling of observable and non-observable (hidden) states. The presence of the two types of states fits the need to model important additional information existing beyond the sequences per se, such as the functional meaning of the sequence elements, matches and mismatches of symbols in pairs of aligned sequences, evolutionarily conserved regions in multiple sequences, phylogenetic relationships, etc. Chapter 3 of BSA introduces the fundamental algorithms of HMM theory: the Viterbi algorithm, the forward and backward algorithms, and the Baum–Welch algorithm. All of these algorithms are amenable to a variety of applications in biological sequence analysis. Of course, some of these HMM constructions exist in parallel with their non-probabilistic counterparts; for example, consider the Viterbi algorithm for a pair HMM and the classic dynamic programming algorithm for pairwise alignment. Both HMM and non-HMM approaches are known for finding conserved domains, building phylogenetic trees, etc. In this chapter, the BSA problems focus on deriving the formulas that support probabilistic modeling and the HMM algorithm construction.
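Of the algorithms named in this abstract, the Viterbi algorithm is the simplest to sketch. The version below works in log space and assumes every start, transition, and emission probability is strictly positive. The two-state model (a GC-rich state "H" and an AT-rich state "L") and all of its parameter values are invented for the example and do not come from the book.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state path and its log probability.

    Assumes all probabilities are strictly positive (log(0) is undefined).
    """
    # Initialisation: log score of starting in each state and emitting obs[0].
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this position.
            best_prev = max(states, key=lambda r: v[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (v[-1][best_prev] + math.log(trans_p[best_prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        v.append(scores)
        back.append(pointers)
    # Traceback from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, v[-1][last]

# Hypothetical toy model: "H" favours G/C, "L" favours A/T.
states = ["H", "L"]
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path, log_prob = viterbi("GGCAATT", states, start_p, trans_p, emit_p)
print("".join(path), log_prob)
```

Replacing each `max` over predecessors with a sum of probabilities (a log-sum-exp in log space) turns the same recursion into the forward algorithm.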


Archive | 2006

Problems and Solutions in Biological Sequence Analysis

Mark Borodovsky; Svetlana Ekisheva


Methodology and Computing in Applied Probability | 2011

Uniform Accuracy of the Maximum Likelihood Estimates for Probabilistic Models of Biological Sequences

Svetlana Ekisheva; Mark Borodovsky


arXiv: Probability | 2008

Transportation Distance and the Central Limit Theorem

Svetlana Ekisheva; Christian Houdré


Archive | 2006

Problems and Solutions in Biological Sequence Analysis: RNA structure analysis

Mark Borodovsky; Svetlana Ekisheva


Archive | 2006

Problems and Solutions in Biological Sequence Analysis: Probabilistic approaches to phylogeny

Mark Borodovsky; Svetlana Ekisheva


Archive | 2006

Problems and Solutions in Biological Sequence Analysis: Transformational grammars

Mark Borodovsky; Svetlana Ekisheva


Archive | 2006

Problems and Solutions in Biological Sequence Analysis: Background on probability

Mark Borodovsky; Svetlana Ekisheva

Collaboration


Dive into Svetlana Ekisheva's collaborations.

Top Co-Authors

Mark Borodovsky

Georgia Institute of Technology

Christian Houdré

Georgia Institute of Technology
