Veselka Boeva
Technical University of Sofia
Publication
Featured research published by Veselka Boeva.
Journal of Bioinformatics and Computational Biology | 2007
Elena Tsiporkova; Veselka Boeva
Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and subsequently selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms the neighborhood-wise and the position-wise algorithms, in particular for datasets with a higher level of missing entries. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former proved superior to the latter. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm.
The software also provides for a choice between three different initial rough imputation methods.
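As a rough illustration of the similarity measure named in this abstract, the following is a minimal sketch of the textbook DTW distance between two complete profiles. The function name `dtw_distance`, the absolute-difference local cost, and the absence of a warping-window constraint are assumptions made for this example; the abstract does not specify these details, and this is not the DTWimpute implementation.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumptions (not taken from the abstract): absolute difference as the
    local cost, no warping window, and no missing values in either profile.
    """
    n, m = len(a), len(b)
    # acc[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # a[i-1] is matched to b[j-1] again
                acc[i][j - 1],      # b[j-1] is matched to a[i-1] again
                acc[i - 1][j - 1],  # both sequences advance
            )
    return acc[n][m]
```

Because a point may be matched more than once, a profile and a time-stretched copy of it are at distance zero, which is what makes the measure attractive for profiles that differ mainly in timing.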
European Conference on Computational Biology | 2008
Elena Tsiporkova; Veselka Boeva
SUMMARY A novel integration approach targeting the combination of multi-experiment time series expression data is proposed. A recursive hybrid aggregation algorithm is initially employed to extract a set of genes, which are eventually of interest for the biological phenomenon under study. Next, a hierarchical merge procedure is specifically developed for the purpose of fusing together the multiple-experiment expression profiles of the selected genes. This employs dynamic time warping alignment techniques in order to account adequately for the potential phase shift between the different experiments. We subsequently demonstrate that the resulting gene expression profiles consistently reflect the behavior of the original expression profiles in the different experiments. SUPPLEMENTARY INFORMATION Supplementary data are available at http://www.tu-plovdiv.bg/Container/bi/DataIntegration/
International Journal of Approximate Reasoning | 1999
Elena Tsiporkova; Veselka Boeva; Bernard De Baets
Abstract In this paper, the modal logic interpretation of plausibility and belief measures on an arbitrary universe of discourse, as proposed by Harmanec et al., is further developed by employing notions from set-valued analysis. In a model of modal logic, a multivalued mapping is constructed from the accessibility relation and a mapping determined by the value assignment function. This multivalued mapping induces a plausibility measure and a belief measure expressed in terms of conditional probabilities of inverse and superinverse images, or equivalently, in terms of conditional probabilities of truth sets of possibilitations and necessitations. Restricting to a finite universe of discourse, multivalued interpretations of basic probability assignments and of commonality functions are also obtained, in terms of conditional probabilities of pure inverse and subinverse images, or equivalently, in terms of conditional probabilities of truth sets of particular logical expressions involving possibilitations and necessitations.
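For readers unfamiliar with the measures discussed in this abstract, the sketch below computes belief and plausibility from a basic probability assignment on a finite universe, using the standard Dempster-Shafer definitions. It does not reproduce the modal logic construction of the paper; the function names and the dictionary representation (focal sets as `frozenset` keys, masses as values) are assumptions made for this example.

```python
def belief(bpa, event):
    """Sum of the masses of all non-empty focal sets contained in the event."""
    event = frozenset(event)
    return sum(mass for focal, mass in bpa.items() if focal and focal <= event)


def plausibility(bpa, event):
    """Sum of the masses of all focal sets that intersect the event."""
    event = frozenset(event)
    return sum(mass for focal, mass in bpa.items() if focal & event)


# Hypothetical basic probability assignment on the universe {"a", "b"}.
example_bpa = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.5,
}
```

With `example_bpa`, the event `{"a"}` has belief 0.5 and plausibility 1.0, showing how the two measures bracket the support for an event.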
Expert Systems With Applications | 2014
Anton Borg; Martin Boldt; Niklas Lavesson; Ulf Melander; Veselka Boeva
According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult, as no systematic or structured way of reporting crimes exists and there is no way to search across multiple crime reports. This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. This study investigates the potential use of the cut-clustering algorithm to group crimes by their characteristics and so reduce the number of crimes to review in residential burglary analysis. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity. Clustering quality is measured using the modularity index, and accuracy is measured using the Rand index. The clustering solutions with the best quality scores were those based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of characteristic used when grouping crimes can positively affect the end result. The results suggest that a high-quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections.
The results also suggest that while crime series clustering is feasible, further investigation is needed.
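The accuracy measure mentioned in this abstract can be illustrated with a minimal sketch of the (unadjusted) Rand index, which is the fraction of item pairs on which two groupings agree. The function name `rand_index` and the list-of-labels input format are assumptions made for this example and are not taken from the study.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by both labelings.

    A pair counts as an agreement when both labelings put the two items in
    the same group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index depends only on which items are grouped together, so renaming the cluster labels leaves the score unchanged.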
BMC Bioinformatics | 2014
Anna Hristoskova; Veselka Boeva; Elena Tsiporkova
Background: Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution, which increases the confidence in the common features of all the datasets and reveals the important differences among them.
Results: We propose a novel generic consensus clustering technique that applies a Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals.
Conclusions: The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices.
Archive | 2010
Veselka Boeva; Elena Tsiporkova
This work proposes a novel multi-purpose data standardization method inspired by gene-centric clustering approaches. The clustering is performed via template matching of expression profiles, employing the Dynamic Time Warping (DTW) alignment algorithm to measure the similarity between the profiles. In this way, for each gene profile a cluster consisting of a varying number of neighboring gene profiles (determined by the degree of similarity) is identified to be used in the subsequent standardization phase. The standardized profiles are extracted via a recursive aggregation algorithm, which reduces each cluster of neighboring expression profiles to a single profile. The proposed data standardization method is validated on gene expression time series data coming from a study examining the global cell-cycle control of gene expression in fission yeast Schizosaccharomyces pombe.
International Conference on Adaptive and Intelligent Systems | 2009
Veselka Boeva; Elena Kostadinova
Gene expression microarrays are the most commonly available source of high-throughput biological data. Each microarray experiment is supposed to measure the gene expression levels of a set of genes in a number of different experimental conditions or time points. Integrating results from different microarray experiments into a specific analysis is an important yet challenging problem. Direct integration of microarrays is often ineffective because of the diverse types of experiment-specific variations. In this paper, we propose a new hybrid method, which is specially suited for integration analysis of time series expression data across different experiments. The proposed algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles. First, for each considered time series dataset, a square distance matrix is built that contains the DTW distances calculated between the expression profiles of each gene pair. Then, using a hybrid aggregation algorithm, the obtained DTW distance matrices are transformed into a single matrix consisting of one overall DTW distance per gene pair. The values of the resulting matrix can be interpreted as the consensus DTW distances supported by all the experiments. These may be further analyzed to help find the relationships among the genes. The proposed method is validated on gene expression time series data coming from two independent studies examining the global cell-cycle control of gene expression in fission yeast Schizosaccharomyces pombe.
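The step of reducing several per-experiment distance matrices to one consensus matrix can be sketched as follows. The abstract does not describe the hybrid aggregation algorithm itself, so this example substitutes a plain entry-wise arithmetic mean as a stand-in; the function name `consensus_distances` and the nested-list matrix format are likewise assumptions made for illustration.

```python
def consensus_distances(matrices):
    """Combine same-sized square distance matrices entry by entry.

    Stand-in aggregation: the arithmetic mean of each entry. This is NOT the
    hybrid aggregation algorithm of the paper, only a placeholder for it.
    """
    if not matrices:
        raise ValueError("at least one distance matrix is required")
    n = len(matrices[0])
    for matrix in matrices:
        if len(matrix) != n or any(len(row) != n for row in matrix):
            raise ValueError("all matrices must be square and the same size")
    count = len(matrices)
    return [
        [sum(matrix[i][j] for matrix in matrices) / count for j in range(n)]
        for i in range(n)
    ]
```

For two experiments with distances 2 and 4 between the same gene pair, this stand-in yields a consensus distance of 3; any other aggregation rule could be swapped in at the marked line without changing the surrounding structure.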
Data and Knowledge Engineering | 2004
Veselka Boeva; Love Ekenberg
Conflict detection and analysis are of high importance, e.g., when integrating conceptual schemata, such as UML-Specifications, or analysing goal-fulfilment of sets of autonomous agents. In general, models for this introduce unnecessarily complicated frameworks with several disadvantages regarding semantics as well as complexity. This paper demonstrates that an important set of static and dynamic conflicts between specifications can be diagnosed using ordinary first-order modal logic. Furthermore, we show how the framework can be extended for handling situations when there are convex sets of probability measures over a state-space. Thus, representing specifications as conceptual schemata and using standard Kripke models of modal logic, augmented with an interval-valued probability measure, we propose instrumental definitions and procedures for conflict detection.
International Conference on Information Technology | 2011
Elena Kostadinova; Veselka Boeva; Niklas Lavesson
In this article, we study two microarray data integration techniques and describe how they can be applied and validated on a set of independent, but biologically related, microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach, which combines the information contained in multiple data sets at the level of expression or similarity matrices, and then applies a clustering algorithm on the combined matrix for subsequent analysis. Second, we propose a technique for the integration of multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm, which is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.
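To make the Figure of Merit concrete, here is a minimal sketch of the commonly used single-condition form: genes are clustered without one condition, and the score is the root mean squared deviation of the left-out values from their cluster means. This is the unmodified measure, not the multi-data-set version proposed in the article; the function name `figure_of_merit` and its argument layout are assumptions made for this example.

```python
import math


def figure_of_merit(left_out_values, labels):
    """Root mean squared deviation of left-out values from cluster means.

    left_out_values[g] is the expression of gene g in the condition that was
    withheld from clustering; labels[g] is the cluster assigned to gene g.
    Lower values indicate that the clusters predict the withheld condition
    better.
    """
    n = len(left_out_values)
    if n == 0 or n != len(labels):
        raise ValueError("need one value and one label per gene")
    clusters = {}
    for value, label in zip(left_out_values, labels):
        clusters.setdefault(label, []).append(value)
    squared_error = 0.0
    for values in clusters.values():
        mean = sum(values) / len(values)
        squared_error += sum((value - mean) ** 2 for value in values)
    return math.sqrt(squared_error / n)
```

In practice the score is usually summed over every condition left out in turn, which would require re-running the clustering for each withheld condition.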
Fuzzy Sets and Systems | 1999
Elena Tsiporkova; Bernard De Baets; Veselka Boeva
A modal logic interpretation of Dempster's rule of conditioning is developed. It is shown that by restricting a model of modal logic in a non-trivial way, the measures induced by this restricted model are, in fact, the conditional measures given the restricting set, corresponding to the measures induced by the original model.