Milko Krachunov
Sofia University
Publications
Featured research published by Milko Krachunov.
Biology Direct | 2015
Ola Spjuth; Erik Bongcam-Rudloff; Guillermo Carrasco Hernández; Lukas Forer; Mario Giovacchini; Roman Valls Guimera; Aleksi Kallio; Eija Korpelainen; Maciej M. Kańduła; Milko Krachunov; David P. Kreil; Ognyan Kulev; Paweł P. Łabaj; Samuel Lampa; Luca Pireddu; Sebastian Schönherr; Alexey Siretskiy; Dimitar Vassilev
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The participating organizations work on similar problems, but have addressed them with different strategies and solutions. This fragmentation of effort is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution. Reviewers: This article was reviewed by Dr Andrew Clark.
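The recommendations centre on pipeline steps that are re-runnable and fail loudly. As a minimal illustration (a generic Python sketch, not taken from the paper or from any SeqAhead system; `run_step` and the `bwa` command in the usage comment are hypothetical), a step can skip itself when its output is already up to date, so a crashed pipeline can be restarted without repeating finished work:

```python
import subprocess
from pathlib import Path

def run_step(name: str, cmd: list[str], inputs: list[Path], output: Path) -> Path:
    """Run one pipeline step, skipping it when the output is newer than
    all of its inputs. Re-runnable steps like this give a pipeline crude
    fault tolerance and make reruns reproducible."""
    if output.exists() and all(
        i.stat().st_mtime <= output.stat().st_mtime for i in inputs
    ):
        print(f"[{name}] up to date, skipping")
        return output
    subprocess.run(cmd, check=True)  # fail loudly instead of continuing
    return output

# Hypothetical usage: align reads with an external tool.
# reads = Path("sample.fastq")
# run_step("align", ["bwa", "mem", "ref.fa", str(reads)], [reads], Path("sample.sam"))
```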
Journal of Computational Science | 2014
Milko Krachunov; Dimitar Vassilev
Metagenomics is a rapidly growing field, greatly driven by the ongoing advances in high-throughput sequencing technologies. As a result, both the data preparation and the subsequent in silico experiments pose unsolved technical and theoretical challenges: there are no well-established approaches, and new expertise and software are constantly emerging. Our project's main focus is the creation and evaluation of a novel error detection and correction approach to be used inside a metagenomic processing workflow. This paper describes the approach in detail, together with an indirect validation technique and the empirical results obtained so far. To aid development and testing, we are also building a workflow execution system for running our experiments; it is designed to be extensible beyond the scope of error detection and will be released as free/open-source software.
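The abstract does not spell out the detection rule, but a minimal frequency-based candidate filter conveys the general idea. The sketch below is an assumption, not the paper's method: it flags bases whose relative frequency in one multiple-alignment column falls below a cutoff.

```python
from collections import Counter

def error_candidates(column: list[str], cutoff: float = 0.05) -> set[str]:
    """Flag bases in one alignment column whose relative frequency is
    below `cutoff`; these become candidates for sequencing errors rather
    than genuine variation. Illustrative only: the paper's detector and
    its indirect validation are more involved."""
    counts = Counter(column)
    total = sum(counts.values())
    return {base for base, n in counts.items() if n / total < cutoff}

# Example: one column across 21 aligned reads; the lone 'G' is flagged.
print(error_candidates(list("A" * 20 + "G")))  # {'G'}
```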
Biotechnology & Biotechnological Equipment | 2012
Peter Petrov; Milko Krachunov; Elena Todorovska; Dimitar Vassilev
Recent years have seen a vast amount of data generated by various biological and biomedical experiments. The storage, management and analysis of this data is done by means of modern bioinformatics applications and tools. One of the bioinformatics instruments used for solving these tasks is ontologies and the apparatus they provide. As a modeling tool, an ontology is a specification of a conceptualization: a formal description of the concepts and relationships that can exist for a given software system or software agent (8, 10). Anatomical (phenotypic) ontologies of various species nowadays typically contain from a few thousand to a few tens of thousands of terms and relations (a very small number compared to the count of objects and the amount of data produced by biological experiments at the molecular level, for example), but the semantics employed in them is usually enormous in scale. The major problem when using such ontologies is that they lack intelligent tools for cross-species literature searches (text mining), as well as tools aiding the design of new biological and biomedical experiments with other, not-yet-tested species/organisms based on information from experiments already performed on certain model species/organisms. This is where merging anatomical ontologies comes into use. The choice of specific models and algorithms for merging such ontologies is open. In this work, a novel approach to this task is presented, based on two directed acyclic graph (DAG) models and three original algorithmic procedures. Based on them, an intelligent software system for merging two (and possibly more) input/source anatomical ontologies into one output/target super-ontology was designed and implemented. The system was named AnatOM (an abbreviation of “Anatomical Ontologies Merger”). This work provides a short overview of ontologies, describing what they are and why they are widely used as a tool in bioinformatics. The problem of merging anatomical ontologies of two or more different organisms is introduced, and some effort is put into explaining why it is important. A general outline of the models and the method developed for solving the ontology merging problem is presented, along with a high-level overview of the AnatOM program implemented by the authors as part of this work. To achieve the required degree of intelligence, AnatOM utilizes the large amount of high-quality knowledge available in several widely popular and generally recognized knowledge bases: UMLS, FMA and WordNet. The last of these is a general-purpose, i.e. non-specialized, knowledge source; the first two are biological/biomedical ones. They were chosen because they provide a very good foundation for building an intelligent system that performs certain comparative anatomy tasks, including the mapping and merging of anatomical ontologies (23).
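As a rough illustration of the two-DAG setup (a sketch over toy data, not AnatOM's actual models or algorithms), the fragment below represents two anatomical ontologies as directed acyclic graphs and collapses terms that a cross-ontology mapping declares equivalent; in AnatOM such links would be derived from UMLS, FMA and WordNet rather than hard-coded:

```python
import networkx as nx

# Toy "part-of" hierarchies; edges point from part to whole.
human = nx.DiGraph([("left ventricle", "heart"), ("heart", "body")])
mouse = nx.DiGraph([("cardiac ventricle", "heart"), ("heart", "body")])

def merge(a: nx.DiGraph, b: nx.DiGraph, links: dict[str, str]) -> nx.DiGraph:
    """Build a super-ontology: take the disjoint union of both DAGs,
    then contract every term pair that the cross-ontology mapping
    declares equivalent into a single node."""
    merged = nx.union(a, b, rename=("h:", "m:"))
    for term_a, term_b in links.items():
        merged = nx.contracted_nodes(
            merged, "h:" + term_a, "m:" + term_b, self_loops=False
        )
    return merged

super_onto = merge(human, mouse, {"heart": "heart", "body": "body"})
print(sorted(super_onto.nodes))
```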
international conference on conceptual structures | 2017
Milko Krachunov; Maria Nisheva; Dimitar Vassilev
In high-variation genomics datasets, such as those found in metagenomics or complex polyploid genome analysis, error detection and variant calling are impeded by the difficulty of discerning sequencing errors from actual biological variation. Confirming base candidates by high frequency of occurrence is no longer a reliable measure because of the natural variation and the presence of rare bases. This work employs machine learning models to classify bases into erroneous and rare variations, after preselecting potential error candidates with a weighted frequency measure that focuses on unexpected variations by using inter-sequence pairwise similarity. Different similarity measures are used to account for different types of datasets. Four machine learning models are tested.
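A hedged sketch of the weighted frequency idea, as one plausible reading of the abstract (the paper's exact formula may differ): each read's vote for a base is weighted by its mean pairwise similarity to the other reads, so support coming only from dissimilar sequences scores low.

```python
def identity_similarity(a: str, b: str) -> float:
    """Fraction of agreeing positions between two aligned reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def weighted_frequency(reads: list[str], pos: int, base: str) -> float:
    """Weighted frequency of `base` at column `pos`: each read's vote is
    weighted by its mean similarity to the other reads, so a base carried
    only by outlier sequences scores low and becomes an error candidate."""
    n = len(reads)
    def weight(i: int) -> float:
        return sum(identity_similarity(reads[i], reads[j])
                   for j in range(n) if j != i) / (n - 1)
    total = sum(weight(i) for i in range(n))
    support = sum(weight(i) for i in range(n) if reads[i][pos] == base)
    return support / total if total else 0.0

reads = ["ACGT", "ACGT", "ACGT", "TTGA"]
print(weighted_frequency(reads, 3, "A"))  # 0.1, vs. a plain frequency of 0.25
```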
international conference on conceptual structures | 2015
Milko Krachunov; Dimitar Vassilev; Maria Nisheva; Ognyan Kulev; Valeriya Simeonova; Vladimir Dimitrov
NGS data processing in metagenomics studies has to deal with noisy data that can contain a large number of read errors, which are difficult to detect and account for. This work introduces a fuzzy indicator of reliability technique to facilitate solutions to this problem. It includes modified Hamming and Levenshtein distance functions intended as drop-in replacements in NGS analysis procedures that rely on distances, such as phylogenetic tree construction. The distances utilise fuzzy sets of reliable bases, or an equivalent fuzzy logic, potentially aggregating multiple sources of base reliability.
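A minimal sketch of a reliability-weighted Hamming distance in this spirit (an assumption based on the abstract, using `min` as the fuzzy AND; the paper's aggregation may differ):

```python
def fuzzy_hamming(a: str, b: str,
                  rel_a: list[float], rel_b: list[float]) -> float:
    """Hamming distance in which each mismatch is discounted by the
    fuzzy reliability of the two bases involved: disagreement between
    unreliable bases contributes little, so read errors distort
    distance-based procedures (e.g. tree construction) less."""
    assert len(a) == len(b) == len(rel_a) == len(rel_b)
    return sum(min(ra, rb)  # fuzzy AND of the two base reliabilities
               for x, y, ra, rb in zip(a, b, rel_a, rel_b)
               if x != y)

# Two mismatches, but the second sits on a low-reliability base:
print(fuzzy_hamming("ACGT", "ACAA", [1, 1, 1, 1], [1, 1, 0.9, 0.1]))  # 1.0
```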
Journal of Integrative Bioinformatics | 2013
Peter Petrov; Milko Krachunov; Dimitar Vassilev
This paper presents a study in the domain of semi-automated and fully automated ontology mapping. A process for inferring additional cross-ontology links within the domain of anatomical ontologies is presented and evaluated on pairs of ontologies from three model organisms. The results of experiments performed with various external knowledge sources and scoring schemes are discussed.
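As a hedged sketch of how several external knowledge sources and a scoring scheme might be combined (the source names are from the related AnatOM work; the weights and the voting rule are hypothetical, not the paper's):

```python
from typing import Callable

# A judge stands in for a lookup in one external knowledge source.
Judge = Callable[[str, str], bool]

def link_score(term_a: str, term_b: str,
               sources: dict[str, Judge],
               weights: dict[str, float]) -> float:
    """Weighted vote of external knowledge sources (e.g. UMLS, FMA,
    WordNet) on whether two anatomical terms should be linked."""
    total = sum(weights.values())
    agree = sum(w for name, w in weights.items()
                if sources[name](term_a, term_b))
    return agree / total

# Hypothetical scoring scheme: trust the specialised sources more.
weights = {"umls": 0.5, "fma": 0.3, "wordnet": 0.2}
```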
artificial intelligence methodology systems applications | 2018
Milko Krachunov; Maria Nisheva; Dimitar Vassilev
Genomics studies increasingly have to deal with datasets containing high variation between the sequenced nucleotide chains. This is most common in metagenomics and polyploid studies, where the biological nature of the samples requires the analysis of multiple variants of nearly identical sequences. The high variation makes it more difficult to determine the correct nucleotide sequences and to distinguish signal from noise, producing results with higher error rates than can be achieved in samples with low variation. This paper presents an original, purely machine-learning-based approach for detecting and potentially correcting those errors. It uses a generic model that can be applied to different types of sequencing data with minor modifications. As presented in a separate part of this work, these models can be combined with data-specific error candidate selection for refined error discovery, but, as shown here, they can also be used independently.
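The abstract does not name the model, so the sketch below stands in with a random forest over a hypothetical per-base feature vector; only the feature extraction would change between data types, which is what makes such a model generic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features for one candidate base: weighted frequency,
# raw frequency, and agreement with the neighbouring columns.
# Label 1 = sequencing error, 0 = genuine rare variant.
X_train = np.array([[0.02, 0.05, 0.10],
                    [0.28, 0.30, 0.85],
                    [0.01, 0.03, 0.15],
                    [0.22, 0.25, 0.90]])
y_train = np.array([1, 0, 1, 0])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Classify a new candidate; the same pipeline would be retrained on
# metagenomic or polyploid data with its own feature extraction.
print(clf.predict([[0.04, 0.06, 0.20]]))  # likely [1]: resembles the errors
```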
international symposium on methodologies for intelligent systems | 2017
Milko Krachunov; Peter Petrov; Maria Nisheva; Dimitar Vassilev
A system for automated prediction and inference of cross-ontology links is presented. External knowledge sources are used to create a primary body of predictions, and the structure of the projected super-ontology is then used to automatically infer additional ones. Probabilistic scores are attached to all of these predictions, allowing them to be filtered using a statistically selected threshold. Three anatomical ontologies were mapped in pairs, and every predicted mapping link was individually checked by a manual curator, allowing a close look at the quality of the chosen prediction procedures and the validity of the resulting mappings.
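A hedged sketch of the inference-plus-filtering stage (the parent-propagation rule, the decay factor and the fixed threshold are assumptions; the paper selects its threshold statistically):

```python
def infer_links(primary: dict[tuple[str, str], float],
                parent_a: dict[str, str],
                parent_b: dict[str, str],
                decay: float = 0.8) -> dict[tuple[str, str], float]:
    """Structural inference: when two terms are linked, propose a
    lower-confidence link between their parents in the projected
    super-ontology as well."""
    inferred = dict(primary)
    for (a, b), score in primary.items():
        pa, pb = parent_a.get(a), parent_b.get(b)
        if pa and pb:
            key = (pa, pb)
            inferred[key] = max(inferred.get(key, 0.0), score * decay)
    return inferred

def filter_links(scored: dict[tuple[str, str], float],
                 threshold: float = 0.6) -> dict[tuple[str, str], float]:
    """Keep only predictions whose probabilistic score clears the
    threshold; the survivors go to a manual curator."""
    return {pair: s for pair, s in scored.items() if s >= threshold}
```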
The first computers | 2017
Milko Krachunov; Maria Nisheva; Dimitar Vassilev
For metagenomics datasets, complex polyploid genomes, and other high-variation genomics datasets, analysis, error detection and variant calling are hampered by the challenge of discerning sequencing errors from biological variation. Confirming base candidates by high frequency of occurrence is no longer a reliable measure because of the natural variation and the presence of rare bases. The paper discusses the application of machine learning models to classify bases into erroneous and rare variations, after preselecting potential error candidates with a weighted frequency measure that focuses on unexpected variations by using inter-sequence pairwise similarity. Different similarity measures are used to account for different types of datasets. Four machine learning models are implemented and tested.
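To illustrate the point about dataset-specific similarity measures (both functions below are illustrative stand-ins, not the paper's exact measures): positional identity suits pre-aligned reads, while a k-mer-based measure tolerates unaligned or length-varying sequences.

```python
def positional_similarity(a: str, b: str) -> float:
    """Fraction of agreeing positions; suited to pre-aligned reads."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def kmer_similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity over k-mer sets; usable when sequences are
    unaligned or differ in length."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0

print(positional_similarity("ACGTACGT", "ACGTTCGT"))  # 0.875
print(kmer_similarity("ACGTACGT", "ACGTTCGT"))        # 0.125
```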
Archive | 2018
Milko Krachunov; Milena Sokolova; Valeriya Simeonova; Maria Nisheva; Irena Avdjieva; Dimitar Vassilev