Dimitar Hristovski
University of Ljubljana
Publication
Featured research published by Dimitar Hristovski.
International Journal of Medical Informatics | 2005
Dimitar Hristovski; Borut Peterlin; Joyce A. Mitchell; Susanne M. Humphrey
We present BITOLA, an interactive literature-based biomedical discovery support system. The goal of this system is to discover new, potentially meaningful relations between a given starting concept of interest and other concepts, by mining the bibliographic database MEDLINE. To make the system more suitable for disease candidate gene discovery and to decrease the number of candidate relations, we integrate background knowledge about the chromosomal location of the starting disease as well as the chromosomal location of the candidate genes from resources such as LocusLink and Human Genome Organization (HUGO). BITOLA can also be used as an alternative way of searching the MEDLINE database. The system is available at http://www.mf.uni-lj.si/bitola/.
Sleep | 2012
Christopher M. Miller; Thomas C. Rindflesch; Marcelo Fiszman; Dimitar Hristovski; Dongwook Shin; Graciela Rosemblat; Han Zhang; Kingman P. Strohl
STUDY OBJECTIVES Sleep quality commonly diminishes with age, and, further, aging men often exhibit a wider range of sleep pathologies than women. We used a freely available, web-based discovery technique (Semantic MEDLINE) supported by semantic relationships to automatically extract information from MEDLINE titles and abstracts. DESIGN We assumed that testosterone is associated with sleep (the A-C relationship in the paradigm) and looked for a mechanism to explain this association (B explanatory link) as a potential or partial mechanism underpinning the etiology of eroded sleep quality in aging men. MEASUREMENTS AND RESULTS Review of full-text papers in critical nodes discovered in this manner resulted in the proposal that testosterone enhances sleep by inhibiting cortisol. Using this discovery method, we posit, and could confirm as a novel hypothesis, cortisol as part of a mechanistic link elucidating the observed correlation between decreased testosterone in aging men and diminished sleep quality. CONCLUSIONS This approach is publicly available and useful not only in this manner but also to generate from the literature alternative explanatory models for observed experimental results.
Cardiovascular and Hematological Agents in Medicinal Chemistry | 2013
Dimitar Hristovski; Thomas C. Rindflesch; Borut Peterlin
We present a promising in silico paradigm called literature-based discovery (LBD) and describe its potential to identify novel pharmacologic approaches to treating diseases. The goal of LBD is to generate novel hypotheses by analyzing the vast biomedical literature. Additional knowledge resources, such as ontologies and specialized databases, are often used to supplement the published literature. MEDLINE, the largest and most important biomedical bibliographic database, is the most common source for exploiting LBD. There are two variants of LBD, open discovery and closed discovery. With open discovery we can, for example, try to find a novel therapeutic approach for a given disease, or find new therapeutic applications for an existing drug. With closed discovery we can find an explanation for a relationship between two concepts. For example, if we already have a hypothesis that a particular drug is useful for a particular disease, with closed discovery we can identify the mechanisms through which the drug could have a therapeutic effect on the disease. We briefly describe the methodology behind LBD and then discuss in more detail currently available LBD tools; we also mention in passing some of those no longer available. Next we present several examples in which LBD has been exploited for identifying novel therapeutic approaches. In conclusion, LBD is a powerful paradigm with considerable potential to complement more traditional drug discovery methods, especially for drug target discovery and for existing drug relabeling.
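The two LBD variants described above can be illustrated with a minimal sketch. It assumes a plain co-occurrence model over a handful of invented documents; the concept names (`drug_x`, `mechanism_1`, and so on) and the function names are hypothetical and are not taken from any of the tools discussed in the paper.

```python
from collections import defaultdict

# Toy data (hypothetical): each set lists the concepts mentioned in one document.
documents = [
    {"drug_x", "mechanism_1"},
    {"drug_x", "mechanism_2"},
    {"mechanism_1", "disease_y"},
    {"mechanism_2", "disease_y"},
    {"drug_x", "biomarker_z"},
]


def build_neighbors(docs):
    """Map each concept to the set of concepts it co-occurs with."""
    neighbors = defaultdict(set)
    for doc in docs:
        for concept in doc:
            neighbors[concept] |= doc - {concept}
    return neighbors


def open_discovery(neighbors, start):
    """Find concepts C that are reachable from `start` only via intermediates B.

    Returns a dict mapping each candidate C to the set of linking B concepts.
    """
    direct = neighbors[start]
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            # Keep only concepts with no direct co-occurrence with the start.
            if c != start and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


def closed_discovery(neighbors, a, c):
    """Return the intermediate concepts B that co-occur with both `a` and `c`."""
    return neighbors[a] & neighbors[c]


neighbors = build_neighbors(documents)
print(open_discovery(neighbors, "drug_x"))
print(closed_discovery(neighbors, "drug_x", "disease_y"))
```

In this toy setting, open discovery starting from `drug_x` proposes `disease_y` as a candidate, and closed discovery between the two returns the linking intermediates. Real systems rank and filter candidates far more carefully than this sketch does.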
Archive | 2008
Dimitar Hristovski; Carol Friedman; Thomas C. Rindflesch; B. Peterlin
Literature-based discovery (LBD) is an emerging methodology for uncovering nonovert relationships in the online research literature. Making such relationships explicit supports hypothesis generation and discovery. Currently LBD systems depend exclusively on co-occurrence of words or concepts in target documents, regardless of whether relations actually exist between the words or concepts. We describe a method to enhance LBD through capture of semantic relations from the literature via use of natural language processing (NLP). This paper reports on an application of LBD that combines two NLP systems: BioMedLEE and SemRep, which are coupled with an LBD system called BITOLA. The two NLP systems complement each other to increase the types of information utilized by BITOLA. We also discuss issues associated with combining heterogeneous systems. Initial experiments suggest this approach can uncover new associations that were not possible using previous methods.
BMC Bioinformatics | 2015
Dimitar Hristovski; Dejan Dinevski; Andrej Kastrin; Thomas C. Rindflesch
BACKGROUND The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. RESULTS We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). CONCLUSIONS In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
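The idea of answering a question by searching a database of semantic relations can be sketched as follows. This is only an illustration: the table layout, column names, rows, and the `answer` helper are assumptions made for the example and do not describe the actual SemBT schema.

```python
import sqlite3

# Hypothetical rows: (subject, predicate, object, sentence_id).
rows = [
    ("drug_a", "TREATS", "disease_x", 1),
    ("drug_a", "TREATS", "disease_x", 2),
    ("drug_b", "TREATS", "disease_x", 3),
    ("gene_g", "ASSOCIATED_WITH", "disease_x", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation ("
    "subject TEXT, predicate TEXT, object TEXT, sentence_id INTEGER)"
)
conn.executemany("INSERT INTO relation VALUES (?, ?, ?, ?)", rows)


def answer(conn, predicate, obj):
    """Answer 'What <predicate> <obj>?' as a lookup over stored relations.

    Subjects are ranked by the number of supporting relation instances.
    """
    cursor = conn.execute(
        "SELECT subject, COUNT(*) AS n FROM relation "
        "WHERE predicate = ? AND object = ? "
        "GROUP BY subject ORDER BY n DESC, subject",
        (predicate, obj),
    )
    return cursor.fetchall()


# "What treats disease_x?"
print(answer(conn, "TREATS", "disease_x"))
```

Ranking by instance count is one simple choice; since extraction is imperfect (68% of evaluated instances were judged correct), more supporting sentences give a reader more chances to verify an answer.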
Intelligent Systems in Molecular Biology | 2009
Dimitar Hristovski; Andrej Kastrin; Borut Peterlin; Thomas C. Rindflesch
Although microarray experiments have great potential to support progress in biomedical research, results are not easy to interpret. Information about the functions and relations of relevant genes needs to be extracted from the vast biomedical literature. A potential solution is to use computerized text analysis methods. Our proposal enhances these methods with semantic relations. We describe an application that integrates such relations with microarray results and discuss its benefits in supporting enhanced access to the relevant literature for interpretation of results and novel hypotheses generation. The application is available at http://sembt.mf.uni-lj.si
PLOS ONE | 2014
Andrej Kastrin; Thomas C. Rindflesch; Dimitar Hristovski
Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using Pearson's chi-square test for independence. To characterize topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 edges with a diameter of three edges and clustering coefficient of 0.26. The Kolmogorov-Smirnov test rejected the power law as a plausible model for degree distribution. For the major MeSH network the average path length was 2.63 edges with a diameter of seven edges and clustering coefficient of 0.15. The Kolmogorov-Smirnov test failed to reject the power law as a plausible model. The power-law exponent was 5.07. In both networks it was evident that nodes with a lower degree exhibit higher clustering than those with a higher degree. After simulated attack, where we removed 10% of nodes with the highest degrees, the giant component of each of the two networks contained about 90% of all nodes. Because of small average path length and high degree of clustering the MeSH network is small-world. A power-law distribution is not a plausible model for the degree distribution. The network is highly modular, is highly resistant to targeted and random attack, and shows minimal disassortativity.
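For readers unfamiliar with the measures named above, the following minimal sketch computes average path length, diameter, and average clustering coefficient on a tiny invented graph. The graph and function names are hypothetical, the graph is assumed to be connected and undirected, and the clustering coefficient is taken as the mean of local coefficients, which may differ from the definition used in the paper.

```python
from collections import deque
from itertools import combinations

# Toy undirected graph (hypothetical); nodes stand in for MeSH descriptors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_stats(graph):
    """Return (average shortest path length, diameter) of a connected graph."""
    lengths = [bfs_distances(graph, u)[v] for u, v in combinations(graph, 2)]
    return sum(lengths) / len(lengths), max(lengths)


def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


avg_len, diameter = path_stats(graph)
print(avg_len, diameter, average_clustering(graph))
```

A short average path length combined with high clustering, relative to a random graph of the same size, is the usual informal signature of a small-world network.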
Methods of Information in Medicine | 2010
Andrej Kastrin; Borut Peterlin; Dimitar Hristovski
OBJECTIVES Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains a genetically relevant topic. METHODS Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. RESULTS Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine-learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, results showed that our chi-square scoring performs as well as the compared machine-learning algorithms. CONCLUSIONS We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
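A rough sketch of chi-square-based scoring is given below. The abstract does not specify how per-descriptor statistics are combined, so this example assumes that a citation's score is the sum of signed chi-square statistics of its descriptors (positive when the descriptor is relatively more frequent in the genetic corpus). The corpora, descriptor sets, and function names are invented for illustration, and both corpora are assumed to be non-empty.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight per descriptor (assumed scoring scheme)."""
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    vocabulary = set().union(*genetic_docs, *nongenetic_docs)
    weights = {}
    for term in vocabulary:
        a = sum(term in doc for doc in genetic_docs)     # genetic, term present
        c = sum(term in doc for doc in nongenetic_docs)  # nongenetic, term present
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        # Positive indicator if relatively more frequent in the genetic corpus.
        sign = 1 if a / n_gen > c / n_non else -1
        weights[term] = sign * chi2
    return weights


def score_citation(descriptors, weights):
    """Sum the weights of a citation's descriptors; unknown descriptors count 0."""
    return sum(weights.get(term, 0.0) for term in descriptors)


# Toy corpora (hypothetical): each set holds the descriptors of one citation.
genetic_docs = [{"Mutation", "Genes"}, {"Mutation", "Humans"}, {"Genes", "Humans"}]
nongenetic_docs = [{"Surgery", "Humans"}, {"Surgery", "Humans"}, {"Humans", "Nursing"}]

weights = descriptor_weights(genetic_docs, nongenetic_docs)
print(score_citation({"Mutation", "Genes", "Humans"}, weights))
print(score_citation({"Surgery", "Humans"}, weights))
```

Citations can then be ranked by score, with a threshold chosen on annotated data to separate the two domains.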
Genetic Testing | 2004
Borut Peterlin; T. Kunej; Dimitar Hristovski
Despite the current lack of understanding of the mechanism of deleterious effects of Y chromosome microdeletions and their prognostic influence on male subfertility, the Y chromosome microdeletion test is widely used in the diagnostic evaluation of male subfertility. However, currently used diagnostic schemes have not been sufficiently evaluated for their diagnostic performance. The purpose of this study was to analyze a large database of published Y chromosome microdeletions to develop the optimal screening strategy for male subfertility. Therefore, we created a database from genetic and clinical data published in 52 peer-reviewed studies reporting on 512 cases with Y chromosome microdeletions. We developed a computerized procedure with the goal of minimizing the number of genetic markers included in the diagnostic set while maximizing the detection rate in patients with microdeletions. We estimate that 85.6% of all published Y chromosome microdeletions can be covered by a set of six genetic markers (sY84, sY127, sY152, RBMY1, sY147, sY254-DAZ). Inclusion of additional markers adds relatively little to the sensitivity of the test and is potentially related to the population origin.
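The abstract does not say which procedure was used to select the markers. One common way to trade panel size against detection rate is a greedy set-cover heuristic, sketched here under that assumption. The case data and marker names (`m1` to `m4`) are invented, and a case is assumed to count as detected if at least one of its deleted markers is in the panel.

```python
def greedy_marker_panel(cases, max_markers):
    """Greedily pick markers that detect the most still-undetected cases.

    `cases` maps a case id to the set of markers deleted in that case.
    Returns the chosen panel and the fraction of cases it detects.
    """
    uncovered = set(cases)
    all_markers = set().union(*cases.values())
    panel = []
    while uncovered and len(panel) < max_markers:
        remaining = sorted(all_markers - set(panel))  # sorted for stable ties
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda m: sum(m in cases[c] for c in uncovered),
        )
        if not any(best in cases[c] for c in uncovered):
            break  # no remaining marker detects any further case
        panel.append(best)
        uncovered = {c for c in uncovered if best not in cases[c]}
    coverage = 1 - len(uncovered) / len(cases)
    return panel, coverage


# Toy data (hypothetical): case id -> markers deleted in that case.
cases = {
    1: {"m1", "m2"},
    2: {"m1"},
    3: {"m2", "m3"},
    4: {"m3"},
    5: {"m4"},
}

print(greedy_marker_panel(cases, max_markers=2))
print(greedy_marker_panel(cases, max_markers=4))
```

The greedy heuristic is not guaranteed to find the smallest possible panel, but it shows the diminishing returns the abstract describes: each added marker detects fewer new cases than the one before.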
Methods of Information in Medicine | 2016
Andrej Kastrin; T. C. Rindflesch; Dimitar Hristovski
OBJECTIVES Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We model the process of LBD as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods for predicting previously unknown connections between biomedical concepts. METHODS We evaluate the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures, such as common neighbor (CN), Jaccard coefficient (JC), Adamic/Adar index (AA) and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. RESULTS Applying an unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In a supervised approach, we evaluate whether proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87). CONCLUSIONS The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.
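The four proximity measures named above have standard definitions, which the following minimal sketch implements for an undirected graph stored as adjacency sets. The toy graph and function names are hypothetical; the experimental setup of the paper (time-sliced MeSH network, AUC evaluation, supervised classifiers) is not reproduced here.

```python
import math

# Toy undirected graph (hypothetical); nodes stand in for MeSH terms.
graph = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}


def common_neighbors(graph, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])


def jaccard_coefficient(graph, u, v):
    """JC: shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def adamic_adar(graph, u, v):
    """AA: shared neighbors weighted by the inverse log of their degree."""
    return sum(
        1 / math.log(len(graph[w]))
        for w in graph[u] & graph[v]
        if len(graph[w]) > 1  # guard against log(1) = 0
    )


def preferential_attachment(graph, u, v):
    """PA: product of the degrees of u and v."""
    return len(graph[u]) * len(graph[v])


# Score the currently unlinked pair (A, B); higher means a link is more likely.
for measure in (common_neighbors, jaccard_coefficient, adamic_adar, preferential_attachment):
    print(measure.__name__, measure(graph, "A", "B"))
```

In a supervised setting, the four scores for each candidate pair can serve as the feature vector passed to a classifier.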