Publication


Featured research published by Halil Kilicoglu.


Bioinformatics | 2012

SemMedDB: a PubMed-scale repository of biomedical semantic predications

Halil Kilicoglu; Dongwook Shin; Marcelo Fiszman; Graciela Rosemblat; Thomas C. Rindflesch

SUMMARY: Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support. AVAILABILITY AND IMPLEMENTATION: The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. A UMLS Metathesaurus license is required. CONTACT: [email protected].


North American Chapter of the Association for Computational Linguistics | 2004

Abstraction summarization for managing the biomedical research literature

Marcelo Fiszman; Thomas C. Rindflesch; Halil Kilicoglu

We explore a semantic abstraction approach to automatic summarization in the biomedical domain. The approach relies on a semantic processor that functions as the source interpreter and produces a list of predications. A transformation stage then generalizes and condenses this list, ultimately generating a conceptual condensate for a disorder input topic. The final condensate is displayed in graphical form. We provide a set of principles for the transformation stage and describe the application of this approach to multidocument input. Finally, we examine the characteristics and quality of the condensates produced.


Journal of Biomedical Informatics | 2009

Automatic summarization of MEDLINE citations for evidence-based medical treatment: A topic-oriented evaluation

Marcelo Fiszman; Dina Demner-Fushman; Halil Kilicoglu; Thomas C. Rindflesch

As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for 53 diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p<0.01) and the increase in the overall score of clinical usefulness was 0.39 (p<0.05).
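
The mean average precision (MAP) metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes one ranked list of candidate interventions per disease topic and a reference set of relevant ones; the function names and the toy drug labels are hypothetical, and normalizing by the size of the reference set is one common convention, not necessarily the variant used in the study.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision for one topic.

    Averages precision@k over the ranks k at which a relevant item appears,
    normalized by the number of relevant items (an assumed convention).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-topic average precision values."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): two topics with ranked drug lists and reference sets.
toy_runs = [
    (["drugA", "drugB", "drugC", "drugD"], {"drugA", "drugC"}),
    (["drugX", "drugY"], {"drugY"}),
]
toy_map = mean_average_precision(toy_runs)
```

A MAP gain such as the one reported above would then be the difference between this value for the summarizer and for the baseline, computed over the same topics.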


Journal of the American Medical Informatics Association | 2009

Towards Automatic Recognition of Scientifically Rigorous Clinical Research Evidence

Halil Kilicoglu; Dina Demner-Fushman; Thomas C. Rindflesch; Nancy L. Wilczynski; R. Brian Haynes

The growing numbers of topically relevant biomedical publications readily available due to advances in document retrieval methods pose a challenge to clinicians practicing evidence-based medicine. It is increasingly time consuming to acquire and critically appraise the available evidence. This problem could be addressed in part if methods were available to automatically recognize rigorous studies immediately applicable in a specific clinical situation. We approach the problem of recognizing studies containing useable clinical advice from retrieved topically relevant articles as a binary classification problem. The gold standard used in the development of PubMed clinical query filters forms the basis of our approach. We identify scientifically rigorous studies using supervised machine learning techniques (Naïve Bayes, support vector machine (SVM), and boosting) trained on high-level semantic features. We combine these methods using an ensemble learning method (stacking). The performance of learning methods is evaluated using precision, recall and F(1) score, in addition to area under the receiver operating characteristic (ROC) curve (AUC). Using a training set of 10,000 manually annotated MEDLINE citations, and a test set of an additional 2,000 citations, we achieve 73.7% precision and 61.5% recall in identifying rigorous, clinically relevant studies, with stacking over five feature-classifier combinations and 82.5% precision and 84.3% recall in recognizing rigorous studies with treatment focus using stacking over word + metadata feature vector. Our results demonstrate that a high quality gold standard and advanced classification methods can help clinicians acquire best evidence from the medical literature.


Information Services & Use | 2011

Semantic MEDLINE: An advanced information management application for biomedicine

Thomas C. Rindflesch; Halil Kilicoglu; Marcelo Fiszman; Graciela Rosemblat; Dongwook Shin

To support more effective biomedical information management, Semantic MEDLINE integrates document retrieval, advanced natural language processing, automatic summarization and visualization into a single Web portal. The application is intended to help manage the results of PubMed searches by condensing core semantic content in the citations retrieved. Output is presented as a connected graph of semantic relations, with links to the original MEDLINE citations. The ability to connect salient information across documents helps users keep up with the research literature and discover connections which might otherwise go unnoticed. Semantic MEDLINE can make an impact on biomedicine by supporting scientific discovery and the timely translation of insights from basic research into advances in clinical practice and patient care. Semantic MEDLINE is illustrated here with recent research on the clock genes.


BMC Bioinformatics | 2006

Argument-predicate distance as a filter for enhancing precision in extracting predications on the genetic etiology of disease

Marco Masseroli; Halil Kilicoglu; François-Michel Lang; Thomas C. Rindflesch

Background: Genomic functional information is valuable for biomedical research. However, such information frequently needs to be extracted from the scientific literature and structured in order to be exploited by automatic systems. Natural language processing is increasingly used for this purpose although it inherently involves errors. A postprocessing strategy that selects relations most likely to be correct is proposed and evaluated on the output of SemGen, a system that extracts semantic predications on the etiology of genetic diseases. Based on the number of intervening phrases between an argument and its predicate, we defined a heuristic strategy to filter the extracted semantic relations according to their likelihood of being correct. We also applied this strategy to relations identified with co-occurrence processing. Finally, we exploited postprocessed SemGen predications to investigate the genetic basis of Parkinson's disease. Results: The filtering procedure for increased precision is based on the intuition that arguments which occur close to their predicate are easier to identify than those at a distance. For example, if gene-gene relations are filtered for arguments at a distance of 1 phrase from the predicate, precision increases from 41.95% (baseline) to 70.75%. Since this proximity filtering is based on syntactic structure, applying it to the results of co-occurrence processing is useful, but not as effective as when applied to the output of natural language processing. In an effort to exploit SemGen predications on the etiology of disease after increasing precision with postprocessing, a gene list was derived from extracted information enhanced with postprocessing filtering and was automatically annotated with GFINDer, a Web application that dynamically retrieves functional and phenotypic information from structured biomolecular resources. Two of the genes in this list are likely relevant to Parkinson's disease but are not associated with this disease in several important databases on genetic disorders. Conclusion: Information based on the proximity postprocessing method we suggest is of sufficient quality to be profitably used for subsequent applications aimed at uncovering new biomedical knowledge. Although proximity filtering is only marginally effective for enhancing the precision of relations extracted with co-occurrence processing, it is likely to benefit methods based, even partially, on syntactic structure, regardless of the relation.
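
The proximity filter described in this abstract can be sketched roughly as follows. This is not SemGen's implementation: the `Predication` record, its distance fields, and the rule that both arguments must fall within the threshold are assumptions made for illustration, and the gene names are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Predication:
    """Hypothetical record for one extracted relation."""
    subject: str
    predicate: str
    obj: str
    subject_distance: int  # phrases between the subject and the predicate
    object_distance: int   # phrases between the object and the predicate


def filter_by_distance(preds: Iterable[Predication], max_distance: int = 1) -> List[Predication]:
    """Keep predications whose arguments both lie within max_distance phrases
    of the predicate (assumed rule; the paper may treat arguments differently)."""
    return [
        p for p in preds
        if p.subject_distance <= max_distance and p.object_distance <= max_distance
    ]


# Placeholder data, not taken from the paper.
extracted = [
    Predication("GENE_A", "INTERACTS_WITH", "GENE_B", 1, 1),
    Predication("GENE_C", "INTERACTS_WITH", "GENE_D", 1, 4),
    Predication("GENE_E", "INHIBITS", "GENE_F", 3, 1),
]
kept = filter_by_distance(extracted, max_distance=1)
```

Tightening `max_distance` trades recall for precision, which matches the intuition stated in the abstract that nearby arguments are easier to identify correctly.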


BMC Bioinformatics | 2011

Constructing a semantic predication gold standard from the biomedical literature

Halil Kilicoglu; Graciela Rosemblat; Marcelo Fiszman; Thomas C. Rindflesch

Background: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. Results: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. Conclusions: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.


Journal of Biomedical Informatics | 2014

Using semantic predications to uncover drug-drug interactions in clinical data

Rui Zhang; Michael J. Cairelli; Marcelo Fiszman; Graciela Rosemblat; Halil Kilicoglu; Thomas C. Rindflesch; Serguei V. S. Pakhomov; Genevieve B. Melton

In this study we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1's effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug-drug interactions using both schemas. For example, our results suggest that the interaction of Lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug-drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug-drug interaction pathways.
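
The first schema in this abstract (Drug1 affects a gene, which in turn affects Drug2) can be sketched as a two-hop lookup over subject-predicate-object triples. This is a simplified illustration, not the authors' pipeline: the triple layout, the predicate labels, the explicit gene set, and all drug and gene names are hypothetical, and the sketch ignores which predicate links each pair.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def schema_one_candidates(
    medications: Iterable[str],
    triples: Iterable[Triple],
    genes: Set[str],
) -> List[Tuple[str, str, str]]:
    """Return (drug1, gene, drug2) paths where drug1 -> gene -> drug2 both exist."""
    targets_of: Dict[str, Set[str]] = {}  # subject -> genes it acts on
    affected_by: Dict[str, Set[str]] = {}  # gene -> things it acts on
    for subj, _predicate, obj in triples:
        if obj in genes:
            targets_of.setdefault(subj, set()).add(obj)
        if subj in genes:
            affected_by.setdefault(subj, set()).add(obj)

    paths = []
    # Ordered pairs, because the schema is directional.
    for drug1, drug2 in permutations(sorted(set(medications)), 2):
        for gene in sorted(targets_of.get(drug1, ())):
            if drug2 in affected_by.get(gene, ()):
                paths.append((drug1, gene, drug2))
    return paths


# Toy data (hypothetical names and predicates).
toy_triples = [
    ("DrugA", "INHIBITS", "GENE_X"),
    ("GENE_X", "INTERACTS_WITH", "DrugB"),
    ("DrugC", "STIMULATES", "GENE_Y"),
]
candidates = schema_one_candidates(
    ["DrugA", "DrugB", "DrugC"], toy_triples, {"GENE_X", "GENE_Y"}
)
```

Each returned path is only a candidate; in the study, candidate pairs were compared against known interactions, and a real implementation would also restrict the predicates and semantic types allowed at each hop.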


PLOS Computational Biology | 2014

Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

Guocai Chen; Michael J. Cairelli; Halil Kilicoglu; Dongwook Shin; Thomas C. Rindflesch

Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.


Journal of Biomedical Informatics | 2015

The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs

Kirk Roberts; Sonya E. Shooshan; Laritza Rodriguez; Swapna Abhyankar; Halil Kilicoglu; Dina Demner-Fushman

This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assessing the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We utilize a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification were quite simple, utilizing only lexical information and ignoring higher-level linguistic information such as syntax and semantics. Instead, we incorporated high-quality data to train the models by annotating additional information on top of a standard corpus. Despite the relative simplicity of the system, it achieves the highest scores (micro- and macro-F1, and micro- and macro-recall) out of the 20 participants in the 2014 i2b2/UTHealth Shared Task. This system obtains a micro- (macro-) precision of 0.8951 (0.8965), recall of 0.9625 (0.9611), and F1-measure of 0.9276 (0.9277). Additionally, we perform a series of experiments to assess the value of the annotated data we created. These experiments show how manually-labeled negative annotations can improve information extraction performance, demonstrating the importance of high-quality, fine-grained natural language annotations.
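
The micro- and macro-averaged scores quoted in this abstract follow standard definitions, sketched below. The helper names and the per-class counts are hypothetical; only the micro precision and recall plugged into `f1_score` at the end come from the abstract. Macro-F1 here is the mean of per-class F1 values, which is one common convention and may differ from the shared task's scorer.

```python
from typing import Dict, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def class_f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1_score(precision, recall)


def micro_macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Micro-F1 pools counts across classes; macro-F1 averages per-class F1."""
    total_tp = sum(tp for tp, _, _ in counts.values())
    total_fp = sum(fp for _, fp, _ in counts.values())
    total_fn = sum(fn for _, _, fn in counts.values())
    micro = class_f1(total_tp, total_fp, total_fn)
    macro = sum(class_f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Hypothetical per-class counts: (tp, fp, fn).
toy_micro, toy_macro = micro_macro_f1({"risk_a": (8, 2, 0), "risk_b": (1, 0, 3)})

# Consistency check on the micro-averaged figures reported in the abstract.
reported_micro_f1 = f1_score(0.8951, 0.9625)
```

Rounded to four decimals, `reported_micro_f1` agrees with the micro F1-measure of 0.9276 given in the abstract.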

Collaboration

Top Co-Authors

Thomas C. Rindflesch, National Institutes of Health
Marcelo Fiszman, National Institutes of Health
Dina Demner-Fushman, National Institutes of Health
Graciela Rosemblat, National Institutes of Health
Kirk Roberts, University of Texas Health Science Center at Houston
Yassine Mrabet, National Institutes of Health
Dongwook Shin, National Institutes of Health
Michael J. Cairelli, National Institutes of Health
Kate Masterton, National Institutes of Health
Laritza Rodriguez, National Institutes of Health