Publication


Featured research published by Ergin Soysal.


Computers in Biology and Medicine | 2010

Design and evaluation of an ontology based information extraction system for radiological reports

Ergin Soysal; Ilyas Cicekli; Nazife Baykal

This paper describes an information extraction system that extracts and converts the available information in free text Turkish radiology reports into a structured information model using manually created extraction rules and domain ontology. The ontology provides flexibility in the design of extraction rules, and determines the information model for the extracted semantic information. Although our information extraction system mainly concentrates on abdominal radiology reports, the system can be used in another field of medicine by adapting its ontology and extraction rule set. We achieved very high precision and recall results during the evaluation of the developed system with unseen radiology reports.


Ophthalmic Plastic and Reconstructive Surgery | 2008

Basal cell carcinoma of the eyelids and periorbital region in a Turkish population.

Hülya Gökmen Soysal; Ergin Soysal; Fatma Markoç; Fisun Ardiç

Purpose: To review the clinical and histopathologic features, treatment, and outcomes of eyelid basal cell carcinomas. Methods: The clinical records and histopathologic specimens of 311 patients with eyelid basal cell carcinomas were reviewed and analyzed retrospectively. The main outcome measures are patient demographics, clinical characteristics, lesion size, duration of lesion, histologic subtypes, presence of orbital and perineural invasion, severity of peritumorous inflammation, treatment modalities, recurrence rate, tumor-related death, and prognostic features. Results: Two-hundred ninety patients underwent surgery whereas others received radiotherapy or chemotherapy. The most common histologic subtypes were infiltrative, nodular, and basosquamous basal cell carcinomas. Nearly one-third (29.9%) of the patients were previously recurrent. Orbital and perineural invasion rates were 17.04% and 10.6%, respectively. Recurrent basal cell carcinomas were larger, with longer duration of lesion and a higher rate of orbital and perineural invasion. Basosquamous basal cell carcinomas were more likely to have prior recurrences, larger lesion size, and the highest rate of orbital invasion. Perineural invasion was most frequent in morpheaform and basosquamous subtypes. Peritumorous inflammation differed between subtypes and was highest in the superficial subtype. The recurrence rate was 7.39% in total. The death of 2 patients was tumor-related. Conclusions: In this large case series from a single center, the outcomes were worse than previously reported due to delay in treatment and previous inadequate treatments. Adverse prognostic factors associated with secondary orbital invasion are previous recurrences, aggressive histologic subtypes, longer duration of lesion, larger lesion size, and the presence of perineural invasion.


Nature Genetics | 2017

Finding useful data across multiple biomedical data repositories using DataMed

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey S. Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Anupama E. Gururaj; Elizabeth A. Bell; Ergin Soysal; Nansu Zong; Hyeoneui Kim

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics, along with interoperability and reusability, compose the four FAIR principles to facilitate knowledge discovery in today's big data–intensive science landscape.


BMC Systems Biology | 2015

A weighted and integrated drug-target interactome: drug repurposing for schizophrenia as a use case.

Liang Chin Huang; Ergin Soysal; W. Jim Zheng; Zhongming Zhao; Hua Xu; Jingchun Sun

Background: Computational pharmacology can uniquely address some issues in the process of drug development by providing a macroscopic view and a deeper understanding of drug action. Specifically, network-assisted approach is promising for the inference of drug repurposing. However, the drug-target associations coming from different sources and various assays have much noise, leading to an inflation of the inference errors. To reduce the inference errors, it is necessary and critical to create a comprehensive and weighted data set of drug-target associations. Results: In this study, we created a weighted and integrated drug-target interactome (WinDTome) to provide a comprehensive resource of drug-target associations for computational pharmacology. We first collected drug-target interactions from six commonly used drug-target centered data sources including DrugBank, KEGG, TTD, MATADOR, PDSP Ki Database, and BindingDB. Then, we employed the record linkage method to normalize drugs and targets to the unique identifiers by utilizing the public data sources including PubChem, Entrez Gene, and UniProt. To assess the reliability of the drug-target associations, we assigned two scores (Score_S and Score_R) to each drug-target association based on their data sources and publication references. Consequently, the WinDTome contains 546,196 drug-target associations among 303,018 compounds and 4,113 genes. To assess the application of the WinDTome, we designed a network-based approach for drug repurposing using mental disorder schizophrenia (SCZ) as a case. Starting from 41 known SCZ drugs and their targets, we inferred a total of 264 potential SCZ drugs through the associations of drug-target with Score_S higher than two in WinDTome and human protein-protein interactions. Among the 264 SCZ-related drugs, 39 drugs have been investigated in clinical trials for SCZ treatment and 74 drugs for the treatment of other mental disorders, respectively. Compared with the results using other Score_S cutoff values, single data source, or the data from STITCH, the inference of 264 SCZ-related drugs had the highest performance. Conclusions: The WinDTome generated in this study contains comprehensive drug-target associations with confidence scores. Its application to the SCZ drug repurposing demonstrated that the WinDTome is promising to serve as a useful resource for drug repurposing.
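
The abstract above does not spell out the exact inference rule, so the following is only a minimal sketch of one plausible reading: keep drug-target associations whose Score_S exceeds a cutoff, collect the targets of the known drugs, widen that set by one step in a protein-protein interaction network, and report other drugs that hit any of those genes. The function name, the tuple layout, the one-step expansion, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (drug, target gene, Score_S).
Association = Tuple[str, str, float]


def infer_candidate_drugs(
    associations: Iterable[Association],
    ppi_edges: Iterable[Tuple[str, str]],
    known_drugs: Iterable[str],
    min_score: float = 2.0,
) -> Set[str]:
    """Return drugs linked to the known drugs through shared or interacting targets."""
    # Keep only associations whose Score_S is strictly above the cutoff.
    reliable = [(drug, gene) for drug, gene, score in associations if score > min_score]
    known = set(known_drugs)

    # Targets of the known (seed) drugs.
    seed_targets = {gene for drug, gene in reliable if drug in known}

    # Assumption: expand the seed targets by one step in an undirected PPI network.
    neighbours: Set[str] = set()
    for gene_a, gene_b in ppi_edges:
        if gene_a in seed_targets:
            neighbours.add(gene_b)
        if gene_b in seed_targets:
            neighbours.add(gene_a)
    reachable = seed_targets | neighbours

    # Any other drug that reliably hits a reachable gene is a candidate.
    return {drug for drug, gene in reliable if gene in reachable and drug not in known}


# Toy data, invented for illustration only.
toy_associations = [
    ("haloperidol", "DRD2", 3.0),
    ("drug_a", "DRD2", 3.0),
    ("drug_b", "GENE_X", 4.0),
    ("drug_c", "GENE_X", 1.0),   # dropped: Score_S is not above the cutoff
    ("drug_d", "GENE_Y", 5.0),   # dropped: GENE_Y is not reachable from the seeds
]
toy_ppi = [("DRD2", "GENE_X")]
candidates = infer_candidate_drugs(toy_associations, toy_ppi, ["haloperidol"])
```

Raising or lowering `min_score` in this sketch changes how many associations survive the filter, which is the kind of cutoff comparison the abstract mentions.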


Journal of the American Medical Informatics Association | 2018

CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines

Ergin Soysal; Jingqi Wang; Min Jiang; Yonghui Wu; Serguei V. S. Pakhomov; Hongfang Liu; Hua Xu

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.


Journal of the American Medical Informatics Association | 2016

A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD)

Yonghui Wu; Joshua C. Denny; S. Trent Rosenbloom; Randolph A. Miller; Dario A. Giuse; Lulu Wang; Carmelo Blanquicett; Ergin Soysal; Jun Xu; Hua Xu

Objective: The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems’ capability to handle abbreviations in clinical narratives. Methods: We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system. Results and Conclusion: CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache’s clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap’s performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm. We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.


North American Chapter of the Association for Computational Linguistics | 2015

UTH-CCB: The Participation of the SemEval 2015 Challenge -- Task 14

Jun Xu; Yaoyun Zhang; Jingqi Wang; Yonghui Wu; Min Jiang; Ergin Soysal; Hua Xu

This paper describes the system developed by the University of Texas Health Science Center at Houston (UTHealth), for the 2015 SemEval shared task on “Analysis of Clinical Text” (Task 14). We participated in both sub-tasks: Task 1 for “Disorder Identification”, which aims to detect disorder entities and encode them to UMLS (Unified Medical Language System) CUIs (Concept Unique Identifiers), and Task 2 for “Disorder Slot Filling”, where the task is to identify normalized values for modifiers of disorders. For Task 1, we developed an ensemble approach that combined machine learning based named entity recognition classifiers with MetaMap, an existing symbolic biomedical NLP system, to recognize disorder entities, and we used a general Vector Space Model-based approach for disorder encoding to UMLS CUIs. To identify modifiers of disorders (Task 2), we developed Support Vector Machines-based classifiers for each type of modifier, by exploring various types of features. Our system ranked third for Task 1 and first for Task 2 (both 2A and 2B), demonstrating the effectiveness of machine learning-based approaches for extracting clinical entities and their modifiers from clinical narratives.


bioRxiv | 2016

DataMed: Finding useful data across multiple biomedical data repositories

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey S. Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Ergin Soysal; Nansu Zong; Hyeoneui Kim

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics - along with Interoperability and Reusability - compose the four FAIR principles to facilitate knowledge discovery in today’s big data-intensive science landscape.


Advances in Social Networks Analysis and Mining | 2012

Security Standards for Electronic Health Records

Öznur Esra Par; Ergin Soysal

The circulation of personal health records in digital media has increased with the intensive use of technology in the health sector, and this circulation brings security and privacy issues. An electronic health record comprises all of a patient's data from birth to death. Since electronic health records include private and unchangeable information, the relevant legislation seeks to prohibit their use or disclosure without permission. Digitization of personal health records also brings security risks, and technical and legal infrastructure is needed to eliminate these risks. Within the scope of this research, national standards (such as HIPAA) and international standards (such as ISO) were studied.


Journal of the American Medical Informatics Association | 2018

DataMed – an open source discovery index for finding biomedical datasets

Xiaoling Chen; Anupama E. Gururaj; Burak Ozyurt; Ruiling Liu; Ergin Soysal; Trevor Cohen; Firat Tiryaki; Yueling Li; Nansu Zong; Min Jiang; Deevakar Rogith; Mandana Salimi; Hyeoneui Kim; Philippe Rocca-Serra; Alejandra Gonzalez-Beltran; Claudiu Farcas; Todd R. Johnson; Ron Margolis; George Alter; Susanna-Assunta Sansone; Ian Fore; Lucila Ohno-Machado; Jeffrey S. Grethe; Hua Xu

Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
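
The P@10 figure reported in the abstract above is a standard retrieval metric. As a minimal sketch, assuming the common definition (the fraction of the top k ranked results that are relevant, averaged over queries), it could be computed as follows. The function names and the toy identifiers are hypothetical, and this is not the evaluation code used for DataMed.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item_id in ranked_ids[:k] if item_id in relevant_ids)
    # Divide by k even when fewer than k results were returned (a common convention).
    return hits / k


def mean_precision_at_k(runs: Sequence[Sequence[str]], judgments: Sequence[Set[str]], k: int = 10) -> float:
    """Average precision_at_k over several queries (one ranking and one relevant set each)."""
    if len(runs) != len(judgments) or not runs:
        raise ValueError("need one non-empty ranking per set of judgments")
    return sum(precision_at_k(r, j, k) for r, j in zip(runs, judgments)) / len(runs)


# Toy example with invented dataset identifiers.
ranking = [f"ds{i}" for i in range(1, 11)]          # ds1 ... ds10
relevant = {"ds1", "ds3", "ds5", "ds11"}            # ds11 was never retrieved
score = precision_at_k(ranking, relevant)
```

In this toy example three of the ten retrieved identifiers are relevant, so the per-query score is three tenths; a benchmark-level number is the mean of such per-query scores.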

Collaboration


Dive into Ergin Soysal's collaborations.

Top Co-Authors

Hua Xu

University of Texas Health Science Center at Houston


Min Jiang

University of Texas Health Science Center at Houston


Hyeoneui Kim

University of California


Ian Fore

National Institutes of Health


Anupama E. Gururaj

University of Texas Health Science Center at Houston