Publication


Featured research published by Ekaterina Buyko.


Journal of Biomedical Semantics | 2011

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simone Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga

BACKGROUND Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus was used for the First CALBC Challenge, which asked participants to annotate the corpus with their text processing solutions. RESULTS All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performance of the annotation solutions was lower for entities from the categories CHED and PRGE than for entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. Data sets from participants who did not use the annotated SSC-I data for training were used to generate the harmonised Silver Standard Corpus II (SSC-II). The performance of the participants’ solutions was then measured against the SSC-II as well and again showed better results for DISO and SPE than for CHED and PRGE. CONCLUSIONS The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II yields better performance for the CPs’ annotation solutions than benchmarking against the SSC-I.
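The abstract does not spell out how the annotations were harmonised. As a rough illustration only, the sketch below assumes the simplest possible scheme: exact-match voting, where an annotation enters the silver standard if at least `min_votes` systems propose it. The `Annotation` tuple layout, the function names `harmonise` and `f_measure`, and the threshold are assumptions made for this example, not the CALBC procedure.

```python
from collections import Counter

# Hypothetical annotation layout: (doc_id, start_offset, end_offset, semantic_group).
Annotation = tuple[str, int, int, str]


def harmonise(system_outputs: list[set[Annotation]], min_votes: int = 2) -> set[Annotation]:
    """Keep every annotation proposed by at least `min_votes` systems (exact match)."""
    votes = Counter(ann for output in system_outputs for ann in output)
    return {ann for ann, count in votes.items() if count >= min_votes}


def f_measure(predicted: set[Annotation], reference: set[Annotation]) -> float:
    """Harmonic mean of precision and recall of `predicted` against `reference`."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy outputs from three imaginary annotation systems on one document.
    system_a = {("d1", 0, 7, "CHED"), ("d1", 12, 16, "PRGE")}
    system_b = {("d1", 0, 7, "CHED"), ("d1", 20, 28, "DISO")}
    system_c = {("d1", 0, 7, "CHED"), ("d1", 12, 16, "PRGE"), ("d1", 30, 35, "SPE")}

    silver = harmonise([system_a, system_b, system_c], min_votes=2)
    print(sorted(silver))
    print(f_measure(system_b, silver))
```

Raising `min_votes` trades coverage for agreement: fewer annotations survive, but each is backed by more systems.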


Computational Intelligence | 2011

Syntactic Simplification and Semantic Enrichment: Trimming Dependency Graphs for Event Extraction

Ekaterina Buyko; Erik Faessler; Joachim Wermter; Udo Hahn

In our approach to event extraction, dependency graphs constitute the fundamental data structure for knowledge capture. Two types of trimming operations pave the way to more effective relation extraction. First, we simplify the syntactic representation structures resulting from parsing by pruning informationally irrelevant lexical material from dependency graphs. Second, we enrich informationally relevant lexical material in the simplified dependency graphs with additional semantic metadata at several layers of conceptual granularity. These two aggregation operations on linguistic representation structures are intended to avoid overfitting of the machine learning‐based classifiers that we use for event extraction (alongside manually curated dictionaries). Given this methodological framework, the corresponding JReX system developed by the JulieLab Team from Friedrich‐Schiller‐Universität Jena (Germany) ranked 2nd among 24 competing teams for Task 1 in the “BioNLP’09 Shared Task on Event Extraction,” with 45.8% recall, 47.5% precision and 46.7% F1‐score on all 3,182 events. In more recent experiments, based on slight modifications of JReX and using the same data sets, we were able to achieve 45.9% recall, 57.7% precision, and 51.1% F1‐score.
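The two trimming operations can be pictured with a small sketch. It is not the JReX implementation: the `Node` class, the set of prunable part-of-speech tags, and the term-to-class lookup are all assumptions chosen for illustration. Pruned nodes are dropped and their dependents are re-attached to the nearest surviving ancestor, and the remaining nodes receive semantic labels from the lookup table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    token: str
    pos: str                      # part-of-speech tag
    head: Optional[int]           # index of the governing node, None for the root
    labels: set = field(default_factory=set)  # semantic enrichment


# Hypothetical settings: which tags count as uninformative, and a toy semantic lexicon.
PRUNABLE_POS = {"DT", "MD", "IN"}
SEMANTIC_CLASSES = {"IL-2": "Protein", "expression": "Gene_expression_trigger"}


def trim(nodes: list) -> list:
    """Prune uninformative nodes, re-attach their dependents, add semantic labels."""
    kept = [i for i, node in enumerate(nodes) if node.pos not in PRUNABLE_POS]
    new_index = {old: new for new, old in enumerate(kept)}

    def surviving_head(i: int) -> Optional[int]:
        # Walk up the tree until a kept node (or the root) is reached.
        head = nodes[i].head
        while head is not None and head not in new_index:
            head = nodes[head].head
        return None if head is None else new_index[head]

    trimmed = []
    for i in kept:
        node = nodes[i]
        labels = set(node.labels)
        if node.token in SEMANTIC_CLASSES:
            labels.add(SEMANTIC_CLASSES[node.token])
        trimmed.append(Node(node.token, node.pos, surviving_head(i), labels))
    return trimmed


if __name__ == "__main__":
    # "TNF may induce the expression of IL-2" with a hand-written dependency analysis.
    sentence = [
        Node("TNF", "NN", 2), Node("may", "MD", 2), Node("induce", "VB", None),
        Node("the", "DT", 4), Node("expression", "NN", 2), Node("of", "IN", 4),
        Node("IL-2", "NN", 5),
    ]
    for node in trim(sentence):
        print(node.token, node.head, sorted(node.labels))
```

If the root itself carries a prunable tag, its dependents all become roots in this sketch; a fuller version would need a policy for that case.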


Linguistic Annotation Workshop | 2007

An Annotation Type System for a Data-Driven NLP Pipeline

Udo Hahn; Ekaterina Buyko; Katrin Tomanek; Scott Piao; John McNaught; Yoshimasa Tsuruoka; Sophia Ananiadou

We introduce an annotation type system for a data-driven NLP core system. The specifications cover formal document structure and document meta information, as well as the linguistic levels of morphology, syntax and semantics. The type system is embedded in the framework of the Unstructured Information Management Architecture (UIMA).


Pacific Symposium on Biocomputing | 2011

The extraction of pharmacogenetic and pharmacogenomic relations--a case study using PharmGKB.

Ekaterina Buyko; Elena Beisswanger; Udo Hahn

In this paper, we report on adapting the JREX relation extraction engine, originally developed for the elicitation of protein-protein interaction relations, to the domains of pharmacogenetics and pharmacogenomics. We propose an intrinsic and an extrinsic evaluation scenario, both based on knowledge contained in the PharmGKB knowledge base. Porting JREX yields favorable results in the range of 80% F-score for Gene-Disease, Gene-Drug, and Drug-Disease relations.


BMC Bioinformatics | 2011

U-Compare bio-event meta-service: compatible BioNLP event extraction services

Yoshinobu Kano; Jari Björne; Filip Ginter; Tapio Salakoski; Ekaterina Buyko; Udo Hahn; K. Bretonnel Cohen; Karin Verspoor; Christophe Roeder; Lawrence Hunter; Halil Kilicoglu; Sabine Bergler; Sofie Van Landeghem; Thomas Van Parys; Yves Van de Peer; Makoto Miwa; Sophia Ananiadou; Mariana Neves; Alberto Pascual-Montano; Arzucan Özgür; Dragomir R. Radev; Sebastian Riedel; Rune Sætre; Hong-Woo Chun; Jin-Dong Kim; Sampo Pyysalo; Tomoko Ohta; Jun’ichi Tsujii

BACKGROUND Bio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes. RESULTS We have integrated nine event extraction systems in the U-Compare framework, making them intercompatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. CONCLUSIONS While individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility to the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.


North American Chapter of the Association for Computational Linguistics | 2009

How Feasible and Robust is the Automatic Extraction of Gene Regulation Events? A Cross-Method Evaluation under Lab and Real-Life Conditions

Udo Hahn; Katrin Tomanek; Ekaterina Buyko; Jung-jae Kim; Dietrich Rebholz-Schuhmann

We explore a rule system and a machine learning (ML) approach to automatically harvest information on gene regulation events (GREs) from biological documents in two different evaluation scenarios: one uses self-supplied corpora in a clean lab setting, while the other incorporates a standard reference database of curated GREs from RegulonDB, real-life data generated independently from our work. In the lab condition, we test how feasible the automatic extraction of GREs really is and achieve F-scores of 34% for the rule system and 44% for the ML system, though under different, not directly comparable test conditions. In the RegulonDB condition, we investigate how robust both methodologies are by comparing them with this routinely used database. Here, the best F-scores for the rule and the ML systems amount to 34% and 19%, respectively.


International Conference on Computational Linguistics | 2008

Are Morpho-Syntactic Features More Predictive for the Resolution of Noun Phrase Coordination Ambiguity than Lexico-Semantic Similarity Scores?

Ekaterina Buyko; Udo Hahn

Coordinations in noun phrases often pose the problem that elliptified parts have to be reconstructed for proper semantic interpretation. Unfortunately, the detection of coordinated heads and identification of elliptified elements notoriously lead to ambiguous reconstruction alternatives. While linguistic intuition suggests that semantic criteria might play an important, if not superior, role in disambiguating resolution alternatives, our experiments on the reannotated WSJ part of the Penn Treebank indicate that solely morpho-syntactic criteria are more predictive than solely lexico-semantic ones. We also found that the combination of both criteria does not yield any substantial improvement.


IEEE International Conference on Semantic Computing | 2011

Generating Semantics for the Life Sciences via Text Analytics

Ekaterina Buyko; Udo Hahn

The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.


Intelligent Data Analysis | 2011

Towards automatic pathway generation from biological full-text publications

Ekaterina Buyko; Jörg Linde; Steffen Priebe; Udo Hahn

We introduce an approach to the automatic generation of biological pathway diagrams from scientific literature. It consists of the automatic extraction of single interaction relations, which are typically found in the full text (rather than the abstract) of a scientific publication, and their subsequent integration into a complex pathway diagram. Our focus here is on relation extraction from full-text documents. We compare the performance of automatic full-text extraction procedures with a manually generated gold standard in order to validate the extracted data which serve as input for the pathway integration procedure.


Journal of Bioinformatics and Computational Biology | 2010

CALBC silver standard corpus.

Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Erik M. van Mulligen; Ning Kang; Jan A. Kors; David Milward; Peter Corbett; Ekaterina Buyko; Elena Beisswanger; Udo Hahn

Collaboration


Dive into Ekaterina Buyko's collaborations.

Top Co-Authors

David Milward

St John's Innovation Centre

Ning Kang

Erasmus University Medical Center

Peter Corbett

St John's Innovation Centre
