Yuka Tateisi | Researchain

Publication

Featured research published by Yuka Tateisi.

JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications | 2004

Introduction to the bio-entity recognition task at JNLPBA

Jin-Dong Kim; Tomoko Ohta; Yoshimasa Tsuruoka; Yuka Tateisi; Nigel Collier

We describe here the JNLPBA shared task of bio-entity recognition using an extended version of the GENIA version 3 named entity corpus of MEDLINE abstracts. We provide background information on the task and present a general discussion of the approaches taken by participating systems.

Pacific Symposium on Biocomputing | 2000

Event extraction from biomedical papers using a full parser

Akane Yakushiji; Yuka Tateisi; Yusuke Miyao; Jun’ichi Tsujii

We have designed and implemented an information extraction system that uses a full parser, to investigate the plausibility of fully analyzing biomedical text with a general-purpose parser and grammar. We partially solved the problems of full parsing, namely inefficiency, ambiguity, and low coverage, by introducing preprocessors, and we propose modules that handle partial parsing results for further improvement. Our approach makes the system modular, so that the IE system as a whole is easy to tune to specific domains and easy to maintain and improve by incorporating various techniques for disambiguation, speed-up, etc. In a preliminary experiment, of 133 argument structures that should be extracted from 97 sentences, we obtained 23% uniquely and 24% with ambiguity, and 20% were extractable from partial, rather than complete, results of full parsing.

BMC Bioinformatics | 2008

New challenges for text mining: mapping between text and manually curated pathways

Kanae Oda; Jin-Dong Kim; Tomoko Ohta; Daisuke Okanohara; Takuya Matsuzaki; Yuka Tateisi; Jun’ichi Tsujii

Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges in this task: (1) identifying the mapping position of a specific entity or reaction in a given pathway, (2) recognizing the causal relationships among multiple reactions, and (3) formulating and implementing the required inferences based on biological domain knowledge.

Results: To address these challenges, we constructed new resources to link text with a model pathway, namely the GENIA pathway corpus with event annotation and the NF-kB pathway. Through their detailed analysis, we address an untapped resource, ‘bio-inference’, as well as the differences between text and pathway representations. Here, we present precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus.

Conclusions: We believe that creating such rich resources and analyzing them in detail is a significant first step toward accelerating research on the automatic construction of pathways from text.

Empirical Methods in Natural Language Processing | 2006

Automatic Construction of Predicate-argument Structure Patterns for Biomedical Information Extraction

Akane Yakushiji; Yusuke Miyao; Tomoko Ohta; Yuka Tateisi; Jun’ichi Tsujii

This paper presents a method for automatically constructing, from a smaller training corpus, information extraction patterns on predicate-argument structures (PASs) obtained by full parsing. Because PASs represent generalized structures for syntactic variants, patterns on PASs are expected to generalize better than patterns on surface words. In addition, patterns are divided into components to improve recall, and we introduce a Support Vector Machine (SVM) to learn a prediction model from pattern-matching results. We present experimental results and analyze how well protein-protein interactions were extracted from MEDLINE abstracts. The results demonstrate that our method improved accuracy compared with a machine learning approach using surface word/part-of-speech patterns.
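
As a minimal illustration of why patterns over PASs generalize across syntactic variants, the Python sketch below assumes the parser has already mapped both "RAS activates RAF" and "RAF is activated by RAS" to the same structure, activate(ARG1=RAS, ARG2=RAF). The `PAS` and `Pattern` classes, the role labels, the protein names, and the `match` function are hypothetical and do not reproduce the parser's actual output format; the pattern-component splitting and the SVM stage are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PAS:
    """Hypothetical predicate-argument structure: a predicate and its role-labeled arguments."""
    predicate: str                                        # base form, e.g. "activate"
    args: dict[str, str] = field(default_factory=dict)    # role label -> argument head word


@dataclass(frozen=True)
class Pattern:
    """Hypothetical extraction pattern: a predicate plus the roles holding agent and target."""
    predicate: str
    agent_role: str
    target_role: str


def match(pattern: Pattern, pas: PAS, proteins: set[str]) -> tuple[str, str] | None:
    """Return (agent, target) if the PAS fits the pattern and both slots hold protein names."""
    if pas.predicate != pattern.predicate:
        return None
    agent = pas.args.get(pattern.agent_role)
    target = pas.args.get(pattern.target_role)
    if agent in proteins and target in proteins:
        return agent, target
    return None


# Active and passive sentences are assumed to yield the same PAS after parsing,
# so a single pattern covers both surface variants.
active = PAS("activate", {"ARG1": "RAS", "ARG2": "RAF"})    # "RAS activates RAF"
passive = PAS("activate", {"ARG1": "RAS", "ARG2": "RAF"})   # "RAF is activated by RAS"
pattern = Pattern("activate", "ARG1", "ARG2")
proteins = {"RAS", "RAF"}

for pas in (active, passive):
    print(match(pattern, pas, proteins))
```

Both calls return `('RAS', 'RAF')`. A surface-word pattern would need separate active and passive variants to achieve the same coverage.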

Meeting of the Association for Computational Linguistics | 2006

An Intelligent Search Engine and GUI-based Efficient MEDLINE Search Tool Based on Deep Syntactic Parsing

Tomoko Ohta; Yusuke Miyao; Takashi Ninomiya; Yoshimasa Tsuruoka; Akane Yakushiji; Katsuya Masuda; Jumpei Takeuchi; Kazuhiro Yoshida; Tadayoshi Hara; Jin-Dong Kim; Yuka Tateisi; Jun’ichi Tsujii

We present a practical HPSG parser for English, an intelligent search engine to retrieve MEDLINE abstracts that represent biomedical events and an efficient MEDLINE search tool helping users to find information about biomedical entities such as genes, proteins, and the interactions between them.

North American Chapter of the Association for Computational Linguistics | 2006

Subdomain adaptation of a POS tagger with a small corpus

Yuka Tateisi; Yoshimasa Tsuruoka; Jun’ichi Tsujii

For the domain of biomedical research abstracts, two large corpora, namely GENIA (Kim et al. 2003) and Penn BioIE (Kulick et al. 2004), are available. Both mainly cover the human domain, and the performance of systems trained on these corpora when applied to abstracts dealing with other species is unknown. In machine-learning-based systems, re-training the model with additional corpora from the target domain has achieved promising results (e.g., Tsuruoka et al. 2005; Lease et al. 2005). In this paper, we compare two methods for adapting POS taggers trained on the GENIA and Penn BioIE corpora to the Drosophila melanogaster (fruit fly) domain.
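
The re-training idea can be illustrated with a deliberately simple stand-in: a most-frequent-tag (unigram) tagger, not the taggers used in the paper, trained on a source corpus plus a small target-domain corpus whose counts can be up-weighted. The corpus contents, the weight values, the default tag, and the function names are all assumptions for illustration. The toy data use the Drosophila gene name "white", which the toy source corpus shows only as an adjective.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(corpora):
    """Train a most-frequent-tag model.

    corpora: list of (sentences, weight) pairs; each sentence is a list of
    (word, tag) pairs. Each token adds `weight` to its (word, tag) count.
    """
    counts = defaultdict(Counter)
    for sentences, weight in corpora:
        for sentence in sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += weight
    # Keep the highest-weighted tag for each word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Tag words with their most frequent tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Toy data: the source corpus only shows "white" as an adjective,
# while the small fly-domain corpus uses it as a gene name (noun).
source = [[("white", "JJ"), ("blood", "NN"), ("cells", "NNS")],
          [("white", "JJ"), ("matter", "NN")]]
target = [[("the", "DT"), ("white", "NN"), ("gene", "NN")]]

plain = train_unigram_tagger([(source, 1.0), (target, 1.0)])
weighted = train_unigram_tagger([(source, 1.0), (target, 3.0)])
print(plain["white"], weighted["white"])  # JJ NN
```

With equal weights the two source sentences outvote the single fly sentence; up-weighting the target corpus is one simple way to let a small in-domain sample dominate. The abstract does not say whether either of the two compared methods works this way.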

Meeting of the Association for Computational Linguistics | 2003

Encoding Biomedical Resources in TEI: The Case of the GENIA Corpus

Tomaž Erjavec; Jin-Dong Kim; Tomoko Ohta; Yuka Tateisi; Jun’ichi Tsujii

It is well known that standardising the annotation of language resources significantly raises their potential, as it enables re-use and spurs the development of common technologies. Despite the fact that increasingly complex linguistic information is being added to biomedical texts, no standard solutions have so far been proposed for their encoding. This paper describes a standardised XML tagset (DTD) for annotated biomedical corpora and other resources, which is based on the Text Encoding Initiative Guidelines P4, a general and parameterisable standard for encoding language resources. We ground the discussion in the encoding of the GENIA corpus, which currently contains 2,000 abstracts taken from the MEDLINE database, and has almost 100,000 hand-annotated terms marked for semantic class from the accompanying ontology. The paper introduces GENIA and TEI and implements a TEI parametrisation and conversion for the GENIA corpus. A number of aspects of biomedical language are discussed, such as complex tokenisation, prevalence of contractions and complex terms, and the linkage and encoding of ontologies.

Bioinformatics and Biomedicine | 2014

Cost decisions in the development of disease knowledge base: A case study

Takashi Okumura; Hiroaki Tanaka; Mai Omura; Maori Ito; Shin'ichi Nakagawa; Yuka Tateisi

For clinical decision support systems, the disease knowledge base accounts for the majority of the total development cost, because it requires considerable effort by highly paid domain experts. However, existing research on automated acquisition of medical knowledge has focused on accuracy while mostly ignoring cost. This case study investigates the cost breakdown of our simplified disease knowledge base and discusses ways to reduce the development cost of this core component of clinical decision support systems. To achieve broad disease coverage at limited cost, we adopted a hybrid approach, combining a handmade knowledge base for essential diseases with an automatically generated knowledge base for rare diseases. For further cost reduction, the case study suggests minimizing intervention by medical experts by (i) keeping the structure of the knowledge base simple, (ii) establishing a public resource for laboratory examination results, and (iii) lazily evaluating data quality in the utilization phase. Although the resulting knowledge base may not be adequate for a definitive diagnosis, the approach could be suitable for building clinical decision support systems for differential diagnosis and disease search engines.
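
A minimal Python sketch of the hybridization idea, assuming each knowledge base is kept deliberately simple as a mapping from disease to a set of findings: curated entries override generated ones, and candidates are ranked by how many observed findings they share. The function names, the overlap-count scoring, and the placeholder disease and finding names are hypothetical; the abstract does not specify the data structure or ranking used.

```python
def build_hybrid_kb(handmade, generated):
    """Merge two disease -> findings maps, tagging each entry with its source.

    Entries from the handmade (expert-curated) base take precedence over
    automatically generated entries for the same disease.
    """
    kb = {disease: (set(findings), "generated") for disease, findings in generated.items()}
    kb.update({disease: (set(findings), "handmade") for disease, findings in handmade.items()})
    return kb


def rank_diseases(kb, observed):
    """Rank diseases by the number of observed findings they share (ties broken by name)."""
    scores = [(disease, len(findings & observed)) for disease, (findings, _) in kb.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))


# Placeholder data: one curated common disease and two generated rare ones.
handmade = {"disease_A": {"fever", "cough"}}
generated = {"disease_A": {"rash"},               # overridden by the curated entry
             "rare_disease_B": {"rash", "fever"},
             "rare_disease_C": {"tremor"}}

kb = build_hybrid_kb(handmade, generated)
print(rank_diseases(kb, {"fever", "cough"}))  # [('disease_A', 2), ('rare_disease_B', 1)]
```

Returning a ranked candidate list rather than a single answer matches the abstract's framing of such a knowledge base as a tool for differential diagnosis and disease search rather than definitive diagnosis.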

Meeting of the Association for Computational Linguistics | 2004

Finding anchor verbs for biomedical IE using predicate-argument structures

Akane Yakushiji; Yuka Tateisi; Yusuke Miyao; Jun’ichi Tsujii

For biomedical information extraction, most systems use syntactic patterns on verbs (anchor verbs) and their arguments. Anchor verbs can be selected by focusing on their arguments. We propose using predicate-argument structures (PASs), which are output by a full parser, to obtain verbs and their arguments. In this paper, we evaluate the PAS method by comparing it with a method based on part-of-speech (POS) pattern matching. The POS patterns produced larger result sets containing incorrect arguments, which would adversely affect the subsequent phase of selecting appropriate verbs.
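
One way to read "selecting anchor verbs by focusing on their arguments" is to count, over parsed sentences, how often a verb takes protein names in both argument slots. The sketch below assumes toy (verb, ARG1, ARG2) triples already extracted from PAS output and a known protein dictionary; the threshold `min_count` and the function name are hypothetical, and the paper's actual selection criterion may differ.

```python
from collections import Counter


def candidate_anchor_verbs(triples, proteins, min_count=2):
    """Return verbs whose two arguments are both protein names at least `min_count` times.

    triples: iterable of (verb, arg1, arg2), e.g. taken from predicate-argument structures.
    Verbs are returned in descending order of that count.
    """
    counts = Counter(verb for verb, arg1, arg2 in triples
                     if arg1 in proteins and arg2 in proteins)
    return [verb for verb, count in counts.most_common() if count >= min_count]


triples = [("activate", "RAS", "RAF"),
           ("activate", "p53", "MDM2"),
           ("bind", "MDM2", "p53"),
           ("show", "we", "RAS")]  # "show" has a non-protein argument
proteins = {"RAS", "RAF", "p53", "MDM2"}
print(candidate_anchor_verbs(triples, proteins))               # ['activate']
print(candidate_anchor_verbs(triples, proteins, min_count=1))  # ['activate', 'bind']
```

If the triples came from noisier POS-pattern matching, spurious argument pairs would inflate these counts, which is roughly the kind of adverse effect on verb selection the abstract describes.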

Health Information Science | 2012

A lightweight approach for extracting disease-symptom relation with MetaMap toward automated generation of disease knowledge base

Takashi Okumura; Yuka Tateisi

Diagnostic decision support systems require a disease knowledge base, and this component may account for the dominant share of the total development cost of such systems. Accordingly, toward automated generation of disease knowledge bases, we conducted a preliminary study on efficient extraction of symptomatic expressions using MetaMap, a tool that assigns UMLS (Unified Medical Language System) semantic tags to phrases in a given text from the medical literature. We first used several tags in the MetaMap output related to symptoms and findings to extract symptomatic terms. This straightforward approach achieved a recall of 82% and a precision of 64%. We then applied a heuristic that exploits certain tag-sequence patterns that frequently appear in typical symptomatic expressions. This simple heuristic achieved a 7% recall gain without sacrificing precision. Although the extracted information requires manual inspection, the study suggests that this simple approach can extract symptomatic expressions at very low cost. We also performed a failure analysis of the output to guide further performance improvements.
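
The first, tag-based step and its evaluation can be sketched as follows. This assumes MetaMap output has already been parsed into (phrase, semantic types) pairs, and assumes the UMLS semantic-type abbreviations "sosy" (Sign or Symptom) and "fndg" (Finding) as the symptom-related tags; the study's exact tag set is not given in the abstract, the input is toy data, and the tag-sequence heuristic is not shown.

```python
# Assumed tag set: UMLS semantic-type abbreviations for "Sign or Symptom" and
# "Finding". The exact tags used in the study are not listed in the abstract.
SYMPTOM_TYPES = {"sosy", "fndg"}


def extract_symptoms(tagged_phrases):
    """Keep phrases that carry at least one symptom-related semantic type.

    tagged_phrases: iterable of (phrase, set_of_semantic_types), assumed to be
    parsed beforehand from MetaMap output.
    """
    return {phrase for phrase, types in tagged_phrases if types & SYMPTOM_TYPES}


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy input, not data from the study.
phrases = [("fever", {"sosy"}),
           ("abdominal pain", {"sosy"}),
           ("rash", {"fndg"}),
           ("elevated CRP", {"lbtr"}),   # Laboratory or Test Result: missed by the tag filter
           ("patient", {"podg"})]
gold = {"fever", "abdominal pain", "rash", "elevated CRP"}

found = extract_symptoms(phrases)
print(precision_recall(found, gold))  # (1.0, 0.75)
```

Phrases whose symptomatic meaning is spread across a sequence of tags are the kind of miss that a tag-sequence heuristic, like the one described above, would target.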
