Publication


Featured research published by Ai Kawazoe.


Bioinformatics | 2008

BioCaster: detecting public health rumors with a Web-based text mining system

Nigel Collier; Son Doan; Ai Kawazoe; Reiko Matsuda Goodwin; Mike Conway; Yoshio Tateno; Quoc Hung Ngo; Dinh Dien; Asanee Kawtrakul; Koichi Takeuchi; Mika Shigematsu; Kiyosu Taniguchi

Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between laymen's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher order event analysis is used to detect more precisely specified warning signals, which can then be sent to registered users via email alerts. Evaluation of the system for topic recognition and entity identification is conducted on a gold standard corpus of annotated news articles. Availability: The BioCaster map and ontology are freely available via a web portal at http://www.biocaster.org.


BMC Medical Informatics and Decision Making | 2010

A framework for enhancing spatial and temporal granularity in report-based health surveillance systems

Hutchatai Chanlekha; Ai Kawazoe; Nigel Collier

Background: Current public concern over the spread of infectious diseases has underscored the importance of health surveillance systems for the speedy detection of disease outbreaks. Several international report-based monitoring systems have been developed, including GPHIN, Argus, HealthMap, and BioCaster. A vital feature of these report-based systems is the geo-temporal encoding of outbreak-related textual data. Until now, automated systems have tended to use an ad-hoc strategy for processing geo-temporal information, normally involving the detection of locations that match pre-determined criteria, and the use of document publication dates as a proxy for disease event dates. Although these strategies appear to be effective enough for reporting events at the country and province levels, they may be less effective at discovering geo-temporal information at more detailed levels of granularity. In order to improve the capabilities of current Web-based health surveillance systems, we introduce the design for a novel scheme called spatiotemporal zoning. Method: The proposed scheme classifies news articles into zones according to the spatiotemporal characteristics of their content. In order to study the reliability of the annotation scheme, we analyzed the inter-annotator agreement among a group of human annotators for over 1000 reported events. Qualitative and quantitative evaluation is made on the results, including the kappa and percentage agreement. Results: The reliability evaluation of our scheme yielded very promising inter-annotator agreement, more than a 0.9 kappa and a 0.9 percentage agreement for event type annotation and temporal attributes annotation, respectively, with a slight degradation for the spatial attribute. However, for events indicating an outbreak situation, the annotators usually had inter-annotator agreements with the lowest granularity location. Conclusions: We developed and evaluated a novel spatiotemporal zoning annotation scheme. The results of the scheme evaluation indicate that our annotated corpus and the proposed annotation scheme are reliable and could be effectively used for developing an automatic system. Given the current advances in natural language processing techniques, including the availability of language resources and tools, we believe that a reliable automatic spatiotemporal zoning system can be achieved. In the next stage of this work, we plan to develop an automatic zoning system and evaluate its usability within an operational health surveillance system.
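
The kappa and percentage agreement figures mentioned in this abstract can be illustrated with a small sketch. It assumes two annotators and Cohen's kappa; the abstract does not state which kappa variant was used, and the function names and labels below are hypothetical.

```python
from collections import Counter


def percentage_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Assumption: exactly two annotators; the paper may have used another variant.
    """
    n = len(labels_a)
    observed = percentage_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labelled independently with their own label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical event-type labels from two annotators.
annotator_1 = ["outbreak", "outbreak", "other", "other"]
annotator_2 = ["outbreak", "other", "other", "other"]
agreement = percentage_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
```

Percentage agreement alone can look high when one label dominates, which is why a chance-corrected measure such as kappa is usually reported alongside it.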


International Journal of Medical Informatics | 2009

Classifying disease outbreak reports using n-grams and semantic features.

Mike Conway; Son Doan; Ai Kawazoe; Nigel Collier

INTRODUCTION: This paper explores the benefits of using n-grams and semantic features for the classification of disease outbreak reports, in the context of the BioCaster disease outbreak report text mining system. A novel feature of this work is the use of a general-purpose semantic tagger - the USAS tagger - to generate features. BACKGROUND: We outline the application context for this work (the BioCaster epidemiological text mining system), before going on to describe the experimental data used in our classification experiments (the 1000-document BioCaster corpus). FEATURE SETS: Three broad groups of features are used in this work: Named Entity based features, n-gram features, and features derived from the USAS semantic tagger. METHODOLOGY: Three standard machine learning algorithms - Naïve Bayes, the Support Vector Machine algorithm, and the C4.5 decision tree algorithm - were used for classifying experimental data (that is, the BioCaster corpus). Feature selection was performed using the chi-squared (χ²) feature selection algorithm. Standard text classification performance metrics - Accuracy, Precision, Recall, Specificity and F-score - are reported. RESULTS: A feature representation composed of unigrams, bigrams, trigrams and features derived from a semantic tagger, in conjunction with the Naïve Bayes algorithm and feature selection, yielded the highest classification accuracy (and F-score). This result was statistically significant compared to a baseline unigram representation and to previous work on the same task. However, it was feature selection rather than semantic tagging that contributed most to the improved performance. CONCLUSION: This study has shown that for the classification of disease outbreak reports, a combination of bag-of-words, n-grams and semantic features, in conjunction with feature selection, increases classification accuracy at a statistically significant level compared to previous work in this domain.
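
As a rough illustration of two ingredients named in this abstract, the sketch below extracts unigram, bigram and trigram features and scores a feature with the chi-squared statistic on a 2x2 feature/class table. It is a minimal sketch, not the authors' implementation: the function names, the toy documents and the binary presence/absence formulation are assumptions.

```python
def ngrams(tokens, n_max=3):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    features = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def chi_squared(doc_features, labels, feature, positive_label):
    """Chi-squared score of one binary feature against a binary class.

    doc_features: list of sets, one set of n-gram features per document.
    Assumption: features are treated as present/absent, not counted.
    """
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for features, label in zip(doc_features, labels):
        present = feature in features
        positive = label == positive_label
        if present and positive:
            a += 1
        elif present:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # feature or class is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical toy corpus: two relevant and two irrelevant reports.
texts = [
    "bird flu outbreak reported",
    "cholera outbreak confirmed",
    "football match reported",
    "election results confirmed",
]
labels = ["relevant", "relevant", "other", "other"]
docs = [set(ngrams(text.split())) for text in texts]
score_outbreak = chi_squared(docs, labels, "outbreak", "relevant")
score_reported = chi_squared(docs, labels, "reported", "relevant")
```

Ranking features by such a score and keeping only the highest-scoring ones is one common way to perform the feature selection step before training a classifier such as Naïve Bayes.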


Language Resources and Evaluation | 2007

A multilingual ontology for infectious disease surveillance: rationale, design and challenges

Nigel Collier; Ai Kawazoe; Lihua Jin; Mika Shigematsu; Dinh Dien; Roberto A. Barrero; Koichi Takeuchi; Asanee Kawtrakul

A lack of surveillance system infrastructure in the Asia-Pacific region is seen as hindering the global control of rapidly spreading infectious diseases such as the recent avian H5N1 epidemic. As part of improving surveillance in the region, the BioCaster project aims to develop a system based on text mining for automatically monitoring Internet news and other online sources in several regional languages. At the heart of the system is an application ontology which serves the dual purpose of enabling advanced searches on the mined facts and of allowing the system to make intelligent inferences for assessing the priority of events. However, it became clear early on in the project that existing classification schemes did not have the necessary language coverage or semantic specificity for our needs. In this article we present an overview of our needs and explore in detail the rationale and methods for developing a new conceptual structure and multilingual terminological resource that focusses on priority pathogens and the diseases they cause. The ontology is made freely available as an online database and downloadable OWL file.


Meeting of the Association for Computational Linguistics | 2007

The Role of Roles in Classifying Annotated Biomedical Text

Son Doan; Ai Kawazoe; Nigel Collier

This paper investigates the roles of named entities (NEs) in annotated biomedical text classification. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology. Some concepts were classified as Types, while others were identified as Roles. Types are specified as NE classes and Roles are integrated into NEs as attributes. We focus on the Roles of NEs by extracting and using them in different ways as features in the classifier. Experimental results show that: 1) Roles for each NE greatly helped improve the performance of the system, and 2) combining information about NE classes with their Roles contributes significantly to the improvement of performance. We discuss in detail the effect of each Role on the accuracy of text classification.


International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2003

A framework for integrating deep and shallow semantic structures in text mining

Nigel Collier; Koichi Takeuchi; Ai Kawazoe; Tony Mullen; Tuangthong Wattarujeekrit

Recent work in knowledge representation undertaken as part of the Semantic Web initiative has enabled a common infrastructure (Resource Description Framework (RDF) and RDF Schema) for sharing knowledge of ontologies and instances. In this paper we present a framework for combining the shallow levels of semantic description commonly used in MUC-style information extraction with the deeper semantic structures available in such ontologies. The framework is implemented within the PIA project software called Ontology Forge. Ontology Forge offers a server-based hosting environment for ontologies, a server-side information extraction system for reducing the effort of writing annotations and a many-featured ontology/annotation editor. We discuss the knowledge framework, some features of the system and summarize results from extended named entity experiments designed to capture instances in texts using support vector machine software.


Journal of Medical Internet Research | 2010

Developing a disease outbreak event corpus.

Mike Conway; Ai Kawazoe; Hutchatai Chanlekha; Nigel Collier

Background: In recent years, there has been a growth in work on the use of information extraction technologies for tracking disease outbreaks from online news texts, yet publicly available evaluation standards (and associated resources) for this new area of research have been noticeably lacking. Objective: This study seeks to create a “gold standard” data set against which to test how accurately disease outbreak information extraction systems can identify the semantics of disease outbreak events. Additionally, we hope that the provision of an annotation scheme (and associated corpus) to the community will encourage open evaluation in this new and growing application area. Methods: We developed an annotation scheme for identifying infectious disease outbreak events in news texts. An event, in the context of our annotation scheme, consists minimally of geographical (eg, country and province) and disease name information. However, the scheme also allows for the rich encoding of other domain salient concepts (eg, international travel, species, and food contamination). Results: The work resulted in a 200-document corpus of event-annotated disease outbreak reports that can be used to evaluate the accuracy of event detection algorithms (in this case, for the BioCaster biosurveillance online news information extraction system). In the 200 documents, 394 distinct events were identified (mean 1.97 events per document, range 0-25 events per document). We also provide a download script and graphical user interface (GUI)-based event browsing software to facilitate corpus exploration. Conclusion: In summary, we present an annotation scheme and corpus that can be used in the evaluation of disease outbreak event extraction algorithms. The annotation scheme and corpus were designed both with the particular evaluation requirements of the BioCaster system in mind as well as the wider need for further evaluation resources in this growing research area.
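
The minimal event described in this abstract (a location plus a disease name, with optional richer attributes) could be represented along the following lines. This is a hedged sketch only: the class name, field names and the choice to make the province optional are assumptions, not the published annotation scheme. The last lines reproduce the reported mean of 1.97 events per document from the stated totals.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OutbreakEvent:
    """Hypothetical record for one annotated disease outbreak event."""

    disease: str                      # required: disease name
    country: str                      # required: geographical information
    province: Optional[str] = None    # assumption: finer location is optional
    # Optional domain concepts, e.g. international travel or species.
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the minimal content of an event: a disease and a location.
        if not self.disease or not self.country:
            raise ValueError("an event needs at least a disease and a country")


def mean_events_per_document(total_events, total_documents):
    """Average number of annotated events per document in a corpus."""
    if total_documents <= 0:
        raise ValueError("corpus must contain at least one document")
    return total_events / total_documents


# Hypothetical example event with one optional attribute.
event = OutbreakEvent(
    disease="avian influenza",
    country="Vietnam",
    attributes={"species": "poultry"},
)

# Figures stated in the abstract: 394 events in 200 documents.
corpus_mean = mean_events_per_document(394, 200)
```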


Applied Ontology | 2009

The development of a schema for semantic annotation: Gain brought by a formal ontological method

Ai Kawazoe; Lihua Jin; Mika Shigematsu; Daisuke Bekki; Roberto A. Barrero; Kiyosu Taniguchi; Nigel Collier

In this paper, we report annotation experiments which show the advantage of applying a formal ontological methodology for constructing a schema for semantic annotation to mark up terms in the public health domain. We demonstrate that (1) a traditional task-oriented approach with a simple schema can cause several critical problems, and (2) the performance of annotators and the quality of the annotated corpus are improved by applying formal ontological methodology in analyzing ‘markable’ categories of concepts and restructuring the schema. These results show that disciplined methods are useful for controlling the development of even quite modest semantic structures such as annotation schemas for entity recognition. We also report the philosophical/logical considerations and decisions we made when we adopted the formal approach.


International Symposium on Artificial Intelligence | 2015

An Inference Problem Set for Evaluating Semantic Theories and Semantic Processing Systems for Japanese

Ai Kawazoe; Ribeka Tanaka; Koji Mineshima; Daisuke Bekki

This paper introduces a collection of inference problems intended for use in the evaluation of semantic theories and semantic processing systems for Japanese. The problem set categorizes inference problems according to the semantic phenomena that they involve, following the general policy of the FraCaS test suite. It consists of multilingual and Japanese subsets, which together cover both universal semantic phenomena and Japanese-specific ones. This paper outlines the design policy used in constructing the problem set and the contents of a beta version, currently available online.


Logical Aspects of Computational Linguistics | 2016

Implementing Variable Vectors in a CCG Parser

Daisuke Bekki; Ai Kawazoe

This article addresses problems that arise from the use of category variables

Collaboration


Dive into Ai Kawazoe's collaborations.

Top Co-Authors

Son Doan

University of California

Mika Shigematsu

National Institutes of Health

Kiyosu Taniguchi

National Institutes of Health

Hutchatai Chanlekha

National Institute of Informatics
