Publication


Featured research published by Yoshiki Niwa.


Bioinformatics | 2005

Automatic extraction of gene/protein biological functions from biomedical text

Asako Koike; Yoshiki Niwa; Toshihisa Takagi

MOTIVATION: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotation of genes is quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing.

RESULTS: We have developed a method for automatically extracting the biological process functions of genes/proteins/families based on Gene Ontology (GO) from text, using a shallow parser and sentence structure analysis techniques. When gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the name dictionaries developed by our group. To achieve wide recognition of gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities, and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for functions actually described in abstracts. When applied to PubMed, it extracted over 190,000 gene-GO relationships and 150,000 family-GO relationships for major eukaryotes.
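
The ACTOR-OBJECT extraction step can be illustrated with a toy sketch. The gene dictionary, GO term list, and regex pattern below are all hypothetical stand-ins for the paper's dictionaries and shallow parsing, shown only to make the pipeline concrete:

```python
import re

# Hypothetical mini-dictionary and GO term list (illustrative only).
genes = {"p53", "BRCA1"}
go_terms = {"apoptosis": "GO:0006915", "DNA repair": "GO:0006281"}

sentences = [
    "p53 induces apoptosis in damaged cells.",
    "BRCA1 participates in DNA repair.",
]

# Crude ACTOR (gene) ... OBJECT (function term) pattern: a regex
# stand-in for the paper's shallow parsing of sentence structure.
relations = []
for s in sentences:
    for gene in genes:
        for term, go_id in go_terms.items():
            if re.search(rf"\b{gene}\b.*\b{term}\b", s):
                relations.append((gene, go_id))

print(relations)
```

A real system would of course require syntactic analysis to confirm that the gene is the doer and the function term the receiver of the action, rather than mere co-occurrence in order.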


International Conference on Computational Linguistics | 1994

Co-occurrence vectors from corpora vs. distance vectors from dictionaries

Yoshiki Niwa; Yoshihiko Nitta

We compared vectors derived using ordinary co-occurrence statistics from large text corpora with vectors derived by measuring inter-word distances in dictionary definitions. The precision of word sense disambiguation using co-occurrence vectors from the 1987 Wall Street Journal (20M total words) was higher than that using distance vectors from the Collins English Dictionary (60K head words + 1.6M definition words). However, other experimental results suggest that distance vectors contain semantic information different from that in co-occurrence vectors.
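
As a concrete sketch of the co-occurrence-vector side: each word is represented by counts of its neighbors within a window, and similarity between vectors is measured with cosine. The toy corpus and window size below are illustrative; the paper used the 1987 Wall Street Journal and more refined statistics.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for a large newspaper corpus.
corpus = [
    "the bank raised interest rates",
    "the river bank was muddy",
    "interest rates affect the bank loans",
    "the muddy river flooded the bank",
]

# Count co-occurrences within a symmetric window of 2 words.
window = 2
vectors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                vectors[w][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# In this toy corpus "bank" shares more context with "river" than
# with "interest", so its vector is closer to the "river" vector.
print(cosine(vectors["bank"], vectors["river"]))
print(cosine(vectors["bank"], vectors["interest"]))
```

For disambiguation, the context vector of an ambiguous occurrence would be compared against such sense-indicative vectors.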


Conference on Current Trends in Theory and Practice of Informatics | 2000

Information Access Based on Associative Calculation

Akihiko Takano; Yoshiki Niwa; Shingo Nishioka; Makoto Iwayama; Toru Hisamitsu; Osamu Imaichi; Hirofumi Sakurai

Statistical measures of similarity have been widely used in textual information retrieval for decades. They form the basis for improving the effectiveness of IR systems, including retrieval, clustering, and summarization. We have developed an information retrieval system, DualNAVI, which provides users with rich interaction in both document space and word space. We show that associative calculation for measuring similarity among documents or words is the computational basis of this effective information access with DualNAVI. New approaches to document clustering (Hierarchical Bayesian Clustering) and to measuring term representativeness (the baseline method) are also discussed. Both have a sound mathematical basis and depend essentially on associative calculation.


International Conference on Computational Linguistics | 2002

A measure of term representativeness based on the number of co-occurring salient words

Toru Hisamitsu; Yoshiki Niwa

We propose a novel measure of the representativeness (i.e., indicativeness or topic specificity) of a term in a given corpus. The measure embodies the idea that the distribution of words co-occurring with a representative term should be biased relative to the word distribution in the whole corpus. The bias of the word distribution among the co-occurring words is defined as the number of distinct words whose occurrences are saliently biased in the co-occurring words. The saliency of a word is defined by a threshold probability that can be determined automatically from the whole corpus. Comparative evaluation showed that the measure is clearly superior to conventional measures in finding topic-specific words in newspaper archives of different sizes.
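
A minimal sketch of the counting idea, under simplifying assumptions: whole documents stand in for co-occurrence windows, and a fixed salience ratio replaces the automatically derived threshold probability.

```python
from collections import Counter

# Toy document collection.
docs = [
    "stock market prices fell sharply",
    "stock prices rose on the market",
    "the weather today is sunny",
    "sunny weather lifted the market mood",
    "prices of stock options fell",
]

corpus_counts = Counter(w for d in docs for w in d.split())
corpus_total = sum(corpus_counts.values())

def representativeness(term, salience_ratio=2.0):
    # Words co-occurring with `term` (same document, the term excluded).
    co = Counter(w for d in docs if term in d.split()
                 for w in d.split() if w != term)
    total = sum(co.values())
    if total == 0:
        return 0
    # Count distinct words whose relative frequency among the
    # co-occurring words is saliently higher than in the whole corpus.
    return sum(1 for w, c in co.items()
               if c / total >= salience_ratio * corpus_counts[w] / corpus_total)

# A topic-specific word attracts many saliently biased co-occurring
# words; a function word does not.
print(representativeness("stock"), representativeness("the"))
```

Even on this toy collection, "stock" scores higher than "the", which is the behavior the measure is designed to capture.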


International Conference on Computational Linguistics | 2000

A method of measuring term representativeness: baseline method using co-occurrence distribution

Toru Hisamitsu; Yoshiki Niwa; Jun’ichi Tsujii

This paper introduces a scheme, which we call the baseline method, for defining measures of term representativeness, along with measures defined using the scheme. The representativeness of a term is measured by a normalized characteristic value defined for the set of all documents that contain the term. Normalization is done by comparing the original characteristic value with the characteristic value defined for a randomly chosen document set of the same size. The latter value is estimated by a baseline function obtained through random sampling and log-linear approximation. We found that the distance between the word distribution in a document set and the word distribution in the whole corpus is an effective characteristic value for the baseline method. Measures defined by the baseline method have several advantages, including that they can compare the representativeness of two terms with very different frequencies and that they have well-defined thresholds for being representative. In addition, the baseline function for a corpus is robust against differences between corpora; that is, it can be used for normalization in a different corpus of a different size or in a different domain.
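
The scheme can be sketched as follows. The toy corpus, the use of KL divergence as the distance, and plain averaging in place of log-linear curve fitting are all simplifying assumptions made for the example.

```python
import math
import random
from collections import Counter

random.seed(42)

# Toy corpus; a real corpus would contain thousands of documents.
docs = [
    "stock market prices fell sharply today",
    "stock prices rose on the market",
    "the weather today is sunny and warm",
    "sunny weather lifted the market mood",
    "prices of stock options fell again",
    "warm weather is expected again today",
]

corpus_counts = Counter(w for d in docs for w in d.split())
corpus_total = sum(corpus_counts.values())

def distance(doc_set):
    """Characteristic value: KL divergence between the word
    distribution of a document set and the whole-corpus distribution."""
    counts = Counter(w for d in doc_set for w in d.split())
    total = sum(counts.values())
    return sum((c / total) * math.log((c / total)
               / (corpus_counts[w] / corpus_total))
               for w, c in counts.items())

def baseline(size, trials=500):
    """Expected characteristic value of a random document set of the
    same size, estimated by plain averaging over random samples."""
    return sum(distance(random.sample(docs, size))
               for _ in range(trials)) / trials

def representativeness(term):
    term_docs = [d for d in docs if term in d.split()]
    return distance(term_docs) / baseline(len(term_docs))

# Normalization lets terms of different frequencies be compared; a
# topic word should come out above a function word.
print(representativeness("stock"), representativeness("the"))
```

The key design point survives the simplifications: dividing by the baseline removes the dependence of the raw distance on document-set size, so terms of very different frequencies become comparable.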


North American Chapter of the Association for Computational Linguistics | 2015

Learning Sentence Ordering for Opinion Generation of Debate

Toshihiko Yanase; Toshinori Miyoshi; Kohsuke Yanai; Misa Sato; Makoto Iwayama; Yoshiki Niwa; Paul Reisert; Kentaro Inui

We propose a sentence ordering method to help compose persuasive opinions for debating. In debate texts, support for an opinion, such as evidence and reasons, typically follows the main claim. We focused on this claim-support structure to order sentences and developed a two-step method. First, we select from among the candidate sentences a first sentence that is likely to be a claim. Second, we order the remaining sentences using a ranking-based method. We tested the effectiveness of the proposed method by comparing it with a general-purpose sentence ordering method and found through experiments that it improves the accuracy of first-sentence selection by about 19 percentage points and performs better on all metrics. We also applied the proposed method to a constructive speech generation task.
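
A toy sketch of the two-step idea, with hand-picked cue-word lists standing in for the trained claim classifier and the ranking model:

```python
# Step 1: score sentences for "claim-ness"; step 2: order the rest.
# The cue lists below are invented for illustration only.
claim_cues = {"should", "must", "believe"}
support_cues = {"because", "evidence", "example", "research"}

sentences = [
    "For example, casinos increase crime rates in nearby areas.",
    "Gambling should be banned.",
    "Research shows that gambling addiction destroys families.",
]

def cue_score(sentence, cues):
    words = sentence.lower().replace(",", "").replace(".", "").split()
    return sum(w in cues for w in words)

# Step 1: the most claim-like sentence goes first.
first = max(sentences, key=lambda s: cue_score(s, claim_cues))
# Step 2: order the remaining (support) sentences by a simple score.
rest = sorted((s for s in sentences if s != first),
              key=lambda s: cue_score(s, support_cues), reverse=True)
ordered = [first] + rest
```

The real system replaces both heuristics with learned models, but the claim-then-support output structure is the same.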


Meeting of the Association for Computational Linguistics | 2015

End-to-end Argument Generation System in Debating

Misa Sato; Kohsuke Yanai; Toshinori Miyoshi; Toshihiko Yanase; Makoto Iwayama; Qinghua Sun; Yoshiki Niwa

We introduce an argument generation system for debating, one that is based on sentence retrieval. Users specify a motion, such as "This house should ban gambling," and a stance on whether the system agrees or disagrees with the motion. The system then outputs three argument paragraphs based on "values" decided automatically by the system. A "value" is a topic considered positive or negative for people or communities, such as health or education. Each paragraph is related to one value and is composed of about seven sentences. An evaluation over 50 motions from a popular debate website showed that 64 of the 150 generated paragraphs were understandable.


Cross Language Evaluation Forum | 2015

Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling

Veera Raghavendra Chikka; Nestor Mariyasagayam; Yoshiki Niwa; Kamalakar Karlapalem

In recent years there has been an increase in the generation of electronic health records (EHRs), which has widened the scope for research on biomedical literature. Much research has applied NLP, information retrieval, and machine learning techniques to extract information from these records. In this paper, we provide a methodology for extracting information to understand the status of a disease/disorder. The status of a disease/disorder is based on different attributes, such as temporal information, severity, and progression of the disease. Here, we consider ten attributes that capture the majority of details regarding the status of the disease/disorder: Negation Indicator, Subject Class, Uncertainty Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body Location, DocTime Class, and Temporal Expression. We present rule-based and machine learning approaches to identify each of these attributes and evaluate our system on attribute-level and system-level accuracies. This project was done as part of the ShARe/CLEF eHealth Evaluation Lab 2014. We achieved a state-of-the-art accuracy of 0.868 in identifying normalized values of the attributes.
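
As an illustration of the rule-based side, here is a minimal check for one attribute, the Negation Indicator. The cue list and window size are invented for the example and are far simpler than the system's actual rules.

```python
# Flag a disorder mention as negated when a negation cue appears
# within a few tokens before it (cue list and window are illustrative).
negation_cues = {"no", "denies", "without", "negative"}

def is_negated(tokens, disorder_index, window=3):
    start = max(0, disorder_index - window)
    return any(t.lower() in negation_cues
               for t in tokens[start:disorder_index])

tokens = "Patient denies chest pain .".split()
print(is_negated(tokens, 2))  # "chest" at index 2, preceded by "denies"
```

Each of the other nine attributes would get its own rules or a classifier in the same spirit, with normalized output values.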


North American Chapter of the Association for Computational Linguistics | 2016

Hitachi at SemEval-2016 Task 12: A Hybrid Approach for Temporal Information Extraction from Clinical Notes.

P R Sarath; Manikandan R; Yoshiki Niwa

This paper describes the system developed for the task of temporal information extraction from clinical narratives in the context of the 2016 Clinical TempEval challenge. Clinical TempEval 2016 addressed the problem of temporal reasoning in the clinical domain by providing annotated clinical notes and pathology reports, similar to the 2015 Clinical TempEval challenge. The challenge consisted of six subtasks. The Hitachi team participated in the two time-expression-based subtasks, time expression span detection (TS) and time expression attribute identification (TA), for which we developed a hybrid of rule-based and machine-learning-based methods using the Stanford TokensRegex framework and the Stanford Named Entity Recognizer, and evaluated it on the THYME corpus. Our hybrid system achieved a maximum F-score of 0.73 for identification of time spans (TS) and 0.71 for identification of time attributes (TA).


North American Chapter of the Association for Computational Linguistics | 2016

bunji at SemEval-2016 Task 5: Neural and Syntactic Models of Entity-Attribute Relationship for Aspect-based Sentiment Analysis.

Toshihiko Yanase; Kohsuke Yanai; Misa Sato; Toshinori Miyoshi; Yoshiki Niwa

This paper describes a sentiment analysis system developed by the bunji team for SemEval-2016 Task 5. In this task, we estimate the sentiment polarity of a given entity-attribute (E#A) pair in a sentence. Our approach is to estimate the relationship between target entities and sentiment expressions. We use two different methods to estimate this relationship. The first is based on a neural attention model that learns relations between tokens and E#A pairs through backpropagation. The second is a rule-based system that examines several verb-centric relations related to E#A pairs. We confirmed the effectiveness of the proposed methods on a target estimation task and a polarity estimation task in the restaurant domain, although our overall ranks were modest.
