Publication


Featured research published by Chikashi Nobata.


International Conference on Computational Linguistics | 2000

Extracting the names of genes and gene products with a hidden Markov model

Nigel Collier; Chikashi Nobata; Jun’ichi Tsujii

We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked-up by domain experts with term classes such as proteins and DNA. Using cross-validation methods we achieved an F-score of 0.73 and we examine the contribution made by each part of the interpolation model to overcoming data sparseness.


Conference of the European Chapter of the Association for Computational Linguistics | 1999

The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers

Nigel Collier; Hyun S. Park; Norihiro Ogata; Yuka Tateishi; Chikashi Nobata; Tomoko Ohta; Tateshi Sekimizu; Hisao Imai; Katsutoshi Ibushi; Jun’ichi Tsujii

We present an outline of the genome information acquisition (GENIA) project for automatically extracting biochemical information from journal papers and abstracts. GENIA will be available over the Internet and is designed to aid in information extraction, retrieval and visualisation and to help reduce information overload on researchers. The vast repository of papers available online in databases such as MEDLINE is a natural environment in which to develop language engineering methods and tools and is an opportunity to show how language engineering can play a key role on the Internet.


Nucleic Acids Research | 2011

UKPMC: a full text article resource for the life sciences

Johanna McEntyre; Sophia Ananiadou; Stephen Andrews; William J. Black; Richard Boulderstone; Paula Buttery; David Chaplin; Sandeepreddy Chevuru; Norman Cobley; Lee Ann Coleman; Paul Davey; Bharti Gupta; Lesley Haji-Gholam; Craig Hawkins; Alan Horne; Simon J. Hubbard; Jee Hyub Kim; Ian Lewin; Vic Lyte; Ross MacIntyre; Sami Mansoor; Linda Mason; John McNaught; Elizabeth Newbold; Chikashi Nobata; Ernest Ong; Sharmila Pillai; Dietrich Rebholz-Schuhmann; Heather Rosie; Rob Rowbotham

UK PubMed Central (UKPMC) is a full-text article database that extends the functionality of the original PubMed Central (PMC) repository. The UKPMC project was launched as the first ‘mirror’ site to PMC, which in analogy to the International Nucleotide Sequence Database Collaboration, aims to provide international preservation of the open and free-access biomedical literature. UKPMC (http://ukpmc.ac.uk) has undergone considerable development since its inception in 2007 and now includes both a UKPMC and PubMed search, as well as access to other records such as Agricola, Patents and recent biomedical theses. UKPMC also differs from PubMed/PMC in that the full text and abstract information can be searched in an integrated manner from one input box. Furthermore, UKPMC contains ‘Cited By’ information as an alternative way to navigate the literature and has incorporated text-mining approaches to semantically enrich content and integrate it with related database resources. Finally, UKPMC also offers added-value services (UKPMC+) that enable grantees to deposit manuscripts, link papers to grants, publish online portfolios and view citation information on their papers. Here we describe UKPMC and clarify the relationship between PMC and UKPMC, providing historical context and future directions, 10 years on from when PMC was first launched.


Metabolomics | 2011

Mining metabolites: extracting the yeast metabolome from the literature

Chikashi Nobata; Paul D. Dobson; Syed Amir Iqbal; Pedro Mendes; Jun’ichi Tsujii; Douglas B. Kell; Sophia Ananiadou

Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross validation on a manually annotated corpus, our recognition tool generates an f-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Kleio: a knowledge-enriched information retrieval system for biology

Chikashi Nobata; Philip Cotter; Naoaki Okazaki; Brian Rea; Yutaka Sasaki; Yoshimasa Tsuruoka; Jun’ichi Tsujii; Sophia Ananiadou

Kleio is an advanced information retrieval (IR) system developed at the UK National Centre for Text Mining (NaCTeM). The system offers textual and metadata searches across MEDLINE and provides enhanced searching functionality by leveraging terminology management technologies.


North American Chapter of the Association for Computational Linguistics | 2003

A survey for multi-document summarization

Satoshi Sekine; Chikashi Nobata

Automatic multi-document summarization is still hard to realize. Under such circumstances, we believe, it is important to observe how humans are doing the same task, and look around for different strategies. We prepared 100 document sets similar to the ones used in the DUC multi-document summarization task. For each document set, several people prepared the following data and we conducted a survey.
A) Free style summarization
B) Sentence Extraction type summarization
C) Axis (type of main topic)
D) Table style summary
In particular, we will describe the last two in detail, as these could lead to a new direction for multi-summarization research.


International Conference on Computational Linguistics | 2004

Corpus and evaluation measures for multiple document summarization with multiple sources

Tsutomu Hirao; Takahiro Fukusima; Manabu Okumura; Chikashi Nobata; Hidetsugu Nanba

In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.


BMC Bioinformatics | 2011

Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature

Xinglong Wang; Rafal Rak; Angelo Restificar; Chikashi Nobata; Christopher Rupp; Riza Theresa Batista-Navarro; Raheel Nawaz; Sophia Ananiadou

Background: The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles.
Results: We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task’s development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews correlation coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics.
Conclusions: Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.


IEEE Transactions on Speech and Audio Processing | 2004

Morphological analysis of the corpus of spontaneous Japanese

Kiyotaka Uchimoto; Kazuma Takaoka; Chikashi Nobata; Atsushi Yamada; Satoshi Sekine; Hitoshi Isahara

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segments. The second method is used when there are several definitions for word segments and their POS categories, and when one type of word segments includes another type of word segments. In this paper, we show that by using semi-automatic analysis, we achieve a precision of better than 99% for detecting and tagging short-unit words and 97% for long-unit words; the two types of words that comprise the corpus. We also show that better accuracy is achieved by using both methods than by using only the first.


Meeting of the Association for Computational Linguistics | 2003

Morphological Analysis of a Large Spontaneous Speech Corpus in Japanese

Kiyotaka Uchimoto; Chikashi Nobata; Atsushi Yamada; Satoshi Sekine; Hitoshi Isahara

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segments. The second method is used when there are several definitions for word segments and their POS categories, and when one type of word segments includes another type of word segments. In this paper, we show that by using semi-automatic analysis we achieve a precision of better than 99% for detecting and tagging short words and 97% for long words; the two types of words that comprise the corpus. We also show that better accuracy is achieved by using both methods than by using only the first.

Collaboration


Dive into Chikashi Nobata's collaborations.

Top Co-Authors

Kiyotaka Uchimoto

National Institute of Information and Communications Technology

Hitoshi Isahara

National Institute of Information and Communications Technology
