Publication


Featured research published by Toshihisa Takagi.


Bioinformatics | 2001

Automated extraction of information on protein–protein interactions from the biological literature

Toshihide Ono; Haretsugu Hishigaki; Akira Tanigami; Toshihisa Takagi

MOTIVATION To understand biological processes, we must clarify how proteins interact with each other. However, since information about protein-protein interactions still exists primarily in the scientific literature, it is not accessible in a computer-readable format. Efficient processing of large amounts of interactions therefore needs an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein-protein interactions from the scientific literature. RESULTS We present a method for extracting information on protein-protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The extraction results suggest that our method should be applicable to any species for which a protein name dictionary is constructed. AVAILABILITY The program is available on request from the authors.
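The abstract above names three ingredients: a protein name dictionary, surface word patterns, and recall/precision as the evaluation measures. The sketch below illustrates that general idea only; it is not the authors' program. The dictionary entries, the interaction phrases, and the function names `extract_interactions` and `precision_recall` are all assumptions made for illustration, and the part-of-speech rules mentioned in the abstract are left out.

```python
import re

# Hypothetical mini-dictionary and interaction phrases (not taken from the paper).
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1", "RecA", "LexA"}
INTERACTION_WORDS = ("interacts with", "binds to", "associates with")


def extract_interactions(sentence):
    """Return (protein_a, phrase, protein_b) triples matched by a surface pattern."""
    # Longest names first so that longer dictionary entries win over their prefixes.
    names = "|".join(sorted(map(re.escape, PROTEIN_NAMES), key=len, reverse=True))
    verbs = "|".join(map(re.escape, INTERACTION_WORDS))
    pattern = re.compile(rf"\b({names})\b\s+({verbs})\s+\b({names})\b")
    return [m.groups() for m in pattern.finditer(sentence)]


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, `extract_interactions("Sic1 binds to Cdc28 in G1.")` yields one triple, while a sentence with a single protein name yields none. A real system would need a much larger dictionary and more pattern variants than this sketch covers.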


Nucleic Acids Research | 2006

MetaGene: prokaryotic gene finding from environmental genome shotgun sequences

Hideki Noguchi; Jungho Park; Toshihisa Takagi

Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which utilizes di-codon frequencies estimated by the GC content of a given sequence, along with various other measures. MetaGene can predict a whole range of prokaryotic genes based on the anonymous genomic sequences of a few hundred bases, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.
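The two quantities the abstract builds on, GC content and di-codon frequencies, can be computed directly from a sequence. The sketch below only counts observed values in one reading frame; it does not reproduce MetaGene's estimation of di-codon frequencies from GC content, its scoring, or its domain classification. The function names `gc_content` and `dicodon_frequencies` are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def dicodon_frequencies(seq, frame=0):
    """Relative frequencies of adjacent codon pairs (6-mers stepped by 3) in one frame."""
    seq = seq.upper()
    # Slide a 6-base window in steps of one codon, starting at the chosen frame.
    counts = Counter(seq[i:i + 6] for i in range(frame, len(seq) - 5, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dicodon: n / total for dicodon, n in counts.items()}
```

On a short fragment such as `"ATGAAAATGAAA"`, the frame-0 windows are `ATGAAA`, `AAAATG` and `ATGAAA`, so the first di-codon accounts for two thirds of the observations.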


Yeast | 2001

Assessment of prediction accuracy of protein function from protein–protein interaction data.

Haretsugu Hishigaki; Kenta Nakai; Toshihide Ono; Akira Tanigami; Toshihisa Takagi

Functional prediction of open reading frames coded in the genome is one of the most important tasks in yeast genomics. Among a number of large‐scale experiments for assigning certain functional classes to proteins, experiments determining protein–protein interaction are especially important because interacting proteins usually have the same function. Thus, it seems possible to predict the function of a protein when the function of its interacting partner is known. However, in vitro experiments often suffer from artifacts and a protein can often have multiple binding partners with different functions. We developed an objective prediction method that can systematically include the information of indirect interaction. Our method can predict the subcellular localization, the cellular role and the biochemical function of yeast proteins with accuracies of 72.7%, 63.6% and 52.7%, respectively. The prediction accuracy rises for proteins with more than three binding partners and thus we present the open prediction results for 16 such proteins.
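The idea of inferring a protein's function from its interaction partners, including indirect ones, can be illustrated with a simple neighbourhood vote. The abstract does not describe the scoring the authors used, so the majority vote below is a stand-in chosen for illustration; `predict_function`, its parameters, and the toy data in the usage note are all hypothetical.

```python
from collections import Counter, deque


def predict_function(protein, interactions, known_functions, max_distance=1):
    """Most common known function among proteins within max_distance interaction steps.

    Returns None when no annotated protein is reachable.
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search, counting one vote per annotated protein reached.
    seen = {protein}
    queue = deque([(protein, 0)])
    votes = Counter()
    while queue:
        node, dist = queue.popleft()
        if dist == max_distance:
            continue
        for nxt in sorted(neighbours.get(node, ())):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in known_functions:
                votes[known_functions[nxt]] += 1
            queue.append((nxt, dist + 1))
    return votes.most_common(1)[0][0] if votes else None
```

With `max_distance=1` only direct partners vote; larger values bring in indirect interactions, at the cost of diluting the signal from close partners. A weighted or statistical score would be a natural refinement.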


Nucleic Acids Research | 2002

JSNP: a database of common gene variations in the Japanese population.

Mika Hirakawa; Toshihiro Tanaka; Yoichi Hashimoto; Masako Kuroda; Toshihisa Takagi; Yusuke Nakamura

JSNP is a repository of Japanese Single Nucleotide Polymorphism (SNP) data, begun in 2000 and developed through the Prime Minister's Millennium Project. The aim of this undertaking is to identify and collate up to 150 000 SNPs from the Japanese population, located in genes or in adjacent regions that might influence the coding sequence of the genes. The project has been carried out by a collaboration between the Human Genome Center (HGC) in the Institute of Medical Science (IMS) at the University of Tokyo and the Japan Science and Technology Corporation (JST). JSNP serves as both a storage site for the Japanese SNPs obtained from the ongoing project and as a facility for public dissemination to allow researchers access to high quality SNP data. A primary motivation of the project is the construction of a basic data set to identify relationships between polymorphisms and common diseases or the reaction to drugs. As such, emphasis has been placed on the identification of SNPs that lie in candidate regions which may affect phenotype but which would not necessarily directly cause disease. Unrestricted access to JSNP and any associated files is available at http://snp.ims.u-tokyo.ac.jp/.


Nature Genetics | 1999

A radiation hybrid map of the rat genome containing 5,255 markers

Takeshi Watanabe; Marie Therese Bihoreau; Linda McCarthy; Susanna L. Kiguwa; Haretsugu Hishigaki; Atsushi B. Tsuji; Julie Browne; Yuki Yamasaki; Ayako Mizoguchi-Miyakita; Keiko Oga; Toshihide Ono; Shiro Okuno; Naohide Kanemoto; E. Takahashi; Kazuhiro Tomita; Hiromi Hayashi; Masakazu Adachi; Caleb Webber; Marie Davis; Susanne Kiel; Catherine Knights; Angela L. Smith; Ricky Critcher; Jonathan Miller; Thiru Thangarajah; Philip J R Day; James R. Hudson; Yasuo Irie; Toshihisa Takagi; Yusuke Nakamura

A whole-genome radiation hybrid (RH) panel was used to construct a high-resolution map of the rat genome based on microsatellite and gene markers. These include 3,019 new microsatellite markers described here for the first time and 1,714 microsatellite markers with known genetic locations, allowing comparison and integration of maps from different sources. A robust RH framework map containing 1,030 positions ordered with odds of at least 1,000:1 has been defined as a tool for mapping these markers, and for future RH mapping in the rat. More than 500 genes which have been mapped in mouse and/or human were localized with respect to the rat RH framework, allowing the construction of detailed rat-mouse and rat-human comparative maps and illustrating the power of the RH approach for comparative mapping.


North American Chapter of the Association for Computational Linguistics | 2009

A Markov Logic Approach to Bio-Molecular Event Extraction

Sebastian Riedel; Hong-Woo Chun; Toshihisa Takagi; Jun’ichi Tsujii

In this paper we describe our entry to the BioNLP 2009 Shared Task regarding biomolecular event extraction. Our work can be described by three design decisions: (1) instead of building a pipeline using local classifier technology, we design and learn a joint probabilistic model over events in a sentence; (2) instead of developing specific inference and learning algorithms for our joint model, we apply Markov Logic, a general purpose Statistical Relational Learning language, for this task; (3) we represent events as relational structures over the tokens of a sentence, as opposed to structures that explicitly mention abstract event entities. Our results are competitive: we achieve the 4th best scores for task 1 (in close range to the 3rd place) and the best results for task 2 with a 13 percentage point margin.


Bioinformatics | 2005

Automatic extraction of gene/protein biological functions from biomedical text

Asako Koike; Yoshiki Niwa; Toshihisa Takagi

MOTIVATION With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing. RESULTS We have developed a method for automatically extracting the biological process functions of genes/proteins/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When the gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of the gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for actually described functions in abstracts. When applied to PubMed, it extracted over 190 000 gene-GO relationships and 150 000 family-GO relationships for major eukaryotes.


Bioinformatics | 2000

PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary

Mikio Yoshida; Ken-ichiro Fukuda; Toshihisa Takagi

MOTIVATION Since their initial development, integration and construction of databases for molecular-level data have progressed. Though biological molecules are related to each other and form a complex system, the information is stored in the vast archives of the literature or in diverse databases. There is no unified naming convention for biological objects, and biological terms may be ambiguous or polysemic. This makes the integration and interaction of databases difficult. In order to eliminate these problems, machine-readable natural language resources appear to be quite promising. We have developed a workbench for protein name abbreviation dictionary (PNAD) building. RESULTS We have developed PNAD Construction Support System (PNAD-CSS), which offers various convenient facilities to decrease the construction costs of a protein name abbreviation dictionary whose entries are collected from abstracts in biomedical papers. The system allows the users to concentrate on higher level interpretation by removing some troublesome tasks, e.g. management of abstracts, extracting protein names and their abbreviations, and so on. To extract a pair of protein names and abbreviations, we have developed a hybrid system composed of the PROPER System and the PNAD System. The PNAD System can extract the pairs from parenthetical-paraphrases involved in protein names; the PROPER System identified these pairs with 98.95% precision, 95.56% recall and 97.58% complete precision. AVAILABILITY PROPER System is freely available from http://www.hgc.inc.u-tokyo.ac.jp/service/tooldoc/KeX/intro.html. The other software is also available on request from the authors.


Nucleic Acids Research | 2010

DDBJ launches a new archive database with analytical tools for next-generation sequence data

Eli Kaminuma; Jun Mashima; Yuichi Kodama; Takashi Gojobori; Osamu Ogasawara; Kousaku Okubo; Toshihisa Takagi; Yasukazu Nakamura

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has collected and released 1 701 110 entries/1 116 138 614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. Comprehensive visualization of DDBJ release data was attempted by using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the ‘DDBJ Read Archive’ (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the ‘DDBJ Read Annotation Pipeline’ was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly, and high-level analysis of structural and functional annotations. These new services will aid users’ research and provide easier access to DDBJ databases.


Journal of the American Medical Informatics Association | 2005

ALICE: an algorithm to extract abbreviations from MEDLINE.

Hiroko Ao; Toshihisa Takagi

OBJECTIVE To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly. METHODS ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. This system consists of three phases and potentially identifies 320 valid abbreviation-expansion patterns as combinations of the rules. RESULTS It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database. CONCLUSION ALICE extracted abbreviations and their expansions from the literature efficiently. The subtly compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of an undefined abbreviation in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate. This system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.
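To give a feel for heuristic pattern matching of abbreviations and their expansions, the sketch below implements a single, deliberately simple rule: a parenthesized token is accepted when its letters equal the initials of the words immediately before it. This is one illustrative rule under that assumption, not ALICE's rule set or its three-phase design; the function name `extract_abbreviations` is hypothetical.

```python
import re


def extract_abbreviations(text):
    """Find 'long form (ABBR)' pairs where ABBR spells the initials of the preceding words."""
    pairs = []
    for match in re.finditer(r"\(([A-Za-z]{2,10})\)", text):
        abbr = match.group(1)
        # Words that appear before the opening parenthesis (hyphens split words).
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        if len(candidate) < len(abbr):
            continue
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == abbr.lower():
            pairs.append((abbr, " ".join(candidate)))
    return pairs
```

Applied to "Japanese Single Nucleotide Polymorphism (SNP) data", this rule returns the pair `("SNP", "Single Nucleotide Polymorphism")`. It misses abbreviations that skip function words or take several letters from one word, which is the kind of case a larger rule set has to handle.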

Collaboration


Dive into Toshihisa Takagi's collaborations.

Top Co-Authors

Yasunori Yamamoto

Tokyo Institute of Technology

Eli Kaminuma

National Institute of Genetics

Kousaku Okubo

National Institute of Genetics
