Publication


Featured research published by Tamotsu Noguchi.


Bioinformatics | 2007

POODLE-L

Shuichi Hirose; Kana Shimizu; Satoru Kanai; Yutaka Kuroda; Tamotsu Noguchi

MOTIVATION Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling and transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications. RESULTS We developed Prediction Of Order and Disorder by machine LEarning (POODLE-L; L stands for long), a support vector machine (SVM)-based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acids. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. The performance of POODLE-L for predicting long disordered regions, which exhibited a Matthews correlation coefficient of 0.658, was the highest when compared with eight well-established, publicly available disordered region predictors. AVAILABILITY POODLE-L is freely available at http://mbs.cbrc.jp/poodle/poodle-l.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


Nucleic Acids Research | 2003

PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003

Tamotsu Noguchi; Yutaka Akiyama

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Started at the Real World Computing Partnership (RWCP) in August 1997, it has developed into the present PDB-REPRDB system. In April 2001, the system was moved to the Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) (http://www.cbrc.jp/); it is available at http://www.cbrc.jp/pdbreprdb/. The current database includes 33 368 protein chains from 16 682 PDB entries (1 September, 2002), from which are excluded (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (fewer than 40 residues), or (d) data with non-standard amino acid residues at all residues. The number of PDB entries that include membrane protein structures has increased rapidly, because improved X-ray crystallography, NMR, and electron microscopy techniques have allowed many membrane protein structures to be determined. Since many protein structure studies must address globular and membrane proteins separately, a new elimination factor, which excludes membrane protein chains, is introduced in the PDB-REPRDB system. Moreover, a PDB-REPRDB system for membrane protein chains has been started at the same URL. The current membrane database includes 551 protein chains, including membrane domains in the SCOP database of release 1.59 (15 May, 2002).


Nucleic Acids Research | 2001

PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB)

Tamotsu Noguchi; Hideo Matsuda; Yutaka Akiyama

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). The previous version of PDB-REPRDB provided 48 representative sets, whose similarity criteria were predetermined, on the WWW. The current version is designed so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirements. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. One can obtain a representative list and classification data of protein chains from the system. The current database includes 20 457 protein chains from PDB entries (August 6, 2000). The system for PDB-REPRDB is available at the Parallel Protein Information Analysis system (PAPIA) WWW server (http://www.rwcp.or.jp/papia/).


BMC Bioinformatics | 2007

Predicting mostly disordered proteins by using structure-unknown protein data

Kana Shimizu; Yoichi Muraoka; Shuichi Hirose; Kentaro Tomii; Tamotsu Noguchi

BACKGROUND Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than we do protein structures, and many of the known sequences can be expected to be those of disordered proteins. Thus it would be efficient to use the information of structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer (SGT) and training with a huge amount of structure-unknown sequences as well as structure-known sequences. RESULTS When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs. When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for the proteins with 5%–10% disordered sequences, 1.46% for the proteins with 10%–20% disordered sequences and 16.57% for proteins with 20%–40% disordered sequences. CONCLUSION The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness.
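
The abstracts above report sensitivity, specificity and the Matthews correlation coefficient (MCC). As a reminder of how these are derived from a binary confusion matrix, here is a minimal Python sketch. The function name and the counts are hypothetical and chosen only for illustration; they are not data from any of the listed papers.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Here "positive" means disordered and "negative" means ordered.
    """
    sensitivity = tp / (tp + fn)  # fraction of disordered proteins recovered
    specificity = tn / (tn + fp)  # fraction of ordered proteins recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not taken from the paper).
sens, spec, mcc = confusion_metrics(tp=8, fn=2, tn=45, fp=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Unlike sensitivity or specificity taken alone, the MCC uses all four cells of the confusion matrix, which is one reason it is often preferred for imbalanced sets such as 82 disordered versus 526 ordered proteins.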


BMC Bioinformatics | 2013

MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation

Chun-Hong Fang; Tamotsu Noguchi; Daisuke Tominaga; Hayato Yamana

BACKGROUND Molecular recognition features (MoRFs) are short binding regions located in longer intrinsically disordered protein regions. Although these short regions lack a stable structure in the natural state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction network of a cell, and are associated with many human genetic diseases. Therefore, identification of MoRFs is an important step in understanding functional aspects of these proteins and in finding applications in drug design. RESULTS Here, we propose a novel method for identifying MoRFs, named MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). First, a masking method is used to calculate the average local conservation scores of residues within a masking-window length in the position-specific scoring matrix (PSSM). Then, the scores below the average are filtered out. Finally, a smoothing method is used to incorporate the features of flanking regions for each residue to prepare the feature sets for prediction. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the PSSM of the sequence only. Experimental results show that, compared with other methods tested on the same datasets, our method achieves the best performance: an AUC 0.004–0.079 higher than other methods when tested on TEST419, and 0.045–0.212 higher when tested on TEST2012. In addition, when tested on an independent membrane proteins-related dataset, MFSPSSMpred significantly outperformed the existing predictor MoRFpred. CONCLUSIONS This study suggests that: 1) amino acid composition and physicochemical properties in the flanking regions of MoRFs are very different from those in the general non-MoRF regions; 2) MoRFs contain both highly conserved residues and highly variable residues and, on the whole, are highly locally conserved; and 3) combining contextual information with local conservation information of residues facilitates the prediction of MoRFs.
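
The mask, filter and smooth steps described in this abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: it assumes a single conservation score per residue instead of a full PSSM, and the window sizes, the zeroing of filtered scores and the function names are all hypothetical choices made for illustration.

```python
def window_mean(values, i, half_width):
    """Mean of values in a window centred on i, truncated at the sequence ends."""
    lo = max(0, i - half_width)
    hi = min(len(values), i + half_width + 1)
    return sum(values[lo:hi]) / (hi - lo)


def mask_filter_smooth(scores, mask_half=2, smooth_half=1):
    """Toy three-step feature preparation on per-residue conservation scores."""
    n = len(scores)
    # Mask: average local conservation in a window around each residue.
    local_avg = [window_mean(scores, i, mask_half) for i in range(n)]
    # Filter: keep a score only if it reaches its local average (assumed rule).
    filtered = [s if s >= a else 0.0 for s, a in zip(scores, local_avg)]
    # Smooth: average the filtered scores over the flanking residues.
    return [window_mean(filtered, i, smooth_half) for i in range(n)]


# Hypothetical per-residue conservation scores.
toy = [1.0, 5.0, 1.0, 1.0, 4.0, 4.0, 1.0]
features = mask_filter_smooth(toy)
print(features)
```

The smoothing step is what brings in contextual information: after it, the feature of each residue depends on its neighbours as well as on its own score.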


Proteomics | 2013

ESPRESSO: A system for estimating protein expression and solubility in protein expression systems

Shuichi Hirose; Tamotsu Noguchi

Recombinant protein technology is essential for conducting protein science and using proteins as materials in pharmaceutical or industrial applications. Although obtaining soluble proteins is still a major experimental obstacle, knowledge about protein expression/solubility under standard conditions may increase the efficiency and reduce the cost of proteomics studies.


Proteins | 2006

Systematic comparison of catalytic mechanisms of hydrolysis and transfer reactions classified in the EzCatDB database

Nozomi Nagano; Tamotsu Noguchi; Yutaka Akiyama

Catalytic mechanisms of 270 enzymes from 131 superfamilies, mainly hydrolases and transferases, were analyzed based on their enzyme structures. A method of systematic comparison and classification of the catalytic reactions was developed. Hydrolysis and transfer reactions closely resemble one another, displaying common mechanisms: single displacement and double displacement. These displacement mechanisms might be further subclassified according to the type of catalytic factors and nucleophilic substitution involved. Several types of catalytic factors exist: nucleophile, acid, base, stabilizer, modulator, cofactors. Nucleophilic substitution might be categorized as SN1/SN2 (or dissociative/associative) reactions. The classification indicates that some mechanisms favor particular types of catalytic factors. In hydrolyses of amide bonds and phosphoric ester bonds, mechanisms with single displacement tend to use inorganic cofactors such as zinc and magnesium ions as important catalysts, whereas those with double displacement frequently do not use such cofactors. In contrast, hydrolyses of O‐glycoside bonds rarely use such cofactors, with one exception. The trypsin‐like hydrolytic reaction, which is catalyzed by the classic catalytic triad comprising serine/histidine/aspartate, can be considered as a “super‐reaction” because it is observed in at least three nonhomologous enzymes, whereas most reactions are singlets without any nonhomologous enzymes. By dividing complex reactions into several reactions, correlations between active site structures and catalytic functions can be suggested. This classification method is applicable to other reactions such as elimination and isomerization. Furthermore, it will facilitate annotation of enzyme functions from 3D patterns of enzyme active sites. The classification is available at http://mbs.cbrc.jp/EzCatDB/RLCP/index.html. Proteins 2007. ©2006 Wiley‐Liss, Inc.


Nucleic Acids Research | 2011

SAHG, a comprehensive database of predicted structures of all human proteins

Chie Motono; Junichi Nakata; Ryotaro Koike; Kana Shimizu; Matsuyuki Shirota; Takayuki Amemiya; Kentaro Tomii; Nozomi Nagano; Naofumi Sakaya; Kiyotaka Misoo; Miwa Sato; Akinori Kidera; Hidekazu Hiroaki; Tsuyoshi Shirai; Kengo Kinoshita; Tamotsu Noguchi; Motonori Ota

Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.


Bioinformatics | 2000

Quick selection of representative protein chain sets based on customizable requirements

Tamotsu Noguchi; Kentaro Onizuka; Makoto Ando; Hideo Matsuda; Yutaka Akiyama

MOTIVATION Protein structure classification has been recognized as one of the most important research issues in protein structure analysis. A substantial number of methods for the classification have been proposed, and several databases have been constructed using these methods. Since some proteins with very similar sequences may exhibit structural diversity, we have proposed PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB), whose selection strategy is based not only on sequence similarity but also on structural similarity. Forty-eight representative sets whose similarity criteria were predetermined were made available over the World Wide Web (WWW). However, the sets were insufficient in number to satisfy users researching protein structures by various methods. RESULTS We have improved the system for PDB-REPRDB so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirements. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. This paper describes the method we use to classify chains and select the representatives in the system. We also describe the interface used to set the parameters.
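
To make the idea of selecting representatives under a user-set similarity cut-off concrete, here is a minimal Python sketch of a generic greedy selection. It is an assumption for illustration, not the PDB-REPRDB algorithm: it uses only a toy sequence-identity score (no structural similarity), and the chain identifiers, sequences and function names are hypothetical.

```python
def select_representatives(chains, similarity, cutoff):
    """Greedy selection: keep a chain unless it is too similar to one already kept.

    chains: chain IDs in priority order (preferred candidates first).
    similarity: function (a, b) -> score in [0, 1].
    cutoff: a chain scoring >= cutoff against a kept chain is treated as redundant.
    """
    representatives = []
    for chain in chains:
        if all(similarity(chain, rep) < cutoff for rep in representatives):
            representatives.append(chain)
    return representatives


def identity(a, b):
    """Fraction of identical positions; a toy stand-in for an alignment score."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# Hypothetical chain IDs and sequences.
seqs = {"1abcA": "MKTAYIAK", "2xyzB": "MKTAYIAR", "3pqrC": "GSHMLEDP"}
order = ["1abcA", "2xyzB", "3pqrC"]
reps = select_representatives(
    order, lambda a, b: identity(seqs[a], seqs[b]), cutoff=0.8
)
print(reps)  # ['1abcA', '3pqrC']
```

Raising the cut-off to 0.9 would keep all three chains in this toy set, which shows how a configurable threshold changes the size of the representative set.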


New Biotechnology | 2011

Development and evaluation of data-driven designed tags (DDTs) for controlling protein solubility

Shuichi Hirose; Yoshifumi Kawamura; Masatoshi Mori; Kiyonobu Yokota; Tamotsu Noguchi; Naoki Goshima

Production of proteins is an important issue in protein science and pharmaceutical studies. Numerous protein expression systems using living cells and cell-free methods have been developed to date. In these systems, a promising strategy for improving the success rate of obtaining soluble proteins is the attachment of various tags to target proteins based on empirical rules. This paper presents a method for the production of data-driven designed tags (DDTs) based on highly frequent sequence property patterns in an experimentally assessed protein solubility dataset in a wheat germ cell-free system. We constructed seven proteins combined with 12 kinds of DDTs (six for enhancing solubility and six for insolubility) at the N-terminal region as tags. Then we investigated their behavior using SDS-PAGE. Results show that three and four proteins respectively showed a trend toward solubilization and insolubilization, which indicates the possibility that the theoretically designed sequence can control protein solubility.

Collaboration


Dive into Tamotsu Noguchi's collaborations.

Top Co-Authors

Yutaka Akiyama
Tokyo Institute of Technology

Shuichi Hirose
National Institute of Advanced Industrial Science and Technology

Kana Shimizu
National Institute of Advanced Industrial Science and Technology

Makoto Ando
National Institute of Advanced Industrial Science and Technology

Kiyonobu Yokota
Japan Advanced Institute of Science and Technology

Masakazu Sekijima
Tokyo Institute of Technology