Publication


Featured research published by Kana Shimizu.


Bioinformatics | 2007

POODLE-L

Shuichi Hirose; Kana Shimizu; Satoru Kanai; Yutaka Kuroda; Tamotsu Noguchi

MOTIVATION Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling and transcriptional and translational regulation. Additionally, disordered regions can be a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. Accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications. RESULTS We developed Prediction Of Order and Disorder by machine LEarning (POODLE-L; L stands for long), a Support Vector Machine (SVM)-based method for predicting long disordered regions using 10 kinds of simple physicochemical properties of amino acids. POODLE-L assembles the outputs of 10 two-level SVM predictors into a final prediction of disordered regions. For predicting long disordered regions, POODLE-L achieved a Matthews correlation coefficient of 0.658, higher than that of any of eight well-established, publicly available disordered region predictors. AVAILABILITY POODLE-L is freely available at http://mbs.cbrc.jp/poodle/poodle-l.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
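The headline result above is a Matthews correlation coefficient (MCC). As a reminder of what that number measures, the sketch below computes MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) from per-residue labels (1 = disordered, 0 = ordered). It only illustrates the metric; it is not POODLE-L's evaluation code, and the convention of returning 0 when a marginal is empty is an assumption.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary per-residue labels (1 = disordered, 0 = ordered)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when any marginal is empty; report 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For truth `[1, 1, 1, 0, 0, 0, 0, 0]` and prediction `[1, 1, 0, 0, 0, 0, 0, 1]` (TP = 2, TN = 4, FP = 1, FN = 1) the function returns 7/15, about 0.467. MCC is used for disorder prediction because disordered residues are usually a minority class, and MCC is not inflated by simply predicting "ordered" everywhere.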


Journal of Molecular Biology | 2009

Interaction between intrinsically disordered proteins frequently occurs in a human protein-protein interaction network.

Kana Shimizu; Hiroyuki Toh

Intrinsic protein disorder is a widespread phenomenon characterised by a lack of stable three-dimensional structures and is considered to play an important role in protein-protein interactions (PPIs). This study examined the genome-wide preference for disorder in PPIs by using exhaustive disorder prediction in human PPIs. We categorised the PPIs into three types (interaction between disordered proteins, interaction between structured proteins, and interaction between a disordered protein and a structured protein) with regard to the flexibility of molecular recognition and compared these three interaction types in an existing human PPI network with those in a randomised network. Although structured regions were expected to serve as the determinants of binding recognition, this comparative analysis revealed unexpected results. Interactions between disordered proteins occurred significantly more often than in the randomised network, whereas interactions between a disordered protein and a structured protein occurred significantly less often. We found that this propensity was much stronger in interactions between nonhub proteins. We also analysed the interaction types from a functional standpoint by using Gene Ontology (GO), which revealed that interactions between disordered proteins frequently occurred in cellular processes, regulation, and metabolic processes. The number of interactions between disordered proteins, especially in metabolic processes, was 1.8 times as large as that in the randomised network. An analysis based on KEGG pathways showed that several signaling pathways and disease-related pathways included many interactions between disordered proteins. All of these analyses suggest that human PPIs preferentially occur between disordered proteins and that the flexibility of the interacting protein pairs may play an important role in human PPI networks.
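The comparison above (observed counts of each interaction type versus a randomised network) can be sketched as follows. The abstract does not say how the randomised network was built; this sketch assumes a degree-preserving double-edge-swap randomisation, a common choice, and takes the set of disordered proteins as a given input. The `enrichment` ratios play the role of figures such as "1.8 times as large", but the hub/nonhub split and GO/KEGG stratification are not shown.

```python
import random
from collections import Counter

def interaction_type(a, b, disordered):
    """'DD', 'DS' or 'SS' depending on how many endpoints are disordered."""
    n = (a in disordered) + (b in disordered)
    return {2: "DD", 1: "DS", 0: "SS"}[n]

def count_types(edges, disordered):
    return Counter(interaction_type(a, b, disordered) for a, b in edges)

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomise by double-edge swaps (a-b, c-d -> a-d, c-b); degrees are kept."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop or reuse a node
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def enrichment(edges, disordered, n_random=100, seed=0):
    """Observed count divided by mean count over randomised networks, per type."""
    rng = random.Random(seed)
    observed = count_types(edges, disordered)
    totals = Counter()
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        totals.update(count_types(shuffled, disordered))
    ratios = {}
    for t in ("DD", "DS", "SS"):
        expected = totals[t] / n_random
        ratios[t] = observed[t] / expected if expected else float("nan")
    return ratios
```

Keeping every protein's degree fixed matters here: otherwise a few highly connected (hub) proteins could by themselves create an apparent excess of one interaction type.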


Nucleic Acids Research | 2012

PoSSuM: a database of similar protein–ligand binding and putative pockets

Jun Ito; Yasuo Tabei; Kana Shimizu; Koji Tsuda; Kentaro Tomii

Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search over such vast numbers of binding site pairs is useful for predicting protein functions and for rapidly screening target proteins in drug design. Existing ligand-binding site databases, however, are limited in scale; for example, SitesBase covers only ∼33 000 known binding sites. Inferring protein function and supporting drug discovery demand a much more comprehensive database that includes both known and putative binding sites. Using a novel algorithm, we conducted a large-scale all-pairs similarity search for 1.8 million known and potential binding sites in the PDB and discovered over 14 million similar pairs of binding sites. Here, we present the results as a relational database, Pocket Similarity Search using Multiple-sketches (PoSSuM), which includes all the discovered pairs with annotations of various types. PoSSuM enables rapid exploration of similar binding sites among structures with different global folds as well as among structures with similar folds. Moreover, PoSSuM is useful for predicting the binding ligand for unbound structures, which provides important clues for characterizing protein structures with unclear functions. The PoSSuM database is freely available at http://possum.cbrc.jp/PoSSuM/.
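The abstract mentions a novel all-pairs algorithm over "multiple sketches" without giving details. One standard exact way to find all pairs of bit strings within Hamming distance d is block-based pigeonhole filtering: split the bits into d + 1 blocks, bucket strings that agree exactly on one block, then verify candidates. The sketch below shows that technique under the assumption that each binding site is already encoded as an `n_bits`-bit integer; it is an illustration, not PoSSuM's implementation.

```python
from collections import defaultdict
from itertools import combinations

def block_bounds(n_bits, n_blocks):
    """Split bit positions 0..n_bits-1 into n_blocks contiguous, near-equal blocks."""
    base, extra = divmod(n_bits, n_blocks)
    bounds, start = [], 0
    for b in range(n_blocks):
        width = base + (1 if b < extra else 0)
        bounds.append((start, width))
        start += width
    return bounds

def all_pairs_within(codes, n_bits, max_dist):
    """Exact search for all (i, j), i < j, with Hamming(codes[i], codes[j]) <= max_dist."""
    if not 0 <= max_dist < n_bits:
        raise ValueError("need 0 <= max_dist < n_bits")
    candidates = set()
    # Pigeonhole: max_dist differing bits spread over max_dist + 1 blocks leave
    # at least one block identical, so bucketing on each block in turn cannot
    # miss a true pair.
    for start, width in block_bounds(n_bits, max_dist + 1):
        mask = (1 << width) - 1
        buckets = defaultdict(list)
        for idx, code in enumerate(codes):
            buckets[(code >> start) & mask].append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    # Verify every candidate with the exact Hamming distance.
    return sorted((i, j) for i, j in candidates
                  if bin(codes[i] ^ codes[j]).count("1") <= max_dist)
```

For example, `all_pairs_within([0b0000, 0b0001, 0b0011, 0b1111], 4, 1)` returns `[(0, 1), (1, 2)]`. The filter discards most pairs without computing a distance, which is what makes searches over millions of sites feasible.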


Proteins | 2012

PDB-scale analysis of known and putative ligand-binding sites with structural sketches

Jun Ito; Yasuo Tabei; Kana Shimizu; Kentaro Tomii; Koji Tsuda

Computational investigation of protein functions is one of the most urgent and demanding tasks in the field of structural bioinformatics. Exhaustive pairwise comparison of known and putative ligand-binding sites, across protein families and folds, is essential for elucidating the biological functions and evolutionary relationships of proteins. Given the vast amount of data now available, existing 3D structural comparison methods are inadequate because of their high computational cost. In this article, we propose a new bit-string representation of binding sites, called structural sketches, which is obtained by random projections of triplet descriptors. This representation allows us to use ultrafast all-pairs similarity search methods for strings with strictly controlled error rates. An exhaustive comparison of 1.2 million known and putative binding sites finished in ∼30 h on a single core and yielded 88 million similar binding site pairs. Careful investigation of 3.5 million pairs verified by TM-align revealed several notable analogous sites across distinct protein families or folds. In particular, we succeeded in finding highly plausible functions for several pockets via strong structural analogies. These results indicate that our method is a promising tool for functional annotation of binding sites derived from structural genomics projects. Proteins 2011.
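The abstract says structural sketches come from random projections of triplet descriptors but does not specify the projection. A well-known choice is sign random projection (SimHash): each bit records on which side of a random Gaussian hyperplane a vector falls, so the fraction of differing bits estimates the angle between vectors divided by π. The sketch below assumes the descriptor is already a plain list of floats; the triplet descriptor itself and all function names are hypothetical.

```python
import math
import random

def make_projections(dim, n_bits, seed=0):
    """One random Gaussian hyperplane (normal vector) per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sketch(vec, projections):
    """Bit k is 1 when vec lies on the non-negative side of hyperplane k."""
    code = 0
    for k, row in enumerate(projections):
        if sum(r * v for r, v in zip(row, vec)) >= 0:
            code |= 1 << k
    return code

def hamming(a, b):
    return bin(a ^ b).count("1")

def estimated_angle(code_a, code_b, n_bits):
    """P(bit differs) = angle / pi, so the mismatch rate times pi estimates the angle."""
    return math.pi * hamming(code_a, code_b) / n_bits
```

With 2048 bits, sketches of the orthogonal vectors `[1.0, 0.0]` and `[0.0, 1.0]` give an estimated angle close to π/2. Because sketches are compared only by Hamming distance, they can be fed directly into fast bit-string all-pairs search, with surviving pairs checked by a structural aligner such as TM-align.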


Bioinformatics | 2011

SlideSort: All Pairs Similarity Search for Short Reads

Kana Shimizu; Koji Tsuda

Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity across huge numbers of short reads. Searching for similar pairs in a string pool is a fundamental step in de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented SlideSort, an exact algorithm that finds all similar pairs in a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared with existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. Compared with backtracking methods such as BWA, our method is much faster at finding remote matches and scales easily to tens of millions of sequences. Our software also provides single-linkage clustering, which is useful for summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Supplementary information: Supplementary data are available at Bioinformatics online.
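To make the filter-then-verify idea concrete, the sketch below finds all read pairs within edit distance d using a single-k-mer pigeonhole filter: each read is cut into d + 1 disjoint segments, and since d edits cannot touch all of them, a true neighbor must contain at least one segment verbatim. This is the simpler single-k-mer kind of filter that the abstract compares SlideSort against, not SlideSort's chain-of-k-mers pattern growth. Single-linkage clustering is then the connected components of the resulting similarity graph. All names are illustrative.

```python
from collections import defaultdict

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def similar_pairs(reads, max_dist):
    """All (i, j), i < j, with edit_distance(reads[i], reads[j]) <= max_dist."""
    k = min(len(r) for r in reads) // (max_dist + 1)
    if k == 0:
        raise ValueError("reads are too short for this max_dist")
    index = defaultdict(set)  # k-mer -> reads containing it at any offset
    for idx, r in enumerate(reads):
        for p in range(len(r) - k + 1):
            index[r[p:p + k]].add(idx)
    pairs = []
    for i, r in enumerate(reads):
        candidates = set()
        for s in range(max_dist + 1):  # max_dist + 1 disjoint segments of read i
            candidates |= index.get(r[s * k:(s + 1) * k], set())
        for j in sorted(candidates):
            if j > i and edit_distance(r, reads[j]) <= max_dist:
                pairs.append((i, j))
    return pairs

def single_linkage(n, pairs):
    """Single-linkage clusters = connected components (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in pairs:
        parent[find(i)] = find(j)
    clusters = defaultdict(list)
    for x in range(n):
        clusters[find(x)].append(x)
    return sorted(clusters.values())
```

For the reads `["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGAACGA", "TTTTGGGC"]` with `max_dist=1`, `similar_pairs` returns `[(0, 1), (1, 3), (2, 4)]`, and `single_linkage` groups them as `[[0, 1, 3], [2, 4]]`. Reads 0 and 3 are two edits apart but still share a cluster through read 1, which is the defining behavior of single linkage.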


BMC Bioinformatics | 2015

Privacy-preserving search for chemical compound databases.

Kana Shimizu; Koji Nuida; Hiromi Arai; Shigeo Mitsunari; Nuttapong Attrapadung; Michiaki Hamada; Koji Tsuda; Takatsugu Hirokawa; Jun Sakuma; Goichiro Hanaoka; Kiyoshi Asai

Background: Searching for similar compounds in a database is the most important step in in-silico drug screening. Because a query compound is an important starting point for a new drug, a query holder who does not want the query to be monitored by the database server usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to reveal no information other than the search results, and this dilemma prevents the use of many important data resources. Results: To overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while preserving both the query holder's privacy and the database holder's privacy. In general, applying cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive, while computationally inexpensive techniques can perform only trivial computations. In this study, our protocol is built solely from an additive-homomorphic cryptosystem, which allows only addition to be performed on encrypted values but is computationally efficient compared with versatile techniques such as general-purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times more efficient in communication size than general-purpose multi-party computation. Conclusion: We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, which scales easily to large databases, may help accelerate drug discovery research by making full use of unused but valuable data that include sensitive information.
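To show what "addition performed on encrypted values" buys in this setting, the sketch below uses the Paillier cryptosystem, a standard additive-homomorphic scheme, with toy primes. The abstract does not name the cryptosystem, so Paillier is a swapped-in example, and the key sizes are deliberately insecure. The server multiplies the ciphertexts of the query bits wherever its own fingerprint has a 1, which yields an encryption of the number of shared bits (the intersection term of a Tanimoto-style similarity) without ever seeing the query. The paper's protocol does considerably more to protect the database holder; this shows only the homomorphic building block.

```python
import math
import random

def keygen(p, q):
    """Paillier key pair from two primes (toy sizes only; NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid for generator g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def encrypted_intersection(n, enc_query_bits, db_bits):
    """Server side: Enc(|query AND db|) from encrypted query bits.
    Starts from 1, a non-randomised encryption of 0; a real protocol would
    re-randomise the result before sending it back."""
    acc = 1
    for c, bit in zip(enc_query_bits, db_bits):
        if bit:
            acc = add_encrypted(n, acc, c)
    return acc
```

For example, with `n, priv = keygen(1009, 1013)`, decrypting `add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))` gives 42, and for query bits `[1, 0, 1, 1, 0, 1]` against database bits `[1, 1, 0, 1, 0, 1]` the decrypted intersection is 3. Only the query holder, who has the private key, can read the result.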


Nucleic Acids Research | 2011

SAHG, a comprehensive database of predicted structures of all human proteins

Chie Motono; Junichi Nakata; Ryotaro Koike; Kana Shimizu; Matsuyuki Shirota; Takayuki Amemiya; Kentaro Tomii; Nozomi Nagano; Naofumi Sakaya; Kiyotaka Misoo; Miwa Sato; Akinori Kidera; Hidekazu Hiroaki; Tsuyoshi Shirai; Kengo Kinoshita; Tamotsu Noguchi; Motonori Ota

Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, for instance those from human, we developed a special protein-structure-prediction pipeline and accumulated its products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. In the pipeline, human proteins were examined with local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), a global–local alignment method (FORTE), and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand binding were predicted by simultaneous modeling using templates of apo and holo forms. When no suitable templates for holo forms were available and the apo models were accurate, we prepared holo models using prediction methods for ligand binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Functional annotations of the models and links to other databases, such as EzCatDB, InterPro and HPRD, are also provided to facilitate understanding of protein structure–function relationships.


International Journal of Molecular Sciences | 2015

A Method for Systematic Assessment of Intrinsically Disordered Protein Regions by NMR

Natsuko Goda; Kana Shimizu; Yohta Kuwahara; Takeshi Tenno; Tamotsu Noguchi; Takahisa Ikegami; Motonori Ota; Hidekazu Hiroaki

Intrinsically disordered proteins (IDPs), which lack stable conformations and are highly flexible, have attracted the attention of biologists. A systematic method for identifying polypeptide regions that are unstructured in solution is therefore important. We have designed an “indirect/reflected” detection system for evaluating the physicochemical properties of IDPs using nuclear magnetic resonance (NMR). This approach employs a “chimeric membrane protein”-based method using the thermostable membrane protein PH0471. This protein contains two domains, a transmembrane helical region and a C-terminal OB (oligonucleotide/oligosaccharide binding)-fold domain (named the NfeDC domain), connected by a flexible linker. NMR signals of the OB-fold domain of detergent-solubilized PH0471 are observable because of the flexibility of the linker region. In this study, the linker region was replaced with target IDPs. Fifty-three candidates were selected using the prediction tool POODLE, and 35 expression vectors were constructed. We then obtained 15N-labeled chimeric PH0471 proteins with 25 IDPs as linkers. The NMR spectra allowed us to classify the IDPs into three categories: flexible, moderately flexible, and inflexible. The inflexible IDPs contain membrane-associating or aggregation-prone sequences. This is the first attempt to use an indirect/reflected NMR method to evaluate IDPs, and the method can be used to verify predictions derived from our computational tools.


Bioinformatics | 2014

Reference-free prediction of rearrangement breakpoint reads

Edward Wijaya; Kana Shimizu; Kiyoshi Asai; Michiaki Hamada

MOTIVATION Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules and are observed in many cancer-related diseases. Rearrangements are typically detected by using short reads generated by next-generation sequencing (NGS) together with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information. RESULTS In this article, we propose a reference-free method for detecting clusters of breakpoints from chromosomal rearrangements. It works by directly comparing a set of normal NGS reads with another set that may contain rearrangements. Our method, SlideSort-BPR (breakpoint reads), is based on a fast algorithm for all-against-all comparison of short reads and on theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it correctly finds ∼88% of the breakpoints with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method correctly predicts more fusion transcripts than previous approaches while producing fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome. AVAILABILITY AND IMPLEMENTATION The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.
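The abstract says SlideSort-BPR combines all-against-all read comparison with theoretical analyses of the number of neighboring reads, but it does not state the model. The sketch below is a hypothetical, simplified version of that idea, not the published method. It assumes uniformly sampled, equal-length, error-free reads, so roughly depth / read_len reads start at each position, and a read shifted by s bases costs about 2s edits. Under those assumptions the neighbor count of a normal-looking read is approximately Poisson, and a read from the possibly rearranged set with an improbably low count among the normal reads is flagged as a breakpoint candidate. Neighbor counting here is brute force; a real tool would use a fast all-pairs index.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

def expected_neighbors(depth, read_len, max_dist):
    """Hypothetical model: depth / read_len reads start per position, and only
    starts within +/- max_dist // 2 bases count as neighbors (repeats ignored)."""
    window = 2 * (max_dist // 2) + 1
    return depth * window / read_len

def low_count_threshold(lam, alpha):
    """Largest t with P(X <= t) < alpha, or -1 if even P(X = 0) >= alpha."""
    t = -1
    while poisson_cdf(t + 1, lam) < alpha:
        t += 1
    return t

def flag_breakpoint_candidates(rearranged, normal, max_dist, depth, alpha=1e-3):
    """Indices of reads in `rearranged` with suspiciously few normal neighbors."""
    lam = expected_neighbors(depth, len(rearranged[0]), max_dist)
    t = low_count_threshold(lam, alpha)
    flagged = []
    for i, read in enumerate(rearranged):
        count = sum(edit_distance(read, s) <= max_dist for s in normal)
        if count <= t:
            flagged.append(i)
    return flagged
```

The Poisson threshold ties the decision to sequencing depth: at higher depth a genuine read is expected to have more neighbors, so a read spanning a novel junction, which has almost none, stands out more clearly.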


Methods in Molecular Biology | 2014

POODLE: tools predicting intrinsically disordered regions of amino acid sequence.

Kana Shimizu

Protein intrinsic disorder, a widespread phenomenon characterized by a lack of stable three-dimensional structure, is thought to play an important role in protein function. In the last decade, dozens of computational methods for predicting intrinsic disorder from amino acid sequences have been developed. They are widely used by structural biologists not only to analyze the biological function of intrinsic disorder but also to find flexible regions that may hinder successful crystallization of the full-length protein. In this chapter, I introduce Prediction Of Order and Disorder by machine LEarning (POODLE), a series of programs that accurately predict intrinsic disorder. After giving the theoretical background for predicting intrinsic disorder, I provide a detailed guide to using POODLE. I then briefly introduce a case study in which using POODLE for functional analyses of protein disorder led to novel biological findings.
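A common step after running any disorder predictor is turning per-residue scores into disordered regions, for example to mark flexible segments to trim before crystallization. The exact output format of POODLE is not described here, so this sketch assumes a plain list of per-residue disorder probabilities; the 0.5 threshold and the `min_len` filter are hypothetical parameters, not POODLE defaults.

```python
def disordered_regions(probs, threshold=0.5, min_len=1):
    """Return 1-based inclusive (start, end) runs with probability >= threshold.

    Runs shorter than min_len residues are discarded (e.g. to keep only long
    disordered regions).
    """
    regions = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a disordered run begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start + 1, i))
            start = None
    if start is not None and len(probs) - start >= min_len:
        regions.append((start + 1, len(probs)))  # run reaches the C-terminus
    return regions
```

For scores `[0.1, 0.7, 0.8, 0.2, 0.9, 0.9, 0.9]` the function returns `[(2, 3), (5, 7)]`, and with `min_len=3` only `[(5, 7)]`.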

Collaboration

Top Co-Authors

Kentaro Tomii

National Institute of Advanced Industrial Science and Technology

Tamotsu Noguchi

Meiji Pharmaceutical University

Yasuo Tabei

National Institute of Advanced Industrial Science and Technology

Jun-ichi Ito

Tokyo University of Pharmacy and Life Sciences

Koji Nuida

National Institute of Advanced Industrial Science and Technology

Shuichi Hirose

National Institute of Advanced Industrial Science and Technology
