
Publication


Featured research published by Daisuke Kihara.


Nucleic Acids Research | 2005

Limitations and potentials of current motif discovery algorithms

Jianjun Hu; Bin Li; Daisuke Kihara

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them.
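The nucleotide-level accuracy mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not spell out its measures, so the two used here (sensitivity and positive predictive value over individual nucleotide positions) and the function name `nucleotide_level_scores` are assumptions chosen for illustration.

```python
def nucleotide_level_scores(known_sites, predicted_sites):
    """Compare known and predicted binding sites position by position.

    Both arguments are lists of (start, end) half-open intervals on one
    sequence. Returns (sensitivity, positive predictive value).
    """
    # Expand each interval into the set of nucleotide positions it covers.
    known = {p for start, end in known_sites for p in range(start, end)}
    predicted = {p for start, end in predicted_sites for p in range(start, end)}

    true_positives = len(known & predicted)
    sensitivity = true_positives / len(known) if known else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# A predicted site that overlaps half of a 10-nucleotide known site.
print(nucleotide_level_scores([(10, 20)], [(15, 25)]))
```

A prediction can overlap a true site (a hit at the site level) while still scoring poorly per nucleotide, which is consistent with the gap between accuracy levels reported above.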


Proceedings of the National Academy of Sciences of the United States of America | 2001

TOUCHSTONE: An ab initio protein structure prediction method that uses threading-based tertiary restraints

Daisuke Kihara; Hui Lu; Andrzej Kolinski; Jeffrey Skolnick

The successful prediction of protein structure from amino acid sequence requires two features: an efficient conformational search algorithm and an energy function with a global minimum in the native state. As a step toward addressing both issues, a threading-based method of secondary and tertiary restraint prediction has been developed and applied to ab initio folding. Such restraints are derived by extracting consensus contacts and local secondary structure from at least weakly scoring structures that, in some cases, can lack any global similarity to the sequence of interest. Furthermore, to generate representative protein structures, a reduced lattice-based protein model is used with replica exchange Monte Carlo to explore conformational space. We report results on the application of this methodology, termed TOUCHSTONE, to 65 proteins whose lengths range from 39 to 146 residues. For 47 (40) proteins, a cluster centroid whose rms deviation from native is below 6.5 (5) Å is found in one of the five lowest energy centroids. The number of correctly predicted proteins increases to 50 when atomic detail is added and a knowledge-based atomic potential is combined with clustered and nonclustered structures for candidate selection. The combination of the ratio of the relative number of contacts to the protein length and the number of clusters generated by the folding algorithm is a reliable indicator of the likelihood of successful fold prediction, thereby opening the way for genome-scale ab initio folding.
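The replica exchange Monte Carlo search named in this abstract relies on occasionally swapping configurations between replicas run at different temperatures. The sketch below shows the standard Metropolis swap criterion only; the lattice model, the energy function, and the temperature ladder used by TOUCHSTONE are not described in the abstract, so the energies and temperatures here are made-up values.

```python
import math
import random


def swap_probability(energy_i, temp_i, energy_j, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability for swapping
    the configurations held by replicas i and j."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng=random):
    """Try to swap each pair of neighbouring replicas once.

    `energies[k]` is the energy of the configuration currently held at
    temperature `temps[k]`. Returns the energies after the swap attempts.
    """
    energies = list(energies)
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], temps[k], energies[k + 1], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


# Hypothetical ladder: the low-temperature replica holds a poor structure.
print(attempt_swaps([-5.0, -10.0], [1.0, 2.0]))
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, which is how structures found at high temperature get refined at low temperature.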


Proteins | 2004

Development and Large Scale Benchmark Testing of the PROSPECTOR_3 Threading Algorithm

Jeffrey Skolnick; Daisuke Kihara; Yang Zhang

This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. Each variant described was found to have a Z‐score above which most identified templates have good structural (threading) alignments, Zstruct (Zgood). ‘Easy’ targets with accurate threading alignments are identified as single templates with Z > Zgood or two templates, each with Z > Zstruct, having a good consensus structure in mutually aligned regions. ‘Medium’ targets have a pair of templates lacking a consensus structure, or a single template for which Zstruct < Z < Zgood. PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark composed of 1491 single domain proteins, 41–200 residues long and no more than 30% identical to any threading template. Of the proteins, 878 were found to be easy targets, with 761 having a root mean square deviation (RMSD) from native of less than 6.5 Å. The average contact prediction accuracy was 46%, and on average, continuous fragments of 17.6 residues were predicted with RMSD values of 2.0 Å. There were 606 medium targets identified, 87% (31%) of which had good structural (threading) alignments. On average, continuous fragments of 9.1 residues with RMSD of 2.5 Å were predicted. Combining easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native; the average target/template sequence identity was 22%. Only nine targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSIBLAST. Similar results were predicted for open reading frames (ORFs) ≤200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identification of weakly homologous/analogous proteins, with very high alignment coverage, both in a comprehensive PDB benchmark and in genomes. Proteins 2004;55:000–000.
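The easy/medium rule stated in this abstract can be written as a small decision function. This is a sketch under stated assumptions: it assumes Zstruct < Zgood, takes the consensus check as a precomputed boolean because the abstract does not say how consensus is measured, and uses the label "unmatched" (not a term from the abstract) for targets with no template above Zstruct. The threshold values in the example are invented.

```python
def classify_target(z_scores, z_struct, z_good, has_consensus):
    """Classify a threading target from the Z-scores of its templates.

    z_scores      -- Z-scores of candidate templates for one target
    z_struct      -- threshold for good structural alignments
    z_good        -- threshold for good threading alignments (assumed > z_struct)
    has_consensus -- True if the templates above z_struct share a good
                     consensus structure in mutually aligned regions
    """
    above_good = [z for z in z_scores if z > z_good]
    above_struct = [z for z in z_scores if z > z_struct]

    if above_good:
        return "easy"  # a single template with Z > Zgood is enough
    if len(above_struct) >= 2:
        # Two templates above Zstruct: easy only if they agree.
        return "easy" if has_consensus else "medium"
    if len(above_struct) == 1:
        return "medium"  # single template with Zstruct < Z < Zgood
    return "unmatched"


# Hypothetical thresholds, for illustration only.
print(classify_target([6.0, 7.0], z_struct=5.0, z_good=10.0, has_consensus=False))
```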


Proteins | 2001

Defrosting the frozen approximation: PROSPECTOR--a new approach to threading.

Jeffrey Skolnick; Daisuke Kihara

PROSPECTOR (PROtein Structure Predictor Employing Combined Threading to Optimize Results) is a new threading approach that uses sequence profiles to generate an initial probe‐template alignment and then uses this “partly thawed” alignment in the evaluation of pair interactions. Two types of sequence profiles are used: the close set, composed of sequences in which sequence identity lies between 35% and 90%; and the distant set, composed of sequences with a FASTA E‐score less than 10. Thus, a total of four scoring functions are used in a hierarchical method: the close (distant) sequence profiles screen a structural database to provide an initial alignment of the probe sequence in each of the templates. The same database is then screened with a scoring function composed of sequence plus secondary structure plus pair interaction profiles. This combined hierarchical threading method is called PROSPECTOR1. For the original Fischer database, 59 of 68 pairs are correctly identified in the top position. Next, the set of the top 20 scoring sequences (four scoring functions times the top five structures) is used to construct a protein‐specific pair potential based on consensus side‐chain contacts occurring in 25% of the structures. In subsequent threading iterations, this protein‐specific pair potential, when combined in a composite manner, is found to be more sensitive in identifying the correct pairs than when the original statistical potential is used, and it increases the number of recognized structures for the combined scoring functions, termed PROSPECTOR2, to a total of 61 Fischer pairs identified in the top position. Application to a second, smaller Fischer database of 27 probe‐template pairs places 18 (17) structures in the top position for PROSPECTOR1 (PROSPECTOR2). Overall, these studies show that the use of pair interactions as assessed by the improved Z‐score enhances the specificity of probe‐template matches. Thus, when the hierarchy of scoring functions is combined, the ability to identify correct probe‐template pairs is significantly enhanced. Finally, a web server has been established for use by the academic community (http://bioinformatics.danforthcenter.org/services/threading.html). Proteins 2001;42:319–331.
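The consensus-contact step in this abstract (contacts occurring in 25% of the top-scoring structures) can be sketched as a simple counting procedure. The function name, the contact representation as residue-index pairs, and the reading of "in 25%" as "in at least 25%" are assumptions; how the resulting contacts are turned into a pair potential is not covered here.

```python
from collections import Counter


def consensus_contacts(structures, min_fraction=0.25):
    """Return the contacts present in at least `min_fraction` of the structures.

    `structures` is a list with one collection of (i, j) residue-index
    pairs per threaded structure.
    """
    counts = Counter()
    for contacts in structures:
        # Normalise pair order and count each contact once per structure.
        counts.update({tuple(sorted(pair)) for pair in contacts})

    threshold = min_fraction * len(structures)
    return {pair for pair, n in counts.items() if n >= threshold}


# Four toy structures; contact (1, 5) appears in three of them.
toy = [{(1, 5), (2, 8)}, {(5, 1)}, {(3, 9)}, {(1, 5)}]
print(consensus_contacts(toy, min_fraction=0.5))
```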


Proteins | 2002

Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding

Yang Zhang; Daisuke Kihara; Jeffrey Skolnick

Among the major difficulties in protein structure prediction is the roughness of the energy landscape that must be searched for the global energy minimum. To address this issue, we have developed a novel Monte Carlo algorithm called parallel hyperbolic sampling (PHS) that logarithmically flattens local high‐energy barriers and, therefore, allows the simulation to tunnel more efficiently through energetically inaccessible regions to low‐energy valleys. Here, we show the utility of this approach by applying it to the SICHO (SIde‐CHain‐Only) protein model. For the same CPU time, the parallel hyperbolic sampling method can identify much lower energy states and explore a larger region of phase space than the commonly used replica sampling (RS) Monte Carlo method. By clustering the simulated structures obtained in the PHS implementation of the SICHO model, we can successfully predict, among a representative benchmark set of 65 proteins, 50 cases in which one of the top 5 clusters has a root‐mean‐square deviation (RMSD) from the native structure below 6.5 Å. Compared with our previous calculations that used RS as the conformational search procedure, the number of successful predictions increased by four and the CPU cost is reduced. By comparing the structure clusters produced by both PHS and RS, we find a strong correlation between the quality of predicted structures and the minimum relative RMSD (mrRMSD) of structure clusters identified by using different search engines. This mrRMSD correlation may be useful in blind prediction as an indicator of the likelihood of successful folds. Proteins 2002;48:192–201.
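To give a feel for what "logarithmically flattens" means, the sketch below applies an inverse hyperbolic sine transform to the energy gap above the current state. This is an illustrative assumption, not the formula from the paper: the abstract does not give the transform, and the function name and the scale parameter `k` are hypothetical. The inverse hyperbolic sine is used only because it grows like a logarithm for large arguments.

```python
import math


def flattened_energy(energy, current_energy, k=1.0):
    """Compress the energy gap above the current state.

    For large gaps, asinh(x) behaves like ln(2x), so tall barriers are
    reduced to a logarithmic height while small gaps are nearly unchanged.
    """
    gap = energy - current_energy
    return current_energy + math.asinh(k * gap) / k


def acceptance(energy, current_energy, temperature=1.0, flatten=True):
    """Metropolis acceptance probability for an uphill or downhill move."""
    target = flattened_energy(energy, current_energy) if flatten else energy
    delta = target - current_energy
    return 1.0 if delta <= 0 else math.exp(-delta / temperature)


# A barrier of 100 energy units seen from a state at energy 0.
print(flattened_energy(100.0, 0.0))             # roughly 5.3 instead of 100
print(acceptance(100.0, 0.0, flatten=False))    # effectively zero
print(acceptance(100.0, 0.0, flatten=True))     # small but non-negligible
```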


Journal of Molecular Biology | 2011

Community-wide assessment of protein-interface modeling suggests improvements to design methodology

Sarel J. Fleishman; Timothy A. Whitehead; Eva Maria Strauch; Jacob E. Corn; Sanbo Qin; Huan-Xiang Zhou; Julie C. Mitchell; Omar Demerdash; Mayuko Takeda-Shitaka; Genki Terashi; Iain H. Moal; Xiaofan Li; Paul A. Bates; Martin Zacharias; Hahnbeom Park; Jun Su Ko; Hasup Lee; Chaok Seok; Thomas Bourquard; Julie Bernauer; Anne Poupon; Jérôme Azé; Seren Soner; Şefik Kerem Ovali; Pemra Ozbek; Nir Ben Tal; Turkan Haliloglu; Howook Hwang; Thom Vreven; Brian G. Pierce

The CAPRI (Critical Assessment of Predicted Interactions) and CASP (Critical Assessment of protein Structure Prediction) experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting that there may be important physical chemistry missing in the energy calculations. A total of 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the nonpolar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a nonbinder.


Proteins | 2008

Fast protein tertiary structure retrieval based on global surface shape similarity

Lee Sael; Bin Li; David La; Yi Fang; Karthik Ramani; Raif M. Rustamov; Daisuke Kihara

Characterization and identification of similar tertiary structures of proteins provide rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main‐chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real time. Here we introduce a global surface shape representation by three‐dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between the surface representation defined by the 3D Zernike descriptor and the conventional main‐chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, the 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real‐time protein structure search by 3D Zernike descriptor will open up new possibilities for large‐scale global and local protein surface shape comparison. Proteins 2008.
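Because each structure is reduced to a fixed-length descriptor vector, a database search becomes a nearest-neighbour lookup. The sketch below assumes the descriptors have already been computed and compares them by Euclidean distance; the distance measure, the function names, and the toy three-number vectors are assumptions for illustration and are far shorter than real descriptors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, database, k=5):
    """Return the names of the k database entries closest to the query.

    `database` maps a structure name to its precomputed descriptor.
    """
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Toy descriptors with hypothetical structure names.
db = {
    "structA": [0.0, 0.0, 0.0],
    "structB": [1.0, 1.0, 1.0],
    "structC": [5.0, 5.0, 5.0],
}
print(top_k([0.9, 0.9, 0.9], db, k=2))
```

A linear scan like this costs one vector comparison per database entry, which is why a search over a few thousand structures is fast compared with pairwise structural alignment.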


Proteins | 2001

Generalized comparative modeling (GENECOMP): a combination of sequence comparison, threading, and lattice modeling for protein structure prediction and refinement.

Andrzej Kolinski; Marcos R. Betancourt; Daisuke Kihara; Piotr Rotkiewicz; Jeffrey Skolnick

An improved generalized comparative modeling method, GENECOMP, for the refinement of threading models is developed and validated on the Fischer database of 68 probe–template pairs, a standard benchmark used to evaluate threading approaches. The basic idea is to perform ab initio folding using a lattice protein model, SICHO, near the template provided by the new threading algorithm PROSPECTOR. PROSPECTOR also provides predicted contacts and secondary structure for the template‐aligned regions, and possibly for the unaligned regions by garnering additional information from other top‐scoring threaded structures. Since the lowest‐energy structure generated by the simulations is not necessarily the best structure, we employed two structure‐selection protocols: distance geometry and clustering. In general, clustering is found to generate somewhat better quality structures in 38 of 68 cases. When applied to the Fischer database, the protocol does no harm and in a significant number of cases improves upon the initial threading model, sometimes dramatically. The procedure is readily automated and can be implemented on a genomic scale. Proteins 2001;44:133–149.


Proteins | 2009

PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data

Troy Hawkins; Meghana Chitale; Stanislav Luban; Daisuke Kihara

Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence‐based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI‐BLAST search by extracting and scoring individual functional attributes, searching a wide range of E‐value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low‐resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome‐scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60–90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein–protein interaction network of the malaria parasite (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI‐BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP‐SIG ′05 and CASP7 function (FN) assessments. PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/. Proteins 2009.


Protein Science | 2005

The effect of long-range interactions on the secondary structure formation of proteins.

Daisuke Kihara

The influence of long‐range residue interactions on defining secondary structure in a protein has long been discussed and is often cited as the current limitation to accurate secondary structure prediction. There are several experimental examples where a local sequence alone is not sufficient to determine its secondary structure, but a comprehensive survey on a large data set has not yet been done. Interestingly, some earlier studies denied the negative effect of long‐range interactions on secondary structure prediction accuracy. Here, we have introduced the residue contact order (RCO), which directly indicates the separation of contacting residues in terms of the position in the sequence, and examined the relationship between the RCO and the prediction accuracy. A large data set of 2777 nonhomologous proteins was used in our analysis. Unlike previous studies, we do find that prediction accuracy drops as residues have contacts with more distant residues. Moreover, this negative correlation between the RCO and the prediction accuracy was found not only for β‐strands, but also for α‐helices. The prediction accuracy of β‐strands is lower if residues have a high RCO or a low RCO, which corresponds to the situation that a β‐sheet is formed by β‐strands from different chains in a protein complex. The reason why the current study draws the opposite conclusion from the previous studies is examined. The implication for protein folding is also discussed.
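The residue contact order (RCO) described in this abstract measures how far apart in sequence a residue's contact partners are. A minimal sketch follows, assuming the contact list has already been derived from a structure; the contact definition (distance cutoff, atoms used) and any normalisation by chain length are not given in the abstract, so this version simply averages the sequence separation and is an assumption.

```python
def residue_contact_order(contacts, n_residues):
    """Per-residue contact order: mean sequence separation |i - j| over
    all contacts that involve residue i.

    `contacts` is an iterable of (i, j) zero-based residue-index pairs.
    Residues with no contacts get 0.0.
    """
    total = [0] * n_residues
    count = [0] * n_residues
    for i, j in contacts:
        separation = abs(i - j)
        # Each contact contributes to both residues it involves.
        total[i] += separation
        count[i] += 1
        total[j] += separation
        count[j] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


# Toy six-residue chain: residue 0 touches residues 3 and 5, residue 1 touches 2.
print(residue_contact_order([(0, 3), (0, 5), (1, 2)], n_residues=6))
```

A residue whose partners are all close in sequence (as in an α-helix) gets a low value, while one pairing with a distant strand gets a high value, which is the quantity the study relates to prediction accuracy.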

Collaboration


Dive into Daisuke Kihara's collaborations.

Top Co-Authors

Lee Sael

State University of New York System

Jeffrey Skolnick

Georgia Institute of Technology
