Hisashi Kashima | Researchain

Publication

Featured research published by Hisashi Kashima.

european conference on machine learning | 2010

Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs

Rudy Raymond; Hisashi Kashima

Recent years have witnessed widespread interest in methods that use both link structure and node information for link prediction on graphs. One of the state-of-the-art methods is Link Propagation, a new semi-supervised learning algorithm for link prediction on graphs that builds on the widely studied label propagation by exploiting information on similarities of links and nodes. Despite its efficiency and effectiveness compared to other methods, its applications were still limited due to computational time and space constraints. In this paper, we propose fast and scalable algorithms for Link Propagation by introducing efficient procedures to solve the large linear equations that appear in the method. In particular, we show how to obtain a compact representation of the solution to the linear equations by using a non-trivial combination of techniques in linear algebra to construct algorithms that are also effective for link prediction on dynamic graphs. These enable us to apply Link Propagation to large networks with more than 400,000 nodes. Experiments demonstrate that our approximation methods are scalable and fast, and that their prediction quality is competitive.

BMC Bioinformatics | 2010

Protein complex prediction via verifying and reconstructing the topology of domain-domain interactions

Yosuke Ozawa; Rintaro Saito; Shigeo Fujimori; Hisashi Kashima; Masamichi Ishizaka; Hiroshi Yanagawa; Etsuko Miyamoto-Sato; Masaru Tomita

Background: High-throughput methods for detecting protein-protein interactions (PPIs) enable us to obtain large interaction networks, and also allow us to computationally identify the associations of proteins as protein complexes. Although there are methods to extract protein complexes as sets of proteins from interaction networks, the extracted complexes may include false positives because these methods do not account for the structural limitations of the proteins and thus do not check that the proteins in an extracted complex can simultaneously bind to each other. In addition, there have been few searches for deeper insights into the protein complexes, such as the topology of the protein-protein interactions or the domain-domain interactions (DDIs) that mediate the protein interactions.

Results: Here, we introduce a combinatorial approach for prediction of protein complexes focusing not only on determining member proteins in complexes but also on the DDI/PPI organization of the complexes. Our method analyzes complex candidates predicted by the existing methods. It searches for optimal combinations of domain-domain interactions in the candidates based on the assumption that the proteins in a candidate can form a true protein complex if each of the domains is used by a single protein interaction. This optimization problem was mathematically formulated and solved using binary integer linear programming. By using publicly available sets of yeast protein-protein interactions and domain-domain interactions, we succeeded in extracting protein complex candidates with an accuracy that is twice the average accuracy of the existing methods, MCL, MCODE, or clustering coefficient. Although tuning the parameters of each algorithm slightly improved its precision, our method showed better precision for most values of the parameters.

Conclusions: Our combinatorial approach can provide better accuracy for prediction of protein complexes and also enables the identification of both direct PPIs and the DDIs that mediate them in complexes.
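The constraint that each domain is used by a single protein interaction can be illustrated with a small sketch. This is not the binary integer linear program from the paper: it is an exhaustive search over a toy input, and the function name `best_ddi_assignment`, the data layout, and the protein and domain names are all assumptions made for this example.

```python
from itertools import product

def best_ddi_assignment(ppi_to_ddis):
    """Pick at most one DDI per PPI so that no domain instance is reused.

    ppi_to_ddis maps a PPI (a protein pair) to its candidate DDIs. Each DDI is
    a pair of (protein, domain) tuples. Returns the feasible assignment that
    explains the most PPIs. Exhaustive search, so only usable on tiny inputs.
    """
    ppis = list(ppi_to_ddis)
    best = {}
    # None means "leave this PPI unexplained by any DDI".
    options = [[None] + list(ppi_to_ddis[p]) for p in ppis]
    for choice in product(*options):
        used = set()
        feasible = True
        for ddi in choice:
            if ddi is None:
                continue
            for domain in ddi:
                if domain in used:  # a domain would serve two interactions
                    feasible = False
                used.add(domain)
        if not feasible:
            continue
        assignment = {p: d for p, d in zip(ppis, choice) if d is not None}
        if len(assignment) > len(best):
            best = assignment
    return best

# Hypothetical candidate complex: three single-domain proteins in a triangle.
triangle = {
    ("A", "B"): [(("A", "d1"), ("B", "d2"))],
    ("A", "C"): [(("A", "d1"), ("C", "d3"))],
    ("B", "C"): [(("B", "d2"), ("C", "d3"))],
}

# Same candidate, but protein A has a second domain d4 that binds C.
two_domain = {
    ("A", "B"): [(("A", "d1"), ("B", "d2"))],
    ("A", "C"): [(("A", "d4"), ("C", "d3"))],
    ("B", "C"): [(("B", "d2"), ("C", "d3"))],
}
```

In the `triangle` input every pair of interactions competes for a shared domain, so only one PPI can be supported at a time. In `two_domain` the extra domain on A lets the A-B and A-C interactions coexist.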

Journal of Molecular Graphics & Modelling | 2010

Prediction of protein-ligand binding affinities using multiple instance learning.

Reiji Teramoto; Hisashi Kashima

Accurate prediction of protein-ligand binding affinities for lead optimization in drug discovery remains an important and challenging problem for scoring functions in docking simulation. In this paper, we propose a data-driven approach that integrates multiple scoring functions to predict protein-ligand binding affinity directly. We then propose a new method called multiple instance regression based scoring (MIRS) that incorporates unbound ligand conformations using multiple scoring functions. We evaluated the predictive performance of MIRS using 100 protein-ligand complexes and their binding affinities. The experimental results showed that MIRS outperformed 11 conventional scoring functions: LigScore, PLP, AutoDock, G-Score, D-Score, LUDI, F-Score, ChemScore, X-Score, PMF, and DrugScore. In addition, we confirmed that MIRS performed well on binding pose prediction. Our results reveal that it is indispensable to incorporate unbound ligand conformations in both binding affinity prediction and binding pose prediction. The proposed method will accelerate efficient lead optimization in structure-based drug design and provide a new direction for the design of new scoring functions.

knowledge discovery and data mining | 2010

Finding itemset-sharing patterns in a large itemset-associated graph

Mutsumi Fukuzaki; Mio Seki; Hisashi Kashima; Jun Sese

Itemset mining and graph mining have attracted considerable attention in the field of data mining, since they have many important applications in various areas such as biology, marketing, and social network analysis. However, most existing studies focus only on either itemset mining or graph mining, and only a few studies have addressed a combination of both. In this paper, we introduce a new problem which we call itemset-sharing subgraph (ISS) set enumeration, where the task is to find sets of subgraphs with common itemsets in a large graph in which each vertex has an associated itemset. The problem has various interesting potential applications, such as side-effect analysis in drug discovery and the analysis of the influence of word-of-mouth communication in marketing in social networks. We propose an efficient algorithm, ROBIN, for finding ISS sets in such graphs; this algorithm enumerates connected subgraphs having common itemsets and finds their combinations. Experiments using a synthetic network verify that our method can efficiently process networks with more than one million edges. Experiments using a real biological network show that our algorithm can find biologically interesting patterns. We also apply ROBIN to a citation network and find successful collaborative research works.

knowledge discovery and data mining | 2011

A subpath kernel for rooted unordered trees

Daisuke Kimura; Tetsuji Kuboyama; Tetsuo Shibuya; Hisashi Kashima

Kernel methods are a promising approach to learning with tree-structured data, and various efficient tree kernels have been proposed to capture informative structures in trees. In this paper, we propose a new tree kernel function based on subpath sets to capture vertical structures in rooted unordered trees, since such tree structures are often used to encode hierarchical information in data. We also propose a simple and efficient algorithm for computing the kernel by extending the multikey quicksort algorithm used for sorting strings. The time complexity of the algorithm is O((|T1| + |T2|) log(|T1| + |T2|)) on average, and the space complexity is O(|T1| + |T2|), where |T1| and |T2| are the numbers of nodes in the two trees T1 and T2. We apply the proposed kernel to two supervised classification tasks, XML classification in web mining and glycan classification in bioinformatics. The experimental results show that the predictive performance of the proposed kernel is competitive with that of the existing efficient tree kernel for unordered trees proposed by Vishwanathan et al. [1], and that the proposed kernel is also empirically faster than the existing kernel.
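The idea of comparing trees through shared vertical label sequences can be sketched naively as follows. This is not the multikey-quicksort algorithm described in the abstract and does not achieve its complexity: it enumerates every downward path explicitly and counts matches with a hash table. The tree encoding `(label, [children])` and the function names are assumptions made for this example, as is the reading of the kernel as a sum of products of subpath counts.

```python
from collections import Counter

def subpath_counts(tree):
    """Count every downward label sequence (subpath) in a rooted tree.

    A tree is a (label, [children]) tuple. A subpath is the label sequence
    along any ancestor-to-descendant chain, including single nodes.
    """
    counts = Counter()

    def visit(node, root_path):
        label, children = node
        path = root_path + (label,)
        # Every suffix of the root-to-node path is a subpath ending here.
        for start in range(len(path)):
            counts[path[start:]] += 1
        for child in children:
            visit(child, path)

    visit(tree, ())
    return counts

def subpath_kernel(t1, t2):
    """Sum, over all shared subpaths, of the product of their counts."""
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(n * c2[p] for p, n in c1.items())

# Two small hypothetical trees: a root with two leaves, and a three-node chain.
t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", [("c", [])])])
print(subpath_kernel(t1, t2))  # shared subpaths: a, b, c, a-b
```

Because only vertical paths are counted, reordering the children of a node leaves the value unchanged, which is what makes this kind of kernel suitable for unordered trees.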

international conference on ubiquitous information management and communication | 2014

Learning an accurate entity resolution model from crowdsourced labels

Jingjing Wang; Satoshi Oyama; Masahito Kurihara; Hisashi Kashima

We investigated the use of supervised learning methods that use labels from crowd workers to resolve entities. Although obtaining labeled data by crowdsourcing can reduce time and cost, it also brings challenges (e.g., coping with the variable quality of crowd-generated data). First, we evaluated the quality of crowd-generated labels for actual entity resolution data sets. Then, we evaluated the prediction accuracy of two machine learning methods that use labels from crowd workers: a conventional LPP method using consensus labels obtained by majority voting, and our proposed method, which combines multiple Laplacians directly by using crowdsourced data. We discussed the relationship between the accuracy of the workers' labels and the prediction accuracy of the two methods.
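The consensus-label baseline mentioned above can be sketched in a few lines. This covers only the majority-voting step, not the LPP model or the proposed multi-Laplacian method. The data layout (record pairs mapped to per-worker match labels), the tie handling, and the function names are assumptions made for this example.

```python
from collections import Counter

def majority_vote(labels_by_item):
    """Return one consensus label per item, or None when the vote is tied."""
    consensus = {}
    for item, votes in labels_by_item.items():
        ranked = Counter(votes.values()).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus[item] = None  # no clear majority
        else:
            consensus[item] = ranked[0][0]
    return consensus

def worker_agreement(labels_by_item, consensus):
    """Fraction of each worker's labels that match the consensus label.

    Items whose vote was tied are skipped.
    """
    hits, totals = Counter(), Counter()
    for item, votes in labels_by_item.items():
        if consensus[item] is None:
            continue
        for worker, label in votes.items():
            totals[worker] += 1
            hits[worker] += int(label == consensus[item])
    return {w: hits[w] / totals[w] for w in totals}

# Hypothetical crowd labels: 1 = "same entity", 0 = "different entities".
crowd = {
    ("r1", "r2"): {"w1": 1, "w2": 1, "w3": 0},
    ("r1", "r3"): {"w1": 0, "w2": 0, "w3": 0},
    ("r2", "r3"): {"w1": 1, "w2": 0},
}
consensus = majority_vote(crowd)
agreement = worker_agreement(crowd, consensus)
```

Agreement with the consensus is only a rough proxy for worker accuracy, since the consensus itself can be wrong when most workers err on the same item.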

pacific-asia conference on knowledge discovery and data mining | 2013

Matrix Factorization With Aggregated Observations

Yoshifumi Aimoto; Hisashi Kashima

Missing value estimation is a fundamental task in machine learning and data mining. It is not only used as a preprocessing step in data analysis, but also serves important purposes such as recommendation. Matrix factorization with a low-rank assumption is a basic tool for missing value estimation. However, existing matrix factorization methods cannot be applied directly to cases where some parts of the data are observed only as aggregated values of several features in high-level categories. In this paper, we propose a new problem of restoring original micro observations from aggregated observations, and we give formulations and efficient solutions to the problem by extending the ordinary matrix factorization model. Experiments using synthetic and real data sets show that the proposed method outperforms several baseline methods.
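One way such an extension could be written down is shown here, purely as an illustrative sketch and not as the formulation given in the paper. It assumes that each aggregated observation $a_{iG}$ is the sum of row $i$ over a group of features $G$, that $\Omega$ is the set of individually observed entries $x_{ij}$, and that $\mathcal{A}$ is the set of aggregated observations. The symbols $U$, $V$, $\Omega$, $\mathcal{A}$, and $\lambda$ are all introduced for this example.

```latex
\min_{U,V}\;
\sum_{(i,j)\in\Omega}\bigl(x_{ij}-\mathbf{u}_i^{\top}\mathbf{v}_j\bigr)^2
+\sum_{(i,G)\in\mathcal{A}}\Bigl(a_{iG}-\sum_{j\in G}\mathbf{u}_i^{\top}\mathbf{v}_j\Bigr)^2
+\lambda\bigl(\lVert U\rVert_F^2+\lVert V\rVert_F^2\bigr)
```

Under this sketch, the first term is the ordinary matrix factorization loss, the second term ties the low-rank model to the aggregated values, and an unobserved micro entry would be estimated as $\mathbf{u}_i^{\top}\mathbf{v}_j$.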

international conference on machine learning | 2010