Yasuo Tabei | Researchain

Publication

Featured research published by Yasuo Tabei.

Bioinformatics | 2012

Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers

Yasuo Tabei; Edouard Pauwels; Véronique Stoven; Kazuhiro Takemoto; Yoshihiro Yamanishi

Motivation: Drug effects are mainly caused by the interactions between drug molecules and their target proteins, including primary targets and off-targets. Identification of the molecular mechanisms behind overall drug–target interactions is crucial in the drug design process. Results: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug–target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1-regularized classifiers over the tensor product space of possible drug–target pairs. We show that the proposed method can extract a very limited number of chemogenomic features without losing performance in predicting drug–target interactions, and that the extracted features are biologically meaningful. The extracted substructure–domain association network enables us to suggest ligand chemical fragments specific for each protein domain and ligand core substructures important for a wide range of protein families. Availability: Software is available at the supplemental website. Contact: [email protected] Supplementary Information: Datasets and all results are available at http://cbio.ensmp.fr/~yyamanishi/l1binary/ .
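As a rough illustration of the idea in this abstract, the sketch below builds tensor-product features for drug–target pairs and fits an L1-regularized logistic classifier. It is not the authors' software: the function names, the toy fingerprints, and the proximal-gradient solver are all assumptions made for this example, and fingerprints are assumed to be sets of active bit indices.

```python
import math
from itertools import product


def pair_features(drug_fp, protein_fp):
    """Tensor-product features of one drug-target pair.

    Each feature is a (substructure index, domain index) tuple that is
    active only when both the substructure and the domain are present.
    """
    return {(s, d) for s, d in product(drug_fp, protein_fp)}


def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def train_l1_logistic(samples, labels, lam=0.05, step=0.5, epochs=200):
    """Fit sparse logistic-regression weights by proximal gradient descent.

    This solver is a stand-in chosen for brevity (an assumption of this
    sketch), not the optimizer used in the paper.
    """
    weights = {}
    n = len(samples)
    for _ in range(epochs):
        grad = {}
        for feats, y in zip(samples, labels):
            score = sum(weights.get(f, 0.0) for f in feats)
            p = 1.0 / (1.0 + math.exp(-score))
            for f in feats:
                grad[f] = grad.get(f, 0.0) + (p - y) / n
        for f in set(grad) | set(weights):
            w = soft_threshold(weights.get(f, 0.0) - step * grad.get(f, 0.0),
                               step * lam)
            if w != 0.0:
                weights[f] = w
            else:
                weights.pop(f, None)  # keep the model sparse
    return weights


# Toy data (hypothetical): only drug A interacts with protein P.
drugs = {"A": {0}, "B": {1}}
proteins = {"P": {0}, "Q": {1}}
pairs = [("A", "P", 1), ("A", "Q", 0), ("B", "P", 0), ("B", "Q", 0)]
samples = [pair_features(drugs[d], proteins[t]) for d, t, _ in pairs]
labels = [y for _, _, y in pairs]
weights = train_l1_logistic(samples, labels)
```

Features whose weights remain nonzero after training are the candidate substructure–domain associations; a larger `lam` would drive more of them to exactly zero.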

BMC Systems Biology | 2013

Scalable prediction of compound-protein interactions using minwise hashing

Yasuo Tabei; Yoshihiro Yamanishi

The identification of compound-protein interactions plays a key role in drug development toward the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method to make a scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method mainly consists of two steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-inducing classifier to the compact fingerprints. We test the proposed method on its ability to make a large-scale prediction of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show superior performance of the proposed method compared with the previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset consisting of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
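To make the first step more concrete, here is a minimal sketch of plain minwise hashing on set-valued fingerprints. It shows the textbook technique only, not the improved variant the abstract refers to; the function names, the hash family, and the toy fingerprints are assumptions for illustration.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1; feature ids must be smaller


def make_hash_params(k, seed=0):
    """Draw k hash functions h(x) = (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, params):
    """Compact fingerprint: the minimum hash value under each hash function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, which estimates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


# Hypothetical fingerprints given as sets of active feature ids.
params = make_hash_params(k=64)
fp_x = {3, 17, 42, 99}
fp_y = {3, 17, 42, 100}
similarity = estimate_jaccard(minhash_signature(fp_x, params),
                              minhash_signature(fp_y, params))
```

The signature length `k` trades accuracy for space: a longer signature gives a lower-variance similarity estimate at the cost of a larger fingerprint.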

data compression conference | 2015

Queries on LZ-Bounded Encodings

Djamal Belazzougui; Travis Gagie; Paweł Gawrychowski; Juha Kärkkäinen; Alberto Ordóñez; Simon J. Puglisi; Yasuo Tabei

We describe a data structure that stores a string in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data structures, such as compressed trees and graphs. We show that our data structure can be built in a scalable manner and is both small and fast in practice compared to other data structures supporting such queries.

Bioinformatics | 2014

Metabolome-scale prediction of intermediate compounds in multistep metabolic pathways with a recursive supervised approach.

Masaaki Kotera; Yasuo Tabei; Yoshihiro Yamanishi; Ai Muto; Yuki Moriya; Toshiaki Tokimatsu; Susumu Goto

Motivation: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale. Results: In this article, we develop a novel method to predict the multistep reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as ‘multistep reaction sequence likeness’, i.e. whether the two compounds of a compound–compound pair can possibly be converted to each other by a sequence of enzymatic reactions. In the algorithm, we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multistep reaction sequences, based on chemical substructure fingerprints/descriptors of compounds. We further demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set and discuss characteristic features of the extracted chemical substructure transformation patterns in multistep reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways. Availability and implementation: Materials are available for free at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2014/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

combinatorial pattern matching | 2013

A Succinct Grammar Compression

Yasuo Tabei; Yoshimasa Takabatake; Hiroshi Sakamoto

We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that representing an SLP with n symbols requires at least 2n + log n! + o(n) bits, an information-theoretic lower bound. We then present a succinct representation of an SLP; this representation is asymptotically equivalent to the lower bound. The space is at most 2n log ρ (1 + o(1)) bits for \(\rho \leq 2\sqrt{n}\), while supporting random access to any production rule of an SLP in O(log log n) time. In addition, we present a novel dynamic data structure associating a digram with a unique symbol. Such a data structure is called a naming function and has been implemented using a hash table that has a space-time tradeoff. Thus, the memory space is mainly occupied by the hash table during the development of production rules. Alternatively, we build a dynamic data structure for the naming function by leveraging the idea behind the wavelet tree. The space is strictly bounded by 2n log n (1 + o(1)) bits, while supporting O(log n) query and update time.
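For readers unfamiliar with straight line programs, the sketch below shows an SLP as a plain dictionary of binary production rules, how it derives its single string, and how a character can be reached without full expansion. It uses an uncompressed representation with hypothetical names; it illustrates the object being encoded, not the succinct encoding or the naming function from the paper.

```python
def expand(rules, symbol):
    """Derive the unique string generated by `symbol`.

    Nonterminals are keys of `rules` (mapping to a pair of symbols);
    anything else is treated as a terminal character.
    """
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(rules, left) + expand(rules, right)


def symbol_length(rules, symbol, cache):
    """Length of the string derived from `symbol`, memoized in `cache`."""
    if symbol not in rules:
        return 1
    if symbol not in cache:
        left, right = rules[symbol]
        cache[symbol] = (symbol_length(rules, left, cache)
                         + symbol_length(rules, right, cache))
    return cache[symbol]


def access(rules, start, i):
    """Return character i (0-based) of the derived string by walking
    down the derivation tree, using subtree lengths to pick a branch."""
    cache = {}
    symbol = start
    while symbol in rules:
        left, right = rules[symbol]
        left_len = symbol_length(rules, left, cache)
        if i < left_len:
            symbol = left
        else:
            i -= left_len
            symbol = right
    return symbol


# Hypothetical SLP: 1 -> a b, 2 -> 1 1, 3 -> 2 a
rules = {1: ("a", "b"), 2: (1, 1), 3: (2, "a")}
text = expand(rules, 3)
```

Each rule here costs two machine words; the lower bound quoted in the abstract concerns how few bits such a rule set can occupy while still allowing any rule to be looked up quickly.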

symposium on experimental and efficient algorithms | 2014

Improved ESP-index: A Practical Self-index for Highly Repetitive Texts

Yoshimasa Takabatake; Yasuo Tabei; Hiroshi Sakamoto

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index based on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue with the binary searches used to find appearances of variables for a query text, which slow down query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index for the top-down searches, it avoids the binary searches by using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large scale, and we show that ESP-index-I performs better than other possible approaches.

Bioinformatics | 2016

Simultaneous prediction of enzyme orthologs from chemical transformation patterns for de novo metabolic pathway reconstruction

Yasuo Tabei; Yoshihiro Yamanishi; Masaaki Kotera

Motivation: Metabolic pathways are an important class of molecular networks consisting of compounds, enzymes and their interactions. The understanding of global metabolic pathways is extremely important for various applications in ecology and pharmacology. However, large parts of metabolic pathways remain unknown, and most organism-specific pathways contain many missing enzymes. Results: In this study we propose a novel method to predict the enzyme orthologs that catalyze the putative reactions to facilitate the de novo reconstruction of metabolic pathways from metabolome-scale compound sets. The algorithm detects the chemical transformation patterns of substrate–product pairs using chemical graph alignments, and constructs a set of enzyme-specific classifiers to simultaneously predict all the enzyme orthologs that could catalyze the putative reactions of the substrate–product pairs in the joint learning framework. The originality of the method lies in its ability to make predictions for thousands of enzyme orthologs simultaneously, as well as its extraction of enzyme-specific chemical transformation patterns of substrate–product pairs. We demonstrate the usefulness of the proposed method by applying it to some ten thousands of metabolic compounds, and analyze the extracted chemical transformation patterns that provide insights into the characteristics and specificities of enzymes. The proposed method will open the door to both primary (central) and secondary metabolism in genomics research, increasing research productivity to tackle a wide variety of environmental and public health matters. Availability and Implementation: Contact: [email protected]

knowledge discovery and data mining | 2016

Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices

Yasuo Tabei; Hiroto Saigo; Yoshihiro Yamanishi; Simon J. Puglisi

With massive high-dimensional data now commonplace in research and industry, there is a strong and growing demand for more scalable computational techniques for data analysis and knowledge discovery. Key to turning these data into knowledge is the ability to learn statistical models with high interpretability. Current methods for learning statistical models either produce models that are not interpretable or have prohibitive computational costs when applied to massive data. In this paper we address this need by presenting a scalable algorithm for partial least squares regression (PLS), which we call compression-based PLS (cPLS), to learn predictive linear models with high interpretability from massive high-dimensional data. We propose a novel grammar-compressed representation of data matrices that supports fast row and column access while the data matrix is in a compressed form. The original data matrix is grammar-compressed and then the linear model in PLS is learned on the compressed data matrix, which results in a significant reduction in working space, greatly improving scalability. We experimentally test cPLS on its ability to learn linear models for classification, regression and feature extraction with various massive high-dimensional data, and show that cPLS achieves superior prediction accuracy, computational efficiency, and interpretability.

string processing and information retrieval | 2015

Online Self-Indexed Grammar Compression

Yoshimasa Takabatake; Yasuo Tabei; Hiroshi Sakamoto

Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared in advance, so index structures must be rebuilt whenever additional inputs are given, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression, named OESP-index, which can gradually build the index structure by reading input characters one by one. This property has the further advantage of saving working space during construction, because we do not need to store input texts in memory. We experimentally test OESP-index on the ability to build index structures and search query texts, and we show OESP-index's efficiency, especially its space efficiency for building index structures.

european symposium on algorithms | 2015

Access, Rank, and Select in Grammar-compressed Strings

Djamal Belazzougui; Patrick Hagge Cording; Simon J. Puglisi; Yasuo Tabei

Given a string S of length N on a fixed alphabet of σ symbols, a grammar compressor produces a context-free grammar G of size n that generates S and only S. In this paper we describe data structures to support the following operations on a grammar-compressed string: access(S,i,j) (return substring S[i,j]), \(\mathrm{rank}_c(S,i)\) (return the number of occurrences of symbol c before position i in S), and \(\mathrm{select}_c(S,i)\) (return the position of the ith occurrence of c in S). Our main result for access is a method that requires \(O(n\log N)\) bits of space and \(O(\log N+m/\log_\sigma N)\) time to extract m = j − i + 1 consecutive symbols from S. Alternatively, we can achieve \(O(\log_\tau N+m/\log_\sigma N)\) query time using \(O(n\tau\log_\tau (N/n)\log N)\) bits of space, matching a lower bound stated by Verbin and Yu for strings where N is polynomially related to n when \(\tau = \log^{\epsilon} N\). For rank and select we describe data structures of size \(O(n\sigma\log N)\) bits that support the two operations in \(O(\log N)\) time. We also extend our other structure to support both operations in \(O(\log_\tau N)\) time using \(O(n\tau\sigma\log_\tau (N/n)\log N)\) bits of space. When \(\tau = \log^{\epsilon} N\) the query time is \(O(\log N/\log\log N)\) and we provide a hardness result showing that significantly improving this would imply a major breakthrough on a hard graph-theoretical problem.
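The three queries can be pinned down with a naive reference implementation on an uncompressed string. This is only a specification-style sketch: it runs in linear time on plain text, whereas the abstract is about answering the same queries on the grammar. The indexing convention (0-based positions, 1-based occurrence counts for select) and the function names are assumptions of this example.

```python
def access(s, i, j):
    """Return the substring s[i..j], with both ends inclusive (0-based)."""
    return s[i:j + 1]


def rank(s, c, i):
    """Number of occurrences of symbol c strictly before position i."""
    return s[:i].count(c)


def select(s, c, k):
    """0-based position of the k-th occurrence of c (k >= 1), or -1 if absent."""
    seen = 0
    for pos, ch in enumerate(s):
        if ch == c:
            seen += 1
            if seen == k:
                return pos
    return -1


s = "abracadabra"
third_a = select(s, "a", 3)       # position of the third 'a'
before = rank(s, "a", third_a)    # occurrences of 'a' before that position
```

Under this convention rank and select are inverses in the sense that `rank(s, c, select(s, c, k)) == k - 1` whenever the k-th occurrence exists.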
