Taiji Suzuki
Tokyo Institute of Technology
Publication
Featured research published by Taiji Suzuki.
Archive | 2012
Masashi Sugiyama; Taiji Suzuki; Takafumi Kanamori
Machine learning is an interdisciplinary field of science and engineering that studies mathematical theories and practical applications of systems that learn. This book introduces theories, methods, and applications of density ratio estimation, a newly emerging paradigm in the machine learning community. Various machine learning problems, such as non-stationarity adaptation, outlier detection, dimensionality reduction, independent component analysis, clustering, classification, and conditional density estimation, can be systematically solved via the estimation of probability density ratios. The authors offer a comprehensive introduction to various density ratio estimators, including methods based on density estimation, moment matching, probabilistic classification, density fitting, and density ratio fitting, and describe how these can be applied to machine learning. The book also provides mathematical theories for density ratio estimation, including parametric and non-parametric convergence analyses and numerical stability analysis, to complete the first definitive treatment of the entire framework of density ratio estimation in machine learning.
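For intuition, the probabilistic-classification route to density-ratio estimation mentioned above can be sketched in a few lines: label samples from p as 1 and samples from q as 0, fit a classifier, and read the ratio off the posterior odds. This is a minimal sketch, not code from the book; the Gaussian toy data and plain gradient descent are choices made here for brevity.

```python
import numpy as np

# Label p-samples as 1 and q-samples as 0, fit a logistic model, and use
# r(x) = P(y=1|x) / P(y=0|x) (equal sample sizes) as the density-ratio estimate.
rng = np.random.default_rng(0)
xp = rng.normal(0.0, 1.0, size=(500, 1))   # samples from p = N(0, 1)
xq = rng.normal(0.5, 1.0, size=(500, 1))   # samples from q = N(0.5, 1)

X = np.vstack([xp, xq])
y = np.concatenate([np.ones(500), np.zeros(500)])
Phi = np.hstack([X, np.ones((1000, 1))])   # linear model with a bias term

w = np.zeros(2)
for _ in range(2000):                      # plain gradient descent
    s = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 0.1 * Phi.T @ (s - y) / len(y)

def ratio(x):
    z = np.array([x, 1.0]) @ w
    return float(np.exp(z))                # posterior odds = exp(w^T phi(x))
```

Here the true log-ratio log N(x; 0, 1) − log N(x; 0.5, 1) = −0.5x + 0.125 is linear in x, so the logistic model is well specified and the odds recover the ratio.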
BMC Bioinformatics | 2009
Taiji Suzuki; Masashi Sugiyama; Takafumi Kanamori; Jun Sese
Background: Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering genes and associating each cluster with biological functions is a commonly used approach, but such methods detect only partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. Results: To elucidate the association, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation; LSMI can therefore detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas others have no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to their responses to various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. Conclusion: We proposed a novel feature selection method called LSMI and applied it to mining the association between conditions of yeast and biological processes through microarray datasets. LSMI allows us to elucidate the global organization of cellular process control.
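A minimal sketch of the LSMI idea, under assumptions made here for illustration (1-D data, Gaussian kernel centres drawn from the sample, and fixed width and regularization rather than the cross-validated choices in the paper): fit the ratio r(x, y) = p(x, y) / (p(x)p(y)) by least squares, which has a closed-form solution, and plug it into SMI = (E_{p(x,y)}[r] − 1) / 2.

```python
import numpy as np

rng = np.random.default_rng(1)

def lsmi(x, y, sigma=0.5, lam=1e-3, b=50):
    # Product-Gaussian-kernel model for r(x, y); coefficients in closed form.
    n = len(x)
    idx = rng.choice(n, size=min(b, n), replace=False)   # kernel centres
    Kx = np.exp(-(x[:, None] - x[idx][None, :]) ** 2 / (2 * sigma ** 2))
    Ky = np.exp(-(y[:, None] - y[idx][None, :]) ** 2 / (2 * sigma ** 2))
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2   # empirical E_{p(x)p(y)}[phi phi^T]
    h = (Kx * Ky).mean(axis=0)               # empirical E_{p(x,y)}[phi]
    alpha = np.linalg.solve(H + lam * np.eye(len(idx)), h)
    return 0.5 * (h @ alpha - 1.0)           # plug-in SMI estimate

x = rng.normal(size=300)
y_dep = x + 0.3 * rng.normal(size=300)       # strongly dependent on x
y_ind = rng.normal(size=300)                 # independent of x
# SMI should come out clearly larger for the dependent pair.
```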
Neural Computation | 2013
Taiji Suzuki; Masashi Sugiyama
The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure. We apply a density-ratio estimator for approximating squared-loss mutual information that is formulated as a minimum contrast estimator on parametric or nonparametric models. Since cross-validation is available for choosing an appropriate model, our method does not require any prespecified structure on the underlying distributions. We elucidate the asymptotic bias of our estimator on parametric models and the asymptotic convergence rate on nonparametric models. The convergence analysis utilizes the uniform tail-bound of a U-process, and the convergence rate is characterized by the bracketing entropy of the model. We then develop a natural gradient algorithm on the Grassmann manifold for sufficient subspace search. The analytic formula of our estimator allows us to compute the gradient efficiently. Numerical experiments show that the proposed method compares favorably with existing dimension-reduction approaches on artificial and benchmark data sets.
Acta Metallurgica | 1972
S. Takeuchi; E. Kuramoto; Taiji Suzuki
Zone-refined and annealed tantalum single crystals were deformed at temperatures between 4.2 and 450 K to investigate the deformation mechanism at low temperatures. Three types of axial orientation, [149], near [011], and near [001], were chosen so that the maximum shear stress was applied on either the {110}, twinning {112}, or anti-twinning {112} planes. No twinning deformation was observed in any orientation, even at liquid helium temperature. The orientation dependences of the yield stress and slip patterns showed a characteristic anisotropy, and the results were explained most naturally on the assumption that the activated slip planes were {110} and twinning {112}. By assuming that the screw dislocation under no stress has two types of spreading core configurations of equivalent energy, transformable into each other, the slip behavior can be interpreted as a successive process of overcoming the Peierls potential hills, in which the core configuration of glide dislocations transforms from one to the other depending on the direction of the applied stress. Furthermore, comparison of the yield stresses in tension and compression leads to the conclusion that the normal stress on the slip plane affects the Peierls potential.
Machine Learning | 2012
Takafumi Kanamori; Taiji Suzuki; Masashi Sugiyama
The ratio of two probability densities can be used for solving various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), feature selection (mutual information), and conditional probability estimation. Several methods of directly estimating the density ratio have recently been developed, e.g., moment matching estimation, maximum-likelihood density-ratio estimation, and least-squares density-ratio fitting. In this paper, we propose a kernelized variant of the least-squares method for density-ratio estimation, which is called kernel unconstrained least-squares importance fitting (KuLSIF). We investigate its fundamental statistical properties including a non-parametric convergence rate, an analytic-form solution, and a leave-one-out cross-validation score. We further study its relation to other kernel-based density-ratio estimators. In experiments, we numerically compare various kernel-based density-ratio estimation methods, and show that KuLSIF compares favorably with other approaches.
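The analytic-form solution mentioned above can be illustrated with a finite-basis, uLSIF-style variant of the least-squares fit (a simplification of KuLSIF, which places one kernel per sample; the kernel width, regularization constant, and toy densities below are arbitrary choices made here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
xp = rng.normal(0.0, 1.0, size=400)    # samples from p = N(0, 1)
xq = rng.normal(0.0, 2.0, size=400)    # samples from q = N(0, 4)

def gauss_kernel(a, c, sigma=1.0):
    return np.exp(-(a[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))

centres = xp[:100]                     # kernel centres from the p-sample
Kp = gauss_kernel(xp, centres)
Kq = gauss_kernel(xq, centres)
lam = 0.1
H = Kq.T @ Kq / len(xq)                # empirical E_q[phi phi^T]
h = Kp.mean(axis=0)                    # empirical E_p[phi]
# Least-squares fit of r = p/q: minimize (1/2) a'Ha - h'a + (lam/2)|a|^2.
alpha = np.linalg.solve(H + lam * np.eye(len(centres)), h)

def ratio(x):
    return float(gauss_kernel(np.atleast_1d(x), centres) @ alpha)
```

The true ratio here is 2·exp(−3x²/8), so the estimate should peak near x = 0 and decay in the tails, which is the qualitative behaviour a leave-one-out score would be used to tune in practice.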
Ipsj Transactions on Computer Vision and Applications | 2009
Masashi Sugiyama; Takafumi Kanamori; Taiji Suzuki; Shohei Hido; Jun Sese; Ichiro Takeuchi; Liwei Wang
In statistical pattern recognition, it is important to avoid density estimation, since density estimation is often more difficult than pattern recognition itself. Following this idea, known as Vapnik's principle, a statistical data processing framework that employs the ratio of two probability density functions has been developed recently and is attracting considerable attention in the machine learning and data mining communities. The purpose of this paper is to introduce to the computer vision community recent advances in density ratio estimation methods and their usage in various statistical data processing tasks such as non-stationarity adaptation, outlier detection, feature selection, and independent component analysis.
Neural Information Processing Systems | 2012
Masashi Sugiyama; Takafumi Kanamori; Taiji Suzuki; Marthinus Christoffel du Plessis; Song Liu; Ichiro Takeuchi
We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, this procedure does not necessarily work well because the first step is performed without regard to the second step, and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this letter, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a nonparametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be used in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.
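A single-shot least-squares sketch of the density-difference idea (not the exact estimator in the paper): with a Gaussian-kernel model, the cross-moment matrix U = ∫ φ(x)φ(x)ᵀ dx has a closed-form integral, so the fit reduces to one linear solve. The widths, regularization, and toy data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
xp = rng.normal(-1.0, 1.0, size=500)   # samples from p
xq = rng.normal(+1.0, 1.0, size=500)   # samples from q

sigma, lam = 1.0, 1e-3
centres = np.concatenate([xp[:50], xq[:50]])

def phi(x):
    return np.exp(-(np.atleast_1d(x)[:, None] - centres[None, :]) ** 2
                  / (2 * sigma ** 2))

# U_{ll'} = \int phi_l(x) phi_{l'}(x) dx is analytic for Gaussian kernels.
U = sigma * np.sqrt(np.pi) * np.exp(
    -(centres[:, None] - centres[None, :]) ** 2 / (4 * sigma ** 2))
h = phi(xp).mean(axis=0) - phi(xq).mean(axis=0)
theta = np.linalg.solve(U + lam * np.eye(len(centres)), h)

def diff(x):
    return float(phi(x) @ theta)       # estimate of p(x) - q(x)

l2sq = float(h @ theta)                # plug-in estimate of the squared L2 distance
```

The estimate should be positive where p dominates (near −1), negative where q dominates (near +1), and the plug-in squared L2 distance is what a robust distribution-comparison application would consume.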
Neural Networks | 2011
Masashi Sugiyama; Makoto Yamada; Paul von Bünau; Taiji Suzuki; Takafumi Kanamori; Motoaki Kawanabe
Methods for directly estimating the ratio of two probability density functions have been actively explored recently, since they can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, and feature selection. In this paper, we develop a new method which incorporates dimensionality reduction into a direct density-ratio estimation procedure. Our key idea is to find a low-dimensional subspace in which the densities differ significantly and to perform density-ratio estimation only in this subspace. The proposed method, D3-LHSS (Direct Density-ratio estimation with Dimensionality reduction via Least-squares Hetero-distributional Subspace Search), is shown to overcome the limitations of baseline methods.
international symposium on information theory | 2009
Taiji Suzuki; Masashi Sugiyama; Toshiyuki Tanaka
We propose a new method of approximating mutual information based on maximum likelihood estimation of a density ratio function. The proposed method, Maximum Likelihood Mutual Information (MLMI), possesses useful properties, e.g., it does not involve density estimation, the global optimal solution can be efficiently computed, it has suitable convergence properties, and model selection criteria are available. Numerical experiments show that MLMI compares favorably with existing methods.
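The maximum-likelihood density-ratio route can be sketched with a KLIEP-style projected gradient ascent (a simplified stand-in for MLMI; the kernel model, step size, and toy data are assumptions made here): maximize the mean log of the fitted ratio on paired samples, subject to the ratio averaging to one under the product of marginals.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)       # dependent pair

sigma = 0.7
cx, cy = x[:60], y[:60]                # kernel centres
Kx = np.exp(-(x[:, None] - cx[None, :]) ** 2 / (2 * sigma ** 2))
Ky = np.exp(-(y[:, None] - cy[None, :]) ** 2 / (2 * sigma ** 2))
Phi = Kx * Ky                          # phi_l(x_i, y_i) on paired samples
b = Kx.mean(axis=0) * Ky.mean(axis=0)  # empirical E_{p(x)p(y)}[phi_l]

alpha = np.ones(60)
alpha /= b @ alpha                     # start on the constraint surface
for _ in range(500):                   # projected gradient ascent
    w = Phi @ alpha                    # ratio values at paired samples
    grad = Phi.T @ (1.0 / w) / n       # gradient of mean log-likelihood
    alpha = np.maximum(alpha + 0.1 * grad, 0.0)
    alpha /= b @ alpha                 # re-normalise: E_{p(x)p(y)}[w] = 1

mi_hat = np.log(Phi @ alpha).mean()    # maximum-likelihood MI estimate
```

Since the objective is concave in the model parameters, the global optimum is reachable, which is the "efficiently computed" property the abstract refers to.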
Machine Learning | 2011
Taiji Suzuki; Ryota Tomioka
We propose a new optimization algorithm for Multiple Kernel Learning (MKL) called SpicyMKL, which is applicable to general convex loss functions and general types of regularization. SpicyMKL iteratively solves smooth minimization problems, so there is no need to solve an SVM, LP, or QP internally. SpicyMKL can be viewed as a proximal minimization method and converges super-linearly. The cost of the inner minimization is roughly proportional to the number of active kernels; therefore, when we aim for a sparse kernel combination, the algorithm scales well as the number of kernels increases. Moreover, we give a general block-norm formulation of MKL that includes non-sparse regularizations such as elastic-net and ℓp-norm regularization. Extending SpicyMKL, we propose an efficient optimization method for this general regularization framework. Experimental results show that our algorithm is faster than existing methods, especially when the number of kernels is large (>1000).
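The kernel-level sparsity induced by the block-1 norm in such formulations comes down to block soft-thresholding, the proximal operator of the group norm. A hypothetical helper (not the SpicyMKL solver itself) illustrating that operator:

```python
import numpy as np

def block_soft_threshold(v, groups, t):
    # Proximal operator of t * sum_g ||v_g||_2: each block shrinks toward
    # zero by t in norm, and blocks with norm <= t vanish entirely, which
    # is what deactivates whole kernels in a sparse MKL solution.
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]
    return out

v = np.array([3.0, 4.0, 0.1, 0.2])
groups = [np.array([0, 1]), np.array([2, 3])]
# First block has norm 5 and shrinks; second has norm ~0.22 and is zeroed.
shrunk = block_soft_threshold(v, groups, 1.0)   # -> [2.4, 3.2, 0.0, 0.0]
```

Because only blocks surviving the threshold stay active, the per-iteration cost tracks the number of active kernels, matching the scaling behaviour described above.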