Publication


Featured research published by Kenji Fukumizu.


Annals of Statistics | 2009

Kernel dimension reduction in regression

Kenji Fukumizu; Francis R. Bach; Michael I. Jordan

We present a new methodology for sufficient dimension reduction (SDR). Our methodology derives directly from the formulation of SDR in terms of the conditional independence of the covariate X from the response Y, given the projection of X on the central subspace [cf. J. Amer. Statist. Assoc. 86 (1991) 316–342 and Regression Graphics (1998) Wiley]. We show that this conditional independence assertion can be characterized in terms of conditional covariance operators on reproducing kernel Hilbert spaces and we show how this characterization leads to an M-estimator for the central subspace. The resulting estimator is shown to be consistent under weak conditions; in particular, we do not have to impose linearity or ellipticity conditions of the kinds that are generally invoked for SDR methods. We also present empirical results showing that the new methodology is competitive in practice.
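
The criterion behind this characterization can be written down compactly. Below is a minimal sketch, assuming Gaussian kernels and a candidate projection matrix B with orthonormal columns; the helper names (gram, kdr_objective) and the regularization constant eps are illustrative, not taken from the paper. It evaluates the trace-type quantity Tr[G_Y (G_{B'X} + n*eps*I)^(-1)] built from centered Gram matrices; the central-subspace estimate would be the B minimizing this value, for example by gradient descent over matrices with orthonormal columns.

```python
import numpy as np

def gram(Z, sigma):
    """Centered Gaussian Gram matrix for the rows of Z (illustrative helper)."""
    sq = np.sum(Z**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-D2 / (2.0 * sigma**2))
    n = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

def kdr_objective(B, X, Y, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
    """Trace criterion Tr[G_Y (G_{B'X} + n*eps*I)^(-1)] for a candidate
    projection B (p x d, columns assumed orthonormal); smaller is better."""
    n = X.shape[0]
    G_y = gram(Y.reshape(n, -1), sigma_y)
    G_bx = gram(X @ B, sigma_x)
    return np.trace(G_y @ np.linalg.inv(G_bx + n * eps * np.eye(n)))

# Toy check: Y depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
B_good = np.eye(4)[:, :1]   # the relevant direction
B_bad = np.eye(4)[:, 3:4]   # an irrelevant direction
print(kdr_objective(B_good, X, Y), kdr_objective(B_bad, X, Y))
# the relevant direction should yield the smaller value
```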


Neural Networks | 2000

Adaptive natural gradient learning algorithms for various stochastic models

Hyeyoung Park; Shun-ichi Amari; Kenji Fukumizu

The natural gradient method has an ideal dynamic behavior which resolves the slow learning speed of the standard gradient descent method caused by plateaus. However, it requires calculating the Fisher information matrix and its inverse, which makes a direct implementation of the natural gradient nearly infeasible. To address this problem, a preliminary study proposed an adaptive method for estimating the inverse of the Fisher information matrix, called the adaptive natural gradient learning method. In this paper, we show that the adaptive natural gradient method can be extended to a wide class of stochastic models: regression with an arbitrary noise model and classification with an arbitrary number of classes. We give explicit forms of the adaptive natural gradient for these models. We confirm the practical advantage of the proposed algorithms through computational experiments on benchmark problems.
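
A simplified sketch of the idea, not the paper's exact recursion: maintain a running estimate of the inverse Fisher information matrix via a Sherman-Morrison rank-one update, and precondition each stochastic gradient with it. The model here is plain linear regression with unit Gaussian noise, for which the Fisher matrix is E[x x^T]; the function name, step size, and forgetting rate are illustrative assumptions.

```python
import numpy as np

def adaptive_natural_gradient_regression(X, y, lr=0.05, eps=0.05, n_epochs=20):
    """Simplified sketch: online natural gradient for linear regression with
    unit Gaussian noise. The Fisher matrix E[x x^T] is tracked online as
    F <- (1-eps)*F + eps*x x^T, and its inverse is updated in closed form by
    the Sherman-Morrison formula (illustrative, not the paper's exact scheme)."""
    n, p = X.shape
    theta = np.zeros(p)
    F_inv = np.eye(p)                     # running estimate of the inverse Fisher matrix
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            x, t = X[i], y[i]
            # Sherman-Morrison update of F_inv for F <- (1-eps)*F + eps*x x^T
            c = eps / (1.0 - eps)
            Fx = F_inv @ x
            F_inv = (F_inv - c * np.outer(Fx, Fx) / (1.0 + c * x @ Fx)) / (1.0 - eps)
            # natural gradient step: precondition the per-sample gradient by F_inv
            grad = -(t - theta @ x) * x   # gradient of 0.5*(t - theta^T x)^2
            theta -= lr * F_inv @ grad
    return theta

# Toy usage with badly scaled inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 0.1])
theta_true = np.array([2.0, -1.0, 3.0])
y = X @ theta_true + 0.1 * rng.normal(size=500)
print(adaptive_natural_gradient_regression(X, y))   # should be close to theta_true
```

The rank-one inverse update avoids explicitly inverting the Fisher matrix at every step, which is the point of the adaptive scheme.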


Annals of Statistics | 2013

Equivalence of distance-based and RKHS-based statistics in hypothesis testing

Dino Sejdinovic; Bharath K. Sriperumbudur; Arthur Gretton; Kenji Fukumizu

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
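
The correspondence is easy to check numerically. The sketch below (illustrative helper names, SciPy for pairwise distances) computes the biased V-statistic versions of the sample energy distance and of the sample MMD^2 with the distance-induced kernel k(x, y) = (d(x, z0) + d(y, z0) - d(x, y)) / 2, taking d to be the Euclidean distance and z0 = 0; with this kernel the energy distance equals exactly twice the squared MMD.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(X, Y):
    """Biased (V-statistic) sample energy distance with the Euclidean metric."""
    return 2.0 * cdist(X, Y).mean() - cdist(X, X).mean() - cdist(Y, Y).mean()

def mmd_sq_distance_kernel(X, Y):
    """Biased sample MMD^2 with the distance-induced kernel
    k(x, y) = 0.5 * (||x|| + ||y|| - ||x - y||), i.e. with z0 = 0."""
    def k(A, B):
        na = np.linalg.norm(A, axis=1)[:, None]
        nb = np.linalg.norm(B, axis=1)[None, :]
        return 0.5 * (na + nb - cdist(A, B))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = rng.normal(loc=0.5, size=(150, 3))
print(energy_distance(X, Y))               # equals ...
print(2.0 * mmd_sq_distance_kernel(X, Y))  # ... twice MMD^2, up to floating-point error
```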


International Conference on Machine Learning | 2009

Hilbert space embeddings of conditional distributions with applications to dynamical systems

Le Song; Jonathan Huang; Alexander J. Smola; Kenji Fukumizu

In this paper, we extend the Hilbert space embedding approach to handle conditional distributions. We derive a kernel estimate for the conditional embedding, and show its connection to ordinary embeddings. Conditional embeddings largely extend our ability to manipulate distributions in Hilbert spaces, and as an example, we derive a nonparametric method for modeling dynamical systems where the belief state of the system is maintained as a conditional embedding. Our method is very general in terms of both the domains and the types of distributions that it can handle, and we demonstrate the effectiveness of our method in various dynamical systems. We expect that conditional embeddings will have wider applications beyond modeling dynamical systems.
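
A minimal sketch of the conditional-embedding estimate described here, with illustrative names and a Gaussian kernel: given training pairs (x_i, y_i), the embedding of P(Y | X = x) is represented by the weight vector w(x) = (K_X + n*lambda*I)^(-1) k_X(x) over the training points, so that an expectation E[g(Y) | X = x] is approximated by sum_i w_i(x) g(y_i). The regularization lambda and the bandwidth are assumptions of the sketch.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def conditional_embedding_weights(X_train, x, lam=1e-2, sigma=1.0):
    """Weights w(x) = (K_X + n*lam*I)^(-1) k_X(x) representing the estimated
    embedding of P(Y | X = x) as a weighted combination of training points."""
    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, sigma)
    k_x = gaussian_kernel(X_train, x.reshape(1, -1), sigma).ravel()
    return np.linalg.solve(K + n * lam * np.eye(n), k_x)

# Toy usage: Y = sin(X) + noise; estimate E[Y | X = 1] as a weighted sum of training Ys.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
w = conditional_embedding_weights(X, np.array([1.0]), lam=1e-3, sigma=0.3)
print(w @ Y, np.sin(1.0))   # the weighted sum should be close to sin(1)
```

Note that the weights are not constrained to be nonnegative or to sum to one; they represent an RKHS element rather than a probability distribution over the training outputs.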


IEEE Signal Processing Magazine | 2013

Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models

Le Song; Kenji Fukumizu; Arthur Gretton

Many modern applications of signal processing and machine learning, ranging from computer vision to computational biology, require the analysis of large volumes of high-dimensional continuous-valued measurements. Complex statistical features are commonplace, including multimodality, skewness, and rich dependency structures. Such problems call for a flexible and robust modeling framework that can take these diverse statistical features into account. Most existing approaches, including graphical models, rely heavily on parametric assumptions. Variables in the model are typically assumed to be discrete-valued or multivariate Gaussian, and linear relations between variables are often imposed. These assumptions can result in a model far different from the data-generating process.


Electronic Journal of Statistics | 2012

On the empirical estimation of integral probability metrics

Bharath K. Sriperumbudur; Kenji Fukumizu; Arthur Gretton; Bernhard Schoelkopf; Gert R. G. Lanckriet

Given two probability measures P and Q defined on a measurable space S, the integral probability metric (IPM) is defined as γ_F(P, Q) = sup_{f ∈ F} | ∫_S f dP − ∫_S f dQ |, where F is a class of real-valued bounded measurable functions on S.
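
A well-known member of this family is the Kantorovich (Wasserstein-1) metric, obtained when F is the class of 1-Lipschitz functions. For one-dimensional samples of equal size, the empirical IPM between the two empirical measures reduces to the mean absolute difference of the sorted samples, which gives a very small worked example (names are illustrative):

```python
import numpy as np

def empirical_w1_1d(x, y):
    """Empirical Wasserstein-1 distance between two 1-D samples of equal size:
    the IPM over 1-Lipschitz functions, computed by sorting both samples."""
    assert x.shape == y.shape
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(0.5, 1.0, size=2000)
print(empirical_w1_1d(x, y))   # roughly 0.5, the shift between the two distributions
```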


Journal of the American Statistical Association | 2014

Gradient-based kernel dimension reduction for regression

Kenji Fukumizu; Chenlei Leng

This article proposes a novel approach to linear dimension reduction for regression using nonparametric estimation with positive-definite kernels or reproducing kernel Hilbert spaces (RKHSs). The purpose of the dimension reduction is to find directions in the explanatory variables that are sufficient to explain the response; this is called sufficient dimension reduction. The proposed method is based on an estimator for the gradient of the regression function considered for the feature vectors mapped into RKHSs. It is proved that the method is able to estimate the directions that achieve sufficient dimension reduction. In comparison with other existing methods, the proposed one has wide applicability without strong assumptions on the distributions or the type of variables, and needs only an eigendecomposition for estimating the projection matrix. The theoretical analysis shows that the estimator is consistent at a certain rate under some conditions. The experimental results demonstrate that the proposed method successfully finds effective directions with efficient computation even for high-dimensional explanatory variables.
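
The gradient-based idea can be sketched as a gradient-outer-product computation: fit the regression nonparametrically by kernel ridge regression in an RKHS, evaluate the gradient of the fitted function at each sample, average the outer products of these gradients, and take the leading eigenvectors as the estimated directions. The code below is a minimal sketch under that reading, with illustrative names and parameters; it is not the paper's exact estimator.

```python
import numpy as np

def gradient_kdr(X, Y, d, sigma=1.0, lam=1e-3):
    """Sketch of a gradient-outer-product estimate of the dimension-reduction
    subspace: kernel ridge regression of Y on X with a Gaussian kernel,
    gradients of the fit at the samples, then an eigendecomposition of their
    average outer product (illustrative, not the paper's exact estimator)."""
    n, p = X.shape
    sq = np.sum(X**2, 1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-D2 / (2 * sigma**2))
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)   # kernel ridge coefficients
    M = np.zeros((p, p))
    for i in range(n):
        # gradient of the fitted function at X[i]
        g = ((X - X[i]) / sigma**2 * (alpha * K[i])[:, None]).sum(axis=0)
        M += np.outer(g, g) / n
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :d]   # top-d eigenvectors = estimated directions

# Toy usage: Y depends on X only through X[:, 0] + X[:, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
Y = (X[:, 0] + X[:, 1])**2 + 0.1 * rng.normal(size=300)
print(gradient_kdr(X, Y, d=1, sigma=1.5, lam=1e-2).ravel())
# expected: roughly proportional to (1, 1, 0, 0, 0) / sqrt(2), up to sign
```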


Neural Networks | 2008

Relation between weight size and degree of over-fitting in neural network regression

Katsuyuki Hagiwara; Kenji Fukumizu

This paper investigates the relation between over-fitting and weight size in neural network regression, focusing on the over-fitting of a network to Gaussian noise. Using a re-parametrization, a network function is represented as a bounded function g multiplied by a coefficient c. The squared sum of the outputs of g at the given inputs is assumed to be bounded away from zero by a positive constant δ_n, which restricts the weight size of the network and enables a probabilistic upper bound on the degree of over-fitting to be derived. This reveals that the order of the probabilistic upper bound can change depending on δ_n. By applying the bound to analyze the over-fitting behavior of a single Gaussian unit, it is shown that the probability of obtaining an extremely small value for the width parameter in training is close to one when the sample size is large.


Neurocomputing | 2011

Statistical approaches to combining binary classifiers for multi-class classification

Yuichi Shiraishi; Kenji Fukumizu

One of the popular methods for multi-class classification is to combine binary classifiers. In this paper, we propose a new approach for combining binary classifiers. Our method learns how to combine the binary classifiers using statistical techniques such as penalized logistic regression, stacking, and a sparsity-promoting penalty. Our approach has several advantages. First, it outperforms existing methods even when the base classifiers are well tuned. Second, an estimate of the conditional probability of each class is obtained naturally. Furthermore, we propose selecting the relevant binary classifiers by adding a group-lasso-type penalty when training the combining model.
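
A minimal sketch of the overall pattern, using scikit-learn for brevity: one-vs-one binary classifiers are trained first, and their real-valued outputs are then combined by a penalized (L2) multinomial logistic regression, i.e. a simple stacking step. The dataset, model choices, and parameters are illustrative, and the group-lasso variant is not shown.

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Train one-vs-one binary classifiers on the training data.
pairs = list(combinations(np.unique(y_tr), 2))
base = []
for a, b in pairs:
    mask = np.isin(y_tr, [a, b])
    base.append(SVC(kernel="rbf", gamma="scale").fit(X_tr[mask], y_tr[mask]))

# 2. Stack the binary decision values and combine them with a penalized (L2)
#    multinomial logistic regression -- the combining step.
def binary_features(X_):
    return np.column_stack([clf.decision_function(X_) for clf in base])

combiner = LogisticRegression(C=1.0, max_iter=1000).fit(binary_features(X_tr), y_tr)
print("test accuracy:", combiner.score(binary_features(X_te), y_te))
print("class probabilities:", combiner.predict_proba(binary_features(X_te[:1])))
```

In a more careful implementation the combiner would be trained on cross-validated outputs of the base classifiers, so that the stacking step does not overfit to the training-set decision values.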


International Journal of Pattern Recognition and Artificial Intelligence | 2015

Higher-Order Regularized Kernel Canonical Correlation Analysis

Md. Ashad Alam; Kenji Fukumizu

It is well known that the performance of kernel methods depends on the choice of appropriate kernels and associated parameters. While cross-validation (CV) is a useful method for choosing the kernel and its parameters in supervised learning such as support vector machines, there is no generally well-founded method for unsupervised kernel methods. This paper discusses CV for kernel canonical correlation analysis (KCCA) and proposes a new regularization approach for KCCA. As we demonstrate with Gaussian kernels, the CV errors for KCCA tend to decrease as the bandwidth parameter of the kernel decreases, which yields degenerate features in which all the data are concentrated in a few points. This is caused by the ill-posedness of KCCA under CV. To solve this problem, we propose using constraints on the fourth-order moments of the canonical variables in addition to their variances. Experiments on synthetic and real-world data demonstrate that the proposed higher-order regularized KCCA can be applied effectively with CV to find appropriate kernel and regularization parameters.
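
For reference, the baseline that the higher-order regularization extends can be sketched as follows: with centered Gram matrices G_X and G_Y, one common formulation takes the first regularized kernel canonical correlation to be the largest singular value of (G_X + (n*kappa/2) I)^(-1) G_X G_Y (G_Y + (n*kappa/2) I)^(-1). The fourth-order moment constraint proposed in the paper is not included, and all names, bandwidths, and the regularization kappa are illustrative assumptions.

```python
import numpy as np

def gram_centered(Z, sigma):
    sq = np.sum(Z**2, 1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * Z @ Z.T) / (2 * sigma**2))
    n = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_first_correlation(X, Y, sigma_x=1.0, sigma_y=1.0, kappa=1e-2):
    """First regularized kernel canonical correlation, computed as the largest
    singular value of (G_X + (n*kappa/2) I)^(-1) G_X G_Y (G_Y + (n*kappa/2) I)^(-1);
    standard regularized KCCA, without the higher-order constraint."""
    n = X.shape[0]
    G_x, G_y = gram_centered(X, sigma_x), gram_centered(Y, sigma_y)
    R_x = np.linalg.inv(G_x + 0.5 * n * kappa * np.eye(n))
    R_y = np.linalg.inv(G_y + 0.5 * n * kappa * np.eye(n))
    return np.linalg.svd(R_x @ G_x @ G_y @ R_y, compute_uv=False)[0]

# Toy usage: a nonlinearly related pair versus an independent pair.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = np.cos(X) + 0.1 * rng.normal(size=(200, 1))
Y_ind = rng.normal(size=(200, 1))
print(kcca_first_correlation(X, Y_dep))  # should be larger for the dependent pair
print(kcca_first_correlation(X, Y_ind))  # should be smaller for the independent pair
```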

Collaboration


Dive into Kenji Fukumizu's collaborations.

Top Co-Authors

Arthur Gretton, University College London
Yu Nishiyama, University of Electro-Communications
Song Liu, Tokyo Institute of Technology
Francis R. Bach, École Normale Supérieure
Taiji Suzuki, Tokyo Institute of Technology
Le Song, Georgia Institute of Technology