Yoshinobu Kawahara | Researchain


Publication

Featured research published by Yoshinobu Kawahara.

Statistical Analysis and Data Mining | 2012

Sequential change-point detection based on direct density-ratio estimation

Yoshinobu Kawahara; Masashi Sugiyama

Change-point detection is the problem of discovering time points at which properties of time-series data change. This covers a broad range of real-world problems and has been actively discussed in the community of statistics and data mining. In this paper, we present a novel nonparametric approach to detecting the change of probability distributions of sequence data. Our key idea is to estimate the ratio of probability densities, not the probability densities themselves. This formulation allows us to avoid nonparametric density estimation, which is known to be a difficult problem. We provide a change-point detection algorithm based on direct density-ratio estimation that can be computed very efficiently in an online manner. The usefulness of the proposed method is demonstrated through experiments using artificial and real-world datasets.
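The idea of scoring a change by the density ratio between two adjacent windows, rather than by two separate density estimates, can be sketched as follows. This is a minimal illustration only: the abstract does not spell out the estimator, so the least-squares fit with Gaussian kernels used here (uLSIF-style), the Pearson-divergence score, and the names `density_ratio_score`, `sliding_scores`, `sigma`, `lam` and `window` are all assumptions and may differ from the paper's algorithm.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    # x: (n, d), centers: (b, d) -> kernel matrix of shape (n, b)
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def density_ratio_score(reference, test, sigma=1.0, lam=0.1):
    """Fit r(x) ~ p_test(x) / p_ref(x) directly and return a divergence score.

    Assumption: least-squares density-ratio fit with kernels centred on the
    test samples; no density is estimated on its own.
    """
    centers = test
    phi_ref = gaussian_kernel(reference, centers, sigma)
    phi_test = gaussian_kernel(test, centers, sigma)
    H = phi_ref.T @ phi_ref / len(reference)   # second moment under p_ref
    h = phi_test.mean(axis=0)                  # first moment under p_test
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    # Plug-in estimate of the Pearson divergence between the two windows
    return float(h @ alpha - 0.5 * alpha @ H @ alpha - 0.5)

def sliding_scores(series, window=50, **kwargs):
    """Score every time point t by comparing [t-window, t) with [t, t+window)."""
    x = np.asarray(series, dtype=float).reshape(-1, 1)
    scores = np.full(len(x), np.nan)
    for t in range(window, len(x) - window + 1):
        scores[t] = density_ratio_score(x[t - window:t], x[t:t + window], **kwargs)
    return scores
```

A large score at time t suggests that the samples just before and just after t come from different distributions; the threshold for declaring a change is left to the user.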

Bioinformatics | 2013

Efficient network-guided multi-locus association mapping with graph cuts

Chloé-Agathe Azencott; Dominik Grimm; Mahito Sugiyama; Yoshinobu Kawahara; Karsten M. Borgwardt

Motivation: As an increasing number of genome-wide association studies reveal the limitations of the attempt to explain phenotypic heritability by single genetic loci, there is a recent focus on associating complex phenotypes with sets of genetic loci. Although several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci or do not scale to genome-wide settings. Results: We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature. Availability: Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Neural Networks | 2012

Separation of stationary and non-stationary sources with a generalized eigenvalue problem

Satoshi Hara; Yoshinobu Kawahara; Takashi Washio; Paul von Bünau; Terumasa Tokunaga; K. Yumoto

Non-stationary effects are ubiquitous in real-world data. In many settings, the observed signals are a mixture of underlying stationary and non-stationary sources that cannot be measured directly. For example, in EEG analysis, electrodes on the scalp record the activity from several sources located inside the brain, which one could only measure invasively. Discerning stationary and non-stationary contributions is an important step towards uncovering the mechanisms of the data generating system. To that end, in Stationary Subspace Analysis (SSA), the observed signal is modeled as a linear superposition of stationary and non-stationary sources, where the aim is to separate the two groups in the mixture. In this paper, we propose the first SSA algorithm that has a closed form solution. The novel method, Analytic SSA (ASSA), is more than 100 times faster than the state-of-the-art, numerically stable, and guaranteed to be optimal when the covariance between stationary and non-stationary sources is time-constant. In numerical simulations on a wide range of settings, we show that our method yields superior results, even for signals with time-varying group-wise covariance. In an application to geophysical data analysis, ASSA extracts meaningful components that shed new light on the Pi 2 pulsations of the geomagnetic field.

Computational Management Science | 2013

Simultaneous pursuit of out-of-sample performance and sparsity in index tracking portfolios

Akiko Takeda; Mahesan Niranjan; Jun-ya Gotoh; Yoshinobu Kawahara

Index tracking is a passive investment strategy in which a fund (e.g., an ETF: exchange traded fund) manager purchases a set of assets to mimic a market index. The tracking error, i.e., the difference between the performances of the index and the portfolio, may be minimized by buying all the assets contained in the index. However, this strategy results in a considerable transaction cost and, accordingly, decreases the return of the constructed portfolio. On the other hand, a portfolio with a small cardinality may result in poor out-of-sample performance. Of interest is, thus, constructing a portfolio with good out-of-sample performance, while keeping the number of assets invested in small (i.e., sparse). In this paper, we develop a tracking portfolio model that addresses the above conflicting requirements by using a combination of L0- and L2-norms. The L2-norm regularizes the overdetermined system to impose smoothness (and hence has better out-of-sample performance), and it shrinks the solution to an equally-weighted dense portfolio. On the other hand, the L0-norm imposes a cardinality constraint that achieves sparsity (and hence a lower transaction cost). We propose a heuristic method for estimating portfolio weights, which combines a greedy search with an analytical formula embedded in it. We demonstrate that the resulting sparse portfolio has good tracking and generalization performance on historic data of weekly and monthly returns on the Nikkei 225 index and its constituent companies.
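The combination of a greedy search with a closed-form solve can be sketched as below. This is an illustrative simplification under stated assumptions, not the paper's model: it uses forward selection with a plain ridge (L2) solution on each candidate subset, omits any budget or no-short-selling constraint, and the names `ridge_weights`, `greedy_sparse_tracker`, `max_assets` and `lam` are hypothetical.

```python
import numpy as np

def ridge_weights(returns, index, lam):
    # Closed-form L2-regularised least squares on the chosen assets
    k = returns.shape[1]
    return np.linalg.solve(returns.T @ returns + lam * np.eye(k), returns.T @ index)

def greedy_sparse_tracker(returns, index, max_assets, lam=1e-3):
    """Pick at most `max_assets` columns of `returns` to track `index`.

    Assumption: plain forward selection; at each step the asset whose
    addition gives the lowest regularised tracking error is kept.
    """
    selected, remaining = [], list(range(returns.shape[1]))
    for _ in range(max_assets):
        best_err, best_j = None, None
        for j in remaining:
            cols = selected + [j]
            w = ridge_weights(returns[:, cols], index, lam)
            err = np.sum((returns[:, cols] @ w - index) ** 2) + lam * np.sum(w ** 2)
            if best_err is None or err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
        remaining.remove(best_j)
    weights = np.zeros(returns.shape[1])
    weights[selected] = ridge_weights(returns[:, selected], index, lam)
    return weights
```

The cardinality bound `max_assets` plays the role of the L0 constraint, while `lam` controls the L2 shrinkage; both would normally be chosen by out-of-sample validation.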

International Conference on Neural Information Processing | 2010

Stationary subspace analysis as a generalized eigenvalue problem

Satoshi Hara; Yoshinobu Kawahara; Takashi Washio; Paul von Bünau

Understanding non-stationary effects is one of the key challenges in data analysis. However, in many settings the observation is a mixture of stationary and non-stationary sources. The aim of Stationary Subspace Analysis (SSA) is to factorize multivariate data into its stationary and non-stationary components. In this paper, we propose a novel SSA algorithm (ASSA) that extracts stationary sources from multiple time series blocks. It has a globally optimal solution under certain assumptions that can be obtained by solving a generalized eigenvalue problem. Apart from the numerical advantages, we also show that compared to the existing method, fewer blocks are required in ASSA to guarantee the identifiability of the solution. We demonstrate the validity of our approach in simulations and in an application to domain adaptation.
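How a stationary projection can fall out of a generalized eigenvalue problem may be easier to see in code. The sketch below is written in the spirit of the abstract, not copied from the paper: the particular non-stationarity matrix (epoch-mean deviations plus twice the covariance deviations weighted by the inverse average covariance) and the names `stationary_projection`, `n_epochs` and `n_stationary` are assumptions.

```python
import numpy as np

def stationary_projection(x, n_epochs, n_stationary):
    """x: (n_samples, n_channels). Returns (projection, eigenvalues).

    Assumption: non-stationarity is measured by how much epoch means and
    covariances deviate from their averages over all epochs.
    """
    epochs = np.array_split(x, n_epochs)
    means = np.array([e.mean(axis=0) for e in epochs])
    covs = np.array([np.cov(e, rowvar=False) for e in epochs])
    mean_all, cov_all = means.mean(axis=0), covs.mean(axis=0)
    cov_inv = np.linalg.inv(cov_all)

    s = np.zeros_like(cov_all)
    for m, c in zip(means, covs):
        dm = (m - mean_all)[:, None]
        dc = c - cov_all
        s += dm @ dm.T + 2.0 * dc @ cov_inv @ dc
    s /= n_epochs

    # Solve s v = lam * cov_all v by whitening with cov_all^(-1/2)
    evals, evecs = np.linalg.eigh(cov_all)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    lam, u = np.linalg.eigh(whiten @ s @ whiten)   # ascending eigenvalues
    v = whiten @ u
    # Directions with the smallest eigenvalues vary least across epochs
    return v[:, :n_stationary], lam
```

Projecting the data with `x @ projection` then gives the estimated stationary sources, up to scaling; no iterative optimization is involved.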

Neurocomputing | 2011

Analyzing relationships among ARMA processes based on non-Gaussianity of external influences

Yoshinobu Kawahara; Shohei Shimizu; Takashi Washio

The analysis of relationships among variables in data-generating systems is an important problem in machine learning. In this paper, we propose an approach for estimating a graphical representation of variables in data-generating processes, based on the non-Gaussianity of external influences and an autoregressive moving-average (ARMA) model. The presented model consists of two parts, a classical structural-equation model for instantaneous effects and an ARMA model for lagged effects, and is estimated by exploiting the non-Gaussianity of the residual processes. Like the recently proposed non-Gaussianity-based method LiNGAM, estimation by the proposed method is identifiable and consistent. We also address how the structure estimated by our method relates to Granger causality. Finally, we apply the proposed method to data containing both instantaneous causality and Granger (temporal) causality, using datasets from both artificial and real physical systems.

Pattern Recognition Letters | 2011

Submodular fractional programming for balanced clustering

Yoshinobu Kawahara; Kiyohito Nagano; Yoshio Okamoto

We address the balanced clustering problem where cluster sizes are regularized with submodular functions. The objective function for balanced clustering is a submodular fractional function, i.e., the ratio of two submodular functions, and thus includes the well-known ratio cuts as special cases. In this paper, we present a novel algorithm for minimizing this objective function (submodular fractional programming) using recent submodular optimization techniques. The main idea is to utilize an algorithm to minimize the difference of two submodular functions, combined with the discrete Newton method. Thus, it can be applied to objective functions involving any submodular functions in both the numerator and the denominator, which enables us to design flexible clustering setups. We also give a theoretical analysis of the algorithm, and evaluate its performance through comparative experiments with conventional algorithms on artificial and real datasets.
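The interplay between the discrete Newton method and a difference-minimization step can be illustrated on a toy ratio-cut instance. This is a sketch under loud assumptions: the inner minimization of f(S) - lam * g(S) is done by brute force over all proper subsets (a stand-in for the submodular difference-minimization algorithm the abstract refers to), and the graph, `cut`, `balance` and `minimize_ratio` are hypothetical names chosen for the example.

```python
from itertools import combinations

def proper_subsets(n):
    # All non-empty proper subsets of {0, ..., n-1}
    for k in range(1, n):
        for s in combinations(range(n), k):
            yield frozenset(s)

def minimize_ratio(f, g, n, tol=1e-12):
    """Minimise f(S) / g(S) with a discrete Newton (Dinkelbach-style) iteration.

    Assumption: g(S) > 0 on every candidate set; the inner step is brute
    force, so this only works for very small n.
    """
    subsets = list(proper_subsets(n))
    current = subsets[0]
    lam = f(current) / g(current)
    while True:
        # Inner problem: minimise the difference f - lam * g
        candidate = min(subsets, key=lambda s: f(s) - lam * g(s))
        if f(candidate) - lam * g(candidate) >= -tol:
            return current, lam          # no set improves the ratio
        current = candidate
        lam = f(current) / g(current)    # Newton update of the ratio

# Toy graph: two triangles joined by the single edge (2, 3)
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
N = 6

def cut(s):
    # Number of edges with exactly one endpoint in s
    return float(sum((u in s) != (v in s) for u, v in EDGES))

def balance(s):
    # Rewards balanced sizes: |S| * |V \ S|
    return float(len(s) * (N - len(s)))
```

Calling `minimize_ratio(cut, balance, N)` on this graph separates the two triangles. Each Newton update strictly lowers the ratio, so the loop stops after finitely many steps.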

Journal of Visualization | 2015

Scatterplot layout for high-dimensional data visualization

Yunzhu Zheng; Haruka Suematsu; Takayuki Itoh; Ryohei Fujimaki; Satoshi Morinaga; Yoshinobu Kawahara

Multi-dimensional data visualization is an important research topic that has been receiving increasing attention. Several techniques that apply scatterplot matrices have been proposed to represent multi-dimensional data as a collection of two-dimensional data visualization spaces. Typically, the scatterplot-based approach makes it easier to understand relations between particular pairs of dimensions, but displaying all possible scatterplots often requires too much display space. This paper presents a technique to display meaningful sets of scatterplots generated from high-dimensional datasets. Our technique first evaluates all possible scatterplots generated from high-dimensional datasets, and selects meaningful sets. It then calculates the similarity between arbitrary pairs of the selected scatterplots, and places relevant scatterplots closer together in the display space without letting them overlap. This design policy makes it easier for users to visually compare relevant sets of scatterplots. This paper presents algorithms that place the scatterplots by combining ideal position calculation with rectangle packing, along with two examples demonstrating the effectiveness of the presented technique.

International Symposium on Neural Networks | 2010

An experimental comparison of linear non-Gaussian causal discovery methods and their variants

Yasuhiro Sogawa; Shohei Shimizu; Yoshinobu Kawahara; Takashi Washio

Many multivariate Gaussianity-based techniques for identifying causal networks of observed variables have been proposed. These methods have several problems; in particular, they cannot uniquely identify the causal network without prior knowledge. To alleviate this problem, a non-Gaussianity-based identification method, LiNGAM, was proposed. Though LiNGAM can potentially identify a unique causal network without using any prior knowledge, it needs to properly examine the independence assumptions of the causal network and search for the correct causal network using only a finite number of observed data points. Separately, a kernel-based independence measure that evaluates independence more strictly was recently proposed. In addition, advanced generic search algorithms, including beam search, have been extensively studied. In this paper, we propose variants of the LiNGAM method that introduce the kernel-based measure and beam search to enable more accurate causal network identification. Furthermore, we experimentally characterize LiNGAM and its variants in terms of the accuracy and robustness of their identification.

Neural Networks | 2013

Active learning for noisy oracle via density power divergence

Yasuhiro Sogawa; Tsuyoshi Ueno; Yoshinobu Kawahara; Takashi Washio

The accuracy of active learning is critically influenced by the existence of noisy labels given by a noisy oracle. In this paper, we propose a novel pool-based active learning framework through robust measures based on density power divergence. By minimizing density power divergence, such as β-divergence and γ-divergence, one can estimate the model accurately even under the existence of noisy labels within data. Accordingly, we develop query selecting measures for pool-based active learning using these divergences. In addition, we propose an evaluation scheme for these measures based on asymptotic statistical analyses, which enables us to perform active learning by evaluating an estimation error directly. Experiments with benchmark datasets and real-world image datasets show that our active learning scheme performs better than several baseline methods.
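Why minimizing a density power divergence resists contaminated data can be seen in a toy location problem. The sketch below is not the paper's active learning scheme: it only estimates the mean of a Gaussian with known scale under the β-divergence, and the names `robust_mean`, `beta` and `sigma` are assumptions made for illustration.

```python
import numpy as np

def robust_mean(x, beta=0.5, sigma=1.0, n_iter=100, tol=1e-10):
    """Mean of a Gaussian location model fitted by minimising the beta-divergence.

    Assumption: the scale `sigma` is known, so the estimating equation
    reduces to a weighted mean with weights proportional to density**beta.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))                 # robust starting point
    for _ in range(n_iter):
        # Points far from the current fit receive exponentially small weight
        w = np.exp(-beta * (x - mu) ** 2 / (2.0 * sigma ** 2))
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu
```

With `beta` close to zero the weights become uniform and the estimate approaches the ordinary sample mean; larger values discount outlying points more aggressively.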
