Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Kohei Hayashi is active.

Publication


Featured research published by Kohei Hayashi.


International Joint Conference on Artificial Intelligence | 2018

Think Globally, Embed Locally — Locally Linear Meta-embedding of Words

Danushka Bollegala; Kohei Hayashi; Ken-ichi Kawarabayashi

Distributed word embeddings have shown superior performance in numerous Natural Language Processing (NLP) tasks. However, their performance varies significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete meta-embeddings of words. For this purpose, we propose an unsupervised locally linear meta-embedding learning method that takes pre-trained word embeddings as the input, and produces more accurate meta-embeddings. Unlike previously proposed meta-embedding learning methods that learn a global projection over all words in a vocabulary, our proposed method is sensitive to the differences in local neighbourhoods of the individual source word embeddings. Moreover, we show that vector concatenation, a previously proposed highly competitive baseline approach for integrating word embeddings, can be derived as a special case of the proposed method. Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods on several benchmark datasets, establishing a new state of the art for meta-embeddings.
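
The vector-concatenation baseline that the abstract derives as a special case is easy to illustrate. Below is a minimal sketch, assuming two pre-trained source embeddings are available as plain Python dicts; the names glove and sgns are illustrative placeholders, and this is the baseline rather than the proposed locally linear method.

import numpy as np

def concat_meta_embedding(src1, src2, word):
    """Concatenate l2-normalised source embeddings for one word.

    src1, src2: dict mapping word -> 1-D numpy array (hypothetical inputs).
    Normalising each source first keeps one embedding space from dominating
    the concatenation because of its scale.
    """
    v1 = np.asarray(src1[word], dtype=float)
    v2 = np.asarray(src2[word], dtype=float)
    v1 = v1 / (np.linalg.norm(v1) + 1e-12)
    v2 = v2 / (np.linalg.norm(v2) + 1e-12)
    return np.concatenate([v1, v2])

# Example with toy 3-dimensional sources.
glove = {"cat": np.array([0.1, 0.3, 0.2])}
sgns = {"cat": np.array([0.5, -0.1, 0.4])}
print(concat_meta_embedding(glove, sgns, "cat"))  # 6-dimensional meta-embedding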


International Joint Conference on Artificial Intelligence | 2017

Tensor Decomposition with Missing Indices

Yuto Yamaguchi; Kohei Hayashi

How can we decompose a data tensor if the indices are partially missing? Tensor decomposition is a fundamental tool for analyzing tensor data. Suppose, for example, we have a 3rd-order tensor X where each element Xijk takes 1 if user i posts word j at location k on Twitter. Standard tensor decomposition expects that all the indices are observed. However, in some tweets, location k can be missing. In this paper, we study a tensor decomposition problem where the indices (i, j, or k) of some observed elements are partially missing. To address this problem, we propose a probabilistic tensor decomposition model that handles missing indices as latent variables. To infer them, we develop an algorithm based on the variational MAP-EM algorithm, which enables us to leverage the information from the incomplete data. Experiments on both synthetic and real datasets show that the proposed model achieves higher accuracy in the tensor completion task than baselines.
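
For contrast with the missing-index setting studied here, the following is a minimal sketch of the standard CP decomposition the abstract refers to, fitted by alternating least squares on a fully observed tensor. It handles neither missing values nor missing indices, and it is not the paper's variational MAP-EM algorithm; the sizes and rank are illustrative.

import numpy as np

def cp_als(X, rank, n_iters=200, seed=0):
    """Rank-`rank` CP decomposition of a fully observed 3rd-order tensor X
    via alternating least squares: X[i, j, k] ~ sum_r U[i, r] V[j, r] W[k, r].
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    U = rng.standard_normal((I, rank))
    V = rng.standard_normal((J, rank))
    W = rng.standard_normal((K, rank))
    for _ in range(n_iters):
        # Each step solves the normal equations of a least-squares problem in
        # one factor while the other two factors are held fixed.
        U = np.einsum('ijk,jr,kr->ir', X, V, W) @ np.linalg.pinv((V.T @ V) * (W.T @ W))
        V = np.einsum('ijk,ir,kr->jr', X, U, W) @ np.linalg.pinv((U.T @ U) * (W.T @ W))
        W = np.einsum('ijk,ir,jr->kr', X, U, V) @ np.linalg.pinv((U.T @ U) * (V.T @ V))
    return U, V, W

# Toy check on an exactly rank-3 tensor; the reconstruction error should be small.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
B = rng.standard_normal((12, 3))
C = rng.standard_normal((8, 3))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
U, V, W = cp_als(X, rank=3)
print(np.abs(X - np.einsum('ir,jr,kr->ijk', U, V, W)).max())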


International Joint Conference on Artificial Intelligence | 2017

When Does Label Propagation Fail? A View from a Network Generative Model

Yuto Yamaguchi; Kohei Hayashi

What kinds of data does Label Propagation (LP) work best on? Can we justify the solution of LP from a theoretical standpoint? LP is a semi-supervised learning algorithm that is widely used to predict unobserved node labels on a network (e.g., a user's gender on an SNS). Despite its importance, its theoretical properties remain mostly unexplored. In this paper, we answer the above questions by interpreting LP from a statistical viewpoint. As our main result, we identify the network generative model behind the discretized version of LP (DLP), and we show that under specific conditions the solution of DLP is equal to the maximum a posteriori estimate of that generative model. Our main result reveals the critical limitations of LP. Specifically, we discover that LP would not work best on networks with (1) disassortative node labels, (2) clusters having different edge densities, (3) nonuniform label distributions, or (4) unreliable node labels. Our experiments under a variety of settings support our theoretical results.
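
As a point of reference, a minimal (non-discretized) label propagation iteration can be written in a few lines. The graph, seed labels, and clamping rule below are illustrative assumptions rather than the exact formulation analysed in the paper.

import numpy as np

def label_propagation(A, seed_labels, n_iters=100):
    """Minimal label propagation on an undirected graph.

    A           : (n, n) adjacency matrix (numpy array).
    seed_labels : dict {node_index: label in {0, 1}} for the observed nodes.
    Unlabelled nodes start at 0.5 and repeatedly take the degree-weighted
    average of their neighbours' scores, while seed nodes stay clamped.
    Thresholding the final scores at 0.5 gives the predicted labels.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    f = np.full(n, 0.5)
    for node, lab in seed_labels.items():
        f[node] = lab
    for _ in range(n_iters):
        f = A @ f / np.maximum(deg, 1)          # average neighbour scores
        for node, lab in seed_labels.items():   # clamp the observed labels
            f[node] = lab
    return (f >= 0.5).astype(int)

# Two triangles joined by a single edge, with one labelled node in each.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(label_propagation(A, {0: 0, 5: 1}))  # nodes 0-2 -> 0, nodes 3-5 -> 1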


PLOS ONE | 2017

Learning linear transformations between counting-based and prediction-based word embeddings

Danushka Bollegala; Kohei Hayashi; Ken-ichi Kawarabayashi

Despite the growing interest in prediction-based word embedding learning methods, it remains unclear how the vector spaces learnt by the prediction-based methods differ from those of the counting-based methods, or whether one can be transformed into the other. To study the relationship between counting-based and prediction-based embeddings, we propose a method for learning a linear transformation between two given sets of word embeddings. Our proposal contributes to word embedding learning research in three ways: (a) we propose an efficient method to learn a linear transformation between two sets of word embeddings, (b) using the transformation learnt in (a), we empirically show that it is possible to predict distributed word embeddings for novel unseen words, and (c) we empirically show that counting-based embeddings can be linearly transformed into prediction-based embeddings for frequent words, different POS categories, and varying degrees of ambiguity.
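
At its core, learning a linear map between two embedding sets over a shared vocabulary is an ordinary least-squares problem. The sketch below shows that idea in NumPy; it is not necessarily the exact estimator used in the paper, and the toy matrices stand in for real counting-based and prediction-based embeddings.

import numpy as np

def learn_linear_map(S, T):
    """Learn a matrix M minimising ||S @ M - T||_F^2, i.e. a linear map from
    the source embedding space to the target space.  Rows of S and T are the
    two embeddings of the same words, aligned by a shared vocabulary.
    """
    M, *_ = np.linalg.lstsq(S, T, rcond=None)
    return M

# Toy example: a made-up 5-word vocabulary, 4-d source vectors (S) and 3-d
# target vectors (T); real inputs would have thousands of rows.
rng = np.random.default_rng(0)
S = rng.standard_normal((5, 4))
M_true = rng.standard_normal((4, 3))
T = S @ M_true
M = learn_linear_map(S, T)
print(np.allclose(S @ M, T))  # True: the learnt map reproduces the targets

# Predicting a target-space embedding for a word present only in the source:
s_new = rng.standard_normal(4)
t_pred = s_new @ M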


Neurocomputing | 2017

Sparse Bayesian linear regression with latent masking variables

Yohei Kondo; Kohei Hayashi; Shin-ichi Maeda

Extracting a small number of relevant features for the task, i.e., feature selection, is often a crucial step in supervised learning problems. Sparse linear regression provides a fast and convenient option for feature selection, where regularization facilitates reducing the weight parameters of irrelevant features. However, the regularization also induces undesirable shrinkage in the weights of relevant features. Here, we propose Bayesian masking (BM) in order to resolve the trade-off between sparsity and shrinkage. Our strategy is not to directly impose any regularization on the weights; instead, BM introduces binary latent variables, called masking variables, into a regression model to preserve sparsity; each feature-sample pair has a binary variable whose value determines whether the feature is masked for that sample. We derive a variational Bayesian inference algorithm for the augmented model based on the factorized information criterion (FIC), a recently proposed asymptotic approximation of the marginal log-likelihood. We analyze the one-dimensional estimators of Lasso, automatic relevance determination (ARD), and BM, and thus show the superiority of BM in terms of the sparsity-shrinkage trade-off. Finally, we confirm our theoretical analyses through experiments and demonstrate that BM achieves higher feature selection accuracy compared with Lasso and ARD.
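
The Lasso and ARD estimators that BM is compared against are both available in scikit-learn, so the sparsity-shrinkage trade-off can be probed directly. The synthetic data below is an illustrative assumption, and the Bayesian masking model itself is not shown.

import numpy as np
from sklearn.linear_model import Lasso, ARDRegression

# Synthetic regression: 100 samples, 20 features, only the first 3 relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ard = ARDRegression().fit(X, y)

print("Lasso selected:", np.flatnonzero(np.abs(lasso.coef_) > 1e-3))
print("ARD selected:  ", np.flatnonzero(np.abs(ard.coef_) > 1e-3))
print("Lasso weights: ", np.round(lasso.coef_[:3], 2))  # compare with [3.0, -2.0, 1.5]
print("ARD weights:   ", np.round(ard.coef_[:3], 2))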


arXiv: Machine Learning | 2010

Estimation of low-rank tensors via convex optimization

Ryota Tomioka; Kohei Hayashi; Hisashi Kashima


arXiv: Machine Learning | 2016

Making Tree Ensembles Interpretable

Satoshi Hara; Kohei Hayashi


Neural Information Processing Systems | 2013

Factorized Asymptotic Bayesian Inference for Latent Feature Models

Kohei Hayashi; Ryohei Fujimaki


Knowledge Discovery and Data Mining | 2015

Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering

Kohei Hayashi; Takanori Maehara; Masashi Toyoda; Ken-ichi Kawarabayashi


International Conference on Machine Learning | 2015

Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood

Kohei Hayashi; Shin-ichi Maeda; Ryohei Fujimaki

Collaboration


Dive into Kohei Hayashi's collaboration.

Top Co-Authors


Ken-ichi Kawarabayashi

National Institute of Informatics


Takuya Konishi

National Institute of Informatics


Satoshi Hara

National Institute of Informatics
