Issei Sato | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Issei Sato is active.

Explore More

Publication

Featured researches published by Issei Sato.

international acm sigir conference on research and development in information retrieval | 2010

Person name disambiguation by bootstrapping

Minoru Yoshida; Masaki Ikeda; Shingo Ono; Issei Sato; Hiroshi Nakagawa

In this paper, we report our system that disambiguates person names in Web search results. The system uses named entities, compound key words, and URLs as features for document similarity calculation, which typically show high precision but low recall clustering results. We propose to use a two-stage clustering algorithm by bootstrapping to improve the low recall values, in which clustering results of the first stage are used to extract features used in the second stage clustering. Experimental results revealed that our algorithm yields better score than the best systems at the latest WePS workshop.

international conference on management of data | 2015

Bayesian Differential Privacy on Correlated Data

Bin Yang; Issei Sato; Hiroshi Nakagawa

Differential privacy provides a rigorous standard for evaluating the privacy of perturbation algorithms. It has widely been regarded that differential privacy is a universal definition that deals with both independent and correlated data and a differentially private algorithm can protect privacy against arbitrary adversaries. However, recent research indicates that differential privacy may not guarantee privacy against arbitrary adversaries if the data are correlated. In this paper, we focus on the private perturbation algorithms on correlated data. We investigate the following three problems: (1) the influence of data correlations on privacy; (2) the influence of adversary prior knowledge on privacy; and (3) a general perturbation algorithm that is private for prior knowledge of any subset of tuples in the data when the data are correlated. We propose a Pufferfish definition of privacy, called Bayesian differential privacy, by which the privacy level of a probabilistic perturbation algorithm can be evaluated even when the data are correlated and when the prior knowledge is incomplete. We present a Gaussian correlation model to accurately describe the structure of data correlations and analyze the Bayesian differential privacy of the perturbation algorithm on the basis of this model. Our results show that privacy is poorest for an adversary who has the least prior knowledge. We further extend this model to a more general one that considers uncertain prior knowledge.

knowledge discovery and data mining | 2012

Practical collapsed variational bayes inference for hierarchical dirichlet process

Issei Sato; Kenichi Kurihara; Hiroshi Nakagawa

We propose a novel collapsed variational Bayes (CVB) inference for the hierarchical Dirichlet process (HDP). While the existing CVB inference for the HDP variant of latent Dirichlet allocation (LDA) is more complicated and harder to implement than that for LDA, the proposed algorithm is simple to implement, does not require variance counts to be maintained, does not need to set hyper-parameters, and has good predictive performance.

knowledge discovery and data mining | 2010

Collusion-resistant privacy-preserving data mining

Bin Yang; Hiroshi Nakagawa; Issei Sato; Jun Sakuma

Recent research in privacy-preserving data mining (PPDM) has become increasingly popular due to the wide application of data mining and the increased concern regarding the protection of private and personal information. Lately, numerous methods of privacy-preserving data mining have been proposed. Most of these methods are based on an assumption that semi-honest is and collusion is not present. In other words, every party follows such protocol properly with the exception that it keeps a record of all its intermediate computations without sharing the record with others. In this paper, we focus our attention on the problem of collusions, in which some parties may collude and share their record to deduce the private information of other parties. In particular, we consider a general problem in PPDM - multiparty secure computation of some functions of secure summations of data spreading around multiple parties. To solve such a problem, we propose a new method that entails a high level of security - full-privacy. With this method, no sensitive information of a party will be revealed even when all other parties collude. In addition, this method is efficient with a running time of O(m). We will also show that by applying this general method, a large number of problems in PPDM can be solved with enhanced security.

knowledge discovery and data mining | 2008

Person name disambiguation in web pages using social network, compound words and latent topics

Shingo Ono; Issei Sato; Minoru Yoshida; Hiroshi Nakagawa

The World Wide Web (WWW) provides much information about persons, and in recent years WWW search engines have been commonly used for learning about persons. However, many persons have the same name and that ambiguity typically causes the search results of one person name to include Web pages about several different persons. We propose a novel framework for person name disambiguation that has the following three components processes. Extraction of social network information by finding co-occurrences of named entities, Measurement of document similarities based on occurrences of key compound words, Inference of topic information from documents based on the Dirichlet process unigram mixture model. Experiments using an actual Web document dataset show that the result of our framework is promising.

knowledge discovery and data mining | 2007

Knowledge discovery of multiple-topic document using parametric mixture model with dirichlet prior

Issei Sato; Hiroshi Nakagawa

Documents, such as those seen on Wikipedia and Folksonomy, have tended to be assigned with multiple topics as a meta-data.Therefore, it is more and more important to analyze a relationship between a document and topics assigned to the document. In this paper, we proposed a novel probabilistic generative model of documents with multiple topics as a meta-data. By focusing on modeling the generation process of a document with multiple topics, we can extract specific properties of documents with multiple topics.Proposed model is an expansion of an existing probabilistic generative model: Parametric Mixture Model (PMM). PMM models documents with multiple topics by mixing model parameters of each single topic. Since, however, PMM assigns the same mixture ratio to each single topic, PMM cannot take into account the bias of each topic within a document. To deal with this problem, we propose a model that considers Dirichlet distribution as a prior distribution of the mixture ratio.We adopt Variational Bayes Method to infer the bias of each topic within a document. We evaluate the proposed model and PMM using MEDLINE corpus.The results of F-measure, Precision and Recall show that the proposed model is more effective than PMM on multiple-topic classification. Moreover, we indicate the potential of the proposed model that extracts topics and document-specific keywords using information about the assigned topics.

knowledge discovery and data mining | 2015

Stochastic Divergence Minimization for Online Collapsed Variational Bayes Zero Inference of Latent Dirichlet Allocation

Issei Sato; Hiroshi Nakagawa

The collapsed variational Bayes zero (CVB0) inference is a variational inference improved by marginalizing out parameters, the same as with the collapsed Gibbs sampler. A drawback of the CVB0 inference is the memory requirements. A probability vector must be maintained for latent topics for every token in a corpus. When the total number of tokens is N and the number of topics is K, the CVB0 inference requires Ο(NK) memory. A stochastic approximation of the CVB0 (SCVB0) inference can reduce Ο(NK) to Ο(VK), where V denotes the vocabulary size. We reformulate the existing SCVB0 inference by using the stochastic divergence minimization algorithm, with which convergence can be analyzed in terms of Martingale convergence theory. We also reveal the property of the CVB0 inference in terms of the leave-one-out perplexity, which leads to the estimation algorithm of the Dirichlet distribution parameters. The predictive performance of the propose SCVB0 inference is better than that of the original SCVB0 inference in four datasets.

Neurocomputing | 2013

Quantum annealing for Dirichlet process mixture models with applications to network clustering

Issei Sato; Shu Tanaka; Kenichi Kurihara; Seiji Miyashita; Hiroshi Nakagawa

We developed a new quantum annealing (QA) algorithm for Dirichlet process mixture (DPM) models based on the Chinese restaurant process (CRP). QA is a parallelized extension of simulated annealing (SA), i.e., it is a parallel stochastic optimization technique. Existing approaches (Kurihara et al. 2009 [12] and Sato et al. 2009 [20]) cannot be applied to the CRP because their QA framework is formulated using a fixed number of mixture components. The proposed QA algorithm can handle an unfixed number of classes in mixture models. We applied QA to a DPM model for clustering vertices in a network where a CRP seating arrangement indicates a network partition. A multi core processer was used for running QA in experiments, the results of which show that QA is better than SA, Markov chain Monte Carlo inference, and beam search at finding a maximum a posteriori estimation of a seating arrangement in the CRP. Since our QA algorithm is as easy as to implement the SA algorithm, it is suitable for a wide range of applications.

knowledge discovery and data mining | 2011

Probabilistic matrix factorization leveraging contexts for unsupervised relation extraction

Shingo Takamatsu; Issei Sato; Hiroshi Nakagawa

The clustering of the semantic relations between entities extracted from a corpus is one of the main issues in unsupervised relation extraction (URE). Previous methods assume a huge corpus because they have utilized frequently appearing entity pairs in the corpus. In this paper, we present a URE that works well for a small corpus by using word sequences extracted as relations. The feature vectors of the word sequences are extremely sparse. To deal with the sparseness problem, we take the two approaches: dimension reduction and leveraging context in the whole corpus including sentences from which no relations are extracted. The context in this case is captured with feature co-occurrences, which indicate appearances of two features in a single sentence. The approaches are implemented by a probabilistic matrix factorization that jointly factorizes the matrix of the feature vectors and the matrix of the feature co-occurrences. Experimental results show that our method outperforms previously proposed methods.

knowledge discovery and data mining | 2010

Mining numbers in text using suffix arrays and clustering based on dirichlet process mixture models

Minoru Yoshida; Issei Sato; Hiroshi Nakagawa; Akira Terada

We propose a system that enables us to search with ranges of numbers Both queries and resulting strings can be both strings and numbers (e.g., “200–800 dollars”) The system is based on suffix-arrays augmented with treatment of number information to provide search for numbers by words, and vice versa Further, the system performs clustering based on a Dirichlet Process Mixture of Gaussians to treat extracted collection of numbers appropriately.

Explore More