Yuto Yamaguchi | Researchain

Publication

Featured researches published by Yuto Yamaguchi.

knowledge discovery and data mining | 2015

RSC: Mining and Modeling Temporal Activity in Social Media

Alceu Ferraz Costa; Yuto Yamaguchi; Agma J. M. Traina; Caetano Traina; Christos Faloutsos

Can we identify patterns of temporal activities caused by human communications in social media? Is it possible to model these patterns and tell if a user is a human or a bot based only on the timing of their postings? Social media services allow users to make postings, generating large datasets of human activity time-stamps. In this paper we analyze time-stamp data from social media services and find that the distribution of posting inter-arrival times (IAT) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes, and (iv) bimodal distribution. Based on our findings, we propose Rest-Sleep-and-Comment (RSC), a generative model that is able to match all four discovered patterns. We demonstrate the utility of RSC by showing that it can accurately fit real time-stamp data from Reddit and Twitter. We also show that RSC can be used to spot outliers and detect users with non-human behavior, such as bots. We validate RSC using real data consisting of over 35 million postings from Twitter and Reddit. RSC consistently provides a better fit to real data and clearly outperforms existing models for human dynamics. RSC was also able to detect bots with a precision higher than 94%.
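To make pattern (i) concrete, the following is a minimal sketch of how inter-arrival times and the correlation between consecutive IATs could be computed from a list of posting time-stamps. It is not the RSC model itself; the function names and the toy input are assumptions made for illustration only.

```python
from statistics import mean


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive postings (same unit as the input)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def consecutive_iat_correlation(iats):
    """Pearson correlation between each IAT and the IAT that follows it."""
    x, y = iats[:-1], iats[1:]
    if len(x) < 2:
        raise ValueError("need at least three IATs")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A perfectly periodic poster has constant IATs, so the
        # correlation is undefined rather than zero.
        raise ValueError("correlation undefined for constant IATs")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical time-stamps in seconds, not data from the paper.
gaps = inter_arrival_times([0, 10, 15, 45])
```

A value close to 1 from `consecutive_iat_correlation` would indicate that long gaps tend to follow long gaps, which is the kind of dependence the abstract describes.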

conference on information and knowledge management | 2014

Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Yuto Yamaguchi; Toshiyuki Amagasa; Hiroyuki Kitagawa; Yohei Ikawa

The location profiles of social media users are valuable for various applications, such as marketing and real-world analysis. As most users do not disclose their home locations, the problem of inferring home locations has been well studied in recent years. Most existing methods perform batch inference using static (i.e., pre-stored) social media contents. However, social media contents are generated and delivered in real time as social streams. In this situation, it is important to continuously update current inference results based on newly arriving contents to improve the results over time. Moreover, it is effective for location inference to use the spatiotemporal correlation between contents and locations. The main idea of this paper is that we can infer the locations of users who simultaneously post about a local event (e.g., an earthquake). Hence, in this paper, we propose an online location inference method over social streams that exploits the spatiotemporal correlation, achieving 1) continuous updates with low computational and storage costs, and 2) better inference accuracy than that of existing methods. The experimental results using a Twitter dataset show that our method reduces the inference error to less than 68% of that of existing methods. The results also show that the proposed method can update inference results in constant time regardless of the amount of accumulated contents.

pacific-asia conference on knowledge discovery and data mining | 2015

SocNL: Bayesian Label Propagation with Confidence

Yuto Yamaguchi; Christos Faloutsos; Hiroyuki Kitagawa

How can we predict Smith’s main hobby if we know the main hobby of Smith’s friends? Can we measure the confidence in our prediction if we are given the main hobby of only a few of Smith’s friends? In this paper, we focus on how to estimate the confidence in the node classification problem. Providing a confidence level for the classification problem is important because most nodes in real-world networks tend to have few neighbors, and thus, a small amount of evidence. Our contributions are three-fold: (a) novel algorithm: we propose a semi-supervised learning algorithm that converges fast and provides a confidence estimate; (b) theoretical analysis: we show the solid theoretical foundation of our algorithm and its connections to label propagation and Bayesian inference; (c) empirical analysis: we perform extensive experiments on three different real networks. Specifically, the experimental results demonstrate that our algorithm outperforms other algorithms on graphs with less smoothness and low label density.

Companion of The Web Conference 2018 (WWW '18) | 2018

FORank: Fast ObjectRank for Large Heterogeneous Graphs

Tomoki Sato; Hiroaki Shiokawa; Yuto Yamaguchi; Hiroyuki Kitagawa

ObjectRank is one of the popular graph mining methods that enables us to evaluate the importance of each vertex on heterogeneous graphs. However, it is computationally expensive to apply it to large graphs since ObjectRank needs to compute the importance of all vertices iteratively. In this work, we present a fast ObjectRank algorithm, FORank, that accurately approximates the keyword search results. During the iterative computation, FORank prunes vertices whose converged scores are likely to have little impact on the results. The experiments showed that FORank runs 7 times faster than the ObjectRank computation while achieving over 90% approximation accuracy.

information integration and web-based applications & services | 2013

A Local Method for ObjectRank Estimation

Yuta Sakakura; Yuto Yamaguchi; Toshiyuki Amagasa; Hiroyuki Kitagawa

ObjectRank is a method of link structure analysis to evaluate the importance of objects in a database. ObjectRank is known to be computationally expensive, because it requires iterative computations over a large graph. However, in many real applications, it is sufficient to compute the ObjectRank scores for only a small fraction of objects. To address this problem, this paper proposes a novel method for estimating ObjectRank scores for specific objects by applying local computation over partial graphs, thereby allowing us to maintain low computational cost even for large graphs. Our basic idea is that, for a given target node, we induce a local graph by checking the edge weights and pruning edges according to their weights. We conduct experiments to compare our method with several alternative methods. The experimental results show that our method can reduce the computational cost while maintaining the accuracy.

conference on information and knowledge management | 2017

Collecting Non-Geotagged Local Tweets via Bandit Algorithms

Saki Ueda; Yuto Yamaguchi; Hiroyuki Kitagawa

How can we collect as many non-geotagged tweets as possible from users in a specific location within a limited time span? How can we find such users if we do not have much information about the specified location? Although there is a variety of methods to estimate the locations of users, these methods are not directly applicable to this problem because they require collecting a large number of random tweets and then filtering them to obtain a small number of tweets from such users. In this paper, we propose a framework that incrementally finds such users and continuously collects tweets from them. Our framework is based on a bandit algorithm that adjusts the trade-off between exploration and exploitation; in other words, it simultaneously finds new users in the specified location and collects tweets from already-found users. The experimental results show that the bandit algorithm works well on this problem and outperforms carefully designed baselines.
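The abstract does not say which bandit algorithm the framework uses. As a generic illustration of the exploration/exploitation trade-off only, here is a minimal epsilon-greedy sketch. The mapping of arms to candidate tweet sources, the Bernoulli rewards (1 if a collected tweet is local), and all names are assumptions for illustration, not the paper's design.

```python
import random


def epsilon_greedy(reward_fns, rounds, epsilon=0.1, seed=0):
    """Run epsilon-greedy over the given arms.

    reward_fns: one callable per arm; each takes a random.Random and
    returns a numeric reward (hypothetical stand-in for "was the
    collected tweet local?").
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = reward_fns[arm](rng)
        pulls[arm] += 1
        # Incremental update of the running mean reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates


def bernoulli(p):
    """Hypothetical arm that pays 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


pulls, estimates = epsilon_greedy([bernoulli(0.1), bernoulli(0.8)], rounds=2000)
```

With a small `epsilon`, most rounds go to the arm that currently looks best, while occasional random pulls keep the estimates of the other arms from going stale.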

web information systems engineering | 2015

Tweet Location Inference Based on Contents and Temporal Association

Saki Ueda; Yuto Yamaguchi; Hiroyuki Kitagawa; Toshiyuki Amagasa

How can we infer a tweet's location? Are the timestamps of tweets effective for location inference? In this study, we propose a novel method for tweet location inference based on the contents and timestamps of tweets. Inferring the locations of tweets is important for location-related services such as recommending restaurants, sending disaster-related information to users, and providing commercial messages to users. This study makes two contributions: (1) we propose a novel method to infer tweet locations based on the contents and timestamps of tweets, and (2) we experimentally demonstrate the effectiveness of the proposed method using Twitter data. The experimental results suggest that the proposed method can infer tweet locations more precisely than a baseline that does not take the temporal association into account.

international world wide web conferences | 2015

Why Do You Follow Him?: Multilinear Analysis on Twitter

Yuto Yamaguchi; Mitsuo Yoshida; Christos Faloutsos; Hiroyuki Kitagawa

Why does Smith follow Johnson on Twitter? In most cases, the reason why users follow other users is unavailable. In this work, we answer this question by proposing TagF, which analyzes the who-follows-whom network (matrix) and the who-tags-whom network (tensor) simultaneously. Concretely, our method decomposes a coupled tensor constructed from this matrix and tensor. The experimental results on million-scale Twitter networks show that TagF uncovers different but explainable reasons why users follow other users.

Proceedings of the Confederated International Conferences on On the Move to Meaningful Internet Systems: OTM 2015 Conferences - Volume 9415 | 2015

Real-Time Relevance Matching of News and Tweets

Sei Onishi; Yuto Yamaguchi; Hiroyuki Kitagawa

Given a news article, how many tweets on Twitter are relevant to it? Can we continuously collect only such tweets in real time? In this paper, we propose a method for matching news articles and tweets in real time. By collecting tweets relevant to news articles, we can get reactions to news articles, such as sentiments and opinions, from Twitter users. Our contributions are two-fold: (a) flexibility: our method collects the appropriate number of tweets for various kinds of news articles, each of which is mentioned by a different number of tweets; (b) efficiency: our method can reduce the update time of an inverted index, which is used for efficient matching of news articles and tweets. We also experimentally demonstrate the effectiveness of our method on streams of news articles and tweets from Yahoo!News and Twitter, respectively. We use the area under the ROC curve (AUC) to compare the accuracy of our method and that of baselines. The comparison shows that the AUC of our method is higher than that of the baselines by up to 22.7%. Furthermore, our method can update its index about 10 times faster than the existing technique.

database and expert systems applications | 2014

An Improved Method for Efficient PageRank Estimation

Yuta Sakakura; Yuto Yamaguchi; Toshiyuki Amagasa; Hiroyuki Kitagawa

PageRank is a link analysis method to estimate the importance of nodes in a graph, and has been successfully applied in a wide range of applications. However, its computational complexity is known to be high. Besides, in many applications, only a small number of nodes are of interest. To address this problem, several methods for estimating the PageRank score of a target node without accessing the whole graph have been proposed. In particular, Chen et al. proposed an approach where, given a target node, a subgraph containing the target is induced to locally compute the PageRank score. Nevertheless, its computation is still time-consuming because a number of iterative processes are required when constructing a subgraph for subsequent PageRank estimation. To make it more efficient, we propose an improved approach in which a subgraph is recursively expanded by solving a linear system without any iterative computation. To assess the efficiency of the proposed scheme, we conduct a set of experimental evaluations. The results reveal that our proposed scheme can estimate the PageRank score more efficiently than the existing approach while maintaining the estimation accuracy.
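For context on why full PageRank is costly, the sketch below shows the standard power-iteration computation over a whole graph, which is the global baseline that local estimation methods try to avoid. It is not the subgraph-based method of this paper or of Chen et al.; the damping factor of 0.85, the adjacency-dictionary input, and the function name are assumptions for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration.

    out_links maps every node to the list of nodes it links to.
    Every link target must also appear as a key (assumed input format).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its rank evenly along its out-links.
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical three-node graph, not data from the paper.
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Every iteration touches every edge of the graph, which is the cost that motivates estimating the score of a single target node from a small induced subgraph instead.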
