Kyosuke Nishida | Researchain

Publication

Featured research published by Kyosuke Nishida.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Improving tweet stream classification by detecting changes in word probability

Kyosuke Nishida; Takahide Hoshide; Ko Fujimura

We propose a classification model of tweet streams in Twitter, which are representative of document streams whose statistical properties will change over time. Our model solves several problems that hinder the classification of tweets; in particular, the problem that the probabilities of word occurrence change at different rates for different words. Our model switches between two probability estimates based on full and recent data for each word when detecting changes in word probability. This switching enables our model to achieve both accurate learning of stationary words and quick response to bursty words. We then explain how to implement our model by using a word suffix array, which is a full-text search index. Using the word suffix array allows our model to handle the temporal attributes of word n-grams effectively. Experiments on three tweet data sets demonstrate that our model offers statistically significant higher topic-classification accuracy than conventional temporally-aware classification models.
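The switching idea can be illustrated with a minimal sketch for a single word. The class name, the sliding window size, the Laplace smoothing, and the fixed-gap change test are all assumptions made for illustration; the abstract does not say how changes are detected, and the word suffix array is not modeled here.

```python
from collections import deque


class SwitchingWordEstimator:
    """Tracks how often one word occurs in a stream of documents.

    Hypothetical sketch: keeps a full-history estimate and a recent-window
    estimate, and switches to the recent one when the two disagree.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window        # number of recent documents kept (assumed)
        self.threshold = threshold  # gap treated as a change (assumed)
        self.recent = deque(maxlen=window)
        self.total_docs = 0
        self.total_hits = 0

    def update(self, word_present):
        hit = 1 if word_present else 0
        self.recent.append(hit)
        self.total_docs += 1
        self.total_hits += hit

    def full_estimate(self):
        # Laplace-smoothed probability over the whole stream.
        return (self.total_hits + 1) / (self.total_docs + 2)

    def recent_estimate(self):
        # Laplace-smoothed probability over the recent window only.
        return (sum(self.recent) + 1) / (len(self.recent) + 2)

    def estimate(self):
        full, recent = self.full_estimate(), self.recent_estimate()
        changed = abs(full - recent) > self.threshold
        return recent if changed else full


# A word that is absent for a long time and then bursts.
bursty = SwitchingWordEstimator()
for _ in range(1000):
    bursty.update(False)
for _ in range(50):
    bursty.update(True)
print(bursty.full_estimate(), bursty.recent_estimate(), bursty.estimate())
```

For a stationary word the two estimates stay close, so the full-data estimate is used; for the bursty word above the gap exceeds the threshold and the recent estimate takes over.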

Proceedings of the 2011 international workshop on DETecting and Exploiting Cultural diversiTy on the social web | 2011

Tweet classification by data compression

Kyosuke Nishida; Ryohei Banno; Ko Fujimura; Takahide Hoshide

We propose a new method that uses data compression to classify an unseen tweet as related or unrelated to a topic of interest. Our compression-based tweet classification method, called CTC, evaluates the compressibility of the tweet when given positive and negative examples. This enables our method to handle multilingual tweets in the same manner and to effectively utilize the word context of the tweet, which is extremely important information under the 140-character limit. Experiments with worldwide tweets assigned a single hashtag demonstrate that our method, which uses the Deflate algorithm (used in gzip) for empirical evaluations, achieved higher precision and recall rates than state-of-the-art online learning algorithms.
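A rough sketch of compression-based classification follows, using Python's `zlib` module (an implementation of Deflate). The function names, the toy examples, and the decision rule (fewer extra compressed bytes wins) are assumptions for illustration, not the CTC method as published.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Size in bytes after Deflate compression at the highest level.
    return len(zlib.compress(data, 9))


def extra_bytes(examples: str, tweet: str) -> int:
    # How many more bytes are needed once the tweet is appended to the examples.
    base = examples.encode("utf-8")
    both = (examples + "\n" + tweet).encode("utf-8")
    return compressed_size(both) - compressed_size(base)


def classify(tweet, positive_examples, negative_examples):
    # Assumed rule: the tweet belongs to the class whose examples
    # let it compress into fewer additional bytes.
    pos = extra_bytes("\n".join(positive_examples), tweet)
    neg = extra_bytes("\n".join(negative_examples), tweet)
    return pos < neg


positive = [
    "the world cup final kicks off tonight in the stadium",
    "what a goal in the world cup match tonight",
    "world cup fans are filling the stadium already",
]
negative = [
    "new phone release brings a faster chip and camera",
    "the stock market closed lower after earnings reports",
    "cooking pasta with garlic and olive oil for dinner",
]
print(classify("what a goal in the world cup match tonight", positive, negative))
```

Because the compressor works on bytes, the same code applies to any language without tokenization. Deflate only looks back over a 32 KiB window, so in this sketch only the most recent examples influence the result.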

Ubiquitous Computing | 2014

Probabilistic identification of visited point-of-interest for personalized automatic check-in

Kyosuke Nishida; Hiroyuki Toda; Takeshi Kurashima; Yoshihiko Suhara

Automatic check-in, which identifies a user’s visited points of interest (POIs) from his or her trajectories, is still an open problem because of positioning errors and the high POI density in small areas. In this study, we propose a probabilistic visited-POI identification method. The method uses a new hierarchical Bayesian model for identifying the latent visited-POI label of stay points, which are automatically extracted from trajectories. This model learns from labeled and unlabeled stay point data (i.e., semi-supervised learning) and takes into account personal preferences, stay locations including positioning errors, stay times for each category, and prior knowledge about typical user preferences and stay times. Experimental results with real user trajectories and Foursquare POIs demonstrated that our method achieved statistically significant improvements in precision at 1 and recall at 3 over the nearest neighbor method and a conventional method that uses a supervised learning-to-rank algorithm.

Journal of data science | 2016

Classifying spatial trajectories using representation learning

Yuki Endo; Hiroyuki Toda; Kyosuke Nishida; Jotaro Ikedo

This paper addresses the problem of feature extraction for estimating users’ transportation modes from their movement trajectories. Previous studies have adopted supervised learning approaches and used engineers’ skills to find effective features for accurate estimation. However, such handcrafted features cannot always work well because human behaviors are diverse and trajectories include noise due to measurement error. To compensate for the shortcomings of handcrafted features, we propose a method that automatically extracts additional features using a deep neural network (DNN). So that a DNN can easily handle input trajectories, our method converts a raw trajectory data structure into an image data structure while maintaining effective spatiotemporal information. A classification model is constructed in a supervised manner using both the deep features and the handcrafted features. We demonstrate the effectiveness of the proposed method through several experiments on two real datasets, including accuracy comparisons with previous methods and feature visualization.
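The trajectory-to-image step might look like the sketch below. The grid size, the bounding-box normalization, and the choice of pixel value (number of samples per cell, which is proportional to time spent only if the sampling rate is constant) are assumptions; the abstract does not specify the encoding.

```python
def trajectory_to_image(points, size=8):
    """Convert (lat, lon) samples into a size x size grid of sample counts.

    Hypothetical encoding: each cell counts the samples that fall inside it.
    """
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    lat_min, lon_min = min(lats), min(lons)
    # Fall back to 1.0 so a stationary trajectory does not divide by zero.
    lat_span = (max(lats) - lat_min) or 1.0
    lon_span = (max(lons) - lon_min) or 1.0

    image = [[0] * size for _ in range(size)]
    for lat, lon in points:
        # Scale into [0, size) and clamp the maximum onto the last cell.
        row = min(int((lat - lat_min) / lat_span * size), size - 1)
        col = min(int((lon - lon_min) / lon_span * size), size - 1)
        image[row][col] += 1
    return image


track = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
for row in trajectory_to_image(track, size=4):
    print(row)
```

The resulting grid can be fed to an image-style network as a single-channel input, while handcrafted features such as speed statistics are computed separately from the raw points.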

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2017

Predicting Destinations from Partial Trajectories Using Recurrent Neural Network

Yuki Endo; Kyosuke Nishida; Hiroyuki Toda; Hiroshi Sawada

Predicting a user’s destinations from his or her partial movement trajectories is still a challenging problem. To this end, we employ recurrent neural networks (RNNs), which can consider long-term dependencies and avoid a data sparsity problem. This is because RNNs store statistical weights for long-term transitions in location sequences, unlike conventional Markov process-based methods that count the number of short-term transitions. However, applying RNNs to destination prediction is not straightforward, and thus we propose an efficient and accurate method for this problem. Specifically, our method represents trajectories as discretized features in a grid space and feeds sequences of them to the RNN model, which estimates the transition probabilities in the next timestep. Using these one-step transition probabilities, the visiting probabilities for the destination candidates are efficiently estimated by simulating the movements of objects based on stochastic sampling with an RNN encoder-decoder framework. We evaluate the proposed method on two different real datasets, i.e., taxi and personal trajectories. The results demonstrate that our method can predict destinations more accurately than state-of-the-art methods.
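The sampling step can be sketched as a Monte Carlo rollout over one-step transition probabilities. Everything named here is hypothetical: `next_cell_probs` stands in for the trained RNN (it receives the whole path so far), and `toy_model` is a hand-written table that only looks at the last cell, used purely to make the sketch runnable.

```python
import random
from collections import Counter


def estimate_destinations(prefix, next_cell_probs, destinations,
                          n_samples=1000, max_steps=50, seed=0):
    """Estimate visiting probabilities of destination cells by simulation."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_samples):
        path = list(prefix)
        for _ in range(max_steps):
            if path[-1] in destinations:
                break
            # One-step transition probabilities given the path so far.
            probs = next_cell_probs(path)
            cells = list(probs)
            weights = [probs[c] for c in cells]
            path.append(rng.choices(cells, weights=weights)[0])
        if path[-1] in destinations:
            visits[path[-1]] += 1
    return {d: visits[d] / n_samples for d in destinations}


def toy_model(path):
    # Placeholder for an RNN: a fixed table keyed on the last grid cell.
    table = {"A": {"B": 1.0}, "B": {"D1": 0.8, "D2": 0.2}}
    return table[path[-1]]


print(estimate_destinations(["A"], toy_model, {"D1", "D2"}, n_samples=2000))
```

With a real sequence model in place of `toy_model`, the same loop would condition each step on the full observed prefix rather than on the last cell alone.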

Advances in Social Networks Analysis and Mining | 2016

How fashionable is each street?: quantifying road characteristics using social media

Takuya Nishimura; Kyosuke Nishida; Hiroyuki Toda; Hiroshi Sawada

Determining routes that provide opportunities to satisfy the various demands of users is still an open problem. This is because it is virtually impossible to manually quantify the characteristics of each road, and there are few resources that describe roads directly enough to meet any demand that may arise. The goal of this study is to automatically quantify the characteristics of roads for demands that can be described using keywords such as “fashionable”. To achieve this goal, we propose a two-stage method that analyzes social media and road networks. First, our method estimates the topic distribution (i.e., the characteristics) of each point of interest (POI) by analyzing geotagged texts with the Latent Dirichlet Allocation model. Next, it uses a Markov random field model to estimate the characteristics of each road on the basis of those of POIs and the road networks associated with the POIs. Experiments on real datasets demonstrate that our method achieves statistically significant improvements over baseline methods in terms of ranking quality when retrieving roads in three areas given 25 keywords.

International Workshop on Computational Transportation Science | 2015

Extracting Arbitrary-shaped Stay Regions from Geospatial Trajectories with Outliers and Missing Points

Kyosuke Nishida; Hiroyuki Toda; Yoshimasa Koike

We tackle the problem of extracting stay regions, where a user has stayed longer than a certain time threshold, from a geospatial trajectory. There are four major difficulties with this problem: (1) stay regions are not only point-type ones such as a bus stop but also large and arbitrary-shaped ones such as a shopping mall; (2) trajectories contain spatial outliers; (3) there are missing points in trajectories; and (4) trajectories should be analyzed in an online mode. Previous algorithms cannot overcome these difficulties simultaneously. Density-based batch algorithms have advantages over the previous algorithms in discovering arbitrary-shaped clusters from spatial data containing outliers; however, they do not consider temporal durations and thus have not been used for extracting stay regions. We extended a density-based algorithm so that it works online in a duration-based manner and is robust to missing points in stay regions while keeping its advantages. Experiments on real trajectories of 13 users conducting their daily activities for three weeks demonstrated that our algorithm statistically significantly outperformed five state-of-the-art algorithms in terms of F1 score and works well without trajectory preprocessing consisting of filtering, interpolating, and smoothing.

Online Social Media Analysis and Visualization | 2014

Demographic and Psychographic Estimation of Twitter Users Using Social Structures

Jun Ito; Kyosuke Nishida; Takahide Hoshide; Hiroyuki Toda; Tadasu Uchiyama

Word-of-mouth marketing on social media has become more urgent with the increasing number of users and posts, and it is important to estimate user attributes because most users on Twitter do not reveal their attributes. We propose new methods for estimating the attributes of a Twitter user from the user’s content (a profile document and tweets) and social neighbors, i.e., the users whom the user has mentioned. This study makes three contributions to the task of user attribute estimation. First, we investigate a labeling method that finds the users associated with a blog account and uses their blog profile attributes as true labels for training tweet data. We confirm that using the blog labels achieved higher accuracy than manual labeling and pattern matching methods with respect to four attributes (gender, age, occupation, and interests). Second, we validate the best way to combine bag-of-words features of profile documents and tweets. We evaluate nine combining methods and show that words in profile documents should be treated distinctly from those in tweets. Third, we reveal that adjusting the amount of information from social neighbors affects estimation accuracy. We experiment with three adjustment levels and show that our method, which uses the target user’s profile document and tweets and the neighbors’ profile documents (not including tweets), achieved the best accuracy. Overall, experiments on the estimation of the four attributes show that our method achieved higher accuracy than conventional methods that use manually labeled tweets.
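One simple way to treat profile words distinctly from tweet words is to give each source its own feature namespace, as in the sketch below. This is an illustrative assumption, not necessarily one of the nine combining methods the study evaluated; the function name and the prefixes are hypothetical.

```python
from collections import Counter


def combined_features(profile, tweets):
    """Build bag-of-words features with a separate namespace per source."""
    features = Counter()
    for word in profile.lower().split():
        # The same word in a profile and in a tweet becomes two features.
        features["profile:" + word] += 1
    for tweet in tweets:
        for word in tweet.lower().split():
            features["tweet:" + word] += 1
    return features


feats = combined_features("Student in Tokyo", ["Tokyo is great", "student life"])
print(sorted(feats.items()))
```

A classifier trained on these features can then learn different weights for, say, "student" in a profile and "student" in a tweet. Neighbors' profile documents could be added under a third prefix in the same way.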

International Conference on Social Computing | 2018

Automatically Generating Head Nods with Linguistic Information.

Ryo Ishii; Ryuichiro Higashinaka; Kyosuke Nishida; Taichi Katayama; Nozomi Kobayashi; Junji Tomita

In addition to verbal behavior, nonverbal behavior is an important aspect for an embodied dialogue system to be able to conduct a smooth conversation with the user. Researchers have focused on automatically generating nonverbal behavior from speech and language information of dialogue systems. We propose a model to generate head nods accompanying an utterance from natural language. To the best of our knowledge, previous studies generated nods from the final words at the end of an utterance, i.e., bag of words. In this study, we focused on text analyzed using various types of linguistic information, such as dialog act, part of speech, a large-scale Japanese thesaurus, and word position in a sentence. First, we compiled a Japanese corpus of 24 dialogues including utterance and nod information. Next, using the corpus, we created a model that generates nods during a phrase by using dialog act, part of speech, a large-scale Japanese thesaurus, and word position in a sentence in addition to bag of words. The results indicate that our model outperformed both a model using only bag of words and chance level, and that dialog act, part of speech, the large-scale Japanese thesaurus, and word position are useful for generating nods. Moreover, the model using all types of linguistic information had the highest performance. This result indicates that several types of linguistic information have the potential to be strong predictors with which to generate nods automatically.

Conference on Information and Knowledge Management | 2018

Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension

Kyosuke Nishida; Itsumi Saito; Atsushi Otsuka; Hisako Asano; Junji Tomita

This study considers the task of machine reading at scale (MRS) wherein, given a question, a system first performs the information retrieval (IR) task of finding relevant passages in a knowledge source and then carries out the reading comprehension (RC) task of extracting an answer span from the passages. Previous MRS studies, in which the IR component was trained without considering answer spans, struggled to accurately find a small number of relevant passages from a large set of passages. In this paper, we propose a simple and effective approach that incorporates the IR and RC tasks by using supervised multi-task learning in order that the IR component can be trained by considering answer spans. Experimental results on the standard benchmark, answering SQuAD questions using the full Wikipedia as the knowledge source, showed that our model achieved state-of-the-art performance. Moreover, we thoroughly evaluated the individual contributions of our model components with our new Japanese dataset and SQuAD. The results showed significant improvements in the IR task and provided a new perspective on IR for RC: it is effective to teach which part of the passage answers the question rather than to give only a relevance score to the whole passage.
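A joint objective of this kind is commonly written as a weighted sum of the two task losses. The form below is an assumed generic formulation, not taken from the abstract: \theta denotes parameters shared by both tasks, and \lambda is a hypothetical weighting hyperparameter.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{RC}}(\theta) + \lambda \, \mathcal{L}_{\mathrm{IR}}(\theta)
```

Under this form, the shared parameters receive gradients from the answer-span loss as well as the relevance loss, which is one way the IR component can be trained with answer spans taken into account.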
