
Publication


Featured research published by Gen Hattori.


International World Wide Web Conference | 2007

Robust web page segmentation for mobile terminal using content-distances and page layout information

Gen Hattori; Keiichiro Hoashi; Kazunori Matsumoto; Fumiaki Sugaya

The demand for browsing information from general Web pages on a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing from PCs, it is difficult for mobile phone users to obtain sufficient information from the Web. Therefore, a method to reconstruct PC-optimized Web pages for mobile phone users is essential. One approach is to segment the Web page based on its structure and utilize the hierarchy of the content elements to regenerate a page suitable for mobile phone browsing. In our previous work, we examined a robust automatic Web page segmentation scheme that uses the distance between content elements based on the relative HTML tag hierarchy, i.e., the number and depth of HTML tags in Web pages. However, this scheme has the problem that the content-distance based on the order of HTML tags does not always correspond to the intuitive distance between content elements in the actual layout of a Web page. In this paper, we propose a hybrid segmentation method that segments Web pages based on both the content-distance calculated by the previous scheme and a novel approach that utilizes Web page layout information. Experiments conducted to evaluate the accuracy of Web page segmentation show that the proposed method segments Web pages more accurately than conventional methods. Furthermore, implementation and evaluation of our system on a mobile phone show that our method achieves superior usability compared to commercial Web browsers.
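
The paper gives no pseudocode at this level; the following minimal Python sketch only illustrates the hybrid idea of blending a tag-order content-distance with a layout distance. The tag-counting heuristic, the sample HTML, the rendered coordinates, and the weight alpha are all assumptions for illustration, not the authors' formulation.

```python
# Minimal sketch of a hybrid content-distance (illustrative only).
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Record a running tag count and the count at which each text node appears."""
    def __init__(self):
        super().__init__()
        self.tag_events = 0
        self.texts = []          # (text, tag_events_when_seen)

    def handle_starttag(self, tag, attrs):
        self.tag_events += 1

    def handle_endtag(self, tag):
        self.tag_events += 1

    def handle_data(self, data):
        if data.strip():
            self.texts.append((data.strip(), self.tag_events))

def content_distance(i, j, texts):
    """Tag-based distance: tags passed between the two content elements."""
    return abs(texts[i][1] - texts[j][1])

def hybrid_distance(i, j, texts, pos, alpha=0.5):
    """Blend tag distance with a layout (pixel) distance; alpha is a guess."""
    (xa, ya), (xb, yb) = pos[i], pos[j]
    layout = ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
    return alpha * content_distance(i, j, texts) + (1 - alpha) * layout

parser = TagCounter()
parser.feed("<div><p>News</p><p>Sports</p></div><div><p>Footer</p></div>")
positions = [(0, 0), (0, 20), (0, 400)]   # hypothetical rendered coordinates
print(hybrid_distance(0, 1, parser.texts, positions))  # nearby elements
print(hybrid_distance(1, 2, parser.texts, positions))  # distant elements
```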


Knowledge-Based Systems | 2013

Twitter user profiling based on text and community mining for market analysis

Kazushi Ikeda; Gen Hattori; Chihiro Ono; Hideki Asoh; Teruo Higashino

This paper proposes demographic estimation algorithms for profiling Twitter users, based on their tweets and community relationships. Many people post their opinions via social media services such as Twitter. This huge volume of opinions, expressed in real time, has great appeal as a novel marketing application. When automatically extracting these opinions, it is desirable to be able to discriminate opinions based on user demographics, because the ratio of positive to negative opinions differs depending on demographics such as age, gender, and residence area, all of which are essential for market analysis. In this paper, we propose a hybrid text-based and community-based method for the demographic estimation of Twitter users, where these demographics are estimated by tracking the tweet history and clustering followers/followees. Our experimental results from 100,000 Twitter users show that the proposed hybrid method improves the accuracy of the text-based method. The proposed method is applicable to various user demographics and is suitable even for users who tweet only infrequently.
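
As a rough illustration of combining text-based and community-based evidence, consider the toy sketch below. The keyword scoring, the followee-majority blend, the weight w_text, and the example data are all assumptions; the paper's actual estimator is more sophisticated.

```python
# Illustrative sketch (not the paper's algorithm): blend a text-based
# demographic score with the labels of followed accounts.
def text_score(tweets, keywords):
    """Fraction of tweets containing any demographic keyword."""
    hits = sum(any(k in t for k in keywords) for t in tweets)
    return hits / max(len(tweets), 1)

def hybrid_estimate(tweets, followee_labels, keywords, w_text=0.5):
    """Blend text evidence with the label ratio among followees."""
    community = sum(followee_labels) / max(len(followee_labels), 1)
    return w_text * text_score(tweets, keywords) + (1 - w_text) * community

# Toy example: estimate P(user is a student); followee label 1 = student.
tweets = ["cramming for the exam tonight", "so much homework this week"]
print(hybrid_estimate(tweets, [1, 1, 0, 1], keywords=["exam", "homework"]))
```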


Symposium on Applications and the Internet | 2004

Implementation and evaluation of message delegation middleware for ITS application

Gen Hattori; Chihiro Ono; Satoshi Nishiyama; Hiroki Horiuchi

There are many applications in intelligent transportation systems (ITS) that use communication between vehicles and systems on the fixed network. Although DSRC, wireless LAN, cellular phones, PHS, etc. can all be used as communication media for vehicle-road communication, their characteristics, such as coverage area, transmission speed, and communication cost, differ. Applications using vehicle-road communication therefore need a function that selects one of these communication media according to a given criterion. Moreover, the application also needs a reliable communication function that sends a message only when the communication channel is available, since intermittent disconnection of the channel is assumed to occur frequently owing to the spot communication environment of DSRC and wireless LAN. To improve the development efficiency of applications in the DSRC network, we have previously proposed middleware with a message delegation mechanism that realizes reliable message delivery based on information about the network status of the DSRC network. We describe the implementation of the middleware and show the results of its evaluation.
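
The middleware's API is not spelled out in the abstract; the sketch below only illustrates the delegation idea of buffering messages and flushing them over whichever medium is currently available and cheapest. The Media descriptor, the cost-based selection rule, and the availability lambdas are hypothetical.

```python
# Toy message-delegation queue: buffer until some channel is up.
import collections

Media = collections.namedtuple("Media", "name speed_kbps cost available")

def pick_medium(media):
    """Choose the cheapest currently-available medium; None if all are down."""
    usable = [m for m in media if m.available()]
    return min(usable, key=lambda m: m.cost) if usable else None

class DelegationQueue:
    """Buffer messages and flush them only when some channel is up."""
    def __init__(self, media):
        self.media = media
        self.queue = collections.deque()

    def send(self, msg):
        self.queue.append(msg)
        self.flush()

    def flush(self):
        while self.queue:
            medium = pick_medium(self.media)
            if medium is None:       # intermittent link: keep buffering
                return
            msg = self.queue.popleft()
            print(f"sent {msg!r} via {medium.name}")

dsrc = Media("DSRC", 4000, cost=0, available=lambda: False)  # out of range
cell = Media("cellular", 384, cost=5, available=lambda: True)
q = DelegationQueue([dsrc, cell])
q.send("probe data")   # falls back to cellular since DSRC is unavailable
```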


Web Information Systems Engineering | 2014

Feature Based Sentiment Analysis of Tweets in Multiple Languages

Maike Erdmann; Kazushi Ikeda; Hiromi Ishizaki; Gen Hattori; Yasuhiro Takishima

Feature-based sentiment analysis is normally conducted on review Web sites, since it is difficult to extract accurate product features from tweets. However, Twitter users express sentiment towards a large variety of products in many different languages. Moreover, sentiment expressed on Twitter is more up to date and represents the sentiment of a larger population than review articles. Therefore, we propose a method that identifies product features using review articles and then conducts sentiment analysis on tweets containing those features. In this way, we can increase the precision of feature extraction by up to 40% compared to features extracted directly from tweets. Moreover, our method translates and matches the features extracted for multiple languages and ranks them based on how frequently the features are mentioned in the tweets of each language. By doing this, we can highlight the features that are most relevant for multilingual analysis.
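
A toy sketch of the pipeline follows, under heavy assumptions: the feature set "mined from reviews", the translation table, and the sentiment lexicon below are fabricated stand-ins for the learned resources the paper describes.

```python
# Illustrative sketch: review-derived features matched against tweets.
from collections import Counter

features = {"battery", "screen", "camera"}      # mined from review sites
translations = {"batterie": "battery", "bildschirm": "screen"}  # de -> en

positive = {"great", "love"}
negative = {"poor", "hate"}

def analyze(tweets):
    """Count mentions and sentiment per review-derived feature."""
    scores, mentions = Counter(), Counter()
    for tweet in tweets:
        words = {translations.get(w, w) for w in tweet.lower().split()}
        for f in features & words:
            mentions[f] += 1
            scores[f] += len(words & positive) - len(words & negative)
    # Rank features by how often the tweets mention them.
    return mentions.most_common(), scores

print(analyze(["Great battery but poor screen", "Love the battery life"]))
```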


Advanced Information Networking and Applications | 2013

Early Detection Method of Service Quality Reduction Based on Linguistic and Time Series Analysis of Twitter

Kazushi Ikeda; Gen Hattori; Chihiro Ono; Hideki Asoh; Teruo Higashino

This paper proposes a method for detecting service quality reduction at an early stage, based on a linguistic and time series analysis of Twitter. Recently, many people post their opinions about products and service quality via social networking services such as Twitter. The number of tweets related to service quality increases when service quality reductions such as communication failures and train delays occur. It is crucial for service operators to recover service quality at an early stage in order to maintain customer satisfaction, and tweets can be considered an important clue for detecting service quality reduction. In this paper, we propose a method for early detection of service quality reduction that makes the best use of the Twitter platform, which provides tweets as text information and has the character of real-time communication. The proposed method consists of a linguistic analysis and a time series analysis of tweets. In the linguistic analysis, a semi-automatic method is proposed to construct a service-specific dictionary, which is used to extract negative tweets related to the services with high accuracy. In the time series analysis, statistical modeling is used for early and accurate anomaly detection from the time series of negative tweets. The experimental results show that the extraction accuracy of negative tweets and the detection accuracy of service quality reduction are significantly improved.
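
As a rough sketch of the two stages, the code below substitutes a hand-made negative dictionary and a simple mean-plus-k-sigma baseline for the paper's semi-automatically built dictionary and statistical model; the dictionary contents, k, and the warm-up length are assumptions.

```python
# Two-stage toy detector: dictionary filter, then time-series anomaly check.
import statistics

negative_dict = {"delayed", "down", "outage", "no signal"}  # service-specific

def negative_counts(tweet_batches):
    """Per-interval counts of tweets matching the negative dictionary."""
    return [sum(any(w in t.lower() for w in negative_dict) for t in batch)
            for batch in tweet_batches]

def detect_anomaly(counts, k=3.0, warmup=5):
    """Flag the first interval whose count exceeds mean + k*stdev of history."""
    for i in range(warmup, len(counts)):
        mu = statistics.mean(counts[:i])
        sd = statistics.pstdev(counts[:i]) or 1.0
        if counts[i] > mu + k * sd:
            return i
    return None

batches = [["nice day"]] * 6 + [["train delayed again", "line is down"] * 10]
print(detect_anomaly(negative_counts(batches)))  # -> 6, the anomalous interval
```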


Pervasive Computing and Communications | 2003

Making Java-enabled mobile phone as ubiquitous terminal by lightweight FIPA compliant agent platform

Gen Hattori; Satoshi Nishiyama; Chihiro Ono; Hiroki Horiuchi

We discuss the design issues of a lightweight, FIPA-compliant agent platform for Java-enabled mobile phones and describe the design of such a platform. This platform turns Java-enabled mobile phones into ubiquitous terminals by providing a place for agent applications. Combined with location services, it can be used for various ubiquitous services. We also show a performance comparison of the prototype with LEAP, another lightweight agent platform.
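
A real FIPA-compliant platform is far richer than this, but the toy sketch below illustrates the basic ingredients: an ACL-style message carrying a performative, and a registry that gives agents a place to live on the device. Everything here is a simplification; only the FIPA performative names ("inform", "request") are real.

```python
# Minimal FIPA-ACL-style message passing (illustrative only).
from dataclasses import dataclass

@dataclass
class ACLMessage:
    performative: str   # e.g. "inform", "request" (FIPA performatives)
    sender: str
    receiver: str
    content: str

class Agent:
    def __init__(self, name, platform):
        self.name = name
        platform[name] = self   # register with the phone-resident runtime

    def receive(self, msg):
        print(f"{self.name} got {msg.performative}: {msg.content}")

platform = {}                   # stands in for the agent platform's registry
Agent("locator", platform)
platform["locator"].receive(
    ACLMessage("request", "app", "locator", "where am I?"))
```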


Pacific Rim International Conference on Artificial Intelligence | 2012

Hierarchical training of multiple SVMs for personalized web filtering

Maike Erdmann; Duc-Dung Nguyen; Tomoya Takeyoshi; Gen Hattori; Kazunori Matsumoto; Chihiro Ono

The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines can be used to identify hazardous Web content. However, scalability is a big challenge, especially if we have to train multiple classifiers, since different policies exist on what kind of information is hazardous. We therefore propose a transfer learning approach called Hierarchical Training for Multiple SVMs (HTMSVM). HTMSVM identifies common data among similar training sets and trains on the common data first, in order to obtain initial solutions. These initial solutions then reduce the time needed to train on the individual training sets without influencing classification accuracy. In an experiment in which we trained five Web content filters with 80% common and 20% inconsistently labeled training examples, HTMSVM was able to predict hazardous Web pages with a training time of only 26% to 41% of that of LibSVM, while achieving the same classification accuracy (more than 91%).
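
HTMSVM operates inside the SVM solver itself, which the abstract does not detail; as a loose analogue of "train on the common data first, then warm-start each filter", the sketch below uses scikit-learn's SGDClassifier with hinge loss (a linear SVM) and warm_start. The synthetic data and the three "policies" are assumptions.

```python
# Warm-starting policy-specific filters from a shared initial solution.
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_common = rng.normal(size=(800, 20))        # examples shared by all policies
y_common = (X_common[:, 0] > 0).astype(int)

# Train on the common data once to obtain an initial solution.
base = SGDClassifier(loss="hinge", warm_start=True, random_state=0)
base.fit(X_common, y_common)

# Each policy-specific filter starts from that solution, not from scratch.
for policy in range(3):
    X_own = rng.normal(size=(200, 20))       # policy-specific examples
    y_own = (X_own[:, 0] > 0.1 * policy).astype(int)
    clf = copy.deepcopy(base)                # carry over the common solution
    clf.fit(X_own, y_own)                    # warm-started refinement
    print(policy, round(clf.score(X_own, y_own), 3))
```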


International Universal Communication Symposium | 2010

Identification of malicious web pages for crawling based on network-related attributes of web server

Gen Hattori; Kazunori Matsumoto; Chihiro Ono; Yasuhiro Takishima

In this paper, we propose an identification algorithm of malicious Web pages for crawlers, which collect Web pages for the later task of detecting malicious Web pages based on their content. Recently, some organizations have had to automatically crawl Web pages with crawlers for later checking by humans. However, since manually checking Web pages is an expensive task, the total cost would be enormous if the crawlers collected Web pages indiscriminately. Some automatic checking systems can make the human task more efficient; however, they cannot increase the number of malicious Web pages collected. To solve these problems, we propose an efficient algorithm that determines, during crawling, whether a site is likely to include malicious or dangerous content. The feature of the algorithm is that it estimates the probability of a site being malicious or harmless from network-related attributes of the Web server derived from the URL string. The attributes are the domain name, the directory name, and the IP (Internet Protocol) address of the router nearest to the Web server. To confirm the effectiveness of the proposed algorithm, we conducted an evaluation experiment in a simulated environment, comparing the number of malicious Web pages collected by the proposed algorithm with that of a random sampling algorithm. The proposed algorithm collected up to 82.8% more malicious pages under stable conditions. We also show an example of crawling trajectories using the proposed algorithm and conventional crawling algorithms; the example shows that the proposed algorithm collects more malicious Web pages than the conventional algorithms.
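
As an illustration of scoring a URL by network-related attributes, the sketch below averages fabricated per-attribute malicious rates for the domain and directory; the real system learns these probabilities and also uses the nearest router's IP address, which is omitted here.

```python
# Toy malicious-probability score from URL-derived attributes.
from urllib.parse import urlparse

# Hypothetical P(malicious | attribute value), learned from labeled crawls.
domain_rate = {"example-ads.biz": 0.7, "example.edu": 0.01}
directory_rate = {"/free-download/": 0.6, "/research/": 0.02}

def malicious_score(url, prior=0.1):
    parts = urlparse(url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    scores = [domain_rate.get(parts.hostname, prior),
              directory_rate.get(directory, prior)]
    return sum(scores) / len(scores)   # crude average in place of a real model

# Crawl the highest-scoring candidates first.
urls = ["http://example-ads.biz/free-download/setup.exe",
        "http://example.edu/research/paper.pdf"]
for u in sorted(urls, key=malicious_score, reverse=True):
    print(round(malicious_score(u), 2), u)
```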


Australasian Joint Conference on Artificial Intelligence | 2010

Hazardous Document Detection Based on Dependency Relations and Thesaurus

Kazushi Ikeda; Tadashi Yanagihara; Gen Hattori; Kazunori Matsumoto; Yasuhiro Takishima

In this paper, we propose algorithms that increase the accuracy of hazardous Web page detection by correcting the detection errors of typical keyword-based algorithms, based on the dependency relations between hazardous keywords and their neighboring segments. Most typical text-based filtering systems ignore the context in which hazardous keywords appear. Our algorithms automatically obtain segment pairs that are in dependency relations and appear to characterize hazardous documents. In addition, we propose a practical approach to expanding segment pairs with a thesaurus. Experiments with a large number of Web pages show that our algorithms increase the detection F value by 7.3% compared to conventional algorithms.
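
The sketch below illustrates the segment-pair idea with adjacent-word pairs standing in for a real dependency parse; the hazardous pairs and the thesaurus entries are toy assumptions. Note how the thesaurus expansion fires while a bare keyword alone does not.

```python
# Score a document by hazardous (modifier, head) pairs with thesaurus expansion.
hazard_pairs = {("buy", "drug"), ("make", "weapon")}
thesaurus = {"purchase": "buy", "acquire": "buy", "gun": "weapon"}

def normalize(word):
    return thesaurus.get(word, word)

def pair_score(text):
    """Count hazardous pairs among adjacent normalized words."""
    words = [normalize(w) for w in text.lower().split()]
    return sum((a, b) in hazard_pairs for a, b in zip(words, words[1:]))

print(pair_score("how to purchase drug online"))   # 1: purchase -> buy
print(pair_score("the drug store sells aspirin"))  # 0: keyword alone not enough
```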


International Conference on Social Computing | 2013

Automatic Labeling of Training Data for Collecting Tweets for Ambiguous TV Program Titles

Maike Erdmann; Erik Ward; Kazushi Ikeda; Gen Hattori; Chihiro Ono; Yasuhiro Takishima

Twitter is a popular medium for sharing opinions on TV programs, and the analysis of TV-related tweets is attracting a lot of interest. However, when collecting all tweets containing a given TV program title, we obtain a large number of unrelated tweets, because many TV program titles are ambiguous. Using supervised learning, TV-related tweets can be collected with high accuracy. The goal of our proposed method is to automate the labeling process, in order to eliminate the cost of data labeling without sacrificing classification accuracy. When creating the training data, we use only tweets of unambiguous TV program titles. To decide whether a TV program title is ambiguous, we automatically determine whether it can be used as a common expression or named entity. In two experiments, in which we collected tweets for 32 ambiguous TV program titles, we achieved the same (78.2%) or even higher classification accuracy (79.1%) with automatically labeled training data as with manually labeled data, while effectively eliminating labeling costs.
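
As an illustration of the automatic-labeling idea, the sketch below trains a bag-of-words classifier on tweets mentioning unambiguous titles (auto-labeled positive) versus tweets with no title (auto-labeled negative), then applies it to tweets containing an ambiguous title; all example tweets are fabricated.

```python
# Auto-labeled training data for disambiguating TV-related tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tweets mentioning unambiguous titles: labeled 1 (TV-related) automatically.
tv_tweets = ["watching the finale tonight, great episode",
             "this drama's cast is amazing"]
# Tweets without any TV title: labeled 0, also collected automatically.
other_tweets = ["stuck in traffic again", "lunch was delicious"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(tv_tweets + other_tweets, [1, 1, 0, 0])

# Apply to tweets containing an ambiguous title such as "Friends".
print(clf.predict(["the episode with the cast reunion",
                   "meeting friends for lunch"]))
```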

Collaboration


Dive into Gen Hattori's collaboration.

Top Co-Authors

Hideki Asoh

National Institute of Advanced Industrial Science and Technology
