Yasuhito Asano | Researchain

Publication

Featured research published by Yasuhito Asano.

database and expert systems applications | 2003

Finding Neighbor Communities in the Web Using Inter-site Graph

Yasuhito Asano; Hiroshi Imai; Masashi Toyoda; Masaru Kitsuregawa

In recent years, link-based methods for information retrieval from the Web have been developed. These methods are built on a Web graph, which uses pages as vertices and Web-links as edges. Last year, the authors claimed that an inter-site graph, which uses sites as vertices and global-links (links between sites) as edges, is a more natural and useful framework for link-based information retrieval than a Web graph. They proposed directory-based sites as a new model of Web sites and established a method of identifying them from URL and Web-link data. Using URL and link data from the .jp domain, they confirmed that this method identifies directory-based sites almost correctly. In this paper, we show that this framework is also useful for information retrieval in response to a user’s query. We develop a system called Neighbor Community Finder (NCF). NCF finds Web communities related to given URLs by constructing, on demand, an inter-site graph from neighboring sites and links obtained from the real Web. Computational experiments show that in several cases NCF is a more effective tool for finding related pages than Google’s service.

IEICE Transactions on Information and Systems | 2008

Improvements of HITS Algorithms for Spam Links

Yasuhito Asano; Yu Tezuka; Takao Nishizeki

The HITS algorithm proposed by Kleinberg is one of the representative methods of scoring Web pages by using hyperlinks. When the algorithm was proposed, most of the pages it scored highly were genuinely related to a given topic, and hence the algorithm could be used to find related pages. However, the algorithm and its variants, including the improved HITS proposed by Bharat and Henzinger (abbreviated to BHITS), can no longer be used to find related pages on today’s Web because of an increase in spam links. In this paper, we first propose three methods to find “linkfarms,” that is, sets of spam links forming a densely connected subgraph of a Web graph. We then present an algorithm, called a trust-score algorithm, that gives high scores to pages that are, with high probability, not spam pages. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. We ascertain by experiments that one of them, named TaN+BHITS, which uses the trust-score algorithm and the method of finding linkfarms by employing name servers, is the most suitable for finding related pages on today’s Web. Our algorithms take no more time and memory than the original HITS algorithm, and can be executed on a PC with a small amount of main memory.
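
For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of the original HITS iteration as commonly described (mutually reinforcing hub and authority scores, renormalized each round). It is not BHITS, the trust-score algorithm, or TaN+BHITS, none of which are specified in the abstract. The function name `hits`, the iteration count, and the toy graph are assumptions made for illustration.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking in.
        auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        # Hub update: add up the authority scores of the pages linked to.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        # Rescale both vectors to unit Euclidean length so they stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(value * value for value in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical toy graph: three hub-like pages pointing at two authority-like pages.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
ranked = sorted(auth, key=auth.get, reverse=True)  # pages by authority score
```

In this toy graph the page with the most in-links from good hubs ranks first. A linkfarm exploits exactly this reinforcement: a densely connected set of spam links inflates the hub and authority scores of its own members.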

IEICE Transactions on Information and Systems | 2006

Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

Yasuhito Asano; Takao Nishizeki; Masashi Toyoda; Masaru Kitsuregawa

There are several methods for mining communities on the Web using hyperlinks. One of the well-known ones is a max-flow based method proposed by Flake et al. The method adopts a page-oriented framework, that is, it uses a page on the Web as a unit of information, like other methods including HITS and trawling. Recently, Asano et al. built a site-oriented framework which uses a site as a unit of information, and they experimentally showed that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is effective in mining communities through the max-flow based method. In this paper, we first point out several problems of the max-flow based method, mainly owing to the page-oriented framework, and then propose solutions to the problems by utilizing several advantages of the site-oriented framework. Computational experiments reveal that our max-flow based method on the site-oriented framework is very effective in mining communities, related to the topics of given pages, in comparison with the original max-flow based method on the page-oriented framework.
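
The page-oriented max-flow idea mentioned above can be sketched as follows. This is a simplified illustration in the spirit of the method attributed to Flake et al., not the site-oriented variant proposed in the paper. The capacity choices (links made bidirectional with capacity equal to the number of seeds, unit-capacity edges from non-seed pages to a virtual sink, unbounded edges from a virtual source to the seeds) and all names such as `max_flow_community` are assumptions for illustration. The max flow is computed with a plain Edmonds-Karp search.

```python
from collections import defaultdict, deque


def max_flow_community(links, seeds, edge_capacity=None):
    """Return the pages on the source side of a minimum cut around the seeds."""
    seeds = set(seeds)
    k = edge_capacity if edge_capacity is not None else len(seeds)
    source, sink = "_source_", "_sink_"  # virtual nodes (assumed names)
    residual = defaultdict(dict)

    def add_edge(u, v, capacity):
        residual[u][v] = residual[u].get(v, 0) + capacity
        residual[v].setdefault(u, 0)  # reverse entry for the residual graph

    pages = set()
    for u, v in links:
        pages.update((u, v))
        add_edge(u, v, k)  # every link is treated as bidirectional
        add_edge(v, u, k)
    for seed in seeds:
        add_edge(source, seed, float("inf"))
    for page in pages - seeds:
        add_edge(page, sink, 1)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: everything still reachable is the community.
            return set(parent) - {source}
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck


# Hypothetical example: two dense clusters joined by a single bridge link.
links = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a3", "b1"),
         ("b1", "b2"), ("b1", "b3"), ("b1", "b4"),
         ("b2", "b3"), ("b2", "b4"), ("b3", "b4")]
community = max_flow_community(links, seeds=["a1", "a2"])
```

With the seeds `a1` and `a2`, the cut falls on the bridge, so the community consists of the three pages of the first cluster.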

web age information management | 2003

Compact Encoding of the Web Graph Exploiting Various Power Laws

Yasuhito Asano; Tsuyoshi Ito; Hiroshi Imai; Masashi Toyoda; Masaru Kitsuregawa

Compact encodings of the web graph are required in order to keep the graph in main memory and to perform operations on the graph efficiently. Link2, the second version of the Link Database by Randall et al., which is part of the Connectivity Server, represented the adjacency list of each vertex by variable-length nybble codes of delta values. In this paper, we show that certain variables related to the web graph follow power-law distributions, and we explain from a statistical viewpoint why using variable-length nybble codes in Link2 led to a compact representation of the graph, on the basis of the relationship between power-law distributions and a generalization of the variable-length nybble code. In addition, we propose another encoding of the web graph based on this fact and this relationship, and compare it with Link2 and with the encoding proposed by Guillaume et al. in 2002. Although our encoding is slower than Link2, it is 10% more compact. It is also 20% more compact than the encoding proposed by Guillaume et al. and is comparable to it in terms of extraction time.
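
To make "variable-length nybble codes of delta values" concrete, here is a minimal sketch under stated assumptions. Each 4-bit nybble is assumed to carry three data bits plus one continuation flag; the flag position and polarity, the choice to store the first target as a gap from zero, and all function names are illustrative assumptions and may differ from the actual Link2 format or from the encoding proposed in the paper.

```python
def encode_nybbles(n):
    """Encode a non-negative integer as a list of 4-bit nybbles."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = []
    while True:
        groups.append(n & 0b111)  # take three data bits at a time
        n >>= 3
        if n == 0:
            break
    groups.reverse()  # most significant group first
    last = len(groups) - 1
    # Bit 3 is the (assumed) continuation flag: set on all but the last nybble.
    return [g | 0b1000 if i < last else g for i, g in enumerate(groups)]


def decode_nybbles(nybbles):
    """Decode a stream of nybbles back into the list of integers."""
    values, current = [], 0
    for nybble in nybbles:
        current = (current << 3) | (nybble & 0b111)
        if not nybble & 0b1000:  # flag clear: this integer is complete
            values.append(current)
            current = 0
    return values


def encode_adjacency(targets):
    """Encode a vertex's adjacency list as nybble-coded gaps between sorted IDs."""
    nybbles, previous = [], 0
    for target in sorted(targets):
        nybbles.extend(encode_nybbles(target - previous))
        previous = target
    return nybbles


def decode_adjacency(nybbles):
    """Rebuild the sorted adjacency list by summing the decoded gaps."""
    targets, previous = [], 0
    for delta in decode_nybbles(nybbles):
        previous += delta
        targets.append(previous)
    return targets


# Hypothetical adjacency list with mostly nearby vertex IDs.
adjacency = [1000, 1003, 1010, 5000]
encoded = encode_adjacency(adjacency)
```

Small gaps fit in a single nybble, which is why a code of this kind is compact when most gaps are small and large gaps are rare.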

web information systems engineering | 2002

Applying site information to information retrieval from the Web

Yasuhito Asano; Hiroshi Imai; Masashi Toyoda; Masaru Kitsuregawa

In recent years, several information retrieval methods using information about Web-links have been developed, such as HITS and trawling. To analyze Web-links separately as links inside each Web site (local-links) and links between Web sites (global-links) for information retrieval, a proper model of the Web site is required. In existing research, a Web server is used as a model of the Web site. This idea works relatively well when a Web site corresponds to a server, as is the case for public Web sites, but works poorly when multiple Web sites correspond to a server, as is the case for private Web sites on rental Web servers. We propose a new model of the Web site, the directory-based site, to handle typical private sites, and a method to identify such sites using information about the URL and Web-links. By computational experiments using jp-domain URL and Web-link data, containing over 23 million URLs and 100 million Web-links and collected from July to August 2000 by Toyoda and Kitsuregawa, we verify that the method can approximately identify, for 66% of over 110,000 servers, whether each server has multiple directory-based sites or not, and that it extracts over 500,000 directory-based sites and 4 million global-links. We also propose a new framework of Web-link based information retrieval that uses directory-based sites and global-links instead of Web pages and all Web-links, respectively, and examine the effectiveness of our framework by comparing a result of trawling on our framework with one on the existing framework.
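
The difference between the server-based and directory-based site models can be illustrated with a short sketch. The paper's actual identification method (which uses URL and Web-link information) is not described in the abstract and is not reproduced here. Instead, the sketch assumes that the set of servers hosting multiple sites is already known, and treats the first path directory on such a server as the site. The names `site_of`, `classify_links`, `multi_site_hosts`, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url, multi_site_hosts):
    """Map a URL to a site key: host, or host plus first directory."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in multi_site_hosts:
        # Treat the first path segment as a site only if it is a directory.
        first, separator, _ = parts.path.lstrip("/").partition("/")
        if first and separator:
            return f"{host}/{first}/"
    return host


def classify_links(links, multi_site_hosts):
    """Split (source_url, target_url) pairs into local-links and global-links."""
    local_links, global_links = [], []
    for source, target in links:
        same_site = site_of(source, multi_site_hosts) == site_of(target, multi_site_hosts)
        (local_links if same_site else global_links).append((source, target))
    return local_links, global_links


# Hypothetical links: two private sites on one rental server, plus a public site.
links = [
    ("http://rental.example.jp/~alice/index.html", "http://rental.example.jp/~alice/photos.html"),
    ("http://rental.example.jp/~alice/index.html", "http://rental.example.jp/~bob/index.html"),
    ("http://www.example.ac.jp/a.html", "http://www.example.ac.jp/lab/b.html"),
    ("http://www.example.ac.jp/a.html", "http://rental.example.jp/~bob/"),
]
# Server-based model: every host is a single site.
server_local, server_global = classify_links(links, multi_site_hosts=set())
# Directory-based model: the rental server is assumed to host several sites.
dir_local, dir_global = classify_links(links, multi_site_hosts={"rental.example.jp"})
```

Under the server-based model the link from `~alice` to `~bob` counts as a local-link, while the directory-based model counts it as a global-link between two private sites.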

web information systems engineering | 2005

Mining communities on the web using a max-flow and a site-oriented framework

Yasuhito Asano; Takao Nishizeki; Masashi Toyoda

There are several methods for mining communities on the Web using hyperlinks. One of the well-known ones is a max-flow based method proposed by Flake et al. The method adopts a page-oriented framework, that is, it uses a page on the Web as a unit of information, like other methods including HITS and trawling. Recently, Asano et al. built a site-oriented framework which uses a site as a unit of information, and they experimentally showed that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is effective in mining communities through the max-flow based method. In this paper, we first point out several problems of the max-flow based method, mainly owing to the page-oriented framework, and then propose solutions to the problems by utilizing several advantages of the site-oriented framework. Computational experiments reveal that our max-flow based method on the site-oriented framework is significantly effective in mining communities, related to the topics of given pages, in comparison with the original max-flow based method on the page-oriented framework.

graph drawing | 2003

Web-Linkage Viewer: Drawing Links in the Web Based on a Site-Oriented Framework

Yasuhito Asano; Takao Nishizeki

In recent years, link-based methods for information retrieval from the Web, such as HITS and trawling, have been developed. Since these methods utilize characteristic graph structures of links in the Web, analyzing the links by drawing them understandably will play an important role in link-based information retrieval. Moreover, Asano et al. [2], [3] have shown that a framework using a site as a unit of information is more natural and useful for link-based information retrieval than the existing framework using a page as a unit. Therefore, we should distinguish links inside a site (called local-links) from links between sites (called global-links) and analyze the graph structures of each. However, existing drawing tools, such as Gravis [1] and H3Viewer [5], do not distinguish local-links from global-links, and therefore they cannot draw the graph structures of these links understandably. In this paper, we propose a new drawing tool, named Web-linkage Viewer, in order to draw graph structures in the Web according to the site-oriented framework.

web age information management | 2002

Web-Linkage Viewer: Finding Graph Structures in the Web

Yasuhito Asano; Hiroshi Imai; Masashi Toyoda; Masaru Kitsuregawa

Web-linkage Viewer is a system that draws Web-links by dividing them into global-links and local-links. It places the top nodes of sites in the Web and the global-links on a spherical surface, and draws the local-links as trees in cones emanating from that surface, so that graph structures in the Web-links are displayed understandably, as shown in Figure 1. To define the global-links and the local-links, we define the site as follows, instead of relying on the ambiguous everyday concept.

Journal of The Operations Research Society of Japan | 2000