Publication


Featured research published by Baoning Wu.


International World Wide Web Conference (WWW) | 2006

Topical TrustRank: using topicality to combat web spam

Baoning Wu; Vinay Goel; Brian D. Davison

Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in the sense that the seed set used by TrustRank may not be sufficiently representative to cover well the different topics on the Web. Also, for a given seed set, TrustRank has a bias towards larger communities. We propose the use of topical information to partition the seed set and calculate trust scores for each topic separately to address the above issues. A combination of these trust scores for a page is used to determine its ranking. Experimental results on two large datasets show that our Topical TrustRank has a better performance than TrustRank in demoting spam sites or pages. Compared to TrustRank, our best technique can decrease spam from the top ranked sites by as much as 43.1%.
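The core idea of the abstract above — partition the seed set by topic, propagate trust separately for each topic, then combine the per-topic scores — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the graph representation, the plain summation used to combine scores, and all parameter values are assumptions.

```python
# Sketch of Topical TrustRank: per-topic seed-biased trust propagation,
# combined by a simple sum. Graph is an adjacency dict {node: [out-links]}.

def trust_rank(graph, seeds, alpha=0.85, iters=50):
    """Trust propagation: a personalized-PageRank-style iteration whose
    teleport distribution is concentrated on the trusted seed pages."""
    nodes = list(graph)
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(base)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * base[n] for n in nodes}
        for n, outs in graph.items():
            if outs:
                share = alpha * score[n] / len(outs)
                for m in outs:
                    nxt[m] += share
        score = nxt
    return score

def topical_trust_rank(graph, seeds_by_topic):
    """Run trust propagation once per topic on that topic's seed
    partition, then combine the per-topic scores for each page."""
    combined = {n: 0.0 for n in graph}
    for seeds in seeds_by_topic.values():
        for n, s in trust_rank(graph, seeds).items():
            combined[n] += s
    return combined

# Tiny example: a three-page cycle reachable from the seeds, plus an
# isolated page "d" that no seed community endorses.
g = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["d"]}
scores = topical_trust_rank(g, {"sports": {"a"}, "news": {"b"}})
```

Pages unreachable from every topical seed set end up with zero combined trust, which is exactly the demotion effect the abstract describes.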


International World Wide Web Conference (WWW) | 2006

Detecting semantic cloaking on the web

Baoning Wu; Brian D. Davison

By supplying different versions of a web page to search engines and to browsers, a content provider attempts to cloak the real content from the view of the search engine. Semantic cloaking refers to differences in meaning between pages which have the effect of deceiving search engine ranking algorithms. In this paper, we propose an automated two-step method to detect semantic cloaking pages based on different copies of the same page downloaded by a web crawler and a web browser. The first step is a filtering step, which generates a candidate list of semantic cloaking pages. In the second step, a classifier is used to detect semantic cloaking pages from the candidates generated by the filtering step. Experiments on manually labeled data sets show that we can generate a classifier with a precision of 93% and a recall of 85%. We apply our approach to links from the dmoz Open Directory Project and estimate that more than 50,000 of these pages employ semantic cloaking.
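The two-step structure described above — a cheap filter that produces candidates, followed by a classifier — can be sketched as below. The features and thresholds are purely illustrative assumptions; the paper's actual classifier is learned from labeled data, which this stand-in does not attempt to reproduce.

```python
# Sketch of a two-step cloaking detector over two copies of one page:
# one fetched as a crawler, one fetched as a browser.

def term_set(text):
    return set(text.lower().split())

def filter_step(crawler_copy, browser_copy, min_diff=5):
    """Step 1 (filter): flag pages whose two copies differ in at least
    min_diff terms, producing a candidate list for closer inspection."""
    a, b = term_set(crawler_copy), term_set(browser_copy)
    return len(a ^ b) >= min_diff

def classify_step(crawler_copy, browser_copy):
    """Step 2 (classifier): stand-in rule flagging pages where most
    terms shown to the search engine never appear in the browser copy."""
    a, b = term_set(crawler_copy), term_set(browser_copy)
    return len(a - b) / max(len(a), 1) > 0.5

def detect_semantic_cloaking(crawler_copy, browser_copy):
    return filter_step(crawler_copy, browser_copy) and \
           classify_step(crawler_copy, browser_copy)
```

The filter keeps the expensive second step off the vast majority of honest pages, where both copies are identical or nearly so.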


Adversarial Information Retrieval on the Web (AIRWeb) | 2007

Extracting link spam using biased random walks from spam seed sets

Baoning Wu; Kumar Chellapilla

Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such as PageRank, HITS, and other derivatives are especially vulnerable to link spam. Link farms and link exchanges are two common instances of link spam that produce spam communities -- i.e., clusters in the web graph. In this paper, we present a directed approach to extracting link spam communities when given one or more members of the community. In contrast to previous completely automated approaches to finding link spam, our method is specifically designed to be used interactively. Our approach starts with a small spam seed set provided by the user and simulates a random walk on the web graph. The random walk is biased to explore the local neighborhood around the seed set through the use of decay probabilities. Truncation is used to retain only the most frequently visited nodes. After termination, the nodes are sorted in decreasing order of their final probabilities and presented to the user. Experiments using manually labeled link spam data sets and random walks from a single seed domain show that the approach achieves over 95.12% precision in extracting large link farms and 80.46% precision in extracting link exchange centroids.
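The procedure in the abstract — a random walk started from a spam seed set, biased toward the seeds' neighborhood via decay (restart) probabilities, then truncated to the most-visited nodes — can be sketched as follows. The decay probability, walk length, and truncation size here are illustrative, not the paper's settings.

```python
# Sketch of seed-biased spam-community extraction by random walk.
import random

def extract_spam_community(graph, seed_set, decay=0.15, steps=20000,
                           keep=5, rng=None):
    rng = rng or random.Random(0)
    visits = {}
    node = rng.choice(sorted(seed_set))
    for _ in range(steps):
        visits[node] = visits.get(node, 0) + 1
        outs = graph.get(node, [])
        # Decay: with some probability (or at a dead end), restart at a
        # seed, keeping the walk inside the seeds' local neighborhood.
        if not outs or rng.random() < decay:
            node = rng.choice(sorted(seed_set))
        else:
            node = rng.choice(outs)
    # Truncation: keep only the most frequently visited nodes, sorted in
    # decreasing order of visit frequency for presentation to the user.
    return sorted(visits, key=visits.get, reverse=True)[:keep]

# Example: a densely linked three-node farm, with one stray link out to
# ordinary pages that the biased walk rarely lingers on.
g = {"farm1": ["farm2", "farm3"], "farm2": ["farm1", "farm3"],
     "farm3": ["farm1", "farm2", "good1"],
     "good1": ["good2"], "good2": []}
community = extract_spam_community(g, {"farm1"})
```

Because the restarts repeatedly pull the walk back to the seed, the farm members accumulate far more visits than pages outside the community, so they surface at the top of the truncated list.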


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Winnowing wheat from the chaff: propagating trust to sift spam from the web

Lan Nie; Baoning Wu; Brian D. Davison

The Web today includes many pages intended to deceive search engines and attain an unwarranted result ranking. Since the links among web pages are used to calculate authority, ranking systems would benefit from knowing which pages contain content to be trusted and which do not. We propose and compare various trust propagation methods to estimate the trustworthiness of each page. We find that a non-trust-preserving propagation method is able to achieve close to a fifty percent improvement over TrustRank in separating spam from non-spam pages.
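The distinction the abstract draws can be illustrated with one parameterized propagation step: a trust-preserving rule divides a parent's trust among its out-links, while a non-trust-preserving rule passes each child the full (decayed) parent trust. Graph shape, decay value, and iteration count below are illustrative assumptions, not the paper's configuration.

```python
# Sketch contrasting trust-preserving vs. non-trust-preserving
# propagation from a trusted seed over an adjacency-dict web graph.

def propagate(graph, seed_trust, decay=0.85, iters=20, preserve=True):
    trust = {n: seed_trust.get(n, 0.0) for n in graph}
    for _ in range(iters):
        nxt = {n: seed_trust.get(n, 0.0) for n in graph}
        for n, outs in graph.items():
            if not outs:
                continue
            # preserve=True splits the parent's trust across children;
            # preserve=False hands each child the full decayed amount.
            share = decay * trust[n] / len(outs) if preserve \
                else decay * trust[n]
            for m in outs:
                nxt[m] += share
        trust = nxt
    return trust

# Seed "s" is fully trusted; "c" is two hops away via two paths.
g = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
preserving = propagate(g, {"s": 1.0}, preserve=True)
non_preserving = propagate(g, {"s": 1.0}, preserve=False)
```

On this small DAG the non-preserving rule gives pages with many trusted in-links markedly higher trust than the splitting rule does, which hints at why it can separate spam from non-spam more sharply. (On cyclic graphs the non-preserving variant needs normalization or a small decay to stay bounded, which this sketch omits.)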


International World Wide Web Conference (WWW) | 2007

A cautious surfer for PageRank

Lan Nie; Baoning Wu; Brian D. Davison

This work proposes a novel cautious surfer to incorporate trust into the process of calculating authority for web pages. We evaluate a total of sixty queries over two large, real-world datasets to demonstrate that incorporating trust can improve PageRank's performance.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Ranking by community relevance

Lan Nie; Brian D. Davison; Baoning Wu

A web page may be relevant to multiple topics; even when nominally on a single topic, the page may attract attention (and thus links) from multiple communities. Instead of indiscriminately summing the authority provided by all pages, we decompose a web page into separate subnodes with respect to each community pointing to it. By considering the relevance of these communities, we are able to better model the query-specific reputation for each potential result. We apply a total of 125 queries to the TREC .GOV dataset to demonstrate how the use of community relevance can improve ranking performance.
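The central move in the abstract — weight each linking community's contribution by that community's relevance to the query, instead of indiscriminately summing all in-link authority — reduces to a relevance-weighted sum per page. The data layout and weights below are illustrative assumptions, not the paper's model.

```python
# Sketch of community-relevance ranking: authority flowing into a page
# is grouped by the community it comes from, then weighted by how
# relevant that community is to the current query.

def community_score(in_links_by_community, community_relevance):
    """in_links_by_community: {community: authority it contributes};
    community_relevance: {community: query relevance in [0, 1]}."""
    return sum(auth * community_relevance.get(comm, 0.0)
               for comm, auth in in_links_by_community.items())

# Two pages with identical raw in-link authority, endorsed by
# different communities, for a sports-related query.
relevance = {"sports": 1.0, "casino": 0.0}
on_topic = community_score({"sports": 2.0}, relevance)
off_topic = community_score({"casino": 2.0}, relevance)
```

Under a plain sum both pages would rank identically; weighting by community relevance lets the query-specific reputation of the on-topic page dominate.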


Internet Research | 2008

Connecting P2P to the web: Lessons from a prototype Gnutella‐WWW gateway

Brian D. Davison; Wei Zhang; Baoning Wu

Purpose – The purpose of this paper is to describe a means to improve the accessibility of files across different delivery platforms, making it possible to use a single search modality. The paper shows that both peer‐to‐peer file sharing networks and the worldwide web provide extensive information resources, and either network may contain data that satisfy a searcher's information need.

Design/methodology/approach – The paper proposes a gateway between the worldwide web and peer‐to‐peer networks that permits searchers on one side to seamlessly search and retrieve files on the other side of the gateway. The design and prototype implementation of such a gateway to Gnutella is detailed, along with access statistics from test deployments and lessons learned.

Findings – The prototype implementation was found to demonstrate the feasibility of a seamless gateway between the Gnutella network and the worldwide web. Gnutella users saw millions of web search results and initiated retrievals via the gateway ...


International World Wide Web Conference (WWW) | 2004

Lessons from a Gnutella-web gateway

Brian D. Davison; Wei Zhang; Baoning Wu

We present a gateway between the WWW and the Gnutella peer-to-peer network that permits searchers on one side to search and retrieve files on the other side of the gateway. This work improves the accessibility of files across different delivery platforms, making it possible to use a single search modality. We outline our design and implementation, present access statistics from a test deployment, and discuss lessons learned.


International World Wide Web Conference (WWW) | 2005

Identifying link farm spam pages

Baoning Wu; Brian D. Davison


Adversarial Information Retrieval on the Web (AIRWeb) | 2005

Cloaking and Redirection: A Preliminary Study

Baoning Wu; Brian D. Davison
