Na Dai | Researchain

Publication

Featured research published by Na Dai.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011

Learning to rank for freshness and relevance

Na Dai; Milad Shokouhi; Brian D. Davison

Freshness of results is important in modern web search. Failing to recognize the temporal aspect of a query can negatively affect the user experience and make the search engine appear stale. While freshness and relevance can be closely related for some topics (e.g., news queries), they are more independent in others (e.g., time-insensitive queries). Therefore, optimizing one criterion does not necessarily improve the other, and can even do harm in some cases. We propose a machine-learning framework for simultaneously optimizing freshness and relevance, in which the trade-off adapts automatically to the temporal characteristics of the query. We start by illustrating different temporal characteristics of queries, and the features that can be used for capturing these properties. We then introduce our supervised framework that leverages the temporal profile of queries (inferred from pseudo-feedback documents) along with the other ranking features to improve both freshness and relevance of search results. Our experiments on a large archival web corpus demonstrate the efficacy of our techniques.
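The idea of a query-dependent trade-off can be illustrated with a small sketch. This is not the supervised framework from the paper: it assumes a hand-set linear blend, and the names `blended_score`, `rank_documents`, and `temporal_weight` are hypothetical stand-ins for whatever a learned ranker would infer from the query's temporal profile.

```python
def blended_score(relevance: float, freshness: float, temporal_weight: float) -> float:
    """Linearly blend relevance and freshness.

    temporal_weight is a hypothetical value in [0, 1]: near 0 for
    time-insensitive queries, near 1 for strongly time-sensitive ones.
    """
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    return (1.0 - temporal_weight) * relevance + temporal_weight * freshness


def rank_documents(docs, temporal_weight):
    """Return document ids ordered by blended score, best first."""
    ordered = sorted(
        docs,
        key=lambda d: blended_score(d["relevance"], d["freshness"], temporal_weight),
        reverse=True,
    )
    return [d["id"] for d in ordered]


# Illustrative documents with made-up scores.
docs = [
    {"id": "old_authoritative", "relevance": 0.9, "freshness": 0.1},
    {"id": "new_report", "relevance": 0.6, "freshness": 0.95},
]

time_insensitive_order = rank_documents(docs, temporal_weight=0.1)
time_sensitive_order = rank_documents(docs, temporal_weight=0.8)
```

With these made-up scores, the older but more relevant page ranks first for the time-insensitive setting and the newer page ranks first for the time-sensitive one, which is the behavior the abstract describes at a high level.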

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Freshness matters: in flowers, food, and web authority

Na Dai; Brian D. Davison

The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obvious significant factor for ranking. However, traditional link-based web ranking algorithms typically run on a single web snapshot without concern for user activities associated with the dynamics of web pages and links. Therefore, a stale page popular many years ago may still achieve a high authority score due to its accumulated in-links. To remedy this situation, we propose a temporal web link-based ranking scheme, which incorporates features from historical author activities. We quantify web page freshness over time from page and in-link activity, and design a web surfer model that incorporates web freshness, based on a temporal web graph composed of multiple web snapshots at different time points. It includes authority propagation among snapshots, enabling link structures at distinct time points to influence each other when estimating web page authority. Experiments on a real-world archival web corpus show our approach improves upon PageRank in both relevance and freshness of the search results.
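A simplified sketch of a freshness-aware random surfer follows. It is an assumption-laden illustration, not the model from the paper: it uses a single snapshot rather than a temporal graph of multiple snapshots, takes per-page freshness scores as given, and biases both the teleport step and the link-following step toward fresher pages. The function name `freshness_pagerank` and its parameters are hypothetical.

```python
def freshness_pagerank(links, freshness, damping=0.85, iterations=50):
    """Power iteration for a PageRank variant biased toward fresh pages.

    links: dict mapping each page to a list of pages it links to.
    freshness: dict mapping every page to a positive freshness score
               (assumed to be computed elsewhere).
    """
    pages = sorted(freshness)
    total_fresh = sum(freshness.values())
    # Teleport distribution proportional to freshness.
    teleport = {p: freshness[p] / total_fresh for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its mass using the teleport distribution.
                for p in pages:
                    new_rank[p] += damping * rank[src] * teleport[p]
                continue
            # Follow out-links with probability proportional to target freshness.
            weight_sum = sum(freshness[t] for t in targets)
            for t in targets:
                new_rank[t] += damping * rank[src] * freshness[t] / weight_sum
        rank = new_rank
    return rank


# Toy graph: page "a" links to one stale and one fresh page.
links = {"a": ["stale", "fresh"], "stale": ["a"], "fresh": ["a"]}
freshness = {"a": 1.0, "stale": 0.1, "fresh": 0.9}
scores = freshness_pagerank(links, freshness)
```

When every page has the same freshness score, this sketch reduces to ordinary PageRank with a uniform teleport vector, so the freshness scores are the only source of the bias.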

Adversarial Information Retrieval on the Web | 2009

Looking into the past to better classify web spam

Na Dai; Brian D. Davison; Xiaoguang Qi

Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue that historical web page information may also be important in spam classification. In this paper, we use content features from historical versions of web pages to improve spam classification. We use supervised learning techniques to combine classifiers based on current page content with classifiers based on temporal features. Experiments on the WEBSPAM-UK2007 dataset show that our approach improves spam classification F-measure performance by 30% compared to a baseline classifier which only considers current page content.

European Conference on Information Retrieval | 2010

Mining anchor text trends for retrieval

Na Dai; Brian D. Davison

Anchor text has been considered a useful resource to complement the representation of target pages and is broadly used in web search. However, previous research only uses anchor text from a single snapshot to improve web search. Historical trends of anchor text importance have not been well modeled in anchor text weighting strategies. In this paper, we propose a novel temporal anchor text weighting method to incorporate the trends of anchor text creation over time, which combines historical weights of anchor text by propagating the anchor text weights among snapshots over the time axis. We evaluate our method on a real-world web crawl from the Stanford WebBase. Our results demonstrate that the proposed method can produce a significant improvement in ranking quality.
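One way to picture propagating weights among snapshots is a decayed average over the time axis. The sketch below is only an assumption about what such a scheme could look like; the abstract does not specify the propagation rule, and the function `propagate_weights` and its `decay` parameter are hypothetical.

```python
def propagate_weights(snapshot_weights, decay=0.5):
    """Smooth per-snapshot anchor text weights along the time axis.

    snapshot_weights: weights of one anchor text for one target page,
                      ordered from the oldest snapshot to the newest.
    decay: hypothetical factor in (0, 1]; a snapshot k steps away
           contributes with weight decay ** k.
    """
    n = len(snapshot_weights)
    smoothed = []
    for t in range(n):
        # Kernel centered on snapshot t, decaying with temporal distance.
        kernel = [decay ** abs(t - s) for s in range(n)]
        norm = sum(kernel)
        value = sum(k * w for k, w in zip(kernel, snapshot_weights)) / norm
        smoothed.append(value)
    return smoothed


# An anchor text that appears only in the middle snapshot.
spike = propagate_weights([0.0, 1.0, 0.0], decay=0.5)
# An anchor text with a constant weight in every snapshot.
steady = propagate_weights([0.4, 0.4, 0.4], decay=0.5)
```

A weight that is constant across snapshots is left unchanged, while a one-off spike is spread to neighboring snapshots, so persistent anchor text ends up favored over short-lived anchor text.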

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Capturing page freshness for web search

Na Dai; Brian D. Davison

Commercial search engines increasingly recognize freshness as an important criterion for measuring the quality of search results. However, most information retrieval methods focus on the relevance of page content to given queries without considering recency. In this work, we mine page freshness from web user maintenance activities and incorporate this feature into web search. We first quantify how fresh the web is over time from two distinct perspectives (the page itself and its in-linked pages) and then exploit a temporal correlation between the two types of freshness measures to quantify the confidence of page freshness. Results demonstrate that page freshness can be better quantified when combined with temporal freshness correlation. Experiments on a real-world archival web corpus show that incorporating the combined page freshness into the search process can improve ranking performance significantly on both relevance and freshness.

Online Information Review | 2011

Topic‐sensitive search engine evaluation

Na Dai; Brian D. Davison

Purpose – This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation. Design/methodology/approach – The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retrieval systems are quite different. Findings – It is found that search engines tend to perform similarly on queries about the same topic; and search engine performance is sensitive to the topic distribution of queries used in evaluation. Originality/value – Using experiments with multiple real‐world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.
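The reweighting step can be sketched as a topic-weighted mean of per-query scores. This is a minimal illustration under stated assumptions, not the authors' exact procedure: each query is assumed to carry a single topic label, the per-query scores are made up, and the function `topic_weighted_mean` is hypothetical.

```python
from collections import Counter


def topic_weighted_mean(scores, topics, target_dist):
    """Average per-query scores so that topics match a target distribution.

    scores: dict query id -> effectiveness score of one system.
    topics: dict query id -> topic label (one label per query, an assumption).
    target_dist: dict topic -> desired probability mass of that topic.
    """
    counts = Counter(topics[q] for q in scores)
    total = 0.0
    mass = 0.0
    for query, score in scores.items():
        topic = topics[query]
        # Each topic's mass is split evenly among its queries.
        weight = target_dist.get(topic, 0.0) / counts[topic]
        total += weight * score
        mass += weight
    if mass == 0.0:
        raise ValueError("target distribution has no mass on the evaluated topics")
    return total / mass


topics = {"q1": "sports", "q2": "sports", "q3": "health"}
system_a = {"q1": 0.8, "q2": 0.6, "q3": 0.2}  # made-up scores
system_b = {"q1": 0.4, "q2": 0.4, "q3": 0.9}  # made-up scores

sports_heavy = {"sports": 0.9, "health": 0.1}
health_heavy = {"sports": 0.1, "health": 0.9}

a_sports = topic_weighted_mean(system_a, topics, sports_heavy)
b_sports = topic_weighted_mean(system_b, topics, sports_heavy)
a_health = topic_weighted_mean(system_a, topics, health_heavy)
b_health = topic_weighted_mean(system_b, topics, health_heavy)
```

In this toy setup the two systems swap places when the target distribution changes from sports-heavy to health-heavy, which mirrors the finding that system rankings depend on the topic mix of the evaluation queries.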

European Conference on Information Retrieval | 2010

Mining neighbors' topicality to better control authority flow

Na Dai; Brian D. Davison; Yaoshuang Wang

Web pages are often recognized by others through contexts. These contexts determine how linked pages influence and interact with each other. When differentiating such interactions, the authority of web pages can be better estimated by controlling the authority flows among pages. In this work, we determine the authority distribution by examining the topicality relationship between associated pages. In addition, we find it is not enough to quantify the influence of authority propagation from only one type of neighbor, such as parent pages in the PageRank algorithm, since web pages, like people, are influenced by diverse types of neighbors within the same network. We propose a probabilistic method to model authority flows from different sources of neighbor pages. In this way, we distinguish page authority interaction by incorporating the topical context and the relationship between associated pages. Experiments on the 2003 and 2004 TREC Web Tracks demonstrate that this approach outperforms other competitive topical ranking models and produces a more than 10% improvement over PageRank on the quality of top 10 search results. As more types of neighbor sources are incorporated, performance improves steadily.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011

Multi-objective optimization in learning to rank

Na Dai; Milad Shokouhi; Brian D. Davison

Supervised learning to rank algorithms typically optimize for high relevance and ignore other facets of search quality, such as freshness and diversity. Prior work on multi-objective ranking trained rankers on hybrid labels that combine the overall quality of documents and implicitly incorporate multiple criteria into quantifying ranking risks. However, these hybrid scores are usually generated based on heuristics without considering potential correlations between individual facets (e.g., freshness versus relevance). In this poster, we empirically demonstrate that the correlation between objective facets in multi-criteria ranking optimization may significantly influence the effectiveness of trained rankers with respect to each objective.

Conference on Information and Knowledge Management | 2009

Vetting the links of the web

Na Dai; Brian D. Davison

Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, well-known directories such as the dmoz Open Directory Project, which maintains links to representative and authoritative external web pages within their various topics. Therefore, such sites involve many editors to manually revisit and revise links that have become out-of-date. To remedy this situation, we propose the novel web mining task of identifying outdated links on the web. We build a general classification model, primarily using local and global temporal features extracted from historical content, topic, link and time-focused changes over time. We evaluate our system via five-fold cross-validation on more than fifteen thousand ODP external links selected from thirteen top-level categories. Our system can predict the actions of ODP editors more than 75% of the time. Our models and predictions could be useful for various applications that depend on analysis of web links, including ranking and crawling.

ACM Conference on Hypertext | 2011

Bridging link and query intent to enhance web search

Na Dai; Xiaoguang Qi; Brian D. Davison

Collaboration

Dive into Na Dai's collaborations.

Top Co-Authors

Brian D. Davison

Lehigh University

Xiaoguang Qi

Lehigh University

Milad Shokouhi

Microsoft

Yaoshuang Wang

Lehigh University
