Publication


Featured research published by David Ahn.


Conference on Information and Knowledge Management | 2009

Feature selection for ranking using boosted trees

Feng Pan; Tim Converse; David Ahn; Franco Salvetti; Gianluca Donato

Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time, only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (selecting the features with the highest relative importance as computed by boosted trees; discounting importance by feature similarity) and a randomized approach. We evaluate and compare these approaches using data from a commercial search engine. The experimental results show that the proposed randomized feature selection with feature-importance-based backward elimination outperforms the greedy approaches and, with 30 features, achieves relevance comparable to that of a full-feature model trained with 419 features and the same modeling parameters.


Computer and Information Technology | 2011

Greedy and Randomized Feature Selection for Web Search Ranking

Feng Pan; Tim Converse; David Ahn; Franco Salvetti; Gianluca Donato

Modern search engines have to be fast to satisfy users, so there are hard back-end latency requirements. The set of features useful for search ranking functions, though, continues to grow, making feature computation a latency bottleneck. As a result, not all available features can be used for ranking, and in fact, much of the time only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. To this end, we explore different feature selection methods using boosted regression trees, including both greedy approaches (i.e., selecting the features with the highest relative influence as computed by boosted trees, discounting importance by feature similarity) and randomized approaches (i.e., best-only genetic algorithm, a proposed more efficient randomized method with feature-importance-based backward elimination). We evaluate and compare these approaches using two data sets, one from a commercial Wikipedia search engine and the other from a commercial Web search engine. The experimental results show that the greedy approach that selects top features with the highest relative influence performs close to the full-feature model, and the randomized feature selection with feature-importance-based backward elimination outperforms all other randomized and greedy approaches, especially on the Wikipedia data.
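
The greedy approach described in this abstract (rank features by importance, discounting by similarity to features already chosen) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the discount formula, so the rule used here (scale importance by one minus the largest similarity to any selected feature) is an assumption, as are the function name `greedy_select`, the `discount` parameter, and the example feature names and values. In practice the importance scores would come from a trained boosted-tree model.

```python
from typing import Dict, List, Tuple


def greedy_select(
    importance: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    k: int,
    discount: float = 1.0,
) -> List[str]:
    """Pick up to k features greedily, penalizing similarity to chosen ones."""
    selected: List[str] = []
    remaining = dict(importance)

    def pair_sim(a: str, b: str) -> float:
        # Similarity is treated as symmetric; missing pairs count as 0.
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    while remaining and len(selected) < k:
        def score(name: str) -> float:
            # Assumed discount rule (hypothetical, not from the paper):
            # shrink importance by the largest similarity to a selected feature.
            max_sim = max((pair_sim(name, s) for s in selected), default=0.0)
            return remaining[name] * (1.0 - discount * max_sim)

        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical importance scores and pairwise similarities.
importance = {"bm25": 0.5, "bm25_title": 0.45, "pagerank": 0.3, "url_depth": 0.1}
similarity = {("bm25", "bm25_title"): 0.9}

with_discount = greedy_select(importance, similarity, k=2)
without_discount = greedy_select(importance, similarity, k=2, discount=0.0)
print(with_discount)
print(without_discount)
```

With `discount=0.0` the sketch reduces to plain top-k selection by importance; with the default it skips a feature that is nearly redundant with one already selected.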


International Conference on Weblogs and Social Media | 2010

TweetMotif: Exploratory Search and Topic Summarization for Twitter

Brendan O'Connor; Michel Krieger; David Ahn


Archive | 2011

Search result entry truncation using pixel-based approximation

Daniel Marantz; Keith Alan Regier; Tejas Nadkarni; David Ahn; Gianluca Donato


Archive | 2010

Identifying a topic-relevant subject

David Ahn; Michael Paul Bieniosek; Franco Salvetti; Giovanni Lorenzo Thione; Ian Robert Collins; Toby Takeo Sterrett


Archive | 2008

Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System

Richard S. Crouch; Martin Henk van den Berg; Franco Salvetti; Giovanni Lorenzo Thione; David Ahn


Archive | 2008

Identification of semantic relationships within reported speech

Richard S. Crouch; Martin Henk van den Berg; David Ahn; Olga Gurevich; Barney Pell; Livia Polanyi; Scott Prevost; Giovanni Lorenzo Thione


Archive | 2010

Generating snippets based on content features

Valerie Rose Nygaard; Riccardo Turchetto; Joanna Mun Yee Chan; Christian Biemann; David Ahn; Andrea Burbank; Feng Pan; Timothy Mcdonnell Converse; James Michael Reinhold; Tracy Holloway King


Archive | 2010

Query pattern generation for answers coverage expansion

Franco Salvetti; Ying Tu; David Ahn


Archive | 2011

Automatic information presentation of data and actions in search results

Krishnan Thazhathekalam; David Ahn; Andrea Burbank; Franco Salvetti; Christopher Jon Jewell

Collaboration


Dive into David Ahn's collaborations.

Top Co-Authors


Feng Pan

University of North Carolina at Chapel Hill
