Publication


Featured research published by Jaime Arguello.


Computer-Supported Collaborative Learning | 2008

Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning.

Carolyn Penstein Rosé; Yi-Chia Wang; Yue Cui; Jaime Arguello; Karsten Stegmann; Armin Weinberger; Frank Fischer

In this article, we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it less arduous to obtain insights from corpus data. This endeavor also holds the potential for substantially improved online instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. We report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus analyzed by human coders using a theory-based, multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues, such as reliability, validity, and efficiency, that should be considered when deciding whether to adopt a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important part of making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions the CSCL community is interested in.
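
The article is summarized above without implementation detail. As a rough illustration of the kind of text-classification pipeline it describes (this is not TagHelper itself; the discourse codes and utterances below are invented placeholders), a classifier can be trained to assign codes to dialogue segments:

    # Illustrative sketch only: a text classifier that assigns discourse codes
    # to segments of collaborative-learning dialogue. This is NOT TagHelper;
    # the labels and example utterances are invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical coded training data: (segment text, discourse code).
    segments = [
        ("I think the force acts in the opposite direction", "claim"),
        ("Why do you think that?", "question"),
        ("Because the experiment in part 2 showed it", "grounds"),
        ("Okay, let's write that down", "coordination"),
    ]
    texts, codes = zip(*segments)

    # Word n-grams stand in for the hand-crafted linguistic "pattern detector"
    # features discussed in the article.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, codes)

    print(model.predict(["What makes you say that?"]))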


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Sources of evidence for vertical selection

Jaime Arguello; Fernando Diaz; Jamie Callan; Jean Francois Crespo

Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection: predicting the relevant verticals (if any) for queries issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independently of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We consider 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight for future work on vertical selection.
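
As a hedged sketch of the general setup, and not the paper's actual features or model, vertical selection can be framed as one binary classifier per candidate vertical, with feature columns standing in for the three evidence sources listed above:

    # Minimal sketch of vertical selection as one binary classifier per vertical.
    # The features are hypothetical stand-ins for the three evidence sources
    # described above: a query-string cue, a vertical query-log prior, and a
    # retrieval score against a sample of vertical content.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    verticals = ["news", "images", "jobs"]

    # Synthetic training data: one feature vector per (query, vertical) pair.
    X = {v: rng.random((200, 3)) for v in verticals}
    y = {v: (X[v][:, 1] + X[v][:, 2] > 1.0).astype(int) for v in verticals}

    models = {v: LogisticRegression().fit(X[v], y[v]) for v in verticals}

    def select_verticals(features_by_vertical, threshold=0.5):
        """Return the verticals (possibly none) predicted relevant for a query."""
        return [v for v, x in features_by_vertical.items()
                if models[v].predict_proba(np.asarray(x).reshape(1, -1))[0, 1] > threshold]

    print(select_verticals({"news": [0.9, 0.8, 0.7],
                            "images": [0.1, 0.2, 0.1],
                            "jobs": [0.3, 0.9, 0.6]}))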


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Retrieval and feedback models for blog feed search

Jonathan L. Elsas; Jaime Arguello; Jamie Callan; Jaime G. Carbonell

Blog feed search poses different and interesting challenges compared to traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents: the blog posts. In this work, we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best-performing submissions in the TREC 2007 Blog Distillation task [12]. We also show that typical query expansion techniques, such as pseudo-relevance feedback using the blog corpus, do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This technique provides significant and consistent performance improvements, yielding 22% and 14% improvements in MAP over the unexpanded query for our baseline and federated algorithms, respectively.
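
The paper's Wikipedia link-based expansion model is not reproduced here; the sketch below only illustrates the generic pattern of interpolated query expansion that such techniques plug into, with hypothetical expansion terms and weights:

    # Generic sketch of interpolated query expansion (not the paper's exact
    # Wikipedia link-based model): expansion terms, however they are mined,
    # are mixed with the original query under an interpolation weight.
    from collections import Counter

    def expand_query(original_terms, expansion_weights, orig_weight=0.7, k=10):
        """Combine original query terms with weighted expansion terms.

        original_terms: list of query terms.
        expansion_weights: dict term -> score (e.g., mined from the link or
            anchor structure of articles retrieved for the query; hypothetical here).
        Returns a dict term -> weight usable by a weighted-query retrieval model.
        """
        query = Counter({t: orig_weight / len(original_terms) for t in original_terms})
        top = sorted(expansion_weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
        total = sum(w for _, w in top) or 1.0
        for term, w in top:
            query[term] += (1.0 - orig_weight) * (w / total)
        return dict(query)

    print(expand_query(["triathlon", "training"],
                       {"ironman": 3.0, "swimming": 2.0, "cycling": 2.0, "running": 1.5}))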


European Conference on Information Retrieval | 2011

A methodology for evaluating aggregated search results

Jaime Arguello; Fernando Diaz; Jamie Callan; Ben Carterette

Aggregated search is the task of incorporating results from different specialized search services, or verticals, into Web search results. While most prior work focuses on deciding which verticals to present, the task of deciding where in the Web results to embed the vertical results has received less attention. We propose a methodology for evaluating an aggregated set of results. Our method elicits a relatively small number of human judgments for a given query and then uses these to facilitate a metric-based evaluation of any possible presentation for the query. An extensive user study with 13 verticals confirms that, when users prefer one presentation of results over another, our metric agrees with the stated preference. Using Amazon's Mechanical Turk, we show that reliable assessments can be obtained quickly and inexpensively.
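
The paper defines its own metric; the sketch below only illustrates the kind of validation the abstract describes, checking how often an arbitrary (here, toy) presentation metric agrees with stated pairwise user preferences:

    # Illustrative sketch (not the paper's metric): given some metric that
    # scores a presentation of aggregated results, measure how often its
    # scores agree with stated pairwise user preferences.
    def preference_agreement(metric, preference_pairs):
        """preference_pairs: list of (preferred_presentation, other_presentation)."""
        agree = sum(1 for better, worse in preference_pairs if metric(better) > metric(worse))
        return agree / len(preference_pairs)

    # Toy metric: discounted sum of per-block relevance grades by display position.
    def toy_metric(presentation):
        return sum(grade / (rank + 1) for rank, grade in enumerate(presentation))

    # Hypothetical graded result blocks for two presentation pairs.
    pairs = [([3, 2, 0], [0, 2, 3]), ([2, 2, 1], [1, 0, 2])]
    print(preference_agreement(toy_metric, pairs))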


Conference on Information and Knowledge Management | 2011

Learning to aggregate vertical results into web search results

Jaime Arguello; Fernando Diaz; Jamie Callan

Aggregated search is the task of integrating results from potentially multiple specialized search services, or verticals, into the Web search results. The task requires predicting not only which verticals to present (the focus of most prior research), but also where in the Web results to present them (i.e., above or below the Web results, or somewhere in between). Learning models to aggregate results from multiple verticals is associated with two major challenges. First, because verticals retrieve different types of results and address different search tasks, results from different verticals are associated with different types of predictive evidence (or features). Second, even when a feature is common across verticals, its predictiveness may be vertical-specific. Therefore, approaches to aggregating vertical results require handling an inconsistent feature representation across verticals and, potentially, a vertical-specific relationship between features and relevance. We present three general approaches that address these challenges in different ways and compare their results across a set of 13 verticals and 1,070 queries. We show that the best approaches are those that allow the learning algorithm to learn a vertical-specific relationship between features and relevance.
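
The following sketch, on synthetic data rather than the paper's features or models, illustrates why a vertical-specific relationship between features and relevance matters: a single shared model must use one weight per feature across verticals, while per-vertical models can learn opposite relationships:

    # Synthetic demonstration: one feature is positively predictive of relevance
    # for one vertical and negatively predictive for another. A shared model
    # (with only a vertical indicator) cannot flip the feature's sign per
    # vertical; per-vertical models can.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 500
    X_news, X_images = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
    y_news = (X_news[:, 0] > 0).astype(int)      # positively predictive for news
    y_images = (X_images[:, 0] < 0).astype(int)  # negatively predictive for images

    # Shared model: pool the data and add only a vertical-indicator column.
    X_shared = np.vstack([np.hstack([X_news, np.zeros((n, 1))]),
                          np.hstack([X_images, np.ones((n, 1))])])
    y_shared = np.concatenate([y_news, y_images])
    shared = LogisticRegression().fit(X_shared, y_shared)

    # Per-vertical models: one classifier per vertical.
    per_vertical = {"news": LogisticRegression().fit(X_news, y_news),
                    "images": LogisticRegression().fit(X_images, y_images)}

    print("shared model accuracy:", shared.score(X_shared, y_shared))
    print("per-vertical accuracy:",
          np.mean([per_vertical["news"].score(X_news, y_news),
                   per_vertical["images"].score(X_images, y_images)]))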


Conference on Information and Knowledge Management | 2009

Classification-based resource selection

Jaime Arguello; Jamie Callan; Fernando Diaz

In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs, for example, when searching a distributed index or aggregating content for web search. Resource selection refers to the subtask of deciding, given a query, which collections to search. Most existing resource selection methods rely on evidence found in collection content. We present an approach to resource selection that combines multiple sources of evidence to inform the selection decision. We derive evidence from three different sources: collection documents, the topic of the query, and query click-through data. We combine this evidence by treating resource selection as a multiclass machine learning problem. Although machine-learned approaches often require large amounts of manually generated training data, we present a method for using automatically generated training data. We make use of and compare against prior resource selection work and evaluate across three experimental testbeds.
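
As a hedged sketch of the general recipe (the labeling step and features here are hypothetical simplifications, not the paper's), resource selection can be cast as multiclass classification over collections using automatically generated training labels:

    # Sketch of classification-based resource selection with automatically
    # generated training data. The auto-labeling step is only simulated here
    # with synthetic features and labels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    collections = ["web", "news", "wiki"]

    # Step 1 (hypothetical auto-labeling): label each training query with a
    # target collection, e.g., the one whose sampled documents rank highest in
    # a centralized retrieval run. Simulated below.
    X_train = rng.random((300, 4))   # e.g., query-string, topic, and click features
    y_train = rng.integers(0, len(collections), size=300)

    # Step 2: a multiclass classifier over collections.
    selector = LogisticRegression().fit(X_train, y_train)

    def select_collections(x, k=2):
        """Rank collections by predicted probability and return the top k."""
        probs = selector.predict_proba(np.asarray(x).reshape(1, -1))[0]
        order = np.argsort(probs)[::-1][:k]
        return [collections[selector.classes_[i]] for i in order]

    print(select_collections([0.2, 0.7, 0.1, 0.5]))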


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Adaptation of offline vertical selection predictions in the presence of user feedback

Fernando Diaz; Jaime Arguello

Web search results often integrate content from specialized corpora known as verticals. Given a query, one important aspect of aggregated search is the selection of relevant verticals from a set of candidates. One drawback of previous approaches to vertical selection is that they have not explicitly modeled user feedback. However, production search systems often record a variety of feedback information. In this paper, we present algorithms for vertical selection that adapt to user feedback. We evaluate the algorithms using a novel simulator that models the performance of a vertical selector situated in realistic query traffic.
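
The paper's algorithms and simulator are not reproduced here; the sketch below only illustrates the general idea of adapting an offline-trained vertical-selection model with incremental updates from click-style feedback, using synthetic data:

    # Sketch of online adaptation from user feedback (not the paper's algorithm):
    # an offline-trained model is updated incrementally as feedback arrives.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(3)

    # Offline phase: a model trained on editorial labels (synthetic here).
    X_offline = rng.random((500, 3))
    y_offline = (X_offline[:, 0] > 0.5).astype(int)
    model = SGDClassifier(loss="log_loss", random_state=0)
    model.partial_fit(X_offline, y_offline, classes=np.array([0, 1]))

    # Online phase: each time the vertical is shown, treat a click as a
    # positive label and a skip as a negative one, and take one incremental step.
    def update_from_feedback(features, clicked):
        model.partial_fit(np.asarray(features).reshape(1, -1), [int(clicked)])

    update_from_feedback([0.9, 0.2, 0.4], clicked=True)
    update_from_feedback([0.1, 0.8, 0.3], clicked=False)
    print(model.predict_proba(np.array([[0.9, 0.2, 0.4]]))[0, 1])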


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Vertical selection in the presence of unlabeled verticals

Jaime Arguello; Fernando Diaz; Jean François Paiement

Vertical aggregation is the task of incorporating results from specialized search engines, or verticals (e.g., images, video, news), into Web search results. Vertical selection is the subtask of deciding, given a query, which verticals, if any, are relevant. State-of-the-art approaches use machine-learned models to predict which verticals are relevant to a query. When trained using a large set of labeled data, a machine-learned vertical selection model outperforms baselines that require no training data. Unfortunately, whenever a new vertical is introduced, a costly new set of editorial data must be gathered. In this paper, we propose methods for reusing training data from a set of existing (source) verticals to learn a predictive model for a new (target) vertical. We study methods for learning robust, portable, and adaptive cross-vertical models. Experiments show the need to focus on different types of features when maximizing portability (the ability of a single model to make accurate predictions across multiple verticals) than when maximizing adaptability (the ability of a model to make accurate predictions for a specific target vertical). We demonstrate the efficacy of our methods through extensive experimentation on 11 verticals.
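
As an illustration of the portability versus adaptability distinction on synthetic data (not the paper's features, verticals, or methods), a model trained only on source verticals can be applied to a new target vertical as-is, or updated with a small labeled sample from it:

    # Portability vs. adaptability sketch on synthetic verticals.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(4)

    def make_vertical(n, shift):
        """Synthetic (features, labels) for a vertical with a distribution shift."""
        X = rng.normal(size=(n, 3)) + shift
        y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
        return X, y

    sources = [make_vertical(400, s) for s in (0.0, 0.5, 1.0)]   # existing verticals
    X_target, y_target = make_vertical(400, 2.0)                 # new vertical

    X_src = np.vstack([X for X, _ in sources])
    y_src = np.concatenate([y for _, y in sources])

    # Portable use: train on pooled source verticals, apply to the target as-is.
    portable = SGDClassifier(loss="log_loss", random_state=0)
    portable.partial_fit(X_src, y_src, classes=np.array([0, 1]))
    print("portable model on target:", portable.score(X_target, y_target))

    # Adaptive use: continue training with a small labeled sample from the target.
    adapted = SGDClassifier(loss="log_loss", random_state=0)
    adapted.partial_fit(X_src, y_src, classes=np.array([0, 1]))
    adapted.partial_fit(X_target[:50], y_target[:50])
    print("adapted model on target:", adapted.score(X_target[50:], y_target[50:]))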


International Conference on the Theory of Information Retrieval | 2015

Development and Evaluation of Search Tasks for IIR Experiments using a Cognitive Complexity Framework

Diane Kelly; Jaime Arguello; Ashlee Edwards; Wan Ching Wu

One of the most challenging aspects of designing interactive information retrieval (IIR) experiments with users is the development of search tasks. We describe an evaluation of 20 search tasks that were designed for use in IIR experiments and developed using a cognitive complexity framework from educational theory. The search tasks represent five levels of cognitive complexity and four topical domains. The tasks were evaluated in the context of a laboratory IIR experiment with 48 participants. Behavioral and self-report data were used to characterize and understand differences among tasks. Results showed that more cognitively complex tasks required significantly more search activity from participants (e.g., more queries, clicks, and time to complete). However, participants did not evaluate more cognitively complex tasks as more difficult and were equally satisfied with their performance across tasks. Our work makes four contributions: (1) it adds to what is known about the relationships among task, search behaviors, and user experience; (2) it presents a framework for task creation and evaluation; (3) it provides tasks and questionnaires that can be reused by others; and (4) it raises questions about the findings and assumptions of many recent studies that use only behavioral signals from search logs as evidence of task difficulty and searcher satisfaction, as many of our results directly contradict those findings.


European Conference on Information Retrieval | 2014

Predicting Search Task Difficulty

Jaime Arguello

Search task difficulty refers to a user's assessment of the amount of effort required to complete a search task. Our goal in this work is to learn predictive models of search task difficulty. We evaluate features derived from the user's interaction with the search engine as well as features derived from the user's level of interest in the task and level of prior knowledge in the task domain. In addition to user-interaction features used in prior work, we evaluate features generated from scroll and mouse-movement events on the SERP. In some situations, we may prefer a system that can predict search task difficulty early in the search session. To this end, we evaluate features in terms of whole-session evidence and first-round evidence, which excludes all interactions starting with the second query. Our results show that the most predictive features were different for whole-session vs. first-round prediction, that mouseover features were effective for first-round prediction, and that level-of-interest and prior-knowledge features did not improve performance.
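
The sketch below, on invented session logs rather than the study's data, illustrates the general setup of aggregating interaction features per session, optionally restricted to the first query round, and training a difficulty classifier:

    # Sketch of the prediction setup (synthetic logs; the paper's feature set
    # and models differ): per-session interaction features, computed over the
    # whole session or only the first query round, feed a difficulty classifier.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def session_features(events, first_round_only=False):
        """events: list of (event_type, query_round) tuples from a search session."""
        if first_round_only:
            events = [e for e in events if e[1] == 1]
        types = [t for t, _ in events]
        return [types.count("query"), types.count("click"),
                types.count("mouseover"), types.count("scroll"), len(events)]

    # Hypothetical labeled sessions: (event log, user-reported difficulty 0/1).
    sessions = [
        ([("query", 1), ("click", 1), ("mouseover", 1)], 0),
        ([("query", 1), ("scroll", 1), ("query", 2), ("mouseover", 2),
          ("query", 3), ("click", 3)], 1),
        ([("query", 1), ("click", 1)], 0),
        ([("query", 1), ("mouseover", 1), ("query", 2), ("query", 3), ("scroll", 3)], 1),
    ]

    for first_round in (False, True):
        X = np.array([session_features(ev, first_round) for ev, _ in sessions])
        y = np.array([label for _, label in sessions])
        clf = RandomForestClassifier(random_state=0).fit(X, y)
        print("first_round_only =", first_round, "training accuracy:", clf.score(X, y))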

Collaboration


Dive into Jaime Arguello's collaborations.

Top Co-Authors

Robert Capra | University of North Carolina at Chapel Hill
Jamie Callan | Carnegie Mellon University
Diane Kelly | University of North Carolina at Chapel Hill
Jonathan L. Elsas | Carnegie Mellon University
Sandeep Avula | University of North Carolina at Chapel Hill
Wan Ching Wu | University of North Carolina at Chapel Hill