G. Craig Murray

University of Maryland, College Park

Publication

Featured research published by G. Craig Murray.

Database | 2009

Understanding PubMed® user search behavior through log analysis

Rezarta Islamaj Doğan; G. Craig Murray; Aurélie Névéol; Zhiyong Lu

This article reports on a detailed investigation of PubMed users’ needs and behavior as a step toward improving biomedical information retrieval. PubMed is providing free service to researchers with access to more than 19 million citations for biomedical articles from MEDLINE and life science journals. It is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users’ needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users’ interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviation in queries is very frequent. Factors such as result set size influence users’ decisions. Analysis of characteristics such as these plays a critical role in identifying users’ information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval. Database URL: http://www.ncbi.nlm.nih.gov/PubMed

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Building an information retrieval test collection for spontaneous conversational speech

Douglas W. Oard; Dagobert Soergel; David S. Doermann; Xiaoli Huang; G. Craig Murray; Jianqiang Wang; Bhuvana Ramabhadran; Martin Franz; Samuel Gustman; James Mayfield; Liliya Kharevych; Stephanie M. Strassel

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Query log analysis: social and technological challenges

G. Craig Murray; Jaime Teevan

Analysis of search engine query logs is an important tool for developers and researchers. However, the potentially personal content of query logs raises a number of questions about the use of that data. Privacy advocates are concerned about potential misuse of personal data; search engine providers are interested in protecting their users while maintaining a competitive edge; and academic researchers are frustrated by barriers to shared learning through shared data analysis. This paper reports on a workshop held at the WWW 2007 Conference to foster dialogue on the social and technical challenges that are posed by the content of query logs and the analysis of that content.

Computer Speech & Language | 2004

Lexical knowledge and human disagreement on a WSD task

G. Craig Murray; Rebecca Green

This paper explores factors correlating with lack of inter-annotator agreement on a word sense disambiguation (WSD) task taken from SENSEVAL-2. Twenty-seven subjects were given a series of tasks requiring word sense judgments. Subjects were asked to judge the applicability of word senses to polysemous words used in context. Metrics of lexical ability were evaluated as predictors of agreement between judges. A strong interaction effect was found for lexical ability, in which differences between levels of lexical knowledge predict disagreement. Individual levels of lexical knowledge, however, were not independently predictive of disagreement. The finding runs counter to previous assumptions regarding expert agreement on WSD annotation tasks, which in turn impacts notions of a meaningful “gold standard” for systems evaluation.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Assessing the term independence assumption in blind relevance feedback

Jimmy J. Lin; G. Craig Murray

When applying blind relevance feedback for ad hoc document retrieval, is it possible to identify, a priori, the set of query terms that will most improve retrieval performance? Can this complex problem be reduced into the simpler one of making independent decisions about the performance effects of each query term? Our experiments suggest that, for the selection of terms for blind relevance feedback, the term independence assumption may be empirically justified.

Meeting of the Association for Computational Linguistics | 2006

Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation

G. Craig Murray; Bonnie J. Dorr; Jimmy J. Lin; Jan Hajič; Pavel Pecina

Thesauri and ontologies provide important value in facilitating access to digital archives by representing underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most ontologies precludes fully-automated machine translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations when constructing domain-specific lexical resources. We evaluate the effectiveness of this process by producing a probabilistic phrase dictionary and translating a thesaurus of 56,000 concepts used to catalogue a large archive of oral histories. Our experiments demonstrate a cost-effective technique for accurate machine translation of large ontologies.

Language Resources and Evaluation | 2009

A cost-effective lexical acquisition process for large-scale thesaurus translation

Jimmy J. Lin; G. Craig Murray; Bonnie J. Dorr; Jan Hajič; Pavel Pecina

Thesauri and controlled vocabularies facilitate access to digital collections by explicitly representing the underlying principles of organization. Translation of such resources into multiple languages is an important component for providing multilingual access. However, the specificity of vocabulary terms in most thesauri precludes fully-automatic translation using general-domain lexical resources. In this paper, we present an efficient process for leveraging human translations to construct domain-specific lexical resources. This process is illustrated on a thesaurus of 56,000 concepts used to catalog a large archive of oral histories. We elicited human translations on a small subset of concepts, induced a probabilistic phrase dictionary from these translations, and used the resulting resource to automatically translate the rest of the thesaurus. Two separate evaluations demonstrate the acceptability of the automatic translations and the cost-effectiveness of our approach.

Proceedings of the ASIST Annual Meeting | 2007