Laura Dietz
University of New Hampshire
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Laura Dietz.
international conference on machine learning | 2007
Laura Dietz; Steffen Bickel; Tobias Scheffer
Publication repositories contain an abundance of information about the evolution of scientific research areas. We address the problem of creating a visualization of a research area that describes the flow of topics between papers, quantifies the impact that papers have on each other, and helps to identify key contributions. To this end, we devise a probabilistic topic model that explains the generation of documents; the model incorporates the aspects of topical innovation and topical inheritance via citations. We evaluate the models ability to predict the strength of influence of citations against manually rated citations.
international acm sigir conference on research and development in information retrieval | 2014
Jeffrey Dalton; Laura Dietz; James Allan
Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. For example, annotations of entities from large general purpose knowledge bases, such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE) which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.
conference on information and knowledge management | 2015
Michael Schuhmacher; Laura Dietz; Simone Paolo Ponzetto
When humans explain complex topics, they naturally talk about involved entities, such as people, locations, or events. In this paper, we aim at automating this process by retrieving and ranking entities that are relevant to understand free-text web-style queries like Argentine British relations, which typically demand a set of heterogeneous entities with no specific target type like, for instance, Falklands_-War} or Margaret-_Thatcher, as answer. Standard approaches to entity retrieval rely purely on features from the knowledge base. We approach the problem from the opposite direction, namely by analyzing web documents that are found to be query-relevant. Our approach hinges on entity linking technology that identifies entity mentions and links them to a knowledge base like Wikipedia. We use a learning-to-rank approach and study different features that use documents, entity mentions, and knowledge base entities -- thus bridging document and entity retrieval. Since established benchmarks for this problem do not exist, we use TREC test collections for document ranking and collect custom relevance judgments for entities. Experiments on TREC Robust04 and TREC Web13/14 data show that: i) single entity features, like the frequency of occurrence within the top-ranke documents, or the query retrieval score against a knowledge base, perform generally well; ii) the best overall performance is achieved when combining different features that relate an entity to the query, its document mentions, and its knowledge base representation.
BMC Bioinformatics | 2011
Sebastian Konietzny; Laura Dietz; Alice C. McHardy
BackgroundGenome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context.ResultsWe describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules.ConclusionsWe show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
international conference on the theory of information retrieval | 2017
Federico Nanni; Bhaskar Mitra; Matt Magnusson; Laura Dietz
Providing answers to complex information needs is a challenging task. The new TREC Complex Answer Retrieval (TREC CAR) track introduces a large-scale dataset where paragraphs are to be retrieved in response to outlines of Wikipedia articles representing complex information needs. We present early results from a variety of approaches -- from standard information retrieval methods (e.g., TF-IDF) to complex systems that adopt query expansion, knowledge bases and deep neural networks. The goal is to offer an overview of some promising approaches to tackle this problem.
european conference on information retrieval | 2016
Michael Schuhmacher; Benjamin Roth; Simone Paolo Ponzetto; Laura Dietz
This work studies the combination of a document retrieval and a relation extraction system for the purpose of identifying query-relevant relational facts. On the TREC Web collection, we assess extracted facts separately for correctness and relevance. Despite some TREC topics not being covered by the relation schema, we find that this approach reveals relevant facts, and in particular those not yet known in the knowledge base DBpedia. The study confirms that mention frequency, document relevance, and entity relevance are useful indicators for fact relevance. Still, the task remains an open research problem.
acm ieee joint conference on digital libraries | 2017
Federico Nanni; Simone Paolo Ponzetto; Laura Dietz
Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by 1) identifying relevant concepts and entities from a knowledge base, and 2) detecting their mentions in documents, which are interpreted as indicators for relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record, and we test its performance on the TREC KBA Stream corpus, a large and publicly available web archive.
exploiting semantic annotations in information retrieval | 2015
Laura Dietz; Michael Schuhmacher
We aim to augment textual knowledge resources such as Wikipedia with information from the World Wide Web and at the same time focus on a given information need. We demonstrate a solution based on what we call knowledge portfolios. A knowledge portfolio is a query-specific collection of relevant entities together with associated passages from the Web that explain how the entity is relevant for the query. Knowledge portfolios are extracted through a combination of retrieval from World Wide Web and Wikipedia with a reasoning process on mutual relevance. A key ingredient are entity link annotations that tie abstract entities from the knowledge base into their context on the Web. We demonstrate the results of our fully automated system Queripidia, which is capable to create a knowledge portfolios for any web-style query, on data from the TREC Web track. The online demo is available via http://smart-cactus.org/~dietz/knowport/.
international conference on the theory of information retrieval | 2016
Lydia Weiland; Ioana Hulpus; Simone Paolo Ponzetto; Laura Dietz
The message of news articles is often supported by the pointed use of iconic images. These images together with their captions encourage emotional involvement of the reader. Current algorithms for understanding the semantics of news articles focus on its text, often ignoring the image. On the other side, works that target the semantics of images, mostly focus on recognizing and enumerating the objects that appear in the image. In this work, we explore the problem from another perspective: Can we devise algorithms to understand the message encoded by images and their captions? To answer this question, we study how well algorithms can describe an image-caption pair in terms of Wikipedia entities, thereby casting the problem as an entity-ranking task with an image-caption pair as query. Our proposed algorithm brings together aspects of entity linking, subgraph selection, entity clustering, relatedness measures, and learning-to-rank. In our experiments, we focus on media-iconic image-caption pairs which often reflect complex subjects such as sustainable energy and endangered species. Our test collection includes a gold standard of over 300 image-caption pairs about topics at different levels of abstraction. We show that with a MAP of 0.69, the best results are obtained when aggregating content-based and graph-based features in a Wikipedia-derived knowledge base.
conference on information and knowledge management | 2013
Laura Dietz; Ziqi Wang; Samuel Huston; W. Bruce Croft
Abstract Understanding the landscape of opinions on a given topic or issue is important for policy makers, sociologists, and intelligence analysts. The first step in this process is to retrieve relevant opinions. Discussion forums are potentially a good source of this information, but comes with a unique set of retrieval challenges. In this short paper, we test a range of existing techniques for forum retrieval and develop new retrieval models to differentiate between opinionated and factual forum posts. We are able to demonstrate some significant performance improvements over the baseline retrieval models, demonstrating that this as a promising avenue for further study.