Jeffrey Pound
University of Waterloo
Publications
Featured research published by Jeffrey Pound.
International World Wide Web Conference | 2010
Jeffrey Pound; Peter Mika; Hugo Zaragoza
Semantic search refers to a loose set of concepts, challenges, and techniques concerned with harnessing the information in the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework for studying some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
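As a rough illustration of the ad-hoc object retrieval task, the sketch below maps a keyword query to a ranked list of entity identifiers using simple term overlap. The scoring function and toy entity descriptions are assumptions for the example, not the formal model proposed in the paper.

```python
# Minimal sketch of ad-hoc object retrieval: a keyword query is mapped
# to a ranked list of entity identifiers (URIs). The scoring here is
# illustrative term overlap, not the paper's model; the toy entity
# descriptions are assumptions for the example.

def rank_entities(query, entities):
    """Rank entity URIs by term overlap between query and description."""
    q_terms = set(query.lower().split())
    scored = []
    for uri, description in entities.items():
        d_terms = set(description.lower().split())
        score = len(q_terms & d_terms)
        if score > 0:
            scored.append((score, uri))
    return [uri for score, uri in sorted(scored, reverse=True)]

entities = {
    "http://example.org/Paris": "paris capital city of france",
    "http://example.org/Paris_Hilton": "paris hilton american media personality",
}
print(rank_entities("paris france", entities))
# ['http://example.org/Paris', 'http://example.org/Paris_Hilton']
```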
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
Roi Blanco; Harry Halpin; Daniel M. Herzig; Peter Mika; Jeffrey Pound; Henry S. Thompson; Thanh Tran Duc
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and a crowd-sourcing approach provides many advantages. However, can these crowd-sourced evaluations be repeated over long periods of time in a reliable manner? To find out, we investigate creating an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional web-page search, object search aims at retrieving information from factual assertions about real-world objects rather than from web pages with textual descriptions. Using the first large-scale evaluation campaign that specifically targets the task of ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowd-sourced evaluation campaigns can be repeated over time while maintaining reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that they hold over different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search-system evaluation using crowdsourcing.
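One common relevance metric in such campaigns is normalized discounted cumulative gain (NDCG). The sketch below computes it over invented graded judgments, as one might when comparing crowd-sourced rankings against expert ones; the paper's exact metrics and experimental setup may differ.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded judgments."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """Normalized DCG: 1.0 means the ranking matches the ideal ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments (0-2) for one query's top-5 results,
# as might be collected from crowd workers vs. expert judges.
crowd = [2, 1, 0, 2, 1]
expert = [2, 2, 0, 1, 1]
print(round(ndcg(crowd), 3), round(ndcg(expert), 3))
```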
International Conference on Management of Data | 2010
Jeffrey Pound; Ihab F. Ilyas; Grant E. Weddell
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KBs), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs in conventional structured query languages is difficult given the sheer volume of schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics. Our formalism for structured queries based on keywords combines the flexibility of keyword search with the expressiveness of structured queries. We propose a solution to the resulting disambiguation problem caused by introducing keywords as primitives in a structured query language. We show how expressions in our proposed language can be rewritten using the vocabulary of the web-extracted KB, and how the possible rewritings can be ranked based on both their syntactic relationship to the keywords in the query and their semantic coherence in the underlying KB. An extensive experimental study demonstrates the efficiency and effectiveness of our approach. Additionally, we show how our query language fits into QUICK, an end-to-end information system that integrates web-extracted data graphs with full-text search. In this system, the rewritten query describes an arbitrary topic of interest for which corresponding entities, and documents relevant to those entities, are efficiently retrieved.
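As a rough sketch of the disambiguation step, the example below ranks candidate KB vocabulary terms for a keyword primitive by string similarity. The vocabulary is hypothetical, and the paper's ranking additionally incorporates semantic coherence in the KB, which is omitted here.

```python
# Illustrative sketch of ranking candidate rewritings of a keyword
# primitive against a KB vocabulary by syntactic closeness. Names and
# data are assumptions; the semantic-coherence component of the paper's
# ranking is not modeled here.

import difflib

def rank_rewritings(keyword, vocabulary):
    """Rank KB vocabulary terms by string similarity to the keyword."""
    scored = [(difflib.SequenceMatcher(None, keyword.lower(),
                                       term.lower()).ratio(), term)
              for term in vocabulary]
    return [term for score, term in sorted(scored, reverse=True)]

kb_vocabulary = ["actedIn", "directed", "hasDirector", "bornIn"]
print(rank_rewritings("director", kb_vocabulary))
```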
Conference on Information and Knowledge Management | 2012
Jeffrey Pound; Alexander K. Hudek; Ihab F. Ilyas; Grant E. Weddell
Many keyword queries issued to Web search engines target information about real-world entities, and interpreting these queries over Web knowledge bases can often enable the search system to provide exact answers. Equally important is the problem of detecting when the reference knowledge base is not capable of answering the keyword query, due to a lack of domain coverage. In this work we present an approach to computing structured representations of keyword queries over a reference knowledge base. We mine frequent query structures from a Web query log and map these structures into the reference knowledge base. Our approach exploits coarse linguistic structure in keyword queries and combines it with rich structured query representations of information needs.
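The sketch below illustrates the general idea of mining frequent query structures from a log: entity and type mentions are replaced by placeholders and the resulting templates are counted. The tiny log and dictionaries are assumptions for illustration, not the paper's pipeline.

```python
# Minimal sketch of mining frequent query templates from a query log:
# recognized entity and type mentions are replaced by placeholders, and
# the resulting structures are counted. The log and dictionaries below
# are invented for the example.

from collections import Counter

ENTITIES = {"tom hanks", "paris"}
TYPES = {"movies", "hotels"}

def to_template(query):
    q = query.lower()
    for e in ENTITIES:
        q = q.replace(e, "<entity>")
    for t in TYPES:
        q = q.replace(t, "<type>")
    return q

log = ["tom hanks movies", "paris hotels", "tom hanks age"]
print(Counter(to_template(q) for q in log).most_common())
# [('<entity> <type>', 2), ('<entity> age', 1)]
```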
International Conference on Management of Data | 2011
Jeffrey Pound; Stelios Paparizos; Panayiotis Tsaparas
In recent years, there has been a strong trend of incorporating results from structured data sources into keyword-based web search systems such as Bing or Amazon. When presenting structured data, facets are a powerful tool for navigating, refining, and grouping results. For a given structured data source, a fundamental problem in supporting faceted search is finding an ordered selection of attributes and values to populate the facets. This creates two sets of challenges. First, because of limited screen real estate, it is important that the top facets best match the anticipated user intent. Second, the huge scale of data available to such engines demands an automated, unsupervised solution. In this paper, we model user faceted-search behavior using the intersection of web query logs with existing structured data. Since web queries are formulated as free text, a challenge in our approach is the inherent ambiguity in mapping keywords to the different possible attributes of a given entity type. We present an automated solution that elicits user preferences on attributes and values, employing disambiguation techniques ranging from simple keyword matching to more sophisticated probabilistic models. We demonstrate the scalability of our solution experimentally by running it on over a thousand categories of diverse entity types, and we measure facet quality with a real-user study.
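A minimal version of the keyword-matching baseline might rank an entity type's attributes by how often they appear in query-log keywords, as sketched below. The attribute set and log are hypothetical; the probabilistic models mentioned above are not modeled here.

```python
# Illustrative facet selection: rank an entity type's attributes by how
# often query-log keywords mention them, a simple keyword-matching
# variant of the disambiguation techniques described above.

from collections import Counter

attributes = {"brand", "price", "screen size", "resolution"}

def top_facets(query_log, k=2):
    counts = Counter()
    for query in query_log:
        for attr in attributes:
            if attr in query.lower():
                counts[attr] += 1
    return [attr for attr, _ in counts.most_common(k)]

log = ["samsung tv price", "tv brand reviews", "cheap tv price"]
print(top_facets(log))  # ['price', 'brand']
```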
Journal of Web Semantics | 2013
Roi Blanco; Harry Halpin; Daniel M. Herzig; Peter Mika; Jeffrey Pound; Henry S. Thompson; Thanh Tran
An increasing amount of structured data on the Web has attracted industry attention and renewed research interest in what is collectively referred to as semantic search. These solutions exploit the explicit semantics captured in structured data such as RDF to enhance document representation and retrieval, or to find answers by searching directly over the data. Such data have been used for a range of tasks, and a correspondingly wide range of semantic search solutions has been proposed. However, it is widely recognized that a standardized setting for evaluating and analyzing the current state of the art in semantic search is needed to monitor and stimulate further progress in the field. In this paper, we present an evaluation framework for semantic search, analyze the framework with regard to repeatability and reliability, and report on our experience applying it in the Semantic Search Challenges of 2010 and 2011.
Very Large Data Bases | 2015
Anil Kumar Goel; Jeffrey Pound; Nathan Auch; Peter Bumbulis; Scott Maclean; Franz Färber; Francis Gropengiesser; Christian Mathis; Thomas Bodner; Wolfgang Lehner
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. The platform provides high-performance OLAP with massive scale-out capabilities while concurrently supporting OLTP workloads. This dual capability enables analytics over changing, real-time data and allows fine-grained, user-specified service level agreements (SLAs) on data freshness. We advocate decoupling core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation without requiring synchronous updates of replicas; instead, we use asynchronous update propagation that guarantees consistency via timestamp validation. We offer a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
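The visibility rule behind timestamp-based snapshot isolation can be sketched in a few lines: a reader sees, for each key, the newest version whose commit timestamp does not exceed its snapshot timestamp. This is a toy illustration of the general MVCC idea, not SAP HANA's implementation; all names are assumptions.

```python
# Toy MVCC snapshot visibility: a reader at snapshot_ts sees the newest
# version of each key committed at or before snapshot_ts.

versions = {
    # key -> append-only list of (commit_ts, value)
    "balance": [(10, 100), (15, 80), (22, 130)],
}

def read(key, snapshot_ts):
    """Return the latest version visible at snapshot_ts, or None."""
    visible = [(ts, v) for ts, v in versions[key] if ts <= snapshot_ts]
    return max(visible)[1] if visible else None

print(read("balance", 20))  # 80: the version committed at ts 15
print(read("balance", 25))  # 130
```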
International Joint Conference on Artificial Intelligence | 2011
Jeffrey Pound; David Toman; Grant E. Weddell; Jiewen Wu
We consider a generalization of instance retrieval over knowledge bases in which users receive assertions that include descriptions of qualifying objects in addition to their identifiers. Notably, this transfers basic database paradigms, caching and query rewriting, into the context of an assertion retrieval algebra. We present an optimization framework for this algebra, focusing on finding plans that avoid any general knowledge base reasoning at query execution time when sufficient cached results from earlier requests exist.
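The caching idea can be sketched as follows: if a cached, more general request subsumes a new one, the answer is computed by filtering the cached results rather than invoking general KB reasoning. The subsumption test here is naive set containment over required properties; all names and data are assumptions, not the paper's algebra.

```python
# Illustrative request cache: answer a new request from the results of
# an earlier, more general one when possible, avoiding expensive KB
# reasoning at query time.

cache = {}  # frozenset(required_properties) -> list of (id, description)

def retrieve(required, kb_reasoner):
    req = frozenset(required)
    for cached_req, results in cache.items():
        if cached_req <= req:  # cached request is more general
            return [(i, d) for i, d in results if req <= d.keys()]
    results = kb_reasoner(req)  # expensive general reasoning
    cache[req] = results
    return results

def demo_reasoner(req):
    data = [("obj1", {"color": "red", "shape": "round"}),
            ("obj2", {"color": "red"})]
    return [(i, d) for i, d in data if req <= d.keys()]

print(retrieve({"color"}, demo_reasoner))           # cache miss: both objects
print(retrieve({"color", "shape"}, demo_reasoner))  # served from cache: obj1
```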
Very Large Data Bases | 2010
Jeffrey Pound; Ihab F. Ilyas; Grant E. Weddell
Recent work on Web-extracted data sets has produced an interesting new source of structured Web data. These data sets can be viewed as knowledge bases (KBs): large heterogeneous linked entity collections with millions of unique edge and node labels, often encoding rich semantic information about entities. For example, YAGO [5] and ExDB [2] have fact collections numbering in the tens and hundreds of millions, respectively, and WebTables [1] contains over one hundred million extracted relations. In terms of schema information, the ExDB, YAGO, and WebTables data sets all have schema items numbering in the millions.
Symposium on Cloud Computing | 2017
Hua Fan; Jeffrey Pound; Peter Bumbulis; Nathan Auch; Scott Maclean; Eric Garber; Anil Kumar Goel
Distributed shared logs are a powerful building block for distributed systems. By providing fault-tolerant persistence and strong ordering guarantees, a distributed shared log lets applications reliably communicate a stream of events between processes; this can be used, for example, to replicate application state or to build a reliable publish/subscribe system. The log itself must also replicate data in order to provide availability and fault tolerance, and the choice of replication algorithm is key to the design of a distributed shared log, determining many properties of the system. We propose an algorithm for consistent replication of log data, quorum replication with metadata exchange (QMX), that is linearizable while allowing writes to complete with a single round-trip to a quorum of replicas and allowing reads, in general, to be serviced by any single replica (read-one/write-quorum). This is achieved by coupling reads with an asynchronous message-exchange algorithm that runs continuously among the replicas. The message exchange allows replicas to infer the global state of writes across the cluster and thereby deduce which writes have been successfully quorum-replicated and which have not. This metadata lets any single replica answer reads directly in many cases; in the worst case, a read must wait for the current message-passing round to complete before being serviced, which requires a majority quorum of servers to be responsive.
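A toy sketch of the metadata-exchange idea: replicas gossip which log positions they have durably stored, so any replica can compute which positions are known to be quorum-replicated and serve reads for those locally. This simplifies away most of QMX; the structures below are assumptions, not the paper's protocol.

```python
# Toy illustration of inferring quorum-replicated writes from gossiped
# metadata, so a single replica can answer reads locally when safe.

N = 3               # number of replicas
QUORUM = N // 2 + 1

# acked[r] = set of log positions replica r has acknowledged, as learned
# through the asynchronous message exchange (hypothetical state).
acked = {0: {1, 2, 3}, 1: {1, 2}, 2: {1}}

def quorum_replicated(pos):
    """A position is stable once a quorum of replicas has stored it."""
    return sum(pos in s for s in acked.values()) >= QUORUM

def read(pos, local_replica):
    if pos in acked[local_replica] and quorum_replicated(pos):
        return f"served locally by replica {local_replica}"
    return "wait for the next exchange round / contact a quorum"

print(read(2, 1))  # pos 2 is on replicas 0 and 1: quorum -> local read
print(read(3, 0))  # pos 3 is only on replica 0: not yet stable
```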