Luke K. McDowell
United States Naval Academy
Publications
Featured research published by Luke K. McDowell.
International World Wide Web Conference | 2004
Luke K. McDowell; Oren Etzioni; Alon Y. Halevy; Henry M. Levy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF query or update coupled with corresponding explanatory text. Semantic email opens the door to a wide range of automated, email-mediated applications with formally guaranteed properties. In particular, this paper introduces a broad class of semantic email processes. For example, consider the process of sending an email to a program committee asking who will attend the PC dinner, automatically collecting the responses, and tallying them up. We define both logical and decision-theoretic models where an email process is modeled as a set of updates to a data set on which we specify goals via certain constraints or utilities. We then describe a set of inference problems that arise while trying to satisfy these goals and analyze their computational tractability. In particular, we show that for the logical model it is possible to automatically infer which email responses are acceptable w.r.t. a set of constraints in polynomial time, and for the decision-theoretic model it is possible to compute the optimal message-handling policy in polynomial time. Finally, we discuss our publicly available implementation of semantic email and outline research challenges in this realm.
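In the logical model, each response proposes an update to a shared data set, and the process accepts it only if the declared constraints still hold; checking one response is a cheap, per-message test, consistent with the polynomial-time claim. The following is a minimal sketch of that idea; the `RSVPProcess` class, the capacity constraint, and all field names are illustrative assumptions, not the paper's actual formalism or implementation.

```python
# Toy sketch of a "semantic email process" in the logical model: each
# response proposes an update, accepted only if constraints remain satisfied.
# All names here (RSVPProcess, the capacity constraint) are illustrative.

class RSVPProcess:
    def __init__(self, capacity):
        self.capacity = capacity      # constraint: at most `capacity` attendees
        self.responses = {}           # sender -> "yes" / "no"

    def constraints_hold(self, responses):
        yes_count = sum(1 for r in responses.values() if r == "yes")
        return yes_count <= self.capacity

    def handle_response(self, sender, answer):
        """Accept the update only if the constraints remain satisfied."""
        proposed = dict(self.responses, **{sender: answer})
        if self.constraints_hold(proposed):
            self.responses = proposed
            return True               # acceptable response: apply the update
        return False                  # reject (e.g., the dinner is full)

    def tally(self):
        yes = sum(1 for r in self.responses.values() if r == "yes")
        return {"yes": yes, "no": len(self.responses) - yes}

process = RSVPProcess(capacity=2)
for sender, answer in [("ann", "yes"), ("bob", "yes"), ("cal", "yes")]:
    print(sender, "accepted" if process.handle_response(sender, answer) else "rejected")
print(process.tally())  # {'yes': 2, 'no': 0}
```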
Journal of Artificial Intelligence Research | 2012
Ryan A. Rossi; Luke K. McDowell; David W. Aha; Jennifer Neville
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
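As one concrete instance of the "predicting their existence" transformation task for links, the short sketch below ranks candidate links with the classic common-neighbors heuristic. This is a generic illustration of the task category, not a method proposed in the article.

```python
# Illustrative link-existence scoring by common neighbors: one simple
# instance of the "predict link existence" transformation task.
from itertools import combinations

graph = {                      # toy undirected adjacency sets
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

def common_neighbor_score(u, v):
    return len(graph[u] & graph[v])

# Rank non-edges by score; high-scoring pairs become predicted links.
candidates = [(u, v) for u, v in combinations(graph, 2) if v not in graph[u]]
ranked = sorted(candidates, key=lambda e: -common_neighbor_score(*e))
print(ranked)  # [('a', 'd'), ('c', 'd')] -- each shares neighbor 'b'
```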
International Semantic Web Conference | 2006
Luke K. McDowell; Michael J. Cafarella
The Semantic Web’s need for machine understandable content has led researchers to attempt to automatically acquire such content from a number of sources, including the web. To date, such research has focused on “document-driven” systems that individually process a small set of documents, annotating each with respect to a given ontology. This paper introduces OntoSyphon, an alternative that strives to more fully leverage existing ontological content while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an “ontology-driven” manner: taking any ontology as input, OntoSyphon uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented and we present experimental results that demonstrate substantial instance learning in a variety of domains based on independently constructed ontologies. We also introduce new methods for improving instance verification, and demonstrate that they improve upon previously known techniques.
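To picture the "ontology-driven" direction, the sketch below turns ontology class names into Hearst-style search patterns and uses redundancy (how often a candidate recurs) as naive verification. The patterns, corpus, and threshold are invented for illustration and are not OntoSyphon's actual pipeline.

```python
# Sketch in the spirit of ontology-driven instance harvesting: generate
# "X such as Y" patterns from ontology classes, then keep candidates that
# recur (redundancy as naive verification). Illustrative assumptions only.
import re
from collections import Counter

classes = {"bird": "birds", "city": "cities"}   # class -> plural surface form
corpus = [
    "birds such as robins and sparrows fly south",
    "many birds, such as robins, sing at dawn",
    "cities such as Paris attract tourists",
]

candidates = Counter()
for cls, plural in classes.items():
    pattern = re.compile(rf"{plural},? such as ([A-Za-z]+)")
    for sentence in corpus:
        for match in pattern.findall(sentence):
            candidates[(cls, match.lower())] += 1

# Redundancy-based verification: keep candidates seen multiple times.
verified = [pair for pair, n in candidates.items() if n >= 2]
print(verified)  # [('bird', 'robins')]
```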
ACM Transactions on Computer Systems | 2003
Steven Swanson; Luke K. McDowell; Michael M. Swift; Susan J. Eggers; Henry M. Levy
Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources on such machines would be largely idle. In contrast to superscalars, simultaneous multithreaded (SMT) processors achieve high resource utilization by issuing instructions from multiple threads every cycle. An SMT processor thus has two means of hiding latency: speculation and multithreaded execution. However, these two techniques may conflict; on an SMT processor, wrong-path speculative instructions from one thread may compete with and displace useful instructions from another thread. For this reason, it is important to understand the trade-offs between these two latency-hiding techniques, and to ask whether multithreaded processors should speculate differently than conventional superscalars. This paper evaluates the behavior of instruction speculation on SMT processors using both multiprogrammed (SPECINT and SPECFP) and multithreaded (the Apache Web server) workloads. We measure and analyze the impact of speculation and demonstrate how speculation on an 8-context SMT differs from superscalar speculation. We also examine the effect of speculation-aware fetch and branch prediction policies in the processor. Our results quantify the extent to which (1) speculation is critical to performance on a multithreaded processor because it ensures an ample supply of parallelism to feed the functional units, and (2) SMT actually enhances the effectiveness of speculative execution, compared to a superscalar processor, by reducing the impact of branch misprediction. Finally, we quantify the impact of both hardware configuration and workload characteristics on speculation's usefulness and demonstrate that, in nearly all cases, speculation is beneficial to SMT performance.
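To make the cost side of the trade-off concrete, here is a back-of-envelope model (my own arithmetic, not the paper's simulator or its measured numbers): it estimates what fraction of issue bandwidth goes to squashed wrong-path work, given a misprediction rate and the average wrong-path instructions issued per misprediction.

```python
# Back-of-envelope model of wrong-path waste under speculation; an
# illustrative calculation, not the SMT simulator used in the paper.
def wrong_path_fraction(mispredict_rate, branch_freq, wasted_per_misp):
    """Fraction of issued instructions that are squashed wrong-path work.

    mispredict_rate: mispredictions per branch
    branch_freq:     branches per committed instruction
    wasted_per_misp: wrong-path instructions issued per misprediction
    """
    misp_per_instr = mispredict_rate * branch_freq
    wasted = misp_per_instr * wasted_per_misp   # wasted per committed instr
    return wasted / (1.0 + wasted)

# On an SMT, ready instructions from other threads can shrink
# wasted_per_misp, one intuition for why SMT softens misprediction cost.
print(f"{wrong_path_fraction(0.05, 0.2, 12):.1%}")  # ~10.7%
```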
Journal of Web Semantics | 2004
Luke K. McDowell; Oren Etzioni; Alon Y. Halevy
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of a structured query or update coupled with corresponding explanatory text. Semantic email opens the door to a wide range of automated, email-mediated applications with formally guaranteed properties. In particular, this paper introduces a broad class of semantic email processes. For example, consider the process of sending an email to a program committee, asking who will attend the PC dinner, automatically collecting the responses, and tallying them up. We define both logical and decision-theoretic models where an email process is modeled as a set of updates to a data set on which we specify goals via certain constraints or utilities. We then describe a set of inference problems that arise while trying to satisfy these goals and analyze their computational tractability. In particular, we show that for the logical model it is possible to automatically infer which email responses are acceptable w.r.t. a set of constraints in polynomial time, and for the decision-theoretic model it is possible to compute the optimal message-handling policy in polynomial time. In addition, we show how to automatically generate explanations for a process's actions, and identify cases where such explanations can be generated in polynomial time. Finally, we discuss our publicly available implementation of semantic email and outline research challenges in this realm.
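Where the logical model (sketched after the WWW 2004 entry above) accepts or rejects responses against hard constraints, the decision-theoretic model picks the handling action with the highest expected utility. The one-step sketch below illustrates that choice; the actions, outcome probabilities, and utilities are invented numbers, not values from the paper.

```python
# Illustrative one-step expected-utility policy for message handling.
# Actions, outcome probabilities, and utilities are invented numbers.
ACTIONS = {
    # action: [(probability of outcome, utility of outcome), ...]
    "auto_accept": [(0.9, +1.0), (0.1, -5.0)],  # may admit a bad update
    "ask_human":   [(1.0, +0.5)],               # safe but costs attention
    "discard":     [(1.0, 0.0)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

def best_action(actions=ACTIONS):
    return max(actions, key=lambda a: expected_utility(actions[a]))

print(best_action())  # 'ask_human' (EU 0.5 beats auto_accept's 0.4)
```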
IEEE International Conference on Data Science and Advanced Analytics | 2015
Luke K. McDowell
Many information tasks involve objects that are explicitly or implicitly connected in a network (or graph), such as webpages connected by hyperlinks or people linked by “friendships” in a social network. Research on link-based classification (LBC) has shown how to leverage these connections to improve classification accuracy. Unfortunately, acquiring a sufficient number of labeled examples to enable accurate learning for LBC can often be expensive or impractical. In response, some recent work has proposed the use of active learning, where the LBC method can intelligently select a limited set of additional labels to acquire, so as to reduce the overall cost of learning a model with sufficient accuracy. This work, however, has produced conflicting results and has not considered recent progress for LBC inference and semi-supervised learning. In this paper, we evaluate multiple prior methods for active learning and demonstrate that none consistently improve upon random guessing. We then introduce two new methods that both seek to improve active learning by leveraging the link structure to identify nodes to acquire that are more representative of the underlying data. We show that both approaches have some merit, but that one method, by proactively acquiring nodes so as to produce a more representative distribution of known labels, often leads to significant accuracy increases with minimal computational cost.
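The second new method above proactively acquires nodes so that the known labels stay representative of the underlying data. A rough sketch of that selection idea follows; it is my own simplification (using model predictions as a proxy for the true label distribution), not the paper's algorithm.

```python
# Rough sketch of representativeness-driven label acquisition: pick the
# unlabeled node whose predicted class is most under-represented among
# the known labels. A simplification, not the paper's method.
from collections import Counter

known = {"n1": "A", "n2": "A", "n3": "B"}                 # acquired labels
predicted = {"n4": "A", "n5": "B", "n6": "B", "n7": "B"}  # model guesses

def next_to_acquire(known, predicted):
    have = Counter(known.values())
    want = Counter(predicted.values())   # proxy for the true distribution
    def deficit(node):                   # how under-represented is this class?
        c = predicted[node]
        return want[c] / len(predicted) - have[c] / len(known)
    return max(predicted, key=deficit)

print(next_to_acquire(known, predicted))  # 'n5', a node predicted 'B'
```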
International Semantic Web Conference | 2004
Luke K. McDowell; Oren Etzioni; Alon Y. Halevy
The development of intelligent agents is a key part of the Semantic Web vision, but how does an ordinary person tell an agent what to do? One approach to this problem is to use RDF templates that are authored once but then instantiated many times by ordinary users. This approach, however, raises a number of challenges. For instance, how can templates concisely represent a broad range of potential uses, yet ensure that each possible instantiation will function properly? And how does the agent explain its actions to the humans involved? This paper addresses these challenges in the context of a case study carried out on our fully deployed system for semantic email agents. We describe how high-level features of our template language enable the concise specification of flexible goals. In response to the first question, we show that it is possible to verify, in polynomial time, that a given template will always produce a valid instantiation. Second, we show how to automatically generate explanations for the agent's actions, and identify cases where explanations can be computed in polynomial time. These results both improve the usefulness of semantic email and suggest general issues and techniques that may be applicable in other Semantic Web systems.
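Template verification here means checking, before deployment, that every instantiation a user could build is valid. The toy check below exploits independent parameter slots, so validating each declared choice once suffices, in line with (though far simpler than) the polynomial-time result; the template structure and predicates are invented for illustration.

```python
# Toy template validity check: if every declared choice for each
# independent slot passes its validity predicate, then any instantiation
# is valid. Invented illustration, not the paper's verification algorithm.
template = {
    "deadline_days":  {"choices": [1, 3, 7],
                       "valid": lambda d: d > 0},
    "reply_options":  {"choices": [["yes", "no"], ["yes", "no", "maybe"]],
                       "valid": lambda opts: len(opts) >= 2},
}

def template_always_valid(template):
    # Slots are independent, so the check is linear in the total
    # number of declared choices rather than their cross product.
    return all(spec["valid"](choice)
               for spec in template.values()
               for choice in spec["choices"])

print(template_always_valid(template))  # True
```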
International Conference on Data Mining | 2012
Anton Rosenov Dimitrov; Alexandra Olteanu; Luke K. McDowell; Karl Aberer
Users of today's information networks need to digest large amounts of data. Therefore, tools that ease the task of filtering the relevant content are becoming necessary. One way to achieve this is to identify the users who generate content in a certain topic of interest. However, due to the diversity and ambiguity of the shared information, assigning users to topics in an automatic fashion is challenging. In this demo, we present Topick, a system that leverages state-of-the-art techniques and tools to automatically distill high-level topics for a given user. Topick exploits both the user stream and her profile information to accurately identify the most relevant topics. The results are synthesised as a set of stars associated with each topic, designed to give an intuition about the topics encompassed in the user stream and the confidence in the results. Our prototype achieves a precision of 70% or more, with a recall of 60%, relative to manual labeling. Topick is available at http://topick.alexandra.olteanu.eu.
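To picture the star-per-topic output, here is a purely illustrative scorer that maps keyword hits in a user's stream to star ratings; the topic lexicon and the hits-to-stars rule are my assumptions and have nothing to do with Topick's actual pipeline.

```python
# Purely illustrative topic scoring with star-style confidence output;
# not Topick's actual method. Counts keyword hits in a user's stream.
TOPIC_KEYWORDS = {
    "sports":   {"match", "goal", "league"},
    "politics": {"election", "vote", "policy"},
}

stream = ["great goal in the league match today",
          "who will win the election vote"]

def topic_stars(stream, max_stars=5):
    words = [w for post in stream for w in post.split()]
    stars = {}
    for topic, keys in TOPIC_KEYWORDS.items():
        hits = sum(1 for w in words if w in keys)
        stars[topic] = "*" * min(max_stars, hits)  # more hits -> more stars
    return stars

print(topic_stars(stream))  # {'sports': '***', 'politics': '**'}
```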
Formal Methods | 2014
Luke K. McDowell; Aaron Fleming; Zane Markel
Data describing networks such as social networks, citation graphs, hypertext systems, and communication networks is becoming increasingly common and important for analysis. Research on link-based classification studies methods to leverage connections in such networks to improve accuracy. Recently, a number of such methods have been proposed that first construct a set of latent features or links that summarize the network, then use this information for inference. Some work has claimed that such latent methods improve accuracy, but has not compared against the best non-latent methods. In response, this article provides the first substantial comparison between these two groups. Using six real datasets, a range of synthetic data, and multiple underlying models, we show that (non-latent) collective inference methods usually perform best, but that the dataset’s label sparsity, attribute predictiveness, and link density can dramatically affect the performance trends. Inspired by these findings, we introduce three novel algorithms that combine a latent construction with a latent or non-latent method, and demonstrate that they can sometimes substantially increase accuracy.
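Collective inference, the family of non-latent methods that usually performs best in this comparison, iteratively re-predicts each node's label from its neighbors' current predictions. Below is a compact sketch of that loop in the style of the iterative classification algorithm (ICA); the majority-vote "classifier" is a stand-in assumption for a learned local model.

```python
# Compact sketch of collective inference in the style of ICA: repeatedly
# re-label each unknown node from its neighbors' current labels. The
# majority vote is a stand-in for a learned local classifier.
from collections import Counter

graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
labels = {"a": "X", "c": "X", "d": "Y"}   # known labels; "b" is unknown
unknown = ["b"]

for _ in range(5):                        # fixed number of sweeps
    for node in unknown:
        votes = Counter(labels[n] for n in graph[node] if n in labels)
        if votes:
            labels[node] = votes.most_common(1)[0][0]

print(labels["b"])  # 'X' (two X-labeled neighbors outvote one Y)
```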
ACM Transactions on Knowledge Discovery From Data | 2016
Luke K. McDowell; David W. Aha
Many analysis tasks involve linked nodes, such as people connected by friendship links. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. Most such prior research has assumed the provision of a densely labeled training network. Instead, this article studies the common and challenging case when LBC must use a single sparsely labeled network for both learning and inference, a case where existing methods often yield poor accuracy. To address this challenge, we introduce a novel method that enables prediction via "neighbor attributes," which were briefly considered by early LBC work but then abandoned due to perceived problems. We then explain, using both extensive experiments and loss decomposition analysis, how using neighbor attributes often significantly improves accuracy. We further show that using appropriate semi-supervised learning (SSL) is essential to obtaining the best accuracy in this domain and that the gains of neighbor attributes remain across a range of SSL choices and data conditions. Finally, given the challenges of label sparsity for LBC and the impact of neighbor attributes, we show that multiple previous studies must be reconsidered, including studies regarding the best model features, the impact of noisy attributes, and strategies for active learning.
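Prediction via "neighbor attributes" extends a node's feature vector with aggregates of its neighbors' own attributes, which, unlike neighbor labels, are available even in a sparsely labeled network. A hedged sketch of that feature construction follows; the mean aggregator and all names are my assumptions, not the article's exact formulation.

```python
# Hedged sketch of "neighbor attribute" features for link-based
# classification: extend each node's features with the mean of its
# neighbors' attribute vectors. The mean aggregator is an assumption.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}

def with_neighbor_attrs(node):
    neigh = [attrs[n] for n in graph[node]]
    mean = [sum(col) / len(neigh) for col in zip(*neigh)]
    return attrs[node] + mean     # own attrs + aggregated neighbor attrs

# The extended vectors feed any standard classifier; no neighbor labels
# are required, which is the point under label sparsity.
print(with_neighbor_attrs("a"))  # [1.0, 0.0, 0.25, 0.75]
```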