Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where David D. Jensen is active.

Publication


Featured research published by David D. Jensen.


IEEE International Conference on Computer Communications | 2006

MaxProp: Routing for Vehicle-Based Disruption-Tolerant Networks

John Burgess; Brian Gallagher; David D. Jensen; Brian Neil Levine

Disruption-tolerant networks (DTNs) attempt to route network messages via intermittently connected nodes. Routing in such environments is difficult because peers have little information about the state of the partitioned network and transfer opportunities between peers are of limited duration. In this paper, we propose MaxProp, a protocol for effective routing of DTN messages. MaxProp is based on prioritizing both the schedule of packets transmitted to other peers and the schedule of packets to be dropped. These priorities are based on the path likelihoods to peers according to historical data and also on several complementary mechanisms, including acknowledgments, a head-start for new packets, and lists of previous intermediaries. Our evaluations show that MaxProp performs better than protocols that have access to an oracle that knows the schedule of meetings between peers. Our evaluations are based on 60 days of traces from a real DTN network we have deployed on 30 buses. Our network, called UMassDieselNet, serves a large geographic area between five colleges. We also evaluate MaxProp on simulated topologies and show it performs well in a wide variety of DTN environments.
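
To make the scheduling idea concrete, here is a minimal Python sketch (not the published protocol) of the bookkeeping this style of routing relies on: each node maintains meeting-probability estimates for its peers from historical encounters, and destinations are ranked by the cheapest path cost through those estimates, where each hop costs one minus its meeting probability. All names and the normalization rule are illustrative assumptions.

```python
import heapq
from collections import defaultdict

class MeetingTracker:
    """Illustrative bookkeeping: estimate how likely this node is to meet
    each peer, based on normalized historical encounter counts."""

    def __init__(self):
        self.meet_prob = defaultdict(float)  # peer id -> estimated probability

    def record_meeting(self, peer):
        # Bump the peer we just met, then renormalize so the estimates sum to 1.
        self.meet_prob[peer] += 1.0
        total = sum(self.meet_prob.values())
        for p in self.meet_prob:
            self.meet_prob[p] /= total

def path_cost(meet_probs, src, dst):
    """Cheapest-path cost to a destination, where traversing hop i -> j costs
    (1 - P[i][j]); `meet_probs` maps node id -> {neighbor id: probability}."""
    dist = {src: 0.0}
    frontier = [(0.0, src)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == dst:
            return cost
        if cost > dist.get(node, float("inf")):
            continue
        for nbr, p in meet_probs.get(node, {}).items():
            new_cost = cost + (1.0 - p)
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr))
    return float("inf")  # destination unreachable given current estimates
```

Packets bound for destinations with lower path costs would then be transmitted first and dropped last, which is the prioritization the abstract describes.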


Knowledge Discovery and Data Mining | 1999

Efficient progressive sampling

Foster Provost; David D. Jensen; Tim Oates

Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling: using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
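
A minimal sketch of a geometric progressive-sampling loop, assuming numpy arrays and a scikit-learn-style estimator with fit/score methods; the schedule factor and the plateau-based convergence test are illustrative choices rather than the paper's exact procedure.

```python
import numpy as np

def progressive_sample(model, X, y, n0=100, factor=2, tol=0.005, holdout=0.2):
    """Train on geometrically growing samples (n0, 2*n0, 4*n0, ...) and stop
    once holdout accuracy no longer improves by more than `tol`.
    X and y are numpy arrays; `model` follows the fit/score convention."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    split = int(len(idx) * (1 - holdout))
    train_idx, test_idx = idx[:split], idx[split:]

    best_acc, n = 0.0, n0
    while n <= len(train_idx):
        model.fit(X[train_idx[:n]], y[train_idx[:n]])
        acc = model.score(X[test_idx], y[test_idx])
        if acc - best_acc < tol:      # accuracy has (approximately) converged
            return model, acc
        best_acc, n = acc, n * factor
    return model, best_acc
```

When accuracy plateaus early, a loop like this touches only a small fraction of the available instances, which is the efficiency argument the abstract makes.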


Knowledge Discovery and Data Mining | 2004

Why collective inference improves relational classification

David D. Jensen; Jennifer Neville; Brian Gallagher

Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that collective inference can significantly reduce classification error when compared with traditional inference techniques. We investigate the underlying mechanisms for this error reduction by reviewing past work on collective inference and characterizing different types of statistical models used for making inference in relational data. We show important differences among these models, and we characterize the necessary and sufficient conditions for reduced classification error based on experiments with real and simulated data.
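
As a concrete illustration of collective inference, the sketch below runs a simple iterative procedure over a graph of related instances: each node's class-probability estimate is repeatedly blended with the current estimates of its neighbors until the joint labeling stabilizes. This is one common collective-inference scheme, not necessarily the specific models compared in the paper.

```python
import numpy as np

def iterative_classification(local_probs, adjacency, alpha=0.5, iters=10):
    """Blend each node's local class probabilities with the mean of its
    neighbors' current estimates, repeating so evidence propagates along links.

    local_probs: (n_nodes, n_classes) array from a per-instance classifier
    adjacency:   list of neighbor-index lists, one entry per node
    """
    local_probs = np.asarray(local_probs, dtype=float)
    probs = local_probs.copy()
    for _ in range(iters):
        updated = probs.copy()
        for i, nbrs in enumerate(adjacency):
            if nbrs:
                neighbor_mean = probs[nbrs].mean(axis=0)
                updated[i] = (1 - alpha) * local_probs[i] + alpha * neighbor_mean
        probs = updated / updated.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```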


Knowledge Discovery and Data Mining | 2003

Learning relational probability trees

Jennifer Neville; David D. Jensen; Lisa Friedland; Michael Hay

Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
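
A small sketch of the dynamic propositionalization step described above: aggregation functions such as COUNT, AVERAGE, and MODE summarize the attribute values of an instance's linked neighbors so that a standard tree learner can split on flat features. The data layout is an illustrative assumption.

```python
from statistics import mean, mode

def aggregate_features(instance_id, links, attributes):
    """Flatten a heterogeneous neighborhood into propositional features.

    links:      dict mapping an instance id to the ids of its related instances
    attributes: dict mapping a related instance id to a numeric attribute value
    """
    values = [attributes[n] for n in links.get(instance_id, [])]
    if not values:
        return {"count": 0, "average": 0.0, "mode": None}
    return {
        "count": len(values),       # COUNT of linked instances
        "average": mean(values),    # AVERAGE over linked instances
        "mode": mode(values),       # MODE over linked instances
    }

# Example: features for a paper derived from an attribute of its cited papers.
links = {"paper_1": ["paper_2", "paper_3", "paper_4"]}
attributes = {"paper_2": 3, "paper_3": 3, "paper_4": 7}
print(aggregate_features("paper_1", links, attributes))
# {'count': 3, 'average': 4.333..., 'mode': 3}
```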


Machine Learning | 2000

Multiple Comparisons in Induction Algorithms

David D. Jensen; Paul R. Cohen

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
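
A short simulation in the same spirit: when the maximum of many noisy scores is selected, the winner looks far better than any individual candidate really is, and a Bonferroni-style adjustment restores a sensible false-discovery rate. The numbers of candidates and trials are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n_candidates, n_trials = 50, 10_000

# Every candidate attribute is pure noise: its "score" is a standard normal.
scores = rng.standard_normal((n_trials, n_candidates))
best = scores.max(axis=1)   # the item an induction algorithm would select

print(f"mean score of any single attribute: {scores[:, 0].mean():.2f}")  # ~0.0
print(f"mean score of the selected maximum: {best.mean():.2f}")          # ~2.2

# Significance thresholds: naive 5% cutoff vs. Bonferroni-adjusted cutoff.
naive_cut = norm.ppf(1 - 0.05)
bonferroni_cut = norm.ppf(1 - 0.05 / n_candidates)
print(f"false 'discoveries' at the naive cutoff:      {(best > naive_cut).mean():.1%}")
print(f"false 'discoveries' at the Bonferroni cutoff: {(best > bonferroni_cut).mean():.1%}")
```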


International Conference on Data Mining | 2009

Accurate Estimation of the Degree Distribution of Private Networks

Michael Hay; Chao Li; Gerome Miklau; David D. Jensen

We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy, and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical analysis shows that the error scales linearly with the number of unique degrees, whereas the error of conventional techniques scales linearly with the number of nodes. We complement the theoretical analysis with a thorough empirical analysis on real and synthetic graphs, showing that the algorithm's variance and bias are low, that the error diminishes as the size of the input graph increases, and that common analyses like fitting a power law can be carried out very accurately.
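
A hedged sketch of the general recipe for differentially private degree-sequence release: add Laplace noise to the sorted degree sequence and post-process it back into a non-decreasing sequence. The noise scale and the isotonic-regression smoothing stand in for the paper's calibrated mechanism and constrained inference, so this illustrates the approach rather than reproducing the authors' algorithm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def private_degree_sequence(degrees, epsilon=0.1, rng=None):
    """Noisy, non-decreasing estimate of a graph's sorted degree sequence.
    The Laplace noise scale used here is an illustrative choice, not the
    paper's calibrated sensitivity."""
    rng = rng or np.random.default_rng()
    sorted_deg = np.sort(np.asarray(degrees, dtype=float))
    noisy = sorted_deg + rng.laplace(scale=2.0 / epsilon, size=sorted_deg.size)
    # Post-process: least-squares fit constrained to be non-decreasing, which
    # pools runs of equal degrees and drives down the per-element error.
    smoothed = IsotonicRegression().fit_transform(np.arange(sorted_deg.size), noisy)
    return np.clip(np.rint(smoothed), 0, None).astype(int)

# Example: estimate the degree sequence of a small graph.
print(private_degree_sequence([1, 1, 1, 2, 2, 3, 5], epsilon=1.0))
```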


Knowledge Discovery and Data Mining | 2005

Using relational knowledge discovery to prevent securities fraud

Jennifer Neville; Özgür Şimşek; David D. Jensen; John Komoroske; Kelly Palmer; Henry G. Goldberg

We describe an application of relational knowledge discovery to a key regulatory mission of the National Association of Securities Dealers (NASD). NASD is the world's largest private-sector securities regulator, with responsibility for preventing and discovering misconduct among securities brokers. Our goal was to help focus NASD's limited regulatory resources on the brokers who are most likely to engage in securities violations. Using statistical relational learning algorithms, we developed models that rank brokers with respect to the probability that they would commit a serious violation of securities regulations in the near future. Our models incorporate organizational relationships among brokers (e.g., past coworker), which domain experts consider important but have not been easily used before now. The learned models were subjected to an extensive evaluation using more than 18 months of data unseen by the model developers and comprising over two person-weeks of effort by NASD staff. Model predictions were found to correlate highly with the subjective evaluations of experienced NASD examiners. Furthermore, in all performance measures, our models performed as well as or better than the handcrafted rules that are currently in use at NASD.


Conference on Information and Knowledge Management | 2000

Language models for financial news recommendation

Victor Lavrenko; Matthew D. Schmill; Dawn J. Lawrie; Paul Ogilvie; David D. Jensen; James Allan

We present a unique approach to identifying news stories that influence the behavior of financial markets. Specifically, we describe the design and implementation of Ænalyst, a system that can recommend interesting news stories, that is, stories that are likely to affect market behavior. Ænalyst operates by correlating the content of news stories with trends in financial time series. We identify trends in time series using piecewise linear fitting and then assign labels to the trends according to an automated binning procedure. We use language models to represent patterns of language that are highly associated with particular labeled trends. Ænalyst can then identify and recommend news stories that are highly indicative of future trends. We evaluate the system in terms of its ability to recommend the stories that will affect the behavior of the stock market. We demonstrate that stories recommended by Ænalyst could be used to profitably predict forthcoming trends in stock prices.
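
A minimal sketch of the trend-identification step described above, assuming fixed-width windows and slope-based bins; the published system's piecewise linear fitting and automated binning are more involved.

```python
import numpy as np

def label_trends(prices, window=5, up=0.002, down=-0.002):
    """Fit a least-squares line to each fixed-width window of a price series
    and label the window by its normalized slope: 'surge', 'plunge', or 'flat'.
    Window width and slope thresholds are illustrative choices."""
    labels = []
    t = np.arange(window)
    for start in range(0, len(prices) - window + 1, window):
        segment = np.asarray(prices[start:start + window], dtype=float)
        slope = np.polyfit(t, segment, deg=1)[0] / segment.mean()
        if slope > up:
            labels.append("surge")
        elif slope < down:
            labels.append("plunge")
        else:
            labels.append("flat")
    return labels

# Labeled trends can then be aligned with news stories published just before
# each window, and a language model built per trend label.
print(label_trends([10, 10.1, 10.3, 10.6, 11, 11, 10.9, 10.8, 10.4, 10.1]))
# ['surge', 'plunge']
```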


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Recommending citations for academic papers

Trevor Strohman; W. Bruce Croft; David D. Jensen

We approach the problem of academic literature search by considering an unpublished manuscript as a query to a search system. We use the text of previous literature as well as the citation graph that connects it to find relevant related material. We evaluate our technique with manual and automatic evaluation methods, and find an order of magnitude improvement in mean average precision as compared to a text similarity baseline.
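
One hedged way to combine the two evidence sources the abstract mentions is sketched below: documents most similar to the manuscript "vote" for the papers they cite, and those votes are blended with plain text similarity. The weighting and data structures are illustrative assumptions, not the paper's retrieval model.

```python
from collections import Counter

def recommend_citations(text_sim, cites, top_k=10, lam=0.5):
    """Rank candidate papers by a blend of text similarity and citation votes.

    text_sim: dict mapping candidate paper id -> similarity to the manuscript
    cites:    dict mapping paper id -> iterable of paper ids it cites
    """
    # The top_k most similar documents vote for the papers they cite.
    nearest = sorted(text_sim, key=text_sim.get, reverse=True)[:top_k]
    votes = Counter(cited for doc in nearest for cited in cites.get(doc, ()))
    max_votes = max(votes.values(), default=1)

    scores = {}
    for paper in set(text_sim) | set(votes):
        scores[paper] = (lam * text_sim.get(paper, 0.0)
                         + (1 - lam) * votes.get(paper, 0) / max_votes)
    return sorted(scores, key=scores.get, reverse=True)
```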


SIGKDD Explorations | 2005

The case for anomalous link discovery

Matthew J. Rattigan; David D. Jensen

In this paper, we describe the challenges inherent to the task of link prediction, and we analyze one reason why many link prediction models perform poorly. Specifically, we demonstrate the effects of the extremely large class skew associated with the link prediction task. We then present an alternate task --- anomalous link discovery (ALD) --- and qualitatively demonstrate the effectiveness of simple link prediction models for the ALD task. We show that even the simplistic structural models that perform poorly on link prediction can perform quite well at the ALD task.
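
A small sketch of the ALD idea: score the links that already exist with a simple structural measure (common neighbors here) and flag the lowest-scoring ones as anomaly candidates. The measure and the number of flagged edges are illustrative choices.

```python
from collections import defaultdict

def anomalous_links(edges, n_flag=3):
    """Score existing edges by the number of common neighbors shared by their
    endpoints and return the lowest-scoring edges as anomaly candidates."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    scored = [((u, v), len(neighbors[u] & neighbors[v])) for u, v in edges]
    scored.sort(key=lambda item: item[1])   # fewest common neighbors first
    return scored[:n_flag]

# Edges whose endpoints share no other neighbors surface at the top.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
print(anomalous_links(edges))
```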

Collaboration


Dive into David D. Jensen's collaboration.

Top Co-Authors

Marc E. Maier, University of Massachusetts Amherst
Matthew J. Rattigan, University of Massachusetts Amherst
Andrew S. Fast, University of Massachusetts Amherst
Tim Oates, University of Maryland
Katerina Marazopoulou, University of Massachusetts Amherst
Brian Gallagher, Lawrence Livermore National Laboratory
Lisa Friedland, University of Massachusetts Amherst