Peter Lofgren | Researchain

Publication

Featured research published by Peter Lofgren.

very large data bases | 2013

Question selection for crowd entity resolution

Steven Euijong Whang; Peter Lofgren; Hector Garcia-Molina

We study the problem of enhancing Entity Resolution (ER) with the help of crowdsourcing. ER is the problem of clustering records that refer to the same real-world entity and can be an extremely difficult process for computer algorithms alone. For example, figuring out which images refer to the same person can be a hard task for computers, but an easy one for humans. We study the problem of resolving records with crowdsourcing where we ask questions to humans in order to guide ER into producing accurate results. Since human work is costly, our goal is to ask as few questions as possible. We propose a probabilistic framework for ER that can be used to estimate how much ER accuracy we obtain by asking each question and select the best question with the highest expected accuracy. Computing the expected accuracy is #P-hard, so we propose approximation techniques for efficient computation. We evaluate our best question algorithms on real and synthetic datasets and demonstrate how we can obtain high ER accuracy while significantly reducing the number of questions asked to humans.

knowledge discovery and data mining | 2014

FAST-PPR: scaling personalized pagerank estimation for large graphs

Peter Lofgren; Siddhartha Banerjee; Ashish Goel; C. Seshadhri

We propose a new algorithm, FAST-PPR, for computing personalized PageRank: given a start node s and a target node t in a directed graph, and given a threshold δ, it computes the Personalized PageRank π_s(t) from s to t, guaranteeing that the relative error is small as long as π_s(t) > δ. Existing algorithms for this problem have a running time of Ω(1/δ); in comparison, FAST-PPR has a provable average running-time guarantee of O(√(d/δ)) (where d is the average in-degree of the graph). This is a significant improvement, since δ is often O(1/n) (where n is the number of nodes) for applications. We also complement the algorithm with an Ω(1/√δ) lower bound for PageRank estimation, showing that the dependence on δ cannot be improved. We perform a detailed empirical study on numerous massive graphs, showing that FAST-PPR dramatically outperforms existing algorithms. For example, on the 2010 Twitter graph with 1.5 billion edges, for target nodes sampled by popularity, FAST-PPR has a factor-of-20 speedup over the state of the art. Furthermore, an enhanced version of FAST-PPR has a factor-of-160 speedup on the Twitter graph, and is at least 20 times faster on all our candidate graphs.
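For orientation, the quantity π_s(t) in the abstract above can be estimated with a plain Monte Carlo baseline. The sketch below is not FAST-PPR itself; it only illustrates why sampling alone needs on the order of 1/δ walks to see a target whose score is near δ. The function names, the teleport probability `alpha`, and the toy graph are assumptions made for illustration, and every node is assumed to have at least one out-neighbor.

```python
import random


def monte_carlo_ppr(graph, s, t, alpha=0.2, num_walks=20000, seed=0):
    """Estimate pi_s(t): the probability that a random walk from s,
    which stops with probability alpha before each step, ends at t.

    graph maps each node to a non-empty list of out-neighbors (assumed).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_walks):
        node = s
        # Take another step with probability 1 - alpha, otherwise stop.
        while rng.random() > alpha:
            node = rng.choice(graph[node])
        if node == t:
            hits += 1
    return hits / num_walks


def exact_ppr(graph, s, alpha=0.2, iterations=200):
    """Reference PPR vector from s by power iteration (small graphs only)."""
    dist = {v: 0.0 for v in graph}
    dist[s] = 1.0
    pi = {v: 0.0 for v in graph}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in graph}
        for v, mass in dist.items():
            pi[v] += alpha * mass  # walks that stop at v at this step
            for u in graph[v]:
                nxt[u] += (1 - alpha) * mass / len(graph[v])
        dist = nxt
    return pi


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical directed graph
    print(monte_carlo_ppr(toy, s=0, t=2), exact_ppr(toy, s=0)[2])
```

The number of walks is fixed here; in a real setting it would have to grow as the threshold δ shrinks, which is the cost the abstract contrasts against.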

international conference on management of data | 2015

tDP: An Optimal-Latency Budget Allocation Strategy for Crowdsourced MAXIMUM Operations

Vasilis Verroios; Peter Lofgren; Hector Garcia-Molina

Latency is a critical factor when using a crowdsourcing platform to solve a problem like entity resolution or sorting. In practice, most frameworks attempt to reduce latency by heuristically splitting a budget of questions into rounds, so that after each round the answers are analyzed and new questions are selected. We focus on one of the most extensively studied crowdsourcing operations, the MAX operation (finding the best element in a collection under human criteria), and we study the problem of budget allocation into rounds for this operation. We provide a polynomial-time dynamic-programming budget allocation algorithm that minimizes the latency when questions form tournaments in each round. Furthermore, we study the general case where questions can be asked in any arbitrary way in each round. Our theoretical results for the general case indicate that our approach is also optimal under certain worst-case and average-case scenarios. We compare our approach to alternatives on Amazon Mechanical Turk, where many of our theoretical assumptions do not necessarily hold. We find that our approach is also optimal in practice and achieves a notable improvement over alternatives in most cases.

knowledge discovery and data mining | 2016

Approximate Personalized PageRank on Dynamic Graphs

Hongyang Zhang; Peter Lofgren; Ashish Goel

We propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One, Forward Push, propagates probability mass forwards along edges from a source node, while the other, Reverse Push, propagates local changes backwards along edges from a target. In both variations, we maintain an invariant between two vectors, and when an edge is updated, our algorithm first modifies the vectors to restore the invariant, then performs any needed local push operations to restore accuracy. For Reverse Push, we prove that for an arbitrary directed graph in a random edge model, or for an arbitrary undirected graph, given a uniformly random target node t, the cost to maintain a PPR vector to t of additive error ε as k edges are updated is O(k + d/ε), where d is the average degree of the graph. This is O(1) work per update, plus the cost of computing a reverse vector once on a static graph. For Forward Push, we show that on an arbitrary undirected graph, given a uniformly random start node s, the cost to maintain a PPR vector from s of degree-normalized error ε as k edges are updated is O(k + 1/ε), which is again O(1) per update plus the cost of computing a PPR vector once on a static graph.
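A minimal sketch of the static Forward Push step mentioned in the abstract above may help make the "two vectors" concrete. It covers only the static case on an undirected graph; the dynamic edge-update step from the paper is not reproduced. The names `forward_push`, `alpha`, and `eps`, and the adjacency-list representation, are assumptions for illustration. The two vectors are the estimates `p` and the residuals `r`, and the push rule keeps the total mass in `p` and `r` equal to 1.

```python
from collections import defaultdict


def forward_push(graph, s, alpha=0.2, eps=1e-4):
    """Local Forward Push from source s on an undirected graph.

    graph maps each node to a non-empty list of neighbors (assumed).
    Returns (p, r). On return, r[v] <= eps * deg(v) for every node v,
    so p approximates the PPR vector from s with degree-normalized
    error at most eps.
    """
    p = defaultdict(float)  # estimates
    r = defaultdict(float)  # residual mass not yet pushed
    r[s] = 1.0
    stack = [s]
    while stack:
        v = stack.pop()
        deg = len(graph[v])
        if r[v] <= eps * deg:
            continue  # stale entry, residual already small enough
        mass = r[v]
        r[v] = 0.0
        p[v] += alpha * mass  # fraction of walks that stop at v
        share = (1 - alpha) * mass / deg
        for u in graph[v]:
            r[u] += share  # remaining mass moves one step forward
            if r[u] > eps * len(graph[u]):
                stack.append(u)
    return p, r


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # hypothetical graph
    p, r = forward_push(toy, s=0)
    print(dict(p), sum(p.values()) + sum(r.values()))
```

Each push moves at least `alpha * eps` mass into `p`, and `p` never exceeds total mass 1, which is why the loop terminates.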

workshop on algorithms and models for the web graph | 2015

Bidirectional PageRank Estimation: From Average-Case to Worst-Case

Peter Lofgren; Siddhartha Banerjee; Ashish Goel

We present a new algorithm for estimating the Personalized PageRank (PPR) between a source and target node on undirected graphs, with sublinear running-time guarantees over the worst-case choice of source and target nodes. Our work builds on a recent line of work on bidirectional estimators for PPR, which obtained sublinear running-time guarantees but in an average-case sense, for a uniformly random choice of target node. Crucially, we show how the reversibility of random walks on undirected networks can be exploited to convert average-case to worst-case guarantees. While past bidirectional methods combine forward random walks with reverse local pushes, our algorithm combines forward local pushes with reverse random walks. We also discuss how to modify our methods to estimate random-walk probabilities for any length distribution, thereby obtaining fast algorithms for estimating general graph diffusions, including the heat kernel, on undirected networks.
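The reversibility property that the abstract above relies on can be written out explicitly. The notation is assumed for illustration and is not taken from the paper: W = D^{-1}A is the random-walk transition matrix of an undirected graph, d_u is the degree of node u, and α is the stopping probability per step.

```latex
\pi_s(t) = \alpha \sum_{k \ge 0} (1-\alpha)^k \,(W^k)_{s,t},
\qquad
d_s \,(W^k)_{s,t} = d_t \,(W^k)_{t,s}
\;\Longrightarrow\;
d_s \,\pi_s(t) = d_t \,\pi_t(s).
```

The middle identity holds because D W^k = (W^k)^T D when A is symmetric, so a score from s to t can be recovered from walks started at t.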

allerton conference on communication, control, and computing | 2016

Sublinear estimation of a single element in sparse linear systems

Nitin Shyamkumar; Siddhartha Banerjee; Peter Lofgren

We present a fast bidirectional algorithm for estimating a single element of the product of a matrix power and vector. This is an important primitive in many applications; in particular, we describe how it can be used to estimate a single element in the solution of a linear system Ax = b, with sublinear average-case running time guarantees for sparse systems. Our work combines the von Neumann-Ulam MCMC scheme for matrix multiplication with recent developments in bidirectional algorithms for estimating random-walk metrics. In particular, given a target additive-error threshold, we show how to combine a reverse local-variational technique with forward MCMC sampling, such that the resulting algorithm is order-wise faster than each individual approach.
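The primitive named in the abstract above, a single element of a matrix power times a vector, can be illustrated with a plain deterministic sketch. This is not the bidirectional algorithm from the paper. It assumes the system Ax = b has been rewritten as x = Gx + b with A = I - G, and that the Neumann series x = Σ_k G^k b converges. The function name and the dict-of-dicts sparse format are hypothetical.

```python
def single_element(G, b, i, num_terms=60):
    """Approximate x[i] for x = G x + b by summing e_i^T G^k b for k < num_terms.

    G is a sparse matrix stored as {row: {col: value}} (assumed format).
    Assumes the spectral radius of G is below 1 so the series converges.
    """
    row = {i: 1.0}  # sparse row vector e_i^T G^k, starting at k = 0
    total = 0.0
    for _ in range(num_terms):
        # Add the k-th term e_i^T G^k b.
        total += sum(w * b[j] for j, w in row.items())
        # Advance to e_i^T G^(k+1) by one sparse vector-matrix product.
        nxt = {}
        for j, w in row.items():
            for c, g in G.get(j, {}).items():
                nxt[c] = nxt.get(c, 0.0) + w * g
        row = nxt
    return total


if __name__ == "__main__":
    G = {0: {1: 0.5}, 1: {0: 0.25}}  # hypothetical 2x2 example
    b = [1.0, 1.0]
    print(single_element(G, b, 0), single_element(G, b, 1))
```

Only the rows of G reachable from index i are ever touched, which is the locality that sampling-based and bidirectional methods exploit further.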

web search and data mining | 2016