Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where David Arthur is active.

Publication


Featured research published by David Arthur.


Symposium on Computational Geometry | 2006

How slow is the k-means method?

David Arthur; Sergei Vassilvitskii

The <b>k-means</b> method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of <b>k-means</b> is <i>superpolynomial</i> by improving the best known lower bound from Ω(<i>n</i>) iterations to 2<sup>Ω(√<i>n</i>)</sup>.
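For context on what is being bounded: each k-means "iteration" alternates an assignment step and a re-centering step until the clustering stops changing. A minimal sketch of that standard (Lloyd's) iteration, with all names illustrative:

```python
import numpy as np

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's iteration: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the centroid of its cluster
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break  # converged: the clustering no longer changes
        centers = new_centers
    return centers, labels
```

The paper's result says the number of trips around this loop can be as large as 2^Ω(√n) on worst-case inputs, despite the method's speed in practice.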


Foundations of Computer Science | 2009

k-Means Has Polynomial Smoothed Complexity

David Arthur; Bodo Manthey; Heiko Röglin

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
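The smoothed-analysis model described here has two steps: an adversary fixes an input, then every coordinate receives independent Gaussian noise of standard deviation σ before k-means runs. A minimal sketch of that perturbation step (function name illustrative):

```python
import numpy as np

def smoothed_instance(points, sigma, rng=None):
    """Smoothed-analysis perturbation: the adversary picks `points`,
    then nature adds independent N(0, sigma^2) noise to every
    coordinate. k-means is then analyzed on the perturbed instance."""
    rng = np.random.default_rng() if rng is None else rng
    return points + rng.normal(scale=sigma, size=points.shape)
```

The paper's bound is a polynomial in n and 1/σ, so the smaller the perturbation, the weaker the guarantee, interpolating between worst-case (σ → 0) and average-case analysis.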


Workshop on Internet and Network Economics | 2009

Pricing Strategies for Viral Marketing on Social Networks

David Arthur; Rajeev Motwani; Aneesh Sharma; Ying Xu

We study the use of viral marketing strategies on social networks that seek to maximize revenue from the sale of a single product. We propose a model in which the decision of a buyer to buy the product is influenced by friends that own the product and the price at which the product is offered. The influence model we analyze is quite general, naturally extending both the Linear Threshold model and the Independent Cascade model, while also incorporating price information. We consider sales proceeding in a cascading manner through the network, i.e., a buyer is offered the product via recommendations from its neighbors who own the product. In this setting, the seller influences events by offering a cashback to recommenders and by setting prices (via coupons or discounts) for each buyer in the social network. This choice of prices for the buyers is termed the seller's strategy. Finding a seller strategy that maximizes the expected revenue in this setting turns out to be NP-hard. However, we propose a seller strategy that generates revenue guaranteed to be within a constant factor of the optimal strategy in a wide variety of models. The strategy is based on an influence-and-exploit idea: at each time step, it finds the right trade-off between generating revenue from the current user and offering the product for free, using the influence generated from this sale later in the process.
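The influence-and-exploit idea can be illustrated with a toy simulation: give the product away to an "influence" set, then charge everyone else, with purchases triggered by owning neighbors. This is a deliberately simplified stand-in, not the paper's exact model; the graph format, the 0.5 purchase probability, and the free fraction are all illustrative assumptions:

```python
import random

def influence_and_exploit(graph, price, free_fraction=0.3, rng=random):
    """Toy influence-and-exploit strategy on an adjacency-list graph:
    offer the product for free to a random 'influence' set, then offer
    it at `price` to everyone else. A node buys if at least one
    neighbor already owns the product and a coin flip (standing in for
    price sensitivity) succeeds. Single pass, not a full cascade."""
    nodes = list(graph)
    free = set(rng.sample(nodes, max(1, int(free_fraction * len(nodes)))))
    owners = set(free)          # free offers are always accepted
    revenue = 0.0
    for v in nodes:
        if v in owners:
            continue
        if any(u in owners for u in graph[v]) and rng.random() < 0.5:
            owners.add(v)
            revenue += price
    return revenue, owners
```

The trade-off the paper analyzes is visible even here: a larger free set forgoes revenue on the seeds but raises the chance that each paying customer has an owning neighbor.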


Journal of the ACM | 2011

Smoothed Analysis of the k-Means Method

David Arthur; Bodo Manthey; Heiko Röglin

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this article, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.


Foundations of Computer Science | 2006

Worst-case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-means Method

David Arthur; Sergei Vassilvitskii

We show a worst-case lower bound and a smoothed upper bound on the number of iterations performed by the iterative closest point (ICP) algorithm. First proposed by Besl and McKay, the algorithm is widely used in computational geometry, where it is known for its simplicity and its observed speed. The theoretical study of ICP was initiated by Ezra, Sharir and Efrat, who bounded its worst-case running time between Ω(<i>n</i> log <i>n</i>) and <i>O</i>(<i>n</i><sup>2</sup><i>d</i>)<sup><i>d</i></sup>. We substantially tighten this gap by improving the lower bound to Ω(<i>n</i>/<i>d</i>)<sup><i>d</i>+1</sup>. To help reconcile this bound with the algorithm's observed speed, we also show that the smoothed complexity of ICP is polynomial, independent of the dimensionality of the data. Using similar methods, we improve the best known smoothed upper bound for the popular k-means method to <i>n</i><sup><i>O</i>(<i>k</i>)</sup>, once again independent of the dimension.
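ICP alternates a matching step (pair each source point with its nearest target point) with an alignment step (move the source to best fit those pairs), and the quantity the paper bounds is the number of such iterations. A stripped-down, translation-only sketch of that loop; the full algorithm also solves for a rotation, and all names here are illustrative:

```python
import numpy as np

def icp_translation(source, target, max_iter=50, tol=1e-9):
    """Translation-only ICP sketch: repeatedly match each source point
    to its nearest target point, then shift the source by the mean
    residual of the matched pairs. Shows the iteration structure whose
    step count the paper analyzes."""
    shift = np.zeros(source.shape[1])
    for _ in range(max_iter):
        moved = source + shift
        # Matching step: nearest target point for every source point.
        d = np.linalg.norm(moved[:, None] - target[None, :], axis=2)
        nearest = target[d.argmin(axis=1)]
        # Alignment step: the optimal translation for fixed matches
        # is the mean offset between matched pairs.
        delta = (nearest - moved).mean(axis=0)
        shift += delta
        if np.linalg.norm(delta) < tol:
            break  # the matching no longer changes the alignment
    return shift
```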


Symposium on Discrete Algorithms | 2006

Analyzing BitTorrent and related peer-to-peer networks

David Arthur; Rina Panigrahy

We analyze protocols for disseminating a collection of data blocks over a network of peers with a view towards BitTorrent and related peer-to-peer networks. Unlike previous work, we accurately model the distribution of the individual data blocks, a process which is critical to the parallelism that makes BitTorrent successful in practice. We also consider multiple network topologies and routing algorithms. We first demonstrate several routing algorithms that distribute <i>b</i> data blocks on a network with diameter <i>d</i> and maximum degree <i>D</i> in <i>O</i>(<i>D</i>(<i>b + d</i>)) phases of concurrent downloads with high probability. This is tight within a factor of <i>D</i>. We also specialize to the networks used by BitTorrent and we improve this bound to <i>O</i>(<i>b</i> ln <i>n</i>) phases where <i>n</i> is the number of clients. Finally, we discuss several practical extensions to BitTorrent, one of which improves the bound to a near-optimal <i>O</i>(<i>b</i> + (ln <i>n</i>)<sup><i>2</i></sup>) phases.
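The "phases of concurrent downloads" being counted can be pictured with a toy round-based model: in each phase every peer fetches at most one block it is missing from a neighbor that has it. The rarest-first choice below mirrors BitTorrent's well-known piece-selection heuristic, but the model as a whole is a simplification for illustration, not the paper's exact setting:

```python
def disseminate(neighbors, num_blocks, seed_node=0):
    """Round-based toy model of block dissemination on an
    adjacency-list graph. One peer (the seed) starts with all blocks;
    in each phase every other peer downloads at most one missing block
    from its neighbors, preferring the block that is currently rarest
    in the network. Returns the number of phases until every peer has
    every block."""
    have = {v: set() for v in neighbors}
    have[seed_node] = set(range(num_blocks))
    phases = 0
    while any(len(have[v]) < num_blocks for v in neighbors):
        phases += 1
        counts = {b: sum(b in have[v] for v in neighbors)
                  for b in range(num_blocks)}
        # Snapshot so all downloads within a phase happen "concurrently".
        snapshot = {v: set(have[v]) for v in neighbors}
        for v in neighbors:
            wanted = [b for u in neighbors[v] for b in snapshot[u] - have[v]]
            if wanted:
                have[v].add(min(wanted, key=lambda b: counts[b]))  # rarest first
    return phases
```

Since each peer gains at most one block per phase, any schedule needs at least <i>b</i> phases; the paper's results bound how close various topologies and routing rules get to that ideal.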


SIAM Journal on Computing | 2009

Worst-Case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-Means Method

David Arthur; Sergei Vassilvitskii

We show a worst-case lower bound and a smoothed upper bound on the number of iterations performed by the Iterative Closest Point (ICP) algorithm. First proposed by Besl and McKay, the algorithm is widely used in computational geometry, where it is known for its simplicity and its observed speed. The theoretical study of ICP was initiated by Ezra, Sharir, and Efrat, who showed that the worst-case running time to align two sets of <i>n</i> points in ℝ<sup><i>d</i></sup> …


Symposium on Discrete Algorithms | 2007

k-means++: the advantages of careful seeding

David Arthur; Sergei Vassilvitskii


Archive | 2005

On the Worst Case Complexity of the k-means Method

David Arthur; Sergei Vassilvitskii


Symposium on Discrete Algorithms | 2006

Analyzing the efficiency of BitTorrent and related peer-to-peer networks

David Arthur; Rina Panigrahy
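The k-means++ paper listed above introduces what is now the standard "D² seeding" rule: pick the first center uniformly at random, then pick each subsequent center with probability proportional to its squared distance from the nearest center chosen so far. A minimal sketch of that seeding step (function name illustrative):

```python
import numpy as np

def kmeans_pp_seed(points, k, rng=None):
    """D^2 seeding from k-means++: the first center is uniform at
    random; each later center is drawn with probability proportional
    to its squared distance to the nearest already-chosen center,
    which biases the seeds toward spreading out across the data."""
    rng = np.random.default_rng() if rng is None else rng
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(points[rng.choice(len(points), p=probs)])
    return np.array(centers)
```

The resulting centers are then handed to the ordinary k-means iteration; the paper shows this seeding alone gives an O(log k)-competitive clustering in expectation.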

Collaboration


Dive into David Arthur's collaboration.

Top Co-Authors
