Dan Feldman
Massachusetts Institute of Technology
Publication
Featured research published by Dan Feldman.
symposium on computational geometry | 2007
Dan Feldman; Morteza Monemizadeh; Christian Sohler
Given a point set P ⊆ R<sup>d</sup>, the k-means clustering problem is to find a set C = (c<sub>1</sub>,...,c<sub>k</sub>) of k points and a partition of P into k clusters C<sub>1</sub>,...,C<sub>k</sub> such that the sum of squared errors ∑<sub>i=1</sub><sup>k</sup> ∑<sub>p ∈ C<sub>i</sub></sub> ‖p − c<sub>i</sub>‖<sub>2</sub><sup>2</sup> is minimized. For given centers, this cost function is minimized by assigning points to the nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set P has a weak (ε, k)-coreset of size Poly(k, 1/ε) for the k-means clustering problem, i.e., its size is <i>independent</i> of the cardinality |P| of the point set and the dimension d of the Euclidean space R<sup>d</sup>. A weak coreset is a weighted set S ⊆ P together with a set T such that T contains a (1+ε)-approximation for the optimal cluster centers from P and, for every set of k centers from T, the cost of the centers for S is a (1±ε)-approximation of the cost for P. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time O(nkd + d · Poly(k/ε) + 2<sup>Õ(k/ε)</sup>).
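The cost function in this abstract, and what it means to evaluate a weighted set S in place of P, can be illustrated with a short sketch. This is not the coreset construction from the paper. It only computes the k-means cost for fixed centers, with optional weights, and all names (`sq_dist`, `kmeans_cost`, `relative_error`) are hypothetical helpers introduced for illustration. The toy weighted set below merely merges duplicate points, so it reproduces the cost exactly, whereas a real coreset only guarantees a (1±ε)-approximation.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans_cost(points: Sequence[Point],
                centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Sum of (weighted) squared distances from each point to its nearest center.

    For fixed centers the cost is minimized by assigning every point to its
    nearest center, which is what the inner min() does.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def relative_error(full_cost: float, approx_cost: float) -> float:
    """Relative deviation of the approximate cost (assumes full_cost > 0)."""
    return abs(approx_cost - full_cost) / full_cost


# Toy data: P contains duplicates, S stores each distinct point once with a weight.
P = [(0, 0), (0, 0), (2, 0), (10, 0), (10, 0), (12, 0)]
S = [(0, 0), (2, 0), (10, 0), (12, 0)]
S_weights = [2, 1, 2, 1]
centers = [(1, 0), (11, 0)]

cost_P = kmeans_cost(P, centers)
cost_S = kmeans_cost(S, centers, S_weights)
```

Here `cost_P` and `cost_S` coincide because S is an exact compression of P. For a genuine coreset one would instead check that `relative_error(cost_P, cost_S)` is at most ε for the candidate center sets of interest.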
symposium on the theory of computing | 2009
Dan Feldman; Amos Fiat; Haim Kaplan; Kobbi Nissim
A coreset of a point set P is a small weighted set of points that captures some geometric properties of P. Coresets have found use in a vast host of geometric settings. We forge a link between coresets and differentially private sanitizations that can answer any number of queries without compromising privacy. We define the notion of private coresets, which are simultaneously both coresets and differentially private, and show how they may be constructed. We first show that the existence of a small coreset with low generalized sensitivity (i.e., replacing a single point in the original point set slightly affects the quality of the coreset) implies (in an inefficient manner) the existence of a private coreset for the same queries. This greatly extends the works of Blum, Ligett, and Roth [STOC 2008] and McSherry and Talwar [FOCS 2007]. We also give an efficient algorithm to compute private coresets for k-median and k-mean queries in R<sup>d</sup>, immediately implying efficient differentially private sanitizations for such queries. Following McSherry and Talwar, this construction also gives efficient coalition-proof (approximately dominant strategy) mechanisms for location problems. Unlike coresets, which only have a multiplicative approximation factor, we prove that private coresets must have an additive error. We present a new technique for showing lower bounds on this error.
Geophysical Research Letters | 2007
Liming Li; Andrew P. Ingersoll; Xun Jiang; Dan Feldman; Yuk L. Yung
The mean state of the global atmospheric energy cycle is re-examined using two reanalysis datasets, NCEP2 and ERA40 (1979–2001). The general consistency between the two datasets suggests that the present estimates of the energy cycle are probably the most reliable ones. The comparison between the present and a previous study shows noticeable discrepancies in some of the energy components and conversion rates. The current estimate of the conversion from mean potential energy to mean kinetic energy, C(P_M, K_M), further suggests that near-surface processes play an important role in the conversion rate C(P_M, K_M), along with the Ferrel cell and Hadley cells, which probably change the direction of the conversion rate C(P_M, K_M).
foundations of computer science | 2006
Dan Feldman; Amos Fiat; Micha Sharir
We develop efficient (1 + ε)-approximation algorithms for generalized facility location problems. Such facilities are not restricted to being points in R<sup>d</sup>, and can represent more complex structures such as linear facilities (lines in R<sup>d</sup>, j-dimensional flats), etc. We introduce coresets for weighted (point) facilities. These prove to be useful for such generalized facility location problems, and we provide efficient algorithms for their construction. Applications include k-mean and k-median generalizations, i.e., find k lines that minimize the sum (or sum of squares) of the distances from each input point to its nearest line. Other applications are generalizations of linear regression problems to multiple regression lines, new SVD/PCA generalizations, and many more. The results significantly improve on previous work, which deals efficiently only with special cases. Open source code for the algorithms in this paper is also available.
intelligent robots and systems | 2012
Cynthia Sung; Dan Feldman; Daniela Rus
We investigate a data-driven approach to robotic path planning and analyze its performance in the context of interception tasks. Trajectories of moving objects often contain repeated patterns of motion, and learning those patterns can yield interception paths that succeed more often. We therefore propose an original trajectory clustering algorithm for extracting motion patterns from trajectory data and demonstrate its effectiveness over the more common clustering approach of using k-means. We use the results to build a Hidden Markov Model of a target's motion and predict movement. Our simulations show that these predictions lead to more effective interception. The results of this work have potential applications in coordination of multi-robot systems, tracking and surveillance tasks, and dynamic obstacle avoidance.
Journal of Geophysical Research | 2008
Hui Su; Jonathan H. Jiang; Y. Gu; J. David Neelin; Brian H. Kahn; Dan Feldman; Yuk L. Yung; J. W. Waters; Nathaniel J. Livesey; Michelle L. Santee; William G. Read
The variations of tropical upper tropospheric (UT) clouds with sea surface temperature (SST) are analyzed using effective cloud fraction from the Atmospheric Infrared Sounder (AIRS) on Aqua and ice water content (IWC) from the Microwave Limb Sounder (MLS) on Aura. The analyses are limited to UT clouds above 300 hPa. Our analyses do not suggest a negative correlation of tropical-mean UT cloud fraction with the cloud-weighted SST (CWT). Instead, both tropical-mean UT cloud fraction and IWC are found to increase with CWT, although their correlations with CWT are rather weak. The rate of increase of UT cloud fraction with CWT is comparable to that of precipitation, while the UT IWC and ice water path (IWP) increase more strongly with CWT. The radiative effect of UT clouds is investigated, and they are shown to provide a net warming at the top of the atmosphere. An increase of IWP with SST yields an increase of net warming that corresponds to a positive feedback, until the UT IWP exceeds a value about 50% greater than presently observed by MLS. Further increases of the UT IWP would favor the shortwave cooling effect, causing a negative feedback. Sensitivities of UT cloud forcing to the uncertainties in UT CFR and IWC measurements are discussed.
information processing in sensor networks | 2012
Dan Feldman; Andrew Sugaya; Daniela Rus
The wide availability of networked sensors such as GPS and cameras is enabling the creation of sensor networks that generate huge amounts of data. For example, vehicular sensor networks where in-car GPS sensor probes are used to model and monitor traffic can generate on the order of gigabytes of data in real time. How can we compress streaming high-frequency data from distributed sensors? In this paper we construct coresets for streaming motion. The coreset of a data set is a small set which approximately represents the original data. Running queries or fitting models on the coreset yields results similar to those obtained on the original data set. We present an algorithm for computing a small coreset of a large sensor data set. Surprisingly, the size of the coreset is independent of the size of the original data set. Combining map-and-reduce techniques with our coreset yields a system capable of compressing in parallel a stream of O(n) points using space and update time that is only O(log n). We provide experimental results and compare the algorithm to the popular Douglas-Peucker heuristic for compressing GPS data.
intelligent robots and systems | 2012
Stephanie Gil; Dan Feldman; Daniela Rus
advances in geographic information systems | 2012
Dan Feldman; Cynthia Sung; Daniela Rus
symposium on computational geometry | 2014
Alexander Munteanu; Christian Sohler; Dan Feldman
This paper deals with computing the smallest enclosing ball of a set of points subject to probabilistic data. In our setting, each of the n points may or may not occur, and if it does, it occurs at one of finitely many locations, following its own discrete probability distribution. The objective is therefore considered to be a random variable, and we aim at finding a center minimizing the expected maximum distance to the points according to their distributions. Our main contribution is the first polynomial time (1 + ε)-approximation algorithm for the probabilistic smallest enclosing ball problem, with extensions to the streaming setting.
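The objective described in this abstract, the expected maximum distance from a center to the probabilistic points, can be evaluated exactly for a fixed center. The following is a minimal sketch under stated assumptions, not the approximation algorithm from the paper. It assumes the points are mutually independent, treats an absent point as contributing distance 0, and uses a hypothetical input format (a list of `(location, probability)` pairs per point, with any missing probability mass meaning "absent"). The function name `expected_max_distance` is likewise introduced only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Location = Sequence[float]
ProbPoint = List[Tuple[Location, float]]  # (location, probability) pairs


def expected_max_distance(center: Location, prob_points: List[ProbPoint]) -> float:
    """Expected value of max_i dist(center, X_i) for independent probabilistic points.

    Uses the identity P(max <= t) = prod_i P(dist_i <= t) and sums
    t * (P(max <= t) - P(max <= previous t)) over all candidate distances t.
    """
    # Distance distribution of every point; an absent point counts as distance 0.
    dist_distributions = []
    for locations in prob_points:
        dd = [(math.dist(center, loc), pr) for loc, pr in locations]
        absent = 1.0 - sum(pr for _, pr in locations)
        if absent > 1e-12:
            dd.append((0.0, absent))
        dist_distributions.append(dd)

    # The maximum can only take one of the individual distance values.
    thresholds = sorted({d for dd in dist_distributions for d, _ in dd})

    expected, prev_cdf = 0.0, 0.0
    for t in thresholds:
        cdf = 1.0
        for dd in dist_distributions:
            cdf *= sum(pr for d, pr in dd if d <= t)
        expected += t * (cdf - prev_cdf)
        prev_cdf = cdf
    return expected


# Point A appears at (3, 4) with probability 0.5 and is otherwise absent;
# point B is always at (1, 0).
example_points = [
    [((3.0, 4.0), 0.5)],
    [((1.0, 0.0), 1.0)],
]
example_value = expected_max_distance((0.0, 0.0), example_points)
```

In the example, the maximum distance is 5 with probability 0.5 and 1 otherwise, so the expected value is 3. Finding the center that minimizes this quantity is the actual optimization problem. The sketch only evaluates the objective for a given candidate center.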