Publication


Featured research published by Yogish Sabharwal.


foundations of computer science | 2004

A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions

Amit Kumar; Yogish Sabharwal; Sandeep Sen

We present the first linear time (1 + ε)-approximation algorithm for the k-means problem for fixed k and ε. Our algorithm runs in O(nd) time, which is linear in the size of the input. Another feature of our algorithm is its simplicity - the only technique involved is random sampling.
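For intuition, here is a minimal Python sketch of the sampling flavor for the 1-mean case: the centroid of a small random sample is a good center with reasonable probability, so repeating the draw and keeping the cheapest candidate yields a near-optimal center. This is only an illustration, not the paper's algorithm, which handles general k and carries the (1 + ε) guarantee through a more careful sampling and pruning argument.

    import random

    def cost(points, center):
        # Sum of squared distances from each point to the given center.
        return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

    def centroid(sample):
        d = len(sample[0])
        return [sum(p[i] for p in sample) / len(sample) for i in range(d)]

    def one_mean_by_sampling(points, sample_size=25, trials=20):
        # Draw small random samples, use each sample's centroid as a candidate
        # center, and keep the cheapest candidate over all trials.
        best, best_cost = None, float("inf")
        for _ in range(trials):
            sample = random.sample(points, min(sample_size, len(points)))
            cand = centroid(sample)
            c = cost(points, cand)
            if c < best_cost:
                best, best_cost = cand, c
        return best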


international conference on parallel processing | 2008

Optimization of All-to-All Communication on the Blue Gene/L Supercomputer

Sameer Kumar; Yogish Sabharwal; Rahul Garg; Philip Heidelberger

All-to-all communication is a well known performance bottleneck for many applications, such as the ones that use the Fast-Fourier-transform (FFT) algorithm. We analyze the performance of all-to-all communication on the BlueGene/L torus interconnect that has link contention even for all-to-all operations with short messages. We observed that the performance of all-to-all depends on the shape of the processor partition. We present a performance analysis of all-to-all on partitions of various shapes. We then present optimization schemes that substantially improve the performance of all-to-all with short and large messages. In particular, throughput improved from 64% to over 99% of peak on the 65,536 (64 × 32 × 32) node Blue Gene/L machine at the Lawrence Livermore National Lab. We show the impact of the all-to-all performance optimizations in 1-D and 3-D FFT benchmarks. We achieved a performance of over 2.8 TF for the HPC Challenge 1D FFT benchmark with our optimized all-to-all.
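For context, the collective being optimized is MPI's all-to-all personalized exchange, in which every rank sends a distinct block to every other rank. A minimal mpi4py sketch of the collective itself (not the BG/L-specific optimization) looks like this:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    BLOCK = 1024  # doubles sent from this rank to each other rank
    sendbuf = np.full((size, BLOCK), rank, dtype="d")   # row i goes to rank i
    recvbuf = np.empty((size, BLOCK), dtype="d")        # row i arrives from rank i
    comm.Alltoall(sendbuf, recvbuf)                     # contends for torus links at scale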


Journal of the ACM | 2010

Linear-time approximation schemes for clustering problems in any dimensions

Amit Kumar; Yogish Sabharwal; Sandeep Sen

We present a general approach for designing approximation algorithms for a fundamental class of geometric clustering problems in arbitrary dimensions. More specifically, our approach leads to simple randomized algorithms for the k-means, k-median and discrete k-means problems that yield (1 + ε) approximations with probability ≥ 1/2 and running times of O(2^((k/ε)^O(1)) dn). These are the first algorithms for these problems whose running times are linear in the size of the input (nd for n points in d dimensions) assuming k and ε are fixed. Our method is general enough to be applicable to clustering problems satisfying certain simple properties and is likely to have further applications.
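For concreteness, the three objectives covered by the framework differ only in the distance term and in where centers may lie; a quick illustrative Python rendering (function names are my own, not the paper's):

    import math

    def sq_dist(p, c):
        return sum((x - y) ** 2 for x, y in zip(p, c))

    def kmeans_cost(points, centers):
        # Centers may be arbitrary points in R^d; distances are squared.
        return sum(min(sq_dist(p, c) for c in centers) for p in points)

    def kmedian_cost(points, centers):
        # Same structure, but distances are not squared.
        return sum(min(math.sqrt(sq_dist(p, c)) for c in centers) for p in points)

    # Discrete k-means uses the k-means objective but restricts the k centers
    # to be chosen from the input points themselves.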


ieee international conference on high performance computing data and analytics | 2012

Breaking the speed and scalability barriers for graph exploration on distributed-memory machines

Fabio Checconi; Fabrizio Petrini; Jeremiah Willcock; Andrew Lumsdaine; Anamitra R. Choudhury; Yogish Sabharwal

In this paper, we describe the challenges involved in designing a family of highly efficient Breadth-First Search (BFS) algorithms and in optimizing these algorithms on the latest two generations of Blue Gene machines, Blue Gene/P and Blue Gene/Q. With our recent winning Graph 500 submissions in November 2010, June 2011, and November 2011, we have achieved unprecedented scalability results in both space and size. On Blue Gene/P, we have been able to parallelize a scale 38 problem with 2^38 vertices and 2^42 edges on 131,072 processing cores. Using only four racks of an experimental configuration of Blue Gene/Q, we have achieved a processing rate of 254 billion edges per second on 65,536 processing cores. This paper describes the algorithmic design and the main classes of optimizations that we have used to achieve these results.
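The traversal pattern being scaled is level-synchronous BFS, in which the whole frontier is expanded one level at a time; distributed implementations partition the vertices across nodes and exchange frontier updates between levels. A sequential Python sketch of the pattern (not the Blue Gene implementation):

    def bfs_levels(adj, source):
        # adj: dict mapping each vertex to an iterable of its neighbors.
        level = {source: 0}
        frontier = [source]
        depth = 0
        while frontier:
            depth += 1
            next_frontier = []
            for u in frontier:            # expanded in parallel across ranks in practice
                for v in adj[u]:
                    if v not in level:    # first visit assigns the BFS level
                        level[v] = depth
                        next_frontier.append(v)
            frontier = next_frontier      # synchronization point between levels
        return level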


international conference on database theory | 2009

Analysis of sampling techniques for association rule mining

Venkatesan T. Chakaravarthy; Vinayaka Pandit; Yogish Sabharwal

In this paper, we present a comprehensive theoretical analysis of the sampling technique for the association rule mining problem. Most of the previous works have concentrated only on the empirical evaluation of the effectiveness of sampling for the step of finding frequent itemsets. To the best of our knowledge, a theoretical framework to analyze the quality of the solutions obtained by sampling has not been studied. Our contributions are two-fold. First, we present the notions of ε-close frequent itemset mining and ε-close association rule mining that help assess the quality of the solutions obtained by sampling. Secondly, we show that both the frequent itemset mining and association rule mining problems can be solved satisfactorily with a sample size that is independent of both the number of transactions and the number of items. Let θ be the required support, ε the closeness parameter, and 1/h the desired bound on the probability of failure. We show that the sampling-based approach succeeds in solving both ε-close frequent itemset mining and ε-close association rule mining with a probability of at least (1 - 1/h) with a sample of size S = O((1/(ε²θ))[Δ + log(h/((1 - ε)θ))]), where Δ is the maximum number of items present in any transaction. Thus, we establish that it is possible to speed up the entire process of association rule mining for massive databases by working with a small sample while retaining any desired degree of accuracy. Our work gives a comprehensive explanation for the well known empirical successes of sampling for association rule mining.
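Reading the bound as S = (1/(ε²θ)) * (Δ + log(h/((1 - ε)θ))) with the constant hidden by the O() taken to be 1, a quick calculation shows how modest the sample can be; the constant and the parameter values below are illustrative assumptions, not values from the paper.

    import math

    def sample_size(eps, theta, h, delta, c=1.0):
        # c stands in for the constant hidden by the O() notation.
        return c / (eps ** 2 * theta) * (delta + math.log(h / ((1 - eps) * theta)))

    # e.g. 10% closeness, 1% support, failure probability 1/100,
    # at most 20 items per transaction; note the result does not depend on
    # the number of transactions in the database.
    print(int(sample_size(eps=0.1, theta=0.01, h=100, delta=20)))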


international conference on supercomputing | 2006

Scalable algorithms for global snapshots in distributed systems

Rahul Garg; Vijay K. Garg; Yogish Sabharwal

Existing algorithms for global snapshots in distributed systems are not scalable when the underlying topology is complete. In a network with N processors, these algorithms require O(N) space and O(N) messages per processor. As a result, these algorithms are not efficient in large systems when the logical topology of the communication layer such as MPI is complete. In this paper, we propose three algorithms for global snapshot: a grid-based, a tree-based and a centralized algorithm. The grid-based algorithm uses O(N) space but only O(√N) messages per processor. The tree-based algorithm requires only O(1) space and O(log N log w) messages per processor where w is the average number of messages in transit per processor. The centralized algorithm requires only O(1) space and O(log w) messages per processor. We also have a matching lower bound for this problem. Our algorithms have applications in checkpointing, detecting stable predicates and implementing synchronizers. We have implemented our algorithms on top of the MPI library on the Blue Gene/L supercomputer. Our experiments confirm that the proposed algorithms significantly reduce the message and space complexity of a global snapshot.
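To see the scalability gap, the per-processor message counts from the abstract can be compared at a concrete machine size; the constants hidden by the O() notation are assumed to be 1 and the log base to be 2, both purely for illustration.

    import math

    def messages_per_processor(N, w):
        # N: number of processors; w: average messages in transit per processor.
        return {
            "O(N) baseline": N,
            "grid-based  O(sqrt(N))": math.sqrt(N),
            "tree-based  O(log N log w)": math.log2(N) * math.log2(w),
            "centralized O(log w)": math.log2(w),
        }

    print(messages_per_processor(N=65536, w=16))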


international colloquium on automata languages and programming | 2005

Linear time algorithms for clustering problems in any dimensions

Amit Kumar; Yogish Sabharwal; Sandeep Sen

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in O(2^((k/ε)^O(1)) dn) time (1+ε)-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem which has running time O(2^((1/ε)^O(1)) d). The previous best known algorithm for this problem had linear running time.


conference on high performance computing (supercomputing) | 2006

Large scale drop impact analysis of mobile phone using ADVC on Blue Gene/L

Hiroshi Akiba; Tomonobu Ohyama; Yoshinoir Shibata; Kiyoshi Yuyama; Yoshikazu Katai; Ryuichi Takeuchi; Takeshi Hoshino; Shinobu Yoshimura; Hirohisa Noguchi; Manish Gupta; John A. Gunnels; Vernon Austel; Yogish Sabharwal; Rahul Garg; Shoji Kato; Takashi Kawakami; Satoru Todokoro; Junko Ikeda

Existing commercial finite element analysis (FEA) codes do not exhibit the performance necessary for large scale analysis on parallel computer systems. In this paper, we demonstrate the performance characteristics of a commercial parallel structural analysis code, ADVC, on Blue Gene/L (BG/L). The numerical algorithm of ADVC is described, tuned, and optimized on BG/L, and then a large scale drop impact analysis of a mobile phone is performed. The model of the mobile phone is a nearly-full assembly that includes inner structures. The model we analyzed has 47 million nodal points and 142 million DOFs. This may not seem exceptionally large, but a dynamic impact analysis of a product model of this size, with contact conditions on the entire surface of the outer case, cannot be handled by other CAE systems. Our analysis is an unprecedented attempt in the electronics industry. It took only half a day, 12.1 hours, to analyze about 2.4 milliseconds of the impact. The floating point performance obtained was 538 GFLOPS on 4096 nodes of BG/L.


international conference on parallel processing | 2011

Real time contingency analysis for power grids

Anshul Mittal; Jagabondhu Hazra; Nikhil Jain; Vivek Goyal; Deva P. Seetharam; Yogish Sabharwal

Modern power grids are continuously monitored by trained system operators equipped with sophisticated monitoring and control systems. Despite such precautionary measures, large blackouts that affect more than a million consumers occur quite frequently. To prevent such blackouts, it is important to perform high-order contingency analysis in real time. However, contingency analysis is computationally very expensive as many different combinations of power system component failures must be analyzed. Analyzing several million such possible combinations can take an inordinately long time, and it is not possible for conventional systems to predict blackouts in time to take the necessary corrective actions. To address this issue, we present a scalable parallel implementation of a probabilistic contingency analysis scheme that processes only the most severe and most probable contingencies. We evaluate our implementation by analyzing the benchmark IEEE 300 bus and 118 bus test grids. We perform contingency analysis up to level eight (contingency chains of length eight) and can correctly predict blackouts in real time to a high degree of accuracy. To the best of our knowledge, this is the first implementation of real time contingency analysis beyond level two.
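As a simplified illustration of probabilistic pruning (not the paper's scheme), contingency chains can be grown one outage at a time and a branch abandoned as soon as its joint probability, assuming independent outages with hypothetical per-component probabilities, falls below a cutoff:

    def grow_chains(components, outage_prob, max_len, prob_cutoff):
        # components: ordered list of component ids; outage_prob: id -> outage probability.
        chains = [((), 1.0)]
        kept = []
        for _ in range(max_len):
            extended = []
            for chain, p in chains:
                start = components.index(chain[-1]) + 1 if chain else 0
                for c in components[start:]:      # extend in a fixed order to avoid duplicates
                    q = p * outage_prob[c]
                    if q >= prob_cutoff:          # prune improbable branches early
                        kept.append((q, chain + (c,)))
                        extended.append((chain + (c,), q))
            chains = extended
        return sorted(kept, reverse=True)         # most probable contingencies first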


international parallel and distributed processing symposium | 2010

Varying bandwidth resource allocation problem with bag constraints

Venkatesan T. Chakaravarthy; Vinayaka Pandit; Yogish Sabharwal; Deva P. Seetharam

We consider the problem of scheduling jobs on a pool of machines. Each job requires multiple machines on which it executes in parallel. For each job, the input specifies release time, deadline, processing time, profit and the number of machines required. The total number of machines may be different at different points of time. A feasible solution is a subset of jobs and a schedule for them such that at any timeslot, the total number of machines required by the jobs active at that timeslot does not exceed the number of machines available at that timeslot. We present an O(log(B_max/B_min))-approximation algorithm, where B_max and B_min are the maximum and minimum available bandwidth (the maximum and minimum number of machines available over all the timeslots). Our algorithm and the approximation ratio are applicable to a more general problem that we call the Varying bandwidth resource allocation problem with bag constraints (BagVBRap). The BagVBRap problem is a generalization of some previously studied scheduling and resource allocation problems.
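The feasibility condition in the problem statement is easy to state as a check over timeslots; a minimal Python sketch of that check follows (the approximation algorithm itself is not shown):

    def is_feasible(jobs, available):
        # jobs: list of (start, end, machines) with end exclusive, slots indexed from 0.
        # available[t]: number of machines available at timeslot t.
        used = [0] * len(available)
        for start, end, machines in jobs:
            for t in range(start, end):
                used[t] += machines
        return all(u <= a for u, a in zip(used, available))

    # Example: two jobs overlap at slot 2, which would need 5 machines but only 4 exist.
    print(is_feasible([(0, 3, 2), (2, 5, 3)], available=[4, 4, 4, 4, 4]))  # False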
