Publication


Featured research published by Jaswinder Pal Singh.


International Symposium on Computer Architecture | 1995

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo; Moriyoshi Ohara; Evan Torrie; Jaswinder Pal Singh; Anoop Gupta

The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
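
To make the methodological point concrete, here is a minimal sketch of how known working-set sizes can prune an experiment matrix: keep only one cache size per "regime" (below the first working set, between working sets, above the last), since points within a regime behave alike. The applications, working-set numbers, and candidate cache sizes below are illustrative placeholders, not measurements from the paper.

```python
# Sketch: choosing representative cache sizes from known working-set sizes,
# in the spirit of the SPLASH-2 methodology. The working-set numbers are
# purely illustrative, not data from the paper.

# Hypothetical per-application working sets (in KB) at a fixed problem size.
WORKING_SETS_KB = {
    "ocean":  [16, 512],   # WS1: near-neighbor tiles, WS2: a partition of the grid
    "barnes": [8, 256],    # WS1: tree cell data, WS2: a processor's bodies + tree
}

CANDIDATE_CACHES_KB = [4, 8, 16, 32, 64, 128, 256, 512, 1024]

def representative_caches(working_sets, candidates):
    """Keep one cache size per regime (below WS1, between WS1 and WS2, above WS2).

    Operating points within the same regime tend to behave alike, so simulating
    all of them is largely redundant; the interesting points are the ones that
    straddle a working set.
    """
    boundaries = sorted(working_sets)
    chosen, seen_regimes = [], set()
    for c in sorted(candidates):
        regime = sum(c >= b for b in boundaries)   # 0, 1, ..., len(boundaries)
        if regime not in seen_regimes:
            seen_regimes.add(regime)
            chosen.append(c)
    return chosen

for app, ws in WORKING_SETS_KB.items():
    print(app, "->", representative_caches(ws, CANDIDATE_CACHES_KB), "KB")
```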


International Conference on Parallel Architectures and Compilation Techniques | 2008

The PARSEC benchmark suite: characterization and architectural implications

Christian Bienia; Sanjeev Kumar; Jaswinder Pal Singh; Kai Li

This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.


ACM SIGARCH Computer Architecture News | 1992

SPLASH: Stanford parallel applications for shared-memory

Jaswinder Pal Singh; Wolf-Dietrich Weber; Anoop Gupta

We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss some of their important characteristics, and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time.


ACM Symposium on Parallel Algorithms and Architectures | 1996

Scope consistency: a bridge between release consistency and entry consistency

Liviu Iftode; Jaswinder Pal Singh; Kai Li

Systems that maintain coherence at large granularity, such as shared virtual memory systems, suffer from false sharing and extra communication. Relaxed memory consistency models have been used to alleviate these problems, but at a cost in programming complexity. Release Consistency (RC) and Lazy Release Consistency (LRC) are accepted to offer a reasonable tradeoff between performance and programming complexity. Entry Consistency (EC) offers a more relaxed consistency model, but it requires explicit association of shared data objects with synchronization variables. The programming burden of providing such associations can be substantial.
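
As a rough illustration of the consistency model named in the title: under Scope Consistency, the writes a processor performs inside a scope (in practice delimited by a lock) are only guaranteed to become visible to another processor when that processor opens the same scope. The sketch below is a toy simulation of that visibility rule only; the Node class, lock names, and data structures are hypothetical and not part of any real shared virtual memory implementation.

```python
# Toy illustration of Scope Consistency (ScC) visibility, not an implementation
# of any real shared virtual memory system. Writes made while a scope (lock)
# is open become visible to another node only when that node opens the same
# scope; acquiring a different scope gives no such guarantee.

class Node:
    def __init__(self):
        self.memory = {}             # this node's current view of shared data
        self.pending = {}            # writes buffered per currently open scope

    def write(self, scope, addr, value):
        self.memory[addr] = value
        self.pending.setdefault(scope, {})[addr] = value

    def close_scope(self, scope, published):
        # On release: publish this scope's buffered writes.
        published.setdefault(scope, {}).update(self.pending.pop(scope, {}))

    def open_scope(self, scope, published):
        # On acquire: pull in only the writes associated with this scope.
        self.memory.update(published.get(scope, {}))

published = {}                       # per-scope record of released writes
a, b = Node(), Node()

a.write("lock_x", "x", 1)            # updates made inside scope lock_x
a.write("lock_x", "y", 2)
a.close_scope("lock_x", published)

b.open_scope("lock_y", published)    # acquiring a different scope...
print(b.memory.get("x"))             # ...does not guarantee seeing x -> None

b.open_scope("lock_x", published)    # acquiring the same scope
print(b.memory.get("x"), b.memory.get("y"))   # -> 1 2
```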


Journal of Parallel and Distributed Computing | 1995

Load balancing and data locality in adaptive hierarchical N-body methods: Barnes-Hut, fast multipole, and radiosity

Jaswinder Pal Singh; Chris Holt; Takashi Totsuka; Anoop Gupta; John L. Hennessy

Hierarchical N-body methods, which are based on a fundamental insight into the nature of many physical processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their nonuniform, dynamically changing characteristics and their need for long-range communication. In this paper, we study the partitioning and scheduling techniques required to obtain effective parallel performance on applications that use a range of hierarchical N-body methods. To obtain representative coverage, we first examine applications that use the two best methods known for classical N-body problems: the Barnes-Hut method and the fast multipole method. Then, we examine a recent hierarchical method for radiosity calculations in computer graphics, which applies the hierarchical N-body approach to a problem with very different characteristics. We find that straightforward decomposition techniques which an automatic scheduler might implement do not scale well, because they are unable to simultaneously provide load balancing and data locality. However, all the applications yield very good parallel performance if appropriate partitioning and scheduling techniques are implemented by the programmer. For the applications that use the Barnes-Hut and fast multipole methods, simple yet effective partitioning techniques can be developed by exploiting some key insights into both the methods and the classical problems that they solve. Using a novel partitioning technique, even relatively small problems achieve 45-fold speedups on a 48-processor Stanford DASH machine (a cache-coherent, shared address space multiprocessor) and 118-fold speedups on a 128-processor simulated architecture. The very different characteristics of the radiosity application require a different partitioning/scheduling approach to be used for it; however, it too yields very good parallel performance.
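
The following is a minimal sketch of the kind of cost-based, locality-preserving partitioning the paper advocates (in the spirit of costzones-style techniques): each body carries a work estimate from the previous timestep, bodies are ordered along a space-filling curve to preserve spatial locality, and the ordered list is split into contiguous chunks of roughly equal total cost. The Morton-key helper, cost model, and example data are illustrative, not the paper's exact algorithm.

```python
# Sketch of cost-based partitioning for a hierarchical N-body timestep.
# Bodies, costs, and the Morton-key helper are illustrative only.

def morton_key(x, y, bits=16):
    """Interleave the bits of integer coordinates into a Z-order (Morton) key,
    so that bodies close in space tend to be close in the sorted order."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def partition(bodies, nprocs):
    """Split bodies into nprocs contiguous chunks of roughly equal total cost.

    Each body is (x, y, cost), where cost is a per-body work estimate carried
    over from the previous timestep (e.g. interaction counts). Sorting by the
    Morton key keeps each processor's bodies spatially clustered, which is what
    provides load balance and data locality at the same time.
    """
    ordered = sorted(bodies, key=lambda b: morton_key(b[0], b[1]))
    total = sum(b[2] for b in ordered)
    target = total / nprocs
    chunks, current, acc = [], [], 0.0
    for body in ordered:
        current.append(body)
        acc += body[2]
        if acc >= target and len(chunks) < nprocs - 1:
            chunks.append(current)
            current, acc = [], 0.0
    chunks.append(current)
    return chunks

# Tiny example: a clustered, nonuniform distribution of 8 bodies on 2 processors.
bodies = [(1, 1, 5), (2, 1, 5), (1, 2, 5), (2, 2, 5),
          (40, 40, 1), (41, 40, 1), (40, 41, 1), (41, 41, 1)]
for p, chunk in enumerate(partition(bodies, 2)):
    print("proc", p, "cost", sum(b[2] for b in chunk), "bodies", chunk)
```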


IEEE Computer Graphics and Applications | 2000

Building and using a scalable display wall system

Kai Li; Han Wu Chen; Yuqun Chen; Douglas W. Clark; Perry R. Cook; Stefanos N. Damianakis; Georg Essl; Adam Finkelstein; Thomas A. Funkhouser; T. Housel; Allison W. Klein; Zhiyan Liu; Emil Praun; Jaswinder Pal Singh; B. Shedd; J. Pal; George Tzanetakis; J. Zheng

Princeton's scalable display wall project explores building and using a large-format display with commodity components. The prototype system has been operational since March 1998. Our goal is to construct a collaborative space that fully exploits a large-format display system with immersive sound and natural user interfaces. Our prototype system is built with low-cost commodity components: a cluster of PCs, PC graphics accelerators, consumer video and sound equipment, and portable presentation projectors. This approach has the advantages of low cost and of tracking technology well, as high-volume commodity components typically have better price-performance ratios and improve at faster rates than special-purpose hardware. We report our early experiences in building and using the display wall system. In particular, we describe our approach to research challenges in several specific research areas, including seamless tiling, parallel rendering, parallel data visualization, parallel MPEG decoding, layered multiresolution video input, multichannel immersive sound, user interfaces, application tools, and content creation.
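
A small sketch of the tiling bookkeeping such a projector-based wall needs: mapping a point on the large logical display to the projector tile (and local coordinate) that owns it. The grid size and per-tile resolution below are made-up placeholders, not the actual configuration of the Princeton prototype, and the sketch ignores projector overlap and blending.

```python
# Sketch of display-wall tiling: map a pixel of the large logical display to
# the PC/projector tile responsible for it. Numbers are placeholders only.

TILE_W, TILE_H = 800, 600        # resolution driven by each projector (assumed)
GRID_COLS, GRID_ROWS = 4, 2      # projectors arranged in a grid (assumed)

WALL_W = TILE_W * GRID_COLS      # logical display resolution: 3200 x 1200
WALL_H = TILE_H * GRID_ROWS

def tile_for_pixel(x, y):
    """Return (tile_col, tile_row, local_x, local_y) for a wall pixel.

    Each tile is rendered by the PC driving that projector; a real system also
    handles overlap/blending regions between projectors, ignored here.
    """
    assert 0 <= x < WALL_W and 0 <= y < WALL_H
    col, local_x = divmod(x, TILE_W)
    row, local_y = divmod(y, TILE_H)
    return col, row, local_x, local_y

print(tile_for_pixel(0, 0))        # (0, 0, 0, 0)
print(tile_for_pixel(2450, 700))   # (3, 1, 50, 100)
```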


International Conference on Computer Graphics and Interactive Techniques | 1999

Load balancing for multi-projector rendering systems

Rudrajit Samanta; Jiannan Zheng; Thomas A. Funkhouser; Kai Li; Jaswinder Pal Singh

Multi-projector systems are increasingly being used to provide large-scale and high-resolution displays for next-generation interactive 3D graphics applications, including large-scale data visualization, immersive virtual environments, and collaborative design. These systems must include a very high-performance and scalable 3D rendering subsystem in order to generate high-resolution images at real-time frame rates. This paper describes a sort-first parallel rendering system for a scalable display wall system built with a network of PCs, graphics accelerators, and portable projectors. The main challenge is to develop scalable algorithms to partition and assign rendering tasks effectively under the performance and functionality constraints of system area networks, PCs, and commodity 3-D graphics accelerators. We have developed three coarse-grained partitioning algorithms, incorporated them into a working prototype system, and run initial experiments aimed at evaluating algorithmic trade-offs and performance bottlenecks in such a system. Results of our experiments indicate that the coarse-grained characteristics of the sort-first architecture are well suited for constructing a parallel rendering system running on a PC cluster.
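
Here is a minimal sketch of one coarse-grained sort-first strategy of the kind the paper evaluates: estimate how much rendering work falls into each screen tile from the screen-space bounding boxes of object groups, then assign tiles to PCs so the estimated work is balanced. The cost model, tile grid, and numbers are illustrative assumptions, not the paper's exact algorithms.

```python
# Sketch of coarse-grained sort-first partitioning for a multi-projector wall:
# estimate per-tile rendering cost from object bounding boxes, then assign
# tiles to PCs greedily to balance load. Numbers are illustrative only.

SCREEN_W, SCREEN_H = 1024, 768
TILES_X, TILES_Y = 4, 4
TILE_W, TILE_H = SCREEN_W // TILES_X, SCREEN_H // TILES_Y

# Each object group: (screen-space bounding box (x0, y0, x1, y1), polygon count).
groups = [((0, 0, 300, 300), 50000), ((600, 200, 1000, 700), 120000),
          ((100, 500, 400, 760), 30000)]

def tile_costs(groups):
    """Accumulate an estimated polygon cost per tile.

    A group's cost is split evenly over the tiles its bounding box overlaps;
    a group overlapping several tiles gets rendered (in part) by several PCs,
    which is the overhead sort-first trades against load balance.
    """
    costs = [[0.0] * TILES_X for _ in range(TILES_Y)]
    for (x0, y0, x1, y1), polys in groups:
        tx0, tx1 = x0 // TILE_W, min(x1 // TILE_W, TILES_X - 1)
        ty0, ty1 = y0 // TILE_H, min(y1 // TILE_H, TILES_Y - 1)
        overlapped = (tx1 - tx0 + 1) * (ty1 - ty0 + 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                costs[ty][tx] += polys / overlapped
    return costs

def assign_tiles(costs, npcs):
    """Greedy assignment: give each tile to the PC with the least work so far."""
    load = [0.0] * npcs
    assignment = {}
    tiles = [(costs[ty][tx], tx, ty) for ty in range(TILES_Y) for tx in range(TILES_X)]
    for cost, tx, ty in sorted(tiles, reverse=True):
        pc = min(range(npcs), key=lambda p: load[p])
        load[pc] += cost
        assignment[(tx, ty)] = pc
    return assignment, load

assignment, load = assign_tiles(tile_costs(groups), npcs=4)
print("per-PC estimated load:", load)
```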


International Conference on Computer Graphics and Interactive Techniques | 2000

Hybrid sort-first and sort-last parallel rendering with a cluster of PCs

Rudrajit Samanta; Thomas A. Funkhouser; Kai Li; Jaswinder Pal Singh

We investigate a new hybrid sort-first and sort-last approach for parallel polygon rendering, using as a target platform a cluster of PCs. Unlike previous methods that statically partition the 3D model and/or the 2D image, our approach performs dynamic, view-dependent and coordinated partitioning of both the 3D model and the 2D image. Using a specific algorithm that follows this approach, we show that it performs better than previous approaches and scales better with both processor count and screen resolution. Overall, our algorithm is able to achieve interactive frame rates with efficiencies of 55.0% to 70.5% during simulations of a system with 64 PCs. While it does have potential disadvantages in client-side processing and in dynamic data management—which also stem from its dynamic, view-dependent nature—these problems are likely to diminish with technology trends in the future.


International Conference on Computer Communications | 2004

Efficient event routing in content-based publish-subscribe service networks

Fengyun Cao; Jaswinder Pal Singh

Efficient event delivery in a content-based publish/subscribe system has been a challenging problem. Existing group communication solutions, such as IP multicast or application-level multicast techniques, are not readily applicable due to the highly heterogeneous communication pattern in such systems. We first explore the design space of event routing strategies for content-based publish/subscribe systems. Two major existing approaches are studied: the filter-based approach, which performs content-based filtering on intermediate routing servers to dynamically guide routing decisions, and the multicast-based approach, which delivers events through a few high-quality multicast groups that are pre-constructed to approximately match user interests. These approaches have different trade-offs in the routing quality achieved and the implementation cost and system load generated. We then present a new routing scheme called Kyra that carefully balances these trade-offs. Kyra combines the advantages of content-based filtering and event-space partitioning in the existing approaches to achieve better overall routing efficiency. We use detailed simulations to evaluate Kyra and compare it with existing approaches. The results demonstrate the effectiveness of Kyra in achieving high network efficiency, reducing implementation cost and balancing system load across the publish-subscribe service network.
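
The following is a minimal sketch of the filter-based approach the abstract contrasts with multicast-based routing (Kyra itself combines filtering with event-space partitioning and multicast groups): each broker records which subscriptions are reachable through each neighbor and forwards an event only toward neighbors with a matching filter. The broker topology, attribute names, and predicates are illustrative assumptions, not part of the paper's system.

```python
# Sketch of filter-based event routing in a content-based publish/subscribe
# network: each broker forwards an event only to neighbors that have a
# matching subscription downstream. Topology and predicates are illustrative.

def matches(event, sub):
    """A subscription is a set of attribute predicates; all must hold."""
    return all(pred(event.get(attr)) for attr, pred in sub.items())

class Broker:
    def __init__(self, name):
        self.name = name
        self.local_subs = []      # subscriptions from clients attached here
        self.neighbor_subs = {}   # neighbor broker -> subscriptions reachable via it

    def link(self, other):
        self.neighbor_subs.setdefault(other, [])
        other.neighbor_subs.setdefault(self, [])

    def subscribe(self, sub):
        # Subscription from a client attached directly to this broker. A real
        # system propagates (a summary of) this filter toward publishers; the
        # example below wires the per-neighbor filters up by hand.
        self.local_subs.append(sub)

    def publish(self, event, came_from=None):
        if any(matches(event, s) for s in self.local_subs):
            print(f"{self.name}: deliver {event}")
        for neighbor, subs in self.neighbor_subs.items():
            if neighbor is came_from:
                continue
            if any(matches(event, s) for s in subs):
                neighbor.publish(event, came_from=self)

a, b, c = Broker("A"), Broker("B"), Broker("C")
a.link(b); b.link(c)

# A subscriber at C wants high-priced stock quotes; record the filter along the
# path A -> B -> C so upstream brokers can prune forwarding.
quote_sub = {"topic": lambda v: v == "quote",
             "price": lambda v: v is not None and v > 100}
c.subscribe(quote_sub)
b.neighbor_subs[c].append(quote_sub)
a.neighbor_subs[b].append(quote_sub)

a.publish({"topic": "quote", "price": 150})   # forwarded A -> B -> C, delivered at C
a.publish({"topic": "quote", "price": 10})    # filtered at A, never forwarded
```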


IEEE Computer | 1993

Scaling parallel programs for multiprocessors: methodology and examples

Jaswinder Pal Singh; John L. Hennessy; Anoop Gupta

Models for the constraints under which an application should be scaled, including constant problem-size scaling, memory-constrained scaling, and time-constrained scaling, are reviewed. A realistic method is described that scales all relevant parameters under constraints imposed by the application domain. This method leads to different conclusions about the effectiveness and design of large multiprocessors than the naive practice of scaling only the data set size. The primary example application is a simulation of galaxies using the Barnes-Hut hierarchical N-body method.
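
A small worked illustration of the difference between the scaling models the article reviews, using a Barnes-Hut-like cost model (memory roughly proportional to the number of bodies n, work per timestep roughly n log n). The baseline problem size and constants are arbitrary assumptions; only the relative trends matter.

```python
# Worked illustration of memory-constrained vs. time-constrained scaling under
# a Barnes-Hut-like cost model: memory ~ n, work per timestep ~ n log n.
# The baseline size and constants are arbitrary; only relative trends matter.

import math

def work(n):                 # sequential work per timestep, up to a constant
    return n * math.log2(n)

def memory_constrained_n(n0, p):
    # Memory-constrained scaling: total memory grows with p, so n = n0 * p.
    return n0 * p

def time_constrained_n(n0, p):
    # Time-constrained scaling: grow n until the parallel time work(n)/p
    # matches the uniprocessor time work(n0); solved by simple search.
    n = n0
    while work(n) / p < work(n0):
        n += n0 // 10 or 1
    return n

n0 = 16_384                  # baseline problem size on one processor (assumed)
for p in (1, 16, 64, 256):
    n_mc = memory_constrained_n(n0, p)
    n_tc = time_constrained_n(n0, p)
    print(f"p={p:4d}  memory-constrained n={n_mc:>10,}  "
          f"time-constrained n={n_tc:>10,}  "
          f"parallel time under MC scaling={work(n_mc) / p / work(n0):.2f}x baseline")
```

The output shows why the scaling rule matters: under memory-constrained scaling the per-timestep parallel time keeps growing with p (because work grows as n log n while memory grows as n), whereas time-constrained scaling holds execution time fixed and therefore admits smaller problems per processor, changing the communication and working-set behavior the machine must support.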

Collaboration


Dive into Jaswinder Pal Singh's collaboration.

Top Co-Authors

Kai Li

Princeton University
