Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Andrew Sohn is active.

Publication


Featured research published by Andrew Sohn.


international conference on supercomputing | 2008

Autonomous learning for efficient resource utilization of dynamic VM migration

Hyung Won Choi; Hu-Keun Kwak; Andrew Sohn; Kyu-Sik Chung

Dynamic migration of virtual machines on a cluster of physical machines is designed to maximize resource utilization by balancing loads across the cluster. When the utilization of a physical machine is beyond a fixed threshold, the machine is deemed overloaded. A virtual machine is then selected within the overloaded physical machine for migration to a lightly loaded physical machine. Key to such threshold-based VM migration is to determine when to move which VM to what physical machine, since wrong or inadequate decisions can cause unnecessary migrations that would adversely affect the overall performance. We present in this paper a learning framework that autonomously finds and adjusts thresholds at runtime for different computing requirements. Central to our approach is the previous history of migrations and their effects before and after each migration in terms of standard deviation of utilization. We set up an experimental environment that consists of extensive real world benchmarking problems and a cluster of 16 physical machines each of which has on average eight virtual machines. We demonstrate through experimental results that our approach autonomously finds thresholds close to the optimal ones for different computing scenarios and that such varying thresholds yield an optimal number of VM migrations for maximizing resource utilization.
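The paper's learning framework adjusts the migration threshold at runtime; as background, here is a minimal sketch of the fixed-threshold decision it improves on, scored by the standard deviation of utilization the abstract mentions. All names (`pms`, `pick_migration`, the example cluster) are illustrative, not from the paper.

```python
from statistics import stdev

def imbalance(utils):
    """The paper's balance metric: standard deviation of PM utilizations."""
    return stdev(utils)

def pick_migration(pms, threshold):
    """pms: {pm: {vm: utilization share}}. If any PM's total utilization
    exceeds `threshold`, pick the smallest VM on it that relieves the
    overload, targeting the least-loaded other PM."""
    loads = {pm: sum(vms.values()) for pm, vms in pms.items()}
    for pm, load in loads.items():
        if load > threshold:
            by_size = sorted(pms[pm].items(), key=lambda kv: kv[1])
            vm = next((v for v, u in by_size if load - u <= threshold),
                      by_size[-1][0])
            dest = min((p for p in pms if p != pm), key=loads.get)
            return vm, pm, dest
    return None  # nothing overloaded: no migration

cluster = {"pm1": {"vm1": 0.5, "vm2": 0.4}, "pm2": {"vm3": 0.2}}
print(pick_migration(cluster, 0.8))     # -> ('vm2', 'pm1', 'pm2')
print(round(imbalance([0.9, 0.2]), 3),
      round(imbalance([0.5, 0.6]), 3))  # imbalance drops after the move
```

A wrong threshold here causes exactly the problem the paper targets: too low and every small spike triggers a migration, too high and overloaded PMs are never relieved.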


IEEE Transactions on Parallel and Distributed Systems | 1995

Parallel N-ary speculative computation of simulated annealing

Andrew Sohn

Simulated annealing is known to be an efficient method for combinatorial optimization problems. Its usage for realistic problem size, however, has been limited by the long execution time due to its sequential nature. This report presents a practical approach to synchronous simulated annealing for massively parallel distributed-memory multiprocessors. We use an n-ary speculative tree to execute n different iterations in parallel on n processors, called generalized speculative computation (GSC). Execution results of the 100- to 500-city traveling salesman problems on the AP1000 massively parallel multiprocessor demonstrate that the GSC approach can be an effective method for parallel simulated annealing as it gave over 20-fold speedup on 100 processors.
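For context, this is a minimal sequential simulated-annealing core for the TSP, the kind of loop the paper parallelizes speculatively; the city layout and cooling schedule are made up for illustration, not taken from the paper.

```python
import math, random

def tour_len(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal(pts, temp=10.0, cooling=0.99, iters=2000, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    cur_len = tour_len(tour, pts)
    best, best_len = tour[:], cur_len
    for _ in range(iters):
        i, j = sorted(rng.sample(range(len(pts)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt reversal
        cand_len = tour_len(cand, pts)
        # Metropolis test: in the paper's GSC scheme, n such candidate
        # iterations are evaluated speculatively on n processors at once.
        if cand_len < cur_len or rng.random() < math.exp((cur_len - cand_len) / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        temp *= cooling
    return best, best_len

cities = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1)]
tour, length = anneal(cities)
```

The sequential bottleneck is visible here: each Metropolis test depends on the tour produced by the previous one, which is why speculation over the tree of accept/reject outcomes is needed to get parallelism.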


international conference on supercomputing | 1998

Load balanced parallel radix sort

Andrew Sohn; Yuetsu Kodama

Radix sort can leave processors with unequal numbers of keys because the characteristics of the input keys are not known in advance. We present in this report a new radix sorting algorithm, called balanced radix sort, which guarantees that each processor has exactly the same number of keys regardless of the data characteristics. The main idea of balanced radix sort is to move the excess keys of any processor holding more than n/P keys to its neighbor processor, where n is the total number of keys and P is the number of processors. We have implemented balanced radix sort on two distributed-memory machines, the IBM SP2-WN and the Cray T3E. Multiple versions for 32-bit and 64-bit integers and 64-bit doubles are implemented in Message Passing Interface for portability. The sequential and parallel versions consist of approximately 50 and 150 lines of C code respectively, including parallel constructs. Experimental results indicate that balanced radix sort can sort 0.5G integers in 20 seconds and 128M doubles in 15 seconds on a 64-processor SP2-WN while yielding over 40-fold speedup. Compared with other radix sorting algorithms, balanced radix sort is two to six times faster. Compared with sample sorting algorithms, which are known to outperform all similar methods, balanced radix sort is 30% to 100% faster on the same machine and key initialization.
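A sequential sketch of the two ingredients, assuming nonnegative fixed-width integer keys: a stable LSD counting pass per digit, plus the exact n/P split across P (here simulated) processors that gives the algorithm its name. Function names are illustrative, not the paper's.

```python
def radix_pass(keys, shift, bits=8):
    """One stable LSD counting pass on the digit at `shift`."""
    buckets = [[] for _ in range(1 << bits)]
    mask = (1 << bits) - 1
    for k in keys:
        buckets[(k >> shift) & mask].append(k)
    return [k for b in buckets for k in b]

def balanced_radix_sort(keys, P=4, bits=8, width=32):
    for shift in range(0, width, bits):
        keys = radix_pass(keys, shift, bits)
    n = len(keys)
    # the "balanced" guarantee: exactly n/P keys per processor,
    # whatever the key distribution
    return [keys[p * n // P:(p + 1) * n // P] for p in range(P)]

keys = [5, 3, 200, 7, 1024, 9, 2, 70]
print(balanced_radix_sort(keys))  # -> [[2, 3], [5, 7], [9, 70], [200, 1024]]
```

In the real distributed algorithm the split is realized by shipping excess keys to neighbor processors rather than by slicing one array, but the invariant is the same: every processor ends each pass with n/P keys.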


acm symposium on parallel algorithms and architectures | 1996

A dynamic load balancing framework for unstructured adaptive computations on distributed-memory multiprocessors

Andrew Sohn; Rupak Biswas; Horst D. Simon

The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view each time the computational mesh is adapted. JOVE has been implemented on an SP2 in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. Furthermore, JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
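As a rough illustration of global-view rebalancing in the spirit of JOVE (not the paper's actual algorithm), the sketch below reassigns sub-meshes to processors after their weights change, heaviest first onto the least-loaded processor, the classic LPT heuristic.

```python
import heapq

def rebalance(weights, P):
    """Assign sub-mesh i (cost weights[i]) to P processors, heaviest first,
    always onto the currently least-loaded processor (LPT heuristic)."""
    heap = [(0.0, p, []) for p in range(P)]
    heapq.heapify(heap)
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        load, p, items = heapq.heappop(heap)   # least-loaded processor
        items.append(i)
        heapq.heappush(heap, (load + weights[i], p, items))
    return {p: items for _, p, items in heap}

parts = rebalance([5, 4, 3, 3, 2, 1], P=2)
print(parts[0], parts[1])  # -> [0, 3, 5] [1, 2, 4]
```

A real mesh partitioner must also minimize the data moved between the old and new assignments, which simple greedy placement ignores.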


Journal of Parallel and Distributed Computing | 1998

HARP: a dynamic spectral partitioner

Horst D. Simon; Andrew Sohn; Rupak Biswas

Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically sized meshes. The major problem which hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime for the dynamic load balancing framework JOVE which dynamically balances the computational loads of distributed-memory machines with a global view. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP-2, which is several times faster than other spectral partitioners, while maintaining the solution quality of the proven RSB method. These results indicate that graph partitioning can now be truly embedded in dynamically changing real-world applications.
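For background on the RSB method HARP is measured against, here is a toy spectral bisection using a dense eigensolver; it is only suitable for tiny illustrative graphs, whereas HARP's contribution is making the spectral approach fast enough for runtime use.

```python
import numpy as np

def spectral_bisect(edges, n):
    """Split a graph in half along its Fiedler vector (the eigenvector of
    the second-smallest Laplacian eigenvalue); a median split over the
    sorted Fiedler values yields exactly equal halves."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    _, vecs = np.linalg.eigh(L)
    order = np.argsort(vecs[:, 1])   # sort vertices by Fiedler value
    return set(order[: n // 2].tolist()), set(order[n // 2:].tolist())

# two triangles joined by a single edge: the cut should be that one edge
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
a, b = spectral_bisect(edges, 6)
print(sorted(a), sorted(b))  # -> [0, 1, 2] [3, 4, 5] (halves may be swapped)
```

The long execution times the abstract mentions come from computing this eigenvector on large meshes; HARP avoids a fresh eigensolve by projecting onto precomputed eigenvectors.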


international conference on parallel architectures and compilation techniques | 1996

Identifying the capability of overlapping computation with communication

Andrew Sohn; Jui Ku; Yuetsu Kodama; Mitsuhisa Sato; Hirofumi Sakane; Hayato Yamana; Shuichi Sakai; Yoshinori Yamaguchi

Overlapping computation with communication is central to obtaining high performance on distributed-memory multiprocessors. This report examines the overlapping capability of two distributed-memory multiprocessors: the EM-X and the IBM SP-2. The well-known bitonic sorting algorithm is selected for experiments. Various message sizes are used to determine when, where, how much, and why overlapping takes place. Experimental results indicate that both multiprocessors can overlap 30% to 40% of communication time when the message size is approximately 1K integers. EM-X is found to be insensitive to message size, yielding high overlap across various message sizes, while SP-2 was effective in the message-size window of 512 to 2K integers.


acm symposium on parallel algorithms and architectures | 1997

HARP: a fast spectral partitioner

Horst D. Simon; Andrew Sohn; Rupak Biswas

Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically-sized meshes. The major problem which hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2, which is several times faster than other spectral partitioners, while maintaining the solution quality of the proven RSB method. A parallel MPI version of HARP has also been implemented on the IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors of the SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.


international conference on cloud computing | 2014

Workload Prediction of Virtual Machines for Harnessing Data Center Resources

Kashifuddin Qazi; Yang Li; Andrew Sohn

Virtual Machines (VM) offer data-center and cloud owners the option to lease computational resources such as CPU cycles, memory, disk space, and network bandwidth to end-users. Optimal usage of the resources of the Physical Machines (PM) that make up the cloud is an important consideration, as many major enterprises and institutions are opting for servers in the cloud. At any given time, the PMs should not be overloaded, to meet SLO requirements, and at the same time a minimum number of PMs should be running to conserve energy. The resource loads on individual VMs in the data center are not arbitrary. Finding patterns in the loads can help data-center owners arrange the VMs on the PMs such that both of the above requirements are met. In this paper we present a fast, low-overhead framework that intelligently predicts the behavior of the cluster based on its history and then accordingly redistributes VMs in the cluster to free up PMs. These PMs are then re-purposed to accommodate more VMs or turned off to save energy. We analyze real-world loads and show that they follow a chaotic time series. At the core of our framework are concepts of chaos theory, with optimizations that make our framework indifferent to the type of loads and the inherent cycles in them. We set up this framework on our testbed cluster and analyze its performance. Extensive experimental results for a variety of real-world loads indicate our framework's efficacy compared to other methods reported to date.
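One standard chaos-theory forecasting technique that a framework like this could build on (the paper does not spell out its exact predictor, so this is a hedged sketch) is nearest-neighbor prediction in a delay embedding: reconstruct the state space from the load history, find the closest past state, and predict its successor.

```python
import math

def predict_next(series, dim=3, tau=1):
    """Delay-embed the series, find the past state closest to the current
    one, and predict that state's successor (nearest-neighbor forecasting)."""
    def point(t):                      # embedded state ending at time t
        return [series[t - i * tau] for i in range(dim)]
    current = point(len(series) - 1)
    best_t = min(range((dim - 1) * tau, len(series) - 1),
                 key=lambda t: math.dist(point(t), current))
    return series[best_t + 1]

load = [10, 50, 30, 10, 50, 30, 10, 50]   # toy periodic CPU-load history
print(predict_next(load))  # -> 30
```

The appeal for VM workloads is that the method makes no assumptions about the shape or period of the load pattern; it only needs the history to revisit similar states.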


conference on high performance computing (supercomputing) | 1998

S-HARP: A Scalable Parallel Dynamic Partitioner for Adaptive Mesh-based Computations

Andrew Sohn; Horst D. Simon

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. We present in this report a scalable parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP is a universal dynamic partitioner with three distinctive features: (a) fast partitioning from scratch with a global view, requiring no information from the previous iterations, (b) no restriction to one partition per processor, (c) no imbalance factor, because of precise bisection using sorting. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.18 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 17-fold speedup on 64 processors while ParMETIS 1.0 gives a few-fold speedup. Experimental results demonstrate that S-HARP is three to 15 times faster than the other dynamic partitioners on computational meshes of size over 100,000 vertices while giving comparable edge cuts.
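The "precise bisection using sorting" feature can be sketched in a few lines: project vertex coordinates onto an axis, sort, and cut at the median, so both halves are exactly equal. For simplicity this sketch uses the bounding-box diagonal as the axis; the real partitioner derives its axes from inertial and spectral information.

```python
def bisect_by_sorting(coords):
    """Exact bisection: sort vertices by their projection onto an axis
    and split at the median, so both halves have equal size."""
    dims = len(coords[0])
    lo = [min(c[d] for c in coords) for d in range(dims)]
    hi = [max(c[d] for c in coords) for d in range(dims)]
    axis = [h - l for h, l in zip(hi, lo)]        # bounding-box diagonal
    proj = sorted(range(len(coords)),
                  key=lambda i: sum(coords[i][d] * axis[d] for d in range(dims)))
    half = len(coords) // 2
    return proj[:half], proj[half:]

pts = [(0, 0), (1, 0), (9, 1), (10, 1)]
print(bisect_by_sorting(pts))  # -> ([0, 1], [2, 3])
```

Because the split point is a rank, not a coordinate threshold, the two halves are always the same size, which is why no imbalance factor arises.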


Journal of Parallel and Distributed Computing | 1997

Data and Workload Distribution in a Multithreaded Architecture

Andrew Sohn; Mitsuhisa Sato; Namhoon Yoo; Jean-Luc Gaudiot

Matching data distribution to workload distribution is important in improving the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons, including the complexity of address computation, runtime data movement, and irregular resource usage. This report presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Various types of workload distribution strategies are defined along with thread granularity. Several types of data distribution strategies are investigated. These include row-wise cyclic, k-way partial-row cyclic, and blocked distribution. To investigate the performance of multithreading, two problems are selected: highly sequential Gaussian elimination with partial pivoting and highly parallel matrix multiplication. Execution results on the 80-processor EM-4 distributed-memory multiprocessor indicate that multithreading can offset the loss due to the mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
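The three data-distribution strategies named above can be written as row-to-processor maps for an n-row matrix on P processors. The `partial_row_cyclic` mapping is one plausible reading of "k-way partial-row cyclic" (each row cut into k pieces distributed cyclically); the paper's exact definition may differ.

```python
def row_cyclic(i, P):
    return i % P                  # row i goes to processor i mod P

def blocked(i, n, P):
    return i * P // n             # contiguous blocks of n/P rows each

def partial_row_cyclic(i, j, cols, P, k=2):
    """Assumed reading of k-way partial-row cyclic: row i is cut into k
    partial rows; column j falls in one piece, distributed cyclically."""
    part = j * k // cols          # which of the k pieces column j is in
    return (i * k + part) % P

n, P = 12, 3
print([row_cyclic(i, P) for i in range(n)])  # -> [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
print([blocked(i, n, P) for i in range(n)])  # -> [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Cyclic maps spread the shrinking active region of Gaussian elimination evenly across processors, while blocked maps keep each processor's rows contiguous; the mismatch between the two is exactly what multithreading is shown to offset.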

Collaboration


Dive into Andrew Sohn's collaborations.

Top Co-Authors

Horst D. Simon
Lawrence Berkeley National Laboratory

Yuetsu Kodama
National Institute of Advanced Industrial Science and Technology

Rupak Biswas
Research Institute for Advanced Computer Science

Yoshinori Yamaguchi
National Institute of Advanced Industrial Science and Technology

Jui-Yuan Ku
New Jersey Institute of Technology