Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Justin Y. Shi is active.

Publication


Featured research published by Justin Y. Shi.


International Conference on Algorithms and Architectures for Parallel Processing | 2011

SpotMPI: a framework for auction-based HPC computing using Amazon spot instances

Moussa Taifi; Justin Y. Shi; Abdallah Khreishah

The economy of scale offers cloud computing virtually unlimited, cost-effective processing potential. Theoretically, prices under fair market conditions should reflect the most reasonable costs of computation. Fairness is ensured by mutual agreement between sellers and buyers, and resource-use efficiency is automatically optimized in the process. While there is no lack of incentives for cloud providers to offer auction-based computing platforms, using these volatile platforms for practical computing is a challenge for existing programming paradigms. This paper reports a methodology and a toolkit designed to tame these challenges for MPI applications. Unlike existing MPI fault-tolerance tools, we emphasize dynamically adjusted optimal checkpoint-restart (CPR) intervals. We introduce a formal model and then an HPC application toolkit, named SpotMPI, to facilitate the practical execution of real MPI applications on volatile auction-based cloud platforms. Our models capture the intrinsic dependencies between critical time-consuming elements by leveraging instrumented performance parameters and publicly available resource bidding histories. We study algorithms with different computing vs. communication complexities. Our results show non-trivial insights into optimal bidding and application scaling strategies.
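The paper's CPR model is driven by instrumented parameters and spot-price histories; as a rough stand-in only, the sketch below uses the well-known Young/Daly first-order approximation to pick a checkpoint interval from a per-checkpoint cost and an estimated platform MTBF. The function and parameter names are illustrative assumptions, not SpotMPI's API.

```python
import math

def optimal_cpr_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young/Daly first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds
    mtbf_s: mean time between failures of the rented platform, in seconds;
            on spot instances this would be estimated from price/bidding history.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

if __name__ == "__main__":
    # Example: 5-minute checkpoints on a spot pool reclaimed every ~4 hours on average.
    interval_s = optimal_cpr_interval(checkpoint_cost_s=300, mtbf_s=4 * 3600)
    print(f"checkpoint roughly every {interval_s / 60:.1f} minutes")
```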


High Performance Computing and Communications | 2011

Resource Planning for Parallel Processing in the Cloud

Justin Y. Shi; Moussa Taifi; Abdallah Khreishah

Before the emergence of commercial cloud computing, interest in parallel algorithm analysis was mostly academic. When computing and communication resources are charged by the hour, cost-effective parallel processing becomes a required skill. This paper reports a resource planning study using a method derived from classical program time complexity analysis, which we call Timing Models. Unlike existing qualitative performance analysis methods, a Timing Model uses application-instrumented capacity measures to capture the quantitative dependencies between a computer program (sequential or parallel) and its processing environments. For applications planning to use commercial clouds, this tool is ideally suited for choosing the most cost-effective configuration. The contribution of the proposed tool is its ability to explore multiple dimensions of a program quantitatively to gain non-trivial insights. This paper uses a simple matrix multiplication application to illustrate the modeling, program instrumentation, and performance prediction processes. Since cloud vendors do offer HPC hardware resources, we use Amazon EC2 as the target processing environment. The computing and communication models are useful not only for choosing the processing platform but also for understanding the resource usage bills. Comparisons between predicted and actual resource usage show that poor processing granularity wastes resources. Prediction errors are minimized near the optimal number of processors.
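As a hedged sketch of the Timing Model idea (the cost model, constants, and names below are illustrative assumptions, not the paper's calibrated models), the snippet predicts runtime and cost for an n x n matrix multiplication on p nodes from two instrumented capacities, then sweeps p to expose the granularity trade-off.

```python
def predicted_runtime_s(n: int, p: int, flops_per_s: float, bytes_per_s: float) -> float:
    """Toy timing model for dense n x n matrix multiplication on p workers.

    flops_per_s: instrumented per-node compute capacity
    bytes_per_s: instrumented effective network bandwidth per node
    """
    compute = (2.0 * n ** 3) / (p * flops_per_s)                 # work divided across p nodes
    communicate = (8.0 * n * n * (p - 1)) / (p * bytes_per_s)    # redistributing operand/result blocks
    return compute + communicate

def predicted_cost_usd(n: int, p: int, flops_per_s: float, bytes_per_s: float,
                       usd_per_node_hour: float) -> float:
    """Charge p nodes for the predicted wall time."""
    hours = predicted_runtime_s(n, p, flops_per_s, bytes_per_s) / 3600.0
    return hours * p * usd_per_node_hour

if __name__ == "__main__":
    # Sweep the degree of parallelism under assumed capacities and an assumed hourly rate.
    for p in (1, 2, 4, 8, 16, 32, 64):
        t = predicted_runtime_s(n=8192, p=p, flops_per_s=5e10, bytes_per_s=1e9)
        c = predicted_cost_usd(n=8192, p=p, flops_per_s=5e10, bytes_per_s=1e9,
                               usd_per_node_hour=0.50)
        print(f"p={p:3d}  runtime={t:8.2f}s  cost=${c:.4f}")
```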


Computational Science and Engineering | 2011

Sustainable GPU Computing at Scale

Justin Y. Shi; Moussa Taifi; Abdallah Khreishah; Jie Wu

General-purpose GPU (GPGPU) computing has produced the fastest-running supercomputers in the world. For continued sustainable progress, GPU computing at scale also needs to address two open issues: a) how to increase an application's mean time between failures (MTBF) as we increase the supercomputer's component count, and b) how to minimize unnecessary energy consumption. Since energy consumption is determined by the number of components used, we consider a high performance computing (HPC) application sustainable if it can gain performance and reliability at the same time when computing or communication components are added. This paper reports a two-tier semantic statistical multiplexing framework for sustainable HPC at scale. The idea is to leverage the power of statistical multiplexing to tame the nagging HPC scalability challenges. We include the theoretical model, a sustainability analysis, and computational experiments with automatic system-level containment of multiple CPU/GPU failures. Our results show that, assuming a three-fold slowdown of the statistical multiplexing layer, for an application using 1024 processors with 35% checkpoint overhead the two-tier framework will produce sustained time and energy savings for MTBFs of less than 6 hours. With 5% checkpoint overhead, a 1.5-hour MTBF would be the break-even point. These results suggest the practical feasibility of the proposed two-tier framework.
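The exact break-even points above come from the paper's sustainability model; the sketch below is only a back-of-the-envelope stand-in that compares a constant three-fold multiplexing slowdown against a Daly-style expected checkpoint-restart runtime, to show how a shrinking MTBF eventually favors the two-tier framework. All names and constants are hypothetical.

```python
import math

def cpr_expected_time(work_h: float, ckpt_cost_h: float, mtbf_h: float, restart_h: float = 0.1) -> float:
    """First-order expected wall time of plain checkpoint-restart (a stand-in, not the paper's model).

    Uses the optimal interval tau = sqrt(2 * ckpt_cost * mtbf) and charges, per failure,
    a restart plus half an interval of lost work.
    """
    tau = math.sqrt(2.0 * ckpt_cost_h * mtbf_h)
    waste = ckpt_cost_h / tau + (tau / 2.0 + restart_h) / mtbf_h  # fraction of time not doing useful work
    return work_h / max(1e-9, 1.0 - waste)

def breakeven_mtbf(work_h: float, ckpt_cost_h: float, multiplex_slowdown: float = 3.0) -> float:
    """Smallest MTBF (hours) at which plain CPR still beats a constant-slowdown multiplexed tier."""
    mtbf = 0.1
    while cpr_expected_time(work_h, ckpt_cost_h, mtbf) > multiplex_slowdown * work_h:
        mtbf += 0.1
    return mtbf

if __name__ == "__main__":
    print(breakeven_mtbf(work_h=10.0, ckpt_cost_h=0.5))   # heavier checkpoints -> higher break-even MTBF
    print(breakeven_mtbf(work_h=10.0, ckpt_cost_h=0.05))  # lighter checkpoints -> lower break-even MTBF
```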


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Program Scalability Analysis for HPC Cloud: Applying Amdahl's Law to NAS Benchmarks

Justin Y. Shi; Moussa Taifi; Aakash Pradeep; Abdallah Khreishah; Vivek Antony

The availability of high performance computing (HPC) clouds requires scalability analysis of parallel programs for multiple different environments in order to maximize the promised economic benefits. Unlike traditional HPC application performance studies that aim to predict the performance of like-kind processors, this paper reports an instrumentation-assisted complexity analysis method based on the Amdahl's Law framework for program scalability analysis across different HPC environments. We show that program instrumentation helps Gustafson's scaled speedup formulation quantify the elusive quantity in Amdahl's Law. We report that without separating communication time from computation time, prediction results are not trustworthy. We demonstrate a methodology that can transform asymptotic complexity models into timing models in order to separate communication time and to identify the optimal degree of parallelism. A traditional HPC cluster and a private HPC cloud are used to validate the proposed methodology by showing the feasibility of optimal parallel processing and by scalability analysis of five NAS benchmarks. Our results show that either cloud or cluster can be effectively exploited if the application can adapt to changing processing conditions dynamically. As we dig deeper into performance analysis myths, the “scalability limit” seems to say less than its common interpretation and more about the inadequacy of our programming habits and architecture support.
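A hedged illustration of why the separation matters (the Karp-Flatt experimental serial fraction is used here as a stand-in; the timings and names are hypothetical, not the paper's NAS measurements): backing Amdahl's serial fraction out of total time folds communication into it and distorts extrapolation, while backing it out of computation-only time does not.

```python
def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Amdahl's Law: fixed-size speedup on p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def apparent_serial_fraction(t1_s: float, tp_s: float, p: int) -> float:
    """Karp-Flatt metric: the serial fraction implied by a measured speedup t1/tp."""
    s = t1_s / tp_s
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)

if __name__ == "__main__":
    t1, compute_p, comm_p, p = 1000.0, 70.0, 30.0, 16     # hypothetical instrumented timings
    lumped = apparent_serial_fraction(t1, compute_p + comm_p, p)   # communication folded in
    separated = apparent_serial_fraction(t1, compute_p, p)         # computation only
    # Extrapolating with the lumped fraction over-penalizes larger machines,
    # because communication does not behave like a fixed serial fraction.
    print(lumped, separated)
    print(amdahl_speedup(lumped, 256), amdahl_speedup(separated, 256))
```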


Journal of Parallel and Distributed Computing | 2012

Tuple switching network: When slower may be better

Justin Y. Shi; Moussa Taifi; Abdallah Khreishah; Jie Wu

This paper reports an application-dependent network design for extreme-scale high performance computing (HPC) applications. Traditional scalable network designs focus on the fast point-to-point transmission of generic data packets. The proposed network focuses on the sustainability of HPC applications through the statistical multiplexing of semantic data objects. For HPC applications using data-driven parallel processing, a tuple is such a semantic object. We report the design and implementation of a tuple switching network for data-parallel HPC applications in order to gain performance and reliability at the same time when computing and communication resources are added. We describe a sustainability model and a simple computational experiment to demonstrate extreme-scale application sustainability under decreasing system mean time between failures (MTBF). Assuming a three-fold slowdown of statistical multiplexing and a 35% time loss per checkpoint, a two-tier tuple switching framework would produce sustained performance and energy savings for extreme-scale HPC applications using more than 1024 processors or with an MTBF of less than 6 hours. Higher processor counts or higher checkpoint overheads accelerate the benefits.
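A minimal sketch of the tuple-space matchmaking that underlies a tuple switching network (illustrative only, not the paper's implementation): because any worker may take any tuple, a tuple taken by a failed worker can simply be re-posted and absorbed by the survivors.

```python
import queue

class TupleSpace:
    """Toy tuple space: producers put work tuples, any worker gets them, results return as tuples."""

    def __init__(self):
        self._space = queue.Queue()

    def put(self, tup):
        self._space.put(tup)

    def get(self, timeout=None):
        return self._space.get(timeout=timeout)

if __name__ == "__main__":
    ts = TupleSpace()
    for i in range(4):
        ts.put(("work", i))     # producer posts four work tuples
    taken = ts.get()            # a worker takes ("work", 0) ...
    ts.put(taken)               # ... fails before finishing, so the tuple is re-posted
    for _ in range(4):          # the surviving workers still drain all four tuples
        print(ts.get())
```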


Computer Science and Information Engineering | 2009

High Performance Lossless ESB Architecture with Data Protection for Mission-Critical Applications

Justin Y. Shi

This article reports two techniques for building a high performance lossless Service-Oriented Architecture (SOA). It employs a lossless ESB (Enterprise Service Bus) and a lossless high performance transaction processing (database) cluster.


High Performance Computing and Communications | 2009

Decoupling as a Foundation for Large Scale Parallel Computing

Justin Y. Shi

The “mainstream” inter-process communication models (shared memory and message passing) make the programmer responsible for constructing a very complex state machine for parallel processing. This has resulted in multiple difficulties, including programming, performance tuning, debugging, job scheduling, and fault tolerance. The most troubling is that the degree of difficulty increases exponentially as the multiprocessor grows in size. Inspired by the successes of packet switching protocols, this paper reports our preliminary findings in using decoupling technologies to build parallel applications with high reliability, high performance, and programmability.


American Journal of Speech-Language Pathology | 2015

Using Virtual Technology to Promote Functional Communication in Aphasia: Preliminary Evidence From Interactive Dialogues With Human and Virtual Clinicians

Michelene Kalinyak-Fliszar; Nadine Martin; Emily Keshner; Alexander I. Rudnicky; Justin Y. Shi; Gregory Teodoro

PURPOSE We investigated the feasibility of using a virtual clinician (VC) to promote the functional communication abilities of persons with aphasia (PWAs). We aimed to determine whether the quantity and quality of verbal output in dialogues with a VC would be the same as or greater than in dialogues with a human clinician (HC). METHOD Four PWAs practiced dialogues for 2 sessions each with an HC and a VC. Dialogues from before and after practice were transcribed and analyzed for content. We compared measures taken before and after practice in the VC and HC conditions. RESULTS Results were mixed. Participants either produced more verbal output with the VC or showed no difference on this measure between the VC and HC conditions. Participants also showed some improvement in postpractice narratives. CONCLUSION Results provide support for the feasibility and applicability of virtual technology to real-life communication contexts to improve functional communication in PWAs.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Understanding Cloud Data Using Approximate String Matching and Edit Distance

Joseph Jupin; Justin Y. Shi; Zoran Obradovic

For health and human services, fraud detection and other security services, identity resolution is a core requirement for understanding big data in the cloud. Due to the lack of a globally unique identifier and captured typographic differences for the same identity, identity resolution has high spatial and temporal complexities. We propose a filter and verify method to substantially increase the speed of approximate string matching using edit distance. This method has been found to be almost 80 times faster (130 times when combined with other optimizations) than Damerau-Levenshtein edit distance and preserves all approximate matches. Our method creates compressed signatures for data fields and uses Boolean operations and an enhanced bit counter to quickly compare the distance between the fields. This method is intended to be applied to data records whose fields contain relatively short-length strings, such as those found in most demographic data. Without loss of accuracy, the proposed Fast Bitwise Filter will provide substantial performance gain to approximate string comparison in database, record linkage and deduplication data processing systems.
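A hedged sketch of the filter-and-verify idea (the 64-bit character signature below is an illustrative layout, not the paper's Fast Bitwise Filter): each edit operation can add at most one character to a field's character set and drop at most one, so half the popcount of the XOR of two signatures is a cheap lower bound on edit distance, and only pairs that pass this filter need the exact comparison.

```python
def char_signature(s: str) -> int:
    """Compress a field into a 64-bit bitmap of the characters it contains (illustrative layout)."""
    sig = 0
    for ch in s.lower():
        sig |= 1 << (ord(ch) % 64)   # hash collisions only weaken, never break, the lower bound
    return sig

def passes_filter(a: str, b: str, max_dist: int) -> bool:
    """Cheap lower bound: popcount(sig_a XOR sig_b) / 2 can never exceed the true edit distance."""
    diff_bits = bin(char_signature(a) ^ char_signature(b)).count("1")
    return (diff_bits + 1) // 2 <= max_dist

def osa_distance(a: str, b: str) -> int:
    """Exact verify step: optimal string alignment (Damerau-Levenshtein with adjacent transpositions)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def approx_match(a: str, b: str, max_dist: int = 2) -> bool:
    """Filter first, verify only the survivors with the exact distance."""
    return passes_filter(a, b, max_dist) and osa_distance(a, b) <= max_dist

if __name__ == "__main__":
    print(approx_match("Jonathon", "Jonathan"))   # True: passes the filter, exact distance 1
    print(approx_match("Smith", "Nguyen"))        # False: rejected by the cheap bitwise filter
```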


International Conference on High Performance Computing and Simulation | 2011

Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs

Moussa Taifi; Abdallah Khreishah; Justin Y. Shi

Recent GPU developments have attracted much interest in the HPC community. Since each GPU interface requires a dedicated host processor, the unused high-performance non-GPU processors are simply wasted. Because GPUs are energy intensive and more likely to fail than CPUs, we are interested in using all processors to a) boost application performance and b) defend against GPU failures. This paper reports parallel computation experiments using a natural semantic multiplexing substrate that we call Deeply Decoupled Parallel Processing (D2P2). The idea is to apply statistical multiplexing to an application's semantic network of application-defined data tuples. Tuple space parallel processing is a natural choice for applying statistical multiplexing to application semantic networks. We report up to 53% performance gain for a CPU:GPU capability ratio of 1:5. For faster GPUs, CPUs are better used to prevent an application halt when a GPU fails. The D2P2 substrate allows fault-tolerant parallel processing using heterogeneous processors.
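A small illustration of why tuple-space self-scheduling lets mixed processors add up (the rates and the resulting percentage below are assumptions for illustration; the paper's 53% gain depends on its specific workloads and CPU:GPU counts): each worker pulls tuples at its own pace, so otherwise-idle host CPUs contribute their capability instead of being wasted, and the CPU pool can keep draining tuples if a GPU fails.

```python
def aggregate_throughput(n_gpu: int, n_cpu: int, gpu_rate: float = 5.0, cpu_rate: float = 1.0) -> float:
    """Ideal throughput under tuple-space self-scheduling: every worker pulls work at its own pace,
    so capabilities simply add (illustrative; ignores tuple-space overhead and contention)."""
    return n_gpu * gpu_rate + n_cpu * cpu_rate

if __name__ == "__main__":
    base = aggregate_throughput(n_gpu=8, n_cpu=0)
    mixed = aggregate_throughput(n_gpu=8, n_cpu=8)   # otherwise-idle host CPUs join in
    print(f"gain from adding host CPUs: {100 * (mixed - base) / base:.0f}%")
```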

Collaboration


Dive into Justin Y. Shi's collaborations.

Top Co-Authors

Abdallah Khreishah

New Jersey Institute of Technology
