Shanjiang Tang | Researchain

Publication

Featured researches published by Shanjiang Tang.

ieee international conference on cloud computing technology and science | 2014

DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters

Shanjiang Tang; Bu-Sung Lee; Bingsheng He

MapReduce is a popular computing paradigm for large-scale data processing in cloud computing. However, slot-based MapReduce systems (e.g., Hadoop MRv1) can suffer from poor performance due to unoptimized resource allocation. To address this, the paper identifies and optimizes resource allocation from three key aspects. First, because map slots and reduce slots are pre-configured as distinct and are not fungible, slots can be severely under-utilized: map slots might be fully utilized while reduce slots are empty, and vice versa. We propose an alternative technique called Dynamic Hadoop Slot Allocation that keeps the slot-based model. It relaxes the slot allocation constraint to allow slots to be reallocated to either map or reduce tasks depending on their needs. Second, speculative execution can tackle the straggler problem and has been shown to improve the performance of a single job, but at the expense of cluster efficiency. In view of this, we propose Speculative Execution Performance Balancing to balance the performance tradeoff between a single job and a batch of jobs. Third, delay scheduling has been shown to improve data locality but at the cost of fairness. Alternatively, we propose a technique called Slot PreScheduling that can improve data locality with no impact on fairness. Finally, by combining these techniques, we form a step-by-step slot allocation system called DynamicMR that can improve the performance of MapReduce workloads substantially. The experimental results show that DynamicMR can improve the performance of Hadoop MRv1 significantly while maintaining fairness, by up to 46~115 percent for single jobs and 49~112 percent for multiple jobs. Moreover, an experimental comparison with YARN shows that DynamicMR outperforms YARN by about 2~9 percent for multiple jobs due to its ratio control mechanism for running map/reduce tasks.
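The slot-borrowing idea described in the first aspect can be illustrated with a minimal Python sketch. This is not the DynamicMR implementation: the function names, the `Allocation` type, and the single-step model (one snapshot of pending tasks, no fairness or locality logic) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    map_tasks_running: int
    reduce_tasks_running: int
    idle_slots: int


def static_allocation(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fixed model: map tasks use only map slots, reduce tasks only reduce slots."""
    m = min(map_slots, pending_maps)
    r = min(reduce_slots, pending_reduces)
    return Allocation(m, r, map_slots + reduce_slots - m - r)


def dynamic_allocation(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Relaxed model (hypothetical sketch): idle slots of one kind may be
    lent to waiting tasks of the other kind."""
    m = min(map_slots, pending_maps)
    r = min(reduce_slots, pending_reduces)
    idle_map = map_slots - m
    idle_reduce = reduce_slots - r
    # Unused reduce slots serve waiting map tasks, and vice versa.
    m += min(idle_reduce, pending_maps - m)
    r += min(idle_map, pending_reduces - r)
    return Allocation(m, r, map_slots + reduce_slots - m - r)


# A map-heavy phase: 10 map tasks waiting, no reduce tasks yet.
print(static_allocation(4, 4, 10, 0))
print(dynamic_allocation(4, 4, 10, 0))
```

In the map-heavy snapshot, the static model leaves all four reduce slots idle, while the relaxed model runs eight map tasks at once.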

IEEE Transactions on Parallel and Distributed Systems | 2012

EasyPDP: An Efficient Parallel Dynamic Programming Runtime System for Computational Biology

Shanjiang Tang; Ce Yu; Jizhou Sun; Bu-Sung Lee; Tao Zhang; Zhen Xu; Huabei Wu

Dynamic programming (DP) is a popular and efficient technique in many scientific applications such as computational biology. Nevertheless, its performance is limited by the burgeoning volume of scientific data, and parallelism is crucial to keep the computation time at acceptable levels. The intrinsically strong data dependency of dynamic programming makes it difficult and error-prone for the programmer to write a correct and efficient parallel program. Therefore, this paper builds a runtime system named EasyPDP that aims to parallelize dynamic programming algorithms on multicore and multiprocessor platforms. With software reusability and reduced parallel-programming complexity in mind, a DAG Data Driven Model is proposed, which supports applications with strong data interdependence. Based on the model, the EasyPDP runtime system is designed and implemented. It automatically handles thread creation, dynamic data task allocation and scheduling, data partitioning, and fault tolerance. Five frequently used DAG patterns from biological dynamic programming algorithms have been put into the DAG pattern library of EasyPDP, so that programmers can choose any of them according to their specific application. In addition, an ideal computing distribution model is proposed to discuss the optimal values for the performance tuning arguments of EasyPDP. We evaluate the performance potential and fault tolerance of EasyPDP on a multicore system. We also compare EasyPDP with other methods such as Block-Cycle Wavefront (BCW). The experimental results show that EasyPDP performs well and provides an efficient infrastructure for dynamic programming algorithms.
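A data-driven DAG schedule of the kind this abstract describes can be sketched in Python for one familiar DP problem, the longest common subsequence. The code below is not EasyPDP: the function name, the block size parameter, and the sequential ready queue (standing in for worker threads) are assumptions for illustration only.

```python
from collections import deque


def lcs_length_dag(a, b, block=2):
    """LCS length, computed block by block in data-driven (DAG) order."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    rows = (n + block - 1) // block
    cols = (m + block - 1) // block
    if rows == 0 or cols == 0:
        return 0

    # Each block waits for its upper and left neighbours; the diagonal
    # neighbour is finished transitively before either of them.
    pending = {(i, j): (i > 0) + (j > 0)
               for i in range(rows) for j in range(cols)}
    ready = deque([(0, 0)])

    def compute_block(bi, bj):
        for i in range(bi * block + 1, min((bi + 1) * block, n) + 1):
            for j in range(bj * block + 1, min((bj + 1) * block, m) + 1):
                if a[i - 1] == b[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                else:
                    table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # A real runtime would hand ready blocks to worker threads; here they
    # are processed one at a time to keep the sketch deterministic.
    while ready:
        bi, bj = ready.popleft()
        compute_block(bi, bj)
        for nxt in ((bi + 1, bj), (bi, bj + 1)):
            if nxt in pending:
                pending[nxt] -= 1
                if pending[nxt] == 0:
                    ready.append(nxt)
    return table[n][m]


print(lcs_length_dag("AGGTAB", "GXTXAYB"))
```

Blocks become runnable as soon as their inputs exist, so all blocks on the same anti-diagonal could in principle run in parallel.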

international conference on supercomputing | 2014

Long-term resource fairness: towards economic fairness on pay-as-you-use computing systems

Shanjiang Tang; Bu-Sung Lee; Bingsheng He; Haikun Liu

Fair resource allocation is a key building block of any shared computing system. However, MemoryLess Resource Fairness (MLRF), widely used in many existing frameworks such as YARN, Mesos and Dryad, is not suitable for pay-as-you-use computing. To address this problem, this paper proposes Long-Term Resource Fairness (LTRF), a novel fair resource allocation mechanism. We show that LTRF satisfies several highly desirable properties. First, LTRF incentivizes clients to share resources via group-buying by ensuring that no client is better off in a computing system that she buys and uses individually. Second, LTRF incentivizes clients to submit non-trivial workloads and to yield unneeded resources to others. Third, LTRF has a resource-as-you-pay fairness property, which ensures that each client gets resources according to her monetary cost, even though her resource demand varies over time. Finally, LTRF is strategy-proof, since a client cannot get more resources by lying about her demand. We have implemented LTRF in YARN by developing LTYARN, a long-term YARN fair scheduler, and shown that it leads to better resource fairness than other state-of-the-art fair schedulers.
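The contrast between memoryless and long-term fairness can be made concrete with a toy Python simulation. This is a simplified illustration and not the LTRF algorithm from the paper: the function names, the unit-by-unit allocation loop, and the assumption that all clients paid equally are hypothetical.

```python
def memoryless_fair(demands, capacity):
    """Max-min fair split of `capacity` for one time step, ignoring history."""
    alloc = [0] * len(demands)
    for _ in range(capacity):
        candidates = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not candidates:
            break
        i = min(candidates, key=lambda k: alloc[k])
        alloc[i] += 1
    return alloc


def long_term_fair(demands, capacity, history):
    """Same step, but each unit goes to the client with the least cumulative
    usage so far (assumption: all clients paid the same amount)."""
    alloc = [0] * len(demands)
    for _ in range(capacity):
        candidates = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not candidates:
            break
        i = min(candidates, key=lambda k: history[k] + alloc[k])
        alloc[i] += 1
    return alloc


def simulate(use_history, demand_trace, capacity):
    """Return the total resources each client received over the trace."""
    totals = [0] * len(demand_trace[0])
    for demands in demand_trace:
        if use_history:
            step = long_term_fair(demands, capacity, totals)
        else:
            step = memoryless_fair(demands, capacity)
        totals = [t + s for t, s in zip(totals, step)]
    return totals


# Client 0 is idle in the first step and yields its share to client 1.
trace = [[0, 4], [4, 4]]
print(simulate(False, trace, capacity=4))
print(simulate(True, trace, capacity=4))
```

Under the memoryless policy the client that yielded resources in the first step is never compensated, whereas the history-aware policy lets it catch up in the second step.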

international conference on cluster computing | 2013

Dynamic slot allocation technique for MapReduce clusters

Shanjiang Tang; Bu-Sung Lee; Bingsheng He

MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. However, slot utilization can be low, especially when the Hadoop Fair Scheduler is used, due to the pre-allocation of slots among map and reduce tasks and the ordering in which map tasks are followed by reduce tasks in a typical MapReduce environment. To address this problem, we propose to allow slots to be dynamically (re)allocated to either map or reduce tasks depending on their actual requirements. Specifically, we propose two types of Dynamic Hadoop Fair Scheduler (DHFS), for two different levels of fairness (i.e., cluster and pool level). The experimental results show that the proposed DHFS can improve system performance significantly (by 32% ~ 55% for a single job and 44% ~ 68% for multiple jobs) while guaranteeing fairness.

international conference on parallel processing | 2013

MROrder: flexible job ordering optimization for online mapreduce workloads

Shanjiang Tang; Bu-Sung Lee; Bingsheng He

MapReduce has become a widely used computing model for large-scale data processing in clusters and data centers. A MapReduce workload generally contains multiple jobs. Due to the general execution constraints that map tasks are executed before reduce tasks, different job execution orders in a MapReduce workload can have significantly different performance and system utilization. This paper proposes a prototype system called MROrder to dynamically optimize the job order for online MapReduce workloads. Moreover, MROrder is designed to be flexible for different optimization metrics, e.g., makespan and total completion time. The experimental results show that MROrder is able to improve the system performance by up to 31% for makespan and 176% for total completion time.

IEEE Transactions on Services Computing | 2016

Dynamic Job Ordering and Slot Configurations for MapReduce Workloads

Shanjiang Tang; Bu-Sung Lee; Bingsheng He

MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) map tasks are generally executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload have significantly different performance and system utilization. This paper proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. Our first class of algorithms focuses on job ordering optimization for a MapReduce workload under a given map/reduce slot configuration. In contrast, our second class of algorithms considers the scenario in which the map/reduce slot configuration can also be optimized for a MapReduce workload. We perform simulations as well as experiments on Amazon EC2 and show that our proposed algorithms produce results that are up to 15 ~ 80 percent better than unoptimized Hadoop, leading to significant reductions in running time in practice.
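Why job order affects makespan can be seen in a small Python sketch that treats each job as a map phase followed by a reduce phase on a two-stage pipeline. The ordering heuristic used here is Johnson's rule, the classic result for two-machine flow shops; it is chosen for illustration and is not claimed to be the paper's algorithm. The model (one aggregate map stage, one aggregate reduce stage, fixed phase durations) and all names are assumptions.

```python
def makespan(jobs):
    """Completion time of the last reduce phase for jobs run in the given
    order; each job is a (map_time, reduce_time) pair."""
    map_done = 0
    reduce_done = 0
    for map_time, reduce_time in jobs:
        map_done += map_time
        # A reduce phase starts once its own map phase and the previous
        # job's reduce phase have both finished.
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


def johnson_order(jobs):
    """Johnson's rule: map-light jobs first (shortest map time first),
    then reduce-light jobs (longest reduce time first)."""
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    last = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return first + last


submitted = [(4, 1), (1, 4), (3, 3)]
print(makespan(submitted))
print(johnson_order(submitted), makespan(johnson_order(submitted)))
```

For this workload the submitted order finishes at time 12, while the reordered workload finishes at time 9, because the reduce stage is kept busy earlier.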

IEEE Transactions on Services Computing | 2018

Fair Resource Allocation for Data-Intensive Computing in the Cloud

Shanjiang Tang; Bu-Sung Lee; Bingsheng He

To address the computing challenge of ‘big data’, a number of data-intensive computing frameworks (e.g., MapReduce, Dryad, Storm and Spark) have emerged and become popular. YARN is a de facto resource management platform that enables these frameworks to run together in a shared system. However, we observe that, in a cloud computing environment, the fair resource allocation policy implemented in YARN is not suitable, because its memoryless resource allocation violates a number of desirable properties of shared computing systems. This paper attempts to address these problems for YARN. Both single-level and hierarchical resource allocations are considered. For single-level resource allocation, we propose a novel fair resource allocation mechanism called Long-Term Resource Fairness (LTRF). For hierarchical resource allocation, we propose Hierarchical Long-Term Resource Fairness (H-LTRF) by extending LTRF. We show that both LTRF and H-LTRF can address these fairness problems of the current resource allocation policy and are thus suitable for cloud computing. Finally, we have developed LTYARN by implementing LTRF and H-LTRF in YARN, and our experiments show that it leads to better resource fairness than the existing fair schedulers of YARN.

BMC Bioinformatics | 2017

CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment

Xi Chen; Chen Wang; Shanjiang Tang; Ce Yu; Quan Zou

Background: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the information about users’ sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and is generally unknown before submission, which previous work unfortunately ignores. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by running the workload on both CPU and GPU simultaneously. Results: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users’ submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software. Conclusion: CMSA focuses on multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPUs is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can significantly improve overall system utilization. The source code is available at https://github.com/wangvsa/CMSA.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

EasyHPS: A Multilevel Hybrid Parallel System for Dynamic Programming

Jun Du; Ce Yu; Jizhou Sun; Chao Sun; Shanjiang Tang; Yanlong Yin

The dynamic programming approach solves complex problems efficiently by breaking them down into simpler sub-problems, and is widely used in scientific computing. With the increasing data volume of scientific applications and the development of multi-core/multi-processor hardware technologies, it is necessary to develop efficient techniques for parallelizing dynamic programming algorithms, particularly in multilevel computing environments. The intrinsically strong data dependency of dynamic programming also makes it difficult and error-prone for the programmer to write a correct and efficient parallel program. To make parallel programming easier and more efficient, we have developed a multilevel hybrid parallel runtime system for dynamic programming named EasyHPS, based on the Directed Acyclic Graph (DAG) Data Driven Model. The EasyHPS system encapsulates details of the parallel implementation, such as task scheduling and message passing, and provides an easy-to-use API that reduces the complexity of parallel programming. In the DAG Data Driven Model, the entire application is initially partitioned by data partitioning into sub-tasks, each of which processes a data block. Then all sub-tasks are modeled as a DAG, in which each vertex represents a sub-task and each edge indicates the communication dependency between two sub-tasks. For task scheduling, a dynamic approach based on the DAG Data Driven Model is proposed to achieve load balancing. Data partitioning and task scheduling are both done at the processor level and the thread level in the multilevel computing environment. In addition, experimental results demonstrate that the proposed dynamic scheduling approach in EasyHPS is more efficient than static ones such as block-cyclic based wavefront.

IEEE Transactions on Biomedical Engineering | 2017

Gait Rhythm Fluctuation Analysis for Neurodegenerative Diseases by Empirical Mode Decomposition

Peng Ren; Shanjiang Tang; Fang Fang; Lizhu Luo; Lei Xu; Maria L. Bringas-Vega; Dezhong Yao; Keith M. Kendrick; Pedro A. Valdes-Sosa

Previous studies have indicated that gait rhythm fluctuations are useful for characterizing certain pathologies of neurodegenerative diseases such as Huntington's disease (HD), amyotrophic lateral sclerosis (ALS), and Parkinson's disease (PD). However, no previous study has investigated the properties of frequency range distributions of gait rhythms. Therefore, in our study, empirical mode decomposition was used to decompose the time series of gait rhythms into intrinsic mode functions (IMFs), sequentially from the high-frequency component to the low-frequency component. Then, Kendall's coefficient of concordance and the ratio for energy change for different IMFs were calculated, denoted as W and RE, respectively. Results revealed that the frequency distributions of gait rhythms in patients with neurodegenerative diseases are less homogeneous than those of healthy subjects, and that the gait rhythms of the patients contain many more high-frequency components. In addition, W and RE can significantly differentiate among the four groups of subjects (HD, ALS, PD, and healthy subjects) (with a minimum p-value of 0.0000493). Finally, five representative classifiers were used to evaluate the capability of W and RE to distinguish patients with neurodegenerative diseases from healthy subjects. This achieved maximum area under the curve values of 0.949, 0.900, and 0.934 for PD, HD, and ALS detection, respectively. In sum, our study suggests that gait rhythm features extracted in the frequency domain should be given serious consideration in future neurodegenerative disease characterization and intervention.
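For readers unfamiliar with Kendall's coefficient of concordance, the standard tie-free formula is W = 12 S / (m^2 (n^3 - n)), where m rankings each order n items and S is the sum of squared deviations of the per-item rank sums from their mean. The Python sketch below computes this textbook quantity only; how the paper derives rankings from the IMFs is not stated in the abstract, so the function name and input layout are assumptions.

```python
def kendalls_w(rankings):
    """Kendall's W for m rankings of n items.

    `rankings` is a list of m lists, each holding the ranks 1..n with no
    ties. Returns a value between 0 (no agreement) and 1 (full agreement).
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank received by each item across all rankings.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # identical rankings
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))             # opposite rankings
```

Identical rankings give W = 1 and exactly opposite rankings give W = 0, which matches the interpretation of W as a homogeneity measure.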
