Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xian-He Sun is active.

Publication


Featured research published by Xian-He Sun.


grid computing | 2003

QoS guided min-min heuristic for grid task scheduling

XiaoShan He; Xian-He Sun; Gregor von Laszewski

Task scheduling is an integral component of computing. With the emergence of Grid and ubiquitous computing, new challenges appear in task scheduling, arising from properties such as security, quality of service, and the lack of central control within distributed administrative domains. A Grid task scheduling framework must be able to deal with these issues. One of the goals of Grid task scheduling is to achieve high system throughput while matching applications with the available computing resources. This matching of resources in a non-deterministically shared heterogeneous environment leads to concerns over Quality of Service (QoS). In this paper a novel QoS-guided task scheduling algorithm for Grid computing is introduced. The proposed algorithm is based on a general adaptive scheduling heuristic that incorporates QoS guidance. The algorithm is evaluated within a simulated Grid environment. The experimental results show that the new QoS-guided Min-Min heuristic can lead to significant performance gains for a variety of applications. The approach is also compared with other heuristics on the quality of the schedules produced when prediction information is inaccurate.
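
To make the heuristic concrete, the sketch below (Python) shows a QoS-guided Min-Min loop under a simplifying assumption of a two-level QoS model: tasks marked "high" may only run on hosts advertising "high" QoS, and a hypothetical exec_time table supplies estimated execution times. It illustrates the idea of scheduling QoS-constrained tasks first on qualifying hosts, not the paper's exact algorithm.

# Sketch of a QoS-guided Min-Min scheduler (illustrative, not the paper's exact algorithm).
# Assumptions: a two-level QoS model; exec_time[(task, host)] is an estimated run time;
# every QoS class has at least one qualifying host.

def min_min(tasks, hosts, exec_time, ready):
    """Classic Min-Min over the given tasks/hosts; mutates `ready` (host ready times)."""
    schedule = {}
    unscheduled = set(tasks)
    while unscheduled:
        best = None  # (completion_time, task, host)
        for t in unscheduled:
            # earliest completion time of t over all eligible hosts
            ct, h = min((ready[h] + exec_time[(t, h)], h) for h in hosts)
            if best is None or ct < best[0]:
                best = (ct, t, h)
        ct, t, h = best
        schedule[t] = h
        ready[h] += exec_time[(t, h)]
        unscheduled.remove(t)
    return schedule

def qos_guided_min_min(tasks, hosts, exec_time):
    ready = {h["name"]: 0.0 for h in hosts}
    high_hosts = [h["name"] for h in hosts if h["qos"] == "high"]
    all_hosts = [h["name"] for h in hosts]
    high_tasks = [t["name"] for t in tasks if t["qos"] == "high"]
    low_tasks = [t["name"] for t in tasks if t["qos"] != "high"]
    # Schedule QoS-constrained tasks first, restricted to QoS-capable hosts,
    # then fill in the remaining tasks across all hosts.
    plan = min_min(high_tasks, high_hosts, exec_time, ready)
    plan.update(min_min(low_tasks, all_hosts, exec_time, ready))
    return plan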


Journal of Parallel and Distributed Computing | 1993

Scalable problems and memory-bounded speedup

Xian-He Sun; Lionel M. Ni

In this paper three models of parallel speedup are studied: fixed-size speedup, fixed-time speedup, and memory-bounded speedup. The latter two consider the relationship between speedup and problem scalability. Two sets of speedup formulations are derived for these three models. One set considers uneven workload allocation and communication overhead and gives more accurate estimation. The other considers a simplified case and provides a clear picture of the impact of the sequential portion of an application on the possible performance gain from parallel processing. The simplified fixed-size speedup is Amdahl's law. The simplified fixed-time speedup is Gustafson's scaled speedup. The simplified memory-bounded speedup contains both Amdahl's law and Gustafson's scaled speedup as special cases. This study leads to a better understanding of parallel processing.
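
For reference, the three simplified models can be written compactly in a common modern restatement (not a quotation of the paper's notation), with f the sequential fraction of the workload, p the number of processors, and G(p) the factor by which the parallel workload grows when memory scales with p:

\[
S_{\mathrm{FS}}(p) = \frac{1}{f + (1-f)/p}, \qquad
S_{\mathrm{FT}}(p) = f + (1-f)\,p, \qquad
S_{\mathrm{MB}}(p) = \frac{f + (1-f)\,G(p)}{f + (1-f)\,G(p)/p}.
\]

Setting G(p) = 1 recovers Amdahl's law and G(p) = p recovers Gustafson's scaled speedup, which is the sense in which memory-bounded speedup contains both as special cases.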


Journal of Parallel and Distributed Computing | 2010

Reevaluating Amdahl's law in the multicore era

Xian-He Sun; Yong Chen

Microprocessor architecture has entered the multicore era. Recently, Hill and Marty presented a pessimistic view of multicore scalability. Their analysis was based on Amdahl's law (i.e., the fixed-workload condition) and challenged readers to develop better models. In this study, we analyze multicore scalability under fixed-time and memory-bounded conditions and from the data-access (memory wall) perspective. We use the same hardware cost model of multicore chips used by Hill and Marty, but arrive at very different and more optimistic performance models. These models show that there is no inherent, immovable upper bound on the scalability of multicore architectures. These results complement existing studies and demonstrate that multicore architectures are capable of extensive scalability.
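
The contrast can be illustrated with a toy calculation (Python). The fixed-workload formula below is the well-known Hill-Marty symmetric-multicore form of Amdahl's law; the fixed-time variant is a simplified Gustafson-style scaling added here for illustration under the same sqrt(r) per-core performance assumption, and is not the paper's own derivation. Even this toy version shows the fixed-workload ceiling while the fixed-time curve keeps growing with the core count.

# Toy comparison of fixed-workload vs. fixed-time multicore speedup under the
# Hill-Marty hardware cost model (illustrative assumptions, not the paper's exact models).
# n: chip budget in base-core equivalents (BCEs); r: BCEs per core; f: parallel fraction.
import math

def perf(r):
    # Hill-Marty assumption: a core built from r BCEs runs sqrt(r) times faster than one BCE.
    return math.sqrt(r)

def speedup_fixed_workload(f, n, r):
    # Symmetric-multicore Amdahl's law: bounded above by perf(r)/(1-f) no matter how large n grows.
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))

def speedup_fixed_time(f, n, r):
    # Gustafson-style fixed-time scaling (assumption: the parallel part of the workload
    # grows with the number of cores, n/r, so the execution time stays fixed).
    cores = n / r
    return ((1.0 - f) + f * cores) * perf(r)

if __name__ == "__main__":
    f, r = 0.975, 4
    for n in (16, 64, 256, 1024):
        print(n,
              round(speedup_fixed_workload(f, n, r), 1),
              round(speedup_fixed_time(f, n, r), 1))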


ieee international conference on high performance computing data and analytics | 2008

Parallel I/O prefetching using MPI file caching and I/O signatures

Surendra Byna; Yong Chen; Xian-He Sun; Rajeev Thakur; William Gropp

Parallel I/O prefetching is considered to be effective in improving I/O performance. However, the effectiveness depends on determining patterns among future I/O accesses swiftly and fetching data in time, which is difficult to achieve in general. In this study, we propose an I/O signature-based prefetching strategy. The idea is to use a predetermined I/O signature of an application to guide prefetching. To put this idea to work, we first derived a classification of patterns and introduced a simple and effective signature notation to represent patterns. We then developed a toolkit to trace and generate I/O signatures automatically. Finally, we designed and implemented a thread-based client-side collective prefetching cache layer for the MPI-IO library to support prefetching. A prefetching thread reads the I/O signatures of an application and adjusts them by observing I/O accesses at runtime. Experimental results show that the proposed prefetching method improves I/O performance significantly for applications with complex patterns.
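
A minimal way to picture an I/O signature is as a compact record of a repeating access pattern from which future requests can be regenerated. The Python sketch below uses a hypothetical {offset, stride, size, count} notation and a prefetcher that reads a fixed number of future requests into an in-memory cache; the paper's actual signature notation and its MPI-IO cache layer are richer than this.

# Illustrative I/O-signature-driven prefetching (hypothetical notation, not the paper's).
from collections import namedtuple

# A strided signature: starting at `offset`, issue `count` requests of `size` bytes,
# each `stride` bytes apart (stride == size would describe a contiguous scan).
Signature = namedtuple("Signature", "offset stride size count")

def accesses(sig):
    """Regenerate the (offset, size) sequence described by a signature."""
    for i in range(sig.count):
        yield sig.offset + i * sig.stride, sig.size

def prefetch(f, sig, current_index, depth, cache):
    """Read up to `depth` future requests of the signature into an in-memory cache."""
    for i in range(current_index + 1, min(current_index + 1 + depth, sig.count)):
        off = sig.offset + i * sig.stride
        if off not in cache:
            f.seek(off)
            cache[off] = f.read(sig.size)

# Usage sketch: while the compute loop consumes request i, requests i+1..i+depth are
# already resident in `cache`, hiding the read latency behind computation.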


conference on high performance computing (supercomputing) | 1990

Another view on parallel speedup

Xian-He Sun; Lionel M. Ni

Three models of parallel speedup are studied: fixed-size speedup, fixed-time speedup, and memory-bounded speedup. Two sets of speedup formulations are derived for these three models. One set requires more information and gives more accurate estimation. The other considers a simplified case and provides a clear picture of the possible performance gain of parallel processing. The simplified fixed-size speedup is Amdahl's law. The simplified fixed-time speedup is Gustafson's scaled speedup. The simplified memory-bounded speedup contains both Amdahl's law and Gustafson's scaled speedup as its special cases. A metric for performance evaluation is proposed.


international parallel and distributed processing symposium | 2003

Grid Harvest Service: a system for long-term, application-level task scheduling

Xian-He Sun; Ming Wu

With the emergence of the Grid computing environment, performance measurement, analysis, and prediction of non-dedicated distributed systems have become increasingly important. In this study, we put forward a novel performance model for non-dedicated network computing. Based on this model, a performance prediction and task scheduling system called Grid Harvest Service (GHS) has been designed and implemented. GHS consists of a performance measurement component, a prediction component, and a scheduling component. Different scheduling algorithms are proposed for different situations. Experimental results show that GHS provides a satisfactory solution for performance prediction and scheduling of large applications and that it has real potential.
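
At the application level the scheduling decision can be pictured as: measure the recent utilization of each non-dedicated machine, predict how long a subtask would take there, and place subtasks so as to minimize the predicted completion time. The Python sketch below captures only that picture, with a hypothetical linear slowdown model; GHS's actual stochastic model of machine availability is considerably more sophisticated.

# Illustrative application-level scheduling on non-dedicated machines
# (hypothetical slowdown model, not the GHS performance model).

def predicted_time(work, machine):
    # Assume the owner's workload leaves only a fraction of the machine for us:
    # effective speed = peak speed * (1 - measured utilization).
    effective = machine["speed"] * (1.0 - machine["utilization"])
    return work / effective if effective > 0 else float("inf")

def schedule(subtasks, machines):
    """Greedily place each subtask on the machine with the earliest predicted finish."""
    finish = {m["name"]: 0.0 for m in machines}
    placement = {m["name"]: [] for m in machines}
    for work in sorted(subtasks, reverse=True):   # place the largest subtasks first
        best = min(machines, key=lambda m: finish[m["name"]] + predicted_time(work, m))
        finish[best["name"]] += predicted_time(work, best)
        placement[best["name"]].append(work)
    return placement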


ieee international conference on high performance computing data and analytics | 2008

Hiding I/O latency with pre-execution prefetching for parallel applications

Yong Chen; Surendra Byna; Xian-He Sun; Rajeev Thakur; William Gropp

Parallel applications are usually able to achieve high computational performance but suffer from large latency in I/O accesses. I/O prefetching is an effective solution for masking this latency. Most existing I/O prefetching techniques, however, are conservative, and their effectiveness is limited by low accuracy and coverage. As the processor-I/O performance gap has been widening rapidly, data-access delay has become a dominant performance bottleneck. We argue that it is time to revisit the "I/O wall" problem and trade excessive computing power for data-access speed. We propose a novel pre-execution approach for masking I/O latency. We describe the pre-execution I/O prefetching framework, the pre-execution thread construction methodology, the underlying library support, and the prototype implementation in ROMIO, the MPI-IO implementation in MPICH2. Preliminary experiments show that the pre-execution approach is promising in reducing I/O access latency and has real potential.
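
The idea can be pictured as running a second, stripped-down copy of the program that performs only the I/O calls, some distance ahead of the main thread, so that its reads warm a shared cache. The Python sketch below uses threads and a hypothetical list of read requests; the paper's framework derives the pre-execution thread from the real MPI program and plugs into the ROMIO library rather than working at this level.

# Illustrative pre-execution prefetching: a helper thread runs the I/O access
# sequence ahead of the main thread and stages data in a shared cache.
# (Hypothetical structure; the paper integrates this into the MPI-IO library.)
import threading

def pre_execute(path, requests, cache, done):
    # The "pre-execution" thread: only the reads, none of the computation.
    with open(path, "rb") as f:
        for off, size in requests:
            f.seek(off)
            cache[(off, size)] = f.read(size)
    done.set()

def main_compute(path, requests, compute):
    cache, done = {}, threading.Event()
    t = threading.Thread(target=pre_execute, args=(path, requests, cache, done), daemon=True)
    t.start()
    with open(path, "rb") as f:
        for off, size in requests:
            data = cache.pop((off, size), None)    # hit: already prefetched
            if data is None:                       # miss: fall back to a normal read
                f.seek(off)
                data = f.read(size)
            compute(data)                          # overlap: the prefetcher keeps running ahead
    done.wait()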


Software - Practice and Experience | 2002

Data collection and restoration for heterogeneous process migration

Kasidit Chanchio; Xian-He Sun

This study presents a practical solution for data collection and restoration to migrate a process written in high-level stack-based languages such as C and Fortran over a network of heterogeneous computers. We first introduce a logical data model, namely the Memory Space Representation (MSR) model, to recognize complex data structures in the process address space. Then, novel methods are developed to incorporate the MSR model into a process and to collect and restore data efficiently. We have implemented prototype software and performed experiments on different programs. Experimental and analytical results show that: (1) a user-level process can be migrated across different computing platforms; (2) semantic information of data structures in the process's memory space can be correctly collected and restored; (3) the costs of data collection and restoration depend on the complexity of the MSR graph in the memory space and the amount of data involved; and (4) the implantation of the MSR model into the process is not a decisive factor in execution overhead. With appropriate program analysis, we can practically achieve low overhead.
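
The core of the MSR idea is to view the live data of a process as a graph whose nodes are memory blocks and whose edges are the pointers between them, so that the graph, rather than raw machine addresses, is what gets shipped between heterogeneous machines. The Python sketch below serializes and restores such a graph using block ids in place of addresses; it only illustrates the pointer-translation problem, not the program analysis the paper builds for C and Fortran.

# Illustrative "memory space representation": blocks and pointer edges are collected
# into a machine-independent image and rebuilt with fresh identities on the target.
# (Sketch of the idea only; the paper handles real C/Fortran memory.)

def collect(root, get_value, get_pointers):
    """Walk the block graph from `root`; return ({block_id: (value, pointed-to ids)}, root id)."""
    image, stack, seen = {}, [root], set()
    while stack:
        b = stack.pop()
        if id(b) in seen:
            continue
        seen.add(id(b))
        targets = get_pointers(b)
        image[id(b)] = (get_value(b), [id(t) for t in targets])
        stack.extend(targets)
    return image, id(root)

def restore(image, root_id, make_block, set_pointers):
    """Recreate every block on the target machine, then patch the pointers."""
    new = {bid: make_block(value) for bid, (value, _) in image.items()}
    for bid, (_, targets) in image.items():
        set_pointers(new[bid], [new[t] for t in targets])
    return new[root_id]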


international conference on cluster computing | 2015

Overcoming Hadoop Scaling Limitations through Distributed Task Execution

Ke Wang; Ning Liu; Iman Sadooghi; Xi Yang; Xiaobing Zhou; Tonglin Li; Michael Lang; Xian-He Sun; Ioan Raicu

Data-driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts through the Hadoop implementation and framework decoupling (e.g., YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, the task scheduler, and the metadata management of the HDFS file system adversely affect Hadoop's scalability to tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive scientific applications of many-task computing on supercomputers. We propose to leverage the distributed design wisdom of MATRIX to schedule arbitrary data processing applications in the cloud. We compare MATRIX with YARN in processing typical Hadoop workloads, such as WordCount, TeraSort, Grep, and RandomWriter, as well as the Ligand application in bioinformatics, on the Amazon cloud. Experimental results show that MATRIX outperforms YARN by 1.27X for the typical workloads and by 2.04X for the real application. We also run and simulate MATRIX with fine-grained sub-second workloads. With the simulation results showing an efficiency of 86.8% at 64K cores for the 150 ms workload, we show that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.
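
The scaling argument rests on replacing the single central scheduler with one scheduler per node, each keeping its own task queue and stealing work from randomly chosen peers when idle. The Python sketch below simulates that work-stealing loop in-process; the function and parameter names are illustrative, not MATRIX's API.

# Illustrative simulation of distributed task scheduling with random work stealing
# (the design principle MATRIX relies on at scale); hypothetical names, not MATRIX's API.
import random
from collections import deque

def run(num_schedulers, tasks, steal_fraction=0.5, seed=0):
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_schedulers)]
    for i, t in enumerate(tasks):                # tasks start hashed across schedulers
        queues[i % num_schedulers].append(t)
    executed = [0] * num_schedulers
    remaining = len(tasks)
    while remaining:
        for s, q in enumerate(queues):
            if q:                                # execute one local task per round
                q.popleft()
                executed[s] += 1
                remaining -= 1
            else:                                # idle: steal part of a random victim's queue
                victim = queues[rng.randrange(num_schedulers)]
                for _ in range(int(len(victim) * steal_fraction)):
                    q.append(victim.pop())
    return executed                              # per-scheduler load after balancing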


conference on decision and control | 1989

Parallel processing for the load flow of power systems: the approach and applications

Fathi M. A. Salam; Lionel M. Ni; S. Guo; Xian-He Sun

A description is given of homotopy-based parallel computational algorithms for solving for all the roots of a system of algebraic polynomial equations. Also presented is a convenient polynomial representation of the load flow equations of power systems. The algorithmic techniques are then applied to obtain all steady-state solutions of the load flow for five-bus and seven-bus power system networks. A special probability-one homotopy method is tailored to the load flow to reduce the computational complexity while still guaranteeing that all solutions are found computationally. More importantly and practically, the numerical implementation of the solution procedures exploits the inherent parallelism in the load flow equations so that they can be executed efficiently on massively parallel distributed-memory multiprocessors.
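
Homotopy continuation of this kind deforms an easy start system g(x), whose roots are known, into the target system f(x) via H(x, t) = (1-t) g(x) + t f(x) and numerically tracks each start root as t moves from 0 to 1; because the paths are independent, they parallelize naturally. The Python sketch below tracks all roots of a single-variable polynomial this way, using a random complex constant gamma so that path singularities occur only on a measure-zero set, which is the "probability-one" idea; the load-flow case in the paper uses a multivariate homotopy tailored to the power system equations.

# Illustrative homotopy continuation for all roots of one polynomial f(x)
# (single-variable toy; the paper applies multivariate, probability-one homotopies
# to the load flow equations and runs the independent paths in parallel).
import cmath

GAMMA = cmath.exp(0.81j)   # arbitrary complex constant off the real axis ("gamma trick")

def track_root(f, df, degree, start_root, steps=400, newton_iters=6):
    """Follow one root of H(x,t) = (1-t)*GAMMA*(x**degree - 1) + t*f(x) from t=0 to t=1."""
    x = start_root
    for k in range(1, steps + 1):
        t = k / steps
        for _ in range(newton_iters):            # Newton correction at this value of t
            h = (1 - t) * GAMMA * (x**degree - 1) + t * f(x)
            dh = (1 - t) * GAMMA * degree * x**(degree - 1) + t * df(x)
            x -= h / dh
    return x

def all_roots(f, df, degree):
    # Start system x**degree - 1 = 0 has the degree-th roots of unity as known roots;
    # each path is independent, so the paths can be tracked on different processors.
    starts = [cmath.exp(2j * cmath.pi * i / degree) for i in range(degree)]
    return [track_root(f, df, degree, s) for s in starts]

# Example: f(x) = x^3 - 2x + 1, whose roots are 1 and (-1 +/- sqrt(5))/2.
roots = all_roots(lambda x: x**3 - 2*x + 1, lambda x: 3*x**2 - 2, 3)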

Collaboration


Dive into Xian-He Sun's collaborations.

Top Co-Authors

Yong Chen (Texas Tech University)
Rajeev Thakur (Argonne National Laboratory)
Hui Jin (Illinois Institute of Technology)
Surendra Byna (Lawrence Berkeley National Laboratory)
Ming Wu (Illinois Institute of Technology)
Yanlong Yin (Illinois Institute of Technology)
Huaiming Song (Illinois Institute of Technology)
Xi Yang (Illinois Institute of Technology)