Tianhai Zhao
Northwestern Polytechnical University
Publications
Featured research published by Tianhai Zhao.
The Journal of Supercomputing | 2015
Lei Zhu; Jianhua Gu; Yunlan Wang; Tianhai Zhao; Zhennao Cai
The complexity and scale of high-performance computer systems are increasing rapidly, so fault tolerance is becoming a critical challenge. In this paper, we consider the impact of multiple proactive actions on proactive fault tolerance and periodic checkpointing. We extend Aupy’s model to account for multiple proactive actions, including proactive checkpointing and task migration. We then propose optimal strategies for deciding when to trust predictions and provide algorithms for computing the optimal interval for periodic checkpointing. The results show that the proposed method can significantly improve system productivity. Our case study indicates that the recall of the predictor matters more on small platforms, whereas precision becomes increasingly important as the scale of the system grows.
APPT 2013 Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies - Volume 8299 | 2013
Lei Zhu; Jianhua Gu; Yunlan Wang; Tianhai Zhao
As high-performance computer systems grow rapidly in size and complexity, passive fault tolerance can no longer provide system reliability effectively because of its high overhead and poor scalability. Hybrid fault tolerance, which combines passive and active fault-tolerant approaches, has the potential to be widely used in exascale systems. However, many issues with this method still need to be ironed out. This paper focuses on checkpointing within hybrid fault tolerance, and in particular on a common question: how to optimize the checkpoint interval. It proposes two models of systems that adopt hybrid fault tolerance and evaluates their effectiveness by comparing their results with simulation. Experimental results show that the modified model accurately predicts not only the total work time but also the optimum checkpoint interval.
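The checkpoint-interval question above has a well-known first-order baseline, Young's approximation T ≈ sqrt(2CM), where C is the time to write one checkpoint and M is the platform MTBF. The sketch below is not either of the paper's two models; it only illustrates the baseline such models refine. The second function is an illustrative assumption for the hybrid setting: a predictor with recall r lets proactive actions absorb the predicted failures, so periodic checkpoints only face the unpredicted fraction 1 - r, which amounts to an effective MTBF of M / (1 - r).

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M).

    checkpoint_cost -- time C to write one checkpoint (seconds)
    mtbf            -- platform mean time between failures M (seconds)
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def interval_with_predictor(checkpoint_cost: float, mtbf: float, recall: float) -> float:
    """Same approximation when only unpredicted failures hit periodic checkpoints.

    Illustrative assumption: predicted failures (fraction `recall`) are handled
    by proactive actions, so the effective MTBF seen by periodic checkpointing
    is mtbf / (1 - recall).
    """
    if not 0.0 <= recall < 1.0:
        raise ValueError("recall must be in [0, 1)")
    return young_interval(checkpoint_cost, mtbf / (1.0 - recall))


if __name__ == "__main__":
    C, M = 60.0, 24 * 3600.0  # example values: 1-minute checkpoint, 1-day MTBF
    print(f"no predictor:  {young_interval(C, M) / 60:.1f} min")
    print(f"recall = 0.75: {interval_with_predictor(C, M, 0.75) / 60:.1f} min")
```

Because the interval grows with the square root of the MTBF, a recall of 0.75 quadruples the effective MTBF and therefore doubles the checkpoint interval under this simplified model.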
Network and Parallel Computing | 2012
Gangfeng Liu; Yunlan Wang; Tianhai Zhao; Jianhua Gu; Dongyang Li
CPU/GPU heterogeneous computing has become a trend in scientific and engineering computing. Conventional computation models cannot be used to estimate application running time in a CPU/GPU heterogeneous computing environment. In this paper, a new model named mHLogGP is presented on the basis of mPlogP, LogGP, and LogP. In mHLogGP, communication and memory access are abstracted according to the characteristics of CPU/GPU hybrid computing clusters. The model can be used to study application behavior, estimate execution time, and guide the optimization of parallel programs. The results show that the predicted running time closely approaches the actual execution time of the program.
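For reference, the LogGP model that mHLogGP builds on charges o + (k - 1)G + L + o for a single k-byte point-to-point message: send overhead, per-byte gap, network latency, and receive overhead. The sketch below encodes only this base formula; the parameter values are made up, and the CPU/GPU memory-access terms that mHLogGP adds are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP machine parameters (all times in one unit, e.g. microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead per message, paid at sender and at receiver
    g: float  # minimum gap between consecutive messages (unused for one message)
    G: float  # gap per byte for long messages (time per byte)
    P: int    # number of processors (unused for one message)


def message_time(p: LogGPParams, k: int) -> float:
    """Predicted end-to-end time of one k-byte message under LogGP."""
    if k < 1:
        raise ValueError("message size k must be at least 1 byte")
    return p.o + (k - 1) * p.G + p.L + p.o


if __name__ == "__main__":
    # Hypothetical parameters, not measured on any real cluster.
    params = LogGPParams(L=5.0, o=1.0, g=2.0, G=0.001, P=16)
    for size in (1, 1024, 1 << 20):
        print(f"{size:>8} bytes: {message_time(params, size):.3f} us")
```

Short messages are dominated by L + 2o, long ones by (k - 1)G; a heterogeneous model has to add similar terms for host-device transfers and memory access.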
International Conference on Computational Science | 2018
Yunlan Wang; Jing Wang; Xingshe Zhou; Tianhai Zhao; Jianhua Gu
To predict blasting vibration intensity accurately, support vector machine regression (SVR) was adopted to predict blasting vibration velocity, vibration frequency, and vibration duration. The mutation operation of the genetic algorithm (GA) is used to keep particle swarm optimization (PSO) from settling in a local optimum. The improved PSO algorithm is then used to search for the best parameters of the SVR model. In the experiments, the improved PSO-SVR algorithm was implemented on the Apache Spark platform. The execution time and prediction accuracy of the Sadovski method, the traditional SVR algorithm, the neural network (NN) algorithm, and the improved PSO-SVR algorithm were compared. The results show that the improved PSO-SVR algorithm on Spark is feasible and efficient, and that the SVR model predicts blasting vibration intensity more accurately than the other methods.
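The idea of adding a GA-style mutation to PSO can be sketched on a single machine as follows. This is a minimal illustration, not the paper's Spark implementation: the mutation rule (re-sampling one coordinate uniformly within its bounds with a small probability), the values of w, c1, c2, and the toy objective standing in for SVR validation error are all assumptions. In the setting described above, each particle position would encode a candidate set of SVR parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_with_mutation(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,        # inertia weight (assumed value)
    c1: float = 1.5,       # cognitive coefficient (assumed value)
    c2: float = 1.5,       # social coefficient (assumed value)
    mutation_rate: float = 0.05,  # per-coordinate mutation probability (assumed)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box; return (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: float, d: int) -> float:
        lo, hi = bounds[d]
        return min(max(x, lo), hi)

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], d)
                # GA-style mutation: occasionally re-sample this coordinate so
                # the swarm can escape a local optimum.
                if rng.random() < mutation_rate:
                    lo, hi = bounds[d]
                    pos[i][d] = rng.uniform(lo, hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(x: Sequence[float]) -> float:
    """Hypothetical stand-in for SVR validation error; minimum at (2, -1)."""
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2


if __name__ == "__main__":
    best, err = pso_with_mutation(toy_error, [(-5.0, 5.0), (-5.0, 5.0)], seed=1)
    print(best, err)
```

Personal and global bests only ever improve, so mutation adds exploration without discarding the best parameters found so far.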
Computers & Electrical Engineering | 2018
Zhengxiong Hou; Yunlan Wang; Yulei Sui; Jianhua Gu; Tianhai Zhao; Xingshe Zhou
Modern High-Performance Computing (HPC) environments face several challenges, such as the imbalance between supply and demand of hardware resources and software licenses, and usability. To provide an on-demand service for end users, we propose a Software-as-a-Service (SaaS) approach for managing commercial HPC applications as a Web-based service deployed on top of federated clouds. Several mutually trusted private or public clouds are federated to create a unified service platform with a large amount of hardware resources. In addition, an on-demand, pay-per-use model for Web-service-enabled HPC applications is proposed. Further, we provide an economic analysis of the proposed approach from the perspectives of end users, cloud service providers, and Independent Software Vendors (ISVs). We conduct a simulation using two HPC application services on three federated clouds. A combined Quality of Service (QoS) and economic evaluation shows that the proposed approach performs better than existing HPC platforms.
International Conference on Signal Processing | 2016
Yunlan Wang; Bin Zhang; Tianhai Zhao; Jianhua Gu
In this paper, a novel energy-aware scientific workflow simulator is designed and implemented that can simulate scientific workflows running on multi-core cluster systems. An application modeling method is proposed that describes the process dependences, data dependences, and performance requirements of a workflow. A computing system model is also introduced to describe the layered structure of the cluster, the communication matrices of the cluster nodes, and the energy consumption under different load levels. Based on the application model and the computing system model, the scientific workflow scheduling problem is abstracted as a multi-objective optimization problem, and a scheduling algorithm is presented that satisfies the performance constraints while remaining energy-aware. The experimental results demonstrate the effectiveness of the simulator.
International Conference on Computer Science and Network Technology | 2015
Zhengxiong Hou; Jianhua Gu; Yunlan Wang; Tianhai Zhao
Distributed high-performance computing infrastructure is trending toward a service-oriented model built on hybrid physical and virtual clusters, in which application software can be service-enabled on top of the underlying hardware resources. To enable on-demand service for end users, monitoring of software services is important in such a service-oriented hybrid computing environment. In this paper, an autonomic monitoring framework for web-service-enabled application software is proposed. Software resources are web-service-enabled on top of elastic hardware resources, either physical machines or virtual machines. Both the software services and the underlying hardware resources are monitored dynamically. An autonomic monitoring algorithm with a self-optimized update period and self-adaptive event-driven updates is also given for dynamically retrieving information about software services and the underlying hardware resources. Experiments were conducted on a multi-cluster hybrid distributed HPC infrastructure. The proposed approach provides more accurate monitoring information at a lower update cost.
Grid and Pervasive Computing | 2013
Lei Zhu; Jianhua Gu; Tianhai Zhao; Yunlan Wang
As system size and complexity grow rapidly, traditional passive fault tolerance can no longer guarantee system reliability because of its high overhead and poor scalability. Active fault tolerance is believed to be the most important fault-tolerant approach for exascale systems. Aiming at system failure prediction, this paper proposes a system log pre-processing method using classification via sparse representation (SRCP). Adopting the idea of vectorization, SRCP removes the details of each log and generates the corresponding vectors. It uses the TF-IDF (term frequency-inverse document frequency) method to weight each keyword, which reveals more precise information about the correlation between log records. To improve the accuracy and flexibility of the pre-processing method, the log vectors are processed by sparse representation classification. For generality, SRCP does not rely on any expert system or domain knowledge. Experimental results show that SRCP achieves outstanding precision and F-measure while also providing a satisfactory compression ratio.
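The TF-IDF weighting step can be illustrated with a small sketch. The tokenizer here (keep lowercase alphabetic words, drop numbers such as node IDs) is an assumed stand-in for SRCP's detail-removal step, and the sparse representation classification that SRCP applies afterwards is not shown.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(record: str) -> List[str]:
    """Keep lowercase alphabetic keywords; digits (node IDs, addresses) are dropped."""
    return re.findall(r"[a-z]+", record.lower())


def tfidf_vectors(records: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {keyword: weight} vector per log record."""
    docs = [tokenize(r) for r in records]
    n_docs = len(docs)
    # Document frequency: number of records that contain each keyword.
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty records
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


if __name__ == "__main__":
    logs = [
        "node 12 kernel panic",
        "node 7 link down",
        "node 12 link down",
    ]
    for record, vec in zip(logs, tfidf_vectors(logs)):
        print(record, "->", {k: round(v, 3) for k, v in vec.items()})
```

A keyword that appears in every record (here "node") receives zero weight, while rarer keywords such as "kernel" and "panic" are weighted up, which is what makes the vectors informative about correlations between records.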
International Conference on Graphic and Image Processing (ICGIP 2011) | 2011
Xiuchun Li; Jianhua Gu; Yunlan Wang; Tianhai Zhao
Synchronization of different chaotic signals is achieved by means of an adaptive sliding mode control method, where the signal system is subject to random perturbations. The designed control method can be applied to a large class of chaotic systems. Finally, two different chaotic systems are simulated to realize signal synchronization, which verifies the effectiveness of the proposed method.
International Congress on Image and Signal Processing | 2010
Xiuchun Li; Jianhua Gu; Yunlan Wang; Tianhai Zhao; Zhengxiong Hou; Tao Wang
Using adaptive tracking control based on Lyapunov stability theory, we investigate a class of perturbed chaotic systems with unknown parameters. Both the bound of the random perturbations of the chaotic signal and the feedback strength of the controller are estimated through an adaptive control process, so neither needs to be known in advance. It is shown that this adaptive approach makes the perturbed chaotic signal track any desired reference signal. Finally, the perturbed Lorenz chaotic system is simulated to verify the effectiveness of the proposed scheme.