Hongyang Sun

École normale supérieure de Lyon

Publication

Featured research published by Hongyang Sun.

IEEE Transactions on Parallel and Distributed Systems | 2011

Efficient Adaptive Scheduling of Multiprocessors with Stable Parallelism Feedback

Hongyang Sun; Yangjie Cao; Wen-Jing Hsu

With the proliferation of multicore computers and multiprocessor systems, an imminent challenge is to efficiently schedule parallel applications on these resources. In contrast to conventional static scheduling, adaptive schedulers that dynamically allocate processors to jobs have good potential for improving processor utilization and speeding up job execution. In this paper, we focus on adaptive scheduling of malleable jobs with periodic processor reallocations based on the parallelism feedback of the jobs and the allocation policy of the system. We present an efficient adaptive scheduler, Acdeq, that provides parallelism feedback using an adaptive controller, A-Control, and allocates processors based on the well-known Dynamic Equipartitioning algorithm (Deq). Compared to A-Greedy, an existing adaptive scheduler that experiences feedback instability and thus incurs unnecessary scheduling overheads, we show that A-Control achieves much more stable feedback, among other desirable control-theoretic properties. Furthermore, we algorithmically analyze the performance of Acdeq in terms of its response time and processor waste for an individual job, as well as makespan and total response time for a set of jobs. To the best of our knowledge, Acdeq is the first multiprocessor scheduling algorithm that offers both control-theoretic and algorithmic guarantees. We further evaluate Acdeq via simulations using Downey's parallel job model augmented with internal parallelism variations. The results confirm its improved performance over Agdeq, and they show that Acdeq excels especially when the scheduling overhead is high.
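
Deq is named above as the allocation policy. As an illustration only, here is a minimal Python sketch of the textbook Dynamic Equipartitioning rule, assuming integer processor counts and treating each job's request as the output of its parallelism feedback. The function name deq_allocate and the way leftover processors are handed out are choices made for this sketch, not details taken from the paper, and A-Control itself is not modeled.

```python
from typing import Dict


def deq_allocate(requests: Dict[str, int], total: int) -> Dict[str, int]:
    """Sketch of Dynamic Equipartitioning (Deq).

    Jobs whose request does not exceed the current equal share receive
    exactly their request; the processors they leave unused are split
    equally among the remaining jobs, and the process repeats.
    """
    alloc: Dict[str, int] = {}
    pending = dict(requests)
    remaining = total
    while pending:
        share = remaining // len(pending)
        satisfied = {j: r for j, r in pending.items() if r <= share}
        if not satisfied:
            # Every remaining job wants more than the equal share: split the
            # processors equally; leftovers go to the first jobs by name
            # (an arbitrary tie-break chosen for this sketch).
            extra = remaining - share * len(pending)
            for i, job in enumerate(sorted(pending)):
                alloc[job] = share + (1 if i < extra else 0)
            return alloc
        for job, req in satisfied.items():
            alloc[job] = req
            remaining -= req
            del pending[job]
    return alloc


# A small job is satisfied; two large jobs share what is left.
print(deq_allocate({"a": 2, "b": 10, "c": 10}, 16))  # {'a': 2, 'b': 7, 'c': 7}
```

Jobs never receive more than they request, so processors can stay idle when all requests are small; that is the behavior that makes accurate parallelism feedback matter.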

International Conference on Distributed Computing Systems | 2011

Tians Scheduling: Using Partial Processing in Best-Effort Applications

Yuxiong He; Sameh Elnikety; Hongyang Sun

To serve requests with high quality, interactive services such as web search, on-demand video, and online gaming keep average server utilization low. As servers become busy, queuing delays increase and requests miss their deadlines, resulting in degraded quality of service, poor user experience, and potential revenue loss. In this paper, we propose Tians scheduling, a group of scheduling algorithms for interactive services that can produce partial answers during overload. A Tians scheduler allocates processing time to each request based on system load, with the objective of maximizing the overall quality of responses. We propose three Tians scheduling algorithms: offline, online clairvoyant, and online non-clairvoyant. For interactive applications with a concave quality profile, we prove that the offline algorithm is optimal. We show the effectiveness of the online algorithms through a simulation study modeling two important applications: a web search engine and a video-on-demand (VOD) system. Simulation results show a significant improvement of Tians over traditional server models: average response quality improves and the variance of responses decreases.
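
The optimality claim rests on concave quality profiles, where each extra unit of processing yields a smaller quality gain. The Python sketch below illustrates why concavity helps, under a strong simplification that is not the Tians model: one batch of requests shares a fixed budget of unit time slices, with no arrival times or deadlines. The greedy marginal-gain rule shown is the classic optimal allocation for separable concave objectives; allocate_units and the example quality functions are hypothetical.

```python
import heapq
from typing import Callable, List

Quality = Callable[[int], float]


def allocate_units(qualities: List[Quality], budget: int) -> List[int]:
    """Give `budget` unit time slices to requests one at a time, always to
    the request whose quality increases the most from one more slice.

    With concave (diminishing-returns) quality functions, this greedy rule
    maximizes the total quality of the batch.
    """
    alloc = [0] * len(qualities)
    # heapq is a min-heap, so store negated marginal gains.
    heap = [(-(q(1) - q(0)), i) for i, q in enumerate(qualities)]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no request benefits from more processing
        alloc[i] += 1
        t = alloc[i]
        q = qualities[i]
        heapq.heappush(heap, (-(q(t + 1) - q(t)), i))
    return alloc


# Request 0 is fully answered after 2 slices; request 1 has diminishing returns.
print(allocate_units([lambda t: min(t, 2), lambda t: t ** 0.5], 4))  # [2, 2]
```

Under overload the budget is smaller than the total demand, and the rule naturally produces partial answers: no request is starved, and slices go where they raise quality the most.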

Ad Hoc Networks | 2015

Energy-efficient, thermal-aware modeling and simulation of data centers

Leandro Fontoura Cupertino; Georges Da Costa; Ariel Oleksiak; Wojciech Piatek; Jean-Marc Pierson; Jaume Salom; Laura Sisó; Patricia Stolf; Hongyang Sun; Thomas Zilio

This paper describes the CoolEmAll project and its approach to modeling and simulating energy-efficient and thermal-aware data centers. The aim of the project was to address the energy-thermal efficiency of data centers by combining the optimization of IT, cooling, and workload management. This paper provides a complete data center model that considers workload profiles, application profiling, a power model, and a cooling model. Different energy-efficiency metrics are proposed, and various resource management and scheduling policies are presented. The proposed strategies are validated through simulation at different levels of a data center.

Sustainable Computing: Informatics and Systems | 2014

Energy-efficient and thermal-aware resource management for heterogeneous datacenters

Hongyang Sun; Patricia Stolf; Jean-Marc Pierson; Georges Da Costa

In this paper, we study energy-, thermal-, and performance-aware resource management in heterogeneous datacenters. As heterogeneity in datacenters continues to grow, we are confronted with servers that behave differently in terms of performance, power consumption, and thermal dissipation: heterogeneity at the server level lies both in the computing infrastructure (computing power, electrical power consumption) and in the heat removal systems (different enclosures, fans, heat sinks). The physical locations of the servers also become important with heterogeneity, since some servers can (over)heat others. While many studies address these parameters independently (most of the time performance and power or energy), we show in this paper that all of these aspects must be tackled together for optimal management of the computing resources. This leads to improved energy usage in a heterogeneous datacenter, including the cooling of the computer rooms. We build our approach on the concept of a heat distribution matrix to handle the mutual influence of the servers in heterogeneous environments, which is novel in this context. We propose a heuristic to solve the server placement problem, and we design a generic greedy framework for the online scheduling problem. We derive several single-objective heuristics (for performance, energy, and cooling) and a novel fuzzy-based priority mechanism to handle their tradeoffs. Finally, we present results from extensive simulations fed with actual measurements on heterogeneous servers.
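
To make the heat distribution matrix concrete, the sketch below assumes a linear recirculation model common in thermal-aware scheduling work: the inlet temperature of server i is T_sup + sum over j of D[i][j] * P[j], where D[i][j] is the temperature rise at server i's inlet per watt drawn by server j. It pairs this with one hypothetical greedy step that places a job on the server minimizing the peak inlet temperature. The model, the matrix values, and the function names are assumptions for illustration; the paper's actual formulation, cost functions, and fuzzy-based priority mechanism are not reproduced.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def inlet_temperatures(t_supply: float, D: Matrix,
                       power: Sequence[float]) -> List[float]:
    """Assumed linear model: T_in[i] = t_supply + sum_j D[i][j] * power[j]."""
    return [t_supply + sum(d * p for d, p in zip(row, power)) for row in D]


def greedy_place(t_supply: float, D: Matrix, power: Sequence[float],
                 job_power: float) -> Tuple[int, float]:
    """Hypothetical greedy step: tentatively add the job's power to each
    server and keep the server that yields the lowest peak inlet temperature."""
    best_server, best_peak = -1, float("inf")
    for s in range(len(power)):
        trial = list(power)
        trial[s] += job_power
        peak = max(inlet_temperatures(t_supply, D, trial))
        if peak < best_peak:
            best_server, best_peak = s, peak
    return best_server, best_peak


# Server 0's exhaust partly recirculates into server 2's inlet.
D = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.02, 0.0, 0.01]]  # degrees C per watt (illustrative values)
print(greedy_place(20.0, D, [100.0, 100.0, 100.0], 50.0))  # (1, 23.0)
```

Server 0 is avoided even though its own inlet is cool, because its extra heat lands on server 2; capturing that cross-server effect is the point of using a matrix rather than per-server temperatures.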

International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2014

Assessing General-Purpose Algorithms to Cope with Fail-Stop and Silent Errors

Anne Benoit; Aurélien Cavelan; Yves Robert; Hongyang Sun

In this paper, we combine traditional checkpointing and rollback recovery strategies with verification mechanisms to address both fail-stop and silent errors. The objective is to minimize either makespan or energy consumption. While DVFS is a popular approach for reducing energy consumption, using lower speeds/voltages can increase the number of errors, thereby complicating the problem. We consider an application workflow whose dependence graph is a chain of tasks, and we study three execution scenarios: (i) a single speed is used during the whole execution; (ii) a second, possibly higher speed is used for any potential re-execution; (iii) different pairs of speeds can be used throughout the execution. For each scenario, we determine the optimal checkpointing and verification locations (and the optimal speeds for the third scenario) to minimize either objective. The different execution scenarios are then assessed and compared through an extensive set of experiments.
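
For a chain of tasks, optimal checkpoint locations can be found by dynamic programming over the positions after which a checkpoint is taken. The Python sketch below shows that structure in a deliberately reduced setting: fail-stop errors only (no silent errors or verifications) and a single speed. It assumes exponentially distributed failures and uses the standard expected-time formula E(W) = e^(lam*R) * (1/lam + D) * (e^(lam*(W+C)) - 1) for a segment of work W followed by a checkpoint of cost C, with failure rate lam, recovery cost R, and downtime D. The names segment_time and optimal_checkpoints are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Tuple


def segment_time(work: float, lam: float, C: float, R: float,
                 D: float = 0.0) -> float:
    """Expected time to execute `work` followed by a checkpoint of cost C,
    under exponentially distributed fail-stop errors of rate `lam`, with
    recovery cost R and downtime D after each failure (assumed model)."""
    return math.exp(lam * R) * (1.0 / lam + D) * math.expm1(lam * (work + C))


def optimal_checkpoints(tasks: List[float], lam: float, C: float, R: float,
                        D: float = 0.0) -> Tuple[float, List[int]]:
    """Dynamic program over a chain: best[j] is the minimum expected time to
    finish the first j tasks with a checkpoint right after task j."""
    n = len(tasks)
    prefix = [0.0] + list(accumulate(tasks))
    best = [0.0] + [math.inf] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):  # previous checkpoint was taken after task i
            cand = best[i] + segment_time(prefix[j] - prefix[i], lam, C, R, D)
            if cand < best[j]:
                best[j], prev[j] = cand, i
    # Walk back through the recorded choices to list checkpoint positions.
    cuts, j = [], n
    while j > 0:
        cuts.append(j)
        j = prev[j]
    return best[n], sorted(cuts)


expected, cuts = optimal_checkpoints([10.0] * 10, lam=0.01, C=1.0, R=1.0)
print(round(expected, 1), cuts)
```

With a very small failure rate the program keeps a single final checkpoint; as the rate grows, checkpoints become more frequent because re-executing long segments gets expensive.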

International Parallel and Distributed Processing Symposium | 2008

Adaptive B-Greedy (ABG): A simple yet efficient scheduling algorithm

Hongyang Sun; Wen-Jing Hsu

To improve processor utilization on parallel systems, adaptive scheduling with parallelism feedback was recently proposed. A-Greedy, an existing adaptive scheduler, offers provably good job execution time and processor utilization. Unfortunately, it suffers from unstable feedback, and hence unnecessary processor reallocations, even when the job has constant parallelism. This problem may complicate the management of system resources. We propose a new adaptive scheduler called ABG (for Adaptive B-Greedy), which ensures both performance and stability. In a direct comparison with A-Greedy using simulated data-parallel jobs, ABG shows an average 50% reduction in wasted processor cycles and an average 20% improvement in running time. For a set of jobs, ABG also outperforms A-Greedy by 10% to 15% on average in terms of both makespan and mean response time, provided the system is not heavily loaded. Our detailed analysis shows that ABG indeed offers improved transient and steady-state behavior in terms of control-theoretic metrics. Using trim analysis, we show that ABG provides nearly linear speedup for individual jobs and good processor utilization. Using competitive analysis, we also show that ABG offers good makespan and mean response time bounds.
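
The instability attributed to A-Greedy can be reproduced in a few lines of Python. The sketch assumes A-Greedy's multiplicative rule (grow the request by a factor rho after an efficient, satisfied quantum; shrink it by rho after an inefficient one), a job with constant parallelism, and a system that always grants the full request, so the "deprived" case never arises. For contrast it includes a hypothetical first-order smoothing rule that converges to the job's parallelism; that rule only illustrates stable feedback and is not ABG's actual controller. Parameter values are arbitrary.

```python
from typing import List


def a_greedy_desires(parallelism: float, quanta: int, rho: float = 2.0,
                     delta: float = 0.8, d0: float = 1.0) -> List[float]:
    """A-Greedy-style multiplicative feedback for a job with constant
    parallelism that always receives its full request (never deprived).

    A quantum is 'efficient' if the job uses at least a fraction `delta` of
    its allotted processor cycles; the request grows by `rho` after an
    efficient quantum and shrinks by `rho` after an inefficient one.
    """
    d, history = d0, [d0]
    for _ in range(quanta):
        utilization = min(d, parallelism) / d
        d = d * rho if utilization >= delta else d / rho
        history.append(d)
    return history


def smoothed_desires(parallelism: float, quanta: int, r: float = 0.5,
                     d0: float = 1.0) -> List[float]:
    """Hypothetical first-order smoothing: each quantum, move the request
    part of the way toward the measured average parallelism."""
    d, history = d0, [d0]
    for _ in range(quanta):
        d = r * d + (1 - r) * parallelism
        history.append(d)
    return history


print(a_greedy_desires(10, 8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 8.0, 16.0, 8.0, 16.0]
print(smoothed_desires(10, 8)[-1])  # 9.96484375
```

With parallelism 10, the A-Greedy-style request settles into an oscillation between 8 and 16 processors, so the job is reallocated every quantum, while the smoothed request approaches 10 and stays there.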

Cluster Computing and the Grid | 2014

Multi-objective Scheduling for Heterogeneous Server Systems with Machine Placement

Hongyang Sun; Patricia Stolf; Jean-Marc Pierson; Georges Da Costa

Heterogeneous servers are becoming prevalent in many high-performance computing environments, including clusters and data centers. In this paper, we consider multi-objective scheduling for heterogeneous server systems to simultaneously optimize application performance, energy consumption, and thermal imbalance. First, a greedy online framework is presented that allows scheduling decisions to be made based on any well-defined cost function. To tackle the possibly conflicting objectives, we propose a fuzzy-based priority approach for exploring the tradeoffs among two or more objectives at the same time. Moreover, we present a heuristic algorithm for the static placement of physical machines to reduce the maximum temperature at the server outlets. Extensive simulations based on an emerging class of high-density server systems have demonstrated the effectiveness of our proposed approach and heuristics in optimizing multiple objectives while achieving better thermal balance.

International Parallel and Distributed Processing Symposium | 2013

Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality

Zhihui Du; Hongyang Sun; Yuxiong He; Yu He; David A. Bader; Huazhe Zhang

High response quality is critical for many best-effort interactive services, and at the same time, reducing energy consumption can directly reduce the operational cost of service providers. In this paper, we study the quality-energy tradeoff for such services using a composite performance metric that captures their relative importance in practice: service providers usually give top priority to quality guarantees and pursue energy savings second. We consider scheduling on multicore systems with core-level DVFS support and a power budget. Our solution consists of two steps. First, we employ an equal sharing principle for both job and power distribution. Specifically, we present a “Cumulative Round-Robin” policy to distribute the jobs onto the cores, and a “Water-Filling” policy to distribute the power dynamically among the cores. Second, we exploit the concave quality function of many best-effort applications and develop Online-QE, a myopic optimal online algorithm for scheduling jobs on a single-core system. Combining the two steps, we present a heuristic online algorithm, called DES (Dynamic Equal Sharing), for scheduling best-effort interactive services on multicore systems. The simulation results based on a web search engine application show that DES takes advantage of the core-level DVFS architecture and exploits the concave quality function of best-effort applications to achieve high service quality with low energy consumption.
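
The abstract does not spell out the Water-Filling policy, so the sketch below is an assumption: it treats power distribution as classic water-filling, where each core receives the smaller of its demand and a common level, and the level is raised until the power budget is used up. Per-core demands stand in for whatever power each core could usefully consume at its highest DVFS setting; water_fill is a hypothetical name, and Cumulative Round-Robin and Online-QE are not modeled.

```python
from typing import List


def water_fill(demands: List[float], budget: float,
               iterations: int = 100) -> List[float]:
    """Split a power budget so that each core receives min(demand, level),
    where the common 'water level' is chosen so the total equals the budget.
    If the budget covers every demand, each core simply gets its demand."""
    if sum(demands) <= budget:
        return list(demands)
    lo, hi = 0.0, max(demands)
    # Bisection on the water level: total allocation is nondecreasing in it.
    for _ in range(iterations):
        level = (lo + hi) / 2
        if sum(min(d, level) for d in demands) > budget:
            hi = level
        else:
            lo = level
    return [min(d, lo) for d in demands]


# A lightly loaded core keeps its 10 W; the two busy cores split the rest.
print([round(p, 6) for p in water_fill([10.0, 40.0, 40.0], 60.0)])  # [10.0, 25.0, 25.0]
```

A fixed number of bisection steps is used instead of a tolerance test so the loop always terminates, even for large power values where floating-point spacing exceeds a small tolerance.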

International Parallel and Distributed Processing Symposium | 2016

Optimal Resilience Patterns to Cope with Fail-Stop and Silent Errors

Anne Benoit; Aurélien Cavelan; Yves Robert; Hongyang Sun

This work focuses on resilience techniques at extreme scale. Many papers deal with fail-stop errors. Many others deal with silent errors (or silent data corruptions). But very few papers deal with fail-stop and silent errors simultaneously, even though HPC applications will clearly have to cope with both error sources. This paper presents a unified framework and optimal algorithmic solutions to this double challenge. Silent errors are handled via verification mechanisms (either partially or fully accurate) and in-memory checkpoints. Fail-stop errors are processed via disk checkpoints. All verification and checkpoint types are combined into computational patterns. We provide a unified model and a full characterization of the optimal pattern. Our results nicely extend several published solutions and demonstrate how to make use of different techniques to solve the double threat of fail-stop and silent errors. Extensive simulations based on real data confirm the accuracy of the model and show that patterns combining all resilience mechanisms are required to provide acceptable overheads.
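
For the simplest pattern in this family (work W, one guaranteed verification of cost V, one checkpoint of cost C), a first-order analysis that drops lower-order terms such as recovery cost gives the overhead H(W) = (V + C)/W + (lam_f/2 + lam_s) * W, where lam_f and lam_s are the fail-stop and silent-error rates: a fail-stop error loses W/2 of work on average, while a silent error is caught only at the verification and loses all of W. Setting dH/dW = 0 yields W* = sqrt((V + C)/(lam_f/2 + lam_s)), which reduces to the Young/Daly period sqrt(2C/lam_f) when there are no silent errors. The Python sketch below evaluates this formula; it is a baseline under these assumptions, not the paper's full characterization with partial verifications and both in-memory and disk checkpoints, and the parameter values are illustrative.

```python
import math


def pattern_overhead(W: float, C: float, V: float,
                     lam_f: float, lam_s: float) -> float:
    """First-order overhead of the pattern: work W, then verification V,
    then checkpoint C, with fail-stop rate lam_f and silent-error rate lam_s."""
    return (V + C) / W + (lam_f / 2 + lam_s) * W


def optimal_work(C: float, V: float, lam_f: float, lam_s: float) -> float:
    """Minimizer of pattern_overhead: W* = sqrt((V + C) / (lam_f/2 + lam_s))."""
    return math.sqrt((V + C) / (lam_f / 2 + lam_s))


W_star = optimal_work(C=60.0, V=10.0, lam_f=1e-5, lam_s=2e-5)
print(round(W_star), round(pattern_overhead(W_star, 60.0, 10.0, 1e-5, 2e-5), 4))
```

Because the formula holds only to first order, it is most reliable when W* is small compared with the mean time between errors.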

International Conference on Parallel Processing | 2015

Assessing the Impact of Partial Verifications against Silent Data Corruptions

Aurélien Cavelan; Saurabh Kumar Raina; Yves Robert; Hongyang Sun

Silent errors, or silent data corruptions, constitute a major threat to very large-scale platforms. When a silent error strikes, it is not detected immediately but only after some delay, which prevents the use of pure periodic checkpointing approaches devised for fail-stop errors. Instead, checkpointing must be coupled with some verification mechanism to guarantee that corrupted data will never be written into the checkpoint file. Such a guaranteed verification mechanism typically incurs a high cost. In this paper, we assess the impact of using partial verification mechanisms in addition to a guaranteed verification. The main objective is to investigate to what extent it is worthwhile to use some low-cost but less accurate verifications in the middle of a periodic computing pattern, which ends with a guaranteed verification right before each checkpoint. Introducing partial verifications dramatically complicates the analysis, but we are able to analytically determine the optimal computing pattern (up to the first-order approximation), including the optimal length of the pattern, the optimal number of partial verifications, and their optimal positions inside the pattern. Performance evaluations based on a wide range of parameters confirm the benefit of using partial verifications in certain scenarios, compared to the baseline algorithm that uses only guaranteed verifications.