Kan Hu

Huazhong University of Science and Technology

Publication

Featured researches published by Kan Hu.

International Conference on Parallel and Distributed Systems | 2011

Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs

Xiaowen Feng; Hai Jin; Ran Zheng; Kan Hu; Jingxiang Zeng; Zhiyuan Shao

Sparse matrix-vector multiplication (SpMV) is one of the most significant yet challenging problems in computational science. It is a memory-bound application whose performance depends mostly on the input matrix and the underlying architecture. Many researchers have explored a variety of optimization techniques for SpMV. One of the most promising directions is adapting the storage format to suit the underlying architecture. Alternative storage formats can greatly lessen memory pressure; however, the computational resources are often underutilized. Therefore, a new storage format, called Compressed Sparse Row with Segmented Interleave Combination (SIC), is proposed. Stemming from the Compressed Sparse Row (CSR) format, the SIC format employs an interleave combination pattern that combines a certain number of CSR rows to form a new SIC row. To further improve performance, segmented processing is also introduced. Based on empirical data, we also develop an automatic SIC-based SpMV suitable for all matrices. Experimental results show that our approach outperforms the NVIDIA CSR vector kernel, achieving up to 12.6× speedup. It also demonstrates performance comparable to the Hybrid format, with a speedup of up to 2.89×.
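
For readers unfamiliar with the CSR baseline that SIC stems from, a minimal sketch of plain CSR SpMV follows. It is not the SIC format, whose layout details the abstract does not give; the function and variable names (`csr_spmv`, `values`, `col_idx`, `row_ptr`) are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix (hypothetical data):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 1.0, 1.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

Rows in CSR can differ widely in length, which is one reason a row-per-thread or row-per-warp GPU mapping can leave compute resources idle.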

ACS/IEEE International Conference on Computer Systems and Applications | 2010

Adaptive audio-aware scheduling in Xen virtual environment

Huacai Chen; Hai Jin; Kan Hu; Minhao Yuan

With the development of client virtualization technology, running soft real-time applications in virtual environments has become an important trend. Currently, most schedulers in the VMM (virtual machine monitor) take fair sharing of processor resources and load balancing as their main concerns, while paying less attention to application diversity and I/O responsiveness. As a result, they cannot meet the requirements of latency-sensitive tasks such as audio applications. An audio stream may suffer from severe input buffer overrun or output buffer underrun, especially when there is no real-time guarantee on virtualized clients under heavy load. In this paper, we present experiments illustrating that the current scheduler in Xen does a poor job of guaranteeing fluent audio playback, and then formulate the fluent-playback conditions a scheduler should satisfy. A scheduling strategy with soft real-time support is proposed to improve the responsiveness of latency-sensitive guests. To implement our proposal, we extend the Credit scheduler with flexible time slices and real-time priority. Our solution is audio-aware and capable of adjusting the real-time priority of guest domains adaptively, achieving a better experience for end users. The experimental results show that audio glitches can be completely eliminated by our extended scheduler even when the system load is very high.

International Conference on Parallel Processing | 2010

Dynamic Switching-Frequency Scaling: Scheduling Overcommitted Domains in Xen VMM

Huacai Chen; Hai Jin; Kan Hu; Jian Huang

Virtualization enables multiple guest operating systems to run on a single physical platform. These virtual machines may host any type of application, including concurrent HPC programs. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains and rarely consider VCPUs’ behaviors. However, this can result in poor application performance in overcommitted domains if concurrent programs are hosted in them. In this paper we review the properties of both Xen’s Credit and SEDF schedulers, and show how these schedulers may seriously impact the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains. We discuss the origin of the problem theoretically, and confirm the derived conclusion on benchmarks. A novel approach, which dynamically scales the context-switching frequency by selecting variable time slices according to VCPUs’ behaviors, is then proposed to make the Credit scheduler more adaptive for concurrent applications. The experimental results show that this extended Credit scheduler can improve the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains to the same magnitude as in undercommitted domains.

IEEE International Conference on Cloud Computing Technology and Science | 2010

Affinity-aware Proportional Share Scheduling for Virtual Machine System

Huacai Chen; Hai Jin; Kan Hu

VM (virtual machine) scheduling is a fundamental topic in virtualization, and fairness is one of its important design goals. Most VMMs (virtual machine monitors) provide PS (proportional share) schedulers. A PS scheduler assigns a weight to every VM to declare its computational resource requirement, and each VM is allocated CPU cycles in proportion to its weight. CPU-affinity is a property of a VCPU (virtual CPU) that describes which PCPUs (physical CPUs) it can run on. However, the current definition of weight does not work well with CPU-affinity: it behaves as if all PCPUs were available to all VMs. We expose the issue using Xen’s Credit scheduler, and show that CPU cycles can be fairly allocated to VMs in the default free-mapping case (no affinity restriction), but not in restricted-mapping cases (when affinities are configured). This fairness issue makes it necessary to extend the meaning of weight so that it reflects the resource requirements in all cases. In this paper we reconcile the meaning of a domain’s weight with CPU-affinity, and improve the Credit scheduler into an affinity-aware one. Experimental results show that our affinity-aware proportional share scheduler (named the Credit-APS scheduler) achieves good fairness in both free-mapping and restricted-mapping cases.

Keywords: Xen, Proportional Share, Credit Scheduler, Free-mapping, Restricted-mapping, APS
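
The mismatch between weight and CPU-affinity can be illustrated with a small sketch. It only computes weight-proportional shares and checks them against a pinning; it is not the Credit-APS algorithm, and the VM names, weights, and helper functions are hypothetical.

```python
def ideal_shares(weights, n_pcpus):
    """Weight-proportional share of each VM, in units of whole PCPUs."""
    total = sum(weights.values())
    return {vm: n_pcpus * w / total for vm, w in weights.items()}


def pinned_demand(shares, affinity, pcpu):
    """Sum of the shares of VMs that may run only on `pcpu`."""
    return sum(s for vm, s in shares.items() if affinity[vm] == {pcpu})


# Hypothetical setup: 2 PCPUs, three single-VCPU VMs.
weights = {"A": 256, "B": 256, "C": 512}
affinity = {"A": {0}, "B": {0, 1}, "C": {0}}  # A and C are pinned to PCPU 0

shares = ideal_shares(weights, n_pcpus=2)
demand_on_pcpu0 = pinned_demand(shares, affinity, 0)
# shares assume all PCPUs are usable by every VM, yet the VMs pinned to
# PCPU 0 are together entitled to more than the one PCPU they can use.
```

In this example the pinned VMs are entitled to 1.5 PCPUs in total but can only ever receive 1, so weight alone no longer describes what each VM gets.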

Parallel and Distributed Computing: Applications and Technologies | 2010

XenHVMAcct: Accurate CPU Time Accounting for Hardware-Assisted Virtual Machine

Huacai Chen; Hai Jin; Kan Hu

CPU time accounting is a basis of performance measurement and process scheduling in operating systems. Accounting operations are traditionally performed in the timer interrupt handler, since the timer interrupt is periodically delivered to the OS. However, when virtualization is introduced, CPU time is shared by multiple virtual CPUs (VCPUs) and the virtual timer interrupt is paused for those that are scheduled out. This makes time accounting inaccurate, so a new method is needed to give VMs a stable and reliable data source, especially for hardware-assisted virtual machines (HVMs), which are not aware of the VMM. The key to accurate CPU time accounting is distinguishing the time allocated to “this VCPU” from the time allocated to “other VCPUs”. Para-virtualization (PV) achieves this goal by modifying the timer handling routines. For HVM, we propose an accurate accounting method (named XenHVMAcct) within the Xen virtual platform. XenHVMAcct is designed using the mechanisms of virtual interrupts and loadable kernel modules, without direct modifications to the guest OS. Experimental results show that our accounting method is as accurate as the PV solution.
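
The distinction between time given to “this VCPU” and time given to “other VCPUs” can be sketched as simple interval arithmetic. This is only an illustration of the idea, not the XenHVMAcct mechanism; the function name, the millisecond units, and the assumption that run intervals do not overlap are all hypothetical.

```python
def split_time(wall_start, wall_end, run_intervals):
    """Split a wall-clock window into time this VCPU actually ran
    and time that went elsewhere (other VCPUs or idle).

    run_intervals: non-overlapping (start, end) pairs during which
    the VCPU was scheduled on a physical CPU.
    """
    ran = 0
    for start, end in run_intervals:
        # Clip each run interval to the accounting window.
        lo, hi = max(start, wall_start), min(end, wall_end)
        if hi > lo:
            ran += hi - lo
    return ran, (wall_end - wall_start) - ran


# Hypothetical trace: a 100 ms window in which the VCPU ran twice.
ran_ms, other_ms = split_time(0, 100, [(0, 30), (60, 90)])
```

A guest that charges the whole 100 ms window to its running process would overcount by the second value.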

Archive | 2011

High-Quality Sound Device Virtualization in Xen Para-Virtualized Environment

Kan Hu; Wenbo Zhou; Hai Jin; Zhiyuan Shao; Huacai Chen

I/O virtualization plays an important role in client virtualization, and audio performance directly impacts user experience. Solutions for I/O virtualization include full device emulation, the split driver model, and direct I/O. Full device emulation supports sound device virtualization but suffers under heavy load. The split driver model has better performance but does not support sound devices. Direct I/O has the best I/O performance, close to native, but it sacrifices fault isolation and device transparency. To improve the user audio experience in virtual machines, a method is proposed to implement high-quality sound device virtualization based on the split driver model. Built on top of Xen, both the frontend and backend drivers of the sound device are provided. The test results show that sound device virtualization based on the split driver model enhances the user audio experience in the Xen virtual environment.

2010 Proceedings of the 5th International Conference on Ubiquitous Information Technologies and Applications | 2010

XenMVM: Exploring Potential Performance of Multi-Core System in Virtual Machine Environment

Zhiyuan Shao; Jian Huang; Hai Jin; Kan Hu

In this work, we propose a computing resource management system based on the Xen VMM and multi-core systems, called XenMVM. It adjusts computing resources dynamically according to the actual workload generated by the applications running in the virtual machine, in order to improve the resource utilization of the computer system. Based on the shared L2 cache architecture of multi-core systems, we propose Underlying Layout Aware Scheduling (ULAS), which schedules the virtual CPUs of virtual machines onto appropriate physical CPUs. The test results show that ULAS can improve performance by about 4.5% to 32.52%. Furthermore, to improve the overall performance of multiple virtual machines and meet the needs of specific jobs (i.e., urgent and I/O-intensive ones) deployed in the virtual machines, a simplified method called domain-based static priority is adopted in XenMVM. Using case studies, we show that our proposal reduces the turnaround time of the whole system by 20.45% compared with the FCFS scheduling scheme.

Parallel, Distributed and Network-Based Processing | 2010

FTDS: Adjusting Virtual Computing Resources in Threshing Cases

Jian Huang; Hai Jin; Kan Hu; Zhiyuan Shao

In a virtual execution environment, dynamic computing resource adjustment, which configures the computing resources of virtual machines automatically according to the actual load generated by applications, is often adopted in the virtual machine monitor to improve resource utilization. Traditionally, the simple Additive Increase Subtractive Decrease (AISD) scheme is used as the adjusting rule. However, in some special situations, for example when compiling a kernel in a virtual machine, the configuration of virtual machines may change abruptly because the workload fluctuates violently within a short interval, and this threshing inevitably results in additional overhead under the AISD rule. In this paper, we extend the Proportional-Integral-Derivative (PID) algorithm and present a feedback control model for configuring virtual computing resources, and propose an innovative adjusting scheme called Forecasting and Time Delayed Subtraction (FTDS) to reduce the overhead caused by threshing. FTDS uses both the statistical history of resource requests and the current utilization of computing resources to predict whether threshing will happen and to determine when and by how much to adjust the number of virtual CPUs. Experimental results show that FTDS effectively reduces the jitter that occurs during adjustment and decreases the performance penalty of threshing from 9% to 0.3% compared with AISD, while keeping the penalty in non-threshing cases the same as with AISD.
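
The threshing problem under AISD can be seen in a toy simulation. The sketch below compares a one-step-per-sample AISD rule with a simplified rule that delays subtraction until demand has stayed low for several consecutive samples. The second rule only illustrates the "time delayed subtraction" idea; it omits the forecasting and PID feedback parts of FTDS, and the function names, the `delay` parameter, and the demand trace are assumptions.

```python
def aisd(demands, vcpus=1, max_vcpus=8):
    """Additive increase, subtractive decrease: move one VCPU per sample."""
    changes = 0
    for d in demands:
        if d > vcpus and vcpus < max_vcpus:
            vcpus += 1
            changes += 1
        elif d < vcpus and vcpus > 1:
            vcpus -= 1
            changes += 1
    return vcpus, changes


def delayed_subtraction(demands, vcpus=1, max_vcpus=8, delay=3):
    """Increase immediately, but remove a VCPU only after `delay`
    consecutive low-demand samples."""
    changes = 0
    low_streak = 0
    for d in demands:
        if d > vcpus and vcpus < max_vcpus:
            vcpus += 1
            changes += 1
            low_streak = 0
        elif d < vcpus and vcpus > 1:
            low_streak += 1
            if low_streak >= delay:
                vcpus -= 1
                changes += 1
                low_streak = 0
        else:
            low_streak = 0
    return vcpus, changes


# Hypothetical oscillating demand (in VCPUs), as during a bursty build.
demands = [2, 1, 2, 1, 2, 1, 2, 1]
aisd_result = aisd(demands)
delayed_result = delayed_subtraction(demands)
```

On this trace the AISD rule reconfigures on every sample, while the delayed rule adds one VCPU and then holds steady, at the cost of keeping a VCPU that is idle half the time.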

Future Generation Computer Systems | 2013