Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaotao Chang is active.

Publication


Featured research published by Xiaotao Chang.


Computing Frontiers | 2014

Enabling FPGAs in the cloud

Fei Chen; Yi Shan; Yu Zhang; Yu Wang; Hubertus Franke; Xiaotao Chang; Kun Wang

Cloud computing is becoming a major trend for delivering and accessing infrastructure on demand via the network. Meanwhile, the use of FPGAs (Field-Programmable Gate Arrays) for computation acceleration has made significant inroads into multiple application domains due to their ability to achieve high throughput and predictable latency while providing programmability, low power consumption, and fast time-to-value. Many types of workloads, e.g., databases, big data analytics, and high-performance computing, can be and have been accelerated by FPGAs. As more and more workloads are deployed in the cloud, it is appropriate to consider how to make FPGAs and their capabilities available there. However, such integration is non-trivial due to issues related to FPGA resource abstraction and sharing, compatibility with applications and accelerator logic, and security, among others. In this paper, a general framework for integrating FPGAs into the cloud is proposed, and a prototype of the framework is implemented based on OpenStack, Linux-KVM, and Xilinx FPGAs. The prototype enables isolation between multiple processes in multiple VMs, precise quantitative allocation of acceleration resources, and priority-based workload scheduling. Experimental results demonstrate the effectiveness of this prototype, acceptable overhead, and good scalability when hosting multiple VMs and processes.
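The paper's prototype is built on OpenStack, Linux-KVM, and Xilinx FPGAs; as a minimal illustrative sketch (not the authors' implementation), priority-based scheduling of a fixed pool of FPGA accelerator slots across VMs might look like the following, where the names `Request` and `FpgaSlotScheduler` are hypothetical:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: int                          # lower value = higher priority
    vm_id: str = field(compare=False)
    job: str = field(compare=False)

class FpgaSlotScheduler:
    """Toy priority-based scheduler for a fixed pool of FPGA slots.

    Illustrative only: a hypothetical stand-in for the quantitative
    allocation and priority scheduling described in the abstract.
    """
    def __init__(self, num_slots):
        self.free_slots = list(range(num_slots))
        self.pending = []                  # min-heap ordered by priority

    def submit(self, req):
        """Queue a request; return any (slot, vm, job) launches it triggers."""
        heapq.heappush(self.pending, req)
        return self._dispatch()

    def release(self, slot):
        """Return a slot to the pool and dispatch waiting requests."""
        self.free_slots.append(slot)
        return self._dispatch()

    def _dispatch(self):
        started = []
        while self.free_slots and self.pending:
            req = heapq.heappop(self.pending)  # highest-priority waiter
            slot = self.free_slots.pop()
            started.append((slot, req.vm_id, req.job))
        return started
```

With one slot, a later high-priority request from one VM is served before an earlier low-priority request from another as soon as the slot frees up.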


High-Performance Computer Architecture | 2011

Efficient data streaming with on-chip accelerators: Opportunities and challenges

Rui Hou; Lixin Zhang; Michael C. Huang; Kun Wang; Hubertus Franke; Yi Ge; Xiaotao Chang

The transistor density of microprocessors continues to increase as technology scales. Microprocessor designers have taken advantage of the additional transistors by integrating a significant number of cores onto a single die. However, large core counts meet diminishing returns due to software and hardware scalability issues, so designers have started integrating on-chip special-purpose logic units (i.e., accelerators) that were previously available only as PCI-attached units. It is anticipated that more accelerators will be integrated on-chip due to the increasing abundance of transistors and the fact that not all logic can be powered at all times under power budget limits. Thus, on-chip accelerator architectures deserve more attention from the research community, and there is a wide spectrum of research opportunities in the design and optimization of accelerators. This paper studies the data access streams of on-chip accelerators to bring out insights that can foster future research in this area. Specifically, it uses a few simple case studies to show some common characteristics of the data streams introduced by on-chip accelerators, discusses challenges and opportunities in exploiting these characteristics to optimize the power and performance of accelerators, and then analyzes the effectiveness of some simple proposed optimizing extensions.


International Symposium on Computer Architecture | 2013

Improving virtualization in the presence of software managed translation lookaside buffers

Xiaotao Chang; Hubertus Franke; Yi Ge; Tao Liu; Kun Wang; Jimi Xenidis; Fei Chen; Yu Zhang

Virtualization has become an important technology used across many platforms, particularly servers, to increase utilization, multi-tenancy, and security. Virtualization introduces additional overhead that often relates to memory management, interrupt handling, and hypervisor mode switching. Among those, memory management and translation lookaside buffer (TLB) management have been shown to have a significant impact on system performance. Two principal mechanisms for TLB management exist in today's systems, namely software- and hardware-managed TLBs. In this paper, we analyze and quantify the overhead of pure software virtualization implemented over a software-managed TLB. We then describe our design of hardware extensions to support virtualization in systems with software-managed TLBs, removing the most dominant overheads. These extensions were implemented in the Power embedded A2 core, which is used in the PowerEN and Blue Gene/Q processors, and were used to implement a KVM port. We evaluate each of these hardware extensions to determine its overall contribution to performance and efficiency. Collectively, the extensions demonstrate an average improvement of 232% over a pure software implementation.


Distributed Event-Based Systems | 2012

Pub/Sub on stream: a multi-core based message broker with QoS support

Zhaoran Wang; Yu Zhang; Xiaotao Chang; Xiang Mi; Yu Wang; Kun Wang; Huazhong Yang

Publish/Subscribe (Pub/Sub) is becoming an increasingly popular message delivery technique in the Internet of Things (IoT) era. However, classical Pub/Sub is not suitable for some emerging IoT applications, such as smart grid, transportation, and sensor/actuator applications, due to its lack of QoS capability. To meet the QoS requirements of IoT message delivery, in this paper we propose the first Pub/Sub message broker with the ability to actively schedule computation resources to guarantee QoS requirements. We abstract the message matching algorithm into a task graph that expresses the data flow, forming a task-based stream matching framework. On top of this framework, we develop a message dispatching algorithm called Smart Dispatch and a task scheduling algorithm called DFGS to guarantee different QoS requirements. Experiments show that the QoS-aware system can sustain more than 10x the throughput of QoS-ignorant systems in representative smart grid cases. Our system also shows near-linear scalability on a commodity multi-core machine.
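The core idea of QoS-aware dispatch can be illustrated with a toy broker that serves urgent QoS classes before best-effort traffic, FIFO within a class. This is a hedged sketch, not the paper's Smart Dispatch or DFGS algorithms; `QosDispatcher` is a hypothetical name:

```python
import heapq
import itertools

class QosDispatcher:
    """Toy QoS-aware dispatcher: messages in a more urgent QoS class
    (lower number) are matched first; ties within a class are FIFO.
    Illustrative sketch only, not the paper's broker.
    """
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()      # FIFO tie-breaker within a class

    def publish(self, qos_class, message):
        # qos_class: 0 = most urgent (e.g., a smart-grid control message)
        heapq.heappush(self._heap, (qos_class, next(self._seq), message))

    def next_message(self):
        """Pop the most urgent pending message, or None if idle."""
        if not self._heap:
            return None
        _, _, message = heapq.heappop(self._heap)
        return message
```

A real broker would additionally schedule the matching tasks themselves across cores, which is the part the paper's task-graph framework addresses.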


Design, Automation, and Test in Europe | 2011

Optimization of stateful hardware acceleration in hybrid architectures

Xiaotao Chang; Yike Ma; Hubertus Franke; Kun Wang; Rui Hou; Hao Yu; Terry Nelms

In many computing domains, hardware accelerators can improve throughput and lower power consumption compared with executing functionally equivalent software on general-purpose microprocessor cores. While hardware accelerators are often stateless, network processing exemplifies the need for stateful hardware acceleration. The packet-oriented streaming nature of current networks enables data processing as soon as packets arrive rather than when the data of the whole network flow is available. Because many flows are concurrent, an accelerator must maintain and switch contexts between the states of the various accelerated streams embodied in those flows, which increases the overhead associated with acceleration. We propose and evaluate dynamic reordering of requests from different accelerated streams in a hybrid on-chip/memory-based request queue in order to reduce this overhead.
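The benefit of reordering can be seen in a minimal sketch, assuming requests tagged with a stream ID: grouping queued requests by stream keeps consecutive requests in the same context, so fewer context switches are needed. The helper names below are hypothetical, and the paper's hybrid on-chip/memory queue is far more involved:

```python
def reorder_by_stream(requests):
    """Group queued accelerator requests by stream ID, preserving both the
    first-arrival order of streams and the per-stream request order.

    requests: iterable of (stream_id, payload) tuples.
    """
    buckets = {}                       # insertion-ordered (Python 3.7+)
    for stream_id, payload in requests:
        buckets.setdefault(stream_id, []).append(payload)
    return [(sid, p) for sid, payloads in buckets.items() for p in payloads]

def count_context_switches(requests):
    """Count context loads, including the very first one."""
    switches = 0
    prev = object()                    # sentinel distinct from any stream ID
    for stream_id, _ in requests:
        if stream_id != prev:
            switches += 1
            prev = stream_id
    return switches
```

Note the trade-off the paper grapples with: reordering preserves ordering within each stream but delays some streams' requests, so a real design must bound how long any request waits.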


Archive | 2013

METHOD AND SYSTEM FOR ALLOCATING FPGA RESOURCES

Xiaotao Chang; Fei Chen; Kun Wang; Yu Zhang; Jia Zou


Archive | 2011

Method and device for controlling lock allocation

Hongbo Zeng; Xiaotao Chang; Zhenbo Zhu; Yudong Yang; Rui Hou


Archive | 2011

Processing unit, chip, calculation equipment and method for expedited data transmission

Rui Hou; Yu Zhang; Kun Wang; Xiaotao Chang; Wei Liu


Archive | 2010

Simulator and simulation method for running programs of client in host computer

Xiaotao Chang; Huayong Wang; Kun Wang; Yu Zhang


Archive | 2009

Simulation method and simulation device for dynamically switching simulation modes

Xiaotao Chang; Kun Wang; Rui Hou; Yu Zhang

Collaboration


Dive into Xiaotao Chang's collaborations.
