Chenchen Fu
City University of Hong Kong
Publications
Featured research published by Chenchen Fu.
International Symposium on Low Power Electronics and Design | 2014
Chenchen Fu; Mengying Zhao; Chun Jason Xue; Alex Orailoglu
Energy consumption of memories is always a significant issue for computing systems. Recently, hybrid PRAM and DRAM memory architectures have been proposed. These architectures combine the advantages of DRAM and PRAM, such as the low leakage power of PRAM and the short write latency of DRAM. However, the leakage power of DRAM is still considerable in hybrid memories, and it can only be reduced by putting DRAM into a sleep state. In this paper, a novel proximity concept is proposed to guide variable partitioning so as to maximize the opportunity to put DRAM into sleep mode. A novel Sleep-Aware Variable Partition Algorithm (SAVPA) is then proposed with the objective of maximizing the sleep time of DRAM while satisfying the performance and endurance constraints. The experimental results show that SAVPA reduces energy consumption by 11.25% on average (up to 15.84%) compared to the state-of-the-art work with a simple sleep technique.
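The partitioning decision described above can be illustrated with a minimal sketch. This is not the paper's SAVPA algorithm (which is proximity-guided); it only shows the basic placement trade-off, with all names and the threshold as illustrative assumptions:

```python
# Hypothetical sketch: place write-heavy variables in DRAM (short write
# latency) and read-mostly variables in PRAM (low leakage), so DRAM
# accesses cluster and DRAM can sleep longer. Threshold is illustrative.

def partition_variables(access_counts, write_ratio_threshold=0.5):
    """access_counts: {var: (reads, writes)} -> (pram_vars, dram_vars)."""
    pram, dram = [], []
    for var, (reads, writes) in access_counts.items():
        total = reads + writes
        if total and writes / total > write_ratio_threshold:
            dram.append(var)   # write-intensive: avoid PRAM's slow writes
        else:
            pram.append(var)   # read-mostly: benefit from PRAM's low leakage
    return sorted(pram), sorted(dram)

counts = {"a": (90, 10), "b": (5, 95), "c": (50, 50)}
print(partition_variables(counts))  # (['a', 'c'], ['b'])
```

A proximity-guided scheme would additionally consider *when* each variable is accessed, so that DRAM accesses fall into a few dense windows rather than being spread out.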
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2014
Keni Qiu; Mengying Zhao; Qingan Li; Chenchen Fu; Chun Jason Xue
Recently, a hybrid cache architecture consisting of both spin-transfer torque RAM (STT-RAM) and SRAM has been proposed for energy efficiency. In hybrid caches, migration-based techniques have been proposed: a migration technique dynamically moves write-intensive and read-intensive data between STT-RAM and SRAM to exploit the advantages of the hybrid cache. However, migrations also introduce extra reads and writes during data movement. For stencil loops with read and write data dependencies, we observe that migration overhead is significant, and that migrations closely correlate with the interleaved read and write memory access pattern within a memory block. This paper proposes a loop retiming framework, applied during compilation, to reduce the migration overhead by changing the interleaved memory access pattern. With the proposed loop retiming technique, interleaved memory accesses can be significantly reduced, so migration overhead is mitigated and the energy efficiency of the hybrid cache is significantly improved. The experimental results show that, with the proposed methods, the number of migrations is reduced by up to 27.1% and the cache dynamic energy by up to 14.0% on average.
Design, Automation and Test in Europe | 2015
Chenchen Fu; Yingchao Zhao; Minming Li; Chun Jason Xue
Reducing energy consumption is a critical problem in most computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into a sleep state when all the cores are idle. Based on this idea, this work presents a systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed under the assumption that the number of cores is unbounded. When the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% memory energy compared to a conventional multi-core scheduling scheme.
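The core idea of maximizing common idle time can be sketched for the simplest case. This assumes one task per core with a common period and is only a toy model of the unbounded-cores setting, not the paper's optimal scheme:

```python
# Hedged sketch: if every core starts its task at t=0, all cores are
# simultaneously idle from max(exec_times) until the period ends, and
# the shared memory can sleep for that whole common window.

def common_idle(exec_times, period):
    busy = max(exec_times)        # all cores aligned at t=0
    return max(0, period - busy)  # length of the shared sleep window

print(common_idle([3, 5, 2], 10))  # 5
```

Staggering the same tasks instead (e.g., starting them at different times) shrinks or destroys the common window even though per-core utilization is unchanged, which is why alignment is the lever here.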
Application-Specific Systems, Architectures and Processors | 2013
Keni Qiu; Mengying Zhao; Chenchen Fu; Liang Shi; Chun Jason Xue
In a hybrid cache architecture consisting of both STT-RAM and SRAM, migration-based techniques have been proposed. The migration technique dynamically moves write-intensive and read-intensive data between STT-RAM and SRAM to exploit the advantage of the hybrid cache. However, migrations induce extra read and write overhead during data movement. For loops with intensive data-array operations, we observe that the migration overhead is significant and that migrations closely correlate with the interleaved read and write access pattern within a memory block. This paper proposes a loop retiming framework to reduce the migration overhead by changing the interleaved memory access pattern. The experimental results show that, with the proposed method, migrations are significantly reduced without any hardware modification. As a result, the energy efficiency and performance of the hybrid cache can be improved.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2017
Mengying Zhao; Chenchen Fu; Zewei Li; Qingan Li; Mimi Xie; Yongpan Liu; Jingtong Hu; Zhiping Jia; Chun Jason Xue
Wearable devices are gaining popularity since they can collect important information for healthcare and well-being purposes. Compared with batteries, energy harvesting is a better power source for these wearable devices due to many advantages. However, harvested energy is inherently unstable, and program execution is interrupted frequently. Nonvolatile processors demonstrate promising advantages for backing up volatile state before the system's energy is depleted; however, they also introduce non-negligible energy and area overhead. In this paper, we aim to reduce the amount of data that needs to be backed up during a power failure. Based on the observation that the stack size varies along program execution, we propose to analyze the application program and identify efficient backup positions, by which the stack content to back up can be significantly reduced. The evaluation results show an average 45.7% reduction in nonvolatile stack size for stack backup, with 0.58% storage overhead. Meanwhile, with the proposed schemes, energy utilization and program forward progress can be greatly improved compared with instant backup.
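The backup-position idea can be sketched with a toy model: instead of backing up immediately when power fails (instant backup), run a little further to a program point where the stack happens to be small. The function name and the window model are hypothetical, not from the paper:

```python
# Illustrative sketch: given the stack size at each upcoming program
# point, pick the point with the smallest stack inside the window the
# remaining harvested energy allows, so less data is saved to NVM.

def best_backup_point(stack_sizes, window):
    """Return the index of the smallest stack within the next `window`
    program points (the backup must happen before energy runs out)."""
    candidates = stack_sizes[:window]
    return min(range(len(candidates)), key=lambda i: candidates[i])

sizes = [120, 40, 200, 64, 16]   # stack bytes at successive points
print(best_backup_point(sizes, 4))  # 1 (stack = 40 bytes, vs 120 now)
```

In the paper's setting the candidate positions are identified offline by analyzing the program, so no runtime stack profiling is needed.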
Real-Time Systems Symposium | 2016
Chenchen Fu; Gruia Calinescu; Kai Wang; Minming Li; Chun Jason Xue
The rapid development of the Internet of Things (IoT) has increased the requirements on the processing capabilities of sensors, mobile phones, and smart devices. Meanwhile, energy efficiency techniques are urgently needed, as most devices in IoT systems are battery powered. Following these two trends, this work explores memory system energy efficiency for a general multi-core architecture. This architecture integrates a local memory in each processing core, with a large off-chip memory shared among multiple cores. Decisions need to be made on whether tasks will be executed with the shared memory or the local memory to minimize the total energy consumption within real-time constraints. This paper proposes optimal schemes as well as a polynomial-time approximation algorithm with a constant ratio. A complexity analysis of the problem for different task and system models is also presented. Experimental results show that the proposed approximation algorithm performs close to the optimal solution on average.
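The per-task decision described above can be sketched greedily. This is not the paper's constant-ratio approximation algorithm; it only illustrates the shape of the local-vs-shared choice, with all field names illustrative and assuming every task has at least one deadline-feasible option:

```python
# Hypothetical sketch: for each task, keep only the memory options whose
# execution time meets the deadline, then pick the one with lower energy.

def assign(tasks):
    """tasks: dicts with 'local'/'shared' -> (time, energy) and 'deadline'.
    Returns the chosen memory for each task."""
    plan = []
    for t in tasks:
        feasible = [opt for opt in ('local', 'shared')
                    if t[opt][0] <= t['deadline']]   # deadline check
        plan.append(min(feasible, key=lambda o: t[o][1]))  # min energy
    return plan

tasks = [{'local': (2, 5), 'shared': (4, 3), 'deadline': 3},
         {'local': (2, 5), 'shared': (4, 3), 'deadline': 5}]
print(assign(tasks))  # ['local', 'shared']
```

The hard part the paper addresses, which this sketch ignores, is that the shared memory is contended: tasks assigned to it interact through scheduling, so the choices are not independent per task.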
IFIP/IEEE International Conference on Very Large Scale Integration | 2013
Keni Qiu; Mengying Zhao; Chenchen Fu; Chun Jason Xue
Cache locking is a cache management technique that precludes the replacement of locked contents. Recently, instruction cache locking has been applied to improve average-case execution time (ACET). However, we observe that the prior instruction cache locking method shows very limited performance improvement for the data cache. The main reason is that data access similarity among data memory blocks is weaker than that among code memory blocks. This paper proposes a data-reallocation-enabled cache locking approach that can significantly enhance locking efficiency for the data cache and thus improve system performance. The experimental results show that with the proposed approach, on average, the miss rate is reduced by 9.1% and execution cycles are reduced by 9.4% across a suite of benchmarks.
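The reallocation idea can be sketched in a few lines: if hot variables are scattered across blocks, locking any one block captures few accesses; packing them together first makes locking pay off. Block capacity, the lock budget, and all names here are illustrative assumptions, not the paper's allocator:

```python
# Illustrative sketch: sort variables by access count, pack the hottest
# ones into the same blocks, then lock the hottest blocks.

def pack_and_lock(var_hotness, vars_per_block, lock_budget):
    """var_hotness: {var: access_count}. Returns the locked blocks,
    each block being a list of variable names."""
    hot_order = sorted(var_hotness, key=var_hotness.get, reverse=True)
    blocks = [hot_order[i:i + vars_per_block]
              for i in range(0, len(hot_order), vars_per_block)]
    return blocks[:lock_budget]

hotness = {'x': 90, 'y': 10, 'z': 80, 'w': 5}
print(pack_and_lock(hotness, 2, 1))  # [['x', 'z']]
```

Without reallocation, 'x' and 'z' might sit in different blocks, so one locked way would capture at most 90 of the 170 hot accesses instead of all of them.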
Embedded and Real-Time Computing Systems and Applications | 2017
Yu Liang; Chenchen Fu; Yajuan Du; Aosong Deng; Mengying Zhao; Liang Shi; Chun Jason Xue
Flash-Friendly File System (F2FS) is getting popular among mobile devices. However, the lack of an empirical and comprehensive analysis of F2FS characteristics prohibits better application of F2FS. In this paper, we present a set of comprehensive experimental studies on mobile devices and report several counterintuitive observations on F2FS, including imprecise hot/cold data separation, unexpected trigger conditions of background GC, the impact of fragmentation on read performance, and the impact of fragmentation and available space on readahead. Based on these observations, we further provide several pilot solutions to improve the performance of these mobile devices. The objective is to inspire researchers and users to pay attention to F2FS characteristics and further optimize its performance.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2017
Chenchen Fu; Yingchao Zhao; Minming Li; Chun Jason Xue
Nowadays, memory energy reduction attracts significant attention, as main memory consumes a large amount of energy among all the energy-consuming components. This paper focuses on reducing the energy consumption of the shared main memory in multicore processors by putting the memory into a sleep state when all cores are idle. Based on this idea, we present a systematic analysis of different models and propose a series of scheduling schemes to maximize the common idle time of all cores. The target problem is classified into two cases based on whether task migration is allowed among cores. When task migration is allowed, an optimal scheduling scheme is proposed under the assumption that the number of cores is unbounded. When the number of cores is bounded, an integer linear programming formulation and two efficient heuristic algorithms are proposed. When task migration is not allowed, we first prove the NP-hardness of the problem and then propose optimal solutions for the case where task partitions are given in advance. The energy overhead caused by transitions between the active and sleep modes of the memory is analyzed. The experimental results show that the heuristic algorithms work efficiently and can save 7.25% and 11.71% of system energy, respectively, with 1-GB memory, compared with an energy-efficient multicore scheduling scheme. Larger energy reductions can be achieved with larger memory sizes.
Design, Automation and Test in Europe | 2015
Chenchen Fu; Minming Li; Chun Jason Xue
Reducing energy consumption is a critical problem in most computing systems today. Among all the computing system components, the processor and memory are two significant energy consumers. Dynamic voltage scaling is typically applied to reduce processor energy, while a sleep mode is usually injected to trim the memory's leakage energy. However, in an architecture where multiple cores share memory, these two classic techniques are difficult to combine directly for system-wide energy optimization due to their complicated interactions. In this work, we explore the coordination of the multiple cores and the memory, and present a systematic analysis of minimizing the system-wide energy under different system models and task models. For tasks with a common release time, optimal schemes are presented for systems both with and without considering the static power of the cores. For tasks with agreeable deadlines, different dynamic-programming-based optimal solutions are proposed for negligible and non-negligible static power of the cores. For the general task model, this paper proposes a heuristic online algorithm. Furthermore, the scheme is extended to handle the problem when the transition overhead between the active and sleep modes is considered. The optimality of the proposed schemes for tasks with a common release time and with agreeable deadlines is proved. The validity of the proposed heuristic scheme is evaluated through experiments. Experimental results confirm the superiority of the heuristic scheme in terms of energy-saving improvement compared to the most closely related existing work.
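The "complicated interaction" between voltage scaling and memory sleep can be shown with a toy energy model. This is only an illustration of the trade-off, not any of the paper's schemes, and every constant in it is an assumption: running the cores faster costs more dynamic energy per cycle but finishes the workload sooner, lengthening the window in which the shared memory can sleep.

```python
# Hedged sketch: CPU dynamic energy ~ k * f^2 * time (i.e., k * cycles * f),
# memory draws active power while cores run and only sleep power after.
# The best frequency balances the two, so neither extreme wins.

def total_energy(freq, cycles, period, p_mem_active, p_mem_sleep, k=1.0):
    exec_time = cycles / freq
    cpu_dyn = k * (freq ** 2) * exec_time
    mem = p_mem_active * exec_time + p_mem_sleep * (period - exec_time)
    return cpu_dyn + mem

freqs = [0.5, 1.0, 2.0]
best = min(freqs, key=lambda f: total_energy(f, 10, 30, 2.0, 0.1))
print(best)  # 1.0 — the middle frequency wins under these constants
```

With these numbers, the slowest setting wastes memory-active energy (total 46) and the fastest wastes CPU dynamic energy (32.5), while the middle frequency minimizes the sum (32) — the kind of joint optimum the paper's coordinated schemes search for across whole task sets.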