Chaochao Feng
National University of Defense Technology
Publication
Featured research published by Chaochao Feng.
network on chip architectures | 2010
Chaochao Feng; Zhonghai Lu; Axel Jantsch; Jinwen Li; Minxuan Zhang
We propose a reconfigurable fault-tolerant deflection routing algorithm (FTDR) based on reinforcement learning for NoC. The algorithm reconfigures the routing table through Q-learning, a form of reinforcement learning, using 2-hop fault information. It is topology-agnostic and insensitive to the shape of the fault region. To reduce the routing table size, we also propose a hierarchical Q-learning based deflection routing algorithm (FTDR-H), which reduces the area of a switch in an 8 × 8 mesh by up to 27% compared to the original FTDR. Experimental results show that in the presence of faults, FTDR and FTDR-H perform better than other fault-tolerant deflection routing algorithms and a turn-model-based fault-tolerant routing algorithm.
IEEE Transactions on Very Large Scale Integration Systems | 2013
Chaochao Feng; Zhonghai Lu; Axel Jantsch; Minxuan Zhang; Zuocheng Xing
The continuing decrease in the feature size of integrated circuits increases their susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request and forward error correction link-level error control scheme to handle transient faults, and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesis results show that, compared with the FTDR router, the FTDR-H router reduces the area by 27% in an 8 × 8 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8 × 8 network with the FTDR and FTDR-H algorithms is 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves a 20% lower hop count on average than the FoN algorithm. For transient faults, the performance of the FTDR router degrades gracefully even at a high fault rate. We also implement the fault-tolerant deflection router, which achieves 400 MHz in TSMC 65-nm technology.
symposium on cloud computing | 2010
Chaochao Feng; Zhonghai Lu; Axel Jantsch; Jinwen Li; Minxuan Zhang
Reliability has become a key issue for Networks-on-Chip (NoC) as CMOS technology scales down to the nanoscale domain. This paper proposes a Fault-on-Neighbor (FoN) aware deflection routing algorithm for NoC, which makes routing decisions based on the link status of neighbor switches within 2 hops to avoid faulty links and switches. Simulation results demonstrate that in the presence of faults, the saturated throughput of the FoN switch is 13% higher on average than that of a cost-based deflection switch for an 8×8 mesh. The average hop count can be up to 1.7× lower than that of the cost-based switch. The FoN switch is also synthesized using 65nm TSMC technology, and it can work at 500MHz with small area overhead.
ieee computer society annual symposium on vlsi | 2011
Chaochao Feng; Minxuan Zhang; Jinwen Li; Jiang Jiang; Zhonghai Lu; Axel Jantsch
This paper proposes a low-overhead fault-tolerant deflection routing algorithm for 3D NoC, which uses a layer routing table and two TSV state vectors to make efficient routing decisions that avoid both TSV and horizontal link faults. The proposed switch is implemented in hardware with TSMC 65nm technology and can achieve 250MHz. Compared with a reinforcement-learning-based fault-tolerant deflection switch with a global routing table, the proposed switch occupies 40% less area and consumes 49% less power. Simulation results demonstrate that the proposed switch has 5% lower average packet latency than the switch with the global routing table under real application workloads, and suffers only 5% performance degradation under synthetic workloads in the presence of 10% link faults.
international conference on asic | 2011
Chaochao Feng; Jinwen Li; Zhonghai Lu; Axel Jantsch; Minxuan Zhang
In this paper, we propose two novel deflection routing algorithms for de Bruijn and Spidergon NoCs and evaluate the performance of deflection routing on 5 NoC topologies with different synthetic traffic patterns. We also synthesize the routers for the various NoC topologies with TSMC 65nm technology. The evaluation results illustrate that the performance of deflection routing is sensitive to the network topology and traffic pattern. The results can also guide the NoC architect in choosing a suitable NoC topology for a specific application.
international conference on solid-state and integrated circuits technology | 2008
Chaochao Feng; Shaoqing Li; Minxuan Zhang
This paper designs a 64-bit floating-point reciprocal and square root reciprocal unit of a stream processor (FT64), which combines the methods of table look-up and functional iteration to implement division and square root operations. This unit which is implemented with two pipeline stages provides the initial value for the iteration of division and square root. A semi-custom and full-custom mixed design method is adopted to improve its performance, and a mixed verification method is also proposed to verify the unit. The results of verification show that the unit can achieve the performance of 1 GHz under the typical condition of 0.13 µm CMOS technology.
international conference on asic | 2015
Chaochao Feng; Zhuofan Liao; Zhonghai Lu; Axel Jantsch; Zhenyu Zhao
In general, a bufferless NoC router has only one local output port for ejection, which may lead to multiple arriving flits competing for that single output port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3 or 4. Simulation results demonstrate that the average packet latency of the routers with multiple ejection ports is on average 18%, 10%, 6%, 14%, 9% and 7% lower than that of the router with one ejection port under six synthetic workloads, respectively. For application workloads, the average packet latency of the router with more than two ejection ports is only slightly better than that of the router with one ejection port, a difference that can be neglected. Weighing hardware cost against performance, we conclude that there is no need to implement bufferless routers with 3 or 4 ejection ports, as the router with 2 ejection ports achieves almost the same performance.
ieee international conference on solid-state and integrated circuit technology | 2012
Anwen Huang; Jun Gao; Jiang Jiang; Chaochao Feng; Minxuan Zhang
To address the challenge of reducing latency in large distributed caches in tiled chip multiprocessors, this paper presents an adaptive replication mechanism based on a victim filter and target detection. The mechanism considers not only the characteristics of memory accesses to hot blocks but also the negative impact of victim replication on the local hit rate, at the granularity of a cache set. Simulation results using a full-system simulator demonstrate that the proposed mechanism outperforms the baseline shared non-uniform cache architecture for multithreaded benchmark programs, while the hardware overhead is negligible.
international conference on asic | 2007
Xun Chen; Chaochao Feng; Yanning Wang; Shaoqing Li; Minxuan Zhang
Two new high fan-in logic styles, grouped domino logic and complementary boost logic (CBL), are proposed to address the problems of circuit speed and noise immunity. Grouped domino logic not only speeds up the circuit but also simplifies the design of the keeper. CBL is derived from the source-following evaluation gate (SFEG) and introduces a complementary boost to speed up the circuit. In 0.13 µm technology, we compare grouped domino logic and CBL with traditional domino logic. The results show that the parameters of grouped domino logic are better than those of traditional domino logic, and that the unity noise gain (UNG) of CBL is about 90% of that of traditional domino logic with a keeper ratio of 1, while CBL is 3 times faster than domino logic.
ACM Computing Surveys | 2013
Martin Radetzki; Chaochao Feng; Xueqian Zhao; Axel Jantsch