Publication


Featured research published by Binbin Wu.


computer and information technology | 2010

A Reconfigurable Processor Architecture Combining Multi-core and Reconfigurable Processing Unit

Like Yan; Binbin Wu; Yuan Wen; Shaobin Zhang; Tianzhou Chen

Adding a reconfigurable processing unit to a general-purpose processor is a promising way to improve performance significantly. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining general multi-core and reconfigurable logic is proposed. The reconfigurable logic is logically divided into Reconfigurable Processing Units (RPUs), which are coupled with General Purpose Cores (GPCs) as co-processors via a configurable full crossbar switch. An RPU Manager (RPU-M) is designed to manage the RPUs. To verify RMC, a simulation methodology based on Simics and a Virtex 5 FPGA is adopted, which simplifies the simulation and assures the accuracy of the hardware function core. Experimental results for the 3-DES, AES and JPEG_ENC workloads show a 2.34X average speedup over a software implementation, while the data and control transfer overhead is acceptable.


international conference on scalable computing and communications | 2009

CMP Thread Assignment Based on Group Sharing L2 Cache

Guanjun Jiang; Du Chen; Binbin Wu; Yi Zhao; Tianzhou Chen; Jingwei Liu

With the development of electronic technology, uniprocessors are being replaced by chip multiprocessors (CMPs). CMPs can run multi-threaded programs efficiently, and many researchers study multi-threading, including extracting threads from legacy single-threaded programs and standardizing future multi-threaded programs. Data communication among threads is unavoidable in multi-threaded programs, and efficient data sharing is important for program performance. However, research on data sharing has focused on memory organization and the relationships among threads, and little attention has been paid to sharing inside the processor. In this paper, we develop a thread assignment method for a group-sharing L2 cache architecture based on the data relationships among threads. We allocate some threads to some cores and the remaining threads to others. In our experiment, we simulate four threads with different degrees of data sharing running on a four-core CMP whose cores are divided into two groups. Comparing program execution traces, we find that the main difference between the two simulations is the L2 cache hit rate, and that thread assignment brings a 6.25% improvement in running time. The L2 cache hit rates of the two groups are 91.0% and 87.1% with the proposed thread assignment, but 77.0% and 75.4% with random thread assignment, a drop of 14.0 and 11.7 percentage points for the respective groups.
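The abstract above does not spell out how threads are matched to core groups, so the following is only a minimal sketch of the general idea: place the threads that share the most data in the same group so they can reuse one shared L2 cache. The function name `best_grouping`, the exhaustive search, and the sharing matrix are all illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations

def best_grouping(sharing, group_size):
    """Return (groups, score) for the two-group split with the most intra-group sharing.

    sharing[i][j] is an assumed, symmetric measure of data shared by threads i and j.
    """
    threads = range(len(sharing))
    best, best_score = None, -1
    for group_a in combinations(threads, group_size):
        group_b = tuple(t for t in threads if t not in group_a)
        # Sum the sharing between every pair of threads placed in the same group.
        score = sum(sharing[i][j]
                    for group in (group_a, group_b)
                    for i, j in combinations(group, 2))
        if score > best_score:
            best, best_score = (group_a, group_b), score
    return best, best_score

# Hypothetical sharing matrix for four threads (values are made up for illustration).
SHARING = [
    [0, 8, 1, 2],
    [8, 0, 1, 1],
    [1, 1, 0, 9],
    [2, 1, 9, 0],
]

groups, score = best_grouping(SHARING, 2)
print(groups, score)
```

Exhaustive search is only practical for a handful of threads, as in the four-thread, two-group setup the abstract describes; a larger system would need a heuristic.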


symposium on cloud computing | 2010

Run-time configuration prefetching to reduce the overhead of dynamically reconfiguration

Binbin Wu; Like Yan; Yuan Wen; Tianzhou Chen

Reconfigurable computing is a promising approach that offers both flexibility and efficiency for high performance computing. However, the overhead of run-time reconfiguration severely affects performance. In this paper, we develop a simple and effective method to hide configuration latency by prefetching configurations at run time. A simulation platform based on Simics is developed for evaluation. The experimental results show that the prediction accuracy is rather high, and that the hit rate of the reconfigurable processing unit increases by 24.6%∼53.7% when reconfigurable resources are inadequate.
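The abstract does not say which predictor drives the prefetching. As a rough illustration of why prefetching can raise the hit rate when configurations outnumber RPU slots, here is a small simulation that assumes a last-successor predictor and LRU replacement. Both choices, the function name `hit_rate`, and the request trace are assumptions made for this sketch, not details from the paper.

```python
def hit_rate(trace, slots, prefetch):
    """Fraction of requests whose configuration is already loaded on an RPU slot."""
    loaded = []      # resident configurations in LRU order (oldest first)
    successor = {}   # assumed predictor: last configuration seen after each one
    hits = 0
    prev = None

    def load(cfg):
        # Refresh cfg if resident; otherwise evict the oldest entry when full.
        if cfg in loaded:
            loaded.remove(cfg)
        elif len(loaded) >= slots:
            loaded.pop(0)
        loaded.append(cfg)

    for cfg in trace:
        if cfg in loaded:
            hits += 1
        load(cfg)
        if prev is not None:
            successor[prev] = cfg          # learn what followed the previous request
        if prefetch and cfg in successor:
            load(successor[cfg])           # load the predicted next configuration early
        prev = cfg
    return hits / len(trace)

# Hypothetical trace: three configurations requested cyclically, only two slots.
TRACE = ["A", "B", "C"] * 4
without_prefetch = hit_rate(TRACE, slots=2, prefetch=False)
with_prefetch = hit_rate(TRACE, slots=2, prefetch=True)
print(without_prefetch, with_prefetch)
```

On this cyclic trace plain LRU never hits, while the predictor starts hitting once it has seen one full cycle. The numbers come from the toy trace only and say nothing about the paper's reported 24.6%∼53.7% improvement.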


computer and information technology | 2010

Distributed On-Chip Operating System for Network on Chip

Wei Hu; Jianliang Ma; Binbin Wu; Lihan Ju; Tianzhou Chen

Network on Chip (NoC) has been proposed as a promising solution for processors with many cores integrated onto a single chip. The main advantages of NoC are favorable scalability and high bandwidth for on-chip cores and communications. However, operating systems designed for NoC have not been fully researched to date. Because a microkernel operating system is composed of modules, it is well suited to execution on a many-core architecture. In this paper, a methodology is proposed to design and implement a microkernel-based OS on NoC. The OS is divided into modules that are distributed across the whole network using the NoC communication fabric. MINIX 3 has been extended to implement a prototype of the OS. Simulation results for real applications demonstrate that the mapping approach strongly affects performance, with the best mapping reducing average latency by up to 43.2% compared with the worst.
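To make the effect of module mapping concrete, the sketch below compares two placements of OS modules on a 2x2 mesh using traffic-weighted hop count as a stand-in for latency. The module names, traffic counts, mesh size, and the hop-count metric are all hypothetical; the abstract reports measured latency and does not describe its cost model.

```python
def avg_hops(mapping, traffic):
    """Traffic-weighted average Manhattan distance between communicating modules.

    mapping: module name -> (row, col) tile on the mesh
    traffic: (source, destination) -> number of messages
    """
    total_messages = sum(traffic.values())
    total_hops = 0
    for (src, dst), count in traffic.items():
        (r1, c1), (r2, c2) = mapping[src], mapping[dst]
        total_hops += count * (abs(r1 - r2) + abs(c1 - c2))
    return total_hops / total_messages

# Hypothetical message counts between illustrative OS modules.
TRAFFIC = {("fs", "disk"): 50, ("app", "fs"): 40, ("pm", "fs"): 10}

# Placement that keeps the heaviest communicators on adjacent tiles.
NEAR = {"fs": (0, 0), "disk": (0, 1), "app": (1, 0), "pm": (1, 1)}
# Placement that puts the heaviest pair on opposite corners.
FAR = {"fs": (0, 0), "disk": (1, 1), "app": (1, 0), "pm": (0, 1)}

near_cost = avg_hops(NEAR, TRAFFIC)
far_cost = avg_hops(FAR, TRAFFIC)
print(near_cost, far_cost)
```

Hop count ignores contention and router delay, so it is only a first-order proxy for the latency differences the abstract measures.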


computer and information technology | 2010

Virtual I/O Based on ScratchPad Memory for Embedded System

Binbin Wu; Xiangsheng Tang; Hui Yuan; Qingsong Shi; Jiexiang Kang; Tianzhou Chen

Scratchpad memory (SPM) is software-controlled on-chip SRAM that is widely used in embedded processors to meet the strict requirements on performance, energy consumption and real-time response of embedded systems. This paper proposes an SPM-based I/O approach, called Virtual I/O based on SPM (SPMIO), to reduce I/O access time through SPM. Different I/O requests are buffered and scheduled on SPM through SPMIO with the help of the ScratchPad Memory Operating System (SPMOS). SPMIO provides a direct datapath to the CPU for I/O accesses, reducing delays and lowering power consumption. The experimental results show that SPMIO is efficient and practical.


computer and information technology | 2010

Network Main Memory Architecture for NoC-Based Chips

Xingsheng Tang; Binbin Wu; Tianzhou Chen; Wei Hu; Jiexiang Kang; Zhenwei Zheng

Network on Chip (NoC) is considered to be the best candidate for future on-chip communication; however, as the number of on-chip processors increases, their simultaneous memory accesses can cause a serious main memory bottleneck. In this study, we propose the concept of Network Main Memory (NMM). NMM has a distributed network architecture for main memory and multiple communication channels to NoC chips, which can overcome the main memory bottleneck. Compared with traditional memory, the bandwidth of NMM can be used fully owing to the network architecture, and the memory bandwidth is easy to increase. Our experimental results on a simulator show that NMM can provide better traffic for NoCs. In addition, the management of NMM and the software model for NoC chips and NMM are also discussed.


international conference on scalable computing and communications | 2009

The Implementation of a Mobile Java Debug Tool

Degui Feng; Jian Chen; Like Yan; Binbin Wu; Xueqing Lou; Tianzhou Chen

The number of handsets with an integrated J2ME environment has been increasing recently. After PhoneME, an implementation of J2ME, became an open project, porting it to different platforms was a hot topic for some time. There are some implementations of online debugging between a PC and specific embedded devices, but there is no well-designed architecture for debugging between a PC and a mobile phone, only some assistance tools for co-debugging. As the mobile phone industry develops, many smartphones have abundant hardware resources, which makes co-debugging between a PC and a smartphone possible. In this paper, we present a debug framework for a Java application development platform spanning the PC and the mobile phone. With this framework, MIDlets for specific embedded devices can be developed more conveniently.


Telecommunication Systems | 2014

A reconfigurable processor architecture combining multi-core and reconfigurable processing units

Like Yan; Binbin Wu; Yuan Wen; Shaobin Zhang; Tianzhou Chen

Adding a reconfigurable processing unit (RPU) to a general-purpose processor is a promising way to improve performance significantly. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining general multi-core and reconfigurable logic is proposed. The reconfigurable logic is logically separated into RPUs, which are coupled with general purpose cores as co-processors via a full crossbar switch. An RPU Manager (RPU-M) is also designed to manage the RPUs. To verify RMC, a simulation method based on Simics and a Virtex 5 FPGA is adopted, which simplifies the simulation and assures the evaluation accuracy of the hardware function cores. Five workloads are selected to test RMC: 3-DES, AES, SHA2, IDCT and JPEG_ENC. The experimental results show a 3.10 times average speedup over a software implementation on the original multi-core, and the data and control communication overhead on RMC is acceptable.


computer and information technology | 2010

A Bypass Optimization Method for Network on Chip

Wei Hu; Binbin Wu; Bin Xie; Tianzhou Chen; Lianghua Miao

Network-on-Chip (NoC) has been proposed to solve the communication bottleneck of multi-core SoCs. Performance is one of the most critical features of an NoC, and many different approaches have been introduced to improve it. However, most of them focus on the network part of the NoC architecture and neglect other important parts of the system, especially the processor core. This paper proposes a new architecture: a transmission bypass framework. It adds a bypass path behind the EX stage, the execution stage of the instruction pipeline, to transmit intermediate results and save transmission time. Experimental results show that when cache misses occur, the performance of send and receive operations can be improved by 15%–38%. The performance of Splash-2 applications can be improved by up to 28%.


international conference on scalable computing and communications | 2009

The Design and Implementation of Adaptive Reconfigurable Computing Array

Binbin Wu; Like Yan; Degui Feng; Tianzhou Chen

More and more reconfigurable devices are being used to accelerate specific computations in traditional computing systems. However, an isolated reconfigurable system has shortcomings such as limited computational capability and low utilization of reconfigurable devices. In this paper, a networked adaptive array of reconfigurable computing nodes is proposed, in which each node is composed of a host and reconfigurable devices. By sharing reconfigurable resources among the nodes in the system, the computational capability of a single node is enhanced and the utilization of reconfigurable resources is increased. Experimental results show that using 2 to 5 nodes in the array reduces execution time by about 19.5% to 48.2% compared with a single node under heavy workloads.

Collaboration


Top Co-Authors

Wei Hu

Zhejiang University
