Lihan Ju
Zhejiang University
Publication
Featured research published by Lihan Ju.
computer and information technology | 2010
Jie Yang; Like Yan; Lihan Ju; Yuan Wen; Shaobin Zhang; Tianzhou Chen
Reconfigurable computing based on FPGAs (Field Programmable Gate Arrays) is a promising way to improve performance while retaining high flexibility. However, the limited physical capacity of FPGAs hinders their wide adoption in practice. In this paper, a homogeneous NoC-based FPGA architecture is proposed, in which reconfigurable and I/O resources are interconnected via a NoC so that reconfigurable modules can be placed anywhere once enough space is available. In addition, a virtual FPGA is proposed, with which an overly large circuit can be implemented on an FPGA of limited capacity. Experiments verify that the approach provides more flexible reconfiguration and that, by combining a NoC with the FPGA, resource utilization increases by 44.7% to 53.5%, because the fragments in CRs benefit from this kind of dynamic partial configuration.
international conference on green computing and communications | 2011
Yuan Wen; Xingsheng Tang; Lihan Ju; Tianzhou Chen
Regular expressions are an important and widely used approach to string pattern matching. In many practical applications, string pattern matching is the most compute-intensive task and takes the majority of the processing time, so much work has been done on hardware implementations of regular expression matching to improve system efficiency. However, traditional design approaches pay more attention to implementation methods and their efficiency than to power consumption. In this paper we present a power-efficient regular expression matching architecture (PeRex). By making full use of both the rising and falling edges of the system clock, the architecture is able to match two characters in a single system cycle. It can therefore run at a lower clock frequency while maintaining high performance and throughput, which reduces dynamic power consumption markedly. According to analysis with XPower, a tool offered by Xilinx Inc., our approach reduces dynamic power consumption by a factor of 1.7 compared with traditional approaches on a Virtex-V XC5VLX30 FPGA device.
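The two-characters-per-cycle idea can be imitated in software by composing a DFA's transition function with itself, so that one table lookup consumes a pair of characters. The sketch below is only an illustration under stated assumptions: the DFA (full-string match of the regex a b* c), the names DELTA, PAIR, match_single and match_double, and the notion of a "cycle" as one table lookup are all hypothetical and are not taken from the PeRex design, which works with clock edges in hardware.

```python
from itertools import product

ALPHABET = "abc"          # assumed input alphabet for this toy example
START, DEAD = 0, 3
ACCEPT = {2}
STATES = {0, 1, 2, 3}

# Hypothetical DFA for a full-string match of the regex  a b* c
DELTA = {
    (0, "a"): 1,    (0, "b"): DEAD, (0, "c"): DEAD,
    (1, "a"): DEAD, (1, "b"): 1,    (1, "c"): 2,
    (2, "a"): DEAD, (2, "b"): DEAD, (2, "c"): DEAD,
    (DEAD, "a"): DEAD, (DEAD, "b"): DEAD, (DEAD, "c"): DEAD,
}

def build_pair_table(delta, states, alphabet):
    """Compose delta with itself: one lookup consumes two characters."""
    return {(s, x, y): delta[(delta[(s, x)], y)]
            for s, x, y in product(states, alphabet, alphabet)}

PAIR = build_pair_table(DELTA, STATES, ALPHABET)

def match_single(text):
    """One character per step; returns (matched, steps)."""
    state = START
    for ch in text:
        state = DELTA[(state, ch)]
    return state in ACCEPT, len(text)

def match_double(text):
    """Two characters per step; a trailing odd character takes one extra step."""
    state, steps = START, 0
    for k in range(0, len(text) - 1, 2):
        state = PAIR[(state, text[k], text[k + 1])]
        steps += 1
    if len(text) % 2:
        state = DELTA[(state, text[-1])]
        steps += 1
    return state in ACCEPT, steps

print(match_single("abbc"), match_double("abbc"))
```

Both matchers give the same verdict, but the paired version needs about half as many steps, which is the property that lets a design of this kind lower its clock frequency at equal throughput. Input characters are assumed to come from ALPHABET.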
computer and information technology | 2010
Wei Hu; Jianliang Ma; Binbin Wu; Lihan Ju; Tianzhou Chen
Network on Chip (NoC) has been proposed as a promising solution for processors with many cores integrated onto a single chip. The main advantages of NoC are favorable scalability and high bandwidth for on-chip cores and communication. However, operating systems designed for NoC have not been fully researched to date. Because a microkernel operating system is composed of modules, it is well suited to execution on a many-core architecture. In this paper, a methodology is proposed to design and implement a microkernel-based OS on NoC. The OS is divided into modules that are distributed across the whole network and communicate over the NoC fabric. MINIX 3 has been extended to implement a prototype of the OS. Simulation results for real applications demonstrate that the mapping approach strongly affects performance, with the best mapping achieving up to a 43.2% reduction in average latency compared with the worst.
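Why the module-to-core mapping matters can be seen with a toy cost model. The sketch below assumes a 2D mesh in which a message between two cores costs one unit per hop (Manhattan distance), and it uses invented module names and message counts; none of these details come from the paper, and hop count is only a rough stand-in for the simulated latency it reports.

```python
def avg_hops(mapping, traffic):
    """Traffic-weighted mean hop count on a 2D mesh.

    mapping: module name -> (row, col) of the core it runs on
    traffic: (src, dst)  -> number of messages exchanged
    """
    total_msgs = sum(traffic.values())
    total_hops = 0
    for (src, dst), count in traffic.items():
        (r1, c1), (r2, c2) = mapping[src], mapping[dst]
        total_hops += count * (abs(r1 - r2) + abs(c1 - c2))
    return total_hops / total_msgs

# Hypothetical message counts between OS modules (illustrative only).
TRAFFIC = {("app", "fs"): 50, ("fs", "disk"): 40, ("app", "net"): 10}

# Two candidate placements on a 3x3 mesh.
NEAR = {"app": (0, 0), "fs": (0, 1), "disk": (0, 2), "net": (1, 0)}
FAR = {"app": (0, 0), "fs": (2, 2), "disk": (0, 2), "net": (1, 0)}

print(avg_hops(NEAR, TRAFFIC), avg_hops(FAR, TRAFFIC))
```

Placing heavily communicating modules on adjacent cores lowers the weighted hop count, which is the kind of effect a mapping study would measure with a full simulator.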
international conference on algorithms and architectures for parallel processing | 2010
Tianzhou Chen; Xingsheng Tang; Jianliang Ma; Lihan Ju; Guanjun Jiang; Qingsong Shi
Chip multiprocessors (CMP) offer more benefits than uni-core processors, but they suit legacy code poorly because legacy code was written for uni-core processors. This paper presents a novel thread-level parallelism technique called Dataflow Abstracting Thread (DFAT). DFAT builds a United Dependence Graph (UDG) for the program and decouples a single thread into many threads that can run in parallel on a CMP. DFAT analyzes the program's data, control, and anti-dependences to obtain a dependence graph; the dependences are then combined and annotated with attributes to form the UDG. The UDG determines the execution order of instructions, and according to this order an assignment algorithm places instructions into different threads one by one. DFAT considers both communication overhead and thread balance after the initial thread division. Thread communication in DFAT follows a producer-consumer model. DFAT can automatically abstract multiple threads from a single thread and can be implemented in a compiler. In our evaluation, we decouple a single thread into at most 8 threads with DFAT, and the results show that decoupling into 4-6 threads yields the best benefit.
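The flow described here (find dependences, then hand instructions to threads one by one while weighing communication against balance) can be sketched roughly as follows. This is not the DFAT algorithm: the instruction encoding, the greedy rule, and the function names are assumptions made for illustration, control dependences are left out, and output (write-after-write) dependences are added only to keep the toy ordering safe.

```python
from collections import defaultdict

def build_dependences(instrs):
    """instrs: list of (dest, [sources]). Returns j -> set of earlier i it depends on."""
    deps = defaultdict(set)
    for j, (dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            raw = dest_i in srcs_j   # data (read-after-write) dependence
            war = dest_j in srcs_i   # anti- (write-after-read) dependence
            waw = dest_i == dest_j   # output dependence (assumption, see lead-in)
            if raw or war or waw:
                deps[j].add(i)
    return deps

def assign_threads(instrs, deps, n_threads):
    """Greedy sketch: follow the thread holding most predecessors, then the lighter thread."""
    owner, load = {}, [0] * n_threads
    for j in range(len(instrs)):
        votes = [0] * n_threads
        for i in deps.get(j, ()):
            votes[owner[i]] += 1
        thread = max(range(n_threads), key=lambda k: (votes[k], -load[k]))
        owner[j] = thread
        load[thread] += 1
    return owner

def cross_thread_edges(deps, owner):
    """Number of dependences that would need producer-consumer communication."""
    return sum(owner[i] != owner[j] for j, preds in deps.items() for i in preds)

# Toy program: a = ..., b = ..., c = f(a), d = g(b), e = h(c, d)
PROGRAM = [("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]), ("e", ["c", "d"])]
DEPS = build_dependences(PROGRAM)
OWNER = assign_threads(PROGRAM, DEPS, n_threads=2)
print(OWNER, cross_thread_edges(DEPS, OWNER))
```

In the toy program the two independent chains land on separate threads, and only the final instruction needs a value from the other thread.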
green computing and communications | 2010
Lihan Ju; Wei Hu; Lingxiang Xiang; Tianzhou Chen
With the development of the semiconductor industry, more transistors can be integrated onto a single chip. However, the software programming model does not meet the parallelism requirements of CMP (Chip Multi Processor) based architectures. Communication between cores becomes a serious problem and degrades performance. This paper proposes an approach called API (Architecture of Parallelism on Instructions), which scans the source code of programs, analyzes the data dependencies, and clusters mutually dependent instructions together. Instructions without dependencies can be issued directly in parallel by different cores. API provides a global register file for the effective execution of programs on CMP chips. We also compare the execution time of API and the traditional architecture in experiments using the SPEC CPU2000 benchmark. The experimental results show that the instruction clock count under API is only 49 percent of the original. Moreover, only 4 cores are needed to approach the best performance.
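A small model helps show why adding cores stops paying off once the available instruction-level parallelism is used up. The sketch below assumes unit-latency instructions, a hypothetical 8-instruction dependency graph, and a simple list scheduler that issues up to n_cores ready instructions per cycle; it is not the API architecture or its measured workload.

```python
def schedule_cycles(deps, n_instrs, n_cores):
    """Cycles needed when up to n_cores ready instructions issue per cycle.

    deps: instruction index -> set of instruction indices it must wait for.
    An instruction is ready once all of its predecessors finished in an earlier cycle.
    """
    finished = set()
    remaining = set(range(n_instrs))
    cycle = 0
    while remaining:
        cycle += 1
        ready = sorted(i for i in remaining
                       if all(p in finished for p in deps.get(i, ())))
        issued = ready[:n_cores]
        finished.update(issued)
        remaining.difference_update(issued)
    return cycle

# Hypothetical dependency graph: 0-3 are independent, 4 needs 0 and 1,
# 5 needs 2 and 3, 6 needs 4 and 5, 7 needs 6.
TOY_DEPS = {4: {0, 1}, 5: {2, 3}, 6: {4, 5}, 7: {6}}

for cores in (1, 2, 4, 8):
    print(cores, schedule_cycles(TOY_DEPS, 8, cores))
```

In this toy graph the cycle count falls from 8 to 5 to 4 as cores are added and then stays at 4, because the longest dependency chain, not the core count, becomes the limit.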
computer and information technology | 2010
Chunhao Wang; Lihan Ju; Di Wu; Lingxiang Xiang; Wei Hu; Tianzhou Chen
Executing sequential programs across multiple cores is crucial for exploiting instruction-level parallelism (ILP) in chip multiprocessor (CMP) architectures. One widely used method steers instructions across cores based on their dependencies. However, this method requires a sophisticated steering mechanism and adds considerable hardware complexity and area overhead. This paper presents the Global Register Alias Table (GRAT), a structure that can be used in CMP architectures to facilitate sequential program execution across cores. The GRAT also greatly reduces the area and complexity of instruction steering without requiring additional programming effort or compiler support. In our evaluation, our design performs within 5.9% of Core Fusion, a recent proposal that requires a complex steering unit.
Archive | 2009
Qingsong Shi; Degui Feng; Man Cao; Jianliang Ma; Lihan Ju; Chao Wang; Gang Wang; Jingwei Liu; Wei Hu; Tianzhou Chen
Archive | 2008
Tianzhou Chen; Wei Hu; Tiefei Zhang; Like Yan; Bin Xie; Jian Chen; Du Chen; Changbin Huang; Lihan Ju; Feng Sha
Archive | 2011
Binbin Wu; Lihan Ju; Qingsong Shi; Du Chen; Wei Hu; Man Cao; Jianliang Ma; Tianzhou Chen; Chao Wang
Archive | 2008
Tianzhou Chen; Nan Zhang; Like Yan; Bin Xie; Tiefei Zhang; Changbin Huang; Wei Ma; Lihan Ju; Jian Chen; Degui Feng