Lihan Ju | Researchain

Publication

Featured research published by Lihan Ju.

computer and information technology | 2010

Homogeneous NoC-based FPGA: The Foundation for Virtual FPGA

Jie Yang; Like Yan; Lihan Ju; Yuan Wen; Shaobin Zhang; Tianzhou Chen

Reconfigurable computing based on FPGAs (Field Programmable Gate Arrays) has been a promising way to improve performance while retaining high flexibility. However, the limited physical capacity of FPGAs prevents their wide adoption in the real world. In this paper, a homogeneous NoC-based FPGA architecture is proposed, in which reconfigurable and I/O resources are interconnected via a NoC so that reconfigurable modules can be placed anywhere once enough space is available. In addition, a virtual FPGA is proposed, with which an overly large circuit can be implemented on a limited-capacity FPGA. Experiments verified that our approach provides more flexible reconfiguration and that, by combining a NoC with the FPGA, resource utilization increased by 44.7% to 53.5% because the fragments in CRs benefit from this kind of dynamic partial configuration.

international conference on green computing and communications | 2011

PeRex: A Power Efficient FPGA-based Architecture for Regular Expression Matching

Yuan Wen; Xingsheng Tang; Lihan Ju; Tianzhou Chen

Regular expressions are an important and widely used approach to string pattern matching. In many practical applications, string pattern matching is the most compute-intensive task and takes the majority of processing time; therefore, much work has been done on hardware implementations of regular expression matching to improve system efficiency. However, traditional design approaches pay more attention to implementation methods and their efficiency than to power consumption. In this paper we present a power-efficient regular expression matching architecture (PeRex). By making full use of both the rising and trailing edges of the system clock, the architecture is able to match two characters in a single system cycle. As a result, it can maintain high performance and throughput while working at a lower clock frequency, which decreases dynamic power consumption remarkably. According to analysis with XPower, a tool offered by Xilinx Inc., our approach reduces dynamic power consumption by a factor of 1.7 compared with traditional approaches on a Virtex-V XC5VLX30 FPGA device.
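As a rough software analogy for consuming two characters per clock cycle, the sketch below composes a one-character DFA transition table with itself so that each loop iteration (standing in for one cycle) advances over two input characters. The function names, the `(ab)*` example automaton, and the DFA formulation are assumptions made for illustration; the abstract does not describe the internal circuit of PeRex, which relies on both clock edges in hardware.

```python
from itertools import product

# Hypothetical example automaton for the regex (ab)* over the alphabet {a, b}.
# State 0: start/accepting, state 1: just read 'a', state 2: dead state.
ALPHABET = "ab"
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
START, ACCEPTING = 0, {0}

def make_stride2(delta, alphabet):
    """Compose a total one-character table with itself: (s, a, b) -> state."""
    states = {s for (s, _) in delta}
    return {
        (s, a, b): delta[(delta[(s, a)], b)]
        for s in states
        for a, b in product(alphabet, repeat=2)
    }

def run_stride2(delta, delta2, start, accepting, text):
    """Match the whole text, consuming two characters per 'cycle'.

    Returns (matched, cycles). A trailing odd character costs one extra cycle.
    """
    state, cycles = start, 0
    for i in range(0, len(text) - 1, 2):
        state = delta2[(state, text[i], text[i + 1])]
        cycles += 1
    if len(text) % 2:
        state = delta[(state, text[-1])]
        cycles += 1
    return state in accepting, cycles

DELTA2 = make_stride2(DELTA, ALPHABET)
print(run_stride2(DELTA, DELTA2, START, ACCEPTING, "abab"))
```

Because the same input is processed in about half as many steps, a design of this kind can keep its throughput at a lower clock frequency, and dynamic power in CMOS logic falls roughly in proportion to clock frequency.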

computer and information technology | 2010

Distributed On-Chip Operating System for Network on Chip

Wei Hu; Jianliang Ma; Binbin Wu; Lihan Ju; Tianzhou Chen

Network on Chip (NoC) has been proposed as a promising solution for processors with many cores integrated onto a single chip. The main advantages of NoC are favorable scalability and high bandwidth for on-chip cores and communication. However, operating systems designed for NoC have not been fully researched to date. Because a microkernel operating system is composed of modules, its architecture is well suited to execution on many-core hardware. In this paper, a methodology is proposed to design and implement a microkernel-based OS on NoC. The OS is divided into modules that are distributed across the whole network and communicate over the NoC fabric. MINIX 3 has been extended to implement a prototype of the OS. Simulation results for real applications demonstrate that the mapping approach strongly affects performance, with the best mapping reducing average latency by up to 43.2% relative to the worst.
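To illustrate why the placement of OS modules matters, the sketch below scores a mapping of modules onto a 2D mesh by traffic-weighted average hop count and compares the best and worst placements by brute force. Hop count is used here only as a stand-in for latency; the module names, the traffic figures, the mesh topology, and the Manhattan-distance routing model are all assumptions for illustration and are not taken from the paper.

```python
from itertools import permutations

def hops(a, b, width):
    """Manhattan distance between tiles a and b on a mesh `width` tiles wide."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def average_hops(mapping, traffic, width):
    """mapping: module -> tile index; traffic: (src, dst) -> message count."""
    total = sum(traffic.values())
    weighted = sum(
        count * hops(mapping[src], mapping[dst], width)
        for (src, dst), count in traffic.items()
    )
    return weighted / total

def best_and_worst(modules, traffic, width, n_tiles):
    """Try every placement (feasible only for tiny meshes) and return the extremes."""
    scores = [
        average_hops(dict(zip(modules, tiles)), traffic, width)
        for tiles in permutations(range(n_tiles), len(modules))
    ]
    return min(scores), max(scores)

# Hypothetical modules and message counts on a 2x2 mesh.
MODULES = ["app", "fs", "pm", "net"]
TRAFFIC = {("app", "fs"): 10, ("app", "pm"): 5, ("fs", "net"): 1}
print(best_and_worst(MODULES, TRAFFIC, width=2, n_tiles=4))
```

Exhaustive search is used only to keep the example short; any realistic mesh size would need a heuristic mapper.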

international conference on algorithms and architectures for parallel processing | 2010

Single thread program parallelism with dataflow abstracting thread

Tianzhou Chen; Xingsheng Tang; Jianliang Ma; Lihan Ju; Guanjun Jiang; Qingsong Shi

Chip multiprocessors (CMPs) offer more benefits than uni-core processors, but they do not suit legacy code well because legacy code was written for uni-core processors. This paper presents a novel thread-level parallelism technique called Dataflow Abstracting Thread (DFAT). DFAT builds a United Dependence Graph (UDG) for the program and decouples a single thread into many threads that can run in parallel on a CMP. DFAT analyzes the program's data, control, and anti-dependences to obtain a dependence graph; the dependences are then combined and annotated with attributes to form the UDG. The UDG determines the execution order of instructions, and according to this order, instructions are assigned to different threads one by one. An algorithm decides how to assign those instructions. DFAT considers both communication overhead and thread balance after the original thread division. Thread communication in DFAT is implemented with a producer-consumer model. DFAT can automatically extract multiple threads from a single thread and can be implemented in a compiler. In our evaluation, we decouple a single thread into at most 8 threads with DFAT, and the results show that decoupling into 4-6 threads yields the best benefit.
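The general idea of deriving dependences and then assigning instructions to threads one by one can be sketched as follows. This is not the DFAT algorithm, which the abstract does not specify: the instruction encoding, the register-level dependence rules, and the greedy cost function that trades load balance against cross-thread communication are all assumptions chosen for illustration, and control dependences are omitted.

```python
def build_dependences(instrs):
    """instrs: list of (dest, sources). Returns deps[i] = earlier indices i depends on.

    Tracks true (read-after-write), anti (write-after-read), and
    output (write-after-write) dependences on named registers.
    """
    deps = [set() for _ in instrs]
    last_write = {}   # register -> index of its most recent writer
    readers = {}      # register -> indices that read it since that write
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_write:
                deps[i].add(last_write[reg])      # true dependence
        deps[i].update(readers.get(dest, ()))      # anti-dependence
        if dest in last_write:
            deps[i].add(last_write[dest])          # output dependence
        for reg in srcs:
            readers.setdefault(reg, set()).add(i)
        last_write[dest] = i
        readers[dest] = set()
    return deps

def assign_threads(deps, n_threads, comm_cost=1):
    """Greedy assignment in program order (a valid topological order).

    Each instruction goes to the thread minimizing current load plus a
    penalty for every dependence that would cross a thread boundary.
    """
    load = [0] * n_threads
    owner = []
    for d in deps:
        def cost(t):
            cross = sum(1 for j in d if owner[j] != t)
            return load[t] + comm_cost * cross
        t = min(range(n_threads), key=cost)
        owner.append(t)
        load[t] += 1
    return owner

# Hypothetical five-instruction program: two independent chains that join.
PROGRAM = [("a", []), ("b", []), ("c", ["a"]), ("d", ["b"]), ("e", ["c", "d"])]
DEPS = build_dependences(PROGRAM)
print(assign_threads(DEPS, n_threads=2))
```

In a scheme like this, every cross-thread dependence would become a producer-consumer hand-off, which is why the cost function penalizes it.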

green computing and communications | 2010

Shared Register File Based ILP for Multicore

Lihan Ju; Wei Hu; Lingxiang Xiang; Tianzhou Chen

With the development of the semiconductor industry, more transistors can be integrated onto a single chip. However, the software programming model cannot meet the parallelism requirements of CMP (Chip Multi Processor) based architectures. Communication between different cores becomes a serious problem and degrades performance. This paper proposes an approach called API (Architecture of Parallelism on Instructions), which scans the source code of programs, analyzes data dependencies, and clusters dependent instructions together. Instructions without dependencies can be issued directly in parallel by different cores. API provides a global register file for the effective execution of programs on CMP chips. We also compared the execution time of API with that of the traditional architecture in our experiments using the SPEC CPU2000 benchmark. The experimental results show that the number of instruction clocks in API is only 49 percent of the original. Moreover, only 4 cores are needed to approach the best performance.
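Grouping instructions by data dependence, so that independent groups can be issued on different cores, can be sketched with a union-find structure. This is a minimal illustration under stated assumptions, not the mechanism used by API: the instruction encoding is hypothetical, and only true (read-after-write) register dependences are considered.

```python
def cluster_by_dependence(instrs):
    """instrs: list of (dest, sources). Returns sorted clusters of instruction indices.

    Two instructions land in the same cluster when one reads a register
    that the other wrote, directly or through a chain of such reads.
    """
    parent = list(range(len(instrs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    last_write = {}  # register -> index of its most recent writer
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_write:
                parent[find(i)] = find(last_write[reg])
        last_write[dest] = i

    clusters = {}
    for i in range(len(instrs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical program with two independent dependence chains.
PROGRAM = [("r1", []), ("r2", []), ("r3", ["r1"]), ("r4", ["r2"]), ("r5", ["r3"])]
print(cluster_by_dependence(PROGRAM))
```

Each resulting cluster could be issued on its own core, while instructions inside a cluster still have to respect their internal order.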

computer and information technology | 2010

Global Register Alias Table: Executing Sequential Program on Multi-Core

Chunhao Wang; Lihan Ju; Di Wu; Lingxiang Xiang; Wei Hu; Tianzhou Chen

Executing sequential programs on multi-core processors is crucial for accommodating instruction-level parallelism (ILP) in chip multiprocessor (CMP) architectures. One widely used method of steering instructions across cores is based on dependency. However, this method requires a sophisticated steering mechanism and incurs considerable hardware complexity and area overhead. This paper presents the Global Register Alias Table (GRAT), a structure that can be used in CMP architectures to facilitate sequential program execution across cores. The GRAT also greatly reduces the area and complexity of steering instructions without requiring additional programming effort or compiler support. In our evaluation, our design performs within 5.9% of Core Fusion, a recent proposal that requires a complex steering unit.

Archive | 2009