Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Weixing Ji is active.

Publication


Featured research published by Weixing Ji.


International Workshop on Computer Architecture for Machine Perception | 2007

A Traffic-Aware Energy Efficient Routing Protocol for Wireless Sensor Networks

Feng Shi; Weixing Ji; Baojun Qiao; Bin Liu; H. ul Rashid

This paper introduces an online, load-balanced, energy-aware routing protocol for large-scale wireless sensor networks. The protocol, named the traffic-aware energy efficient (TAEE) routing protocol, exploits traffic load information in addition to residual power levels to optimize the load distribution across the entire sensor network and thus achieve a longer network lifetime. An algorithm for adaptively computing the best parameter for TAEE is also described. Furthermore, to better accommodate larger-scale wireless sensor networks, the TAEE protocol can be extended with a random grouping scheme that implements hierarchical routing to reduce computation and routing overhead while maintaining energy efficiency. Our simulations show that TAEE delivers better performance in terms of network lifetime than the leading power-aware max-min zPmin protocol.
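
The abstract does not give TAEE's cost function, so the following is only a minimal Python sketch of the underlying idea: pick the next hop by weighing residual energy against queued traffic. The Neighbor fields, the cost model, and the weight alpha are illustrative assumptions, not the protocol's actual formula.

from dataclasses import dataclass

@dataclass
class Neighbor:
    node_id: int
    residual_energy: float   # remaining energy (joules)
    queued_traffic: float    # pending packets at this neighbor
    link_cost: float         # transmission energy to reach this neighbor

def pick_next_hop(neighbors, alpha=0.5):
    """Prefer neighbors with high residual energy and low traffic load.

    cost = link_cost / residual_energy + alpha * queued_traffic
    (illustrative cost model, not the one derived in the paper)
    """
    return min(neighbors,
               key=lambda n: n.link_cost / max(n.residual_energy, 1e-9)
                             + alpha * n.queued_traffic)

if __name__ == "__main__":
    candidates = [
        Neighbor(1, residual_energy=0.8, queued_traffic=5.0, link_cost=0.02),
        Neighbor(2, residual_energy=0.3, queued_traffic=1.0, link_cost=0.01),
        Neighbor(3, residual_energy=0.9, queued_traffic=0.5, link_cost=0.03),
    ]
    print(pick_next_hop(candidates).node_id)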


Real-Time Systems Symposium | 2007

Performance Evaluation of a Self-Maintained Memory Module

Weixing Ji; Feng Shi; Baojun Qiao; Qi Zuo; Caixia Liu

A hardware approach has emerged as one of the candidates for improving the performance of dynamic memory management. This paper presents measurements of a self-maintained memory module subjected to several different workloads. The module supports explicit dynamic memory management and takes advantage of the high speed of a pure hardware implementation; object allocation and deletion are strictly bounded in time. The whole heap space is divided into two semi-spaces, and a concurrent bidirectional memory compaction algorithm is employed so that memory compaction can proceed while the mutator process runs on the processor. The reported measurements demonstrate that hardware-assisted memory management is a viable alternative to traditional explicit memory management techniques. Experimental results show that more than 60% of memory traffic is saved by the proposed memory compaction scheme compared to a software-only approach, and both processor delay and program execution time are greatly reduced.
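
As a rough illustration of the semi-space organization described above, here is a toy, software-only model of copying live objects contiguously into the other semi-space; the real module does this in hardware, concurrently with the mutator, and bidirectionally. The class and field names are invented for this sketch.

class ToyHeap:
    def __init__(self, size):
        self.size = size
        self.from_space = {}   # addr -> (length, live?)
        self.to_space = {}
        self.forwarding = {}   # old addr -> new addr

    def allocate(self, addr, length, live=True):
        self.from_space[addr] = (length, live)

    def compact(self):
        """Copy live objects contiguously into the other semi-space."""
        next_free = 0
        for addr in sorted(self.from_space):
            length, live = self.from_space[addr]
            if live:
                self.to_space[next_free] = (length, True)
                self.forwarding[addr] = next_free
                next_free += length
        self.from_space, self.to_space = self.to_space, {}
        return next_free  # first free address after compaction

if __name__ == "__main__":
    h = ToyHeap(1024)
    h.allocate(0, 64); h.allocate(64, 32, live=False); h.allocate(96, 128)
    print(h.compact())      # live data is now contiguous
    print(h.forwarding)     # where each surviving object moved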


Conference on Industrial Electronics and Applications | 2007

A New Hierarchical Interconnection Network for Multi-core Processor

Baojun Qiao; Feng Shi; Weixing Ji

On-chip communication architectures can have a great influence on the speed and area of multi-core processor (MCP) designs. A new chip design paradigm called network-on-chip (NOC) offers a promising interconnection architecture for future MCPs. A new on-chip interconnection network named the Triple-based Hierarchical Interconnection Network (THIN) is proposed, which aims to decrease the node degree, reduce the number of links, and shorten the diameter. The topology of THIN is very simple, and it is clearly hierarchical, symmetric, and scalable. THIN applies a hierarchical address-encoding scheme that makes the design of the routing algorithm simple and efficient. The network properties are studied and compared with the 2-D mesh. The results show that THIN is a better candidate than the 2-D mesh for constructing an NOC when the number of cores is not too large.
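
The hierarchical address-encoding idea can be illustrated with a small sketch. Assuming, purely for illustration, that each node is addressed by base-3 digits, one per hierarchy level, routing only needs to find the highest level at which the source and destination addresses diverge. This is not THIN's actual routing algorithm, just a sketch of why such an encoding keeps routing simple.

def thin_address(node_id, levels):
    """Encode node_id as `levels` base-3 digits, most significant first."""
    digits = []
    for _ in range(levels):
        digits.append(node_id % 3)
        node_id //= 3
    return tuple(reversed(digits))

def divergence_level(src, dst):
    """Topmost hierarchy level where the two addresses differ (0 = same node)."""
    for i, (a, b) in enumerate(zip(src, dst)):
        if a != b:
            return len(src) - i
    return 0

if __name__ == "__main__":
    src, dst = thin_address(5, levels=3), thin_address(22, levels=3)
    print(src, dst, divergence_level(src, dst))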


Microprocessors and Microsystems | 2011

3D floorplanning of low-power and area-efficient Network-on-Chip architecture

Licheng Xue; Feng Shi; Weixing Ji; Haroon-Ur-Rashid Khan

Network-on-Chip (NoC) architectures have been adopted by chip multi-processors (CMPs) as a flexible solution to the increasing delay in the deep sub-micron regime. However, the shrinking feature size limits the performance of NoCs due to power and area constraints. In this paper, we propose three 3D floorplanning methods for the Triplet-based Hierarchical Interconnection Network (THIN), a new high-performance NoC. The proposed floorplanning methods use both Manhattan and Y-architecture routing so as to improve performance and reduce the power consumption and area requirements of THIN. A cycle-accurate simulator was developed based on the Noxim NoC simulator and the ORION 2.0 energy model. The proposed floorplanning methods show up to 24.69% energy reduction and 8.84% area reduction compared with a 3D mesh. Our analysis concludes that THIN is not only feasible but also a low-power and area-efficient NoC at the physical level.
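
As a point of reference for what a floorplanner optimizes, the sketch below computes a toy 3D wirelength estimate: half-perimeter wirelength (HPWL) under the Manhattan metric plus a per-layer via penalty. The via cost and the function itself are illustrative; the paper's Y-architecture routing and ORION-based energy model are not reproduced here.

def hpwl_3d(pins, via_cost=2.0):
    """pins: iterable of (x, y, layer) positions of one net's endpoints."""
    xs, ys, zs = zip(*pins)
    planar = (max(xs) - min(xs)) + (max(ys) - min(ys))
    vias = via_cost * (max(zs) - min(zs))
    return planar + vias

if __name__ == "__main__":
    net = [(0.0, 0.0, 0), (1.5, 2.0, 1), (0.5, 1.0, 0)]
    print(hpwl_3d(net))   # 1.5 + 2.0 + one layer crossing = 5.5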


Embedded and Real-Time Computing Systems and Applications | 2009

A Novel Adaptive Scratchpad Memory Management Strategy

Ning Deng; Weixing Ji; Jiaxin Li; Feng Shi; Yizhuo Wang

Scratchpad memory (SPM) is a fast, small, software-managed SRAM. Its extensive use in embedded processors is motivated by its power savings, small area, and low access time compared with a cache. However, existing SPM management methods depend heavily on profiling and compilers, and this dependence on the compiler also makes embedded applications hard to port. This paper presents a novel strategy to manage the scratchpad memory without compiler support. Based on memory reference locality, a hardware random-sampling module is adopted to dynamically identify frequently accessed addresses at runtime. The consequent data movement and address redirection are handled by software with the assistance of the memory management unit (MMU). We evaluate our method on 10 typical embedded applications and compare the results to a cache-based reference system. Experimental results show that, on average, our scheme achieves a 33.5% reduction in energy consumption with only a slight performance degradation.
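
The sampling idea can be modeled in a few lines: sample the reference stream with a small probability and promote blocks that accumulate enough samples. The block size, sampling probability, and threshold below are made-up values, and the hardware interrupt is reduced to adding the block to a set.

import random
from collections import Counter

BLOCK = 64          # bytes per block (assumption)
P_SAMPLE = 0.001    # sampling probability (assumption)
THRESHOLD = 4       # samples before a block counts as hot (assumption)

def find_hot_blocks(reference_stream, rng=random.Random(0)):
    samples = Counter()
    hot = set()
    for addr in reference_stream:
        if rng.random() < P_SAMPLE:
            block = addr // BLOCK
            samples[block] += 1
            if samples[block] >= THRESHOLD and block not in hot:
                hot.add(block)   # in hardware: raise an interrupt here
    return hot

if __name__ == "__main__":
    # 90% of accesses hit a small hot region, 10% are scattered elsewhere.
    rng = random.Random(1)
    stream = [rng.choice(range(0, 4096)) if rng.random() < 0.1
              else rng.choice(range(0, 256)) for _ in range(200_000)]
    print(sorted(find_hot_blocks(stream)))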


International Conference on Parallel Processing | 2016

Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU

Akrem Benatia; Weixing Ji; Yizhuo Wang; Feng Shi

The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have recently been proposed for this kernel on GPUs. Since the performance of these sparse formats varies significantly with the sparsity characteristics of the input matrix and the hardware specifications, none of them can be considered the best one to use for every sparse matrix. In this paper, we address the problem of selecting the best representation for a given sparse matrix on a GPU by using a machine learning approach. First, we present some useful and easy-to-compute features for characterizing sparse matrices on GPUs. Second, we use a multiclass Support Vector Machine (SVM) classifier to select the best format for each input matrix. We consider four popular formats (COO, CSR, ELL, and HYB), but our work can be extended to support more sparse representations. Experimental results on two different GPUs (Fermi GTX 580 and Maxwell GTX 980 Ti) show that we achieve more than 98% of the performance possible with a perfect selection.
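
A minimal sketch of the classification step follows, using scikit-learn's multiclass SVC. The feature set and the tiny synthetic training data are illustrative stand-ins; in the paper, labels come from benchmarking each format (COO, CSR, ELL, HYB) on the GPU, and the features are tailored to GPU performance behavior.

import numpy as np
import scipy.sparse as sp
from sklearn.svm import SVC

def matrix_features(m):
    m = m.tocsr()
    nnz_per_row = np.diff(m.indptr)
    return [m.shape[0], m.shape[1], m.nnz,
            nnz_per_row.mean(), nnz_per_row.std(), nnz_per_row.max()]

FORMATS = ["COO", "CSR", "ELL", "HYB"]

if __name__ == "__main__":
    train = [sp.random(200, 200, density=d, random_state=i)
             for i, d in enumerate([0.001, 0.01, 0.05, 0.1] * 5)]
    X = np.array([matrix_features(m) for m in train])
    # Placeholder labels; real labels are the fastest measured format per matrix.
    y = np.array([i % 4 for i in range(len(train))])

    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
    test = sp.random(300, 300, density=0.02, random_state=42)
    print(FORMATS[clf.predict([matrix_features(test)])[0]])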


Journal of Systems Architecture | 2011

Dynamic and adaptive SPM management for a multi-task environment

Weixing Ji; Ning Deng; Feng Shi; Qi Zuo; Jiaxin Li

In this paper, we present a dynamic and adaptive scratchpad memory (SPM) management strategy targeting a multi-task environment. It can be applied to a contemporary embedded processor that maps the physically addressed SPM into a virtual space with the help of an integrated memory management unit (MMU). Based on mass-count disparity, we introduce a hardware memory reference sampling unit (MRSU) that samples the memory reference stream with very low probability. A captured address is treated as belonging to a frequently referenced memory block. The MRSU generates a hardware interrupt, and software then places the identified frequently accessed memory block into the SPM space and modifies the page table so that subsequent accesses to the block are redirected to the SPM. With no dependence on the compiler or profiling information, the proposed strategy is particularly well suited to SPM management in a multi-task environment, where a real-time operating system (RTOS) is usually hosted and memory access behavior cannot be predicted by static analysis or profiling. We evaluate our SPM allocation strategy by running several tasks on a tiny RTOS with preemptive scheduling. Experimental results show that our approach achieves a 10% reduction in energy consumption on average, with 1% performance degradation at runtime, compared with a cache-only reference system.
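
The software half of the mechanism, redirecting a hot block into the SPM by patching the page table, can be sketched with a toy MMU model. The page size, SPM base address, and class names below are assumptions for illustration only.

PAGE = 4096
SPM_BASE = 0x8000_0000     # virtual region backed by SPM (assumption)

class ToyMMU:
    def __init__(self):
        self.page_table = {}          # virtual page -> physical page
        self.spm_next = SPM_BASE

    def map(self, vpage, ppage):
        self.page_table[vpage] = ppage

    def translate(self, vaddr):
        ppage = self.page_table[vaddr // PAGE]
        return ppage * PAGE + vaddr % PAGE

    def on_hot_block(self, vaddr):
        """Interrupt handler: redirect the hot page into the SPM region."""
        vpage = vaddr // PAGE
        spm_page = self.spm_next // PAGE
        self.spm_next += PAGE
        # (a real system would also copy the page contents into SPM here)
        self.page_table[vpage] = spm_page

if __name__ == "__main__":
    mmu = ToyMMU()
    mmu.map(0x10, 0x200)
    print(hex(mmu.translate(0x10_123)))   # DRAM-backed before promotion
    mmu.on_hot_block(0x10_000)
    print(hex(mmu.translate(0x10_123)))   # now maps into the SPM region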


International Conference on Algorithms and Architectures for Parallel Processing | 2007

THIN: a new hierarchical interconnection network-on-chip for SOC

Baojun Qiao; Feng Shi; Weixing Ji

On-chip communication architectures can have a great influence on the speed and area of System-on-Chip (SOC) designs. A new chip design paradigm called Network-on-Chip (NOC) offers a promising architectural choice for future SOCs. Focusing on decreasing the node degree, reducing the number of links, and shortening the diameter, a new NOC named the Triple-based Hierarchical Interconnection Network (THIN) is presented in this paper. The topology of THIN is very simple, and it is clearly hierarchical, symmetric, and scalable. The network properties and zero-load latency are studied and compared with the 2-D mesh and the hypercube. The results show that THIN is superior to the 2-D mesh and the hypercube for constructing an SOC interconnection network when the network size is not very large. A new tree-based multicast routing algorithm for THIN is also proposed, and thorough analyses and experiments based on different multicast implementation schemes are conducted. The results confirm the advantage of our scheme over unicast-based and path-based multicast schemes.
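
To see why a tree-based multicast saves traffic when destinations share hierarchical address prefixes, the sketch below counts distinct address prefixes reached by one shared copy versus one message per destination. The base-3 addressing and the unit-cost-per-level model are assumptions for this sketch, not the paper's routing algorithm or cost model.

def unicast_cost(src, dests):
    # one message per destination, each travelling the full address length
    return sum(len(src) for _ in dests)

def tree_multicast_cost(src, dests):
    # a shared copy travels each distinct address prefix only once
    prefixes = set()
    for d in dests:
        for i in range(1, len(d) + 1):
            prefixes.add(d[:i])
    return len(prefixes)

if __name__ == "__main__":
    src = (0, 0, 0)
    dests = [(2, 1, 0), (2, 1, 1), (2, 2, 0)]   # share the top-level branch
    print(unicast_cost(src, dests), tree_multicast_cost(src, dests))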


Parallel Computing | 2014

An adaptive and hierarchical task scheduling scheme for multi-core clusters

Yizhuo Wang; Yang Zhang; Yan Su; Xiaojun Wang; Xu Chen; Weixing Ji; Feng Shi

Highlights: an adaptive and hierarchical task scheduling scheme (AHS) is proposed; work-sharing is used in conjunction with work-stealing; an initial partitioning is performed with respect to the pattern of task parallelism; a practical implementation of AHS is described; and theoretical, simulation, and experimental studies of AHS are presented. Work-stealing and work-sharing are two basic paradigms for dynamic task scheduling. This paper introduces an adaptive and hierarchical task scheduling scheme (AHS) for multi-core clusters, in which work-stealing and work-sharing are used adaptively to achieve load balancing. Work-stealing has been widely used in task-based parallel programming languages and models, especially on shared-memory systems. However, high inter-node communication costs hinder work-stealing from being applied directly on distributed-memory systems. AHS addresses this issue with the following techniques: (1) initial partitioning, which reduces inter-node task migrations; (2) a hierarchical scheduling scheme, which performs work-stealing inside a node before going across the node boundary and adopts work-sharing to overlap computation and communication at the inter-node level; and (3) hierarchical and centralized control of inter-node task migration, which improves the efficiency of victim selection and termination detection. We evaluated AHS and existing work-stealing schemes on a 16-node multi-core cluster. Experimental results show that AHS outperforms existing schemes by 11-21.4% for the benchmarks studied in this paper.
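
The scheduling policy itself, steal within the node first and fall back to inter-node work-sharing only when the whole node is idle, can be modeled single-threaded in a few lines. The data structures, batch size, and donor-selection rule below are simplified stand-ins rather than the paper's implementation.

from collections import deque

class Node:
    def __init__(self, node_id, n_workers):
        self.node_id = node_id
        self.deques = [deque() for _ in range(n_workers)]

    def steal_local(self, thief):
        """Try to take one task from some other worker on the same node."""
        for victim, dq in enumerate(self.deques):
            if victim != thief and dq:
                return dq.popleft()        # steal from the opposite end
        return None

def get_task(nodes, node_id, worker_id, share_batch=4):
    me = nodes[node_id]
    if me.deques[worker_id]:
        return me.deques[worker_id].pop()  # own deque, LIFO end
    task = me.steal_local(worker_id)
    if task is not None:
        return task
    # Whole node idle: pull a batch from the most loaded remote node (work-sharing).
    donor = max((n for n in nodes if n.node_id != node_id),
                key=lambda n: sum(len(d) for d in n.deques))
    for dq in donor.deques:
        while dq and len(me.deques[worker_id]) < share_batch:
            me.deques[worker_id].append(dq.popleft())
    return me.deques[worker_id].pop() if me.deques[worker_id] else None

if __name__ == "__main__":
    nodes = [Node(0, 2), Node(1, 2)]
    nodes[1].deques[0].extend(range(10))     # all work starts on node 1
    print(get_task(nodes, node_id=0, worker_id=0))   # migrated via sharing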


Design, Automation, and Test in Europe | 2013

A work-stealing scheduling framework supporting fault tolerance

Yizhuo Wang; Weixing Ji; Feng Shi; Qi Zuo

Fault tolerance and load balancing are critical for executing long-running parallel applications on multicore clusters. This paper addresses both by presenting a novel work-stealing task scheduling framework that supports hardware fault tolerance. In this framework, both transient and permanent faults are detected and recovered from at task granularity. We incorporate task-based fault detection and recovery mechanisms into a hierarchical work-stealing scheme to establish the framework, which provides low-overhead fault tolerance and optimal load balancing by fully exploiting task parallelism.
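
Task-granularity recovery can be illustrated with a minimal model in which tasks are pure functions, so a detected fault is handled by re-executing the task, possibly on another worker. Fault detection here is reduced to an injected exception; the paper's mechanisms for detecting transient and permanent hardware faults are not modeled.

import random

class TransientFault(Exception):
    pass

def flaky_worker(task, arg, rng, fault_rate=0.3):
    if rng.random() < fault_rate:
        raise TransientFault("injected fault")
    return task(arg)

def run_with_recovery(task, arg, rng, max_retries=5):
    for attempt in range(max_retries):
        try:
            return flaky_worker(task, arg, rng)
        except TransientFault:
            continue            # re-execute the task (possibly elsewhere)
    raise RuntimeError("task kept failing; treat as a permanent fault")

if __name__ == "__main__":
    rng = random.Random(0)
    results = [run_with_recovery(lambda x: x * x, i, rng) for i in range(8)]
    print(results)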

Collaboration


Dive into Weixing Ji's collaborations.

Top Co-Authors

Feng Shi, Beijing Institute of Technology
Yizhuo Wang, Beijing Institute of Technology
Qi Zuo, Beijing Institute of Technology
Baojun Qiao, Beijing Institute of Technology
Jiaxin Li, Beijing Institute of Technology
Ning Deng, Beijing Institute of Technology
Yujin Gao, Beijing Institute of Technology
Akrem Benatia, Beijing Institute of Technology
Licheng Xue, Beijing Institute of Technology
Mengxiao Liu, Beijing Institute of Technology