Fan Xiaoya
Northwestern University
Publication
Featured research published by Fan Xiaoya.
computer and information technology | 2012
Yao Tao; Gao Deyuan; Fan Xiaoya; Ren Xianglong
A multi-operand adder is an attractive alternative to a network of 2-operand adders for accelerating algorithms that contain many addition operations. This paper presents an improved 3-operand floating-point (FP) adder. First, an internal width for the adder is given that is compatible with IEEE Std 754. Second, a realignment method that processes sticky bits gives the architecture the same accuracy as an FP adder with infinite internal width. Third, a low-cost method for detecting catastrophic cancellation is employed. Several techniques, such as a compound adder and leading zero anticipation (LZA), are used to optimize the architecture. Implementation results show that the proposed architecture has competitive area and delay compared with both a basic 3-operand FP adder and a network of 2-operand FP adders. A small-data-format version of the proposed architecture has been verified by exhaustive testing.
application specific systems architectures and processors | 2013
Yao Tao; Gao Deyuan; Fan Xiaoya; Jari Nurmi
This study presents hardware architectures that perform correctly rounded floating-point (FP) multi-operand addition and dot-product computation, both of which are widely used in fields such as scientific computing, digital signal processing, and 3D graphics. A novel realignment method is proposed to handle catastrophic cancellation and multiple sticky bits. Only one rounding operation is performed in both the proposed FP multi-operand adder and the dot-product computation unit. Implementation results show that the architectures not only produce correctly rounded results, with errors below 0.5 ULP (unit in the last place), but also have lower delay than the traditional network architecture, which uses 2-operand FP adders and multipliers to perform multi-operand addition and dot-product computation.
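The difference between one final rounding and a chain of 2-operand additions can be illustrated in software. The sketch below is not the hardware architecture from the paper; it assumes IEEE 754 double precision (Python's `float`) and uses exact rational arithmetic to stand in for a wide internal datapath. The function names `chained_sum` and `single_rounding_sum` are hypothetical.

```python
from fractions import Fraction
from functools import reduce


def chained_sum(values):
    """Add left to right with 2-operand additions; each addition rounds."""
    return reduce(lambda acc, x: acc + x, values, 0.0)


def single_rounding_sum(values):
    """Accumulate exactly, then round once to the nearest double."""
    exact = sum((Fraction(v) for v in values), Fraction(0))
    return float(exact)


# Each small operand is half an ULP of 1.0, so adding them one at a time
# loses both to round-to-nearest-even, while their exact sum is a full ULP.
operands = [1.0, 2.0 ** -53, 2.0 ** -53]

print(chained_sum(operands) == 1.0)                       # True
print(single_rounding_sum(operands) == 1.0 + 2.0 ** -52)  # True
```

With these operands the chained result is off by a full ULP, whereas the single-rounding result is the exact sum, which is the behavior a correctly rounded multi-operand adder aims for.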
Journal of Semiconductors | 2013
Wang Shaoxi; Wang Mingxin; Fan Xiaoya; Zhang Shengbing; Han Ru
After analyzing the multivariate Cpm method (Chan et al. 1991), this paper presents a spatial multivariate process capability index (PCI) method, which can handle the multivariate off-centered case and may provide a reference for assuring and improving the process quality level while achieving an overall evaluation of process quality. Examples of calculating the multivariate PCI are given, and the experimental results show that the systematic method presented is effective and practical.
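For orientation, the off-centered case is easiest to see in the univariate Cpm index, which penalizes the distance between the process mean and the target. The sketch below computes only that univariate index; it is not the spatial multivariate method of the paper, and the function name and parameters are chosen here for illustration.

```python
import math


def cpm(usl, lsl, mean, std, target):
    """Univariate Cpm: spec width over six times the spread around the target."""
    return (usl - lsl) / (6.0 * math.sqrt(std ** 2 + (mean - target) ** 2))


# Same spread, but the second process is shifted one unit off target.
print(cpm(usl=10.0, lsl=4.0, mean=7.0, std=1.0, target=7.0))
print(cpm(usl=10.0, lsl=4.0, mean=8.0, std=1.0, target=7.0))
```

The centered process scores higher than the shifted one even though both have the same standard deviation, which is the effect an off-center-aware index is meant to capture.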
2007 International Symposium on Integrated Circuits | 2007
Wang Jing; Fan Xiaoya; Zhang Shenbing; Zhang Meng; Wang Hai
This paper describes the design of a fully integrated PC-architecture SoC for industrial control that is capable of executing DOS and x86-compatible binary applications. It is a high-performance, cost-effective SoC that meets the needs of embedded applications. The SoC provides 2 UARTs, 1 ECP/EPP parallel port, a PC104 bus interface, 15 interrupt requests, 3 DMA requests, 3 timers, and 1 watchdog timer. It also supports up to 64 MB of SDRAM and a 32 MB solid-state disk. The SoC integrates a self-designed 486DX4-compatible microprocessor, system control logic, an SDRAM controller, and peripheral control logic. It was designed using a 0.18 μm CMOS standard cell library and consists of about 780 K gates, 10 KB of SRAM, and 350 Kbits of ROM. It runs at 133 MHz with less than 2 W of power consumption.
international conference on signal processing | 2013
Zhang Peng; Fan Xiaoya; Huang Xiaoping
Supporting online debugging is one of the design goals of an SoC. Usually the functional unit and the debugging structure are tightly coupled, so it is hard to reuse the debug structure in other systems. This paper presents an on-chip debug method for SoC bus architectures. The system reuses the On-Chip Bus (OCB) as the transmission path for debugging data and debugs the units in the system through bus accesses issued via the debug interface. To debug the embedded processor, a Debug Support Unit and a Debug Handle Unit are designed. The method suits mainstream SoCs. It offers various debug functions, places few restrictions on the debug interface, and is efficient. An AMBA-architecture SoC with this debug architecture has been implemented, and the debug function has been verified. The experiment indicates that the design satisfies the demands of SoC debugging.
international conference on electronic measurement and instruments | 2007
Wang Jing; Fan Xiaoya; Wang Hai; Yang Ming
With the development of modern architecture and chip integration technology, parallel processing technologies have become mainstream. The increasingly large gap between processor and memory speed has made the design of a high-bandwidth, large-scale cache a key part of a high-performance microprocessor. In this paper, we describe the design of a 16-port data cache, which is 8-way associative and uses a pseudo-LRU replacement policy. Interleaved storage and cross-switch interconnection techniques enable the cache to respond to up to 16 concurrent access requests.
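The abstract does not say which pseudo-LRU variant the cache uses. As a rough illustration, the sketch below assumes the common tree-based scheme, in which one set of an 8-way cache keeps 7 bits and each bit points toward the half that was accessed less recently. The class name `TreePLRU` and its methods are hypothetical.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one cache set (assumed variant, ways = 2**k)."""

    def __init__(self, ways=8):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        # One bit per internal tree node, stored in heap order.
        # 0 means "victim is in the left half", 1 means "in the right half".
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: flip every bit on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1      # accessed left, so point right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0      # accessed right, so point left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(ways=8)
plru.touch(0)
print(plru.victim())  # a way in the half that was not just accessed
```

The tree needs only 7 bits per set instead of a full ordering of 8 ways, which is why this kind of approximation is popular in hardware; the chosen victim is not always the true least recently used way.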
wase international conference on information engineering | 2009
Liu Jun-rui; Chen Ying-tu; Fan Xiaoya; Kang Ji-chang
At present, the speed of communication over PCI is limited by the PCI bus, while the number of memory slots and the memory capacity of computers have increased significantly and memory management has become more advanced. This paper therefore puts forward the Direct Memory Communication (DMC) method. A network interface card (NIC) based on DMC is inserted into a memory slot; data is written into memory as it is written into the NIC, and the user can accomplish direct point-to-point communication between computers by accessing memory. The communication speed is not limited by the PCI bus, and the copy between memory and NIC is omitted. The authors also applied DMC to a high-speed Fibre Channel switch network and have been studying a DMC-NIC based on FB-DIMM DDR2. Experimental results show that this NIC offers higher communication speed, lower communication delay, and simpler operation.
field programmable custom computing machines | 2008
Tian Hangpei; Gao Deyuan; Wei Wu; Fan Xiaoya; Zhu Yian
In a partially reconfigurable system with an online placement algorithm, we try to avoid remapping redundant tasks by caching modules on the reconfigurable area. This paper proposes a strategy named virtual deletion and a low-cost board-level hardware unit named the recycle cache to accomplish this goal. In our strategy, the record of the corresponding module is deleted from the placer and indexed in the recycle cache. If the module is needed by a following task, it can be restored from the reconfigurable area by the recycle cache immediately, without mapping the module again. The recycle cache can shorten the average configuration time of partial reconfiguration without increasing the arithmetic complexity or placement time of the placer. Compared with a large local register file that caches the context of modules, the recycle cache is much smaller and cheaper. Simulation results on large random task sets show that the recycle cache can effectively improve the performance of a partially reconfigurable system.
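The bookkeeping behind virtual deletion can be sketched as a toy software model. This is only an illustration of the idea described above, not the board-level hardware from the paper: the class `ReconfigurableArea`, its fields, and the rule that placing a new module on a region evicts any recycled module stored there are all assumptions made here.

```python
class ReconfigurableArea:
    """Toy model of virtual deletion with a recycle cache (hypothetical names)."""

    def __init__(self):
        self.active = {}    # module -> region, records held by the placer
        self.recycle = {}   # module -> region, virtually deleted modules
        self.mappings = 0   # count of (expensive) module mappings

    def virtual_delete(self, module):
        """Drop the placer record but leave the configuration on the fabric."""
        self.recycle[module] = self.active.pop(module)

    def request(self, module, free_region):
        """Return the region for `module`, mapping it only when unavoidable."""
        if module in self.active:
            return self.active[module]
        if module in self.recycle:
            # Still physically present: restore the record, no remapping.
            region = self.recycle.pop(module)
        else:
            region = free_region
            # Assumption: writing a region destroys any recycled module in it.
            self.recycle = {m: r for m, r in self.recycle.items() if r != region}
            self.mappings += 1
        self.active[module] = region
        return region


area = ReconfigurableArea()
area.request("fir", free_region=0)   # first use: one mapping
area.virtual_delete("fir")
area.request("fir", free_region=1)   # restored from the recycle cache
print(area.mappings)                 # still 1
```

In this model the saving shows up as a mapping count that does not grow when a recently deleted module is requested again, mirroring the shortened average configuration time claimed in the abstract.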
international conference on digital manufacturing & automation | 2012
Wang Mingxin; Wang Shaoxi; Zhang Shengbing; Fan Xiaoya
Process capability ultimately determines the process quality level. By analyzing the process capability index (PCI), process capability can be effectively assured. For multivariate manufacturing processes, tremendous difficulties are often encountered when one attempts to measure process capability by directly extending the univariate approach. This paper presents a modified spatial multivariate PCI method, which can handle the multivariate off-centered case and may provide a reference for assuring and improving the process quality level while achieving an overall evaluation of process quality. Finally, examples of calculating the multivariate PCI are given, and the experimental results show that the systematic method presented is effective and practical.
international conference for young computer scientists | 2008
Hai Yu; Fan Xiaoya
With the continuous downscaling of CMOS technologies, reliability has become a major bottleneck in the evolution of next-generation scaling. Technology trends such as transistor downsizing, the use of new materials, and high-performance computer architecture continue to increase the sensitivity of systems to soft errors. As technologies move into the nanotechnology era and system-on-chip (SoC) designs are widely used in most applications, the issues of soft errors and reliability in complex SoC designs are set to become increasingly challenging. This paper reviews soft errors in SoC designs and then presents a fault-tolerant solution.