Shen Chih Tung
University of Pittsburgh
Publications
Featured research published by Shen Chih Tung.
EURASIP Journal on Advances in Signal Processing | 2006
Raymond R. Hoare; Dara Kusic; Joshua Fazekas; John Foster; Shen Chih Tung; Michael L. McCloud
This paper presents an architecture that combines VLIW (very long instruction word) processing with the capability to introduce application-specific customized instructions and highly parallel combinational hardware functions for the acceleration of signal processing applications. To support this architecture, a compilation and design automation flow is described for algorithms written in C. The key contributions of this paper are as follows: (1) a 4-way VLIW processor implemented in an FPGA, (2) large speedups through hardware functions, (3) a hardware/software interface with zero overhead, (4) a design methodology for implementing signal processing applications on this architecture, and (5) tractable design automation techniques for extracting and synthesizing hardware functions. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA, which was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. Using the MediaBench benchmark suite, we tested our methodology and architecture for accelerating software. Our combined VLIW processor with hardware functions was compared against software executing on a RISC processor, specifically the soft-core embedded Nios II processor. For software kernels converted into hardware functions, we show a hardware performance multiplier of up to times that of software with an average times faster. For the entire application, in which only a portion of the software is converted to hardware, the application runs as much as 30 times faster than the nonaccelerated version, and 12 times faster on average.
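The gap between kernel-level speedup and whole-application speedup follows from Amdahl's law: only the fraction of the original runtime that is moved into hardware gets faster. The sketch below applies that standard formula. It is not the paper's evaluation method, and it assumes the accelerated fraction f and the kernel speedup k are already known. The function name is hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction f of the original runtime is sped up by a factor k.

    Amdahl's law: S = 1 / ((1 - f) + f / k). The un-accelerated part (1 - f)
    is unchanged, so it bounds S at 1 / (1 - f) no matter how large k gets.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction          # software still on the processor
    accelerated = accelerated_fraction / kernel_speedup  # time left for the hardware kernel
    return 1.0 / (remaining + accelerated)
```

For example, `amdahl_speedup(0.9, 100.0)` is about 9.2. Even a very large kernel speedup produces a modest application speedup unless most of the runtime is offloaded, which matches the pattern of large kernel gains alongside smaller application gains.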
Design Automation Conference | 2006
Raymond R. Hoare; Swapna Dontharaju; Shen Chih Tung; Ralph Sprang; Joshua Fazekas; James T. Cain; Marlin H. Mickle
This paper describes an ultra-low-power active RFID tag and its automated design flow. RFID primitives to be supported by the tag are enumerated with RFID macros, and the behavior of each primitive is specified using ANSI C within the template to automatically generate the tag controller. Two power-saving components, a passive transceiver/burst switch and a smart buffer, are presented to save power and increase tag lifetime. Based on a test program, the processors required 183, 43, and 19 µJ per transaction for the StrongARM, XScale, and EISC processors, respectively. Three hardware controllers using a Fusion FPGA, a Coolrunner II CPLD, and an ASIC required 13 nJ, 1.3 nJ, and 0.07 nJ per transaction, respectively.
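The reported per-transaction energies span more than six orders of magnitude, and they are easier to compare as ratios. The small sketch below uses only the figures from the abstract. Two things in it are assumptions: the energy budget passed to `transactions_per_budget` is a hypothetical value, and the estimate counts controller energy only, leaving out transceiver, buffer, and idle energy. The helper names are hypothetical.

```python
# Per-transaction controller energy reported in the abstract, converted to joules.
ENERGY_PER_TRANSACTION_J = {
    "StrongARM": 183e-6,
    "XScale": 43e-6,
    "EISC": 19e-6,
    "Fusion FPGA": 13e-9,
    "Coolrunner II CPLD": 1.3e-9,
    "ASIC": 0.07e-9,
}


def energy_ratio(controller_a: str, controller_b: str) -> float:
    """How many times more energy per transaction controller_a needs than controller_b."""
    return ENERGY_PER_TRANSACTION_J[controller_a] / ENERGY_PER_TRANSACTION_J[controller_b]


def transactions_per_budget(controller: str, budget_joules: float) -> int:
    """Whole transactions a controller-only energy budget can pay for (hypothetical budget)."""
    return int(budget_joules // ENERGY_PER_TRANSACTION_J[controller])
```

For instance, `energy_ratio("EISC", "Coolrunner II CPLD")` shows that the lowest-energy processor still uses roughly four orders of magnitude more energy per transaction than the CPLD controller.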
Field-Programmable Custom Computing Machines | 2006
Raymond R. Hoare; Swapna Dontharaju; Shen Chih Tung; Ralph Sprang; Joshua Fazekas; James T. Cain; Marlin H. Mickle
Current radio frequency identification (RFID) systems generally have long design times and low tolerance to changes in specification. This paper describes a field-programmable, low-power active RFID tag and its associated specification and automated design flow. RFID primitives to be supported by the tag are enumerated with RFID macros, or assembly-like descriptions of the tag operations. From these, the RFID preprocessor generates templates automatically. The behavior of each RFID primitive is specified using ANSI C in the template, and the resulting file is compiled by the RFID compiler. A smart buffer sits between the transceiver and the tag controller to detect whether incoming packets are intended for the tag. This allows the main controller to remain powered down, reducing power consumption. Two system-on-a-chip implementation strategies are presented. The first is a microprocessor-based system for which a C program is automatically generated. The second includes a block of low-power FPGA logic, and the user-supplied RFID logic in ANSI C is automatically converted into combinational VHDL by the RFID compiler. Based on a test program, the processors required 183, 43, and 19 µJ per transaction for the StrongARM, XScale, and EISC processors, respectively. By replacing the processor with a Coolrunner II, the controller energy can be reduced to 1.11 nJ per transaction.
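The smart buffer decides whether a packet is addressed to this tag before waking the controller, and that decision can be sketched as a simple address filter. The sketch below is illustrative only. The packet layout (a 1-byte command followed by a 4-byte big-endian destination ID), the broadcast ID, and the class and method names are all hypothetical assumptions rather than the paper's format.

```python
from dataclasses import dataclass, field

BROADCAST_ID = 0xFFFFFFFF  # hypothetical "all tags" destination address
HEADER_LEN = 5             # hypothetical layout: 1-byte command + 4-byte destination ID


@dataclass
class SmartBuffer:
    """Address filter that holds packets for this tag and counts controller wake-ups."""
    tag_id: int
    pending: list = field(default_factory=list)  # accepted packets awaiting the controller
    wakeups: int = 0                              # times the controller had to be powered up

    def receive(self, packet: bytes) -> bool:
        """Return True if the packet is accepted (controller must wake), else False."""
        if len(packet) < HEADER_LEN:
            return False  # too short to carry an address; drop while the controller sleeps
        dest = int.from_bytes(packet[1:5], "big")
        if dest != self.tag_id and dest != BROADCAST_ID:
            return False  # addressed to another tag; controller stays powered down
        self.pending.append(packet)
        self.wakeups += 1
        return True
```

Only accepted packets increment `wakeups`. The power saving therefore comes from the fraction of traffic that is rejected in the buffer without the controller ever leaving its low-power state.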
International Parallel and Distributed Processing Symposium | 2008
Shen Chih Tung
While RFID is starting to become a ubiquitous technology, the variation between different RFID systems still remains high. This paper describes a design automation flow for fast implementation of the physical layer component of new RFID systems. Physical layer features are described using waveform features, which are used to automatically generate physical layer encoding and decoding hardware blocks. We present automated implementations of five protocols related to RFID, including Manchester encoding for ultra high frequency (UHF) active tags, pulse interval encoding (PIE) for UHF passive tags, and modified Miller encoding for lower frequency RFID tags. We have targeted reconfigurable devices to allow changes in the design and compared these implementations with a standard-cell ASIC target.
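As a behavioral reference for the simplest of these physical layer codes, Manchester encoding represents each data bit as two half-bit levels with a transition in the middle of the bit. The sketch below assumes the IEEE 802.3 polarity, in which 0 is low-then-high and 1 is high-then-low. A particular RFID standard may use the opposite polarity. The function names are hypothetical, and the paper's flow generates hardware blocks rather than Python.

```python
def manchester_encode(bits):
    """Encode each data bit as two half-bit levels (IEEE 802.3 polarity assumed)."""
    symbols = []
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        # 1 -> high then low (falling mid-bit edge); 0 -> low then high (rising edge)
        symbols.extend((1, 0) if b else (0, 1))
    return symbols


def manchester_decode(symbols):
    """Recover data bits from half-bit levels, rejecting pairs with no mid-bit transition."""
    if len(symbols) % 2:
        raise ValueError("symbol stream must have an even length")
    bits = []
    for i in range(0, len(symbols), 2):
        pair = (symbols[i], symbols[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            # (0, 0) or (1, 1) has no mid-bit edge, so it is a coding violation
            raise ValueError(f"invalid Manchester pair at half-bit {i}: {pair}")
    return bits
```

Because every bit contains a transition, the decoder can recover timing from the waveform itself. A missing transition is also detectable as an error, and this is one reason the code suits simple tag receivers.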
ACM Transactions on Design Automation of Electronic Systems | 2008
Swapna Dontharaju; Shen Chih Tung; Leonid Mats; Peter J. Hawrylak; Raymond R. Hoare; James T. Cain; Marlin H. Mickle
While RFID is starting to become a ubiquitous technology, the variation between different RFID systems still remains high. This paper presents several prototyping environments for different components of radio frequency identification (RFID) tags to demonstrate how many of these components can be standardized for many different purposes. We include two active tag prototypes, one based on a microprocessor and the second based on custom hardware. To program these devices, we present a design automation flow that allows RFID transactions to be described in terms of primitives with behavior written in ANSI C code. To save power in active RFID devices, we describe a passive transceiver switch called the “burst switch” and demonstrate how it can be used in a system with a microprocessor or a custom hardware controller. Finally, we present a full RFID system prototyping environment based on real-time spectrum analysis technology currently deployed at the University of Pittsburgh RFID Center of Excellence. Using our prototyping techniques, we show how transactions from multiple standards can be combined and targeted to several microprocessors, including the Microchip PIC, Intel StrongARM and XScale, and AD Chips EISC, as well as several hardware targets, including the Altera Apex, Actel Fusion, Xilinx Coolrunner II, Spartan 3 and Virtex 2, and cell-based ASICs.
Journal of Parallel and Distributed Computing | 2005
Raymond R. Hoare; Zhu Ding; Shen Chih Tung; Rami G. Melhem
This paper introduces a framework for the design, synthesis, and cycle-accurate simulation of parallel computing networks of 128+ processors. In order to accurately characterize the network, we present a bottom-up design methodology in which each component is designed using a hardware description language and synthesized to an FPGA to estimate the performance of the final ASIC implementation. The components are then integrated to form a parallel computing network and simulated using a cycle-accurate simulator, with network traffic described by command files. This enabled us to simulate various switching techniques, three of which are presented in this paper: wormhole switching, circuit switching, and a newly introduced technique called predictive circuit switching. In our experiments, four representative traffic patterns are generated for simulation, and, to show the flexibility of this model, we vary the cable lengths, and thus their latency, for all four test cases. Our results show that this hardware design, synthesis, and cycle-accurate simulation methodology provides a useful method for evaluating design tradeoffs in parallel networks. A non-blocking queue with up to 128 internal queues and a real-time bandwidth scheduler for up to 128 ports were designed in hardware, and their hardware synthesis results are presented. From our network simulation results, we conclude that predictive circuit switching exceeds the performance of packet switching for highly predictable traffic, such as collective communications, and for heavily loaded unpredictable traffic with small packet sizes. As expected, predictive circuit switching significantly underperforms both packet and circuit switching for unpredictable traffic.
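A first-order, contention-free latency model helps show why predictable traffic favors predictive circuit switching, why mispredictions hurt it, and why cable length matters to all three techniques. The sketch below is a simplified textbook-style approximation and is not the paper's cycle-accurate simulator. Its assumptions are that circuit setup costs one probe round trip and that a misprediction costs one extra traversal to release the wrong circuit. All parameter and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathParams:
    hops: int            # D: switches traversed between source and destination
    router_delay: float  # t_r: per-switch routing/arbitration delay, in cycles
    cable_delay: float   # t_w: per-hop cable propagation delay, in cycles
    bandwidth: float     # b: data units delivered per cycle once the path is streaming


def serialization(p: PathParams, msg_len: float) -> float:
    return msg_len / p.bandwidth


def wormhole_latency(p: PathParams, msg_len: float) -> float:
    # Header is routed hop by hop; the message body pipelines behind it.
    return p.hops * (p.router_delay + p.cable_delay) + serialization(p, msg_len)


def circuit_latency(p: PathParams, msg_len: float) -> float:
    # A probe reserves the path and an acknowledgment returns before data is sent.
    setup = 2 * p.hops * (p.router_delay + p.cable_delay)
    return setup + p.hops * p.cable_delay + serialization(p, msg_len)


def predictive_circuit_latency(p: PathParams, msg_len: float, prediction_correct: bool) -> float:
    if prediction_correct:
        # Circuit already exists, so only propagation and serialization remain.
        return p.hops * p.cable_delay + serialization(p, msg_len)
    # Misprediction (assumed cost): release the wrong circuit, then set up the right one.
    teardown = p.hops * (p.router_delay + p.cable_delay)
    return teardown + circuit_latency(p, msg_len)
```

With `PathParams(hops=4, router_delay=2.0, cable_delay=1.0, bandwidth=1.0)` and a 64-unit message, a correct prediction beats wormhole routing, while a misprediction is worse than plain circuit switching. Raising `cable_delay` lengthens every path-crossing term, which is the effect the study explored by varying cable lengths.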
ACM Transactions on Design Automation of Electronic Systems | 2009
Swapna Dontharaju; Shen Chih Tung; James T. Cain; Leonid Mats; Marlin H. Mickle
While RFID has become a ubiquitous technology, there is still a need for RFID systems with different capabilities, protocols, and features depending on the application. This article describes a design automation flow and power estimation technique for fast implementation and design feedback of new RFID systems. Physical layer features are described using waveform features, which are used to automatically generate physical layer encoding and decoding hardware blocks. RFID primitives to be supported by the tag are enumerated with RFID macros, and the behavior of each primitive is specified using ANSI C within the template to automatically generate the tag controller. Case studies implementing widely used standards such as ISO 18000 Part 7 and ISO 18000 Part 6C using this automation technique are presented. The power macromodeling flow demonstrated here is shown to be accurate to within 5% to 10%, while providing results 100 times faster than traditional methods. When certain features of ISO 18000 Part 6C are eliminated, the design flow shows that the power required by the implementation is reduced by nearly 50%.
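A power macromodel estimates energy from a few precharacterized coefficients instead of a full low-level simulation, and this is where its speed advantage comes from. The abstract does not describe the model's form. The sketch below therefore shows one common form, a linear per-operation model, as an assumption. The operation names and coefficient values are hypothetical, and the sketch also shows how the relative error against a reference measurement would be computed.

```python
from typing import Dict


def macromodel_energy(op_counts: Dict[str, int], energy_per_op: Dict[str, float]) -> float:
    """Linear macromodel: sum over operation types of (count x characterized energy)."""
    missing = set(op_counts) - set(energy_per_op)
    if missing:
        raise KeyError(f"no energy coefficient for: {sorted(missing)}")
    return sum(count * energy_per_op[op] for op, count in op_counts.items())


def relative_error(estimate: float, reference: float) -> float:
    """Fractional deviation of the macromodel estimate from a reference measurement."""
    return abs(estimate - reference) / reference


# Hypothetical coefficients (arbitrary energy units) characterized once per operation type.
COEFFS = {"rx_bit": 2.0, "tx_bit": 3.0, "crc_check": 5.0}
```

Dropping an operation type from `op_counts` removes its term from the estimate. This mirrors, at a much coarser level, how eliminating optional protocol features lowers the energy the flow reports.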
IEEE Communications Magazine | 2007
Swapna Dontharaju; Shen Chih Tung; Leo Mats; Justin Panuski; James T. Cain; Marlin H. Mickle
The ISO 18000 Part 6C UHF standard is becoming widely accepted for RFID applications in supply chain management and is driving the development of passive tags. The communication primitives of ISO 18000 Part 6C are significantly different from, and more complex than, those of ISO 18000 Part 7. The complexity of the Part 6C standard makes designing these tags extremely time-consuming and makes reducing their power consumption and silicon area challenging. This article examines various features of the ISO 18000 Part 6C standard and compares it with the ISO 18000 Part 7 standard for active tags in order to evaluate generic interrogator/tag protocol complexity. For a 0.16 µm ASIC implementation, the Query command from ISO 18000 Part 6C is more complex than 10 primitives of the simpler ISO 18000 Part 7 standard.
International Conference on Microelectronics | 2003
Raymond R. Hoare; Shen Chih Tung
Advances in FPGA technology and in design automation tools have changed the way digital electronics are created. Schematic entry of gates is being superseded by hardware description languages (HDLs) and sophisticated synthesis tools. This requires the designer to have a complete understanding of the entire design flow in order to use HDLs efficiently while thinking about the hardware that will be created. This paper reviews three years' worth of experience in teaching a two-semester senior/graduate course sequence on Hardware Design Methodologies using the Mentor Graphics HDL Designer series tools and targeting FPGAs. Currently, all projects use an ARM-embedded FPGA as their target. Given the complexity of these new devices and tools, and only two semesters, we discuss the potential, the limitations and difficulties, and the general sanity of this approach.
Microprocessors and Microsystems | 2007
Raymond R. Hoare; Swapna Dontharaju; Shen Chih Tung; Ralph Sprang; Joshua Fazekas; James T. Cain; Marlin H. Mickle