Yuke Wang
University of Texas at Dallas
Publication
Featured research published by Yuke Wang.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2004
Yingtao Jiang; A. Al-Sheraidah; Yuke Wang; Edwin Hsing-Mean Sha; Jin-Gyun Chung
The 1-bit full adder circuit is a very important component in the design of application-specific integrated circuits. This paper presents a novel low-power multiplexer-based 1-bit full adder that uses 12 transistors (MBA-12T). In addition to reduced transition activity and charge recycling capability, this circuit has no direct connections to the power-supply nodes, leading to a noticeable reduction in short-circuit power consumption. Intensive HSPICE simulation shows that the new adder saves more than 26% in power compared with the conventional 28-transistor CMOS adder, consumes 23% less power than the 10-transistor adders (SERF and 10T), and is 64% faster.
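The abstract does not give the MBA-12T transistor netlist, so the following is only a logic-level sketch of the multiplexer-based idea: a 1-bit full adder built entirely from 2:1 multiplexers, where the propagate signal a XOR b selects the sum and carry sources. The decomposition is a standard textbook one and is an assumption, not the paper's 12-transistor topology; `mux2` and `mux_full_adder` are hypothetical names.

```python
from typing import Tuple


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: passes d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0


def mux_full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Logic-level 1-bit full adder built only from 2:1 multiplexers.

    Returns (sum, carry_out). Behavioral model only, not a transistor netlist.
    """
    p = mux2(a, b, 1 - b)        # propagate = a XOR b (a selects b or NOT b)
    s = mux2(p, cin, 1 - cin)    # sum = p XOR cin
    cout = mux2(p, a, cin)       # p == 0 means a == b, so carry is a; otherwise carry is cin
    return s, cout


# Exhaustive check over all 8 input combinations.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = mux_full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```

Selecting the carry from either an existing input or the incoming carry, rather than computing it through a separate gate network, is the property that multiplexer-based adders exploit.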
Global Communications Conference | 2005
Qiong Zhang; Vinod M. Vokkarane; Yuke Wang; Jason P. Jue
Because optical burst-switched (OBS) networks are bufferless, random burst losses may occur even at low traffic loads. When TCP is implemented at a higher layer over an OBS network, the TCP layer may mistake these random burst losses for network congestion, seriously degrading TCP performance. In this paper, we reduce random burst losses with a burst retransmission scheme in which bursts lost to contention in the OBS network are retransmitted at the OBS layer. OBS-layer retransmission reduces the probability that the TCP layer falsely detects congestion, thereby improving TCP throughput. We analyze the TCP throughput of OBS networks that employ the burst retransmission scheme and develop a simulation model to validate the analytical results. Our simulation results show that an OBS layer with burst retransmission improves TCP throughput by up to a factor of ten over an OBS layer without burst retransmission. This significant improvement arises primarily because the TCP layer triggers fewer timeout-based retransmissions when the OBS retransmission scheme is used.
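To see why OBS-layer retransmission hides losses from TCP, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that every transmission attempt of a burst is lost independently with the same probability p; the paper's analytical TCP throughput model is not reproduced. The names `residual_burst_loss` and `expected_attempts` are hypothetical.

```python
def residual_burst_loss(p: float, max_retx: int) -> float:
    """Loss probability seen above the OBS layer after up to max_retx retransmissions.

    Hypothetical assumption: every transmission attempt is lost independently
    with the same probability p.
    """
    return p ** (max_retx + 1)


def expected_attempts(p: float, max_retx: int) -> float:
    """Mean number of transmissions per burst, counting the original one.

    Attempt i (0-based) is made only if the i earlier attempts were all lost.
    """
    return sum(p ** i for i in range(max_retx + 1))


for retx in range(3):
    print(retx, residual_burst_loss(0.01, retx), expected_attempts(0.01, retx))
```

For example, with p = 0.01 a single retransmission lowers the loss visible to TCP from 10^-2 to 10^-4, at the cost of only about 1% extra offered load.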
Broadband Communications, Networks and Systems | 2005
Qiong Zhang; Vinod M. Vokkarane; Yuke Wang; Jason P. Jue
In this paper, we evaluate the performance of a burst retransmission scheme in which bursts lost due to contention in an OBS network are retransmitted at the OBS layer. The retransmission scheme aims to reduce the burst loss probability in OBS networks. We develop an analytical model for the burst loss probability of an OBS network that uses the retransmission scheme, and we compare the performance of burst retransmission with that of the deflection scheme. Simulation results show that, at moderate traffic loads, the retransmission scheme improves the burst loss probability by up to a factor of four relative to the deflection scheme. Results also show that the retransmission scheme significantly reduces the burst loss probability compared with an OBS network without retransmission.
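The abstract does not spell out the analytical model, so the following is only a hypothetical single-link sketch of how retransmission feeds back into load. It treats one OBS link with W wavelengths as an Erlang loss system (Poisson burst arrivals, Erlang B blocking) and adds retransmitted bursts to the offered load until a fixed point is reached. Independence of blocking across attempts is an assumption, the deflection scheme is not modeled, and `erlang_b` and `loss_with_retransmission` are illustrative names.

```python
def erlang_b(load: float, servers: int) -> float:
    """Erlang B blocking probability, computed with the standard recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return b


def loss_with_retransmission(fresh_load: float, wavelengths: int,
                             max_retx: int, iters: int = 200):
    """Fixed-point estimate for one OBS link that retransmits lost bursts.

    Hypothetical model: Poisson burst arrivals, a link with `wavelengths`
    channels treated as an Erlang loss system, and retransmissions blocked
    independently of earlier attempts. Returns
    (per-attempt blocking probability, residual loss after all retries).
    """
    b = erlang_b(fresh_load, wavelengths)         # start from the no-retry value
    for _ in range(iters):
        # a burst is offered 1 + b + ... + b**max_retx times on average
        offered = fresh_load * sum(b ** i for i in range(max_retx + 1))
        b = erlang_b(offered, wavelengths)
    return b, b ** (max_retx + 1)


no_retx = erlang_b(4.0, 8)
blocking, residual = loss_with_retransmission(4.0, 8, max_retx=1)
print(f"no retransmission: {no_retx:.4f}, one retransmission: {residual:.6f}")
```

Retransmission raises per-attempt blocking slightly (the link carries more load) but cuts the residual loss sharply at moderate loads; at heavy loads the extra offered traffic erodes that gain.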
IEEE Transactions on Signal Processing | 2007
Yuke Wang; Yiyan Tang; Yingtao Jiang; Jin-Gyun Chung; Sang-Seob Song; Myoung-Seob Lim
Memory references in digital signal processors (DSPs) are expensive due to their long latencies and high power consumption. Implementing fast Fourier transform (FFT) algorithms on a DSP involves many memory references to access butterfly inputs and twiddle factors. Conventional FFT implementations make redundant memory references to load identical twiddle factors for butterflies in different stages of the FFT diagram. In this paper, we present novel memory reference reduction methods that minimize the memory references due to twiddle factors when implementing various FFT algorithms on a DSP. The proposed methods first group butterflies with identical twiddle factors from different stages of the FFT diagram and compute them before butterflies with other twiddle factors, and then reduce the number of twiddle factor lookups by exploiting the properties of twiddle factors. Consequently, each twiddle factor is loaded only once, and the number of memory references due to twiddle factors is minimized. We applied the proposed methods to implement the radix-2 DIF FFT algorithm on the TI TMS320C64x DSP. Experimental results show that, compared with the conventional implementation, the proposed methods achieve an average 76.4% reduction in the number of memory references, a 53.5% saving in memory space for twiddle factors, and an average 36.5% reduction in the number of clock cycles to compute the radix-2 DIF FFT on the DSP. Similar performance gains are reported for radix-2 DIT FFT algorithms implemented with the new methods.
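The payoff of grouping butterflies by twiddle factor across stages can be counted directly. This sketch enumerates every butterfly of an N-point radix-2 DIF FFT with its twiddle exponent and compares three hypothetical loading policies: reload per butterfly, load once per stage, and load each distinct factor once. Treating one twiddle fetch as one memory reference, and equating "conventional" with per-butterfly reloading, are assumptions; butterfly data references are ignored, so these counts are not the paper's measured figures. Function names are illustrative.

```python
import math


def dif_twiddle_exponents(n: int):
    """Yield (stage, exponent) for every butterfly of an n-point radix-2 DIF FFT.

    The butterfly's twiddle factor is W_n^exponent.
    """
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    for s in range(int(math.log2(n))):
        span = n >> (s + 1)            # butterfly distance in stage s
        for _ in range(1 << s):        # independent sub-FFTs in stage s
            for k in range(span):
                yield s, k << s        # W_n^(k * 2^s)


def twiddle_load_counts(n: int):
    """Return twiddle fetches under three loading policies."""
    butterflies = list(dif_twiddle_exponents(n))
    per_butterfly = len(butterflies)                   # reload for every butterfly
    per_stage = len({(s, e) for s, e in butterflies})  # load once per stage
    once = len({e for _, e in butterflies})            # load each distinct factor once
    return per_butterfly, per_stage, once


print(twiddle_load_counts(32))
```

Every exponent used in a later DIF stage already appears in stage 0, which is why loading each distinct factor once needs only N/2 fetches instead of (N/2)·log2 N.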
Global Communications Conference | 2005
Yiyan Tang; Lie Qian; Yuke Wang
The explosive growth of 802.11-based wireless LANs has created interest in higher data rates and greater system capacity. Among the IEEE 802.11 standards, 802.11a, which is based on OFDM modulation, was defined to address these high-speed, large-capacity challenges. Hardware implementations are often used to meet the high data-rate requirements of 802.11a. Although software-based solutions are attractive because of their lower cost, shorter development time, and higher flexibility, meeting the 802.11a data rates in software remains a challenge. In this paper, we implement a software-based 802.11a digital baseband transmitter on the TI TMS320C64x DSP. The transmitter operates at all data rates defined in the 802.11a standard and is compatible with the high-rate portions of the 802.11g standard. Two major optimizations are used to achieve the high data rate: 1) parallelizing the scrambler function and 2) concatenating the FEC encoder, puncturing, and interleaver functions. Experimental results show that the optimized software implementation on a single C64x DSP clocked at 1.0 GHz runs at up to 136 Mbit/s, twice as fast as the previous software implementation at the same clock frequency.
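The abstract does not say how the scrambler was parallelized. One common approach, shown here as a hypothetical sketch, is table-driven: the 802.11a scrambler is a 7-bit LFSR with generator S(x) = x^7 + x^4 + 1, so its next 8 output bits and next state depend only on the current state and can be precomputed for all 128 states. The LSB-first bit order within each byte and the names `scramble_serial`, `scramble_bytes`, and `BYTE_TABLE` are assumptions; a C64x implementation would use the DSP's own packed instructions instead.

```python
def scramble_serial(bits, state=0x7F):
    """Bit-serial 802.11a scrambler, generator polynomial S(x) = x^7 + x^4 + 1.

    `state` holds register cells x1..x7 in bits 0..6. Returns (output bits, new state).
    """
    out = []
    for bit in bits:
        fb = ((state >> 3) ^ (state >> 6)) & 1     # x4 XOR x7
        state = ((state << 1) | fb) & 0x7F          # shift; feedback enters x1
        out.append(bit ^ fb)
    return out, state


def _build_byte_table():
    """For each of the 128 states: 8 sequence bits (LSB first) and the next state."""
    table = []
    for st in range(128):
        seq, nxt = scramble_serial([0] * 8, st)    # scrambling zeros exposes the sequence
        mask = sum(b << i for i, b in enumerate(seq))
        table.append((mask, nxt))
    return table


BYTE_TABLE = _build_byte_table()


def scramble_bytes(data: bytes, state=0x7F):
    """Byte-parallel scrambler: one table lookup replaces eight serial steps."""
    out = bytearray()
    for byte in data:
        mask, state = BYTE_TABLE[state]
        out.append(byte ^ mask)
    return bytes(out), state
```

The table costs 128 entries of memory but removes the bit-level dependency chain, which is the main obstacle to running a serial LFSR at high data rates in software.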
International Parallel and Distributed Processing Symposium | 2002
Yingtao Jiang; Ting Zhou; Yiyan Tang; Yuke Wang
In microprocessor-based systems, memory access is expensive due to long latency and high power consumption. In this paper, we present a novel FFT algorithm that reduces the frequency of memory accesses as well as multiplication operations. For an N-point FFT, the algorithm is organized into two distinct sections: (1) The first section computes the butterflies involving twiddle factors $W_N^j$ ($j \neq 0$) through a computation/partitioning scheme similar to Huffman coding. In this section, all butterflies sharing the same twiddle factor are clustered and computed together, which avoids redundant memory accesses to load twiddle factors. (2) The second section computes the remaining $N - 1$ butterflies involving the twiddle factor $W_N^0$ with a register-based breadth-first tree traversal algorithm. This novel twiddle-factor-based FFT was tested on the TI TMS320C62x digital signal processor. The results show that, for a 32-point FFT, the new algorithm yields up to a 20% reduction in clock cycles and an average 30% reduction in memory accesses compared with the conventional DIF FFT.
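As a simplified, hypothetical sketch of twiddle-factor clustering, the iterative radix-2 DIF FFT below reorders each stage so that all butterflies using the same $W_N^e$ run back to back, fetching the factor once per stage, and skips the multiplication for $W_N^0$. Unlike the paper, it clusters only within a stage and does not implement the Huffman-like partitioning or the register-based breadth-first traversal; the function names are illustrative. It also confirms the count of $N - 1$ butterflies that use $W_N^0$.

```python
import cmath
import math


def _bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft_clustered(x):
    """Iterative radix-2 DIF FFT that clusters butterflies by twiddle factor.

    Within each stage, every butterfly using W_N^e is computed back to back,
    so the factor is fetched once per stage. Butterflies using W_N^0 skip the
    multiplication. Returns (spectrum in natural order, number of W_N^0 butterflies).
    """
    a = [complex(v) for v in x]
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    w0_butterflies = 0
    span = n // 2
    while span >= 1:
        stride = n // (2 * span)                  # exponent step for this stage
        for k in range(span):                     # one cluster per twiddle factor
            w = cmath.exp(-2j * math.pi * k * stride / n) if k else 1
            for start in range(0, n, 2 * span):   # all butterflies sharing w
                i, j = start + k, start + k + span
                u, v = a[i], a[j]
                a[i] = u + v
                if k == 0:
                    a[j] = u - v                  # W_N^0 = 1: no multiply needed
                    w0_butterflies += 1
                else:
                    a[j] = (u - v) * w
        span //= 2
    bits = n.bit_length() - 1
    # DIF leaves X[k] at position bit_reverse(k); restore natural order.
    return [a[_bit_reverse(k, bits)] for k in range(n)], w0_butterflies
```

Swapping the loop order (twiddle factor outside, butterfly blocks inside) does not change the arithmetic within a stage, because butterflies in the same stage are independent.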
International Conference on Electronics, Circuits and Systems | 2001
Yuke Wang; Yingtao Jiang; Edwin Hsing-Mean Sha
Multiplication is one of the most critical operations in many computational systems. In this paper, we present an improved architecture for a multiplexer-based multiplication algorithm. Intensive HSPICE simulation shows that, owing to its smaller internal capacitance, the multiplexer-based array multiplier outperforms the modified Booth multiplier by 13% to 26% in both speed and power dissipation. In addition, we demonstrate that area-efficient full adder circuits (SERF and 10T) help reduce the overall routing capacitance, so multipliers built from these adders consume less power. Therefore, a multiplexer-based multiplier that follows the proposed architecture and uses area-efficient full adder circuits is well suited to low-power, high-performance parallel multiplier design.
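The abstract does not describe the improved architecture itself, so this is only a generic illustration of the underlying principle: in an array multiplier, each partial-product row can be produced by a 2:1 multiplexer that selects either 0 or the shifted multiplicand, controlled by one multiplier bit. It is not the paper's algorithm; `mux` and `mux_array_multiply` are hypothetical names, and the integer addition stands in for a row of full adders.

```python
def mux(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer on integer words: d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def mux_array_multiply(a: int, b: int, width: int) -> int:
    """Unsigned width-by-width multiply with mux-selected partial products.

    Row i of the array is either 0 or the multiplicand shifted left by i,
    chosen by bit i of the multiplier.
    """
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    product = 0
    for i in range(width):
        row = mux((b >> i) & 1, 0, a << i)   # partial-product row selection
        product += row                        # in hardware: one row of full adders
    return product
```

Because the rows are summed by full adders, the adder cell's area and capacitance dominate the array, which is why the choice of full adder circuit matters for the multiplier's power.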
Global Communications Conference | 2004
Lie Qian; Anand Krishnamurthy; Yuke Wang; Yiyan Tang; Philippe Dauchy; Alberto Conte
On-line traffic, including conversational calls, videoconference calls, and live video, is becoming an important type of Internet traffic. Because the traces of on-line traffic are not prerecorded, little is known about such traffic in advance, which makes it hard to characterize with existing traffic models such as D-BIND. To anticipate and capture the burstiness of on-line traffic, we introduce a new confidence-level-based statistical bounding interval-length dependent (S-BIND) traffic model and a statistical admission control algorithm based on it, the GammaH-BIND algorithm. Our simulation results show that, with the S-BIND traffic model as input, the GammaH-BIND algorithm achieves the maximum valid network utilization for both low-bursty and high-bursty on-line traffic, which is 50% to 70% higher than the utilization achievable under the D-BIND traffic model.
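D-BIND characterizes a source by a bound on the arrivals seen in any interval of a given length. As a hypothetical sketch of the contrast with a confidence-level-based model, the code below computes that worst-case bound from an observed arrival trace and, alongside it, a statistical variant that takes a quantile of interval arrivals instead of the maximum. Reading "confidence level" as a quantile, and computing both from an observed trace, are assumptions; the GammaH-BIND admission control algorithm is not shown. Function names are illustrative.

```python
import math


def window_sums(arrivals, length):
    """Total arrivals in every contiguous window of `length` slots."""
    assert 1 <= length <= len(arrivals)
    s = sum(arrivals[:length])
    sums = [s]
    for t in range(length, len(arrivals)):
        s += arrivals[t] - arrivals[t - length]   # slide the window by one slot
        sums.append(s)
    return sums


def d_bind(arrivals, lengths):
    """Deterministic bound: worst-case arrivals over any interval of each length."""
    return {L: max(window_sums(arrivals, L)) for L in lengths}


def s_bind(arrivals, lengths, confidence):
    """Hypothetical statistical bound: the `confidence` quantile of interval
    arrivals (nearest-rank method) instead of the maximum."""
    bound = {}
    for L in lengths:
        sums = sorted(window_sums(arrivals, L))
        rank = max(1, math.ceil(confidence * len(sums)))
        bound[L] = sums[rank - 1]
    return bound


trace = [1] * 10 + [10] + [1] * 10        # steady source with one rare burst
print(d_bind(trace, [1, 4]), s_bind(trace, [1, 4], 0.9))
```

A single rare burst inflates the deterministic bound for every interval length, while the quantile-based bound ignores it, which illustrates how a statistical characterization can admit more traffic.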
Global Communications Conference | 2001
Yiyan Tang; Yingtao Jiang; Yuke Wang
This paper presents a label search engine, built from multiple hierarchically configured CAM cores, for multiprotocol label switching (MPLS) over ATM networks. Novel cache-based sorting logic, which follows a linear search algorithm with exponential insertion, is incorporated into each CAM core to exploit the temporal correlation among incoming labels and to perform indirect data sorting, resulting in significant power savings and increased throughput. The search engine, with 1024 data entries, was designed in a 0.18 μm CMOS technology; it runs at 200 MHz and consumes less than 2 W in total.
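The cache-based sorting logic itself is not detailed in the abstract. To illustrate only the general idea of exploiting temporal correlation among labels, this hypothetical software sketch places a small least-recently-used (LRU) cache in front of a large label table and counts how many lookups fall through to the full search. An LRU cache is a familiar stand-in swapped in here, not the paper's linear-search-with-exponential-insertion scheme; `CachedLabelTable` is an illustrative name.

```python
from collections import OrderedDict


class CachedLabelTable:
    """Hypothetical label lookup: a small LRU cache in front of a large table.

    Counts how many lookups had to search the large table.
    """

    def __init__(self, table: dict, cache_size: int):
        self.table = table
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.full_searches = 0

    def lookup(self, label):
        if label in self.cache:
            self.cache.move_to_end(label)       # mark as most recently used
            return self.cache[label]
        self.full_searches += 1                 # miss: search the full table
        value = self.table.get(label)
        self.cache[label] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)      # evict the least recently used label
        return value


engine = CachedLabelTable({i: i * 10 for i in range(1024)}, cache_size=2)
results = [engine.lookup(label) for label in [5, 5, 5, 7, 7, 5, 9, 5]]
print(results, engine.full_searches)
```

When incoming labels repeat in bursts, most lookups hit the small cache, so the large (and power-hungry) search is activated far less often.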
International Symposium on Circuits and Systems | 2003
Yiyan Tang; Lie Qian; Yuke Wang; Yvon Savaria
Memory reference in digital signal processors (DSPs) is among the most costly operations due to its long latency and substantial power consumption. In this paper, we present a new method to minimize memory references due to twiddle factors when implementing any existing fast Fourier transform (FFT) algorithm on DSP processors. The new method builds on the previously proposed twiddle factor reduction method (TFRM) and the twiddle-factor-based butterfly grouping method (TFBBGM). It can compute two butterflies in one stage of any FFT diagram by loading only one twiddle factor. Memory references are further reduced by computing butterflies that share the same twiddle factor at the same time in different stages of the FFT diagram. We applied the new method to implement the radix-2 DIT FFT algorithm on the TI TMS320C64x DSP. While using only 50% of the memory space for storing twiddle factors compared with the conventional DIT FFT implementation, the new method achieves an average 79% reduction in memory references for accessing twiddle factors and a 17.5% reduction in clock cycles.
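The abstract does not state which twiddle-factor property TFRM relies on. One well-known identity that lets a single load serve two butterflies in the same stage is $W_N^{k+N/4} = -j\,W_N^k$ (multiplying by $-j$ only swaps the real and imaginary parts and negates one of them). This hypothetical sketch applies it to the final stage of a radix-2 DIT FFT, fetching only $N/4$ twiddle factors for $N/2$ butterflies; the choice of identity and the name `dit_last_stage` are assumptions.

```python
import cmath
import math


def dit_last_stage(even, odd):
    """Final radix-2 DIT stage combining two N/2-point spectra.

    Loads only W_N^k for k < N/4 and derives W_N^(k+N/4) = -j * W_N^k,
    so each twiddle fetch serves two butterflies. Returns (spectrum, loads).
    """
    half = len(even)
    n = 2 * half
    assert len(odd) == half and n % 4 == 0, "N must be a multiple of 4"
    out = [0j] * n
    loads = 0
    for k in range(n // 4):
        w = cmath.exp(-2j * math.pi * k / n)    # the one twiddle "load"
        loads += 1
        for kk, wk in ((k, w), (k + n // 4, -1j * w)):
            t = wk * odd[kk]
            out[kk] = even[kk] + t              # X[kk]       = E[kk] + W^kk O[kk]
            out[kk + half] = even[kk] - t       # X[kk + N/2] = E[kk] - W^kk O[kk]
    return out, loads
```

Here `even` and `odd` are the N/2-point DFTs of the even- and odd-indexed samples; storing only the first quarter of the twiddle table is consistent with the reduced twiddle-factor memory the abstract reports, though the paper's exact layout is not given.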