Yiyan Tang
Alcatel-Lucent
Publication
Featured research published by Yiyan Tang.
IEEE Transactions on Signal Processing | 2007
Yuke Wang; Yiyan Tang; Yingtao Jiang; Jin-Gyun Chung; Sang-Seob Song; Myoung-Seob Lim
Memory references in digital signal processors (DSP) are expensive due to their long latencies and high power consumption. Implementing fast Fourier transform (FFT) algorithms on DSP involves many memory references to access butterfly inputs and twiddle factors. Conventional FFT implementations require redundant memory references to load identical twiddle factors for butterflies from different stages in the FFT diagrams. In this paper, we present novel memory reference reduction methods to minimize memory references due to twiddle factors for implementing various FFT algorithms on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages in the FFT diagrams and compute them before computing other butterflies with different twiddle factors, and then reduce the number of twiddle factor lookups by taking advantage of the properties of twiddle factors. Consequently, each twiddle factor is loaded only once and the number of memory references due to twiddle factors can be minimized. We have applied the proposed methods to implement the radix-2 DIF FFT algorithm on the TI TMS320C64x DSP. Experimental results show the proposed methods can achieve an average 76.4% reduction in the number of memory references, a 53.5% saving in memory space for twiddle factors, and an average 36.5% reduction in the number of clock cycles to compute radix-2 DIF FFT on DSP compared with the conventional implementation. Similar performance gains are reported for implementing radix-2 DIT FFT algorithms using the new methods.
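The abstract does not spell out the grouping procedure, so the following is only a counting sketch under stated assumptions, not the authors' implementation. It assumes a textbook radix-2 DIF FFT in which stage s uses the twiddle factors W_N^(k * 2^s), and it uses the standard identity W_N^(k + N/4) = -j * W_N^k as one example of a twiddle-factor property. The function names are hypothetical.

```python
import cmath


def twiddle(n, k):
    """Twiddle factor W_n^k = exp(-2*pi*j*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def dif_twiddle_exponents(n):
    """Exponents k of W_n^k needed in each stage of a radix-2 DIF FFT.

    Assumes n is a power of two and n >= 4.
    """
    stages = []
    span, stride = n // 2, 1
    while span >= 1:
        stages.append([k * stride for k in range(span)])
        span //= 2
        stride *= 2
    return stages


def count_twiddle_loads(n):
    """Return (per_stage, grouped, folded) twiddle load counts.

    per_stage: every stage reloads the twiddles it needs.
    grouped:   each distinct twiddle is loaded once across all stages.
    folded:    additionally derive W^(k + n/4) from W^k via -j * W^k.
    """
    stages = dif_twiddle_exponents(n)
    per_stage = sum(len(s) for s in stages)
    distinct = {k for s in stages for k in s}
    grouped = len(distinct)
    folded = len({k % (n // 4) for k in distinct})
    return per_stage, grouped, folded
```

For a 1024-point transform, count_twiddle_loads(1024) gives (1023, 512, 256) under these assumptions. The counts only illustrate why loading each twiddle once helps; they are not the measurements reported in the paper.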
global communications conference | 2005
Yiyan Tang; Lie Qian; Yuke Wang
The explosive growth of 802.11-based wireless LANs has attracted interest in providing higher data rates and greater system capacities. Among the IEEE 802.11 standards, the 802.11a standard, based on the OFDM modulation scheme, has been defined to address high-speed and large-system-capacity challenges. Hardware implementations are often used to meet the high-data-rate requirements of the 802.11a standard. Although software-based solutions are more attractive due to their lower cost, shorter development time, and higher flexibility, it is still a challenge to meet the high-data-rate requirements of 802.11a in software. In this paper, we implement a software-based 802.11a digital baseband transmitter on the TI TMS320C64x DSP. The transmitter can operate over all data rates defined in the 802.11a standard and is compatible with the high-rate portions of the 802.11g standard. Two major optimizations have been used in the software implementation to achieve the high data rate: 1) parallelizing the scrambler function and 2) concatenating the FEC encoder, puncturing, and interleaver functions. Experimental results show that the optimized software implementation on a single C64x DSP with a clock frequency of 1.0 GHz can operate at a maximum of 136 Mbits/s, which is twice as fast as the previous software implementation at the same clock frequency.
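The abstract does not describe how the scrambler was parallelized. As a hedged illustration only, the sketch below contrasts a bit-serial version of the 802.11a frame-synchronous scrambler (generator polynomial x^7 + x^4 + 1) with a variant that computes one 127-bit keystream period up front and then XORs data against it. The precomputed-keystream approach is an assumption chosen for illustration, and the function names are hypothetical.

```python
def scramble_serial(bits, seed=0b1111111):
    """Bit-serial scrambler for x^7 + x^4 + 1.

    seed is a 7-bit integer whose bit 6 is x7 and bit 0 is x1.
    """
    state = seed
    out = []
    for b in bits:
        fb = ((state >> 6) ^ (state >> 3)) & 1  # x7 XOR x4
        out.append(b ^ fb)
        state = ((state << 1) | fb) & 0x7F      # shift in the feedback bit
    return out


def keystream(seed=0b1111111, length=127):
    """Scrambling sequence alone (scramble an all-zero input)."""
    return scramble_serial([0] * length, seed)


def scramble_block(bits, seed=0b1111111):
    """Scramble by XORing against one precomputed keystream period.

    The feedback does not depend on the data, so the sequence repeats
    every 127 bits and can be computed once per seed.
    """
    ks = keystream(seed)
    return [b ^ ks[i % 127] for i, b in enumerate(bits)]
```

Because the keystream is independent of the data, the XOR step has no loop-carried dependency and could be applied a word at a time on a real DSP. Applying either function twice with the same seed returns the original bits.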
global communications conference | 2004
Lie Qian; Anand Krishnamurthy; Yuke Wang; Yiyan Tang; Philippe Dauchy; Alberto Conte
On-line traffic, including conversational calls, videoconference calls, and live video, is becoming an important type of traffic in the Internet. The traffic traces of on-line traffic are not pre-recorded, which means little information on the on-line traffic is known in advance. Hence, on-line traffic is hard to characterize by existing traffic models, such as D-BIND. In order to anticipate and capture the burstiness property of on-line traffic, we introduce a new confidence-level-based statistical bounding interval-length dependent (S-BIND) traffic model and a statistical admission control algorithm, based on the S-BIND traffic model: the GammaH-BIND algorithm. Our simulation results show that by using the S-BIND traffic model as inputs, the GammaH-BIND algorithm can achieve the maximum valid network utilization for both low-bursty and high-bursty on-line traffic, which is 50% to 70% higher than the achievable network utilization under the D-BIND traffic model.
global communications conference | 2001
Yiyan Tang; Yingtao Jiang; Yuke Wang
This paper presents a label search engine, built upon multiple hierarchically configured CAM cores, for multiprotocol label switching over ATM networks. Novel cache-based sorting logic, following a linear search algorithm with exponential insertion, is incorporated into each CAM core to efficiently explore the temporal correlations among incoming labels as well as indirect data sorting, resulting in significant power saving and throughput increase. This search engine with 1024 data entries has been designed using a 0.18 μm CMOS technology running at 200 MHz with total power consumption of less than 2 W.
international symposium on circuits and systems | 2003
Yiyan Tang; Lie Qian; Yuke Wang; Yvon Savaria
Memory reference in digital signal processors (DSP) is among the most costly operations due to its long latency and substantial power consumption. In this paper, we present a new method to minimize memory references due to twiddle factors for implementing any existing fast Fourier transform (FFT) algorithm on DSP processors. The new method takes advantage of the previously proposed twiddle factor reduction method (TFRM) and twiddle-factor-based butterfly grouping method (TFBBGM). It can compute two butterflies in one stage of any FFT diagram by loading only one twiddle factor. Further memory reference reduction is achieved by computing butterflies with the same twiddle factor at the same time in different stages of the FFT diagram. We have applied the new method to implement the radix-2 DIT FFT algorithm on the TI TMS320C64x DSP. While using only 50% of the memory space for storing twiddle factors compared to the conventional DIT FFT implementation, the new method achieves an average 79% reduction in the number of memory references for accessing the twiddle factors and a 17.5% reduction in the number of clock cycles.
international conference on electronics circuits and systems | 2004
Yiyan Tang; Yuke Wang; Jin-Gyun Chung; Sang Seob Song; Myoung-Seob Lim
The memory reference in digital signal processors (DSP) is among the most costly of operations due to its long latency and substantial power consumption. Previously proposed twiddle-factor-based butterfly grouping methods can effectively minimize memory references due to twiddle factors for implementing any existing fast Fourier transform (FFT) algorithms on DSP. However, the performance of its C implementation on DSP is far behind the corresponding TI assembly benchmark for radix-2 DIF FFT due to limitations of the compiler. In this paper, we propose a hand-coded assembly implementation for the radix-2 DIF FFT algorithm with the twiddle-factor-based butterfly grouping method on a TI TMS320C64x DSP. Experimental results show that for 1024-pt radix-2 DIF FFT, our hand-coded assembly implementation is 8 times faster than the C implementation and slightly faster than the TI assembly benchmark while requiring only 50% of memory references due to twiddle factors compared to the TI assembly benchmark.
international conference on communications | 2004
Yiyan Tang; Lie Qian; Bashar Bou-Diab; Anand Krishnamurthy; Gérard Damm; Yuke Wang
In this paper, we present a high-performance implementation for the search function of a graph-based packet classification algorithm, used by networking applications in Internet routers, on the Intel IXP1200 network processor. The implementation uses optimal consolidation of memory reads to reduce the number of expensive SRAM accesses. Also, the implementation inserts instructions after SRAM accesses to hide the memory access latencies and improve processor utilization. Experimental results show the performance of the implemented search function on the IXP1200 using five microengines at 166 MHz can be as high as 1.18 Msps (million searches per second), which satisfies the requirements of packet rates from OC-3 or fast Ethernet and up to OC-12 or Gigabit Ethernet. The methods presented here can also be adapted to other network processors with similar architectures.
global communications conference | 2006
Lie Qian; Yiyan Tang; Yuke Wang; Bashar Bou-Diab; Wladek Olesinski
Multicast is an efficient mechanism for delivering data to multiple receivers. However, conventional multicast routing suffers from a linear scalability problem that impedes the deployment of IP multicast in the Internet. The number of forwarding states maintained in the network layer increases linearly with the number of existing multicast channels in the network. MPLS coexists with many existing network protocols and can provide scalability for how these protocols are used in today's networks. In this paper, we propose a new scalable multicast solution in MPLS networks to reduce the number of forwarding states in the network layer. In previously existing solutions, forwarding states are removed from non-branching routers on multicast trees and packets are label switched by MPLS LSPs in non-branching routers. Our new solution proposes a new algorithm for multicast tree construction and a new channel sharing scheme to further reduce the number of forwarding states in branching routers. Simulation results show that the new solution can achieve 85%-97% reduction in the number of multicast forwarding states in MPLS networks over network layer multicast schemes, and 45%-90% reduction over the existing MPLS multicast tree based multicast.
global communications conference | 2005
Lie Qian; Yiyan Tang; Yuke Wang; Bashar Bou-Diab; Wladek Olesinski
Multicast is a bandwidth-efficient mechanism for delivering the same data to multiple receivers simultaneously. Layered multicast, which is suited for high-bandwidth multimedia traffic like video, may be used to differentiate between receivers of the same multimedia content based on their processing and bandwidth resources. To achieve fair bandwidth allocation in layered multicast, max-min fair allocation solutions have been proposed. However, existing solutions either use non-scalable centralized computation or require core routers to maintain per-session information. Maintaining per-session information in core routers violates the core-stateless principle of the DiffServ architecture, which is proposed as a scalable QoS solution in the Internet. In this paper, we propose a new scalable distributed max-min fair bandwidth allocation algorithm for DiffServ, which runs iteratively and does not maintain per-session information in core routers. The new algorithm is proven capable of achieving max-min fair bandwidth allocation. Through analysis, we show that the new algorithm complies with the core-stateless DiffServ architecture, with O(1) storage complexity in core routers.
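For readers unfamiliar with the allocation target, the sketch below shows what max-min fairness means on a single bottleneck link, using centralized progressive filling. It is not the distributed, core-stateless algorithm proposed in the paper, which the abstract does not detail. The function name and the single-link setting are assumptions made for illustration.

```python
def max_min_fair(capacity, demands):
    """Max-min fair shares of one link's capacity among the given demands.

    Sessions are served in order of increasing demand. A session whose
    demand fits within the current equal share is fully satisfied, and the
    remaining sessions split what is left equally.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)
        smallest = active[0]
        if demands[smallest] <= share:
            # Fully satisfy the smallest demand and redistribute the rest.
            alloc[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            active.pop(0)
        else:
            # No remaining demand fits; everyone left gets the equal share.
            for i in active:
                alloc[i] = share
            break
    return alloc
```

With a capacity of 10 and demands of 2, 2.6, 4, and 5, the first two sessions are fully served and the last two each receive about 2.7. Note that this centralized form needs every session's demand in one place, which is the per-session state the abstract says a core-stateless design avoids.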
Archive | 2003
Gerard Damm; Bashar Said Bou-Diab; Yuke Wang; Yun Zhang; Yiyan Tang; Anand Krishnamurthy; Lie Qian