Mats Torkelson | Researchain

Publication

Featured research published by Mats Torkelson.

international conference on parallel processing | 1996

A new approach to pipeline FFT processor

Shousheng He; Mats Torkelson

A new VLSI architecture for a real-time pipeline FFT processor is proposed. A hardware-oriented radix-2² algorithm is derived by integrating a twiddle factor decomposition technique in the divide-and-conquer approach. The radix-2² algorithm has the same multiplicative complexity as the radix-4 algorithm, but retains the butterfly structure of the radix-2 algorithm. The single-path delay-feedback architecture is used to exploit the spatial regularity in the signal flow graph of the algorithm. For length-N DFT computation, the hardware requirement of the proposed architecture is minimal on both dominant components: log₄N - 1 complex multipliers and N - 1 complex-word data memory. The validity and efficiency of the architecture have been verified by simulation in the hardware description language VHDL.
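
The decomposition named in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware architecture: it assumes the standard radix-2² decimation-in-frequency index split, and the function name `fft_radix22` is hypothetical. Each recursion level covers two butterfly stages; the rotation between them is only ±1 or ±j, and a full complex twiddle multiplication appears once per level.

```python
import cmath

def fft_radix22(x):
    """Recursive radix-2^2 decimation-in-frequency FFT (illustrative sketch).

    The input length is assumed to be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        # Plain radix-2 butterfly for lengths that are odd powers of two.
        return [x[0] + x[1], x[0] - x[1]]
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        for k2 in (0, 1):
            s = k1 + 2 * k2
            sign = -1 if k1 else 1      # first butterfly stage: (-1)^k1
            rot = (-1j) ** s            # trivial rotation: 1, -j, -1 or j
            h = []
            for n3 in range(q):
                a = x[n3] + sign * x[n3 + 2 * q]
                b = x[n3 + q] + sign * x[n3 + 3 * q]
                # The only full complex multiplication in this level.
                tw = cmath.exp(-2j * cmath.pi * n3 * s / n)
                h.append((a + rot * b) * tw)
            sub = fft_radix22(h)        # length-N/4 transform over n3
            for k3 in range(q):
                out[s + 4 * k3] = sub[k3]
    return out
```

In this sketch the twiddle factor equals 1 at the final length-4 level, so for N a power of four only log₄N - 1 levels need a non-trivial multiplication, which matches the multiplier count quoted in the abstract.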

custom integrated circuits conference | 1998

Design and implementation of a 1024-point pipeline FFT processor

Shousheng He; Mats Torkelson

The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2² algorithm. By exploiting the spatial regularity of the new algorithm, a minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in 0.5 µm CMOS technology and takes an area of 40 mm². With a 3.3 V power supply, it can compute 2ⁿ-point (n = 0, 1, ..., 10) complex forward and inverse FFTs in real time with up to 30 MHz sampling frequency. The SQNR is above 50 dB for white noise input.

IEEE Journal of Solid-state Circuits | 1996

A monolithic digital clock-generator for on-chip clocking of custom DSP's

Peter Nilsson; Mats Torkelson

This paper shows a robust and easily implemented clock generator for custom designs. It is a fully digital design suitable for both high-speed clocking and low-voltage applications. This clocking method is digital, and it avoids analog methods like phase locked loops or delay line loops. Instead, the clock generator is based on a ring counter which stops a ring oscillator after the correct number of cycles. Both a 385 MHz clock and a 15 MHz custom DSP application using the on-chip clocking strategy are described. The prototypes have been fabricated in a 0.8 µm standard CMOS process. The major advantages with this clocking method are robustness, small size, low-power consumption, and that it can operate at a very low supply voltage.

custom integrated circuits conference | 1994

FPGA implementation of FIR filters using pipelined bit-serial canonical signed digit multipliers

Shousheng He; Mats Torkelson

A pipelinable bit-serial multiplier using Canonic Signed Digit (CSD) code to represent constant coefficients is introduced. A bit-serial module for a(x ± y)z⁻¹ type computation is further developed. Optimization over discrete power-of-two coefficient space has been retargeted on this type of multiplier to generate coefficients with a minimized number of nonzero bits. This also makes it possible to confine the latency to be equivalent to the data wordlength without causing a large delay in partial product sum propagation. A single-chip FPGA implementation of a full 16-bit 31-tap Hilbert transformer is used as an example to demonstrate the application of the multiplier module with special consideration of FPGA architectures. It is shown that FPGA architecture is an ideal vehicle for bit-serial processing optimized in this way.
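
To make the CSD idea concrete, here is a minimal sketch of how an integer constant can be recoded into canonical signed digits (each digit is -1, 0 or 1, with no two adjacent nonzero digits) and then used to multiply with shifts, adds and subtracts only. It is an illustration under assumptions, not the paper's bit-serial module: the function names `to_csd` and `csd_multiply` are hypothetical, and coefficients are treated as plain integers.

```python
def to_csd(value):
    """Return the canonical signed digits of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 or -1 so that the remaining value is divisible by 4,
            # which forces the next digit to be zero.
            digit = 2 - (value % 4)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value //= 2
    return digits

def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits using shifts, adds, subtracts."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, `to_csd(7)` yields the digits of 8 - 1, so multiplying by 7 needs one subtraction instead of two additions. Fewer nonzero digits mean fewer add or subtract operations per coefficient.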

international symposium on circuits and systems | 2000

A digitally controlled low-power clock multiplier for globally asynchronous locally synchronous designs

Thomas Olsson; Peter Nilsson; T. Meincke; A. Hemam; Mats Torkelson

Partitioning large high-speed globally synchronous ASICs into locally clocked blocks reduces clock skew problems and if handled correctly it also reduces the power consumption. However, to achieve these positive effects, the blocks need on-chip clock generators having properties such as small area and low power consumption. Therefore, a low power, high frequency, small area digitally controlled on-chip clock generator is designed and fabricated using a 0.35 µm process. The clock generator delivers up to 1.15 GHz at 3.3 V supply voltage. At 1 V supply voltage, it delivers up to 92 MHz while consuming 0.16 mW.

IEEE Journal of Solid-state Circuits | 2000

A low logic depth complex multiplier using distributed arithmetic

Anders Berkeman; Viktor Öwall; Mats Torkelson

A combinatorial complex multiplier has been designed for use in a pipelined fast Fourier transform processor. The performance in terms of throughput of the processor is limited by the multiplication. Therefore, the multiplier is optimized to make the input-to-output delay as short as possible. A new architecture based on distributed arithmetic, Wallace-trees, and carry-lookahead adders has been developed. The multiplier has been fabricated using standard cells in a 0.5-µm process and verified for functionality, speed, and power consumption. Running at 40 MHz, a multiplier with input wordlengths of 16+16 times 10+10 bits consumes 54% less power compared to a distributed arithmetic array multiplier fabricated under equal conditions.

IEEE Journal of Solid-state Circuits | 1997

A custom digital intermediate frequency filter for the American mobile telephone system

Peter Nilsson; Mats Torkelson

A digital filter for intermediate frequency filtering in mobile communication systems is presented. The purpose of the work is to show an alternative to the analog filters which are used in most of today's heterodyne receivers. Bit-serial arithmetic is applied on a twelfth-order wave digital lattice filter algorithm. The paper also shows a method for retiming such algorithms. The power consumption in two fabricated prototypes is compared. By customizing the library cells, the power consumption has been reduced significantly. In the low power prototype, the power dissipation is 8 mW using 3 V supply voltage. The prototype is a 10 MIPS design fabricated in a 0.8-µm standard two-metal-layer CMOS process.

international symposium on parallel architectures algorithms and networks | 1994

A systolic array implementation of common factor algorithm to compute DFT

Shousheng He; Mats Torkelson

An extension to the common factor algorithm (CFA) for computing the discrete Fourier transform (DFT), under the condition that the size of the transform is N = M², shows that the input and output data arrays of the transform may have identical index mapping. A simple planar two-dimensional systolic array implementation of the CFA is presented. The systolic array consists of N homogeneous processing elements (PEs). A DFT of size N = M² can be computed in 2M + 1 steps of pipelined operations, achieving the area-time complexity AT² = O(N² log³ N). Asymptotically sub-optimal and without the necessity of complicated index mapping and data shuffling, the proposed approach compares favorably with other existing approaches in realistic VLSI implementation. The architecture also has very good expansibility: a DFT of size 2ᵗN can be computed on 2ᵗ nearest-neighbor connected N-size arrays with reloaded twiddle factors, which makes it more suitable for VLSI implementation of DFTs of various practical sizes.

custom integrated circuits conference | 1996

A complex array multiplier using distributed arithmetic

Shousheng He; Mats Torkelson

The design of an efficient array architecture for the multiplication of complex numbers applying distributed arithmetic is presented. The complex multiplier takes an area just over that of two real multipliers and its speed is almost the same as a single real multiplier. The texture of the design is obtained by an in-depth examination of a real multiplier structure with data in the offset binary representation. Residue error compensation and the functional requirement of various boundary cells, such as negative weight addition, are discussed in detail. A VHDL module with generic parameters has been written and successfully simulated, which enables the complex multiplier module to be included in large designs with the required word-lengths for both operands. A test chip has been implemented with a standard library in a 0.8 µm CMOS process and fabricated.
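
The principle of distributed arithmetic behind such a multiplier can be sketched in a few lines. This is only an illustration under stated assumptions, not the array structure of the paper: it uses two's complement integers rather than the offset binary representation mentioned in the abstract, processes one bit position per loop iteration, and the names `da_inner_product` and `da_complex_multiply` are hypothetical. The product (a + jb)(c + jd) is treated as two inner products with the fixed coefficient pair (c, d), each evaluated from a small lookup table addressed by one bit of a and one bit of b.

```python
def da_inner_product(coeffs, xs, bits):
    """Compute sum(coeffs[k] * xs[k]) by distributed arithmetic.

    xs are two's complement integers that fit in `bits` bits.
    """
    size = len(coeffs)
    # Lookup table: entry `addr` holds the sum of the coefficients
    # selected by the set bits of `addr`.
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << size)]
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k   # gather bit b of every input
        term = lut[addr] << b
        # The sign bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc

def da_complex_multiply(a, b, c, d, bits=16):
    """Return (real, imag) of (a + jb) * (c + jd) using two table-driven sums."""
    real = da_inner_product([c, -d], [a, b], bits)
    imag = da_inner_product([d, c], [a, b], bits)
    return real, imag
```

For instance, `da_complex_multiply(3, -4, 5, 7)` returns `(43, 1)`, matching (3 - 4j)(5 + 7j) = 43 + 1j. No general multiplier is used; the work is table lookups, shifts and additions.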

european solid state circuits conference | 1989

A basic CAD-tool for module generation

Lars Brange; Mats Torkelson

A basic CAD-tool using a simple grid-like floorplan for placement of cells of varying sizes is presented. It is focused on the generation of physical and behavioural descriptions of data paths but is also suitable for random logic, data path controllers containing PLAs, registers, counters etc. and for analog designs containing OPs, capacitors etc. Two different types of floorplans can be chosen: one resembling the standard cell design approach and one resembling the bit-slice approach. By not assembling the cells with abutment, existing general purpose cell libraries can be used. An easily modifiable, interpreted Lisp-like language is used as input. Three design examples are discussed: one data path for a decimation filter, one data path for a Viterbi receiver, and one multiplier.
