Johan Eilert
Linköping University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Johan Eilert.
international symposium on circuits and systems | 2007
Johan Eilert; Di Wu; Dake Liu
Complex matrix inversion is a very computationally demanding operation in advanced multi-antenna wireless communications. Traditionally, systolic array-based QR decomposition (QRD) is used to invert large matrices. However, the matrices involved in MIMO baseband processing in mobile handsets are generally small which means QRD is not necessarily efficient. In this paper, a new method is proposed using programmable hardware units which not only achieves higher performance but also consumes less silicon area. Furthermore, the hardware can be reused for many other operations such as complex matrix multiplication, filtering, correlation and FFT/IFFT.
IEEE Communications Magazine | 2009
Dake Liu; Anders Nilsson; Eric Tell; Di Wu; Johan Eilert
A programmable radio baseband signal processor is one of the essential enablers of software- defined radio. As wireless standards evolve, the processing power needed for baseband processing increases dramatically and the underlying hardware needs to cope with various standards or even simultaneously maintaining several radio links. Meanwhile, the maximum power consumption allowed by mobile terminals is still strictly limited. These challenges require both system and architecture level innovations. This article introduces a design methodology for radio baseband processors discussing the challenges and solutions of radio baseband signal processing. The LeoCore architecture is presented here as an example of a baseband processor design aimed at reducing power and silicon cost while maintaining sufficient flexibility.
international conference on acoustics, speech, and signal processing | 2008
Johan Eilert; Di Wu; Dake Liu
This paper presents a linear minimum mean square error (LMMSE) symbol detector for MIMO-OFDM enabled mobile terminals. The detector is implemented using a programmable baseband processor aimed for software-defined radio (SDR). Owing to the dynamic range supplied by the floating-point SIMD datapath, special algorithms can be adopted to reduce the computational latency of detection. The programmable solution not only supports different transmit/receive antenna configurations, but also allows hardware multiplexing to obtain silicon and power efficiency. Compared to several existing fixed-functional solutions, the one proposed in this paper is smaller, more flexible and faster.
ieee computer society annual symposium on vlsi | 2007
Di Wu; Johan Eilert; Dake Liu; Dandan Wang; Naofal Al-Dhahir; Hlaing Minn
This paper studies the efficient complex matrix inversion for multi-user STBC-MIMO decoding. A novel method called Alamouti blockwise analytical matrix inversion (ABAMI) and its programmable VLSI implementation are proposed for the inversion of (in this context) large complex matrices with Alamouti sub-blocks. Our solution significantly reduces the number of operations which makes it more than 4 times faster than several other solutions in the literature. Furthermore, compared to these fixed function VLSI implementations, our solution is more flexible and consumes less silicon area because the hardware can be reused for many other operations. In addition to the routine analysis of the general computational complexity based on the number of basic operations, the computational latency is also measured in clock cycles based on the conceptual hardware for real-time matrix inversion
multimedia signal processing | 2004
Johan Eilert; Andreas Ehliar; Dake Liu
The purpose of our work has been to evaluate the practicality of using a 16-bit floating point representation to store the intermediate sample values and other data in memory during the decoding of MP3 bit streams. A floating point number representation offers a better trade-off between dynamic range and precision than a fixed point representation for a given word length. Using a floating point representation means that smaller memories can be used which leads to smaller chip area and lower power consumption without reducing sound quality. We have designed and implemented a DSP processor based on 16-bit floating point intermediate storage. The DSP processor is capable of decoding all MP3 bit streams at 20 MHz and this has been demonstrated on an FPGA prototype.
ieee international conference on circuits and systems for communications | 2008
Di Wu; Johan Eilert; Dake Liu
This paper presents the first programmable Lattice- Reduction Aided (LRA) symbol detector for Multiple-Input Multiple-Output (MIMO) and Orthogonal Frequency Division Multiple Access (OFDMA). The detector proposed is implemented using 65 nm ASIC technologies. Owing to the programmability, the detector can be dynamically switched between linear (e.g. MMSE) and lattice-reduction aided (e.g. LRA-MMSE) detectors by simply running another software subroutine. Therefore, it allows a good trade-off between performance and computational latency to be achieved under various scenarios. Along with the hardware, two algorithm simplifications (SCNT-LR and SOT-LR) are proposed for finding subcarriers with ill- conditioned channel matrices. And in the end, interpolated LR (I- LR) is proposed to further reduce the computational complexity for real-time implementations.
international conference on signal processing | 2007
Di Wu; Boonshyang Lim; Johan Eilert; Dake Liu
Although single-chip multiprocessor architectures are available nowadays for embedded computing, programming them with efficiency and productivity has become a significant challenge. This paper studies the multi-level parallelization of video encoding algorithms on a state-of-the-art on-chip multiprocessor. The encoding of H.264/AVC video is chosen as the case to be studied because of its performance demanding and branch-rich features. The final benchmarking result proves that the optimized processing flow can achieve more than 100 operations per cycle in performance which allows a single-chip multiprocessor to encode high resolution video (1920 times 1080) in real-time (30 fps).
Eurasip Journal on Wireless Communications and Networking | 2010
Di Wu; Johan Eilert; Rizwan Asghar; Dake Liu
This paper presents a low-complexity MIMO symbol detector with close-Maximum a posteriori performance for the emerging multiantenna enhanced high-speed wireless communications. The VLSI implementation is based on a novel MIMO detection algorithm called Modified Fixed-Complexity Soft-Output (MFCSO) detection, which achieves a good trade-off between performance and implementation cost compared to the referenced prior art. By including a microcode-controlled channel preprocessing unit and a pipelined detection unit, it is flexible enough to cover several different standards and transmission schemes. The flexibility allows adaptive detection to minimize power consumption without degradation in throughput. The VLSI implementation of the detector is presented to show that real-time MIMO symbol detection of 20 MHz bandwidth 3GPP LTE and 10 MHz WiMAX downlink physical channel is achievable at reasonable silicon cost.
international symposium on system-on-chip | 2009
Di Wu; Johan Eilert; Dake Liu; Anders Nilsson; Eric Tell; Erik Alfredsson
3G evolution towards HSPA (High Speed Packet Access) and LTE (Long-Term Evolution) is ongoing which will substantially increase the throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing such as time and frequency synchronization, IFFT/FFT (up to 2048-p), channel estimation and subcarrier demapping. The throughput and latency requirements of a Category 4 User Equipment (CAT4 UE) is met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ. This brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution.
wireless communications and networking conference | 2008
Qi Wang; Di Wu; Johan Eilert; Dake Liu
Channel state information (CSI) is critical for the overall performance of wireless systems. Meanwhile, the estimation of CSI forms one of the most intensive tasks in radio baseband signal processing. This paper investigates the real-time implementation of channel estimation for MIMO-OFDM systems using programmable hardware aimed for software defined radio. Based on the programmable hardware architecture proposed by us, several prevalent channel estimation methods such as Least Square (LS), Minimum Mean Square Error (MMSE) and Pilot-Symbol-Aided (PSA) are evaluated from both the performance and computational latency perspectives. By utilizing the symmetric feature of the covariance matrix, a simplified two-sided Jacobi rotation method is adopted to speed up the complex-valued singular value decomposition involved in the MMSE channel estimation.