Amanullah Ghazi | Researchain

Publication

Featured research published by Amanullah Ghazi.

asilomar conference on signals, systems and computers | 2013

Mobile transmitter digital predistortion: Feasibility analysis, algorithms and design exploration

Mahmoud Abdelaziz; Amanullah Ghazi; Lauri Anttila; Jani Boutellier; Toni Lahteensuo; Xiaojia Lu; Joseph R. Cavallaro; Shuvra S. Bhattacharyya; Markku J. Juntti; Mikko Valkama

This article addresses intermodulation challenges in carrier aggregation (CA) and multicluster type transmission scenarios in mobile transmitters. In such transmission schemes, emerging in 3GPP LTE-Advanced mobile cellular radio evolution, the spectrum of the signal entering the transmit power amplifier (PA) is non-contiguous, and thus severe intermodulation is created which may violate the spurious emission mask. To satisfy the stringent emission requirements and limits, devices may need to considerably back off their transmit power compared to the nominal value of +23 dBm, but this will reduce the uplink coverage. As an alternative, the feasibility of digital predistortion (DPD) is explored in this article. A DPD solution is developed to control the most critical intermodulation components from the terminal emission mask perspective with reduced complexity compared to conventional DPD solutions. The DPD is based on the Augmented Parallel Hammerstein (APH) architecture, which can handle IQ imbalance and local oscillator leakage in addition to the PA nonlinearity while using simple digital linear estimation techniques. Furthermore, digital design exploration is carried out for the predistortion algorithm, indicating that the needed computational resources are close to what is already available in the most advanced mobile platforms and chipsets on the market.

international conference on acoustics, speech, and signal processing | 2014

Low power implementation of digital predistortion filter on a heterogeneous application specific multiprocessor

Amanullah Ghazi; Jani Boutellier; Mahmoud Abdelaziz; Xiaojia Lu; Lauri Anttila; Joseph R. Cavallaro; Shuvra S. Bhattacharyya; Mikko Valkama; Markku J. Juntti

Power-constrained mobile radio communication transmitters drive their transmit power amplifiers close to their saturation regions, which results in nonlinear intermodulation distortion that is especially harmful in multi-cluster and carrier aggregation transmission scenarios. Digital predistortion is a method for linearizing the transmitter and suppressing the most harmful spurious emissions at the transmitter power amplifier output. This paper describes a programmable implementation of a digital predistortion filter on a heterogeneous Transport Trigger Architecture (TTA) multiprocessor. The predistortion algorithm is based on a parallel Hammerstein polynomial model, and the experimental results show that the proposed programmable architecture is capable of linearizing a 20 MHz LTE carrier in real time with a power consumption that is suitable for mobile devices.
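
As an illustration of the parallel Hammerstein model named in the abstract above, the following is a minimal sketch in plain Python. It assumes the common form in which each odd-order branch applies the static nonlinearity |x|^(p-1) x followed by its own FIR filter, and the branch outputs are summed. The function name, the dictionary layout and the tap values are hypothetical choices for illustration; this is not the TTA implementation described in the paper.

```python
def parallel_hammerstein(x, branch_filters):
    """Apply a parallel Hammerstein model to complex baseband samples.

    x              -- list of complex input samples
    branch_filters -- dict mapping polynomial order p (1, 3, 5, ...) to a
                      list of FIR taps for that branch (assumed layout)
    """
    y = [0j] * len(x)
    for p, taps in branch_filters.items():
        # Static nonlinearity of the branch: |x|^(p-1) * x
        basis = [abs(s) ** (p - 1) * s for s in x]
        # Branch FIR filter, accumulated into the common output
        for n in range(len(x)):
            acc = 0j
            for k, h in enumerate(taps):
                if n - k >= 0:
                    acc += h * basis[n - k]
            y[n] += acc
    return y


# Usage with made-up coefficients: a linear branch plus a weak cubic branch.
samples = [0.5 + 0.5j, 1.0 + 0j, 0.2 - 0.9j]
predistorted = parallel_hammerstein(samples, {1: [1.0], 3: [-0.05, 0.01]})
```

In practice the taps would be estimated from measurements of the power amplifier; the values above are placeholders only.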

international symposium on circuits and systems | 2015

A customized lattice reduction multiprocessor for MIMO detection

Shahriar Shahabuddin; Janne Janhunen; Zaheer Khan; Markku J. Juntti; Amanullah Ghazi

Lattice reduction (LR) is a preprocessing technique for multiple-input multiple-output (MIMO) symbol detection to achieve better bit error rate (BER) performance. In this paper, we propose a customized homogeneous multiprocessor for LR. Each individual core is based on the transport triggered architecture (TTA). We propose a few modifications of the popular LR algorithm, Lenstra-Lenstra-Lovász (LLL), for high throughput. High-level programming is used to implement the control path of the TTA cores, and several special function units are designed to accelerate the program. The multiprocessor takes 187 cycles to reduce a single matrix for LR. The architecture is synthesized on 90 nm technology and takes 405 kgates at 210 MHz.
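
For readers unfamiliar with LLL, the following is a minimal sketch of the textbook real-valued algorithm in plain Python. It assumes an integer basis given as rows, exact rational arithmetic, and the usual parameter delta = 3/4; it recomputes the Gram-Schmidt coefficients from scratch for clarity rather than speed. The throughput-oriented modifications proposed in the paper and the complex-valued channel matrices of MIMO detection are not reproduced here, and all function names are illustrative.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return orthogonalized vectors and the mu coefficients (exact)."""
    ortho = []
    mu = [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = list(b)
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [vi - mu[i][j] * oj for vi, oj in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of an integer basis given as rows."""
    basis = [[Fraction(x) for x in row] for row in basis]
    n = len(basis)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q != 0:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        # Lovasz condition decides whether to advance or swap
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return [[int(x) for x in row] for row in basis]


# Usage: reduce a small 3x3 integer basis (rows are basis vectors).
reduced = lll([[1, -1, 3], [1, 0, 5], [1, 2, 6]])
```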

ieee global conference on signal and information processing | 2015

Multicore execution of dynamic dataflow programs on the distributed application layer

Jani Boutellier; Amanullah Ghazi

Dataflow programming has received increasing attention in the age of multicore computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to highly parallel execution platforms. Through the proposed design flow the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms for efficient execution. The functionality and efficiency of the proposed approach are demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on a multicore platform with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4x higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.

ieee global conference on signal and information processing | 2015

Mobile GPU accelerated digital predistortion on a software-defined mobile transmitter

Kaipeng Li; Amanullah Ghazi; Jani Boutellier; Mahmoud Abdelaziz; Lauri Anttila; Markku J. Juntti; Mikko Valkama; Joseph R. Cavallaro

We present the design exploration and the performance evaluation of a mobile transmitter digital predistortion (DPD) module on a mobile GPU. Digital predistortion is a widely used technique for suppressing the spurious spectrum emission caused by the imperfections of the power amplifier and radio frequency (RF) circuits in a real wireless transmitter. Considering the parallel architecture, numerous computing cores and programmability of GPUs, in this work a DPD design based on the augmented parallel Hammerstein structure is implemented on a mobile GPU integrated in an Nvidia Jetson TK1 mobile development board, targeting a mobile transmitter. The algorithm-level and data-level parallelism are carefully explored for efficient mapping of the DPD algorithm and full utilization of the mobile GPU resources. We analyze the throughput and timing performance of our implementation and verify the functionality of DPD experimentally on a novel software-defined mobile terminal. The results show that our proposed mobile GPU driven digital predistortion design not only achieves real-time high performance, but also offers programmability and reconfigurability for design upgrading and extension.

signal processing systems | 2017

Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters

Kaipeng Li; Amanullah Ghazi; Chance Tarver; Jani Boutellier; Mahmoud Abdelaziz; Lauri Anttila; Markku J. Juntti; Mikko Valkama; Joseph R. Cavallaro

Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges for efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to base stations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate the high efficacy of modern general purpose mobile processors in accelerating DPD processing for a mobile transmitter.

asilomar conference on signals, systems and computers | 2015

Data-parallel implementation of reconfigurable digital predistortion on a mobile GPU

Amanullah Ghazi; Jani Boutellier; Lauri Anttila; Markku J. Juntti; Mikko Valkama

3GPP LTE-A offers new technologies such as non-contiguous carrier allocation for improving radio spectrum utilization. However, implementation of these technologies is challenging because of intermodulation distortion caused by nonlinearity of components. Digital Predistortion (DPD) offers a way of compensating for these nonlinearities by modifying the digital baseband signal. As most consumer-oriented mobile devices are equipped with powerful Graphics Processing Units (GPUs), it has become possible to implement DPD functionality on such devices with no additional hardware cost. In this paper, we propose data-parallel, reconfigurable predistortion and measure its performance on mobile GPUs: Qualcomm Adreno 330 and ARM Mali T628.

signal processing systems | 2013

High-performance programs by source-level merging of RVC-CAL dataflow actors

Jani Boutellier; Amanullah Ghazi; Olli Silvén; Johan Ersfolk

RVC-CAL is a dataflow language that has acquired an ecosystem of sophisticated design tools. Previous works have shown that RVC-CAL-based applications can automatically be deployed to multiprocessor platforms, as well as to hardware descriptions, with high efficiency. However, as RVC-CAL is a concurrent language, code generation for a single processor core requires careful application analysis and scheduling. Although much work has been done in this area, to date no publication has reported that programs generated from RVC-CAL could rival handwritten programs on single-core processors. This paper proposes performance optimization of RVC-CAL applications by actor merging at source code level. The proposed methodology is demonstrated with an IEEE 802.15.4 (ZigBee) transmitter case study. The transmitter baseband software, previously written in C, is rewritten in RVC-CAL and optimized with the proposed methodology. Experiments show that on a VLIW-flavored processor the RVC-CAL-based program achieves the performance of manually written software.

international conference on embedded computer systems architectures modeling and simulation | 2013

Design of a unified transport triggered processor for LDPC/turbo decoder

Shahriar Shahabuddin; Janne Janhunen; Muhammet Fatih Bayramoglu; Markku J. Juntti; Amanullah Ghazi; Olli Silvén

This paper summarizes the design of a programmable processor with transport triggered architecture (TTA) for decoding LDPC and turbo codes. The processor architecture is designed in such a manner that it can be programmed for LDPC or turbo decoding for the purpose of internetworking and roaming between different networks. The standard trellis-based maximum a posteriori (MAP) algorithm is used for turbo decoding. Unlike most other implementations, a supercode-based sum-product algorithm is used for the check node message computation for LDPC decoding. This approach ensures the highest hardware utilization of the processor architecture for the two different algorithms. To our knowledge, this is the first attempt to design a TTA processor for the LDPC decoder. The processor is programmed with a high-level language to meet the time-to-market requirement. The optimization techniques and the usage of the function units for both algorithms are explained in detail. The processor achieves 22.64 Mbps throughput for turbo decoding with a single iteration and 10.12 Mbps throughput for LDPC decoding with five iterations at a clock frequency of 200 MHz.

signal processing systems | 2013

Programmable implementation of zero-crossing demodulator on an application specific processor

Amanullah Ghazi; Jani Boutellier; Jari Hannuksela; Shahriar Shahabuddin; Olli Silvén

The zero-intermediate frequency zero-crossing demodulator (ZIFZCD) is extensively used for demodulating continuous phase frequency shift keying (CPFSK) signals in low-power and low-cost devices. ZIFZCD has previously been implemented as hardwired circuits. Many variations of the ZIFZCD algorithm have been suggested for different modulation methods and channel conditions. To support all these variants, a programmable processor-based implementation of the ZIFZCD is needed. This paper describes a programmable software implementation of ZIFZCD on an application specific processor (ASP). The ASP is based on transport triggered architecture (TTA) and provides an ideal low-power platform for ZIFZCD implementation due to its simplicity. The designed processor operates at a maximum clock frequency of 250 MHz and has a gate count of 134 kGE for a 32-bit TTA processor and 76 kGE for a 16-bit processor. The demodulator has been developed as a part of an open source radio implementation for wireless sensor nodes.
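
The zero-crossing principle behind such a demodulator can be sketched in a few lines of plain Python. The sketch assumes a noise-free zero-IF signal with I and Q sample lists, enough oversampling that I and Q never change sign in the same sample, and a simple majority vote per symbol. At each sign change of I or Q, the sign of the other channel reveals the direction of phase rotation, which maps to the transmitted bit. The function names, the test signal generator and its parameters are hypothetical; the variants and the ASP implementation discussed in the paper are not reproduced.

```python
import math


def sign(v):
    return 1 if v >= 0 else -1


def zero_crossing_demod(i_samples, q_samples, samples_per_symbol):
    """Demodulate binary CPFSK from zero crossings of I and Q.

    Expects N * samples_per_symbol + 1 samples for N symbols, so that
    every symbol contributes samples_per_symbol sample-to-sample steps.
    A symbol with no net rotation evidence is decoded as 0.
    """
    bits = []
    votes = 0
    for n in range(1, len(i_samples)):
        i_prev, i_now = sign(i_samples[n - 1]), sign(i_samples[n])
        q_prev, q_now = sign(q_samples[n - 1]), sign(q_samples[n])
        if i_now != i_prev:
            votes += -i_now * q_now  # I crossing: +1 if counter-clockwise
        if q_now != q_prev:
            votes += q_now * i_now   # Q crossing: +1 if counter-clockwise
        if n % samples_per_symbol == 0:
            bits.append(1 if votes > 0 else 0)
            votes = 0
    return bits


def cpfsk_baseband(bits, samples_per_symbol, mod_index=1.0, phase0=0.1):
    """Generate an ideal zero-IF CPFSK test signal (illustrative only)."""
    step = math.pi * mod_index / samples_per_symbol
    phase = phase0
    i_samples, q_samples = [math.cos(phase)], [math.sin(phase)]
    for bit in bits:
        for _ in range(samples_per_symbol):
            phase += step if bit else -step
            i_samples.append(math.cos(phase))
            q_samples.append(math.sin(phase))
    return i_samples, q_samples


# Usage: round trip of a short bit pattern through the ideal channel.
tx_bits = [1, 0, 0, 1, 1, 0]
i_sig, q_sig = cpfsk_baseband(tx_bits, samples_per_symbol=16)
rx_bits = zero_crossing_demod(i_sig, q_sig, samples_per_symbol=16)
```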
