Murtaza Ali | Researchain

Publication

Featured research published by Murtaza Ali.

international conference on acoustics speech and signal processing | 1998

Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation

Murtaza Ali

This paper describes a novel technique for decorrelating the stereo signals in stereophonic acoustic echo cancellation (AEC) systems. At present, most teleconferencing systems use a single full-duplex audio channel for voice communications. To introduce spatial realism, however, future teleconferencing systems are expected to have more than one channel (at least stereo, with two channels). In stereophonic AEC systems, the correlation between the stereo signals prevents correct identification of the echo path responses. We develop a signal decorrelation technique based on time-varying all-pass filtering of the individual stereo signals. Experiments show that this technique does not affect the perception of the stereo signals, yet allows the echo path responses to be identified correctly.
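
The sketch below gives a rough illustration of time-varying all-pass decorrelation. It runs a first-order all-pass section whose coefficient is slowly modulated by a sinusoid. The abstract does not give the filter structure, so several details are assumptions:
- the first-order form;
- the function name `time_varying_allpass`;
- the modulation parameters `a0`, `depth` and `rate_hz`.

With the coefficient frozen at any instant, the section has exactly unit magnitude response, so the spectrum is left essentially intact. The drifting phase response is what perturbs the inter-channel correlation.

```python
import math


def time_varying_allpass(x, a0=0.5, depth=0.3, rate_hz=0.5, fs=16000.0):
    """First-order all-pass H(z) = (-a + z^-1) / (1 - a z^-1) with a coefficient
    a[n] = a0 + depth * sin(2*pi*rate_hz*n/fs). Hypothetical parameters;
    |a0| + |depth| < 1 keeps every frozen-coefficient section stable."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for n, xn in enumerate(x):
        a = a0 + depth * math.sin(2.0 * math.pi * rate_hz * n / fs)
        # Difference equation of the all-pass: y[n] = -a x[n] + x[n-1] + a y[n-1]
        yn = -a * xn + x_prev + a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


# Hypothetical usage: opposite-sign modulation on the two stereo channels.
left = [math.sin(2 * math.pi * 440 * n / 16000.0) for n in range(1600)]
right = list(left)
left_out = time_varying_allpass(left, depth=0.3)
right_out = time_varying_allpass(right, depth=-0.3)
```

Modulating the two channels with opposite-sign depths is one simple way to make their phase responses drift apart. How the paper actually chooses the modulation is not stated in the abstract.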

international workshop on openmp | 2013

OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip

Eric J. Stotzer; Ajay Jayaraj; Murtaza Ali; Arnon Friedmann; Gaurav Mitra; Alistair P. Rendell; Ian Lintault

The Texas Instruments (TI) Keystone II architecture integrates an octa-core C66X DSP with a quad-core ARM Cortex A15 MPCore processor in a non-cache-coherent shared-memory environment. This System-on-Chip (SoC) offers very high floating-point operations per second (FLOPS) per watt, if used efficiently. This paper reports an initial attempt at developing a bare-metal OpenMP runtime for the C66X multi-core DSP using the Open Event Machine RTOS. It also outlines an extension to OpenMP that allows code to run across both the ARM and the DSP cores simultaneously. Preliminary performance data for OpenMP constructs running on the ARM and DSP parts of the SoC are given and compared with other current processors.

symposium on computer architecture and high performance computing | 2012

Level-3 BLAS on the TI C6678 Multi-core DSP

Murtaza Ali; Eric J. Stotzer; Francisco D. Igual; Robert A. van de Geijn

Digital Signal Processors (DSPs) are commonly employed in embedded systems. Increasing processing needs in cellular base stations, radio controllers, and industrial/medical imaging systems have led to the development of multi-core DSPs and to the inclusion of floating-point operations, while maintaining low power dissipation. The eight-core DSP from Texas Instruments, codenamed TMS320C6678, provides a peak performance of 128 GFLOPS (single precision) and an effective 32 GFLOPS (double precision) for only 10 watts. In this paper, we present the first complete implementation of the Level-3 Basic Linear Algebra Subprograms (BLAS) routines for this DSP and report its performance. These routines are first optimized for a single core and then parallelized over the different cores using OpenMP constructs. The results show that we can achieve about 8 single-precision GFLOPS/watt and 2.2 double-precision GFLOPS/watt for general matrix-matrix multiplication (GEMM). The performance of the rest of the Level-3 BLAS routines is within 90% of the corresponding GEMM routines.
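
The two-stage approach (optimize for one core, then parallelize across cores) can be sketched in Python as a row-panel decomposition of GEMM, C := αAB + βC. Two pieces stand in for the real mechanisms:
- a `ThreadPoolExecutor` plays the role of an OpenMP `parallel for` over row panels;
- the `kb` loop mimics blocking along the inner dimension.

The function names and blocking scheme are illustrative assumptions, not the paper's kernels. Because of Python's global interpreter lock, the threads will not actually speed this up, so the sketch shows only the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm_rows(A, B, C, alpha, beta, row_start, row_end, kb):
    """Single-worker kernel: update rows [row_start, row_end) of
    C := alpha*A*B + beta*C, walking the inner dimension in panels of kb."""
    n = len(B[0])
    k_total = len(B)
    for i in range(row_start, row_end):
        for j in range(n):
            C[i][j] *= beta
    for k0 in range(0, k_total, kb):          # one panel of the inner dimension
        k1 = min(k0 + kb, k_total)
        for i in range(row_start, row_end):
            a_row, c_row = A[i], C[i]
            for k in range(k0, k1):
                aik = alpha * a_row[k]
                b_row = B[k]
                for j in range(n):
                    c_row[j] += aik * b_row[j]


def parallel_gemm(A, B, C, alpha=1.0, beta=0.0, num_cores=8, kb=64):
    """Split the rows of C into one contiguous panel per worker, like an
    OpenMP 'parallel for' over row panels (threads here are only a stand-in)."""
    m = len(A)
    chunk = max(1, -(-m // num_cores))        # ceil(m / num_cores), at least 1
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        jobs = [pool.submit(gemm_rows, A, B, C, alpha, beta, s, min(s + chunk, m), kb)
                for s in range(0, m, chunk)]
        for job in jobs:
            job.result()                      # re-raise any worker exception
    return C
```

Each worker writes a disjoint set of rows of C, so no synchronization is needed beyond the final join. The same property makes an OpenMP row-panel version race-free.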

2012 IEEE Conference on High Performance Extreme Computing | 2012

Synthetic Aperture Radar on low power multi-core Digital Signal Processor

Dan Wang; Murtaza Ali

Commercial off-the-shelf (COTS) components have recently gained popularity in Synthetic Aperture Radar (SAR) applications. The compute capabilities of these devices have advanced to a level where real-time processing of complex SAR algorithms has become feasible. In this paper, we focus on a low-power multi-core Digital Signal Processor (DSP) from Texas Instruments Inc. and evaluate its capability for SAR signal processing. The specific DSP studied here is an eight-core device, codenamed TMS320C6678, that provides a peak performance of 128 GFLOPS (single precision) for only 10 watts. We describe how the basic SAR operations can be implemented efficiently on such a device. Our results indicate that a baseline SAR range-Doppler algorithm takes around 0.25 seconds for a 16 M (4K × 4K) image, achieving real-time performance.
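
Range compression is one core step of a range-Doppler processor. Each received echo is correlated with the transmitted chirp, so that each point target collapses into a narrow peak. The sketch below does this in the time domain for a baseband linear-FM pulse. The chirp parameterization, the target positions and the function names are assumptions. Practical implementations typically use FFT-based fast convolution instead and follow this step with azimuth processing.

```python
import cmath
import math


def lfm_chirp(num_samples, rate):
    """Baseband linear-FM pulse exp(j*pi*rate*n^2); the instantaneous frequency
    rises by `rate` cycles/sample per sample (hypothetical parameterization)."""
    return [cmath.exp(1j * math.pi * rate * n * n) for n in range(num_samples)]


def range_compress(echo, pulse):
    """Matched filter: magnitude of the correlation of the echo with the pulse
    at every lag where the pulse fits entirely inside the echo."""
    profile = []
    for lag in range(len(echo) - len(pulse) + 1):
        acc = 0j
        for n, p in enumerate(pulse):
            acc += echo[lag + n] * p.conjugate()
        profile.append(abs(acc))
    return profile


# Two hypothetical point targets at sample delays 57 and 130.
pulse = lfm_chirp(64, 0.5 / 64)
echo = [0j] * 200
for n, p in enumerate(pulse):
    echo[57 + n] += 0.8 * p
    echo[130 + n] += 0.5 * p
profile = range_compress(echo, pulse)
```

The strongest entry of `profile` sits at the delay of the stronger target. Its height equals the target amplitude times the pulse energy (here 0.8 × 64).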

ieee international conference on high performance computing data and analytics | 2012

Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC

Francisco D. Igual; Murtaza Ali; Arnon Friedmann; Eric J. Stotzer; Timothy Wentz; Robert A. van de Geijn

Take a multicore Digital Signal Processor (DSP) chip designed for cellular base stations and radio network controllers, add floating-point capabilities to support 4G networks, and out of thin air an HPC engine is born. The potential for HPC is clear: it promises 128 GFLOPS (single precision) for 10 watts; it is used in millions of network-related devices and hence benefits from economies of scale; and it should be simpler to program than a GPU. Simply put, it is fast, green, and cheap. But is it easy to use? In this paper, we show how this potential can be applied to general-purpose high-performance computing, more specifically to dense matrix computations, without major changes to existing codes and methodologies, and with excellent performance and power-consumption numbers.

IEEE Transactions on Signal Processing | 2002

Deterministic and iterative solutions to subset selection problems

Mohammed Nafie; Ahmed H. Tewfik; Murtaza Ali

Signal decompositions with overcomplete dictionaries are not unique. We present two new approaches for identifying the sparsest representation of a given signal in terms of a given overcomplete dictionary. The first approach is algebraic: it attempts to solve the problem by generating other vectors that span the space of minimum dimension that includes the signal. Unlike other current techniques, including our proposed iterative technique, this algebraic approach is guaranteed to find the sparsest representation of the signal under certain conditions. For example, we can always find the exact solution if the size of the dictionary is close to the size of the space, or when the dictionary can be represented by a Vandermonde matrix. Although the algebraic technique can also work in high signal-to-noise-ratio cases, the exact solution is only guaranteed in noise-free cases. Our second approach is iterative and can be applied in cases where the algebraic approach cannot be used. This technique is guaranteed to achieve at least a local minimum of the error function representing the difference between the signal and its sparse representation.
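
The abstract does not spell out either algorithm. To make the subset selection problem concrete, the sketch below uses standard greedy matching pursuit: choose a few atoms from an overcomplete dictionary to represent a signal. Matching pursuit is a swapped-in technique and is not the authors' algebraic or iterative method; the function name is hypothetical. The atoms are assumed to have unit norm. Under that assumption each step lowers the residual energy by the square of the inner product it removes, so the residual never grows.

```python
import math


def matching_pursuit(signal, atoms, max_iter=10, tol=1e-9):
    """Greedy matching pursuit. `atoms` must have unit norm. Returns
    (coefficients, residual); stops early once no atom correlates with the
    residual by more than `tol`."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_iter):
        # Pick the atom with the largest |<residual, atom>|.
        best, best_ip = 0, 0.0
        for idx, atom in enumerate(atoms):
            ip = sum(r * a for r, a in zip(residual, atom))
            if abs(ip) > abs(best_ip):
                best, best_ip = idx, ip
        if abs(best_ip) <= tol:
            break
        coeffs[best] += best_ip
        # Subtract the projection; residual energy drops by best_ip**2.
        residual = [r - best_ip * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Overcomplete dictionary in R^3: the standard basis plus one diagonal atom.
s2 = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s2, s2, 0.0]]
coeffs, residual = matching_pursuit([math.sqrt(2.0), math.sqrt(2.0), 0.0], atoms)
```

Here the signal is exactly twice the diagonal atom, so one step recovers the one-term representation. In general, greedy pursuit offers no guarantee of finding the sparsest solution.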

international ultrasonics symposium | 2011

Software-based ultrasound phase rotation beamforming on multi-core DSP

Jieming Ma; Kerem Karadayi; Murtaza Ali; Yongmin Kim

Phase rotation beamforming (PRBF) is a commonly used digital receive beamforming technique. However, due to its high computational requirements, it has traditionally been supported by hardwired architectures (e.g., application-specific integrated circuits (ASICs) or, more recently, field-programmable gate arrays (FPGAs)). In this paper, we investigate the feasibility of supporting software-based PRBF on a multi-core DSP. To alleviate the high computing requirement, analog front-end (AFE) chips that integrate quadrature demodulation in addition to analog-to-digital conversion could be adopted. Under this condition, only delay alignment and phase rotation need to be performed by the DSP, substantially reducing the computational load. We implemented the delay alignment and phase rotation modules on a Texas Instruments C6678 DSP with 8 cores. With a sampling rate of 40 MHz and 2:1 decimation, it takes 200 μs to generate one scanline (2048 samples/scanline) on two cores. With 4 cores, it can support beamforming for 64 channels at 10k scanlines/s, e.g., 200 scanlines/frame at 50 frames/s. The remaining 4 cores can work on back-end processing tasks and applications, e.g., color Doppler or ultrasound elastography. Keywords: ultrasound, beamforming, phase rotation, DSP, programmable, software-based.
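
Once the AFE has done quadrature demodulation, the DSP's remaining work is delay alignment plus phase rotation. The sketch below shows that split on baseband I/Q data:
- each channel is shifted by its delay rounded to whole samples;
- it is then rotated by exp(+j2πf₀τ) to correct the carrier phase;
- finally the channels are summed coherently.

The sketch rests on several assumptions:
- the sign convention;
- the 20 MHz I/Q rate (taken as 40 MHz after 2:1 decimation);
- the 5 MHz center frequency `f0`;
- the function name.

Delays are also held fixed per channel, whereas a real beamformer updates them with depth (dynamic receive focusing).

```python
import cmath
import math


def prbf_beamform(iq_channels, delays_s, fs, f0):
    """Phase rotation beamforming on baseband I/Q channels (sketch).

    For channel k with focusing delay tau_k: delay alignment shifts the data
    by round(tau_k * fs) samples, phase rotation multiplies by
    exp(+j*2*pi*f0*tau_k), and the aligned channels are summed coherently.
    Delays are fixed per channel here, a simplification of dynamic focusing."""
    num_samples = len(iq_channels[0])
    out = [0j] * num_samples
    for samples, tau in zip(iq_channels, delays_s):
        shift = round(tau * fs)                          # coarse delay alignment
        rot = cmath.exp(1j * 2.0 * math.pi * f0 * tau)   # fine phase rotation
        for n in range(num_samples):
            src = n + shift
            if 0 <= src < num_samples:
                out[n] += samples[src] * rot
    return out


# Hypothetical setup: 20 MHz I/Q rate (40 MHz / 2), 5 MHz center frequency,
# four channels whose echoes arrive 0..3 samples late.
fs, f0 = 20e6, 5e6
s = [cmath.exp(1j * 0.1 * n) for n in range(64)]
delays = [m / fs for m in range(4)]
channels = []
for m, tau in enumerate(delays):
    # Demodulated echo: delayed envelope times exp(-j*2*pi*f0*tau).
    ph = cmath.exp(-1j * 2.0 * math.pi * f0 * tau)
    channels.append([s[n - m] * ph if n >= m else 0j for n in range(64)])
beamformed = prbf_beamform(channels, delays, fs, f0)
```

Where all four aligned windows overlap, the output equals four times the common envelope. That is the coherent gain expected from summing four perfectly aligned channels.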

Ultrasonics | 2014

Ultrasound phase rotation beamforming on multi-core DSP

Jieming Ma; Kerem Karadayi; Murtaza Ali; Yongmin Kim

Phase rotation beamforming (PRBF) is a commonly used digital receive beamforming technique. However, due to its high computational requirements, it has traditionally been supported by hardwired architectures (e.g., application-specific integrated circuits (ASICs) or, more recently, field-programmable gate arrays (FPGAs)). In this paper, we investigate the feasibility of supporting software-based PRBF on a multi-core DSP. To alleviate the high computing requirement, analog front-end (AFE) chips that integrate quadrature demodulation in addition to analog-to-digital conversion could be adopted. Under this condition, only delay alignment and phase rotation need to be performed by the DSP, substantially reducing the computational load. We implemented the delay alignment and phase rotation modules on a Texas Instruments C6678 DSP with 8 cores. With a sampling rate of 40 MHz and 2:1 decimation, it takes 200 μs to generate one scanline (2048 samples/scanline) on two cores. With 4 cores, it can support beamforming for 64 channels at 10k scanlines/s, e.g., 200 scanlines/frame at 50 frames/s. The remaining 4 cores can work on back-end processing tasks and applications, e.g., color Doppler or ultrasound elastography.
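
The abstract's throughput figures can be cross-checked with simple arithmetic:
- 200 scanlines/frame at 50 frames/s is 10,000 scanlines/s.
- 200 μs per scanline on a pair of cores is 5,000 scanlines/s per pair.
- Two pairs (4 cores) therefore meet the rate, leaving 4 of the 8 cores free.

The hypothetical helper below encodes that calculation. It assumes throughput scales linearly when core groups are replicated.

```python
import math


def cores_for_realtime(scanlines_per_frame, frames_per_s, us_per_scanline, cores_per_group):
    """Return (required scanline rate, cores needed), assuming each group of
    `cores_per_group` cores sustains one scanline every `us_per_scanline` us
    and that groups can be replicated with linear scaling."""
    line_rate = scanlines_per_frame * frames_per_s      # scanlines/s needed
    group_rate = 1e6 / us_per_scanline                  # scanlines/s per group
    groups = math.ceil(line_rate / group_rate)
    return line_rate, groups * cores_per_group


line_rate, cores = cores_for_realtime(200, 50, 200.0, 2)
spare_cores = 8 - cores   # cores left for back-end processing on the C6678
```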

international ultrasonics symposium | 2010

A lossy compression scheme for pre-beamformer and post-beamformer ultrasound data

Mohamed F. Mansour; Murtaza Ali

We present a framework for lossy compression of ultrasound RF data. The proposed scheme can be deployed either before or after the receive beamformer. It performs a two-dimensional decorrelation in both the lateral and axial directions. The decorrelation in the lateral direction is done using a Karhunen-Loève-like transform, e.g., the DCT or a Hadamard transform. The decorrelation in the axial direction is done using customized orthogonal wavelet packets that are optimized for a particular ultrasound probe. The effectiveness of the compression algorithm for both pre-beamformer and post-beamformer compression is established by evaluating its performance on a set of simulated ultrasound signals.
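
This is a minimal sketch of the lateral stage only, and it rests on three assumptions:
- it uses the Hadamard option named in the abstract;
- it uses a uniform scalar quantizer as the lossy step (the abstract describes neither the quantizer nor any entropy coding);
- it omits the axial wavelet-packet stage.

At each axial sample, the vector of channel values is transformed with an orthonormal fast Walsh-Hadamard transform. That transform is its own inverse, and it requires the number of channels to be a power of two. The function names are hypothetical.

```python
import math


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (its own inverse).
    len(vec) must be a power of two."""
    a = list(vec)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in a]


def compress_frame(frame, step):
    """frame[channel][sample]: decorrelate across channels at each axial
    sample, then quantize each coefficient to an integer index."""
    num_ch, num_samp = len(frame), len(frame[0])
    q = [[0] * num_samp for _ in range(num_ch)]
    for t in range(num_samp):
        coeffs = fwht([frame[c][t] for c in range(num_ch)])
        for c in range(num_ch):
            q[c][t] = round(coeffs[c] / step)
    return q


def decompress_frame(q, step):
    """Dequantize and apply the (self-inverse) lateral transform."""
    num_ch, num_samp = len(q), len(q[0])
    out = [[0.0] * num_samp for _ in range(num_ch)]
    for t in range(num_samp):
        rec = fwht([q[c][t] * step for c in range(num_ch)])
        for c in range(num_ch):
            out[c][t] = rec[c]
    return out


# Hypothetical frame: 8 strongly correlated channels, 16 axial samples.
frame = [[math.sin(0.3 * t + 0.05 * c) for t in range(16)] for c in range(8)]
step = 0.01
rec = decompress_frame(compress_frame(frame, step), step)
```

The transform is orthonormal, so the per-coefficient quantization error of at most `step`/2 carries over unamplified into the reconstruction. For correlated channels, most of the energy lands in the first (all-ones) coefficient, which is what makes the remaining coefficients cheap to code.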

international conference on acoustics, speech, and signal processing | 2016

Vibration parameter estimation using FMCW radar

Lei Ding; Murtaza Ali; Sujeet Patole; Anand G. Dabak

Vibration sensing is essential in many applications. Traditional vibration sensors are contact-based. With the advance of low-cost, highly integrated CMOS radars, another class of non-contact vibration sensors is emerging. In this paper, we present a detailed analysis of obtaining vibration parameters using frequency-modulated continuous-wave (FMCW) radars. We establish the Cramér-Rao lower bounds (CRLB) of the parameter estimation problem and propose an estimation algorithm that achieves the bounds in simulations. This analysis shows that vibration sensing using FMCW radars can easily achieve sub-hertz frequency accuracy and micrometer-level amplitude accuracy.
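
In an FMCW radar, a target's displacement d(t) appears in the phase of its range bin across successive chirps as φ = 4πd/λ. The sketch below simulates that phase for a sinusoidal vibration. It then estimates frequency and amplitude with a plain grid-search periodogram. This is a simple stand-in, not the CRLB-achieving estimator from the paper. The following are all assumptions:
- the 3.9 mm wavelength (roughly a 77 GHz radar);
- treating the chirp repetition rate as the slow-time sampling rate `fs`;
- all function names.

The vibration is also assumed small enough that the phase needs no unwrapping.

```python
import cmath
import math


def simulate_phase(amp_m, freq_hz, fs, n, wavelength_m, phase0=0.0):
    """Slow-time phase of the target's range bin, 4*pi*d/lambda, for a
    displacement d(t) = amp_m * sin(2*pi*freq_hz*t + phase0)."""
    return [4 * math.pi / wavelength_m * amp_m
            * math.sin(2 * math.pi * freq_hz * k / fs + phase0) for k in range(n)]


def estimate_vibration(phase, fs, wavelength_m, f_min, f_max, f_step):
    """Convert phase to displacement, then pick the frequency on a grid that
    maximizes the DFT magnitude; amplitude = 2*|X(f)|/N at that frequency."""
    disp = [p * wavelength_m / (4 * math.pi) for p in phase]
    mean = sum(disp) / len(disp)
    disp = [d - mean for d in disp]           # remove any static offset
    n = len(disp)
    best_f, best_mag = f_min, -1.0
    for i in range(int(round((f_max - f_min) / f_step)) + 1):
        f = f_min + i * f_step
        acc = sum(d * cmath.exp(-2j * math.pi * f * k / fs) for k, d in enumerate(disp))
        if abs(acc) > best_mag:
            best_f, best_mag = f, abs(acc)
    return best_f, 2 * best_mag / n


# Hypothetical scenario: 100 chirps/s for 10 s, 0.1 mm vibration at 1.2 Hz.
fs, lam, amp, f_true = 100.0, 3.9e-3, 1e-4, 1.2
phase = simulate_phase(amp, f_true, fs, 1000, lam)
f_est, a_est = estimate_vibration(phase, fs, lam, 0.5, 3.0, 0.01)
```

The grid step sets a floor on frequency resolution. A refined estimator, such as the one the abstract describes, is what would approach the Cramér-Rao bound.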
