Mohammed A. S. Khalid
University of Windsor
Publication
Featured research published by Mohammed A. S. Khalid.
international conference on microelectronics | 2006
Jason G. Tong; Ian D. L. Anderson; Mohammed A. S. Khalid
A soft-core processor is a hardware description language (HDL) model of a specific processor (CPU) that can be customized for a given application and synthesized for an ASIC or FPGA target. In many applications, soft-core processors provide several advantages over custom designed processors such as reduced cost, flexibility, platform independence and greater immunity to obsolescence. Embedded systems are hardware and software components working together to perform a specific function. Usually they contain embedded processors that are often in the form of soft-core processors that execute software code. This paper presents a survey of soft-core processors that are used in embedded systems. Several soft-core processors available from commercial vendors and open-source communities are reviewed and compared based on major architectural features. In addition, several real world examples of embedded systems that employ soft-core processors are summarized. As the complexity of embedded systems continues to increase, it is expected that the usage of customizable soft-core processors will become more widespread.
field-programmable logic and applications | 2005
Yonghong Xu; Mohammed A. S. Khalid
In this paper we present QPF, a quadratic placement tool for FPGAs. Quadratic placement algorithms try to minimize total squared wire length by solving linear equations. The resulting placement tends to locate all cells near the center of the chip with a large amount of overlap. Also, since squared wire length is only an indirect measure of linear wire length, the resulting total wire length may not be minimized. We propose methods to alleviate the above two problems that give high quality results while minimizing the total run time. We incorporate multiple iterations of equation solving process together with a technique for pulling nodes out of the dense area while minimizing linear wire length. Experimental results using twenty MCNC benchmark circuits show that, on average, QPF is 5.8 times faster compared to a well known FPGA placement tool VPR, while providing almost comparable estimated total wire length.
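The core idea of quadratic placement, minimizing total squared wire length by solving linear equations, can be illustrated with a small one-dimensional sketch. This is not QPF itself: the function names, the two-pin net model, and the pad names below are assumptions made for illustration, and the overlap-removal and linear-wire-length steps described in the abstract are not modelled.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_placement_1d(num_cells, nets, pads):
    """Place movable cells 0..num_cells-1 on a line.

    nets: list of (u, v, weight) two-pin connections (hypothetical model).
    pads: dict mapping fixed pad names to their positions.
    Minimizing sum(weight * (x_u - x_v)**2) gives the linear system a x = b.
    """
    a = [[0.0] * num_cells for _ in range(num_cells)]
    b = [0.0] * num_cells
    for u, v, w in nets:
        for cell, other in ((u, v), (v, u)):
            if cell in pads:          # fixed pads contribute no equation
                continue
            a[cell][cell] += w
            if other in pads:
                b[cell] += w * pads[other]
            else:
                a[cell][other] -= w
    return solve_linear(a, b)


# Chain: pad "L" at 0, cell 0, cell 1, pad "R" at 3.
positions = quadratic_placement_1d(
    2, [("L", 0, 1.0), (0, 1, 1.0), (1, "R", 1.0)], {"L": 0.0, "R": 3.0}
)
```

In this toy chain the cells spread evenly between the pads. With many cells and few pads, the same formulation pulls cells toward the center, which is the overlap problem the abstract refers to.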
Journal of Computers | 2008
Jason G. Tong; Mohammed A. S. Khalid
Profiling tools are computer-aided design (CAD) tools that help in determining the computationally intensive portions of software. Embedded systems consist of hardware and software components that run concurrently to perform a specific task or application. Profiling tools are used by embedded system designers to choose computationally intensive functions for hardware implementation and acceleration. In this paper we review and compare various existing profiling tools for FPGA-based embedded systems. We then describe Airwolf, an FPGA-based profiling tool. We present a quantitative comparison of Airwolf and a well-known software-based profiling tool, GNU gprof. Four software benchmarks were used to obtain profiling results using Airwolf and gprof. We show that Airwolf provides up to 66.2% improvement in accuracy of profiled results and reduces the run-time performance overhead caused by software-based profiling tools by up to 41.3%. The results show that Airwolf provides accurate profiling results with minimal overhead and that it can help the designers of FPGA-based embedded systems identify the computationally intensive portions of software code for hardware implementation and acceleration.
IEEE Signal Processing Letters | 2006
Kevin Banovic; Esam Abdel-Raheem; Mohammed A. S. Khalid
A new radius-adjusted approach for blind adaptive equalization for quadrature amplitude modulation (QAM) signals is introduced. Static circular contours are defined around an estimated symbol point in a QAM signal constellation, which creates regions that can be mapped to adaptation phases. The equalizer tap update consists of a linearly weighted sum of adaptation criteria that is scaled by a variable step size. Each region corresponds to a fixed step size and weighting factor, which creates a time-varying tap update based on the equalizer output radius. Two new algorithms are proposed based on this new approach and the multimodulus algorithm (MMA). The first algorithm trades off MMA and constellation-matched errors to reduce the time-to-convergence and mean-squared error (MSE), while the second trades off MMA and decision-directed errors to achieve reliable transfer between error modes and to obtain low MSE. A method to tune the proposed algorithms is developed based on statistics of the radius. The proposed algorithms are compared with related blind algorithms, and simulation results confirm that the proposed algorithms lead to enhanced performance.
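The region-based tap update described in this abstract can be sketched roughly as follows. This is one possible reading, not the authors' algorithm: it assumes the weighted sum has the form lam * e_MMA + (1 - lam) * e_DD, and the region boundaries, step sizes, and weights in REGIONS are hypothetical placeholders rather than tuned values from the paper. The MMA error uses the standard form with dispersion constant E[a^4]/E[a^2] per dimension.

```python
def mma_constant(levels):
    """Dispersion constant E[a^4] / E[a^2] for one dimension of square QAM."""
    return sum(l ** 4 for l in levels) / sum(l ** 2 for l in levels)


def nearest_symbol(y, levels):
    """Slice the real and imaginary parts independently to the nearest level."""
    re = min(levels, key=lambda l: abs(y.real - l))
    im = min(levels, key=lambda l: abs(y.imag - l))
    return complex(re, im)


def mma_error(y, r):
    """Multimodulus error: real and imaginary parts are treated separately."""
    return complex(y.real * (y.real ** 2 - r), y.imag * (y.imag ** 2 - r))


# (outer radius, step size mu, MMA weight lam): hypothetical values only.
REGIONS = [(0.3, 1e-3, 0.0), (0.7, 5e-4, 0.5), (float("inf"), 1e-4, 1.0)]


def region_for(radius, regions=REGIONS):
    """Map the output radius to the fixed (mu, lam) pair of its region."""
    for outer, mu, lam in regions:
        if radius <= outer:
            return mu, lam


def update_taps(taps, window, levels, regions=REGIONS):
    """One tap update; window holds the input samples aligned with taps."""
    y = sum(w * x for w, x in zip(taps, window))       # equalizer output
    d = nearest_symbol(y, levels)                      # estimated symbol
    mu, lam = region_for(abs(y - d), regions)          # radius selects region
    err = lam * mma_error(y, mma_constant(levels)) + (1 - lam) * (y - d)
    new_taps = [w - mu * err * x.conjugate() for w, x in zip(taps, window)]
    return new_taps, y


levels_16qam = [-3, -1, 1, 3]
new_taps, y_out = update_taps([1 + 0j], [0.9 + 1.2j], levels_16qam)
```

Here an output close to a constellation point falls in the innermost region and is updated with the decision-directed error only, while outputs far from any point fall back to the MMA error.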
Eurasip Journal on Embedded Systems | 2013
Junsong Liao; Brajendra Kumar Singh; Mohammed A. S. Khalid; Kemal E. Tepe
This article presents, for the first time, the design and implementation of a modular, customizable event-driven architecture with parallel execution capability for wireless sensor nodes using a stand-alone FPGA. This customizable event-driven architecture is based on modular generic event dispatchers and autonomous event handlers, which will help wireless sensor network (WSN) application developers to quickly develop their applications by adding the required number of event dispatchers and event handlers as per the need of a WSN application. This architecture can handle multiple events in parallel, including high priority ones. Additionally, it provides non-preemptive operation which removes the timing uncertainty and overhead involved with interrupt-driven processor-based sensor node implementation, which is required in real-time WSNs. Thus, the higher computation power of FPGAs combined with the non-preemptive modular event-driven architecture with parallel execution capability enables a variety of new WSN applications and facilitates rapid prototyping of WSN applications. In this article, the performance of an FPGA-based sensor device is compared with general purpose processor-based implementations of sensor devices. Results show that our FPGA-based implementation provides significant improvement in system efficiency measured in terms of clock cycle counts required for typical sensor network tasks such as packet transmission, relay and reception.
electro information technology | 2009
Thuan Le; Mohammed A. S. Khalid
The Network-on-Chip (NoC) approach is emerging as an effective paradigm that addresses the shortcomings of traditional bus-based systems relating to scalability and efficiency for large System-on-Chip (SoC) designs. A significant amount of theoretical work has been done exploring various NoC architectures. However, only a handful of studies have demonstrated actual implementation of NoC-based systems for real-world applications. These studies provide greater practical insight compared to theoretical studies that rely solely on simulations from traffic generators. Prototyping NoC-based systems for real-world applications enables more detailed performance evaluation based on metrics such as area and speed. In this paper, we present an NoC-based Field-Programmable System-on-Chip (FPSoC) that is used to implement an image processing benchmark as a real-world application. We discuss the challenges of developing an NoC-based system for FPGA implementation and assess the NoC's potential for future development.
canadian conference on electrical and computer engineering | 2007
Jason G. Tong; Mohammed A. S. Khalid
This paper presents an analysis and comparison of the profiled results using software-based profilers (SBP) and FPGA-based profilers (FPGA-BP) for a Nios II Processor system. SBP tools are commonly used to detect performance bottlenecks of a program by applying instrumentation code at the binary level and using sampling methods for performance data gathering. This can cause the reported profiled results to be inaccurate which can mislead the embedded designer to implement the improper software function in the hardware domain. FPGA-BP tools are profilers that contain dedicated hardware that can accurately measure the performance of the software system running on a soft-core processor. They require minimal code modification and do not use any sampling techniques to collect performance data. This can provide accurate results that embedded designers can use to create an efficient and effective hardware-software partition of an embedded system.
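Why sampling can distort profiled results can be seen in a toy model. The sketch below is an illustration only: the per-cycle trace, the function names "filter" and "crc", and the sampling period are invented, and it models neither gprof nor the FPGA-based profilers discussed in the paper. It compares exact per-cycle counting with periodic sampling of the same execution trace.

```python
from collections import Counter


def exact_profile(trace):
    """Fraction of cycles spent in each function, counting every cycle."""
    counts = Counter(trace)
    return {name: counts[name] / len(trace) for name in counts}


def sampled_profile(trace, period):
    """Fraction of samples landing in each function, sampling every `period` cycles."""
    samples = trace[::period]
    counts = Counter(samples)
    return {name: counts.get(name, 0) / len(samples) for name in set(trace)}


# Hypothetical trace: 7 cycles in "filter" followed by 3 cycles in "crc".
trace = ["filter"] * 7 + ["crc"] * 3
exact = exact_profile(trace)
sampled = sampled_profile(trace, period=5)
```

With this trace, exact counting attributes 30% of the cycles to "crc", while sampling every fifth cycle never observes it at all, so a designer relying on the sampled result would not consider it for hardware implementation.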
Digital Signal Processing | 2007
Kevin Banovic; Mohammed A. S. Khalid; Esam Abdel-Raheem
This paper discusses the design and field programmable gate array (FPGA) implementation of a configurable 18-tap fractionally-spaced blind adaptive equalizer intellectual property (IP) core for quadrature amplitude modulation (QAM) signals. The design can be configured to implement the constant modulus algorithm (CMA), multimodulus algorithm (MMA), radius-adjusted modified-multimodulus algorithm (RMMA), and radius-adjusted multimodulus decision-directed algorithm (RMDA), while it can achieve channel equalization for square QAM signals up to 256-QAM. The input samples to the equalizer tapped delay line are sampled at twice the symbol rate, while the equalizer output and tap coefficients are updated at the symbol rate. This is exploited by the equalizer tap and update modules of the design, which utilize the same hardware to implement two consecutive equalizer taps per module. The IP core is implemented for the Altera Stratix II EP2S130F780C4 FPGA and targets cable demodulators. The implementation operates at a maximum symbol frequency of 8.055 MBaud, which is comparable to recent QAM equalizer designs for cable modems.
midwest symposium on circuits and systems | 2005
Kevin Banovic; Mohammed A. S. Khalid; Esam Abdel-Raheem
This paper discusses the application of field programmable gate arrays (FPGAs) for digital signal processing (DSP). A survey of DSP design methodologies and computer-aided design (CAD) tools for FPGAs is presented, which includes methodologies for standard register-transfer-level (RTL) design, system-level design, and hardware/software (HW/SW) co-design. The application of FPGA emulation systems as a platform for rapid prototyping is addressed and future trends of FPGA-based DSP systems are suggested.
canadian conference on electrical and computer engineering | 2015
Ian Janik; Qing Y. Tang; Mohammed A. S. Khalid
In recent years there has been great interest in High Level Synthesis (HLS) CAD tools to raise the level of design abstraction, reduce design time, rapidly explore the design space and fully exploit the multi-million gate heterogeneous hardware platforms provided by dramatic improvements in integrated circuits. Open Computing Language (OpenCL) is a well-known standard for heterogeneous computing. The Altera SDK for OpenCL is used to convert OpenCL code to kernels that can be run on an FPGA accelerator card. It is a recently introduced HLS CAD tool that makes it possible to convert existing C/C++ programs, or create new ones, that use dedicated hardware to execute specific applications much faster and more efficiently than current computer systems, whether single-core or multi-core. This can all be done without knowledge of FPGAs, VHDL, or Verilog, as the SDK converts the OpenCL files into Verilog models that are then compiled into FPGA hardware. This paper presents a user-centric overview of the Altera SDK for OpenCL. As a first step to achieve the best speedup, the candidate algorithm for acceleration must be analyzed to check if it is inherently parallelizable. The key features such as designing appropriate OpenCL kernels and host program, their compilation, execution and testing are summarized. A working example for accelerating a simple matrix multiplication algorithm is described. Our motivation is to provide novice users with a useful tutorial that will enable them to quickly become proficient in using this important HLS CAD tool. To our knowledge, such a user-centric tutorial has not been presented so far in the literature.
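The reason matrix multiplication is a natural candidate for this kind of acceleration can be shown with a small sketch. The paper's actual kernel and host code are not reproduced here; the Python below only emulates, under that assumption, the usual decomposition in which each work-item computes one output element, and the names matmul_work_item and launch_2d are hypothetical.

```python
def matmul_work_item(a, b, c, row, col):
    """Work done by one work-item: a single dot product for c[row][col]."""
    acc = 0.0
    for k in range(len(b)):
        acc += a[row][k] * b[k][col]
    c[row][col] = acc


def launch_2d(a, b):
    """Emulate a 2-D index space by visiting every (row, col) pair in turn.

    No work-item reads another's output, so the iterations are independent
    and could run in any order or in parallel.
    """
    rows, cols = len(a), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            matmul_work_item(a, b, c, row, col)
    return c


product = launch_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because every output element depends only on one row of the first matrix and one column of the second, the algorithm passes the "inherently parallelizable" check mentioned in the abstract.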