
Publications


Featured research published by Abu Asaduzzaman.


International Symposium on Multimedia | 2004

Cache optimization for mobile devices running multimedia applications

Abu Asaduzzaman; Imad Mahgoub; Praveen Sanigepalli; Hari Kalva; Ravi Shankar; Borko Furht

The popularity of mobile/wireless embedded systems running multimedia applications is growing. MPEG4 is an important and demanding multimedia application. With improved CPU speed, memory subsystem deficiency is the major barrier to improving system performance. Studies show that there is sufficient reuse of values for caching to significantly reduce the raw memory bandwidth required for video data. Decoding MPEG4 video data in software generates many times more cache-memory traffic than necessary. A proper understanding of the decoding algorithm and the composition of its data set is essential to improving the performance of such a system. The focus of this paper is to enhance MPEG4 decoding performance through cache optimization of a mobile device. The architecture we simulate includes a digital signal processor (DSP) to run the decoding algorithm and a two-level cache system. The level-1 cache is split into data (D1) and instruction (I1) caches, and the level-2 cache (CL2) is a unified cache. We use the Cachegrind and VisualSim simulation tools to optimize cache size, line size, associativity, and the number of cache levels for a wireless device decoding MPEG4 video.
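The kind of design-space exploration this abstract describes (sweeping line size and associativity over a memory trace) can be sketched with a tiny set-associative LRU cache model. This is an illustrative toy, not the paper's Cachegrind/VisualSim setup; the trace, set count, and parameters are invented for the example.

```python
from collections import OrderedDict

def simulate_cache(trace, num_sets, ways, line_size):
    """Count hits for a set-associative LRU cache over an address trace."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // line_size          # which cache block the address falls in
        s = sets[block % num_sets]         # which set the block maps to
        if block in s:
            hits += 1
            s.move_to_end(block)           # refresh LRU position on a hit
        else:
            if len(s) >= ways:
                s.popitem(last=False)      # evict the least recently used block
            s[block] = True
    return hits

# Synthetic trace with spatial locality and reuse, loosely mimicking
# repeated passes over two buffers (e.g. frame data).
trace = [base + i for base in (0, 4096, 0, 4096) for i in range(0, 1024, 4)]

for ways in (1, 2, 4):
    h = simulate_cache(trace, num_sets=16, ways=ways, line_size=32)
    print(f"{ways}-way: {h}/{len(trace)} hits")
```

Running the sweep shows the hit rate climbing with associativity on this reuse-heavy trace, which is the basic effect the paper's optimization exploits.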


Journal of Systems Architecture | 2010

Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level

Abu Asaduzzaman; Fadi N. Sibai; Manira Rani

To provide robustness and high quality of service, modern computing architectures running real-time applications should deliver high system performance and high timing predictability. Cache memory improves performance by bridging the speed gap between the main memory and the CPU. However, the cache introduces timing unpredictability, creating serious challenges for real-time applications. Herein, we introduce a miss table (MT) based cache locking scheme at the level-2 (L2) cache to further improve timing predictability and the system performance/power ratio. The MT holds the addresses of the blocks, for the application being processed, that would cause the most cache misses if not locked. Information in the MT is used for efficient selection of the blocks to be locked and of victim blocks to be replaced. This MT-based approach improves timing predictability by keeping the blocks with the highest miss counts locked inside the cache for the entire execution time. In addition, the technique decreases the average delay per task and total power consumption by reducing cache misses and avoiding unnecessary data transfers. The MT-based solution is effective for both uniprocessors and multicores. We evaluate the proposed MT-based cache locking scheme by simulating an 8-core processor with two levels of caches using MPEG4 decoding, H.264/AVC decoding, FFT, and MI workloads. Experimental results show that, in addition to improving predictability, a 21% reduction in mean delay per task and an 18% reduction in total power consumption are achieved for MPEG4 (and H.264/AVC) by using the MT and locking 25% of the L2 cache. The MT contributes about 5% of the delay and power reductions on these video applications, possibly more on applications with worse cache behavior. For the FFT and MI (and other) applications whose code fits inside the level-1 instruction (I1) cache, the mean delay per task increases only by 3% and total power consumption by 2% due to the addition of the MT.
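The core of the MT idea, as the abstract describes it, is to rank block addresses by how many misses they would cause and to exclude the top-ranked (locked) blocks from victim selection. A minimal sketch of that selection logic, with an invented miss profile and block addresses (not the paper's data):

```python
from collections import Counter

def build_miss_table(miss_trace, top_n):
    """Profile cache misses per block address and return the worst offenders.
    These top entries are the candidates for locking, in the spirit of the
    paper's miss table (MT)."""
    counts = Counter(miss_trace)
    return [block for block, _ in counts.most_common(top_n)]

def choose_victim(cache_set, locked):
    """Pick a replacement victim from a set (ordered oldest-first),
    never evicting a locked block."""
    for block in cache_set:
        if block not in locked:
            return block
    return None  # every resident block is locked; nothing can be evicted

# Hypothetical miss profile: block 0xA0 misses most, then 0xB0.
miss_trace = [0xA0] * 9 + [0xB0] * 5 + [0xC0] * 2 + [0xD0]
locked = set(build_miss_table(miss_trace, top_n=2))
print(sorted(locked))                            # [160, 176], i.e. 0xA0 and 0xB0
print(hex(choose_victim([0xA0, 0xB0, 0xC0], locked)))  # 0xc0
```

Locking the highest-miss blocks for the whole run is what makes hit/miss behavior, and hence timing, more predictable for those addresses.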


International Conference on Innovations in Information Technology | 2007

Evaluation of I-Cache Locking Technique for Real-Time Embedded Systems

Abu Asaduzzaman; Niranjan Limbachiya; Imad Mahgoub; Fadi N. Sibai

Cache memory improves performance by reducing the speed gap between the CPU and the main memory. However, the execution time becomes unpredictable due to the cache's adaptive and dynamic behavior. Real-time applications are subject to operational deadlines, and predictability is considered necessary to support them. Studies show that, for embedded systems, cache locking helps determine the worst-case execution time (WCET) and the cache-related preemption delay. In this work, we evaluate the predictability of an embedded system running real-time applications under instruction cache (I-cache) locking. Using the Heptane simulation tool, we implement an algorithm that locks the blocks likely to cause the most cache misses. We obtain CPU utilization measures for both cache analysis (no cache locking) and I-cache locking. Experimental results show that our proposed cache locking algorithm improves both predictability and performance up to a 15% locking ratio; beyond that, predictability may be further enhanced only by sacrificing performance.


Multimedia Tools and Applications | 2006

Cache modeling and optimization for portable devices running MPEG-4 video decoder

Abu Asaduzzaman; Imad Mahgoub

There are increasing demands on portable communication devices to run multimedia applications. The ISO (International Organization for Standardization) standard MPEG-4 is an important and demanding multimedia application. To satisfy growing consumer demand, more functions are added to support MPEG-4 video applications. With improved CPU speed, memory subsystem deficiency is the major barrier to improving system performance. Studies show that there is sufficient reuse of values for caching to significantly reduce the memory bandwidth requirement for video data. Software decoding of MPEG-4 video data generates much more cache-memory traffic than required. A proper understanding of the decoding algorithm and the composition of its data set is essential to improving the performance of such a system. The focus of this paper is cache modeling and optimization for portable communication devices running the MPEG-4 video decoding algorithm. The architecture we simulate includes a digital signal processor (DSP) for running the MPEG-4 decoding algorithm and a memory system with two levels of caches. We use the VisualSim and Cachegrind simulation tools to optimize cache sizes, levels of associativity, and the number of cache levels for a portable device decoding MPEG-4 video.


Membranes | 2013

Study of Hydrophilic Electrospun Nanofiber Membranes for Filtration of Micro and Nanosize Suspended Particles

Ramazan Asmatulu; Harish Muppalla; Zeinab Veisi; Waseem Sabir Khan; Abu Asaduzzaman; Nurxat Nuraje

Polymeric nanofiber membranes of polyvinyl chloride (PVC) blended with polyvinylpyrrolidone (PVP) were fabricated using an electrospinning process at different conditions and used for the filtration of three different liquid suspensions to determine the efficiency of the filter membranes. The three liquid suspensions were lake water, abrasive particles from a water jet cutter, and suspended magnetite nanoparticles. The major goal of this research work was to create highly hydrophilic nanofiber membranes and use them to filter the suspensions to an optimal (i.e., drinkable) level of purification. In order to overcome the fouling/biofouling/blocking problems of the membrane, a coagulation process, which enhances the membrane's efficiency at removing colloidal particles, was used as a pre-treatment. Two chemical agents, Tanfloc (organic) and Alum (inorganic), were chosen for the flocculation/coagulation process. The removal efficiency of the suspended particles in the liquids was measured in terms of turbidity, pH, and total dissolved solids (TDS). The coagulation/filtration experiments were observed to be more efficient at removing turbidity than the direct filtration process performed without any coagulation pre-treatment.


ACS/IEEE International Conference on Computer Systems and Applications | 2009

Impact of L1 entire locking and L2 way locking on the performance, power consumption, and predictability of multicore real-time systems

Abu Asaduzzaman; Imad Mahgoub; Fadi N. Sibai

Following the recent design trend of major chip vendors, multicore systems are being deployed with multilevel caches to achieve higher levels of performance. Supporting real-time applications on multicore systems is a great challenge because caches are power-hungry and worsen execution-time predictability. Studies show that timing predictability can be improved using cache locking techniques. However, level-1 (L1) entire locking may not be efficient if the amount of locked instructions/data is small compared to the cache size. An alternative is way locking. For some processors, way locking is possible at the level-2 (L2) cache but not permitted at L1. Even though both L1 entire locking and L2 way locking improve predictability, it is difficult to justify the performance and power trade-off between the two locking mechanisms. In this work, we simulate a multicore system with two levels of caches to explore the impact of L1 entire locking and L2 way locking on performance, power consumption, and predictability. Simulation results using FFT, DFT, and MPEG4 algorithms show that adding a cache locking mechanism to the cache memory hierarchy can increase both performance and predictability while decreasing power consumption. Results also show that for FFT and DFT, L2 way locking outperforms L1 entire locking, but for MPEG4, L1 entire locking performs better than L2 way locking.
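Way locking, as contrasted with entire locking above, reserves some ways of each set for locked blocks and applies normal replacement only to the rest. A minimal sketch of one replacement decision under way locking; the set layout, block names, and two-locked-ways configuration are invented for illustration, not taken from the paper's simulator:

```python
def insert_with_way_locking(set_ways, locked_count, block):
    """One replacement decision in a cache set with way locking: the first
    `locked_count` ways hold locked blocks and are never evicted; LRU order
    (oldest first) applies to the remaining, unlocked ways."""
    locked, unlocked = set_ways[:locked_count], set_ways[locked_count:]
    if block in unlocked:
        unlocked.remove(block)      # hit: refresh its LRU position
    elif None in unlocked:
        unlocked.remove(None)       # fill an empty way first
    else:
        unlocked.pop(0)             # miss: evict the LRU unlocked block
    unlocked.append(block)
    return locked + unlocked

# A 4-way set with two ways locked on hot blocks; accesses to A, B, C
# churn only the two unlocked ways ("C" evicts "A", never a locked block).
s = ["HOT1", "HOT2", None, None]
for b in ("A", "B", "C"):
    s = insert_with_way_locking(s, 2, b)
print(s)                            # ['HOT1', 'HOT2', 'B', 'C']
```

Setting `locked_count` equal to the number of ways would model entire locking (no block can ever be evicted), which is the L1 configuration the paper compares against.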


SoutheastCon | 2015

A time and energy efficient parking system using Zigbee communication protocol

Abu Asaduzzaman; Kishore K. Chidella; Muhammad F. Mridha

A major problem in large and busy traffic areas is finding empty (and available) spaces in which to park. Some recent parking lot systems are equipped with sensors and microcontrollers to automatically count the cars parked in the lot. However, such a system may not indicate which spots are empty. In addition, existing systems are very expensive and suffer from long processing times and large energy consumption. The recently introduced ZigBee technology is a low-cost, low-power wireless communication protocol targeted at automation and remote control applications. In this work, we propose a smart parking system for heavy traffic environments using a ZigBee wireless transmission module. The proposed system is suitable for multi-floor buildings and able to send a message to vehicles about the status of parking spaces. The parking monitoring system continuously collects data from the parking slot detectors and then notifies the vehicle section. We simulate the proposed system using ZigBee and two other popular wireless technologies, Bluetooth and Wi-Fi. Experimental results show that ZigBee provides transmission time and power advantages over Bluetooth and Wi-Fi.
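The monitoring logic described above (aggregate per-slot detector readings, then report free spaces per floor to vehicles) can be sketched independently of the radio layer. The slot-ID format and message layout below are invented for illustration; the ZigBee transmission itself is abstracted away.

```python
def summarize_parking(readings):
    """Aggregate per-slot occupancy readings into a per-floor list of free slots.
    `readings` maps hypothetical slot IDs like "F2-S07" (floor 2, slot 7)
    to True if the detector reports the slot as occupied."""
    floors = {}
    for slot, occupied in readings.items():
        floor = slot.split("-")[0]
        free = floors.setdefault(floor, [])
        if not occupied:
            free.append(slot)
    return {floor: sorted(slots) for floor, slots in floors.items()}

# Example detector snapshot for a two-floor lot.
readings = {
    "F1-S01": True, "F1-S02": False, "F1-S03": True,
    "F2-S01": False, "F2-S02": False,
}
status = summarize_parking(readings)
for floor, free in sorted(status.items()):
    # In the proposed system this status message would be broadcast to vehicles.
    print(f"{floor}: {len(free)} free -> {', '.join(free) or 'none'}")
```

Keeping the summary small (a short per-floor message rather than the raw detector stream) is what makes a low-rate protocol like ZigBee a plausible transport for it.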


ACS/IEEE International Conference on Computer Systems and Applications | 2006

Cache Optimization for Embedded Systems Running H.264/AVC Video Decoder

Abu Asaduzzaman; Imad Mahgoub

The computing power of microprocessors has increased exponentially over the past few decades, and so has the support for computation-intensive multimedia applications. With such improved computing power, memory subsystem deficiency becomes the major barrier to supporting video messaging and video telephony/conferencing on mobile handsets. Studies show that multimedia applications exhibit sufficient reuse of values for caching, so there is an opportunity to customize the cache subsystem for improved performance. In our previous work, we optimized the cache to enhance MPEG-4 (Part 2) decoding performance on a mobile device. H.264/AVC (or MPEG-4 Part 10) outperforms both MPEG-4 (Part 2) and H.263 by providing better video quality at a lower bit-rate and a lower latency. As a result, H.264/AVC has become the next-generation video codec for embedded systems. In this paper, our focus is to enhance H.264/AVC decoding performance through cache optimization for an embedded, mobile device. The simulated architecture includes a processor to run the decoding algorithm and a two-level cache system. The level-1 cache is split into instruction and data caches, and the level-2 cache is a unified cache. We use Cachegrind to characterize the H.264/AVC decoding workload and VisualSim to model the system-level architecture and run the simulation for that workload. Simulation results show that H.264/AVC decoding performance can be enhanced through cache optimization.


International Conference on Microelectronics | 2010

On the design of low-power cache memories for homogeneous multi-core processors

Abu Asaduzzaman; Manira Rani; Fadi N. Sibai

We investigate the impact of level-1 cache (CL1) parameters, level-2 cache (CL2) parameters, and cache organizations on the power consumption and performance of multi-core systems. We simulate two 4-core architectures, both with private CL1s, but one with a shared CL2 and the other with private CL2s. Simulation results with MPEG4, H.264, matrix inversion, and DFT workloads show that reductions of up to 42% in total power consumption and 48% in mean delay per task are possible with optimized CL1s and CL2s. Total power consumption and mean delay per task depend significantly on the application, including its code size and locality.


Microprocessors and Microsystems | 2009

Impact of level-2 cache sharing on the performance and power requirements of homogeneous multicore embedded systems

Abu Asaduzzaman; Fadi N. Sibai; Manira Rani

In order to satisfy the need for increasing computer processing power, there are significant changes in the design process of modern computing systems. Major chip vendors are deploying multicore or manycore processors across their product lines. Multicore architectures offer tremendous processing speed. At the same time, they bring challenges for embedded systems, which suffer from limited resources. Various cache memory hierarchies have been proposed to satisfy the requirements of different embedded systems. Normally, a level-1 cache (CL1) is dedicated to each core. However, the level-2 cache (CL2) can be shared (as in the Intel Xeon and IBM Cell) or distributed (as in the AMD Athlon). In this paper, we investigate the impact of the CL2 organization (shared vs. distributed) on the performance and power consumption of homogeneous multicore embedded systems. We use the VisualSim and Heptane tools to model and simulate the target architectures running FFT, MI, and DFT applications. Experimental results show that, by replacing a single-core system with an 8-core system, reductions in mean delay per core of 64% for distributed CL2 and 53% for shared CL2 are possible with little additional power (15% for distributed CL2 and 18% for shared CL2) for FFT. Results also reveal that the distributed CL2 hierarchy outperforms the shared CL2 hierarchy for all three applications considered, and for other applications with similar code characteristics.

Collaboration


Top co-authors of Abu Asaduzzaman.

Imad Mahgoub, Florida Atlantic University

Muhammad F. Mridha, University of Asia and the Pacific

Manira Rani, Florida Atlantic University

Fadi N. Sibai, College of Information Technology

Chok M. Yip, Wichita State University