Hakduran Koc
University of Houston–Clear Lake
Publication
Featured research published by Hakduran Koc.
international symposium on low power electronics and design | 2006
Hakduran Koc; Ozcan Ozturk; Mahmut T. Kandemir; Sri Hari Krishna Narayanan; Ehat Ercanli
Banking has been identified as one of the effective methods for reducing memory energy. We propose a novel approach that improves the energy effectiveness of a banked memory architecture by performing extra computations if doing so makes it unnecessary to reactivate a bank that is in the low-power operating mode. More specifically, when an access is to be made to a bank that is in the low-power mode, our approach first checks whether the data required from that bank can be recomputed using the data currently stored in already active banks. If this is the case, we do not turn on the bank in question and instead recalculate the value of the requested data using the values of the data stored in the active banks. Given that the contribution of leakage consumption to the overall energy budget keeps increasing, the proposed approach has the potential to become even more attractive in the future. Our experimental results collected so far clearly show that this recomputation-based approach can reduce energy consumption significantly.
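The check described in this abstract can be illustrated with a minimal sketch. The bank model, the `recipes` table, and the function name `read` are hypothetical and not taken from the paper; the sketch ignores energy accounting and only shows the decision of whether a sleeping bank has to be woken.

```python
# Hypothetical model: each data element lives in one bank, and some
# elements can be recomputed from other elements (their "recipe").

def read(name, location, active_banks, values, recipes):
    """Return (value, woke_bank) for a requested data element.

    location:     element name -> id of the bank holding it
    active_banks: set of bank ids in active mode (mutated on wake-up)
    values:       element name -> stored value
    recipes:      element name -> (function, list of operand names)
    """
    bank = location[name]
    if bank in active_banks:
        return values[name], False          # bank already on: plain access

    recipe = recipes.get(name)
    if recipe is not None:
        func, operands = recipe
        # Recompute only if every operand sits in an already active bank.
        if all(location[op] in active_banks for op in operands):
            return func(*(values[op] for op in operands)), False

    active_banks.add(bank)                  # fall back: reactivate the bank
    return values[name], True


location = {"a": 0, "b": 0, "c": 1, "d": 2}
values = {"a": 3, "b": 4, "c": 7, "d": 10}
recipes = {"c": (lambda x, y: x + y, ["a", "b"])}   # assumed: c = a + b
active = {0}

c_value, c_woke = read("c", location, active, values, recipes)
d_value, d_woke = read("d", location, active, values, recipes)
```

In this toy setup, `c` is served by recomputation while bank 1 stays in low-power mode, whereas `d` has no recipe and forces bank 2 to be reactivated. A realistic version would also compare the energy cost of the extra computation against the cost of the wake-up.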
design automation conference | 2007
Hakduran Koc; Mahmut T. Kandemir; Ehat Ercanli; Ozcan Ozturk
There have been numerous efforts on Scratch-Pad Memory (SPM) management in the context of single CPU systems and, more recently, multi-processor architectures. This paper presents a novel SPM space utilization strategy, for embedded chip multi-processor systems, based on recomputing the value of an off-chip data element using on-chip (SPM resident) data elements. In doing so, our goal is to eliminate the corresponding off-chip memory access that would otherwise be performed, and save execution cycles and power. This paper presents the details of a compiler algorithm that implements this approach and reports the experimental data we collected using six data-intensive applications. Our results indicate that, on a four processor chip multiprocessor, the average performance improvement our approach brings is about 11.8%, over a state-of-the-art SPM management scheme. We also observed that there is a specific range of total SPM size/total data size ratios, for which our approach generates the best results. Finally, our results also show that the proposed approach brings consistent improvements when the number of CPUs is varied between 2 and 16.
ieee computer society annual symposium on vlsi | 2006
Hakduran Koc; Suleyman Tosun; Ozcan Ozturk; Mahmut T. Kandemir
As embedded applications are processing increasingly larger data sets, keeping their memory space consumption under control is becoming a very pressing issue. Observing this, several prior efforts have considered memory space reduction techniques (in both hardware and software) based on data compression and lifetime-based memory recycling (garbage collection). In this work, we propose and evaluate an alternate approach to memory space saving in multi-CPU embedded systems such as chip multiprocessors. The unique characteristic of our approach is that it recomputes the results of select tasks in a given task graph (which represents the application), instead of storing these results in memory and accessing them from there as needed.
symposium on cloud computing | 2010
Hakduran Koc; Mahmut T. Kandemir; Ehat Ercanli
This paper presents a novel on-chip memory space utilization strategy for architectures that accommodate large on-chip software-managed memories. In such architectures, the access latencies of data blocks are typically proportional to the distance between the processor and the requested data. Considering such an on-chip memory hierarchy, we propose to recompute the value of an on-chip data element that is far from the processor using closer data elements, instead of directly accessing the far data, whenever doing so is beneficial in terms of performance. This paper presents the details of a compiler algorithm that implements the proposed approach and reports the experimental data collected using six data-intensive application programs. Our experimental evaluation indicates an 8.2% performance improvement, on average, over a state-of-the-art on-chip memory management strategy and shows consistent improvements for varying on-chip memory sizes and different data access latencies.
international midwest symposium on circuits and systems | 2006
Priyank Parakh; Divya Mullassery; Anand Chandrashekar; Hakduran Koc; Deniz Dal; Nazanin Mansouri
In deep sub-micron (DSM) technologies, the interconnect significantly impacts the design performance and reliability: a considerable fraction of the total circuit power is consumed by interconnects, crossing the global interconnects requires multiple clock cycles and the wire capacitance directly affects the noise levels. This paper introduces a novel high-level synthesis (HLS) methodology that exploits architectural optimizations that lead to final circuits (layouts) with enhanced interconnect power and delay without introducing any overhead. We present a new interconnect-centric scheduling algorithm and a global binding that combines the functional unit binding and register binding. These routines generate circuits with improved interconnect through minimizing the nets (the number and fan-out) and steering logic. The binding process achieves this by considering clusters of operations as compatible candidates instead of individual operations, while the scheduling makes assignments that maximize the cluster compatibility. The experiments with several synthesis benchmarks show the effectiveness of the approach in reducing the total number, the average fan-out and the length of the nets without introducing any logic overhead. We achieved an average reduction of approximately 16% in total wire length compared to the layout of the designs generated by conventional synthesis tools.
international conference on digital information processing and communications | 2015
Elham Azari; Hakduran Koc
Hardware/software partitioning has always been a crucial step in the co-design of embedded systems as it affects the overall system performance significantly. This paper proposes a new approach to partition the tasks in a given Control Data Flow Graph (CDFG) representing an application. In order to enhance the performance, our approach considers the combination of two main paths in the system, the hot path and the critical path, during the partitioning phase of the co-design. These two paths dominate the total execution time of a system. After identifying the hot path and the critical path, the proposed approach assigns as many tasks as possible to the hardware components, giving higher priority to the tasks in the hot paths that have a direct and significant effect on the critical path. Consequently, the total execution time of an application is reduced. The experimental evaluation shows that the proposed path-based partitioning method improves the performance significantly. In addition, the performance/area trade-off is presented.
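As an illustration of one ingredient mentioned in this abstract, the critical path of a task graph can be found as the longest time-weighted path through a directed acyclic graph. The following is a minimal sketch under assumed inputs: the function name `critical_path`, the task names, and the execution times are hypothetical, and the hot-path identification and the partitioning step itself (which would need profiling data and area costs) are not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def critical_path(exec_time, preds):
    """Longest (time-weighted) path through a task DAG.

    exec_time: task -> execution time
    preds:     task -> set of predecessor tasks
    """
    graph = {t: preds.get(t, set()) for t in exec_time}
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor on the longest path into each task
    for task in TopologicalSorter(graph).static_order():
        # Pick the predecessor that finishes last (None for source tasks).
        p = max(graph[task], key=finish.get, default=None)
        best_pred[task] = p
        finish[task] = exec_time[task] + (finish[p] if p is not None else 0)

    # Walk back from the task that finishes last.
    task = max(finish, key=finish.get)
    length = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = best_pred[task]
    return path[::-1], length


exec_time = {"A": 2, "B": 4, "C": 1, "D": 3}          # assumed values
preds = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
path, length = critical_path(exec_time, preds)
```

For this small graph the longest path runs through `A`, `B`, and `D`, so those tasks would be the natural first candidates when deciding what to move to hardware.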
international conference on technological advances in electrical electronics and computer engineering | 2013
Bayan Nimer; Hakduran Koc
As the demand for high performance and computational complexity continues to increase in embedded systems, reliability issues are becoming limiting factors on application scalability and long-term survivability in the harsh conditions in which such systems are operated. As embedded systems already house large datasets, traditional techniques to improve reliability such as task redundancy can be very costly in terms of memory space consumption. In this paper, we propose an approach that utilizes idle time frames of computational resources in order to iteratively recompute tasks to increase the reliability of the overall design without incurring any additional execution latency, area or extra memory space. We present an experimental evaluation that shows the effectiveness of the proposed task recomputation approach using both task graphs extracted from benchmarks and automatically generated task graphs.
international conference on connected vehicles and expo | 2013
Fatih Karabacak; Hakduran Koc; Arif Ceber
This paper presents the design and implementation of an electronic car sticker (e-sticker) using a proprietary RFID protocol. The system is a wireless data transmission system and consists of a monitoring module (reader) and many data acquisition modules (tags). In order to extend the battery life, the hardware and software components in both reader and tag modules are specifically designed to consume minimal power/energy. The proposed e-sticker can replace many windshield stickers such as registration, inspection and parking stickers on vehicles and can be monitored by patrol officers, inspection stations and authorized offices.
Proceedings of the 2018 2nd International Conference on Algorithms, Computing and Systems | 2018
Archit Gajjar; Xiaokun Yang; Lei Wu; Hakduran Koc; Ishaq Unwala; Yunxiang Zhang; Yi Feng
This paper presents a synthesis of the well-known Viola-Jones face detection algorithm using Xilinx software and hardware: the Vivado tool and a Nexys 4 Artix-7 field-programmable gate array (FPGA) device. Compared with the prior work on the Altera platform proposed in [1], our work reduces the slice count by 1018. Additionally, the power consumption of the implementation is 714 mW, of which 15% is static cost and 85% is dynamic power dissipation. Furthermore, the design details of the components of the structure, such as the generation of the integral image, multiple pipelined classifiers, and the parallel processing, are discussed in this work in order to provide potential improvements for future work. This paper not only provides a successful synthesis of a face detection system but also suggests directions for improvement, such as approximating the design to find an optimal energy-quality tradeoff for different applications, which we leave as future work.
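The integral image mentioned in this abstract is a standard building block of Viola-Jones detection: once it is built, the sum of any rectangle of pixels can be obtained from four table lookups. The following is a minimal software sketch with hypothetical function names and a toy 3x3 image; it is not the paper's hardware design.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img rows 0..y-1 and columns 0..x-1.

    The table has one extra row and column of zeros so that rectangle
    sums need no boundary checks.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)   # whole image
inner = rect_sum(ii, 1, 1, 3, 3)   # lower-right 2x2 block
```

Haar-like features are differences of such rectangle sums, which is why each feature costs only a handful of additions regardless of its size.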
ieee annual computing and communication workshop and conference | 2017
Hakduran Koc; Mehmet Ucar
In this paper, we present a technique based on Matrix Model Computation (MMC) in order to improve the performance of embedded systems that run data-intensive applications. Unlike traditional techniques that consider a loop nest in a data-intensive application as one execution phase, the proposed technique aims at efficiently dividing a loop nest into multiple execution phases so that dynamic memory management schemes are utilized more efficiently. The target architecture is an embedded processor with software-managed on-chip memory components with multiple levels in the hierarchy. The experimental results, obtained on a single-core embedded architecture, show significant performance improvements over available dynamic memory management schemes.