Afrin Naz
University of North Texas
Publications
Featured research published by Afrin Naz.
Memory Performance: Dealing with Applications, Systems and Architecture | 2005
Afrin Naz; Mehran Rezaei; Krishna M. Kavi; Philip H. Sweany
In our prior work we explored a cache organization providing architectural support for distinguishing between memory references that exhibit spatial and temporal locality and mapping them to separate caches. That work showed that using separate (data) caches for indexed or stream data and scalar data items could lead to substantial improvements in terms of cache misses. In addition, such a separation allowed for the design of caches that could be tailored to the properties exhibited by different data items. In this paper, we investigate the interaction between three established methods: split cache, victim cache and stream buffer. Since a significant number of compulsory and conflict misses is avoided, the size of each cache (i.e., array and scalar), as well as the combined cache capacity, can be reduced. Our results show, on average, a 55% reduction in miss rates over the base configuration.
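The split organization described above is easy to mock up. The following is a minimal illustrative sketch, not the authors' simulator: the cache sizes, block sizes, and the assumption that each reference arrives pre-tagged as scalar or array/stream data (in the paper this distinction comes from architectural support) are all hypothetical choices.

```python
# Minimal sketch of a split (scalar/array) L-1 data cache simulation.
# Sizes, block sizes, and the pre-tagged trace are illustrative
# assumptions, not the paper's configuration.

class DirectMappedCache:
    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets
        self.misses = 0
        self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        block = addr // self.block_size
        idx = block % self.num_sets
        tag = block // self.num_sets
        if self.tags[idx] != tag:   # miss: fill the set with the new block
            self.misses += 1
            self.tags[idx] = tag

def simulate_split(trace):
    """trace: iterable of (addr, is_array) pairs, where is_array marks
    references identified as indexed/stream data."""
    scalar = DirectMappedCache(num_sets=64, block_size=32)    # small scalar cache
    array = DirectMappedCache(num_sets=128, block_size=64)    # larger blocks for spatial locality
    for addr, is_array in trace:
        (array if is_array else scalar).access(addr)
    return scalar, array

# Tiny synthetic trace: a sequential array walk plus repeated scalar hits.
trace = [(8 * i, True) for i in range(256)] + [(4096, False)] * 32
scalar, array = simulate_split(trace)
print(f"array misses: {array.misses}/{array.accesses}, "
      f"scalar misses: {scalar.misses}/{scalar.accesses}")
```

Running the same trace through a single unified cache of equal total capacity would show how scalar and array references evict each other, which is the interference the split organization removes.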
International Symposium on Pervasive Systems, Algorithms, and Networks | 2009
Oluwayomi Adamo; Afrin Naz; Tommy Janjusic; Krishna M. Kavi; Chung-Ping Chung
As more cores (processing elements) are included in a single chip, it is likely that the sizes of per-core L-1 caches will become smaller while more cores will share L-2 cache resources. It becomes more critical to improve the use of L-1 caches and minimize sharing conflicts for L-2 caches. In our prior work we have shown that using smaller but separate L-1 array data and L-1 scalar data caches, instead of a larger single L-1 data cache, can lead to significant performance improvements. In this paper we extend our experiments by varying cache design parameters including block size, associativity and number of sets for L-1 array and L-1 scalar caches. We also present the effect of separate array and scalar caches on the non-uniform accesses to different (L-1) cache sets exhibited while using a single (L-1) data cache. For this purpose we use the third and fourth central moments (skewness and kurtosis), which characterize the access patterns. Our experiments show that for several embedded benchmarks (from MiBench) split data caches significantly mitigate the problem of non-uniform accesses to cache sets (leading to more uniform utilization of cache resources, reduction of conflicts to cache sets, and minimization of hot spots in the cache). They also show that neither higher set-associativities nor large block sizes are necessary with split cache organizations.
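The moment-based characterization is straightforward to reproduce. The sketch below computes the standardized third and fourth central moments over per-set access counts; the counts themselves are made-up illustrative data, not measurements from MiBench.

```python
# Sketch: characterizing non-uniform cache-set usage with skewness and
# kurtosis (third and fourth standardized central moments).
# The access counts are hypothetical, for illustration only.

import math

def central_moments(counts):
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    std = math.sqrt(var)
    skew = sum((c - mean) ** 3 for c in counts) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in counts) / (n * var ** 2)
    return skew, kurt

# Per-set access counts from a hypothetical unified L-1 data cache,
# where a few "hot" sets absorb most of the traffic:
unified = [900, 850, 20, 15, 870, 910, 25, 10]
skew, kurt = central_moments(unified)
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}")
```

Skewness near 0 and kurtosis near 3 (the value for a normal distribution) indicate roughly uniform set usage; large values flag the hot spots that the split organization is shown to smooth out.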
Journal of Systems Architecture | 2007
Wentong Li; Mehran Rezaei; Krishna M. Kavi; Afrin Naz; Philip H. Sweany
In conventional architectures, the central processing unit (CPU) spends a significant amount of execution time allocating and de-allocating memory. Efforts to improve memory management functions using custom allocators have led to only small improvements in performance. In this work, we test the feasibility of decoupling memory management functions from the main processing element to separate memory management hardware. Such hardware can reside on the same die as the CPU, in a memory controller or embedded within a DRAM chip. Using SimpleScalar, we simulated our architecture and investigated the execution performance of various benchmarks selected from SPECInt2000, Olden and other memory-intensive application suites. The hardware allocator reduced the execution time of applications by as much as 50%. In fact, the decoupled hardware results in a performance improvement even when we assume that both the hardware and software memory allocators require the same number of cycles. We attribute much of this improved performance to improved cache behavior, since decoupling memory management functions reduces cache pollution caused by dynamic memory management software. We anticipate that even higher levels of performance can be achieved by using innovative hardware and software optimizations. We do not show any specific implementation for the memory management hardware; this paper investigates only the potential performance gains that can result from a hardware allocator.
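As rough intuition for why decoupling helps even at equal allocator cost, the toy model below treats software allocation as serialized with compute and hardware allocation as mostly overlapped. Every number (cycle counts, the 20% stall fraction) is a hypothetical assumption; the paper's results come from SimpleScalar simulation, and much of its reported gain stems from reduced cache pollution, which this model does not capture.

```python
# Back-of-the-envelope model of decoupling allocation from the CPU.
# All cycle counts are illustrative assumptions, not the paper's data.

def run_time(compute_cycles, alloc_calls, alloc_cost, decoupled):
    """Software allocation serializes with compute; decoupled hardware
    allocation overlaps with compute except when the CPU must wait."""
    if not decoupled:
        return compute_cycles + alloc_calls * alloc_cost
    # Assume the CPU usually continues past an allocation request and
    # stalls only on the fraction of calls that need the pointer at once.
    stall_fraction = 0.2   # hypothetical
    return compute_cycles + int(alloc_calls * alloc_cost * stall_fraction)

sw = run_time(1_000_000, 10_000, 120, decoupled=False)
hw = run_time(1_000_000, 10_000, 120, decoupled=True)
print(f"software: {sw} cycles, decoupled: {hw} cycles, "
      f"speedup: {sw / hw:.2f}x")
```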
High Performance Computing Systems and Applications | 2004
Afrin Naz; Krishna M. Kavi; Philip H. Sweany; Mehran Rezaei
ACM Symposium on Applied Computing | 2007
Afrin Naz; Krishna M. Kavi; Jung-Hwan Oh; Pierfrancesco Foglia
Memory Performance: Dealing with Applications, Systems and Architecture | 2006
Afrin Naz; Krishna M. Kavi; Mehran Rezaei; Wentong Li
Archive | 2007
Krishna M. Kavi; Afrin Naz
ISCA PDCS | 2006
Wentong Li; Krishna M. Kavi; Afrin Naz; Philip H. Sweany
ISCA PDCCS | 2009
Afrin Naz; Oluwayomi Adamo; Krishna M. Kavi; Tomislav Janjusic
Journal of Embedded Computing | 2006
Afrin Naz; Krishna M. Kavi; Wentong Li; Philip H. Sweany