Publication


Featured research published by Nafiul Alam Siddique.


International Performance Computing and Communications Conference | 2016

LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors

Nafiul Alam Siddique; Abdel-Hameed A. Badawy; Jeanine Cook; David Resnick

In this paper, we present a hardware-controlled on-chip memory called Local Memory Store (LMStr) that can be used either solely as a scratchpad or as a combination of scratchpad and cache, storing any variable specified by the programmer or extracted by the compiler. LMStr differs from a traditional scratchpad in that it is hardware-controlled and stores variables of the same type in a block that is allocated based on availability and demand. In this initial work on LMStr, we focus on identifying its potential, namely the advantages of storing temporary and program variables in blocks in LMStr, and on comparing its performance against a regular cache. To the best of our knowledge, this is the first work in which a scratchpad has been used in a generalized way, with the focus on storing temporary and programmer-specified variables in blocks. We evaluate LMStr on a micro-benchmark and a set of mini-applications from the Mantevo suite. We simulate LMStr in the Structural Simulation Toolkit (SST) simulator. LMStr provides a 10% reduction in average data movement between on-chip and off-chip memory compared to a traditional cache hierarchy.


International Conference on Computational Science | 2016

Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite

Nafiul Alam Siddique; Patricia Grubel; Abdel-Hameed A. Badawy; Jeanine Cook

Cache hierarchies have long been utilized to minimize the latency of main memory accesses by caching frequently used data closer to the processor. Significant research has been done to identify the most crucial metrics of cache performance. Though the majority of research focuses on measuring cache hit rates and data movement as the major cache performance metrics, cache utilization can be equally important. In this work, we present cache utilization performance metrics that provide insight into application behavior. We define cache utilization in two forms: 1) the fraction of data bytes in a cache line that are actually accessed at least once before eviction from the cache, and 2) the access frequency of data bytes in a cache line. We discuss the relationship between the utilization measurement and two important application properties: 1) spatial locality: the use of data located near data that has already been accessed, and 2) temporal locality: the reuse of data over time. In addition to measuring cache line utilization, we present conventional performance metrics to give a holistic understanding of cache behavior. To facilitate this work, we build a memory simulator incorporated into the Structural Simulation Toolkit (SST). We measure and analyze the performance of several scientific mini-applications from the Mantevo suite [1]. This work supports the claim that caches are not necessarily the best on-chip solution for all types of applications due to the fixed cache line size.
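The two utilization forms defined in this abstract can be illustrated with a minimal sketch. It is not the authors' SST-based simulator: the function name `line_utilization`, the `(offset, size)` access format, the 64-byte line size, and the choice to average access counts over touched bytes only are all assumptions made for illustration.

```python
def line_utilization(accesses, line_size=64):
    """Summarize one residency of a cache line (from fill to eviction).

    accesses: list of (offset, size) pairs, in bytes, relative to the
              start of the line (hypothetical trace format).
    Returns (fraction_used, mean_accesses):
      fraction_used  - share of the line's bytes touched at least once
      mean_accesses  - average access count per touched byte (assumed
                       reading of "access frequency of data bytes")
    """
    counts = [0] * line_size
    for offset, size in accesses:
        for b in range(offset, offset + size):
            counts[b] += 1
    touched = sum(1 for c in counts if c > 0)
    fraction_used = touched / line_size
    mean_accesses = sum(counts) / touched if touched else 0.0
    return fraction_used, mean_accesses


# Two 4-byte reads at offset 0 and one 4-byte read at offset 8:
# 8 of 64 bytes are touched, with 12 byte-accesses in total.
fraction, frequency = line_utilization([(0, 4), (0, 4), (8, 4)])
print(fraction, frequency)  # 0.125 1.5
```

A low `fraction_used` suggests weak spatial locality within the line, while a `mean_accesses` value near 1 suggests little temporal reuse before eviction.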


International Conference on Informatics, Electronics and Vision | 2012

Control of autonomous cars for intelligent transportation system

Mohammad Qayum; Nafiul Alam Siddique; Mohammad Atiqul Haque; Abu Saleh Md. Tayeen

This work concentrates on developing an approach for integrating intelligent system control into transportation systems. We have implemented an algorithm [1] that controls a car so that it follows a predefined track, integrating a human driver with computer control to increase human performance while reducing reliance on detailed driver attention. Knowledge of vehicle position and orientation obtained from the optical tracking system provides the automatic decision-making intelligence needed to follow a virtual vehicle moving on the track. The results show good performance of this model-independent algorithm with fairly cheap RC cars.


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2014

Insight into Application Performance Using Application-Dependent Characteristics

Waleed Alkohlani; Jeanine Cook; Nafiul Alam Siddique

Carefully crafted performance characterization can provide significant insight into application performance and can be beneficial to computer designers, compiler and application developers, and end users. To achieve all the benefits of performance characterization, the characterization must incorporate a comprehensive set of characteristics that affect performance and can be measured with minimal perturbation from the underlying micro-architecture. To this end, we advocate the use of application-dependent characteristics that allow general conclusions to be drawn about the application itself rather than its observed performance on a specific architecture. In our prior work [7], we introduced a set of application-dependent characteristics and showed that they are consistent across architectures. In this work, we present an efficient characterization methodology that incorporates a more comprehensive set of application-dependent characteristics. We also explain in detail how these characteristics can be used to reason about and gain insight into application performance. Finally, we report characterization results on SPEC MPI2007 and Mantevo benchmarks. To our knowledge, this is the first work to present application-dependent characterization results for SPEC MPI2007 and some of the new Mantevo benchmarks.


The Journal of Supercomputing | 2018

A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Nafiul Alam Siddique; Patricia Grubel; Abdel-Hameed A. Badawy; Jeanine Cook

Caches have long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is also important. We investigate application locality using cache utilization metrics. Furthermore, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insights into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average, a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization as well as conventional performance metrics to give a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Our results suggest that a variable cache line size can result in better performance and can also conserve power.


Proceedings of the International Symposium on Memory Systems | 2017

SprBlk cache: enabling fault resilience at low voltages

Nafiul Alam Siddique; Abdel-Hameed A. Badawy

This paper proposes a novel cache architecture that uses spare cache blocks as backup blocks in a set-associative cache, which can operate reliably at voltages well below the manufacturing-induced operating voltage (Vccmin). We detect errors in all cache lines at low voltage and tag them as either faulty or fault-free. At runtime, we bypass the faulty words using adder and shifter circuitry. Furthermore, we develop a fault model to find the cache set failure probability at low voltage. At 485mV, SprBlk cache operates with a 16.7% lower bit failure probability compared to a conventional cache operating at 782mV while reducing power consumption by 1% when SprBlk is implemented in the L1 data cache only, by 75% when implemented in the L2 cache only, and by 76% when implemented in both L1 and L2 caches. SprBlk cache is 15% more area efficient than the previously proposed Bit-Fix mechanism. Additionally, SprBlk provides a ~73% reduction in EPI (energy per instruction) compared to a conventional cache. Furthermore, it consumes 2-5% less power than the Bit-Fix mechanism.
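To make the idea of a cache set failure probability concrete, here is a generic textbook-style sketch. It is not the fault model developed in the paper: it assumes independent bit failures, treats a block as faulty if any of its bits fails, and treats a set as failed when more blocks are faulty than there are spares. The function names and all parameters are hypothetical.

```python
from math import comb


def block_fault_prob(p_bit, bits_per_block):
    """Probability that at least one bit in a block is faulty,
    assuming independent bit failures with probability p_bit."""
    return 1.0 - (1.0 - p_bit) ** bits_per_block


def set_failure_prob(p_bit, bits_per_block, ways, spares):
    """Probability that more than `spares` of the `ways` blocks in a
    set are faulty (binomial tail), under the assumptions above."""
    q = block_fault_prob(p_bit, bits_per_block)
    survives = sum(
        comb(ways, k) * q**k * (1.0 - q) ** (ways - k)
        for k in range(spares + 1)
    )
    return 1.0 - survives


# Illustrative numbers only: 8-way set, 512-bit blocks, p_bit = 1e-4.
for spares in (0, 1, 2):
    print(spares, set_failure_prob(1e-4, 512, ways=8, spares=spares))
```

Under these assumptions, each additional spare block lowers the set failure probability, which is the qualitative effect a spare-block scheme relies on.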


2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR) | 2017

Interactive intelligent agents with creative minds: Experiments with mobile robots in cooperating tasks by using machine learning

Mohammad Abdul Qayum; Nazmun Nahar; Nafiul Alam Siddique; Z. M. Saifullah

In this paper, we present an intelligent system in which agents coordinate creative tasks through machine learning and cooperation. For machine learning, we use a common pattern recognition algorithm, Principal Component Analysis (PCA). Based on the recognition result, we plan a task that is performed by multiple intelligent agents. In our case, the task is for the agents to draw a pattern or create a piece of art. The task is divided into three phases: obtaining a design, composing a mathematical model, and performing the task with the agents. For agent coordination, various feedback techniques based on wireless and on-board sensors are used. As a proof of concept (POC), a flower pattern is detected and then painted on a canvas by mobile robots. In addition, a person's identity and mood are detected, and the mobile robots then create a piece of art to improve that mood.


International Conference on Informatics, Electronics and Vision | 2012

Future of multiprocessors: Heterogeneous Chip Multiprocessors

Mohammad Qayum; Nafiul Alam Siddique; Mohammad Atiqul Haque; Abu Saleh Md. Tayeen

As computer applications become more complex, larger, and more versatile, chip multiprocessors are becoming ubiquitous. Much research is under way on the core architectures within the chip, and the heterogeneous chip multiprocessor (CMP) is leading the innovation. A heterogeneous CMP is composed of cores of varying performance and complexity. It gives a better area-to-performance ratio, higher throughput, and higher speedup, and it mitigates Amdahl's bottleneck to some extent. There are three major issues for heterogeneous CMPs: scheduling applications onto different cores, configuration of cores, and exploiting Amdahl's law. This paper discusses recent research on these three issues in detail and, finally, draws some recommendations from the study.
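For reference, the classical form of Amdahl's law that this abstract alludes to is given below. The symbols are notation introduced here, not taken from the paper: f is the fraction of execution time that can be parallelized and n is the number of identical cores. The heterogeneous-CMP variants discussed in the paper are not reproduced.

```latex
S(n) = \frac{1}{(1 - f) + \dfrac{f}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - f}
```

As a worked example, with f = 0.9 and n = 16 the speedup is 1 / (0.1 + 0.05625) = 6.4, while the limit for unbounded n is 10. The serial fraction dominates, which is the bottleneck a heterogeneous design tries to ease by dedicating a faster core to serial code.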


Ubiquitous Intelligence and Computing | 2017

Local memory store (LMStr): A hardware controlled shared scratchpad for multicores

Nafiul Alam Siddique; Abdel-Hameed A. Badawy; Jeanine Cook; David Resnick


Ubiquitous Intelligence and Computing | 2017

The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks

Nafiul Alam Siddique; Patricia Grubel; Abdel-Hameed A. Badawy

Collaboration


Dive into Nafiul Alam Siddique's collaborations.

Top Co-Authors

Jeanine Cook
Sandia National Laboratories

David Resnick
Sandia National Laboratories

Patricia Grubel
Los Alamos National Laboratory

Mohammad Qayum
New Mexico State University

Waleed Alkohlani
New Mexico State University