S. Arash Ostadzadeh | Researchain

Publication

Featured research published by S. Arash Ostadzadeh.

applied reconfigurable computing | 2010

QUAD: a memory access pattern analyser

S. Arash Ostadzadeh; Roel Meeuws; Carlo Galuzzi; Koen Bertels

In this paper, we present the Quantitative Usage Analysis of Data (QUAD) tool, a memory access tracing tool that provides a comprehensive quantitative analysis of the memory access patterns of an application, with the primary goal of detecting actual data dependencies at the function level. As improvements in processing performance continue to outpace improvements in memory performance, tools for understanding memory access behavior are vital for optimizing the execution of data-intensive applications on heterogeneous architectures. The tool, the first of its kind, is described in detail, and its benefits and qualities are demonstrated on a real case study, the x264 benchmarking application.
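The idea of function-level data dependency detection described in the abstract above can be illustrated with a minimal sketch. It is not QUAD's implementation: the trace format `(function, op, address)`, the byte-granular addresses, and the name `data_dependencies` are assumptions made only for this example. The sketch remembers which function last wrote each address and, on every read, credits one byte to the (producer, consumer) pair.

```python
from collections import defaultdict

def data_dependencies(trace):
    """Hypothetical helper: trace is an iterable of (function, op, address),
    where op is 'R' (read) or 'W' (write) and each access covers one byte."""
    last_writer = {}                  # address -> function that last wrote it
    bytes_read = defaultdict(int)     # (producer, consumer) -> total bytes read
    unique_addrs = defaultdict(set)   # (producer, consumer) -> distinct addresses
    for func, op, addr in trace:
        if op == "W":
            last_writer[addr] = func
        elif op == "R" and addr in last_writer:
            edge = (last_writer[addr], func)
            bytes_read[edge] += 1
            unique_addrs[edge].add(addr)
    # For each producer/consumer pair: (total bytes read, distinct addresses)
    return {e: (bytes_read[e], len(unique_addrs[e])) for e in bytes_read}

# Assumed toy trace; reads of addresses never written in the trace are ignored.
trace = [
    ("producer", "W", 0x10),
    ("producer", "W", 0x11),
    ("consumer", "R", 0x10),
    ("consumer", "R", 0x10),
    ("consumer", "R", 0x11),
    ("consumer", "R", 0x20),
]
deps = data_dependencies(trace)
```

In this toy trace, `consumer` reads three bytes produced by `producer`, spread over two distinct addresses. A real tracer would obtain the accesses through binary instrumentation rather than a hand-written list.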

computer, information, and systems sciences, and engineering | 2008

An MDA-Based Generic Framework to Address Various Aspects of Enterprise Architecture

S. Shervin Ostadzadeh; Fereidoon Shams Aliee; S. Arash Ostadzadeh

With a trend toward becoming more and more information based, enterprises constantly attempt to surpass each other's accomplishments by improving their information activities. Building an Enterprise Architecture (EA) serves as a fundamental step toward this goal. EA typically encompasses an overview of the entire information system in an enterprise, including the software, hardware, and information architectures. Here, we aim to use Model Driven Architecture (MDA) to cover different aspects of Enterprise Architecture. MDA, the most recent de facto standard for software development, has been selected to address EA across multiple hierarchical levels spanning from business to IT. Although MDA was not intended to contribute in this respect, we plan to extend its initial scope to take advantage of the facilities provided by this architecture. The presented framework helps developers to design and justify completely integrated business and IT systems, which results in an improved project success rate.

international conference on parallel processing | 2010

tQUAD - Memory Bandwidth Usage Analysis

S. Arash Ostadzadeh; Marco Corina; Carlo Galuzzi; Koen Bertels

One of the main issues in heterogeneous reconfigurable computing is the well-known processor/memory bottleneck. Because memory bandwidth is limited, the execution performance of an application can increase dramatically through efficient use of memory. In this paper, we present tQUAD, a new tool for memory bandwidth usage analysis. The tool delivers detailed temporal memory bandwidth usage information for the functions in an application through a comprehensive analysis of the memory access patterns of individual functions. The tool, the first of its kind, provides an accurate analysis of task execution and memory bandwidth usage, which in the end leads to a partitioning of the tasks into different phases during the execution span of an application. Together with an accurate description of the tool, the paper presents a real case study from the multimedia domain to detail all features of the proposed tool.
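A rough sketch of what temporal, per-function bandwidth usage could look like follows. It is only an illustration under stated assumptions, not tQUAD's method: the trace format `(timestamp, function, n_bytes)`, the fixed-length time slices, and the names `bandwidth_per_slice` and `active_span` are all hypothetical.

```python
from collections import defaultdict

def bandwidth_per_slice(trace, slice_len):
    """Hypothetical helper: trace is an iterable of (timestamp, function, n_bytes).
    Returns {function: {slice_index: bytes accessed in that time slice}}."""
    usage = defaultdict(lambda: defaultdict(int))
    for t, func, n_bytes in trace:
        usage[func][t // slice_len] += n_bytes
    return {f: dict(slices) for f, slices in usage.items()}

def active_span(slices):
    """First and last time slice in which a function touched memory."""
    return min(slices), max(slices)

# Assumed toy trace with timestamps in arbitrary units.
trace = [
    (0, "decode", 8),
    (3, "decode", 4),
    (12, "filter", 16),
    (14, "filter", 16),
    (25, "decode", 8),
]
usage = bandwidth_per_slice(trace, slice_len=10)
spans = {f: active_span(s) for f, s in usage.items()}
```

Functions whose active spans overlap little, such as `filter` in this toy trace, are candidates for separate execution phases in the sense of the abstract.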

field programmable gate arrays | 2009

A clustering framework for task partitioning based on function-level data usage analysis

S. Arash Ostadzadeh; Roel Meeuws; Kamana Sigdel; Koen Bertels

Recently, reconfigurable computing has received a great deal of attention due to its ability to increase application performance through hardware execution while retaining the flexibility of a software solution. One of the major requirements for such systems is to identify which applications, or parts of an application, can be implemented in software and which can be mapped onto reconfigurable devices. Grouping the tasks within an application can support coarse-grained partitioning of the application, which can eventually improve the performance of the system. In this work, we introduce a clustering framework along with a flexible multipurpose clustering algorithm that initiates task clustering at the functional level based on dynamic profiling information. The clustering framework can be used as the basic step to modify the granularity of tasks in the hardware/software partitioning and scheduling phases. As a result, an elaborate mapping onto the system resources and possibly a higher degree of task parallelism can be obtained. In an initial attempt, the framework addresses two primary objectives: creating workload-balanced and loosely coupled clusters. The experimental results show that the clustering complies with the desired metrics, which were defined through the objectives.

ACM Transactions on Reconfigurable Technology and Systems | 2013

Quipu: A Statistical Model for Predicting Hardware Resources

Roel Meeuws; S. Arash Ostadzadeh; Carlo Galuzzi; Vlad Mihai Sima; Razvan Nane; Koen Bertels

There has been a steady increase in the utilization of heterogeneous architectures to tackle the growing need for computing performance and low-power systems. Executing computation-intensive functions on specialized hardware yields substantial speedups and power savings. However, with a large legacy code base and software engineering experts, it is not at all obvious how to easily utilize these new architectures. As a result, there is a need for comprehensive tool support to bridge the knowledge gap of many engineers as well as to retarget legacy code. In this article, we present the Quipu modeling approach, which consists of a set of tools and a modeling methodology that can generate hardware estimation models, which provide valuable information for developers. This information helps them to focus their efforts, to partition their application, and to select the right heterogeneous components. We present Quipu's capability to generate domain-specific models that are up to several times more accurate within their particular domain (error: 4.6%) than domain-agnostic models (error: 23%). Finally, we show how Quipu can generate models for a new toolchain and platform within a few days.

international conference on parallel processing | 2011

A Simulation Framework for Reconfigurable Processors in Large-Scale Distributed Systems

Faisal Nadeem; S. Arash Ostadzadeh; Muhammad Nadeem; Stephan Wong; Koen Bertels

The inclusion of reconfigurable processors in distributed grid systems promises to offer increased performance without compromising flexibility. Consequently, large-scale distributed grid systems (such as TeraGrid) are utilizing reconfigurable computing resources next to general-purpose processors (GPPs) in their computing nodes. The near-optimal utilization of resources in such distributed systems depends considerably on resource management and application task scheduling. Many state-of-the-art simulators for application scheduling in distributed computing systems have been proposed. However, there is no dedicated simulation framework to study the behavior of reconfigurable nodes in grids. Incorporating reconfigurable nodes in these systems requires taking into account reconfigurable hardware characteristics, such as area utilization, performance increase, reconfiguration time, and the time to transfer configuration bit streams, execution code, and data. Many of these characteristics are not taken into account by traditional simulators. In this paper, we present a simulation framework for reconfigurable processors in large-scale distributed systems. It is capable of modeling reconfigurable nodes, processor configurations, and tasks in a distributed system. Furthermore, as part of the verification of the framework, we implemented a dynamic task scheduling algorithm with support for the scheduling of tasks on reconfigurable nodes. A number of experiments with various simulation parameters were conducted. The results show an expected trend. We also present a thorough discussion of the results.

computer, information, and systems sciences, and engineering | 2008

Resource Allocation in Market-based Grids Using a History-based Pricing Mechanism

Behnaz Pourebrahimi; S. Arash Ostadzadeh; Koen Bertels

In an ad-hoc Grid environment where producers and consumers compete to provide and employ resources, handling trades in a fair and stable way is a challenging task. Dynamic changes in the availability of resources over time make the task even more complicated. Here we employ a continuous double auction protocol as an economic-based approach to allocate idle processing resources among the demanding nodes. Consumers and producers determine their bid and ask prices using a history-based dynamic pricing strategy, and the auctioneer follows a discriminatory pricing policy, which sets the transaction price individually for each matched buyer-seller pair. The pricing strategy generally simulates human intelligence in order to define a logical price by local analysis of previous trade cases. This strategy is adopted to meet the user requirements and constraints set by consumers and producers. Experimental results show waiting time optimization, which is particularly critical when resources are scarce.
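The continuous double auction with discriminatory pricing mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's mechanism: the class name `ContinuousDoubleAuction`, the midpoint rule for the transaction price, and the first-come-first-served tie-breaking are assumptions, and the history-based strategy by which participants choose their prices is not modeled at all.

```python
import heapq

class ContinuousDoubleAuction:
    """Hypothetical sketch: orders are matched as soon as the best bid
    meets or exceeds the best ask."""

    def __init__(self):
        self._bids = []   # max-heap via negated price: (-price, seq, consumer)
        self._asks = []   # min-heap: (price, seq, producer)
        self._seq = 0     # arrival counter; earlier orders win price ties
        self.trades = []  # (consumer, producer, transaction price)

    def submit_bid(self, consumer, price):
        heapq.heappush(self._bids, (-price, self._seq, consumer))
        self._seq += 1
        self._match()

    def submit_ask(self, producer, price):
        heapq.heappush(self._asks, (price, self._seq, producer))
        self._seq += 1
        self._match()

    def _match(self):
        while self._bids and self._asks and -self._bids[0][0] >= self._asks[0][0]:
            neg_bid, _, consumer = heapq.heappop(self._bids)
            ask, _, producer = heapq.heappop(self._asks)
            # Discriminatory pricing: each matched pair gets its own price.
            # The midpoint of bid and ask is an assumption of this sketch.
            price = (-neg_bid + ask) / 2
            self.trades.append((consumer, producer, price))

market = ContinuousDoubleAuction()
market.submit_ask("p1", 10)
market.submit_ask("p2", 6)
market.submit_bid("c1", 5)    # below the best ask (6): stays in the book
market.submit_bid("c2", 8)    # matches p2
market.submit_bid("c3", 12)   # matches p1
```

Because each pair's price is derived from that pair's own bid and ask, the two trades in this example clear at different prices, which is what distinguishes discriminatory pricing from a single uniform clearing price.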

international parallel and distributed processing symposium | 2012

Task Scheduling in Large-scale Distributed Systems Utilizing Partial Reconfigurable Processing Elements

M. Faisal Nadeem; Imran Ashraf; S. Arash Ostadzadeh; Stephan Wong; Koen Bertels

Recent progress in processing speeds, network bandwidths, and middleware technologies has contributed to novel computing platforms, ranging from large-scale computing clusters to globally distributed systems. Consequently, most current computing systems possess different types of heterogeneous processing resources. Entering the peta-scale computing era and beyond, reconfigurable processing elements such as Field Programmable Gate Arrays (FPGAs), as well as forthcoming integrated hybrid computing cores, will play a leading role in the design of future distributed systems. Therefore, it is important to develop simulation tools to measure the performance of reconfigurable processors in current and future distributed systems. In this paper, we propose the design of a simulation framework to investigate the performance of reconfigurable processors in distributed systems. The framework adds partial reconfiguration functionality to the reconfigurable nodes. Depending on the available reconfigurable area, each node is able to execute more than one task simultaneously. Furthermore, as a case study, we present a simple task scheduling algorithm to verify the functionality of the simulation framework. The proposed algorithm supports the scheduling of tasks on partially reconfigurable nodes. The simulation results are based on various experiments and provide a comparison between full (one node-one task mapping) and partial (one node-multiple tasks mapping) configuration of the nodes, for the same set of parameters in each simulation run. Results suggest that the average wasted area per task is lower with partial configuration than with full configuration, verifying the functionality of the simulation framework.

international conference on high performance computing and simulation | 2011

Task scheduling strategies for dynamic reconfigurable processors in distributed systems

M. Faisal Nadeem; S. Arash Ostadzadeh; Stephan Wong; Koen Bertels

Reconfigurable processors in distributed grid systems can potentially offer enhanced performance along with flexibility. Therefore, grid systems such as TeraGrid are utilizing reconfigurable computing resources next to general-purpose processors (GPPs) in their computing nodes. In general, application task scheduling largely affects the near-optimal performance of resources in distributed grid systems. Including reconfigurable nodes in such systems requires taking into account reconfigurable hardware characteristics, such as area utilization, reconfiguration time, and the time to communicate configuration bit streams, execution code, and data. Generally, many of these characteristics are not taken into account by traditional task scheduling systems in distributed grids. In this paper, we present a simulation framework for application task distribution among different nodes of a reconfigurable computing grid. Furthermore, we propose three different task scheduling strategies, namely Optional Closest Match (OCM), Exact Match Priority (EMP), and Sufficient-Area Priority (SAP). The simulation results are presented based on the average scheduling steps required by the scheduler to accommodate each task, the total scheduler workload, and the average waiting time per task. We compare the impacts of the three scheduling strategies on these metrics. In addition, we present a thorough discussion of the results. In particular, the results show that two key metrics, the average scheduling steps per task and the average waiting time per task, are reduced for EMP and SAP when compared to OCM.

applied reconfigurable computing | 2012

The Q² profiling framework: driving application mapping for heterogeneous reconfigurable platforms

S. Arash Ostadzadeh; Roel Meeuws; Imran Ashraf; Carlo Galuzzi; Koen Bertels

Heterogeneous multicore architectures pose specific challenges regarding their programmability, and they require smart mapping schemes to make efficient use of different processing elements. Various criteria can drive this mapping, such as computational intensity, memory requirements, and area consumption. To facilitate this complex mapping task, there is a clear need for tools that investigate the use of critical resources such as memory and hardware area. For this purpose, we developed the Q² profiling framework. It consists of two main parts: an advanced memory access profiling toolset, which provides detailed information on the runtime memory access patterns of an application, and a statistical modeling component, which makes hardware area predictions early in the design phase based on software metrics. These tools are integrated using a partitioning methodology. We demonstrate the effectiveness of our framework using three applications in our experiments. One application is further detailed in a case study to illustrate the use of our methodology. Experimental results show application speedups of up to 2.92×.
