Joseph M. Arul | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Joseph M. Arul is active.

Explore More

Publication

Featured researches published by Joseph M. Arul.

IEEE Transactions on Mobile Computing | 2003

An efficient fault-tolerant approach for Mobile IP in wireless systems

Jenn-Wei Lin; Joseph M. Arul

This paper presents the fault tolerance of Mobile IP in wireless systems. Mobile IP can support wireless users with continuous network connections while changing locations. It is achieved by allocating a number of mobility agents (foreign agents and home agents) in the architecture of a wireless system. If a failure occurs in a mobility agent, the wireless users located in the coverage area of the faulty mobility agent will lose their network connections. To tolerate the failures of mobility agents, this paper proposes an efficient approach to maintaining the network connections of wireless users without being affected by the failures. Once detecting a failure in a mobility agent, failure-free mobility agents are dynamically selected to be organized as a backup set to take over the faulty mobility agent. Compared to the previous approaches, the proposed approach does not take any actions against failures during the failure-free period. Besides, the hardware redundancy technique is also not used in the proposed approach. The overhead of the proposed approach is analyzed using the M/G/c/c queuing model. The results show that the proposed approach can effectively resolve the fault-tolerant problem of Mobile IP in wireless systems.

international conference on algorithms and architectures for parallel processing | 2002

Scalability of scheduled data flow architecture (SDF) with register contexts

Joseph M. Arul; Krishna M. Kavi

Our new architecture called scheduled data flow (SDF) system deviates from current trend of building complex hardware to exploit instruction level parallelism (ILP) by exploring a simpler, yet powerful execution paradigm that is based on data flow, multi-threading and decoupling of memory accesses from execution. A program is partitioned into non-blocking threads. In addition, all memory accesses are decoupled from the threads execution. Data is pre-loaded into the threads context (registers), and all results are post-stored after the completion of the threads execution. Even though multi-threading and decoupling are possible with control-flow architecture, the non-blocking and functional nature of the SDF system make it easier to coordinate the memory accesses and execution of a thread. In this paper we show some recent improvements on SDF implementation, whereby threads exchange data directly in register contexts, thus eliminating the need for creating thread frames. Thus it is now possible to explore the scalability of our architectures performance when more register contexts are included on the chip.

ieee international conference on high performance computing data and analytics | 2005

An efficient way of passing of data in a multithreaded scheduled dataflow architecture

Joseph M. Arul; Tsozen Yeh; Chiacheng Hsu; Janjr Li

The scheduled dataflow (SDF) architecture deviates from current trend of building complex hardware to exploit instruction level parallelism (ILP) by exploring a simpler, yet powerful execution paradigm that is based on dataflow, multithreading and decoupling of memory accesses from execution. A program is partitioned into non-blocking threads and all memory accesses are decoupled from the threads execution. Data is pre-loaded into the threads context (registers), and all results are post-stored after the completion of the threads execution. This paper presents an efficient way of storing of data into the threads register context directly as opposed to storing of data into the frame memory. This way eliminates the need for creating thread frames when there are sufficient register contexts available in the system. Thus, it is possible to explore the scalability of SDF architectures performance when more register contexts are available on the chip. All the benchmarks ran using these two methods show performance improvement of at least about 20%. This method of allocating data to a consecutive thread in a multithreaded architecture could be applied generally

Cluster Computing | 2018

Joint deadline-constrained and influence-aware design for allocating MapReduce jobs in cloud computing systems

Jenn-Wei Lin; Joseph M. Arul; Chi-Yi Lin

MapReduce can speed up the execution of jobs operating over big data. A MapReduce job can be divided into a number of map and reduce tasks by a well determined division manner on its processing data. In a cloud computing system, multiple MapReduce jobs may be submitted together to compete for the computing resources of the system. When a job has a particular performance requirement (e.g. execution deadline), the appropriate computing resources must be kept for executing the map/reduce tasks of the job; otherwise, the performance requirement cannot be satisfied. Several deadline-constrained MapReduce schedulers have been proposed, but most of them are not aware of the performance influence over existing tasks. We propose a deadline-constrained and influence-aware MapReduce scheduler which combines the following three factors: (1) relaxed data locality, (2) performance influence over existing tasks, and (3) coordinating allocation contention. We first adopt the data-locality criterion to make a tentative allocation plan. By verifying the data-locality allocation plan, if some new tasks severely affect existing tasks or the deadline requirements of some new tasks are not satisfied, the data-locality allocation plan will be modified by re-allocating some new tasks. To optimize the computing resource usage, the solution of a well-known network graph problem: minimum cost maximum-flow (MCMF) is applied to perform the modification of the data-locality allocation plan. A heuristic algorithm is also presented to suppress the complexity of MCMF problem. In addition to meeting the deadline requirements of new jobs, the final allocation plan also considers the performance influence over existing jobs. Finally, we conduct the performance analysis to demonstrate the performance of our proposed MapReduce scheduler using various performance metrics.

international symposium on next generation electronics | 2015

Improvement of memory bandwidth utilization using OpenMP task with processor affinity

Joseph M. Arul; Chun-Chih Huang

The CPU design has been evolving for more than 30 years since the first x86 microprocessor. Recently, instead of increasing the CPU performance, the focus has shifted to multi-core architecture. Multi-core processor technology is rapidly evolving, but the memory interface is a limiting factor in fulfilling the needs of multi-core and multi-threaded processors. This is a big challenge for software developers. The run time thread is dynamically allocated to each processor core by the scheduler of the operating system. Current parallel programming researches only aim to load balance and keep the multi-core running efficiently. As a result, applications may have poor spatial data locality. This will also cause uneven memory bandwidth usage due to differences in memory access paths. The question of obtaining maximum memory bandwidth utilization by controlling the thread of a processor affinity is the main scope of this particular research. Memory bandwidth utilization of 62% (8786.87 MB/s to 14201.88 MB/s) was achieved, if appropriate processor affinity was set for thread placement. The OpenMP task level parallelism in addition to processor affinity resulted in 69% (8786.87 MB/s to 14802.69 MB/s) of improvement using 2 threads. Thus, task level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP parallel programming environment. As a result, it can improve the overall performance of parallel applications.

2014 World Congress on Computing and Communication Technologies | 2014

VHDL Implementation of Scheduled Dataflow Architecture and the Impact of Efficient Way of Passing of Data

Joseph M. Arul; Han-Yao Ko; Hwa-Yuan Chung

Since the invention of microprocessors around 1970, CPU performance improvement together with the Instruction Level Parallelism (ILP) had been the main focus of the computer industry. Recently, ILP seemed to have reached its limit and together with the problem of power consumption and heat dissipation, emerged the multi-core era. The focus had shifted from ILP to Thread Level Parallelism (TLP) and efficient use of multi-core processors. However, the detection of RAW hazard technique relies on complex hardware in the current computers, which may cause the designers to make the CPU consume lot of energy and the design to be more complex. By using dataflow paradigm, this can naturally eliminate the RAW hazards. This new architecture uses a paradigm, to closely link the ILP and TLP by combining the sequential and dataflow approach. It is designed using VHDL language and tested on Alter a DE2 board. With just two register sets, tremendous amount of performance improvement can be gained. This architecture not only reduces the latency of memory accesses, but also can be suitable for multithreaded multi-core platforms.

international conference on algorithms and architectures for parallel processing | 2010

An efficient non-blocking multithreaded embedded system

Joseph M. Arul; Tsung-Yun Chen; Guan-Jie Hwang; Hua-Yuan Chung; Fu-Jiun Lin; You-Jen Lee

Most embedded systems are designed to perform one or fewer specific functions It is important that the hardware and the software must closely interact to achieve maximum efficiency in all of these realms and overcome the drawbacks found in each aspect This research focuses on designing a totally new Instruction Set Architecture (ISA) as well as hardware that can closely tie together with the new emerging trend such as multithreaded multicore embedded systems This new ISA can efficiently execute simple programs in a much efficient way as well as have a better cache performance with less cache misses due to the way the program is split into non-blocking multithreading paradigm This particular research is aimed to compare the performance of this new non-blocking multithreaded architecture with the ARM architecture that is commonly used in an embedded environment It has a speedup of 1.7 in general compared to the MIPS like ARM architecture.

ieee international conference on fuzzy systems | 2008

Discrimination of female children in school education - induced Fuzzy Associative Memories (IFAM) analysis on the causes and consequences

T. Pathinathan; Joseph M. Arul

Modern developments and scientific technology advancements have engulfed every sphere of life yet the abolition of gender discrimination and equal rights with respect to education for the girl child has not become a reality. In this paper using fuzzy associative memories (FAM) first and then the newly introduced induced fuzzy associative memories (IFAM), we analyze the causes for school drop outs among female children and suggest ways and means to reduce the rate of dropouts.

high performance distributed computing | 2006

Using File Grouping to Improve the Disk Performance (Extended Abstract)

Tsozen Yeh; Joseph M. Arul; Jia-Shian Wu; I-Fan Chen; Kuo-Hsin Tan

As the speed gap between CPU and the secondary storage device will not be narrowing in the foreseeable future, file grouping can be a promising way to reduce the disk I/O latency. The order of sequential access among files observed during the execution of individual programs is very predictable. Based on this idea, we propose a new file grouping model called program-based grouping (PBG). Through the Reiser file system, we implemented PBG into Linux kernel. The experiments demonstrate that PBG out performs both ReiserFS and Ext3. Compared with ReiserFS, PBG can improve the ReiserFS performance by up to 66%

Proceedings of the 4th International Conference | 2000