Avinash Malik
University of Auckland
Publication
Featured research published by Avinash Malik.
ACM Transactions on Design Automation of Electronic Systems | 2009
Avinash Malik; Zoran Salcic; Parthasarathi Roop
SystemJ is a language based on the Globally Asynchronous Locally Synchronous (GALS) paradigm. A SystemJ program is a collection of GALS nodes, also called clock domains, and each clock domain is a synchronous program that extends the Java language. Initial compilation of SystemJ has been to standard Java executing on a Java Virtual Machine (JVM), which is both inefficient and bulky for small embedded systems. This article proposes a new approach for compiling and executing SystemJ using a new type of virtual machine, called a Tandem Virtual Machine (TVM). The TVM approach provides an efficient implementation of SystemJ on both standard processors and resource-constrained embedded processors. The new approach is based on separating the control-driven and data-driven operations for execution on two virtual machines. While the JVM executes the data-driven operations, a Control Virtual Machine (CVM) is introduced to execute the control-driven parts of a SystemJ program. The TVM approach is capable of handling all data-driven and control-driven operations required by the GALS model. The benchmark results show that the TVM has code size improvements of over 60% on average and also a substantial improvement in execution speed over standard Java-based compilation.
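The division of labour described above can be pictured with a toy interpreter. The Python sketch below is not the CVM's actual instruction set or the TVM's interface: the opcodes (`EMIT`, `IFNOT`, `JMP`, `CALL`, `PAUSE`), the class `ControlVM` and the `data_methods` table are hypothetical. It only shows the idea of a control machine that runs reactive control flow one tick at a time and hands every data-driven operation (`CALL`) to a separate executor, which stands in for the JVM.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical control instruction format: (opcode, argument).
Instr = Tuple[str, object]


class ControlVM:
    """Toy control machine: interprets control flow, delegates data work."""

    def __init__(self, program: List[Instr], data_methods: Dict[str, Callable[[], None]]):
        self.program = program
        self.data_methods = data_methods  # stands in for the data-side VM (the JVM)
        self.pc = 0

    def tick(self, inputs: Set[str]) -> Set[str]:
        """Run one synchronous reaction and return the signals emitted in it."""
        present = set(inputs)  # signal statuses are valid for this tick only
        emitted: Set[str] = set()
        while self.pc < len(self.program):
            op, arg = self.program[self.pc]
            self.pc += 1
            if op == "EMIT":
                present.add(arg)
                emitted.add(arg)
            elif op == "IFNOT":  # arg = (signal, target): jump if signal absent
                signal, target = arg
                if signal not in present:
                    self.pc = target
            elif op == "JMP":
                self.pc = arg
            elif op == "CALL":  # data-driven work is handed off
                self.data_methods[arg]()
            elif op == "PAUSE":  # end of this reaction
                break
            else:
                raise ValueError(f"unknown opcode {op!r}")
        return emitted


# Usage: count the ticks in which the input signal "tick" is present.
state = {"count": 0}


def increment() -> None:
    state["count"] += 1


program: List[Instr] = [
    ("IFNOT", ("tick", 3)),  # 0: skip the data call if "tick" is absent
    ("CALL", "increment"),   # 1: data operation, executed outside the control VM
    ("EMIT", "done"),        # 2
    ("PAUSE", None),         # 3: wait for the next tick
    ("JMP", 0),              # 4: loop
]
vm = ControlVM(program, {"increment": increment})
outputs = [vm.tick(s) for s in [{"tick"}, set(), {"tick"}]]
```

In this toy, the control machine never touches `state` directly; it only decides when the data-side function runs, which mirrors the separation the abstract describes.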
Java Technologies for Real-time and Embedded Systems | 2009
Avinash Malik; Zoran Salcic; Alain Girault; Adam Walker; Sung Chul Lee
This paper presents a novel execution architecture for Globally Asynchronous Locally Synchronous (GALS) systems, in our case targeting the system-level programming language SystemJ. SystemJ extends Java with both synchronous and asynchronous concurrency and with reactivity to control program execution. The proposed architecture is based on separating the control-driven and data-driven operations onto two types of processors, control processors and data processors respectively, and it is aimed at complex embedded applications designed as GALS. The control processor is introduced to efficiently execute the control constructs, which implement concurrency, reactivity, and control flow in SystemJ. The data processor executes the Java data-driven transformational operations and can be any traditional processor. Control and data processors form hybrid multiprocessors, called GALS multiprocessors, which can easily be customized for specific applications and are implemented as a system on programmable chip (SoPC). Benchmarks show significant improvement in code size and execution speed of the resulting architecture over traditional processors.
Embedded and Real-Time Computing Systems and Applications | 2014
Zhenmin Li; Avinash Malik; Zoran Salcic
Static estimation of the Worst Case Reaction Time (WCRT) of synchronous programs is pivotal for designing hard real-time systems in these languages. Current approaches to WCRT estimation suffer from either large overestimation of the WCRT value or the state space explosion problem. In this paper, we present TACO: a framework that integrates model-checking-based WCRT estimation with code optimization techniques, which results in close-to-optimal WCRT estimates with worst-case runtime complexity reduced by orders of magnitude. The TACO framework also allows us to generate executables with a smaller overall memory footprint.
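To make the quantity concrete: the WCRT is the largest execution cost of any single reaction (tick) that the program can actually reach. A minimal sketch, assuming the compiled program is already available as an explicit transition system whose edges carry hypothetical per-tick cycle counts, is a breadth-first reachability search (the core of explicit-state model checking) that keeps the maximum edge cost. It is not TACO's encoding and includes none of its optimizations; the example only illustrates why restricting the maximum to reachable ticks avoids overestimation.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

# state -> [(reaction cost in cycles, next state), ...], one entry per input case
TransitionSystem = Dict[Hashable, List[Tuple[int, Hashable]]]


def wcrt(ts: TransitionSystem, initial: Hashable) -> int:
    """Largest single-tick cost over all ticks reachable from `initial`."""
    seen = {initial}
    frontier = deque([initial])
    worst = 0
    while frontier:
        state = frontier.popleft()
        for cost, nxt in ts.get(state, []):
            worst = max(worst, cost)  # every reachable tick is a candidate
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return worst


# Hypothetical example: state "D" has an expensive tick but is unreachable.
ts: TransitionSystem = {
    "A": [(10, "B"), (4, "A")],
    "B": [(25, "C"), (7, "A")],
    "C": [(12, "A")],
    "D": [(100, "A")],
}
reachable_wcrt = wcrt(ts, "A")
naive_bound = max(cost for edges in ts.values() for cost, _ in edges)
```

Here the reachability-based value is 25, while a bound taken over every transition in the program would report 100 because it includes the unreachable tick from `"D"`.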
ACM Transactions on Architecture and Code Optimization | 2016
Andrew Anderson; Avinash Malik; David Gregg
Automatically exploiting short vector instruction sets (SSE, AVX, NEON) is a critically important task for optimizing compilers. Vector instructions typically work best on data that is contiguous in memory, and operating on non-contiguous data requires additional work to gather and scatter the data. There are several varieties of non-contiguous access, including interleaved data access. An existing approach used by GCC generates extremely efficient code for loops with power-of-2 interleaving factors (strides). In this paper, we propose a generalization of this approach that produces similar code for any compile-time constant interleaving factor. In addition, we propose several novel program transformations, which were made possible by our generalized representation of the problem. Experiments show that our approach achieves significant speedups for both power-of-2 and non-power-of-2 interleaving factors. Our vectorization approach results in mean speedups over scalar code of 1.77x on Intel SSE and 2.53x on Intel AVX2 in real-world benchmarking on a selection of BLAS Level 1 routines. On the same benchmark programs, GCC 5.0 achieves mean improvements of 1.43x on Intel SSE and 1.30x on Intel AVX2. In synthetic benchmarking on Intel SSE, our maximum improvement on data movement is over 4x for gathering operations and over 6x for scattering operations versus scalar code.
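The core of the generalization can be sketched with index arithmetic. Assuming a compile-time constant stride `s` and vector width `W`, the data for `W` loop iterations occupies `s` consecutive vectors, and field `f` of iteration `j` sits at flat position `j*s + f`, that is, in vector `(j*s + f) // W` at lane `(j*s + f) % W`. The Python sketch below (function names are hypothetical, and plain lists stand in for SIMD registers) computes that gather plan for any stride, including non-power-of-2 strides. A real compiler would lower each plan into target-specific shuffle or permute instructions, which is not shown here.

```python
from typing import List, Sequence, Tuple


def gather_plan(stride: int, width: int) -> List[List[Tuple[int, int]]]:
    """plan[f][j] = (vector, lane) holding field f of iteration j."""
    plan = []
    for field in range(stride):
        lanes = []
        for j in range(width):
            flat = j * stride + field  # position in the interleaved array
            lanes.append((flat // width, flat % width))
        plan.append(lanes)
    return plan


def load_vectors(data: Sequence[int], width: int) -> List[List[int]]:
    """Split contiguous memory into width-sized 'vector registers'."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


def deinterleave(vectors: List[List[int]], stride: int, width: int) -> List[List[int]]:
    """Turn `stride` contiguous vectors into `stride` per-field vectors."""
    return [[vectors[v][lane] for v, lane in lanes] for lanes in gather_plan(stride, width)]


# Stride 3 (not a power of 2), 4 lanes: 12 interleaved elements x0 y0 z0 x1 ...
vectors = load_vectors(list(range(12)), 4)
fields = deinterleave(vectors, stride=3, width=4)
```

For this input, `fields` holds `[0, 3, 6, 9]`, `[1, 4, 7, 10]` and `[2, 5, 8, 11]`; scattering is the inverse mapping applied in the other direction.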
Design, Automation, and Test in Europe | 2016
Nathan Allen; Sidharta Andalam; Partha S. Roop; Avinash Malik; Mark L. Trew; Nitish Patel
We study the problem of modular code generation for emulating the electrical conduction system of the heart, which is essential for the validation of implantable devices such as pacemakers. To develop high-fidelity models, it is essential to consider the operation of hundreds, if not millions, of conduction elements, called nodes of the heart. Published results so far, however, have considered a maximum of 33 nodes, modelled as Hybrid Input Output Automata (HIOA). The behaviour of this model is captured using the well-known commercial tool Simulink®. These approaches are limited by the lack of fidelity of the modelled conduction system. In this paper, we first develop a semantics-preserving modular compilation approach for a network of HIOA, by translating them to a network of Finite State Machines (FSMs). We then demonstrate that a delayed synchronous composition of the cardiac nodes enables modular code generation that is both semantics-preserving and efficient. In addition to the cardiac example, we have developed several examples from other domains to compare Simulink® with the developed tool, called Piha. The results show that, for the cardiac model, we are able to generate code that is 60% smaller in binary size and executes 20 times faster when compared to Simulink®.
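Delayed synchronous composition can be illustrated without the continuous dynamics. In the hypothetical Python sketch below, each node is a three-state FSM (rest, excited, refractory), and in each tick every node reads only the outputs its neighbours latched in the previous tick. Because no node depends on another node's output from the same tick, nodes can be compiled and stepped independently and in any order. The node model is an assumption for illustration, not Piha's translation of HIOA into FSMs.

```python
from typing import Dict, List

REST, EXCITED, REFRACTORY = "rest", "excited", "refractory"


def node_step(state: str, neighbour_excited: List[bool]) -> str:
    """One FSM step of a hypothetical excitable node."""
    if state == REST:
        return EXCITED if any(neighbour_excited) else REST
    if state == EXCITED:
        return REFRACTORY
    return REST  # refractory nodes recover and ignore their neighbours


def tick(states: Dict[int, str], neighbours: Dict[int, List[int]]) -> Dict[int, str]:
    # Delayed composition: outputs are latched from the previous tick, so every
    # node reads old values and the nodes can be stepped in any order.
    latched = {n: s == EXCITED for n, s in states.items()}
    return {n: node_step(states[n], [latched[m] for m in neighbours[n]]) for n in states}


# A 4-node chain; node 0 starts excited and the wave moves one node per tick.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 4] for i in range(4)}
states = {0: EXCITED, 1: REST, 2: REST, 3: REST}
wave = []
for _ in range(3):
    states = tick(states, chain)
    wave.append([n for n, s in states.items() if s == EXCITED])
```

The one-tick delay on every link is what makes excitation travel one node per tick here (`wave` is `[[1], [2], [3]]`), and the refractory state prevents the wave from bouncing back.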
ACM Transactions on Design Automation of Electronic Systems | 2015
HeeJong Park; Avinash Malik; Zoran Salcic
Safety-critical software systems need to guarantee functional correctness and bounded response times to external input events. Programs designed using reactive programming languages, based on formal mathematical semantics, can be automatically verified for functional correctness. Real-time guarantees, on the other hand, are much harder to achieve. In this article, we provide a static analysis framework for guaranteeing response times for reactive programs developed using the Globally Asynchronous Locally Synchronous (GALS) model of computation. The proposed approach is applicable to scheduling of GALS programs for different target architectures with single or multiple processors or cores. A Satisfiability Modulo Theories (SMT) formulation in the quantifier-free linear real arithmetic (QF_LRA) logic is used for scheduling. A novel technique to encode the rendezvous used in synchronization of globally asynchronous processes into QF_LRA logic, in the presence of locally synchronous parallelism and arbitrary preemption, is presented. Finally, our SMT formulation is shown to produce schedules in reasonable time.
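A much simplified scheduling query can be written directly in SMT-LIB 2. The Python sketch below assumes independent tasks with known WCETs (the task names, WCETs, core mapping and deadline are hypothetical, and names must be valid SMT-LIB symbols). It declares one real-valued start time per task and asserts linear constraints: non-negative start, completion before the deadline, precedence, and a two-way disjunction that keeps tasks mapped to the same core from overlapping. It only prints a script that any solver supporting QF_LRA could check; the article's encoding of rendezvous and preemption is not reproduced.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def _real(x: float) -> str:
    """SMT-LIB decimal literal (fixed precision; non-negative values only)."""
    return f"{float(x):.6f}"


def scheduling_smt(wcet: Dict[str, float], precedences: List[Tuple[str, str]],
                   core_of: Dict[str, int], deadline: float) -> str:
    """Emit an SMT-LIB 2 script whose models are feasible start times."""
    lines = ["(set-logic QF_LRA)"]
    for t, c in wcet.items():
        lines.append(f"(declare-fun s_{t} () Real)")
        lines.append(f"(assert (>= s_{t} {_real(0)}))")
        lines.append(f"(assert (<= (+ s_{t} {_real(c)}) {_real(deadline)}))")
    for a, b in precedences:  # b may start only after a finishes
        lines.append(f"(assert (>= s_{b} (+ s_{a} {_real(wcet[a])})))")
    for a, b in combinations(wcet, 2):  # tasks sharing a core must not overlap
        if core_of[a] == core_of[b]:
            lines.append(
                f"(assert (or (>= s_{b} (+ s_{a} {_real(wcet[a])}))"
                f" (>= s_{a} (+ s_{b} {_real(wcet[b])}))))"
            )
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


script = scheduling_smt(
    wcet={"a": 2, "b": 3, "c": 1},
    precedences=[("a", "b")],
    core_of={"a": 0, "b": 0, "c": 1},
    deadline=10,
)
print(script)
```

Every constraint is linear in the real-valued start times, which is what keeps the query inside QF_LRA; the disjunction for shared cores is the part that makes scheduling a search problem rather than a plain linear program.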
International Conference of the IEEE Engineering in Medicine and Biology Society | 2016
Sidharta Andalam; Harshavardhan Ramanna; Avinash Malik; Parthasarathi Roop; Nitish Patel; Mark L. Trew
Virtual heart models have been proposed for closed loop validation of safety-critical embedded medical devices, such as pacemakers. These models must react in real-time to off-the-shelf medical devices. Real-time performance can be obtained by implementing models in computer hardware, and methods of compiling classes of Hybrid Automata (HA) onto FPGA have been developed. Models of ventricular cardiac cell electrophysiology have been described using HA which capture the complex nonlinear behavior of biological systems. However, many models that have been used for closed-loop validation of pacemakers are highly abstract and do not capture important characteristics of the dynamic rate response. We developed a new HA model of cardiac cells which captures dynamic behavior and we implemented the model in hardware. This potentially enables modeling the heart with over 1 million dynamic cells, making the approach ideal for closed loop testing of medical devices.
International Conference on Parallel and Distributed Systems | 2014
Aravind Vasudevan; Avinash Malik; David Gregg
We present a simulated annealing based partitioning technique for mapping task graphs onto heterogeneous processing architectures. Task partitioning onto homogeneous architectures to minimize the makespan of a task graph is a known NP-hard problem. Heterogeneity greatly complicates this partitioning problem, making heuristic solutions essential. A number of heuristic approaches have been proposed, some using simulated annealing. We propose a simulated annealing method with a novel NEXT STATE function that enables exploration of different regions of the global search space when the annealing temperature is high and makes the search more local as the temperature drops. The novelty of our approach is twofold: (1) we go a step further than the existing scientific literature by considering heterogeneity at the levels of task parallelism, data parallelism and communication; (2) we present a novel algorithm that uses simulated annealing to find better partitions in the presence of heterogeneous architectures, data-parallel execution units, and significant data communication costs. We conduct a statistical analysis of the performance of the proposed method, which shows that our approach clearly outperforms the existing simulated annealing method.
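A temperature-dependent NEXT STATE function can be sketched as follows. Assumptions: tasks are independent (no task-graph edges), heterogeneity appears only as a per-processor execution cost, and communication and data parallelism are ignored, all of which the paper does model. The move-size rule (number of reassigned tasks proportional to the current temperature), the geometric cooling schedule, the parameter values and all function names are illustrative choices, not the authors' settings.

```python
import math
import random
from typing import List, Sequence, Tuple


def makespan(assign: Sequence[int], cost: Sequence[Sequence[float]], n_procs: int) -> float:
    """Finish time of the busiest processor (independent tasks, no communication)."""
    load = [0.0] * n_procs
    for task, proc in enumerate(assign):
        load[proc] += cost[task][proc]  # heterogeneous: cost depends on processor
    return max(load)


def next_state(assign: List[int], n_procs: int, temp: float, t0: float,
               rng: random.Random) -> List[int]:
    """Reassign many tasks while hot (global moves), one task when cold (local)."""
    new = list(assign)
    n_moves = max(1, round(len(assign) * temp / t0))
    for task in rng.sample(range(len(assign)), n_moves):
        new[task] = rng.randrange(n_procs)
    return new


def anneal(cost: Sequence[Sequence[float]], n_procs: int, t0: float = 10.0,
           alpha: float = 0.95, steps: int = 2000, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    current = [rng.randrange(n_procs) for _ in cost]
    cur_val = makespan(current, cost, n_procs)
    best, best_val = list(current), cur_val
    temp = t0
    for _ in range(steps):
        cand = next_state(current, n_procs, temp, t0, rng)
        cand_val = makespan(cand, cost, n_procs)
        delta = cand_val - cur_val
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_val = cand, cand_val
            if cur_val < best_val:
                best, best_val = list(current), cur_val
        temp = max(temp * alpha, 1e-6)  # geometric cooling
    return best, best_val


# Hypothetical instance: cost[task][processor], processor 1 mostly faster.
cost = [[4, 2], [3, 6], [5, 1], [2, 2], [6, 3]]
best, best_val = anneal(cost, n_procs=2)
```

For this five-task instance the optimal makespan is 6 (task 1 and task 3 on processor 0, the rest on processor 1), which gives a simple check on the annealer's result. Scaling the move size with temperature is one simple way to obtain the global-to-local behaviour the abstract describes.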
Trust, Security and Privacy in Computing and Communications | 2013
HeeJong Park; Zoran Salcic; Kevin I-Kai Wang; Udayanto Dwi Atmojo; Wei-Tsun Sun; Avinash Malik
Modern ubiquitous computing systems are created with a large number of embedded sensing and actuation devices, which together form complex distributed collaborative systems. While the advances in underlying embedded sensing, actuation and control technologies are tremendous, system designers still lack a proper software approach that can handle systems with complex and concurrent control flow on distributed networked infrastructure. In this paper, SystemJ, a system-level design language based on a formal Model of Computation, is used to provide a new design paradigm for ambient intelligence systems. SystemJ has a set of kernel statements for modeling reactivity, preemption and concurrency, which allow intuitive handling and composition of complex systems based on concurrent software behaviors. It also provides high-level objects, called signals and channels, that abstract away the underlying hardware devices and communication mechanisms. The run-time support of the language provides functionality similar to middleware. An access and environment control system demonstrates the use of SystemJ in implementing typical reactive behaviors in ambient intelligence applications.
IEEE Transactions on Industrial Informatics | 2018
Avinash Malik; Partha S. Roop; Nathan Allen; Theo Steger
Automation systems used in smart grids, transportation, and medical electronics are cyber-physical in nature. Automation standards, such as IEC-61499, while well suited to the design of discrete controllers, are not ideally suited to modeling the dynamics of the plant. Such modeling is essential for emulation-based validation of the controllers in the cyber-physical systems (CPS) domain. We use a well-known formal model for CPS, called hybrid input output automata (HIOA), as the main vehicle in the proposed formulation. A physical process (the plant) may be described as a synchronous composition of a network of such HIOA. We provide an approach to transform such a network into a composite function block (CFB) in IEC-61499. This transformation is shown to be semantics-preserving. Code generated from such plant models can be executed on a computer chip to provide real-time responses to their adjoining controllers. Through practical examples, we illustrate the scalability and practicability of the proposed approach. The developed approach enables the emulation of physical processes in industrial automation without using the actual plant.
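To illustrate what emulating a plant inside an event-driven block involves, the sketch below wraps a hypothetical two-location hybrid automaton (a water tank that is either filling or draining) in a Python class whose `req` method plays the role of an input event: it samples the data input, takes the discrete transition, integrates the active location's flow with one forward-Euler step of size `dt`, and returns the value that would accompany an output event. The REQ/CNF naming follows common IEC 61499 convention, but this is neither an IEC 61499 function block nor the paper's HIOA-to-CFB transformation; the tank parameters and the hysteresis controller are assumptions.

```python
class TankPlantFB:
    """Hypothetical event-driven block wrapping a discretised two-location HIOA."""

    def __init__(self, level: float = 0.0, dt: float = 0.1,
                 inflow: float = 2.0, outflow: float = 1.0):
        self.location = "draining"
        self.level = level
        self.dt, self.inflow, self.outflow = dt, inflow, outflow

    def req(self, valve_open: bool) -> float:
        """Input event REQ: sample the data input, step the plant, return data for CNF."""
        # Discrete transition: the location follows the valve command.
        self.location = "filling" if valve_open else "draining"
        # Flow of the active location, integrated with one forward-Euler step.
        rate = self.inflow - self.outflow if self.location == "filling" else -self.outflow
        self.level = max(0.0, self.level + rate * self.dt)  # invariant: level >= 0
        return self.level


# Closed loop with a simple hysteresis controller standing in for the real one.
plant = TankPlantFB()
valve_open = True
levels = []
for _ in range(200):
    level = plant.req(valve_open)
    levels.append(level)
    if level > 3.0:
        valve_open = False
    elif level < 1.0:
        valve_open = True
```

Once the level first passes 1.0, it oscillates roughly between 0.9 and 3.1, which is the kind of closed-loop behaviour a controller can be validated against without the physical tank.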