Krishna M. Kavi
University of Texas at Arlington
Publications
Featured research published by Krishna M. Kavi.
IEEE Transactions on Computers | 2001
Krishna M. Kavi; Roberto Giorgi; Joseph Arul
In this paper, the scheduled dataflow (SDF) architecture, a decoupled memory/execution, multithreaded architecture using nonblocking threads, is presented in detail and evaluated against a superscalar architecture. Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. This trend allows for better performance, but at the expense of increased hardware complexity and, possibly, higher power expenditures resulting from dynamic instruction scheduling. Our research deviates from this trend by exploring a simpler, yet powerful execution paradigm that is based on dataflow and multithreading. A program is partitioned into nonblocking execution threads. In addition, all memory accesses are decoupled from thread execution. Data is preloaded into the thread's context (registers), and all results are poststored after the thread completes execution. While multithreading and decoupling are possible with control-flow architectures, SDF makes it easier to coordinate the memory accesses and execution of a thread, as well as to eliminate unnecessary dependencies among instructions. We have compared the execution cycles required by programs on SDF with the execution cycles required by programs on SimpleScalar (a superscalar simulator), considering the essential aspects of these architectures in order to make a fair comparison. The results show that the SDF architecture can outperform the superscalar. SDF performance scales better with the number of functional units and allows for good exploitation of Thread Level Parallelism (TLP) and available chip area.
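The preload/execute/poststore discipline described in this abstract lends itself to a small working model. The sketch below is illustrative only, with hypothetical names throughout (the real SDF pipeline is hardware, not Python): a synchronization unit moves operands between memory and a thread's register context, while the execution unit runs the nonblocking thread body without touching memory.

```python
from collections import deque

# Hypothetical model of SDF-style decoupling: a synchronization unit
# preloads a thread's operands into its register context, an execution
# unit runs the nonblocking body against registers only, and the
# synchronization unit then poststores the results to memory.

class Thread:
    def __init__(self, name, inputs, body):
        self.name = name
        self.inputs = inputs        # memory locations to preload
        self.body = body            # pure function: registers -> results
        self.registers = {}
        self.results = {}

memory = {"a": 3, "b": 4}
preload_q, execute_q, poststore_q = deque(), deque(), deque()

def preload(t):                     # copy operands: memory -> registers
    t.registers = {k: memory[k] for k in t.inputs}

def execute(t):                     # run to completion; no memory access
    t.results = t.body(t.registers)

def poststore(t):                   # copy results: registers -> memory
    memory.update(t.results)

preload_q.append(Thread("t0", ["a", "b"], lambda r: {"c": r["a"] * r["b"]}))

while preload_q or execute_q or poststore_q:
    if preload_q:
        t = preload_q.popleft(); preload(t); execute_q.append(t)
    if execute_q:
        t = execute_q.popleft(); execute(t); poststore_q.append(t)
    if poststore_q:
        poststore(poststore_q.popleft())

print(memory)                       # {'a': 3, 'b': 4, 'c': 12}
```

Because the thread body never blocks on memory, the three stages can be overlapped across many threads, which is where the architecture's tolerance of memory latency comes from.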
IEEE Software | 1992
Barbara B. Wyatt; Krishna M. Kavi; Stephen P. Hufnagel
Fourteen concurrent object-oriented languages are compared in terms of how they deal with communication, synchronization, process management, inheritance, and implementation trade-offs. The ways in which they divide responsibility between the programmer, the compiler, and the operating system are also investigated. It is found that current object-oriented languages with concurrency features are often compromised in important areas, including inheritance capability, efficiency, ease of use, and degree of parallel activity. Frequently, this is because the concurrency features were added after the language was designed. The languages discussed are Actors, ABCL/1, ABCL/R, Argus, COOL, Concurrent Smalltalk, Eiffel, Emerald, ES-Kit C++, Hybrid, Nexus, Parmacs, POOL-T, and Presto.
IEEE Software | 1992
Frederick T. Sheldon; Krishna M. Kavi; Robert Tausworthe; James T. Yu; Ralph Brettschneider; William W. Everett
The gap between theory and practice of reliability measurement in software design is discussed, and key issues that underlie reliability measurement's evolution from theory to practice are presented. A panel discussion outlining reliability measurement's salient issues, basic concepts, and underlying theory is included. Reliability measurement's role in the development life cycle is also discussed.
Advances in Computers | 1997
Ali R. Hurson; Joford T. Lim; Krishna M. Kavi; Ben Lee
Since loops in programs are the major source of parallelism, considerable research has focused on strategies for parallelizing loops. Iterations of DOALL loops can be allocated to processors either statically or dynamically. When the execution times of individual iterations vary, dynamic schemes can achieve better load balance, albeit at a higher runtime scheduling cost. The inter-iteration dependencies of DOACROSS loops can be constant (regular DOACROSS loops) or variable (irregular DOACROSS loops). In our research, we have proposed and tested two loop allocation techniques for regular DOACROSS loops, known as Staggered Distribution (SD) and Cyclic Staggered Distribution (CSD). This article analyzes several classes of loop allocation algorithms for parallelizing DOALL, regular DOACROSS, and irregular DOACROSS loops.
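The static-versus-dynamic trade-off for DOALL loops is easy to see in a toy model. The following sketch is hypothetical and not one of the chapter's algorithms: it compares the makespan of a fixed block distribution with that of self-scheduling, where an idle processor grabs the next iteration, for iterations of varying cost.

```python
import heapq, random

# Contrast static block allocation of DOALL iterations with dynamic
# self-scheduling when per-iteration execution times vary.

random.seed(1)
N, P = 64, 4
cost = [random.randint(1, 10) for _ in range(N)]   # varying iteration times

def static_block(cost, P):
    # Each processor receives a fixed contiguous chunk at compile time;
    # the makespan is the heaviest chunk.
    chunk = len(cost) // P
    return max(sum(cost[p * chunk:(p + 1) * chunk]) for p in range(P))

def dynamic_self_schedule(cost, P):
    # Idle processors grab the next iteration at run time; modeled with
    # a heap of (finish_time, processor) pairs.
    heap = [(0, p) for p in range(P)]
    for c in cost:
        t, p = heapq.heappop(heap)
        heapq.heappush(heap, (t + c, p))
    return max(t for t, _ in heap)

print("static makespan :", static_block(cost, P))
print("dynamic makespan:", dynamic_self_schedule(cost, P))
```

Self-scheduling typically wins on load balance, but each grab stands for a runtime scheduling operation whose cost the static scheme avoids, which is precisely the trade-off analyzed in the article.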
IEEE Computer | 1999
Krishna M. Kavi; James C. Browne; Anand Tripathi
That computing and communication systems are becoming increasingly interdependent is evident in almost every aspect of society. Applications of these integrated systems are also spreading. As this trend continues, it will force the computing community not only to develop revolutionary systems but also to redefine the computer system and the roles of traditional research disciplines, such as operating systems, architectures, compilers, languages, and networking. Systems research faces an unprecedented challenge. Systems developers are facing a major discontinuity in the scale and nature of both applications and execution environments. Applications are changing from transforming data to directly interacting with humans; they will use hardware and data that span wide-area, even global, networks of resources and involve interactions among users as well. Even the architecture of individual processors is uncertain. The authors look at three challenges facing systems research, describe developing solutions, and review remaining obstacles. Using this information, they formulate three clear first steps to addressing the identified challenges: (a) define a new paradigm for systems research; (b) attack problems common to all system development; (c) build a research infrastructure.
International Symposium on Computer Architecture | 1995
Krishna M. Kavi; Ali R. Hurson; Phenil Patadia; Elizabeth Abraham; Ponnarasu Shanmugam
Cache memories have proven their effectiveness in the von Neumann architecture when localities of reference govern the execution loci of programs. A pure dataflow program, in contrast, contains no locality of reference, since the execution sequence is enforced only by the availability of arguments. Instruction locality may be enhanced if dataflow programs are reordered. Enhancing the locality of data references in the dataflow architecture is a more challenging problem. In this paper we report our approaches to the design of instruction, data (operand), and I-Structure cache memories using the Explicit Token Store (ETS) model of dataflow systems. We present performance results obtained using various benchmark programs.
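Why reference order matters to a cache can be shown with a toy direct-mapped model (all parameters hypothetical): the same addresses, visited sequentially as a reordered program might visit them, versus in a scrambled, availability-driven order resembling pure dataflow execution, yield very different hit rates.

```python
import random

# Toy direct-mapped cache: 64 lines of 4 words each (hypothetical
# parameters). The same address set is replayed in two orders.

LINES, BLOCK = 64, 4

def hit_rate(trace):
    tags = [None] * LINES
    hits = 0
    for addr in trace:
        block = addr // BLOCK
        line, tag = block % LINES, block // LINES
        if tags[line] == tag:
            hits += 1
        else:
            tags[line] = tag        # miss: fill the line
    return hits / len(trace)

addrs = list(range(4096))
random.seed(0)
scrambled = addrs[:]
random.shuffle(scrambled)

print("sequential order:", hit_rate(addrs))      # ~0.75: spatial locality
print("scrambled order :", hit_rate(scrambled))  # low: locality destroyed
```

The sequential replay hits on three of every four references within a block; the scrambled replay, like an execution sequence driven purely by argument availability, rarely finds its block still resident.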
Journal of Universal Computer Science | 2000
Krishna M. Kavi; Joseph Arul; Roberto Giorgi
This paper presents an evaluation of our Scheduled Dataflow (SDF) processor. Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative architectures. This trend allows for better performance at the expense of increased hardware complexity and a brute-force solution to the memory-wall problem. Our research substantially deviates from this trend by exploring a simpler, yet powerful execution paradigm that is based on dataflow concepts. A program is partitioned into functional execution threads, which are perfectly suited for our non-blocking multithreaded architecture. In addition, all memory accesses are decoupled from thread execution. Data is pre-loaded into the thread's context (registers), and all results are post-stored after the thread completes execution. The decoupling of memory accesses from thread execution requires a separate unit to perform the necessary pre-loads and post-stores and to control the allocation of hardware thread contexts to enabled threads. An analytical evaluation of our architecture showed that we could achieve better performance than other classical dataflow architectures (e.g., ETS), hybrid models (e.g., EARTH), and decoupled multithreaded architectures (e.g., the Rhamma processor). This paper analyzes the architecture using an instruction-set-level simulator for a variety of benchmark programs. We compared the execution cycles required by programs on SDF with the execution cycles required by the same programs on DLX (or MIPS). We then investigated the expected cache-memory performance by collecting address traces from programs and feeding them to a trace-driven cache simulator (Dinero-IV). We present these results in this paper.
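The trace-driven portion of this methodology is simple to reproduce in miniature. The sketch below is a stand-in for the Dinero-IV workflow rather than Dinero-IV itself, with hypothetical cache parameters: an address trace collected from a program run is replayed through a set-associative LRU model that reports the miss ratio.

```python
from collections import OrderedDict

# Minimal trace-driven cache simulation: 128 sets, 2-way associative,
# 16-byte blocks, LRU replacement (hypothetical parameters).

SETS, WAYS, BLOCK = 128, 2, 16

def miss_ratio(trace):
    sets = [OrderedDict() for _ in range(SETS)]
    misses = 0
    for addr in trace:
        block = addr // BLOCK
        s, tag = sets[block % SETS], block // SETS
        if tag in s:
            s.move_to_end(tag)          # refresh LRU position on a hit
        else:
            misses += 1
            if len(s) >= WAYS:
                s.popitem(last=False)   # evict the least recently used
            s[tag] = True
    return misses / len(trace)

# A real trace would come from an instrumented program run; a synthetic
# strided trace stands in here.
trace = [i * 4 for i in range(10000)]
print("miss ratio:", miss_ratio(trace))   # 0.25 for this stride
```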
International Conference on Parallel Processing | 1993
Behrooz A. Shirazi; Krishna M. Kavi; Ali R. Hurson; Prasenjit Biswas
Efficient partitioning and scheduling of parallel programs and the distribution of data among processing elements are very important issues in parallel and distributed systems. Existing tools fall short in addressing these issues satisfactorily. On one hand, it is believed to be unreasonable to leave the burden of these complex tasks to the programmers. On the other hand, fully automated schedulers have been shown to be of little practical significance, or suitable only for restricted cases. In this paper we address the issues and algorithms for efficient partitioning and scheduling of parallel programs, including the distribution of data, in distributed-memory multiprocessor systems, using the PARSA parallel software development environment. PARSA consists of a set of visual, interactive, compile-time tools that provide automated program partitions and schedules whenever possible, while permitting the user to exert control over these operations for better performance. The program assessment tool provides users the opportunity to fine-tune the program and achieve their performance objectives.
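The style of compile-time scheduling that PARSA automates can be illustrated with classic list scheduling; the sketch below is not PARSA's algorithm and ignores communication costs. Tasks of a precedence graph are assigned, in topological order, to whichever processor can start them earliest.

```python
# List-scheduling sketch: tasks with known costs and precedence edges
# are mapped onto P processors in topological order.

cost = {"a": 2, "b": 3, "c": 2, "d": 4}
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
P = 2

proc_free = [0] * P                 # time at which each processor idles
finish = {}

for task in ["a", "b", "c", "d"]:   # already a valid topological order
    ready = max((finish[d] for d in deps[task]), default=0)
    p = min(range(P), key=lambda i: max(proc_free[i], ready))
    start = max(proc_free[p], ready)
    finish[task] = start + cost[task]
    proc_free[p] = finish[task]
    print(f"{task} -> P{p} at t={start}")

print("makespan:", max(finish.values()))
```

A real tool must also weigh data-distribution and communication costs, which is exactly where PARSA lets the user override the automated choice.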
Journal of Systems and Software | 2002
Deng-Jyi Chen; Wu-Chi Chen; Krishna M. Kavi
Multimedia technology has played an important role in modern computing because it offers more natural and user-friendly interactions with an automated system. This is particularly true for systems utilizing graphical, icon-, or window-based input and output. Multimedia technology also facilitates reuse more naturally, since the basic components and functions of presentation and animation can be reused across several different animation scenarios. This is evidenced by the rapid prototyping capability of computer and video games, where the characters and story lines change but the basic animation remains constant. In this paper we utilize multimedia technology for eliciting the requirements of software systems, particularly those that use window-based (graphical) interactions with the user. Our methodology implicitly emphasizes reuse, since in our approach reusable components include not only code and documents but also voice narration, animation sequences, and message mechanisms. We call such software components multimedia reusable components (MRCs). Using MRCs, one can view software requirements instead of reading a textual representation of the requirements.
Journal of Systems and Software | 1992
Krishna M. Kavi; Seung-Min Yang
In this article we describe examples of real-time systems in an attempt to characterize such systems. We address the issues as they relate to real-time embedded software systems and the issues that distinguish them from other software systems. The key feature of real-time systems is the timely-response requirement, which often implies concurrent processing. The critical nature of many real-time applications necessitates fault-tolerant implementations. Future real-time systems will be more complex and will require distributed implementations. Although several design methods and tools have been proposed, they fall short of meeting all the requirements. We include a “wish list” of desirable features for future software tools.