Niraj K. Jha
Princeton University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Niraj K. Jha.
high-performance computer architecture | 2003
Li Shang; Li-Shiuan Peh; Niraj K. Jha
Originally developed to connect processors and memories in multicomputers, prior research and design of interconnection networks have focused largely on performance. As these networks get deployed in a wide range of new applications, where power is becoming a key design constraint, we need to seriously consider power efficiency in designing interconnection networks. As the demand for network bandwidth increases, communication links, already a significant consumer of power now, will take up an ever larger portion of total system power budget. In this paper we motivate the use of dynamic voltage scaling (DVS) for links, where the frequency and voltage of links are dynamically adjusted to minimize power consumption. We propose a history-based DVS policy that judiciously adjusts link frequencies and voltages based on past utilization. Our approach realizes up to 6.3/spl times/ power savings (4.6/spl times/ on average). This is accompanied by a moderate impact on performance (15.2% increase in average latency before network saturation and 2.5% reduction in throughput.) To the best of our knowledge, this is the first study that targets dynamic power optimization of interconnection networks.
international symposium on performance analysis of systems and software | 2009
Niket Agarwal; Tushar Krishna; Li-Shiuan Peh; Niraj K. Jha
Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip network that connects different processing cores has become a critical part of the design. Transistor miniaturization has led to high global wire delay, and interconnect power comparable to transistor power. CMP design proposals can no longer ignore the interaction between the memory hierarchy and the interconnection network that connects various elements. This necessitates a detailed and accurate interconnection network model within a full-system evaluation framework. Ignoring the interconnect details might lead to inaccurate results when simulating a CMP architecture. It also becomes important to analyze the impact of interconnection network optimization techniques on full system behavior. In this light, we developed a detailed cycle-accurate interconnection network model (GARNET), inside the GEMS full-system simulation framework. GARNET models a classic five-stage pipelined router with virtual channel (VC) flow control. Microarchitectural details, such as flit-level input buffers, routing logic, allocators and the crossbar switch, are modeled. GARNET, along with GEMS, provides a detailed and accurate memory system timing model. To demonstrate the importance and potential impact of GARNET, we evaluate a shared and private L2 CMP with a realistic state-of-the-art interconnection network against the original GEMS simple network. The objective of the evaluation was to figure out which configuration is better for a particular workload. We show that not modeling the interconnect in detail might lead to an incorrect outcome. We also evaluate Express Virtual Channels (EVCs), an on-chip network flow control proposal, in a full-system fashion. We show that in improving on-chip network latency-throughput, EVCs do lead to better overall system runtime, however, the impact varies widely across applications.
IEEE Transactions on Mobile Computing | 2006
Nachiketh R. Potlapally; Srivaths Ravi; Anand Raghunathan; Niraj K. Jha
Security is becoming an everyday concern for a wide range of electronic systems that manipulate, communicate, and store sensitive data. An important and emerging category of such electronic systems are battery-powered mobile appliances, such as personal digital assistants (PDAs) and cell phones, which are severely constrained in the resources they possess, namely, processor, battery, and memory. This work focuses on one important constraint of such devices-battery life-and examines how it is impacted by the use of various security mechanisms. In this paper, we first present a comprehensive analysis of the energy requirements of a wide range of cryptographic algorithms that form the building blocks of security mechanisms such as security protocols. We then study the energy consumption requirements of the most popular transport-layer security protocol: Secure Sockets Layer (SSL). We investigate the impact of various parameters at the protocol level (such as cipher suites, authentication mechanisms, and transaction sizes, etc.) and the cryptographic algorithm level (cipher modes, strength) on the overall energy consumption for secure data transactions. To our knowledge, this is the first comprehensive analysis of the energy requirements of SSL. For our studies, we have developed a measurement-based experimental testbed that consists of an iPAQ PDA connected to a wireless local area network (LAN) and running Linux, a PC-based data acquisition system for real-time current measurement, the OpenSSL implementation of the SSL protocol, and parameterizable SSL client and server test programs. Based on our results, we also discuss various opportunities for realizing energy-efficient implementations of security protocols. We believe such investigations to be an important first step toward addressing the challenges of energy-efficient security for battery-constrained systems.
international symposium on low power electronics and design | 2003
Nachiketh R. Potlapally; Srivaths Ravi; Anand Raghunathan; Niraj K. Jha
Security is critical to a wide range of wireless data applications and services. While several security mechanisms and protocols have been developed in the context of the wired Internet, many new challenges arise due to the unique characteristics of battery powered embedded systems. In this work, we focus on an important constraint of such devices -- battery life -- and examine how it is impacted by the use of security protocols.We present a comprehensive analysis of the energy requirements of a wide range of cryptographic algorithms that are used as building blocks in security protocols. Furthermore, we study the energy consumption requirements of the most popular transport-layer security protocol SSL (Secure Sockets Layer). To our knowledge, this is the first comprehensive analysis of the energy requirements of SSL. For our studies, we have developed a measurement-based experimental testbed that consists of an iPAQ PDA connected to a wireless LAN and running Linux, a PC-based data acquisition system for real-time current measurement, the OpenSSL implementation of the SSL protocol, and parametrizable SSL client and server test programs. We investigate the impact of various parameters at the protocol level (such as cipher suites, authentication mechanisms, and transaction sizes, etc.) and the cryptographic algorithm level (cipher modes, strength) on overall energy consumption for secure data transactions.Based on our results, we discuss various opportunities for realizing energy-efficient implementations of security protocols. We believe such investigations to be an important first step towards addressing the challenges of energy efficient security for battery-constrained systems.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1998
Robert P. Dick; Niraj K. Jha
In this paper, we present a hardware-software cosynthesis system, called MOGAC, that partitions and schedules embedded system specifications consisting of multiple periodic task graphs. MOGAC synthesizes real-time heterogeneous distributed architectures using an adaptive multiobjective genetic algorithm that can escape local minima. Price and power consumption are optimized while hard real-time constraints are met. MOGAC places no limit on the number of hardware or software processing elements in the architectures it synthesizes. Our general model for bus and point-to-point communication links allows a number of link types to be used in an architecture. Application-specific integrated circuits consisting of multiple processing elements are modeled. Heuristics are used to tackle multirate systems, as well as systems containing task graphs whose hyperperiods are large relative to their periods. The application of a multiobjective optimization strategy allows a single cosynthesis run to produce multiple designs that trade off different architectural features. Experimental results indicate that MOGAC has advantages over previous work in terms of solution quality and running time.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1993
Niraj K. Jha; Sying-Jyan Wang
Self-checking circuits can detect the presence of both transient and permanent faults. A self-checking circuit consists of a functional circuit that produces encoded output vectors and a checker that checks the output vectors. The checker has the ability to expose its own faults as well. The functional circuit can be either combinational or sequential. A self-checking system consists of an interconnection of self-checking circuits. The advantage of such a system is that errors can be caught as soon as they occur; thus, data contamination is prevented. Methods for the cost-effective design of combinational and sequential self-checking functional circuits and checkers are examined. The area overhead for all proposed design alternatives is studied in detail. >
international conference on computer aided design | 2000
Jiong Luo; Niraj K. Jha
In this paper, we present a power-conscious algorithm for jointly scheduling multi-rate periodic task graphs and aperiodic tasks in distributed real-time embedded systems. While the periodic task graphs have hard deadlines, the aperiodic tasks can have either hard or soft deadlines. Periodic task graphs are first scheduled statically. Slots are created in this static schedule to accommodate hard aperiodic tasks. Soft aperiodic tasks are scheduled dynamically with an on-line scheduler. Flexibility is introduced into the static schedule and optimized to allow the on-line scheduler to make dynamic modifications to the static schedule. This helps minimize the response times of soft aperiodic tasks through both resource reclaiming and slack stealing. Of course, the validity of the static schedule is maintained. The on-line scheduler also employs dynamic voltage scaling and power management to obtain a power-efficient schedule. Experimental results show that the flexibility introduced into the static schedule helps improve the response times of soft aperiodic tasks by up to 43%. Dynamic voltage scaling and power management reduce power by up to 68%. The scheme in which the static schedule is allowed to be flexible achieves up to 3.2% more power saving compared to the scheme in which no flexibility is allowed, when both schemes are power-conscious. Our work gives an average architecture price saving of 30% over a previous approach for embedded system architectures synthesized with execution slots for hard aperiodic tasks present.
international symposium on microarchitecture | 2004
Li Shang; Li-Shiuan Peh; Amit Kumar; Niraj K. Jha
Due to the wire delay constraints in deep submicron technology and increasing demand for on-chip bandwidth, networks are becoming the pervasive interconnect fabric to connect processing elements on chip. With ever-increasing power density and cooling costs, the thermal impact of on-chip networks needs to be urgently addressed. In this work, we first characterize the thermal profile of the MIT Raw chip. Our study shows networks having comparable thermal impact as the processing elements and contributing significantly to overall chip temperature, thus motivating the need for network thermal management. The characterization is based on an architectural thermal model we developed for on-chip networks that takes into account the thermal correlation between routers across the chip and factors in the thermal contribution of on-chip interconnects. Our thermal model is validated against finite-element based simulators for an actual chip with associated power measurements, with an average error of 5.3%. We next propose ThermalHerd, a distributed, collaborative run-time thermal management scheme for on-chip networks that uses distributed throttling and thermal-correlation based routing to tackle thermal emergencies. Our simulations show ThermalHerd effectively ensuring thermal safety with little performance impact. With Raw as our platform, we further show how our work can be extended to the analysis and management of entire on-chip systems, jointly considering both processors and networks.
international conference on computer design | 1994
Anand Raghunathan; Niraj K. Jha
We present a behavioral synthesis method targeting low power consumption for data-dominated CMOS circuits. A study of how the high-level synthesis process affects power consumption is presented, based on, which we have developed the first allocation method for low power. We also present a method of optimizing the controller to reduce data path power dissipation. We consider loops, conditional branches, and scheduling constructs such as multicycling, chaining and structural pipelining. The techniques were implemented within the framework of an existing behavioral synthesis system. Experiments performed on various examples and benchmarks show that low power circuits can be synthesized by our method with very low or zero overheads.<<ETX>>
design automation conference | 2001
Jiong Luo; Niraj K. Jha
This paper addresses battery-aware static scheduling in battery-powered distributed real-time embedded systems. As suggested by previous work, reducing the discharge current level and shaping its distribution are essential for extending the battery lifespan. We propose two battery-aware static scheduling schemes. The first one optimizes the discharge power profile in order to maximize the utilization of the battery capacity. The second one targets distributed systems composed of voltage-scalable processing elements (PEs). It performs variable-voltage scheduling via efficient slack time re-allocation, which helps reduce the average discharge power consumption as well as flatten the discharge power profile. Both schemes guarantee the hard real-time constraints and precedence relationships in the real-time distributed embedded system specification. Based on previous work, we develop a battery lifespan evaluation metric which is aware of the shape of the discharge power profile. Our experimental results show that the battery lifespan can be increased by up to 29% by optimizing the discharge power file alone. Our variable-voltage scheme increases the battery lifespan by up to 76% over the non-voltage-scalable scheme and by up to 56% over the variable-voltage scheme without slack-time re-allocation.