Lizy Kurian John
University of Texas at Austin
Publications
Featured research published by Lizy Kurian John.
IEEE Computer | 2004
Doug Burger; Stephen W. Keckler; Kathryn S. McKinley; Michael Dahlin; Lizy Kurian John; Calvin Lin; Charles R. Moore; James H. Burrill; Robert McDonald; William Yoder
Microprocessor designs are on the verge of a post-RISC era in which companies must introduce new ISAs to address the challenges that modern CMOS technologies pose while also exploiting the massive levels of integration now possible. To meet these challenges, we have developed a new class of ISAs, called explicit data graph execution (EDGE), designed to match the characteristics of semiconductor technology over the next decade. The TRIPS architecture is the first instantiation of an EDGE instruction set, scaling to new levels of power efficiency and high performance.
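A toy sketch of the dataflow-style firing that EDGE ISAs encode directly, where each instruction names its consumers and fires when its operands arrive rather than communicating through a shared register file; the encoding and the tiny block below are invented for illustration and are not the TRIPS ISA.

```python
# Toy sketch of EDGE-style execution: instructions in a block fire when all
# their operands arrive, forwarding results directly to consumer instruction
# slots. The block, opcodes, and encoding are invented for illustration.

# Each instruction slot: (opcode, operand count, list of consumer slots).
block = {
    0: ("read_a", 0, [2]),        # produce value of a, send it to slot 2
    1: ("read_b", 0, [2]),        # produce value of b, send it to slot 2
    2: ("add",    2, [3]),        # fires only once both operands arrive
    3: ("write",  1, []),         # consume the result
}
inputs = {"read_a": 3, "read_b": 4}

operands = {i: [] for i in block}                   # arrived operands per slot
ready = [i for i, (_, n, _) in block.items() if n == 0]

while ready:
    slot = ready.pop(0)
    op, _, consumers = block[slot]
    value = inputs.get(op, sum(operands[slot]))     # 'add' sums its operands
    if op == "write":
        print("result:", value)                     # prints: result: 7
    for c in consumers:                             # direct producer->consumer send
        operands[c].append(value)
        if len(operands[c]) == block[c][1]:         # all operands arrived -> fire
            ready.append(c)
```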
Measurement and Modeling of Computer Systems | 2003
Tao Li; Lizy Kurian John
The increasing constraints on power consumption in many computing systems point to the need for power modeling and estimation for all components of a system. The Operating System (OS) constitutes a major software component and dissipates a significant portion of total power in many modern application executions. Therefore, modeling OS power is imperative for accurate software power evaluation, as well as for power management (e.g., dynamic thermal control and equal-energy scheduling) in light of OS-intensive workloads. This paper characterizes the power behavior of a commercial OS across a wide spectrum of applications to understand OS energy profiles and then proposes various models to cost-effectively estimate its run-time energy dissipation. The proposed models rely on a few simple parameters and have varying degrees of complexity and accuracy. Experiments show that, compared with cycle-accurate full-system simulation, the model can predict cumulative OS energy to within 1% accuracy for a set of benchmark programs evaluated on a high-end superscalar microprocessor. When applied to track run-time OS energy profiles, the proposed routine-level OS power model offers superior accuracy compared with a simpler, flat OS power model, yielding a per-routine estimation error of less than 6%. The most striking observation is the strong correlation between power consumption and instructions per cycle (IPC) during OS routine executions. Since tools and methodologies to measure IPC exist on modern microprocessors, the proposed models can estimate OS power for run-time dynamic thermal and energy management.
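The routine-level model's reliance on IPC suggests a simple linear form. The sketch below illustrates the idea, assuming a per-routine linear fit; the coefficients and routine names are hypothetical, not the paper's fitted values.

```python
# Minimal sketch of an IPC-correlated, routine-level OS power model.
# The linear form, coefficients, and routine names are illustrative
# assumptions, not the paper's measured values.

# Per-routine linear model: power ~= base + slope * IPC,
# fitted offline from measured (IPC, power) samples.
ROUTINE_MODEL = {
    "syscall_read": {"base": 18.0, "slope": 9.5},   # watts, hypothetical
    "page_fault":   {"base": 15.0, "slope": 11.0},
    "tlb_refill":   {"base": 12.0, "slope": 8.0},
}

def routine_power(routine: str, ipc: float) -> float:
    """Estimate the power (W) of one OS routine invocation from its IPC."""
    m = ROUTINE_MODEL[routine]
    return m["base"] + m["slope"] * ipc

def os_energy(trace) -> float:
    """Accumulate OS energy over a trace of (routine, ipc, seconds) samples."""
    return sum(routine_power(r, ipc) * dt for r, ipc, dt in trace)

if __name__ == "__main__":
    trace = [("syscall_read", 0.8, 1e-4), ("page_fault", 0.5, 3e-5)]
    print(f"Estimated OS energy: {os_energy(trace):.6f} J")
```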
Great Lakes Symposium on VLSI | 1999
R. Shalem; Eugene John; Lizy Kurian John
A novel low-power, low-transistor-count static energy recovery full adder (SERF) is presented in this paper. The power consumption and general characteristics of the SERF adder are compared against three low-power adders: the transmission function adder (TFA), the dual value logic (DVL) adder, and the fourteen-transistor (14T) full adder. The proposed SERF adder design proved superior to the other three designs in power dissipation and area, and second in propagation delay only to the DVL adder. The combination of low power and low transistor count makes the new SERF cell a viable option for low-power design.
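For context, every one-bit full adder, SERF included, must realize the same Boolean function regardless of its transistor-level implementation. The following sketch verifies that function exhaustively; the energy-recovery circuit details are, of course, not modeled.

```python
# Behavioral model of a one-bit full adder -- the Boolean function any
# implementation (SERF, TFA, DVL, 14T) must realize. Transistor-level
# energy-recovery behavior is not modeled here.

def full_adder(a: int, b: int, cin: int):
    s = a ^ b ^ cin                          # sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # majority carry-out
    return s, cout

# Exhaustively verify against integer addition over all 8 input patterns.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
print("full adder truth table verified")
```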
International Symposium on Performance Analysis of Systems and Software | 2005
Aashish Phansalkar; Ajay Joshi; Lieven Eeckhout; Lizy Kurian John
It is essential that a subset of benchmark programs used to evaluate an architectural enhancement is well distributed within the target workload space rather than clustered in specific areas. Past efforts to identify subsets have primarily relied on microarchitecture-dependent metrics of program performance, such as cycles per instruction and cache miss-rate. The shortcoming of this technique is that the results can be biased by the idiosyncrasies of the chosen configurations. The objective of this paper is to present a methodology to measure the similarity of programs based on their inherent microarchitecture-independent characteristics, which makes the results applicable to any microarchitecture. We apply our methodology to the SPEC CPU2000 benchmark suite and demonstrate that a subset of 8 programs can be used to effectively represent the entire suite. We validate the usefulness of this subset by using it to estimate the average IPC and L1 data cache miss-rate of the entire suite. The average IPC of 8-way and 16-way issue superscalar processor configurations could be estimated with 3.9% and 4.4% error, respectively. This methodology is applicable not only to finding subsets from a benchmark suite, but also to identifying programs for a benchmark suite from a list of potential candidates. Studying the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained the same.
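A minimal sketch of the general subsetting recipe (normalize the microarchitecture-independent characteristics, reduce dimensionality, cluster, keep the program nearest each cluster centroid); the feature values are synthetic and the use of scikit-learn is an assumption, not the authors' tooling.

```python
# Sketch of benchmark subsetting from microarchitecture-independent
# characteristics: normalize -> PCA -> k-means -> nearest-to-centroid.
# Feature values below are random stand-ins, not real measurements.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
programs = [f"prog{i}" for i in range(26)]      # stand-ins for SPEC programs
X = rng.random((len(programs), 20))             # rows: programs, cols: features

Z = StandardScaler().fit_transform(X)           # normalize each characteristic
P = PCA(n_components=4).fit_transform(Z)        # retain dominant components

k = 8                                           # subset size, as in the paper
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(P)

subset = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    d = np.linalg.norm(P[members] - km.cluster_centers_[c], axis=1)
    subset.append(programs[members[np.argmin(d)]])  # program nearest the centroid
print("representative subset:", subset)
```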
International Conference on Parallel Architectures and Compilation Techniques | 2006
Kenneth Hoste; Aashish Phansalkar; Lieven Eeckhout; Andy Georges; Lizy Kurian John; Koen De Bosschere
A key challenge in benchmarking is to predict the performance of an application of interest on a number of platforms in order to determine which platform yields the best performance. This paper proposes an approach for doing this. We measure a number of microarchitecture-independent characteristics from the application of interest and relate these characteristics to the characteristics of the programs from a previously profiled benchmark suite. Based on the similarity of the application of interest with programs in the benchmark suite, we make a performance prediction of the application of interest. We propose and evaluate three approaches (normalization, principal components analysis, and a genetic algorithm) to transform the raw data set of microarchitecture-independent characteristics into a benchmark space in which relative distance is a measure of relative performance differences. We evaluate our approach using all of the SPEC CPU2000 benchmarks and real hardware performance numbers from the SPEC website. Our framework estimates per-benchmark machine ranks with a 0.89 average and a 0.80 worst-case rank correlation coefficient.
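A minimal sketch of the core idea, assuming a plain normalized space and a nearest-neighbor prediction; the paper's PCA and genetic-algorithm transformations are not reproduced, and all data below is synthetic.

```python
# Sketch of distance-based performance prediction in a normalized benchmark
# space: rank machines for a new application by borrowing the measured scores
# of its nearest benchmark neighbor. All values are synthetic.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
bench_feats = rng.random((20, 10))      # benchmarks x microarch-indep. features
app_feats   = rng.random(10)            # application of interest
perf        = rng.random((20, 5))       # benchmarks x machines (measured scores)

# Normalize features so no single characteristic dominates the distance.
mu, sigma = bench_feats.mean(axis=0), bench_feats.std(axis=0)
B = (bench_feats - mu) / sigma
a = (app_feats - mu) / sigma

nearest = np.argmin(np.linalg.norm(B - a, axis=1))   # closest benchmark
predicted_rank = np.argsort(-perf[nearest])          # machines, best first
print("predicted machine ranking:", predicted_rank)

# Given measured scores for the application, Spearman's rho quantifies
# rank-prediction quality, as in the paper's 0.89 average figure.
true_scores = rng.random(5)
rho, _ = spearmanr(perf[nearest], true_scores)
print("rank correlation:", rho)
```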
International Symposium on Computer Architecture | 2004
Lieven Eeckhout; Robert H. Bell; B. Stougie; K. De Bosschere; Lizy Kurian John
Designing a new microprocessor is extremely time-consuming. One of the contributing reasons is that computer designers rely heavily on detailed architectural simulations, which are very slow. Recent work has focused on statistical simulation to address this issue. The basic idea of statistical simulation is to measure characteristics during program execution, generate a synthetic trace with those characteristics, and then simulate the synthetic trace. The statistically generated synthetic trace is orders of magnitude smaller than the original program sequence and hence results in significantly faster simulation. This paper makes the following contributions to the statistical simulation methodology. First, we propose the use of a statistical flow graph to characterize the control flow of a program execution. Second, we model delayed update of branch predictors while profiling program execution characteristics. Experimental results show that statistical simulation using this improved control flow modeling attains significantly better accuracy than the previously proposed HLS system. We evaluate both the absolute and the relative accuracy of our approach for power/performance modeling of superscalar microarchitectures. The results show that our statistical simulation framework can be used to efficiently explore processor design spaces.
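A toy sketch of the trace-generation step, assuming only an instruction-mix profile; a real statistical simulation framework also models the statistical flow graph, dependence distances, and cache/branch statistics, none of which are captured here.

```python
# Toy sketch of statistical simulation's trace-generation step: sample a
# short synthetic instruction stream whose mix matches a measured profile.
# The profile values are illustrative, not measurements from the paper.
import random

profile = {"int_alu": 0.45, "load": 0.25, "store": 0.10,
           "branch": 0.15, "fp": 0.05}              # measured instruction mix

def synthetic_trace(n, mix, seed=0):
    """Draw n instructions whose type frequencies match the profile."""
    rng = random.Random(seed)
    ops, weights = zip(*mix.items())
    return rng.choices(ops, weights=weights, k=n)

trace = synthetic_trace(1000, profile)
# The synthetic trace is orders of magnitude shorter than the original
# program run, so simulating it is correspondingly faster.
print({op: trace.count(op) / len(trace) for op in profile})
```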
IEEE Transactions on Computers | 2006
Ajay Joshi; Aashish Phansalkar; Lieven Eeckhout; Lizy Kurian John
This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites: SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss-rates, and speedup on 11 machines with different ISAs and microarchitectures; this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.
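Once representatives are chosen, suite-level metrics can be estimated as a cluster-size-weighted average over the representatives alone. A minimal sketch with illustrative numbers:

```python
# Minimal sketch of estimating a suite-wide average metric (e.g., IPC) from
# cluster representatives, weighting each representative by its cluster's
# size. All numbers are illustrative, not measurements from the paper.
cluster_sizes = {"repA": 6, "repB": 4, "repC": 3}         # programs per cluster
measured_ipc  = {"repA": 1.2, "repB": 0.8, "repC": 1.6}   # simulated reps only

total = sum(cluster_sizes.values())
suite_ipc = sum(measured_ipc[r] * n for r, n in cluster_sizes.items()) / total
print(f"estimated suite-average IPC: {suite_ipc:.3f}")
```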
Archive | 2005
Lizy Kurian John; Lieven Eeckhout
Contents:
INTRODUCTION AND OVERVIEW (L.K. John and L. Eeckhout)
PERFORMANCE MODELING AND MEASUREMENT TECHNIQUES (L.K. John)
BENCHMARKS (L.K. John)
AGGREGATING PERFORMANCE METRICS OVER A BENCHMARK SUITE (L.K. John)
STATISTICAL TECHNIQUES FOR COMPUTER PERFORMANCE ANALYSIS (D.J. Lilja and J.J. Yi)
STATISTICAL SAMPLING FOR PROCESSOR AND CACHE SIMULATION (T.M. Conte and P.D. Bryan)
SIMPOINT: PICKING REPRESENTATIVE SAMPLES TO GUIDE SIMULATION (B. Calder, T. Sherwood, G. Hamerly, and E. Perelman)
STATISTICAL SIMULATION (L. Eeckhout)
BENCHMARK SELECTION (L. Eeckhout)
INTRODUCTION TO ANALYTICAL MODELS (E.J. Kim, K.H. Yum, and C.R. Das)
PERFORMANCE MONITORING HARDWARE AND THE PENTIUM 4 PROCESSOR (B. Sprunt)
PERFORMANCE MONITORING ON THE POWER5(TM) MICROPROCESSOR (A. Mericas)
PERFORMANCE MONITORING IN THE ITANIUM(R) PROCESSOR FAMILY (R. Zahir, K. Menezes, and S. Fernando)
INDEX
Design Automation Conference | 2009
Jian Chen; Lizy Kurian John
Heterogeneous multicore processors promise high execution efficiency under diverse workloads, and program scheduling is critical in exploiting this efficiency. This paper presents a novel method to leverage the inherent characteristics of a program for scheduling decisions in heterogeneous multicore processors. The proposed method projects each core's configuration and each program's resource demand onto a unified multi-dimensional space, and uses the weighted Euclidean distance between the two to guide program scheduling. The experimental results show that, on average, this distance-based scheduling heuristic achieves a 24.5% reduction in energy delay product, a 6.1% reduction in energy, and a 9.1% improvement in throughput when compared with a traditional hardware-oblivious scheduling algorithm.
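A minimal sketch of the distance heuristic, assuming hypothetical dimensions, weights, and core configurations:

```python
# Sketch of distance-based scheduling for a heterogeneous multicore: map each
# core's configuration and each program's resource demand into one normalized
# space and run the program on the core at minimum weighted Euclidean
# distance. Dimensions, values, and weights below are hypothetical.
import math

DIMS    = ["issue_width", "rob_size", "l1d_kb", "fp_units"]
WEIGHTS = [1.0, 0.5, 0.8, 0.7]            # per-dimension importance (assumed)

cores = {                                  # normalized core configurations
    "big":    [1.0, 1.0, 1.0, 1.0],
    "little": [0.25, 0.2, 0.5, 0.25],
}

def weighted_distance(demand, config):
    return math.sqrt(sum(w * (d - c) ** 2
                         for w, d, c in zip(WEIGHTS, demand, config)))

def schedule(demand):
    """Pick the core whose configuration best matches the program's demand."""
    return min(cores, key=lambda name: weighted_distance(demand, cores[name]))

print(schedule([0.9, 0.8, 0.7, 1.0]))   # compute-hungry program -> 'big'
print(schedule([0.2, 0.1, 0.4, 0.2]))   # modest program -> 'little'
```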
International Symposium on Microarchitecture | 2011
Dimitris Kaseridis; Jeffrey A. Stuecheli; Lizy Kurian John
Contemporary DRAM systems have maintained impressive scaling by managing a careful balance between performance, power, and storage density. In achieving these goals, a significant sacrifice has been made in DRAM's operational complexity. To realize good performance, systems must properly manage the significant number of structural and timing restrictions of the DRAM devices. DRAM's use is further complicated in many-core systems, where the memory interface is shared among multiple cores/threads competing for memory bandwidth.
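One toy illustration of the kind of timing restriction a controller must manage: the tRC window between successive ACTIVATE commands to the same bank. The value used is assumed, and real controllers track many more constraints (tRCD, tRP, tFAW, and others) per bank and rank.

```python
# Toy illustration of one DRAM timing restriction a memory controller must
# respect: tRC, the minimum cycles between successive ACTIVATE commands to
# the same bank. The value here is assumed for illustration.
T_RC = 39                      # cycles between ACTIVATEs to one bank (assumed)

last_activate = {}             # bank id -> cycle of last ACTIVATE

def can_activate(bank: int, now: int) -> bool:
    """True if an ACTIVATE to `bank` is legal at cycle `now`."""
    return now - last_activate.get(bank, -T_RC) >= T_RC

def activate(bank: int, now: int) -> bool:
    if can_activate(bank, now):
        last_activate[bank] = now
        return True
    return False               # command must be delayed; requests queue up

print(activate(0, 100))   # True  -- bank 0 has been idle long enough
print(activate(0, 120))   # False -- violates tRC (only 20 cycles elapsed)
print(activate(1, 120))   # True  -- different bank, independent timing
```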