Vasileios Kontorinis
University of California, San Diego
Publications
Featured research published by Vasileios Kontorinis.
international symposium on computer architecture | 2012
Vasileios Kontorinis; Liuyi Eric Zhang; Baris Aksanli; Jack Sampson; Houman Homayoun; Eddie Pettis; Dean M. Tullsen; Tajana Simunic Rosing
Power over-subscription can reduce costs for modern data centers. However, designing the power infrastructure for a lower operating power point than the aggregated peak power of all servers requires dynamic techniques to avoid high peak power costs and, even worse, tripping circuit breakers. This work presents an architecture for distributed per-server UPSs that stores energy during low activity periods and uses this energy during power spikes. This work leverages the distributed nature of the UPS batteries and develops policies that prolong the duration of their usage. The specific approach shaves 19.4% of the peak power for modern servers, at no cost in performance, allowing the installation of 24% more servers within the same power budget. More servers amortize infrastructure costs better and, hence, reduce total cost of ownership per server by 6.3%.
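A minimal sketch of the per-server peak-shaving idea described above, discharging the local UPS battery during spikes and recharging during low activity. The power trace, thresholds, and battery parameters are illustrative assumptions, not the paper's actual controller:

```python
# Illustrative sketch of distributed per-server UPS peak shaving.
# Thresholds, capacities, and the trace below are hypothetical.

def shave_peaks(power_trace_w, cap_w, battery_wh, charge_rate_w, dt_h=1/60):
    """Discharge the local battery whenever server draw exceeds the
    provisioned cap; recharge during low-activity periods."""
    grid_draw = []
    stored_wh = battery_wh          # start with a full battery
    for demand in power_trace_w:
        if demand > cap_w and stored_wh > 0:
            # Cover the spike from the battery, limited by stored energy.
            discharge_w = min(demand - cap_w, stored_wh / dt_h)
            stored_wh -= discharge_w * dt_h
            grid_draw.append(demand - discharge_w)
        else:
            # Low activity: recharge, but never exceed the cap.
            recharge_w = min(charge_rate_w, max(0, cap_w - demand),
                             (battery_wh - stored_wh) / dt_h)
            stored_wh += recharge_w * dt_h
            grid_draw.append(demand + recharge_w)
    return grid_draw

# Example: a spiky per-minute power trace (watts) against a 250 W cap.
trace = [200, 220, 310, 330, 300, 210, 190, 180]
print(shave_peaks(trace, cap_w=250, battery_wh=20, charge_rate_w=30))
```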
high-performance computer architecture | 2012
Houman Homayoun; Vasileios Kontorinis; Amirali Shayan; Ta-Wei Lin; Dean M. Tullsen
This paper describes a dynamically heterogeneous processor architecture that leverages 3D stacking technology. Unlike prior work in the 2D plane, the extra dimension makes it possible to share resources at a fine granularity between vertically stacked cores. As a result, each core can grow or shrink its resources as needed by the code it is running. The architecture therefore enables runtime customization of cores at a fine granularity and efficient execution at both high and low levels of thread parallelism. It achieves performance gains of 9-41%, depending on the number of executing threads, and improves energy efficiency by up to 43%.
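A rough model of the grow/shrink idea, assuming a hypothetical pool of resource entries (e.g. queue or buffer slots) that vertically stacked cores borrow and return at run-time; the class, sizes, and core IDs are illustrative, not the paper's design:

```python
# Illustrative model of fine-grained resource sharing between
# vertically stacked cores; pool sizes and core IDs are hypothetical.

class StackedResourcePool:
    def __init__(self, entries_per_core, core_ids):
        # Each core starts with its baseline share of entries.
        self.owned = {cid: entries_per_core for cid in core_ids}
        self.free = 0   # entries released by lightly loaded cores

    def shrink(self, cid, n):
        """A core with low resource demand returns n entries to the pool."""
        n = min(n, self.owned[cid])
        self.owned[cid] -= n
        self.free += n

    def grow(self, cid, n):
        """A demanding core borrows up to n free entries at run-time."""
        granted = min(n, self.free)
        self.free -= granted
        self.owned[cid] += granted
        return granted

pool = StackedResourcePool(entries_per_core=64, core_ids=["c0", "c1"])
pool.shrink("c1", 32)                     # idle neighbor gives up slots
print(pool.grow("c0", 48), pool.owned)    # c0 is granted 32 extra entries
```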
international symposium on microarchitecture | 2009
Vasileios Kontorinis; Amirali Shayan; Dean M. Tullsen; Rakesh Kumar
The increasing power dissipation of current processors and processor cores constrains design options, increases packaging and cooling costs, increases power delivery costs, and decreases reliability. Much research has been focused on decreasing average power dissipation, which most directly addresses cooling costs and reliability. However, much less has been done to decrease peak power, which most directly impacts the processor design, packaging, and power delivery. This research proposes a new architecture which provides a significant decrease in peak power with limited performance loss. It does this through the use of a highly adaptive processor. Many components of the processor can be configured at different levels, but because they are centrally controlled, the architecture can guarantee that they are never all configured maximally at the same time. This paper describes this adaptive processor and explores mechanisms for transitioning between allowed configurations to maximize performance within a peak power constraint. Such an architecture can cut peak power by 25% with less than 5% performance loss; among other advantages, this frees 5.3% of total core area used for decoupling capacitors.
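A toy illustration of the central idea: pick per-resource configuration levels that maximize estimated performance while the summed peak power stays under a budget, so no disallowed combination is ever active. The resource names, power numbers, and benefit scores are made-up assumptions; the paper's actual mechanism is more sophisticated:

```python
# Toy peak-power-constrained configuration selection; all numbers
# below are hypothetical.
from itertools import product

# (peak_power_watts, relative_performance_benefit) per level of each unit
levels = {
    "issue_width": [(2.0, 0.0), (4.0, 0.6), (6.0, 1.0)],
    "l1_dcache":   [(1.0, 0.0), (2.0, 0.4), (3.5, 0.7)],
    "fp_units":    [(0.5, 0.0), (1.5, 0.5)],
}

def best_config(budget_w):
    """Pick the level of each unit that maximizes total benefit while
    the summed peak power stays within the budget."""
    best = (None, -1.0)
    for combo in product(*levels.values()):
        power = sum(p for p, _ in combo)
        benefit = sum(b for _, b in combo)
        if power <= budget_w and benefit > best[1]:
            best = (dict(zip(levels, combo)), benefit)
    return best

config, benefit = best_config(budget_w=8.0)
print(config, benefit)
```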
international conference on computer design | 2013
Nikolaos Strikos; Vasileios Kontorinis; Xiangyu Dong; Houman Homayoun; Dean M. Tullsen
MRAM has emerged as one of the most attractive non-volatile solutions due to fast read access, low leakage power, high bit density, and long endurance. However, the high power consumption of write operations remains a barrier to the commercial adoption of MRAM technology. This paper addresses this problem by introducing low-current probabilistic writes (LCPW), a technique that reduces write access energy by lowering the amplitude of the write current pulse. Although low current pulses no longer guarantee successful bit write operations, we propose and evaluate a simple technique to ensure correctness and achieve significant power reduction over a typical MRAM implementation.
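One simple way to preserve correctness on top of probabilistic low-current writes is a write-then-verify loop with a full-current fallback. The sketch below is an assumption about the general shape of such a scheme, not the paper's exact circuit-level technique; the success probability and retry limit are illustrative:

```python
# Sketch of a probabilistic low-current write with read-verify retry.
# The success probability and retry limit are illustrative only.
import random

def lcpw_write(cell, value, p_success=0.8, max_attempts=4):
    """Attempt low-current writes; fall back to a full-current write
    only if the bit still mismatches after max_attempts tries."""
    for _ in range(max_attempts):
        if random.random() < p_success:   # probabilistic write lands
            cell["bit"] = value
        if cell["bit"] == value:          # verify with a cheap read
            return "low_current"
    cell["bit"] = value                   # guaranteed high-current write
    return "fallback"

cell = {"bit": 0}
print(lcpw_write(cell, 1), cell)
```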
design automation conference | 2014
Vasileios Kontorinis; Mohammad Khavari Tavana; Mohammad Hossein Hajkazemi; Dean M. Tullsen; Houman Homayoun
Future computing platforms will need to be flexible, scalable, and power-efficient while minimizing size, weight, and energy. Heterogeneous architectures can address these challenges by allowing each application to run on a core that matches its resource needs more closely than a one-size-fits-all core. Dynamic heterogeneous architectures extend these benefits further, allowing the system to construct the right core at run-time for each application, borrowing or freeing resources only as needed by the particular application that is running. The key insight in the described design is that 3D stacking of cores eliminates the fundamental barrier to dynamic heterogeneity, allowing resources belonging to different cores to be shared at run-time with minimal overhead.
international symposium on low power electronics and design | 2010
Gaurav Dhiman; Vasileios Kontorinis; Dean M. Tullsen; Tajana Simunic Rosing; Eric C. Saxe; Jonathan J. Chew
Runtime characteristics of individual threads (such as IPC and cache usage) are a critical factor in making efficient scheduling decisions in modern chip-multiprocessor systems. They provide key insights into how threads interact when they share processor resources, and they affect overall system power and performance efficiency. In this paper, we propose and implement mechanisms and policies for a commercial OS scheduler and load balancer that incorporate thread characteristics, and show that they yield improvements of up to 30% in performance per watt.
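A minimal sketch of one characteristics-aware pairing heuristic: co-schedule memory-intensive threads with compute-intensive ones so they contend less for shared caches. The thread profiles, metric names, and pairing rule are assumptions for illustration, not the policy actually implemented in the commercial OS scheduler:

```python
# Illustrative co-scheduling heuristic based on per-thread runtime
# characteristics; profiles and the pairing rule are hypothetical.

threads = [
    {"name": "t0", "ipc": 1.8, "llc_misses_per_kinst": 0.5},
    {"name": "t1", "ipc": 0.6, "llc_misses_per_kinst": 9.0},
    {"name": "t2", "ipc": 1.5, "llc_misses_per_kinst": 1.0},
    {"name": "t3", "ipc": 0.7, "llc_misses_per_kinst": 7.5},
]

def pair_for_cores(threads):
    """Sort by cache pressure, then pair the most and least intensive
    threads onto the same core (or cache domain)."""
    by_pressure = sorted(threads, key=lambda t: t["llc_misses_per_kinst"])
    pairs = []
    while by_pressure:
        light = by_pressure.pop(0)
        heavy = by_pressure.pop() if by_pressure else None
        pairs.append((light["name"], heavy["name"] if heavy else None))
    return pairs

print(pair_for_cores(threads))   # e.g. [('t0', 't1'), ('t2', 't3')]
```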
2013 International Green Computing Conference Proceedings | 2013
Abbas BanaiyanMofrad; Houman Homayoun; Vasileios Kontorinis; Dean M. Tullsen; Nikil D. Dutt
Technology scaling and process variation severely degrade the reliability of Chip Multiprocessors (CMPs), especially their large cache blocks. To improve cache reliability, we propose REMEDIATE, a scalable fault-tolerant architecture for low-power design of shared Non-Uniform Cache Access (NUCA) caches in tiled CMPs. REMEDIATE achieves fault tolerance through redundancy drawn from multiple banks, maximizing the amount of fault remapping and minimizing the cache capacity lost when the failure rate is high. REMEDIATE leverages a scalable fault-protection technique using two different remapping heuristics in a distributed shared-cache architecture with non-uniform latencies. We deploy a graph coloring algorithm to optimize REMEDIATE's remapping configuration. We perform an extensive design space exploration of operating voltage, performance, and power that enables designers to select different operating points and evaluate their design efficacy. Experimental results on a 4×4 tiled CMP voltage-scaled to below 400mV show that REMEDIATE saves up to 50% power while recovering more than 80% of the faulty cache area with only modest performance degradation.
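A minimal greedy-coloring sketch of the remapping idea: faulty cache blocks whose remappings would collide are placed in different groups. The conflict graph, block names, and greedy strategy below are simplified assumptions, not REMEDIATE's actual configuration algorithm:

```python
# Simplified sketch of using graph coloring to assign faulty cache
# blocks to remap groups so that conflicting blocks never share one.
# The conflict graph below is hypothetical.

def greedy_color(conflict_graph):
    """Greedy graph coloring: each color corresponds to one remap group."""
    colors = {}
    for node in conflict_graph:
        used = {colors[n] for n in conflict_graph[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

# Faulty blocks that conflict (e.g. would remap into the same bank line).
conflicts = {
    "blk_A": ["blk_B", "blk_C"],
    "blk_B": ["blk_A"],
    "blk_C": ["blk_A", "blk_D"],
    "blk_D": ["blk_C"],
}
print(greedy_color(conflicts))   # e.g. {'blk_A': 0, 'blk_B': 1, ...}
```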
international symposium on quality electronic design | 2012
Houman Homayoun; Mehryar Rahmatian; Vasileios Kontorinis; Shahin Golshan; Dean M. Tullsen
Modern microprocessor caches are often regarded as cool chip components that dissipate power uniformly. This research demonstrates that this uniformity is a misconception. Memory cell peripherals dissipate considerably higher power than the actual memory cell and this can result in up to 30°C of temperature difference between the warmest and the coolest part of the cache. To be effective and accurate, cache temperature and power modeling and management must take this effect into account. Further, this paper focuses on the surrounding logic of the memory cell and applies two novel techniques, peripheral bit swapping (PBS) and peripheral monitor and shutdown (PMSD), to reduce the thermal variation as well as reduce the corresponding steady-state temperature and leakage power of the cache. Overall, these techniques decrease temperature by 8°C for the L1 Data Cache and 5°C for the shared L2 cache and reduce their thermal gradient by more than 75%, on average.
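The general flavor of a monitor-and-shutdown policy for cache peripherals might look like the sketch below: power-gate a subarray's peripheral logic after a period of inactivity. The idle threshold, class interface, and bookkeeping are assumptions for illustration, not the paper's hardware mechanism:

```python
# Illustrative idle-driven gating of cache peripheral circuitry, in the
# spirit of a monitor-and-shutdown scheme; parameters are hypothetical.

class PeripheralGate:
    def __init__(self, idle_threshold_cycles=1000):
        self.idle_threshold = idle_threshold_cycles
        self.idle_cycles = 0
        self.gated = False

    def on_access(self):
        """A cache access wakes the peripherals and resets the idle counter."""
        self.idle_cycles = 0
        self.gated = False

    def tick(self):
        """Called every cycle; gate the peripherals once the subarray
        has been idle long enough to amortize the wakeup cost."""
        self.idle_cycles += 1
        if self.idle_cycles >= self.idle_threshold:
            self.gated = True

gate = PeripheralGate(idle_threshold_cycles=3)
for cycle in range(5):
    gate.tick()
print(gate.gated)   # True: peripherals power-gated after the idle window
```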
international conference on parallel processing | 2012
Gaurav Dhiman; Vasileios Kontorinis; Raid Ayoub; Liuyi Eric Zhang; Chris Sadler; Dean M. Tullsen; Tajana Simunic Rosing
Archive | 2012
Vasileios Kontorinis; Jack Sampson; Liuyi Eric Zhang; Baris Aksanli; Houman Homayoun; Tajana Simunic Rosing; Dean M. Tullsen