Shlomo Weiss
Tel Aviv University
Publication
Featured research published by Shlomo Weiss.
IEEE Computer | 1994
James E. Smith; Shlomo Weiss
A discussion is given on two RISC implementations: from Digital Equipment Corporation, the Alpha 21064, and from IBM/Motorola/Apple, the PowerPC 601. Both are superscalar implementations, that is, they can sustain execution of two or more instructions per clock cycle. Otherwise, these two implementations present vastly different philosophies for achieving high performance. The PowerPC 601 focuses on powerful instructions and great flexibility in processing order, while the Alpha 21064 depends on a very fast clock, with simpler instructions and a more streamlined implementation structure. These two RISC microprocessors exemplify contrasting, but equally valid, implementation philosophies. An overview is given of the instruction sets and the authors emphasize the differences in design: PowerPC uses powerful instructions so that fewer are needed to get the job done; Alpha uses simple instructions so that the hardware can be kept simpler and faster. The authors also discuss the pipelined implementations of the two architectures; again, the contrast is between powerful and simple.
Design Automation Conference | 1984
Randy H. Katz; Shlomo Weiss
A design transaction is a sequence of operations mapping a consistent version of an object into a new version. We describe a mechanism, based on version checkout and change files, that supports controlled sharing and is resilient to crashes in a network of workstations and database servers.
ACM Computing Surveys | 2013
Omer Zilberberg; Shlomo Weiss; Sivan Toledo
This article surveys the current state of phase-change memory (PCM) as a nonvolatile memory technology set to replace flash and DRAM in modern computerized systems. It has been researched and developed in the last decade, with researchers providing better architectural designs which address the technology's main challenges: its limited write endurance, potentially long latency, high-energy writes, power dissipation, and some concerns for memory privacy. Some physical properties of the technology are also discussed, providing a basis for architectural discussions. Also briefly shown are other architectural alternatives, such as FeRAM and MRAM. The designs surveyed in this article include read before write, wear leveling, write cancellation, write pausing, some encryption schemes, and buffer organizations. These allow PCM to stand on its own as a replacement for DRAM as main memory. Designs for hybrid memory systems with both PCM and DRAM are also shown, along with some designs for SSDs incorporating PCM.
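The "read before write" idea named in the abstract above can be illustrated with a minimal sketch. It is not taken from the survey: the function name `read_before_write`, the 8-bit word width, and the per-cell `wear` counters are all assumptions made for illustration. The stored word is read first, and only the cells whose value actually changes are programmed, which reduces both write energy and endurance consumption.

```python
def read_before_write(cells, wear, addr, new_word, width=8):
    """Store new_word at cells[addr], programming only the bits that differ.

    cells: list of ints, one word per address (hypothetical PCM array)
    wear:  list of per-bit write counters, wear[addr][bit] (hypothetical)
    Returns the number of cells that were actually programmed.
    """
    mask = (1 << width) - 1
    old_word = cells[addr]                  # the extra read that names the scheme
    diff = (old_word ^ new_word) & mask     # 1 bits mark cells that must change
    for bit in range(width):
        if (diff >> bit) & 1:
            wear[addr][bit] += 1            # only changed cells consume endurance
    cells[addr] = new_word & mask
    return bin(diff).count("1")
```

Rewriting a word with an identical value programs no cells at all in this sketch, at the cost of one extra read per write.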
IEEE Computer Architecture Letters | 2008
Amit Golander; Shlomo Weiss; Ronny Ronen
DMR (dual modular redundancy) was suggested for increasing reliability. Classical DMR consists of pairs of cores that check each other and are pre-connected during manufacturing by dedicated links. In this paper we introduce the dynamic dual modular redundancy (DDMR) architecture. DDMR supports run-time scheduling of redundant threads, which has significant benefits relative to static binding. To allow dynamic pairing, DDMR replaces the special links with a novel ring architecture. DDMR uses short instruction sequences for validation, smaller than the processor reorder buffer. Such short sequences reduce latencies in parallel programs and save resources needed to buffer uncommitted data. DDMR scales with the number of cores and may be used in large multicore architectures.
Computer Standards & Interfaces | 2012
Gal Motika; Shlomo Weiss
One of the techniques used to improve I/O performance of virtual machines is paravirtualization. Paravirtualized devices are intended to reduce the performance overhead of full virtualization, where all hardware devices are emulated. The interface of a paravirtualized device is not identical to that of the underlying hardware. The OS of the virtual guest machine must be ported in order to use a paravirtualized device. In this paper, the network virtualization done by the Kernel-based Virtual Machine (KVM) is described. The KVM model is different from other Virtual Machine Monitors (VMMs) because the KVM is a Linux kernel module and it depends on hardware support. In this work, the overhead of using such virtual networks is measured. A paravirtualized model using the virtio [38] network driver is described, and some performance results of a web benchmark on the two models are presented. Research highlights: this work provides an analysis of KVM's network I/O performance; examines an Intel processor's virtualization ability and how this ability is used for network I/O; compares an emulated network driver with the virtio paravirtualized driver; provides an applicative comparison on web servers; and determines the difference in CPU and memory utilization of guest machines.
IEEE Convention of Electrical and Electronics Engineers in Israel | 2006
Simon Grinberg; Shlomo Weiss
The following outlines an effort to speed up the evaluation of a transactional memory system without losing accuracy. Instead of using the traditional software simulation techniques, we build our system within a large FPGA device. The system elements are a mix of commercially available IP cores and our own design. Together with appropriate runtime monitoring, this approach yields a powerful substitute for simulation.
Design Automation Conference | 1986
Shlomo Weiss; Katie Rotzell; Tom Rhyne; Arny Goldfein
This report describes DOSS and its capabilities, some design decisions made within it and the associated tradeoffs. DOSS is a storage system designed to support CAD applications efficiently. We define composite objects, examine their ability to capture design data and outline our approach to distributed object naming. We believe our choice of the system/application interface is crucial for achieving acceptable performance in the CAD environment. We also describe our approach to associative search, change control and version management.
International Symposium on Low Power Electronics and Design | 2010
Nadav Levison; Shlomo Weiss
Modern embedded processors used in portable media and communication devices are now required to execute complex applications, and their performance requirements are getting close to the demands of general-purpose processors. The performance-per-Watt ratio is an extremely important measure in portable devices because of their limited power capacity. Branch predictors, and especially the BTB, are among the largest on-chip SRAM structures (after caches), and therefore are primary contributors to the total system power. We propose a novel micro-architectural method referred to as Shifted-Index BTB with a Set-Buffer, which reduces both dynamic and static power. Extensive simulations show that up to 80% reduction in dynamic power is achieved at the cost of up to 0.64% system slowdown. A 58% reduction in static power is also achieved by applying low-leakage power techniques that mesh well with the Set-Buffer design.
Microprocessors and Microsystems | 2008
Roger Kahn; Shlomo Weiss
We propose Thrifty BTB, a mechanism to reduce the dynamic power dissipated by the BTB. We studied two mechanisms that reduce dynamic power dissipation. The first one is a serial-BTB configuration. The second mechanism is the filter-BTB, a combination of a low-power counting Bloom filter placed in front of a conventional BTB. We also studied the effect of placing a small 32-entry direct-mapped BTB, functioning as a bypass, in parallel with the first two mechanisms. The filter-BTB reduces the number of lookups relative to a conventional BTB and the dynamic power dissipated. The serial-BTB variant only accesses the data array of the BTB upon a hit; therefore, for most of the accesses the actual power dissipated is only what is dissipated by accessing the tag array. The bypass is used in parallel to either the filter- or the serial-BTB and reduces the performance cost by providing a low-latency response in case of a hit. By integrating these mechanisms into a BTB design we achieve an average reduction of 51% in the dynamic power dissipation of the BTB. These benefits come at a small performance cost that is on average slightly less than 1.2%. The energy-delay product was reduced by an average of 50%.
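The filter-BTB idea described above, a counting Bloom filter consulted before the BTB itself, can be sketched as follows. This is an illustrative model only, not the paper's design: the class names, the filter size, the two hash functions, and the FIFO replacement policy are all assumptions. The point it shows is that a counting filter supports removal on eviction, never produces a false negative for a resident entry, and lets lookups for absent branches skip the BTB access.

```python
class CountingBloomFilter:
    """Counters instead of bits, so entries can be removed on BTB eviction."""

    def __init__(self, num_counters=64, num_hashes=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, pc):
        # Hypothetical hash functions: simple multiplicative mixes of the PC.
        n = len(self.counters)
        return [((pc >> 2) * (2 * i + 3) + i) % n for i in range(self.num_hashes)]

    def add(self, pc):
        for i in self._indexes(pc):
            self.counters[i] += 1

    def remove(self, pc):
        for i in self._indexes(pc):
            self.counters[i] -= 1

    def may_contain(self, pc):
        # A zero counter proves absence; all non-zero may be a false positive.
        return all(self.counters[i] > 0 for i in self._indexes(pc))


class FilterBTB:
    """Toy BTB (pc -> target) guarded by a counting Bloom filter."""

    def __init__(self, capacity=4):
        self.filter = CountingBloomFilter()
        self.btb = {}            # dicts keep insertion order, used here for FIFO
        self.capacity = capacity
        self.btb_lookups = 0     # accesses that reached the BTB arrays
        self.filtered = 0        # accesses answered by the filter alone

    def insert(self, pc, target):
        if pc not in self.btb:
            if len(self.btb) >= self.capacity:
                victim = next(iter(self.btb))   # assumed FIFO replacement
                del self.btb[victim]
                self.filter.remove(victim)
            self.filter.add(pc)
        self.btb[pc] = target

    def lookup(self, pc):
        if not self.filter.may_contain(pc):
            self.filtered += 1                  # BTB access avoided
            return None
        self.btb_lookups += 1
        return self.btb.get(pc)
```

In this model, the ratio of `filtered` to total lookups is a rough stand-in for the BTB accesses saved; actual power figures depend on the array and filter implementations.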
Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627) | 2002
Avishay Orpaz; Shlomo Weiss
CodePack is a code compression system used by IBM in its PowerPC family of embedded processors. CodePack combines high compression capability with fast and simple decoding hardware. IBM did not release much information about the design of the system and the influence of various design parameters on its performance. In this paper we present the system and its design parameters and investigate how each affects performance in terms of compression rate and decoder complexity. We also present a novel efficient algorithm to optimize the class structure of the system.