Gershon Kedem
Duke University
Publications
Featured research published by Gershon Kedem.
high-performance computer architecture | 1996
Thomas Alexander; Gershon Kedem
Microprocessor execution speeds are improving at a rate of 50%-80% per year while DRAM access times are improving at a much lower rate of 5%-10% per year. Computer systems are rapidly approaching the point at which overall system performance is determined not by the speed of the CPU but by the memory system speed. We present a high performance memory system architecture that overcomes the growing speed disparity between high performance microprocessors and current generation DRAMs. A novel prediction and prefetching technique is combined with a distributed cache architecture to build a high performance memory system. We use a table based prediction scheme with a prediction cache to prefetch data from the on-chip DRAM array to an on-chip SRAM prefetch buffer. By prefetching data we are able to hide the large latency associated with DRAM access and cycle times. Our experiments show that with a small (32 KB) prediction cache we can get an effective main memory access time that is close to the access time of larger secondary caches.
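As a rough illustration of the table-based prediction described above, the sketch below keeps a prediction cache mapping each miss to the line that missed after it, and uses the prediction to move a line into a buffer standing in for the on-chip SRAM. The class, table size, and indexing are assumptions for illustration, not the paper's design.

```python
# A minimal sketch of table-based miss prediction with a prefetch buffer.
# The prediction cache, buffer, and indexing below are illustrative only.

class PredictionPrefetcher:
    def __init__(self, table_entries=512, line_bytes=64):
        self.line_bytes = line_bytes
        self.table_entries = table_entries
        self.table = {}               # prediction cache: table index -> predicted next line
        self.prefetch_buffer = set()  # stands in for the on-chip SRAM prefetch buffer
        self.last_miss = None

    def _line(self, addr):
        return addr // self.line_bytes

    def access(self, addr):
        line = self._line(addr)
        hit = line in self.prefetch_buffer
        if not hit:
            # Learn the observed miss-to-miss transition.
            if self.last_miss is not None:
                self.table[self.last_miss % self.table_entries] = line
            self.last_miss = line
        # Prefetch whatever the table predicts will be needed after this line,
        # hiding DRAM latency behind the current access.
        predicted = self.table.get(line % self.table_entries)
        if predicted is not None:
            self.prefetch_buffer.add(predicted)   # models a DRAM -> SRAM transfer
        return hit
```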
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1989
Peter Ramyalal Suaris; Gershon Kedem
The authors describe a placement technique based on hypergraph quadrisection. They have developed a standard-cell placement procedure based on recursively dividing the netlist into four parts while minimizing the division cost. Two ideas are combined for placement: one is the extension of the min-cut bisection algorithm to handle quadrisection; the second is the simultaneous calculation of min-cut quadrisection and hierarchical global routing. Implementation details are discussed, and the results show the implementation to be competitive with simulated annealing.
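The recursive structure of such a quadrisection placer can be sketched as follows. The quadrisect() routine here is a trivial round-robin placeholder for the paper's min-cut quadrisection with simultaneous hierarchical global routing, and all names are illustrative.

```python
# A minimal sketch of recursive quadrisection placement, assuming a
# quadrisect() routine that splits the netlist into four parts while
# minimizing cut cost; the split below is a placeholder only.

def quadrisect(cells):
    # Placeholder: a real implementation minimizes hyperedge cut across
    # the four parts; here cells are simply dealt round-robin.
    parts = [[], [], [], []]
    for i, c in enumerate(cells):
        parts[i % 4].append(c)
    return parts

def place(cells, region, placement):
    x0, y0, x1, y1 = region
    if len(cells) <= 1:
        for c in cells:
            placement[c] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
    # Recursively assign each part of the netlist to one quadrant of the region.
    for part, quad in zip(quadrisect(cells), quadrants):
        place(part, quad, placement)

placement = {}
place(["a", "b", "c", "d", "e", "f"], (0.0, 0.0, 1.0, 1.0), placement)
```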
design automation conference | 1990
Sujit Dey; Franc Brglez; Gershon Kedem
This paper introduces a circuit partitioning method based on analysis of reconvergent fanout. We consider a DAG model for a circuit. We define a corolla as a set of overlapping reconvergent fanout regions. We partition the DAG into a set of non-overlapping corollas and use the corollas to resynthesize the circuit. We show that resynthesis of large benchmark circuits consistently reduces transistor pairs and layout area while improving delay and testability.
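The sketch below illustrates the underlying notion of reconvergent fanout on a DAG: nodes reachable from at least two fanout branches of the same stem. Grouping overlapping reconvergent regions into corollas, and the resynthesis step, are the paper's contribution and are not reproduced here; the graph encoding is an assumption.

```python
# A minimal sketch of detecting reconvergent fanout in a DAG, assuming the
# circuit is given as a dict mapping each gate to its fanout list.

def reachable(dag, start):
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(dag.get(n, []))
    return seen

def reconvergent_nodes(dag, stem):
    branches = dag.get(stem, [])
    if len(branches) < 2:
        return set()
    reach = [reachable(dag, b) for b in branches]
    # A node is reconvergent for this stem if it is reachable from at
    # least two distinct fanout branches of the stem.
    recon = set()
    for node in set().union(*reach):
        if sum(node in r for r in reach) >= 2:
            recon.add(node)
    return recon

# Example: stem "a" fans out to "b" and "c", which reconverge at "d".
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reconvergent_nodes(dag, "a"))   # {'d'}
```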
international conference on computer design | 1988
Robert Lisanke; Franc Brglez; Gershon Kedem
The authors present a method for transforming multilevel equations into a gate-level netlist of a given technology. The proposed mapping procedure performs multiple mappings, each with randomly selected program parameters. The number of mappings is user-settable, offering the designer an option to trade CPU runtime for better results. This feature is important to designers who begin by exploring the space of architectural possibilities and then create a specific, highly optimized circuit. The proposed technology mapping method has been implemented in C as a logic-design tool (McMAP) that takes full advantage of any gate library's timing and area information. Using default parameter settings, the tool synthesized several standard benchmark examples, yielding higher-quality circuits with lower CPU requirements than previously reported.
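The multiple-mapping idea can be sketched as a random-restart loop: run the mapper with randomly drawn parameters a user-settable number of times and keep the cheapest result under an area/delay cost. The map_netlist() stand-in and the cost weights below are assumptions, not McMAP's actual parameters.

```python
# A minimal sketch of the multiple-mapping strategy; map_netlist() is a
# placeholder for one technology-mapping pass, not McMAP itself.

import random

def map_netlist(equations, params):
    # Placeholder: returns (netlist, area, delay) for one random mapping pass.
    area = random.uniform(90, 110) * params["effort"]
    delay = random.uniform(9, 11) / params["effort"]
    return ([], area, delay)

def best_of_n_mappings(equations, n_mappings, area_weight=1.0, delay_weight=1.0):
    best, best_cost = None, float("inf")
    for _ in range(n_mappings):        # user-settable: trades CPU time for quality
        params = {"effort": random.uniform(0.5, 2.0)}   # randomly selected parameters
        netlist, area, delay = map_netlist(equations, params)
        cost = area_weight * area + delay_weight * delay
        if cost < best_cost:
            best, best_cost = (netlist, area, delay), cost
    return best
```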
design automation conference | 1991
Jack V. Briner Jr.; John L. Ellis; Gershon Kedem
We show that digital systems contain significantly more parallelism than previously thought. By reducing the dependency on time as the mechanism for synchronization, significant speedups are possible. By using asynchronous control and Virtual Time synchronization with Lazy Cancellation, limited component sizes, special clock distribution, and bounding windows, we get up to 23X speedup on a 32-processor system over a good sequential algorithm for mixed-level simulation.
acm symposium on solid modeling and applications | 1991
John L. Ellis; Gershon Kedem; T. C. Lyerly; D. G. Thielman; Richard J. Marisa; Jai Menon; Herbert B. Voelcker
Solid modeling is computationally intensive. Thus far its use in industry has been limited mainly to simple parts and simple applications, and this is not likely to change much until ‘massive’ computing power can be made available at an affordable cost. The Ray Casting Engine described in this paper is one specialized source of ‘massive’ computing power for solid modeling, and it is but the simplest member of a potentially large family of ‘classification computers’ for solid modeling. The Ray Casting Engine (RCE) is a highly parallel, custom-VLSI computer that classifies grids of parallel lines against solids represented in CSG. The sets of parallel ‘in’ segments that the RCE produces are called ray representations (ray-reps); they can be thought of as sampled boundary representations. Ray-reps are obviously useful for graphics and mass-property calculation. Less obviously, they are surprisingly versatile if one exploits special properties, for example Boolean combination of solids by interval operations on ray-reps, and the fact that ray-reps are cheap to compute. Overall, the combination of a ‘new’ representation scheme (ray-reps) and a fast custom processor (the RCE) is changing our approach to solid modeling. We are now seeking ‘brute force’ solutions to problems, and are finding that some previously intractable problems, for example those involving spatial sweeping and offsetting, are effectively computable and easy to program. This paper summarizes the genesis and principles of the RCE, some important properties of ray representations, and some exemplary applications of the (ray-rep, RCE) combination.
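The "interval operations on ray-reps" mentioned above can be illustrated directly: if each ray-rep is a sorted list of disjoint ‘in’ intervals along a ray, Boolean combination of solids reduces to merging intervals per ray. The sketch below shows intersection and union for a single ray; the representation details are assumptions for illustration.

```python
# A minimal sketch of Boolean combination via interval operations on
# ray-reps, assuming each ray-rep is a sorted list of disjoint 'in'
# intervals (t_enter, t_exit) along one ray of the grid.

def intersect_ray(a, b):
    # 'in' segments of (A intersect B) along a single ray.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def union_ray(a, b):
    # 'in' segments of (A union B) along a single ray.
    merged = sorted(a + b)
    out = []
    for lo, hi in merged:
        if out and lo <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

# Two solids sampled along one ray:
A = [(0.0, 2.0), (5.0, 7.0)]
B = [(1.0, 6.0)]
print(intersect_ray(A, B))   # [(1.0, 2.0), (5.0, 6.0)]
print(union_ray(A, B))       # [(0.0, 7.0)]
```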
international conference on computer aided design | 1988
Jack V. Briner; John L. Ellis; Gershon Kedem
The optimal level of performance from parallel discrete-event simulation depends on the circuit being simulated, the vectors being simulated, and the machine on which the simulation is being performed. Empirical studies based on very simple models suggest that the amount of parallelism available in typical circuits is very small. A model of optimal performance for a machine with an infinite number of processors having uniform memory access is presented; it demonstrates that some circuits have significantly more parallelism than previously believed. The model is refined to define the optimal load partitioning for a machine with a finite number of uniform-access processors, and extended to define the optimal static data partitioning. A metric is obtained which can be used to benchmark different models of parallel simulation. The effectiveness of these models in detecting performance problems of the version of RSIM running on the BBN Butterfly is shown.
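One simple version of such a parallelism bound, assuming an infinite number of uniform-access processors and unit-time events, is the total event count divided by the length of the longest dependency chain. The sketch below computes that bound for an illustrative event trace; it is a simplification, not the paper's exact model.

```python
# A minimal sketch of an available-parallelism bound for a discrete-event
# trace: (total events) / (longest dependency chain), assuming unit-time
# events and unlimited uniform-access processors.

def available_parallelism(events, deps):
    """events: event ids in a valid topological order.
    deps: dict mapping event id -> list of events it depends on."""
    depth = {}
    for e in events:
        depth[e] = 1 + max((depth[d] for d in deps.get(e, [])), default=0)
    critical_path = max(depth.values(), default=0)
    return len(events) / critical_path if critical_path else 0.0

# Four events, two of which can run concurrently:
events = ["e1", "e2", "e3", "e4"]
deps = {"e2": ["e1"], "e3": ["e1"], "e4": ["e2", "e3"]}
print(available_parallelism(events, deps))   # 4 / 3 ≈ 1.33
```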
international conference on computer design | 1991
Sujit Dey; Franc Brglez; Gershon Kedem
The concepts of corolla partitioning, based on an analysis of signal reconvergence, are extended to cyclic sequential circuits. The sequential circuit is partitioned into corollas that may contain latches but can be peripherally retimed and resynthesized using combinational techniques. Cycles in the circuit are broken by ensuring that the partitions formed are acyclic. Application of the proposed partitioning, retiming, and resynthesis approach to a set of large sequential benchmarks has shown considerable gains after resynthesis.
international conference on computer design | 2000
Haifeng Yu; Gershon Kedem
This paper describes and evaluates a DRAM-page based cache-line prediction and prefetching architecture. The scheme takes DRAM access timing into consideration in order to reduce prefetching overhead, amortizing the high cost of DRAM access by fetching two cache lines that reside on the same DRAM page in a single access. On each DRAM access, one or two cache blocks may be prefetched. We combine three prediction mechanisms (history, stride, and one-block lookahead), make them DRAM-page sensitive, and deploy them in an effective adaptive prefetching strategy. Our simulations show that the prefetch mechanism can greatly improve system performance. Using a 32-KB prediction table cache, the prefetching scheme improves performance by 26%-55% on average over a baseline configuration, depending on the memory model. Moreover, the simulations show that prefetching is more cost-effective than simply increasing the L2-cache size or using a one-block lookahead prefetching scheme. Simulation results also show that DRAM-page based prefetching yields higher relative performance as processors get faster, making the scheme more attractive for next-generation processors.
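A rough sketch of DRAM-page-sensitive candidate selection is shown below: each predictor proposes a line, and only candidates on the same DRAM page as the demand miss are kept, so the extra block can share the already-open page. The sizes, indexing, and priority order are assumptions, not the paper's exact policy.

```python
# A minimal sketch combining history, stride, and one-block-lookahead
# predictors, restricted to the DRAM page of the demand miss.

LINE_BYTES = 64
PAGE_BYTES = 4096   # assumed DRAM page size

def same_page(addr_a, addr_b):
    return addr_a // PAGE_BYTES == addr_b // PAGE_BYTES

class PagePrefetcher:
    def __init__(self):
        self.history = {}      # unbounded dict stands in for the prediction table cache
        self.last_miss = None
        self.last_stride = 0

    def on_miss(self, addr):
        line = addr // LINE_BYTES
        candidates = []
        # 1. History prediction learned from earlier miss pairs.
        if line in self.history:
            candidates.append(self.history[line])
        # 2. Stride prediction from the last two misses.
        if self.last_miss is not None:
            stride = line - self.last_miss
            if stride == self.last_stride and stride != 0:
                candidates.append(line + stride)
            self.last_stride = stride
            self.history[self.last_miss] = line
        # 3. One-block lookahead.
        candidates.append(line + 1)
        self.last_miss = line
        # Keep only candidates on the same DRAM page as the demand miss.
        same = [c for c in candidates if same_page(c * LINE_BYTES, addr)]
        uniq = list(dict.fromkeys(same))
        return uniq[:2]        # at most two blocks prefetched per DRAM access
```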
annual computer security applications conference | 2005
Fareed Zaffar; Gershon Kedem; Ashish Gehani
The Paranoid file system is an encrypted, secure, global file system with user-managed access control. The system provides efficient, application-transparent, peer-to-peer file sharing. This paper presents the design, implementation, and evaluation of the Paranoid file system and its access-control architecture. The system lets users grant safe, selective, UNIX-like file access to peer groups across administrative boundaries. Files are kept encrypted, and access control translates into key management. The system uses a novel transformation-key scheme to effect access revocation. The file system works seamlessly with existing applications through the use of interposition agents, which provide a layer of indirection that makes it possible to implement transparent remote file access and data encryption/decryption without any kernel modifications. System performance evaluations show that encryption and remote file-access overheads are small, demonstrating that the Paranoid system is practical.
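The "access control translates into key management" idea can be sketched conceptually: files are stored encrypted under a per-file key, granting access means wrapping that key for a user, and revoking access removes the wrapped key (the real system additionally re-keys via its transformation-key scheme). The wrap/unwrap placeholders below are illustrative XOR stand-ins, not the paper's cryptography, and all names are hypothetical.

```python
# A conceptual sketch of access control as key management. The XOR
# "wrapping" below is a placeholder, not a secure construction.

def wrap(file_key, user_secret):
    # Placeholder for real key wrapping; assumes equal-length byte strings.
    return bytes(a ^ b for a, b in zip(file_key, user_secret))

def unwrap(wrapped, user_secret):
    return bytes(a ^ b for a, b in zip(wrapped, user_secret))

class ParanoidStoreSketch:
    def __init__(self):
        self.wrapped_keys = {}   # (path, user) -> wrapped per-file key

    def grant(self, path, file_key, user, user_secret):
        self.wrapped_keys[(path, user)] = wrap(file_key, user_secret)

    def revoke(self, path, user):
        # Revocation removes the user's wrapped key; the paper additionally
        # re-keys with a transformation key so stale keys cannot be reused.
        self.wrapped_keys.pop((path, user), None)

    def file_key_for(self, path, user, user_secret):
        wrapped = self.wrapped_keys.get((path, user))
        return unwrap(wrapped, user_secret) if wrapped else None
```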