Sivan Toledo
Tel Aviv University
Publications
Featured research published by Sivan Toledo.
ACM Computing Surveys | 2005
Eran Gal; Sivan Toledo
Flash memory is a type of electrically-erasable programmable read-only memory (EEPROM). Because flash memories are nonvolatile and relatively dense, they are now used to store files and other persistent objects in handheld computers, mobile phones, digital cameras, portable music players, and many other computer systems in which magnetic disks are inappropriate. Flash, like earlier EEPROM devices, suffers from two limitations. First, bits can only be cleared by erasing a large block of memory. Second, each block can only sustain a limited number of erasures, after which it can no longer reliably store data. Due to these limitations, sophisticated data structures and algorithms are required to effectively use flash memories. These algorithms and data structures support efficient not-in-place updates of data, reduce the number of erasures, and level the wear of the blocks in the device. This survey presents these algorithms and data structures, many of which have only been described in patents until now.
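The two constraints the survey names (block-granularity erasure and limited erase endurance) are what force not-in-place updates and wear leveling. As a minimal illustrative sketch, with all class and method names invented here (this is not any real flash translation layer from the survey):

```python
class FlashSim:
    """Toy flash device: a page can be written once; pages are cleared only
    by erasing an entire block, and each erasure adds wear to that block."""
    PAGES_PER_BLOCK = 4

    def __init__(self, num_blocks):
        self.blocks = [[None] * self.PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks   # wear per block
        self.mapping = {}                      # logical sector -> (block, page)

    def _find_free_page(self):
        # Wear leveling: prefer free pages in the least-worn blocks.
        for b in sorted(range(len(self.blocks)), key=lambda i: self.erase_counts[i]):
            for p in range(self.PAGES_PER_BLOCK):
                if self.blocks[b][p] is None:
                    return b, p
        raise RuntimeError("device full; a real driver would garbage-collect here")

    def write(self, sector, data):
        # Not-in-place update: write a fresh copy and remap; the old copy
        # is merely unmapped, not erased, until its whole block is reclaimed.
        b, p = self._find_free_page()
        self.blocks[b][p] = (sector, data)
        self.mapping[sector] = (b, p)

    def read(self, sector):
        b, p = self.mapping[sector]
        return self.blocks[b][p][1]

    def erase_block(self, b):
        # Erasure clears all pages of the block at once and records the wear.
        self.blocks[b] = [None] * self.PAGES_PER_BLOCK
        self.erase_counts[b] += 1
```

Rewriting a sector thus consumes a new page while leaving the stale copy behind, which is why real flash file systems need the garbage-collection and wear-leveling algorithms the survey catalogs.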
Ibm Journal of Research and Development | 1997
Sivan Toledo
Sparse matrix-vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a 266-MFLOPS machine. The techniques are applicable to other superscalar RISC processors as well, and have improved performance on a Sun UltraSPARC I workstation, for example.
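For reference, the unoptimized kernel the paper's techniques reshape is the plain compressed-sparse-row (CSR) product below; this baseline is a standard formulation, not code from the paper:

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A stored in compressed sparse row (CSR) form:
    row i's nonzeros are vals[row_ptr[i]:row_ptr[i+1]] in the columns
    col_idx[row_ptr[i]:row_ptr[i+1]]."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The indirect access x[col_idx[k]] is the main source of cache
            # misses; bandwidth-reducing reordering tries to localize it.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

Register blocking replaces the inner scalar loop with small dense blocks (fewer index loads per flop), and prefetching overlaps the irregular loads of x with the arithmetic.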
symposium on discrete algorithms | 1992
Pankaj K. Agarwal; Micha Sharir; Sivan Toledo
We present several applications in computational geometry of Megiddo's parametric searching technique. These applications include: (1) finding the minimum Hausdorff distance in the Euclidean metric between two polygonal regions under translation; (2) computing the biggest line segment that can be placed inside a simple polygon; (3) computing the smallest-width annulus that can contain a given set of points in the plane; (4) solving the 1-segment center problem: given a set of points in the plane, find a placement for a given line segment (under translation and rotation) which minimizes the largest distance from the segment to the given points; (5) given a set of n points in 3-space, finding the largest radius r such that if we place a ball of radius r around each point, no segment connecting a pair of points is intersected by a third ball. Besides obtaining efficient solutions to all these problems (which, in every case, either considerably improve previous solutions or are the first non-trivial solutions to these problems), our goal is to demonstrate the versatility of the parametric searching technique.
Journal of Parallel and Distributed Computing | 2004
Dror Irony; Sivan Toledo; Alexander Tiskin
We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of square matrices by n. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, are optimal, in the sense that in any algorithm that only uses O(n²/P) words of memory per processor, at least one processor must send or receive Ω(n²/√P) words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are also optimal. These algorithms use replication to reduce communication. We show that in any algorithm that uses O(n²/P^(2/3)) words of memory per processor, at least one processor must send or receive Ω(n²/P^(2/3)) words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed. The 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then Ω(n²) words must cross any bisection cut of the machine. All our bounds apply only to conventional Θ(n³) algorithms. They do not apply to Strassen's algorithm or other o(n³) algorithms.
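The continuous tradeoff mentioned in the abstract can be stated, up to constant factors, in the following standard form (this constant-free summary is consistent with the 2D and 3D bounds above, but is not quoted from the paper):

```latex
% Per-processor communication W for conventional \Theta(n^3) multiplication
% with M words of local memory per processor:
W \;=\; \Omega\!\left(\frac{n^3}{P\sqrt{M}}\right).
% Substituting the two classical memory regimes recovers both bounds:
%   M = O(n^2/P)        \implies  W = \Omega(n^2/P^{1/2})   (2D algorithms)
%   M = O(n^2/P^{2/3})  \implies  W = \Omega(n^2/P^{2/3})   (3D algorithms)
```

Larger local memories buy less communication, which is exactly why the replication used by 3D algorithms helps.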
SIAM Journal on Matrix Analysis and Applications | 1997
Sivan Toledo
This paper presents a new partitioned algorithm for LU decomposition with partial pivoting. The new algorithm, called the recursively partitioned algorithm, is based on a recursive partitioning of the matrix. The paper analyzes the locality of reference in the new algorithm and in a known and widely used partitioned algorithm for LU decomposition called the right-looking algorithm. The analysis reveals that the new algorithm performs a factor of Θ(√(M/n)) fewer I/O operations (or cache misses) than the right-looking algorithm, where n is the order of the matrix and M is the size of the primary memory.

workshop on i/o in parallel and distributed systems | 1996
Sivan Toledo; Fred G. Gustavson

SIAM Journal on Matrix Analysis and Applications | 2005
Marshall W. Bern; John R. Gilbert; Bruce Hendrickson; Nhat Nguyen; Sivan Toledo

international conference on computer graphics and interactive techniques | 2007
Andrei Sharf; Thomas Lewiner; Gil Shklarski; Sivan Toledo; Daniel Cohen-Or

ACM Transactions on Mathematical Software | 2004
Vladimir Rotkin; Sivan Toledo

Journal of the ACM | 2011
Haim Avron; Sivan Toledo
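The recursively partitioned LU factorization from the SIAM J. Matrix Anal. Appl. 1997 abstract above can be sketched as follows. This is a generic textbook-style recursion on a column split, assuming a nonsingular matrix, and is not the paper's actual implementation:

```python
def rec_lu(A, r0, c0, m, n, perm):
    """In-place recursive LU with partial pivoting on the m-by-n block of A
    with top-left corner (r0, c0). Row swaps apply to entire rows of A and
    are recorded in perm, so on return P*A = L*U with the unit-lower L and
    upper U overwriting A. Assumes the matrix is nonsingular."""
    if n == 1:
        # Base case, one column: pick the largest pivot, swap, scale below.
        p = max(range(m), key=lambda i: abs(A[r0 + i][c0]))
        if p != 0:
            A[r0], A[r0 + p] = A[r0 + p], A[r0]
            perm[r0], perm[r0 + p] = perm[r0 + p], perm[r0]
        piv = A[r0][c0]
        for i in range(1, m):
            A[r0 + i][c0] /= piv
        return
    n1 = n // 2
    # 1. Factor the left panel (tall and skinny) recursively.
    rec_lu(A, r0, c0, m, n1, perm)
    # 2. Triangular solve L11 * U12 = A12 (forward substitution, unit L11).
    for j in range(c0 + n1, c0 + n):
        for i in range(n1):
            s = A[r0 + i][j]
            for k in range(i):
                s -= A[r0 + i][c0 + k] * A[r0 + k][j]
            A[r0 + i][j] = s
    # 3. Schur complement update: A22 -= L21 * U12.
    for i in range(n1, m):
        for j in range(c0 + n1, c0 + n):
            s = A[r0 + i][j]
            for k in range(n1):
                s -= A[r0 + i][c0 + k] * A[r0 + k][j]
            A[r0 + i][j] = s
    # 4. Factor the trailing block recursively.
    rec_lu(A, r0 + n1, c0 + n1, m - n1, n - n1, perm)
```

The locality win the abstract quantifies comes from steps 2 and 3: the recursion performs its updates on progressively larger panels, reusing each panel while it still fits in the fast memory of size M, instead of streaming the whole trailing matrix through memory at every step as the right-looking algorithm does.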