John Turek
IBM
Publications
Featured research published by John Turek.
IEEE Transactions on Computers | 1992
Harold S. Stone; John Turek; Joel L. Wolf
A model for studying the optimal allocation of cache memory among two or more competing processes is developed and used to show that, for the examples studied, the least recently used (LRU) replacement strategy produces cache allocations that are very close to optimal. It is also shown that when program behavior changes, LRU replacement moves quickly toward the steady-state allocation if it is far from optimal, but converges slowly as the allocation approaches the steady-state allocation. An efficient combinatorial algorithm for determining the optimal steady-state allocation, which, in theory, could be used to reduce the length of the transient, is described. The algorithm generalizes to multilevel cache memories. For multiprogrammed systems, a cache-replacement policy better than LRU replacement is given. The policy increases the memory available to the running process until the allocation reaches a threshold time beyond which the replacement policy does not increase the cache memory allocated to the running process.
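As a rough illustration of the allocation problem described in this abstract, the sketch below splits a cache among competing processes by repeatedly giving the next block to whichever process gains the largest miss reduction. It is not the paper's combinatorial algorithm: the function name `partition_cache` and the miss curves are hypothetical, and the greedy rule is only guaranteed optimal under the assumption that every miss curve is convex (diminishing returns).

```python
def partition_cache(miss_curves, total_blocks):
    """Greedily split total_blocks among processes.

    miss_curves[p][k] is the (hypothetical) miss count of process p when
    it owns k cache blocks; each curve needs total_blocks + 1 entries.
    Assumes convex curves, i.e. each extra block helps less than the last.
    """
    alloc = [0] * len(miss_curves)
    for _ in range(total_blocks):
        # Miss reduction each process would get from one more block.
        gains = [curve[a] - curve[a + 1] for curve, a in zip(miss_curves, alloc)]
        best = max(range(len(gains)), key=gains.__getitem__)
        alloc[best] += 1
    return alloc


# Two made-up convex miss curves for a 4-block cache.
curves = [
    [100, 60, 40, 30, 25],  # process 0
    [80, 70, 62, 56, 52],   # process 1
]
allocation = partition_cache(curves, 4)
total_misses = sum(c[a] for c, a in zip(curves, allocation))
print(allocation, total_misses)
```

At the returned allocation the marginal gains of the processes are roughly equal, which is the usual optimality condition for this kind of resource split.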
ACM Symposium on Parallel Algorithms and Architectures | 1992
John Turek; Joel L. Wolf; Philip S. Yu
A parallelizable task is one that can be run on an arbitrary number of processors with a running time that depends on the number of processors allotted to it. Consider a parallel system having m identical processors and n independent parallelizable tasks to be scheduled on those processors. The goal is to find (1) for each task j, an allotment of processors β_j
IEEE Computer | 1992
John Turek; Dennis E. Shasha
, and, (2) overall, a nonpreemptive schedule assigning the tasks to the processors which minimizes the makespan, or latest task completion time. This multiprocessor scheduling problem is known to be NP-complete in the strong sense. We therefore concentrate on providing a heuristic that has polynomial running time with provable worst-case bounds on the suboptimality of the solution. In particular, we give an algorithm that selects a family of (up to n(m − 1) + 1) candidate allotments of processors to tasks, thereby allowing us to use as a subroutine any algorithm A that “solves” the simpler multiprocessor scheduling problem in which the number of processors allotted to a task is fixed. Our algorithm has the property that for a large class of previously studied algorithms our extension will match the same worst-case bounds on the suboptimality of the solution while increasing the computational complexity of A by at most a factor of O(nm). As consequences we get polynomial time algorithms for the following:
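The two-level structure described here (generate a small family of candidate allotments, then hand each one to a fixed-allotment scheduler A) can be sketched as follows. Everything in the sketch is an assumption made for illustration: `list_schedule` is a simple longest-first greedy standing in for A, and the rule of giving one more processor to the currently longest task is just one way to produce at most n(m − 1) + 1 candidates, not necessarily the paper's selection rule. `times[j][p - 1]` is a hypothetical running time of task j on p processors.

```python
def list_schedule(times, allot, m):
    """Makespan of a greedy schedule for a fixed allotment (stand-in for A).

    Tasks are placed longest-first, each on the allot[j] processors that
    become free earliest; idle gaps are not backfilled.
    """
    free = [0.0] * m  # time at which each processor becomes free
    order = sorted(range(len(allot)), key=lambda j: -times[j][allot[j] - 1])
    for j in order:
        free.sort()
        b = allot[j]
        start = free[b - 1]  # all b chosen processors are free by then
        finish = start + times[j][b - 1]
        free[:b] = [finish] * b
    return max(free)


def best_allotment(times, m):
    """Try at most n*(m-1)+1 candidate allotments; keep the best makespan."""
    n = len(times)
    allot = [1] * n
    best = (list_schedule(times, allot, m), allot[:])
    while True:
        # Assumed rule: widen the task that is currently the longest.
        j = max(range(n), key=lambda k: times[k][allot[k] - 1])
        if allot[j] == m:
            break  # the longest task cannot be shortened any further
        allot[j] += 1
        best = min(best, (list_schedule(times, allot, m), allot[:]))
    return best


# Three hypothetical tasks on m = 3 processors.
times = [[8, 4, 3], [3, 2, 2], [2, 2, 2]]
makespan, allotment = best_allotment(times, 3)
print(makespan, allotment)
```

Each loop iteration raises one allotment by one processor, so the loop runs at most n(m − 1) times, which is where the candidate count comes from in this sketch.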
IEEE Parallel & Distributed Technology: Systems & Applications | 1993
Janice M. Stone; Harold S. Stone; Philip Heidelberger; John Turek
Known results regarding consensus among processors are surveyed and related to practice. The ideas embodied in the various proofs are explained. The goal is to give practitioners some sense of the system hardware and software guarantees that are required to achieve a given level of reliability and performance. The survey focuses on two categories of failures: fail-stop failures, which occur when processors fail by stopping; and Byzantine failures, which occur when processors fail by acting maliciously.
Symposium on Principles of Database Systems | 1992
John Turek; Dennis E. Shasha; Sundeep Prakash
A multiple reservation approach that allows atomic updates of multiple shared variables and simplifies concurrent and nonblocking codes for managing shared data structures such as queues and linked lists is presented. The method can be implemented as an extension to any cache protocol that grants write access to at most one processor at a time. Performance improvement, automatic restart, and livelock avoidance are discussed. Some sample programs are examined.
ACM Symposium on Parallel Algorithms and Architectures | 1994
John Turek; Walter Ludwig; Joel L. Wolf; Lisa Fleischer; Prasoon Tiwari; Jason Glasgow; Uwe Schwiegelshohn; Philip S. Yu
Nonblocking algorithms for concurrent data structures guarantee that a data structure is always accessible. This is in contrast to blocking algorithms, in which a slow or halted process can render part or all of the data structure inaccessible to other processes. This paper proposes a technique that can convert most existing lock-based blocking data structure algorithms into nonblocking algorithms with the same functionality. Our instruction-by-instruction transformation can be applied to any algorithm having the following properties: • Interprocess synchronization is established solely through the use of locks. • There is no possibility of deadlock (e.g., because of a well-ordering among the lock requests). In contrast to previous work, our transformation requires only a constant amount of overhead per operation and, in the absence of failures, it incurs no penalty in the amount of concurrency that was available in the original data structure. The techniques in this paper may obviate the need for a wholesale reinvention of techniques for nonblocking concurrent data structure algorithms.
IBM Journal of Research and Development | 1998
Vittorio Castelli; Lawrence D. Bergman; Ioannis Kontoyiannis; Chung-Sheng Li; John T. Robinson; John Turek
A parallelizable (or malleable) task is one which can be run on an arbitrary number of processors, with a task execution time that depends on the number of processors allotted to it. Consider a system of M independent parallelizable tasks which are to be scheduled without preemption on a parallel computer consisting of P identical processors. For each task, the execution time is a known function of the number of processors allotted to it. The goal is to find (1) for each task i, an allotment of processors β_i, and (2) overall, a non-preemptive schedule assigning the tasks to the processors which minimizes the average response time of the tasks. Equivalently, we can minimize the flow time, which is the sum of the completion times of the tasks. In this paper we tackle the problem of finding a schedule with minimum average response time in the special case where each task in the system has sublinear speedup. This natural restriction on the task execution time means simply that the efficiency of a task decreases or remains constant as the number of processors allotted to it increases. The scheduling problem with sublinear speedups has been shown to be NP-complete in the strong sense. We therefore focus on finding a polynomial time algorithm whose solution comes within a fixed multiplicative constant of optimal. In particular, we give an algorithm which finds a schedule having a response time that is within 2 times that of the optimal schedule and which runs in O(M(M^2 + P)) time.
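The two definitions this abstract relies on, flow time and sublinear speedup, can be made concrete with a short sketch. It assumes the usual definition of efficiency as speedup divided by the number of processors; the function names and the sample execution times are hypothetical.

```python
def efficiency(t, p):
    """Efficiency on p processors: speedup t(1)/t(p) divided by p.

    t[p - 1] is the (hypothetical) execution time on p processors.
    """
    return t[0] / (p * t[p - 1])


def has_sublinear_speedup(t):
    """True if efficiency never increases as processors are added."""
    effs = [efficiency(t, p) for p in range(1, len(t) + 1)]
    return all(a >= b for a, b in zip(effs, effs[1:]))


def flow_time(completion_times):
    """Flow time: the sum of the task completion times."""
    return sum(completion_times)


sublinear = has_sublinear_speedup([12, 6, 4.5, 4])  # efficiencies 1, 1, 0.89, 0.75
superlinear = has_sublinear_speedup([12, 5, 4])     # efficiency rises to 1.2 at p = 2
total = flow_time([3, 5, 9])
average_response = total / 3
print(sublinear, superlinear, total, average_response)
```

Because the number of tasks is fixed, minimizing the flow time and minimizing the average response time select the same schedule.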
IEEE Transactions on Parallel and Distributed Systems | 1993
Joel L. Wolf; Philip S. Yu; John Turek; Daniel M. Dias
In this paper, we describe the architecture and implementation of a framework to perform content-based search of an image database, where content is specified by the user at one or more of the following three abstraction levels: pixel, feature, and semantic. This framework incorporates a methodology that yields a computationally efficient implementation of image-processing algorithms, thus allowing the efficient extraction and manipulation of user-specified features and content during the execution of queries. The framework is well suited for searching scientific databases, such as satellite-image-, medical-, and seismic-data repositories, where the volume and diversity of the information do not allow the a priori generation of exhaustive indexes, but we have successfully demonstrated its usefulness on still-image archives.
Measurement and Modeling of Computer Systems | 1992
John Turek; Joel L. Wolf; Krishna R. Pattipati; Philip S. Yu
Presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared-nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high skew situations.
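The idea behind the scheduling phase, splitting heavily skewed partitions and then balancing the join load, can be sketched with a swapped-in heuristic. The code below is not the paper's optimization algorithm: it assumes a partition can be split into equal pieces, uses the average load per processor as the split threshold, and assigns pieces with the classic longest-processing-time (LPT) rule. The function name and partition sizes are hypothetical.

```python
import heapq
import math


def balance_partitions(sizes, num_procs):
    """Assign hash partitions to processors, splitting oversized ones.

    Returns a dict mapping processor -> list of (partition id, piece size).
    """
    target = sum(sizes) / num_procs  # ideal load per processor
    pieces = []
    for pid, size in enumerate(sizes):
        # Split a skewed partition into k equal pieces no larger than target.
        k = max(1, math.ceil(size / target))
        pieces.extend((size / k, pid) for _ in range(k))

    # LPT: hand the largest remaining piece to the least loaded processor.
    loads = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(loads)
    assignment = {p: [] for p in range(num_procs)}
    for size, pid in sorted(pieces, reverse=True):
        load, p = heapq.heappop(loads)
        assignment[p].append((pid, size))
        heapq.heappush(loads, (load + size, p))
    return assignment


# Skewed partition sizes (made up), joined on 3 processors.
sizes = [600, 300, 200, 150, 120, 100]
assignment = balance_partitions(sizes, 3)
proc_loads = {p: sum(s for _, s in parts) for p, parts in assignment.items()}
print(proc_loads)
```

Without the split, the processor holding the 600-row partition would carry at least 600 units of work, well above the average of 490; with it, the heaviest processor stays below that.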
SIAM Journal on Computing | 1999
Uwe Schwiegelshohn; Walter Ludwig; Joel L. Wolf; John Turek; Philip S. Yu
In this paper we formulate the following natural multiprocessor scheduling problem: Consider a parallel system with P processors. Suppose that there are N tasks to be scheduled on this system, and that the execution time of each task j ∈ {1,…,N} is a nonincreasing function t_j(β_j) of the number of processors β_j ∈ {1,…,P} allotted to it. The goal is to find, for each task j, an allotment of processors β_j, and, overall, a schedule assigning the tasks to the processors which minimizes the makespan, or latest task completion time. The so-called shelf strategy is commonly used for orthogonal rectangle packing, a related and classic optimization problem. The prime difference between the orthogonal rectangle problem and our own is that in our case the rectangles are, in some sense, malleable: The height of each rectangle is a nonincreasing function of its width. In this paper, we solve our multiprocessor scheduling problem exactly in the context of a shelf-based paradigm. The algorithm we give uses techniques from resource allocation theory and employs a variety of other combinatorial optimization techniques.
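To make the shelf picture concrete, the sketch below evaluates a shelf schedule for a given grouping of tasks: tasks on a shelf run side by side, the shelf height is the longest task on it, and shelves run one after another. It only evaluates a grouping supplied by the caller and does not search for the optimal one, so it is not the paper's algorithm. The names, the grouping `shelves`, and the times are hypothetical; `times[j][b - 1]` plays the role of t_j(β_j) and is assumed nonincreasing.

```python
def min_procs(t, h):
    """Smallest allotment b with t[b - 1] <= h, or None if none exists."""
    for b, time in enumerate(t, start=1):
        if time <= h:
            return b
    return None


def shelf_height(shelf, times, P):
    """Minimum height of a shelf holding the given tasks side by side.

    Only task execution times need to be tried as candidate heights.
    Returns (height, allotments) or None if the shelf cannot fit.
    """
    candidates = sorted({time for j in shelf for time in times[j]})
    for h in candidates:
        need = [min_procs(times[j], h) for j in shelf]
        if None not in need and sum(need) <= P:
            return h, need
    return None


def shelf_makespan(shelves, times, P):
    """Makespan of a shelf schedule: the sum of the shelf heights."""
    total = 0
    for shelf in shelves:
        result = shelf_height(shelf, times, P)
        if result is None:
            raise ValueError("shelf does not fit on P processors")
        total += result[0]
    return total


# Four hypothetical tasks on P = 4 processors, grouped into two shelves.
times = [[8, 4, 3, 3], [6, 3, 2, 2], [2, 2, 2, 2], [2, 1, 1, 1]]
first_shelf = shelf_height([0, 1], times, 4)
makespan = shelf_makespan([[0, 1], [2, 3]], times, 4)
print(first_shelf, makespan)
```

The malleability shows up in `min_procs`: widening a task (more processors) is what lets it fit under a lower shelf height, mirroring rectangles whose height shrinks as their width grows.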
