Peter R. Cappello

University of California, Santa Barbara

Publication

Featured research published by Peter R. Cappello.

Concurrency and Computation: Practice and Experience | 1997

Javelin: Internet-based parallel computing using Java

Bernd Oliver Christiansen; Peter R. Cappello; Mihai F. Ionescu; Michael O. Neary; Klaus E. Schauser; Daniel Wu

The JAVM (Java Astra Virtual Machine) project is about harnessing the immense computational resource available in the Internet for parallel processing. In this paper, the suitability of Java for Internet-based parallel computing is explored. Next, existing implementations of systems that make use of Java for network parallel computing are presented and categorized. A critique of these implementations follows. Based on the critique, the requirements and goals of an effective parallel computing system in the Internet environment are singled out. These serve as the blueprint for the development of the JAVM system. Its infrastructure and features, namely ease of use, heterogeneity, portability, security, fault tolerance, load balancing, scalability and accountability, are discussed. Lastly, experimental results based on the running of several parallel applications in the JAVM environment are presented. Based on the results, the kinds of parallel applications that would be well suited for running in JAVM are identified.

IEEE Transactions on Computers | 1990

Easily testable iterative logic arrays

Cheng-Wen Wu; Peter R. Cappello

Iterative logic arrays (ILAs) are studied with respect to two testing problems. First, a variety of conditions is presented. Meeting these conditions guarantees an upper bound on the size of the test set for the ILA under consideration. Second, techniques for designing optimally testable ILAs are presented. The arrays treated are, in some cases, more general than those that have been reported by other researchers: they include multidimensional and inhomogeneous arrays. Octagonally connected arrays and bilateral arrays are also discussed. The results indicate that the characteristics of the individual cell functions (e.g. whether they are bijective) are a good guide to the test complexity of the overall array. Matrix multiplication, as an example, is shown to have several different optimally testable implementations. The results are useful for combinational and pipelined arrays and for certain systolic arrays.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1984

Some complexity issues in digital signal processing

Peter R. Cappello; Kenneth Steiglitz

Over the past decade a large class of problems, called NP-complete [5], have been shown to be equivalent in the sense that if a fast algorithm can be found for one, fast algorithms can be found for all. At the same time, despite much effort, no fast algorithms have been found for any, and these problems are widely regarded as intractable. This class includes such notoriously difficult problems as the traveling salesman problem, graph coloring, and satisfiability of Boolean expressions. Using FIR filter implementation as an illustration, we describe some problems in digital signal processing that are NP-complete. These include: 1) minimize the number of additions needed to implement a fixed FIR filter; 2) minimize the number of registers needed to implement a fixed FIR filter; and 3) minimize the time to perform the additions of such an FIR filter using P adders. Large-scale instances of such problems may become important with the use of programmable chips to implement signal processing. Our main purpose in this paper is to illustrate the usefulness of asymptotic complexity theory in the field of digital signal processing. The theory discriminates between tractable and intractable problems, sometimes identifies fast algorithms for the former, and justifies heuristics for the latter.

Proceedings of the ACM 1999 conference on Java Grande | 1999

Javelin++: scalability issues in global computing

Michael O. Neary; Sean P. Brydon; Paul Kmiec; Sami Rollins; Peter R. Cappello

Javelin is a Java-based infrastructure for global computing. This paper presents Javelin++, an extension of Javelin, intended to support a much larger set of computational hosts. First, Javelin++'s switch from Java applets to Java applications is explained. Then, two scheduling schemes are presented: a probabilistic work-stealing scheduler and a deterministic scheduler. The deterministic scheduler also implements eager scheduling, as well as another fault-tolerance mechanism for hosts that have failed or retreated. A Javelin++ API is sketched, then illustrated on a raytracing application. Performance results for the two schedulers are reported, indicating that Javelin++, with its broker network, scales better than the original Javelin.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1988

Systolic architectures for vector quantization

G. Davidson; Peter R. Cappello; Allen Gersho

A family of architectural techniques is proposed which offers efficient computation of weighted Euclidean distance measures for nearest-neighbor codebook searching. The general approach uses a single metric comparator chip in conjunction with a linear array of inner product processor chips. Very high vector-quantization (VQ) throughput can be achieved for many speech and image-processing applications. Several alternative configurations allow reasonable tradeoffs between speed and VLSI chip area required.

Future Generation Computer Systems | 1999

Javelin: parallel computing on the internet

Michael O. Neary; Bernd Oliver Christiansen; Peter R. Cappello; Klaus E. Schauser

Java offers the basic infrastructure needed to integrate computers connected to the Internet into a seamless distributed computational resource: an infrastructure for running coarse-grained parallel applications on numerous, anonymous machines. First, we sketch such a resource’s essential technical properties. Then, we present a prototype of Javelin, an infrastructure for global computing. The system is based on Internet software that is interoperable, increasingly secure, and ubiquitous: Java-enabled Web technology. Ease of participation is seen as a key property for such a resource to realize the vision of a multiprocessing environment comprising thousands of computers. Javelin’s architecture and implementation require participants to have access to only a Java-enabled Web browser. Experimental results are given in the form of a Mersenne Prime application and a ray-tracing application that run on a heterogeneous network of several parallel machines, workstations, and PCs. Two key areas of current research, fault-tolerance and scalability, are subsequently explored briefly.

European Conference on Parallel Processing | 2000

Javelin 2.0: Java-Based Parallel Computing on the Internet

Michael O. Neary; Alan Phipps; Steven Richman; Peter R. Cappello

This paper presents Javelin 2.0. It presents architectural enhancements that facilitate aggregating larger sets of host processors. It then presents: a branch-and-bound computational model, the supporting architecture, a scalable task scheduler using distributed work stealing, a distributed eager scheduler implementing fault tolerance, and the results of performance experiments. Javelin 2.0 frees application developers from concerns about complex interprocessor communication and fault tolerance among Internetworked hosts. When all or part of their application can be cast as a piecework or a branch-and-bound computation, Javelin 2.0 allows developers to focus on the underlying application.

ACM Transactions on Computer Systems | 1983

A VLSI layout for a pipelined Dadda multiplier

Peter R. Cappello; Kenneth Steiglitz

Parallel counters (unary-to-binary converters) are the principal component of a Dadda multiplier. The authors specify a design first for a pipelined parallel counter, and then for a complete multiplier. As a result of its structural regularity, the layout is suitable for use in a VLSI implementation. They analyze the complexity of the resulting design using a VLSI model of computation, showing that it is optimal with respect to both its period and latency. In this sense the design compares favorably with other recent VLSI multiplier designs.

Concurrency and Computation: Practice and Experience | 2005

Advanced eager scheduling for Java-based adaptive parallel computing

Michael O. Neary; Peter R. Cappello

Javelin 3 is a software system for developing large‐scale, fault‐tolerant, adaptively parallel applications. When all or part of their application can be cast as a master–worker or branch‐and‐bound computation, Javelin 3 frees application developers from concerns about inter‐processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault‐tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves host load‐balancing in the presence of tasks whose non‐uniform computational load is evident only at execution time. Speedup measurements are presented of actual performance on up to 1000 hosts. We analyze the expected performance degradation due to unresponsive hosts, and measure actual performance degradation due to unresponsive hosts.

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1983

Completely-pipelined architectures for digital signal processing

Peter R. Cappello; Kenneth Steiglitz

A class of completely-pipelined VLSI architectures is defined. Two topologies are then described: leaf-connected trees and mesh-connected trees. The leaf-connected tree structure is used to construct a completely-pipelined bit-serial multiplier and a completely-pipelined word-serial, bit-serial convolver. The mesh-connected tree structure is used to implement completely-pipelined bit-parallel multiplication and completely-pipelined word-parallel bit-parallel convolution. Layouts are described that are within log factors of asymptotic optimality. It is shown that, asymptotically, the area required for power distribution actually dominates the rest of the area for a wide class of structures. This illustrates the importance of studying the constants of proportionality in evaluating area, time, and energy requirements, and suggests that the choice of topologies may very well depend on the fabrication technology. The importance of parameterized and high-level design is stressed throughout. Also stressed is the idea of applying sound architectural technique at all levels of information organization, including, in particular, the bit level.
