Douglas W. Clark | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Douglas W. Clark is active.

Explore More

Publication

Featured researches published by Douglas W. Clark.

IEEE Computer Graphics and Applications | 2000

Building and using a scalable display wall system

Kai Li; Han Wu Chen; Yuqun Chen; Douglas W. Clark; Perry R. Cook; Stefanos N. Damianakis; Georg Essl; Adam Finkelstein; Thomas A. Funkhouser; T. Housel; Allison W. Klein; Zhiyan Liu; Emil Praun; Jaswinder Pal Singh; B. Shedd; J. Pal; George Tzanetakis; J. Zheng

Princetons scalable display wall project explores building and using a large-format display with commodity components. The prototype system has been operational since March 1998. Our goal is to construct a collaborative space that fully exploits a large-format display system with immersive sound and natural user interfaces. Our prototype system is built with low-cost commodity components: a cluster of PCs, PC graphics accelerators, consumer video and sound equipment, and portable presentation projectors. This approach has the advantages of low cost and of tracking technology well, as high-volume commodity components typically have better price-performance ratios and improve at faster rates than special-purpose hardware. We report our early experiences in building and using the display wall system. In particular, we describe our approach to research challenges in several specific research areas, including seamless tiling, parallel rendering, parallel data visualization, parallel MPEG decoding, layered multiresolution video input, multichannel immersive sound, user interfaces, application tools, and content creation.

architectural support for programming languages and operating systems | 2004

Formal online methods for voltage/frequency control in multiple clock domain microprocessors

Qiang Wu; Philo Juang; Margaret Martonosi; Douglas W. Clark

Multiple Clock Domain (MCD) processors are a promising future alternative to todays fully synchronous designs. Dynamic Voltage and Frequency Scaling (DVFS) in an MCD processor has the extra flexibility to adjust the voltage and frequency in each domain independently. Most existing DVFS approaches are profile-based offline schemes which are mainly suitable for applications whose execution char-acteristics are constrained and repeatable. While some work has been published about online DVFS schemes, the prior approaches are typically heuristic-based. In this paper, we present an effective online DVFS scheme for an MCD processor which takes a formal analytic approach, is driven by dynamic workloads, and is suitable for all applications. In our approach, we model an MCD processor as a queue-domain network and the online DVFS as a feedback control problem with issue queue occupancies as feedback signals. A dynamic stochastic queuing model is first proposed and linearized through an accu-rate linearization technique. A controller is then designed and verified by stability analysis. Finally we evaluate our DVFS scheme through a cycle-accurate simulation with a broad set of applications selected from MediaBench and SPEC2000 benchmark suites. Compared to the best-known prior approach, which is heuristic-based, the proposed online DVFS scheme is substantially more effective due to its automatic regulation ability. For example, we have achieved a 2-3 fold increase in efficiency in terms of energy-delay product improvement. In addition, our control theoretic technique is more resilient, requires less tuning effort, and has better scalability as compared to prior online DVFS schemes.We believe that the techniques and methodology described in this paper can be generalized for energy control in processors other than MCD, such as tiled stream processors.

ieee visualization | 2000

Automatic alignment of high-resolution multi-projector display using an un-calibrated camera

Yuqun Chen; Douglas W. Clark; Adam Finkelstein; Timothy C. Housel; Kai Li

A scalable, high-resolution display may be constructed by tiling many projected images over a single display surface. One fundamental challenge for such a display is to avoid visible seams due to misalignment among the projectors. Traditional methods for avoiding seams involve sophisticated mechanical devices and expensive CRT projectors, coupled with extensive human effort for fine-tuning the projectors. The paper describes an automatic alignment method that relies on an inexpensive, uncalibrated camera to measure the relative mismatches between neighboring projectors, and then correct the projected imagery to avoid seams without significant human effort.

Communications of The ACM | 1977

An empirical study of list structure in Lisp

Douglas W. Clark; Cordell Green

Static measurements of the list structure of five large Lisp programs are reported and analyzed in this paper. These measurements reveal substantial regularity, or predictability, among pointers to atoms and especially among pointers to lists. Pointers to atoms are found to obey, roughly, Zipfs law, which governs word frequencies in natural languages; pointers to lists usually point to a location physically nearby in memory. The use of such regularities in the space-efficient representation of list structure is discussed. Linearization of lists, whereby successive cdrs (or cars) are placed in consecutive memory locations whenever possible, greatly strengthens the observed regularity of list structure. It is shown that under some reasonable assumptions, the entropy or information content of a car-cdr pair in the programs measured is about 10 to 15 bits before linearization, and about 7 to 12 bits after.

IEEE Transactions on Computers | 1999

Branch prediction, instruction-window size, and cache size: performance trade-offs and simulation techniques

Kevin Skadron; Pritpal S. Ahuja; Margaret Martonosi; Douglas W. Clark

Design parameters interact in complex ways in modern processors, especially because out-of-order issue and decoupling buffers allow latencies to be overlapped. Trade-offs among instruction-window size, branch-prediction accuracy, and instruction- and data-cache size can change as these parameters move through different domains. For example, modeling unrealistic caches can under- or overstate the benefits of better prediction or a larger instruction window. Avoiding such pitfalls requires understanding how all these parameters interact. Because such methodological mistakes are common, this paper provides a comprehensive set of SimpleScalar simulation results from SPECint95 programs, showing the interactions among these major structures. In addition to presenting this database of simulation results, major mechanisms driving the observed trade-offs are described. The paper also considers appropriate simulation techniques when sampling full-length runs with the SPEC reference inputs. In particular, the results show that branch mispredictions limit the benefits of larger instruction windows, that better branch prediction and better instruction cache behavior have synergistic effects, and that the benefits of larger instruction windows and larger data caches trade off and have overlapping effects. In addition, simulations of only 50 million instructions can yield representative results if these short windows are carefully selected.

ACM Transactions on Computer Systems | 1983

Cache Performance in the VAX-11/780

Douglas W. Clark

The performance of memory caches is usually studied through trace-driven simulation. This approach has several drawbacks. Notably, it excludes realistic multiprogramming, operating system, and I/O activity. In this paper, cache performance is studied by direct measurement of the hardware. A hardware monitor was attached to a VAX-11/780 computer, whose cache was then measured during normal use. A reproducible synthetic timesharing workload was also run. This paper reports measurements including the hit ratios of data and instruction references, the rate of cache invalidations by I/O, and the amount of waiting time due to cache misses. Additional measurements were made with half the cache disabled, and with the entire cache disabled.

international symposium on microarchitecture | 1998

Improving prediction for procedure returns with return-address-stack repair mechanisms

Kevin Skadron; Pritpal S. Ahuja; Margaret Martonosi; Douglas W. Clark

This paper evaluates several mechanisms for repairing the return-address stack after branch mispredictions. The return-address stack is a small but important structure for achieving better control-flow prediction accuracy and therefore better performance. But wrong-path execution after mispredictions frequently corrupts the return-address stack, making repair mechanisms necessary. If the processor implements multipath execution-simultaneously executing both sides of a branch-the contention among different paths makes the problem more severe. For conventional, single-path processors, this paper proposes saving both the top-of-stack pointer and the top-of-stack contents for later restoration in case of a misprediction. This simple technique achieves nearly 100% hit rates and improves performance by up to 8.7% compared to a stack with no repair mechanism. For multipath processors, providing each path with its own return-address stack completely eliminates contention, improving performance by over 25%.

high-performance computer architecture | 1997

Design issues and tradeoffs for write buffers

Kevin Skadron; Douglas W. Clark

Processors with write-through caches typically require a write buffer to hide the write latency to the next level of memory hierarchy and to reduce write traffic. A write buffer can cause processor stalls when it is full, when it contends with a cache miss for access to the next level of the hierarchy and when it contains the freshest copy of data needed by a load. This paper uses instruction level simulation of SPEC92 benchmarks to investigate how different write buffer depths, retirement policies, and load-hazard policies affect these three types of write-buffer stalls. Deeper buffers with adequate headroom, lazier retirement policies, and the ability to read data directly from the write buffer combine to substantially reduce write-buffer-induced stalls.

high-performance computer architecture | 2005

Voltage and frequency control with adaptive reaction time in multiple-clock-domain processors

Qiang Wu; Philo Juang; Margaret Martonosi; Douglas W. Clark

Dynamic voltage and frequency scaling (DVFS) is a widely used method for energy-efficient computing. In this paper, we present a new intra-task online DVFS scheme for multiple clock domain (MCD) processors. Most existing online DVFS schemes for MCD processors use a fixed time interval between possible voltage/frequency changes. The downside to this approach is that the interval boundaries are predetermined and independent of workload changes. Thus, they can be late in responding to large, severe activity swings. In this work, we propose an alternative online DVFS scheme in which the reaction time is self-tuned and adaptive to application and work-load changes. In addition to designing such a scheme, we model the proposed DVFS control and use the derived model in a formal stability analysis. The obtained analytical insight is then used to guide and improve the design in terms of stability margin and control effectiveness. We evaluate our DVFS scheme through cycle-accurate simulation over a wide set of MediaBench and SPEC2000 benchmarks. Compared to the best-known prior fixed-interval DVFS schemes for MCD processors, the proposed DVFS scheme has a simpler decision process, which leads to smaller and cheaper hardware. Our scheme has achieved significant energy savings over all studied benchmarks (19% energy savings with 3% performance degradation on average, which is close to the best results from existing fixed-interval DVFS schemes). For a group of applications with fast workload variations, our scheme outperforms existing fixed-interval DVFS schemes significantly due to its adaptive nature. Overall, we feel the proposed adaptive online DVFS scheme is an effective and promising alternative to existing fixed-interval DVFS schemes. Designers may choose the new scheme for processors with limited hardware budget, or if the anticipated work-load behavior is variable. In addition, the modeling and analysis techniques in this work serve as examples of using stability analysis in other aspects of high-performance CPU design and control.

international symposium on computer architecture | 1996

Early Experience with Message-Passing on the SHRIMP Multicomputer

Edward W. Felten; Richard D. Alpert; Angelos Bilas; Matthias A. Blumrich; Douglas W. Clark; Stefanos N. Damianakis; Cezary Dubnicki; Liviu Iftode; Kai Li

The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that a data transfer can be done without the intervention of the receiving node CPU. An important question is whether such a mechanism can indeed deliver all of the available hardware performance to applications which use conventional message-passing libraries.This paper reports our early experience with message-passing on a small, working SHRIMP multicomputer. We have implemented several user-level communication libraries on top of the VMMC mechanism, including the NX message-passing interface, Sun RPC, stream sockets, and specialized RPC. The first three are fully compatible with existing systems. Our experience shows that the VMMC mechanism supports these message-passing interfaces well. When zero-copy protocols are allowed by the semantics of the interface, VMMC can effectively deliver to applications almost all of the raw hardwares communication performance.

Explore More