D. Scott Wills | Researchain

Publication

Featured research published by D. Scott Wills.

International Symposium on Computer Architecture | 1987

Architecture of a message-driven processor

William J. Dally; Linda Chao; Andrew A. Chien; Soha Hassoun; Waldemar Horwat; Jon Kaplan; Paul Song; Brian Totty; D. Scott Wills

We propose a machine architecture for a high-performance processing node for a message-passing, MIMD concurrent computer. The principal mechanisms for attaining this goal are the direct execution and buffering of messages and a memory-based architecture that permits very fast context switches. Our architecture also includes a novel memory organization that permits both indexed and associative accesses and that incorporates an instruction buffer and message queue. Simulation results suggest that this architecture reduces message reception overhead by more than an order of magnitude.
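
The abstract's central idea, that an arriving message is executed directly instead of being received by software, can be illustrated with a toy software model. This is a minimal Python sketch under stated assumptions, not the processor's actual mechanism: the (handler_name, args) message format and the names MessageDrivenNode, write_handler, and add_handler are hypothetical, and the hardware message queue and fast context switching are reduced to a deque and a dictionary lookup.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical message format: (handler_name, args). The header selects the code
# to run, so user code never issues an explicit receive or polls for messages.
Message = Tuple[str, tuple]


class MessageDrivenNode:
    """Toy model of a node that executes messages directly from a queue."""

    def __init__(self) -> None:
        self.queue: Deque[Message] = deque()     # buffers arriving messages in order
        self.handlers: Dict[str, Callable] = {}  # handler table indexed by header
        self.memory: Dict[str, int] = {}         # node-local state

    def register(self, name: str, fn: Callable) -> None:
        self.handlers[name] = fn

    def deliver(self, msg: Message) -> None:
        # The network deposits the message in the queue; nothing is copied
        # into a user-level receive buffer.
        self.queue.append(msg)

    def run(self) -> int:
        """Dispatch every buffered message to its handler; return how many ran."""
        executed = 0
        while self.queue:
            name, args = self.queue.popleft()
            self.handlers[name](self, *args)  # the message itself names its code
            executed += 1
        return executed


def write_handler(node: MessageDrivenNode, key: str, value: int) -> None:
    node.memory[key] = value


def add_handler(node: MessageDrivenNode, key: str, delta: int) -> None:
    node.memory[key] = node.memory.get(key, 0) + delta
```

For example, delivering ("write", ("x", 5)) followed by ("add", ("x", 3)) and then calling run() executes both messages in arrival order and leaves x equal to 8.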

Computer Vision and Pattern Recognition | 2007

Multimodal Mean Adaptive Backgrounding for Embedded Real-Time Video Surveillance

Senyo Apewokin; Brian Valentine; Linda M. Wills; D. Scott Wills; Antonio Gentile

Automated video surveillance applications require accurate separation of foreground and background image content. Cost-sensitive embedded platforms place real-time performance and efficiency demands on techniques that accomplish this task. In this paper we evaluate pixel-level foreground extraction techniques for a low-cost integrated surveillance system. We introduce a new adaptive technique, multimodal mean (MM), which balances accuracy, performance, and efficiency to meet embedded system requirements. Our evaluation compares several pixel-level foreground extraction techniques in terms of their computation and storage requirements and their functional accuracy on three representative video sequences. The proposed MM algorithm delivers accuracy comparable to that of the best alternative (Mixture of Gaussians) with a 6X improvement in execution time and an 18% reduction in required storage.
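
The abstract does not spell out the MM update rule, so the following is only a simplified per-pixel sketch of a multimodal-mean-style background model, written under assumptions: each pixel keeps k cells holding running channel sums and counts, an observation matches a cell when every channel lies within eps of that cell's mean, a matched cell counts as background once its count exceeds bg_count, and all cells are halved every decay_period frames. The class name and all parameter values are hypothetical and illustrative, not those of the paper. Storing integer sums and counts is one plausible way to keep per-pixel cost low on an embedded platform.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Cell:
    sums: List[int] = field(default_factory=lambda: [0, 0, 0])  # running channel sums
    count: int = 0                                               # observations folded in

    def mean(self) -> List[float]:
        return [s / self.count for s in self.sums]


class MultimodalMeanPixel:
    """Simplified per-pixel multimodal-mean model (hypothetical parameters)."""

    def __init__(self, k: int = 4, eps: int = 30, bg_count: int = 5, decay_period: int = 100):
        self.cells = [Cell() for _ in range(k)]
        self.eps = eps                    # per-channel match tolerance
        self.bg_count = bg_count          # a matched cell above this count is background
        self.decay_period = decay_period  # frames between halvings of all cells
        self.frames = 0

    def update(self, px: RGB) -> bool:
        """Fold one observation into the model; return True if px is foreground."""
        self.frames += 1
        match = None
        for cell in self.cells:
            if cell.count and all(abs(m - p) <= self.eps for m, p in zip(cell.mean(), px)):
                match = cell
                break
        if match is None:
            # Replace the least-observed cell with a new mode seeded by this pixel.
            match = min(self.cells, key=lambda c: c.count)
            match.sums, match.count = [0, 0, 0], 0
        is_foreground = match.count <= self.bg_count  # judged before adding px
        match.sums = [s + p for s, p in zip(match.sums, px)]
        match.count += 1
        if self.frames % self.decay_period == 0:
            # Periodic decimation with cheap integer halving lets stale modes fade.
            for cell in self.cells:
                cell.sums = [s // 2 for s in cell.sums]
                cell.count //= 2
        return is_foreground
```

With these illustrative settings, a pixel that holds a constant color is reported as foreground for its first six frames and as background afterwards; a sudden new color starts a fresh mode and is reported as foreground again.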

International Conference on Parallel Architectures and Languages Europe | 1989

Universal Mechanisms for Concurrency

William J. Dally; D. Scott Wills

We propose a machine model consisting of a set of primitive mechanisms for communication, synchronization, and naming. These mechanisms have been selected as a compromise between what can easily be implemented in hardware and what is required to support parallel models of computation. Implementations of three models of parallel computation (actors, dataflow, and shared memory) using this model are sketched. The costs of the mechanisms on a particular parallel machine are presented, and issues involved in implementing the model are discussed. Identifying a primitive set of mechanisms separates issues of programming models from issues of machine organization. Problems are partitioned into those involving implementation of the primitive mechanisms and those involving implementation of programming models and systems using the mechanisms.

Languages, Compilers, and Tools for Embedded Systems | 2005

Static strands: safely collapsing dependence chains for increasing embedded power efficiency

Peter G. Sassone; D. Scott Wills; Gabriel H. Loh

Modern embedded processors are designed to maximize execution efficiency (the amount of performance achieved per unit of energy dissipated) while meeting minimum performance levels. To increase this efficiency we propose utilizing static strands: dependence chains without fan-out that are exposed by a compiler pass. These dependent instructions are reordered to be consecutive and annotated to communicate their location to the hardware. Importantly, the modified application is binary compatible with and functionally identical to the original, allowing transparent execution on a baseline processor. However, these static strands can be easily collapsed and optimized by simple processor modifications, significantly reducing the workload energy. Results show that over 30% of MediaBench and Spec2000int dynamic instructions can be collapsed, reducing issue logic energy by 16 to 24%, bypass energy by 17 to 20%, and register file energy by 13 to 14%. Additionally, by increasing the effective capacity of pipeline resources by almost a third, average IPC can be improved by up to 15%. This performance gain can then be traded for a lower clock frequency that maintains a baseline level of performance, reducing energy further.
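
To make "dependence chains without fan-out" concrete, here is a minimal sketch of how a compiler pass might find such chains within a single basic block. It is an illustration under assumptions, not the paper's pass: instructions are modeled as hypothetical (dest, sources) register tuples, a value still live at block exit counts as an extra consumer, each instruction joins at most one chain, and further restrictions a real pass would need (such as limits on strand length or on which operations can be collapsed) are omitted. The names consumers, find_strands, and LIVE_OUT are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical three-address instruction: (destination register, source registers).
Instr = Tuple[str, List[str]]
LIVE_OUT = -1  # sentinel consumer for values still needed after the block


def consumers(block: List[Instr], live_out: Set[str]) -> Dict[int, List[int]]:
    """Map each instruction index to the instructions that read its result."""
    uses: Dict[int, List[int]] = {i: [] for i in range(len(block))}
    last_def: Dict[str, int] = {}
    for j, (dest, srcs) in enumerate(block):
        for reg in set(srcs):  # reading one value twice still counts as one consumer
            if reg in last_def:
                uses[last_def[reg]].append(j)
        last_def[dest] = j     # later reads see this definition
    for reg, i in last_def.items():
        if reg in live_out:
            uses[i].append(LIVE_OUT)
    return uses


def find_strands(block: List[Instr], live_out: Set[str]) -> List[List[int]]:
    """Return chains (length >= 2) in which every link but the last has exactly
    one consumer, namely the next link: dependence chains without fan-out."""
    uses = consumers(block, live_out)
    nxt: Dict[int, int] = {}
    has_pred: Set[int] = set()
    for i, cs in uses.items():
        if len(cs) == 1 and cs[0] != LIVE_OUT and cs[0] not in has_pred:
            nxt[i] = cs[0]
            has_pred.add(cs[0])
    strands: List[List[int]] = []
    for i in range(len(block)):
        if i in has_pred or i not in nxt:
            continue           # not the head of a chain
        chain = [i]
        while chain[-1] in nxt:
            chain.append(nxt[chain[-1]])
        strands.append(chain)
    return strands
```

For a block in which instructions 0 and 2 form one chain and 1 and 3 form another, find_strands returns [[0, 2], [1, 3]]. The links are interleaved in program order, which is why the strands must be reordered to be consecutive before hardware can collapse them.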

IEEE Transactions on Computers | 2008

An Instruction Throughput Model of Superscalar Processors

Tarek M. Taha; D. Scott Wills

Advances in semiconductor technology enable larger processor design spaces, leading to increasingly complex systems. At an early stage, designers must evaluate many architecture design points to arrive at a suitable design. Currently, most architecture exploration is performed using cycle-accurate simulators. Although accurate, these tools are slow, which limits how comprehensively the design space can be searched. The vast design space of today's complex processors and time-to-market economic pressures motivate the need for faster architectural evaluation methods. This paper presents a superscalar processor performance model that enables rapid exploration of the architecture design space for superscalar processors. It supplements current design tools by quickly identifying promising areas for more thorough and time-consuming exploration with traditional tools. The model estimates the instruction throughput of a superscalar processor based on early architectural design parameters and application properties. It has been validated with the SimpleScalar out-of-order simulator. The core of the model, which executes 1.6 million times faster, produces instruction throughput estimates that are within 5.5 percent of the corresponding SimpleScalar values.

Applied Optics | 2000

Focal-plane processing architectures for real-time hyperspectral image processing

Sek M. Chai; Antonio Gentile; Wilfredo E. Lugo-Beauchamp; Javier Fonseca; José L. Cruz-Rivera; D. Scott Wills

Real-time image processing requires high computational and I/O throughputs, which can be obtained by use of optoelectronic system solutions. A novel architecture that uses focal-plane optoelectronic-area I/O with a fine-grain, low-memory, single-instruction-multiple-data (SIMD) processor array is presented as an efficient computational solution for real-time hyperspectral image processing. The architecture is evaluated by use of realistic workloads to determine data throughputs, processing demands, and storage requirements. We show that traditional store-and-process system performance is inadequate for this application domain, whereas the focal-plane SIMD architecture is capable of supporting real-time performance with sustained operation throughputs of 500-1500 gigaoperations/s. The focal-plane architecture exploits the direct coupling between sensor and parallel-processor arrays to alleviate data-bandwidth requirements, allowing computation to be performed in a stream-parallel computation model as data arrive from the sensors.

International Parallel Processing Symposium | 1999

Real-Time Image Processing on a Focal Plane SIMD Array

Antonio Gentile; José L. Cruz-Rivera; D. Scott Wills; Leugim Bustelo; José Figueroa; Javier E. Fonseca-Camacho; Wilfredo E. Lugo-Beauchamp; Ricardo Olivieri; Marlyn Quiñones-Cerpa; Alexis H. Rivera-Ríos; Iomar Vargas-Gonzáles; Michelle Viera-Vera

Real-time image processing applications have tremendous computational workloads and I/O throughput requirements. Operation in mobile, portable devices poses stringent resource limitations (size, weight, and power). The SIMD Pixel Processor (SIMPil) has been designed at Georgia Tech to address these problems. In SIMPil, an image sensor array (focal plane) is integrated on top of a SIMD computing layer, where processing elements (PEs) are connected in a torus. A prototype processing element has been implemented in 0.8 μm CMOS technology. This paper evaluates the effectiveness of the SIMPil design on a set of important image applications. A target SIMPil system is described, which is capable of operating in the Tops/sec range in Gigascale technology. Simulation results indicate sustained operation throughput in the range of 100–1000 Gops/sec. These results support the design choices and suggest that more complex, multistage applications can be implemented to execute at real-time frame rates.
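
Because the PEs are connected in a torus, every PE has a north, south, east, and west neighbor, with links wrapping around at the array edges. Below is a minimal sketch of a nearest-neighbor shift on such an array, assuming one value per PE. It illustrates torus communication in general and is not SIMPil's instruction set; the function names torus_shift and neighbor_sum are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grid[r][c] is the value held by the PE at row r, column c


def torus_shift(grid: Grid, direction: str) -> Grid:
    """Every PE receives the value held by its neighbor in `direction`,
    wrapping around at the array edges (torus links)."""
    rows, cols = len(grid), len(grid[0])
    dr, dc = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}[direction]
    return [[grid[(r + dr) % rows][(c + dc) % cols] for c in range(cols)]
            for r in range(rows)]


def neighbor_sum(grid: Grid) -> Grid:
    """SIMD-style 4-neighbor sum: after each shift, all PEs perform the same add."""
    shifted = [torus_shift(grid, d) for d in ("north", "south", "west", "east")]
    rows, cols = len(grid), len(grid[0])
    return [[sum(s[r][c] for s in shifted) for c in range(cols)] for r in range(rows)]
```

Thanks to the wraparound, corner and edge PEs run exactly the same instruction sequence as interior PEs, so no special-case code is needed at the image border.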

Journal of Parallel and Distributed Computing | 2004

The impact of grain size on the efficiency of embedded SIMD image processing architectures

Antonio Gentile; Sam Sander; Linda M. Wills; D. Scott Wills

The pixel-per-processing-element (PPE) ratio, the amount of image data directly mapped to each processing element, has a significant impact on the area and energy efficiency of embedded SIMD architectures for image processing applications. This paper quantitatively evaluates the impact of PPE ratio on system performance and efficiency for focal-plane SIMD image processing architectures by comparing throughput, area efficiency, and energy efficiency for a range of common application kernels using architectural and workload simulation. While the impact of grain size is affected by the mix of executed instructions within an application program, the most efficient PPE ratio often does not occur at PE grain size extremes (i.e., one pixel per processor or one processor per image). In this study, a set of four image processing application tasks is implemented on eight different SIMD configurations. Each configuration has a different PPE ratio and a different amount of local memory. Cycle-accurate simulation and analytical technology modeling allow assessment of execution performance, area efficiency, and energy efficiency for each configuration. Results show that the highest area and energy efficiency are achieved at PPE ratios between 16 and 256. Using these evaluation techniques (application grain size retargeting combined with area and energy technology modeling), a new class of efficient, embedded SIMD architectures for image processing can be designed.
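
The PPE ratio is simple arithmetic once a mapping is fixed. This sketch assumes an even block mapping, in which each PE holds one contiguous rectangular tile of the image; the function names and example sizes are hypothetical. For example, a 256 × 256 image on a 16 × 16 PE array gives a PPE ratio of 256, while on a 64 × 64 array it gives 16.

```python
from typing import Tuple


def ppe_ratio(img_w: int, img_h: int, pe_cols: int, pe_rows: int) -> int:
    """Pixels per processing element when the image is tiled evenly over the PE array."""
    if img_w % pe_cols or img_h % pe_rows:
        raise ValueError("image dimensions must be multiples of the PE array dimensions")
    return (img_w // pe_cols) * (img_h // pe_rows)


def owner(x: int, y: int, img_w: int, img_h: int, pe_cols: int, pe_rows: int) -> Tuple[int, int]:
    """(pe_col, pe_row) of the PE holding pixel (x, y) under the block mapping."""
    return x // (img_w // pe_cols), y // (img_h // pe_rows)
```

At one extreme, a 256 × 256 array gives one pixel per PE; at the other, a single PE holds the entire image. The ratio therefore directly sets how much local memory each PE needs and how many pixels it must process serially.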

Advanced Video and Signal Based Surveillance | 2007

Midground object detection in real world video scenes

Brian Valentine; Senyo Apewokin; Linda M. Wills; D. Scott Wills; Antonio Gentile

Traditional video scene analysis depends on accurate background modeling to identify salient foreground objects. However, in many important surveillance applications, saliency is defined by the appearance of a new non-ephemeral object that is between the foreground and background. This midground realm is defined by a temporal window following the object's appearance, but it also depends on adaptive background modeling to allow detection despite scene variations (e.g., occlusion, small illumination changes). The human visual system is ill-suited for midground detection. For example, when surveying a busy airline terminal, it is difficult (but important) to detect an unattended bag that appears in the scene. This paper introduces a midground detection technique that emphasizes computational and storage efficiency. The approach uses a new adaptive, pixel-level modeling technique derived from existing backgrounding methods. Experimental results demonstrate that this technique can accurately and efficiently identify midground objects in real-world scenes, including the PETS2006 and AVSS2007 challenge datasets.
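
One way to picture a temporal window for midground detection is a per-pixel counter of consecutive foreground frames. The sketch below is an assumption-laden illustration, not the paper's technique. It takes one pixel's foreground mask history (from any backgrounding method), flags the pixel as midground once it has stayed foreground for t_min frames, and stops flagging it at t_max frames, at which point the pixel is assumed to have merged into the background. The function name and thresholds are hypothetical.

```python
from typing import List


def midground_flags(fg_history: List[bool], t_min: int, t_max: int) -> List[bool]:
    """Per-frame midground decision for one pixel, given its foreground mask history.

    A pixel is flagged once it has been continuously foreground for at least t_min
    frames, and stays flagged until its run reaches t_max frames."""
    flags: List[bool] = []
    run = 0  # length of the current streak of consecutive foreground frames
    for is_fg in fg_history:
        run = run + 1 if is_fg else 0
        flags.append(t_min <= run < t_max)
    return flags
```

A person walking through the scene produces short foreground runs that never reach t_min. A bag left on the floor produces a long run and is flagged until it ages out.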

Parallel Computing | 1997

A 100 Mbps, LED Through-Wafer Optoelectronic Link for Multicomputer Interconnection Networks

Phil May; Myunghee Lee; S.T. Wilkinson; O. Vendier; Zhuang Ho; Steven W. Bond; D. Scott Wills; Martin A. Brooke; Nan Marie Jokerst; April S. Brown

Through-wafer optoelectronic interconnect offers some architectural alternatives that are not available with wire-based interconnects. To compete with wire-based technologies, optoelectronic interconnects must provide reasonable performance in terms of bandwidth, bit error rate (BER), and power, using inexpensive and manufacturable devices. This paper presents a 100 Mbps link design under development as part of a scalable three-dimensional multicomputer network for a 4096-node system. Empirical and analytical data for emitters, detectors, receivers, and optical coupling are used to examine the tradeoffs between link power and BER. Because multicomputer networks demand extremely low BERs (10⁻¹⁵ to 10⁻²⁰), hop-by-hop error correction circuitry is incorporated to optimize BER, providing a robust channel. This approach employs a novel adaptation of the widely used wormhole routing protocol to minimize overhead and maximize compatibility with existing interconnect techniques.
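
The demand for such low BERs can be motivated with a little arithmetic. The sketch below assumes independent bit errors; the hop count in the example is a hypothetical value, not a figure from the paper. At 100 Mbps, a BER of 10⁻¹⁵ corresponds to about one error every 10⁷ seconds (roughly 116 days) on a single link. Without correction between hops, the per-bit error probability grows roughly linearly with the number of links a message crosses.

```python
import math


def mean_seconds_between_errors(bit_rate_bps: float, ber: float) -> float:
    """Expected time between bit errors on one link: 1 / (bit rate * BER)."""
    return 1.0 / (bit_rate_bps * ber)


def path_error_probability(ber: float, hops: int) -> float:
    """Probability that a given bit is corrupted on at least one of `hops` links,
    assuming independent errors and no correction between hops.
    Computes 1 - (1 - ber)**hops via log1p/expm1 to stay accurate for tiny BERs."""
    return -math.expm1(hops * math.log1p(-ber))


SECONDS_PER_DAY = 86_400
link_mtbe = mean_seconds_between_errors(100e6, 1e-15)
print(f"{link_mtbe:.3g} s between errors (~{link_mtbe / SECONDS_PER_DAY:.0f} days)")
print(f"{path_error_probability(1e-15, 10):.3g} per-bit error probability over 10 hops")
```

The log1p/expm1 form matters here: evaluating (1 - ber)**hops directly loses most of its precision once ber approaches the spacing of double-precision numbers near 1.0.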
