Publication


Featured research published by Timothy G. Mattson.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

The 48-core SCC Processor: the Programmer's View

Timothy G. Mattson; Michael Riepen; Thomas Lehnig; Paul Brett; Patrick Kennedy; Jason Howard; Sriram R. Vangal; Nitin Borkar; Gregory Ruhl; Saurabh Dighe

The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures. The hardware has been described elsewhere. In this paper, we describe the programmer's view of this chip. In particular, we describe RCCE: the native message passing model created for the SCC processor.
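
For orientation, here is a minimal sketch of what message passing in this model looks like from the programmer's side. The function names follow the public RCCE release described in the paper, but the signatures are paraphrased and may not match every RCCE version exactly.

```c
/* A minimal sketch of a two-core exchange in the RCCE message passing
 * model described in the paper. Function names follow the public RCCE
 * release, but the exact signatures are paraphrased and may differ
 * between versions (some builds also expect the entry point to be
 * named RCCE_APP rather than main). */
#include <stdio.h>
#include "RCCE.h"

int main(int argc, char **argv)
{
    RCCE_init(&argc, &argv);

    int me   = RCCE_ue();       /* id of this unit of execution (core) */
    int npes = RCCE_num_ues();  /* number of participating cores       */
    double payload = 0.0;

    if (me == 0 && npes > 1) {
        payload = 3.14;
        RCCE_send((char *)&payload, sizeof(payload), 1);   /* to core 1   */
    } else if (me == 1) {
        RCCE_recv((char *)&payload, sizeof(payload), 0);   /* from core 0 */
        printf("core %d received %f from core 0\n", me, payload);
    }

    RCCE_finalize();
    return 0;
}
```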


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Programming the Intel 80-core network-on-a-chip terascale processor

Timothy G. Mattson; Rob F. Van der Wijngaart; Michael Frumkin

Intel's 80-core terascale processor was the first generally programmable microprocessor to break the teraflops barrier. The primary goal for the chip was to study power management and on-die communication technologies. When announced in 2007, it received a great deal of attention for running a stencil kernel at 1.0 single precision TFLOPS while using only 97 watts. The literature about the chip, however, focused on the hardware, saying little about the software environment or the kernels used to evaluate the chip. This paper completes the literature on the 80-core terascale processor by fully defining the chip's software environment. We describe the instruction set, the programming environment, the kernels written for the chip, and our experiences programming this microprocessor. We close by discussing the lessons learned from this project and what it implies for future message passing, network-on-a-chip processors.
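
The evaluation kernels for chips of this class are typically stencil computations, of which the 1.0 TFLOPS result above is one example. As a reference point only, a generic 5-point stencil sweep in C follows; this is an illustrative kernel, not the code that ran on the 80-core chip.

```c
/* Generic 5-point stencil sweep, illustrative of the class of kernel the
 * abstract refers to; it is not the actual code run on the 80-core chip. */
#include <stdio.h>

#define N 512

static float in[N][N], out[N][N];

static void stencil_sweep(void)
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j] +
                                 in[i][j - 1] + in[i][j + 1]);
}

int main(void)
{
    in[N / 2][N / 2] = 1.0f;     /* a single hot point in the interior */
    stencil_sweep();
    printf("out[%d][%d] = %f\n", N / 2 + 1, N / 2, out[N / 2 + 1][N / 2]);
    return 0;
}
```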


Parallel Computing | 1994

The Linda alternative to message-passing systems

Nicholas Carriero; David Gelernter; Timothy G. Mattson; Andrew H. Sherman

The use of distributed data structures in a logically-shared memory is a natural, readily-understood approach to parallel programming. The principal argument against such an approach for portable software has always been that efficient implementations could not scale to massively-parallel, distributed memory machines. Now, however, there is growing evidence that it is possible to develop efficient and portable implementations of virtual shared memory models on scalable architectures. In this paper we discuss one particular example: Linda. After presenting an introduction to the Linda model, we focus on the expressiveness of the model, on techniques required to build efficient implementations, and on observed performance both on workstation networks and distributed-memory parallel machines. Finally, we conclude by briefly discussing the range of applications developed with Linda and Linda's suitability for the sorts of heterogeneous, dynamically-changing computational environments that are of growing significance.
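
Linda coordinates processes through a shared tuple space accessed with a small set of operations, most notably out (deposit a tuple), in (withdraw a matching tuple), and rd (read a matching tuple without removing it). The toy, single-process C sketch below illustrates these semantics only; the implementations the paper discusses are distributed, associative over arbitrary tuples, and block on in/rd when no match exists.

```c
/* Toy, single-process illustration of Linda's out/in/rd semantics on
 * (name, value) pairs. Real Linda tuple spaces are associative over
 * arbitrary tuples, shared across processes, and blocking; this is not that. */
#include <stdio.h>
#include <string.h>

#define MAX_TUPLES 64

struct tuple { const char *name; int value; int live; };
static struct tuple space[MAX_TUPLES];

/* out: deposit a tuple into the space */
static void t_out(const char *name, int value)
{
    for (int i = 0; i < MAX_TUPLES; i++)
        if (!space[i].live) {
            space[i] = (struct tuple){ name, value, 1 };
            return;
        }
}

/* rd: read a matching tuple without removing it; returns 0 if absent */
static int t_rd(const char *name, int *value)
{
    for (int i = 0; i < MAX_TUPLES; i++)
        if (space[i].live && strcmp(space[i].name, name) == 0) {
            *value = space[i].value;
            return 1;
        }
    return 0;
}

/* in: withdraw a matching tuple; returns 0 if absent (real in() blocks) */
static int t_in(const char *name, int *value)
{
    for (int i = 0; i < MAX_TUPLES; i++)
        if (space[i].live && strcmp(space[i].name, name) == 0) {
            *value = space[i].value;
            space[i].live = 0;
            return 1;
        }
    return 0;
}

int main(void)
{
    int v;
    t_out("count", 42);             /* producer deposits a work item  */
    if (t_rd("count", &v))          /* another task can peek at it    */
        printf("rd -> %d\n", v);
    if (t_in("count", &v))          /* consumer withdraws it          */
        printf("in -> %d\n", v);
    printf("in again -> %s\n", t_in("count", &v) ? "found" : "empty");
    return 0;
}
```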


Operating Systems Review | 2011

Light-weight communications on Intel's single-chip cloud computer processor

Rob F. Van der Wijngaart; Timothy G. Mattson

Many-core chips are changing the way high-performance computing systems are built and programmed. As it is becoming increasingly difficult to maintain cache coherence across many cores, manufacturers are exploring designs that do not feature any cache coherence between cores. Communications on such chips are naturally implemented using message passing, which makes them resemble clusters, but with an important difference. Special hardware can be provided that supports very fast on-chip communications, reducing latency and increasing bandwidth. We present one such chip, the Single-Chip Cloud Computer (SCC). This is an experimental processor, created by Intel Labs. We describe two communication libraries available on SCC: RCCE and Rckmb. RCCE is a light-weight, minimal library for writing message passing parallel applications. Rckmb provides the data link layer for running network services such as TCP/IP. Both utilize the SCC's non-cache-coherent shared memory for transferring data between cores without needing to go off-chip. In this paper we describe the design and implementation of RCCE and Rckmb. To compare their performance, we consider simple benchmarks run with RCCE and with MPI over TCP/IP.
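
A ping-pong round trip is the canonical simple benchmark for this kind of latency comparison. Below is a minimal MPI version of such a test; it is an illustrative sketch, not the benchmark code used in the paper.

```c
/* Minimal MPI ping-pong latency probe between ranks 0 and 1.
 * Illustrative of the kind of simple benchmark the paper compares across
 * RCCE and MPI over TCP/IP; it is not the authors' benchmark code. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    char byte = 0;
    double t0 = MPI_Wtime();

    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("one-way latency estimate: %g us\n",
               (MPI_Wtime() - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}
```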


International Conference on Management of Data | 2015

The BigDAWG Polystore System

Jennie Duggan; Aaron J. Elmore; Michael Stonebraker; Magdalena Balazinska; Bill Howe; Jeremy Kepner; Samuel Madden; David Maier; Timothy G. Mattson; Stan Zdonik

This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages based on the observation that "no one size fits all". To address this shift, we propose a polystore architecture; it is designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.


IEEE Design & Test of Computers | 2008

The Concurrency Challenge

Wen-mei W. Hwu; Kurt Keutzer; Timothy G. Mattson

The evolutionary path of microprocessor design includes both multicore and many-core architectures. Harnessing the full computing throughput of these architectures requires concurrent or parallel execution of instructions. The authors describe the challenges facing the industry as parallel-computing platforms become even more widely available.


Proceedings of the 2010 Workshop on Parallel Programming Patterns | 2010

A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects

Kurt Keutzer; Berna L. Massingill; Timothy G. Mattson; Beverly A. Sanders

Parallel programming is stuck. To make progress, we need to step back and understand the software people wish to engineer. We do this with a design pattern language. This paper provides background for a lively discussion of this pattern language. We present the context for the problem, the layers in the design pattern language, and descriptions of the patterns themselves.


European Conference on Parallel Processing | 2000

A Pattern Language for Parallel Application Programs

Berna L. Massingill; Timothy G. Mattson; Beverly A. Sanders

A design pattern is a description of a high-quality solution to a frequently occurring problem in some domain. A pattern language is a collection of design patterns that are carefully organized to embody a design methodology. A designer is led through the pattern language, at each step choosing an appropriate pattern, until the final design is obtained in terms of a web of patterns. This paper describes a pattern language for parallel application programs. The goal of our pattern language is to lower the barrier to parallel programming by guiding a programmer through the entire process of developing a parallel program.


International Conference on Parallel Processing | 1996

A TeraFLOP supercomputer in 1996: the ASCI TFLOP system

Timothy G. Mattson; David S. Scott; Stephen R. Wheat

To maintain the integrity of the US nuclear stockpile without detonating nuclear weapons, the DOE needs the results of computer simulations that overwhelm the world's most powerful supercomputers. Responding to this need, the US Department of Energy (DOE) initiated the Accelerated Strategic Computing Initiative (ASCI). This program accelerates the development of new scalable supercomputers, resulting in a TeraFLOP computer before the end of 1996. In September 1995, DOE announced that it would work with Intel Corporation to build the ASCI TFLOP supercomputer. This system would use commodity commercial off-the-shelf (CCOTS) components to keep the price under control and would contain over 9000 Intel Pentium Pro processors. In this paper, we describe the hardware and software design of this supercomputer.


Multiprocessor System-on-Chip | 2011

The Case for Message Passing on Many-Core Chips

Rakesh Kumar; Timothy G. Mattson; Gilles Pokam; Rob F. Van der Wijngaart

The debate over shared memory vs. message passing programming models has raged for decades, with cogent arguments on both sides. In this paper, we revisit this debate for multicore chips and argue that message passing programming models are often more suitable than shared memory models for addressing the problems presented by the many-core era.

Collaboration


Timothy G. Mattson's most frequent co-authors and their affiliations.

Jeremy Kepner, Massachusetts Institute of Technology
Aydin Buluç, Lawrence Berkeley National Laboratory
Henning Meyerhenke, Karlsruhe Institute of Technology
David A. Bader, Georgia Institute of Technology
Samuel Madden, Massachusetts Institute of Technology
Antonino Tumeo, Pacific Northwest National Laboratory