Publications


Featured research published by Rebecca L. Collins.


Dependable Systems and Networks (DSN) | 2005

Small parity-check erasure codes - exploration and observations

James S. Plank; Adam L. Buchsbaum; Rebecca L. Collins; Michael G. Thomason

Erasure codes have profound uses in wide- and medium-area storage applications. While infinite-size codes have been developed with optimal properties, there remains a need to develop small codes with optimal properties. In this paper, we provide a framework for exploring very small codes, and we use this framework to derive optimal and near-optimal ones for discrete numbers of data bits and coding bits. These codes have heretofore been unknown and unpublished, and should be useful in practice. We also use our exploration to make observations about upper bounds for these codes, in order to gain a better understanding of them and to spur future derivations of larger, optimal and near-optimal codes.
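
A minimal sketch of the flavor of code the paper studies, assuming nothing beyond XOR parity: each coding block is the bitwise XOR of a subset of data blocks, and a single erased block can be recovered from any parity set that covers it. The parity layout below is illustrative, not one of the optimal codes derived in the paper.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Hypothetical parity layout: coding block j is the XOR of the data
# blocks listed in PARITY_SETS[j] (4 data blocks, 3 coding blocks).
PARITY_SETS = [(0, 1), (1, 2, 3), (0, 2, 3)]

def encode(data_blocks):
    return [xor_blocks([data_blocks[i] for i in s]) for s in PARITY_SETS]

def recover_one(lost, data_blocks, coding_blocks):
    """Recover a single erased data block from any parity set that
    contains it and whose other members survived."""
    for s, c in zip(PARITY_SETS, coding_blocks):
        if lost in s and all(data_blocks[i] is not None
                             for i in s if i != lost):
            return xor_blocks([c] + [data_blocks[i] for i in s if i != lost])
    raise ValueError("not recoverable from a single parity set")

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = encode(data)
damaged = [data[0], None, data[2], data[3]]   # data block 1 erased
assert recover_one(1, damaged, parity) == b"BBBB"
```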


Embedded Software (EMSOFT) | 2009

Flexible filters: load balancing through backpressure for stream programs

Rebecca L. Collins; Luca P. Carloni

Stream processing is a promising paradigm for programming multi-core systems for high-performance embedded applications. We propose flexible filters as a technique that combines static mapping of the stream program tasks with dynamic load balancing of their execution. The goal is to improve the system-level processing throughput of the program when it is executed on a distributed-memory multi-core system as well as the local (core-level) memory utilization. Our technique is distributed and scalable because it is based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters can be applied to stream applications that present large dynamic variations in the computational load of their tasks and the dimension of the stream data tokens. In order to demonstrate the practicality of our technique, we present performance improvements for the case study of a JPEG encoder running on the IBM Cell multi-core processor.
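
A schematic, single-threaded rendering of the flexible-filter idea, assuming a two-stage pipeline with a bounded FIFO; the names and the toy filter are mine, and the real system runs the stages on separate Cell cores with handshake signals rather than a shared queue.

```python
from collections import deque

QUEUE_CAP = 4          # bounded FIFO between stage A and bottleneck stage B

def filter_b(token):
    return token * 2   # stand-in for B's expensive computation

def run(tokens, drain_every=3):
    """Stage A forwards tokens to B but also holds a redundant copy of
    B's filter; when B's queue is full (backpressure), A runs the filter
    locally instead of stalling. Output ordering is elided here, while
    the real technique restores it."""
    b_queue, results, ran_locally = deque(), [], 0
    for step, t in enumerate(tokens):
        if len(b_queue) < QUEUE_CAP:
            b_queue.append(t)             # normal path: hand off to B
        else:
            results.append(filter_b(t))   # A "flexes": absorbs the overflow
            ran_locally += 1
        if b_queue and step % drain_every == 0:
            results.append(filter_b(b_queue.popleft()))  # B is slow to drain
    while b_queue:
        results.append(filter_b(b_queue.popleft()))
    return results, ran_locally

out, flexed = run(range(20))
print(f"{flexed} of 20 tokens were processed upstream under backpressure")
```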


Dependable Systems and Networks (DSN) | 2005

Assessing the performance of erasure codes in the wide-area

Rebecca L. Collins; James S. Plank

The problem of efficiently retrieving a file that has been broken into blocks and distributed across the wide-area pervades applications that utilize grid, peer-to-peer, and distributed file systems. While the use of erasure codes to improve the fault-tolerance and performance of wide-area file systems has been explored, there has been little work that assesses the performance and quantifies the impact of modifying various parameters. This paper performs such an assessment. We modify our previously defined framework for studying replication in the wide-area to include both Reed-Solomon and low-density parity-check (LDPC) erasure codes. We then use this framework to compare Reed-Solomon and LDPC erasure codes in three wide-area, distributed settings. We conclude that although LDPC codes have an advantage over Reed-Solomon codes in terms of decoding cost, this advantage does not always translate to the best overall performance in wide-area storage situations.
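
A back-of-envelope cost model of the trade-off the paper measures, with made-up parameters (the actual study uses real wide-area downloads): an MDS Reed-Solomon code needs exactly k blocks but pays roughly quadratic decoding cost, while an LDPC code decodes in near-linear time but must download extra blocks.

```python
def retrieval_time(k, block_download_s, decode_unit_s, ldpc_overhead=1.15):
    """Toy model: total time = download time + decode time.
    All parameters are illustrative, not measurements from the paper."""
    rs = k * block_download_s + decode_unit_s * k * k                # Reed-Solomon
    ldpc = k * ldpc_overhead * block_download_s + decode_unit_s * k  # LDPC
    return rs, ldpc

# Fast network: decoding dominates, so LDPC's cheap decode wins.
print(retrieval_time(k=100, block_download_s=0.01, decode_unit_s=0.001))
# Slow network: LDPC's extra downloads dominate, so Reed-Solomon wins.
print(retrieval_time(k=100, block_download_s=1.0, decode_unit_s=0.001))
```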


International Parallel and Distributed Processing Symposium (IPDPS) | 2004

High performance computational tools for motif discovery

Nicole Baldwin; Rebecca L. Collins; Michael A. Langston; Christopher T. Symons; Michael R. Leuze; Brynn H. Voy

Summary form only given. We highlight a fruitful interplay between biology and computation. The sequencing of complete genomes from multiple organisms has revealed that most differences in organism complexity are due to elements of gene regulation that reside in the non-protein-coding portions of genes. Both within and between species, transcription factor binding sites and the proteins that recognize them govern the activity of cellular pathways that mediate adaptive responses and survival. Experimental identification of these regulatory elements is by nature a slow process. The availability of complete genomic sequences, however, opens the door for computational methods to predict binding sites and expedite our understanding of gene regulation at a genomic level. Just as with traditional experimental approaches, the computational identification of the molecular factors that control a gene's expression level has been problematic. As a case in point, the identification of putative motifs is a challenging combinatorial task. To address it, powerful new motif-finding algorithms and high-performance implementations are described. Heavy use is made of graph algorithms, some of which are exceedingly computationally intensive and involve the use of emergent mathematical methods. An approach to fully dynamic load balancing is developed in order to make effective use of highly parallel platforms.
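
To make the combinatorial difficulty concrete, here is the generic brute-force formulation of planted-motif search (my illustration, not the paper's graph-based algorithms): every one of the 4^l candidate l-mers must be checked against every window of every sequence, which is exactly the blow-up that motivates cleverer algorithms and parallel load balancing.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def find_motifs(sequences, l, d):
    """Return all length-l strings within Hamming distance d of at least
    one l-mer in every input sequence (brute force over 4**l candidates)."""
    motifs = []
    for cand in product("ACGT", repeat=l):
        cand = "".join(cand)
        if all(any(hamming(cand, s[i:i + l]) <= d
                   for i in range(len(s) - l + 1))
               for s in sequences):
            motifs.append(cand)
    return motifs

print(find_motifs(["ACGTACGT", "CCACGTTT", "GACGAACG"], l=4, d=1))
```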


Network Computing and Applications (NCA) | 2004

Downloading replicated, wide-area files - a framework and empirical evaluation

Rebecca L. Collins; James S. Plank

The challenge of efficiently retrieving files that are broken into segments and replicated across the wide-area is of prime importance to wide-area, peer-to-peer, and grid file systems. Two differing algorithms addressing this challenge have been proposed and evaluated. While both have been successful in differing performance scenarios, there has been no unifying work that can view both algorithms under a single framework. We define such a framework, where download algorithms are defined in terms of four dimensions: the number of simultaneous downloads, the degree of work replication, the failover strategy, and the server selection algorithm. We then explore the impact of varying parameters along each of these dimensions.
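
The four dimensions translate naturally into a configuration object; the sketch below paraphrases the abstract's dimensions into hypothetical field names rather than reproducing the paper's actual framework code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DownloadAlgorithm:
    simultaneous_downloads: int      # how many block downloads run at once
    work_replication: int            # how many servers fetch the same block
    failover: str                    # what to do on failure, e.g. "retry-same"
    select_server: Callable[[List[str]], str]  # server-selection policy

# One point in the design space: aggressive parallelism, no redundant
# work, switch servers on failure, and pick the first server in a list
# that is assumed to be pre-sorted by forecast bandwidth.
greedy = DownloadAlgorithm(
    simultaneous_downloads=8,
    work_replication=1,
    failover="switch-server",
    select_server=lambda servers: servers[0],
)
```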


Design Automation Conference (DAC) | 2007

Topology-based optimization of maximal sustainable throughput in a latency-insensitive system

Rebecca L. Collins; Luca P. Carloni

We consider the problem of optimizing the performance of a latency-insensitive system (LIS) where the addition of backpressure has caused throughput degradation. Previous works have addressed the problem of LIS performance in different ways. In particular, the insertion of relay stations and the sizing of the input queues in the shells are the two main optimization techniques that have been proposed. We provide a unifying framework for this problem by outlining which approaches work for different system topologies, and highlighting counterexamples where some solutions do not work. We also observe that in the most difficult class of topologies, instances with the greatest throughput degradation are typically very amenable to simplifications. The contributions of this paper include a characterization of topologies that maintain optimal throughput with fixed-size queues and a heuristic for sizing queues that produces solutions close to optimal in a fraction of the time.
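
The analysis rests on the standard latency-insensitive throughput bound (my notation, consistent with the LIS literature the paper builds on): a feedback cycle with n shells and r relay stations can sustain at most n valid tokens every n + r clock periods, so the system's maximal sustainable throughput is the minimum such ratio over all cycles.

```python
from fractions import Fraction

def max_sustainable_throughput(cycles):
    """cycles: iterable of (n_shells, n_relay_stations) pairs, one per
    feedback cycle in the LIS topology."""
    return min(Fraction(n, n + r) for n, r in cycles)

# Hypothetical LIS with two feedback loops: inserting relay stations
# into the tighter loop is what causes the degradation studied here.
print(max_sustainable_throughput([(3, 1), (2, 2)]))  # -> 1/2
```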


Design, Automation and Test in Europe (DATE) | 2010

Recursion-driven parallel code generation for multi-core platforms

Rebecca L. Collins; Bharadwaj Vellore; Luca P. Carloni

We present Huckleberry, a tool for automatically generating parallel implementations for multi-core platforms from sequential recursive divide-and-conquer programs. The recursive programming model is a good match for parallel systems because it highlights the temporal and spatial locality of data use. Recursive algorithms are used by Huckleberry's code generator not only to automatically divide a problem up into smaller tasks, but also to derive lower-level parts of the implementation, such as data distribution and inter-core synchronization mechanisms. We apply Huckleberry to a multi-core platform based on the Cell BE processor and show how it generates parallel code for a variety of sequential benchmarks.
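
As an illustration of the underlying idea only (Huckleberry itself targets the Cell BE, not Python), the sketch below takes a sequential recursive mergesort and maps the top level of the recursion onto worker processes, with the divide and combine steps kept on the host.

```python
from multiprocessing import Pool

def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def msort(xs):                       # the sequential recursive program
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(msort(xs[:mid]), msort(xs[mid:]))

def parallel_msort(xs, n_workers=4):
    # divide once into one sub-problem per core, solve them in parallel,
    # then combine the sorted parts on the host
    chunk = max(1, len(xs) // n_workers)
    parts = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with Pool(n_workers) as pool:
        sorted_parts = pool.map(msort, parts)
    result = []
    for p in sorted_parts:
        result = merge(result, p)
    return result

if __name__ == "__main__":
    print(parallel_msort([5, 3, 8, 1, 9, 2, 7, 4]))
```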


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2008

Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems

Rebecca L. Collins; Luca P. Carloni

Latency-insensitive protocols allow system-on-chip (SoC) engineers to decouple the design of the computing cores from the design of the intercore communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS), each core is encapsulated within a shell, which is a synthesized interface module that dynamically controls its operation. At each clock period, if new data have not arrived on an input channel or if a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing (QS). We evaluate the heuristic algorithm with experiments over a large set of synthetically generated systems and with a case study of a real SoC system. We find that the topology of a LIS can impact not only how much throughput degradation will occur but also the difficulty of finding optimal QS solutions.
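
The outer loop of a queue-sizing heuristic can be sketched generically (this is my simplification; the paper's heuristic is topology-aware rather than purely greedy): repeatedly grant one extra buffer slot to whichever channel buys the most throughput, where throughput is evaluated by simulating the LIS.

```python
def greedy_queue_sizing(channels, measure_throughput, budget):
    """channels: dict mapping channel -> initial queue size.
    measure_throughput: callable evaluating a sizing, e.g. by simulation.
    budget: total number of extra slots allowed."""
    sizes = dict(channels)
    for _ in range(budget):
        base = measure_throughput(sizes)
        best, best_gain = None, 0
        for ch in sizes:
            trial = dict(sizes)
            trial[ch] += 1
            gain = measure_throughput(trial) - base
            if gain > best_gain:
                best, best_gain = ch, gain
        if best is None:      # no single extra slot improves throughput
            break
        sizes[best] += 1
    return sizes
```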


PhD Dissertation | 2011

Data-driven programming abstractions and optimization for multi-core platforms

Luca P. Carloni; Rebecca L. Collins

Multi-core platforms have spread to all corners of the computing industry, and trends in design and power indicate that the shift to multi-core will become even more widespread in the future. As the number of cores on a chip rises, the complexity of memory systems and on-chip interconnects increases drastically. The programmer inherits this complexity in the form of new responsibilities for task decomposition, synchronization, and data movement within an application, which hitherto have been concealed by complex processing pipelines or deemed unimportant since tasks were largely executed sequentially. To some extent, the need for explicit parallel programming is inevitable, due to limits in the instruction-level parallelism that can be automatically extracted from a program. However, these challenges create a great opportunity for the development of new programming abstractions which hide the low-level architectural complexity while exposing intuitive high-level mechanisms for expressing parallelism.

Many models of parallel programming fall into the category of data-centric models, where the structure of an application depends on the role of data and communication in the relationships between tasks. The utilization of the inter-core communication networks and effective scaling to large data sets are decidedly important in designing efficient implementations of parallel applications. The questions of how many low-level architectural details should be exposed to the programmer, and how much parallelism in an application a programmer should expose to the compiler, remain open-ended, with different answers depending on the architecture and the application in question. I propose that the key to unlocking the capabilities of multi-core platforms is the development of abstractions and optimizations which match the patterns of data movement in applications with the inter-core communication capabilities of the platforms.

After a comparative analysis that confirms and stresses the importance of finding a good match between the programming abstraction, the application, and the architecture, this dissertation proposes two techniques that showcase the power of leveraging data dependency patterns in parallel performance optimizations. Flexible Filters dynamically balance load in stream programs by creating flexibility in the runtime data flow through the addition of redundant stream filters. This technique combines a static mapping with dynamic flow control to achieve lightweight, distributed, and scalable throughput optimization. The properties of stream communication, i.e., FIFO pipes, enable flexible filters by exposing the backpressure dependencies between tasks. Next, I present Huckleberry, a novel recursive programming abstraction developed to allow programmers to expose data locality in divide-and-conquer algorithms at a high level of abstraction. Huckleberry automatically converts sequential recursive functions with explicit data partitioning into parallel implementations that can be ported across changes in the underlying architecture, including the number of cores and the amount of on-chip memory.

I then present a performance model for multi-core applications which provides an efficient means to evaluate the trade-offs between the computational and communication requirements of applications together with the hardware resources of a target multi-core architecture. The model encompasses all data-driven abstractions that can be reduced to a task graph representation and is extensible to performance techniques such as Flexible Filters that alter an application's original task graph. Flexible Filters and Huckleberry address the challenges of parallel programming on multi-core architectures by taking advantage of properties specific to the stream and recursive paradigms, and the performance model creates a unifying framework based on the communication between tasks in parallel applications. Combined, these contributions demonstrate that specialization with respect to communication patterns enhances the ability of parallel programming abstractions and optimizations to harvest the power of multi-core platforms.
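
A minimal rendering of the task-graph performance model described above, assuming nothing beyond the abstract's description (names and cost structure are mine): tasks carry computation times, edges carry data volumes, and an edge only costs communication time when its endpoints are mapped to different cores.

```python
def makespan(tasks, edges, mapping, comm_cost_per_byte):
    """tasks: {name: compute_time}; edges: {(src, dst): bytes};
    mapping: {name: core id}. Returns completion time of the task graph."""
    finish, remaining = {}, dict(tasks)
    while remaining:
        for t in list(remaining):
            preds = [s for (s, d) in edges if d == t]
            if all(s in finish for s in preds):
                ready = max((finish[s] + (edges[(s, t)] * comm_cost_per_byte
                                          if mapping[s] != mapping[t] else 0)
                             for s in preds), default=0.0)
                finish[t] = ready + remaining.pop(t)
    return max(finish.values())

# A fork-join graph: placing work_b on a second core overlaps the two
# 4-second tasks at the price of two 1-second transfers.
tasks = {"split": 1.0, "work_a": 4.0, "work_b": 4.0, "join": 1.0}
edges = {("split", "work_a"): 1000, ("split", "work_b"): 1000,
         ("work_a", "join"): 1000, ("work_b", "join"): 1000}
mapping = {"split": 0, "work_a": 0, "work_b": 1, "join": 0}
print(makespan(tasks, edges, mapping, comm_cost_per_byte=0.001))  # 8.0
```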


ACM Transactions on Embedded Computing Systems | 2013

Flexible filters in stream programs

Rebecca L. Collins; Luca P. Carloni

The stream-processing model is a natural fit for multicore systems because it exposes the inherent locality and concurrency of a program and highlights its separable tasks for efficient parallel implementations. We present flexible filters, a load-balancing optimization technique for stream programs. Flexible filters utilize the programmability of the cores in order to improve the data-processing throughput of individual bottleneck tasks by “borrowing” resources from neighbors in the stream. Our technique is distributed and scalable because all runtime load-balancing decisions are based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters increases the system-level processing throughput of stream applications, particularly those with large dynamic variations in the computational load of their tasks. We empirically evaluate flexible filters in a homogeneous multicore environment over a suite of five real-world stream programs.

Collaboration


Top co-authors of Rebecca L. Collins:

Christopher T. Symons (Oak Ridge National Laboratory)
Brynn H. Voy (University of Tennessee)
Michael R. Leuze (Oak Ridge National Laboratory)