Publications

Featured research published by Christopher Moretti.


International Parallel and Distributed Processing Symposium | 2008

All-pairs: An abstraction for data-intensive cloud computing

Christopher Moretti; Jared Bulosan; Douglas Thain; Patrick J. Flynn

Although modern parallel and distributed computing systems provide easy access to large amounts of computing power, it is not always easy for non-expert users to harness these large systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we propose that production systems should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, all-pairs, that fits the needs of several data-intensive scientific applications. We demonstrate that an optimized all-pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and runs twice as fast as a hand-optimized conventional approach.
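The abstraction itself is compact: given two sets A and B and a comparison function F, All-Pairs yields the matrix M with M[i][j] = F(A[i], B[j]); the system, rather than the user, chooses the partitioning and data placement. A minimal Python sketch of the interface, assuming an illustrative numeric comparison function and a local process pool as a stand-in for the grid:

    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def compare(pair):
        # Illustrative F: stand-in for, e.g., a biometric matcher that
        # scores two iris templates against each other.
        a, b = pair
        return abs(a - b)

    def all_pairs(A, B, workers=4):
        # All-Pairs(A, B, F) = matrix M with M[i][j] = F(A[i], B[j]).
        # Here the "grid" is a local process pool; the real system
        # instead partitions the matrix across campus grid nodes.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            flat = list(pool.map(compare, product(A, B)))
        n = len(B)
        return [flat[i * n:(i + 1) * n] for i in range(len(A))]

    if __name__ == "__main__":
        print(all_pairs([1, 2, 3], [10, 20], workers=2))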


IEEE Transactions on Parallel and Distributed Systems | 2010

All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids

Christopher Moretti; Hoang Bui; Karen Hollingsworth; Brandon Rich; Patrick J. Flynn; Douglas Thain

Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for nonexpert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.


Journal of Grid Computing | 2009

Chirp: a practical global filesystem for cluster and Grid computing

Douglas Thain; Christopher Moretti; Jeffrey Hemmes

Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area Grid computing environments. To address this problem, we have built the Chirp distributed filesystem, designed from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges, and provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide area networks. We describe three applications in bioinformatics, biometrics, and gamma ray physics that each employ Chirp to attack large-scale data-intensive problems.
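One practical consequence of Chirp's unprivileged deployment: when interposed through the companion Parrot virtual filesystem, a Chirp server's files appear under a /chirp/<host>/ path, so unmodified applications can use ordinary POSIX I/O. A hedged sketch (the hostname and file path are hypothetical), assuming the script is launched under parrot_run:

    # Run as: parrot_run python3 read_remote.py
    # Parrot maps Chirp servers into the local namespace under
    # /chirp/<host>/, so a plain open() reaches the remote file.
    # Host and path below are made up for illustration.
    with open("/chirp/files.example.edu/bio/sequences.fasta") as f:
        for line in f:
            if line.startswith(">"):
                print(line.strip())  # print FASTA sequence headers only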


International Conference on Data Mining | 2008

Scaling up Classifiers to Cloud Computers

Christopher Moretti; Karsten Steinhaeuser; Douglas Thain; Nitesh V. Chawla

As the size of available datasets has grown from megabytes to gigabytes and now into terabytes, machine learning algorithms and computing infrastructures have continuously evolved in an effort to keep pace. But at large scales, mining for useful patterns still presents challenges in terms of data management as well as computation. These issues can be addressed by dividing both data and computation to build ensembles of classifiers in a distributed fashion, but trade-offs in cost, performance, and accuracy must be considered when designing or selecting an appropriate architecture. In this paper, we present an abstraction for scalable data mining that allows us to explore these trade-offs. Data and computation are distributed to a computing cloud with minimal effort from the user, and multiple models for data management are available depending on the workload and system configuration. We demonstrate the performance and scalability characteristics of our ensembles using a wide variety of datasets and algorithms on a Condor-based pool, with Chirp handling storage.
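The core pattern, partition the data, train one classifier per partition, then combine by voting, can be sketched briefly. A simplified local version (scikit-learn decision trees and the partition count are illustrative choices, not the paper's setup; training runs locally here rather than being shipped to Condor nodes):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_partitioned_ensemble(X, y, n_parts=8):
        # One model per data partition; in the distributed setting each
        # partition (and its training job) would be placed on a grid node.
        models = []
        for Xi, yi in zip(np.array_split(X, n_parts), np.array_split(y, n_parts)):
            models.append(DecisionTreeClassifier().fit(Xi, yi))
        return models

    def predict_vote(models, X):
        # Majority vote across per-partition models (assumes integer labels).
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)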


Grid Computing | 2007

Efficient access to many small files in a filesystem for grid computing

Douglas Thain; Christopher Moretti

Many potential users of grid computing systems have a need to manage large numbers of small files. However, computing and storage grids are generally optimized for the management of large files. As a result, users with small files achieve performance several orders of magnitude worse than possible. Archival tools and custom storage structures can be used to improve small-file performance, but this requires the end user to change the behavior of the application, which is not always practical. To address this problem, we augment the protocol of the Chirp filesystem for grid computing to improve small-file performance. We describe in detail how this protocol compares to FTP and NFS, which are widely used in similar situations. In addition, we observe that changes to the system call interface are necessary to invoke the protocol properly. We demonstrate an order-of-magnitude performance improvement over existing protocols for copying files and manipulating large directory trees.
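The order-of-magnitude claim follows from round-trip arithmetic: an unbatched protocol pays one network round trip per file, while a batched or recursive transfer pays roughly one round trip total. A back-of-the-envelope model, with illustrative numbers:

    def transfer_time(n_files, file_bytes, rtt_s, bw_Bps, batched):
        # Unbatched: one round trip per file, plus the raw data time.
        # Batched: roughly one round trip total, plus the same data time.
        data_s = n_files * file_bytes / bw_Bps
        return (rtt_s if batched else n_files * rtt_s) + data_s

    # 10,000 files of 4 KB over a 100 ms WAN link at 10 MB/s:
    print(transfer_time(10_000, 4096, 0.1, 10e6, batched=False))  # ~1004 s
    print(transfer_time(10_000, 4096, 0.1, 10e6, batched=True))   # ~4.2 s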


Wireless Algorithms, Systems, and Applications | 2007

Lessons Learned Building TeamTrak: An Urban/Outdoor Mobile Testbed

Jeffrey Hemmes; Douglas Thain; Christian Poellabauer; Christopher Moretti; Phil Snowberger; Brendan McNutt

Much research in mobile networks relies on the use of simulations for evaluation purposes. While a number of powerful simulation tools have been developed for this purpose, only recently has the need for physical implementations of mobile systems and applications been widely accepted in the literature. In recognition of this need, and to further our research objectives in the area of wireless sensor networks and mobile cooperative systems, we have built the TeamTrak mobile testbed, which gives us real-world experience with research concepts as we develop them. Additionally, results from outdoor field tests are used to further enhance the capabilities of the testbed itself.


IEEE Transactions on Parallel and Distributed Systems | 2012

A Framework for Scalable Genome Assembly on Clusters, Clouds, and Grids

Christopher Moretti; Andrew Thrasher; Li Yu; Michael Olson; Scott J. Emrich; Douglas Thain

Bioinformatics researchers need efficient means to process large collections of genomic sequence data. One application of interest, genome assembly, has great potential for parallelization; however, most previous attempts at parallelization require uncommon high-end hardware. This paper introduces the Scalable Assembler at Notre Dame (SAND) framework that can achieve significant speedup using large numbers of commodity machines harnessed from clusters, clouds, and grids. SAND interfaces with the Celera open-source assembly toolkit, replacing two independent sequential modules with scalable parallel alternatives: the candidate selector exploits distributed memory capacity, and the sequence aligner exploits distributed computing capacity. For large problems, these modules provide robust task and data management while also achieving speedup with high efficiency. We show results for several data sets ranging from 738 thousand to over 320 million alignments using resources ranging from a small cluster to more than a thousand nodes spanning three institutions.
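The candidate selector's job can be pictured as k-mer indexing: only read pairs that share at least one k-mer are plausible overlaps, which shrinks the all-vs-all alignment matrix to a sparse candidate list that the aligner then processes as independent parallel tasks. A simplified, single-machine sketch of that filtering step (the k value and data layout are illustrative, not SAND's distributed implementation):

    from collections import defaultdict
    from itertools import combinations

    def candidate_pairs(reads, k=11):
        # Index every k-mer to the reads containing it, then emit each
        # pair of reads sharing a k-mer as an alignment candidate.
        index = defaultdict(set)
        for rid, seq in reads.items():
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(rid)
        pairs = set()
        for rids in index.values():
            pairs.update(combinations(sorted(rids), 2))
        return pairs  # each pair becomes one independent alignment task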


High Performance Distributed Computing | 2009

Harnessing parallelism in multicore clusters with the all-pairs and wavefront abstractions

Li Yu; Christopher Moretti; Scott J. Emrich; Kenneth L. Judd; Douglas Thain

Both distributed systems and multicore computers are difficult programming environments. Although the expert programmer may be able to tune distributed and multicore computers to achieve high performance, the non-expert may struggle to achieve a program that even functions correctly. We argue that high-level abstractions are an effective way of making parallel computing accessible to the non-expert. An abstraction is a regularly structured framework into which a user may plug in simple sequential programs to create very large parallel programs. By virtue of a regular structure and declarative specification, abstractions may be materialized on distributed, multicore, and distributed multicore systems with robust performance across a wide range of problem sizes. In previous work, we presented the All-Pairs abstraction for computing on distributed systems of single CPUs. In this paper, we extend All-Pairs to multicore systems, and introduce Wavefront, which represents a number of problems in economics and bioinformatics. We demonstrate good scaling of both abstractions up to 32 cores on one machine and hundreds of cores in a distributed system.
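Wavefront has an equally compact statement: fill a matrix W where W[i][j] = F(W[i-1][j], W[i][j-1], W[i-1][j-1]), given a boundary row and column; every cell on the same anti-diagonal is independent, and that is the parallelism the abstraction exploits. A sequential Python sketch of the recurrence (F and the boundary data are placeholders; a real run would dispatch each anti-diagonal's cells concurrently):

    def wavefront(n, F, row0, col0):
        # Boundaries: W[0][j] = row0[j] and W[i][0] = col0[i]
        # (row0[0] and col0[0] must agree). Interior cells are filled
        # anti-diagonal by anti-diagonal; within one diagonal the cells
        # are independent and could run in parallel.
        W = [[None] * n for _ in range(n)]
        W[0] = list(row0)
        for i in range(1, n):
            W[i][0] = col0[i]
        for d in range(2, 2 * n - 1):              # d = i + j
            for i in range(max(1, d - n + 1), min(n, d)):
                j = d - i
                W[i][j] = F(W[i - 1][j], W[i][j - 1], W[i - 1][j - 1])
        return W

    # Example: an LCS-style combine function (purely illustrative).
    result = wavefront(4, lambda up, left, diag: max(up, left, diag) + 1,
                       row0=[0, 0, 0, 0], col0=[0, 0, 0, 0])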


Many-Task Computing on Grids and Supercomputers | 2009

Highly scalable genome assembly on campus grids

Christopher Moretti; Michael Olson; Scott J. Emrich; Douglas Thain

Bioinformatics researchers need efficient means to process large collections of sequence data. One application of interest, genome assembly, has great potential for parallelization; however, most previous attempts at parallelization require uncommon high-end hardware. This paper introduces a scalable modular genome assembler that can achieve significant speedup using large numbers of conventional desktop machines, such as those found in a campus computing grid. The system is based on the Celera open-source assembly toolkit, and replaces two independent sequential modules with scalable replacements: a scalable candidate selector exploits the distributed memory capacity of a campus grid, while the scalable aligner exploits the distributed computing capacity. For large problems, these modules provide robust task and data management while also achieving speedup with high efficiency on several scales of resources. We show results for several datasets ranging from 738 thousand to over 121 million alignments using campus grid resources ranging from a small cluster to more than a thousand nodes spanning three institutions. Our largest run so far achieves a 927x speedup at 71.3 percent efficiency.


International Parallel and Distributed Processing Symposium | 2007

Challenges in Executing Data Intensive Biometric Workloads on a Desktop Grid

Christopher Moretti; Timothy C. Faltemier; Douglas Thain; Patrick J. Flynn

Desktop grids have traditionally focused on executing computation-intensive workloads. Can they also be used to execute data-intensive workloads? To answer this question, we present a case study of a data-intensive biometric application that is infeasible to process on a single machine. We evaluate the capacity of a desktop grid to store and deliver the data needed to execute the workload, and compare several general techniques for data deployment. Selecting the most scalable technique, we execute and evaluate five large production workloads on a 350-CPU desktop grid. We observe that this technique is sensitive to many parameters, and propose that an ideal system should be responsible for choosing the proper decomposition of a workload.

Collaboration


Dive into Christopher Moretti's collaborations.

Top Co-Authors

Douglas Thain, University of Notre Dame
Ian T. Foster, Argonne National Laboratory
Ioan Raicu, Illinois Institute of Technology
Jeffrey Hemmes, University of Notre Dame
Li Yu, University of Notre Dame
Philip Little, University of Notre Dame
Yong Zhao, University of Electronic Science and Technology of China