Ivo Jimenez
University of California, Santa Cruz
Publications
Featured research published by Ivo Jimenez.
International Parallel and Distributed Processing Symposium | 2017
Ivo Jimenez; Michael A. Sevilla; Noah Watkins; Carlos Maltzahn; Jay F. Lofstead; Kathryn Mohror; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau
Independent validation of experimental results in the field of systems research is a challenging task, mainly due to differences in software and hardware in computational environments. Recreating an environment that resembles the original is difficult and time-consuming. In this paper, we introduce _Popper_, a convention based on a set of modern open source software (OSS) development principles for generating reproducible scientific publications. Concretely, we make the case for treating an article as an OSS project following a DevOps approach and applying software engineering best practices to manage its associated artifacts and maintain the reproducibility of its findings. Popper leverages existing cloud-computing infrastructure and DevOps tools to produce academic articles that are easy to validate and extend. We present a use case that illustrates the usefulness of this approach. We show how, by following the _Popper_ convention, reviewers and researchers can quickly reach the point of obtaining results without relying on the original authors' intervention.
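A minimal sketch of the idea behind the convention, assuming a repository layout in which each experiment keeps its stages as plain scripts under version control; the stage names and paths below are illustrative placeholders, not the actual Popper tooling.

```python
# Hypothetical sketch of a Popper-style experiment pipeline: each stage is a
# versioned script, and a small driver runs the stages in order so a reviewer
# can re-execute the whole experiment with a single command. Stage names and
# directory layout are assumptions for illustration only.
import subprocess
import sys
from pathlib import Path

STAGES = ["setup.sh", "run.sh", "post-process.sh", "validate.sh"]

def run_pipeline(experiment_dir: str) -> bool:
    exp = Path(experiment_dir)
    for stage in STAGES:
        script = exp / stage
        if not script.exists():
            continue  # optional stages may be absent
        print(f"==> {stage}")
        result = subprocess.run(["bash", str(script)], cwd=exp)
        if result.returncode != 0:
            print(f"stage {stage} failed", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    ok = run_pipeline(sys.argv[1] if len(sys.argv) > 1 else "experiments/my-experiment")
    sys.exit(0 if ok else 1)
```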
IEEE International Conference on High Performance Computing, Data and Analytics | 2016
Jay F. Lofstead; Ivo Jimenez; Carlos Maltzahn; Quincey Koziol; John M. Bent; Eric Barton
The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project is going to have significant impact on storage systems design within and beyond the HPC community. With phase two of the project starting, it is an excellent opportunity to explore the complete design and how it will address the needs of extreme scale platforms. This paper examines each layer of the proposed stack in some detail along with cross-cutting topics, such as transactions and metadata management. This paper not only provides a timely summary of important aspects of the design specifications but also captures the underlying reasoning that is not available elsewhere. We encourage the broader community to understand the design, intent, and future directions to foster discussion guiding phase two and the ultimate production storage stack based on this work. An initial performance evaluation of the early prototype implementation is also provided to validate the presented design.
Bulletin of the American Meteorological Society | 2017
Joshua P. Hacker; John Exby; David O. Gill; Ivo Jimenez; Carlos Maltzahn; Timothy See; Gretchen L. Mullendore; Kathryn R. Fossell
Numerical weather prediction (NWP) experiments can be complex and time-consuming; results depend on computational environments and numerous input parameters. Delays in learning and obtaining research results are inevitable. Students face disproportionate effort in the classroom or beginning graduate-level NWP research. Published NWP research is generally not reproducible, introducing uncertainty and slowing efforts that build on past results. This work exploits the rapid emergence of software container technology to produce a transformative research and education environment. The Weather Research and Forecasting (WRF) Model anchors a set of linked Linux-based containers, which include software to initialize and run the model, to analyze results, and to serve output to collaborators. The containers are demonstrated with a WRF simulation of Hurricane Sandy. The demonstration illustrates the following: 1) how the often-difficult exercise in compiling the WRF and its many dependencies is eliminated, 2...
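A sketch of how a containerized model run like the ones described here might be launched from a script; the image name, volume paths, and case directory are hypothetical placeholders, not the images published with this work.

```python
# Minimal sketch of launching a containerized model run via Docker.
# The image name and mount paths are illustrative assumptions.
import subprocess

def run_containerized_model(image: str, case_dir: str, output_dir: str) -> None:
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{case_dir}:/case:ro",   # input/initialization data
        "-v", f"{output_dir}:/output",  # model output served to collaborators
        image,                          # the image's entrypoint runs the model
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_containerized_model("example/wrf-sandy:latest", "./sandy-case", "./wrf-output")
```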
International Conference on Parallel Processing | 2014
Jay F. Lofstead; Ivo Jimenez; Carlos Maltzahn
The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project is going to have significant impact on storage systems design within and beyond the HPC community. With phase 1 of the project complete, it is an excellent opportunity to evaluate many of the decisions made to feed into the phase 2 effort. With this paper we not only provide a timely summary of important aspects of the design specifications but also capture the underlying reasoning that is not available elsewhere. The initial effort to define a next generation storage system has made admirable contributions in architecture and design. Formalizing the general idea of data staging into burst buffers for the storage system will help manage the performance variability and offer additional data processing opportunities outside the main compute and storage system. Adding a transactional mechanism to manage faults and data visibility helps enable effective analytics without having to work around the IO stack semantics. While these and other contributions are valuable, similar efforts made elsewhere may offer attractive alternatives or differing semantics that could yield a more feature-rich environment with little to no additional overhead. For example, the Doubly Distributed Transactions (D2T) protocol offers an alternative approach for incorporating transactional semantics into the data path. Another project, PreDatA, examined how to get the best throughput for data operators and may offer additional insights into further refinements of the burst buffer concept. This paper examines some of the choices made by the Fast Forward team, compares them with other options, and offers observations and suggestions based on these other efforts. This includes some non-core contributions of other projects, such as the demonstration metadata and data storage components generated while implementing D2T, to make suggestions that may inform the next-generation design of the IO stack as a whole.
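A toy model of the burst-buffer staging idea mentioned above: writers append to a fast intermediate buffer and continue immediately, while the buffer drains asynchronously to slower persistent storage. This is an illustrative sketch of the general concept, not the Fast Forward design itself.

```python
# Toy burst-buffer model: the fast path only pays the cost of staging, and a
# background thread drains chunks to the (slower) persistent store.
import queue
import threading
import time

class BurstBuffer:
    def __init__(self, persistent_store: list):
        self._staged = queue.Queue()
        self._store = persistent_store
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, chunk: bytes) -> None:
        self._staged.put(chunk)       # fast path: stage and return

    def _drain(self) -> None:
        while True:
            chunk = self._staged.get()
            time.sleep(0.01)          # stand-in for persistent-storage latency
            self._store.append(chunk)
            self._staged.task_done()

    def flush(self) -> None:
        self._staged.join()           # wait until everything staged is persisted

if __name__ == "__main__":
    store: list = []
    bb = BurstBuffer(store)
    for i in range(100):
        bb.write(f"chunk-{i}".encode())
    bb.flush()
    print(f"persisted {len(store)} chunks")
```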
European Conference on Computer Systems | 2017
Michael A. Sevilla; Noah Watkins; Ivo Jimenez; Peter Alvaro; Shel Finkelstein; Jeff LeFevre; Carlos Maltzahn
Storage systems need to deliver high performance for special-purpose data processing applications that run on an evolving storage device technology landscape. This puts tremendous pressure on storage systems to support rapid change both in terms of their interfaces and their performance. But adapting storage systems can be difficult because unprincipled changes might jeopardize years of code-hardening and performance optimization efforts that were necessary for users to entrust their data to the storage system. We introduce the programmable storage approach, which exposes internal services and abstractions of the storage stack as building blocks for higher-level services. We also build a prototype to explore how existing abstractions of common storage system services can be leveraged to adapt to the needs of new data processing systems and the increasing variety of storage devices. We illustrate the advantages and challenges of this approach by composing existing internal abstractions into two new higher-level services: a file system metadata load balancer and a high-performance distributed shared-log. The evaluation demonstrates that our services inherit desirable qualities of the back-end storage system, including the ability to balance load, efficiently propagate service metadata, recover from failure, and navigate trade-offs between latency and throughput using leases.
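A simplified sketch of the composition idea: a higher-level service (a tiny shared log) is built on top of an existing lower-level storage abstraction rather than reimplemented from scratch. All interfaces here are hypothetical stand-ins, not the prototype's actual APIs.

```python
# Illustrative programmable-storage composition: a shared-log service layered
# on a lower-level object-store abstraction. Interfaces are assumptions.
from typing import Dict, Optional

class ObjectStore:
    """Stand-in for an internal storage-system abstraction (e.g., objects)."""
    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> Optional[bytes]:
        return self._objects.get(name)

class SharedLog:
    """Higher-level service composed on top of the object interface."""
    def __init__(self, store: ObjectStore, name: str) -> None:
        self._store = store
        self._name = name
        self._tail = 0    # in a real system the tail would be shared state

    def append(self, entry: bytes) -> int:
        pos = self._tail
        self._store.put(f"{self._name}.{pos}", entry)
        self._tail += 1
        return pos

    def read(self, pos: int) -> Optional[bytes]:
        return self._store.get(f"{self._name}.{pos}")

if __name__ == "__main__":
    log = SharedLog(ObjectStore(), "mylog")
    log.append(b"first")
    log.append(b"second")
    print(log.read(0), log.read(1))
```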
IEEE International Conference on Cloud Engineering | 2015
Ivo Jimenez; Carlos Maltzahn; Adam Moody; Kathryn Mohror; Jay F. Lofstead; Remzi H. Arpaci-Dusseau; Andrea C. Arpaci-Dusseau
Evaluating experimental results in the field of computer systems is a challenging task, mainly due to the many changes in software and hardware that computational environments go through. In this position paper, we analyze salient features of container technology that, if leveraged correctly, can help reduce the complexity of reproducing experiments in systems research. We present a use case in the area of distributed storage systems to illustrate the extensions that we envision, mainly in terms of container management infrastructure. We also discuss the benefits and limitations of using containers as a way of reproducing research in other areas of experimental systems research.
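One practical benefit of the container features discussed here can be sketched as follows: pinning an experiment to an exact image digest and recording it alongside the results, so a later rerun uses the same software environment. The image name and file layout below are hypothetical.

```python
# Sketch: record the exact container image digest next to experiment results
# so the software environment can be reproduced later. Requires a pulled image.
import json
import subprocess

def image_digest(image: str) -> str:
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", image],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()

def record_environment(image: str, results_file: str) -> None:
    metadata = {"image": image, "digest": image_digest(image)}
    with open(results_file + ".env.json", "w") as f:
        json.dump(metadata, f, indent=2)

if __name__ == "__main__":
    record_environment("example/experiment:latest", "results.csv")
```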
Petascale Data Storage Workshop | 2013
Jay F. Lofstead; Jai Dayal; Ivo Jimenez; Carlos Maltzahn
The rise of Integrated Application Workflows (IAWs) for processing data prior to storage on persistent media prompts the need to incorporate features that reproduce many of the semantics of persistent storage devices. One such feature is the ability to manage data sets as chunks with natural barriers between different data sets. Towards that end, we need a mechanism to ensure that data moved to an intermediate storage area is both complete and correct before allowing access by other processing components. The Doubly Distributed Transactions (D2T) protocol offers such a mechanism. The initial development [9] suffered from scalability limitations and undue requirements on server processes. The current version has addressed these limitations and has demonstrated scalability with low overhead.
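A toy illustration of the visibility guarantee described above: a data set staged by many writers becomes readable only after every participant reports that its portion is complete. This is a simplified single-process stand-in, not the actual (doubly distributed) D2T protocol.

```python
# Simplified stand-in for the D2T guarantee: commit (make visible) only after
# all writers of a data set report completion.
class StagingTransaction:
    def __init__(self, dataset: str, num_writers: int) -> None:
        self.dataset = dataset
        self._expected = num_writers
        self._completed: set = set()
        self._visible = False

    def writer_done(self, writer_id: int) -> None:
        self._completed.add(writer_id)
        if len(self._completed) == self._expected:
            self._visible = True     # commit: all portions complete and correct

    def readable(self) -> bool:
        return self._visible

if __name__ == "__main__":
    txn = StagingTransaction("timestep-42", num_writers=4)
    for w in range(3):
        txn.writer_done(w)
    print("visible after 3/4 writers:", txn.readable())   # False
    txn.writer_done(3)
    print("visible after 4/4 writers:", txn.readable())   # True
```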
International Parallel and Distributed Processing Symposium | 2016
Ivo Jimenez; Carlos Maltzahn; Jay F. Lofstead; Adam Moody; Kathryn Mohror; Remzi H. Arpaci-Dusseau; Andrea C. Arpaci-Dusseau
Independent validation of experimental results in the field of parallel and distributed systems research is a challenging task, mainly due to changes and differences in software and hardware in computational environments. In particular, when an experiment runs on different hardware than the one where it originally executed, predicting the differences in results is difficult. In this paper, we introduce an architecture-independent method for characterizing the performance of a machine by obtaining a profile (a vector of microbenchmark results) that we use to quantify the variability between two hardware platforms. We propose the use of isolation features that OS-level virtualization offers to reduce the variability observed when validating application performance across multiple machines. Our results show that, using our variability characterization methodology, we can correctly predict the variability bounds of CPU-intensive applications and reduce that variability by up to 2.8x by applying CPU bandwidth limits; the magnitude of the reduction depends on an application's opcode mix as well as on generational and architectural differences between the two hardware platforms.
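A sketch of the profile idea under stated assumptions: each machine is characterized by a vector of microbenchmark scores, and comparing the two vectors element-wise gives a rough per-dimension variability factor. Benchmark names, scores, and the exact comparison are illustrative, not the paper's methodology.

```python
# Compare two machine profiles (vectors of microbenchmark scores) to get a
# rough variability factor per resource dimension. Values are made up.
from typing import Dict

def variability(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> Dict[str, float]:
    """Return, per microbenchmark, how much faster/slower machine B is vs. A."""
    return {name: profile_b[name] / profile_a[name]
            for name in profile_a if name in profile_b}

if __name__ == "__main__":
    machine_a = {"int_ops": 1.00, "float_ops": 1.00, "mem_bw": 1.00, "rand_io": 1.00}
    machine_b = {"int_ops": 1.35, "float_ops": 1.10, "mem_bw": 2.80, "rand_io": 0.90}
    for bench, factor in variability(machine_a, machine_b).items():
        print(f"{bench}: {factor:.2f}x")
```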
International Conference on Management of Data | 2012
Ivo Jimenez; Huascar Sanchez; Quoc Trung Tran; Neoklis Polyzotis
Index tuning, i.e., selecting indexes that are appropriate for the workload in order to obtain good system performance, is a crucial task for database administrators. Administrators rely on automated index advisors for this task, but existing advisors work either offline, requiring a priori knowledge of the workload, or online, taking the administrator out of the picture and assuming total control of the index tuning task. Semi-automatic index tuning is a new paradigm that achieves a middle ground: the advisor analyzes the workload online and provides recommendations tailored to the current workload, and the administrator is able to provide feedback to refine future recommendations. In this demonstration we present Kaizen, an index tuning tool that implements semi-automatic tuning.
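A skeleton of the semi-automatic tuning loop described above: the advisor scores candidate indexes against the observed workload, the administrator accepts or rejects recommendations, and that feedback biases future scores. The scoring and feedback rules below are placeholders, not Kaizen's algorithm.

```python
# Skeleton of a semi-automatic index-tuning feedback loop (illustrative only).
from collections import defaultdict
from typing import Dict, List

class IndexAdvisor:
    def __init__(self) -> None:
        self._benefit: Dict[str, float] = defaultdict(float)     # estimated benefit
        self._bias: Dict[str, float] = defaultdict(lambda: 1.0)  # admin feedback

    def observe(self, query: str, candidate_indexes: List[str], saved_cost: float) -> None:
        # Credit each candidate index with the cost it would have saved.
        for idx in candidate_indexes:
            self._benefit[idx] += saved_cost

    def recommend(self, top_k: int = 3) -> List[str]:
        ranked = sorted(self._benefit,
                        key=lambda i: self._benefit[i] * self._bias[i],
                        reverse=True)
        return ranked[:top_k]

    def feedback(self, index: str, accepted: bool) -> None:
        # Accepted indexes are favored in the future; rejected ones are demoted.
        self._bias[index] *= 1.5 if accepted else 0.5

if __name__ == "__main__":
    advisor = IndexAdvisor()
    advisor.observe("SELECT * FROM t WHERE a = ?", ["t(a)"], saved_cost=10.0)
    advisor.observe("SELECT * FROM t WHERE a = ? AND b = ?", ["t(a)", "t(a,b)"], saved_cost=25.0)
    print("before feedback:", advisor.recommend())
    advisor.feedback("t(a)", accepted=False)
    print("after feedback: ", advisor.recommend())
```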
Conference on Computer Communications Workshops | 2017
Ivo Jimenez; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Jay F. Lofstead; Carlos Maltzahn; Kathryn Mohror; Robert Ricci
This paper introduces PopperCI, a continuous integration (CI) service hosted at UC Santa Cruz that allows researchers to automate the end-to-end execution and validation of experiments. PopperCI assumes that experiments follow Popper, a recently proposed convention for implementing experiments and writing articles following a DevOps approach. PopperCI runs experiments on public, private, or government-funded cloud infrastructures in a fully automated way. We describe how PopperCI executes experiments and present a use case that illustrates the usefulness of the service.
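A sketch of the kind of automated validation step a CI service like this could run after an experiment finishes: load the produced metrics and fail the job if declared expectations do not hold. The result file format and the specific checks are hypothetical, not PopperCI's actual interface.

```python
# Hypothetical post-experiment validation step: read a metrics CSV and fail
# the CI job if the author's expectations are not met.
import csv
import sys

def validate(results_csv: str) -> bool:
    with open(results_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    throughput = [float(r["throughput_mbps"]) for r in rows]
    # Example assertions an author might declare for their experiment:
    checks = [
        ("at least 10 samples", len(throughput) >= 10),
        ("median throughput above 100 MB/s",
         sorted(throughput)[len(throughput) // 2] > 100),
    ]
    ok = True
    for name, passed in checks:
        print(("PASS " if passed else "FAIL ") + name)
        ok = ok and passed
    return ok

if __name__ == "__main__":
    sys.exit(0 if validate("results.csv") else 1)
```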