Raphael R. Some

California Institute of Technology

Publication

Featured research published by Raphael R. Some.

Dependable Systems and Networks | 2002

Experimental evaluation of a COTS system for space applications

Henrique Madeira; Raphael R. Some; Francisco Moreira; Diamantino Costa; David A. Rennels

This paper evaluates the impact of transient errors in the operating system of a COTS-based system (CETIA board with two PowerPC 750 processors running LynxOS) and quantifies their effects at both the OS and the application level. The study was conducted using a Software-Implemented Fault Injection tool (Xception), with both realistic programs and synthetic workloads (the latter to focus on specific OS features). The results provide a comprehensive picture of the impact of faults on LynxOS key features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.

IEEE Aerospace Conference | 2006

High Performance Dependable Multiprocessor II

Jeremy Ramos; John Samson; David Lupia; Ian A. Troxel; R. Subramaniyan; Adam Jacobs; James Greco; G. Cieslewski; J. Curreri; M. Fischer; E. Grobelny; Alan D. George; Vikas Aggarwal; M. Patel; Raphael R. Some

With the ever-increasing demand for higher bandwidth and processing capacity in today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing has become a critical need. In response to this need, NASA's New Millennium Program (NMP) office commissioned the development of dependable multiprocessor (DM) technology for use in science and autonomy missions, but the technology is also applicable to a wide variety of DoD missions. The goal of the DM project is to provide spacecraft/payload processing capability 10x to 100x what is available today, enabling heretofore unrealizable levels of science and autonomy. DM technology is being developed as part of the NMP ST8 (Space Technology 8) project. The objective of this NMP ST8 effort is to combine high-performance, fault-tolerant, COTS-based cluster processing and fault-tolerant middleware in an architecture and software framework capable of supporting a wide variety of mission applications. Dependable multiprocessor development is continuing as one of the four selected ST8 flight experiments planned to be flown in 2009.

IEEE Computer | 2003

NASA advances robotic space exploration

Daniel S. Katz; Raphael R. Some

NASA's successful exploration of space has uncovered vast amounts of new knowledge about the Earth, the solar system and its other planets, and the stellar spaces beyond. Continuing to gain new knowledge has required, and will continue to require, new capabilities in onboard processing hardware, system software, and applications such as autonomy. For example, initial robotic space exploration missions functioned, for the most part, as large flying cameras. These instruments have evolved over time to include more sophisticated imaging radar, multispectral imagers, spectrometers, gravity wave detectors, a host of prepositioned sensors and, most recently, rovers.

Dependable Systems and Networks | 2002

Reliability and availability analysis for the JPL Remote Exploration and Experimentation System

Dong Chen; Selvamuthu Dharmaraja; Dongyan Chen; Lei Li; Kishor S. Trivedi; Raphael R. Some

The NASA Remote Exploration and Experimentation (REE) Project, managed by the Jet Propulsion Laboratory, has the vision of bringing commercial supercomputing technology into space, in a form which meets the demanding environmental requirements, to enable a new class of science investigation and discovery. Dependability goals of the REE system are 99% reliability over 5 years and 99% availability. In this paper we focus on the reliability/availability modeling and analysis of the REE system. We carry out this task using fault trees, reliability block diagrams, stochastic reward nets and hierarchical models. Our analysis helps to determine the ranges of parameters for which the REE dependability goal will be met. The analysis also allows us to assess different hardware and software fault-tolerance techniques.
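To make the stated goals concrete, the following is a minimal sketch of the kind of arithmetic behind a reliability block diagram and an availability target. It assumes a constant failure rate (exponential lifetimes) and independent components, neither of which is stated in the abstract; all function names are hypothetical and this is not the paper's model.

```python
import math

def max_failure_rate(reliability_goal: float, mission_years: float) -> float:
    """Largest constant failure rate (per year) that still meets the goal,
    assuming R(t) = exp(-rate * t). The exponential model is an assumption."""
    return -math.log(reliability_goal) / mission_years

def series_reliability(parts):
    """Series block diagram: the system works only if every block works."""
    return math.prod(parts)

def parallel_reliability(parts):
    """Parallel block diagram: the system works if at least one block works."""
    return 1.0 - math.prod(1.0 - r for r in parts)

def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time the system is up, given mean time to
    failure and mean time to repair in the same units."""
    return mttf / (mttf + mttr)

# The abstract's goal of 99% reliability over 5 years, under the assumption above.
rate = max_failure_rate(0.99, 5.0)
print(f"max failure rate: {rate:.5f} per year")
```

Under these assumptions, duplicating a block in parallel raises reliability while chaining blocks in series lowers it, which is the trade-off such an analysis explores when comparing fault-tolerance techniques.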

Dependable Systems and Networks | 2002

An experimental evaluation of the REE SIFT environment for spaceborne applications

Keith Whisnant; Ravishankar K. Iyer; Phillip H. Jones; Raphael R. Some; David A. Rennels

Presents an experimental evaluation of a software-implemented fault tolerance (SIFT) environment built around a set of self-checking processes called ARMORs running on different machines that provide error detection and recovery services to themselves and to spaceborne scientific applications. The experiments are split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous group. The results show that the SIFT environment adds negligible overhead to the application during failure-free runs. Only 11 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigations showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.

IEEE Transactions on Software Engineering | 2004

The Effects of an ARMOR-based SIFT environment on the performance and dependability of user applications

Keith Whisnant; Ravishankar K. Iyer; Zbigniew Kalbarczyk; Phillip H. Jones; David A. Rennels; Raphael R. Some

Few distributed software-implemented fault tolerance (SIFT) environments have been experimentally evaluated using substantial applications to show that they protect both themselves and the applications from errors. We present an experimental evaluation of a SIFT environment used to oversee spaceborne applications as part of the Remote Exploration and Experimentation (REE) program at the Jet Propulsion Laboratory. The SIFT environment is built around a set of self-checking ARMOR processes running on different machines that provide error detection and recovery services to themselves and to the REE applications. An evaluation methodology is presented in which over 28,000 errors were injected into both the SIFT processes and two representative REE applications. The experiments were split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous group. The results show that the SIFT environment added negligible overhead to the applications' execution time during failure-free runs. Correlated failures affecting a SIFT process and application process are possible, but the division of detection and recovery responsibilities in the SIFT environment allows it to recover from these multiple failure scenarios. Only 28 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigations showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.

IEEE Aerospace Conference | 2010

Investigation of the Tilera processor for real time hazard detection and avoidance on the Altair Lunar Lander

Carlos Y. Villalpando; Andrew Edie Johnson; Raphael R. Some; Jacob Oberlin; Steven Goldberg

The High Performance Processor (HPP) Task of the Advanced Avionics and Processor Systems (AAPS) Project, part of the Exploration Technology Development Program (ETDP), was to evaluate several high performance multicore processor architectures with respect to their ability to provide real time hazard detection and avoidance for the Constellation Program's Altair Lunar Lander. In this paper we review the Tilera Tile64 processor, the hazard detection and avoidance algorithm, strategies for parallelizing these algorithms, and preliminary performance study results. We were presented with the requirements of 30 Hz LIDAR frame processing rate and 10 second processing time for ALHAT HDA processing and were able to meet those requirements with the Tile64. We then project the performance of these algorithms on the OPERA MAESTRO Processor, a radiation tolerant version of the Tile64 being developed by the Boeing Company.

Dependable Systems and Networks | 2001

A software-implemented fault injection methodology for design and validation of system fault tolerance

Raphael R. Some; Won S. Kim; Garen Khanoyan; Leslie Callum; A. Agrawal; John Beahan

Presents our experience in developing a methodology and tool at the Jet Propulsion Laboratory (JPL) for software-implemented fault injection (SWIFI) into a parallel-processing supercomputer which is being designed for use in next-generation space exploration missions. The fault injector uses software-based strategies to emulate the effects of radiation-induced transients occurring in the system hardware components. JPL's SWIFI tool set, which is called JIFI (JPL's Implementation of a Fault Injector), is being used in conjunction with an appropriate system fault model to evaluate candidate hardware and software fault tolerance architectures, to determine the sensitivity of applications to faults, and to measure the effectiveness of fault detection, isolation and recovery strategies. JIFI has been validated to inject faults into user-specified CPU registers and memory regions with a uniform random distribution in location and time. Together with verifiers, classifiers and run scripts, JIFI enables massive fault injection campaigns and statistical data analysis.
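The idea of injecting faults uniformly in location and time can be illustrated with a small sketch. This is not JIFI and does not reflect its interface; the function names, the (start, length) region format, and the use of a Python bytearray as stand-in memory are all assumptions made for illustration.

```python
import random

def plan_injections(n, regions, run_time, seed=None):
    """Draw n (time, address, bit) targets.

    Times are uniform over [0, run_time]; addresses are uniform over every
    byte in the user-specified regions, given as (start, length) pairs.
    Both the region format and this function are hypothetical.
    """
    rng = random.Random(seed)
    total = sum(length for _, length in regions)
    plan = []
    for _ in range(n):
        t = rng.uniform(0.0, run_time)
        offset = rng.randrange(total)  # uniform index across all region bytes
        for start, length in regions:
            if offset < length:
                addr = start + offset
                break
            offset -= length
        plan.append((t, addr, rng.randrange(8)))
    plan.sort()  # process injections in time order
    return plan

def flip_bit(memory: bytearray, addr: int, bit: int) -> None:
    """Emulate a single-bit transient by inverting one bit in place."""
    memory[addr] ^= 1 << bit

# Example: plan 5 injections over two small regions during a 10-second run.
memory = bytearray(128)
for t, addr, bit in plan_injections(5, [(0, 16), (100, 4)], 10.0, seed=1):
    flip_bit(memory, addr, bit)
```

Weighting the address draw by region length keeps the distribution uniform per byte rather than per region, so a large region receives proportionally more injections than a small one.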

Dependable Systems and Networks | 2000

Demonstration of the remote exploration and experimentation (REE) fault-tolerant parallel-processing supercomputer for spacecraft onboard scientific data processing

Fannie Chen; Loring Craymer; Jeff Deifik; Alvin J. Fogel; Daniel S. Katz; Alfred G. Silliman Jr.; Raphael R. Some; Sean A. Upchurch; Keith Whisnant

Concerns a demonstration of the REE Project's work to date. The demonstration is intended to simulate an REE system that might exist on a Mars rover, consisting of multiple COTS processors, a COTS network, a COTS node-level operating system, REE middleware, and an REE application. The specific application performs texture processing of images. It was chosen as a building block of automated geological processing that will eventually be used for both navigation and data processing. Because the COTS hardware is not radiation hardened, single-event-upset-induced soft errors will occur. These errors are simulated in the demonstration by use of a software-implemented fault injector, and are injected at a rate much higher than is realistic for the sake of viewer interest. Both the application and the middleware contain mechanisms for detection of and recovery from these faults, and these mechanisms are tested by this very high fault rate. Because the REE system can tolerate this fault rate while continuing to process data, it should easily be able to handle the true fault rate.

IEEE Aerospace Conference | 2002

Radiation fault modeling and fault rate estimation for a COTS based space-borne supercomputer

A.V. Karapetian; Raphael R. Some; John Beahan

Development of the Remote Exploration and Experimentation (REE) Commercial Off The Shelf (COTS) based space-borne supercomputer requires a detailed model of Single Event Upset (SEU) induced faults and fault effects. Extensive ground-based radiation testing has been performed on several generations of the PowerPC processor family and related components. A set of relevant environments for NASA missions has been analyzed and detailed. Combining radiation test data, environmental data and architectural analysis, we have developed a radiation fault model for the REE system. The fault model is hierarchically organized and includes scaling factors and optional parameters for fault prediction in future technologies and alternative architectures. It has been implemented in a generic tool, which allows for ease of input and straightforward porting. The model currently includes the PowerPC 750 (G3), PCI bridge chips, L2 cache SRAM, main memory DRAM, and the Myrinet packet switched network. In this paper, we present the REE radiation fault model and accompanying tool set. We explain its derivation, its structure and use, and the work being done to validate it.
