Publication


Featured research published by Frank Xiaoxiao Chen.


grid computing | 2010

Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong

Accurate failure prediction, in conjunction with efficient process migration facilities that include some Cloud constructs, can enable failure avoidance in large-scale high performance computing (HPC) platforms. In this work we demonstrate a prototype system that incorporates our probabilistic failure prediction system with virtualization mechanisms and techniques to provide a whole-system approach to failure avoidance. This work utilizes a failure scenario based on a real-world HPC case study.


ieee international symposium on parallel distributed processing workshops and phd forum | 2010

Combining virtualization, resource characterization, and resource management to enable efficient high performance compute platforms through intelligent dynamic resource allocation

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong

Improved resource utilization and fault tolerance of large-scale HPC systems can be achieved through fine-grained, intelligent, and dynamic resource (re)allocation. We explore components and enabling technologies applicable to creating a system to provide this capability, specifically: 1) scalable fine-grained monitoring and analysis to inform resource allocation decisions, 2) virtualization to enable dynamic reconfiguration, 3) resource management for the combined physical and virtual resources, and 4) orchestration of the allocation, evaluation, and balancing of resources in a dynamic environment. We discuss both general and HPC-centric issues that impact the design of such a system. Finally, we present our prototype system, giving both design details and examples of its application in real-world scenarios.


dependable systems and networks | 2010

Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong

Effective failure prediction and mitigation strategies in high-performance computing systems could provide huge gains in resilience of tightly coupled large-scale scientific codes. These gains would come from prediction-directed process migration and resource servicing, intelligent resource allocation, and checkpointing driven by failure predictors rather than at regular intervals based on nominal mean time to failure. Given probabilistic associations of outlier behavior in hardware-related metrics with eventual failure in hardware, system software, and/or applications, this paper explores approaches for quantifying the effects of prediction and mitigation strategies and demonstrates these using actual production system data. We describe context-relevant methodologies for determining the accuracy and cost-benefit of predictors.


international conference on parallel processing | 2011

Framework for enabling system understanding

Jim M. Brandt; Frank Xiaoxiao Chen; Ann C. Gentile; Chokchai Leangsuksun; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; Narate Taerat; David C. Thompson; Matthew H. Wong

Building the effective HPC resilience mechanisms required for viability of next-generation supercomputers will require in-depth understanding of system and component behaviors. Our goal is to build an integrated framework for high-fidelity long-term information storage, historic and run-time analysis, and algorithmic and visual information exploration to enable system understanding, timely failure detection/prediction, and triggering of appropriate responses to failure situations. Since it is unknown what information is relevant, and since potentially relevant data may be expressed in a variety of forms (e.g., numeric, textual), this framework must provide capabilities to process different forms of data and also support the integration of new data, data sources, and analysis capabilities. Further, in order to ensure ease of use as capabilities and data sources expand, it must also provide interactivity between its elements. This paper describes our integration of the capabilities mentioned above into our OVIS tool.


Archive | 2010

Understanding large scale HPC systems through scalable monitoring and analysis.

Jackson R. Mayo; Frank Xiaoxiao Chen; Philippe Pierre Pebay; Matthew H. Wong; David C. Thompson; Ann C. Gentile; Diana C. Roe; Vincent De Sapio; James M. Brandt


Archive | 2010

Scalable HPC monitoring and analysis for understanding and automated response.

Jackson R. Mayo; Frank Xiaoxiao Chen; Philippe Pierre Pebay; Matthew H. Wong; David C. Thompson; Ann C. Gentile; Diana C. Roe; Vincent De Sapio; James M. Brandt


Archive | 2009

Interactive Data Fusion Capabilities for Large-Scale Compute Cluster Architects and Administrators.

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong


Archive | 2009

Quantifying Failure Prediction in Large Scale HPC Systems: A Case Study.

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong


Archive | 2009

Scalable Run Time Data Collection Analysis and Visualization (Presentation).

Ann C. Gentile; Frank Xiaoxiao Chen; Ananya Das


Archive | 2009

Data Fusion and Statistical Analysis: Piercing the Darkness of the Black Box.

James M. Brandt; Frank Xiaoxiao Chen; Vincent De Sapio; Ann C. Gentile; Jackson R. Mayo; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson; Matthew H. Wong

Collaboration


Dive into Frank Xiaoxiao Chen's collaboration.

Top Co-Authors

Ann C. Gentile

Sandia National Laboratories

David C. Thompson

University of Texas at Austin

Diana C. Roe

Sandia National Laboratories

Jackson R. Mayo

Sandia National Laboratories

Matthew H. Wong

Sandia National Laboratories

James M. Brandt

Sandia National Laboratories

Vincent De Sapio

Sandia National Laboratories

Jim M. Brandt

Sandia National Laboratories
