Andres Marquez

Pacific Northwest National Laboratory

Publication

Featured research published by Andres Marquez.

computing frontiers | 2007

Evaluating the potential of multithreaded platforms for irregular scientific computations

Jarek Nieplocha; Andres Marquez; John Feo; Daniel G. Chavarría-Miranda; George Chin; Chad Scherrer; Nathaniel Beagley

The resurgence of current and upcoming multithreaded architectures and programming models led us to conduct a detailed study to understand the potential of these platforms to increase the performance of data-intensive, irregular scientific applications. Our study is based on a power system state estimation application and a novel anomaly detection application applied to network traffic data. We also conducted a detailed evaluation of the platforms using microbenchmarks in order to gain insight into their architectural capabilities and their interaction with programming models and application software. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.

international parallel and distributed processing symposium | 2014

MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures

Yang You; Shuaiwen Leon Song; Haohuan Fu; Andres Marquez; Maryam Mehri Dehnavi; Kevin J. Barker; Kirk W. Cameron; Amanda Randles; Guangwen Yang

Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. Advanced multi- and many-core architectures offer massive parallelism with complex memory hierarchies which can make runtime training possible, but form a barrier to efficient parallel SVM design. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools. MIC-SVM achieves 4.4-84x and 18-47x speedups against the popular LIBSVM, on MIC and Ivy Bridge CPUs respectively, for several real-world data-mining datasets. Even compared with GPUSVM, run on a top of the line NVIDIA k20x GPU, the performance of our MIC-SVM is competitive. We also conduct a cross-platform performance comparison analysis, focusing on Ivy Bridge CPUs, MIC and GPUs, and provide insights on how to select the most suitable advanced architectures for specific algorithms and input data patterns.

2006 IEEE Power Engineering Society General Meeting | 2006

Towards efficient power system state estimators on shared memory computers

Jaroslaw Nieplocha; Andres Marquez; Vinod Tipparaju; Daniel G. Chavarría-Miranda; Ross T. Guttromson; H. Huang

We are investigating the effectiveness of parallel weighted-least-square (WLS) state estimation solvers on shared-memory parallel computers. Shared-memory parallel architectures are rapidly becoming ubiquitous due to the advent of multi-core processors. In the current evaluation, we are using an LU-based solver as well as a conjugate gradient (CG)-based solver for a 1177-bus system. In lieu of a very wide multi-core system, we evaluate the effectiveness of the solvers on an SGI Altix system on up to 32 processors. On this platform, as expected, the shared memory implementation (pthreads) of the LU solver was found to be more efficient than the MPI version. Our implementation of the CG solver scales and performs significantly better than the state-of-the-art implementation of the LU solver: with CG we can solve the problem 4.75 times faster than using LU. These findings indicate that CG algorithms should be quite effective on multicore processors.
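As a rough illustration of the solver pairing mentioned in this abstract, the following minimal sketch solves a linear WLS problem by applying conjugate gradient to the normal equations (H^T W H) x = H^T W z. It is not the authors' implementation: the function names, the dense list-of-lists matrices, the diagonal weight assumption, and the three-measurement toy system are all illustrative assumptions, and a real power system estimator is nonlinear, sparse, and iterated.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial x = 0
    p = list(r)                      # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm is small enough
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def wls_estimate(H, weights, z):
    """Linear WLS estimate with a diagonal weight matrix (hypothetical helper)."""
    n = len(H[0])
    # Gain matrix G = H^T W H and right-hand side H^T W z.
    G = [[sum(w * row[i] * row[j] for w, row in zip(weights, H))
          for j in range(n)] for i in range(n)]
    rhs = [sum(w * row[i] * zi for w, row, zi in zip(weights, H, z))
           for i in range(n)]
    return conjugate_gradient(G, rhs)


# Toy system (assumed values): two states, three consistent measurements.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
weights = [100.0, 100.0, 50.0]
z = [1.0, 0.5, 0.5]
x_hat = wls_estimate(H, weights, z)
print(x_hat)
```

The gain matrix is symmetric positive definite whenever H has full column rank and the weights are positive, which is the property CG relies on. Each CG iteration is dominated by one matrix-vector product, which is the kind of operation that tends to parallelize well on shared-memory machines.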

green computing and communications | 2010

Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models

Abhinav Vishnu; Shuaiwen Song; Andres Marquez; Kevin J. Barker; Darren J. Kerbyson; Kirk W. Cameron; Pavan Balaji

The insatiable demand of high performance computing is being driven by the most computationally intensive applications such as computational chemistry, climate modeling, nuclear physics, etc. The last couple of decades have observed a tremendous rise in supercomputers with architectures ranging from traditional clusters to system-on-a-chip in order to achieve the petaflop computing barrier. However, with the advent of petaflop-plus computing, we have ushered in an era where a power efficient system software stack is imperative for execution on exascale systems and beyond. At the same time, computationally intensive applications are exploring programming models beyond traditional message passing, as a combination of Partitioned Global Address Space (PGAS) languages and libraries, providing a one-sided communication paradigm with put, get and accumulate primitives. To support the PGAS models, it is critical to design power efficient and high performance one-sided communication runtime systems. In this paper, we design and implement PASCoL, a high performance power aware one-sided communication library using Aggregate Remote Memory Copy Interface (ARMCI), the communication runtime system of Global Arrays. For various communication primitives provided by ARMCI, we study the impact of Dynamic Voltage/Frequency Scaling (DVFS) and a combination of interrupt (blocking)/polling based mechanisms provided by most modern interconnects. We implement our design and evaluate it with synthetic benchmarks using an InfiniBand cluster. Our results indicate that PASCoL can achieve significant reduction in energy consumed per byte transfer without additional penalty for various one-sided communication primitives and various message sizes and data transfer patterns.

international parallel and distributed processing symposium | 2008

Early experience with out-of-core applications on the Cray XMT

Daniel G. Chavarría-Miranda; Andres Marquez; Jaroslaw Nieplocha; Kristyn J. Maschhoff; Chad Scherrer

This paper describes our early experiences with a pre-production Cray XMT system that implements a scalable shared memory architecture with hardware support for multithreading. Unlike its predecessor, the Cray MTA-2, which had very limited I/O capability, the Cray XMT offers Lustre, a scalable high-performance parallel filesystem. Therefore it enables development of out-of-core applications that can deal with very large data sets that otherwise would not fit in the system main memory. Our application performs statistically-based anomaly detection for categorical data that can be used for analysis of Internet traffic data. Experimental results indicate that the pre-production version of the machine is able to achieve good performance and scalability for the in- and out-of-core versions of the application.

The Journal of Supercomputing | 2013

Designing energy efficient communication runtime systems: a view from PGAS models

Abhinav Vishnu; Shuaiwen Song; Andres Marquez; Kevin J. Barker; Darren J. Kerbyson; Kirk W. Cameron; Pavan Balaji

As the march to exascale computing gains momentum, energy consumption of supercomputers has emerged as the critical roadblock. While architectural innovations are imperative in achieving computing of this scale, it is largely dependent on the systems software to leverage the architectural innovations. Parallel applications in many computationally intensive domains have been designed to leverage these supercomputers, with legacy two-sided communication semantics using Message Passing Interface. At the same time, Partitioned Global Address Space (PGAS) models are being designed which provide global address space abstractions and one-sided communication for exploiting data locality and communication optimizations. PGAS models rely on one-sided communication runtime systems for leveraging high-speed networks to achieve best possible performance. In this paper, we present a design for a Power Aware One-Sided Communication Library (PASCoL). The proposed design detects communication slack, leverages Dynamic Voltage and Frequency Scaling (DVFS), and Interrupt driven execution to exploit the detected slack for energy efficiency. We implement our design and evaluate it using synthetic benchmarks for one-sided communication primitives, Put, Get, and Accumulate and uniformly noncontiguous data transfers. Our performance evaluation indicates that we can achieve significant reduction in energy consumption without performance loss on multiple one-sided communication primitives. The achieved results are close to the theoretical peak available with the experimental test bed.
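The idea of exploiting communication slack can be illustrated with a toy energy model. The sketch below is not PASCoL's algorithm: the `PowerModel` fields, the wattages, the transition time, and the decision rule are hypothetical values chosen only to show why scaling down and blocking pays off for long waits but not for short ones.

```python
from dataclasses import dataclass


@dataclass
class PowerModel:
    """Hypothetical per-core power figures (assumed, not measured)."""
    poll_watts: float      # power while busy-polling at full frequency
    scaled_watts: float    # power while blocked at a reduced frequency
    transition_s: float    # total time to scale down and back up


def plan_wait(slack_s, model):
    """Return (mode, joules) for waiting out `slack_s` seconds of slack."""
    poll_energy = model.poll_watts * slack_s
    # If the frequency transitions do not fit inside the slack, scaling
    # would delay the application, so keep polling.
    if slack_s <= model.transition_s:
        return "poll", poll_energy
    # Charge the transitions at full power and the remainder at reduced power.
    scaled_energy = (model.poll_watts * model.transition_s
                     + model.scaled_watts * (slack_s - model.transition_s))
    if scaled_energy < poll_energy:
        return "scale_and_block", scaled_energy
    return "poll", poll_energy


model = PowerModel(poll_watts=100.0, scaled_watts=60.0, transition_s=50e-6)
for slack in (10e-6, 1e-3):
    print(slack, plan_wait(slack, model))
```

Under these assumed numbers, a 10 microsecond wait stays in polling mode, while a 1 millisecond wait switches to the scaled, blocking mode and uses less energy without extending the wait.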

international conference on e-science | 2010

Fault Detection in Distributed Climate Sensor Networks Using Dynamic Bayesian Networks

George Chin; Sutanay Choudhury; Lars J. Kangas; Sally A. McFarlane; Andres Marquez

The Atmospheric Radiation Measurement (ARM) program operated by the U.S. Department of Energy is one of the largest climate research programs dedicated to the collection of long-term continuous measurements of cloud properties and other key components of the earth’s climate system. Given the critical role that collected ARM data plays in the analysis of atmospheric processes and conditions and in the enhancement and evaluation of global climate models, the production and distribution of high-quality data is one of ARM’s primary mission objectives. Fault detection in ARM’s distributed sensor network is one critical ingredient towards maintaining high quality and useful data. We are modeling ARM’s distributed sensor network as a dynamic Bayesian network where key measurements are mapped to Bayesian network variables. We then define the conditional dependencies between variables by discovering highly correlated variable pairs from historical data. The resultant dynamic Bayesian network provides an automated approach to identifying whether certain sensors are malfunctioning or failing in the distributed sensor network. A potential fault or failure is detected when an observed measurement is not consistent with its expected measurement and the observed measurements of other related sensors in the Bayesian network. We present some of our experiences and promising results with the fault detection dynamic Bayesian network.
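The two steps described in this abstract, finding highly correlated sensor pairs in historical data and flagging observations that break the expected relationship, can be sketched in a much simpler form. The code below is not a dynamic Bayesian network: it swaps in a pairwise linear consistency check, and the sensor names, readings, and thresholds are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def correlated_pairs(history, threshold=0.9):
    """Sensor pairs whose historical readings are strongly correlated."""
    return [(a, b) for a, b in combinations(sorted(history), 2)
            if abs(pearson(history[a], history[b])) >= threshold]


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def inconsistent_pairs(history, observed, threshold=0.9, tolerance=3.0):
    """Correlated pairs whose current readings break the historical fit."""
    flagged = []
    for a, b in correlated_pairs(history, threshold):
        slope, intercept = fit_line(history[a], history[b])
        residuals = [y - (slope * x + intercept)
                     for x, y in zip(history[a], history[b])]
        spread = sqrt(sum(r * r for r in residuals) / len(residuals))
        error = abs(observed[b] - (slope * observed[a] + intercept))
        if error > tolerance * max(spread, 1e-9):
            flagged.append((a, b))
    return flagged


# Invented readings for three hypothetical sensors.
history = {
    "temp_a": [10.0, 12.0, 14.0, 16.0, 18.0],
    "temp_b": [10.5, 12.4, 14.6, 16.5, 18.4],
    "wind": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(correlated_pairs(history))
print(inconsistent_pairs(history, {"temp_a": 15.0, "temp_b": 25.0, "wind": 2.0}))
```

A flagged pair only says that two related sensors disagree; deciding which of them is faulty needs evidence from further related sensors, which is the role the network structure plays in the approach the abstract describes.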

international parallel and distributed processing symposium | 2009

Implementing and evaluating multithreaded triad census algorithms on the Cray XMT

George Chin; Andres Marquez; Sutanay Choudhury; Kristyn J. Maschhoff

Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysis denotes a class of mathematical and statistical methods designed to study and measure social networks. Beyond sociology, social network analysis methods are being applied to other types of data in other domains such as bioinformatics, computer networks, national security, and economics. For particular problems, the size of a social network can grow to millions of nodes and tens of millions of edges or more. In such cases, researchers could benefit from the application of social network analysis algorithms on high-performance architectures and systems.

Journal of Parallel and Distributed Computing | 2015

Scaling Support Vector Machines on modern HPC platforms

Yang You; Haohuan Fu; Shuaiwen Leon Song; Amanda Randles; Darren J. Kerbyson; Andres Marquez; Guangwen Yang; Adolfy Hoisie

Support Vector Machines (SVM) have been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. Advanced multi- and many-core architectures offer massive parallelism with complex memory hierarchies which can make runtime training possible, but form a barrier to efficient parallel SVM design. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools. MIC-SVM achieves 4.4-84x and 18-47x speedups against the popular LIBSVM, on MIC and Ivy Bridge CPUs respectively, for several real-world data-mining datasets. Even compared with GPUSVM, running on the NVIDIA k20x GPU, the performance of our MIC-SVM is competitive. We also conduct a cross-platform performance comparison analysis, focusing on Ivy Bridge CPUs, MIC and GPUs, and provide insights on how to select the most suitable advanced architectures for specific algorithms and input data patterns.
An efficient parallel support vector machine for x86 based multi-core platforms. The novel optimization techniques to fully utilize the multi-level parallelism. The improvement for the deficiencies of the current SVM tools. Select the best architectures for input data patterns to achieve best performance. The large-scale distributed algorithm and power-efficient approach.

ACM Journal on Emerging Technologies in Computing Systems | 2012

Implementing the data center energy productivity metric

Landon H. Sego; Andres Marquez; Andrew R. Rawson; Tahir Cader; Kevin M. Fox; William I. Gustafson; Christopher J. Mundy

As data centers proliferate in size and number, the endeavor to improve their energy efficiency and productivity is becoming increasingly important. We discuss the properties of a number of the proposed metrics of energy efficiency and productivity. In particular, we focus on the Data Center Energy Productivity (DCeP) metric, which is the ratio of useful work produced by the data center to the energy consumed performing that work. We describe our approach for using DCeP as the principal outcome of a designed experiment using a highly instrumented, high-performance computing data center. We found that DCeP was successful in clearly distinguishing different operational states in the data center, thereby validating its utility as a metric for identifying configurations of hardware and software that would improve (or even maximize) energy productivity. We also discuss some of the challenges and benefits associated with implementing the DCeP metric, and we examine the efficacy of the metric in making comparisons within a data center and among data centers.
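Since DCeP is defined above as the ratio of useful work to the energy consumed performing it, a small sketch can show how it would separate two operational states. The unit of useful work (completed jobs), the energy unit (kWh), and the numbers are assumptions made for illustration; the abstract does not specify how useful work is quantified.

```python
def dcep(useful_work, energy_kwh):
    """Data Center Energy Productivity: useful work per unit of energy."""
    if energy_kwh <= 0:
        raise ValueError("energy consumed must be positive")
    return useful_work / energy_kwh


# Two hypothetical operational states doing the same work (assumed values).
states = {
    "baseline": {"jobs": 1200, "energy_kwh": 300.0},
    "tuned_cooling": {"jobs": 1200, "energy_kwh": 240.0},
}
for name, s in states.items():
    print(name, dcep(s["jobs"], s["energy_kwh"]))
```

With equal work, the state that consumes less energy gets the higher DCeP, which is the sense in which the metric can rank configurations of hardware and software.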