Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Martins Innus is active.

Publication


Featured research published by Martins Innus.


Journal of Applied Crystallography | 2002

SnB version 2.2: an example of crystallographic multiprocessing

Jason Rappleye; Martins Innus; Charles M. Weeks; Russ Miller

The computer program SnB implements a direct-methods algorithm, known as Shake-and-Bake, which optimizes trial structures consisting of randomly positioned atoms. Although large Shake-and-Bake applications require significant amounts of computing time, the algorithm can be easily implemented in parallel in order to decrease the real time required to achieve a solution. By using a master–worker model, SnB version 2.2 is amenable to all of the prevalent modern parallel-computing platforms, including (i) shared-memory multiprocessor machines, such as the SGI Origin2000, (ii) distributed-memory multiprocessor machines, such as the IBM SP, and (iii) collections of workstations, including Beowulf clusters. A linear speedup in the processing of a fixed number of trial structures can be obtained on each of these platforms.
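The master–worker decomposition described above is easy to sketch: trial structures are independent, so a master simply distributes them to workers and collects the best result. Below is an illustrative Python sketch of the pattern, not SnB's actual implementation; the "figure of merit" computed per trial is a stand-in for a full Shake-and-Bake refinement cycle.

```python
import multiprocessing as mp
import random

def score_trial(seed):
    # Stand-in for refining one randomly positioned trial structure;
    # in SnB this would be a full Shake-and-Bake refinement cycle.
    rng = random.Random(seed)
    return min(rng.random() for _ in range(1000))  # lower "figure of merit" is better

def run_trials(n_trials, n_workers):
    # The master hands independent trials to workers; because trials never
    # communicate, throughput scales roughly linearly with worker count.
    with mp.Pool(n_workers) as pool:
        scores = pool.map(score_trial, range(n_trials))
    return min(scores)

if __name__ == "__main__":
    best = run_trials(n_trials=64, n_workers=4)
    print(f"best figure of merit: {best:.4f}")
```

Because each trial is seeded independently, the same structure works unchanged on shared-memory machines, distributed-memory machines, or a Beowulf cluster (with `multiprocessing` swapped for MPI in the latter cases).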


International Conference on Cluster Computing | 2015

Analysis of XDMoD/SUPReMM Data Using Machine Learning Techniques

Steven M. Gallo; Joseph P. White; Robert L. DeLeon; Thomas R. Furlani; Helen Ngo; Abani K. Patra; Matthew D. Jones; Jeffrey T. Palmer; Nikolay Simakov; Jeanette M. Sperhac; Martins Innus; Thomas Yearke; Ryan Rathsam

Machine learning techniques were applied to job accounting and performance data for application classification. Job data were accumulated using the XDMoD monitoring technology named SUPReMM; they consist of job accounting information, application information from Lariat/XALT, and job performance data from TACC_Stats. The results clearly demonstrate that community applications have characteristic signatures that can be exploited for job classification. We conclude that machine learning can assist in classifying jobs of unknown application, in characterizing the job mixture, and in harnessing the variation in node and time dependence for further analysis.
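The core idea, that applications leave characteristic signatures in performance metrics, can be illustrated with a minimal nearest-centroid classifier. The metric names, values, and application labels below are invented for illustration; real SUPReMM records carry many more metrics per job.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical per-job performance signatures: (CPU use, memory bandwidth,
# I/O rate), normalized to [0, 1]. Labels and values are illustrative only.
TRAINING = {
    "cfd_solver":  [(0.90, 0.80, 0.10), (0.85, 0.75, 0.15)],
    "genome_tool": [(0.40, 0.20, 0.90), (0.45, 0.25, 0.85)],
}

def centroid(points):
    # Component-wise mean of a list of equal-length tuples.
    return tuple(sum(c) / len(points) for c in zip(*points))

CENTROIDS = {app: centroid(pts) for app, pts in TRAINING.items()}

def classify(signature):
    # Assign an unlabeled job to the application with the nearest centroid.
    return min(CENTROIDS, key=lambda app: dist(signature, CENTROIDS[app]))

print(classify((0.88, 0.79, 0.12)))  # resembles the CFD-style signature
```

A production classifier would use richer models, but the principle is the same: jobs of the same application cluster tightly in metric space.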


Scopus | 2014

Comprehensive, open-source resource usage measurement and analysis for HPC systems

James C. Browne; Robert L. DeLeon; Abani K. Patra; William L. Barth; John Hammond; Jones; Tom Furlani; Barry I. Schneider; Steven M. Gallo; Amin Ghadersohi; Ryan J. Gentner; Jeffrey T. Palmer; Nikolay Simakov; Martins Innus; Andrew E. Bruno; Joseph P. White; Cynthia D. Cornelius; Thomas Yearke; Kyle Marcus; G. Von Laszewski; Fugang Wang

The important role high-performance computing (HPC) resources play in science and engineering research, coupled with their high cost (capital, power and manpower), short life and oversubscription, requires us to optimize their usage, an outcome that is only possible if adequate analytical data are collected and used to drive systems management at different granularities: job, application, user and system. This paper presents a method for comprehensive job-, application- and system-level resource use measurement and analysis, and its implementation. The steps in the method are: system-wide collection of comprehensive resource use and performance statistics at the job and node levels in a uniform format across all resources; and mapping and storage of the resultant job-wise data to a relational database, which enables further transformation of the data to the formats required by specific statistical and analytical algorithms. Analyses can be carried out at different levels of granularity: job, user, application or system-wide. Measurements are based on a new lightweight job-centric measurement tool, TACC_Stats, which gathers a comprehensive set of resource use metrics on all compute nodes, together with data logged by the system scheduler. The data mapping and analysis tools are an extension of the XDMoD project. The method is illustrated with analyses of resource use for the Texas Advanced Computing Center's Lonestar4, Ranger and Stampede supercomputers and the HPC cluster at the Center for Computational Research. The illustrations focus on resource use at the system, job and application levels and reveal many interesting insights into system usage patterns, as well as anomalous behavior due to failure or misuse. The method can be applied to any system that runs the TACC_Stats measurement tool and a tool to extract job execution environment data from the system scheduler.
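The "map job-wise data to a relational database, then analyze at a chosen granularity" step can be sketched with an in-memory SQLite database. The schema and field names below are illustrative assumptions, not the actual XDMoD/TACC_Stats schema.

```python
import sqlite3

# Minimal sketch: load per-job records into a relational store, then
# aggregate at a chosen granularity. Schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job (
        job_id     INTEGER PRIMARY KEY,
        username   TEXT,
        app        TEXT,
        nodes      INTEGER,
        wall_hours REAL,
        avg_mem_gb REAL
    )
""")
jobs = [
    (1, "alice", "namd", 4, 2.5, 48.0),
    (2, "bob",   "wrf", 16, 1.0, 120.0),
    (3, "alice", "namd", 8, 3.0, 60.0),
]
conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?, ?, ?)", jobs)

# Application-level granularity: node-hours consumed per application.
for app, node_hours in conn.execute(
    "SELECT app, SUM(nodes * wall_hours) FROM job GROUP BY app ORDER BY app"
):
    print(app, node_hours)
```

Swapping `GROUP BY app` for `GROUP BY username` (or dropping it entirely) gives the user-level and system-wide views the paper describes.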


Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale | 2016

A Quantitative Analysis of Node Sharing on HPC Clusters Using XDMoD Application Kernels

Nikolay Simakov; Robert L. DeLeon; Joseph P. White; Thomas R. Furlani; Martins Innus; Steven M. Gallo; Matthew D. Jones; Abani K. Patra; Benjamin D. Plessinger; Jeanette M. Sperhac; Thomas Yearke; Ryan Rathsam; Jeffrey T. Palmer

In this investigation, we study how application performance is affected when jobs are permitted to share compute nodes. A series of application kernels consisting of a diverse set of benchmark calculations were run in both exclusive and node-sharing modes on the Center for Computational Research's high-performance computing (HPC) cluster. Very little increase in runtime was observed due to job contention among application kernel jobs run on shared nodes. The small differences in runtime were quantitatively modeled in order to characterize the resource contention and to determine the circumstances under which it would or would not be important. A machine learning regression model applied to the runtime data successfully fitted the small differences between the exclusive and shared node runtime data; it also provided insight into the contention for node resources that occurs when jobs are allowed to share nodes. Analysis of a representative job mix shows that the runtime of shared jobs is affected primarily by the memory subsystem, in particular by the reduction in the effective cache size due to sharing, which leads to higher utilization of DRAM. Insights such as these are crucial when formulating policies proposing node sharing as a mechanism for improving HPC utilization.
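The regression idea, modeling the small shared-versus-exclusive runtime differences as a function of resource contention, can be shown in miniature with ordinary least squares. The data points and the contention predictor (a co-runner's cache footprint) are invented for illustration; the paper's model uses real benchmark measurements and a richer feature set.

```python
# Toy regression: runtime ratio (shared / exclusive) as a linear function
# of a co-runner's cache footprint. All numbers below are illustrative.
pairs = [  # (co-runner cache footprint in MB, observed runtime ratio)
    (0, 1.00), (4, 1.02), (8, 1.05), (16, 1.09), (24, 1.14),
]

n = len(pairs)
sx = sum(x for x, _ in pairs)
sy = sum(y for _, y in pairs)
sxx = sum(x * x for x, _ in pairs)
sxy = sum(x * y for x, y in pairs)

# Closed-form ordinary least squares for slope and intercept.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

print(f"slowdown ~= {intercept:.3f} + {slope:.4f} * footprint_mb")
```

An intercept near 1.0 (no slowdown with an idle co-runner) and a small positive slope mirror the paper's qualitative finding: contention exists, but its magnitude is modest.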


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2017

A Slurm Simulator: Implementation and Parametric Analysis

Nikolay Simakov; Martins Innus; Matthew D. Jones; Robert L. DeLeon; Joseph P. White; Steven M. Gallo; Abani K. Patra; Thomas R. Furlani

Slurm is an open-source resource manager for HPC that provides high configurability for inhomogeneous resources and job scheduling. Various Slurm parameter settings can significantly influence HPC resource utilization and job wait time; however, in many cases it is hard to judge how these options will affect overall HPC resource performance. A Slurm simulator can be a very helpful tool to aid parameter selection for a particular HPC resource. Here, we report our implementation of a Slurm simulator and the impact of parameter choice on HPC resource performance. The simulator is based on a real Slurm instance, with modifications to allow simulation of historical jobs and to improve simulation speed. Simulator speed depends heavily on job composition, HPC resource size and Slurm configuration. For an 8,000-core heterogeneous cluster, we achieve about a 100-fold acceleration; e.g., 20 days of operation can be simulated in 5 hours. Several parameters affecting job placement were studied. Disabling node sharing on our 8,000-core cluster showed a 45% increase in the time needed to complete the same workload. For a large system (>6,000 nodes) composed of two distinct sub-clusters, using two separate Slurm controllers and adding node sharing can cut waiting times nearly in half.
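The core loop such a simulator accelerates, replaying historical jobs through a scheduler against a fixed node pool, can be sketched as a tiny discrete-event simulation. This is a bare FCFS toy, not Slurm's backfill scheduler, and the job tuples are invented.

```python
import heapq

# Minimal discrete-event sketch: replay "historical" jobs (submit_time,
# nodes, duration) through an FCFS scheduler on a fixed node pool and
# record each job's wait time. Illustrative only -- real Slurm backfills.
def simulate(jobs, total_nodes):
    free = total_nodes
    running = []          # min-heap of (end_time, nodes_held)
    clock = 0
    waits = []
    for submit, nodes, duration in sorted(jobs):
        clock = max(clock, submit)
        # Reclaim nodes from jobs that must finish before this one starts.
        while free < nodes:
            end, n = heapq.heappop(running)
            clock = max(clock, end)
            free += n
        waits.append(clock - submit)
        free -= nodes
        heapq.heappush(running, (clock + duration, nodes))
    return waits

waits = simulate([(0, 4, 10), (0, 4, 10), (1, 4, 5)], total_nodes=8)
print(waits)  # [0, 0, 9]: the third job waits until a node block frees
```

Because simulated time advances only at job events rather than in real time, weeks of workload replay take hours, which is the source of the roughly 100-fold acceleration reported above.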


Proceedings of the Practice and Experience on Advanced Research Computing | 2018

Automatic Characterization of HPC Job Parallel Filesystem I/O Patterns

Joseph P. White; Alexander D. Kofke; Robert L. DeLeon; Martins Innus; Matthew D. Jones; Thomas R. Furlani

As part of the NSF-funded XMS project, we are actively researching automatic detection of poorly performing HPC jobs. To aid the analysis we have generated a taxonomy of the temporal I/O patterns of HPC jobs. In this paper we describe the design of temporal pattern characterization algorithms for HPC job I/O. We have implemented these algorithms in the Open XDMoD job analysis framework. The I/O classifications include periodic patterns and a variety of characteristic non-periodic patterns. We present an analysis of the I/O patterns observed on the /scratch filesystem of an academic HPC cluster. This type of analysis can be extended to other HPC usage data such as memory, CPU and interconnect usage. Ultimately this analysis will be used to improve HPC throughput and efficiency by, for example, automatically identifying anomalous HPC jobs.
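One building block of periodic-pattern detection can be sketched with plain autocorrelation: a job whose I/O time series has a strong autocorrelation peak at a nonzero lag is a candidate "periodic" job. The series, threshold, and function names below are illustrative assumptions, not the paper's actual algorithms.

```python
# Sketch of one classification step: flag a job's I/O time series as
# periodic when its autocorrelation peaks strongly at a nonzero lag.
def autocorr(series, lag):
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    if var == 0:
        return 0.0  # constant series: no structure to detect
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var

def dominant_period(series, threshold=0.5):
    # Lag of the strongest autocorrelation peak, or None if too weak.
    best = max(range(1, len(series) // 2), key=lambda k: autocorr(series, k))
    return best if autocorr(series, best) >= threshold else None

# A bursty write pattern repeating every 4 samples, e.g. periodic checkpoints.
bursty = [10, 0, 0, 0] * 4
print(dominant_period(bursty))  # 4
```

Non-periodic classes (e.g. a single burst at job start or end) would fall through this test and be handled by separate heuristics.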


Proceedings of the Practice and Experience on Advanced Research Computing | 2018

Slurm Simulator: Improving Slurm Scheduler Performance on Large HPC systems by Utilization of Multiple Controllers and Node Sharing

Nikolay Simakov; Robert L. DeLeon; Martins Innus; Matthew D. Jones; Joseph P. White; Steven M. Gallo; Abani K. Patra; Thomas R. Furlani

A Slurm simulator was used to study the potential benefits of using multiple Slurm controllers and node sharing on the TACC Stampede 2 system. Splitting a large cluster into smaller sub-clusters with separate Slurm controllers can offer better scheduling performance and responsiveness, because the increased computational capability per controller improves backfill scheduler efficiency. The disadvantages are additional hardware, more maintenance and the inability to run jobs across sub-clusters. Node sharing can increase system throughput by allowing several sub-node jobs to execute on the same node; however, node sharing is more computationally demanding and might not be advantageous on larger systems. The Slurm simulator allows an estimation of the potential benefits of these configurations and provides information on the advantages to be expected from deploying them. In this work, multiple Slurm controllers and node sharing were tested on the TACC Stampede 2 system, which consists of two distinct node types: 4,200 Intel Xeon Phi Knights Landing (KNL) nodes and 1,736 Intel Xeon Skylake-X (SLX) nodes. For this system, using separate controllers for the KNL and SLX nodes, with node sharing allowed on the SLX nodes, resulted in a 40% reduction in waiting times for jobs executed on the SLX nodes. This improvement can be attributed to the better performance of the backfill scheduler: it scheduled 30% more SLX jobs, reduced by 30% the fraction of scheduling cycles that hit the time limit, and nearly doubled the number of job scheduling attempts.
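Why node sharing raises throughput is visible in a back-of-the-envelope packing comparison: first-fit placement of sub-node jobs onto shared nodes versus dedicating a full node per job. The core counts below are invented for illustration.

```python
# Back-of-the-envelope sketch: nodes consumed by a batch of sub-node jobs
# with and without node sharing (first-fit packing). Numbers are invented.
def nodes_needed(job_cores, cores_per_node, sharing):
    if not sharing:
        return len(job_cores)           # each job occupies a full node
    nodes = []                          # free cores remaining on each open node
    for cores in job_cores:
        for i, free in enumerate(nodes):
            if free >= cores:           # first-fit: reuse an open node
                nodes[i] -= cores
                break
        else:
            nodes.append(cores_per_node - cores)
    return len(nodes)

jobs = [8, 4, 4, 16, 8, 2, 2, 12]       # cores requested per job
print(nodes_needed(jobs, 16, sharing=False))  # 8 nodes, one per job
print(nodes_needed(jobs, 16, sharing=True))   # 4 nodes after packing
```

The flip side, also noted above, is that evaluating these per-node fits makes each scheduling pass more expensive, which is why sharing may not pay off on very large systems.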


Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact | 2017

Challenges of Workload Analysis on Large HPC Systems: A Case Study on NCSA Blue Waters

Joseph P. White; Martins Innus; Matthew D. Jones; Robert L. DeLeon; Nikolay Simakov; Jeffrey T. Palmer; Steven M. Gallo; Thomas R. Furlani; Michael T. Showerman; Robert J. Brunner; Andry Kot; Gregory H. Bauer; Brett Bode; Jeremy Enos; William T. Kramer

Blue Waters [4] is a petascale supercomputer whose mission is to greatly accelerate insight into the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node-hours to process roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period studied (April 1, 2013 to September 30, 2016). This paper describes the work done to collate, process and analyze the data collected on Blue Waters, the design decisions that were made, the tools we created, and the various software engineering problems we encountered and solved. In particular, we describe the data processing challenges unique to Blue Waters engendered by the extremely large jobs it typically executed.


Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure | 2015

TAS view of XSEDE users and usage

Robert L. DeLeon; Thomas R. Furlani; Steven M. Gallo; Joseph P. White; Matthew D. Jones; Abani K. Patra; Martins Innus; Thomas Yearke; Jeffrey T. Palmer; Jeanette M. Sperhac; Ryan Rathsam; Nikolay Simakov; Gregor von Laszewski; Fugang Wang

The Technology Audit Service has developed XDMoD, a resource management tool. This paper utilizes XDMoD and the XDMoD data warehouse it draws from to provide a broad overview of several aspects of XSEDE users and their usage. Some important trends include: 1) in spite of large yearly turnover, there is a core of users persisting over many years; 2) user job submission has shifted from primarily faculty members to students and postdocs; 3) increases in usage in Molecular Biosciences and Materials Research have outstripped those of other fields of science; 4) the distribution of user external funding is bimodal, with one group having a large ratio of external funding to internal XSEDE funding (i.e., CPU cycles) and a second group having a small ratio of external to internal (CPU cycle) funding; 5) user job efficiency is also bimodal, with a group of presumably new users running mainly small, inefficient jobs and another group running larger, more efficient jobs; 6) finally, based on an analysis of citations of published papers, the scientific impact of XSEDE coupled with the service providers is demonstrated by the statistically significant advantage it provides to the research of its users.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Visualizing the wake of aquatic swimmers

Iman Borazjani; Mohsen Daghooghi; Nathaniel S. Barlow; Martins Innus; Adrian Levesque; Alisa Neeman; Matthew D. Jones; Cynthia D. Cornelius

Fish-like swimming is fascinating not only for its fundamental scientific value but also for engineering biomimetically inspired vehicles. Discovering physical principles behind the evolution of different aquatic swimmers can drastically improve the design of such vehicles. We are interested in the evolution of different caudal fin profiles (shapes) because it is hypothesized that most of the thrust force is generated by the caudal fin. In fact, the caudal fin shape varies from homocercal in mackerel to almost trapezoidal in trout and heterocercal in sharks. We investigate whether such shape differences have hydrodynamic implications using numerical simulations. The equations governing the fluid motion are solved in the non-inertial reference frame attached to the fish center of mass (COM) via the curvilinear/immersed boundary method (CURVIB), which is capable of carrying out direct numerical simulation of flows with complex moving boundaries. The motion of the fish body is prescribed based on carangiform kinematics, while the motion of the COM is calculated based on the fluid forces on the fish body through the fluid-structure interaction algorithm of Borazjani et al. (2008) [3]. The reader is referred to Borazjani & Sotiropoulos (2010) [4] for the details of the method. For self-propelled simulations, the virtual swimmers start to undulate in an initially stagnant fluid and the swimming speed is determined based on the forces on the fish body. Therefore, physical parameters based on the swimming velocity change as the swimmer accelerates until the quasi-steady state is reached. The computational domain for the self-propelled fish body simulations in the free stream is a cuboid with dimensions 2L x L x 7L, which is discretized with 5.5 million grid nodes. The domain width 2L and height L are more than ten times the mackerel width 0.2L and height 0.1L, respectively.
The fish is placed 1.5L from the inlet plane in the axial direction and centered in the transverse and vertical directions. The simulations were run in part on our in-house computing cluster, Nami, with a total of 448 computing cores distributed across 28 nodes, each node containing two 8-core AMD Magny-Cours processors (2.0 GHz). The memory available is 2 GB RAM per core (896 GB total), and the nodes are connected through QDR InfiniBand. Some of the simulations were run on the dual-quad-core nodes of the u2 cluster at CCR; these are also connected through QDR InfiniBand. The simulations generate velocity field data in VTK format [5], allowing one to apply ParaView's tetrahedralize algorithm [2] to the 5.5-million-point data set. The result is shown in Figure 1 for a swimming mackerel, where volume-rendered points are colored by the magnitude of the velocity field. The domain has been truncated in the vicinity of the fish and an appropriate colormap has been chosen to emphasize the dynamics local to the fish. For each time step and viewing angle, the tetrahedralize algorithm is applied. A single frame takes (at least) 10 minutes to render on an Intel dual-quad-core node (with 24 GB RAM). An animation of 95 frames was generated in a batch job using off-screen rendering. The animation can be downloaded at [1].
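The hardware and rendering figures quoted above are internally consistent, which a few lines of arithmetic confirm. This is just a sanity check on the numbers stated in the text.

```python
# Sanity-check arithmetic for the cluster and rendering figures quoted above.
nodes = 28
cores_per_node = 2 * 8          # two 8-core Magny-Cours processors per node
ram_per_core_gb = 2

total_cores = nodes * cores_per_node
total_ram_gb = total_cores * ram_per_core_gb
print(total_cores, total_ram_gb)        # 448 cores, 896 GB total

frames = 95
minutes_per_frame = 10                  # stated lower bound per frame
render_hours = frames * minutes_per_frame / 60
print(round(render_hours, 1))           # ~15.8 hours of rendering, minimum
```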

Collaboration


Dive into Martins Innus's collaborations.

Top Co-Authors

Jeffrey T. Palmer

State University of New York System


Thomas Yearke

State University of New York System
