Publication


Featured research published by Pietro Cicotti.


Computers in Biology and Medicine | 2012

A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach

Trilce Estrada; Boyu Zhang; Pietro Cicotti; Roger S. Armen

We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point in the space; the second step builds an octree by assigning an octant identifier to every single point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the most dense octant. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results for docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking.
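
The three-step pipeline above can be sketched in a few lines of Python. This is an illustrative reconstruction under simplifying assumptions (a known bounding box and a fixed subdivision depth), not the authors' Hadoop implementation; `octant_id` and `densest_octant` are hypothetical names.

```python
from collections import Counter

def octant_id(point, lo, hi, depth):
    """Encode a 3D point as an octant identifier by halving the
    bounding box `depth` times; at each level three bits record
    which half of x, y, z the point falls in."""
    code = 0
    lo, hi = list(lo), list(hi)
    for _ in range(depth):
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            code <<= 1
            if point[axis] >= mid:
                code |= 1
                lo[axis] = mid
            else:
                hi[axis] = mid
    return code

def densest_octant(points, lo, hi, depth):
    """Cluster points by octant identifier and return the
    (identifier, population) pair of the densest octant."""
    counts = Counter(octant_id(p, lo, hi, depth) for p in points)
    return counts.most_common(1)[0]
```

A larger `depth` yields finer octants; in the paper the per-point encoding and the counting run as MapReduce map and reduce phases so that very large conformation spaces can be screened.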


ieee international conference on high performance computing data and analytics | 2012

Bamboo: translating MPI applications to a latency-tolerant, data-driven form

Tan Nguyen; Pietro Cicotti; Eric J. Bylaska; Dan Quinlan; Scott B. Baden

We present Bamboo, a custom source-to-source translator that transforms MPI C source into a data-driven form that automatically overlaps communication with available computation. Running on up to 98,304 processors of NERSC's Hopper system, we observe that Bamboo's overlap capability speeds up MPI implementations of a 3D Jacobi iterative solver and Cannon's matrix multiplication. Bamboo's generated code meets or exceeds the performance of hand-optimized MPI, which includes split-phase coding, the method classically employed to hide communication. We achieved our results with only modest amounts of programmer annotation and no intrusive reprogramming of the original application source.
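
The data-driven form Bamboo produces can be illustrated with a toy scheduler: each task fires once all of its inputs have arrived, so execution order is dictated by data availability rather than by blocking sends and receives. This is a minimal Python sketch of the idea, not Bamboo's generated code; all names are invented.

```python
from collections import deque

class Task:
    """A unit that fires once all expected inputs arrive."""
    def __init__(self, name, n_inputs, action):
        self.name, self.n_inputs, self.action = name, n_inputs, action
        self.inbox = []

def execute(tasks, seeds):
    """Deliver seed values, then keep firing whichever task is
    ready; no task ever blocks waiting on a matching receive."""
    results, ready = {}, deque()

    def deliver(name, value):
        task = tasks[name]
        task.inbox.append(value)
        if len(task.inbox) == task.n_inputs:
            ready.append(task)

    for name, value in seeds:
        deliver(name, value)
    while ready:
        task = ready.popleft()
        # each action returns its result plus outgoing messages
        out, messages = task.action(task.inbox)
        results[task.name] = out
        for dest, value in messages:
            deliver(dest, value)
    return results
```

In a real run the "messages" would be MPI transfers in flight while other ready tasks compute, which is where the communication/computation overlap comes from.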


international conference on cluster computing | 2014

Evaluation of emerging memory technologies for HPC, data intensive applications

Amoghavarsha Suresh; Pietro Cicotti; Laura Carrington

DRAM technology has several shortcomings in terms of performance, energy efficiency, and scaling. Several emerging memory technologies have the potential to compensate for the limitations of DRAM when replacing or complementing DRAM in the memory sub-system. In this paper, we evaluate the impact of emerging technologies on HPC and data-intensive workloads by modeling a 5-level hybrid memory hierarchy design. Our results show that 1) an additional level of faster DRAM technology (i.e. EDRAM or HMC) interposed between the last-level cache and DRAM can improve performance and energy efficiency, 2) a non-volatile main memory (i.e. PCM, STT-RAM, or FeRAM) with a small DRAM acting as a cache can reduce cost and energy consumption at large capacities, and 3) a combination of the two approaches, which essentially replaces the traditional DRAM with a small EDRAM or HMC cache between the last-level cache and the non-volatile memory, can provide large capacity together with improved performance and energy efficiency. We also explore a hybrid DRAM-NVM design with a partitioned address space and find that this approach is marginally beneficial compared to the simpler 5-level design. Finally, we generalize our analysis and show the impact of emerging technologies for a range of latency and energy parameters.
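
The benefit of an interposed faster level can be made concrete with a textbook average-memory-access-time (AMAT) model. The latencies and hit rates below are illustrative placeholders, not the paper's measured parameters.

```python
def amat(levels):
    """Average memory access time for a hierarchy described as a
    list of (access_latency_ns, hit_rate) pairs from fastest to
    slowest. Accesses that miss at a level pay its latency and
    fall through; the last level must always hit (hit_rate 1.0)."""
    time, fraction = 0.0, 1.0
    for latency_ns, hit_rate in levels:
        time += fraction * latency_ns   # every access reaching this level pays its latency
        fraction *= (1.0 - hit_rate)    # only the misses continue downward
    return time

# Baseline: last-level cache backed directly by DRAM.
baseline = amat([(10, 0.8), (80, 1.0)])
# Interposing a faster EDRAM/HMC-like level between LLC and DRAM.
hybrid = amat([(10, 0.8), (25, 0.5), (80, 1.0)])
```

The interposed level wins whenever it absorbs enough misses to offset its own latency; the same skeleton extends to energy by substituting energy-per-access for latency.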


international parallel and distributed processing symposium | 2017

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Ivy Bo Peng; Roberto Gioiosa; Gokcen Kestor; Pietro Cicotti; Erwin Laure; Stefano Markidis

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers, and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ∼4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level, and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3× the performance obtained using only DRAM. On the contrary, applications with a random memory access pattern are latency-bound and may suffer performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.
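
The bandwidth-versus-latency behaviour described above follows from Little's law: sustained bandwidth is capped by how many cache-line requests an application keeps in flight across the memory latency. A back-of-the-envelope Python model (all peak-bandwidth and latency numbers are illustrative assumptions, not KNL measurements):

```python
def achieved_gbs(peak_gbs, latency_ns, lines_in_flight, line_bytes=64):
    """Little's law cap: sustained bandwidth <= in-flight bytes / latency."""
    concurrency_cap = lines_in_flight * line_bytes / (latency_ns * 1e-9) / 1e9
    return min(peak_gbs, concurrency_cap)

# Illustrative parameters: DRAM (90 GB/s, 130 ns), MCDRAM (400 GB/s, 170 ns).
streaming_dram   = achieved_gbs(90, 130, lines_in_flight=1000)  # prefetchers keep many lines in flight
streaming_mcdram = achieved_gbs(400, 170, lines_in_flight=1000)
random_dram      = achieved_gbs(90, 130, lines_in_flight=8)     # pointer chasing: little concurrency
random_mcdram    = achieved_gbs(400, 170, lines_in_flight=8)
```

With many requests in flight the streaming case reaches MCDRAM's higher peak, while the low-concurrency random case is slower on MCDRAM because its higher latency dominates; raising `lines_in_flight` via extra hardware threads is exactly the mitigation the paper observes.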


international parallel and distributed processing symposium | 2014

Enabling In-Situ Data Analysis for Large Protein-Folding Trajectory Datasets

Boyu Zhang; Trilce Estrada; Pietro Cicotti

This paper presents a one-pass, distributed method that enables in-situ data analysis for large protein folding trajectory datasets by executing sufficiently fast, avoiding moving trajectory data, and limiting memory usage. First, the method extracts the geometric shape features of each protein conformation in parallel. Then, it classifies sets of consecutive conformations into meta-stable and transition stages using a probabilistic hierarchical clustering method. Lastly, it rebuilds the global knowledge necessary for the intra- and inter-trajectory analysis through a reduction operation. The comparison of our method with a traditional approach for a villin headpiece subdomain shows that our method generates significant improvements in execution time, memory usage, and data movement. Specifically, to analyze the same trajectory consisting of 20,000 protein conformations, our method runs in 41.5 seconds while the traditional approach takes approximately 3 hours, uses 6.9 MB of memory per core while the traditional method uses 16 GB on the single node where the analysis is performed, and communicates only 4.4 KB while the traditional method moves the entire dataset of 539 MB. The overall results in this paper support our claim that our method is suitable for in-situ data analysis of folding trajectories.
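
The one-pass idea can be sketched with a cheap stand-in feature. Below, radius of gyration replaces the paper's shape encoding, and a simple change threshold replaces the probabilistic hierarchical clustering; both substitutions are illustrative, and the O(1)-memory single pass is the point.

```python
import math

def shape_feature(coords):
    """Cheap geometric feature of one conformation: radius of
    gyration of its atom coordinates."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in coords) / n)

def one_pass_segments(stream, threshold):
    """Label consecutive conformations 'meta-stable' when the feature
    changes little, 'transition' otherwise, in a single pass that
    keeps only the previous feature value in memory. The first
    conformation is labeled 'meta-stable' by convention."""
    labels, prev = [], None
    for coords in stream:
        f = shape_feature(coords)
        if prev is None or abs(f - prev) <= threshold:
            labels.append("meta-stable")
        else:
            labels.append("transition")
        prev = f
    return labels
```

Because each trajectory is reduced to a short label sequence, the inter-trajectory reduction step only has to exchange kilobytes rather than the raw coordinates.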


ieee international conference on high performance computing data and analytics | 2012

Reengineering High-throughput Molecular Datasets for Scalable Clustering Using MapReduce

Trilce Estrada; Boyu Zhang; Pietro Cicotti; Roger S. Armen

We propose a linear clustering approach for large datasets of molecular geometries produced by high-throughput molecular dynamics simulations (e.g., protein folding and protein-ligand docking simulations). To this end, we transform each three-dimensional (3D) molecular conformation into a single point in 3D space, reducing the space complexity while still encoding the molecular similarities and geometries. We assign an identifier to each single 3D point mapping a docked ligand, generate a tree from the whole space, and apply a tree-based clustering on the reduced conformation space that identifies the densest hyperspaces. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation datasets not approachable with traditional clustering methods. We analyze results for datasets with different concentrations of optimal solutions, and draw conclusions about the limitations and usability of our method. The advantages of this approach make it attractive for complex applications in real-world high-throughput molecular simulations.
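
The map/shuffle/reduce decomposition is what makes this approach scale in Hadoop; a pure-Python imitation of the phases (a uniform grid stands in for the paper's tree construction, and the function names are invented):

```python
from collections import defaultdict

def map_phase(points, cell=0.25):
    """Map: emit a (grid-cell key, 1) pair for each reduced 3D point."""
    for x, y, z in points:
        yield (int(x // cell), int(y // cell), int(z // cell)), 1

def reduce_phase(pairs):
    """Shuffle + reduce: sum the counts per key, as Hadoop would."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts

def densest_cell(points, cell=0.25):
    """Return the (key, population) pair of the most crowded cell."""
    counts = reduce_phase(map_phase(points, cell))
    return max(counts.items(), key=lambda kv: kv[1])
```

Because map emits independent key/value pairs and reduce is a commutative sum, the computation partitions cleanly across any number of workers.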


international symposium on memory management | 2017

RTHMS: a tool for data placement on hybrid memory system

Ivy Bo Peng; Roberto Gioiosa; Gokcen Kestor; Pietro Cicotti; Erwin Laure; Stefano Markidis

Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all these characteristics within a single memory technology is expensive, and hence future supercomputers will feature different memory technologies side by side. However, it is a complex task to program hybrid-memory systems and to identify the best object-to-memory mapping. We envision that programmers will probably resort to using default configurations that require only minimal interventions on the application code or system settings. In this work, we argue that intelligent, fine-grained data placement can achieve higher performance than default setups. We present an algorithm for data placement on hybrid-memory systems. Our algorithm is based on a set of single-object allocation rules and global data placement decisions. We also present RTHMS, a tool that implements our algorithm and provides recommendations about the object-to-memory mapping. Our experiments on a hybrid memory system, an Intel Knights Landing processor with DRAM and HBM, show that RTHMS is able to achieve higher performance than the default configuration. We believe that RTHMS will be a valuable tool for programmers working on complex hybrid-memory systems.
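
The combination of per-object rules and a global capacity decision can be sketched as a greedy placement pass. The rule below (bandwidth-sensitive objects go to HBM until it fills) is a toy in the spirit of the paper, not RTHMS's actual rule set, and every name is hypothetical.

```python
def place_objects(objects, hbm_capacity_gb):
    """Toy object-to-memory mapping: `objects` is a list of
    (name, size_gb, bandwidth_sensitivity in [0, 1]) tuples.
    Single-object rule: only bandwidth-sensitive objects are HBM
    candidates. Global decision: fill HBM greedily, most sensitive
    objects first; everything else falls back to DRAM."""
    placement, used = {}, 0.0
    for name, size_gb, sensitivity in sorted(objects, key=lambda o: -o[2]):
        if sensitivity > 0.5 and used + size_gb <= hbm_capacity_gb:
            placement[name] = "HBM"
            used += size_gb
        else:
            placement[name] = "DRAM"
    return placement
```

A real tool would derive the sensitivity score from profiled access counts and patterns rather than take it as input.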


knowledge discovery and data mining | 2011

Data intensive analysis on the gordon high performance data and compute system

Robert S. Sinkovits; Pietro Cicotti; Shawn Strande; Mahidhar Tatineni; Paul Rodriguez; Nicole Wolter; Natasha Balac

The Gordon data intensive computing system was designed to handle problems with large memory requirements that cannot easily be solved using standard workstations or distributed memory supercomputers. We describe the unique features of Gordon that make it ideally suited for data mining and knowledge discovery applications: memory aggregation using the vSMP software solution from ScaleMP, I/O nodes containing 4 TB of low-latency flash memory, and a high performance parallel file system with 4 PB capacity. We also demonstrate how a number of standard data mining tools (e.g. Matlab, WEKA, R) can be used effectively on Dash, an early prototype of the full Gordon system.


conference on high performance computing (supercomputing) | 2006

Asynchronous programming with Tarragon

Pietro Cicotti; Scott B. Baden

Tarragon is an actor-based programming model and library for implementing parallel scientific applications requiring fine-grain asynchronous communication. Tarragon raises the level of abstraction by encapsulating run-time services that manage the actor semantics. The workload is over-decomposed into many virtual processes called WorkUnits. WorkUnits become ready for execution after receiving input; scheduling and communication services coordinate WorkUnit execution and management. In order to maintain balanced workloads, Tarragon automatically monitors workload distribution and redistributes as needed. Tarragon is novel in its support for metadata describing run-time virtual process structures used to manage actor semantics. This metadata may be used to guide run-time service policies in order to optimize performance. We are currently applying Tarragon to the MCell cell microphysiology simulator and are considering other applications as well, such as sparse matrix linear algebra.
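
The WorkUnit/scheduler interplay can be caricatured in a few lines of Python: a WorkUnit's handler runs whenever a message arrives, and the scheduler drains the ready queue. This is an illustrative sketch of the actor pattern, not Tarragon's actual API.

```python
from collections import deque

class WorkUnit:
    """An actor-style virtual process: state plus a handler that
    runs on each incoming message."""
    def __init__(self, name, handler, state=0):
        self.name, self.handler, self.state = name, handler, state

class Scheduler:
    """Executes WorkUnits as messages make them ready to run."""
    def __init__(self, units):
        self.units = {u.name: u for u in units}
        self.ready = deque()

    def send(self, dest, msg):
        self.ready.append((dest, msg))

    def run(self):
        while self.ready:
            dest, msg = self.ready.popleft()
            unit = self.units[dest]
            # the handler may update state and send further messages
            unit.handler(unit, msg, self.send)

# Example: two WorkUnits bounce a decrementing counter between them.
def bounce(unit, msg, send):
    unit.state += msg
    if msg > 0:
        send("pong" if unit.name == "ping" else "ping", msg - 1)

sched = Scheduler([WorkUnit("ping", bounce), WorkUnit("pong", bounce)])
sched.send("ping", 3)
sched.run()
```

Over-decomposing into many such units per processor is what gives the runtime the slack to overlap communication and rebalance load.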


international parallel and distributed processing symposium | 2004

DGMonitor: a performance monitoring tool for sandbox-based desktop grid platforms

Pietro Cicotti; Andrew A. Chien

Accurate, continuous resource monitoring and profiling are critical for enabling performance tuning and scheduling optimization. In desktop grid systems that employ sandboxing, these issues are challenging because (1) subjobs inside sandboxes are executed in a virtual computing environment and (2) the state of the virtual computing environment within the sandboxes is reset to empty after each subjob completes. DGMonitor is a monitoring tool that builds a global, accurate, and continuous view of real resource utilization for desktop grids with sandboxing. Our monitoring tool measures performance unobtrusively and reliably, uses a simple performance data model, and is easy to use. Our measurements demonstrate that DGMonitor can scale to large desktop grids (up to 12,000 workers) with low monitoring overhead in terms of resource consumption (less than 0.1%) on desktop PCs. Though we developed DGMonitor with the Entropia DCGrid platform, our tool is easily integrated into other desktop grid systems. In all of these systems, DGMonitor data can support existing and novel information services, particularly for performance tuning and scheduling.
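
The sandbox-reset challenge in point (2) amounts to handling usage counters that jump back to zero between subjobs. A minimal collector sketch, assuming per-worker cumulative CPU-second counters (this is an illustration of the reset-handling idea, not DGMonitor's data model or wire protocol):

```python
class Collector:
    """Aggregates per-worker usage samples into a global view.
    A counter reset (e.g. a sandbox wiped between subjobs) shows up
    as a sample smaller than the previous one; the collector then
    starts a new accumulation segment instead of losing history."""
    def __init__(self):
        self.total = {}   # worker -> accumulated cpu-seconds
        self.last = {}    # worker -> last raw counter seen

    def report(self, worker, cpu_counter):
        prev = self.last.get(worker, 0.0)
        # normal case: add the increment; after a reset, the raw
        # counter itself is the usage since the wipe
        delta = cpu_counter - prev if cpu_counter >= prev else cpu_counter
        self.total[worker] = self.total.get(worker, 0.0) + delta
        self.last[worker] = cpu_counter

    def grid_total(self):
        return sum(self.total.values())
```

Keeping only two floats per worker is what lets such a view stay cheap even at thousands of workers.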

Collaboration

Top co-authors of Pietro Cicotti:

Laura Carrington (San Diego Supercomputer Center)
Boyu Zhang (University of Delaware)
Scott B. Baden (University of California)
Trilce Estrada (University of New Mexico)
Pavan Balaji (Argonne National Laboratory)
Shawn Strande (San Diego Supercomputer Center)
Gokcen Kestor (Pacific Northwest National Laboratory)
Robert S. Sinkovits (United States Naval Research Laboratory)
Allan Snavely (University of California)