Publication


Featured research published by Trilce Estrada.


Journal of Grid Computing | 2009

Performance Prediction and Analysis of BOINC Projects: An Empirical Study with EmBOINC

Trilce Estrada; David P. Anderson

Middleware systems for volunteer computing convert a set of computers that is large and diverse (in terms of hardware, software, availability, reliability, and trustworthiness) into a unified computing resource. This involves a number of scheduling policies and parameters, which have a large impact on throughput and other performance metrics. How can we study and refine these policies? Experimentation in the context of a working project is problematic, and it is difficult to accurately model complex middleware in a conventional simulator. Instead, we use an approach in which the policies being studied are “emulated”, using parts of the actual middleware. In this paper we describe EmBOINC, an emulator based on the BOINC middleware system. EmBOINC simulates a population of volunteered clients (including heterogeneity, churn, availability, and reliability) and emulates the BOINC server components. After describing the design of EmBOINC and its validation, we present three case studies in which the impact of different scheduling policies is quantified in terms of throughput, latency, and starvation metrics.
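
As a rough illustration of what "simulating a population of volunteered clients" can look like, the sketch below draws per-host speed, availability, error rate, and lifetime from simple distributions. The class name, parameters, and distributions are illustrative assumptions, not EmBOINC's actual model or API.

```python
import random

class SimulatedClient:
    """Toy model of one volunteered host: speed, availability, reliability,
    and lifetime (churn) drawn from simple, assumed distributions."""

    def __init__(self, rng):
        self.flops = rng.lognormvariate(22, 1.0)        # heterogeneous speed (FLOP/s)
        self.availability = rng.uniform(0.3, 1.0)       # fraction of time online
        self.error_rate = rng.uniform(0.0, 0.05)        # chance of a bad result
        self.lifetime_days = rng.expovariate(1 / 90.0)  # days before churn

    def turnaround_days(self, job_flops):
        """Wall-clock days to return a job, inflated by unavailability."""
        return job_flops / (self.flops * 86400 * self.availability)

    def result_is_valid(self, rng):
        return rng.random() >= self.error_rate

rng = random.Random(42)
population = [SimulatedClient(rng) for _ in range(10_000)]
job_flops = 3e13  # assumed FLOPs per job
times = sorted(c.turnaround_days(job_flops) for c in population)
print(f"median turnaround: {times[len(times) // 2]:.2f} days")
```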


Cluster Computing and the Grid | 2009

Modeling Job Lifespan Delays in Volunteer Computing Projects

Trilce Estrada; Kevin Reed

Volunteer Computing (VC) projects harness the power of computers owned by volunteers across the Internet to perform hundreds of thousands of independent jobs. In VC projects, the path leading from the generation of jobs to the validation of the job results is characterized by delays hidden in the job lifespan, i.e., distribution delay, in-progress delay, and validation delay. These delays are difficult to estimate because of the dynamic behavior and heterogeneity of VC resources. Misestimating these delays can reduce project throughput and increase job latency in VC projects. In this paper, we evaluate the accuracy of several probabilistic methods for modeling the upper time bounds of these delays. We show how our selected models predict up-and-down trends in traces from existing VC projects. Our models provide valuable insights for selecting project deadlines and making scheduling decisions. By accurately predicting job lifespan delays, our models lead to more efficient resource use, higher project throughput, and lower job latency in VC projects.
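
As a concrete example of modeling an upper time bound probabilistically, the sketch below fits a log-normal distribution to a set of observed delays and returns the 95th-percentile bound. The log-normal choice, the function name, and the sample delays are assumptions made for illustration; the paper evaluates several candidate models against real project traces.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_upper_bound(delays, quantile=0.95):
    """Fit a log-normal distribution to observed delays and return the
    upper time bound at the given quantile (illustrative model choice)."""
    logs = [math.log(d) for d in delays]
    mu, sigma = mean(logs), stdev(logs)
    return math.exp(NormalDist(mu, sigma).inv_cdf(quantile))

# Hypothetical in-progress delays, in hours, from a project trace.
observed = [2.1, 3.8, 5.0, 6.4, 7.7, 9.2, 12.5, 18.0, 24.3, 36.1]
print(f"95% upper bound: {lognormal_upper_bound(observed):.1f} hours")
```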


Computers in Biology and Medicine | 2012

A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach

Trilce Estrada; Boyu Zhang; Pietro Cicotti; Roger S. Armen

We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single point in 3D space; the second step builds an octree by assigning an octant identifier to every point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the densest octant. We adapt our method for MapReduce and implement it in Hadoop. The load balancing, fault tolerance, and scalability of MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results of docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross-docking trials for 24 ligands, each docked into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docked against a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all of these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy for addressing protein flexibility in molecular docking.
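
The octant-identifier encoding can be sketched as a recursive halving of the bounding box, one base-8 digit per level, followed by a count of points per octant. The helper names, the fixed depth, and the toy points below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def octant_id(point, depth=4, lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0)):
    """Encode a 3D point as an octant identifier by recursively halving
    the bounding box `depth` times (one base-8 digit per level)."""
    digits = []
    lo, hi = list(lo), list(hi)
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2
            if point[axis] >= mid:
                digit |= 1 << axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        digits.append(digit)
    return tuple(digits)

# Hypothetical encoded ligand conformations (one 3D point each).
points = [(0.12, 0.80, 0.33), (0.13, 0.81, 0.30), (0.14, 0.79, 0.35),
          (0.90, 0.10, 0.50)]
counts = Counter(octant_id(p) for p in points)
octant, size = counts.most_common(1)[0]
print(f"densest octant {octant} holds {size} conformations")
```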


International Conference on Data Mining | 2014

Time Series Join on Subsequence Correlation

Abdullah Mueen; Hossein Hamooni; Trilce Estrada

We consider the problem of joining two long time series based on their most correlated segments. The two time series can be joined at any location and for arbitrary length. Such join locations and lengths provide useful knowledge about the synchrony of the two time series and have applications in many domains, including environmental monitoring, patient monitoring, and power monitoring. However, join on correlation is a computationally expensive task, especially when the time series are long. The naive algorithm requires O(n^4) computation, where n is the length of the time series. We propose an algorithm, named Jocor, that uses two algorithmic techniques to tackle this complexity. First, the algorithm reuses computation by caching sufficient statistics, and second, it prunes unnecessary correlation computation using admissible heuristics. The algorithm runs orders of magnitude faster than the naive algorithm and enables us to join long time series as well as many small time series. We propose a variant of Jocor for fast approximation and an extension to a GPU-based parallel method that brings the running time down to an interactive level for analytics applications. We show three independent uses of time series join on correlation that are made possible by our algorithm.
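
The first technique, caching sufficient statistics, can be illustrated with prefix sums: once the running sums of x, y, x^2, y^2, and xy are cached, the Pearson correlation of any aligned length-m window costs O(1). The sketch below covers only the lag-0 aligned case as an assumption for brevity; Jocor additionally handles arbitrary join offsets and prunes with admissible heuristics.

```python
import math
from itertools import accumulate

def window_correlations(x, y, m):
    """Pearson correlation of every aligned length-m window of x and y,
    each computed in O(1) from cached prefix sums (sufficient statistics)."""
    n = len(x)
    Sx  = [0.0] + list(accumulate(x))
    Sy  = [0.0] + list(accumulate(y))
    Sxx = [0.0] + list(accumulate(v * v for v in x))
    Syy = [0.0] + list(accumulate(v * v for v in y))
    Sxy = [0.0] + list(accumulate(a * b for a, b in zip(x, y)))
    out = []
    for i in range(n - m + 1):
        sx, sy = Sx[i + m] - Sx[i], Sy[i + m] - Sy[i]
        cov = (Sxy[i + m] - Sxy[i]) - sx * sy / m
        vx = (Sxx[i + m] - Sxx[i]) - sx * sx / m
        vy = (Syy[i + m] - Syy[i]) - sy * sy / m
        out.append(cov / math.sqrt(vx * vy))
    return out

x = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0]
print(max(window_correlations(x, y, 4)))  # y tracks x, so r is ~1.0
```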


Workshop on Parallel and Distributed Simulation | 2007

SimBA: A Discrete Event Simulator for Performance Prediction of Volunteer Computing Projects

Andre Kerstens; Trilce Estrada; David A. Flores; Patricia J. Teller

SimBA (Simulator of BOINC Applications) is a discrete event simulator that models the main functions of BOINC, a well-known framework used in Volunteer Computing (VC) projects. SimBA simulates the generation and distribution of tasks that are executed in a highly volatile, heterogeneous, and distributed environment, as well as the collection and validation of completed tasks. To understand the strengths and weaknesses of BOINC under distinct scenarios, project designers want to study and quantify project performance without negatively affecting people in the VC community. Although this is not possible on production systems, it is possible using SimBA, which is capable of testing a wide range of hypotheses in a short period of time. Our experience to date indicates that SimBA is a reliable tool for performance prediction of VC projects. Preliminary results show that SimBA's predictions of Predictor@Home performance are within approximately 5% of the performance reported by this BOINC project.
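
A discrete event simulator of this kind is typically built around a time-ordered event queue. The minimal skeleton below, whose event types, rates, and validation rule are illustrative assumptions rather than SimBA's actual logic, shows a generate/complete/validate cycle driven by a heap of timestamped events.

```python
import heapq
import random

rng = random.Random(7)
events = []  # min-heap of (time, seq, event_type)
seq = 0

def schedule(t, kind):
    """Push an event; seq breaks ties so heap comparisons never fail."""
    global seq
    heapq.heappush(events, (t, seq, kind))
    seq += 1

schedule(0.0, "generate")
completed = validated = 0
HORIZON = 1000.0  # simulated time units

while events:
    now, _, kind = heapq.heappop(events)
    if now > HORIZON:
        break
    if kind == "generate":
        # Hand the task to a (simulated) volatile host, then keep generating.
        schedule(now + rng.expovariate(1 / 20.0), "complete")
        schedule(now + 1.0, "generate")
    elif kind == "complete":
        completed += 1
        if rng.random() > 0.05:  # assume ~5% of results fail validation
            validated += 1

print(f"completed={completed} validated={validated}")
```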


International Parallel and Distributed Processing Symposium | 2009

EmBOINC: An emulator for performance analysis of BOINC projects

Trilce Estrada; Kevin Reed; David P. Anderson

BOINC is a platform for volunteer computing. The server component of BOINC embodies a number of scheduling policies and parameters that have a large impact on a project's throughput and other performance metrics. We have developed a system, EmBOINC, for studying these policies and parameters. EmBOINC uses a hybrid approach: it simulates a population of volunteered clients (including heterogeneity, churn, availability, and reliability) and it emulates the server component; that is, it uses the actual server software and its associated database. This paper describes the design of EmBOINC and validates its results against trace data from an existing BOINC project.


Computing Frontiers | 2008

A distributed evolutionary method to design scheduling policies for volunteer computing

Trilce Estrada; Olac Fuentes

Volunteer Computing (VC) is a paradigm that uses idle cycles from computing resources donated by volunteers and connected through the Internet to compute large-scale, loosely-coupled simulations. A big challenge in VC projects is the scheduling of work-units across heterogeneous, volatile, and error-prone computers. The design of effective scheduling policies for VC projects involves subjective and time-demanding tuning that is driven by the knowledge of the project designer. VC projects need a faster, project-independent method to automate the design of schedulers. To automatically generate a scheduling policy, we must explore the extremely large space of syntactically valid policies; given the size of this search space, exhaustive search is not feasible. Thus, in this paper we propose an evolutionary method to automatically generate a set of scheduling policies that are project-independent, minimize errors, and maximize throughput in VC projects. Our method includes a genetic algorithm in which the representation of individuals, the fitness function, and the genetic operators are specifically tailored to obtain effective policies in a short time. The effectiveness of our method is evaluated with SimBA, a Simulator of BOINC Applications. Unlike manually designed scheduling policies, which often perform well only for the specific project they were designed for and require months of tuning, the scheduling policies generated by our method in a time window of one week provide better overall throughput across the different VC projects considered in this work.
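
To make the genetic-algorithm loop concrete, the sketch below evolves policies represented as weight vectors over host attributes, with elitist selection, uniform crossover, and Gaussian mutation. This is a strong simplification under stated assumptions: the representation, the stub fitness function, and all parameters here are invented for illustration, whereas the paper evolves full syntactically valid policy expressions and evaluates fitness by simulation in SimBA.

```python
import random

rng = random.Random(1)

def simulated_throughput(policy):
    """Stub fitness: reward policies favoring fast, reliable hosts.
    A real evaluation would run the policy inside a simulator like SimBA."""
    speed_w, rel_w, avail_w = policy
    return 3 * speed_w + 2 * rel_w + avail_w - 0.5 * abs(speed_w - rel_w)

def mutate(policy):
    """Gaussian mutation of each weight."""
    return tuple(w + rng.gauss(0, 0.1) for w in policy)

def crossover(a, b):
    """Uniform crossover: pick each weight from one parent at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

# Each individual scores a host by (speed, reliability, availability) weights.
pop = [tuple(rng.random() for _ in range(3)) for _ in range(50)]
for generation in range(100):
    pop.sort(key=simulated_throughput, reverse=True)
    elite = pop[:10]  # elitist selection
    pop = elite + [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                   for _ in range(40)]

print("best policy weights:", max(pop, key=simulated_throughput))
```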


Conference on High Performance Computing (Supercomputing) | 2006

SimBA: A Discrete Event Simulator for Performance Prediction of Volunteer Computing Projects

David A. Flores; Trilce Estrada; Patricia J. Teller; Andre Kerstens

SimBA (Simulator of BOINC Applications) is a discrete event simulator that accurately models the main functions of BOINC, a master-worker runtime framework. SimBA generates, distributes, and monitors tasks executed in a highly volatile, heterogeneous, and distributed environment such as Volunteer Computing (VC). In addition, it collects and validates the results of executed tasks. To understand the strengths and weaknesses of BOINC under distinct scenarios, project designers must study and quantify its performance without affecting the VC community. Although this is not possible on production systems, it is possible using SimBA, which is capable of testing a wide range of hypotheses in a short period of time. Our experience to date indicates that SimBA is a reliable tool for performance prediction of VC projects. Preliminary results show that SimBA's predictions of Predictor@Home performance are within approximately 5% of the performance reported by this BOINC project.


International Parallel and Distributed Processing Symposium | 2014

Enabling In-Situ Data Analysis for Large Protein-Folding Trajectory Datasets

Boyu Zhang; Trilce Estrada; Pietro Cicotti

This paper presents a one-pass, distributed method that enables in-situ data analysis for large protein folding trajectory datasets by executing sufficiently fast, avoiding movement of trajectory data, and limiting memory usage. First, the method extracts the geometric shape features of each protein conformation in parallel. Then, it classifies sets of consecutive conformations into meta-stable and transition stages using a probabilistic hierarchical clustering method. Lastly, it rebuilds the global knowledge necessary for the intra- and inter-trajectory analysis through a reduction operation. A comparison of our method with a traditional approach for a villin headpiece subdomain shows that our method yields significant improvements in execution time, memory usage, and data movement. Specifically, to analyze the same trajectory consisting of 20,000 protein conformations, our method runs in 41.5 seconds while the traditional approach takes approximately 3 hours; it uses 6.9 MB of memory per core while the traditional method uses 16 GB on the single node where the analysis is performed; and it communicates only 4.4 KB while the traditional method moves the entire 539 MB dataset. The overall results in this paper support our claim that our method is suitable for in-situ data analysis of folding trajectories.
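
The one-pass structure can be sketched as: compute one scalar shape feature per conformation as it streams by, then label consecutive conformations by how much the feature changes. The radius-of-gyration feature, the threshold rule, and the toy trajectory below are illustrative stand-ins for the paper's richer features and probabilistic hierarchical clustering.

```python
import math

def radius_of_gyration(coords):
    """One simple geometric shape feature per conformation; an illustrative
    stand-in for the richer features the paper extracts."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in coords) / n)

def label_stages(features, eps=0.05):
    """One pass over consecutive conformations: small feature changes are
    labeled meta-stable, large jumps mark transitions (assumed rule)."""
    stages = ["meta-stable"]
    for prev, cur in zip(features, features[1:]):
        stages.append("transition" if abs(cur - prev) > eps else "meta-stable")
    return stages

# Hypothetical trajectory: 5 conformations of a 3-atom toy "protein".
trajectory = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.1)],
              [(0.0, 0.0, 0.0), (0.9, 0.8, 0.0), (1.2, 1.5, 0.3)],
              [(0.0, 0.0, 0.0), (0.9, 0.8, 0.1), (1.2, 1.5, 0.4)],
              [(0.0, 0.0, 0.0), (0.9, 0.9, 0.1), (1.1, 1.5, 0.4)]]
features = [radius_of_gyration(c) for c in trajectory]  # streamed, not stored
print(label_stages(features))
```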


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Reengineering High-throughput Molecular Datasets for Scalable Clustering Using MapReduce

Trilce Estrada; Boyu Zhang; Pietro Cicotti; Roger S. Armen

We propose a linear clustering approach for large datasets of molecular geometries produced by high-throughput molecular dynamics simulations (e.g., protein folding and protein-ligand docking simulations). To this end, we transform each three-dimensional (3D) molecular conformation into a single point in 3D space, reducing the space complexity while still encoding the molecular similarities and geometries. We assign an identifier to each single 3D point mapping a docked ligand, generate a tree over the whole space, and apply a tree-based clustering on the reduced conformation space that identifies the densest hyperspaces. We adapt our method for MapReduce and implement it in Hadoop. The load balancing, fault tolerance, and scalability of MapReduce allow screening of very large conformation datasets not approachable with traditional clustering methods. We analyze results for datasets with different concentrations of optimal solutions and draw conclusions about the limitations and usability of our method. The advantages of this approach make it attractive for complex applications in real-world high-throughput molecular simulations.
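
Cast as MapReduce, the method's map phase can emit a tree-cell identifier for each encoded conformation, and its reduce phase can sum counts per cell; the densest cell then identifies the dominant cluster. The sketch below simulates the map -> shuffle -> reduce pipeline in memory; the function names, grid depth, and sample points are illustrative assumptions, not the authors' Hadoop code.

```python
from collections import defaultdict

def mapper(conformation_point, depth=3):
    """Map phase: reduce a conformation (already encoded as a 3D point in
    the unit cube) to its tree-cell identifier and emit (cell_id, 1)."""
    cell = tuple(min(int(c * 2 ** depth), 2 ** depth - 1)
                 for c in conformation_point)
    yield cell, 1

def reducer(cell, counts):
    """Reduce phase: the density of a cell is the number of points in it."""
    yield cell, sum(counts)

# In-memory stand-in for Hadoop's map -> shuffle -> reduce pipeline.
points = [(0.11, 0.82, 0.31), (0.12, 0.80, 0.29), (0.13, 0.84, 0.33),
          (0.88, 0.14, 0.52)]
shuffle = defaultdict(list)
for p in points:
    for key, value in mapper(p):
        shuffle[key].append(value)
densities = dict(kv for key, values in shuffle.items()
                 for kv in reducer(key, values))
print("densest cell:", max(densities, key=densities.get))
```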

Collaboration


Dive into Trilce Estrada's collaborations.

Top Co-Authors

Pietro Cicotti, University of California
Boyu Zhang, University of Delaware
Roger S. Armen, Thomas Jefferson University
David A. Flores, University of Texas at El Paso
Jeremy Benson, University of New Mexico
Patricia J. Teller, University of Texas at Austin
Xinyu Chen, University of New Mexico
Andre Kerstens, University of Texas at El Paso