Publication


Featured research published by Cristian Coarfa.


ACM Transactions on Computer Systems | 2006

Performance analysis of TLS Web servers

Cristian Coarfa; Peter Druschel; Dan S. Wallach

TLS is the protocol of choice for securing today's e-commerce and online transactions, but adding TLS to a Web server imposes a significant overhead relative to an insecure Web server on the same platform. We perform a comprehensive study of the performance costs of TLS. Our methodology is to profile TLS Web servers with trace-driven workloads, replace individual components inside TLS with no-ops, and measure the observed increase in server throughput. We estimate the relative costs of each TLS processing stage, identifying the areas for which future optimizations would be worthwhile. Our results show that while the RSA operations represent the largest performance cost in TLS Web servers, they do not solely account for TLS overhead. RSA accelerators are effective for e-commerce site workloads since they experience low TLS session reuse. Accelerators appear to be less effective for sites where all the requests are handled by a TLS server because they have a higher session reuse rate. In this case, investing in a faster CPU might provide a greater boost in performance. Our experiments show that having a second CPU is at least as useful as an RSA accelerator. Our results seem to suggest that, as CPUs become faster, the cryptographic costs of TLS will become dwarfed by the CPU costs of the nonsecurity aspects of a Web server. Optimizations aimed at general purpose Web servers should continue to be a focus of research and would benefit secure Web servers as well.
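The no-op methodology in this abstract can be illustrated with a small calculation. The sketch below is not the authors' code: it assumes a CPU-bound server whose per-request time is the inverse of its throughput, and the function name `cost_share` and the throughput figures are hypothetical, chosen only for illustration.

```python
def cost_share(baseline_tput: float, ablated_tput: float) -> float:
    """Estimate the fraction of per-request time spent in one component.

    baseline_tput: requests/second with full TLS processing.
    ablated_tput:  requests/second with the component replaced by a no-op.
    Assumption (not from the paper): the server is CPU-bound, so the
    per-request time is 1 / throughput.
    """
    if baseline_tput <= 0 or ablated_tput <= 0:
        raise ValueError("throughputs must be positive")
    baseline_time = 1.0 / baseline_tput
    ablated_time = 1.0 / ablated_tput
    # Time saved by the no-op, relative to the full per-request time.
    return (baseline_time - ablated_time) / baseline_time


# Hypothetical numbers: throughput doubles when one stage becomes a no-op,
# so that stage is estimated to account for half of the per-request time.
share = cost_share(baseline_tput=100.0, ablated_tput=200.0)
print(f"estimated share of request time: {share:.0%}")
```

Under this assumption, a larger throughput increase after removing a stage implies a larger estimated cost for that stage.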


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

An Evaluation of Global Address Space Languages: Co-array Fortran and Unified Parallel C

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; François Cantonnet; Tarek A. El-Ghazawi; Ashrujit Mohanti; Yiyi Yao; Daniel G. Chavarría-Miranda

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. However, our experiments uncovered some significant performance bottlenecks of UPC codes on all platforms. We account for the root causes limiting UPC performance such as the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that they can be remedied with language extensions, new synchronization constructs, and, finally, adequate optimizations by the back-end C compilers.


International Conference on Parallel Architectures and Compilation Techniques | 2004

A Multi-Platform Co-Array Fortran Compiler

Yuri Dotsenko; Cristian Coarfa; John M. Mellor-Crummey

Co-array Fortran (CAF) - a small set of extensions to Fortran 90 - is an emerging model for scalable, global address space parallel programming. CAF's global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for managing the details of communication from developers to compilers. This paper describes CAFC - a prototype implementation of an open-source, multiplatform CAF compiler that generates code well-suited for today's commodity clusters. The CAFC compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. The paper describes key details of CAFC's approach to generating efficient code for multiple platforms. Experiments compare the performance of CAF and MPI versions of several NAS parallel benchmarks on an Alpha cluster with a Quadrics interconnect, an Itanium 2 cluster with a Myrinet 2000 interconnect, and an Itanium 2 cluster with a Quadrics interconnect. These experiments show that CAFC compiles CAF programs into code that delivers performance roughly equal to that of hand-optimized MPI programs.


Languages and Compilers for Parallel Computing | 2003

Co-array Fortran Performance and Potential: An NPB Experimental Study

Cristian Coarfa; Yuri Dotsenko; Jason Eckhardt; John M. Mellor-Crummey

Co-array Fortran (CAF) is an emerging model for scalable, global address space parallel programming that consists of a small set of extensions to the Fortran 90 programming language. Compared to MPI, the widely-used message-passing programming model, CAF’s global address space programming model simplifies the development of single-program-multiple-data parallel programs by shifting the burden for choreographing and optimizing communication from developers to compilers. This paper describes an open-source, portable, and retargetable CAF compiler under development at Rice University that is well-suited for today’s high-performance clusters. Our compiler translates CAF into Fortran 90 plus calls to one-sided communication primitives. Preliminary experiments comparing CAF and MPI versions of several of the NAS parallel benchmarks on an Itanium 2 cluster with a Myrinet 2000 interconnect show that our CAF compiler delivers performance that is roughly equal to or, in many cases, better than that of programs parallelized using MPI, even though support for global optimization of communication has not yet been implemented in our compiler.


The Journal of Supercomputing | 2006

Experiences with Sweep3D implementations in Co-array Fortran

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey

As part of the recent focus on increasing the productivity of parallel application developers, Co-array Fortran (CAF) has emerged as an appealing alternative to the Message Passing Interface (MPI). CAF belongs to the family of global address space parallel programming languages; such languages provide the abstraction of globally addressable memory accessed using one-sided communication. At Rice University we are developing cafc, an open source, multiplatform CAF compiler. Our earlier studies show that cafc-compiled CAF programs achieve similar performance to that of corresponding MPI codes for the NAS Parallel Benchmarks. In this paper, we present a study of several CAF implementations of Sweep3D on four modern architectures. We analyze the impact of using one-sided communication in Sweep3D, identify potential sources of inefficiencies, and suggest ways to address them. Our results show that we achieve comparable performance to that of the MPI version on three cluster-based architectures and outperform it by up to 10% on the SGI Altix 3000.


Principles and Practice of Constraint Programming | 2000

Random 3-SAT: The Plot Thickens

Cristian Coarfa; Demetrios D. Demopoulos; Alfonso San Miguel Aguirre; Devika Subramanian; Moshe Y. Vardi

This paper presents an experimental investigation of the following questions: how does the average-case complexity of random 3-SAT, understood as a function of the order (number of variables) for fixed density (ratio of number of clauses to order) instances, depend on the density? Is there a phase transition in which the complexity shifts from polynomial to exponential? Is the transition dependent on or independent of the solver? To study these questions, we gather median and mean running times for a large collection of random 3-SAT problems while systematically varying both densities and the order of the instances. We use three different complete SAT solvers, embodying very different underlying algorithms: GRASP, CPLEX, and CUDD. We observe new phase transitions for all three solvers, where the median running time shifts from polynomial in the order to exponential. The location of the phase transition appears to be solver-dependent. While GRASP and CPLEX shift from polynomial to exponential complexity at a density of about 3.8, CUDD exhibits this transition between densities of 0.1 and 0.5. We believe these experimental observations are important for understanding the computational complexity of random 3-SAT, and can be used as a justification for developing density-aware solvers for 3-SAT.
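The terms order and density used in this abstract can be made concrete with a small instance generator. This is a minimal sketch rather than the authors' experimental setup: the function name `random_3sat`, the choice of three distinct variables per clause, and the signed-integer literal encoding are all assumptions made for illustration.

```python
import random


def random_3sat(num_vars, density, seed=None):
    """Generate a random 3-SAT instance at a fixed density.

    num_vars: the order (number of variables), at least 3.
    density:  ratio of number of clauses to order.
    Returns a list of clauses; each clause is a tuple of three non-zero
    integers, where k means variable k and -k means its negation.
    """
    if num_vars < 3:
        raise ValueError("need at least 3 variables")
    rng = random.Random(seed)
    num_clauses = round(density * num_vars)
    clauses = []
    for _ in range(num_clauses):
        # Assumption: each clause uses three distinct variables.
        variables = rng.sample(range(1, num_vars + 1), 3)
        # Negate each variable independently with probability 1/2.
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


# An order-50 instance at density 3.8 has 190 clauses.
instance = random_3sat(num_vars=50, density=3.8, seed=0)
print(len(instance), instance[:2])
```

Sweeping `density` and `num_vars` over a grid and timing a solver on each generated instance would reproduce the general shape of the experiment the abstract describes.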


IEEE International Conference on High Performance Computing Data and Analytics | 2004

Experiences with Co-array Fortran on Hardware Shared Memory Platforms

Yuri Dotsenko; Cristian Coarfa; John M. Mellor-Crummey; Daniel G. Chavarría-Miranda

When performing source-to-source compilation of Co-array Fortran (CAF) programs into SPMD Fortran 90 codes for shared-memory multiprocessors, there are several ways of representing and manipulating data at the Fortran 90 language level. We describe a set of implementation alternatives and evaluate their performance implications for CAF variants of the STREAM, Random Access, Spark98, and NAS MG and SP benchmarks. We compare the performance of library-based implementations of one-sided communication with fine-grain communication that accesses remote data using load and store operations. Our experiments show that using application-level loads and stores for fine-grain communication can improve performance by as much as a factor of 24; however, codes requiring only coarse-grain communication can achieve better performance by using an architecture's tuned memcpy for bulk data movement.


International Conference on Parallel and Distributed Systems | 2005

PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction

Cristian Coarfa; Yuri Dotsenko; John M. Mellor-Crummey; Luay Nakhleh; Usman Roshan

Accurate reconstruction of phylogenetic trees very often involves solving hard optimization problems, particularly the maximum parsimony (MP) and maximum likelihood (ML) problems. Various heuristics have been devised for solving these two problems; however, they obtain good results within reasonable time only on small datasets. This has been a major impediment for large-scale phylogeny reconstruction, particularly for the effort to assemble the Tree of Life - the evolutionary relationship of all organisms on earth. Roshan et al. recently introduced Rec-I-DCM3, an efficient and accurate meta-method for solving the MP problem on large datasets of up to 14,000 taxa. Nonetheless, a drastic improvement in Rec-I-DCM3's performance is still needed in order to achieve similar (or better) accuracy on datasets at the scale of the Tree of Life. In this paper, we improve the performance of Rec-I-DCM3 via parallelization. Experimental results demonstrate that our parallel method, PRec-I-DCM3, achieves significant improvements, both in speed and accuracy, over its sequential counterpart.


International Journal of Bioinformatics Research and Applications | 2006

PRec-I-DCM3: a parallel framework for fast and accurate large-scale phylogeny reconstruction

Yuri Dotsenko; Cristian Coarfa; Luay Nakhleh; John M. Mellor-Crummey; Usman Roshan

Accurate reconstruction of phylogenetic trees very often involves solving hard optimization problems, particularly the maximum parsimony (MP) and maximum likelihood (ML) problems. Various heuristics have been devised for solving these two problems; however, they obtain good results within reasonable time only on small datasets. This has been a major impediment for large-scale phylogeny reconstruction, particularly for the effort to assemble the Tree of Life - the evolutionary relationship of all organisms on earth. Roshan et al. recently introduced Rec-I-DCM3, an efficient and accurate meta-method for solving the MP problem on large datasets of up to 14,000 taxa. Nonetheless, a drastic improvement in Rec-I-DCM3's performance is still needed in order to achieve similar (or better) accuracy on datasets at the scale of the Tree of Life. In this paper, we improve the performance of Rec-I-DCM3 via parallelization. Experimental results demonstrate that our parallel method, PRec-I-DCM3, achieves significant improvements, both in speed and accuracy, over its sequential counterpart.


Network and Distributed System Security Symposium | 2002

Performance Analysis of TLS Web Servers

Cristian Coarfa; Peter Druschel; Dan S. Wallach

Collaboration


Dive into Cristian Coarfa's collaborations.

Top Co-Authors

Daniel G. Chavarría-Miranda

Pacific Northwest National Laboratory

Usman Roshan

New Jersey Institute of Technology
