Anthony Simonet
University of Lyon
Publication
Featured research published by Anthony Simonet.
Future Generation Computer Systems | 2015
Anthony Simonet; Gilles Fedak; Matei Ripeanu
The Big Data challenge consists of managing, storing, analyzing and visualizing huge and ever-growing data sets to extract sense and knowledge. As the volume of data grows exponentially, managing these data becomes correspondingly more complex. A key point is to handle the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication, deletion, etc. Indeed, data-intensive applications span a large variety of devices and e-infrastructures, which implies that many systems are involved in data management and processing. We propose Active Data, a programming model to automate and improve the expressiveness of data management applications. We first define the concept of data life cycle and introduce a formal model that exposes the data life cycle across heterogeneous systems and infrastructures. The Active Data programming model allows code execution at each stage of the data life cycle: routines provided by programmers are executed when a set of events (creation, replication, transfer, deletion) happens to any data. We implement and evaluate the model with four use cases: a storage cache for Amazon S3, a cooperative sensor network, an incremental implementation of the MapReduce programming model, and automated data provenance tracking across heterogeneous systems. Altogether, these scenarios illustrate the suitability of the model for programming applications that manage distributed and dynamic data sets. We also show that applications that do not leverage the data life cycle can still benefit from Active Data to improve their performance. We present a formal model to represent the life cycle of data distributed and replicated on many systems. We leverage this model to propose a programming model that allows users to react to life cycle progression. We illustrate the approach with examples of applications that we programmed with this model.
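The event-driven idea in this abstract, where programmer-supplied routines run when creation, replication, transfer, or deletion events happen to data, can be illustrated with a minimal Python sketch. This is not the Active Data API: the `LifeCycle` class, the `on` and `publish` methods, and the `Event` names are all hypothetical, and the paper's formal life cycle model is reduced here to a plain per-item event history.

```python
from collections import defaultdict
from enum import Enum


class Event(Enum):
    """The four life cycle events named in the abstract."""
    CREATION = "creation"
    REPLICATION = "replication"
    TRANSFER = "transfer"
    DELETION = "deletion"


class LifeCycle:
    """Hypothetical registry: runs user routines when events happen to data."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event -> list of routines
        self.history = defaultdict(list)    # data id -> events seen so far

    def on(self, event):
        """Decorator that subscribes a routine to one kind of event."""
        def register(handler):
            self._handlers[event].append(handler)
            return handler
        return register

    def publish(self, event, data_id):
        """Record the event, then call every routine subscribed to it."""
        self.history[data_id].append(event)
        for handler in self._handlers[event]:
            handler(data_id)


life_cycle = LifeCycle()
replica_count = defaultdict(int)


@life_cycle.on(Event.REPLICATION)
def count_replica(data_id):
    # Keep a running count of replicas for each data item.
    replica_count[data_id] += 1


@life_cycle.on(Event.DELETION)
def forget(data_id):
    # Drop bookkeeping once the data item is deleted.
    replica_count.pop(data_id, None)


# A storage system (simulated here) reports what happens to "dataset-1".
life_cycle.publish(Event.CREATION, "dataset-1")
life_cycle.publish(Event.REPLICATION, "dataset-1")
life_cycle.publish(Event.REPLICATION, "dataset-1")
```

In this sketch the systems that hold the data only publish events, and the management logic lives in the subscribed routines, which is one way to read the data-centric, event-driven style the abstract describes.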
parallel, distributed and network-based processing | 2015
Anthony Simonet; Kyle Chard; Gilles Fedak; Ian T. Foster
Modern scientific experiments often involve multiple storage and computing platforms, software tools, and analysis scripts. The resulting heterogeneous environments make data management operations challenging: the significant number of events and the absence of data integration make it difficult to track data provenance, manage sophisticated analysis processes, and recover from unexpected situations. Current approaches often require costly human intervention and are inherently error-prone. The difficulties inherent in managing and manipulating such large and highly distributed datasets also limit automated sharing and collaboration. We study a real-world e-Science application involving terabytes of data, using three different analysis and storage platforms, and a number of applications and analysis processes. We demonstrate that using a specialized data life cycle and programming model, Active Data, we can easily implement global progress monitoring and sharing, recover from unexpected events, and automate a range of tasks.
petascale data storage workshop | 2013
Anthony Simonet; Gilles Fedak; Matei Ripeanu; Samer Al-Kiswany
Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging, requiring support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to support tens of sites and petabytes of data. In this paper, we argue that data management for data-intensive science applications requires a fundamentally different management approach from the current ad hoc, task-centric approach. We propose Active Data, a fundamentally novel paradigm for data life cycle management. Active Data follows two principles: it is data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized challenging data-intensive science use cases.
Archive | 2012
Anthony Simonet; Gilles Fedak; Matei Ripeanu
poster in Computing in High Energy and Nuclear Physics (CHEP'12) | 2012
Oleg Lodygensky; Etienne Urbah; Simon Dadoun; Anthony Simonet; Gilles Fedak; Simon Delamare; Derrick Kondo; Laurent Duflot; Xavier Garrido
ieee international conference on smart city socialcom sustaincom | 2015
Haiwu He; Anthony Simonet; Julio Anjos; Jose-Francisco Saray; Gilles Fedak; Bing Tang; Lu Lu; Xuanhua Shi; Hai Jin; Mircea Moca; Gheorghe Cosmin Silaghi; Asma Ben Cheikh; Heithem Abbes
Archive | 2015
Gilles Fedak; Julio Cesar Santos dos Anjos; Anthony Simonet
Archive | 2015
Gilles Fedak; Anthony Simonet
Archive | 2015
Gilles Fedak; Anthony Simonet
Archive | 2015
Gilles Fedak; Anthony Simonet