Alain Ketterlin | Researchain

Publication

Featured research published by Alain Ketterlin.

Pattern Recognition | 2011

A global averaging method for dynamic time warping, with applications to clustering

François Petitjean; Alain Ketterlin; Pierre Gançarski

Mining sequential data is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. Most works in this field are centred on the definition and use of a distance (or, at least, a similarity measure) between sequences of elements. A measure called dynamic time warping (DTW) currently seems to be the most relevant for a wide range of applications. This article is about the use of DTW in data mining algorithms, and focuses on the computation of an average of a set of sequences. Averaging is an essential tool for the analysis of data. For example, the K-means clustering algorithm repeatedly computes such an average, and needs to provide a description of the clusters it forms. Averaging is here a crucial step, which must be sound in order to make algorithms work accurately. When dealing with sequences, especially when sequences are compared with DTW, averaging is not a trivial task. Starting with existing techniques developed around DTW, the article suggests an analysis framework to classify averaging techniques. It then proceeds to study the two major questions raised by the framework. First, we develop a global technique for averaging a set of sequences. This technique is original in that it avoids using iterative pairwise averaging. It is thus insensitive to ordering effects. Second, we describe a new strategy to reduce the length of the resulting average sequence. This has a favourable impact on performance, but also on the relevance of the results. Both aspects are evaluated on standard datasets, and the evaluation shows that they compare favourably with existing methods. The article ends by describing the use of averaging in clustering. The last section also introduces a new application domain, namely the analysis of satellite image time series, where data mining techniques provide an original approach.
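
For readers unfamiliar with DTW, the measure mentioned in the abstract can be sketched with the textbook dynamic-programming recurrence. This is only an illustration of the distance itself, not the global averaging method proposed in the article; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the best of: advance in a, advance in b, advance in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Because DTW may align one element with several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which is part of why averaging under DTW is not a simple element-wise mean.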

Symposium on Code Generation and Optimization | 2008

Prediction and trace compression of data access addresses through nested loop recognition

Alain Ketterlin; Philippe Clauss

This paper describes an algorithm that takes a trace (i.e., a sequence of numbers or vectors of numbers) as input, and from that produces a sequence of loop nests that, when run, produces exactly the original sequence. The input format is suitable for any kind of program execution trace, and the output conforms to standard models of loop nests. The first, most obvious, use of such an algorithm is for program behavior modeling for any measured quantity (memory accesses, number of cache misses, etc.). Finding loops amounts to detecting periodic behavior and provides an explanatory model. The second application is trace compression, i.e., storing the loop nests instead of the original trace. Decompression consists of running the loops, which is easy and fast. A third application is value prediction. Since the algorithm forms loops while reading input, it is able to extrapolate the loop under construction to predict further incoming values. Throughout the paper, we provide examples that explain our algorithms. Moreover, we evaluate trace compression and value prediction on a subset of the SPEC2000 benchmarks.
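
The decompression side of this idea ("running the loops") can be illustrated with a small sketch. It assumes a simplified model in which a loop nest is a base value plus one `(trip_count, stride)` pair per loop level; the function `run_loop_nest` and this representation are hypothetical and do not reproduce the paper's recognition algorithm or its output format.

```python
from itertools import product

def run_loop_nest(base, loops):
    """Regenerate a trace from a compact loop-nest description.

    Assumption: `loops` is a list of (trip_count, stride) pairs, outermost
    loop first, and each value is base + sum(index * stride).
    """
    ranges = [range(count) for count, _ in loops]
    strides = [stride for _, stride in loops]
    # product() varies the last (innermost) index fastest, like nested loops
    return [base + sum(i * s for i, s in zip(indices, strides))
            for indices in product(*ranges)]

# Six addresses described by five numbers: base 1000, 2 x 3 iterations
trace = run_loop_nest(1000, [(2, 100), (3, 4)])
```

Here `trace` is `[1000, 1004, 1008, 1100, 1104, 1108]`; for long, regular traces the loop description stays the same size while the trace grows, which is the source of the compression.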

High Performance Embedded Architectures and Compilers | 2012

Polyhedral parallelization of binary code

Benoît Pradelle; Alain Ketterlin; Philippe Clauss

Many automatic software parallelization systems have been proposed in the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even if they require complex transformations, and in effect decouples parallelization from compilation, for example, for closed-source or legacy software, where binary code is the only available representation. We propose an automatic parallelizer, which is able to perform advanced parallelization on binary code. It first parses the binary code and extracts high-level information. From this information, a C program is generated. This program captures only a subset of the program semantics, namely, loops and memory accesses. This C program is then parallelized using existing, state-of-the-art parallelizers, including advanced polyhedral parallelizers. The original program semantics is then re-injected, and the transformed parallel loop nests are recompiled by a standard C compiler. We show on the PolyBench benchmark suite that our system successfully detects and parallelizes almost all the loop nests from the binary code, using a recent polyhedral loop parallelizer as a backend. The paper ends by elaborating a strategy to parallelize more complex programs, such as those containing non-linear accesses to memory, and provides a few example case-studies.

International Symposium on Performance Analysis of Systems and Software | 2011

Efficient memory tracing by program skeletonization

Alain Ketterlin; Philippe Clauss

Memory profiling is useful for a variety of tasks, most notably to produce traces of memory accesses for cache simulation. However, instrumenting every memory access incurs a large overhead, in the amount of code injected in the original program as well as in execution time. This paper describes how static analysis of the binary code can be used to reduce the amount of instrumentation. The analysis extracts loops and memory access functions by tracking how memory addresses are computed from a small set of base registers holding, e.g., routine parameters and loop counters. Instrumenting these base registers instead of memory operands reduces the weight of instrumentation, first statically by reducing the amount of injected code, and second dynamically by reducing the amount of instrumentation code actually executed. Also, because the static analysis extracts intermediate-level program structures (loops and branches) and access functions in symbolic form, it is easy to transform the original executable into a skeleton program that consumes base register values and produces memory addresses. The first advantage of using a skeleton is to be able to overlap the execution of the instrumented program with that of the skeleton, thereby reducing the overhead of recomputing addresses. The second advantage is that the skeleton program and its shorter input trace can be saved and rerun as many times as necessary without requiring access to the original architecture, e.g., for cache design space exploration.

Image and Signal Processing for Remote Sensing II | 1995

Unsupervised learning of spatial regularities

Alain Ketterlin; Denis Blamont; Jerzy J. Korczak

This paper examines the task of remote-sensing image analysis as an unsupervised learning task. Images are usually (very) large, and represent complex objects. Unsupervised learning, or clustering, may be of great help at several phases of the analysis. First, this paper describes a clustering algorithm. Then, the application of this algorithm to the segmentation phase is demonstrated. It is then argued that radiometry is insufficient to fully understand the scene in thematic terms. The next level of complexity is related to the incorporation of spatial information. This paper shows how this kind of data can be expressed. Clustering is then extended to deal with such complex, structured data. Experiments are provided to assess the validity of the approach. The set of experiments proves that clustering is a fundamental tool of remote-sensing image analysis, and that its scope may well be larger than was initially expected.

International Conference on Embedded Computer Systems Architectures Modeling and Simulation | 2015

Dynamic re-vectorization of binary code

Nabil Hallou; Erven Rohou; Philippe Clauss; Alain Ketterlin

In many cases, applications are not optimized for the hardware on which they run. Several reasons contribute to this unsatisfying situation, including legacy code, commercial code distributed in binary form, or deployment on compute farms. In fact, backward compatibility of ISA guarantees only the functionality, not the best exploitation of the hardware. In this work, we focus on maximizing the CPU efficiency for the SIMD extensions and propose to convert automatically, and at runtime, loops vectorized for an older version of the SIMD extension to a newer one. We propose a lightweight mechanism that does not include a vectorizer, but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and how correctness is maintained with regard to challenges such as data dependences and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The re-vectorizer is implemented inside a dynamic optimization platform; it is completely transparent to the user, does not require rewriting binaries, and operates during program execution.

International Conference on Artificial Intelligence and Statistics | 1996

Hierarchical Clustering of Composite Objects with a Variable Number of Components

Alain Ketterlin; Pierre Gançarski; Jerzy J. Korczak

This paper examines the problem of clustering a sequence of objects that cannot be described with a predefined list of attributes (or variables). In many applications, a fixed list of attributes cannot be determined without substantial pre-processing. An extension of the traditional propositional formalism is thus proposed, which allows objects to be represented as a set of components, i.e. there is no mapping between attributes and values. The algorithm used for clustering is briefly illustrated, and mechanisms to handle sets are described. Some empirical evaluations are also provided to assess the validity of the approach.

Image and Signal Processing for Remote Sensing | 1994

Thematic image segmentation by a concept formation algorithm

Jerzy J. Korczak; Denis Blamont; Alain Ketterlin

Unsupervised empirical machine learning algorithms aim at discovering useful concepts in a stream of unclassified data. Since image segmentation is a particular instance of the problem addressed by these methods, one of these algorithms has been employed to automatically segment remote-sensing images. The region under study is the Nepalese Himalayas. Because of large variations in altitude, the effects of lighting conditions are multiplied, and the image becomes a very complex object. The behavior of the clustering algorithm is studied on such data. Because of the hierarchical organization of the resulting classes, the segmentation produced may be interpreted in a variety of thematic mappings, depending on the desired level of detail. Experimental results prove the influence of lighting conditions, but also demonstrate very good accuracy on sectors of the image where lighting is almost homogeneous.

Compiler Construction | 2014

Improving the Performance of X10 Programs by Clock Removal

Paul Feautrier; Eric Violard; Alain Ketterlin

X10 is a promising recent parallel language designed specifically to address the challenges of productively programming a wide variety of target platforms. The sequential core of X10 is an object-oriented language in the Java family. This core is augmented by a few parallel constructs that create activities as a generalization of the well known fork/join model. Clocks are a generalization of the familiar barriers. Synchronization on a clock is specified by the Clock.advanceAll() method call. Activities that execute advances stall until all existent activities have done the same, and then are released at the same (logical) time.
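
The barrier behavior that clocks generalize can be illustrated outside X10. The sketch below uses Python's `threading.Barrier` as a rough analogy only: it is not X10 code, it does not model dynamic clock registration, and the function name `run_phases` is hypothetical.

```python
import threading

def run_phases(num_workers=3, num_phases=2):
    """Run workers in lock-step phases, analogous to advancing on a clock."""
    barrier = threading.Barrier(num_workers)
    log, lock = [], threading.Lock()

    def worker(worker_id):
        for phase in range(num_phases):
            with lock:
                log.append((phase, worker_id))
            # Stall until every worker has finished this phase
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_phases()
```

Every phase-0 entry in `log` precedes every phase-1 entry, because no worker passes the barrier until all have reached it; the order of workers within a phase is not fixed.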

International Conference on Parallel Processing | 2017

A Space and Bandwidth Efficient Multicore Algorithm for the Particle-in-Cell Method

Yann Barsamian; Arthur Charguéraud; Alain Ketterlin

The Particle-in-Cell (PIC) method allows solving partial differential equations through simulations, with important applications in plasma physics. To simulate thousands of billions of particles on clusters of multicore machines, prior work has proposed hybrid algorithms that combine domain decomposition and particle decomposition with carefully optimized algorithms for handling particles processed on each multicore socket. Regarding the multicore processing, existing algorithms either suffer from suboptimal execution time, due to sorting operations or use of atomic instructions, or suffer from suboptimal space usage. In this paper, we propose a novel parallel algorithm for two-dimensional PIC simulations on multicore hardware that features asymptotically-optimal memory consumption, and does not perform unnecessary accesses to the main memory. In practice, our algorithm reaches 65% of the maximum bandwidth, and shows excellent scalability on the classical Landau damping and two-stream instability test cases.
