Thierry Lavoie
École Polytechnique de Montréal
Publication
Featured research published by Thierry Lavoie.
conference on privacy, security and trust | 2011
François Gauthier; Dominic Letarte; Thierry Lavoie; Ettore Merlo
Whether for development, maintenance or refactoring, multiple steps in the software development cycle require comprehension of a program's access control model (AC model). In this paper, we present a novel approach to reverse-engineer AC model structure from PHP source code. Using a hybrid approach combining static analysis and model checking techniques, we are able to extract AC model structure in a fast and precise way.
international workshop on software clones | 2012
Thierry Lavoie; Ettore Merlo
This paper presents an original clone detection technique which is an accurate approximation of the Levenshtein distance. It uses groups of tokens extracted from source code called windowed-tokens. From these, frequency vectors are then constructed and compared with the Manhattan distance in a metric tree. The goal of this new technique is to provide a very high precision clone detection technique while keeping a high recall. Precision and recall are measured with respect to the Levenshtein distance. The test bench is a large-scale open-source software system. The collected results show the technique to be fast, simple, and accurate. Finally, this article presents further research opportunities.
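The pipeline described in this abstract (windowed-tokens, frequency vectors, Manhattan distance) can be illustrated with a minimal sketch. It assumes that a windowed-token is simply a run of `window` consecutive tokens, which the abstract does not spell out; the function names and the window size are hypothetical, and the metric tree is omitted in favour of a direct pairwise comparison.

```python
from collections import Counter

def windowed_tokens(tokens, window=3):
    """Return every run of `window` consecutive tokens as a tuple.
    Assumption: this is what a "windowed-token" means; the abstract
    does not define it precisely."""
    return [tuple(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]

def frequency_vector(tokens, window=3):
    """Count how often each windowed-token occurs in a fragment."""
    return Counter(windowed_tokens(tokens, window))

def manhattan(u, v):
    """Manhattan (L1) distance between two sparse frequency vectors.
    A Counter returns 0 for a missing key, so absent entries count as zero."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())

# Two fragments that differ by a single token.
a = frequency_vector("x = a + b ;".split())
b = frequency_vector("x = a - b ;".split())
distance = manhattan(a, b)
```

A single token substitution changes every window that covers it, so a small edit distance yields a bounded Manhattan distance. That is the intuition behind using one as an approximation of the other.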
annual computer security applications conference | 2013
François Gauthier; Thierry Lavoie; Ettore Merlo
Software clone detection techniques identify fragments of code that share some level of syntactic similarity. In this study, we investigate security-sensitive clone clusters: clusters of syntactically similar fragments of code that are protected by some privileges. From a security perspective, security-sensitive clone clusters can help reason about the implemented security model: given syntactically similar fragments of code, it is expected that they are protected by similar privileges. We hypothesize that clones that violate this assumption, defined as security-discordant clones, are likely to reveal weaknesses and flaws in access control models. In order to characterize security-discordant clones, we investigated two of the largest and most popular open-source PHP applications: Joomla! and Moodle, with sizes ranging from hundreds of thousands to more than a million lines of code. Investigation of security-discordant clone clusters in these systems revealed several previously undocumented, recurring, and application-independent security weaknesses. Moreover, security-discordant clones also revealed four previously unreported security flaws. Results also show how these flaws were revealed through the investigation of as little as 2% of the code base. Distribution of weaknesses and flaws between the two systems is investigated and discussed. Potential extensions to this exploratory work are also presented.
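The notion of a security-discordant clone cluster can be sketched as follows. This is an illustration under stated assumptions, not the study's method: clone clusters and the privileges protecting each fragment are taken as given inputs, "similar privileges" is simplified to strict set equality, and all names (`discordant_clusters`, the fragment identifiers, the privilege strings) are hypothetical.

```python
def discordant_clusters(clusters, privileges):
    """Return the clusters whose fragments are not all protected by the
    same privilege set.

    clusters:   dict mapping a cluster id to a list of fragment ids
    privileges: dict mapping a fragment id to an iterable of privilege names
    Assumption: fragments missing from `privileges` are unprotected."""
    result = {}
    for cluster_id, fragments in clusters.items():
        distinct = {frozenset(privileges.get(f, ())) for f in fragments}
        if len(distinct) > 1:  # at least two fragments disagree
            result[cluster_id] = distinct
    return result

clusters = {
    "c1": ["edit_user.php:10", "edit_group.php:12"],
    "c2": ["view_a.php:5", "view_b.php:7"],
}
privileges = {
    "edit_user.php:10": {"admin"},
    "edit_group.php:12": set(),  # same code, no privilege check
    "view_a.php:5": {"login"},
    "view_b.php:7": {"login"},
}
flagged = discordant_clusters(clusters, privileges)
```

Here only `c1` is flagged, because one of its two similar fragments is reachable without the `admin` privilege. Such clusters are candidates for manual review, not confirmed flaws.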
international workshop on software clones | 2011
Thierry Lavoie; Ettore Merlo
Evaluating the quality and performance of clone detection techniques requires a system along with its clone oracle, that is, a reference database of all accepted clones in the investigated system. Many challenges, including finding an adequate clone definition and scalability to industrial-size systems, must be overcome to create good oracles. This paper presents an original method to construct clone oracles based on the Levenshtein metric. Although other oracles exist, this is the largest known oracle for type-3 clones that was created by an automated process on massive data sets. The method behind the creation of the oracle as well as the characteristics of the actual oracles are presented. Discussion of the results in relation to other ways of building oracles is also provided along with future research possibilities.
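Since the oracle is built on the Levenshtein metric, a minimal sketch of that distance over token sequences may help. The acceptance rule in `is_oracle_clone` and its `max_ratio` threshold are hypothetical: the abstract does not state how distances are turned into accepted clone pairs.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute),
    computed by dynamic programming with two rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def is_oracle_clone(a, b, max_ratio=0.3):
    """Hypothetical acceptance rule: the distance, normalized by the
    longer sequence, must not exceed `max_ratio`."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_ratio

f1 = "int i = 0 ;".split()
f2 = "int j = 0 ;".split()
accepted = is_oracle_clone(f1, f2)
```

The quadratic cost of this computation for every pair of fragments is one reason scalability to large systems is a challenge for oracle construction.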
international workshop on software clones | 2010
Thierry Lavoie; Michael Eilers-Smith; Ettore Merlo
Graphics Processing Units (GPUs) have been around for a while. Although they are primarily used for high-end 3D graphics processing, their use is now acknowledged for general massively parallel computing. This paper presents an original technique based on [10] to compute many instances of the longest common subsequence problem on a generic GPU architecture using classic DP-matching [7]. This algorithm has been found useful for filtering false positives produced by metrics-based clone detection methods. Experimental results of this application are presented along with a discussion of possibilities of using GPUs for other cloning-related problems.
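For reference, the classic dynamic-programming recurrence for the longest common subsequence is sketched below. This is a sequential, CPU-only illustration and not the GPU technique of the paper; the helper `lcs_all` is a hypothetical stand-in showing that the many instances are independent of one another, which is what makes the problem suitable for parallel execution.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b,
    using the classic DP recurrence with two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_all(pairs):
    """Solve many independent LCS instances one after another.
    No instance reads another's data."""
    return [lcs_length(a, b) for a, b in pairs]

lengths = lcs_all([("ABCBDAB", "BDCABA"), ("abc", "abc"), ("abc", "xyz")])
```

When filtering candidate clone pairs, a low LCS length relative to the fragment sizes would suggest a false positive. The exact filtering criterion is not given in the abstract.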
working conference on reverse engineering | 2012
Thierry Lavoie; Foutse Khomh; Ettore Merlo; Ying Zou
During the re-engineering of legacy software systems, a good knowledge of the history of past modifications on the system is important to recover the design of the system and transfer its functionalities. In the absence of a reliable revision history, development teams often rely on system experts to identify hidden history and recover software design. In this paper, we propose a new technique to infer the history of repository file modifications of a software system using only past released versions of the system. The proposed technique relies on nearest-neighbor clone detection using the Manhattan distance. We performed an empirical evaluation of the technique using Tomcat, JHotDraw and Adempiere SVN information as our oracle of file operations, and obtained an average precision of 97% and an average recall of 98%. Our evaluation also highlighted the phenomenon of implicit Moves, which are Moves between a system's versions that are not recorded in the SVN repository. In the absence of revision history and software experts, development teams can make use of the proposed technique during the re-engineering of their legacy systems.
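The idea of inferring file operations from two released versions with nearest-neighbor search under the Manhattan distance can be sketched roughly as follows. Everything specific here is an assumption made for illustration: files are represented by plain token-frequency vectors, the nearest neighbor is found by linear scan, and the `threshold` and the three operation labels are hypothetical and not taken from the paper.

```python
from collections import Counter

def manhattan(u, v):
    """Manhattan (L1) distance between two sparse frequency vectors."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())

def infer_operations(old, new, threshold=2):
    """Guess, for each file of the new release, what happened to it.

    old, new: dict mapping a file path to its list of tokens
    Returns a dict mapping each new path to (operation, old path or None)."""
    old_vecs = {path: Counter(tokens) for path, tokens in old.items()}
    ops = {}
    for path, tokens in new.items():
        vec = Counter(tokens)
        # Same path and similar content: the file stayed in place.
        if path in old_vecs and manhattan(vec, old_vecs[path]) <= threshold:
            ops[path] = ("kept_or_modified", path)
            continue
        # Otherwise look for the nearest file of the old release.
        best = min(old_vecs, key=lambda p: manhattan(vec, old_vecs[p]), default=None)
        if best is not None and manhattan(vec, old_vecs[best]) <= threshold:
            ops[path] = ("moved", best)
        else:
            ops[path] = ("added", None)
    return ops

old = {
    "a/Foo.java": "class Foo { int x ; }".split(),
    "a/Bar.java": "class Bar { void run ( ) { } }".split(),
}
new = {
    "a/Foo.java": "class Foo { int x ; }".split(),
    "b/Bar.java": "class Bar { void run ( ) { } }".split(),
    "b/Baz.java": "interface Baz extends Qux , Quux { }".split(),
}
operations = infer_operations(old, new)
```

In this toy example `b/Bar.java` is reported as moved from `a/Bar.java` even though no repository log says so, which is the kind of situation the abstract calls an implicit Move.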
international workshop on software clones | 2013
Ettore Merlo; Thierry Lavoie; Pascal Potvin; Pierre Busnel
This paper presents results from an experience of transferring the CLAN clone detection technology into a telecommunication industrial setting. Eleven proprietary systems have been analyzed for a total of about 94 MLOC of C/C++ and Java source code. The characteristics of the analyzed systems, together with a description of the Web portal that is used as an interface to the clone analysis environment, are described. Reported results include figures and diagrams about clone frequencies, types, and similarity distributions. Processing times including parsing, clone clustering, and Dynamic Programming visualisation are presented. A discussion about lessons learned and future research work is also presented from an industrial point of view for real-life practical applications of clone detection.
international workshop on software clones | 2013
Thierry Lavoie; Ettore Merlo
This paper focuses on the applicability of clone detectors for system evolution understanding. Specifically, it is a case study of Firefox, for which the development release cycle changed from a slow release cycle to a fast release cycle two years ago. Since the transition of the release cycle, three times more versions of the software were deployed. To understand whether or not the changes between the newer versions are as significant as the changes in the older versions, we measured the similarity between consecutive versions. We analyzed 82 MLOC of C/C++ code to compute the overall change distribution between all existing major versions of Firefox. The results indicate a significant decrease in the overall difference between many versions in the fast release cycle. We discuss the results and highlight how differently the versions have evolved in their respective release cycles. We also relate our results to other results assessing potential changes in the quality of Firefox. We conclude the paper by raising questions on the impact of a fast release cycle.
working conference on reverse engineering | 2009
Ettore Merlo; Thierry Lavoie
A clone classification scheme is presented based on the structure of the Abstract Syntax Tree (AST) of a system and on the similarity measures between syntactic blocks of source code. Syntactic blocks in a system may represent classes, methods, statement blocks, and so on. An inclusion relation may exist between the source code lines of some of these blocks, depending on the syntactic structure of the source code. For example, a block corresponding to a method body may contain several possibly nested statement blocks. This paper introduces an algorithm to identify different types of clone relations between blocks that are either method bodies or statement blocks. Clone relation types between these blocks are interesting because they indicate properties of the structural relation of these clones and may give hints on re-factoring opportunities. The proposed structural type clone classification scheme has been investigated on two open source Java systems, Tomcat and Eclipse. Experimental results are presented. Execution time performance of clone classification has been measured and reported. Results and further proposed research are discussed.
Information & Software Technology | 2017
Thierry Lavoie; Mathieu Mérineau; Ettore Merlo; Pascal Potvin
Context: This paper presents a novel experiment focused on detecting and analyzing clones in test suites written in TTCN-3, a standard telecommunication test script language, for different industrial projects. Objective: This paper investigates frequencies, types, and similarity distributions of TTCN-3 clones in test scripts from three industrial projects in telecommunication. We also compare the distribution of clones in TTCN-3 test scripts with the distribution of clones in C/C++ and Java projects from the telecommunication domain. We then perform a statistical analysis to validate the significance of differences between these distributions. Method: Similarity is computed using CLAN, which compares metrics syntactically derived from script fragments. Metrics are computed from the Abstract Syntax Trees produced by a TTCN-3 parser called Titan developed by Ericsson as an Eclipse plugin. Finally, clone classification of similar script pairs is computed using the Longest Common Subsequence algorithm on token types and token images. Results: This paper presents figures and diagrams reporting TTCN-3 clone frequencies, types, and similarity distributions. We show that the differences between the distribution of clones in test scripts and the distribution of clones in applications are statistically significant. We also present and discuss some lessons that can be learned about the transferability of technology from this study. Conclusion: About 24% of fragments in the test suites are cloned, which is a very high proportion of clones compared to what is generally found in source code. The difference in proportion of Type-1 and Type-2 clones is statistically significant and remarkably higher in TTCN-3 than in source code. Type-1 and Type-2 clones represent 82.9% and 15.3% of clone fragments, respectively, for a total of 98.2%. Within the projects this study investigated, this represents more and easier potential re-factoring opportunities for test scripts than for code.