Andrea De Lorenzo | Researchain

Publication

Featured research published by Andrea De Lorenzo.

Availability, Reliability and Security | 2015

Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware

Gerardo Canfora; Andrea De Lorenzo; Eric Medvet; Francesco Mercaldo; Corrado Aaron Visaggio

With the wide diffusion of smartphones and their usage in a plethora of processes and activities, these devices have been handling an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most widespread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Some works in the literature showed that opcodes are informative for detecting malware, not only on the Android platform. In this paper, we investigate whether frequencies of ngrams of opcodes are effective in detecting Android malware and whether there are significant malware families for which they are more or less effective. To this end, we designed a method based on state-of-the-art classifiers applied to frequencies of opcode ngrams. Then, we experimentally evaluated it on a recent dataset composed of 11120 applications, 5560 of which are malware belonging to several different families. Results show that an accuracy of 97% can be obtained on average, and that a perfect detection rate is achieved for more than one malware family.
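As a rough illustration of the feature the abstract refers to, the following minimal sketch computes relative frequencies of contiguous opcode ngrams. The function name, the sample opcode sequence, and the choice of relative (rather than absolute) frequencies are assumptions made for illustration; the paper's actual feature extraction and classifiers are not reproduced here.

```python
from collections import Counter


def opcode_ngram_frequencies(opcodes, n):
    """Return the relative frequency of each contiguous opcode n-gram.

    `opcodes` is a list of opcode names in program order (hypothetical input).
    """
    ngrams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(ngrams)
    if total == 0:
        # Sequence shorter than n: no n-grams can be formed.
        return {}
    counts = Counter(ngrams)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequence, not taken from any real application.
sample = ["move", "invoke", "move", "invoke", "return"]
freqs = opcode_ngram_frequencies(sample, 2)
```

A dictionary of this kind, computed per application and aligned to a shared ngram vocabulary, could then serve as the input vector for a classifier.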

IEEE Computer | 2014

Automatic Synthesis of Regular Expressions from Examples

Alberto Bartoli; Giorgio Davanzo; Andrea De Lorenzo; Eric Medvet; Enrico Sorio

A system that can produce regular expressions from user-provided examples performed with high precision and recall in 12 text-extraction tasks from real-world datasets, demonstrating the effectiveness of text extraction based on genetic programming.

Genetic and Evolutionary Computation Conference | 2012

Automatic generation of regular expressions from examples with genetic programming

Alberto Bartoli; Giorgio Davanzo; Andrea De Lorenzo; Marco Mauri; Eric Medvet; Enrico Sorio

We explore the practical feasibility of a system based on genetic programming (GP) for the automatic generation of regular expressions. The user describes the desired task by providing a set of labeled examples, in the form of text lines. The system uses these examples to drive the evolutionary search towards a regular expression suitable for the specified task. Using the system should require familiarity with neither GP nor regular expression syntax. In our GP implementation each individual represents a syntactically correct regular expression. We performed an experimental evaluation on two different extraction tasks applied to real-world datasets and obtained promising results in terms of precision and recall, even in comparison to an earlier state-of-the-art proposal.
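The abstract reports results in terms of precision and recall on extraction tasks. A minimal sketch of how a candidate regular expression could be scored against labeled examples is given below. The example format (a text line paired with the substrings that should be extracted) and the set-based counting, which ignores duplicate matches within a line, are simplifying assumptions and not the evaluation protocol of the paper.

```python
import re


def extraction_precision_recall(pattern, examples):
    """Score a candidate regex on (text, expected_substrings) pairs."""
    regex = re.compile(pattern)
    tp = fp = fn = 0
    for text, expected in examples:
        found = {m.group(0) for m in regex.finditer(text)}
        wanted = set(expected)
        tp += len(found & wanted)   # correctly extracted
        fp += len(found - wanted)   # extracted but not wanted
        fn += len(wanted - found)   # wanted but not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled lines for a phone-number-like extraction task.
examples = [
    ("call 555-1234 now", ["555-1234"]),
    ("order 42 shipped on 2016-05-01", []),
    ("fax 555-9876 or 555-0000", ["555-9876", "555-0000"]),
]
good = extraction_precision_recall(r"\d{3}-\d{4}", examples)
poor = extraction_precision_recall(r"\d{3}", examples)
```

In an evolutionary search, a score of this kind would be computed for every individual in the population and used to rank candidates.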

IEEE Transactions on Knowledge and Data Engineering | 2016

Inference of Regular Expressions for Text Extraction from Examples

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Presents corrections to typographical errors in the paper, “Inference of regular expressions for text extraction from examples,” (Bartoli, A., et al), IEEE Trans. Knowl. Data Eng., vol. 28, no. 5, pp. 1217–1230, May 2016.

Parallel Problem Solving from Nature | 2016

Syntactical Similarity Learning by Means of Grammatical Evolution

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Several research efforts have shown that a similarity function synthesized from examples may capture an application-specific similarity criterion in a way that fits the application needs more effectively than a generic distance definition. In this work, we propose a similarity learning algorithm tailored to problems of syntax-based entity extraction from unstructured text streams. The algorithm takes as input pairs of strings along with an indication of whether or not they adhere to the same syntactic pattern. Our approach is based on Grammatical Evolution and systematically explores a similarity definition space including all functions that may be expressed with a specialized, simple language that we have defined for this purpose. We assessed our proposal on patterns representative of practical applications. The results suggest that the proposed approach is indeed feasible and that the learned similarity function is more effective than the Levenshtein distance and the Jaccard similarity index.
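For reference, the two generic baselines named in the abstract can be sketched as follows. The Levenshtein distance is the standard dynamic-programming formulation; computing the Jaccard index over sets of character bigrams is an assumption made here, since the abstract does not say which sets the index is applied to.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard_bigrams(a, b):
    """Jaccard index over character-bigram sets (an assumed tokenization)."""
    set_a = {a[i:i + 2] for i in range(len(a) - 1)}
    set_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


distance = levenshtein("kitten", "sitting")
similarity = jaccard_bigrams("night", "nacht")
```

Neither function uses any information about the target syntactic pattern, which is the limitation a learned similarity function is meant to address.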

European Conference on Genetic Programming | 2015

Learning Text Patterns Using Separate-and-Conquer Genetic Programming

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

The problem of extracting knowledge from large volumes of unstructured textual information has become increasingly important. We consider the problem of extracting text slices that adhere to a syntactic pattern and propose an approach capable of generating the desired pattern automatically, from a few annotated examples. Our approach is based on Genetic Programming and generates extraction patterns in the form of regular expressions that may be input to existing engines without any post-processing. A key feature of our proposal is its ability to discover automatically whether the extraction task may be solved by a single pattern, or whether a set of multiple patterns is required. We obtain this property by means of a separate-and-conquer strategy: once a candidate pattern provides adequate performance on a subset of the examples, the pattern is inserted into the set of final solutions and the evolutionary search continues on a smaller set of examples including only those not yet solved adequately. Our proposal outperforms an earlier state-of-the-art approach on three challenging datasets.
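The separate-and-conquer loop described in the abstract can be sketched roughly as below. The evolutionary search is replaced by a stand-in that picks the best pattern from a fixed pool, and "adequate performance" is reduced to fully matching at least one remaining example; both are assumptions for illustration, as are all function names.

```python
import re


def separate_and_conquer(examples, search):
    """Collect patterns until every example is covered or the search stalls."""
    remaining = list(examples)
    patterns = []
    while remaining:
        candidate = search(remaining)
        if candidate is None:
            break  # the search found nothing useful for what is left
        solved = [e for e in remaining if re.fullmatch(candidate, e)]
        if not solved:
            break
        patterns.append(candidate)
        # Continue only on the examples not yet solved.
        remaining = [e for e in remaining if e not in solved]
    return patterns, remaining


def pool_search(pool):
    """Stand-in for the GP search: choose the pool pattern covering most examples."""
    def search(remaining):
        def coverage(p):
            return sum(1 for e in remaining if re.fullmatch(p, e))
        best = max(pool, key=coverage)
        return best if coverage(best) > 0 else None
    return search


# Hypothetical examples in two different date formats.
dates = ["2016-05-01", "2015-12-31", "01/05/2016"]
pool = [r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}"]
patterns, unsolved = separate_and_conquer(dates, pool_search(pool))
combined = "|".join(patterns)
```

Joining the collected patterns with alternation yields a single expression that existing regex engines accept directly.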

Web Intelligence | 2011

Automatic Face Annotation in News Images by Mining the Web

Eric Medvet; Alberto Bartoli; Giorgio Davanzo; Andrea De Lorenzo

We consider the automatic annotation of faces of people mentioned in news. News stories provide a constant flow of potentially useful image indexing information, due to their huge diffusion on the web and to the involvement of human operators in selecting relevant images for the stories. In this work we investigate the possibility of actually exploiting this wealth of information. We propose and evaluate a system for automatic face annotation of news images that is fully unsupervised and does not require any prior knowledge about the topic or the people involved. A key feature of our proposal is that it attempts to identify the essential piece of information (what a person with a given name looks like) by querying popular image search engines. Mining the web overcomes intrinsic limitations of approaches built on a predefined collection of stories: our system can potentially annotate people never handled before since its knowledge base is constantly expanded, as long as search engines keep indexing the web. On the other hand, relying on image search engines forces the system to cope with the substantial amount of noise in search engine results. Our contribution shows experimentally that automatic face annotation may indeed be achieved based entirely on knowledge that lives on the web.

Genetic and Evolutionary Computation Conference | 2014

Playing regex golf with genetic programming

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Regex golf has recently emerged as a specific kind of code golf, i.e., unstructured and informal programming competitions aimed at writing the shortest code solving a particular problem. A problem in regex golf consists of writing the shortest regular expression which matches all the strings in a given list and does not match any of the strings in another given list. The regular expression is expected to follow the syntax of a specified programming language, e.g., JavaScript or PHP. In this paper, we propose a regex golf player internally based on Genetic Programming. We generate a population of candidate regular expressions represented as trees and evolve this population based on a multi-objective fitness which minimizes the errors and the length of the regular expression. We experimentally assess our player on a popular regex golf challenge consisting of 16 problems and compare our results against those of a recently proposed algorithm, the only one we are aware of. Our player obtains scores which improve over the baseline and are highly competitive also with respect to human players. The time for generating a solution is usually on the order of tens of minutes, which is arguably comparable to the time required by human players.
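The two objectives mentioned in the abstract, errors and length, can be illustrated with the minimal sketch below. Treating a "match" as an unanchored search, counting length in characters, and comparing candidates by Pareto dominance are assumptions made here; the paper's actual fitness definition and selection scheme may differ.

```python
import re


def golf_fitness(pattern, must_match, must_not_match):
    """Return (errors, length); both objectives are to be minimized."""
    regex = re.compile(pattern)
    missed = sum(1 for s in must_match if not regex.search(s))
    wrongly_matched = sum(1 for s in must_not_match if regex.search(s))
    return (missed + wrongly_matched, len(pattern))


def dominates(a, b):
    """True if fitness a is no worse than b on every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# Hypothetical regex golf problem.
must_match = ["foo", "food", "foot"]
must_not_match = ["bar", "far", "fob"]

short_exact = golf_fitness("foo", must_match, must_not_match)
long_exact = golf_fitness("^foo.*$", must_match, must_not_match)
too_short = golf_fitness("fo", must_match, must_not_match)
```

Here `"foo"` dominates `"^foo.*$"` (same errors, shorter), while `"foo"` and `"fo"` are mutually non-dominated because the shorter pattern makes one error.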

European Conference on Genetic Programming | 2011

GP-based electricity price forecasting

Alberto Bartoli; Giorgio Davanzo; Andrea De Lorenzo; Eric Medvet

The electric power market is increasingly relying on competitive mechanisms taking the form of day-ahead auctions, in which buyers and sellers submit their bids in terms of prices and quantities for each hour of the next day. Methods for electricity price forecasting suitable for these contexts are crucial to the success of any bidding strategy. Such methods have thus become very important in practice, due to the economic relevance of electric power auctions. In this work we propose a novel forecasting method based on Genetic Programming. A key feature of our proposal is the handling of outliers, i.e., regions of the input space rarely seen during learning. Since a predictor generated with Genetic Programming can hardly provide acceptable performance in these regions, we use a classifier that attempts to determine whether the system is shifting toward a difficult-to-learn region. In those cases, we replace the prediction made by Genetic Programming with a constant value determined during learning and tailored to the specific subregion expected. We evaluate the performance of our proposal against a challenging baseline representative of the state of the art. The baseline analyzes a real-world dataset by means of a number of different methods, each calibrated separately for each hour of the day and recalibrated every day on a progressively growing learning set. Our proposal exhibits smaller prediction error, even though we construct one single model, valid for each hour of the day and used unmodified across the entire testing set. We believe that our results are highly promising and may open a broad range of novel solutions.
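The outlier-handling idea in the abstract, a classifier that decides when to replace the evolved predictor's output with a constant, can be sketched as follows. The predictor, the classifier, the feature names, and the fallback value are all hypothetical stand-ins; none of them come from the paper.

```python
def gated_forecast(features, gp_predictor, region_classifier, fallback_values):
    """Use the evolved predictor unless the classifier flags a hard region."""
    region = region_classifier(features)
    if region is not None:
        # Difficult-to-learn region: return the constant learned for it.
        return fallback_values[region]
    return gp_predictor(features)


# Toy stand-ins (assumptions for illustration only).
def toy_predictor(f):
    return 0.5 * f["prev_price"] + 20.0


def toy_classifier(f):
    return "spike" if f["demand"] > 0.9 else None


fallbacks = {"spike": 120.0}

normal = gated_forecast({"prev_price": 40.0, "demand": 0.5},
                        toy_predictor, toy_classifier, fallbacks)
outlier = gated_forecast({"prev_price": 40.0, "demand": 0.95},
                         toy_predictor, toy_classifier, fallbacks)
```

Keeping the gate separate from the predictor means the same single model can be used for every hour, with only the flagged cases handled differently.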

Availability, Reliability and Security | 2016

Your Paper has been Accepted, Rejected, or Whatever: Automatic Generation of Scientific Paper Reviews

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Peer review is widely viewed as an essential step for ensuring the scientific quality of a work and is a cornerstone of scholarly publishing. On the other hand, the actors involved in the publishing process are often driven by incentives which may, and increasingly do, undermine the quality of published work, especially in the presence of unethical conduct. In this work we investigate the feasibility of a tool capable of generating fake reviews for a given scientific paper automatically. While a tool of this kind cannot possibly deceive any rigorous editorial procedure, it could nevertheless find a role in several questionable scenarios and magnify the scale of scholarly fraud.
