Fabiano Tarlao | Researchain

Publication

Featured researches published by Fabiano Tarlao.

IEEE Transactions on Knowledge and Data Engineering | 2016

Inference of Regular Expressions for Text Extraction from Examples

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Presents corrections to typographical errors in the paper, “Inference of regular expressions for text extraction from examples,” (Bartoli, A., et al), IEEE Trans. Knowl. Data Eng., vol. 28, no. 5, pp. 1217–1230, May 2016.

Parallel Problem Solving from Nature | 2016

Syntactical Similarity Learning by Means of Grammatical Evolution

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Several research efforts have shown that a similarity function synthesized from examples may capture an application-specific similarity criterion in a way that fits the application needs more effectively than a generic distance definition. In this work, we propose a similarity learning algorithm tailored to problems of syntax-based entity extraction from unstructured text streams. The algorithm takes as input pairs of strings along with an indication of whether or not they adhere to the same syntactic pattern. Our approach is based on Grammatical Evolution and systematically explores a similarity definition space including all functions that may be expressed with a specialized, simple language that we have defined for this purpose. We assessed our proposal on patterns representative of practical applications. The results suggest that the proposed approach is indeed feasible and that the learned similarity function is more effective than the Levenshtein distance and the Jaccard similarity index.
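For reference, the two generic baselines named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and computing the Jaccard index over character bigrams is an assumption, since the abstract does not say which token sets were used.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def jaccard(a, b, n=2):
    """Jaccard index over character n-grams (bigrams by default, an assumption)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Both measures compare characters rather than syntactic structure, which is the limitation a similarity function learned from examples is meant to address.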

European Conference on Genetic Programming | 2015

Learning Text Patterns Using Separate-and-Conquer Genetic Programming

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

The problem of extracting knowledge from large volumes of unstructured textual information has become increasingly important. We consider the problem of extracting text slices that adhere to a syntactic pattern and propose an approach capable of generating the desired pattern automatically, from a few annotated examples. Our approach is based on Genetic Programming and generates extraction patterns in the form of regular expressions that may be input to existing engines without any post-processing. A key feature of our proposal is its ability to discover automatically whether the extraction task may be solved by a single pattern or whether a set of multiple patterns is required. We obtain this property by means of a separate-and-conquer strategy: once a candidate pattern provides adequate performance on a subset of the examples, the pattern is inserted into the set of final solutions and the evolutionary search continues on a smaller set of examples including only those not yet solved adequately. Our proposal outperforms an earlier state-of-the-art approach on three challenging datasets.
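The separate-and-conquer loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the evolutionary search is replaced by a hypothetical `toy_search` that picks from a fixed candidate pool, "adequate performance" is reduced to "fully matches at least one remaining example", and negative examples are ignored.

```python
import re

# Hypothetical candidate pool standing in for the Genetic Programming search.
CANDIDATES = [r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", r"[a-z]+"]


def toy_search(remaining):
    """Placeholder search: return the candidate matching most remaining examples."""
    return max(CANDIDATES,
               key=lambda p: sum(1 for s in remaining if re.fullmatch(p, s)))


def separate_and_conquer(examples, search, max_patterns=5):
    """Collect patterns until every example is solved or no progress is made."""
    remaining = list(examples)
    patterns = []
    while remaining and len(patterns) < max_patterns:
        pattern = search(remaining)
        solved = [s for s in remaining if re.fullmatch(pattern, s)]
        if not solved:
            break  # the search made no progress; stop rather than loop forever
        patterns.append(pattern)
        # Continue only on the examples not yet solved.
        remaining = [s for s in remaining if s not in solved]
    return patterns, remaining
```

Calling `separate_and_conquer(["2016-05-01", "2015-12-31", "01/05/2016"], toy_search)` returns two patterns, one per date format, and an empty list of unsolved examples.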

Genetic and Evolutionary Computation Conference | 2014

Playing regex golf with genetic programming

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Regex golf has recently emerged as a specific kind of code golf, i.e., unstructured and informal programming competitions aimed at writing the shortest code solving a particular problem. A problem in regex golf consists of writing the shortest regular expression which matches all the strings in a given list and does not match any of the strings in another given list. The regular expression is expected to follow the syntax of a specified programming language, e.g., JavaScript or PHP. In this paper, we propose a regex golf player internally based on Genetic Programming. We generate a population of candidate regular expressions represented as trees and evolve such population based on a multi-objective fitness which minimizes the errors and the length of the regular expression. We experimentally assess our player on a popular regex golf challenge consisting of 16 problems and compare our results against those of a recently proposed algorithm, the only one we are aware of. Our player obtains scores which improve over the baseline and are highly competitive also with respect to human players. The time for generating a solution is usually in the order of tens of minutes, which is arguably comparable to the time required by human players.
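A two-objective fitness of the kind described above (errors and length, both minimized) might look like the sketch below. It is an illustration only: the function name is hypothetical, Python's `re` syntax stands in for the JavaScript or PHP dialects mentioned in the abstract, and penalizing invalid candidates with the maximum error count is an assumption.

```python
import re


def golf_fitness(pattern, must_match, must_not_match):
    """Return (errors, length); lower is better on both objectives."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Assumption: an invalid candidate is charged every possible error.
        return (len(must_match) + len(must_not_match), len(pattern))
    # Strings that should match but do not.
    missed = sum(1 for s in must_match if compiled.search(s) is None)
    # Strings that should not match but do.
    false_hits = sum(1 for s in must_not_match if compiled.search(s) is not None)
    return (missed + false_hits, len(pattern))
```

For example, `golf_fitness("foo", ["afoot", "foonly"], ["bar", "fo"])` gives `(0, 3)`: no errors with a three-character expression.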

Availability, Reliability and Security | 2016

Your Paper has been Accepted, Rejected, or Whatever: Automatic Generation of Scientific Paper Reviews

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Peer review is widely viewed as an essential step for ensuring the scientific quality of a work and is a cornerstone of scholarly publishing. On the other hand, the actors involved in the publishing process are often driven by incentives which may, and increasingly do, undermine the quality of published work, especially in the presence of unethical conduct. In this work we investigate the feasibility of a tool capable of generating fake reviews for a given scientific paper automatically. While a tool of this kind cannot possibly deceive any rigorous editorial procedure, it could nevertheless find a role in several questionable scenarios and magnify the scale of scholarly fraud.

IEEE Transactions on Systems, Man, and Cybernetics | 2018

Active Learning of Regular Expressions for Entity Extraction

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been proposed, all of which require the preliminary construction of a large dataset fully annotated by the user. In this paper, we propose an active learning approach aimed at minimizing the user annotation effort: the user annotates only one desired extraction and then merely answers extraction queries generated by the system. During the learning process, the system digs into the input text to select the most appropriate extraction query to be submitted to the user in order to improve the current extractor. We construct candidate solutions with genetic programming (GP) and select queries with a form of querying-by-committee, i.e., based on a measure of disagreement within the best candidate solutions. All the components of our system are carefully tailored to the peculiarities of active learning with GP and of entity extraction from unstructured text. We evaluate our proposal in depth, on a number of challenging datasets and based on a realistic estimate of the user effort involved in answering each single query. The results demonstrate high accuracy with significant savings in terms of computational effort, annotated characters, and execution time over a state-of-the-art baseline.
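Query selection by committee disagreement, as mentioned above, can be illustrated with the sketch below. The abstract does not define the disagreement measure, so the vote-split score used here is an assumption, as are the function names and the use of whole-string matching on candidate text slices.

```python
import re


def disagreement(committee, candidate):
    """Score in [0, 1]: 0 when the committee is unanimous, 1 when evenly split."""
    votes = sum(1 for p in committee if re.fullmatch(p, candidate))
    share = votes / len(committee)
    return 1.0 - abs(2 * share - 1.0)


def select_query(committee, candidates):
    """Pick the candidate slice on which the committee disagrees the most."""
    return max(candidates, key=lambda c: disagreement(committee, c))
```

With the committee `[r"\d+", r"\d{4}", r"[0-9a-f]+"]` and candidates `["2016", "12", "zz"]`, the selected query is `"12"`: every member agrees on the other two slices, so asking the user about them would carry little information.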

Web Intelligence | 2016

Best Dinner Ever!!!: Automatic Generation of Restaurant Reviews with LSTM-RNN

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Dennis Morello; Fabiano Tarlao

Consumer reviews are an important information resource for people and a fundamental part of everyday decision-making. Product reviews have an economic relevance which may attract malicious people to commit review fraud by writing false reviews. In this work, we investigate the possibility of generating hundreds of false restaurant reviews automatically and very quickly. We propose and evaluate a method for automatic generation of restaurant reviews tailored to the desired rating and restaurant category. A key feature of our work is the experimental evaluation, which involves human users. We assessed the ability of our method to actually deceive users by presenting to them sets of reviews including a mix of genuine reviews and machine-generated reviews. Users were not aware of the aim of the evaluation or of the existence of machine-generated reviews. As it turns out, it is feasible to automatically generate realistic reviews which can manipulate the opinion of the user.

IEEE Intelligent Systems | 2016

Can a Machine Replace Humans in Building Regular Expressions? A Case Study

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Regular expressions are routinely used in a variety of different application domains. But building a regular expression involves a considerable amount of skill, expertise, and creativity. In this work, the authors investigate whether a machine can surrogate these qualities and automatically construct regular expressions for tasks of realistic complexity. They discuss a large-scale experiment involving more than 1,700 users on 10 challenging tasks. The authors compare the solutions constructed by these users to those constructed by a tool based on genetic programming that they recently developed and made publicly available. The quality of automatically constructed solutions turned out to be similar to the quality of those constructed by the most skilled user group; the time for automatic construction was likewise similar to the time required by human users.

ACM Symposium on Applied Computing | 2016

Active learning approaches for learning regular expressions with genetic programming

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

We consider the long-standing problem of the automatic generation of regular expressions for text extraction, based solely on examples of the desired behavior. We investigate several active learning approaches in which the user annotates only one desired extraction and then merely answers extraction queries generated by the system. The resulting framework is attractive because it is the system, not the user, which digs out the data in search of the samples most suitable to the specific learning task. We tailor our proposals to a state-of-the-art learner based on Genetic Programming and we assess them experimentally on a number of challenging tasks of realistic complexity. The results indicate that active learning is indeed a viable framework in this application domain and may thus significantly decrease the amount of costly annotation effort required.

Applied Soft Computing | 2016

Predicting the effectiveness of pattern-based entity extractor inference

Alberto Bartoli; Andrea De Lorenzo; Eric Medvet; Fabiano Tarlao

Highlights: Pattern-based entity extraction is an essential component of many digital workflows. No accuracy prediction methods exist for extractor generators from examples. We propose a predictor based on string similarity and machine learning. In-depth experiments on real and challenging data give promising results.

An essential component of any workflow leveraging digital data consists in the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine generates an entity extractor automatically from examples of the desired behavior, which take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that may be inferred from the available examples. We propose several prediction techniques and experimentally analyze our proposals in great depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity may indeed be obtained quickly and without actually generating the entity extractor.
