Gaston L'Huillier | Researchain

Publication

Featured research published by Gaston L'Huillier.

Knowledge Discovery and Data Mining | 2010

Topic-based social network analysis for virtual communities of interests in the Dark Web

Gaston L'Huillier; Sebastián A. Ríos; H. Alvarez; Felipe Aguilera

The study of extremist groups and their interaction is a crucial task for maintaining homeland security and peace. Tools such as social network analysis and text mining have contributed to the understanding of these groups, supporting the development of counter-terrorism applications. This work addresses the topic-based community key-member extraction problem, for which our method combines text mining and social network analysis techniques. First, latent Dirichlet allocation is applied to build two topic-based social networks: one oriented towards the thread creators' point of view, and the other oriented towards the repliers of the overall forum. Then, using different social network analysis measures, topic-based key members are evaluated against a benchmark social network built from the plain documents. Experiments were performed on an English-language forum available in the Dark Web portal.
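A minimal sketch of one way to build a creator-oriented, topic-based network, assuming each thread already carries an LDA topic distribution (the hypothetical `topics` field) and that a thread belongs to a topic when its weight reaches a hypothetical `min_weight` threshold. Edges point from each replier to the thread creator, and weighted in-degree serves as a simple key-member measure; the replier-oriented network and the other measures used in the paper are not reproduced here.

```python
from collections import Counter

def creator_network(threads, topic, min_weight=0.5):
    """Weighted directed edges replier -> creator for threads about `topic`.

    Each thread is a dict with hypothetical keys: 'creator', 'repliers', and
    'topics' (a per-thread topic distribution, assumed to come from LDA).
    """
    edges = Counter()
    for thread in threads:
        if thread["topics"][topic] < min_weight:
            continue  # thread is not mainly about this topic
        for replier in thread["repliers"]:
            if replier != thread["creator"]:
                edges[(replier, thread["creator"])] += 1
    return edges

def rank_by_in_degree(edges):
    """Rank members by weighted in-degree, breaking ties alphabetically."""
    in_degree = Counter()
    for (_, target), weight in edges.items():
        in_degree[target] += weight
    return sorted(in_degree, key=lambda member: (-in_degree[member], member))

# Made-up forum threads with two topics.
threads = [
    {"creator": "ann", "repliers": ["bob", "cat"], "topics": [0.9, 0.1]},
    {"creator": "bob", "repliers": ["ann"], "topics": [0.8, 0.2]},
    {"creator": "dan", "repliers": ["ann", "bob", "cat"], "topics": [0.1, 0.9]},
]
print(rank_by_in_degree(creator_network(threads, topic=0)))  # ['ann', 'bob']
print(rank_by_in_degree(creator_network(threads, topic=1)))  # ['dan']
```

Filtering threads by topic before building the graph is what makes the resulting key members topic-specific: the same member can be central for one topic and absent for another.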

Intelligence and Security Informatics | 2010

Latent semantic analysis and keyword extraction for phishing classification

Gaston L'Huillier; Alejandro Hevia; Richard Weber; Sebastián A. Ríos

Phishing email fraud has been considered one of the main cyber-threats in recent years. Its development has been closely related to social engineering techniques, in which different fraud strategies are used to deceive naïve email users. In this work, a latent semantic analysis and text mining methodology is proposed for characterising such strategies and for their subsequent classification using supervised learning algorithms. The results show that the feature set obtained in this work is competitive with previous phishing feature extraction methodologies, achieving promising results across different benchmark machine learning classification techniques.
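Latent semantic analysis is usually computed with a truncated singular value decomposition (SVD) of the term-document matrix. The sketch below assumes raw term counts, whitespace tokenisation and made-up example messages (the paper's term weighting, keyword extraction and number of dimensions are not specified here). It obtains the top-k document coordinates with power iteration and deflation so that it needs only the standard library; in practice a linear algebra library's SVD would normally be used. The resulting vectors could then be fed to any supervised classifier.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Raw term counts: one row per vocabulary term, one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]

def lsa_document_vectors(matrix, k, iterations=200):
    """Map documents to a k-dimensional latent space (rows of V * Sigma).

    The top-k eigenpairs of G = A^T A are found by power iteration with
    deflation; eigenvalues of G are the squared singular values of A.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
        norm = 0.0
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        if norm == 0.0:  # no variance left: pad the remaining dimensions with zeros
            for d in range(n):
                coords[d].append(0.0)
            continue
        eigenvalue = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(eigenvalue, 0.0))
        for d in range(n):
            coords[d].append(sigma * v[d])
        # Deflate so that the next pass finds the next-largest component.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["verify your account password", "verify account password now",
        "meeting agenda for monday", "monday meeting notes"]
_, A = term_document_matrix(docs)
vectors = lsa_document_vectors(A, k=2)
```

Documents that share vocabulary end up close together in the latent space, while the two unrelated groups here land on nearly orthogonal components.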

SIGKDD Explorations | 2011

Topic-based social network analysis for virtual communities of interests in the dark web

Gaston L'Huillier; H. Alvarez; Sebastián A. Ríos; Felipe Aguilera

The study of extremist groups and their interaction is a crucial task for maintaining homeland security and peace. Tools such as social network analysis and text mining have contributed to their understanding, supporting the development of counter-terrorism applications. This work addresses the topic-based community key-member extraction problem, for which our method combines text mining and social network analysis techniques. First, latent Dirichlet allocation is applied to build two topic-based social networks in online forums: one oriented towards the thread creators' point of view, and the other oriented towards the repliers of the overall forum. Then, using different network analysis measures, topic-based key members are evaluated against a benchmark social network built from a plain representation of the network of posts. Experiments were successfully performed on an English-language forum available in the Dark Web portal.
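One simple way to compare topic-based key members with a benchmark network is sketched below. It assumes undirected edge lists, normalized degree centrality as the network measure, and the share of common members among the top k as the agreement score; these are illustrative choices with hypothetical function names, not necessarily the measures used in the paper.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality, degree / (n - 1), of an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

def top_k(centrality, k):
    """The k most central members, ties broken alphabetically."""
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]

def key_member_agreement(topic_edges, benchmark_edges, k):
    """Fraction of the topic network's top-k members also in the benchmark's top-k."""
    topic_top = set(top_k(degree_centrality(topic_edges), k))
    benchmark_top = set(top_k(degree_centrality(benchmark_edges), k))
    return len(topic_top & benchmark_top) / k

topic_edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
benchmark_edges = [("ann", "cat"), ("cat", "dan"), ("cat", "eve"), ("ann", "dan")]
print(key_member_agreement(topic_edges, benchmark_edges, k=2))  # 0.5
```

Swapping `degree_centrality` for another measure (betweenness, closeness, and so on) changes only the ranking step; the comparison against the benchmark stays the same.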

Information Fusion | 2014

Detecting trends on the Web: A multidisciplinary approach

Rodrigo Dueñas-Fernández; Juan D. Velásquez; Gaston L'Huillier

This paper introduces a framework for trend modeling and detection on the Web that uses opinion mining and topic modeling tools based on the fusion of freely available information. The framework consists of a four-step model that runs periodically: crawl a set of predefined document sources; search for potential sources and extract topics from the retrieved documents; retrieve opinionated documents from social networks for each detected topic and extract sentiment information from them. The proposed framework was applied to a set of 20 document sources over a period of 8 months. After the analysis period, once the proposed experiments had been run, an F-measure of 0.56 was obtained for the detection of significant events, indicating that the proposed framework is a feasible model of how trends could be represented through the analysis of documents freely available on the Web.
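The F-measure reported above is the harmonic mean of precision and recall. A minimal sketch, assuming detected and reference events can be matched by identical identifiers (how events are actually matched in the paper is not described here); the event names are made up:

```python
def f_measure(detected, reference):
    """Precision, recall and F1 for sets of detected vs. reference events."""
    detected, reference = set(detected), set(reference)
    true_positives = len(detected & reference)
    if true_positives == 0:
        return 0.0, 0.0, 0.0  # also covers empty inputs
    precision = true_positives / len(detected)
    recall = true_positives / len(reference)
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = f_measure({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
# p = 0.5, r = 2/3, f = 4/7 (about 0.571)
```

Because F1 is a harmonic mean, a detector cannot reach a high score by flagging many events (high recall, low precision) or very few (high precision, low recall).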

String Processing and Information Retrieval | 2010

Hypergeometric language model and Zipf-like scoring function for web document similarity retrieval

Felipe Bravo-Marquez; Gaston L'Huillier; Sebastián A. Ríos; Juan D. Velásquez

Retrieving Web documents similar to a given document differs in many respects from information retrieval based on queries written by regular search engine users. In this work, a new method is proposed for Web document similarity retrieval based on generative language models and meta-search engines. Probabilistic language models are used as random query generators for the given document. The queries are submitted to a customizable set of Web search engines. Once all results are gathered, they are evaluated with a proposed scoring function based on Zipf's law. The results show that the proposed query generation and scoring procedure solves the problem with acceptable levels of precision.
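A rough sketch of the two ingredients, under stated assumptions: queries are drawn by sampling tokens from the document without replacement (so the per-term counts in a query follow a multivariate hypergeometric distribution), and each returned URL is scored as the sum of 1/r^s over the result lists in which it appears at rank r. The exponent `s`, the query length, the example URLs and the function names are hypothetical; the paper's exact language model and scoring function may differ.

```python
import random
from collections import defaultdict

def generate_queries(document, n_queries, query_length, seed=0):
    """Random queries built by drawing tokens from the document without replacement."""
    tokens = document.lower().split()
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.sample(tokens, query_length) for _ in range(n_queries)]

def zipf_scores(result_lists, s=1.0):
    """Sum 1 / rank**s over every result list in which a URL appears (ranks start at 1)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank ** s
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

document = "web document similarity retrieval with language models"
queries = generate_queries(document, n_queries=3, query_length=4)
# Each query would be sent to several search engines; hypothetical result lists:
ranking = zipf_scores([["u1", "u2"], ["u2", "u3"]])
# u2 scores 1/2 + 1 = 1.5, u1 scores 1.0, u3 scores 0.5
```

The Zipf-like weighting rewards URLs that appear in many result lists while still giving most of the credit to top-ranked positions.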

Web Intelligence | 2011

A Text Similarity Meta-Search Engine Based on Document Fingerprints and Search Results Records

Felipe Bravo-Marquez; Gaston L'Huillier; Sebastián A. Ríos; Juan D. Velásquez

Traditional Web search engines do not currently support retrieving similar documents from the Web using documents, instead of key-term queries, as input. One approach to this problem consists of fingerprinting the document's content into a set of queries that are submitted to a list of Web search engines. Afterwards, the results are merged, their URLs are fetched, and their content is compared with the given document using text comparison algorithms. However, requesting results from multiple Web servers can take a significant amount of time and effort. In this work, a similarity function between the given document and the retrieved results is estimated. The function uses as variables features derived from the information provided by search engine results records, such as rankings, titles and snippets, thereby avoiding the bottleneck of requesting external Web servers. We created a collection of around 10,000 search engine results by generating queries from 2,000 crawled Web documents. We then fitted the similarity function using the cosine similarity between the input and the results' content as the target variable, and compared the execution times of the exact and approximated solutions. Our approximated solution reduced computational time by 86% while maintaining an acceptable level of precision with respect to the exact solution of the Web document retrieval problem.
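The approximation idea can be sketched as a linear regression: the target is the cosine similarity between the input document and a fetched result, and the predictors come only from the search engine results record. The feature set (intercept, reciprocal rank, title and snippet cosine similarity), the record keys and the linear form are assumptions for illustration; the paper's fitted function may differ.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def record_features(document, record):
    """Features available without fetching the page (hypothetical record keys)."""
    return [1.0,  # intercept
            1.0 / record["rank"],
            cosine_similarity(document, record["title"]),
            cosine_similarity(document, record["snippet"])]

def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))
```

In use, each training row would be `record_features(document, record)` and each target `cosine_similarity(document, fetched_text)`; once the weights are fitted, `predict` scores new results without fetching their pages. The normal equations need at least as many varied records as features, otherwise the system is singular.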

International Conference on Hybrid Intelligent Systems | 2008

A Hybrid System for Probability Estimation in Multiclass Problems Combining SVMs and Neural Networks

Cristián Bravo; Jose Luis Lobato; Richard Weber; Gaston L'Huillier

This paper addresses the problem of probability estimation in multiclass classification tasks by combining two well-known data mining techniques: support vector machines and neural networks. We present an algorithm that uses both techniques in a two-step procedure. The first step employs support vector machines within a one-vs-all reduction from multiclass to binary classification to obtain the distances between each observation and the support vectors representing the classes. The second step uses these distances as inputs to a neural network built with an entropy cost function and a softmax transfer function in the output layer, trained on class membership. This network then estimates the probabilities of class membership for new observations. A benchmark on different databases shows that the proposed algorithm is highly competitive with the most recent techniques for multiclass probability estimation.
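The two-step structure can be sketched in plain Python. For brevity, the one-vs-all SVMs are assumed to be already trained (their weights below are made up), their signed decision values stand in for the distances described above, and the neural network is reduced to a single softmax output layer trained with cross-entropy by batch gradient descent; the paper's network architecture and training details may differ.

```python
import math

def ova_decision_values(x, svms):
    """Step 1: signed decision value w_k . x + b_k of each (pre-trained) one-vs-all SVM."""
    return [sum(w * xi for w, xi in zip(weights, x)) + bias for weights, bias in svms]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, features):
    x = features + [1.0]  # append the bias input
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def train_softmax_layer(features, labels, n_classes, lr=0.2, epochs=1500):
    """Step 2: softmax output layer trained on cross-entropy by batch gradient descent."""
    dim = len(features[0]) + 1
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for f, y in zip(features, labels):
            p = predict_proba(W, f)
            x = f + [1.0]
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                for j in range(dim):
                    grad[k][j] += err * x[j]
        for k in range(n_classes):
            for j in range(dim):
                W[k][j] -= lr * grad[k][j] / len(features)
    return W

# Hypothetical pre-trained linear one-vs-all SVMs for three classes in 2-D.
svms = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 0.0)]
points = [(2, 0), (1.5, 0.2), (0, 2), (0.1, 1.8), (-1, -1), (-2, -1)]
labels = [0, 0, 1, 1, 2, 2]
distances = [ova_decision_values(p, svms) for p in points]
W = train_softmax_layer(distances, labels, n_classes=3)
probs = predict_proba(W, ova_decision_values((1.8, 0.1), svms))
```

The softmax layer turns the SVM outputs, which are not probabilities, into a distribution over classes that sums to one.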

Web Intelligence | 2011

Enhancing Community Discovery and Characterization in VCoP Using Topic Models

Lautaro Cuadra; Sebastián A. Ríos; Gaston L'Huillier

Identifying communities in social networks is a common problem that researchers have addressed using network analysis properties. However, in environments where community members are connected through digital documents, most researchers have focused on solving the community discovery problem by computing structural properties of networks, ignoring the underlying semantic information in the documents. In this paper, we propose a novel approach that combines traditional network analysis methods for community detection with text mining techniques. In this way, the extracted communities can be labeled according to latent semantic information within documents, called topics. Our proposal was evaluated on Plexilandia, a virtual community of practice with more than 2,500 members and 9 years of comments.
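To illustrate labeling detected communities with topics, the sketch below uses label propagation (a swapped-in community detection method; the paper may use a different one) and then labels each community with the topic that has the highest average weight among its members. The per-member topic distributions are assumed to come from a topic model such as LDA and are made up here. Label propagation can depend on the order in which nodes are visited, so nodes are processed in sorted order to keep results reproducible.

```python
from collections import Counter

def label_propagation(adjacency, max_iterations=20):
    """Asynchronous label propagation; returns a list of communities (sets of nodes)."""
    labels = {node: node for node in adjacency}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adjacency):  # fixed visiting order for reproducibility
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            if not counts:
                continue
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Keep the current label on ties; otherwise take the smallest top label.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, lab in labels.items():
        communities.setdefault(lab, set()).add(node)
    return list(communities.values())

def dominant_topic(members, member_topics):
    """Topic with the highest average weight over the community's members."""
    n_topics = len(member_topics[next(iter(members))])
    averages = [sum(member_topics[m][t] for m in members) / len(members)
                for t in range(n_topics)]
    return max(range(n_topics), key=lambda t: averages[t])

# Two triangles joined by a single edge (c - z).
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "z"],
             "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y", "c"]}
member_topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.7, 0.3],
                 "x": [0.2, 0.8], "y": [0.1, 0.9], "z": [0.3, 0.7]}
for community in label_propagation(adjacency):
    print(sorted(community), "-> topic", dominant_topic(community, member_topics))
```

The structural step and the semantic step are independent here, so either could be replaced (for example, by modularity-based detection or a different topic aggregation) without changing the other.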

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2011

Outlier-based approaches for intrinsic and external plagiarism detection

Gabriel Oberreuter; Gaston L'Huillier; Sebastián A. Ríos; Juan D. Velásquez

Plagiarism detection, one of the main problems that educational institutions have faced since the massification of the Internet, can be considered a classification problem that uses both self-based information and text processing algorithms whose computational complexity is intractable without search space reduction algorithms. First, self-based information algorithms treat plagiarism detection as an outlier detection problem, in which the classifier must decide on plagiarism using only the text of a given document. Then, external plagiarism detection uses text matching algorithms, for which it is essential to reduce the matching space with text search space reduction techniques; this reduction can itself be represented as another outlier detection problem. The main contribution of this work is the inclusion of text outlier detection methodologies to enhance both intrinsic and external plagiarism detection. The results show that our approach is highly competitive with respect to the leading research teams in plagiarism detection.
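The intrinsic (self-based) side can be sketched as outlier detection over document segments. In this minimal sketch, a segment's style score is one minus the Jaccard similarity between its vocabulary and that of the rest of the document, and segments whose z-score exceeds a hypothetical threshold are flagged. The segment size, the feature and the threshold are assumptions for illustration, not the features used in the paper.

```python
import statistics

def split_segments(text, size):
    """Consecutive segments of `size` lowercase words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]

def vocabulary_distance(segments, index):
    """1 - Jaccard similarity between a segment's vocabulary and the rest of the text."""
    own = set(segments[index])
    rest = {w for j, seg in enumerate(segments) if j != index for w in seg}
    union = own | rest
    return 1.0 - len(own & rest) / len(union) if union else 0.0

def intrinsic_outliers(text, size=6, threshold=1.5):
    """Indices of segments whose distance z-score exceeds `threshold`."""
    segments = split_segments(text, size)
    if len(segments) < 2:
        return []
    scores = [vocabulary_distance(segments, i) for i in range(len(segments))]
    mean, spread = statistics.mean(scores), statistics.pstdev(scores)
    if spread == 0.0:
        return []  # all segments look alike
    return [i for i, s in enumerate(scores) if (s - mean) / spread > threshold]

normal = "the cat sat on the mat"
odd = "quantum entanglement yields nonlocal correlations measurably"
text = " ".join([normal, normal, odd, normal, normal])
print(intrinsic_outliers(text))  # [2]
```

Only the text of the document itself is used, which is what distinguishes this intrinsic setting from external detection against a reference corpus.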

Data Mining and Knowledge Discovery | 2017

Adversarial classification using signaling games with an application to phishing detection

Nicolás Figueroa; Gaston L'Huillier; Richard Weber

In adversarial classification, the interaction between classifiers and adversaries can be modeled as a game between two players. It is natural to model this interaction as a dynamic game of incomplete information, since the classifier does not know the exact intentions of the different types of adversaries (senders). For these games, equilibrium strategies can be approximated and used as input for classification models. In this paper, we show how to model such interactions between players and give directions on how to approximate their mixed strategies. We propose perceptron-like machine learning approximations as well as novel Adversary-Aware Online Support Vector Machines. Results in a real-world adversarial environment show that our approach is competitive with benchmark online learning algorithms and provides important insights into the complex relations among players.
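The paper's Adversary-Aware Online Support Vector Machines are not reproduced here. As a point of reference only, the sketch below shows the classic perceptron update rule, the basic building block behind perceptron-like online learners: whenever a message is misclassified, the weight vector moves towards its correct label. The features and labels (+1 phishing, -1 legitimate) are made up, and no game-theoretic adjustment is included.

```python
def perceptron(stream, n_features, max_passes=100):
    """Perceptron updates over (features, label) pairs with labels in {+1, -1}.

    Stops after a pass with no mistakes (guaranteed if the data are linearly separable).
    """
    w = [0.0] * (n_features + 1)  # last weight is the bias
    for _ in range(max_passes):
        mistakes = 0
        for x, y in stream:
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:  # wrong side or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def classify(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else -1

# Hypothetical two-feature messages: +1 = phishing, -1 = legitimate.
messages = [((2, 0), 1), ((3, 1), 1), ((0, 2), -1), ((1, 3), -1)]
w = perceptron(messages, n_features=2)
print(w)  # [2.0, -2.0, 0.0]
```

An adversary-aware variant would additionally account for how senders adapt their messages in response to the classifier, which this baseline ignores.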
