Lourdes Araujo

National University of Distance Education

Publication

Featured research published by Lourdes Araujo.

Expert Systems With Applications | 2013

Detecting malicious tweets in trending topics using a statistical analysis of language

Juan Martinez-Romo; Lourdes Araujo

Twitter spam detection is a recent area of research in which most previous work has focused on identifying malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user, and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture emerging Internet trends and the topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34K trending topics and 20 million tweets. We then proposed a reduced set of features that are hard for spammers to manipulate. In addition, we developed a machine learning system with some orthogonal features that can be combined with other feature sets in order to analyze emergent characteristics of spam in social networks. We also conducted an extensive evaluation, which shows that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, because it analyzes tweets instead of user accounts, our system can be applied to Twitter spam detection in trending topics in real time.

Adversarial Information Retrieval on the Web | 2009

Web spam identification through language model analysis

Juan Martinez-Romo; Lourdes Araujo

This paper applies a language model approach to different sources of information extracted from a Web page, in order to provide high-quality indicators for the detection of Web spam. Two pages linked by a hyperlink should be topically related, even if only by a weak contextual relation. For this reason, we have analysed different sources of information of a Web page that belong to the context of a link, and we have applied the Kullback-Leibler divergence to them to characterise the relationship between two linked pages. Moreover, we combine some of these sources of information in order to obtain richer language models. Given the different nature of internal and external links, our study also distinguishes between these two types of links, which yields a significant improvement in classification tasks. The result is a system that improves the detection of Web spam on two large, public datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007.
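The Kullback-Leibler divergence mentioned above compares two language models P and Q as KL(P || Q) = sum over words w of P(w) log(P(w) / Q(w)); it is zero when the models coincide and grows as they drift apart. The following is a minimal sketch, not the paper's feature set: it assumes whitespace tokenization, unigram models with additive (Laplace) smoothing over the union of both vocabularies, and a single "link context" string standing in for whichever source of information is used. The function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(text: str, vocab: set[str], alpha: float = 1.0) -> dict[str, float]:
    """Additively smoothed unigram distribution of `text` over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(P || Q) in nats; smoothing guarantees Q(w) > 0 wherever P(w) > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def link_divergence(link_context: str, target_page: str, alpha: float = 1.0) -> float:
    """Divergence between the model of a link's context and the model of the page it points at."""
    vocab = set(link_context.lower().split()) | set(target_page.lower().split())
    p = unigram_model(link_context, vocab, alpha)
    q = unigram_model(target_page, vocab, alpha)
    return kl_divergence(p, q)


anchor = "cheap flights to madrid"
print(link_divergence(anchor, "book cheap flights to madrid and barcelona"))  # smaller: related pages
print(link_divergence(anchor, "casino poker bonus free spins"))               # larger: unrelated pages
```

A high divergence between a link's context and its target is the kind of signal that would then be fed to a classifier as one feature among others.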

IEEE Transactions on Evolutionary Computation | 2011

Diversity Through Multiculturality: Assessing Migrant Choice Policies in an Island Model

Lourdes Araujo; Juan J. Merelo

The natural mate-selection behavior of preferring individuals that are somewhat (but not too much) different has been shown to increase the resistance of the resulting offspring to infection, and thus their fitness. Inspired by these results, we have investigated the improvement obtained from the diversity induced by differences between the individuals sent and received and the resident population in an island model. To do so, we compare different migration policies, including our proposed multikulti methods, which choose the individuals to be sent to other nodes according to the principle of multiculturality: the individual sent should be different enough from the target population, which is represented in the emitting population through a proxy string (computed in several possible ways). We have tested a set of policies following these principles on two discrete optimization problems of different difficulty, for different population sizes and numbers of nodes, and found that, on average or in median, multikulti policies outperform the usual policy of sending the best or a random individual; however, the size of this advantage changes with the number of nodes involved and the difficulty of the problem, tending to be greater as the number of nodes increases. The success of this kind of policy is explained by measuring entropy as a representation of population diversity for the policies tested.
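The contrast between a multikulti policy and the usual "send the best" policy can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's code: genomes are binary tuples, the proxy string is assumed to be the bitwise majority of individuals previously received from the target island (the abstract only says the proxy can be computed in several ways), and difference is measured with Hamming distance. All names and data are hypothetical.

```python
from typing import Callable, Sequence

Genome = tuple[int, ...]


def consensus(strings: Sequence[Genome]) -> Genome:
    """Majority bit at each locus (ties go to 1): one possible proxy for a population."""
    n = len(strings)
    return tuple(int(2 * sum(bits) >= n) for bits in zip(*strings))


def hamming(a: Genome, b: Genome) -> int:
    """Number of loci at which two genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def multikulti_migrant(population: Sequence[Genome], proxy: Genome,
                       fitness: Callable[[Genome], float]) -> Genome:
    """Send the individual most different from the proxy; break ties by higher fitness."""
    return max(population, key=lambda ind: (hamming(ind, proxy), fitness(ind)))


def best_migrant(population: Sequence[Genome], fitness: Callable[[Genome], float]) -> Genome:
    """Baseline policy: send the fittest individual."""
    return max(population, key=fitness)


def onemax(g: Genome) -> float:
    """Toy fitness: number of ones in the genome."""
    return float(sum(g))


# Hypothetical individuals received earlier from the target island.
received = [(1, 1, 0, 0), (1, 0, 0, 0), (1, 1, 1, 0)]
proxy = consensus(received)  # (1, 1, 0, 0)
population = [(1, 1, 0, 0), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
print(multikulti_migrant(population, proxy, onemax))  # (0, 0, 1, 1)
print(best_migrant(population, onemax))               # (1, 1, 1, 1)
```

Breaking ties by fitness is one design choice among several; it keeps the migrant "different enough" while still preferring useful genetic material.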

IEEE Transactions on Information Forensics and Security | 2010

Web Spam Detection: New Classification Features Based on Qualified Link Analysis and Language Models

Lourdes Araujo; Juan Martinez-Romo

Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of this kind of page. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are related not only to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links. We consider, for instance, the ability of a search engine to find, using information provided by the page for a given link, the page that the link actually points at. This can be regarded as an indicator of the link's reliability. We also check the coherence between a page and another one pointed at by any of its links. Two pages linked by a hyperlink should be semantically related, by at least a weak contextual relation. Thus, we apply an LM approach to different sources of information from a Web page that belong to the context of a link, in order to provide high-quality indicators of Web spam. Specifically, we have applied the Kullback-Leibler divergence to different combinations of these sources of information in order to characterize the relationship between two linked pages. The result is a system that significantly improves the detection of Web spam using fewer features, on two large, public datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007.

Information Processing Letters | 2006

Natural language tagging with genetic algorithms

Enrique Alba; Gabriel Luque; Lourdes Araujo

This work analyzes the relative advantages of different metaheuristic approaches to the well-known natural language processing problem of part-of-speech tagging, which consists of assigning to each word of a text its disambiguated part of speech according to the context in which the word is used. We have applied a classic genetic algorithm (GA), a CHC algorithm, and simulated annealing (SA). Different ways of encoding the solutions to the problem (integer and binary) have been studied, as well as the impact of using parallelism for each of the considered methods. We have performed experiments on different linguistic corpora and compared the results obtained against other popular approaches, plus a classic dynamic programming algorithm. Our results show the high performance achieved by the parallel algorithms compared to the sequential ones, and highlight the particular advantages of each technique. Our algorithms and some of their components can be used to represent a new set of state-of-the-art procedures for complex tagging scenarios.
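To make the integer encoding concrete, here is a minimal sketch in which an individual is a list with one tag index per word. The fitness is assumed to be the log-probability of the tagging under a bigram (HMM-style) model. The abstract does not state the fitness function, and the tag set and probability tables below are hypothetical toy values, not estimates from any corpus. A GA, CHC or SA would all score candidates with the same function; only the search strategy differs.

```python
import itertools
import math
import random

# Toy model: tag set and probabilities are hypothetical, not estimated from a corpus.
TAGS = ["DET", "NOUN", "VERB"]
TRANSITION = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "DET"): 0.05, ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05,
    ("NOUN", "DET"): 0.1, ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.6,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
}
EMISSION = {  # P(word | tag); pairs not listed get FLOOR
    ("the", "DET"): 0.7,
    ("dog", "NOUN"): 0.4, ("dog", "VERB"): 0.01,
    ("barks", "NOUN"): 0.05, ("barks", "VERB"): 0.3,
}
FLOOR = 1e-6  # probability given to unseen events


def fitness(words, individual):
    """Log-probability of the tagging; individual[i] is an index into TAGS (integer encoding)."""
    score, prev = 0.0, "<s>"
    for word, gene in zip(words, individual):
        tag = TAGS[gene]
        score += math.log(TRANSITION.get((prev, tag), FLOOR))
        score += math.log(EMISSION.get((word, tag), FLOOR))
        prev = tag
    return score


def mutate(individual, rng, rate=0.2):
    """Replace each gene by a random tag index with probability `rate`."""
    return [rng.randrange(len(TAGS)) if rng.random() < rate else g for g in individual]


words = ["the", "dog", "barks"]
# Exhaustive search only works for this toy sentence; the metaheuristics explore the same space.
best = max(itertools.product(range(len(TAGS)), repeat=len(words)),
           key=lambda ind: fitness(words, ind))
print([TAGS[g] for g in best])  # ['DET', 'NOUN', 'VERB']
print(mutate(list(best), random.Random(1)))
```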

European Conference on Evolutionary Computation in Combinatorial Optimization | 2008

Improving query expansion with stemming terms: a new genetic algorithm approach

Lourdes Araujo; José R. Pérez-Agüera

Nowadays, searching for information on the Web or in any kind of document collection has become one of the most frequent activities. However, user queries can be formulated in a way that hinders the retrieval of the requested information. The objective of automatic query transformation is to improve the quality of the retrieved information. This paper describes a new genetic algorithm used to change the set of terms that compose a user query without user supervision, by complementing an expansion process based on the use of a morphological thesaurus. We apply a stemming process to obtain the stem of a word, for which the thesaurus provides its different forms. The set of candidate query terms is constructed by expanding each term in the original query with its morphologically related terms. The genetic algorithm is in charge of selecting the terms of the final query from the candidate term set. The selection process is based on the retrieval results obtained when searching with different combinations of candidate terms. We have obtained encouraging results, improving performance on a standard set of tests.
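The selection step described above can be sketched as a genetic algorithm over bit strings, where each bit says whether a candidate term enters the final query. The sketch rests on several assumptions. The morphological forms are given as a plain dictionary rather than produced by a stemmer and thesaurus. The operators (binary tournament, one-point crossover, bit-flip mutation, elitism) are generic choices, not necessarily the paper's. `toy_score` is a hypothetical stand-in for evaluating the retrieval results of a query. Seeding the population with the original query means that, thanks to elitism, the returned query never scores below it.

```python
import random
from typing import Callable, Dict, List, Sequence


def candidate_terms(query: Sequence[str], morph_forms: Dict[str, List[str]]) -> List[str]:
    """Expand each query term with its morphologically related forms (duplicates dropped)."""
    candidates: List[str] = []
    for term in query:
        for form in [term, *morph_forms.get(term, [])]:
            if form not in candidates:
                candidates.append(form)
    return candidates


def decode(mask: Sequence[int], candidates: Sequence[str]) -> List[str]:
    """A bit-string individual selects the candidate terms whose bit is 1."""
    return [t for bit, t in zip(mask, candidates) if bit]


def ga_select_terms(query, candidates, score: Callable[[List[str]], float],
                    pop_size=20, generations=30, p_mut=0.1, seed=0) -> List[str]:
    rng = random.Random(seed)
    n = len(candidates)

    def fit(mask):
        return score(decode(mask, candidates))

    # Seed with the original query; elitism then keeps the best score from ever decreasing.
    pop = [[int(t in query) for t in candidates]]
    pop += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=fit)[:]]  # elitism
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # binary tournament
            b = max(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    return decode(max(pop, key=fit), candidates)


# Hypothetical stand-in for retrieval results: reward relevant terms, penalize noisy ones.
RELEVANT = {"computing", "computer", "networks", "network"}


def toy_score(terms: List[str]) -> float:
    return float(len(RELEVANT & set(terms)) - len(set(terms) - RELEVANT))


query = ["computing", "networks"]
forms = {"computing": ["compute", "computer", "computed"], "networks": ["network", "networked"]}
cands = candidate_terms(query, forms)
print(ga_select_terms(query, cands, toy_score))
```

In a real setting `score` would run the query against a test collection and return a measure such as precision, which is far more expensive than this toy function; caching fitness values per bit string would then be worthwhile.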

IEEE Transactions on Evolutionary Computation | 2004

Symbiosis of evolutionary techniques and statistical natural language processing

Lourdes Araujo

This paper presents some applications of evolutionary programming to different tasks of natural language processing (NLP). First, the work defines a general scheme for applying evolutionary techniques to NLP, which guides the design of the elements of the algorithm. This scheme largely relies on the success of probabilistic approaches to NLP. Second, the scheme is illustrated with two fundamental NLP applications: tagging, i.e., the assignment of lexical categories to words, and parsing, i.e., the determination of the syntactic structure of sentences. In both cases, the elements of the evolutionary algorithm are described in detail, as well as the results of different experiments carried out to show the viability of this evolutionary approach for tasks as complex as those of NLP.

Journal of Logic Programming | 1997

A parallel Prolog system for distributed memory

Lourdes Araujo; José J. Ruz

This paper presents a parallel execution system (PDP: Prolog Distributed Processor) for efficiently supporting both independent AND-parallelism and OR-parallelism on distributed-memory multiprocessors. The system is composed of a set of workers with a hierarchically structured scheduler. Each worker operates on its own private memory, and interprocessor communication is performed only by message passing. The execution model follows a multisequential approach in order to maintain the sequential optimizations. Independent AND-parallelism is exploited following a fork-join approach, and OR-parallelism is exploited following a recomputation approach. PDP deals with OR-under-AND parallelism by producing the solutions of a set of parallel goals in a distributed way, that is, by creating a new task for each element of the cross product. This approach has the advantage of avoiding both storing partial solutions and synchronizing workers, resulting in largely increased performance. Different scheduling policies have been studied, and granularity controls have been introduced for each kind of parallelism. PDP has been implemented on a network of transputers, and performance results show that PDP introduces very little overhead into sequential programs and provides a high speedup for coarse-grain parallel programs.
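The cross-product idea for OR-under-AND parallelism can be illustrated sequentially: for a conjunction of independent goals, each combination of one solution per goal becomes a separate continuation task. This Python sketch only shows which combinations become tasks. It does not model how PDP produces them on distributed workers without storing partial solutions, and it represents solutions as hypothetical variable-binding dictionaries.

```python
from itertools import product
from typing import Dict, List


def cross_product_tasks(goal_solutions: List[List[Dict[str, object]]]) -> List[Dict[str, object]]:
    """One continuation task per element of the cross product of the parallel goals' solutions.

    goal_solutions[i] holds the variable bindings produced by independent goal i.
    Independent AND-goals share no unbound variables, so merging bindings cannot conflict.
    """
    tasks = []
    for combo in product(*goal_solutions):
        bindings: Dict[str, object] = {}
        for solution in combo:
            bindings.update(solution)
        tasks.append(bindings)
    return tasks


# Hypothetical conjunction p(X), q(Y) where p has 2 solutions and q has 3.
p_solutions = [{"X": 1}, {"X": 2}]
q_solutions = [{"Y": "a"}, {"Y": "b"}, {"Y": "c"}]
for task in cross_product_tasks([p_solutions, q_solutions]):
    print(task)  # 6 tasks in total; each could be handed to a different worker
```

Because every task carries a complete combination of bindings, no worker has to wait for, or synchronize with, another before continuing, which is the property the abstract highlights.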

Parallel Problem Solving from Nature | 2008

Testing the Intermediate Disturbance Hypothesis: Effect of Asynchronous Population Incorporation on Multi-Deme Evolutionary Algorithms

Juan J. Merelo; Antonio M. Mora; Pedro A. Castillo; Juan Luis Jiménez Laredo; Lourdes Araujo; Ken Sharman; Anna I. Esparcia-Alcázar; Eva Alfaro-Cid; Carlos Cotta

In P2P and volunteer computing environments, resources are not always available from the beginning to the end of an experiment; they may be incorporated into it at any moment. Determining the best way of using these resources, so that the exploration/exploitation balance is kept and used to its best effect, is an important issue. The Intermediate Disturbance Hypothesis states that a moderate population disturbance (in any sense that could affect the population fitness) results in maximum ecological diversity. Following this hypothesis, we test the effect of incorporating a second population in a two-population experiment. Experiments performed on two combinatorial optimization problems, MMDP and P-Peaks, show that the highest algorithmic effect is produced when the second population is incorporated in the middle of the evolution of the first one; starting it at the same time as the first population, or towards the end, yields no improvement or an increase in the number of evaluations needed to reach a solution. This effect is explained in the paper and ascribed to the intermediate disturbance produced by first-population immigrants in the second population.

PLOS ONE | 2012

Local-Based Semantic Navigation on a Networked Representation of Information

Jose A. Capitan; Javier Borge-Holthoefer; Sergio Gómez; Juan Martinez-Romo; Lourdes Araujo; José A. Cuesta; Alex Arenas

The size and complexity of real-world networked systems hinder access to global knowledge of their structure. This fact pushes the problem of navigation toward suboptimal solutions, one of them being the extraction of a coherent map of the topology on which navigation takes place. In this paper, we present a Markov chain-based algorithm to tag networked terms according only to their topological features. The resulting tagging is used to compute the similarity between terms, providing a map of the networked information. This map supports local-based navigation techniques driven by similarity. We evaluate the efficiency of the resulting paths by comparing their length to that of the shortest path. Additionally, we claim that the path steps towards the destination are semantically coherent. To illustrate the algorithm's performance, we provide some results from the Simple English Wikipedia, which amounts to several thousand pages. The simplest greedy strategy yields an average success rate of over 80%. Furthermore, the resulting content-coherent paths most often cost between one and three times the shortest-path length.
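A greedy, similarity-driven navigation step of the kind described above can be sketched as follows. The Markov chain tagging itself is not reproduced here. The sketch assumes each term already has a vector of tag values, uses cosine similarity between vectors, and at each step moves to the unvisited neighbour most similar to the destination, failing at dead ends. The path cost is then compared with the breadth-first shortest path, which requires global knowledge. Graph, vectors and function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_path(graph: Dict[str, List[str]], vectors: Dict[str, Sequence[float]],
                source: str, target: str, max_steps: int = 50) -> Optional[List[str]]:
    """Walk to the unvisited neighbour most similar to the target, using only local information."""
    path, visited, current = [source], {source}, source
    while current != target and len(path) <= max_steps:
        options = [n for n in graph[current] if n not in visited]
        if not options:
            return None  # dead end: this navigation attempt fails
        current = max(options, key=lambda n: cosine(vectors[n], vectors[target]))
        path.append(current)
        visited.add(current)
    return path if current == target else None


def shortest_path_length(graph: Dict[str, List[str]], source: str, target: str) -> Optional[int]:
    """Breadth-first search: the global-knowledge baseline for the cost comparison."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for n in graph[node]:
            if n not in dist:
                dist[n] = dist[node] + 1
                queue.append(n)
    return None


# Hypothetical toy network and tag vectors.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B", "E"], "E": ["C", "D"]}
vectors = {"A": [1.0, 0.0], "B": [1.0, 0.2], "C": [0.3, 1.0], "D": [0.5, 0.5], "E": [0.0, 1.0]}
path = greedy_path(graph, vectors, "A", "E")
print(path, (len(path) - 1) / shortest_path_length(graph, "A", "E"))  # ['A', 'C', 'E'] 1.0
```

Over many source and destination pairs, the fraction of non-None results gives a success rate, and the ratio printed above is the cost relative to the shortest path.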