Pedro Gabriel Ferreira

Publication

Featured research published by Pedro Gabriel Ferreira.

discovery science | 2006

Mining approximate motifs in time series

Pedro Gabriel Ferreira; Paulo J. Azevedo; Cândida G. Silva; Rui M. M. Brito

The problem of discovering previously unknown frequent patterns in time series, also called motifs, has been recently introduced. A motif is a subseries pattern that appears a significant number of times. Results demonstrate that motifs may provide valuable insights about the data and have a wide range of applications in data mining tasks. The main motivation for this study was the need to mine time series data from protein folding/unfolding simulations. We propose an algorithm that extracts approximate motifs, i.e. motifs that capture portions of time series with a similar and possibly symmetric behavior. Preliminary results on the analysis of protein unfolding data support this proposal as a valuable tool. Additional experiments demonstrate that the utility of our algorithm is not limited to this particular problem. Rather, it can be a useful tool for many real-world problems.
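
The notion of an approximate motif described in this abstract can be illustrated with a small sketch. It is not the algorithm proposed in the paper: it simply z-normalizes fixed-length windows and collects the non-overlapping windows that lie within a distance threshold of a chosen candidate. Treating a window and its mirror image (sign-flipped shape) as a match is one assumed reading of "symmetric behavior". The function names, the window length and the threshold are all hypothetical.

```python
import math

def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]

def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def count_matches(series, start, length, threshold, allow_mirror=True):
    """Return start positions of non-overlapping windows similar to the
    candidate window series[start:start + length] (hypothetical helper)."""
    candidate = znorm(series[start:start + length])
    mirrored = [-x for x in candidate]  # assumed meaning of "symmetric"
    matches = []
    last_end = -1
    for i in range(len(series) - length + 1):
        if i < last_end:  # skip windows overlapping the previous match
            continue
        window = znorm(series[i:i + length])
        d = distance(window, candidate)
        if allow_mirror:
            d = min(d, distance(window, mirrored))
        if d <= threshold:
            matches.append(i)
            last_end = i + length
    return matches

series = [0, 1, 2, 1, 0, 5, 5, 5, 0, -1, -2, -1, 0, 5, 5, 10, 11, 12, 11, 10]
positions = count_matches(series, start=0, length=5, threshold=1e-6)
```

A candidate would count as a motif in this sketch when the number of returned positions reaches a user-chosen minimum count.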

international conference on data mining | 2006

Establishing fraud detection patterns based on signatures

Pedro Gabriel Ferreira; Ronnie Alves; Orlando Belo; Luís Cortesão

All over the world we have been witnessing a significant increase in the use of telecommunication systems. People are faced day after day with strong marketing campaigns seeking their attention to new telecommunication products and services. Telecommunication companies struggle in a highly competitive business arena. It seems that their efforts have paid off, because customers are strongly adopting the new trends and systematically use (and abuse) communication services in their daily lives. Although fraud situations are rare, they are increasing and they correspond to a large amount of money that telecommunication companies lose every year. In this work, we studied the problem of fraud detection in telecommunication systems, especially the cases of superimposed fraud, providing an anomaly detection technique supported by a signature schema. Our main goal is to detect deviant behaviors in a timely manner, giving fraud analysts a better basis for accurate decisions when establishing potential fraud situations.

european conference on machine learning | 2005

Protein sequence pattern mining with constraints

Pedro Gabriel Ferreira; Paulo J. Azevedo

Considering the characteristics of biological sequence databases, which typically have a small alphabet, a very long length and a relatively small size (several hundreds of sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from the combination of some of the most efficient techniques used in sequence and itemset mining. The algorithm exhibits high adaptability, yielding a smooth and direct introduction of various types of features into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth-first and depth-first traversals are possible. The experimental evaluation, on synthetic and real-life protein databases, has shown that our algorithm has superior performance to state-of-the-art algorithms. The use of constraints has also proved to be a very useful tool to specify patterns of interest to the user.
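
To make the notion of a rigid gap pattern and a support constraint concrete, here is a minimal sketch. It is not gIL and makes no attempt at its efficiency: it assumes a pattern is written as a string in which `.` stands for exactly one arbitrary symbol, and it counts support as the number of sequences containing at least one occurrence. The names `occurs`, `support` and `frequent_patterns` are hypothetical.

```python
def occurs(pattern, sequence):
    """True if the rigid-gap pattern occurs in the sequence.
    '.' in the pattern matches any single symbol (assumed notation)."""
    m = len(pattern)
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if all(p == "." or p == s for p, s in zip(pattern, window)):
            return True
    return False

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if occurs(pattern, seq))

def frequent_patterns(candidates, database, min_support):
    """Keep the candidate patterns that satisfy the minimum-support constraint."""
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result

database = ["MKAVCL", "AKCD", "ACCA", "GGG"]  # toy sequences, not real proteins
kept = frequent_patterns(["A.C", "AC", "G"], database, min_support=2)
```

Other constraints (maximum gap length, required symbols, and so on) could be added as extra filters in `frequent_patterns`.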

portuguese conference on artificial intelligence | 2005

Protein sequence classification through relevant sequence mining and bayes classifiers

Pedro Gabriel Ferreira; Paulo J. Azevedo

We tackle the problem of sequence classification using relevant subsequences found in a dataset of labelled protein sequences. A subsequence is relevant if it is frequent and has a minimal length. For each query sequence a vector of features is obtained. The features consist of the number and average length of the relevant subsequences shared with each of the protein families. Classification is performed by combining these features in a Bayes classifier. The combination of these characteristics results in a multi-class and multi-domain method that requires neither data transformation nor background knowledge. We illustrate the performance of our method using three collections of protein datasets. The tests showed that the method performs on par with state-of-the-art methods in protein classification.

Algorithms for Molecular Biology | 2007

Evaluating deterministic motif significance measures in protein databases

Pedro Gabriel Ferreira; Paulo J. Azevedo

Background: Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, and therefore alleviating the analysis work. Spotting the most interesting and relevant motifs is then dependent on the choice of the right measures. The combined use of several measures may provide more robust results. However, caution has to be taken in order to avoid spurious evaluations. Results: From the set of conducted experiments, it was verified that several of the selected significance measures show a very similar behavior in a wide range of situations, therefore providing redundant information. Some measures have proved to be more appropriate to rank highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears as a very important feature to be considered for correct motif ranking. We observed that not all the measures are suitable for situations with poorly balanced class information, like for instance, when positive data is significantly less than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables an easy identification of high scoring motifs. Conclusion: In this work we have surveyed and categorized 14 significance measures for pattern evaluation. Their ability to rank three types of deterministic motifs was evaluated. Measures were applied in different testing conditions, where relations were identified. This study provides some pertinent insights on the choice of the right set of significance measures for the evaluation of deterministic motifs extracted from protein databases.
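
As an illustration of how significance measures can be computed for a motif, the sketch below derives four widely used quantities (support, precision, recall and lift) from match counts over a positive and a negative set of sequences. Whether these exact measures are among the 14 surveyed in the paper is not stated in the abstract; the function name and the example counts are hypothetical.

```python
import math

def motif_measures(tp, fp, n_pos, n_neg):
    """Basic significance measures for one motif.

    tp: positive (family) sequences matched by the motif
    fp: negative (background) sequences matched by the motif
    n_pos, n_neg: total number of positive and negative sequences
    """
    total = n_pos + n_neg
    matched = tp + fp
    support = matched / total if total else 0.0
    precision = tp / matched if matched else 0.0
    recall = tp / n_pos if n_pos else 0.0
    prior = n_pos / total if total else 0.0  # chance of a positive at random
    lift = precision / prior if prior else 0.0
    return {"support": support, "precision": precision,
            "recall": recall, "lift": lift}

# Imbalanced toy case: 10 positives against 90 negatives.
scores = motif_measures(tp=8, fp=2, n_pos=10, n_neg=90)
```

With few positives and many negatives, precision and lift can diverge sharply from support, which is one way the sensitivity to class balance mentioned in the abstract can show up.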

computational intelligence and data mining | 2007

Evaluating Protein Motif Significance Measures: A Case Study on Prosite Patterns

Pedro Gabriel Ferreira; Paulo J. Azevedo

The existence of preserved subsequences in a set of related protein sequences suggests that they might play a structural and functional role in protein mechanisms. Due to its exploratory approach, the mining process tends to deliver a large number of motifs. Therefore it is critical to provide methods that identify relevant, significant motifs. Many measures of interest and significance have been proposed. However, since motifs have a wide range of applications, the choice of appropriate significance measures is application dependent. Some measures show consistent results, being highly correlated, while others show disagreements. In this paper we review existing measures and study their behavior in order to assist the selection of the most appropriate set of measures. An experimental evaluation of the measures for high quality patterns from the Prosite database is presented.

computational intelligence in bioinformatics and computational biology | 2007

A Closer Look on Protein Unfolding Simulations through Hierarchical Clustering

Pedro Gabriel Ferreira; Candida Silva; Rui M. M. Brito; Paulo J. Azevedo

Understanding protein folding and unfolding mechanisms is a central problem in molecular biology. Data obtained from molecular dynamics unfolding simulations may provide valuable insights for a better understanding of these mechanisms. Here, we propose the application of an augmented version of hierarchical clustering analysis to detect clusters of amino-acid residues with similar behavior in protein unfolding simulations. These clusters show a similar global pattern of solvent accessible surface area (SASA) variation in unfolding simulations of the protein transthyretin (TTR). Classical hierarchical clustering was applied to build a dendrogram based on the SASA variation of each amino-acid residue. The dendrogram was enriched with background information on the amino-acid residues, enabling the extraction of sub-clusters with well differentiated characteristics.

computational intelligence methods for bioinformatics and biostatistics | 2009

Spatial Clustering of Molecular Dynamics Trajectories in Protein Unfolding Simulations

Pedro Gabriel Ferreira; Cândida G. Silva; Paulo J. Azevedo; Rui M. M. Brito

Molecular dynamics simulations are a valuable tool to study protein unfolding in silico. Analyzing the relative spatial position of the residues during the simulation may indicate which residues are essential in determining the protein structure. We present a method, inspired by a popular data mining technique called Frequent Itemset Mining, that clusters sets of amino acid residues with a synchronized trajectory during the unfolding process. The proposed approach has several advantages over traditional hierarchical clustering.
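
For readers unfamiliar with Frequent Itemset Mining, the technique that inspired this method, the following is a minimal level-wise (Apriori-style) sketch. It is not the paper's method. It assumes, purely for illustration, that each transaction is the set of residues observed moving together in one simulation frame; how the paper actually builds its transactions is not described in the abstract, and the residue labels are made up.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search returning {frozenset(items): support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    while current:
        # Count in how many transactions each candidate is fully contained.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (the Apriori pruning rule).
        k = len(next(iter(frequent))) + 1 if frequent else 0
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return result

frames = [
    {"L12", "V14", "A25"},
    {"L12", "V14"},
    {"L12", "A25"},
    {"V14", "A25", "L12"},
]
found = frequent_itemsets(frames, min_support=3)
```

Each frequent itemset in `found` is a group of residues that co-occur in at least `min_support` frames, which is the sense in which itemsets can act as clusters here.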

international conference on data mining | 2012

Detecting abnormal patterns in call graphs based on the aggregation of relevant vertex measures

Ronnie Alves; Pedro Gabriel Ferreira; Joel Ribeiro; Orlando Belo

Graphs are a very important abstraction to model complex structures and their interactions, with a broad range of applications including web analysis, telecommunications, chemical informatics and bioinformatics. In this work we are interested in the application of graph mining to identify abnormal behavior patterns from telecom Call Detail Records (CDRs). Such behaviors could also be used to model essential business tasks in telecom, for example churning, fraud, or marketing strategies, where the number of customers is typically quite large. Therefore, it is important to rank the most interesting patterns for further analysis. We propose a vertex relevance ranking score as a unified measure for focusing the search for abnormal patterns in weighted call graphs based on CDRs. Classical graph-vertex measures usually expose a quantitative perspective of vertices in telecom call graphs. We aggregate well-known vertex measures for handling attribute-based information usually provided by CDRs. Experimental evaluation carried out with real data streams from a local mobile telecom company demonstrated the feasibility of the proposed strategy.
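
The idea of aggregating several vertex measures into a single ranking score can be sketched as follows. This is not the score defined in the paper: the three measures used here (outgoing calls, incoming calls, total outgoing duration), the min-max normalization and the equal weighting are all assumptions chosen for simplicity, and the record layout `(caller, callee, duration)` is a hypothetical stand-in for a CDR.

```python
from collections import defaultdict

def rank_vertices(calls):
    """Rank vertices of a call graph by an aggregated, normalized score.

    calls: iterable of (caller, callee, duration) tuples (assumed layout).
    Returns a list of (vertex, score) sorted by decreasing score.
    """
    # Per vertex: [outgoing calls, incoming calls, total outgoing duration]
    measures = defaultdict(lambda: [0.0, 0.0, 0.0])
    for caller, callee, duration in calls:
        measures[caller][0] += 1
        measures[callee][1] += 1
        measures[caller][2] += duration
    if not measures:
        return []
    vertices = list(measures)
    scores = {v: 0.0 for v in vertices}
    for k in range(3):
        values = [measures[v][k] for v in vertices]
        lo, hi = min(values), max(values)
        for v in vertices:
            # Min-max normalize each measure, then average with equal weights.
            norm = 0.0 if hi == lo else (measures[v][k] - lo) / (hi - lo)
            scores[v] += norm / 3
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

calls = [("a", "b", 10), ("a", "c", 20), ("a", "d", 30), ("b", "a", 5)]
ranking = rank_vertices(calls)
```

Vertices at the top of `ranking` would be the first candidates for closer inspection; different weights per measure would shift the focus toward, say, call volume or call duration.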

Archive | 2006