Carlos Eduardo S. Pires

Publication

Featured research published by Carlos Eduardo S. Pires.

International Conference on Data Management in Grid and P2P Systems | 2009

A Semantic-Based Ontology Matching Process for PDMS

Carlos Eduardo S. Pires; Damires Souza; Thiago Pacheco; Ana Carolina Salgado

In Peer Data Management Systems (PDMS), ontology matching can be employed to reconcile peer ontologies and find correspondences between their elements. However, traditional approaches to ontology matching mainly rely on linguistic and/or structural techniques. In this paper, we propose a semantic-based ontology matching process which tries to overcome the limitations of traditional approaches by using semantics. To this end, we present a semantic matcher which identifies, besides the common types of correspondences (equivalence), some other ones (e.g., closeness). We also present an approach for determining a global similarity measure between two peer ontologies based on the identified similarity value of each correspondence. To clarify matters, we provide an example illustrating how the proposed approach can be used in a PDMS and some obtained experimental results.
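The idea of deriving a global similarity measure from per-correspondence similarity values can be illustrated with a minimal sketch. The abstract does not state the aggregation formula, so the Dice-style average below is an assumption, and the function name `global_similarity`, the tuple layout, and the element names are hypothetical.

```python
def global_similarity(correspondences, size_a, size_b):
    """Aggregate correspondence similarities into one score in [0, 1].

    correspondences: iterable of (element_a, element_b, similarity) tuples.
    size_a, size_b: number of elements in each peer ontology.
    Assumption: a Dice-style average; the paper may use a different formula.
    """
    best_a, best_b = {}, {}
    for elem_a, elem_b, sim in correspondences:
        # Keep only the strongest correspondence found for each element.
        best_a[elem_a] = max(best_a.get(elem_a, 0.0), sim)
        best_b[elem_b] = max(best_b.get(elem_b, 0.0), sim)
    if size_a + size_b == 0:
        return 0.0
    # Elements without any correspondence contribute zero to the average.
    return (sum(best_a.values()) + sum(best_b.values())) / (size_a + size_b)


example = [
    ("A.Person", "B.Human", 1.0),    # e.g. an equivalence correspondence
    ("A.Student", "B.Pupil", 0.5),   # e.g. a weaker, closeness-like one
]
score = global_similarity(example, size_a=2, size_b=2)  # 0.75
```

Normalizing by the combined ontology size penalizes pairs of ontologies in which many elements remain unmatched.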

International Conference on Data Engineering | 2010

Summarizing ontology-based schemas in PDMS

Carlos Eduardo S. Pires; Paulo Sousa; Zoubida Kedad; Ana Carolina Salgado

Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.
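A rough sketch of ranking concepts by centrality and frequency follows. The abstract does not define either notion precisely, so this example assumes centrality is the normalized degree of a concept in the ontology graph and frequency is the share of peer schemas containing it. The name `rank_concepts` and the `weight` parameter are hypothetical.

```python
def rank_concepts(edges, occurrences, n_peers, weight=0.5):
    """Rank concepts by a weighted mix of centrality and frequency.

    edges: (concept, concept) pairs for relationships in the merged ontology.
    occurrences: {concept: number of peer schemas that contain it}.
    n_peers: number of peers in the cluster (assumed to be > 0).
    Assumption: centrality = normalized degree, frequency = occurrences / n_peers.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    max_degree = max(degree.values(), default=1)

    scores = {}
    for concept in set(degree) | set(occurrences):
        centrality = degree.get(concept, 0) / max_degree
        frequency = occurrences.get(concept, 0) / n_peers
        scores[concept] = weight * centrality + (1 - weight) * frequency
    # Highest relevance first; ties are broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))


edges = [("Person", "Student"), ("Person", "Teacher"), ("Student", "Course")]
occurrences = {"Person": 3, "Student": 2, "Teacher": 1, "Course": 1}
summary = rank_concepts(edges, occurrences, n_peers=3)[:2]
```

Taking the top-ranked concepts gives one candidate summary; comparing several candidates with Information Retrieval metrics, as the abstract describes, is outside this sketch.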

International Journal of Distributed Systems and Technologies | 2012

Ontology-Based Clustering in a Peer Data Management System

Carlos Eduardo S. Pires; Rocir Marcos Leite Santiago; Ana Carolina Salgado; Zoubida Kedad; Mokrane Bouzeghoub

Peer Data Management Systems (PDMSs) are advanced P2P applications in which each peer represents an autonomous data source that makes an exported schema available to be shared with other peers. Query answering in PDMSs can be improved if peers are efficiently arranged in the overlay network according to the similarity of their content. The set of peers can be partitioned into clusters so that the semantic similarity among the peers participating in the same cluster is maximal. The creation and maintenance of clusters is a challenging problem in the current stage of development of PDMSs. This work proposes an incremental peer clustering process. The authors present a PDMS architecture designed to facilitate the connection of new peers according to their exported schema described by an ontology. The authors propose a clustering process and the underlying algorithm. The authors present and discuss some experimental results on peer clustering using the approach.
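Incremental peer clustering can be sketched as follows: a joining peer is compared against each existing cluster and either joins the most similar one or starts a new cluster. This is an illustration only, not the authors' algorithm. It assumes each exported schema is reduced to a set of concept names, uses Jaccard similarity as a stand-in for ontology-based similarity, and the names `assign_peer` and `threshold` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of concept names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_peer(clusters, peer_concepts, threshold=0.5):
    """Place a joining peer in the most similar cluster, or open a new one.

    clusters: list of sets, each the merged concepts of one cluster (mutated).
    peer_concepts: set of concept names from the peer's exported schema.
    Returns the index of the cluster the peer was assigned to.
    """
    best_index, best_sim = None, 0.0
    for index, cluster in enumerate(clusters):
        sim = jaccard(cluster, peer_concepts)
        if sim > best_sim:
            best_index, best_sim = index, sim
    if best_index is not None and best_sim >= threshold:
        # Merge the peer's concepts into the cluster representative.
        clusters[best_index] |= peer_concepts
        return best_index
    clusters.append(set(peer_concepts))
    return len(clusters) - 1


clusters = []
assign_peer(clusters, {"Person", "Student", "Course"})   # opens cluster 0
assign_peer(clusters, {"Person", "Student", "Teacher"})  # joins cluster 0
assign_peer(clusters, {"Gene", "Protein"})               # opens cluster 1
```

Because each peer is placed on arrival, existing clusters are never recomputed from scratch, which is the sense in which the process is incremental.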

ACM Symposium on Applied Computing | 2015

Adaptive sorted neighborhood blocking for entity matching with MapReduce

Demetrio Gomes Mestre; Carlos Eduardo S. Pires; Dimas Cassimiro Nascimento

Cloud computing has proven to be a powerful ally to efficient parallel execution of data-intensive tasks such as Entity Matching (EM) in the era of Big Data. For this reason, studies on the challenges and possible solutions of how EM can benefit from the cloud computing paradigm are now in high demand. In this paper, we investigate how the MapReduce programming model can be used to perform efficient parallel EM using a variation of the Sorted Neighborhood Method (SNM) that uses a varying size (adaptive) window. We propose MapReduce Duplicate Count Strategy (MR-DCS++), an efficient MapReduce-based approach for the adaptive SNM, aiming to further increase the performance of SNM. The evaluation results based on real-world datasets and cloud infrastructure show that our approach increases the performance of MapReduce-based SNM by providing better results for the EM execution time.
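The adaptive window idea behind the Sorted Neighborhood Method can be shown with a simplified, single-machine sketch. It is not MR-DCS++: there is no MapReduce partitioning, and the skipping of already-detected duplicates is omitted. The names `adaptive_snm`, `is_match`, and `window` are hypothetical.

```python
def adaptive_snm(records, key, is_match, window=3):
    """Sorted Neighborhood Method with a window that grows on matches.

    records: iterable of records to deduplicate.
    key: function producing the sorting (blocking) key of a record.
    is_match: function deciding whether two records are duplicates.
    window: initial window size, counting the current record itself.
    Assumption: each detected match extends the window so that the next
    window - 1 records after the matched one are also compared.
    """
    ordered = sorted(records, key=key)
    last = len(ordered) - 1
    matches = []
    for i in range(len(ordered)):
        end = min(i + window - 1, last)  # last index inside the window
        j = i + 1
        while j <= end:
            if is_match(ordered[i], ordered[j]):
                matches.append((ordered[i], ordered[j]))
                # Duplicates tend to cluster in sorted order, so look further.
                end = min(max(end, j + window - 1), last)
            j += 1
    return matches


names = ["smith", "smyth", "smithe", "jones", "smitt"]
pairs = adaptive_snm(
    names,
    key=lambda r: r,
    is_match=lambda a, b: a[:3] == b[:3],  # toy matcher: shared 3-letter prefix
    window=2,
)
```

With a fixed window of 2, "smith" would only be compared with its immediate neighbor "smithe"; the growing window also reaches "smitt".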

Brazilian Symposium on Multimedia and the Web | 2012

Improving location recommendations with temporal pattern extraction

Leandro Balby Marinho; Iury Nunes; Thomas Sandholm; Caio Nóbrega; Jordão Araújo; Carlos Eduardo S. Pires

A key challenge in mobile social media applications is how to present personalized content that is both geographically and temporally relevant. In this paper, we propose a new and generic temporal weighting function for improving location recommendations. First, we identify areas of interest to recommend by clustering geographic activity based on a trace of geotagged photos. Next, the clusters are temporally weighted using TF-IDF, in order to capture seasonality, and a decay scoring function to capture preference drift. Finally, these weights are combined with the cluster scores based on geographic relevance. We evaluate our recommender on a large dataset collected from Panoramio consisting of the top-100 most populated cities in the world and show that incorporating the proposed temporal weighting function improves recommendation quality.
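The temporal weighting step can be sketched under some loudly stated assumptions. The abstract does not say how TF-IDF is mapped onto clusters and time, so this example treats each cluster as a "document" and each month as a "term". The decay function is assumed to be exponential with a half-life. The names `seasonal_weights`, `decay`, and `half_life_days` are hypothetical.

```python
import math


def seasonal_weights(counts, month):
    """TF-IDF style weight of each cluster for a given month.

    counts: {cluster: {month: number of geotagged photos}}.
    Assumption: tf = share of the cluster's photos taken in `month`,
    idf = log(number of clusters / clusters active in `month`).
    """
    active = sum(1 for months in counts.values() if months.get(month, 0) > 0)
    idf = math.log(len(counts) / active) if active else 0.0
    weights = {}
    for cluster, months in counts.items():
        total = sum(months.values())
        tf = months.get(month, 0) / total if total else 0.0
        weights[cluster] = tf * idf
    return weights


def decay(age_days, half_life_days=365.0):
    """Exponential decay: a photo loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


counts = {
    "beach": {7: 8, 1: 2},   # mostly visited in July
    "museum": {1: 5},        # visited in January only
    "ski": {1: 10},
}
july = seasonal_weights(counts, month=7)
```

A cluster that is popular in the target month but quiet in others receives a high weight, which is one way to capture seasonality; how these weights are combined with the geographic cluster scores is left open here because the abstract does not specify it.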

Transactions on Large-Scale Data- and Knowledge-Centered Systems III | 2011

A semantic-based approach for data management in a P2P system

Damires Souza; Carlos Eduardo S. Pires; Zoubida Kedad; Patricia Azevedo Tedesco; Ana Carolina Salgado

Data management in P2P Systems is a challenging problem, due to the high number of autonomous and heterogeneous peers. In some Peer Data Management Systems (PDMSs), peers are semantically clustered in the overlay network. A peer joining the system is assigned to an appropriate cluster, and a query issued by a user at a given peer is routed to semantic neighbor clusters which can provide relevant answers. To help matters, semantic knowledge in the form of ontologies and contextual information has been used successfully to support the techniques used to manage data in such systems. Ontologies can be used to solve the heterogeneities between the peers, while contextual information allows a PDMS to deal with information that is acquired dynamically during the execution of a given query. The goal of this paper is to point out how the semantics provided by ontologies and contextual information can be used to enhance the results of two important data management issues in PDMSs, namely, peer clustering and query reformulation. We present a semantic-based approach to support these processes and we report some experimental results which show how semantics can improve them.

Applied Intelligence | 2016

Applying machine learning techniques for scaling out data quality algorithms in cloud computing environments

Dimas Cassimiro Nascimento; Carlos Eduardo S. Pires; Demetrio Gomes Mestre

Deduplication is the task of identifying the entities in a data set which refer to the same real world object. Over the last decades, this problem has been largely investigated and many techniques have been proposed to improve the efficiency and effectiveness of the deduplication algorithms. As data sets become larger, such algorithms may generate critical bottlenecks regarding memory usage and execution time. In this context, cloud computing environments have been used for scaling out data quality algorithms. In this paper, we investigate the efficacy of different machine learning techniques for scaling out virtual clusters for the execution of deduplication algorithms under predefined time restrictions. We also propose specific heuristics (Best Performing Allocation, Probabilistic Best Performing Allocation, Tunable Allocation, Adaptive Allocation and Sliced Training Data) which, together with the machine learning techniques, are able to tune the virtual cluster estimations as demands fluctuate over time. The experiments we have carried out using multiple scale data sets have provided many insights regarding the adequacy of the considered machine learning algorithms and proposed heuristics for tackling cloud computing provisioning.

ACM Symposium on Applied Computing | 2015

A data quality-aware cloud service based on metaheuristic and machine learning provisioning algorithms

Dimas C. Nascimento; Carlos Eduardo S. Pires; Demetrio Gomes Mestre

Cloud Computing as a service has become a topic of increasing interest. The outsourcing of duties and infrastructure to external parties has become a crucial concept for many business models. In this paper, we discuss the design and experimental evaluation of provisioning algorithms, in a Data Quality-aware Service (DQaS) context, that enable dynamic Data Quality Service Level Agreement (DQSLA) management and optimization of cloud resources. The DQaS has been designed to respond effectively to the DQSLA requirements of the service customers by minimizing SLA penalties and provisioning the cloud infrastructure for the execution of data quality algorithms. An experimental evaluation of the proposed provisioning algorithms, carried out through simulation, has provided very encouraging results that confirm the adequacy of these algorithms in the DQaS context.

Database and Expert Systems Applications | 2011

Exploring Web Semantic Knowledge and User Feedback to Improve Ontology Matching

Thiago Pachêco Andrade Pereira; Carlos Eduardo S. Pires; Ana Carolina Salgado

The first step in integrating multiple data sources is to decrease the heterogeneity between their schemas. This task can be facilitated if data source schemas are represented as ontologies. In this case, ontology matching enables the identification of correspondences between elements of data source schemas. We propose an ontology matching approach to improve the accuracy of correspondences using an additional source of knowledge. We take advantage of the knowledge available on the Semantic Web, obtaining an ontology to be used as background knowledge. Semantic rules are executed in order to discover new correspondences between ontology elements. Our approach also offers users the possibility of rejecting invalid correspondences. These rejections are stored and used in further executions of the ontology matching to remove invalid correspondences.

Journal of Parallel and Distributed Computing | 2017

Towards the efficient parallelization of multi-pass adaptive blocking for entity matching

Demetrio Gomes Mestre; Carlos Eduardo S. Pires; Dimas Cassimiro Nascimento

Modern parallel computing programming models, such as MapReduce (MR), have proven to be powerful tools for efficient parallel execution of data-intensive tasks such as Entity Matching (EM) in the era of Big Data. For this reason, studies on the challenges and possible solutions of how EM can benefit from this well-known cloud computing programming model are now in high demand. Furthermore, the effectiveness and scalability of MR-based implementations for EM depend on how well the workload distribution is balanced among all reduce tasks. In this article, we investigate how MapReduce can be used to perform efficient (load balanced) parallel EM using a variation of the multi-pass Sorted Neighborhood Method (SNM) that uses a varying size (adaptive) window. We propose Multi-pass MapReduce Duplicate Count Strategy (MultiMR-DCS++), an MR-based approach for multi-pass adaptive SNM, aiming to further increase the performance of the SNM. The evaluation results based on real-world datasets and cluster infrastructure show that our approach increases the performance of MapReduce-based SNM regarding the EM execution time and detection quality. A new approach for the Entity Matching task parallelization is proposed. The idea relies on performing a MapReduce-based multi-pass adaptive blocking strategy. The proposed approach shows significantly superior performance efficiency.
