Carina F. Dorneles

Universidade Federal de Santa Catarina

Publication

Featured researches published by Carina F. Dorneles.

Knowledge and Information Systems | 2011

Approximate data instance matching: a survey

Carina F. Dorneles; Rodrigo Gonçalves; Ronaldo dos Santos Mello

Approximate data matching is a central problem in several data management processes, such as data integration, data cleaning, approximate queries, and similarity search. An approximate matching process aims to determine whether two data items represent the same real-world object. For atomic values (strings, dates, etc.), similarity functions have been defined for several value domains (person names, addresses, and so on). For matching aggregated values, such as relational tuples and XML trees, approaches range from simple functions that combine the similarity values of record attributes to sophisticated techniques based on, for example, machine learning. For complex data comparison, including structured and semistructured documents, existing approaches use both structure and data for the comparison, with or without considering data semantics. This survey presents the terminology and concepts underlying approximate data matching and discusses related work on the use of similarity functions in this area.
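
As an illustration of the simplest end of that range, a minimal sketch of combining per-attribute similarities into a record-level score is given below. It is not taken from the survey: the function names, the weights, the 0.8 threshold, and the choice of `difflib.SequenceMatcher` as the string similarity function are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two atomic string values (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities (hypothetical combination rule)."""
    total_weight = sum(weights.values())
    score = sum(
        w * string_similarity(r1.get(attr, ""), r2.get(attr, ""))
        for attr, w in weights.items()
    )
    return score / total_weight


def is_match(r1: dict, r2: dict, weights: dict, threshold: float = 0.8) -> bool:
    """Declare a match when the combined score reaches an assumed threshold."""
    return record_similarity(r1, r2, weights) >= threshold


if __name__ == "__main__":
    a = {"name": "Carina Dorneles", "city": "Florianopolis"}
    b = {"name": "Carina F. Dorneles", "city": "Florianópolis"}
    w = {"name": 0.7, "city": 0.3}  # illustrative weights only
    print(record_similarity(a, b, w), is_match(a, b, w))
```

In practice the per-attribute function would be chosen per value domain (names, addresses, dates), which is the design space the survey describes.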

international conference on management of data | 2013

Web table taxonomy and formalization

Larissa R. Lautert; Marcelo Scheidt; Carina F. Dorneles

The Web is the largest repository of data available, with over 150 million high-quality tables. Several works have combined efforts to allow queries on these tables, but challenges remain, such as the variety of structure types found on the Web. In this paper, we propose a taxonomy for tabular structures, formalize the ones used with relational data, and show, through an experimental evaluation, that WTClassifier, our supervised framework, classifies Web tables with high accuracy. Additionally, we use WTClassifier to categorize more than 300 thousand Web tables into our taxonomy and find that 82.25% are not formatted similarly to a relational structure.

international conference on web engineering | 2014

SSUP – A URL-Based Method to Entity-Page Discovery

Edimar Manica; Renata de Matos Galante; Carina F. Dorneles

Entity-pages are Web pages that publish data representing only one instance of a certain conceptual entity. In this paper we propose SSUP, a new method for entity-page discovery. Specifically, given a sample entity-page from a Web site (e.g., the Jolyon Palmer entity-page from the GP2 Web site), we aim to find all entity-pages of the same type (driver entity-pages) on this Web site. We propose two structural URL similarity metrics and a set of algorithms that combine URL features with HTML features in order to improve the quality of the results and minimize the number of downloaded pages and the processing time. We evaluate our method on real-world Web sites and compare it with two baselines to demonstrate its effectiveness.

information integration and web-based applications & services | 2017

QSMatching: an approach to calculate similarity between questionnaires

Richard H. de Souza; Carina F. Dorneles

The creation of questionnaires for use in interviews, statistical surveys or scientific research is not a trivial task. Poorly worked out questions can lead to answers with meaningless or naive interpretations. Therefore, it may be worthwhile to reuse, partially or totally, questionnaires already created for the same purpose. In this paper we propose the QSMatching approach to calculate the similarity between questionnaires and consequently to obtain a ranking of questionnaires according to the user's query. In order to verify the effectiveness of the proposed approach, an experiment was carried out comparing QSMatching and the vector model. The analysis of the experiment shows that QSMatching is more effective than the vector model for questionnaire retrieval.
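
For readers unfamiliar with the vector model used as the baseline, a minimal sketch of ranking texts by cosine similarity over raw term-frequency vectors follows. It illustrates the general vector model only, not QSMatching and not the experimental setup of the paper; the whitespace tokenizer, the function names, and the sample questionnaires are assumptions.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector using a naive lowercase whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = term_vector(text_a), term_vector(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, questionnaires: dict) -> list:
    """Return questionnaire ids ordered by decreasing similarity to the query."""
    return sorted(
        questionnaires,
        key=lambda qid: cosine(query, questionnaires[qid]),
        reverse=True,
    )


if __name__ == "__main__":
    corpus = {  # hypothetical questionnaires
        "q1": "how often do you exercise per week",
        "q2": "what is your monthly household income",
    }
    print(rank("weekly exercise habits how often", corpus))
```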

information integration and web-based applications & services | 2017

Improving performance in a DBaaS environment through the use of resource reservation

Vinicius da Silveira Segalin; Carina F. Dorneles; Mario A. R. Dantas

Resource reservation provides a user with the requested resource (usually CPU, memory, disk space and bandwidth) at the requested time, allowing the user to obtain the expected performance in a defined time interval. In the context of database systems, applications with long-running queries represent a big challenge when there is a requirement to know approximately how long a query will take to execute. Such a prediction may be relevant for several reasons. For instance, by knowing that a query will take longer than expected to execute, resource reservation can be performed, which means reserving more resources so that the query runs in a shorter time in a future execution. This paper presents an approach to an advance reservation mechanism in a DBaaS environment using machine learning techniques. In general terms, the proposal is to use a prediction mechanism based on machine learning to give the user a resource recommendation regarding time and cost. We present the proposed model, the machine learning configuration, and some experiments that indicate the benefits and efficiency of our proposal.

database and expert systems applications | 2017

Orion: A Cypher-Based Web Data Extractor

Edimar Manica; Carina F. Dorneles; Renata de Matos Galante

The challenges in Big Data start during data acquisition, where it is necessary to transform non-structured data into a structured format. One example of relevant data in a non-structured format is observed in entity-pages. An entity-page publishes data that describe an entity of a particular type (e.g. a soccer player). Extracting attribute values from these pages is a strategic task for data-driven companies. This paper proposes a novel class of data extraction methods inspired by graph databases and graph query languages. Our method, called Orion, uses the same declarative language both to learn and to express the extraction rules. The use of a declarative language allows the specification to be decoupled from the implementation. Orion models the problem of extracting attribute values from entity-pages as Cypher queries in a graph database. To the best of our knowledge, this is the first work that models the problem of extracting attribute values from template-based entity-pages in this way. Graph databases belong to the alternative database management systems (generically known as NoSQL) that are taking over Big Data because they implement novel representation paradigms and data structures. Cypher is more robust than XPath (a query language that is commonly used to handle web pages) because it allows traversing, querying and updating the graph, while XPath is a language specific to traversing DOM trees. We carried out experiments on a dataset with more than 145k web pages from different real-world websites covering a wide range of entity types. The Orion method reached an F1 of 98%. Our method was compared with one state-of-the-art method and outperformed it with an F1 gain of 5%.

conference on information and knowledge management | 2017

Extracting Records from the Web Using a Signal Processing Approach

Carina F. Dorneles

Extracting records from web pages enables a number of important applications and has immense value due to the amount and diversity of available information that can be extracted. This problem, although extensively studied, remains open because it is not a trivial one. Due to the scale of the data, a feasible approach must be both automatic and efficient (and of course effective). We present here a novel approach, fully automatic and computationally efficient, that uses signal processing techniques to detect regularities and patterns in the structure of web pages. Our approach segments the web page, detects the data regions within it, identifies the record boundaries and aligns the records. Results show a high F-score and linearithmic time complexity.

brazilian symposium on multimedia and the web | 2017

Towards Recency Ranking in Community Question Answering: A Case Study of Stack Overflow

Leandro Amancio; Carina F. Dorneles

In Community Question Answering, recency ranking refers to placing fresh, high-quality answers in the top positions of a ranking. Freshness is not related to how recent the answer creation date is, but to how up-to-date the answer content is. This is extremely important because users need to get the best answers quickly to solve their questions and, usually, they expect up-to-date solutions. In this paper, we propose a new approach to provide recency ranking in these environments and present a set of experiments that show the effectiveness of our proposal.

brazilian symposium on multimedia and the web | 2016

Weight Adjustment for Multi-criteria Ratings in Items Recommendation

Felipe Born de Jesus; Carina F. Dorneles

In this paper we propose to use implicit ratings of multiple criteria to mitigate the data sparsity problem. The intuition is to predict the overall relevance of an item for a given user based on her/his own implicit feedback instead of using similar users' ratings (as is common in collaborative filtering). Furthermore, since we believe one criterion may be more important than others, we propose a weighting schema in which we estimate how interesting each criterion is for a given user in order to generate a personalized ranking. The weighting schema does not require the generation of predicted explicit ratings. Instead, we reorganize the weights in such a way that only the criteria that have ratings are weighted. To predict the weight of each criterion for each user, we propose a genetic programming approach in which the initial weight values are randomly generated. In our experiments, we show that, given a sufficient corpus of historical implicit user feedback, we can obtain higher precision when ranking items for a user by considering a predicted set of weights.

international conference of the chilean computer science society | 2015

Towards automatic document classification by exploiting only knowledge resources

Gleidson Antonio Cardoso da Silva; Carina F. Dorneles

Document classification is critical to optimizing information retrieval tasks, especially over the web. In this environment, the open-domain nature and growing volume of available data remain a challenge for the classification task. In this paper, we deal with these problems by using only knowledge resources. Our approach relies on concept instances derived from the document and on an open-domain knowledge base for concept generalization. The set of broader concepts is ranked according to a disparity value, and the best-placed concept is then taken as the document class label. Experimental results on real-world datasets show that this approach can achieve document classification without the need to build an ontology or to train and maintain a classification model.
