
Publications


Featured research published by Carmem S. Hara.


Information Systems | 2003

Reasoning about keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, we show that these keys are always (finitely) satisfiable, and their (finite) implication problem is finitely axiomatizable. Furthermore, we provide an algorithm for determining (finite) implication that runs in time polynomial in the size of the keys. Our results also demonstrate, among other things, that the analysis of XML keys is far more intricate than its relational counterpart.
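
To make the distinction concrete, the following is a minimal sketch of what an absolute and a relative key assert; the toy data and the path notation in the comments are simplifications for illustration, not the paper's formal syntax.

    # A minimal sketch, not the paper's algorithm: what an absolute and a
    # relative XML key assert over a toy document encoded as nested dicts.
    books = [
        {"isbn": "111", "chapters": [{"number": 1}, {"number": 2}]},
        {"isbn": "222", "chapters": [{"number": 1}]},  # chapter numbers may repeat across books
    ]

    # Absolute key (book, {isbn}): isbn identifies a book in the entire document.
    assert len({b["isbn"] for b in books}) == len(books)

    # Relative key (book, (chapter, {number})): number identifies a chapter
    # only within its own book.
    for b in books:
        assert len({c["number"] for c in b["chapters"]}) == len(b["chapters"])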


Computer Networks | 2002

Keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We discuss the definition of keys for XML documents, paying particular attention to the concept of a relative key, which is commonly used in hierarchically structured documents and scientific databases.


Database Programming Languages | 2001

Reasoning about Keys for XML

Peter Buneman; Susan B. Davidson; Wenfei Fan; Carmem S. Hara; Wang Chiew Tan

We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, these keys can be reasoned about efficiently. We show that the (finite) satisfiability problem for these keys is trivial, and their (finite) implication problem is finitely axiomatizable and decidable in PTIME in the size of keys.


Very Large Data Bases | 2003

RRXS: redundancy reducing XML storage in relations

Yi Chen; Susan B. Davidson; Carmem S. Hara; Yifeng Zheng

Current techniques for storing XML using relational technology consider the structure of an XML document but ignore its semantics as expressed by keys or functional dependencies. However, when the semantics of a document are considered, redundancy may be reduced, node identifiers removed where value-based keys are available, and semantic constraints validated using relational primary key technology. In this paper, we propose a novel class of constraints, called XFDs, that capture structural as well as semantic information. We present a set of rewriting rules for XFDs, and use them to design a polynomial-time algorithm which, given an input set of XFDs, computes a reduced set of XFDs. Based on this algorithm, we present a redundancy-removing storage mapping from XML to relations called RRXS. The effectiveness of the mapping is demonstrated by experiments on three data sets.
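
As a rough illustration (a sketch over assumed data, not the RRXS mapping itself), once an XFD such as isbn determining title is known, the shredding can store the title once per isbn and use isbn in place of a generated node identifier:

    # Illustration only, not RRXS: with the XFD isbn -> title in hand, the
    # relational storage keeps (isbn, title) once and references books by
    # isbn from the chapter relation, instead of repeating the title.
    doc = [{"isbn": "111", "title": "Keys for XML",
            "chapters": [{"n": 1}, {"n": 2}]}]

    book_rel = {(b["isbn"], b["title"]) for b in doc}                        # stored once
    chapter_rel = {(b["isbn"], c["n"]) for b in doc for c in b["chapters"]}  # keyed by isbn, no node ids
    print(book_rel, chapter_rel)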


Symposium on Principles of Database Systems | 1999

Reasoning about nested functional dependencies

Carmem S. Hara; Susan B. Davidson

Functional dependencies add semantics to a database schema, and are useful for studying various problems, such as database design, query optimization and how dependencies are carried into a view. In the context of a nested relational model, these dependencies can be extended by using path expressions instead of attribute names, resulting in a class of dependencies that we call nested functional dependencies (NFDs). NFDs define a natural class of dependencies in complex data structures; in particular they allow the specification of many useful intra- and inter-set dependencies (i.e., dependencies that are local to a set and dependencies that require consistency between sets). Such constraints cannot be captured by existing notions of functional, multi-valued, or join dependencies. This paper presents the definition of NFDs and gives their meaning by translation to logic. It then presents a sound and complete set of eight inference rules for NFDs, and discusses approaches to handling the existence of empty sets in instances. Empty sets add complexity in reasoning since formulas such as ∀x ∈ R. P(x) are trivially true when R is empty. This axiomatization represents a first step in reasoning about constraints on data warehouse applications, where both the source and target databases support complex types.
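
For intuition, the sketch below checks an intra-set NFD stating that, within the set Emps, employees that agree on dept also agree on manager; the data and notation are illustrative assumptions, not the paper's formalism.

    # Illustrative only: an intra-set nested functional dependency over a toy
    # nested instance, roughly Emps.dept -> Emps.manager in path notation.
    db = {"Emps": [
        {"name": "a", "dept": "DB",  "manager": "m1"},
        {"name": "b", "dept": "DB",  "manager": "m1"},
        {"name": "c", "dept": "Net", "manager": "m2"},
    ]}

    seen = {}
    for e in db["Emps"]:
        # setdefault records the first manager seen for a dept; later
        # employees of the same dept must agree with it. If the set is
        # empty, the dependency holds trivially, as noted in the abstract.
        assert seen.setdefault(e["dept"], e["manager"]) == e["manager"]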


Journal of Computer and System Sciences | 2007

Propagating XML constraints to relations

Susan B. Davidson; Wenfei Fan; Carmem S. Hara

We present a technique for refining the design of relational storage for XML data. The technique is based on XML key propagation: given a set of keys on XML data and a mapping (transformation) from the XML data to relations, what functional dependencies must hold on the relations produced by the mapping? With the functional dependencies one can then convert the relational design into, e.g., 3NF or BCNF, and thus develop efficient relational storage for XML data. We provide several algorithms for computing XML key propagation. One algorithm checks whether a functional dependency is propagated from a set of XML keys via a predefined mapping; this allows one to determine whether or not the relational design is in a normal form. The others compute a minimum cover for all functional dependencies that are propagated from a set of XML keys and hold on a universal relation; these provide guidance for how to design a relational schema for storing XML data. These algorithms show that XML key propagation and its associated minimum cover can be computed in polynomial time. Our experimental results verify that these algorithms are efficient in practice. We also investigate the complexity of propagating other XML constraints to relations. The ability to compute XML key propagation is a first step toward establishing a connection between XML data and its relational representation at the semantic level.
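
The propagated functional dependencies can then be reasoned about with standard relational machinery. The sketch below is the textbook attribute-closure test, not the paper's propagation algorithm, and the example FDs are hypothetical:

    # Textbook attribute-closure test: given FDs already derived for the
    # relational storage, decide whether a further FD lhs -> rhs is implied.
    def closure(attrs, fds):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def implies(fds, lhs, rhs):
        return set(rhs) <= closure(lhs, fds)

    # Hypothetical FDs propagated from XML keys on a book document.
    fds = [({"isbn"}, {"title"}), ({"isbn"}, {"publisher"})]
    print(implies(fds, {"isbn"}, {"title", "publisher"}))  # True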


International Symposium on Computers and Communications | 2008

A flexible network monitoring tool based on a data stream management system

Natascha Petry Ligocki; Carmem S. Hara; Christiano Lyra

Network monitoring is a complex task that generally requires the use of different tools for specific purposes. This paper describes a flexible network monitoring tool, called PaQueT, designed to meet a wide range of monitoring needs. The user can define metrics as queries in a process similar to writing queries on a database management system. This approach provides an easy mechanism to adapt the tool as system requirements evolve. PaQueT allows one to monitor values ranging from packet-level metrics to those usually provided only by tools based on Netflow or SNMP. PaQueT has been developed as an extension of the Borealis Data Stream Management System. The first advantage of our approach is the ability to generate measurements in real time, minimizing the volume of data stored; second, the tool can be easily extended to consider several types of network protocols. We have conducted an experimental study to verify the effectiveness of our approach, and to determine its capacity to process large volumes of data.
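
As a rough analogy, plain Python standing in for what PaQueT would express as a stream query (the metric and field names here are assumptions), a user-defined metric such as bytes per source over fixed time windows is essentially a windowed aggregation:

    # Plain-Python analogy of a user-defined monitoring metric; PaQueT would
    # express this as a query over the packet stream rather than as code.
    from collections import defaultdict

    def bytes_per_source(packets, window=10):
        """Sum packet sizes per source IP over fixed time windows.
        packets is an iterable of (timestamp, src_ip, size) tuples."""
        out = defaultdict(lambda: defaultdict(int))
        for ts, src, size in packets:
            out[int(ts // window)][src] += size
        return {w: dict(per_src) for w, per_src in out.items()}

    stream = [(0.5, "10.0.0.1", 1500), (3.2, "10.0.0.2", 400), (11.0, "10.0.0.1", 800)]
    print(bytes_per_source(stream))  # {0: {'10.0.0.1': 1500, '10.0.0.2': 400}, 1: {'10.0.0.1': 800}}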


Data and Knowledge Engineering | 2013

Empowering integration processes with data provenance

Bruno Tomazela; Carmem S. Hara; Ricardo Rodrigues Ciferri; Cristina Dutra de Aguiar Ciferri

In some integration applications, users are allowed to import data from heterogeneous sources, but are not allowed to update these source data directly. Imported data may be inconsistent, and even when inconsistencies are detected and solved, these changes may not be propagated to the sources due to their update policies. Therefore, the sources continue to provide the same inconsistent data in future imports until the proper authority updates them. In this paper, we propose PrInt, a model that allows users' data-cleaning decisions to be automatically reapplied in subsequent integration processes. By reproducing previous decisions, the user may focus only on new inconsistencies originating from modified source data. The reproducibility provided by PrInt is based on logging and on incorporating data provenance into the integration process. Other major features of PrInt are as follows. It is based on a repository of operations, which contains provenance data and represents the integration decisions that the user takes to solve attribute value conflicts among data sources. It is designed to maintain the repository's consistency and to provide a strict reproduction of users' decisions by guaranteeing the validity of operations and by reapplying only valid operations. It is also designed to safely reorder the operations stored in the repository to improve the performance of the reapplication process. We applied PrInt to a real application and the experimental results showed remarkable performance gains: reapplying users' decisions based on our model was at least 89% faster than naively re-executing the integration process. We conclude that the characteristics of PrInt make the integration process less error-prone and less time-consuming.
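
A hypothetical sketch of the repository-of-operations idea follows; the record fields and the validity test are assumptions for illustration, not PrInt's actual model.

    # Hypothetical sketch: each logged decision resolves an attribute-value
    # conflict and carries enough provenance to be revalidated; it is only
    # reapplied if the source still delivers the value it was taken against.
    def reapply(repository, imported):
        resolved = dict(imported)
        for op in repository:
            ref = (op["source"], op["key"], op["attr"])
            if imported.get(ref) == op["seen"]:   # operation still valid
                resolved[ref] = op["chosen"]
        return resolved

    repo = [{"source": "S1", "key": 42, "attr": "phone",
             "seen": "555-0100", "chosen": "555-0199"}]
    data = {("S1", 42, "phone"): "555-0100"}
    print(reapply(repo, data))  # the earlier decision is replayed automatically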


International Conference on Cloud Computing | 2011

Phoenix: A Relational Storage Component for the Cloud

Davi Arnaut; Rebeca Schroeder; Carmem S. Hara

This paper describes the design and architecture of a cloud-based relational database system. The system's core component is a storage engine, which is responsible for mapping the logical schema, based on relations, to a physical storage, based on a distributed key-value data store. The proposed stratified architecture provides physical data independence, by allowing different approaches for data mapping and partitioning, while the distributed data store is responsible for providing scalability, availability, data replication and ACID properties. A prototype of the system, named Phoenix, has been developed based on the proposed architecture using a transactional key-value store. Experimental studies on a cluster of commodity servers show that Phoenix preserves the desired properties of key-value stores, while providing relational database functionality at a very low overhead.
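
For illustration only, one simple way such a storage engine can flatten a relational row into key-value pairs; the key layout below is an assumption, not Phoenix's actual encoding.

    # Assumed key layout (table/primary key/column), for illustration only.
    def row_to_kv(table, pk, row):
        """Flatten a row (dict of column -> value) into key-value pairs."""
        return {f"{table}/{pk}/{col}": val for col, val in row.items()}

    print(row_to_kv("customer", 7, {"name": "Ana", "city": "Curitiba"}))
    # {'customer/7/name': 'Ana', 'customer/7/city': 'Curitiba'}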


IEEE International Conference on Cloud Computing Technology and Science | 2016

Exploring Controlled RDF Distribution

Raqueline R. M. Penteado; Rebeca Schroeder; Carmem S. Hara

RDF datasets have grown rapidly over the last few years. In order to process SPARQL queries on these large datasets, much effort has been spent on developing horizontally scalable techniques, which involve data partitioning and parallel query processing. While distribution may provide storage scalability, it may also incur high communication costs for processing queries. In this paper, we present a parallel and distributed query processing approach that exploits the existence of data allocation patterns, provided by a controlled data distribution, that determine how RDF triples should be grouped and stored on the same server. Fragments of the RDF datastore follow a given allocation pattern and also correspond to units of communication among servers. Based on this distribution model, we define two communication strategies for query processing: get-frag, which requests remote servers to send fragments that contain data required by a query, and send-result, which forwards intermediate results. These strategies are combined in a method, called 2ways, that chooses the appropriate communication strategy whenever queries traverse fragment boundaries. We provide a cost function used to determine this choice and present experimental results. They show that our proposed technique effectively reduces the communication cost and improves the response time for processing SPARQL queries on a distributed RDF datastore.
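
A hedged sketch of the choice 2ways faces when a query crosses a fragment boundary; the cost terms below are placeholders, not the cost function defined in the paper.

    # Placeholder cost model, not the paper's: pick the strategy that moves
    # fewer bytes across the network when a query crosses fragment boundaries.
    def choose_strategy(fragment_size, intermediate_result_size):
        return "get-frag" if fragment_size < intermediate_result_size else "send-result"

    print(choose_strategy(fragment_size=2_000, intermediate_result_size=50_000))  # get-frag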

Collaboration


Dive into Carmem S. Hara's collaborations.

Top Co-Authors

Susan B. Davidson, University of Pennsylvania
Wenfei Fan, University of Edinburgh
Aldri Santos, Federal University of Paraná
Rebeca Schroeder, Universidade do Estado de Santa Catarina
Ricardo Rodrigues Ciferri, Federal University of São Carlos
Wang Chiew Tan, University of Pennsylvania
Bruno Tomazela, University of São Paulo