Mapping Relational Operations onto Hypergraph Model
The Python Papers 6(1): 4
Amani Naser Tahat (1, 2, 5), Maurice HT Ling (3, 4, 6)
1 Department of Basic Science, Philadelphia University, Jordan
2 Department of Physics, Hashemite University, Jordan
3 School of Chemical and Life Sciences, Singapore Polytechnic, Singapore
4 Department of Zoology, The University of Melbourne, Australia
5, 6 [email protected]; [email protected]
Keywords: Relational Model, Relational Operations, Hypergraph Model.
Abstract
The relational model is the most commonly used data model for storing large datasets, perhaps because of the simplicity of the tabular format, which has revolutionized database management systems. However, many real-world objects are recursive and associative in nature, which makes them difficult to store in the relational model. The hypergraph model is a generalization of the graph model in which each hypernode can be made up of other nodes or graphs and each hyperedge can be made up of one or more edges. It may address the recursive and associative limitations of the relational model. However, the hypergraph model is non-tabular and thus loses the simplicity of the relational model. In this study, we consider how to convert a relational model into a hypergraph model in two layers. At the bottom layer, each relational tuple is considered a star graph in which the central primary key node is surrounded by the non-primary key attributes. At the top layer, each tuple is a hypernode, and a relation is a set of hypernodes. We present a reference implementation of relational operators (project, rename, select, inner join, natural join, left join, right join, outer join and Cartesian join) on a hypergraph model. Using a simple example, we demonstrate that a relation and relational operators can be implemented on this hypergraph model.
Introduction
A database can be seen as a structured collection of records that allows data to be stored, searched and retrieved properly. The conceptual organization of a database is known as the data model, which describes the data, the relationships between data elements, and the semantics and constraints of the data. Data models can be categorized by their underlying theoretical principles, such as the hierarchical model (Tsichritzis, 1976), the network model (CODASYL, 1974), and the relational model (Codd, 1970), as well as an emerging database model known as the object-oriented model (Arlow and Neustadt, 2001). Among them, the relational model (Codd, 1970) is the most established and the most commonly used. It is based on set theory, organizes data in rows and columns, and groups data by the common attributes found in the data set. Hence, this study focuses on the relational model. The data model is crucial in terms of system design, functionality and maintainability (Deraman and Layzell, 1995). It should reflect real-world objects and their relationships to ensure durability.
A good data model serves and outlasts applications. Data models are domain and application specific. For example, data models that are useful for modeling inventories or financial records may well differ from data models suited to computer-aided design or genomic applications. Therefore, depending on the nature of the problem domain, new approaches might be needed.
Because the relational model makes data storage and retrieval simple, relational databases have revolutionized database management systems. However, various shortcomings persist for the data management needs of some domains. Firstly, “real world” entities are poorly represented: normalization may lead to relations that do not correspond to entities in the “real world” (Reese, 2003). Secondly, the relational model lacks support for data-intensive and complex applications (Reese, 2003). Relational databases generally lack the ability to handle the complex interrelationships of data such as images, audio and video files, and other digital files. Thirdly, relational databases require a homogeneous data structure, which assumes both horizontal and vertical homogeneity (tabular form). However, most real-world objects are more complex, so a homogeneous data structure is an unnatural representation of them. Although many Relational Database Management Systems (RDBMSs) allow Binary Large Objects (BLOBs) (Shapiro and Miller, 1999), these are typically stored as references to files, and as a result some advantages provided by DBMSs, such as security, may be lost. Furthermore, the inner structure of BLOBs cannot be accessed. Lastly, the relational model is semantically overloaded: it offers only one construct, the relation, for representing both data and the relationships between data.
Accordingly, there is no distinction between entity and relationship, nor between different types of relations, so such semantics cannot be expressed (Chen, 1976).
Alternative models have been proposed to address the deficiencies of the relational model, such as the graph model (Kunii, 1987), the graph-object oriented model (Gyssens et al., 1990), and the hypergraph model (Berge, 1973). A number of authors have found the hypergraph to be a useful means of modeling relational database designs (Beeri et al., 1981; Beeri et al., 1983; Chase, 1980; Fagin, 1983; Fagin et al., 1982; Yannakakis, 1981) as well as a unifying data model for a wide variety of other data models (Angles and Gutierrez, 2008; Eschbach et al., 2006; Berge and Ray-Chaudhuri, 1972; Makinen, 1990). Sowa (1999) showed that any information can be represented by conceptual graphs. Thus, we are motivated to use the hypergraph model as a unified data model. In a hypergraph, each node (known as a hypernode) can consist of one or more nodes or hypergraphs, and each edge or link (known as a hyperedge) can consist of one or more edges.
As the relational model is widely taught and used, there appears to be a need to bridge the relational model and the hypergraph model. New approaches, algorithms and theories for database query optimization have been developed that take advantage of advanced graph-theoretic concepts, which suggests that the relational model can be mapped onto a hypergraph model. Therefore, this study presents a reference implementation that maps relational operators (Maier, 1983; Ullman, 1982) onto a hypergraph model.
Example of a Hypergraph Model
The hypergraph has played an important role in recent research, where the connection between databases and hypergraphs is straightforward and rests on transforming relational databases into graph databases so that existing relational data can be reused. A relational database schema can be considered as a set of attributes and a set of relations on those attributes. This is easily represented by a hypergraph in which the set of nodes corresponds to the set of attributes in the database schema and each hyperedge corresponds to the set of attributes included in one relation of the schema (Figure 1).
Figure 1: The building block of the Graph.
The recognized definition of this graph can be written as:
A combination of these graphs leads to a large graph that approximates most real-world problems. Our implementation of a hypergraph-based DBMS rests on the similarity between a table in dictionary form and a graph in dictionary form. The data persistence mechanism used for the hypergraph is the Python shelve module. Since a "shelf" is a dictionary-like object, a graph can be made into a dictionary like: graph [
Similarly, a relational table can be constructed as a dictionary, such as: table [
Our first task is to map relational operations onto a graph data model, and implementing the relational operations on a graph (shelve) is the initial step.
Step 1: Construct 2 database tables using shelve in the format of: table [
Step 2: Write functions to simulate the following relational operations: select, project, rename, inner join, left join, right join, outer join, Cartesian join (cross join), and natural join.
One may ask how to describe each edge so that functions can be written to describe all edges of the graph model. The idea is simple: each tuple (row) of a relational table is a star graph, so a table of 2000 rows gives 2000 graphs. The central node is the primary key, the other nodes are the fields, and the data is on the edges, as in the “star graph” shown in Figure 1. We use a simple library database as an illustration. In a relational model, let a “books” table, where ISBN is the primary key, be defined as follows:

ISBN, Title, Publisher, First author, Catalog
---------------------------------------------------------------
9780596159818, Beautiful testing, O'Reilly, Tim Riley, 001
9781933988542, Open source SOA, Manning, Jeff Davis, 001
9780596516499, Natural language processing with Python, O'Reilly, Steven Bird, 001
9780521741033, Presentation skills for scientists, CUP, Edward Zanders, 002
9780751404624, E. coli, Blackie Academic, Chris Bell, 003

The catalog table will be:

Code, Description
----------------------------------------------------------------
001, computing
002, academic skills
003, biology

In the graph model, the above tables will be:
Books = {'primary key': 'ISBN',
         '9780596159818': {'ISBN': '9780596159818',
                           'title': 'Beautiful testing',
                           'publisher': "O'Reilly",
                           'first author': 'Tim Riley',
                           'catalog': '001'},
         '9781933988542': {'ISBN': '9781933988542',
                           'title': 'Open source SOA',
                           'publisher': 'Manning',
                           'first author': 'Jeff Davis',
                           'catalog': '001'},
         '9780596516499': {'ISBN': '9780596516499',
                           'title': 'Natural language processing with Python',
                           'publisher': "O'Reilly",
                           'first author': 'Steven Bird',
                           'catalog': '001'},
         '9780521741033': {'ISBN': '9780521741033',
                           'title': 'Presentation skills for scientists',
                           'publisher': 'CUP',
                           'first author': 'Edward Zanders',
                           'catalog': '002'},
         '9780751404624': {'ISBN': '9780751404624',
                           'title': 'E. coli',
                           'publisher': 'Blackie Academic',
                           'first author': 'Chris Bell',
                           'catalog': '003'}
        }
Hence, each book is a graph in the format of
The catalog table can be:
Catalog = {'primary key': 'catalog',
           '001': {'catalog': '001', 'description': 'computing'},
           '002': {'catalog': '002', 'description': 'academic skills'},
           '003': {'catalog': '003', 'description': 'biology'}
          }
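To make the star-graph reading of a tuple concrete, the following is a minimal sketch, assuming tables in the dictionary form shown above with a 'primary key' entry naming the key attribute. The helper names (tuples, star_graph, select, project) and the choice of representing an edge as a (centre node, field node, edge data) triple are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch only; helper names and the edge-triple layout are assumptions.

books = {
    'primary key': 'ISBN',
    '9780596159818': {'ISBN': '9780596159818', 'title': 'Beautiful testing',
                      'publisher': "O'Reilly", 'first author': 'Tim Riley',
                      'catalog': '001'},
    '9780751404624': {'ISBN': '9780751404624', 'title': 'E. coli',
                      'publisher': 'Blackie Academic',
                      'first author': 'Chris Bell', 'catalog': '003'},
}


def tuples(table):
    """Yield (key, row) pairs, skipping the 'primary key' metadata entry."""
    for key, row in table.items():
        if key != 'primary key':
            yield key, row


def star_graph(table, key):
    """Return one tuple as a list of (centre node, field node, edge data) triples."""
    pk = table['primary key']
    row = table[key]
    return [(row[pk], attr, value) for attr, value in row.items() if attr != pk]


def select(table, predicate):
    """Return a new table holding only the tuples for which predicate(row) is true."""
    result = {'primary key': table['primary key']}
    for key, row in tuples(table):
        if predicate(row):
            result[key] = dict(row)
    return result


def project(table, attributes):
    """Return a new table keeping only the named attributes plus the primary key."""
    pk = table['primary key']
    keep = [pk] + [a for a in attributes if a != pk]
    result = {'primary key': pk}
    for key, row in tuples(table):
        result[key] = {a: row[a] for a in keep}
    return result


computing = select(books, lambda row: row['catalog'] == '001')
titles = project(books, ['title'])
edges = star_graph(books, '9780751404624')
```

Retaining the primary key in project is a design choice of this sketch: it keeps every projected tuple a valid star graph with a centre node.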
This means that we can define all the relational joins. For example, inner join can be def inner_join(
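Independently of the listing above, the idea of joining two tables held in dictionary form can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (rows, inner_join_sketch, left_join_sketch), the 'left.'/'right.' attribute prefixes used to avoid name clashes, and the use of (left key, right key) pairs as result keys are all hypothetical choices. The sample catalog deliberately omits code 002 so that the two joins differ.

```python
# Illustrative sketch only; names, prefixes and result keys are assumptions.

books = {
    'primary key': 'ISBN',
    '9780596159818': {'ISBN': '9780596159818', 'title': 'Beautiful testing',
                      'catalog': '001'},
    '9780521741033': {'ISBN': '9780521741033',
                      'title': 'Presentation skills for scientists',
                      'catalog': '002'},
}
catalog = {
    'primary key': 'catalog',
    '001': {'catalog': '001', 'description': 'computing'},
    '003': {'catalog': '003', 'description': 'biology'},
}


def rows(table):
    """Return the (key, row) pairs of a table, skipping the metadata entry."""
    return [(k, r) for k, r in table.items() if k != 'primary key']


def inner_join_sketch(left, right, left_attr, right_attr):
    """Merge every pair of tuples whose join attributes hold equal values."""
    joined = {}
    for lkey, lrow in rows(left):
        for rkey, rrow in rows(right):
            if lrow[left_attr] == rrow[right_attr]:
                merged = {'left.' + a: v for a, v in lrow.items()}
                merged.update({'right.' + a: v for a, v in rrow.items()})
                joined[(lkey, rkey)] = merged
    return joined


def left_join_sketch(left, right, left_attr, right_attr):
    """Inner join plus unmatched left tuples, padded with None on the right."""
    joined = inner_join_sketch(left, right, left_attr, right_attr)
    matched = {lkey for lkey, _ in joined}
    right_attrs = sorted({a for _, r in rows(right) for a in r})
    for lkey, lrow in rows(left):
        if lkey not in matched:
            merged = {'left.' + a: v for a, v in lrow.items()}
            merged.update({'right.' + a: None for a in right_attrs})
            joined[(lkey, None)] = merged
    return joined


inner = inner_join_sketch(books, catalog, 'catalog', 'catalog')
left = left_join_sketch(books, catalog, 'catalog', 'catalog')
```

The results here are plain in-memory dictionaries. Because a shelf accepts only string keys, the tuple keys would need to be encoded as strings before such a result could be persisted with shelve.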
The set of eight relational operations (project, select, inner join, left join, right join, outer join, natural join, Cartesian product) can then form the basis for implementing more complicated operations.
Reference Implementation of Relational Operators on Hypergraph Model
This reference implementation employs the one-table, one-file approach, where each table is persisted in its own shelve file. Therefore, to add a new table we only need to call the function shelve.open(
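The one-table, one-file idea can be illustrated with a short sketch that uses only the standard shelve, os and tempfile modules. The helper names save_table and load_table, the file name 'catalog', and the use of a temporary directory are assumptions made for this example and are not taken from the reference implementation.

```python
import os
import shelve
import tempfile


def save_table(path, table):
    """Persist one table (a plain dict with string keys) in its own shelve file."""
    with shelve.open(path) as shelf:  # default flag 'c' creates the file if missing
        shelf.clear()                 # drop any tuples left from an earlier save
        shelf.update(table)


def load_table(path):
    """Read a whole table back from its shelve file into a plain dict."""
    with shelve.open(path, flag='r') as shelf:
        return dict(shelf)


catalog = {
    'primary key': 'catalog',
    '001': {'catalog': '001', 'description': 'computing'},
    '002': {'catalog': '002', 'description': 'academic skills'},
    '003': {'catalog': '003', 'description': 'biology'},
}

# Hypothetical location: one file per table inside a temporary directory.
workdir = tempfile.mkdtemp()
catalog_path = os.path.join(workdir, 'catalog')

save_table(catalog_path, catalog)
restored = load_table(catalog_path)
```

Since every key in the table, including 'primary key', is a string, the dictionary can be stored in a shelf without any key conversion.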