Dimitri Theodoratos
New Jersey Institute of Technology
Publication
Featured research published by Dimitri Theodoratos.
data and knowledge engineering | 1999
Dimitri Theodoratos; Timos K. Sellis
A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.
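To make the three conditions (a) to (c) concrete, here is a minimal sketch in Python. It is not the paper's state space search and it ignores multiquery optimization and auxiliary views; it simply tries every subset of candidate views. The function name `select_views`, the input layout, and the additive cost model are assumptions made for illustration only.

```python
from itertools import combinations


def select_views(views, queries, space):
    """Exhaustively pick the cheapest feasible view set (toy cost model).

    views:   {name: (size, maintenance_cost)}
    queries: {name: [(frozenset_of_views_needed, evaluation_cost), ...]}
    space:   total space allocated to the DW
    Returns (total_cost, chosen_views), or None if no feasible set exists.
    """
    best = None
    names = sorted(views)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = frozenset(subset)
            # (a) the chosen views must fit in the allocated space
            if sum(views[v][0] for v in chosen) > space:
                continue
            total = sum(views[v][1] for v in chosen)  # view maintenance cost
            feasible = True
            for plans in queries.values():
                # (b) each query needs a plan that uses only chosen views
                costs = [cost for needed, cost in plans if needed <= chosen]
                if not costs:
                    feasible = False
                    break
                total += min(costs)  # cheapest usable evaluation plan
            # (c) keep the set with the minimum combined cost
            if feasible and (best is None or total < best[0]):
                best = (total, chosen)
    return best


# Hypothetical candidate views and query plans, for illustration only.
views = {"V1": (4, 2), "V2": (3, 1), "V3": (6, 5)}
queries = {
    "Q1": [(frozenset({"V1"}), 10), (frozenset({"V3"}), 3)],
    "Q2": [(frozenset({"V2"}), 4), (frozenset({"V3"}), 2)],
}
print(select_views(views, queries, space=7))
```

The exhaustive loop is exponential in the number of candidate views, which is why pruning heuristics such as those the abstract mentions matter in practice.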
data warehousing and olap | 2000
Dimitri Theodoratos; Mokrane Bouzeghoub
A Data Warehouse (DW) can be seen as a set of materialized views defined over remote source relations. During the initial design and evolution of a DW, the DW designer is faced, on many occasions, with the problem of selecting views to materialize in the DW. This problem has been addressed for different classes of queries/views, and with different design goals. In this work we unify these approaches in a general framework for materialized view selection for Data Warehousing. We first identify and analyze different design goals. A design goal can be the minimization of a cost function or a constraint of different types. We then define the general view selection problem that aims at satisfying all these goals together. This definition of the problem allows us to deal not only with the static design of a DW, but also with its evolution. We use expression AND/OR dags to represent alternative ways of evaluating multiple queries and views, and subexpression sharing. Our formalism is general enough to allow the representation of complex queries including grouping/aggregation queries, necessary in DW applications. We show how the design goals can be mapped into conditions on expression AND/OR dag structures. Using this mapping, we determine the search space for the general view selection problem, and we discuss algorithms for exploring it. Our approach can be used as is but it can be also applied to particular DW design cases where not all the design goals are required.
data warehousing and olap | 2004
Dimitri Theodoratos; Wugang Xu
Deciding which views to materialize is an important problem in the design of a Data Warehouse. Solving this problem requires generating a space of candidate view sets from which an optimal or near-optimal one is chosen for materialization. In this paper we address the problem of constructing this search space. This is an intricate issue because it requires detecting and exploiting common subexpressions among queries and views. Our approach suggests adding to the alternative evaluation plans of multiple queries views called closest common derivators (CCDs) and rewriting the queries using CCDs. A CCD of two queries is a view that is as close to the queries as possible and that allows both queries to be (partially or completely) rewritten using itself. CCDs generalize previous definitions of common subexpressions. Using a declarative query graph representation for queries we provide necessary and sufficient conditions for a view to be a CCD of two queries. We exploit these results to describe a procedure for generating all the CCDs of two queries and for rewriting the queries using each of their CCDs.
data warehousing and knowledge discovery | 1999
Dimitri Theodoratos; Timos K. Sellis
A data warehouse (DW) can be seen as a set of materialized views defined over remote base relations. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original information sources. The DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered by them. Some of these queries can be answered using exclusively the materialized views. In general though new views need to be added to the DW. In this paper we investigate the problem of incrementally designing a DW when new queries need to be answered and extra space is allocated for view materialization. Based on an AND/OR dag representation of multiple queries, we model the problem as a state space search problem. We design incremental algorithms for selecting a set of new views to additionally materialize in the DW that fits in the extra space, allows a complete rewriting of the new queries over the materialized views and minimizes the combined new query evaluation and new view maintenance cost.
data warehousing and knowledge discovery | 2000
Dimitri Theodoratos; Timos K. Sellis
A data warehouse (DW) can be seen as a set of materialized views defined over remote base relations. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original information sources. The DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered by them. Some of these queries can be answered using exclusively the materialized views. In general though new views need to be added to the DW. In this paper we investigate the problem of incrementally designing a DW when new queries need to be answered and possibly extra space is allocated for view materialization. Based on an AND/OR dag representation of multiple queries, we model the problem as a state space search problem. We design incremental algorithms for selecting a set of new views to additionally materialize in the DW that: (a) fits in the extra space, (b) allows a complete rewriting of the new queries over the materialized views, and (c) minimizes the combined new query evaluation and new view maintenance cost. Finally, we discuss methods for pruning the search space so that efficiency is improved.
conference on information and knowledge management | 2009
Xiaoying Wu; Dimitri Theodoratos; Wendy Hui Wang
Answering queries using views is a well-established technique in databases. In this context, two outstanding problems can be formulated. The first one consists in deciding whether a query can be answered exclusively using one or multiple materialized views. Given the many alternative ways to compute the query from the materialized views, the second problem consists in finding the best way to compute the query from the materialized views. In the realm of XML, there is a restricted number of contributions in the direction of these problems due to the many limitations associated with the use of materialized views in traditional XML query evaluation models. In this paper, we adopt a recent evaluation model, called inverted lists model, and holistic algorithms which together have been established as the prominent technique for evaluating queries on large persistent XML data, and we address the previous two problems. This new context revises these problems since it requires new conditions for view usability and new techniques for computing queries from materialized views. We suggest an original approach for materializing views which stores for every view node only the list of XML nodes necessary for computing the answer of the view. We specify necessary and sufficient conditions for answering a tree-pattern query using one or multiple materialized views in terms of homomorphisms from the views to the query. In order to efficiently answer queries using materialized views, we design a stack-based algorithm which compactly encodes in polynomial time and space all the homomorphisms from a view to a query. We further propose space and time optimizations by using bitmaps to encode view materializations and by employing bitwise operations to minimize the evaluation cost of the queries. 
Finally, we conducted an extensive experimentation which demonstrates that our approach yields impressive query hit rates in the view pool, achieves significant time and space savings and shows smooth scalability.
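The notion of a homomorphism from a view to a query can be illustrated with a small sketch. The code below is a naive backtracking enumeration, not the stack-based polynomial encoding the abstract describes, and it does not model inverted lists or bitmaps. The pattern representation (a dictionary mapping each node to its label, parent, and axis), the function name `homomorphisms`, and the exact edge-preservation rules are assumptions chosen for illustration.

```python
def homomorphisms(view, query):
    """Enumerate mappings from view nodes to query nodes.

    Each pattern is {node: (label, parent, axis)}; axis is "/" (child) or
    "//" (descendant) for the edge from the parent, and the root has
    parent None. A mapping is kept when it preserves labels, sends every
    "/" edge to a "/" edge, and sends every "//" edge to a downward path
    of length at least 1 in the query.
    """
    def ancestors(q):
        # Yield the proper ancestors of q in the query, nearest first.
        p = query[q][1]
        while p is not None:
            yield p
            p = query[p][1]

    def edge_ok(v, qv, qp):
        # Is the view edge parent(v) -> v preserved by mapping it to qp -> qv?
        if view[v][2] == "/":
            return query[qv][1] == qp and query[qv][2] == "/"
        return qp in ancestors(qv)

    # Order the view nodes so that each parent precedes its children.
    order = []

    def visit(n):
        order.append(n)
        for c in sorted(c for c in view if view[c][1] == n):
            visit(c)

    visit(next(n for n in view if view[n][1] is None))

    results = []

    def extend(i, h):
        if i == len(order):
            results.append(dict(h))
            return
        v = order[i]
        label, parent, _ = view[v]
        for q in sorted(query):
            if query[q][0] != label:
                continue
            if parent is not None and not edge_ok(v, q, h[parent]):
                continue
            h[v] = q
            extend(i + 1, h)
            del h[v]

    extend(0, {})
    return results


# Hypothetical query a[/b[//c][/c]] and view a//c, for illustration only.
query = {
    "q1": ("a", None, None),
    "q2": ("b", "q1", "/"),
    "q3": ("c", "q2", "//"),
    "q4": ("c", "q2", "/"),
}
view = {"v1": ("a", None, None), "v2": ("c", "v1", "//")}
print(homomorphisms(view, query))
```

Enumerating mappings one by one can take exponential time in the worst case, which is the cost a compact encoding of all homomorphisms is meant to avoid.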
data warehousing and olap | 2001
Dimitri Theodoratos; Aris Tsois
On-line analytical processing (OLAP) is a technology that encompasses applications requiring a multidimensional and hierarchical view of data. OLAP applications often require fast response time to complex grouping/aggregation queries on enormous quantities of data. Commercial relational database management systems use mainly multiple one-dimensional indexes to process OLAP queries that restrict multiple dimensions. However, in many cases, multidimensional access methods outperform one-dimensional indexing methods. We present an architecture for multidimensional databases that are clustered with respect to multiple hierarchical dimensions. It is based on the star schema and is called CSB star. Then, we focus on heuristically optimizing OLAP queries over this schema using multidimensional access methods. Users can still formulate their queries over a traditional star schema, which are then rewritten by the query processor over the CSB star. We exploit the different clustering features of the CSB star to efficiently process a class of typical OLAP queries. We detect special cases where the construction of an evaluation plan can be simplified and we discuss improvements of our technique.
conference on information and knowledge management | 2005
Dimitri Theodoratos; Theodore Dalamagas; Antonis Koufopoulos; Narain Gehani
Nowadays, huge volumes of data are organized or exported in a tree-structured form. Querying capabilities are provided through queries that are based on branching path expressions. Even for a single knowledge domain, structural differences raise difficulties for querying data sources in a uniform way. In this paper, we present a method for semantically querying tree-structured data sources using partially specified tree patterns. Based on dimensions, which are sets of semantically related nodes in tree structures, we define dimension graphs. Dimension graphs can be automatically extracted from trees and abstract their structural information. They are semantically rich constructs that support the formulation of queries and their efficient evaluation. We design a tree-pattern query language to query multiple tree-structured data sources. A central feature of this language is that the structure can be specified fully, partially, or not at all in the queries. Therefore, it can be used to query multiple trees with structural differences. We study the derivation of structural expressions in queries by introducing a set of inference rules for structural expressions. We define two types of query unsatisfiability and we provide necessary and sufficient conditions for checking each of them. Our approach is validated through experimental evaluation.
International Journal of Cooperative Information Systems | 2001
Dimitri Theodoratos; Mokrane Bouzeghoub
A Data Warehouse (DW) is a large collection of data integrated from multiple distributed autonomous databases and other information sources. A DW can be seen as a set of materialized views defined ...
british national conference on databases | 2002
Dimitri Theodoratos
Information integration in the World Wide Web has evolved to a new framework where the information is represented and manipulated using a wide range of modeling languages. Current approaches to data integration use wrappers to convert the different modeling languages into a common data model. In this work we use a nested hypergraph based data model (called HDM) as a common data model for integrating different structured or semi-structured data. We present a hypergraph query language (HQL) that allows the integration of the wrapped data sources through the creation of views for mediators, and the querying of the wrapped data sources and the mediator views by the end users. We also show that HQL queries (views) can be constructed from other views and/or source schemas using a set of primitive transformations. Our integration architecture is flexible and allows some (or all) of the views in a mediator to be materialized.