Stelios Paparizos | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Stelios Paparizos is active.

Explore More

Publication

Featured researches published by Stelios Paparizos.

very large data bases | 2002

TIMBER: A native XML database

H. V. Jagadish; Shurug Al-Khalifa; Adriane Chapman; Laks V. S. Lakshmanan; Andrew Nierman; Stelios Paparizos; Jignesh M. Patel; Divesh Srivastava; Nuwee Wiwatwattana; Yuqing Wu; Cong Yu

Abstract. This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.

very large data bases | 2003

From tree patterns to generalized tree patterns: on efficient evaluation of XQuery

Zhimin Chen; H. V. Jagadish; Laks V. S. Lakshmanan; Stelios Paparizos

XQuery is the de facto standard XML query language, and it is important to have efficient query evaluation techniques available for it. A core operation in the evaluation of XQuery is the finding of matches for specified tree patterns, and there has been much work towards algorithms for finding such matches efficiently. Multiple XPath expressions can be evaluated by computing one or more tree pattern matches. However, relatively little has been done on efficient evaluation of XQuery queries as a whole. In this paper, we argue that there is much more to XQuery evaluation than a tree pattern match. We propose a structure called generalized tree pattern (GTP) for concise representation of a whole XQuery expression. Evaluating the query reduces to finding matches for its GTP. Using this idea we develop efficient evaluation plans for XQuery expressions, possibly involving join, quantifiers, grouping, aggregation, and nesting. XML data often conforms to a schema. We show that using relevant constraints from the schema, one can optimize queries significantly, and give algorithms for automatically inferring GTP simplifications given a schema. Finally, we show, through a detailed set of experiments using the TIMBER XML database system, that plans via GTPs (with or without schema knowledge) significantly outperform plans based on navigation and straightforward plans obtained directly from the query.

international conference on management of data | 2004

Tree logical classes for efficient evaluation of XQuery

Stelios Paparizos; Yuqing Wu; Laks V. S. Lakshmanan; H. V. Jagadish

XML is widely praised for its flexibility in allowing repeated and missing sub-elements. However, this flexibility makes it challenging to develop a bulk algebra, which typically manipulates sets of objects with identical structure. A set of XML elements, say of type book, may have members that vary greatly in structure, e.g. in the number of author sub-elements. This kind of heterogeneity may permeate the entire document in a recursive fashion: e.g., different authors of the same or different book may in turn greatly vary in structure. Even when the document conforms to a schema, the flexible nature of schemas for XML still allows such significant variations in structure among elements in a collection. Bulk processing of such heterogeneous sets is problematic.In this paper, we introduce the notion of logical classes (LC) of pattern tree nodes, and generalize the notion of pattern tree matching to handle node logical classes. This abstraction pays off significantly in allowing us to reason with an inherently heterogeneous collection of elements in a uniform, homogeneous way. Based on this, we define a Tree Logical Class (TLC) algebra that is capable of handling the heterogeneity arising in XML query processing, while avoiding redundant work. We present an algorithm to obtain a TLC algebra expression from an XQuery statement (for a large fragment of XQuery). We show how to implement the TLC algebra efficiently, introducing the nest-join as an important physical operator for XML query processing. We show that evaluation plans generated using the TLC algebra not only are simpler but also perform better than those generated by competing approaches. TLC is the algebra used in the Timber [8] system developed at the University of Michigan.

international conference on management of data | 2010

Structured annotations of web queries

Nikos Sarkas; Stelios Paparizos; Panayiotis Tsaparas

Queries asked on web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is a highly challenging problem, due to the unstructured language of the web queries, and the imposing scalability and speed requirements of web search. In this paper, we discover latent structured semantics in web queries and produce Structured Annotations for them. We consider an annotation as a mapping of a query to a table of structured data and attributes of this table. Given a collection of structured tables, we present a fast and scalable tagging mechanism for obtaining all possible annotations of a query over these tables. However, we observe that for a given query only few are sensible for the user needs. We thus propose a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and we define a dynamic threshold for filtering out misinterpreted query annotations. Our techniques are completely unsupervised, obviating the need for costly manual labeling effort. We evaluated our techniques using real world queries and data and present promising experimental results.

international conference on management of data | 2003

TIMBER: a native system for querying XML

Stelios Paparizos; Shurug Al-Khalifa; Adriane Chapman; H. V. Jagadish; Laks V. S. Lakshmanan; Andrew Nierman; Jignesh M. Patel; Divesh Srivastava; Nuwee Wiwatwattana; Yuqing Wu; Cong Yu

XML has become ubiquitous, and XML data has to be managed in databases. The current industry standard is to map XML data into relational tables and store this information in a relational database. Such mappings create both expressive power problems and performance problems.In the TIMBER [7] project we are exploring the issues involved in storing XML in native format. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.

extending database technology | 2002

Grouping in XML

Stelios Paparizos; Shurug Al-Khalifa; H. V. Jagadish; Laks V. S. Lakshmanan; Andrew Nierman; Divesh Srivastava; Yuqing Wu

XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.

international database engineering and applications symposium | 2003

Persistent applications via automatic recovery

Roger S. Barga; David B. Lomet; Stelios Paparizos; Haifeng Yu; Sirish Chandrasekaran

Building highly available enterprise applications using Web-oriented middleware is hard. Runtime implementations frequently do not address the problems of application state persistence and fault-tolerance, placing the burden of managing session state and, in particular, handling system failures on application programmers. This paper describes Phoenix/APP, a runtime service based on the notion of recovery guarantees. Phoenix/APP transparently masks failures and automatically recovers component-based applications. This both increases application availability and simplifies application development. We demonstrate the feasibility of this approach by describing the design and implementation of Phoenix/APP in Microsofts .NET runtime and present results on the cost of persisting and recovering component-based applications.

international conference on management of data | 2011

Facet discovery for structured web search: a query-log mining approach

Jeffrey Pound; Stelios Paparizos; Panayiotis Tsaparas

In recent years, there has been a strong trend of incorporating results from structured data sources into keyword-based web search systems such as Bing or Amazon. When presenting structured data, facets are a powerful tool for navigating, refining, and grouping the results. For a given structured data source, a fundamental problem in supporting faceted search is finding an ordered selection of attributes and values that will populate the facets. This creates two sets of challenges. First, because of the limited screen real-estate, it is important that the top facets best match the anticipated user intent. Second, the huge scale of available data to such engines demands an automated unsupervised solution. In this paper, we model the user faceted-search behavior using the intersection of web query-logs with existing structured data. Since web queries are formulated as free-text queries, a challenge in our approach is the inherent ambiguity in mapping keywords to the different possible attributes of a given entity type. We present an automated solution that elicits user preferences on attributes and values, employing different disambiguation techniques ranging from simple keyword matching, to more sophisticated probabilistic models. We demonstrate experimentally the scalability of our solution by running it on over a thousand categories of diverse entity types and measure the facet quality with a real-user study.

IEEE Transactions on Knowledge and Data Engineering | 2012

Entity Synonyms for Structured Web Search

Tao Cheng; Hady Wirawan Lauw; Stelios Paparizos

Nowadays, there are many queries issued to search engines targeting at finding values from structured data (e.g., movie showtime of a specific location). In such scenarios, there is often a mismatch between the values of structured data (how content creators describe entities) and the web queries (how different users try to retrieve them). Therefore, recognizing the alternative ways people use to reference an entity, is crucial for structured web search. In this paper, we study the problem of automatic generation of entity synonyms over structured data toward closing the gap between users and structured data. We propose an offline, data-driven approach that mines query logs for instances where content creators and web users apply a variety of strings to refer to the same webpages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings (entity synonyms) for each entity. Our framework consists of three modules: candidate generation, candidate selection, and noise cleaning. We further study the cause of the problem through the identification of different entity synonym classes. The proposed method is verified with experiments on real-life data sets showing that we can significantly increase the coverage of structured web queries with good precision.

international conference on data engineering | 2010

Fuzzy matching of Web queries to structured data

Tao Cheng; Hady Wirawan Lauw; Stelios Paparizos

Recognizing the alternative ways people use to reference an entity, is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches with an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the queries that can be matched.

Explore More