Zhen Hua Liu | Researchain

Publication

Featured research published by Zhen Hua Liu.

international conference on management of data | 2005

Native XQuery processing in Oracle XMLDB

Zhen Hua Liu; Muralidhar Krishnaprasad; Vikas Arora

With XQuery becoming the standard language for querying XML, and the relational SQL platform being recognized as an important platform to store and process XML, the SQL/XML standard is integrating XML query capability into the SQL system by introducing new SQL functions and constructs such as XMLQuery() and XMLTable. This paper discusses the Oracle XMLDB XQuery architecture for supporting XQuery in the Oracle ORDBMS kernel which has the XQuery processing tightly integrated with the SQL/XML engine using native XQuery compilation, optimization and execution techniques.

international conference on management of data | 2005

Towards an enterprise XML architecture

Ravi Murthy; Zhen Hua Liu; Muralidhar Krishnaprasad; Sivasankaran Chandrasekar; Anh-Tuan Tran; Eric Sedlar; Daniela Florescu; Susan Kotsovolos; Nipun Agarwal; Vikas Arora; Viswanathan Krishnamurthy

XML is being increasingly used in diverse domains ranging from data and application integration to content management. Oracle provides an enterprise wide platform for managing all types of XML content. Within the Oracle database and the application server, the XML content can be efficiently stored using a variety of storage and indexing methods and it can be processed using multiple standard languages within different programmatic environments.

very large data bases | 2004

Query rewrite for XML in Oracle XML DB

Muralidhar Krishnaprasad; Zhen Hua Liu; Anand Manikutty; James W. Warner; Vikas Arora; Susan Kotsovolos

Oracle XML DB integrates XML storage and querying using the Oracle relational and object relational framework. It has the capability to physically store XML documents by shredding them as relational or object relational data, and creating logical XML documents using SQL/XML publishing functions. However, querying XML in a relational or object relational database poses several challenges. The biggest challenge is to efficiently process queries against XML in a database whose fundamental storage is table-based and whose fundamental query engine is tuple-oriented. In this paper, we present the XML Query Rewrite technique used in Oracle XML DB. This technique integrates querying XML using XPath embedded inside SQL operators and SQL/XML publishing functions with the object relational and relational algebra. A common set of algebraic rules is used to reduce both XML and object queries into their relational equivalent. This enables a large class of XML queries over XML type tables and views to be transformed into their semantically equivalent relational or object relational queries. These queries are then amenable to classical relational optimisations yielding XML query performance comparable to relational. Furthermore, this rewrite technique lays out a foundation to enable rewrite of XQuery [1] over XML.
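The shred-then-rewrite idea described in this abstract can be illustrated with a toy sketch. It is not Oracle's implementation: the `orders` table, the sample documents, and the function names are all hypothetical, and SQLite stands in for the relational engine. The sketch stores each XML document as one relational row, then checks that a path-style question (`/order[customer='Ann']/total`) gives the same answer whether it is evaluated by parsing every document or by an equivalent relational query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element names are illustrative only.
DOCS = [
    "<order id='1'><customer>Ann</customer><total>40</total></order>",
    "<order id='2'><customer>Bob</customer><total>75</total></order>",
    "<order id='3'><customer>Ann</customer><total>120</total></order>",
]


def shred(conn, docs):
    """Store each <order> document as one relational row."""
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    for text in docs:
        root = ET.fromstring(text)
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (
                int(root.get("id")),
                root.findtext("customer"),
                float(root.findtext("total")),
            ),
        )


def xpath_over_documents(docs):
    """Baseline: answer /order[customer='Ann']/total by parsing every document."""
    totals = []
    for text in docs:
        root = ET.fromstring(text)
        if root.findtext("customer") == "Ann":
            totals.append(float(root.findtext("total")))
    return sorted(totals)


def rewritten_query(conn):
    """The same question as a plain relational query over the shredded rows."""
    rows = conn.execute(
        "SELECT total FROM orders WHERE customer = ? ORDER BY total", ("Ann",)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
shred(conn, DOCS)
# Both evaluation strategies agree on the result.
assert xpath_over_documents(DOCS) == rewritten_query(conn)
```

Once the question is in relational form, the engine can apply its ordinary optimisations (for example, an index on `customer`), which is the benefit the abstract attributes to the rewrite.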

international conference on management of data | 2014

JSON data management: supporting schema-less development in RDBMS

Zhen Hua Liu; Beda Christoph Hammerschmidt; Doug McMahon

Relational Database Management Systems (RDBMS) have been very successful at managing structured data with well-defined schemas. Despite this, relational systems are generally not the first choice for management of data where schemas are not pre-defined or must be flexible in the face of variations and changes. Instead, No-SQL database systems supporting JSON are often selected to provide persistence to such applications. JSON is a light-weight and flexible semi-structured data format supporting constructs common in most programming languages. In this paper, we analyze the way in which requirements differ between management of relational data and management of JSON data. We present three architectural principles that facilitate a schema-less development style within an RDBMS so that RDBMS users can store, query, and index JSON data without requiring schemas. We show how these three principles can be applied to industry-leading RDBMS platforms, such as the Oracle RDBMS Server, with relatively little effort. Consequently, an RDBMS can unify the management of both relational data and JSON data in one platform and use SQL with an embedded JSON path language as a single declarative language to query both relational data and JSON data. This SQL/JSON approach offers significant benefits to application developers as they can use one product to manage both relational data and semi-structured flexible schema data.

international conference on data engineering | 2005

Towards an industrial strength SQL/XML infrastructure

Muralidhar Krishnaprasad; Zhen Hua Liu; Anand Manikutty; James W. Warner; Vikas Arora

XML has become an attractive data processing model for applications. SQL/XML is a SQL standard that integrates XML with SQL. It introduces the XML datatype as a native SQL datatype and defines XML generation functions in the SQL/XML 2003 standard. The goal for the next version of SQL/XML is integrating XQuery with SQL by supporting XQuery embedded inside SQL functions such as the XMLQuery and XMLTable functions. Starting with the 9i database release, Oracle has supported the XML datatype and various operations on XML instances. In this paper, we present the design and implementation strategies of the SQL/XML standard in Oracle XMLDB. We explore the various critical infrastructures needed in the SQL database kernel to support an efficient native XML datatype implementation and the design approaches for efficient generation, query and update of the XML instances. Furthermore, we also illustrate extensions to SQL/XML that make Oracle XMLDB a truly industrial strength platform for XML processing.

international conference on data engineering | 2009

A Decade of XML Data Management: An Industrial Experience Report from Oracle

Zhen Hua Liu; Ravi Murthy

XML and its related technologies have now been in use for almost a decade. There has been a considerable amount of effort both from research and industry focusing on XML, XQuery/XPath, XSLT and SQL/XML processing in the database. Many research prototypes and industrial products have been built to satisfy the XML use cases. This paper reviews several use cases where XML databases are leveraged to build real-world XML applications. We discuss the lessons learnt in supporting both data-centric and document-centric XMLDB applications within a single database system and the need for the implementation of different XML storage, index and query optimisation techniques for different XML use cases. We show the value of managing XML in databases, the current challenges and improvements that will hopefully promote future research directions. This paper also provides a timely checkpoint of XML data management from an industrial perspective with experience of developing and supporting Oracle XML products.

international conference on management of data | 2016

Closing the Functional and Performance Gap between SQL and NoSQL

Zhen Hua Liu; Beda Christoph Hammerschmidt; Doug McMahon; Ying Liu; Hui Joe Chang

Oracle release 12cR1 supports JSON data management that enables users to store, index and query JSON data along with relational data. The integration of the JSON data model into the RDBMS allows a new paradigm of data management where data is storable, indexable and queryable without upfront schema definition. We call this new paradigm Flexible Schema Data Management (FSDM). In this paper, we present enhancements to Oracle's JSON data management in the upcoming 12cR2 release. We present JSON DataGuide, an auto-computed dynamic soft schema for JSON collections that closes the functional gap between the fixed-schema SQL world and the schema-less NoSQL world. We present a self-contained query-friendly binary format for encoding JSON (OSON) to close the query performance gap between schema-encoded relational data and schema-free JSON textual data. The addition of these new features makes the Oracle RDBMS well suited to both fixed-schema SQL and flexible-schema NoSQL use cases, and allows users to freely mix the two paradigms in a single data management system.
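To give a feel for what an auto-computed soft schema over a JSON collection might look like, here is a minimal sketch. It is an illustration of the general idea only, not Oracle's JSON DataGuide: the function names, the path notation (`$.key`, `[*]` for array elements), and the output shape are all assumptions made for this example. The sketch walks every document in a collection and records, for each path, the set of JSON types observed there.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a decoded Python value to a JSON type name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def walk(value, path, guide):
    """Record the type seen at `path`, then recurse into children."""
    guide[path].add(json_type(value))
    if isinstance(value, dict):
        for key, child in value.items():
            walk(child, f"{path}.{key}", guide)
    elif isinstance(value, list):
        for child in value:
            walk(child, f"{path}[*]", guide)


def data_guide(documents):
    """Summarise a collection of JSON texts as {path: sorted type names}."""
    guide = defaultdict(set)
    for text in documents:
        walk(json.loads(text), "$", guide)
    return {path: sorted(types) for path, types in sorted(guide.items())}


# Hypothetical collection with a type conflict on "name".
collection = ['{"name": "a", "tags": ["x"]}', '{"name": 1}']
summary = data_guide(collection)
```

Because the summary is derived from the data rather than declared up front, it can be recomputed as documents change; a path that holds more than one type (here `$.name`) shows up directly in the result.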

very large data bases | 2008

Towards a physical XML independent XQuery/SQL/XML engine

Zhen Hua Liu; Sivasankaran Chandrasekar; Thomas Baby; Hui J. Chang

There has been a lot of research and industrial effort on building XQuery engines with different kinds of XML storage and index models. However, most of these efforts focus on building either an efficient XQuery engine with one kind of XML storage, index, view model in mind or a general XQuery engine without any consideration of the underlying XML storage, index and view model. We need an underlying framework to build an XQuery engine that can work with and provide optimization for different XML storage, index and view models. Besides XQuery, RDBMSs also support SQL/XML, a standard language that integrates XML and relational processing. There are industrial efforts for building hybrid XQuery and SQL/XML engines that support both languages so that users can manage and query both relational and XML data on one platform. However, we need a theoretical framework to optimize both SQL/XML and XQuery languages in one RDBMS. In this paper, we show our industrial work of building a combined XQuery and SQL/XML engine that is able to work and provide optimization for different kinds of XML storage and index models in Oracle XMLDB. This work is based on XML extended relational algebra as the underlying tuple-based logical algebra and incorporates tree and automata based physical algebra into the logical tuple-based algebra so as to provide optimization for different physical XML formulations. This results in logical and physical rewrite techniques to optimize XQuery and SQL/XML over a variety of physical XML storage, index and view models, including schema aware object relational XML storage with relational indexes, binary XML storage with schema agnostic path-value-order key XMLIndex, SQL/XML view over relational data and relational view over XML. Furthermore, we show the approach of leveraging cost based XML physical rewrite strategy to evaluate different physical rewrite plans.

international conference on data engineering | 2007

XMLTable Index - An Efficient Way of Indexing and Querying XML Property Data

Zhen Hua Liu; Muralidhar Krishnaprasad; Hui J. Chang; Vikas Arora

Efficiently storing and querying XML has been widely studied in research and industrial settings. Major RDBMS vendors now support XML as a native datatype in their systems and provide physical means of storing schema agnostic XML data. Typically this data is stored in CLOBs, BLOBs, or tree forms, with path or value indices used to efficiently process XQuery and SQL queries. However, in many use case queries derived from industrial XML applications, we find that it is very common to query XML based on a group of related property data and to query on the master-detail relationships using the SQL XMLTable construct. We propose an indexing mechanism called the XMLTable Index which is more efficient than the path and value index approach for this class of queries, and provides a way to efficiently process these queries over any physical XML storage form. The XMLTable Index complements the path/value index approach, and can be enhanced in its capabilities by using it in conjunction with path, value, text and other domain indices.

very large data bases | 2015

Lenses: an on-demand approach to ETL

Ying Yang; Niccolò Meneghetti; Ronny Fehling; Zhen Hua Liu; Oliver Kennedy

Three mentalities have emerged in analytics. One view holds that reliable analytics is impossible without high-quality data, and relies on heavy-duty ETL processes and upfront data curation to provide it. The second view takes a more ad-hoc approach, collecting data into a data lake, and placing responsibility for data quality on the analyst querying it. A third, on-demand approach has emerged over the past decade in the form of numerous systems like Paygo or HLog, which allow for incremental curation of the data and help analysts to make principled trade-offs between data quality and effort. Though quite useful in isolation, these systems target only specific quality problems (e.g., Paygo targets only schema matching and entity resolution). In this paper, we explore the design of a general, extensible infrastructure for on-demand curation that is based on probabilistic query processing. We illustrate its generality through examples and show how such an infrastructure can be used to gracefully make existing ETL workflows on-demand. Finally, we present a user interface for On-Demand ETL and address ensuing challenges, including that of efficiently ranking potential data curation tasks. Our experimental results show that On-Demand ETL is feasible and that our greedy ranking strategy for curation tasks, called CPI, is effective.
