Daniela Florescu | Researchain

Publication

Featured research published by Daniela Florescu.

international conference on management of data | 1998

Database techniques for the World-Wide Web: a survey

Daniela Florescu; Alon Y. Levy; Alberto O. Mendelzon

The popularity of the World-Wide Web (WWW) has made it a prime vehicle for disseminating information. The relevance of database concepts to the problems of managing and querying this information has led to a significant body of recent research addressing these problems. Even though the underlying challenge is the one that has been traditionally addressed by the database community (how to manage large volumes of data), the novel context of the WWW forces us to significantly extend previous techniques. The primary goal of this survey is to classify the different tasks to which database concepts have been applied, and to emphasize the technical innovations that were required to do so.

international conference on management of data | 1999

An adaptive query execution system for data integration

Zachary G. Ives; Daniela Florescu; Marc Friedman; Alon Y. Levy; Daniel S. Weld

Query processing in data integration occurs over network-bound, autonomous data sources. This requires extensions to traditional optimization and execution techniques for three reasons: there is an absence of quality statistics about the data, data transfer rates are unpredictable and bursty, and slow or unavailable data sources can often be replaced by overlapping or mirrored sources. This paper presents the Tukwila data integration system, designed to support adaptivity at its core using a two-pronged approach. Interleaved planning and execution with partial optimization allows Tukwila to quickly recover from decisions based on inaccurate estimates. During execution, Tukwila uses adaptive query operators such as the double pipelined hash join, which produces answers quickly, and the dynamic collector, which robustly and efficiently computes unions across overlapping data sources. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution re-optimization, and choose nodes), and we present experimental evidence that our techniques result in behavior desirable for a data integration system.
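
The abstract above names the double pipelined hash join as an operator that produces answers quickly. A minimal Python sketch of the general idea (often called a symmetric hash join) follows. It is an illustration under stated assumptions, not Tukwila's implementation: the function name, the key-extractor parameters, and the round-robin alternation between inputs are all hypothetical, and a real system would be driven by whichever source delivers data first.

```python
from collections import defaultdict
from itertools import zip_longest

_DONE = object()  # sentinel marking an exhausted input


def double_pipelined_hash_join(left, right, left_key, right_key):
    """Yield (left_row, right_row) pairs with equal keys as soon as both
    rows have been seen.

    Assumption: inputs are consumed in round-robin order to stand in for
    rows arriving from two network-bound sources.
    """
    left_table = defaultdict(list)   # rows seen so far from the left input
    right_table = defaultdict(list)  # rows seen so far from the right input

    for l_row, r_row in zip_longest(left, right, fillvalue=_DONE):
        if l_row is not _DONE:
            k = left_key(l_row)
            left_table[k].append(l_row)
            # Probe the opposite table immediately, so results stream out
            # without waiting for either input to finish.
            for match in right_table.get(k, ()):
                yield (l_row, match)
        if r_row is not _DONE:
            k = right_key(r_row)
            right_table[k].append(r_row)
            for match in left_table.get(k, ()):
                yield (match, r_row)


if __name__ == "__main__":
    authors = [(1, "a"), (2, "b")]
    papers = [(2, "x"), (1, "y"), (1, "z")]
    first = lambda row: row[0]
    for pair in double_pipelined_hash_join(authors, papers, first, first):
        print(pair)
```

Because each arriving row is inserted into its own table before probing the other, every matching pair is emitted exactly once, and the first results appear before either input has been fully read.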

international workshop on the web and databases | 2000

Quilt: An XML Query Language for Heterogeneous Data Sources

Donald D. Chamberlin; Jonathan Robie; Daniela Florescu

The World Wide Web promises to transform human society by making virtually all types of information instantly available everywhere. Two prerequisites for this promise to be realized are a universal markup language and a universal query language. The power and flexibility of XML make it the leading candidate for a universal markup language. XML provides a way to label information from diverse data sources including structured and semi-structured documents, relational databases, and object repositories. Several XML-based query languages have been proposed, each oriented toward a specific category of information. Quilt is a new proposal that attempts to unify concepts from several of these query languages, resulting in a new language that exploits the full versatility of XML. The name Quilt suggests both the way in which features from several languages were assembled to make a new query language, and the way in which Quilt queries can combine information from diverse data sources into a query result with a new structure of its own.

international conference on management of data | 2008

Building a database on S3

Matthias Brantner; Daniela Florescu; David Graf; Donald Kossmann; Tim Kraska

There has been a great deal of hype about Amazon's Simple Storage Service (S3). S3 provides infinite scalability and high availability at low cost. Currently, S3 is used mostly to store multi-media documents (videos, photos, audio) which are shared by a community of people and rarely updated. The purpose of this paper is to demonstrate the opportunities and limitations of using S3 as a storage system for general-purpose database applications which involve small objects and frequent updates. Read, write, and commit protocols are presented. Furthermore, the cost ($), performance, and consistency properties of such a storage system are studied.

international world wide web conferences | 2000

Integrating keyword search into XML query processing

Daniela Florescu; Donald Kossmann; Ioana Manolescu

Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data of which the structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial RDBMS; we then discuss implementation choices and performance results.

international conference on management of data | 2000

AJAX: an extensible data cleaning tool

Helena Galhardas; Daniela Florescu; Dennis E. Shasha; Eric Simon

@@@@ groups together matching pairs with a high similarity value by applying a given grouping criteria (e.g. by transitive closure). Finally, merging collapses each individual cluster into a tuple of the resulting data source. AJAX provides @@@@ for specifying data cleaning programs, which consists of SQL statements enriched with a set of specific primitives to express these transformations. AJAX also @@@@. It allows the user to interact with an executing data cleaning program to handle exceptional cases and to inspect intermediate results. Finally, AJAX provides @@@@ @@@@ that permits users to determine the source and processing of data for debugging purposes. We will present the AJAX system applied to two real world problems: the consolidation of a telecommunication database, and the conversion of a dirty database of bibliographic references into a set of clean, normalized, and redundancy free relational tables maintaining the same data.

international conference on management of data | 1997

A query language for a Web-site management system

Mary F. Fernández; Daniela Florescu; Alon Y. Levy; Dan Suciu

We have designed a system, called STRUDEL, which applies familiar concepts from database management systems to the process of building web sites. The main motivation for developing STRUDEL is the observation that with current technology, creating and managing large sites is tedious, because a site designer must simultaneously perform (at least) three tasks: (1) choosing what information will be available at the site, (2) organizing that information in individual pages or in graphs of linked pages, and (3) specifying the visual presentation of pages in HTML. Furthermore, since there is no separation between the physical organization of the information underlying a web site and the logical view we have on it, changing or restructuring a site are unwieldy tasks. In STRUDEL, the web site manager can separate the logical view of information available at a web site, the structure of that information in linked pages, and the graphical presentation of pages in HTML. First, the site builder defines independently the data that will be available at the site. This process may require creating an integrated view of data from multiple (external) sources. Second, the site builder defines the structure of the web-site. The structure is defined as a view over the underlying information, and different versions of the site can be defined by specifying multiple views. Finally, the graphical representation of the pages in the web site is specified. This paper describes the query language that lies at the heart of the STRUDEL system. In STRUDEL, we model the data at the different levels as graphs. That is, the data in the external sources, the data in the integrated view and the web-site itself are modeled as graphs. A graph model is appropriate because site data may be derived from multiple sources, such as existing database systems and HTML files.
Consequently, our system requires a query language for (1) defining the integrated view of the data, and (2) defining the structure of web sites. An important requirement of our query language is that it be able to construct graphs. Our query processor needs to be able to answer queries that involve accessing different data sources. Even though we model the sources as containing graphs, we cannot assume they have a uniform representation of graphs. Hence, our query processor needs to adhere to possible limitations on access to data in the graphs, and should be able to exploit additional querying capabilities that an external source may have. We have designed a general framework for processing STRUDEL queries over multiple unstructured data sources, and are designing optimizations that use the capabilities of external sources whenever possible. The purpose of this paper is to describe the syntax and semantics of STRUQL, the query language at the core of STRUDEL. We believe that STRUQL is a language of independent interest, and is useful for other applications involving the management of semistructured data, as well as a view definition language for such data. We discuss the relationship of STRUQL to other languages proposed in the literature in Section 6: see [Abi97, Bun97].

international conference on management of data | 1999

Query optimization in the presence of limited access patterns

Daniela Florescu; Alon Y. Levy; Ioana Manolescu; Dan Suciu

We consider the problem of query optimization in the presence of limitations on access patterns to the data (i.e., when one must provide values for one of the attributes of a relation in order to obtain tuples). We show that in the presence of limited access patterns we must search a space of annotated query plans, where the annotations describe the inputs that must be given to the plan. We describe a theoretical and experimental analysis of the resulting search space and a novel query optimization algorithm that is designed to perform well under the different conditions that may arise. The algorithm searches the set of annotated query plans, pruning invalid and non-viable plans as early as possible in the search space, and it also uses a best-first search strategy in order to produce a first complete plan early in the search. We describe experiments to illustrate the performance of our algorithm.

very large data bases | 2003

The BEA/XQRL streaming XQuery processor

Daniela Florescu; Chris Hillery; Donald Kossmann; Paul J. Lucas; Fabio Riccardi; Till Westmann; Michael J. Carey; Arvind Sundararajan; Geetika Agrawal

In this paper, we describe the design, implementation, and performance characteristics of a complete, industrial-strength XQuery engine, the BEA streaming XQuery processor. The engine was designed to provide very high performance for message processing applications, i.e., for transforming XML data streams, and it is a central component of the 8.1 release of BEA's WebLogic Integration (WLI) product. This XQuery engine is fully compliant with the August 2002 draft of the W3C XML Query Language specification. A goal of this paper is to describe how an efficient, fully compliant XQuery engine can be built from a few relatively simple components and well-understood technologies.

symposium on principles of database systems | 1998

Query containment for conjunctive queries with regular expressions

Daniela Florescu; Alon Y. Levy; Dan Suciu
