Brad Adelberg
Northwestern University
Publication
Featured research published by Brad Adelberg.
international conference on management of data | 1998
Brad Adelberg
Interesting structured or semistructured data often resides not in database systems but in HTML pages, text files, or on paper. Data in these formats is not usable by standard query processing engines, so users need a way of extracting data from these sources into a DBMS or of writing wrappers around the sources. This paper describes NoDoSE, the Northwestern Document Structure Extractor, an interactive tool for semi-automatically determining the structure of such documents and then extracting their data. Using a GUI, the user hierarchically decomposes the file, outlining its interesting regions and then describing their semantics. This task is expedited by a mining component that attempts to infer the grammar of the file from the information the user has input so far. Once the format of a document has been determined, its data can be extracted into a number of useful forms. This paper describes both the NoDoSE architecture, which can be used as a test bed for structure mining algorithms in general, and the mining algorithms that have been developed by the author. The prototype, which is written in Java, is described, and experiences parsing a variety of documents are reported.
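A minimal sketch of the extraction step the abstract describes. NoDoSE itself infers document structure interactively through a GUI and a mining component; here, a hand-written "region grammar" (a single named-group regex, with invented field names) stands in for the structure a user would outline, turning a flat text file into typed records.

```python
import re

# Hypothetical record format: "name; price; quantity" per line.
# In NoDoSE the structure would be outlined by the user and refined by
# the grammar-inference component; this regex is an illustrative stand-in.
RECORD = re.compile(
    r"(?P<name>[A-Za-z ]+);\s*(?P<price>\d+\.\d+);\s*(?P<qty>\d+)"
)

def extract_records(text):
    """Decompose a flat text file into typed records."""
    rows = []
    for line in text.splitlines():
        m = RECORD.match(line.strip())
        if m:
            rows.append({"name": m["name"].strip(),
                         "price": float(m["price"]),
                         "qty": int(m["qty"])})
    return rows

sample = "IBM; 101.25; 300\nAcme Corp; 17.50; 1200\n# comment line\n"
print(extract_records(sample))  # two records; the comment line is skipped
```

Once records are in this generic form, loading them into a DBMS or emitting another target format is a straightforward final step.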
international conference on data engineering | 1997
Wilburt Juan Labio; Dallan Quass; Brad Adelberg
Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views and of indexes to materialize to minimize the down time. We call this the view index selection (VIS) problem. We present an A* search based solution to the problem as well as rules of thumb. We also perform additional experiments to understand the space-time tradeoff as it applies to data warehouses.
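A toy sketch of the view-index-selection (VIS) trade-off: each candidate view or index costs refresh time during the off-line window but saves recomputation time. The paper solves this with an A*-based search and rules of thumb; for a handful of candidates, plain exhaustive search over subsets (used below instead of A*) exposes the same trade-off. All names and numbers are invented for illustration.

```python
from itertools import combinations

# Hypothetical candidates: maintenance cost vs. down-time saved, in seconds.
candidates = {
    "idx_orders_date":   {"maintain": 3.0, "saves": 9.0},
    "view_daily_totals": {"maintain": 5.0, "saves": 6.0},
    "idx_cust_region":   {"maintain": 4.0, "saves": 2.0},
}
BASE_DOWNTIME = 30.0   # refresh time with no supporting structures

def downtime(chosen):
    m = sum(candidates[c]["maintain"] for c in chosen)
    s = sum(candidates[c]["saves"] for c in chosen)
    return BASE_DOWNTIME + m - s

def best_subset():
    names = list(candidates)
    # Exhaustive search stands in for the paper's A* search here.
    best = min(
        (frozenset(sub) for r in range(len(names) + 1)
         for sub in combinations(names, r)),
        key=downtime,
    )
    return best, downtime(best)

chosen, t = best_subset()
print(sorted(chosen), t)  # the third index costs more than it saves
```

Note how `idx_cust_region` is rejected: its maintenance overhead exceeds its saving, which is exactly the space-time tension the paper's experiments explore.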
extending database technology | 1996
Brad Adelberg; Ben Kao; Hector Garcia-Molina
Derived data is maintained in a database system to correlate and summarize base data which record real world facts. As base data changes, derived data needs to be recomputed. A high performance system should execute all these updates and recomputations in a timely fashion so that the data remains fresh and useful, while at the same time executing user transactions quickly. This paper studies the intricate balance between recomputing derived data and transaction execution. Our focus is on efficient recomputation strategies — how and when recomputations should be done to reduce their cost without jeopardizing data timeliness. We propose the Forced Delay recomputation algorithm and show how it can exploit update locality to improve both data freshness and transaction response time.
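The Forced Delay idea can be sketched in a few lines: rather than recomputing a derived item immediately on every base update, hold each pending recomputation for a fixed delay so that a burst of updates to the same item coalesces into a single recomputation. This is a simplified model with invented numbers, not the paper's algorithm in full.

```python
class ForcedDelayQueue:
    def __init__(self, delay):
        self.delay = delay
        self.pending = {}        # derived item -> time recomputation is due
        self.recomputations = 0

    def on_update(self, item, now):
        # A later update to an already-pending item just rides along.
        if item not in self.pending:
            self.pending[item] = now + self.delay

    def tick(self, now):
        due = [i for i, t in self.pending.items() if t <= now]
        for item in due:
            del self.pending[item]
            self.recomputations += 1   # one recomputation covers the burst
        return due

q = ForcedDelayQueue(delay=5)
for t in (0, 1, 2, 3):               # burst of four updates to one index
    q.on_update("stock_index", t)
q.tick(5)
print(q.recomputations)              # 1 recomputation instead of 4
```

The delay trades a bounded amount of staleness for lower recomputation load, which is how update locality improves both freshness and transaction response time.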
international conference on management of data | 1996
Brad Adelberg; Ben Kao; Hector Garcia-Molina
We believe that the greatest growth potential for soft real-time databases is not as isolated monolithic databases but as components in open systems consisting of many heterogeneous databases. In such environments, the flexibility to deal with unpredictable situations and the ability to cooperate with other databases (often non-real-time databases) are just as important as the guarantee of stringent timing constraints. In this paper, we describe a database designed explicitly for heterogeneous environments, the STanford Real-time Information Processor (STRIP). STRIP, which runs on standard Posix Unix, is a soft real-time main memory database with special facilities for importing and exporting data as well as handling derived data. We describe the architecture of STRIP, its unique features, and its potential uses in overall system architectures.
IEEE Transactions on Computers | 2003
Ben Kao; Kam-Yiu Lam; Brad Adelberg; Reynold Cheng; Tony Lee
A real-time database system contains base data items which record and model a physical, real-world environment. For better decision support, base data items are summarized and correlated to derive views. These base data and views are accessed by application transactions to generate the ultimate actions taken by the system. As the environment changes, updates are applied to base data, which subsequently trigger view recomputations. There are thus three types of activities: base data update, view recomputation, and transaction execution. In a real-time database system, two timing constraints need to be enforced. We require that transactions meet their deadlines (transaction timeliness) and read fresh data (data timeliness). In this paper, we define the concept of absolute and relative temporal consistency from the perspective of transactions for discrete data objects. We address the important issue of transaction scheduling among the three types of activities such that the two timing requirements can be met. We also discuss how a real-time database system should be designed to enforce different levels of temporal consistency.
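The two timing notions can be stated as simple predicates over a transaction's read set: absolute consistency bounds each item's age, while relative consistency bounds the timestamp spread across the items read together. The item names and interval values below are illustrative, not taken from the paper.

```python
def absolutely_consistent(read_set, now, avi):
    """Every item read was sampled no more than `avi` time units ago."""
    return all(now - ts <= avi for ts in read_set.values())

def relatively_consistent(read_set, rvi):
    """All items read were sampled within `rvi` time units of each other."""
    stamps = list(read_set.values())
    return max(stamps) - min(stamps) <= rvi

# Hypothetical read set: item -> timestamp of its last base update.
reads = {"temperature": 100, "pressure": 97}

print(absolutely_consistent(reads, now=102, avi=5))  # ages 2 and 5: fresh
print(absolutely_consistent(reads, now=104, avi=5))  # age 7 exceeds avi
print(relatively_consistent(reads, rvi=2))           # spread 3 exceeds rvi
```

A scheduler enforcing these predicates must balance base updates (which restore absolute consistency), view recomputations, and transaction deadlines, which is the scheduling problem the paper addresses.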
international conference on management of data | 1997
Brad Adelberg; Hector Garcia-Molina; Jennifer Widom
Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. This is often implemented by writing active rules that are triggered by changes to base data. In a system with rapidly changing base data, a database with a standard rule system may consume most of its resources running rules to recompute data. This paper presents the rule system implemented as part of the STanford Real-time Information Processor (STRIP). The STRIP rule system is an extension of SQL3-type rules that allows groups of rule actions to be batched together to reduce the total recomputation load on the system. In this paper we describe the syntax and semantics of the STRIP rule system, present an example set of rules to maintain stock index and theoretical option prices in a program trading application, and report the results of experiments performed on the running system. The experiments verify that STRIP's rules allow much more efficient derived data maintenance than conventional rules without batching.
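The batching idea can be illustrated without the actual STRIP rule syntax: a conventional rule runs its action once per triggering update, while a batched rule collects the triggering updates and runs the action once over the whole group. The stock-index example below is a simplified stand-in for the paper's program-trading rules.

```python
class BatchedRule:
    """A rule whose action fires once per batch of triggering updates."""
    def __init__(self, action):
        self.action = action
        self.batch = []
        self.executions = 0

    def trigger(self, update):
        self.batch.append(update)     # just enqueue; no work yet

    def flush(self):
        if self.batch:
            self.action(self.batch)   # one action over all queued updates
            self.executions += 1
            self.batch = []

index = {"value": 0.0}

def recompute_index(updates):
    # Refresh a toy "stock index" once from the latest batch of ticks.
    index["value"] = sum(u["price"] for u in updates) / len(updates)

rule = BatchedRule(recompute_index)
for price in (10.0, 11.0, 12.0):      # three ticks arrive in a burst
    rule.trigger({"price": price})
rule.flush()
print(rule.executions, index["value"])  # 1 execution, index = 11.0
```

An unbatched rule would have recomputed the index three times for the same final value; batching trims that load, which is the efficiency gain the experiments measure.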
conference on information and knowledge management | 1999
Ben Kao; Kam-Yiu Lam; Brad Adelberg; Reynold Cheng; Tony Lee
A database system contains base data items which record and model a physical, real world environment. For better decision support, base data items are summarized and correlated to derive views. These base data and views are accessed by application transactions to generate the ultimate actions taken by the system. As the environment changes, updates are applied to the base data, which subsequently trigger view recomputations. There are thus three types of activities: base data update, view recomputation, and transaction execution. In a real-time system, two timing constraints need to be enforced. We require that transactions meet their deadlines (transaction timeliness) and read fresh data (data timeliness). In this paper we define the concept of absolute and relative temporal consistency from the perspective of transactions. We address the important issue of transaction scheduling among the three types of activities such that the two timing requirements can be met. We also discuss how a real-time database system should be designed to enforce different levels of temporal consistency.
international conference on management of data | 1999
Brad Adelberg; Matthew Denny
This paper describes a tool called NoDoSE that we have developed to expedite the creation of robust wrappers. NoDoSE allows non-programmers to build components that can convert data from the source format to XML or another generic format. Further, the generated code performs a set of statistical checks at runtime that attempt to find extraction errors before they are propagated back to users.
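One way such runtime checks can work (the details here are invented for illustration, not taken from the paper): from a sample of known-good extractions, record simple per-field statistics, then flag later extractions whose values fall far outside those bounds. This is a cheap way to notice that a source's format has drifted and the wrapper is now misparsing fields.

```python
def learn_bounds(good_rows, field, slack=0.5):
    """Derive a padded [lo, hi] range for a field from known-good rows."""
    vals = [r[field] for r in good_rows]
    lo, hi = min(vals), max(vals)
    pad = (hi - lo) * slack          # widen the range to tolerate drift
    return lo - pad, hi + pad

def check(row, field, bounds):
    """True if the extracted value looks statistically plausible."""
    lo, hi = bounds
    return lo <= row[field] <= hi

# Hypothetical training sample of correctly extracted prices.
training = [{"price": 10.0}, {"price": 14.0}, {"price": 12.5}]
bounds = learn_bounds(training, "price")          # (8.0, 16.0)

print(check({"price": 13.0}, "price", bounds))    # plausible extraction
print(check({"price": 1994.0}, "price", bounds))  # a year misparsed as price
```

Range checks are only one of many possible statistics; frequency of record matches per page or field-type mismatch rates would catch other failure modes.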
Proceedings of Third Workshop on Parallel and Distributed Real-Time Systems | 1995
Ben Kao; Hector Garcia-Molina; Brad Adelberg
When building a distributed real-time system, one can either build the whole system from scratch or assemble it from pre-existing standard components. Although the former allows better scheduling design, it may not be economical in terms of the cost and time of development. This paper studies the performance of distributed soft real-time systems that use standard components with various scheduling algorithms and suggests ways to improve them.
Real-Time Database Systems | 2002
Ben Kao; Kam-Yiu Lam; Brad Adelberg
Unlike the structured data organization in a traditional relational database management system, semi-structured data can be irregular and incomplete, with its schema contained in the data itself. Materialized views over semi-structured data need to be maintained in response to changes in the base data. This report addresses the batch-mode update problem in view maintenance over semi-structured data. We extend the base view maintenance algorithm from [1] and propose a batch-mode update algorithm that avoids dependence problems when a view is updated out of the order of the base data updates. Our initial experiments show that when the number of updates is small, the block I/O overhead of batch-mode update is on the same level as that of a B+-tree index and a hash index, and that it takes around two orders of magnitude less access time than these two index structures when the number of updates grows beyond 100,000. Additionally, it requires no special index structure if the semi-structured data is stored in a relational database that represents the graph-structured data by its edges.