Anthony Tomasic | Researchain

Publication

Featured research published by Anthony Tomasic.

ACM Transactions on Database Systems | 1999

GlOSS: text-source discovery over the Internet

Luis Gravano; Hector Garcia-Molina; Anthony Tomasic

The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.

International Conference on Distributed Computing Systems | 1996

Scaling heterogeneous databases and the design of Disco

Anthony Tomasic; Louiqa Raschid; Patrick Valduriez

Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementers must deal with the translation of queries between query languages and schemas. The Distributed Information Search COmponent (Disco) addresses these problems. Query processing semantics are developed to process queries over data sources which do not return answers. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and translates queries. This paper describes (a) the distributed mediator architecture of Disco, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources.

IEEE Transactions on Knowledge and Data Engineering | 1998

Scaling access to heterogeneous data sources with DISCO

Anthony Tomasic; Louiqa Raschid; Patrick Valduriez

Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementers must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources. Queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (Disco) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to the users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article, we describe: 1) the distributed mediator architecture of Disco; 2) the data model and its modeling of data source connections; 3) the interface to underlying data sources and the query rewriting process; and 4) query processing semantics. We describe several advantages of our system.

International Conference on Management of Data | 1994

The effectiveness of GlOSS for the text database discovery problem

Luis Gravano; Hector Garcia-Molina; Anthony Tomasic

The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and a database. The method is termed GlOSS—Glossary of Servers Server. The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
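The idea of ranking databases by an estimated result size can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each source exports only its document count and per-term document frequencies, that queries are conjunctions of terms, and that terms occur independently of one another. The names `SourceSummary`, `estimate_result_size`, and `rank_sources` are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List


@dataclass
class SourceSummary:
    """Hypothetical summary a text database exports to the central service."""
    name: str
    num_docs: int
    doc_freq: Dict[str, int]  # term -> number of documents containing it


def estimate_result_size(summary: SourceSummary, terms: List[str]) -> float:
    """Estimate how many documents match all terms, assuming term independence."""
    if summary.num_docs == 0:
        return 0.0
    fractions = (summary.doc_freq.get(t, 0) / summary.num_docs for t in terms)
    return summary.num_docs * prod(fractions)


def rank_sources(summaries: List[SourceSummary], terms: List[str]) -> List[str]:
    """Return source names ordered by estimated result size, dropping zero estimates."""
    scored = [(estimate_result_size(s, terms), s.name) for s in summaries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored if score > 0]


summaries = [
    SourceSummary("db_a", 100, {"database": 50, "retrieval": 20}),
    SourceSummary("db_b", 1000, {"database": 10, "retrieval": 10}),
    SourceSummary("db_c", 10, {"database": 5}),
]
ranking = rank_sources(summaries, ["database", "retrieval"])
```

Because the service stores only frequencies and sizes rather than full indexes, the storage cost stays small relative to the databases themselves, at the price of estimates that can be wrong when terms are correlated.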

International Conference on Management of Data | 1994

Incremental updates of inverted lists for text document retrieval

Anthony Tomasic; Hector Garcia-Molina; Kurt A. Shoens

With the proliferation of the world's “information highways,” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
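A rough sketch of how an index might separate short and long inverted lists during incremental updates is given below. It is an in-memory toy under stated assumptions: short lists share a fixed number of buckets, each long list is stored on its own, and a list migrates once it exceeds a length threshold. The class name, the threshold rule, and the bucket count are illustrative choices, not details taken from the paper.

```python
from typing import Dict, List


class DualStructureIndex:
    """Toy inverted index that keeps short and long posting lists apart."""

    def __init__(self, long_threshold: int = 4, num_buckets: int = 8) -> None:
        self.long_threshold = long_threshold  # assumed migration rule
        # Short lists share buckets so many rare terms can be stored together.
        self.buckets: List[Dict[str, List[int]]] = [{} for _ in range(num_buckets)]
        # Each long list gets its own entry so it can grow without moving others.
        self.long_lists: Dict[str, List[int]] = {}

    def _bucket(self, term: str) -> Dict[str, List[int]]:
        return self.buckets[hash(term) % len(self.buckets)]

    def add(self, term: str, doc_id: int) -> None:
        """Append a posting, migrating the list to long storage if it outgrows the threshold."""
        if term in self.long_lists:
            self.long_lists[term].append(doc_id)
            return
        bucket = self._bucket(term)
        postings = bucket.setdefault(term, [])
        postings.append(doc_id)
        if len(postings) > self.long_threshold:
            self.long_lists[term] = bucket.pop(term)

    def lookup(self, term: str) -> List[int]:
        """Return a copy of the posting list, wherever it currently lives."""
        if term in self.long_lists:
            return list(self.long_lists[term])
        return list(self._bucket(term).get(term, []))

    def is_long(self, term: str) -> bool:
        return term in self.long_lists


index = DualStructureIndex(long_threshold=2)
for doc_id in (1, 2, 3):
    index.add("the", doc_id)
index.add("rare", 1)
```

Raising the threshold keeps more lists in the shared buckets, which favors cheap updates; lowering it moves lists into dedicated storage sooner, which favors contiguous reads at query time.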

International Conference on Parallel and Distributed Information Systems | 1993

Performance of inverted indices in shared-nothing distributed text document information retrieval systems

Anthony Tomasic; Hector Garcia-Molina

The impact on query processing performance of various physical organizations for inverted lists is compared. A probabilistic model of the database and queries is introduced. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design tradeoffs over a range of hardware configurations and new parallel query processing strategies.

Human Factors in Computing Systems | 2011

Field trial of Tiramisu: crowd-sourcing bus arrival times to spur co-design

John Zimmerman; Anthony Tomasic; Charles Garrod; Daisy Yoo; Chaya Hiruncharoenvate; Rafae Dar Aziz; Nikhil Thiruvengadam; Yun Huang; Aaron Steinfeld

Crowd-sourcing social computing systems represent a new material for HCI designers. However, these systems are difficult to work with and to prototype, because they require a critical mass of participants to investigate social behavior. Service design is an emerging research area that focuses on how customers co-produce the services that they use, and thus it appears to be a great domain to apply this new material. To investigate this relationship, we developed Tiramisu, a transit information system where commuters share GPS traces and submit problem reports. Tiramisu processes incoming traces and generates real-time arrival time predictions for buses. We conducted a field trial with 28 participants. In this paper we report on the results and reflect on the use of field trials to evaluate crowd-sourcing prototypes and on how crowd sourcing can generate co-production between citizens and public services.

International Conference on Parallel and Distributed Information Systems | 1996

Scrambling query plans to cope with unexpected delays

L. Amsaleg; Anthony Tomasic; Michael J. Franklin; Tolga Urhan

Accessing data from numerous widely distributed sources poses significant new challenges for query optimization and execution. Congestion and failures in the network can introduce highly variable response times for wide area data access. The paper is an initial exploration of solutions to this variability. We introduce a class of dynamic, run time query plan modification techniques that we call query plan scrambling. We present an algorithm that modifies execution plans on-the-fly in response to unexpected delays in obtaining initial requested tuples from remote sources. The algorithm both reschedules operators and introduces new operators into the query plan. We present simulation results that demonstrate how the technique effectively hides delays by performing other useful work while waiting for missing data to arrive.
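The rescheduling half of this idea can be illustrated with a small sketch. It covers only reordering, not the introduction of new operators, and it is not the paper's algorithm: the `Operator` type, the `needs` field (assumed to list every remote source an operator depends on, directly or through its inputs), and the `reschedule` function are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Operator(NamedTuple):
    """Hypothetical plan operator with the remote sources it transitively needs."""
    name: str
    needs: FrozenSet[str]


def reschedule(plan: List[Operator], delayed: Set[str]) -> List[Operator]:
    """Move operators that do not depend on delayed sources ahead of those that do.

    Relative order is preserved inside each group, so a plan that was in a valid
    execution order stays valid as long as `needs` is transitive.
    """
    runnable = [op for op in plan if not (op.needs & delayed)]
    blocked = [op for op in plan if op.needs & delayed]
    return runnable + blocked


plan = [
    Operator("scan_A", frozenset({"A"})),
    Operator("scan_B", frozenset({"B"})),
    Operator("join_AB", frozenset({"A", "B"})),
    Operator("scan_C", frozenset({"C"})),
]
new_plan = reschedule(plan, delayed={"A"})
new_order = [op.name for op in new_plan]
```

Here the scans of B and C run while source A is stalled, so the wait is overlapped with useful work instead of blocking the whole plan.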

International Conference on Management of Data | 1997

The distributed information search component (Disco) and the World Wide Web

Anthony Tomasic; Rémy Amouroux; Philippe Bonnet; Olga Kapitskaia; Hubert Naacke; Louiqa Raschid

The Distributed Information Search COmponent (DISCO) is a prototype heterogeneous distributed database that accesses underlying data sources. The DISCO prototype currently focuses on three central research problems in the context of these systems. First, since the capabilities of each data source are different, transforming queries into subqueries on data sources is difficult. We call this problem the weak data source problem. Second, since each data source performs operations in a generally unique way, the cost for performing an operation may vary radically from one wrapper to another. We call this problem the radical cost problem. Finally, existing systems behave rudely when attempting to access an unavailable data source. We call this problem the ungraceful failure problem. DISCO copes with these problems. For the weak data source problem, the database implementor defines precisely the capabilities of each data source. For the radical cost problem, the database implementor (optionally) defines cost information for some of the operations of a data source. The mediator uses this cost information to improve its cost model. To deal with ungraceful failures, queries return partial answers. A partial answer contains the part of the final answer to the query that was produced by the available data sources. The current working prototype of DISCO contains implementations of these solutions and operations over a collection of wrappers that access information both in files and on the World Wide Web.
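The partial-answer behavior can be sketched for the simplest case, a union over several sources. This is only an illustration under assumptions: the `SourceUnavailable` exception, the `PartialAnswer` type, and the `query_union` function are hypothetical names, and each wrapper is modeled as a plain callable rather than DISCO's actual wrapper interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple


class SourceUnavailable(Exception):
    """Raised by a hypothetical wrapper when its data source cannot be reached."""


@dataclass
class PartialAnswer:
    rows: List[Row] = field(default_factory=list)
    unavailable: List[str] = field(default_factory=list)  # sources that did not answer

    @property
    def complete(self) -> bool:
        return not self.unavailable


def query_union(sources: Dict[str, Callable[[], List[Row]]]) -> PartialAnswer:
    """Collect rows from every reachable source instead of failing on the first outage."""
    answer = PartialAnswer()
    for name, fetch in sources.items():
        try:
            answer.rows.extend(fetch())
        except SourceUnavailable:
            answer.unavailable.append(name)
    return answer


def files_source() -> List[Row]:
    return [("doc1",), ("doc2",)]


def web_source() -> List[Row]:
    raise SourceUnavailable("web source timed out")


answer = query_union({"files": files_source, "web": web_source})
```

The caller receives the rows that were available together with the names of the sources that failed, so it can present what it has and retry the missing part later.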

International Conference on Data Engineering | 1998

Leveraging mediator cost models with heterogeneous data sources

Hubert Naacke; Georges Gardarin; Anthony Tomasic

Distributed systems require declarative access to diverse information sources. One approach to solving this heterogeneous distributed database problem is based on mediator architectures. In these architectures, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying sources. To efficiently process queries, the mediator must optimize the plan used for processing the query. In classical databases, cost-estimate based query optimization is effective. In heterogeneous distributed databases, cost-estimate based query optimization is difficult to achieve because the underlying data sources do not export cost information. This paper describes a new method that permits the wrapper programmer to export cost estimates. For the wrapper programmer to describe all cost estimates may be impossible due to lack of information or burdensome due to the amount of information. We ease this responsibility of the wrapper programmer by leveraging the generic cost model of the mediator with specific cost estimates from the wrappers.
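One way to picture a generic mediator cost model combined with wrapper-specific estimates is a lookup with a fallback, sketched below. The operation names, the cost formulas, and the `CostModel` class are invented for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, Tuple

CostFn = Callable[[int], float]

# Hypothetical generic cost formulas the mediator applies when it knows nothing better.
GENERIC_COST: Dict[str, CostFn] = {
    "scan": lambda cardinality: 1.0 * cardinality,
    "select": lambda cardinality: 0.5 * cardinality,
}


class CostModel:
    """Mediator cost model that prefers wrapper-exported estimates when present."""

    def __init__(self) -> None:
        self.wrapper_costs: Dict[Tuple[str, str], CostFn] = {}

    def export(self, wrapper: str, operation: str, cost_fn: CostFn) -> None:
        """Let a wrapper programmer register a specific estimate for one operation."""
        self.wrapper_costs[(wrapper, operation)] = cost_fn

    def estimate(self, wrapper: str, operation: str, cardinality: int) -> float:
        """Use the wrapper's estimate if it exported one, else the generic formula."""
        cost_fn = self.wrapper_costs.get((wrapper, operation), GENERIC_COST[operation])
        return cost_fn(cardinality)


model = CostModel()
# Assume scanning the web wrapper is far more expensive per row than the default.
model.export("web", "scan", lambda cardinality: 20.0 * cardinality)

web_scan = model.estimate("web", "scan", 10)
files_scan = model.estimate("files", "scan", 10)
web_select = model.estimate("web", "select", 10)
```

The wrapper programmer only has to describe the operations whose cost differs noticeably from the default; everything else falls back to the generic model.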
