Arnaud Sahuguet
University of Pennsylvania
Publications
Featured research published by Arnaud Sahuguet.
Data and Knowledge Engineering | 2001
Arnaud Sahuguet; Fabien Azavant
The Web has so far been incredibly successful at delivering information to human users. It has been so successful, in fact, that there is now an urgent need to go beyond the browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents. It is rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for generating wrappers for Web sources. It offers (1) an expressive language to specify the extraction of complex structures from HTML pages, (2) a declarative mapping to various data formats such as XML, and (3) visual tools that make the engineering of wrappers faster and easier.
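To make the extract-then-map idea concrete, here is a minimal sketch built on Python's standard library. It is not W4F's extraction language or its mapping mechanism. The function and class names are hypothetical, and the sketch makes three assumptions: the HTML uses explicitly closed <tr>/<td>/<th> tags, the first row is a header, and the target XML is a flat list of records.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def html_table_to_xml(html, record_tag, field_names):
    """Declaratively map each row after the first (assumed header) onto an XML record."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element(record_tag + "s")
    for row in parser.rows[1:]:
        record = ET.SubElement(root, record_tag)
        for name, value in zip(field_names, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


html = ("<table><tr><th>Item</th><th>Price</th></tr>"
        "<tr><td>Widget</td><td> 9.99 </td></tr></table>")
print(html_table_to_xml(html, "item", ["name", "price"]))
```

The mapping is the list `field_names`. Changing the output schema means changing only that list, not the extraction code. That separation is the point the abstract makes about declarative mappings.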
International Conference on Management of Data | 2000
Lucian Popa; Alin Deutsch; Arnaud Sahuguet; Val Tannen
In a previous paper, we proposed a novel method for generating alternative query plans that uses chasing (and back-chasing) with logical constraints. The method brings together the use of indexes, the use of materialized views, semantic optimization, and join elimination (minimization). Each of these techniques is known separately to benefit query optimization. The novelty of our approach lies in allowing these techniques to interact systematically. For example, non-trivial use of indexes and materialized views may be enabled only by semantic constraints. We have implemented our method for a variety of schemas and queries. We examine how far we can push the method in terms of the complexity of both schemas and queries. We propose a technique for reducing the size of the search space by "stratifying" the sets of constraints used in the (back)chase. The experimental results demonstrate that our method is practical (i.e., feasible and worthwhile).
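Join elimination (minimization), one of the ingredients listed above, can be illustrated on its own. The following is a minimal sketch of classical homomorphism-based minimization of conjunctive queries without constraints. It is not the chase and back-chase method itself, which additionally uses logical constraints, indexes, and materialized views.

The sketch makes several assumptions:
- A query is a head tuple of variables plus a list of atoms `(predicate, args)`.
- All arguments are variables (no constants).
- The function names are hypothetical.

```python
def find_homomorphism(source, target, fixed):
    """Map every atom of `source` onto some atom of `target`, keeping each variable in
    `fixed` (the head variables) mapped to itself. Returns the mapping, or None."""

    def extend(i, mapping):
        if i == len(source):
            return mapping
        pred, args = source[i]
        for tpred, targs in target:
            if tpred != pred or len(targs) != len(args):
                continue
            candidate = dict(mapping)
            if all(candidate.setdefault(a, b) == b for a, b in zip(args, targs)):
                result = extend(i + 1, candidate)
                if result is not None:
                    return result
        return None  # no consistent choice for atom i: backtrack

    return extend(0, {v: v for v in fixed})


def minimize(head, body):
    """Repeatedly drop an atom whenever the full body still maps homomorphically into the
    remaining atoms; the smaller query is then equivalent to the original."""
    body = list(body)
    changed = True
    while changed:
        changed = False
        for i in range(len(body)):
            smaller = body[:i] + body[i + 1:]
            if find_homomorphism(body, smaller, head) is not None:
                body = smaller
                changed = True
                break
    return head, body


# Q(x) :- R(x, y), R(x, z)  minimizes to a single R atom (one join eliminated).
print(minimize(("x",), [("R", ("x", "y")), ("R", ("x", "z"))]))
```

The backtracking search is exponential in the worst case, which is expected since conjunctive query containment is NP-complete. The abstract's concern about search-space size in the (back)chase arises at a larger scale for the same reason.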
British National Conference on Databases | 2002
Richard Hull; Bharat Kumar; Arnaud Sahuguet; Ming Xiong
This paper surveys recent trends in network-hosted end-user services, with an emphasis on the increasing complexity of those services due to the convergence of the conventional telephony, wireless, and data networks. In order to take full advantage of these services, private and corporate users will need personalized ways of accessing them. Support for this personalization will involve advances in both data and policy management.
International Workshop on the Web and Databases | 2000
Arnaud Sahuguet
Over the last two years, XML has become an increasingly popular data format, widely accepted across many different communities. In this paper, we present preliminary results that explore how XML DTDs are actually being used. By studying publicly available DTDs, we look at how people are (mis)using DTDs, show some of their shortcomings, list some requirements, and discuss possible replacements.
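A survey like the one described above starts by tallying what DTDs actually declare. Below is a minimal sketch that counts element declarations by the kind of content model they use.

The sketch makes several assumptions:
- The five categories are illustrative, not the ones used in the paper.
- Parameter entities are not expanded.
- No `>` appears inside a content model.
- The function names are hypothetical.

```python
import re
from collections import Counter

# Matches <!ELEMENT name content-model>; assumes no parameter entities and no '>' in the model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]*)>")


def classify(model):
    """Bucket a content model into one of five coarse kinds."""
    m = re.sub(r"\s+", "", model)
    if m == "EMPTY":
        return "empty"
    if m == "ANY":
        return "any"
    if m.startswith("(#PCDATA"):
        # (#PCDATA) is text-only; (#PCDATA | a | b)* is mixed content.
        return "mixed" if "|" in m else "text"
    return "element"


def dtd_profile(dtd_text):
    """Count element declarations per content-model kind."""
    return Counter(classify(model) for _name, model in ELEMENT_DECL.findall(dtd_text))


sample = """
<!ELEMENT book (title, author+, chapter*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT chapter (#PCDATA | emph)*>
<!ELEMENT emph ANY>
"""
print(dtd_profile(sample))
```

Counting `ANY` and mixed-content declarations across many DTDs is one simple way to see where authors fall back on loosely typed content.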
International Conference on Management of Data | 1998
Zoé Lacroix; Arnaud Sahuguet; Raman Chandrasekar
Standard database approaches to querying information on the Web focus on the source(s) and provide a query language based on a given predefined organization (schema) of the data. This is the source-driven approach. However, can the Web be seen as a standard database? Several properties suggest it cannot:
- There is no super-user in charge of monitoring the source(s), and the data is constantly updated.
- There is no homogeneous structure, and thus no common explicit structure.
- The Web itself never stops growing.

For these reasons, we believe that the standard source-driven approach is not suitable for the Web. As an alternative, we propose a user-oriented approach based on the idea that the schema is expressed a posteriori by the user's needs when asking a query. Given a user query, AKIRA (Agentive Knowledge-based Information Retrieval Architecture) [6] extracts a target structure (the structure expressed in the query) and uses standard information retrieval and filtering techniques to access potentially relevant documents. In this user-oriented paradigm, the structure through which the data is viewed does not come from the source but is extracted from the user query. When a user asks a query, the relevant information is retrieved from the Web and stored as-is in a cache. The information is then extracted from the raw data using computational linguistic techniques. The AKIRA cache (smart-cache) represents these extracted layers of meta-information on top of the raw data. The smart-cache is an object-oriented database whose schema is inferred from the user's target structure. It is designed on demand from a library of concepts that can be assembled to match the concepts and meta-concepts required by the user's query. The smart-cache can be seen as a view of the Web.
To the best of our knowledge, AKIRA is the only system that uses information retrieval and extraction integrated with database techniques to provide maximum flexibility to the user and offer transparent access to the content of Web documents.
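The abstract does not say how the smart-cache schema is represented. As a loose illustration of a schema "inferred from the user's target structure," this minimal sketch creates one record class per concept in a target structure and loads extracted values into it.

The sketch makes several assumptions:
- The target structure is already given as a dict; extracting it from a query is not modeled.
- Extraction output arrives as `(concept, {attribute: value})` pairs.
- PIQL is not modeled.
- The function names are hypothetical.

```python
from dataclasses import make_dataclass, field, fields


def build_schema(target_structure):
    """Create one record class per concept; every attribute is optional (defaults to None)."""
    return {
        concept: make_dataclass(
            concept.capitalize(),
            [(attr, object, field(default=None)) for attr in attrs],
        )
        for concept, attrs in target_structure.items()
    }


def populate(schema, extracted):
    """Load (concept, {attribute: value}) pairs into a per-concept cache.

    Concepts and attributes that are not part of the target structure are dropped.
    """
    cache = {concept: [] for concept in schema}
    for concept, values in extracted:
        cls = schema.get(concept)
        if cls is None:
            continue  # not requested by the user's query
        allowed = {f.name for f in fields(cls)}
        cache[concept].append(cls(**{k: v for k, v in values.items() if k in allowed}))
    return cache


# Hypothetical target structure for a query about films and their directors.
schema = build_schema({"film": ["title", "director"]})
cache = populate(schema, [
    ("film", {"title": "Example Film", "director": "J. Doe", "runtime": "95"}),
    ("review", {"text": "hypothetical review"}),
])
print(cache["film"])
```

Because the classes are generated from the target structure at query time, the cache's schema is defined a posteriori rather than fixed in advance by the source.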
International Journal of Cooperative Information Systems | 2005
Richard Hull; Bharat Kumar; Daniel F. Lieuwen; Peter F. Patel-Schneider; Arnaud Sahuguet; Sriram Varadarajan; Avinash Vyas
The Web and converged-services paradigm promises tremendous flexibility in the creation of rich composite services for enterprises and end users. This flexibility and richness offer the possibility of highly customized, individualized services for the end user, and hence revenue-generating services for service providers (e.g., ASPs, telecom network operators, ISPs). But how can end users (and enterprises) specify their preferences when a myriad of possibilities and potential circumstances must be addressed? In this paper, we advocate a solution based on policy management, in which users specify their preferences through forms and those preferences are translated into rules in a high-level policy language. This paper identifies the requirements for this kind of translation and describes the Houdini system (developed at Bell Labs). Houdini offers a rich rule-based language and a framework that supports intuitive, forms-based provisioning interfaces.
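The forms-to-rules translation described above can be sketched in a few lines. This is not Houdini's policy language, which the abstract does not show. It uses plain Python predicates with first-match semantics as a stand-in.

The sketch assumes a hypothetical call-handling form with the fields `vip_callers`, `busy_hours`, and `default_action`. The action names and function names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over the call context
    action: str


def rules_from_form(form):
    """Translate hypothetical form settings into an ordered list of rules."""
    rules = []
    # VIP callers are checked first, so they override the busy-hours rule.
    for caller in form.get("vip_callers", []):
        rules.append(Rule(f"vip:{caller}", lambda ctx, c=caller: ctx["caller"] == c, "ring_mobile"))
    if "busy_hours" in form:
        start, end = form["busy_hours"]
        rules.append(Rule("busy", lambda ctx, s=start, e=end: s <= ctx["hour"] < e, "voicemail"))
    # Catch-all rule so every context receives an action.
    rules.append(Rule("default", lambda ctx: True, form.get("default_action", "ring_desk")))
    return rules


def decide(rules, ctx):
    """First-match semantics: return the action of the first rule whose condition holds."""
    for rule in rules:
        if rule.condition(ctx):
            return rule.action
    return None


form = {"vip_callers": ["alice"], "busy_hours": (9, 17), "default_action": "ring_desk"}
print(decide(rules_from_form(form), {"caller": "bob", "hour": 10}))
```

The user only fills in form fields. Rule ordering, which encodes how conflicting preferences are resolved, is decided by the translator rather than by the user.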
Conference on Advanced Information Systems Engineering | 1998
Zoé Lacroix; Arnaud Sahuguet; Raman Chandrasekar
We propose a novel approach to querying the Web with a system named AKIRA (Agentive Knowledge-based Information Retrieval Architecture), which combines advanced technologies from information retrieval and extraction with database techniques. The former enable the system to access both the explicit and the implicit structure of Web documents and to organize them into a hierarchy of concepts and meta-concepts. The latter provide tools for data manipulation. We propose a user-oriented approach: given the user's query, AKIRA extracts a target structure (the structure expressed in the query) and uses standard retrieval techniques to access potentially relevant documents. The content of these documents is processed using extraction techniques (along with a flexible agentive structure) to filter for relevance and to extract implicit or explicit structure matching the target structure. The information garnered is used to populate a smart-cache (an object-oriented database) whose schema is inferred from the target structure. This smart-cache, whose schema is thus defined a posteriori, is populated and queried using PIQL, our query language. AKIRA integrates these complementary techniques to provide maximum flexibility to the user and to offer transparent access to the content of Web documents.
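The abstract says AKIRA uses "standard retrieval techniques" to find potentially relevant documents but does not name them. As an illustration only, this minimal sketch uses one common choice, TF-IDF weighting with cosine similarity, to rank documents against a query and drop those with no overlap.

The sketch makes two further assumptions: tokenization is a simple lowercase alphanumeric split, and the smoothed IDF formula is one of several in use. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; no stemming, so 'query' and 'querying' differ."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return indices of documents with positive TF-IDF cosine similarity, best first."""
    docs = [Counter(tokenize(d)) for d in documents]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency per term

    def idf(term):
        return math.log((n + 1) / (df[term] + 1)) + 1  # smoothed inverse document frequency

    def vector(counts):
        return {t: c * idf(t) for t, c in counts.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [(cosine(q, vector(d)), i) for i, d in enumerate(docs)]
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1])) if score > 0]


docs = ["XML query languages for the Web",
        "Telephony policy rules",
        "Querying Web documents with XML"]
print(rank("XML Web query", docs))
```

In the pipeline the abstract describes, a coarse ranking like this would only select candidates. Finer relevance filtering and structure matching are left to the extraction stage.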
Very Large Data Bases | 1999
Arnaud Sahuguet; Fabien Azavant
Archive | 2000
Lucian Popa; Val Tannen; Alin Deutsch; Arnaud Sahuguet
International Workshop on the Web and Databases | 2000
Arnaud Sahuguet