Amélie Marian | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Amélie Marian is active.

Explore More

Publication

Featured researches published by Amélie Marian.

international conference on data engineering | 2002

Detecting changes in XML documents

Gregory Cobena; Serge Abiteboul; Amélie Marian

We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of quality. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs in average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the optimal in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web.

very large data bases | 2003

Projecting XML documents

Amélie Marian; Jérôme Siméon

XQuery is not only useful to query XML in databases, but also to applications that must process XML documents as files or streams. These applications suffer from the limitations of current main-memory XQuery processors which break for rather small documents. In this paper we propose techniques, based on a notion of projection for XML, which can be used to drastically reduce memory requirements in XQuery processors. The main contribution of the paper is a static analysis technique that can identify at compile time which parts of the input document are needed to answer an arbitrary XQuery. We present a loading algorithm that takes the resulting information to build a projected document, which is smaller than the original document, and on which the query yields the same result. We implemented projection in the Galax XQuery processor. Our experiments show that projection reduces memory requirements by a factor of 20 on average, and is effective for a wide variety of queries. In addition, projection results in some speedup during query evaluation.

IEEE Transactions on Knowledge and Data Engineering | 2004

Optimizing top-k selection queries over multimedia repositories

Surajit Chaudhuri; Luis Gravano; Amélie Marian

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically, request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.

very large data bases | 2003

Implementing XQuery 1.0: the Galax experience

Mary F. Fernández; Jérôme Siméon; Byron Choi; Amélie Marian; Gargi Sur

Galax is a light-weight, portable, open-source implementation of XQuery 1.0. Started in December 2000 as a small prototype designed to test the XQuery static type system, Galax has now become a solid implementation, aiming at full conformance with the family of XQuery 1.0 specifications. Because of its completeness and open architecture, Galax also turns out to be a very convenient platform for researchers interested in experimenting with XQuery optimization. We demonstrate the Galax system as well as its most advanced features, including support for XPath 2.0, XML Schema and static type-checking. We also present some of our first experiments with optimization. Notably, we demonstrate query rewriting capabilities in the Galax compiler, and the ability to run queries on documents up to a Gigabyte without the need for preindexing. Although early versions of Galax have been shown in industrial conferences over the last two years, this is the first time it is demonstrated in the database community.

international conference on data engineering | 2005

Adaptive processing of top-k queries in XML

Amélie Marian; Sihem Amer-Yahia; Nick Koudas; Divesh Srivastava

The ability to compute top-k matches to XML queries is gaining importance due to the increasing number of large XML repositories. The efficiency of top-k query evaluation relies on using scores to prune irrelevant answers as early as possible in the evaluation process. In this context, evaluating the same query plan for all answers might be too rigid because, at any time in the evaluation, answers have gone through the same number and sequence of operations, which limits the speed at which scores grow. Therefore, adaptive query processing that permits different plans for different partial matches and maximizes the best scores is more appropriate. In this paper, we propose an architecture and adaptive algorithms for efficiently computing top-k matches to XML queries. Our techniques can be used to evaluate both exact and approximate matches where approximation is defined by relaxing XPath axes. In order to compute the scores of query answers, we extend the traditional tf*idf measure to account for document structure. We conduct extensive experiments on a variety of benchmark data and queries, and demonstrate the usefulness of the adaptive approach for computing top-k queries in XML.

Information Systems | 2013

Improving the quality of predictions using textual information in online user reviews

Gayatree Ganu; Yogesh Kakodkar; Amélie Marian

Online reviews are often accessed by users deciding to buy a product, see a movie, or go to a restaurant. However, most reviews are written in a free-text format, usually with very scant structured metadata information and are therefore difficult for computers to understand, analyze, and aggregate. Users then face the daunting task of accessing and reading a large quantity of reviews to discover potentially useful information. We identified topical and sentiment information from free-form text reviews, and use this knowledge to improve user experience in accessing reviews. Specifically, we focus on improving recommendation accuracy in a restaurant review scenario. We propose methods to derive a text-based rating from the body of the reviews. We then group similar users together using soft clustering techniques based on the topics and sentiments that appear in the reviews. Our results show that using textual information results in better review score predictions than those derived from the coarse numerical star ratings given by the users. In addition, we use our techniques to make fine-grained predictions of user sentiments towards the individual topics covered in reviews with good accuracy.

Information Systems | 2011

A framework for corroborating answers from multiple web sources

Minji Wu; Amélie Marian

Search engines are increasingly efficient at identifying the best sources for any given keyword query, and are often able to identify the answer within the sources. Unfortunately, many web sources are not trustworthy, because of erroneous, misleading, biased, or outdated information. In many cases, users are not satisfied with the results from any single source. In this paper, we propose a framework to aggregate query results from different sources in order to save users the hassle of individually checking query-related web sites to corroborate answers. To return the best answers to the users, we assign a score to each individual answer by taking into account the number, relevance and originality of the sources reporting the answer, as well as the prominence of the answer within the sources, and aggregate the scores of similar answers. We conducted extensive qualitative and quantitative experiments of our corroboration techniques on queries extracted from the TREC Question Answering track and from a log of real web search engine queries. Our results show that taking into account the quality of web pages and answers extracted from the pages in a corroborative way results in the identification of a correct answer for a majority of queries.

extending database technology | 2008

Multi-dimensional search for personal information management systems

Christopher Peery; Wei Wang; Amélie Marian; Thu D. Nguyen

With the explosion in the amount of semi-structured data users access and store in personal information management systems, there is a need for complex search tools to retrieve often very heterogeneous data in a simple and efficient way. Existing tools usually index text content, allowing for some IR-style ranking on the textual part of the query, but only consider structure (e.g., file directory) and metadata (e.g., date, file type) as filtering conditions. We propose a novel multi-dimensional approach to semi-structured data searches in personal information management systems by allowing users to provide fuzzy structure and metadata conditions in addition to keyword conditions. Our techniques provide a complex query interface that is more comprehensive than content-only searches as it considers three query dimensions (content, structure, metadata) in the search. We propose techniques to individually score each dimension, as well as a framework to integrate the three dimension scores into a meaningful unified score. Our work is integrated in Wayfinder, an existing fully-functioning file system. We perform a thorough experimental evaluation of our techniques to show the effect of approximating individual dimensions on the overall scores and ranks of files, as well as on query performance. Our experiments show that our scoring strategy adequately takes into account the approximation in each dimension to efficiently evaluate fuzzy multi-dimensional queries. In addition, fuzzy query conditions in non-content dimensions can significantly improve scoring (and thus ranking) accuracy.

international conference on data engineering | 2011

Social networking on top of the WebdamExchange system

Émilien Antoine; Alban Galland; Kristian Lyngbaek; Amélie Marian; Neoklis Polyzotis

The demonstration presents the WebdamExchange system, a distributed knowledge base management system with access rights, localization and provenance. This system is based on the exchange of logical statements that describe documents, collections, access rights, keys and localization information and updates of this data. We illustrate how the model can be used in a social-network context to help users keep control on their data on the web. In particular, we show how users within very different schemes of data-distribution (centralized, dht, unstructured P2P, etc.) can still transparently collaborate while keeping a good control over their own data.

conference on information and knowledge management | 2013

One size does not fit all: multi-granularity search of web forums

Gayatree Ganu; Amélie Marian

Users rely increasingly on online forums, blogs, and mailing lists to exchange information, practical tips, and stories. Although this type of social interaction has become central to our daily lives and decision-making processes, forums are surprisingly technologically poor: often there is no choice but to browse through massive numbers of posts while looking for specific information. A critical challenge then for forum search is to provide results that are as complete as possible and that do not miss some relevant information but that are not too broad. In this paper, we address the problem of presenting textual search results in a concise manner to answer user needs. Specifically, we propose a new search approach over free-form text in forums that allows for the search results to be returned at varying granularity levels. We implement a novel hierarchical representation and scoring technique for objects at multiple granularities, taking into account the inherent containment relationship provided by the hierarchy. We also present a score optimization algorithm that efficiently chooses the best k-sized result set while ensuring no overlap between the results. We evaluate the effectiveness of multi-granularity search by conducting extensive user studies and show that a mixed granularity set of results is more relevant to users than standard post-only approaches.

Explore More