Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Benjamin Nguyen is active.

Publication


Featured research published by Benjamin Nguyen.


International Conference on Management of Data | 2001

Monitoring XML data on the Web

Benjamin Nguyen; Serge Abiteboul; Gregory Cobena; Mihai Preda

We consider the monitoring of a flow of incoming documents. More precisely, we present the monitoring used in a very large warehouse built from XML documents found on the Web. The flow of documents consists of XML pages (which are warehoused) and HTML pages (which are not). Our contributions are the following: (i) a subscription language that specifies the monitoring of pages when fetched, the periodic evaluation of continuous queries, and the production of XML reports; (ii) a description of the architecture of the system we implemented, which makes it possible to monitor a flow of millions of pages per day with millions of subscriptions on a single PC, and scales up by using more machines; (iii) a new algorithm for processing alerts that can be used in a wider context. We support monitoring at the page level (e.g., the discovery of a new page within a certain semantic domain) as well as at the element level (e.g., the insertion of a new electronic product in a catalog). This work is part of the Xyleme system. Xyleme is developed on a cluster of PCs under Linux with CORBA communications. The part of the system described in this paper has been implemented, and we report on first experiments.
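The page-level monitoring described above can be pictured as matching a stream of fetched pages against a large set of standing subscriptions. The toy matcher below is only an illustrative sketch under assumed semantics (keyword subscriptions that fire when all their keywords appear on a page); the paper's actual subscription language and alert-processing algorithm are not detailed in the abstract.

```python
# Illustrative sketch only: a toy continuous-query matcher in the spirit of
# the page-level monitoring described in the abstract. The subscription
# semantics (all keywords must appear) are an assumption for this example.
from collections import defaultdict

class Monitor:
    """Matches a stream of fetched pages against keyword subscriptions."""

    def __init__(self):
        # Inverted index: keyword -> ids of subscriptions mentioning it.
        self.index = defaultdict(set)
        self.subscriptions = {}

    def subscribe(self, sub_id, keywords):
        """Register a subscription that fires when ALL keywords appear."""
        self.subscriptions[sub_id] = set(keywords)
        for kw in keywords:
            self.index[kw].add(sub_id)

    def process_page(self, url, words):
        """Return the ids of subscriptions satisfied by a fetched page."""
        words = set(words)
        # Candidates share at least one keyword with the page; the index
        # avoids scanning every subscription for every page.
        candidates = set().union(*(self.index.get(w, set()) for w in words))
        return sorted(s for s in candidates if self.subscriptions[s] <= words)

monitor = Monitor()
monitor.subscribe("s1", ["xml", "warehouse"])
monitor.subscribe("s2", ["dvd"])
alerts = monitor.process_page("http://example.org/p1",
                              ["xml", "warehouse", "query"])  # -> ["s1"]
```

The inverted index is what makes "millions of subscriptions on a single PC" plausible: each page only touches the subscriptions that share a keyword with it, not the whole subscription set.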


Very Large Data Bases | 2003

THESUS: Organizing Web document collections based on link semantics

Maria Halkidi; Benjamin Nguyen; Iraklis Varlamis; Michalis Vazirgiannis

The requirements for effective search and management of the WWW are stronger than ever. Currently, Web documents are classified based on their content, without taking into account the fact that these documents are connected to each other by links. We claim that a page's classification is enriched by the detection of the semantics of its incoming links. This would enable effective browsing and enhance the validity of search results in the WWW context. Another aspect that is under-addressed, and strictly related to the tasks of browsing and searching, is the similarity of documents at the semantic level. These observations lead us to adopt a hierarchy of concepts (an ontology) and a thesaurus to exploit links and provide a better characterization of Web documents. This enhanced document characterization makes operations such as clustering and labeling very interesting. To this end, we devised a system called THESUS. The system deals with an initial set of Web documents, extracts keywords from all pages' incoming links, and converts them to semantics by mapping them to a domain ontology. A clustering algorithm is then applied to discover groups of Web documents. The effectiveness of the clustering process is based on the use of a novel similarity measure between documents characterized by sets of terms. Web documents are organized into thematic subsets based on their semantics; the subsets are then labeled, thereby enabling easier management (browsing, searching, querying) of the Web. In this article, we detail the process behind this system and give an experimental analysis of its results.


Very Large Data Bases | 2010

Secure personal data servers: a vision paper

Tristan Allard; Nicolas Anciaux; Luc Bouganim; Yanli Guo; Lionel Le Folgoc; Benjamin Nguyen; Philippe Pucheral; Indrajit Ray; Indrakshi Ray; Shaoyi Yin

An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on Internet companies to store their data and make it reliable and highly available over the Internet. However, these benefits must be weighed against the privacy risks incurred by centralization. This paper suggests a radically different way of considering the management of personal data. It builds upon the emergence of new portable and secure devices combining the security of smart cards and the storage capacity of NAND Flash chips. By embedding a full-fledged Personal Data Server in such devices, user control over how sensitive data is shared with others (by whom, for how long, according to which rule, for which purpose) can be fully reestablished and convincingly enforced. To give substance to this vision, Personal Data Servers must be able to interoperate with external servers and must provide traditional database services such as durability, availability, query facilities, and transactions. This paper proposes an initial design for the Personal Data Server approach, identifies the main technical challenges associated with it, and sketches preliminary solutions. We expect that this paper will open exciting perspectives for future database research.


IEEE Transactions on Knowledge and Data Engineering | 2004

THESUS, a closer view on Web content management enhanced with link semantics

Iraklis Varlamis; Michalis Vazirgiannis; Maria Halkidi; Benjamin Nguyen

With the unstoppable growth of the World Wide Web and the great success of Web search engines such as Google and AltaVista, users now turn to the Web whenever they look for information. However, many users are neophytes when it comes to computer science, yet they are often specialists in a certain domain. These users would like to add more semantics to guide their search through World Wide Web material, whereas currently most search features are based on raw lexical content. We show how the incoming links of a page can be used efficiently to classify the page in a concise manner. This enhances the browsing and querying of Web pages. We focus on the tools needed to manage the links and their semantics. We further process these links using a hierarchy of concepts, akin to an ontology, and a thesaurus. This work is demonstrated by a prototype system, called THESUS, that organizes thematic Web documents into semantic clusters. Our contributions are the following: 1) a model and language to exploit link semantics information, 2) the THESUS prototype system, 3) its innovative aspects and algorithms, most notably the novel similarity measure between Web documents applied to different clustering schemes (DBSCAN and COBWEB), and 4) a thorough experimental evaluation proving the value of our approach.


Very Large Data Bases | 2008

WebContent: efficient P2P Warehousing of web data

Serge Abiteboul; Tristan Allard; Philippe Chatalic; Georges Gardarin; A. Ghitescu; François Goasdoué; Ioana Manolescu; Benjamin Nguyen; M. Ouazara; A. Somani; Nicolas Travers; Gabriel Vasile; Spyros Zoupanos

We present the WebContent platform for managing distributed repositories of XML and semantic Web data. The platform allows various data processing building blocks (crawling, translation, semantic annotation, full-text search, structured XML querying, and semantic querying), presented as Web services, to be integrated into a large-scale, efficient platform. Calls to the various services are combined inside ActiveXML [8] documents, which are XML documents embedding service calls. An ActiveXML optimizer is used to: (i) efficiently distribute computations among sites; (ii) perform XQuery-specific optimizations by leveraging an algebraic XQuery optimizer; and (iii) given an XML query, choose among several distributed indices the most appropriate one to answer the query.


Extending Database Technology | 2014

Privacy-Preserving Query Execution using a Decentralized Architecture and Tamper Resistant Hardware

Quoc-Cuong To; Benjamin Nguyen; Philippe Pucheral

Current applications, from complex sensor systems (e.g., quantified self) to online e-markets, acquire vast quantities of personal information which usually ends up on central servers. Decentralized architectures, devised to help individuals keep full control of their data, hinder global treatments and queries, impeding the development of services of great interest. This paper promotes the idea of pushing security to the edges of applications, through the use of secure hardware devices controlling the data at the place of its acquisition. We propose secure distributed querying protocols based on the use of a tangible physical element of trust, reestablishing the capacity to perform global computations without revealing any sensitive information to central servers. There are two main problems when trying to support SQL in this context: performing joins and performing aggregations. In this paper, we study the subset of SQL queries without joins and show how to secure their execution in the presence of honest-but-curious attackers. Cost models and experiments demonstrate that this approach can scale to nationwide infrastructures.
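To convey the general idea of a global computation that reveals nothing individual to the servers, here is a minimal sketch using additive secret sharing for a SUM aggregate. This is a standard technique chosen for illustration, not the paper's protocol: the paper's approach relies on tamper-resistant hardware at the edges, which is not modeled here.

```python
# Minimal illustration (NOT the paper's protocol): computing a global SUM
# aggregate so that no single untrusted server ever sees an individual
# value, via additive secret sharing over a prime modulus.
import secrets

P = 2**61 - 1  # prime modulus, large enough for the toy values below

def share(value, n_servers):
    """Split `value` into n random shares that sum to `value` mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def aggregate(all_shares):
    """Each server sums the shares it received; recombining the per-server
    sums yields the global total, while each share alone looks random."""
    n_servers = len(all_shares[0])
    per_server = [sum(device[i] for device in all_shares) % P
                  for i in range(n_servers)]
    return sum(per_server) % P

# Three personal devices, each holding a private value, two untrusted servers.
private_values = [12, 7, 30]
all_shares = [share(v, 2) for v in private_values]
total = aggregate(all_shares)  # 49, yet neither server saw 12, 7 or 30
```

Each server only ever holds uniformly random-looking shares, which is the sense in which a global computation can proceed "without revealing any sensitive information to central servers".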


Distributed and Parallel Databases | 2014

MET𝔸P: revisiting Privacy-Preserving Data Publishing using secure devices

Tristan Allard; Benjamin Nguyen; Philippe Pucheral

The goal of Privacy-Preserving Data Publishing (PPDP) is to generate a sanitized (i.e., harmless) view of sensitive personal data (e.g., a health survey), to be released to some agencies or simply to the public. However, traditional PPDP practices all make the assumption that the process is run on a trusted central server. In this article, we argue that this trust assumption on the central server is far too strong. We propose Met𝔸P, a generic, fully distributed protocol to execute various forms of PPDP algorithms on an asymmetric architecture composed of low-power secure devices and a powerful but untrusted infrastructure. We show that this protocol is both correct and secure against honest-but-curious or malicious adversaries. Finally, we provide an experimental validation showing that this protocol can support PPDP processes scaling up to nationwide surveys.


Very Large Data Bases | 2014

SQL/AA: executing SQL on an asymmetric architecture

Quoc-Cuong To; Benjamin Nguyen; Philippe Pucheral

Current applications, from complex sensor systems (e.g., quantified self) to online e-markets, acquire vast quantities of personal information which usually ends up on central servers. This information represents an unprecedented potential for user-customized applications and business (e.g., car insurance billing, carbon tax, traffic decongestion, resource optimization in smart grids, healthcare surveillance, participatory sensing). However, the PRISM affair has shown that public opinion is starting to wonder whether these new services are not bringing us closer to science-fiction dystopias. It has become clear that centralizing and processing all of one's data on a single server is a major problem with regard to privacy concerns. Conversely, decentralized architectures, devised to help individuals keep full control of their data, complicate global treatments and queries, often impeding the development of innovative services and applications.


International Database Engineering and Applications Symposium | 2001

Xyleme, a dynamic warehouse for XML data of the Web

Serge Abiteboul; Vincent Aguilera; S. Ailleret; Bernd Amann; F. Arambarri; Sophie Cluet; Gregory Cobena; G. Corona; Guy Ferran; Alban Galland; M. Hascoet; C.-C. Kanne; B. Koechlin; D. Le Niniven; Amélie Marian; Laurent Mignet; Guido Moerkotte; Benjamin Nguyen; Mihai Preda; Marie-Christine Rousset; M. Sebag; J.-P. Sirot; Pierangelo Veltri; Dan Vodislav; F. Watez; Till Westmann

The current development of the Web and the generalization of XML technology provide a major opportunity which can radically change the face of the Web. Xyleme intends to be a leader of this revolution by providing database services over the XML data of the Web. Originally, Xyleme was a research project functioning as an open, loosely coupled network of researchers. At the end of 2000, a prototype had been implemented, and a start-up company, also called Xyleme, is now turning it into a product. The authors summarize the main research efforts of the Xyleme team. They concern: a scalable architecture; the efficient storage of huge quantities of XML data (hundreds of millions of pages); XML query processing with full-text and structural indexing; data acquisition strategies to build the repository and keep it up to date; change control, with services such as query subscription; and semantic data integration, to free users from having to deal with many specific DTDs when expressing queries.


Conference on Privacy, Security and Trust | 2012

Limiting data collection in application forms: A real-case application of a founding privacy principle

Nicolas Anciaux; Benjamin Nguyen; Michalis Vazirgiannis

Application forms are often used by companies and administrations to collect personal data about applicants and tailor services to their specific situation. For example, tax rates, social care, or personal loans are usually calibrated based on a set of personal data collected through application forms. In the eyes of privacy laws and directives, the set of personal data collected to deliver a service must be restricted to the minimum necessary. This reduces the impact of data breaches, in the interest of both service providers and applicants. In this article, we study the problem of limiting data collection in such application forms, used to collect data and subsequently feed decision-making processes. In practice, the set of data collected is far too large, because application forms are filled in without any means of knowing which data will really impact the decision. To overcome this problem, we propose a reverse approach, where the minimal set of data items strictly required to fill in the application form can be computed on the user's side. We formalize the underlying NP-hard optimization problem, propose algorithms to compute a solution, and validate them with experiments. Our proposal leads to a significant reduction in the quantity of personal data filled in application forms while still reaching the same decision.
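The "reverse approach" above can be illustrated with a brute-force sketch: given a decision function and the applicant's full record, find a smallest subset of items whose values alone already fix the decision, so the remaining items never need to be disclosed. The decision rule, item names, and domains below are invented for illustration; the paper formalizes this as an NP-hard optimization problem and proposes dedicated algorithms rather than this exhaustive search.

```python
# Hypothetical sketch of computing a minimal disclosure set on the user's
# side. All item names, domains, and the eligibility rule are invented.
from itertools import combinations, product

DOMAINS = {"income": [20, 40, 60], "age": [25, 45, 65], "dependents": [0, 2]}

def decision(record):
    # Toy eligibility rule standing in for a real benefit-calibration rule.
    return record["income"] <= 40 and record["dependents"] >= 2

def minimal_disclosure(applicant):
    """Smallest set of items whose disclosed values alone fix the decision:
    every completion of the hidden items yields the same outcome."""
    items = list(DOMAINS)
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            hidden = [i for i in items if i not in subset]
            outcomes = set()
            # Try every possible value for the undisclosed items.
            for values in product(*(DOMAINS[i] for i in hidden)):
                record = dict(applicant)
                record.update(zip(hidden, values))
                outcomes.add(decision(record))
            if len(outcomes) == 1:  # decision independent of hidden items
                return set(subset)

applicant = {"income": 60, "age": 45, "dependents": 2}
needed = minimal_disclosure(applicant)  # {"income"}: 60 > 40 decides alone
```

Here disclosing only the income already determines the (negative) decision, so the other items can be withheld; the exponential enumeration is exactly why the paper treats this as a hard optimization problem needing better algorithms.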

Collaboration


Dive into Benjamin Nguyen's collaborations.

Top Co-Authors

Dario Colazzo

Paris Dauphine University

Ioana Manolescu

French Institute for Research in Computer Science and Automation

Florin Dragan

Centre national de la recherche scientifique

Dan Vodislav

Cergy-Pontoise University