Is this you? Create Your Porfile

Pierre-Edouard Portier

Institut national des sciences Appliquées de Lyon

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Pierre-Edouard Portier is active.

Explore More

Publication

Featured researches published by Pierre-Edouard Portier.

Information Processing and Management | 2012

Modeling, encoding and querying multi-structured documents

Pierre-Edouard Portier; Noureddine Chatti; Sylvie Calabretto; Elöd Egyed-Zsigmond; Jean-Marie Pinon

The issue of multi-structured documents became prominent with the emergence of the digital Humanities field of practices. Many distinct structures may be defined simultaneously on the same original content for matching different documentary tasks. For example, a document may have both a structure for the logical organization of content (logical structure), and a structure expressing a set of content formatting rules (physical structure). In this paper, we present MSDM, a generic model for multi-structured documents, in which several important features are established. We also address the problem of efficiently encoding multi-structured documents by introducing MultiX, a new XML formalism based on the MSDM model. Finally, we propose a library of Xquery functions for querying MultiX documents. We will illustrate all the contributions with a use case based on a fragment of an old manuscript.

Expert Systems With Applications | 2018

Sequence Classification for Credit-Card Fraud Detection

Johannes Jurgovsky; Michael Granitzer; Konstantin Ziegler; Sylvie Calabretto; Pierre-Edouard Portier; Liyun He-Guelton; Olivier Caelen

Abstract Due to the growing volume of electronic payments, the monetary strain of credit-card fraud is turning into a substantial challenge for financial institutions and service providers, thus forcing them to continuously improve their fraud detection systems. However, modern data-driven and learning-based methods, despite their popularity in other domains, only slowly find their way into business applications. In this paper, we phrase the fraud detection problem as a sequence classification task and employ Long Short-Term Memory (LSTM) networks to incorporate transaction sequences. We also integrate state-of-the-art feature aggregation strategies and report our results by means of traditional retrieval metrics. A comparison to a baseline random forest (RF) classifier showed that the LSTM improves detection accuracy on offline transactions where the card-holder is physically present at a merchant. Both the sequential and non-sequential learning approaches benefit strongly from manual feature aggregation strategies. A subsequent analysis of true positives revealed that both approaches tend to detect different frauds, which suggests a combination of the two. We conclude our study with a discussion on both practical and scientific challenges that remain unsolved.

workshops on enabling technologies: infrastracture for collaborative enterprises | 2017

Injecting Semantic Background Knowledge into Neural Networks using Graph Embeddings

Konstantin Ziegler; Olivier Caelen; Mathieu Garchery; Michael Granitzer; Liyun He-Guelton; Johannes Jurgovsky; Pierre-Edouard Portier; Stefan Zwicklbauer

The inferences of a machine learning algorithm are naturally limited by the available data. In many real-world applications, the provided internal data is domain-specific and we use external background knowledge to derive or add new features. Semantic networks, like linked open data, provide a largely unused treasure trove of background knowledge. This drives a recent surge of interest in unsupervised methods to automatically extract such semantic background knowledge and inject it into machine learning algorithms. In this work, we describe the general process of extracting knowledge from semantic networks through vector space embeddings. The locations in the vector space then reflect relations in the original semantic network. We perform this extraction for geographic background knowledge and inject it into a neural network for the complicated real-world task of credit-card fraud detection. This improves the performance by 11.2%.

european semantic web conference | 2015

Ranking Entities in the Age of Two Webs, an Application to Semantic Snippets

Mazen Alsarem; Pierre-Edouard Portier; Sylvie Calabretto; Harald Kosch

The advances of the Linked Open Data LOD initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs e.g., DBpedia connecting many other datasets. They also made possible new Web services for entity detection inside plain text e.g.,i¾?DBpedia Spotlight, thus allowing for new applications that can benefit from a combination of the Web of documents and the Web of data. To ease the emergence of these new applications, we propose a query-biased algorithm LDRANK for the ranking of web of data resources with associated textual data. Our algorithm combines link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset for the evaluation of query-biased ranking of Web of data resources detected in Web pages. We show that, on this dataset, LDRANK outperforms the state of the art. Finally, we use this algorithm for the construction of semantic snippets of which we evaluate the usefulness with a crowdsourcing-based approach.

document engineering | 2009

Creation and maintenance of multi-structured documents

Pierre-Edouard Portier; Sylvie Calabretto

In this article, we introduce a new problem: the construction of multi-structured documents. We first offer an overview of existing solutions to the representation of such documents. We then notice that none of them consider the problem of their construction. In this context, we use our experience with philosophers who are building a digital edition of the work of Jean-Toussaint Desanti, in order to present a methodology for the construction of multi-structured documents. This methodology is based on the MSDM model in order to represent such documents. Moreover each step of the methodology has been implemented in the Haskell functional programming language.

document engineering | 2016

Schema-aware Extended Annotation Graphs

Vincent Barrellon; Pierre-Edouard Portier; Sylvie Calabretto; Olivier Ferret

Multistructured (M-S) documents were introduced as an answer to the need of ever more expressive data models for scholarly annotation, as experienced in the frame of Digital Humanities. Many proposals go beyond XML, that is the gold standard for annotation, and allow the expression of multilevel, concurrent annotation. However, most of them lack support for algorithmic tasks like validation and querying, despite those being central in most of their application contexts. In this paper, we focus on two aspects of annotation: data model expressiveness and validation. We introduce extended Annotation Graphs (eAG), a highly expressive graph-based data model, fit for the enrichment of multimedia resources. Regarding validation of M-S documents, we identify algorithmic complexity as a limiting factor. We advocate that this limitation may be bypassed provided validation can be checked by construction, that is by constraining the shape of data during its very manufacture. So far as we know, no existing validation mechanism for graph-structured data meets this goal. We define here such a mechanism, based on the simulation relation, somehow following a track initiated in Dataguides. We prove that thanks to this mechanism, the validity of M-S data regarding a given schema can be guaranteed without any algorithmic check.

european conference on research and advanced technology for digital libraries | 2010

DINAH, a philological platform for the construction of multi-structured documents

Pierre-Edouard Portier; Sylvie Calabretto

We consider how the construction of multi-structured documents implies the definition of structuration vocabularies. In a multiusers context, the growth of these vocabularies has to be controlled. Therefore, we propose using the trace of users activity to limit this growth and document the vocabularies. A user will, for example, be able to follow and annotate the track of a vocabulary concept: from its creation to the last time it was used. From a broader point of view, this work is grounded on our Web based philological platform, DINAH, and is mainly motivated by our collaboration with a group of philosophers studying the handwritten manuscripts of Jean-Toussaint Desanti.

arXiv: Information Retrieval | 2015

Ordonnancement d'entités pour la rencontre du web des documents et du web des données

Mazen Alsarem; Pierre-Edouard Portier; Sylvie Calabretto; Harald Kosch

The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that will benefit from a combination of the web of documents and the web of data. To ease the emergence of these new use-cases, we propose a query-biased algorithm for the ranking of entities detected inside a web page. Our algorithm combine link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset on which we compare our algorithm to the state of the art. Finally, we use this algorithm for the construction of semantic snippets for which we evaluate the usability and the usefulness with a crowdsourcing-based approach.

european semantic web conference | 2014

Making Use of Linked Data for Generating Enhanced Snippets

Mazen Alsarem; Pierre-Edouard Portier; Sylvie Calabretto; Harald Kosch

We enhance an existing search engine’s snippet (i.e. excerpt from a web page determined at query-time in order to efficiently express how the web page may be relevant to the query) with linked data (LD) in order to highlight non trivial relationships between the information need of the user and LD resources related to the result page. To do this, we introduce a multi-step unsupervised co-clustering algorithm so as to use the textual data associated with the resources for discovering additional relationships. Next, we use a 3-way tensor to mix these new relationships with the ones available from the LD graph. Then, we apply a first PARAFAC tensor decomposition [5] in order to (i) select the most promising nodes for a 1-hop extension, and (ii) build the enhanced snippet. A video demonstration is available online (http://liris.cnrs.fr/drim/projects/ensen/).

Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009 | 2009