Publication


Featured research published by Maud Ehrmann.


Sprachwissenschaft | 2016

JRC-Names: Multilingual entity name variants and titles as Linked Data

Maud Ehrmann; Guillaume Jacquet; Ralf Steinberger

Since 2004, the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants not only include standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes or lesser used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamin/Biniamin/Netanyahu/Netanjahu/Neanyahou/Netahny/). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using lemon, the lexicon model for ontologies. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as date ranges when the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration, e.g. cross-lingual mapping, and web-based content processing, e.g. entity linking. JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.
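To make the idea of an entity name variant resource more concrete, here is a minimal Python sketch of a lookup from spelling variants to an entity. It is only an illustration: the `Entity` class, the `build_variant_index` and `lookup` helpers, and the identifier `example:entity-1` are all hypothetical and do not reflect the actual JRC-Names data model, URIs, or lemon vocabulary. The spellings are the ones quoted in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical identifier; the real dataset defines its own URIs.
    entity_id: str
    variants: list = field(default_factory=list)


def build_variant_index(entities):
    """Map each case-folded variant spelling to the ids of entities that use it."""
    index = {}
    for entity in entities:
        for form in entity.variants:
            index.setdefault(form.casefold(), set()).add(entity.entity_id)
    return index


def lookup(index, surface_form):
    """Return the set of entity ids recorded for a surface form (empty if unknown)."""
    return index.get(surface_form.casefold(), set())


# Spellings taken from the abstract; the entity id is a placeholder.
entity = Entity("example:entity-1", ["Netanyahu", "Netanjahu", "Neanyahou", "Netahny"])
index = build_variant_index([entity])
print(lookup(index, "NETANJAHU"))
```

A set of ids per spelling is used because one surface form may, in principle, belong to more than one entity; entity linking would then need context to choose among the candidates.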


text speech and dialogue | 2013

Multilingual Media Monitoring and Text Analysis – Challenges for Highly Inflected Languages

Ralf Steinberger; Maud Ehrmann; Júlia Pajzs; Mohamed Ebrahim; Josef Steinberger; Marco Turchi

We present the highly multilingual news analysis system Europe Media Monitor (EMM), which gathers an average of 175,000 online news articles per day in tens of languages, categorises the news items and extracts named entities and various other information from them. We also give an overview of EMM’s text mining tool set, focusing on the issue of how the software deals with highly inflected languages such as those of the Slavic and Finno-Ugric language families. The questions we ask are: How to adapt extraction patterns to such languages? How to de-inflect extracted named entities? And: Will document categorisation benefit from lemmatising the texts?


Polibits | 2011

Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Marco Turchi; Maud Ehrmann

The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improved automatic scores (BLEU and Meteor) and fewer out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.

[...] are highly affected by the presence of OOV words. The other way around, the number of source phrases covered during the translation is higher, but target sentences contain more incorrectly translated words. Adding more data is the most obvious solution, but this has well-known drawbacks: it heavily increases the dimension of the tables, which reduces the translation speed, and parallel data are not always available for all the language pairs. In the case of low-quality parallel data, it can even be harmful because more data imply a larger number of unreliable or incorrect associations built during the training phase. In this paper, we address the problem of expanding the knowledge of an SMT system without adding parallel data, but extending the knowledge produced during the training phase. The main idea consists of inserting artificial entries in the phrase and reordering models using external morphological resources; the goal is to provide more translation options to the system during the construction of the target sentence.
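The idea of inserting artificial phrase-table entries can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the authors' method: it varies only the source side of an association, reuses the target phrase unchanged, and scales the score by a fixed `penalty` instead of the string similarity score based on morphosyntactic information described in the abstract. The function `expand_phrase_table`, the dictionaries `lemma_of` and `forms_by_lemma`, and the toy entries are all hypothetical.

```python
def expand_phrase_table(table, lemma_of, forms_by_lemma, penalty=0.5):
    """Add artificial entries for morphological variants of source-side words.

    table:          dict mapping (source_phrase, target_phrase) -> score
    lemma_of:       dict mapping a word form -> its lemma (assumed resource)
    forms_by_lemma: dict mapping a lemma -> list of its word forms
    penalty:        factor applied to the score of every artificial entry
    """
    expanded = dict(table)  # keep all original associations untouched
    for (source, target), score in table.items():
        words = source.split()
        for i, word in enumerate(words):
            lemma = lemma_of.get(word)
            if lemma is None:
                continue  # word not covered by the morphological resource
            for variant in forms_by_lemma.get(lemma, ()):
                if variant == word:
                    continue
                new_source = " ".join(words[:i] + [variant] + words[i + 1:])
                key = (new_source, target)
                # Never overwrite an association learned from parallel data.
                if key not in expanded:
                    expanded[key] = score * penalty
    return expanded


# Toy example with made-up entries, for illustration only.
table = {("the house", "la maison"): 0.8}
lemma_of = {"house": "house", "houses": "house"}
forms_by_lemma = {"house": ["house", "houses"]}
expanded = expand_phrase_table(table, lemma_of, forms_by_lemma)
print(sorted(expanded))
```

Reusing the target unchanged can produce number or gender mismatches, which is one reason a real system would need a score that weighs how close the variant is to the original phrase.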


recent advances in natural language processing | 2011

Building a Multilingual Named Entity-Annotated Corpus Using Annotation Projection

Maud Ehrmann; Marco Turchi; Ralf Steinberger


international conference on weblogs and social media | 2012

Enhancing Event Descriptions through Twitter Mining

Hristo Tanev; Maud Ehrmann; Jakub Piskorski; Vanni Zavarella


meeting of the association for computational linguistics | 2013

On Named Entity Recognition in Targeted Twitter Streams in Polish

Jakub Piskorski; Maud Ehrmann


recent advances in natural language processing | 2013

Acronym recognition and processing in 22 languages

Maud Ehrmann; Leonida Della Rocca; Ralf Steinberger; Hristo Tanev


recent advances in natural language processing | 2011

Highly Multilingual Coreference Resolution Exploiting a Mature Entity Repository

Josef Steinberger; Jenya Belyaeva; Jonathan Crawley; Leonida Della-Rocca; Mohamed Ebrahim; Maud Ehrmann; Mijail A. Kabadjov; Ralf Steinberger; Erik Van-der-Goot


Archive | 2010

Building Multilingual Named Entity Annotated Corpora Exploiting Parallel Corpora

Maud Ehrmann; Marco Turchi


language resources and evaluation | 2014

Clustering of Multi-Word Named Entity variants: Multilingual Evaluation

Guillaume Jacquet; Maud Ehrmann; Ralf Steinberger

Collaboration


Dive into Maud Ehrmann's collaborations.

Top Co-Authors


Marco Turchi

Fondazione Bruno Kessler


Josef Steinberger

University of West Bohemia


Júlia Pajzs

Hungarian Academy of Sciences


Eszter Simon

Budapest University of Technology and Economics


Tamás Váradi

Hungarian Academy of Sciences