Publication


Featured research published by Serge Sharoff.


Language Resources and Evaluation | 2014

Corpus-based vocabulary lists for language learners for nine languages

Adam Kilgarriff; Frieda Charalabopoulou; Maria Gavrilidou; Janne Bondi Johannessen; Saussan Khalil; Sofie Johansson Kokkinakis; Robert Lew; Serge Sharoff; Ravikiran Vadlapudi; Elena Volodina

We present the KELLY project and its work on developing monolingual and bilingual word lists for language learning, using corpus methods, for nine languages and thirty-six language pairs. We describe the method and discuss the many challenges encountered. We have loaded the data into an online database to make it accessible for anyone to explore and we present our own first explorations of it. The focus of the paper is thus twofold, covering pedagogical and methodological aspects of the lists’ construction, and linguistic aspects of the by-product of the project, the KELLY database.


Meeting of the Association for Computational Linguistics | 2006

Using Comparable Corpora to Solve Problems Difficult for Human Translators

Serge Sharoff; Bogdan Babych; Anthony Hartley

In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.


Archive | 2014

Building and Using Comparable Corpora

Serge Sharoff; Reinhard Rapp; Pierre Zweigenbaum; Pascale Fung

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.


Archive | 2013

Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora

Serge Sharoff; Reinhard Rapp; Pierre Zweigenbaum

The beginning of the 1990s marked a radical turn in various NLP applications towards using large collections of texts.


International Conference on Computational Linguistics | 2000

Multilinguality in a text generation system for three Slavic languages

Geert-Jan M. Kruijff; Elke Teich; John A. Bateman; Ivana Kruijff-Korbayová; Hana Skoumalová; Serge Sharoff; Lena Sokolova; Tony Hartley; Kamenka Staykova; Jiří Hana

This paper describes a multilingual text generation system in the domain of CAD/CAM software instructions for Bulgarian, Czech and Russian. Starting from a language-independent semantic representation, the system drafts natural, continuous text as typically found in software manuals. The core modules for strategic and tactical generation are implemented using the KPML platform for linguistic resource development and generation. Prominent characteristics of the approach implemented are a treatment of multilinguality that makes maximal use of the commonalities between languages while also accounting for their differences and a common representational strategy for both text planning and sentence generation.


Language Resources and Evaluation | 2009

‘Irrefragable answers’ using comparable corpora to retrieve translation equivalents

Serge Sharoff; Bogdan Babych; Anthony Hartley

In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.


Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra) | 2014

Extracting Multiword Translations from Aligned Comparable Documents

Reinhard Rapp; Serge Sharoff

Most previous attempts to identify translations of multiword expressions using comparable corpora relied on dictionaries of single words. The translation of a multiword was then constructed from the translations of its components. In contrast, in this work we try to determine the translation of a multiword unit by analyzing its contextual behaviour in aligned comparable documents, thereby not presupposing any given dictionary. Whereas with this method translation results for single words are rather good, the results for multiword units are considerably worse. This is an indication that the type of multiword expressions considered here is too infrequent to provide a sufficient amount of contextual information. Thus indirectly it is confirmed that it should make sense to look at the contextual behaviour of the components of a multiword expression individually, and to combine the results.


Proceedings of the 1st International Workshop on Collaborative Annotations in Shared Environment | 2013

SentiML: functional annotation for multilingual sentiment analysis

Marilena Di Bari; Serge Sharoff; Martin Thomas

Sentiment Analysis is the task of automatically identifying whether a text or a single sentence is intended to carry a positive or negative connotation. The commonly used Bag-of-Words approach that relies on counting positive and negative words, whose connotation is indicated by specially crafted sentiment dictionaries, is not ideal because it does not take into account the relations between words and how the connotation of single words changes according to the context. This paper proposes a way of identifying and analysing the targets of the opinions and their modifiers, along with their linkage (appraisal group) through an annotation schema called SentiML. This schema has been developed in order to facilitate the identification of these elements and the annotation of their sentiment, along with advanced linguistic features such as their appraisal type according to the Appraisal Framework. The schema is XML-based and has also been designed to be language-independent. Preliminary results show that the schema allows more coverage than a sentiment dictionary, while achieving reasonably fast and reliable annotation in spite of its fine granularity.


Conference of the European Chapter of the Association for Computational Linguistics | 2006

ASSIST: automated semantic assistance for translators

Serge Sharoff; Bogdan Babych; Paul Rayson; Olga Mudraya; Scott Piao

The problem we address in this paper is that of providing contextual examples of translation equivalents for words from the general lexicon using comparable corpora and semantic annotation that is uniform for the source and target languages. For a sentence, phrase or a query expression in the source language the tool detects the semantic type of the situation in question and gives examples of similar contexts from the target language corpus.


Language Resources and Evaluation | 2016

Crowdsourcing for web genre annotation

Noushin Rezapour Asheghi; Serge Sharoff; Katja Markert

Recently, genre collection and automatic genre identification for the web have attracted much attention. However, currently there is no genre-annotated corpus of web pages where inter-annotator reliability has been established, i.e. the corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, leading to concerns with regard to scalability of these annotation efforts and transferability of the schemes to annotators outside these small expert groups. In this paper, we tackle these problems by using crowd-sourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus which is demonstrably reliably annotated for genre and which can be easily and cost-effectively expanded using naive annotators. We also show that the corpus is source and topic diverse.

Collaboration


Dive into Serge Sharoff's collaborations.

Top Co-Authors

Pierre Zweigenbaum

Institut national des langues et civilisations orientales
