Publication


Featured research published by Sadaf Abdul-Rauf.


Meeting of the Association for Computational Linguistics | 2009

On the Use of Comparable Corpora to Improve SMT performance

Sadaf Abdul-Rauf; Holger Schwenk

We present a simple and effective method for extracting parallel sentences from comparable corpora. We employ a statistical machine translation (SMT) system built from small amounts of parallel text to translate the source side of the non-parallel corpus. The target-side texts are used, along with other corpora, in the language model of this SMT system. We then use information retrieval techniques and simple filters to create French/English parallel data from comparable news corpora. We evaluate the quality of the extracted data by showing that it significantly improves the performance of an SMT system.


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Empirical use of information retrieval to build synthetic data for SMT domain adaptation

Sadaf Abdul-Rauf; Holger Schwenk; Patrik Lambert; Mohammad Nawaz

In this paper, we present information retrieval as a powerful tool for addressing a pressing problem in statistical machine translation: improving translation quality when not enough parallel corpora are available. We devise a framework that uses information retrieval to create a synthetic corpus from easily available monolingual corpora. We propose an improved unsupervised training approach with a data selection mechanism that selects only the most appropriate sentences, thus reducing the amount of data in the additional bitext that is less related to the domain. We also introduce a new method to choose sentences based on their relative similarity to, or difference from, the query sentence. Using the synthetic corpus created by our method, we are able to improve state-of-the-art statistical machine translation systems.


Computer Speech & Language | 2017

Parallel fragments

Sadaf Abdul-Rauf; Holger Schwenk; Mohammad Nawaz

Phrase fragments have proved to be a valuable resource for increasing translation and natural language generation performance. A novel approach to finding parallel fragments in comparable corpora is presented, which is simple and efficient in processing. The difference in translation improvement for fragments extracted from related versus non-related corpora is presented. A comparison of the impact of parallel fragments vs. sentences is reported, highlighting the significance of parallel segments. The proposed approach is compared theoretically with an earlier approach on all phases of the fragment extraction pipeline.

A lack of parallel corpora has diverted the direction of research towards exploring other resources to fill the gap. Comparable corpora have proved to be a valuable resource in this regard. Interestingly, besides the parallel sentences extracted from comparable corpora, parallel phrase fragments have also proved to be beneficial for statistical machine translation. We present a novel approach based on an efficient framework for parallel fragment extraction from comparable corpora. Using the fragments as an additional corpus for translation, we obtain an improvement of 0.88 and 0.89 BLEU points on test data for Arabic-English and French-English systems respectively. We have also conducted a detailed analysis of the impact of fragments extracted from related vs. non-related corpora. A comparison of the impact of parallel fragments vs. parallel sentences is also presented, highlighting the significance of parallel segments for statistical machine translation. The article concludes with a crude comparative analysis of our approach with an existing fragment extraction technique at various stages of the fragment extraction pipeline.


Workshop on Statistical Machine Translation | 2011

Investigations on Translation Model Adaptation Using Monolingual Data

Patrik Lambert; Holger Schwenk; Christophe Servan; Sadaf Abdul-Rauf


Workshop on Statistical Machine Translation | 2011

LIUM’s SMT Machine Translation Systems for WMT 2011

Holger Schwenk; Patrik Lambert; Loïc Barrault; Christophe Servan; Sadaf Abdul-Rauf; Haithem Afli; Kashif Shah


Abdul-Rauf, Sadaf; Fishel, Mark; Lambert, Patrik; Noubours, Sandra; Sennrich, Rico (2012). Extrinsic evaluation of sentence alignment systems. In: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles, Istanbul, 27 May 2012, pp. 6-10. | 2012

Extrinsic evaluation of sentence alignment systems

Sadaf Abdul-Rauf; Mark Fishel; Patrik Lambert; Sandra Noubours; Rico Sennrich


Meeting of the Association for Computational Linguistics | 2011

Exploiting Comparable Corpora with TER and TERp.

Sadaf Abdul-Rauf; Holger Schwenk


Workshop on Statistical Machine Translation | 2010

LIUM SMT Machine Translation System for WMT 2010

Patrik Lambert; Sadaf Abdul-Rauf; Holger Schwenk


arXiv: Computers and Society | 2017

Mr. Doc: A Doctor Appointment Application System.

Shafaq Malik; Nargis Bibi; Sehrish Khan; Razia Sultana; Sadaf Abdul-Rauf


Workshop on Creating Cross-language Resources for Disconnected Languages and Styles | 2012

Workshop on Creating Cross-language Resources for Disconnected Languages and Styles

Sadaf Abdul-Rauf; Mark Fishel; Patrik Lambert; Sandra Noubours; Rico Sennrich

Collaboration


Dive into Sadaf Abdul-Rauf's collaborations.

Top Co-Authors

Holger Schwenk

Centre national de la recherche scientifique

Patrik Lambert

Polytechnic University of Catalonia

Mohammad Nawaz

Balochistan University of Information Technology

Kashif Shah

University of Sheffield
