Sadaf Abdul-Rauf
University of Maine
Publication
Featured research published by Sadaf Abdul-Rauf.
Meeting of the Association for Computational Linguistics | 2009
Sadaf Abdul-Rauf; Holger Schwenk
We present a simple and effective method for extracting parallel sentences from comparable corpora. We employ a statistical machine translation (SMT) system built from small amounts of parallel text to translate the source side of the non-parallel corpus. The target-side texts are used, along with other corpora, in the language model of this SMT system. We then use information retrieval techniques and simple filters to create French/English parallel data from comparable news corpora. We evaluate the quality of the extracted data by showing that it significantly improves the performance of an SMT system.
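The pipeline in this abstract (translate the source side, retrieve candidate target sentences, then filter) can be illustrated with a minimal sketch. The abstract does not name the retrieval model or the filters, so everything specific here is an assumption made for illustration: the TF-IDF cosine similarity, the length-ratio filter, the thresholds, and the function names `tfidf_vectors`, `cosine`, and `extract_pairs`. The SMT step is also assumed to have run already, so the translations are passed in as plain strings.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized sentences."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; this weighting is an illustrative choice, not the paper's.
    idf = {tok: math.log((1 + n) / (1 + cnt)) + 1.0 for tok, cnt in df.items()}
    vecs = [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_pairs(source_sents, translated_sents, target_sents,
                  min_sim=0.5, max_len_ratio=1.5):
    """Pair each source sentence with its best-matching target sentence.

    translated_sents[i] is assumed to be the SMT output for source_sents[i].
    A pair is kept only if it passes both (hypothetical) filters:
    similarity >= min_sim and token-length ratio <= max_len_ratio.
    """
    target_tokens = [s.lower().split() for s in target_sents]
    target_vecs, idf = tfidf_vectors(target_tokens)
    pairs = []
    for src, hyp in zip(source_sents, translated_sents):
        query_tokens = hyp.lower().split()
        # Words unseen in the target collection get zero weight.
        query_vec = {tok: tf * idf.get(tok, 0.0)
                     for tok, tf in Counter(query_tokens).items()}
        # Retrieval step: the translation acts as a query over the target side.
        best_idx, best_sim = -1, 0.0
        for i, vec in enumerate(target_vecs):
            sim = cosine(query_vec, vec)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx < 0:
            continue
        # Filter step: reject weak matches and badly mismatched lengths.
        lengths = (len(query_tokens), len(target_tokens[best_idx]))
        ratio = max(lengths) / max(1, min(lengths))
        if best_sim >= min_sim and ratio <= max_len_ratio:
            pairs.append((src, target_sents[best_idx], best_sim))
    return pairs
```

Calling `extract_pairs` with the source sentences, their machine translations, and the target-side sentences returns a list of `(source, target, similarity)` tuples that could be added to the training bitext. A real system would replace the linear scan with an inverted index and tune the thresholds on held-out data.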
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Sadaf Abdul-Rauf; Holger Schwenk; Patrik Lambert; Mohammad Nawaz
In this paper, we present information retrieval as a powerful tool for addressing a pressing problem in statistical machine translation: improving translation quality when not enough parallel corpora are available. We devise a framework that uses information retrieval to create a synthetic corpus from easily available monolingual corpora. We propose an improved unsupervised training approach with a data selection mechanism that selects only the most appropriate sentences, thus reducing the amount of data in the additional bitext that is less related to the domain. We also introduce a new method to choose sentences based on their relative similarity to, or difference from, the query sentence. Using the synthetic corpus created by our method, we are able to improve state-of-the-art statistical machine translation systems.
Computer Speech & Language | 2017
Sadaf Abdul-Rauf; Holger Schwenk; Mohammad Nawaz
Highlights: Phrase fragments have proved to be a valuable resource for increasing translation and natural language generation performance. A novel approach to finding parallel fragments in comparable corpora is presented, which is simple and efficient in processing. The difference in translation improvement for fragments extracted from related versus non-related corpora is presented. A comparison of the impact of parallel fragments vs. sentences is reported, highlighting the significance of parallel segments. The proposed approach is compared theoretically with an earlier approach on all phases of the fragment extraction pipeline.
Abstract: The lack of parallel corpora has steered research towards other resources to fill the gap. Comparable corpora have proved to be a valuable resource in this regard. Interestingly, besides the parallel sentences extracted from comparable corpora, parallel phrase fragments have also proved to be beneficial for statistical machine translation. We present a novel approach based on an efficient framework for parallel fragment extraction from comparable corpora. Using the fragments as an additional corpus for translation, we obtain improvements of 0.88 and 0.89 BLEU points on test data for the Arabic-English and French-English systems, respectively. We have also conducted a detailed analysis of the impact of fragments extracted from related vs. non-related corpora. A comparison of the impact of parallel fragments vs. parallel sentences is also presented, highlighting the significance of parallel segments for statistical machine translation. The article concludes with a crude comparative analysis of our approach against an existing fragment extraction technique at various stages of the fragment extraction pipeline.
Workshop on Statistical Machine Translation | 2011
Patrik Lambert; Holger Schwenk; Christophe Servan; Sadaf Abdul-Rauf
Workshop on Statistical Machine Translation | 2011
Holger Schwenk; Patrik Lambert; Loïc Barrault; Christophe Servan; Sadaf Abdul-Rauf; Haithem Afli; Kashif Shah
Abdul-Rauf, Sadaf; Fishel, Mark; Lambert, Patrik; Noubours, Sandra; Sennrich, Rico (2012). Extrinsic evaluation of sentence alignment systems. In: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles, Istanbul, 27 May 2012, 6-10. | 2012
Sadaf Abdul-Rauf; Mark Fishel; Patrik Lambert; Sandra Noubours; Rico Sennrich
Meeting of the Association for Computational Linguistics | 2011
Sadaf Abdul-Rauf; Holger Schwenk
Workshop on Statistical Machine Translation | 2010
Patrik Lambert; Sadaf Abdul-Rauf; Holger Schwenk
arXiv: Computers and Society | 2017
Shafaq Malik; Nargis Bibi; Sehrish Khan; Razia Sultana; Sadaf Abdul-Rauf