Publications

Featured research published by Rejwanul Haque.


Workshop on Statistical Machine Translation | 2008

MaTrEx: The DCU MT System for WMT 2008

Sergio Penkale; Rejwanul Haque; Sandipan Dandapat; Pratyush Banerjee; Ankit Kumar Srivastava; Jinhua Du; Pavel Pecina; Sudip Kumar Naskar; Mikel L. Forcada; Andy Way

In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data-driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish-English and French-English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech-English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.


Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) | 2009

English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

Rejwanul Haque; Sandipan Dandapat; Ankit Kumar Srivastava; Sudip Kumar Naskar; Andy Way

This paper presents English-Hindi transliteration for the NEWS 2009 Machine Transliteration Shared Task. We add source context modelling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments at both the character level and the transliteration unit (TU) level. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.


Cross-Language Evaluation Forum | 2008

Bengali, Hindi and Telugu to English Ad-Hoc Bilingual Task at CLEF 2007

Sivaji Bandyopadhyay; Tapabrata Mondal; Sudip Kumar Naskar; Asif Ekbal; Rejwanul Haque; Srinivasa Rao Godhavarthy

This paper presents the experiments carried out at Jadavpur University as part of our participation in the CLEF 2007 ad-hoc bilingual task. This is our first participation in the CLEF evaluation task. We used Bengali, Hindi and Telugu as query languages for retrieval from an English document collection. We discuss our Bengali, Hindi and Telugu to English CLIR system for the ad-hoc bilingual task, the English IR system for the ad-hoc monolingual task, and the associated experiments at CLEF. Query construction was manual for the Telugu-English ad-hoc bilingual task and automatic for all other tasks.


International Conference on Asian Language Processing | 2010

Sentence Similarity-Based Source Context Modelling in PBSMT

Rejwanul Haque; Sudip Kumar Naskar; Andy Way; Marta Ruiz Costa-Jussà; Rafael E. Banchs

Target phrase selection is a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, and it plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source language context into the PBSMT model to improve target phrase selection. Among the various lexical and syntactic features, lexical syntactic descriptions in the form of super tags have proven effective, because they preserve long-range word-to-word dependencies in a sentence. These rich contextual features can disambiguate a source phrase on the basis of that phrase's local syntactic behaviour. Beyond local context, global contextual information could provide important additional evidence for this phrase-sense disambiguation. Examples include the grammatical structure of a sentence, sentence length and n-gram word sequences. In this work, we explore various sentence similarity features that measure the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and we integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with super tag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when a sentence-similarity feature is added together with a super tag-based feature.
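
The abstract does not specify which sentence similarity measures were used. As a rough illustration only, the Python sketch below compares an input sentence with each source-side training sentence. It uses two hypothetical features suggested by the text: clipped n-gram overlap and a length ratio. The function names and the averaging over n = 1..max_n are assumptions, not the paper's definitions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate, reference, max_n=2):
    """Mean clipped n-gram precision of `candidate` against `reference`, n = 1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:  # candidate shorter than n: nothing to compare at this order
            continue
        # Clip each n-gram count by its count in the reference (Counter returns 0 if absent).
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0


def length_ratio(a, b):
    """Shorter length divided by longer length (1.0 means equal length)."""
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))


def most_similar(input_tokens, training_sources):
    """Index of the training source sentence with the highest n-gram overlap."""
    return max(
        range(len(training_sources)),
        key=lambda i: ngram_overlap(input_tokens, training_sources[i]),
    )


if __name__ == "__main__":
    train = [
        "the committee approved the budget".split(),
        "the weather was cold today".split(),
    ]
    query = "the committee rejected the budget".split()
    best = most_similar(query, train)
    print(best, ngram_overlap(query, train[best]), length_ratio(query, train[best]))
```

In a PBSMT setting, scores like these would have to be attached to phrase pairs as extra log-linear features. How that mapping is done is not described in the abstract.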


Workshop on Statistical Machine Translation | 2014

DCU-Lingo24 Participation in WMT 2014 Hindi-English Translation task

Xiaofeng Wu; Rejwanul Haque; Tsuyoshi Okita; Piyush Arora; Andy Way; Qun Liu

This paper describes the DCU-Lingo24 submission to WMT 2014 for the Hindi-English translation task. We employ a variety of methods in our system, including Context-Informed PB-SMT, OOV Word Conversion (OWC), Multi-Alignment Combination (MAC), Operation Sequence Model (OSM), Stemming Align and Normal Phrase Extraction (SANPE), and Language Model Interpolation (LMI). We also describe various preprocessing steps we tried for Hindi in this task.


Proceedings of the 4th International Workshop on Computational Terminology (Computerm) | 2014

Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation

Rejwanul Haque; Sergio Penkale; Andy Way

Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminology from the extracted monolingual term lists. We manually evaluate our novel terminology extraction model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains. Furthermore, we compare the performance of our monolingual terminology extraction model with a number of state-of-the-art terminology extraction models on the English-to-Hindi datasets.
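
Log-likelihood comparison can be sketched in Python as follows. This sketch assumes the widely used two-corpus formulation, which computes the G2 statistic over observed versus expected frequencies. It also assumes that candidates are scored against a general reference corpus. The abstract does not state which reference corpus or exact formula the authors used, and the helper names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_domain, freq_ref, size_domain, size_ref):
    """Two-corpus log-likelihood (G2) score for one word.

    Expected frequencies assume the word is equally likely in both corpora;
    a zero observed count contributes nothing (0 * ln 0 is taken as 0).
    """
    total_freq = freq_domain + freq_ref
    total_size = size_domain + size_ref
    expected_domain = size_domain * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    if freq_domain > 0:
        score += freq_domain * math.log(freq_domain / expected_domain)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score


def rank_candidates(domain_tokens, ref_tokens):
    """Rank domain words by log-likelihood, keeping only words overused in the domain."""
    dom, ref = Counter(domain_tokens), Counter(ref_tokens)
    n_dom, n_ref = sum(dom.values()), sum(ref.values())
    scored = []
    for word, f_dom in dom.items():
        f_ref = ref.get(word, 0)
        # G2 is high for both over- and under-use, so filter on relative frequency first.
        if f_dom / n_dom > f_ref / n_ref:
            scored.append((word, log_likelihood(f_dom, f_ref, n_dom, n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, rank_candidates("the turbine blade the turbine rotor".split(), "the cat sat on the mat the end".split()) ranks "turbine" first. It drops "the", which is relatively more frequent in the reference text.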


International Conference on Asian Language Processing | 2012

Source-Side Suffix Stripping for Bengali-to-English SMT

Rejwanul Haque; Sergio Penkale; Jie Jiang; Andy Way

Data sparseness is a well-known problem for statistical machine translation (SMT) when morphologically rich and highly inflected languages are involved. This problem becomes worse in resource-scarce scenarios where sufficient parallel corpora are not available for model training. Recent research has shown that morphological segmentation can be employed on either side of the translation pair to reduce data sparsity. In this work, we consider Bengali, a highly inflected Indian language, as the source side of the translation pair. This paper presents a study of morphological segmentation in SMT for a less explored translation pair, Bengali-to-English. We worked with the very small training set available for this language pair. We employ a simple suffix-stripping method for lemmatizing inflected Bengali words. We show that our morphological suffix separation process significantly reduces data sparseness. We also show that an SMT model trained on suffix-stripped (source) training data significantly outperforms the state-of-the-art phrase-based SMT (PB-SMT) baseline.
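
The suffix-stripping idea can be illustrated with a minimal longest-match stripper. The suffix list below is a small hypothetical sample of common Bengali inflections: plural গুলো, genitive plural দের, objective কে, classifiers টা and টি, and locative তে. It is not the list used in the paper. The minimum stem length is an assumed safeguard against over-stripping.

```python
# Hypothetical sample of Bengali inflectional suffixes; not the paper's list.
SUFFIXES = ["গুলো", "দের", "কে", "টা", "টি", "তে"]
# Try longer suffixes first so the longest matching suffix wins.
SUFFIXES_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)
MIN_STEM_LEN = 2  # measured in Unicode code points; assumed guard value


def strip_suffix(word):
    """Return (stem, suffix); suffix is "" when no listed suffix applies."""
    for suffix in SUFFIXES_BY_LENGTH:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)], suffix
    return word, ""


def segment(tokens, keep_suffix=True):
    """Replace each token by its stem, optionally emitting the suffix as a '+suffix' token."""
    out = []
    for token in tokens:
        stem, suffix = strip_suffix(token)
        out.append(stem)
        if keep_suffix and suffix:
            out.append("+" + suffix)
    return out
```

With keep_suffix=True, the inflectional information stays available to the translation model as a separate token. Inflected forms such as ছেলেদের and ছেলেকে still share the stem ছেলে. With keep_suffix=False, the function amounts to crude lemmatization. Either way, the source vocabulary shrinks, which is the sparsity reduction the abstract describes.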


Language Resources and Evaluation | 2018

TermFinder: log-likelihood comparison and phrase-based statistical machine translation models for bilingual terminology extraction

Rejwanul Haque; Sergio Penkale; Andy Way

Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate term list is prepared by taking all n-gram word sequences from the corpus. A well-known statistical measure, the Dice coefficient, is then used to remove multi-word candidates with weak associations from the list. Next, the log-likelihood comparison method is applied to rank the phrasal candidate terms. Finally, using a phrase-based statistical machine translation model, we create a bilingual terminology from the extracted monolingual term lists. We integrate an external knowledge source, the Wikipedia cross-language link databases, into the terminology extraction (TE) model. It assists two processes: (a) ranking the extracted terminology list, and (b) selecting appropriate target terms for a source term. First, we compare the performance of our monolingual TE model with a number of state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set and report automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
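
The Dice-coefficient filtering step can be sketched for two-word candidates as below. This is an assumption-laden illustration. It scores adjacent word pairs with the standard Dice coefficient, 2 * f(xy) / (f(x) + f(y)), and the default threshold of 0.5 is hypothetical. The abstract does not say how longer n-grams or thresholds were handled.

```python
from collections import Counter


def count_unigrams_bigrams(sentences):
    """Count words and adjacent word pairs over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def dice(pair_freq, freq_x, freq_y):
    """Dice coefficient: 1.0 when the two words only ever occur together."""
    return 2.0 * pair_freq / (freq_x + freq_y)


def filter_by_dice(sentences, threshold=0.5):
    """Keep bigram candidates whose Dice association is at least `threshold`."""
    unigrams, bigrams = count_unigrams_bigrams(sentences)
    kept = {}
    for (x, y), f_xy in bigrams.items():
        score = dice(f_xy, unigrams[x], unigrams[y])
        if score >= threshold:
            kept[(x, y)] = score
    return kept
```

On the toy corpus [["wind", "turbine", "blade"], ["the", "wind", "turbine"], ["the", "blade"]] with a threshold of 0.6, only ("wind", "turbine") survives. The surviving candidates would then be ranked by the log-likelihood step.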


International Joint Conference on Natural Language Processing | 2008

Language Independent Named Entity Recognition in Indian Languages

Asif Ekbal; Rejwanul Haque; Amitava Das; Venkateswarlu Poka; Sivaji Bandyopadhyay


International Joint Conference on Natural Language Processing | 2008

Named Entity Recognition in Bengali: A Conditional Random Field Approach

Asif Ekbal; Rejwanul Haque; Sivaji Bandyopadhyay

Collaboration



Top Co-Authors

Andy Way

Dublin City University

Asif Ekbal

Indian Institute of Technology Patna

Jie Jiang

Dublin City University

Jinhua Du

Dublin City University
