Publication


Featured research published by Sandipan Dandapat.


Workshop on Statistical Machine Translation | 2008

MaTrEx: The DCU MT System for WMT 2008

Sergio Penkale; Rejwanul Haque; Sandipan Dandapat; Pratyush Banerjee; Ankit Kumar Srivastava; Jinhua Du; Pavel Pecina; Sudip Kumar Naskar; Mikel L. Forcada; Andy Way

In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data-driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish-English and French-English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech-English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.


International Conference on Natural Language Processing | 2010

OpenMaTrEx: a free/open-source marker-driven example-based machine translation system

Sandipan Dandapat; Mikel L. Forcada; Declan Groves; Sergio Penkale; John Tinsley; Andy Way

We describe OpenMaTrEx, a free/open-source example-based machine translation (EBMT) system based on the marker hypothesis, comprising a marker-driven chunker, a collection of chunk aligners, and two engines: one based on a simple proof-of-concept monotone EBMT recombinator and one based on a Moses-based statistical decoder. OpenMaTrEx is a free/open-source release of the basic components of MaTrEx, the Dublin City University machine translation system.


Linguistic Annotation Workshop | 2009

Complex Linguistic Annotation -- No Easy Way Out! A Case from Bangla and Hindi POS Labeling Tasks

Sandipan Dandapat; Priyanka Biswas; Monojit Choudhury; Kalika Bali

Alternative paths to linguistic annotation, such as those utilizing games or exploiting web users, have recently become popular owing to their very high benefit-to-cost ratios. In this paper, however, we report a case study on POS annotation for Bangla and Hindi, where we observe that reliable linguistic annotation requires not only expert annotators, but also a great deal of supervision. For our hierarchical POS annotation scheme, we find that close supervision and training are necessary at every level of the hierarchy, or equivalently, at every level of tagset complexity. Nevertheless, an intelligent annotation tool can significantly accelerate the annotation process and increase the inter-annotator agreement for both expert and non-expert annotators. These findings lead us to believe that reliable annotation requiring deep linguistic knowledge (e.g., POS, chunking, Treebank, semantic role labeling) requires expertise and supervision. The focus, therefore, should be on the design and development of appropriate annotation tools equipped with machine-learning-based predictive modules that can significantly boost the productivity of the annotators.


Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) | 2009

English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

Rejwanul Haque; Sandipan Dandapat; Ankit Kumar Srivastava; Sudip Kumar Naskar; Andy Way

This paper presents English-Hindi transliteration for the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments at both the character and transliteration unit (TU) levels. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.


Machine Translation | 2011

Nitin Indurkhya and Fred J. Damerau (eds): Handbook of Natural Language Processing (second edition)

Sandipan Dandapat

The second edition of the Handbook of Natural Language Processing provides detailed coverage of the techniques and applications of current natural language processing (NLP). This edition removes outdated material and upgrades and expands some of the chapters from the earlier version of the handbook (Dale et al. 2000). This handbook also covers some emerging areas of recent NLP, such as sentiment analysis, web distance and word similarity. This edition of the handbook was compiled by Nitin Indurkhya, a researcher at the University of New South Wales, and the late text-processing pioneer Fred J. Damerau of the IBM T.J. Watson Research Centre. A review of this book has already been published by Jochen Leidner (2011), which focuses on an overview of the topics covered therein compared with other available related handbooks. In this review, I aim to provide a more detailed outline of the different research areas covered in the book, focusing particularly on the area of machine translation (MT). The book has 26 chapters in three different sections, namely, Classical Approach (Part I), Empirical and Statistical Approach (Part II) and Applications (Part III). The first part of the book has six chapters. The organization of these chapters is based on the chronological flow of the standard processing stages of NLP, typically found in pipelined rule-based MT (RBMT) architectures. The subsequent 17 chapters in the second part of the book follow a similar organizational pattern. However, in this review, I will primarily focus on the chapters directly related to MT. I will also cover MT-related topics detailed in other chapters of this book. There are primarily two chapters about MT in this book, namely, Statistical Machine Translation (SMT) (Chapter 17), by Abraham Ittycheriah, and Chinese Machine Translation (Chapter 18), by Pascale Fung. Along with these two chapters dealing directly with MT, there is a separate chapter on Alignment (Chapter 16), by Dekai Wu.


Archive | 2011

Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting

Sandipan Dandapat; Sara Morrissey; Andy Way; Mikel L. Forcada


Archive | 2009

A review of EBMT using proportional analogies

Harold L. Somers; Sandipan Dandapat; Sudip Kumar Naskar


Pacific Asia Conference on Language, Information and Computation | 2010

Mitigating Problems in Analogy-based EBMT with SMT and vice versa: A Case Study with Named Entity Transliteration

Sandipan Dandapat; Sara Morrissey; Sudip Kumar Naskar; Harold L. Somers


Language Resources and Evaluation | 2010

Building a sign language corpus for use in machine translation

Sara Morrissey; Harold L. Somers; Robert Smith; Shane Gilchrist; Sandipan Dandapat


Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra) | 2012

Combining EBMT, SMT, TM and IR Technologies for Quality and Scale

Sandipan Dandapat; Sara Morrissey; Andy Way; Joseph van Genabith

Collaboration


Explore Sandipan Dandapat's collaborations.

Top Co-Authors

Andy Way

Dublin City University
