Publication


Featured research published by Sanjika Hewavitharana.


Meeting of the Association for Computational Linguistics | 2011

Extracting Parallel Phrases from Comparable Data

Sanjika Hewavitharana; Stephan Vogel

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other NLP applications. Even if two comparable documents have few or no parallel sentence pairs, there is still potential for parallelism at the sub-sentential level. The ability to detect these phrases creates a valuable resource, especially for low-resource languages. In this paper we explore three phrase alignment approaches to detect parallel phrase pairs embedded in comparable sentences: the standard phrase extraction algorithm, which relies on the Viterbi path; a phrase extraction approach that does not rely on the Viterbi path but uses only lexical features; and a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate how effectively these approaches detect alignments for phrase pairs that have a known alignment in comparable sentence pairs. The results show that the non-Viterbi alignment approach outperforms the other two approaches on F1 measure.


North American Chapter of the Association for Computational Linguistics | 2006

Thai Grapheme-Based Speech Recognition

Paisarn Charoenpornsawat; Sanjika Hewavitharana; Tanja Schultz

In this paper we present results for building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of sharing parameters across models. We compare our system with two phoneme-based systems: one that uses a hand-crafted dictionary and another that uses an automatically generated dictionary. Experimental results show that the grapheme-based system with enhanced tree clustering outperforms the phoneme-based system using an automatically generated dictionary and performs comparably to the phoneme-based system with the hand-crafted dictionary.


Meeting of the Association for Computational Linguistics | 2008

Recent Improvements in the CMU Large Scale Chinese-English SMT System

Almut Silja Hildebrand; Kay Rottmann; Mohamed Noamany; Qin Gao; Sanjika Hewavitharana; Nguyen Bach; Stephan Vogel

In this paper we describe recent improvements to the components and methods of our Chinese-English statistical machine translation system, as used in the January 2008 GALE evaluation. The main improvements result from consistent data processing, larger statistical models, and a POS-based word reordering approach.


Natural Language Engineering | 2013

A unified alignment algorithm for bilingual data

Christoph Tillmann; Sanjika Hewavitharana

The paper presents a novel unified algorithm for aligning sentences with their translations in bilingual data. Drawing on ideas from a stack-based dynamic programming (DP) decoder for speech recognition (SR) (Ney 1984), the search is parametrized in a novel way, so that the unified algorithm can be applied to various types of data that were previously handled by separate implementations: the extracted text chunk pairs can be sub-sentential pairs, or one-to-one or many-to-many sentence-level pairs. The one-stage search algorithm is carried out in a single run over the data. Its memory requirements are independent of the length of the source document, and it is applicable to sentence-level parallel data as well as comparable data. Thanks to unified beam-search candidate pruning, the algorithm is very efficient: it avoids any document-level pre-filtering and uses less restrictive sentence-level filtering. Results are presented for Russian-English, Spanish-English, and Arabic-English extraction tasks. Using simple word-based scoring features, text chunk pairs are extracted from several trillion candidates, with the search carried out in parallel on 300 processors.


IWSLT | 2004

The ISL statistical translation system for spoken language translation

Stephan Vogel; Sanjika Hewavitharana; Muntsin Kolss; Alex Waibel


IWSLT | 2005

The UKA/CMU Statistical Machine Translation System for IWSLT 2006

Sanjika Hewavitharana; Bing Zhao; Almut Silja Hildebrand; Matthias Eck; Chiori Hori; Stephan Vogel; Alex Waibel


Archive | 2005

Augmenting a Statistical Translation System with a Translation Memory

Sanjika Hewavitharana; Stephan Vogel; Alex Waibel


Workshop on Statistical Machine Translation | 2011

CMU Haitian Creole-English Translation System for WMT 2011

Sanjika Hewavitharana; Nguyen Bach; Qin Gao; Vamshi Ambati; Stephan Vogel


Meeting of the Association for Computational Linguistics | 2011

Active Learning with Multiple Annotations for Comparable Data Classification Task

Vamshi Ambati; Sanjika Hewavitharana; Stephan Vogel; Jaime G. Carbonell


Archive | 2005

The CMU SMT System for IWSLT 2005

Sanjika Hewavitharana; Bing Zhao; Almut Silja Hildebrand; Matthias Gerhard Eck; Chiori Hori; Stephan Vogel; Alexander Waibel

Collaboration



Top Co-Authors

Stephan Vogel

Carnegie Mellon University

Alex Waibel

Karlsruhe Institute of Technology

Bing Zhao

Carnegie Mellon University

Nguyen Bach

Carnegie Mellon University

Vamshi Ambati

Carnegie Mellon University

Chiori Hori

Tokyo Institute of Technology

Mridul Gupta

International Institute of Information Technology

Rohit Kumar

International Institute of Information Technology
