Nizar Habash | Researchain

Publication

Featured research published by Nizar Habash.

meeting of the association for computational linguistics | 2005

Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop

Nizar Habash; Owen Rambow

We present an approach to using a morphological analyzer for tokenizing and morphologically tagging (including part-of-speech tagging) Arabic words in one process. We learn classifiers for individual morphological features, as well as ways of using these classifiers to choose among entries from the output of the analyzer. We obtain accuracy rates on all tasks in the high nineties.

north american chapter of the association for computational linguistics | 2006

Arabic Preprocessing Schemes for Statistical Machine Translation

Nizar Habash; Fatiha Sadat

In this paper, we study the effect of different word-level preprocessing decisions for Arabic on SMT quality. Our results show that given large amounts of training data, splitting off only proclitics performs best. However, for small amounts of training data, it is best to apply English-like tokenization using part-of-speech tags, and sophisticated morphological analysis and disambiguation. Moreover, choosing the appropriate preprocessing produces a significant increase in BLEU score if there is a change in genre between training and test data.

Archive | 2007

On Arabic Transliteration

Nizar Habash; Abdelhadi Soudi; Timothy Buckwalter

This chapter introduces the transliteration scheme used to represent Arabic characters in this book. The scheme is a one-to-one transliteration of the Arabic script that is complete, easy to read, and consistent with Arabic computer encodings. We present guidelines for Arabic pronunciation using this transliteration scheme and discuss various idiosyncrasies of Arabic orthography.

meeting of the association for computational linguistics | 2008

Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking

Ryan M. Roth; Owen Rambow; Nizar Habash; Mona T. Diab; Cynthia Rudin

We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that for all tasks we consider, both modeling the lexeme explicitly, and retuning the weights of individual classifiers for the specific task, improve the performance.

meeting of the association for computational linguistics | 2006

MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects

Nizar Habash; Owen Rambow

We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD performs an on-line analysis to or generation from a root+pattern+features representation, it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects. We present a detailed evaluation of MAGEAD.

north american chapter of the association for computational linguistics | 2007

Arabic Diacritization through Full Morphological Tagging

Nizar Habash; Owen Rambow

We present a diacritization system for written Arabic which is based on a lexical resource. It combines a tagger and a lexeme language model. It improves on the best results reported in the literature.

conference of the european chapter of the association for computational linguistics | 2006

Parsing Arabic Dialects

David Chiang; R.M. Diab; Nizar Habash; R. Hwa; Roger Levy; Owen Rambow; Khalil Sima'an

The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel LA-MSA corpus. Instead, we use explicit knowledge about the relation between LA and MSA.

meeting of the association for computational linguistics | 2009

CATiB: The Columbia Arabic Treebank

Nizar Habash; Ryan M. Roth

The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on speed with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach: no annotation of redundant information and using representations and terminology inspired by traditional Arabic syntax. We describe CATiB's representation and annotation procedure, and report on inter-annotator agreement and speed.

meeting of the association for computational linguistics | 2008

Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation

Nizar Habash

We present four techniques for online handling of Out-of-Vocabulary words in Phrase-based Statistical Machine Translation. The techniques use spelling expansion, morphological expansion, dictionary term expansion and proper name transliteration to reuse or extend a phrase table. We compare the performance of these techniques and combine them. Our results show a consistent improvement over a state-of-the-art baseline in terms of BLEU and a manual error analysis.

Archive | 2006

Online Arabic Handwriting Recognition Using Hidden Markov Models

Fadi Biadsy; Jihad El-Sana; Nizar Habash

Online handwriting recognition of Arabic script is a difficult problem since it is naturally both cursive and unconstrained. The analysis of Arabic script is further complicated in comparison to Latin script due to obligatory dots/strokes that are placed above or below most letters. This paper introduces a Hidden Markov Model (HMM) based system to provide solutions for most of the difficulties inherent in recognizing Arabic script including: letter connectivity, position-dependent letter shaping, and delayed strokes. This is the first HMM-based solution to online Arabic handwriting recognition. We report successful results for writer-dependent and writer-independent word recognition.
