Publications


Featured research published by Atelach Alemu Argaw.


Proceedings of the First Workshop on Language Technologies for African Languages | 2009

Methods for Amharic Part-of-Speech Tagging

Björn Gambäck; Fredrik Olsson; Atelach Alemu Argaw; Lars Asker

The paper describes a set of experiments applying three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers performed worse than previously reported results for English, in particular on unknown words. The best results were obtained using a Maximum Entropy approach, while the HMM-based and SVM-based taggers achieved comparable results.


Meeting of the Association for Computational Linguistics | 2007

An Amharic Stemmer: Reducing Words to Their Citation Forms

Atelach Alemu Argaw; Lars Asker

Stemming is an important analysis step in a number of areas such as natural language processing (NLP), information retrieval (IR), machine translation (MT), and text classification. In this paper we present the development of a stemmer for Amharic that reduces words to their citation forms. Amharic is a Semitic language with rich and complex morphology. The stemmer is intended for dictionary-based cross-language IR, where terms must be looked up in a machine-readable dictionary (MRD) during the translation step. We apply a rule-based approach supplemented by occurrence statistics of words in an MRD and in a 3.1M-word news corpus. The main purpose of the statistical supplements is to resolve ambiguity between alternative segmentations. The stemmer is evaluated on Amharic text from two domains, news articles and a classic fiction text. It achieves an accuracy of 60% on the old-fashioned fiction text and 75% on the news articles.
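As an illustration only, the combination of rules and occurrence statistics described above can be sketched in Python: a rule step proposes alternative segmentations, and dictionary membership plus corpus frequency choose between them. The affix lists, the Latin transliteration, and the `candidate_stems`/`stem` functions are hypothetical stand-ins, not the authors' rule set or data.

```python
# Hypothetical, illustrative affixes over a Latin transliteration of Amharic;
# NOT the rule set used in the paper.
PREFIXES = ["be", "le", "ye", "ke"]
SUFFIXES = ["och", "u", "wa", "na"]


def candidate_stems(word: str) -> set[str]:
    """Rule step: propose segmentations by stripping at most one prefix and one suffix."""
    candidates = {word}
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)]  # empty suffix leaves rest unchanged
            if len(stem) >= 2:  # avoid degenerate one-letter stems
                candidates.add(stem)
    return candidates


def stem(word: str, dictionary: set[str], corpus_counts: dict[str, int]) -> str:
    """Statistical step: prefer dictionary entries, then higher corpus frequency,
    then less stripping (longer stems); the string itself breaks any final tie."""
    return max(
        candidate_stems(word),
        key=lambda c: (c in dictionary, corpus_counts.get(c, 0), len(c), c),
    )


print(stem("bebetoch", {"bet"}, {"bet": 50}))            # bet
print(stem("lebetu", set(), {"betu": 10, "bet": 2}))     # betu
```

In the second call no candidate is in the dictionary, so corpus frequency alone decides between the competing segmentations.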


Cross-Language Evaluation Forum | 2005

Dictionary-Based Amharic-French Information Retrieval

Atelach Alemu Argaw; Lars Asker; Rickard Cöster; Jussi Karlgren; Magnus Sahlgren

We present four runs for the Amharic-French bilingual track at CLEF 2005. All experiments use a dictionary-based approach to translate the Amharic queries into French bags of words. One approach applies word sense discrimination on the translated side of the queries, while the other includes all senses of a translated word in the search query. We used two search engines, the SICS experimental engine and Lucene, giving four runs from the two approaches. Non-content-bearing words were removed both before and after the dictionary lookup: TF/IDF values supplemented by a heuristic function were used to remove stop words from the Amharic queries, and two French stopword lists were used to remove them from the French translations. In our experiments, the SICS search engine performed better than Lucene, and the word-sense-discriminated keywords produced slightly better results than the full set of non-discriminated keywords.
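A rough sketch of frequency-based stop word detection follows. It is a simplified stand-in that uses only the IDF component with a hypothetical threshold (`max_idf`); the heuristic function mentioned in the abstract is not described, so it is omitted, and whitespace tokenization is an assumption.

```python
import math


def idf_stopwords(documents: list[str], max_idf: float = 0.5) -> set[str]:
    """Flag terms that occur in so many documents that their IDF is <= max_idf."""
    n_docs = len(documents)
    doc_freq: dict[str, int] = {}
    for doc in documents:
        for term in set(doc.lower().split()):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t for t, df in doc_freq.items() if math.log(n_docs / df) <= max_idf}


def remove_stopwords(query: str, stopwords: set[str]) -> list[str]:
    """Keep only the content-bearing query terms."""
    return [t for t in query.lower().split() if t not in stopwords]


docs = ["the cat sat", "the dog ran", "the bird flew", "a cat ran"]
stops = idf_stopwords(docs)
print(stops)                                   # {'the'}
print(remove_stopwords("the cat ran", stops))  # ['cat', 'ran']
```

Here "the" appears in three of four documents (IDF = ln(4/3), about 0.29), so it falls under the threshold, while "cat" and "ran" (IDF = ln 2, about 0.69) survive.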


Information Retrieval | 2009

Classifying Amharic webnews

Lars Asker; Atelach Alemu Argaw; Björn Gambäck; Samuel Eyassu Asfeha; Lemma Nigussie Habte

We present work aimed at compiling an Amharic corpus from the Web and automatically categorizing the texts. Amharic is the second most spoken Semitic language in the world (after Arabic) and is used for countrywide communication in Ethiopia. It is highly inflectional and dialectally diverse. We discuss the issues of compiling and annotating a corpus of Amharic news articles from the Web. This corpus was then used in three sets of text classification experiments. Working with a less-researched language highlights a number of practical issues that might otherwise receive less attention or go unnoticed. The purpose of the experiments has not primarily been to develop a cutting-edge text classification system for Amharic, but rather to put the spotlight on some of these issues. The first two sets of experiments investigated the use of Self-Organizing Maps (SOMs) for document classification. Testing on small datasets, we first looked at classifying unseen data into 10 predefined categories of news items, and then at clustering it around query content, using 16 queries as class labels. The third set of experiments investigated the effect of operations such as stemming and part-of-speech tagging on text classification performance. We compared three representations while constructing classification models based on bagging of decision trees for the 10 predefined news categories. The best accuracy was achieved using the full text as representation. A representation using only the nouns performed almost equally well, confirming the assumption that most of the information required for distinguishing between the categories is contained in the nouns, while stemming had little effect on classifier performance.


Cross-Language Evaluation Forum | 2008

Amharic-English Information Retrieval with Pseudo Relevance Feedback

Atelach Alemu Argaw

We describe cross-language retrieval experiments using Amharic queries and an English document collection. Two monolingual and eight bilingual runs were submitted, varying in the use of long or short queries, the presence of pseudo relevance feedback (PRF), and the approach to word sense disambiguation (WSD). We used an Amharic-English machine-readable dictionary (MRD) and an online Amharic-English dictionary to translate query terms by lookup. Out-of-dictionary Amharic query terms were treated as possible named entities and further filtered through restricted fuzzy matching, based on edit distance, against automatically extracted English proper names. The results indicate that longer queries perform similarly to short ones, that PRF improves performance considerably, and that queries fare better with WSD than with maximal expansion, which takes all the translations given in the MRD.
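The edit-distance filtering of out-of-dictionary terms can be sketched as follows. This assumes the Amharic term has already been transliterated into Latin script, and the restriction used here (a cap `max_ratio` on distance normalized by the longer string's length) is a hypothetical choice, not the one used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def match_proper_name(term: str, proper_names: list[str], max_ratio: float = 0.34):
    """Return the closest proper name within the normalized-distance cap, else None."""
    if not term:
        return None
    best, best_dist = None, None
    for name in proper_names:
        dist = levenshtein(term.lower(), name.lower())
        within_cap = dist / max(len(term), len(name)) <= max_ratio
        if within_cap and (best_dist is None or dist < best_dist):
            best, best_dist = name, dist
    return best


print(levenshtein("kitten", "sitting"))                         # 3
print(match_proper_name("Klinton", ["Kenya", "Clinton", "London"]))  # Clinton
```

Normalizing by length keeps short terms from matching unrelated names on a small absolute distance; a term with no close candidate returns `None` and would be left untranslated.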


Cross-Language Evaluation Forum | 2007

Amharic-English Information Retrieval

Atelach Alemu Argaw; Lars Asker

We describe Amharic-English cross-lingual information retrieval experiments in the ad hoc bilingual tracks of CLEF 2006. Query analysis is supported by morphological analysis and part-of-speech tagging, and we used two machine-readable dictionaries supplemented by online dictionaries for term lookup in the translation process. Out-of-dictionary terms were handled using fuzzy matching, and Lucene [4] was used for indexing and searching. Four experiments were conducted, differing in the topic fields used, fuzzy matching, and term weighting. The results obtained are reported and discussed.


Archive | 2009

An Amharic Corpus for Machine Learning

Björn Gambäck; Fredrik Olsson; Atelach Alemu Argaw; Lars Asker


WOCAL 5: 5th World Congress of African Linguistics, 7-11 August 2006, Addis Ababa University, Ethiopia | 2006

Applying machine learning to Amharic text classification

Björn Gambäck; Magnus Sahlgren; Atelach Alemu Argaw; Lars Asker


International Conference on Web Information Systems and Technologies | 2005

Web Mining for an Amharic-English Bilingual Corpus

Atelach Alemu Argaw; Lars Asker


Archive | 2008

Ranked Terms List for Computer Assisted Translation

Atelach Alemu Argaw

Collaboration

Top Co-Authors

Björn Gambäck

Norwegian University of Science and Technology

Fredrik Olsson

Swedish Institute of Computer Science

Magnus Sahlgren

Swedish Institute of Computer Science

Jussi Karlgren

Swedish Institute of Computer Science

Rickard Cöster

Swedish Institute of Computer Science
