Eric Atwell
University of Leeds
Publications
Featured research published by Eric Atwell.
Conference of the European Chapter of the Association for Computational Linguistics | 1987
Eric Atwell
The Constituent Likelihood Automatic Word-tagging System (CLAWS) was originally designed for the low-level grammatical analysis of the million-word LOB Corpus of English text samples. CLAWS does not attempt a full parse, but uses a first-order Markov model of language to assign word-class labels to words. CLAWS can be modified to detect grammatical errors, essentially by flagging unlikely word-class transitions in the input text. This may seem an intuitively implausible and theoretically inadequate model of natural language syntax, but nevertheless it can successfully pinpoint most grammatical errors in a text. Several modifications to CLAWS have been explored. The resulting system cannot detect all errors in typed documents; but neither can far more complex systems that attempt a full parse and require much greater computation.
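The transition-flagging idea described above can be sketched in a few lines. This is only an illustrative reconstruction, not CLAWS itself: the tag set, training data and probability threshold below are invented for the example.

```python
# Sketch of error detection via unlikely tag transitions, in the CLAWS spirit.
# Tags, corpus and threshold are illustrative assumptions, not CLAWS's own.
from collections import defaultdict

def train_transitions(tagged_sentences):
    """Estimate first-order tag-transition probabilities from tagged text."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sent]
        for prev, cur in zip(tags, tags[1:]):
            counts[prev][cur] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {t: n / total for t, n in nexts.items()}
    return probs

def flag_errors(tag_sequence, probs, threshold=0.01):
    """Flag positions whose incoming tag transition is improbably rare."""
    flags = []
    tags = ["<s>"] + tag_sequence
    for i, (prev, cur) in enumerate(zip(tags, tags[1:])):
        p = probs.get(prev, {}).get(cur, 0.0)
        if p < threshold:
            flags.append(i)  # index into tag_sequence
    return flags

TRAIN = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
]
probs = train_transitions(TRAIN)
flags = flag_errors(["DET", "VERB", "NOUN"], probs)  # DET->VERB never seen
```

A real tagger would first map words to candidate tags and smooth the probabilities; the point here is only that a first-order model already localises an unlikely transition.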
Journal of Experimental and Theoretical Artificial Intelligence | 1990
Geoffrey Sampson; Robin Haigh; Eric Atwell
Parsing techniques based on rules defining grammaticality are difficult to use with authentic natural-language inputs, which are often grammatically messy. Instead, the APRIL system seeks a labelled tree structure which maximizes a numerical measure of conformity to statistical norms derived from a sample of parsed text. No distinction between legal and illegal trees arises: any labelled tree has a value. Because the search space is large and has an irregular geometry, APRIL seeks the best tree using simulated annealing, a stochastic optimization technique. Beginning with an arbitrary tree, many randomly-generated local modifications are considered and adopted or rejected according to their effect on tree-value: acceptance decisions are made probabilistically, subject to a bias against adverse moves which is very weak at the outset but is made to increase as the random walk through the search space continues. This enables the system to converge on the global optimum without getting trapped in local optima.
Applications of Natural Language to Data Bases | 2004
Bayan Abu Shawar; Eric Atwell
In this paper, we describe a new way to access information by “chatting” to an information source. This involves a chatbot, a program that emulates human conversation; the chatbot must be trainable with a text, to accept input and match it against the text to generate replies in the conversation. We have developed a Machine Learning approach to retrain the ALICE chatbot with a transcript of human dialogue, and used this to develop a range of chatbots conversing in different styles and languages. We adapted this chatbot-training program to the Qur’an, to allow users to learn from the Qur’an in a conversational information-access style. The process and results are illustrated in this paper.
Southern African Linguistics and Applied Language Studies | 2003
Bayan Abu Shawar; Eric Atwell
This paper presents two chatbot systems, ALICE and Elizabeth, illustrating the dialogue knowledge representation and pattern matching techniques of each. We discuss the problems which arise when using the Corpus of Spoken Afrikaans (Korpus Gesproke Afrikaans) to retrain the ALICE chatbot system with human dialogue examples. A Java program to convert from dialogue transcripts to the AIML linguistic knowledge representation formalism provides a basic implementation of corpus-based chatbot training. The Java program used the Afrikaans dialogue corpus texts to generate two versions of the Afrikaans chatbot.
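The core of the conversion described above (dialogue transcript in, AIML knowledge base out) can be sketched briefly. The original converter was a Java program; this is a simplified Python sketch, and the normalisation rules are assumptions rather than the paper's actual ones.

```python
# Minimal sketch of corpus-based chatbot training: convert (prompt, response)
# turn pairs from a transcript into AIML <category> entries. The upper-casing
# and punctuation-stripping rules here are simplified assumptions.
import xml.etree.ElementTree as ET

def normalise(utterance):
    """AIML patterns are conventionally upper-case with punctuation removed."""
    kept = "".join(c for c in utterance.upper() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def transcript_to_aiml(turns):
    """turns: list of (prompt, response) pairs from a dialogue transcript."""
    root = ET.Element("aiml", version="1.0")
    for prompt, response in turns:
        cat = ET.SubElement(root, "category")
        ET.SubElement(cat, "pattern").text = normalise(prompt)
        ET.SubElement(cat, "template").text = response
    return ET.tostring(root, encoding="unicode")

aiml = transcript_to_aiml([("Hello there!", "Hi.")])
```

Each transcript turn becomes one pattern/template category; a chatbot engine loading the result replies to a matched pattern with the recorded human response.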
Conference of the Association for Machine Translation in the Americas | 2004
Debbie Elliott; Anthony Hartley; Eric Atwell
Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.
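The final step mentioned above, relating annotated error frequencies to human judgements, amounts to a simple correlation analysis. The sketch below is illustrative only: the error rates and fluency scores are invented, not the paper's data.

```python
# Illustrative sketch: correlate per-text error frequencies with mean human
# fluency scores. All numbers below are invented example data.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# errors per 100 words vs. mean human fluency score, one pair per MT text
error_rates = [2.1, 5.4, 8.0, 3.3, 6.7]
fluency_scores = [4.5, 3.1, 2.0, 4.0, 2.6]
r = pearson(error_rates, fluency_scores)  # strongly negative here, as expected
```

A strongly negative coefficient on data like this is what motivates selecting the most frequent, most fluency-damaging error types for automated detection.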
International Conference on Communications | 2013
Majdi Sawalha; Eric Atwell; Mohammad A. M. Abushariah
Morphological analyzers are preprocessors for text analysis, and many text analytics applications need them to perform their tasks. This paper reviews the SALMA-Tools (Standard Arabic Language Morphological Analysis) [1], a collection of open-source standards, tools and resources that widen the scope of Arabic word-structure analysis - particularly morphological analysis - to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. Tag assignment is significantly more complex for Arabic than for many languages: the morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic), so instead of one tag per word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis - particularly for probabilistic taggers, which require training data - if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA-Tagger is a fine-grained morphological analyzer which depends mainly on linguistic information extracted from traditional Arabic grammar books and on a prior-knowledge broad-coverage lexical resource, the SALMA-ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA-Tag Set is a standard tag set for encoding, capturing long-established traditional fine-grained morphological features of Arabic in a notation format intended to be compact yet transparent.
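The subtag-per-morpheme idea can be made concrete with a toy example. The segmentation, lexicon entries and tag labels below are illustrative assumptions for one transparent word, not the SALMA-ABCLexicon or the SALMA-Tag Set notation.

```python
# Toy sketch of part-by-part subtagging in the SALMA style: each morpheme of
# a pre-segmented Arabic word gets its own subtag, instead of one tag for the
# whole word. Lexicon and labels are illustrative assumptions only.
SUBTAG_LEXICON = {
    "wa": "conjunction-proclitic",
    "bi": "preposition-proclitic",
    "al": "definite-article-prefix",
    "qalam": "noun-stem",
    "hum": "pronoun-enclitic",
}

def subtag(morphemes):
    """Map each morpheme of a pre-segmented word to a subtag."""
    return [(m, SUBTAG_LEXICON.get(m, "unknown")) for m in morphemes]

# wa+bi+al+qalam ('and with the pen'): one subtag per part
analysis = subtag(["wa", "bi", "al", "qalam"])
```

A real analyzer must of course also produce the segmentation itself, which is where the grammar-book rules and broad-coverage lexicon do the hard work.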
Meeting of the Association for Computational Linguistics | 1988
Robin Haigh; Geoffrey Sampson; Eric Atwell
Parsing techniques based on rules defining grammaticality are difficult to use with authentic inputs, which are often grammatically messy. Instead, the APRIL system seeks a labelled tree structure which maximizes a numerical measure of conformity to statistical norms derived from a sample of parsed text. No distinction between legal and illegal trees arises: any labelled tree has a value. Because the search space is large and has an irregular geometry, APRIL seeks the best tree using simulated annealing, a stochastic optimization technique. Beginning with an arbitrary tree, many randomly-generated local modifications are considered and adopted or rejected according to their effect on tree-value: acceptance decisions are made probabilistically, subject to a bias against adverse moves which is very weak at the outset but is made to increase as the random walk through the search space continues. This enables the system to converge on the global optimum without getting trapped in local optima. Performance of an early version of the APRIL system on authentic inputs is yielding analyses with a mean accuracy of 75.3% using a schedule which increases processing linearly with sentence-length; modifications currently being implemented should eliminate a high proportion of the remaining errors.
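The acceptance schedule described above - random local modifications, with a bias against value-decreasing moves that is weak at first and grows as the walk continues - is the standard simulated-annealing loop. The sketch below shows that loop; note that APRIL's states are labelled parse trees scored against corpus statistics, whereas the bit-string objective here is only an illustrative stand-in.

```python
# Generic simulated-annealing loop in the shape described above. The
# bit-string state and scoring function are toy stand-ins for APRIL's
# labelled trees and statistical tree-value measure.
import math
import random

def anneal(initial, neighbour, value, steps=20000, t0=2.0):
    """Accept value-increasing moves always; accept adverse moves with a
    probability that shrinks as the temperature t falls towards zero."""
    state = best = initial
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9  # weak bias early, strong late
        cand = neighbour(state)
        delta = value(cand) - value(state)
        if delta >= 0 or random.random() < math.exp(delta / t):
            state = cand
        if value(state) > value(best):
            best = state
    return best

NORM = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for the "statistical norms"

def score(bits):
    return sum(b == n for b, n in zip(bits, NORM))

def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

random.seed(0)
result = anneal([0] * 8, flip_one, score)
```

For tree search the `neighbour` function would instead make a local change to the labelled tree (moving a constituent boundary or relabelling a node), with everything else unchanged.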
Applications of Natural Language to Data Bases | 2016
Mohammad Alqahtani; Eric Atwell
This paper reviews and classifies the most common types of search techniques that have been applied to the Holy Quran, and addresses the limitations of these methods. It also surveys most existing Quranic ontologies and their deficiencies. Finally, it describes a new semantic search tool for the Qur’an based on Qur’anic ontologies, designed to overcome the limitations of existing Quranic search applications.
2016 Conference of Basic Sciences and Engineering Studies (SGCAC) | 2016
Bothaina Hamoud; Eric Atwell
This paper presents the compilation of a Holy Quran question-and-answer corpus, created for data mining with the Waikato Environment for Knowledge Analysis (WEKA). Questions and answers about the Quran were collected from multiple data sources, and a representative sample was selected for our model. The data was then cleaned to the quality level required by WEKA and converted to comma-separated values (CSV) format so that the corpus could be loaded into the tool. The StringToWordVector filter was used to process each string into a bag or vector of word frequencies for further analysis with different data mining techniques. Finally, we applied a clustering algorithm to the processed attributes and show the results in the WEKA cluster visualizer.
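The StringToWordVector step above is the pivot of the pipeline: it turns free-text questions into fixed-length numeric vectors that a clustering algorithm can consume. A minimal sketch of the same idea outside WEKA, with invented example questions (WEKA's filter has many more options, such as TF-IDF weighting and stemming, which are omitted here):

```python
# Sketch of the StringToWordVector idea: map each string to a vector of
# word frequencies over a shared vocabulary. Example questions are invented.
from collections import Counter

def string_to_word_vector(docs):
    """Return (sorted vocabulary, one frequency vector per document)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w, n in Counter(d.lower().split()).items():
            v[index[w]] = n
        vectors.append(v)
    return vocab, vectors

questions = [
    "what is patience",
    "what is charity",
    "patience and charity",
]
vocab, vectors = string_to_word_vector(questions)
```

Once every question is a vector, any distance-based clusterer (k-means in WEKA's SimpleKMeans, for instance) can group similar questions together.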
International Conference on Pattern Recognition | 1988
Eric Atwell
Artificial Intelligence and Computational Linguistics researchers are currently debating the value of ‘deep’ knowledge representations in language processing and related computations. Incorporating deep knowledge as well as surface statistical pattern recognition requires much greater processing, but it has been assumed that, for many applications of Artificial Intelligence, purely surface statistical analyses cannot yield useful results. One NLP application provides a counter-argument to this widespread tenet: a system for grammatical error detection using only probabilistic, Markovian pattern-matching was devised, and in tests it compared favourably with a much larger system that computed deep grammatical analyses of each sentence. Those who argue that statistical pattern recognition has no place in Computational Linguistics or Artificial Intelligence have yet to prove their case.