Publication


Featured research published by Gerhard van Huyssteen.


Language Resources and Evaluation | 2011

The South African Human Language Technology Audit

Aditi Sharma Grover; Gerhard van Huyssteen; Marthinus W. Pretorius

Human language technology (HLT) has been identified as a priority area by the South African government. However, despite efforts by government and the research and development (R&D) community, South Africa has not yet been able to maximise the opportunities of HLT and create a thriving HLT industry. One of the key challenges is the fact that there is insufficient codified knowledge about the current South African HLT components, their attributes and existing relationships. Hence a technology audit was conducted for the South African HLT landscape, to create a systematic and detailed inventory of the status of the HLT components across the eleven official languages. Based on the Basic Language Resource Kit (BLaRK) framework (Krauwer, ELRA Newslett 3(2), 1998), we used various data collection methods (such as focus groups, questionnaires and personal consultations with HLT experts) to gather detailed information. The South African HLT landscape is analysed using a number of complementary approaches, and based on the interpretations of the results, recommendations are made on how to accelerate HLT development in South Africa, as well as on how to conduct similar audits in other countries and contexts.


South African Journal of African Languages | 2005

Automatic lemmatization in Setswana: towards a prototype

Karien Brits; Rigardt Pretorius; Gerhard van Huyssteen

Development of human language technologies for the indigenous South African languages is currently being undertaken in various projects across South Africa. In one such project a lemmatizer for Setswana is being developed, and this article reports on work towards the development of a first prototype. A prerequisite of lemmatization is to determine what the output of a lemmatizer for a specific language should be (i.e. what should be considered a lemma in that language). Consequently, the concept of a lemma as it should be understood in the context of Setswana lemmatization is defined, and it is indicated that only nouns and verbs really pose challenges for the lemmatization of Setswana. The computational approach taken in this research, and the implementation applied, which use FSA 6, are described at length. Preliminary results indicate that the rules for nouns and verbs are rather accurate, with precision scores of 93–94% obtained in a small, contained experiment. The article concludes with a discussion of future work.


South African Institute of Computer Scientists and Information Technologists | 2010

An overview of HLTs for South African Bantu languages

Aditi Sharma Grover; Karen Calteaux; Gerhard van Huyssteen; Marthinus W. Pretorius

South Africa (SA) is one of the few countries in the world that boasts a large number of official languages. Due to the efforts of government and the local research and development (R&D) community (comprising universities, science councils and a few private sector companies) all the official languages are -- to varying degrees -- enabled with regard to human language technology (HLT). We present in this paper the current status of HLTs for a few selected official South African languages, namely isiZulu, Sepedi, Tshivenda and Xitsonga, based on a national HLT audit covering all official languages of South Africa. We discuss the HLT position of the above languages in relation to other official South African languages, and also explore the types of data collections, technology modules and applications currently available in the R&D community for these four languages.


Archive | 2007

Designing an e-Learning System for Language Learning: A Case Study

Gerhard van Huyssteen

Within the South African context, e-learning provides various opportunities to contribute towards a multilingual society. This paper describes a new project, ICALLESAL (Intelligent Computer-Assisted Language Learning for Eleven South African Languages), where an e-learning system is being developed for the acquisition of the official South African languages. The paper commences by defining computer-assisted language learning (CALL) and intelligent computer-assisted language learning (ICALL) within the context of e-learning. The benefits of CALL within the South African context are discussed, with specific focus on how it could promote a culture of multilingualism, and also help towards bridging the Digital Divide in South Africa. In the subsequent section, the ICALLESAL system is discussed in more detail by presenting various technologies, content objects, and features of the system.


Meeting of the Association for Computational Linguistics | 2005

Teaching Language Technology at the North-West University

Suléne Pilon; Gerhard van Huyssteen; Bertus van Rooy

The BA Language Technology program was recently introduced at the North-West University and is, to date, the only one of its kind in South Africa. This paper gives an overview of the program, which consists of computational linguistic subjects as well as subjects from languages, computer science, mathematics, and statistics. A brief discussion of the content of the program, and specifically the computational linguistics subjects, illustrates that the BA Language Technology program is a vocationally directed, future-oriented teaching program, preparing students for both future graduate studies and a career in language technology. By means of an example, it is then illustrated how students and researchers alike benefit from working side by side on research and development projects by using a problem-based, project-organized approach to curriculum design and teaching.


Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014) | 2014

A Taxonomy for Afrikaans and Dutch Compounds

Gerhard van Huyssteen; Ben Verhoeven

The linguistic categorisation of compounds dates back to some of the earliest work in linguistics. The cross-linguistic compound taxonomy of Bisetto and Scalise (2005), later refined in Scalise and Bisetto (2009), is well-known in linguistics for understanding the grammatical relations in compounds. Although this taxonomy has not been used extensively in the field of computational linguistics, it has the potential to influence choices with regard to compound annotation and understanding in natural language processing. For example, their 2005 taxonomy formed the basis for the large-scale, multilingual database of compounds, called CompoNet. The aim of this paper is to examine their latest taxonomy critically, especially with a view on rigorous implementation in computational environments (e.g. for the morphological annotation of compounds). We propose a number of general improvements of their taxonomy, as well as some language-specific refinements.


Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014) | 2014

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch

Ben Verhoeven; Menno van Zaanen; Walter Daelemans; Gerhard van Huyssteen

Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are “glued” together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro project, which focuses on compounding in the closely related languages Afrikaans and Dutch. The project consists of subprojects focusing on compound splitting (identifying the boundaries of the components) and compound semantics (identifying semantic relations between the components). We describe the developed datasets as well as results showing the effectiveness of the developed datasets.
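The compound-splitting task described in this abstract can be illustrated with a minimal sketch. It is not the AuCoPro method: it is a simple lexicon-based recursive splitter, and the function name `split_compound`, the toy lexicon and the simplified set of linking elements (`s`, `e`) are all assumptions made for illustration.

```python
def split_compound(word, lexicon, linkers=("s", "e")):
    """Split `word` into known components, or return None if no split is found.

    `lexicon` is a set of simplex words; `linkers` is a hypothetical,
    simplified set of linking elements allowed between components.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest known left-hand component first.
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        # Allow no linker ("") or one of the assumed linking elements.
        for linker in ("",) + tuple(linkers):
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):], lexicon, linkers)
                if tail is not None:
                    return [head] + tail
    return None


# Toy lexicon (illustrative only).
toy_lexicon = {"boek", "winkel", "volk", "lied"}
print(split_compound("boekwinkel", toy_lexicon))  # ['boek', 'winkel']
print(split_compound("volkslied", toy_lexicon))   # ['volk', 'lied']
```

The sketch also shows why a predefined lexicon is a limitation: any compound with a component missing from the lexicon returns `None`.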


Archive | 2018

The Hulle and Goed Constructions in Afrikaans

Gerhard van Huyssteen

For more than 100 years, Afrikaans associative plural constructions – especially constructions with hulle (‘they’) and goed (‘things/stuff; good’) as right-hand components – have been studied from both diachronic and synchronic perspectives, but with the main interest in their origins, and what they could tell us about the genesis of Afrikaans. One school of thought claims that they both have Germanic roots, while the other school maintains that both are creole constructions. No definitive conclusions have been reached. Moreover, there is no consensus on whether these constructions should be regarded as noun phrases, compounds, or derived words. The most recent synchronic description of the hulle construction was published in 1969, and the last synchronic description of the goed construction in 1989. In the absence of corpus data, unsubstantiated claims about these constructions abound in the literature. This article presents a synchronic, corpus-based, constructionist description of these two Afrikaans constructions. They are characterised as hybrid constructions on a scale between compounds and derivations, while some remarks on their productivity are made. Based on detailed analyses of their right- and left-hand components, the article concludes with a categorisation network of the schemas and subschemas of these constructions.


Tydskrif vir Geesteswetenskappe | 2016

The Virtual Institute for Afrikaans and the Afrikaans community's market needs

Gerhard van Huyssteen; Melodi Botha; Alex Antonites

The Virtual Institute for Afrikaans (VivA) is a research institute and service provider for Afrikaans in digital contexts. It is a registered non-profit company, with the Afrikaanse Taal- en Kultuurvereniging (ATKV), North-West University (NWU), Suid-Afrikaanse Akademie vir Wetenskap en Kuns (SAAWK), and Trust vir Afrikaanse Onderwys (TAO) as its founding members. In order to make informed choices regarding VivA's product and service offering, mixed-method research was conducted to determine shortcomings in the Afrikaans offering of digital language products. For purposes of the quantitative research, an online questionnaire was completed by 319 respondents (demographic representation of mostly white, mother-tongue speakers of Afrikaans between the ages of 30 and 65), while a focus group with ten respondents (mostly white, mother-tongue speakers of Afrikaans between 15 and 62) was used to gather qualitative information. The focus group session was recorded, transcribed, coded and then analysed to derive seven key themes that are associated with VivA. One of the key findings is that a large part of the Afrikaans users in this sample did not know of the existence of the Afrikaans Wiktionary and Wikipedia. This finding directed VivA's priorities in other directions, although it will keep on exploring ideas and methods to change this perception. It was also clear that Afrikaans users have a need for four specific Afrikaans electronic aids, namely an online/mobile version of the Afrikaanse Woordelys en Spelreëls (Afrikaans Word-list and Spelling Rules); an Afrikaans grammar checker; a terminology bank; and automatic translation tools. Despite the fact that the majority of respondents had a fairly negative experience with regard to automatic translation assistance, it was found that a significant number of respondents are still positive about it, and have a strong need for such a high-quality product. On the basis of this research, the needs of the Afrikaans community related to language products and services were determined, and various products and services were introduced in order to meet these identified needs. Hence, VivA's initial products and services offering includes: a dictionary portal (where users can access various free and commercial dictionaries online, as well as via an online and offline Android and iOS app); grammar portal (where users, especially international researchers, can access extensive information about the phonology, morphology and syntax of Afrikaans, presented comparatively with Dutch and Frisian as part of the international Taalportaal project); language advice portal (where users can get telephonic and online answers to language-related questions from a professional language advisor); corpus portal (where users can do online corpus queries in a large and growing collection of written and transcribed spoken Afrikaans corpora); and information portal (with access to a blog, competitions, etcetera). The article concludes with an overview of potential future research and development topics, including a motivation for the need for regular technology audits.


Stellenbosch Papers in Linguistics Plus | 2015

Afrikaans and Dutch as closely-related languages: a comparison to West Germanic languages and Dutch dialects

Wilbert Heeringa; Febe de Wet; Gerhard van Huyssteen

Following Den Besten’s (2009) desiderata for historical linguistics of Afrikaans, this article aims to contribute some modern evidence to the debate regarding the founding dialects of Afrikaans. From an applied perspective (i.e. human language technology), we aim to determine which West Germanic language(s) and/or dialect(s) would be best suited for the purposes of recycling speech resources for the benefit of developing speech technologies for Afrikaans. Being recognised as a West Germanic language, Afrikaans is first compared to Standard Dutch, Standard Frisian and Standard German. Pronunciation distances are measured by means of Levenshtein distances. Afrikaans is found to be closest to Standard Dutch. Secondly, Afrikaans is compared to 361 Dutch dialectal varieties in the Netherlands and North-Belgium, using material from the Reeks Nederlandse Dialectatlassen, a series of dialect atlases compiled by Blancquaert and Pee in the period 1925-1982 which cover the Dutch dialect area. Afrikaans is found to be closest to the South-Holland dialectal variety of Zoetermeer; this largely agrees with the findings of Kloeke (1950). No speech resources are available for Zoetermeer, but such resources are available for Standard Dutch. Although the dialect of Zoetermeer is significantly closer to Afrikaans than Standard Dutch is, Standard Dutch speech resources might be a good substitute.
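The Levenshtein distance mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The study measured pronunciation distances, so the spelled words used here are only stand-ins for phonetic transcriptions. The function names and the normalisation by the longer string's length are also assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        current = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            current.append(min(
                previous[j] + 1,         # delete sym_a
                current[j - 1] + 1,      # insert sym_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalised_distance(a, b):
    """Levenshtein distance divided by the longer length (an assumed normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Spelling used as a stand-in for phonetic transcription (illustrative only).
print(levenshtein("skool", "school"))  # 2
print(levenshtein("nege", "negen"))    # 1
```

Averaging such word-level distances over a shared word list gives one distance per pair of language varieties, which is the kind of figure that allows varieties to be ranked by closeness.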

Collaboration


Top Co-Authors

Aditi Sharma Grover

Council of Scientific and Industrial Research

Karen Calteaux

Council of Scientific and Industrial Research
