Steven Moran
University of Zurich
Publications
Featured research published by Steven Moran.
Science | 2012
Michael Cysouw; Dan Dediu; Steven Moran
We show that Atkinson’s (Reports, 15 April 2011, p. 346) intriguing proposal—that global linguistic diversity supports a single language origin in Africa—is an artifact of using suboptimal data, biased methodology, and unjustified assumptions. We criticize his approach using more suitable data, and we additionally provide new results suggesting a more complex scenario for the emergence of global linguistic diversity.
Sprachwissenschaft | 2015
Mohamed Ahmed Sherif; Axel-Cyrille Ngonga Ngomo; Sebastian Hellmann; Steven Moran; Martin Brümmer; John P. McCrae
In this paper we describe the Semantic Quran dataset, a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources and aligning it to an ontology designed to represent multilingual data from sources with a hierarchical structure. The resulting RDF data encompasses 43 different languages, which are among the most under-represented languages in the Linked Data Cloud, including Arabic, Amharic, and Amazigh. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran dataset is compatible with the NLP Interchange Format (NIF) and contains explicit morpho-syntactic information on the terms used. We present the ontology devised for structuring the data. We also provide the transformation rules implemented in our extraction framework. Finally, we detail the link creation process as well as possible usage scenarios for the Semantic Quran dataset.
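As a rough illustration of the design described above, the hierarchical chapter/verse structure and its parallel translations can be thought of as subject-predicate-object triples. The sketch below uses plain Python tuples and invented placeholder URIs and predicate names, not the dataset's actual RDF vocabulary:

```python
# Toy triple store: (subject, predicate, object) tuples standing in for RDF.
# All "ex:" identifiers are illustrative placeholders.
triples = []

def add_verse(chapter, verse, lang, text):
    """Attach a verse node to its chapter and record one translation."""
    node = f"ex:quran/{chapter}/{verse}"
    triples.append((f"ex:quran/{chapter}", "ex:hasVerse", node))
    triples.append((node, f"ex:text@{lang}", text))

add_verse(1, 1, "en", "In the name of God...")
add_verse(1, 1, "de", "Im Namen Gottes...")

# Query: all translations of chapter 1, verse 1.
texts = [o for s, p, o in triples
         if s == "ex:quran/1/1" and p.startswith("ex:text@")]
print(len(texts))  # 2
```

A real RDF store would deduplicate the repeated chapter-to-verse link and expose the same query via SPARQL, but the hierarchical shape of the data is the same.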
Linked Data in Linguistics | 2012
Steven Moran
In this paper, I describe the challenges in creating a Resource Description Framework (RDF) knowledge base for undertaking phonological typology. RDF is a model for data interchange that encodes representations of knowledge in a graph data structure by using sets of statements that link resource nodes via predicates that can be logically marked-up (Lassila and Swick, 1999). The model I describe uses Linked Data to combine data from disparate segment inventory databases. Once the data in these legacy databases have been made interoperable at the linguistic and computational levels, I show how additional knowledge about distinctive features is linked to the knowledge base. I call this resource the Phonetics Information Base and Lexicon (PHOIBLE, http://phoible.org) and it allows users to query segment inventories from a large number of languages at both the segment and distinctive feature levels (Moran, 2012). I then show how the knowledge base is useful for investigating questions of descriptive phonological universals, e.g. “do all languages have coronals?” and “does every phonological system have at least one front vowel or the palatal glide /j/?” (Hyman, 2008).
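The kind of typological query PHOIBLE supports can be sketched in miniature. The snippet below uses plain Python dictionaries in place of RDF/SPARQL, with invented toy inventories and feature values rather than the actual database:

```python
# Toy data: language -> segment inventory, segment -> distinctive features.
# Both mappings are invented for illustration.
inventories = {
    "deu": {"t", "d", "a", "i"},
    "haw": {"k", "l", "a", "u"},
}
features = {
    "t": {"coronal": True},  "d": {"coronal": True},
    "l": {"coronal": True},  "k": {"coronal": False},
    "a": {"coronal": False}, "i": {"coronal": False},
    "u": {"coronal": False},
}

def has_feature(lang, feature):
    """Does the language have at least one segment with this feature?"""
    return any(features[s].get(feature, False) for s in inventories[lang])

# "Do all languages have coronals?" -- evaluated over the toy sample:
print(all(has_feature(lang, "coronal") for lang in inventories))  # True
```

In the actual knowledge base, the segment-to-feature links live in the RDF graph, so the same universal can be tested with a single SPARQL query over hundreds of languages.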
Archive | 2014
Steven Moran; Damián E. Blasi
Although complexity of subsystems varies greatly across languages, the compensation hypothesis states that if a language’s structure is complex in one area, it will simplify in another (e.g. Martinet 1955, Hockett 1955, Aitchison 2000). An assumed truism is that these differences balance out cross-linguistically, so that all languages tend to be equally complex (e.g. Hockett 1958, Akmajian et al. 1979, Crystal 1987, McMahon 1994, Dixon 1997). This belief is furthered by the long-held view that linguistic structures are not affected by geographic or societal factors, with vocabulary being an exception, e.g. Sapir 1912.
Frontiers in Psychology | 2015
Sabine Stoll; Taras Zakharko; Steven Moran; Robert Schikowski; Balthasar Bickel
A quantitative analysis of a trans-generational, conversational corpus of Chintang (Tibeto-Burman) speakers with community-wide bilingualism in Nepali (Indo-European) reveals that children show more code-switching into Nepali than older speakers. This confirms earlier proposals in the literature that code-switching in bilingual children decreases when they gain proficiency in their dominant language, especially in vocabulary. Contradicting expectations from other studies, our corpus data also reveal that for adults, multi-word insertions of Nepali into Chintang are just as likely to undergo full syntactic integration as single-word insertions. Speakers of younger generations show less syntactic integration. We propose that this reflects a change between generations, from strongly asymmetrical, Chintang-dominated bilingualism in older generations to more balanced bilingualism where Chintang and Nepali operate as clearly separate systems in younger generations. This change is likely to have been triggered by the increase of Nepali presence over the past few decades.
Language Resources and Evaluation | 2009
Steven Moran
This paper presents the design and implementation of the Ontology for Accessing Transcription Systems (OATS), a knowledge base that supports interoperation over disparate transcription systems and practical orthographies. OATS uses RDF, SPARQL and Unicode to facilitate resource discovery and intelligent search over linguistic data. The knowledge base includes an ontological description of writing systems and relations for mapping transcription system segments to an interlingua pivot, the IPA. It includes orthographic and phonemic inventories from 203 African languages, which were mined from the Web. OATS is motivated by four use cases: querying data in the knowledge base via IPA, querying it in native orthography, error checking of digitized data, and conversion between transcription systems. The model in this paper implements each of these use cases.
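The interlingua-pivot idea behind the conversion use case can be sketched as follows. The grapheme-to-IPA mappings below are invented for illustration and are not drawn from OATS itself; the point is that conversion between two transcription systems routes through IPA:

```python
# Two hypothetical practical orthographies, each mapped to IPA.
ortho_a = {"ny": "ɲ", "ng": "ŋ", "c": "tʃ"}
ortho_b = {"ñ": "ɲ", "ŋ": "ŋ", "ch": "tʃ"}

def convert(segment, src, dst):
    """Convert a segment between orthographies via the IPA interlingua."""
    ipa = src[segment]                       # source grapheme -> IPA pivot
    inverse = {v: k for k, v in dst.items()} # IPA -> destination grapheme
    return inverse[ipa]

print(convert("ny", ortho_a, ortho_b))  # ñ
print(convert("c", ortho_a, ortho_b))   # ch
```

With n transcription systems, the pivot design needs only n mappings to IPA instead of n·(n−1) pairwise conversion tables, which is what makes it scale to hundreds of orthographies.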
Archive | 2013
Jelena Prokić; Steven Moran
In the past 20 years, the application of quantitative methods in historical linguistics has received a lot of attention. Traditional historical linguistics relies on the comparative method to determine the genealogical relatedness of languages. More recent quantitative approaches attempt to automate this process, either by developing computational tools that complement the comparative method (Steiner et al. 2010) or by applying fully automated methods that take very limited or no linguistic knowledge into account, e.g. the Levenshtein approach. The Levenshtein method has been used extensively in dialectometry to measure the distances between various dialects (Kessler 1995; Heeringa 2004; Nerbonne 1996). It has also frequently been used to analyze the relatedness between languages, such as Indo-European (Serva and Petroni 2008; Blanchard et al. 2010), Austronesian (Petroni and Serva 2008), and a very large sample of 3002 languages (Holman 2010). In this paper we examine the performance of the Levenshtein distance against n-gram models and a zipping approach by applying these methods to the same set of language data. The success of the Levenshtein method is typically evaluated by visually inspecting the obtained genealogical divisions and comparing them against well-established groupings found in the linguistics literature. It has been shown that the Levenshtein method is successful in recovering main language groups; in the case of the Indo-European language family, for example, it correctly classifies languages into the Germanic, Slavic, and Romance groups. In a recent analysis of the Austronesian languages by means of Levenshtein distance (Greenhill 2011), the results were evaluated using a more exact method than visual inspection of the recovered groups: Greenhill (2011) extracted language triplets and compared their subgroupings against those provided by the Ethnologue (Lewis 2009).
The possible subgroupings of any three languages include the following: (1) language A is more similar to language B than to C, (2) A is more similar to C than to B, (3) B is more similar to C than to A, or (4) A, B, and C are all equally similar.
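The Levenshtein-based comparison described above can be sketched in a few lines. The word lists below are invented toy data; real analyses use standardized comparative word lists across many languages:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def distance(words1, words2):
    """Mean length-normalized edit distance over aligned word lists."""
    return sum(levenshtein(a, b) / max(len(a), len(b))
               for a, b in zip(words1, words2)) / len(words1)

# Invented 'hand, water' word lists for three languages:
german  = ["hant", "vasser"]
dutch   = ["hant", "water"]
spanish = ["mano", "agua"]

# Triplet subgrouping in the style of Greenhill (2011): which pair is closest?
pairs = {
    ("German", "Dutch"):   distance(german, dutch),
    ("German", "Spanish"): distance(german, spanish),
    ("Dutch", "Spanish"):  distance(dutch, spanish),
}
print(min(pairs, key=pairs.get))  # ('German', 'Dutch')
```

Each such triplet verdict can then be scored against the reference classification (e.g. the Ethnologue), giving an exact accuracy figure instead of a visual judgment of the recovered tree.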
Cognition | 2018
Steven Moran; Damián E. Blasi; Robert Schikowski; Aylin C. Küntay; Barbara Pfeiler; Shanley Allen; Sabine Stoll
Highlights
• Data from typologically diverse languages shows common distributional patterns.
• Discontinuous repetitive patterns in the input provide cues for category assignment.
• Morphological frames accurately predict nouns and verbs in the input to children.
The People's Web Meets NLP | 2013
Christian Chiarcos; Steven Moran; Pablo N. Mendes; Sebastian Nordhoff; Richard Littauer
We describe ongoing community efforts to create a Linked Open Data (sub-)cloud of linguistic resources, with an emphasis on resources that are specific to linguistic research, namely annotated corpora and linguistic databases. We argue that for both types of resources, the application of the Linked Open Data paradigm and representation in RDF is a promising approach to addressing interoperability problems and to integrating information from different repositories. This is illustrated with example studies for different kinds of linguistic resources. The efforts described in this chapter are conducted in the context of the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation. The OWLG is a network of researchers interested in linguistic resources and/or their publication under open licenses, and a number of its members are engaged in applying the Linked Open Data paradigm to their resources. Under the umbrella of the OWLG, these efforts will eventually result in the creation of a Linguistic Linked Open Data cloud (LLOD).
Proceedings of the 12th International Conference on the Evolution of Language (Evolang12) | 2018
Steven Moran; Annemarie Verkerk
Consonants and vowels are processed differently and seem to have distinct neural representations (Caramazza et al. 2000). Boë et al. (2017) argue that vowel-like systems must be inferred for the last common ancestor of baboons and humans, ca. 25 million years ago. Unlike vowels, however, consonants appear to be a later innovation in the communication systems of hominids. Primates, including chimpanzees and orangutans, employ a repertoire of voiceless calls (so-called raspberries), which show homology with voiceless consonants (Lameira et al. 2014). During the course of human evolution, smaller orofacial cavities, increased neuro-cognitive abilities, and more precise motor control of the articulators led to greater phonetic variation, particularly among consonants, which have been phonologized in many ways in different language families. In comparison to vowels, there are over three times as many consonant phonemes in the world’s languages. Their number and diversity range greatly, from 6 in Rotokas to over 90 in !Xu (Maddieson 1984); compare vowel systems, which range in size from 2 to 14. Why are there so many more consonants in the world’s languages? The answer to this question is complex, involving both a need for an increased number of lexical contrasts to accommodate a growing vocabulary throughout the evolution of language, and the greater potential of consonants, relative to vowels, to increase the number of contrastive sounds in a language through secondary articulations. Two strands of evidence support this conclusion. First, comparing a database of proto-language reconstructions (Marsico et al., accepted; n=100) with modern languages in UPSID (Maddieson 1984), Marsico (1999) notes an increase in the number of consonants in modern languages.