Sebastian Nordhoff
Max Planck Society
Publication
Featured research published by Sebastian Nordhoff.
Archive | 2012
Christian Chiarcos; Sebastian Nordhoff; Sebastian Hellmann
In this paper we describe a practical approach to the challenge of linguistic retrodigitization. We propose to distinguish strictly between a base digitization and a separate interpretation of the sources. The base digitization includes only a literal electronic transcript of the source; all sources are thus treated simply as strings of characters, i.e. as unstructured corpora. The often complex structure found in many dictionaries and grammars will be added subsequently (and possibly much later) as Linked Data in the form of standoff annotation. A further advantage of this approach is that the complete digitization and interpretation can be performed collaboratively without a complex organizational superstructure.
Linked Data in Linguistics | 2012
Christian Chiarcos; Sebastian Hellmann; Sebastian Nordhoff
The contributions of this part have described recent activities of the OWLG as a whole and of individual OWLG members aimed at providing linguistic resources as Linked Data. Here, we describe how linguistic resources can be linked with each other, and we illustrate possible use cases of information integration from various sources with example queries for the major types of linguistic resources: DBpedia (Hellmann et al., this vol.) represents lexical-semantic resources, the German NEGRA corpus in its POWLA representation (Chiarcos, this vol.) represents linguistic corpora, the OLiA ontologies (Chiarcos, this vol.) represent repositories of linguistic terminology, and languoid definitions in Glottolog/Langdoc (Nordhoff, this vol.) represent linguistic knowledge bases and metadata repositories.
Linked Data in Linguistics | 2012
Christian Chiarcos; Sebastian Hellmann; Sebastian Nordhoff
The Open Linguistics Working Group (OWLG) is an initiative of experts from different fields concerned with linguistic data, including academic linguistics (e.g. typology, corpus linguistics), applied linguistics (e.g. computational linguistics, lexicography and language documentation) and NLP (e.g. from the Semantic Web community). The primary goals of the working group are (1) to promote the idea of open linguistic resources, (2) to develop means for their representation, and (3) to encourage the exchange of ideas across different disciplines.
Linked Data in Linguistics | 2012
Sebastian Nordhoff
Most of the linguistic resources available today concern the world’s major languages. This paper discusses two projects that aim at worldwide coverage. Glottolog/Langdoc is an attempt to attain near-complete bibliographical coverage of the world’s lesser-known languages (i.e. 95% of the world’s linguistic diversity), which then provides solid empirical ground for extensional definitions of languages and for language classification. The online version of the Automated Similarity Judgment Program (ASJP) provides standardized lexical distance data for 5800 languages as Linked Data. These two projects are the first attempt at a Typological Linked Data Cloud, to which PHOIBLE and other resources can easily be added in the future.
The People's Web Meets NLP | 2013
Christian Chiarcos; Steven Moran; Pablo N. Mendes; Sebastian Nordhoff; Richard Littauer
We describe ongoing community efforts to create a Linked Open Data (sub-)cloud of linguistic resources, with an emphasis on resources that are specific to linguistic research, namely annotated corpora and linguistic databases. We argue that for both types of resources, applying the Linked Open Data paradigm and representing the data in RDF is a promising approach to address interoperability problems and to integrate information from different repositories. This is illustrated with example studies for different kinds of linguistic resources. The efforts described in this chapter are conducted in the context of the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation. The OWLG is a network of researchers interested in linguistic resources and/or their publication under open licenses, and a number of its members are applying the Linked Open Data paradigm to their resources. Under the umbrella of the OWLG, these efforts will eventually result in the creation of a Linguistic Linked Open Data cloud (LLOD).
Folia Linguistica | 2011
Sebastian Nordhoff
Sri Lanka Malay has innovated a prominent and productive copula, which sets it apart from other descendants of colloquial Malay varieties. This copula has developed from the verb dhaathang ‘to come’, a grammaticalization path not yet attested in the literature. This article describes the forms and functions of this copula and shows that it cannot be traced to any of the main input languages of Sri Lanka Malay (Trade Malay, Tamil, Sinhala). Comparing the Sri Lanka Malay case to attested grammaticalization paths, this article concludes that the grammaticalization of COME to a copula is less surprising when intermediate stages of ‘resultative’ and ‘stative’ are assumed. These subpaths are illustrated by a variety of Creole and non-Creole languages.
Journal of Language Contact | 2012
Sebastian Nordhoff
The study of Sri Lanka Malay has focussed on the genesis scenario, where theories of creolization (Smith et al., 2004; Smith & Paauw, 2006), which assign a dominant role to Tamil, compete with theories of convergence (Bakker, 2006; Ansaldo, 2008), which allow for a more important role of Sinhala. This paper assesses and reevaluates the empirical data brought forward by both sides and contributes further empirical data on parallels with Sinhala. These parallels are partly due to substrate reinforcement (Siegel, 1998) of marginal structures found in Malay varieties and partly clear calques on Sinhala patterns. Some structures must be analysed as the result of Early Sinhala Influence during the colonial period, while for others a later development following socio-political changes after independence is possible (Late Sinhala Influence). The paper argues that the changes of SLM towards Sinhala in both periods can be seen as a kind of metatypy comparable to other language contact settings in Eurasia and Papua.
Journal of Language Contact | 2012
Sebastian Nordhoff
In his two contributions to this issue, Ian Smith nicely sets out criteria to establish language contact. Unfortunately, a rigorous application of the standards listed by Thomason (2001), which he endorses, is detrimental to his argumentation based on the Tamil accusative. Smith furthermore argues that phonological and syntactic influence should go together. This is intended to discredit Sinhala influence, but closer scrutiny shows that the argument actually discredits Tamil influence. Smith’s papers are also not informed by the socio-historical data and analysis presented in Nordhoff (2009), which are incompatible with his approach. Finally, Smith lists a phonological analysis based on syllable weight as a desideratum; such an analysis is already found in Nordhoff (2009) and should have been consulted.
TAL Traitement Automatique des Langues | 2011
Christian Chiarcos; Sebastian Hellmann; Sebastian Nordhoff
Archive | 2012
Christian Chiarcos; Sebastian Nordhoff; Sebastian Hellmann