Oliver Hellwig
Heidelberg University
Publications
Featured research published by Oliver Hellwig.
Sanskrit Computational Linguistics | 2009
Oliver Hellwig
SanskritTagger is a stochastic tagger for unpreprocessed Sanskrit text. The tagger tokenises the text and performs part-of-speech tagging using a Markov model. The parameters for these processes are estimated from a manually annotated corpus that currently comprises approximately 1,500,000 words. This article sketches the tagging process, reports the results of tagging a few short passages of Sanskrit text, and describes further improvements to the program.
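The Markov-model tagging step can be illustrated with a minimal first-order hidden Markov model decoded by the Viterbi algorithm. This is a hedged sketch, not SanskritTagger's implementation: it assumes the input is already tokenised (sandhi resolved), and the tag set, the function name `viterbi`, the smoothing constant `unk` and all probabilities (`START`, `TRANS`, `EMIT`) are invented toy values rather than estimates from the annotated corpus.

```python
import math

# Hypothetical toy parameters; a real tagger would estimate these from an
# annotated corpus. They are NOT SanskritTagger's values.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT = {"NOUN": {"rāmaḥ": 0.4, "vanam": 0.4},
        "VERB": {"gacchati": 0.8}}


def viterbi(tokens, tags=TAGS, start_p=START, trans_p=TRANS, emit_p=EMIT, unk=1e-6):
    """Most probable tag sequence for pre-tokenised input under a first-order HMM.

    Scores are summed in log space to avoid underflow; unseen word/tag pairs
    receive the small constant probability `unk` (a hypothetical smoothing choice).
    """
    if not tokens:
        return []
    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(tokens[0], unk)), None)
             for t in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximises path score + transition score.
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + math.log(emit_p[t].get(tokens[i], unk)), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


print(viterbi(["rāmaḥ", "vanam", "gacchati"]))  # ['NOUN', 'NOUN', 'VERB']
```

Working in log space is the usual choice here because multiplying many small probabilities over a long sentence would otherwise underflow to zero.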
Proceedings of the 3rd International Symposium on Sanskrit Computational Linguistics | 2008
Oliver Hellwig
In this paper, I describe a hybrid dependency tree parser for Sanskrit sentences that improves on a purely lexical parsing approach by incorporating simple syntactic rules and grammatical information. The performance of the parser is demonstrated on a group of sentences from epic literature.
Language Technology for Cultural Heritage | 2011
Nils Reiter; Oliver Hellwig; Anette Frank; Irina Gossmann; Borayin Larios; Julio Rodrigues; Britta D. Zeller
In this paper we investigate the use of standard natural language processing (NLP) tools and annotation methods for processing linguistic data from ritual science, which is concerned with the study of the structure and variance of rituals. The work is embedded in an interdisciplinary project that addresses this study by applying empirical and quantitative computational linguistic analysis techniques to descriptions of Indian rituals. We present the motivation and prospects of such a computational approach to ritual structure research and sketch the overall research plan of the project. In particular, we motivate the choice of frame semantics as a theoretical framework for the semantic analysis of rituals. We discuss the special characteristics of the textual data and examine several domain adaptation strategies for applying standard NLP resources and tools to the ritual domain. We also report on our workflows and methods for semi-automatic semantic annotation, which serves as a basis for the extraction of event chains. We close with some preliminary investigations into how to uncover regularities and differences among rituals.
Literary and Linguistic Computing | 2009
Oliver Hellwig
Indian alchemy, a branch of traditional Indian medicine (Āyurveda), has produced a corpus of texts that are difficult to date using conventional philological techniques. This article describes a content-based computational method for calculating the relative chronology of these texts. Central parts of the alchemical literature are encoded in a machine-readable language model, and the encoded texts are then compared with an alignment algorithm. Phylogenetic trees derived from these alignments show regularities in the ordering of alchemical texts, which may be interpreted as temporal patterns. By processing these patterns with a minimization algorithm, we are able to compute a relative chronology of the corpus that is largely consistent with results obtained using traditional philological techniques.
International Sanskrit Computational Linguistics Symposium | 2010
Oliver Hellwig
Due to the phonetic, morphological, and lexical complexity of Sanskrit, the automatic analysis of this language is a considerable challenge for natural language processing. The paper describes a series of tests that were performed to assess the accuracy of the tagging program SanskritTagger. To our knowledge, it offers the first reliable benchmark data for evaluating Sanskrit taggers with an unrestricted dictionary on texts from different domains. Based on a detailed analysis of the test results, the paper points out possible directions for future improvements of statistical tagging procedures for Sanskrit.
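A benchmark of this kind ultimately reports accuracy per text domain. The sketch below is a simplification under a loudly stated assumption: gold and predicted tag sequences are already token-aligned, which sidesteps the segmentation mismatches a real Sanskrit evaluation has to handle. The function name `accuracy_by_domain` and the sample data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_domain(samples):
    """Token-level tagging accuracy per domain.

    samples: iterable of (domain, gold_tags, predicted_tags), where the two tag
    lists are assumed (hypothetically) to be aligned token by token.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, gold, pred in samples:
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sequences must be token-aligned")
        total[domain] += len(gold)
        correct[domain] += sum(g == p for g, p in zip(gold, pred))
    # Skip domains that contributed no tokens to avoid division by zero.
    return {d: correct[d] / total[d] for d in total if total[d]}


# Invented toy data, not results from the paper.
results = accuracy_by_domain([
    ("epic", ["NOUN", "VERB", "NOUN"], ["NOUN", "VERB", "VERB"]),
    ("medicine", ["NOUN", "NOUN"], ["NOUN", "NOUN"]),
])
print(results)
```

Reporting accuracy separately per domain, rather than one pooled figure, makes it visible when a tagger trained mostly on one genre degrades on another.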
Literary and Linguistic Computing | 2010
Oliver Hellwig
The article examines how the etymological composition of the Sanskrit lexicon changes over time and whether this composition can be used to date Sanskrit texts automatically. For this purpose, statistical tests are applied to a corpus of lexically analyzed texts. The results reported in the article may contribute to the diachronic lexicography of Sanskrit and help to develop computational methods for analyzing anonymous and undated Sanskrit texts.
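One common way to test whether etymological composition depends on time is a Pearson chi-square test of independence on a table of word counts (rows = periods, columns = etymological classes). The abstract does not name the tests used, so this is a sketch of one plausible choice; the function name `chi_square`, the class labels and the counts are hypothetical, and every row and column total is assumed to be positive.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows (e.g. periods), each a list of counts per column
    (e.g. hypothetical etymological classes). All row/column totals must be > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if period and class were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Invented counts: rows = early/late period, columns = inherited words/loanwords.
print(chi_square([[30, 10], [10, 30]]))  # (20.0, 1)
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic of 20.0 on these toy counts would indicate that word origin and period are not independent.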
Literary and Linguistic Computing | 2014
Nils Reiter; Anette Frank; Oliver Hellwig
Language Resources and Evaluation | 2010
Nils Reiter; Oliver Hellwig; Anand Mishra; Anette Frank; Jens Burkhardt
Linguistic Issues in Language Technology | 2012
Anette Frank; Thomas; Oliver Hellwig; Nils Reiter
Archive | 2011
Nils Reiter; Oliver Hellwig; Anette Frank