Sebastian Spiegler | Researchain

Publication

Featured research published by Sebastian Spiegler.

spoken language technology workshop | 2008

Learning the morphology of Zulu with different degrees of supervision

Sebastian Spiegler; Bruno Golénia; Ksenia Shalonova; Peter A. Flach; Roger C. F. Tucker

In this paper we compare different levels of supervision for learning the morphology of the indigenous South African language Zulu. After a preliminary analysis of the Zulu data used for our experiments, we concentrate on supervised, semi-supervised and unsupervised approaches, comparing the strengths and weaknesses of each method. The challenges we face are limited data availability and data sparsity in connection with the morphological analysis of indigenous languages. At the end of the paper we draw conclusions for our future work towards a morphological analyzer for Zulu.

cross language evaluation forum | 2009

Unsupervised word decomposition with the PROMODES algorithm

Sebastian Spiegler; Bruno Golénia; Peter A. Flach

We present PROMODES, an algorithm for unsupervised word decomposition that is based on a probabilistic generative model. The model considers segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For the Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first one uses a simple segmentation algorithm on a subset of the data and applies maximum likelihood estimates for model parameters when decomposing words of the original language data. The second version estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations. The solution is found by majority vote, which decides whether or not to segment at each word position. In this paper, we describe the probabilistic model, parameter estimation and how the most likely decomposition of an input word is found. We have tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
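
The committee step described in this abstract can be illustrated with a minimal Python sketch. It is not the PROMODES implementation: the function name `majority_vote_segmentation`, the boolean vote representation, and the rule that a tie means "no boundary" are all assumptions made for illustration. Each learner is assumed to have already produced one yes/no decision per inner position of the word.

```python
from typing import List


def majority_vote_segmentation(word: str, votes: List[List[bool]]) -> List[str]:
    """Split `word` wherever a strict majority of learners votes for a boundary.

    votes[k][i] is True if (hypothetical) learner k places a boundary
    after the character word[i]; each vote list has len(word) - 1 entries.
    Ties count as "no boundary" (an assumption, not taken from the paper).
    """
    segments = []
    start = 0
    for i in range(len(word) - 1):
        yes = sum(1 for learner_votes in votes if learner_votes[i])
        if 2 * yes > len(votes):  # strict majority in favour of a boundary
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])  # remaining characters form the last segment
    return segments


# Three hypothetical learners vote on the five inner positions of "walked".
example_votes = [
    [False, False, False, True, False],  # boundary after "walk"
    [False, False, False, True, False],  # boundary after "walk"
    [False, True, False, False, False],  # boundary after "wa"
]
example_segments = majority_vote_segmentation("walked", example_votes)
```

In this toy run only the position after "walk" gets two of three votes, so the word is split once at that position.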

international conference on e-science | 2010

SubSift Web Services and Workflows for Profiling and Comparing Scientists and Their Published Works

Simon Price; Peter A. Flach; Sebastian Spiegler; Christopher P Bailey; N Rogers

Scientific researchers, laboratories and organisations can be profiled and compared by analysing their published works, including documents ranging from academic papers to web sites, blog posts and Twitter feeds. This paper describes how the vector space model from information retrieval, more normally associated with full text search, has been employed in the open source SubSift software to support workflows to profile and compare such collections of documents. SubSift was originally designed to match submitted conference or journal papers to potential peer reviewers based on the similarity between the paper's abstract and the reviewers' publications as found in online bibliographic databases. The software is implemented as a family of RESTful web services that, composed into a re-usable workflow, have already been used to support several major data mining conferences. Alternative workflows and service compositions are now enabling other interesting applications.
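
The matching idea in this abstract, comparing a paper's abstract with reviewers' publications in a vector space model, can be sketched as follows. This is not SubSift's code or API: the function names, the whitespace tokenizer, and the smoothed TF-IDF weighting are assumptions chosen to keep the example self-contained, and the reviewer texts are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf(tokens: List[str], doc_freq: Counter, n_docs: int) -> Dict[str, float]:
    """Term frequency times a smoothed inverse document frequency (assumed weighting)."""
    counts = Counter(tokens)
    return {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, count in counts.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviewers(abstract: str, reviewers: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank reviewers by similarity of their publication text to the abstract."""
    paper_tokens = tokenize(abstract)
    reviewer_tokens = {name: tokenize(text) for name, text in reviewers.items()}
    all_docs = [paper_tokens] + list(reviewer_tokens.values())
    doc_freq = Counter()
    for tokens in all_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    paper_vec = tfidf(paper_tokens, doc_freq, len(all_docs))
    scores = [
        (name, cosine(paper_vec, tfidf(tokens, doc_freq, len(all_docs))))
        for name, tokens in reviewer_tokens.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented example data: two reviewers, one of whom shares vocabulary with the paper.
ranking = rank_reviewers(
    "unsupervised morphology learning for zulu",
    {
        "alice": "morphology segmentation unsupervised learning of words",
        "bob": "protein folding simulation on graphics hardware",
    },
)
```

Here the reviewer whose publication text shares terms with the abstract is ranked first, and a reviewer with no shared terms receives a similarity of zero.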

Future Generation Computer Systems | 2013

SubSift web services and workflows for profiling and comparing scientists and their published works

Simon Price; Peter A. Flach; Sebastian Spiegler; Christopher P Bailey; N Rogers

Scientific researchers, laboratories, organisations and research communities can be profiled and compared by analysing their published works, including documents ranging from academic papers to web sites, blog posts and Twitter feeds. This paper describes how the vector space model from information retrieval, more normally associated with full text search, has been employed in the open source SubSift software to support workflows to profile and compare such collections of documents. SubSift was originally designed to match submitted conference or journal papers to potential peer reviewers based on the similarity between the paper's abstract and the reviewers' publications as found in online bibliographic databases such as Google Scholar. The software is implemented as a family of RESTful web services that, composed into a re-usable workflow, have already been used to support several major data mining conferences. Alternative workflows and service compositions are now enabling other interesting applications, such as expert finding for the press and media, organisational profiling, and suggesting potential interdisciplinary research partners. This work is a useful generalisation and proof-of-concept realisation of an engineering solution to enable RESTful services to be assembled in workflows to analyse general content in a way that is not immediately available elsewhere. The challenges and lessons learned in the implementation and use of SubSift are discussed.

Highlights: We describe a family of RESTful web services for profiling and matching text. We introduce five generic e-Research workflows based on the SubSift web services. We report on experiences and lessons learned in using SubSift web services.

cross language evaluation forum | 2009

Unsupervised morpheme discovery with UNGRADE

Bruno Golénia; Sebastian Spiegler; Peter A. Flach

In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, such that the description length of the window is minimised. In the next step, prefix and suffix sequences are sought using a morpheme graph. The last step consists of combining the morphemes found in the previous steps. UNGRADE has been experimentally evaluated on five languages (English, German, Finnish, Turkish and Arabic) with encouraging results.

international conference on computational linguistics | 2010