Publication


Featured research published by Elke Teich.


Information Processing and Management | 1995

Selective information presentation in an integrated publication system: an application of genre-driven text generation

John A. Bateman; Elke Teich

Abstract: In this paper we focus on an experimental application scenario in which the presentation of appropriately selected information is crucial. The scenario involves an editor producing a large-scale encyclopedia on the basis of a large number of submitted source articles. To make editorial decisions, the editor needs access to dynamically selected aspects of the contents of those articles; that is, a “summarization” of that content needs to be achieved. We present a first prototype that provides a generic basis for such functionality. The essential features of our system supporting this functionality build on multilingual, genre-driven automatic text generation. The central role of genre in this model is motivated and briefly illustrated by examples of generated texts. The scenario as a whole naturally extends to the information needs of non-expert information seekers and to open information systems.


Speech Communication | 1997

From communicative context to speech: integrating dialogue processing, speech production and natural language generation

Elke Teich; Eli Hagen; Brigitte Grote; John A. Bateman

Abstract: The current article discusses the problem of appropriate intonation selection in person-machine dialogues, such as those expected in intelligent information systems when, for example, information retrieval is required. An approach is proposed that integrates the previously largely separate paradigms of automatic natural language generation and speech synthesis in a person-machine dialogue scenario. The article introduces the two independent base components adopted in the approach, a dialogue model for information retrieval (COR) and a text generation system for German (KOMET-PENMAN), and develops from these a communicative-context-to-speech system architecture. This system provides for the flexible, context-appropriate selection of intonation patterns. The paper argues that such an approach closes some of the well-known gaps in both text-to-speech and concept-to-speech systems.


8th Conference of the American Association for Corpus Linguistics | 2010

Exploring a corpus of scientific texts using data mining

Elke Teich; Peter Fankhauser

We report on a project investigating the linguistic properties of English scientific texts on the basis of a corpus of journal articles from nine academic disciplines. The goal of the project is to gain insights into registers emerging at the boundaries of computer science and some other discipline (e.g., bioinformatics, computational linguistics, computational engineering). The questions we focus on in this paper are (a) how characteristic is the corpus of the meta-register it represents, and (b) how different or similar are the subcorpora in terms of the more specific registers they instantiate? We analyze the corpus using several data-mining techniques, including feature ranking, clustering, and classification, to see how the subcorpora group in terms of selected linguistic features. The results show that our corpus is well distinguished in terms of the meta-register of scientific writing; we also find interesting distinctive features for the subcorpora as indicators of register diversification. Apart from presenting the results of our analyses, we also reflect upon and assess the use of data mining for the tasks of corpus exploration and analysis.
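One common way to carry out a feature-ranking step like the one mentioned above is to score each feature by information gain with respect to the subcorpus labels. The sketch below assumes discretized (here binarized) feature values; the function names, feature names and toy data are hypothetical, and the abstract does not say which ranking criterion the authors actually used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def rank_features(features, labels):
    """Return (feature, gain) pairs, most informative first."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented toy data: four documents from two subcorpora, two binarized features.
labels = ["bioinformatics", "bioinformatics", "comp_ling", "comp_ling"]
features = {
    "feature_a": [1, 1, 0, 0],  # hypothetical feature aligned with the subcorpus
    "feature_b": [1, 0, 1, 0],  # hypothetical feature unrelated to the subcorpus
}
ranking = rank_features(features, labels)
print(ranking)
```

With this invented data, feature_a separates the two subcorpora perfectly (a gain of 1 bit) and feature_b carries no information about them (a gain of 0), so feature_a is ranked first.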


International Conference on Computational Linguistics | 2000

Multilinguality in a text generation system for three Slavic languages

Geert-Jan M. Kruijff; Elke Teich; John A. Bateman; Ivana Kruijff-Korbayová; Hana Skoumalová; Serge Sharoff; Lena Sokolova; Tony Hartley; Kamenka Staykova; Jiří Hana

This paper describes a multilingual text generation system in the domain of CAD/CAM software instructions for Bulgarian, Czech and Russian. Starting from a language-independent semantic representation, the system drafts natural, continuous text as typically found in software manuals. The core modules for strategic and tactical generation are implemented using the KPML platform for linguistic resource development and generation. Prominent characteristics of the approach implemented are a treatment of multilinguality that makes maximal use of the commonalities between languages while also accounting for their differences and a common representational strategy for both text planning and sentence generation.


Natural Language Generation | 1993

Multilingual Textuality: Some Experiences from Multilingual Text Generation

Elke Teich; Liesbeth Degand; John A. Bateman

The present article describes an approach to multilingual text generation focusing on how ‘textuality’ is achieved across languages (here: English, German, and Dutch). We specify an appropriately abstract level of textual semantics that can accommodate both commonalities and differences between languages. We describe the interaction between global-level discourse semantics and grammar via the newly introduced level of local-level discourse semantics, which mediates information between global text structure and the lower linguistic levels, such as grammar and lexis. The implementational basis is the KOMET-PENMAN multilingual grammar development environment, which relies on resource sharing across languages on all strata of the linguistic system. The inclusion of global- and local-level discourse semantics is thus a straightforward extension of the KOMET-PENMAN system, making use of the same kinds of representation and multilingual processing as employed for the lexico-grammar.


European Conference on Artificial Intelligence | 1996

Speech Production in Human-Machine Dialogue: A Natural Language Generation Perspective

Brigitte Grote; Eli Hagen; Adelheit Stein; Elke Teich

This article discusses speech production in dialogue from the perspective of natural language generation, focusing on the selection of appropriate intonation. We argue that in order to assign appropriate intonation contours in speech producing systems, it is vital to acknowledge the diversity of functions that intonation fulfills and to account for communicative and immediate contexts as major factors constraining intonation selection. Bringing forward arguments from a functional-linguistically motivated natural language generation architecture, we present a model of context-to-speech as an alternative to the traditional text-to-speech and concept-to-speech approaches.


Künstliche Intelligenz | 2016

Information Density and Linguistic Encoding (IDeaL)

Matthew W. Crocker; Vera Demberg; Elke Teich

We introduce IDeaL (Information Density and Linguistic Encoding), a collaborative research center that investigates the hypothesis that language use may be driven by the optimal use of the communication channel. From the point of view of linguistics, our approach promises to shed light on selected aspects of language variation that are hitherto not sufficiently explained. Applications of our research can be envisaged in various areas of natural language processing and AI, including machine translation, text generation, speech synthesis and multimodal interfaces.


Natural Language Generation | 2001

Linear order as higher-level decision: information structure in strategic and tactical generation

Geert-Jan M. Kruijff; Ivana Kruijff-Korbayová; John A. Bateman; Elke Teich

We propose a multilingual approach to characterizing word order at the clause level as a means to realize information structure. We illustrate the problem with three languages which differ in the degree of word order freedom they exhibit: Czech, a free word order language in which word order variation is pragmatically determined; English, a fixed word order language in which word order is primarily grammatically determined; and German, a language which is between Czech and English on the scale of word order freedom. Our work is theoretically rooted in previous work on information structuring and word order in the Prague School framework as well as on the systemic-functional notion of Theme. The approach we present has been implemented in KPML.


Association for Information Science and Technology | 2016

The linguistic construal of disciplinarity: A data-mining approach using register features

Elke Teich; Stefania Degaetano-Ortlieb; Peter Fankhauser; Hannah Kermes; Ekaterina Lapshinova-Koltunski

We analyze the linguistic evolution of selected scientific disciplines over a 30‐year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use—both individually and collectively—over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus‐based methods of feature extraction (various aggregated features [part‐of‐speech based], n‐grams, lexico‐grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.


Lexicographica: International Annual for Lexicography | 2012

Formulaic expressions in scientific texts: Corpus design, extraction and exploration

Hannah Kermes; Elke Teich

Table 7: Type-token ratio of 4-gram formulaic expressions across text parts

                   Abstract   Introduction   Main     Conclusion
Types              229        231            230      226
Tokens             4408       4687           4639     4801
Type-token ratio   0.052      0.0493         0.0496   0.0471

[Figure 7: Ranking of the top 20 formulas of Computer Science]

In contrast to the type-token distribution for academic disciplines, we observe only very slight differences across text parts. The highest density of formulaic expressions is in the Conclusion, which uses the most formulas but also the fewest types. The fewest formulas are used in the Abstract. We would have expected a more pronounced difference. For further exploration, we look at the distribution of structural types of formulas across text parts, shown in decreasing order of frequency as a parallel coordinate plot in Figure 10. Most of the structural types are spread more or less evenly across the text parts. The structural type base_VP_mod shows a clear tendency to occur in the Conclusion: the modals can and could are used to reflect on the presented research, and the modal would is used to express acknowledgments and to point to future work. If we look at the lexical fillers, the three most frequent formulas in the Conclusion are “would like to thank”, “the authors would like” and “authors would like to”, all of which are rather infrequent in the other text parts. Since we look at a single n-gram length only, the resulting structures may include formulas that belong to longer units. Most instances of these three formulas very likely belong to a longer unit (“(the authors) would like to thank”). For full coverage, it is thus desirable to include n-grams of different lengths.

[Figure 8: Comparison of the top 20 formulas in A-B1-C1]

We can further observe a rather high peak for base_VP in the Introduction.
In order to explain this, we have to look more closely at the lexical fillers. There are only 11 different formulas in this class. Three of them occur almost exclusively in the Introduction (around 90% of their occurrences) and are among the top ten formulas in this text part but not among the top 50 overall: “the paper is organized”, “paper is organized as” and “is organized as follows”. Again, the three 4-grams probably belong to a longer unit (“(the) paper is organized as (follows)”). The picture becomes clearer if we look at the ranking of the 20 most frequent formulaic expressions in the Introduction (cf. Figure 11): the ranks of these expressions are extremely low in all other text parts. All of them function as markers introducing specific content (e.g. how the paper is structured, a typical piece of information in introductory sections). We encounter a similar picture for the other expressions (e.g. “of this paper is”, “in this paper we”), which are also used quite frequently in the Abstract and the Conclusion.

[Figure 9: Comparison of the top 20 formulas in A-B4-C4]

5. Conclusions and future work

We have presented a methodology for extracting formulaic expressions and computing their frequency distributions automatically. The pipeline we have built for this purpose allows several (related) queries to be applied consecutively in order to extract information about the usage of formulas according to selected parameters (here: academic disciplines, text parts). The process is easily reproducible and applicable to other corpora and parameters (provided the necessary information is encoded). The pipeline also includes multiple sorting and grouping options, different kinds of statistical analysis, and visualization of the results.
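The extraction and counting steps of such a pipeline can be illustrated in a few lines. This is not the authors' pipeline, which works with corpus queries and structural types of formulas; it is a minimal sketch that assumes naive lowercased whitespace tokenization, treats every word 4-gram as a formula candidate, and normalizes counts to frequency per million words, the unit used for Figure 10. All function names and the two example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n=4):
    """Return contiguous n-grams (as tuples) from a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))

def count_ngrams_by_part(parts, n=4):
    """Map each text part to (n-gram Counter, number of word tokens).

    `parts` maps a part name (e.g. "Introduction") to a list of raw texts.
    Tokenization is naive lowercased whitespace splitting (an assumption).
    """
    result = {}
    for part, texts in parts.items():
        counter, n_words = Counter(), 0
        for text in texts:
            tokens = text.lower().split()
            n_words += len(tokens)
            counter.update(ngrams(tokens, n))
        result[part] = (counter, n_words)
    return result

def per_million(count, n_words):
    """Normalize a raw frequency to occurrences per million word tokens."""
    return count * 1_000_000 / n_words

# Hypothetical two-sentence corpus, invented for illustration only.
parts = {
    "Introduction": ["The paper is organized as follows ."],
    "Conclusion": ["The authors would like to thank the reviewers ."],
}
counts = count_ngrams_by_part(parts)
intro_counter, intro_words = counts["Introduction"]
formula = ("is", "organized", "as", "follows")
print(intro_counter[formula], per_million(intro_counter[formula], intro_words))
```

In a realistic setting, candidate 4-grams would typically be filtered further (for example by a minimum frequency) before being treated as formulaic expressions, and the per-million figures would be computed over each text part of the full corpus rather than a single sentence.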
We have shown selected analyses using the pipeline on a corpus of scientific texts, focusing on 4-grams. The results show differences in the distribution of formulas and their structural types across academic disciplines and text parts. We further observed that academic disciplines differ with respect to the density of formulas: Linguistics, Biology and Computational Linguistics are less dense than the other six disciplines included in the corpus. With regard to text parts, the Conclusion has the highest density of formulas, while the other text parts (Abstract, Introduction, Main) are rather similar. To interpret these results further, we need to know the functions of the extracted formulas. We have performed some preliminary experiments in clustering formulas in order to determine their functions. The results look promising, but further information will have to be included in the classification, such as collocation information, syntactic properties (word order changes, syntactic function, etc.) and morpho-syntactic properties (inflection, determiners, etc.).

[Figure 10: 4-gram distribution of structural types across text parts (frequency per million)]

Our longer-term goals are twofold. First, we want to investigate the diachronic dimension, looking at recent diachronic changes in connection with the evolution of the contact disciplines (i.e. Computational Linguistics, Bioinformatics, Digital Construction, Micro-Electronics). Here, we are interested in processes of diversification (i.e. as a discipline matures, we would expect it to develop distinctive patterns of linguistic variation) as well as standardization (i.e. as a discipline matures, we would expect it to develop a fairly stable set of recurring linguistic patterns with rather little variation).
Second, as mentioned in Section 1, we are planning to build a digital resource for language pedagogy that may serve both students and teachers as a source of information on scientific writing. Essentially, this will take the form of an online corpus annotated at various linguistic levels and for various linguistic phenomena, including formulaic expressions, very much in the spirit of Davies’ WORD AND PHRASE INFO. The corpus may be queried and/or browsed. We will make additional information available about the annotated formulas, e.g. frequency distribution, collocations, structural type, and eventually also functional information. Thus, although not prototypical of electronic dictionaries, the resource can provide valuable lexicographic information. Given the nature of formulaic expressions, we believe a close connection to a corpus is essential, since information on how formulas are used is what matters most to the potential user.

[Figure 11: Ranking of the 20 most frequent formulas in the Introduction]

We are also planning to build a processing pipeline for the annotation process. Together with the dedicated processing pipelines for extraction and analysis, this will potentially allow students and teachers to analyze and annotate their own corpora (either corpora from other registers or student essays). This could reveal shortcomings and strengths of an essay with respect to the usage of formulaic expressions (and possibly other phenomena), and might provide helpful information for better mastering these important building blocks of discourse. Both the corpus and the processing pipelines will be made available through the CLARIN-D infrastructure.
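The type-token ratios in Table 7 are the number of distinct 4-gram formulas (types) divided by their total number of occurrences (tokens) in each text part. The short check below recomputes the ratios from the two count rows of Table 7, reading them as types and tokens (an interpretation that the reported ratios bear out), and confirms the observations in the text: the Conclusion has the most formula tokens but the fewest types, and the Abstract has the fewest tokens. The names used are illustrative only.

```python
# Counts from Table 7, read as (types, tokens) per text part.
TABLE_7 = {
    "Abstract": (229, 4408),
    "Introduction": (231, 4687),
    "Main": (230, 4639),
    "Conclusion": (226, 4801),
}

def type_token_ratio(types, tokens):
    """Distinct formulas divided by total formula occurrences."""
    return types / tokens

# Ratios rounded to four decimals, as reported in the table.
ratios = {part: round(type_token_ratio(*counts), 4) for part, counts in TABLE_7.items()}

# Which text parts have the most tokens, the fewest types, and the fewest tokens.
most_tokens = max(TABLE_7, key=lambda part: TABLE_7[part][1])
fewest_types = min(TABLE_7, key=lambda part: TABLE_7[part][0])
fewest_tokens = min(TABLE_7, key=lambda part: TABLE_7[part][1])

for part, ratio in ratios.items():
    print(f"{part:<12} {ratio:.4f}")
print(most_tokens, fewest_types, fewest_tokens)
```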

Collaboration


Top Co-Authors

Peter Fankhauser

Technische Universität Darmstadt

Richard Eckart

Technische Universität Darmstadt

Sabine Bartsch

Technische Universität Darmstadt
