Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Chris Biemann is active.

Publication


Featured research published by Chris Biemann.


Workshop on Graph-Based Methods for Natural Language Processing | 2006

Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems

Chris Biemann

We introduce Chinese Whispers, a randomized graph-clustering algorithm that is time-linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes, and word sense disambiguation. In doing so, we exploit the fact that the small-world property holds for many graphs in NLP.
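
The update rule at the heart of the algorithm is compact enough to sketch. Below is a minimal, illustrative Python implementation assuming a weighted adjacency-dict representation of the graph; the fixed iteration count and tie-breaking via max() are arbitrary choices here, not prescriptions from the paper.

```python
import random
from collections import defaultdict

def chinese_whispers(graph, iterations=20, seed=0):
    """Sketch of Chinese Whispers: `graph` maps each node to a dict
    of {neighbor: edge_weight}. Every node starts in its own class;
    in randomized order, each node adopts the class with the highest
    summed edge weight among its neighbors. One pass costs O(|E|)."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # one class per node
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            scores = defaultdict(float)
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            if scores:
                labels[node] = max(scores, key=scores.get)
    return labels

# Two triangles joined by one weak edge separate into two clusters.
g = {
    'a': {'b': 1, 'c': 1}, 'b': {'a': 1, 'c': 1},
    'c': {'a': 1, 'b': 1, 'd': 0.1},
    'd': {'c': 0.1, 'e': 1, 'f': 1},
    'e': {'d': 1, 'f': 1}, 'f': {'d': 1, 'e': 1},
}
print(chinese_whispers(g))
```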


Meeting of the Association for Computational Linguistics | 2006

Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering

Chris Biemann

An unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods is described. Unlike current state-of-the-art approaches, the method itself generates the kind and number of different tags. We compute and merge two partitionings of word graphs: one based on context similarity of high-frequency words, another on log-likelihood statistics for words of lower frequencies. Using the resulting word clusters as a lexicon, a Viterbi POS tagger is trained, which is refined by a morphological component. The approach is evaluated on three different languages by measuring agreement with existing taggers.
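
As a rough sketch of the first of the two partitionings, the snippet below builds a context-similarity graph over high-frequency words by connecting words whose left/right neighbor profiles overlap; a graph-clustering algorithm such as Chinese Whispers above can then partition it into induced word classes. The thresholds and the simple neighbor-count features are illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter, defaultdict

def context_similarity_graph(sentences, top_n=200, min_shared=2):
    # Keep the top_n high-frequency words as clustering candidates.
    freq = Counter(w for s in sentences for w in s)
    hi = {w for w, _ in freq.most_common(top_n)}
    # Record each candidate's immediate left/right neighbors as features.
    contexts = defaultdict(Counter)
    for s in sentences:
        for i, w in enumerate(s):
            if w not in hi:
                continue
            if i > 0:
                contexts[w][('L', s[i - 1])] += 1
            if i + 1 < len(s):
                contexts[w][('R', s[i + 1])] += 1
    # Link word pairs by the weight of their shared context features.
    graph = defaultdict(dict)
    words = sorted(contexts)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            shared = sum((contexts[u] & contexts[v]).values())
            if shared >= min_shared:
                graph[u][v] = graph[v][u] = shared
    return graph
```

Clustering this graph yields word classes that can serve as the tagger's lexicon, with each cluster playing the role of one induced tag.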


Lecture Notes in Computer Science | 2013

The Semantic Web - ISWC 2013

Harith Alani; Lalana Kagal; Achille Fokoue; Paul T. Groth; Chris Biemann; Josiane Xavier Parreira; Lora Aroyo; Natasha Noy; Chris Welty; Krzysztof Janowicz

As collaborative, or network, science spreads into more science, engineering, and medical fields, both the participants and their funders have expressed a very strong desire for highly functional data and information capabilities that are a) easy to use, b) integrated in a variety of ways, c) able to leverage prior investments and keep pace with rapid technical change, and d) not expensive or time-consuming to build or maintain. In response, and based on our accumulated experience over the last decade and a maturing of several key semantic web approaches, we have adapted, extended, and integrated several open source applications and frameworks that handle major portions of functionality for these platforms. At minimum, these functions include: an object-type repository, collaboration tools, an ability to identify and manage all key entities in the platform, and an integrated portal to manage diverse content and applications, with varied access levels and privacy options.

At the same time, there is increasing attention to how researchers present and explain results based on interpretation of increasingly diverse and heterogeneous data and information sources. With the renewed emphasis on good data practices, informatics practitioners have responded to this challenge with maturing informatics-based approaches. These approaches include, but are not limited to: use case development; information modeling and architectures; elaborating vocabularies; mediating interfaces to data and related services on the Web; and traceable provenance.

The current era of data-intensive research presents numerous challenges to both individuals and research teams. In environmental science especially, sub-fields that were data-poor are becoming data-rich (in volume, type, and mode), while some that were largely model/simulation driven are now dramatically shifting to data-driven or at least to data-model assimilation approaches. These paradigm shifts make it very hard for researchers used to one mode to shift to another, let alone produce products of their work that are usable or understandable by non-specialists. However, it is exactly at these frontiers where much of the exciting environmental science needs to be performed and appreciated.


Journal of Language Modelling | 2013

Text: now in 2D! A framework for lexical expansion with contextual similarity

Chris Biemann; Martin Riedl

A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only are syntagmatic relations annotated in the text, but paradigmatic relations are also made explicit by generating lexical expansions. We operationalize distributional similarity in a general framework for large corpora, and describe a new method to generate similar terms in context. Our evaluation shows that distributional similarity is able to produce high-quality lexical resources in an unsupervised and knowledge-free way, and that our highly scalable similarity measure yields better scores in a WordNet-based evaluation than previous measures for very large corpora. Evaluating on a lexical substitution task, we find that our contextualization method improves over a non-contextualized baseline across all parts of speech, and we show how the metaphor can be applied successfully to part-of-speech tagging. A number of ways to extend and improve the contextualization method within our framework are discussed. Unlike comparable approaches, our framework defines a model of lexical expansions in context that can generate the expansions rather than rank a given list, and thus does not require existing lexical-semantic resources.
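
One common way to operationalize count-based distributional similarity at this scale is to prune each term's context features to the most salient ones and score term pairs by feature overlap; counting per feature makes the computation easy to distribute. The weighting and pruning below are assumptions for illustration, not necessarily the paper's exact measure.

```python
from collections import defaultdict
from itertools import combinations

def similar_terms(term_features, k=100):
    """`term_features` maps a term to {context_feature: weight}."""
    # Keep only each term's k highest-weighted context features.
    top = {t: set(sorted(f, key=f.get, reverse=True)[:k])
           for t, f in term_features.items()}
    # Invert the index: feature -> terms that retained it.
    holders = defaultdict(set)
    for t, feats in top.items():
        for feat in feats:
            holders[feat].add(t)
    # Every shared feature adds one point to a term pair's similarity.
    sims = defaultdict(lambda: defaultdict(int))
    for terms in holders.values():
        for u, v in combinations(sorted(terms), 2):
            sims[u][v] += 1
            sims[v][u] += 1
    return sims

feats = {'car': {'drive:obj': 5, 'park:obj': 3, 'red:amod': 1},
         'truck': {'drive:obj': 4, 'park:obj': 2, 'heavy:amod': 1}}
print(similar_terms(feats, k=2))  # car and truck share their top features
```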


North American Chapter of the Association for Computational Linguistics | 2015

Do Supervised Distributional Methods Really Learn Lexical Inference Relations?

Omer Levy; Steffen Remus; Chris Biemann; Ido Dagan

Distributional representations of words have been recently used in supervised settings for recognizing lexical inference relations between word pairs, such as hypernymy and entailment. We investigate a collection of these state-of-the-art methods, and show that they do not actually learn a relation between two words. Instead, they learn an independent property of a single word in the pair: whether that word is a “prototypical hypernym”.
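
The paper's central diagnostic can be made concrete with a toy experiment: if a classifier that sees only the candidate hypernym's vector performs about as well as one that sees the concatenated pair, the pair model cannot be capturing a relation between the two words. The vectors and labels below are synthetic by construction; only the comparison mirrors the paper's argument.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.normal(size=(n, d))         # embeddings of the first word
Y = rng.normal(size=(n, d))         # embeddings of the candidate hypernym
labels = (Y[:, 0] > 0).astype(int)  # label depends on y alone, by design

pair = LogisticRegression(max_iter=1000).fit(np.hstack([X, Y]), labels)
solo = LogisticRegression(max_iter=1000).fit(Y, labels)
print(pair.score(np.hstack([X, Y]), labels), solo.score(Y, labels))
# Near-identical accuracies signal that the pair model exploits only a
# property of y -- the "prototypical hypernym" effect described above.
```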


Language Resources and Evaluation | 2013

Creating a system for lexical substitutions from scratch using crowdsourcing

Chris Biemann

This article describes the creation and application of the Turk Bootstrap Word Sense Inventory for 397 frequent nouns, which is a publicly available resource for lexical substitution. This resource was acquired using Amazon Mechanical Turk. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then, more contexts are collected. Contexts that cannot be assigned to a current target word's sense inventory re-enter the bootstrapping loop and get a supply of substitutions. This process yields a sense inventory with its granularity determined by substitutions, as opposed to psychologically motivated concepts. It comes with a large number of sense-annotated target word contexts. Evaluation of data quality shows that the process is robust against noise from the crowd, produces a less fine-grained inventory than WordNet, and provides a rich body of high-precision substitution data at low cost. Using the data to train a system for lexical substitutions, we show that the amount and quality of the data are sufficient for producing high-quality substitutions automatically. In this system, co-occurrence cluster features are employed as a means to cheaply model topicality.
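
The sense-clustering step of such a bootstrapping loop can be sketched as grouping contexts by the overlap of their elicited substitution sets; contexts whose substitutions match no existing group would re-enter the loop. The greedy procedure and threshold are illustrative stand-ins, not the paper's actual clustering method.

```python
def cluster_by_substitutions(contexts, min_overlap=2):
    """`contexts` is a list of (context_id, substitution_set) pairs."""
    clusters = []  # each entry: (merged substitution set, [context ids])
    for cid, subs in contexts:
        for merged, members in clusters:
            if len(merged & subs) >= min_overlap:  # matches this sense
                merged |= subs
                members.append(cid)
                break
        else:  # no sense matched: start a new one
            clusters.append((set(subs), [cid]))
    return clusters

# Contexts of "bank": river readings share substitutions like "shore",
# money readings share "institution"; they land in separate clusters.
ctxs = [(1, {'shore', 'riverside'}), (2, {'institution', 'lender'}),
        (3, {'shore', 'riverbank', 'riverside'}),
        (4, {'institution', 'firm', 'lender'})]
print(cluster_by_substitutions(ctxs))
```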


Frontiers in Psychology | 2011

Remembering words in context as predicted by an associative read-out model

Markus J. Hofmann; Lars Kuchinke; Chris Biemann; Sascha Tamm; Arthur M. Jacobs

Interactive activation models (IAMs) simulate orthographic and phonological processes in implicit memory tasks, but they account neither for associative relations between words nor for explicit memory performance. To overcome both limitations, we introduce the associative read-out model (AROM), an IAM extended by an associative layer implementing long-term associations between words. Following Hebbian learning, two words were defined as “associated” if they co-occurred significantly often in the sentences of a large corpus. In a study-test task, a greater number of associated items in the stimulus set increased the “yes” response rates for non-learned and learned words. To model test-phase performance, the associative layer is initialized with greater activation for learned than for non-learned items. Because IAMs scale inhibitory activation changes by the initial activation, learned items gain a greater signal variability than non-learned items, irrespective of the choice of the free parameters. This explains why the slope of the z-transformed receiver-operating characteristics (z-ROCs) is lower than one during recognition memory. When fitting the model to the empirical z-ROCs, it likewise predicted which word is recognized with which probability at the item level. Since many of the strongest associates reflect semantic relations to the presented word (e.g., synonymy), the AROM merges form-based aspects of meaning representation with meaning relations between words.
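
Co-occurrence significance of the kind used to define the associative layer is often computed with Dunning's log-likelihood ratio over a 2x2 table of sentence counts. The abstract says only that co-occurrence was tested for significance, so the specific test here is an assumed, typical choice.

```python
import math

def log_likelihood(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table:
    k11 = sentences containing both words, k12/k21 = exactly one of
    them, k22 = neither. Large values indicate association."""
    def h(*counts):  # sum of c * ln(c / total) over nonzero cells
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)    # row marginals
                - h(k11 + k21, k12 + k22))   # column marginals

# Two words each occur in 100 of 10,000 sentences and share 60 of them:
print(log_likelihood(60, 40, 40, 9860))  # large score => "associated"
```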


Meeting of the Association for Computational Linguistics | 2014

That's sick dude!: Automatic identification of word sense change across different timescales

Sunny Mitra; Ritwik Mitra; Martin Riedl; Chris Biemann; Animesh Mukherjee; Pawan Goyal

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books. We construct distributional-thesaurus-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we compare the sense clusters of two different time points to find whether (i) a new sense is born, (ii) an older sense has split into more than one sense, (iii) a newer sense has formed from the joining of older senses, or (iv) a particular sense has died. We conduct a thorough evaluation of the proposed methodology both manually as well as through comparison with WordNet. Manual evaluation indicates that the algorithm could correctly identify 60.4% of birth cases from a set of 48 randomly picked samples and 57% of split/join cases from a set of 21 randomly picked samples. Remarkably, in 44% of cases the birth of a novel sense is attested by WordNet, while splits and joins are confirmed by WordNet in 46% and 43% of cases, respectively. Our approach can be applied to lexicography, as well as to applications like word sense disambiguation or semantic search.
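
Operationally, the birth/split/join/death decisions reduce to comparing two clusterings of a word's distributional neighbors by set overlap: a new-time cluster matching nothing old signals a birth, one old cluster matching several new ones signals a split, and so on. The overlap criterion below is a hypothetical stand-in for the paper's actual matching rule.

```python
def sense_changes(old_clusters, new_clusters, min_overlap=0.3):
    """Clusters are sets of neighbor words from a distributional
    thesaurus at two time points. Returns detected change events."""
    def matches(c, others):  # indices of clusters overlapping c enough
        return [i for i, o in enumerate(others)
                if len(c & o) / max(1, min(len(c), len(o))) >= min_overlap]
    events = []
    for i, new in enumerate(new_clusters):
        hits = matches(new, old_clusters)
        if not hits:
            events.append(('birth', i))        # sense appears
        elif len(hits) > 1:
            events.append(('join', i, hits))   # old senses merged
    for j, old in enumerate(old_clusters):
        hits = matches(old, new_clusters)
        if not hits:
            events.append(('death', j))        # sense disappears
        elif len(hits) > 1:
            events.append(('split', j, hits))  # sense divided
    return events

# "sick": an old 'ill' sense persists, a new 'awesome' sense is born.
old = [{'ill', 'unwell', 'diseased'}]
new = [{'ill', 'unwell', 'ailing'}, {'awesome', 'cool', 'wicked'}]
print(sense_changes(old, new))  # [('birth', 1)]
```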


Meeting of the Association for Computational Linguistics | 2014

Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno

Seid Muhie Yimam; Chris Biemann; Richard Eckart de Castilho; Iryna Gurevych

In this paper, we present a flexible approach to the efficient and exhaustive manual annotation of text documents. For this purpose, we extend WebAnno (Yimam et al., 2013), an open-source, web-based annotation tool. While it was previously limited to specific annotation layers, our extension allows adding and configuring an arbitrary number of layers through a web-based UI. These layers can be annotated separately or simultaneously, and support most types of linguistic annotations, such as spans, semantic classes, dependency relations, lexical chains, and morphology. Further, we tightly integrate a generic machine learning component for automatic annotation suggestions of span annotations. In two case studies, we show that automatic annotation suggestions, combined with our split-pane UI concept, significantly reduce annotation time.


Archive | 2014

Structure Discovery in Natural Language

Chris Biemann

Current language technology is dominated by approaches that either enumerate a large set of rules or rely on large amounts of manually labelled data. The creation of both is time-consuming and expensive, which is commonly thought to be the reason why automated natural language understanding has still not made its way into real-life applications. This book sets an ambitious goal: to shift the development of language processing systems to a much more automated setting than in previous works. A new approach is defined: what if computers analysed large samples of language data on their own, identifying structural regularities that perform the necessary abstractions and generalisations in order to better understand language in the process?

After defining the framework of Structure Discovery and shedding light on the nature and the graphic structure of natural language data, several procedures are described that do exactly this: let the computer discover structures without supervision in order to boost the performance of language technology applications. Here, multilingual documents are sorted by language, word classes are identified, and semantic ambiguities are discovered and resolved without using a dictionary or other explicit human input. The book concludes with an outlook on the possibilities implied by this paradigm and puts the methods in perspective with respect to human-computer interaction.

The target audience is academics at all levels (undergraduate and graduate students, lecturers, and professors) working in the fields of natural language processing and computational linguistics, as well as natural language engineers seeking to improve their systems.

Collaboration


Dive into Chris Biemann's collaborations.

Top Co-Authors

Martin Riedl (Technische Universität Darmstadt)

Iryna Gurevych (Technische Universität Darmstadt)


Stefano Faralli (Sapienza University of Rome)


Steffen Remus (Technische Universität Darmstadt)


Eugen Ruppert (Technische Universität Darmstadt)
