
Publication


Featured research published by Stefan Evert.


Behavior Research Methods, Instruments, & Computers | 2003

The NITE XML Toolkit: Flexible annotation for multimodal language data

Jean Carletta; Stefan Evert; Ulrich Heid; Jonathan Kilgour; Judy Robertson; Holger Voormann

Multimodal corpora that show humans interacting via language are now relatively easy to collect. Current tools allow one either to apply sets of time-stamped codes to the data and consider their timing and sequencing or to describe some specific linguistic structure that is present in the data, built over the top of some form of transcription. To further our understanding of human communication, the research community needs code sets with both timings and structure, designed flexibly to address the research questions at hand. The NITE XML Toolkit offers library support that software developers can call upon when writing tools for such code sets and, thus, enables richer analyses than have previously been possible. It includes data handling, a query language containing both structural and temporal constructs, components that can be used to build graphical interfaces, sample programs that demonstrate how to use the libraries, a tool for running queries, and an experimental engine that builds interfaces on the basis of declarative specifications.


Computer Speech & Language | 2005

Using small random samples for the manual evaluation of statistical association measures

Stefan Evert; Brigitte Krenn

In this paper, we describe the empirical evaluation of statistical association measures for the extraction of lexical collocations from text corpora. We argue that the results of an evaluation experiment cannot easily be generalized to a different setting. Consequently, such experiments have to be carried out under conditions that are as similar as possible to the intended use of the measures. Finally, we show how an evaluation strategy based on random samples can reduce the amount of manual annotation work significantly, making it possible to perform many more evaluation experiments under specific conditions.
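As illustration only (these are standard textbook formulas, not the specific measures evaluated in the paper), association measures of this kind are computed from the observed and expected co-occurrence frequencies of a candidate word pair; a minimal Python sketch with hypothetical counts:

```python
import math

def association_scores(o11, r1, c1, n):
    """Compute two simple association scores for a word pair from its
    contingency table: o11 = observed joint frequency, r1/c1 = marginal
    frequencies of the two words, n = sample size (number of pair tokens)."""
    e11 = r1 * c1 / n                      # expected joint frequency
    pmi = math.log2(o11 / e11)             # pointwise mutual information
    tscore = (o11 - e11) / math.sqrt(o11)  # t-score
    return pmi, tscore

# Hypothetical counts for a candidate collocation:
pmi, t = association_scores(o11=30, r1=1000, c1=1200, n=1_000_000)
```

Higher scores indicate a stronger statistical association between the two words than expected under independence.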


Language Resources and Evaluation | 2005

The NITE XML Toolkit: Data Model and Query Language

Jean Carletta; Stefan Evert; Ulrich Heid; Jonathan Kilgour

The NITE XML Toolkit (NXT) is open source software for working with language corpora, with particular strengths for multimodal and heavily cross-annotated data sets. In NXT, annotations are described by types and attribute-value pairs, and can relate to signal via start and end times, to representations of the external environment, and to each other via either an arbitrary graph structure or a multi-rooted tree structure characterized by both temporal and structural orderings. Simple queries in NXT express variable bindings for n-tuples of objects, optionally constrained by type, and give a set of conditions on the n-tuples combined with boolean operators. The defined operators for the condition tests allow full access to the timing and structural properties of the data model. A complex query facility passes variable bindings from one query to another for filtering, returning a tree structure. In addition to describing NXT's core data handling and search capabilities, we explain the stand-off XML data storage format that it employs and illustrate its use with examples from an early adopter of the technology.


Archive | 2006

Using web data for linguistic purposes

Anke Lüdeling; Stefan Evert; Marco Baroni

The world wide web is a mine of language data of unprecedented richness and ease of access (Kilgarriff and Grefenstette 2003). A growing body of studies has shown that simple algorithms using web-based evidence are successful at many linguistic tasks, often outperforming sophisticated methods based on smaller but more controlled data sources (cf. Turney 2001; Keller and Lapata 2003). Most current internet-based linguistic studies access the web through a commercial search engine. For example, some researchers rely on frequency estimates (number of hits) reported by engines (e.g. Turney 2001). Others use a search engine to find relevant pages, and then retrieve the pages to build a corpus (e.g. Ghani and Mladenic 2001; Baroni and Bernardini 2004). In this study, we first survey the state of the art, discussing the advantages and limits of various approaches, and in particular the inherent limitations of depending on a commercial search engine as a data source. We then focus on what we believe to be some of the core issues of using the web to do linguistics. Some of these issues concern the quality and nature of data we can obtain from the internet (What languages, genres and styles are represented on the web?), others pertain to data extraction, encoding and preservation (How can we ensure data stability? How can web data be marked up and categorized? How can we identify duplicate pages and near duplicates?), and others yet concern quantitative aspects (Which statistical quantities can be reliably estimated from web data, and how much web data do we need? What are the possible pitfalls due to the massive presence of duplicates and mixed-language pages?). All points are illustrated through concrete examples from English, German and Italian web corpora.


Meeting of the Association for Computational Linguistics | 2007

zipfR: Word frequency distributions in R

Stefan Evert; Marco Baroni

We introduce the zipfR package, a powerful and user-friendly open-source tool for LNRE modeling of word frequency distributions in the R statistical environment. We give some background on LNRE models, discuss related software and the motivation for the toolkit, describe the implementation, and conclude with a complete sample session showing a typical LNRE analysis.
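zipfR itself is an R package; as background only, the word frequency distributions it models are summarized by a frequency spectrum (the number of types occurring exactly m times), which can be sketched in a few lines of Python (this is not zipfR code):

```python
from collections import Counter

def frequency_spectrum(tokens):
    """Map each frequency class m to V_m, the number of word types
    that occur exactly m times in the sample."""
    type_freqs = Counter(tokens)         # word type -> frequency
    return Counter(type_freqs.values())  # frequency m -> number of types V_m

tokens = "a rose is a rose is a rose".split()
spec = frequency_spectrum(tokens)
# 'a' and 'rose' occur three times each, 'is' occurs twice
```

LNRE models are estimated from exactly this kind of spectrum and then used to extrapolate vocabulary growth beyond the observed sample.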


Conference of the European Chapter of the Association for Computational Linguistics | 2003

Experiments on candidate data for collocation extraction

Stefan Evert; Hannah Kermes

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).


Joint Conference on Lexical and Computational Semantics | 2014

Contrasting Syntagmatic and Paradigmatic Relations: Insights from Distributional Semantic Models

Gabriella Lapesa; Stefan Evert; Sabine Schulte im Walde

This paper presents a large-scale evaluation of bag-of-words distributional models on two datasets from priming experiments involving syntagmatic and paradigmatic relations. We interpret the variation in performance achieved by different settings of the model parameters as an indication of which aspects of distributional patterns characterize these types of relations. Contrary to what has been argued in the literature (Rapp, 2002; Sahlgren, 2006), namely that bag-of-words models based on second-order statistics mainly capture paradigmatic relations and that syntagmatic relations need to be gathered from first-order models, we show that second-order models perform well on both paradigmatic and syntagmatic relations if their parameters are properly tuned. In particular, our results show that the size of the context window and dimensionality reduction play a key role in differentiating DSM performance on paradigmatic vs. syntagmatic relations.
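For illustration (this is not the authors' evaluation code), the context-window parameter discussed above controls which co-occurrences feed a bag-of-words model; a minimal Python sketch, assuming symmetric windows and raw counts:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count symmetric co-occurrences within +/- `window` tokens of each
    target word: the raw data behind a bag-of-words distributional model."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts

toks = "the cat sat on the mat".split()
m = cooccurrence_counts(toks, window=1)  # narrow window: direct neighbours only
```

Widening `window` admits more distant (more loosely syntagmatic) co-occurrences, which is one of the parameters the paper varies.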


International Conference on Computational Linguistics | 2014

SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching

Thomas Proisl; Stefan Evert; Paul Greiner; Besim Kabashi

Being able to quantify the semantic similarity between two texts is important for many practical applications. SemantiKLUE combines unsupervised and supervised techniques into a robust system for measuring semantic similarity. At the core of the system is a word-to-word alignment of two texts using a maximum weight matching algorithm. The system participated in three SemEval-2014 shared tasks and the competitive results are evidence for its usability in that broad field of application.
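As a toy illustration of the core idea only (the system's actual similarity weights and matching algorithm are not reproduced here), a word-to-word alignment by maximum weight matching can be brute-forced for tiny inputs, using a hypothetical character-overlap similarity as the edge weight:

```python
from itertools import permutations

def dice_sim(a, b):
    """Hypothetical stand-in similarity: Dice coefficient on character sets."""
    x, y = set(a), set(b)
    return 2 * len(x & y) / (len(x) + len(y))

def best_alignment(words1, words2):
    """Brute-force maximum-weight word-to-word alignment of two short texts:
    try every assignment of words1 to words2 and keep the one with the
    highest total similarity (feasible only for tiny inputs)."""
    assert len(words1) <= len(words2)
    best, best_score = None, float("-inf")
    for perm in permutations(range(len(words2)), len(words1)):
        score = sum(dice_sim(w, words2[j]) for w, j in zip(words1, perm))
        if score > best_score:
            best = list(zip(words1, (words2[j] for j in perm)))
            best_score = score
    return best, best_score

pairs, score = best_alignment("a cat sleeps".split(),
                              "the cat is sleeping".split())
```

Real systems replace the brute-force search with a polynomial-time matching algorithm (e.g. the Hungarian algorithm for the assignment problem) and use much richer similarity weights.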


North American Chapter of the Association for Computational Linguistics | 2015

KLUEless: Polarity Classification and Association

Nataliia Plotnikova; Micha Kohl; Kevin Volkert; Stefan Evert; Andreas Lerner; Natalie Dykes; Heiko Ermer

This paper describes the KLUEless system, which participated in the SemEval-2015 task on “Sentiment Analysis in Twitter”. This year, the updated system, based on the developments for the same task in 2014 (Evert et al., 2014) and 2013 (Proisl et al., 2013), participated in all five subtasks. The paper gives an overview of the core features, extended by different additional features and parameters required for the individual subtasks. Experiments carried out after the evaluation period on the 2015 test dataset, with the gold standard available, are reported for each subtask to explain the submitted feature selection.


International Conference on Computational Linguistics | 2014

SentiKLUE: Updating a Polarity Classifier in 48 Hours

Stefan Evert; Thomas Proisl; Paul Greiner; Besim Kabashi

SentiKLUE is an update of the KLUE polarity classifier – which achieved good and robust results in SemEval-2013 with a simple feature set – implemented in 48 hours.

Collaboration

Stefan Evert's top co-authors:

Thomas Proisl (University of Erlangen-Nuremberg)
Ulrich Heid (University of Stuttgart)
Anke Lüdeling (Humboldt University of Berlin)
Besim Kabashi (University of Erlangen-Nuremberg)
Paul Greiner (University of Erlangen-Nuremberg)