Publication


Featured research published by Joachim Wermter.


Bioinformatics | 2009

High-performance gene name normalization with GeNo

Joachim Wermter; Katrin Tomanek; Udo Hahn

Motivation: The recognition and normalization of textual mentions of gene and protein names is both particularly important and challenging. Its importance lies in the fact that they constitute the crucial conceptual entities in biomedicine. Their recognition and normalization remains a challenging task because of widespread gene name ambiguities within species, across species, with common English words and with medical sublanguage terms. Results: We present GeNo, a highly competitive system for gene name normalization, which obtains an F-measure performance of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, thus being on a par with the best system on that task. Our system tackles the complex gene normalization problem by employing a carefully crafted suite of symbolic and statistical methods, and by fully relying on publicly available software and data resources, including extensive background knowledge based on semantic profiling. A major goal of our work is to present GeNo's architecture in a lucid and perspicuous way to pave the way to full reproducibility of our results. Availability: GeNo, including its underlying resources, will be available from www.julielab.de. It is also currently deployed in the Semedico search engine at www.semedico.org.
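The reported F-measure can be checked against the stated precision and recall. The following is a minimal sketch, assuming the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not spell out; the function name `f_measure` and the `beta` parameter are illustrative and not part of GeNo.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean of P and R."""
    if precision <= 0.0 or recall <= 0.0:
        # Conventionally, F is 0 when either component is 0.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (in percent).
geno_f = f_measure(87.8, 85.0)
print(round(geno_f, 1))
```

Under this assumption the rounded value agrees with the 86.4% given in the abstract.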


International Conference on Computational Linguistics | 2004

Collocation extraction based on modifiability statistics

Joachim Wermter; Udo Hahn

We introduce a new, linguistically grounded measure of collocativity based on the property of limited modifiability and test it on German PP-verb combinations. We show that our measure not only significantly outperforms the standard lexical association measures typically employed for collocation extraction, but also yields a valuable by-product for the creation of collocation databases, viz. possible structural and lexical attributes. Our approach is language-, structure-, and domain-independent because it only requires some shallow syntactic analysis (e.g., a POS-tagger and a phrase chunker).


Meeting of the Association for Computational Linguistics | 2006

You Can't Beat Frequency (Unless You Use Linguistic Knowledge) -- A Qualitative Evaluation of Association Measures for Collocation and Term Extraction

Joachim Wermter; Udo Hahn

In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of co-occurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.


Empirical Methods in Natural Language Processing | 2005

Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms

Joachim Wermter; Udo Hahn

We here propose a new method which sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lesser degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodology is essentially domain- and corpus-size-independent.
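The idea of limited paradigmatic modifiability can be illustrated with a small sketch. It rests on assumptions: the abstract does not give the formula, so the scoring below (for every choice of slots in an n-gram, take the share of the wildcarded pattern's occurrences that this exact n-gram accounts for, and multiply the shares) is one plausible reading rather than the authors' definition. The name `pmod_sketch` and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pmod_sketch(ngram, counts):
    """Illustrative modifiability score (assumed form, not the paper's).

    A high score means the slots of `ngram` are rarely filled by other
    tokens, i.e. the phrase shows little distributional variation.
    """
    n = len(ngram)
    freq = counts.get(ngram, 0)
    if freq == 0:
        return 0.0
    score = 1.0
    for k in range(1, n + 1):
        for slots in combinations(range(n), k):
            # Count all n-grams that agree with `ngram` outside `slots`.
            pattern_freq = sum(
                c for g, c in counts.items()
                if len(g) == n
                and all(g[i] == ngram[i] for i in range(n) if i not in slots)
            )
            score *= freq / pattern_freq
    return score


# Hypothetical bigram counts, for illustration only.
toy_counts = Counter({
    ("stem", "cell"): 6,
    ("stem", "length"): 2,
    ("large", "cell"): 2,
    ("large", "number"): 5,
    ("small", "number"): 5,
})

term_score = pmod_sketch(("stem", "cell"), toy_counts)
phrase_score = pmod_sketch(("large", "number"), toy_counts)
print(term_score > phrase_score)
```

In this toy data the term-like bigram scores higher than the freely modifiable one, because its slots are filled by fewer competing tokens.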


International Conference on Knowledge Capture | 2005

Finding new terminology in very large corpora

Joachim Wermter; Udo Hahn

Most technical and scientific terms consist of complex, multi-word noun phrases, but certainly not all noun phrases are technical or scientific terms. The distinction of specific terminology from common non-specific noun phrases can be based on the observation that terms reveal a much lesser degree of distributional variation than non-specific noun phrases. We formalize the limited paradigmatic modifiability of terms and, subsequently, test the corresponding algorithm on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using an already existing and community-wide curated biomedical terminology as an evaluation gold standard, we show that our algorithm significantly outperforms standard term identification measures and, therefore, qualifies as a high-performance building block for any terminology identification system. We also provide empirical evidence that the superiority of our approach, beyond a 10-million-word threshold, is essentially domain- and corpus-size-independent.


Computational Intelligence | 2011

SYNTACTIC SIMPLIFICATION AND SEMANTIC ENRICHMENT—TRIMMING DEPENDENCY GRAPHS FOR EVENT EXTRACTION

Ekaterina Buyko; Erik Faessler; Joachim Wermter; Udo Hahn

In our approach to event extraction, dependency graphs constitute the fundamental data structure for knowledge capture. Two types of trimming operations pave the way to more effective relation extraction. First, we simplify the syntactic representation structures resulting from parsing by pruning informationally irrelevant lexical material from dependency graphs. Second, we enrich informationally relevant lexical material in the simplified dependency graphs with additional semantic metadata at several layers of conceptual granularity. These two aggregation operations on linguistic representation structures are intended to avoid overfitting of the machine learning-based classifiers which we use for event extraction (besides manually curated dictionaries). Given this methodological framework, the corresponding JReX system developed by the JulieLab Team from Friedrich-Schiller-Universität Jena (Germany) ranked 2nd among 24 competing teams for Task 1 in the “BioNLP’09 Shared Task on Event Extraction,” with 45.8% recall, 47.5% precision and 46.7% F1-score on all 3,182 events. In more recent experiments, based on slight modifications of JReX and using the same data sets, we were able to achieve 45.9% recall, 57.7% precision, and 51.1% F1-score.


International Conference on Computational Linguistics | 2004

High-performance tagging on medical texts

Udo Hahn; Joachim Wermter

We ran both Brill's rule-based tagger and TnT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TnT outperforms the Brill tagger with state-of-the-art performance figures (close to 97% accuracy). We then trained TnT on a large annotated medical text corpus, with a slightly extended tagset that captures certain medical language particularities, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be immediately reused for medical NLP, but, when trained on medical corpora, they also achieve a higher performance level than for the newspaper genre.


Pacific Rim International Conference on Artificial Intelligence | 2004

Tagging medical documents with high accuracy

Udo Hahn; Joachim Wermter

We ran both Brill's rule-based tagger and TnT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TnT outperforms the Brill tagger with state-of-the-art performance figures (close to 97% accuracy). We then trained TnT on a large annotated medical text corpus, with a slightly extended tagset that captures certain medical language particularities, and achieved 98% tagging accuracy. Hence, statistical off-the-shelf POS taggers can not only be immediately reused for medical NLP, but they also achieve, when trained on medical corpora, a higher performance level than for the newspaper genre.


Bioinformatics | 2009

MaHCO: an ontology of the major histocompatibility complex for immunoinformatic applications and text mining

David S. DeLuca; Elena Beisswanger; Joachim Wermter; Peter A. Horn; Udo Hahn; Rainer Blasczyk

Motivation: The high level of polymorphism associated with the major histocompatibility complex (MHC) poses a challenge to organizing associated bioinformatic data, particularly in the area of hematopoietic stem cell transplantation. Thus, this area of research has great potential to profit from the ongoing development of biomedical ontologies, which offer structure and definition to MHC-data related communication and portability issues. Results: We introduce the design considerations, methodological foundations and implementational issues underlying MaHCO, an ontology which represents the alleles and encoded molecules of the major histocompatibility complex. Importantly for human immunogenetics, it includes a detailed level of human leukocyte antigen (HLA) classification. We then present an ontology browser, search interfaces for immunogenetic fact and document retrieval, and the specification of an annotation language for semantic metadata, based on MaHCO. These use cases are intended to demonstrate the utility of ontology-driven bioinformatics in the field of immunogenetics. Availability and implementation: The MaHCO Ontology is available via the BioPortal: http://www.bioontology.org/tools/portal/bioportal.html, and at: http://purl.org/stemnet/.


Discovery Science | 2005

Massive biomedical term discovery

Joachim Wermter; Udo Hahn

Most technical and scientific terms consist of complex, multi-word noun phrases, but certainly not all noun phrases are technical or scientific terms. The distinction of specific terminology from common non-specific noun phrases can be based on the observation that terms reveal a much lesser degree of distributional variation than non-specific noun phrases. We formalize the limited paradigmatic modifiability of terms and, subsequently, test the corresponding algorithm on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using an already existing and community-wide curated biomedical terminology as an evaluation gold standard, we show that our algorithm significantly outperforms standard term identification measures and, therefore, qualifies as a high-performance building block for any terminology identification system.

Collaboration


Top Co-Authors

Udo Hahn

University of Freiburg

Stefan Schulz

Medical University of Graz
