Publication


Featured research published by Johann Petrak.


Information Processing and Management | 2015

Analysis of named entity recognition and linking for tweets

Leon Derczynski; Diana Maynard; Giuseppe Rizzo; Marieke van Erp; Genevieve Gorrell; Raphaël Troncy; Johann Petrak; Kalina Bontcheva

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.


Bioinformatics | 2005

GPSDB: a new database for synonyms expansion of gene and protein names

Violaine Pillet; Marc Zehnder; Alexander K. Seewald; Anne-Lise Veuthey; Johann Petrak

We present a new database, GPSDB (Gene and Protein Synonyms DataBase), which collects gene/protein names, in a species-specific way, from 14 main biological resources. A web-based search interface gives access to the database: given a gene/protein name, it retrieves all synonyms for this entity and queries Medline with a set of user-selected terms. Availability: GPSDB is freely available from http://biomint.oefai.at/


Applied Artificial Intelligence | 1997

Knowledge discovery in international conflict databases

Johannes Fürnkranz; Johann Petrak; Robert Trappl

Artificial intelligence (AI) is heavily supported by military institutions, while practically no effort goes into the investigation of possible contributions of AI to the avoidance and termination of crises and wars. This article takes a first step in this direction by investigating the use of machine learning techniques for discovering knowledge in international conflict and conflict management databases. We have applied similarity-based case retrieval to the KOSIMO database of international conflicts. Furthermore, we present results of analyzing the CONFMAN database of successful and unsuccessful conflict management attempts with an inductive decision tree learning algorithm. The latter approach seems to be particularly promising, as conflict management events apparently are more repetitive and thus better suited for machine-aided analysis.


Portuguese Conference on Artificial Intelligence | 2001

Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms Before Choosing

Carlos Soares; Johann Petrak; Pavel Brazdil

When facing the need to select the most appropriate algorithm to apply to a new data set, data analysts often follow an approach which can be related to test-driving cars to decide which one to buy: apply the algorithms to a sample of the data to quickly obtain rough estimates of their performance. These estimates are used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach, building on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data that are used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms. RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking than traditional data characterization measures.


International Semantic Web Conference | 2011

Random indexing for finding similar nodes within large RDF graphs

Danica Damljanovic; Johann Petrak; Mihai Lupu; Hamish Cunningham; Mats Carlsson; Gunnar Engström; Bo Andersson

We propose an approach for searching large RDF graphs, using advanced vector space models, and in particular, Random Indexing (RI). We first generate documents from an RDF graph, and then index them using RI in order to generate a semantic index, which is then used to find similarities between graph nodes. We have experimented with large RDF graphs in the domain of life sciences and engaged the domain experts in two stages: firstly, to generate a set of keywords of interest to them, and secondly, to judge the quality of the output of the Random Indexing method, which generated a set of similar terms (literals and URIs) for each keyword of interest.


Archive | 2003

Web Site Access Analysis for a National Statistical Agency

Alípio Mário Jorge; Mário Alves; Marko Grobelnik; Dunja Mladenic; Johann Petrak

Web access log analysis is gaining popularity, especially with the growing number of commercial Web sites selling their products. The driver for this increase in interest is the promise of gaining some insights into the behaviour of users/customers when browsing through their Web site, fuelled by the desire to improve the user experience. In this chapter we describe the approach taken in analysing Web access logs of a non-commercial Web site disseminating Portuguese statistical data. In developing the approach, we follow the common steps for data mining applications (the CRISP-DM phases), and give details about several phases involved in developing the data mining solution. Through intensive communication with the Web site owner, we identified three data mining problems which were successfully addressed using different tools and methods. The solution methodology is briefly described here, accompanied by some of the results for illustrative purposes. We conclude with an attempt to generalize our experience and provide a number of lessons learned.


Archive | 1997

Machine Learning and Case-Based Reasoning: Their Potential Role in Preventing the Outbreak of Wars or in Ending Them

Robert Trappl; Johannes Fürnkranz; Johann Petrak; Jacob Bercovitch

In a current project we investigate the potential contribution of Artificial Intelligence to the avoidance and termination of crises and wars. This paper reports some results obtained by analyzing international conflict databases using machine learning and case-based reasoning techniques.


European Semantic Web Conference | 2015

Using @Twitter Conventions to Improve #LOD-Based Named Entity Disambiguation

Genevieve Gorrell; Johann Petrak; Kalina Bontcheva

State-of-the-art named entity disambiguation approaches tend to perform poorly on social media content, and microblogs in particular. Tweets are processed individually and the richer, microblog-specific context is largely ignored. This paper focuses specifically on quantifying the impact on entity disambiguation performance when readily available contextual information is included from URL content, hashtag definitions, and Twitter user profiles. In particular, including URL content significantly improves performance. Similarly, user profile information for @mentions improves recall by over 10% with no adverse impact on precision. We also share a new corpus of tweets, which have been hand-annotated with DBpedia URIs, with high inter-annotator agreement.


Plant Systematics and Evolution | 1980

Wind Tunnels for the Study of Anemochorous Dispersal Units (Windkanäle für die Untersuchung anemochorer Verbreitungseinheiten)

Friedrich Ehrendorfer; Horst W. Luftensteiner; Johann Petrak

New constructions of a vertical and a horizontal wind tunnel are described. Their function is demonstrated by comparative and quantitative analyses of anemochorous dispersal units. The results for 6 Angiosperm genera with pogonochorous, lophochorous and pterochorous dispersal units, and for 2 species of Tragopogon, indicate remarkable differences and a broad range of dispersal effectiveness.


Cybernetics and Systems | 2000

Searching for patterns in political event sequences: Experiments with the KEDS database

Klaus Kovar; Johannes Fürnkranz; Johann Petrak; Bernhard Pfahringer; Robert Trappl; Gerhard Widmer

This paper presents an empirical study on the possibility of discovering interesting event sequences and sequential rules in a large database of international political events. We have implemented and extended a data mining algorithm, first presented by Mannila and Toivonen (1996), which is able to search for generalized episodes in such event databases. We describe experiments conducted with this algorithm on the Kansas Event Data System (KEDS) database, an event data set covering interactions between countries in the Persian Gulf region. We report some qualitative and quantitative results, and also discuss our experiences with strategies for reducing the problem complexity and focusing the search on interesting subsets of events.

Collaboration


Top Co-Authors

Johannes Fürnkranz

Technische Universität Darmstadt

Robert Trappl

Austrian Research Institute for Artificial Intelligence

Alexander K. Seewald

Austrian Research Institute for Artificial Intelligence

Gerhard Widmer

Johannes Kepler University of Linz
