Michael Granitzer
Graz University of Technology
Publications
Featured research published by Michael Granitzer.
international conference on digital information management | 2008
Michael Granitzer; Mark Kröll; Christin Seifert; Andreas S. Rath; Nicolas Weber; Olivia Dietzel; Stefanie N. Lindstaedt
'Context is key' conveys the importance of capturing the digital environment of a knowledge worker. Knowing the user's context offers various possibilities for support, for example enhancing information delivery or providing work guidance. Hence, user interactions have to be aggregated and mapped to predefined task categories. Without machine learning tools, such an assignment has to be done manually. Identifying suitable machine learning algorithms is necessary to ensure accurate and timely classification of the user's context without inducing additional workload. This paper provides a methodology for recording user interactions and an analysis of supervised classification models, feature types and feature selection for automatically detecting the current task and context of a user. Our analysis is based on a real-world data set and shows the applicability of machine learning techniques.
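To make the pipeline concrete, here is a minimal sketch of task classification over interaction logs. The log texts, task labels and classifier choice are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: window titles and application names captured while
# the user works are treated as text and mapped to predefined task
# categories with a supervised classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

interactions = [
    "outlook inbox reply quarterly report",       # logged while e-mailing
    "firefox wikipedia machine learning survey",  # logged while researching
    "word project proposal draft section two",    # logged while writing
]
tasks = ["email", "research", "writing"]  # predefined task categories

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(interactions, tasks)
print(clf.predict(["outlook reply to meeting invitation"]))  # -> ['email']
```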
international acm sigir conference on research and development in information retrieval | 2010
Markus Muhr; Roman Kern; Michael Granitzer
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence labeling accuracy. However, most labeling algorithms ignore such structural properties, so the impact of hierarchical structures on labeling accuracy is still unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, Chi Square Test, and Information Gain, to make use of these relationships and evaluate their impact on four different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
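As a rough illustration of the parent-child idea, the sketch below scores candidate labels by contrasting a node's term distribution against its parent's. This is a simplified contrastive score in the spirit of the divergence-based labelers, not the paper's exact formulation.

```python
# Terms that are frequent in a cluster node but rare in its parent make
# good labels; the score below rewards exactly that contrast.
import math
from collections import Counter

def label_scores(node_terms, parent_terms, smoothing=1e-6):
    node, parent = Counter(node_terms), Counter(parent_terms)
    n_total, p_total = sum(node.values()), sum(parent.values())
    scores = {}
    for term, freq in node.items():
        p_node = freq / n_total
        p_parent = parent.get(term, 0) / p_total + smoothing
        scores[term] = p_node * math.log(p_node / p_parent)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

node = ["kernel", "svm", "margin", "svm", "training"]
parent = ["learning", "kernel", "data", "training", "model", "cluster"]
print(label_scores(node, parent)[:3])  # "svm" and "margin" rank highest
```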
management of emergent digital ecosystems | 2009
Roman Kern; Michael Granitzer
The task of linear text segmentation is to split a large text document into shorter fragments, usually blocks of consecutive sentences. The algorithms that demonstrated the best performance for this task come at the price of high computational complexity. In our work we present an algorithm that has a computational complexity of O(n) with n being the number of sentences in a document. The performance of our approach is evaluated against algorithms of higher complexity using standard benchmark data sets and we demonstrate that our approach provides comparable accuracy.
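A minimal sketch of how linear-time segmentation can work: each position is visited once and adjacent fixed-size sentence windows are compared, so the total work is O(n) for a fixed window size. The window size and threshold below are made-up parameters, not the paper's algorithm.

```python
# One pass over the sentences; a boundary is placed wherever the lexical
# similarity between the two adjacent windows drops below a threshold.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def segment(sentences, window=2, threshold=0.1):
    bags = [Counter(s.lower().split()) for s in sentences]
    boundaries = []
    for i in range(window, len(bags) - window + 1):
        left = sum(bags[i - window:i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)  # boundary before sentence i
    return boundaries

sents = ["the cat sat", "the cat purred",
         "stocks fell sharply", "markets fell again"]
print(segment(sents))  # -> [2]: boundary between the two topics
```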
international conference on information visualisation | 2009
Vedran Sabol; Wolfgang Kienreich; Markus Muhr; Werner Klieber; Michael Granitzer
Knowledge discovery involves data-driven processes where data is transformed and processed by various algorithms to identify new knowledge. KnowMiner is a service-oriented framework providing a rich set of knowledge discovery functionalities with a focus on text data sets. Complementing the results of automatic machine analysis with the immense processing power of the human visual system has the potential to significantly improve the process of acquiring new knowledge. VisTools is a lightweight visual analytics framework based on the multiple coordinated views (MCV) paradigm, designed for deployment atop KnowMiner's service architecture. In this paper we briefly present both frameworks and, driven by real-world customer requirements, describe how visual techniques can be synergistically combined with machine processing for effective analysis of dynamically changing, metadata-rich text document sets.
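The multiple-coordinated-views idea can be illustrated with a small observer-style sketch: views register with a shared selection model, so a selection made in one view is reflected in all others. Class and method names here are invented for illustration and are not the VisTools API.

```python
class SelectionModel:
    """Shared selection state; views observe it (toy MCV coordination)."""
    def __init__(self):
        self._views = []
        self.selected = set()

    def register(self, view):
        self._views.append(view)

    def select(self, doc_ids):
        self.selected = set(doc_ids)
        for view in self._views:  # every registered view is updated
            view.refresh(self.selected)

class View:
    def __init__(self, name):
        self.name = name

    def refresh(self, selected):
        print(f"{self.name} now highlights: {sorted(selected)}")

model = SelectionModel()
model.register(View("list view"))
model.register(View("scatter plot"))
model.select({"doc-3", "doc-7"})  # a brush in any view updates all views
```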
acm conference on hypertext | 2010
Elisabeth Lex; Andreas Juffinger; Michael Granitzer
In this work, we assess objectivity in online news media. We propose to use topic-independent features, and we show in a cross-domain experiment that with standard bag-of-words models, classifiers implicitly learn topics. Our experiments revealed that our methodology can be applied across different topics with consistent classification performance.
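The following sketch shows what topic-independent features might look like: simple stylometric ratios instead of topic vocabulary, so a classifier cannot latch onto subject matter. The concrete features are assumptions for demonstration, not the paper's feature set.

```python
# Stylometric cues (pronoun use, punctuation, sentence length) generalize
# across topics, unlike bag-of-words features tied to topic vocabulary.
import re

FIRST_PERSON = {"i", "we", "my", "our", "me", "us"}

def style_features(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return [
        sum(t in FIRST_PERSON for t in tokens) / n,  # subjectivity cue
        text.count("!") / n,                         # emphatic punctuation
        text.count('"') / n,                         # quoted speech
        n / max(text.count(".") + 1, 1),             # avg sentence length
    ]

print(style_features('We believe this is an outrage! "Unacceptable," he said.'))
```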
Lecture Notes in Computer Science | 2009
Michael Granitzer; Christin Seifert; Mario Zechner
Automatically linking Wikipedia pages can be done either content-based, by exploiting word similarities, or structure-based, by exploiting characteristics of the link graph. Our approach follows a content-based strategy, detecting Wikipedia titles as link candidates and selecting the most relevant ones as links. The relevance calculation is based on the context, i.e. the text surrounding a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and on determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links, and that straightforward inverse-document-frequency-based scoring of anchor texts achieves around 4% lower Mean Average Precision on the provided data set.
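A minimal sketch of the IDF-based baseline: Wikipedia titles found in the page text become link candidates and are ranked by inverse document frequency. The corpus statistics below are made up for illustration.

```python
# Rare titles (low document frequency) get high IDF and rank first;
# ubiquitous titles are penalized.
import math

def idf_scores(page_text, titles, doc_freq, n_docs):
    text = page_text.lower()
    candidates = [t for t in titles if t.lower() in text]
    return sorted(
        ((t, math.log(n_docs / (1 + doc_freq.get(t, 0)))) for t in candidates),
        key=lambda kv: kv[1],
        reverse=True,
    )

titles = ["machine learning", "graph", "support vector machine"]
doc_freq = {"machine learning": 1200, "graph": 45000,
            "support vector machine": 300}
print(idf_scores("A graph kernel for machine learning ...",
                 titles, doc_freq, 1_000_000))
```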
european conference on research and advanced technology for digital libraries | 2010
Roman Kern; Michael Granitzer
Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries from different encyclopedic corpora. At the core of our system is an alignment algorithm that incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques was found to deliver satisfactory performance.
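To illustrate one possible weighting/ranking combination, the sketch below represents entries of both encyclopedias as TF-IDF vectors and aligns each source entry with its most similar target entry; the actual system combines several such IR techniques.

```python
# Each entry in corpus A is matched to its nearest TF-IDF neighbor in
# corpus B; the cosine score can later be thresholded to reject weak matches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus_a = ["Mozart, composer of the classical era ...",
            "Danube, a river flowing through central Europe ..."]
corpus_b = ["The Danube river crosses ten countries ...",
            "Wolfgang Amadeus Mozart was a prolific composer ..."]

vec = TfidfVectorizer().fit(corpus_a + corpus_b)
sims = cosine_similarity(vec.transform(corpus_a), vec.transform(corpus_b))
for i, row in enumerate(sims):
    print(f"entry {i} in A aligns with entry {row.argmax()} in B "
          f"(score {row.max():.2f})")
```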
knowledge science engineering and management | 2009
Mario Zechner; Michael Granitzer
Support Vector Machines (SVM) have been applied successfully in a wide variety of fields in the last decade. The SVM problem is formulated as a convex objective function subject to box constraints that needs to be maximized, a quadratic programming (QP) problem. In order to solve the QP problem on larger data sets, specialized algorithms and heuristics are required. In this paper we present a new data-squashing method for selecting training instances in support vector learning. Inspired by the growing neural gas algorithm and learning vector quantization, we introduce a new, parameter-robust neural gas variant to retrieve an initial approximation of the training set containing only those samples that will likely become support vectors in the final classifier. This first approximation is refined in the border areas, defined by neighboring neurons of different classes, yielding the final training set. We evaluate our approach on synthetic as well as real-life datasets, comparing run-time complexity and accuracy to a random sampling approach and the exact solution of the support vector machine. Results show that run-time complexity can be significantly reduced while achieving the same accuracy as the exact solution, and that, unlike random sampling, our approach does not rely on data-set-specific parameterization of the sampling rate. Source code, binary executables and the reformatted standard data sets are available for download at http://www.know-center.tugraz.at/forschung/knowledge_relationship_discovery/downloads_demos/sngsvm_source_executables
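A hedged sketch of the data-squashing idea follows; KMeans stands in for the paper's neural gas variant, and the 25% border fraction is an invented parameter. Samples close to prototypes of the opposite class form the reduced training set.

```python
# Learn per-class prototypes, keep only samples near a prototype of the
# *other* class (the border region), and train the SVM on that subset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)

protos = {c: KMeans(n_clusters=5, n_init=5, random_state=0)
              .fit(X[y == c]).cluster_centers_
          for c in np.unique(y)}

# Distance of each sample to the nearest other-class prototype.
dist = np.array([min(np.linalg.norm(p - x, axis=1).min()
                     for c2, p in protos.items() if c2 != c)
                 for x, c in zip(X, y)])
mask = dist <= np.quantile(dist, 0.25)  # keep the 25% closest to the border

svm = SVC(kernel="rbf").fit(X[mask], y[mask])
print(f"trained on {mask.sum()} of {len(X)} samples")
```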
database and expert systems applications | 2011
Roman Kern; Mario Zechner; Michael Granitzer
Author disambiguation is a prerequisite for utilizing bibliographic metadata in citation analysis. Automatic disambiguation algorithms mostly rely on cluster-based disambiguation strategies to identify unique authors given their names and publications. However, most approaches rely on knowing the correct number of unique authors a priori, which is rarely the case in real-world settings. In this publication we analyse cluster-based disambiguation strategies and develop a model selection method to estimate the number of distinct authors based on co-authorship networks. We show that, given clean textual features, the developed model selection method provides accurate estimates of the number of unique authors.
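The sketch below illustrates cluster-based disambiguation with model selection: publications sharing an author name are clustered for varying cluster counts, and the count with the best model-selection score is taken as the estimated number of distinct authors. The silhouette criterion here is a simple stand-in for the paper's co-authorship-network method.

```python
# Two thematically distinct publication groups under one name should
# yield a best score at two clusters, i.e. two distinct authors.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

pubs = ["neural networks for vision", "deep vision models",
        "medieval history of trade", "trade routes in the middle ages"]
X = TfidfVectorizer().fit_transform(pubs).toarray()

best = max(range(2, len(pubs)),
           key=lambda k: silhouette_score(
               X, AgglomerativeClustering(k).fit_predict(X)))
print("estimated number of distinct authors:", best)
```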
cross language evaluation forum | 2009
Roman Kern; Andreas Juffinger; Michael Granitzer
Integrating word sense disambiguation into an information retrieval system could potentially improve its performance. This is the major motivation for the Robust WSD tasks of the Ad-Hoc Track of the CLEF 2009 campaign. For these tasks we have built a customizable and flexible retrieval system. The best-performing configuration of this system is based on research in the area of axiomatic approaches to information retrieval. Furthermore, our experiments show that configurations that incorporate word sense disambiguation (WSD) information into the retrieval process outperformed those without. For the monolingual task the performance difference is more pronounced than for the bilingual task. Finally, we show that our query translation approach works effectively, even when applied in the monolingual task.
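One simple way WSD information can be folded into retrieval is query expansion with synonyms of the disambiguated senses, sketched below; the sense inventory and scoring are toy assumptions, not the CLEF system's configuration.

```python
# Expanding "bank" with finance-sense synonyms steers ranking toward the
# financial document and away from the river document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

SENSE_SYNONYMS = {"bank#finance": ["institution", "deposit"],
                  "bank#river": ["shore", "riverside"]}

def expand(query_terms, senses):
    expanded = list(query_terms)
    for s in senses:
        expanded += SENSE_SYNONYMS.get(s, [])
    return " ".join(expanded)

docs = ["deposit money at the local institution",
        "the river shore flooded in spring"]
query = expand(["bank"], ["bank#finance"])

vec = TfidfVectorizer().fit(docs + [query])
scores = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]
print(sorted(zip(scores, docs), reverse=True)[0])  # the finance document wins
```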