Ioannis Katakis

National and Kapodistrian University of Athens


Publication

Featured research published by Ioannis Katakis.

International Journal of Data Warehousing and Mining | 2007

Multi-Label Classification: An Overview

Grigorios Tsoumakas; Ioannis Katakis

Multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization, and semantic scene classification. This article introduces the task of multi-label classification, organizes the sparse related literature into a structured presentation, and reports comparative experimental results for certain multi-label classification methods. It also contributes the definition of concepts for quantifying the multi-label nature of a data set.
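The two measures usually attributed to this article for quantifying how "multi-label" a data set is are label cardinality (the average number of labels per example) and label density (cardinality normalized by the size of the label set). The sketch below computes both from a list of label sets. The function names are illustrative, not taken from the article.

```python
def label_cardinality(label_sets):
    """Average number of labels per example."""
    return sum(len(y) for y in label_sets) / len(label_sets)


def label_density(label_sets, label_space):
    """Average fraction of the label space |L| assigned to each example."""
    return sum(len(y) / len(label_space) for y in label_sets) / len(label_sets)


Y = [{"a"}, {"a", "b"}, {"b", "c", "d"}]
L = {"a", "b", "c", "d"}
print(label_cardinality(Y))  # 2.0
print(label_density(Y, L))   # 0.5
```

Density makes data sets with label spaces of very different sizes comparable. Cardinality alone does not.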

Data Mining and Knowledge Discovery Handbook | 2009

Mining Multi-label Data

Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas

A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label.
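A common way to represent multi-label data is to encode each label set Y ⊆ L as a binary indicator vector over L. One standard transformation, binary relevance, then splits the problem into one single-label binary task per label λ. The excerpt above does not describe binary relevance; it is shown here only as a widely used illustration. The helper names are hypothetical.

```python
def to_indicator(label_sets, label_space):
    """Encode each label set Y (a subset of L) as a 0/1 vector over a fixed label order."""
    labels = sorted(label_space)
    assert all(set(y) <= set(label_space) for y in label_sets), "each Y must be a subset of L"
    return labels, [[1 if label in y else 0 for label in labels] for y in label_sets]


def binary_relevance_split(X, label_sets, label_space):
    """Build one binary dataset per label: (example, is-this-label-present)."""
    return {
        label: [(x, label in y) for x, y in zip(X, label_sets)]
        for label in sorted(label_space)
    }


order, matrix = to_indicator([{"a"}, {"a", "c"}], {"a", "b", "c"})
print(order, matrix)  # ['a', 'b', 'c'] [[1, 0, 0], [1, 0, 1]]
```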

IEEE Transactions on Knowledge and Data Engineering | 2011

Random k-Labelsets for Multilabel Classification

Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas

A simple yet effective multilabel learning method, called label powerset (LP), considers each distinct combination of labels that exists in the training set as a different class value of a single-label classification task. The computational efficiency and predictive performance of LP are challenged by application domains with a large number of labels and training examples. In these cases, the number of classes may become very large, and at the same time many classes are associated with very few training examples. To deal with these problems, this paper proposes breaking the initial set of labels into a number of small random subsets, called labelsets, and employing LP to train a corresponding classifier for each. The labelsets can be either disjoint or overlapping, depending on which of two strategies is used to construct them. The proposed method is called RAkEL (RAndom k labELsets), where k is a parameter that specifies the size of the subsets. Empirical evidence indicates that RAkEL manages to improve substantially over LP, especially in domains with a large number of labels, and exhibits competitive performance against other high-performing multilabel learning methods.
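The idea can be sketched in plain Python, with some assumptions. Each labelset gets an LP model whose class values are the training label sets restricted to that labelset. At prediction time, each label receives the fraction of votes from the models whose labelset contains it, and the label is kept above a 0.5 threshold. The 1-nearest-neighbour base learner is a hypothetical stand-in for any single-label classifier, and all function names are illustrative.

```python
import math
import random
from collections import Counter


def disjoint_labelsets(labels, k, rng):
    """Randomly partition the labels into disjoint labelsets of size k (the last may be smaller)."""
    shuffled = sorted(labels)
    rng.shuffle(shuffled)
    return [frozenset(shuffled[i:i + k]) for i in range(0, len(shuffled), k)]


def overlapping_labelsets(labels, k, m, rng):
    """Draw m distinct random k-labelsets; different labelsets may share labels."""
    pool = sorted(labels)
    if m > math.comb(len(pool), k):
        raise ValueError("m exceeds the number of distinct k-labelsets")
    chosen = []
    while len(chosen) < m:
        candidate = frozenset(rng.sample(pool, k))
        if candidate not in chosen:
            chosen.append(candidate)
    return chosen


def one_nn(X, targets):
    """Hypothetical base learner: 1-nearest-neighbour with Hamming distance."""
    def predict(x):
        best = min(range(len(X)), key=lambda j: sum(a != b for a, b in zip(X[j], x)))
        return targets[best]
    return predict


def rakel_fit(X, Y, labelsets, base=one_nn):
    models = []
    for ls in labelsets:
        # Label powerset on this labelset: each restricted label combination is one class value.
        targets = [frozenset(set(y) & ls) for y in Y]
        models.append((ls, base(X, targets)))
    return models


def rakel_predict(models, x, threshold=0.5):
    votes, totals = Counter(), Counter()
    for ls, predict in models:
        predicted = predict(x)
        for label in ls:
            totals[label] += 1
            if label in predicted:
                votes[label] += 1
    # Keep labels whose average vote exceeds the threshold.
    return {label for label in totals if votes[label] / totals[label] > threshold}


rng = random.Random(1)
L = {"a", "b", "c", "d", "e"}
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y = [{"a", "b"}, {"c"}, {"d", "e"}, {"a", "c", "e"}]
models = rakel_fit(X, Y, disjoint_labelsets(L, 2, rng))
print(rakel_predict(models, [1, 1, 0]))  # the training label set {'a', 'c', 'e'}
```

With overlapping labelsets, a label can appear in several LP models. Its vote is then an average rather than a single decision, which is where the ensemble effect comes from.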

Knowledge and Information Systems | 2010

Tracking recurring contexts using ensemble classifiers: an application to email filtering

Ioannis Katakis; Grigorios Tsoumakas; Ioannis P. Vlahavas

Concept drift constitutes a challenging problem for the machine learning and data mining community that frequently appears in real-world stream classification problems. It is usually defined as an unforeseeable change in the concept of the target variable in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift that has not yet received proper attention from the research community. In the case of recurring contexts, concepts may re-appear in the future, and thus older classification models might be beneficial for future classifications. We propose a general framework for classifying data streams by exploiting stream clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual representation model is proposed. The clustering algorithm is then applied in order to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept drifting datasets from the email domain, (b) an instantiation of the proposed framework and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.
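The framework's loop can be sketched as: transform a batch, cluster it, then route learning and prediction to that concept's model. Several choices here are assumptions, not the paper's exact instantiation. The transformation is a hypothetical "per-class feature means" vector. Clustering uses a leader-follower rule with a distance threshold. Each concept's model is a toy incremental nearest-centroid classifier.

```python
import math


def conceptual_vector(batch, classes, n_features):
    """Hypothetical batch transformation: per-class mean of each numeric feature, concatenated."""
    vector = []
    for c in classes:
        rows = [x for x, y in batch if y == c]
        for f in range(n_features):
            vector.append(sum(r[f] for r in rows) / len(rows) if rows else 0.0)
    return vector


class IncrementalCentroid:
    """Toy incremental classifier: running per-class feature sums, predicts nearest centroid."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, batch):
        for x, y in batch:
            s = self.sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        def distance(c):
            return math.dist([v / self.counts[c] for v in self.sums[c]], x)
        return min(self.sums, key=distance)


class RecurringContextEnsemble:
    def __init__(self, classes, n_features, threshold):
        self.classes, self.n_features, self.threshold = classes, n_features, threshold
        self.centers, self.sizes, self.models = [], [], []
        self.current = None  # concept of the most recent batch

    def _assign(self, vec):
        # Leader-follower clustering: join the nearest concept if close enough, else open a new one.
        distances = [math.dist(c, vec) for c in self.centers]
        if distances and min(distances) <= self.threshold:
            i = distances.index(min(distances))
            self.sizes[i] += 1
            self.centers[i] = [c + (v - c) / self.sizes[i] for c, v in zip(self.centers[i], vec)]
            return i
        self.centers.append(list(vec))
        self.sizes.append(1)
        self.models.append(IncrementalCentroid())  # one classifier per discovered concept
        return len(self.centers) - 1

    def process_batch(self, batch):
        self.current = self._assign(conceptual_vector(batch, self.classes, self.n_features))
        self.models[self.current].update(batch)

    def predict(self, x):
        # Assumes at least one batch was processed; uses the current concept's model.
        return self.models[self.current].predict(x)


ens = RecurringContextEnsemble(classes=[0, 1], n_features=1, threshold=1.0)
concept_a = [([0.0], 0), ([10.0], 1)]
concept_b = [([10.0], 0), ([0.0], 1)]
for batch in (concept_a, concept_b, concept_a):
    ens.process_batch(batch)
print(len(ens.models), ens.current)  # 2 0: concept A recurred and its old model is reused
```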

intelligent information systems | 2009

An adaptive personalized news dissemination system

Ioannis Katakis; Grigorios Tsoumakas; Evangelos Banos; Nick Bassiliades; Ioannis P. Vlahavas

With the explosive growth of the World Wide Web, information overload has become a crucial concern. In a data-rich, information-poor environment like the Web, discriminating useful or desirable information from vast amounts of mostly worthless data has become a tedious task. The role of Machine Learning in tackling this problem is thoroughly discussed in the literature, but few systems are available for public use. In this work, we bridge theory and practice by implementing a web-based news reader enhanced with a specifically designed machine learning framework for dynamic content personalization. This gives us the chance to examine applicability and implementation issues and to discuss the effectiveness of machine learning methods for the classification of real-world text streams. The main features of our system, named PersoNews, are: (a) the aggregation of many different news sources that offer an RSS version of their content, (b) incremental filtering, offering dynamic personalization of the content not only per user but also for each feed a user is subscribed to, and (c) the ability for every user to watch a more abstract topic of interest by filtering through a taxonomy of topics. PersoNews is freely available for public use on the WWW (http://news.csd.auth.gr).
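Personalization "per user and per feed" can be pictured as a registry of independent incremental filters keyed by the (user, feed) pair. This is a minimal sketch under assumptions. The abstract does not say which learner PersoNews uses, so a mistake-driven perceptron over word presence stands in for it. The class names are hypothetical.

```python
class IncrementalPerceptron:
    """Stand-in incremental text filter over word-presence features."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, words):
        return self.bias + sum(self.weights.get(w, 0.0) for w in set(words))

    def predict(self, words):
        return self.score(words) > 0

    def update(self, words, liked):
        y = 1.0 if liked else -1.0
        if y * self.score(words) <= 0:  # update only on mistakes (or zero margin)
            for w in set(words):
                self.weights[w] = self.weights.get(w, 0.0) + y
            self.bias += y


class PersonalizedFilters:
    """One separate model per (user, feed), so the same user can have different tastes per feed."""

    def __init__(self):
        self.models = {}

    def _model(self, user, feed):
        return self.models.setdefault((user, feed), IncrementalPerceptron())

    def feedback(self, user, feed, words, liked):
        self._model(user, feed).update(words, liked)

    def show(self, user, feed, words):
        return self._model(user, feed).predict(words)


filters = PersonalizedFilters()
filters.feedback("alice", "tech", ["python", "release"], liked=True)
filters.feedback("alice", "tech", ["celebrity", "gossip"], liked=False)
print(filters.show("alice", "tech", ["python", "tips"]))  # True
print(filters.show("alice", "sports", ["python"]))        # False: the sports feed has its own, untrained model
```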

panhellenic conference on informatics | 2005

On the utility of incremental feature selection for the classification of textual data streams

Ioannis Katakis; Grigorios Tsoumakas; Ioannis P. Vlahavas

In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. To deal with this problem, we propose coupling an incremental feature ranking method with an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier). Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and performs much better than classical incremental learning algorithms.
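One way to couple the two components is to keep only running counts. The counts support re-ranking features by the χ² statistic at any moment, and a Bernoulli naive Bayes model scores each new document using only the current top-k features. Chi-square ranking and naive Bayes are assumed here as illustrative choices. The class name and parameter `k` are hypothetical.

```python
import math


class IncrementalChi2NaiveBayes:
    def __init__(self, k):
        self.k = k
        self.class_docs = {}  # class -> number of documents seen
        self.doc_freq = {}    # word -> {class -> number of documents containing the word}

    def update(self, words, label):
        self.class_docs[label] = self.class_docs.get(label, 0) + 1
        for w in set(words):
            per_class = self.doc_freq.setdefault(w, {})
            per_class[label] = per_class.get(label, 0) + 1

    def chi2(self, word):
        """Max over classes of the 2x2 chi-square statistic, computed from the running counts."""
        n = sum(self.class_docs.values())
        per_class = self.doc_freq.get(word, {})
        present = sum(per_class.values())
        best = 0.0
        for c, n_c in self.class_docs.items():
            a = per_class.get(c, 0)   # word present, class c
            b = present - a           # word present, other classes
            cc = n_c - a              # word absent, class c
            d = n - n_c - b           # word absent, other classes
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        return best

    def selected_features(self):
        # Re-ranked on every call, so the feature subset follows the stream.
        return sorted(self.doc_freq, key=lambda w: (-self.chi2(w), w))[: self.k]

    def predict(self, words):
        present = set(words)
        features = self.selected_features()
        n = sum(self.class_docs.values())

        def log_posterior(c):
            n_c = self.class_docs[c]
            score = math.log(n_c / n)
            for w in features:
                p = (self.doc_freq[w].get(c, 0) + 1) / (n_c + 2)  # Laplace-smoothed P(w present | c)
                score += math.log(p if w in present else 1 - p)
            return score

        return max(self.class_docs, key=log_posterior)


clf = IncrementalChi2NaiveBayes(k=2)
clf.update(["free", "win", "now"], "spam")
clf.update(["free", "offer"], "spam")
clf.update(["meeting", "report", "now"], "ham")
clf.update(["report", "draft"], "ham")
print(clf.selected_features())        # ['free', 'report']
print(clf.predict(["free", "stuff"]))  # spam
```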

hellenic conference on artificial intelligence | 2006

Ensemble pruning using reinforcement learning

Ioannis Partalas; Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas

Multiple classifier systems have been developed in order to improve classification accuracy using methodologies for effective classifier combination. Classical approaches use heuristics, statistical tests, or a meta-learning level in order to find the optimal combination function. We study this problem from a Reinforcement Learning perspective. In our modeling, an agent tries to learn the best policy for selecting classifiers by exploring a state space and considering a future cumulative reward from the environment. We evaluate our approach by comparing it with state-of-the-art combination methods and obtain very promising results.
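The following is a tabular Q-learning sketch of classifier selection. The state/action/reward design is an assumption for illustration, not necessarily the paper's exact formulation. The agent visits the classifiers in a fixed order and decides include/exclude for each. The state is (position, classifiers chosen so far). The only reward arrives at the end: the validation accuracy of majority voting over the chosen subset.

```python
import random
from collections import Counter, defaultdict


def vote_accuracy(preds, subset, y):
    """Validation accuracy of plain majority voting over the classifiers in `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i, target in enumerate(y):
        winner = Counter(preds[m][i] for m in subset).most_common(1)[0][0]
        correct += winner == target
    return correct / len(y)


def prune_with_q_learning(preds, y, episodes=2000, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    n = len(preds)
    q = defaultdict(float)  # ((step, selected-so-far), action) -> value; action 1 = include

    def greedy(state):
        return max((0, 1), key=lambda a: q[(state, a)])

    for _ in range(episodes):
        selected = ()
        for step in range(n):
            state = (step, selected)
            action = rng.randint(0, 1) if rng.random() < epsilon else greedy(state)
            nxt = selected + (step,) if action else selected
            if step == n - 1:
                target = vote_accuracy(preds, nxt, y)  # terminal reward
            else:
                target = max(q[((step + 1, nxt), a)] for a in (0, 1))  # undiscounted bootstrap
            q[(state, action)] += alpha * (target - q[(state, action)])
            selected = nxt

    # Follow the learned greedy policy to obtain the pruned ensemble.
    selected = ()
    for step in range(n):
        if greedy((step, selected)):
            selected += (step,)
    return selected


y = [0, 1, 0, 1]
preds = [[0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 0]]  # classifier 0 is accurate; 1 and 2 agree and are wrong
chosen = prune_with_q_learning(preds, y)
print(chosen, vote_accuracy(preds, chosen, y))
```

Keeping all three classifiers would let the two wrong ones outvote the good one. The learned policy avoids that subset.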

artificial intelligence applications and innovations | 2009

On the Combination of Textual and Semantic Descriptions for Automated Semantic Web Service Classification

Ioannis Katakis; Georgios Meditskos; Grigorios Tsoumakas; Nick Bassiliades; Ioannis P. Vlahavas

Semantic Web services have emerged as the solution to the need for automating several aspects of service-oriented architectures, such as service discovery and composition, and they are realized by combining Semantic Web technologies and Web service standards. In the present paper, we tackle the problem of automatically classifying Web services according to their application domain, taking into account both the textual description and the semantic annotations of OWL-S advertisements. We present results obtained by applying machine learning algorithms to textual and semantic descriptions separately, and we propose methods for increasing the overall classification accuracy through an extended feature vector and an ensemble of classifiers.
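An "extended feature vector" can be built by merging textual tokens and semantic concepts into one sparse vector. Namespacing the keys keeps a word and an ontology concept of the same name from colliding. Separately trained classifiers can then be combined by majority vote. This is a minimal sketch under assumptions. The concept list stands in for whatever is extracted from an OWL-S advertisement, and the helper names are hypothetical.

```python
from collections import Counter


def textual_features(description):
    """Bag of lower-cased words from the service's textual description."""
    return Counter("txt:" + token for token in description.lower().split())


def semantic_features(concepts):
    """Bag of ontology concept names (hypothetically extracted from the OWL-S advertisement)."""
    return Counter("sem:" + concept for concept in concepts)


def extended_vector(description, concepts):
    vector = textual_features(description)
    vector.update(semantic_features(concepts))  # adds counts under separate namespaces
    return vector


def majority_vote(predictions):
    """Combine labels predicted by, e.g., text-only, semantic-only and extended-vector classifiers."""
    return Counter(predictions).most_common(1)[0][0]


print(extended_vector("Find cheap flight flight", ["Flight", "Price"]))
print(majority_vote(["travel", "travel", "medical"]))  # travel
```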

international conference on move to meaningful internet systems | 2006

PersoNews: a personalized news reader enhanced by machine learning and semantic filtering

Evangelos Banos; Ioannis Katakis; Nick Bassiliades; Grigorios Tsoumakas; Ioannis P. Vlahavas

In this paper, we present a web-based, machine-learning enhanced news reader (PersoNews). The main advantages of PersoNews are the aggregation of many different news sources, machine-learning filtering that offers personalization not only per user but also for every feed a user is subscribed to, and finally the ability for every user to watch a more abstract topic of interest by employing a simple form of semantic filtering through a taxonomy of topics.
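A simple form of taxonomy-based filtering works as follows. An article tagged with a specific category matches a broader topic whenever that topic is the category itself or one of its ancestors. The sketch assumes the taxonomy is given as a child-to-parent mapping. The categories and function names are hypothetical.

```python
def ancestors(category, parent):
    """Yield the category and every ancestor up to the taxonomy root."""
    while category is not None:
        yield category
        category = parent.get(category)


def matches_topic(article_categories, topic, parent):
    """True if any of the article's categories falls under the watched topic."""
    return any(topic in ancestors(c, parent) for c in article_categories)


taxonomy = {"Football": "Sports", "Basketball": "Sports", "Sports": None,
            "Elections": "Politics", "Politics": None}
print(matches_topic(["Football"], "Sports", taxonomy))   # True
print(matches_topic(["Elections"], "Sports", taxonomy))  # False
```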

european conference on machine learning | 2004

Effective voting of heterogeneous classifiers

Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas

This paper deals with the combination of classification models that have been derived from running different (heterogeneous) learning algorithms on the same data set. We focus on the Classifier Evaluation and Selection (ES) method, which evaluates each of the models (typically using 10-fold cross-validation) and selects the best one. We examine the performance of this method in comparison with the Oracle, which selects the best classifier for the test set, and show that 10-fold cross-validation has problems in detecting the best classifier. We then extend ES by applying a statistical test to the 10-fold accuracies of the models and combining the most significant ones through voting. Experimental results show that the proposed method, Effective Voting, performs comparably with the state-of-the-art method of Stacking with Multi-Response Model Trees, without the additional computational cost of meta-training.
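The extension of ES can be sketched as follows. Find the model with the highest mean 10-fold accuracy. Keep every model that a paired t-test does not find significantly worse. Combine the kept models by majority vote. The specific test (two-tailed paired t-test, α = 0.05, critical value 2.262 for 9 degrees of freedom) and the tie-breaking are assumptions for illustration. The function names and accuracy values are made up.

```python
import math
from collections import Counter
from statistics import mean, stdev

T_CRITICAL = 2.262  # two-tailed Student's t, alpha = 0.05, 9 degrees of freedom (10 folds)


def paired_t(best, other):
    """Paired t statistic of (best - other) over the per-fold accuracies."""
    diffs = [a - b for a, b in zip(best, other)]
    sd = stdev(diffs)
    if sd == 0:
        return math.inf if mean(diffs) > 0 else 0.0
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def select_models(fold_accuracies):
    best = max(fold_accuracies, key=lambda m: mean(fold_accuracies[m]))
    kept = [m for m, accs in fold_accuracies.items()
            if m == best or paired_t(fold_accuracies[best], accs) < T_CRITICAL]
    return best, kept


def effective_vote(kept, predictions):
    """Majority vote of the kept models; ties go to the label seen first."""
    return Counter(predictions[m] for m in kept).most_common(1)[0][0]


folds = {
    "A": [0.90, 0.88, 0.91, 0.89, 0.90, 0.92, 0.87, 0.90, 0.91, 0.89],
    "B": [0.89, 0.89, 0.90, 0.90, 0.89, 0.91, 0.88, 0.89, 0.90, 0.90],
    "C": [0.70, 0.69, 0.71, 0.70, 0.72, 0.70, 0.68, 0.71, 0.70, 0.69],
}
best, kept = select_models(folds)
print(best, kept)  # A ['A', 'B']
print(effective_vote(kept, {"A": "yes", "B": "yes", "C": "no"}))  # yes
```

B's small per-fold differences from A are not significant, so plain ES would discard B while the test-based variant keeps it in the vote.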
