Ioannis Katakis
National and Kapodistrian University of Athens
Publication
Featured research published by Ioannis Katakis.
International Journal of Data Warehousing and Mining | 2007
Grigorios Tsoumakas; Ioannis Katakis
Multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization, and semantic scene classification. This article introduces the task of multi-label classification, organizes the sparse related literature into a structured presentation, and presents comparative experimental results for certain multi-label classification methods. It also contributes the definition of concepts for quantifying the multi-label nature of a data set.
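As a hedged illustration of what quantifying the multi-label nature of a data set can look like, the sketch below computes two simple statistics: the average number of labels per example, and that average divided by the total number of labels. The function names and the toy data are assumptions made for illustration; the abstract does not spell out the definitions.

```python
from typing import List, Set


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels attached to an example."""
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def label_density(label_sets: List[Set[str]], num_labels: int) -> float:
    """Label cardinality normalised by the size of the label vocabulary."""
    return label_cardinality(label_sets) / num_labels


# Hypothetical scene-classification data: one label set per image.
toy_scenes = [
    {"beach", "sunset"},
    {"mountain"},
    {"beach", "sea", "sunset"},
    set(),
]

cardinality = label_cardinality(toy_scenes)
density = label_density(toy_scenes, num_labels=4)
```

A value of the first statistic close to 1 suggests data that is nearly single-label, while larger values indicate that examples routinely carry several labels at once.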
Data Mining and Knowledge Discovery Handbook | 2009
Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas
A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label.
IEEE Transactions on Knowledge and Data Engineering | 2011
Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas
A simple yet effective multilabel learning method, called label powerset (LP), considers each distinct combination of labels that exists in the training set as a different class value of a single-label classification task. The computational efficiency and predictive performance of LP are challenged by application domains with a large number of labels and training examples. In these cases, the number of classes may become very large, and at the same time many classes are associated with very few training examples. To deal with these problems, this paper proposes breaking the initial set of labels into a number of small random subsets, called labelsets, and employing LP to train a corresponding classifier for each. The labelsets can be either disjoint or overlapping, depending on which of two strategies is used to construct them. The proposed method is called RAkEL (RAndom k labELsets), where k is a parameter that specifies the size of the subsets. Empirical evidence indicates that RAkEL improves substantially over LP, especially in domains with a large number of labels, and exhibits competitive performance against other high-performing multilabel learning methods.
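The label powerset transformation and the two labelset construction strategies described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the training of the per-labelset classifiers is left out, and the overlapping strategy enumerates every k-combination, which is only practical for small label vocabularies.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def label_powerset(label_sets: Sequence[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct label combination to a single-label class index."""
    classes: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        targets.append(classes[key])
    return targets, classes


def disjoint_labelsets(labels: Sequence[str], k: int, rng: random.Random) -> List[List[str]]:
    """Randomly partition the labels into chunks of size k (the last may be smaller)."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def overlapping_labelsets(labels: Sequence[str], k: int, m: int, rng: random.Random) -> List[List[str]]:
    """Sample up to m distinct k-labelsets; a label may appear in several of them."""
    candidates = list(combinations(sorted(labels), k))
    return [list(c) for c in rng.sample(candidates, min(m, len(candidates)))]


def restrict(label_sets: Sequence[Set[str]], labelset: Sequence[str]) -> List[Set[str]]:
    """Project every example's labels onto one labelset before applying LP."""
    keep = set(labelset)
    return [set(labels) & keep for labels in label_sets]
```

For each labelset, one would call `restrict` followed by `label_powerset` and train an ordinary single-label classifier on the resulting targets; how the per-labelset predictions are combined is not covered by this sketch.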
Knowledge and Information Systems | 2010
Ioannis Katakis; Grigorios Tsoumakas; Ioannis P. Vlahavas
Concept drift is a challenging problem for the machine learning and data mining community that frequently appears in real-world stream classification problems. It is usually defined as an unforeseeable change in the concept of the target variable in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift that has not yet received proper attention from the research community. In the case of recurring contexts, concepts may reappear in the future, and thus older classification models might be beneficial for future classifications. We propose a general framework for classifying data streams by exploiting stream clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual representation model is proposed. The clustering algorithm is then applied in order to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept drifting datasets from the email domain, (b) an instantiation of the proposed framework and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.
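The pipeline described above (batch, conceptual representation, clustering, one incremental classifier per concept) might be sketched as follows. Everything concrete here is an assumption made for illustration: the transformation uses per-class feature frequencies over binary features, the clustering is a simple distance-threshold scheme, and `MajorityClassifier` is a trivial stand-in for a real incremental learner. None of these choices is taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (binary features, class label in {0, 1})


def conceptual_vector(batch: List[Example], n_features: int) -> List[float]:
    """Assumed transformation: per class, the fraction of examples with each feature on."""
    vector: List[float] = []
    for cls in (0, 1):
        rows = [x for x, y in batch if y == cls]
        for j in range(n_features):
            vector.append(sum(r[j] for r in rows) / len(rows) if rows else 0.0)
    return vector


class MajorityClassifier:
    """Stand-in incremental learner: predicts the most frequent class seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, batch: List[Example]) -> None:
        self.counts.update(y for _, y in batch)

    def predict(self, x: Sequence[int]) -> int:
        return self.counts.most_common(1)[0][0]


class ConceptEnsemble:
    """Keeps one centroid and one incremental classifier per discovered concept."""

    def __init__(self, n_features: int, threshold: float) -> None:
        self.n_features = n_features
        self.threshold = threshold  # hypothetical distance cut-off for opening a new concept
        self.centroids: List[List[float]] = []
        self.sizes: List[int] = []
        self.models: List[MajorityClassifier] = []

    def observe(self, batch: List[Example]) -> int:
        """Assign the batch to a concept, update that concept's model, return its index."""
        v = conceptual_vector(batch, self.n_features)
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            dist = math.dist(v, centroid)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None or best_dist > self.threshold:
            self.centroids.append(list(v))
            self.sizes.append(0)
            self.models.append(MajorityClassifier())
            best = len(self.centroids) - 1
        n = self.sizes[best] + 1
        # Running mean keeps the centroid representative of all batches in the concept.
        self.centroids[best] = [c + (a - c) / n for c, a in zip(self.centroids[best], v)]
        self.sizes[best] = n
        self.models[best].update(batch)
        return best
```

When a batch lands close to an existing centroid, the classifier trained on earlier batches of that concept is reused and updated, which is the behaviour that makes recurring contexts cheap to handle.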
intelligent information systems | 2009
Ioannis Katakis; Grigorios Tsoumakas; Evangelos Banos; Nick Bassiliades; Ioannis P. Vlahavas
With the explosive growth of the World Wide Web, information overload has become a crucial concern. In a data-rich, information-poor environment like the Web, separating useful or desirable information from large volumes of mostly worthless data has become a tedious task. The role of Machine Learning in tackling this problem is thoroughly discussed in the literature, but few systems are available for public use. In this work, we bridge theory and practice by implementing a web-based news reader enhanced with a specifically designed machine learning framework for dynamic content personalization. This way, we get the chance to examine applicability and implementation issues and discuss the effectiveness of machine learning methods for the classification of real-world text streams. The main features of our system, named PersoNews, are: (a) the aggregation of many different news sources that offer an RSS version of their content, (b) incremental filtering, offering dynamic personalization of the content not only per user but also per feed a user is subscribed to, and (c) the ability for every user to watch a more abstracted topic of interest by filtering through a taxonomy of topics. PersoNews is freely available for public use on the WWW (http://news.csd.auth.gr).
panhellenic conference on informatics | 2005
Ioannis Katakis; Grigorios Tsoumakas; Ioannis P. Vlahavas
In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. To deal with this problem, we propose coupling an incremental feature ranking method with an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier). Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms.
hellenic conference on artificial intelligence | 2006
Ioannis Partalas; Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas
Multiple classifier systems have been developed in order to improve classification accuracy using methodologies for effective classifier combination. Classical approaches use heuristics, statistical tests, or a meta-learning level in order to find the optimal combination function. We study this problem from a Reinforcement Learning perspective. In our modeling, an agent tries to learn the best policy for selecting classifiers by exploring a state space and considering a future cumulative reward from the environment. We evaluate our approach by comparing it with state-of-the-art combination methods and obtain very promising results.
artificial intelligence applications and innovations | 2009
Ioannis Katakis; Georgios Meditskos; Grigorios Tsoumakas; Nick Bassiliades; Ioannis P. Vlahavas
Semantic Web services have emerged as the solution to the need for automating several aspects related to service-oriented architectures, such as service discovery and composition, and they are realized by combining Semantic Web technologies and Web service standards. In the present paper, we tackle the problem of automated classification of Web services according to their application domain taking into account both the textual description and the semantic annotations of OWL-S advertisements. We present results that we obtained by applying machine learning algorithms on textual and semantic descriptions separately and we propose methods for increasing the overall classification accuracy through an extended feature vector and an ensemble of classifiers.
international conference on move to meaningful internet systems | 2006
Evangelos Banos; Ioannis Katakis; Nick Bassiliades; Grigorios Tsoumakas; Ioannis P. Vlahavas
In this paper, we present a web-based, machine-learning enhanced news reader (PersoNews). The main advantages of PersoNews are the aggregation of many different news sources, machine learning filtering that offers personalization not only per user but also for every feed a user is subscribed to, and finally the ability for every user to watch a more abstracted topic of interest by employing a simple form of semantic filtering through a taxonomy of topics.
european conference on machine learning | 2004
Grigorios Tsoumakas; Ioannis Katakis; Ioannis P. Vlahavas
This paper deals with the combination of classification models that have been derived from running different (heterogeneous) learning algorithms on the same data set. We focus on the Classifier Evaluation and Selection (ES) method, which evaluates each of the models (typically using 10-fold cross-validation) and selects the best one. We examine the performance of this method in comparison with the Oracle, which selects the best classifier for the test set, and show that 10-fold cross-validation has problems in detecting the best classifier. We then extend ES by applying a statistical test to the 10-fold accuracies of the models and combining the most significant ones through voting. Experimental results show that the proposed method, Effective Voting, performs comparably with the state-of-the-art method of Stacking with Multi-Response Model Trees without the additional computational cost of meta-training.
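The general idea of testing the 10-fold accuracies and then voting among the surviving models can be sketched as below. The selection rule is an assumption: here a model is kept when a paired t-test does not find it significantly worse than the model with the highest mean accuracy. The abstract does not state which test or rule the paper uses, and the function names and the hard-coded critical value (valid only for 10 folds) are illustrative.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Sequence

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (10 folds).
T_CRIT_DF9 = 2.262


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for fold-wise accuracies a versus b."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        m = mean(diffs)
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def select_models(fold_acc: Dict[str, List[float]]) -> List[str]:
    """Keep the best model plus every model not significantly worse than it."""
    best = max(fold_acc, key=lambda m: mean(fold_acc[m]))
    return [
        name
        for name, acc in fold_acc.items()
        if name == best or paired_t(fold_acc[best], acc) <= T_CRIT_DF9
    ]


def vote(predictions: Sequence[str]) -> str:
    """Plurality vote over the selected models' predictions for one example."""
    return Counter(predictions).most_common(1)[0][0]
```

Compared with picking the single best model, this keeps models whose cross-validation accuracy is statistically indistinguishable from the best, and it needs no meta-level training step.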