Publication


Featured research published by Nenad Tomašev.


IEEE Transactions on Knowledge and Data Engineering | 2014

The Role of Hubness in Clustering High-Dimensional Data

Nenad Tomašev; Miloš Radovanović; Dunja Mladenic; Mirjana Ivanović

High-dimensional data arise naturally in many domains, and have regularly presented a great challenge for traditional data mining techniques, both in terms of effectiveness and efficiency. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper, we take a novel perspective on the problem of clustering high-dimensional data. Instead of attempting to avoid the curse of dimensionality by observing a lower dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points, can be successfully exploited in clustering. We validate our hypothesis by demonstrating that hubness is a good measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise. The proposed methods are tailored mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes.
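
As a rough illustration of the ideas in this abstract, the sketch below computes the k-occurrence (hubness) score N_k of every point with plain NumPy and then uses the strongest hubs as cluster prototypes. The function names and the naive "top hubs as prototypes" assignment are illustrative assumptions, not the paper's actual hubness-based clustering algorithms.

    import numpy as np

    def k_occurrence_scores(X, k=5):
        """N_k(x): how often each point appears in the k-nearest-neighbor
        lists of the other points, i.e. its hubness score."""
        n = X.shape[0]
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # brute-force distances; fine for a sketch
        np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
        knn = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors of every point
        return np.bincount(knn.ravel(), minlength=n)

    def hub_prototype_clustering(X, n_clusters=3, k=5):
        """Toy sketch: take the strongest hubs as cluster prototypes and
        assign every point to its nearest prototype."""
        scores = k_occurrence_scores(X, k)
        prototypes = np.argsort(scores)[::-1][:n_clusters]
        dist = np.linalg.norm(X[:, None, :] - X[prototypes][None, :, :], axis=-1)
        return prototypes, dist.argmin(axis=1)

In the paper itself, hubs serve as guides during the search for centroid-based cluster configurations rather than being fixed up front as in this toy version.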


Conference on Information and Knowledge Management | 2011

A probabilistic approach to nearest-neighbor classification: naive hubness Bayesian kNN

Nenad Tomašev; Miloš Radovanović; Dunja Mladenic; Mirjana Ivanović

Most machine-learning tasks, including classification, involve dealing with high-dimensional data. It was recently shown that the phenomenon of hubness, inherent to high-dimensional data, can be exploited to improve methods based on nearest neighbors (NNs). Hubness refers to the emergence of points (hubs) that appear among the k NNs of many other points in the data, and constitute influential points for kNN classification. In this paper, we present a new probabilistic approach to kNN classification, naive hubness Bayesian k-nearest neighbor (NHBNN), which employs hubness for computing class likelihood estimates. Experiments show that NHBNN compares favorably to different variants of the kNN classifier, including probabilistic kNN (PNN) which is often used as an underlying probabilistic framework for NN classification, signifying that NHBNN is a promising alternative framework for developing probabilistic NN algorithms.
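
A minimal sketch of the underlying idea, class-conditional neighbor occurrence counts driving a naive-Bayes-style kNN rule, assuming simple Laplace smoothing; the exact estimators and anti-hub handling in NHBNN differ in detail, so treat the function names and constants below as illustrative.

    import numpy as np

    def class_conditional_occurrences(X, y, k=5):
        """N_{k,c}(x_i): how often training point i appears among the k nearest
        neighbors of training points labeled c."""
        n, classes = X.shape[0], np.unique(y)
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn = np.argsort(d, axis=1)[:, :k]
        counts = np.zeros((n, classes.size))
        for ci, c in enumerate(classes):
            counts[:, ci] = np.bincount(knn[y == c].ravel(), minlength=n)
        return counts, classes

    def hubness_bayes_predict(X_train, y_train, x_query, k=5, lam=1.0):
        """Treat each of the query's k neighbors as an event whose
        class-conditional probability is estimated from its past occurrences
        on the training data (Laplace-smoothed; an illustrative choice)."""
        counts, classes = class_conditional_occurrences(X_train, y_train, k)
        neigh = np.argsort(np.linalg.norm(X_train - x_query, axis=1))[:k]
        class_sizes = np.array([(y_train == c).sum() for c in classes])
        log_post = np.log(class_sizes / y_train.size)      # class priors
        for i in neigh:
            p = (counts[i] + lam) / (class_sizes * k + lam * len(y_train))
            log_post = log_post + np.log(p)
        return classes[np.argmax(log_post)]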


Knowledge Discovery and Data Mining | 2011

The role of hubness in clustering high-dimensional data

Nenad Tomašev; Miloš Radovanović; Dunja Mladenic; Mirjana Ivanović

(Abstract identical to the IEEE Transactions on Knowledge and Data Engineering 2014 article listed above.)


Knowledge-Based Systems | 2013

Class imbalance and the curse of minority hubs

Nenad Tomašev; Dunja Mladenic

Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularities. This paper deals with evaluating the impact of hubness on learning under class imbalance with k-nearest neighbor methods. Our results suggest that, contrary to the common belief, minority class hubs might be responsible for most misclassification in many high-dimensional datasets. The standard approaches to learning under class imbalance usually clearly favor the instances of the minority class and are not well suited for handling such highly detrimental minority points. In our experiments, we have evaluated several state-of-the-art hubness-aware kNN classifiers that are based on learning from the neighbor occurrence models calculated from the training data. The experiments included learning under severe class imbalance, class overlap and mislabeling and the results suggest that the hubness-aware methods usually achieve promising results on the examined high-dimensional datasets. The improvements seem to be most pronounced when handling the difficult point types: borderline points, rare points and outliers. On most examined datasets, the hubness-aware approaches improve the classification precision of the minority classes and the recall of the majority class, which helps with reducing the negative impact of minority hubs. We argue that it might prove beneficial to combine the extensible hubness-aware voting frameworks with the existing class imbalanced kNN classifiers, in order to properly handle class imbalanced data in high-dimensional feature spaces.
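
One way to make the "minority hubs" observation concrete is to count, for every point, its label-mismatched ("bad") k-occurrences and see how that total splits across classes. The sketch below is a hedged illustration of such an analysis; the function name and the particular statistic are not taken from the paper.

    import numpy as np

    def bad_occurrence_share_by_class(X, y, k=5):
        """Share of all label-mismatched ('bad') k-occurrences generated by
        each class; a large share for a small class points to harmful
        minority hubs."""
        n = X.shape[0]
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn = np.argsort(d, axis=1)[:, :k]
        bad = np.zeros(n)                      # bad k-occurrence count per point
        for i in range(n):
            for j in knn[i]:
                if y[j] != y[i]:
                    bad[j] += 1                # j occurs in i's k-NN list with a different label
        total = bad.sum() if bad.sum() > 0 else 1.0
        return {c: bad[y == c].sum() / total for c in np.unique(y)}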


International Conference on Intelligent Computer Communication and Processing | 2011

The influence of hubness on nearest-neighbor methods in object recognition

Nenad Tomašev; Raluca Brehar; Dunja Mladenic; Sergiu Nedevschi

Object recognition from images is one of the essential problems in automatic image processing. In this paper we focus specifically on nearest neighbor methods, which are widely used in many practical applications, not necessarily related to image data. It has recently come to attention that high-dimensional data also exhibit high hubness, which essentially means that some very influential data points, referred to as hubs, emerge. Unsurprisingly, hubs play a very important role in nearest neighbor classification. We examine the hubness of various image data sets, under several different feature representations. We also show that it is possible to exploit the observed hubness to improve the recognition accuracy.
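
The customary way to quantify how much hubness a given feature representation exhibits is the skewness of the k-occurrence distribution N_k. A rough sketch, assuming plain Euclidean distances over whatever image feature matrix is used:

    import numpy as np

    def hubness_skewness(features, k=10):
        """Skewness of the k-occurrence distribution: strongly positive values
        mean a few images act as hubs in many nearest-neighbor lists."""
        n = features.shape[0]
        d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn = np.argsort(d, axis=1)[:, :k]
        nk = np.bincount(knn.ravel(), minlength=n).astype(float)
        z = (nk - nk.mean()) / (nk.std() + 1e-12)
        return float(np.mean(z ** 3))          # third standardized moment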


Knowledge and Information Systems | 2014

Hubness-aware shared neighbor distances for high-dimensional k-nearest neighbor classification

Nenad Tomašev; Dunja Mladenic

Learning from high-dimensional data is usually quite challenging, as captured by the well-known phrase curse of dimensionality. Data analysis often involves measuring the similarity between different examples. This sometimes becomes a problem, as many widely used metrics tend to concentrate in high-dimensional feature spaces. The reduced contrast makes it more difficult to distinguish between close and distant points, which renders many traditional distance-based learning methods ineffective. Secondary distances based on shared neighbor similarities have recently been proposed as one possible solution to this problem. However, these initial metrics failed to take hubness into account. Hubness is a recently described aspect of the dimensionality curse, and it affects all sorts of k-nearest neighbor learning methods in severely negative ways. This paper is the first to discuss the impact of hubs on forming the shared neighbor similarity scores. We propose a novel, hubness-aware secondary similarity measure.
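
A hedged sketch of the general idea of a hubness-aware shared-neighbor similarity: count the neighbors two points share, but down-weight frequent hubs, which co-occur in many neighbor lists by chance. The log(n / (N_k(z) + 1)) weighting below is an illustrative choice, not the exact measure proposed in the paper.

    import numpy as np

    def knn_lists_and_occurrences(X, k=10):
        """k-NN lists plus the k-occurrence count N_k of every point."""
        n = X.shape[0]
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn = np.argsort(d, axis=1)[:, :k]
        return knn, np.bincount(knn.ravel(), minlength=n)

    def hubness_weighted_snn(i, j, knn, nk):
        """Shared-nearest-neighbor similarity in which each shared neighbor z
        contributes log(n / (N_k(z) + 1)), so rare neighbors count for more
        than ubiquitous hubs (an illustrative weighting)."""
        n = nk.shape[0]
        shared = set(knn[i]) & set(knn[j])
        return sum(np.log(n / (nk[z] + 1)) for z in shared)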


Feature Selection for Data and Pattern Recognition | 2015

Hubness-Aware Classification, Instance Selection and Feature Construction: Survey and Extensions to Time-Series

Nenad Tomašev; Krisztian Buza; Kristóf Marussy; Piroska B. Kis


International Conference on Data Mining | 2011

Nearest Neighbor Voting in High-Dimensional Data: Learning from Past Occurrences

Nenad Tomašev; Dunja Mladenic



Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

The Role of Hubs in Cross-Lingual Supervised Document Retrieval

Nenad Tomašev; Jan Rupnik; Dunja Mladenic


European Conference on Machine Learning | 2013

Hub Co-occurrence Modeling for Robust High-Dimensional kNN Classification

Nenad Tomašev; Dunja Mladenic


Collaboration


Dive into Nenad Tomašev's collaborations.

Top Co-Authors

Dunja Mladenic, Carnegie Mellon University
Krisztian Buza, University of Hildesheim
Raluca Brehar, Technical University of Cluj-Napoca
Sergiu Nedevschi, Technical University of Cluj-Napoca
Kristóf Marussy, Budapest University of Technology and Economics
Piroska B. Kis, College of Dunaújváros