Miloš Radovanović
University of Novi Sad
Publication
Featured research published by Miloš Radovanović.
IEEE Transactions on Knowledge and Data Engineering | 2014
Nenad Tomašev; Miloš Radovanović; Dunja Mladenic; Mirjana Ivanović
High-dimensional data arise naturally in many domains, and have regularly presented a great challenge for traditional data mining techniques, both in terms of effectiveness and efficiency. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper, we take a novel perspective on the problem of clustering high-dimensional data. Instead of attempting to avoid the curse of dimensionality by observing a lower dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points, can be successfully exploited in clustering. We validate our hypothesis by demonstrating that hubness is a good measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise. The proposed methods are tailored mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes.
international conference on machine learning | 2009
Miloš Radovanović; Alexandros Nanopoulos; Mirjana Ivanović
High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the k nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high k-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
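The k-occurrence count described in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and uniformly random data, and the function names (`k_occurrences`, `skewness`, `random_points`) are hypothetical, chosen here for illustration; none of this is taken from the paper's own code.

```python
import random


def k_occurrences(points, k):
    """Count how often each point appears in the k-NN lists of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Squared Euclidean distance from point i to every other point
        # (assumed metric; ties are broken by the lower index).
        dists = []
        for j in range(n):
            if j != i:
                d = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                dists.append((d, j))
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment; larger positive values mean a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return (sum((v - mean) ** 3 for v in values) / n) / var ** 1.5


def random_points(n, dim, seed=0):
    """Hypothetical helper: n points drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # Compare the skew of the k-occurrence distribution at low and high dimensionality.
    for dim in (3, 50):
        counts = k_occurrences(random_points(200, dim), k=5)
        print(dim, max(counts), round(skewness(counts), 3))
```

Every point contributes exactly k entries to the lists of its neighbors, so the counts always sum to n times k; what changes with dimensionality is how unevenly that total is spread across points.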
knowledge discovery and data mining | 2011
Nenad Tomašev; Miloš Radovanović; Dunja Mladenic; Mirjana Ivanović
High-dimensional data arise naturally in many domains, and have regularly presented a great challenge for traditional data mining techniques, both in terms of effectiveness and efficiency. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper, we take a novel perspective on the problem of clustering high-dimensional data. Instead of attempting to avoid the curse of dimensionality by observing a lower dimensional feature subspace, we embrace dimensionality by taking advantage of inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbor lists of other points, can be successfully exploited in clustering. We validate our hypothesis by demonstrating that hubness is a good measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster configurations. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise. The proposed methods are tailored mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes.
Scientometrics | 2014
Miloš Savić; Mirjana Ivanović; Miloš Radovanović; Zoran Ognjanović; Aleksandar Pejović; Tatjana Jakšić Krüger
Digital preservation of scientific papers enables their wider accessibility, but also provides a valuable source of information that can be used in a longitudinal scientometric study. The Electronic Library of the Mathematical Institute of the Serbian Academy of Sciences and Arts (eLib) digitizes the most prominent mathematical journals printed in Serbia. In this paper, we study a co-authorship network which represents collaborations among authors who published their papers in the eLib journals in an 80-year period (from 1932 to 2011). Such a study enables us to identify patterns and long-term trends in scientific collaborations that are characteristic of a community which mainly consists of Serbian (Yugoslav) mathematicians. Analysis of connected components of the network reveals a topological diversity in the network structure: the network contains a large number of components whose sizes obey a power law, the majority of components are isolated authors or small trivial components, but there is also a small number of relatively large, non-trivial components of connected authors. Our evolutionary analysis shows that the evolution of the network can be divided into six periods that are characterized by different intensity and type of collaborative behavior among eLib authors. Analysis of author metrics shows that betweenness centrality is a better indicator of author productivity and long-term presence in the eLib journals than degree centrality. Moreover, the strength of correlation between productivity metrics and betweenness centrality increases as the network evolves, suggesting that an even stronger correlation can be expected in the future.
Knowledge Based Systems | 2014
Vladimir Kurbalija; Miloš Radovanović; Zoltan Geler; Mirjana Ivanović
A time series consists of a series of values or events obtained over repeated measurements in time. Analysis of time series represents an important tool in many application areas, such as stock-market analysis, process and quality control, observation of natural phenomena, and medical diagnosis. A vital component in many types of time-series analyses is the choice of an appropriate distance/similarity measure. Numerous measures have been proposed to date, with the most successful ones based on dynamic programming. Because these measures have quadratic time complexity, however, global constraints are often employed to limit the search space in the matrix during the dynamic programming procedure and thus speed up computation. Furthermore, it has been reported that such constrained measures can also achieve better accuracy. In this paper, we investigate four representative time-series distance/similarity measures based on dynamic programming, namely Dynamic Time Warping (DTW), Longest Common Subsequence (LCS), Edit distance with Real Penalty (ERP) and Edit Distance on Real sequence (EDR), and the effects of global constraints on them when applied via the Sakoe-Chiba band. To better understand the influence of global constraints and provide deeper insight into their advantages and limitations, we explore the change of the 1-nearest neighbor graph with respect to the change of the constraint size. Also, we examine how these changes reflect on the classes of the nearest neighbors of time series, and evaluate the performance of the 1-nearest neighbor classifier with respect to different distance measures and constraints. Since we determine that constraints introduce qualitative differences in all considered measures, and that different measures are affected by constraints in various ways, we expect our results to aid researchers and practitioners in selecting and tuning appropriate time-series similarity measures for their respective tasks.
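As a rough illustration of how a Sakoe-Chiba band restricts the dynamic programming matrix, the following sketch computes DTW over cells within a fixed distance of the diagonal. It assumes an absolute-difference local cost and a window given as a number of cells; the function name `dtw_sakoe_chiba` and its parameters are hypothetical and do not reflect the authors' implementation.

```python
import math


def dtw_sakoe_chiba(a, b, window=None):
    """DTW distance between sequences a and b, restricted to a Sakoe-Chiba band.

    window is the maximum allowed |i - j| in cells; None means unconstrained.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must cover the length difference, or the final cell is unreachable.
    w = max(window, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within w of the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

With `window=0` and equal-length inputs the band collapses to the diagonal, so the result is the sum of pointwise absolute differences. Widening the window can only lower the distance or leave it unchanged, while the number of filled cells grows from roughly n times the band width toward the full n by m matrix.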
Advances in Web Intelligence and Data Mining | 2006
Miloš Radovanović; Mirjana Ivanović
CatS is a meta-search engine that utilizes text classification techniques to improve the presentation of search results. After posting a query, the user is offered an opportunity to refine the results by browsing through a category tree derived from the dmoz Open Directory topic hierarchy. This paper describes some key aspects of the system (including HTML parsing, classification and displaying of results), outlines the text categorization experiments performed in order to choose the right parameters for classification, and puts the system into the context of related work on (meta-)search engines. The approach of using a separate category tree represents an extension of the standard relevance list, and provides a way to refine the search on need, offering the user a non-imposing, but potentially powerful tool for locating needed information quickly and efficiently. The current implementation of CatS may be considered a baseline, on top of which many enhancements are possible.
conference on recommender systems | 2009
Alexandros Nanopoulos; Miloš Radovanović; Mirjana Ivanović
A crucial operation in memory-based collaborative filtering (CF) is determining nearest neighbors (NNs) of users/items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in high-dimensional spaces that are typical in CF applications. The first is similarity concentration and the second is the appearance of hubs (i.e. points which appear in k-NN lists of many other points). Through theoretical analysis and experimental evaluation we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties like sparsity, and that they can impact CF algorithms by questioning the meaning and representativeness of discovered NNs. Moreover, we show that it is not easy to mitigate the phenomena using dimensionality reduction. Studying these phenomena aims to provide a better understanding of the limitations of memory-based CF and motivate the development of new algorithms that would overcome them.
artificial intelligence methodology systems applications | 2010
Vladimir Kurbalija; Miloš Radovanović; Zoltan Geler; Mirjana Ivanović
international test conference | 2011
Miloš Savić; Mirjana Ivanović; Miloš Radovanović
data warehousing and knowledge discovery | 2006
Miloš Radovanović; Mirjana Ivanović
The popularity of time-series databases in many applications has created an increasing demand for performing data-mining tasks (classification, clustering, outlier detection, etc.) on time-series data. Currently, however, no single system or library exists that specializes in providing efficient implementations of data-mining techniques for time-series data, supports the necessary concepts of representations, similarity measures and preprocessing tasks, and is at the same time freely available. For these reasons we have designed a multi-purpose, multifunctional, extendable system FAP (Framework for Analysis and Prediction), which supports the aforementioned concepts and techniques for mining time-series data. This paper describes the architecture of FAP and the current version of its Java implementation, which focuses on time-series similarity measures and nearest-neighbor classification. The correctness of the implementation is verified through a battery of experiments which involve diverse time-series data sets from the UCR repository.