Dmitry Kangin
Lancaster University
Publications
Featured research published by Dmitry Kangin.
Systems, Man and Cybernetics | 2016
Plamen Angelov; Xiaowei Gu; Dmitry Kangin; Jose C. Principe
In this paper, a novel empirical data analysis approach (abbreviated as EDA) is introduced which is entirely data-driven and free from restricting assumptions and pre-defined problem- or user-specific parameters and thresholds. It is well known that traditional probability theory is restricted by strong prior assumptions which are often impractical and do not hold in real problems. Machine learning methods, on the other hand, are closer to real problems, but they usually rely on problem- or user-specific parameters or thresholds, making them more of an art than a science. In this paper we introduce a theoretically sound yet practically unrestricted and widely applicable approach that is based on the density in the data space. Since the data may take exactly the same value multiple times, we distinguish between data points and unique locations in the data space. The number of data points, k, is larger than or equal to the number of unique locations, l, and at least one data point occupies each unique location. The number of data points that have exactly the same location in the data space (equal value), f, can be seen as a frequency. Through the combination of the spatial density and the frequency of occurrence of discrete data points, a new concept called multimodal typicality, τMM, is proposed in this paper. It offers a closed analytical form that represents ensemble properties derived entirely from the empirical observations of data. Moreover, it is very close to (yet different from) histograms, the probability density function (pdf), and fuzzy set membership functions. Remarkably, there is no need to perform complicated pre-processing such as clustering to obtain the multimodal representation. Moreover, the closed form for the Euclidean and Mahalanobis types of distance, as well as some other forms (e.g. cosine-based dissimilarity), can be expressed recursively, making it applicable to data streams and online algorithms. Inference/estimation of the typicality of data points that were not present in the data so far can also be made. This new concept makes it possible to rethink the very foundations of statistical and machine learning, and to develop a series of anomaly detection, clustering, classification, prediction, control and other algorithms.
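The recursive character of these density-based quantities can be illustrated with a small sketch. Assuming Euclidean distance, a data-space density for the latest sample can be maintained from just a running mean and a running mean of squared norms, so no past samples need to be stored; the function names and exact normalisation below are illustrative, not taken from the paper.

```python
# Illustrative sketch: recursive density estimation over a data stream,
# assuming Euclidean distance. Only three statistics are kept: the sample
# count k, the mean mu, and the mean squared norm X.

def make_rde():
    state = {"k": 0, "mu": None, "X": 0.0}

    def update(x):
        state["k"] += 1
        k = state["k"]
        if state["mu"] is None:
            state["mu"] = list(x)
        else:
            state["mu"] = [(k - 1) / k * m + xi / k
                           for m, xi in zip(state["mu"], x)]
        state["X"] = (k - 1) / k * state["X"] + sum(xi * xi for xi in x) / k
        mu = state["mu"]
        d2 = sum((xi - m) ** 2 for xi, m in zip(x, mu))
        norm_mu2 = sum(m * m for m in mu)
        # Density is inversely related to the average squared distance
        # from x to all samples seen so far.
        return 1.0 / (1.0 + d2 + state["X"] - norm_mu2)

    return update
```

The first sample always has density 1; samples far from the running mean get lower density, which is what makes such quantities usable for streaming anomaly detection.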
International Journal of Intelligent Systems | 2017
Plamen Angelov; Xiaowei Gu; Dmitry Kangin
In this paper, we propose an approach to data analysis which is based entirely on the empirical observations of discrete data samples and the relative proximity of these points in the data space. At the core of the proposed approach is the typicality: an empirically derived quantity that resembles probability. This nonparametric measure is a normalized form of the square centrality (centrality is a measure of closeness used in graph theory). It is also closely linked to the cumulative proximity and to the eccentricity (a measure of the tail of the distribution that is very useful for anomaly detection and analysis of extreme values). In this paper, we introduce and study two types of typicality, namely its local and global versions. The local typicality resembles the well-known probability density function (pdf), the probability mass function, and fuzzy set membership, but differs from all of them. The global typicality, on the other hand, resembles well-known histograms but also differs from them. A distinctive feature of the proposed approach, empirical data analysis (EDA), is that it is not limited by the restrictive and impractical prior assumptions about the data generation model that traditional probability theory and statistical learning approaches rely on. Moreover, it does not require an explicit and binary assumption of either randomness or determinism of the empirically observed data, their independence, or even their number (which can be as low as a couple of data samples). Typicality is considered a fundamental quantity in pattern analysis, derived directly from data and stated in a discrete form, in contrast to the traditional approach, where a continuous pdf is assumed a priori and estimated from data afterward. The typicality introduced in this paper is free from the paradoxes of the pdf. Typicality is objectivist, while fuzzy sets and the belief-based branch of probability theory are subjectivist. The local typicality is expressed in a closed analytical form and can be calculated recursively and, thus, computationally very efficiently. The other nonparametric ensemble properties of the data introduced and studied in this paper, namely the square centrality, cumulative proximity, and eccentricity, can also be updated recursively for various types of distance metrics. Finally, a new type of classifier called the naïve typicality-based EDA class is introduced, which is based on the newly introduced global typicality. This is only one of a wide range of possible applications of EDA, including but not limited to anomaly detection, clustering, classification, prediction, control, rare-event analysis, etc., which will be the subject of further research.
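The quantities named in this abstract can be sketched in batch form as follows, assuming Euclidean distance and reading "square centrality" as the inverse of the cumulative squared distance; the exact normalisations used here are illustrative, and the paper's recursive formulations are more refined.

```python
# Illustrative sketch of cumulative proximity, eccentricity and typicality
# for a finite data set, assuming Euclidean distance.

def cumulative_proximity(data):
    # q(x_i) = sum of squared distances from x_i to every sample
    return [sum(sum((a - b) ** 2 for a, b in zip(x, y)) for y in data)
            for x in data]

def eccentricity(data):
    q = cumulative_proximity(data)
    total = sum(q)
    # Normalised so the values over the data set sum to 2.
    return [2.0 * qi / total for qi in q]

def typicality(data):
    q = cumulative_proximity(data)
    c = [1.0 / qi for qi in q]      # "square centrality": inverse cumulative proximity
    total = sum(c)
    return [ci / total for ci in c]  # normalised to sum to 1
```

On a small set with an outlier, the outlier gets the largest eccentricity and the smallest typicality, which is the behaviour the abstract describes for anomaly detection.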
2014 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS) | 2014
Plamen Angelov; Dmitry Kangin; Xiaowei Zhou; Denis Kolev
A new algorithm for symbol recognition is proposed in this paper. It is based on the AutoClass classifier [1], [2], which itself is a version of the evolving fuzzy rule-based classifier eClass [3] in which AnYa [1] type fuzzy rules and data density are used. In this classifier, the symbol recognition task is divided into two stages: feature extraction, and recognition based on the feature vector. This approach gives flexibility, allowing us to use various feature sets with one classifier. The feature extraction is performed by means of gist image descriptors [4] augmented by several additional features. In this method, we map the symbol images into the feature space, and then we apply the AutoClass classifier in order to recognise them. Unlike many of the state-of-the-art algorithms, the proposed algorithm is evolving, i.e. it is capable of incremental learning and of changing its structure during the training phase. The classifier update is performed sample by sample, and there is no need to memorize the training set for recognition or further updates. This makes it possible to adapt the classifier to growing and changing data sets, which is especially useful for improving large-scale systems during operation. Moreover, the classifier is computationally cheap and has shown stable recognition time as the training data set grows, which is extremely important for online applications.
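The sample-by-sample, structure-evolving update described above can be illustrated with a minimal prototype-based sketch: the model keeps only a list of prototypes, and a new prototype is created when an incoming sample is far from, or mislabelled by, all existing ones. This is a generic illustration of the evolving-classifier idea, not the actual AutoClass rule base.

```python
# Minimal sketch of an evolving, sample-by-sample classifier: the model
# structure (the prototype list) grows as needed, and the training set is
# never memorised beyond the prototypes themselves.

def make_evolving_classifier(radius=1.0):
    prototypes = []  # list of (feature vector, label)

    def nearest(x):
        best, best_d = None, float("inf")
        for p, label in prototypes:
            d = sum((a - b) ** 2 for a, b in zip(p, x)) ** 0.5
            if d < best_d:
                best, best_d = label, d
        return best, best_d

    def train_one(x, label):
        # Structure evolves: add a prototype when the sample disagrees
        # with, or lies outside the radius of, every existing prototype.
        pred, d = nearest(x)
        if pred != label or d > radius:
            prototypes.append((list(x), label))

    def predict(x):
        return nearest(x)[0]

    return train_one, predict
```

Each training sample is processed once and then discarded, which is what makes recognition time stable as the training set grows.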
Procedia Computer Science | 2015
Dmitry Kangin; Plamen Angelov; José Antonio Iglesias; Araceli Sanchis
In the era of big data, huge amounts of data are generated and updated every day, and their processing and analysis is an important challenge today. In order to tackle this challenge, it is necessary to develop specific techniques which can process large volumes of data within limited run times. TEDA is a new systematic framework for data analytics, which is based on the typicality and eccentricity of the data. This framework is spatially-aware, non-frequentist and non-parametric. TEDA can be used for the development of alternative machine learning methods; in this work, we use it for classification (TEDAClass). Specifically, we present a TEDAClass-based approach which can process huge amounts of data items using a novel parallelization technique. Using this parallelization, we make TEDAClass scalable. In that way, the proposed approach is particularly useful for various applications, as it opens the door to high-performance big data processing for healthcare, banking, scientific and many other purposes.
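One reason TEDA-style quantities parallelise well can be sketched directly: for Euclidean distance, the cumulative squared distance from a point x to all samples reduces to k·||x||² − 2·x·S + Q, where S is the sum of the samples and Q the sum of their squared norms, so each worker only needs to contribute the triple (k, S, Q). This is a generic illustration of the idea, not the paper's specific partitioning scheme.

```python
# Illustrative sketch: per-chunk sufficient statistics (count, sum of
# samples, sum of squared norms) that can be merged across workers and
# then used to evaluate the cumulative proximity of any point exactly.

def chunk_stats(chunk):
    k = len(chunk)
    dim = len(chunk[0])
    S = [sum(x[i] for x in chunk) for i in range(dim)]
    Q = sum(sum(v * v for v in x) for x in chunk)
    return k, S, Q

def merge(stats_list):
    k = sum(s[0] for s in stats_list)
    dim = len(stats_list[0][1])
    S = [sum(s[1][i] for s in stats_list) for i in range(dim)]
    Q = sum(s[2] for s in stats_list)
    return k, S, Q

def cumulative_proximity(x, stats):
    # Equals sum over all samples y of ||x - y||^2, without touching them.
    k, S, Q = stats
    return (k * sum(v * v for v in x)
            - 2 * sum(a * b for a, b in zip(x, S)) + Q)
```

Because the statistics add up, chunks can be processed on separate machines and combined afterwards with no loss of exactness.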
Information Sciences | 2018
Xiaowei Gu; Plamen Angelov; Dmitry Kangin; Jose C. Principe
In this paper, a novel, fully data-driven algorithm for data partitioning and forming data clouds, named Self-Organised Direction-Aware (SODA), is proposed. The SODA algorithm employs an extra cosine-similarity-based directional component to work together with a traditional distance metric, thus taking advantage of both spatial and angular divergence. Using the nonparametric Empirical Data Analytics (EDA) operators, the proposed algorithm automatically identifies the main modes of the data pattern from the empirically observed data samples and uses them as focal points to form data clouds. A streaming data processing extension of the SODA algorithm is also proposed. This extension is able to self-adjust the data cloud structure and parameters to follow possibly changing data patterns and processes. Numerical examples, provided as a proof of concept, illustrate the autonomy of the proposed algorithm and demonstrate its high clustering performance and computational efficiency.
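The combined spatial-plus-angular divergence can be sketched as a Euclidean term plus a cosine-dissimilarity term; the weighting between the two components below is illustrative, not the paper's exact formulation.

```python
# Illustrative sketch of a direction-aware divergence: a traditional
# (Euclidean) distance combined with a cosine-similarity-based angular
# component, so that two points can differ in magnitude, direction, or both.
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def angular(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    cos = min(1.0, max(-1.0, cos))       # clamp for numerical safety
    return math.sqrt(1.0 - cos)          # cosine dissimilarity

def combined_divergence(x, y, w=1.0):
    # w balances the spatial and angular components (illustrative choice)
    return euclidean(x, y) + w * angular(x, y)
```

Two collinear vectors of different magnitude have zero angular divergence but nonzero spatial divergence, which is exactly the extra discrimination the directional component adds.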
International Symposium on Statistical Learning and Data Sciences | 2015
Dmitry Kangin; Plamen Angelov
A new method for incremental learning of an SVM model, incorporating the recently proposed TEDA approach, is proposed. The method builds on the widely renowned incremental SVM approach and introduces new TEDA and RDE kernels which are learnable and capable of adapting to the data. The slack variables are also adaptive and depend on each point's 'importance', combining outlier detection with SVM slack variables to deal with misclassifications. Some suggestions on evolving systems based on SVM are also provided. Image recognition examples are provided as a proof of concept for the method.
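The adaptive-slack idea can be illustrated with a small sketch: derive a per-sample penalty weight from a standardised eccentricity, so that outliers (high eccentricity) receive a smaller misclassification penalty. The mapping from eccentricity to weight below is our illustrative choice; the paper couples importance with the slack variables inside an incremental SVM formulation.

```python
# Illustrative sketch: eccentricity-based per-sample penalty weights.
# Points that are atypical (high eccentricity) get a smaller slack
# penalty, so the SVM tolerates misclassifying likely outliers.

def eccentricities(data):
    # Standardised eccentricity with mean 1 over the data set,
    # assuming Euclidean distance.
    q = [sum(sum((a - b) ** 2 for a, b in zip(x, y)) for y in data)
         for x in data]
    total = sum(q)
    return [len(data) * qi / total for qi in q]

def slack_weights(data, C=1.0):
    # Hypothetical mapping: penalty weight inversely proportional to
    # eccentricity, so outliers are penalised less for slack.
    return [C / e for e in eccentricities(data)]
```

Such weights could then be fed to any soft-margin SVM trainer that accepts per-sample penalties.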
International Symposium on Neural Networks | 2017
Dmitry Kangin; Garik Markarian
The problem of multiple group object tracking is a challenging one and has been extensively researched during the last two decades. The solution proposed in this article is an extension of the well-known multi-Bernoulli filter. The model is first formulated generally, and then a linear Gaussian-Wishart implementation is proposed. Contrary to many known methods, which track either individual targets or groups, the proposed filter tracks individual objects and combines them into group tracks. Experiments on simulated data have shown the method's sustained ability to track group objects.
2015 Sensor Data Fusion: Trends, Solutions, Applications (SDF) | 2015
Dmitry Kangin; Denis Kolev; Garegin Markarian
In this article, a Bayesian filter approximation is proposed for simultaneous multiple-target detection and tracking, and then applied to object detection in video from a moving camera. The inference uses evidence lower bound optimisation for Gaussian mixtures. The proposed filter is capable of real-time data processing and may be used as a basis for data fusion. The method we propose was tested on video with a dynamic background, where the velocity with respect to the background is used to discriminate the objects. The framework does not depend on the feature space, meaning that different feature spaces can be used without restriction while preserving the structure of the filter.
Information Sciences | 2016
Dmitry Kangin; Plamen Angelov; José Antonio Iglesias
Evolving Systems | 2017
Xiaowei Gu; Plamen Angelov; Dmitry Kangin; Jose C. Principe