Publication


Featured research published by Volodymyr Melnykov.


Statistics Surveys | 2010

Finite mixture models and model-based clustering

Volodymyr Melnykov; Ranjan Maitra

Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and, lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends in the area, as well as open problems, are also discussed.


Journal of Computational and Graphical Statistics | 2010

Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms

Ranjan Maitra; Volodymyr Melnykov

A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful in the context of evaluating performance of clustering algorithms. Our suggested approach involves derivation and calculation of the exact overlap between every cluster pair, measured in terms of their total probability of misclassification, and then guided simulation of Gaussian components satisfying prespecified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively, which we introduce and develop to display mixture distributions in higher dimensions. We also study properties of the algorithm and variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.
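
To make the notion of pairwise overlap concrete, here is a minimal sketch for the simplest case only: two univariate Gaussian components sharing one standard deviation, where the misclassification probabilities have a closed form. Taking the overlap to be the sum of the two misclassification probabilities is an assumption based on the abstract's wording, and the function name `pairwise_overlap` is hypothetical. The paper's general multivariate calculation is not reproduced here.

```python
import math
from statistics import NormalDist


def pairwise_overlap(mu_i, mu_j, sigma, pi_i=0.5, pi_j=0.5):
    """Overlap of two univariate Gaussian components with a common sigma.

    Returns (w_ji, w_ij, overlap), where w_ji is the probability that a draw
    from component i gets assigned to component j, and overlap = w_ji + w_ij
    (assumed definition of the "total probability of misclassification").
    """
    d = mu_j - mu_i
    if d == 0:
        raise ValueError("the two means must differ")
    # Decision boundary where pi_i * phi_i(x) == pi_j * phi_j(x).
    x_star = (mu_i + mu_j) / 2 + sigma ** 2 * math.log(pi_i / pi_j) / d
    std = NormalDist()
    z_i = (x_star - mu_i) / sigma
    z_j = (x_star - mu_j) / sigma
    if d > 0:
        # Component j wins to the right of the boundary.
        w_ji = 1 - std.cdf(z_i)
        w_ij = std.cdf(z_j)
    else:
        # Component j wins to the left of the boundary.
        w_ji = std.cdf(z_i)
        w_ij = 1 - std.cdf(z_j)
    return w_ji, w_ij, w_ji + w_ij


if __name__ == "__main__":
    print(pairwise_overlap(0.0, 2.0, 1.0))
```

Moving the means apart or shrinking sigma drives the overlap toward zero, which is the knob a simulation with prespecified overlap would need to control.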


Computational Statistics & Data Analysis | 2012

Initializing the EM algorithm in Gaussian mixture models with an unknown number of components

Volodymyr Melnykov; Igor Melnykov

An approach is proposed for initializing the expectation-maximization (EM) algorithm in multivariate Gaussian mixture models with an unknown number of components. As the EM algorithm is often sensitive to the choice of the initial parameter vector, efficient initialization is an important preliminary process for the future convergence of the algorithm to the best local maximum of the likelihood function. We propose a strategy initializing mean vectors by choosing points with higher concentrations of neighbors and using a truncated normal distribution for the preliminary estimation of dispersion matrices. The suggested approach is illustrated on examples and compared with several other initialization methods.
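
The idea of seeding means at points with many neighbors can be illustrated loosely as follows. This is not the authors' procedure: the univariate setting, the fixed `radius`, the `min_count` stopping rule, and the exclusion of a seed's neighbors are all assumptions made for the sketch, and the truncated-normal step for dispersion matrices is omitted.

```python
def density_seeds(points, radius, min_count):
    """Greedily pick seed means at points with the most neighbors.

    A point's neighbor count is the number of remaining points within
    `radius` of it (itself included). After a seed is chosen, all points
    within `radius` of it are dropped. Seeding stops once no remaining
    point has at least `min_count` neighbors, so the number of seeds is
    not fixed in advance.
    """
    remaining = list(points)
    seeds = []
    while remaining:
        counts = [sum(abs(p - q) <= radius for q in remaining) for p in remaining]
        top = max(counts)
        if top < min_count:
            break
        best = remaining[counts.index(top)]
        seeds.append(best)
        remaining = [q for q in remaining if abs(q - best) > radius]
    return seeds


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 9.0]
    print(density_seeds(data, radius=0.5, min_count=2))
```

The isolated point at 9.0 never becomes a seed because it has no neighbors, which is how this sketch avoids fixing the number of components up front.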


Computational Statistics & Data Analysis | 2016

Model-based biclustering of clickstream data

Volodymyr Melnykov

Navigation patterns expressed by sequences of visited web-sites or categories can characterize the behavior and habits of users. Such web-page routes taken by individuals are commonly called clickstreams. Clustering clickstream sequences is a recent yet challenging problem with many applications. The main difficulty is that one needs to group categorical data sequences rather than vectors, so the majority of traditional clustering algorithms are not applicable in this setting. The time-related character of the data suggests that dynamic models hold more promise than static ones. Model-based clustering relying on a mixture of first-order Markov models is considered. Since the number of distinct web-pages, and therefore the number of states in a Markov process, can be very high, such a mixture model involves a large number of parameters. Thus, grouping states by their similarity to reduce the number of parameters in the model is also proposed. States are then clustered along with users, providing a biclustering framework. The developed methodology is illustrated on synthetic and real datasets with good results.
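
As a rough illustration of what a mixture of first-order Markov models computes for one clickstream, the sketch below evaluates the posterior probability that a sequence belongs to each component. The two-state example, the parameter values, and the function names are hypothetical, and all probabilities are assumed strictly positive. The state-grouping (biclustering) step from the abstract is not shown.

```python
import math


def sequence_loglik(seq, init, trans):
    """Log-likelihood of a state sequence under a first-order Markov chain.

    `init[s]` is the probability of starting in state s and `trans[a][b]`
    is the probability of moving from state a to state b.
    """
    ll = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll


def posterior_probs(seq, weights, inits, transs):
    """Posterior probability of each mixture component given one sequence."""
    logs = [
        math.log(w) + sequence_loglik(seq, a, p)
        for w, a, p in zip(weights, inits, transs)
    ]
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    sticky = [[0.9, 0.1], [0.1, 0.9]]     # users who tend to stay in a category
    switching = [[0.1, 0.9], [0.9, 0.1]]  # users who tend to alternate
    start = [0.5, 0.5]
    post = posterior_probs([0, 0, 0, 1, 1], [0.5, 0.5], [start, start], [sticky, switching])
    print(post)
```

Each component carries a full transition matrix, so the parameter count grows with the square of the number of states; this is the cost that grouping similar states is meant to reduce.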


Journal of the American Statistical Association | 2012

Bootstrapping for Significance of Compact Clusters in Multidimensional Datasets

Ranjan Maitra; Volodymyr Melnykov; Soumendra N. Lahiri

This article proposes a bootstrap approach for assessing significance in the clustering of multidimensional datasets. The procedure compares two models and declares the more complicated model a better candidate if there is significant evidence in its favor. The performance of the procedure is illustrated on two well-known classification datasets and comprehensively evaluated in terms of its ability to estimate the number of components via extensive simulation studies, with excellent results. The methodology is also applied to the problem of k-means color quantization of several standard images in the literature and is demonstrated to be a viable approach for determining the minimal and optimal numbers of colors needed to display an image without significant loss in resolution. Additional illustrations and performance evaluations are provided in the online supplementary material.


Journal of Computational and Graphical Statistics | 2016

Merging Mixture Components for Clustering Through Pairwise Overlap

Volodymyr Melnykov

Finite mixture models are well known for their flexibility in modeling heterogeneity in data. Model-based clustering is an important application of mixture models, which assumes that each mixture component distribution can adequately model a particular group of data. Unfortunately, when more than one component is needed for each group, the appealing one-to-one correspondence between mixture components and groups of data is ruined and model-based clustering loses its attractive interpretation. Several remedies have been considered in the literature. We discuss the most promising recent results obtained in this area and propose a new algorithm that finds partitionings through merging mixture components relying on their pairwise overlap. The proposed technique is illustrated on a popular classification dataset and several synthetic datasets, with excellent results.


Journal of Multivariate Analysis | 2013

On the distribution of posterior probabilities in finite mixture models with application in clustering

Volodymyr Melnykov

The paper discusses an approach based on the multivariate Delta method for approximating the distribution of posterior probabilities in finite mixture models. It can be used for developing distributions of many other characteristics involving posterior probabilities such as the entropy of fuzzy classification or expected cluster sizes. An application of the proposed methodology to clustering through merging mixture components is proposed and discussed. The methodology is studied and illustrated on simulated and well-known classification datasets with good results.


Computational Statistics & Data Analysis | 2016

Manly transformation in finite mixture modeling

Xuwen Zhu; Volodymyr Melnykov

Finite mixture modeling is one of the most rapidly developing areas of statistics due to its modeling flexibility and appealing interpretability. Gaussian mixture models have been popular among researchers for decades proving their usefulness in various applications. However, when Gaussian mixture components do not provide an adequate fit for the data, more general models must be considered. Traditional remedies for deviation from normality include employing a more appropriate distribution as well as transforming data to near-normality. Merging both approaches by introducing a mixture model with components derived from the multivariate Manly transformation is proposed. Such mixture models show good performance in modeling skewness and have excellent interpretability. Forward and backward model selection algorithms are proposed to choose an appropriate multivariate transformation. At each step of these algorithms, a model with the specific combination of skewness parameters is estimated by means of the expectation–maximization algorithm. The developed technique is carefully illustrated on synthetic data and applied to several well-known datasets, with promising results.
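
For readers unfamiliar with it, the univariate Manly (exponential) transformation maps x to (exp(λx) − 1)/λ for λ ≠ 0 and leaves x unchanged for λ = 0. The sketch below implements that map and its inverse; applying it coordinate by coordinate with one skewness parameter per coordinate is an assumption about the multivariate version, and the function names are hypothetical.

```python
import math


def manly(x, lam):
    """Manly exponential transformation of a scalar x with parameter lam."""
    if lam == 0:
        return x
    return math.expm1(lam * x) / lam  # (exp(lam * x) - 1) / lam


def manly_inverse(y, lam):
    """Inverse transformation; requires 1 + lam * y > 0."""
    if lam == 0:
        return y
    return math.log1p(lam * y) / lam


def manly_vector(xs, lams):
    """Coordinate-wise transformation (assumed multivariate form).

    A zero entry in `lams` leaves that coordinate untransformed.
    """
    return [manly(x, lam) for x, lam in zip(xs, lams)]


if __name__ == "__main__":
    print(manly_vector([0.5, -1.0, 2.0], [0.3, 0.0, -0.2]))
```

Setting a coordinate's parameter to zero switches the transformation off for that coordinate, which is what makes stepwise (forward or backward) selection over the skewness parameters a natural fit.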


Statistical Analysis and Data Mining | 2012

Efficient estimation in model-based clustering of Gaussian regression time series

Volodymyr Melnykov

This paper discusses an alternative approach to the estimation procedure presented in a recently published paper. The authors developed a model-based clustering approach for regression time series and proposed the APECM procedure as an acceleration method for the expectation–maximization algorithm. The process of the estimation of model parameters was discussed in great detail. In this paper, we show how the proposed procedure can be modified to achieve substantial acceleration and better stability. In particular, numerical maximization suggested for the estimation of parameters can be replaced with analytical closed-form expressions, and inverting high dimensional matrices can be avoided entirely. A convenient approach for assessing variability in parameter estimates is also provided. The results of conducted experiments are very promising.


Journal of Classification | 2016

Finite Mixture Modeling of Gaussian Regression Time Series with Application to Dendrochronology

Semhar Michael; Volodymyr Melnykov

Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first one yields the conditional maximum likelihood estimates, which can be used when the length of the time series is substantial. Simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields the exact maximum likelihood estimates. The procedure for assessing variability in the obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and correctly assess time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is thoroughly considered. The results are very promising as the proposed approach overcomes the limitations of other methods developed so far.
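
Model choice by the Bayesian information criterion can be sketched as below, using the smaller-is-better convention BIC = −2 log L + p log n. The candidate log-likelihoods and parameter counts in the example are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion in the smaller-is-better convention."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the key of the candidate with the smallest BIC.

    `candidates` maps a model label (e.g. number of components) to a
    (maximized log-likelihood, number of free parameters) pair.
    """
    return min(candidates, key=lambda k: bic(candidates[k][0], candidates[k][1], n_obs))


if __name__ == "__main__":
    # Hypothetical fits: label -> (log-likelihood, parameter count).
    fits = {1: (-120.0, 2), 2: (-100.0, 5), 3: (-99.0, 8)}
    print(select_by_bic(fits, n_obs=50))
```

The same comparison extends to time series orders by treating each combination of component count and order as its own candidate label.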

Collaboration


Top co-authors of Volodymyr Melnykov.

Xuwen Zhu, University of Louisville

Semhar Michael, South Dakota State University

Igor Melnykov, Colorado State University–Pueblo

Wei-Chen Chen, Oak Ridge National Laboratory

Gang Shen, North Dakota State University