Dvora Toledano-Kitai | Researchain


Publication

Featured research published by Dvora Toledano-Kitai.

Machine Learning | 2011

Resampling approach for cluster model selection

Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai; Renata Avros

In cluster analysis, selecting the number of clusters is an “ill-posed” problem of crucial importance. In this paper we propose a re-sampling method for assessing cluster stability. Our model suggests that samples’ occurrences in clusters can be considered as realizations of the same random variable in the case of the “true” number of clusters. Thus, similarity between different cluster solutions is measured by means of compound and simple probability metrics. Compound criteria result in validation rules employing the stability content of clusters. Simple probability metrics, in particular those based on kernels, provide more flexible geometrical criteria. We analyze several applications of probability metrics combined with methods intended to simulate cluster occurrences. Numerical experiments are provided to demonstrate and compare the different metrics and simulation approaches.

Journal of Global Optimization | 2013

Self-learning K-means clustering: a global optimization approach

Zeev Volkovich; Dvora Toledano-Kitai; Gerhard-Wilhelm Weber

An appropriate distance is an essential ingredient in many real-world learning tasks. Distance metric learning seeks a metric that reflects the data configuration better than the commonly used distances. We offer an algorithm that simultaneously learns a Mahalanobis-like distance and a K-means clustering, combining data rescaling and clustering so that data separability grows iteratively in the rescaled space with each successive clustering. At each step of the algorithm, a global optimization problem is solved to minimize the cluster distortions based on the current cluster configuration. The obtained weight matrix can also be used as a cluster validation characteristic: closeness of such matrices learned during a sampling process can indicate cluster readiness, i.e., it provides an estimate of the true number of clusters. Numerical experiments on synthetic and real datasets verify the high reliability of the proposed method.

Central European Journal of Operations Research | 2012

An application of the minimal spanning tree approach to the cluster stability problem

Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai; Renata Avros

Among the areas of data and text mining employed today in OR, science, economy and technology, clustering theory serves as a preprocessing step in data analysis. An important component of clustering theory is the determination of the true number of clusters. This problem has not been satisfactorily solved. In our paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of the partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured by the total number of edges, in the clusters’ minimal spanning trees, connecting points from different samples; that is, we use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis of well-mingled samples within the clusters, this statistic is asymptotically normally distributed. Based on this fact, the standard score of the edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described distribution and estimates its left-asymmetry. Several numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
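The edge count at the heart of this abstract can be illustrated with a short sketch. It is not the authors’ implementation: the function names `mst_edges` and `cross_sample_edges` are hypothetical, Euclidean distance is assumed, and the standard score and the left-tail analysis described in the abstract are deliberately left out.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) index pairs."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # distance from each outside vertex to the tree
    best_from = [-1] * n         # tree vertex that realizes that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the outside vertex that is closest to the current tree.
        k = min((j for j in range(n) if not in_tree[j]), key=lambda j: best_dist[j])
        in_tree[k] = True
        edges.append((best_from[k], k))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges


def cross_sample_edges(points, sample_labels):
    """Count MST edges whose endpoints come from different samples."""
    return sum(1 for i, j in mst_edges(points) if sample_labels[i] != sample_labels[j])


# Four points of one cluster on a line, drawn from two samples (labels 0 and 1).
cluster = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(cross_sample_edges(cluster, [0, 1, 0, 1]))  # well-mingled samples
print(cross_sample_edges(cluster, [0, 0, 1, 1]))  # poorly mingled samples
```

A well-mingled cluster yields many cross-sample edges, while a cluster in which the two samples occupy separate regions yields few, which is the behaviour the stability criterion relies on.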

POWER CONTROL AND OPTIMIZATION: Proceedings of the Second Global Conference on Power Control and Optimization | 2009

CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH

Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai

Among the areas of data and text mining employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters’ minimal spanning trees, connecting points from different samples; that is, we use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis of well-mingled samples within the clusters, this statistic is asymptotically normally distributed. Resting upon this fact, the stand...

Communications in Statistics - Theory and Methods | 2011

On Application of a Probabilistic K-Nearest Neighbors Model for Cluster Validation Problem

Zeev Volkovich; Zeev Barzily; Renata Avros; Dvora Toledano-Kitai

K-Nearest Neighbors is a widely used technique for classifying and clustering data. In the current article, we address the cluster stability problem based upon probabilistic characteristics of this approach. We estimate the stability of partitions obtained from clustering pairs of samples. Partitions are presumed to be consistent if their clusters are stable. Cluster validity is quantified through the number of a point’s K nearest neighbors that belong to the point’s own sample. The null hypothesis of well-mixed samples within the clusters suggests a binomial distribution of this quantity with K trials and success probability 0.5. A cluster is represented by a summarizing index of the p-values, calculated over all cluster objects under the null hypothesis against the alternative, and the partition quality is evaluated via the worst cluster of the partition. The true number of clusters is indicated by the empirical index distribution having maximal suitable asymmetry. The proposed methodology produces the index distributions sequentially and assesses their asymmetry. Numerical experiments exhibit a good capability of the methodology to expose the true number of clusters.
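The per-point quantity in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the helper names `binom_upper_tail` and `knn_same_sample_pvalues` are hypothetical, Euclidean distance is assumed, and the choice of an upper-tail p-value (too many same-sample neighbours signals poorly mixed samples) is an assumption, since the abstract does not state the alternative. The summarizing index over a cluster is not reproduced.

```python
import math


def binom_upper_tail(count, trials, p=0.5):
    """P(X >= count) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p**i * (1.0 - p) ** (trials - i)
        for i in range(count, trials + 1)
    )


def knn_same_sample_pvalues(points, sample_labels, k):
    """For each point, count how many of its k nearest neighbours share its
    sample label and return the upper-tail p-value under Binomial(k, 0.5)."""
    pvalues = []
    for i, x in enumerate(points):
        # All other points, ordered by distance to the current point.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(x, points[j]),
        )
        same = sum(1 for j in others[:k] if sample_labels[j] == sample_labels[i])
        pvalues.append(binom_upper_tail(same, k))
    return pvalues


# One cluster in which the two samples occupy separate regions (poorly mixed).
cluster = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
print(knn_same_sample_pvalues(cluster, [0, 0, 0, 1, 1, 1], k=2))
```

Small p-values across a cluster suggest that the samples are not well mixed there; how these values are combined into a cluster index is left to the cited article.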

Soft Computing | 2013

A binomial noised model for cluster validation

Dvora Toledano-Kitai; Renata Avros; Zeev Volkovich; Gerhard-Wilhelm Weber; Orly Yahalom

Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal “true” number of clusters. We adopt the stability testing approach, according to which repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent, equal-sized samples, where one sample in each pair is drawn from the data source and the other is drawn from a noised version thereof. We then run the same clustering method on both samples in each pair and test the similarity between the obtained partitions using a general k-Nearest Neighbor Binomial model. These similarity measurements enable us to estimate the correct number of clusters. A series of numerical experiments on both synthetic and real-world data demonstrates the high capability of the proposed approach compared to other methods. In particular, the use of a noised data set is shown to produce significantly better results than using two independent samples that are both drawn from the data source.

Procedia Computer Science | 2015

An Iterative Projective Clustering Method

Renata Avros; Zakharia Frenkel; Dvora Toledano-Kitai; Zeev Volkovich

In this article we offer an algorithm that repeatedly divides a dataset by searching for partitions in a one-dimensional subspace discovered by optimizing a projection pursuit function. To assess the model order, a resampling technique is employed. For each number of clusters, bounded by a predefined limit, samples from the projected data are drawn and clustered through the EM algorithm. Further, the basis cumulative histogram of the projected data is approximated by means of the GMM histograms constructed using the samples’ partitions. The saturation order of this approximation process, as the number of components increases, is recognized as the “true” number of components. Afterward the whole data is clustered and the densest cluster is omitted. The process is repeated until the true number of clusters equals one. Numerical experiments demonstrate the high ability of the proposed method.

Archive | 2015

Feedback, Averaging and Randomization in Control and Data Mining

Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai

Applying a control affects the available information, contributing to new changes in the object of information. The formed control u enters the system and affects the state x, in many cases changing it.

Archive | 2015

Randomized Stochastic Approximation

Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai

Multidimensional stochastic optimization plays an important role in the analysis and control of many technical systems. Randomized algorithms of stochastic approximation with perturbed input have been suggested for solving the challenging multidimensional problems of optimization. These algorithms have simple forms and provide consistent estimates of the unknown parameters for observations under almost arbitrary noise. They are easily incorporated into the design of quantum devices for estimating the gradient vector of a multivariable function.

Archive | 2015

Randomized Control Strategies

Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai

For adaptive control, the identification approach is often used. This approach constructs estimates of the possible values of the unknown parameters x* from observation sequences, and these estimates are then used in a parameterized feedback loop that, if properly selected, is normally assumed or shown to provide a closed-loop system quality that satisfies the user.
