Dvora Toledano-Kitai
ORT Braude College of Engineering
Publication
Featured research published by Dvora Toledano-Kitai.
Machine Learning | 2011
Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai; Renata Avros
In cluster analysis, selecting the number of clusters is an “ill-posed” problem of crucial importance. In this paper we propose a re-sampling method for assessing cluster stability. Our model suggests that samples’ occurrences in clusters can be considered as realizations of the same random variable in the case of the “true” number of clusters. Thus, similarity between different cluster solutions is measured by means of compound and simple probability metrics. Compound criteria result in validation rules employing the stability content of clusters. Simple probability metrics, in particular those based on kernels, provide more flexible geometrical criteria. We analyze several applications of probability metrics combined with methods intended to simulate cluster occurrences. Numerical experiments are provided to demonstrate and compare the different metrics and simulation approaches.
Journal of Global Optimization | 2013
Zeev Volkovich; Dvora Toledano-Kitai; Gerhard-Wilhelm Weber
An appropriate distance is an essential ingredient in various real-world learning tasks. Distance metric learning seeks a metric that reflects the data configuration much better than the commonly used ones. We offer an algorithm for simultaneously learning a Mahalanobis-like distance and a K-means clustering, aiming to combine data rescaling and clustering so that the data separability grows iteratively in the rescaled space with each sequential clustering. At each step of the algorithm, a global optimization problem is solved in order to minimize the cluster distortions based on the current cluster configuration. The obtained weight matrix can also be used as a cluster validation characteristic. Namely, the closeness of such matrices learned during a sampling process can indicate the readiness of the clusters; i.e., it estimates the true number of clusters. Numerical experiments performed on synthetic and real datasets verify the high reliability of the proposed method.
Central European Journal of Operations Research | 2012
Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai; Renata Avros
Among the areas of data and text mining which are employed today in OR, science, economy and technology, clustering theory serves as a preprocessing step in data analysis. An important component of clustering theory is the determination of the true number of clusters. This problem has not been satisfactorily solved. In our paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of the partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured by the total number of edges, in the clusters’ minimal spanning trees, that connect points from different samples. That is, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis of well-mingled samples within the clusters leads to an asymptotically normal distribution of the considered statistic. Based on this fact, the standard score of the mentioned edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described distribution and estimates its left-asymmetry. Several numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
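The edge-counting step described in this abstract can be illustrated with a small sketch. It is not the authors' code: the function names are hypothetical, the minimal spanning tree is built with a plain O(n²) Prim's algorithm on Euclidean distances (an assumption), and only the cross-sample edge count and its expectation under well-mingled samples, 2mn/(m+n), are computed. The variance needed for the full standard score is left out.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each node
    best_from = [-1] * n         # tree endpoint of that cheapest link
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the outside node that is closest to the current tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        in_tree[j] = True
        edges.append((best_from[j], j))
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k] = d
                    best_from[k] = j
    return edges

def cross_sample_edges(points, sample_ids):
    """Number of MST edges whose endpoints come from different samples."""
    return sum(1 for i, j in mst_edges(points) if sample_ids[i] != sample_ids[j])

def expected_cross_edges(m, n):
    """Expected cross-sample edge count when sample labels are well mingled."""
    return 2.0 * m * n / (m + n)
```

For one cluster containing points from two samples, a count far below `expected_cross_edges(m, n)` suggests the samples are not well mingled inside that cluster.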
POWER CONTROL AND OPTIMIZATION: Proceedings of the Second Global Conference on Power Control and Optimization | 2009
Zeev Volkovich; Zeev Barzily; Gerhard-Wilhelm Weber; Dvora Toledano-Kitai
Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, there are many open questions still waiting for a theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters’ minimal spanning trees, that connect points from different samples. That is, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis of well-mingled samples within the clusters leads to an asymptotically normal distribution of the considered statistic. Resting upon this fact, the stand...
Communications in Statistics - Theory and Methods | 2011
Zeev Volkovich; Zeev Barzily; Renata Avros; Dvora Toledano-Kitai
K-Nearest Neighbors is a widely used technique for classifying and clustering data. In the current article, we address the cluster stability problem based upon probabilistic characteristics of this approach. We estimate the stability of partitions obtained from clustering pairs of samples. Partitions are presumed to be consistent if their clusters are stable. Cluster validity is quantified through the number of a point's K nearest neighbors that belong to the point's own sample. The null hypothesis of well-mixed samples within the clusters suggests a Binomial distribution of this quantity with K trials and success probability 0.5. A cluster is represented by a summarizing index of the p-values calculated over all cluster objects under the null hypothesis against the alternative, and the partition quality is evaluated via the worst cluster of the partition. The true number of clusters is identified by the empirical index distribution having the maximal suitable asymmetry. The proposed methodology produces the index distributions sequentially and assesses their asymmetry. Numerical experiments exhibit a good capability of the methodology to expose the true number of clusters.
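A minimal sketch of the per-point Binomial test may make this abstract more concrete. The function names are hypothetical, Euclidean distance is assumed, and the use of the upper tail P(X >= observed) as the p-value is an assumption. The abstract does not specify the tail or the summarizing index, so the sketch stops at the list of p-values.

```python
from math import comb, dist

def own_sample_neighbors(points, sample_ids, i, k):
    """Count how many of the k nearest neighbors of point i share its sample id."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return sum(1 for j in others[:k] if sample_ids[j] == sample_ids[i])

def binomial_upper_tail(successes, k, p=0.5):
    """P(X >= successes) for X ~ Binomial(k, p)."""
    return sum(comb(k, s) * p**s * (1 - p)**(k - s) for s in range(successes, k + 1))

def cluster_p_values(points, sample_ids, k):
    """Upper-tail p-value for every point of one cluster (tail choice is an assumption)."""
    return [
        binomial_upper_tail(own_sample_neighbors(points, sample_ids, i, k), k)
        for i in range(len(points))
    ]
```

Under this convention, small p-values flag points surrounded mostly by their own sample, which is evidence against well-mixed samples within the cluster.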
Soft Computing | 2013
Dvora Toledano-Kitai; Renata Avros; Zeev Volkovich; Gerhard-Wilhelm Weber; Orly Yahalom
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal “true” number of clusters. We adopt the stability testing approach, according to which repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent equal-sized samples, where one sample in each pair is drawn from the data source and the other is drawn from a noised version thereof. We then run the same clustering method on both samples in each pair and test the similarity between the obtained partitions using a general k-Nearest Neighbor Binomial model. These similarity measurements enable us to estimate the correct number of clusters. A series of numerical experiments on both synthetic and real-world data demonstrates the high capability of the proposed approach compared to other methods. In particular, the use of a noised data set is shown to produce significantly better results than using two independent samples that are both drawn from the data source.
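The sampling step in this abstract (one sample from the data source, one from a noised version of it) could look roughly like the sketch below. The function name is hypothetical, and additive Gaussian noise with a common standard deviation `noise_sd` is an assumption; the abstract does not state the noise model.

```python
import random

def draw_sample_pair(data, size, noise_sd, rng):
    """Return (clean, noised): two independent samples of `size` points each.

    `clean` is drawn from the data as is; `noised` is drawn independently and
    perturbed with additive Gaussian noise (the noise model is an assumption).
    """
    clean = rng.sample(data, size)
    source = rng.sample(data, size)
    noised = [tuple(x + rng.gauss(0.0, noise_sd) for x in point) for point in source]
    return clean, noised

# Example: 20 two-dimensional points, pairs of 5 points each.
rng = random.Random(1)
data = [(float(i), float(i * i)) for i in range(20)]
clean, noised = draw_sample_pair(data, 5, 0.5, rng)
```

Both samples would then be clustered with the same method and the resulting partitions compared.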
Procedia Computer Science | 2015
Renata Avros; Zakharia Frenkel; Dvora Toledano-Kitai; Zeev Volkovich
In this article we offer an algorithm that recursively divides a dataset by searching for partitions in a one-dimensional subspace discovered by optimizing a projection pursuit function. To assess the model order, a resampling technique is employed. For each number of clusters, bounded by a predefined limit, samples from the projected data are drawn and clustered through the EM algorithm. Further, the baseline cumulative histogram of the projected data is approximated by means of the GMM histograms constructed using the samples’ partitions. The saturation order of this approximation process, as the number of components increases, is recognized as the “true” number of components. Afterward, the whole data set is clustered and the densest cluster is omitted. The process is repeated until the estimated true number of clusters equals one. Numerical experiments demonstrate the high ability of the proposed method.
Archive | 2015
Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai
The realization of a control affects the information, contributing to new changes in the object of information. The formed control u enters the system and affects the state x, in many cases changing it.
Archive | 2015
Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai
Multidimensional stochastic optimization plays an important role in the analysis and control of many technical systems. Randomized algorithms of stochastic approximation with perturbed input have been suggested for solving the challenging multidimensional problems of optimization. These algorithms have simple forms and provide consistent estimates of the unknown parameters for observations under almost arbitrary noise. They are easily incorporated into the design of quantum devices for estimating the gradient vector of a multivariable function.
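As an illustration of the idea of stochastic approximation with a randomly perturbed input, here is a generic two-measurement simultaneous-perturbation sketch. It is not the specific algorithm of the cited work: the function name, the ±1 perturbation, and the gain sequences (`a`, `c`, `alpha`, `gamma`) are all assumptions chosen for a small demonstration.

```python
import random

def spsa_minimize(observe, theta0, iterations=2000, a=0.25, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize a function known only through (possibly noisy) observations.

    Each iteration perturbs all coordinates at once along a random +/-1
    vector and uses two observations to estimate the gradient.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(iterations):
        a_n = a / (n + 1) ** alpha   # decaying step size (assumed schedule)
        c_n = c / (n + 1) ** gamma   # decaying perturbation size (assumed schedule)
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        slope = (observe(plus) - observe(minus)) / (2.0 * c_n)
        # For +/-1 perturbations, dividing by delta[i] equals multiplying by it.
        theta = [t - a_n * slope * d for t, d in zip(theta, delta)]
    return theta

# Example on a noise-free quadratic with minimum at (1, -2).
target = [1.0, -2.0]
estimate = spsa_minimize(
    lambda th: sum((t - g) ** 2 for t, g in zip(th, target)), [0.0, 0.0]
)
```

Only two observations are needed per iteration regardless of the dimension, which is what makes this family of schemes attractive for multidimensional problems.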
Archive | 2015
Oleg N. Granichin; Zeev Volkovich; Dvora Toledano-Kitai
For adaptive control, the identification approach is often used. This approach constructs estimates of the possible values of the unknown parameters x* based on observation sequences, and these estimates are then used in a parameterized feedback loop which, if properly selected, is normally assumed or shown to provide a closed-loop quality that satisfies the user.