Dima Kuzmin
University of California, Santa Cruz
Publication
Featured research published by Dima Kuzmin.
conference on learning theory | 2007
Dima Kuzmin; Manfred K. Warmuth
Machine Learning | 2010
Manfred K. Warmuth; Dima Kuzmin
international conference on machine learning | 2007
Dima Kuzmin; Manfred K. Warmuth
conference on learning theory | 2005
Dima Kuzmin; Manfred K. Warmuth
Machine Learning | 2012
Manfred K. Warmuth; Dima Kuzmin
We consider the following type of online variance minimization problem: In every trial t our algorithms get a covariance matrix C^t and try to select a parameter vector w^{t-1} such that the total variance over a sequence of trials, \sum_{t=1}^{T} (\boldsymbol{w}^{t-1})^{\top} \boldsymbol{C}^{t} \boldsymbol{w}^{t-1}, is not much larger than the total variance of the best parameter vector u chosen in hindsight. Two parameter spaces in ℝ^n are considered: the probability simplex and the unit sphere. The first space is associated with the problem of minimizing risk in stock portfolios, and the second space leads to an online calculation of the eigenvector with minimum eigenvalue of the total covariance matrix \sum_{t=1}^{T} \boldsymbol{C}^{t}. For the first parameter space we apply the Exponentiated Gradient algorithm, which is motivated by a relative entropy regularization. In the second case, the algorithm has to maintain uncertainty information over all unit directions u. For this purpose, directions are represented as dyads uu^⊤ and the uncertainty over all directions as a mixture of dyads, which is a density matrix. The motivating divergence for density matrices is the quantum version of the relative entropy, and the resulting algorithm is a special case of the Matrix Exponentiated Gradient algorithm. In each of the two cases we prove bounds on the additional total variance incurred by the online algorithm over the best offline parameter.
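As a concrete illustration of the simplex case in the Machine Learning 2012 abstract above, the following is a minimal sketch (not code from the paper) of an Exponentiated Gradient step for the trial loss w^⊤ C w: the gradient at w is 2Cw, each weight is multiplied by exp(-eta times its gradient component), and the vector is renormalized onto the simplex. The learning rate and the synthetic covariance matrices are illustrative assumptions.

import numpy as np

def eg_variance_update(w, C, eta=0.1):
    # One Exponentiated Gradient step for the trial loss w^T C w on the
    # probability simplex: the gradient at w is 2 C w, each weight is
    # scaled by exp(-eta * gradient_i), and the vector is renormalized.
    # eta = 0.1 is an illustrative choice, not the tuned rate from the paper.
    grad = 2.0 * C @ w
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

# Toy protocol: pick w before seeing C^t, incur variance w^T C^t w, then update.
rng = np.random.default_rng(0)
n, T = 5, 100
w = np.full(n, 1.0 / n)                  # start at the uniform distribution
total_variance = 0.0
for t in range(T):
    A = rng.normal(size=(n, n))
    C = A @ A.T / n                      # synthetic positive semi-definite covariance
    total_variance += w @ C @ w          # loss charged to the previously chosen w
    w = eg_variance_update(w, C)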
Journal of Machine Learning Research | 2008
Manfred K. Warmuth; Dima Kuzmin
We give a compression scheme for any maximum class of VC dimension d that compresses any sample consistent with a concept in the class to at most d unlabeled points from the domain of the sample.
neural information processing systems | 2006
Manfred K. Warmuth; Dima Kuzmin
One of the main concepts in quantum physics is a density matrix, which is a symmetric positive definite matrix of trace one. Finite probability distributions can be seen as a special case when the density matrix is restricted to be diagonal. We develop a probability calculus based on these more general distributions that includes definitions of joints, conditionals and formulas that relate these, including analogs of the Theorem of Total Probability and various Bayes rules for the calculation of posterior density matrices. The resulting calculus parallels the familiar “conventional” probability calculus and always retains the latter as a special case when all matrices are diagonal. We motivate both the conventional and the generalized Bayes rule with a minimum relative entropy principle, where the Kullback-Leibler version gives the conventional Bayes rule and Umegaki’s quantum relative entropy gives the new Bayes rule for density matrices. Whereas the conventional Bayesian methods maintain uncertainty about which model has the highest data likelihood, the generalization maintains uncertainty about which unit direction has the largest variance. Surprisingly, the bounds also generalize: as in the conventional setting, we upper bound the negative log likelihood of the data by the negative log likelihood of the MAP estimator.
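The diagonal special case mentioned in this abstract can be checked numerically. The sketch below assumes the generalized Bayes rule takes the softmin form exp(log W + log D) normalized by its trace (the matrix-log/matrix-exp form used in the related density-matrix updates); for diagonal prior and likelihood matrices it collapses to the conventional rule of multiplying the prior by the likelihood and renormalizing. The specific matrices are illustrative.

import numpy as np
from scipy.linalg import expm, logm

def matrix_bayes(W, D):
    # Assumed softmin form of the generalized Bayes rule:
    # posterior = exp(log W + log D) / tr(exp(log W + log D)),
    # with prior density matrix W and a matrix D playing the role of the
    # data likelihood.
    P = expm(logm(W) + logm(D))
    return P / np.trace(P)

# Diagonal special case: the rule collapses to the conventional Bayes rule.
prior = np.diag([0.5, 0.3, 0.2])          # diagonal density matrix (a probability vector)
likelihood = np.diag([0.9, 0.1, 0.4])     # per-model data likelihoods
posterior = matrix_bayes(prior, likelihood)

conventional = np.array([0.5, 0.3, 0.2]) * np.array([0.9, 0.1, 0.4])
conventional /= conventional.sum()
assert np.allclose(np.diag(posterior), conventional)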
uncertainty in artificial intelligence | 2006
Manfred K. Warmuth; Dima Kuzmin
A number of updates for density matrices have been developed recently that are motivated by relative entropy minimization problems. The updates involve a softmin calculation based on matrix logs and matrix exponentials. We show that these updates can be kernelized. This is important because the bounds provable for these algorithms are logarithmic in the feature dimension (provided that the 2-norm of feature vectors is bounded by a constant). The main problem we focus on is the kernelization of an online PCA algorithm which belongs to this family of updates.
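The kernelization is the contribution of this paper; as a non-authoritative reference point, here is a sketch of the basic (non-kernelized, uncapped) softmin-style density matrix update that such algorithms build on: the matrix exponential of the matrix logarithm of the current density matrix minus a scaled loss matrix, renormalized to trace one. The learning rate, the streaming data, and the outer-product loss matrix are illustrative assumptions.

import numpy as np
from scipy.linalg import expm, logm

def softmin_density_update(W, L, eta=0.05):
    # Basic softmin-style density matrix update (non-kernelized, uncapped):
    # matrix exp of (matrix log of W minus eta times the loss matrix L),
    # renormalized to trace one.  A sketch only; the paper's contribution
    # is performing this kind of update in a kernel feature space.
    M = expm(logm(W) - eta * L)
    return M / np.trace(M)

# Toy usage: maintain uncertainty over directions while streaming points x_t.
rng = np.random.default_rng(1)
d = 4
W = np.eye(d) / d                         # maximally mixed initial density matrix
for t in range(50):
    x = rng.normal(size=d)
    L = np.outer(x, x)                    # loss matrix whose trace inner product with W is x^T W x
    W = softmin_density_update(W, L)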
Lecture Notes in Computer Science | 2005
Dima Kuzmin; Manfred K. Warmuth
Archive | 2009
Manfred K. Warmuth; Dima Kuzmin
Consider the following setting for an on-line algorithm (introduced in [FS97]) that learns from a set of experts: In trial t the algorithm chooses an expert with probability p^{t}_{i}
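The truncated abstract above sets up the expert framework of [FS97]. For context only, this is a minimal sketch of the standard Hedge update that produces probabilities of the form p^t_i: each expert's weight is scaled by exp(-eta times its loss) and the weights are renormalized. It is the textbook update, not necessarily the specific algorithm analyzed in this work.

import numpy as np

def hedge_update(p, losses, eta=0.5):
    # Standard Hedge / multiplicative-weights step: each expert's probability
    # is multiplied by exp(-eta * its loss) and the vector is renormalized.
    # eta = 0.5 is an illustrative learning rate.
    p_new = p * np.exp(-eta * np.asarray(losses))
    return p_new / p_new.sum()

# Toy usage: in trial t the algorithm picks expert i with probability p[i].
rng = np.random.default_rng(2)
n = 4
p = np.full(n, 1.0 / n)
for t in range(20):
    losses = rng.uniform(size=n)          # losses of the n experts in trial t
    i = rng.choice(n, p=p)                # the chosen expert (not used further in this sketch)
    p = hedge_update(p, losses)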