Xibin Zhu
Bielefeld University
Publication
Featured research published by Xibin Zhu.
International Journal of Neural Systems | 2012
Andrej Gisbrecht; Bassam Mokbel; Frank-Michael Schleif; Xibin Zhu; Barbara Hammer
Prototype-based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike their Euclidean counterparts, these techniques have quadratic time complexity due to the underlying quadratic-size dissimilarity matrix. Thus, they are already infeasible for medium-sized data sets. The contribution of this article is twofold: on the one hand, we propose a novel supervised prototype-based classification technique for dissimilarity data based on popular learning vector quantization (LVQ); on the other hand, we transfer a linear-time approximation technique, the Nyström approximation, to this algorithm and to an unsupervised counterpart, the relational generative topographic mapping (GTM). In this way, methods with linear time and space complexity result. We evaluate the techniques on three examples from the biomedical domain.
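The Nyström approximation used in this line of work can be illustrated in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the toy Gaussian kernel, bandwidth, and randomly chosen landmark columns are all assumptions made here for demonstration. The N×N matrix K is approximated from only m of its columns via K ≈ C W⁺ Cᵀ, so it never needs to be stored or computed in full.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a full Gaussian kernel matrix (N x N), built only so that we
# can measure the approximation error at the end.
X = rng.normal(size=(200, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 10.0)

# Nystroem approximation: pick m landmark columns, then K ~ C @ pinv(W) @ C.T
m = 40
idx = rng.choice(len(X), size=m, replace=False)
C = K[:, idx]            # N x m  block: all rows at the landmark columns
W = K[np.ix_(idx, idx)]  # m x m  landmark-landmark block
K_approx = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(rel_err)
```

Storing C and W costs O(Nm) instead of O(N²), which is the source of the linear time and space complexity claimed above (for fixed m).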
Neurocomputing | 2012
Xibin Zhu; Andrej Gisbrecht; Frank-Michael Schleif; Barbara Hammer
Recently, diverse high-quality prototype-based clustering techniques have been developed which can directly deal with data sets given by general pairwise dissimilarities rather than standard Euclidean vectors. Examples include affinity propagation, relational neural gas, or relational generative topographic mapping. Corresponding to the size of the dissimilarity matrix, these techniques scale quadratically with the size of the training set, such that training becomes prohibitive for large data volumes. In this contribution, we investigate two different linear-time approximation techniques, patch processing and the Nyström approximation. We apply these approximations to several representative clustering techniques for dissimilarities, where possible, and compare the results for diverse data sets.
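The patch-processing idea can be sketched roughly as follows. This is a toy illustration, not the paper's algorithm: the data, patch size, and the crude greedy medoid step (standing in for a full relational clustering pass on each patch) are all assumptions. The point is the control flow: the full N×N dissimilarity matrix is never formed; each patch is clustered together with the compressed result (medoids) of the previous patches.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy two-cluster data; a relational method would only ever see dissimilarities.
X = np.concatenate([rng.normal(-3, 1, size=(150, 2)),
                    rng.normal(+3, 1, size=(150, 2))])

def dissim(A, B):
    """Pairwise squared Euclidean dissimilarities between two point sets."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)

def k_medoids_ids(points, k):
    """Greedy medoid pick: take the k points with the smallest summed
    dissimilarity to the patch (a crude stand-in for a clustering step)."""
    D = dissim(points, points)
    return np.argsort(D.sum(axis=1))[:k]

# Patch processing: stream through the data patch by patch, carrying only the
# medoids of earlier patches forward into the next patch.
patch_size, k = 60, 4
medoids = np.empty((0, 2))
for start in range(0, len(X), patch_size):
    patch = np.vstack([medoids, X[start:start + patch_size]])
    medoids = patch[k_medoids_ids(patch, k)]

print(medoids.shape)
```

Each iteration only ever computes a dissimilarity matrix of size (patch_size + k)², so memory stays constant in N.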
International Conference on Neural Information Processing | 2011
Barbara Hammer; Frank-Michael Schleif; Xibin Zhu
Prototype-based models offer an intuitive interface to given data sets by means of an inspection of the model prototypes. Supervised classification can be achieved by popular techniques such as learning vector quantization (LVQ) and extensions derived from cost functions, such as generalized LVQ (GLVQ) and robust soft LVQ (RSLVQ). These methods, however, are restricted to Euclidean vectors and cannot be used if data are characterized by a general dissimilarity matrix. In this contribution, we propose relational extensions of GLVQ and RSLVQ which can directly be applied to general, possibly non-Euclidean data sets characterized by a symmetric dissimilarity matrix.
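The key trick behind such relational extensions is to represent each prototype implicitly as a convex combination of data points, w = Σᵢ αᵢ xᵢ with Σᵢ αᵢ = 1; then the squared distance d(xᵢ, w)² can be computed from the dissimilarity matrix D alone as [Dα]ᵢ − ½ αᵀDα, without any vectorial embedding. A minimal sketch that verifies this identity on toy Euclidean data (the data and coefficients here are arbitrary, chosen only to demonstrate the formula):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Euclidean data, from which we build the pairwise SQUARED dissimilarity
# matrix D -- the only input a relational method would actually see.
X = rng.normal(size=(50, 4))
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# A prototype is an implicit convex combination of data points.
alpha = rng.random(50)
alpha /= alpha.sum()

# Relational distance identity:
#   d(x_i, w)^2 = [D @ alpha]_i - 0.5 * alpha @ D @ alpha
d_rel = D @ alpha - 0.5 * (alpha @ D @ alpha)

# Cross-check against the explicit Euclidean computation with w = alpha @ X.
w = alpha @ X
d_exp = ((X - w) ** 2).sum(-1)
print(np.allclose(d_rel, d_exp))
```

For Euclidean (squared) dissimilarities the identity is exact; for general symmetric dissimilarity matrices the same formula is applied as-is, which is what makes the relational extensions possible.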
Intelligent Data Analysis | 2011
Barbara Hammer; Bassam Mokbel; Frank-Michael Schleif; Xibin Zhu
Unlike many black-box algorithms in machine learning, prototype-based models offer an intuitive interface to given data sets, since prototypes can directly be inspected by experts in the field. Most techniques rely on Euclidean vectors, such that their suitability for complex scenarios is limited. Recently, several unsupervised approaches have successfully been extended to general, possibly non-Euclidean data characterized by pairwise dissimilarities. In this paper, we briefly review a general approach to extend unsupervised prototype-based techniques to dissimilarities, and we transfer this approach to supervised prototype-based classification for general dissimilarity data. In particular, a new supervised prototype-based classification technique for dissimilarity data is proposed.
Intelligent Tutoring Systems | 2012
Sebastian Gross; Xibin Zhu; Barbara Hammer; Niels Pinkwart
In this paper, we propose the use of machine learning techniques operating on sets of student solutions in order to automatically infer structure on these solution spaces. Feedback opportunities can then be derived from the clustered data. A validation based on data from a programming course confirmed the feasibility of the approach.
Workshop on Self-Organizing Maps | 2011
Barbara Hammer; Andrej Gisbrecht; Alexander Hasenfuss; Bassam Mokbel; Frank-Michael Schleif; Xibin Zhu
Topographic mapping offers a very flexible tool to inspect large quantities of high-dimensional data in an intuitive way. Often, electronic data are inherently non-Euclidean, and modern data formats are connected to dedicated non-Euclidean dissimilarity measures for which classical topographic mapping cannot be used. We give an overview of extensions of topographic mapping to general dissimilarities by means of median or relational extensions. Further, we discuss efficient approximations to avoid the otherwise quadratic time complexity.
Hybrid Artificial Intelligence Systems | 2012
Barbara Hammer; Bassam Mokbel; Frank-Michael Schleif; Xibin Zhu
While state-of-the-art classifiers such as support vector machines offer efficient classification for kernel data, they suffer from two drawbacks: the underlying classifier acts as a black box which can hardly be inspected by humans, and non-positive definite Gram matrices require additional preprocessing steps to arrive at a valid kernel. In this contribution, we extend prototype-based classification towards general dissimilarity data, resulting in a technology which (i) can deal with dissimilarity data characterized by an arbitrary symmetric dissimilarity matrix, (ii) offers intuitive classification in terms of prototypical class representatives, and (iii) leads to state-of-the-art classification results.
Artificial Intelligence Applications and Innovations | 2012
Frank-Michael Schleif; Xibin Zhu; Barbara Hammer
Current classification algorithms focus on vectorial data given in Euclidean or kernel spaces. Many real-world data, like biological sequences, are not vectorial and often non-Euclidean, being given by (dis-)similarities only, which calls for efficient and interpretable models. Current classifiers for such data require complex transformations and provide only crisp classification without any measure of confidence, which is a standard requirement in the life sciences. In this paper we propose a prototype-based conformal classifier that deals with dissimilarity data directly. The model complexity is automatically adjusted and confidence measures are provided. In experiments on dissimilarity data we investigate the effectiveness with respect to accuracy and model complexity in comparison to different state-of-the-art classifiers.
Annals of Mathematics and Artificial Intelligence | 2015
Frank-Michael Schleif; Xibin Zhu; Barbara Hammer
Existing classification algorithms focus on vectorial data given in Euclidean space or on representations by means of positive semi-definite kernel matrices. Many real-world data, like biological sequences, are not vectorial, are often non-Euclidean, and are given only in the form of (dis-)similarities between examples, calling for efficient and interpretable models. Vectorial embeddings or transformations to obtain a valid kernel are limited, and current dissimilarity classifiers often lead to dense, complex models which are hard to interpret by domain experts. They also fail to provide additional information about the confidence of the classification. In this paper we propose a prototype-based conformal classifier for dissimilarity data. It is based on a prototype dissimilarity learner and extended by the conformal prediction methodology. It (i) can deal with dissimilarity data characterized by an arbitrary symmetric dissimilarity matrix, (ii) offers intuitive classification in terms of sparse prototypical class representatives, (iii) leads to state-of-the-art classification results supported by a confidence measure, and (iv) adjusts the model complexity automatically. In experiments on dissimilarity data we investigate the effectiveness with respect to accuracy and model complexity in comparison to different state-of-the-art classifiers.
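The conformal prediction layer on top of a prototype classifier can be sketched in a few lines. This is a toy illustration of the general methodology, not the paper's model: the two fixed "prototypes", the distance-ratio nonconformity score, and the calibration data are all assumptions made here. A conformal p-value for a candidate label is the fraction of calibration examples that are at least as nonconforming as the new example under that label, which is exactly the kind of confidence measure the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical class prototypes (stand-ins for the learned sparse
# prototypes of the paper; here just fixed points for illustration).
protos = {0: np.array([-2.0, 0.0]), 1: np.array([2.0, 0.0])}

def nonconformity(x, label):
    """Ratio of distance to the own-class prototype vs. the nearest other
    class: large values mean x looks unusual for `label`."""
    d_own = np.linalg.norm(x - protos[label])
    d_other = min(np.linalg.norm(x - p) for l, p in protos.items() if l != label)
    return d_own / d_other

# Calibration set drawn around the prototypes, scored with its true labels.
calib = [(protos[l] + rng.normal(scale=0.5, size=2), l)
         for l in (0, 1) for _ in range(50)]
calib_scores = [nonconformity(x, l) for x, l in calib]

def p_value(x, label):
    """Conformal p-value: fraction of calibration scores at least as
    nonconforming as the candidate (x, label) pair."""
    s = nonconformity(x, label)
    return (1 + sum(c >= s for c in calib_scores)) / (1 + len(calib_scores))

x_new = np.array([-1.8, 0.1])          # a point clearly near class 0
print(p_value(x_new, 0), p_value(x_new, 1))
```

Labels whose p-value exceeds a chosen significance level form the prediction set, so the classifier can abstain or report multiple plausible classes instead of a single crisp decision.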
International Conference on Data Mining | 2014
Frank-Michael Schleif; Thomas Villmann; Xibin Zhu
In supervised learning, the parameters of a parametric Euclidean or Mahalanobis distance can be effectively learned by so-called Matrix Relevance Learning. This adaptation is not only useful to improve the discrimination capabilities of the model, but also to identify relevant features or relevant correlated features in the input data. Classical Matrix Relevance Learning scales quadratically with the number of input dimensions M and becomes prohibitive if M exceeds some thousand input features. We address Matrix Relevance Learning for data with a very large number of input dimensions. Such high-dimensional data occur frequently in the life sciences domain, e.g., for microarray or spectral data. We derive two respective approximation schemes and show exemplarily their implementation in Generalized Matrix Learning Vector Quantization (GMLVQ) for classification problems. The first approximation scheme is based on Limited Rank Matrix Approximation (LiRaM). LiRaM is a random subspace projection technique which was formerly considered mainly for visualization purposes. The second, novel approximation scheme is based on the Nyström approximation and is exact if the number of eigenvalues equals the rank of the relevance matrix. Using multiple benchmark problems, we demonstrate that the training process yields fast low-rank approximations of the relevance matrices without harming the generalization ability. The approaches can be used to identify discriminative features for high-dimensional data sets.
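The quadratic cost discussed above comes from the M×M relevance matrix Λ = ΩᵀΩ in the adaptive distance d(x, w) = (x − w)ᵀ Λ (x − w). A low-rank factorization sidesteps it: with a rectangular Ω of shape r×M, the same distance equals ‖Ω(x − w)‖², computable in O(rM) without ever forming Λ. A minimal sketch (the dimensions and the random Ω are arbitrary choices for illustration, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

M, r = 1000, 5                 # input dimension vs. low rank of the metric
Omega = rng.normal(size=(r, M)) / np.sqrt(M)   # rectangular projection

x = rng.normal(size=M)
w = rng.normal(size=M)
diff = x - w

# Adaptive squared distance of (G)MLVQ with Lambda = Omega.T @ Omega:
#   d(x, w) = (x - w)^T Lambda (x - w) = || Omega (x - w) ||^2
d_lowrank = np.sum((Omega @ diff) ** 2)        # O(r*M), never forms Lambda

# Equivalent full-matrix computation: O(M^2) time and M x M memory.
Lam = Omega.T @ Omega
d_full = diff @ Lam @ diff
print(np.allclose(d_lowrank, d_full))
```

Only Ω (r·M parameters) needs to be stored and adapted during training, which is why the low-rank schemes remain feasible when M reaches tens of thousands of features.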