
Dit Yan Yeung

Hong Kong University of Science and Technology


Publication

Featured research published by Dit Yan Yeung.

Computer Vision and Pattern Recognition | 2004

Super-resolution through neighbor embedding

Hong Chang; Dit Yan Yeung; Yimin Xiong

In this paper, we propose a novel method for solving single-image super-resolution problems. Given a low-resolution image as input, we recover its high-resolution counterpart using a set of training examples. While this formulation resembles other learning-based methods for super-resolution, our method has been inspired by recent manifold learning methods, particularly locally linear embedding (LLE). Specifically, small image patches in the low- and high-resolution images form manifolds with similar local geometry in two distinct feature spaces. As in LLE, local geometry is characterized by how a feature vector corresponding to a patch can be reconstructed by its neighbors in the feature space. Besides using the training image pairs to estimate the high-resolution embedding, we also enforce local compatibility and smoothness constraints between patches in the target high-resolution image through overlapping. Experiments show that our method is very flexible and gives good empirical results.
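The neighbor-reconstruction idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the tiny Gaussian-elimination solver, the trace-scaled regularizer `reg`, and the use of raw feature lists are all assumptions, and the overlapping-patch smoothness step is left out.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def reconstruction_weights(x, neighbors, reg=1e-3):
    """LLE-style weights (summing to 1) that best reconstruct x from neighbors."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    gram = [[sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    # Assumed regularizer: keeps the local Gram matrix invertible.
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg
    for i in range(k):
        gram[i][i] += eps
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


def embed_patch(low_feat, train_low, train_high, k=2):
    """Estimate a high-resolution feature from the k nearest training pairs."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(low_feat, train_low[i]))
    nearest = sorted(range(len(train_low)), key=sq_dist)[:k]
    w = reconstruction_weights(low_feat, [train_low[i] for i in nearest])
    dim = len(train_high[0])
    # Reuse the low-resolution weights on the paired high-resolution features.
    return [sum(wj * train_high[i][d] for wj, i in zip(w, nearest))
            for d in range(dim)]
```

Under these assumptions, a low-resolution feature lying midway between two training features is mapped to the midpoint of their high-resolution counterparts.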

IEEE Transactions on Neural Networks | 1997

Constructive algorithms for structure learning in feedforward neural networks for regression problems

Tin Yau Kwok; Dit Yan Yeung

In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state-space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm, and the network architecture, is then presented.
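The "start small, then add hidden units" loop described above might look like the following sketch for one-dimensional regression. It is only an illustration under stated assumptions: candidate units are drawn at random rather than trained, earlier units are frozen once added, and names such as `fit_constructive`, `candidates`, and `tol` are hypothetical rather than taken from the survey.

```python
import math
import random


def fit_constructive(xs, ys, max_units=20, tol=1e-3, candidates=50, seed=0):
    """Grow a one-hidden-layer tanh network one unit at a time."""
    rng = random.Random(seed)
    bias = sum(ys) / len(ys)
    units = []  # each entry: (input weight a, input bias b, output weight w)
    residual = [y - bias for y in ys]
    history = [sum(r * r for r in residual) / len(ys)]  # MSE after each step
    while len(units) < max_units and history[-1] > tol:
        best = None
        for _ in range(candidates):
            a, b = rng.uniform(-3, 3), rng.uniform(-3, 3)
            h = [math.tanh(a * x + b) for x in xs]
            hh = sum(v * v for v in h)
            if hh == 0.0:
                continue
            rh = sum(r * v for r, v in zip(residual, h))
            # Drop in squared error if this unit gets its least-squares
            # output weight rh / hh.
            score = rh * rh / hh
            if best is None or score > best[0]:
                best = (score, a, b, rh / hh, h)
        if best is None:
            break
        _, a, b, w, h = best
        units.append((a, b, w))
        residual = [r - w * v for r, v in zip(residual, h)]
        history.append(sum(r * r for r in residual) / len(ys))
    return bias, units, history


def predict(bias, units, x):
    """Evaluate the grown network at a single input x."""
    return bias + sum(w * math.tanh(a * x + b) for a, b, w in units)
```

In the state-space view, each pass through the `while` loop is one state transition: the network architecture grows by a unit and the search stops once the error is small enough or the unit budget is spent.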

Knowledge Discovery and Data Mining | 2015

Collaborative Deep Learning for Recommender Systems

Hao Wang; Naiyan Wang; Dit Yan Yeung

Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.

Pattern Recognition | 2003

Host-based intrusion detection using dynamic and static behavioral models

Dit Yan Yeung; Yuxin Ding

Intrusion detection has emerged as an important approach to network security. In this paper, we adopt an anomaly detection approach by detecting possible intrusions based on program or user profiles built from normal usage data. In particular, program profiles based on Unix system calls and user profiles based on Unix shell commands are modeled using two different types of behavioral models for data mining. The dynamic modeling approach is based on hidden Markov models (HMM) and the principle of maximum likelihood, while the static modeling approach is based on event occurrence frequency distributions and the principle of minimum cross entropy. The novelty detection approach is adopted to estimate the model parameters using normal training data only, as opposed to the classification approach which has to use both normal and intrusion data for training. To determine whether or not a certain behavior is similar enough to the normal model and hence should be classified as normal, we use a scheme that can be justified from the perspective of hypothesis testing. Our experimental results show that the dynamic modeling approach is better than the static modeling approach for the system call datasets, while the dynamic modeling approach is worse for the shell command datasets. Moreover, the static modeling approach performs similarly to the instance-based learning method reported previously by others for the same shell command database, although that method has much higher computational and storage requirements than ours.
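The static, frequency-based model can be illustrated with a small sketch. The abstract does not give the exact measure or decision rule, so the following assumes additive smoothing (`alpha`), uses the Kullback-Leibler divergence as a stand-in for the cross-entropy score, and compares it to a hand-picked `threshold`; all function names are hypothetical.

```python
import math
from collections import Counter


def frequency_profile(events, vocabulary, alpha=1.0):
    """Smoothed occurrence-frequency distribution over a fixed vocabulary."""
    counts = Counter(events)
    total = len(events) + alpha * len(vocabulary)
    return {e: (counts[e] + alpha) / total for e in vocabulary}


def kl_divergence(p, q):
    """Relative entropy of p with respect to q (both dicts over the same keys)."""
    return sum(pv * math.log(pv / q[e]) for e, pv in p.items() if pv > 0)


def is_anomalous(trace, normal_profile, vocabulary, threshold):
    """Flag a trace whose event frequencies diverge from the normal profile."""
    p = frequency_profile(trace, vocabulary)
    return kl_divergence(p, normal_profile) > threshold
```

Only normal data is needed to build `normal_profile`, which matches the novelty detection setting described in the abstract; the dynamic (HMM) model is not covered by this sketch.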

Pattern Recognition | 2008

Robust path-based spectral clustering

Hong Chang; Dit Yan Yeung

Spectral clustering and path-based clustering are two recently developed clustering approaches that have delivered impressive results in a number of challenging clustering tasks. However, they are not robust enough against noise and outliers in the data. In this paper, based on M-estimation from robust statistics, we develop a robust path-based spectral clustering method by defining a robust path-based similarity measure for spectral clustering under both unsupervised and semi-supervised settings. Our proposed method is significantly more robust than spectral clustering and path-based clustering. We have performed experiments based on both synthetic and real-world data, comparing our method with some other methods. In particular, color images from the Berkeley segmentation data set and benchmark are used in the image segmentation experiments. Experimental results show that our method consistently outperforms other methods due to its higher robustness.
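For orientation, a plain (non-robust) path-based similarity can be sketched as below: the similarity between two points is the best, over all connecting paths, of the weakest link along the path. This is an assumed baseline formulation computed with a Floyd-Warshall-style pass; it does not include the M-estimation weighting that makes the proposed measure robust, and the function name is hypothetical.

```python
def path_based_similarity(sim):
    """Max-min (bottleneck) path similarity for a symmetric similarity matrix."""
    n = len(sim)
    best = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # A path through k is only as strong as its weakest half.
                via = min(best[i][k], best[k][j])
                if via > best[i][j]:
                    best[i][j] = via
    return best
```

With this definition, two points that are far apart but linked by a chain of close neighbors receive a high similarity, which is also why a single outlier bridging two clusters can distort the result.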

International Journal on Document Analysis and Recognition | 2000

Mathematical expression recognition: a survey

Kam Fai Chan; Dit Yan Yeung

Automatic recognition of mathematical expressions is one of the key vehicles in the drive towards transcribing documents in scientific and engineering disciplines into electronic form. This problem typically consists of two major stages, namely, symbol recognition and structural analysis. In this survey paper, we will review most of the existing work with respect to each of the two major stages of the recognition process. In particular, we try to put emphasis on the similarities and differences between systems. Moreover, some important issues in mathematical expression recognition will be addressed in depth. All these together serve to provide a clear overall picture of how this research area has been developed to date.

International Conference on Pattern Recognition | 2002

Parzen-window network intrusion detectors

Dit Yan Yeung; Calvin Chow

Network intrusion detection is the problem of detecting anomalous network connections caused by intrusive activities. Many previously proposed intrusion detection systems use both normal and intrusion data to build their classifiers. However, intrusion data are usually scarce and difficult to collect. We propose to solve this problem using a novelty detection approach. In particular, we propose to take a nonparametric density estimation approach based on Parzen-window estimators with Gaussian kernels to build an intrusion detection system using normal data only. To facilitate comparison, we have tested our system on the KDD Cup 1999 dataset. Our system compares favorably with the KDD Cup winner which is based on an ensemble of decision trees with bagged boosting, as our system uses no intrusion data at all and much less normal data for training.
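A Parzen-window novelty detector of the kind described here can be sketched in a few lines. The sketch assumes an isotropic Gaussian kernel with a single bandwidth `sigma` and a manually chosen `log_threshold`; the function names and the feature representation are hypothetical, and how the paper selects these values is not stated in the abstract.

```python
import math


def parzen_log_density(x, train, sigma):
    """Log of the Gaussian-kernel Parzen density estimate at point x."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2 * math.pi * sigma * sigma)
    exps = [
        -sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * sigma * sigma)
        for t in train
    ]
    # Log-sum-exp keeps the estimate finite for points far from the data.
    m = max(exps)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exps))
    return log_norm + log_sum - math.log(len(train))


def is_intrusion(x, train, sigma, log_threshold):
    """Flag x when its estimated density under normal data is too low."""
    return parzen_log_density(x, train, sigma) < log_threshold
```

Because `train` holds normal connections only, no intrusion examples are needed to fit the detector; the threshold trades off false alarms against missed detections.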

IEEE Transactions on Neural Networks | 1997

Objective functions for training new hidden units in constructive neural networks

Tin Yau Kwok; Dit Yan Yeung

In this paper, we study a number of objective functions for training new hidden units in constructive algorithms for multilayer feedforward networks. The aim is to derive a class of objective functions the computation of which and the corresponding weight updates can be done in O(N) time, where N is the number of training patterns. Moreover, even though input weight freezing is applied during the process for computational efficiency, the convergence property of the constructive algorithms using these objective functions is still preserved. We also propose a few computational tricks that can be used to improve the optimization of the objective functions under practical situations. Their relative performance in a set of two-dimensional regression problems is also discussed.

Knowledge Discovery and Data Mining | 2012

A probabilistic model for multimodal hash function learning

Yi Zhen; Dit Yan Yeung

In recent years, both hashing-based similarity search and multimodal similarity search have aroused much research interest in the data mining and other communities. While hashing-based similarity search seeks to address the scalability issue, multimodal similarity search deals with applications in which data of multiple modalities are available. In this paper, our goal is to address both issues simultaneously. We propose a probabilistic model, called multimodal latent binary embedding (MLBE), to learn hash functions from multimodal data automatically. MLBE regards the binary latent factors as hash codes in a common Hamming space. Given data from multiple modalities, we devise an efficient algorithm for the learning of binary latent factors which corresponds to hash function learning. Experimental validation of MLBE has been conducted using both synthetic data and two realistic data sets. Experimental results show that MLBE compares favorably with two state-of-the-art models.

Pattern Recognition | 2006

Robust locally linear embedding

Hong Chang; Dit Yan Yeung

In the past few years, some nonlinear dimensionality reduction (NLDR) or nonlinear manifold learning methods have aroused a great deal of interest in the machine learning community. These methods are promising in that they can automatically discover the low-dimensional nonlinear manifold in a high-dimensional data space and then embed the data points into a low-dimensional embedding space, using tractable linear algebraic techniques that are easy to implement and are not prone to local minima. Despite their appealing properties, these NLDR methods are not robust against outliers in the data, yet so far very little has been done to address the robustness problem. In this paper, we address this problem in the context of an NLDR method called locally linear embedding (LLE). Based on robust estimation techniques, we propose an approach to make LLE more robust. We refer to this approach as robust locally linear embedding (RLLE). We also present several specific methods for realizing this general RLLE approach. Experimental results on both synthetic and real-world data show that RLLE is very robust against outliers.
