Song-Chun Zhu
University of California, Los Angeles
Publications
Featured research published by Song-Chun Zhu.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1996
Song-Chun Zhu; Alan L. Yuille
We present a novel statistical and variational approach to image segmentation based on a new algorithm, named region competition. This algorithm is derived by minimizing a generalized Bayes/minimum description length (MDL) criterion using the variational principle. The algorithm is guaranteed to converge to a local minimum and combines aspects of snakes/balloons and region growing. The classic snakes/balloons and region growing algorithms can be directly derived from our approach. We provide a theoretical analysis of region competition, including the accuracy of boundary location, criteria for initial conditions, and the relationship to edge detection using filters. It is straightforward to generalize the algorithm to multiband segmentation, and we demonstrate it on gray-level, color, and texture images. The novel color model allows us to eliminate intensity gradients and shadows, thereby obtaining segmentation based on the albedos of objects. It also helps detect highlight regions.
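The core mechanism, region models competing for pixels through their fitted likelihoods, can be illustrated in one dimension. Below is a minimal sketch assuming Gaussian region models and a single boundary; the function names and parameters are illustrative, not the paper's implementation.

```python
# A minimal 1-D sketch of the region-competition idea (not the authors'
# code): two regions compete for boundary pixels via the description
# length of their fitted Gaussian models, which approximates the
# statistical force driving the boundary in the paper.
import numpy as np

def gaussian_nll(x, mu, var):
    """Negative log-likelihood of samples x under N(mu, var)."""
    return 0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))

def region_competition_1d(signal, n_iters=100):
    n = len(signal)
    b = n // 2  # initial boundary: region 1 = signal[:b], region 2 = signal[b:]
    for _ in range(n_iters):
        best_b, best_cost = b, np.inf
        for cand in (b - 1, b, b + 1):  # local boundary moves
            if cand < 2 or cand > n - 2:
                continue
            left, right = signal[:cand], signal[cand:]
            cost = (gaussian_nll(left, left.mean(), left.var() + 1e-6)
                    + gaussian_nll(right, right.mean(), right.var() + 1e-6))
            if cost < best_cost:
                best_b, best_cost = cand, cost
        if best_b == b:
            break  # local minimum reached, as the paper guarantees
        b = best_b
    return b

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 60), rng.normal(2.0, 0.3, 40)])
print(region_competition_1d(x))  # should land near the true boundary at 60
```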
International Journal of Computer Vision | 1998
Song-Chun Zhu; Ying Nian Wu; David Mumford
This article presents a statistical theory for texture modeling. This theory combines filtering theory and Markov random field modeling through the maximum entropy principle, and it interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view. Our theory characterizes the ensemble of images I with the same texture appearance by a probability distribution f(I) on a random field, and the objective of texture modeling is to make inference about f(I) given a set of observed texture examples. In our theory, texture modeling consists of two steps. (1) A set of filters is selected from a general filter bank to capture features of the texture; these filters are applied to observed texture images, and the histograms of the filtered images are extracted. These histograms are estimates of the marginal distributions of f(I). This step is called feature extraction. (2) The maximum entropy principle is employed to derive a distribution p(I), which is restricted to have the same marginal distributions as those in (1). This p(I) is considered an estimate of f(I). This step is called feature fusion. A stepwise algorithm is proposed to choose filters from a general filter bank. The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than previous MRF models used for texture modeling. A Gibbs sampler is adopted to synthesize texture images by drawing typical samples from p(I); the model is thus verified by checking whether the synthesized texture images have visual appearances similar to the texture images being modeled. Experiments on a variety of 1D and 2D textures are described to illustrate our theory and to show the performance of our algorithms. These experiments demonstrate that many textures previously considered to belong to different categories can be modeled and synthesized in a common framework.
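Step (1), feature extraction, is easy to make concrete. The sketch below computes filtered-image histograms as empirical marginals of f(I), using an assumed three-filter mini bank; fitting the Lagrange multipliers and Gibbs sampling (steps the full FRAME model requires) are omitted.

```python
# A toy sketch of FRAME's feature-extraction step only; filters below
# are assumptions for illustration, not the paper's filter bank.
import numpy as np
from scipy.ndimage import convolve

def marginal_histograms(image, filters, n_bins=15):
    """Filtered-image histograms: empirical estimates of marginals of f(I)."""
    hists = []
    for f in filters:
        response = convolve(image, f, mode="wrap")
        h, _ = np.histogram(response, bins=n_bins, density=True)
        hists.append(h)
    return hists

# Assumed mini filter bank: intensity, horizontal and vertical gradients.
filters = [np.array([[1.0]]),
           np.array([[-1.0, 1.0]]),
           np.array([[-1.0], [1.0]])]
texture = np.random.default_rng(1).random((64, 64))
for h in marginal_histograms(texture, filters):
    print(np.round(h, 3))  # one marginal histogram per filter
```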
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002
Zhuowen Tu; Song-Chun Zhu
This paper presents a computational paradigm called Data-Driven Markov Chain Monte Carlo (DDMCMC) for image segmentation in the Bayesian statistical framework. The paper contributes to image segmentation in four aspects. First, it designs efficient and well-balanced Markov chain dynamics to explore the complex solution space and, thus, achieves a nearly global optimal solution independent of initial segmentations. Second, it presents a mathematical principle and a K-adventurers algorithm for computing multiple distinct solutions from the Markov chain sequence and, thus, it incorporates intrinsic ambiguities in image segmentation. Third, it utilizes data-driven (bottom-up) techniques, such as clustering and edge detection, to compute importance proposal probabilities, which drive the Markov chain dynamics and achieve tremendous speedup in comparison to the traditional jump-diffusion methods. Fourth, the DDMCMC paradigm provides a unifying framework in which the roles of many existing segmentation algorithms, such as edge detection, clustering, region growing, split-merge, snake/balloon, and region competition, are revealed as either realizing Markov chain dynamics or computing importance proposal probabilities. Thus, the DDMCMC paradigm combines and generalizes these segmentation methods in a principled way. The DDMCMC paradigm adopts seven parametric and nonparametric image models for intensity and color at various regions. We test the DDMCMC paradigm extensively on both color and gray-level images, and some results are reported in this paper.
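The third contribution, bottom-up cues as importance proposals, is the heart of the paradigm and fits in a few lines. Here is a hedged one-change-point toy, not the paper's full segmentation sampler: edge strength supplies the proposal distribution q for a Metropolis-Hastings chain whose target is the segmentation posterior.

```python
# Data-driven MCMC in miniature: an independence Metropolis-Hastings
# sampler over a single 1-D change-point, with local gradient magnitude
# (a bottom-up edge cue) as the importance proposal, as the paradigm
# prescribes. All names and constants here are illustrative.
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 0.4, 70), rng.normal(1.5, 0.4, 30)])
n = len(x)

def log_post(b):
    """Log posterior of a change-point at b under two Gaussian region models."""
    l, r = x[:b], x[b:]
    return -0.5 * (np.sum((l - l.mean()) ** 2) + np.sum((r - r.mean()) ** 2))

# Data-driven proposal: positions with large local gradient proposed more often.
edge = np.abs(np.diff(x))
q = edge / edge.sum()          # q[i] proposes a boundary at i + 1

b = n // 2
for _ in range(2000):
    b_new = rng.choice(len(q), p=q) + 1
    log_alpha = (log_post(b_new) + np.log(q[b - 1])
                 - log_post(b) - np.log(q[b_new - 1]))
    if np.log(rng.random()) < log_alpha:   # MH acceptance for independence proposal
        b = b_new
print(b)  # the chain concentrates near the true boundary at 70
```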
International Journal of Computer Vision | 2005
Zhuowen Tu; Xiangrong Chen; Alan L. Yuille; Song-Chun Zhu
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities that drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing corresponds to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657–673)). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
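The coupling of the two inference directions can be caricatured with birth/death moves on a node set. This is a toy under invented detector scores and a stand-in generative score, not the paper's algorithm: bottom-up detections weight the proposals, while the generative score defines the target the chain samples from.

```python
# Toy generative/discriminative coupling: assumed bottom-up detections
# weight birth/death proposals for object nodes in a "parse graph",
# and a stand-in generative score is the invariant target.
import math
import random

random.seed(0)
# Hypothetical bottom-up detections: label@location -> detector score.
candidates = {"face@(40,60)": 0.9, "text@(10,20)": 0.6, "face@(80,15)": 0.2}

def generative_log_score(graph):
    """Stand-in for log p(I | graph) + log p(graph): rewards well-supported
    nodes and charges a fixed complexity cost per node."""
    return sum(2.0 * candidates[n] - 1.0 for n in graph)

graph = set()
for _ in range(500):
    # Bottom-up proposal: strong detections proposed more often. Toggling a
    # node has the same proposal probability in both directions, so the
    # Metropolis rule below leaves the target invariant.
    node = random.choices(list(candidates), weights=list(candidates.values()))[0]
    proposal = graph ^ {node}  # birth if absent, death if present
    delta = generative_log_score(proposal) - generative_log_score(graph)
    if delta >= 0 or random.random() < math.exp(delta):
        graph = proposal
print(sorted(graph))  # nodes whose evidence outweighs the complexity cost persist
```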
Journal of Mathematical Imaging and Vision | 2003
Anuj Srivastava; Ann B. Lee; Eero P. Simoncelli; Song-Chun Zhu
Statistical analysis of images reveals two interesting properties: (i) invariance of image statistics to scaling of images, and (ii) non-Gaussian behavior of image statistics, i.e., high kurtosis, heavy tails, and sharp central cusps. In this paper we review some recent results in statistical modeling of natural images that attempt to explain these patterns. Two categories of results are considered: (i) studies of probability models of images or image decompositions (such as Fourier or wavelet decompositions), and (ii) discoveries of underlying image manifolds when attention is restricted to natural images. Applications of these models in areas such as texture analysis, image classification, compression, and denoising are also considered.
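Both reviewed properties are directly measurable. The utility below sketches the usual procedure: kurtosis of derivative marginals (non-Gaussianity) and their stability under downsampling (scale invariance). The Gaussian-noise input is only a null reference; natural photographs typically yield excess kurtosis well above the Gaussian value of 0 at every scale.

```python
# Measuring the two properties the review highlights. `image` is any
# grayscale array the reader supplies; the synthetic input is a null case.
import numpy as np

def excess_kurtosis(v):
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0

def scale_statistics(image, scales=(1, 2, 4)):
    for s in scales:
        dx = np.diff(image[::s, ::s], axis=1).ravel()  # horizontal derivative
        print(f"scale {s}: excess kurtosis of dI/dx = {excess_kurtosis(dx):.2f}")

# Gaussian noise reference: kurtosis near 0, unlike natural images.
scale_statistics(np.random.default_rng(3).normal(size=(256, 256)))
```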
Neural Computation | 1997
Song-Chun Zhu; Ying Nian Wu; David Mumford
This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. The first is the maximum entropy principle for feature binding (or fusion): for a given set of observed feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce them. The second part is the minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed; in particular, a feature pursuit procedure is proposed for approximately selecting the optimal set of features. The minimax entropy principle is then corrected by considering the sample variation in the observed feature statistics, and an information criterion for feature pursuit is derived. The minimax entropy principle is applied to texture modeling, where a novel Markov random field (MRF) model, called FRAME (filter, random field, and minimax entropy), is derived, and encouraging results are obtained in experiments on a variety of texture images. The relationship between our theory and the mechanisms of neural computation is also discussed.
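The two halves of the principle can be written compactly. The following is the standard constrained maximum-entropy form implied by the abstract; the notation (e.g., $H^{(\alpha)}$ for the feature statistics) is ours, not a quotation from the paper.

```latex
% Feature binding: among all distributions reproducing the observed
% statistics H_obs^{(alpha)}, alpha = 1..K, the maximum entropy model
% has Gibbs form, with multipliers lambda^{(alpha)} fitted so that
% E_p[H^{(alpha)}(I)] = H_obs^{(alpha)}.
\begin{equation}
p(I;\Lambda,S) \;=\; \frac{1}{Z(\Lambda)}
  \exp\Big\{-\sum_{\alpha=1}^{K}\big\langle \lambda^{(\alpha)},\,
  H^{(\alpha)}(I)\big\rangle\Big\}
\end{equation}
% Feature selection: choose the feature set whose fitted maximum
% entropy model is tightest, i.e., has minimum entropy.
\begin{equation}
S^{*} \;=\; \arg\min_{S}\; \mathrm{entropy}\big(p(I;\Lambda^{*},S)\big)
\end{equation}
```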
International Conference on Computer Vision | 1995
Song-Chun Zhu; Tai Sing Lee; Alan L. Yuille
We present a novel statistical and variational approach to image segmentation based on a new algorithm named region competition. This algorithm is derived by minimizing a generalized Bayes/MDL (minimum description length) criterion using the variational principle. We show that existing techniques in early vision, such as snake/balloon models, region growing, and Bayes/MDL, address different aspects of the same problem, and that they can be unified within a common statistical framework that combines their advantages. We analyze how to optimize the precision of the resulting boundary location by studying the statistical properties of the region competition algorithm, and we discuss good initial conditions for the algorithm. Our method generalizes to color and texture segmentation and is demonstrated on gray-level, color, and texture images.
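Schematically, the unified update moves each boundary point $v$ between regions $i$ and $j$ along the normal $\vec{n}(v)$ under a smoothing force plus a statistical force, the log-likelihood ratio of the two competing region models. The form below is reconstructed from the abstract's description, so constants and details differ from the paper.

```latex
% Curvature-driven smoothing plus likelihood-ratio competition at the
% boundary between regions i and j (schematic reconstruction).
\begin{equation}
\frac{dv}{dt} \;=\; -\,\mu\,\kappa(v)\,\vec{n}(v)
  \;+\; \log\frac{p\big(I(v)\,\big|\,\theta_i\big)}{p\big(I(v)\,\big|\,\theta_j\big)}\;\vec{n}(v)
\end{equation}
```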
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1997
Song-Chun Zhu; David Mumford
This article addresses two important themes in early visual computation: it presents a novel theory for learning the universal statistics of natural images, and it proposes a general framework for designing reaction-diffusion equations for image processing. We study the statistics of natural images, including their scale-invariant properties, and then learn generic prior models that duplicate the observed statistics, based on minimax entropy theory. The resulting Gibbs distributions have potentials of the form $U(I;\Lambda,S)=\sum_{\alpha=1}^{K}\sum_{x,y}\lambda^{(\alpha)}\big((F^{(\alpha)}*I)(x,y)\big)$, with $S=\{F^{(1)},F^{(2)},\ldots,F^{(K)}\}$ a set of filters and $\Lambda=\{\lambda^{(1)}(\cdot),\lambda^{(2)}(\cdot),\ldots,\lambda^{(K)}(\cdot)\}$ the potential functions. The learned Gibbs distributions confirm and improve the form of existing prior models such as the line process, but, in contrast to all previous models, inverted potentials were found to be necessary. We find that the partial differential equations given by gradient descent on $U(I;\Lambda,S)$ are essentially reaction-diffusion equations, where the usual energy terms produce anisotropic diffusion, while the inverted energy terms produce reaction associated with pattern formation, enhancing preferred image features. We illustrate how these models can be used for texture pattern rendering, denoising, image enhancement, and clutter removal by careful choice of both prior and data models of this type, incorporating the appropriate features.
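The "gradient descent on $U$ yields a diffusion PDE" step can be sketched with one assumed heavy-tailed potential, $\lambda(\xi)=\log(1+\xi^{2})$: its descent step reproduces anisotropic, edge-preserving smoothing. The paper's reaction terms (inverted potentials) would enter the same update with the opposite sign; they are omitted here.

```python
# One explicit descent step on U(I) = sum lambda(dI/dx) + lambda(dI/dy)
# for an assumed potential lambda(xi) = log(1 + xi^2). A sketch, not the
# paper's learned model.
import numpy as np

def diffusion_step(I, dt=0.1):
    def psi(g):                # psi = lambda'(g), the diffusion flux
        return 2 * g / (1 + g ** 2)
    gx = np.diff(I, axis=1, append=I[:, -1:])   # forward differences
    gy = np.diff(I, axis=0, append=I[-1:, :])
    # Discrete divergence of the flux = negative gradient of U w.r.t. I.
    div = (psi(gx) - np.roll(psi(gx), 1, axis=1)
           + psi(gy) - np.roll(psi(gy), 1, axis=0))
    return I + dt * div

rng = np.random.default_rng(4)
noisy = np.tri(64) + rng.normal(0, 0.2, (64, 64))  # step edge plus noise
img = noisy.copy()
for _ in range(50):
    img = diffusion_step(img)
print(round(float(noisy.var()), 3), round(float(img.var()), 3))  # noise diffuses away
```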
Proceedings of the IEEE | 2010
Benjamin Z. Yao; Xiong Yang; Liang Lin; Mun Wai Lee; Song-Chun Zhu
In this paper, we present an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding. The proposed I2T framework follows three steps: 1) input images (or video frames) are decomposed into their constituent visual patterns by an image parsing engine, in a spirit similar to parsing sentences in natural language; 2) the image parsing results are converted into a semantic representation in the form of Web Ontology Language (OWL), which enables seamless integration with general knowledge bases; and 3) a text generation engine converts the results from the previous steps into semantically meaningful, human-readable, and queryable text reports. The centerpiece of the I2T framework is an And-or graph (AoG) visual knowledge representation, which serves as prior knowledge for representing diverse visual patterns and provides top-down hypotheses during image parsing. The AoG embodies vocabularies of visual elements, including primitives, parts, objects, and scenes, as well as a stochastic image grammar that specifies syntactic (i.e., compositional) relations and semantic relations (e.g., categorical, spatial, temporal, and functional) between these visual elements. The AoG is therefore a unified model of both categorical and symbolic representations of visual knowledge. The proposed I2T framework has two objectives. First, we use a semiautomatic method to parse images from the Internet in order to build an AoG for visual knowledge representation; our goal is to make the parsing process increasingly automatic using the learned AoG model. Second, we use automatic methods to parse images/videos in specific domains and generate text reports that are useful for real-world applications. In the case studies at the end of this paper, we demonstrate two automatic I2T systems: a maritime and urban scene video surveillance system and a real-time automatic driving scene understanding system.
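The three-stage flow (parse, convert to triples, generate text) can be shown on a hand-made parse result. The real engine, AoG, and OWL tooling are far richer; every name below is illustrative.

```python
# Schematic I2T pipeline on an invented parse result: step 1 output is
# mocked, step 2 maps it to subject-predicate-object triples (standing in
# for OWL), and step 3 is template-based text generation.
from dataclasses import dataclass

@dataclass
class ParseNode:            # step 1: pretend output of the image parser
    label: str
    relation: str
    target: str

def to_triples(nodes):      # step 2: semantic representation as triples
    return [(n.label, n.relation, n.target) for n in nodes]

def to_text(triples):       # step 3: template-based report generation
    return " ".join(f"A {s} is {p.replace('_', ' ')} the {o}."
                    for s, p, o in triples)

parse = [ParseNode("boat", "moving_toward", "dock"),
         ParseNode("person", "standing_on", "dock")]
print(to_text(to_triples(parse)))
# -> "A boat is moving toward the dock. A person is standing on the dock."
```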
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010
Jinli Suo; Song-Chun Zhu; Shiguang Shan; Xilin Chen
In this paper, we present a compositional and dynamic model for face aging. The compositional model represents faces in each age group by a hierarchical And-Or graph, in which And nodes decompose a face into parts to describe details (e.g., hair, wrinkles) crucial for age perception, and Or nodes represent the large diversity of faces by alternative selections. A face instance is then a traversal of the And-Or graph, called a parse graph. Face aging is modeled as a Markov process on the parse graph representation. We learn the parameters of the dynamic model from a large annotated face data set, and the stochasticity of face aging is modeled explicitly in the dynamics. Based on this model, we propose a face aging simulation and prediction algorithm. Conversely, an automatic age estimation algorithm is also developed under this representation. We study two criteria to evaluate the aging results using human perception experiments: (1) accuracy of simulation, i.e., whether the aged faces are perceived as belonging to the intended age group, and (2) preservation of identity, i.e., whether the aged faces are perceived as the same person. Quantitative statistical analysis validates the performance of our aging model and age estimation algorithm.
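The "Markov process on parse graphs" idea reduces to a small state machine. In the toy below, the age-group transitions and per-part vocabularies are invented for illustration; in the paper both are learned from the annotated data set.

```python
# Aging as a Markov chain over age-group states, with per-part appearance
# (Or-node) choices re-sampled at each transition. All tables are invented.
import random

random.seed(5)
age_groups = ["child", "young", "middle", "senior"]
part_options = {"hair": ["dense", "thinning", "gray"],
                "wrinkles": ["none", "light", "deep"]}

def age_once(state):
    """One Markov step: advance the age group, re-sample part appearances."""
    i = age_groups.index(state["age"])
    nxt = age_groups[min(i + 1, len(age_groups) - 1)]
    parts = {p: random.choice(opts) for p, opts in part_options.items()}
    return {"age": nxt, **parts}

face = {"age": "child", "hair": "dense", "wrinkles": "none"}
for _ in range(3):
    face = age_once(face)
    print(face)  # a sampled aging trajectory through the parse-graph states
```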