Shih-Fu Chang
Columbia University
Publications
Featured research published by Shih-Fu Chang.
acm multimedia | 1997
John R. Smith; Shih-Fu Chang
We describe a highly functional prototype system for searching by visual features in an image database. The VisualSEEk system is novel in that the user forms the queries by diagramming spatial arrangements of color regions. The system finds the images that contain the most similar arrangements of similar regions. Prior to the queries, the system automatically extracts and indexes salient color regions from the images. By utilizing efficient indexing techniques for color information, region sizes and absolute and relative spatial locations, a wide variety of complex joint color/spatial queries may be computed.
computer vision and pattern recognition | 2012
Wei Liu; Jun Wang; Rongrong Ji; Yu-Gang Jiang; Shih-Fu Chang
Recent years have witnessed the growing popularity of hashing in large-scale vision problems. It has been shown that hashing quality can be boosted by incorporating supervised information into hash function learning. However, the existing supervised methods either lack adequate performance or often incur cumbersome model training. In this paper, we propose a novel kernel-based supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing. The idea is to map the data to compact binary codes whose Hamming distances are minimized on similar pairs and simultaneously maximized on dissimilar pairs. Our approach is distinct from prior works by utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to sequentially and efficiently train the hash functions one bit at a time, yielding very short yet discriminative codes. We carry out extensive experiments on two image benchmarks with up to one million samples, demonstrating that our approach significantly outperforms the state of the art in searching both metric distance neighbors and semantically similar neighbors, with accuracy gains ranging from 13% to 46%.
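The equivalence between code inner products and Hamming distances mentioned in this abstract can be checked directly. The sketch below is illustrative only and is not the paper's training procedure: it assumes codes with entries in {-1, +1} and code length r, for which the Hamming distance equals (r - <a, b>) / 2. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def inner(a, b):
    """Inner product of two codes with entries in {-1, +1}."""
    return sum(x * y for x, y in zip(a, b))


def hamming_from_inner(a, b):
    """Hamming distance recovered from the inner product: (r - <a, b>) / 2."""
    r = len(a)
    return (r - inner(a, b)) // 2


r = 4  # code length, kept small so every pair of codes can be enumerated
codes = list(product((-1, 1), repeat=r))

# True if the identity holds for all 16 x 16 pairs of 4-bit codes.
identity_holds = all(
    hamming(a, b) == hamming_from_inner(a, b) for a in codes for b in codes
)
```

Because the two quantities differ only by an affine map, making the inner product large on similar pairs is the same as making the Hamming distance small on them, and vice versa for dissimilar pairs.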
IEEE MultiMedia | 2006
Milind R. Naphade; John R. Smith; Jelena Tesic; Shih-Fu Chang; Winston H. Hsu; Lyndon Kennedy; Alexander G. Hauptmann; Jon Curtis
As increasingly powerful techniques emerge for machine tagging multimedia content, it becomes ever more important to standardize the underlying vocabularies. Doing so provides interoperability and lets the multimedia community focus ongoing research on a well-defined set of semantics. This paper describes a collaborative effort of multimedia researchers, library scientists, and end users to develop a large standardized taxonomy for describing broadcast news video. The large-scale concept ontology for multimedia (LSCOM) is the first of its kind designed to simultaneously optimize utility to facilitate end-user access, cover a large semantic space, make automated extraction feasible, and increase observability in diverse broadcast news video data sets.
IEEE Transactions on Circuits and Systems for Video Technology | 2001
Shih-Fu Chang; Thomas Sikora; A. Puri
MPEG-7, formally known as the Multimedia Content Description Interface, includes standardized tools (descriptors, description schemes, and language) enabling structural, detailed descriptions of audio-visual information at different granularity levels (region, image, video segment, collection) and in different areas (content description, management, organization, navigation, and user interaction). It aims to support and facilitate a wide range of applications, such as media portals, content broadcasting, and ubiquitous multimedia. We present a high-level overview of the MPEG-7 standard. We first discuss the scope, basic terminology, and potential applications. Next, we discuss the constituent components. Then, we compare the relationship with other standards to highlight its capabilities.
IEEE Transactions on Circuits and Systems for Video Technology | 2001
Ching-Yung Lin; Shih-Fu Chang
Image authentication verifies the originality of an image by detecting malicious manipulations. Its goal is different from that of image watermarking, which embeds into the image a signature surviving most manipulations. Most existing methods for image authentication treat all types of manipulation equally (i.e., as unacceptable). However, some practical applications demand techniques that can distinguish acceptable manipulations (e.g., compression) from malicious ones. In this paper, we present an effective technique for image authentication which can prevent malicious manipulations but allow JPEG lossy compression. The authentication signature is based on the invariance of the relationships between discrete cosine transform (DCT) coefficients at the same position in separate blocks of an image. These relationships are preserved when DCT coefficients are quantized in JPEG compression. Our proposed method can distinguish malicious manipulations from JPEG lossy compression regardless of the compression ratio or the number of compression iterations. We describe adaptive methods with probabilistic guarantees to handle distortions introduced by various acceptable manipulations such as integer rounding, image filtering, image enhancement, or scaling-rescaling. We also present theoretical and experimental results to demonstrate the effectiveness of the technique.
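The invariance that this abstract relies on follows from the fact that quantizing two values with the same step size is a monotonic operation. The sketch below is a simplified illustration, not the authors' signature scheme: it assumes a plain uniform quantizer as a stand-in for JPEG quantization, treats coefficients as random real numbers, and uses hypothetical names throughout.

```python
import math
import random


def quantize(coeff, step):
    """Uniform quantization followed by dequantization (stand-in for JPEG)."""
    return math.floor(coeff / step + 0.5) * step


random.seed(0)
violations = 0
for _ in range(10000):
    # Two coefficients at the same frequency position in different blocks,
    # so both are quantized with the same step size.
    c1 = random.uniform(-500, 500)
    c2 = random.uniform(-500, 500)
    step = random.choice([1, 2, 4, 8, 16, 32])
    q1, q2 = quantize(c1, step), quantize(c2, step)
    # Quantization may make the two values equal,
    # but it never reverses a strict ordering.
    if (c1 > c2 and q1 < q2) or (c1 < c2 and q1 > q2):
        violations += 1
```

Note that a strict ordering can collapse to equality after quantization, so a practical signature has to treat the equal case carefully.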
Storage and Retrieval for Image and Video Databases | 1996
John R. Smith; Shih-Fu Chang
The growth of digital image and video archives is increasing the need for tools that effectively filter and efficiently search through large amounts of visual data. Towards this goal we propose a technique by which the color content of images and videos is automatically extracted to form a class of meta-data that is easily indexed. The color indexing algorithm uses the back-projection of binary color sets to extract color regions from images. This technique provides for both the automated extraction of regions and representation of their color content. It overcomes some of the problems with color histogram techniques such as high-dimensional feature vectors, spatial localization, indexing and distance computation. We present the binary color set back-projection technique and discuss its implementation in the VisualSEEk content-based image/video retrieval system for the World Wide Web. We also evaluate the retrieval effectiveness of the color set back-projection method and compare its performance to other color image retrieval methods.
IEEE Journal on Selected Areas in Communications | 1995
Shih-Fu Chang; David G. Messerschmitt
Many advanced video applications require manipulations of compressed video signals. Popular video manipulation functions include overlap (opaque or semitransparent), translation, scaling, linear filtering, rotation, and pixel multiplication. We propose algorithms to manipulate compressed video in the compressed domain. Specifically, we focus on compression algorithms using the discrete cosine transform (DCT) with or without motion compensation (MC). Such compression systems include JPEG, motion JPEG, MPEG, and H.261. We derive a complete set of algorithms for all aforementioned manipulation functions in the transform domain, in which video signals are represented by quantized transform coefficients. Due to a much lower data rate and the elimination of decompression/compression conversion, the transform-domain approach has great potential in reducing the computational complexity. The actual computational speedup depends on the specific manipulation functions and the compression characteristics of the input video, such as the compression rate and the nonzero motion vector percentage. The proposed techniques can be applied to general orthogonal transforms, such as the discrete trigonometric transform. For compression systems incorporating MC (such as MPEG), we propose a new decoding algorithm to reconstruct the video in the transform domain and then perform the desired manipulations in the transform domain. The same technique can be applied to efficient video transcoding (e.g., from MPEG to JPEG) with minimal decoding.
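One reason such transform-domain manipulation is possible is that the DCT is linear, so a semitransparent overlap (a weighted sum of two signals) gives the same result whether it is computed on pixels or on coefficients. The sketch below is a minimal illustration under simplifying assumptions, not the paper's algorithms: it uses a 1-D orthonormal DCT-II on 8 samples, ignores quantization and motion compensation, and uses made-up sample values and hypothetical function names.

```python
import math

N = 8  # block length, matching the 8-sample rows of a JPEG block


def dct(x):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        total = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)
        )
        out.append(scale * total)
    return out


def blend(a, b, alpha):
    """Semitransparent overlap: alpha * a + (1 - alpha) * b, element-wise."""
    return [alpha * p + (1 - alpha) * q for p, q in zip(a, b)]


fg = [52, 55, 61, 66, 70, 61, 64, 73]  # foreground samples (illustrative)
bg = [10, 20, 30, 40, 50, 60, 70, 80]  # background samples (illustrative)
alpha = 0.25

pixel_then_dct = dct(blend(fg, bg, alpha))       # blend pixels, then transform
dct_then_blend = blend(dct(fg), dct(bg), alpha)  # transform, then blend coefficients

# Largest coefficient-wise difference between the two routes.
max_diff = max(abs(p - q) for p, q in zip(pixel_then_dct, dct_then_blend))
```

The two routes agree up to floating-point rounding, which is why the blend can be applied to coefficients without first decompressing to pixels.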
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Jun Wang; Sanjiv Kumar; Shih-Fu Chang
Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
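For context, the random-projection style of hashing that this abstract names as a baseline can be sketched in a few lines. This is not the proposed SSH method: it is a generic sign-of-random-projection hash, with Gaussian projection directions assumed and all names hypothetical.

```python
import random


def make_projections(dim, bits, seed=0):
    """Draw one random Gaussian direction per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]


def hash_code(x, projections):
    """Binary code: each bit is the sign of x projected on one direction."""
    return tuple(
        1 if sum(w * v for w, v in zip(p, x)) >= 0 else 0 for p in projections
    )


def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(1 for s, t in zip(a, b) if s != t)


projections = make_projections(dim=5, bits=16)
x = [0.3, -1.2, 0.8, 2.0, -0.5]
code_x = hash_code(x, projections)
# Scaling a vector by a positive constant does not change any projection's sign.
code_scaled = hash_code([2 * v for v in x], projections)
```

Because the directions are drawn without looking at the data or at any labels, the codes carry no semantic information, which is the limitation the abstract attributes to such methods.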
computer vision and pattern recognition | 2010
Jun Wang; Sanjiv Kumar; Shih-Fu Chang
Large scale image search has recently attracted considerable attention due to easy availability of huge amounts of data. Several hashing methods have been proposed to allow approximate but highly efficient search. Unsupervised hashing methods show good performance with metric distances but, in image search, semantic similarity is usually given in terms of labeled pairs of images. There exist supervised hashing methods that can handle such semantic similarity but they are prone to overfitting when labeled data is small or noisy. Moreover, these methods are usually very slow to train. In this work, we propose a semi-supervised hashing method that is formulated as minimizing empirical error on the labeled data while maximizing variance and independence of hash bits over the labeled and unlabeled data. The proposed method can handle both metric as well as semantic similarity. The experimental results on two large datasets (up to one million samples) demonstrate its superior performance over state-of-the-art supervised and unsupervised methods.
IEEE MultiMedia | 1997
John R. Smith; Shih-Fu Chang
New visual information in the form of images, graphics, animations and videos is published on the World Wide Web at an incredible rate. However, cataloging it exceeds the capabilities of current text-based Web search engines. WebSeek provides a complete system that collects visual information from the Web by automated agents, then catalogs and indexes it for fast searching and retrieval.
