Mina Makar
Qualcomm
Publication
Featured research published by Mina Makar.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Mina Makar; Chuo-Ling Chang; David M. Chen; Sam S. Tsai; Bernd Girod
Local features are widely used for content-based image retrieval and object recognition. We present an efficient method for encoding digital images suitable for local feature extraction. First, we find the patches in the image corresponding to the detected features. Then, we extract these patches at their characteristic scale and orientation and encode them for efficient transmission. A Discrete Cosine Transform (DCT) with adaptive block size is used for patch compression. We compare this method to directly compressing feature descriptors using transform coding. Experimental results show the superior performance of our technique. Image patches can be compressed to rates around 55 bits/patch (18x compression relative to uncompressed SIFT feature descriptors) and still achieve good image matching performance.
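The patch-compression step described above can be illustrated with a minimal sketch. It assumes a fixed 8x8 block and a single uniform quantization step, whereas the abstract describes an adaptive block size; the function names and the step value are hypothetical and are not taken from the paper.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(block):
    """Separable 2-D DCT: transform rows, then columns."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])

def idct_2d(coeffs):
    """Separable 2-D inverse DCT: invert columns, then rows."""
    cols = transpose([idct_1d(c) for c in transpose(coeffs)])
    return [idct_1d(r) for r in cols]

def quantize(coeffs, step):
    """Uniform scalar quantization (assumed; the paper's quantizer may differ)."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]

# A flat 8x8 patch: all energy lands in the DC coefficient.
patch = [[100.0] * 8 for _ in range(8)]
levels = quantize(dct_2d(patch), step=16.0)
nonzero = sum(1 for row in levels for q in row if q != 0)
recon = idct_2d(dequantize(levels, step=16.0))
```

Only the nonzero quantized levels would need to be entropy coded, which is where the bit savings for smooth patches come from.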
IEEE Transactions on Image Processing | 2014
Mina Makar; Vijay Chandrasekhar; Sam S. Tsai; David M. Chen; Bernd Girod
Streaming mobile augmented reality applications require both real-time recognition and tracking of objects of interest in a video sequence. Typically, local features are calculated from the gradients of a canonical patch around a keypoint in individual video frames. In this paper, we propose a temporally coherent keypoint detector and design efficient interframe predictive coding techniques for canonical patches, feature descriptors, and keypoint locations. In the proposed system, we strive to transmit each patch or its equivalent feature descriptor with as few bits as possible by modifying a previously transmitted patch or descriptor. Our solution enables server-based mobile augmented reality where a continuous stream of salient information, sufficient for image-based retrieval and object localization, is sent at a bit-rate that is practical for today's wireless links and less than one-tenth of the bit-rate needed to stream the compressed video to the server.
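The idea of transmitting a patch by modifying a previously transmitted one can be sketched as a toy mode decision. The three modes, the sum-of-absolute-differences test, and the `skip_threshold` parameter are assumptions made for illustration; the abstract does not specify the actual coding modes or decision rule.

```python
def encode_patch(current, previous, skip_threshold=0):
    """Return (mode, payload) for one patch given the last decoded patch.

    Hypothetical modes:
      intra - no reference available, send the patch itself
      skip  - patch is close enough to the reference, send nothing
      inter - send only the difference to the reference
    """
    if previous is None:
        return ("intra", list(current))
    residual = [c - p for c, p in zip(current, previous)]
    if sum(abs(r) for r in residual) <= skip_threshold:
        return ("skip", None)
    return ("inter", residual)

def decode_patch(mode, payload, previous):
    """Rebuild the patch from the mode, payload, and previous decoded patch."""
    if mode == "intra":
        return list(payload)
    if mode == "skip":
        return list(previous)
    return [p + r for p, r in zip(previous, payload)]

# The same tracked patch (flattened pixel values) over three frames.
frames = [[10, 12, 14, 16], [10, 12, 14, 16], [11, 12, 14, 18]]
reference = None
modes = []
for patch in frames:
    mode, payload = encode_patch(patch, reference, skip_threshold=0)
    # Predict from the decoded patch so encoder and decoder stay in sync.
    reference = decode_patch(mode, payload, reference)
    modes.append(mode)
```

With a temporally coherent detector, most patches change little between frames, so the skip and inter modes would dominate and the payloads stay small.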
International Journal of Semantic Computing | 2013
Mina Makar; Sam S. Tsai; Vijay Chandrasekhar; David M. Chen; Bernd Girod
Local features are widely used for content-based image retrieval and augmented reality applications. Typically, feature descriptors are calculated from the gradients of a canonical patch around a repeatable keypoint in the image. In this paper, we propose a temporally coherent keypoint detector and design efficient interframe predictive coding techniques for canonical patches and keypoint locations. In the proposed system, we strive to transmit each patch with as few bits as possible by simply modifying a previously transmitted patch. This enables server-based mobile augmented reality where a continuous stream of salient information, sufficient for image-based retrieval and localization, can be sent over a wireless link at a low bit-rate. Experimental results show that our technique achieves a similar image matching performance at 1/15 of the bit-rate when compared to detecting keypoints independently frame-by-frame and allows performing streaming mobile augmented reality at low bit-rates of about 20–50 kbps, practical for today's wireless links.
IEEE Transactions on Image Processing | 2010
Chuo-Ling Chang; Mina Makar; Sam S. Tsai; Bernd Girod
The direction-adaptive partitioned block transform (DA-PBT) is proposed to exploit the directional features in color images to improve coding performance. Depending on the directionality in an image block, the transform either selects one of the eight directional modes or falls back to the nondirectional mode equivalent to the conventional 2-D DCT. The selection of a directional mode determines the transform direction that provides directional basis functions, the block partitioning that spatially confines the high-frequency energy, the scanning order that arranges the transform coefficients into a 1-D sequence for efficient entropy coding, and the quantization matrix optimized for visual quality. The DA-PBT can be incorporated into image coding using a rate-distortion optimized framework for direction selection, and can therefore be viewed as a generalization of variable blocksize transforms with the inclusion of directional transforms and nonrectangular partitions. As a block transform, it can naturally be combined with block-based intra or inter prediction to exploit the directionality remaining in the residual. Experimental results show that the proposed DA-PBT outperforms the 2-D DCT by more than 2 dB for test images with directional features. It also greatly reduces the ringing and checkerboard artifacts typically observed around directional features in images. The DA-PBT also consistently outperforms a previously proposed directional DCT. When combined with directional prediction, gains are less than additive, as similar signal properties are exploited by the prediction and the transform. For hybrid video coding, significant gains are shown for intra coding, but not for encoding the residual after accurate motion-compensated prediction.
International Conference on Image Processing | 2014
Andre F. de Araújo; Mina Makar; Vijay Chandrasekhar; David M. Chen; Sam S. Tsai; Huizhong Chen; Roland Angst; Bernd Girod
We study the challenges of image-based retrieval when the database consists of videos. This variation of visual search is important for a broad range of applications that require indexing video databases based on their visual contents. We present new solutions to reduce storage requirements, while at the same time improving video search quality. The video database is preprocessed to find different appearances of the same visual elements, and build robust descriptors. Compression algorithms are developed to reduce the system's storage requirements. We introduce a dataset of CNN broadcasts and queries that include photos taken with mobile phones and images of objects. Our experiments include pairwise matching and retrieval scenarios. We demonstrate one order of magnitude storage reduction and search quality improvements of up to 12% in mean average precision, compared to a baseline system that does not make use of our techniques.
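For readers unfamiliar with the metric quoted above, mean average precision can be computed as in the following sketch. It uses the standard definition; the query data are invented for illustration and are not from the paper's dataset.

```python
def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(queries):
    """queries: list of (ranked_results, set_of_relevant_items) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Two made-up queries against a video database.
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant at ranks 1 and 3
    (["x", "y"], {"y"}),                 # relevant at rank 2
]
score = mean_average_precision(queries)
```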
International Symposium on Multimedia | 2012
Mina Makar; Sam S. Tsai; Vijay Chandrasekhar; David M. Chen; Bernd Girod
Local features are widely used for content-based image retrieval and augmented reality applications. Typically, feature descriptors are calculated from the gradients of a canonical patch around a repeatable keypoint in the image. In previous work, we showed that one can alternatively transmit the compressed canonical patch and perform descriptor computation at the receiving end with comparable performance. In this paper, we propose a temporally coherent keypoint detector in order to allow efficient interframe coding of canonical patches. In inter-patch compression, one strives to transmit each patch with as few bits as possible by simply modifying a previously transmitted patch. This enables server-based mobile augmented reality where a continuous stream of salient information, sufficient for image-based retrieval, can be sent over a wireless link at the smallest possible bit-rate. Experimental results show that our technique achieves a similar image matching performance at 1/10 of the bit-rate when compared to detecting keypoints independently frame-by-frame.
Proceedings of SPIE | 2012
Sam S. Tsai; David M. Chen; Gabriel Takacs; Vijay Chandrasekhar; Mina Makar; Radek Grzeszczuk; Bernd Girod
In mobile visual search applications, an image-based query is typically sent from a mobile client to the server. Because of the bit-rate limitations, the query should be as small as possible. When performing image-based retrieval with local features, there are two types of information: the descriptors of the image features and the locations of the image features within the image. Location information can be used to check geometric consistency of the set of features and thus improve the retrieval performance. To compress the location information, location histogram coding is an effective solution. We present a location histogram coder that reduces the bitrate by 2.8x when compared to a fixed-rate scheme and 12.5x when compared to a floating point representation of the locations. A drawback is the large context table which can be difficult to store in the coder and requires large training data. We propose a new sum-based context for coding the location histogram map. We show that it can reduce the context up to 200x while being able to perform just as well as or better than previously proposed location histogram coders.
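A minimal sketch of the two ingredients named above: binning keypoint locations into a histogram, and forming a sum-based context from already-coded neighbors. The cell size and the four-cell causal neighborhood are assumptions for illustration; the paper's actual neighborhood and entropy coder are not given in the abstract.

```python
def location_histogram(points, width, height, cell):
    """Count keypoints falling into each cell x cell block of the image."""
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell
    hist = [[0] * cols for _ in range(rows)]
    for x, y in points:
        hist[int(y) // cell][int(x) // cell] += 1
    return hist

# Causal neighbors in raster order (row offset, column offset):
# left, upper-left, upper, upper-right. This choice is hypothetical.
CAUSAL = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]

def sum_context(occupancy, r, c):
    """Context = number of occupied causal neighbors of cell (r, c)."""
    total = 0
    for dr, dc in CAUSAL:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(occupancy) and 0 <= cc < len(occupancy[0]):
            total += occupancy[rr][cc]
    return total

points = [(3, 2), (5, 1), (20, 9)]
hist = location_histogram(points, width=32, height=16, cell=8)
occupancy = [[1 if n > 0 else 0 for n in row] for row in hist]
ctx = sum_context(occupancy, 1, 1)
```

With four binary neighbors, a context keyed on the exact neighbor pattern has 2^4 = 16 states, while the sum takes only 5 values (0 to 4). The gap widens quickly as the neighborhood grows, which is the intuition behind a much smaller context table.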
Data Compression Conference | 2014
Vijay Chandrasekhar; Gabriel Takacs; David M. Chen; Sam S. Tsai; Mina Makar; Bernd Girod
MPEG is currently developing a standard titled Compact Descriptors for Visual Search (CDVS) for descriptor extraction and compression. In this work, we report comprehensive patch-level experiments for a direct comparison of low bitrate descriptors for visual search. For evaluating different compression schemes, we propose a dataset of matching pairs of image patches from the MPEG-CDVS image-level data sets. We propose a greedy rate allocation scheme for distributing bits across different spatial bins of the SIFT descriptor. We study a scheme based on Entropy Constrained Vector Quantization and greedy rate allocation, which performs close to the performance bound for any compression scheme. Finally, we present extensive feature-level Receiver Operating Characteristic (ROC) comparisons for different compression schemes (Vector Quantization, Transform Coding, Lattice Coding) proposed during the MPEG-CDVS standardization process.
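Greedy rate allocation in general can be sketched as follows: bits are handed out one at a time to whichever bin benefits most. The distortion model used here (variance times 2^(-2b), a common high-rate approximation) and the per-bin variances are assumptions for illustration; the paper's scheme is driven by its own measured rate-distortion behavior.

```python
def distortion(variance, bits):
    """Assumed high-rate model: each extra bit cuts distortion by a factor of 4."""
    return variance * 2.0 ** (-2 * bits)

def greedy_allocate(variances, total_bits):
    """Give each bit to the bin with the largest distortion reduction."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        gains = [
            distortion(v, b) - distortion(v, b + 1)
            for v, b in zip(variances, bits)
        ]
        best = max(range(len(gains)), key=gains.__getitem__)
        bits[best] += 1
    return bits

# Three hypothetical spatial bins with different signal variances.
allocation = greedy_allocate([16.0, 3.0, 1.0], total_bits=3)
```

In this toy case the highest-variance bin receives two bits and the next one receives the third, since those choices yield the largest distortion drops at each step.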
Data Compression Conference | 2014
David M. Chen; Mina Makar; Andre F. de Araújo; Bernd Girod
For mobile augmented reality, an image captured by a mobile device's camera is often compared against a database hosted on a remote server to recognize objects in the image. It is critically important that the amount of data transmitted over the network is as small as possible to reduce the system latency. A low bitrate global signature for still images has been previously shown to achieve high-accuracy image retrieval. In this paper, we develop new methods for interframe coding of a continuous stream of global signatures that can reduce the bitrate by nearly two orders of magnitude compared to independent coding of these global signatures, while achieving the same or better image retrieval accuracy. The global signatures are constructed in an embedded data structure that offers rate scalability. The usage of these new coding methods and the embedded data structure allows the streaming of high-quality global signatures at a bitrate that is less than 2 kbps. Furthermore, a statistical analysis of the retrieval and coding performance is performed to understand the trade-off between bitrate and image retrieval accuracy and explain why interframe coding of global signatures substantially outperforms independent coding.
International Conference on Image Processing | 2012
Mina Makar; Haricharan Lakshman; Vijay Chandrasekhar; Bernd Girod
Local features are widely used for content-based image retrieval and object recognition. Most feature descriptors are calculated from the gradients of a canonical patch around repeatable keypoints in the image. In this paper, we propose a technique for designing quantization matrices that reduce the mean squared error distortion of the gradient derived from DCT-encoded canonical patches. Experimental results demonstrate that our proposed patch encoder greatly outperforms a JPEG encoder at the same encoding complexity. Moreover, our quantization matrices achieve lower gradient distortion and a larger number of feature matches at the same bit-rate.