Featured Research

Multimedia

AdaCompress: Adaptive Compression for Online Computer Vision Services

With the growth of computer vision based applications and services, an explosive number of images are uploaded to cloud servers that host computer vision algorithms, usually in the form of deep learning models. JPEG has served as the de facto compression and encapsulation method for images before upload, due to its wide adoption. However, the standard JPEG configuration does not always perform well for compressing images that are to be processed by a deep learning model; e.g., the standard quality level of JPEG leads to 50% size overhead (compared with the best quality-level selection) on ImageNet at the same inference accuracy in popular computer vision models including InceptionNet and ResNet. Even knowing this, designing a better JPEG configuration for online computer vision services remains extremely challenging: 1) cloud-based computer vision models are usually a black box to end users, so it is difficult to design a JPEG configuration without knowing their model structures; 2) the JPEG configuration has to change across different users. In this paper, we propose a reinforcement-learning-based JPEG configuration framework. In particular, we design an agent that adaptively chooses the compression level according to the input image's features and the backend deep learning models. We then train the agent with reinforcement learning to adapt it to different deep learning cloud services, which act as the interactive training environment and feed back a reward that jointly considers accuracy and data size. In our real-world evaluation on Amazon Rekognition, Face++ and Baidu Vision, our approach reduces the size of images by 1/2 to 1/3 while the overall classification accuracy decreases only slightly.
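As a minimal sketch of the idea, an epsilon-greedy agent can learn which JPEG quality level maximizes a reward that trades accuracy against upload size. The quality levels, the mock cloud service, and the reward weighting below are illustrative assumptions, not the paper's actual design (the real agent also conditions on the input image's features):

```python
import random

# Hypothetical set of JPEG quality levels the agent may choose from.
QUALITIES = [15, 35, 55, 75, 95]

def mock_cloud_service(quality):
    """Stand-in for a black-box vision API: returns (accuracy, relative size).
    The numbers are illustrative, not measurements from the paper."""
    accuracy = min(1.0, 0.70 + 0.004 * quality)  # accuracy saturates
    size = quality / 95.0                        # size grows with quality
    return accuracy, size

def reward(accuracy, size, lam=0.5):
    # Reward jointly considers inference accuracy and upload size.
    return accuracy - lam * size

def train_agent(episodes=2000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: estimate the value of each quality level
    by interacting with the (mock) cloud service."""
    rng = random.Random(seed)
    value = {q: 0.0 for q in QUALITIES}
    count = {q: 0 for q in QUALITIES}
    for _ in range(episodes):
        if rng.random() < eps:
            q = rng.choice(QUALITIES)           # explore
        else:
            q = max(QUALITIES, key=value.get)   # exploit current estimate
        acc, size = mock_cloud_service(q)
        count[q] += 1
        value[q] += (reward(acc, size) - value[q]) / count[q]  # running mean
    return max(QUALITIES, key=value.get)
```

With this toy service, the agent converges to the lowest quality level that still yields a favorable accuracy/size trade-off.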

Read more
Multimedia

Adaptive Control of Embedding Strength in Image Watermarking using Neural Networks

Digital image watermarking has been widely used in applications such as copyright protection of digital media, including audio, image, and video files. Watermarking methods must balance two opposing criteria: robustness and transparency. In this paper, we propose a framework for determining an appropriate embedding strength factor. The framework can be used with most DWT- and DCT-based blind watermarking approaches. We use Mask R-CNN trained on the COCO dataset to find a good strength factor for each sub-block. Experiments show that this method is robust against different attacks and offers good transparency.
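To illustrate the role of an adaptive strength factor, the sketch below maps a per-block saliency score (such as one derived from Mask R-CNN masks) to an embedding strength, then blindly embeds one bit per transform coefficient in a quantization-index-modulation style. The mapping constants and the QIM-style embedding are assumptions for illustration, not the paper's exact scheme:

```python
def adaptive_strengths(saliency, base=4.0, span=8.0):
    """Map per-block saliency scores in [0, 1] (e.g. derived from a
    segmentation model such as Mask R-CNN) to embedding strengths:
    salient/textured blocks tolerate stronger embedding."""
    return [base + span * s for s in saliency]

def embed_bit(coeff, bit, alpha):
    """Blind embedding of one bit into a DWT/DCT coefficient: snap the
    coefficient to a multiple of alpha whose parity encodes the bit."""
    q = round(coeff / alpha)
    if q % 2 != bit:
        q += 1  # move to an adjacent multiple with the right parity
    return q * alpha

def extract_bit(coeff, alpha):
    """Blind extraction: recover the bit from the coefficient's parity."""
    return int(round(coeff / alpha)) % 2
```

Larger `alpha` makes the watermark more robust (coefficients must drift further to flip the parity) at the cost of transparency, which is exactly the trade-off an adaptive strength factor manages per block.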

Read more
Multimedia

Adaptive Embedding Pattern for Grayscale-Invariance Reversible Data Hiding

Traditional reversible data hiding (RDH) methods focus on enlarging the embedding capacity (EC) and reducing the embedding distortion (ED). Recently, a novel RDH algorithm was developed that embeds secret data into a color image without changing the corresponding grayscale [1], which greatly expands the applications of RDH. In [1], channels R and B of the color image are exploited to carry secret information, while channel G is adjusted to balance the modifications of channels R and B and keep the grayscale invariant. However, we found that the embedding performance (EP) of that method is still unsatisfactory and can be further enhanced. To improve the EP, we introduce an adaptive embedding pattern that enables the algorithm to selectively embed different numbers of secret bits into pixels according to context information. Moreover, we design a novel two-level predictor that unites two normal predictors to reduce the ED when embedding more bits. Experimental results demonstrate that, compared to the previous method, our scheme significantly enhances image fidelity while keeping the grayscale invariant.
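The grayscale-invariance constraint of [1] can be sketched as follows: after channels R and B are modified to carry payload, channel G is adjusted so that the rounded BT.601 grayscale value is unchanged. The nearest-first search below is illustrative; the actual method computes the adjustment via predictors rather than search:

```python
def gray(r, g, b):
    """Rounded ITU-R BT.601 grayscale of an RGB pixel."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def balance_green(r0, g0, b0, r1, b1):
    """After R and B have been modified (r0->r1, b0->b1) to carry payload,
    pick the G value nearest to g0 that keeps the rounded grayscale
    unchanged, so distortion in G stays small."""
    target = gray(r0, g0, b0)
    for dg in sorted(range(-255, 256), key=abs):  # try small |dg| first
        g1 = g0 + dg
        if 0 <= g1 <= 255 and gray(r1, g1, b1) == target:
            return g1
    return None  # no feasible adjustment for this pixel
```

Because G carries about 0.587 of the luma weight, small payload changes in R and B can almost always be compensated by a small change in G.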

Read more
Multimedia

Adaptive Multi-modal Fusion Hashing via Hadamard Matrix

Hashing plays an important role in information retrieval, due to its low storage cost and high processing speed. Among the techniques available in the literature, multi-modal hashing, which can encode heterogeneous multi-modal features into compact hash codes, has received particular attention. Most existing multi-modal hashing methods adopt fixed weighting factors to fuse the modalities for any query, which cannot capture the variation across different queries. Besides, many methods introduce hyper-parameters to balance multiple regularization terms, which makes the optimization harder; setting proper parameter values is also time-consuming and labor-intensive. These limitations may significantly hinder their adoption in real applications. In this paper, we propose a simple yet effective method inspired by the Hadamard matrix. The proposed method captures multi-modal feature information in an adaptive manner and preserves the discriminative semantic information in the hash codes. Our framework is flexible and involves very few hyper-parameters. Extensive experimental results show that the method is effective and achieves superior performance compared to state-of-the-art algorithms.
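A common way to use the Hadamard matrix in hashing, and a plausible reading of the idea here, is to assign each class a distinct Hadamard column as its target hash code: distinct columns are mutually orthogonal, so class codes are maximally separated. The sketch below builds the matrix by the Sylvester construction; the target-assignment step is an assumption for illustration:

```python
def hadamard(k):
    """Sylvester construction of a 2^k x 2^k Hadamard matrix with +1/-1
    entries: H_{2n} = [[H_n, H_n], [H_n, -H_n]]."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def class_hash_targets(num_classes, code_len_log2):
    """Assign each class a distinct Hadamard column as its target +/-1 hash
    code of length 2^code_len_log2 (requires num_classes <= code length)."""
    h = hadamard(code_len_log2)
    n = len(h)
    assert num_classes <= n, "need a code at least as long as the class count"
    return [[h[r][c] for r in range(n)] for c in range(num_classes)]
```

Orthogonal target codes give every pair of classes a Hamming distance of exactly half the code length, which is the discriminative structure such methods aim to preserve.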

Read more
Multimedia

Adaptive Music Composition for Games

The generation of music that adapts dynamically to content and actions plays an important role in building more immersive, memorable and emotive game experiences. To date, the development of adaptive music systems for video games has been limited both by the nature of the algorithms used for real-time music generation and by the limited modelling of player action, game-world context and emotion in current games. We propose that these issues must be addressed in tandem for the quality and flexibility of adaptive game music to improve significantly. Cognitive models of knowledge organisation and emotional affect are integrated with multi-modal, multi-agent composition techniques to produce a novel Adaptive Music System (AMS). The system is integrated into two stylistically distinct games. In both games, players reported higher overall immersion and a stronger correlation of music with game-world concepts with the AMS than with the original game soundtracks.

Read more
Multimedia

Adaptive Rate Allocation for View-Aware Point-Cloud Streaming

In the context of view-dependent point-cloud streaming in a scene, our rate allocation is "adaptive" in the sense that it prioritizes the point-cloud models according to the camera view, the visibility of the objects, and their distance. The algorithm delivers a higher bitrate to point-cloud models that are inside the user's viewport, are more likely to be looked at, or are closer to the view camera, while delivering a lower quality level to models outside the user's immediate viewport or farther from the camera. To that end, we formulate the rate allocation problem in the context of multi-point-cloud streaming, where multiple point-cloud models are streamed to the target device, and propose a rate allocation heuristic to enable these adaptations. To the best of our knowledge, this is the first work to mathematically model the problem and propose a rate allocation heuristic in the context of point-cloud streaming.
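A minimal version of such a heuristic can weight each point-cloud model by viewport membership and inverse distance, then split the total bitrate budget proportionally. The specific weights (the viewport boost and the 1/d falloff) are illustrative choices, not the paper's utility model:

```python
def allocate_bitrate(models, total_budget):
    """Split a bitrate budget across point-cloud models in proportion to a
    simple view-aware weight.

    models: list of dicts with 'in_viewport' (bool) and 'distance' (> 0).
    Returns one bitrate per model; allocations sum to total_budget.
    """
    weights = []
    for m in models:
        w = 1.0 / m['distance']   # closer models get more bitrate
        if m['in_viewport']:
            w *= 4.0              # hypothetical boost for visible models
        weights.append(w)
    s = sum(weights)
    return [total_budget * w / s for w in weights]
```

For example, a nearby model inside the viewport receives a larger share of the budget than an equally near model outside it, and the shares always exhaust the budget exactly.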

Read more
Multimedia

Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Cross-modal retrieval aims to retrieve relevant data across different modalities (e.g., texts vs. images). The common strategy is to apply element-wise constraints between manually labeled pair-wise items to guide the generators to learn the semantic relationships between the modalities, so that similar items are projected close to each other in the common representation subspace. However, such constraints often fail to preserve the semantic structure between unpaired but semantically similar items (e.g., unpaired items with the same class label are more similar than items with different labels). To address this problem, we propose a novel cross-modal similarity transferring (CMST) method that learns and preserves the semantic relationships between unpaired items in an unsupervised way. The key idea is to learn the quantitative similarities in the single-modal representation subspace, and then transfer them to the common representation subspace to establish the semantic relationships between unpaired items across modalities. Experiments show that our method outperforms state-of-the-art approaches in both class-based and pair-based retrieval tasks.
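The transfer step can be sketched as matching the cosine-similarity structure of a single-modal subspace in the common subspace. The loss below is a hedged simplification of the idea (the actual CMST objective and training procedure may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def similarity_matrix(vectors):
    """Pairwise cosine similarities of a set of embeddings."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

def transfer_loss(single_modal, common):
    """Mean squared gap between the similarity structure learned in a
    single-modal subspace and the one induced in the common subspace;
    minimizing this transfers unpaired-item relationships across spaces."""
    s = similarity_matrix(single_modal)
    c = similarity_matrix(common)
    n = len(s)
    return sum((s[i][j] - c[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Because cosine similarity is scale-invariant, the loss is zero whenever the common-space embeddings preserve the single-modal angular structure, regardless of magnitude.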

Read more
Multimedia

Affective Computing for Large-Scale Heterogeneous Multimedia Data: A Survey

The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., images, music, and videos), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable a wide range of applications. In this article, we comprehensively survey the state-of-the-art AC technologies for large-scale heterogeneous multimedia data. We begin by introducing the typical emotion representation models from psychology that are widely employed in AC, and briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare representative methods for AC on different multimedia types, i.e., images, music, videos, and multimodal data, focusing on both handcrafted-feature-based methods and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.

Read more
Multimedia

Ambiguity of Objective Image Quality Metrics: A New Methodology for Performance Evaluation

Objective image quality metrics attempt to estimate the perceptual quality of a given image by considering the characteristics of the human visual system. However, a metric may produce different quality scores even for two images that are perceptually indistinguishable to human viewers, an issue that existing studies on objective quality assessment have not considered. In this paper, we address this ambiguity of objective image quality assessment. We propose an approach to obtain an ambiguity interval of an objective metric, within which a quality score difference is not perceptually significant. In particular, we use the visual difference predictor, which can take into account viewing conditions that are important for visual quality perception. To demonstrate the usefulness of the proposed approach, we conduct experiments with 33 state-of-the-art image quality metrics, examining their accuracy and ambiguity on three image quality databases. The results show that ambiguity intervals can serve as an additional figure of merit when conventional performance measurement does not determine superiority between metrics. The effect of viewing distance on the ambiguity interval is also shown.

Read more
Multimedia

An Advert Creation System for Next-Gen Publicity

With the rapid proliferation of multimedia data on the internet, there has been a sharp rise in the creation of videos for viewers. Viewers can now skip the advertisement breaks in these videos using ad blockers and 'skip ad' buttons, bringing online marketing and publicity to a standstill. In this paper, we demonstrate a system that can effectively integrate a new advertisement into a video sequence. We use state-of-the-art techniques from deep learning and computational photogrammetry for effective detection of existing adverts and seamless integration of new adverts into video sequences. This is helpful for targeted advertisement, paving the path for next-gen publicity.

Read more
