Computer Science Multimedia - Researchain

Featured Researches

Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.

Multimedia

ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images

In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images. Our ByeGlassesGAN consists of an encoder, a face decoder, and a segmentation decoder. The encoder is responsible for extracting information from the source face image, and the face decoder utilizes this information to generate glasses-removed images. The segmentation decoder is included to predict the segmentation mask of eyeglasses and completed face region. The feature vectors generated by the segmentation decoder are shared with the face decoder, which facilitates better reconstruction results. Our experiments show that ByeGlassesGAN can provide visually appealing results in the eyeglasses-removed face images even for semi-transparent color eyeglasses or glasses with glare. Furthermore, we demonstrate significant improvement in face recognition accuracy for face images with glasses by applying our method as a pre-processing step in our face recognition experiment.

Multimedia

CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images

Over the past few years, detection performance improvements of deep-learning based steganalyzers have usually been achieved through structure expansion. However, an excessively expanded structure results in huge computational cost and storage overheads, and consequently in difficulty in training and deployment. In this paper we propose CALPA-NET, a ChAnneL-Pruning-Assisted deep residual network architecture search approach to shrink the network structure of existing vast, over-parameterized deep-learning based steganalyzers. We observe that the broad inverted-pyramid structure of existing deep-learning based steganalyzers might contradict the well-established model diversity oriented philosophy, and therefore is not suitable for steganalysis. Then a hybrid criterion combined with two network pruning schemes is introduced to adaptively shrink every involved convolutional layer in a data-driven manner. The resulting network architecture presents a slender bottleneck-like structure. We have conducted extensive experiments on the BOSSBase+BOWS2 dataset, the more diverse ALASKA dataset, and even a large-scale subset extracted from the ImageNet CLS-LOC dataset. The experimental results show that the model structure generated by our proposed CALPA-NET can achieve comparable performance with less than two percent of the parameters and about one third of the FLOPs of the original steganalytic model. The new model possesses even better adaptivity, transferability, and scalability.

Multimedia

CBA: Contextual Quality Adaptation for Adaptive Bitrate Video Streaming (Extended Version)

Recent advances in quality adaptation algorithms leave adaptive bitrate (ABR) streaming architectures at a crossroads: When determining the sustainable video quality one may either rely on the information gathered at the client vantage point or on server and network assistance. The fundamental problem here is to determine how valuable either information is for the adaptation decision. This problem becomes particularly hard in future Internet settings such as Named Data Networking (NDN) where the notion of a network connection does not exist. In this paper, we provide a fresh view on ABR quality adaptation for QoE maximization, which we formalize as a decision problem under uncertainty, and for which we contribute a sparse Bayesian contextual bandit algorithm denoted CBA. This allows taking high-dimensional streaming context information, including client-measured variables and network assistance, to find online the most valuable information for the quality adaptation. Since sparse Bayesian estimation is computationally expensive, we develop a fast new inference scheme to support online video adaptation. We perform an extensive evaluation of our adaptation algorithm in the particularly challenging setting of NDN, where we use an emulation testbed to demonstrate the efficacy of CBA compared to state-of-the-art algorithms.

Multimedia

CIS-Net: A Novel CNN Model for Spatial Image Steganalysis via Cover Image Suppression

Image steganalysis is a special binary classification problem that aims to distinguish natural cover images from suspected stego images, which are the result of embedding very weak secret message signals into covers. The key to this task is to suppress cover image content effectively and thus make the classification of cover and stego images easier. Recent research shows that Convolutional Neural Networks (CNNs) are very effective at detecting steganography by learning discriminative features between cover images and their stegos. Several deep CNN models have been proposed that incorporate domain knowledge of image steganography/steganalysis into the design of the network and achieve state-of-the-art performance on standard databases. Following this direction, we propose a novel model called Cover Image Suppression Network (CIS-Net), which improves the performance of spatial image steganalysis by suppressing cover image content as much as possible in model learning. Two novel layers, the Single-value Truncation Layer (STL) and the Sub-linear Pooling Layer (SPL), are proposed in this work. Specifically, STL truncates input values to the same threshold when they fall outside a predefined interval. Theoretically, we have proved that STL can reduce the variance of the input feature map without deteriorating useful information. SPL uses a sub-linear power function to suppress large-valued elements introduced by cover image content and aggregates weak embedded signals via average pooling. Extensive experiments demonstrate that the proposed network equipped with STL and SPL achieves better performance than rich model classifiers and existing CNN models on challenging steganographic algorithms.
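The two layers described above can be illustrated with a minimal one-dimensional sketch. It follows only the wording of the abstract: the function names `stl` and `spl`, the symmetric interval `[-threshold, threshold]`, the exponent `power`, and the non-overlapping pooling window are assumptions made for illustration, and the definitions in the paper itself may differ.

```python
from typing import List


def stl(values: List[float], threshold: float) -> List[float]:
    """Single-value truncation (assumed form): values inside
    [-threshold, threshold] pass through unchanged, and every value
    outside that interval is replaced by the same value, `threshold`."""
    return [v if -threshold <= v <= threshold else threshold for v in values]


def spl(values: List[float], window: int, power: float = 0.5) -> List[float]:
    """Sub-linear pooling (assumed form): apply |v| ** power with
    0 < power < 1 to damp large magnitudes, then average each
    non-overlapping window of `window` consecutive elements."""
    if not 0.0 < power < 1.0:
        raise ValueError("power must lie strictly between 0 and 1")
    damped = [abs(v) ** power for v in values]
    return [
        sum(damped[i:i + window]) / window
        for i in range(0, len(damped) - window + 1, window)
    ]


if __name__ == "__main__":
    print(stl([-5, -1, 0, 2, 7], threshold=3))
    print(spl([4, 16, 0, 100], window=2))
```

Note that in this assumed form a large negative value is mapped to the positive threshold as well, which is what distinguishes single-value truncation from ordinary clipping.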

Multimedia

CNN Based Adversarial Embedding with Minimum Alteration for Image Steganography

Historically, steganographic schemes were designed to preserve image statistics or steganalytic features. Since most of the state-of-the-art steganalytic methods employ a machine learning (ML) based classifier, it is reasonable to consider countering steganalysis by trying to fool the ML classifiers. However, simply applying perturbations on stego images as adversarial examples may lead to the failure of data extraction and introduce unexpected artefacts detectable by other classifiers. In this paper, we present a steganographic scheme with a novel operation called adversarial embedding, which achieves the goal of hiding a stego message while at the same time fooling a convolutional neural network (CNN) based steganalyzer. The proposed method works under the conventional framework of distortion minimization. Adversarial embedding is achieved by adjusting the costs of image element modifications according to the gradients backpropagated from the CNN classifier targeted by the attack. Therefore, the modification direction has a higher probability of being the same as the sign of the gradient. In this way, the so-called adversarial stego images are generated. Experiments demonstrate that the proposed steganographic scheme is secure against the targeted adversary-unaware steganalyzer. In addition, it deteriorates the performance of other adversary-aware steganalyzers, opening the way to a new class of modern steganographic schemes capable of overcoming powerful CNN-based steganalysis.
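The cost adjustment described above can be sketched as follows. This is a rough illustration rather than the authors' implementation: the function name `adjust_costs`, the separate cost lists for +1 and -1 changes, and the scaling factor `alpha` are all hypothetical, and the sign convention simply follows the abstract's statement that a modification should tend to match the sign of the gradient.

```python
from typing import List, Tuple


def adjust_costs(
    rho_plus: List[float],
    rho_minus: List[float],
    grads: List[float],
    alpha: float = 2.0,
) -> Tuple[List[float], List[float]]:
    """Bias embedding costs by the gradient sign (illustrative only).

    rho_plus[i] / rho_minus[i] are the costs of changing element i by
    +1 / -1. Where the gradient is positive, the +1 change is made
    cheaper and the -1 change more expensive; where it is negative,
    the opposite. Elements with a zero gradient keep their costs.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    new_plus, new_minus = [], []
    for p, m, g in zip(rho_plus, rho_minus, grads):
        if g > 0:
            p, m = p / alpha, m * alpha
        elif g < 0:
            p, m = p * alpha, m / alpha
        new_plus.append(p)
        new_minus.append(m)
    return new_plus, new_minus


if __name__ == "__main__":
    plus, minus = adjust_costs([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.5, -0.5, 0.0])
    print(plus, minus)
```

Because only the costs change, a standard distortion-minimizing coder can still be used for the actual embedding, so message extraction is unaffected.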

Multimedia

CNN-based Steganalysis and Parametric Adversarial Embedding: a Game-Theoretic Framework

CNN-based steganalysis has recently achieved very good performance in detecting content-adaptive steganography. At the same time, recent works have shown that, by adopting an approach similar to that used to build adversarial examples, a steganographer can adopt an adversarial embedding strategy to effectively counter a target CNN steganalyzer. In turn, the good performance of the steganalyzer can be restored by retraining the CNN with adversarial stego images. A problem with this model is that, arguably, at training time the steganalyzer is not aware of the exact parameters used by the steganographer for adversarial embedding and, vice versa, the steganographer does not know how the images that will be used to train the steganalyzer are generated. In order to exit this apparent deadlock, we introduce a game-theoretic framework wherein the problem of setting the parameters of the steganalyzer and the steganographer is solved in a strategic way. More specifically, a non-zero-sum game is first formulated to model the problem, and then instantiated by considering a specific adversarial embedding scheme, setting its operating parameters in a game-theoretic fashion. Our analysis shows that the equilibrium solution of the non-zero-sum game can be conveniently found by solving an associated zero-sum game, thus greatly reducing the complexity of the problem. Then we run several experiments to derive the optimum strategies for the steganographer and the steganalyst in a game-theoretic sense, and to evaluate the performance of the game at the equilibrium, characterizing the loss with respect to the conventional non-adversarial case. Finally, by leveraging the analysis of the equilibrium point of the game, we introduce a new strategy to improve the reliability of the steganalysis, which shows the benefits of addressing the security issue from a game-theoretic perspective.

Multimedia

CNN-based driving of block partitioning for intra slices encoding

This paper provides a technical overview of a deep-learning-based encoder method aiming at optimizing next generation hybrid video encoders for driving the block partitioning in intra slices. An encoding approach based on Convolutional Neural Networks is explored to partly substitute classical heuristics-based encoder speed-ups by a systematic and automatic process. The solution allows controlling the trade-off between complexity and coding gains, in intra slices, with one single parameter. This algorithm was proposed at the Call for Proposals of the Joint Video Exploration Team (JVET) on video compression with capability beyond HEVC. In All Intra configuration, for a given allowed topology of splits, a speed-up of ×2 is obtained without BD-rate loss, or a speed-up above ×4 with a loss below 1% in BD-rate.

Multimedia

CSIS: compressed sensing-based enhanced-embedding capacity image steganography scheme

Image steganography plays a vital role in securing secret data by embedding it in cover images. Usually, these images are communicated in a compressed format. Existing techniques achieve this but have low embedding capacity. Enhancing this capacity causes a deterioration in the visual quality of the stego-image. Hence, our goal here is to enhance the embedding capacity while preserving the visual quality of the stego-image. We also intend to ensure that our scheme is resistant to steganalysis attacks. This paper proposes a Compressed Sensing Image Steganography (CSIS) scheme to achieve our goal while embedding binary data in images. The novelty of our scheme is the combination of three components in attaining the above-listed goals. First, we use compressed sensing to sparsify the cover image block-wise, obtain its linear measurements, and then uniquely select permissible measurements. Further, before embedding the secret data, we encrypt it using the Data Encryption Standard (DES) algorithm, and finally, we embed two bits of encrypted data into each permissible measurement. Second, we propose a novel data extraction technique, which is lossless and completely recovers our secret data. Third, for the reconstruction of the stego-image, we use the least absolute shrinkage and selection operator (LASSO) for the resultant optimization problem. We perform experiments on several standard grayscale images and a color image, and evaluate embedding capacity, PSNR value, mean SSIM index, NCC coefficients, and entropy. We achieve 1.53 times more embedding capacity as compared to the most recent scheme. We obtain an average PSNR value of 37.92 dB, and average values close to 1 for both the mean SSIM index and the NCC coefficients, which are considered good. Moreover, the entropies of the cover images and their corresponding stego-images are nearly the same.
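For readers unfamiliar with the PSNR figure quoted above, the standard definition is PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between cover and stego pixels. The sketch below is generic and is not taken from the paper: the function name `psnr`, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def psnr(cover: Image, stego: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_cover = [p for row in cover for p in row]
    flat_stego = [p for row in stego for p in row]
    if not flat_cover or len(flat_cover) != len(flat_stego):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_cover, flat_stego)) / len(flat_cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    cover = [[10, 20], [30, 40]]
    stego = [[11, 21], [31, 41]]  # every pixel changed by one grey level
    print(round(psnr(cover, stego), 2))
```

Higher values mean the stego-image is closer to the cover; a change of one grey level in every pixel of an 8-bit image already gives roughly 48 dB.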

Multimedia

Camera Fingerprint Extraction via Spatial Domain Averaged Frames

Photo Response Non-Uniformity (PRNU) based camera attribution is an effective method to determine the source camera of visual media (an image or a video). To apply this method, images or videos need to be obtained from a camera to create a "camera fingerprint", which can then be compared against the PRNU of the query media whose origin is in question. The fingerprint extraction process can be time-consuming when a large number of video frames or images have to be denoised. This may be necessary when the individual images have been subjected to high compression or to geometric processing such as video stabilization. This paper investigates a simple, yet effective and efficient technique to create a camera fingerprint when a large number of still images need to be denoised. The technique utilizes Spatial Domain Averaged (SDA) frames. An SDA-frame is the arithmetic mean of multiple still images. When it is used for fingerprint extraction, the number of denoising operations can be significantly decreased with little or no performance loss. Experimental results show that the proposed method can work more than 50 times faster than conventional methods while providing similar matching results.
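The averaging step can be sketched in a few lines. The abstract states only that an SDA-frame is the arithmetic mean of multiple still images; the function names `sda_frame` and `sda_groups`, the list-of-rows frame representation, and the grouping into consecutive fixed-size batches are assumptions made for illustration.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel values


def sda_frame(frames: List[Frame]) -> Frame:
    """Return the pixel-wise arithmetic mean of equally sized frames."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same dimensions")
    count = len(frames)
    return [
        [sum(frame[r][c] for frame in frames) / count for c in range(width)]
        for r in range(height)
    ]


def sda_groups(frames: List[Frame], group_size: int) -> List[Frame]:
    """Average consecutive groups of frames (hypothetical grouping), so a
    later denoising stage runs once per group instead of once per frame."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [
        sda_frame(frames[i:i + group_size])
        for i in range(0, len(frames), group_size)
    ]


if __name__ == "__main__":
    frames = [[[0, 2], [4, 6]], [[2, 4], [6, 8]]]
    print(sda_frame(frames))
    print(len(sda_groups(frames * 3, group_size=2)))
```

With N frames and groups of size g, the denoiser would be called about N / g times rather than N times, which is where the reduction in denoising operations comes from.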
