Computer Science Multimedia - Researchain

Featured Researches

DEMC: A Deep Dual-Encoder Network for Denoising Monte Carlo Rendering

In this paper, we present DEMC, a deep Dual-Encoder network that removes Monte Carlo noise efficiently while preserving details. Denoising Monte Carlo rendering differs from natural image denoising because inexpensive by-products (feature buffers) can be extracted in the rendering stage. Most of them are noise-free and can provide sufficient details for image reconstruction. However, these feature buffers also contain redundant information, which further distinguishes Monte Carlo denoising from natural image denoising. Hence, the main challenge is how to extract the useful information and reconstruct clean images. To address this problem, we propose a novel network structure, a Dual-Encoder network with a feature fusion sub-network, which first fuses the feature buffers, then encodes the fused feature buffers and the noisy image simultaneously, and finally reconstructs a clean image with a decoder network. Compared with state-of-the-art methods, our model is more robust across a wide range of scenes and generates satisfactory results significantly faster.

Multimedia

DIME: An Online Tool for the Visual Comparison of Cross-Modal Retrieval Models

Cross-modal retrieval relies on accurate models to retrieve relevant results for queries across modalities such as image, text, and video. In this paper, we build upon previous work by tackling the difficulty of quickly evaluating models both quantitatively and qualitatively. We present DIME (Dataset, Index, Model, Embedding), a modality-agnostic tool that handles multimodal datasets, trained models, and data preprocessors to support straightforward model comparison through a web-browser graphical user interface. DIME inherently supports building modality-agnostic queryable indexes and extracting relevant feature embeddings, and thus effectively doubles as an efficient cross-modal tool for exploring and searching through datasets.
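To make the idea of a modality-agnostic queryable index concrete, here is a minimal sketch. It is not DIME's API: the class name `EmbeddingIndex`, its methods, and the brute-force cosine ranking are all hypothetical, and it assumes that every item (image, text, or video) has already been mapped by some model into one shared embedding space.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

class EmbeddingIndex:
    """Hypothetical brute-force index: stores (id, vector) pairs regardless of modality."""

    def __init__(self):
        self._items = []

    def add(self, item_id, vector):
        self._items.append((item_id, list(vector)))

    def query(self, vector, k=3):
        """Return the k stored items most similar to `vector`, best first."""
        scored = [(item_id, cosine(vector, v)) for item_id, v in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

# Toy embeddings (made up for illustration): two images and one caption.
index = EmbeddingIndex()
index.add("img_cat", [1.0, 0.0])
index.add("img_dog", [0.0, 1.0])
index.add("txt_cat", [0.9, 0.1])
top_ids = [item_id for item_id, _ in index.query([1.0, 0.0], k=2)]
print(top_ids)
```

Because the index only sees vectors, the same query path serves text-to-image and image-to-text retrieval; a real tool would likely replace the linear scan with an approximate nearest-neighbour structure.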

Multimedia

DIPPAS: A Deep Image Prior PRNU Anonymization Scheme

Source device identification is an important topic in image forensics since it allows the origin of an image to be traced back. Its forensic counterpart is source device anonymization, that is, masking any trace in the image that can be useful for identifying the source device. A typical trace exploited for source device identification is the Photo Response Non-Uniformity (PRNU), a noise pattern left by the device on the acquired images. In this paper, we devise a methodology for suppressing such a trace from natural images without significant impact on image quality. Specifically, we turn PRNU anonymization into an optimization problem in a Deep Image Prior (DIP) framework. In a nutshell, a Convolutional Neural Network (CNN) acts as a generator and returns an image that is anonymized with respect to the source PRNU while still maintaining high visual quality. In contrast to widely adopted deep learning paradigms, our proposed CNN is not trained on a set of input-target pairs of images. Instead, it is optimized to reconstruct the PRNU-free image from the original image under analysis itself. This makes the approach particularly suitable in scenarios where large heterogeneous databases are analyzed and avoids problems caused by a lack of generalization. Through numerical examples on publicly available datasets, we show that our methodology is effective compared to state-of-the-art techniques.

Multimedia

DRST: Deep Residual Shearlet Transform for Densely Sampled Light Field Reconstruction

The Image-Based Rendering (IBR) approach using the Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction. ST-based DSLF reconstruction typically relies on an iterative thresholding algorithm for Epipolar-Plane Image (EPI) sparse regularization in the shearlet domain, involving dozens of transformations between the image domain and the shearlet domain, which are in general time-consuming. To overcome this limitation, a novel learning-based ST approach, referred to as Deep Residual Shearlet Transform (DRST), is proposed in this paper. Specifically, for an input sparsely-sampled EPI, DRST employs a deep fully Convolutional Neural Network (CNN) to predict the residuals of the shearlet coefficients in the shearlet domain in order to reconstruct a densely-sampled EPI in the image domain. The DRST network is trained only on synthetic Sparsely-Sampled Light Field (SSLF) data by leveraging elaborately designed masks. Experimental results on three challenging real-world light field evaluation datasets with varying moderate disparity ranges (8 - 16 pixels) demonstrate the superiority of the proposed learning-based DRST approach over the non-learning-based ST method for DSLF reconstruction. Moreover, DRST provides at least a 2.4x speedup over ST.

Multimedia

DWT-GBT-SVD-based Robust Speech Steganography

Steganography is a method that can improve network security and make communications safer. In this method, a secret message is hidden in content such as an audio signal, and it should not be perceptible by listening to the audio or by inspecting the waveform. It should also be robust against common attacks such as noise and compression. In this paper, we propose a new speech steganography method based on a combination of Discrete Wavelet Transform, Graph-based Transform, and Singular Value Decomposition (SVD). In this method, we first find voiced frames based on the energy and zero-crossing counts of the frames and then embed a binary message into the voiced frames. Experimental results on the NOIZEUS database show that the proposed method is imperceptible and robust against Gaussian noise, re-sampling, re-quantization, high-pass filtering, and low-pass filtering. It is also robust against MP3 compression and scaling for watermarking applications.
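The voiced-frame selection step mentioned above can be sketched as follows. This is only an illustration of a generic energy and zero-crossing test: the non-overlapping framing, the threshold parameters `energy_min` and `zc_max`, and all function names are assumptions, not details from the paper, and the DWT-GBT-SVD embedding itself is not shown.

```python
import math

def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames, dropping a trailing partial frame."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def voiced_frame_indices(samples, frame_len, energy_min, zc_max):
    """Indices of frames with high energy and few zero crossings (assumed voiced)."""
    return [i for i, frame in enumerate(frame_signal(samples, frame_len))
            if short_time_energy(frame) >= energy_min
            and zero_crossings(frame) <= zc_max]

# Synthetic test signal: a strong low-frequency tone, a weak rapidly
# alternating segment, and silence (80 samples each).
tone = [0.8 * math.sin(2 * math.pi * 2 * n / 80) for n in range(80)]
hiss = [0.05 * (-1) ** n for n in range(80)]
silence = [0.0] * 80
print(voiced_frame_indices(tone + hiss + silence, 80, energy_min=0.01, zc_max=20))
# -> [0]
```

Voiced speech tends to combine high energy with a low zero-crossing count, which is why both conditions are required here; in practice the thresholds would be tuned on the target recordings.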

Multimedia

Data Driven Analysis of Tiny Touchscreen Performance with MicroJam

The widespread adoption of mobile devices, such as smartphones and tablets, has made touchscreens a common interface for musical performance. New mobile musical instruments have been designed that embrace collaborative creation and that explore the affordances of mobile devices, as well as their constraints. While these have been investigated from design and user experience perspectives, there is little examination of the performers' musical outputs. In this work, we introduce a constrained touchscreen performance app, MicroJam, designed to enable collaboration between performers, and engage in a novel data-driven analysis of more than 1600 performances using the app. MicroJam constrains performances to five seconds, and emphasises frequent and casual music making through a social media-inspired interface. Performers collaborate by replying to performances, adding new musical layers that are played back at the same time. Our analysis shows that users tend to focus on the centre and diagonals of the touchscreen area, and tend to swirl or swipe rather than tap. We also observe that while long swipes dominate the visual appearance of performances, the majority of interactions are short with limited expressive possibilities. Our findings are summarised into a set of design recommendations for MicroJam and other touchscreen apps for social musical interaction.

Multimedia

Data hiding in speech signal using steganography and encryption

Data privacy and data security are always among the highest priorities. We need a reliable method to encrypt data so that it reaches its destination safely. Encryption is a simple yet effective way to protect data while transmitting it. The proposed method combines state-of-the-art steganography and encryption. This paper puts forward a different approach to data hiding in speech signals: a ten-digit number is hidden within a speech signal using audio steganography, and the signal is then encrypted with a unique key for better security. At the receiver end, the same unique key is used to decrypt the received signal, and the hidden number is then extracted. The performance of the proposed approach can be evaluated by PSNR, MSE, SSIM, and bit-error rate. The simulation results show better performance than the existing approach.
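Three of the evaluation metrics named above have short standard definitions, sketched here for reference. The function names are illustrative, the `peak` default assumes samples normalized to the range [-1, 1], and SSIM is omitted because it needs windowed statistics.

```python
import math

def mse(reference, test):
    """Mean squared error between two equal-length sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(reference, test)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / err)

def bit_error_rate(sent_bits, received_bits):
    """Fraction of bit positions that differ between sent and received messages."""
    if len(sent_bits) != len(received_bits) or not sent_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent_bits, received_bits) if s != r)
    return errors / len(sent_bits)

# Cover signal versus a slightly perturbed stego signal (made-up values).
cover = [0.0, 0.0, 0.0, 0.0]
stego = [0.1, 0.1, 0.1, 0.1]
print(round(mse(cover, stego), 6), round(psnr(cover, stego), 3))
print(bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

A higher PSNR (lower MSE) between the cover and stego signals indicates a less perceptible embedding, while a lower bit-error rate indicates more reliable extraction of the hidden number.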

Multimedia

Deep Convolutional Neural Network for Identifying Seam-Carving Forgery

Seam carving is a representative content-aware image retargeting approach to adjust the size of an image while preserving its visually prominent content. To maintain visually important content, seam-carving algorithms first calculate the connected path of pixels, referred to as the seam, according to a defined cost function and then adjust the size of an image by removing and duplicating repeatedly calculated seams. Seam carving is actively exploited to overcome diversity in the resolution of images between applications and devices; hence, detecting the distortion caused by seam carving has become important in image forensics. In this paper, we propose a convolutional neural network (CNN)-based approach to classifying seam-carving-based image retargeting for reduction and expansion. To attain the ability to learn low-level features, we designed a CNN architecture comprising five types of network blocks specialized for capturing subtle signals. An ensemble module is further adopted to both enhance performance and comprehensively analyze the features in the local areas of the given image. To validate the effectiveness of our work, extensive experiments based on various CNN-based baselines were conducted. Compared to the baselines, our work exhibits state-of-the-art performance in terms of three-class classification (original, seam inserted, and seam removed). In addition, our model with the ensemble module is robust for various unseen cases. The experimental results also demonstrate that our method can be applied to localize both seam-removed and seam-inserted areas.
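The seam-carving operation that the detector targets can be sketched with the usual dynamic-programming formulation. This is a minimal illustration, not the paper's code: the cost map is taken as given (the cost function itself is left unspecified here), images are plain lists of rows, and the function names are hypothetical.

```python
def find_vertical_seam(cost):
    """Return one column index per row forming a minimum-cost connected vertical seam."""
    rows, cols = len(cost), len(cost[0])
    total = [list(cost[0])]   # cumulative cost of the best seam ending at each pixel
    back = []                 # for rows 1..rows-1: column chosen in the row above
    for r in range(1, rows):
        prev = total[-1]
        row_total, row_back = [], []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            best = min(range(lo, hi + 1), key=lambda k: prev[k])
            row_total.append(cost[r][c] + prev[best])
            row_back.append(best)
        total.append(row_total)
        back.append(row_back)
    # Start from the cheapest pixel in the last row and walk the back-pointers up.
    c = min(range(cols), key=lambda k: total[-1][k])
    seam = [c]
    for row_back in reversed(back):
        c = row_back[c]
        seam.append(c)
    seam.reverse()
    return seam

def remove_vertical_seam(image, seam):
    """Shrink the image by one column by deleting the seam pixel in each row."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]

def duplicate_vertical_seam(image, seam):
    """Grow the image by one column by duplicating the seam pixel in each row."""
    return [row[:c + 1] + row[c:] for row, c in zip(image, seam)]

cost = [[5, 1, 5],
        [5, 5, 1],
        [5, 1, 5]]
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
seam = find_vertical_seam(cost)
print(seam)                               # -> [1, 2, 1]
print(remove_vertical_seam(image, seam))  # -> [[1, 3], [4, 5], [7, 9]]
```

Repeating the find-and-remove (or find-and-duplicate) step changes the width one column at a time, which is what leaves the subtle local traces a forensic classifier tries to pick up.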

Multimedia

Deep Learning-Based Video Coding: A Review and A Case Study

The past decade has witnessed great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews representative works on using deep learning for image/video coding, which has been an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks (deep schemes), and deep network-based coding tools (deep tools) that are to be used within traditional coding schemes or together with traditional coding tools. For deep schemes, pixel probability modeling and auto-encoders are the two approaches, which can be viewed as a predictive coding scheme and a transform coding scheme, respectively. For deep tools, several techniques have been proposed that use deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. In the hope of advocating research on deep learning-based video coding, we present a case study of our prototype video codec, Deep Learning Video Coding (DLVC). DLVC features two deep tools that are both based on convolutional neural networks (CNNs), namely a CNN-based in-loop filter (CNN-ILF) and CNN-based block adaptive resolution coding (CNN-BARC). Both tools improve the compression efficiency by a significant margin. With the two deep tools as well as other non-deep coding tools, DLVC achieves on average 39.6% and 33.0% bit savings relative to HEVC under random-access and low-delay configurations, respectively. The source code of DLVC has been released for future research.

Multimedia

Deep Learning-based Concept Detection in vitrivr at the Video Browser Showdown 2019 - Final Notes

This paper presents an after-the-fact summary of the participation of the vitrivr system in the 2019 Video Browser Showdown. As in last year's report, the focus of this paper lies on the additions made since the original publication and on the system's performance during the competition.
