Featured Research

Multimedia

A nonlinear transform based analog video transmission framework

Soft-cast, a cross-layer design for wireless video transmission, is proposed to solve the drawbacks of digital video transmission: threshold transmission framework achieving the same effect. Specifically, in the encoder, we carry out power allocation on the transformed coefficients and encode the coefficients based on the new formulation of power distortion. In the decoder, the LLSE estimator is also improved. Together with the inverse nonlinear transform, the DCT coefficients can be recovered from the scaling factors, the LLSE estimator coefficients, and the metadata. Experimental results show that our proposed framework outperforms Soft-cast by 1.08 dB in PSNR, and the MSSIM gain reaches 2.35%, when transmitting under the same bandwidth and total power.
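As a point of reference for the power allocation and LLSE steps mentioned above, here is a minimal sketch of the baseline Soft-cast scheme (linear scaling), not the nonlinear framework this abstract proposes. The chunk variances, power budget, and function names are illustrative assumptions.

```python
import math

def softcast_scaling(variances, total_power):
    """Baseline Soft-cast scaling factors: g_i = var_i**-0.25 * sqrt(P / sum_j sqrt(var_j))."""
    norm = math.sqrt(total_power / sum(math.sqrt(v) for v in variances))
    return [v ** -0.25 * norm for v in variances]

def llse_gain(g, variance, noise_var):
    """LLSE decoder coefficient for y = g*x + n, with Var(x) = variance and Var(n) = noise_var."""
    return g * variance / (g * g * variance + noise_var)

# Hypothetical variances of four DCT coefficient chunks and a hypothetical power budget.
variances = [100.0, 25.0, 4.0, 1.0]
total_power = 4.0

gains = softcast_scaling(variances, total_power)
# The transmitted power sum(g_i^2 * var_i) meets the budget exactly.
used_power = sum(g * g * v for g, v in zip(gains, variances))
# Per-chunk decoder coefficients for an assumed channel noise variance of 0.1.
decoder = [llse_gain(g, v, 0.1) for g, v in zip(gains, variances)]
```

Under this baseline, high-variance chunks receive smaller scaling factors, and the decoder needs only the per-chunk variances (the metadata) and the noise level to invert the scaling.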

Multimedia

A practical convolutional neural network as loop filter for intra frame

Loop filters are used in video coding to remove artifacts or improve performance. Recent advances in deploying convolutional neural networks (CNNs) to replace traditional loop filters show large gains but raise problems for practical application. First, a different model is used for frames encoded with each quantization parameter (QP), which is expensive for hardware. Second, floating-point operations in the CNN lead to inconsistency between encoding and decoding across different platforms. Third, redundancy within the CNN model consumes precious computational resources. This paper proposes a CNN as the loop filter for intra frames, together with a scheme to solve the above problems. It aims to design a single CNN model with low redundancy that adapts to decoded frames of different qualities and ensures consistency. To adapt to reconstructions of different qualities, both the reconstruction and the QP are taken as inputs. After training, the obtained model is compressed to reduce redundancy. To ensure consistency, dynamic fixed point (DFP) arithmetic is adopted when testing the CNN. Parameters in the compressed model are first quantized to DFP and then used for CNN inference. The outputs of each layer in the CNN are computed by DFP operations. Experimental results on JEM 7.0 report 3.14%, 5.21%, and 6.28% BD-rate savings for the luma and two chroma components, respectively, under the all-intra configuration when replacing all traditional filters.
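To illustrate why dynamic fixed point gives platform-independent results, the following is a generic sketch of DFP quantization, not the paper's exact scheme. It assumes an 8-bit signed representation with one shared fractional length per tensor; all function names are hypothetical.

```python
import math

def choose_frac_bits(values, bit_width=8):
    """Pick a shared fractional length so the largest magnitude fits in bit_width bits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits needed for the integer part
    return bit_width - 1 - int_bits                # one bit is reserved for the sign

def quantize(values, bit_width=8):
    """Return (integer codes, fractional length) for a group of values."""
    frac = choose_frac_bits(values, bit_width)
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    codes = [min(hi, max(lo, round(v * 2.0 ** frac))) for v in values]
    return codes, frac

def dfp_dot(codes_a, frac_a, codes_b, frac_b):
    """Integer-only dot product; the result carries frac_a + frac_b fractional bits."""
    acc = sum(a * b for a, b in zip(codes_a, codes_b))
    return acc, frac_a + frac_b

weights, w_frac = quantize([0.5, -0.25, 0.125])
inputs, x_frac = quantize([1.0, 2.0, -1.5])
acc, acc_frac = dfp_dot(weights, w_frac, inputs, x_frac)
result = acc / 2 ** acc_frac  # converted back to a real value only for inspection
```

Because the accumulation uses integers only, every platform computes the same `acc`, which is the consistency property the abstract is after.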

Multimedia

A steganographic approach based on the chaotic fractional map and in the DCT domain

A steganographic method based on the chaotic fractional map and operating in the DCT domain is proposed. This method embeds a secret message in some high-frequency coefficients of the image using a 128-bit private key and a chaotic fractional map, which generates a permutation indicating the positions where the secret bits will be embedded. An experimental validation of the proposed method is also presented, showing its performance in imperceptibility, quality, similarity, and security analysis of the steganographic system. The proposed algorithm improves the imperceptibility and Cachin's security of the stego-system, as measured by the Peak Signal-to-Noise Ratio (PSNR) and the Relative Entropy (RE).
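The idea of a key-driven chaotic permutation selecting the embedding positions can be sketched as follows. This is a simplified stand-in: it uses the ordinary logistic map instead of the chaotic fractional map, and embeds in the least significant bit of integer coefficients that stand for quantized high-frequency DCT values. The function names, the map parameter `r`, and the key-to-seed mapping are all assumptions.

```python
def logistic_permutation(key: bytes, n: int, r: float = 3.99, burn_in: int = 100):
    """Derive a permutation of range(n) from a key by ranking logistic-map samples."""
    # Map the key to an initial condition strictly inside (0, 1).
    x = (int.from_bytes(key, "big") % (2 ** 52 - 1) + 1) / 2 ** 52
    for _ in range(burn_in):          # discard the transient
        x = r * x * (1.0 - x)
    samples = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        samples.append(x)
    return sorted(range(n), key=samples.__getitem__)

def embed(coeffs, bits, key):
    """Write each secret bit into the LSB of the coefficient at the next permuted position."""
    perm = logistic_permutation(key, len(coeffs))
    out = list(coeffs)
    for bit, pos in zip(bits, perm):
        out[pos] = (out[pos] & ~1) | bit
    return out

def extract(coeffs, n_bits, key):
    """Read the LSBs back in the same key-dependent order."""
    perm = logistic_permutation(key, len(coeffs))
    return [coeffs[pos] & 1 for pos in perm[:n_bits]]

key = bytes(range(16))                 # a 128-bit key (illustrative value)
cover = list(range(-8, 8))             # placeholder quantized coefficients
secret = [1, 0, 1, 1, 0]
stego = embed(cover, secret, key)
recovered = extract(stego, len(secret), key)
```

Without the key, a receiver does not know which coefficients carry the message or in what order, which is the role the permutation plays in the described method.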

Multimedia

A study for Image compression using Re-Pair algorithm

Compression is an important topic in computer science because it allows us to store more data on our storage devices. There are several techniques for compressing a file. This manuscript describes the most important algorithm for compressing images, JPEG, and compares it with another method to find good reasons not to use that method on images. For text, the best-known encoding technique is Huffman coding, which is explained in detail. The manuscript shows how to apply a text compression method to images, focusing on the method itself and the reasons for choosing one image format over another. The method studied and analyzed is the Re-Pair algorithm, a purely grammar-based compression method. Finally, the good results of this application are presented.
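For readers unfamiliar with Re-Pair, here is a small, unoptimized sketch of the core idea: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal until no pair occurs twice. It is a naive quadratic version for illustration, not the linear-time algorithm and not the manuscript's implementation; the function names are assumptions.

```python
from collections import Counter

def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair."""
    counts, last_pos = Counter(), {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_pos.get(pair) == i - 1:
            continue  # overlaps the previous occurrence, e.g. the middle of "aaa"
        counts[pair] += 1
        last_pos[pair] = i
    return counts

def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def repair(data):
    """Return (compressed sequence, grammar rules) for any sequence of hashable symbols."""
    seq, rules = list(data), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        symbol = ("R", len(rules))  # tuple nonterminals cannot collide with input symbols
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules

def expand(seq, rules):
    """Invert repair() by expanding nonterminals back into terminals."""
    out, stack = [], list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.extend((right, left))
        else:
            out.append(s)
    return out

text = "abracadabra abracadabra"
compressed, grammar = repair(text)
pixels = [0, 0, 255, 255, 0, 0, 255, 255]   # a toy row of grayscale pixel values
pixel_seq, pixel_rules = repair(pixels)
```

The same routine accepts a row of pixel values, which is how a text-oriented grammar compressor can be pointed at image data.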

Multimedia

A survey of comics research in computer science

Graphic novels such as comics and manga are well known all over the world. The digital transition has started to change the way people read comics, increasingly on smartphones and tablets and less and less on paper. In recent years, a wide variety of research about comics has been proposed that might change the way comics are created, distributed, and read in the future. Early work focuses on low-level document image analysis: comic books are complex, containing text, drawings, balloons, panels, onomatopoeia, etc. Different fields of computer science, such as multimedia, artificial intelligence, and human-computer interaction, have covered research about user interaction and content generation, each with a different set of values. In this paper, we review the previous research about comics in computer science, state what has been done, and give some insights into the main outlooks.

Multimedia

A user model for JND-based video quality assessment: theory and applications

Video quality assessment (VQA) technology has attracted a lot of attention in recent years due to an increasing demand for video streaming services. Existing VQA methods are designed to predict video quality in terms of the mean opinion score (MOS) calibrated by humans in subjective experiments. However, they cannot predict the satisfied user ratio (SUR) of an aggregated viewer group. Furthermore, they provide little guidance on video coding parameter selection, e.g. the quantization parameter (QP) of a set of consecutive frames, in practical video streaming services. To overcome these shortcomings, the just-noticeable-difference (JND) based VQA methodology has been proposed as an alternative. It is observed experimentally that the JND location is a normally distributed random variable. In this work, we explain this distribution by proposing a user model that takes both subject variability and content variability into account. This model is built upon the user's capability to discern the quality difference between video clips encoded with different QPs. Moreover, it analyzes video content characteristics to account for inter-content variability. The proposed user model is validated on the data collected in the VideoSet. It is demonstrated that the model is flexible enough to predict the SUR distribution of a specific user group.
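The link between a normally distributed JND location and the SUR can be made concrete with a short sketch. It assumes the JND location is expressed in QP units and that a viewer is satisfied at a given QP when that QP does not exceed their JND location; the mean and standard deviation below are invented for illustration and are not taken from the VideoSet.

```python
from statistics import NormalDist

def sur(qp, jnd_mean, jnd_std):
    """Fraction of viewers whose JND location lies above the given QP."""
    return 1.0 - NormalDist(jnd_mean, jnd_std).cdf(qp)

def qp_for_target_sur(target, jnd_mean, jnd_std):
    """Largest QP (coarsest quantization) that still keeps the target share of viewers satisfied."""
    return NormalDist(jnd_mean, jnd_std).inv_cdf(1.0 - target)

# Hypothetical JND distribution for one clip and one viewer group.
JND_MEAN, JND_STD = 30.0, 4.0
qp_75 = qp_for_target_sur(0.75, JND_MEAN, JND_STD)
sur_at_mean = sur(JND_MEAN, JND_MEAN, JND_STD)
```

Read this way, the SUR curve gives direct guidance for QP selection: an encoder can pick the largest QP that still meets a chosen satisfaction level.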

Multimedia

AMP: Authentication of Media via Provenance

Advances in graphics and machine learning have led to the general availability of easy-to-use tools for modifying and synthesizing media. The proliferation of these tools threatens to cast doubt on the veracity of all media. One approach to thwarting the flow of fake media is to detect modified or synthesized media through machine learning methods. While detection may help in the short term, we believe that it is destined to fail as the quality of fake media generation continues to improve. Soon, neither humans nor algorithms will be able to reliably distinguish fake versus real content. Thus, pipelines for assuring the source and integrity of media will be required---and increasingly relied upon. We propose AMP, a system that ensures the authentication of media via certifying provenance. AMP creates one or more publisher-signed manifests for a media instance uploaded by a content provider. These manifests are stored in a database allowing fast lookup from applications such as browsers. For reference, the manifests are also registered and signed by a permissioned ledger, implemented using the Confidential Consortium Framework (CCF). CCF employs both software and hardware techniques to ensure the integrity and transparency of all registered manifests. AMP, through its use of CCF, enables a consortium of media providers to govern the service while making all its operations auditable. The authenticity of the media can be communicated to the user via visual elements in the browser, indicating that an AMP manifest has been successfully located and verified.
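The manifest workflow described above (sign, store, look up, verify) can be sketched in a few lines. This is only an illustration: it uses an HMAC from the Python standard library as a stand-in for a publisher's public-key signature, a plain dictionary as a stand-in for the manifest database, and it omits the ledger entirely. The field names and functions are assumptions, not AMP's actual format or API.

```python
import hashlib
import hmac
import json

def make_manifest(media: bytes, publisher: str, key: bytes) -> dict:
    """Build a manifest binding a publisher name to the media's hash, then sign it."""
    body = {"publisher": publisher, "media_sha256": hashlib.sha256(media).hexdigest()}
    payload = json.dumps(body, sort_keys=True).encode()
    # HMAC is a stand-in here; a real deployment would use a public-key signature.
    body["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body

def verify(media: bytes, manifest: dict, key: bytes) -> bool:
    """Check both the signature over the manifest and the hash of the media itself."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    media_ok = body["media_sha256"] == hashlib.sha256(media).hexdigest()
    return signature_ok and media_ok

publisher_key = b"illustrative-publisher-key"
video = b"original media bytes"
manifest = make_manifest(video, "Example News", publisher_key)

# Stand-in for the manifest database: lookup by media hash.
database = {manifest["media_sha256"]: manifest}
found = database.get(hashlib.sha256(video).hexdigest())
authentic = found is not None and verify(video, found, publisher_key)
tampered = verify(b"modified media bytes", manifest, publisher_key)
```

A browser-side check would follow the same shape: hash the media, look up a manifest, verify it, and surface the result to the user.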

Multimedia

ART-UP: A Novel Method for Generating Scanning-robust Aesthetic QR codes

QR codes are usually scanned in different environments, so they must be robust to variations in illumination, scale, coverage, and camera angles. Aesthetic QR codes improve the visual quality, but subtle changes in their appearance may cause scanning failure. In this paper, a new method to generate scanning-robust aesthetic QR codes is proposed, which is based on a module-based scanning probability estimation model that can effectively balance the tradeoff between visual quality and scanning robustness. Our method locally adjusts the luminance of each module by estimating the probability of successful sampling. The approach adopts a hierarchical, coarse-to-fine strategy to enhance the visual quality of aesthetic QR codes, sequentially generating three codes: a binary aesthetic QR code, a grayscale aesthetic QR code, and the final color aesthetic QR code. Our approach can also be used to create QR codes with different visual styles by adjusting some initialization parameters. User surveys and decoding experiments comparing our method with state-of-the-art algorithms indicate that the proposed approach has excellent performance in terms of both visual quality and scanning robustness.

Multimedia

ASMD: an automatic framework for compiling multimodal datasets with audio and scores

This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically, we provide a Python API to access the datasets through Boolean set operations based on particular attributes, such as intersections and unions of composers, instruments, and so on. The framework is designed to ease the inclusion of new datasets and the respective ground-truth annotations so that one can build, convert, and extend one's own collection as well as distribute it by means of a compliant format to take advantage of the API. All code and ground-truth are released under suitable open licenses.
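The kind of selection by Boolean set operations described above can be illustrated with plain Python sets. This sketch does not use the ASMD API; the catalog entries, field names, and the `ids_where` helper are all hypothetical.

```python
# A hypothetical catalog of recordings with ground-truth attributes.
catalog = [
    {"id": "rec1", "composer": "Bach", "instruments": {"piano"}},
    {"id": "rec2", "composer": "Bach", "instruments": {"violin", "piano"}},
    {"id": "rec3", "composer": "Mozart", "instruments": {"piano"}},
    {"id": "rec4", "composer": "Mozart", "instruments": {"flute"}},
]

def ids_where(items, predicate):
    """Return the set of ids of the items that satisfy a predicate."""
    return {item["id"] for item in items if predicate(item)}

bach = ids_where(catalog, lambda s: s["composer"] == "Bach")
mozart = ids_where(catalog, lambda s: s["composer"] == "Mozart")
piano = ids_where(catalog, lambda s: "piano" in s["instruments"])
solo = ids_where(catalog, lambda s: len(s["instruments"]) == 1)

bach_piano = bach & piano              # intersection: Bach pieces that include piano
solo_piano = piano & solo              # piano-only recordings
either_composer = bach | mozart        # union of composers
mozart_without_piano = mozart - piano  # set difference
```

Expressing selections as set algebra makes a train/test split reproducible from a one-line description of the attributes involved.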

Multimedia

Accessibility in 360-degree video players

Any media experience must be fully inclusive and accessible to all users regardless of their ability. With the current trend towards immersive experiences, such as Virtual Reality (VR) and 360-degree video, it becomes key that these environments are adapted to be fully accessible. However, until recently the focus has been mostly on adapting existing techniques to fit immersive displays, rather than considering new approaches to accessibility designed specifically for these increasingly relevant media experiences. This paper surveys a wide range of 360-degree video players and examines the features they include for dealing with accessibility, such as subtitles, audio description, sign language, user interfaces, and other interaction features, like voice control and support for multi-screen scenarios. These features have been chosen based on guidelines from standardization bodies, such as the World Wide Web Consortium (W3C) and the International Telecommunication Union (ITU), and from research contributions on making 360-degree video consumption experiences accessible. The in-depth analysis has been part of a research effort towards the development of a fully inclusive and accessible 360-degree video player. The paper concludes by discussing how the newly developed player goes beyond the existing solutions and guidelines by providing accessibility features that meet the expectations for a widely used immersive medium like 360-degree video.
