IEEE Transactions on Multimedia | 2021

A Hierarchical Visual Feature-Based Approach For Image Sonification


Abstract


This paper presents a new image sonification system that helps visually impaired users access visual information through an easily decodable audio signal, generated in real time as the user explores the image on a touch screen or with a pointer. The sonified signal, generated for each position within the image, captures the most useful and discriminative local information about the image content at several levels of abstraction, combining low-level features computed at the pixel level (color edges and texture), mid-level features, and high-level features derived from segmentation (the gradient or color distribution of each region of the image). The proposed system mainly relies on musical notes at several octaves, timbre, and loudness, but also uses pitch, rhythm, and a distortion effect in an intuitive way to sonify the image content both locally and globally. To this end, we use perceptually meaningful mappings in which the properties of an image are reflected directly, and very predictably, in the audio domain. The listener can then quickly decode the sonified result and draw simple, reliable conclusions about the image.
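To make the idea of a perceptually meaningful mapping concrete, the following is a minimal sketch, not the system described in the paper: it assumes a simple per-pixel mapping in which hue selects a semitone within the octave above C4 and brightness controls loudness. The function pixel_to_tone, the reference frequency C4, and all parameter values are illustrative assumptions.

    import colorsys
    import math
    import numpy as np

    C4 = 261.63  # Hz, reference pitch (illustrative choice)

    def pixel_to_tone(r, g, b, duration=0.2, sample_rate=22050):
        """Return a mono audio buffer for one (r, g, b) pixel with components in [0, 1]."""
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Quantize hue onto the 12 semitones of the octave above C4.
        semitone = int(round(h * 11))
        freq = C4 * 2 ** (semitone / 12)
        # Brightness controls loudness; saturation could instead drive timbre.
        amplitude = 0.1 + 0.9 * v
        t = np.arange(int(duration * sample_rate)) / sample_rate
        return amplitude * np.sin(2 * math.pi * freq * t)

    # Example: a bright red pixel yields a loud tone at C4; a dark blue pixel
    # yields a quiet tone higher in the same octave.
    tone = pixel_to_tone(1.0, 0.0, 0.0)

A full mapping in the spirit of the abstract would additionally modulate timbre, rhythm, and distortion from mid- and high-level features such as region gradients and color distributions; the sketch above only illustrates the direct, predictable pixel-to-audio correspondence.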

Volume: 23
Pages: 706-715
DOI: 10.1109/TMM.2020.2987710
Language: English
Journal: IEEE Transactions on Multimedia

Full Text