From pixels to notes: a computational implementation of synaesthesia for cultural artefacts
Dimitrios Kritikos, Kostas Karpouzis
[email protected]
Artificial Intelligence and Learning Systems Lab, National Technical University of Athens, Athens, Greece
ABSTRACT
Synaesthesia is a condition that enables people to sense information in the form of several senses at once. This work describes a Python implementation of a simulation of synaesthesia between listening to music and viewing a painting. Based on Scriabin's definition, we developed a deterministic process to produce a melody after processing a painting, mimicking the production of notes from colours in the field of view of persons experiencing synaesthesia.
CCS CONCEPTS
• Human-centered computing; • Applied computing → Digital libraries and archives; Media arts; • Social and professional topics → Cultural characteristics.

KEYWORDS
cultural experiences, synaesthesia, painting, music, multimedia experience
ACM Reference Format:
Dimitrios Kritikos, Kostas Karpouzis. 2020. From pixels to notes: a computational implementation of synaesthesia for cultural artefacts. In AVI2CH '20: Workshop on Advanced Visual Interfaces and Interactions in Cultural Heritage, Ischia, Italy, Sept 28 - October 2, 2020. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/1122445.1122456
INTRODUCTION
Synaesthesia [7] is a neurological condition that enables the brain to process and sense information in the form of several senses at once, despite experiencing only one or some of them. For example, a person with synaesthesia may hear sounds while also visualizing them as colours. Synaesthesia may be encountered in many forms, the most usual being chromaesthesia [8], where a person interprets a music or sound signal as colours. Other forms include:
• Lexical-gustatory [9], where hearing words is accompanied by the sense of certain tastes
• Mirror-touch [10], where people may sense being touched merely by watching other people touching parts of their body
• Grapheme-color [11], where people associate letters and numbers with colours, with each number corresponding to a different colour for different persons
• Number-form, where people associate numbers with specific shapes or arrangements in 3D space, and
• Personification [12], where people perceive letters, numbers, days or anything that can be arranged as a sequence as entities with their own personalities

In general, synaesthesia may also be correlated with emotions, which makes sense since emotions ([16], [17]) are triggered in specific areas of the brain [21], which may also participate in processes related to synaesthesia. According to Simner [13], synaesthetes comprise about 4% of the general population; that percentage seems to be higher among artists of any kind. Synaesthesia is also probably hereditary [14], with more than 40% of synaesthetes having a first-degree relative experiencing the same condition. Emotion-aware mapping ([24]) or matching of music to images usually takes place at a higher level than pixels, i.e. by incorporating shapes, places or events ([25], [26]) included in an image, or music genres, in the process.

This work aims to simulate the form of synaesthesia which occurs when individuals are listening to music while viewing a painting. This combination is interesting, since both media are essentially based on receiving and experiencing waves, in terms of perception ([27]): sound waves collected by the human ear, and light waves falling on the cones and rods of the human eye and ultimately being translated to colours [20].
There have been a number of computational approaches regarding the transformation of an image to an audio file, most notably PixelSynth [22] and Spectrogram audio player (SAP) [23]. PixelSynth was created by artist and coder Olivia Jack and works with monochrome images by mapping luminosity to note; its web-based implementation offers a limited selection of interactive tools (e.g. image rotation) for users to change the result of the conversion. SAP utilises the concept of a spectrogram, i.e. a visual representation of the spectrum of frequencies of a sound or other signal as it varies with time, and allows users to change the length of the output audio file or the sampling density of the image.

Both of the above-mentioned implementations offer an insight into the world of synaesthesia, but the mapping they utilise is either arbitrary or based on the spectral (and not visual) representation of the image, leading to different properties of the colours being used. Based on theoretical approaches to synaesthesia, as well as mappings described by a prominent synaesthete, we developed a deterministic process to produce a melody when processing a painting, mimicking the production of notes from colours in the field of view of a person experiencing synaesthesia. Our purpose was to investigate whether different painting styles and object arrangements in paintings and drawings would produce similarly arranged music melodies. Our implementation was coded in Python, and typically takes around 5 seconds to process a 1920x1080 image on an i7 laptop computer.
Designing and implementing a system which mimics synaesthesia between images and music starts with identifying a correlation between each color and a note. Since we cannot fully comprehend the process which takes place in a synaesthete's brain, our best bet is to formalize their accounts of how they perceive the connection between colors and notes. A well-known mapping of image colours to music notes is based on the work of Alexander Scriabin, a Russian composer who, based on his own perception of synaesthesia, authored an index of correspondences between music notes and colours. In this work, we build upon this mapping to transform a painting into a sequence of music notes, incorporating higher-level concepts, such as colour and music harmony, in the process. Scriabin's elementary mapping (see Figure 1) is based on elementary, saturated color tones and needs to be extended in order to capture the variations found in paintings or photographs.
Figure 1: Scriabin’s mapping of notes to colors
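For illustration, Scriabin's index can be sketched as a Python dictionary. The exact shades he associated with each note are subjective and debated, so the colour labels below are approximate descriptions, not the paper's exact values:

```python
# Approximate rendering of Scriabin's note-to-colour index (Figure 1).
# The colour names are illustrative labels for his reported associations.
SCRIABIN_COLOURS = {
    "C": "red",
    "G": "orange",
    "D": "yellow",
    "A": "green",
    "E": "pale blue",
    "B": "blue",
    "F#": "bright blue",
    "Db": "violet",
    "Ab": "purple-violet",
    "Eb": "steel grey",
    "Bb": "steel grey",
    "F": "deep red",
}

def note_for_colour(colour):
    """Invert the mapping: return the first note associated with a colour label."""
    for note, c in SCRIABIN_COLOURS.items():
        if c == colour:
            return note
    return None
```

Since the mapping covers all 12 notes of the chromatic scale, every pixel colour can in principle be binned to some note once the elementary tones are extended as described below.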
A more flexible color representation utilizes 3 dimensions in the form of a color cone, where the two dimensions of the base describe the color hue (the actual color information, usually depicted with a label, e.g. 'green' or 'red') and the saturation (how clear a color is or, conversely, how much white or black has been used to 'wash it out'), and the third dimension of the cone describes the amount of light (how bright the color is). This representation, which is quite close to how humans perceive and verbally describe color, can then be mapped to the usual RGB color encoding found in digital representations of paintings (files or computer screens). In addition, to further enhance the variety of the notes produced when processing a digital image, we also take into account color harmonies [15]. There are 6 main classes of color harmonies:
• Complementary colors, i.e. colors at opposite sides of the color circle (or the base of the color cone)
• Analogous colors, i.e. neighboring colors which differ by 30 degrees on each side (left and right on the color circle)
• Split-complementary colors, a triad consisting of two analogous colors and the complementary of one of them
• Triad colors, which form an equilateral triangle (differ by 120 degrees)
• Tetradic colors, formed by two pairs of complementary colors, and
• Square, where colors differ by 90 degrees, forming a square on the color circle or the base of the color cone

In our implementation, color harmonies are recognized by processing the image and then mapped to music chords to provide a richer tune. We also take into account the mean luminosity of each image segment: if the segment is darker than the mean luminosity of the image, we utilise a minor chord, while for a brighter segment, the chord used is major. This is in line with our general perception of melancholic music being correlated with darker colors.
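The hue-based tests above can be sketched with the standard library's `colorsys` module. The angular tolerance and the use of HSV value as a luminosity proxy are assumptions of this sketch, not parameters stated by the paper:

```python
import colorsys

def hue_degrees(r, g, b):
    """Hue angle (0-360) of an 8-bit RGB colour, via the HSV cone model."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0

def hue_distance(a, b):
    """Smallest angular distance between two hues on the colour circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def is_complementary(a, b, tol=15.0):
    """Two hues are complementary if they sit roughly opposite each other;
    the tolerance of 15 degrees is an illustrative assumption."""
    return abs(hue_distance(a, b) - 180.0) <= tol

def chord_quality(segment_luminosity, image_luminosity):
    """Minor chord for segments darker than the image mean, major otherwise."""
    return "minor" if segment_luminosity < image_luminosity else "major"
```

The other harmony classes (analogous, triad, square, and so on) reduce to the same `hue_distance` test against their characteristic angles (30, 120 and 90 degrees respectively).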
Beginning by segmenting the image from left to right, the tempo of the sequence is calculated from the saturation of colours in each segment: a slow tempo (around 75 beats per minute – bpm) is the result of low saturation colours, while higher saturation may produce tempos up to 160 bpm.

Following from that, our algorithm chooses which notes to play for the particular image segment, taking into account the percentage of pixels in the segment which correspond to each of the primary colours, as per Scriabin's definition: if one of those colours is found in more than 5% of the segment pixels, then the corresponding note is included in the music sequence. In the same framework, the volume for each note is calculated with respect to the mean luminosity of the relevant pixels, while its value (i.e. duration) is based on the variety of the colours in the segment: the richer in colours, the shorter each note in the music sequence.
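These per-segment rules can be sketched as small helper functions. The linear interpolation for tempo and the 1/n duration rule are assumptions of this sketch; the paper only fixes the 75–160 bpm range and the 5% threshold:

```python
def tempo_from_saturation(mean_saturation, low=75, high=160):
    """Map mean segment saturation (0-1) to a tempo in bpm.
    A linear interpolation is assumed between the stated extremes."""
    return low + mean_saturation * (high - low)

def notes_for_segment(colour_fractions, threshold=0.05):
    """colour_fractions: {note: fraction of segment pixels whose colour
    maps to that note under Scriabin's index}. Keep notes above 5%."""
    return [note for note, frac in colour_fractions.items() if frac > threshold]

def note_volume(mean_luminosity, max_velocity=127):
    """MIDI-style velocity proportional to the pixels' mean luminosity (0-1)."""
    return int(round(mean_luminosity * max_velocity))

def note_duration(n_distinct_colours, whole=4.0):
    """The richer a segment is in colours, the shorter each note;
    the inverse-proportional rule (in beats) is an assumption."""
    return whole / max(n_distinct_colours, 1)
```

For instance, a washed-out segment yields a tempo near 75 bpm, while a highly saturated one approaches 160 bpm.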
Figure 2 shows the results from analyzing a complete image, while Figure 3 shows how individual parts (segments) of an image can be processed. The overall pipeline is summarized in the following sequence:
(1) Our implementation starts with downsizing the input image (painting or photo); this step has limited effect on the actual color harmonies and richness, but results in a much faster implementation
(2) Then, the image is filtered to eliminate very bright or very dark pixels, which do not contribute to the mapping process, since they carry very little color information
Figure 2: Results from analyzing a complete image

Figure 3: Results from analyzing an image segment

(3) The remaining pixels are mapped to each of the 12 segments of the color model; the algorithm only takes into account parts of the color model with more than 5% of the image pixels
(4) Participating colors are then ranked according to the number of pixels in the image and checked for color harmonies
(5) The color information is then mapped to notes, each with a different volume, depending on the average saturation of the respective pixels: the clearer a color is, the higher the volume of the respective note
(6) Chords are calculated and possibly transformed into major or minor, according to the luminosity of the segment
(7) Finally, the melody is composed into a MIDI sequence

This process yields a calm melody when image colors are softer, and an uptempo (faster) melody for images with greater color variety. As mentioned before, darker colors result in minor chords, which matches the general conception of an imposing melody.
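The seven steps above can be condensed into an end-to-end sketch operating on a toy "image" given as rows of (r, g, b) tuples. The filtering thresholds, the hue-bin-to-note ordering and the 0.5 luminosity cut-off below are illustrative assumptions, not the authors' exact parameters, and the final MIDI rendering (step 7) is left out:

```python
import colorsys

# 12 hue bins (30 degrees each) mapped to the chromatic scale; the ordering
# loosely follows Scriabin's circle-of-fifths index and is an assumption.
NOTES = ["C", "G", "D", "A", "E", "B", "F#", "Db", "Ab", "Eb", "Bb", "F"]

def compose(image, n_segments=2, bright=0.95, dark=0.05):
    """Return one (notes, chord_quality) event per left-to-right segment."""
    events = []
    width = len(image[0])
    seg_w = width // n_segments
    for s in range(n_segments):
        counts, lum = {}, []
        for row in image:
            for r, g, b in row[s * seg_w:(s + 1) * seg_w]:
                h, _, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                # step 2: drop near-black and near-white pixels
                if dark < v < bright:
                    bin_ = int(h * 12) % 12        # step 3: 12-segment model
                    counts[bin_] = counts.get(bin_, 0) + 1
                    lum.append(v)
        total = sum(counts.values()) or 1
        # steps 4-5: keep hue bins above 5% of pixels, map to notes
        notes = [NOTES[i] for i, c in counts.items() if c / total > 0.05]
        # step 6: chord quality from mean segment luminosity
        quality = "minor" if lum and sum(lum) / len(lum) < 0.5 else "major"
        events.append((notes, quality))
    return events  # step 7 would render these events to a MIDI file
```

A mostly-red image, for example, produces "C major" events in every segment, consistent with Scriabin's red-equals-C association.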
CONCLUSIONS
This paper describes the implementation of a system which produces a music melody based on the colors (values and harmonies) of a painting or photo, simulating synaesthesia. This algorithm can be used to design audiovisual cultural experiences or interactive applications [18]. The mapping process is based on input from a prominent synaesthete, who was also a composer; despite the inherent subjectiveness of the color-to-note mapping, the algorithm results in melodies which conform to our general conception of how different color values correspond to faster vs. slower melodies, or richer vs. simpler chords.