Decoding the Style and Bias of Song Lyrics
Manash Pratim Barman
Indian Institute of Information Technology Guwahati, India
Amit Awekar
Indian Institute of Technology Guwahati, India
[email protected]
Sambhav Kothari
Bloomberg LP, London, United Kingdom
[email protected]
ABSTRACT
The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: the style and the biases of song lyrics. All prior work on these two aspects is limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We observed that the style of popular songs differs significantly from that of other songs. We used distributed representation methods and the WEAT test to measure various gender and racial biases in song lyrics. We observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. The increasing consumption of music and the effect of lyrics on human emotions make this analysis important.
KEYWORDS
Text Mining, NLP Applications, Distributed Representation
ACM Reference Format:
Manash Pratim Barman, Amit Awekar, and Sambhav Kothari. 2019. Decoding the Style and Bias of Song Lyrics. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), July 21–25, 2019, Paris, France. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3331184.3331363
1 INTRODUCTION
Music is an integral part of our culture. Smartphones and the near-ubiquitous availability of the internet have resulted in dramatic growth of online music consumption [17]. More than 85% of online music subscribers search for song lyrics, indicating a keen interest of people in song lyrics. Song lyrics have a significant impact on human emotions and behavior. While listening to songs with violent lyrics can increase aggressive thoughts and hostile feelings [1], listening to songs with pro-social lyrics increased the accessibility of pro-social thoughts, led to more interpersonal empathy, and fostered helping behavior [10].
This paper is motivated by the observation that song lyrics have not received enough attention from the research community to understand them computationally. Several works focus on related
problems such as personalized music recommendation [13], song popularity prediction, and genre identification [8]. However, prior works on style analysis and bias measurement are limited to manual analysis of a few hundred songs. Such an approach cannot scale to the analysis of millions of songs. These works also fail to capitalize on recent advances in computational methods that generate semantic representations for natural language text. Our work aims to fill this gap by applying these methods to a large corpus of song lyrics scraped from online user-generated content.
There are two main takeaway results from our work. First, popular songs differ significantly from other songs when it comes to the style of lyrics. This difference indicates that lyrics play a major role in deciding the popularity of a song. Second, biases in song lyrics correlate with biases measured in humans. This correlation indicates that song lyrics reflect the existing biases in society. To the best of our knowledge, this is the first work that analyzes song lyrics at the scale of half a million lyrics to understand style and bias. Our results are reproducible, as all our code and datasets are publicly available on the Web. We briefly review related work in Section 2. Our results are presented in Sections 3 and 4. Our conclusion and future work are highlighted in Section 5.
2 RELATED WORK
Song lyrics have been used for many tasks related to music mining, such as genre identification and popularity prediction. Earlier works considered lyrics a weak source of song characteristics as compared to auditory or social features. However, recent works have shown the strength of lyrics for music mining. Barman et al. have shown that knowledge encoded in lyrics can be utilized to improve the distributed representation of words [2].
Mayer et al. have introduced various features for lyrics processing [15]. Fell and Sporleder presented a lyrics-based analysis of songs based on vocabulary and song structure [8]. Our work complements these works by characterizing lyrics style using multiple attributes extracted from lyrics.
Many studies have analyzed gender and racial biases in song lyrics [14, 18]. However, such an approach of manual analysis cannot scale to millions of songs. Caliskan et al. proposed the Word Embedding Association Test (WEAT) to computationally measure biases in any text repository [5]. Their test quantifies biases by computing similarity scores between various sets of words. To compute similarity, the WEAT test represents words using a distributed word representation method such as fastText or word2vec [4, 16]. We apply the WEAT test to song lyrics and discuss its implications.
Code and datasets: https://github.com/manashpratim/Decoding-the-Style-and-Bias-of-Song-Lyrics
(a) Top 100 Words in 1965 (b) Top 100 Words in 2015
Figure 1: Comparison of top words used.
3 STYLE OF SONG LYRICS
We built an interactive tool (https://tiny.cc/songlyrics) that, given any set of words, plots the relative popularity of those words in popular song lyrics through various years. Please refer to Figure 2. This figure compares the popularity of the words "rock" and "blues" over the period from 1965 to 2015. For a given year Y, a lower value of rank for a word W indicates more frequent use of W in the popular song lyrics of year Y. We can observe that the word "rock" has maintained its popularity as compared to the word "blues".
We also looked at the usage of swear words. To compile the list of swear words, we used various online resources. This list is available along with our code. Please refer to Figure 3. This graph shows a comparison between popular (BB, Billboard) and other (MSD, Million Song Dataset) song lyrics based on swear word usage over the period from 1965 to 2015. We can observe that other songs have steadily increased their swear word usage over this period. From 1965 to 1995, popular songs used fewer swear words than other songs. However, from the 1990s there is a persistent trend of increasing swear word usage in popular songs. As compared to the 1980s, swear words are used almost ten times more frequently now in popular song lyrics. Multiple studies have reported adverse effects of inappropriate content in music on the listeners [11].
Figure 2: Year-wise rank comparison.
We also measured the length of song lyrics as the number of words in the song. Please refer to Figure 4a. Other songs have shown a steady increase in length from 1965 to 2015. Popular songs showed a similar trend till 1980. However, since then popular songs have been significantly longer than other songs. Please refer to Figure 4b. This figure depicts the average duration of songs per year, measured in seconds. Both popular and other songs rose in duration till 1980. Since then, other songs have maintained an average duration of about 240 seconds. In contrast, popular songs were of longer duration from 1980 to 2010.
However, the current trend shows a much shorter duration for popular songs. Please refer to Figure 4c. This graph compares popular and other songs based on speed. We measure the speed of a song as (length in words / duration in seconds). We can observe that other songs have maintained an average speed of around 0.6 words per second. However, popular songs were comparatively slower till 1990 and have since become significantly faster than other songs.
Some studies have reported that repetitive songs are lyrically processed more fluently and that listeners prefer such songs [6]. We computed the repetitiveness of a song lyric as ((1 - (number of unique lines / total number of lines)) * 100). Please refer to Figure 5. Except for the period from 1990 to 2000, popular songs are more repetitive than other songs.
Readability tests are standardized tests designed to provide a measure indicating how difficult it is to read or understand a given English text. We applied the well-known Flesch-Kincaid Readability
Figure 3: Comparison of swear word usage.
Figure 4: Comparison of Length, Duration, and Speed of songs: (a) Year-wise Average Length, (b) Year-wise Average Duration, (c) Year-wise Average Speed.
Test (FK) on the song lyrics [12]. The FK test returns a number that directly relates to the US education grade system, indicating the number of years of education required to understand the given English text. For example, an FK score of 5 indicates that anybody educated up to the 5th grade can read and understand the given text. Please refer to Figure 6. It can be seen that the FK scores of popular songs have always been less than 2. Also, the FK scores of other songs have always been considerably higher than those of popular songs. This difference indicates that popular songs have always been easier to understand than other songs.
Figure 5: Comparison of repetitiveness.
Figure 6: Comparison of FK score.
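The three style metrics described above (speed, repetitiveness, and FK readability) can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's exact pipeline: it assumes naive whitespace tokenization, treats punctuation and line breaks as sentence boundaries, and uses a crude vowel-group heuristic for syllable counting.

```python
import re

def speed(lyrics: str, duration_seconds: float) -> float:
    """Speed of a song: (length in words) / (duration in seconds)."""
    return len(lyrics.split()) / duration_seconds

def repetitiveness(lyrics: str) -> float:
    """Repetitiveness: (1 - unique lines / total lines) * 100."""
    lines = [ln.strip() for ln in lyrics.splitlines() if ln.strip()]
    return (1 - len(set(lines)) / len(lines)) * 100

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

On real lyrics, a dedicated readability library would give more faithful syllable and sentence counts than these heuristics.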
4 BIAS IN SONG LYRICS
We humans have certain biases in our thinking. For example, some people find flower names more pleasant and insect names more unpleasant. These biases are reflected in our various activities, such as politics, movies, and song lyrics. The Implicit Association Test (IAT) is a well-known test designed to measure such biases in human beings [9]. This test involves two sets of attribute words and two sets of target words. For example, consider two attribute word sets: pleasant words (nice, beautiful) and unpleasant words (dirty, awful). Also, consider two target word sets: flower names (rose, daffodil) and insect names (gnat, cockroach). The null hypothesis is that there should be little to no difference between the two sets of target words when we measure their similarity with the attribute word sets. The IAT measures the unlikelihood of the null hypothesis by evaluating the effect size. A positive value of the effect size indicates that people are biased to associate the first attribute word set with the first target word set (bias: flowers are pleasant) and the second attribute word set with the second target word set (bias: insects are unpleasant). A negative value of the effect size indicates bias in the other direction, that is, flowers are unpleasant and insects are pleasant. A larger magnitude of the effect size indicates a stronger bias. A value of the effect size close to zero indicates slight or no bias.
Caliskan et al. designed the Word Embedding Association Test (WEAT) by tweaking the IAT [5]. Similar to the IAT, this test can measure bias given the sets of attribute and target words. However, the IAT requires human subjects to compute the bias value. In contrast, the WEAT test computes the bias value using a large text repository and does not require human subjects. The WEAT test represents attribute and target words as vectors using distributed representation methods such as word2vec and fastText [4, 16].
The WEAT test computes the similarity between words using cosine similarity. Caliskan et al. performed bias measurement on a large internet-crawl text corpus using the WEAT test. They have shown that their results correlate with IAT tests conducted on human subjects.
We applied the WEAT test to our song lyrics dataset. Due to the small size of the popular songs dataset, we cannot apply the WEAT test separately to popular song lyrics. Please refer to Table 1. Corresponding to the eight rows of the table, we have measured eight biases. We borrowed these attribute and target word sets from Caliskan et al. [5]. The first two columns (w2v and FT) correspond to measurements on the song lyrics dataset with word2vec and fastText, respectively.
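The WEAT effect size can be sketched as follows on toy vectors. This is a minimal illustration under stated assumptions, not the implementation of Caliskan et al.: real usage would plug in word vectors from a trained word2vec or fastText model, and the normalization here uses the sample standard deviation (implementations vary on this detail).

```python
from math import sqrt
from statistics import mean, stdev

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    # s(w, A, B): mean similarity of w to attribute set A
    # minus mean similarity of w to attribute set B.
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def weat_effect_size(X, Y, A, B):
    # Effect size: difference of mean associations of the two target
    # sets, normalized by the standard deviation over X union Y.
    s = [association(w, A, B) for w in X + Y]
    return (mean(s[:len(X)]) - mean(s[len(X):])) / stdev(s)

# Hypothetical 2-d "embeddings": targets X lean toward attribute set A,
# targets Y lean toward attribute set B.
A = [(1.0, 0.0)]              # e.g. pleasant-word vectors
B = [(0.0, 1.0)]              # e.g. unpleasant-word vectors
X = [(1.0, 0.1), (1.0, 0.0)]  # e.g. flower-name vectors
Y = [(0.1, 1.0), (0.0, 1.0)]  # e.g. insect-name vectors

print(weat_effect_size(X, Y, A, B))  # positive: X associates with A
```

Swapping the two target sets flips the sign of the effect size, mirroring the directionality of the IAT described above.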
Table 1: Comparison of Effect Size
Test no. | Target Words | Attribute Words | w2v | FT | CA | IAT
5 CONCLUSION AND FUTURE WORK
We have analyzed over half a million song lyrics to understand their style and prevalent biases. We observed that, compared to other songs, popular songs have several distinguishing characteristics that can be expressed in terms of the style of lyrics. Lyrics can capture human biases quite accurately. This work can be extended further by investigating music genre-specific style and biases.
REFERENCES
[1] Craig A. Anderson, Nicholas L. Carnagey, and Janie Eubanks. 2003. Exposure to violent media: The effects of songs with violent lyrics on aggressive thoughts and feelings. Journal of Personality and Social Psychology (2003), 960–971.
[2] Manash Pratim Barman, Kavish Dahekar, Abhinav Anshuman, and Amit Awekar. 2019. It's Only Words And Words Are All I Have. To appear at the European Conference on Information Retrieval (ECIR) 2019.
[3] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. 2011. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011).
[4] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics.
[5] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356, 6334 (2017), 183–186.
[6] Joseph C. Nunes, Andrea Ordanini, and Francesca Valsesia. 2015. The power of repetition: Repetitive lyrics in a song increase processing fluency and drive market success. Journal of Consumer Psychology.
[7] American Sociological Review.
[8] Michael Fell and Caroline Sporleder. 2014. Lyrics-based Analysis and Classification of Music. In COLING.
[9] Anthony G. Greenwald, Debbie E. McGhee, and Jordan L. K. Schwartz. 1998. Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology.
[10] Tobias Greitemeyer. 2009. Effects of songs with prosocial lyrics on prosocial thoughts, affect, and behavior. Journal of Experimental Social Psychology.
[11] Sexuality & Culture.
[12] J. Peter Kincaid, Robert P. Fishburne, Richard L. Rogers, and Brad S. Chissom. 1975. Derivation of new readability formulas for Navy enlisted personnel. DTIC Document, Tech. Rep. (1975).
[13] Peter Knees and Markus Schedl. 2013. A Survey of Music Similarity and Recommendation from Music Context Data. ACM Trans. Multimedia Comput. Commun. Appl. 10, 1, Article 2 (Dec. 2013), 21 pages. https://doi.org/10.1145/2542205.2542206
[14] Judy C. Koskei, Margaret Barasa, and Beatrice Manyasi. 2018. Stereotypical portrayal of women in Kipsigis secular songs. European Journal of Literature, Language and Linguistics Studies.
[15] Rudolf Mayer, Robert Neumayer, and Andreas Rauber. 2008. Rhyme and Style Features for Musical Genre Classification by Song Lyrics. In Proceedings of the International Conference on Music Information Retrieval (ISMIR) (June 2008), 337–342.
[16] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems.
[18] Sex Roles 76 (2016), 188–201.
[19] E. Sapir. 1985. Selected writings of Edward Sapir in language, culture and personality.