Predicting the top and bottom ranks of billboard songs using Machine Learning
Vivek Datla and
Abhinav Vishnu
Pacific Northwest National Laboratory, Richland, WA 99354
Abstract
The music industry is a $130 billion industry. Predicting whether a song catches the pulse of the audience has a direct impact on that industry. In this paper we analyze the language inside the lyrics of songs using several computational linguistic algorithms and predict whether a song will make it to the top or the bottom of the Billboard rankings based on language features alone. We trained and tested an SVM classifier with a radial kernel function on the linguistic features. Results indicate that we can classify whether a song belongs to the top or the bottom of the Billboard charts with a precision of 0.76.
Introduction
German philosopher Friedrich Nietzsche famously said "without music, life would be a mistake". In this digital age, we have access to a large collection of music composed at an amazing rate. The iTunes music store alone offers 37 million songs, and has sold more than 25 billion songs worldwide. Every society has its version of music and popularity of songs, and sometimes they transcend societies as well as continents. The 90's era of pop and rock music was dominated by artists such as Michael Jackson, Sting, U2, and many others. The whole generation of 90's youth can immediately identify "Beat it!", a top song during that period.

What makes a song catchy? The lyrics of songs contain words that arouse emotions such as anger and love, which tend to play an important role in whether people like the songs. The liking of a song has not only a human emotion aspect but also a direct economic impact on the $130 billion music industry.

The sales and evaluation of songs directly impact the music companies, and a computational model that predicts the popularity of a song is of great value for the music industry. Identifying the potential of a song early gives companies an edge to purchase songs at a lower cost. Also, an artist usually composes the music for a song after the lyrics are written. For an organization investing in a music album, there is a great financial incentive in knowing whether a song will catch the pulse of the audience based on the lyrics alone, even before the music is composed, as composing music requires considerable resources.
Since songs are composed of several complex components such as lyrics, instrumental music, and vocal and visual renditions, the nature of a song itself is highly complex. The lyrics are the language component that ties together the vocal, music, and visual components; there needs to be harmony between the components to produce a song. Songs have the potential to lift our moods, make us shake a leg, or move us to tears. They also help us relate to our experiences by triggering several emotional responses.

There has been a lot of work on genre classification using machine learning. Researchers identify the category of songs based on emotions such as sad, happy, and party. All songs tend to have an emotional component, but we see very few songs that catch the people's pulse and become a hit.

The research question addressed in this paper is as follows:

• Can machine learning models be trained on lyrics to predict the top and bottom ranked songs?

In the current paper, we look at language features that help predict whether a song belongs to a top or a bottom ranked category. To the best of our knowledge, this is the first study addressing this problem.
Related Work
Language is a strong indicator of the stresses and mood of a person. Identifying these features has helped computational linguists as well as computer scientists to correlate language features with several complex problems arising in tutoring systems (Rus et al., 2013; Graesser et al., 2005), affect recognition (DMello et al., 2008), sentiment mining (Hu and Liu, 2004), opinion mining, and many others.

Su, Fung, and Auguin (2013) implemented multimodal music emotion classification (MEC) for classifying 14 kinds of emotions from the music and song lyrics of the western music genre. Their dataset consisted of ~3,500 songs with emotions/moods such as sad, high, groovy, happy, lonely, sexy, energetic, romantic, angry, sleepy, nostalgic, funny, jazzy, and calm. They used AdaBoost with decision stumps for classifying the music and language features of the lyrics into their respective emotion categories. They achieved an accuracy of 0.78 using language as well as surface features of the audio. The authors claim that the language features played a more important role than the music features in classification.

Laurier, Grivolla, and Herrera (2008) also indicated that language features outperformed audio features for music mood classification. They have shown that language features extracted from songs fit well with Russell's valence (negative-positive) and arousal (inactive-active) model (Russell, 1980). Several cross-cultural studies show evidence for universal emotional cues in music and language across different cultures and traditions (McKay, 2002).

Significant advances have been made in the area of emotion detection and mood classification based on music and lyrics analysis, through large-scale machine learning operating on vast feature sets, sometimes spanning multiple domains, applied to relatively short musical selections (Kim et al., 2010).
Many times, these approaches help in identifying the genre and mood but do not reveal much in terms of why a song is popular, or what features of the song made it catch the pulse of the audience.

Mihalcea and Strapparava (2012) used LIWC and surface music components of all the phrases present in a small collection of songs as a dataset for identifying the emotions in each phrase. Each of the phrases was annotated for emotions. Using an SVM classifier they obtained an accuracy of 0.87 using just the language features. They observed that the language components gave higher accuracy than the music features in predicting emotions. The accuracy is higher because they are looking at emotions in a phrase, where the chance of having multiple emotions inside such a small text is very low.

When we look at a collection of popular songs, they belong to several emotional categories. It is clear from previous research that language is a strong indicator of emotions, but it is not clear whether language is an indicator of a song becoming a commercial success.

We used the language features extracted from the lyrics to train an SVM classifier to identify the top and bottom categories of songs. Our approach is as follows:

• A machine learning approach: We extracted the language features and performed dimensionality reduction using principal component analysis (PCA) in order to reduce the noise in the data. We trained and tested an SVM classifier on the new features to identify the songs that belonged to the top and bottom of the Billboard rankings.
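The dimensionality-reduction step of this approach can be sketched as follows. This is a minimal illustration using scikit-learn on random stand-in data, not the paper's actual code: in the real pipeline the rows would be songs and the columns the extracted lyric features.

```python
# Sketch of PCA noise reduction: keep just enough principal components to
# explain a chosen fraction of the variance. The input matrix here is random
# stand-in data; in the paper it would be the song-by-feature matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))      # 200 "songs", 50 raw lyric features

# A float n_components asks PCA to retain the smallest number of components
# whose cumulative explained variance reaches that threshold.
pca = PCA(n_components=0.6)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1], "components retained")
```

A classifier is then trained on `X_reduced` instead of the raw features; the trade-off, as noted later in the paper, is that the components no longer carry the semantic meaning of the original features.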
Data
Billboard magazine (Billboard, 2015) has been a premier music publication since 1894. Billboard's music charts have evolved into the primary source of information on trends and innovation in the music industry. With more than 10 million users, its ranking is considered a standard in the music industry. Billboard releases weekly rankings of the top songs in several categories such as rock, pop, hip-hop, etc. For this study, we used the top hot-hits of every week from 2001-2010, and collected the corresponding lyrics. Since the ratings of the songs are given every week, the same song is often present in multiple weeks. To simplify the problem, we took the best rank a song achieved throughout the year as the rank of the song.

After cleaning the lyrics of hypertext annotations and punctuation, we had a total of 2,616 songs. The histogram of the peak rank of the songs in the dataset is shown in Figure 1. For our analysis, we built a model to identify the songs that belonged to the top 30 and bottom 30 ranks; the songs that peaked in the top 30 outnumbered those that peaked in the bottom 30 by a ratio of roughly 1.5 to 1.

Figure 1: Histogram of the best rank of songs from 2001-2010 of the Billboard top 100

Features
We conducted a battery of linguistic analyses that look at the syntax, semantics, emotions, and affect contribution of the words present in the lyrics. These analyses can generally be classified into general structural (e.g., word count), syntactic (e.g., connectives), and semantic (e.g., word choice) dimensions of language, whereby some use a bag-of-words approach (e.g., LIWC), others use a probability approach (e.g., MRC), and yet others rely on the computation of different factors (e.g., type-token ratio). Eight computational linguistic algorithms are used to analyze the language features inside the lyrics of the songs.

For general linguistic features, we used the frequency of the linguistic features described by Biber (1991). These features primarily operate at the word level (e.g., parts of speech) and can be categorized as tense and aspect markers, place and time adverbials, pronouns and proverbs, questions, nominal forms, passives, stative forms, subordination features, prepositional phrases, adjectives and adverbs, lexical specificity, lexical classes, modals, specialized verb classes, reduced forms and dis-preferred structures, and co-ordinations and negations (Luno, Beck, and Louwerse, 2013).

Figure 2: Overview of computational linguistic algorithms used. Louwerse (2001), Biber (1991), Semin and Fiedler (1988, 1991), Johnson-Laird and Oatley (1989), Miller et al. (1998), Coltheart (1981), Baayen, Piepenbrock, and Gulikers (1995), Tausczik and Pennebaker (2010)

For the semantic categories of words, we used WordNet (Miller et al., 1998). WordNet has words in base types, including primitive groups for nouns (e.g., time, location, person), for verbs (e.g., communication, cognition), groups of adjectives, and groups of adverbs. We also collected all the English words from Google unigrams (Brants and Franz, 2006) and binned them into one of the categories if one of their synonyms belonged to those categories. These words represent categories such as communication nouns, social nouns, and many others.

The linguistic category model (LCM) gives insight into interpersonal language use. The model consists of a classification of interpersonal (transitive) verbs that are used to describe actions or psychological states, and of adjectives that are employed to characterize persons. To capture the various emotions expressed by a statement, we used the emotion words given by Tausczik and Pennebaker (2010), broadly classified into two classes: basic emotions (anger, fear, disgust, happiness, etc.) and complex emotions (guilt, pity, tenderness, etc.). The basic emotions indicate no cognitive load, hence they are also called raw emotions, whereas the complex emotions indicate cognitive load.

Inter-clausal relationships were captured using parameterized connectives, including positive additive (also, moreover), negative additive (however, but), positive temporal (after, before), negative temporal (until), and causal (because, so) connectives. To get the frequencies of the words, we used the CELEX database (Baayen, Piepenbrock, and Gulikers, 1995). The CELEX database consists of millions of words taken from both spoken (newswire and telephone conversations) and written (newspapers and books) corpora.
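Category-frequency features such as the inter-clausal connectives above can be sketched as a simple lexicon lookup. The word lists below are small illustrative samples drawn from the examples in the text, not the full lexicons used in the paper.

```python
# Minimal sketch of lexicon-based category counting for connectives.
# The category word lists are illustrative samples, not the full lexicons.
import re
from collections import Counter

CONNECTIVES = {
    "positive_additive": {"also", "moreover"},
    "negative_additive": {"however", "but"},
    "positive_temporal": {"after", "before"},
    "negative_temporal": {"until"},
    "causal":            {"because", "so"},
}

def connective_counts(lyrics: str) -> Counter:
    """Count occurrences of each connective category in a lyric string."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter()
    for tok in tokens:
        for category, words in CONNECTIVES.items():
            if tok in words:
                counts[category] += 1
    return counts

print(connective_counts("But I stayed until the morning, because you asked"))
```

The same pattern extends to the other bag-of-word feature families (e.g., WordNet noun/verb groups or LIWC-style emotion words): each lexicon contributes one frequency column per category to the song's feature vector.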
We also used the MRC Psycholinguistic Database (Coltheart, 1981) to get linguistic measures such as familiarity, concreteness, and meaningfulness.

Table 1: Classification Results

Measures         Precision  Recall  Kappa
SVM radial ker.  0.76       0.76    0.51
SVM poly. ker.   0.68       0.68    0.36
SVM linear ker.  0.53       0.53    0.05
Classification
After the linguistic analysis, we approached the problem as a classification problem. As discussed earlier, we extracted the language features from the lyrics using the computational linguistic algorithms shown in Figure 2. We extracted 261 features from each of the 2,616 songs. The goal is to build a classifier that predicts the top and bottom ranked songs of the Billboard chart. Since there are many features and very few songs, we reduced the noise contributed by the features using principal component analysis (PCA). Components that explained 0.6 of the variance were selected, and this reduced the features from 261 to 39.

It is important to note that the major advantages of doing a PCA are noise reduction and identifying the features that best capture the variance in the data. The disadvantage is that the variables lose their semantic meaning compared to the raw features.

The classes of positive and negative samples, i.e., the top 30 and bottom 30 songs, were in the ratio of 1.5 to 1. To balance the classes, we performed synthetic minority over-sampling (SMOTE) (Chawla et al., 2002). SMOTE creates new synthetic samples that are similar to the minority class by picking data points that are close to the original samples.

After balancing the classes, we performed classification using a support vector machine (SVM) with radial (exponential), polynomial, and linear kernel functions. The classification is done using a 10-fold cross-validation method.

An SVM uses an implicit mapping defined by the kernel function to map the input data into a very high dimensional feature space, and then learns the plane of separation between the two classes in that space. For the classification of top and bottom ranked songs, we observe that the radial (exponential) kernel performs best, with a precision of 0.76, recall of 0.76, and Cohen's kappa of 0.51.
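The balancing and classification pipeline above can be sketched as follows. This is a toy illustration on synthetic stand-in data: the SMOTE step is a minimal re-implementation of the interpolation idea from Chawla et al. (not the reference implementation), and the feature matrix is random rather than the PCA-reduced lyric features.

```python
# Toy sketch: minority oversampling by neighbour interpolation (SMOTE idea),
# then an RBF-kernel SVM scored with 10-fold cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating each picked
    sample toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbour
        gap = rng.random()
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new)

rng = np.random.default_rng(1)
X_top = rng.normal(0.5, 1.0, size=(90, 10))   # majority class ("top 30")
X_bot = rng.normal(-0.5, 1.0, size=(60, 10))  # minority class ("bottom 30")

# Oversample the minority class until the two classes are balanced.
X_bot_full = np.vstack([X_bot, smote(X_bot, len(X_top) - len(X_bot))])
X = np.vstack([X_top, X_bot_full])
y = np.array([1] * len(X_top) + [0] * len(X_bot_full))

clf = SVC(kernel="rbf")                       # radial (exponential) kernel
scores = cross_val_score(clf, X, y, cv=10)    # 10-fold cross-validation
print("mean 10-fold accuracy: %.2f" % scores.mean())
```

In practice a maintained implementation such as imbalanced-learn's SMOTE would be preferable to the hand-rolled interpolation shown here.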
The kappa score indicates that the classifier performed the classification with good confidence. We also attempted building classifiers using other algorithms such as Bayesian networks, naive Bayes, and decision trees, but all of them performed poorly compared to the SVM.
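Cohen's kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside precision and recall. A small self-contained sketch (illustrative labels, not the paper's data):

```python
# Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed agreement
# and p_e is the agreement expected by chance from the label marginals.
import numpy as np

def cohens_kappa(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    p_o = np.mean(y_true == y_pred)
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement gives 1.0; chance-level agreement hovers around 0.0.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(round(cohens_kappa(y_true, y_pred), 2))  # 6/8 observed vs 0.5 chance -> 0.5
```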
Discussion
There are several studies (Mihalcea and Strapparava, 2012; Su, Fung, and Auguin, 2013; Laurier, Grivolla, and Herrera, 2008; Kim et al., 2010) that have looked into emotions in music based on language as well as a few audio features. All these studies explicitly indicated that language features were more useful than surface music features in identifying the emotion present in songs.

Songs contain both music and lyrics. In this work, we have used only the lyrics as our data, since lyrics are publicly available whereas the music is not. Since previous studies have shown the importance of language in music for identifying emotions, we extended the investigation to identifying the language features that help in differentiating the top and bottom rated songs on the Billboard chart. To the best of our knowledge, this is the first study that uses computational linguistic algorithms and machine learning models to predict whether a song belongs to the top or bottom of the Billboard rankings.

We used the language features extracted using the language models to train SVM classifiers with different kernel functions to identify whether a song belongs to the top or bottom of the Billboard chart. The radial kernel function gives a precision of 0.76 with a kappa of 0.51, which indicates good confidence in the classification.

Although the audio features of a song play an important role, they are expensive and not publicly available for download. In this paper, we focused only on the language features, and the results indicate that we can robustly identify whether a song goes to the top or bottom of the Billboard charts based on the language features alone.
Although the precision is only 0.76 (chance is 0.5), this is notable given that we are in a very dense space of the top 100 songs from Billboard, where all the songs are the best of the best when taking into consideration all the music albums uploaded to social media (YouTube, Facebook, Twitter, etc.).

Overall, the take-home message of this paper is that language features can be exploited by machine learning algorithms to predict whether a song reaches the top or bottom of the Billboard rankings.

Conclusion and Future Work
The music industry is a vibrant business community, with many artists publishing their work in the form of albums, individual songs, and performances. There is a huge financial incentive for businesses to identify the songs that are most likely to be a hit. We have shown that machine learning models trained on several language features can predict whether a song belongs to the top 30 or bottom 30 of the Billboard ratings.

In the future, we would like to expand our research question to predict whether a song reaches the class of the top 100 Billboard list or not.
References
Baayen, H. R.; Piepenbrock, R.; and Gulikers, L. 1995. The CELEX Lexical Database. Release 2 (CD-ROM). Philadelphia, Pennsylvania: Linguistic Data Consortium, University of Pennsylvania.

Biber, D. 1991. Variation Across Speech and Writing. Cambridge University Press.

Billboard. 2015. Billboard music charts.

Brants, T., and Franz, A. 2006. Web 1T 5-gram Version 1. Philadelphia: Linguistic Data Consortium.

Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; and Kegelmeyer, W. P. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321-357.

Coltheart, M. 1981. The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology 33(4):497-505.

DMello, S.; Craig, S. D.; Witherspoon, A.; McDaniel, B.; and Graesser, A. 2008. Automatic detection of learner's affect from conversational cues. User Modeling and User-Adapted Interaction 18(1-2):45-80.

Graesser, A. C.; Chipman, P.; Haynes, B. C.; and Olney, A. 2005. AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education 48(4):612-618.

Hu, M., and Liu, B. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, 168-177. New York, NY, USA: ACM.

Johnson-Laird, P. N., and Oatley, K. 1989. The language of emotions: An analysis of a semantic field. Cognition and Emotion 3(2):81-123.

Kim, Y. E.; Schmidt, E. M.; Migneco, R.; Morton, B. G.; Richardson, P.; Scott, J.; Speck, J. A.; and Turnbull, D. 2010. Music emotion recognition: A state of the art review. In Proc. ISMIR, 255-266. Citeseer.

Laurier, C.; Grivolla, J.; and Herrera, P. 2008. Multimodal music mood classification using audio and lyrics. In Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on, 688-693. IEEE.

Louwerse, M. 2001. An analytic and cognitive parametrization of coherence relations. Cognitive Linguistics 12(3):291-315.

McKay, C. 2002. Emotion and music: Inherent responses and the importance of empirical cross-cultural research. Course Paper, McGill University, Canada.

Mihalcea, R., and Strapparava, C. 2012. Lyrics, music, and emotions. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 590-599. Association for Computational Linguistics.

Miller, G. A.; Beckwith, R.; Fellbaum, C.; Gross, D.; and Miller, K. 1998. Five papers on WordNet. In Fellbaum, C., ed., WordNet: An Electronic Lexical Database. MIT Press.

Rus, V.; Niraula, N.; Lintean, M.; Banjade, R.; Stefanescu, D.; and Baggett, W. 2013. Recommendations for the generalized intelligent framework for tutoring based on the development of the DeepTutor tutoring service. In AIED 2013 Workshops Proceedings Volume 7, 116.

Russell, J. A. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39(6):1161-1178.

Semin, G. R., and Fiedler, K. 1988. The cognitive functions of linguistic categories in describing persons: Social cognition and language. Journal of Personality and Social Psychology 54(4):558-568.

Semin, G. R., and Fiedler, K. 1991. The linguistic category model, its bases, applications and range. European Review of Social Psychology 2(1):1-30.

Su, D.; Fung, P.; and Auguin, N. 2013. Multimodal music emotion classification using AdaBoost with decision stumps. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 3447-3451. IEEE.

Tausczik, Y. R., and Pennebaker, J. W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29(1):24-54.