Are Emojis Predictable?
Francesco Barbieri♦   Miguel Ballesteros♠   Horacio Saggion♦
♦ Large Scale Text Understanding Systems Lab, TALN Group, Universitat Pompeu Fabra, Barcelona, Spain
♠ IBM T.J. Watson Research Center, U.S.
{francesco.barbieri, horacio.saggion}@upf.edu, [email protected]

Abstract
Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) on this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.
1 Introduction

The advent of social media has brought a novel way of communication where meaning is composed by combining short text messages and visual enhancements, the so-called emojis. This visual language is by now a de-facto standard for online communication, available not only on Twitter, but also on other large online platforms such as Facebook, Whatsapp, or Instagram.

Despite its status as a language form, emojis have so far been scarcely studied from a Natural Language Processing (NLP) standpoint. Notable exceptions include studies focused on emojis' semantics and usage (Aoki and Uchida, 2011; Barbieri et al., 2016a; Barbieri et al., 2016b; Barbieri et al., 2016c; Eisner et al., 2016; Ljubešić and Fišer, 2016), or sentiment (Novak et al., 2015). However, the interplay between text-based messages and emojis remains virtually unexplored. This paper aims to fill this gap by investigating the relation between words and emojis, studying the problem of predicting which emojis are evoked by text-based tweet messages.

Miller et al. (2016) performed an evaluation asking human annotators the meaning of emojis, and the sentiment they evoke. People do not always have the same understanding of emojis; indeed, there seem to exist multiple interpretations of their meaning beyond their designer's intent or the physical object they evoke. Their main conclusion was that emojis can lead to misunderstandings. The ambiguity of emojis raises an interesting question in human-computer interaction: how can we teach an artificial agent to correctly interpret and recognise emojis' use in spontaneous conversation?
The main motivation of our research is that an artificial intelligence system that is able to predict emojis could contribute to better natural language understanding (Novak et al., 2015) and thus to different natural language processing tasks such as generating emoji-enriched social media content, enhancing emotion/sentiment analysis systems, and improving retrieval of social network material.

In this work, we employ a state-of-the-art classification framework to automatically predict the most likely emoji a Twitter message evokes. The model is based on Bidirectional Long Short-Term Memory Networks (BLSTMs) with both standard lookup word representations and character-based representations of tokens. We will show that the BLSTMs outperform a bag-of-words baseline, a baseline based on semantic vectors, and human annotators on this task.

Table 1: The 20 most frequent emojis that we use in our experiments and the number of thousand tweets they appear in.
2 Dataset and Task

Dataset: We retrieved 40 million tweets with the Twitter APIs (https://dev.twitter.com). Tweets were posted between October 2015 and May 2016 and geo-localized in the United States of America. We removed all hyperlinks from each tweet, and lowercased all textual content in order to reduce noise and sparsity. From the dataset, we selected tweets which include one and only one of the 20 most frequent emojis, resulting in a final dataset composed of 584,600 tweets (available at http://sempub.taln.upf.edu/tw/eacl17). In the experiments we also consider the subsets of the 10 (502,700 tweets) and 5 (341,500 tweets) most frequent emojis. See Table 1 for the 20 most frequent emojis that we consider in this work.

Task: We remove the emoji from the sequence of tokens and use it as a label both for training and testing. The task for our machine learning models is to predict the single emoji that appears in the input tweet.
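The selection step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the URL regular expression, and the five-glyph subset standing in for the paper's 20 target emojis are our own assumptions.

```python
import re

# Illustrative subset: the paper targets its 20 most frequent emojis
# (Table 1); these five glyphs are an assumption for the sketch.
TOP_EMOJIS = {"\U0001F602", "\u2764\uFE0F", "\U0001F60D", "\U0001F62D", "\U0001F618"}

URL_RE = re.compile(r"https?://\S+")

def make_example(tweet):
    """Return (text, emoji_label) if the tweet contains exactly one
    occurrence of exactly one target emoji, else None (the paper keeps
    tweets with "one and only one" of the target emojis)."""
    text = URL_RE.sub("", tweet).lower()  # strip hyperlinks, lowercase
    found = [e for e in TOP_EMOJIS if e in text]
    if len(found) != 1 or text.count(found[0]) != 1:
        return None
    emoji = found[0]
    # the emoji is removed from the token sequence and used as the label
    return text.replace(emoji, " ").strip(), emoji
```

A tweet such as "So happy today 😍 http://t.co/abc" would yield the pair ("so happy today", 😍), while tweets with zero or several target emojis are discarded.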
3 Models

In this Section, we present and motivate the models that we use to predict an emoji given a tweet. The first model is an architecture based on Recurrent Neural Networks (Section 3.1), and the second and third are the two baselines (Sections 3.2.1 and 3.2.2). The main difference between the RNNs and the baselines is that the RNNs take into account sequences of words and thus the entire context.
3.1 Bi-directional LSTMs

Given the proven effectiveness and the impact of recurrent neural networks in different tasks (Chung et al., 2014; Vinyals et al., 2015; Dzmitry et al., 2014; Dyer et al., 2015; Lample et al., 2016; Wang et al., 2016, inter alia), which also include modeling of tweets (Dhingra et al., 2016), our emoji prediction model is based on bi-directional Long Short-Term Memory Networks (Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005). The B-LSTM can be formalized as follows:

    s = max{0, W[fw; bw] + d}

where W is a learned parameter matrix, fw is the forward LSTM encoding of the message, bw is the backward LSTM encoding of the message, and d is a bias term; the result is passed through a component-wise ReLU. The vector s is then used to compute the probability distribution of the emojis given the message as:

    p(e | s) = exp(g_e^T s + q_e) / Σ_{e' ∈ E} exp(g_{e'}^T s + q_{e'})

where g_e is a column vector representing the (output) embedding of the emoji e, and q_e is a bias term for the emoji e. The set E represents the list of emojis; the output embeddings of the emojis have 100 dimensions. The loss/objective function the network aims to minimize is:

    Loss = −log(p(e_m | s))

where m is a tweet of the training set T, s is the encoded vector representation of the tweet, and e_m is the emoji contained in the tweet m. The inputs of the LSTMs are word embeddings. In the following, we present the two alternatives explored in the experiments of this paper.

Word Representations: We generate word embeddings which are learned together with the updates to the model. We stochastically replace (with p = 0.5) each word that occurs only once in the training data with a fixed representation (out-of-vocabulary words vector). When we use pretrained word embeddings, these are concatenated with the learned vector representations, obtaining a final representation for each word type. This is similar to the treatment of word embeddings by Dyer et al. (2015).

Character-based Representations: We compute character-based continuous-space vector embeddings (Ling et al., 2015b; Ballesteros et al., 2015) of the tokens in each tweet using, again, bidirectional LSTMs. The character-based approach learns representations for words that are orthographically similar, and should thus be able to handle the different alternatives of the same word type occurring in social media.
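The classification head defined by the equations above can be sketched with plain NumPy. This is a minimal illustration of the formulas, not the authors' implementation; the LSTM encodings fw and bw are taken as given, and all names are ours.

```python
import numpy as np

def emoji_probs(fw, bw, W, d, G, q):
    """p(e | message): ReLU projection of the concatenated forward and
    backward encodings, then a softmax over emoji output embeddings.
    G stacks the output embeddings g_e as rows; q holds the biases q_e."""
    s = np.maximum(0.0, W @ np.concatenate([fw, bw]) + d)  # s = max{0, W[fw; bw] + d}
    logits = G @ s + q                                     # g_e^T s + q_e for every e
    logits = logits - logits.max()                         # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def nll_loss(fw, bw, W, d, G, q, gold):
    """Loss = -log p(e_m | s) for the gold emoji of the tweet."""
    return -np.log(emoji_probs(fw, bw, W, d, G, q)[gold])
```

In the paper, the rows of G would be the 100-dimensional output embeddings of the 5, 10 or 20 candidate emojis, and the parameters would be learned by minimizing the negative log-likelihood over the training tweets.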
3.2 Baselines
In this Section we describe the two baselines. Unlike the previous model, the baselines do not take into account the word order. However, in the second baseline (Section 3.2.2) we abstract over the plain word representation using semantic vectors, previously trained on Twitter data.
3.2.1 Bag of Words

We applied a bag-of-words classifier as baseline, since it has been successfully employed in several classification tasks, like sentiment analysis and topic modeling (Wallach, 2006; Blei, 2012; Titov and McDonald, 2008; Maas et al., 2011; Davidov et al., 2010). We represent each message with a vector of the most informative tokens (punctuation marks included) selected using term frequency–inverse document frequency (TF-IDF). We employ an L2-regularized logistic regression classifier to make the predictions.

3.2.2 Skip-Gram Vector Average

We train a Skip-gram model (Mikolov et al., 2013) learned from 65M tweets (where testing instances have been removed) to learn Twitter semantic vectors. Then, we build a model (henceforth, AVG) which represents each message as the average of the vectors corresponding to each token of the tweet. Formally, each message m is represented with the vector V_m:

    V_m = ( Σ_{t ∈ T_m} S_t ) / |T_m|

where T_m is the set of tokens included in the message m, S_t is the vector of token t in the Skip-gram model, and |T_m| is the number of tokens in m. After obtaining a representation of each message, we train an L2-regularized logistic regression (with ε equal to 0.001).

4 Experiments and Evaluation

In order to study the relation between words and emojis, we performed two different experiments. In the first experiment, we compare our machine learning models, and in the second experiment, we pick the best performing system and compare it against humans.
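The message representation of the AVG baseline described in Section 3.2.2 can be sketched as follows. This is a minimal sketch of the averaging formula only, not the authors' pipeline; the handling of tokens missing from the Skip-gram model is our own assumption, since the paper does not specify it.

```python
import numpy as np

def avg_representation(tokens, skipgram, dim):
    """AVG baseline: V_m = (sum over t in T_m of S_t) / |T_m|.
    `skipgram` maps a token to its vector; tokens missing from the
    model are skipped (an assumption), and a message with no known
    tokens falls back to the zero vector."""
    vecs = [skipgram[t] for t in tokens if t in skipgram]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

The resulting fixed-size vector V_m would then be fed to the L2-regularized logistic regression classifier mentioned above.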
This experiment is a classification task, where in each tweet the unique emoji is removed and used as a label for the entire tweet. We use three datasets, each containing the 5, 10 and 20 most frequent emojis (see Section 2). We analyze the performance of the five models described in Section 3: a bag-of-words model, a Bidirectional LSTM model with character-based representations (char-BLSTM), and a Bidirectional LSTM model with standard lookup word representations (word-BLSTM); the latter two were trained with and without pretrained word vectors. To pretrain the word vectors, we use a modified skip-gram model (Ling et al., 2015a) trained on the English Gigaword corpus version 5 (https://catalog.ldc.upenn.edu/LDC2003T05).

We divide each dataset in three parts: training (80%), development (10%) and testing (10%). The three subsets are selected in sequence starting from the oldest tweets, since automatic systems are usually trained on past tweets and need to be robust to future topic variations.

          5 emojis        10 emojis       20 emojis
          P    R    F     P    R    F     P    R    F
  BOW    .59  .60  .58   .43  .46  .41   .32  .34  .29
  AVG    .60  .60  .57   .44  .47  .40   .34  .36  .29
  W      .59  .59  .59   .46  .46  .46   .35  .36  .33
  C      .61  .61  .61   .44  .44  .44   .36  .37  .32
  W+P    .61  .61  .61   .45  .45  .45   .34  .36  .32
  C+P    .63  .63  .63   .48  .47  .47   .42  .39  .34

Table 2: Results for 5, 10 and 20 emojis: Precision, Recall, F-measure. BOW is bag of words, AVG is the Skip-gram Average model, C refers to char-BLSTM and W refers to word-BLSTM. +P refers to pretrained embeddings.

Table 2 reports the results of the five models and the baseline. All neural models outperform the baselines in all the experimental setups. However, BOW and AVG are quite competitive, suggesting that most emojis come along with specific words (like the word love and the corresponding emoji). Nonetheless, considering sequences of words in the models seems important for encoding the meaning of the tweet and therefore contextualizing the emojis used. Indeed, the B-LSTM models always outperform BOW and AVG. The character-based model with pretrained vectors is the most accurate at predicting emojis. The character-based model seems to capture orthographic variants of the same word in social media. Similarly, pretrained vectors allow the system to be initialized with unsupervised pre-trained semantic knowledge (Ling et al., 2015a), which helps to achieve better results.
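The chronological 80%/10%/10% split described above can be sketched as follows (a minimal sketch under our own naming; the paper does not publish its splitting code):

```python
def chronological_split(tweets, train_frac=0.8, dev_frac=0.1):
    """Split a list of tweets (assumed sorted oldest first) into
    train/dev/test in sequence, so that systems are trained on past
    tweets and evaluated on later ones."""
    n = len(tweets)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    return (tweets[:n_train],
            tweets[n_train:n_train + n_dev],
            tweets[n_train + n_dev:])
```

Splitting in time order, rather than randomly, matters here: it exposes the test set to topic drift (e.g. the Christmas-tree emoji discussed below), which is the condition a deployed system would face.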
Table 3: Precision, Recall, F-measure, Ranking and occurrences in the test set of the 20 most frequent emojis, using char-BLSTM + Pre.
Qualitative Analysis of Best System:
We analyze the performance of the char-BLSTM with pretrained vectors on the 20-emojis dataset, as it proved to be the best system in the experiment presented above. In Table 3 we report Precision, Recall, F-measure and Ranking of each emoji; the Ranking is a number between 1 and 20 that represents the average number of emojis with higher probability than the gold emoji in the probability distribution of the classifier. We also add in the last column the occurrences of each emoji in the test set.

Frequency seems to be very relevant: the Ranking of the most frequent emojis is lower than the Ranking of the rare emojis. This means that if an emoji is frequent, it is more likely to be at the top of the possible choices even when it is a mistake. On the other hand, the F-measure does not seem to depend on frequency, as the highest F-measures are scored by a mix of common and uncommon emojis, which are respectively the first, the second, the sixth and the second-to-last emojis in terms of frequency.

The frequency of an emoji is not the only variable that matters for detecting emojis properly; it is also important whether the set contains emojis with similar semantics. If this is the case, the model prefers to predict the most frequent ones. This happens, for instance, to an emoji that is almost never predicted even though its Ranking is not too high (4.69): the model prefers similar but more frequent emojis. The same behavior is observed for the blue heart emoji, but in this case the performance is a bit better due to some specific words used along with it: "blue", "sea" and words related to childhood (e.g. "little" or "Disney").

Another interesting case is the Christmas tree emoji, which is present only three times in the test set (as the test set includes the most recent tweets and Christmas was already over; this emoji is commonly used in tweets about Christmas). The model recognizes it twice, missing it once. The correctly predicted cases include the word "Christmas"; it fails on "getting into the holiday spirit with this gorgeous pair of leggings today!", since there are no obvious clues, and the model chooses another emoji instead, probably because of the intended meaning of "holiday" and "gorgeous".

In general the model tends to confuse emojis with the two most frequent ones, probably because of their higher frequency and also because they are used in multiple contexts. An interesting phenomenon is that the crying face is often confused with the laughing face: the first one represents a small face crying, and the second one a small face laughing, but the results suggest that they appear in similar tweets. The punctuation and tone used are often similar (many exclamation marks and words like "omg" and "hahaha"). Irony may also play a role in explaining the confusion, e.g. "I studied journalism and communications, I'll be an awesome speller! Wrong. haha so much fun".
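The Ranking metric reported in Table 3 can be computed as follows. This is a minimal sketch of the metric as defined in the text above (the number of emojis with higher probability than the gold one, averaged over test items); the function names are ours, and the treatment of probability ties is our own assumption.

```python
def gold_rank(probs, gold):
    """Position of the gold emoji in the classifier's distribution:
    1 + the number of emojis with strictly higher probability than the
    gold one (so 1 means the gold emoji was the top prediction)."""
    return 1 + sum(1 for p in probs if p > probs[gold])

def mean_rank(prob_rows, golds):
    """Average rank over test items, i.e. the per-emoji 'Ranking'
    reported in Table 3 when items are grouped by gold emoji."""
    ranks = [gold_rank(p, g) for p, g in zip(prob_rows, golds)]
    return sum(ranks) / len(ranks)
```

Under this definition a frequent emoji, which the model tends to place near the top of every distribution, gets a low (good) Ranking even on tweets where it is the wrong answer, which is exactly the effect discussed above.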
Given that Miller et al. (2016) pointed out that people tend to give multiple interpretations to emojis, we carried out an experiment in which we evaluated human and machine performances on the same task. We randomly selected 1,000 tweets from our test set of the 5 most frequent emojis used in the previous experiment, and asked humans to predict, after reading a tweet (with the emoji removed), the emoji the text evoked. We opted for the 5-emojis task to reduce the annotation effort. After displaying the text of the tweet, we asked the human annotators "What is the emoji you would include in the tweet?", and gave the possibility to pick one of the 5 possible emojis. Using the crowdsourcing platform "CrowdFlower", we designed an experiment where the same tweet was presented to four annotators (selecting the final label by majority agreement). Each annotator assessed a maximum of 200 tweets. The annotators were selected from the United States of America and of high quality (level 3 of CrowdFlower). One in every ten tweets was an obvious test question, and annotations from subjects who missed more than 20% of the test questions were discarded. The overall inter-annotator agreement was 73% (in line with previous findings (Miller et al., 2016)). After creating the manually annotated dataset, we compared the human annotation and the char-BLSTM model with the gold standard (i.e. the emoji used in the tweet).

Table 4: Precision, Recall and F-measure of the human evaluation and the character-based B-LSTM for the 5 most frequent emojis and 1,000 tweets.

We can see in Table 4, where the results of the comparison are presented, that the char-BLSTM performs better than humans, with an F1 of 0.65 versus 0.50. The emojis that the char-BLSTM struggles to predict differ from the ones that the human annotators mispredict most often; the confusion matrices of Figure 1 show which emojis are misclassified as which, by the humans and by the LSTM. An interesting result is the number of times one particular emoji was chosen by the human annotators: it occurred 100 times (by chance) in the test set, but it was chosen 208 times, mostly when the correct label was the laughing emoji. We do not observe the same behavior in the char-BLSTM, perhaps because it encodes information about the probability of these two emojis and, when in doubt, the laughing emoji is chosen as more probable.

Figure 1: Confusion matrices of the second experiment: on the left the human evaluation and on the right the char-BLSTM model.
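The majority-agreement aggregation used in the crowdsourcing setup above can be sketched as follows (a minimal sketch; the tie-breaking rule is our own assumption, since the paper does not state how ties among the four annotators were resolved):

```python
from collections import Counter

def majority_label(annotations):
    """Aggregate the choices of the annotators for one tweet by
    majority agreement (four annotators per tweet in the paper).
    Ties fall back to Counter's most_common ordering (an assumption)."""
    return Counter(annotations).most_common(1)[0][0]
```

For example, if three of four annotators pick the laughing emoji for a tweet, that emoji becomes the human label compared against the gold standard.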
5 Conclusions

Emojis are used extensively in social media; however, little is known about their use and semantics, especially because emojis are used differently over different communities (Barbieri et al., 2016a; Barbieri et al., 2016b). In this paper, we provide a neural architecture to model the semantics of emojis, exploring the relation between words and emojis. We proposed for the first time an automatic method to, given a tweet, predict the most probable emoji associated with it. We showed that the LSTMs outperform humans on the same emoji prediction task, suggesting that automatic systems are better at generalizing the usage of emojis than humans. Moreover, the good accuracy of the LSTMs suggests that there is an important and unique relation between sequences of words and emojis.

As future work, we plan to make the model able to predict more than one emoji per tweet, and to explore the position of the emoji in the tweet, as close words can be an important clue for the emoji prediction task.
Acknowledgments
We thank the three reviewers for their time and their useful suggestions. The first and third authors acknowledge support from the TUNER project (TIN2015-65308-C5-5-R, MINECO/FEDER, UE) and the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

References

[Aoki and Uchida2011] Sho Aoki and Osamu Uchida. 2011. A method for automatically generating the emotional vectors of emoticons using weblog articles. In Proceedings of the 10th WSEAS International Conference on Applied Computer and Applied Computational Science, pages 132–136, Stevens Point, Wisconsin, USA, September.

[Ballesteros et al.2015] Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2015. Improved transition-based parsing by modeling characters instead of words with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 349–359, Lisbon, Portugal, September. Association for Computational Linguistics.

[Barbieri et al.2016a] Francesco Barbieri, Luis Espinosa Anke, and Horacio Saggion. 2016a. Revealing patterns of Twitter emoji usage in Barcelona and Madrid. In 19th International Conference of the Catalan Association for Artificial Intelligence, pages 326–332, Barcelona, Spain, December.

[Barbieri et al.2016b] Francesco Barbieri, German Kruszewski, Francesco Ronzano, and Horacio Saggion. 2016b. How cosmopolitan are emojis? Exploring emojis usage and meaning over different languages with distributional semantics. In Proceedings of the 2016 ACM on Multimedia Conference, pages 531–535, Amsterdam, Netherlands, October. ACM.

[Barbieri et al.2016c] Francesco Barbieri, Francesco Ronzano, and Horacio Saggion. 2016c. What does this emoji mean? A vector space skip-gram model for Twitter emojis. In Language Resources and Evaluation Conference, LREC, pages 526–534, Portoroz, Slovenia, May.

[Blei2012] David M. Blei. 2012. Probabilistic topic models. Communications of the ACM, 55(4):77–84, April.

[Chung et al.2014] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

[Davidov et al.2010] Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107–116, Uppsala, Sweden, July. Association for Computational Linguistics.

[Dhingra et al.2016] Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, and William Cohen. 2016. Tweet2Vec: Character-based distributed representations for social media. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 269–274, Berlin, Germany, August. Association for Computational Linguistics.

[Dyer et al.2015] Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 334–343, Beijing, China, July. Association for Computational Linguistics.

[Dzmitry et al.2014] Bahdanau Dzmitry, Cho Kyunghyun, and Bengio Yoshua. 2014. Neural machine translation by jointly learning to align and translate. In Proceedings of the Third International Conference on Learning Representations, Toulon, France, May.

[Eisner et al.2016] Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bosnjak, and Sebastian Riedel. 2016. emoji2vec: Learning emoji representations from their description. In Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media, pages 48–54, Austin, TX, USA, November. Association for Computational Linguistics.

[Graves and Schmidhuber2005] Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, July.

[Hochreiter and Schmidhuber1997] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

[Lample et al.2016] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California, June. Association for Computational Linguistics.

[Ling et al.2015a] Wang Ling, Chris Dyer, Alan W Black, and Isabel Trancoso. 2015a. Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1299–1304, Denver, Colorado, May–June. Association for Computational Linguistics.

[Ling et al.2015b] Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luis Marujo, and Tiago Luis. 2015b. Finding function in form: Compositional character models for open vocabulary word representation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1520–1530, Lisbon, Portugal, September. Association for Computational Linguistics.

[Ljubešić and Fišer2016] Nikola Ljubešić and Darja Fišer. 2016. A global analysis of emoji usage. In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, pages 82–89, Berlin, Germany, August. Association for Computational Linguistics.

[Maas et al.2011] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June. Association for Computational Linguistics.

[Mikolov et al.2013] Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.

[Miller et al.2016] Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. 2016. "Blissfully happy" or ready to fight: Varying interpretations of emoji. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), pages 259–268, Cologne, Germany, July. AAAI.

[Novak et al.2015] Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of emojis. PLoS ONE, 10(12):e0144296.

[Titov and McDonald2008] Ivan Titov and Ryan McDonald. 2008. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th International Conference on World Wide Web, pages 111–120, Beijing, China, April. ACM.

[Vinyals et al.2015] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. Grammar as a foreign language. In Proceedings of the Conference on Neural Information Processing Systems, Montreal, Canada, December.

[Wallach2006] Hanna M. Wallach. 2006. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning, pages 977–984, Pittsburgh, USA, June. ACM.

[Wang et al.2016] Peilu Wang, Yao Qian, Frank K. Soong, Lei He, and Hai Zhao. 2016. Learning distributed word representations for bidirectional LSTM recurrent neural network. In