HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition
Avihay Chriqui, Inbal Yahav
Abstract
Sentiment analysis of user-generated content (UGC) can provide valuable information across numerous domains, including marketing, psychology, and public health. Currently, there are very few Hebrew models for natural language processing in general, and for sentiment analysis in particular; indeed, it is not straightforward to develop such models because Hebrew is a Morphologically Rich Language (MRL) with challenging characteristics. Moreover, the only available Hebrew sentiment analysis model, based on a recurrent neural network, was developed for polarity analysis (classifying text as "positive", "negative", or neutral) and was not used for detection of finer-grained emotions (e.g., anger, fear, joy). To address these gaps, this paper introduces HeBERT and HebEMO. HeBERT is a Transformer-based model for modern Hebrew text, which relies on a BERT (Bidirectional Encoder Representations from Transformers) architecture. BERT has been shown to outperform alternative architectures in sentiment analysis, and is suggested to be particularly appropriate for MRLs. Analyzing multiple BERT specifications, we find that while model complexity correlates with high performance on language tasks that aim to understand terms in a sentence, a more parsimonious model better captures the sentiment of an entire sentence. Notably, regardless of the complexity of the BERT specification, our BERT-based language model outperforms all existing Hebrew alternatives on all common language tasks. HebEMO is a tool that uses HeBERT to detect polarity and extract emotions from Hebrew UGC. HebEMO is trained on a unique Covid-19-related UGC dataset that we collected and annotated for this study. Data collection and annotation followed an active learning procedure that aimed to maximize predictability. We show that HebEMO yields a high F1-score of 0.96 for polarity classification. Emotion detection reaches F1-scores of 0.78-0.97 for various target emotions, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results are better than the best-reported performance, even among English-language models of emotion detection.
Preprint submitted to XX, December 2020

1. Introduction

Sentiment analysis, also referred to as opinion mining or subjectivity analysis (Liu and Zhang 2012), is probably one of the most common tasks in natural language processing (NLP) (Liu 2012, Zhang et al. 2018). The goal of sentiment analysis is to systematically extract, from written text, what people think or feel toward entities such as products, services, individuals, events, news articles, and topics. Sentiment analysis includes multiple types of tasks, one of the most common being polarity classification: the binning of overall sentiment into the three categories of positive, neutral, or negative. Another prominent sentiment analysis task is emotion detection - a process for extracting finer-grained emotions such as happiness, anger, and fear from human language. These emotions, in turn, can shed light on individuals' beliefs, behaviors, or mental states. Both polarity classification and emotion detection have proven to yield valuable information in diverse applications. Research in marketing, for example, has shown that emotions that users express in online product reviews affect products' virality and profitability (Chitturi et al. 2007, Ullah et al. 2016, Adamopoulos et al. 2018). In finance, Bellstam et al. (2020) extracted sentiments from financial analysts' textual descriptions of firm activities, and used those sentiments to measure corporate innovation. In psychology, sentiment analysis has been used to detect distress in psychotherapy patients (Shapira et al. 2020), and to identify specific emotions that might be indicative of suicidal intentions (Desmet and Hoste 2013). Notably, recent studies suggest that the capacity to identify certain emotions (e.g., fear or distress) can contribute towards the understanding of individuals' behaviors and mental health in the Covid-19 pandemic (Ahorsu et al.
2020, Pfefferbaum and North 2020). The literature offers a considerable number of methods and models for sentiment analysis, with a strong bias towards polarity detection. Models for emotion detection, though less common, are also accessible to the research community in multiple languages. As yet, however, emotion detection models do not support the Hebrew language. In fact, to our knowledge, only one study thus far has developed a Hebrew-language model for sentiment analysis of any kind (specifically, polarity classification; Amram et al. (2018)). Notably, existing sentiment analysis methods developed for other languages are not easily adjustable to Hebrew, due to unique linguistic and cultural features of this language. A key challenge in the development of Hebrew-language sentiment analysis tools relates to the fact that Hebrew is a Morphologically Rich Language (MRL), defined as a language "in which significant information concerning syntactic units and relations is expressed at word-level" (Tsarfaty et al. 2010). In Hebrew, as in other MRLs (e.g., Arabic), grammatical relations between words are expressed via the addition of affixes (suffixes, prefixes), instead of the addition of particles. Moreover, the word order in Hebrew sentences is rather flexible. Many words have multiple meanings, which change depending on context. Further, written Hebrew contains vocalization diacritics, known as
Niqqud ("dots"), which are missing in non-formal scripts; other Hebrew characters represent some, but not all, of the vowels. Thus, it is common for words that are pronounced differently to be written in the same way. These unique characteristics of Hebrew pose a challenge in developing appropriate Hebrew NLP models. Architectural choices should be made with care, to ensure that the features of the language are well represented. The current best practice for Hebrew NLP is the use of the multilingual BERT model (mBERT, based on the BERT [Bidirectional Encoder Representations from Transformers] architecture, discussed further below; Devlin et al. (2018)), which was trained on a small-sized Hebrew dictionary. When tested on Arabic (the closest language to Hebrew), mBERT was shown to have significantly lower performance than a language-specific BERT model on multiple language tasks (Antoun et al. 2020). This paper achieves two main goals related to the development of Hebrew-language sentiment analysis capabilities. First, we pre-train a language model for modern Hebrew, called
HeBERT, which can be implemented in diverse NLP tasks, and is expected to be particularly appropriate for sentiment analysis (as compared with alternative model architectures). HeBERT is based on the well-established BERT architecture (Devlin et al. 2018); the latter was originally trained for the unsupervised fill-in-the-blank task (known as Masked Language Modeling, or MLM; Fedus et al. (2018)). We train HeBERT on two large-scale Hebrew corpora - Hebrew Wikipedia and OSCAR (Open Super-large Crawled ALMAnaCH corpus, a huge multilingual corpus based on open web crawl data; Ortiz Suárez et al. (2020)). We then evaluate HeBERT's performance on five key NLP tasks, namely, fill-in-the-blank, out-of-vocabulary (OOV), Named Entity Recognition (NER), Part of Speech (POS) tagging, and sentiment (polarity) analysis. We examine several architectural choices for our model and put forward and test hypotheses regarding their relative performance, ultimately selecting the best-performing option. Specifically, we show that while model complexity correlates with high performance on language tasks that aim to understand terms in a sentence, a more parsimonious model better captures the sentiment of an entire sentence. Second, we develop a tool to detect sentiments - specifically, polarity and emotions - from user-generated content (UGC). Our sentiment detector, called
HebEMO, is based on HeBERT and operates on a document level. We apply HebEMO to user-generated comments, from three major news sites in Israel, that were posted in response to Covid-19-related articles during 2020. We chose this dataset on the basis of findings that the Covid-19 pandemic intensified emotions in multiple communities (Pedrosa et al. 2020), suggesting that online discourse regarding the pandemic is likely to be highly emotional. Comments were selected for annotation following an innovative semi-supervised iterative labeling approach that aimed to maximize predictability. We show that HebEMO achieves a high performance of weighted average F1-score = 0.96 for polarity classification. Emotion detection reaches F1-scores of 0.8-0.97 for the various target emotions, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results are better than the best reported performance, even when compared to English-language models for emotion detection (Ghanbari-Adivi and Mosleh 2019, Mohammad et al. 2018). The remainder of this paper is organized as follows. In the next section, we provide a brief overview of the state of the art in sentiment analysis in general and emotion recognition in particular; we also briefly discuss considerations that must be taken into account when developing pre-trained language models for sentiment analysis. Next, we present HeBERT, our language model, elaborating on how we address some of the unique challenges associated with the Hebrew language. We subsequently describe HebEMO and evaluate its performance on our UGC data.
2. Background
Psychologists and psychoanalysts have long known that, despite the importance of non-verbal behavior, words are the most natural way to externally express an inner emotional world (Ortony et al. 1987). In line with this premise, theories of emotions stress that emotional experience and its intensity can be inferred from spoken or written language (Argaman 2010). Yet, emotions vary across cultures (Rosaldo et al. 1984), and, consequently, languages differ in the degree of emotionality they convey and in the ways in which emotions are expressed in words (Wierzbicka 1994, Kövecses 2003). In particular, as noted by Kövecses (2003), the verbalization of emotions commonly relies on the use of metaphorical and metonymic expressions, which may differ across languages. Religion is another source of variation in emotional experience and its associated expression (Kim-Prieto and Diener 2009). One study showed how the moral system of a culture - and specifically, a Middle Eastern culture - can be linked to certain types of emotions, and suggested that differences in culturally dominant emotions can play a decisive role in cultural clashes (Fattah and Fierke 2009). The above discussion implies that emotion detection tools that are implemented in one language might not be easily transferable to other languages, particularly languages that are culturally distant. Accordingly, sentiment analysis tools must be tailored to specific language models in order to provide informative results. The current paper proposes such a tool for the Hebrew language - one that takes into account specific linguistic challenges associated with Hebrew, elaborated in subsequent sections.
Many studies offer comprehensive overviews of common sentiment analysis methods (e.g., Liu et al. 2019, Hemmatian and Sohrabi 2019, Yue et al. 2019, Yadav and Vishwakarma 2020). We present here the main points, with an emphasis on models that form the basis of this study. Most of the models described below were developed primarily for polarity analysis; however, as noted in the following subsection, the architectures are applicable to other sentiment analysis tasks such as emotion detection. Current reviews on sentiment analysis tend to categorize the various approaches according to the granularity level of text that they accommodate (Liu et al. 2019): document level, that is, evaluating whether an entire document expresses a particular type of sentiment (e.g., positive or negative); sentence level - assigning a sentiment to each sentence in the document separately; and aspect level, that is, assigning sentiment to each "aspect" discussed in the text. The latter requires a pre-processing step to extract aspects from a written text. In this paper we follow a document-level approach, elaborated further below. Sentiment classification approaches can further be categorized according to their underlying methodologies. The first, and perhaps the most popular, methodology is the lexicon-based approach. Based on the theory of emotions, this approach uses sentiment terms to score emotions in an input text.
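The lexicon-based approach can be sketched in a few lines: every token found in an emotion lexicon votes for its emotion categories. The miniature English lexicon below is invented purely for illustration and is far smaller than any psychometrically validated dictionary.

```python
from collections import Counter

# Toy emotion lexicon (invented for illustration only).
EMO_LEXICON = {
    "love": {"joy", "positive"},
    "great": {"joy", "positive"},
    "afraid": {"fear", "negative"},
    "terrible": {"negative"},
}

def score_text(text: str) -> Counter:
    """Count lexicon hits per emotion category in a whitespace-tokenized text."""
    scores = Counter()
    for token in text.lower().split():
        for category in EMO_LEXICON.get(token, ()):
            scores[category] += 1
    return scores

print(score_text("I love this great product"))
```

Note that such a scorer treats every occurrence of a term identically, which is exactly the context-blindness (sarcasm, ambiguity, idioms) discussed next.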
Linguistic inquiry and word count (LIWC), for example, is a popular software program that was developed to assess (among other features) emotions in text, using a psychometrically validated internal dictionary (Pennebaker et al. 2001). The main advantage of the lexicon-based approach is that it is unsupervised, meaning that it can be applied without any training or labeled data (Yue et al. 2019). The main limitation of this approach is that it does not account for the context of terms in the lexicon, and thus overlooks complex linguistic features such as sarcasm, ambiguity, and idioms (Liu 2012). Accordingly, its accuracy is fairly low compared with the alternative approaches. The second sentiment classification approach is Deep Learning (DL)-based. DL approaches are supervised methods that are based on multiple-layer neural networks. DL-based sentiment classification models differ by their network architecture. Common architectures include the following: (1)
Convolutional Neural Networks (CNNs), which transform a structured input layer (e.g., sentences or documents represented as bag-of-words or word-embedding vectors), via convolutional layers, into a sentiment class (Kim 2014); (2)
Recursive or Recurrent Neural Networks (RNNs), which handle unstructured sequential data, such as textual sentences, and learn the relations between the sequential elements (Dong et al. 2014); and (3)
Long Short-Term Memory (LSTM), a popular variant of RNN, which can catch long-term dependencies between data segments, in one direction (e.g., left to right) or in both (denoted bidirectional LSTM, or
BiLSTM architecture) (Hochreiter and Schmidhuber 1997). In a recent paper, Amram et al. (2018) raised the question of the relationship between the characteristics of a language and the DL architectural choices of a sentiment classifier. They analyzed this question for the morphologically rich Hebrew language. Specifically, they compared the performance of CNN and BiLSTM architectures on a polarity classification task. They assumed that the latter method would implicitly capture main morphological signatures, and thus outperform the former. Interestingly, and in contrast to findings in English sentiment analysis (Yin et al. 2017, Acheampong et al. 2020), they found that CNN yielded overall better performance (accuracy = 0.89) than BiLSTM, even when the latter was trained on morphologically segmented inputs. As far as we know, this is the only paper that developed and evaluated a sentiment analysis model for the Hebrew language. The last sentiment classification method, which we adopt in this paper, is the transfer learning-based approach. Transfer learning is the act of carrying knowledge gained from one problem and applying it to another, similar problem (Pan and Yang 2009). In NLP, transfer learning is implemented via
Transformers (Tay et al. 2020). Similarly to RNNs, Transformers use a DL approach to process sequential data. The primary advantage of the Transformer is its unique attention mechanism, which eliminates the need to process data in order, and allows for parallelization (Vaswani et al. 2017). With Transformers, a target language is first algorithmically learned, irrespective of the target language task (e.g., sentiment analysis task). To this end, a language model is trained on a pre-selected unsupervised NLP task (see Section 3 for details). Then the language model is transferred to the target task. This process is called fine-tuning. Various pre-trained language models have been used in transfer learning for NLP; these include fastText (Joulin et al. 2016), ELMo (Embeddings from Language Models, based on forward and backward LSTMs) (Peters et al. 2018), GPT (Generative Pre-trained Transformer) (Radford et al. 2018), and BERT (Devlin et al. 2018). Of these, BERT is one of the most common Transformer models for NLP. For sentiment analysis tasks, BERT models - and Transformer models in general - are widely used and produce the best results compared with alternatives (Zampieri et al. 2019, Patwa et al. 2020). For the Hebrew language, the only BERT model available is mBERT (Devlin et al. 2018), which was trained on a small-sized Hebrew dictionary (about 2000 tokens). Notably, for the Arabic language, which is the closest MRL to Hebrew, Antoun et al. (2020) showed that a pre-trained Arabic BERT model achieved better performance on polarity analysis than did any other architecture (an improvement of 1% to 6% in accuracy). The Arabic-specific model also achieved better performance compared with mBERT.
Emotion recognition is a sub-task in sentiment analysis that offers a finer sentiment granularity compared with polarity analysis. Two definitions of human emotions dominate the NLP literature, with no clear preference between them (Kratzwald et al. 2018). The first definition, based on a theory developed by Ekman (1999), considers emotions as distinct categories, meaning that each emotion differs from the others in important ways rather than simply in intensity. Ekman (1999) identified six basic emotions, consistent across cultures, that fit facial expressions: anger, disgust, fear, happiness, sadness and surprise. The second definition is based on a theory by Plutchik (1980), who stressed that emotions can be treated as dimensional constructs, and that there are relations between occurrences and intensities of basic emotions. In particular, Plutchik (1980) defined a "wheel" comprising four polar pairs of basic emotions: joy-sadness, anger-fear, trust-disgust, and surprise-anticipation. Combinations of dyads or triads of emotions define another set of 56 emotions. For example, envy is a combination of sadness and anger. This wheel serves as the theoretical basis of common automated emotion detection algorithms (Medhat et al. 2014). Notably, for the purpose of emotion detection, the two conceptualizations of emotion are generally compatible with each other, as they agree on the set of emotions defined as "basic" emotions. Though common, emotion recognition is not as widespread as polarity analysis, and it is considered more challenging (Acheampong et al. 2020). A key challenge is that, whereas any text can be classified according to its polarity, not all texts contain emotions, and thus it is harder to infer emotions via a lexicon-based approach. This challenge is further compounded by the fact that labeled data are commonly not available.
Further, existing datasets are rather imbalanced. Naturally, the lack of data availability is more severe in non-English languages (Ahmad et al. 2020). In general, the emotion detection task is treated as a multi-label classification task, and models for emotion recognition are similar in architecture to polarity detection models. Recent research has shown that, in emotion detection tasks, pre-trained BiLSTM architectures provide advantages over CNN and unidirectional RNN models (Acheampong et al. 2020), and that Transformers are preferable to other DL approaches (Chatterjee et al. 2019, Zhong et al. 2019). For example, in a recent SemEval competition (Chatterjee et al. 2019) that included an emotion detection task for three emotions (angry, happy, sad), Transformer-based models were shown to give the best performance (performance ranges: F1-score = 0.75-0.8; precision = 0.78-0.85; recall = 0.78-0.85).
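The multi-label formulation can be sketched minimally: one independent sigmoid output per emotion, thresholded into a label set, so a single document can express several emotions at once. The emotion list and logit values below are illustrative, not learned parameters.

```python
import math

# Illustrative emotion inventory (a subset of the basic emotions).
EMOTIONS = ["anger", "fear", "joy", "sadness"]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_emotions(logits, threshold=0.5):
    """Turn one real-valued logit per emotion into a set of predicted labels.

    Unlike softmax-based (single-label) classification, each emotion is
    decided independently, which is what makes the task multi-label.
    """
    return {e for e, z in zip(EMOTIONS, logits) if sigmoid(z) >= threshold}

# A document may trigger several labels simultaneously:
print(predict_emotions([2.1, 0.3, -1.5, 1.0]))
```

In a Transformer-based classifier, the logits would come from a linear head on the encoder's pooled output; the thresholding step is the same.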
As noted above, transfer learning for polarity analysis and/or emotion recognition requires a pre-trained language model. To develop and train a language model, one needs to make the following three basic decisions:

1. Input representation (tokenization): What is the granularity of the tokens that are fed to the model? Common granularity levels include characters, n-gram-based sub-words (using the WordPiece algorithm (Schuster and Nakajima 2012)), morpheme-based sub-words, and full words (see Figure 1 for the differences between the approaches).

2. Architectural choices: What is the exact architecture and specification of the neural network?

3. Output: What is the (unsupervised) task that the model is trained on?
Figure 1: Input representation alternatives
Regarding input representation, the choice of representation affects the features that the language model is able to capture, and the training complexity. Character-based representation is better for learning word morphology, especially for low-frequency words and MRLs (Belinkov et al. 2017, Vania et al. 2018), but it comes with longer training time and a deeper architecture, compared with other representations (Bojanowski et al. 2015). Word-based representation, in turn, treats each word as a separate token, and thus is considered better for understanding semantics (Pota et al. 2019). With this representation, however, words that differ by prefix or suffix are considered different, necessitating storage of a very large vocabulary. Moreover, out-of-vocabulary (OOV) tokens are not represented. The intermediate option is to use a sub-word representation, which provides some balance between the character- and word-based representations; moreover, it overcomes the OOV problem associated with the word-based representation, and its vocabulary requirements are more manageable (Wu et al. 2016). With sub-words, words can be broken either into n-gram characters, or according to morphemes that have lingual meaning (but also higher computational costs). Previous literature has produced mixed results regarding the extent to which a morpheme-based approach can improve upon the n-gram-based approach (Bareket and Tsarfaty 2020). Recently, Klein and Tsarfaty (2020) showed that sub-word splitting in the multilingual BERT model (mBERT, Devlin et al. (2018)) is sub-optimal for capturing morphological information. For the question of architecture selection, Devlin et al. (2018) and Radford et al. (2019) showed that for similar model size, BERT outperforms other architectures such as GPT and ELMo on sentiment tasks. With respect to the model output, there are two tasks on which a model can be trained.
The first is predict-the-future, meaning that the model is trained to predict the last token of a sentence. This task accounts for uni-directional contexts only. The second is the fill-in-the-blank task, where the model is trained to fill in a missing token within a sentence. This task takes into account the full (bi-directional) sentence context, and is able to better capture the meanings of tokens, both syntactically and semantically (Devlin et al. 2018). Recently, Levine et al. (2020) offered a method to optimize these tasks, called
Pointwise Mutual Information (PMI) masking. The authors suggested that instead of filling in a single random token, the model should be trained to fill in a set of tokens that carry mutual information.
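The notion of token pairs that "carry mutual information" can be made concrete by scoring adjacent pairs with pointwise mutual information. The sketch below uses a tiny made-up corpus; actual PMI masking computes such statistics at corpus scale and masks whole high-PMI spans rather than single tokens.

```python
import math
from collections import Counter

# Toy corpus (invented): "new york" is a collocation, "is big" is generic.
corpus = "new york is big . new york is far . the city is big .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(w1: str, w2: str) -> float:
    """Pointwise mutual information of an adjacent token pair:
    log p(w1, w2) / (p(w1) * p(w2))."""
    p_joint = bigrams[(w1, w2)] / n_bi
    p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
    return math.log(p_joint / (p1 * p2))

# The collocation scores higher than the generic pair, so it would be
# preferentially selected for joint masking:
assert pmi("new", "york") > pmi("is", "big")
```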
3. HeBERT: Language Model
In this section we develop an unsupervised Hebrew BERT model, which we will later fine-tune for the tasks of polarity analysis and emotion recognition.
We begin by addressing the three key modeling decisions outlined in the previous section - input representation (tokenization), architecture, and output - in the context of the Hebrew language. Recall that, as discussed in the introduction, Hebrew is an MRL with the following important characteristics: (i) grammatical relations in Hebrew are expressed via the addition of affixes; (ii) Hebrew sentences are nearly order-free; (iii) many Hebrew words have multiple meanings, which change depending on context; (iv) Hebrew contains vocalization diacritics that are missing in non-formal scripts, implying that words that are pronounced differently can be written in the same way. Bearing these features in mind, we first address the last two questions, of architectural choice and model output. As discussed in previous sections, BERT has been shown to outperform alternative architectures in sentiment analysis tasks (Radford et al. 2019); moreover, the literature offers evidence that BERT networks effectively capture linguistic information and phrase-level information (Jawahar et al. 2019), a necessary requirement for MRLs (Tsarfaty et al. 2020). Accordingly, we decided to use BERT as our base model, with the default architecture. For the output task, we used BERT's default fill-in-the-blank task.
Fill-in-the-blank has the advantage of understanding bi-directional context, which corresponds to the order-free property of Hebrew sentences. With respect to the input - the granularity of the tokens - the literature on MRLs, and Hebrew specifically, is inconclusive. Belinkov et al. (2017) and Vania et al. (2018) showed that character-based representation, which is becoming increasingly popular, is better than word-based representation for learning Hebrew morphology, especially for low-frequency words. For sentiment tasks, however, Amram et al. (2018) and Tsarfaty et al. (2020) showed that a word-based representation yields better predictions than a char-based representation. With regard to sub-word representations, Klein and Tsarfaty (2020) suggested (but did not verify) that, for BERT for Hebrew, morpheme-based sub-words are likely to be preferable to n-gram-based sub-words. A similar argument was made for Arabic, which is the closest MRL to Hebrew (Antoun et al. 2020). To understand what causes differences in findings between different researchers, consider the following three examples:

1. First is the word NA'AL. NA'AL can be translated as either locked (e.g., he locked the door), a shoe, or the past singular tense of the verb to wear (a shoe). It is also often used as a slang term for stupid. The actual semantic meaning of NA'AL in a sentence is derived from the context. In that respect, a high-level text granularity (such as a word-based representation) might be the preferable choice for representing Hebrew, as it is better at capturing semantic meanings in context (Pota et al. 2019).

2. Next is the word NA'ALO, which is an inflection of the word NA'AL with the suffix "O". NA'ALO can refer to either "his shoe" or "locked it". In that respect, a finer text granularity, such as char-based, which is better at learning morphology, might be preferred.

3. Finally, consider the splitting of the word NA'ALO. Here, a meaningful splitting would be NA'AL-O. However, such a splitting can only be achieved with morpheme-based sub-words, using a tool such as YAP (Yet Another Parser, by More et al. (2019)). The alternative, n-gram-based sub-words, will result in additional splitting, which might have lower semantic meaning than morpheme-based sub-words, yet higher robustness to OOV.

Given the above discussion, we hypothesize that sub-word representations (n-gram- or morpheme-based), which balance semantic meaning with morphology, will best capture the features of the Hebrew language, and will yield better performance on various language tasks, as compared with character-based and word-based representations. Comparing n-gram-based sub-words with morpheme-based sub-words, we expect the latter to have an advantage on token-level tasks that require a good "understanding" of the language features; yet, a morpheme-based representation might not have such an advantage in document-level downstream tasks. To examine our hypothesis, we first train and evaluate multiple small-size BERT models that differ by the granularity of the input. We then choose the best-performing architecture, and re-train the model on a much larger corpus.
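The sub-word trade-off in the examples above can be made concrete with a greedy longest-match tokenizer in the style of WordPiece. The toy vocabulary below uses the transliterated forms from the examples; real BERT vocabularies are learned from the corpus, so the exact splits would differ.

```python
# Toy vocabulary: a stem plus suffix pieces ("##" marks a word-internal piece).
VOCAB = {"na'al", "##o", "##im", "shoe", "door"}

def wordpiece(word: str):
    """Split a word greedily into the longest matching vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in VOCAB:
                pieces.append(piece)
                break
            end -= 1
        if end == start:  # no piece matched: the word is unrepresentable
            return ["[UNK]"]
        start = end
    return pieces

# The inflected form, absent from the vocabulary, still decomposes into
# stem + suffix instead of becoming an unknown token:
print(wordpiece("na'alo"))
```

Because pieces are reused across inflections, the OOV problem of a word-level vocabulary largely disappears, at the cost of splits that need not align with true morpheme boundaries.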
We examine five alternative text representations: char-based; two n-gram-based sub-word representations, which differ in the total vocabulary size (30K tokens vs. 50K tokens); a morpheme-based sub-word representation; and a word-based representation, which considers all words in the corpus, after trimming terms in the lowest 5th quantile according to their term frequency (vocabulary size of over 53K tokens). To compare between the input alternatives, we first train small-sized base-BERTs on a Hebrew Wikipedia dump (as of September 2013; retrieved from https://u.cs.biu.ac.il/~yogo/hebwiki/; the dataset includes over 63 million words and 3.8 million sentences). Our working assumption is that the performance of a small-sized BERT is monotonic with the model's performance when trained on a larger corpus with the same parameters, yet requires significantly fewer resources. We evaluate the models' performances on two common unsupervised language tasks and on three downstream tasks:
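The frequency-based trimming used for the word-based vocabulary can be sketched as follows. One plausible reading of "lowest 5th quantile" is dropping the bottom 5% of distinct terms ranked by count; the function below implements that reading and is an interpretation, not the authors' exact procedure.

```python
from collections import Counter

def trim_vocabulary(tokens, quantile=0.05):
    """Drop the rarest `quantile` fraction of distinct terms by frequency."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=counts.get)  # rarest terms first
    n_drop = int(len(ranked) * quantile)
    return set(ranked[n_drop:])

# With quantile=0.5 on a 3-term toy corpus, the single rarest term is dropped:
print(trim_vocabulary("a a a b b c".split(), quantile=0.5))
```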
1. Unsupervised language tasks:

(a) Fill-in-the-blank - the ability to fill in a missing token; tested on a newspaper article and a fairy-tale dataset. Performance was measured with sequence perplexity (PP(W)) - a common measure of the ability of a language model to evaluate the correctness of sentences in a sample set. The perplexity of a sequence W with N tokens (W = {w_1, w_2, ..., w_N}) is calculated as the exponentiated average negative log-likelihood of the sequence: PP(W) = exp{-(1/N) * sum_{i=1}^{N} log p_theta(w_i | w_{<i})}.
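Perplexity can be computed directly from the per-token conditional probabilities p_theta(w_i | w_<i); the probability values below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """PP(W) = exp(-(1/N) * sum_i log p_i) for per-token probabilities p_i."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A model assigning each of four tokens probability 0.25 has perplexity 4,
# i.e., it is as uncertain as a uniform choice among 4 tokens:
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model finds the evaluation sentences more probable, which is why it is used here to compare the fill-in-the-blank ability of the candidate models.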
In line with the specifications outlined above, we trained a large-size BERT on both the Wikipedia corpus and the OSCAR corpus (Ortiz Suárez et al. 2020), with a small-size n-gram-based sub-word dictionary. For the Hebrew language, OSCAR contains a corpus of size 9.8 GB, including 1 billion words and over 20.8 million sentences (after de-duplicating the original data). We used a PyTorch implementation of Transformers in Python (Wolf et al. 2020) to train a base-BERT network for 4 epochs, with learning rate = 5e-5, using the Adam optimizer in batches of 128 sentences each. The performance of the final model is reported in Table 2, and compared to the performance of (i) the (non-BERT) models reported in Amram et al. (2018), More et al. (2019), and Bareket and Tsarfaty (2020), the only other models developed for NLP tasks in Hebrew (denoted SOTA, or "state of the art"); and (ii) mBERT.

Task          Fill-in-the-blank   OOV    NER          POS          Polarity analysis
Metric        (Perplexity)        (%)    (F-1 score)  (F-1 score)  (F-1 score)
HeBERT        3.24                ∼
Current SOTA  (Not reported)      8%                               0.84
Table 2: HeBERT performance, compared to alternative models. Best results for each task are in bold.
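The pre-training setup reported above (4 epochs, learning rate 5e-5, Adam optimizer, batches of 128 sentences) can be collected as a configuration fragment. Key names loosely follow the HuggingFace `TrainingArguments` convention; this is a sketch of the reported settings, not the authors' actual training script.

```python
# Pre-training hyperparameters as reported in the text (sketch only).
PRETRAIN_CONFIG = {
    "num_train_epochs": 4,
    "learning_rate": 5e-5,
    "optimizer": "adam",
    "per_device_train_batch_size": 128,  # sentences per batch
}
```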
The results show that while mBERT outperformed HeBERT on an unsupervised task (fill-in-the-blank), HeBERT performed better on supervised tasks, even when compared to the current SOTA. Of note, mBERT contains only 2,000 tokens in Hebrew (compared to 30K in HeBERT). HeBERT's higher performance in supervised tasks is thus not surprising.
4. HebEMO: A Model for Polarity Analysis and Emotion Recognition
In this section we develop HebEMO - a model for sentiment analysis, including polarity analysis and emotion recognition. HebEMO, which is based on HeBERT, predicts sentiments at a document level; as elaborated in what follows, in our case a "document" is a single user-generated comment on a news website. The development of the model is based on three main elements: (i) data collection; (ii) data annotation; and (iii) fine-tuning of HeBERT.
The data collected for this study were compiled from user comments that were posted to Israeli news websites in response to Covid-19-related articles during the Covid-19 pandemic (Jan-Dec 2020) - a highly emotional period (Pedrosa et al. 2020). Our selection of news sites was inspired by a 2016 statement by Israel's president, Reuven (Rubi) Rivlin, according to which Israeli society is composed of four equal-sized "tribes" which are culturally different (and hence might express emotions slightly differently); of these, three comprise Hebrew-speaking Jews - namely, secular, national-religious, and ultra-Orthodox ("Haredi") - and the fourth "tribe" is Israel's Arab population (Steiner 2016). Each group is represented in both politics and the media. Accordingly, we collected data from three popular Israeli news sites that, respectively, represent the three Hebrew-speaking "tribes". Specifically, our dataset contained all Covid-19-related articles from
Ynet, which is identified with the secular "tribe" (with a slight left-wing political leaning); Israel Hayom (translation: "Israel Today"), which is identified with the national-religious "tribe" (with a slight right-wing political leaning); and Be-Hadre Haredim (translation: "In Haredis' Rooms"), which represents the ultra-Orthodox group.

For each article, we collected the article's text, its date of publication, the section of the news site in which it was published (e.g., news, health, sports), the author, and the comments section. We excluded from the dataset comments that did not contain Hebrew words, as well as comments with fewer than 3 words. We further merged repeated consecutive characters (e.g., three or more identical punctuation symbols) and removed links and double spaces. The compiled corpus, summarized in Table 3, contained over half a million comments on 10,794 titles in various sections.

Table 3: Description of the collected data (by source and section)

Figure 2: Iterative annotation process

We annotated a total of 4,000 comments. Comments were selected for annotation following active learning principles (Li et al. 2012) to minimize the well-known imbalance problem in the emotion recognition literature (Acheampong et al. 2020). The annotation process we used is described below and illustrated in Figure 2.

Our iterative process was initialized in step 1 with a naive unsupervised lexicon-based approach. For this step, we Google-translated EmoLex: a freely available English-language polarity and emotion dictionary (Mohammad and Turney 2013). EmoLex contains a list of manually collected (via crowdsourcing) English words, each classified according to one or more of the eight basic emotions and two polarity values (positive and negative). We then used the translated dictionaries to score the entire set of lemmatized comments in our dataset. Lemmatization was achieved with UDPipe (Straka et al. 2016).

In step 2, given the initial sentiment scores generated in step 1, we selected a set of 150 comments, of which 75 had received the highest positive polarity scores and 75 had received the highest negative polarity scores. Similarly, for each of the eight emotions, we selected a set of 75 comments in which the emotion was highly expressed, and another 75 comments in which the emotion was not expressed. The resulting set, after removing duplicate comments, comprised a total of 1,500 initially labeled comments.

We then turned to Prolific, a trusted online labor and research platform, to manually re-annotate the 1,500 comments. Each comment was annotated by at least three distinct native Hebrew-speaking Prolific workers. Specifically, annotators were asked to rate each comment's polarity on a symmetric 5-point scale of {strongly negative, negative, neutral, positive, strongly positive}, and to rate the expression of each emotion in the comment on a polar 3-point scale of {not expressed (in the comment), expressed, strongly expressed}. The participants were given the context of the comment (i.e., the title of the news article on which the comment was posted). Each participant annotated 20 randomly selected comments.

The reliability of the workers' annotations was then computed with Krippendorff's alpha (Krippendorff 1970), a measure of inter-rater agreement. We measured reliability independently for each sentiment in a comment, using coarser sentiment scales of polarity = {positive, neutral, negative} and emotion = {expressed, not expressed}. For example, if two raters, i and j, rated the emotion "anger" in a comment c as L^i_{c,anger} = "expressed" and L^j_{c,anger} = "strongly expressed", we computed their mutual response as "agreement" (formally, the observed disagreement between the raters was δ(L^i_{c,anger}, L^j_{c,anger}) = 0).
If the ratings were L^i_{c,anger} = "expressed" (or "strongly expressed") and L^j_{c,anger} = "not expressed", we computed the raters' mutual response as "disagreement" (δ(L^i_{c,anger}, L^j_{c,anger}) = 1). We then excluded comments' sentiment annotations with a Krippendorff's alpha lower than 0.75.

In step 3, we trained an initial HeBERT-based (supervised) sentiment classifier (see details in Section 4.3) on the crowd-annotated data, and predicted polarity and emotion scores for the remainder of the corpus. We then repeated steps 2 and 3 until the performance of our classifier converged. Convergence occurred after three iterations, yielding a total of 4,000 partially labeled comments ("partially" meaning that the raters agreed on at least one sentiment). Tables 4 and 5 summarize the number of comments for each sentiment (polarity and emotion, respectively) for which there was high agreement among raters, and the percentage of the comments that express this sentiment. For example, the expression/non-expression of the emotion "anger" was labelled in 1,979 distinct comments; among these, "anger" was expressed in 78% of the comments, and in 22% it was not expressed.

Table 4: Summary of the polarity data
Table 5: Summary of the emotion data
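The coarsened agreement computation described above can be sketched as follows: a minimal implementation of Krippendorff's alpha for nominal data, assuming ratings are first collapsed to the coarse scales (e.g., "strongly expressed" → "expressed"); the rating data are hypothetical:

```python
from collections import Counter

# Collapse the 3-point emotion scale to the coarse 2-point scale
COARSEN = {"strongly expressed": "expressed",
           "expressed": "expressed",
           "not expressed": "not expressed"}

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal ratings.

    units: one list of ratings per comment (only comments with at least
    two ratings are pairable and contribute to the statistic).
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)          # total pairable values
    totals = Counter(v for u in units for v in u)
    # Observed disagreement: differing within-unit rating pairs
    d_obs = 0.0
    for u in units:
        m, cnt = len(u), Counter(u)
        disagreeing_pairs = m * m - sum(c * c for c in cnt.values())
        d_obs += disagreeing_pairs / (m - 1)
    d_obs /= n
    # Expected disagreement under the pooled marginal distribution
    d_exp = (n * n - sum(c * c for c in totals.values())) / (n * (n - 1))
    return 1.0 if d_exp == 0 else 1.0 - d_obs / d_exp

# Three raters per comment, collapsed to the coarse emotion scale:
ratings = [
    [COARSEN["strongly expressed"], COARSEN["expressed"], COARSEN["expressed"]],
    [COARSEN["not expressed"], COARSEN["not expressed"], COARSEN["expressed"]],
]
alpha = krippendorff_alpha_nominal(ratings)
```

A comment's sentiment annotation would then be excluded whenever its alpha falls below the 0.75 cutoff, mirroring the filtering step described above.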
Interestingly, though we attempted to balance the expression and non-expression of each sentiment in our labelled data, our raters had significantly lower agreement on positive sentiments - specifically, positive polarity, expression of happiness, surprise, and trust, and non-expression of anger and disgust. In line with the theory of Plutchik (1980), we observed high negative correlation between emotions that are located opposite each other in Plutchik's wheel of emotion, and positive correlation between closely related emotions (see Table 6). The final classification model was denoted HebEMO.

              Anger  Disgust  Anticipation  Fear  Joy   Sadness  Surprise  Trust  Polarity
Anger         1.00
Disgust       0.46   1.00
Anticipation  0.10   0.09     1.00
Fear          0.15   0.11     0.14          1.00
Joy           0.25   0.27     0.12          0.11  1.00
Sadness       0.21   0.16     0.13          0.28  0.12  1.00
Surprise      0.06   0.04     0.10          0.15  0.05  0.12     1.00
Trust         0.27   0.31     0.11          0.07  0.41  0.08     0.07      1.00
Polarity      0.47   0.44     0.11          0.09  0.36  0.14     0.05      0.40   1.00

Table 6: Pearson correlation scores among the emotions identified by human raters

4.3. Fine-Tuning of HeBERT: The Classification Model
We modeled our classification algorithm by fine-tuning HeBERT for a document-level classification task. Prediction probabilities were computed with a softmax activation function. We treated the polarity task as a multinomial problem with three classes (positive, neutral, negative); emotions were modeled as independent dichotomous classification tasks (expressed, not expressed), as multiple emotions can co-exist in a single comment. Attempts to merge emotion pairs (e.g., joy-sadness) into a single classification category yielded lower performance. To train and evaluate our model, we randomly partitioned the corpus into training (70%), validation (15%), and test (15%) sets. In order to avoid data leakage, the tokenization process (in HeBERT) was not trained on the UGC dataset. We repeated the training and evaluation process following a bootstrap approach with 50 samples (each generating a different data partition) and examined the stability of our results.
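The decision rule implied by this head design can be sketched as one three-way softmax head for polarity plus an independent binary (sigmoid) head per emotion. This is an illustrative simplification: the logits would come from HeBERT's fine-tuned classification heads, and the 0.5 decision threshold is our assumption:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

POLARITY_CLASSES = ["negative", "neutral", "positive"]

def predict(polarity_logits, emotion_logits, threshold=0.5):
    """Polarity: argmax of a 3-way softmax; emotions: independent binary heads."""
    probs = softmax(polarity_logits)
    polarity = POLARITY_CLASSES[probs.index(max(probs))]
    emotions = {emo: sigmoid(z) >= threshold
                for emo, z in emotion_logits.items()}
    return polarity, emotions
```

For example, `predict([0.1, 0.2, 2.0], {"anger": 3.0, "joy": -2.0})` yields `("positive", {"anger": True, "joy": False})`; several emotion heads can fire simultaneously, unlike the mutually exclusive polarity classes.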
5. Results
We applied HebEMO to our annotated dataset and examined its performance, as measured by precision, recall, F1-score, and overall accuracy of the expressed sentiment. Table 7 presents the performance of our model on the polarity task, and Table 8 presents the performance for emotion recognition. The weighted average performance across all sentiments is an F1-score of 0.931 and an overall accuracy of 0.91. With the exception of the emotion "surprise", the model's F1-scores and accuracies range between 0.78 and 0.97. These performance levels, as far as we know, exceed those of state-of-the-art English-language models for UGC emotion recognition (Ghanbari-Adivi and Mosleh 2019, Mohammad et al. 2018).

          Precision  Recall  F1-score
Positive  0.96       0.92    0.94
Neutral   0.83       0.56    0.67
Negative  0.97       0.99    0.98
Accuracy                     0.97

Table 7: HebEMO performance on the polarity task in the UGC data

              F1    Precision  Recall  Accuracy
Anger         0.97  0.97       0.97    0.95
Disgust       0.96  0.97       0.95    0.93
Anticipation  0.85  0.83       0.87    0.84
Fear          0.80  0.84       0.77    0.80
Joy           0.88  0.89       0.87    0.97
Sadness       0.84  0.83       0.84    0.79
Surprise      0.41  0.47       0.37    0.78
Trust         0.78  0.88       0.70    0.95

Table 8: HebEMO performance on the emotion detection task in the UGC data

The emotion "surprise" is known to be hard to detect. As mentioned in Zhou et al. (2020), the best reported F1-score for this emotion in English was found to be as low as 0.19 (Mohammad et al. 2018). In our dataset, the amount of labeled data for "surprise" - as well as for its opposing counterpart on the wheel of emotion, "anticipation" (Plutchik 1980) - was also the lowest among all emotions (see Table 5), implying that this pair poses a challenging labeling task even for human annotators.

Next, we re-trained HebEMO on the polarity data reported by Amram et al. (2018). Amram et al. (2018) collected comments that were written in response to official tweets posted by the Israeli president, Mr. Reuven Rivlin, between June and August, 2014 (a total of 12,804 Hebrew comments). The authors manually annotated the comments with the following labels - supportive (positive), criticizing (negative), or off-topic (neutral) - and published a partitioned dataset (training and validation) for the benefit of comparisons between language models.

The performance of our model is presented in Table 9, along with the improvement or deterioration in performance relative to the SOTA model reported in Amram et al. (2018). The results show that, in most respects, with the exception of off-topic precision, our model's performance exceeds that of the SOTA model. The improvement is significant at the 95% confidence level.

           Precision     Recall   F1-score
Positive   0.95 (+.03)   (+.01)   (+.01)
Negative   0.89 (+.05)   (+.02)   (+.04)
Off-topic  0.70 (-.3)    (+.55)   (+.03)
Accuracy                          0.93 (+.03)

Table 9: The performance of HebEMO when trained on the polarity corpus reported by Amram et al. (2018)

6. Summary and Discussion

This paper presented two new tools that contribute to the development of Hebrew-language sentiment analysis capabilities: (i)
HeBERT - the first Hebrew BERT model, and a new state-of-the-art model for multiple Hebrew NLP tasks; and (ii)
HebEMO - a tool for polarity analysis and emotion recognition from Hebrew UGC.

Although HeBERT was developed for the purpose of optimizing sentiment analysis, we showed that it outperforms mBERT in a variety of supervised language tasks. This finding is consistent with the literature proposing that language-specific models outperform multilingual models. HeBERT also showed better performance than the current (non-BERT) SOTA Hebrew-language model.

For the task of extracting sentiments from UGC, we showed that a morpheme-based model, which aims to "understand" features of the language, performed less well than a model that did not address the language features (n-gram-based sub-words). For the latter input representation, a smaller dictionary was better than a larger one. A plausible explanation for these results is that UGC contains unofficial language, including non-lexical words such as slang and typos. Over-fitting a model to the official language in this case may overlook the unique characteristics of the unofficial language. In future work we plan to examine the performance of HebEMO when HeBERT is trained on a PMI masking task, rather than fill-in-the-blank.

References
Acheampong FA, Wenyu C, Nunoo-Mensah H (2020) Text-based emotion detection: Advances, challenges, and opportunities. Engineering Reports e12189.
Adamopoulos P, Ghose A, Todri V (2018) The impact of user personality traits on word of mouth: Text-mining social media platforms. Information Systems Research.
Expert Systems with Applications.
International Journal of Mental Health and Addiction.
Amram A, David AB, Tsarfaty R (2018) Representations and architectures in neural sentiment analysis for morphologically rich languages: A case study from modern Hebrew. Proceedings of the 27th International Conference on Computational Linguistics, 2242–2252.
Antoun W, Baly F, Hajj H (2020) AraBERT: Transformer-based model for Arabic language understanding. arXiv preprint arXiv:2003.00104.
Argaman O (2010) Linguistic markers and emotional intensity. Journal of Psycholinguistic Research.
Bareket D, Tsarfaty R (2020) Neural modeling for named entities and morphology (NEMO²). arXiv preprint arXiv:2007.15620.
Belinkov Y, Durrani N, Dalvi F, Sajjad H, Glass J (2017) What do neural machine translation models learn about morphology? arXiv preprint arXiv:1704.03471.
Bellstam G, Bhagat S, Cookson JA (2020) A text-based analysis of corporate innovation. Management Science.
Bojanowski P, Joulin A, Mikolov T (2015) Alternative structures for character-level RNNs. arXiv preprint arXiv:1511.06303.
Chatterjee A, Narahari KN, Joshi M, Agrawal P (2019) SemEval-2019 task 3: EmoContext contextual emotion detection in text. Proceedings of the 13th International Workshop on Semantic Evaluation, 39–48.
Chitturi R, Raghunathan R, Mahajan V (2007) Form versus function: How the intensities of specific emotions evoked in functional versus hedonic trade-offs mediate product preferences. Journal of Marketing Research.
Expert Systems with Applications.
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Dong L, Wei F, Tan C, Tang D, Zhou M, Xu K (2014) Adaptive recursive neural network for target-dependent Twitter sentiment classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 49–54.
Ekman P (1999) Basic emotions. Handbook of Cognition and Emotion.
European Journal of International Relations.
arXiv preprint arXiv:1801.07736.
Ghanbari-Adivi F, Mosleh M (2019) Text emotion detection in social networks using a novel ensemble classifier based on Parzen tree estimator (TPE). Neural Computing and Applications.
Artificial Intelligence Review.
Neural Computation.
Joulin A, Grave E, Bojanowski P, Douze M, Jégou H, Mikolov T (2016) FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.
Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
Kim-Prieto C, Diener E (2009) Religion as a source of variation in the experience of positive and negative emotions. The Journal of Positive Psychology.
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 204–209.
Kövecses Z (2003) Metaphor and Emotion: Language, Culture, and Body in Human Feeling (Cambridge University Press).
Kratzwald B, Ilić S, Kraus M, Feuerriegel S, Prendinger H (2018) Deep learning for affective computing: Text-based emotion recognition in decision support. Decision Support Systems.
Krippendorff K (1970) Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement.
arXiv preprint arXiv:2010.01825.
Li S, Ju S, Zhou G, Lin X (2012) Active learning for imbalanced sentiment classification. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 139–148.
Liu B (2012) Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies.
Mining Text Data, 415–463 (Springer).
Liu R, Shi Y, Ji C, Jia M (2019) A survey of sentiment analysis based on transfer learning. IEEE Access.
Ain Shams Engineering Journal.
Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S (2018) SemEval-2018 task 1: Affect in tweets. Proceedings of the 12th International Workshop on Semantic Evaluation, 1–17.
Mohammad SM, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29(3):436–465.
Mordecai NB, Elhadad M (2005) Hebrew named entity recognition.
More A, Seker A, Basmova V, Tsarfaty R (2019) Joint transition-based models for morpho-syntactic parsing: Parsing strategies for MRLs and a case study from modern Hebrew. Transactions of the Association for Computational Linguistics.
Ortiz Suárez PJ, Romary L, Sagot B (2020) A monolingual approach to contextualized word embeddings for mid-resource languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1703–1714 (Online: Association for Computational Linguistics).
Ortony A, Clore GL, Foss MA (1987) The referential structure of the affective lexicon. Cognitive Science.
IEEE Transactions on Knowledge and Data Engineering.
Pedrosa AL, Bitencourt L, Fróes ACF, Cazumbá MLB, Campos RGB, de Brito SBCS, e Silva ACS (2020) Emotional, behavioral, and psychological impact of the Covid-19 pandemic. Frontiers in Psychology.
(Mahwah: Lawrence Erlbaum Associates).
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
Pfefferbaum B, North CS (2020) Mental health and the Covid-19 pandemic. New England Journal of Medicine.
Plutchik R (1980) A general psychoevolutionary theory of emotion. Theories of Emotion, 3–33 (Elsevier).
Pota M, Marulli F, Esposito M, De Pietro G, Fujita H (2019) Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched word embeddings. Knowledge-Based Systems.
OpenAI Blog.
Schuster M, Nakajima K (2012) Japanese and Korean voice search. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5149–5152 (IEEE).
Shapira N, Lazarus G, Goldberg Y, Gilboa-Schechtman E, Tuval-Mashiach R, Juravski D, Atzil-Slonim D (2020) Using computerized text analysis to examine associations between linguistic features and clients' distress during psychotherapy. Journal of Counseling Psychology.
Sima'an K, Itai A, Winter Y, Altman A, Nativ N (2001) Building a tree-bank of modern Hebrew text. Traitement Automatique des Langues.
Straka M, Hajič J, Straková J (2016) UDPipe: Trainable pipeline for processing CoNLL-U files performing tokenization, morphological analysis, POS tagging and parsing. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), 4290–4297.
Tay Y, Dehghani M, Bahri D, Metzler D (2020) Efficient transformers: A survey. arXiv preprint arXiv:2009.06732.
Tsarfaty R, Bareket D, Klein S, Seker A (2020) From SPMRL to NMRL: What did we learn (and unlearn) in a decade of parsing morphologically-rich languages (MRLs)? arXiv preprint arXiv:2005.01330.
Tsarfaty R, Seddah D, Goldberg Y, Kübler S, Versley Y, Candito M, Foster J, Rehbein I, Tounsi L (2010) Statistical parsing of morphologically rich languages (SPMRL): What, how and whither. Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, 1–12.
Ullah R, Amblee N, Kim W, Lee H (2016) From valence to emotions: Exploring the distribution of emotions in online product reviews. Decision Support Systems.
arXiv preprint arXiv:1808.09180.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Advances in Neural Information Processing Systems, 5998–6008.
Wierzbicka A (1994) Emotion, language, and cultural scripts.
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM (2020) Transformers: State-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45 (Online: Association for Computational Linguistics).
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et al. (2016) Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: A review. Artificial Intelligence Review.
Yin W, Kann K, Yu M, Schütze H (2017) Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923.
Yue L, Chen W, Li X, Zuo W, Yin M (2019) A survey of sentiment analysis in social media. Knowledge and Information Systems.
arXiv preprint arXiv:1903.08983.
Zhang L, Wang S, Liu B (2018) Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
arXiv preprint arXiv:1909.10681.
Zhou D, Wu S, Wang Q, Xie J, Tu Z, Li M (2020) Emotion classification by jointly learning to lexiconize and classify. Proceedings of the 28th International Conference on Computational Linguistics, 3235–3245.