Subjective Sentiment Analysis for Arabic Newswire Comments
Journal of Digital Information Management, Volume 17, Number 5, October 2019
ABSTRACT:
This paper presents an approach based on supervised machine learning methods to discriminate between positive, negative, and neutral Arabic reviews in online newswire. The corpus is labeled for subjectivity and sentiment analysis (SSA) at the sentence level. With the goal of extracting users' sentiment from written text, the model uses both count and TF-IDF representations and applies six machine learning algorithms: Multinomial Naïve Bayes, Support Vector Machines (SVM), Random Forest, Logistic Regression, Multi-layer Perceptron, and k-nearest neighbors, using uni-gram and bi-gram features. Experimental results showed that n-gram features can substantially improve performance, and that the Multinomial Naïve Bayes approach is the most accurate in predicting topic polarity. The best results were achieved using count vectors trained on a combination of word-based uni-grams and bi-grams, with an overall accuracy of 85.57% over two classes and 65.64% over three classes.
Subject Categories and Descriptors:
H.3.1 [Content Analysis and Indexing]; I.2.7 [Natural Language Processing]: Discourse, Text analysis; I.7 [Document and Text Processing]: Document Preparation, Document and Text Editing
General Terms:
Data Mining, Sentiment Analysis, Arabic Opinion Processing, Arabic Text Mining
Keywords:
Arabic Sentiment Analysis, Opinion Mining, Natural Language Processing, Machine Learning, Social Media
Received:
Review Metrics:
Review Scale- 0/6, Review Score-4.85, Inter-reviewer Consistency- 82%
DOI:
1. Introduction
With the explosion of communication technologies and the accompanying pervasive use of social media, we notice an outstanding proliferation of reviews, comments, recommendations, and other forms of opinion expression. This opinionated content has attracted researchers from different fields: economics, political science, social sciences, psychology, and particularly language processing. One of the prominent subjects is sentiment analysis, also called opinion mining. Sentiment analysis (SA) is the process of identifying and extracting subjective information, sentiments, and opinions in a text using natural language processing and machine learning techniques. The problem is usually addressed by formulating SA as a classification task: identifying whether a text expresses a positive, a negative, or a neutral sentiment. Applications of SA are varied, including analyzing social media output to survey the public and gain an overview of wider public opinions and emotions towards certain persons, topics, products, or services; predicting the stock market; building personalized recommendation systems; tracking the public mood; etc. These applications arise in different fields, for instance economy, business, education, politics, sports, and tourism, where they help in decision-making. However, SA is still far from producing perfect results due to the complexity of language. The use of Arabic has been increasing consistently over various social media platforms. However, Arabic imposes many challenges due to its complex morphology and agglutinative nature with a highly inflectional and derivational system; Arabic words have different polarity categories in different contexts, and users frequently use dialectal Arabic (DA) rather than Modern Standard Arabic (MSA).

In this work, we provide a new resource to support research advances in Arabic sentiment analysis (ASA). We scrape comments from an Algerian online newspaper (Echorouk online). We highlight some features that discriminate between the different sentiment polarities. We propose a supervised approach, which relies on training language models on the collected data to discriminate comment sentiments based on word n-grams using six machine-learning algorithms.

In the remainder of this paper, we review related work in Section 2 and report on data and methods in Section 3, where we present the dataset and describe our approach. Finally, we discuss results and analyze errors in Section 4.

Sadik Bessou, Rania Aberkane
Department of Computer Science, Faculty of Sciences
University of Ferhat Abbas Sétif 1, Algeria
{[email protected]}, {[email protected]}
2. Related Work
The objective of this section is to provide a review of the major works that have been devoted to ASA. A number of projects have been conducted and several studies published on both MSA and DA.

Some researchers addressed the problem of building SA resources. Abdul-Mageed & Diab (2012) [2] presented the AWATIF multi-genre corpus of MSA labeled for subjectivity and sentiment analysis (SSA) at the sentence level. The corpus was labeled using both manual and crowdsourced annotation of the Penn Arabic Treebank, Wikipedia Talk pages, and web forums. In another study, Abdul-Mageed & Diab (2014) [3] presented SANA, a large-scale multi-genre, multi-dialect lexicon for the SSA of MSA and the Egyptian and Levantine dialects. SANA was developed both manually and automatically, exploiting data from several genres such as Arabic Treebank newswire, Twitter, YouTube comments, and Egyptian chat logs.

Aly & Atiya (2013) [6] presented LABR, a large-scale Arabic book reviews dataset, where the reviews are rated on a scale of 1 to 5 stars. These data were used for both rating classification and sentiment polarity classification. ElSahar & El-Beltagy (2015) [13] built several domain-specific datasets for ASA. The domains covered in the dataset were movies, hotels, restaurants, and products. The lexicon was extracted using a semi-supervised approach. Nabil et al. (2015) [20] presented ASTD (Arabic Social senTiment analysis Dataset), gathered from Twitter; the tweets are classified as objective, subjective positive, subjective negative, and subjective mixed. The authors performed two sets of benchmark experiments: four-way sentiment classification and two-stage classification.

Other researchers tackled the problem of classification. Shoukry & Rafea (2012) [22] considered a corpus-based approach for SA of tweets written in MSA and Egyptian dialects. They collected 1000 tweets divided equally into positive and negative.
After filtering the tweets, they used standard n-gram features and experimented with SVM and Naïve Bayes classifiers. Ibrahim et al. (2015) [15] presented an SA system for MSA and the Egyptian dialect using a corpus of different types of data. The authors used rich feature sets to improve the classification by handling valence shifters, question and supplication terms. The experimental results showed good performance with the SVM classifier. Mourad & Darwish (2013) [19] improved the accuracy of SSA of Arabic tweets by translating an English lexicon and applying a random graph walk approach on a manually created lexicon using Arabic-English AMT phrase tables. The authors added different features such as stemming, POS tagging, tweet-specific features, etc.

Khalifa & Omar (2014) [16] presented a hybrid approach for Arabic opinion question answering and applied it to Arabic customer reviews of Jordanian hotels. The approach consists of extracting named entities and determining the polarities of words in the reviews. The authors experimented with Naïve Bayes, SVM, and KNN classifiers. Tartir & Abdul-Nabi (2017) [23] presented a semantic approach to detect user attitudes and business insights in social media using MSA or DA. The Twitter feeds are classified using an Arabic sentiment ontology (ASO) into positive or negative. The approach produced good understanding. El-Masri et al. (2017) [12] presented a web-based tool that applies SA to Arabic tweets. Several parameters are proposed: the time of the tweets, preprocessing, n-gram features, lexicon-based methods, and machine-learning methods. The polarity labels are positive, negative, both, and neutral. Experimental results showed that the Naïve Bayes approach performs better than the other classifiers. Alomari et al. (2017) [5] investigated different supervised machine learning SA approaches applied to Arabic users' social media. The authors constructed their corpus by collecting Arabic tweets written in Jordanian dialect and MSA.
Experiments were conducted using SVM and Naïve Bayes algorithms utilizing different features and preprocessing strategies. Heikal et al. (2018) [14] explored different deep learning models to predict the sentiment of Arabic tweets. They used an ensemble model combining convolutional neural network (CNN) and long short-term memory (LSTM) models. The model achieved significant improvements in F1-score and accuracy over existing models. Abdul-Mageed (2018) [1] introduced a framework of structural and social context features of the Twitter domain and showed its utility in classification with an SVM approach. Baly et al. (2019) [8] created the Multi-Dialect Arabic Sentiment Twitter Dataset and the Arabic Sentiment Twitter Dataset for the Levantine dialect. The authors experimented with SVM, logistic regression, and random forest classifiers using POS tags, numbers of positive/negative emoticons, words from different lexicons, Twitter-specific features, etc., in addition to several deep learning models.

The survey of Al-Ayyoub et al. (2019) [4] presented a noteworthy comprehensive overview of the work done on ASA. The survey grouped 361 published papers based on the SA-related problems addressed, and covered the methods, tools, and resources used in ASA. The aspects considered in the study were binary/ternary SSA, multi-way SA, aspect-based SA, multilingual SA, and other SA-related problems. The study covered both corpus-based and lexicon-based SA approaches, for dialects as well as MSA.

Recently, several evaluation campaigns have been dedicated to SSA. SemEval, the international workshop on semantic evaluation, is an ongoing series of evaluations of computational semantic analysis systems; it has been run yearly since 2013 [21]. SemEval first introduced the Arabic language for all subtasks in 2017 [21].

Our proposed approach focuses on word-based n-gram language models using machine-learning algorithms, expecting significant improvement in accuracy.
3. Data and Methods

3.1 Dataset
There are many standard datasets available for English sentiment classification, but unfortunately there is no standard dataset for Arabic. Most researchers collect their own corpus from online web sites. Consequently, no common dataset is used for benchmarking results and evaluating experiments [10]. The major impact of using online data sources rather than standard datasets lies in the kind of data: reviews and comments are opinionated and include subjective rather than descriptive information.

Our dataset consists of Arabic comments and reviews manually harvested from an online newspaper. We use "Echorouk online", a daily newspaper in Algeria; it supports reviews and comments, allowing readers to express their opinions about the article they are reading. It is the most read newspaper and was the second most visited website in Algeria in 2018.

We collected different articles on economy, politics, social issues, violence, culture, art, etc. Our corpus, as shown in Table 1, consists of 1,633 documents with 63,055 tokens. Each comment has been annotated for sentiment polarity: positive, negative, or neutral (31,392 tokens for negative, 21,248 for neutral, and 9,975 for positive). The annotation was conducted manually to guarantee the best results. To collect this corpus, we searched for articles with more than 20 opinionated comments. We label each comment with one of the following tags: positive, negative, neutral.

Sentiment label   Documents   Tokens
Positive          453         9,975
Negative          760         31,392
Neutral           420         21,248
Total             1,633       63,055

Table 1. Number of documents and tokens in each label
The scraped data necessitate pre-processing, since noisy and worthless information can decrease the efficiency of the system. To improve the quality of the input data, we clean up the unwanted content by performing the following preprocessing steps:
Removal of URLs
Comments frequently contain web links used to share additional information. The content of the links is not analyzed; hence the link itself does not provide any useful information, and its removal can reduce the feature size.
Filtering
The purpose of filtering is to remove character sequences that may be noisy and thus affect the quality of the data. After converting the text corpus into UTF-8 encoding, it is necessary to clean up the texts by removing punctuation marks, special characters, non-Arabic characters, dates, times, numbers, single letters, links, diacritics, etc. None of these impurities carries any polarity; therefore, they should be removed.
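The URL-removal and filtering steps above could be sketched with regular expressions as follows. This is a minimal illustration, not the authors' implementation: the Unicode ranges chosen for Arabic letters and diacritics, and the rule of dropping single-letter tokens, are assumptions about how such a cleaner might be written.

```python
import re

# Assumed character classes for the cleanup described in the paper.
DIACRITICS = re.compile(r'[\u064B-\u0652\u0670]')   # Arabic tashkeel marks
NON_ARABIC = re.compile(r'[^\u0621-\u064A\s]')      # punctuation, digits, Latin, dates, ...
URL = re.compile(r'https?://\S+|www\.\S+')

def clean_comment(text: str) -> str:
    text = URL.sub(' ', text)          # removal of URLs
    text = DIACRITICS.sub('', text)    # strip diacritics
    text = NON_ARABIC.sub(' ', text)   # strip special and non-Arabic characters
    # drop single letters left over after cleaning
    tokens = [t for t in text.split() if len(t) > 1]
    return ' '.join(tokens)

print(clean_comment('رائع!!! http://t.co/x جدا ا 2019'))  # -> رائع جدا
```

The same pass handles whitespace-based tokenization implicitly via `split()`; a real system would likely separate those stages.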
Tokenizing
It consists of splitting paragraphs into sentences and sentences into tokens or words. In this step, we normalize our data based on white space, excluding all non-Unicode characters.
Normalizing
Normalization means replacing specific letters within a word with other letters according to a predefined set of rules, i.e., the unification of characters. Some writing forms (Hamza and Alif) need normalization, which consists, for instance, in converting "أ", "إ", and "آ" into "ا", because most Arabic texts neglect the addition of the Hamza on the Alif. Another kind of impurity encountered is elongation, where users repeat letters for exaggeration. We shorten elongated words by replacing the repeated letters with a single occurrence.
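Both normalization rules can be expressed as two regex substitutions. A caveat in this sketch (our assumption, not stated in the paper): elongation is collapsed only for runs of three or more identical letters, so that legitimately doubled letters survive.

```python
import re

def normalize(text: str) -> str:
    # Unify Hamza/Alif variants: أ, إ, آ -> bare Alif ا
    text = re.sub('[أإآ]', 'ا', text)
    # Shorten elongated words: collapse runs of 3+ repeated letters
    # to a single occurrence (e.g. exaggerated trailing letters).
    text = re.sub(r'(.)\1{2,}', r'\1', text)
    return text

print(normalize('أهلاااا'))  # -> اهلا
```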
Stop Words Removal
Stop words (pronouns, conjunctions, prepositions, and names) are extremely frequent words and are considered valueless as features. We remove stop words that do not affect the classification task. Negation words should not be removed: they reverse the sentiment from positive to negative and vice versa.
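The exception for negation words can be implemented by subtracting a negation set from the stop list before filtering. The word sets below are tiny illustrative samples (our assumption), not the paper's actual lists.

```python
# Miniature illustrative stop-word list; a real system would use a full Arabic list.
STOP_WORDS = {'في', 'من', 'على', 'هذا', 'التي', 'هو', 'ما', 'لا'}
# Negation words are kept because they flip sentiment polarity.
NEGATION = {'لا', 'لم', 'لن', 'ليس', 'ما'}
EFFECTIVE_STOPS = STOP_WORDS - NEGATION

def remove_stop_words(tokens):
    return [t for t in tokens if t not in EFFECTIVE_STOPS]

print(remove_stop_words(['لا', 'في', 'جيد']))  # negation kept, preposition dropped
```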
Stemming
It is the process of removing affixes from words and reducing them to their roots. It can significantly improve the efficiency of classification by reducing the number of terms input to the classifier [9]. Many stemming methods have been developed for the Arabic language. The two most widely used are:

1. Heavy stemming, which transforms each surface Arabic word in the document into its root [17].
2. Light stemming, which removes only prefixes and suffixes [18].

In this work, we use light stemming. It does not reduce a word to its proper root but removes only prefixes and suffixes, as the removal of infixes can change the word meaning completely and consequently the sentiment polarity.
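The light-stemming idea can be sketched as stripping at most one prefix and one suffix while keeping a minimum stem length. The affix lists here are a small illustrative assumption, far shorter than those used by the stemmer cited in [18].

```python
# Illustrative affix lists (assumed); real light stemmers use larger sets.
PREFIXES = ['وال', 'بال', 'كال', 'فال', 'ال', 'و']
SUFFIXES = ['ها', 'ان', 'ات', 'ون', 'ين', 'ية', 'ه', 'ة', 'ي']

def light_stem(word: str) -> str:
    # Try longest prefixes first; keep at least 3 letters of stem.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

print(light_stem('الكتاب'))  # definite article stripped -> كتاب
```

Because infixes are untouched, a broken-plural form keeps its internal pattern, which is exactly the behavior the paper wants in order to preserve sentiment-bearing word forms.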
In this step, we label the comments, whose total number is 1,633 comments with 63,055 tokens, and then split the dataset into a training set and a test set. The training set consists of 1,306 comments (80%), and the test set of 327 comments (20%).
Machine learning algorithms can predict sentiments based on textual data. Sentiment analysis based on machine learning consists of classifying subjective texts into two or more categories. Binary classification determines whether the text expresses a positive or a negative opinion; multi-way classification determines whether the text expresses a positive, a negative, or a neutral opinion. The data need to be annotated with sentiment labels. Labelled data are fed into the machine learning algorithm to build a classification model, which in turn can predict the label of unseen instances. Naïve Bayes and SVM are commonly used for sentiment classification, with satisfactory results [7]; they have shown good accuracy in sentiment polarity classification in various languages [11], such as English, Chinese, and Arabic.

Our approach focuses on word-based n-grams using various classification algorithms, since syntactic units and relations are expressed at the word level. We extract n-grams of different lengths (1-2 word n-grams). These n-grams are used as features in the vector space model (VSM), which builds a term-document matrix by assigning a weight to every term appearing in each comment. Many weighting schemes can be used with this model. In our case, we use count vectors based on combinations of 1-2 word n-grams (binary weights) and term-frequency inverse-document-frequency (TF-IDF) vectors based on combinations of 1-2 word n-grams (sophisticated weights). For each sentiment polarity, we train a word-level language model.

We formulate the task as a multi-class classification problem, where each sentiment polarity is a separate class. Given a collection of comments and associated polarities, we consider a supervised system f : C → P_i that predicts the sentiment labels of the comments. It assigns to each comment C the sentiment polarity P_i that maximizes its conditional probability score, argmax_i P(P_i | C).
For this task, we use six algorithms: Multinomial Naïve Bayes (MNB), Random Forest (RF), Linear SVC (LSVC), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Multi-layer Perceptron (MLP). The goal of the experiments is to find the highest accuracy among the different classifiers.

We used default settings for Logistic Regression and Multinomial Naïve Bayes. For the Linear Support Vector Machine, we changed the number of iterations to 1500. In the Logistic Regression and the Linear Support Vector Machine, we used L1 and L2 regularization, which can be added to the algorithm to ensure that the models do not overfit the data. The L1 regularization norm is the sum of the absolute values of the model coefficients, while the L2 regularization norm is the sum of their squares. A regularization value of 1.0 was used, together with class weighting. For Multinomial Naïve Bayes, we used the Laplace smoothing regularization method.
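The feature extraction and classification pipeline described above can be sketched with scikit-learn. The toy comments and label names below are invented placeholders, not the paper's 1,633-comment Echorouk corpus; the 80/20 split, the combined uni-gram/bi-gram count vectors, and MNB with Laplace smoothing follow the paper's setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in comments (invented for illustration only).
comments = ['مقال رائع ومفيد', 'خبر سيء ومحزن', 'مقال رائع جدا', 'تصرف سيء جدا'] * 10
labels = ['positive', 'negative', 'positive', 'negative'] * 10

# 80/20 train/test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.2, random_state=0)

# Count vectors over combined word uni-grams and bi-grams, fed to
# Multinomial Naïve Bayes with Laplace smoothing (alpha=1.0).
model = Pipeline([
    ('vect', CountVectorizer(ngram_range=(1, 2))),
    ('clf', MultinomialNB(alpha=1.0)),
])
model.fit(X_train, y_train)
print('accuracy:', accuracy_score(y_test, model.predict(X_test)))
```

Swapping `CountVectorizer` for `TfidfVectorizer` gives the TF-IDF variant, and the other five classifiers (RF, LSVC, KNN, LR, MLP) can be dropped into the `clf` slot the same way.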
4. Results and Discussion
In this section, we describe the conducted experiments. We performed two groups of experiments by training the six classifiers with diverse choices of features. We explore sentiment classification both as a two-class problem (positive, negative) and as a three-class problem (positive, negative, and neutral). The feature set is composed of uni-grams and bi-grams represented with count vectors. We tested bag-of-words features using uni-grams, bi-grams, and both. The results of the different classifiers using all feature sets are compared to determine which classifier is the most accurate for the task. Results are reported in terms of accuracy, precision, recall, and F1-score. We notice that the count vector weighting scheme gives much better results than the TF-IDF weighting scheme across the different n-grams.

In the first set of experiments, we apply a binary classification that determines whether a comment expresses a positive or a negative opinion. Therefore, we disregard the neutral class.
First Experiment
The first experiment is conducted on the model trained using uni-gram features. The results of the evaluation are shown in Table 2.
Algorithm   Accuracy   Precision   Recall   F1-score
MNB         84.71%     —           —        —
RF          75.61%     76%         76%      76%
LSVC        72.31%     74%         72%      73%
KNN         43.80%     72%         44%      35%
LR          77.68%     78%         78%      78%
MLP         79.33%     79%         79%      79%

Table 2. Results of experiments over two classes using uni-grams
Multinomial Naïve Bayes achieved the best accuracy, 84.71%, and the best results on the other metrics.
Second Experiment
The second experiment is performed using bi-grams. The results are presented in Table 3.

Algorithm   Accuracy   Precision   Recall   F1-score
MNB         75.20%     —           —        —
RF          51.23%     71%         51%      48%
LSVC        51.75%     72%         52%      48%
KNN         37.19%     64%         37%      22%
LR          74.79%     76%         75%      72%
MLP         55.75%     75%         56%      54%

Table 3. Results of experiments over two classes using bi-grams

Multinomial Naïve Bayes achieved the best accuracy, 75.20%, and the best results on the other metrics.
Third Experiment
This experiment evaluates the previous algorithms with the combination of uni-gram and bi-gram features. The results are shown in Table 4.

Algorithm   Accuracy   Precision   Recall   F1-score
MNB         85.57%     86%         86%      —
RF          70.66%     73%         71%      71%
LSVC        70.66%     74%         71%      72%
KNN         40.49%     69%         40%      29%
LR          79.75%     80%         80%      80%
MLP         83.05%     83%         83%      83%

Table 4. Results of experiments over two classes using both uni-grams and bi-grams

Multinomial Naïve Bayes achieved the best accuracy, 85.57%, and the best results on the other metrics. The accuracy obtained by combining uni-grams and bi-grams is the best over the three experiments.

Given the experimental results, we notice a low performance when dealing with bi-grams alone. The combination of uni-grams and bi-grams outperformed the use of bi-grams alone by around 10 percentage points of accuracy, reaching 85.57%. Furthermore, we notice that the Multinomial Naïve Bayes classifier performs better than the other classifiers, and its results improve when combining uni-grams and bi-grams. The best recall and precision, 86% each, are achieved by the Multinomial Naïve Bayes classifier. It is followed by the Multi-layer Perceptron, which outperforms the remaining classifiers with an accuracy of 83.05%. While considerable improvements are gained for most classifiers, KNN reaches an accuracy of only 40.49%.

In the next set of experiments, all dataset instances with the three class labels are used.
Fourth Experiment
The fourth experiment is conducted on the model trained using uni-gram features. The results of the evaluation are presented in Table 5. We notice that Multinomial Naïve Bayes is again the best classifier, achieving the highest accuracy, 64.41%.
Fifth Experiment
The fifth experiment is performed using bi-gram features. The results are shown in Table 6. In this experiment, there is a significant reduction in accuracy for all classifiers. Multinomial Naïve Bayes and Logistic Regression have the same accuracy, 58.58%; however, Multinomial Naïve Bayes achieves the highest F1-score and is therefore the best classifier.
Sixth Experiment
This experiment evaluates the previous algorithms combining uni-gram and bi-gram features. The results are shown in Table 7.

Algorithm   Accuracy   Precision   Recall   F1-score
MNB         64.41%     —           —        —
RF          62.57%     63%         63%      58%
LSVC        60.73%     59%         61%      59%
KNN         37.11%     40%         37%      30%
LR          62.88%     61%         63%      61%
MLP         63.20%     61%         63%      61%

Table 5. Results of experiments over three classes using uni-grams
Algorithm   Accuracy   Precision   Recall   F1-score
MNB         65.64%     —           —        —
RF          60.42%     64%         60%      58%
LSVC        61.34%     61%         61%      59%
KNN         31.90%     42%         32%      23%
LR          62.88%     61%         62%      59%
MLP         63.19%     68%         67%      63%

Table 7. Results of experiments over three classes using both uni-grams and bi-grams

We notice that Multinomial Naïve Bayes achieves the highest accuracy, 65.64%, compared both to the other classifiers and to the two previous experiments. The accuracy obtained by combining uni-grams and bi-grams is the best over the three experiments.

As expected, the introduction of the neutral class causes a reduction in accuracy: sentiment classification into three classes is more difficult than two-class classification. Multinomial Naïve Bayes proved to be the best performing classifier, with a significant margin over the rest of the classifiers at 65.64% accuracy. We again notice a low performance when dealing with bi-grams alone; the combination of uni-grams and bi-grams outperformed the use of bi-grams alone by around 7 percentage points of accuracy. The best precision and recall, 68% and 67%, are achieved by the Multi-layer Perceptron classifier. Multinomial Naïve Bayes is followed by the Multi-layer Perceptron, which outperforms the remaining classifiers with an accuracy of 63.19%. While considerable improvements are gained for most classifiers, KNN reaches an accuracy of only 31.90%.

Over all experiments, we found that preprocessing, n-gram combination, and count vector representation improve the classification performance. In these experiments, six supervised machine-learning classifiers were compared for sentiment classification. The experimental results show that Multinomial Naïve Bayes outperformed the other classifiers. We conclude that it is better to combine uni-grams and bi-grams to improve performance on sentiment classification.
5. Conclusion
In this paper, we used machine learning to detect sentiments in online newswire comments. Several models were trained for sentence-level SA. Our models rely on count and TF-IDF vector representations of word n-grams. We used various classifiers, features, and preprocessing strategies to find the best models for predicting the sentiment label. The results showed that n-gram features can substantially improve performance. Additionally, we noticed that the kind of data representation can provide a significant performance boost compared to a simple representation; the best performing feature representation is the combination of uni-grams and bi-grams.

The experimental results highlighted that the best performing classifier was Multinomial Naïve Bayes and the worst was KNN. The findings show that although subjectivity and sentiment are expressed at the semantic and pragmatic levels, modeling them can benefit from lower linguistic levels in the lexical space.

In the future, we plan to extend our work to investigate more complex emotion recognition models; explore dialectal Arabic; experiment with multi-genre, multi-lingual lexical resources and multi-level analysis (sentence, paragraph, and document); and investigate more algorithms, which may be insightful. In light of the recent successes of deep learning models, we also plan to experiment with deep learning techniques on this task.

References

[1] Abdul-Mageed, M. (2018). Learning Subjective Language: Feature Engineered vs. Deep Models. In: LREC Workshop, The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, 80-90.
[2] Abdul-Mageed, M., Diab, M. T. (2012). AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis. In: LREC, 3907-3914.
[3] Abdul-Mageed, M., Diab, M. T. (2014). SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis. In: LREC, 1162-1169.
[4] Al-Ayyoub, M., Khamaiseh, A. A., Jararweh, Y., Al-Kabi, M. N. (2019). A comprehensive survey of Arabic sentiment analysis. Information Processing & Management, 56(2), 320-342.
[5] Alomari, K. M., ElSherif, H. M., Shaalan, K. (2017). Arabic Tweets Sentimental Analysis Using Machine Learning. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, 602-610. Springer, Cham.
[6] Aly, M., Atiya, A. (2013). LABR: A large scale Arabic book reviews dataset. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2, 494-498.
[7] Assiri, A., Emam, A., Aldossari, H. (2015). Arabic sentiment analysis: a survey. International Journal of Advanced Computer Science and Applications, 6(12), 75-85.
[8] Baly, R., Khaddaj, A., Hajj, H., El-Hajj, W., Shaban, K. B. (2019). ArSentD-LEV: A multi-topic corpus for target-based sentiment analysis in Arabic Levantine tweets. arXiv preprint arXiv:1906.01830.
[9] Bessou, S., Touahria, M. (2014). An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval. Neural Network World, 24(2), 117-128.
[10] Boudad, N., Faizi, R., Thami, R. O. H., Chiheb, R. (2017). Sentiment analysis in Arabic: A review of the literature. Ain Shams Engineering Journal.
[11] Elarnaoty, M., AbdelRahman, S., Fahmy, A. (2012). A machine learning approach for opinion holder extraction in Arabic language. arXiv preprint arXiv:1206.1011.
[12] El-Masri, M., Altrabsheh, N., Mansour, H., Ramsay, A. (2017). A web-based tool for Arabic sentiment analysis. Procedia Computer Science, 117, 38-45.
[13] ElSahar, H., El-Beltagy, S. R. (2015). Building large Arabic multi-domain resources for sentiment analysis. In: International Conference on Intelligent Text Processing and Computational Linguistics, 23-34. Springer, Cham.
[14] Heikal, M., Torki, M., El-Makky, N. (2018). Sentiment Analysis of Arabic Tweets using Deep Learning.