Arabic aspect based sentiment analysis using bidirectional GRU based models
Mohammed M. Abdelgwad (a), Taysir Hassan A. Soliman (a), Ahmed I. Taloba (a), Mohamed Fawzy Farghaly (a)
(a) Information Systems Department, Faculty of Computers and Information, Assiut University, Egypt
Abstract
Aspect-based sentiment analysis (ABSA) accomplishes a fine-grained analysis that identifies the aspects of a given document or sentence and the sentiments conveyed regarding each aspect. This level of analysis is the most detailed version, capable of exploring the nuanced viewpoints of the reviews. Most of the research available in ABSA focuses on English, with very little work available on Arabic. Most previous work in Arabic has been based on regular machine learning methods that depend mainly on a group of rare resources and tools for analyzing and processing Arabic content, such as lexicons, and the lack of those resources presents another challenge. To overcome these obstacles, Deep Learning (DL)-based methods are proposed using two models based on Gated Recurrent Unit (GRU) neural networks for ABSA. The first is a DL model that takes advantage of representations at both the word and character levels via the combination of bidirectional GRU, Convolutional Neural Network (CNN), and Conditional Random Field (CRF), which makes up the BGRU-CNN-CRF model, to extract the main opinionated aspects (OTE). The second is an interactive attention network based on bidirectional GRU (IAN-BGRU) to identify sentiment polarity toward the extracted aspects. We evaluated our models using the benchmarked Arabic hotel reviews dataset proposed by [1]. The results indicate that the proposed methods are better than the baseline research on both tasks, with a 38.5% enhancement in F1-score for opinion target extraction (T2) and 7.5% in accuracy for aspect-based sentiment polarity classification (T3), obtaining an F1-score of 69.44% for T2 and an accuracy of 83.98% for T3.

Email addresses: [email protected] (Mohammed M. Abdelgwad), [email protected] (Taysir Hassan A. Soliman), [email protected] (Ahmed I. Taloba), [email protected] (Mohamed Fawzy Farghaly)

Keywords:
Aspect-based sentiment analysis (ABSA), deep learning, opinion target extraction (OTE), aspect sentiment polarity classification, BGRU-CNN-CRF, IAN-BGRU.
1. Introduction
The development of web technologies has provided new ways to communicate through user-generated content such as blogs, social networks, forums, website reviews, etc. [2]. With this phenomenal growth in the amount of data to manage and the difficulty of processing unstructured text in natural languages, individuals and organizations in the data mining field have shown a strong interest in taking advantage of this information flow.

Sentiment Analysis (also called opinion mining or emotional analysis) is a computer-based examination of human opinions, moods, and emotions [3]. It aims to determine the attitude of the author of a specific piece of content toward the subject of interest; that is, it determines the polarity of a document (review, tweet, blog post, or news item): whether the reported opinion is positive, negative, or neutral.

Sentiment Analysis (SA) of Arabic text can be categorized into three approaches: corpus-based, lexicon-based, and hybrid (i.e., a mixture of lexicon-based and corpus-based) [4]. Arabic SA was first reported in 2006, as indicated in [4], and a number of methods have been published in Arabic SA since that period.

Sentiment analysis can be studied at three levels, as argued by [5]: the document level, the sentence level, and the aspect level. The document level assumes that the whole document expresses only one opinion, so the task is to identify the positive or negative sentiment of the whole document. The sentence level breaks the document into a set of sentences and determines whether each sentence expresses a positive, negative, or neutral opinion. Neither document-level nor sentence-level analysis can determine precisely what people liked and disliked, since sentences or reviews may contain several opinions about different aspects of a certain entity, and these opinions may conflict with one another. Such reviews need a finer-grained level of analysis, known as ABSA.

Three main tasks for ABSA may therefore be defined, as proposed by [1]: Task 1 (T1): aspect category identification; Task 2 (T2): aspect opinion target extraction; and Task 3 (T3): aspect polarity detection. In this paper, we concentrate only on T2 and T3.

There are numerous differences between SA and ABSA according to [4], such as (a) connecting a portion of text to a certain aspect (i.e., extracting a target opinion expression), and (b) paraphrasing the text by extracting parts of the text that discuss the same aspect (for example, battery efficiency and power usage both relate to the same aspect).

ABSA has been a major focus in high-profile Natural Language Processing (NLP) conferences and workshops such as SemEval because of its importance. SemEval is an annual workshop on NLP that provides the scientific community with a variety of activities in which SA systems can be evaluated. In 2014, SemEval organized the first joint ABSA task [6], which offered both standard datasets and joint evaluation procedures to the scientific community. The success of the ABSA task resulted in it being repeated over the next two years [1], [7], as the task expanded its coverage of domains, languages, and problems. In fact, 19 training and 20 test datasets for eight languages and seven domains were offered for the ABSA task hosted by SemEval-2016.
In addition, a classifier proven to perform well on NLP tasks, the Support Vector Machine (SVM), was used in the baseline evaluation procedure.

More recently, experimental work has been demonstrated with an innovative family of machine learning methods called "Deep Learning": multi-layer processing technology that uses consecutive unit layers to build on previous outputs, trained with the backpropagation algorithm [8]. At each layer, the inputs are converted to numerical representations, which are later classified; consequently, an increasingly higher abstraction level is achieved [9]. DL is considered one of the most highly recommended machine learning methods for dealing with many research challenges in NLP, such as named entity recognition [10], machine translation [11, 12], speech recognition [13, 14], and SA [15, 16]. DL's advantage lies in its independence from specialized knowledge and linguistic resources as well as in its superior performance. The collection of algorithms (i.e., deep neural networks (DNNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), recursive neural networks (RecNNs), etc.) promotes research in numerous areas of deep learning; these networks are specially adapted to fine-grained work due to the vast number of linked processor layers that are either activated by environmental sensors or weighted pre-neuron calculations [17].

Word embeddings, or distributed representations, improve neural network performance and enhance DL models. Two classic ways to embed words are available for Arabic: Word2Vec [18] and FastText [19]. First, Word2Vec makes use of small neural networks to calculate word embeddings based on word context. There are two ways to put that approach into practice. The first is the continuous bag of words, or CBOW, in which the network attempts to predict which word is most likely given the context. Skip-gram is the second approach; the idea is very similar, but the network works in the opposite direction, attempting to predict the context given the target word. Word2Vec has proved useful in several areas of NLP, but generalization to unknown terms has remained an unresolved issue. Second, FastText, introduced by Facebook in 2016, aimed to resolve this obstacle. In our research, we used word2vec embeddings in the two tasks and left FastText for future work.

A new development in DL is the attention mechanism. It has achieved good success in image classification and in many NLP disciplines such as document sentiment classification, document summarization, named entity recognition, and machine translation. The attention mechanism helps the neural network learn better by knowing where to look as it performs the task. Recently, DL with attention has been very common for SA [20, 21, 22, 23]. When a sentence involves various aspects, the attention mechanism concentrates on the different sections of the sentence that pertain to each aspect. Interactive attention networks have shown promising results in different NLP tasks such as machine translation [24], question answering [25, 26], and document classification [27]. The authors of [22] proposed using Interactive Attention Networks (IAN) for aspect-level sentiment classification and achieved state-of-the-art performance. The key concept is to learn representations for targets and contexts interactively.
To further improve the representation of targets and context, we propose using the features provided by the GRU model in general, and the bidirectional GRU in particular, instead of the single-direction Long Short-Term Memory (LSTM) of the base model; the bidirectional GRU overcomes the limited ability of feed-forward models by extracting unlimited contextual information from both sentence directions.

To carry out the ABSA tasks, we follow the steps that make up the ABSA workflow, in order:
1. Breaking down reviews into individual sentences.
2. Preprocessing each sentence (tokenization, stopword removal, and text vectorization).
3. T2: Extracting the main opinionated aspects using the BGRU-CNN-CRF model.
4. T3: Identifying the sentiments conveyed regarding each aspect using the IAN-BGRU model.

The overall workflow of the proposed approach for ABSA is shown in Figure 1.
Figure 1: The overall workflow of the proposed approach for ABSA.
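As a concrete illustration of steps 1 and 2 of the workflow, the following is a minimal Python sketch; the sentence-splitting regular expression and the stopword subset are illustrative assumptions of ours, not the paper's actual preprocessing pipeline.

```python
import re

# Illustrative stopword subset; a full Arabic stopword list would be used in practice.
ARABIC_STOPWORDS = {"في", "من", "على", "و", "ثم"}

def preprocess(review: str):
    # Step 1: break the review down into individual sentences.
    sentences = [s for s in re.split(r"[.!?؟]+", review) if s.strip()]
    # Step 2: tokenize each sentence and remove stopwords; text vectorization
    # (lookup in the word-embedding table) happens later, inside the models.
    return [[tok for tok in s.split() if tok not in ARABIC_STOPWORDS]
            for s in sentences]
```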
In this paper, the ABSA research tasks are performed using a type of RNN (GRU), for which two GRU models were built, as follows: (a) a DL architecture based on a state-of-the-art model that takes advantage of representations at both the word and character levels via the combination of bidirectional GRU, CNN, and CRF (BGRU-CNN-CRF) to extract the main opinionated aspects (i.e., T2: OTE); (b) an IAN based on bidirectional GRU (IAN-BGRU) implemented to identify sentiment polarity toward the aspects extracted in T2 (i.e., T3).

The main contributions of this study are:
1. The proposed models do not rely on any handcrafted features or external resources such as lexicons, which are tools not widely available in the public domain for analyzing and processing Arabic content and require great effort to collect.
2. The proposed models are better than the baseline research on both tasks, with a 38.5% enhancement in F1-score for opinion target extraction (T2) and 7.5% in accuracy for aspect-based sentiment polarity classification (T3), obtaining an F1-score of 69.44% for T2 and an accuracy of 83.98% for T3.

The rest of this paper is arranged as follows: Section 2 addresses literature reviews of ABSA; Section 3 presents the proposed models; Section 4 explains the dataset and the baseline approach; Section 5 presents results and discussion; Section 6 presents a case study; finally, Section 7 concludes the paper and outlines future work plans.
2. Related work
ABSA is a branch of SA whose research approaches can be classified into two groups: regular machine learning methods and DL-based methods. Sentiment classification at the aspect level is usually treated as a text classification problem. Text classification techniques such as SVM [28] may then be used to solve the sentiment classification problem at the aspect level, without consideration of the mentioned targets or aspects. In some early works, numerous rule-based models were designed to deal with ABSA, as suggested in [29, 30], where the authors performed sentence dependency parsing and then used predefined rules to decide the sentiments regarding aspects. While these approaches produced satisfactory results, their outcomes are highly dependent on the efficacy of labor-intensive handcrafted features.

Neural networks (NNs) have the potential to create new representations from original features via several hidden layers. Semantic composition over tree structures can be performed by recursive NNs (Rec-NN); the authors of [31, 32] adopted Rec-NN for aspect sentiment classification, transforming the opinion targets into the root of the tree and propagating the sentiment of targets according to their context and syntactic relationships.

NNs, especially RNNs, are most widely used in ABSA to identify the sentiment polarity of an aspect. LSTM is an effective RNN that is able to reduce vanishing gradient problems. Nevertheless, LSTMs are not suited to addressing the interactive correlation of context and aspect, leading to an enormous loss of aspect-relevant data. To include aspects in the model, the authors of [33] proposed Target-Dependent LSTM (TD-LSTM) and Target-Connection LSTM (TC-LSTM). TD-LSTM splits a sentence into the left and right parts around the aspect target and feeds them into two LSTM models along forward and backward sequential paths. To determine the sentiment polarity label, the final hidden vectors of the left LSTM and right LSTM are concatenated and fed into a softmax. Nonetheless, the interactions between aspect target and context are not captured by TD-LSTM. To solve this issue, TC-LSTM uses the semantic interaction between the aspect and the context by integrating the aspect target and context word embeddings as the inputs and transferring them back and forth through two different LSTMs, similar to those used in TD-LSTM.

The attention mechanism helps the neural network learn better by knowing where to look as it performs the task. The authors of [34] developed attention-based LSTM (ATAE-LSTM) to investigate the correlation of aspects and contexts through the attention mechanism, which helps to identify the important portion of the sentence with respect to the specified aspect. ATAE-LSTM integrates the context embeddings with the aspect embedding as LSTM input for efficient usage of aspect target information. In this case, the hidden LSTM vectors include knowledge from the aspect target, which can help the model obtain more accurate attention weights. With a very long distance of dependency, a single attention mechanism may be bad at catching the various key context words for multiple targets, so the authors of [35] proposed using multiple attention mechanisms to mitigate this issue by creating recurrent attention on memory (RAM). RAM creates memory from the input and extracts important information from the memory through multiple attentions.
For prediction, it combines the features extracted by the different attentions non-linearly using a gated recurrent unit network.

Generally, a review comprises several sentences and a sentence is made of many words, so a review is naturally a hierarchical structure. On the basis of this review architecture, the authors of [36] suggested a Hierarchical Bidirectional LSTM (H-LSTM) model for the ABSA task. They found that modeling knowledge of the internal review structure can improve model performance.

Although effective, LSTMs suffer from an inability to train in parallel like some other NNs, so they tend to be time-intensive, being time-serial networks. Thus the authors of [37] suggested a model based on CNN and a gating mechanism that is simpler, more precise, and quicker than traditional LSTM models with attention mechanisms, as its computations can be easily parallelized during training and have no dependencies. In their work they targeted two tasks, aspect category sentiment analysis and aspect sentiment analysis, and achieved competitive results on both. CNNs are also used as an additional component to discover local main features, such as linguistic patterns (CNN-LP) [38], or as an effective replacement for attention, as in Target-specific Transformation Networks (TNet) [39], in ABSA tasks.

All of the above research papers focused on English ABSA. For Arabic ABSA, a few attempts have been made. The HAAD dataset, together with baseline research, was presented in 2015 [40]; HAAD was annotated following the SemEval-2014 framework. In 2016, another benchmarked dataset of Arabic hotel reviews was released in support of ABSA task requirements; this dataset has been used to test some of the approaches presented in the multilingual ABSA SemEval-2016 Task 5.

The authors of [41] suggested applying a set of supervised machine learning classifiers enhanced with a set of handcrafted features, such as morphological, syntactic, and semantic features, to the Arabic hotel reviews dataset. Their approach covered three tasks: identifying aspect categories, extracting opinion targets, and identifying sentiment polarity. The evaluation results showed that their approach was very competitive and effective.

In addition, the authors of [4] proposed applying two supervised machine learning approaches, namely SVM and RNN, to the Arabic hotel reviews dataset in line with Task 5 of SemEval-2016. The researchers investigated the three tasks: aspect category identification, OTE, and sentiment polarity identification. The findings indicate that SVM outperforms RNN in all tasks, though the deep RNN was found to be faster when comparing the time taken during training and testing.

The authors of [42] proposed two approaches based on LSTM NNs, assessed using the Arabic hotel reviews dataset: (a) a combination of bidirectional LSTM and a Conditional Random Field classifier (BLSTM-CRF) based on character and word levels for extracting the main opinionated aspects from the text; (b) an aspect-based LSTM for sentiment polarity classification (AB-LSTM-PC) used for handling the third task. The test results demonstrated that their methods surpass the baseline and are quite effective.

3. Proposed methods
In this section, the proposed models mentioned in the ABSA workflow in Figure 1 are explained in detail. The suggested models mainly depend on GRU, which is a type of RNN. RNN is a form of Artificial Neural Network (ANN) used in various NLP applications such as machine translation [43], speech recognition [44], question answering [45], automatic summarization [46], chatbots [47], market intelligence [48], text classification [49], and much more. The RNN model is designed to recognize the sequential properties of the data and then use these patterns to predict the next scenarios. The main advantage of RNN over feed-forward neural networks is its ability to process inputs of any length and remember information over time, which is very useful in any time-series prediction. RNN uses recurrent hidden units whose activation at each time step depends on that of the previous step. The main drawbacks of RNNs are the gradient vanishing/exploding problems, which make them difficult to train and to scale to large machine learning problems [50, 51]. GRU has been suggested as a solution to this problem and has proven effective in many NLP problems.

Two models based on GRU are proposed to achieve the research tasks: (a) a DL architecture that uses representations at the word and character levels by combining bidirectional GRU, CNN, and CRF (BGRU-CNN-CRF) in order to extract the main opinionated aspects of the text (T2: OTE), and (b) an interactive attention network based on bidirectional GRU (IAN-BGRU) implemented to handle the third task (T3: identifying aspect sentiment polarity). To our knowledge, no other research has deployed these techniques for Arabic ABSA.
In this section, we list all the elements needed to build our models in sufficient detail; these elements include GRU, BGRU, CNN, and CRF.
Recently, GRU [43], a member of the RNN family, has been proposed to deal with gradient vanishing problems. GRU is a powerful and simple alternative to LSTM networks [52]. Similar to LSTM models, GRU is designed to adaptively update or reset the memory content using a reset gate $r^j$ and an update gate $z^j$ that are similar to the forget and input gates of LSTM. Compared to LSTM, GRU does not have a memory cell and has only two gates. The GRU activation $h_t^j$ at time $t$ is a linear interpolation of the previous activation $h_{t-1}^j$ and the candidate activation $\tilde{h}_t^j$. To compute the state $h_t^j$ of the $j$-th GRU unit at time step $t$, we use the following equation:

$$h_t^j = (1 - z_t^j)\, h_{t-1}^j + z_t^j\, \tilde{h}_t^j \qquad (1)$$

where $\tilde{h}_t^j$ and $h_{t-1}^j$ correspond to the new candidate and the previous memory content, respectively. $z_t^j$ represents the update gate, which helps the model determine the amount of past information (from previous time steps) to be passed along to the future and how much of the new memory content is to be added. To calculate the update gate $z_t$ for time step $t$, we use the previous hidden state $h_{t-1}$ and the current input $x_t$:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \qquad (2)$$

The new memory content $\tilde{h}_t$ is calculated as follows:

$$\tilde{h}_t = \tanh(W x_t + r_t \odot U h_{t-1}) \qquad (3)$$

where $\odot$ is the Hadamard product (also known as the element-wise product) and $r_t$ represents the reset gate, which is used to decide how much of the past information to forget:

$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \qquad (4)$$

A graphical representation of the GRU unit is illustrated in Figure 2. GRU is faster than LSTM to train, since it has a simplified architecture with fewer parameters and therefore uses less memory.

Figure 2: Gated Recurrent Unit.

CNN has been one of the leading NNs of DL technology and has become one of the hotspots of research in many disciplines. CNNs are normally used in computer vision, but recently they have been applied to a variety of NLP tasks in several languages in general, and Arabic in particular, with promising results. With regard to the OTE task, CNN has been applied as a preliminary layer to extract character-level features such as the prefix or suffix of a word [53]. We used a CNN for training character vectors. The character vectors are looked up in the character lookup table, and the results are stacked to form the matrix C. Then convolutions between the matrix C and multiple filter matrices of different sizes are carried out, and the character-level features of each word are obtained by max pooling. Before feeding the character embeddings into the CNN, we applied a dropout layer, which is useful for preventing the model from becoming dependent on certain words; this is an effective way to prevent overfitting.
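To make Eqs. (1)-(4) concrete, below is a minimal PyTorch sketch of a single GRU cell. The class and parameter names are ours, and bias terms are omitted to mirror the equations as written; in practice the built-in torch.nn.GRU would be used.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    """Single GRU cell following Eqs. (1)-(4); naming and omitted biases are ours."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.W_z = nn.Linear(input_size, hidden_size, bias=False)
        self.U_z = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_r = nn.Linear(input_size, hidden_size, bias=False)
        self.U_r = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_h = nn.Linear(input_size, hidden_size, bias=False)
        self.U_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z_t = torch.sigmoid(self.W_z(x_t) + self.U_z(h_prev))         # Eq. (2): update gate
        r_t = torch.sigmoid(self.W_r(x_t) + self.U_r(h_prev))         # Eq. (4): reset gate
        h_tilde = torch.tanh(self.W_h(x_t) + r_t * self.U_h(h_prev))  # Eq. (3): candidate
        return (1 - z_t) * h_prev + z_t * h_tilde                     # Eq. (1): interpolation
```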
One downside of GRU networks is that they can only use the previous context, without taking the future context into consideration, so they can only process sequences from front to back, which results in information loss. This has motivated many researchers to use a bidirectional GRU that can process data in both directions, with the output layer collecting information from the two separate hidden layers. The fundamental architecture of bidirectional GRU networks is actually just two separate GRUs put together: the input sequence is fed in normal time order to one network (from right to left for Arabic) and in reverse time order to the other. The outputs of the two networks are normally concatenated at each time step (there are other choices, such as summation). Such a structure can provide complete context information.
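A bidirectional GRU of this kind is available directly in PyTorch. The sketch below assumes the 100-dimensional embeddings and hidden states reported later in Section 5 (the batch size and sentence length are arbitrary illustrative values) and shows how the forward and backward hidden states end up concatenated at each time step.

```python
import torch
import torch.nn as nn

# Dimensions follow Section 5 (100-dim embeddings and hidden states).
bgru = nn.GRU(input_size=100, hidden_size=100, bidirectional=True, batch_first=True)
x = torch.randn(16, 40, 100)   # (batch, sentence length, embedding dim)
out, _ = bgru(x)               # out: (16, 40, 200) -- forward and backward hidden
                               # states concatenated at each time step
```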
In this paper, the OTE task is considered a sequence labeling task where, for a given aspect consisting of more than one word (in N-gram representation), it is useful to consider the associations between adjacent labels and decode them together by choosing their best sequence. We used the IOB (inside, outside, beginning) format to represent the N-gram aspects, where a token at the beginning of an aspect is marked B-Aspect, a token located inside the aspect (not the first token) is marked I-Aspect, and all other tokens are marked O. In order to capture these correlations between aspect OTE tags, we applied a conditional random field (CRF) classifier on top of the BGRU. CRF is a standard model for predicting the most likely sequence of labels corresponding to a sequence of inputs. The CRF layer has a state transition matrix as a parameter, which is used to predict the current tag based on the previous and next tags. We refer to this transition matrix as $X_{i,j}$, which represents the score of transitioning from the $i$-th tag to the $j$-th tag. A softmax is then computed over the sentence scores and the possible label sequences. The mathematical model is illustrated by the authors of [54].
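For illustration, here is how a hypothetical review fragment with a multi-word opinion target would be labeled under the IOB scheme described above (the tokens are our own example, not drawn from the dataset):

```python
tokens = ["the", "front", "desk", "staff", "were", "very", "helpful"]
# "front desk staff" is a 3-gram opinion target:
labels = ["O", "B-Aspect", "I-Aspect", "I-Aspect", "O", "O", "O"]
```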
In this section, we explain the proposed models in detail: BGRU-CNN-CRF for aspect term extraction (T2) and IAN-BGRU for polarity classification (T3), as mentioned previously.
To handle the third task, we applied the interactive attention network (IAN) [22] with some modifications to the preliminary layers, as IAN can precisely represent target and context interactively. This model consists of two parts. The first part obtains the hidden states of the targets and their context by applying two separate LSTMs, using the word embeddings of target and context as input. To further improve the representation of targets and context, we propose using the features provided by GRU instead of LSTM in general, and bidirectional GRU in particular; the bidirectional GRU resolves the feed-forward model's limited ability by extracting unlimited contextual information from both directions. The hidden state values for aspects and context are then averaged separately, giving an initial representation of each, which is used later to calculate the attention vectors. The second part collects essential information in the context by applying the attention mechanism with the initial representation of the target, yielding the context attention vector; likewise, applying the attention mechanism with the initial representation of the context captures important information in the target, yielding the target attention vector. After that, the target and context representations can be computed based on the attention vectors. Finally, the representations of both the target and the context are concatenated into a final representation, which is fed into a softmax function to predict the sentiment polarity of the target within its context. Please refer to [22] for basic model details. Figure 3 illustrates the architecture of the IAN-BGRU model in detail.
Figure 3: Architecture of IAN-BGRU
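The following is a hedged PyTorch sketch of this architecture under our own naming; the bilinear attention scoring below is a simplification of the attention formulation in [22], so it should be read as an outline of the data flow rather than the exact model.

```python
import torch
import torch.nn as nn

class IANBGRUSketch(nn.Module):
    """Hedged outline of IAN with bidirectional GRU encoders (names are ours)."""
    def __init__(self, emb_dim=100, hid=100, n_classes=3):
        super().__init__()
        self.ctx_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.tgt_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.ctx_attn = nn.Bilinear(2 * hid, 2 * hid, 1)  # scores context words vs. target
        self.tgt_attn = nn.Bilinear(2 * hid, 2 * hid, 1)  # scores target words vs. context
        self.fc = nn.Linear(4 * hid, n_classes)

    def forward(self, ctx_emb, tgt_emb):
        ctx_h, _ = self.ctx_gru(ctx_emb)      # (B, Lc, 2*hid) context hidden states
        tgt_h, _ = self.tgt_gru(tgt_emb)      # (B, Lt, 2*hid) target hidden states
        ctx_avg = ctx_h.mean(dim=1)           # initial context representation
        tgt_avg = tgt_h.mean(dim=1)           # initial target representation
        # Attend over context words guided by the averaged target, and vice versa.
        a_ctx = torch.softmax(self.ctx_attn(
            ctx_h, tgt_avg.unsqueeze(1).repeat(1, ctx_h.size(1), 1)).squeeze(-1), dim=1)
        a_tgt = torch.softmax(self.tgt_attn(
            tgt_h, ctx_avg.unsqueeze(1).repeat(1, tgt_h.size(1), 1)).squeeze(-1), dim=1)
        ctx_rep = (a_ctx.unsqueeze(-1) * ctx_h).sum(dim=1)   # context attention vector
        tgt_rep = (a_tgt.unsqueeze(-1) * tgt_h).sum(dim=1)   # target attention vector
        # Concatenate both representations and classify the polarity.
        return self.fc(torch.cat([ctx_rep, tgt_rep], dim=-1))
```

For example, `IANBGRUSketch()(torch.randn(8, 30, 100), torch.randn(8, 4, 100))` yields logits over the three polarity classes for a batch of eight context/target pairs.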
We considered the OTE task as a sequence labeling task, so we suggested a neural network architecture for sequence labeling that relies on the DL architecture proposed by the authors of [55], which combines two aspects: 1) the use of a character-level representation [53]; 2) the addition of an output layer based on CRF [56]. It is an end-to-end model that does not require any task-specific resources, feature engineering, or data preprocessing other than word embeddings pre-trained on unlabeled corpora. First, we used a CNN to encode word information at the character level to obtain a character-level representation; a dropout layer [57] is applied before feeding the character embeddings into the CNN. Then, the character embedding vectors (the character-level representations extracted by the CNN) are concatenated with word embedding vectors and fed into the BGRU to model the context information of each word. A dropout layer is also applied to the output vectors of the BGRU. The output vectors of the BGRU are then fed into the CRF layer for decoding the best sequence of labels. Figure 4 shows the architecture of our network in detail.
Figure 4: Architecture of BGRU-CNN-CRF
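A hedged sketch of this pipeline is given below; the class and method names are ours, and the CRF layer assumes the third-party pytorch-crf package rather than anything shipped with PyTorch itself. Stated sizes (100-dim word embeddings, 25-dim character embeddings, 30 filters of window length 3, dropout 0.5) follow Section 5.

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # third-party "pytorch-crf" package (an assumed dependency)

class BGRUCNNCRFSketch(nn.Module):
    """Hedged outline of the BGRU-CNN-CRF tagger; sizes follow Section 5 where stated."""
    def __init__(self, n_words, n_chars, n_tags, w_dim=100, c_dim=25, hid=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.char_emb = nn.Embedding(n_chars, c_dim)
        # 30 filters with a window length of 3 over character embeddings.
        self.char_cnn = nn.Conv1d(c_dim, 30, kernel_size=3, padding=1)
        self.dropout = nn.Dropout(0.5)
        self.bgru = nn.GRU(w_dim + 30, hid, bidirectional=True, batch_first=True)
        self.emission = nn.Linear(2 * hid, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def emissions(self, words, chars):                # chars: (B, L, max_word_len)
        B, L, C = chars.shape
        ch = self.dropout(self.char_emb(chars))       # dropout before the char CNN
        ch = ch.view(B * L, C, -1).transpose(1, 2)    # (B*L, c_dim, C) for Conv1d
        ch = self.char_cnn(ch).max(dim=2).values      # max pooling -> (B*L, 30)
        ch = ch.view(B, L, 30)                        # character-level word features
        x = self.dropout(torch.cat([self.word_emb(words), ch], dim=-1))
        h, _ = self.bgru(x)                           # contextual word representations
        return self.emission(self.dropout(h))         # per-token tag scores for the CRF

    def loss(self, words, chars, tags):
        return -self.crf(self.emissions(words, chars), tags)   # negative log-likelihood

    def decode(self, words, chars):
        return self.crf.decode(self.emissions(words, chars))   # best IOB tag sequence
```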
4. Data and baseline research
The main tasks of our research were tested using the Arabic hotel reviews dataset. The dataset was prepared as a part of SemEval-2016 Task 5, a multilingual ABSA task that includes customer reviews in eight languages and seven domains [1]. The Arabic hotel reviews dataset contains 24,028 annotated ABSA tuples, divided as follows: 19,226 tuples for training and 4,802 tuples for testing. Furthermore, both text-level (2,291 review texts) and sentence-level (6,029 sentences) annotations were provided for the dataset. This research focused only on the sentence-level tasks. Table 1 shows the size of the dataset and its distribution over the research tasks. The dataset was supplemented with baseline research based on SVM with N-grams as features. The results obtained from that research are considered a baseline for each task and are mentioned in this paper in the results section for each task.
Table 1: Dataset size and its distribution over the research tasks [42].

Task | Train (texts / sentences / tuples) | Test (texts / sentences / tuples)
T1: Sentence-level ABSA | 1839 / 4802 / 10,509 | 452 / 1227 / 2604
T2: Text-level ABSA | 1839 / 4802 / 8757 | 452 / 1227 / 2158
5. Experimentation and results
As stated previously, two GRU-based models were applied to handle the two tasks (BGRU-CNN-CRF for T2 and IAN-BGRU for T3). The Arabic hotel reviews dataset was used to train and test both models: 70% of the dataset was used for model training, 10% for validation, and 20% for testing. All neural networks were implemented using the PyTorch library, and the computations for each model were performed separately on a GeForce GTX 1080 Ti GPU. This section explains the training of each model for its targeted task.
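The split itself can be reproduced with a sketch like the following; the fixed random_state and the stand-in sample list are our assumptions, not details from the paper.

```python
from sklearn.model_selection import train_test_split

samples = list(range(6029))        # stand-in for the sentence-level ABSA tuples
train, rest = train_test_split(samples, test_size=0.30, random_state=42)
val, test = train_test_split(rest, test_size=2/3, random_state=42)   # -> 10% / 20%
```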
To handle this task, the BGRU-CNN-CRF model is implemented and trained using word- and character-embedding features.
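Under the hyperparameters reported in Section 5.1.2 below (mini-batch SGD with batch size 16 and momentum 0.9, per-epoch decay of the learning rate, gradient clipping at 5.0, about 60 epochs), the optimization loop can be sketched as follows. The tiny linear model, the dummy batch, and the initial rate eta_0 = 0.01 are placeholders of ours, since the text does not state eta_0.

```python
import torch

# Stand-ins: a tiny linear model and one random mini-batch replace the real
# BGRU-CNN-CRF network and data loader; eta_0 = 0.01 is an assumed placeholder.
model = torch.nn.Linear(10, 5)
eta_0, rho = 0.01, 0.04
optimizer = torch.optim.SGD(model.parameters(), lr=eta_0, momentum=0.9)

for epoch in range(60):                                  # best parameters near 60 epochs
    x, y = torch.randn(16, 10), torch.randn(16, 5)       # batch size 16
    loss = torch.nn.functional.mse_loss(model(x), y)     # CRF loss in the real model
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # curb gradient exploding
    optimizer.step()
    for g in optimizer.param_groups:                     # eta_t = eta_0 / (1 + rho * t)
        g["lr"] = eta_0 / (1 + rho * (epoch + 1))
```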
The F1 metric is adopted for performance evaluation of OTE; it is the weighted harmonic mean of precision and recall:

$$F_1 = \left(\frac{recall^{-1} + precision^{-1}}{2}\right)^{-1} = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (5)$$

5.1.2. Hyperparameters Setting

Both context word embeddings and target word embeddings are initialized with AraVec [58], a pre-trained distributed word representation (word embedding) that intends to provide free-to-use and effective word embedding models for the Arabic NLP research community. It is basically a word2vec model trained on Arabic data. AraVec provides two different models, unigram and n-gram, built on top of various domains with Arabic content. In this research, we used the CBOW-unigram model built on top of Twitter tweets. The dimensions of the word embeddings and GRU hidden states are set to 100. During the testing phase, non-embedded (unknown) words are represented using the "UNK" embedding. All character embeddings are initialized with uniform samples from $[-\sqrt{3/dim}, +\sqrt{3/dim}]$, where dim = 25; in addition, 30 filters with a window length of 3 are used.

Parameter optimization is performed with mini-batch stochastic gradient descent (SGD) with batch size 16 and momentum 0.9. We selected an initial learning rate $\eta_0$, decayed each epoch as $\eta_t = \eta_0 / (1 + \rho t)$, with decay rate $\rho = 0.04$, where $t$ is the number of completed epochs. Gradient clipping at 5.0 was used to reduce the impact of "gradient exploding". The initial embeddings are fine-tuned, updated by back-propagating gradients during the gradient updates of the neural network model. We use early stopping [59] based on performance on the validation set; according to our experiments, the best parameters appear at approximately 60 epochs. To mitigate overfitting, we use the dropout approach to regularize our model. Dropout is applied in several places, namely to the character embeddings before they are input to the CNN and to both the input and output vectors of the GRU. We fixed the dropout rate at 0.5 for all dropout layers.

To our knowledge, we have not found any published research on the same task that applies neural network models to the Arabic hotel reviews dataset and achieves better results than [42], so we compared our results to their findings, as shown in Table 2. The authors of [42] extended the BLSTM-CRF base model used for sequence labeling with character-level word embeddings, developed by applying a BLSTM to the sequence of characters in each word (BLSTM-CRF+LSTM-char). Two techniques were used for initializing the word embedding lookup table: (a) word2vec, (b) FastText. BLSTM-CRF+LSTM-char attained F1 = 66.32% with word2vec word embeddings and F1 = 69.98% with FastText embeddings.
Table 2: Models results on T2

Model | F1 (%)
Baseline | 30.90
BLSTM-CRF (word2vec) | 66.32
BLSTM-CRF (FastText) | 69.98
BGRU-CNN-CRF (word2vec) | 69.44

We developed a BGRU-CRF+CNN-char model similar to that of [42], but with some changes:
• The BLSTM layer they used for character embedding has been substituted by a CNN layer. As CNN has fewer training parameters than BLSTM, training performance is higher; this was recommended as the preferred approach in [60], which compared the performance of BLSTM-CRF models with CNN-based and LSTM-based character-level embeddings for biomedical named entity recognition.
• We replaced the LSTM in the BLSTM-CRF base model with GRU due to the simplicity of GRU (GRUs have fewer parameters) and its ease of training, and thus ease of learning.

In our experiments, we only used word2vec for word embedding and found that our BGRU-CNN-CRF model obtained better results (F1 = 69.44%) than BLSTM-CRF+LSTM-char based on word2vec (F1 = 66.32%), and results close to BLSTM-CRF+LSTM-char based on FastText (F1 = 69.98%). We noticed that the simpler the model, the easier it is to learn and the better the results. We plan to use FastText word embeddings in the future.
Figure 5: Achieved results in T2, OTE.

To determine model efficiency in the aspect sentiment polarity identification task, the accuracy metric is adopted, which can be expressed as:
$$Accuracy = \frac{T}{N} \qquad (6)$$

where $T$ and $N$ refer to the number of correctly predicted samples and the overall number of samples, respectively. Accuracy is the percentage of samples predicted correctly out of all samples; a better-performing system has higher accuracy.

As mentioned earlier, both context word embeddings and target word embeddings are initialized with AraVec [58], in particular the CBOW-unigram model built on top of Twitter tweets. We initialize all weight matrices with samples from the uniform distribution U(−0.1, 0.1) and train with a learning rate of 1e−3, an L2-regularization weight of 2e−5, a dropout rate of 0.3, a batch size of 64, and 12 epochs.

The compared models are as follows.

Baseline*: SVM trained only with N-gram features.
INSIGHT-1* was the first-place winner of the SemEval-2016 Task 5 competition on the Arabic hotel reviews dataset. They concatenated word embeddings with aspect embeddings and fed the mixture to a CNN for the aspect sentiment and category identification tasks [62].
LSTM uses one LSTM for sentence modeling, and the last hidden state isused to represent the sentence for final classification.
TD-LSTM splits the sentence into left and right parts around the aspect and feeds them into two LSTM models along forward and backward sequential paths. To determine the sentiment polarity label, the final hidden vectors of the left LSTM and the right LSTM are concatenated and fed into a softmax [33].
ATAE-LSTM*, attention-based LSTM with aspect embedding, applies the attention mechanism to assist in focusing on the context most relevant to the targeted aspects. ATAE-LSTM appends the aspect embedding to each word embedding, which strengthens the model by learning the hidden relation between context and aspect [34]. The authors of [42] applied the ATAE-LSTM model to the Arabic hotel reviews dataset for ABSA.
MemNet applies multi-hop attention layers on the context word embeddings of the sentence in order to accurately capture the significance of the contextual words [63].
IAN-LSTM uses two LSTMs to model the aspect and its context. The context hidden states are used to generate the target attention vector, and the target hidden states are used to generate the context attention vector. On the basis of these two attention vectors, the context and target representations are created, then concatenated and ultimately fed into a softmax for classification [22].
IAN-BLSTM extends IAN-LSTM by using bidirectional LSTM instead of unidirectional LSTM to model the aspect term and the context.
IAN-GRU is like IAN-LSTM but uses GRU instead of LSTM to model the aspect term and its context.
IAN-BGRU extends IAN-GRU by using bidirectional GRU instead of unidirectional GRU to model the aspect term and the context.

Models marked with * have results adopted from their original research without being practically re-implemented. To our knowledge, no other research has applied the models without the * sign to Arabic aspect-based sentiment analysis in general, or to the Arabic hotel reviews dataset for ABSA in particular. The test results achieved by the above models on the Arabic hotel reviews dataset for ABSA are given in Table 3.

Table 3: Models Accuracy Results on T3

Model | Accuracy (%)
Baseline | 76.4
INSIGHT-1 (CNN)* | 82.7
LSTM | 81.49
TD-LSTM | 81.79
ATAE-LSTM* | 82.25
MemNet | 82.46
IAN (LSTM) | 83.18
IAN (GRU) | 83.68
IAN (BLSTM) | 83.48
IAN (BGRU) | 83.98
LSTM achieves the poorest performance of all the neural network baseline methods because it treats targets on a par with other context words, so the target information is not used sufficiently. TD-LSTM outperforms LSTM, as it evolves from the standard LSTM and handles the left and right contexts of the target separately; the targets are represented twice and are emphasized in certain ways in the final representation. Moreover, ATAE-LSTM stably outperforms TD-LSTM owing to its introduction of the attention mechanism: it collects a range of significant contextual information under the guidance of the target and creates more accurate representations for ABSA. ATAE-LSTM also affirms the importance of modeling targets by including aspect embeddings, which is the cause of the improvement in results. We re-implemented the model and obtained the same results as those reported in [42].

INSIGHT-1 (CNN) adopts the same idea, emphasizing the modeling of targets by concatenating the aspect embedding with every word embedding and applying a convolution layer over the result, trying to simulate the effect of attention by using a set of filters that can capture the features in a better manner. That is the reason for its performance improvement over ATAE-LSTM.

IAN models emphasize the significance of targets via interactively learning target and context representations. We can see that, across all baselines, IAN provides the best performance. The key explanation for this could be that IAN relies on two related attention networks that influence one another to model target and context interactively. IAN (BLSTM) outperforms IAN (LSTM) since it resolves the feed-forward model's limited ability by extracting unlimited contextual information from both directions to model the aspect term and the context. We noticed that the IAN (GRU) model achieved better results than both IAN (LSTM) and IAN (BLSTM), by about 0.5% and 0.25% respectively, when replacing LSTM with GRU; this is due to the simplicity of GRU (GRUs have fewer parameters) and its ease of training, and consequently ease of learning. By using bidirectional GRU instead of unidirectional GRU (IAN-BGRU), we were able to obtain the best results among all the methods.

After finding a noticeable improvement in performance when using models that apply the attention mechanism, we tried to adopt the idea of multiple attentions to synthesize important features in difficult sentence structures, using models such as MemNet. However, we noticed no real improvement: MemNet gave a slight improvement of about 0.4% over the ATAE-LSTM model and lower results than all IAN variations. This may be due to the complexity of the model and the need for a mechanism to stop the attention process automatically when no more useful information can be read. This is the same reason we obtain better results when using GRU rather than LSTM in the IAN model (it is less complex).
Figure 6: Achieved results for T3, aspect polarity identification.

6. Case study

The importance of IAN-BGRU comes from relying on two related attention networks that influence one another for interactive modeling of context and target. The model can closely focus on the important parts of the target and context and ultimately provide a clear representation of both, which enables it to classify the different sentiments that refer to different aspects of the same sentence. Consider this review sentence, in English: "very good location and means of entertainment available but the rooms were unfortunately bad". This review has two different aspects (location and rooms) with two different sentiments (positive and negative). Even with this difficulty, our model can correctly recognize and determine the expressed polarity of each aspect.

A main challenge of sentiment analysis is the ability to classify review sentiment polarity in the presence of shifting words like "not", since such words completely alter the polarity of aspects. For example, in the sentence "The hotel location is not good for the elderly", traditional SA methods need shifting-word lexicons to classify the aspect polarity (Hotel location).
7. Conclusion and future work
ABSA can provide us with more detailed information than SA because it gives us information about the sentiment toward each aspect in the text. Three main tasks for ABSA may therefore be defined: T1: aspect category identification, T2: aspect opinion target extraction, and T3: aspect polarity detection. The topics of this study are the T2 and T3 tasks. Two GRU-based models were adopted to handle the research work: (a) a deep learning model that takes advantage of representations at both the word and character levels via the combination of bidirectional GRU, CNN, and CRF (BGRU-CNN-CRF) to extract the main opinionated aspects (OTE); (b) an interactive attention network based on bidirectional GRU (IAN-BGRU) to identify sentiment polarity toward the extracted aspects. We evaluated our models using the benchmarked Arabic hotel reviews dataset. The results indicate that the proposed methods achieved better performance than the baseline research on both tasks, with an enhancement of 38.5% in F1-score for the opinion target extraction task (T2) and 7.5% in accuracy for aspect sentiment polarity classification (T3), obtaining an F1-score of 69.44% for T2 and an accuracy of 83.98% for T3.

For future work, we will optimize the performance of our DL model on the OTE task by using FastText embeddings instead of Word2vec. In addition, we intend to utilize BERT for ABSA tasks.
References

[1] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. AL-Smadi, M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq, V. Hoste, M. Apidianaki, X. Tannier, N. Loukachevitch, E. Kotelnikov, N. Bel, S. M. Jiménez-Zafra, G. Eryiğit, SemEval-2016 task 5: Aspect based sentiment analysis, in: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, San Diego, California, 2016, pp. 19–30.

[2] T. A. Rana, Y.-N. Cheah, Aspect extraction in sentiment analysis: comparative analysis and survey, Artificial Intelligence Review 46 (2016) 459–483.

[3] J. Zhao, K. Liu, L. Xu, Sentiment analysis: mining opinions, sentiments, and emotions, 2016.

[4] M. Al-Smadi, O. Qawasmeh, M. Al-Ayyoub, Y. Jararweh, B. Gupta, Deep recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels' reviews, Journal of Computational Science 27 (2018) 386–393.

[5] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168–177.

[6] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, SemEval-2014 task 4: Aspect based sentiment analysis, in: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Association for Computational Linguistics, Dublin, Ireland, 2014, pp. 27–35.

[7] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, I. Androutsopoulos, SemEval-2015 task 12: Aspect based sentiment analysis, in: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Association for Computational Linguistics, Denver, Colorado, 2015, pp. 486–495. doi:10.18653/v1/S15-2082.