SentiQ: A Probabilistic Logic Approach to Enhance Sentiment Analysis Tool Quality
Wissam Maamar Kouadri, Salima Benbernou, Mourad Ouziri, Themis Palpanas, Iheb Ben Amor
Wissam Maamar Kouadri, Salima Benbernou, Mourad Ouziri (Université de Paris)
Themis Palpanas (Université de Paris & Institut Universitaire de France (IUF))
Iheb Ben Amor (Consulting)
ABSTRACT
The opinion expressed in various Web sites and social media is an essential contributor to the decision-making process of several organizations. Existing sentiment analysis tools aim to extract the polarity (i.e., positive, negative, neutral) from these opinionated contents. Despite the advances of research in the field, sentiment analysis tools give inconsistent polarities, which is harmful to business decisions. In this paper, we propose SentiQ, an unsupervised Markov Logic Network-based approach that injects the semantic dimension into the tools through rules. It allows detecting and solving inconsistencies, and then improves the overall accuracy of the tools. Preliminary experimental results demonstrate the usefulness of SentiQ.
CCS CONCEPTS
• Machine learning → Text labelling; Neural network; Data quality;
• Information systems → First order logic.
KEYWORDS
sentiment analysis, inconsistency, data quality, Markov logic network, logical inference
ACM Reference Format:
Wissam Maamar Kouadri, Salima Benbernou, Mourad Ouziri, Themis Palpanas, and Iheb Ben Amor. 2020. SentiQ: A Probabilistic Logic Approach to Enhance Sentiment Analysis Tool Quality. In Proceedings of Wisdom'20: Workshop on Issues of Sentiment Discovery and Opinion Mining (Wisdom@KDD '20). ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/1122445.1122456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. WISDOM '20, held in conjunction with KDD '20, August 24, 2020, San Diego, CA, USA. © 2020 Copyright held by the owner/author(s). Publication rights licensed to WISDOM '20. See http://sentic.net/wisdom for details.
Wisdom@KDD '20, August 24, 2020, San Diego, CA. © 2020 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM . . . $15.00. https://doi.org/10.1145/1122445.1122456
1 INTRODUCTION
With the proliferation of social media, people are increasingly sharing their sentiments and opinions online about products, services, individuals, and entities, which has spurred a growing interest in sentiment analysis tools in various domains [8, 11, 13, 14, 17, 25, 27]. The customer opinion, if yielded correctly, is crucial for the decision-making of any organization. Thus, numerous studies [3, 13, 16, 23, 24] try to automate sentiment extraction from a massive volume of data by identifying the polarity of documents, i.e., positive, negative, or neutral.

Nevertheless, sentiment analysis of social media data is still a challenging task [10], due to the complexity and variety of natural language, through which the same idea can be expressed and interpreted using different texts. Many research works have adopted the consensus that semantically equivalent documents should have the same polarity [3, 5, 12, 22, 26, 28]. For instance, [5] attributed the same polarity labels to semantically equivalent (event/effect) couples, while [12] augmented their sentiment dataset using paraphrases and assigned the original document's polarity to the generated paraphrases.

However, we found that the tools do not detect this similarity and assign different polarity labels to semantically equivalent documents. We hence consider in-tool inconsistency, where a sentiment analysis tool attributes different polarities to semantically equivalent documents, and inter-tool inconsistency, where different sentiment analysis tools attribute different polarities to the same document, which has a single polarity. This inconsistency means that at least one tool has given an incorrect polarity.
Consequently, returning an incorrect polarity in the query can be misleading and leads to poor business decisions. Few works have used inconsistencies to improve systems' accuracy, such as [20], which considers various labeling functions and minimizes the inter-tool inconsistency between them based on different factors: correlation, primary accuracy, and labeling abstinence. However, in [20], the inconsistency is resolved statistically, ignoring the semantic dimension that could enhance the results' quality. The work in [5] proposed to create a corpus of (event/effect) pairs for sentiment analysis by minimizing the sentiment distance between semantically equivalent (event/effect) pairs.

In our work, we study the effect of solving the two types of inconsistency on accuracy. We focus on the improvement that we can obtain by resolving in-tool inconsistency between documents, i.e., resolving inconsistency such that all semantically equivalent documents get the same polarity label, and by resolving both inconsistencies. To the best of our knowledge, the only work studying polarity inconsistency does so at word level [9], by checking the polarity consistency of sentiment words inside and across dictionaries. Our work is the first to study the effect of resolving polarity inconsistency, both in-tool and inter-tool, on accuracy over document data. We seek to converge to the ground truth by resolving in-tool and inter-tool inconsistencies: since each document has a unique polarity, resolving both inconsistencies minimizes the number of incorrect labels. Such a method can be applied to any classification task in natural language processing.

Contributions.
In summary, we make the following contributions:
• We study the impact of inconsistency on the accuracy of sentiment analysis tools.
• We propose SentiQ, an approach that resolves both polarity inconsistencies, in-tool and inter-tool. The approach is based on our earlier work on handling inconsistency in big data [2] on one side, and on the probabilistic logic framework Markov Logic Network on the other side.
• We present preliminary experimental results using the news headlines dataset [4] and the sentiment treebank dataset [24]. Compared to majority voting for resolving inter-tool inconsistencies, our framework shows the efficiency of using the semantic dimension to optimize accuracy by resolving both in-tool and inter-tool inconsistencies.
• Following the lessons learned from our experimental evaluation, we discuss promising future research directions, including the use of the semantic dimension in different integration problems, such as truth inference in crowd-sourcing and accuracy optimization of different classification problems.
Paper Outline. In the remainder of the paper, we present in Section 2 a motivation through a real example. In Section 3, we provide some preliminaries used in our work. In Sections 4 and 5, we discuss the SentiQ model based on Markov Logic Network (MLN), while in Section 6, we present our experiments and discussions.
2 MOTIVATING EXAMPLE
We consider the following real-life example, collected from Twitter, of statements about Trump's restrictions on Chinese technology, such that D = {d_1, . . . , d_9} and:
• d_1: Chinese technological investment is the next target in Trump's crackdown.
• d_2: Chinese technological investment in the US is the next target in Trump's crackdown.
• d_3: China urges end to United States crackdown on Huawei.
• d_4: China slams United States over unreasonable crackdown on Huawei.
• d_5: China urges the US to stop its unjustifiable crackdown on Huawei.
• d_6: Trump softens stance on China technology crackdown.
• d_7: Donald Trump softens threat of new curbs on Chinese investment in American firms.
• d_8: Trump drops new restrictions on China investment.

A_i | Id  | P_tb     | P_sw     | P_v      | P_h
A_1 | d_1 | Neutral  | Negative | Neutral  | Negative
    | d_2 | Negative | Negative | Neutral  | Negative
A_2 | d_3 | Negative | Positive | Neutral  | Negative
    | d_4 | Negative | Negative | Neutral  | Negative
    | d_5 | Negative | Negative | Neutral  | Negative
A_3 | d_6 | Neutral  | Positive | Neutral  | Positive
    | d_7 | Negative | Negative | Negative | Positive
    | d_8 | Negative | Positive | Neutral  | Positive
    | d_9 | Neutral  | Negative | Neutral  | Positive
Table 1: Predicted polarity on dataset D by different tools

• d_9: Donald Trump softens tone on Chinese investments.

We call each element of this dataset D a Document. We notice that D can be clustered into subsets of semantically equivalent documents. For instance, d_1 and d_2 are semantically equivalent, as they both express the idea that the US is restricting Chinese technological investments; we denote this set by A_1 and write A_1 = {d_1, d_2}. Similarly, A_2 = {d_3, d_4, d_5} expresses that the Chinese government demands that the US stop the crackdown on Huawei, and A_3 = {d_6, . . . , d_9} conveys the idea that Trump is reducing restrictions on Chinese investments. We have D = A_1 ∪ A_2 ∪ A_3. We analyze D using three sentiment analysis tools: Stanford Sentiment Treebank [24], SentiWordNet [1], and Vader [13]. In the rest of this paper, we refer to the results of these tools using the polarity functions P_tb, P_sw, and P_v; we use P_h to refer to the ground truth. Table 1 summarizes the results of the analysis.

We know that each document has a single polarity, so each accurate tool should find this polarity, and a difference in prediction results is a sign that at least one tool is erroneous on this document. We also know that semantically equivalent documents should have the same polarity. However, in this real-life example, we observe different tools attributing different polarities to the same document (e.g., only P_tb attributes the correct polarity to d_3 in A_2), which represents an inter-tool inconsistency. Also, the same tool attributes different polarities to semantically equivalent documents (e.g., P_tb considers d_1 as Neutral and d_2 as Negative), which represents an in-tool inconsistency. A trivial method to resolve those inconsistencies is to use majority voting, inside the cluster of documents or between functions.
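To make the baseline concrete, cluster-level majority voting can be sketched as follows. This is a toy sketch in Python using the Table 1 labels for clusters A_2 and A_3; the ground truth P_h does not vote.

```python
from collections import Counter

# Labels from Table 1 for clusters A2 and A3, in the order (P_tb, P_sw, P_v).
A2 = {"d3": ["Negative", "Positive", "Neutral"],
      "d4": ["Negative", "Negative", "Neutral"],
      "d5": ["Negative", "Negative", "Neutral"]}
A3 = {"d6": ["Neutral", "Positive", "Neutral"],
      "d7": ["Negative", "Negative", "Negative"],
      "d8": ["Negative", "Positive", "Neutral"],
      "d9": ["Neutral", "Negative", "Neutral"]}

def cluster_votes(cluster):
    """Pool every label cast by every tool on every document of the cluster."""
    return Counter(label for labels in cluster.values() for label in labels)

def cluster_majority(cluster):
    """Assign the single most frequent pooled label to the whole cluster."""
    return cluster_votes(cluster).most_common(1)[0][0]

# On A2, majority voting recovers the gold label (Negative).
assert cluster_majority(A2) == "Negative"
# On A3, the gold label Positive can never win: wrong labels dominate the vote.
assert cluster_votes(A3)["Positive"] < cluster_votes(A3)["Negative"]
```

The vote only sees labels inside one cluster, which is exactly the "local vision" limitation discussed next.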
However, when applying the majority voting baseline to this example, we find that the majority polarity in A_2 is Negative, which is the correct polarity for the cluster, while the majority polarity in A_3 is also Negative, which is not correct in that case. This is because, with simple majority voting, we only get a local view of each polarity function and ignore its behavior on the rest of the data.

3 PRELIMINARIES
Definition 3.1 (Sentiment Analysis).
Sentiment analysis is the process of extracting a polarity π ∈ {+, −, 0} from a document d_i, with + for positive polarity, − for negative polarity, and 0 for neutral polarity. In this paper, we refer to polarity functions as P_tk, s.t. P_tk : D → π. We refer to the set of all functions as Π, s.t. Π = {P_t1, . . . , P_tn}.

Definition 3.2 (Polarity Consistency).
Cluster: a cluster is a set of semantically equivalent documents: for a cluster A_l = {d_1, . . . , d_n}, we have ∀ d_i, d_j ∈ A_l, d_i ⇔_s d_j.

Sentiment Quality: we define the polarity consistency of a given cluster A_i by the two following rules.

In-tool consistency means that semantically equivalent documents should get the same polarity, s.t.:

∀ d_i, d_j ∈ A, ∀ P_* ∈ Π: P_*(d_i) = P_*(d_j)   (1)

Inter-tool consistency means that all polarity functions should give the same polarity to the same document:

∀ d_i ∈ A, ∀ P_tk, P_tk' ∈ Π: P_tk(d_i) = P_tk'(d_i)   (2)

Definition 3.3 (Markov Logic Network (MLN)). We recall the Markov logic network (MLN) model [6, 21], a general framework for combining logic and probability.
An MLN is defined as a set of weighted first-order logic (FOL) formulas with free variables, L = {(l_1, w_1), . . . , (l_n, w_n)}, with w_i ∈ IR ∪ {∞} and l_i an FOL constraint. Together with a set of constants C = {c_1, . . . , c_m}, it constitutes the Markov network M_{L,C}. M_{L,C} contains one node for each predicate grounding, whose value is 1 if the grounding is true and 0 otherwise. Each formula of L is represented by a feature node whose value is 1 if the formula's grounding is true and 0 otherwise. The syntax of the formulas that we adopt in this paper is the FOL syntax.

World: a world x over a domain C is a possible grounding of the MLN constraints over C.

Hard constraints are constraints with infinite weight (w_i = ∞). A world x that violates these constraints is impossible.

Soft constraints are constraints with a finite weight (w_i ∈ IR) that can be violated.

World's probability is the probability distribution over possible worlds x in M_{L,C}, given by

Pr(X = x) = (1/Z) exp(Σ_i w_i n_i(x)),

where n_i(x) is the number of true groundings of formula l_i in x and Z is a normalization factor.

Grounding: we define grounding as the operation of replacing predicate variables by constants from C.

4 SEMANTIC MODEL
The polarity inconsistency is a complex problem, due to the natures of the tools and documents and the relations between them. This problem can be solved using semantics to model the relations between tools and documents, and a statistical dimension to optimize both the inconsistency and the accuracy of the system; this is why we chose MLN to model the resulting inconsistent system. We present the details of our semantic model in this section.
Our semantic model is a knowledge base KB = <R, F>, where:
(1) R is a set of rules (FOL formulas) defining the vocabulary of our application, which consists of concepts (sets of individuals) and relations between them.
(2) F is a set of facts representing the instances of the concepts or individuals defined in R.

We represent each document by the concept Document, and each polarity function in the system by its symbol and the polarity that it attributes to the Document. For instance, P_tb+(d_i), P_tb0(d_i), and P_tb−(d_i) represent respectively the polarities (+, 0, −) attributed to Document(d_i) by the polarity function P_tb. Each Document is Positive, Negative, or Neutral; this is represented respectively by the concepts IsPositive, IsNegative, and IsNeutral. We also have the relation sameAs, a semantic similarity between documents in the input dataset clusters. For instance, sameAs(d_1, d_2) indicates that the documents Document(d_1) and Document(d_2) are semantically equivalent.

We define two types of rules from R in our framework: inference rules and inconsistency rules.

Inference rules (IR)
The inference rules allow deriving the implicit instances. They model the quality of the polarity at the in-tool and inter-tool levels. They are soft rules that add an uncertainty layer to the different polarity functions, based on the inconsistency of the tools.

• In-tool consistency rules. This set of rules models the fact that all the documents of a cluster should have the same polarity. They are defined as follows (for the sake of clarity, we omit the predicate Document(d_i) in all logical rules):

IR1: sameAs(?d_i, ?d_j) ∧ IsPositive(?d_j) → IsPositive(?d_i)
IR2: sameAs(?d_i, ?d_j) ∧ IsPositive(?d_i) → IsPositive(?d_j)

The rule IR1 denotes that if two documents d_i and d_j are semantically equivalent (expressed with the sameAs relation), they get the same polarity, which translates the in-tool consistency defined in Equation 1. The sameAs relation is transitive, symmetric, and reflexive. We express the symmetry by duplicating the rule for both documents of the relation (rules IR1 and IR2, instead of only one rule). For instance, when applying the rule IR1 on the relation sameAs(d_1, d_2) and the instances IsNeutral(d_1) and IsNegative(d_2), we infer the new instance IsNeutral(d_2); the instance IsNegative(d_1) is inferred when applying the rule IR2. The transitivity is handled in the instantiating step (Algorithm 1), and we ignore the reflexivity of the relation because it does not infer additional knowledge. Note that IR1 and IR2 are examples of rules; the full set of rules is applied in Algorithm 2.

• Inter-tool consistency rules.
These rules model the inter-tool consistency described in Equation 2, by assuming that each function gives the correct polarity to the document. For example, given the instance P_tb−(d_3), the rule IR4 infers IsNegative(d_3). For each tool in the system, we create the following rules by replacing P_tk* with the polarity function of the tool:

IR3: P_tk+(?d_i) → IsPositive(?d_i)
IR4: P_tk−(?d_i) → IsNegative(?d_i)
IR5: P_tk0(?d_i) → IsNeutral(?d_i)

These soft rules allow us to represent inconsistencies in the system and to attribute a ranking to the rules, which we use in the in-tool uncertainty calculation. The idea behind this modeling is that if the inter-tool consistency is respected, all tools will attribute the same polarity to a document; otherwise, the document will have different (contradictory) polarities. To represent this contradiction, we define, next, the inconsistency rules.
Inconsistency rules (ICR)
These are hard rules that represent the mutual exclusion between polarities, since each document has a unique polarity:
ICR1: IsPositive(?d_i) → ¬IsNegative(?d_i) ∧ ¬IsNeutral(?d_i)
ICR2: IsNegative(?d_i) → ¬IsPositive(?d_i) ∧ ¬IsNeutral(?d_i)
ICR3: IsNeutral(?d_i) → ¬IsPositive(?d_i) ∧ ¬IsNegative(?d_i)

These rules generate negative instances that create the inconsistencies used in learning the inference rules' weights. For instance, consider the instances P_tb−(d_3) and P_sw+(d_3) from the motivating example. By applying the inter-tool consistency inference rules, we infer IsNegative(d_3) and IsPositive(d_3). However, F appears consistent even though it contains polarity inconsistencies. We obtain the inconsistency once we apply the inconsistency rules: we get ¬IsPositive(d_3), ¬IsNeutral(d_3), and ¬IsNegative(d_3), which represent an explicit inconsistency in F.

As depicted in Figure 1, the proposed inconsistency resolution process follows four main phases:
• Inference of implicit knowledge: the system infers all the implicit knowledge needed for the inconsistencies before applying the MLN learning procedure.
• Detection of inconsistencies: having the explicit and implicit knowledge, we apply the inconsistency rules to discover the inconsistencies.
• Inference of correct polarity: using the saturated fact set F and R, the system first learns the weights of M_{R,F}, and then uses them to infer the correct polarities.
• Resolution of in-tool inconsistencies: since we are in an uncertain environment, we can still have some in-tool inconsistencies after the previous phase, which we resolve by applying a weighted majority voting.

The phases will be detailed in the next section.
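The detection phase reduces to applying ICR1-ICR3 over a fact set. A minimal sketch, with an illustrative fact set (the predicate names follow the model above; the facts are not the paper's data):

```python
# Facts inferred for d3 by the inter-tool rules (P_tb says -, P_sw says +);
# d4 carries a single, uncontested polarity.
facts = {("IsNegative", "d3"), ("IsPositive", "d3"), ("IsNeutral", "d4")}

# ICR1-ICR3: each polarity concept excludes the two others.
EXCLUSIONS = {"IsPositive": ("IsNegative", "IsNeutral"),
              "IsNegative": ("IsPositive", "IsNeutral"),
              "IsNeutral": ("IsPositive", "IsNegative")}

def inconsistent_documents(facts):
    """Return the documents carrying two mutually exclusive polarity atoms."""
    bad = set()
    for pred, doc in facts:
        for excluded in EXCLUSIONS[pred]:
            if (excluded, doc) in facts:
                bad.add(doc)
    return bad

assert inconsistent_documents(facts) == {"d3"}
```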
5 REASONING PROCESS
In this section, we discuss the reasoning process used to solve the inconsistencies and improve the accuracy.
Our data are first saved in a relational database, where each table represents a concept and the table content represents the concept's domain. Instantiating our data follows the steps of Algorithm 1. Each pair of a function and a polarity is represented by a table, whose content is the IDs of the documents that received this polarity from the function. The instantiating process converts the content of the database into the logic predicates that we use in our reasoning. The purpose of this algorithm is to fill the set F with the prior knowledge needed for the reasoning: the documents, the polarities attributed by the functions to the documents, and the semantic similarity between documents, represented by the sameAs predicate. Note that we do not consider the ground truth; we adopt an unsupervised approach, because inconsistency resolution is useful precisely when we do not know the valid predictions from the invalid ones.
Algorithm 1: Instantiating
Input: database with prior knowledge
Output: F, the set of generated facts (polarities and sameAs relations)

procedure INSTANTIATING
  // Step 1: add all polarities attributed to documents
  for each P_tk ∈ Functions:
    for each d_i ∈ P_tk+: F.add(P_tk+(d_i))
    for each d_i ∈ P_tk−: F.add(P_tk−(d_i))
    for each d_i ∈ P_tk0: F.add(P_tk0(d_i))
  // Step 2: add sameAs relations
  clusters = groupByClusterId(D)
  for each cluster ∈ clusters:
    for i ∈ {0, . . . , len(cluster)}:
      for j ∈ {i+1, . . . , len(cluster)}:
        if SameAs(d_i, d_j) ∉ F: F.add(SameAs(d_i, d_j))
  return F

In MLN, the learning is done only on the knowledge available in F. For this, we infer all implicit knowledge in the system before applying the learning procedure. The inference procedure is presented in Algorithm 2. This inference phase is crucial for an integrated learning, since most polarity knowledge is implicit. For instance, consider two semantically equivalent documents d_i and d_j from the motivating example, with P_sw+(d_i) and P_sw−(d_j). By inferring the documents' polarities using the inter-tool consistency rules, we get IsPositive(d_i) and IsNegative(d_j). When applying the in-tool consistency rules on the previous concepts and the relation sameAs(d_i, d_j), we infer the new polarities IsNegative(d_i) and IsPositive(d_j). We ensure that all implicit knowledge is inferred by repeating the inference until no new knowledge is produced; this process is called inference by saturation.

Algorithm 2: Implicit knowledge inference
Input: F with prior knowledge
Output: saturated F

procedure INFERENCE
  // Step 1: infer polarities by applying the inter-tool consistency rules
  for each P_tk+(d_i) ∈ F:
    if IsPositive(d_i) ∉ F: F.add(IsPositive(d_i))
  for each P_tk−(d_i) ∈ F:
    if IsNegative(d_i) ∉ F: F.add(IsNegative(d_i))
  for each P_tk0(d_i) ∈ F:
    if IsNeutral(d_i) ∉ F: F.add(IsNeutral(d_i))
  // Step 2: infer polarities by applying the in-tool consistency rules
  sameAsRelations = getSameAs(F)
  repeat:
    for each sameAs(d_i, d_j) ∈ sameAsRelations:
      if IsPositive(d_i) ∈ F ∧ IsPositive(d_j) ∉ F: F.add(IsPositive(d_j))
      if IsPositive(d_j) ∈ F ∧ IsPositive(d_i) ∉ F: F.add(IsPositive(d_i))
      if IsNegative(d_i) ∈ F ∧ IsNegative(d_j) ∉ F: F.add(IsNegative(d_j))
      if IsNegative(d_j) ∈ F ∧ IsNegative(d_i) ∉ F: F.add(IsNegative(d_i))
      if IsNeutral(d_i) ∈ F ∧ IsNeutral(d_j) ∉ F: F.add(IsNeutral(d_j))
      if IsNeutral(d_j) ∈ F ∧ IsNeutral(d_i) ∉ F: F.add(IsNeutral(d_i))
  until no new instance is inferred
  return F

After inferring all implicit knowledge in the set F, we apply the inconsistency rules (ICR), which explicitly expose the inconsistencies, as presented in Algorithm 3. We apply these rules on a saturated knowledge base because most inconsistencies are implicit. For instance, if we apply the inconsistency rules directly after inferring the polarities IsPositive(d_i) and IsNegative(d_j), we only get ¬IsNegative(d_i) and ¬IsPositive(d_j). However, when applying the in-tool consistency rules on the previous concepts and the relation sameAs(d_i, d_j) (saturation process), we obtain IsNegative(d_i) and IsPositive(d_j); applying the inconsistency rules on these instances then yields ¬IsPositive(d_i) and ¬IsNegative(d_j), which represents an implicit inconsistency in the fact set F. Therefore, applying the inconsistency rules on F after the saturation process is an important step in our reasoning procedure, because it exposes all inconsistencies, even the implicit ones.

Here we discuss how to reduce the inconsistencies discovered in the previous phase by applying the
MLN approach. The reasoning process will first learn the rules' weights of R and then use them to infer the correct polarities.

Grounding. The grounding algorithm enumerates all possible assignments of the formulas' free variables (the set of possible worlds). We use the grounding algorithm described in [19] because it speeds up the inference process. We adopt the closed-world assumption; hence, we consider all groundings that are not present in the fact set as false.
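Under the closed-world assumption, the truth value of every ground atom follows directly from membership in the fact set. A minimal sketch, with illustrative constants and predicates:

```python
from itertools import product

# Toy grounding under the closed-world assumption (CWA): every ground atom
# not listed in the fact set F is taken to be false.
constants = ["d1", "d2"]
predicates = {"IsPositive": 1, "sameAs": 2}  # predicate name -> arity
F = {("IsPositive", ("d1",)), ("sameAs", ("d1", "d2"))}

def ground_atoms():
    """Enumerate every grounding of every predicate over the constants."""
    for pred, arity in predicates.items():
        for args in product(constants, repeat=arity):
            yield (pred, args)

truth = {atom: atom in F for atom in ground_atoms()}  # CWA: absent -> False

assert truth[("IsPositive", ("d1",))] is True
assert truth[("IsPositive", ("d2",))] is False  # not in F, so false
```

Note that sameAs(d2, d1) also comes out false here, which is why the model expresses symmetry explicitly through duplicated rules rather than relying on the grounding.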
Learning. To learn the rules' weights, we use the discriminative training described in [18]. The training consists of optimizing the negative conditional log-likelihood, given by:

−log P(Y = y | X = x) = log Z_x − Σ_i w_i n_i(x, y),

where X represents the priors (the saturated inconsistent fact set), Y the set of queries (in our case, IsPositive(d_i), IsNegative(d_i), IsNeutral(d_i)), Z_x the normalization factor over the set of worlds, and n_i(x, y) the number of true groundings of the formula l_i (the inference rules) in the worlds where Y holds.

Figure 1: SentiQ overview

For the optimization, we use the Diagonal Newton discriminative method described in [18], which computes the Hessian of the negative conditional log-likelihood, given by:

∂²/∂w_i ∂w_j (−log P(Y = y | X = x)) = E_w[n_i n_j] − E_w[n_i] E_w[n_j],

with E_w the expectation. We call the inference procedure MC-SAT to estimate the numbers of satisfied (true) groundings n_i and n_j. Note that the rules are considered independently in the learning process: we count each formula's true groundings separately in the world, and the implicit knowledge is therefore not taken into account, which justifies inferring all implicit knowledge and inconsistencies before learning.
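The objective can be sanity-checked on a one-formula toy model: the gradient of the negative conditional log-likelihood with respect to w_i is E_w[n_i] − n_i(x, y). The numeric sketch below verifies this identity against a finite difference; it is an illustration of the objective, not the paper's Diagonal Newton implementation.

```python
import math

# One binary query atom y and one formula with count n(y) = 1 if y else 0, so
#   -log P(y|x) = log Z_x - w * n(y),  with  Z_x = sum_{y'} exp(w * n(y')).
def neg_cll(w, y):
    Zx = sum(math.exp(w * n) for n in (0, 1))
    return math.log(Zx) - w * (1 if y else 0)

# Analytic gradient: E_w[n] - n(y); compare against a central finite difference.
w, y = 0.8, True
E_n = math.exp(w) / (1 + math.exp(w))  # expected count under the model
analytic = E_n - 1
numeric = (neg_cll(w + 1e-6, y) - neg_cll(w - 1e-6, y)) / 2e-6
assert abs(analytic - numeric) < 1e-6
```

In the full model this expectation is intractable to compute exactly, which is why MC-SAT is used to estimate the counts.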
The inference in MLN [21] contains two steps: a grounding step, where we sample all possible worlds based on the priors and construct a large weighted SAT formula used in the satisfiability calculation, and a search step, to find the best truth assignment for this SAT formula. In our work, we use the marginal inference algorithm, which estimates the atoms' probabilities and returns the query answer with a probability score representing the confidence. It uses the MC-SAT algorithm, which combines satisfiability verification with MCMC by calling, in each step, the SampleSAT algorithm, a combination of Simulated Annealing and WalkSAT. Note that the WalkSAT algorithm selects, in each iteration, an unsatisfied clause, selects an atom from the clause, and flips its truth value to satisfy the clause.
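The WalkSAT flip step described above can be sketched as follows. This is a simplified variant that picks the flipped atom at random; real WalkSAT mixes random and greedy variable selection.

```python
import random

def walksat(clauses, n_vars, max_flips=1000, seed=0):
    """Minimal WalkSAT sketch: clauses are lists of (var_index, wanted_value)
    literals; repeatedly pick an unsatisfied clause and flip one of its atoms."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars)]

    def satisfied(clause):
        return any(assign[v] == sign for v, sign in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign           # all clauses satisfied
        v, _ = rng.choice(rng.choice(unsat))  # pick clause, then a literal in it
        assign[v] = not assign[v]   # flip its truth value
    return None

# (x0 or x1) and (not x0 or x1): every satisfying assignment has x1 = True.
model = walksat([[(0, True), (1, True)], [(0, False), (1, True)]], 2)
assert model is not None and model[1] is True
```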
Majority voting could be a solution to the inconsistency problem. However, this trivial method takes into consideration only the voting subset (the cluster) and ignores information about the voters (the polarity functions) from the other voting subsets (the other clusters), which may hurt the accuracy. To enhance the quality, in terms of accuracy, of the inconsistency resolution, SentiQ follows two steps:
Step 1. We use MLN to model the different inconsistencies and select the most appropriate polarity of the set (phases 1 to 3 of the process). We illustrate in Figure 1 the global workflow of our system. As input, we have an unlabeled dataset D (1) that we cluster to group the semantically equivalent documents. Then, (2) we extract the polarities from the documents using the different polarity functions (P_tb, P_sw, P_v). After that, (4) we construct our knowledge base KB by first creating the fact set F (Algorithm 1). (5) We infer all implicit knowledge by applying the inference rules (IR) on the fact set until saturation (Algorithm 2). Then we apply the inconsistency rules (ICR) to generate the different inconsistencies between polarities (Algorithm 3). (7) We learn the weights of the inference rules. (8) The output of the learning procedure is a set of weighted inference rules that we apply on the prior knowledge to infer the most appropriate polarities for the documents. Running the motivating example through this system shows an improvement in both consistency and accuracy (an accuracy of 88% and a low inconsistency).
Step 2 (phase 4 of the process). As we may still have inconsistencies after the previous step, we resolve the remaining ones using weighted majority voting, with the polarities' probabilities as weights, which leads to an accuracy of 100% on the motivating example.
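Phase 4 can be sketched as a probability-weighted vote over each residual inconsistent cluster. The marginal probabilities shown are illustrative placeholders, not the paper's numbers:

```python
from collections import defaultdict

def weighted_majority(cluster_labels):
    """cluster_labels: {doc: (label, probability)} from the MLN's marginal
    inference; return the label with the highest summed probability mass."""
    scores = defaultdict(float)
    for label, prob in cluster_labels.values():
        scores[label] += prob
    return max(scores, key=scores.get)

# Two confident Positive votes outweigh one weak Negative vote (1.6 vs 0.4),
# even though a plain count would be closer.
cluster = {"d6": ("Positive", 0.9), "d7": ("Negative", 0.4), "d8": ("Positive", 0.7)}
assert weighted_majority(cluster) == "Positive"
```

Unlike plain majority voting, low-confidence labels contribute little weight, so a single confident tool can override several uncertain ones.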
Algorithm 3: Discover inconsistencies
Input: saturated F
Output: inconsistent F

procedure INCONSISTENCY_INFERENCE
  // Get all polarities from F and apply the inconsistency rules
  polarities = F.getPolarities()
  for each Polarity ∈ polarities:
    if Polarity == IsPositive(d_i):
      F.add(¬IsNegative(d_i)); F.add(¬IsNeutral(d_i))
    if Polarity == IsNegative(d_i):
      F.add(¬IsPositive(d_i)); F.add(¬IsNeutral(d_i))
    if Polarity == IsNeutral(d_i):
      F.add(¬IsPositive(d_i)); F.add(¬IsNegative(d_i))
  return F

Figure 2: Accuracy optimization on the Stanford treebank
Figure 3: Accuracy optimization on news headlines
6 EXPERIMENTS
Tools. In our experiments, we use five representative sentiment analysis tools: a convolutional neural network with word embeddings, P_cnn_txt [16]; a convolutional neural network with character embeddings, P_char_cnn [7]; [13] as P_v; [1] as P_sw; and [24] as P_tb. We chose these tools because of their performance and because they belong to different categories of methods, so they behave differently with respect to inconsistency.
Dataset. We study the effect of inconsistency resolution on accuracy using two publicly available datasets for sentiment analysis: the news headlines dataset [4] and the test split of the sentiment treebank dataset [24] (SST). To consider the in-tool inconsistency, and for experimental purposes, we augmented the datasets with paraphrases generated using a generative adversarial network (GAN) [15]. For each document in the dataset, we generated three paraphrased documents with the same polarity as the original one. These datasets allow us to study the effect of resolving in-tool and inter-tool inconsistency on accuracy. Note that in future work, we will use a clustering method on the data to create our clusters. Statistics about the datasets are presented in Table 2.
Experiments. To evaluate the efficiency of resolving inconsistencies with SentiQ, in terms of system accuracy, we compare it to the majority voting (MV) baseline. We use MV to resolve the in-tool inconsistency, the inter-tool inconsistency, and both inconsistencies;
Table 2: Statistics on datasets (News_heads and SST).

then, we calculate the accuracy on the dataset after resolving the contradictions. Majority voting for in-tool inconsistency resolution consists of computing the most frequent polarity in the cluster and attributing it to all of the cluster's documents:

P_tk(A) = argmax over {+, 0, −} of { Σ_{d_i ∈ A} 1(P_tk(d_i) = +), Σ_{d_i ∈ A} 1(P_tk(d_i) = 0), Σ_{d_i ∈ A} 1(P_tk(d_i) = −) }.

Inter-tool inconsistency resolution using majority voting consists of attributing to the document the polarity given by most tools:

P_*(d_i) = argmax over {+, 0, −} of { Σ_{P_tk ∈ Π} 1(P_tk(d_i) = +), Σ_{P_tk ∈ Π} 1(P_tk(d_i) = 0), Σ_{P_tk ∈ Π} 1(P_tk(d_i) = −) }.

Resolving both inconsistencies with MV consists of pooling, within the cluster, all polarities given by all polarity functions and attributing to each document the most frequent polarity.
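The three majority-voting baselines reduce to argmax over label counts. A compact sketch over the label alphabet {+, 0, −}:

```python
from collections import Counter

def mv_in_tool(cluster_labels):
    """In-tool MV: most frequent polarity one tool assigns inside a cluster."""
    return Counter(cluster_labels).most_common(1)[0][0]

def mv_inter_tool(tool_labels):
    """Inter-tool MV: most frequent polarity all tools assign to one document."""
    return Counter(tool_labels).most_common(1)[0][0]

def mv_both(cluster_tool_labels):
    """Both: pool every label from every tool on every document of a cluster."""
    pooled = Counter(l for labels in cluster_tool_labels for l in labels)
    return pooled.most_common(1)[0][0]

assert mv_in_tool(["-", "-", "0"]) == "-"
assert mv_inter_tool(["+", "0", "+"]) == "+"
assert mv_both([["+", "0"], ["+", "-"]]) == "+"
```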
To evaluate the accuracy improvement obtained by SentiQ, we run SentiQ on the two datasets, news headlines and SST. Figures 2 and 3 present the accuracy of resolving inconsistencies with SentiQ on SST and news headlines, respectively, with the two queries IsNegative(d) and IsPositive(d) and the polarity functions P_char_cnn, P_text_cnn, P_sw, and P_v. We observe an accuracy improvement of 0.629 and 0.56 on SST and news headlines, respectively. These preliminary results show the effectiveness of resolving both in-tool and inter-tool inconsistencies with SentiQ for improving accuracy. To analyze the performance and the limits of SentiQ, we compare it in the next section to the MV baseline on datasets of varying sizes.

Accuracy optimization and dataset size.
The results are presented in Table 3. We evaluate the accuracy optimization of the polarity functions on samples of different sizes from the news headlines dataset, using SentiQ and MV to resolve in-tool inconsistencies, inter-tool inconsistencies, and both. "Original Acc" denotes the original accuracy of each polarity function on this dataset, while "MV in-tool" denotes the accuracy on the different samples after resolving in-tool inconsistency using MV. "Inter-tool MV" denotes the overall accuracy of the system after resolving inter-tool inconsistencies, and the last line of the table gives the accuracy obtained after inferring the polarity of the whole system using SentiQ.

Results.
We observe that resolving in-tool inconsistency increases the accuracy of the tools in most cases. The only case of accuracy degradation is the tool P_v, whose accuracy drops after resolving inconsistencies. When analyzing the data for this case, we found that most of this tool's predictions were Neutral, in disagreement with the ground truth; as a result, majority voting overturned the correctly predicted instances.

Resolving inter-tool inconsistency using majority voting decreases effectiveness when the tools are incompatible in terms of accuracy (i.e., they have widely different accuracy scores). This is the case for the two samples of size 25 and size 100 in Table 3, where the weak performance of P_v, P_sw, and P_tb on the datasets degraded the accuracy of the voting system. SentiQ addresses this problem because it weighs the different tools based on the inconsistencies observed over the whole dataset; it provides an accuracy improvement on the first, second, and third samples, outperforming majority voting.

This leads to further research problems, in particular scalability: we could not run experiments with a larger dataset due to the high inference complexity of the Markov solver. We therefore need a more efficient Markov logic solver, adapted to analyzing large-scale social media data. We also observe that the MLN solver removes some rules from the model (by assigning them a negative or null weight), which can penalize the inference. The final results of the system could be boosted by adding business rules that improve the polarity inference. This approach can also be applied to other problems, such as truth inference in crowd-sourcing and other classification tasks.
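The advantage of weighting tools over plain majority voting can be seen in a toy sketch (the names, weights, and data below are ours for illustration, not SentiQ's actual model): three weak tools that agree on a wrong label outvote one strong tool under plain MV, but not once votes are scaled by a per-tool weight. SentiQ learns such weights through MLN weight learning; here we simply hand the strong tool a larger weight.

```python
# Toy comparison of plain majority voting vs. weighted voting
# (illustrative simplification of the weighting idea behind SentiQ).
from collections import Counter


def majority_vote(votes):
    """votes: {tool: polarity} -> most frequent polarity ('+', '0', '-')."""
    return Counter(votes.values()).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Same, but each tool's vote counts with its weight."""
    score = Counter()
    for tool, polarity in votes.items():
        score[polarity] += weights[tool]
    return score.most_common(1)[0][0]


# One document: the accurate tool says '+', three weak tools say '0'.
votes = {"P_char_cnn": "+", "P_sw": "0", "P_tb": "0", "P_v": "0"}
weights = {"P_char_cnn": 3.0, "P_sw": 0.5, "P_tb": 0.5, "P_v": 0.5}

print(majority_vote(votes))           # '0' -- the weak tools win the vote
print(weighted_vote(votes, weights))  # '+' -- weighting recovers the label
```

In SentiQ the analogue of `weights` is learned from the inconsistencies observed on the whole dataset rather than fixed by hand.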
We showed that resolving both in-tool and inter-tool inconsistencies outperforms resolving inter-tool inconsistencies alone.

In this paper, we presented an MLN-based approach to resolve inconsistencies and improve classification accuracy. Our results show the benefit of injecting semantics to resolve in-tool inconsistency. The initial results of SentiQ are promising and confirm that resolving in-tool inconsistency boosts accuracy. However, to test the efficiency of SentiQ in resolving inconsistencies and improving accuracy on social media data, we need MLN solvers that can scale with the data size. Finally, we plan to investigate the use of domain-expert rules to improve the polarity inference of SentiQ.
This work has been supported by the French ANRT program and IMBA Consulting.
REFERENCES
[1] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In Lrec. 2200–2204.
[2] Salima Benbernou and Mourad Ouziri. 2017. Enhancing data quality by cleaning inconsistent big RDF data. IEEE, 74–79.
[3] Erik Cambria, Soujanya Poria, Devamanyu Hazarika, and Kenneth Kwok. 2018. SenticNet 5: discovering conceptual primitives for sentiment analysis by means of context embeddings. In Proceedings of AAAI.
[4] Keith Cortis, André Freitas, Tobias Daudert, Manuela Huerlimann, Manel Zarrouk, Siegfried Handschuh, and Brian Davis. 2017. Semeval-2017 task 5: Fine-grained sentiment analysis on financial microblogs and news. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 519–535.
[5] Haibo Ding and Ellen Riloff. 2018. Weakly supervised induction of affective events by optimizing semantic consistency. In Thirty-Second AAAI Conference on Artificial Intelligence.
[6] Pedro M. Domingos and Daniel Lowd. 2019. Unifying logical and statistical AI with Markov logic. Commun. ACM 62, 7 (2019), 74–83. https://doi.org/10.1145/3241978
[Table 3 body: rows P_char_cnn, P_cnn_txt, P_sw, P_tb, P_v, inter-tool MV, and SentiQ; columns Original Acc and accuracies for sample sizes 25, 100, 500, and 1500. The numeric cells and the N/A entries were not recoverable from the extraction.]
Table 3: Accuracy of tools before/after inconsistency resolution. The best performance for each dataset size is marked in bold.

[7] Cicero Dos Santos and Maira Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 69–78.
[8] Mauro Dragoni and Giulio Petrucci. 2018. A fuzzy-based strategy for multi-domain sentiment analysis. International Journal of Approximate Reasoning.
[9] IEEE Trans. Knowl. Data Eng. 27, 3 (2015), 838–851.
[10] DI Hernández Farias and Paolo Rosso. 2017. Irony, sarcasm, and sentiment analysis. In Sentiment Analysis in Social Networks. Elsevier, 113–128.
[11] Ronen Feldman. 2013. Techniques and applications for sentiment analysis. Commun. ACM 56, 4 (2013), 82–89.
[12] Guohong Fu, Yu He, Jiaying Song, and Chaoyue Wang. 2014. Improving Chinese sentence polarity classification via opinion paraphrasing. In Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing. 35–42.
[13] CJ Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International Conference on Weblogs and Social Media (ICWSM-14).
[14] Stephan Greene and Philip Resnik. 2009. More than words: Syntactic packaging and implicit sentiment. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 503–511.
[15] Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 1875–1885.
[16] Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).
[17] Efthymios Kouloumpis, Theresa Wilson, and Johanna D Moore. 2011. Twitter sentiment analysis: The good the bad and the omg! Icwsm 11 (2011), 538–541.
[18] Daniel Lowd and Pedro Domingos. 2007. Efficient weight learning for Markov logic networks. In European Conference on Principles of Data Mining and Knowledge Discovery. Springer, 200–211.
[19] Feng Niu, Christopher Ré, AnHai Doan, and Jude Shavlik. 2011. Tuffy: Scaling up statistical inference in Markov logic networks using an RDBMS. arXiv preprint arXiv:1104.3216 (2011).
[20] Alexander Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason A. Fries, Sen Wu, and Christopher Ré. 2020. Snorkel: rapid training data creation with weak supervision. VLDB J. 29, 2 (2020), 709–730. https://doi.org/10.1007/s00778-019-00552-1
[21] Matthew Richardson and Pedro M. Domingos. 2006. Markov logic networks. Mach. Learn. 62, 1-2 (2006), 107–136. https://doi.org/10.1007/s10994-006-5833-1
[22] Julian Risch and Ralf Krestel. 2018. Aggression identification using deep learning and data augmentation. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018). 150–158.
[23] Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter sentiment analysis with deep convolutional neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 959–962.
[24] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1631–1642.
[25] Mikalai Tsytsarau and Themis Palpanas. 2016. Managing Diverse Sentiments at Large Scale. IEEE Trans. Knowl. Data Eng. 28, 11 (2016), 3028–3040.
[26] Soroush Vosoughi, Prashanth Vijayaraghavan, and Deb Roy. 2016. Tweet2vec: Learning tweet embeddings using character-level CNN-LSTM encoder-decoder. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1041–1044.
[27] Yequan Wang, Aixin Sun, Jialong Han, Ying Liu, and Xiaoyan Zhu. 2018. Sentiment Analysis by Capsules. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018. 1165–1174.
[28] Jason W Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196 (2019).