[PDF] A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles

Abstract

This paper presents a novel two-stage framework to extract opinionated sentences from a given news article. In the first stage, Naive Bayes classifier by utilizing the local features assigns a score to each sentence - the score signifies the probability of the sentence to be opinionated. In the second stage, we use this prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and relation between the sentences. In the HITS schema, the opinionated sentences are treated as Hubs and the facts around these opinions are treated as the Authorities. The algorithm is implemented and evaluated against a set of manually marked data. We show that using HITS significantly improves the precision over the baseline Naive Bayes classifier. We also argue that the proposed method actually discovers the underlying structure of the article, thus extracting various opinions, grouped with supporting facts as well as other supporting opinions from the article.

Full PDF

AA Novel Two-stage Framework for Extracting Opinionated Sentencesfrom News Articles

Rajkumar Pujari , Swara Desai , Niloy Ganguly and Pawan Goyal Dept. of Computer Science and Engineering,Indian Institute of Technology Kharagpur, India – 721302 Yahoo! India [email protected], { niloy,pawang } @cse.iitkgp.ernet.in [email protected] Abstract

This paper presents a novel two-stageframework to extract opinionated sentencesfrom a given news article. In the ﬁrst stage,Na¨ıve Bayes classiﬁer by utilizing the localfeatures assigns a score to each sentence- the score signiﬁes the probability of thesentence to be opinionated. In the secondstage, we use this prior within the HITS(Hyperlink-Induced Topic Search) schema toexploit the global structure of the article andrelation between the sentences. In the HITSschema, the opinionated sentences are treatedas Hubs and the facts around these opinionsare treated as the Authorities. The algorithmis implemented and evaluated against a set ofmanually marked data. We show that usingHITS signiﬁcantly improves the precisionover the baseline Na¨ıve Bayes classiﬁer.We also argue that the proposed methodactually discovers the underlying structure ofthe article, thus extracting various opinions,grouped with supporting facts as well as othersupporting opinions from the article.

With the advertising based revenues becoming the mainsource of revenue, ﬁnding novel ways to increasefocussed user engagement has become an importantresearch topic. A typical problem faced by webpublishing houses like Yahoo!, is understanding thenature of the comments posted by readers of articles posted at any moment on its website. A lotof users engage in discussions in the comments sectionof the articles. Each user has a different perspectiveand thus comments in that genre - this many a times,results in a situation where the discussions in thecomment section wander far away from the article’stopic. In order to assist users to discuss relevant pointsin the comments section, a possible methodology canbe to generate questions from the article’s content thatseek user’s opinions about various opinions conveyedin the article (Rokhlenko and Szpektor, 2013). Itwould also direct the users into thinking about aspectrum of various points that the article coversand encourage users to share their unique, personal, daily-life experience in events relevant to the article.This would thus provide a broader view point forreaders as well as perspective questions can be createdthus catering to users with rich user generated content,this in turn can increase user engagement on the articlepages. Generating such questions manually for hugevolume of articles is very difﬁcult. However, if onecould identify the main opinionated sentences withinthe article, it will be much easier for an editor togenerate certain questions around these. Otherwise, thesentences themselves may also serve as the points fordiscussion by the users.Hence, in this paper we discuss a two-stagealgorithm which picks opinionated sentences fromthe articles. The algorithm assumes an underlyingstructure for an article, that is, each opinionatedsentence is supported by a few factual statements thatjustify the opinion. We use the HITS schema toexploit this underlying structure and pick opinionatedsentences from the article.The main contribtutions of this papers are as follows.First, we present a novel two-stage framework forextracting opinionated sentences from a news article.Secondly, we propose a new evaluation metric thattakes into account the fact that since the amountof polarity (and thus, the number of opinionatedsentences) within documents can vary a lot and thus,we should stress on the ratio of opinionated sentencesin the top sentences, relative to the ratio of opinionatedsentences in the article. Finally, discussions on how theproposed algorithm captures the underlying structureof the opinions and surrounding facts in a news articlereveal that the algorithm does much more than justextracting opinionated sentences.This paper has been organised as follows. Section2 discusses related work in this ﬁeld. In section 3, wediscuss our two-stage model in further details. Section4 discusses the experimental framework and the results.Further discussions on the underlying assumptionbehind using HITS along with error analysis are carriedout in Section 5. Conclusions and future work aredetailed in Section 6. Opinion mining has drawn a lot of attention in recentyears. Research works have focused on mining a r X i v : . [ c s . C L ] J a n pinions from various information sources such asblogs (Conrad and Schilder, 2007; Harb et al., 2008),product reviews (Hu and Liu, 2004; Qadir, 2009; Daveet al., 2003), news articles (Kim and Hovy, 2006;Hu and Liu, 2006) etc. Various aspects in opinionmining have been explored over the years (Ku etal., 2006). One important dimension is to identifythe opinion holders as well as opinion targets. (Lu,2010) used dependency parser to identify the opinionholders and targets in Chinese news text. (Choiet al., 2005) use Conditional Random Fields toidentify the sources of opinions from the sentences.(Kobayashi et al., 2005) propose a learning basedanaphora resolution technique to extract the opiniontuple < Subject, Attribute, V alue > . Opinionsummarization has been another important aspect (Kimet al., 2013).A lot of research work has been done for opinionmining from product reviews where most of the textis opinion-rich. Opinion mining from news articles,however, poses its own challenges because in contrastwith the product reviews, not all parts of news articlespresent opinions (Balahur et al., 2013) and thus ﬁndingopinionated sentences itself remains a major obstacle.Our work mainly focus on classifying a sentence in anews article as opinionated or factual. There have beenworks on sentiment classiﬁcation (Wiebe and Riloff,2005) but the task of ﬁnding opinionated sentences isdifferent from ﬁnding sentiments, because sentimentsmainly convey the emotions and not the opinions.There has been research on ﬁnding opinionatedsentences from various information sources. Someof these works utilize a dictionary-based (Fei et al.,2012) or regular pattern based (Brun, 2012) approachto identify aspects in the sentences. (Kim and Hovy,2006) utilize the presence of a single strong valencewors as well as the total valence score of all words ina sentence to identify opinion-bearing sentences. (Zhaiet al., 2011) work on ﬁnding ‘evaluative’ sentences inonline discussions. They exploit the inter-relationshipof aspects, evaluation words and emotion words toreinforce each other.Thus, while ours is not the ﬁrst attempt atopinion extraction from news articles, to the bestof our knowledge, none of the previous works hasexploited the global structure of a news article toclassify a sentence as opinionated/factual. Thoughsummarization algorithms (Erkan and Radev, 2004;Goyal et al., 2013) utilize the similarity betweensentences in an article to ﬁnd the important sentences,our formulation is different in that we conceptualizetwo different kinds of nodes in a document, as opposedto the summarization algorithms, which treat all thesentences equally.In the next section, we describe the propsoedtwo-stage algorithm in detail. Figure 1 gives a ﬂowchart of the proposed two-stagemethod for extracting opinionated sentences from newsarticles. First, each news article is pre-processed toget the dependency parse as well as the TF-IDF vectorcorresponding to each of the sentences present in thearticle. Then, various features are extracted fromthese sentences which are used as input to the Na¨ıveBayes classiﬁer, as will be described in Section 3.1.The Na¨ıve Bayes classiﬁer, which corresponds to theﬁrst-stage of our method, assigns a probability scoreto each sentence as being an opinionated sentence.In the second stage, the entire article is viewed as acomplete and directed graph with edges from everysentence to all other sentences, each edge having aweight suitably computed. Iterative HITS algorithmis applied to the sentence graph, with opinionatedsentences conceptualized as hubs and factual sentencesconceptualized as authorities. The two stages of ourapproach are detailed below.

The Na¨ıve Bayes classiﬁer assigns the probability foreach sentence being opinionated. The classiﬁer istrained on 70 News articles from politics domain,sentences of which were marked by a groupof annotators as being opinionated or factual.Each sentence was marked by two annotators.The inter-annotator agreement using Cohen’s kappacoefﬁcient was found to be . .The features utilized for the classiﬁer are detailedin Table 1. These features were adapted from thosereported in (Qadir, 2009; Yu and Hatzivassiloglou,2003). A list of positive and negative polar words,further expanded using wordnet synsets was takenfrom (Kim and Hovy, 2005). Stanford dependencyparser (De Marneffe et al., 2006) was utilized tocompute the dependencies for each sentence within thenews article.After the features are extracted from the sentences,we used the Weka implementation of Na¨ıve Bayes totrain the classiﬁer .Table 1: Features List for the Na¨ıve Bayes Classiﬁer1. Count of positive polar words2. Count of negative polar words3. Polarity of the root verb of the sentence4. Presence of aComp, xComp and advModdependencies in the sentence The Na¨ıve Bayes classiﬁer as discussed in Section 3.1utilizes only the local features within a sentence. Thus,the probability that a sentence is opinionated remains igure 1: Flow Chart of Various Stages in Our Approachindependent of its context as well as the documentstructure. The main motivation behind formulatingthis problem in HITS schema is to utilize the hiddenlink structures among sentences. HITS stands for‘Hyperlink-Induced Topic Search’; Originally, thisalgorithm was developed to rank Web-pages, with aparticular insight that some of the webpages ( Hubs )served as catalog of information, that could lead usersdirectly to the other pages, which actually containedthe information (

Authorities ).The intuition behind applying HITS for the task ofopinion extraction came from the following assumptionabout underlying structure of an article. A news articlepertains to a speciﬁc theme and with that theme inmind, the author presents certain opinions. Theseopinions are justiﬁed with the facts present in the articleitself. We conceptualize the opinionated sentencesas

Hubs and the associated facts for an opinionatedsentence as

Authorities for this

Hub .To describe the formulation of HITS parameters,let us give the notations. Let us denote a document D using a set of sentences { S , S , . . . , S i , . . . , S n } ,where n corresponds to the number of sentences inthe document D . We construct the sentence graphwhere nodes in the graph correspond to the sentencesin the document. Let H i and A i denote the huband authority scores for sentence S i . In HITS, theedges always ﬂow from a Hub to an Authority. Inthe original HITS algorithm, each edge is given thesame weight. However, it has been reported that usingweights in HITS update improves the performancesigniﬁcantly (Li et al., 2002). In our formulation,since each node has a non-zero probablility of acting as a hub as well as an authority, we have outgoing aswell as incoming edges for every node. Therefore, theweights are assigned, keeping in mind the proximitybetween sentences as well as the probability (of beingopinionated/factual) assigned by the classiﬁer. Thefollowing criteria were used for deciding the weightfunction.• An edge in the HITS graph goes from a hub(source node) to an authority (target node). So, theedge weight from a source node to a target nodeshould be higher if the source node has a high hubscore.• A fact corresponding to an opinionated sentenceshould be discussing the same topic. So, the edgeweight should be higher if the sentences are moresimilar.• It is more probable that the facts around anopinion appear closer to that opinionated sentencein the article. So, the edge weight from a source totarget node decreases as the distance between thetwo sentences increases.Let W be the weight matrix such that W ij denotesthe weight for the edge from the sentence S i to thesentence S j . Based on the criteria outlined above, weformulate that the weight W ij should be such that W ij ∝ H i W ij ∝ Sim ij W ij ∝ dist ij where we use cosine similarity between the sentencevectors to compute Sim ij . dist ij is simply the numberf sentences separating the source and target node.Various combinations of these factors were tried andwill be discussed in section 4. While factors likesentence similarity and distance are symmetric, havingthe weight function depend on the hub score makes itasymmetric, consistent with the basic idea of HITS.Thus, an edge from the sentence S i to S j is givena high weight if S i has a high probability score ofbeing opinionated (i.e., acting as hub) as obtained theclassiﬁer.Now, for applying the HITS algorithm iteratively,the Hubs and Authorities scores for each sentenceare initialized using the probability scores assignedby the classiﬁer. That is, if P i ( Opinion ) denotesthe probability that S i is an opinionated sentence asper the Na¨ıve Bayes Classiﬁer, H i (0) is initializedto P i ( Opinion ) and A i (0) is initialized to − P i ( Opinion ) . The iterative HITS is then applied asfollows: H i ( k ) = Σ j W ij A i ( k − (1) A i ( k ) = Σ j W ji H i ( k − (2)where H i ( k ) denote the hub score for the i th sentence during the k th iteration of HITS. The iterationis stopped once the mean squared error between theHub and Authority values at two different iterationsis less than a threshold (cid:15) . After the HITS iteration isover, ﬁve sentences having the highest Hub scores arereturned by the system. The experiment was conducted with 90 news articles inpolitics domain from Yahoo! website. The sentencesin the articles were marked as opinionated or factualby a group of annotators. In the training set, 1393out of 3142 sentences were found to be opinianated.In the test set, 347 out of 830 sentences were markedas opinionated. Out of these articles, articleswere used for training the Na¨ıve Bayes classiﬁer aswell as for tuning various parameters. The rest articles were used for testing. The evaluation wasdone in an Information Retrieval setting. That is, thesystem returns the sentences in a decreasing order oftheir score (or probability in the case of Na¨ıve Bayes)as being opinionated. We then utilize the humanjudgements (provided by the annotators) to computeprecision at various points. Let op ( . ) be a binaryfunction for a given rank such that op ( r ) = 1 if thesentence returned as rank r is opinionated as per thehuman judgements.A P @ k precision is calculated as follows: P @ k = (cid:80) kr =1 op ( r ) k (3)While the precision at various points indicates howreliable the results returned by the system are, itdoes not take into account the fact that some of the documents are opinion-rich and some are not. Forthe opinion-rich documents, a high P @ k value mightbe similar to picking sentences randomly, whereas forthe documents with a very few opinions, even a lower P @ k value might be useful. We, therefore, deviseanother evaluation metric M @ k that indicates the ratioof opinionated sentences at any point, normalized withrespect to the ratio of opinionated sentences in thearticle.Correspondingly, an M @ k value is calculated as M @ k = P @ kRatio op (4)where Ratio op denotes the fraction of opinionatedsentences in the whole article. Thus Ratio op = Number of opinionated sentencesNumber of sentences (5)The parameters that we needed to ﬁx for the HITSalgorithm were the weight function W ij and thethreshold (cid:15) at which we stop the iteration. We varied (cid:15) from . to . multiplying it by in each step.The results were not sensitive to the value of (cid:15) andwe used (cid:15) = 0 . . For ﬁxing the weight function,we tried out various combinations using the criteriaoutlined in Section 3.2. Various weight functions andthe corresponding P @5 and M @5 scores are shown inTable 2. Firstly, we varied k in Sim ijk and found thatthe square of the similarity function gives better results.Then, keeping it constant, we varied l in H il and foundthe best results for l = 3 . Then, keeping both of theseconstants, we varied α in ( α + d ) . We found the bestresults for α = 1 . . With this α , we tried to vary l againbut it only reduced the ﬁnal score. Therefore, we ﬁxedthe weight function to be W ij = H i (0) Sim ij (1 + 1 dist ij ) (6)Note that H i (0) in Equation 6 corresponds to theprobablity assigned by the classiﬁer that the sentence S i is opinionated.We use the classiﬁer results as the baseline for thecomparisons. The second-stage HITS algorithm isthen applied and we compare the performance withrespect to the classiﬁer. Table 3 shows the comparisonresults for various precision scores for the classiﬁerand the HITS algorithm. In practical situation, aneditor requires quick identiﬁcation of 3-5 opinionatedsentences from the article, which she can then use toformulate questions. We thus report P @ k and M @ k values for k = 3 and k = 5 .From the results shown in Table 3, it is clearthat applying the second-stage HITS over the Na¨ıveBayes Classiﬁer improves the performance by a largedegree, both in term of P @ k and M @ k . Forinstance, the ﬁrst-stage NB Classiﬁer gives a P @5 of . and P @3 of . . Using the classiﬁer outputsduring the second-stage HITS algorithm improves theable 2: Average P @5 and M @5 scores: Performancecomparison between various functions for W ij Function P @5 M @5 Sim ij Sim ij Sim ij Sim ij H i Sim ij H i Sim ij H i Sim ij H i Sim ij H i d Sim ij H i (0 . d ) Sim ij H i (0 . d ) Sim ij H i (0 . d ) Sim ij H i (0 . d ) Sim ij H i (1 + d ) Sim ij H i (1 . d ) Sim ij H i (1 + d ) Table 3: Average P @5 , M @5 , P @3 and M @3 scores:Performance comparison between the NB classiﬁer andHITSSystem P@5 M@5 P@3 M@3 NB Classiﬁer

HITS

Imp. (%) +21.2 +17.7 +35.8 +30.8 preformance by . to . in the case of P @5 . For P @3 , the improvements were much more signiﬁcantand a . improvement was obtained over the NBclassiﬁer. M @5 and M @3 scores also improve by . and . respectively.Strikingly, while the classiﬁer gave nearly the samescores for P @ k and M @ k for k = 3 and k = 5 ,HITS gave much better results for k = 3 than k = 5 .Specially, the P @3 and M @3 scores obtained by HITSwere very encouraging, indicating that the proposedapproach helps in pushing the opinionated sentences tothe top. This clearly shows the advantage of using theglobal structure of the document in contrast with thefeatures extracted from the sentence itself, ignoring thecontext.Figures 2 and 3 show the P @5 , M @5 , P @3 and M @3 scores for individual documents as numberedfrom to on the X-axis. The articles aresorted as per the ratio of P @5 (and M @5 ) obtainedusing the HITS and NB classiﬁer. Y-axis shows thecorresponding scores. Two different lines are used torepresent the results as returned by the classiﬁer andthe HITS algorithm. A dashed line denotes the scoresobtained by HITS while a continuous line denotesthe scores obtained by the NB classiﬁer. A detailedanalysis of these ﬁgures can help us draw the followingconclusions:• For of the articles (numbered to ) HITSimproves over the baseline NB classiﬁer. For of the articles (numbered to ) the resultsprovided by HITS were the same as that of thebaseline. For of the articles (numbered to ) HITS gives a performance lower than that ofthe baseline. Thus, for of the documents, thesecond-stage performs at least as good as the ﬁrststage. This indicates that the second-stage HITSis quite robust.• M @5 results are much more robust for the HITS,with of the documents having an M @5 score > . An M @ k score > indicates that the ratio ofopinionated sentences in top k sentences, pickedup by the algorithm, is higher than the overall ratioin the article.• For of the articles, (numbered , − and − ), HITS was able to achieve a P @3 = 1 . .Thus, for these 9 articles, the top sentencespicked up by the algorithm were all marked asopinionated.The graphs also indicate a high correlation betweenthe results obtained by the NB classiﬁer and HITS.We used Pearson’s correlation to ﬁnd the correlationstrength. For the P @5 values, the correlation wasfound to be . and for the M @5 values, thecorrelation was obtained as . .In the next section, we will ﬁrst attempt to furtheranalyze the basic assumption behind using HITS,by looking at some actual Hub-Authority structures,captured by the algorithm. We will also take somecases of failure and perform error analysis. First point that we wanted to verify was, whetherHITS is really capturing the underlying structure ofthe document. That is, are the sentences identiﬁed asauthorities for a given hub really correspond to the factssupporting the particular opinion, expressed by the hubsentence.Figure 4 gives two examples of the Hub-Authoritystructure, as captured by the HITS algorithm, for twodifferent articles. For each of these examples, we showthe sentence identiﬁed as Hub in the center along withthe top four sentences, identiﬁed as Authorities for thathub. We also give the annotations as to whether thesentences were marked as ‘opinionated’ or ‘factual’ bythe annotators.In both of these examples, the hubs wereactually marked as ‘opinionated’ by the annotators.Additionally, we ﬁnd that all the four sentences,identiﬁed as authorities to the hub, are very relevant tothe opinion expressed by the hub. In the ﬁrst example,top 3 authority sentences are marked as ‘factual’ by theannotator. Although the fourth sentence is marked as‘opinionated’, it can be seen that this sentence presentsa supporting opinion for the hub sentence.While studying the second example, we found thatwhile the ﬁrst authority does not present an importantfact, the fourth authority surely does. Both of these a) Comparison of P@5 values (b) Comparison of M@5 values

Figure 2: Comparison Results for 20 Test articles between the Classiﬁer and HITS: P@5 and M@5 (a) Comparison of P@3 values (b) Comparison of M@3 values

Figure 3: Comparison Results for 20 Test articles between the Classiﬁer and HITS: P@3 and M@3 (a) Hub-Authority Structure: Example 1 (b) Hub-Authority Structure: Example 2

Figure 4: Example from two different test articles capturing the Hub-Authority Structurewere marked as ‘factual’ by the annotators. In thisparticular example, although the second and thirdauthority sentences were annotated as ‘opinionated’,these can be seen as supporting the opinion expressedby the hub sentence. This example also gives usan interesting idea to improve diversiﬁcation in theﬁnal results. That is, once an opinionated sentenceis identiﬁed by the algorithm, the hub score of allits suthorities can be reduced proportional to the edge weight. This will reduce the chances of the supportingopinions being reurned by the system, at a later stageas a main opinion.We then attempted to test our tool on arecently published article, “

What’s Wrong witha Meritocracy Rug? ” . The tool could pick up a very http://news.yahoo.com/whats-wrong-meritocracy-rug-070000354.html mportant opinion in the article, “ Most people tend tothink that the most qualiﬁed person is someone wholooks just like them, only younger. ”, which was ranked nd by the system. The supporting facts and opinionsfor this sentence, as discovered by the algorithmwere also quite relevant. For instance, the top twoauthorities corresponding to this sentence hub were:1. And that appreciation, we learned painfully, caneasily be tinged with all kinds of genderedelements without the person who is making thedecisions even realizing it. And many of the traits we value, and how wevalue them, also end up being laden with genderovertones.

We then tried to analyze certain cases of failures.Firstly, we wanted to understand why HITS was notperforming as good as the classiﬁer for 3 articles(Figures 2 and 3). The analysis revealed that thesupporting sentences for the opinionated sentences,extracted by the classiﬁer, were not very similar onthe textual level. Thus a low cosine similarity scoreresulted in having lower edge weights, thereby gettinga lower hub score after applying HITS. For one of thearticles, the sentence picked up by HITS was wronglyannotated as a factual sentence.Then, we looked at one case of failure due to theerror introduced by the classiﬁer prior probablities.For instance, the sentence, “

The civil war betweenestablishment and tea party Republicans intensiﬁed this week when House Speaker John Boehner slammedoutside conservative groups for ridiculous pushbackagainst the bipartisan budget agreement which clearedhis chamber Thursday. ” was classiﬁed as anopinionanted sentence, whereas this is a factualsentence. Looking closely, we found that the sentencecontains three polar words (marked in bold), aswell as an advM od dependency between the pair(slammed,when). Thus the sentence got a high initialprior by the classiﬁer. As a result, the outgoing edgesfrom this node got a higher H i factor. Some of theauthorities identiﬁed for this sentence were:• For Democrats, the tea party is the gift that keepson giving. • Tea party sympathetic organizations, Boehnerlater said, “are pushing our members in placeswhere they don’t want to be”. which had words, similar to the original sentence, thushaving a higher

Sim ij factor as well. We found thatthese sentences were also very close within the article.Thus, a high hub prior along with a high outgoingweight gave rise to this sentence having a high hubscore after the HITS iterations. To facilitate easy usage and understanding of thesystem by others, a web interface has been built for the system . The webpage caters for users to eitherinput a new article in form of text to get top opinionatedsentences or view the output analysis of the system overmanually marked test data consisting of 20 articles.The words in green color are positive polar words,red indicates negative polar words. Words marked inviolet are the root verbs of the sentences. The coloredgraph shows top ranked opinionated sentences inyellow box along with top supporting factual sentencesfor that particluar opinionated sentence in purple boxes.Snapshots from the online interface are provided inFigures 5 and 6. In this paper, we presented a novel two-stageframework for extracting the opinionated sentencesin the news articles. The problem of identifyingtop opinionated sentences from news articles is verychallenging, especially because the opinions are notas explicit in a news article as in a discussion forum.It was also evident from the inter-annotator agreementand the kappa coefﬁcient was found to be . .The experiments conducted over 90 Newsarticles (70 for training and 20 for testing) clearlyindicate that the proposed two-stage methodalmost always improves the performance of thebaseline classiﬁer-based approach. Speciﬁcally, theimprovements are much higher for P @3 and M @3 scores ( . and . over the NB classiﬁer). An M @3 score of . and P @3 score of . indicates thatthe proposed method was able to push the opinionatedsentences to the top. On an average, out of top sentences returned by the system were actuallyopinionated. This is very much desired in a practicalscenario, where an editor requires quick identiﬁcationof 3-5 opinionated sentences, which she can then useto formulate questions.The examples discussed in Section 5 bring outanother important aspect of the proposed algorithm.In addition to the main objective of extracting theopinionated sentences within the article, the proposedmethod actually discovers the underlying structure ofthe article and would certainly be useful to presentvarious opinions, grouped with supporting facts as wellas supporting opinions in the article.While the initial results are encouraging, there isscope for improvement. We saw that the resultsobtained via HITS were highly correlated with theNa¨ıve Bayes classiﬁer results, which were used inassigning a weight to the document graph. Onedirection for the future work would be to experimentwith other features to improve the precision of theclassiﬁer. Additionally, in the current evaluation,we are not evaluating the degree of diversity of theopinions returned by the system. The Hub-Authority available at http://cse.iitkgp.ac.in/resgrp/cnerg/temp2/final.php igure 5: Screenshot from the Web InterfaceFigure 6: Hub-Authority Structure as output on the Web Interfacestructure of the second example gives us an interestingidea to improve diversiﬁcation and we would like toimplement that in future.In the future, we would also like to apply this workto track an event over time, based on the opinionatedsentences present in the articles. When an event occurs,articles start out with more factual sentences. Overtime, opinions start surfacing on the event, and as theevent matures, opinions predominate the facts in thearticles. For example, a set of articles on a planecrash would start out as factual, and would offer expertopinions over time. This work can be used to plot thematurity of the media coverage by keeping track offacts v/s opinions on any event, and this can be usedby organizations to provide a timeline for the event.We would also like to experiment with this model on a different media like microblogs. References

Alexandra Balahur, Ralf Steinberger, Mijail Kabadjov,Vanni Zavarella, Erik Van Der Goot, Matina Halkia,Bruno Pouliquen, and Jenya Belyaeva. 2013.Sentiment analysis in the news. arXiv preprintarXiv:1309.6202 .Caroline Brun. 2012. Learning opinionated patternsfor contextual opinion detection. In

COLING(Posters) , pages 165–174.Yejin Choi, Claire Cardie, Ellen Riloff, and SiddharthPatwardhan. 2005. Identifying sources of opinionswith conditional random ﬁelds and extractionatterns. In

Proceedings of the conference onHuman Language Technology and EmpiricalMethods in Natural Language Processing ,pages 355–362. Association for ComputationalLinguistics.Jack G Conrad and Frank Schilder. 2007. Opinionmining in legal blogs. In

Proceedings of the 11thinternational conference on Artiﬁcial intelligenceand law , pages 231–236. ACM.Kushal Dave, Steve Lawrence, and David M Pennock.2003. Mining the peanut gallery: Opinion extractionand semantic classiﬁcation of product reviews. In

Proceedings of the 12th international conference onWorld Wide Web , pages 519–528. ACM.Marie-Catherine De Marneffe, Bill MacCartney,Christopher D Manning, et al. 2006. Generatingtyped dependency parses from phrase structureparses. In

Proceedings of LREC , volume 6, pages449–454.G¨unes Erkan and Dragomir R Radev. 2004.Lexrank: Graph-based lexical centrality as saliencein text summarization.

J. Artif. Intell. Res.(JAIR) ,22(1):457–479.Geli Fei, Bing Liu, Meichun Hsu, Malu Castellanos,and Riddhiman Ghosh. 2012. A dictionary-basedapproach to identifying aspects im-plied byadjectives for opinion mining. In

Proceedings ofCOLING 2012 (Posters) .Pawan Goyal, Laxmidhar Behera, and Thomas MartinMcGinnity. 2013. A context-based word indexingmodel for document summarization.

Knowledgeand Data Engineering, IEEE Transactions on ,25(8):1693–1705.Ali Harb, Michel Planti´e, Gerard Dray, Mathieu Roche,Franc¸ois Trousset, and Pascal Poncelet. 2008.Web opinion mining: How to extract opinions fromblogs? In

Proceedings of the 5th internationalconference on Soft computing as transdisciplinaryscience and technology , pages 211–217. ACM.Minqing Hu and Bing Liu. 2004. Mining opinionfeatures in customer reviews. In

Proceedingsof Nineteeth National Conference on ArtiﬁcialIntellgience (AAAI) .Minqing Hu and Bing Liu. 2006. Opinion extractionand summarization on the web. In

AAAI , volume 7,pages 1621–1624.Soo-Min Kim and Eduard Hovy. 2005. Automaticdetection of opinion bearing words and sentences.In

Proceedings of IJCNLP , volume 5.Soo-Min Kim and Eduard Hovy. 2006. Extractingopinions, opinion holders, and topics expressedin online news media text. In

Proceedings ofthe Workshop on Sentiment and Subjectivity inText , pages 1–8. Association for ComputationalLinguistics. Hyun Duk Kim, Malu Castellanos, MeichunHsu, ChengXiang Zhai, Umeshwar Dayal, andRiddhiman Ghosh. 2013. Compact explanatoryopinion summarization. In

Proceedings of the22nd ACM international conference on Conferenceon information & knowledge management , pages1697–1702. ACM.Nozomi Kobayashi, Ryu Iida, Kentaro Inui, andYuji Matsumoto. 2005. Opinion extraction usinga learning-based anaphora resolution technique.In

The Second International Joint Conferenceon Natural Language Processing (IJCNLP),Companion Volume to the Proceeding of Conferenceincluding Posters/Demos and Tutorial Abstracts .Lun-Wei Ku, Yu-Ting Liang, and Hsin-Hsi Chen.2006. Opinion extraction, summarization andtracking in news and blog corpora. In

AAAISpring Symposium: Computational Approaches toAnalyzing Weblogs , volume 100107.Longzhuang Li, Yi Shang, and Wei Zhang. 2002.Improvement of hits-based algorithms on webdocuments. In

Proceedings of the 11th internationalconference on World Wide Web , pages 527–535.ACM.Bin Lu. 2010. Identifying opinion holders and targetswith dependency parser in chinese news texts. In

Proceedings of Human Language Technologies: The2010 Annual Conference of the North AmericanChapter of the ACL .Ashequl Qadir. 2009. Detecting opinion sentencesspeciﬁc to product features in customer reviewsusing typed dependency relations. In

Proceedingsof the Workshop on Events in Emerging Text Types ,eETTs ’09, pages 38–43.Oleg Rokhlenko and Idan Szpektor. 2013. Generatingsynthetic comparable questions for news articles. In

ACL , pages 742–751.Janyce Wiebe and Ellen Riloff. 2005. Creatingsubjective and objective sentence classiﬁers fromunannotated texts. In

Computational Linguisticsand Intelligent Text Processing , pages 486–497.Springer.Hong Yu and Vasileios Hatzivassiloglou. 2003.Towards answering opinion questions: Separatingfacts from opinions and identifying the polarityof opinion sentences. In

Proceedings of the2003 Conference on Empirical Methods in NaturalLanguage Processing , EMNLP ’03, pages 129–136.Zhongwu Zhai, Bing Liu, Lei Zhang, Hua Xu,and Peifa Jia. 2011. Identifying evaluativesentences in online discussions. In