The evolution of argumentation mining: From models to social media and emerging tools
Anastasios Lytos, Thomas Lagkas, Panagiotis Sarigiannidis, Kalina Bontcheva
Anastasios Lytos (a), Thomas Lagkas (b,c), Panagiotis Sarigiannidis (d), Kalina Bontcheva (a)
(a) Department of Computer Science, The University of Sheffield, Sheffield, UK
(b) Computer Science Department, The University of Sheffield International Faculty - CITY College, Thessaloniki, Greece
(c) South East European Research Centre (SEERC), Thessaloniki, Greece
(d) Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, Greece
Abstract
Argumentation mining is a rising subject in the computational linguistics domain, focusing on extracting structured arguments from natural text, often from unstructured or noisy sources. The initial approaches to modeling arguments aimed to identify a flawless argument in specific fields (law, scientific papers) serving specific needs (completeness, effectiveness). With the emergence of Web 2.0 and the explosion in the use of social media, both the diffusion of the data and the argument structure have changed. In this survey article, we bridge the gap between theoretical approaches to argumentation mining and pragmatic schemes that satisfy the needs of social-media-generated data, recognizing the need for more flexible and expandable schemes, capable of adjusting to the argumentation conditions that exist in social media. We review, compare, and classify existing approaches, techniques, and tools, identifying the positive outcome of combining tasks and features, and eventually propose a conceptual architecture framework. The proposed theoretical framework is an argumentation mining scheme able to identify the distinct sub-tasks and capture the needs of social media text, revealing the need for adopting more flexible and extensible frameworks.
Keywords:
Argumentation mining, argumentation models, computational linguistics, social media, machine learning, argumentation tools
1. Introduction
Argumentation mining (AM) is identified as a multidisciplinary research topic, with roots in rhetoric and philosophy [1], which gained the interest of the
Email addresses: [email protected] (Anastasios Lytos), [email protected] (Thomas Lagkas), [email protected] (Panagiotis Sarigiannidis), [email protected] (Kalina Bontcheva)
Received 8 January, Revised 21 May 2019, Accepted 9 June 2019, Available online 3 July 2019.
Published in Information Processing & Management, Elsevier, Volume 56, Issue 6. Published version available at https://doi.org/10.1016/j.ipm.2019.102055. © 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
scientific community because of its potential when novel Artificial Intelligence (AI) algorithms and techniques are exploited. The recent advances in Machine Learning (ML) in combination with the emergence of the social web can enable impressive progress in different scientific fields with great impact on commercial applications. An AM system has the capacity to mine and analyse a great volume of text data from a variety of sources, providing tools for policy-making and socio-political sciences [2, 3, 4] and software engineering [5], while it opens new horizons for the broader area of business, economics and finance, with digital marketing being the most promising field [6, 7, 8, 9].
Opinion mining and sentiment analysis could be characterized as the predecessors of AM in a simplified form, and their limits have already been questioned in the effort to seek a deeper understanding of human reasoning [10, 11]. As AM we define a series of actions, independent or connected to each other, that are relevant to the tasks of detection, extraction and evaluation of arguments, where an argument is a piece of text offering evidence or reasoning in favor of or against a specific topic. The core of human reasoning is the argumentation process, and AM-related tasks attempt to solve a series of problems such as the detection of an argumentative stance towards a specific object, the analysis and evaluation of the argument's components, and the detection of possible relations between them.
The cohesion in the components of the arguments and the existence of backing for the claims is a major objective in argument mining, because these characteristics have the ability to turn a claim into an argument. For human beings, the interpretation of an argument is realized instantly without any special effort, owing to their skill in grasping the context of the information and in making connections with previous experiences (facts, opinions, feelings). This ability to combine multiple sources of information is missing in existing argument models, as it is difficult even for human annotators to agree upon specific guidelines for the modeling of an argument. Furthermore, identifying the thin line between implicit and explicit arguments is hard, as annotators are often implicitly driven by the context. That is probably also the reason why the majority of research is focused on structured data like legal text [12, 13, 14], scientific text [15, 16, 17], formal debates [18] or news articles [19, 20] instead of unstructured text like informal discourse and web-user-generated data, although recently there have been some notable research endeavors in this direction, which are presented in Section 3.
Annotating and automatically analysing arguments from the Web, with its great heterogeneity of contents and diversity of jargon, is a challenging task. Arguments in social media and informal discourse are sometimes implicit, meaning that the logical structure of an argument's components (premises, claims, warrants) is not always spelled out and instantly distinguishable; hence, analysis must take place in order to determine the distinctive components.
In text derived from social media, arguments are frequently missing altogether, as it is common for a tweet or a Facebook post to contain just a stance on a specific topic without supporting it with evidence or reasoning.
Decoding the human reasoning process into computer language is a challenging task, as it consists of many subprocesses that are difficult to separate and analyse. The medium for arguing, for human beings, is natural language, whereas the input for ML algorithms and techniques should be distinct, structured and composed with well-established rules. A wide range of methodologies has been applied for modeling natural language, such as explicit distinctive components [1], argumentative zoning [21], tree structures, dialog-oriented diagrams [22], serial structures of arguments [23, 24] and modifications to simpler structures of existing schemes [25, 26].
However, the claim or other parts of an argument might be implicit [27, 8, 28, 29, 9], and tacit assumptions or premises (enthymemes) occur, related to commonsense reasoning. This process is named completion or enthymematic argumentation and takes place often and unconsciously in the casual discourse found in Web-generated data. The distinction between explicit and implicit arguments lies in the presence of certain syntactic constructions or lexemes (such as conjunctions). Implicit arguments, which lack these characteristics, can be identified through previously gained knowledge and logical inference.
This a-priori knowledge is extremely difficult to express through conventional argumentation schemes, which demand a strict structure of the components of the argument. The early approaches [1, 30] were focusing on the philosophical aspect of the argument, whereas modern approaches consider unstructured data and implicit relations between the components of the argument [27, 8, 28, 29, 9].
For example, in Toulmin's model [1, 31] (Figure 3), which has had a great impact on modern argumentation schemes, a detailed microstructure is proposed with six specified components: 1) an indisputable datum, 2) a subjective claim on the foundation of the datum, 3) a warrant that links them, imposed by logical inference, 4) the backing of the justification, 5) leading to a degree of confidence (qualifiers) as long as 6) a rebuttal cannot withstand the claim. Data that do not have a structured, well-specified format are hard to represent with such a strict model.
Breaking down arguments deriving from the Web or from casual discourse is a demanding task, and it is doubtful whether it is even feasible. Evidence for this is the fact that the field of opinion mining thrives on social media data [32, 33], and especially on Twitter [34, 35]; on the contrary, only limited research has been conducted on AM in unstructured data, and few frameworks have been designed that are able to capture the special features of social media. In Section 5.1 we propose a conceptual framework able to capture the specific features of social media text and also to enhance other tasks in the wider area of NLP with the use of argumentative features.
There is a small number of related works surveying argumentation mining. Peldszus and Stede [36] realized a thorough review of models and diagrams for arguments that can be exploited in the context of AM. We provide the necessary background on argumentation schemes and evaluate them based on their suitability for social media text, but we do not devote the entire paper to this topic. Both [37] and [38] surveyed a big spectrum of the AM field, describing models, corpora and methods, but they overlook the special nature of social media and the special features it presents.
On the contrary, we focus on text derived from social media, and our entire approach is based on the characteristics of social media: chaotic nature, noisy text, vague claims, complicated network relations, implicit premises, etc. While the authors in [39] have addressed AM in web-generated data, they provided limited information about the connection of AM with its distinctive sub-tasks or other NLP tasks. Instead, we perceive AM as an entire pipeline with distinctive sub-tasks, able both to stand alone and to correlate with each other, and therefore boost existing tasks such as sentiment analysis, topic modeling, rumour identification, etc.
Our contribution lies in the following main axes: bridging the gap between areas that are interrelated but lack explicit connection, illustrating the current methods, tasks and tools that exist in the wider area of AM, and stressing the importance of automatic AM mechanisms and tools. This extensive survey paper illustrates the evolution of argumentation and reveals the formation of AM as a scientific field through a complete review of its roots and the needs that currently form it, followed by a thorough presentation and comparison of the existing AM approaches for text derived from social media. The existing tools for the wider area of NLP are presented, assessing their potential use in AM, followed by a presentation of the tools that have been specifically built for argumentation. Furthermore, a complete conceptual architecture for AM is proposed, covering its different subtasks and possible relations with other fields.
In the next section, the most influential logical schemes are presented, as well as the first attempts at connecting the argumentation process with AI. Every aspect of the AM problem is presented in Section 3 with a focus on noisy data derived from social media.
Section 4 presents the existing tools in the field of AM and classifies them into two categories: general-purpose NLP tools and tools specifically designed for the task of AM. In the last section, a new conceptual AM framework for handling text derived from social media is proposed and, eventually, future directions for the prosperous field of AM are provided.
2. Argumentation Theory and Models
Although argumentation mining as a term was first introduced in 2009 by Palau and Moens [40], the act of argumentation and its effects have been studied since the 4th century BC [41]. Since then, many approaches to studying argumentation have been pursued, and many theories, schemes, and diagrams have been developed.
The primary factor that has led to the creation of novel evaluation and visualization techniques for argument representation is the need for simple but effective ways to break down, analyse and eventually better understand arguments. Argumentation can reach a high level of complexity, thus simpler forms of representation are needed. The process of argument illustration and fragmentation is a fundamental concept in AM, where the arguments are inspected, evaluated and eventually expressed in a binary format, capable of being interpreted by different algorithms. In the next subsection, significant argumentation theories are presented, followed by the first research works that realize the connection of argumentation with AI.

Figure 1: Whately's diagram [43] for analysing arguments based on backward reasoning. The final conclusion is represented as the root of the tree and its assertions are represented as leaves; the depth of the tree is proportionate to the complexity of the argument.
Argumentation holds its roots in rhetoric and philosophy, and argumentation diagrams have thus been developed as aiding tools for the analysis of arguments in well-structured documents. Apart from their original role of teaching the reasoning process without falling into logical fallacies, Reed et al. [21] have also stressed their capabilities as analytical tools for meta-philosophy purposes. Diagram techniques started as a practical tool for teaching logic and then developed into a more refined method used as a conceptual basis for the modeling of an argument. Logical diagrams have boosted the field of informal logic, as they offer a tool for analysing and evaluating arguments used in everyday life in a much more pragmatic environment compared to formal logic. A detailed review of argumentation diagrams and their connection with other fields such as formal and informal logic, law, and artificial intelligence is presented in [21]. As the goal of this review paper is neither the connection of argument diagrams to modern AM techniques, as in [36], nor the introduction of a classification system for argumentation schemes [42], in this subsection a synopsis of the five most influential diagrams is presented, and they are evaluated based on their suitability for the tasks of AM in noisy text. It has to be stressed that these diagrams have not been specifically designed to serve the modern construction of arguments as expressed in social media, as the tasks of detection, classification and evaluation of argumentative content in noisy text require more flexible schemes, such as the one proposed in Subsection 5.1.
One of the first uses of diagrams in argumentation took place in 1857 by Whately [43], who opposed the enumeration of technical rules in an effort to simplify the teaching of argumentation in his era. Whately's diagram theory is based on the concept of identifying the concluding assertion and

Figure 2: Beardsley's convergent argument scheme [44], provided with an example.
The serial premises eventually lead (converge) to the final conclusion. It should be noted that the links between the premises are not evaluated.
tracing the reasoning backwards, grounding the original assertion and eventually forming a tree with assertions and proofs. In Figure 1, the conclusion is represented as the root of the tree, and the assertions are located underneath. In the classical example of the Socrates syllogism (Socrates is a man, all men are mortal, therefore Socrates is mortal), the conclusion (Socrates is mortal) would be the root of the tree and the two premises (p1 - Socrates is a man, p2 - all men are mortal) would be the two leaves of the tree. The complexity of the reasoning process and the depth of the tree are proportionate, which can lead to a complicated process that requires both well-structured arguments and experienced annotators.
Another significant work in the field of argumentation was conducted by Beardsley in [44], introducing a design principle that is applied until today: representation of the argument's distinctive statements as linked nodes. As the nodes can be connected with each other in different ways, he created three basic classes: 1) convergent, 2) divergent and 3) serial arguments. In Figure 2, a logic flow is illustrated depicting serial linking between premises, leading to a convergence for the final argument. In the example of the convergent argument, different premises eventually contribute towards the establishment of a reliable and robust argument, supported with enough backing.
The only flaw in Beardsley's theory is the lack of support between the statements in the nodes. Thus, the statements which form the argument are considered flawless and are not subjects of support, debate or evaluation; hence the theory cannot be applied to ambiguous, implicit or imperfect arguments.
In 1958, Toulmin [45] suggested a scheme that remains very influential today, examining the role that different utterances might have in the persuasive perspective of the argument. In Toulmin's model, six functional roles were suggested, namely datum, claim, warrant, backing, qualifiers, and rebuttal.
In Figure 3 an example of a legal nature is expressed through Toulmin's model. The warrant in the depicted example ("A man born in Bermuda will generally be a British subject") adequately supports the initial datum ("Harry was born in Bermuda"), as it includes a strong backing using a legal framework ("The following statutes and other legal provisions"). The above distinctive components are enhanced through the defiance of a possible counter-argument ("Both his parents were aliens...") and eventually lead, with certainty expressed through the qualifier ("So, presumably"), to the conclusion ("Harry is a British subject").

Figure 3: Toulmin's proposed scheme provided with an example [45]. Based on its detailed structure, an argument can be assessed through the review of its distinctive components.

The novelty of his approach lies in the fact that it requires the assignment of a predefined characterization for the cognitive connection between the different components of the argument. Through his proposed scheme, Toulmin managed to handle enthymematic relations by defining different aspects of a syllogism and connecting the inference with the warrant. The proposed scheme is widely used in AM and a modification has even been applied to web-generated data [39]; however, the small IAA on some topics indicates the difficulty of applying such a complex model to heterogeneous text.
Another argumentation theory that still has a strong influence today, as recent research works [46, 47] adopt its scheme in a data-driven approach for AM tasks, was developed by Freeman [48]. Freeman's theory could be characterized as an upgrade of Beardsley's theory, as it uses the scheme of inductive/deductive reasoning and enhances it with the concept of modality, which indicates the strength of the conclusion induced by the premises. It could be said that the term modality is an adjustment of Toulmin's qualifier, but with a focus on the evaluation of the argument. Both the concepts of reasoning flow and argument strength have the potential to enhance AM tasks, as they offer new, unprecedented tasks that have not been tested; only sentiment flows in AM have been researched [49] until now.
The introduction of a prominent text-organization theory took place in the 1980s by Mann [50], aiming at the organization of the text into different regions. Each region has a central part (nucleus) that is essential for the comprehension of the text, and a number of satellites containing additional information about the nucleus. The nucleus and the satellites are correlated with each other

Figure 4: Rhetorical Structure Theory scheme provided with an example [50].
The example used is the theorem of the perception of apparent motion (initial nucleus), which is justified by a set of premises, and each premise is consequently analysed based on the nucleus-satellite model.
through different relations (circumstance, elaboration, evidence, etc.), which can be changed, manipulated, added or subdivided depending on the topic and the task at hand. The nucleus-satellite distinction is applied recursively until every entity of the discourse is part of an RST relation, and eventually a tree-structured hierarchy is created, as depicted in Figure 4, where the theorem of the perception of apparent motion (initial nucleus) is supported by a set of premises, and each premise is sequentially expressed through the nucleus-satellite model, expressing a specific relation (preparation, condition, means). The application of RST has improved the performance of sentiment polarity classification when enhanced with argumentation [51], but the authors state the need for further in-depth evaluation of its impact.
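The recursive nucleus-satellite structure described above can be pictured as a small tree type. The sketch below is our own illustration, not an implementation from [50]; the example strings loosely paraphrase the apparent-motion analysis of Figure 4.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RSTNode:
    """A text span in a Rhetorical Structure Theory analysis.

    Each node is either a nucleus (essential content) or a satellite
    attached to its parent through a named relation (evidence,
    elaboration, condition, ...). Satellites may recursively carry
    their own satellites, yielding the tree hierarchy of Figure 4.
    """
    text: str
    relation: str = "nucleus"  # relation to the parent nucleus
    satellites: List["RSTNode"] = field(default_factory=list)

    def attach(self, text: str, relation: str) -> "RSTNode":
        sat = RSTNode(text, relation)
        self.satellites.append(sat)
        return sat

    def depth(self) -> int:
        return 1 + max((s.depth() for s in self.satellites), default=0)

# Recursive application of the nucleus-satellite distinction:
theorem = RSTNode("Apparent motion is perceived between the flashes.")
premise = theorem.attach("Two lights are flashed in quick succession.",
                         "evidence")
premise.attach("The interval between the flashes is short.", "condition")
```

Because the distinction is applied recursively, the tree depth grows with each level of analysis, mirroring how the set of RST relations can be extended or subdivided per case study.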
Argumentation diagrams have boosted the entire field of computational linguistics, but as they are theoretical models, they do not have the capacity to reform the entire argumentation field. In the 1990s, the first research connecting argumentation with the area of Artificial Intelligence (AI) was conducted. These early attempts at connecting arguments with AI created a new field under the name Computational Argumentation, the field which has shaped AM to a great extent. Computational Argumentation is used extensively in domains where the reasoning process is sophisticated, such as law or medicine, and where traditional methods like formal logic and classical decision theory therefore cannot be applied.
One of the first studies of the relationship between argumentation, cognitive psychology and AI was conducted by Pollock [52], describing the connection between defeasible reasoning in philosophy and nonmonotonic reasoning in AI through the definition of a set of rules. An important step forward was made by Dung [53], who researched the relation between argumentation and Logic Programming, focusing on modelling the fundamental mechanisms humans use in argumentation and expressing them with a number of rules and definitions capable of being interpreted by AI algorithms.
Similar to Dung's method, Krause et al. [54] developed the logic of argumentation, an approach for defining reasoning in cases of uncertainty. By defining rules, definitions and propositions they managed to create a theoretical system capable of estimating the strength of an argument based on the distinctive axioms used for its construction. The degree of justification was also researched by Pollock [55], focusing on the different degrees of justification the distinctive components of an argument can provide when they are "summed".
Another work that indicates the relationship between argumentation and logic programming was held by Parsons et al.
[56], where a framework was developed capable of interpreting a broad range of negotiations in a multi-agent system. The distinctive arguments of the agents are considered as logical steps towards an acceptable compromise, not necessarily towards the optimum proposal. A more detailed review of automated negotiation is held in [57], presenting a more generic framework and three different approaches (game-theoretic models, heuristics, argumentation-based) for the negotiation process. The development of a dialectical argumentation scheme was introduced in [58], where the authors implemented a fully functional agent capable of maintaining the natural flow of the debate by arguing on a topic while at the same time responding to obligations related to the discourse.
Although many of the aforementioned frameworks are designed to cover argumentation in its full generality, any system with pre-defined rules is not suggested for use in the open and constantly changing environment of social media, as it does not have the ability to learn and adapt. In Table 1, a synopsis of the theories and schemes that have been presented in this section is illustrated and evaluated based on a number of criteria focusing on their suitability for AM in social media, but the table could also be used as a point of reference for tasks in the wider area of NLP.
The first criterion of a theory's evaluation is the explicit expression of any kind of relations between the distinctive components of the argument, a property that can facilitate advanced NLP tasks such as relation identification, evidence detection, and fact recognition. In the theories that are based on logic programming, the relations are expressed as rules, whereas in Toulmin's theory the relation is expressed through the qualifier and in Freeman's as modality. The second criterion in the comparison table is the level of complexity, which is determined based on the number of components and relations each theory involves.
Both the in-depth analysis and construction of the reverse tree in Whately's theory and the definition of six different components in Toulmin's diagram present expectedly high levels of complexity. Once again, the logic programming theories can be grouped together, presenting high complexity as they define multiple rules and axioms in order to cover a wide range of cases. A special note should be given to RST, as it is a flexible open scheme, where components and relations can be defined and modified based on each case study.

Table 1: A synopsis of logical schemes and computational theories based on the degree to which they can meet the needs of a modern NLP environment. Link relation - whether the connections between distinctive components are explicitly evaluated; Complexity level - assessed based on the number of components each theory includes.

Author(s)      Link Relation   Complexity Level   Application in noisy data   Application in AM
Whately        No              High               Low                         Medium
Beardsley      No              Low                Medium                      High
Toulmin        Yes             High               Medium-Low                  High
Freeman        Yes             Medium             Medium                      Medium
RST            Yes             Open               High                        High
Pollock        Yes             High               Low                         Medium
Dung           Yes             High               Low                         Medium
Krause         Yes             High               Low                         Medium
Parsons        Yes             High               Low                         Medium
New Rhetoric   Yes             High               Low                         Medium

The first assessment of the theories/schemes takes place based on their suitability for NLP tasks in a noisy environment. The complexity level of a theory is inversely proportional to its suitability when applied in a noisy environment, as complicated reasoning processes cannot be deployed on text lacking grammatical and structural rules. Finally, the last column of the table illustrates the applicability level of the theories for AM tasks independently of the source of the content, where three schemes stand out (Beardsley, Toulmin, RST), each one for different reasons. Beardsley's theory is straightforward enough to be applied easily to a wide range of goals, Toulmin's detailed scheme is the option for in-depth analysis, and RST can easily be modified in order to cover the needs and requirements of each case study.
Computational argumentation contributed significantly to the creation of the AM field, but the strict rules that used to be posed seem outdated. Although the foundations of logic programming are based on knowledge- and logic-based solutions, the recent advances in ML have been used extensively in data-driven approaches [59, 60], but they seem to have reached the upper limit of their capabilities. Rule-based machine learning could be considered a data-driven approach to inductive logic programming that combines background knowledge with the ability to learn based on human-readable theories. Another approach combining a data-driven solution in unison with argumentation structure took place in [51], where probabilistic classifiers were improved through the incorporation of an argument database.
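As an illustration of the rule-based flavour of these computational theories, Dung's abstract frameworks [53] admit a compact operational reading: an argument belongs to the grounded extension when every one of its attackers is counter-attacked by an already accepted argument. The sketch below is our own minimal rendering of that fixed-point computation, not code from any surveyed system.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of a Dung-style abstract argumentation
    framework, computed as the least fixed point of the characteristic
    function: an argument is accepted when all of its attackers are
    themselves attacked by already accepted arguments (unattacked
    arguments are accepted immediately).

    arguments -- iterable of argument identifiers
    attacks   -- set of (attacker, target) pairs
    """
    arguments = set(arguments)
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        defended = {
            a for a in arguments
            if all(any((s, b) in attacks for s in accepted)
                   for b in attackers[a])
        }
        if defended == accepted:  # fixed point reached
            return accepted
        accepted = defended

# "a" is unattacked, so it is accepted and defends "c" against "b":
ext = grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})  # {"a", "c"}
```

The pre-defined attack relation is exactly the kind of fixed rule set that, as noted above, cannot learn or adapt in the open environment of social media.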
3. The Status of AM in Social Media
AM is considered a complicated task because it comprises a series of closely interrelated distinctive tasks that come under this general term. The classification of arguments consists of a series of steps and is considered the final output of the pipeline; however, other distinctive parts of the pipeline, such as argument detection, could form the main research question. Especially in the world of social media (and web-generated content in general), tasks that are focused on the reliability and strength of the arguments seem to gain the interest of the research community. In the rapidly changing environment of social media, the presence or absence of argumentative features can assist scholars in the tasks of rumour spreading, fake news detection and source identification.
A recent survey [37] suggests a general-purpose pipeline for the task of AM with two main components, argument component detection and argument structure prediction. The proposed architecture can be generalized to every source of raw text without significant problems; however, if it were to be applied to AM tasks on text derived from social media, it would face some obstacles, as either important generalizations would take place or some tasks would be ignored. In our effort to better explore the connection between AM and social media, we describe the basic tasks that take place in an AM pipeline and compare the results obtained by different research methods. In 3.1 the task of argument detection is presented, followed by the tasks of relation identification in 3.2 and stance detection in 3.3. In the last sub-section, different tasks that are related to the concept of argument reliability in social media are presented.
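The sub-tasks just listed can be pictured as a schematic pipeline of stages that may run stand-alone or feed each other. The skeleton below is purely illustrative: the function names, signatures and stub heuristics are our own and do not come from any surveyed system.

```python
from typing import Callable, List, Tuple

def detect_arguments(posts: List[str],
                     is_argumentative: Callable[[str], bool]) -> List[str]:
    """Stage 1 (argument detection): binary filter keeping only
    argumentative posts; any classifier can be plugged in."""
    return [p for p in posts if is_argumentative(p)]

def identify_relations(args: List[str]) -> List[Tuple[int, int, str]]:
    """Stage 2 (relation identification): link argument components as
    (source, target, label) triples, e.g. support/attack. Stub: none."""
    return []

def detect_stance(args: List[str], topic: str) -> List[str]:
    """Stage 3 (stance detection): label each argument with respect to
    the topic. Stub heuristic: topic mention counts as 'pro'."""
    return ["pro" if topic.lower() in a.lower() else "unknown" for a in args]

def pipeline(posts, topic, is_argumentative):
    args = detect_arguments(posts, is_argumentative)
    return args, identify_relations(args), detect_stance(args, topic)
```

Because each stage has its own input/output contract, a stage can be evaluated in isolation (e.g. argument detection alone) or its output can be reused by other NLP tasks such as rumour identification.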
The classification of a sentence or a series of sentences as argumentative or non-argumentative is a crucial step towards AM in social media. Argument detection is in essence a preliminary binary classification that enables the subsequent in-depth analysis of the argument, such as persuasiveness detection, relation identification between the components of the argument or automatic evaluation of the argument.
Data from social media are characterized as a special category of web-generated data, and this becomes clear if an attempt is made to apply the Toulmin scheme to a typical tweet. In the work of Addawood and Bashir [3] the following tweet from their dataset is presented:
RT @ItIsAMovement "Without strong encryption, you will be spied on systematically by lots of people" - Whitfield Diffie. The above tweet cannot easily fit Toulmin's scheme (or any other theoretical scheme), as the fact constitutes at the same time the conclusion of the argument, whereas the component of the backing is expressed through the quote of an expert opinion. A similar belief is also expressed in [61], where the authors claim that "we (almost) never find such a kind of complete structure of the arguments".
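To make concrete why such tweets resist the scheme, compare Toulmin's textbook example from Section 2 with the tweet above under a minimal representation of the six functional roles. The dataclass and field names are our own labels for the roles, not an existing implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToulminArgument:
    """Toulmin's six functional roles; None marks an absent role."""
    datum: Optional[str] = None      # indisputable ground
    claim: Optional[str] = None      # assertion built on the datum
    warrant: Optional[str] = None    # inference licensing datum -> claim
    backing: Optional[str] = None    # support for the warrant
    qualifier: Optional[str] = None  # degree of confidence
    rebuttal: Optional[str] = None   # conditions defeating the claim

# Toulmin's legal example (Figure 3) fills every role ...
harry = ToulminArgument(
    datum="Harry was born in Bermuda",
    claim="Harry is a British subject",
    warrant="A man born in Bermuda will generally be a British subject",
    backing="The following statutes and other legal provisions",
    qualifier="presumably",
    rebuttal="Both his parents were aliens",
)

# ... whereas the tweet leaves most roles empty: the quoted expert
# opinion doubles as conclusion and backing, and the datum, warrant,
# qualifier and rebuttal stay implicit.
tweet = ToulminArgument(
    claim="Without strong encryption, you will be spied on systematically",
    backing="quoted expert opinion (Whitfield Diffie)",
)
missing = [role for role, value in vars(tweet).items() if value is None]
```

Counting the `None` roles makes the mismatch measurable: four of the six components are simply absent from the tweet, which is exactly the incompleteness noted in [61].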
Toward the analysis of the tweet, the first step is defining the necessary background knowledge that a reader/annotator has on the discussed topic, so that the IAA is not affected by the different levels of the annotators' knowledge. To ease the annotation task, some general rules can be defined that provide useful insight into the nature of the text. When the user cites the opinion of an expert, or expresses a feeling or irony, the tweet is very likely to be argumentative. On the other hand, if the tweet is the title of an article and simply shares a link without any comments, the tweet can be characterized as non-argumentative.

Table 2: Ck - Cohen's kappa [75], Ka - Krippendorff's alpha [76], Dice [77], Fk - Fleiss' kappa [78]. In [67], the IAA is the average of the three sub-tasks. Details of datasets that have been used in recent research papers, displaying the distinctive tasks in AM, the different sizes of the datasets, the plethora of topics and the range of IAA metrics.

An example of a non-argumentative tweet is provided in the work of Dusmanu et al. [63] regarding Brexit:
72% of people who identified as "English" supported . The rules that have been described should be considered more as guidelines, as there are many cases where a tweet cannot easily fit any of the proposed categories. In many cases a Twitter user takes into account more information than simply the information contained in the tweet, such as the beliefs or the status of the user who tweeted. Thus, defining a specific level of knowledge for the annotators is crucial for every use case.
Other important factors that affect the annotation process and the quality of the dataset are the requested task, the size of the dataset, the debate topic and the number of annotators used for carrying out the specific task. The above aspects can be used as metrics for measuring the quality of the collected data, as scholars often put thresholds on IAA for accepting or rejecting specific datasets. Table 2 provides a detailed review of the characteristics of the datasets that have been used in recent research papers and are analysed in the following subsections. The table includes annotated datasets that have been used in supervised ML approaches, but also datasets that have been used in unsupervised approaches. There is a scarcity of annotated datasets in AM and, most importantly, there is a difficulty in their re-use, as they are often annotated for very specific tasks. The recent review paper [38] provides a complete synopsis of the available datasets in the area of AM, without focusing on social-media-generated text.
Based on the results depicted in Table 2, the IAA depends heavily on the task at hand, as revealed in Addawood et al. [60] and in Addawood and Bashir [3]. In those two research papers, the same annotators on the same dataset present higher IAA for the tasks of Evidence/Topic Classification in comparison to the tasks of Argument/Stance Detection, although the latter offer more possible classes than the former.
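Since Cohen's kappa is the most frequent IAA metric in Table 2, a minimal reference computation may be useful. This is the standard formula, not code from any surveyed work; the toy labels below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance, derived from
    each annotator's marginal label distribution."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators labeling four tweets; they agree on three of them:
kappa = cohens_kappa(["arg", "arg", "non", "non"],
                     ["arg", "non", "non", "non"])  # 0.5
```

Note how the 75% raw agreement shrinks to a kappa of 0.5 once chance agreement is discounted, which is why raw-agreement thresholds are a poor substitute for the chance-corrected metrics listed in Table 2.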
In the case of limiting the possible classes for the same task, as in [74], the IAA increases as expected, but the task ceases to provide a detailed analysis. The last point that should be stressed considering the IAA is the different metrics used in the different studies, as the number of annotators affects the available options.

The task of argument detection as a necessary preliminary step takes place in the work of Addawood and Bashir [3] and also in Dusmanu et al. [63], where the proposed pipelines end up in the recognition of evidence type and in source identification, respectively. In both works the detection of argumentative tweets is necessary, as non-argumentative tweets constitute 42.3% and 29.3% of the constructed datasets, respectively. Both papers adopt a supervised approach where manual annotation is required, achieving substantial inter-annotator agreement (IAA) in terms of Cohen's kappa, equal to 0.67 and 0.77, respectively. A similar architecture was also adopted in [61], ending up in the construction of argumentation graphs. The DART dataset [79] was used as input of the pipeline, containing 2702 argumentative tweets and 1181 non-argumentative tweets. In contrast to [3, 63], Bosc et al. [61] measured the IAA in terms of Krippendorff's alpha, resulting in a satisfactory a = 0.81.

Table 3: Features used for AM tasks in social media. IR = Information Retrieval applied as lexicon-based queries, PoS = Part of Speech, TM = Topic Modeling applied with the technique of Latent Dirichlet Allocation (LDA), LIWC = Linguistic Enquiry and Word Count [80, 81].

- Addawood and Bashir [3] — Lexical: n-gram, length, question, exclamation; Semantic: PoS, LIWC summary variables; Sentiment: LIWC sentiment lexicon; Subjectivity: Clue lexicon; Twitter: followers, friends, user activity, URL+title, hashtags, verified account, mentions; Other: psychometric (LIWC)
- Bosc et al. [61] — Lexical: n-gram, punctuation, tokens, capitalization; Semantic: PoS; Twitter: smileys
- Deturck et al. [62] — Lexical: tokens, lemmas; Semantic: PoS; Sentiment: TextBlob features; Subjectivity: TextBlob features; Other: word embeddings
- Dusmanu et al. [63] — Lexical: n-gram; Semantic: syntactic tree, parse trees, dependency relations, WordNet synset; Sentiment: AlchemyAPI; Other: punctuation, emoticons
- Sendi and Latiri [64] — Lexical: IR, TM; Sentiment: NRC emotion lexicon
- Dufour et al. [65] — Lexical: punctuation, personal pronouns; Sentiment: emotion words; Twitter: emoticons, hashtags
- Cocarascu and Toni [66] — Lexical: n-grams; Semantic: word embeddings; Other: argumentativeness
- Ma et al. [67] — Lexical: n-gram, relevance feature, Okapi BM25; Semantic: Topic-Independent Claim-Related Lexicon, Topic-Dependent Claim-Oriented Lexicon; Subjectivity: controversy lexicon; Twitter: URLs, retweet, reply; Other: TwitStan, WikiClaim, TwitArgument
- Mohammad et al. [68] — Lexical: n-grams; Semantic: PoS; Sentiment: NRC Emotion Lexicon, Hu and Liu Lexicon; Subjectivity: MPQA Subjectivity Lexicon; Twitter: hashtag; Other: target of interest, encodings
- Zarrella and Marsh [69] — Semantic: word embeddings
- Wei et al. [70] — Semantic: word embeddings, target content
- Ebrahimi et al. [71] — Lexical: n-grams, linguistic patterns; Sentiment: LIWC2007, VADER
- Lai et al. [72] — Twitter: social media network communities, hashtags, mentions, replies
- Johnson and Goldwasser [73] — Lexical: word frequency; Semantic: keyword-based heuristic; Subjectivity: OpinionFinder 2.0; Other: temporal activity
- Addawood et al. [60] — Lexical: n-grams; Semantic: PoS; Sentiment: subjectivity lexicon; Subjectivity: MPQA Subjectivity Lexicon; Twitter: RT, title, mention, verified account, URL, followers, following, posts, hashtag; Other: argumentativeness, source type
- Konstantinovskiy et al. [74] — Semantic: PoS, NER, tf-idf; Other: sentence embeddings

On the other hand, a semi-supervised approach was followed in the papers presented at the CLEF 2018 conference for the task of Multilingual Cultural Mining and Retrieval. In the work of Deturck et al. [62], the assumption is made that an argumentative text is structured to effectively combine arguments and opinions; thus argumentation is measured through structuration. A similar approach was followed in Sendi and Latiri [64], where argumentative tweets are defined as the sum of three separate tasks: information retrieval, topic modeling and sentiment scoring. The work of Dufour et al. [65] follows a distant supervision approach through the detection of five pre-defined features: emotion words, emoticons, particular punctuation signs, personal pronouns and hashtags.

The features used in the aforementioned research, but also in the works presented in the next sections, are provided in Table 3, where they are summarized in five different categories. Lexical features are the attributes most frequently used in the wider spectrum of NLP and are strongly correlated with the different applications of n-grams, whereas as semantic features we define the characteristics of the language that can provide a deeper insight into the data. Sentiment features are those that can reveal emotions, usually detected with the use of specific lexicons or libraries, and subjectivity features often indicate an opinionated and therefore argumentative tweet.
The Twitter-specific features are offered as metadata through the Twitter API and concern the particular characteristics a tweet contains, while in the last column we have collected the features that cannot be grouped under any of the previous categories. Apart from the semantic and sentiment features, LIWC offers statistics that include personal concerns, core drives and needs, which are summarized under the psychometric category. In the work of Deturck et al. [62], word embeddings are used for diversity filtering, so that the most argumentative tweets can be discovered.

It has to be mentioned that the classification of features is different in [3] (the only work providing a detailed table of features), which classifies emotional tone and subjectivity score under the linguistic features. Furthermore, we include the tasks of information retrieval and topic modeling as described in [64] as lexical features, given that they do not serve any semantic purpose. Those adjustments took place in order to provide a useful taxonomy and a beneficial comparison, but it should be noted that different categorizations are possible.

Regarding the results of the classification, these can be extracted from the works that have followed a supervised approach and are presented in Table 4. The first column depicts the names of the authors, while the second and third columns display the algorithms and the features that have been used, respectively. The last three columns depict the metrics that evaluate the performance of the algorithms. Concerning feature selection and its impact on the classification task, the use of all possible features performs best in each case. The selection of the classification algorithm does not seem to significantly affect the performance of the task in [63], whereas in [3] the use of SVM surpasses the alternative algorithms.
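To make the feature taxonomy of Table 3 concrete, the sketch below folds a few lexical and Twitter-specific features into a single feature dictionary. The feature names and the metadata arguments are illustrative stand-ins, not the exact feature sets of any cited paper:

```python
import re

def tweet_features(text, followers=0, verified=False):
    """Toy extractor mixing lexical features with Twitter metadata,
    mirroring two of the feature families of Table 3."""
    tokens = text.lower().split()
    feats = {
        # lexical features
        "length": len(tokens),
        "question": text.count("?"),
        "exclamation": text.count("!"),
        # Twitter-specific features (markup and account metadata)
        "hashtags": sum(t.startswith("#") for t in tokens),
        "mentions": sum(t.startswith("@") for t in tokens),
        "urls": len(re.findall(r"https?://\S+", text)),
        "followers": followers,
        "verified": int(verified),
    }
    # unigrams: the n-gram family used by nearly every row of Table 3
    for t in tokens:
        feats["w=" + t] = feats.get("w=" + t, 0) + 1
    return feats
```

In a real pipeline, such a dictionary would feed a vectorizer and then one of the classifiers compared in Table 4.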
For the task of argument detection, the best results are achieved in [60] with 0.89 F1, whereas [63] and [61] achieve 0.78 F1 and 0.67 F1, respectively. The results of the semi-supervised approaches are measured in terms of ranking quality, either using the Normalized Discounted Cumulative Gain (NDCG) or simply presenting qualitative results.
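The supervised systems compared in Table 4 share one shape: n-gram features feeding a standard classifier. A self-contained sketch of the simplest of those classifiers, a multinomial Naive Bayes over unigrams, trained on toy examples (not the cited datasets):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over unigrams -- the simplest of the
    classifiers compared in Table 4 (a sketch, not any paper's exact setup)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-class word counts
        self.class_counts = Counter(labels)      # class priors
        self.vocab = set()
        for text, y in zip(texts, labels):
            for w in text.lower().split():
                self.word_counts[y][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, text):
        scores = {}
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        for y in self.class_counts:
            # log prior + add-one smoothed log likelihood per token
            score = math.log(self.class_counts[y] / n)
            total = sum(self.word_counts[y].values())
            for w in text.lower().split():
                score += math.log((self.word_counts[y][w] + 1) / (total + v))
            scores[y] = score
        return max(scores, key=scores.get)
```

Swapping this estimator for an SVM or logistic regression, as the cited papers do, changes only the decision rule, not the overall pipeline.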
The choice and the customization of the theoretical argumentation model adopted in any research project affects the individual tasks that will be raised. In particular, the task of relations identification, or argument structure prediction as expressed in [37], is the part of the AM pipeline most susceptible to potential changes in the adopted model. The task of annotating relations between parts of text requires the adoption of a holistic approach, capable of identifying connections with both preceding and succeeding components of the selected model and the relations between the entities of the network, and eventually offering a better understanding of the argument.

Table 4: The results of the supervised ML algorithms for the task of argument detection. Unsupervised methods are not included. RF = Random Forest, LR = Logistic Regression, DT = Decision Tree, SVM = Support Vector Machine, NB = Naive Bayes. Rounding took place so that the results are displayed on the same scale.

Author | Algorithm | Features | Precision | Recall | F1
Addawood and Bashir [3] | DT | n-gram | 0.72 | 0.69 | 0.66
 | SVM | n-gram | 0.81 | 0.78 | 0.77
 | NB | n-gram | 0.70 | 0.67 | 0.64
 | DT | all features | 0.87 | 0.87 | 0.87
 | SVM | all features | 0.89 | 0.89 | 0.89
 | NB | all features | 0.79 | 0.79 | 0.85
Bosc et al. [61] | LR | lexical | - | - | 0.64
 | LR | lexical + semantic | - | - | 0.66
 | LR | all features | - | - | 0.67
Dusmanu et al. [63] | RF | n-gram | 0.76 | 0.69 | 0.71
 | LR | n-gram | 0.76 | 0.71 | 0.73
 | LR | all features | 0.80 | 0.77 | 0.78

Both Toulmin's [45] and Freeman's [48] theories, two of the most influential theories in the wider field of logic and argumentation, explicitly define relations between the components of arguments. In data derived from social media, identifying an argument's components is a challenging task, as both their size and their chaotic nature do not allow strict rules and principles to be applied. As a consequence, the task of relations identification should be redefined to include both micro and macro analysis. The micro-analysis evaluates the quality and the completeness of the argument, whereas the macro-analysis expresses the relation of an argument either towards a known topic or towards an argument previously expressed. In social media, network analysis algorithms can significantly boost macro-analysis tasks, as the introduction of network-based features reveals underlying relations between the users and eventually improves the prediction model [72].

The possible outcomes of a macro-analysis in social media text are limited to support/attack/neither relations, indicating to a great extent the outcome of stance detection. On the other hand, micro-analysis is related more to AM and other reliability-related tasks, as it evaluates the integrity and the cohesion of the argument.
The arguments extracted from online resources are not characterized as high-quality data, as often a complicated reasoning process must take place in order for the argument to be understood. Arguments with missing premises are called enthymemes [8, 9] and occur often in informal discourse, creating a challenge for the approach that should be followed: discard the argument, or try to fill in the missing premise.

The need for both micro and macro analysis in the task of relations identification, in combination with the low-quality data from social media, creates the need for the establishment of simple but effective rules and standards. Towards providing a straightforward definition able to capture both micro and macro analysis, we define three entities (argument, topic, completeness) able to capture the complicated nature of the task. We convert the problem to a mathematical expression in a triple, where the task of relations identification is split in two parts: the first part connects the argument with a specific topic (favor/against/neither) and the second part evaluates the structure of the argument. Eventually, expressing relations identification as a triple we have:

(a_ij, t_i, c_j)

where the expressed argument a_ij is open to a macro-analysis considering a topic t_i and a micro-analysis for its completeness c_j.

Only few researchers have explored the task of relations identification in text derived from social media, because of the chaotic nature of social media and the wide presence of vague claims. In subsection 3.1, the first step (argument detection) in the proposed pipeline of Bosc et al. [79] was presented, which is followed by the prediction of attack/support relations between tweets and arguments. Their adopted approach is similar to textual entailment, thus the Excitement Open Platform (EOP) and the Recognizing Textual Entailment (RTE) framework were used.
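The (a_ij, t_i, c_j) triple defined above maps naturally onto a small data structure. The field names and the completeness threshold below are hypothetical, chosen only to show the macro-level stance label and the micro-level completeness score side by side:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RelationTriple:
    """(a_ij, t_i, c_j): an argument, the topic it addresses
    (macro-analysis) and its completeness (micro-analysis)."""
    argument: str                 # a_ij: the argument text
    topic: str                    # t_i: the debated topic
    completeness: float           # c_j: completeness score in [0, 1]
    stance: Optional[str] = None  # macro label: favor / against / neither

    def is_enthymeme(self, threshold=0.5):
        # A low completeness score flags a likely missing premise.
        return self.completeness < threshold
```

Representing the two analyses in one record lets a pipeline fill the macro label and the micro score in separate steps, as the proposed split suggests.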
A second method was also applied, implementing a neural sequence classifier; however, neither of the methods presented satisfactory results. In fact, the detection of the support relation achieved 0.20 F1 and the attack relation 0.16 F1 using the neural model, while with the use of EOP+RTE the support relation achieved 0.17 F1 and the attack relation 0.0 F1. Apart from the low score in the automatic detection of relations, even the IAA was significantly lower for this specific task, a = 0.67, compared to the IAA for the task of argument detection, which reached a = 0.81.

A new method for extracting argumentative relations of attack, support or neither is presented in [66], based on the Relation-based Argumentation Mining (RbAM) model. The proposed model was tested on the dataset of Carstens and Toni [51] and afterwards applied to the task of relations prediction between tweets and news headlines on two different datasets [82, 83]. Apart from different implementations of neural networks, the impact of trained and non-trained embeddings was also evaluated, demonstrating the supremacy of the trained embeddings. Beyond the use of word embeddings and the argumentativeness features extracted from RbAM, the authors do not describe the rest of the features, instead simply using the term standard features; thus safe conclusions about the use of features cannot be drawn.

Broadening the limits of the relations prediction task, Ma et al. [67] introduced a 3-step framework including both micro and macro-analysis of the argument, in contrast to [79] and [66], where only a macro analysis took place. Besides the attack/support relation between a tweet and a topic, the authors also examined the relatedness of a topic towards the pre-defined topic and the existence or not of an arguable reason, where the evaluation of the argument towards its completeness takes place. Considering the complexity of the proposed methodology and the comparison with state-of-the-art baselines [68, 84, 85], the presented results can be characterized as promising. The entire process is treated as an information retrieval task, thus the learning-to-rank approach was adopted and the metric precision at k was used, which indicates the precision among the k top results of the retrieval. The three distinctive sub-tasks that constitute the claim-oriented tweet retrieval task have increasing complexity, as reflected in the reported IAA, where topic relevance reached 90.1%, clear stance 78.2% and detection of an arguable reason 75.2%.

Table 5: The different sub-tasks that have been accomplished in the context of the relations identification task. MAP = Mean Average Precision, P@5 = Precision@5, P@10 = Precision@10. Rounding took place so that the results are displayed on the same scale.

Author | Scope | Task | Algorithms | Metric and Score
Bosc et al. [61] | macro-analysis | support / attack | EOP + RTE | F1 support 0.17, F1 attack 0.0
 | | | LSTM | F1 support 0.20, F1 attack 0.16
Cocarascu and Toni [66] | macro-analysis | support / attack / neither | LSTM | dataset [83]: P 0.59, R 0.97, F1 0.73; dataset [82]: P 0.97, R 0.90, F1 0.94
Ma et al. [67] | micro & macro analysis | topic relatedness, support/attack, arguable reason | SVM-light | MAP 0.59 / 0.50, P@5 0.53 / 0.51, P@10 0.48 / 0.44

In Table 5, a summarization of research papers on the task of relations identification in text derived from social media is presented. The first column presents the authors of each paper, the second interprets the scope of the task(s) according to the proposed definition, and the tasks are presented in the third column. The next columns present the technical details (algorithms, metric, score) of each proposed method. In [66], the results for the two datasets on which the proposed methodology has been tested are presented.
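The ranking metrics reported in Table 5 are simple to compute; a sketch of precision-at-k and average precision (MAP being the mean of the latter over a set of queries):

```python
def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k ranked items that are relevant (P@k)."""
    return sum(item in relevant for item in ranked[:k]) / k

def average_precision(relevant, ranked):
    """Average of P@k over the ranks k at which a relevant item appears.
    Averaging this value over all queries gives the MAP of Table 5."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0
```

In the learning-to-rank setting of [67], `ranked` would be the retrieved tweets for one claim and `relevant` the annotator-approved ones.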
It should be stressed that the scores are not directly comparable, as the different research papers carry out different tasks; instead we should focus on the coexistence of different approaches and the level of difficulty of each one. Each of the presented research papers exploits data derived from Twitter, as the focus of this literature review is AM in social media. Web-derived text seems to thrive as a source for AM pipelines, including the task of relations identification, but the source of data is usually a more structured form of text, such as debate forums [86, 87, 88, 89]. Although the information found in social media is characterized as noisy text and is far from an ideal scenario for AM, the constant generation of content allows researchers to conduct research that includes the time axis, in order to understand users' behaviour [72, 90] and evaluate their impact beyond the network [91, 92]. Users in social media platforms usually express emotions or quick messages with very little argumentation; however, the introduction of argumentative features can enhance other NLP tasks [60, 66]. Both micro [93] and macro [94] analysis have the attention of the research community, and there have been approaches that combine them [87, 88]. Another research topic that has gained the interest of the research community is the reconstruction of implicit warrants, although the existing research papers [8, 9, 29] do not utilize social media as a source.
In contrast to the relations identification task, stance detection is a popular task among researchers in the NLP community, either as an autonomous, independent task or as part of an extensive pipeline. Stance detection is thriving even in the challenging environment of social media, and Twitter is often used as a source of information.

Stance detection is related to many sub-fields of the wider NLP area, such as sentiment analysis, textual entailment, topic extraction and AM. In the context of AM in social media, we define stance detection as the task of automatically determining the attitude of the author towards a specific topic, exploiting any kind of information that can be collected. The stance can be determined either exclusively by the content of the text or from a combination of features capable of revealing specific characteristics, such as argumentativeness [60] or network communities [72, 95].

The main difference between opinion mining and stance detection as expressed in AM pipelines lies in the concept of data aggregation from the wider environment towards the final outcome.
The term wider environment applies to both the combination of sources and of tasks [96]; web-generated data, and especially social media, offer an excellent environment for sentiment analysis but a poor one for argumentation or opinion mining. Thus stance detection is considered the final part of the pipeline, one that exploits the findings of the previous steps rather than a stand-alone task.

As the research community has shown great interest in the task of stance detection, it is impossible to present every research paper; instead we decided to focus on the research methodologies that either aggregate data from different sources or have the ability to be used as part of a bigger system. As with the relations identification task, we also provide a mathematical expression of stance detection, influenced by the definition of opinion mining in [97]. Stance detection is expressed as the quintuple:

(h_i, s_ijkl, d_j, r_k, t_l)

where h_i is the person who holds a specific stance s_ijkl on a specific debate d_j, justified by a rationale r_k at a specific time t_l. The rationale of the stance for a specific debate can be assessed for its quality [63, 3] through a variety of sub-tasks, such as facts identification, evidence recognition, source classification and reasoning evaluation.

Table 6: The results of the supervised and weakly-supervised ML approaches that have been followed for stance detection in social media text. Full supervision in [69], [68], [60]; weak supervision in [70], [71], [73], [72]. [69], [68], [98], [71] and [60] are applied on the same dataset. RNN = Recurrent Neural Network.

The sixth task of SemEval-2016 [99] introduced the shared task of stance detection in tweets, providing a significant boost to the field, as new methodologies were suggested and the constructed dataset was also used in later research. The shared task consisted of two parts regarding the supervision framework to be followed (fully-supervised, weakly-supervised).
As the constructed dataset continued to be used after the completion of the task, more modern approaches have surpassed the top performances described in the task. The best-performing system of the competition [69] proposed a recurrent neural network capable of extracting information from unlabeled datasets using word embeddings. The use of word embeddings as features was also critical in [68], where a simpler linear SVM algorithm achieved an F-score of up to 0.70, surpassing the previously highest score of 0.68. Apart from the use of word embeddings, the presence or absence of the target of interest in the tweet improved the results of the algorithm. One more improvement on the same dataset for the same task was achieved by Wei et al. [70], reaching 0.71 F1 with a proposed end-to-end neural model that makes better use of target information.

Considering the results for the weakly-supervised framework as described in [99], those are significantly lower, as no training data are provided for the topic that is researched; thus the developed methodologies rely heavily on techniques that can transfer knowledge from different topics. The submission with the highest performance for the task achieved 0.56 F1 [98] and proposed a convolutional neural network including a modified softmax layer able to perform three-class classification, although the training involves only two classes. A weakly-supervised approach exploiting the network structure information, proposed by Ebrahimi et al. [71], improved the previously best score and reached 0.57 F1.

Weakly-supervised approaches exploiting social media features for political stance detection were also adopted in Lai et al. [72] and in Johnson and Goldwasser [73]. The former focuses on the Italian referendum of 2016 and employs a holistic approach to stance detection, adopting a diachronic perspective on the user's stance that includes Twitter-specific features and social network communities, achieving 0.90 F-micro with the use of SVM.
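Averaged word embeddings, the feature behind the strongest systems above, reduce a tweet to one fixed-size vector that any classifier can consume. A toy sketch with hand-made two-dimensional vectors and a nearest-centroid rule standing in for the linear SVM of [68]:

```python
def embed(tokens, vectors, dim):
    """Average word embeddings into a fixed-size tweet representation.
    The vectors passed in would normally be trained embeddings; the
    two-dimensional toy vectors in the test are illustrative only."""
    vec = [0.0] * dim
    found = 0
    for t in tokens:
        if t in vectors:
            found += 1
            for i, x in enumerate(vectors[t]):
                vec[i] += x
    return [x / found for x in vec] if found else vec

def nearest_centroid(vec, centroids):
    """Assign the stance class whose training centroid is closest --
    a stand-in for the SVM / neural classifiers of the cited systems."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: dist2(vec, centroids[c]))
```

The target-of-interest signal exploited in [68, 70] would enter here as extra dimensions appended to the averaged vector.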
A similar approach was also followed in [73], able to capture both the content and the social context through linguistic patterns, reaching 0.86 accuracy. The novelty of their approach lies in the absence of manual annotation, as the annotation of the political stances took place with the use of ISideWith.com.

In the work of Addawood et al. [60], an advancement of a previously established scheme [3] is proposed, capable of carrying out the task of stance detection and resulting in an F1 score of 0.93 with the use of the Decision Tree algorithm. In this paper, argumentativeness is introduced as a feature, significantly increasing the performance of the algorithm. The findings of the paper indicate that argumentativeness features are the most informative ones for the successful categorization of the favor and neutral categories, stressing the importance of introducing AM techniques in different text mining tasks.

Table 6 presents the algorithms, the metrics and the scores the selected research papers have achieved. In the second column, the possible classes provided to the classification algorithms are depicted: there is either a binary approach (favor, against) or the 'neither' option is included. Considering the algorithms used, SVM and neural network algorithms dominate, apart from [73], where probabilistic soft logic was adopted. The last two columns present the scores achieved, measured with different metrics. As expected, fully supervised ML approaches achieve higher results than weakly supervised ML approaches when both are applied on the same dataset. Apart from the ML approach, defining the number of possible classes plays a role in the performance of the algorithms, as the binary approach normally achieves higher results.
The wide use of Twitter, in combination with its public nature, has established it as the most appropriate social network for studying a variety of tasks related to virality, such as viral marketing, rumour diffusion, and event and fake news detection. The majority of the proposed methods rely on social network analysis, exploiting the metadata offered by the social network (friends, followers, time of publishing, etc.). As the scope of this paper is neither an in-depth review of rumour detection techniques and methods, nor evidence identification for claims in any kind of text, we focus on research work that connects argumentativeness with the evaluation of an argument's reliability in text derived from social media.

The Twitter platform is often used as a means of expressing arguments on controversial topics; some of them are efficiently supported with facts and evidence from reliable sources, whereas in other cases, instead of backing their claims, users simply express feelings or unsupported allegations. The constant data generation and rapid pace of news flow create a chaotic environment with limited time for claims to be evaluated and facts to be assessed. In the environment that has been created, where users express opinions and views in real time without using sophisticated or pretentious vocabulary, a unique opportunity arises for argument evaluation on various political and social issues. The automatic evaluation of arguments has the potential to reduce incidents of rumour spreading, speed up the detection of fake news and eventually improve the quality of public political discourse.

An essential part of a complete argument is the sufficient backing of the original claim, either in the form of premises or in the form of backing with evidence and the presentation of facts. A simple but robust structure of claim and supporting evidence is adopted in [3] for the classification of arguments' evidence, where the ultimate step of the proposed pipeline is the classification of evidence into six different categories. The proposed pipeline of Dusmanu et al. [63] contains two tasks related to the reliability of the expressed argument: the distinction of factual information from opinions, and source identification from a pre-defined list. For both tasks, the use of all the available features boosts the results of the classification.

Table 7: The results of the supervised ML approaches that have been followed for reliability-related tasks in AM, in text derived from social media. For the task of source identification a rule-based approach is followed. str match = string matching, h. = heuristic algorithm.

Author | Task | Classes | Algorithm | Metric and Score
Addawood et al. [3] | Evidence classification | 6 | SVM | F1-macro 0.83
Dusmanu et al. [63] | Factual vs opinion | 2 | LR | P 0.81, R 0.79, F1 0.80
 | Source identification | NA | str match + h. | P 0.69, R 0.64, F1 0.67
Konstantinovskiy et al. [74] | Claim detection | 2 | LR | P 0.88, R 0.80, F1 0.83
 | | 7 | LR | P-micro 0.71, R-micro 0.73, F1-micro 0.70, P-macro 0.61, R-macro 0.44, F1-macro 0.48

In the work of Konstantinovskiy et al. [74], where the objective is the construction of a reliable fact-checking mechanism, both an annotation scheme and an automated claim detection method are proposed.
Two different approaches are presented: a binary claim/no-claim model, and a multi-class classification with seven categories describing the claim. The proposed methodology (the binary model) overcomes previously established mechanisms (ClaimBuster, ClaimRank) in terms of F1, as it achieves 0.83 F1, while the multi-class classification displays an impressive 0.70 F1-micro, although in terms of macro average it achieves only 0.48 F1-macro.

In Table 7, a summarization of the research work that accomplishes reliability-related tasks in the context of AM in text derived from social media is presented. Four tasks have been recognized in this category, and the results of the different approaches rely heavily on the number of alternative classes. The 0.83 F1-macro achieved in [3] is impressive if we consider that there are six available classes, while for a similar task with seven available classes the F1-macro is 0.48 in [74]. The difference could be partly explained by the exploitation of more features in [3] in comparison to [74], where less advanced features were used.

Besides the aforementioned research, there have been some approaches connecting reliability and evidence with argument strength, but they are not applied to social media text. For example, a research work that uses argumentativeness as a feature, apart from [60], is Cocarascu and Toni [66] for the task of deceptive review detection, leading to an improvement of the prediction algorithms. It has to be noted that the exclusive use of argumentative features, without topic modelling or additional features, cannot surpass the baseline. The work of Park and Cardie [6, 7] is another great example of combining argumentation with evidence classification, suggesting three different categories of justifications in online user comments, but as in [66], both papers exploit more structured forms of arguments.
Similarly, the task of context-dependent claim detection [100, 101] utilizes hundreds of Wikipedia articles, but it has not been applied to social media text.

Another task related to both the quality evaluation of the argument and in-depth analysis at macro-scale is enthymeme reconstruction. Although the task has not yet been applied to data derived from social media, at least to our knowledge, it has been tested on web-generated data [8, 9], and we strongly believe that text from social media offers an excellent testing ground for the identification of implicit supporting claims. The task of reasoning comprehension as presented in [29] has the potential to be applied to social media and advance the wider field of AM, but the complexity of the proposed methods raises concerns about the transferability of the task to new environments and untrained annotators.
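The gap between the 0.70 F1-micro and 0.48 F1-macro reported for the multi-class model of [74] is purely an averaging effect: micro-averaging pools all decisions, while macro-averaging weights rare classes equally. A minimal sketch of both averages:

```python
def f1_scores(gold, pred, classes):
    """Per-class F1 aggregated two ways: micro (pool all decisions)
    and macro (unweighted mean over classes). Rare, badly-predicted
    classes drag the macro score down but barely move the micro score."""
    tp = {c: 0 for c in classes}
    fp = dict.fromkeys(classes, 0)
    fn = dict.fromkeys(classes, 0)
    for g, p in zip(gold, pred):
        if g == p:
            tp[p] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    def f1(t, fpos, fneg):
        denom = 2 * t + fpos + fneg
        return 2 * t / denom if denom else 0.0
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro
```

On a toy run where the minority class is never predicted, micro stays high while macro collapses, mirroring the pattern in Table 7.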
4. Existing Tools in AM
The increasing interest in AM has intensified the need for suitable tools, such as grammar parsers, sentiment lexicons, software for boosting manual annotation tasks, and tools capable of automatically extracting arguments from natural language. In the area of NLP, there is a wide range of available tools covering various aspects and addressing different challenges. However, there is a lack of standardization, accessibility, and acceptability of the existing tools, due to proprietary formats developed for the modeling of natural language in different domains.

In subsection 4.1, we provide a synopsis of some of the most prominent tools in the area of NLP and how they could improve the process of accomplishing AM tasks. In subsection 4.2, we narrow down the area of interest and focus on tools that are specialized for the task of AM.
The annotation process is of major importance in any NLP system, thus different tools have been proposed following different approaches. The first web-based open-source annotation tool to be introduced was BRAT [102], which is based on the STAV text annotation visualizer [103] and is characterized by the wide variety of tasks that can be accomplished and the scientific work that has been conducted using it. It has been adopted in different fields like visualization, entity mention detection, event extraction, coreference resolution, normalization, chunking, dependency syntax, and meta-knowledge annotation.

A tool following the approach of BRAT is WebAnno [104], which has kept the web interface and visualization capabilities of BRAT and modified the server layer. WebAnno has improved specific weaknesses of BRAT, focusing on user and quality management with the addition of monitoring tools and interfaces for crowdsourcing. Currently, WebAnno is in version 3.0 and also offers a web instance through the CLARIN-D infrastructure. Both BRAT and WebAnno are open, live projects that could easily be modified to include tasks in the sphere of AM, such as argument detection, relations identification or reasoning evaluation.

The construction of graphs for text annotation is followed in GraPAT [105], covering different tasks like sentiment analysis, argumentation structure, rhetorical text structure, and natural visualization of the annotation process. The initial goal in the development of GraPAT was to increase IAA and maximize the automation of annotation without neglecting either variability or annotation speed and comfort. GraPAT can be considered the successor of RSTTool [106], as it maintains the principles of RST annotation, enriching it with more capabilities like sentiment analysis and an argument structure annotation model.

GATE [107] has a dominant presence in the wider field of text engineering, offering numerous tools and capabilities, from simple tasks (e.g.
information extraction, named-entity recognition, etc.) to modifications for cutting-edge technologies, such as cloud-enabled software and social media analysis. Regarding the argument annotation task, GATE offers Teamware [108], a web-based software suite which provides the environment for collaborative annotation and curation. GATE Teamware stands out as the only annotation tool, to the best of our knowledge, which supports execution of an automatic NLP system before manual annotation.

The DiGAT tool [109] has been developed alongside an annotation scheme and a graph-based inter-annotator agreement measure based on semantic similarity. Similarly to GraPAT, DiGAT also relies on graph structures for the annotation process, aiming at simple and accurate annotation of relations between entities in long text.

Table 8: A summarization of the existing NLP tools that can enhance the process of AM. The table includes tools in the wider NLP area which can be integrated at any stage of an AM pipeline.

Tool                 | Web UI | Manual Annotation | Arg Retrieval | Arg Evaluation
WebAnno [112]        | Yes    | Yes               |               |
BRAT [102]           | Yes    | Yes               |               |
GraPAT [105]         | Yes    | Yes               |               |
DiGAT [109]          | Yes    | Yes               |               |
MARGOT [113]         | Yes    |                   | Yes           | Yes
OVA+ [114]           | Yes    | Yes               |               |
TOAST [115]          | Yes    | Yes               |               | Yes
GATE Teamware [108]  | Yes    | Yes               |               |
Args [116]           | Yes    |                   | Yes           | Yes
ArgumenText [117]    | Yes    |                   | Yes           | Yes
Rationale [118]      | Yes    | Yes               |               |

The establishment of the TextCoop platform alongside the Dislog language is presented in [110]. TextCoop is the only tool in this subsection that follows a logic-based approach, heavily influenced by RST, modeling the conclusion as a nucleus and the support as a satellite. As described, TextCoop offers a functional web interface; however, to the best of our knowledge, this is not provided yet. Another tool in a similar status to TextCoop is TURKSENT [111], a manual annotation tool with multi-lingual capabilities aiming at automatic sentiment analysis of text derived from social media.
Its mentioned web-based interface does not seem to be available yet.

The Stanford CoreNLP toolkit [119] enjoys great acceptance in the NLP community, as it offers a broad range of grammatical analysis tools, different APIs for the majority of programming languages, and the ability to run as a simple web service. However, a specific tool for AM-related tasks has not yet been developed. Among the existing tools offered by the toolkit, Stanford OpenIE is most closely related to AM, as it enables the extraction of relation tuples out of binary relations. Two more tools included in the toolkit that can be used in an AM pipeline are the Stanford Relation Extractor, which finds relations between two entities located by the Stanford Named Entity Recognizer, and the Neural Network Dependency Parser, a dependency parser that establishes relationships between "head" words and "modifier" words in a sentence. The web interface provided by the toolkit offers up to ten different annotators, and the visualization of the schemes has been realized using the BRAT software.

The majority of the aforementioned tools are on-going, open-source projects that can be modified in order to carry out AM-related tasks, whereas some others (GraPAT, DiGAT) are graph-based annotation tools that are used for identifying the relations between chunks of text. Other functionalities, such as sentiment evaluation and named entity recognition, can boost AM-related tasks, as sentiment features are used in the majority of the existing research papers in the area, as Table 3 illustrates. If any of the subtasks included in the AM pipeline can be executed automatically and reliably through a software tool, then this tool should be exploited.

In this subsection, we present the tools that have been designed to enhance the argumentation process.
Some of those tools have been implemented to boost the annotation step, others offer an argumentation search engine, and there are tools capable of automatically grading an argument or even performing the entire process of AM.

The Centre for Argument Technology has produced a series of tools covering different aspects of AM. The latest developed tool is OVA [114], which has replaced to a certain extent the Araucaria tool [120]. Arvina [121], the successor of MAgtALO [122], offers a dialogue system implementing the concept of mixed-initiative argumentation, where human players and agents debate with equal levels of participation. The calculation of acceptability semantics on structured argumentation frameworks is carried out through TOAST [115].

Probably the most influential achievement of this organization is the establishment of the Argument Web [123], a repository working in cooperation with a series of tools, systems and services, such as AIFdb, the main search interface for the Argument Web. ArguBlogging [124] materializes the concept of crowdsourcing in argumentation by capturing arguments that take place in online platforms (tumblr and Blogger are supported) and provides them as feed to the Argument Web. Concerning the educational aspect of argumentation, Argugrader offers automatic grading and provides detailed feedback to students regarding successful or unsuccessful construction of arguments.

A prototype argument search framework is proposed in [116], able to carry out the entire process of an argumentation search engine, from user query and argument retrieval to ranking and presentation of arguments. The argument search engine relies on pre-structured arguments from a defined list of debate portals, and a standard mapping process takes place in order to convert the concepts that characterize each argument in the different debate portals to the common argument model.
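The retrieve–normalize–rank flow just described can be sketched in a few lines of Python. This is an illustrative toy, not the implementation of [116]: the portal names, field names and the term-overlap relevance score are all our own assumptions, standing in for the paper's mapping process and ranking model.

```python
from dataclasses import dataclass

@dataclass
class Argument:
    """Common argument model: a claim, its stance, and a relevance score."""
    claim: str
    stance: str   # "pro" or "con"
    score: float

# Hypothetical per-portal field names; real debate portals differ.
PORTAL_SCHEMAS = {
    "portal_a": {"text": "body", "side": "position"},
    "portal_b": {"text": "argument", "side": "stance"},
}

def to_common_model(portal, record, query):
    """Map a portal-specific record onto the common argument model."""
    schema = PORTAL_SCHEMAS[portal]
    claim = record[schema["text"]]
    stance = "pro" if record[schema["side"]].lower() in ("pro", "yes", "for") else "con"
    # Toy relevance: fraction of query terms appearing in the claim.
    terms = query.lower().split()
    score = sum(t in claim.lower() for t in terms) / len(terms)
    return Argument(claim, stance, score)

def search(query, records):
    """Retrieve, normalize, and rank arguments for presentation."""
    args = [to_common_model(p, r, query) for p, r in records]
    return sorted(args, key=lambda a: a.score, reverse=True)
```

The key design point mirrored from the text is that ranking operates on the common model only, so adding a new portal requires nothing beyond a new schema entry.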
A system able to utilize heterogeneous, big-volume data is implemented in Stab et al. [117], under the name ArgumenText. ArgumenText uses 400 million plain-text documents from different sources and deploys a series of technologies in order to construct a solid pipeline able to materialize a sequence of sub-tasks that eventually present ranked pro and con arguments through a web interface.

At this moment, we are aware of only one tool that accomplishes the complete task of automatic annotation in terms of AM. MARGOT [113] is built on the foundations of Lippi and Torroni [125] and extends the previously established model by including the task of evidence detection and providing a web interface. The syntactic structures followed in argumentative discourse are the fundamental idea on which the tool was built. The model implemented for AM involves two binary classification problems: argumentative sentence detection and argument component boundary detection. The former is addressed with a combination of tree kernels and bag-of-words, whereas for the latter an SVM-HMM with bag-of-words, part-of-speech, lemma and named entity features is employed. Overall, the tool achieves acceptable evaluation scores, given the complexity of the task, but there is still room for improvement in covering various domains.

Another tool that is very similar to MARGOT, sharing the same interface, is CLAUDETTE [14], an on-line platform addressing possibly unfair or abusive terms of service. The platform realizes a 2-step algorithm including the binary task of detecting an unfair clause and, if the first step is positive, a classification task over 8 possible categories. For this classification, a combination of eight SVMs exploiting lexical features is used, with fair results in all used metrics.

On the industry level, Austhink Pty Ltd.
is continuously developing software tools aiming at the improvement of the general reasoning and argumentation process through training sessions. The two most successful software tools are Rationale [118, 126] and bCisive [127], with both of them offering limited free online versions. The former supports a series of activities in the area of AM, such as construction, visualization and mapping of arguments, while the latter focuses on providing support to business decisions through hypothesis and decision maps. Rationale is considered the successor of Reason!Able [128].

A synopsis of the existing NLP tools that have been discussed in this section is presented in Table 8. The existing tools can be classified into three categories: tools that aid the manual annotation process, general-purpose NLP tools, and tools that offer an entire mechanism for argument search, retrieval and evaluation. It has to be underlined that the evaluation of the argument differs across the different approaches, as in [113] the number of claims and premises is presented, in [115] the weight of the argument is calculated, and in [116, 117] the arguments are categorized as pro and con.
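The cascade of two binary problems described for MARGOT (argumentative sentence detection, then component boundary detection) can be sketched as follows. This is only a control-flow illustration: the real tool uses tree kernels and an SVM-HMM, whereas the cue-word rules and BIO-style tags below are our own simplifications.

```python
# Toy stand-in for a MARGOT-style cascade; cue rules replace the
# learned classifiers and only demonstrate the two-stage structure.

def detect_argumentative(sentence):
    """Problem 1: is the sentence argumentative at all?"""
    return any(cue in sentence.lower() for cue in ("should", "because", "therefore"))

def detect_boundaries(sentence):
    """Problem 2: toy tagging of component boundaries.
    Tokens after a premise connective are tagged PREMISE, the rest CLAIM."""
    tags, label = [], "CLAIM"
    for token in sentence.split():
        if token.lower() in ("because", "since"):
            tags.append("O")          # the connective itself is outside both spans
            label = "PREMISE"
        else:
            tags.append(label)
    return tags

def margot_like(sentence):
    """Run boundary detection only on sentences flagged as argumentative."""
    return detect_boundaries(sentence) if detect_argumentative(sentence) else None
```

The cascade matters: boundary detection is meaningless on non-argumentative sentences, so the first binary decision gates the second, exactly as in the two-problem formulation above.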
5. Proposed conceptual framework and future directions
As with every scientific field at a premature stage, AM has to take steps towards its establishment in the wider NLP research community by attracting the interest of a wider audience through more applications of AM in a broader spectrum of scenarios. Towards this vision, there are specific challenges and concerns that have to be faced and technologies that have to be tested. In the remainder of this section, we propose a conceptual framework able to capture the needs of AM in social media and boost NLP-related tasks, and we present some promising approaches that can significantly contribute to the AM process. Our proposed conceptual framework is, to our knowledge, the first pipeline capable of connecting various NLP tasks with each other while giving a prominent position to AM tasks in noisy environments. Previously proposed pipelines are either focused on the task-at-hand [3, 129, 6] or do not present any connection with other NLP-related tasks [37].
The two most dominant characteristics of text derived from social media are its short length and the lack of defined norms. Considering the fact that the typical length of a tweet is less than 50 characters, the definition of argument boundaries is in many cases not feasible, as an argumentative tweet can hardly contain information unrelated to the major claim. On the other hand, the already unstructured nature of text data, in combination with the massive use of jargon and emoticons, establishes an environment where any lexical rule is really challenging to apply, as it either has to be specific to each case study or loosely defined in order to cover different cases. Both assumptions would probably lead to a lack of transferable knowledge, a crucial objective for almost every proposed methodology.

A possible criticism of omitting the boundaries-definition task as defined in [37] is the increase of the upper limit in Twitter from 140 to 280 characters, as well as the existence of various social media, including forums such as createdebate.com, where the norm indicates well-structured arguments and lengths significantly longer than 50 characters. However, the great dominance of Twitter in socio-political issues, which is also boosted by the online presence of political leaders, combined with the shrinking of general-purpose forums, have formed an environment in which the use of social media data seems to be the only source of web-generated data in the near future.

The increased use of social media as a source of information in the wider area of AM highlights the need for the definition of a new scheme devoted to the specialized procedures of AM in social media. The proposed conceptual architecture, as depicted in Figure 5, contains three main components, and each one includes distinct tasks that could form a second focused pipeline, but could also exist independently from the other tasks.

Figure 5: The Proposed Conceptual Architecture for AM
The first component contains the core tasks of an AM procedure, the tasks that form the heart of the pipeline and can be applied to different text data, from official political speeches to tweets and comments in product or service reviews. The other two components include tasks that are in some way involved in the process of evaluating the reliability of the text. The additional components should not be underestimated and should be considered of equal importance to the AM core tasks, as the interest of the scientific community is constantly increasing for tasks such as fact checking, evidence and fake news detection, even though the connection between AM and reliability-related tasks is not yet very solid.

The first step of the first component in the proposed conceptual architecture is the task of argument detection, where the identification of a sentence as argumentative or not takes place. A significant amount of work intentionally ignores the step of argument detection [26, 72], as argumentative rhetoric is a prerequisite for the evaluation of persuasive essays or political debates. When collecting heterogeneous data from social media, the detection of argumentative text is unavoidable, as not all users intend to persuade for or against a discussed topic, but simply express a reflection, a feeling or a question. The task of argument detection is considered an essential step for the AM pipeline in social media, as the following steps of the pipeline cannot be completed if it is missing.

A great amount of research work is focused on the identification of the components that construct the argument, especially in the early attempts at argumentation (see section 2.1), presenting as the ultimate goal the successful analysis of the argument and emphasizing the reasoning concept behind the structure of the argument.
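The argument-detection step described above, which sets aside reflections, feelings and questions so that only candidate arguments reach the rest of the pipeline, can be sketched as a simple filter. The marker lists below are our own illustrative assumptions; any realistic detector would be a trained classifier, not a word list.

```python
# Minimal filter for the argument-detection step on social-media text.
# Marker lists are hypothetical examples, not a real lexicon.

FEELING_MARKERS = ("love", "hate", "wow", "omg")
STANCE_MARKERS = ("should", "must", "because", "therefore")

def classify_post(post):
    """Route a post to the pipeline only if it looks argumentative."""
    text = post.strip().lower()
    if text.endswith("?"):
        return "question"          # not trying to persuade, just asking
    if any(m in text.split() for m in STANCE_MARKERS):
        return "argumentative"     # candidate for the downstream AM tasks
    if any(m in text.split() for m in FEELING_MARKERS):
        return "feeling"           # bare sentiment, no argument structure
    return "other"
```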
The concept of in-depth analysis of an argument, through the identification of the argument's components and the discovery of the underlying relations between them, has been adopted and developed in the field of AM in social media, adjusting to the new environment by including the relations and interactions between the entities of the network. The two main tasks that fall under this category are relation-based AM [51] and enthymeme reconstruction [8]. Both tasks provide useful insight into the structure of the argument but, instead of trying to evaluate its impact, they focus on the efficient use of small datasets [9, 66] or are designed to aid the task of stance detection [8].

Earlier in the text we characterized Opinion Mining as the predecessor of AM, mostly because the terms stance detection and opinion mining can be used interchangeably. Stance detection is the final step of the proposed framework and it is a task that is heavily dependent on the previous steps of the pipeline, as the nature of social media data demands a significant pre-processing procedure. This procedure in the AM pipeline is expressed through the detection of argumentative tokens and the identification of the relationships between the components of the argument (explicit or implicit).

The constant generation of data in social media has raised significant concerns about the quality of information that is shared and read in social media. The connection between arguments in social media and reliability has not yet been explored in depth, compared with the amount of work dedicated to the areas of rumour and fake news detection.
However, tasks such as evidence type classification, source identification and fact recognition have emerged in the area of AM, raising awareness of the connection between the expressed argument and its reliability.

In the proposed conceptual architecture for AM in social media, we devote a dedicated component of the pipeline to reliability-related tasks, in order to stress their importance and the room for development that exists in the area. AM reliability tasks are not considered core tasks of the AM pipeline, but they can enhance the procedure, especially when applied to arguments derived from social media, as the backing of the claims is often inaccurate or based on rumours or hoaxes. Other tasks that can be assisted by progress in the field of AM and can integrate parts of the proposed AM core tasks are sentiment classification [68], subjectivity/objectivity identification, topic modeling [130], and recommendation algorithms [131].

Different tasks can be easily executed and combined through the components described in our proposed architecture, as it is both detailed and easily modifiable. The example provided in subsection 3.1 concerning the Apple/FBI encryption debate (
RT @ItIsAMovement "Without strong encryption, you will be spied on systematically by lots of people" - Whitfield Diffie), can be assessed for its argumentative nature, the relations expressed through the retweet and the mention, its stance towards the discussed debate, and its completeness and integrity, while other NLP-related tasks can also be enhanced by the findings of the above tasks. Our intention is for our proposed framework to be regarded and used as a means of enhancing various NLP tasks with the use of argumentativeness features.
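The data flow between the components when processing the example tweet can be sketched as follows. Only the flow is taken from the text: the component internals are placeholders (the stance output is hard-coded and the reliability check is a toy quotation heuristic of our own), not the actual methods of any cited system.

```python
# Sketch of the proposed architecture applied to the example tweet:
# a core AM component feeding an optional reliability component.

def core_am(tweet):
    """Core component: argument detection, relation extraction, stance."""
    return {
        "argumentative": "encryption" in tweet.lower(),        # toy topic check
        "relations": [w for w in tweet.split() if w.startswith(("RT", "@"))],
        "stance": "pro-encryption",   # placeholder for a real stance detector
    }

def reliability(tweet):
    """Reliability component: toy source identification via attributed quotes."""
    quoted = tweet.count('"') >= 2
    attributed = "-" in tweet
    return {"has_quoted_source": quoted and attributed}

def pipeline(tweet):
    """Reliability tasks run only on posts the core component flags."""
    result = core_am(tweet)
    if result["argumentative"]:
        result.update(reliability(tweet))
    return result
```

Running the pipeline on the quoted tweet flags it as argumentative, extracts the retweet and mention as relations, and marks the attributed Diffie quotation as a candidate source for the reliability component.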
Handling data of great volume and variety is not an easy task, especially if we consider the well-known "golden rule", whereby manual annotation is required for a subset of the data. The process of manual annotation is labor intensive and time consuming, whereas the possible use of unsupervised machine learning algorithms could solve the problem of the lack of trained annotators. The need for novel algorithms and techniques is also emphasized in previous review papers [12, 37] and, although there is an evident trend towards unsupervised [132, 133] or semi-supervised ML algorithms [7, 20, 129, 134] with notable performance, they could be further improved, as there has not been extensive work either on the use of suitable features or on the design of argument schemes. A recent literature review by Silva et al. [135] presents the trends in semi-supervised learning for tweet sentiment analysis and can be used as a point of reference also in the field of AM.

Deep learning techniques are able to handle a great volume of data in an unsupervised or semi-supervised way and have achieved breakthrough results in the NLP field. Deep learning has been applied in AM [93, 39, 9], but does not seem to surpass other ML algorithms, mainly because of the limited available datasets; however, more research should take place in order for safe conclusions to be drawn.

In the fields of Sentiment Analysis and Opinion Mining there is a major trend towards deep learning, which is adopted in multidisciplinary fields and in various kinds of test cases [33, 136]. Concerning related applications in the AM field, several deep learning architectures are combined with unsupervised learning in the pretraining stage for feature extraction, in order to obtain important semantic features through this process.
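One common semi-supervised scheme of the kind surveyed above is self-training: a model trained on the small labelled set repeatedly absorbs its most confident predictions on unlabelled data. The sketch below uses a nearest-centroid classifier over toy feature vectors purely for illustration; the cited works use far richer models, and every name and threshold here is our own assumption.

```python
# Generic self-training loop with a nearest-centroid classifier.
# Illustrative only: not the method of any cited work.

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    """Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def self_train(labelled, unlabelled, threshold=0.5, rounds=3):
    """labelled: list of (vector, label); unlabelled: list of vectors.
    Each round, confidently classified points join the labelled set."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        cents = {lab: centroid([v for v, l in labelled if l == lab])
                 for lab in {l for _, l in labelled}}
        keep = []
        for v in pool:
            scored = sorted((dist(v, c), lab) for lab, c in cents.items())
            best, worst = scored[0], scored[-1]
            if worst[0] - best[0] > threshold:   # confident margin: absorb
                labelled.append((v, best[1]))
            else:                                # ambiguous: leave in pool
                keep.append(v)
        pool = keep
    return labelled, pool
```

The margin check is the crucial design choice: without it, ambiguous points near the decision boundary would be absorbed with noisy labels and progressively corrupt the centroids, the classic failure mode of self-training.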
It is probably infeasible to completely solve such a complex problem as AM without the use of manual annotation, but these techniques could significantly reduce the load of the labor-intensive process of manual annotation.

The concept behind AM is finding the underlying reasoning of an expressed opinion, not only identifying whether this opinion is positive or negative. In human reasoning, this process is accomplished naturally by combining a-priori knowledge of a specific subject with knowledge of the ethos and influences of the person who expresses an opinion. A similar approach is followed for irony detection [137], where previous work determined that contextualizing information is required [138]. Oddly enough, an approach of that sort is yet to be followed in the field of AM, although combining multiple sources of information seems to be the natural flow of human reasoning for argument detection and classification. Exploiting background knowledge can be achieved through the use of semantic encoders, mainly in models employing reinforcement learning.

The studies in the fields of opinion mining and sentiment analysis are focused on real-world scenarios combining different domains like politics or social networking, and their roots originate from reviews, recommendation systems, and digital marketing. Most of the research on opinion mining aims at improving specific commercial applications, whereas AM architectures have not been tested thoroughly in such scenarios. In order for AM to go one step further, it is crucial that more real-world scenarios are employed, combining background knowledge and information from multiple sources.
6. Conclusion
Argumentation Mining is an attempt at a deeper understanding of natural language; it is the natural evolution of opinion mining, but instead of trying to understand what others think, the focus is on understanding why. The analysis of the human reasoning process is the ultimate goal of AM and it is pursued by exploiting the inherent structure of an argument through the identification of its distinctive parts, implying the inferential process that is followed.

Understanding human reasoning can offer unprecedented capabilities and achieve breakthrough changes in a wide spectrum of applications, as information for the decision-making process can be retrieved. The existing work in the field indicates its great potential, but we must consider the fact that AM is still at a premature stage and there are steps required for realizing human-level reasoning, or at least for being able to interpret it at a sufficient level. The research community has focused on the modeling of the argument and on the suitable selection of features, whereas the selection of the ML algorithm seems not to play a crucial role in the accomplishment of the different tasks. Different approaches have been tested for modeling arguments, from diagram depictions to modifications of well-established theoretical models, and no model excels in comparison with the others, creating the need for establishing a flexible framework able to capture the needs of the different tasks that appear in AM problems.

In our review, we try to shed light on the Argumentation field, in order to provide a clear view of the wider area with a focus on automatic AM in social media text. We present the different models used in previous research works and break them down into their core tasks.
Inspired by the individual AM sub-tasks, we propose a complete conceptual architecture for AM that can be easily adopted and modified depending on the goals of each research work, and we present the existing tools both in the wider NLP area and those more specialized for the task of AM.

Argumentation Mining could stimulate a series of applications where the evaluation and classification of reasoning is essential, especially in web content, where both the reliability and the reasoning validity of a user holding a position are questionable. Tasks such as troll detection, knowledge retrieval, and information validation could benefit significantly from progress in AM. Cutting-edge techniques in human-computer interaction, opinion mining, and recommendation systems could adopt parts of an AM system and enhance their performance. The successful interpretation, evaluation, and taxonomy of arguments will eventually lead to human-level reasoning machines, which can understand, evaluate, and eventually create knowledge.
References

[1] S. E. Toulmin, The Uses of Argument, Cambridge University Press, Cambridge, 2003. doi:10.1017/CBO9780511840005.
[2] M. Liebeck, K. Esau, S. Conrad, What to Do with an Airport? Mining Arguments in the German Online Participation Project Tempelhofer Feld, in: Proceedings of the Third Workshop on Argument Mining (ArgMining2016), Association for Computational Linguistics, Stroudsburg, PA, USA, 2016, pp. 144-153. doi:10.18653/v1/W16-2817.
[3] A. A. Addawood, M. N. Bashir, What is Your Evidence? A Study of Controversial Topics on Social Media, in: Proceedings of the 3rd Workshop on Argument Mining, Berlin, Germany, 2016, pp. 1-11.
[4] F. Boltužić, J. Šnajder, Back up your Stance: Recognizing Arguments in Online Discussions, in: Proceedings of the First Workshop on Argumentation Mining, Association for Computational Linguistics, Stroudsburg, PA, USA, 2014, pp. 49-58. doi:10.3115/v1/W14-2107.
[5] Z. Kurtanović, W. Maalej, On user rationale in software engineering, Requirements Engineering (2018) 1-23. doi:10.1007/s00766-018-0293-2.
[6] J. Park, C. Cardie, Identifying Appropriate Support for Propositions in Online User Comments, in: Proceedings of the First Workshop on Argumentation Mining, Baltimore, Maryland, USA, 2014, pp. 29-38.
[7] J. Park, A. Katiyar, B. Yang, Conditional Random Fields for Identifying Appropriate Types of Support for Propositions in Online User Comments, in: Proceedings of the 2nd Workshop on Argumentation Mining, Denver, Colorado, 2015, pp. 39-44.
[8] P. Rajendran, Contextual stance classification of opinions: A step towards enthymeme reconstruction in online reviews, in: Proceedings of the 3rd Workshop on Argument Mining, Berlin, Germany, 2016, pp. 31-39.
[9] P. Rajendran, D. Bollegala, S. Parsons, Is Something Better than Nothing? Automatically Predicting Stance-based Arguments using Deep Learning and Small Labelled Dataset, in: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, 2018, pp. 28-34.
[10] F. Belbachir, M. Boughanem, Using language models to improve opinion detection, Information Processing & Management 54 (6) (2018) 958-968. doi:10.1016/J.IPM.2018.07.001.
[11] M. Tubishat, N. Idris, M. A. Abushariah, Implicit aspect extraction in sentiment analysis: Review, taxonomy, opportunities, and open challenges, Information Processing & Management 54 (4) (2018) 545-563. doi:10.1016/J.IPM.2018.03.008.
[12] R. Mochales, M.-F. Moens, Argumentation mining, Artificial Intelligence and Law 19 (1) (2011) 1-22. doi:10.1007/s10506-010-9104-x.
[13] J. Savelka, K. D. Ashley, Extracting Case Law Sentences for Argumentation about the Meaning of Statutory Terms, in: Proceedings of the 3rd Workshop on Argument Mining, 2016, pp. 50-59.
[14] M. Lippi, P. Palka, G. Contissa, F. Lagioia, H.-W. Micklitz, G. Sartor, P. Torroni, CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service, arXiv preprint.
[15] N. L. Green, Towards mining scientific discourse using argumentation schemes, Argument & Computation 9 (2) (2018) 121-135. doi:10.3233/AAC-180038.
[16] A. Lauscher, G. Glavaš, K. Eckert, ArguminSci: A Tool for Analyzing Argumentation and Rhetorical Aspects in Scientific Writing, in: Proceedings of the 5th Workshop on Argument Mining, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 22-28.
[17] A. Lauscher, G. Glavaš, S. P. Ponzetto, An Argument-Annotated Corpus of Scientific Publications, in: Proceedings of the 5th Workshop on Argument Mining, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 40-46.
[18] N. Naderi, G. Hirst, Argumentation Mining in Parliamentary Discourse, in: Workshop on Computational Models of Natural Argument, International Workshop on Empathic Computing, Springer, Cham, Bertinoro, Italy, 2015, pp. 16-25. doi:10.1007/978-3-319-46218-9_2.
[19] B. K. Bal, P. S. Dizier, Towards Building Annotated Resources for Analyzing Opinions and Argumentation in News Editorials, in: Proceedings of the Seventh Conference on International Language Resources and Evaluation, Valletta, Malta, 2010, pp. 1152-1158.
[20] C. Sardianos, I. M. Katakis, G. Petasis, V. Karkaletsis, Argument Extraction from News, in: Proceedings of the 2nd Workshop on Argumentation Mining, Denver, Colorado, 2015, pp. 56-66.
[21] C. Reed, D. Walton, F. Macagno, Argument diagramming in logic, law and artificial intelligence, The Knowledge Engineering Review 22 (01) (2007) 87. doi:10.1017/S0269888907001051.
[22] M. Skeppstedt, M. Sahlgren, C. Paradis, A. Kerren, Unshared task: (Dis)agreement in online debates, in: Proceedings of the 3rd Workshop on Argument Mining, Berlin, Germany, 2016, pp. 154-159.
[23] J. B. Freeman, Argument structure: representation and theory, Springer, 2011.
[24] D. Walton, How to Refute an Argument Using Artificial Intelligence, Studies in Logic, Grammar and Rhetoric 23 (36) (2011) 123-154.
[25] A. Peldszus, M. Stede, Rhetorical structure and argumentation structure in monologue text, in: Proceedings of the 3rd Workshop on Argument Mining, Berlin, Germany, 2016, pp. 103-112.
[26] C. Stab, I. Gurevych, Annotating Argument Components and Relations in Persuasive Essays, in: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 2014, pp. 1501-1510.
[27] N. L. Green, Manual Identification of Arguments with Implicit Conclusions Using Semantic Rules for Argument Mining, in: Proceedings of the 4th Workshop on Argument Mining, Copenhagen, Denmark, 2017, pp. 73-78.
[28] F. Boltužić, J. Šnajder, Fill the Gap! Analyzing Implicit Premises between Claims from Online Debates, in: Proceedings of the 3rd Workshop on Argument Mining, Berlin, Germany, 2016, pp. 124-133.
[29] I. Habernal, H. Wachsmuth, I. Gurevych, B. Stein, The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants, in: 16th North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana, USA, 2018, pp. 1930-1940.
[30] W. C. Mann, S. A. Thompson, Rhetorical Structure Theory: Toward a functional theory of text organization, Text - Interdisciplinary Journal for the Study of Discourse 8 (3) (1988) 243-281. doi:10.1515/text.1.1988.8.3.243.
[31] P. Reisert, N. Inoue, N. Okazaki, K. Inui, A Computational Approach for Generating Toulmin Model Argumentation, in: Proceedings of the 2nd Workshop on Argumentation Mining, Denver, Colorado, 2015, pp. 45-55.
[32] S. Lee, T. Ha, D. Lee, J. H. Kim, Understanding the majority opinion formation process in online environments: An exploratory approach to Facebook, Information Processing & Management 54 (6) (2018) 1115-1128. doi:10.1016/J.IPM.2018.08.002.
[33] H. T. Nguyen, M. Le Nguyen, Multilingual opinion mining on YouTube - A convolutional N-gram BiLSTM word embedding, Information Processing & Management 54 (3) (2018) 451-462. doi:10.1016/J.IPM.2018.02.001.
[34] A. Chandra Pandey, D. Singh Rajpoot, M. Saraswat, Twitter sentiment analysis using hybrid cuckoo search method, Information Processing & Management 53 (4) (2017) 764-779. doi:10.1016/J.IPM.2017.02.004.
[35] A. Giachanou, F. Crestani, Like It or Not, ACM Computing Surveys 49 (2) (2016) 1-41. doi:10.1145/2938640.
[36] A. Peldszus, M. Stede, From Argument Diagrams to Argumentation Mining in Texts, International Journal of Cognitive Informatics and Natural Intelligence 7 (1) (2013) 1-31. doi:10.4018/jcini.2013010101.
[37] M. Lippi, P. Torroni, Argumentation Mining, ACM Transactions on Internet Technology 16 (2) (2016) 1-25. doi:10.1145/2850417.
[38] E. Cabrio, S. Villata, Five Years of Argument Mining: a Data-driven Analysis, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, California, 2018, pp. 5427-5433. doi:10.24963/ijcai.2018/766.
[39] I. Habernal, I. Gurevych, Argumentation Mining in User-Generated Web Discourse, Computational Linguistics 43 (1) (2017) 125-179. doi:10.1162/COLI_a_00276.
[40] R. M. Palau, M.-F. Moens, Argumentation mining, in: Proceedings of the 12th International Conference on Artificial Intelligence and Law - ICAIL '09, ACM Press, New York, New York, USA, 2009, p. 98. doi:10.1145/1568234.1568246.
[41] Aristotle, G. A. Kennedy, On Rhetoric: A Theory of Civic Discourse (2006).
[42] D. Walton, F. Macagno, A classification system for argumentation schemes, Argument & Computation 6 (3) (2016) 219-245. doi:10.1080/19462166.2015.1123772.
[43] R. Whately, Elements of Logic, Harper & Brothers, New York, USA, 1857.
[44] M. C. Beardsley, Practical Logic, The Philosophical Quarterly. doi:10.2307/2216487.
[45] S. E. Toulmin, The Uses of Argument, Cambridge University Press, 1958. doi:10.1080/00048405985200191.
[46] T. Kuribayashi, P. Reisert, N. Inoue, K. Inui, Towards Exploiting Argumentative Context for Argumentative Relation Identification, in: Proceedings of the 24th Annual Conference of the Society of Language Processing (March 2018), 2018, pp. 284-287.
[47] A. Peldszus, M. Stede, Joint prediction in MST-style discourse parsing for argumentation mining, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2015, pp. 938-948. doi:10.18653/v1/D15-1110.
[48] J. B. Freeman, Dialectics and the macrostructure of arguments: a theory of argument structure, Foris Publications, 1991.
[49] H. Wachsmuth, J. Kiesel, B. Stein, Sentiment Flow - A General Model of Web Review Argumentation, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2015, pp. 601-611.
[50] W. C. Mann, Discourse structures for text generation, in: Proceedings of the 10th International Conference on Computational Linguistics, Association for Computational Linguistics, Morristown, NJ, USA, 1984, pp. 367-375. doi:10.3115/980431.980567.
[51] L. Carstens, F. Toni, Using Argumentation to Improve Classification in Natural Language Problems, ACM Transactions on Internet Technology 17 (3) (2017) 1-23. doi:10.1145/3017679.
[52] J. L. Pollock, Defeasible reasoning, Cognitive Science 11 (4) (1987) 481-518. doi:10.1016/S0364-0213(87)80017-4.
[53] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (2) (1995) 321-357. doi:10.1016/0004-3702(94)00041-X.
[54] P. Krause, S. Ambler, M. Elvang-Goransson, J. Fox, A Logic of Argumentation for Reasoning under Uncertainty, Computational Intelligence 11 (1) (1995) 113-131. doi:10.1111/j.1467-8640.1995.tb00025.x.
[55] J. L. Pollock, Defeasible reasoning with variable degrees of justification, Artificial Intelligence 133 (1-2) (2001) 233-282. doi:10.1016/S0004-3702(01)00145-X.
[56] S. D. Parsons, N. R. Jennings, Negotiation Through Argumentation - A Preliminary Report, in: 2nd Int. Conf. on Multi-Agent Systems, Japan, 1996, pp. 267-274.
[57] N. Jennings, P. Faratin, A. Lomuscio, S. Parsons, M. Wooldridge, C. Sierra, Automated Negotiation: Prospects, Methods and Challenges, Group Decision and Negotiation 10 (2) (2001) 199-215. doi:10.1023/A:1008746126376.
[58] F. Grasso, A. Cawsey, R. Jones, Dialectical argumentation to solve conflicts in advice giving: a case study in the promotion of healthy nutrition, International Journal of Human-Computer Studies 53 (6) (2000) 1077-1115. doi:10.1006/IJHC.2000.0429.
[59] C. Stab, T. Miller, I. Gurevych, Cross-topic Argument Mining from Heterogeneous Sources Using Attention-based Neural Networks, CoRR.
[60] A. Addawood, J. Schneider, M. Bashir, Stance Classification of Twitter Debates: The Encryption Debate as A Use Case, in: Proceedings of the 8th International Conference on Social Media & Society, ACM Press, New York, New York, USA, 2017, pp. 1-10. doi:10.1145/3097286.3097288.
[61] T. Bosc, E. Cabrio, S. Villata, Tweeties Squabbling: Positive and Negative Results in Applying Argument Mining on Social Media, in: Proceedings of the 6th International Conference on Computational Models of Argument, Potsdam, Germany, 2016, pp. 21-32.
[62] K. Deturck, D. Nouvel, F. Segond, ERTIM@MC2: Diversified Argumentative Tweets Retrieval, in: CLEF MC2 2018 Lab Overview, Avignon, France, 2018, pp. 302-308.
[63] M. Dusmanu, E. Cabrio, S. Villata, Argument Mining on Twitter: Arguments, Facts and Sources, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017, pp. 2317-2322.
[64] S. Sendi, C. Latiri, Opinion Argumentation based on Combined Information Retrieval and Topic Modeling, in: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, 2018.
[65] R. Dufour, R. Mickael, A. Delorme, D. Malinas, LIA@CLEF 2018: Mining Events Opinion Argumentation from Raw Unlabeled Twitter Data using Convolutional Neural Network, in: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, 2018.
[66] O. Cocarascu, F. Toni, Combining deep learning and argumentative reasoning for the analysis of social media textual content using small datasets, Computational Linguistics (2018) 1-37. doi:10.1162/coli_a_00338.
[67] W. Ma, W. Chao, Z. Luo, X. Jiang, Claim Retrieval in Twitter, in: Web Information Systems Engineering - WISE 2018, Dubai, United Arab Emirates, 2018, pp. 297-307. doi:10.1007/978-3-030-02922-7_20.
[68] S. M. Mohammad, P. Sobhani, S.
Kiritchenko, Stance and Sentiment inTweets, ACM Transactions on Internet Technology 17 (3) (2017) 1–23. doi:10.1145/3003433 .[69] G. Zarrella, A. Marsh, MITRE at SemEval-2016 Task 6: Transfer Learn-ing for Stance Detection, in: International Workshop on Semantic Evalu-ation (SemEval-2016), San Diego, California, 2016, p. 458–463.[70] P. Wei, J. Lin, W. Mao, Multi-Target Stance Detection via a DynamicMemory-Augmented Network, in: The 41st International ACM SIGIRConference on Research & Development in Information Retrieval - SIGIR’18, ACM Press, New York, New York, USA, 2018, pp. 1229–1232. doi:10.1145/3209978.3210145 .[71] J. Ebrahimi, D. Dou, D. Lowd, Weakly Supervised Tweet Stance Clas-sification by Relational Bootstrapping, in: Proceedings of the 2016 Con-ference on Empirical Methods in Natural Language Processing, Austin,Texas, 2016, p. 1012–1017.[72] M. Lai, V. Patti, G. Ruffo, P. Rosso, Stance Evolution and Twitter Inter-actions in an Italian Political Debate, in: NLDB 2018: Natural LanguageProcessing and Information Systems, Springer, Cham, Paris, France, 2018,pp. 15–27. doi:10.1007/978-3-319-91947-8{\_}2 .[73] K. Johnson, D. Goldwasser, Identifying Stance by Analyzing Political Dis-course on Twitter, in: Proceedings of the First Workshop on NLP andComputational Social Science, Association for Computational Linguistics,Stroudsburg, PA, USA, 2016, pp. 66–75. doi:10.18653/v1/W16-5609 .[74] L. Konstantinovskiy, O. Price, M. Babakar, A. Zubiaga, Towards Auto-mated Factchecking: Developing an Annotation Schema and Benchmarkfor Consistent Automated Claim Detection, in: EMNLP 2018: Confer-ence on Empirical Methods in Natural Language Processing, Brussels,Belgium, 2018.[75] J. Carletta, Assessing agreement on classification tasks: the kappa statis-tic, Computational Linguistics 22 (2) (1996) 249–254.[76] K. krippendorff, Measuring the Reliability of Qualitative Text Analy-sis Data, Quality & Quantity 38 (6) (2004) 787–800. doi:10.1007/s11135-004-8107-7 .[77] L. R. 
Dice, Measures of the Amount of Ecologic Association BetweenSpecies, Ecology 26 (3) (1945) 297–302. doi:10.2307/1932409 .[78] J. L. Fleiss, Measuring nominal scale agreement among many raters., Psy-chological Bulletin 76 (5) (1971) 378–382. doi:10.1037/h0031619 .79] T. Bosc, E. Cabrio, S. Villata, DART: a Dataset of Arguments and theirRelations on Twitter - Semantic Scholar, in: LREC, Portoroˇz, Slovenia,2016, pp. 1258–1263.[80] J. W. Pennebaker, M. E. Francis, Cognitive, Emotional, and LanguageProcesses in Disclosure, Cognition and Emotion 10 (6) (1996) 601–626. doi:10.1080/026999396380079 .[81] J. W. Pennebaker, Writing About Emotional Experiences as a Therapeu-tic Process, Psychological Science 8 (3) (1997) 162–166. doi:10.1111/j.1467-9280.1997.tb00403.x .[82] W. Guo, H. Li, H. Ji, M. Diab, Linking Tweets to News: A Frameworkto Enrich Short Text Data in Social Media, in: Proceedings of the 51stAnnual Meeting of the Association for Computational Linguists, Sofia,Bulgaria, 2013, p. 239–249.[83] S. Tan, Spot the lie: Detecting untruthful online opinion on twitter., Ph.D.thesis, Imperial College London (2017).[84] T. Goudas, C. Louizos, G. Petasis, V. Karkaletsis, Argument Extractionfrom News, Blogs, and Social Media, in: Artificial Intelligence: Methodsand Applications.SETN 2014., Springer, Cham, 2014, pp. 287–299. doi:10.1007/978-3-319-07064-3{\_}23 .[85] H. Roitman, S. Hummel, E. Rabinovich, B. Sznajder, N. Slonim, E. Aha-roni, On the Retrieval of Wikipedia Articles Containing Claims on Con-troversial Topics, in: Proceedings of the 25th International ConferenceCompanion on World Wide Web - WWW ’16 Companion, ACM Press,New York, New York, USA, 2016, pp. 991–996. doi:10.1145/2872518.2891115 .[86] J. Lawrence, M. Snaith, B. Konat, K. Budzynska, C. Reed, DebatingTechnology for Dialogical Argument, ACM Transactions on Internet Tech-nology 17 (3) (2017) 1–23. doi:10.1145/3007210 .[87] G. Morio, K. 
Fujita, Annotating Online Civic Discussion Threads for Ar-gument Mining, in: Proceedings of the IEEE/WIC/ACM InternationalConference on Web Intelligence 2018 (WI’18), IEEE, Santiago, Chile,2018, pp. 801–807. doi:10.1109/IIAI-AAI.2017.123 .[88] A. Galassi, M. Lippi, P. Torroni, Argumentative Link Prediction usingResidual Networks and Multi-Objective Learning, in: Proceedings of the5th Workshop on Argument Mining, Brussels, Belgium, 2018, pp. 1–10.[89] V. Eidelman, B. Grom, Argument Identification in Public Comments fromeRulemaking (5 2019). doi:10.1145/3322640.3326714 .90] M. Lai, M. Tambuscio, V. Patti, G. Ruffo, P. Rosso, Extracting GraphTopological Information and Users’ Opinion, Springer, Cham, 2017, pp.112–118. doi:10.1007/978-3-319-65813-1{\_}10 .[91] D. Maynard, I. Roberts, M. A. Greenwood, D. Rout, K. Bontcheva, Aframework for real-time semantic social media analysis, Web Semantics:Science, Services and Agents on the World Wide Web 44 (2017) 75–88. doi:10.1016/J.WEBSEM.2017.05.002 .[92] K. Cortis, A. Freitas, T. Daudert, M. H¨urlimann, M. Zarrouk, S. Hand-schuh, B. Davis, SemEval-2017 Task 5: Fine-Grained Sentiment Analysison Financial Microblogs and News, in: Proceedings of the 11th Inter-national Workshop on Semantic Evaluations (SemEval-2017), Vancouver,Canad, 2017, pp. 519–535.[93] C. Schulz, S. Eger, J. Daxenberger, T. Kahse, I. Gurevych, Multi-Task Learning for Argumentation Mining in Low-Resource Settings, in:NAACL HLT 2018, 2018, pp. 35–41.[94] J. Lawrence, J. Park, K. Budzynska, C. Cardie, B. Konat, C. Reed, Us-ing Argumentative Structure to Interpret Debates in Online DeliberativeDemocracy and eRulemaking, ACM Transactions on Internet Technology17 (3) (2017) 1–22. doi:10.1145/3032989 .[95] M. Grˇcar, D. Cherepnalkoski, I. Mozetiˇc, P. Kralj Novak, Stance and in-fluence of Twitter users regarding the Brexit referendum, ComputationalSocial Networks 4 (1) (2017) 6. doi:10.1186/s40649-017-0042-6 .[96] A. Lytos, T. Lagkas, P. Sarigiannidis, K. 
Bontcheva, Argumentation Min-ing: Exploiting Multiple Sources and Background Knowledge, in: 12thSouth East European Doctoral Student Conference DSC2018, 2018.[97] B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool Pub-lishers, 2012.[98] W. Wei, X. Zhang, X. Liu, W. Chen, T. Wang, pkudblab at SemEval-2016Task 6 : A Specific Convolutional Neural Network System for EffectiveStance Detection, in: Proceedings of SemEval-2016, San Diego, California,2016, pp. 384–388.[99] S. M. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry,SemEval-2016 Task 6: Detecting Stance in Tweets, in: InternationalWorkshop on Semantic Evaluation (SemEval-2016), San Diego, Califor-nia, 2016, pp. 31–41.[100] R. Rinott, L. Dankin, C. Alzate Perez, M. M. Khapra, E. Aharoni,N. Slonim, Show Me Your Evidence - an Automatic Method for ContextDependent Evidence Detection, in: Proceedings of the 2015 Conferenceon Empirical Methods in Natural Language Processing, Association foromputational Linguistics, Stroudsburg, PA, USA, 2015, pp. 440–450. doi:10.18653/v1/D15-1050 .[101] R. Levy, Y. Bilu, D. Hershcovich, E. Aharoni, N. Slonim, Context Depen-dent Claim Detection, in: COLING - International Committee on Com-putational Linguistics, Dublin, Ireland, 2014, pp. 1489–1500.[102] P. Stenetorp, S. Pyysalo, G. Topi, T. Ohta, S. Ananiadou, J. i. Tsujii,BRAT: a Web-based Tool for NLP-Assisted Text Annotation, in: Proceed-ings of the 13th Conference of the European Chapter of the Associationfor Computational Linguistics, Avignon, France, 2012, pp. 102–107.[103] P. Stenetorp, G. Topi´c, S. Pyysalo, T. Ohta, J.-D. Kim, J. Tsujii, BioNLPShared Task 2011: supporting resources, in: Proceedings of the BioNLPShared Task 2011 Workshop, Association for Computational Linguistics,Portland, Oregon, 2011, pp. 112–120.[104] R. Eckart De Castilho, E. Ujdricza-Maydt, S. M. Yimam, S. Hartmann,I. Gurevych, A. Frank, C. 
Biemann, A Web-based Tool for the IntegratedAnnotation of Semantic and Syntactic Structures, in: Proceedings of theWorkshop on Language Technology Resources and Tools for Digital Hu-manities (LT4DH), Osaka, Japan, 2016, pp. 76–84.[105] J. Sonntag, M. Stede, GraPAT: a Tool for Graph Annotations, in: Pro-ceedings of the Ninth International Conference on Language Resourcesand Evaluation (LREC-2014), Reykjavik, Iceland, 2014, pp. 4141–4151.[106] M. O’Donnell, RSTTool 2.4 – A Markup Tool for Rhetorical StructureTheory, in: Proceedings of the International Natural Language Genera-tion Conference (INLG’2000), Mitzpe Ramon, Israel, 2000, pp. 253 – 256.[107] H. Cunningham, V. Tablan, A. Roberts, K. Bontcheva, Getting MoreOut of Biomedical Documents with GATE’s Full Lifecycle Open SourceText Analytics, PLoS Computational Biology 9 (2) (2013) e1002854. doi:10.1371/journal.pcbi.1002854 .[108] K. Bontcheva, H. Cunningham, I. Roberts, A. Roberts, V. Tablan,N. Aswani, G. Gorrell, GATE Teamware: a web-based, collaborative textannotation framework, Language Resources and Evaluation 47 (4) (2013)1007–1029. doi:10.1007/s10579-013-9215-6 .[109] C. Kirschner, J. Eckle-Kohler, I. Gurevych, Linking the Thoughts: Analy-sis of Argumentation Structures in Scientific Publications, in: Proceedingsof the 2nd Workshop on Argumentation Mining, Denver, Colorado, 2015,pp. 1–11.[110] P. Saint-Dizier, Processing natural language arguments with the TextCoopplatform, Argument & Computation 3 (1) (2012) 49–82. doi:10.1080/19462166.2012.663539 .111] U. Eryiit, F. Samet, C. Etin, M. Yanık, T. Temel, TURKSENT: A Sen-timent Annotation Tool for Social Media, in: Proceedings of the 7th Lin-guistic Annotation Workshop & Interoperability with Discourse, Sofia,Bulgaria, 2013, pp. 131–134.[112] S. M. Yimam, I. Gurevych, R. Eckart De Castilho, C. 
Biemann, WebAnno:A Flexible, Web-based and Visually Supported System for Distributed An-notations, in: Proceedings of the 51st Annual Meeting of the Associationfor Computational Linguistics, Sofia, Bulgaria, 2013, pp. 1–6.[113] M. Lippi, P. Torroni, MARGOT: A web server for argumentation mining,Expert Systems with Applications 65 (2016) 292–303. doi:10.1016/J.ESWA.2016.08.050 .[114] Janier Mathilde, Lawrence John, Reed Chris, OVA+: An argument analy-sis interface, in: Proceedings of the 5th International Conference on Com-putational Models of Argument (COMMA’14)., 2014, p. 463–464.[115] Snaith Mark, Reed Chris, TOAST: Online ASPIC+ implementation, in:Proceedings of the Fourth International Conference on ComputationalModels of Argument (COMMA 2012), Vienna, Austria, 2012.[116] H. Wachsmuth, M. Potthast, K. Al Khatib, Y. Ajjour, J. Puschmann,J. Qu, J. Dorsch, V. Morari, J. Bevendorff, B. Stein, Building an Argu-ment Search Engine for the Web, in: Proceedings of the 4th Workshop onArgument Mining, Association for Computational Linguistics, Strouds-burg, PA, USA, 2017, pp. 49–59. doi:10.18653/v1/W17-5106 .[117] C. Stab, J. Daxenberger, C. Stahlhut, T. Miller, B. Schiller, C. Tauch-mann, S. Eger, I. Gurevych, ArgumenText: Searching for Arguments inHeterogeneous Sources, in: Proceedings of the 2018 Conference of theNorth American Chapter of the Association for Computational Linguis-tics: Demonstrations, Association for Computational Linguistics, Strouds-burg, PA, USA, 2018, pp. 21–25. doi:10.18653/v1/N18-5005 .[118] T. van Gelder, The rationale for RationaleTM, Law, Probability and Risk6 (1-4) (2007) 23–42. doi:10.1093/lpr/mgm032 .[119] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, D. Mc-Closky, The Stanford CoreNLP Natural Language Processing Toolkit,in: Proceedings of 52nd Annual Meeting of the Association for Com-putational Linguistics: System Demonstrations, Association for Com-putational Linguistics, Baltimore, Maryland, 2014, pp. 55–60. 
doi:10.3115/v1/P14-5010 .[120] C. Reed, G. Rowe, Araucaria: Software For Argument Analysis, Diagram-ming And Representation, International Journal on Artificial IntelligenceTools 13 (04) (2004) 961–979. doi:10.1142/S0218213004001922 .121] M. Snaith, J. Lawrence, C. Reed, Mixed Initiative Argument in PublicDeliberation, in: International Conference on Online Deliberation, Leeds,UK, 2010, pp. 2–13.[122] C. Reed, S. Wells, Dialogical Argument as an Interface to Complex De-bates, IEEE Intelligent Systems 22 (6) (2007) 60–65. doi:10.1109/MIS.2007.106 .[123] C. Reed, K. Budzynska, R. Duthie, M. Janier, B. Konat, J. Lawrence,A. Pease, M. Snaith, The Argument Web: an Online Ecosystem of Tools,Systems and Services for Argumentation, Philosophy & Technology 30 (2)(2017) 137–160. doi:10.1007/s13347-017-0260-8 .[124] F. Bex, M. Snaith, J. Lawrence, C. Reed, ArguBlogging: An applicationfor the Argument Web, Web Semantics: Science, Services and Agents onthe World Wide Web 25 (2014) 9–15. doi:10.1016/J.WEBSEM.2014.02.002 .[125] M. Lippi, P. Torroni, Context-independent claim detection for argumentmining, in: Proceedings of the 24th International Conference on Artifi-cial Intelligence, AAAI Press = The Association for the Advancement ofArtificial Intelligence Press, Buenos Aires, Argentina, 2015, pp. 185–191.[126] P. Sbarski, T. van Gelder, K. Marriott, D. Prager, A. Bulka, Visu-alizing Argument Structure, in: International Symposium on VisualComputing 2008: Visualizing Argument Structure, Springer, Berlin,Heidelberg, Las Vegas, NV, USA, 2008, pp. 129–138. doi:10.1007/978-3-540-89639-5{\_}13 .[127] K. Marriott, P. Sbarski, T. van Gelder, D. Prager, A. Bulka, Hi-Treesand Their Layout, IEEE Transactions on Visualization and ComputerGraphics 17 (3) (2011) 290–304. doi:10.1109/TVCG.2010.45 .[128] T. V. 
Gelder, Learning to reason: a Reason!-Able approach, in: Cog-nitive Science in Australia, 2000: Proceedings of the Fifth AustralasianCognitive Science Society Conference, Adelaide, 2000.[129] K. Al-Khatib, H. Wachsmuth, M. Hagen, J. K¨ohler, B. Stein, Cross-Domain Mining of Argumentative Text through Distant Supervision, in:Proceedings of NAACL-HLT 2016, San Diego, California, 2016, pp. 1395–1404.[130] H. Wachsmuth, B. Stein, A Universal Model for Discourse-Level Argumen-tation Analysis, ACM Transactions on Internet Technology 17 (3) (2017)1–24. doi:10.1145/2957757 .[131] M. Karimi, D. Jannach, M. Jugovac, News recommender systems – Surveyand roads ahead, Information Processing & Management 54 (6) (2018)1203–1227. doi:10.1016/J.IPM.2018.04.008 .132] F. Boltuˇzi´c, J. ˇSnajder, Identifying Prominent Arguments in Online De-bates Using Semantic Textual Similarity, in: 2nd Workshop on Argumen-tation Mining (ARG-MINING 2015), Denver, Colorado, USA, 2015, pp.110–115.[133] X. Duan, M. Liao, X. Zhao, W. Wu, P. Lv, An Unsupervised JointModel for Claim Detection, in: International Conference on Cogni-tive Systems and Signal Processing: Cognitive Systems and SignalProcessing, Springer, Singapore, 2019, pp. 197–209. doi:10.1007/978-981-13-7983-3{\_}18 .[134] E. Shnarch, C. Alzate, L. Dankin, M. Gleize, Y. Hou, L. Choshen,R. Aharonov, N. Slonim, Will it Blend? Blending Weak and Strong La-beled Data in a Neural Network for Argumentation Mining, in: Proceed-ings of the 56th Annual Meeting of the Association for ComputationalLinguistics (Volume 2: Short Papers), Vol. 2, Melbourne, Australia, 2018,pp. 599–605.[135] N. F. F. D. Silva, L. F. S. Coletta, E. R. Hruschka, A Survey and Compar-ative Study of Tweet Sentiment Analysis via Semi-Supervised Learning,ACM Computing Surveys 49 (1) (2016) 1–26. doi:10.1145/2932708 .[136] M.-Y. Day, Y.-D. 
Lin, Deep Learning for Sentiment Analysis on GooglePlay Consumer Review, in: 2017 IEEE International Conference on Infor-mation Reuse and Integration (IRI), IEEE, San Diego, CA, USA, 2017,pp. 382–388. doi:10.1109/IRI.2017.79doi:10.1109/IRI.2017.79