Linear Temporal Public Announcement Logic: a new perspective for reasoning the knowledge of multi-classifiers
Amirhoshang Hoseinpour Dehkordi, Majid Alizadeh, Ali Movaghar
Amirhoshang Hoseinpour Dehkordi
School of Computer Science, Institute for Research in Fundamental Sciences, Tehran, 19538-33511, Iran [email protected]
Majid Alizadeh ∗ School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, 14155-6455, Iran [email protected]
Ali Movaghar
Department of Computer Engineering, Sharif University of Technology, Tehran, 11155-9517, Iran [email protected]
September 10, 2020

ABSTRACT
Current applied intelligent systems have crucial shortcomings either in reasoning about the gathered knowledge or in representing comprehensive, integrated information. To address these limitations, we develop a formal transition system, applied on top of common artificial intelligence (AI) systems, to reason about their findings. The developed model combines Public Announcement Logic (PAL) and Linear Temporal Logic (LTL) in order to analyze both single-framed data and subsequent time-series data. To do this, the knowledge achieved by an AI-based system (i.e., a classifier) for an individual time-framed datum is first taken and then modeled in PAL. This yields a unified representation of knowledge and a smooth integration of the gathered and external experiences; the model can therefore receive a classifier's predefined, or any external, knowledge and assemble it in a unified manner. Alongside PAL, all timed knowledge changes are modeled using a temporal-logic transition system. Then, after natural-language questions are translated into temporal formulas, checking their satisfaction lets the model answer those questions. This interpretation integrates the information of the recognized input data, rules, and knowledge. Finally, we suggest a mechanism that reduces the number of investigated paths to improve performance, which results in a partial correction for an object-detection system.
Nowadays, artificial intelligence (AI) is inseparable from real-world applications. Besides classification, which is considered the classic application of AI-based algorithms, cognitive understanding has also gained significant attention more recently. Current works on visual understanding of both images (see [1], [2], and [3]) and videos (see [4], [5], and [6]) show the importance of data understanding. Although the approaches developed to date keep gaining accuracy, no approach has been formed that solves such problems entirely. Moreover, due to the probabilistic nature of most of these approaches, it seems impossible to attain complete accuracy in the near future.

∗ Corresponding author
PREPRINT - SEPTEMBER 10, 2020

To be more precise, following an assessment of current works, the process of data understanding can be divided into the following steps. The first and most popular one is object recognition from the input data, for which there are many solutions with fairly good performance [7]. The next step is knowledge extraction, which is a way to understand the knowledge acquired in the initial step; herein, objects are transformed into symbols, so that they can be shared and aggregated in a modal-logic approach [8]. Following that, a reasoning system must interpret guaranteed rules that can be fed into the system using predefined protocols (e.g., "a cat is an animal"). The final step is to collect a set of knowledge that allows the model to answer ordinary questions. The last three steps are discussed further in this study.
Although a comprehensive definition of AI has always been debated among scientists, reasoning is one of the most frequently mentioned and agreed-upon dimensions of AI (see [9], [10], [11], [12]). This dimension of AI is trending in most scientific fields, due to the availability of big, high-quality datasets [13] and the fast progress of natural language, image, and video processing systems (a video can be assumed to be an ordered set of images [14], [15]). This motivates the most recent works in this area to target visual data understanding and question answering [16], and natural-language queries [17], [18]. For instance, a question-answering (QA) reasoning system was developed by [19] as a combination of methods from previous works on QA reasoning systems; the authors further introduced an algorithm to select the best paths of commonsense knowledge, in order to obtain the whole inference required for QA. Although the method works well for specific applications, it is not a general solution to the QA problem. Following this method, and together with a systematic analysis of popular knowledge resources and knowledge-integration methods, [20] stated a modeling solution for "non-extractive commonsense QA", which is more accurate but still does not cover all such problems. More recently, the "CoLlision Events for Video REpresentation and Reasoning" (CLEVRER) method was developed by [16]; it is a reasoning system over video streams based on human intelligence. In this research, unlike most previous studies, "causal structures" are taken into account. The model can answer four main varieties of questions: descriptive (e.g., "what color"), explanatory ("what's responsible for"), predictive ("what will happen next"), and counterfactual ("what if"). Here, the features are extracted from the video frames with ResNet-50 [21], a well-known object-detection method.
This method (CLEVRER) is one of the most accepted methods in the field. Due to the characteristics of reasoning systems, together with the power of temporal logic in formalizing natural language into formulas, this kind of model is attracting growing interest in reasoning systems [22]. In a new perspective, Metric Temporal Logic (MTL), an extension of Linear Temporal Logic (LTL), was studied in [23] to handle stochastic state information. Considering modal logic in such problems leads to a new way of problem-solving, because this approach seems well suited to them.
This study aims to provide a flexible reasoning system that can render the knowledge achieved by AI-based systems. Here, we develop a model to be applied to existing classifiers, translating the information attained from their results. Besides, our model can handle multiple knowledge flows in a multi-classifier scenario. Moreover, in the developed model, the knowledge obtained by the classifiers is aggregated in a defined and unified format. Accordingly, the analysis can be performed for each collaboration scenario of such classifiers (i.e., the investigation of the knowledge of each agent, of a group of agents, of knowledge distributed between agents, etc.). Herein, the verification of the investigated formulas is proposed using the definition of "verified formula" provided by [8]. Furthermore, an approach for translating human language into the defined temporal formulas is addressed, to obtain a more expressive model and to confirm the adaptability of such a model to real-world applications. Finally, a strategy for determining the probability of each state transition is suggested; these probabilities let us apply the method in a more optimized fashion. This final adjustment leads us to a "correction for time-series data" strategy for time-series object-detection algorithms.
In this section, we define the problem to be solved in this study. There are many definitions of intelligent agents; one of the most common is to develop human-like reasoning from input data. There are many approaches by which objects, referred to as knowledge, can be extracted from input data, but there is still no generalized system that models inference from the gained knowledge. To reach that kind of intelligence from a classifier, we developed a model in the following steps:

1. Assume the classifier is an artificial neural network (ANN), the most common classification approach; such a classifier cannot guarantee full accuracy. These classifiers are based on statistical methods, and their accuracy depends on the architecture and the training set. Even assuming a good architecture, if new input data is fed into an ANN, the network cannot guarantee the correct output. To overcome this problem, [24] presented an approach to verify predefined properties in such models, which leads these models to verify the properties for each input. For the case where the property cannot be verified, [8] developed a multi-agent epistemic logic model to find all possible outcomes concerning that property, from those multiple classifiers. By applying this method, the outcome of the classifier is a set of possible outputs (i.e., possible worlds).

2. Generally, the set of knowledge is directly driven by the classifiers. This set contains the classes of the found objects; yet, from a real-world perspective, each detected object carries more information (i.e., ontology rules). To reflect this, an intelligent agent should understand the category of the object together with sub-categories that are fed in as input rules. In this model, for each possible class of an object, predefined inferences are extracted in a unified and formal manner.

3. The previously developed model was exclusively designed for single-framed data; therefore, time-series data could not be considered. To bring such data into consideration, we combine a temporal-logic transition system [25] with the developed epistemic logic model. This lets us extract all possible sequences (named execution paths) of knowledge represented in time-series data.

4. Here, input and output are in question-answering format. Accordingly, to get an outcome from the developed model, the model should provide an answer to the questions that are asked. To do this, first, all possible answers are extracted for the asked question. Then, using pre-existing approaches, natural-language statements are translated into temporal formulas.
Finally, each formula of every possible answer is investigated in the developed temporal model, and the satisfied formula reflects the answer to the question.

5. Verification in this transition system is defined by modifying the verification definitions in [24] and [8]. Accordingly, the status of the answer is determined (whether it is a verified answer, a possible answer, verified for a single classifier, or needs some information to become a possible/verified answer).

6. The state-space explosion, and consequently the numerous execution paths, can make the proposed method very time- and space-consuming for large models. To overcome this difficulty, an approach is developed to find the most probable path and investigate satisfaction on that path. Finding the most probable path can lead us to data-stream correction (if there are misclassifications in small fractions of the data) in semi-continuous data streams.

Here, we illustrate these steps on an example of a video stream and classifiers. In this case, the classifiers determine the objects appearing during the video, extracting all possible outcomes for each image, known as a data frame. Then, by collecting all input rules for the objects (e.g., "elephants" are "animals", so in each possible situation in which an "elephant" is detected we infer that an "animal" is also detected), each possible outcome is enhanced for each image in the video. Next, by placing the models of all possible situations for the images in a hierarchical sequence, the transition system is built. Consequently, we can investigate the satisfaction of the formulas created by translating the questions. After that, for the sake of performance, we calculate the probability of each transition relation and then find the most probable execution path. In Fig. 1, a model for two-framed data is illustrated, in which the most probable path is highlighted.
This path can lead us to correct the misclassifications of the classifiers.

In this section, we introduce a logical model for interpreting single-framed data; following each step of the model's development, an application of the model is presented. The epistemic model is based on an extension of PAL introduced in [8]. It is applied to extract knowledge from single-framed data (i.e., an image).

Let us first introduce the syntax and semantics of the logic. The syntax of the language PAL is as follows, in BNF:

φ ::= p | ¬φ | (φ ∧ φ) | K_i φ | D_A φ | [φ]φ

where p, as a propositional variable (atomic formula), is a pair of the form (x, c), in which x denotes the input data and c denotes the target class. We also note that K_i φ reads "the i-th classifier knows φ"; in other words, the i-th classifier is assured of the truth value of φ, so φ can be called a "robust" knowledge for the classifier i. Further, D_A φ reads "φ is distributed knowledge in a group A of classifiers"; the formula D_A φ holds exactly when the aggregated knowledge of the agents in the group A satisfies φ. Here, the formula φ is called a "robust" knowledge within

[Figure 1: A combination of a transition system and an epistemic model for two-framed data. The probability of each transition is also determined. The figure lists the Kripke models M1 and M2 of the two frames and the transition system TS built from them, with dummy initial and final states, together with the most probable execution path out of all 36 possible ones.]

group A. The robustness of the formula φ ensures that the formula is verified by those classifiers [8]. Finally, the formula [ψ]φ reads "after a correct announcement of ψ, φ will hold". This operator lets us investigate which knowledge is significantly missing; in other words, by adding which information the system could satisfy the inspected formulas.

A PAL Kripke model is a tuple M = (W, R_1, …, R_n, V), where W = {w_1, …, w_k} is a set of worlds (here, the set W represents all possible output results for the input data), and R_i ⊆ W × W is an equivalence relation between worlds for each classifier i in {1, …, n}; the intended meaning of the relation s R_i s′ is that the worlds s and s′ cannot be distinguished by the i-th classifier. Finally, V : W → P(Prop) is the evaluation function specifying the knowledge represented in each world, where
Prop is the set of all atoms. We extend the evaluation to all formulas as follows:

• M, w ⊨ p iff p ∈ V(w),
• M, w ⊨ ¬φ iff M, w ⊭ φ,
• M, w ⊨ φ ∧ ψ iff M, w ⊨ φ and M, w ⊨ ψ,
• M, w ⊨ K_i φ iff for all v ∈ R_i(w), M, v ⊨ φ,
• M, w ⊨ D_A φ iff for all v ∈ R_A(w), M, v ⊨ φ, where R_A := ⋂_{i∈A} R_i,
• M, w ⊨ [ψ]φ iff M, w ⊨ ψ implies M|ψ, w ⊨ φ.

The intended meaning of the satisfaction of an atomic formula (x, c) in a world w is "class c has appeared as the respective classifier's output class for the input data x". We also mention that, for formulas φ and ψ, φ ⊨ ψ means that for every model M and every point w ∈ M, M, w ⊨ φ implies M, w ⊨ ψ.

In the following example, we apply the model to an image as input.

Example. (Representation of an Image) Assume that we have an image as input, with some objects in the picture. The classifiers are trained on the same sets of objects as output classes. Moreover, it is assumed that the classifiers are multi-object detectors, which means they can detect more than one object class in the output if such objects exist. The next assumption in this section is that the rules relating objects and their features are given (e.g., a dog is an animal and has two eyes, four legs, etc.). There are many approaches for this purpose; for example, the rules or categories could be derived from object tags in a natural language processing (NLP) model (see [26], [27]).
Remark:
1. (Sub-features of a result class) Hereby, for each image we have a Kripke model in which each world contains possible outcomes from the image, as detected by the classifiers. So it can be written M, w ⊨ (x, c_1) ∧ ⋯ ∧ (x, c_k), with V(w) = {(x, c_1), …, (x, c_k)}. Moreover, each output class has further features that can be derived from it; for instance, a human knows that "cats" are animals, have two eyes, four legs, etc. More formally, for each output class (x, c_i) we can write (x, c_i) ⊨ (x, c_{i_1}) ∧ ⋯ ∧ (x, c_{i_m}), in which each (x, c_{i_j}) is concluded as a feature of (x, c_i). Note that this can be nested even further; in other words, (x, c_{i_j}) ⊨ (x, c_{i_{j_1}}) ∧ ⋯ ∧ (x, c_{i_{j_l}}), and so on.

2. (Verification of sub-features) Here, if there is only one possible world for the input image, the outcome is "robust", which means the output is verified with respect to the predefined property [8]. In the case that more than one possible world represents the image, it is still possible to find some other robust features. To introduce this kind of robustness, assume that two worlds w, w′ are possible for the input image, with M, w ⊨ (x, "Chair") ∧ (x, "Cat") and M, w′ ⊨ (x, "Chair") ∧ (x, "Dog"). We can then say M ⊨ (x, "Chair") ∧ (x, "Animal"), representing that the image is robust for "Chair" and "Animal"; this is derived from the facts (x, "Dog") ⊨ (x, "Animal") and (x, "Cat") ⊨ (x, "Animal"). This kind of verification is most useful in the process of reasoning about questions. For illustration, assume a question asks about the existence of "animals" in the image and does not care which animal is shown; we can then provide a verified answer about the existing animals. As an illustration, Algorithm 1 is developed to aggregate all available knowledge, including the classifiers' knowledge and the sub-features; its output is a Kripke model. Moreover, Algorithm 2 determines the satisfaction of PAL formulas.

Algorithm 1
The Single-Framed Knowledge Extraction (SFKE) function collects the knowledge produced by the classifiers, together with the aggregation of external subset rules; we also use the MASKS function developed in [8]. Let C be the set of classifiers, ζ the set of rules, x the single-framed input data, and η the defined confident neighborhood.

function SFKE(C, ζ, x, η)    ⊲ V can be derived from the model M
    M := MASKS(C, x, η)
    for all possible worlds w ∈ M do
        for all atomic formulas p ∈ w do
            Add all sub-rules of p defined in ζ to V(w)
    return M

Algorithm 2
The PAL Satisfaction function (PALS) investigates the satisfaction of PAL formulas. Let φ be the PAL formula, M the Kripke model, and w a world.

function PALS(φ, M, w)    ⊲ R_i and V can be derived from the model M
    if φ is an atomic formula then
        return φ ∈ V(w)    ⊲ true if φ ∈ V(w) and false otherwise
    if φ is of the form ¬ψ then
        return ¬PALS(ψ, M, w)
    if φ is of the form ψ_1 ∧ ψ_2 then
        return PALS(ψ_1, M, w) ∧ PALS(ψ_2, M, w)
    if φ is of the form K_i ψ then
        return ⋀_{w′ ∈ R_i(w)} PALS(ψ, M, w′)
    if φ is of the form D_A ψ then
        R_A := ⋂_{i ∈ A} R_i
        return ⋀_{w′ ∈ R_A(w)} PALS(ψ, M, w′)
    return false
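As an illustration of these semantics and of the PALS check of Algorithm 2, the following Python sketch encodes a small Kripke model and evaluates K_i and D_A formulas. The dict-based model layout, the tuple encoding of formulas, and the omission of the announcement operator [ψ]φ are simplifications assumed here; this is not the authors' implementation.

```python
# Minimal sketch of a PAL Kripke model and the PALS satisfaction check
# (Algorithm 2). Formulas are nested tuples such as ('K', 1, psi).
from functools import reduce

class KripkeModel:
    def __init__(self, worlds, relations, valuation):
        self.W = worlds          # set of world names
        self.R = relations       # classifier id -> {world: set of related worlds}
        self.V = valuation       # world -> set of atoms (x, c)

def pals(phi, model, w):
    """Satisfaction of a PAL formula phi at world w (announcements omitted)."""
    op = phi[0]
    if op == 'atom':             # ('atom', (x, c))
        return phi[1] in model.V[w]
    if op == 'not':
        return not pals(phi[1], model, w)
    if op == 'and':
        return pals(phi[1], model, w) and pals(phi[2], model, w)
    if op == 'K':                # ('K', i, psi): psi holds in all R_i-related worlds
        return all(pals(phi[2], model, v) for v in model.R[phi[1]][w])
    if op == 'D':                # ('D', A, psi): intersect R_i over the group A
        shared = reduce(set.intersection, [model.R[i][w] for i in phi[1]])
        return all(pals(phi[2], model, v) for v in shared)
    return False

# Two indistinguishable worlds for one image x: a chair with a cat or a dog.
V = {'w1': {('x', 'Chair'), ('x', 'Cat'), ('x', 'Animal')},
     'w2': {('x', 'Chair'), ('x', 'Dog'), ('x', 'Animal')}}
R = {1: {'w1': {'w1', 'w2'}, 'w2': {'w1', 'w2'}}}
M = KripkeModel({'w1', 'w2'}, R, V)

print(pals(('K', 1, ('atom', ('x', 'Animal'))), M, 'w1'))  # True: "Animal" is robust
print(pals(('K', 1, ('atom', ('x', 'Cat'))), M, 'w1'))     # False: "Cat" is only possible
```

The demo mirrors the Remark above: "Animal" is robust knowledge even though the classifier cannot distinguish "Cat" from "Dog".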
In this section, to extract all possible sequences (named execution paths) of the knowledge represented in time-series data, we introduce an LTL extension of PAL, given by the following grammar in BNF:

φ ::= p | ¬φ | (φ ∧ φ) | K_j φ | D_A φ | [φ]φ
Φ ::= φ | (¬Φ) | (Φ ∧ Φ) | X Φ | [Φ U Φ]

Here, the temporal operators are X Φ (in the neXt data frame, Φ must be true) and [Φ U Ψ] (Φ must remain true Until Ψ becomes true). To define the Kripke semantics of this logic, assume that M_0 = ({w_0}, R_{0,1}, …, R_{0,k}, V_0), M_1 = (W_1, R_{1,1}, …, R_{1,k}, V_1), …, M_{n−1} = (W_{n−1}, R_{n−1,1}, …, R_{n−1,k}, V_{n−1}), M_n = ({w_n}, R_{n,1}, …, R_{n,k}, V_n) are PAL models, where the W_i are mutually disjoint sets and V_0(w_0) = V_n(w_n) = ∅.

We build a new model TS = (S, R, s_0, s_−, →, L), known as a transition system, in which S = ⋃_{i=0}^{n} W_i is the set of states; R = {R_{i,j} | 0 ≤ i ≤ n, 1 ≤ j ≤ k}; s_0 = w_0 and s_− = w_n are the initial and final states; → = ⋃_{0 ≤ i < n} W_i × W_{i+1} is the transition relation; and the labeling function L : S → P(Prop), which assigns propositional letters to states, is defined by L(w_{i,j}) = V_i(w_{i,j}) for each w_{i,j} ∈ W_i.

An executive path π^{i…j}_{TS} in the model TS is a sequence of worlds w_i w_{i+1} … w_j with w_i → w_{i+1} → ⋯ → w_j, where w_k ∈ W_k for i ≤ k ≤ j. An executive path π^{i…j}_{TS} is called total, and denoted π_{TS}, if w_i = s_0 and w_j = s_−. The set of all total executive paths is denoted Π_{TS}. Next, we introduce the semantics for LTPAL as follows:

• TS, π^{i…j}_{TS} ⊨ φ iff TS, w_i ⊨ φ, for a PAL formula φ,
• TS, π^{i…j}_{TS} ⊨ X Φ iff TS, π^{i+1…j}_{TS} ⊨ Φ,
• TS, π^{i…j}_{TS} ⊨ [Φ_1 U Φ_2] iff there exists m, i ≤ m ≤ j, such that TS, π^{m…j}_{TS} ⊨ Φ_2, and for all k, i ≤ k < m, we have TS, π^{i…k}_{TS} ⊨ Φ_1.

We notice that in the first clause above we look at the transition system TS, over PAL formulas, simply as a Kripke model. Moreover, Algorithm 3 is developed to create a transition system from PAL models, and with the function TEMS developed in Algorithm 4, temporal formulas can be investigated. The other temporal operators can be derived from the two operators next (X Φ) and until ([Φ U Ψ]) in the following way:

• F Φ ≡ [⊤ U Φ]
• [Φ R Ψ] ≡ ¬[¬Φ U ¬Ψ]
• [Φ W Ψ] ≡ [Ψ R (Φ ∨ Ψ)]
• G Φ ≡ [⊥ R Φ]

The intended meaning of future (F Φ) is "eventually Φ becomes true"; of globally (G Φ), "Φ must remain true forever"; of release ([Φ R Ψ]), "Ψ remains true until and including the moment when Φ becomes true; if Φ never becomes true, Ψ remains true forever"; and of weak until ([Φ W Ψ]), "Φ has to remain true at least until Ψ holds; if Ψ never holds, Φ must remain true forever" [28], [29].

Easy interpretation of human language in modal logics (especially in LTL) is one of the most important strengths of such logics (see [30], [31], and [32]); this kind of interpretation can help robots react to human orders [33]. In order to convert a question into a formula, first of all we need to extract all possible answers to the question. Then, after converting each possible answer into an LTL formula using the existing, above-mentioned approaches, we investigate the satisfaction of each answer. The developed model leads us to PAL modifications of such LTL formulas. To explain this, let Φ(p_1, …, p_σ) be an LTL-translated formula and let each φ_i, 1 ≤ i ≤ σ, be a PAL formula. Then Φ(φ_1, …, φ_σ) is obtained from Φ by substituting each p_i with φ_i, for all 1 ≤ i ≤ σ, respectively.
For instance, for Φ(p_1, p_2) = G[p_1 U X p_2], φ_1 = ¬D_A¬p_1, and φ_2 = K_i p_2, we have Φ(φ_1, φ_2) = G[(¬D_A¬p_1) U X(K_i p_2)], which, for the transition system TS and the investigated execution path π_TS, means that "p_1 should always be a possible answer until a world is reached right after which p_2 is robust knowledge for the i-th agent". By defining a way for
Algorithm 3
The Time-Series Transition System (TSTS) function creates the transition system from the given input information. Let C be the set of classifiers, ζ the set of rules, X the time-series input data of size k, and η the predefined confident neighborhood.

function TSTS(C, ζ, X, η)    ⊲ each Kripke model has the form M = (W, R_1, …, R_n, V)
    S := {w_0}
    W′ := {w_0}
    R_1, …, R_n := {(w_0, w_0)}
    s_0 := w_0
    s_− := w_0
    → := ∅
    L(w_0) := ∅
    TS := (S, R_1, …, R_n, s_0, s_−, →, L)
    for all x in X do
        M := SFKE(C, ζ, x, η)    ⊲ Kripke model M = (W, R′_1, …, R′_n, V)
        S := S ∪ W
        R_1, …, R_n := R_1 ∪ R′_1, …, R_n ∪ R′_n
        for all w ∈ W and w′ ∈ W′ do
            → := → ∪ {(w′, w)}
        W′ := W
    W := {w_{k+1}}
    S := S ∪ W
    R_1, …, R_n := R_1 ∪ {(w_{k+1}, w_{k+1})}, …, R_n ∪ {(w_{k+1}, w_{k+1})}
    s_− := w_{k+1}
    for all w ∈ W and w′ ∈ W′ do
        → := → ∪ {(w′, w)}
    return TS
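The layered construction of Algorithm 3 can be sketched in Python as follows. The per-frame model extraction (SFKE/MASKS) is replaced here by precomputed world lists, and the world names and dict-based labeling are assumptions of this sketch, not part of the original algorithm.

```python
# Sketch of the Time-Series Transition System construction (Algorithm 3).
# Each frame contributes a set of possible worlds; consecutive layers are
# fully connected, and dummy worlds w0 / w_end bound the system.

def tsts(frame_worlds):
    """frame_worlds: list of per-frame world lists (stand-in for SFKE output),
    where each world is given as its set of true atoms."""
    s0, s_end = 'w0', 'w_end'
    states = {s0, s_end}
    trans = set()                        # the -> relation
    labels = {s0: set(), s_end: set()}   # L(w) = V_i(w); empty on the dummies
    prev_layer = [s0]
    for i, worlds in enumerate(frame_worlds):
        layer = []
        for j, atoms in enumerate(worlds):
            w = f'w{i+1}_{j+1}'
            states.add(w)
            labels[w] = set(atoms)
            layer.append(w)
        # connect every world of the previous layer to every world here
        trans.update((p, w) for p in prev_layer for w in layer)
        prev_layer = layer
    trans.update((p, s_end) for p in prev_layer)
    return states, trans, s0, s_end, labels

# Two frames: frame 1 has three possible worlds, frame 2 has two.
frames = [[{('x1', 'Cat')}, {('x1', 'Dog')}, {('x1', 'Chair')}],
          [{('x2', 'Cat')}, {('x2', 'Dog')}]]
S, T, s0, se, L = tsts(frames)
print(len(S))   # 7 states: 5 frame worlds plus the 2 dummies
print(len(T))   # 11 transitions: 3 + 3*2 + 2
```

The 11 transitions of this toy run match the fully connected layering used for Fig. 1: every world of a frame is linked to every world of the next frame.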
Algorithm 4
The TEMporal Satisfaction (TEMS) function investigates the satisfaction of LTL formulas. Let Φ be the LTL formula, TS the transition system, and π^{i…j}_{TS} an execution path.

function TEMS(Φ, TS, π^{i…j}_{TS})    ⊲ → can be derived from the model TS
    if i > j then
        return false
    if Φ is a PAL formula then
        return PALS(Φ, M_i, w_i)    ⊲ let w_i be the first world in the path π^{i…j}_{TS}
    if Φ is of the form ¬Ψ then
        return ¬TEMS(Ψ, TS, π^{i…j}_{TS})
    if Φ is of the form Ψ_1 ∧ Ψ_2 then
        return TEMS(Ψ_1, TS, π^{i…j}_{TS}) ∧ TEMS(Ψ_2, TS, π^{i…j}_{TS})
    if Φ is of the form X Ψ then
        return TEMS(Ψ, TS, π^{i+1…j}_{TS})
    if Φ is of the form [Ψ_1 U Ψ_2] then
        if TEMS(Ψ_2, TS, π^{i…j}_{TS}) then
            return true
        while TEMS(Ψ_1, TS, π^{i…j}_{TS}) do
            i := i + 1
            if TEMS(Ψ_2, TS, π^{i…j}_{TS}) then
                return true
    return false
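The TEMS recursion of Algorithm 4 can be sketched over a single execution path. Representing a path as a list of label sets and reducing PAL satisfaction to atomic membership are simplifications assumed for this illustration only.

```python
# Sketch of the TEMporal Satisfaction check (Algorithm 4) on one execution
# path. A path is a list of atom sets (the labels of its states); full PAL
# satisfaction is simplified here to atomic membership.

def tems(phi, path, i=0):
    """Evaluate an LTL formula phi on the suffix of path starting at index i."""
    if i >= len(path):                   # past the final state: fail, as in the algorithm
        return False
    op = phi[0]
    if op == 'top':                      # the constant true, used to derive F
        return True
    if op == 'atom':
        return phi[1] in path[i]
    if op == 'not':
        return not tems(phi[1], path, i)
    if op == 'and':
        return tems(phi[1], path, i) and tems(phi[2], path, i)
    if op == 'X':                        # next data frame
        return tems(phi[1], path, i + 1)
    if op == 'U':                        # [phi1 U phi2], bounded by the path end
        for m in range(i, len(path)):
            if tems(phi[2], path, m):
                return all(tems(phi[1], path, k) for k in range(i, m))
        return False
    return False

F = lambda psi: ('U', ('top',), psi)     # F psi == [T U psi]

path = [{'p'}, {'p'}, {'q'}]             # labels along one execution path
print(tems(('U', ('atom', 'p'), ('atom', 'q')), path))  # True
print(tems(F(('atom', 'q')), path))                     # True
print(tems(('X', ('atom', 'q')), path))                 # False: q only holds at frame 3
```

The until clause here follows the semantic definition directly rather than the while-loop formulation of the pseudocode; both agree on bounded paths.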
introducing robustness and possibility for a classifier, together with the definitions of verified and possible with respect to a group of classifiers, answers can be labeled to assure the questioner of "how reliable this answer is". This lets us design more appropriate systems for critical applications. Moreover, by capturing missing knowledge, it can be determined which knowledge, not discovered by the classifiers or provided externally, is mandatory for the system, and which part of the system can be manipulated to obtain the missing information. In the following, we introduce when a formula is verified, possible, or robust, or when information is missing, in our model:

1. The formula Φ(p_1, …, p_σ) is verified for the group A of classifiers exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊨ Φ(D_A p_1, …, D_A p_σ).

2. The formula Φ(p_1, …, p_σ) is a possible scenario for the group A of classifiers exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ(¬D_A¬p_1, …, ¬D_A¬p_σ).

3. The formula Φ(p_1, …, p_σ) is robust from the i-th agent's perspective exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊨ Φ(K_i p_1, …, K_i p_σ).

4. The formula Φ(p_1, …, p_σ) is possible from the i-th agent's perspective exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ(¬K_i¬p_1, …, ¬K_i¬p_σ).

5. A PAL formula ψ is called verified-missing information for a formula Φ(p_1, …, p_σ) in a group A of classifiers exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊭ Φ(D_A p_1, …, D_A p_σ), and for all π_{TS} ∈ Π_{TS} we have TS, π_{TS} ⊨ Φ([ψ]⟨D_A p_1⟩, …, [ψ]⟨D_A p_σ⟩).

6. A PAL formula ψ is called possible-missing information for a formula Φ(p_1, …, p_σ) in a group A of classifiers exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊭ Φ(¬D_A¬p_1, …, ¬D_A¬p_σ), and there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ([ψ]⟨¬D_A¬p_1⟩, …, [ψ]⟨¬D_A¬p_σ⟩).

7. A PAL formula ψ is called verified-missing information for a formula Φ(p_1, …, p_σ) of the i-th classifier exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊭ Φ(K_i p_1, …, K_i p_σ), and for all π_{TS} ∈ Π_{TS} we have TS, π_{TS} ⊨ Φ([ψ]⟨K_i p_1⟩, …, [ψ]⟨K_i p_σ⟩).

8. A PAL formula ψ is called possible-missing information for a formula Φ(p_1, …, p_σ) of the i-th classifier exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊭ Φ(¬K_i¬p_1, …, ¬K_i¬p_σ), and there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ([ψ]⟨¬K_i¬p_1⟩, …, [ψ]⟨¬K_i¬p_σ⟩).

Example. (Representation of a Video) In the previous section, a PAL logic for representing image data was illustrated. In this example, assume that the given finite video stream contains n − 1 images, and that the epistemic-information-extraction models for these images are M_1 = (W_1, R_{1,1}, …, R_{1,k}, V_1), …, M_{n−1} = (W_{n−1}, R_{n−1,1}, …, R_{n−1,k}, V_{n−1}). By adding two dummy models M_0 and M_n, the transition model TS = (S, R, s_0, s_−, →, L) can be created.

For clarification, we describe a real-world situation. Assume there is a medical clean room in an operating theater, monitored by a camera. The recorded video is fed into a set of classifiers A, which are fairly accurate. The room is cleaned with an air conditioner together with an ultraviolet (UV) LED. As is known, the UV LED would harm humans; thus, it must be turned off while a person is in the room. Conversely, the air conditioner should work while a person is in the room, and it should be stopped in an empty room to save electric power. So the first protocol is "shut down the UV while any person is monitored", and the second one is "shut down the air conditioner when no one is in the room". The classifiers should answer the vital question: "How robust are the protocols?"

Let p be "at least one human exists", q be "the UV is on", and r be "the air conditioner is on". The LTL formulas for the first question are:

• G(p ⇒ X¬q): the translation of "upon observation of a human, the UV should be shut down".

• For all π_{TS} ∈ Π_{TS}, TS, π
TS ⊨ G(D_A p ⇒ X⟨D_A¬q⟩): the satisfaction of this formula verifies the property within the group A.

• There exists π_{TS} ∈ Π_{TS} with TS, π_{TS} ⊨ G(¬D_A¬p ⇒ X⟨¬D_A q⟩): the satisfaction of this formula means there is a scenario in which the formula holds, i.e., this property is possible within the group A.

• For all π_{TS} ∈ Π_{TS}, TS, π_{TS} ⊨ G(K_i p ⇒ X⟨K_i¬q⟩): the satisfaction of this formula makes the property robust for the i-th classifier.

• There exists π_{TS} ∈ Π_{TS} with TS, π_{TS} ⊨ G(¬K_i¬p ⇒ X⟨¬K_i q⟩): the satisfaction of this formula means the i-th classifier considers this property possible.

More handily, for all π_{TS} ∈ Π_{TS}, TS, π_{TS} ⊨ G(¬D_A¬p ⇒ X⟨D_A¬q⟩) ensures that in any possible situation, the possible presence of anybody forces the UV LED to be turned off. Accordingly, this system lets us investigate whether the protocols are followed or not; besides, it can be observed which classifier caused any error of the system.

Traversing all paths in the developed transition system can be very time-consuming, depending on the size of the PAL models. Fortunately, the computation can be reduced under special conditions to reach an applicable approach. In particular, and for instance, the image frames of video streams are strongly correlated with the next and previous image frames [34]. In other words, the video stream can be assumed to be semi-continuous data, and the overall changes between frames should be minimal. It can also be assumed to be a Markov-chain, memoryless data flow, which means every data frame is related only to its previous data frame. Using this property of video streams together with recent NLP technology, the most probable path can be determined. Here, we calculate the changes via the labels of the objects represented in the frames, and find the most probable path among all possible ones. As mentioned before, the set of paths Π_{TS} in the transition system
TS are created by the worlds of the n+1 models M_0, . . . , M_n, in which relations hold between two successive worlds of models M_i and M_{i+1}. To find the similarities and differences between two worlds, we first extract the word-tags of the true atomic formulas in each world, and then find the probability of each relation by NLP approaches, through the tags' similarities in both worlds (see [35], [36], [37]). After finding the similarity score of each transition relation, the most probable path can be found in time complexity O(|→|) by a greedy algorithm.

To introduce the approach, assume the transition system is made of n+1 PAL models M_i = (W_i, R^1_i, . . . , R^n_i, V), 0 ≤ i ≤ n. Let maxπ^{0…(i−1)}_TS → w^j_i be the best-found path of length i+1, starting at the initial state s_0 and ending at the world w^j_i, and let P^→_{j,k} be the probability score between the worlds w^j_i and w^k_{i+1} (the similarity between the tags of w^j_i and w^k_{i+1}, calculated by an NLP algorithm, e.g., a Siamese network [37]). The score of the most probable path from s_0 to w^k_{i+1} is then

P(maxπ^{0…i}_TS → w^k_{i+1}) = max_j ( P(maxπ^{0…(i−1)}_TS → w^j_i) × P^→_{j,k} ),

and the path itself is maxπ^{0…i}_TS → w^k_{i+1} ≡ (maxπ^{0…(i−1)}_TS → w^j_i) → w^k_{i+1} for the maximizing j. By continuing this procedure up to the last model M_n, the candidate paths converge and the most probable path is found. This path can be applied in many real-world applications; for instance, a frequent mistake of video-stream object-detection systems is misclassification in some particular frames, due to an exceptional situation of the object. Employing this algorithm helps to get rid of such partial misclassifications. Moreover, the above-mentioned epistemic formulas can be evaluated on the most probable path only, reducing the computation time. Algorithm 5 is developed to find the most probable path in the transition system.

Algorithm 5
The Most Probable Path Extraction (MPPE) function collects the most probable path of a data stream. The similarity score of tags in line 3 can be computed by any arbitrary NLP algorithm.

Let TS be a transition system
function MPPE(TS)
    for all (w_i, w_{i+1}) ∈ → do
        P_{(w_i, w_{i+1})} := similarity score of tags in w_i and w_{i+1}
    for i ∈ {1, . . . , length(π_TS)} do
        for all w_i ∈ W_i do
            maxP_{w_i} := 0
            maxπ_{w_i} := ∅
            for all π^{0…i}_TS = maxπ^{0…(i−1)}_TS → w_i do
                if P_{π^{0…i}_TS} > maxP_{w_i} then
                    maxP_{w_i} := P_{π^{0…i}_TS}
                    maxπ_{w_i} := π^{0…i}_TS
    return maxπ_{w_n} for the w_n ∈ W_n with the maximal maxP_{w_n}

The statistical nature of the most common classification algorithms, together with their high accuracy in most applied domains (e.g., classification, NLP, etc.), has made them very popular; unfortunately, reasoning about their answers seems
so complicated. Accordingly, a logical point of view on classification problems, for extracting information and reasoning, should be noticed in critical cases. In the introduced approach, first, a straightforward way is presented to aggregate, in a unified manner, the knowledge acquired by any classification or given by any input rules. Next, by introducing an LTL-based transition system, the flow of knowledge in time-series data is modeled. Later, to answer any question, all possible answers are converted to LTL formulas. Such formulas can be investigated in the developed model, and the satisfied formulas are reported as answers. The reliability of the answers can also be investigated using the defined robustness and verification concepts. Moreover, it is possible to catch, for any level of reliability, the missing information that could lead the system to the answer. The captured missing information can reveal the classifiers' shortage of information and the cause of wrong decision-making. In the final part, an approach was developed to find the most probable path in order to reduce the state-space in larger models.
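To make the path-reduction procedure concrete, the following is a minimal Python sketch of a Viterbi-style version of the MPPE idea: layers of worlds (one layer per model M_i), transition scores from tag similarity, and a dynamic-programming extraction of the highest-scoring path. All names here (`extract_most_probable_path`, the toy Jaccard `tag_similarity`, the example worlds and tag sets) are illustrative assumptions, not the implementation used in this work.

```python
from typing import Dict, List, Set


def tag_similarity(tags_a: Set[str], tags_b: Set[str]) -> float:
    # Toy Jaccard score standing in for an NLP similarity measure
    # (e.g., a Siamese recurrent network [37] in the text above).
    if not tags_a and not tags_b:
        return 1.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def extract_most_probable_path(layers: List[Dict[str, Set[str]]]) -> List[str]:
    """layers[i] maps each world of model M_i to the word-tags of its
    true atomic formulas; returns the highest-scoring world sequence."""
    # best[w] = (score of the best path ending in world w, that path)
    best = {w: (1.0, [w]) for w in layers[0]}
    for i in range(1, len(layers)):
        new_best = {}
        for w, tags in layers[i].items():
            # Extend every best path of length i by the edge into w and
            # keep the highest-scoring extension (the greedy/DP step).
            score, path = max(
                ((s * tag_similarity(layers[i - 1][v], tags), p)
                 for v, (s, p) in best.items()),
                key=lambda sp: sp[0],
            )
            new_best[w] = (score, path + [w])
        best = new_best
    # Overall most probable path into the last layer.
    return max(best.values(), key=lambda sp: sp[0])[1]


# Hypothetical example: three frames; world "w2b" mislabels the person
# as a dog, so the extracted path avoids it.
layers = [
    {"w0": {"person", "uv_lamp"}},
    {"w1a": {"person", "uv_lamp"}, "w1b": {"chair"}},
    {"w2a": {"person", "uv_lamp"}, "w2b": {"dog"}},
]
print(extract_most_probable_path(layers))  # ['w0', 'w1a', 'w2a']
```

Under this sketch, epistemic formulas such as those above would then be checked only along the returned path, rather than along every path of the transition system.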
References

[1] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 2425–2433, 2015.
[2] Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. Visual7W: Grounded question answering in images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4995–5004, 2016.
[3] Drew A. Hudson and Christopher D. Manning. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6700–6709, 2019.
[4] Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, and Gunhee Kim. TGIF-QA: Toward spatio-temporal reasoning in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2758–2766, 2017.
[5] Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. MovieQA: Understanding stories in movies through question-answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4631–4640, 2016.
[6] Amir Zadeh, Michael Chan, Paul Pu Liang, Edmund Tong, and Louis-Philippe Morency. Social-IQ: A question answering benchmark for artificial social intelligence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8807–8817, 2019.
[7] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[8] Amirhoshang Hoseinpour Dehkordi, Majid Alizadeh, Ebrahim Ardeshir-Larijani, and Ali Movaghar. MASKS: A Multi-Artificial Neural Networks System's verification approach. July 2020.
[9] Patrick H. Winston. Artificial Intelligence, 3rd edition. Addison-Wesley, Reading, MA, 34:167–339, 1992.
[10] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 2002.
[11] Nils J. Nilsson. The Quest for Artificial Intelligence. Cambridge University Press, 2009.
[12] Robert P. Goldman and Eugene Charniak. Probabilistic text understanding. Statistics and Computing, 2(2):105–114, 1992.
[13] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2910, 2017.
[14] Chen Sun and Ram Nevatia. ACTIVE: Activity concept transitions in video event classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 913–920, 2013.
[15] Kevin Tang, Li Fei-Fei, and Daphne Koller. Learning latent temporal structure for complex event detection. In , pages 1250–1257. IEEE, 2012.
[16] Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum. CLEVRER: Collision events for video representation and reasoning. arXiv preprint arXiv:1910.01442, 2019.
[17] Jiyang Gao, Chen Sun, Zhenheng Yang, and Ram Nevatia. TALL: Temporal activity localization via language query. In Proceedings of the IEEE International Conference on Computer Vision, pages 5267–5275, 2017.
[18] Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, and Bryan Russell. Localizing moments in video with natural language. In Proceedings of the IEEE International Conference on Computer Vision, pages 5803–5812, 2017.
[19] Lisa Bauer, Yicheng Wang, and Mohit Bansal. Commonsense for generative multi-hop question answering tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4220–4230, Brussels, Belgium, October–November 2018. Association for Computational Linguistics.
[20] Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, and Alessandro Oltramari. Towards generalizable neuro-symbolic systems for commonsense question answering. arXiv preprint arXiv:1910.14087, 2019.
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[22] Brendan Fong, Alberto Speranzon, and David I. Spivak. Temporal landscapes: A graphical temporal logic for reasoning. arXiv preprint arXiv:1904.01081, 2019.
[23] Daniel de Leng and Fredrik Heintz. Approximate stream reasoning with metric temporal logic under uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2760–2767, 2019.
[24] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Rupak Majumdar and Viktor Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 of Lecture Notes in Computer Science, pages 3–29. Springer, 2017.
[25] Rob Gerth, Doron Peled, Moshe Y. Vardi, and Pierre Wolper. Simple on-the-fly automatic verification of linear temporal logic. In International Conference on Protocol Specification, Testing and Verification, pages 3–18. Springer, 1995.
[26] Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, and Luo Si. StructBERT: Incorporating language structures into pre-training for deep language understanding. arXiv preprint arXiv:1908.04577, 2019.
[27] Xin Rong. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738, 2014.
[28] Akash Hossain and François Laroussinie. From quantified CTL to QBF. arXiv preprint arXiv:1906.10005, 2019.
[29] Marta Kwiatkowska, Alessio Lomuscio, and Hongyang Qu. Parallel model checking for temporal epistemic logic. In Proceedings of the 2010 Conference on ECAI 2010: 19th European Conference on Artificial Intelligence, pages 543–548, 2010.
[30] Hadas Kress-Gazit, Georgios E. Fainekos, and George J. Pappas. Translating structured English to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008.
[31] Juraj Dzifcak, Matthias Scheutz, Chitta Baral, and Paul Schermerhorn. What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. In , pages 4163–4168. IEEE, 2009.
[32] Rani Nelken and Nissim Francez. Automatic translation of natural language system specifications into temporal logic. In International Conference on Computer Aided Verification, pages 360–371. Springer, 1996.
[33] Cameron Finucane, Gangyuan Jing, and Hadas Kress-Gazit. LTLMoP: Experimenting with language, temporal logic and robot control. In , pages 1988–1993. IEEE, 2010.
[34] Jawadul H. Bappy, Sujoy Paul, Ertem Tuncel, and Amit K. Roy-Chowdhury. Exploiting typicality for selecting informative and anomalous samples in videos. IEEE Transactions on Image Processing, 28(10):5214–5226, 2019.
[35] Wei Yang, Wei Lu, and Vincent W. Zheng. A simple regularization-based algorithm for learning cross-domain word embeddings. arXiv preprint arXiv:1902.00184, 2019.
[36] Atish Pawar and Vijay Mago. Calculating the similarity between words and sentences using a lexical database and corpus statistics. arXiv preprint arXiv:1802.05667, 2018.
[37] Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.