Linear Temporal Public Announcement Logic: a new perspective for reasoning the knowledge of multi-classifiers
Amirhoshang Hoseinpour Dehkordi, Majid Alizadeh, Ali Movaghar
Amirhoshang Hoseinpour Dehkordi
School of Computer Science, Institute for Research in Fundamental Sciences, Tehran, 19538-33511, Iran [email protected]
Majid Alizadeh ∗ School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, 14155-6455, Iran [email protected]
Ali Movaghar
Department of Computer Engineering, Sharif University of Technology, Tehran, 11155-9517, Iran [email protected]
September 10, 2020

ABSTRACT
Current applied intelligent systems have crucial shortcomings either in reasoning about the gathered knowledge or in representing comprehensive, integrated information. To address these limitations, we develop a formal transition system, applied on top of common artificial intelligence (AI) systems, to reason about their findings. The developed model combines Public Announcement Logic (PAL) and Linear Temporal Logic (LTL) in order to analyze both single-framed data and subsequent time-series data. To do this, the knowledge achieved by an AI-based system (i.e., a classifier) for an individual time-framed datum is first taken and then modeled in PAL. This yields a unified representation of knowledge and a smooth integration of the gathered and external experiences; the model can therefore receive a classifier's predefined, or any external, knowledge and assemble it in a unified manner. Alongside PAL, all timed knowledge changes are modeled using a temporal-logic transition system. Then, after natural-language questions are translated into temporal formulas, checking their satisfaction lets the model answer those questions. This interpretation integrates the information of the recognized input data, rules, and knowledge. Finally, we suggest a mechanism that reduces the number of investigated paths to improve performance, which results in a partial correction for an object-detection system.
Nowadays, artificial intelligence (AI) is inseparable from real-world applications. Besides classification, which is considered the classic application of AI-based algorithms, cognitive understanding has also gained significant attention more recently. Current works on visual understanding of both images (see [1], [2], and [3]) and videos (see [4], [5], and [6]) show the importance of data understanding. Although the approaches developed to date keep gaining accuracy, no approach has been formed that solves such problems entirely. Moreover, due to the probabilistic nature of most of these approaches, it seems impossible to attain complete accuracy in the near future.

∗ Corresponding author
PREPRINT - SEPTEMBER 10, 2020

To be more precise, following an assessment of current works, the process of data understanding can be divided into the following steps. The first and most popular one is object recognition from the input data, for which there are many solutions with fairly good performance [7]. The next step is knowledge extraction, which is a way to understand the knowledge acquired in the initial step; herein, objects are transformed into symbols, so that they can be shared and aggregated in a modal-logic approach [8]. Following that, a reasoning system must interpret guaranteed rules that can be fed into the system using predefined protocols (e.g., "a cat is an animal"). The final step is to collect a set of knowledge that allows the model to answer ordinary questions. The last three steps are discussed further in this study.
Although a comprehensive definition of AI has always been debated among scientists, reasoning is one of the most frequently mentioned and agreed-upon dimensions of AI (see [9], [10], [11], [12]). This dimension of AI is trending in most scientific fields, due to the availability of big, high-quality datasets [13] and the fast progress of natural language, image, and video processing systems (a video can be assumed to be an ordered set of images [14], [15]). This motivates the most recent works in this area to target visual data understanding and question answering [16], and natural-language queries [17], [18]. For instance, a question-answering (QA) reasoning system was developed by [19] as a combination of methods from previous works on QA reasoning systems; the authors further introduced an algorithm to select the best paths of commonsense knowledge, in order to obtain the whole inference required for QA. Although the method works well for specific applications, it is not a general solution to the QA problem. Following this method, and together with a systematic analysis of popular knowledge resources and knowledge-integration methods, [20] stated a modeling solution for "non-extractive commonsense QA", which is more accurate but still does not cover all such problems. More recently, the "CoLlision Events for Video REpresentation and Reasoning" (CLEVRER) method was developed by [16]; it is a reasoning system over video streams based on human intelligence. In this research, unlike most previous studies, "causal structures" are taken into account. The model can answer four main varieties of questions: descriptive (e.g., "what color"), explanatory ("what's responsible for"), predictive ("what will happen next"), and counterfactual ("what if"). Here, the features are extracted from the video frames with ResNet-50 [21], a well-known object-detection method.
This method (CLEVRER) is one of the most accepted methods in the field. Due to the characteristics of reasoning systems, together with the power of temporal logic in formalizing natural language into formulas, this kind of model is attracting growing interest in reasoning systems [22]. In a new perspective, Metric Temporal Logic (MTL), an extension of Linear Temporal Logic (LTL), was studied in [23] to handle stochastic state information. Considering modal logic in such problems leads to a new way of problem-solving, because this approach seems well suited to them.
This study aims to provide a flexible reasoning system that can render the knowledge achieved by AI-based systems. Here, we develop a model to be applied to existing classifiers, translating the information attained from their results. Besides, our model can handle multiple knowledge flows in a multi-classifier scenario. Moreover, in the developed model, the knowledge obtained by the classifiers is aggregated in a defined and unified format. Accordingly, the analysis can be performed for each collaboration scenario of such classifiers (i.e., the investigation of the knowledge of each agent, of a group of agents, of knowledge distributed between agents, etc.). Herein, the verification of the investigated formulas is proposed using the definition of "verified formula" provided by [8]. Furthermore, an approach for translating human language into the defined temporal formulas is addressed, to obtain a more expressive model and to confirm the adaptability of such a model to real-world applications. Finally, a strategy for determining the probability of each state transition is suggested; these probabilities let us apply the method in a more optimized fashion. This final adjustment leads us to a "correction for time-series data" strategy for time-series object-detection algorithms.
In this section, we define the problem to be solved in this study. There are many definitions of intelligent agents; one of the most common is to develop human-like reasoning from input data. There are many approaches by which objects, referred to as knowledge, can be extracted from input data, but there is still no generalized system that models inference from the gained knowledge. To reach that kind of intelligence from a classifier, we developed a model in the following steps:

1. Assume the classifier is an artificial neural network (ANN), the most common classification approach; such a classifier cannot guarantee full accuracy. These classifiers are based on statistical methods, and their accuracy depends on the architecture and the training set. Even assuming a good architecture, if new input data is fed into an ANN, the network cannot guarantee the correct output. To overcome this problem, [24] presented an approach to verify predefined properties in such models, which leads these models to verify the properties for each input. For the case where the property cannot be verified, [8] developed a multi-agent epistemic logic model to find all possible outcomes concerning that property, from those multiple classifiers. By applying this method, the outcome of the classifier is a set of possible outputs (i.e., possible worlds).

2. Generally, the set of knowledge is directly driven by the classifiers. This set contains the classes of the found objects; yet, from a real-world perspective, each detected object carries more information (i.e., ontology rules). To reflect this, an intelligent agent should understand the category of the object together with sub-categories that are fed in as input rules. In this model, for each possible class of an object, predefined inferences are extracted in a unified and formal manner.

3. The previously developed model was exclusively designed for single-framed data; therefore, time-series data could not be considered. To bring such data into consideration, we combine a temporal-logic transition system [25] with the developed epistemic logic model. This lets us extract all possible sequences (named execution paths) of knowledge represented in time-series data.

4. Here, input and output are in question-answering format. Accordingly, to get an outcome from the developed model, the model should provide an answer to the questions that are asked. To do this, first, all possible answers are extracted for the asked question. Then, using pre-existing approaches, natural-language statements are translated into temporal formulas.
Finally, each formula of every possible answer is investigated in the developed temporal model, and the satisfied formula reflects the answer to the question.

5. Verification in this transition system is defined by modifying the verification definitions in [24] and [8]. Accordingly, the status of the answer is determined (whether it is a verified answer, a possible answer, verified for a single classifier, or needs some information to become a possible/verified answer).

6. The state-space explosion, and consequently the numerous execution paths, can make the proposed method very time- and space-consuming for large models. To overcome this difficulty, an approach is developed to find the most probable path and investigate satisfaction on that path. Finding the most probable path can lead us to data-stream correction (if there are misclassifications in small fractions of the data) in semi-continuous data streams.

Here, we illustrate these steps on an example of a video stream and classifiers. In this case, the classifiers determine the objects appearing during the video, extracting all possible outcomes for each image, known as a data frame. Then, by collecting all input rules for the objects (e.g., "elephants" are "animals", so in each possible situation in which an "elephant" is detected we infer that an "animal" is also detected), each possible outcome is enhanced for each image in the video. Next, by placing the models of all possible situations for the images in a hierarchical sequence, the transition system is built. Consequently, we can investigate the satisfaction of the formulas created by translating the questions. After that, for the sake of performance, we calculate the probability of each transition relation and then find the most probable execution path. In Fig. 1, a model for two-framed data is illustrated, in which the most probable path is highlighted.
This path can lead us to correct the misclassifications of the classifiers.

In this section, we introduce a logical model for interpreting single-framed data; following each step of the model's development, an application of the model is presented. The epistemic model is based on an extension of PAL introduced in [8]. It is applied to extract knowledge from single-framed data (i.e., an image).

Let us first introduce the syntax and semantics of the logic. The syntax of the language PAL is as follows, in BNF:

φ ::= p | ¬φ | (φ ∧ φ) | K_i φ | D_A φ | [φ]φ

where p, as a propositional variable (atomic formula), is a pair of the form (x, c), in which x denotes the input data and c denotes the target class. We also note that K_i φ reads "the i-th classifier knows φ"; in other words, the i-th classifier is assured of the truth value of φ, so φ can be called a "robust" knowledge for the classifier i. Further, D_A φ reads "φ is distributed knowledge in a group A of classifiers"; the formula D_A φ holds exactly when the aggregated knowledge of the agents in the group A satisfies φ. Here, the formula φ is called a "robust" knowledge within

[Figure 1: A combination of a transition system and an epistemic model for two-framed data. The probability of each transition is also determined. The figure lists the Kripke models M1 and M2 of the two frames and the transition system TS built from them, with dummy initial and final states, together with the most probable execution path out of all 36 possible ones.]

group A. The robustness of the formula φ ensures that the formula is verified by those classifiers [8]. Finally, the formula [ψ]φ reads "after a correct announcement of ψ, φ will hold". This operator lets us investigate which knowledge is significantly missing; in other words, by adding which information the system could satisfy the inspected formulas.

A PAL Kripke model is a tuple M = (W, R_1, …, R_n, V), where W = {w_1, …, w_k} is a set of worlds (here, the set W represents all possible output results for the input data), and R_i ⊆ W × W is an equivalence relation between worlds for each classifier i in {1, …, n}; the intended meaning of the relation s R_i s′ is that the worlds s and s′ cannot be distinguished by the i-th classifier. Finally, V : W → P(Prop) is the evaluation function specifying the knowledge represented in each world, where
Prop is the set of all atoms. We extend the evaluation to all formulas as follows:

• M, w ⊨ p iff p ∈ V(w),
• M, w ⊨ ¬φ iff M, w ⊭ φ,
• M, w ⊨ φ ∧ ψ iff M, w ⊨ φ and M, w ⊨ ψ,
• M, w ⊨ K_i φ iff for all v ∈ R_i(w), M, v ⊨ φ,
• M, w ⊨ D_A φ iff for all v ∈ R_A(w), M, v ⊨ φ, where R_A := ⋂_{i∈A} R_i,
• M, w ⊨ [ψ]φ iff M, w ⊨ ψ implies M|ψ, w ⊨ φ.

The intended meaning of the satisfaction of an atomic formula (x, c) in a world w is "class c has appeared as the respective classifier's output class for the input data x". We also mention that, for formulas φ and ψ, φ ⊨ ψ means that for every model M and every point w ∈ M, M, w ⊨ φ implies M, w ⊨ ψ.

In the following example, we apply the model to an image as input.

Example. (Representation of an Image) Assume that we have an image as input, with some objects in the picture. The classifiers are trained on the same sets of objects as output classes. Moreover, it is assumed that the classifiers are multi-object detectors, which means they can detect more than one object class in the output if such objects exist. The next assumption in this section is that the rules relating objects and their features are given (e.g., a dog is an animal and has two eyes, four legs, etc.). There are many approaches for this purpose; for example, the rules or categories could be derived from object tags in a natural language processing (NLP) model (see [26], [27]).
Remark:
1. (Sub-features of a result class) Hereby, for each image we have a Kripke model in which each world contains possible outcomes from the image, as detected by the classifiers. So it can be written M, w ⊨ (x, c_1) ∧ ⋯ ∧ (x, c_k), with V(w) = {(x, c_1), …, (x, c_k)}. Moreover, each output class has further features that can be derived from it; for instance, a human knows that "cats" are animals, have two eyes, four legs, etc. More formally, for each output class (x, c_i) we can write (x, c_i) ⊨ (x, c_{i_1}) ∧ ⋯ ∧ (x, c_{i_m}), in which each (x, c_{i_j}) is concluded as a feature of (x, c_i). Note that this can be nested even further; in other words, (x, c_{i_j}) ⊨ (x, c_{i_{j_1}}) ∧ ⋯ ∧ (x, c_{i_{j_l}}), and so on.

2. (Verification of sub-features) Here, if there is only one possible world for the input image, the outcome is "robust", which means the output is verified with respect to the predefined property [8]. In the case that more than one possible world represents the image, it is still possible to find some other robust features. To introduce this kind of robustness, assume that two worlds w, w′ are possible for the input image, with M, w ⊨ (x, "Chair") ∧ (x, "Cat") and M, w′ ⊨ (x, "Chair") ∧ (x, "Dog"). We can then say M ⊨ (x, "Chair") ∧ (x, "Animal"), representing that the image is robust for "Chair" and "Animal"; this is derived from the facts (x, "Dog") ⊨ (x, "Animal") and (x, "Cat") ⊨ (x, "Animal"). This kind of verification is most useful in the process of reasoning about questions. For illustration, assume a question asks about the existence of "animals" in the image and does not care which animal is shown; we can then provide a verified answer about the existing animals. As an illustration, Algorithm 1 is developed to aggregate all available knowledge, including the classifiers' knowledge and the sub-features; its output is a Kripke model. Moreover, Algorithm 2 determines the satisfaction of PAL formulas.

Algorithm 1
The Single-Framed Knowledge Extraction (SFKE) function collects the knowledge produced by the classifiers, together with the aggregation of external subset rules; we also use the MASKS function developed in [8]. Let C be the set of classifiers, ζ the set of rules, x the single-framed input data, and η the defined confident neighborhood.

function SFKE(C, ζ, x, η)    ⊲ V can be derived from the model M
    M := MASKS(C, x, η)
    for all possible worlds w ∈ M do
        for all atomic formulas p ∈ w do
            Add all sub-rules of p defined in ζ to V(w)
    return M

Algorithm 2
The PAL Satisfaction function (PALS) investigates the satisfaction of PAL formulas. Let φ be the PAL formula, M the Kripke model, and w a world.

function PALS(φ, M, w)    ⊲ R_i and V can be derived from the model M
    if φ is an atomic formula then
        return φ ∈ V(w)    ⊲ true if φ ∈ V(w) and false otherwise
    if φ is of the form ¬ψ then
        return ¬PALS(ψ, M, w)
    if φ is of the form ψ_1 ∧ ψ_2 then
        return PALS(ψ_1, M, w) ∧ PALS(ψ_2, M, w)
    if φ is of the form K_i ψ then
        return ⋀_{w′ ∈ R_i(w)} PALS(ψ, M, w′)
    if φ is of the form D_A ψ then
        R_A := ⋂_{i ∈ A} R_i
        return ⋀_{w′ ∈ R_A(w)} PALS(ψ, M, w′)
    return false
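As an illustration of these semantics and of the PALS check of Algorithm 2, the following Python sketch encodes a small Kripke model and evaluates K_i and D_A formulas. The dict-based model layout, the tuple encoding of formulas, and the omission of the announcement operator [ψ]φ are simplifications assumed here; this is not the authors' implementation.

```python
# Minimal sketch of a PAL Kripke model and the PALS satisfaction check
# (Algorithm 2). Formulas are nested tuples such as ('K', 1, psi).
from functools import reduce

class KripkeModel:
    def __init__(self, worlds, relations, valuation):
        self.W = worlds          # set of world names
        self.R = relations       # classifier id -> {world: set of related worlds}
        self.V = valuation       # world -> set of atoms (x, c)

def pals(phi, model, w):
    """Satisfaction of a PAL formula phi at world w (announcements omitted)."""
    op = phi[0]
    if op == 'atom':             # ('atom', (x, c))
        return phi[1] in model.V[w]
    if op == 'not':
        return not pals(phi[1], model, w)
    if op == 'and':
        return pals(phi[1], model, w) and pals(phi[2], model, w)
    if op == 'K':                # ('K', i, psi): psi holds in all R_i-related worlds
        return all(pals(phi[2], model, v) for v in model.R[phi[1]][w])
    if op == 'D':                # ('D', A, psi): intersect R_i over the group A
        shared = reduce(set.intersection, [model.R[i][w] for i in phi[1]])
        return all(pals(phi[2], model, v) for v in shared)
    return False

# Two indistinguishable worlds for one image x: a chair with a cat or a dog.
V = {'w1': {('x', 'Chair'), ('x', 'Cat'), ('x', 'Animal')},
     'w2': {('x', 'Chair'), ('x', 'Dog'), ('x', 'Animal')}}
R = {1: {'w1': {'w1', 'w2'}, 'w2': {'w1', 'w2'}}}
M = KripkeModel({'w1', 'w2'}, R, V)

print(pals(('K', 1, ('atom', ('x', 'Animal'))), M, 'w1'))  # True: "Animal" is robust
print(pals(('K', 1, ('atom', ('x', 'Cat'))), M, 'w1'))     # False: "Cat" is only possible
```

The demo mirrors the Remark above: "Animal" is robust knowledge even though the classifier cannot distinguish "Cat" from "Dog".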
In this section, to extract all possible sequences (named execution paths) of the knowledge represented in time-series data, we introduce an LTL extension of PAL, given by the following grammar in BNF:

φ ::= p | ¬φ | (φ ∧ φ) | K_j φ | D_A φ | [φ]φ
Φ ::= φ | (¬Φ) | (Φ ∧ Φ) | X Φ | [Φ U Φ]

Here, the temporal operators are X Φ (in the neXt data frame, Φ must be true) and [Φ U Ψ] (Φ must remain true Until Ψ becomes true). To define the Kripke semantics of this logic, assume that M_0 = ({w_0}, R_{0,1}, …, R_{0,k}, V_0), M_1 = (W_1, R_{1,1}, …, R_{1,k}, V_1), …, M_{n−1} = (W_{n−1}, R_{n−1,1}, …, R_{n−1,k}, V_{n−1}), M_n = ({w_n}, R_{n,1}, …, R_{n,k}, V_n) are PAL models, where the W_i are mutually disjoint sets and V_0(w_0) = V_n(w_n) = ∅.

We build a new model TS = (S, R, s_0, s_−, →, L), known as a transition system, in which S = ⋃_{i=0}^{n} W_i is the set of states; R = {R_{i,j} | 0 ≤ i ≤ n, 1 ≤ j ≤ k}; s_0 = w_0 and s_− = w_n are the initial and final states; → = ⋃_{0 ≤ i < n} W_i × W_{i+1} is the transition relation; and the labeling function L : S → P(Prop), which assigns propositional letters to states, is defined by L(w_{i,j}) = V_i(w_{i,j}) for each w_{i,j} ∈ W_i.

An executive path π^{i…j}_{TS} in the model TS is a sequence of worlds w_i w_{i+1} … w_j with w_i → w_{i+1} → ⋯ → w_j, where w_k ∈ W_k for i ≤ k ≤ j. An executive path π^{i…j}_{TS} is called total, and denoted π_{TS}, if w_i = s_0 and w_j = s_−. The set of all total executive paths is denoted Π_{TS}. Next, we introduce the semantics for LTPAL as follows:

• TS, π^{i…j}_{TS} ⊨ φ iff TS, w_i ⊨ φ, for a PAL formula φ,
• TS, π^{i…j}_{TS} ⊨ X Φ iff TS, π^{i+1…j}_{TS} ⊨ Φ,
• TS, π^{i…j}_{TS} ⊨ [Φ_1 U Φ_2] iff there exists m, i ≤ m ≤ j, such that TS, π^{m…j}_{TS} ⊨ Φ_2, and for all k, i ≤ k < m, we have TS, π^{i…k}_{TS} ⊨ Φ_1.

We notice that in the first clause above we look at the transition system TS, over PAL formulas, simply as a Kripke model. Moreover, Algorithm 3 is developed to create a transition system from PAL models, and with the function TEMS developed in Algorithm 4, temporal formulas can be investigated. The other temporal operators can be derived from the two operators next (X Φ) and until ([Φ U Ψ]) in the following way:

• F Φ ≡ [⊤ U Φ]
• [Φ R Ψ] ≡ ¬[¬Φ U ¬Ψ]
• [Φ W Ψ] ≡ [Ψ R (Φ ∨ Ψ)]
• G Φ ≡ [⊥ R Φ]

The intended meaning of future (F Φ) is "eventually Φ becomes true"; of globally (G Φ), "Φ must remain true forever"; of release ([Φ R Ψ]), "Ψ remains true until and including the moment when Φ becomes true; if Φ never becomes true, Ψ remains true forever"; and of weak until ([Φ W Ψ]), "Φ has to remain true at least until Ψ holds; if Ψ never holds, Φ must remain true forever" [28], [29].

Easy interpretation of human language in modal logics (especially in LTL) is one of the most important strengths of such logics (see [30], [31], and [32]); this kind of interpretation can help robots react to human orders [33]. In order to convert a question into a formula, first of all we need to extract all possible answers to the question. Then, after converting each possible answer into an LTL formula using the existing, above-mentioned approaches, we investigate the satisfaction of each answer. The developed model leads us to PAL modifications of such LTL formulas. To explain this, let Φ(p_1, …, p_σ) be an LTL-translated formula and let each φ_i, 1 ≤ i ≤ σ, be a PAL formula. Then Φ(φ_1, …, φ_σ) is obtained from Φ by substituting each p_i with φ_i, for all 1 ≤ i ≤ σ, respectively.
For instance, for Φ(p_1, p_2) = G[p_1 U X p_2], φ_1 = ¬D_A¬p_1, and φ_2 = K_i p_2, we have Φ(φ_1, φ_2) = G[(¬D_A¬p_1) U X(K_i p_2)], which, for the transition system TS and the investigated execution path π_TS, means that "p_1 should always be a possible answer until a world is reached right after which p_2 is robust knowledge for the i-th agent". By defining a way for
Algorithm 3
The Time-Series Transition System (TSTS) function creates the transition system from the given input information. Let C be the set of classifiers, ζ the set of rules, X the time-series input data of size k, and η the predefined confident neighborhood.

function TSTS(C, ζ, X, η)    ⊲ each Kripke model has the form M = (W, R_1, …, R_n, V)
    S := {w_0}
    W′ := {w_0}
    R_1, …, R_n := {(w_0, w_0)}
    s_0 := w_0
    s_− := w_0
    → := ∅
    L(w_0) := ∅
    TS := (S, R_1, …, R_n, s_0, s_−, →, L)
    for all x in X do
        M := SFKE(C, ζ, x, η)    ⊲ Kripke model M = (W, R′_1, …, R′_n, V)
        S := S ∪ W
        R_1, …, R_n := R_1 ∪ R′_1, …, R_n ∪ R′_n
        for all w ∈ W and w′ ∈ W′ do
            → := → ∪ {(w′, w)}
        W′ := W
    W := {w_{k+1}}
    S := S ∪ W
    R_1, …, R_n := R_1 ∪ {(w_{k+1}, w_{k+1})}, …, R_n ∪ {(w_{k+1}, w_{k+1})}
    s_− := w_{k+1}
    for all w ∈ W and w′ ∈ W′ do
        → := → ∪ {(w′, w)}
    return TS
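The layered construction of Algorithm 3 can be sketched in Python as follows. The per-frame model extraction (SFKE/MASKS) is replaced here by precomputed world lists, and the world names and dict-based labeling are assumptions of this sketch, not part of the original algorithm.

```python
# Sketch of the Time-Series Transition System construction (Algorithm 3).
# Each frame contributes a set of possible worlds; consecutive layers are
# fully connected, and dummy worlds w0 / w_end bound the system.

def tsts(frame_worlds):
    """frame_worlds: list of per-frame world lists (stand-in for SFKE output),
    where each world is given as its set of true atoms."""
    s0, s_end = 'w0', 'w_end'
    states = {s0, s_end}
    trans = set()                        # the -> relation
    labels = {s0: set(), s_end: set()}   # L(w) = V_i(w); empty on the dummies
    prev_layer = [s0]
    for i, worlds in enumerate(frame_worlds):
        layer = []
        for j, atoms in enumerate(worlds):
            w = f'w{i+1}_{j+1}'
            states.add(w)
            labels[w] = set(atoms)
            layer.append(w)
        # connect every world of the previous layer to every world here
        trans.update((p, w) for p in prev_layer for w in layer)
        prev_layer = layer
    trans.update((p, s_end) for p in prev_layer)
    return states, trans, s0, s_end, labels

# Two frames: frame 1 has three possible worlds, frame 2 has two.
frames = [[{('x1', 'Cat')}, {('x1', 'Dog')}, {('x1', 'Chair')}],
          [{('x2', 'Cat')}, {('x2', 'Dog')}]]
S, T, s0, se, L = tsts(frames)
print(len(S))   # 7 states: 5 frame worlds plus the 2 dummies
print(len(T))   # 11 transitions: 3 + 3*2 + 2
```

The 11 transitions of this toy run match the fully connected layering used for Fig. 1: every world of a frame is linked to every world of the next frame.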
Algorithm 4
The TEMporal Satisfaction (TEMS) function investigates the satisfaction of LTL formulas. Let Φ be the LTL formula, TS the transition system, and π^{i…j}_{TS} an execution path.

function TEMS(Φ, TS, π^{i…j}_{TS})    ⊲ → can be derived from the model TS
    if i > j then
        return false
    if Φ is a PAL formula then
        return PALS(Φ, M_i, w_i)    ⊲ let w_i be the first world in the path π^{i…j}_{TS}
    if Φ is of the form ¬Ψ then
        return ¬TEMS(Ψ, TS, π^{i…j}_{TS})
    if Φ is of the form Ψ_1 ∧ Ψ_2 then
        return TEMS(Ψ_1, TS, π^{i…j}_{TS}) ∧ TEMS(Ψ_2, TS, π^{i…j}_{TS})
    if Φ is of the form X Ψ then
        return TEMS(Ψ, TS, π^{i+1…j}_{TS})
    if Φ is of the form [Ψ_1 U Ψ_2] then
        if TEMS(Ψ_2, TS, π^{i…j}_{TS}) then
            return true
        while TEMS(Ψ_1, TS, π^{i…j}_{TS}) do
            i := i + 1
            if TEMS(Ψ_2, TS, π^{i…j}_{TS}) then
                return true
    return false
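The TEMS recursion of Algorithm 4 can be sketched over a single execution path. Representing a path as a list of label sets and reducing PAL satisfaction to atomic membership are simplifications assumed for this illustration only.

```python
# Sketch of the TEMporal Satisfaction check (Algorithm 4) on one execution
# path. A path is a list of atom sets (the labels of its states); full PAL
# satisfaction is simplified here to atomic membership.

def tems(phi, path, i=0):
    """Evaluate an LTL formula phi on the suffix of path starting at index i."""
    if i >= len(path):                   # past the final state: fail, as in the algorithm
        return False
    op = phi[0]
    if op == 'top':                      # the constant true, used to derive F
        return True
    if op == 'atom':
        return phi[1] in path[i]
    if op == 'not':
        return not tems(phi[1], path, i)
    if op == 'and':
        return tems(phi[1], path, i) and tems(phi[2], path, i)
    if op == 'X':                        # next data frame
        return tems(phi[1], path, i + 1)
    if op == 'U':                        # [phi1 U phi2], bounded by the path end
        for m in range(i, len(path)):
            if tems(phi[2], path, m):
                return all(tems(phi[1], path, k) for k in range(i, m))
        return False
    return False

F = lambda psi: ('U', ('top',), psi)     # F psi == [T U psi]

path = [{'p'}, {'p'}, {'q'}]             # labels along one execution path
print(tems(('U', ('atom', 'p'), ('atom', 'q')), path))  # True
print(tems(F(('atom', 'q')), path))                     # True
print(tems(('X', ('atom', 'q')), path))                 # False: q only holds at frame 3
```

The until clause here follows the semantic definition directly rather than the while-loop formulation of the pseudocode; both agree on bounded paths.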
introducing robustness and possibility for a classifier, together with the definitions of verified and possible with respect to a group of classifiers, answers can be labeled to assure the questioner of "how reliable this answer is". This lets us design more appropriate systems for critical applications. Moreover, by capturing missing knowledge, it can be determined which knowledge, not discovered by the classifiers or provided externally, is mandatory for the system, and which part of the system can be manipulated to obtain the missing information. In the following, we introduce when a formula is verified, possible, or robust, or when information is missing, in our model:

1. The formula Φ(p_1, …, p_σ) is verified for the group A of classifiers exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊨ Φ(D_A p_1, …, D_A p_σ).

2. The formula Φ(p_1, …, p_σ) is a possible scenario for the group A of classifiers exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ(¬D_A¬p_1, …, ¬D_A¬p_σ).

3. The formula Φ(p_1, …, p_σ) is robust from the i-th agent's perspective exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊨ Φ(K_i p_1, …, K_i p_σ).

4. The formula Φ(p_1, …, p_σ) is possible from the i-th agent's perspective exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ(¬K_i¬p_1, …, ¬K_i¬p_σ).

5. A PAL formula ψ is called verified-missing information for a formula Φ(p_1, …, p_σ) in a group A of classifiers exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊭ Φ(D_A p_1, …, D_A p_σ), and for all π_{TS} ∈ Π_{TS} we have TS, π_{TS} ⊨ Φ([ψ]⟨D_A p_1⟩, …, [ψ]⟨D_A p_σ⟩).

6. A PAL formula ψ is called possible-missing information for a formula Φ(p_1, …, p_σ) in a group A of classifiers exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊭ Φ(¬D_A¬p_1, …, ¬D_A¬p_σ), and there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ([ψ]⟨¬D_A¬p_1⟩, …, [ψ]⟨¬D_A¬p_σ⟩).

7. A PAL formula ψ is called verified-missing information for a formula Φ(p_1, …, p_σ) of the i-th classifier exactly when there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊭ Φ(K_i p_1, …, K_i p_σ), and for all π_{TS} ∈ Π_{TS} we have TS, π_{TS} ⊨ Φ([ψ]⟨K_i p_1⟩, …, [ψ]⟨K_i p_σ⟩).

8. A PAL formula ψ is called possible-missing information for a formula Φ(p_1, …, p_σ) of the i-th classifier exactly when, for all π_{TS} ∈ Π_{TS}, we have TS, π_{TS} ⊭ Φ(¬K_i¬p_1, …, ¬K_i¬p_σ), and there exists π_{TS} ∈ Π_{TS} for which TS, π_{TS} ⊨ Φ([ψ]⟨¬K_i¬p_1⟩, …, [ψ]⟨¬K_i¬p_σ⟩).

Example. (Representation of a Video) In the previous section, a PAL logic for representing image data was illustrated. In this example, assume that the given finite video stream contains n − 1 images, and that the epistemic-information-extraction models for these images are M_1 = (W_1, R_{1,1}, …, R_{1,k}, V_1), …, M_{n−1} = (W_{n−1}, R_{n−1,1}, …, R_{n−1,k}, V_{n−1}). By adding two dummy models M_0 and M_n, the transition model TS = (S, R, s_0, s_−, →, L) can be created.

For clarification, we describe a real-world situation. Assume there is a medical clean room in an operating theater, monitored by a camera. The recorded video is fed into a set of classifiers A, which are fairly accurate. The room is cleaned with an air conditioner together with an ultraviolet (UV) LED. As is known, the UV LED would harm humans; thus, it must be turned off while a person is in the room. Conversely, the air conditioner should work while a person is in the room, and it should be stopped in an empty room to save electric power. So the first protocol is "shut down the UV while any person is monitored", and the second one is "shut down the air conditioner when no one is in the room". The classifiers should answer the vital question: "How robust are the protocols?"

Let p be "at least one human exists", q be "the UV is on", and r be "the air conditioner is on". The LTL formulas for the first question are:

• G(p ⇒ X¬q): the translation of "upon observation of a human, the UV should be shut down".

• For all π_{TS} ∈ Π_{TS}, TS, π
TS ⊨ G(D_A p ⇒ X⟨D_A¬q⟩): the satisfaction of this formula verifies the property within the group A.

• There exists π_{TS} ∈ Π_{TS} with TS, π_{TS} ⊨ G(¬D_A¬p ⇒ X⟨¬D_A q⟩): the satisfaction of this formula means there is a scenario in which the formula holds, i.e., this property is possible within the group A.

• For all π_{TS} ∈ Π_{TS}, TS, π_{TS} ⊨ G(K_i p ⇒ X⟨K_i¬q⟩): the satisfaction of this formula makes the property robust for the i-th classifier.

• There exists π_{TS} ∈ Π_{TS} with TS, π_{TS} ⊨ G(¬K_i¬p ⇒ X⟨¬K_i q⟩): the satisfaction of this formula means the i-th classifier considers this property possible.

More handily, for all π_{TS} ∈ Π_{TS}, TS, π_{TS} ⊨ G(¬D_A¬p ⇒ X⟨D_A¬q⟩) ensures that in any possible situation, the possible presence of anybody forces the UV LED to be turned off. Accordingly, this system lets us investigate whether the protocols are followed or not; besides, it can be observed which classifier caused any error of the system.

Traversing all paths in the developed transition system can be very time-consuming, depending on the size of the PAL models. Fortunately, the computation can be reduced under special conditions to reach an applicable approach. In particular, and for instance, the image frames of video streams are strongly correlated with the next and previous image frames [34]. In other words, the video stream can be assumed to be semi-continuous data, and the overall changes between frames should be minimal. It can also be assumed to be a Markov-chain, memoryless data flow, which means every data frame is related only to its previous data frame. Using this property of video streams together with recent NLP technology, the most probable path can be determined. Here, we calculate the changes via the labels of the objects represented in the frames, and find the most probable path among all possible ones. As mentioned before, the set of paths Π_{TS} in the transition system
TS are created by the worlds of the n+1 models M_0, . . . , M_n, in which relations hold between two successive worlds of models M_i and M_{i+1}. To find the similarities and differences between two worlds, we first extract the word-tags of the true atomic formulas in each world, and then find the probability of each relation by NLP approaches, through the tags' similarities in both worlds (see [35], [36], [37]). After finding the similarity score of each transition relation, the most probable path can be found in time complexity O(|→|) by a greedy algorithm.

To introduce the approach, assume the transition system is made of n+1 PAL models M_i = (W_i, R^1_i, . . . , R^n_i, V), 0 ≤ i ≤ n. Let maxπ^{0…(i−1)}_TS → w^j_i be the best-found path of length i+1, starting at the initial state s_0 and ending at the world w^j_i, and let P^→_{j,k} be the probability score between the worlds w^j_i and w^k_{i+1} (the similarity between the tags of w^j_i and w^k_{i+1}, calculated by an NLP algorithm, e.g., a Siamese network [37]). The score of the most probable path from s_0 to w^k_{i+1} is then

P(maxπ^{0…i}_TS → w^k_{i+1}) = max_j ( P(maxπ^{0…(i−1)}_TS → w^j_i) × P^→_{j,k} ),

and the path itself is maxπ^{0…i}_TS → w^k_{i+1} ≡ (maxπ^{0…(i−1)}_TS → w^j_i) → w^k_{i+1} for the maximizing j. By continuing this procedure up to the last model M_n, the candidate paths converge and the most probable path is found. This path can be applied in many real-world applications; for instance, a frequent mistake of video-stream object-detection systems is misclassification in some particular frames, due to an exceptional situation of the object. Employing this algorithm helps to get rid of such partial misclassifications. Moreover, the above-mentioned epistemic formulas can be evaluated on the most probable path only, reducing the computation time. Algorithm 5 is developed to find the most probable path in the transition system.

Algorithm 5
The Most Probable Path Extraction (MPPE) function collects the most probable path of a data stream. The similarity score of tags in line 3 can be computed by any arbitrary NLP algorithm.

Let TS be a transition system
function MPPE(TS)
    for all (w_i, w_{i+1}) ∈ → do
        P_{(w_i, w_{i+1})} := similarity score of tags in w_i and w_{i+1}
    for i ∈ {1, . . . , length(π_TS)} do
        for all w_i ∈ W_i do
            maxP_{w_i} := 0
            maxπ_{w_i} := ∅
            for all π^{0…i}_TS = maxπ^{0…(i−1)}_TS → w_i do
                if P_{π^{0…i}_TS} > maxP_{w_i} then
                    maxP_{w_i} := P_{π^{0…i}_TS}
                    maxπ_{w_i} := π^{0…i}_TS
    return maxπ_{w_n} for the w_n ∈ W_n with the maximal maxP_{w_n}

The statistical nature of the most common classification algorithms, together with their high accuracy in most applied domains (e.g., classification, NLP, etc.), has made them very popular; unfortunately, reasoning about their answers seems
so complicated. Accordingly, a logical point of view on classification problems, for extracting information and reasoning, should be noticed in critical cases. In the introduced approach, first, a straightforward way is presented to aggregate, in a unified manner, the knowledge acquired by any classification or given by any input rules. Next, by introducing an LTL-based transition system, the flow of knowledge in time-series data is modeled. Later, to answer any question, all possible answers are converted to LTL formulas. Such formulas can be investigated in the developed model, and the satisfied formulas are reported as answers. The reliability of the answers can also be investigated using the defined robustness and verification concepts. Moreover, it is possible to catch, for any level of reliability, the missing information that could lead the system to the answer. The captured missing information can reveal the classifiers' shortage of information and the cause of wrong decision-making. In the final part, an approach was developed to find the most probable path in order to reduce the state-space in larger models.
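To make the path-reduction procedure concrete, the following is a minimal Python sketch of a Viterbi-style version of the MPPE idea: layers of worlds (one layer per model M_i), transition scores from tag similarity, and a dynamic-programming extraction of the highest-scoring path. All names here (`extract_most_probable_path`, the toy Jaccard `tag_similarity`, the example worlds and tag sets) are illustrative assumptions, not the implementation used in this work.

```python
from typing import Dict, List, Set


def tag_similarity(tags_a: Set[str], tags_b: Set[str]) -> float:
    # Toy Jaccard score standing in for an NLP similarity measure
    # (e.g., a Siamese recurrent network [37] in the text above).
    if not tags_a and not tags_b:
        return 1.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def extract_most_probable_path(layers: List[Dict[str, Set[str]]]) -> List[str]:
    """layers[i] maps each world of model M_i to the word-tags of its
    true atomic formulas; returns the highest-scoring world sequence."""
    # best[w] = (score of the best path ending in world w, that path)
    best = {w: (1.0, [w]) for w in layers[0]}
    for i in range(1, len(layers)):
        new_best = {}
        for w, tags in layers[i].items():
            # Extend every best path of length i by the edge into w and
            # keep the highest-scoring extension (the greedy/DP step).
            score, path = max(
                ((s * tag_similarity(layers[i - 1][v], tags), p)
                 for v, (s, p) in best.items()),
                key=lambda sp: sp[0],
            )
            new_best[w] = (score, path + [w])
        best = new_best
    # Overall most probable path into the last layer.
    return max(best.values(), key=lambda sp: sp[0])[1]


# Hypothetical example: three frames; world "w2b" mislabels the person
# as a dog, so the extracted path avoids it.
layers = [
    {"w0": {"person", "uv_lamp"}},
    {"w1a": {"person", "uv_lamp"}, "w1b": {"chair"}},
    {"w2a": {"person", "uv_lamp"}, "w2b": {"dog"}},
]
print(extract_most_probable_path(layers))  # ['w0', 'w1a', 'w2a']
```

Under this sketch, epistemic formulas such as those above would then be checked only along the returned path, rather than along every path of the transition system.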
References

[1] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 2425–2433, 2015.
[2] Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. Visual7W: Grounded question answering in images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4995–5004, 2016.
[3] Drew A. Hudson and Christopher D. Manning. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6700–6709, 2019.
[4] Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, and Gunhee Kim. TGIF-QA: Toward spatio-temporal reasoning in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2758–2766, 2017.
[5] Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. MovieQA: Understanding stories in movies through question-answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4631–4640, 2016.
[6] Amir Zadeh, Michael Chan, Paul Pu Liang, Edmund Tong, and Louis-Philippe Morency. Social-IQ: A question answering benchmark for artificial social intelligence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8807–8817, 2019.
[7] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[8] Amirhoshang Hoseinpour Dehkordi, Majid Alizadeh, Ebrahim Ardeshir-Larijani, and Ali Movaghar. MASKS: A Multi-Artificial Neural Networks System's verification approach. July 2020.
[9] Patrick H. Winston. Artificial Intelligence, 3rd edition. Addison-Wesley, Reading, MA, 34:167–339, 1992.
[10] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 2002.
[11] Nils J. Nilsson. The Quest for Artificial Intelligence. Cambridge University Press, 2009.
[12] Robert P. Goldman and Eugene Charniak. Probabilistic text understanding. Statistics and Computing, 2(2):105–114, 1992.
[13] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2910, 2017.
[14] Chen Sun and Ram Nevatia. ACTIVE: Activity concept transitions in video event classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 913–920, 2013.
[15] Kevin Tang, Li Fei-Fei, and Daphne Koller. Learning latent temporal structure for complex event detection. In , pages 1250–1257. IEEE, 2012.
[16] Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum. CLEVRER: Collision events for video representation and reasoning. arXiv preprint arXiv:1910.01442, 2019.
[17] Jiyang Gao, Chen Sun, Zhenheng Yang, and Ram Nevatia. TALL: Temporal activity localization via language query. In Proceedings of the IEEE International Conference on Computer Vision, pages 5267–5275, 2017.
[18] Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, and Bryan Russell. Localizing moments in video with natural language. In Proceedings of the IEEE International Conference on Computer Vision, pages 5803–5812, 2017.
[19] Lisa Bauer, Yicheng Wang, and Mohit Bansal. Commonsense for generative multi-hop question answering tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4220–4230, Brussels, Belgium, October–November 2018. Association for Computational Linguistics.
[20] Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, and Alessandro Oltramari. Towards generalizable neuro-symbolic systems for commonsense question answering. arXiv preprint arXiv:1910.14087, 2019.
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[22] Brendan Fong, Alberto Speranzon, and David I. Spivak. Temporal landscapes: A graphical temporal logic for reasoning. arXiv preprint arXiv:1904.01081, 2019.
[23] Daniel de Leng and Fredrik Heintz. Approximate stream reasoning with metric temporal logic under uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2760–2767, 2019.
[24] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In Rupak Majumdar and Viktor Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 of Lecture Notes in Computer Science, pages 3–29. Springer, 2017.
[25] Rob Gerth, Doron Peled, Moshe Y. Vardi, and Pierre Wolper. Simple on-the-fly automatic verification of linear temporal logic. In International Conference on Protocol Specification, Testing and Verification, pages 3–18. Springer, 1995.
[26] Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, and Luo Si. StructBERT: Incorporating language structures into pre-training for deep language understanding. arXiv preprint arXiv:1908.04577, 2019.
[27] Xin Rong. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738, 2014.
[28] Akash Hossain and François Laroussinie. From quantified CTL to QBF. arXiv preprint arXiv:1906.10005, 2019.
[29] Marta Kwiatkowska, Alessio Lomuscio, and Hongyang Qu. Parallel model checking for temporal epistemic logic. In Proceedings of the 2010 Conference on ECAI 2010: 19th European Conference on Artificial Intelligence, pages 543–548, 2010.
[30] Hadas Kress-Gazit, Georgios E. Fainekos, and George J. Pappas. Translating structured English to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008.
[31] Juraj Dzifcak, Matthias Scheutz, Chitta Baral, and Paul Schermerhorn. What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. In , pages 4163–4168. IEEE, 2009.
[32] Rani Nelken and Nissim Francez. Automatic translation of natural language system specifications into temporal logic. In International Conference on Computer Aided Verification, pages 360–371. Springer, 1996.
[33] Cameron Finucane, Gangyuan Jing, and Hadas Kress-Gazit. LTLMoP: Experimenting with language, temporal logic and robot control. In , pages 1988–1993. IEEE, 2010.
[34] Jawadul H. Bappy, Sujoy Paul, Ertem Tuncel, and Amit K. Roy-Chowdhury. Exploiting typicality for selecting informative and anomalous samples in videos. IEEE Transactions on Image Processing, 28(10):5214–5226, 2019.
[35] Wei Yang, Wei Lu, and Vincent W. Zheng. A simple regularization-based algorithm for learning cross-domain word embeddings. arXiv preprint arXiv:1902.00184, 2019.
[36] Atish Pawar and Vijay Mago. Calculating the similarity between words and sentences using a lexical database and corpus statistics. arXiv preprint arXiv:1802.05667, 2018.
[37] Jonas Mueller and Aditya Thyagarajan. Siamese recurrent architectures for learning sentence similarity. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.