Formalizing Integration Patterns with Multimedia Data (Extended Version)

Abstract

The previous works on formalizing enterprise application integration (EAI) scenarios showed an emerging need for setting up formal foundations for integration patterns, the EAI building blocks, in order to facilitate the model-driven development and ensure its correctness. So far, the formalization requirements were focusing on more "conventional" integration scenarios, in which control-flow, transactional persistent data and time aspects were considered. However, none of these works took into consideration another arising EAI trend that covers social and multimedia computing. In this work we propose a Petri net-based formalism that addresses requirements arising from the multimedia domain. We also demonstrate realizations of one of the most frequently used multimedia patterns and discuss which implications our formal proposal may bring into the area of the multimedia EAI development.


Marco Montali, Andrey Rivkin
Free University of Bozen-Bolzano, {lastname}@inf.unibz.it

Daniel Ritter
SAP, [email protected]


Index Terms: high-level Petri nets, enterprise integration patterns, multimedia data

I. INTRODUCTION

Recent business and socio-technical trends increasingly rely on smart applications with advanced analysis techniques, the IoT, and business and social networks [1], [19], [26]. This entails the need to employ enterprise application integration (EAI) for processing unstructured multimedia and semantic data, with concrete applications such as smart logistics, disease detection in agriculture and health care, and social sentiment analysis. The latter has recently been studied in the context of multimedia EAI in [19], [20]. More generally, the need for handling multimodel and knowledge-enriched data (incl. text and multimedia data) was also identified in the related data management [1] and event-based processing domains (considering audio, video and social events) [25].

While multimedia integration solutions become more relevant and complex, solid formal foundations are crucial in order to ensure the behavioral correctness of multimedia EAI solutions (cf. [19]). Such formal foundations have been given by formalizing the execution semantics of integration patterns [12], [19], the building blocks of EAI solutions, using colored Petri nets (CPNs) [10] and (timed) DB-nets [21], [22]. Still, the results of these works do not apply to multimedia data, and the recent survey [19] identifies the lack of a suitable formalism for multimedia integration patterns.

Example 1: Fig. 1 shows an excerpt from a social media sentiment harvesting application (cf. https://tinyurl.com/yautcagl), in which images (and texts) are either collected from Human Data Intelligence providers or directly from social media sources like Facebook, Twitter, or Instagram. To guide the search, a social intelligence system (e.g., ERP, CRM) provides lists of topics and keywords of interest as well as time- or item-based metadata like a sinceId, denoting the earliest feed of interest. Then, using textual Splitter and Content Enricher patterns, distinct queries are separated into multiple request messages with the sinceId as header (H). The resulting media feed entries contain images in the message body (B) that are processed by multiple subsequent steps. First, images without humans or products are filtered out using an image Message Filter (i.e., no sentiment about a product). Then an image Content Enricher marks the features, along which the relevant parts in the images are split into separate messages by an image Splitter. Finally, an image Enricher determines the emotional state of the human (towards the product) and adds the information to the image message, while preserving the image. The images with marked and determined sentiment as well as the association to the original topic are returned to the social intelligence system.

Fig. 1. SAP Social Intelligence – image sentiments (excerpt)

In the absence of a formal representation of integration processes like the one in Ex. 1, questions like "what does the process do?", "is it functionally correct?" and "how can it be improved?" cannot be answered. Consequently, reasoning about integration patterns with multimedia data (or even combined with textual data processing) is currently not possible, but desirable.
To answer these questions, this work combines the streams of previous research on EAI with multimedia data [20] and formal representations of integration patterns on textual data [10], [21], [22] towards a novel formal representation of multimedia EAI solutions, which, apart from being defined using a rigorous mathematical toolbox, should also allow one to formally represent multimedia data in integration patterns and allow for further theoretical development along the line of formal analysis. To this end, we build upon previous works on the EAI formalization using CPNs and DB-nets, and propose a new Petri net-based formalism called multimedia nets (MM-nets for short). It can essentially be seen as a marriage between CPNs and a multimedia storage whose conceptual representation is tuned to address various requirements specific to multimedia data management in the EAI context (e.g., representation of multimedia messages, multimedia operations).

In summary, the main contributions of this work along its outline are threefold. (1) First, we analyze multimedia data integration patterns regarding their requirements for defining a suitable formalism in Sect. II. (2) Then, in Sect. III we study the formal syntax and semantics of MM-nets and discuss certain design decisions behind the conceptual representation of the multimedia data-related parts of the formalism. To the best of our knowledge, this is the first attempt to propose a formalism that would account for semantic knowledge and the way it is manipulated along a process execution. (3) Finally, we give semantically correct realizations of one of the most frequently used multimedia integration patterns in Sect. IV. In Sect. V we discuss related work, and we conclude by briefly discussing further open research challenges in Sect. VI.

II. BACKGROUND AND REQUIREMENTS ANALYSIS

In this section we briefly summarize multimedia integration patterns, from which we derive requirements for a suitable formalization. We then compare these requirements to the closest known related work on formalisms for textual integration patterns using CPNs [10] and (timed) DB-nets [21], [22].

A. Multimedia Integration Patterns

In previous work [20], we identified several integration patterns from the pattern catalogs [12], [19], [23] that are especially relevant for multimedia data (cf. Tab. I). In addition to the pattern name and the corresponding multimedia operation, the (semantic) configuration arguments relevant for modeling such patterns are added. While, in general, the tasks of the multimedia patterns are similar to those working with textual data, they differ in terms of the message representation as well as performed operations and required storage (cf. [20]).

TABLE I
INTEGRATION PATTERN MULTIMEDIA ASPECTS; ALL INFORMATION APART FROM DB TAKEN FROM [20] (LOGICAL: LOG, PHYSICAL: PHY, RE-CALCULATED: RECAL., DB: PERSISTENT STORAGE REQUIRED, YES OR NO)

| Pattern Name | Multimedia Operation | Arguments | Phy | Log | DB |
|---|---|---|---|---|---|
| Channel Adapter | format conversion | format indicator | write | create | no |
| Splitter | fixed grid, object-based | grid: horizontal, vertical cuts; object | create | recal./write | no |
| Router, Filter | select object | object | - | read | no |
| Aggregator | fixed grid, object-based | grid: rows, columns, heights, width | create | recal./write | yes |
| Translator, Content Filter | coloring | color (scheme) | write | recal./write | no |
| Content Enricher | add shape, OCR text | object, shape+color, text | write | recal./write | yes |
| Feature Detector | segmentation, matching | object classifier | read | create | no |
| Image Resizer | scale image | size: height, width | write | write | no |
| Idempotent Receiver | detector, similarity | object for comparison | - | read | yes |
| Message Validator | detector | validation criteria | - | read | no |

A multimedia message consists of a body and an optional set of attachments, and both of them "physically" contain multimedia data (e.g., an image). In addition to a set of key-value header entries denoting metadata concerning the data exchange (e.g., HTTP headers), there is a set of properties that carries the semantics of the multimedia data (e.g., human with emotion, product) in the message body (or attachments). In [20] it is assumed that a multimedia message is transient (i.e., processed in a pipes-and-filter style), and that all operations are executed directly on media objects and their metadata contained in the message. Moreover, for representing multimedia messages, [20] adapts a concept from the multimedia database domain (e.g., cf. [5]), which separates the logical and physical representations to isolate the runtime from modeling, as abstractly reflected in the multimedia message model in Fig. 2.

Fig. 2. Conceptual Multimedia Message Model (from [20])

The physical representation accounts for the actual multimedia data, and thus operations on the physical representation literally read (i.e., read), create new (i.e., create), or change existing multimedia data (i.e., write), like cutting parts of or resizing an image. When the physical representation is read and interpreted, semantic information is extracted (e.g., a detected human emotion), together with additional information like coordinates of the detected object (coord.), its color and the confidence of the detection (cf. Conf.; e.g., type="human" with Conf.=0.85). The detection is done by a Feature Detector pattern (cf. Tab. I) that has a set of ML-trained classifiers for each expected feature in multimedia data. During the detection, the distinct features are identified using the classifiers and the corresponding logical representation gets created. In contrast to [5], where a relational multimedia model is used, the semantic information is then represented logically as part of the domain object model with references to the physical representation. This could be represented using, for example, the RDF standard [7] (which we rely on in the formalism presented in Sect. III-A). For example, Fig. 2 denotes an abstract view of the domain object model by visually representing the semantic concepts as Type (e.g., virtual human), with sub-types SType (e.g., emotion). Note that formats such as XSD or WSDL, in which business domain objects are encoded (e.g., business partner, customer, employee), are not sufficient (cf. [20]), as they are normally used for representing textual domain models.

During the modeling of a process, the user works close to the logical representation by implicitly using operations like read/query (i.e., read), change (i.e., write), newly create (i.e., create), and adapt changes without new detection or creation (i.e., recal.). The aforementioned detector can also be used for cases when the physical representation has changed and the corresponding logical part is invalidated, thus requiring a re-detection (write). However, for efficiency reasons, if the effect of the physical operation on the logical representation is known, then only a recalculation of the logical part can be used (recal.). As such, the logical representation denotes a canonical data model based on the domain model or message schema of the multimedia messages as well as operations on them. Tab. I shows the physical (Phy) and logical (Log) operations required by the different patterns as well as database information (DB) indicating whether a pattern requires persistent storage for its operation.

Example 2:

The image Splitter uses an object-based splitting (cf. Tab. I), where the object is a human face. During the processing, new physical multimedia objects (e.g., images of human faces in the original image) are created (Phy: create) and the logical representation is either created from scratch (Log: create) or recalculated (Log: recal.) according to the knowledge about the split. Notice that this does not require persistent storage of multimedia messages.
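
The object-based split of Ex. 2 can be illustrated with a minimal Python sketch. It is not taken from [20]: the class and field names (`Feature`, `MultimediaMessage`, `kind`, `box`) are hypothetical, and an image is simplified to a list of pixel rows. The sketch only shows how a new physical object is created per detected object (Phy: create) while the logical part is recalculated from the known cut instead of being re-detected (Log: recal.).

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Logical representation of one detected object (hypothetical field names)."""
    kind: str     # semantic type, e.g. "human"
    box: tuple    # (x0, y0, x1, y1) in pixels, x1 and y1 exclusive
    conf: float   # confidence of the detection


@dataclass
class MultimediaMessage:
    body: list                                     # physical part: rows of pixels
    headers: dict = field(default_factory=dict)    # exchange metadata
    features: list = field(default_factory=list)   # logical part


def split_by_object(msg, wanted_kind):
    """Object-based split: one new message per feature of the wanted kind."""
    parts = []
    for feat in msg.features:
        if feat.kind != wanted_kind:
            continue
        x0, y0, x1, y1 = feat.box
        # Phy: create -- cut a new physical object out of the original body.
        crop = [row[x0:x1] for row in msg.body[y0:y1]]
        # Log: recal. -- the cut is known, so the box is shifted, not re-detected.
        moved = Feature(feat.kind, (0, 0, x1 - x0, y1 - y0), feat.conf)
        parts.append(MultimediaMessage(crop, dict(msg.headers), [moved]))
    return parts
```

The original message is left untouched and nothing is written to a database, which matches the observation that the Splitter needs no persistent storage.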

B. Formalization Requirements

The formalization requirements of multimedia integration patterns are derived from the patterns in Tab. I. The base requirements (also found for integration patterns on textual data [21]) are necessary for representing the control flow that messages go through in the integration process (i.e., REQ-0 "Control flow (pipes and filter)"). The next requirements concern the processing of multimedia data. As discussed before, the data processing has two different aspects, dealing with the representation of multimedia messages, and with physical and logical operations on the messages.

The representation of multimedia messages requires support for multimedia and semantic data, which is not provided by the CPN [10] and (timed) DB-net [21], [22] approaches (i.e., REQ-1(a) "Multimedia message representation"). Physical multimedia operations on the data require capabilities like marking an image with some geometrical shape for the Content Enricher (i.e., REQ-1(b) "Multimedia data operations"), whereas the logical representation requires keeping the logical part "up-to-date" (i.e., it does not describe features of a physical object that are not there). This not only allows the data stored in the logical part to be processed efficiently, e.g., by using SPARQL queries (cf. [20]), but also supports the modeling, during which semantic operations on the multimedia data can be specified (i.e., REQ-1(c) "Semantic / metadata operations"). Another data processing aspect concerns the persistent storage for patterns like the Aggregator and Idempotent Receiver, which require storage for their operations (i.e., REQ-1(d) "Persistently store multimedia data, semantic data / metadata"). In addition to the functional requirements, it is important to provide a suitable formal representation of such multimedia integration patterns so as to facilitate their correct representation and further development (i.e., REQ-2 "Formal rigorous semantics"), as done in model-driven development [4]. Notice that further requirements from [21] like time, transaction and exception handling are out of scope, since they are not directly related to multimedia data.

Tab. II summarizes the formalization requirements that we consider in this work and sets them against the coverage of two approaches based on colored Petri nets [10] and DB-nets [16], [21], which, to date, are the only ones that have been used for formalizing integration patterns. While CPNs provide a solid foundation for control (cf. REQ-0) and a simple data flow representation, DB-nets extend the latter towards the support of persistent data with CRUD operations for working with external, transactional databases. However, neither of them supports multimedia data or semantic/metadata operations (REQ-1(b)–(c)). CPNs do not support the modeling of persistent storage, whereas DB-nets do not allow for a conceptually correct representation of multimedia data, as they cannot support two different storages (one for logical and the other for physical data) that would also need to be managed differently. As long as no multimedia or semantic data aspects are concerned, DB-nets can store that data (REQ-1(d)). The well-defined semantics of CPNs and DB-nets allows one to conduct various types of model-based analysis, ranging from model-based testing via simulation to complex verification using variants of temporal logics.

III. MULTIMEDIA NETS

In this section we present the formalism of MM-nets, which builds upon CPNs and takes inspiration from the multi-layered representation adopted within the DB-net approach by subsequently defining multimedia data and semantic operations as well as multimedia storage for the data-related requirements (cf. REQ-1(a)–(d) from Tab. II). Conceptually, MM-nets are structured as follows: (i) a multimedia storage stores multimedia data together with their metadata; (ii) a control layer employs a variant of CPNs to capture the control-flow dimension of the modeled process; (iii) a data logic layer embodies a communication interface between the multimedia and control layers. Using the data logic, the control layer can access the underlying multimedia storage (and tune its own behavior depending on the obtained answer) as well as update it with data carried by the tokens and additional data obtained from the external world. In what follows, we study every layer in detail and provide a formal definition of an MM-net. We also discuss how, in spite of certain conceptual and operational differences, DB-nets can be related to MM-nets. This observation provides insights on the possibility of adopting formal analysis techniques studied for DB-nets for the formalism of MM-nets.

TABLE II
FORMALIZATION REQUIREMENTS (COVERED: yes, PARTIALLY: (yes), NOT: no)

| ID | Requirement | CPN | (timed) DB-net |
|---|---|---|---|
| REQ-0 | Control flow (pipes and filter) | yes | yes |
| REQ-1(a) | Multimedia message representation | no | no |
| REQ-1(b) | Multimedia data operations | no | no |
| REQ-1(c) | Semantic / metadata operations | no | no |
| REQ-1(d) | Persistently store multimedia data, semantic data / metadata | no | (yes) |
| REQ-2 | Formal rigorous semantics | yes | yes |

A. Multimedia storage

A data type is $\mathcal{D} = \langle \Delta_{\mathcal{D}}, \Gamma_{\mathcal{D}}, \Phi_{\mathcal{D}}, \Sigma_{\mathcal{D}} \rangle$, where $\Delta_{\mathcal{D}}$ is a value domain, $\Gamma_{\mathcal{D}}$ and $\Phi_{\mathcal{D}}$ are finite sets of predicate and function symbols defined on top of elements of $\Delta_{\mathcal{D}}$, and $\Sigma_{\mathcal{D}}$ is the signature interpretation, i.e., a function associating each predicate symbol $S$ (resp., function symbol $f$) of arity $n$, denoted as $S/n$ (resp., $f/n$), to an $n$-ary relation $\Sigma(S) \subseteq \Delta_{\mathcal{D}}^{n}$ (resp., to an $n$-ary function $\Sigma(f) : \Delta_{\mathcal{D}_1} \times \ldots \times \Delta_{\mathcal{D}_n} \to \Delta_{\mathcal{D}}$, where each $\mathcal{D}_i$ is some type, possibly different from $\mathcal{D}$). For the sake of brevity, hereinafter we omit the signature interpretation in the data type definitions.

Examples of data types are: $\mathsf{str} = \langle \mathbb{S}, \{=_{s}\}, \emptyset \rangle$, strings with the equality predicate; $\mathsf{int} = \langle \mathbb{Z}, \{=_{int}, <_{int}\}, \{succ : \mathbb{Z} \to \mathbb{Z}\} \rangle$, integers with the usual comparison operators as well as the successor function; $\mathsf{jpg} = \langle \mathrm{IMG}, \emptyset, \{sub : \mathrm{IMG} \times \mathrm{IMG} \to \mathrm{IMG}\} \rangle$, images in JPG format with a binary image subtraction function. We use $\mathfrak{D}$ to denote a type domain, that is, a finite set of data types, and write $\square_{\mathfrak{D}} = \bigcup_{\mathcal{D} \in \mathfrak{D}} \square_{\mathcal{D}}$, for $\square \in \{\Delta, \Gamma, \Phi\}$. Also, for ease of presentation, we single out a domain of multimedia (object) types $\mathfrak{D}_{MO}$, s.t. $\mathfrak{D}_{MO} \cap \mathfrak{D} = \emptyset$, and fix a string-based type $\mathsf{oid} = \langle \mathbb{S}, \{=_{oid}\}, \emptyset \rangle \in \mathfrak{D}$ for specifying proper addresses of objects. Functions from $\Phi_{\mathfrak{D}_{MO}}$ provide the basis for defining operations discussed in REQ-1(b)–(c). Finally, we shall use a function $type$, defined on $\mathfrak{D} \cup \mathfrak{D}_{MO}$, to return a data type of a variable or a value.

In this work we make a design decision for modeling multimedia data in which one focuses on the object metadata and treats them as "first-class citizens" (e.g., similar to [5]), assuming that the actual (multimedia) objects are kept in some storage and can be accessed/manipulated only by references. In this case one should distinguish two different types of object manipulations.
One type focuses on the way the objects are accessed and viewed/manipulated (e.g., accessing and resizing an image stored on some server), whereas the other considers auxiliary information about objects and thus allows for treating them in a more refined way. To account for the first type, we introduce an object storage (or database) $O_{db}$ that stores multimedia objects of various types. We shall not go into technical aspects of such a database, but instead just assume that it provides functionality for adding and deleting multimedia data, and that every object can be accessed by using its proper address. W.l.o.g., we formally treat $O_{db}$ as a set of pairs $(a, mo)$, where $mo$ is a multimedia object stored in $O_{db}$ and $a$ is its address of type $\mathsf{oid}$, and assume that all the addresses are unique and two distinct objects can never be referenced by the same address. For convenience, we introduce two functions: $addr : \mathfrak{D}_{MO} \to \Delta_{oid}$ that, given a multimedia object, returns its address, and $src : \Delta_{oid} \to \mathfrak{D}_{MO}$ that, given an address, returns the object that this address points at.

The metadata of all the objects from $O_{db}$ together with their addresses are kept in the metadata storage $M_{db}$, which is represented as an RDF graph, i.e., a set of statements $(s, p, o)$, with $s$ being a subject, $p$ being a predicate and $o$ being an object. Each statement triple is an atomic construct. Its subject describes an information resource, while its predicate represents a statement property referenced by an internationalized resource identifier (IRI) whose value is the statement object. Note that, while $s$, $p$ and $o$ carry values of IRIs, $s$ and $o$ can also be RDF literals (for more details see [7]). Notably, IRIs as objects can be used to represent more complex, tree-structured values. The usage of IRIs in RDF statements is crucial as it allows for the unambiguous identification of information resources. As opposed to [7], we do not use blank nodes and thus consider only ground RDF graphs.
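
To make the two storages concrete, the following Python sketch models a tiny multimedia storage instance. It is only an illustration under simplifying assumptions: the identifiers (`img:1`, `oid:1`), the byte strings standing in for images, and the helper `metadata_of` are hypothetical, while `mmdb:address` and `mmdb:faceCount` follow the vocabulary used in the examples of this section.

```python
# Object storage O_db: each unique address (type oid) maps to one multimedia object.
# Byte strings stand in for real image data.
O_db = {
    "oid:1": b"jpeg-bytes-of-image-1",
    "oid:2": b"jpeg-bytes-of-image-2",
}

# Metadata storage M_db: a ground RDF graph, i.e. a set of (s, p, o) triples
# without blank nodes. The metadata also records the address of each object.
M_db = {
    ("img:1", "mmdb:address", "oid:1"),
    ("img:1", "mmdb:faceCount", "2"),
    ("img:2", "mmdb:address", "oid:2"),
    ("img:2", "mmdb:faceCount", "0"),
}


def src(address):
    """Return the object that the given address points at."""
    return O_db[address]


def addr(obj):
    """Return the address of a stored object (objects assumed pairwise distinct)."""
    for address, stored in O_db.items():
        if stored == obj:
            return address
    raise KeyError("object is not in the object storage")


def metadata_of(address):
    """Collect all triples about the resource whose mmdb:address is `address`."""
    subjects = {s for (s, p, o) in M_db if p == "mmdb:address" and o == address}
    return {triple for triple in M_db if triple[0] in subjects}
```

Using a dictionary for the object storage enforces the uniqueness of addresses by construction, and a set of triples mirrors the set-based reading of an RDF graph.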
In what follows, we shall use $\Delta_{L}$ to denote an infinite set of RDF literals and $\Delta_{I}$ to denote an infinite set of IRIs, and we may collectively refer to both of them as RDF terms. Here, $L = \langle \Delta_{L}, \Gamma_{L}, \Phi_{L} \rangle$ and $I = \langle \Delta_{I}, \Gamma_{I}, \Phi_{I} \rangle$ respectively denote the data types of RDF literals and IRIs, where $L, I \in \mathfrak{D}$, and $\Gamma$ and $\Phi$ are potentially nonempty sets of predicate and function symbols that we intentionally leave unspecified, as their content depends on a concrete scenario (or, more specifically, on the database management system used). Lastly, given the complexity of the datatype management in RDF, for ease of presentation we employ a type casting function $::$ that, given $x$ and a target type $t$, returns the value of $x$ cast to $t$. W.l.o.g., we assume the extension of this function to variables.

To query the multimedia storage we adopt SPARQL, the standard W3C pattern-matching language for querying RDF graphs [11]. There are plenty of formal ways to define the syntax of SPARQL queries as well as the semantics of pattern evaluation. In this paper, instead, we only provide intuitions necessary for understanding how metadata are accessed and how SPARQL query answers can be manipulated in the context of the studied formalism of MM-nets. Let $\mathcal{V}_{RDF}$ be an infinite set $\{?x, ?y, \ldots\}$ of RDF variables, where for each $?x \in \mathcal{V}_{RDF}$, $type(?x) = I \cup L$. The basic building block of SPARQL queries is a triple pattern, a tuple from $(\Delta_{L} \cup \Delta_{I} \cup \mathcal{V}_{RDF}) \times (\Delta_{L} \cup \mathcal{V}_{RDF}) \times (\Delta_{L} \cup \Delta_{I} \cup \mathcal{V}_{RDF})$. Finite sets of such tuples form basic graph patterns (BGPs) [11]. More complex graph patterns are inductively constructed from BGPs using various operations (e.g., OPT, JOIN, UNION) that are applicable to graph patterns and built-in conditions. The semantics of graph patterns is defined in terms of partial functions $\theta : \mathcal{V}_{RDF} \to \Delta_{L} \cup \Delta_{I}$ called mappings. Given a BGP $P$, $\theta(P)$ denotes the BGP obtained by applying $\theta$ to all variables in $P$. We use a function $Vars(P)$ to denote the set of all variables in $P$.
Both $\theta$ and $Vars$ can be easily extended to account for tuples of variables. Given an RDF graph $M_{db}$, the evaluation of a graph pattern $P$ over $M_{db}$ is specified as the set $[\![P]\!]_{M_{db}}$ of mappings inductively defined using SPARQL operations and the BGP evaluation as the base case [13], [17]. Notably, for the pattern evaluation we use the simple entailment semantics, in which, in the base case, for every mapping $\theta \in [\![P]\!]_{M_{db}}$, it holds that $\theta(P) \subseteq M_{db}$.

SPARQL queries use results of the pattern evaluation to form result sets and come in four different forms: SELECT, ASK, CONSTRUCT and DESCRIBE [11]. However, in this work we are only interested in the first two. A SELECT query can be abstractly defined as $(\vec{w}, P)$, where $P$ is a graph pattern and $\vec{w} = \langle w_1, \ldots, w_k \rangle$ is a vector of answer variables, such that $\{w_1, \ldots, w_k\} \subseteq Vars(P)$. Such a query is then evaluated over a graph $M_{db}$ by applying mappings $\theta \in [\![P]\!]_{M_{db}}$ to the variables in $\vec{w}$. We shall denote the resulting set (by default, SPARQL uses the bag-based semantics for the query evaluation [11], but we opt for the set-based one) as $ans(M_{db}, \vec{w}, P)$. An ASK query returns a boolean value indicating whether a pattern matches the given RDF graph and can be seen as a special case of a SELECT with an empty set of answer variables. In what follows, we use $\mathcal{Q}$ to denote the set of all such SPARQL queries.

Example 3:

For brevity, assume an RDF vocabulary mmdb that standardizes all the metadata attributes as well as relations between them relevant to the scenario in Ex. 1. To extract the number of segments that contain human faces in every image in the multimedia storage together with that image identifier, we can use the query $numSeg := (\langle ?id, ?c \rangle, P)$, where $P$ = SELECT ?id, ?c WHERE { ?id mmdb:faceCount ?c }. For accessing information about the segments with human faces, the query $segs := (\langle ?id, ?seg \rangle, P)$ is used. The pattern it employs is defined as $P$ = SELECT ?id, ?seg WHERE { ?id mmdb:faceSegment ?seg }. Here, every segment object stores an alphanumeric string carrying two pairs of coordinates (e.g., "( , ) .. ( , )") to represent rectangle coordinates within which the segment is located.

Since the multimedia storage essentially has no intensional part (i.e., there are no schemas either for the object storage or for the metadata storage), we only define its extensional part. Formally, a $\mathfrak{D}_{MO}$-typed multimedia storage instance is a pair $(M_{db}, O_{db})$, where $M_{db}$ is a metadata storage instance and $O_{db}$ is a multimedia object storage instance. Here, each instance should be understood as a set of address-object pairs and object metadata observed at the given point in time. Notice that this representation of metadata and multimedia data essentially fully meets REQ-1(a) and REQ-1(d).

B. Data logic layer

Here we discuss how to manipulate the multimedia storage and show how to update metadata of objects stored in the multimedia storage (resp., object storage) by adding and deleting possibly multiple triples (resp., multimedia objects) at once. Such updates are realized by means of parameterized actions, each of which consists of a set of templates, i.e., expressions that, once instantiated, assert which RDF triples (resp., multimedia objects) will be deleted from and added to the database.

We intend to provide two main types of operations for updating the multimedia storage. The first type works directly with the metadata storage and allows one to add and delete a fixed number of triples. When using this type of update, the modeler, however, should be aware that any changes of object metadata should faithfully reflect the actual state of the object itself. The second type allows one to add, delete and/or update multimedia objects themselves. Sometimes such operations should be bundled with those of the first type so as to ensure the integrity of the object-metadata indivisibility principle. We fix the infinite set of typed variables $\mathcal{V}_{\mathfrak{D}}$, where for each $x \in \mathcal{V}_{\mathfrak{D}}$, $type(x) = \mathcal{D}$ and $\mathcal{D} \in \mathfrak{D}$. Note that $\mathcal{V}_{RDF} \subset \mathcal{V}_{\mathfrak{D}}$.

Definition 1:

A parameterized action $\alpha$ is a triple $(\vec{p}, F_{D}, F_{A})$, where:
1) $\vec{p}$ is a tuple of action formal parameters, i.e., distinct variables from $\mathcal{V}_{\mathfrak{D}}$;
2) $F_{D} = (mm^{-}, mo^{-})$ and $F_{A} = (mm^{+}, mo^{+})$ are two pairs such that:
   • $mm^{-}$ and $mm^{+}$ are finite sets of triples $(s, p, o) \in (\Delta_{L} \cup \Delta_{I} \cup Y) \times (\Delta_{L} \cup Y) \times (\Delta_{L} \cup \Delta_{I} \cup Y)$ to be deleted from and added to the metadata storage, where $Y = Vars(\vec{p}) \cap \mathcal{V}_{RDF}$;
   • $mo^{-}$ is a finite set of addresses $a$ of objects to be deleted from the object storage, where $a$ is either a constant from $\Delta_{oid}$ or a variable from $Vars(\vec{p})$ with $type(a) = \mathsf{oid}$;
   • $mo^{+}$ is a finite set of expressions $a \triangleright f(x_1, \ldots, x_n)$ generating objects to be added to the object storage, where $a$ is either a constant from $\Delta_{oid}$ or a variable of type $\mathsf{oid}$ from $Vars(\vec{p})$, $f \in \Phi_{\mathfrak{D}_{MO}}$ with co-domain $coDom(f) \subset \Delta_{\mathfrak{D}_{MO}}$, and every $x_i$ is either a variable from $Vars(\vec{p})$ or a constant from $\mathfrak{D} \cup \mathfrak{D}_{MO}$.

Here, if an object with address $a$ is already present in the object storage, $a \triangleright f(x_1, \ldots, x_n)$ updates this object with the result of the function call. If, instead, there is no object with such an address, then the same expression adds a pair $(a, r)$ to the object storage, where $r$ is the result of the function call. To access different components of $\alpha$, we make use of the following notation: $\alpha \cdot params = \vec{p}$, $\alpha \cdot del = F_{D}$, $\alpha \cdot add = F_{A}$. Given a substitution $\sigma : Vars(\alpha \cdot params) \to \Delta_{L} \cup \Delta_{I} \cup \Delta_{oid}$ for $\alpha \cdot params$, an action instance $\alpha\sigma$ is a ground action resulting from substituting the parameters in $\alpha$ with the corresponding values in $\sigma$.
An application of $\alpha\sigma$ to a multimedia storage instance $\mathcal{I} = (M_{db}, O_{db})$, denoted as $apply(\alpha\sigma, \mathcal{I})$, returns a new instance of the multimedia storage $\mathcal{I}' = (M'_{db}, O'_{db})$, such that:
• $M'_{db} = (M_{db} \setminus mm^{-}_{\alpha\sigma}) \cup mm^{+}_{\alpha\sigma}$, where $mm^{-}_{\alpha\sigma} = \bigcup_{(s,p,o) \in mm^{-}} \sigma(s, p, o)$ and $mm^{+}_{\alpha\sigma} = \bigcup_{(s,p,o) \in mm^{+}} \sigma(s, p, o)$;
• $O'_{db} = (O_{db} \setminus (mo^{-}_{\alpha\sigma} \cup \overline{mo}^{+}_{\alpha\sigma})) \cup mo^{+}_{\alpha\sigma}$, where, assuming for simplicity that $X = x_1, \ldots, x_n$, $mo^{-}_{\alpha\sigma} = \bigcup_{a \in mo^{-}} (a, src(\sigma(a)))$, $\overline{mo}^{+}_{\alpha\sigma} = \bigcup_{a \triangleright f(X) \in mo^{+},\ \exists o.\,(a,o) \in O_{db}} (a, o)$ and $mo^{+}_{\alpha\sigma} = \bigcup_{a \triangleright f(X) \in mo^{+}} (a, f(\sigma(x_1), \ldots, \sigma(x_n)))$.

Here, $\overline{mo}^{+}_{\alpha\sigma}$ collects the pairs that are already stored under an address that the action regenerates, so that they are replaced by the new result. As in DB-nets [16], in order to avoid situations in which the same fact is asserted to be added and deleted, we prioritize additions over deletions. The overall representation of actions and their semantics, together with the ability to use type-specific functions, allows us to account for REQ-1(b)–(c).

Example 4:

The Splitter (cf. Ex. 1) employs an action called GETIMAGE that extracts sub-image $o'$ from image $o$ with address $a$, based on the information about segment $seg$ that identifies it, and generates relevant metadata about $o'$ (like name $n$, new address $a'$ and new image identifier $id$) that are added to the metadata storage. This action uses five formal input parameters, GETIMAGE·params $= \langle a, seg, a', id, n \rangle$, and performs the following updates. It only deletes the metadata of the original image that are related to the selected segment: GETIMAGE·del $= (\{(id :: L,\ mmdb{:}faceSegment,\ seg :: L)\}, \emptyset)$. Then, using GETIMAGE·add $= (mm^{+}, mo^{+})$, it adds all necessary metadata entries $mm^{+} := \{(id :: L,\ mmdb{:}address,\ a' :: L),\ (id :: L,\ mmdb{:}format,\ .jpg :: L),\ (id :: L,\ mmdb{:}name,\ n :: L)\}$ and adds to the object storage an extracted image with $mo^{+} = \{a \triangleright extractIMG(src(a), seg)\}$. Here, $extractIMG : IMG \times D \to IMG$ takes as input an image and a rectangular selection ($rect$ is a type defined on top of an alphanumeric set of rectangle coordinates $D$ representing segments), and returns a subimage defined by the latter.

To update image $(a, o)$ by cutting another image with address $a'$ from it, we define action CUTFROMIMG, s.t. CUTFROMIMG·add $= (\emptyset, \{a \triangleright sub(src(a), src(a'))\})$. Knowing that image $(a, o)$ is already in the storage, we use here the "updating" semantics of actions. □

Notice that Definition 1 allows one to specify actions whose execution may still be inconsistent. For example, one may delete an object without removing its metadata, which is intuitively not an expected type of behavior.
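The update semantics of $apply$ can be illustrated with a small sketch. It is not the authors' implementation: the names `Storage`, `GroundAction`, `apply_action` and `extract_img` are hypothetical, images are modelled as tuples of pixel rows, and the substitution $\sigma$ is assumed to have been applied already. The demo action is only loosely modelled on GETIMAGE; storing the extracted image at a fresh address `a2` is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Image = Tuple[str, ...]         # toy image: a tuple of pixel rows


@dataclass
class Storage:
    """Multimedia storage instance I = (M_db, O_db)."""
    triples: Set[Triple] = field(default_factory=set)        # M_db
    objects: Dict[str, Image] = field(default_factory=dict)  # O_db: address -> object


@dataclass
class GroundAction:
    """An action with the substitution sigma already applied (hypothetical type)."""
    mm_del: Set[Triple]                             # metadata triples to delete
    mo_del: Set[str]                                # addresses of objects to delete
    mm_add: Set[Triple]                             # metadata triples to add
    mo_add: Dict[str, Callable[[Storage], Image]]   # address -> object constructor


def apply_action(act: GroundAction, inst: Storage) -> Storage:
    # New objects are computed against the OLD instance (src reads the old storage).
    new_objects = {addr: make(inst) for addr, make in act.mo_add.items()}
    # M'_db = (M_db \ mm-) U mm+ : additions take priority over deletions.
    triples = (inst.triples - act.mm_del) | act.mm_add
    # O'_db: drop deleted addresses, then add new objects; an object already
    # stored at an added address is overwritten ("updating" semantics).
    objects = {a: o for a, o in inst.objects.items() if a not in act.mo_del}
    objects.update(new_objects)
    return Storage(triples, objects)


def extract_img(img: Image, seg: Tuple[int, int, int, int]) -> Image:
    """Toy stand-in for extractIMG: crop the rectangle (x, y, width, height)."""
    x, y, w, h = seg
    return tuple(row[x:x + w] for row in img[y:y + h])


seg = (0, 0, 2, 1)
before = Storage(
    triples={("img1", "mmdb:faceSegment", "0,0,2,1")},
    objects={"a1": ("abcd", "efgh")},
)
get_image = GroundAction(
    mm_del={("img1", "mmdb:faceSegment", "0,0,2,1")},
    mo_del=set(),
    mm_add={("img2", "mmdb:address", "a2"), ("img2", "mmdb:name", "face-1")},
    mo_add={"a2": lambda inst: extract_img(inst.objects["a1"], seg)},
)
after = apply_action(get_image, before)
```

In this sketch the original image at `a1` is left untouched, while the segment triple disappears and the cropped image appears at `a2`. If the same triple occurs in both `mm_del` and `mm_add`, it survives, which mirrors the priority of additions over deletions.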

C. Control layer

Before defining the central notion of MM-net, we fix some standard notions related to multisets. For some set $A$, $A^{\oplus} := \{m : A \to \mathbb{N}\}$ is the set of multisets over $A$. Given a multiset $S \in A^{\oplus}$, an element $a \in A$ and $n \in \mathbb{N}$, $S(a) \in \mathbb{N}$ denotes the number of times $a$ appears in $S$, and we write $a^{n} \in S$ if $S(a) = n$. Given $S_1, S_2 \in A^{\oplus}$, we define the following operations on multisets:
(i) $S_1 \subseteq S_2$ (resp., $S_1 \subset S_2$) if $S_1(a) \le S_2(a)$ (resp., $S_1(a) < S_2(a)$) for each $a \in A$;
(ii) $S_1 + S_2 = \{a^{n} \mid a \in A \text{ and } n = S_1(a) + S_2(a)\}$;
(iii) if $S_1 \subseteq S_2$, $S_2 - S_1 = \{a^{n} \mid a \in A \text{ and } n = S_2(a) - S_1(a)\}$;
(iv) given a number $k \in \mathbb{N}$, $k \cdot S = \{a^{kn} \mid a^{n} \in S\}$;
(v) $|m| = \sum_{a \in A} m(a)$.

(Footnote: Note that for RDF triples, a notion of substitution coincides with the one of mapping, with the only difference that the former is not partial.)

An MM-net assigns to each place a color type, which in turn corresponds to a data type or to a cartesian product of multiple data types from $\mathfrak{D}$. Inscriptions, represented as tuples of variables from $\mathcal{V}_{RDF} \cup \mathcal{V}_{\mathfrak{D}}$, constants from $\Delta_{\mathfrak{D}} \cup \Delta_{L} \cup \Delta_{I}$ and terms (constructed from functions from $\Phi_{\mathfrak{D}_{MO}} \cup \Phi_{\mathfrak{D}}$, variables and constants, and denoted as $\mathcal{T}$), are used to reference contents of places. We denote by $\Omega_{A}$ the set of all possible inscriptions over a set $A$. Quite often, when manipulating various data objects, one would like to ensure provision of fresh data values (for example, a generation of globally fresh object identifiers). To this end, we adopt the well-known mechanism used in $\nu$-Petri nets [24] and introduce a countably infinite set $\Upsilon_{\mathfrak{D}}$ of $\mathfrak{D}$-typed fresh variables, where for every $\nu \in \Upsilon_{\mathfrak{D}}$, we have that $\Delta_{type(\nu)}$ is countably infinite (this provides an unlimited supply of fresh values). Hereinafter, we fix a countably infinite set of $\mathfrak{D}$-typed variables $\mathcal{X}_{\mathfrak{D}} = \mathcal{V}_{\mathfrak{D}} \uplus \Upsilon_{\mathfrak{D}}$ as the disjoint union of "normal" variables $\mathcal{V}_{\mathfrak{D}}$ and fresh variables $\Upsilon_{\mathfrak{D}}$. Let us also introduce a guard, that is, a formula defined as $\varphi ::= S(x_1, \ldots, x_m) \mid \neg\varphi \mid \varphi \wedge \varphi \mid \top$, where $S \in \Gamma_{D}$ (for some $D \in \mathfrak{D}$) and $x_i$ is either a variable of type $D$, or a constant from $\Delta_{D}$. We use $\mathcal{G}$ to denote the set of all possible guards. Notice that guards are not defined on multimedia objects.

Definition 2: A $\mathfrak{D}$-typed MM-net $N$ is a tuple $(\mathfrak{D}, P, T, F_{in}, F_{out}, color, query, guard, act)$, where:
• $P = P_c \cup P_v$ is a finite set of places partitioned into control places $P_c$ and view places $P_v$ (decorated with a dedicated symbol and connected to transitions only via read arcs);
• $T$ is a finite set of transitions, s.t. $P \cap T = \emptyset$;
• $color : P \to \wp(\mathfrak{D})$ is a place typing function;
• $query : P_v \to Q$ is a query assignment function, s.t., for every $p \in P_v$ with $query(p) = (\vec{w}, P)$, it holds that $type(\vec{w}) = color(p)$;
• $F_{in} : P \times T \to \Omega^{\oplus}_{\mathcal{V}_{\mathfrak{D}}}$ is an input flow, s.t. $type(F_{in}(p,t)) = color(p)$ for every $(p,t) \in P \times T$;
• $guard : T \to \mathcal{G}$ is a partial guard assignment function, s.t., for every $t \in T$, $Vars(guard(t)) \subseteq InVars(t)$, where $InVars(t) = \bigcup_{p \in P} Vars(F_{in}(p,t))$;
• $F_{out} : T \times P \to \Omega^{\oplus}_{\mathcal{X}_{\mathfrak{D}} \cup \Delta_{\mathfrak{D}} \cup \mathcal{T}}$ is an output flow, s.t. $type(F_{out}(t,p)) = color(p)$ for every $(t,p) \in T \times P$;
• $act : T \to A$ is a partial action assignment function, where $A$ is a finite set of actions. □

Note that the given definition does not restrict the usage of objects in the guard formulas to only those from $\mathfrak{D}$. In fact, one can even compare multimedia objects by using the $src$ function. Inscriptions in the output flow can inject possibly fresh data via external variables that are not bound by any input inscription and that are taken from $OutVars(t) \setminus InVars(t)$, where $OutVars(t) = \bigcup_{p \in P} Vars(F_{out}(t,p))$ and every variable $x$ can be either from $\Upsilon_{\mathfrak{D}}$ or $\mathcal{V}_{\mathfrak{D}}$.

Fig. 3. MM-net control layer of a Splitter. Guards are depicted in green, types are shown in bold next to corresponding places.

Example 5:

A Splitter comprises a complex routing mechanism that, given a message, iteratively breaks it into smaller parts [12]. In our scenario we aim at splitting a single image into smaller images, as shown in Fig. 3. The splitting is performed according to a simplified criterion: only images of human faces will be extracted. All the information needed for extracting such images is supposed to be already in the metadata storage. The latter is the key assumption guaranteeing that the Splitter can always identify needed elements and organize their processing.

The net starts by extracting the number of segments containing human faces for all images in the multimedia storage (note that the information about the segments is supposed to be already in the metadata storage). This is done using a view place whose query is $numSeg$, where $numSeg$ is defined in Ex. 3. By joining the input image identifier $id$ with the one in the view place, the net allows to get the number of segments $c$ for a concrete image. If no sub-images have been detected before, i.e., $c = 0$, the net finishes its work by subsequently firing transition Finish1. Notice that Finish1 fires only if its guard has been satisfied, and that the inscription $F_{out}(\text{Finish1}, out)$ contains a variable $s$ that is bound to a set of pairs (identifier, address) in place $out$. Initially, $out$ is supposed to contain an empty set.

If the image contains such segments ($c \neq 0$), the net enters a loop with $c$ as the counter and repeats the following steps. First, it fires Split, which executes the action GETIMAGE assigned to it (cf. Ex. 4). This action gets a sub-image based on the information of the concrete segment taken from view place Segments (s.t. $query(\text{Segments}) = segs$, where $segs$ is defined in Ex. 3) and adds it to the multimedia storage together with the relevant metadata. GETIMAGE instantiates its formal parameters with variables that are bound to values coming from input places (like $a$ and $seg$) and that either simulate system input ($n$ is an unbound variable simulating system input for an image name) or fresh data injection ($\nu_a$ creates a globally new image address, $\nu_{id}$ generates a fresh image ID). Split also "remembers" the identifier and address of the extracted image by adding them to the set of pairs in place $out$. Here, type $Set_{A}$ (for $A := A_1 \times \ldots \times A_n$ and $\Delta_{A} := \Delta_{A_1} \times \ldots \times \Delta_{A_n}$) is defined over a set $\wp(\Delta_{A})$, with predicate $empty$ checking whether a set is empty or not, and the three following functions: (i) $ins$ adds an element to the set; (ii) $getL$ returns the last element from the set; (iii) $rem$ removes an element from the set. The net then remembers the address of the extracted image and proceeds with firing the Update Image transition, which executes action CUTFROMIMG (cf. Ex. 4) that updates the image with address $a$ by removing from it the subimage with address $a'$. The loop repeats until the counter has reached $0$. Then the net fires Finish2, after which it subsequently emits the extracted pairs (using Emit) into the output place $ch_{out}$. ■

D. Execution semantics

In a nutshell, the execution semantics of MM-nets is similar to the one of DB-nets [16]: it has to simultaneously capture the progression of both the multimedia storage and the control layer. To this end, at each point in time, the state of an MM-net is represented using a so-called snapshot, which consists of a multimedia storage instance $I$ and a marking $m$. The latter is formally defined as a function $m : P \to \Omega^{\oplus}_{\mathfrak{D}}$, s.t. $m(p) \in \Delta^{\oplus}_{color(p)}$ and $m(v) = ans(M_{db}, query(v))$, for all $p \in P$ and $v \in P_v$. Note that the second condition in the marking definition guarantees that the marking of a view place corresponds to the answers obtained by issuing its associated query over the underlying multimedia storage instance. In the following, by writing an MM-net $N$ in snapshot $s = (I, m)$, we mean a marked net, with marking $m$, over a multimedia storage instance $I = (M_{db}, O_{db})$.

The firing of a transition $t \in T$ in a snapshot is defined w.r.t. a so-called binding for $t$, defined as $\sigma : Vars(t) \cup OutVars(t) \to \Delta_{\mathfrak{D}}$, that substitutes all variables in inscriptions on the arcs incident to $t$ and, possibly, formal parameters of an action signature assigned to $t$ with values from $\Delta_{\mathfrak{D}}$.

Definition 3: A transition $t \in T$ is enabled in a snapshot $s = \langle I, m \rangle$, written as $s[t\rangle$, if there exists a binding $\sigma$ satisfying the following: (i) $\sigma(F_{in}(p,t)) \subseteq m(p)$, for every $p \in P$; (ii) $\sigma(guard(t))$ is true; (iii) $\sigma(x) \notin Val(s)$, for every $x \in \Upsilon_{\mathfrak{D}} \cap OutVars(t)$. □

Here, $Val(s)$ denotes the set of all constants occurring in $m$ and in $I$. Essentially, a transition is enabled with a binding $\sigma$ if the binding selects values carried by tokens from the input places (which match inscriptions on the corresponding input arcs), so that the data they carry make the guard attached to the transition true and, moreover, assigns globally fresh and pairwise distinct (both in $m$ and $I$) values to variables from $\Upsilon_{\mathfrak{D}}$. Then, when a transition is enabled, it may fire.

Definition 4: Let $N$ be an MM-net in snapshot $s = (I, m)$ ($I = (M_{db}, O_{db})$), with $t \in T$ enabled in $s$ with some binding $\sigma$. Then, $t$ may fire producing a new snapshot $s' = (I', m')$, s.t. $m'(p) = m(p) - \sigma(F_{in}(p,t)) + \sigma(F_{out}(t,p))$ and $I' = apply(act(t)\sigma, I)$. We denote this as $s[t\rangle s'$ and assume that the definition is inductively extended to sequences $\tau \in T^{*}$. □

For a net $N$ in initial snapshot $s_0$, we use $S(N) = \{s \mid \exists \tau \in T^{*} \text{ s.t. } s_0[\tau\rangle s\}$ to denote the set of all snapshots of $N$ reachable from its initial snapshot $s_0$.

The execution semantics of an MM-net is defined in terms of a possibly infinite-state labeled transition system (LTS) accounting for all possible executions of the control layer starting from an initial snapshot. States of this transition system are MM-net snapshots, whereas transitions model firings of MM-net transitions under chosen bindings. Formally, given an MM-net $N$ in snapshot $s_0$, the execution semantics of $N$ is given by the LTS $\Lambda_{N} = (S, s_0, \to)$, where:
• $S$ is a possibly infinite set of snapshots;
• $\to \subseteq S \times T \times S$ is a $T$-labelled transition relation between pairs of snapshots;
• $S$ and $\to$ are defined by simultaneous induction as the smallest sets satisfying the following conditions: (i) $s_0 \in S$; (ii) given $s \in S$, for every transition $t \in T$, binding $\sigma$ and snapshot $s'$ over $N$, if $s[t\rangle s'$, then $s' \in S$ and $s \xrightarrow{t} s'$.

With this we cover the last requirement from Tab. II.

E. Connection to DB-nets
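As a recap of Definitions 3 and 4 before relating MM-nets to DB-nets, the enabledness check and the marking update can be sketched with Python multisets (`collections.Counter`). This is an illustration under simplifying assumptions, not the authors' tooling: `Transition`, `ground`, `enabled` and `fire` are hypothetical names, view places and the storage update $apply(act(t)\sigma, I)$ are left out, and inscription entries are either variable names or small functions of the binding.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Binding = Dict[str, object]                  # sigma: variable name -> value
Inscription = List[Tuple[object, ...]]       # multiset of tuples of variables/terms
Marking = Dict[str, Counter]                 # place -> multiset of tokens


def ground(insc: Inscription, sigma: Binding) -> Counter:
    """Apply sigma to an inscription; strings are variables, callables are terms."""
    return Counter(
        tuple(sigma[v] if isinstance(v, str) else v(sigma) for v in tup)
        for tup in insc
    )


def included(small: Counter, big: Counter) -> bool:
    """Multiset inclusion: small(a) <= big(a) for every element a."""
    return all(big[tok] >= n for tok, n in small.items())


@dataclass
class Transition:
    f_in: Dict[str, Inscription]
    f_out: Dict[str, Inscription]
    guard: Callable[[Binding], bool] = lambda sigma: True
    fresh: FrozenSet[str] = frozenset()      # fresh output variables


def values(marking: Marking) -> set:
    """Constants occurring in the marking (the marking part of Val(s))."""
    return {v for bag in marking.values() for tok in bag for v in tok}


def enabled(t: Transition, m: Marking, sigma: Binding, storage_values=frozenset()) -> bool:
    used = values(m) | set(storage_values)
    fresh_vals = [sigma[x] for x in t.fresh]
    return (
        all(included(ground(i, sigma), m.get(p, Counter())) for p, i in t.f_in.items())
        and t.guard(sigma)                                  # (ii) guard holds
        and all(v not in used for v in fresh_vals)          # (iii) freshness
        and len(set(fresh_vals)) == len(fresh_vals)         # pairwise distinct
    )


def fire(t: Transition, m: Marking, sigma: Binding) -> Marking:
    """m'(p) = m(p) - sigma(F_in(p,t)) + sigma(F_out(t,p)); assumes t is enabled."""
    new = {p: Counter(bag) for p, bag in m.items()}
    for p, insc in t.f_in.items():
        new[p] = new.get(p, Counter()) - ground(insc, sigma)
    for p, insc in t.f_out.items():
        new[p] = new.get(p, Counter()) + ground(insc, sigma)
    return new


# A Split-like step: decrement the counter c and record a fresh address.
split = Transition(
    f_in={"loop": [("id", "a", "c")]},
    f_out={"loop": [("id", "a", lambda s: s["c"] - 1)], "new": [("nu_a", "id")]},
    guard=lambda s: s["c"] > 0,
    fresh=frozenset({"nu_a"}),
)
m0 = {"loop": Counter({("img1", "a1", 2): 1}), "new": Counter()}
sigma = {"id": "img1", "a": "a1", "c": 2, "nu_a": "a2"}
m1 = fire(split, m0, sigma) if enabled(split, m0, sigma) else m0
```

A binding that reuses an existing value for `nu_a` (for instance `"a1"`) fails the freshness condition, so the transition is not enabled under it.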

It is easy to see that MM-nets are Turing complete. However, this formalism can still be potentially used for checking formal properties of multimedia integration patterns and their compositions. As this paper primarily focuses on developing a modeling formalism, we leave a more in-depth discussion of the formal analysis to future work and show a key connection between MM-nets and their predecessor DB-nets that, as has been shown in [16] and [21], can be used for model-based testing via simulation as well as verification of formal properties such as reachability of a nonempty place.

The formalisms of DB-nets and MM-nets are conceptually quite similar. The main difference lies in the type of persistent data these formalisms manipulate and the data types they use (as we have stated before, DB-net data types do not support functions). DB-nets allow for checking the reachability of a nonempty place under restrictions limiting the data types that one can use (namely, only strings and reals) and the "size" of information that can be simultaneously present in the net marking and database instance [16]. The same result could be reconstructed for MM-nets with strings and reals (provided that one uses their definitions from [16]), together with boundedness restrictions imposed over the places and the multimedia storage, by encoding them into DB-nets. The RDF storage used in MM-nets can be suitably represented in a relational database (see, e.g., [3] for more details), whereas the SPARQL queries can be translated to SQL as suggested in [13].
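As a rough illustration of the relational encoding mentioned above, the following sketch stores RDF triples in a single three-column table and answers two simple triple-pattern queries in SQL, using Python's standard `sqlite3` module. The table layout, the sample data and the hand-written SQL are assumptions made for this example; they are neither the storage scheme of [3] nor the translation procedure of [13].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Naive layout (an assumption of this sketch): one row per RDF triple.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("img1", "mmdb:faceSegment", "10,10,40,40"),
        ("img1", "mmdb:faceSegment", "60,10,40,40"),
        ("img1", "mmdb:address", "a1"),
        ("img2", "mmdb:address", "a2"),
    ],
)

# Pattern { ?id mmdb:faceSegment ?seg } becomes a selection on the predicate column.
segments = conn.execute(
    "SELECT s, o FROM triples WHERE p = ? ORDER BY s, o",
    ("mmdb:faceSegment",),
).fetchall()

# Pattern { ?id mmdb:faceSegment ?seg . ?id mmdb:address ?a } becomes a self-join
# on the shared variable ?id.
with_address = conn.execute(
    """
    SELECT t1.s, t1.o, t2.o
    FROM triples AS t1 JOIN triples AS t2 ON t1.s = t2.s
    WHERE t1.p = 'mmdb:faceSegment' AND t2.p = 'mmdb:address'
    ORDER BY t1.o
    """
).fetchall()
```

Each shared variable between triple patterns turns into a join condition, which is why conjunctive patterns of this kind map naturally onto relational queries.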
For validating MM-nets, we can leverage the same modular approach used for validating DB-nets in [21], [22]. More specifically, one can use CPN Tools (http://cpntools.org/) for representing the control layer of MM-nets, together with queries assigned to view places and actions appearing as code segments attached to transitions, and use its Access/CPN framework for defining extensions that implement the data manipulation logic running on common RDF/SPARQL frameworks like Apache Jena (jena.apache.org) and image processing capabilities like OpenCV (opencv.org), as used in [20].

IV. MULTIMEDIA PATTERN REALIZATION

In this section we formalize multimedia integration patterns used in Fig. 1 as MM-nets, and thus demonstrate how to model the multimedia Message Filter and Content Enricher. The realization of the Splitter can already be found in Sect. III. In the Appendix we also provide a realization of another important multimedia integration pattern, a Feature Detector, which is, however, not mandatory for our scenario. Some of the patterns are structurally similar to previous works on textual integration patterns (e.g., [10], [21]).

A. Multimedia Operations

We start by outlining various types of multimedia operations used in this section. For simplicity, we consider only the $jpg$ type and assume that all the functions that will be formally defined further extend $\Phi_{jpg}$. Function $countIMGs : IMG \times S \to \mathbb{N}$ takes an image together with a feature pattern and counts all sub-images in the given image that correspond to this pattern. If no sub-images have been detected, the function returns $0$. To detect segments in images, we introduce a function $detectIMG : IMG \times S \to \wp(D)$ that, given an image and a feature, returns a set of segments (of type $Set_{rect}$) corresponding to image parts with the detected feature. In certain scenarios, it is important to highlight detected objects with a text and/or geometrical shapes. Function $markIMG : IMG \times D \times S \times S \to IMG$ is the function that, given an image, segment coordinates, a geometrical shape and a color, draws the colored shape around the specified segment in the image.
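The signatures above can be mirrored in a toy sketch in which images are grids of one-character "pixels" and the actual feature classifier is passed in as a callback. Everything here is hypothetical: `Rect`, `toy_detector`, and the choice to compute `count_imgs` as the number of detected segments are assumptions of this example, and `mark_img` only draws a rectangular outline whatever shape is requested.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

Image = List[List[str]]          # toy image: rows of one-character pixels


@dataclass(frozen=True)
class Rect:
    """Segment coordinates (an element of the rect type)."""
    x: int
    y: int
    w: int
    h: int


Detector = Callable[[Image, str], FrozenSet[Rect]]   # pluggable classifier


def detect_img(img: Image, feature: str, detector: Detector) -> FrozenSet[Rect]:
    """detectIMG: the set of segments showing the given feature."""
    return frozenset(detector(img, feature))


def count_imgs(img: Image, feature: str, detector: Detector) -> int:
    """countIMGs: number of matching sub-images; 0 if none (assumed = |detectIMG|)."""
    return len(detect_img(img, feature, detector))


def mark_img(img: Image, seg: Rect, shape: str, color: str) -> Image:
    """markIMG: return a copy with an outline around seg (shape is ignored here)."""
    out = [row[:] for row in img]
    for yy in range(seg.y, seg.y + seg.h):
        for xx in range(seg.x, seg.x + seg.w):
            on_border = yy in (seg.y, seg.y + seg.h - 1) or xx in (seg.x, seg.x + seg.w - 1)
            if on_border:
                out[yy][xx] = color[0]      # first letter of the color as the "ink"
    return out


def toy_detector(img: Image, feature: str) -> FrozenSet[Rect]:
    """Stand-in classifier: bounding box of all pixels equal to feature[0]."""
    cells = [(x, y) for y, row in enumerate(img) for x, c in enumerate(row) if c == feature[0]]
    if not cells:
        return frozenset()
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return frozenset({Rect(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)})


img = [list("....."), list(".ff.."), list(".ff.."), list(".....")]
faces = detect_img(img, "face", toy_detector)
marked = mark_img(img, Rect(0, 0, 4, 4), "rect", "red")
```

In a realistic setting the detector callback would wrap an image processing library; it is kept abstract here so that the sketch stays self-contained.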

B. Message Filter

A multimedia Message Filter is supposed to check incoming messages (carrying both multimedia objects and their metadata), filtering out those that do not match a certain criterion and routing the others to the output channel.

Fig. 4. MM-net control layer of a Message Filter

In MM-nets, this pattern is realized by encoding the filtering condition directly into the net using view places and SPARQL queries attached to them. Notice that this realization implicitly requires that the metadata already contain the semantic information needed for checking the criterion (e.g., triples specifying how many humans/products are present in the pictures). The net in Fig. 4 starts by consuming a message from input channel place $ch_{in}$ that carries an image identifier and its address in the object storage. View place Images with Tags is equipped with a query that extracts image identifiers together with tags of objects that they contain. The query can be defined as follows: $(\langle ?id, ?tag \rangle, P)$, where $P$ = SELECT ?id, ?tag WHERE { ?id mmdb:containsObj ?tag }. The tags of interest are specified in transition guards that realize filtering conditions. If the image with identifier $id$ satisfies the condition, i.e., the view place contains a pair in which the first element matches the value carried by $id$ and the second element is either "human" or "product", then the token with the input message is routed to the output channel by firing transition Accept. Otherwise, the message gets discarded with transition Discard.

C. Content Enricher

Fig. 5. MM-net control layer of a Content Enricher

A Content Enricher enriches the content of incoming messages using external sources. In our case, it utilizes an image identifier to access its metadata and extract information about detected features, which it then uses to update the image in the object storage by adding extra visual components to it, as shown in Fig. 5.

The net starts with a message that contains object identifier $id$, its address $a$ and some feature key $k$ that will be used for acquiring data for the enrichment. Then the net proceeds by splitting the message into two parts and using the part with $id$ and $k$ to get information about a segment that is characterized by $k$ in the metadata storage. To this end, we use view place SegmentsByKey that has a query with the following graph pattern attached to it: $P$ = SELECT * WHERE { ?id mmdb:faceSegment ?s. ?id mmdb:prodSegment ?s. }. This pattern returns all triples that have mmdb:faceSegment and mmdb:prodSegment as predicates. The net then allows one to choose any segment $seg$ that matches the image identifier and the feature key. Finally, the enrichment step happens when transition Enrich fires and calls the action assigned to it. This action, called UPDIMAGE, updates a (physical) image in the object storage by adding a red oval around the area in it defined by the selected segment $seg$. Formally, UPDIMAGE·add $= (\emptyset, \{a \triangleright markIMG(src(a), seg, \text{"oval"}, \text{"red"})\})$.

D. Discussion

We have demonstrated how some of the most important (multimedia) integration patterns can be formally represented in our formalism. Notice that, similarly to [10] and [21], [22], every pattern should be equipped with input and output channels. This is needed to ensure their flawless, message-based composition: in case of two connected pattern formalizations, the output channel of the first one should have the same type as the input channel of the second. In the scenario described in Ex. 1 and depicted in Fig. 1, the whole process essentially represents a sequential composition of integration patterns and thus can be seamlessly implemented by following the order of patterns in Fig. 1 and by "fusing" output and input channels of two neighboring MM-net pattern representations. Indeed, by mapping every task into its corresponding Petri net-based representation, one can easily build a model formalizing the entire SAP Social Intelligence scenario. Notice, however, that, strictly speaking, textual patterns in Fig. 1 would need to be formalized by using either CPNs [10] or DB-nets [21]. In the first case, the control layer of MM-nets captures the whole class of CPNs, resulting in their seamless adoption when implementing multimedia EAI scenarios. In the case of DB-nets, however, one would need to study in more detail how to implement the multi-model integration scenarios [20] that use both relational and multimedia databases. In our case, the textual Splitter and Content Enricher patterns do not require any database access and thus the whole scenario can be implemented using MM-nets.

Using the scenario in Ex. 1, we also identified a suitable way to formally represent multimedia manipulating functions and semantic operations using data types (cf. Sect. IV-A), SPARQL queries (cf. Sect. III-A) and actions/queries in the data logic layer (cf. Sect. III-B). The derived functions, queries and actions provide full coverage of the physical and logical operations in Tab. I for the studied patterns and, moreover, can be seen as pattern-agnostic, since they may be re-used in other image-based scenarios (formalized using MM-nets) as construction primitives, akin to pattern implementations.

V. RELATED WORK

Ritter et al. [19] showed that, for structured data, the only existing formalization of integration patterns was studied by Fahland et al. [10] using CPNs, and that it was further extended to cover a wider range of integration patterns with more refined requirements in [21] using (timed) DB-nets. However, when considering multimedia integration patterns (cf. [20]), as briefly introduced in Sect. II, these works cannot be used directly (cf. requirements in Tab. II), due to their lack of multimedia data operations (cf. REQ-1(a)), semantic operations (cf. REQ-1(b)), and partially their storage (cf. REQ-1(c)). To the best of our knowledge, there have been no attempts to formalize multimedia integration patterns (cf. [19]).

Formalisms for integration patterns. Although the scenario in Ex. 1 is captured in BPMN [23], this modeling language is not suitable for our requirements (especially REQs-1(a–c), 2). Yet, BPMN diagrams can be formally represented using Petri nets [9]. Petri nets offer a good trade-off between user-friendly graphical modeling and a toolbox for formal analysis of produced models. There are many data-aware extensions of Petri nets (e.g., [2], [8], [16], [18]) allowing to account for more complex, structured data. However, they cannot be readily used for representing multimedia EAI for reasons similar to those discussed above. Alternatively, it is possible to study other modeling requirements in which the multimedia message is treated as the first-class citizen, while the object storage is simply left out. Under this assumption, one could use the formalism of Petri nets with structured data [2] (StDNs for short) for modeling multimedia EAI, as tokens in it carry XML documents that, in turn, could be used for representing multimedia metadata, the core concept of the multimedia message. The authors also delineate restrictions required for the decidability of such properties as termination, coverability and boundedness. However, the formalism still does not account for persistent data (a violation of REQ-1(d), which is crucial for some patterns) and would need to be extended with the support of functions.

Mederly et al. [15] studied an approach for formalizing integration patterns in which messages are first-order formulas and patterns are operations that add and delete messages, and which uses AI planning for finding an integration process with a minimal number of components. While this approach shares the formalization objective, MM-nets apply to a broader set of objectives (e.g., formal analysis, simulation) and cover multimedia data, semantics and storage (cf. REQs-1(a–c)).

Multimedia data. The approach for storing and querying multimedia data employed in this work is similar to the one in the OCAPI system [6], which was developed for the semantic integration of image data using knowledge bases. Retrieval of multimedia information from (distributed) databases is covered in [5]. The multimedia semantics are represented by semantic attributes based on extended generalized icons with a logical and physical representation in a database. While our approach separates these different representations as well, [5] targets extended normal forms and functional dependencies between different attributes and does not define user interaction with the multimedia semantics on a business application-relevant feature level that could be used for message processing. More recently, [14] developed an image similarity query mechanism in the area of multimedia queries in multimedia databases. While no query syntax is provided, the approach could be used to formulate decisions based on image similarity.

VI. CONCLUSIONS

The previous work on EAI with multimedia data [19], [20] pinpointed two main issues in the domain. On the one hand, it argued that the integration patterns for multimedia scenarios are still not fully investigated, which can cause their wrong adoption in the EAI development stack. On the other hand, there is no formalization of these patterns that would allow one to minimize design-time mistakes, facilitate model-driven development and provide the possibility of checking the correctness of the implemented multimedia EAI scenarios. In this work we focused on the second issue and distilled a list of requirements (cf. Tab. II) for the formal representation of multimedia EAI. To address these requirements, we studied a formalism of MM-nets that marries CPNs and multimedia databases, and that allows one to specify operations that manipulate both the multimedia objects and their metadata. The paper also presents how MM-nets can be used for formalizing some of the most frequently used multimedia integration patterns. We believe that the formalism studied in this paper can also provide more insights on engineering multimedia EAI. Currently we are working on developing a CPN Tools-based prototype for modeling and simulating MM-nets and studying more in-depth formal analysis of MM-net models. In the future, it would also be interesting to study a domain-independent language for representing multimedia manipulation functions and creating their repository in order to facilitate their adoption in different multimedia EAI scenarios.

REFERENCES

[1] S. Abiteboul, M. Arenas, P. Barceló, M. Bienvenu, D. Calvanese, et al. Research directions for principles of data management (Dagstuhl perspectives workshop 16151). Dagstuhl Manifestos, 7(1):1–29, 2018.
[2] E. Badouel, L. Hélouët, and C. Morvan. Petri nets with structured data. Fundam. Inform., 146(1):35–82, 2016.
[3] M. A. Bornea, J. Dolby, A. Kementsietsidis, K. Srinivas, P. Dantressangle, O. Udrea, and B. Bhattacharjee. Building an efficient RDF store over a relational database. In SIGMOD, pages 121–132. ACM, 2013.
[4] M. Broy. Seamless model driven systems engineering based on formal models. In Formal Methods and Software Engineering, pages 1–19. Springer Berlin Heidelberg, 2009.
[5] S. Chang, V. Deufemia, G. Polese, and M. Vacca. A normalization framework for multimedia databases. IEEE Trans. Knowl. Data Eng. […]
[…] AAAI, pages 1091–1099, 2017.
[9] R. M. Dijkman, M. Dumas, and C. Ouyang. Formal semantics and analysis of BPMN process models using Petri nets. QUT Tech. Rep., 2007.
[10] D. Fahland and C. Gierds. Analyzing and completing middleware designs for enterprise integration using coloured Petri nets. In CAiSE […]
[…] Enterprise integration patterns: Designing, building, and deploying messaging solutions. Addison-Wesley, 2004.
[13] R. Kontchakov, M. Rezk, M. Rodriguez-Muro, G. Xiao, and M. Zakharyaschev. Answering SPARQL queries over databases under OWL 2 QL entailment regime. In ISWC, pages 552–567, 2014.
[14] S. Lin, M. T. Özsu, V. Oria, and R. T. Ng. An extendible hash for multi-precision similarity querying of image databases. In VLDB, pages 221–230, 2001.
[15] P. Mederly, M. Lekavý, M. Závodský, and P. Navra. Construction of messaging-based enterprise integration solutions using AI planning. In CEE-SET, pages 16–29, 2009.
[16] M. Montali and A. Rivkin. DB-nets: On the marriage of colored Petri nets and relational databases. Trans. Petri Nets Other Model. Concurr., 12:91–118, 2017.
[17] J. Pérez, M. Arenas, and C. Gutiérrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3):16:1–16:45, 2009.
[18] A. Polyvyanyy, J. M. E. M. van der Werf, S. Overbeek, and R. Brouwers. Information systems modeling: Language, verification, and tool support. In CAiSE, volume 11483, pages 194–212. Springer, 2019.
[19] D. Ritter, N. May, and S. Rinderle-Ma. Patterns for emerging application integration scenarios: A survey. Information Systems, 67:36–57, 2017.
[20] D. Ritter and S. Rinderle-Ma. Toward application integration with multimedia data. In EDOC, pages 103–112, 2017.
[21] D. Ritter, S. Rinderle-Ma, M. Montali, and A. Rivkin. Formal foundations for responsible application integration. Information Systems, page 101439, 2019.
[22] D. Ritter, S. Rinderle-Ma, M. Montali, A. Rivkin, and A. Sinha. Formalizing application integration patterns. In EDOC, pages 11–20, 2018.
[23] D. Ritter and J. Sosulski. Exception handling in message-based integration systems and modeling using BPMN. Int. J. Coop. Inf. Syst., 25(2), 2016.
[24] F. Rosa-Velardo and D. de Frutos-Escrig. Decidability and complexity of Petri nets with unordered data. Theoretical Computer Science, 412(34):4439–4451, 2011.
[25] C. Tzelepis, Z. Ma, V. Mezaris, B. Ionescu, I. Kompatsiaris, G. Boato, N. Sebe, and S. Yan. Event-based media processing and analysis: A survey of the literature. Image Vis. Comput., 53:3–19, 2016.
[26] O. Zimmermann, C. Pautasso, G. Hohpe, and B. Woolf. A decade of enterprise integration patterns: A conversation with the authors. IEEE Software, 33(1):13–19, 2016.

APPENDIX

A. Feature Detector

Fig. 6. The control layer of an MM-net representing a Feature Detector

A Feature Detector is a pattern that updates metadata of an object. More specifically, it uses concrete feature classifiers, based on which it retrieves data that are later added to the metadata storage. In our case, this pattern uses a picture identifier to access its metadata and to add information on how many humans and products are in the picture and, if any have been detected, provides data on the coordinates of segments where human faces as well as products can be found.

The net starts by executing transition Detect that, in turn, calls action ADDIMGCNT, which has two formal parameters $id$ and $a$ and, upon firing, adds two RDF triples to the metadata storage: $(id,\ mmdb{:}faceCount,\ countIMGs(src(a), \text{"human face"} :: L))$ and $(id,\ mmdb{:}prodCount,\ countIMGs(src(a), \text{"product"} :: L))$. The net then proceeds with updating the metadata storage with the information about coordinates of sub-images that contain either human faces or products. By firing transition Get Face Segments, the net generates a set of segments with faces. While this set is not empty, each of its elements gets removed (using function $rem$) and added to the metadata storage with transition Update Face Metadata. This transition calls action UPDMETADATA that has three formal parameters, UPDMETADATA·params $= \langle id, l, seg \rangle$, where $id$ is an image identifier, $l$ is an IRI and $seg$ is a segment. This action does not remove anything and adds to the metadata storage only one triple $(id :: L,\ l :: I,\ seg :: L)$. In the case of Update Face Metadata, UPDMETADATA adds to the metadata (of an image with identifier $id$) information about one face segment taken from set $s$, which is specified with IRI mmdb:faceSegment. When the set of segments is empty, the net performs a similar procedure with product segments. That is, it first gets a set of all the product segments by firing Get Product Segments, and then updates the metadata storage by consecutively firing Update Product Metadata for each segment from the set. After all the updates are done, the net finishes its computation by firing Finish and placing a token with the image identifier and address into place $ch_{out}$.
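The sequence of firings described above can be approximated by a sequential sketch. It is only an illustration: `feature_detector`, `empty`, `get_l` and `rem` are hypothetical names mirroring $empty$, $getL$ and $rem$, sets of segments are modelled as ordered tuples so that "last element" is well defined, and the classifiers are stubbed with fixed answers.

```python
from typing import Callable, Dict, Set, Tuple

Segment = Tuple[int, int, int, int]          # rectangle coordinates
Triple = Tuple[str, str, object]


def empty(s: tuple) -> bool:
    return len(s) == 0


def get_l(s: tuple):
    """getL: the last element of the (ordered) set."""
    return s[-1]


def rem(s: tuple, e) -> tuple:
    """rem: the set without element e."""
    return tuple(x for x in s if x != e)


def feature_detector(
    image_id: str,
    address: str,
    objects: Dict[str, object],
    triples: Set[Triple],
    count_imgs: Callable[[object, str], int],
    detect_img: Callable[[object, str], Tuple[Segment, ...]],
) -> Tuple[str, str]:
    img = objects[address]                                   # src(a)
    # Detect / ADDIMGCNT: record how many faces and products were found.
    triples.add((image_id, "mmdb:faceCount", count_imgs(img, "human face")))
    triples.add((image_id, "mmdb:prodCount", count_imgs(img, "product")))
    # Get Face Segments, then Get Product Segments, each followed by its update loop.
    for feature, predicate in (("faces", "mmdb:faceSegment"),
                               ("products", "mmdb:prodSegment")):
        s = tuple(detect_img(img, feature))
        while not empty(s):                                  # guard [not empty(s)]
            seg = get_l(s)
            triples.add((image_id, predicate, seg))          # UPDMETADATA(id, l, seg)
            s = rem(s, seg)
    return (image_id, address)                               # Finish: token for ch_out


# Stub classifiers with fixed answers (assumptions of this sketch).
FOUND = {"faces": ((0, 0, 10, 10), (20, 0, 10, 10)), "products": ()}
COUNTS = {"human face": 2, "product": 0}
metadata: Set[Triple] = set()
token = feature_detector(
    "img1", "a1", {"a1": "raw-bytes"}, metadata,
    count_imgs=lambda img, f: COUNTS[f],
    detect_img=lambda img, f: FOUND[f],
)
```

With these stubs, the metadata ends up with both counts and one mmdb:faceSegment triple per detected face, and no mmdb:prodSegment triples since no products were found.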