Formalizing Integration Patterns with Multimedia Data (Extended Version)
Marco Montali, Andrey Rivkin
Free University of Bozen-Bolzano, {lastname}@inf.unibz.it
Daniel Ritter
Abstract: Previous work on formalizing enterprise application integration (EAI) scenarios showed an emerging need for formal foundations for integration patterns, the EAI building blocks, in order to facilitate model-driven development and ensure its correctness. So far, the formalization requirements have focused on more “conventional” integration scenarios, in which control flow, transactional persistent data and time aspects were considered. However, none of these works took into consideration another emerging EAI trend that covers social and multimedia computing. In this work we propose a Petri net-based formalism that addresses requirements arising from the multimedia domain. We also demonstrate realizations of one of the most frequently used multimedia patterns and discuss the implications our formal proposal may have for multimedia EAI development.
Index Terms: high-level Petri nets, enterprise integration patterns, multimedia data
I. INTRODUCTION
Recent business and socio-technical trends increasingly rely on smart applications with advanced analysis techniques, the IoT, and business and social networks [1], [19], [26]. This entails the need to employ enterprise application integration (EAI) for processing unstructured multimedia and semantic data, with concrete applications like smart logistics, disease detection in agriculture and health care, and social sentiment analysis. The latter has recently been studied in the context of multimedia EAI in [19], [20]. More generally, the need for handling multimodel and knowledge-enriched data (incl. text and multimedia data) was also identified in the related data management [1] and event-based processing domains (considering audio, video and social events) [25].

While multimedia integration solutions become more relevant and complex, solid formal foundations are crucial in order to ensure the behavioral correctness of multimedia EAI solutions (cf. [19]). Such formal foundations have been given by formalizing the execution semantics of integration patterns [12], [19], the building blocks of EAI solutions, using colored Petri nets (CPNs) [10] and (timed) DB-nets [21], [22]. Still, the results of these works do not apply to multimedia data, and the recent survey [19] identifies a lack of a suitable formalism for multimedia integration patterns.
Example 1: Fig. 1 shows an excerpt from a social media sentiment harvesting application (cf. https://tinyurl.com/yautcagl), in which images (and texts) are either collected from Human Data Intelligence providers or directly from social media sources like Facebook, Twitter, or Instagram. To guide the search, a social intelligence system (e.g., ERP, CRM) provides lists of topics and keywords of interest as well as time- or item-based metadata like a sinceId, denoting the earliest feed of interest. Then, using textual Splitter and Content Enricher patterns, distinct queries are separated into multiple request messages with the sinceId as header (H). The resulting media feed entries contain images in the message body (B) that are processed by multiple subsequent steps. First, images without humans or products are filtered out using an image Message Filter (i.e., no sentiment about a product). Then an image Content Enricher marks the features, along which the relevant parts in the images are split into separate messages by an image Splitter. Finally, an image Enricher determines the emotional state of the human (towards the product) and adds the information to the image message, while preserving the image. The images with marked and determined sentiment as well as the association to the original topic are returned to the social intelligence system. ■

Fig. 1. SAP Social Intelligence – image sentiments (excerpt)

In the absence of a formal representation of integration processes like the one in Ex. 1, questions like “what does the process do?”, “is it functionally correct?” and “how can it be improved?” cannot be answered. Consequently, reasoning about integration patterns with multimedia data (or even combined with textual data processing) is currently not possible, but desirable.
To answer these questions, this work combines the streams of previous research on EAI with multimedia data [20] and formal representations of integration patterns on textual data [10], [21], [22] towards a novel formal representation of multimedia EAI solutions. Apart from being defined with a rigorous mathematical toolbox, this representation should allow multimedia data in integration patterns to be represented formally and should support further theoretical development along the line of formal analysis. To this end, we build upon previous works on EAI formalization using CPNs and DB-nets, and propose a new Petri net-based formalism called multimedia nets (MM-nets for short). It can essentially be seen as a marriage between CPNs and a multimedia storage whose conceptual representation is tuned to address various requirements specific to multimedia data management in the EAI context (e.g., representation of multimedia messages, multimedia operations).

In summary, the main contributions of this work, along its outline, are threefold. (1) First, we analyze multimedia data integration patterns regarding their requirements for defining a suitable formalism in Sect. II. (2) Then, in Sect. III, we study the formal syntax and semantics of MM-nets and discuss certain design decisions behind the conceptual representation of the multimedia data-related parts of the formalism. To the best of our knowledge, this is the first attempt to propose a formalism that accounts for semantic knowledge and the way it is manipulated along a process execution. (3) Finally, we give semantically correct realizations of one of the most frequently used multimedia integration patterns in Sect. IV. In Sect. V we discuss related work, and we conclude by briefly discussing open research challenges in Sect. VI.

II. BACKGROUND AND REQUIREMENTS ANALYSIS
In this section we briefly summarize multimedia integration patterns, from which we derive requirements for a suitable formalization. We compare these requirements to the closest known related work on formalisms for textual integration patterns using CPNs [10] and (timed) DB-nets [21], [22].
A. Multimedia Integration Patterns
In previous work [20], we identified several integration patterns from the pattern catalogs [12], [19], [23] that are especially relevant for multimedia data (cf. Tab. I). In addition to the pattern name and the corresponding multimedia operation, the (semantic) configuration arguments relevant for modeling such patterns are added. While, in general, the tasks of the multimedia patterns are similar to those working with textual data, they differ in terms of the message representation as well as the performed operations and required storage (cf. [20]). A multimedia message consists of a body and an optional set of attachments, and both of them “physically” contain multimedia data (e.g., an image). In addition to a set of key-value header entries denoting metadata concerning the data exchange (e.g., HTTP headers), there is a set of properties that carries the semantics of the multimedia data (e.g., human with emotion, product) in the message body (or attachments). In [20] it is assumed that a multimedia message is transient (i.e., processed in a pipes-and-filter style), and that all operations are executed directly on media objects and their metadata contained in the message. Moreover, for representing multimedia messages, [20] adapts a concept from the multimedia database domain (e.g., cf. [5]), which separates the logical and physical representations to isolate the runtime from modeling, as abstractly reflected in the multimedia message model in Fig. 2.

TABLE I
INTEGRATION PATTERN MULTIMEDIA ASPECTS; ALL INFORMATION APART FROM DB TAKEN FROM [20]
(Phy: physical operation, Log: logical operation, recal.: re-calculated, DB: persistent storage required)

| Pattern Name               | Multimedia Operation     | Arguments                              | Phy    | Log          | DB  |
|----------------------------|--------------------------|----------------------------------------|--------|--------------|-----|
| Channel Adapter            | format conversion        | format indicator                       | write  | create       | no  |
| Splitter                   | fixed grid, object-based | grid: horizontal, vertical cuts; object | create | recal./write | no  |
| Router, Filter             | select object            | object                                 | -      | read         | no  |
| Aggregator                 | fixed grid, object-based | grid: rows, columns, heights, width    | create | recal./write | yes |
| Translator, Content Filter | coloring                 | color (scheme)                         | write  | recal./write | no  |
| Content Enricher           | add shape, OCR text      | object, shape+color, text              | write  | recal./write | yes |
| Feature Detector           | segmentation, matching   | object classifier                      | read   | create       | no  |
| Image Resizer              | scale image              | size: height, width                    | write  | write        | no  |
| Idempotent Receiver        | detector, similarity     | object for comparison                  | -      | read         | yes |
| Message Validator          | detector                 | validation criteria                    | -      | read         | no  |

The physical representation accounts for the actual multimedia data, and thus operations on the physical representation literally read (i.e., read), create new (i.e., create), or change existing multimedia data (i.e., write), like cutting parts of or resizing an image. When the physical representation is read and interpreted, semantic information is extracted (e.g., a detected human emotion), together with additional information like coordinates of the detected object (coord.), its color and the confidence of the detection (cf.
Conf.; e.g., type=“human” with Conf.=0.85). The detection is done by a Feature Detector pattern (cf. Tab. I) that has a set of ML-trained classifiers for each expected feature in multimedia data. During the detection, the distinct features are identified using the classifiers and the corresponding logical representation is created. In contrast to [5], where a relational multimedia model is used, the semantic information is then represented logically as part of the domain object model with references to the physical representation. This could be represented using, for example, the RDF standard [7] (which we rely on in the formalism presented in Sect. III-A). For example, Fig. 2 denotes an abstract view of the domain object model by visually representing the semantic concepts as Type (e.g., virtual human), with sub-types SType (e.g., emotion). Note that formats such as XSD or WSDL, in which business domain objects are encoded (e.g., business partner, customer, employee), are not sufficient (cf. [20]) as they are normally used for representing textual domain models.

Fig. 2. Conceptual Multimedia Message Model (from [20])

During the modeling of a process, the user works close to the logical representation by implicitly using operations like read/query (i.e., read), change (i.e., write), newly create (i.e., create), and adapt changes without new detection or creation (i.e., recal.). The aforementioned detector can also be used for cases when the physical representation has changed and the corresponding logical part is invalidated, thus requiring a re-detection (write). However, for efficiency reasons, if the effect of the physical operation on the logical representation is known, then a recalculation of the logical part alone suffices (recal.). As such, the logical representation denotes a canonical data model based on the domain model or message schema of the multimedia messages as well as operations on them. Tab. I shows the physical (Phy) and logical (Log) operations required by the different patterns as well as database information (DB) indicating whether a pattern requires persistent storage for its operation.

Example 2: The image Splitter uses an object-based splitting (cf. Tab. I), where the object is a human face. During the processing, new physical multimedia objects (e.g., images of human faces in the original image) are created (Phy:create) and the logical representation is either created from scratch (Log:create) or recalculated (Log:recal.) according to the knowledge about the split. Notice that this does not require a persistent storage of multimedia messages. ■
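The message model and the object-based split of Ex. 2 can be illustrated with a minimal Python sketch. It is not taken from [20]: the class names, the toy image encoding (rows of pixel values) and the rectangle encoding `(x0, y0, x1, y1)` are assumptions made purely for illustration. The sketch creates a new physical object per detected feature (Phy:create) and recalculates the logical part from the known geometry of the split (Log:recal.) instead of running a new detection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Image = List[List[int]]          # toy physical representation: rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical segment encoding


@dataclass
class Feature:
    """Logical representation of one detected feature (names are illustrative)."""
    type: str    # e.g. "human"
    coord: Box   # where the feature sits in the physical image
    conf: float  # detection confidence, e.g. 0.85


@dataclass
class MultimediaMessage:
    body: Image                                              # physical part
    headers: Dict[str, str] = field(default_factory=dict)    # exchange metadata
    properties: List[Feature] = field(default_factory=list)  # logical part
    attachments: List[Image] = field(default_factory=list)   # optional


def split_by_object(msg: MultimediaMessage, wanted: str) -> List[MultimediaMessage]:
    """Object-based split: one new message per feature of the wanted type."""
    parts = []
    for feat in msg.properties:
        if feat.type != wanted:
            continue
        x0, y0, x1, y1 = feat.coord
        # Phy:create -- a new physical object is cut out of the original image.
        crop = [row[x0:x1] for row in msg.body[y0:y1]]
        # Log:recal. -- the split is known, so the coordinates are shifted to
        # the crop's origin instead of running a new detection.
        moved = Feature(feat.type, (0, 0, x1 - x0, y1 - y0), feat.conf)
        parts.append(MultimediaMessage(crop, dict(msg.headers), [moved]))
    return parts


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
msg = MultimediaMessage(image, {"sinceId": "42"}, [Feature("human", (1, 1, 3, 3), 0.85)])
parts = split_by_object(msg, "human")
```

The original message is left untouched and nothing is persisted, which matches the observation that the Splitter needs no persistent storage.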
B. Formalization Requirements
The formalization requirements of multimedia integration patterns are derived from the patterns in Tab. I. The base requirements (also found for integration patterns on textual data [21]) are necessary for representing the control flow that messages go through in the integration process (i.e., REQ-0 “Control flow (pipes and filter)”). The next requirements concern the processing of multimedia data. As discussed before, the data processing has two different aspects, dealing with the representation of multimedia messages, and with physical and logical operations on the messages.

The representation of multimedia messages requires support for multimedia and semantic data, which is not provided by the CPN [10] and (timed) DB-net [21], [22] approaches (i.e., REQ-1(a) “Multimedia message representation”). Physical multimedia operations on the data require capabilities like marking an image with some geometrical shape for the Content Enricher (i.e., REQ-1(b) “Multimedia data operations”), whereas the logical representation requires keeping the logical part “up-to-date” (i.e., it does not describe features of a physical object that are not there). This not only allows the data stored in the logical part to be processed efficiently, e.g., by using SPARQL queries (cf. [20]), but also supports the modeling, during which semantic operations on the multimedia data can be specified (i.e., REQ-1(c) “Semantic / metadata operations”). Another data processing aspect concerns the persistent storage for patterns like the Aggregator and Idempotent Receiver, which require storage for their operations (i.e., REQ-1(d) “Persistently store multimedia data, semantic data / metadata”). In addition to the functional requirements, it is important to provide a suitable formal representation of such multimedia integration patterns so as to facilitate their correct representation and further development (i.e., REQ-2 “Formal rigorous semantics”), as done in model-driven development [4]. Notice that further requirements from [21] like time, transaction and exception handling are out of scope, since they are not directly related to multimedia data.

Tab. II summarizes the formalization requirements that we consider in this work and shows how they are covered by the two approaches based on colored Petri nets [10] and DB-nets [16], [21], which, to date, are the only ones that have been used for formalizing integration patterns. While CPNs provide a solid foundation for control (cf. REQ-0) and a simple data flow representation, DB-nets extend the latter towards the support of persistent data with CRUD operations for working with external, transactional databases. However, none of them supports multimedia data or semantic/metadata operations (REQ-1(b)–(c)). CPNs do not support the modeling of persistent storage, whereas DB-nets do not allow for a conceptually correct representation of multimedia data, as they cannot support two different storages (one for logical and the other for physical data) that would also need to be managed differently. As long as no multimedia or semantic data aspects are concerned, DB-nets can store that data (REQ-1(d)). The well-defined semantics of CPNs and DB-nets allows various types of model-based analysis to be conducted, ranging from model-based testing via simulation to complex verification using variants of temporal logics.

III. MULTIMEDIA NETS
In this section we present the formalism of MM-nets, which builds upon CPNs and takes inspiration from the multi-layered representation adopted within the DB-net approach by subsequently defining multimedia data and semantic operations as well as multimedia storage for the data-related requirements (cf. REQ-1(a)–(d) from Tab. II). Conceptually, MM-nets are structured as follows: (i) a multimedia storage stores multimedia data together with their metadata; (ii) a control layer employs a variant of CPNs to capture the control-flow dimension of the modeled process; (iii) a data logic layer embodies a communication interface between the multimedia and control layers. Using the data logic, the control layer can access the underlying multimedia storage (and tune its own behavior depending on the obtained answer) as well as update it with data carried by the tokens and additional data obtained from the external world. In what follows, we study every layer in detail and provide a formal definition of an MM-net. We also discuss how, in spite of certain conceptual and operational differences, DB-nets can be related to MM-nets. This observation provides insights into the possibility of adopting formal analysis techniques studied for DB-nets for the formalism of MM-nets.

TABLE II
FORMALIZATION REQUIREMENTS (COVERAGE: covered, partially, not)

| ID       | Requirement                                                   | CPN     | (timed) DB-net |
|----------|---------------------------------------------------------------|---------|----------------|
| REQ-0    | Control flow (pipes and filter)                               | covered | covered        |
| REQ-1(a) | Multimedia message representation                             | not     | not            |
| REQ-1(b) | Multimedia data operations                                    | not     | not            |
| REQ-1(c) | Semantic / metadata operations                                | not     | not            |
| REQ-1(d) | Persistently store multimedia data, semantic data / metadata  | not     | partially      |
| REQ-2    | Formal rigorous semantics                                     | covered | covered        |

A. Multimedia storage
A data type is $\mathcal{D} = \langle \Delta_{\mathcal{D}}, \Gamma_{\mathcal{D}}, \Phi_{\mathcal{D}}, \Sigma_{\mathcal{D}} \rangle$, where $\Delta_{\mathcal{D}}$ is a value domain, $\Gamma_{\mathcal{D}}$ and $\Phi_{\mathcal{D}}$ are finite sets of predicate and function symbols defined on top of elements of $\Delta_{\mathcal{D}}$, and $\Sigma_{\mathcal{D}}$ is the signature interpretation, i.e., a function associating each predicate symbol $S$ (resp., function symbol $f$) of arity $n$, denoted as $S/n$ (resp., $f/n$), to an $n$-ary relation $\Sigma(S) \subseteq \Delta_{\mathcal{D}}^{n}$ (resp., to an $n$-ary function $\Sigma(f) : \Delta_{\mathcal{D}_1} \times \ldots \times \Delta_{\mathcal{D}_n} \to \Delta_{\mathcal{D}}$, where each $\mathcal{D}_i$ is some type, possibly different from $\mathcal{D}$). For the sake of brevity, hereinafter we omit the signature interpretation in the data type definitions.

Examples of data types are: $str = \langle \mathbb{S}, \{=_{s}\}, \emptyset \rangle$, strings with the equality predicate; $int = \langle \mathbb{Z}, \{=_{int}, <_{int}\}, \{succ : \mathbb{Z} \to \mathbb{Z}\} \rangle$, integers with the usual comparison operators as well as the successor function; $jpg = \langle IMG, \emptyset, \{sub : IMG \times IMG \to IMG\} \rangle$, images in JPG format with a binary image subtraction function. We use $\mathfrak{D}$ to denote a type domain, that is, a finite set of data types, and write $\square_{\mathfrak{D}} = \bigcup_{\mathcal{D} \in \mathfrak{D}} \square_{\mathcal{D}}$, for $\square \in \{\Delta, \Gamma, \Phi\}$. Also, for ease of presentation, we single out a domain of multimedia (object) types $\mathfrak{D}_{MO}$, s.t. $\mathfrak{D}_{MO} \cap \mathfrak{D} = \emptyset$, and fix a string-based type $oid = \langle \mathbb{S}, \{=_{oid}\}, \emptyset \rangle \in \mathfrak{D}$ for specifying proper addresses of objects. Functions from $\Phi_{\mathfrak{D}_{MO}}$ provide the basis for defining the operations discussed in REQ-1(b)–(c). Finally, we shall use a function $type$, defined on $\mathfrak{D} \cup \mathfrak{D}_{MO}$, to return the data type of a variable or a value.

In this work we make a design decision for modeling multimedia data in which one focuses on the object metadata and treats them as “first-class citizens” (e.g., similar to [5]), assuming that the actual (multimedia) objects are kept in some storage and can be accessed/manipulated only by references. In this case one should distinguish two different types of object manipulations.
One type focuses on the way the objects are accessed and viewed/manipulated (e.g., accessing and resizing an image stored on some server), whereas the other considers auxiliary information about objects and thus allows for treating them in a more refined way. To account for the first type, we introduce an object storage (or database) $O_{db}$ that stores multimedia objects of various types. We shall not go into technical aspects of such a database, but instead just assume that it provides functionality for adding and deleting multimedia data, and that every object can be accessed by using its proper address. W.l.o.g., we formally treat $O_{db}$ as a set of pairs $(a, mo)$, where $mo$ is a multimedia object stored in $O_{db}$ and $a$ is its address of type $oid$, and assume that all the addresses are unique and two distinct objects can never be referenced by the same address. For convenience, we introduce two functions: $addr : \mathfrak{D}_{MO} \to \Delta_{oid}$ that, given a multimedia object, returns its address, and $src : \Delta_{oid} \to \mathfrak{D}_{MO}$ that, given an address, returns the object that this address is pointing at.

The metadata of all the objects from $O_{db}$ together with their addresses are kept in a metadata storage $M_{db}$ that is represented as an RDF graph: a set of statements $(s, p, o)$, with $s$ being a subject, $p$ being a predicate and $o$ being an object. Each statement triple is an atomic construct. Its subject describes an information resource, while its predicate represents a statement property referenced by an internationalized resource identifier (IRI) whose value is the statement object. Note that, while $s$, $p$ and $o$ carry values of IRIs, $s$ and $o$ can also be RDF literals (for more details see [7]). Notably, IRIs as objects can be used to represent more complex, tree-structured values. The usage of IRIs in RDF statements is crucial as it allows for the unambiguous identification of information resources. As opposed to [7], we do not use blank nodes and thus consider only ground RDF graphs.
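To make the two storages concrete, the following Python sketch models the object storage as a dictionary from addresses to objects (which enforces that an address never refers to two distinct objects) and the metadata storage as a set of ground triples. The class name, the `put` helper, the byte-string objects and the sample address are assumptions for illustration only; they are not part of the formalism.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # ground (subject, predicate, object); no blank nodes


class MultimediaStorage:
    """Toy multimedia storage instance (M_db, O_db)."""

    def __init__(self) -> None:
        self.o_db: Dict[str, bytes] = {}  # address -> multimedia object
        self.m_db: Set[Triple] = set()    # metadata as a ground RDF graph

    def put(self, address: str, obj: bytes) -> None:
        """Hypothetical helper: store (or overwrite) the object at an address."""
        self.o_db[address] = obj

    def src(self, address: str) -> bytes:
        """src: given an address, return the object it points at."""
        return self.o_db[address]

    def addr(self, obj: bytes) -> str:
        """addr: given a stored object, return its address."""
        for address, stored in self.o_db.items():
            if stored == obj:
                return address
        raise KeyError("object is not stored")


store = MultimediaStorage()
store.put("oid:img1", b"jpeg-bytes")
store.m_db.add(("img1", "mmdb:address", "oid:img1"))
```

Objects are only ever reached through their addresses here, mirroring the assumption that multimedia objects are accessed and manipulated by reference.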
In what follows, we shall use $\Delta_{\mathcal{L}}$ to denote an infinite set of RDF literals and $\Delta_{\mathcal{I}}$ to denote an infinite set of IRIs, and we may collectively refer to both of them as RDF terms. Here, $\mathcal{L} = \langle \Delta_{\mathcal{L}}, \Gamma_{\mathcal{L}}, \Phi_{\mathcal{L}} \rangle$ and $\mathcal{I} = \langle \Delta_{\mathcal{I}}, \Gamma_{\mathcal{I}}, \Phi_{\mathcal{I}} \rangle$ respectively denote the data types of RDF literals and IRIs, where $\mathcal{L}, \mathcal{I} \in \mathfrak{D}$, and $\Gamma$ and $\Phi$ are potentially nonempty sets of predicate and function symbols that we intentionally leave unspecified, as their content depends on a concrete scenario (or, more specifically, on the database management system used). Lastly, given the complexity of data type management in RDF, for ease of presentation we employ a type casting function $::$ that, given $x$ and a target type $t$, returns the value of $x$ cast to $t$. W.l.o.g., we assume the extension of this function to variables.

To query the multimedia storage we adopt SPARQL, the standard W3C pattern-matching language for querying RDF graphs [11]. There are plenty of formal ways to define the syntax of SPARQL queries as well as the semantics of pattern evaluation. In this paper, instead, we only provide the intuitions necessary for understanding how metadata are accessed and how SPARQL query answers can be manipulated in the context of the studied formalism of MM-nets. Let $\mathcal{V}_{RDF}$ be an infinite set $\{?x, ?y, \ldots\}$ of RDF variables, where for each $?x \in \mathcal{V}_{RDF}$, $type(?x) = \mathcal{I} \cup \mathcal{L}$. The basic building block of SPARQL queries is a triple pattern, i.e., a tuple from $(\Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}} \cup \mathcal{V}_{RDF}) \times (\Delta_{\mathcal{L}} \cup \mathcal{V}_{RDF}) \times (\Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}} \cup \mathcal{V}_{RDF})$. Finite sets of such tuples form basic graph patterns (BGPs) [11]. More complex graph patterns are inductively constructed from BGPs using various operations (e.g., OPT, JOIN, UNION) that are applicable to graph patterns and built-in conditions. The semantics of graph patterns is defined in terms of partial functions $\theta : \mathcal{V}_{RDF} \to \Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}}$ called mappings. Given a BGP $P$, $\theta(P)$ denotes the BGP obtained by applying $\theta$ to all variables in $P$. We use a function $Vars(P)$ to denote the set of all variables in $P$.
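The notions of mapping, $\theta(P)$ and $Vars(P)$ can be sketched in Python as follows. This is a naive illustration, not a SPARQL engine: terms are plain strings, variables are assumed to be the strings starting with `?`, and `eval_bgp` enumerates the mappings defined exactly on $Vars(P)$ whose application sends every triple pattern to a triple of the graph. All function names and the sample graph are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # partial function from variables to RDF terms


def is_var(term: str) -> bool:
    """Assumption of this sketch: variables are strings starting with '?'."""
    return term.startswith("?")


def vars_of(pattern: Iterable[Triple]) -> Set[str]:
    """Vars(P): all variables occurring in the BGP."""
    return {t for tp in pattern for t in tp if is_var(t)}


def apply_mapping(theta: Mapping, pattern: Iterable[Triple]) -> Set[Triple]:
    """theta(P): replace every variable bound by theta with its value."""
    return {tuple(theta.get(t, t) for t in tp) for tp in pattern}


def eval_bgp(pattern: List[Triple], graph: Set[Triple]) -> List[Mapping]:
    """Naive matching: all mappings on Vars(P) that send P into the graph."""
    mappings: List[Mapping] = [{}]
    for tp in pattern:
        extended = []
        for theta in mappings:
            for triple in graph:
                cand = dict(theta)
                # A variable must agree with earlier bindings; a constant
                # must equal the corresponding term of the graph triple.
                if all(
                    cand.setdefault(t, v) == v if is_var(t) else t == v
                    for t, v in zip(tp, triple)
                ):
                    extended.append(cand)
        mappings = extended
    return mappings


graph = {
    ("img1", "mmdb:faceCount", "2"),
    ("img2", "mmdb:faceCount", "0"),
    ("img1", "mmdb:format", ".jpg"),
}
bgp = [("?id", "mmdb:faceCount", "?c"), ("?id", "mmdb:format", "?f")]
result = eval_bgp(bgp, graph)
```

Only `img1` carries both properties, so it is the only resource that yields a mapping for this two-pattern BGP.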
Both $\theta$ and $Vars$ can be easily extended to account for tuples of variables. Given an RDF graph $M_{db}$, the evaluation of a graph pattern $P$ over $M_{db}$ is specified as the set $[\![P]\!]_{M_{db}}$ of mappings inductively defined using SPARQL operations, with the BGP evaluation as the base case [13], [17]. Notably, for the pattern evaluation we use the simple entailment semantics, in which, in the base case, for every mapping $\theta \in [\![P]\!]_{M_{db}}$, it holds that $\theta(P) \subseteq M_{db}$.

SPARQL queries use the results of the pattern evaluation to form result sets and come in four different forms: SELECT, ASK, CONSTRUCT and DESCRIBE [11]. However, in this work we are only interested in the first two. A SELECT query can be abstractly defined as $(\vec{w}, P)$, where $P$ is a graph pattern and $\vec{w} = \langle w_1, \ldots, w_k \rangle$ is a vector of answer variables, such that $\{w_1, \ldots, w_k\} \subseteq Vars(P)$. Such a query is then evaluated over a graph $M_{db}$ by applying the mappings $\theta \in [\![P]\!]_{M_{db}}$ to the variables in $\vec{w}$. We denote the resulting set as $ans(M_{db}, \vec{w}, P)$; by default, SPARQL uses a bag-based semantics for query evaluation [11], but we opt for the set-based one. An ASK query returns a Boolean value indicating whether a pattern matches the given RDF graph and can be seen as a special case of a SELECT query with an empty set of answer variables. In what follows, we use $Q$ to denote the set of all such SPARQL queries.

Example 3:
For brevity, assume an RDF vocabulary mmdb that standardizes all the metadata attributes as well as the relations between them relevant to the scenario in Ex. 1. To extract the number of segments that contain human faces in every image in the multimedia storage, together with that image's identifier, we can use the query $numSeg := (\langle ?id, ?c \rangle, P)$, where $P$ = SELECT ?id, ?c WHERE { ?id mmdb:faceCount ?c }. For accessing information about the segments with human faces, the query $segs := (\langle ?id, ?seg \rangle, P)$ is used. The pattern it employs is defined as $P$ = SELECT ?id, ?seg WHERE { ?id mmdb:faceSegment ?seg }. Here, every segment object stores an alphanumeric string carrying two pairs of coordinates (i.e., a string of the form "(x1, y1) .. (x2, y2)") to represent the rectangle within which the segment is located. ■

Since the multimedia storage essentially has no intensional part (i.e., there are no schemas for either the object storage or the metadata storage), we only define its extensional part. Formally, a $\mathfrak{D}_{MO}$-typed multimedia storage instance is a pair $(M_{db}, O_{db})$, where $M_{db}$ is a metadata storage instance and $O_{db}$ is a multimedia object storage instance. Here, each instance should be understood as a set of address-object pairs and object metadata observed at a given point in time. Notice that this representation of metadata and multimedia data essentially fully meets REQ-1(a) and REQ-1(d).

B. Data logic layer
Here we discuss how to manipulate the multimedia storage and show how to update the metadata of objects stored in the multimedia storage (resp., object storage) by adding and deleting possibly multiple triples (resp., multimedia objects) at once. Such updates are realized by means of parameterized actions, each of which consists of a set of templates: expressions that, once instantiated, assert which RDF triples (resp., multimedia objects) will be deleted from and added to the database.

We intend to provide two main types of operations for updating the multimedia storage. The first type works directly with the metadata storage and allows a fixed number of triples to be added and deleted. When using this type of update, the modeler should, however, be aware that any changes of object metadata should faithfully reflect the actual state of the object itself. The second type allows the multimedia objects themselves to be added, deleted and/or updated. Sometimes such operations should be bundled with those of the first type so as to ensure the integrity of the object-metadata indivisibility principle. We fix the infinite set of typed variables $\mathcal{V}_{\mathfrak{D}}$, where for each $x \in \mathcal{V}_{\mathfrak{D}}$, $type(x) = \mathcal{D}$ and $\mathcal{D} \in \mathfrak{D}$. Note that $\mathcal{V}_{RDF} \subset \mathcal{V}_{\mathfrak{D}}$.

Definition 1: A parameterized action $\alpha$ is a triple $(\vec{p}, F_D, F_A)$, where:
1) $\vec{p}$ is a tuple of action formal parameters, i.e., distinct variables from $\mathcal{V}_{\mathfrak{D}}$;
2) $F_D = (mm^{-}, mo^{-})$ and $F_A = (mm^{+}, mo^{+})$ are two pairs such that:
• $mm^{-}$ and $mm^{+}$ are finite sets of triples $(s, p, o) \in (\Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}} \cup Y) \times (\Delta_{\mathcal{L}} \cup Y) \times (\Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}} \cup Y)$ to be deleted from and added to the metadata storage, where $Y = Vars(\vec{p}) \cap \mathcal{V}_{RDF}$;
• $mo^{-}$ is a finite set of addresses $a$ of objects to be deleted from the object storage, where $a$ is either a constant from $\Delta_{oid}$ or a variable from $Vars(\vec{p})$ with $type(a) = oid$;
• $mo^{+}$ is a finite set of expressions $a \triangleright f(x_1, \ldots, x_n)$ generating objects to be added to the object storage, where $a$ is either a constant from $\Delta_{oid}$ or a variable of type $oid$ from $Vars(\vec{p})$, $f \in \Phi_{\mathfrak{D}_{MO}}$ with co-domain $coDom(f) \subset \Delta_{\mathfrak{D}_{MO}}$, and every $x_i$ is either a variable from $Vars(\vec{p})$ or a constant from $\mathfrak{D} \cup \mathfrak{D}_{MO}$. □

Here, if an object with address $a$ is already present in the object storage, $a \triangleright f(x_1, \ldots, x_n)$ updates this object with the result of the function call. If, instead, there is no object with such an address, then the same expression adds a pair $(a, r)$ to the object storage, where $r$ is the result of the function call. To access the different components of $\alpha$, we make use of the following notation: $\alpha \cdot params = \vec{p}$, $\alpha \cdot del = F_D$, $\alpha \cdot add = F_A$. Given a substitution $\sigma : Vars(\alpha \cdot params) \to \Delta_{\mathcal{L}} \cup \Delta_{\mathcal{I}} \cup \Delta_{oid}$ for $\alpha \cdot params$, an action instance $\alpha\sigma$ is a ground action resulting from substituting the parameters in $\alpha$ with the corresponding values in $\sigma$.
An application of $\alpha\sigma$ to a multimedia storage instance $I = (M_{db}, O_{db})$, denoted as $apply(\alpha\sigma, I)$, returns a new instance of the multimedia storage $I' = (M'_{db}, O'_{db})$, such that:
• $M'_{db} = (M_{db} \setminus mm^{-}_{\alpha\sigma}) \cup mm^{+}_{\alpha\sigma}$, where $mm^{-}_{\alpha\sigma} = \bigcup_{(s,p,o) \in mm^{-}} \sigma(s, p, o)$ and $mm^{+}_{\alpha\sigma} = \bigcup_{(s,p,o) \in mm^{+}} \sigma(s, p, o)$;
• $O'_{db} = (O_{db} \setminus (mo^{-}_{\alpha\sigma} \cup \widetilde{mo}^{+}_{\alpha\sigma})) \cup mo^{+}_{\alpha\sigma}$, where, assuming for simplicity that $X = x_1, \ldots, x_n$, $mo^{-}_{\alpha\sigma} = \bigcup_{a \in mo^{-}} (a, src(\sigma(a)))$, $\widetilde{mo}^{+}_{\alpha\sigma} = \bigcup_{a \triangleright f(X) \in mo^{+},\ \exists o.(a,o) \in O_{db}} (a, o)$ and $mo^{+}_{\alpha\sigma} = \bigcup_{a \triangleright f(X) \in mo^{+}} (a, f(\sigma(x_1), \ldots, \sigma(x_n)))$.

As in DB-nets [16], in order to avoid situations in which the same fact is asserted to be both added and deleted, we prioritize additions over deletions. The overall representation of actions and their semantics, together with the ability to use type-specific functions, allows REQ-1(b)–(c) to be accounted for.

Example 4:
The Splitter (cf. Ex. 1) employs an action called GetImage that extracts sub-image $o'$ from image $o$ with address $a$, based on the information about the segment $seg$ that identifies it, and generates relevant metadata about $o'$ (like name $n$, new address $a'$ and new image identifier $id$) that are added to the metadata storage. This action uses five formal input parameters, $GetImage \cdot params = \langle a, seg, a', id, n \rangle$, and performs the following updates. It only deletes the metadata from the original image that are related to the selected segment: $GetImage \cdot del = (\{(id :: \mathcal{L}, mmdb{:}faceSegment, seg :: \mathcal{L})\}, \emptyset)$. Then, using $GetImage \cdot add = (mm^{+}, mo^{+})$, it adds all necessary metadata entries $mm^{+} := \{(id :: \mathcal{L}, mmdb{:}address, a' :: \mathcal{L}), (id :: \mathcal{L}, mmdb{:}format, .jpg :: \mathcal{L}), (id :: \mathcal{L}, mmdb{:}name, n :: \mathcal{L})\}$ and adds to the object storage an extracted image with $mo^{+} = \{a \triangleright extractIMG(src(a), seg)\}$. Here, $extractIMG : IMG \times D \to IMG$ takes as input an image and a rectangular selection ($rect$ is a type defined on top of an alphanumeric set of rectangle coordinates $D$ representing segments), and returns the subimage defined by the latter.

To update image $(a, o)$ by cutting another image with address $a'$ from it, we define the action CutFromIMG, s.t. $CutFromIMG \cdot add = (\emptyset, \{a \triangleright sub(src(a), src(a'))\})$. Knowing that image $(a, o)$ is already in the storage, we use here the “updating” semantics of the action. □

Notice that Definition 1 allows actions to be specified whose execution may still leave the storage inconsistent. For example, one may delete an object without removing its metadata, which is intuitively not an expected type of behavior.
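The update semantics of action instances can be sketched in Python under simplifying assumptions: terms are strings, any term that occurs in the substitution is treated as a parameter, `src(x)` is encoded as the tuple `("src", x)` and dereferenced in the storage as it was before the update, and images are toy lists of numbers. The `Action` class and these encodings are illustrative and not part of Definition 1. The sketch shows the two points stressed above: additions win over deletions, and $a \triangleright f(\ldots)$ overwrites an existing object or adds a new one.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Action:
    params: Tuple[str, ...]
    mm_del: FrozenSet[Triple] = frozenset()  # metadata triples to delete
    mo_del: FrozenSet[str] = frozenset()     # addresses of objects to delete
    mm_add: FrozenSet[Triple] = frozenset()  # metadata triples to add
    # Each entry (a, f, args) stands for the expression a |> f(args).
    mo_add: Tuple[Tuple[str, Callable, Tuple], ...] = ()


def apply(action, sigma, m_db, o_db):
    """Return the updated (M_db, O_db); the inputs are left unchanged."""

    def val(term):
        # ("src", x) dereferences address x in the storage before the update.
        if isinstance(term, tuple) and term[0] == "src":
            return o_db[val(term[1])]
        return sigma.get(term, term)  # parameters are substituted, constants kept

    def ground(triples):
        return {tuple(val(t) for t in triple) for triple in triples}

    # Deletions first, then additions: additions take priority on conflicts.
    new_m = (set(m_db) - ground(action.mm_del)) | ground(action.mm_add)
    removed = {val(a) for a in action.mo_del}
    new_o = {a: o for a, o in o_db.items() if a not in removed}
    for a, f, args in action.mo_add:
        new_o[val(a)] = f(*(val(x) for x in args))  # update or insert
    return new_m, new_o


def sub(img, other):
    """Toy stand-in for the image subtraction function."""
    return [p - q for p, q in zip(img, other)]


cut = Action(params=("?a", "?a2"),
             mo_add=(("?a", sub, (("src", "?a"), ("src", "?a2"))),))
m_db = {("img1", "mmdb:name", "photo")}
o_db = {"addr1": [5, 5], "addr2": [1, 2]}
new_m, new_o = apply(cut, {"?a": "addr1", "?a2": "addr2"}, m_db, o_db)
```

Nothing in this sketch prevents an action from deleting an object while keeping its metadata, so it exhibits the same freedom (and the same risk of inconsistency) as the definition.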
C. Control layer
Before defining the central notion of MM-net, we fix some standard notions related to multisets. For a set $A$, $A^{\oplus} := \{m : A \to \mathbb{N}\}$ is the set of multisets over $A$. Given a multiset $S \in A^{\oplus}$, an element $a \in A$ and $n \in \mathbb{N}$, $S(a) \in \mathbb{N}$ denotes the number of times $a$ appears in $S$, and we write $a^n \in S$ if $S(a) = n$. Given $S_1, S_2 \in A^{\oplus}$, we define the following operations on multisets: (i) $S_1 \subseteq S_2$ (resp., $S_1 \subset S_2$) if $S_1(a) \leq S_2(a)$ (resp., $S_1(a) < S_2(a)$) for each $a \in A$; (ii) $S_1 + S_2 = \{a^n \mid a \in A \text{ and } n = S_1(a) + S_2(a)\}$; (iii) if $S_1 \subseteq S_2$, $S_2 - S_1 = \{a^n \mid a \in A \text{ and } n = S_2(a) - S_1(a)\}$; (iv) given a number $k \in \mathbb{N}$, $k \cdot S_1 = \{a^{kn} \mid a^n \in S_1\}$; (v) $|m| = \sum_{a \in A} m(a)$.

(Footnote: For RDF triples, the notion of substitution coincides with that of mapping, with the only difference that the former is not partial.)

An MM-net assigns to each place a color type, which in turn corresponds to a data type or to a cartesian product of multiple data types from $\mathfrak{D}$. Inscriptions, represented as tuples of variables from $\mathcal{V}_{RDF} \cup \mathcal{V}_{\mathfrak{D}}$, constants from $\Delta_{\mathfrak{D}} \cup \Delta_{L} \cup \Delta_{I}$ and terms (constructed from functions from $\Phi_{\mathfrak{D}_{MO}} \cup \Phi_{\mathfrak{D}}$, variables and constants, and denoted as $\mathcal{T}$), are used to reference contents of places. We denote by $\Omega_A$ the set of all possible inscriptions over a set $A$. Quite often, when manipulating various data objects, one would like to ensure the provision of fresh data values (for example, the generation of globally fresh object identifiers). To this end, we adopt the well-known mechanism used in $\nu$-Petri nets [24] and introduce a countably infinite set $\Upsilon_{\mathfrak{D}}$ of $\mathfrak{D}$-typed fresh variables, where for every $\nu \in \Upsilon_{\mathfrak{D}}$, we have that $\Delta_{\mathit{type}(\nu)}$ is countably infinite (this provides an unlimited supply of fresh values). Hereinafter, we fix a countably infinite set of $\mathfrak{D}$-typed variables $\mathcal{X}_{\mathfrak{D}} = \mathcal{V}_{\mathfrak{D}} \uplus \Upsilon_{\mathfrak{D}}$ as the disjoint union of “normal” variables $\mathcal{V}_{\mathfrak{D}}$ and fresh variables $\Upsilon_{\mathfrak{D}}$.

Let us also introduce a guard, that is, a formula defined as $\varphi ::= S(x_1, \ldots, x_m) \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \top$, where $S \in \Gamma_D$ (for some $D \in \mathfrak{D}$) and $x_i$ is either a variable of type $D$, or a constant from $\Delta_D$. We use $\mathcal{G}$ to denote the set of all possible guards. Notice that guards are not defined on multimedia objects.

Definition 2: A $\mathfrak{D}$-typed MM-net $N$ is a tuple $(\mathfrak{D}, P, T, F_{in}, F_{out}, \mathit{color}, \mathit{query}, \mathit{guard}, \mathit{act})$, where:
• $P = P_c \cup P_v$ is a finite set of places partitioned into control places $P_c$ and view places $P_v$ (view places have a dedicated graphical decoration and can connect to transitions only with read arcs);
• $T$ is a finite set of transitions, s.t. $P \cap T = \emptyset$;
• $\mathit{color} : P \to \wp(\mathfrak{D})$ is a place typing function;
• $\mathit{query} : P_v \to Q$ is a query assignment function, s.t., for every $p \in P_v$ with $\mathit{query}(p) = (\vec{w}, P)$, it holds that $\mathit{type}(\vec{w}) = \mathit{color}(p)$;
• $F_{in} : P \times T \to \Omega^{\oplus}_{\mathcal{V}_{\mathfrak{D}}}$ is an input flow, s.t. $\mathit{type}(F_{in}(p, t)) = \mathit{color}(p)$ for every $(p, t) \in P \times T$;
• $\mathit{guard} : T \to \mathcal{G}$ is a partial guard assignment function, s.t., for every $t \in T$, $\mathit{Vars}(\mathit{guard}(t)) \subseteq \mathit{InVars}(t)$, where $\mathit{InVars}(t) = \bigcup_{p \in P} \mathit{Vars}(F_{in}(p, t))$;
• $F_{out} : T \times P \to \Omega^{\oplus}_{\mathcal{X}_{\mathfrak{D}} \cup \Delta_{\mathfrak{D}} \cup \mathcal{T}}$ is an output flow, s.t. $\mathit{type}(F_{out}(t, p)) = \mathit{color}(p)$ for every $(t, p) \in T \times P$;
• $\mathit{act} : T \to A$ is a partial action assignment function, where $A$ is a finite set of actions. $\square$

Note that the given definition does not restrict the usage of objects in the guard formulas to only those from $\mathfrak{D}$. In fact, one can even compare multimedia objects by using the $\mathit{src}$ function. Inscriptions in the output flow can inject possibly fresh data via external variables that are not bound by any input inscription and that are taken from $\mathit{OutVars}(t) \setminus \mathit{InVars}(t)$, where $\mathit{OutVars}(t) = \bigcup_{p \in P} \mathit{Vars}(F_{out}(t, p))$ and every variable $x$ can be either from $\Upsilon_{\mathfrak{D}}$ or $\mathcal{V}_{\mathfrak{D}}$.

[Figure 3: Petri net diagram with channel places $ch_{in}$ and $ch_{out}$, place out, view places (including Segments), transitions Start, Split, Update Image, Finish1, Finish2 and Emit, and actions GETIMAGE$(a, seg, \nu_a, \nu_{id}, n)$ and CUTFROMIMG$(a, a')$.]
Fig. 3. MM-net control layer of a Splitter. Guards are depicted in green, types are shown in bold next to corresponding places.

Example 5:
A Splitter comprises a complex routing mechanism that, given a message, iteratively breaks it into smaller parts [12]. In our scenario, shown in Fig. 3, we aim at splitting a single image into smaller images. The splitting is performed according to a simplified criterion: only images of human faces are extracted. All the information needed for extracting such images is assumed to be already in the metadata storage. This is the key assumption guaranteeing that the Splitter can always identify the needed elements and organize their processing.

The net starts by extracting the number of segments containing human faces for all images in the multimedia storage. This is done using a view place whose assigned query is $\mathit{numSeg}$, where $\mathit{numSeg}$ is defined in Ex. 3. By joining the input image identifier $id$ with the one in the view place, the net obtains the number of segments $c$ for a concrete image. If no sub-images have been detected before, i.e., $c = 0$, the net finishes its work by firing transition Finish1. Notice that Finish1 fires only if its guard is satisfied, and that the inscription $F_{out}(\text{Finish1}, \mathit{out})$ contains a variable $s$ that is bound to a set of pairs (identifier, address) in place out. Initially, out is assumed to contain an empty set.

If the image contains such segments ($c \neq 0$), the net enters a loop with $c$ as the counter and repeats the following steps. First, it fires Split, which executes the action GETIMAGE assigned to it (cf. Ex. 4). This action gets a sub-image based on the information of a concrete segment taken from view place Segments (s.t. $\mathit{query}(\text{Segments}) = \mathit{segs}$, where $\mathit{segs}$ is defined in Ex. 3) and adds it to the multimedia storage together with the relevant metadata. GETIMAGE instantiates its formal parameters with variables that are bound to values coming from input places (like $a$ and $seg$), and with variables that either simulate system input ($n$ is an unbound variable simulating system input for an image name) or inject fresh data ($\nu_a$ creates a globally new image address, $\nu_{id}$ generates a fresh image ID). Split also “remembers” the identifier and address of the extracted image by adding them to the set of pairs in place out. Here, type $\mathit{Set}_A$ (for $A := A_1 \times \ldots \times A_n$ and $\Delta_A := \Delta_{A_1} \times \ldots \times \Delta_{A_n}$) is defined over the set $\wp(\Delta_A)$, with a predicate $\mathit{empty}$ checking whether a set is empty, and the following three functions: (i) $\mathit{ins}$ adds an element to the set; (ii) $\mathit{getL}$ returns the last element of the set; (iii) $\mathit{rem}$ removes an element from the set.

The net then remembers the address of the extracted image and proceeds with firing the Update Image transition, which executes action CUTFROMIMG (cf. Ex. 4). This action updates the image with address $a$ by removing from it the sub-image with address $a'$. The loop repeats until the counter has reached $0$. Then the net fires Finish2, after which it emits the extracted pairs (using Emit) into the output place $ch_{out}$. $\blacksquare$

D. Execution semantics
In a nutshell, the execution semantics of MM-nets is similar to that of DB-nets [16]: it has to simultaneously capture the progression of both the multimedia storage and the control layer. To this end, at each point in time, the state of an MM-net is represented using a so-called snapshot, which consists of a multimedia storage instance $\mathcal{I}$ and a marking $m$. The latter is formally defined as a function $m : P \to \Omega^{\oplus}_{\mathfrak{D}}$, s.t. $m(p) \in \Delta^{\oplus}_{\mathit{color}(p)}$ and $m(v) = \mathit{ans}(M_{db}, \mathit{query}(v))$, for all $p \in P$ and $v \in P_v$. Note that the second condition in the marking definition guarantees that the marking of a view place corresponds to the answers obtained by issuing its associated query over the underlying multimedia storage instance. In the following, by writing an MM-net $N$ in snapshot $s = (\mathcal{I}, m)$, we mean a marked net, with marking $m$, over a multimedia storage instance $\mathcal{I} = (M_{db}, O_{db})$.

The firing of a transition $t \in T$ in a snapshot is defined w.r.t. a so-called binding for $t$, defined as $\sigma : \mathit{Vars}(t) \cup \mathit{OutVars}(t) \to \Delta_{\mathfrak{D}}$, that substitutes all variables in inscriptions on the arcs incident to $t$ and, possibly, formal parameters of an action signature assigned to $t$ with values from $\Delta_{\mathfrak{D}}$.

Definition 3: A transition $t \in T$ is enabled in a snapshot $s = \langle \mathcal{I}, m \rangle$, written as $s[t\rangle$, if there exists a binding $\sigma$ satisfying the following: (i) $\sigma(F_{in}(p, t)) \subseteq m(p)$, for every $p \in P$; (ii) $\sigma(\mathit{guard}(t))$ is true; (iii) $\sigma(x) \notin \mathit{Val}(s)$, for every $x \in \Upsilon_{\mathfrak{D}} \cap \mathit{OutVars}(t)$. $\square$

Essentially, a transition is enabled with a binding $\sigma$ if the binding selects values carried by tokens from the input places (which match inscriptions on the corresponding input arcs), so that the data they carry make the guard attached to the transition true and, moreover, assigns globally fresh and pairwise distinct (both in $m$ and $\mathcal{I}$) values to variables from $\Upsilon_{\mathfrak{D}}$. Then, when a transition is enabled, it may fire.

Definition 4: Let $N$ be an MM-net in snapshot $s = (\mathcal{I}, m)$ ($\mathcal{I} = (M_{db}, O_{db})$), with $t \in T$ enabled in $s$ with some binding $\sigma$. Then, $t$ may fire producing a new snapshot $s' = (\mathcal{I}', m')$, s.t. $m'(p) = m(p) - \sigma(F_{in}(p, t)) + \sigma(F_{out}(t, p))$ and $\mathcal{I}' = \mathit{apply}(\mathit{act}(t)\sigma, \mathcal{I})$. We denote this as $s[t\rangle s'$ and assume that the definition is inductively extended to sequences $\tau \in T^*$. $\square$
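The enabling and firing conditions of Definitions 3 and 4 can be sketched in a few lines, with `collections.Counter` playing the role of multisets. This is a simplified illustration, not the authors' implementation: the dictionary layout of a transition (`in`, `out`, `guard`, `fresh`) is an assumption made for the example, the binding is supplied by the caller rather than searched for, and the storage update $\mathit{apply}(\mathit{act}(t)\sigma, \mathcal{I})$ as well as the recomputation of view places are left out.

```python
from collections import Counter


def instantiate(inscriptions, sigma):
    """Apply binding sigma to inscriptions (tuples of variable names)."""
    return Counter(tuple(sigma[v] for v in ins) for ins in inscriptions)


def values(marking, storage_constants):
    """Val(s): constants occurring in the marking or in the storage instance."""
    vals = set(storage_constants)
    for tokens in marking.values():
        for token in tokens:
            vals.update(token)
    return vals


def enabled(t, marking, storage_constants, sigma):
    # (i) every instantiated input inscription is contained in the place marking
    for p, ins in t["in"].items():
        have = marking.get(p, Counter())
        if any(have[tok] < n for tok, n in instantiate(ins, sigma).items()):
            return False
    # (ii) the guard holds under the binding
    if not t["guard"](sigma):
        return False
    # (iii) fresh variables get pairwise distinct values not occurring in Val(s)
    fresh = [sigma[v] for v in t["fresh"]]
    used = values(marking, storage_constants)
    return len(set(fresh)) == len(fresh) and not used.intersection(fresh)


def fire(t, marking, sigma):
    """m'(p) = m(p) - sigma(F_in(p, t)) + sigma(F_out(t, p))."""
    new = {p: Counter(tokens) for p, tokens in marking.items()}
    for p, ins in t["in"].items():
        new[p] = new.get(p, Counter()) - instantiate(ins, sigma)
    for p, ins in t["out"].items():
        new[p] = new.get(p, Counter()) + instantiate(ins, sigma)
    return new


# A toy transition that consumes (id, a) and emits (nu, id) with a fresh nu.
t = {"in": {"ch_in": [("id", "a")]},
     "out": {"out": [("nu", "id")]},
     "guard": lambda s: s["id"] != "",
     "fresh": ["nu"]}
m = {"ch_in": Counter({("img1", "a1"): 1})}
sigma = {"id": "img1", "a": "a1", "nu": "a2"}
m2 = fire(t, m, sigma) if enabled(t, m, set(), sigma) else m
```

Binding `nu` to `"a1"` instead would violate condition (iii), since `"a1"` already occurs in the marking.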
$\mathit{Val}(s)$ denotes the set of all constants occurring in $m$ or in $\mathcal{I}$.

For a net $N$ in initial snapshot $s_0$, we use $S(N) = \{s \mid \exists \tau \in T^*, \text{ s.t. } s_0[\tau\rangle s\}$ to denote the set of all snapshots of $N$ reachable from its initial snapshot $s_0$.

The execution semantics of an MM-net is defined in terms of a possibly infinite-state labeled transition system (LTS) accounting for all possible executions of the control layer starting from an initial snapshot. States of this transition system are MM-net snapshots, whereas transitions model firings of MM-net transitions under chosen bindings. Formally, given an MM-net $N$ in snapshot $s_0$, the execution semantics of $N$ is given by the LTS $\Lambda_N = (S, s_0, \to)$, where:
• $S$ is a possibly infinite set of snapshots;
• $\to \subseteq S \times T \times S$ is a $T$-labelled transition relation between pairs of snapshots;
• $S$ and $\to$ are defined by simultaneous induction as the smallest sets satisfying the following conditions: (i) $s_0 \in S$; (ii) given $s \in S$, for every transition $t \in T$, binding $\sigma$ and snapshot $s'$ over $N$, if $s[t\rangle s'$, then $s' \in S$ and $s \xrightarrow{t} s'$.

With this we cover the last requirement from Tab. II.

E. Connection to DB-nets
It is easy to see that MM-nets are Turing complete. However, this formalism can still be potentially used for checking formal properties of multimedia integration patterns and their compositions. As this paper primarily focuses on developing a modeling formalism, we leave a more in-depth discussion of the formal analysis to future work and show a key connection between MM-nets and their predecessor DB-nets that, as shown in [16] and [21], can be used for model-based testing via simulation as well as verification of formal properties such as reachability of a nonempty place.

The formalisms of DB-nets and MM-nets are conceptually quite similar. The main difference lies in the type of persistent data these formalisms manipulate and the data types they use (as stated before, DB-net data types do not support functions). DB-nets allow for checking the reachability of a nonempty place under restrictions limiting the data types that one can use (namely, only strings and reals) and the “size” of information that can be simultaneously present in the net marking and database instance [16]. The same result could be reconstructed for MM-nets with strings and reals (provided that one uses their definitions from [16]) and with boundedness restrictions imposed on the places and the multimedia storage, by encoding them into DB-nets. The RDF storage used in MM-nets can be suitably represented in a relational database (see, e.g., [3] for more details), whereas the SPARQL queries can be translated to SQL as suggested in [13].

For validating MM-nets, we can leverage the same modular approach used for validating DB-nets in [21], [22]. More specifically, one can use CPN Tools (http://cpntools.org/) for representing the control layer of MM-nets together with queries assigned to view places and actions appearing as code segments attached to transitions, and use its Access/CPN framework for defining extensions that implement the data manipulation logic running on common RDF/SPARQL frameworks like Apache Jena (jena.apache.org) and image processing capabilities like OpenCV (opencv.org), as used in [20].

IV. MULTIMEDIA PATTERN REALIZATION
In this section we formalize the multimedia integration patterns used in Fig. 1 as MM-nets, and thus demonstrate how to model a multimedia Message Filter and Content Enricher. The realization of the Splitter can already be found in Sect. III. In the Appendix we also provide a realization of another important multimedia integration pattern, a Feature Detector, that is, however, not mandatory for our scenario. Some of the patterns are structurally similar to previous works on textual integration patterns (e.g., [10], [21]).
A. Multimedia Operations
We start by outlining the types of multimedia operations used in this section. For simplicity, we consider only the jpg type and assume that all the functions formally defined below extend $\Phi_{jpg}$. Function $\mathit{countIMGs} : \mathit{IMG} \times S \to \mathbb{N}$ takes an image together with a feature pattern and counts all sub-images in the given image that correspond to this pattern. If no sub-images have been detected, the function returns $0$. To detect segments in images, we introduce a function $\mathit{detectIMG} : \mathit{IMG} \times S \to \wp(D)$ that, given an image and a feature, returns a set of segments (of type $\mathit{Set}_{rect}$) corresponding to image parts with the detected feature. In certain scenarios, it is important to highlight detected objects with text and/or geometrical shapes. Function $\mathit{markIMG} : \mathit{IMG} \times D \times S \times S \to \mathit{IMG}$, given an image, segment coordinates, a geometrical shape and a color, draws the colored shape around the specified segment in the image.
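To make the signatures above concrete, the following toy sketch mirrors the three operations in plain Python. It is an illustration only: the `Image` stand-in stores pre-labelled regions instead of pixels, so no actual detection takes place, and the names `Rect`, `Image`, `detect_img`, `count_imgs` and `mark_img` are hypothetical. Deriving the count from the detected set is also an assumption made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Image:
    """Toy stand-in for a jpg object: labelled regions replace real pixels."""
    regions: tuple = ()   # pairs (feature label, Rect)
    overlays: tuple = ()  # triples (Rect, shape, color) drawn so far


def detect_img(img, feature):
    """detectIMG: the set of segments whose label matches the feature."""
    return frozenset(r for label, r in img.regions if label == feature)


def count_imgs(img, feature):
    """countIMGs: number of matching sub-images, 0 if none is detected."""
    return len(detect_img(img, feature))


def mark_img(img, seg, shape, color):
    """markIMG: return a new image with a colored shape around the segment."""
    return replace(img, overlays=img.overlays + ((seg, shape, color),))


photo = Image(regions=(("faces", Rect(0, 0, 10, 10)),
                       ("faces", Rect(20, 0, 10, 10)),
                       ("products", Rect(5, 30, 8, 8))))
marked = mark_img(photo, Rect(0, 0, 10, 10), "oval", "red")
```

Keeping the operations pure (each returns a new value) matches their use as functions in inscriptions and actions; a realization on top of an image-processing library would replace the bodies but could keep the same signatures.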
B. Message Filter
A multimedia Message Filter checks incoming messages (carrying both multimedia objects and their metadata), filtering out those that do not match a certain criterion and routing the others to the output channel.

[Figure 4: Petri net diagram with channel places $ch_{in}$ and $ch_{out}$ (both of type str × oid), view place Images with Tags (of type L × L), and transitions Accept, with guard [$tag = $ "human" $\vee$ $tag = $ "product" $\wedge$ $id' = id$], and Discard, with guard [$\neg(tag = $ "human" $\vee$ $tag = $ "product"$)$ $\wedge$ $id' = id$].]
Fig. 4. MM-net control layer of a Message Filter

In MM-nets, this pattern is realized by encoding the filtering condition directly into the net using view places and the SPARQL queries attached to them. Notice that this realization implicitly requires that the metadata already contain the semantic information needed for checking the criterion (e.g., triples specifying how many humans/products are present in the pictures). The net in Fig. 4 starts by consuming a message from input channel place $ch_{in}$ that carries an image identifier and its address in the object storage. View place Images with Tags is equipped with a query that extracts image identifiers together with tags of objects that they contain. The query can be defined as follows: $(\langle ?id, ?tag \rangle, P)$, where P = SELECT ?id ?tag WHERE { ?id mmdb:containsObj ?tag }. The tags of interest are specified in transition guards that realize the filtering conditions. If the image with identifier $id$ satisfies the condition, i.e., the view place contains a pair in which the first element matches the value carried by $id$ and the second element is either "human" or "product", then the token with the input message is routed to the output channel by firing transition Accept. Otherwise, the message is discarded by transition Discard.

C. Content Enricher

[Figure 5: Petri net diagram with channel places $ch_{in}$ (of type str × str × oid) and $ch_{out}$ (of type str × oid), intermediate places of types str × oid, str × str and rect × str, view place SegmentsByKey (of type str × str × rect), and transitions Get Segment, with guard [$k' = k \wedge id' = id$], and Enrich, with action UPDIMAGE$(a, seg)$.]
Fig. 5. MM-net control layer of a Content Enricher
A Content Enricher enriches the content of incoming messages using external sources. In our case, it utilizes an image identifier to access its metadata and extract information about detected features, which it then uses to update the image in the object storage by adding extra visual components to it, as shown in Fig. 5.

The net starts with a message that contains an object identifier $id$, its address $a$ and some feature key $k$ that will be used for acquiring data for the enrichment. Then the net proceeds by splitting the message into two parts and using the part with $id$ and $k$ to get information about a segment that is characterized by $k$ in the metadata storage. To this end, we use view place SegmentsByKey that has a query with the following graph pattern attached to it: P = SELECT * WHERE { ?id mmdb:faceSegment ?s. ?id mmdb:prodSegment ?s. }. This pattern returns all triples that have mmdb:faceSegment and mmdb:prodSegment as predicates. The net then allows one to choose any segment $seg$ that matches the image identifier and the feature key. Finally, the enrichment step happens when transition Enrich fires and calls the action assigned to it. This action, called UPDIMAGE, updates a (physical) image in the object storage by adding a red oval around the area defined by the selected segment $seg$. Formally, UPDIMAGE$\cdot\mathit{add} = (\emptyset, \{a \triangleright \mathit{markIMG}(\mathit{src}(a), seg, \text{"oval"}, \text{"red"})\})$.

D. Discussion
We have demonstrated how some of the most important (multimedia) integration patterns can be formally represented in our formalism. Notice that, similarly to [10] and [21], [22], every pattern should be equipped with input and output channels. This is needed to ensure their flawless, message-based composition: in case of two connected pattern formalizations, the output channel of the first one should have the same type as the input channel of the second. In the scenario described in Ex. 1 and depicted in Fig. 1, the whole process essentially represents a sequential composition of integration patterns and thus can be seamlessly implemented by following the order of patterns in Fig. 1 and by “fusing” output and input channels of two neighboring MM-net pattern representations. Indeed, by mapping every task into its corresponding Petri net-based representation, one can easily build a model formalizing the entire SAP Social Intelligence scenario. Notice, however, that, strictly speaking, textual patterns in Fig. 1 would need to be formalized by using either CPNs [10] or DB-nets [21]. In the first case, the control layer of MM-nets captures the whole class of CPNs, resulting in their seamless adoption when implementing multimedia EAI scenarios. In the case of DB-nets, however, one would need to study in more detail how to implement the multi-model integration scenarios [20] that use both relational and multimedia databases. In our case, the textual Splitter and Content Enricher patterns do not require any database access and thus the whole scenario can be implemented using MM-nets.

Using the scenario in Ex. 1, we also identified a suitable way to formally represent multimedia manipulating functions and semantic operations using data types (cf. Sect. IV-A), SPARQL queries (cf. Sect. III-A) and actions/queries in the data logic layer (cf. Sect. III-B). The derived functions, queries and actions fully cover the physical and logical operations in Tab. I for the studied patterns and, moreover, can be seen as pattern-agnostic, since they may be re-used in other image-based scenarios (formalized using MM-nets) as construction primitives, akin to pattern implementations.

V. RELATED WORK
Ritter et al. [19] showed that, for structured data, the only existing formalization of integration patterns was studied by Fahland et al. [10] using CPNs, which was further extended in [21] to cover a wider range of integration patterns with more refined requirements using (timed) DB-nets. However, when considering multimedia integration patterns (cf. [20]), as briefly introduced in Sect. II, these works cannot be used directly (cf. requirements in Tab. II), due to their lack of multimedia data operations (cf. REQ-1(a)), semantic operations (cf. REQ-1(b)), and partially their storage (cf. REQ-1(c)). To the best of our knowledge, there have been no attempts to formalize multimedia integration patterns (cf. [19]).
Formalisms for integration patterns. Although the scenario in Ex. 1 is captured in BPMN [23], this modeling language is not suitable for our requirements (especially REQs-1(a–c), 2). Yet, BPMN diagrams can be formally represented using Petri nets [9]. Petri nets offer a good trade-off between user-friendly graphical modeling and a toolbox for formal analysis of the produced models. There are many data-aware extensions of Petri nets (e.g., [2], [8], [16], [18]) that account for more complex, structured data. However, they cannot be readily used for representing multimedia EAI for reasons similar to those discussed above. Alternatively, it is possible to study other modeling requirements in which the multimedia message is treated as a first-class citizen, while the object storage is simply left out. Under this assumption, one could use the formalism of Petri nets with structured data [2] (StDNs for short) for modeling multimedia EAI, as tokens in it carry XML documents that, in turn, could be used for representing multimedia metadata, the core concept of the multimedia message. The authors also delineate restrictions required for the decidability of such properties as termination, coverability and boundedness. However, the formalism still does not account for persistent data (a violation of REQ-1(d), which is crucial for some patterns) and would need to be extended with the support of functions.

Mederly et al. [15] studied an approach for formalizing integration patterns in which messages are first-order formulas and patterns are operations that add and delete messages, and which uses AI planning for finding an integration process with a minimal number of components. While this approach shares the formalization objective, MM-nets apply to a broader set of objectives (e.g., formal analysis, simulation) and cover multimedia data, semantics and storage (cf. REQs-1(a–c)).
Multimedia data. The approach for storing and querying multimedia data employed in this work is similar to the one in the OCAPI system [6], which was developed for the semantic integration of image data using knowledge bases. Retrieval of multimedia information from (distributed) databases is covered in [5]. The multimedia semantics are represented by semantic attributes based on extended generalized icons with a logical and physical representation on a database. While our approach separates these different representations as well, [5] targets extended normal forms and functional dependencies between different attributes and does not define user interaction with the multimedia semantics on a business application-relevant feature level that could be used for message processing. More recently, [14] developed an image similarity query mechanism in the area of multimedia queries in multimedia databases. While no query syntax is provided, the approach could be used to formulate decisions based on image similarity.

VI. CONCLUSIONS
The previous work on EAI with multimedia data [19], [20] pinpointed two main issues in the domain. On the one hand, it argued that the integration patterns for multimedia scenarios are still not fully investigated, which can cause their wrong adoption in the EAI development stack. On the other hand, there is no formalization of these patterns that would help to minimize design-time mistakes, facilitate model-driven development and make it possible to check the correctness of implemented multimedia EAI scenarios. In this work we focused on the second issue and distilled a list of requirements (cf. Tab. II) for the formal representation of multimedia EAI. To address these requirements, we studied the formalism of MM-nets, which marries CPNs and multimedia databases and allows one to specify operations that manipulate both the multimedia objects and their metadata. The paper also presents how MM-nets can be used for formalizing some of the most frequently used multimedia integration patterns. We believe that the formalism studied in this paper can also provide more insights on engineering multimedia EAI. Currently we are working on developing a CPN Tools-based prototype for modeling and simulating MM-nets and on a more in-depth formal analysis of MM-net models. In the future, it would also be interesting to study a domain-independent language for representing multimedia manipulation functions and to create a repository of such functions in order to facilitate their adoption in different multimedia EAI scenarios.

REFERENCES

[1] S. Abiteboul, M. Arenas, P. Barceló, M. Bienvenu, D. Calvanese, et al. Research directions for principles of data management (Dagstuhl perspectives workshop 16151). Dagstuhl Manifestos, 7(1):1–29, 2018.
[2] E. Badouel, L. Hélouët, and C. Morvan. Petri nets with structured data. Fundam. Inform., 146(1):35–82, 2016.
[3] M. A. Bornea, J. Dolby, A. Kementsietsidis, K. Srinivas, P. Dantressangle, O. Udrea, and B. Bhattacharjee. Building an efficient RDF store over a relational database. In SIGMOD, pages 121–132. ACM, 2013.
[4] M. Broy. Seamless model driven systems engineering based on formal models. In Formal Methods and Software Engineering, pages 1–19. Springer Berlin Heidelberg, 2009.
[5] S. Chang, V. Deufemia, G. Polese, and M. Vacca. A normalization framework for multimedia databases. IEEE Trans. Knowl. Data Eng.
AAAI, pages 1091–1099, 2017.
[9] R. M. Dijkman, M. Dumas, and C. Ouyang. Formal semantics and analysis of BPMN process models using Petri nets. QUT Tech. Rep., 2007.
[10] D. Fahland and C. Gierds. Analyzing and completing middleware designs for enterprise integration using coloured Petri nets. In CAiSE
Enterprise integration patterns: Designing, building, and deploying messaging solutions. Addison-Wesley, 2004.
[13] R. Kontchakov, M. Rezk, M. Rodriguez-Muro, G. Xiao, and M. Zakharyaschev. Answering SPARQL queries over databases under OWL 2 QL entailment regime. In ISWC, pages 552–567, 2014.
[14] S. Lin, M. T. Özsu, V. Oria, and R. T. Ng. An extendible hash for multi-precision similarity querying of image databases. In VLDB, pages 221–230, 2001.
[15] P. Mederly, M. Lekavý, M. Závodský, and P. Navra. Construction of messaging-based enterprise integration solutions using AI planning. In CEE-SET, pages 16–29, 2009.
[16] M. Montali and A. Rivkin. DB-nets: On the marriage of colored Petri nets and relational databases. Trans. Petri Nets Other Model. Concurr., 12:91–118, 2017.
[17] J. Pérez, M. Arenas, and C. Gutiérrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3):16:1–16:45, 2009.
[18] A. Polyvyanyy, J. M. E. M. van der Werf, S. Overbeek, and R. Brouwers. Information systems modeling: Language, verification, and tool support. In CAiSE, volume 11483, pages 194–212. Springer, 2019.
[19] D. Ritter, N. May, and S. Rinderle-Ma. Patterns for emerging application integration scenarios: A survey. Information Systems, 67:36–57, 2017.
[20] D. Ritter and S. Rinderle-Ma. Toward application integration with multimedia data. In EDOC, pages 103–112, 2017.
[21] D. Ritter, S. Rinderle-Ma, M. Montali, and A. Rivkin. Formal foundations for responsible application integration. Information Systems, page 101439, 2019.
[22] D. Ritter, S. Rinderle-Ma, M. Montali, A. Rivkin, and A. Sinha. Formalizing application integration patterns. In EDOC, pages 11–20, 2018.
[23] D. Ritter and J. Sosulski. Exception handling in message-based integration systems and modeling using BPMN. Int. J. Coop. Inf. Syst., 25(2), 2016.
[24] F. Rosa-Velardo and D. de Frutos-Escrig. Decidability and complexity of Petri nets with unordered data. Theoretical Computer Science, 412(34):4439–4451, 2011.
[25] C. Tzelepis, Z. Ma, V. Mezaris, B. Ionescu, I. Kompatsiaris, G. Boato, N. Sebe, and S. Yan. Event-based media processing and analysis: A survey of the literature. Image Vis. Comput., 53:3–19, 2016.
[26] O. Zimmermann, C. Pautasso, G. Hohpe, and B. Woolf. A decade of enterprise integration patterns: A conversation with the authors. IEEE Software, 33(1):13–19, 2016.

APPENDIX
A. Feature Detector

[Figure 6: Petri net diagram with channel places $ch_{in}$ and $ch_{out}$ (both of type str × oid), intermediate places of types str × oid and str × oid × $\mathit{Set}_{rect}$, and transitions Detect (action ADDIMGCNT$(id, a)$), Get Face Segments, Update Face Metadata (guard [$\neg\mathit{empty}(s)$], action UPDMETADATA$(id, \text{mmdb:faceSegment}, \mathit{getL}(s))$), Get Product Segments (guard [$\mathit{empty}(s)$]), Update Product Metadata (guard [$\neg\mathit{empty}(s)$], action UPDMETADATA$(id, \text{mmdb:prodSegment}, \mathit{getL}(s))$) and Finish (guard [$\mathit{empty}(s)$]).]
Fig. 6. The control layer of an MM-net representing a Feature Detector

A Feature Detector is a pattern that updates the metadata of an object. More specifically, it uses concrete feature classifiers, based on which it retrieves data that are later added to the metadata storage. In our case, this pattern uses a picture identifier to access its metadata and to add information on how many humans and products are in the picture and, if any have been detected, provides data on the coordinates of the segments where human faces as well as products can be found.

The net starts by executing transition Detect that, in turn, calls action ADDIMGCNT, which has two formal parameters $id$ and $a$ and, upon firing, adds two RDF triples to the metadata storage: $(id, \text{mmdb:faceCount}, \mathit{countIMGs}(\mathit{src}(a), \text{"human face"} :: L))$ and $(id, \text{mmdb:prodCount}, \mathit{countIMGs}(\mathit{src}(a), \text{"product"} :: L))$. The net then proceeds with updating the metadata storage with the information about coordinates of sub-images that contain either human faces or products. By firing transition Get Face Segments, the net generates a set of segments with faces. As long as this set is not empty, each of its elements gets removed (using function $\mathit{rem}$) and added to the metadata storage with transition Update Face Metadata. This transition calls action UPDMETADATA, which has three formal parameters UPDMETADATA$\cdot\mathit{params} = \langle id, l, seg \rangle$, where $id$ is an image identifier, $l$ is an IRI and $seg$ is a segment. This action does not remove anything and adds to the metadata storage only one triple $(id :: L, l :: I, seg :: L)$. In the case of Update Face Metadata, UPDMETADATA adds to the metadata (of an image with identifier $id$) information about one face segment taken from set $s$, specified with IRI mmdb:faceSegment.

When the set of segments is empty, the net performs a similar procedure with product segments. That is, it first gets a set of all the product segments by firing Get Product Segments, and then updates the metadata storage by consecutively firing Update Product Metadata for each segment from the set. After all the updates are done, the net finishes its computation by firing Finish and placing a token with the image identifier and address into place $ch_{out}$.
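
The control flow described above can be summarized by the following sketch. It is a sequential approximation under stated assumptions rather than the net itself: `detect` is a hypothetical callable standing in for $\mathit{detectIMG}$, counts are taken as the size of the detected set, $\mathit{getL}$ is approximated by the maximum element under Python's tuple ordering, and the metadata storage is a plain set of triples.

```python
FACE_COUNT, PROD_COUNT = "mmdb:faceCount", "mmdb:prodCount"
FACE_SEG, PROD_SEG = "mmdb:faceSegment", "mmdb:prodSegment"


def drain(image_id, predicate, segments, metadata):
    """Mimic an 'Update ... Metadata' loop: move segments one by one into the metadata."""
    s = set(segments)
    while s:                                      # guard [not empty(s)]
        seg = max(s)                              # getL(s), assumed to be the maximum
        metadata.add((image_id, predicate, seg))  # UPDMETADATA(id, predicate, getL(s))
        s.remove(seg)                             # rem(s, getL(s))


def feature_detector(image_id, address, objects, metadata, detect):
    """Run the Detect, Get/Update Face, Get/Update Product and Finish steps in order."""
    img = objects[address]                        # src(a)
    faces = detect(img, "faces")
    products = detect(img, "products")
    # ADDIMGCNT: two count triples (counts assumed to equal the set sizes)
    metadata.add((image_id, FACE_COUNT, len(faces)))
    metadata.add((image_id, PROD_COUNT, len(products)))
    drain(image_id, FACE_SEG, faces, metadata)
    drain(image_id, PROD_SEG, products, metadata)
    return image_id, address                      # token placed into ch_out


# Toy storage: an "image" is a dict from feature name to a set of (x, y, w, h) segments.
objects = {"a1": {"faces": {(0, 0, 10, 10), (20, 0, 10, 10)}, "products": set()}}
metadata = set()
token = feature_detector("img1", "a1", objects, metadata, lambda img, f: img[f])
```

In the net, each loop iteration is a separate transition firing, so other transitions may interleave with the updates; the sketch fixes one such execution order.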