CheckSoft : A Scalable Event-Driven Software Architecture for Keeping Track of People and Things in People-Centric Spaces
Rohan Sarkar
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907
[email protected]

Avinash C. Kak
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907
[email protected]
February 23, 2021

ABSTRACT
We present CheckSoft, a scalable event-driven software architecture for keeping track of people-object interactions in people-centric applications such as airport checkpoint security areas, automated retail stores, smart libraries, and so on. The architecture works off the video data generated in real time by a network of surveillance cameras. Although there are many different aspects to automating these applications, the most difficult part of the overall problem is keeping track of the interactions between the people and the objects. CheckSoft uses finite-state-machine (FSM) based logic for keeping track of such interactions, which allows the system to quickly reject any false detections of the interactions by the video cameras. CheckSoft is easily scalable since the architecture is based on multi-processing in which a separate process is assigned to each human and to each "storage container" for the objects. A storage container may be a shelf on which the objects are displayed or a bin in which the objects are stored, depending on the specific application in which CheckSoft is deployed.

Keywords: People-centric Systems, Event-Driven Architecture, Concurrent Software Architecture, Intelligent Systems, Finite State Machine Automata, Video Surveillance Systems, Airport Checkpoint Security, Automated Retail Stores, Video Analytics and Monitoring
This paper presents a scalable software architecture for automating video monitoring of people-centric spaces. We must mention at the very outset that the purpose of this paper is not to address the computer vision issues related to detecting people, objects, and their interactions. Rather, our aim in this paper is solely to present an error-tolerant and scalable software design that uses finite-state-machine (FSM) logic capable of rejecting false detections reported by the underlying video cameras. (Even though the computer vision aspects of the overall problem are not the focus of this paper, for the sake of validating the software architecture presented here, we will show results using actual and simulated video streams.) As we show in this paper, with FSM logic the system can quickly catch moment-to-moment discrepancies and inconsistencies in the detections reported by the sensors.

Whereas the FSM logic allows for quick checks on the validity of the detected human-object interactions, the multi-processing design we have used for the architecture allows it to be scalable. Scalability means that the architecture automatically allows for an arbitrary number of people and objects to be present in the space being monitored, with latencies limited only by the computational power of the underlying hardware platform.

With the high-level attributes described above, a software architecture of the type presented in this paper can be expected to find applications in the new-age retail stores with no sales clerks and cash registers, smart libraries that allow a customer to simply walk out with the book desired, airport checkpoint security areas where it is important to keep track of the passengers and their belongings, and so on. Additionally, since CheckSoft is capable of recording on a continuous basis the history of people-object interactions in the monitored space, another important application of such a software system is that it lends itself to an easy post-facto analysis of the log files for gaining insights into people's reactions to the displayed objects. The results of such analysis may be used for a more productive arrangement of the objects vis-a-vis the people.

The application scenarios presented above might cause a reader to think that our work is similar to, or overlaps significantly with, a rather popular research area: smart (or intelligent) spaces in which wireless sensors attached to the objects and cameras are used to keep track of just the people or just the objects. The goal of the work presented in this paper is different: keeping track of people-object interactions [1], in the sense that we want our system to track ownership and possession relationships as people interact with the objects, exchange them, leave them behind in monitored spaces when they shouldn't, and so on.

A generic software architecture for keeping track of people-object interactions should be able to accommodate the fact that the objects may present themselves differently in different applications. For example, in a retail store, the objects are likely to be placed on shelves where either each object is directly accessible to a customer or, when the objects are stacked, the topmost object is directly accessible. On the other hand, in an airport checkpoint security area, the objects will be the personal belongings that the passengers divest in the bins that are then placed on the conveyor belt. In this case, the objects will be in a heap in the bins or placed directly on the conveyor belt. In the software system presented in this report, we use the notion of storage to address at a generic level these differences between the different applications. For the retail store application, an instance of storage could be a shelf containing items. On the other hand, an instance of the same for airport checkpoint security would be a bin in which the passengers divest their belongings.

In addition to the differences in how the objects may present themselves to the system in different applications, a system like ours must also be mindful of the fact that the acceptable rules for people-object interactions may be different in different applications. For example, in an airport checkpoint security application, passengers should be allowed to collect their own items only at the end of the screening process.
In a museum, visitors should not be allowed to touch any expensive exhibits, and, in a library, readers should not be allowed to misplace books in the wrong shelves at the time of returning the books.

The software architecture we present in this paper meets the challenges described above and is also scalable at the same time. We achieve scalability by using multiprocessing and concurrency. A unique process is associated with each entity, that is, with each individual and with each storage unit containing objects in the space being monitored. This decentralizes the bookkeeping for keeping track of the different states of each individual and each storage container; the process assigned to each entity takes care of all such details. The decentralization also eliminates the risk of deadlock that may arise due to contention for the entity state information. The communication between the processes is implemented using the Message Passing Interface (MPI).

Here is a summary of the main contributions of this paper:

• CheckSoft can keep track of concurrent person-object interactions involving an arbitrary number of people and an arbitrary number of objects in a monitored space in real time, using an architecture based on multi-processing and inter-module communications based on message passing. Each significant entity in the system is assigned a separate process. The overall design achieved in this manner eliminates the risk of any deadlocks that may arise due to contention for the entity state information.

• CheckSoft allows for an arbitrary number of video trackers to be plugged into the system on a plug-n-play basis. It is event-driven and can operate asynchronously in real time vis-a-vis the events detected from the video feeds of any arbitrary number of video cameras.

• The architecture adheres to the time-honored principles of good object-oriented design and uses finite-state-machine based logic for fault tolerance vis-a-vis any temporal discrepancies in the events detected by the sensors.

• The architecture can be applied to a variety of applications that require tracking the interactions between people and objects by making only minor modifications to the application-specific FSM logic and without any major architectural changes.

CheckSoft was validated with data from actual video cameras. The scalability of CheckSoft was validated with a simulator. (By ownership of an object, we mean the first individual identified as possessing the object.)

In the rest of this report, we start in Section 2 with a quick review of the literature that has two parts to it: the first part deals with the software engineering literature that has guided the design of CheckSoft, and the second part with the literature related to keeping track of people and objects in video-monitored spaces. Subsequently, in Section 3, we introduce the principal data structures of the system for representing the relational information in the system. The definitions presented for these data structures should give the reader a sense of the generality of the software architecture. The overall system architecture is summarized in Section 4 and presented in detail in Section 5. In Section 6, we verify scalability and deadlock-free operation for the proposed architecture. In Section 7, we validate the operation of CheckSoft with actual video trackers and test the scalability and robustness of CheckSoft using a simulator.
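To make the FSM-based rejection of false detections concrete, here is a minimal sketch in Java. This is our own illustration, not CheckSoft's actual implementation: the class and method names are invented. The idea is that a person-storage interaction must alternate between HandIn and HandOut events, so a duplicate or out-of-order detection from a noisy tracker is rejected immediately:

```java
// A minimal per-interaction state machine (illustration only): an interaction
// must follow the cycle HANDS_OUT -> HANDS_IN -> HANDS_OUT, so a spurious
// duplicate HandIn or an unmatched HandOut is rejected on the spot.
public class InteractionFsm {
    public enum State { HANDS_OUT, HANDS_IN }
    private State state = State.HANDS_OUT;

    /** Performs the transition and returns true if the event is legal
     *  in the current state; returns false to reject the event. */
    public boolean onHandIn() {
        if (state != State.HANDS_OUT) return false;  // duplicate HandIn: reject
        state = State.HANDS_IN;
        return true;
    }

    public boolean onHandOut() {
        if (state != State.HANDS_IN) return false;   // unmatched HandOut: reject
        state = State.HANDS_OUT;
        return true;
    }

    public static void main(String[] args) {
        InteractionFsm fsm = new InteractionFsm();
        System.out.println(fsm.onHandIn());   // legal: accepted
        System.out.println(fsm.onHandIn());   // spurious duplicate: rejected
        System.out.println(fsm.onHandOut());  // legal: accepted
    }
}
```

In a full system, one such FSM instance could be kept for each person-storage pairing, which is consistent with the per-entity bookkeeping described above.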
We will first present the literature that guided our software design in Section 2.1 and then compare related software systems in Section 2.2.
Ning et al. [2] state that the two primary aspects of a complex software system are the components, which are the basic building blocks of the system, and the architecture, which describes how the individual components interact so that the overall system possesses the desired behavior. There is a close relationship between Component-Based Software Engineering (CBSE) [2] and modular programming [3], both of which emphasize the importance of dividing the overall functionality of a complex software system into individual functional components that can be developed separately, such that each module contains all that is needed to execute a particular aspect of the desired overall functionality. CheckSoft is comprised of different modules/components, each responsible for a distinct functionality, as explained in Section 4.2. CheckSoft components can produce or consume events and, through the events, work collaboratively to achieve the desired overall behavior, as we explain in Section 4. The resulting architecture is what may be referred to as an event-driven architecture (EDA). In a wide-ranging survey of event-based systems by Hinze et al. [4], the authors discuss the role and the modalities of event processing in reactive applications. Etzion et al. [5] discuss the necessity of incorporating event-driven functionality in software systems that must exhibit on-demand and just-in-time behavior. Since in an EDA the event producers are unaware of the nature and the number of the event consumers, there is low coupling between them, which contributes to the extensibility (new consumers can be added as and when required) as well as the scalability of such systems [6].

In such systems, an event is defined as a significant change in state [7] that, in general, must result in the execution of a certain functionality by what is commonly referred to as an event handler. Anvur et al. [8] and Wagner et al. [9] discuss various finite-state-machine (FSM) based approaches to designing event handlers for real-time software systems. The event handler modules in CheckSoft are also based on FSM logic that is capable of fast rejection of "illegal" state changes reported by the video monitoring system.

Of the various applications where the EDA approach to software architecture design has been found to be effective, the two that are closest to CheckSoft are IoT and smart environments [10]. The networks used in IoT generally involve heterogeneous devices capable of generating a large number of events, and it is necessary to integrate, process, and react to the events on the fly. Almeida et al. [11] have proposed a distributed hierarchical architectural model based on Situational Awareness that can support the scalability, flexibility, autonomy and heterogeneity demands in distributed IoT environments. With regard to EDA for smart environments, Roda et al. [6] have proposed an architecture for a scalable and collaborative ambient intelligence environment designed for applications such as smart homes, hospitals, health monitoring and daily life assistance.

That brings us to the previous work in which researchers have addressed the concurrency issues that arise when there is a need to process in real time a large number of simultaneous events [12], as is the case with CheckSoft. There are generally two important aspects to such software systems: the ability to detect and respond to events occurring in any random order, and ensuring that the software responds to these events within some required time interval. One of the most notable features of CheckSoft is its ability to process concurrent events in a scalable fashion by using multiprocessing with a distributed memory model, as discussed in Section 4.1.
In this subsection, we briefly review other software systems that are related to our work in the sense that these systems involve software architectures for making real-time inferences from video streams of surveillance data. As an illustration, Vezzani et al. [13] have proposed a Service Oriented Architecture (SOA) that uses event-driven communications to analyze video feeds from multiple cameras for detecting and classifying faces, postures, behaviors, etc. In that sense, this system is mostly for monitoring people, as opposed to monitoring people-object interactions as we do in CheckSoft. There also exists a commercial framework for video surveillance, the IBM Smart Surveillance Engine [14], that is capable of generating real-time alerts for events triggered by changes in the locations of objects. This system, again, is not about tracking human-object interactions as we do in CheckSoft.

More closely related to CheckSoft, from the standpoint of the end purpose of the software architectures, are the works described in [15] and [16] for video surveillance meant for airport checkpoint security, and in [17], [18], and [19] for retail store automation. The architecture for airport checkpoint security reported in [15] attempts to detect the associations between the bags and the passengers using a rudimentary FSM that is specific to passengers divesting objects and reclaiming them at airport checkpoint security. The manner in which this logic is implemented, along with the lack of multi-processing, significantly limits the scalability of that work. The goal of the work reported in [16] is even more limited: it only seeks to detect abandoned bags. By comparison, CheckSoft addresses system scalability with regard to both the number of people and the number of objects through multi-processing and message passing; this was one of the most important considerations in the design of the CheckSoft architecture. Equally important in CheckSoft is the extensibility of the system with regard to the types of interactions between people and objects.

With regard to the previous work on retail store automation, the work reported in [17] and [18] is about just categorizing actions, as opposed to tracking human-object interactions in a fault-tolerant and scalable framework as accomplished by CheckSoft. Another contribution worth mentioning in the context of retail store automation is the work reported in [19] that deals with using cameras and weight sensors for cashierless grocery shopping. The main focus of such a system is to identify object transfers from the shelves to the customer baskets and vice versa. On the other hand, the design we have used for CheckSoft allows the same architecture to be used for different applications (to name just two for highlighting the variety: automatic libraries and airport checkpoint security) with just a tweak of the FSM states and state transitions.

Another area of research that is tangentially related to CheckSoft is video-based security monitoring of what are known as cyber-physical spaces. By a cyber-physical space is meant a monitored space with access control. In addition to physical assets, such spaces may also contain cyber assets that need to be protected from unauthorized intrusions by people. Greaves et al. [20] have shown how a virtual perimeter based on different types of sensors, including video cameras, can be used for access control. Another relevant publication in this area is by Tsigkanos et al. [21], where the authors have shown how the physical layout and the topology of the space involved can be analyzed for the level of protection offered by the sensors used against potential threats. Their proposed analysis techniques also include the dynamics of people movement in the spaces. The approaches used in such systems for detecting unauthorized behaviors by people are generally based on violations of access control and on spatial modeling of people movement. These concepts are not applicable to the goals for which CheckSoft is designed.
CheckSoft is about keeping track of the different entities that are present in a space that is being monitored and, even more importantly, about keeping track of the interactions between those entities. We refer to the interactions, such as when a human picks up an object, as events. Therefore, we need data structures for the entities and for the events. In keeping with the best practices in modern programming, we represent these data structures by class hierarchies. The organization of the classes shown in Figs. 1 and 3 allows the functionality that is common to all the classes to be placed in the root class, which makes it more efficient to maintain and extend the code for different applications.

In the two subsections that follow, Subsection 3.1 presents the Entity class hierarchy and Subsection 3.2 the Event class hierarchy.
As shown by the inheritance hierarchy in Fig. 1, we use the Java class Entity as the root class for all the entities that CheckSoft is required to keep track of, these being the individuals, the objects, the storage units, etc. Here is a brief description of the contract of each class shown in Fig. 1:

Entity (Entity): As mentioned previously, this is the parent class of all the entity-related classes in CheckSoft.

Human Entity (HumEnt): The subclass HumEnt serves as a base class for the different types of human entities that may be present in the space being monitored with the video cameras. What these different types of human entities would be depends on the application. In a retail application, the different subclasses of HumEnt would represent the customers and the different categories of employees in the store. For an automated library, the same would be the users and the librarians, and so on. For airport checkpoint security, the different subclasses of HumEnt would be the passengers and the TSA agents.

Figure 1: The class diagram depicting the inheritance hierarchy for the different entities in CheckSoft.
Storage Entity (StoEnt): The subclass StoEnt can be used as the parent class of other storage-related subclasses that tell us how the objects in the space being monitored present themselves to the humans. The objects could be on shelves, in bins, on the floor, etc. An instance of StoEnt may be real or virtual. An example of a virtual instance of StoEnt would be a heap of objects created by a passenger dumping his/her belongings directly on a conveyor belt.

Object Blobs (OBlob): This subclass in the Entity hierarchy can be used as the parent class for representing the different types of objects in the space being monitored. Ordinarily, one would expect an OBlob instance to represent a single object in the space being monitored. However, it is not always possible to discern objects individually. For example, in the airport checkpoint security case, when a passenger divests all his or her belongings on, say, the conveyor belt, all that the cameras would be able to see would be a blob of pixels occupied by the heap. So an instance of OBlob represents what the system "thinks" is an object. It is possible for such an instance to implicitly represent a collection of objects that are not visible separately.

The reader will notice the attribute ID of the Entity root class in Fig. 1. This attribute, inherited by all the subclasses, is a unique integer that is assigned to each instance of type Entity. Our explanation of CheckSoft in this paper uses the notation H_i to refer to the HumEnt whose ID attribute has value i. Along the same lines, the notation O_j will stand for an OBlob instance that was assigned the ID value j. And the k-th instance of StoEnt in an explanation will be referred to as S_k. In our diagrams, we will use the iconic representations of the different entities as shown in Fig. 2.

Figure 2: Iconic representation of the different entities in CheckSoft.
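The hierarchy of Fig. 1 can be sketched in Java roughly as follows. The class names and the ID and Ownership attributes are taken from the discussion above; the field types, the ID-assignment scheme, and the demo class are our assumptions rather than the authors' actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Root class: every entity gets a unique integer ID and an Ownership record
// (here flattened into two lists of entity IDs). Illustration only.
class Entity {
    private static int nextId = 0;             // simplistic ID assignment (not thread-safe)
    public final int id;
    public final List<Integer> ownedBy = new ArrayList<>(); // Ownership.OwnedBy
    public final List<Integer> owns = new ArrayList<>();    // Ownership.Owns
    protected Entity() { this.id = nextId++; }
}

class HumEnt extends Entity { }                // base class for people (customers, agents, ...)

class StoEnt extends Entity {                  // shelf, bin, cart, or a virtual heap
    public final List<Integer> content = new ArrayList<>(); // IDs of contained OBlobs
    public boolean update = true;              // Content may be modified only when true
}

class OBlob extends Entity { }                 // what the system "thinks" is one object

public class EntityDemo {
    public static void main(String[] args) {
        HumEnt h = new HumEnt();
        StoEnt s = new StoEnt();
        h.owns.add(s.id);                      // h owns s ...
        s.ownedBy.add(h.id);                   // ... and s records its owner
        System.out.println(h.id != s.id);      // every entity has a distinct ID
    }
}
```

Placing the ID and Ownership bookkeeping in the root class is what lets the subclasses stay as thin, application-specific extension points, as described above.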
Table 2 in the Appendix elaborates on the attributes of the classes in the inheritance hierarchy. Obviously, for each child class, what is shown for it specifically in the table is in addition to what it inherits from the base class Entity.

As mentioned at the beginning of Section 3, CheckSoft is about keeping track of the entities and the events that are generated when there is any object-related interaction between the entities. This subsection will describe the class hierarchy for the events.

However, before presenting the hierarchy of classes for the events, it is important to mention that we assume that a video-camera client of CheckSoft can track the individuals and identify the object blobs that the individuals are interacting with, and do so on a continuous basis. We assume that the events pertaining to humans interacting with the objects are all hand-based. We also assume that an interaction between a human entity and an object entity that is in a storage entity starts with a HandIn event and ends with a HandOut event, as explained below.

More specifically, we assume that the video-camera clients that are used to monitor the space have continuously running processes that can detect the following events:

1. A human entering the monitored area triggers the HumanEnter event and the human exiting the area triggers the HumanExit event.

2. When a human instantiates a new storage entity (such as a cart) or "returns" a storage entity, either the StorageInstantiate or the StorageReturn event is triggered, as the case may be. A storage entity is considered returned when it is empty and its user has exited the monitored area.

3. When an object is placed in a storage entity or taken out of it, that triggers a StorageUpdate event which updates the content list of the storage entity. If the space being monitored involves shelves for storing the objects, each shelf would require its own camera for detecting such events.

4. A human hand reaching inside a storage entity triggers a HandIn event and the hand being pulled back triggers a HandOut event. In Fig. 4, the direction of the arrow on the hand extension of the HumEnt icon indicates the direction of the hand movement with respect to the storage container. The direction of the arrow should help the reader figure out whether the corresponding event is HandIn or HandOut.

The Event class hierarchy is shown in Fig. 3. As shown there, we use the Java class Event as the root class for all events in CheckSoft that are detected by the video-camera clients. Different applications of CheckSoft will differ with regard to the types of events and the entities involved in the events. Table 3 in the Appendix elaborates on the different classes in the inheritance hierarchy shown in Fig. 3. This organization of classes allows a user to introduce new types of events in CheckSoft with ease as and when required.

Figure 3: The class diagram depicting the inheritance hierarchy for the different events in CheckSoft.
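A minimal Java sketch of the Event hierarchy of Fig. 3 might look as follows. The event names are the ones listed above; the timestamp and entity-ID payload fields, along with the constructors, are our assumptions for illustration:

```java
// Root class for all detected events (illustration only): each event carries
// the detection timestamp; subclasses add the IDs of the entities involved.
abstract class Event {
    public final double time;
    protected Event(double time) { this.time = time; }
}

class HumanEnter extends Event { public final int humId; HumanEnter(double t, int h) { super(t); humId = h; } }
class HumanExit  extends Event { public final int humId; HumanExit(double t, int h)  { super(t); humId = h; } }
class StorageInstantiate extends Event { public final int humId, stoId; StorageInstantiate(double t, int h, int s) { super(t); humId = h; stoId = s; } }
class StorageReturn      extends Event { public final int humId, stoId; StorageReturn(double t, int h, int s)      { super(t); humId = h; stoId = s; } }
class StorageUpdate      extends Event { public final int stoId;        StorageUpdate(double t, int s)             { super(t); stoId = s; } }
class HandIn  extends Event { public final int humId, stoId; HandIn(double t, int h, int s)  { super(t); humId = h; stoId = s; } }
class HandOut extends Event { public final int humId, stoId; HandOut(double t, int h, int s) { super(t); humId = h; stoId = s; } }

public class EventDemo {
    public static void main(String[] args) {
        Event e = new HandIn(3.2, 1, 2);       // the hand of HumEnt 1 enters StoEnt 2
        System.out.println(e instanceof HandIn);
    }
}
```

Because every event derives from the same root, a new application-specific event type can be added as one more subclass without touching the rest of the hierarchy, which is the extensibility point made above.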
Fig. 4 illustrates an example of the different entities involved and the events detected in a people-centric space. The space being monitored can be divided into three main regions: the entry area, the exit area, and the interaction area where people interact with objects. The three different types of entities (HumEnt, StoEnt and OBlob) are shown, each with its own unique integer ID. The icons used for the entities are as indicated in Fig. 2.

The example shown in Fig. 4 will generate several different events simultaneously. In the entry and exit areas, the entry of one HumEnt and the exit of another will generate the HumanEnter and HumanExit events, respectively. Similarly, one HumEnt instantiating a StoEnt and another HumEnt returning a StoEnt will result in the StorageInstantiate and StorageReturn events being detected. In the interaction area of Fig. 4, many individuals could be interacting at the same time and hence several HandIn, HandOut and StorageUpdate events would be generated simultaneously. It could be that a HumEnt has previously placed the object represented by an OBlob in a StoEnt and is now retrieving his/her hand, which would trigger the HandOut event. Along the same lines, another HumEnt appears to be in the process of placing an object in a StoEnt, which would trigger the HandIn event; and so on. The StorageUpdate event is generated whenever the content information of any StoEnt needs to be updated.

Figure 4: An illustration of the different entities and the events detected in the monitored space.
Focusing on a specific subset of the interactions in Fig. 4, Fig. 5 illustrates how the state information for the entities is stored in the different data structures. We will consider the interactions involved in HumEnt H_1 transferring OBlob O_1 from StoEnt S_2 to StoEnt S_3 in Fig. 4. There are two interactions relevant here: HumEnt H_1 removing OBlob O_1 from StoEnt S_2 between times t_p and t_q, followed by another interaction in which the same HumEnt places the same OBlob in StoEnt S_3 between times t_r and t_s. Each interaction starts with a HandIn event and ends with a HandOut event, as shown in Fig. 5(a). Fig. 5(b) shows all the entity instances relevant to the two interactions.

As shown in Fig. 5(b), the StoEnt instances maintain the timestamped information related to the OBlob instances they contain in the Content data attribute. In Fig. 5(b), before the interaction starts with StoEnt S_2 at t_p, S_2.Content holds the IDs of the three OBlobs then contained in S_2, among them O_1; after the interaction ends at t_q, the ID of O_1 is no longer in S_2.Content. On a similar note, before the interaction starts with StoEnt S_3 at t_r, S_3.Content holds the IDs of the two OBlobs then contained in S_3, as can be seen in Fig. 5(b); after the interaction ends at t_s, it additionally holds the ID of O_1.

One important point to note here is that during an interaction, when the hands are inside the storage area, the hands may occlude the objects from the camera. Therefore, if the video-trackers generate a StorageUpdate event during the interaction, the content information reported would be inaccurate. It must be ensured that the state of a StoEnt S_k is not corrupted when the data reported by the video-trackers is unreliable. The S_k.Update attribute shown in Fig. 5(b) allows updates to the S_k.Content attribute only when it is set to True. To prevent the state of S_k from being updated during the interaction, when a HandIn event involving S_k occurs, S_k.Update transitions from True to False and, subsequently, when a HandOut event occurs, S_k.Update transitions back to True.

Each HumEnt instance stores locally the 'before' and 'after' states of the storage containers that it is interacting with. These are stored in the data attributes BufferBefore and BufferAfter of the HumEnt instance. In our example, the HumEnt instance H_1 will store the 'before' and 'after' contents of the StoEnts S_2 and S_3 in the variables H_1.BufferBefore and H_1.BufferAfter. A unique key is associated with each such stored state that depends on the IDs of both the HumEnt and the StoEnt. In our example, the keys associated with what is stored in H_1.BufferBefore and H_1.BufferAfter would be <1,2> for the first interaction and <1,3> for the second interaction. In these two keys, 1 is the HumEnt ID, and 2 and 3 are the StoEnt IDs involved in the interactions.

An attribute common to all the entities is Ownership, which consists of two lists: the OwnedBy list, in which the information regarding who owns the corresponding entity is stored, and the Owns list, in which the information regarding what entities the corresponding entity owns is stored. For example, as shown in Fig. 5(b), HumEnt H_1 owns StoEnt S_2 and hence H_1 appears in the OwnedBy list of S_2.Ownership and StoEnt S_2 appears in the Owns list of H_1.Ownership. Similarly, the reader can see that the StoEnt S_3 is owned by some other HumEnt. HumEnt H_1 also owns three OBlobs, among them O_1, and these appear in the Owns list of H_1.Ownership.

The data attributes H_1.ActionQ and H_1.InferenceQ shown in the left panel of Fig. 5(b) are hash tables of queues storing the interaction history for each OBlob that HumEnt H_1 interacted with. The interaction history, in the exact temporal sequence of occurrence, is stored for each OBlob in a separate queue. Each interaction is recorded in the following form:

[ t - action/inference - k ]

where t is the time of the interaction and S_k the StoEnt involved in the interaction. Regarding the two queues mentioned above, in H_1.ActionQ we store the elementary actions, such as adding or removing OBlobs. On the other hand, in H_1.InferenceQ we store the inferences and anomalies detected by the logic of CheckSoft. To illustrate this, for both H_1.ActionQ and H_1.InferenceQ, we will consider the entries for the key of the OBlob involved in the interaction in Fig. 5(b). Since OBlob O_1 was transferred from StoEnt S_2 to S_3, the elementary actions in this case are Remove and Add, and these are stored in H_1.ActionQ as the following strings:

[ t_q - Remove - 2 ], [ t_s - Add - 3 ]

The inferences are stored in H_1.InferenceQ as:

[ t_q - MoveFrom: Own Bin - 2 ], [ t_s - MoveTo: Other Bin - 3 ]

The ownership relationships Own Bin and Other Bin are derived from the fact that StoEnt S_2 is owned by HumEnt H_1 but StoEnt S_3 is not, as shown in H_1.Ownership. This interaction, in which OBlob O_1 was transferred to a StoEnt that belongs to some other HumEnt, is an example of what will be detected as an anomalous interaction by CheckSoft.

Figure 5: (a) A pictorial description of HumEnt H_1 transferring OBlob O_1 from StoEnt S_2 to StoEnt S_3. (b) An illustration of how the state information of the different entities is stored as a result of the interactions.

The goal of this section is to give the reader a top-level view of the architecture of CheckSoft and the rationale underlying the architectural design. A more detailed presentation of the architecture that describes the roles of the different components will follow in Section 5. With regard to the top-level view presented in the next three subsections, we first describe in Section 4.1 how concurrency is exploited for handling the interactions that may occur simultaneously; subsequently, in Section 4.2, we present the major modules of the event-driven software architecture. Finally, in Section 4.3, we provide a high-level overview of how CheckSoft's finite-state-machine based logic provides significant immunity against the noisy data reported by the video-trackers.
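As a concrete recap of the bookkeeping described in the previous section, the following Java sketch (our own illustration, not the authors' code; the Bin class stands in for StoEnt, and all method names are assumed) shows how a HandIn/HandOut pair freezes a storage entity's Content, snapshots the 'before' and 'after' buffers, and turns their set difference into the elementary Remove/Add entries of ActionQ:

```java
import java.util.*;

// Minimal stand-in for a StoEnt: just the attributes used in this sketch.
class Bin {
    final int id;
    boolean update = true;                     // S_k.Update: Content may change only when true
    final List<Integer> content = new ArrayList<>();  // IDs of contained OBlobs
    Bin(int id) { this.id = id; }
}

public class InteractionBookkeeping {
    final Map<Integer, Set<Integer>> bufferBefore = new HashMap<>(); // keyed by StoEnt ID
    final Map<Integer, List<String>> actionQ = new HashMap<>();      // per-OBlob action history

    void onHandIn(Bin s) {
        s.update = false;                      // hand occludes the bin: freeze StorageUpdates
        bufferBefore.put(s.id, new HashSet<>(s.content)); // BufferBefore snapshot
    }

    void onHandOut(Bin s, double t) {
        s.update = true;                       // hand withdrawn: s may be updated again
        Set<Integer> before = bufferBefore.get(s.id);
        Set<Integer> after = new HashSet<>(s.content);    // BufferAfter snapshot
        for (int o : before)                   // gone from s: an elementary Remove
            if (!after.contains(o)) log(o, "[" + t + " - Remove - " + s.id + "]");
        for (int o : after)                    // new in s: an elementary Add
            if (!before.contains(o)) log(o, "[" + t + " - Add - " + s.id + "]");
    }

    void log(int oblobId, String entry) {
        actionQ.computeIfAbsent(oblobId, k -> new ArrayList<>()).add(entry);
    }

    public static void main(String[] args) {
        InteractionBookkeeping h = new InteractionBookkeeping();
        Bin s2 = new Bin(2);
        s2.content.add(1);                     // an OBlob with ID 1 sits in bin 2
        h.onHandIn(s2);
        s2.content.remove(Integer.valueOf(1)); // tracker reports the OBlob taken out
        h.onHandOut(s2, 10.0);
        System.out.println(h.actionQ.get(1));  // one Remove entry for OBlob 1
    }
}
```

An ownership lookup of the kind shown in Fig. 5(b) could then turn each Remove/Add pair into the MoveFrom/MoveTo inferences, flagging a transfer into another person's bin as anomalous.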
As illustrated in Fig. 6(a), the data produced by the video camera clients (displayed by the 6 icons in the top row in the figure) is inherently parallel. Most applications of CheckSoft will involve several video cameras monitoring different sections of the space, as shown in Fig. 4. Additionally, any number of individuals may be interacting with an arbitrary number of objects at any given time. Therefore, one can expect an arbitrary number of events to be generated concurrently. In order not to introduce undesirable latencies in reacting to these events, they would need to be processed
CheckSoft - February 23, 2021

in parallel. Obviously, any attempt at exploiting this parallelism must not result in memory corruption or deadlock conditions should the computations being carried out in parallel access the same portions of the memory.

Figure 6: The Overall Architectural Diagram. (a) Each HumEnt and StoEnt entity is allocated a separate worker process. The master process dispatches events detected by the video-trackers to the different worker processes. The worker processes then run the event-handling modules shown below and communicate using MPI as and when required. (b) The different CheckSoft modules that handle concurrent events are shown in grey boxes. (The subscripted notation H_i refers to the i-th HumEnt instance, O_j refers to the j-th OBlob instance, and S_k refers to the k-th StoEnt instance involved in the corresponding events.)

The two basic approaches to achieving concurrency in software are multi-threading and multi-processing. Since threads can be thought of as lightweight processes, one main advantage of multithreading is that it is faster to create and destroy the threads and faster to access the objects in memory that are meant to be shared by the threads. However, even with synchronization mechanisms such as semaphores, multithreading requires greater care in ensuring that the memory meant to be shared by the different objects is not corrupted by threads accessing the same object simultaneously.

In CheckSoft, we have chosen to use multiprocessing instead. As each
HumEnt or StoEnt instance is created, it is handed over to a separate process by the master process that runs CheckSoft. This decentralizes the bookkeeping for keeping track of the different states of each individual and each storage container, as the process assigned to each entity takes care of all such details. This is one of the important reasons why our software can easily be scaled up. As is to be expected, a collection of processes for similar entities, such as all the HumEnt entities, are clones of the same basic process. We refer to such a collection as a process group. For the purpose of explanation, and as shown in Fig. 6(a),
we use the notation
HumEnt_Group to refer to the HumEnt processes and the notation
StoEnt_Group to refer to the StoEnt processes.

CheckSoft uses multiprocessing with a distributed memory model in which every process in the system has its own private memory. If an event involves any of the HumEnt and/or StoEnt instances, the corresponding worker process executes the appropriate event-handling module described in Section 4.2 and updates the corresponding entity state information in its own private memory, as shown in Fig. 6(a). Since all of the computations carried out by a process only involve the private memory of the process, it follows that the processes that are in charge of analyzing the human-object interactions must somehow become aware of both the humans and the objects involved.
As opposed to using a shared-memory model, we take care of such inter-process interactions through the communication facilities provided by MPI (Message Passing Interface).
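The process-per-entity dispatch pattern described above can be conveyed with a small sketch. CheckSoft itself is built on MPI, not on Python's multiprocessing, and the function and queue names below are our own invention; this is only an illustration of one worker per entity, each with private state, fed events by a master.

```python
from multiprocessing import Process, Queue

def entity_worker(entity_id, inbox, outbox):
    """Owns the private state of one HumEnt or StoEnt instance."""
    state = {"id": entity_id, "events": []}   # private memory of this process
    while True:
        event = inbox.get()
        if event is None:                     # sentinel: shut down, report state
            outbox.put((entity_id, state["events"]))
            return
        state["events"].append(event)         # bookkeeping local to this entity

def run(entity_ids, dispatched):
    """Master: spawn one worker per entity and route each event to it."""
    inboxes = {eid: Queue() for eid in entity_ids}
    results = Queue()
    workers = [Process(target=entity_worker, args=(eid, q, results))
               for eid, q in inboxes.items()]
    for w in workers:
        w.start()
    for eid, ev in dispatched:        # dispatch only to the entity involved
        inboxes[eid].put(ev)
    for q in inboxes.values():
        q.put(None)
    collected = dict(results.get() for _ in workers)
    for w in workers:
        w.join()
    return collected
```

Because each worker touches only its own `state`, no locks are needed; this is the distributed-memory property the paper attributes to its MPI design.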
CheckSoft uses the standard MPI for intra-communication within each process group and for inter-communication between different process groups. MPI provides us with what are known as communicators for those purposes. The messaging achieved with intra-communicators within each process group can work in both point-to-point mode, in which each process sends a message to one other specific process, and in broadcast mode, in which a process sends a message to all other sibling processes. The messaging achieved with inter-communicators can work only in point-to-point mode, in which a process of a particular group sends a message to a chosen process in another process group.

For an example of within-group communications with intra-communicators, let's say that some aspect of the state of all
StoEnt entities needs to be updated at the same time; this would require using an intra-communicator in the broadcast mode. And for an example that requires an intra-communicator in a point-to-point mode, consider an object that has been transferred from one
StoEnt instance to another
StoEnt instance. In this case, while the receiving StoEnt would know directly from the video cameras that it was in possession of a new object, it would nonetheless need to hear directly from the sending
StoEnt instance for confirmation.

The master
CheckSoft process uses MPI's default communicator,
Default_comm, to dispatch events to all worker process groups, meaning the processes in
HumEnt_Group and in
StoEnt_Group, as shown in Fig. 6(a). We denote the intra-communicators within the process groups
HumEnt_Group and
StoEnt_Group by HumEnt_comm and
StoEnt_comm, respectively. As mentioned earlier, in addition to facilitating within-group communications, the intra-communicators are also needed for the functioning of the inter-communicators.

A particularly useful application of the inter-communicator is in fetching content data from a process in the
StoEnt_Group when a
HandIn or a
HandOut event is recorded (because, say, a human divested an object in a bin). The data and the entities involved can subsequently trigger the downstream finite-state based logic for checking the legality of the event and the legality of the consequences of the event.

As to how the communicators are actually used for passing messages between the processes and fetching data and results from the processes, that is accomplished with an MPI-based function gptwc(), whose name stands for "General Purpose Two-way Communication". This function is general purpose in the sense that it can be used to establish a two-way communication link between any two processes. The process that wants to establish a communication link is called the initiator process, and the other endpoint of the communication link the target process. An initiator process sends data to a target process and indicates what operation needs to be carried out on the sent data vis-a-vis the data that resides in the private memory of the target process. The target process acts accordingly and sends the result back to the initiator process. The implementational details and how this function is invoked are presented in Appendix B.
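To convey the flavor of the request/operate/reply cycle of gptwc() without the MPI machinery (the real function is built on the MPI calls described in Appendix B; the class, operation names, and queue-based transport below are our own simplification):

```python
import queue

class TargetProcess:
    """Stands in for a worker process holding private memory."""
    def __init__(self, private_data):
        self.private = set(private_data)   # data only this process may touch
        self.requests = queue.Queue()
        self.replies = queue.Queue()

    def serve_one(self):
        """Apply the requested operation against private memory, send result."""
        op, payload = self.requests.get()
        if op == "fetch":                  # return a copy of the private data
            result = set(self.private)
        elif op == "union":                # merge sent data into private memory
            self.private |= payload
            result = set(self.private)
        else:
            result = None
        self.replies.put(result)

def gptwc(target, op, payload=None):
    """Initiator side: send a request, wait for the target's reply."""
    target.requests.put((op, payload))
    target.serve_one()                     # in reality the target runs concurrently
    return target.replies.get()
```

The key property mirrored here is that the initiator never reads the target's memory directly; it only receives the result the target chooses to send back.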
As mentioned in Section 2.1, CheckSoft is comprised of different modules, each responsible for a distinct functionality, such as extracting person-object ownership information from the data; establishing and updating associational information related to the possibly changing relationships between the objects and the storage units; monitoring various interactions in the environment; detecting anomalies; and so on.

The highly modular architecture is event-driven, as shown in Figure 6(b). The events listed on the left side of the figure trigger specific event-handler modules shown on the right side. The most critical components of CheckSoft are the Interaction and Inference modules, which are responsible for analyzing the events and drawing inferences regarding the outcome of each person-object interaction. Each submodule within the Interaction and Inference Modules is a finite state machine (FSM) that works independently of the FSMs in all other submodules — independently in the sense that the FSM in a submodule cares only about the changes in the state information of the entities involved in the event that triggered the particular FSM.
The use of FSM logic allows us to endow the modules with event-driven behaviors that possess efficient (with polynomial-time guarantees) implementations. As a result of employing an event-driven state machine model, the event handlers remain idle while waiting for an event, and then, when an event occurs, the relevant event handler reacts immediately. This approach can be thought of as a set of computational agents collaborating asynchronously using event-based triggers. Additionally, an FSM-based implementation allows for easy scalability with regard to all the variables that require an arbitrary number of instantiations and additionally provides the flexibility to easily extend software functionalities as and when required.

CheckSoft consists of the following four modules:
Video Tracker Data Recorder (VTDR) Module:
It is this module that the video camera clients talk to directly. As mentioned earlier, CheckSoft is meant to sit on top of one or more video camera clients installed in the space being monitored. As we will present in Section 5.1, the VTDR module provides a Java RMI (Remote Method Invocation) [22] based E-API (Extension-API) that a video camera client can use for directly sending its event detection results to CheckSoft. VTDR associates a time stamp with every detection reported by a video camera client, and, as the reader will see later, that plays an important role in the temporal sequencing of the events that may be received simultaneously from multiple video cameras. The VTDR module records the event and storage content information from multiple video-trackers in an encoded format into a concurrent unbounded persisted queue named
EventSeq with micro-second latencies.
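The time-stamping step can be sketched as follows. The class name, the in-memory deque standing in for the persisted queue, and the sequence-number tie-breaker are all our own illustrative choices; the actual module is written against Java RMI and Chronicle Queue.

```python
import itertools
import time
from collections import deque

class VTDR:
    """Time-stamps events from camera clients and appends them to EventSeq."""
    def __init__(self, clock=time.monotonic):
        self.event_seq = deque()          # stand-in for the persisted queue
        self._clock = clock
        self._seq = itertools.count()     # tie-breaker for identical stamps

    def record_event_sequence(self, event_type, entity_info):
        """Stamp one reported detection and append it to the event log."""
        stamped = (self._clock(), next(self._seq), event_type, entity_info)
        self.event_seq.append(stamped)    # appenders only ever write at the end
        return stamped
```

Because every record carries its arrival time, the downstream dispatcher can replay events in strict temporal order even when several cameras report at once.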
Event Dispatcher Module:
This module dispatches the encoded event information in the
EventSeq queue sequentially, in the temporal order of occurrence, for further processing. This triggers the different event handlers (and thus the modules shown on the right side of Fig. 6(b)).
Interaction Module:
This module is triggered by the
HandIn and
HandOut events involving
HumEnt and
StoEnt instances and detects the elementary interactions such as the addition and the removal of
OBlobs to and from
StoEnt instances. This module has two submodules:

1.
Association Submodule:
This module processes the
HandIn and
HandOut events and, by monitoring changesin the
StoEnt content due to an interaction, establishes associations between the
OBlob , StoEnt and the
HumEnt instances.

2.
Interaction State Machine Submodule:
This module implements the finite-state machine based logic to keep track of the elementary interactions between the
HumEnt instances and the
OBlob instances that are present in the different
StoEnt instances. The finite-state machine logic associates different states with the entities and checks on the legality of the state transitions — and, thus, adds robustness to the operation of CheckSoft.
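A minimal sketch of such legality checking is a transition table keyed on the current state and the incoming event; any pair absent from the table is rejected. The state and event names below are illustrative stand-ins, not the exact states of CheckSoft's FSMs.

```python
# Allowed transitions: (current_state, event) -> next_state.
# Any (state, event) pair absent from the table is an illegal transition.
TRANSITIONS = {
    ("Idle",        "HandIn"):  "Interacting",
    ("Interacting", "HandOut"): "Idle",
}

def step(state, event):
    """Advance the FSM one step; report whether the transition was legal."""
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        return state, False    # reject: stay in the current state
    return nxt, True
```

A HandOut arriving while the machine is Idle, for instance, is flagged as illegal and leaves the state untouched, which is exactly how spurious detections are prevented from corrupting entity state.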
Inference Module:
This module specializes in making inferences from the elementary interactions of the
HumEnt instances with each of the
OBlob instances that the
HumEnt interacted with. After each interaction, this module is triggered by a
HandOut event (after the elementary action involving the
HandOut event is recorded by the Interaction Module). This module is also triggered by the
HumanExit events. When the inference module is triggered, that causes the
Data Extractor submodule to extract the interaction history recorded by the Interaction Module in a time-sequential manner for each
OBlob instance the
HumEnt interacted with and sends this data to two concurrently running submodules:

1.
Inference State Machine Submodule:
This submodule implements the finite-state machine based logic to infer higher-level interactions from the elementary interactions involving the
HumEnt , OBlob and the
StoEnt instances.

2.
Anomaly Detector:
This submodule detects anomalous interactions (such as when a non-owner
HumEnt interacts with an
OBlob and/or
StoEnt instance) and raises appropriate alarms and warnings.

Since the higher-level interactions as well as the types of anomalies vary with the specific application that our architecture is applied to, we may need to tweak the finite-state machine logic in this module. However, it is important to note that these changes in the logic are expected to be minor and should not require any changes in the overall architectural framework of CheckSoft.
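The non-owner check at the heart of the Anomaly Detector can be sketched as follows. The function and the flat (time, inference, StoEnt-id) tuple layout are our own simplification of the H.InferenceQ entries described earlier; the real submodule is driven by FSM logic rather than a single scan.

```python
def detect_anomalies(human_id, inference_entries, ownership):
    """Flag inferences in which a HumEnt touched a StoEnt it does not own.

    inference_entries: list of (time, inference, stoent_id) tuples, in the
    spirit of H.InferenceQ; ownership: mapping HumEnt id -> owned StoEnt ids.
    """
    owned = ownership.get(human_id, set())
    alarms = []
    for t, inference, stoent_id in inference_entries:
        if stoent_id not in owned:       # interaction with someone else's bin
            alarms.append((t, human_id, stoent_id, inference))
    return alarms
```

This reproduces the earlier worked example: a MoveTo targeting another HumEnt's bin is flagged, while actions on the owner's own bin pass silently.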
The events listed on the left side of Fig. 6(b) are generated by the video-trackers. Such events generally are noisy due to occlusion, missed and false detections, inaccurate localization of objects [20], and ineffective tracking under noisy conditions [23]. (It is implicitly assumed that the video camera clients are running the NTP protocol as a background process for staying globally synchronized.) Hence, CheckSoft needs to be designed such that it can provide some immunity against noisy events detected by the video trackers. This section qualitatively describes how the software handles different sources of noise; a quantitative analysis is presented in Section 7. The former part of this section briefly describes how CheckSoft handles false and missed detections, and the latter part describes how CheckSoft ensures that the entity state is not corrupted by erroneous events reported by the video trackers.

CheckSoft is designed to be robust with respect to missed detections by the video cameras and also with respect to any false detections. To give the reader a sense of what we mean by missed and false detections, note that all the objects in a
StoEnt instance may not be detected accurately in every image frame, as can be seen in Figure 7(a). These variations in the detected objects might lead to inconsistencies. An additional source of difficulty arises from the fact that when a hand that is trying to reach an object in a
StoEnt moves too suddenly, it may not be detected in one or more frames until the image of the hand stabilizes. CheckSoft provides some protection against such effects by filtering out momentary fluctuations in the detections reported by the video trackers.

Yet another source of difficulty is that when a human hand reaches into a
StoEnt instance it will block the objects in the container from the camera. This is illustrated in Figure 7(b). Therefore, to eliminate the inconsistencies introduced due to occlusion, the
Interaction Module in Section 5.3 only considers the object content before a hand has entered the storage area and after the hand has left.

The state of the entities is updated by the frequently occurring
HandIn, HandOut, and
StorageUpdate events. It is critical to ensure that the states of the entities do not get changed by erroneous events reported by the video trackers, so that the higher-level reasoning logic of CheckSoft functions as desired. This is done by enforcing consistency in the finite-state logic between the different events related to the same overall person-object interaction, as shown by the state diagrams in Fig. 8. In Fig. 8, a state is represented by the grey boxes, the event or condition that needs to be satisfied for a state transition is shown in red, and the corresponding output as a result of the transition is shown in blue alongside the arrows.

The state diagram in Fig. 8(a) illustrates how CheckSoft rejects noisy content information reported by video-trackers. The
HandleUpdate module in Fig. 6(b) appends the
Content data attribute with the latest content of the StoEnt S_k, with the corresponding time-stamp t, when a StorageUpdate(S_k, t) event occurs, but only if the data attribute S_k.Update is set to True. This ensures that the content information for any StoEnt is updated only when reliable data is reported by the video-trackers. As a case in point, the video-trackers might report erroneous content information during the time of an interaction, when objects inside the storage area are occluded by hands. To provide protection against this, when any
HandIn event involving StoEnt S_k occurs, S_k.Update is set to False, thereby blocking any updates to the state of S_k during the interaction. When a HandOut event occurs involving the same StoEnt, S_k.Update is set to True, thereby allowing updates to the state of S_k after the interaction is over.

The state diagram in Fig. 8(b) illustrates how noisy HandIn and HandOut events, as well as false interactions reported by video-trackers, are filtered by the
Interaction Module. As mentioned before, every interaction starts with a
HandIn event and ends with a
HandOut event. Typically a video-tracker monitoring any storage area should report a
HandIn event when motion is detected and subsequently report a
HandOut event when the motion has completely subsided in the storage area.
CheckSoft considers an interaction to be valid only if both of these events involving the same HumEnt and StoEnt have been reported, in the correct temporal sequence, by the video trackers. The
Association Submodule is designed to filter out the noisy hand-based events, as shown by the black arrows in Fig. 8(b). The Association Module thereby only processes the hand-based events for a valid interaction, consequently fetches the content before and after the interaction from StoEnt S_k, and finally stores it in H_i.ContentBefore and H_i.ContentAfter, respectively.

Figure 7: Inconsistencies in object detection due to sudden changes in ambient conditions and inconsistencies in hand detection due to motion artifacts and sudden motion are illustrated in (a). Inconsistencies arising when objects are partially or completely occluded by hand(s) are illustrated in (b).

Figure 8: State diagrams showing how CheckSoft is designed to be robust to noisy data reported by video-trackers. The state diagram for rejecting noisy storage content information is shown in (a). The state diagram for rejecting noisy HandIn, HandOut events and false interactions is shown in (b).

Also, it is very common for the video-trackers to report false interactions, where an interaction didn't actually happen but the video trackers report false
HandIn and
HandOut events indicating that an interaction has happened. For such cases, there is no change in the content, since an actual interaction has not happened. The
Interaction State Machine (shown by the magenta arrows in Fig. 8(b)) analyzes the content before and after the interaction and records the interaction in H_i.ActionQ for further analysis only if there is some change in the content. Otherwise, the interaction is not recorded, and thereby false interactions are filtered out. Only the recorded interactions are further considered for analyzing inferences and detecting anomalies by the
Inference Module and recorded in H_i.InferenceQ (shown by the green arrow).
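The update-gating behavior of Fig. 8(a) can be condensed into a few lines. The class below is a sketch with invented names; in CheckSoft the StoEnt state lives in a worker process's private memory and the flags are toggled by the event handlers.

```python
class StoEntState:
    """Content updates are accepted only outside of an interaction."""
    def __init__(self, sid):
        self.sid = sid
        self.update = True     # the S_k.Update flag of Fig. 8(a)
        self.content = []      # accepted time-stamped content snapshots

    def hand_in(self):
        self.update = False    # hands may occlude objects: block updates

    def hand_out(self):
        self.update = True     # interaction over: accept updates again

    def storage_update(self, t, content):
        """Record a StorageUpdate only when the gate is open."""
        if self.update:        # ignore content reported mid-interaction
            self.content.append((t, set(content)))
```

A StorageUpdate that arrives between HandIn and HandOut, when a hand is likely occluding the bin, is silently dropped; only pre- and post-interaction snapshots survive.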
The previous section, Section 4, provided a high-level summary of the modular architecture of CheckSoft and also talked about how data parallelism is exploited through concurrent processing, as made possible by the MPI standard. The goal of this section is to present a more detailed look at each of the CheckSoft modules.
We start by reminding the reader that video tracking per se is outside the scope of the system architecture we present in this paper. That is, we assume that a video tracker unit is implemented externally and provides our system with the information related to the occurrence of the various kinds of events mentioned in Section 3.2. As mentioned before, an arbitrary number of such video-trackers might be employed to monitor a space, and it is therefore critical to design an interface that allows CheckSoft to operate vis-a-vis any arbitrary number of video cameras.

The main purpose of the VTDR module shown in Figure 9(a) is to provide a Java RMI (Remote Method Invocation) [22] based plug-n-play interface for the software clients running video cameras. In the language of RMI, a client refers to the machine that wants to invoke a certain functionality in the namespace of another machine, which is referred to as the server. The distributed object programming made possible by RMI is founded on the tenet that, to a client, a remote object should appear as if it were available locally.
Figure 9: The plug-and-play architecture for incorporation with video trackers is shown in (a). A high-level overview illustrating the mechanism of recording and dispatching event information using the EventSeq queue is shown in (b).
In other words, after a client has located the remote object on the server, the client's invocation of the methods of the server object should have the same syntax as the methods of an object local to the client. That is, as long as the client implements the functions declared in the E-API (Extension Application Programming Interface) of the VTDR module, the VTDR module and the client would be able to exchange information seamlessly. By calling on its RMI stub classes, the client would be able to write information to the VTDR server's memory.

The E-API made available by the VTDR server is shown in Figure 9(a). The Blackboard interface is a pure interface with the declaration of the function that is implemented in the Java class
VTDRServer. The function signature is as follows:

void recordEventSequence(Event E)

where
Event is a superclass of all the events that are detected by the video tracker client. Therefore, at runtime, an argument of type
Event will actually be a derived instance of the specific event, depending on the application.

As each event of the type listed in Section 3.2 occurs, the video tracker client software invokes the recordEventSequence() function on the stub of the
VTDRServer class. VTDR's E-API sits on top of Java RMI, which allows for asynchronous interactions between the VTDR module and multiple video tracker clients. This design allows CheckSoft to operate vis-a-vis any arbitrary number of video cameras monitoring the space.

The information provided by the multiple video-tracker clients through the E-API is encoded and appended concurrently to a queue named
EventSeq. This provides a buffer between the video-tracker clients (producers) and the downstream CheckSoft event-handler modules (consumers), which allows CheckSoft to process events independently of the incidence rate of the events. (When a software system defines an E-API, that makes it much easier to write code for plug-n-play modules for that system. Technically speaking, an E-API for a software system is just like a regular API, except that the E-API's functions are meant to be implemented by an external entity.)

In our implementation, we use the Chronicle Queue (CQ), which is a distributed unbounded persisted queue used for high-performance and latency-critical applications (https://github.com/OpenHFT/Chronicle-Queue). It uses 'appenders' that write data to the end of the queue and 'tailers' that read the next available data in the queue without deleting any data. In our design, shown in Fig. 9(b), each video-tracker client records the event information to the EventSeq queue concurrently using an appender. Then the master process of CheckSoft reads the event data using a tailer and dispatches the information to the different worker processes.

The use of CQ in the design of the VTDR module offers the following main advantages:

1. It allows fast communication between the process that runs VTDRServer (which writes event data) and the master process of CheckSoft (which reads and dispatches event data to the other worker processes).

2. It provides data persistence through memory-mapped files with no data loss, additionally storing the data to disk periodically for post-facto analysis.

3. It is highly suitable for real-time applications that demand high throughput involving a large number of events with low latency, because it uses off-heap memory, which is not affected by garbage-collection overheads.

4. It supports concurrent read and write operations and guarantees total ordering of messages. For us this ensures that each video-tracker client appends the event data to the queue in the exact temporal sequence of occurrence.
VTDRServer class, and the client only knows about the signature of the function of this class that is declared in the
Blackboard interface. The client only has to program to the interfaces of the root classes in our software system and has no access to any of the implementation code in the server class; that creates maximum separation between our software architecture and the code in the video-tracker module. Additionally, the VTDR server and the video tracker clients could be running on different machines in a computer network.
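The appender/tailer discipline can be illustrated with a toy in-memory stand-in for Chronicle Queue (the class and method names are our own; CQ itself persists records to memory-mapped files):

```python
class EventSeqQueue:
    """Append-only log with non-destructive readers, in the style of CQ."""
    def __init__(self):
        self._log = []

    def appender(self):
        """Return a writer that only ever adds records at the end."""
        def append(record):
            self._log.append(record)
        return append

    def tailer(self):
        """Return a reader with its own position; reading deletes nothing."""
        pos = [0]
        def read_next():
            if pos[0] >= len(self._log):
                return None              # nothing new yet
            rec = self._log[pos[0]]
            pos[0] += 1
            return rec
        return read_next
```

Because each tailer keeps its own cursor and never removes data, the master process can replay events while the full log remains available for post-facto analysis, mirroring the persistence property listed above.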
The concurrent event information from the video trackers is recorded in the following encoded format:

[Event type, Event time, Entity Information]
The master process for
CheckSoft reads the encoded event information from the
EventSeq queue sequentially in the temporal order of occurrence and broadcasts it to the
HumEnt and
StoEnt worker processes using the
Default_comm communicator, as shown in Figure 6(a).

The worker processes decode the event data received from the master process. From the
Entity Information, the entities involved in each event are determined, and the worker processes assigned to the corresponding entities then call the appropriate event handlers to perform a variety of functions/computations based on the
Event Type, as shown in Table 4 in the Appendix.
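The decode-and-route step can be sketched as follows. The JSON encoding and the handler-registry shape are our own illustrative choices; the paper does not specify the actual encoding used on the wire.

```python
import json

def encode_event(event_type, event_time, entity_info):
    """Pack an event into the [Event type, Event time, Entity Information] format."""
    return json.dumps([event_type, event_time, entity_info])

def decode_and_route(encoded, handlers):
    """Decode an event and call the handler registered for its type."""
    event_type, event_time, entity_info = json.loads(encoded)
    return handlers[event_type](event_time, entity_info)
```

In CheckSoft the routing happens inside each worker process after the broadcast: a worker inspects the Entity Information, and only the workers assigned to the entities involved invoke their handlers.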
This module is responsible for detecting the elementary interactions such as the addition and the removal of
OBlob instances to/from the
StoEnt instances and associating these entities with the
HumEnt entity involved in the interaction. This module is triggered for every interaction between
HumEnt and
OBlob instances contained within
StoEnt instances. Since these interactions can happen simultaneously in the real world, this module is implemented such that it can handle concurrent events.

Additionally, the Interaction Module is designed to be robust to various sources of noisy event data reported by the video trackers, such that the higher-level inferences made by the modules downstream are less prone to errors. We provided a brief overview of the approach in Section 4.3; in this section we provide a detailed discussion of the same. More specifically, the Interaction Module provides immunity against the following:

1. noisy hand-based events and false interactions, by enforcing consistency in the finite-state logic between the different events related to the same overall person-object interaction;

2. inconsistent storage content information, as shown in Fig. 7, due to the following:

(a) false and missed object detections in the storage area, by filtering out momentary fluctuations in the detections reported by the video trackers;

(b) occlusion of objects by hands during an interaction, by determining the object content before a hand has entered the
StoEnt area and after the hand has left.

The Interaction Module has two submodules:
This submodule processes the
HandIn and
HandOut events and, by monitoring changes in the
StoEnt content due to an interaction, establishes associations between the
OBlob , StoEnt and the
HumEnt instances. To elaborate further on this, let us refer to Fig. 10. There is a continuous association between
OBlob and
StoEnt instances, since we know the content information at any given timestamp reported by the video-trackers. An association between a
HumEnt instance and a
StoEnt instance is formed at the time of interaction. However, there is no direct association between
HumEnt and
OBlob instances. This submodule monitors the change in the content of the
StoEnt before and after an interaction and
figures out the change in the
OBlob instances as a result of the interaction and passes this information of the
OBlob , StoEnt and the
HumEnt instances involved in any interaction to the Interaction State Machine Submodule.

Figure 10:
Establishing associations between the HumEnt, StoEnt and OBlobs at the time of interaction.
Let us now refer to Figure 11 to understand how the Association Submodule is designed to handle concurrent events. Let us consider an interaction between
HumEnt H_i and StoEnt S_k that, in general, starts with a HandIn event at time t_in and ends with a HandOut event at time t_out. Let us denote the process assigned to the H_i instance as P_Hi and the process assigned to the S_k instance as P_Sk.

When a HandIn event occurs,
P_Sk sets S_k.Update to False, which prevents any erroneous updates to S_k.Content during the interaction, when objects might be occluded from the camera by hands. The S_k.Update value is set to True when the
HandOut event involving the same HumEnt H_i and StoEnt S_k occurs. On the other hand, S_k.InUse is set to True when the interaction begins and is subsequently set to False at the end of the interaction. During both the
HandIn and
HandOut events, there would be a gptwc() function call using the inter-communicator
Inter_comm between
P_Hi and P_Sk, where the process P_Hi would be the initiator process and P_Sk would be the target process.

The HumEnt instances have a specialized buffer data structure that is designed to temporarily store information pertaining to concurrently occurring interactions, as well as to effectively filter out noisy invalid events, as will be explained later. Each entry in this buffer stores the content information (S_k.Content(t)) at interaction time t returned by a gptwc() function call, with an associated unique key pair <i, k>, which denotes the IDs of the interacting HumEnt H_i and StoEnt S_k entities. Each HumEnt instance has two such buffer data attributes,
BufferBefore and
BufferAfter, for storing the results returned by a gptwc() function call for
HandIn and
HandOut events, respectively.

During a
HandIn event,
P_Hi initiates a request to P_Sk; P_Sk is then responsible for fetching the content before time t_in, filtering the noise across multiple time-stamps before that, and sending the noise-filtered content information (S_k.Content(t_in)) back to P_Hi, which pushes it into H_i.BufferBefore as:

S_k.Content(t_in) --push--> H_i.BufferBefore
Similarly, during a
HandOut event, calling the gptwc() function would provide the noise-filtered content information (S_k.Content(t_out)) after time t_out. The result would be pushed into H_i.BufferAfter as:

S_k.Content(t_out) --push--> H_i.BufferAfter
Now, it is checked whether there is a matching entry with the same key pair <i, k> in H_i.BufferBefore and H_i.BufferAfter, which would denote that this corresponds to a valid interaction between the unique pair of
HumEnt H_i and StoEnt S_k. If a matching entry is found, these entries are popped from H_i.BufferBefore and H_i.BufferAfter as:

H_i.BufferBefore --pop--> H_i.ContentBefore = S_k.Content(t_in)
H_i.BufferAfter --pop--> H_i.ContentAfter = S_k.Content(t_out)

The process
P_Hi then computes the objects added (O_ADD) and removed (O_REM) by computing the set differences as follows:

O_ADD = H_i.ContentAfter - H_i.ContentBefore
O_REM = H_i.ContentBefore - H_i.ContentAfter

Figure 11: MPI communication and computations in the Association Module.
The extracted information is then passed to the Interaction State Machine Submodule.

Now, if there is a
HandOut event without a matching
HandIn event between
HumEnt H_i and StoEnt S_k, then there would be no entry in H_i.BufferBefore with the key pair <i, k>, and the system would detect a
Noisy
HandOut event and exit with an operation that flushes out the erroneous entry in H_i.BufferAfter. The Interaction State Machine will not be triggered in this case.

Similarly, if there is a previous
HandIn event without a matching
HandOut event between
HumEnt H_i and StoEnt S_k, then there would be a previous entry in H_i.BufferBefore with the key pair <i, k>; however, this error will not be propagated any further, because the Interaction State Machine is only triggered if there is a matching
HandOut event.This validates the fact that only a matching
HandIn and
HandOut event triggers the Interaction State Machine and allother noisy invalid
HandIn and
HandOut events would be filtered out. This design provides immunity against varioustypes of noisy event data as well as handles concurrent interactions reported by video-trackers. This module implements the finite-state machine based logic to keep track of the elementary interactions betweenthe
HumEnt instances and the
OBlob instances that are present in the different
StoEnt instances. By monitoring the changes in the contents of the particular StoEnt instance, the Interaction State Machine can determine if any OBlobs were added or removed as a result of the interaction.

The states shown in Figure 12(a) represent the elementary interactions between the three instances, HumEnt H_i, StoEnt S_k, and OBlob O, that would typically be involved in any interaction. Let us consider the following three cases:

(In some cases, an end user might want to keep track of these interactions for a group comprising multiple individuals. For example, at an airport checkpoint security area it would be beneficial to keep track of passengers traveling together as a group so that there are no false alarms when members of the same group divest/collect common items. Another possible scenario is keeping track of families shopping together in a retail store so that they can be billed to a single account. In such cases we can associate all the members of the group with a single HumEnt instance with a single ID. The design of the Association Module can also effectively handle the complicated situation where multiple members of a group associated with the same HumEnt instance interact with multiple StoEnt instances at the same time, without any additional changes. This is because each such interaction will have a different key pair <i, k>, where i remains the same but k differs across the StoEnt instances.)
Figure 12: (a) The Interaction State Machine diagram for each interaction between HumEnt i (H_i) and StoEnt k (S_k) at time t. (b) An example of the elementary action history stored in H_i.ActionQ for each OBlob (stored in a different row) that the particular HumEnt H_i interacted with.
1. If an object O_ADD was added to S_k (O_ADD ≠ φ) at time t, then the string [t-A-k] is appended to the queue corresponding to OBlob O_ADD in the data attribute H_i.ActionQ.

2. Similarly, if an object O_REM was removed from S_k (O_REM ≠ φ) at time t, then the string [t-R-k] is appended to the queue corresponding to OBlob O_REM in the data attribute H_i.ActionQ.

3. However, if no objects were added to or removed from S_k (O_ADD = φ and O_REM = φ), it could mean either that this is a false interaction reported by the video trackers or that the HumEnt H_i interacted with StoEnt S_k but did not displace any objects; in either case, no changes are made to H_i.ActionQ.

An example of the elementary action history stored in H_i.ActionQ for each OBlob that the particular HumEnt H_i interacted with can be seen in Fig. 12(b).

This module specializes in making inferences from the elementary interactions between the
HumEnt instances and the OBlob instances. After each interaction, this module is triggered by a HandOut event (after the elementary action involving the HandOut event is recorded by the Interaction Module). This module is also triggered by the HumanExit event. The input to this module is the data attribute H_i.ActionQ, where H_i is the HumEnt who is either involved in the interaction or exiting the monitored area.

Since the higher-level interactions as well as the anomalous interactions vary with the specific application in which our architecture is used, the finite-state-machine logic in this module needs to be tweaked for each application. Hence it is particularly important to design this module so that such tweaks do not involve any changes to the overall architectural framework, and so that the changes in the logic are minor and can be easily made, specific to the requirements of the application.
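One way to keep the application-specific logic easily replaceable, as argued above, is to drive a generic FSM from a swappable transition table, so that only the table changes from application to application. The sketch below illustrates this design idea; the state names and rules are invented for illustration and are not the actual inference tables of Appendix C.

```python
# Illustrative sketch: FSM logic kept as a swappable transition table so that
# application-specific tweaks do not touch the surrounding framework.

class InferenceFSM:
    def __init__(self, transitions, start="Init"):
        # transitions: (state, action) -> (next_state, inference)
        self.transitions = transitions
        self.state = start

    def step(self, action):
        # Unknown (state, action) pairs leave the state unchanged, no inference
        self.state, inference = self.transitions.get(
            (self.state, action), (self.state, None))
        return inference

# A toy retail-store rule set: a Remove takes an item, a following Add returns it.
RETAIL_RULES = {
    ("Init", "R"): ("Taken", "item picked up"),
    ("Taken", "A"): ("Init", "item returned"),
}

fsm = InferenceFSM(RETAIL_RULES)
print([fsm.step(a) for a in "RA"])   # ['item picked up', 'item returned']
```

Swapping RETAIL_RULES for a different table retargets the module without any framework changes.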
This submodule extracts the interaction history from the
ActionQ of the
HumEnt for each
OBlob that the
HumEnt has interacted with. If the Inference Module is triggered after an interaction (by the
HandOut event), then only the information for the OBlob that was involved in the interaction is extracted. However, if the Inference Module is triggered by the HumanExit event, then the Data Extractor extracts the information for every
OBlob that the exiting
HumEnt interacted with.

For example, according to Figure 12(b), the latest interaction of the HumEnt was at time 136 with OBlob O. So the HandOut event at this time triggers the Inference Module, and the Data Extractor fetches the following entries: [83-R-·], [101-A-·], [118-R-·], [128-A-·], [136-R-·] (with the StoEnt indices as recorded in Fig. 12(b)). The sequence of elementary actions for the particular OBlob is sent to the Inference State Machine submodule, and the information pertaining to the StoEnt and OBlob instances involved in the interactions is sent to the Anomaly Detector, as shown in Fig. 13. For OBlob O in Fig. 12(b), the sequence of elementary actions ({R, A, R, A, R}) is sent to the Inference State Machine, and the information for OBlob O and the StoEnt instances involved is sent to the Anomaly Detector. (An elementary action can either be Add or Remove, represented as 'A' or 'R'.)
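A minimal sketch of this extraction step: each ActionQ row holds strings of the form [t-X-k], and the Data Extractor splits them into the action sequence (for the Inference State Machine) and the StoEnt ids (for the Anomaly Detector). The parsing code and the concrete StoEnt indices below are our own illustrative choices; the timestamps follow the example in the text.

```python
# Sketch of the Data Extractor: split each "[t-X-k]" entry of an ActionQ row
# into the action sequence and the set of StoEnt ids involved.

def extract(action_queue_row):
    actions, stoents = [], []
    for entry in action_queue_row:
        t, action, k = entry.strip("[]").split("-")
        actions.append(action)          # 'A' (Add) or 'R' (Remove)
        stoents.append(int(k))
    return actions, sorted(set(stoents))

# StoEnt indices 2 and 5 are hypothetical stand-ins for those in Fig. 12(b)
row = ["[83-R-2]", "[101-A-5]", "[118-R-5]", "[128-A-2]", "[136-R-2]"]
print(extract(row))   # (['R', 'A', 'R', 'A', 'R'], [2, 5])
```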
Figure 13: The Inference Module architectural diagram with concurrently running Inference State Machine and Anomaly Detector submodules analyzing each entry for HumEnt H_i and OBlob O_j.

This submodule implements the finite-state-machine based logic to infer the higher-level interactions and the ownership relationships between the
HumEnt , OBlob and the
StoEnt instances. More specifically, this submodule processes the causal relationships between the elementary actions to understand higher-level interactions based on the rules of interaction specific to the application. The application-specific inference logic can be customized based on the requirements, as discussed in Appendix C.

If the higher-level interaction inferred alters the ownership relationship, then a control signal to
Set Ownership is sent to the Anomaly Detector. Otherwise, the control signal to Test Ownership is sent to the Anomaly Detector. Typically, for the applications we are considering, the first interaction is what determines the ownership relationship between
HumEnt instances and
OBlob instances.
This submodule detects anomalous interactions based on ownership relationships and raises appropriate alarms when anomalies are detected. It runs in parallel with the Inference State Machine, which dictates its mode of operation.

When the Inference State Machine infers any change in ownership relationships, it indicates that the Anomaly Detector should set the ownership information and remember it for detecting anomalies in successive interactions. For example, in an airport checkpoint security application, this is done by appending the OBlob O_j and StoEnt S_k information to the Owns list of the data attribute H_i.Ownership.

On the contrary, when any other type of higher-level interaction is inferred that does not change the ownership relationship, the Inference State Machine indicates that the Anomaly Detector should test whether the entities involved in the interaction belong to HumEnt H_i. For testing the ownership of the OBlob and StoEnt entities, we check whether they exist in H_i.Ownership. If an entity does not exist in H_i.Ownership, then H_i is not the owner and an appropriate alarm or warning message is issued. (The actual owner can be determined by a gptwc() function call using Inter_comm to fetch the ownership information of the corresponding entity.) Each of the latter two submodules has its own handlers for application-specific tasks based on the outcome of the inferences and anomaly detections.

In this section, we have only shown the architectural highlights of the Inference Module to demonstrate that CheckSoft can be used in different applications. Obviously, the rules of person-object interactions are application-specific, and hence the FSM-based inference logic must be tweaked for each application. The inference logic for the Airport Checkpoint Security and Automated Retail Store applications is described in Tables 5 and 6, respectively, in Appendix C. For examples of higher-level interactions inferred and anomalies detected by the Inference Module from the elementary action history in these applications, we refer the reader to Fig. 17.
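The Set Ownership / Test Ownership protocol described above can be sketched as follows, assuming (as in the airport-checkpoint example) that H_i.Ownership carries an Owns list of (OBlob, StoEnt) pairs. The class and the alarm text are illustrative, not CheckSoft's actual code.

```python
# Sketch of the Anomaly Detector's two modes, driven by the control signals
# from the Inference State Machine: Set Ownership records a pair, Test
# Ownership checks a pair against the recorded Owns list.

class AnomalyDetector:
    def __init__(self):
        self.owns = []   # H_i.Ownership Owns list: (oblob, stoent) pairs

    def set_ownership(self, oblob, stoent):
        # Triggered when the Inference State Machine infers an ownership change
        self.owns.append((oblob, stoent))

    def test_ownership(self, oblob, stoent):
        # Triggered for interactions that should not change ownership
        if (oblob, stoent) not in self.owns:
            return f"ALERT: H_i does not own {oblob} in {stoent}"
        return None   # owner verified, no anomaly

ad = AnomalyDetector()
ad.set_ownership("laptop", "tray7")
print(ad.test_ownership("laptop", "tray7"))   # None
print(ad.test_ownership("phone", "tray3"))    # ALERT: H_i does not own phone in tray3
```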
In this section we present the main features of our software design that guarantee scalability and deadlock-free operation. The supplemental material includes a Petri-Net based modeling of the software for verification of its deadlock-free operation and liveness properties.

CheckSoft uses multiprocessing with a distributed memory model in which every process in the system has its own private memory, and all of the computations carried out by a process involve only that private memory. Obviously, the processes that are in charge of analyzing human-object interactions must somehow become aware of both the humans involved and the objects in the storage containers. As opposed to using a shared-memory model, we take care of such inter-process interactions through the communication facilities provided by MPI, as shown in Fig. 14. The isolation between the processes achieved in this manner eliminates any possibility of the processes stepping on one another while accessing shared resources, which is a major cause of deadlock in shared-memory systems.

Figure 14:
CheckSoft architecture using multiprocessing with a distributed memory model.
The master process reads the encoded event information recorded in the EventSeq queue mentioned earlier in Section 5.1 and loads the next available entry into its local memory. This information is then broadcast through MPI communications to the different HumEnt and StoEnt worker processes, as shown in Fig. 14. When there is a HumanEnter or a StorageInstantiate event, a previously created worker process is launched, and the corresponding entity information is stored in the local versions of the principal data structure derived from the base class Entity. For example, the worker processes in the HumEnt group (P_Hi, i = 1, ..., n) each hold a HumEnt instance H_i in their own local memory to store the corresponding human-entity information. Similarly, the worker processes in the StoEnt group (P_Sk, k = 1, ..., m) each hold a StoEnt instance S_k in their own local memory to store the corresponding storage-entity information. When there is a HumanExit or StorageReturn event, these worker processes are freed up and made available for new entities.

We will now use an example to illustrate the fact that each worker process only needs to work with its own local memory. Assume that a
StorageUpdate event has just been recorded for a
StoEnt instance S_k. This would cause the attribute S_k.Content of the S_k instance to be updated by the worker process P_Sk if S_k.Update is True. To avoid erroneous updates during an interaction involving S_k, the worker process P_Sk sets S_k.Update to False at the beginning of the interaction and sets it back to True at the end of the interaction.

It is important to note that only the process
P_Sk has access to this content information of S_k. In general, when a process needs some information from another entity, it fetches the information using MPI's communication primitives. In our example, when there is a HandIn or HandOut event due to an interaction between a
HumEnt H i and a StoEnt S k , the Interaction Module is triggered in order to figure out what object has either been placed in the
StoEnt instance or taken out of it. For that, the HumEnt worker process P_Hi needs the storage content of the StoEnt entity S_k before the HandIn or after the HandOut event. Toward that end, the process P_Hi initiates a gptwc() function call that results in P_Sk fetching the S_k.Content before/after the event. Subsequently, P_Hi stores this information temporarily in the H_i.BufferBefore buffer or the H_i.BufferAfter buffer, depending on whether the primary triggering event was
HandIn or HandOut. The difference between the content before and after a particular interaction is then analyzed to figure out what object was involved in the interaction, and the elementary actions associated with the interaction are then recorded by the process
P H i in H i . ActionQ . In this manner, only the worker process
P_Hi is in charge of recording all the interactions related to the HumEnt H_i. All the elementary interactions involving any HumEnt H_i can be found in the data structure H_i.ActionQ. As a consequence, at the end of every interaction involving a HumEnt H_i, or when a HumEnt H_i exits the area being monitored by CheckSoft, this aspect of our software design allows all high-level inferences related to the HumEnt H_i to be made immediately and without any resource contention.

This design makes CheckSoft scalable to any number of HumEnt, StoEnt, and
OBlob entities. The level of concurrency, which is the number of active processes or entities at any given point in time, is of course limited by the computational resources available on the hardware platform running CheckSoft. The event handlers of CheckSoft are non-preemptive, and hence, for handling certain events, a task may need to wait if the required process is busy handling a previous task or when communication between processes is required. In general, a task waits until all the required processes are available, which may incur unwanted latency in handling events; this latency should remain within reasonable limits of tolerance. The scalability-related aspects are analyzed in Section 7.1.2.
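The distributed-memory pattern just described — each StoEnt worker alone holding S_k.Content, serving gptwc()-style fetches, and honoring the Update guard during interactions — can be illustrated as follows. To keep the sketch self-contained and runnable, a thread and two queues stand in for an MPI worker process and its communication channels; the message vocabulary is our own, not CheckSoft's.

```python
# Illustrative sketch of the gptwc()-style content fetch: the StoEnt worker
# alone holds S_k.Content, and any other party obtains it only by message.
# In CheckSoft these are separate MPI processes with private memory.

import queue
import threading

def stoent_worker(requests, replies):
    content = {"apple", "banana"}     # S_k.Content, private to this worker
    update_allowed = True             # S_k.Update guard during interactions
    while True:
        msg = requests.get()
        if msg[0] == "stop":
            break
        if msg[0] == "freeze":        # toggled at interaction start/end
            update_allowed = msg[1]
        elif msg[0] == "update":      # StorageUpdate event
            if update_allowed:
                content = set(msg[1])
        elif msg[0] == "fetch":       # gptwc()-style snapshot for P_Hi
            replies.put(sorted(content))

requests, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=stoent_worker, args=(requests, replies))
worker.start()
requests.put(("fetch", None))
print(replies.get())                  # ['apple', 'banana']
requests.put(("freeze", False))       # interaction begins: updates frozen
requests.put(("update", ["apple"]))   # ignored while frozen
requests.put(("fetch", None))
print(replies.get())                  # still ['apple', 'banana']
requests.put(("stop",))
worker.join()
```

Because the worker alone mutates its content, no locks are needed and there is nothing to contend over, which mirrors the deadlock-freedom argument above.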
For validation, we have adopted a dual approach in which we use a simulator to study the scalability and robustness of CheckSoft and use actual video trackers to analyze other aspects of the system. This is because it would be highly non-trivial to analyze the scalability and robustness issues with real video data in a laboratory setting.

In this section, we first report, in Section 7.1, on the scalability results that we have obtained with the help of simulated data involving a large number of HumEnt, StoEnt, and OBlob entities being monitored by a large number of video trackers. The scalability study involves investigating two performance parameters, level of concurrency and latency, with simulated data for two different types of applications of CheckSoft: airport checkpoint security and automated retail store. The simulated data, after noise is added to it, is also used to validate the robustness of CheckSoft with respect to errors made by the event detectors.

Subsequently, in Section 7.2, we demonstrate with real-time data from several cameras that CheckSoft can indeed process feeds simultaneously from multiple video trackers.
Figure 15:
Validation Framework for
CheckSoft. We have tested the logic of CheckSoft with a simulation-based validation framework, called CheckSoftValidate, whose "architecture" is shown in Fig. 15. The validation framework can be used for testing and evaluating the rules used for associating human entities with the objects they interact with and for analyzing the outcomes of these interactions. The front end of the validation framework is a UI on the client machine that can be used to set the different parameters of a simulated environment, as shown in Fig. 16. The UI on the server machine displays the complete state of the monitored area and any detected anomalies, as inferred by the logic of CheckSoft.

As shown in Fig. 15, an important component of CheckSoftValidate is a video-tracker simulator module running on the client machine that generates the event and storage content information that is input to CheckSoft. The video-tracker simulator models the monitored area symbolically and emulates application-specific human behavior as closely to the real world as possible. (The verification framework is available at https://github.com/sarkar-rohan/CheckSoft, where a user can run the CheckSoft simulator and investigate its performance with respect to all the design parameters.)
(a) Parameters that control the number of entities and video-trackers monitoring the simulated environment

nVideoTrackers: Number of video-trackers recording event information in server memory simultaneously using the RMI-based E-API. This represents the number of threads running the Video Tracker Simulator.
nHumans: Number of HumEnt entities in total.
nStorages: Number of StoEnt entities in total.
nObjects: Number of OBlob entities in total. The OBlob entities are distributed among the different StoEnt entities based on the event sequences generated, depending on the application.
maxLevelConcurrency: Maximum number of allowed active HumEnt and StoEnt entities at any given time instant.

(b) Parameters that control the frequency of events and different types of interactions

eventList: List of all possible events [HumanEnter, HumanExit, HandIn, HandOut, StorageInstantiate, StorageReturn, StorageUpdate].
eventPDF: Probability of the corresponding event in eventList being drawn at any time step (basically a PDF). The probabilities determine the frequency of the events and control the rate at which humans enter/exit the simulated environment, instantiate/return storage units, interact with objects, etc.
actionList: List of all possible higher-level interactions, depending on the specific type of application. This list includes the anomalous interactions as well.
actionPDF: Probability of the corresponding action in actionList being drawn at any time step (basically a PDF). The probabilities determine the randomized sequence of events for the different interactions that emulate the application-specific behavior of any HumEnt entity.
maxInteraction: Maximum number of interactions allowed for each HumEnt entity.

(c) Parameters that control the noise affecting the simulated data

noisyHandEventPDF: Probability of the hand-based events (HandIn and HandOut) being corrupted by noise. The probabilities determine the effect of noise on the sequence of hand-based events generated.
noisyObjDetectProb: The probability that a particular type of object will be detected wrongly (due to false or missed detections) and the storage content will be corrupted by noise.
maxNoisyContentPerc: The maximum extent to which the content of any StoEnt will be affected by noise. This determines the maximum percentage of image frames that are corrupted by noise before and after interactions.
Table 1: Input parameters for the simulator to generate stochastic event sequences.

The simulator is multi-threaded, where each thread symbolically represents a video-tracker detecting concurrent events. The threads upload the event information in parallel to the server memory.

The number of video trackers monitoring the simulated area, the number of human entities, storage entities, and objects in the simulated environment, and the number of human and storage entities that are active at any moment in time are controlled by the values of the relevant UI parameters listed in Table 1(a). The rate at which humans enter/exit the simulated environment and storage entities are instantiated/returned, as well as the frequency of interactions with objects, is controlled by the values of the relevant UI parameters listed in Table 1(b). We can also simulate noise in the environment, which generates false hand-based events as well as false and missed detections of objects in the storage entities; this noise is controlled by the values of the relevant UI parameters listed in Table 1(c). The values of the parameters set in the UI, specifically in Table 1(b) and (c), that control the interactions and events as well as the effect of noise in the simulated data, are merely the mean values for what are otherwise random numbers.

CheckSoftValidate uses the data generated by the video tracker simulator for testing the scalability of the system as the number of people, storage units, and objects is increased.
Besides, the validation framework verifies the operation of the overall decentralized system by testing the E-API that allows CheckSoft and the Video Tracker Simulator to work on two different machines in a network.

As shown in Fig. 15, CheckSoftValidate compares the results inferred by the logic of CheckSoft with the ground truth generated by the simulator and computes the accuracy of CheckSoft under the different test scenarios generated by varying the parameters in Table 1(a), (b), and (c). The results of this comparison are shown in the GUI displayed in Fig. 17 for
Airport Checkpoint Security and
Automated Retail Store applications respectively.
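The stochastic event sequences controlled by eventList and eventPDF, as described above, can be reproduced with a few lines of Python. The probabilities below are illustrative values, not the actual settings used with CheckSoftValidate.

```python
# Sketch of how the Video Tracker Simulator can draw stochastic event
# sequences from eventList with probabilities given by eventPDF.

import random

eventList = ["HumanEnter", "HumanExit", "HandIn", "HandOut",
             "StorageInstantiate", "StorageReturn", "StorageUpdate"]
eventPDF = [0.10, 0.10, 0.25, 0.25, 0.05, 0.05, 0.20]  # illustrative PDF, sums to 1

def generate_events(n, seed=None):
    """Draw n events; a fixed seed makes the sequence reproducible."""
    rng = random.Random(seed)
    return rng.choices(eventList, weights=eventPDF, k=n)

print(generate_events(5, seed=42))   # a randomized 5-event sequence
```

The actionList/actionPDF pair can be sampled the same way to pick the higher-level interaction each HumEnt performs.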
The inference logic presented in Tables 5 and 6 is tested using the CheckSoftValidate framework. In this subsection we briefly discuss the verification results and refer the reader to Appendix C for a more detailed discussion.
Figure 16: The main GUI of CheckSoft.

Figure 17: GUI that shows the interaction history and detected anomalies for an exiting human and verifies the different software modules of
CheckSoft for Airport Checkpoint Security and Automated Retail Store applications respectively.
Airport Checkpoint Security Application
Fig. 17(a) shows how
CheckSoft draws inferences about each item that a particular passenger interacted with and raises appropriate warnings or alerts.

It can be easily seen that CheckSoft can correctly verify that passengers collected their own items, in which case no anomalies are reported. It can also raise warnings when passengers leave behind their items, and raise security alerts when a passenger either moves an item from or to a tray that is not their own or collects an item that does not belong to them.
Automated Retail Store Application
Fig. 17(b) shows how
CheckSoft draws inferences about each item that the customer interacted with and raises appropriate warnings if an item is misplaced.

It can be easily seen that CheckSoft can accurately draw inferences regarding how many items of each type were purchased, inspected, and returned, as well as infer the shelves to which any particular item has been misplaced.
Scalability refers to CheckSoft's ability to track person-object interactions on a continuing basis when it has to deal with a large number of person entities, storage containers, and objects. The number of these entities in the simulated environment is controlled by the nHumans, nStorages, and nObjects parameters in Table 1(a). We test for scalability through the latency at a given concurrency level. The concurrency level is defined as the total number of worker processes that can simultaneously be active. As the reader will recall, a worker process is assigned to each new HumEnt instance and to each new StoEnt instance. This is controlled by the maxLevelConcurrency parameter in Table 1(a). The maximum level of concurrency obviously depends on the computational resources available to run the CheckSoft software. As a case in point, in our simulations with CheckSoftValidate, we have no problems running CheckSoft at a concurrency level of 200 on a VM with 24 vCPUs and 8 GB of RAM, as shown in Fig. 16. We are able to do so for both the application domains mentioned in the introduction to Section 7: the airport checkpoint security domain and the automated retail domain.

The number of video-trackers monitoring the simulated environment is controlled by setting the nVideoTrackers parameter in Table 1(a). The simulated data in the previous paragraph was generated using 50 threads uploading event information in parallel to the server memory. This validates that multiple video-trackers can asynchronously and simultaneously record data using the Blackboard interface of CheckSoft. Therefore, CheckSoft can operate vis-a-vis any number of video-trackers that connect with it on a plug-and-play basis through the E-API. We have also validated this with actual video-trackers in Section 7.2.

By latency at any given level of concurrency, we refer to the time taken by the CheckSoft event handlers to process an event from the time it was first detected. This metric helps us understand the responsiveness of CheckSoft to the different types of events at a specific level of concurrency. Fig. 18 shows the result of an experiment in which the level of concurrency was varied from 50 to 200 and the latency for each event type was averaged over 10 experiments.

Figure 18:
Average Latency for the different event types
As can be seen in Fig. 18, the average latency of CheckSoft at a concurrency level of 200 is well within reasonable limits, considering that the time constants associated with typical human-object interactions are of the order of a second if not longer. Fig. 18 also shows that the average latency increases slowly as the number of active processes increases, and therefore CheckSoft should scale well to even higher levels of concurrency.
The HandIn and HandOut events involve MPI communication between one of the HumEnt and one of the StoEnt worker processes and trigger additional FSM-based event-handling subroutines that filter out noisy events and draw inferences at the end of every interaction; hence they have the highest response time. The HumanEnter, HumanExit, StorageInstantiate, and StorageReturn events require collective operations within the HumEnt_group or StoEnt_group and have roughly similar response times, which is why their curves overlap. The StorageUpdate event involves only one of the StoEnt worker processes and only updates the latest storage content information for the corresponding StoEnt. Since the event handlers for these events involve no MPI communication and hardly any computation, these events have the lowest response time.
The video-tracker simulator part of CheckSoftValidate was designed specifically to generate randomized data corresponding to noisy hand-based events as well as erroneous storage content, for testing the tolerance of CheckSoft to measurement and event uncertainties.

Figure 19:
Tolerance of CheckSoft to different types of missed or false hand-based events
The probability supplied through the simulation parameter noisyHandEventPDF in Table 1(c) is responsible for generating the noisy hand-based event data. Fig. 19 describes the effect of missed or false hand-based events. While CheckSoft can filter out false event detections to some extent, a missed hand-based event would not even trigger CheckSoft; hence it is important that the hand-based event detectors have a low false-negative rate, as explained in Fig. 19.

Figure 20:
Tolerance of
CheckSoft to Storage Content Noise due to false and missed object detections.
The simulation parameter noisyObjDetectProb in Table 1(c) controls the probability of missed and false detections of any OBlob in a StoEnt. At the same time, the parameter maxNoisyContentPerc, shown in the same table, controls the extent to which the content of any StoEnt is corrupted by noise. CheckSoft uses a polling-based noise-filtering algorithm to filter out the noise before and after interactions across the content information for multiple time-stamps. Fig. 20 shows how the accuracy of CheckSoft decreases as the values supplied for these parameters increase, because greater sensory noise in recognizing the objects naturally leads to reduced overall accuracy. It gives us a rough estimate of the accuracy needed from the video trackers detecting objects in the storage units. The probability of missed and false detections, and the extent to which the content information is corrupted by noise, should be such that the accuracy of CheckSoft remains within the region marked in dark green. The red color in Fig. 20 indicates low tolerance to noisy content information.

Figure 21: Block Diagram of the Video Trackers implemented to test CheckSoft.
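The polling-based noise filtering mentioned above is not spelled out in detail in this section; one plausible realization is a simple majority vote over the object sets reported across the sampled time-stamps, sketched below under that assumption.

```python
# A possible realization of polling-based noise filtering: an object survives
# only if it is detected in a strict majority of the sampled frames. This is
# our illustrative assumption, not necessarily CheckSoft's exact rule.

from collections import Counter

def filter_content(frames):
    """frames: list of per-frame object sets reported by a video tracker."""
    votes = Counter(obj for frame in frames for obj in frame)
    threshold = len(frames) / 2
    return {obj for obj, count in votes.items() if count > threshold}

frames = [{"apple", "orange"}, {"apple"}, {"apple", "orange"},
          {"apple", "banana"}, {"apple", "orange"}]
print(sorted(filter_content(frames)))   # ['apple', 'orange']
```

The spurious "banana" detection (one frame out of five) and the single missed "orange" are both corrected by the vote.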
While we have tested the scalability of CheckSoft and its robustness to noise using simulators, we have used real video trackers to establish that CheckSoft can indeed process feeds simultaneously from multiple video trackers operating in real time. Our experimental setup is a simple retail-store application with shelves storing different types of objects.

Figure 22: The video-trackers track people and detect human entry and exit events from the video-feed of overhead cameras.

Figure 23: The video-trackers track people and detect interactions with shelf instances from the video-feed of an overhead camera.

For experiments with real video trackers, we established a zone in our laboratory with an array of open shelves. The zone was monitored with eleven cameras, and each shelf has two racks, each having its own camera for recording the content of the shelf and any changes in the content. The eleven area-based cameras were used to monitor humans as they approached the shelves or retracted away from them.

The video feeds from all the cameras are processed by two PC-class client machines. The client machines allot separate processes for the video-trackers hooked to each camera. The VTDR module on the server side, shown in Fig. 21, provides a Java RMI (Remote Method Invocation) based plug-and-play interface for the software clients running the video-trackers. The video trackers directly call the E-API using the RMI stub classes for uploading the event information asynchronously and concurrently to the server memory, as described in detail in Section 5.1.

Figure 24:
The video-trackers track hand movement inside the shelf (detections for the four previous time instants are shown in blue, and tracking is shown by red lines in the images on the left) and recognize objects in the different shelves before and after interactions from the video-feed of the shelf cameras (as shown by the corresponding labels in the images on the right). The images on the right also show a count of the number of instances of apples, oranges, and/or bananas detected before and after the latest interaction. The news-feed shows the inferences made by CheckSoft in response to the hand-based events detected.
As can be seen in Fig. 21, each process on the client machines runs an EventDetector.py program that detects the various events from the video-feed of the corresponding camera and calls the function recordEventSequence(Event) made available through the Blackboard interface. The event-recording subroutines are implemented in the
VTDRServer.java program that encodes the event information and records it in the
EventSeq queue. The encoded event information is then dispatched to the different event handler modules of CheckSoft for further processing.

The video-trackers shown in Fig. 21 are responsible for tracking person instances and their hand movements, as well as shelves and the objects present within each shelf, in order to detect the events that trigger the different software modules in CheckSoft. Fig. 22 shows the video-trackers detecting human entry and exit events from the video-feed of the overhead cameras installed at the entrance and exit. Fig. 23 illustrates the results of person and shelf tracking from the video feed of an overhead camera, and the news-feed shows which person is interacting with which shelf. Fig. 24 illustrates hand-based events being detected and video trackers detecting and recognizing objects in storage units before and after an interaction from the video feed of the shelf cameras.

The shelves shown in Fig. 24 contain three types of objects: apples, bananas, and oranges. In this simple application, the video-trackers keep track of the number of products of each type present in each of the shelves at any time. The news-feed shows the hand-based events and the application-specific interaction information between persons and objects in the shelves as inferred by
CheckSoft . Based on the content information shown before and after the interaction, thereader can verify the operation of
CheckSoft for the latest interaction in each of the shelves.
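The verification the reader can carry out on Fig. 24 amounts to differencing the per-shelf object counts before and after an interaction. A minimal sketch of that count-differencing logic is shown below; the function name and the snapshot format are assumptions for illustration, not CheckSoft's actual code.

```python
# Compare two content snapshots of a shelf, e.g. {"apple": 2, "banana": 1},
# and report which objects were added to or removed from the shelf.
from collections import Counter

def elementary_actions(before, after):
    b, a = Counter(before), Counter(after)
    removed = {k: v for k, v in (b - a).items() if v}  # present before, gone after
    added = {k: v for k, v in (a - b).items() if v}    # absent before, present after
    return {"Add": added, "Remove": removed}

print(elementary_actions({"apple": 2, "orange": 1}, {"apple": 1, "orange": 1}))
# → {'Add': {}, 'Remove': {'apple': 1}}
```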
CheckSoft is based on several time-honored principles of object-oriented software design [25]. For example, one of the most venerable such principles is that the clients of a software system should only have to program to the public interfaces of the system. CheckSoft subscribes to this principle by requiring the video-tracker clients to be aware only of the declarations of the method headers in the Blackboard interface.

Our main goal in this paper was to present a scalable software architecture that can run asynchronously vis-a-vis the video trackers, and that incorporates a finite-state-machine-based reasoning framework for keeping track of concurrent people-object interactions in people-centric spaces. CheckSoft handles concurrent events simultaneously using a multi-processed event-driven architecture. It is also designed to provide a significant measure of immunity to errors in the event data generated by the video trackers; this is done by enforcing consistency, through finite-state logic, between the different events related to the same overall person-object interaction.

CheckSoft has so far been tested with both simulated video trackers and some simple scenarios involving actual video trackers. That was intentional, since all we wanted to accomplish at this stage was to formulate the basic architectural design of the software. Our future work will involve testing CheckSoft in large-scale applications, specifically with regard to scalability and tolerance to noise in real-world settings.
References

[1] R. Sarkar and A. Kak, Scalable event-driven software architecture for the automation of people-centric systems, US Patent App. 16/436,164, Dec. 2020.
[2] J. Q. Ning, "Component-based software engineering (CBSE)," in Proceedings Fifth International Symposium on Assessment of Software Tools and Technologies, Jun. 1997, pp. 34–43.
[3] D. L. Parnas, "On the criteria to be used in decomposing systems into modules," Commun. ACM, vol. 15, no. 12, pp. 1053–1058, Dec. 1972, ISSN: 0001-0782. [Online]. Available: http://doi.acm.org/10.1145/361598.361623
[4] A. Hinze, K. Sachs, and A. Buchmann, "Event-based applications and enabling technologies," in Proceedings of the Third ACM International Conference on Distributed Event-Based Systems, ser. DEBS '09, Nashville, Tennessee: ACM, 2009, 1:1–1:15, ISBN: 978-1-60558-665-6. [Online]. Available: http://doi.acm.org/10.1145/1619258.1619260
[5] O. Etzion, "Towards an event-driven architecture: An infrastructure for event processing (position paper)," in Rules and Rule Markup Languages for the Semantic Web, A. Adi, S. Stoutenburg, and S. Tabet, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 1–7, ISBN: 978-3-540-32270-2.
[6] C. Roda, A. Rodríguez, E. Navarro, V. López-Jaquero, and P. González, "Towards an architecture for a scalable and collaborative AmI environment," in Trends in Practical Applications of Scalable Multi-Agent Systems, the PAAMS Collection, F. de la Prieta, M. J. Escalona, R. Corchuelo, P. Mathieu, Z. Vale, A. T. Campbell, S. Rossi, E. Adam, M. D. Jiménez-López, E. M. Navarro, and M. N. Moreno, Eds., Cham: Springer International Publishing, 2016, pp. 311–323, ISBN: 978-3-319-40159-1.
[7] M. K. Chandy, Event-driven applications: Costs, benefits and design approaches, 2006.
[8] A. Avnur, "Finite state machines for real-time software engineering," Computing & Control Engineering Journal, vol. 1, no. 6, pp. 275–278, Nov. 1990, ISSN: 0956-3385.
[9] F. Wagner, R. Schmuki, T. Wagner, and P. Wolstenholme, Modeling Software with Finite State Machines. Boston, MA, USA: Auerbach Publications, 2006, ISBN: 0849380863.
[10] O.-A. Schipor, R.-D. Vatavu, and J. Vanderdonckt, "Euphoria: A scalable, event-driven architecture for designing interactions across heterogeneous devices in smart environments," Information and Software Technology, vol. 109, pp. 43–59, 2019, ISSN: 0950-5849. DOI: https://doi.org/10.1016/j.infsof.2019.01.006
[11] R. B. Almeida, V. R. C. Junes, R. da Silva Machado, D. Y. L. da Rosa, L. M. Donato, A. C. Yamin, and A. M. Pernas, "A distributed event-driven architectural model based on situational awareness applied on internet of things," Information and Software Technology, vol. 111, pp. 144–158, 2019, ISSN: 0950-5849. DOI: https://doi.org/10.1016/j.infsof.2019.04.001
[12] M. Ben-Ari, Principles of Concurrent and Distributed Programming. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1990, ISBN: 0-13-711821-X.
[13] R. Vezzani and R. Cucchiara, "Event driven software architecture for multi-camera and distributed surveillance research systems," Jun. 2010, pp. 1–8.
[14] Y. Li, L. Brown, A. Hampapur, M. Lu, A. Senior, and C.-F. Shu, "IBM smart surveillance system (S3): Event based video surveillance system with an open and extensible framework," Mach. Vis. Appl., vol. 19, pp. 315–327, Oct. 2008.
[15] Z. Wu and R. J. Radke, "Real-time airport security checkpoint surveillance using a camera network," Jun. 2011, pp. 25–32.
[16] M. Bhargava, C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal, "Detection of abandoned objects in crowded environments," Sep. 2007, pp. 271–276.
[17] E. Frontoni, P. Raspa, A. Mancini, P. Zingaretti, and V. Placidi, "Customers' activity recognition in intelligent retail environments," in New Trends in Image Analysis and Processing – ICIAP 2013, A. Petrosino, L. Maddalena, and P. Pala, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 509–516, ISBN: 978-3-642-41190-8.
[18] B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, "A multi-stream bi-directional recurrent neural network for fine-grained action detection," Jun. 2016, pp. 1961–1970.
[19] B. Gyori, I. Medrano, A. Frenkel, and P. Java, Shelf with integrated electronics, US Patent 10064502, Sep. 2018.
[20] B. Greaves, M. Coetzee, and W. S. Leung, "Access control requirements for physical spaces protected by virtual perimeters," in Trust, Privacy and Security in Digital Business, S. Furnell, H. Mouratidis, and G. Pernul, Eds., Cham: Springer International Publishing, 2018, pp. 182–197, ISBN: 978-3-319-98385-1.
[21] C. Tsigkanos, L. Pasquale, C. Ghezzi, and B. Nuseibeh, "On the interplay between cyber and physical spaces for adaptive security," IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 3, pp. 466–480, May 2018, ISSN: 2160-9209.
[22] "Java remote method invocation – distributed computing for Java."
[23] M. Fiaz, A. Mahmood, and S. K. Jung, "Tracking noisy targets: A review of recent object tracking approaches," CoRR, vol. abs/1802.03098, 2018. [Online]. Available: http://arxiv.org/abs/1802.03098
A | Tables
Class: Entity
  - ID: Stores a unique identifier for each instance of the different entities.
  - PhysicalState: Stores data related to the positional coordinates and other physical attributes of the real-world entity. Storing these values is optional in the current implementation, but might be useful in actual systems.
  - Ownership: Stores the ownership relations between the different entities, based on the application. It has two lists: Owns, which stores the information for the instances that this instance is the owner of, and OwnedBy, which stores the information for the instances that own this instance.

Class: HumEnt
  - ActionQ: Records all the interaction information for each of the different objects that the human interacted with. It is a hashtable of queues that uses the object instances the HumEnt interacted with as keys and stores the history of interaction between the HumEnt and each of the objects in separate queues. The interaction information is stored in the temporal order of occurrence.
  - BufferBefore, BufferAfter: Primarily used for temporarily storing the storage content before and after multiple concurrent interactions that the HumEnt instance was involved in at any particular moment. Multiple such entries are indexed using unique key pairs.
  - ContentBefore, ContentAfter: Primarily used for computation; stores the storage content before and after a particular interaction that the HumEnt instance was involved in.
  - InferenceQ: Records the inferences derived from the history of interaction stored in the ActionQ between the HumEnt and each of the objects. The inferences are stored for each of the objects in separate queues, in the temporal order of occurrence.
  - Actions: Stores a list of the different types of actions that the HumEnt instance can perform, specific to the application.

Class: StoEnt
  - Content: Each storage entity maintains its own individual content information, with the corresponding time-stamp, in this data attribute.
  - Update: A Boolean variable that indicates whether the Content data attribute should be updated or not. It is set to True only if the content information reported by the video trackers is reliable.
  - InUse: A Boolean variable that indicates whether the storage entity is currently involved in an interaction.

Class: OBlob
  - Characteristics: Stores the information related to the characteristics of the object, specific to the application. For example, in an airport checkpoint security setting this could be the classification defining the security threat posed, and in a retail store this could be the price of the object for automated billing.

Table 2: Attributes of the different classes associated with each Entity in CheckSoft
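The class layout in Table 2 can be rendered as a minimal Python sketch. Field names follow the table; the concrete Python types, defaults, and the use of dataclasses are assumptions for illustration, not CheckSoft's actual definitions.

```python
# Minimal rendering of the entity classes in Table 2.
from dataclasses import dataclass, field

@dataclass
class Entity:
    ID: int
    PhysicalState: dict = field(default_factory=dict)  # optional positional data
    Ownership: dict = field(default_factory=lambda: {"Owns": [], "OwnedBy": []})

@dataclass
class HumEnt(Entity):
    ActionQ: dict = field(default_factory=dict)       # per-object interaction queues
    BufferBefore: dict = field(default_factory=dict)  # indexed by unique key pairs
    BufferAfter: dict = field(default_factory=dict)
    InferenceQ: dict = field(default_factory=dict)
    Actions: list = field(default_factory=list)       # application-specific actions

@dataclass
class StoEnt(Entity):
    Content: list = field(default_factory=list)  # time-stamped content snapshots
    Update: bool = True    # only update Content when tracker data is reliable
    InUse: bool = False    # True while an interaction is in progress

@dataclass
class OBlob(Entity):
    Characteristics: dict = field(default_factory=dict)  # e.g. threat class or price

# Recording that person 7 owns shelf 3, on both sides of the relation:
shelf = StoEnt(ID=3)
person = HumEnt(ID=7)
person.Ownership["Owns"].append(shelf.ID)
shelf.Ownership["OwnedBy"].append(person.ID)
```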
Class: Event (String Type, String Time)
  The root class from which the child classes for the different events are derived. The time-stamp indicating when the event occurred is stored in the attribute Time, and the specific event type is stored in the attribute Type. This class has a member function that returns a List.

Class: HumanEnter (HumEnt H)
  Associated with the event that a new Human Entity enters the monitored space.

Class: HumanExit (HumEnt H)
  Associated with the event that an existing Human Entity exits the monitored space.

Class: HandIn (HumEnt H, StoEnt S)
  Associated with the event that any of the hands of an existing Human Entity reaches inside an existing Storage Entity.

Class: HandOut (HumEnt H, StoEnt S)
  Associated with the event that all the hands of an existing Human Entity come out of an existing Storage Entity.

Class: StorageInstantiate (StoEnt S, HumEnt H)
  Associated with the event that a new Storage Entity is instantiated by an existing Human Entity.

Class: StorageReturn (StoEnt S, HumEnt H)
  Associated with the event that an existing Storage Entity is returned by an existing Human Entity.

Class: StorageUpdate (StoEnt S, List)
  Associated with the event that new content information is available for an existing Storage Entity. This class has a member function that returns a List.

Table 3: Attributes and description of the different classes associated with each Event in CheckSoft
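The event hierarchy of Table 3 can be sketched in Python as follows. The constructor signatures and the use of wall-clock time-stamps are illustrative assumptions; only the root/child structure follows the table.

```python
# Minimal sketch of the event-class hierarchy in Table 3.
import time

class Event:
    """Root class: every event carries its type name and a time-stamp."""
    def __init__(self):
        self.Type = type(self).__name__
        self.Time = time.time()

class HumanEnter(Event):
    def __init__(self, H):
        super().__init__()
        self.H = H              # the entering HumEnt

class HumanExit(Event):
    def __init__(self, H):
        super().__init__()
        self.H = H              # the exiting HumEnt

class HandIn(Event):
    def __init__(self, H, S):
        super().__init__()
        self.H, self.S = H, S   # a HumEnt hand reaching into a StoEnt

class StorageUpdate(Event):
    def __init__(self, S, content):
        super().__init__()
        self.S = S
        self.content = content  # new content list for the StoEnt

e = HandIn(H="H1", S="S3")
print(e.Type)  # → HandIn
```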
Event: HumanEnter (handler: HandleEntry)
  Assign an available worker process from the HumEnt_Group. Instantiate an instance derived from the HumEnt class, depending on the application, and initialize the instance with a unique ID; update the time of entry t_entry and initialize the other data attributes.

Event: StorageInstantiate (handler: HandleInstantiate)
  Assign an available worker process from the StoEnt_Group. Instantiate an instance derived from the StoEnt class, depending on the application, and initialize the instance with a unique ID; update the time of instantiation t_instantiate and initialize the other data attributes. If the StoEnt S_k was instantiated by a HumEnt instance H_i, then update the ownership information: in the OwnedBy list in S_k.Ownership, indicating that the owner of S_k is H_i, and in the Owns list in H_i.Ownership, indicating that H_i owns S_k.

Event: StorageReturn (handler: HandleReturn)
  The time of return t_return is updated and the entity information is permanently recorded. The StoEnt instance S_k is deleted, and the allotted worker process is then freed and made available to the StoEnt_Group.

Event: StorageUpdate (handler: HandleUpdate)
  Notifies the worker process for StoEnt S_k that new content information is available from the video trackers; the process consequently fetches the latest OBlob content from the corresponding Storage_Y.csv file and updates it in S_k.Content, but only if S_k.Update = True.

Event: HandIn (handler: Interaction Module, Association Submodule)
  Triggers the Interaction module (Section 5.3); the process for HumEnt H_i fetches the noise-filtered content just before the event from the process for StoEnt S_k, using a gptwc() function call over the Inter_comm, and stores it in H_i.BufferBefore. The process for StoEnt S_k sets S_k.Update = False to prevent updating the state of S_k during the interaction.

Event: HandOut (handlers: Interaction Module with Association Submodule and Interaction State Machine; Inference Module with Data Extractor, Inference State Machine, and Anomaly Detector)
  Triggers the Interaction module (Section 5.3); the process for HumEnt H_i fetches the noise-filtered content just after the event from the process for StoEnt S_k, using a gptwc() function call over the Inter_comm, and stores it in H_i.BufferAfter. It then checks for a matching HandIn event; if a matching entry is found in H_i.BufferBefore, the elementary interaction information is extracted and recorded in H_i.ActionQ. Subsequently, the Inference module (Section 5.4) is triggered, which analyses the latest interaction history of the OBlob involved in the interaction and records the inferences made in H_i.InferenceQ. It raises alerts if HumEnt H_i was involved in any anomalous interactions. The process for StoEnt S_k sets S_k.Update = True to allow updates to the state of S_k after the interaction.

Event: HumanExit (handlers: Inference Module with Data Extractor, Inference State Machine, and Anomaly Detector; HandleExit)
  Also triggers the Inference module (Section 5.4), which analyses the interaction history with each OBlob in H_i.ActionQ and records the inferences made in H_i.InferenceQ. It raises alerts if HumEnt H_i was involved in any anomalous interactions. Once all inferences are made, the time of exit t_exit is updated and the entity information is permanently recorded. The instance is then deleted, and the allotted worker process is freed and made available to the HumEnt_Group.

Table 4: Functionality of the different Event Handlers
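The event-to-handler routing summarized in Table 4 can be sketched as a dispatch table. The handler bodies below are placeholders that merely report what would run; only the routing structure is illustrated, not CheckSoft's actual handler code.

```python
# Sketch of event-to-handler dispatch per Table 4.
def handle_entry(e):       return "HandleEntry(%s)" % e["H"]
def handle_exit(e):        return "HandleExit(%s)" % e["H"]
def handle_instantiate(e): return "HandleInstantiate(%s)" % e["S"]
def handle_return(e):      return "HandleReturn(%s)" % e["S"]
def handle_update(e):      return "HandleUpdate(%s)" % e["S"]
def interaction_module(e): return "Interaction(%s,%s)" % (e["H"], e["S"])
def inference_module(e):   return "Inference(%s)" % e["H"]

# One event type may trigger several modules (cf. HandOut and HumanExit).
DISPATCH = {
    "HumanEnter":         [handle_entry],
    "StorageInstantiate": [handle_instantiate],
    "StorageReturn":      [handle_return],
    "StorageUpdate":      [handle_update],
    "HandIn":             [interaction_module],
    "HandOut":            [interaction_module, inference_module],
    "HumanExit":          [inference_module, handle_exit],
}

def dispatch(event):
    """Run every handler registered for the event's type, in order."""
    return [handler(event) for handler in DISPATCH[event["Type"]]]

print(dispatch({"Type": "HandOut", "H": "H1", "S": "S2"}))
# → ['Interaction(H1,S2)', 'Inference(H1)']
```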
B | General Purpose Two Way Communication
Figure 25: General Purpose Two-Way Communication function
The signature of the gptwc() function is as follows:

TWC_Out gptwc(Comm comm_name, String oper_type, int p_initiator, int p_target, TWC_Inp inp_obj)

In order to understand this signature, the reader will find it helpful to also look at the depiction in Figure 25, which shows an initiator process and a target process. The former would like to send a data object inp_obj, of type TWC_Inp, to the latter and receive from the latter a result in the form of a data object denoted out_obj, of type TWC_Out.

Revisiting the function signature shown above, the first parameter is comm_name, which must be set to the communicator needed for the two-way communication link. The parameter oper_type indicates the specific type of functionality that needs to be executed by the target process. The parameter p_initiator is the rank of the initiator process that requested the two-way communication link. By the same token, the parameter p_target is the rank of the target process that is the other endpoint of the communication link. The parameter inp_obj is an instance of the class TWC_Inp, which has two attributes: (1) d_input, which is set to the data block being sent by the initiator process to the target process; and (2) mode, which notifies the target process of the type of operation it needs to perform, as follows:

  mode = Fetch:          fetch data using d_input
  mode = Compute:        perform a compute operation on d_input
  mode = Fetch+Compute:  fetch using d_input, followed by a compute operation

The gptwc() function returns an instance out_obj of the class TWC_Out, which also has two attributes: (1) d_output, which is set to the result that the initiator process is expecting from the target process; and (2) status, which indicates whether any errors were encountered when processing the data object sent over by the initiator process. A status value of 1 signifies success; other values denote specific types of errors.
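The target-side semantics of the three modes can be sketched as below. This is a plain in-process stand-in, not the actual MPI-based implementation: the target_serve() function, the sample store, the use of sum() as the compute operation, and the error code 2 are all assumptions for the example; only TWC_Inp, TWC_Out, d_input, d_output, mode, and status follow the description above.

```python
# Sketch of what the target process does with a TWC_Inp, per the mode field.
from dataclasses import dataclass
from typing import Any

@dataclass
class TWC_Inp:
    d_input: Any
    mode: str          # "Fetch", "Compute", or "Fetch+Compute"

@dataclass
class TWC_Out:
    d_output: Any
    status: int        # 1 signifies success; other values denote errors

def target_serve(inp, store):
    """Illustrative target-side handling of a two-way-communication request."""
    try:
        if inp.mode == "Fetch":
            return TWC_Out(store[inp.d_input], 1)
        if inp.mode == "Compute":
            return TWC_Out(sum(inp.d_input), 1)
        # "Fetch+Compute": fetch using d_input, then compute on the result
        return TWC_Out(sum(store[inp.d_input]), 1)
    except (KeyError, TypeError):
        return TWC_Out(None, 2)    # assumed error code for a failed request

store = {"S3": [1, 2, 3]}          # e.g. content held by the target's StoEnt
out = target_serve(TWC_Inp("S3", "Fetch+Compute"), store)
print(out.d_output, out.status)    # → 6 1
```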
C | Inference Logic for different Applications
C.1 Airport Checkpoint Security
The airport checkpoint security application deals with distinct and unique objects, and each of them is assigned a unique ID. For interactions with each of these objects, there is a temporal relationship between consecutive interactions involving the same person and the same object. To elaborate, the higher-level interactions, such as Divest object, Move object from bin, Move object to bin, Leave behind object, and Collect object, depend on the sequence of two consecutive elementary actions, as shown in Table 5.

The HumEnt instances own OBlob instances and StoEnt instances, and hence these are the two ownership relationships that are tested by the Anomaly Detector. The first interaction determines whether the HumEnt instance divested an item; if so, that HumEnt is set as the owner of the OBlob. The ownership information of the StoEnt instances is set earlier, at the time of their instantiation. Table 5 shows how the Inference module can detect different types of anomalies and can either issue a warning or raise an alarm whenever a non-owner HumEnt instance interacts with an OBlob or StoEnt instance that does not belong to them.
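The sequence-based inference described above, reading a higher-level interaction off the pair of (previous, latest) elementary actions on the same object, can be sketched as follows. The rule set is abbreviated and the function names are assumptions; φ (no previous action) is represented as None.

```python
# Abbreviated sketch of the pair-of-elementary-actions inference in Table 5.
# "A" = Add, "R" = Remove; None stands for φ (first interaction with object).
RULES = {
    (None, "A"): "Divest object",
    (None, "R"): "Take object",
    ("R", "A"):  "Move object",
}

def infer(prev, latest):
    """Map the pair of consecutive elementary actions to an interaction."""
    return RULES.get((prev, latest), "Unknown")

def anomalous(owns_object, owns_bin):
    """Anomaly Detector test: flag any interaction by a non-owner."""
    return not (owns_object and owns_bin)

print(infer(None, "A"), anomalous(True, False))  # → Divest object True
```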
C.2 Automated Retail Store
On the other hand, the automated retail store application deals with multiple objects of the same product type, which therefore share the same ID. Interactions with a specific type of product can happen in any arbitrary temporal sequence. For example, one CustEnt might pick up 5 items and then return 2 of them, while another CustEnt might pick up and return items one at a time. In this application, the higher-level interactions, such as Pick up item, Return item to correct shelf, and Misplace item in wrong shelf, depend only on the latest interaction with an item. This is why the inference-module logic for an automated retail store analyzes only the latest interaction with an item, instead of a sequence of elementary actions, as shown in Table 6.

The HumEnt instances become owners only when they pay at the time of exit. So, as long as a HumEnt instance is within the store, the StoEnt instances own the OBlob instances, and this is the only ownership relationship that is tested by the Anomaly Detector. The ownership information is assumed to be set at the time of installation, since the store is expected to have this information beforehand.

It can also be seen in Table 6 how the Inference module can keep track of how many objects were purchased and automatically bill the customer for each product. Further, it can identify when a product is misplaced and notify the support staff. In addition, it keeps track of how many times a particular type of product was inspected and returned and how many times a product was actually purchased; this data can be used by businesses to optimize their operations and build a more efficient inventory-management system.
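The counter updates and billing just described can be sketched as below. The class and method names are illustrative assumptions; only the counters (nPurchase, nInspect, nReturn, nMisplace) and the billing rule H_i.Amount += nPurchase * Price follow the description.

```python
# Sketch of per-product counters and automated billing for the retail case.
from collections import defaultdict

class RetailInference:
    def __init__(self):
        self.nPurchase = defaultdict(int)
        self.nInspect = defaultdict(int)
        self.nReturn = defaultdict(int)
        self.nMisplace = defaultdict(int)

    def hand_out(self, action, product, correct_shelf=True):
        if action == "R":                      # picked up an item
            self.nPurchase[product] += 1
            self.nInspect[product] += 1
        elif action == "A" and correct_shelf:  # returned to correct shelf
            self.nReturn[product] += 1
            self.nPurchase[product] -= 1
        elif action == "A":                    # misplaced in wrong shelf
            self.nMisplace[product] += 1
            self.nPurchase[product] -= 1       # would also notify staff

    def bill(self, prices):
        # On HumanExit: charge for the net number of items still held.
        return sum(self.nPurchase[p] * prices[p] for p in self.nPurchase)

r = RetailInference()
r.hand_out("R", "apple"); r.hand_out("R", "apple"); r.hand_out("A", "apple")
print(r.bill({"apple": 2.0}))  # → 2.0
```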
OBlob instances and this is the only ownership relationship that is tested by theAnomaly Detector. The ownership information is assumed to be set at the time of installation as it is expected the storewill have this information before.It can be also be seen in Table 6, how the Inference Module can keep track of how many objects were purchased andautomatically bill the customer for each product. Further, it can identify when a product is misplaced and notify supportstaff. Besides, it keeps track of how many times a particular type of product was inspected and returned and how manytimes a product was actually purchased, and this data can be used by businesses to optimize their operations and build amore efficient inventory management system. The symbol φ in the table represents an empty value (meaning that the next elementary action in the sequence is the firstinteraction with the object), A and R represent the elementary actions Add and Remove respectively and DNC represents Do NotCare and so it could be either φ , A or R. The symbols A and R represent the elementary actions Add and Remove respectively and DNC represents Do Not Care and so itcould be either A or R. HECK S OFT - F
Inference State Machine and Anomaly Detector tasks. In the rows below, the two Yes/No entries indicate whether HumEnt H_i owns the OBlob O_j and the StoEnt bin(s), respectively.

Triggered by HandOut, latest sequence (φ, A); Set ownership (OBlob), Test ownership (StoEnt):
  - Divest own object O_j in own Bin S_k (O_j: Yes; S_k: Yes): Append O_j in H_i.Ownership
  - Divest own object O_j in other's Bin S_k (O_j: Yes; S_k: No): Append O_j in H_i.Ownership; Raise Alarm

Triggered by HandOut, latest sequence (φ, R); Test ownership:
  - Taking other's object O_j from other's Bin S_k (O_j: No; S_k: No): Raise Alarm

Triggered by HandOut, latest sequence (R, A); Test ownership:
  - Move own object O_j from own Bin S_f to own Bin S_t (O_j: Yes; S_f: Yes, S_t: Yes)
  - Move own object O_j from own Bin S_f to other's Bin S_t (O_j: Yes; S_f: Yes, S_t: No): Raise Alarm
  - Move own object O_j from other's Bin S_f to own Bin S_t (O_j: Yes; S_f: No, S_t: Yes): Raise Alarm
  - Move own object O_j from other's Bin S_f to other's Bin S_t (O_j: Yes; S_f: No, S_t: No): Raise Alarm
  - Move other's object O_j from own Bin S_f to own Bin S_t (O_j: No; S_f: Yes, S_t: Yes): Raise Alarm
  - Move other's object O_j from own Bin S_f to other's Bin S_t (O_j: No; S_f: Yes, S_t: No): Raise Alarm
  - Move other's object O_j from other's Bin S_f to own Bin S_t (O_j: No; S_f: No, S_t: Yes): Raise Alarm
  - Move other's object O_j from other's Bin S_f to other's Bin S_t (O_j: No; S_f: No, S_t: No): Raise Alarm

Triggered by HumanExit, latest sequence (DNC, A); Test ownership:
  - Left own object O_j in own Bin S_k (O_j: Yes; S_k: Yes): Warn H_i
  - Left own object O_j in other's Bin S_k (O_j: Yes; S_k: No): Raise Alarm

Triggered by HumanExit, latest sequence (DNC, R); Test ownership:
  - Collect own object O_j from own Bin S_k (O_j: Yes; S_k: Yes)
  - Collect other's object O_j from other's Bin S_k (O_j: No; S_k: No): Raise Alarm
  - Collect other's object O_j from own Bin S_k (O_j: No; S_k: Yes): Raise Alarm

Table 5: Inference module logic for the Airport Checkpoint Security application.
Inference State Machine and Anomaly Detector tasks. In the rows below, the Yes/No entry indicates whether StoEnt S_k owns OBlob O_j.

Triggered by HandOut, latest elementary action R; Test ownership:
  - Picked up item O_j from Shelf S_k (S_k owns O_j: Yes): nPurchase++; nInspect++

Triggered by HandOut, latest elementary action A; Test ownership:
  - Returned item O_j to correct shelf S_k (S_k owns O_j: Yes): nReturn++; nPurchase--
  - Misplaced item O_j in wrong shelf S_k (S_k owns O_j: No): nMisplace++; nPurchase--; Notify CustEnt and StaffEnt

Triggered by HumanExit, latest elementary action DNC:
  - Add the amount for all OBlobs O_j in H_i.ActionQ to the total bill as: H_i.Amount += nPurchase * O_j.Price

Table 6: Inference module logic for the Automated Retail Store application.