A Hybrid Neuro-Symbolic Approach for Complex Event Processing
To appear in EPTCS.
Marc Roig Vilamala, Harrison Taylor
Cardiff University
Tianwei Xing, Luis Garcia, Mani Srivastava
University of California, Los Angeles
Lance Kaplan
CCDC Army Research Laboratory
Alun Preece, Angelika Kimmig
Cardiff University
Federico Cerutti
University of Brescia
Training a model to detect patterns of interrelated events that form situations of interest can be a complex problem: such situations tend to be uncommon, and only sparse data is available. We propose a hybrid neuro-symbolic architecture based on the Event Calculus that can perform Complex Event Processing (CEP). It leverages both a neural network, to interpret inputs, and logical rules, which express the pattern of the complex event. Our approach can train with far less labelled data than a pure neural network approach, and learns to classify individual events even when trained in an end-to-end manner. We demonstrate this by comparing our approach against a pure neural network approach on a dataset based on Urban Sounds 8K.
Imagine a scenario where we are trying to detect a shooting using microphones deployed in a city: shooting is a situation of interest that we want to identify from a high-throughput (audio) data stream. Complex Event Processing (CEP) is a type of approach aimed at detecting such situations of interest, called complex events, from a data stream using a set of rules. These rules are defined on atomic pieces of information from the data stream, which we call events (or simple events, for clarity). Complex events can be formed from multiple simple events. For instance, shooting might start when multiple instances of the simple event gunshot occur. For simplicity, we can assume that when we start to detect siren events, the authorities have arrived and the situation is being dealt with, which concludes the complex event.

We usually cannot write declarative rules directly on the raw data stream, as that would require processing the raw data with symbolic rules; though theoretically possible, this is hardly advisable.

Using a machine learning algorithm such as a neural network trained with back-propagation is also infeasible, as it would need to simultaneously learn to recognise the simple events within the data stream and the interrelationships between such events that compose a complex event. While possible, the sparsity of data makes this a hard problem to solve.

The architecture we propose is a hybrid neuro-symbolic approach that combines the advantages of both. Our approach is capable of performing CEP on raw data after training in an end-to-end manner. Among other advantages, it is better at training with sparse data than pure neural network approaches, as we will demonstrate.

This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Code is available at https://github.com/MarcRoigVilamala/DeepProbCEP
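To make the shooting example concrete, the sketch below implements the informal pattern from the introduction in plain Python: a complex event starts once enough gunshot events fall within a short time window, and ends analogously with siren events. This is purely illustrative (the paper itself expresses such patterns as logical rules, not imperative code), and all names and thresholds here are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: int   # timestamp in seconds
    label: str  # e.g. "gunshot" or "siren"

def complex_event_intervals(events, start_label="gunshot", end_label="siren",
                            repetitions=2, window=5):
    """Return (start, end) intervals of the complex event.

    The complex event starts once `repetitions` events carrying
    `start_label` occur within `window` seconds of each other, and ends
    analogously for `end_label`.
    """
    intervals, start = [], None
    recent = {start_label: [], end_label: []}
    for e in sorted(events, key=lambda e: e.time):
        if e.label not in recent:
            continue  # other simple events are irrelevant to this pattern
        hits = [t for t in recent[e.label] if e.time - t < window] + [e.time]
        recent[e.label] = hits
        if len(hits) >= repetitions:
            if start is None and e.label == start_label:
                start = e.time
            elif start is not None and e.label == end_label:
                intervals.append((start, e.time))
                start, recent = None, {start_label: [], end_label: []}
    return intervals
```

For instance, two gunshots at seconds 1 and 2 followed by sirens at 10 and 11 yield the single interval `(2, 11)`.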
ProbEC [5] is an approach for complex event processing based on probabilistic logic programming. ProbEC takes an input stream of simple events, each of which has an attached probability of having happened. From there, it can output how likely it is that a complex event is happening at a given point in time, based on manually-defined rules that describe the pattern of the complex event.

In a previous paper, we built on top of ProbEC, proposing a system that made use of pre-trained neural networks to detect complex events from CCTV feeds [3]. However, this approach required access to pre-trained neural networks to process the simple events, which are not always available. To solve this issue, we moved towards end-to-end training, which learns these neural networks from just the input data and labels indicating when the complex events happen.

To implement end-to-end training in a hybrid neuro-symbolic approach, we made use of DeepProbLog [2], which incorporates deep learning into a probabilistic logic programming language. DeepProbLog allows users to use the outputs of a neural network as part of the knowledge base of a ProbLog program, and to train those neural networks in an end-to-end manner: it computes the gradients required for gradient descent from the truth value of the performed query and the outputs provided by the neural network. This allows us to implement a hybrid neuro-symbolic architecture that learns end-to-end.
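DeepProbLog exposes a network to the logic program through a neural predicate, an annotation of roughly the form `nn(net, [X], Y, classes) :: sound(X, Y).`, which turns the network's softmax output into a distribution over ground atoms. The Python sketch below illustrates that mapping only; the predicate and class names are our own illustrative choices, not the paper's.

```python
import math

def neural_predicate(logits, classes, timepoint):
    """Conceptual sketch of a DeepProbLog neural predicate: turn a
    network's raw scores into probabilistic facts, one softmax-weighted
    ground atom per class. Atom names here are illustrative only."""
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    return {f"sound({timepoint}, {c})": e / z for c, e in zip(classes, exps)}

# Three raw scores over three hypothetical sound classes at second 3:
facts = neural_predicate([2.0, 0.5, 0.1], ["gunshot", "siren", "dog_bark"], 3)
```

The logic layer then reasons over these weighted facts exactly as it would over ordinary probabilistic facts, which is what lets gradients flow back from query outcomes to network weights.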
In this paper, we propose a hybrid neuro-symbolic architecture that performs CEP. As a proof of concept, we have implemented our architecture to perform CEP on audio data. In our implementation, each second of audio is processed by a PyTorch implementation of VGGish, a feature-embedding frontend for audio classification models [1], which outputs a feature vector for each second of the original audio file. We use these features as input to a multi-layer perceptron (MLP; AudioNN in the figure) that classifies the audio into a pre-defined set of classes.

The output of this neural network is then used in the logic layer, which contains the rules required to perform CEP. Here, the user can define which patterns of simple events constitute the starts and ends of which complex events.

DeepProbLog is used to integrate the different parts, allowing us to train the neural network within the system in an end-to-end manner. This heavily reduces the cost of labelling: it is practically infeasible to label each second of a large collection of audio tracks, while it is much easier to identify the beginning and the end of complex events as situations of interest. As such, the system is provided with raw audio data and, for training, with labels on when the complex events start and end.

We experimentally compare our approach against a pure statistical learning approach using a neural network (Pure NN). Pure NN exchanges the logic layer for an MLP, which must learn the rules that define complex events. We engineered a synthetic dataset based on Urban Sounds 8K [4]. We consider two repetitions of the same start event (or end event) within a certain time window as the signal of the beginning (or termination) of a fluent. Then, to test the efficacy of the approach, we varied the size of the time window for repetitions from 2 to 5 seconds. The information on when a complex event begins or ends is used as training data.
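The neural half of the pipeline described above can be sketched as follows: an MLP over VGGish embeddings (128-dimensional, one per second of audio) producing a distribution over the 10 Urban Sounds 8K classes. The hidden-layer size is our own assumption; the paper does not specify the AudioNN architecture in detail.

```python
import torch
import torch.nn as nn

class AudioNN(nn.Module):
    """Sketch of the AudioNN classifier: maps VGGish feature embeddings
    to class probabilities that the logic layer can consume."""

    def __init__(self, embedding_dim=128, hidden_dim=64, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),
            nn.Softmax(dim=-1),  # probabilities, one distribution per second
        )

    def forward(self, embeddings):
        return self.net(embeddings)

model = AudioNN()
probs = model(torch.randn(5, 128))  # 5 seconds of audio -> 5 distributions
```

In the full system these per-second distributions are not consumed directly; they are fed to DeepProbLog's neural predicate, so training signal arrives only through the complex-event labels.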
The goal of our synthetic dataset is to be able to detect when a complex event is happening from raw data. (The VGGish implementation is available at https://github.com/harritaylor/torchvggish.)

Table 1: Accuracy results (average over 10-fold cross-validation) for a hybrid neuro-symbolic architecture (our approach) and a pure neural network approach, both for individual sounds (Sound Accuracy) and for a pattern of two instances of the same sound class (Pattern Accuracy) within the window size. Best in bold.

                                      Window Size
Approach                     2        3        4        5
Sound Accuracy
  Hybrid (our approach)
  Pure NN                    0.0725   0.1155   0.0845   0.0833
Pattern Accuracy
  Hybrid (our approach)
  Pure NN                    0.1843   0.2034   0.2289   0.1927

For all the reported results, the corresponding system has been trained on a sequence generated by randomly ordering the files from 9 of the 10 pre-sorted folds of Urban Sounds 8K, with the remaining fold used for testing. As recommended by the creators of the dataset, 10-fold cross-validation has been used for evaluation. Before each evaluation, both systems have been trained for 10 epochs with 750 training points per epoch, which our experiments show is enough for both approaches to converge.

Table 1 shows the results of our approach and Pure NN with different window sizes: both the performance in detecting the starts and ends of complex events (Pattern Accuracy) and in classifying the simple events in the sequence (Sound Accuracy). As we can see in the table, our approach is clearly superior, as Pure NN performs only marginally better than a random classifier, which would achieve an accuracy of about 10%. Our approach is therefore very efficient at learning from sparse data and, as a by-product, can also train the neural network to classify simple events.
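The evaluation protocol above (shuffle 9 of the 10 pre-sorted folds into one long training sequence, hold the remaining fold out for testing, rotate over all folds) can be sketched as follows; function and variable names are our own.

```python
import random

def make_sequence(fold_files, test_fold, seed=0):
    """Sketch of the sequence construction used for evaluation: files
    from 9 of the 10 pre-sorted Urban Sounds 8K folds are shuffled into
    one training sequence; the held-out fold is kept for testing."""
    rng = random.Random(seed)
    train = [f for i, fold in enumerate(fold_files) if i != test_fold
             for f in fold]
    rng.shuffle(train)
    return train, list(fold_files[test_fold])

# 10-fold cross-validation simply rotates test_fold over range(10).
```

Repeating this for `test_fold` in 0..9 and averaging yields the cross-validated accuracies reported in Table 1.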
In this paper we demonstrated the superiority of our approach over a feedforward neural architecture. Further investigations considering recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, are ongoing; we thank an anonymous reviewer for this suggestion. Further research could also include rule learning, which would remove the need to know in advance which patterns of simple events form which complex events.
References

[1] S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold et al. (2017): CNN architectures for large-scale audio classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, pp. 131–135.
[2] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester & Luc De Raedt (2018): DeepProbLog: Neural Probabilistic Logic Programming. In: NIPS 2018, pp. 3749–3759.
[3] M. Roig Vilamala, L. Hiley, Y. Hicks, A. Preece & F. Cerutti (2019): A Pilot Study on Detecting Violence in Videos Fusing Proxy Models. In: 2019 22nd International Conference on Information Fusion, pp. 1–8.
[4] J. Salamon, C. Jacoby & J. P. Bello (2014): A Dataset and Taxonomy for Urban Sound Research. In: 22nd ACM International Conference on Multimedia (ACM-MM'14), Orlando, FL, USA, pp. 1041–1044.
[5] A. Skarlatidis, A. Artikis, J. Filippou & G. Paliouras (2015):