Experimenting with Innate Immunity
libtissue - implementing innate immunity
Jamie Twycross, Uwe Aickelin
Abstract - In a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step - the libtissue system.
I. INTRODUCTION
libtissue is a software system for implementing and evaluating AIS algorithms on real-world monitoring and control problems. AIS algorithms are implemented as multiagent systems of cells, antigen and signals interacting within tissue compartments. Input data is provided by sensors which monitor a system under surveillance, and cells are actively able to affect the monitored system through response mechanisms. libtissue provides a general implementational framework within which many different AIS algorithms can be instantiated, rather thanc [1]. libtissue is being used at the University of Nottingham to explore the application of a range of novel immune-inspired algorithms to problems in intrusion detection.

A brief review of the biological and conceptual views that underpin the design of libtissue is given in Section II; more detailed background information can be found in a previous paper [2]. This is then followed by a detailed description of the libtissue implementation in Section III. libtissue has grown into a fairly complex software system and its use is better understood in the context of examples. Thus, Section IV shows how libtissue can be applied to a real-world problem in computer security, and Section V describes the implementation of a simple example algorithm using libtissue. An analysis and evaluation of this algorithm are then presented in Section VI. The paper concludes with a brief summary and discussion of future work in Section VII.

II. APPLYING INNATE IMMUNITY
In a previous paper [2] the authors describe several biological processes in detail and then discuss these biological processes at a conceptual level. This biological and conceptual view of the immune system forms the foundation upon which the libtissue implementation is built, and a brief summary is given here. The reader is referred to [2] and [3] for further discussions and explanations of the biological terminology.
Jamie Twycross (corresponding author) and Uwe Aickelin are at the University of Nottingham, U.K.
The biological immune system is a complex system of cells of different types interacting with each other and the tissue in which they reside. The key elements of the system are cells, signals and antigen, combined with the environment, tissue. Cells have access to their environment through antigen and signals. Essentially, signals provide cells with information on the behaviour of entities in their environment, while antigen provides cells with information on the structure of these entities. In the biological system, structure reflected at an antigenic level and behaviour at a signal level are tightly coupled. If the behaviour of a cell changes then so does its antigen profile, and vice versa. Part of the motivation for the research presented here comes from a desire to better understand how information from these two levels determines the dynamics of the immune system.

As well as providing information on behaviour, signals also provide a control mechanism for immune system cells. The behaviour of a single cell is determined by complex signalling networks which are actively maintained between cells. A cell's behaviour can be seen in terms of the functions a cell performs. Of particular interest are the functions of antigen processing, signal processing, cellular binding, antigen matching and antigen response. Simple antigen processing consists of two steps: antigen ingestion and antigen presentation. During ingestion, antigen is transferred from the extracellular space to the interior of the cell. During presentation, internalised antigen is displayed on the surface of the cell. Additional manipulation of the antigen whilst inside the cell is also possible. A specialised class of cells called APCs performs antigen processing in the body. Signal processing refers to the ability of a cell to have its behaviour influenced through the level of a signal, such as a cytokine or hormone, in the extracellular space.
Control of DCs by PAMPs and Danger Signals, or of T helper cells by DCs, provides good examples of this.

While signals allow cells to influence each other without coming into contact, many immune system processes involve interactions between cells which require contact. Cells bind with each other through the action of adhesion molecules and receptors on their surfaces. Antigen matching, the ability of certain classes of receptors, for example TcRs, to be activated only by specific patterns of antigen, is one example of this. Antigen matching within a particular context leads to cells mounting a response, such as the initiation of the complement cascade. This response has an impact on the environment, causing other cells to change their behaviour, and so their structure, and closes the loop between cell and environment.

III. SYSTEM IMPLEMENTATION
The aim of the research presented here is to build a software system which allows researchers to implement and analyse novel AIS algorithms and apply them to real-world problems. This translates into three separate areas of functionality: algorithm implementation, algorithm analysis and algorithm application. This section begins by describing how the overall architecture of libtissue delivers these functionalities. It then goes on to present, in as much technical detail as space permits, how these functionalities have actually been implemented.

[Figure 1 labels: monitored hosts; data source; libtissue clients (antigen, signal, response); libtissue server; data representation; AIS algorithm; compartment; cells; antigen store; signal store; operating system; processes; networking]
Fig. 1. The architecture of libtissue. libtissue clients monitor a host and provide input data to a libtissue server and AIS algorithm. Clients also allow algorithms to change the state of the monitored host.

libtissue has a client/server architecture, pictured in Figure 1. An AIS algorithm is implemented as part of a libtissue server, and libtissue clients provide input data to the algorithm and response mechanisms which change the state of the monitored system. This client/server architecture separates data collection by the libtissue clients from data processing by the libtissue servers and allows for relatively easy extensibility and testing of algorithms on new data sources. libtissue is coded in C as a Linux shared library with client and server APIs, allowing new antigen and signal sources to be easily added to libtissue servers from a programmatic perspective. Because libtissue is implemented as a library, algorithms can be compiled and run on other machines with no modification. Client/server communication is socket-based, allowing clients and servers to potentially run on separate machines; for example, a signal or antigen client may in fact be a remote network monitor.

AIS algorithms are implemented within a libtissue server as multiagent systems of cells. Cells exist within an environment, called a tissue compartment, along with other cells, antigen and signals. The problem to which the algorithm is being applied is represented by libtissue as antigen and signals. Cells express various repertoires of receptors and producers which allow them to interact with antigen and control other cells through signalling networks. libtissue allows data on implemented algorithms to be collected and logged, allowing for experimental analysis of the system.

A. libtissue clients
libtissue clients are of three types: antigen, signal and response. Antigen clients collect and transform data into antigen which are forwarded to a libtissue server. Currently, a systrace antigen client has been implemented which collects process system calls (syscalls) using systrace [4]. Syscalls are a low-level mechanism by which applications request system services such as peripheral I/O or memory allocation from an operating system. Signal clients monitor system behaviour and provide an AIS running on the tissue server with input signals. Two signal clients have been implemented: a process monitor signal client, which monitors a process and its children and records statistics such as CPU and memory usage, and a network signal client, which monitors network interface statistics such as bytes per second. Two response clients have been implemented, one which simply logs an alert, and another which allows an active response through the modification of a systrace syscall policy. All of these clients are designed to be used in realtime experiments and for data collection for offline experiments with tcreplay.

The implementation is designed to allow varied AIS algorithms to be evaluated on real-world, realtime systems and problems. When testing IDSs it is common to use preexisting datasets such as the Lincoln Labs dataset [5]. However, the project libtissue has been built for is focused on combining measurements from a number of different concurrent data sources. Preexisting datasets which contain all the necessary sources are not available. Therefore, to facilitate experimentation, a libtissue replay client, called tcreplay, was also implemented. This client reads in log files gathered from previous realtime runs of antigen and signal clients, and also has the facility to read log files generated by strace [6]. It then sends these logs to a libtissue server. Variable replay rates are available, allowing data collected from a realtime session to be used to perform many experiments quickly. Having such a replay facility is important in terms of reproducibility of experiments. In this paper, all experimental runs are scripts which take data and parameter files as input and run a tissue server and tcreplay client.
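The variable replay rate described above can be illustrated with a minimal sketch. It is not the tcreplay implementation: the event tuple layout and the function name replay_delays are assumptions made for illustration only, and the real log format is not given in this paper. The sketch simply scales the gaps between logged events by a rate factor, so that a rate of 2.0 replays a session in half of its original duration.

```python
from typing import Iterable, List, Tuple

# Hypothetical log entry: (timestamp in seconds, "antigen" or "signal", value).
Event = Tuple[float, str, int]


def replay_delays(events: Iterable[Event], rate: float = 1.0) -> List[Tuple[float, Event]]:
    """Return (delay_before_sending, event) pairs for a replay at `rate` x realtime."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    ordered = sorted(events, key=lambda e: e[0])  # replay in timestamp order
    schedule = []
    previous = ordered[0][0] if ordered else 0.0
    for event in ordered:
        # The gap to the previous event shrinks as the replay rate grows.
        schedule.append(((event[0] - previous) / rate, event))
        previous = event[0]
    return schedule


log = [(0.0, "antigen", 5), (2.0, "signal", 40), (3.0, "antigen", 6)]
for delay, event in replay_delays(log, rate=2.0):
    print(f"wait {delay:.2f}s then send {event}")
```

A replay client built on this idea would sleep for each delay and then send the event to the server; the sleeping and socket code are omitted here to keep the sketch self-contained.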
B. libtissue servers
A libtissue server is in fact several threaded processes running asynchronously. An initialisation routine is first called which creates a tissue compartment based on user-supplied parameters. During initialisation a thread is also started to handle connections between the server and libtissue clients, and this thread itself starts a separate thread for each connected client. After initialisation, cells, the characteristics of which are specified by the user, are created and initialised, and the tissue compartment is populated with these cells. Cells in the tissue compartment then cycle, and input data is provided by connected libtissue clients.
1) Tissue compartments:
The libtissue server provides a multiagent simulation engine in which AIS algorithms can be implemented. At the centre of this simulation is the concept of a tissue compartment. A tissue compartment is the environment in which cells, signals and antigen interact. As well as housing cells, the maximum number of which is determined by the max cells parameter, each tissue compartment has a fixed-size antigen store, set by the max antigen parameter, where antigen provided by libtissue clients is placed. The tissue compartment also stores a fixed number of signals, set by the max cytokines parameter, the levels of which are set either by signal tissue clients or cells.

Input data can undergo some preprocessing before entering a tissue compartment. As well as representing the target-domain problem as antigen and signals, one of the roles of libtissue is to frame it in a "more biological" way in the following sense. The biological systems which biologically-inspired algorithms are based upon are specific for a particular environment with particular characteristics. For the immune system these characteristics include rate of antigen, uniqueness of antigen and antigen turnover. libtissue implements these functions by allowing the preprocessing of data from libtissue clients before it enters the tissue, controlled by a number of user-defined parameters. The antigen multiplier parameter determines the number of copies of an incoming antigen placed into the tissue antigen storage. It was found necessary to have such a parameter since, as will be seen, data in real-world problems often occur at a low frequency. Biologically, a certain level of antigen is necessary to stimulate the system; that is, a single unique antigen will not perturb the immune system much. Seen from the level of the pathogen, which is made up of repeated protein structures and reproduces itself multiple times, this is also clear. The multiplicity of antigen seems to be an important property of the biological immune system. In essence, the antigen multiplier parameter allows libtissue to emulate this property for problems which have differing degrees of multiplicity in their input data, and its value is therefore problem dependent.

Another important concept related to antigen multiplicity is that of antigen persistence. In the biological system individual antigen do not persist indefinitely, but instead there is a turnover of antigen. This is provided for by libtissue on one level through the limitation of the size of a tissue compartment's antigen store by the max antigen parameter, and can be seen by tracing the transit of antigen that are received by a libtissue server. After preprocessing by the libtissue server as detailed above, new antigen, multiplied by the antigen multiplier parameter, will simply overwrite existing antigen. Antigen is then transferred from the tissue to the internal antigen store of cells with antigen receptors. From the internal store antigen is transferred to antigen producers on these cells, where it persists for a user-defined time period before being removed. Users can also remove antigen from a cell's store in the cell cycle callback. These factors combine to create a turnover of antigen in the tissue, with antigen entering and eventually being removed.

Even when composed of relatively small numbers of simple actors, the behaviour of multiagent systems is often difficult to understand. While formal analysis is possible, an experimental approach is more often adopted. libtissue implements probes which periodically sample and log data from a tissue compartment. Sampling is necessary, since even with simple algorithms such as the one described in Section V below it is infeasible in terms of storage space and performance to log all of the data produced. Additionally, since any experiment will require only certain data, the details of what is logged are left to the user, who provides a probe callback function. The rate at which this callback is run, and so the rate at which data is sampled, is defined by the probe rate parameter. Probes allow data to be efficiently gathered and ease the experimental evaluation of algorithms.
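The interplay of the max antigen and antigen multiplier parameters can be made concrete with a small sketch. It is written in Python for brevity and is not the libtissue C API: the class name AntigenStore and its methods are hypothetical, and the choice of a circular write position is an assumption, since the text states only that new antigen overwrite existing antigen. The point illustrated is that a bounded store plus multiplication of incoming antigen yields turnover.

```python
class AntigenStore:
    """Fixed-size tissue antigen store (illustrative sketch, not the libtissue API)."""

    def __init__(self, max_antigen: int, antigen_multiplier: int):
        self.slots = [None] * max_antigen      # None marks an empty slot
        self.multiplier = antigen_multiplier
        self._next = 0                         # assumed circular write position

    def add(self, antigen: int) -> None:
        # Each incoming antigen is copied antigen_multiplier times,
        # overwriting whatever the chosen slots currently hold.
        for _ in range(self.multiplier):
            self.slots[self._next] = antigen
            self._next = (self._next + 1) % len(self.slots)

    def count(self, antigen: int) -> int:
        return self.slots.count(antigen)


store = AntigenStore(max_antigen=4, antigen_multiplier=3)
store.add(5)   # three copies of syscall 5 enter the store
store.add(6)   # three copies of syscall 6 overwrite two of the earlier copies
print(store.slots)
```

With a store of four slots and a multiplier of three, the second antigen already displaces part of the first, which is the turnover effect described above in miniature.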
2) libtissue cells:
libtissue cells, like tissue compartments, have antigen and signal stores, the sizes of which are set by the num antigen and num cytokines parameters. They also have a number of different receptors and producers which allow them to interact with other cells, antigen and signals in the tissue compartment. Currently, four types of receptors have been implemented: antigen, cytokine, cell and VR receptors. Antigen receptors allow cells to transfer antigen from the tissue compartment to their own internal antigen store. Cytokine receptors allow cells to read signal levels in the compartment. Cell receptors allow cells to bind to other cells. Binding is necessary for VR receptors to be activated, which match antigen presented on another cell. Antigen from a cell's internal store are presented on antigen producers, one of the three types of producers currently implemented. The other two types, response and cytokine producers, allow cells to communicate with response clients, and to change signal levels in the tissue compartment and hence control the behaviour of cells with cytokine receptors, respectively.

While libtissue provides the basic building blocks for modelling biological cells in terms of receptors and producers, the details of their actual configuration on cells and how cells behave in response to them are specified by the user. libtissue implements a simple scheduler which is periodically called at a rate defined by the cell update rate parameter. When called, the scheduler, taking the cells in a random order, first sets the values of the receptors for all of the cells. A user-defined cell cycle callback function is then executed for each cell. This function is essentially the controller for the cell, and determines how the actions of its receptors and producers are related. Once all of the callbacks have been run, the scheduler updates the tissue compartment according to cells' antigen producers.
This design, since the cell cycle callback is in fact an arbitrary C function, means that cells can have complex behaviours. The specific action and parameters of the various producers and receptors are now described in more detail.

Antigen receptors allow the transfer of antigen from the tissue compartment's antigen store to the internal store of a cell. Transferred antigen is removed from the tissue compartment. For each antigen receptor a cell has, a random location in the tissue antigen store is chosen. If the location contains no antigen then none is transferred for that receptor. Otherwise, a random location is picked in the cell's antigen store into which to transfer the tissue antigen. If this location contains a previously transferred antigen then it is overwritten by the incoming antigen. Clearly, both the parameter settings for the size of the tissue and cell antigen compartments, max antigen and num antigen respectively, as well as the rate of incoming antigen to the tissue compartment, will affect the overall rate at which antigen is transferred from the tissue compartment to the internal antigen store of cells.

Cytokine receptors allow cells to read the values of the signals stored in the tissue compartment, which are set by libtissue signal clients or cells themselves through cytokine producers. As well as providing a control mechanism for cells, cytokine receptors are designed to give cells sensitivity to external signals. Cytokine receptors are initialised as receptive to a specific tissue cytokine and at each iteration the value of this cytokine is copied to the receptor. This value is available for use during the user-specified cell cycle callback, and can, for example, affect the value of an internal cytokine or be used to determine the range of receptors a cell expresses.

Cell receptors model the concept of cellular binding in libtissue and enable cells to restrict some receptor/producer interactions. A cell receptor can be specific for a cell of a particular type.
At each time step a random index in the tissue compartment's cell store is chosen for each cell receptor. If a cell of the same type as the cell receptor exists at that index then that cell's index is copied to the receptor. Only when a cell is bound can certain other receptors, such as VR receptors, be activated.

VR receptors allow antigen presented on antigen producers to be matched. A VR receptor is the lock part of a lock-and-key type receptor mechanism. The lock is opened, that is the receptor is activated, by certain antigen, the keys, which are presented on antigen producers of other cells. The exact structure of the locks and keys and the matching criteria chosen to establish which keys fit which locks are problem dependent. libtissue provides for this by allowing the user to specify the lock and key structure and matching in user-defined callback functions. VR receptors enable libtissue cells to perform antigen matching.

Antigen producers take antigen and make it available for inspection by other cells through VR receptors. Antigen producers work much like antigen receptors except that they transfer a randomly chosen antigen, if available, from a cell's internal store to the antigen producer itself. The antigen is removed from the cell's store and replaces any antigen which may already be on the antigen producer. This transfer and overwriting, when combined with antigen receptors, allows antigen to be passed through the system from tissue to cell to an antigen producer on a cell and so to eventual destruction. The parameter settings for the number of antigen receptors and producers a cell has, along with the size of the cell's antigen store, affect how quickly this process takes place. One further parameter is available which has proved useful in controlling this process. Antigen producers have an action time which determines the number of cell cycles an antigen is displayed for on the producer. While an antigen is being displayed, it cannot be overwritten by other antigen.
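The path of an antigen from tissue store to cell store to antigen producer can be traced in a short sketch. This is an illustration in Python rather than libtissue code: the names Cell, Producer, ingest and present are hypothetical, and the exact bookkeeping of the action time (counting the cycle in which an antigen is loaded as its first displayed cycle) is an assumption.

```python
import random


class Producer:
    """An antigen producer that displays one antigen for action_time cycles."""

    def __init__(self, action_time: int):
        self.action_time = action_time
        self.antigen = None
        self.remaining = 0


class Cell:
    def __init__(self, num_antigen, num_receptors, num_producers, action_time, rng):
        self.store = [None] * num_antigen          # internal antigen store
        self.num_receptors = num_receptors
        self.producers = [Producer(action_time) for _ in range(num_producers)]
        self.rng = rng

    def ingest(self, tissue):
        """Antigen receptors: move antigen from random tissue slots into the cell."""
        for _ in range(self.num_receptors):
            src = self.rng.randrange(len(tissue))
            if tissue[src] is None:
                continue                            # empty slot, nothing transferred
            dst = self.rng.randrange(len(self.store))
            self.store[dst] = tissue[src]           # may overwrite earlier antigen
            tissue[src] = None                      # removed from the tissue

    def present(self):
        """Antigen producers: display antigen from the internal store."""
        for p in self.producers:
            if p.antigen is not None:
                p.remaining -= 1
                if p.remaining > 0:
                    continue                        # still displayed, protected
                p.antigen = None                    # display period over
            idx = self.rng.randrange(len(self.store))
            if self.store[idx] is not None:
                p.antigen, self.store[idx] = self.store[idx], None
                p.remaining = p.action_time


tissue = [7, 7, 7, 7]
cell = Cell(num_antigen=1, num_receptors=1, num_producers=1,
            action_time=2, rng=random.Random(0))
cell.ingest(tissue)
cell.present()
print(tissue.count(None), cell.producers[0].antigen)
```

Each antigen is held in exactly one place at a time, so every transfer empties the slot it came from, which is what drives the eventual destruction of antigen described above.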
Together with antigen receptors, antigen producers give a cell the ability to process antigen.

Cytokine producers allow signals stored in the tissue compartment to be set. At each time step the value on the cytokine producer affects the value of the corresponding cytokine in the tissue compartment. Since the values of cytokines can also be read by other cells, cells equipped with both cytokine receptors and producers are capable of signal processing and can form complex signalling networks. Response producers allow cells to send messages to libtissue response clients and so actively affect the systems they are monitoring. The semantics of the message and its actual effects are determined by user-supplied callbacks. In this paper, only a simple response producer which logs a message is considered. Active responses in combination with libtissue response clients are also possible. If the action of response producers is linked to cytokine and VR receptors in the cell cycle callback then cells can be made to respond to antigen in a selective way.

IV. AN EXAMPLE PROBLEM
The architecture described in the previous section allows AIS algorithms to be implemented and experimentally evaluated fairly easily, and an example algorithm will shortly be given. First, this section addresses how libtissue can be used to test algorithms on realistic data derived from real-world problems. A brief review of a real-world intrusion detection problem is now presented, followed by a short description of the datasets gathered for this problem.

Fundamentally, anomaly detection in intrusion detection rests on the idea of a normal profile of behaviour, deviations from which are considered as attacks [7]. It is attractive in that it allows novel attacks to be detected so long as one can determine to a sufficient degree of accuracy what is normal. Errors occur when instances of normal behaviour are seen as attacks, the false positive rate, or when attacks are seen as normal behaviour, the false negative rate. Reducing the false positive rate of anomaly detection systems is currently a key area of research in intrusion detection. Process anomaly detection is a specific example of one such anomaly detection problem. Several process anomaly detection systems have been built on the idea of using syscalls to monitor the behaviour of processes. Research such as [7] and [8] has shown that this avenue is promising, especially when combined with other sources of data such as context signals. Systems such as systrace [4] have also been implemented which allow process behaviour to be controlled through syscall policies.

In order to gather data for the process anomaly detection problem a small experimental network with three hosts was set up. Pairs of strace and process monitor logs were collected on the instrumented target machine while rpc.statd was utilised in a number of different scenarios. These logs were then parsed to form a single tcreplay log file for each of the scenarios. An antigen entry in the tcreplay log was created for every syscall recorded in the strace log.
A signal entry was created for each recording of CPU usage in the process monitor log. While the strace log actually contains much more information, the use of just the syscall number is more than sufficient for testing the example algorithm described in the next section. It would be expected that a more complex algorithm would require additional complexity in both the antigen and range of signals it is provided with, such as the addition of information about syscall arguments, sequences of syscalls, or instruction pointer address. A larger number of datasets would also be necessary to statistically validate an algorithm. The monitored scenarios are divided into three groups based on whether the type of interaction with the rpc.statd server is a successful attack, a failed attack, or normal usage.

V. AN EXAMPLE ALGORITHM
This section describes an example AIS algorithm called twocell implemented using libtissue. The algorithm is primarily intended to evaluate the initial libtissue implementation, and also as an explanatory aid to help the reader understand how the fairly complex system described in Section III is used to actually implement an algorithm. For this reason the functions and interactions of the cells in the example are kept fairly simple. This simplicity will of course limit the algorithm's overall performance on the problem when compared to existing solutions. On the other hand it allows for the behaviour of the algorithm to be traced at an in-depth level, the results of which are presented in Section VI below. This also makes the algorithm a useful tool for testing and evaluation of the libtissue implementation itself. Other papers such as [9] focus on more complex algorithms developed with libtissue.

A. The twocell algorithm

[Figure 2 labels: cell type 1 (cytokine receptor, antigen receptor, antigen producer); cell type 2 (cell receptor, VR receptor, response producer)]
Fig. 2. The two different cell types implemented in twocell.

The cells in twocell, shown in Figure 2, are of two types, labelled Type 1 and Type 2, and each type has different receptor and producer repertoires, as well as different cell cycle callbacks. Type 1 cells are designed to emulate two key characteristics of biological APCs: antigen and signal processing. In order to process antigen, each Type 1 cell is equipped with a number of antigen receptors and producers. A cytokine receptor allows Type 1 cells to respond to the value of a signal in the tissue compartment. Type 2 cells emulate three of the characteristics of biological T cells: cellular binding, antigen matching, and antigen response. Each Type 2 cell has a number of cell receptors specific for Type 1 cells, VR receptors to match antigen, and a response producer which is triggered when antigen is matched. Type 2 cells also maintain one internal cytokine, an integer which is incremented every time a match between an antigen producer and VR receptor occurs. If the value of this cytokine is still zero, that is no match has occurred, after a certain number of cycles, set by the cell lifespan parameter, then the values of all of the VR receptor locks on the cell are randomised. Settings for the various parameters are given in Table I.
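The Type 2 cell behaviour just described can be sketched as follows. The sketch is in Python and is not the twocell source: the class name Type2Cell, the syscall_range argument and the resetting of the cell's age after randomisation are assumptions made for illustration. Matching is exact equality between a VR lock and a presented syscall number, and the presented antigen are taken to be empty when the cell is not bound to a Type 1 cell.

```python
import random


class Type2Cell:
    """Illustrative Type 2 cell: VR locks, a match counter and a lifespan."""

    def __init__(self, num_vr_receptors, cell_lifespan, syscall_range, rng):
        self.rng = rng
        self.syscall_range = syscall_range      # assumed upper bound on syscall numbers
        self.cell_lifespan = cell_lifespan
        self.locks = self._random_locks(num_vr_receptors)
        self.matches = 0                        # the internal cytokine
        self.age = 0                            # cycles since last randomisation

    def _random_locks(self, n):
        return [self.rng.randrange(self.syscall_range) for _ in range(n)]

    def cycle(self, presented):
        """Run one cell cycle and return the syscalls responded to."""
        responses = [lock for lock in self.locks if lock in presented]
        self.matches += len(responses)
        self.age += 1
        # A cell that has never matched re-randomises its locks once its
        # lifespan has elapsed; a cell that has matched keeps them.
        if self.matches == 0 and self.age >= self.cell_lifespan:
            self.locks = self._random_locks(len(self.locks))
            self.age = 0
        return responses


cell = Type2Cell(num_vr_receptors=1, cell_lifespan=3,
                 syscall_range=400, rng=random.Random(1))
cell.locks = [5]                                # force a lock for the demonstration
print(cell.cycle([]))                           # unbound: no response
print(cell.cycle([5, 6]))                       # bound, syscall 5 presented: response
```

Once a cell has matched, its locks are never randomised again, so the repertoire gradually accumulates receptors for syscalls that actually occur in the input.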
TABLE I
THE libtissue PARAMETER SETTINGS USED FOR twocell.
max antigen, max cytokines, max cells, cell update rate (µsecs) 100000, antigen multiplier, num cells, num antigen, num antigen receptors, num antigen producers, antigen producer action time, num cells, cell lifespan, num cell receptors, num vr receptors, num response producers, probe rate (µsecs) 1000000

A tissue compartment is populated with a number of Type 1 and 2 cells. Antigen and signals in the compartment are set by libtissue clients based on the syscalls a process is making and its CPU usage. Type 1 and 2 cells have different cell cycle callbacks. Type 1 cells ingest antigen through their antigen receptors and present it on antigen producers. The period for which the antigen is presented is determined by a signal read by a cytokine receptor on these cells, and so can be made dependent upon CPU usage. Type 2 cells attempt to bind with Type 1 cells via their cell receptors. If bound, VR receptors on these cells interact with antigen producers on the bound Type 1 cell. If an exact match between a VR receptor lock and antigen producer key occurs, the response producer on Type 2 cells produces a response, in this case a log entry containing the value of the matched receptor.

VI. RESULTS
One of the goals of libtissue is to allow algorithms to be experimentally evaluated and tested. The aim of this section is to highlight through a handful of simple experiments the methodology employed when attempting to understand the dynamics of algorithms implemented with libtissue and when testing them on a real-world problem. twocell is used for this purpose and its behaviour is examined when applied to six datasets. The first experiment looks at a number of twocell runs, while the second takes one run and examines it more closely. The third evaluates the performance of a syscall policy generated by twocell. During these experiments, in order to more clearly understand the dynamics of twocell, the cytokine receptor on Type 1 cells is disabled, thus making twocell unresponsive to the CPU usage external signal. The final experiment returns to the question of signals and compares the effect the addition of the signal has on the dynamics of twocell. The parameters given in Table I were used for all experiments, which were carried out on a 2 GHz AMD64 Turion laptop running Linux. Runs used on average around 1%, and never more than 3%, of the available CPU resources.
TABLE II
THE NAIVE SYSCALL POLICY AND THE AVERAGE twocell POLICY GENERATED FROM THE normal1 AND normal2 DATASETS.

syscall             freq   mean     sd    cv
chdir(12)              2   0.07   0.26   371
execve(11)             2   0.07   0.26   371
personality(136)       2   0.07   0.34   485
setsid(66)             2   0.07   0.34   485
fork(2)                2   0.10   0.37   370
write(4)               2   0.10   0.37   370
send(309)              2   0.15   0.56   373
time(13)               2   0.15   0.40   266
fstat64(197)           2   0.17   0.52   305
lseek(19)              2   0.17   0.42   247
fsync(118)             2   0.25   0.80   365
getrlimit(191)         2   0.28   0.67   320
listen(304)            2   0.28   0.63   239
select(142)            3   0.57   1.48   225
gettimeofday(78)       4   0.50   0.85   276
getsockname(306)       4   0.53   1.47   170
exit(1)                4   0.55   1.38   277
uname(122)             4   0.75   1.91   250
stat(106)              4   0.80   2.58   259
connect(303)           5   1.60   2.48   254
getdents(141)          8   0.20   0.73   322
mprotect(125)          8   0.47   1.30   185
poll(168)              8   0.90   1.67   224
sendto(311)            9   0.95   2.13   225
recvfrom(312)          9   2.45   3.68   233
rt_sigaction(174)     10   0.97   2.19   155
getpid(20)            10   1.60   2.28   142
fcntl(55)             12   1.18   2.76   268
bind(302)             12   1.68   4.51   200
munmap(91)            15   1.88   3.77   225
brk(45)               16   2.25   3.78   168
fstat(108)            23   2.33   4.45   229
ioctl(54)             24   2.73   4.67   190
socket(301)           25   3.10   4.97   150
old_mmap(90)          27   1.90   4.29   171
read(3)               27   2.25   5.17   160
open(5)               30   5.95   7.75   130
close(6)             557  19.43  27.03   139

TABLE III
THE SYSCALL POLICY GENERATED BY twocell FOR THE normal2 DATASET AND THE FREQUENCY OF RESPONSE FOR EACH SYSCALL.

syscall            frequency
gettimeofday(78)           1
listen(304)                1
send(309)                  1
select(142)                2
poll(168)                  3
recvfrom(312)              8
fcntl(55)                  9
fstat(108)                 9
open(5)                   22
close(6)                  34

In experiments it is important to have a baseline with which to compare algorithmic performance. In terms of syscall policies such a baseline can be generated and is here termed a naive policy. A naive syscall policy is generated for a process, such as rpc.statd, by recording the syscalls it makes under normal usage, as in the normal1 and normal2 datasets. A permit policy statement is then created for all syscalls seen. This baseline is not too unrealistic when compared to how current systems such as systrace automatically generate a policy. The first column of Table II shows the permitted syscalls (syscall number given in brackets) in such a naive policy generated from the normal1 and normal2 datasets. The frequency with which each syscall was observed, combined over the two datasets, is given in the second column, as this will be useful for further analysis.

Similarly to the naive policy, one way in which twocell can be used is to generate a syscall policy by running it with normal usage data during a training phase. During the run, responses made by Type 2 cells are recorded. At the end of each run, a syscall policy is created by allowing only those syscalls responded to, and denying all others. Since interactions in libtissue are stochastic, looking at the average results over a number of runs helps to understand the behaviour of implemented algorithms. A script was written to start the twocell server and then after 10 seconds start the tcreplay client and replay a dataset in realtime. twocell was allowed to continue running for a further minute after replay had finished. This process was repeated 20 times for both the normal1 and normal2 datasets, yielding 40 individual syscall policies. A single average twocell policy was then generated by allowing all syscalls which were permitted in any of the 40 individual policies. It was found that all of the 38 syscalls that were permitted in the naive policy were also permitted in the average policy. The mean frequency with which the syscall appeared in a policy is given in the third column of Table II.
As expected, there appears to be a correlation between the frequency with which a syscall occurs and the likelihood of it being in a policy generated by twocell. Standard deviations, given in the fourth column of Table II, at first appear to show an increasing amount of noise for high-frequency syscalls. However, examination of the coefficient of variation for each syscall, given in the last column of Table II, shows that there is in fact more variation in the frequencies of response to the lower-frequency syscalls.

The last experiment showed that the twocell algorithm has the property of responding in a selective way to input data based on the frequency at which an input data item occurs. In order to examine more closely how twocell responds, a single run of the twocell algorithm was observed. Following the same general procedure as the previous experiment, twocell was run once with the normal2 dataset. The resulting policy is shown in Table III, along with the frequencies with which the permitted syscalls were responded to. During the run, the time at which a Type 2 cell produced a response to a particular syscall was also recorded, and the rate at which these responses occur is plotted in Figure 3. The rate of incoming syscalls is also plotted for comparison. This figure clearly shows a correlation between the rate of incoming syscalls and the rate of responses produced by Type 2 cells. Cells initially do not produce any response until syscalls occur, and then produce a burst of responses for a relatively short period before settling down to an unresponsive state once again. This is to be expected, as antigen enter and are passed through twocell until their eventual destruction after being presented on Type 1 cell antigen producers.

Fig. 3. The rate of incoming antigen and corresponding cell response rates for the normal2 dataset. (Axes: seconds; rate, log scale.)
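The comparison between standard deviation and coefficient of variation made above for Table II can be illustrated with a short Python sketch. The coefficient of variation is the standard deviation divided by the mean, so it measures spread relative to the size of the values. The sample values below are invented for illustration and are not taken from Table II, and the use of the sample standard deviation is an assumption, since the paper does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    m = mean(samples)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return stdev(samples) / m


# Hypothetical per-run response counts for two syscalls.
high_frequency = [30, 34, 38]  # frequently occurring syscall
low_frequency = [0, 1, 2]      # rarely occurring syscall

# The absolute spread is larger for the high-frequency syscall ...
spread_high = stdev(high_frequency)
spread_low = stdev(low_frequency)

# ... but relative to the mean, the low-frequency syscall varies more.
cv_high = coefficient_of_variation(high_frequency)
cv_low = coefficient_of_variation(low_frequency)
```

With these toy values the high-frequency syscall has the larger standard deviation but the smaller coefficient of variation, which is the same pattern the text reports for the measured data.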
For the same run, the individual receptors expressed by Type 2 cells can also be examined. Figure 4 shows the repertoire of VR receptors expressed by all 50 Type 2 cells during the run. A libtissue probe periodically recorded the syscall values expressed by the VR receptors on all of the Type 2 cells. A point is plotted in Figure 4 if the syscall was being expressed during that period. Points for the 10 syscalls which twocell responded to (see Table III) are highlighted. As expected, many bursts of around 10 seconds of expression of VR receptors specific for a given syscall are seen. This is due to the limited lifespan of unmatched Type 2 cells, set by the cell lifespan parameter, after which the cell’s VR receptor is randomised. Once a VR receptor matches, and a response and permit policy is therefore produced for that syscall, the cell stops randomising its receptors. This can be observed from the continuous horizontal lines in Figure 4 for the 10 highlighted syscalls.

Fig. 4. The VR receptor repertoire expressed by Type 2 cells for the normal2 dataset. Highlighted syscalls are the ones responded to. (Axes: seconds; syscall number.)

An example is now given of how the classification accuracy and error of a libtissue algorithm can be evaluated. In terms of syscall policies, a particular policy can be considered successful in relation to the number of normal syscalls it permits versus the number of attack syscalls it denies. The naive policy and average twocell policy generated from datasets normal1 and normal2 in the first experiment above were evaluated in such a way. The number of syscalls both policies permitted and denied when applied to the four datasets in the attack and failed groups was recorded.
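The bookkeeping behind this kind of evaluation can be sketched as follows. This is a minimal illustration rather than the evaluation code used for the paper: it assumes a dataset is a list of (syscall, is_attack) pairs and a policy is a set of permitted syscall names, and the function name and toy data are hypothetical.

```python
def evaluate_policy(policy, dataset):
    """Summarise how a permit policy treats a labelled dataset.

    policy: set of permitted syscall names; anything else is denied.
    dataset: list of (syscall, is_attack) pairs (hypothetical format).
    Returns percentages of normal, attack, permitted and denied syscalls.
    """
    total = len(dataset)
    if total == 0:
        raise ValueError("dataset is empty")
    attack = sum(1 for _, is_attack in dataset if is_attack)
    permitted = sum(1 for syscall, _ in dataset if syscall in policy)

    def pct(n):
        return 100.0 * n / total

    return {
        "normal": pct(total - attack),
        "attack": pct(attack),
        "permit": pct(permitted),
        "deny": pct(total - permitted),
    }


# Toy data, invented for illustration only.
policy = {"open", "close"}
dataset = [
    ("open", False),
    ("close", False),
    ("execve", True),
    ("open", True),
]
summary = evaluate_policy(policy, dataset)
```

Note that a policy keyed only on the syscall name cannot distinguish an attack-related open from a normal one, so in this toy example the attack-labelled open is still permitted.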
For each dataset, Table IV shows the percentages of attack and normal syscalls in the dataset, together with the percentage of syscalls permitted by the naive and twocell policies. The results show that the naive policy tended to permit the vast majority of syscalls, whether attack related or not. The twocell generated policy behaved much more selectively, denying a slightly larger proportion of syscalls in the success1 and success2 datasets than it permitted. For the failure1 and failure2 datasets the converse was true.

TABLE IV
COMPARISON OF THE PERFORMANCE OF A NAIVE POLICY AND A twocell POLICY GENERATED FROM THE normal2 DATASET.

dataset           success1  success2  failure1  failure2
normal syscalls   23%       23%       81%       87%
attack syscalls   76%       76%       18%       12%
naive permit      90%       90%       99%       99%
naive deny        9%        9%        0%        0%
twocell permit    47%       47%       69%       68%
twocell deny      52%       52%       30%       31%

The previous experiments have all used the twocell algorithm with the cytokine receptors of Type 1 cells disabled. This was necessary in order to gain an initial understanding of the dynamics of twocell. This final experiment now examines how the addition of a context signal changes the dynamics of the algorithm. When enabled, the cytokine receptor on a Type 1 cell controls the action time parameter of antigen producers on these cells as follows. The action time parameter is initialised to a value of 100. If there is no change in the signal, CPU usage in this case, then the action time stays the same. If CPU usage decreases, the action time is reduced by 50%, and if it increases, the action time is reset to 100. twocell with its cytokine receptor enabled was run 20 times on the success2 dataset and the responses it produced were recorded. For a fair comparison, the mean action time observed on antigen producers over all of the runs, 28.57 in this case, was calculated, and the twocell algorithm without signals was run 20 times on the same dataset with the action time of its antigen producers set to 29. Figure 5 shows B-spline curves fitted to the mean response rates of twocell with and without a signal over the 20 runs. The results show that the response time of twocell with a signal is much more tightly controlled, with responses starting and dropping off more rapidly and lasting for a shorter duration in total. This is to be expected in light of the incoming data, and from the action of the cytokine receptor, which causes sudden rises and quick decreases in the action time of the antigen producers on Type 1 cells based on the rate of change of the external signal.

VII. CONCLUSIONS
The aim of this paper has been to describe the architecture of the libtissue implementation and how it is used to implement and evaluate algorithms on real-world problems. After briefly laying down the biological and conceptual background, the libtissue implementation was described in detail. In order to help understand how libtissue is actually used, its application to a real-world intrusion detection problem was presented. An example algorithm implemented with libtissue was then introduced, and aspects of its dynamics evaluated and discussed. The paper now concludes with a brief summary and discussion of future work.

Fig. 5. The mean response rates of the twocell algorithm with and without a signal for 20 runs on the success2 dataset. (Axes: seconds; response rate.)

While simplified, the examples presented above validate the libtissue implementation in several ways. They show that it meets the goals it set out to achieve in terms of implementation, evaluation and application of AIS algorithms. More generally, they show the feasibility of using AISs implemented as multi-agent systems to address real-world problems. Additionally, it is the authors’ experience that simple algorithms such as twocell are a necessary step in developing more complex algorithms. Such algorithms are being developed by the authors and other researchers using libtissue, and future papers will report the results of this research. The source code of libtissue is distributed under a GPL licence and is available, along with the datasets, clients and example algorithm used in this paper, from the first author’s website.

ACKNOWLEDGMENTS
This research is supported by the EPSRC (GR/S47809/01).

REFERENCES
[1] P. Bentley, J. Greensmith, and S. Ujjin, “Two ways to grow tissue for artificial immune systems,” in . Banff, Canada: LNCS 3627, 2005, pp. 139–152.
[2] J. Twycross and U. Aickelin, “Towards a conceptual framework for innate immunity,” in . Banff, Canada: LNCS 3627, 2005, pp. 112–125.
[3] R. N. Germain, “An innately interesting decade of research in immunology,” Nature Medicine, vol. 10, no. 12, pp. 1307–1320, 2004.
[4] N. Provos, “Improving host security with system call policies,” in Proc. of the 12th USENIX Security Symposium strace ∼ wichert/strace/.
[7] C. Kruegel, D. Mutz, F. Valeur, and G. Vigna, “On the detection of anomalous system call arguments,” in Proc. of the 8th European Symposium on Research in Computer Security (ESORICS ’03), Gjovik, Norway, October 2003, pp. 326–343.
[8] D. Gao, M. K. Reiter, and D. Song, “On gray-box program tracking for anomaly detection,” in