From Network Traces to System Responses: Opaquely Emulating Software Services
Miao Du*, Steve Versteeg†, Jean-Guy Schneider*, John Grundy*, Jun Han*
*School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, Victoria 3122, Australia
Email: {miaodu, jschneider, jgrundy, jhan}@swin.edu.au
†CA Technologies, Melbourne, Victoria 3004, Australia
Email: [email protected]
Abstract—Enterprise software systems make complex interactions with other services in their environment. Developing and testing for production-like conditions is therefore a challenging task. Prior approaches include emulations of the dependency services using either explicit modelling or record-and-replay approaches. Models require deep knowledge of the target services, while record-and-replay is limited in accuracy. We present a new technique that improves the accuracy of record-and-replay approaches without requiring prior knowledge of the services. The approach uses multiple sequence alignment to derive message prototypes from recorded system interactions and a scheme to match incoming request messages against message prototypes to generate response messages. We introduce a modified Needleman-Wunsch algorithm for distance calculation during message matching, wildcards in message prototypes for high-variability sections, and entropy-based weightings in distance calculations for increased accuracy. Combined, our new approach has shown greater than 99% accuracy for four evaluated enterprise system messaging protocols.
I. INTRODUCTION
Software systems are becoming ever more interconnected. Service emulation supports developing and testing a software system independently of the other systems on which it depends [1]. In a typical deployment scenario, an enterprise system might interact with many other systems, such as a mainframe, directory servers, databases and other types of software services. In order to check whether the enterprise system will function correctly in terms of its interactions with these services, it is necessary to test it in as realistic an environment as possible, prior to the actual deployment. Getting access to the actual production environment for testing is not possible due to the risk of disruption. Large organisations often have a test environment, which is a close replication of their production environment, but this is very expensive. Furthermore, the test environment is in high demand, so software developers will have only limited access to it. Enabling developers to have continuous access to production-like conditions to test their applications is an important part of Development-Operations (known as 'DevOps') [2][3].

One popular approach that developers use to test their application's dependence on other systems is to install the other systems on virtual machines (such as VMware) [4]. However, virtual machines are time-consuming to configure and maintain. Furthermore, the configuration of the systems running on the virtual machine is likely to differ from the production environment. An alternative that is gaining traction is service emulation, where models of services are emulated, sometimes into the many thousands of service instances, to provide more realistic scale and less complicated configuration [5]. However, existing approaches to service emulation rely on system experts explicitly modelling the target services, and this requires very detailed knowledge of message protocols and structure.
This is often infeasible if the required knowledge is unavailable for the wide range of services, especially legacy services, in a real deployment environment [6], and is besides very time-consuming and error-prone.

Our aim is to develop an automated approach to service emulation which uses no explicit knowledge of the services, their message protocols and structures, and yet can simulate, to a high degree of accuracy and scale, a realistic enterprise system deployment environment. To achieve this we need an automated, accurate, efficient and robust method for service emulation that derives service responses from collected actual network traces between an enterprise system and a service, which is the focus of this paper.

Our previous work [7] relied on searching the collected message traces to automatically generate service responses. Unfortunately, this was limited to text-based protocols and became inefficient for large message traces. The method we employed was sensitive to mismatching by payload versus operation type (i.e., it was not robust). Our subsequent work [8] used clustering to improve efficiency but decreased accuracy. The aim of the research described in this paper is to achieve both higher accuracy and efficiency through the use of descriptive and robust message prototypes which can be applied to both binary and text protocols. Our approach uses:
• a multiple sequence alignment algorithm to derive message prototypes,
• wildcards in message prototypes for the sections with high variability,
• a modified Needleman-Wunsch algorithm for message distance calculations, and
• entropy-based weightings in distance calculations for increased accuracy.

We present a set of experiments with four enterprise system messaging protocols, including binary and textual LDAP, a binary mainframe protocol, a textual JSON (Twitter) and SOAP services. These experiments show overall a greater than 99% accuracy in the generated response messages for the four protocols tested.
Additionally, they show efficient performance in generating the emulated service responses, enabling scaling within an emulated deployment environment.

II. MOTIVATION AND RELATED WORK
Consider an example enterprise software system which is to be integrated with other systems in its production environment as part of its operation. The other systems (called services) include a legacy mainframe program, a directory server and a web service. The behaviour of the example enterprise software system depends on the responses it receives from these dependency services. The directory server uses a proprietary protocol which is poorly documented. It has operation types add, search, modify and delete, each with various payloads and response messages. The operation type is encoded in a single character. A small sample set of request and response messages (a transaction library) for the directory service is shown in Table I. Such message traces are captured by either network monitoring tools, such as Wireshark [9], or proxies. In the following sections we use this sample transaction library to illustrate how our method works. This transaction library is from a fictional protocol that has some similarities to the widely used LDAP protocol [10], but is simplified to make our running example easier to follow.

The enterprise system under test needs to interact with this service to search for user identities, register new identities, de-register identities and so on. Other systems in the environment share the same service. For complex deployment environments we want to extensively test the enterprise system as to its protocol conformance, robustness and scaling.

The most common approach to providing a testing environment for enterprise systems using such a messaging protocol is to use virtual machines [11]. Implementations of the services are deployed on virtual machines and communicated with by the system under test. Major challenges with this approach include configuration complexity [12] and the need to maintain instances of each and every service type in multiple configurations [13].
Recently, cloud-based testing environments have emerged to mitigate some of these issues [14].

Emulated testing environments for enterprise systems, relying on service models, are another approach. When sent messages by the enterprise system under test, these respond with approximations of "real" service response messages [15]. Kaluta [16] is proposed to provision emulated testing environments. Challenges with these approaches include developing the models, lack of precision in the models, especially for complex protocols, and ensuring robustness of the models under diverse loading conditions [17]. To assist in developing reusable service models, approaches either reverse engineer message structures [18][19][20], or discover [21][22] processes for building behavioural models [23]. While these allow engineers to develop more precise models, none of them can automate the interaction between the enterprise system under test and the emulated dependency services.

Recording and replaying message traces is an alternative approach. This involves recording request messages sent by the enterprise system under test to real services and response messages from these services, and then using these message traces to 'mimic' the real service response messages in the emulation environment [24]. Some approaches combine record-and-replay with reverse-engineered service models. CA Service Virtualization [25] is a commercial software tool which can emulate the behaviour of services. The tool assumes knowledge of protocol message structures to model services and mimic interactions automatically.

III. APPROACH
Fig. 1: System overview

Our goal is to produce an emulation environment for enterprise system testing that uses message trace recordings collected a priori to produce a response on behalf of a service when invoked by a system-under-test at runtime.

The general approach is to cluster the trace recordings into groups of similar messages and then formulate a single representation for the request messages of each cluster. This accelerates runtime performance by enabling incoming requests from the system under test to be compared only to the cluster representations, rather than the entire transaction library. Previous work [8] selected the cluster centroid request as the representative; the centroid is the transaction with the minimised total distance from the other transactions in the cluster. However, accuracy decreased, as the information from the other requests in the cluster was discarded. Our current aim is to improve accuracy, but preserve efficiency, by generating prototypes which capture the common features of the range of requests in each cluster.

TABLE I: Directory Service Interaction Library Example
Index  Request                                        Response
1      {id:1,op:S,sn:Du}                              {id:1,op:SearchRsp,result:Ok,gn:Miao,sn:Du,mobile:5362634}
13     {id:13,op:S,sn:Versteeg}                       {id:13,op:SearchRsp,result:Ok,gn:Steve,sn:Versteeg,mobile:9374723}
24     {id:24,op:A,sn:Schneider,mobile:123456}        {id:24,op:AddRsp,result:Ok}
275    {id:275,op:S,sn:Han}                           {id:275,op:SearchRsp,result:Ok,gn:Jun,sn:Han,mobile:33333333}
490    {id:490,op:S,sn:Grundy}                        {id:490,op:SearchRsp,result:Ok,gn:John,sn:Grundy,mobile:44444444}
2273   {id:2273,op:S,sn:Schneider}                    {id:2273,op:SearchRsp,result:Ok,sn:Schneider,mobile:123456}
2487   {id:2487,op:A,sn:Will}                         {id:2487,op:AddRsp,result:Ok}
3106   {id:3106,op:A,sn:Hine,gn:Cam,Postcode:33589}   {id:3106,op:AddRsp,result:Ok}

The approach for generating and using our prototypes (depicted in Figure 1) has the following steps:
1) We collect message traces of communications between a client and the real target service and store them in a transaction library, with a transaction being a request-response pair.
2) We then cluster the transaction library, with the goal of grouping transactions by operation type. We do not consider the state of the service under which the requests are issued. (This may give lower accuracy but is still useful in many cases, as discussed in Section V.)
3) We then derive a request consensus prototype (a message pattern) for each operation type cluster by:
   a) aligning all request messages of each cluster, inserting gap characters to reveal their common features for each operation type;
   b) extracting a consensus character sequence from the aligned request messages (i.e., the request message prototype) by selecting or deleting a character at each character position;
   c) calculating positional weightings to prioritise different sections of each request consensus prototype.
4) At runtime, for an incoming request message, we use a matching distance calculation technique to select the nearest matching request consensus prototype.
5) We then perform dynamic substitutions on a specifically chosen response message from the identified operation type cluster to generate a final response message to be sent back to the enterprise system under test.
Steps 1 to 3 are performed offline to prepare request consensus prototypes. Steps 4 and 5 are performed at runtime to generate live responses to the system under test's messages.
A. Needleman-Wunsch
The Needleman-Wunsch algorithm [26] is used at different steps during the offline and runtime processing. It is a dynamic programming algorithm which finds the globally optimal alignment for two sequences of symbols in O(MN) time, where M and N are the lengths of the sequences. Needleman-Wunsch uses a progressive scoring function S, which gives an incremental score for each pair of symbols in the alignment. A different score is given depending on whether the symbols are identical, different, or a gap has been inserted.

B. Collecting and Clustering Interaction Messages
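For reference, a minimal sketch of the scoring recurrence just described is given below. The specific score values (match = 1, mismatch = −1, gap = −1) are illustrative defaults, not values prescribed by the paper at this point:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally optimal alignment score of two symbol sequences in O(MN)
    time and space. match/mismatch/gap are the incremental scores that the
    scoring function S assigns to identical symbols, different symbols,
    and inserted gaps, respectively."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a prefix of a aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                score[i - 1][j] + gap,     # gap inserted into b
                score[i][j - 1] + gap)     # gap inserted into a
    return score[-1][-1]
```

An edit-distance-style dissimilarity can be derived from this score, which is how response similarity is measured below when grouping transactions.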
The first step is to record real message exchanges between a client (such as a previous version of the system under test) and the service we aim to emulate. A transaction is a request-response pair of the communications, where the service may respond with zero or more response messages to a request message from the client. We record requests and responses at the network level (using a tool such as Wireshark), recording the bytes in TCP packet payloads. This makes no assumptions about the message format of the service application.

Having recorded our transaction library, the next step is to group the transactions by operation type, but again without assuming any knowledge of the message format. To achieve this we use a distance function-based clustering technique. In our previous work [8] we considered multiple cluster distance functions and found that response similarity (as measured by edit distance, calculated using the Needleman-Wunsch algorithm [26]) was the most effective method for grouping transactions of the same operation type. We therefore use the same technique in this work, grouping transactions by the similarity of their response messages. The clustering algorithm used was VAT [27], one among many alternatives, as we found it to be effective in our previous work [8].

Applying our clustering technique to the example transaction library yields two clusters, as shown in Tables II and III, corresponding to search operations and add operations, respectively.
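The paper uses VAT for the actual clustering. Purely as an illustration of distance-based grouping by response similarity, a greedy threshold scheme might look like the sketch below; the 0.3 threshold is an assumed value, and plain Levenshtein distance stands in for the Needleman-Wunsch-derived distance:

```python
def edit_distance(a, b):
    """Levenshtein distance, a simple stand-in for the alignment-based distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[-1] + 1,                 # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def cluster_by_response(transactions, threshold=0.3):
    """Greedily group (request, response) pairs: a transaction joins the first
    cluster whose representative response is within the distance threshold."""
    clusters = []
    for tx in transactions:
        for cluster in clusters:
            rep = cluster[0][1]              # first member's response
            dist = edit_distance(tx[1], rep) / max(len(tx[1]), len(rep))
            if dist < threshold:
                cluster.append(tx)
                break
        else:
            clusters.append([tx])
    return clusters
```

On transactions drawn from the running example, search responses and add responses end up in separate clusters because their response bodies differ far more than any two search responses differ from each other.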
C. Aligning Request Messages
To formulate a representative cluster prototype, we first align the request messages in a cluster to determine the common features of these request messages. Then we extract the common features while accommodating variations. In aligning the request messages of a cluster, we adopt the multiple sequence alignment (MSA) technique, which originated in biology. MSA was first used to align three or more biological sequences to reveal their structural commonalities [28]. (For this reason the technique has also been widely used to reverse-engineer protocol message structures [29], [30].) Using MSA for revealing commonalities of interaction messages offers a number of advantages over other techniques, such as n-grams [31]. Firstly, it is able to effectively handle messaging protocols with single-byte operation fields (e.g. in the binary LDAP protocol) as well as multi-byte operation fields. Secondly, it does not require a predetermined matching sequence width (e.g., n in n-gram).

TABLE II: Cluster 1 (search operations)
Index  Request                       Response
1      {id:1,op:S,sn:Du}             {id:1,op:SearchRsp,result:Ok,gn:Miao,sn:Du,mobile:5362634}
13     {id:13,op:S,sn:Versteeg}      {id:13,op:SearchRsp,result:Ok,gn:Steve,sn:Versteeg,mobile:9374723}
275    {id:275,op:S,sn:Han}          {id:275,op:SearchRsp,result:Ok,gn:Jun,sn:Han,mobile:33333333}
490    {id:490,op:S,sn:Grundy}       {id:490,op:SearchRsp,result:Ok,gn:John,sn:Grundy,mobile:44444444}
2273   {id:2273,op:S,sn:Schneider}   {id:2273,op:SearchRsp,result:Ok,sn:Schneider,mobile:123456}

TABLE III: Cluster 2 (add operations)
Index  Request                                          Response
24     {id:24,op:A,sn:Schneider,mobile:123456}          {id:24,op:AddRsp,result:Ok}
2487   {id:2487,op:A,sn:Will}                           {id:2487,op:AddRsp,result:Ok}
1106   {id:1106,op:A,sn:Hine,gn:Cam,postalCode:33589}   {Id:1106,Msg:AddRsp,result:Ok}

In particular, we adopt ClustalW [32], a widely used heuristic technique for MSA. It is memory-efficient and has been shown to produce high-accuracy alignments in polynomial computation time for empirical datasets (in contrast to the original NP-complete MSA technique [33]). ClustalW is a progressive MSA algorithm, where pairwise sequence alignment results are iteratively integrated into the multiple sequence alignment result. An overview of the ClustalW algorithm is as follows:
1) All N(N − 1)/2 pairs of sequences are aligned to calculate their similarity ratios using the Needleman-Wunsch algorithm, where N is the number of sequences.
2) An N × N distance matrix is built from the distance calculations.
3) A guide tree is constructed from the distance matrix by applying a neighbour-joining clustering algorithm. (A guide tree is a tree data structure which organises the similarities between sequences.)
4) The guide tree is used to guide a progressive alignment of sequences from the leaves to the root of the tree.
Figure 2 shows the multiple sequence alignment results of applying the ClustalW algorithm to the example clusters from Tables II and III. The MSA results are known as profiles. Gaps which were inserted during the alignment process are denoted with the '⋆' symbol. Note that the common sequences for the requests in each cluster have now been aligned.
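ClustalW itself is involved; as a simplified stand-in, the sketch below builds a multiple alignment "star"-style: every sequence is aligned against the first with Needleman-Wunsch, and any gaps newly introduced into that centre sequence are propagated into all previously aligned rows. The guide-tree ordering of steps 3-4 is omitted, so this is an illustration of progressive alignment, not the authors' algorithm:

```python
GAP = "⋆"  # gap marker, as rendered in the paper's figures

def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch with traceback; returns the two gapped sequences."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def star_msa(seqs):
    """Progressively align each sequence to the first; gaps newly inserted
    into the centre sequence are propagated into every earlier row, so all
    rows of the resulting profile have equal length."""
    rows = [seqs[0]]
    for s in seqs[1:]:
        new_centre, gapped_s = nw_align(rows[0], s)
        old, oi = rows[0], 0
        grid = [list(r) for r in rows]
        for col, ch in enumerate(new_centre):
            if oi < len(old) and ch == old[oi]:
                oi += 1                 # column already present in the centre
            else:                       # a gap newly introduced at this column
                for r in grid:
                    r.insert(col, GAP)
        rows = ["".join(r) for r in grid] + [gapped_s]
    return rows
```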
D. Formulating the Request Consensus Prototype
Having derived the MSA profile for the request messages of each cluster, the next step is to extract the common features from the MSA profile into a single character sequence, which we call the request consensus prototype, to facilitate efficient runtime comparison with an incoming request message from the system under test.

From all the aligned request messages in a cluster, we derive a byte (or character) occurrence count table. Figure 3 graphically depicts byte frequencies at each position for the example alignment in Figure 2. Each column represents a position in the alignment result. The frequencies of the different bytes which occur at each position are displayed as a stacked bar graph.

Fig. 2: MSA results of the requests in Tables II and III
(a) Cluster 1 alignment:
{id:⋆⋆⋆,op:S,sn:⋆⋆⋆⋆⋆⋆⋆Du}
{id:⋆⋆⋆,op:S,sn:Versteeg}
{id:2273,op:S,sn:Schneider}
{id:275⋆,op:S,sn:⋆Han⋆⋆⋆⋆⋆}
{id:490⋆,op:S,sn:Grundy⋆⋆⋆}
(b) Cluster 2 alignment:
{id:24⋆⋆,op:A,sn:Schne⋆⋆⋆⋆⋆ider⋆⋆,mo⋆bil⋆⋆⋆e:123456}
{id:2487,op:A,sn:W⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆il⋆⋆⋆l⋆⋆⋆⋆⋆⋆⋆}
{id:3106,op:A,sn:Hi⋆ne,gn:Cameron,postalCode:3⋆}

Based on the byte occurrence table, we formulate the request consensus prototype by extending the concept of a consensus sequence [34] commonly used in summarising an MSA profile. A consensus sequence can be viewed as a sequence of consensus symbols, where the consensus symbol c_i is the most commonly occurring byte at position i.
In our extension, a request consensus prototype, p, is calculated by iterating over each byte position of the MSA profile to calculate a prototype symbol, p_i, at each position, according to equation 1:

  p_i = c_i  if q(c_i) ≥ f and c_i ≠ ⋆
  p_i = ⊥    if q(c_i) ≥ f and c_i = ⋆        (1)
  p_i = ?    otherwise

where q(c_i) denotes the relative frequency at position i of the consensus symbol c_i, f is the relative frequency threshold, '⋆' denotes a gap, '?' is the 'wildcard' symbol and '⊥' represents a truncation. After calculating the prototype symbol for each position, any truncation symbols are deleted from the request consensus prototype.

Introducing wildcards and truncations into the message prototypes allows us to distinguish between gaps and positions where there is no consensus. If the relative frequency q(c_i) is at or above the threshold f, then we insert the consensus symbol
into our prototype (unless the consensus symbol is a gap). If the relative frequency is below the threshold, then we insert a wildcard. If the consensus symbol is a gap and it is in the majority, then we leave that position empty (i.e., deleted). Wildcards allow us to encode high-variability sections of the message. In our experimentation, without truncations, consensus sequences became artificially long, as there tended to be many gaps. By truncating the gaps, the lengths of the prototypes become similar to the typical lengths of messages in the cluster. The consensus prototype for a cluster of request messages can thus differentiate stable positions from variant positions. Moreover, it can identify consensus symbols that can be utilised for matching.

Fig. 3: Character frequencies in the alignment result of search requests (stacked bar graph omitted).

Applying our request consensus prototype method (with a suitable frequency threshold f) to the example clusters in Tables II and III yields the following results.

Consensus prototype for the search request cluster:
{id:???,op:S,sn:???????}

Consensus prototype for the add request cluster:
{id:????,op:A,sn:??????????????l???????}

(Note that the add prototype contains an 'l' from coincidentally aligning the 'l's from 'mobile', 'Will' and 'postalCode'.)

E. Deriving Entropy-Based Positional Weightings
The final step of the offline analysis is to calculate weightings for each byte position in the consensus prototype to support the runtime distance matching process for request messages. In our service emulator, generating a response message of the correct operation type is more critical than the contents of the message payload. Thus we give a higher weighting to the sections of the message that likely relate to the operation type. To do this we make use of the observation that structure information (such as operation type) is more stable than payload information. We use entropy as a measurement of variability, and use it as the basis to calculate a weighting for each byte position of the consensus prototype.

Using the MSA profile for a cluster (from Section III-C), we calculate the entropy for each column using the Shannon index [35], as given by equation 2:

  E_i = − Σ_{j=1}^{R} q_ij log q_ij        (2)

where E_i is the Shannon index for the i-th column, q_ij is the relative frequency of the j-th character of the character set at column i, and R is the total number of characters in the character set.

Since we wish to give a high weighting to stable parts of the message, for each column we invert the entropy by applying a scaling function of the form given in equation 3:

  w_i = 1 / (1 + b·E_i)^c        (3)

where w_i is the weighting for the i-th column, E_i is the entropy of the i-th column, and b and c are positive constants. The higher the values of b and c, the more strongly high-entropy columns are deweighted. In our experiments we found the best results were obtained with b = 1 and c = 10. This allows structural information to strongly dominate in the matching process; payload similarity is only used as a 'tie-breaker'.

Columns that correspond to gaps removed from the consensus prototype are also dropped from the weightings array. Table IV gives an example weightings array for the search request consensus prototype.

F. Entropy Weighted Runtime Matching for Request Messages
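Runtime matching consumes the prototype and positional weights produced offline. A minimal sketch of those two offline computations (equations 1-3) is given below; the threshold f = 0.8 is an assumed illustrative value (the paper's threshold is not reproduced here), while b = 1 and c = 10 follow the paper. For brevity, this sketch keeps a weight for every column rather than dropping the truncated ones:

```python
import math

GAP = "⋆"  # gap symbol inserted by the alignment step

def consensus_prototype(profile, f=0.8):
    """Equation (1): per column of the MSA profile (a list of equal-length
    gapped strings), keep the consensus byte if its relative frequency reaches
    f, delete (truncate) majority-gap columns, else emit a '?' wildcard."""
    out = []
    for col in zip(*profile):
        counts = {}
        for ch in col:
            counts[ch] = counts.get(ch, 0) + 1
        consensus = max(counts, key=counts.get)
        if counts[consensus] / len(col) >= f:
            if consensus != GAP:
                out.append(consensus)      # stable position: keep the byte
            # else: truncation -- the majority-gap column is deleted outright
        else:
            out.append("?")                # no consensus: wildcard
    return "".join(out)

def positional_weights(profile, b=1.0, c=10.0):
    """Equations (2)-(3): Shannon index per column, inverted so that stable
    (low-entropy) columns dominate the runtime distance calculation."""
    weights = []
    for col in zip(*profile):
        n = len(col)
        entropy = -sum((col.count(ch) / n) * math.log(col.count(ch) / n)
                       for ch in set(col))
        weights.append(1.0 / (1.0 + b * entropy) ** c)
    return weights
```

With c = 10, a column whose bytes are all identical keeps weight 1.0, while a column of three distinct bytes is deweighted by roughly three orders of magnitude, which is how structure comes to dominate payload during matching.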
TABLE IV: Example weightings for the search consensus prototype, calculated from the MSA profile in Figure 3, using equation 3 constants b = 1, c = 10. (The numeric E and w rows are omitted here.)

At runtime, the formulated request consensus prototypes are used to match incoming requests from the system under test. We extend the Needleman-Wunsch algorithm to calculate the matching distance between an incoming request and the request consensus prototype for each operation type (or cluster). We modify the Needleman-Wunsch scoring function S (see equation 4) by giving a special score for alignments with wildcard characters and multiplying all scores by their corresponding positional entropy weights (from equation 3):

  S(p_i, r_j) = w_i · M   if p_i = r_j and p_i ≠ ?
  S(p_i, r_j) = w_i · D   if p_i ≠ r_j and p_i ≠ ?        (4)
  S(p_i, r_j) = w_i · X   if p_i = ?

where p_i is the i-th character in the consensus prototype, r_j is the j-th character in the incoming request, w_i is the weighting for the i-th column, M and D are constants denoting the Needleman-Wunsch identical score and difference penalty, respectively, and X is the wildcard matching constant. In our experiments we used the values M = 1 and X = 0, together with a negative difference penalty D.

Using the modified scoring equation 4, we apply Needleman-Wunsch to align the consensus prototype with an incoming request, calculating an absolute alignment score, s. The relative distance, denoted d_rel, is calculated from the absolute alignment score to normalise for consensus prototypes of different lengths, different entropy weights and different numbers of wildcards. The relative distance is in the range 0 to 1, inclusive, where 0 signifies the best possible match with the consensus and 1 represents the furthest possible distance. It is calculated according to equation 5:

  d_rel(p, r) = 1 − (s(p, r) − s_min(p)) / (s_max(p) − s_min(p))        (5)

where s_max is the maximum possible alignment score for the given consensus prototype and s_min is the minimum possible alignment score for the given consensus prototype. These are calculated in equations 6 and 7, respectively:

  s_max(p) = Σ_{i=1}^{|p|} S(p_i, p_i)        (6)

  s_min(p) = Σ_{i=1}^{|p|} S(p_i, ∅)        (7)

where ∅ is a special symbol, different to all of the characters in the consensus prototype.

The consensus prototype that gives the least distance to the incoming message is selected as the matching prototype, thereby identifying the matching transaction cluster.

As an example, suppose we receive an incoming add request with the byte sequence {id:37,op:A,sn:Durand}. Aligning the request against the search consensus prototype yields:

  request:   {id:3*7,op:A,sn:*Durand}
  prototype: {id:???,op:S,sn:???????}

Using equation 5, the weighted relative distance is calculated to be 0.0715. In comparison, the add consensus prototype produces the alignment below, with a relative distance of 0.068:

  request:   {id:37**,op:A,sn:********D*u*r*an*d***}
  prototype: {id:????,op:A,sn:?????????????l???????}

Consequently, the add prototype is the closest match, causing the add cluster to be selected.
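The matching step (equations 4-7) can be sketched as follows. The values D = −1 and a gap penalty of −1 are assumptions for illustration; the paper fixes only M = 1 and X = 0 here:

```python
def sym_score(p, w, r, M=1.0, D=-1.0, X=0.0):
    """Modified scoring function of equation (4) for one prototype position."""
    if p == "?":
        return w * X
    return w * (M if p == r else D)

def weighted_nw_score(prototype, weights, request, gap=-1.0):
    """Needleman-Wunsch global alignment score of an incoming request against
    a consensus prototype, using the entropy-weighted scoring above."""
    n, m = len(prototype), len(request)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sym_score(prototype[i - 1], weights[i - 1],
                                                request[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap)
    return score[n][m]

def relative_distance(prototype, weights, request):
    """Equations (5)-(7): normalise the alignment score into a [0, 1] distance,
    where 0 is the best possible match. '\\0' plays the role of the special
    symbol that differs from every prototype character."""
    s = weighted_nw_score(prototype, weights, request)
    s_max = sum(sym_score(p, w, p) for p, w in zip(prototype, weights))
    s_min = sum(sym_score(p, w, "\0") for p, w in zip(prototype, weights))
    return 1.0 - (s - s_min) / (s_max - s_min)
```

At runtime, the emulator would evaluate `relative_distance` against every cluster's prototype and select the cluster with the smallest result.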
G. Response Generation through Dynamic Substitution
The final step in our approach is to send a customised response for the incoming request by performing dynamic substitutions on a response message from the matched cluster. Here we use the symmetric field technique described in [7], where character sequences which occur in both the request and response messages of the chosen transaction are substituted with the corresponding characters from the live request in the generated response. We take the response from the centroid transaction [8] of the selected cluster and apply the symmetric field substitution.

In our example, the centroid transaction from the add cluster is given below. There is one symmetric field, the shared character sequence 'id:24,op:A':

  request:  {id:24,op:A,sn:Schneider,mobile:123456}
  response: {id:24,op:AddRsp,result:Ok}

After performing the symmetric field substitution, the final generated response is:

  {id:37,op:AddRsp,result:Ok}

H. Implementation
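Although the authors' implementation is not public, the symmetric-field substitution just illustrated can be sketched as below. This simplified version treats the longest character run shared by the recorded request and response as the symmetric field, and assumes the live request shares the recorded request's layout up to that field, which holds for the running example:

```python
def longest_common_substring(a, b):
    """Longest common substring via dynamic programming;
    returns (start offset in a, length)."""
    best = (0, 0)
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best[1]:
                    best = (i - dp[i][j], dp[i][j])
    return best

def substitute_symmetric_field(rec_req, rec_resp, live_req):
    """Replace the symmetric field in the recorded response with the bytes at
    the same offsets in the live request."""
    start, length = longest_common_substring(rec_req, rec_resp)
    field = rec_req[start:start + length]
    live_field = live_req[start:start + length]
    return rec_resp.replace(field, live_field, 1)
```

Applied to the centroid add transaction and the live request {id:37,op:A,sn:Durand}, this reproduces the generated response shown above.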
We implemented our proposed system in Java, following the steps described in the previous sections and using the architecture of Figure 1. Due to commercial agreements we are unable to release the source code of our implementation.

IV. EVALUATION
We wanted to assess three key characteristics of our approach: accuracy, efficiency and robustness, aiming to answer the following research questions:
1) RQ1 (Accuracy): Having request consensus prototypes formulated at the pre-processing stage, is our approach able to generate accurate, protocol-conformant responses? At runtime, can positional-weighting-based matching further improve the accuracy of our approach for generating such responses?
2) RQ2 (Efficiency): Is our technique efficient enough to generate timely responses, even for large transaction libraries?
3) RQ3 (Robustness): Given non-homogeneous clusters, which contain some messages of operation types different from the majority of messages, is our technique robust enough to generate accurate responses?
In order to answer these questions, we applied our technique to six message trace datasets of four case study protocols. We ran the technique on these datasets and assessed its accuracy, efficiency and robustness for each. We then compared these results to our previous techniques.
A. Case Study Protocols and Traces
TABLE V: Experiment message trace datasets (available at [36]). (The per-dataset columns, including Protocol, Binary/Text, Fields, Ops and the number of interactions, are omitted here.)

We applied our techniques to four real-world protocols: IMS [37] (a binary mainframe protocol), LDAP [10] (a binary directory service protocol), SOAP [38] (a textual protocol, with an Enterprise Resource Planning (ERP) system providing the services), and RESTful Twitter [39] (a JSON protocol for the Twitter social media service).

We have used one message trace dataset for each of these protocols. In addition, LDAP has two extra datasets: a dataset with a textual representation converted from the binary dataset (denoted by LDAP text (1)), and another textual dataset that was used in our prior work [8] (LDAP text (2)).

We chose these protocols because: (i) they are widely used in enterprise environments; (ii) some of them (LDAP text (2) and SOAP) were used in the evaluation of our prior work [7], [8]; (iii) they represent a good mix of text-based protocols (SOAP and RESTful Twitter) and binary protocols (IMS and LDAP); (iv) they use either fixed length, length encoding or delimiters to structure protocol messages; and (v) each of them includes a diverse number of operation types, as indicated by the Ops column. The number of request-response interactions for each test case is also shown in Table V.

Our message trace datasets are available for download from our website [36].
B. 10-fold Cross-Validation Approach and Evaluation Criteria

Cross-validation [40] is a popular model validation method for assessing how accurately a predictive model will perform in practice. For the purpose of our evaluation, we applied the commonly used 10-fold cross-validation approach [41] to all six case study datasets.

We randomly partitioned each of the original interaction datasets into 10 groups. Of these 10 groups, one group is used as the evaluation group for testing our approach, and the remaining 9 groups constitute the training set. This process is then repeated 10 times (the same as the number of groups), so that each of the 10 groups is used as the evaluation group exactly once.

When running each experiment with each trace dataset, we applied our approach to each request message in the evaluation group, referred to as the incoming request, to generate an emulated response. We recorded the time that our approach took to generate the response for each incoming request. This was used to evaluate the runtime efficiency of our Consensus+Weighting approach compared to other approaches.

Having generated a response for each incoming request, we utilised five criteria to determine its accuracy, thereby evaluating the ability of our approach to generate protocol-conformant responses. These five criteria are explained as follows, using the examples shown in Table VI. Consider the incoming request {id:37,op:A,sn:Durand} with the associated response {id:37,op:AddRsp,result:Ok} in the transaction library. The emulated response is considered to be:

1) identical if its character sequence is identical to the recorded (or expected) response (cf. Example (i) in Table VI);
2) consistent if it is of the expected operation type and has the critical fields in the payload replicated (cf. Example (ii) in Table VI, where id is identical but some of the other payload differs);
3) protocol conformant if its operation type corresponds to the expected response, but it differs in some payload information (cf. Example (iii) in Table VI, where both the id and result tags differ);
4) well-formed if it is structured correctly (that is, it corresponds to one of the valid response messages), but has the wrong operation type (cf. Example (iv) in Table VI, where the generated response is of a valid structure, but its operation type op:SearchRsp does not match the expected operation type op:AddRsp); and
5) malformed if it does not meet any of the above criteria (cf. Example (v) in Table VI, where the operation type op:AearchRsp in the generated response is invalid).

(Given a protocol message, length fields or delimiters are used to convert its structure into a sequence of bytes that can be transmitted over the wire. Specifically, a length field is a number of bytes that gives the length of another field, while a delimiter is a byte (or byte sequence) with a known value that indicates the end of a field.)

TABLE VI: Examples for accuracy criteria: (i) Identical, (ii) Consistent, (iii) Protocol conformant, (iv) Well-formed, (v) Malformed.

(i)   Expected:  {id:37,op:AddRsp,result:Ok}
      Generated: {id:37,op:AddRsp,result:Ok}
(ii)  Expected:  {id: ,op:AddRsp,result:Ok}
      Generated: {id: ,op:AddRsp,result:AlreadyExists}
(iii) Expected:  {id:37,op:AddRsp,result:Ok}
      Generated: {id: ,op:AddRsp,result:AlreadyExists}
(iv)  Expected:  {id:37,op:AddRsp,result:Ok}
      Generated: {id:15,op:SearchRsp,result:Ok,gn:Miao,sn:Du}
(v)   Expected:  {id:37,op:AddRsp,result:Ok}
      Generated: {id:15,op:AearchRsp,result:Ok,gn:Miao,sn:Du}

We further consider a generated response to be valid if it meets one of the first three criteria, that is, identical, consistent or protocol conformant.
Otherwise, a generated response is considered to be invalid.

C. Compared Techniques
To our knowledge there is no other work that generates application layer responses directly from trace messages. Our comparison is therefore with our prior work, including the Whole Library [7] and the Cluster Centroid [8] approaches.

Given an incoming request, the Whole Library approach searches the entire transaction library for its closest matching request to synthesize its response(s). This approach is effective in producing accurate responses: experimental results revealed that more than 90% of generated responses were valid, as defined by the criteria in Section IV-B. However, it is generally too slow for real-time use.

The Cluster Centroid approach reduces the number of searches to the number of transaction library clusters. It can therefore generate responses for real-time use, but with less accuracy. We use these two approaches as baselines to evaluate our new technique.
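All of these techniques rely on a Needleman-Wunsch-style alignment distance when matching an incoming request. The sketch below additionally treats prototype wildcards as zero-cost matches and weights mismatches per prototype position, mirroring the Consensus+Weighting idea; the concrete cost scheme, gap penalty and weight derivation here are simplifying assumptions of ours, not the paper's exact scoring.

```python
def weighted_nw_distance(prototype, request, weights, gap=1.0):
    """Minimum global alignment cost between a consensus prototype and a
    request (a modified Needleman-Wunsch, phrased as cost minimisation).

    '?' in the prototype matches any character at zero cost; a mismatch
    against prototype position i costs weights[i] (e.g. an entropy-derived
    weight, so stable structural positions are expensive to violate while
    variable payload positions are cheap)."""
    n, m = len(prototype), len(request)
    # dp[i][j] = min cost of aligning prototype[:i] with request[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = prototype[i - 1] == "?" or prototype[i - 1] == request[j - 1]
            sub = 0.0 if match else weights[i - 1]
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match / substitution
                           dp[i - 1][j] + gap,      # gap in request
                           dp[i][j - 1] + gap)      # gap in prototype
    return dp[n][m]
```

An incoming request is then matched to the prototype (and hence operation type) with the smallest distance, which is what makes the method tolerant of requests that differ slightly, or in length, from anything recorded.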
D. Evaluation Results

1) Accuracy (RQ1): The accuracy evaluation is conducted to assess the capability of our approach for generating accurate responses. As discussed in Section III, our approach has two important features aimed at enhancing response accuracy: the request consensus prototype, combined with entropy-weighted distance calculation at runtime. To measure the impact of these techniques we ran two separate sets of experiments, referred to as Consensus Only and Consensus+Weighting, respectively. In addition, for both sets of experiments we tested for the best pre-defined frequency threshold f, trying three different values.

Table VII summarises the evaluation results of the Consensus Only, Consensus+Weighting, and our prior work (i.e. Whole Library and Cluster Centroid) experiments for the six test datasets. The Accuracy Ratio column is calculated by dividing the number of valid generated responses by the total number of interactions tested. The last five columns give a more detailed breakdown of the different categories of valid and invalid responses generated.

TABLE VII: Evaluation Results of Applying Consensus Sequence Prototype. (Identical, Consistent and Protocol Conformant responses are valid; Well-formed and Malformed responses are invalid.)

Protocol        Method                      Accuracy  No.   Identical  Consistent  Conformant  Well-formed  Malformed
IMS (binary)    Whole Library               75.25%    800   400        202         0           198          0
                Cluster Centroid            97.88%          400        383         0           17           0
                Consensus Only (f=0.5)                      400        400         0           0            0
                Consensus Only (f=0.8)                      400        400         0           0            0
                Consensus Only (f=1)                        400        400         0           0            0
                Consensus+Weighting (f=0.5)                 400        400         0           0            0
                Consensus+Weighting (f=0.8)                 400        400         0           0            0
                Consensus+Weighting (f=1)                   400        400         0           0            0
LDAP (binary)   Whole Library               94.12%    2177  248        17          1784        36           92
                Cluster Centroid            91.59%          263        17          1714        183          0
                Consensus Only (f=0.5)                      268        14          1628        267          0
                Consensus Only (f=0.8)                      264        14          1565        334          0
                Consensus Only (f=1)                        259        14          1468        436          0
                Consensus+Weighting (f=0.5)                 278        18          1853        28           0
                Consensus+Weighting (f=0.8)                 278        18          1880        1            0
                Consensus+Weighting (f=1)                   267        16          1609        285          0
LDAP text (1)   Whole Library               100%      2177  1648       415         114         0            0
(text)          Cluster Centroid            100%            811        1325        41          0            0
                Consensus Only (f=0.5)                      808        192         0           0            0
                Consensus Only (f=0.8)                      808        192         0           0            0
                Consensus Only (f=1)                        808        192         0           0            0
                Consensus+Weighting (f=0.5)                 808        192         0           0            0
                Consensus+Weighting (f=0.8)                 808        192         0           0            0
                Consensus+Weighting (f=1)                   808        192         0           0            0
SOAP (text)     Whole Library               100%      1000  77         923         0           0            0
                Cluster Centroid            100%            98         902         0           0            0
                Consensus Only (f=0.5)                      96         904         0           0            0
                Consensus Only (f=0.8)                      96         904         0           0            0
                Consensus Only (f=1)                        96         904         0           0            0
                Consensus+Weighting (f=0.5)                 96         904         0           0            0
                Consensus+Weighting (f=0.8)                 96         904         0           0            0
                Consensus+Weighting (f=1)                   96         904         0           0            0
Twitter (REST)  Whole Library               …
(text)

Table VII shows that the combined
Consensus+Weighting approach achieves the highest accuracy overall for the datasets tested. The combined approach achieves 100% accuracy for four of the datasets, and 99.95% and 99.34% for the remaining two (LDAP binary and Twitter, respectively). Twitter is the only case where the Whole Library approach is marginally better.

With respect to the impact of the frequency threshold f, the results show that allowing some tolerance (i.e. f < 1) in the multiple sequence alignment can yield better results. (As illustrated in Equation 1, a pre-defined frequency threshold is required to calculate the consensus sequence prototype. In bioinformatics, a number of investigations have been undertaken to identify the best threshold [34]. In our experiments, we selected the three most popular values, that is, 0.5, 0.8 and 1.) For the LDAP (binary) dataset the thresholds f = 0.5 and f = 0.8 produced significantly higher accuracy than f = 1. For the other datasets the threshold had no impact on the accuracy. The general conclusion appears to be that the results are not very sensitive to the value of the threshold for most scenarios.

An interesting result is that the Consensus+Weighting approach has a higher accuracy than the Whole Library approach, even though the latter uses all the available data points from the trace library (for three datasets Consensus+Weighting is significantly more accurate, for two it has the same accuracy, and for one it is slightly lower). The reason for the higher accuracy is that Consensus+Weighting abstracts away the message payload information sections (using wildcards), and so is less susceptible to matching a request to the wrong operation type with the right payload information, whereas the Whole Library approach is susceptible to this type of error (note the well-formed but invalid responses for the Whole Library approach in Table VII).

The impact of the entropy weightings can only be observed for the LDAP binary dataset. For this test, the weightings significantly improve the accuracy results. For the other datasets, no impact from the weightings can be observed, as the consensus sequence prototype by itself (Consensus Only) already produces 99-100% accuracy.

2) Efficiency (RQ2): Table VIIIa compares the average response generation time of the Consensus+Weighting approach with the Whole Library and Cluster Centroid approaches. Tests were run on an Intel Xeon E5440 2.83GHz CPU with 24GB of main memory available. The times represent the average response generation time across all of the requests in the datasets. Table VIIIa also lists the average response times of the real services from which the original traces were recorded. In order to gain better insight into the runtime performance of our approach, we separately measured matching time and substitution time; the results are presented in Table VIIIb.

TABLE VIII: Approach Efficiency Evaluation Results

(a) Average Total Response Generation Time (ms)

Dataset        No.    Whole Library  Cluster Centroid  Consensus+Weighting  Real System
IMS            800    470.99         4.94              3.24                 518
LDAP           2177   835.91         2.77              2.88                 28
LDAP text (1)  2177   1434.70        5.69              7.30                 28
LDAP text (2)  1000   266.30         2.44              1.63                 28
SOAP           1000   380.24         2.97              3.35                 65
Twitter        1825   464.09         32.86             36.62                417

(b) Average Matching Time (M) and Average Substitution Time (S) (ms)

Dataset        No.    Whole Library       Cluster Centroid    Consensus+Weighting
                      M        S          M       S           M       S
IMS            800    460.78   10.2       3.67    1.27        2.68    0.56
LDAP           2177   828.95   6.96       2.38    0.39        2.60    0.28
LDAP text (1)  2177   1425.23  9.47       4.38    1.31        5.67    1.63
LDAP text (2)  1000   257.92   8.38       1.35    1.09        1.14    0.49
SOAP           1000   372.58   7.66       1.92    1.05        2.45    0.9
Twitter        1825   412.98   51.11      1.47    31.39       1.67    34.95

The results show that the Consensus+Weighting approach is very efficient at generating responses, indeed much faster than the real services being emulated. The response generation time is comparable to the Cluster Centroid approach, being faster for some datasets and slower for others. Both of these approaches are about two orders of magnitude faster than the Whole Library approach. However, whereas the Cluster Centroid approach trades off accuracy for speed, Consensus+Weighting has both high accuracy and speed.

Comparing the matching time with the substitution time (shown in Table VIIIb), we can observe that the Whole Library approach consumes most of its time during the matching process (because a Needleman-Wunsch alignment is made with every request in the transaction library). Consensus+Weighting and Cluster Centroid have greatly reduced matching times. Twitter has unusually long substitution times, such that for the fast approaches most time is spent performing the substitution. This is due to the Twitter responses being very long, causing the symmetric field identification (common substring search) to become time consuming.

Consensus+Weighting generates responses faster than the real services being emulated. This is crucial for supporting testing of an enterprise system under realistic performance conditions (delays can be added to slow down the emulated response, but not the other way around). A major limitation of the Whole Library approach is that it cannot generate responses in a time which matches the real services for fast services (such as LDAP).

3) Robustness (RQ3): Our final test is to evaluate whether our
Consensus+Weighting approach is robust in generating accurate responses when the clustering process (from Section III-B) is imperfect. For this test we deliberately injected noise into our clusters, i.e. we created clusters where a fraction of the interaction messages are of different operation types. The noise ratios tested were 5%, 10% and 20%. We repeated the experiments with the different frequency thresholds (i.e. 0.5, 0.8 and 1).

Table IX summarises the experimental results. The results show that having a frequency threshold below 1 has a very big impact on preserving accuracy when the clustering is noisy. A threshold of f = 0.5 gives the best accuracy. When using this threshold, the accuracy stays above 97% for all datasets when the noise ratio is 5%. As the noise ratio increases to 20%, the accuracy decreases significantly for binary LDAP, but stays high for the other datasets.

Overall this is a good result. For our approach to work best, the clusters produced should be relatively clean, but there is tolerance for a small amount of noise. A noise ratio of 20% is considered very high. Our actual clustering process produced perfect separation (i.e. 0% noise) of interaction messages by operation type for the six datasets tested.

TABLE IX: Response Accuracy for Clusters with Noisy Data (Consensus+Weighting with f = 0.5, 0.8 and 1, at noise ratios of 5%, 10% and 20%, for the IMS, LDAP, LDAP text (1), LDAP text (2), SOAP and Twitter (REST) datasets).
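The noise-injection step of this robustness test can be sketched as follows. This is a hypothetical illustration: the cluster representation (operation type mapped to a list of messages) and the sampling scheme are our assumptions, not the paper's implementation.

```python
import random

def inject_noise(clusters, noise_ratio, seed=0):
    """Replace a fraction (noise_ratio) of each cluster's messages with
    messages sampled from the other clusters, i.e. from other operation
    types, simulating an imperfect clustering step."""
    rng = random.Random(seed)
    noisy = {}
    for op, messages in clusters.items():
        # pool of messages belonging to every other operation type
        foreign = [m for other, ms in clusters.items() if other != op for m in ms]
        n_noise = int(len(messages) * noise_ratio)
        noisy[op] = messages[:len(messages) - n_noise] + rng.sample(foreign, n_noise)
    return noisy
```

With noise_ratio=0.2, for example, each cluster of 10 messages ends up with 8 of its own messages and 2 drawn from other operation types, matching the 20% noise condition of the experiment.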
E. Industry Validation
The proposed service emulation technique has been integrated into CA Technologies' commercial product, CA Service Virtualization. An earlier version of the technique [42] was released as a new feature, named opaque data processing, in version 8.0 of the product and has been sold to customers [43]. Opaque data processing has been used at customer sites to successfully emulate services of protocols not otherwise supported by the product. The emulated protocols included a proprietary extension of IMS and Sun's ONC/RPC protocol. The present technique is on the backlog for future versions.
F. Threats to Validity
We have identified some threats to validity which should be taken into consideration when generalising our experimental results:

• Our evaluation was performed on six datasets from four protocols. Given the great diversity in message protocols, further testing should be performed on other message protocols.

• The datasets were obtained by randomly generating client requests for services of different protocols. Some real system interactions are likely to be more complicated than those of our datasets. Further testing on real system interactions is warranted.

V. DISCUSSION AND FUTURE WORK
We have developed an approach for automatically generating service responses from message traces which requires no prior knowledge of message structure, decoders or message schemas. Our approach of using multiple sequence alignment to automatically generate consensus prototypes for the purpose of matching request messages is shown to be accurate, efficient and robust. Wildcards in the prototypes allow the stable and unstable parts of the request messages for the various operation types to be separated. Rather than using the prototypes directly for strict matching (such as using them as regular expressions), we instead calculate a matching distance through a modified Needleman-Wunsch alignment algorithm. Since we look for the closest matching prototype, the method is robust even if the prototypes are imperfect. Moreover, this process can match requests which are slightly different to the prototypes, or are of a different length to the prototypes. This allows the system to handle requests which are outside of the cases directly observed in the trace recordings. Weighting sections of the prototype with different importance, based on their entropy, further improves the matching accuracy.

Our experimental results using the six message trace datasets demonstrate that our approach is able to automatically generate accurate responses in real time for most cases. Moreover, our approach can also generate accurate responses from imperfect message clusters that contain a small number of messages of different operation types.

One limitation of our current approach is the lack of diversity in the responses generated. We are working on an approach to identify common patterns across all responses in a cluster. A possible solution is to apply multiple sequence alignment to response messages, to distinguish stable positions from variable positions. The variable parts of responses could then be stochastically generated.
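The stochastic generation idea above could be sketched as follows, reusing the consensus threshold f: stable positions of the aligned responses emit their consensus symbol, while variable positions sample from the symbols observed in the cluster. This is an illustration of the proposed future work under our own assumptions, not an implemented feature.

```python
import random
from collections import Counter

def stochastic_response(aligned_responses, f=0.8, seed=0):
    """Generate a response from aligned cluster responses (equal-length
    strings, '-' = gap): consensus symbols at stable positions, sampled
    symbols at variable positions."""
    rng = random.Random(seed)
    out = []
    for column in zip(*aligned_responses):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) >= f:
            out.append(symbol)              # stable position: consensus symbol
        else:
            out.append(rng.choice(column))  # variable position: sample observed symbols
    return "".join(c for c in out if c != "-")
```

Given aligned responses ok:1, ok:2 and ok:3, the stable prefix "ok:" is always reproduced, while the final position varies across generated responses, which would give the emulated service the response diversity the current approach lacks.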
To further improve the robustness of our approach to noisy clustering, we will utilise outlier detection techniques [44] to remove outliers from clusters before applying the alignment method.

Our approach does not consider the service state history in formulating responses. In practice a stateless model is sufficient in many cases, for example: (i) when the emulation target service is stateless; (ii) when the testing scenario does not contain two equivalent requests requiring different state-affected responses; or (iii) where the testing scenario does not require highly accurate responses (e.g. performance testing). To address this limitation, an avenue of future exploration is to process mine the operation sequences to discover stateful models.

VI. CONCLUSION
We have developed a new technique for automatically generating realistic response messages from network traces for enterprise system emulation environments that outperforms current approaches. We use a bioinformatics-inspired multiple sequence alignment algorithm to derive message prototypes, adding wildcards for high-variability sections of messages. A modified Needleman-Wunsch algorithm is used to calculate message distance, and entropy weightings are used in distance calculations for increased accuracy. Our technique is able to automatically separate the payload and structural information in complex enterprise system messages, making it highly robust. We have shown, in a set of experiments with four enterprise system messaging protocols, greater than 99% accuracy for the four protocols tested. Additionally, the experiments show efficient emulated service response performance, enabling scaling within an emulated deployment environment.
REFERENCES

[1] M. J. Rutherford, A. Carzaniga, and A. L. Wolf, "Evaluating test suites and adequacy criteria using simulation-based models of distributed systems," IEEE Transactions on Software Engineering.
[3] DevOps: A Software Architect's Perspective. Addison-Wesley Professional, 2015.
[4] J. Sugerman, G. Venkitachalam, and B.-H. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in Proceedings of the General Track: 2001 USENIX Annual Technical Conference (USENIX 2001), Boston, Massachusetts, USA, 2001, pp. 1-14.
[5] C. Hine, J.-G. Schneider, J. Han, and S. Versteeg, "Scalable emulation of enterprise systems," in Proceedings of the 20th Australian Software Engineering Conference (ASWEC 2009), Gold Coast, Australia, 2009, pp. 142-151.
[6] S. Ghosh and A. P. Mathur, "Issues in testing distributed component-based systems," in Proceedings of the 1st International ICSE Workshop on Testing Distributed Component-Based Systems (ICSE 1999), Los Angeles, California, USA, 1999.
[7] M. Du, J.-G. Schneider, C. Hine, J. Grundy, and S. Versteeg, "Generating service models by trace subsequence substitution," in Proceedings of the 9th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA 2013), Vancouver, British Columbia, Canada, 2013, pp. 123-132.
[8] M. Du, S. Versteeg, J.-G. Schneider, J. C. Han, and J. Grundy, "Interaction traces mining for efficient system responses generation," in Proceedings of the 2nd International Workshop on Software Mining (SoftMine 2013), Palo Alto, CA, USA, 2013, pp. 1-8.
[9] G. Combs, "Wireshark," http://wireshark.org, 1998-2015.
[10] J. Sermersheim, "Lightweight Directory Access Protocol (LDAP): The Protocol," RFC 4511, Jun. 2006.
[11] P. Li, "Selecting and using virtualization solutions: our experiences with VMware and VirtualBox," Journal of Computing Sciences in Colleges, vol. 25, no. 3, pp. 11-17, 2010.
[12] P. Godefroid, "Micro execution," in Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), Hyderabad, India, 2014, pp. 539-549.
[13] J. Grundy, Y. Cai, and A. Liu, "SoftArch/MTE: Generating distributed system test-beds from high-level software architecture descriptions," Automated Software Engineering, vol. 12, no. 1, pp. 5-39, Jan. 2005. [Online]. Available: http://dx.doi.org/10.1023/B:AUSE.0000049207.62380.74
[14] T. Banzai, H. Koizumi, R. Kanbayashi, T. Imada, T. Hanawa, and M. Sato, "D-Cloud: Design of a software testing environment for reliable distributed systems using cloud computing technology," in Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010), Melbourne, Australia, 2010, pp. 631-636.
[15] A. Bertolino, G. De Angelis, L. Frantzen, and A. Polini, "Model-based generation of testbeds for web services," in Testing of Software and Communicating Systems, 2008, vol. 5047, pp. 266-282.
[16] C. Hine, "Emulating enterprise software environments," PhD thesis, Swinburne University of Technology, Faculty of Information and Communication Technologies, 2012.
[17] J. Sun and T. Mannisto, "Usefulness evaluation of simulation in server system testing," in Proceedings of the 36th IEEE Computer Software and Applications Conference (COMPSAC 2012), Izmir, Turkey, 2012, pp. 158-163.
[18] G. Wondracek, P. M. Comparetti, C. Kruegel, E. Kirda, and S. S. S. Anna, "Automatic network protocol analysis," in Proceedings of the Network and Distributed System Security Symposium (NDSS 2008), California, USA, 2008, pp. 1-14.
[19] W. Cui, J. Kannan, and H. J. Wang, "Discoverer: Automatic protocol reverse engineering from network traces," in Proceedings of the 16th USENIX Security Symposium (Security 2007), Boston, MA, USA, 2007, pp. 199-212.
[20] Y. Wang, X. Yun, M. Z. Shafiq, L. Wang, A. X. Liu, Z. Zhang, D. Yao, Y. Zhang, and L. Guo, "A semantics aware approach to automated reverse engineering unknown protocols," in Network Protocols (ICNP), 2012 20th IEEE International Conference on. IEEE, 2012, pp. 1-10.
[21] J. De Weerdt, J. Vanthienen, B. Baesens et al., "Active trace clustering for improved process discovery," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 12, pp. 2708-2720, 2013.
[22] R. Jagadeesh Chandra Bose and W. M. van der Aalst, "Process diagnostics using trace alignment: opportunities, issues, and challenges," Information Systems, vol. 37, no. 2, pp. 117-141, 2012.
[23] D. Lo, L. Mariani, and M. Pezzè, "Automatic steering of behavioral model inference," in Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE 2009), Amsterdam, The Netherlands, 2009, pp. 345-354.
[24] W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz, "Protocol-independent adaptive replay of application dialog," in Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS 2006).
[25] ~/media/files/whitepapers/ca-svcapabilities-wp-oct2012.aspx
[26] S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," Journal of Molecular Biology, vol. 48, no. 3, pp. 443-453, 1970.
[27] J. C. Bezdek and R. J. Hathaway, "VAT: A tool for visual assessment of (cluster) tendency," in Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN 2002), Honolulu, Hawaii, USA, 2002, pp. 2225-2230.
[28] R. Durbin, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[29] P. M. Comparetti, G. Wondracek, C. Kruegel, and E. Kirda, "Prospex: Protocol specification extraction," in Proceedings of the 30th IEEE Symposium on Security and Privacy (S&P 2009), Oakland, CA, USA, 2009, pp. 110-125.
[30] M. Beddoe, "The protocol informatics project," Toorcon, vol. 4, p. 4, 2004.
[31] Y. Wang, X. Yun, M. Z. Shafiq, L. Wang, A. X. Liu, Z. Zhang, D. Yao, Y. Zhang, and L. Guo, "A semantics aware approach to automated reverse engineering unknown protocols," in Proceedings of the 20th IEEE International Conference on Network Protocols (ICNP 2012), Austin, TX, USA, 2012, pp. 1-10.
[32] J. D. Thompson, D. G. Higgins, and T. J. Gibson, "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice," Nucleic Acids Research, vol. 22, no. 22, pp. 4673-4680, 1994.
[33] L. Wang and T. Jiang, "On the complexity of multiple sequence alignment," Journal of Computational Biology, vol. 1, no. 4, pp. 337-348, 1994.
[34] W. H. Day and F. McMorris, "Critical comparison of consensus methods for molecular sequences," Nucleic Acids Research, vol. 20, no. 5, pp. 1093-1099, 1992.
[35] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, pp. 379-423, 623-656, 1948.
[36] M. Du and S. Versteeg, "Interactive message traces from application layer protocols," Sep. 2014. [Online]. Available: http://quoll.ict.swin.edu.au/doc/message traces.html
[37] R. Long, M. Harrington, R. Hain, and G. Nicholls, IMS Primer.
[40] Pattern Recognition: A Statistical Approach. Prentice/Hall International, 1982.
[41] G. J. McLachlan, K.-A. Do, and C. Ambroise, Analyzing Microarray Gene Expression Data. Wiley-Interscience, 2004.
[42] S. Muller, "ODP makes moot the question, 'Do you support that protocol?'," Oct. 2014. [Online]. Available: https://communities.ca.com/community/ca-devtest-community/blog/2014/10/13/odp-makes-moot-the-question-do-you-support-that-protocol
[43] D. Swan, "Swinburne, CA Technologies in virtualisation breakthrough," The Australian.