Fingerprinting Cryptographic Protocols with Key Exchange using an Entropy Measure
Shoufu Luo and Sven Dietrich; The Graduate Center, City University of New York, [email protected]; John Jay College of Criminal Justice, City University of New York, [email protected]
Abstract.
Encryption has increasingly been used in all applications for various purposes, but it also brings big challenges to network security. In this paper, we take first steps towards addressing some of these challenges by introducing a novel system to identify key exchange protocols, which are usually required if encryption keys are not pre-shared. We observed that key exchange protocols yield certain patterns of high-entropy data blocks, e.g. as found in key material. We propose a multi-resolution approach for accurately detecting high-entropy data blocks and a method of generating scalable fingerprints for cryptographic protocols. We provide experimental evidence that our approach has great potential for identifying cryptographic protocols by their unique key exchanges, and furthermore for detecting malware traffic that includes customized key exchange protocols.
1 Introduction

In the network security field, the use of encryption for malicious purposes brings new challenges to network security defense. For example, encryption has prevented botnet traffic from being inspected and detected by defense systems based on deep-packet inspection (DPI), which used to be very effective up to that point. For symmetric encryption and decryption, a secret key k shared among the two communicating parties is required, either pre-shared or negotiated on the fly using cryptographic key-exchange protocols. Most common cryptographic protocols [4, 12, 13] that use symmetric encryption to secure the channel employ a key exchange protocol, such as the Diffie-Hellman key exchange [22].

Depending on the protocol design, key material is distributed differently along the traffic stream. As key material has high entropy compared to normal traffic, the traffic for the key exchange exhibits detectable characteristics, namely the uniqueness of the distribution of key material allowing for proper discriminating characteristics, as shown in Figure 1. Using an entropy metric, it may not be hard to test the hypothesis whether a byte string is "random," if that byte string is sufficiently long. The problem becomes harder if the given string is relatively short, i.e. undersampled, or if the goal is to identify which part of the string contains random bytes, in particular deciding the boundaries of those random bytes (also known as blocks of interest). It is therefore challenging to characterize a stream by the distribution of embedded random bytes, or so-called high-entropy blocks.

Fig. 1: Visualization of Entropy Distribution: dark portions are high-entropy blocks.

To avoid being treated as an anomaly, malware might try to use standard cryptographic protocols (e.g. SSL/TLS) for secure communication, effectively preventing DPI. However, standard protocols such as SSL can potentially be subject to a man-in-the-middle attack.
However, malware in general tends to avoid using standard protocols and instead employs a customized variant. Only 10% of malware utilize TLS as a form of encryption, according to a recent study [1]. To ensure fresh key material, a new key exchange is desirable for every new command-and-control (C&C) session of the malware [6, 26].

Our work offers a systematic way to characterize network traffic through key exchange behaviors and generate scalable fingerprints based on detected high-entropy blocks. The system mainly consists of two parts: the high-entropy block detection and the fingerprint generation. First, we aim to identify high-entropy blocks from a traffic stream using sample entropy via a sliding window. Second, with all high-entropy blocks identified, entropy-based fingerprints for network flows are generated from the distribution of high-entropy blocks. Our contribution also includes:

– A new method of identifying cryptographic protocols, raising the bar for malicious activities that abuse customized cryptographic protocols to evade inspection.
– A voting mechanism that efficiently boosts the accuracy of entropy estimation when undersampled, using a multi-resolution analysis.
– A statistical approach to estimate the range of high-entropy data blocks and build scalable entropy-based fingerprints for key exchange protocols in the form of regular expressions.

To the best of our knowledge, our work is the first attempt to fingerprint key exchange protocols by the distribution of key material and to apply such a technique to malware detection. By design, our approach can be implemented and deployed as a standalone system. However, the intention is not to replace any existing detection techniques, but rather to complement them. This system can be built into existing systems as a plug-in component, in particular those relying on a certain degree of payload analysis, e.g. [30]. Moreover, a component of our system can be a useful tool for the security community, e.g.
for identifying high-entropy portions of a given data block, such as the detection of packed malware binaries.

Related Work
Olivain et al. [20] proposed to use cumulative entropy of network flows for the detection of specific attack behaviors targeted at known cryptographic protocols, i.e. SSL. Instead of an aggregation, our work aims to fingerprint the entropy distribution along the examined traffic. Our approach is still applicable for their purpose in a more precise way. Meanwhile, we adopt the technique they propose,
N-truncated entropy, for entropy estimation, which is also used by Dorfinger et al. [7] for classifying encrypted and unencrypted traffic. There is prior work [28] that shows how entropy tests can be used to detect encrypted or compressed packets in network streams. Again, we provide a more reliable mechanism to detect high-entropy areas as one of our essential contributions.

Our work shares an interest with the field of protocol identification. Most of the work in that field is mainly learning-based, relying on network-observable features [17, 29]. For example, Wright et al. [29] proposed to identify the cryptographic protocol of individual encrypted TCP connections using post-encryption observable features, such as timing, size, direction, etc. To some extent, our approach can also be applied for this purpose. However, there are known obfuscation techniques which could be used to evade this, such as obfsproxy [5] and FTE [9]. As discussed in [27], obfuscation can be detected with entropy-based tests over the packet payloads. Our approach does the same by extracting entropy-based fingerprints.

Zhang et al. [32] proposed to detect encrypted traffic by looking for N sequential high-entropy packets among the first M packets of a network flow, adopting the cumulative entropy technique. In 2015, Zhang et al. [31] improved their previous work by detecting high-entropy flows as an additional measure for scoring a host as a bot for BotHunter [14]. Applicable to the same problem, our approach differs from theirs by fingerprinting malware with customized cryptographic protocols, such as Nugache, as will be shown. Unlike their work, our work does not rely on another system for detection.

The rest of this paper is organized as follows. We begin with background on entropy and its estimators. In section 3, we discuss our methodology in detail, including how to identify high-entropy blocks, a voting mechanism, as well as a filtering method for false positive reduction, etc.
Following that, section 4 presents the evaluation and analysis of our approach with three different datasets. Finally, we conclude this study by discussing limitations and directions for future work.

2 Background

Introduced by Shannon [25], entropy is a measure of the amount of information that is missing before reception. In the context of cryptography, it is used as a measure of randomness (or uncertainty), equating higher entropy with higher randomness. Let X be a discrete random variable under an arbitrary distribution P on a countable alphabet Σ = {x_1, ..., x_m}. Shannon entropy is defined by equation (1):

    H(X) = − Σ_{i=1}^{m} p(x_i) log p(x_i)    (1)

The entropy H(X) reaches its maximum value when all p(x_i) are equal to 1/m, i.e. when P is uniform. In cryptography, as a fundamental requirement of security, key material should have high entropy in order to be hard to predict.

Entropy can easily be obtained from equation (1) for a random variable whose probability distribution is known. In practice, however, P remains unknown in most scenarios. Frequently, p(x_i) can still be estimated by the relative frequency of the outcome x_i over a large number of trials: the estimated probability of x_i is p̂(x_i) = n_i/N, where n_i is the number of times x_i occurs and N is the total number of trials or samples. Hereby, the sample entropy, a.k.a. the maximum likelihood estimator (MLE) [2], is given by the equation below:

    Ĥ_N^MLE(X) ≡ − Σ_{i=1}^{m} p̂(x_i) log p̂(x_i)    (2)

The MLE is an asymptotically unbiased estimator of H(X): as N tends to infinity, p̂(x_i) approximates p(x_i) and Ĥ_N^MLE(X) approximates the real H(X). When N is not sufficiently large, namely undersampled, Ĥ_N^MLE(X) is highly biased, in particular when N < m or N ∼ m. There is no universal rate at which the error of the MLE relative to H(X) approaches zero [2].
There are attempts that aim to subtract the bias directly, such as the Miller-Madow corrector [18], the Jackknife corrector [11] and the Paninski corrector [21]. However, the bias is still significantly high when N < m or N ∼ m. Moreover, it has been proven difficult to find an unbiased estimator [21, 24]; the Paninski corrector is unbiased, but only if P has a uniform distribution, which cannot be guaranteed. Furthermore, according to the study in [20], Ĥ_N^MLE(X) ∼ H(X) is valid only if N ≫ m, which typically means N several times as large as m. In other words, if Σ = {0, ..., 255} (i.e. m = |Σ| = 256), around 2,000 samples would be required to possibly obtain a reasonable entropy estimate. That makes it impractical for the purpose of profiling network traffic, as key material is usually at most hundreds of bytes (256 bytes = 2048 bits). For example, in a typical TLS handshake, a client random number only contains 28 bytes.

2.1 N-truncated Entropy H_N(X)

Similar to Olivain et al. [20], an accurate entropy value is not our main focus, but rather the probability of a string being generated from a uniform distribution. The
N-truncated entropy H_N(X) proposed by Olivain et al. meets our needs; it is the average of the sample entropy Ĥ_N^MLE(X) over all strings of length N drawn at random from the distribution P, as defined below:

    H_N(X) = Σ_{n_0+...+n_{m−1}=N} [ (N; n_0, ..., n_{m−1}) · Π_{i=0}^{m−1} p_i^{n_i} · ( − Σ_{i=0}^{m−1} (n_i/N) log (n_i/N) ) ]    (3)

where (N; n_0, ..., n_{m−1}) denotes the multinomial coefficient. By construction, Ĥ_N^MLE(X) is an unbiased estimator of H_N(X) for an arbitrary distribution P. More importantly, Ĥ_N^MLE(X) gives a statistical indication of how close the distribution P is to being uniform, by comparison with Ĥ_N^MLE(W), where W is a random variable under the uniform distribution U. In section 3.2, we describe how to obtain both values. Alternatively, for a concrete string s of length N with each sample drawn from P, we write Ĥ_N^MLE(s) instead of Ĥ_N^MLE(X); to differentiate, w is used for a string drawn from the uniform distribution U. H_N(X) has an upper bound of log min{m, N}, as it reaches its maximum value when all p̂(x_i) are equal, either p̂(x_i) = 1/N if N < m, or p̂(x_i) = 1/m otherwise. In either case, uncertainty reaches its maximum.

3 Methodology

In this section, we discuss in detail the techniques we used and developed, accompanied by experimental evidence.
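The sample entropy of equation (2) is the basic primitive of this section. The following is a minimal illustrative sketch (ours, not the authors' implementation) that computes it for a byte string; the tau parameter reinterprets the bytes as smaller symbols, anticipating the multi-resolution measures introduced later:

```python
import math
from collections import Counter

def sample_entropy(data: bytes, tau: int = 8) -> float:
    """MLE sample entropy (equation 2) of `data`, in bits per symbol.

    The byte string is read as symbols of `tau` bits each (tau must
    divide 8): tau=8 is the byte alphabet (m=256), tau=1 the bit
    alphabet (m=2).
    """
    assert 8 % tau == 0
    symbols = [(byte >> shift) & ((1 << tau) - 1)
               for byte in data
               for shift in range(8 - tau, -1, -tau)]
    n = len(symbols)
    counts = Counter(symbols)
    # H = -sum p_hat * log2(p_hat), with p_hat = count / n
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `sample_entropy(bytes(range(256)))` is exactly 8 bits per symbol (every byte value occurs once), while a constant block scores 0.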
3.1 Sliding Window

To obtain entropy information for different portions of the traffic stream, a sliding window moves over the traffic with a step of one byte, while sample entropy is measured for each chunk of bytes in that window. The bytes in each window form a block.

The window size determines the sample size, which directly impacts the accuracy of the sample entropy. If the sample size is too small, the sample entropy might not be accurate enough to be meaningful. Equation (4) roughly estimates the probability of an N-byte string appearing to be "random", i.e. each character of the alphabet occurring at most once in the string. Fix Σ to be the byte alphabet and let m = 256. For N = 16, i.e. a 16-byte sliding window, Pr[X = e] = 0.6197; that is, roughly a 62% probability that an arbitrary string appears random, i.e. a substantial chance of a false positive. However, if N = 32, Pr[X = e] = 0.082. This confirms the discussion in Paninski et al. [21] that one should never use fewer than 16 bytes for entropy estimation when the byte alphabet is used.

    Pr[X = e] = (m/m) · ((m−1)/m) · ... · ((m−N+1)/m) = Π_{i=0}^{N−1} (m−i)/m    (4)

If the sliding window grows too large, it is likely to mix high-entropy areas with low-entropy areas, blurring the difference between them. As shown in Figure 2, when the window size is small, e.g. 16 bytes, the curve is fuzzy and has too many valleys (low-entropy) and peaks (high-entropy), while as the window size grows larger, e.g. 1024 or 2048 bytes, the curve becomes flatter and valleys or peaks are no longer distinctive.

Fig. 2: Entropy plot of a TLS traffic sample using different sliding window sizes, from bottom to top (bytes): 16, 32, 64, 128, 256, 512, 1024, 2048.

A smaller window is more likely to mistakenly identify a non-random data area as "random" (false positive), while a larger window possibly fails to identify a real high-entropy area (false negative). The choice of window size will heavily depend on the minimum length of the key material of interest.
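Equation (4) is easy to check numerically; the small sketch below (ours) reproduces the N = 16 figure quoted above:

```python
def prob_all_distinct(n: int, m: int = 256) -> float:
    """Equation (4): probability that n draws from an m-symbol uniform
    alphabet are pairwise distinct, i.e. the string 'appears random'."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p

print(round(prob_all_distinct(16), 4))  # 0.6197 for a 16-byte window
```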
In the case of TLS, we choose a 32-byte sliding window, as it suits the minimum length of interest, i.e. the 28-byte client random number. In summary, as the window slides over the data with a one-byte step, each block is labeled as either high-entropy or low-entropy. A list of consecutive high-entropy blocks or consecutive low-entropy blocks then forms a unit, more precisely a high-entropy unit or a low-entropy unit, respectively.

3.2 Estimating H_N(U)

To identify a high-entropy block, we follow the idea used by [20], i.e. the Monte-Carlo method, as it provides a level of confidence of a string being random. We first repeatedly generate strings of length N with each byte sampled from a random source, e.g. /dev/urandom on MacOS X. Then, we calculate the mean µ and standard deviation σ of the sample entropy over all samples. Here, µ and σ summarize the distribution of the sample entropy of random strings of length N. For a specific number t of standard deviations, we can obtain the proportion of sample strings falling within the range µ ± t × σ. This proportion provides us with a confidence of a string being random if it falls within the given range. As exceeding the upper bound does not affect the randomness of the string, we ignore the upper bound and use the lower bound as a cutoff for a string being random, denoted by θ, with a confidence given by the proportion ρ:

    θ = µ(Ĥ_N^MLE(w)) − t × σ(Ĥ_N^MLE(w)),   ρ = (number of samples above θ) / (number of samples)    (5)

Consequently, any string falling below the threshold is considered not random, i.e. a low-entropy block. Similarly, any string falling above the threshold is considered random, i.e. high-entropy. Table 1 shows thresholds (θ) for w using different window sizes (N) above a minimum confidence level of 99.0%.

N      µ        σ        t     θ       ρ
16     3.94199  0.08290  2.8   3.7098  99.2%
32     4.88171  0.08134  2.7   4.6620  99.3%
64     5.76562  0.07664  2.6   5.5663  99.2%
128    6.55003  0.06733  2.5   6.3817  99.2%
256    7.17518  0.05240  2.5   7.0441  99.2%
512    7.59073  0.03364  2.4   7.5099  99.0%
1024   7.80894  0.01726  2.5   7.7658  99.1%
2048   7.90804  0.00814  2.5   7.8877  99.2%
Table 1: Ĥ_N^MLE(w) under various configurations

The confidence measures the confidence that a string is not random when it falls outside the range, rather than the confidence that a string is random when it falls within the range. For example, let N be 64 and Σ the byte alphabet; then µ = 5.7656 and σ = 0.0766. With 99.4% of samples above θ = µ − 3σ = 5.53569 (i.e. t = 3), we would have at least 99.4% confidence that a string s with Ĥ_N^MLE(s) = 5.5120 is not close to random, i.e. not a high-entropy block. Here, t is our control variable. We can choose a smaller t to tighten the range with a higher confidence, or a larger t to loosen the range, but with a lower confidence. In our study, we choose t tightly to obtain a relatively high confidence of at least 99.0%. With the threshold, we can then transform the sample entropy score to either one or zero. The plot turns into a square wave where one indicates high entropy and zero indicates low entropy, as shown in Figure 3. The shadow in the upper plot shows the cutoff.

Fig. 3: Normalization: high-entropy blocks

3.3 Voting with a Multi-resolution Σ

Due to statistical limitations, some data blocks may mistakenly be labeled as high-entropy blocks, i.e. false positives, which will mislead the fingerprint and therefore must be avoided or minimized. In order to achieve this, we devised a voting mechanism using multi-resolution analysis, utilizing the choice of alphabet Σ. As will be shown, this mechanism dramatically reduces the rate of false positives.

Thus far we have based our discussion on the choice of Σ as the byte alphabet (m = 256), with each character being a byte. In cryptography, however, the randomness of key material is defined at a more restrictive level, i.e. at the bit level, and thereby Σ = {
0, 1} (m = 2). Consider one experiment of tossing one coin with two outcomes, and another experiment of tossing eight independent coins, each with two outcomes. According to basic probability theory, if each coin is uniformly drawn from Σ = {
0, 1}, the outcome of the eight coins (the byte alphabet) will still follow a uniform distribution. In our estimation of Ĥ_N^MLE(w), we do generate each random byte by randomly sampling eight times over {
0, 1} for all our sample strings. That being said, given that each bit is independently sampled uniformly from {
0, 1}, we can choose a random variable of a different number τ of bits (i.e. coins), and such a random variable is guaranteed to have a uniform distribution.

As an extension to our previous computation of Ĥ_N^MLE(w), we outline the thresholds and their confidence levels for different τ while fixing N to 32. We use the term τ-bit measure, e.g. 2-bit measure. Previously, N could be interpreted as both the window size and the sample size. In the case of a τ-bit measure, the sample size changes, i.e. it becomes 8N/τ (τ ≤ 8). The τ-bit measure does not change the fundamentals of N-truncated entropy, as it simply uses a larger sample size and a different alphabet.

Table 2: τ-bit measure Ĥ^MLE(w)

Statistical methods such as sample entropy generally ignore potential structures or patterns occurring in the data. Therefore, a string with a high sample entropy score is not guaranteed to be random. For example, given a hexadecimal string s = "55 55 aa aa", i.e. 0101 0101 0101 0101 1010 1010 1010 1010 in binary, we have p̂_0 = p̂_1 = 1/2 if the 1-bit measure (τ = 1) is used, i.e. Σ = {
0, 1}, and then Ĥ_N^MLE(s) = 1, its maximum. Consequently, s will be labeled as high-entropy bytes despite not being random at all. Taking another example from the real world, consider a hexadecimal string from a TLS session: 16 03 01 0c 13 0b 00 0c 0f 00 0d 0e 10 04 7a 30 82, which is a block of control information from the TLS handshake traffic. The two bytes 03 01 indicate the TLS version, i.e. TLS 1.0, 0c 13 the length, 0b the protocol type, and another 3 bytes of length 00 0c 0f. Such a block may nonetheless appear "random" if only an 8-bit measure is used. Such cases are prone to false positives and mislead the process.

Fig. 4: A traffic sample from a TLS 1.2 session with a 1024-bit RSA public key.

However, the idea is that if a string is truly random, then no matter which τ-bit measure is used, its sample entropy Ĥ_N^MLE(s) should always be close to Ĥ_N(U). Thus, we propose a voting mechanism instead of a sole τ-bit measure. The voting rule is: if any of the chosen τ-bit measures rejects the randomness of a block, the block is labeled as non-random. It is a simple AND operation over the outcomes of all measures. Figure 4 shows the effectiveness of combining three τ-bit measures, where the resulting signature by voting precisely outlines all high-entropy blocks in the TLS session. The last plot line, X-signature, is based on voting over the three 1-bit, 4-bit and 8-bit measures.

3.4 Noise Filtering

Our voting mechanism effectively reduces false positives. However, in some scenarios, this approach may still not be sufficient to eliminate all of them. Control information is commonly known to have low entropy, yet there is still a chance that all τ-bit measures falsely identify an ordinary block as high-entropy because of some small amount of accidental randomness in the data. Where there are supposedly no high-entropy data blocks, a data block with accidental randomness should be shorter than the minimum length of interest, so the resulting detected high-entropy units will be relatively small compared to those produced by an actual high-entropy data block of interest. A filtering threshold, denoted ξ, can therefore be chosen to eliminate those small high-entropy units. Our empirical study suggests ξ = 9 to be a good choice when a 32-byte sliding window is used for detecting high-entropy key material blocks of at least 20 bytes.
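The Monte-Carlo thresholds of equation (5) and the voting rule can be sketched together as follows (an illustrative sketch, not the authors' code; the 2,000-sample run, the fixed seed, and all helper names are our choices):

```python
import math
import random
from collections import Counter

def sample_entropy(data: bytes, tau: int) -> float:
    """MLE sample entropy of `data` read as symbols of tau bits (tau divides 8)."""
    syms = [(b >> s) & ((1 << tau) - 1)
            for b in data for s in range(8 - tau, -1, -tau)]
    n = len(syms)
    return -sum(c / n * math.log2(c / n) for c in Counter(syms).values())

def threshold(n_bytes: int, tau: int, t: float = 2.7,
              trials: int = 2000, seed: int = 42) -> float:
    """Equation (5): theta = mu - t*sigma over the sample entropy of
    `trials` random strings of length n_bytes (Monte-Carlo)."""
    rng = random.Random(seed)
    scores = [sample_entropy(rng.randbytes(n_bytes), tau) for _ in range(trials)]
    mu = sum(scores) / trials
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / trials)
    return mu - t * sigma

# thresholds for the 1-, 4- and 8-bit measures over a 32-byte window
THETAS = {tau: threshold(32, tau) for tau in (1, 4, 8)}

def is_high_entropy(block: bytes) -> bool:
    """Voting: high-entropy only if every tau-bit measure agrees (AND)."""
    return all(sample_entropy(block, tau) >= th for tau, th in THETAS.items())
```

For example, the structured string "55 55 aa aa" repeated to 32 bytes attains the maximum 1-bit entropy yet falls far below the 8-bit threshold, so the vote labels it low-entropy.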
That means that if no more than 9 consecutive high-entropy blocks are detected between two low-entropy units, a false positive is identified and filtered out. Here, "filtering out" means relabeling these blocks as low-entropy instead of high-entropy.

3.5 Estimating Unit Lengths

Beyond identifying high-entropy blocks, it is also essential to describe the length of each unit in order to fingerprint the shape of the square wave shown in Figure 4. Due to the statistical nature of the measurement, the length of each unit (i.e. the number of consecutive high-entropy or low-entropy blocks detected) may vary: when the sliding window partially overlaps the target random bytes, it may continue to yield high-sample-entropy blocks until the window has moved sufficiently far from the target. For example, a TLS traffic stream contains a client random number as a chunk of 28 bytes. It is not difficult to anticipate that more than exactly one high-entropy block will be detected in this case, and that the total number of high-entropy blocks detected around that chunk of data will vary from case to case. However, our intention is not to determine an absolute value for each unit across all cases, but rather a reasonable range. Hereby, we resort to
Monte-Carlo methods to empirically estimate the range. For example, to estimate the length of the high-entropy unit around the client random bytes, we sampled 100,000 client hello messages from TLS sessions. The result shown in Figure 5 indicates that most lengths for the 28-byte client random string followed by the list of cipher suites fall within a range of six to twenty-four high-entropy blocks. If a 32-byte TLS session ID (also random bytes) is present along with the client random bytes, adding up to 60 bytes, we obtain a range of [38,
52] as shown in Figure 5. A more conservative range would be [20, …].

Fig. 5: Distribution of the length of detected high-entropy blocks. Left: over the TLS 28-byte client random string. Right: over the TLS 28-byte client random string and the 32-byte session ID.

3.6 Fingerprint Generation

Fingerprinting is the process of profiling a key exchange protocol by the distribution of high-entropy blocks along traffic streams generated by that protocol. An entropy-based fingerprint is a series of interleaving high-entropy units and low-entropy units, with the length of each unit specified as a range. High-entropy units have to interleave with low-entropy ones because otherwise two adjacent high-entropy or low-entropy units would be merged into one. Let (s, l, r) represent one unit, where s ∈ {0, 1} and l, r ∈ Z⁺, with s the sign indicating a high-entropy or low-entropy unit, l the minimum length and r the maximum length. An entropy-based fingerprint then is the concatenation of an ordered list of (s, l, r) with s alternating between one and zero. Alternatively, it can be concisely expressed as below, where s_i ∈ {0, 1} and l_i, r_i ∈ Z⁺. The benefit of such a representation is that it aligns with standard regular expressions, so the matching process can be done very efficiently. The regular expression form provides a flexible way of expressing the fingerprint, for instance optional units, as will be shown in the experiment section.

    ∏_{i=1}^{n} s_i{l_i, r_i},   s_i ≠ s_{i+1}

Fingerprinting is straightforward, in three steps: (1) identify high-entropy and low-entropy areas (units) of the anticipated traffic from a cryptographic protocol; (2) follow the technique described in section 3.5 and estimate the range for each area; (3) formalize the units as a regular expression. Taking TLS with a cipher suite of DHE-RSA-* as an example, the fingerprint has the form

    1{l_1, r_1} 0{l_2, r_2} 1{l_3, r_3} 0{l_4, r_4} 1{l_5, r_5} …
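Since a fingerprint is an ordered list of (s, l, r) units, it maps directly onto a regular expression over the 0/1 label string. A sketch (ours), with illustrative ranges rather than the paper's actual TLS values:

```python
import re

def fingerprint_regex(units: list[tuple[int, int, int]]) -> str:
    """Concatenate (s, l, r) units into a regex over the 0/1 label
    string, e.g. [(1, 6, 24), (0, 3, 10)] -> '1{6,24}0{3,10}'."""
    return "".join(f"{s}{{{l},{r}}}" for s, l, r in units)

# hypothetical three-unit fingerprint (ranges illustrative only)
fp = re.compile(fingerprint_regex([(1, 6, 24), (0, 3, 10), (1, 38, 52)]))

print(bool(fp.fullmatch("1" * 10 + "0" * 5 + "1" * 40)))  # True
```

Because the fingerprint compiles to an ordinary regular expression, matching a label string is a single `fullmatch` call.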
During the detection phase, we have these steps: (1) scan the traffic stream by sliding a window over it and estimating the sample entropy of each window using different τ-bit measures; (2) normalize each block by its entropy score to either one or zero using the pre-calculated threshold θ; (3) perform the voting (i.e. AND) over the outcomes of the measures; (4) filter out the noise using the filter threshold; (5) use the regular expression to match the predefined fingerprint against the output (i.e. a string consisting of zeros and ones).

4 Evaluation

In our demonstration, we emphasize the DHE-RSA-* cipher suites for the TLS protocol, as our approach aims to profile one particular key exchange protocol and TLS is capable of using different key exchange protocols. SSL has evolved over time into the standard TLS protocol, which supports a long list of cipher suites with different key exchange protocols. To demonstrate, we choose to profile one set of key exchange cipher suites, i.e. DHE-RSA-* (see Table 3). By contrast, as an application of our system, most botnet C&C protocols are much simpler, as most of them are designed for the sole purpose of performing a limited number of tasks. SSL/TLS is a well-known cryptographic protocol of fair complexity; the successful characterization of the TLS protocol gives us the full ability to characterize other, simpler botnet C&C protocols. For evaluation, we first use TLS as our primary target and later extend to the Nugache botnet. All streams are bidirectional, and the packets of a stream are correctly ordered with all TCP/IP headers removed. tshark [3] was used as the primary tool to process network traces in pcap format [15].
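The five detection steps above can be sketched end-to-end. For brevity, this illustrative sketch (ours) uses a single 8-bit measure with the Table 1 threshold for a 32-byte window instead of the full vote, and implements the noise filter with a regex over the label string:

```python
import math
import re
from collections import Counter

THETA_8BIT = 4.6620   # Table 1: theta for N=32, 8-bit measure
WINDOW, XI = 32, 9

def entropy(block: bytes) -> float:
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())

def labels(stream: bytes) -> str:
    """Steps (1)-(2): slide a WINDOW-byte window one byte at a time and
    normalize each block to '1' (high-entropy) or '0' (low-entropy)."""
    return "".join("1" if entropy(stream[i:i + WINDOW]) >= THETA_8BIT else "0"
                   for i in range(len(stream) - WINDOW + 1))

def filter_noise(lab: str, xi: int = XI) -> str:
    """Step (4): relabel high-entropy runs of at most xi blocks as noise."""
    return re.sub(r"(?<!1)1{1,%d}(?!1)" % xi,
                  lambda m: "0" * len(m.group()), lab)

def matches(stream: bytes, fingerprint: str) -> bool:
    """Step (5): match the fingerprint regex against the label string."""
    return re.fullmatch(fingerprint, filter_noise(labels(stream))) is not None
```

On a synthetic stream of constant bytes surrounding a 64-byte counter block, this pipeline yields a clean 0…01…10…0 label string that a three-unit fingerprint matches.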
We obtained a data set of TLS network traffic from the ZMap project [8]. Initially, we extracted 16,240 TCP streams on standard port 443 from 800 MB of raw traffic data and further reduced them to 5,794 completed and validated TLS streams¹. Then, we extracted from those 5,794 streams the 1,378 streams that used one of the DHE-RSA-* cipher suites in Table 3. We split the 1,378 instances into two sets: the d00200 set of 218 instances for parameter selection and signature refinement, and the d00300 test set of 1,160 instances for testing the final signature. We also extracted 1,204 TLS instances with other cipher suites. We further extracted 337 Nugache traffic streams from a set of raw Nugache traffic and divided the instances into two groups: a training set of 162 instances and a testing set of 175 instances. As for TLS, we use the training set to tune the fingerprint and the testing set for validation.

In addition, we used 3,412 non-TLS TCP streams, denoted as the d00015 set, from a data set generated by UNSW-NB15 [19]. This data set contains a variety of traffic types, but no TLS traffic, so we can use it as another dimension of negative cases for testing the fingerprints. The table below shows the majority traffic types by service port, only including standard ports under 1024. It does not show the whole spectrum of traffic types in this dataset, but rather provides a quick look. More details on this data set are available in the original paper.

We test the signature generated as previously described over the training set d00200 with thresholds at a confidence ρ above 99.2% for the different measures.

¹ A large portion of the hosts scanned by the ZMap client did not respond or rejected connections for various reasons during TLS negotiation.

Cipher ID   Name
TLS_DHE_RSA_WITH_DES_CBC_SHA
TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA
TLS_DHE_RSA_WITH_SEED_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
Table 3: TLS Ciphersuites of Choice: DHE-RSA-*
Port   80   25   22   143   21   111   179   139   110
The results shown in Table 4 do not seem promising at all: even the best results, from the 1-4-8 and 1-2-4-8 measures, only reach recall rates of 62.84% and 64.22%, respectively, at a confidence of 99.85%. They do, however, confirm that the strategy of using multiple τ-bit measures significantly improves the recall rate. It is also interesting to note that the rate for multiple τ-bit measures drops significantly, to below 10%, at a confidence of 99.99%, which is reasonable because the threshold is then too relaxed (with a higher proportion of high-entropy blocks) to be accurate.

Table 4: Recall rate of the original signature for TLS

By manually checking those failures, we found three major issues with our original signature. One is the range of the server random bytes: it was a bit tighter than it appeared, having previously been set to (+, 8, 54), since we reused the range estimated from the client random bytes. That turns out to be inadequate, as the bytes after the server random bytes appear more random than those after the client random bytes, and are therefore more likely to produce a longer high-entropy block. Following the same method as for the client random bytes, we increased the maximum length to 64. The second major issue is that we failed to consider optional random bytes such as the key identifier fields for both the issuer and the subject of the certificate. The third stems from the fact that two high-entropy areas might be adjacent to each other without a sufficient gap and get merged into a larger high-entropy area, e.g. the signature of the certificate and the server key exchange parameters. For the latter two cases, we introduce optional blocks, making the signature scalable: in the regular expression, we can include optional strings. For instance, our TLS signature has been extended to include optional strings as below. This adjustment boosts the recall in most cases, as shown in Table 5. For both the 1-4-8 and 1-2-4-8 measures, the recall increases by around 20%.

‘…
{·,·}(0{·,·}|{·,·}(1{·,·}|{·,·}{·,·}{·,·})0{·,·})…'

Table 5: Recall using the refined fingerprint

The noise threshold is used to remove false positives and make the fingerprint more reliable. As the threshold increases, the detection accuracy for high-entropy blocks increases, since the accidental "high-entropy" blocks are eliminated. At a certain point, however, this elimination may hurt effectiveness, as true high-entropy blocks may be eliminated by an excessively large threshold. We experimented with different filter thresholds ξ using a 4-bit measure, as shown in Figure 6. Given its initial purpose, this parameter should be kept as small as possible while still filtering effectively. Thus, ξ = 9 is chosen based on the empirical results. As suggested by our test results, it also appears to be a proper choice for the other measures, e.g. the 1-4-8 measure.

Test Results.
After the two improvement procedures, i.e. signature refinement and parameter selection, the final test over the testing set d00300 is shown in the table below; the multiple τ-measure 1-4-8 now produces a good recall rate. Finally, we fixed the noise threshold at ξ = 9 and used the 1-4-8 measure, and we summarize our results over the three datasets as follows. Overall, the TLS signature has a precision of nearly 94.6% and an accuracy of around 94%, including negative cases only from d00300 so as to have an equivalent number of positive cases. Negative cases from non-TLS traffic, i.e. d00015, on the other hand, turn out to be relatively trivial, even though some of the instances do contain high-entropy traffic, for example SSH on port 22.

Fig. 6: Noise Threshold Selection over TLS traffic using a 4-bit measure (ξ = 9)

Measure                      TP     FN   Recall
4-bit measure (ρ = 99.20%)   1,056  104  91.03%
1-4-8 measure (ρ = 99.85%)   1,079  81   93.02%

The Nugache botnet was one of the first peer-to-peer botnets to use strong cryptography to protect its C&C channel: inter-peer communication was encrypted using individually negotiated session keys derived via a hybrid RSA/Rijndael scheme [6, 23, 26]. Specifically, Nugache uses a two-way RSA-like key exchange protocol for every session, with a minimum modulus length of 512 bits. That is, one peer sends the length of the key to announce a key exchange, followed by the actual key [6]; the other peer in turn replies with a message of the same length encrypted with that public key. Compared to TLS, signature extraction for Nugache is much easier because of the simplicity of its key exchange. Since there is little control information in the key exchange messages, if we consider the payload only, the signature can simply be defined as 1*, meaning high-entropy blocks everywhere; this is itself a strong detectable characteristic, distinct from other cryptographic protocols.
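The idea of a fingerprint as a regular expression over entropy labels can be sketched as follows. This is an illustration, not the paper's signature language: the per-window 0/1 encoding, the anchored match, and the TLS-like pattern with its optional group are our assumptions (we also use `1+` rather than `1*` so that an empty stream does not match).

```python
import re

def entropy_profile(flags) -> str:
    """Encode per-window entropy labels (1 = high, 0 = low) as a string."""
    return "".join("1" if f else "0" for f in flags)

# Nugache payloads carry almost no control information, so its signature is
# "high-entropy blocks everywhere". A TLS-like signature would instead
# interleave low-entropy control regions with bounded high-entropy runs,
# possibly with optional groups; the (...)? pattern below is a hypothetical
# example of such an optional block, not the actual TLS signature.
NUGACHE_SIG = re.compile(r"1+")
TLS_LIKE_SIG = re.compile(r"0+1{1,8}0+(1{1,4}0+)?1+")

print(bool(NUGACHE_SIG.fullmatch(entropy_profile([1, 1, 1, 1]))))   # True
print(bool(NUGACHE_SIG.fullmatch(entropy_profile([1, 0, 1, 1]))))   # False
```

Because any low-entropy window breaks the match, an all-high-entropy stream is by itself a strong discriminator against protocols that carry plaintext control information.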
Following the same consideration, we choose ξ = 9, which yields a fair recall rate. The initial fingerprint we generated for Nugache includes two high-entropy areas, corresponding to the two-way key exchange. First, we test all τ-bit measures with a fixed noise threshold of ξ = 9. The 2-bit measure produces good results (92.21%, at a confidence ρ above 99%), better than any other τ-bit measure at the same level of confidence. We nevertheless conservatively choose the 1-4-8 measure as our metric in general.

Dataset                         Total   Positive   Negative
d00200: TLS w/ selected Cipher  1,160   1,079      81
d00300: TLS w/ other Cipher     1,204   61         1,143
d00015: non-TLS                 3,412   0          3,412

Table 6: TLS signature over different datasets

Table 7: Recall on Nugache (N=32)

In Table 8, we summarize the testing results of the Nugache signature over the three datasets. It is encouraging that the Nugache signature generates no false positives, and so has a precision of 100%. Even with obfuscation techniques applied, a small portion of the traffic will still appear to have low entropy.
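The headline figures follow directly from the counts reported above, reading the 61 positives on d00300 as false positives, consistent with the precision figure in the text:

```python
# Counts from the TLS results (1-4-8 measure, xi = 9) and the Nugache results.
tls_tp, tls_fn = 1079, 81     # d00200: TLS with the selected cipher
tls_fp = 61                   # d00300: TLS with other ciphers, wrongly matched
nug_tp, nug_fp = 162, 0       # Nugache hits; no hits on the TLS/non-TLS datasets

tls_recall = tls_tp / (tls_tp + tls_fn)        # 1079 / 1160 ~ 93.0%
tls_precision = tls_tp / (tls_tp + tls_fp)     # 1079 / 1140 ~ 94.6%
nug_precision = nug_tp / (nug_tp + nug_fp)     # 162 / 162 = 100%

print(f"TLS: recall {tls_recall:.2%}, precision {tls_precision:.2%}")
print(f"Nugache: precision {nug_precision:.2%}")
```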
One may argue that high entropy does not necessarily imply encryption, as compressed data or multimedia data may also exhibit it. The critical point, however, is the distribution of high-entropy data blocks, not solely their presence. Moreover, a study [32] provides evidence against this "common sense": it showed that multimedia files can in fact yield low entropy, although the authors also pointed out that in some cases compressed files do have high entropy. Such cases require a much closer look, which we leave for future work. Furthermore, encodings such as base64 [16] can significantly reduce the entropy of a string; for this case, we assume that a base64 detector as well as a decoder could be deployed to canonicalize the traffic data. It is also possible to easily inject arbitrary bytes to disturb the original distribution of high- and low-entropy blocks. In that case, we consider the result a new protocol whose traffic could possibly be fingerprinted, e.g. using optional units as we did for TLS; if the signature generation process is automated, this approach would still be efficient. However, if more advanced obfuscation techniques [5, 10] are applied, our approach will fail to identify the obfuscated protocol. Nevertheless, our proposed techniques may still be used to detect the obfuscation techniques themselves.

To avoid being fingerprinted, malware could adopt plain TLS instead of customizing the protocol, running the risk of SSL inspection; this may explain why only 10% of malware samples actually utilize TLS. Nevertheless, prior work [1] also found that malware and botnets utilize TLS in a very customized way, i.e. advertising significantly fewer cipher suites than enterprise TLS clients. A shorter list of cipher suites reduces the control information (i.e. low-entropy blocks) and may therefore end up with different fingerprints than enterprise-grade TLS clients. Investigating how effective our approach would be in such a scenario is left for future work. Under certain circumstances, our approach may not be sufficient to rule out all possible false positives, and we would recommend coordinating with other tools to reduce them. Last but not least, we are interested in looking at more diverse data, such as compressed data, SSH, and other malware traffic.

Dataset         Desc      Total   Positive   Negative
-               Nugache   175     162        13
d00200, d00300  TLS       2,364   0          2,364
d00015          non-TLS   3,412   0          3,412

Table 8: Nugache fingerprint over different datasets
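The base64 point made above is easy to verify empirically. The sketch below (illustrative; `byte_entropy` is our helper, not a function from this work) computes the byte-level Shannon entropy of random data before and after base64 encoding; the encoding maps arbitrary bytes onto a 64-character alphabet, capping the entropy at log2(64) = 6 bits per byte:

```python
import base64
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of data, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

raw = os.urandom(4096)          # stand-in for high-entropy key material
enc = base64.b64encode(raw)

print(f"raw:    {byte_entropy(raw):.2f} bits/byte")   # close to 8
print(f"base64: {byte_entropy(enc):.2f} bits/byte")   # at most ~6
```

This is why we assume a base64 detector and decoder in the canonicalization step: the decoded stream recovers the near-maximal entropy that the fingerprint expects.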
In this paper, we proposed a novel voting-based method for accurately detecting high-entropy blocks, e.g. key material, in a network traffic stream, and a regular-expression-based method for generating a scalable fingerprint from the identified high-entropy blocks. Our approach can effectively put malware authors on the defensive: a longer key, used for a more strongly encrypted connection, makes the traffic more easily characterized and therefore more detectable, whereas a shorter key, used to make the connection less vulnerable to detection, yields a less secure connection.
References
1. B. Anderson, S. Paul, and D. McGrew. Deciphering malware's use of TLS (without decryption). arXiv preprint arXiv:1607.01639, 2016.
2. A. Antos and I. Kontoyiannis. Convergence properties of functional estimates for discrete distributions. Random Struct. Algorithms.
5. Obfsproxy: The next step in the censorship arms race. https://blog.torproject.org/blog/obfsproxy-next-step-censorship-arms-race, 2012.
6. D. Dittrich and S. Dietrich. P2P as botnet command and control: a deeper insight. In Proceedings of the 3rd International Conference on Malicious and Unwanted Software (Malware), 2008.
7. P. Dorfinger, G. Panholzer, and W. John. Entropy estimation for real-time encrypted traffic identification. In Proceedings of the Third International Conference on Traffic Monitoring and Analysis (TMA'11), pages 164–171, Berlin, Heidelberg, 2011. Springer-Verlag.
8. Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In Proceedings of the 22nd USENIX Security Symposium, August 2013.
9. K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Protocol misidentification made easy with format-transforming encryption. In Proceedings of the 20th ACM SIGSAC Conference on Computer and Communications Security, pages 61–72. ACM, 2013.
10. K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Protocol misidentification made easy with format-transforming encryption. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013), November 2013.
11. B. Efron and C. Stein. The jackknife estimate of variance. The Annals of Statistics, 9(3):586–596, 1981.
12. T. Ylonen and C. Lonvick. RFC 4251: The Secure Shell (SSH) Protocol Architecture. http://tools.ietf.org/html/rfc4251, January 2006.
13. A. Freier, P. Karlton, and P. Kocher. RFC 6101 (historic): The Secure Sockets Layer (SSL) Protocol Version 3.0. http://tools.ietf.org/html/rfc6101, August 2011.
14. G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee. BotHunter: Detecting Malware Infection Through IDS-driven Dialog Correlation. In Proceedings of the 16th USENIX Security Symposium, 2007.
17. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '05), pages 229–240, New York, NY, USA, 2005. ACM.
18. G. A. Miller. Note on the bias of information estimates. In Information Theory in Psychology: Problems and Methods, pages 95–100, 1955.
19. N. Moustafa and J. Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Military Communications and Information Systems Conference (MilCIS), 2015, pages 1–6. IEEE, 2015.
20. J. Olivain and J. Goubault-Larrecq. Detecting subverted cryptographic protocols by entropy checking. Research Report LSV-06-13, Laboratoire Spécification et Vérification, ENS Cachan, France, June 2006. 19 pages.
21. L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, June 2003.
22. E. Rescorla. RFC 2631: Diffie-Hellman Key Agreement Method. https://tools.ietf.org/html/rfc2631, 1999.
23. C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. Dietrich, and H. Bos. SoK: P2PWNED - modeling and evaluating the resilience of peer-to-peer botnets. In Proceedings of the 2013 IEEE Symposium on Security and Privacy, pages 97–111, May 2013.
24. T. Schürmann. Bias analysis in entropy estimation. Journal of Physics A: Mathematical and General, pages 295–301, 2004.
25. C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, July, October 1948.
26. S. Stover, D. Dittrich, J. Hernandez, and S. Dietrich. Analysis of the Storm and Nugache Trojans: P2P is here. USENIX ;login:, 32(6), December 2007.
27. L. Wang, K. P. Dyer, A. Akella, T. Ristenpart, and T. Shrimpton. Seeing through network-protocol obfuscation. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 57–69. ACM, 2015.
28. A. M. White, S. Krishnan, M. Bailey, F. Monrose, and P. A. Porras. Clear and present data: Opaque traffic and its security implications for the future. In NDSS, 2013.
29. C. V. Wright, F. Monrose, and G. M. Masson. On inferring application protocol behaviors in encrypted network traffic. Journal of Machine Learning Research, 7:2745–2769, December 2006.
30. T.-F. Yen and M. K. Reiter. Traffic aggregation for malware detection. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 207–227. Springer, 2008.
31. H. Zhang and C. Papadopoulos. Early detection of high entropy traffic. In IEEE Conference on Communications and Network Security (CNS), pages 104–112, September 2015.
32. H. Zhang, C. Papadopoulos, and D. Massey. Detecting encrypted botnet traffic. In