Fingerprinting Cryptographic Protocols with Key Exchange using an Entropy Measure
Shoufu Luo and Sven Dietrich; The Graduate Center, City University of New York, [email protected]; John Jay College of Criminal Justice, City University of New York, [email protected]
Abstract.
Encryption has increasingly been used in all applications for various purposes, but it also brings big challenges to network security. In this paper, we take first steps towards addressing some of these challenges by introducing a novel system to identify key exchange protocols, which are usually required if encryption keys are not pre-shared. We observed that key exchange protocols yield certain patterns of high-entropy data blocks, e.g. as found in key material. We propose a multi-resolution approach for accurately detecting high-entropy data blocks and a method of generating scalable fingerprints for cryptographic protocols. We provide experimental evidence that our approach has great potential for identifying cryptographic protocols by their unique key exchanges, and furthermore for detecting malware traffic that includes customized key exchange protocols.
1 Introduction

In the network security field, the use of encryption for malicious purposes brings new challenges to network security defense. For example, encryption has prevented botnet traffic from being inspected and detected by defense systems based on deep-packet inspection (DPI), which used to be very effective up to that point. For symmetric encryption and decryption, a secret key k shared among the two communicating parties is required, either pre-shared or negotiated on the fly using cryptographic key-exchange protocols. Most common cryptographic protocols [4, 12, 13] that use symmetric encryption to secure the channel employ a key exchange protocol, such as the Diffie-Hellman key exchange [22].

Depending on the protocol design, key material is distributed differently along the traffic stream. As key material has high entropy compared to normal traffic, the traffic for the key exchange exhibits detectable characteristics, namely the uniqueness of the distribution of key material allowing for proper discriminating characteristics, as shown in Figure 1. Using an entropy metric, it may not be hard to test the hypothesis whether a byte string is "random," if that byte string is sufficiently long. The problem becomes harder if the given string is relatively short, i.e. undersampled, or if the goal is to identify which part of the string contains random bytes, in particular deciding the boundaries of those random bytes (also known as blocks of interest). It is therefore challenging to characterize a stream by the distribution of embedded random bytes, or so-called high-entropy blocks.

Fig. 1: Visualization of Entropy Distribution: dark portions are high-entropy blocks.

To avoid being treated as an anomaly, malware might try to use standard cryptographic protocols (e.g. SSL/TLS) for secure communication, effectively preventing DPI. However, standard protocols such as SSL can potentially be subject to a man-in-the-middle attack.
However, malware in general tends to avoid using standard protocols and instead employs a customized variant. Only 10% of malware utilize TLS as a form of encryption, according to a recent study [1]. To ensure fresh key material, a new key exchange is desirable for every new command-and-control (C&C) session of the malware [6, 26].

Our work offers a systematic way to characterize network traffic through key exchange behaviors and generate scalable fingerprints based on detected high-entropy blocks. The system mainly consists of two parts: the high-entropy block detection and the fingerprint generation. First, we aim to identify high-entropy blocks from a traffic stream using sample entropy via a sliding window. Second, with all high-entropy blocks identified, entropy-based fingerprints for network flows are generated from the distribution of high-entropy blocks. Our contribution also includes:

– A new method of identifying cryptographic protocols, raising the bar for malicious activities that abuse customized cryptographic protocols to evade inspection.
– A voting mechanism that efficiently boosts the accuracy of entropy estimation when undersampled, using a multi-resolution analysis.
– A statistical approach to estimate the range of high-entropy data blocks and build scalable entropy-based fingerprints for key exchange protocols in the form of regular expressions.

To the best of our knowledge, our work is the first attempt to fingerprint key exchange protocols by the distribution of key material and to apply such a technique to malware detection. By design, our approach can be implemented and deployed as a standalone system. However, the intention is not to replace any existing detection techniques, but rather to complement them. This system can be built into existing systems as a plug-in component, in particular those relying on a certain degree of payload analysis, e.g. [30]. Moreover, a component of our system can be a useful tool for the security community, e.g.
for identifying high-entropy portions of a given data block, such as the detection of packed malware binaries.

Related Work
Olivain et al. [20] proposed to use cumulative entropy of network flows for the detection of specific attack behaviors targeted at known cryptographic protocols, i.e. SSL. Instead of an aggregation, our work aims to fingerprint the entropy distribution along the examined traffic. Our approach is still applicable for their purpose in a more precise way. Meanwhile, we adopt the technique they propose,
N-truncated entropy, for entropy estimation, which is also used by Dorfinger et al. [7] for classifying encrypted and unencrypted traffic. There is prior work [28] that shows how entropy tests can be used to detect encrypted or compressed packets in network streams. Again, we provide a more reliable mechanism to detect high-entropy areas as one of our essential contributions.

Our work shares an interest with the field of protocol identification. Most of the work in that field is mainly learning-based, relying on network-observable features [17, 29]. For example, Wright et al. [29] proposed to identify the cryptographic protocol of individual encrypted TCP connections using post-encryption observable features, such as timing, size, direction, etc. To some extent, our approach can also be applied for this purpose. However, there are known obfuscation techniques which could be used to evade this, such as obfsproxy [5] and FTE [9]. As discussed in [27], obfuscation can be detected with entropy-based tests over the packet payloads. Our approach does the same by extracting entropy-based fingerprints.

Zhang et al. [32] proposed to detect encrypted traffic by looking for N sequential high-entropy packets among the first M packets of a network flow, adopting the cumulative entropy technique. In 2015, Zhang et al. [31] improved their previous work by detecting high-entropy flows as an additional measure for scoring a host as a bot for BotHunter [14]. Applicable to the same problem, our approach differs from theirs by fingerprinting malware with customized cryptographic protocols, such as Nugache, as will be shown. Unlike their work, our work does not rely on another system for detection.

The rest of this paper is organized as follows. We begin with background on entropy and its estimators. In section 3, we discuss our methodology in detail, including how to identify high-entropy blocks, a voting mechanism, as well as a filtering method for false positive reduction, etc.
Following that, section 4 presents the evaluation and analysis of our approach with three different datasets. Finally, we conclude this study by discussing limitations and directions for future work.

2 Background

Introduced by Shannon [25], entropy is a measure of the amount of information that is missing before reception. In the context of cryptography, it is used as a measure of randomness (or uncertainty), equating higher entropy with higher randomness. Let X be a discrete random variable under an arbitrary distribution P on a countable alphabet Σ = {x_1, ..., x_m}. Shannon entropy is defined by equation (1):

    H(X) = − Σ_{i=1}^{m} p(x_i) log p(x_i)    (1)

The entropy H(X) reaches its maximum value when all p(x_i) are equal to 1/m, i.e. when P is uniform. In cryptography, as a fundamental requirement of security, key material should have high entropy in order to be hard to predict.

Entropy can easily be obtained from equation (1) for a random variable whose probability distribution is known. In practice, however, P remains unknown in most scenarios. Frequently, p(x_i) can still be estimated by the relative frequency of the outcome x_i over a large number of trials: the estimated probability of x_i is p̂(x_i) = n_i/N, where n_i is the number of times x_i occurs and N is the total number of trials or samples. Hereby, the sample entropy, a.k.a. the maximum likelihood estimator (MLE) [2], is given by the equation below:

    Ĥ_N^MLE(X) ≡ − Σ_{i=1}^{m} p̂(x_i) log p̂(x_i)    (2)

The MLE is an asymptotically unbiased estimator of H(X): as N tends to infinity, p̂(x_i) approximates p(x_i) and Ĥ_N^MLE(X) approximates the real H(X). When N is not sufficiently large, namely undersampled, Ĥ_N^MLE(X) is highly biased, in particular when N < m or N ∼ m. There is no universal rate at which the error of the MLE relative to H(X) approaches zero [2].
There are attempts that aim to subtract the bias directly, such as the Miller-Madow corrector [18], the Jackknife corrector [11] and the Paninski corrector [21]. However, the bias is still significantly high when N < m or N ∼ m. Moreover, it has been proven difficult to find an unbiased estimator [21, 24]; the Paninski corrector is unbiased, but only if P has a uniform distribution, which cannot be guaranteed. Furthermore, according to the study in [20], Ĥ_N^MLE(X) ∼ H(X) is valid only if N ≫ m, which typically means N several times as large as m. In other words, if Σ = {0, ..., 255} (i.e. m = |Σ| = 256), around 2,000 samples would be required to possibly obtain a reasonable entropy estimate. That makes it impractical for the purpose of profiling network traffic, as key material is usually at most hundreds of bytes (256 bytes = 2048 bits). For example, in a typical TLS handshake, a client random number only contains 28 bytes.

2.1 N-truncated Entropy H_N(X)

Similar to Olivain et al. [20], an accurate entropy value is not our main focus, but rather the probability of a string being generated from a uniform distribution. The
N-truncated entropy H_N(X) proposed by Olivain et al. meets our needs; it is the average of the sample entropy Ĥ_N^MLE(X) over all strings of length N drawn at random from the distribution P, as defined below:

    H_N(X) = Σ_{n_0+...+n_{m−1}=N} [ (N; n_0, ..., n_{m−1}) · Π_{i=0}^{m−1} p_i^{n_i} · ( − Σ_{i=0}^{m−1} (n_i/N) log (n_i/N) ) ]    (3)

where (N; n_0, ..., n_{m−1}) denotes the multinomial coefficient. By construction, Ĥ_N^MLE(X) is an unbiased estimator of H_N(X) for an arbitrary distribution P. More importantly, Ĥ_N^MLE(X) gives a statistical indication of how close the distribution P is to being uniform, by comparison with Ĥ_N^MLE(W), where W is a random variable under the uniform distribution U. In section 3.2, we describe how to obtain both values. Alternatively, for a concrete string s of length N with each sample drawn from P, we write Ĥ_N^MLE(s) instead of Ĥ_N^MLE(X); to differentiate, w is used for a string drawn from the uniform distribution U. H_N(X) has an upper bound of log min{m, N}, as it reaches its maximum value when all p̂(x_i) are equal, either p̂(x_i) = 1/N if N < m, or p̂(x_i) = 1/m otherwise. In either case, uncertainty reaches its maximum.

3 Methodology

In this section, we discuss in detail the techniques we used and developed, accompanied by experimental evidence.
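The sample entropy of equation (2) is the basic primitive of this section. The following is a minimal illustrative sketch (ours, not the authors' implementation) that computes it for a byte string; the tau parameter reinterprets the bytes as smaller symbols, anticipating the multi-resolution measures introduced later:

```python
import math
from collections import Counter

def sample_entropy(data: bytes, tau: int = 8) -> float:
    """MLE sample entropy (equation 2) of `data`, in bits per symbol.

    The byte string is read as symbols of `tau` bits each (tau must
    divide 8): tau=8 is the byte alphabet (m=256), tau=1 the bit
    alphabet (m=2).
    """
    assert 8 % tau == 0
    symbols = [(byte >> shift) & ((1 << tau) - 1)
               for byte in data
               for shift in range(8 - tau, -1, -tau)]
    n = len(symbols)
    counts = Counter(symbols)
    # H = -sum p_hat * log2(p_hat), with p_hat = count / n
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `sample_entropy(bytes(range(256)))` is exactly 8 bits per symbol (every byte value occurs once), while a constant block scores 0.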
3.1 Sliding Window

To obtain entropy information for different portions of the traffic stream, a sliding window moves over the traffic with a step of one byte, while sample entropy is measured for each chunk of bytes in that window. The bytes in each window form a block.

The window size determines the sample size, which directly impacts the accuracy of the sample entropy. If the sample size is too small, the sample entropy might not be accurate enough to be meaningful. Equation (4) roughly estimates the probability of an N-byte string appearing to be "random", i.e. each character of the alphabet occurring at most once in the string. Fix Σ to be the byte alphabet and let m = 256. For N = 16, i.e. a 16-byte sliding window, Pr[X = e] = 0.6197; that is, roughly a 62% probability that an arbitrary string appears random, i.e. a substantial chance of a false positive. However, if N = 32, Pr[X = e] = 0.082. This confirms the discussion in Paninski et al. [21] that one should never use fewer than 16 bytes for entropy estimation when the byte alphabet is used.

    Pr[X = e] = (m/m) · ((m−1)/m) · ... · ((m−N+1)/m) = Π_{i=0}^{N−1} (m−i)/m    (4)

If the sliding window grows too large, it is likely to mix high-entropy areas with low-entropy areas, blurring the difference between them. As shown in Figure 2, when the window size is small, e.g. 16 bytes, the curve is fuzzy and has too many valleys (low-entropy) and peaks (high-entropy), while as the window size grows larger, e.g. 1024 or 2048 bytes, the curve becomes flatter and valleys or peaks are no longer distinctive.

Fig. 2: Entropy plot of a TLS traffic sample using different sliding window sizes, from bottom to top (bytes): 16, 32, 64, 128, 256, 512, 1024, 2048.

A smaller window is more likely to mistakenly identify a non-random data area as "random" (false positive), while a larger window possibly fails to identify a real high-entropy area (false negative). The choice of window size will heavily depend on the minimum length of the key material of interest.
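Equation (4) is easy to check numerically; the small sketch below (ours) reproduces the N = 16 figure quoted above:

```python
def prob_all_distinct(n: int, m: int = 256) -> float:
    """Equation (4): probability that n draws from an m-symbol uniform
    alphabet are pairwise distinct, i.e. the string 'appears random'."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p

print(round(prob_all_distinct(16), 4))  # 0.6197 for a 16-byte window
```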
In the case of TLS, we choose a 32-byte sliding window, as it suits the minimum length of interest, i.e. the 28-byte client random number. In summary, as the window slides over the data with a one-byte step, each block is labeled as either high-entropy or low-entropy. A list of consecutive high-entropy blocks or consecutive low-entropy blocks then forms a unit, more precisely a high-entropy unit or a low-entropy unit, respectively.

3.2 Estimating H_N(U)

To identify a high-entropy block, we follow the idea used by [20], i.e. the Monte-Carlo method, as it provides a level of confidence of a string being random. We first repeatedly generate strings of length N with each byte sampled from a random source, e.g. /dev/urandom on MacOS X. Then, we calculate the mean µ and standard deviation σ of the sample entropy over all samples. Here, µ and σ summarize the distribution of the sample entropy of random strings of length N. For a specific number t of standard deviations, we can obtain the proportion of sample strings falling within the range µ ± t × σ. This proportion provides us with a confidence of a string being random if it falls within the given range. As exceeding the upper bound does not affect the randomness of the string, we ignore the upper bound and use the lower bound as a cutoff for a string being random, denoted by θ, with a confidence given by the proportion ρ:

    θ = µ(Ĥ_N^MLE(w)) − t × σ(Ĥ_N^MLE(w)),   ρ = (number of samples above θ) / (number of samples)    (5)

Consequently, any string falling below the threshold is considered not random, i.e. a low-entropy block. Similarly, any string falling above the threshold is considered random, i.e. high-entropy. Table 1 shows thresholds (θ) for w using different window sizes (N) above a minimum confidence level of 99.0%.

N      µ        σ        t     θ       ρ
16     3.94199  0.08290  2.8   3.7098  99.2%
32     4.88171  0.08134  2.7   4.6620  99.3%
64     5.76562  0.07664  2.6   5.5663  99.2%
128    6.55003  0.06733  2.5   6.3817  99.2%
256    7.17518  0.05240  2.5   7.0441  99.2%
512    7.59073  0.03364  2.4   7.5099  99.0%
1024   7.80894  0.01726  2.5   7.7658  99.1%
2048   7.90804  0.00814  2.5   7.8877  99.2%
Table 1: Ĥ_N^MLE(w) under various configurations

The confidence measures the confidence that a string is not random when it falls outside the range, rather than the confidence that a string is random when it falls within the range. For example, let N be 64 and Σ the byte alphabet; then µ = 5.7656 and σ = 0.0766. With 99.4% of samples above θ = µ − 3σ = 5.53569 (i.e. t = 3), we would have at least 99.4% confidence that a string s with Ĥ_N^MLE(s) = 5.5120 is not close to random, i.e. not a high-entropy block. Here, t is our control variable. We can choose a smaller t to tighten the range with a higher confidence, or a larger t to loosen the range, but with a lower confidence. In our study, we choose t tightly to obtain a relatively high confidence of at least 99.0%. With the threshold, we can then transform the sample entropy score to either one or zero. The plot turns into a square wave where one indicates high entropy and zero indicates low entropy, as shown in Figure 3. The shadow in the upper plot shows the cutoff.

Fig. 3: Normalization: high-entropy blocks

3.3 Voting with a Multi-resolution Σ

Due to statistical limitations, some data blocks may mistakenly be labeled as high-entropy blocks, i.e. false positives, which will mislead the fingerprint and therefore must be avoided or minimized. In order to achieve this, we devised a voting mechanism using multi-resolution analysis, utilizing the choice of alphabet Σ. As will be shown, this mechanism dramatically reduces the rate of false positives.

Thus far we have based our discussion on the choice of Σ as the byte alphabet (m = 256), with each character being a byte. In cryptography, however, the randomness of key material is defined at a more restrictive level, i.e. at the bit level, and thereby Σ = {
0, 1} (m = 2). Consider one experiment of tossing one coin with two outcomes, and another experiment of tossing eight independent coins, each with two outcomes. According to basic probability theory, if each coin is uniformly drawn from Σ = {
0, 1}, the outcome of the eight coins (the byte alphabet) will still follow a uniform distribution. In our estimation of Ĥ_N^MLE(w), we do generate each random byte by randomly sampling eight times over {
0, 1} for all our sample strings. That being said, given that each bit is independently sampled uniformly from {
0, 1}, we can choose a random variable of a different number τ of bits (i.e. coins), and such a random variable is guaranteed to have a uniform distribution.

As an extension to our previous computation of Ĥ_N^MLE(w), we outline the thresholds and their confidence levels for different τ while fixing N to 32. We use the term τ-bit measure, e.g. 2-bit measure. Previously, N could be interpreted as both the window size and the sample size. In the case of a τ-bit measure, the sample size changes, i.e. it becomes 8N/τ (τ ≤ 8). The τ-bit measure does not change the fundamentals of N-truncated entropy, as it simply uses a larger sample size and a different alphabet.

Table 2: τ-bit measure Ĥ^MLE(w)

Statistical methods such as sample entropy generally ignore potential structures or patterns occurring in the data. Therefore, a string with a high sample entropy score is not guaranteed to be random. For example, given a hexadecimal string s = "55 55 aa aa", i.e. 0101 0101 0101 0101 1010 1010 1010 1010 in binary, we have p̂_0 = p̂_1 = 1/2 if the 1-bit measure (τ = 1) is used, i.e. Σ = {
0, 1}, and then Ĥ_N^MLE(s) = 1, its maximum. Consequently, s will be labeled as high-entropy bytes despite not being random at all. Taking another example from the real world, consider a hexadecimal string from a TLS session: 16 03 01 0c 13 0b 00 0c 0f 00 0d 0e 10 04 7a 30 82, which is a block of control information from the TLS handshake traffic. The two bytes 03 01 indicate the TLS version, i.e. TLS 1.0, 0c 13 the length, 0b the protocol type, and another 3 bytes of length 00 0c 0f. Such a block may nonetheless appear "random" if only an 8-bit measure is used. Such cases are prone to false positives and mislead the process.

Fig. 4: A traffic sample from a TLS 1.2 session with a 1024-bit RSA public key.

However, the idea is that if a string is truly random, then no matter which τ-bit measure is used, its sample entropy Ĥ_N^MLE(s) should always be close to Ĥ_N(U). Thus, we propose a voting mechanism instead of a sole τ-bit measure. The voting rule is: if any of the chosen τ-bit measures rejects the randomness of a block, the block is labeled as non-random. It is a simple AND operation over the outcomes of all measures. Figure 4 shows the effectiveness of combining three τ-bit measures, where the resulting signature by voting precisely outlines all high-entropy blocks in the TLS session. The last plot line, X-signature, is based on voting over the three 1-bit, 4-bit and 8-bit measures.

3.4 Noise Filtering

Our voting mechanism effectively reduces false positives. However, in some scenarios, this approach may still not be sufficient to eliminate all of them. Control information is commonly known to have low entropy, yet there is still a chance that all τ-bit measures falsely identify an ordinary block as high-entropy because of some small amount of accidental randomness in the data. Where there are supposedly no high-entropy data blocks, a data block with accidental randomness should be shorter than the minimum length of interest, so the resulting detected high-entropy units will be relatively small compared to those produced by an actual high-entropy data block of interest. A filtering threshold, denoted ξ, can therefore be chosen to eliminate those small high-entropy units. Our empirical study suggests ξ = 9 to be a good choice when a 32-byte sliding window is used for detecting high-entropy key material blocks of at least 20 bytes.
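The Monte-Carlo thresholds of equation (5) and the voting rule can be sketched together as follows (an illustrative sketch, not the authors' code; the 2,000-sample run, the fixed seed, and all helper names are our choices):

```python
import math
import random
from collections import Counter

def sample_entropy(data: bytes, tau: int) -> float:
    """MLE sample entropy of `data` read as symbols of tau bits (tau divides 8)."""
    syms = [(b >> s) & ((1 << tau) - 1)
            for b in data for s in range(8 - tau, -1, -tau)]
    n = len(syms)
    return -sum(c / n * math.log2(c / n) for c in Counter(syms).values())

def threshold(n_bytes: int, tau: int, t: float = 2.7,
              trials: int = 2000, seed: int = 42) -> float:
    """Equation (5): theta = mu - t*sigma over the sample entropy of
    `trials` random strings of length n_bytes (Monte-Carlo)."""
    rng = random.Random(seed)
    scores = [sample_entropy(rng.randbytes(n_bytes), tau) for _ in range(trials)]
    mu = sum(scores) / trials
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / trials)
    return mu - t * sigma

# thresholds for the 1-, 4- and 8-bit measures over a 32-byte window
THETAS = {tau: threshold(32, tau) for tau in (1, 4, 8)}

def is_high_entropy(block: bytes) -> bool:
    """Voting: high-entropy only if every tau-bit measure agrees (AND)."""
    return all(sample_entropy(block, tau) >= th for tau, th in THETAS.items())
```

For example, the structured string "55 55 aa aa" repeated to 32 bytes attains the maximum 1-bit entropy yet falls far below the 8-bit threshold, so the vote labels it low-entropy.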
That means that if no more than 9 consecutive high-entropy blocks are detected between two low-entropy units, a false positive is identified and filtered out. Here, "filtering out" means relabeling these blocks as low-entropy instead of high-entropy.

3.5 Estimating Unit Lengths

Beyond identifying high-entropy blocks, it is also essential to describe the length of each unit in order to fingerprint the shape of the square wave shown in Figure 4. Due to the statistical nature of the measurement, the length of each unit (i.e. the number of consecutive high-entropy or low-entropy blocks detected) may vary: when the sliding window partially overlaps the target random bytes, it may continue to yield high-sample-entropy blocks until the window has moved sufficiently far from the target. For example, a TLS traffic stream contains a client random number as a chunk of 28 bytes. It is not difficult to anticipate that more than exactly one high-entropy block will be detected in this case, and that the total number of high-entropy blocks detected around that chunk of data will vary from case to case. However, our intention is not to determine an absolute value for each unit across all cases, but rather a reasonable range. Hereby, we resort to
Monte-Carlo methods to empirically estimate the range. For example, to estimate the length of the high-entropy unit around the client random bytes, we sampled 100,000 client hello messages from TLS sessions. The result shown in Figure 5 indicates that most lengths for the 28-byte client random string followed by the list of cipher suites fall within a range of six to twenty-four high-entropy blocks. If a 32-byte TLS session ID (also random bytes) is present along with the client random bytes, adding up to 60 bytes, we obtain a range of [38,
52] as shown in Figure 5. A more conservative range would be [20, …].

Fig. 5: Distribution of the length of detected high-entropy blocks. Left: over the TLS 28-byte client random string. Right: over the TLS 28-byte client random string and the 32-byte session ID.

3.6 Fingerprint Generation

Fingerprinting is the process of profiling a key exchange protocol by the distribution of high-entropy blocks along traffic streams generated by that protocol. An entropy-based fingerprint is a series of interleaving high-entropy units and low-entropy units, with the length of each unit specified as a range. High-entropy units have to interleave with low-entropy ones because otherwise two adjacent high-entropy or low-entropy units would be merged into one. Let (s, l, r) represent one unit, where s ∈ {0, 1} and l, r ∈ Z⁺, with s the sign indicating a high-entropy or low-entropy unit, l the minimum length and r the maximum length. An entropy-based fingerprint then is the concatenation of an ordered list of (s, l, r) with s alternating between one and zero. Alternatively, it can be concisely expressed as below, where s_i ∈ {0, 1} and l_i, r_i ∈ Z⁺. The benefit of such a representation is that it aligns with standard regular expressions, so the matching process can be done very efficiently. The regular expression form provides a flexible way of expressing the fingerprint, for instance optional units, as will be shown in the experiment section.

    ∏_{i=1}^{n} s_i{l_i, r_i},   s_i ≠ s_{i+1}

Fingerprinting is straightforward, in three steps: (1) identify high-entropy and low-entropy areas (units) of the anticipated traffic from a cryptographic protocol; (2) follow the technique described in section 3.5 and estimate the range for each area; (3) formalize the units as a regular expression. Taking TLS with a cipher suite of DHE-RSA-* as an example, the fingerprint has the form

    1{l_1, r_1} 0{l_2, r_2} 1{l_3, r_3} 0{l_4, r_4} 1{l_5, r_5} …
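Since a fingerprint is an ordered list of (s, l, r) units, it maps directly onto a regular expression over the 0/1 label string. A sketch (ours), with illustrative ranges rather than the paper's actual TLS values:

```python
import re

def fingerprint_regex(units: list[tuple[int, int, int]]) -> str:
    """Concatenate (s, l, r) units into a regex over the 0/1 label
    string, e.g. [(1, 6, 24), (0, 3, 10)] -> '1{6,24}0{3,10}'."""
    return "".join(f"{s}{{{l},{r}}}" for s, l, r in units)

# hypothetical three-unit fingerprint (ranges illustrative only)
fp = re.compile(fingerprint_regex([(1, 6, 24), (0, 3, 10), (1, 38, 52)]))

print(bool(fp.fullmatch("1" * 10 + "0" * 5 + "1" * 40)))  # True
```

Because the fingerprint compiles to an ordinary regular expression, matching a label string is a single `fullmatch` call.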
During the detection phase, we have these steps: (1) scan the traffic stream by sliding a window over it and estimating the sample entropy of each window using different τ-bit measures; (2) normalize each block by its entropy score to either one or zero using the pre-calculated threshold θ; (3) perform the voting (i.e. AND) over the outcomes of the measures; (4) filter out the noise using the filter threshold; (5) use the regular expression to match the predefined fingerprint against the output (i.e. a string consisting of zeros and ones).

4 Evaluation

In our demonstration, we emphasize the DHE-RSA-* cipher suites for the TLS protocol, as our approach aims to profile one particular key exchange protocol and TLS is capable of using different key exchange protocols. SSL has evolved over time into the standard TLS protocol, which supports a long list of cipher suites with different key exchange protocols. To demonstrate, we choose to profile one set of key exchange cipher suites, i.e. DHE-RSA-* (see Table 3). By contrast, as an application of our system, most botnet C&C protocols are much simpler, as most of them are designed for the sole purpose of performing a limited number of tasks. SSL/TLS is a well-known cryptographic protocol of fair complexity; the successful characterization of the TLS protocol gives us the full ability to characterize other, simpler botnet C&C protocols. For evaluation, we first use TLS as our primary target and later extend to the Nugache botnet. All streams are bidirectional, and the packets of a stream are correctly ordered with all TCP/IP headers removed. tshark [3] was used as the primary tool to process network traces in pcap format [15].
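The five detection steps above can be sketched end-to-end. For brevity, this illustrative sketch (ours) uses a single 8-bit measure with the Table 1 threshold for a 32-byte window instead of the full vote, and implements the noise filter with a regex over the label string:

```python
import math
import re
from collections import Counter

THETA_8BIT = 4.6620   # Table 1: theta for N=32, 8-bit measure
WINDOW, XI = 32, 9

def entropy(block: bytes) -> float:
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())

def labels(stream: bytes) -> str:
    """Steps (1)-(2): slide a WINDOW-byte window one byte at a time and
    normalize each block to '1' (high-entropy) or '0' (low-entropy)."""
    return "".join("1" if entropy(stream[i:i + WINDOW]) >= THETA_8BIT else "0"
                   for i in range(len(stream) - WINDOW + 1))

def filter_noise(lab: str, xi: int = XI) -> str:
    """Step (4): relabel high-entropy runs of at most xi blocks as noise."""
    return re.sub(r"(?<!1)1{1,%d}(?!1)" % xi,
                  lambda m: "0" * len(m.group()), lab)

def matches(stream: bytes, fingerprint: str) -> bool:
    """Step (5): match the fingerprint regex against the label string."""
    return re.fullmatch(fingerprint, filter_noise(labels(stream))) is not None
```

On a synthetic stream of constant bytes surrounding a 64-byte counter block, this pipeline yields a clean 0…01…10…0 label string that a three-unit fingerprint matches.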
We obtained a data set of TLS network traffic from the ZMap project [8]. Initially, we extracted 16,240 TCP streams on standard port 443 from 800 MB of raw traffic data and further reduced them to 5,794 completed and validated TLS streams¹. Then, we extracted from those 5,794 streams the 1,378 streams that used one of the DHE-RSA-* cipher suites in Table 3. We split the 1,378 instances into two sets: the d00200 set of 218 instances for parameter selection and signature refinement, and the d00300 test set of 1,160 instances for testing the final signature. We also extracted 1,204 TLS instances with other cipher suites. We further extracted 337 Nugache traffic streams from a set of raw Nugache traffic and divided the instances into two groups: a training set of 162 instances and a testing set of 175 instances. As for TLS, we use the training set to tune the fingerprint and the testing set for validation.

In addition, we used 3,412 non-TLS TCP streams, denoted as the d00015 set, from a data set generated by UNSW-NB15 [19]. This data set contains a variety of traffic types, but no TLS traffic, so we can use it as another dimension of negative cases for testing the fingerprints. The table below shows the majority traffic types by service port, only including standard ports under 1024. It does not show the whole spectrum of traffic types in this dataset, but rather provides a quick look. More details on this data set are available in the original paper.

We test the signature generated as previously described over the training set d00200 with thresholds at a confidence ρ above 99.2% for the different measures.

¹ A large portion of the hosts scanned by the ZMap client did not respond or rejected connections for various reasons during TLS negotiation.

Cipher ID   Name
TLS_DHE_RSA_WITH_DES_CBC_SHA
TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA
TLS_DHE_RSA_WITH_SEED_CBC_SHA
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
Table 3: TLS Ciphersuites of Choice: DHE-RSA-*
Port   80   25   22   143   21   111   179   139   110
The results shown in Table 4 do not seem promising at all: even the best results, from the 1-4-8 and 1-2-4-8 measures, only reach recall rates of 62.84% and 64.22%, respectively, at a confidence of 99.85%. They do, however, confirm that the strategy of using multiple τ-bit measures significantly improves the recall rate. It is also interesting to note that the rate for multiple τ-bit measures drops significantly, to below 10%, at a confidence of 99.99%, which is reasonable because the threshold is then too relaxed (with a higher proportion of high-entropy blocks) to be accurate.

Table 4: Recall rate of the original signature for TLS

By manually checking those failures, we found three major issues with our original signature. One is the range of the server random bytes: it was a bit tighter than it appeared, having previously been set to (+, 8, 54), since we reused the range estimated from the client random bytes. That turns out to be inadequate, as the bytes after the server random bytes appear more random than those after the client random bytes, and are therefore more likely to produce a longer high-entropy block. Following the same method as for the client random bytes, we increased the maximum length to 64. The second major issue is that we failed to consider optional random bytes such as the key identifier fields for both the issuer and the subject of the certificate. The third stems from the fact that two high-entropy areas might be adjacent to each other without a sufficient gap and get merged into a larger high-entropy area, e.g. the signature of the certificate and the server key exchange parameters. For the latter two cases, we introduce optional blocks, making the signature scalable: in the regular expression, we can include optional strings. For instance, our TLS signature has been extended to include optional strings as below. This adjustment boosts the recall in most cases, as shown in Table 5. For both the 1-4-8 and 1-2-4-8 measures, the recall increases by around 20%.

‘…
{·,·}(0{·,·}|{·,·}(1{·,·}|{·,·}{·,·}{·,·})0{·,·})…'

Table 5: Recall using the refined fingerprint

The noise threshold is used to remove false positives and make the fingerprint more reliable. As the threshold increases, the detection accuracy for high-entropy blocks increases, since the accidental "high-entropy" blocks are eliminated. At a certain point, however, this elimination may hurt effectiveness, as true high-entropy blocks may be eliminated by an excessively large threshold. We experimented with different filter thresholds ξ using a 4-bit measure, as shown in Figure 6. Given its initial purpose, this parameter should be kept as small as possible while still filtering effectively. Thus, ξ = 9 is chosen based on the empirical results. As suggested by our test results, it also appears to be a proper choice for the other measures, e.g. the 1-4-8 measure.

Test Results.
After the two improvement procedures, i.e. signature refinement and parameter selection, the final test over the testing set d00300 is shown in the table below; the multiple τ-measure 1-4-8 now produces a good recall rate. Finally, we fixed the noise threshold at ξ = 9 and used the 1-4-8 measure, and we summarize our results over the three datasets as follows. Overall, the TLS signature has a precision of nearly 94.6% and an accuracy of around 94%, including negative cases only from d00300 so as to have an equivalent number of positive cases. Negative cases from non-TLS traffic, i.e. d00015, on the other hand, turn out to be relatively trivial, even though some of the instances do contain high-entropy traffic, for example SSH on port 22.

Fig. 6: Noise Threshold Selection over TLS traffic using a 4-bit measure (ξ = 9)

Measure                      TP     FN   Recall
4-bit measure (ρ = 99.20%)   1,056  104  91.03%
1-4-8 measure (ρ = 99.85%)   1,079  81   93.02%

The Nugache botnet was one of the first peer-to-peer botnets to use strong cryptography to protect its C&C channel: inter-peer communication was encrypted using individually negotiated session keys derived via a hybrid RSA/Rijndael scheme [6, 23, 26]. Specifically, Nugache uses a two-way RSA-like key exchange protocol for every session, with a minimum modulus length of 512 bits. That is, one peer sends the length of the key to announce a key exchange, followed by the actual key [6]; the other peer in turn replies with a message of the same length encrypted with that public key. Compared to TLS, signature extraction for Nugache is much easier because of the simplicity of its key exchange. Since there is little control information in the key exchange messages, if we consider the payload only, the signature can simply be defined as 1*, meaning high-entropy blocks everywhere; this is itself a strong detectable characteristic, distinct from other cryptographic protocols.
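The idea of a fingerprint as a regular expression over entropy labels can be sketched as follows. This is an illustration, not the paper's signature language: the per-window 0/1 encoding, the anchored match, and the TLS-like pattern with its optional group are our assumptions (we also use `1+` rather than `1*` so that an empty stream does not match).

```python
import re

def entropy_profile(flags) -> str:
    """Encode per-window entropy labels (1 = high, 0 = low) as a string."""
    return "".join("1" if f else "0" for f in flags)

# Nugache payloads carry almost no control information, so its signature is
# "high-entropy blocks everywhere". A TLS-like signature would instead
# interleave low-entropy control regions with bounded high-entropy runs,
# possibly with optional groups; the (...)? pattern below is a hypothetical
# example of such an optional block, not the actual TLS signature.
NUGACHE_SIG = re.compile(r"1+")
TLS_LIKE_SIG = re.compile(r"0+1{1,8}0+(1{1,4}0+)?1+")

print(bool(NUGACHE_SIG.fullmatch(entropy_profile([1, 1, 1, 1]))))   # True
print(bool(NUGACHE_SIG.fullmatch(entropy_profile([1, 0, 1, 1]))))   # False
```

Because any low-entropy window breaks the match, an all-high-entropy stream is by itself a strong discriminator against protocols that carry plaintext control information.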
Following the same consideration, we choose ξ = 9, which yields a fair recall rate. The initial fingerprint we generated for Nugache includes two high-entropy areas, corresponding to the two-way key exchange. First, we test all τ-bit measures with a fixed noise threshold of ξ = 9. The 2-bit measure produces good results (92.21%, at a confidence ρ above 99%), better than any other τ-bit measure at the same level of confidence. We nevertheless conservatively choose the 1-4-8 measure as our metric in general.

Dataset                         Total   Positive   Negative
d00200: TLS w/ selected Cipher  1,160   1,079      81
d00300: TLS w/ other Cipher     1,204   61         1,143
d00015: non-TLS                 3,412   0          3,412

Table 6: TLS signature over different datasets

Table 7: Recall on Nugache (N=32)

In Table 8, we summarize the testing results of the Nugache signature over the three datasets. It is encouraging that the Nugache signature generates no false positives, and so has a precision of 100%. Even with obfuscation techniques applied, a small portion of the traffic will still appear to have low entropy.
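The headline figures follow directly from the counts reported above, reading the 61 positives on d00300 as false positives, consistent with the precision figure in the text:

```python
# Counts from the TLS results (1-4-8 measure, xi = 9) and the Nugache results.
tls_tp, tls_fn = 1079, 81     # d00200: TLS with the selected cipher
tls_fp = 61                   # d00300: TLS with other ciphers, wrongly matched
nug_tp, nug_fp = 162, 0       # Nugache hits; no hits on the TLS/non-TLS datasets

tls_recall = tls_tp / (tls_tp + tls_fn)        # 1079 / 1160 ~ 93.0%
tls_precision = tls_tp / (tls_tp + tls_fp)     # 1079 / 1140 ~ 94.6%
nug_precision = nug_tp / (nug_tp + nug_fp)     # 162 / 162 = 100%

print(f"TLS: recall {tls_recall:.2%}, precision {tls_precision:.2%}")
print(f"Nugache: precision {nug_precision:.2%}")
```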
One may argue that high entropy does not necessarily imply encryption, as compressed data or multimedia data may also exhibit it. The critical point, however, is the distribution of high-entropy data blocks, not solely their presence. Moreover, a study [32] provides evidence against this "common sense": it showed that multimedia files can in fact yield low entropy, although the authors also pointed out that in some cases compressed files do have high entropy. Such cases require a much closer look, which we leave for future work. Furthermore, encodings such as base64 [16] can significantly reduce the entropy of a string; for this case, we assume that a base64 detector as well as a decoder could be deployed to canonicalize the traffic data. It is also possible to easily inject arbitrary bytes to disturb the original distribution of high- and low-entropy blocks. In that case, we consider the result a new protocol whose traffic could possibly be fingerprinted, e.g. using optional units as we did for TLS; if the signature generation process is automated, this approach would still be efficient. However, if more advanced obfuscation techniques [5, 10] are applied, our approach will fail to identify the obfuscated protocol. Nevertheless, our proposed techniques may still be used to detect the obfuscation techniques themselves.

To avoid being fingerprinted, malware could adopt plain TLS instead of customizing the protocol, running the risk of SSL inspection; this may explain why only 10% of malware samples actually utilize TLS. Nevertheless, prior work [1] also found that malware and botnets utilize TLS in a very customized way, i.e. advertising significantly fewer cipher suites than enterprise TLS clients. A shorter list of cipher suites reduces the control information (i.e. low-entropy blocks) and may therefore end up with different fingerprints than enterprise-grade TLS clients. Investigating how effective our approach would be in such a scenario is left for future work. Under certain circumstances, our approach may not be sufficient to rule out all possible false positives, and we would recommend coordinating with other tools to reduce them. Last but not least, we are interested in looking at more diverse data, such as compressed data, SSH, and other malware traffic.

Dataset         Desc      Total   Positive   Negative
-               Nugache   175     162        13
d00200, d00300  TLS       2,364   0          2,364
d00015          non-TLS   3,412   0          3,412

Table 8: Nugache fingerprint over different datasets
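The base64 point made above is easy to verify empirically. The sketch below (illustrative; `byte_entropy` is our helper, not a function from this work) computes the byte-level Shannon entropy of random data before and after base64 encoding; the encoding maps arbitrary bytes onto a 64-character alphabet, capping the entropy at log2(64) = 6 bits per byte:

```python
import base64
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of data, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

raw = os.urandom(4096)          # stand-in for high-entropy key material
enc = base64.b64encode(raw)

print(f"raw:    {byte_entropy(raw):.2f} bits/byte")   # close to 8
print(f"base64: {byte_entropy(enc):.2f} bits/byte")   # at most ~6
```

This is why we assume a base64 detector and decoder in the canonicalization step: the decoded stream recovers the near-maximal entropy that the fingerprint expects.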
In this paper, we proposed a novel voting-based method for accurately detecting high-entropy blocks, e.g. key material, in a network traffic stream, and a regular-expression-based method for generating a scalable fingerprint from the identified high-entropy blocks. Our approach can effectively put malware authors on the defensive: a longer key, used for a more strongly encrypted connection, makes the traffic more easily characterized and therefore more detectable, whereas a shorter key, used to make the connection less vulnerable to detection, yields a less secure connection.
References
1. B. Anderson, S. Paul, and D. McGrew. Deciphering malware's use of TLS (without decryption). arXiv preprint arXiv:1607.01639, 2016.
2. A. Antos and I. Kontoyiannis. Convergence properties of functional estimates for discrete distributions. Random Struct. Algorithms.
5. Obfsproxy: The next step in the censorship arms race. https://blog.torproject.org/blog/obfsproxy-next-step-censorship-arms-race, 2012.
6. D. Dittrich and S. Dietrich. P2P as botnet command and control: a deeper insight. In Proceedings of the 3rd International Conference on Malicious and Unwanted Software (Malware), 2008.
7. P. Dorfinger, G. Panholzer, and W. John. Entropy estimation for real-time encrypted traffic identification. In Proceedings of the Third International Conference on Traffic Monitoring and Analysis (TMA'11), pages 164–171, Berlin, Heidelberg, 2011. Springer-Verlag.
8. Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In Proceedings of the 22nd USENIX Security Symposium, August 2013.
9. K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Protocol misidentification made easy with format-transforming encryption. In Proceedings of the 20th ACM SIGSAC Conference on Computer and Communications Security, pages 61–72. ACM, 2013.
10. K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Protocol misidentification made easy with format-transforming encryption. In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013), November 2013.
11. B. Efron and C. Stein. The jackknife estimate of variance. The Annals of Statistics, 9(3):586–596, 1981.
12. T. Ylonen and C. Lonvick. RFC 4251: The Secure Shell (SSH) Protocol Architecture. http://tools.ietf.org/html/rfc4251, January 2006.
13. A. Freier, P. Karlton, and P. Kocher. RFC 6101 (historic): The Secure Sockets Layer (SSL) Protocol Version 3.0. http://tools.ietf.org/html/rfc6101, August 2011.
14. G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee. BotHunter: Detecting Malware Infection Through IDS-driven Dialog Correlation. In Proceedings of the 16th USENIX Security Symposium, 2007.
17. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '05), pages 229–240, New York, NY, USA, 2005. ACM.
18. G. A. Miller. Note on the bias of information estimates. In Information Theory in Psychology: Problems and Methods, pages 95–100, 1955.
19. N. Moustafa and J. Slay. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Military Communications and Information Systems Conference (MilCIS), 2015, pages 1–6. IEEE, 2015.
20. J. Olivain and J. Goubault-Larrecq. Detecting subverted cryptographic protocols by entropy checking. Research Report LSV-06-13, Laboratoire Spécification et Vérification, ENS Cachan, France, June 2006. 19 pages.
21. L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, June 2003.
22. E. Rescorla. RFC 2631: Diffie-Hellman Key Agreement Method. https://tools.ietf.org/html/rfc2631, 1999.
23. C. Rossow, D. Andriesse, T. Werner, B. Stone-Gross, D. Plohmann, C. Dietrich, and H. Bos. SoK: P2PWNED - modeling and evaluating the resilience of peer-to-peer botnets. In Proceedings of the 2013 IEEE Symposium on Security and Privacy, pages 97–111, May 2013.
24. T. Schürmann. Bias analysis in entropy estimation. Journal of Physics A: Mathematical and General, pages 295–301, 2004.
25. C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, July, October 1948.
26. S. Stover, D. Dittrich, J. Hernandez, and S. Dietrich. Analysis of the Storm and Nugache Trojans: P2P is here. USENIX ;login:, 32(6), December 2007.
27. L. Wang, K. P. Dyer, A. Akella, T. Ristenpart, and T. Shrimpton. Seeing through network-protocol obfuscation. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 57–69. ACM, 2015.
28. A. M. White, S. Krishnan, M. Bailey, F. Monrose, and P. A. Porras. Clear and present data: Opaque traffic and its security implications for the future. In NDSS, 2013.
29. C. V. Wright, F. Monrose, and G. M. Masson. On inferring application protocol behaviors in encrypted network traffic. Journal of Machine Learning Research, 7:2745–2769, December 2006.
30. T.-F. Yen and M. K. Reiter. Traffic aggregation for malware detection. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 207–227. Springer, 2008.
31. H. Zhang and C. Papadopoulos. Early detection of high entropy traffic. In IEEE Conference on Communications and Network Security (CNS), pages 104–112, September 2015.
32. H. Zhang, C. Papadopoulos, and D. Massey. Detecting encrypted botnet traffic. In