A Space- and Time-Efficient Implementation of the Merkle Tree Traversal Algorithm
Markus Knecht, Willi Meier, and Carlo U. Nicola
University of Applied Sciences Northwestern Switzerland, School of Engineering, 5210 Windisch, Switzerland. {markus.knecht,willi.meier,carlo.nicola}@fhnw.ch
Abstract.
We present an algorithm for the Merkle tree traversal problem which combines the efficient space-time trade-off from the fractal Merkle tree [3] and the space efficiency from the improved log space-time Merkle tree traversal [8]. We give an exhaustive analysis of the space and time efficiency of our algorithm as a function of the parameters H (the height of the Merkle tree) and h (h = H/L, where L is the number of levels in the Merkle tree). We also analyze the space impact when a continuous deterministic pseudo-random number generator (PRNG) is used to generate the leaves. We further program a low storage-space and a low time-overhead version of the algorithm in Java and measure its performance with respect to the two different implementations cited above. Our implementation uses the least space when a continuous PRNG is used for the leaf calculation.

Keywords:
Merkle tree traversal, Authentication path computation, Merkle signatures
Merkle's binary hash trees are currently very popular, because their security is independent of any number-theoretic conjectures [6]. Indeed their security is based solely on two well defined properties of hash functions: (i) Pre-image resistance: given a hash value v, it is difficult to find a message m such that v = hash(m). The generic pre-image attack requires 2^n calls to the hash function, where n is the size of the output in bits. (ii) Collision resistance: finding two messages m ≠ m' such that hash(m) = hash(m') is difficult. The generic complexity of such an attack is given by the birthday bound, which is 2^(n/2) calls to the hash function. It is interesting to note that the best quantum algorithm to date for searching N random records in a database (an analogous problem to hash pre-image resistance) achieves only a speedup of O(√N) over the classical O(N) [15]. More to the point, in [14] the speedup of a quantum algorithm that finds collisions in arbitrary r-to-one functions is O(√(N/r)) over the classical one.
A Merkle tree is a complete binary tree with an n-bit hash value associated with each node. Each internal node value is the result of a hash of the node values of its children. Merkle trees are designed so that a leaf value can be verified with respect to a publicly known root value given the authentication path of the respective leaf. The authentication path for a leaf consists of one node value at each level l, where l = 0, ..., H − 1, and H is the height of the Merkle tree (H is small in most practical cases). The chosen nodes are siblings of the nodes on the path connecting the leaf to the root.
The Merkle tree traversal problem answers the question of how to calculate efficiently the authentication path for all leaves one after another, starting with the first leaf Leaf_0 up to the last leaf Leaf_{2^H − 1}, if there is only a limited amount of storage available (e.g. in smartcards).
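To make the verification step concrete, the following is a minimal Java sketch (our own illustration, not the paper's code; the class and helper names are hypothetical, and SHA-256 is assumed as the hash function). It recomputes the root from a leaf value, its index, and the sibling values along the path:

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class MerkleAuthPathDemo {
    static byte[] hash(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Recompute the root from a leaf value, its index, and the sibling
    // node values at levels 0..H-1 (the authentication path).
    static byte[] rootFromPath(byte[] leaf, int leafIndex, byte[][] authPath) {
        byte[] node = leaf;
        for (int level = 0; level < authPath.length; level++) {
            if (((leafIndex >> level) & 1) == 0) {
                node = hash(node, authPath[level]);   // we are the left child
            } else {
                node = hash(authPath[level], node);   // we are the right child
            }
        }
        return node;
    }

    // Build a full tree over 2^H toy leaves and check one authentication path.
    static boolean selfTest() {
        int H = 3, n = 1 << H;
        byte[][] nodes = new byte[2 * n][];          // 1-based heap layout, root at 1
        for (int i = 0; i < n; i++) nodes[n + i] = hash(new byte[]{(byte) i});
        for (int i = n - 1; i >= 1; i--) nodes[i] = hash(nodes[2 * i], nodes[2 * i + 1]);
        int leafIndex = 5;
        byte[][] path = new byte[H][];
        for (int level = 0, pos = n + leafIndex; level < H; level++, pos >>= 1)
            path[level] = nodes[pos ^ 1];            // sibling at each level
        byte[] recomputed = rootFromPath(nodes[n + leafIndex], leafIndex, path);
        return Arrays.equals(recomputed, nodes[1]);
    }

    public static void main(String[] args) {
        System.out.println(selfTest() ? "path verifies" : "path broken");
    }
}
```

The verifier thus needs only H sibling hashes and the public root, which is what makes the scheme attractive for storage-constrained devices.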
The authors of [7] proved that the bounds of space (O(t·H/log(t))) and time (O(H/log(t))) for the output of the authentication path of the current leaf are optimal (t is a freely choosable parameter).
The generation of the root of the Merkle tree (the public key in a Merkle signature system) requires the computation of all nodes in the tree. This means a grand total of 2^H leaf evaluations and of 2^H − 1 hash computations. The root value (the actual public key) is then stored in a trusted database accessible to the verifier.
The leaves of a Merkle tree are used either as a one-time token to access resources or as a building block for a digital signature scheme. Each leaf has an associated private key that is used to generate either the token or a signature building block (see 4.2). The tokens can be as simple as a hash of the private key. In the signature case, more complex schemes are used in the literature (see for example [2] for a review).

Related work
Two solutions to the Merkle tree traversal problem exist. The first is built on the classical tree traversal algorithm but with many small improvements [8] (called log algorithm from now on). The second one is the fractal traversal algorithm [3] (called fractal algorithm from now on). The fractal algorithm trades space against time efficiently by adapting the parameter h (the height of both the Desired and Exist subtrees, see Fig. 1); however, the minimal space it uses for any given H (if h is chosen for space optimality) is more than what the log algorithm needs. The log algorithm cannot trade space for performance as effectively. However, for small H it can still achieve a better time and space trade-off than the fractal algorithm.
A study [7] analyses theoretically the impact on space and time bounds of some enhancements to both the log and the fractal algorithm, which are important to our implementation.

Our contributions
We developed an algorithm for the Merkle tree traversal problem which combines the efficient space-time trade-off from [3] with the space efficiency from [8]. This was done by applying all the improvements discussed in [8] to the fractal algorithm [3]. We have also analyzed the space impact of a continuous deterministic pseudo-random number generator (PRNG) on the algorithms. All these improvements lead to an algorithm with a worst case storage of [L × 2^h + 2H − h] hash values (Sec. 4.4). The worst case time bound for the leaf computations per authentication path amounts to (2^h − 1)/2^h × (L − 1) + 1 (Sec. 4.4). This means a reduction in space of about a factor 2 compared with the fractal algorithm [3] (see Fig. 5 and Fig. 4).
Although at first sight our enhancements are predated by [7], three main differences distinguish our contribution vis-à-vis [7]: (i) our use of a different TreeHash and metrics; (ii) our special computation of the Desired tree (see Section 4.2); and (iii) our use of a continuous PRNG in the leaf computation.
We further implemented the algorithm in Java with focus on a low space and time overhead [1] and we measured its performance (Sec. 6).
The idea of the fractal algorithm [3] is to store only a limited set of subtrees within the whole Merkle tree (see Fig. 1). They form a stacked series of L subtrees {Subtree_i}, i = 0, ..., L − 1. Each subtree consists of an Exist tree {Exist_i} and a Desired tree {Desired_i}, except for Subtree_{L−1}, which has no Desired tree. The Exist trees contain the authentication path for the current leaf. When the authentication path for the next leaf is no longer contained in some Exist trees, these are replaced by the Desired tree of the same subtree. The Desired trees are built incrementally after each output of the authentication path algorithm, thus minimizing the operations needed to evaluate the subtree.
(A continuous PRNG is a deterministic pseudo-random number generator which cannot access any random number in its range without first computing all the preceding ones.)
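As a small numeric illustration of this layered structure (our own sketch; the class and helper names are hypothetical), the following Java fragment computes which h-subtree at each level contains a given leaf:

```java
public class FractalLayoutDemo {
    // Level of the h-subtree that a node at the given altitude belongs to,
    // for altitudes 0 <= altitude < H (the root itself belongs to no subtree).
    static int subtreeLevel(int altitude, int h) {
        return altitude / h;
    }

    // Index (from the left) of the level-i h-subtree whose leaves cover the
    // given Merkle tree leaf: each such subtree spans 2^((i+1)*h) leaves.
    static long subtreeIndex(long leafIndex, int i, int h) {
        return leafIndex >> ((long) (i + 1) * h);
    }

    public static void main(String[] args) {
        int H = 6, h = 2, L = H / h;          // a stacked series of L = 3 subtrees
        long leaf = 45;                        // some leaf in a tree with 2^6 = 64 leaves
        for (int i = 0; i < L; i++)
            System.out.println("level " + i + ": subtree #" + subtreeIndex(leaf, i, h));
    }
}
```

Only the subtrees containing the current leaf are kept in memory; as the leaf index advances past a subtree boundary at level i, the corresponding Exist tree is swapped for its Desired tree.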
Fig. 1: Fractal Merkle tree structure and notation (Figure courtesy of [3]). A hash tree of height H is divided into L levels, each of height h. The leaves of the hash tree are indexed {0, 1, ..., 2^H − 1} from left to right. The height of a node is defined as the height of the maximal subtree for which it is the root and ranges from 0 (for the leaves) to H (for the root). An h-subtree is "at level i" when the height of its root is (i + 1)h for some i ∈ {0, 1, ..., L − 1}.

The nodes in a Merkle tree are calculated with an algorithm called TreeHash. The algorithm takes as input a stack of nodes, a leaf calculation function and a hash function, and it outputs an updated stack, whose top node is the newly calculated node. Each node on the stack has a height i that defines on what level of the Merkle tree this node lies: i = 0 for the leaves and i = H for the root. The TreeHash algorithm works in small steps.
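The following Java fragment is our own minimal sketch of such a stack-based TreeHash (not the paper's implementation; SHA-256 and all names are assumptions), computing the root of a tree with 2^H toy leaves one step at a time:

```java
import java.security.MessageDigest;
import java.util.ArrayDeque;
import java.util.Deque;

public class TreeHashDemo {
    static final class Node {
        final byte[] value; final int height;
        Node(byte[] value, int height) { this.value = value; this.height = height; }
    }

    static byte[] hash(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // One step: either merge the top two nodes of equal height into their
    // parent, or compute the next leaf (height 0) and push it onto the stack.
    static int step(Deque<Node> stack, int nextLeaf) {
        if (stack.size() >= 2) {
            Node right = stack.pop(), left = stack.pop();
            if (left.height == right.height) {
                stack.push(new Node(hash(left.value, right.value), left.height + 1));
                return nextLeaf;                       // no leaf consumed this step
            }
            stack.push(left); stack.push(right);       // heights differ: undo the pops
        }
        stack.push(new Node(hash(new byte[]{(byte) nextLeaf}), 0));
        return nextLeaf + 1;
    }

    // Run steps until the stack holds the single root node of height H.
    static byte[] root(int H) {
        Deque<Node> stack = new ArrayDeque<>();
        int nextLeaf = 0;
        while (stack.size() != 1 || stack.peek().height != H)
            nextLeaf = step(stack, nextLeaf);
        return stack.peek().value;
    }

    public static void main(String[] args) {
        System.out.println("root has " + root(3).length + " bytes");
    }
}
```

Note that the stack never holds more than H + 1 nodes, which is the property the traversal algorithms below exploit.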
On each step the algorithm looks at its stack: if the top two elements have the same height, it pops them and puts the hash value of their concatenation back onto the top of the stack, which now represents the parent node of the two popped ones. Its height is one level higher than the height of its children. If the top two nodes do not have the same height, the algorithm calculates the next leaf and puts it onto the stack; this node has a height of zero.
We quickly summarize the three main areas where our improvements were critical for a better space-time performance of the original fractal algorithm:
1. Left nodes have the nice property that, when they first appear in an authentication path, their children were already on an earlier authentication path (see Fig. 2). For right nodes this property does not hold. We can use this fact to calculate left nodes just before they are needed for the authentication path, without the need to store them in the subtrees. So we can save half of the space needed for the subtrees, but compared to the fractal algorithm one additional leaf calculation has to be carried out every two rounds (one round corresponds to the calculation of one authentication path).
2. In most practical applications, the calculation of a leaf is more expensive than the calculation of an inner node. This can be used to design a variant of the TreeHash algorithm which has a worst case time performance that is nearer to its average case for most practical applications. The improved TreeHash (see Algorithm 1), given one leaf, calculates as many inner nodes as possible per update (see Section 4.1) before needing a new leaf, instead of processing just one leaf or one inner node as in the normal case.
3. In the fractal Merkle tree one TreeHash instance per subtree exists for calculating the nodes of the Desired trees, and each of them gets two updates per round. Therefore all of them have nodes on their stacks, which need space of the order of O(H^2/h). We can distribute the updates in another way, so that the associated stacks are mostly empty [5]. This reduces the space needed by the stacks of the TreeHash instances to O(H − h).
It is easy enough to adapt points one and two for the fractal algorithm, but point three needs some changes in the way the nodes in a subtree are calculated (see Sec. 4.2).

Algorithm 1 Listing: Generic version of TreeHash that accepts different types of Process_i (see Appendix A for a thorough definition of Process_i). A node has a height and an index, where the index indicates where a node is positioned in relation to all nodes with the same height in the Merkle tree.

INPUT: StackOfNodes, Leaf, Process_i, SubtreeIndex
OUTPUT: updated StackOfNodes

Node ← Leaf
if Node.index (mod 2) == 1 then
    continue ← Process_i(Node, SubtreeIndex)
else
    continue ← 1
end if
while continue ≠ 0 ∧ (Node.height == StackOfNodes.top.height) do
    Node ← hash(StackOfNodes.pop || Node)
    continue ← Process_i(Node, SubtreeIndex)
end while
if continue ≠ 0 then
    StackOfNodes.push(Node)
end if

In this section we will give an overview of the complete algorithm and explain how all its components work together. The algorithm is divided into two phases. The first phase is the initialisation phase, in which the public key is calculated (see Alg. 2). We run in this phase the improved
TreeHash (see Alg. 1) from [8], step by step until the root node is computed. (An inner node is a node with height greater than zero.)

Fig. 2: The colored lines mark the different authentication paths. The index I at the start of each line indicates how many times a node on level L of the authentication path has changed. An authentication path whose node has changed I times on level L has changed I × 2^L times on level 0 (which changes each round). The dotted circles are left nodes or the root of a subtree, which are not stored in a subtree.

The improved TreeHash algorithm needs Leaf_i, where i ∈ {0, 1, ..., 2^H − 1}, as an input. The value of Leaf_i depends on the usage of the Merkle tree. It could be as simple as a token, where the leaf is a hash of the token's private key, or a one-time signature scheme like Winternitz [6], where the leaf is the public key of the one-time signature. The private keys needed to compute the leaves are provided by a PRNG, whose key corresponds to the private key of the complete Merkle tree.
In the initialisation phase each node is computed exactly once. This fact is used to store all right nodes in the first Exist tree of each subtree and all the nodes in the authentication path for Leaf_0. The second phase iteratively generates the authentication paths for all the remaining Leaf_i (from left to right), where i ∈ {1, 2, ..., 2^H − 1} (see Alg. 8 and Alg. 7). Each authentication path can be computed by changing the previous one [6]. The authentication path for Leaf_i changes on a level k if 2^k | i. If the node changes to a right node, it can be found in one of the Exist trees. If it changes to a left node, it can be computed from its two children. The right child can be found in the Exist trees and the left child is on the previous authentication path (see Fig. 2).
When a node in the
Exist tree is no longer needed for the computation of any upcoming authentication path, it is removed. To prevent the Exist tree from running out of nodes, all the nodes in the Desired tree have to be computed before the Exist tree has no nodes left. This is done with the help of two TreeHash instances per subtree. One, called the lower TreeHash, calculates all nodes on the bottom level of a Desired tree (called bottom level nodes) from the leaves of the Merkle tree. The other, called the higher TreeHash, calculates all the remaining Desired nodes (called non-bottom level nodes) from the bottom level ones. (The bottom level is the lowest level in a Desired tree. Desired nodes are all the nodes stored in a Desired tree.) All the lower TreeHash instances use the same scheduling algorithm as in [8], with L − 1 updates per round. The higher TreeHash uses a custom scheduling algorithm which executes an update every 2^bottomLevel rounds. The higher TreeHash produces a node on a level k in the Desired tree every 2^k rounds, which corresponds to the rate at which the authentication path changes on that level. When the last node from the Exist tree is removed, all the nodes in the
Desired tree are computed and the Exist tree can be replaced with the Desired tree. In Section 4.3 we will prove that the lower TreeHashes produce the nodes on the bottom level before the higher TreeHashes need them, if L − 1 updates are done per round. A lower TreeHash which has terminated is initialized again as soon as the generated node is used as input for the higher TreeHash. Because only the right nodes are stored in the subtrees, the TreeHashes only have to compute right nodes and those left nodes which are used to calculate a right node contained in the Desired tree. The only left nodes never used to compute a right node in a Desired tree are the first left node at each level in each Desired tree. To ensure that no unneeded nodes are computed, the lower TreeHash does not compute nodes for its first 2^bottomLevel updates per Desired tree, and neither does the higher TreeHash for its first update per Desired tree. These skipped updates are nevertheless counted, without the scheduling algorithm assigning them to another TreeHash. Therefore, from the point of view of the scheduling algorithm, the TreeHash behaves as if the nodes had been computed.
TreeHash Metrics
Below we give some definitions that will firstly permit a better understanding of our analysis and secondly unify all the similar concepts scattered in the literature. We define as classical TreeHash the algorithm used in [3]. In that paper a step_C is defined as the calculation of either one leaf or one node's hash. We define as improved TreeHash the algorithm used in [8]. Therein a step_I is defined as the calculation of the sum of one leaf and X hashes (where X is the number of node computations before a new leaf is needed). We define update = 2 × step_C in the case of the classical TreeHash and update = 1 × step_I in the case of the improved TreeHash. Since in our work we assume that the hash computation time is very small compared to a leaf's computation (an assumption certainly valid for MSS (Merkle Signature Scheme)), we use as the basic time unit (metrics) for this work the time we need to compute a leaf.
So we can claim that in the worst case a classical TreeHash update takes 2 leaf computations, whereas an improved TreeHash update takes only one. For the calculation of all nodes in a tree of height H, the classical TreeHash needs 2^H updates (minus one step_C, because the last update needs only to do one step_C). On the other hand the improved TreeHash needs 2^H updates, each costing only one leaf computation, to reach the same goal.

In this section we will explain how the nodes in a
Desired tree are computed and stored, which hallmarks the main difference of our algorithm to the algorithm in [7]. Recall that in [7] the TreeHash algorithm of a Desired tree gets 2^h × 2^bottomLevel updates for calculating its nodes. The 2^h bottom level nodes of a Desired tree are calculated during the first 2^h × (2^bottomLevel − 1/2) updates. (Remember that a classical TreeHash needs 2^bottomLevel − 1/2 updates to compute a bottom level node. Furthermore, in all our derivations h | H and h ≤ H hold.) After calculating the bottom level nodes there are 2^(h−1) updates left (from: 2^h × 2^bottomLevel − 2^h × (2^bottomLevel − 1/2) = 2^(h−1)). These are used to calculate the non-bottom level nodes of the Desired tree [7]. There is no additional space needed to calculate the non-bottom level nodes from the bottom level nodes. This is because after calculating a new node, the left child is dropped and the new value can be stored instead [7]. This approach cannot be used with the improved TreeHash from [8] without increasing the amount of updates the TreeHash of a Desired tree gets before the Desired tree has to be finished. This is due to the fact that the improved TreeHash needs 2^h × 2^bottomLevel updates to compute all bottom level nodes of the Desired tree, which would leave 0 updates for the calculation of the non-bottom level nodes.
As described in Section 3, our algorithm uses a lower TreeHash and a higher TreeHash per subtree. All the lower TreeHash instances use the same scheduling algorithm as in [8], with L − 1 updates per round. The higher TreeHashes use a custom scheduling algorithm which executes an update every 2^bottomLevel rounds. The main difference vis-à-vis [7] is that we compute the nodes in the Desired tree continuously during the calculations of the Desired tree, and not only at the end. This approach distributes the leaf computations during the computation of a Desired tree more equally than the one from [7].
Space analysis for Desired tree computation
We will show that our algorithm needs L × (2^h − 1) + H hash values for the Exist and Desired trees, when the authentication path is taken into account, instead of the L × 2^h hash values needed by the algorithm in [7].
The authentication path is a data structure which can store one node per level. Because the authentication path is contained in all the Exist trees (which store only right nodes), right nodes on the authentication path are contained in both structures and thus have to be held only once in memory. The authentication path changes on a level k every 2^k rounds, and the higher TreeHash produces a node on a level k' every 2^(k') rounds. Whenever a left node enters the authentication path, its right sibling leaves the authentication path and can be discarded (with one exception discussed below). From this we can conclude (ignoring the exception for now) that every 2^(k+1) rounds the Exist tree discards a right node on level k and the higher TreeHash produces a left node on the same level. This means the higher TreeHash can store one left node on each level using the space of the discarded nodes in the Exist tree. The right nodes the higher TreeHash produces can be stored using the space of the left node from which they have been computed.
We will now look at the exception mentioned above: a right node on level k which has a left node as parent cannot be discarded when it leaves the authentication path, because it is needed for the computation of its parent, as explained in [8]. It will be discarded 2^k rounds after it left the authentication path. During these 2^k rounds there can be a left node with height k on the higher TreeHash, for which fresh storage space must be provided. Fortunately this situation can only occur if there is a right node on the authentication path (the sibling of the parent of the node which could not be discarded). This right node is stored in both the Exist tree and the authentication path and must be held in memory only once.
The special scheduling of the lower TreeHash (see Sec. 4.3) may compute a node on the bottom level that is not immediately consumed by the higher TreeHash and therefore must be stored until needed. We can store this node in the space reserved for the higher TreeHash, because the left node with the highest level on a higher TreeHash is never stored, for the simple reason that it is not needed for the calculation of any right node in the Desired tree (see Fig. 3).
From this we conclude that the authentication path and all the subtrees together use no more than L × (2^h − 1) + H space, where h is the height of a subtree, (2^h − 1) is the amount of nodes a tree of height h needs when it stores only right nodes (see Fig. 3), and H is the space needed to store the current authentication path.

Sharing the same data structure in both Exist and Desired trees
We now show that we can store the nodes of the Exist tree and the Desired tree in one single tree data structure. This is the case because we can store two related nodes in the same slot. We can do this because, when a node in the Desired tree is stored, its related node in the Exist tree has already been discarded. This is trivial for left nodes, because they are never stored in the Exist or Desired tree. In the previous section we showed that, with one exception, the Exist tree discards a right node on a level in the same round the higher TreeHash computes a left node on that level. The sibling of the left node a higher TreeHash computes every 2^(l+1) rounds on a level l is the node related to the right node the Exist tree discards during this round. The right node which is computed 2^l rounds later on the level l is the node related to the discarded one, and so it can be stored in the same slot of the data structure. We now look at the special case: right nodes with a left node as parent (see Sec. 4.2). Such a right node on level k will be discarded 2^k rounds later than the other right nodes. It will be discarded in the same round as the higher TreeHash produces its related node. We ensure that the slot in the data structure is free by calculating left nodes in the authentication path before we update the higher TreeHash (see Algorithm 8).
In Fig. 3 we show how the different nodes of the Desired and Exist trees are managed.
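The slot sharing can be illustrated with a small self-contained Java sketch (our own illustration; the class is hypothetical and stores plain byte arrays instead of real nodes). A single array holds, per right-node position of a subtree, either the Exist node or, once that has been discarded, its related Desired node:

```java
public class SharedSlotTreeDemo {
    // One slot per right-node position of a height-h subtree (2^h - 1 slots).
    private final byte[][] slots;
    SharedSlotTreeDemo(int h) { slots = new byte[(1 << h) - 1][]; }

    void storeExist(int pos, byte[] node)  { slots[pos] = node; }
    void discardExist(int pos)             { slots[pos] = null; }

    // Storing a Desired node reuses the slot of its related Exist node,
    // which must already have been discarded (the invariant shown above).
    void storeDesired(int pos, byte[] node) {
        if (slots[pos] != null)
            throw new IllegalStateException("related Exist node not yet discarded");
        slots[pos] = node;
    }

    byte[] get(int pos) { return slots[pos]; }

    public static void main(String[] args) {
        SharedSlotTreeDemo t = new SharedSlotTreeDemo(2); // 3 slots
        t.storeExist(0, new byte[]{1});
        t.discardExist(0);
        t.storeDesired(0, new byte[]{2});                 // same slot reused
        System.out.println(t.get(0)[0]);
    }
}
```

The ordering argument of this section guarantees that the IllegalStateException branch is never taken during a correct traversal.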
Space used for the key generation of the leaves
In this section we will analyse the space used by the deterministic PRNG which calculates the private keys used in the leaf calculations. Supposing the PRNG algorithm can generate any random number within its range without first calculating all the preceding ones (indexed PRNG), then only one instance of the PRNG would be needed to calculate the private keys for all the leaves. No PRNGs currently recommended by NIST [9] have this property. For both the log and the fractal algorithm, solutions exist that use a PRNG which calculates the leaves' private keys in sequential order (continuous PRNG). This requires storing multiple internal states of the continuous PRNG during the generation of the authentication paths. The fractal algorithm stores as many continuous PRNG internal states as it has subtrees, whereas the log algorithm stores two continuous PRNG internal states per TreeHash [8] plus one for calculating the leaves that are left nodes. Our algorithm uses the same PRNG approach as the fractal one. When our algorithm skips a leaf calculation (because it would not contribute to the calculation of a right node stored in a subtree, see Sec. 3), it still calculates the leaf's private key and thus advances the state of the PRNG. Therefore our algorithm and the fractal one store L additional continuous PRNG states, whereas the log algorithm needs to store 2 × (H − K) + 1 continuous PRNG states [8]. For the space analysis we choose the state size of the PRNG equal to the output size of the hash function used.

TreeHash Algorithm
In this section we will explain the reason why we use the same TreeHash scheduling as in [8] together with the improved TreeHash from [8], and what impact this has on the performance of the algorithm. A TreeHash instance which calculates a node on height i and all its children is called TreeHash_i. (Two nodes of either a Desired or an Exist tree are said to be related if they have the same position relative to their root.)

Fig. 3: The left half of each circle represents the Exist tree and the right half the Desired tree. The nodes with dotted lines are left nodes or the root and thus are not stored in the subtree, but they may be stored in the authentication path or on the higher TreeHash. The markings on the nodes have the following meanings: Label x: nodes already discarded (in case of the Exist tree) or not yet computed (in case of the Desired tree). Label 1: nodes lying on the current authentication path. Label 2: nodes lying on the upcoming authentication path. Label 3: nodes computed next by the higher TreeHash. Label 4: left node on the higher TreeHash. Label 5: left nodes which never contribute to a right node calculation (not stored in the higher TreeHash). Label 6: node which could not yet be discarded, because it is needed for calculating a left node in the upcoming authentication path.

For each Desired tree in a subtree we need a lower
TreeHash_bottomLevel instance. Each of these instances has up to bottomLevel + 1 nodes on its stack. If we compute them simultaneously, as is done in [6], it can happen that each instance has its maximum amount of nodes on its stack. The update scheduling algorithm from [8] uses less space by computing the TreeHash instances in a way that, at any given round, the associated stacks are mostly empty [5]. The basic idea is to start a freshly initialized TreeHash_k only if there is no TreeHash with nodes of height smaller than k on its stack. This is achieved by assigning each update to the TreeHash instance with the smallest tail height (see Algorithm 3). The tail height is the smallest height for which there is a node on the stack of the TreeHash. A terminated TreeHash_k is considered to have an infinite tail height and an empty one is considered to have a tail height of k. Furthermore, the improved TreeHash from [8] we use changes the definition of a step as compared to the classical one. A step_C was originally considered in [6] as either calculating a leaf node or an inner node. This is fine as long as a hash computation can be considered to be as expensive as a leaf calculation. More often though, a leaf computation is significantly more expensive than the computation of an inner node. This leads to a larger difference between the average and worst case time needed for a step_C. In [8], a step_I consists of one leaf calculation and of as many inner node computations as possible before needing a new leaf, instead of processing just one leaf or one inner node as in the classic case (see Algorithm 1).

Nodes' supply for the higher
TreeHash

We wish to prove that, when we spend L − 1 updates on the lower TreeHashes (see Alg. 1), they produce nodes before the higher TreeHash needs them for computing nodes in the Desired tree. To prove this we use the same approach as in [8]. We focus on a subtree ST_k with a lower TreeHash_h (the bottom level of ST_k is h). We consider a time interval starting at the initialization of TreeHash_h and ending at the time when the next node at height h is required by the higher TreeHash of ST_k. We call this node Need_h. The higher TreeHash is updated every 2^h rounds and requires a bottom level node in each update. This means that in the time considered we execute (L − 1) × 2^h updates. A higher TreeHash of a subtree on a lower level needs new nodes more frequently, because its authentication nodes change more often. For any given TreeHash_i with i < h, 2^(h−i) nodes are needed during the time interval defined above: 2^i updates are used up to complete a node on height i. Therefore TreeHash_i requires 2^(h−i) × 2^i = 2^h updates to produce all needed nodes. If there are N TreeHash_i with i < h, then all of them together need at most N × 2^h updates to compute all their nodes. They may need less, because they may already have nodes on their stack. There may be a partial contribution to any TreeHash_j with j > h. But they can only receive updates as long as they have nodes at height < h (tail height < h). A TreeHash_j needs at most 2^h updates to raise its tail height to h. There are L − N − 2 TreeHash_j with j > h (the top subtree has no TreeHash). Together they need at most (L − N − 2) × 2^h updates. All TreeHash_k with k ≠ h need at most (L − N − 2) × 2^h + N × 2^h = (L − 2) × 2^h updates. This leaves 2^h updates for TreeHash_h, which are enough to compute Need_h.

Space and time analysis for the lower TreeHashes
In [8], it was shown that when the improved scheduling algorithm is used with n/2 updates per round, a TreeHash_l terminates at most 2^(l+1) rounds after its initialization (n corresponds to the actual number of TreeHash instances). This is clearly enough for the log algorithm, because the authentication path needs a new right node on level l only every 2^(l+1) rounds. For our algorithm the higher TreeHash needs a new node every 2^l rounds, which is twice as often. We thus need to distribute twice as many updates to the lower TreeHash instances as the improved scheduling algorithm from [8]. That means L − 1 updates per round in total.

In addition, when the improved scheduling algorithm is used to calculate nodes with a set of TreeHash_i (where all i's are different), these instances can share a stack [8]. The amount of space needed by this shared stack is the same as that of the TreeHash_i with the highest i [8]. Since the highest subtree (bottom level: H − h) does not have a lower TreeHash instance, the highest level on which any node has to be computed by a lower TreeHash is the bottom level of the second-highest subtree (bottom level: H − 2h). So the shared stack of our algorithm stores at most H − 2h hash values.

In this section we give the total space and time bounds of our algorithm and compare them with the log and fractal ones, under the condition that a continuous PRNG with an internal state equal in size to the hash value is used. We obtain the total space of our algorithm by summing up the contributions of its different parts: L × (2^h −
1) + H from the subtrees and authentication path (see Sec. 4.2), H − 2h from the lower TreeHashes (see Sec. 4.3) and L from the PRNG internal states (see Sec. 4.2). This sums up to L × 2^h + 2H − 2h times the hash value size.

For the time analysis we look at the number of leaf calculations per round. The improved TreeHash makes one leaf calculation per update, and we make at most L − 1 lower TreeHash updates per round. The higher TreeHash never calculates leaves. So in the worst case all TreeHashes together need L − 1 leaf calculations per round. We need an additional leaf calculation every two rounds to compute the left nodes, as shown in [8]. Thus we need L leaf calculations per round in the worst case. In the average case, however, we need less, as the first of the 2^h bottom-level nodes of a Desired tree is not computed, since it is not needed to compute any right node in the Desired tree. This reduces the average-case time by a factor (2^h − 1)/2^h and leads to a total of ((2^h − 1)/2^h) × (L − 1) + 1/2 leaf computations per round. The term 1/2 enters the expression because the left-node computation needs a leaf every two rounds. The average-case time bound holds true only for the first 2^H − 2^(H−h) rounds. Thereafter fewer leaf computations are needed on average, because some subtrees no longer need a Desired tree. Table 1 summarizes the above results and Table 2 does the same for the log space and fractal algorithms when a continuous PRNG with an internal state equal to the size of a hash value is used.
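To make these bounds concrete, the following sketch evaluates them for sample parameters. The formulas are taken directly from the totals derived above; the class and method names are our own illustration, not part of the paper's implementation:

```java
// Evaluates the space/time bounds derived in the text:
//   worst-case space:  L * 2^h + 2H - 2h   hash values, with L = H/h
//   worst-case time:   L                    leaf computations per round
//   average-case time: ((2^h - 1)/2^h)(L - 1) + 1/2
public class TraversalBounds {

    // Worst-case storage in hash values.
    static int worstCaseSpace(int H, int h) {
        int L = H / h;
        return L * (1 << h) + 2 * H - 2 * h;
    }

    // Worst-case leaf computations per round.
    static int worstCaseTime(int H, int h) {
        return H / h; // = L
    }

    // Average-case leaf computations per round (valid for the
    // first 2^H - 2^(H-h) rounds, as noted in the text).
    static double averageCaseTime(int H, int h) {
        int L = H / h;
        double fraction = (double) ((1 << h) - 1) / (1 << h);
        return fraction * (L - 1) + 0.5;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSpace(16, 2));   // 60 hash values for H = 16, h = 2
        System.out.println(worstCaseTime(16, 2));    // 8 leaf computations per round
        System.out.println(averageCaseTime(16, 2));  // 5.75 on average
    }
}
```

For H = 16 this reproduces the parameter choices used in the measurements of Sec. 6: h = 2 gives the minimal-space configuration, h = 4 the fractal-like trade-off.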
Bounds             | h = 1, L = H | h = 2, L = H/2 | h = log(H), L = H/log(H)
Worst case: space  | 4H − 2       | 4H − 4         | H²/log(H) + 2H − 2 log(H)
Average case: time | H/2          | 3H/8 − 1/4     | ((H − 1)/H)(H/log(H) − 1) + 1/2
Worst case: time   | H            | H/2            | H/log(H)

Table 1: Space–time trade-off of our Merkle tree traversal algorithm as a function of H (height of the tree) with h (height of a subtree) as parameter.

Bounds             | Log [8] K = 2 | Fractal [3] h = log(H)
Worst case: space  | 4.5H − 4      | 2H²/log(H) + 2H
Average case: time | H/2 − 1       | H/log(H) − 1
Worst case: time   | H/2           | 2H/log(H) − 2

Table 2: Space–time trade-off of the log algorithm [8] and the fractal algorithm [3] optimized for storage space. The values in the table include the space needed by the continuous PRNG.

When h = 2 our algorithm has space and time bounds that are better than (or, in the case of the worst-case time, as good as) those of the log algorithm [8]. When we choose the same space–time trade-off parameter as in the fractal algorithm [3] (column h = log(H) in Table 1), our algorithm needs less storage space.

There are several aspects which are deliberately left unspecified by Merkle tree traversal algorithms: the hash function, the deterministic pseudo-random number generator, and the algorithm used for the leaf calculation. The latter is defined by the usage of the tree. Although the hash function and PRNG are independent of the tree's usage, both have an impact on the cryptographic strength and the performance. The hash function used for the traversal algorithm must be collision resistant, as shown in [13]. Thus the main selection criteria for the hash function are good performance and strong security. A suitable candidate is BLAKE [4].

As PRNG we chose an algorithm based on a hash function. This choice has the advantage that we do not need another cryptographic primitive. In [9], NIST has recommended two continuous hash-based PRNGs named HASH_DRBG and HMAC_DRBG. Both of them have an internal state composed of two values with the same length as the output length of the used hash function. HASH_DRBG has the advantage that one of its two internal values depends solely on the seed and does not change until a reseeding occurs. For Merkle trees, no reseeding is necessary as long as fewer than 2^48 leaves exist [9]. Hence, in our application one of its two internal values is the same for all used HASH_DRBG instances within the same Merkle tree.
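To make the two-value internal state concrete, here is a deliberately simplified, hypothetical HASH_DRBG-style generator. It is our illustration only, not the NIST SP 800-90A algorithm (the real HASH_DRBG uses a seedlen-bit state, a hashgen loop, and reseed bookkeeping); it only mirrors the property discussed above, namely a seed-derived constant C that never changes and an evolving value V:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Simplified HASH_DRBG-style sketch: C is derived from the seed and fixed,
// V advances after every output. NOT the full NIST SP 800-90A construction.
public class TinyHashDrbg {
    private BigInteger v;        // evolving internal value
    private final BigInteger c;  // seed-derived constant (shared per Merkle tree)
    private final MessageDigest md;

    public TinyHashDrbg(byte[] seed) throws Exception {
        md = MessageDigest.getInstance("SHA-256");
        v = new BigInteger(1, md.digest(seed));          // V  <- H(seed)
        c = new BigInteger(1, md.digest(v.toByteArray()));// C <- H(V), fixed
    }

    // One 32-byte output block; the state moves strictly forward.
    public byte[] nextBytes() {
        byte[] out = md.digest(v.toByteArray()); // output = H(V)
        v = v.add(c).add(BigInteger.ONE);        // V <- V + C + 1
        return out;
    }

    public static void main(String[] args) throws Exception {
        TinyHashDrbg a = new TinyHashDrbg("seed".getBytes(StandardCharsets.UTF_8));
        TinyHashDrbg b = new TinyHashDrbg("seed".getBytes(StandardCharsets.UTF_8));
        // Deterministic: the same seed yields the same stream.
        System.out.println(MessageDigest.isEqual(a.nextBytes(), b.nextBytes()));
    }
}
```

Because C depends only on the seed, several instances inside one Merkle tree could share it, which is exactly the space saving exploited in the text.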
We prefer HASH_DRBG over HMAC_DRBG because it uses less space and performs better.

We compared the performance of our algorithm with both the log algorithm from [8] and the fractal algorithm from [3]. We chose as performance parameters the number of leaf computations and the number of stored hash values. This choice is reasonable because the former is the most expensive operation if the Merkle tree is used for signing, and the latter is a good indicator of the storage space needed. Operations like computing a non-leaf node or generating a pseudo-random value have nearly no impact on the performance in the range of H values of practical interest. A leaf computation is exactly the same in each of the three algorithms and therefore depends only on the underlying hardware for its performance.

To present the results cogently, each data point represents an aggregation over a window of rounds. Recall that one round corresponds to the calculation of one authentication path. In the case of the storage measurements one point represents the maximal number of stored hash values at any time during these rounds. In the case of the leaf computations one point represents the average number of leaf computations done in one round during these rounds.

We present the results for two sets of measurements with 2^16 leaves. For the first set we choose the parameters such that each algorithm uses its minimal space. For our algorithm and the fractal algorithm the minimal space for H = 16 is achieved with h = 2. In the case of the log algorithm we have set K (defined in [8]) to 2 in order to achieve minimal space usage. The second set uses h = log(H), as proposed in [3]. For the fractal algorithm and ours this means h = 4 for H = 16, and K = 2 for the log algorithm. The NIST recommendation HASH_DRBG is used as PRNG for both sets of measurements. The results of these measurements are shown in Fig. 5 for a space–time trade-off similar to the fractal tree and in Fig. 4 for minimal storage space.

We see that in a setting where a good space–time trade-off is needed, our algorithm uses less space at the cost of only slightly more leaf calculations per round on average than the fractal algorithm. If a minimal-space solution is needed, our algorithm with h = 2 uses less space and fewer leaf calculations than both the log and the fractal algorithm.
Fig. 4: Left: Set one: maximal number of stored hash values as a function of rounds for minimal space. Right: number of calculated leaves as a function of rounds for minimal space. Parameters: H = 16, h = 2 and K = 2. HASH_DRBG is used as pseudo-random number generator. One round corresponds to the calculation of one authentication path.
Fig. 5: Left: Set two: maximal number of stored hash values as a function of rounds for a similar space–time trade-off. Right: number of calculated leaves as a function of rounds for a similar space–time trade-off. Parameters: H = 16, h = 4 and K = 2. HASH_DRBG is used as pseudo-random number generator. One round corresponds to the calculation of one authentication path.

In addition, the plots show a weak point of our algorithm compared with the log algorithm: the number of leaf calculations is not constant. The fractal algorithm for similar parameters shows even greater fluctuations, but they are not visible in Fig. 5, because they cancel each other out over the aggregated rounds. If we measure the first 128 rounds with no aggregation, we see that the deviations of our algorithm are markedly smaller than those of the fractal one (see Fig. 6).

The full package with source code and measurement results is available at [1].

We developed an algorithm for the Merkle tree traversal problem which combines the efficient space–time trade-off of the fractal algorithm with the space efficiency of the log algorithm. An exhaustive analysis of the space and time efficiency of our algorithm as a function of the parameters H and h has shown that if a continuous PRNG is used, our algorithm has a space advantage over the log and fractal algorithms and a time advantage over the log algorithm.
Fig. 6: Number of calculated leaves as a function of rounds for a similar space–time trade-off (first 128 rounds in detail). Parameters: H = 16 and h = 4. HASH_DRBG is used as pseudo-random number generator. One round corresponds to the calculation of one authentication path.

We further programmed a low storage-space and a low time-overhead version of the algorithm in Java and measured its performance with respect to the two implementations cited above. Our implementation needs about a factor of 2 less space than the fractal algorithm when minimal space is required.

Ours as well as the log and fractal algorithms suffer from a long initialisation time for large values of H. This problem was solved by the CMSS [10] and GMSS [11] algorithms. These two algorithms use a stacked series of Merkle trees where the higher trees sign the roots of their child trees and the lowest tree is used for the actual cryptographic purpose. Both of them thus rely on a solution of the Merkle traversal problem for each layer, for which our algorithm could be used instead of the current ones. It is possible to use different parameters for different layers in CMSS or GMSS. In addition, the higher trees used in these schemes favor the Winternitz one-time signature as leaf calculation function, which is significantly more expensive than an inner-node computation and thus can profit from the improved TreeHash used in our algorithm. XMSS [12] is an extension of the Merkle signature scheme (MSS) which allows the use of a hash function that is only second-preimage resistant instead of collision resistant. It is based on the log algorithm and the usage of a forward-secure continuous PRNG. Under these circumstances, our algorithm would be a good replacement for the log algorithm: it would use less space and provide greater flexibility.
Acknowledgements
This work was partially funded by the Hasler Foundation Grant no. 12002– "An optimized CPUarchitecture for cryptological functions". We thank an anonymous reviewer for insightful remarks.
A Appendix
A.1 Algorithms
The algorithm descriptions use an oracle for the leaf computations. The oracle gets the leaf's index as input. The algorithms have to be modified (as explained in Sec. 4.2) in the case that the leaf computation is based on a continuous PRNG and needs a private key as input.

Our algorithm uses the following data structures:

1. Auth_h, h = 0, ..., H − 1. An array of nodes that stores the current authentication path.
2. Subtree_h, h = 0, ..., L − 1. An array of subtree structures with the following properties:
   (a) bottomLevel: the minimal height for which the subtree stores nodes.
   (b) rootLevel: the height of the root of the subtree.
   (c) tree: the data structure for the Exist and Desired tree, with the following functions:
       i. get(j, k): get the k-th node (from left to right) at height j in the subtree
       ii. add(node): store node in the subtree
       iii. remove(j, k): remove the k-th node (from left to right) at height j in the subtree
   (d) stackHigh: the stack for the higher TreeHash.
   (e) nextIndex: the index of the next leaf needed by the lower TreeHash.
   (f) bottomLevelNode: the node of the lower TreeHash which is stored outside the shared stack [8].
   (g) stackLow: the stack for the lower TreeHash (the part of the shared stack currently containing nodes for this Subtree [8]).
3. LeafCalc(i), i = 0, ..., 2^H − 1. Oracle for calculating leaf i.

Our algorithm has the following phases:

1. Init: TreeHash computes the root node. During this process it stores the right nodes of the left-most Exist trees and the nodes of the first authentication path (Algorithm 2).
2. Generation of the authentication paths; repeat 2^H times:
   (a) Output the current authentication path Auth_h, h = 0, ..., H − 1
   (b) Update the lower TreeHashes (Algorithm 7)
   (c) Compute the next authentication path (Algorithm 8)

Algorithm 2
Key generation (PK) and Merkle tree setup.
INPUT: —
OUTPUT: PK

{Initialize the L subtrees}
for all Subtree_i with i ∈ {0, ..., L − 1} do
    Subtree_i.tree ← empty
    if i < L − 1 then
        Subtree_i.stackHigh ← empty
        Subtree_i.stackLow ← empty
    end if
    Subtree_i.bottomLevel ← i × h
    Subtree_i.rootLevel ← Subtree_i.bottomLevel + h
    Subtree_i.nextIndex ← 2^(Subtree_i.rootLevel) − 1
end for
{Initialize stack, set leaf index k = 0}
k ← 0
Stack ← empty
Stack.push(LeafCalc(k))
k ← k + 1
while Stack.peek.height < H do
    TreeHash(Stack, LeafCalc(k), Process_1, null)
    k ← k + 1
end while
PK ← Stack.pop
return PK
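As an illustration of the TreeHash idea that Algorithm 2 builds on, here is a simplified, self-contained Java sketch: one update consumes one new leaf and then hashes inner nodes for as long as the two topmost stack nodes have equal height, so 2^H updates leave exactly the root. The names (Node, leafCalc, update, root) are ours, and the Process callback that additionally stores right nodes and authentication nodes is omitted:

```java
import java.security.MessageDigest;
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal TreeHash sketch: one update = one leaf plus as many inner-node
// hashes as possible (the improved step from the text). Not the paper's
// full implementation; the Process callback and subtree bookkeeping are left out.
public class TreeHashSketch {
    static class Node {
        final byte[] value;
        final int height;
        Node(byte[] value, int height) { this.value = value; this.height = height; }
    }

    // Inner node: hash of the concatenated children.
    static byte[] hash(byte[] left, byte[] right) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(left);
        md.update(right);
        return md.digest();
    }

    // Hypothetical leaf oracle; a real scheme derives the leaf from a
    // one-time verification key.
    static byte[] leafCalc(int i) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(new byte[] { (byte) i });
    }

    // One improved update: push one leaf, then merge while heights match.
    static void update(Deque<Node> stack, byte[] leaf) throws Exception {
        Node node = new Node(leaf, 0);
        while (!stack.isEmpty() && stack.peek().height == node.height) {
            node = new Node(hash(stack.pop().value, node.value), node.height + 1);
        }
        stack.push(node);
    }

    // Key generation: after 2^H updates the stack holds exactly the root.
    static byte[] root(int H) throws Exception {
        Deque<Node> stack = new ArrayDeque<>();
        for (int i = 0; i < (1 << H); i++) {
            update(stack, leafCalc(i));
        }
        return stack.pop().value;
    }
}
```

Note that the stack never holds more than one node per height, which is the source of the logarithmic stack bound used throughout the analysis.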
Algorithm 3
TailHeight : Calculation of the height of the lowest node on a stackLow
INPUT: subtree index i
OUTPUT: height

if Subtree_i has a stackHigh ∧ ¬(Subtree_i.bottomLevelNode) then
    if Subtree_i.stackLow == empty then
        height ← Subtree_i.bottomLevel
    else
        height ← Subtree_i.stackLow.tosNode.height
    end if
else
    height ← ∞
end if
return height

Algorithm 4
Process_1: INPUT: Node; index j OUTPUT: continue

if Node.index ≤ 2^(SubTreeForLevel(Node.height).rootLevel − Node.height) ∧ Node.index (mod 2) == 1 then
    SubTreeForLevel(Node.height).tree.add(Node)
end if
if Node.index == 1 then
    Auth_(Node.height) ← Node
end if
continue ← 1
return continue

Algorithm 5
Process_2: INPUT: Node; subtree index j OUTPUT: continue

if Node.height == Subtree_j.bottomLevel then
    Subtree_j.bottomLevelNode ← Node
    continue ← 0
else
    continue ← 1
end if
return continue

Algorithm 6
Process_3: INPUT: Node; subtree index i OUTPUT: continue

if Node ≠ dummy then
    continue ← 1
    if Node.index (mod 2) == 1 then
        Subtree_i.tree.add(Node)
        if Node.index / 2^(Subtree_i.rootLevel − Node.height − 1) == 0 then
            continue ← 0
        end if
    end if
    if Node.height == Subtree_i.rootLevel − 1 then
        {The current Desired tree becomes the new Exist tree}
        if Subtree_i.nextIndex + 1 ≥ 2^H then
            {It was the last Desired tree}
            Subtree_i.stackHigh ← remove
        end if
    end if
else
    continue ← 0
end if
return continue

Algorithm 7
Distribution of updates to the active lower
TreeHash instances:
INPUT: leaf index i ∈ {0, ..., 2^H − 1}

updates ← number of Desired trees in the subtrees
repeat
    {Find the TreeHash instance with the lowest tail height; on a tie use the one with the lowest index}
    s ← min{l : TailHeight(l) == min_(j=0,...,L−2) {TailHeight(j)}}
    Subtree_s.nextIndex ← Subtree_s.nextIndex + 1
    if Subtree_s.nextIndex (mod 2^(Subtree_s.rootLevel)) ≥ 2^(Subtree_s.bottomLevel) then
        TreeHash(Subtree_s.stackLow, LeafCalc(Subtree_s.nextIndex), Process_2, s)
    else
        if Subtree_s.nextIndex + 1 (mod 2^(Subtree_s.rootLevel)) == 2^(Subtree_s.bottomLevel) then
            Subtree_s.bottomLevelNode ← dummy
        end if
    end if
    updates ← updates − 1
until updates == 0

Algorithm 8
Generation of the next authentication path. (
SubTreeForLevel(l) is the Subtree containing level l.)
INPUT: leaf index i ∈ {0, ..., 2^H − 1}

{k is 0 if leaf i is a right node; k ≠ 0 is the height of the first parent of leaf i that is a right node}
k ← max_(m=0,...,H) {m : i mod 2^m == 0}
if k == 0 then
    Auth_0 ← LeafCalc(i − 1)
else
    leftNode ← Auth_(k−1)
    rightNode ← SubTreeForLevel(k − 1).tree.get(leftNode.index ⊕ 1, k − 1)
    Auth_k ← hash(leftNode || rightNode)
    SubTreeForLevel(k − 1).tree.remove(leftNode.index ⊕ 1, k − 1)
end if
{Remove the sibling of Auth_k}
if Auth_k.index (mod 2) == 1 then
    SubTreeForLevel(k).remove(Auth_k.index ⊕ 1, k)
end if
{Run through stackHigh in all subtrees whose Auth_(bottomLevel) changed}
for all r ∈ {0, ..., L − 1} where Subtree_r.bottomLevel ≤ k do
    if Subtree_r has a Desired tree then
        TreeHash(Subtree_r.stackHigh, Subtree_r.bottomLevelNode, Process_3, r)
        Subtree_r.bottomLevelNode ← remove
    end if
end for
for all t ∈ {0, ..., k − 1} do
    Auth_t ← SubTreeForLevel(t).tree.get((i / 2^t) ⊕ 1, t)
end for
return Auth_j ∀ j ∈ {0, ..., H − 1}

References
1. The full package with algorithms in Java and the measurement results is available at:
2. J. Buchmann, E. Dahmen, M. Szydlo. Hash-based Digital Signature Schemes. Post-Quantum Cryptography, Springer, 2009, pp. 35–93.
3. M. Jakobsson, F. T. Leighton, S. Micali, M. Szydlo. Fractal Merkle Tree Representation and Traversal. Topics in Cryptology – CT-RSA 2003, Springer, pp. 314–326.
4. J.-P. Aumasson, L. Henzen, W. Meier, R. C.-W. Phan. SHA-3 proposal BLAKE. Submission to NIST, 2010.
10. J. Buchmann, L. C. Coronado García, E. Dahmen, M. Döring, E. Klintsevich. CMSS – An Improved Merkle Signature Scheme. INDOCRYPT 2006, LNCS 4329, pp. 349–363.
11. J. Buchmann, E. Dahmen, E. Klintsevich, K. Okeya, C. Vuillaume. Merkle Signatures with Virtually Unlimited Signature Capacity. ACNS 2007, pp. 31–45.
12. J. Buchmann, E. Dahmen, A. Hülsing. XMSS – A Practical Forward Secure Signature Scheme Based on Minimal Security Assumptions. PQCrypto 2011, pp. 117–129.
13. L. C. Coronado García. On the Security and the Efficiency of the Merkle Signature Scheme. Tatra Mt. Math. Publ., 37, 2005, pp. 1–21.
14. G. Brassard, P. Høyer, A. Tapp. Quantum Cryptanalysis of Hash and Claw-free Functions. SIGACT News, 28, 1997, pp. 14–19.
15. L. K. Grover. A Fast Quantum Mechanical Algorithm for Database Search. STOC 1996, pp. 212–219.