One Way Function Candidate based on the Collatz Problem
Раде Вучковац (Rade Vuckovac)
Abstract
A one-way function based on the Collatz problem is proposed. It is based on the problem's conditional branching structure, which is not considered important even though the 3x + 1 question is quite famous. The analysis shows why the problem is mathematically so inaccessible and how the algorithm's conditional branching structure can be used to construct one-way functions. It also shows an exponential dependence between the algorithm's conditional branching and the running cost of the algorithm's branch-less reductions.

Introduction
According to Levin [1], the existence of a one-way function (OWF) is arguably the most important question in computing theory. The existence of an OWF would imply $P \neq NP$ and the existence of some very important constructs in cryptography, for example pseudorandom number generators, pseudorandom functions and various cryptographic protocols. Informally, an OWF is easy to compute given an input x. When the function description and an output y are given, however, it is difficult to guess the input x.

In the absence of a theoretically proven OWF, OWF candidates are used in practice. The bulk of asymmetric encryption is based on OWF candidates such as the factoring and discrete logarithm problems. It has not been proven that they are hard to invert, but so far no efficient non-quantum inversion algorithm has been found. For example, given a relatively large composite integer, it is difficult to decompose it into a product of two integers (integer factorization).

The proposed OWF candidate is based on the 3x + 1 problem, also known as the Collatz problem, a famous problem in mathematics. The problem is easy to state: the input is a positive integer; if the input is even, divide it by 2, otherwise multiply it by 3 and add 1. The outcome is a new input and the same procedure is repeated. The conjecture states that for every positive integer the procedure eventually reaches 1. While the problem does not suffer from a lack of attention, no single mathematical structure is associated with it, and an arbitrarily chosen iteration behaves like a fair coin flip [2]. Therefore, the use of the 3x + 1 problem in cryptography is not surprising. Apple Inc. applied for a patent using the Collatz conjecture as a system and method for a hash function [3].

Wolfram's rule 30 cellular automaton [4] is another example with a branching structure. Its transformation procedure is almost identical to that of the 3x + 1 problem.
Its English formulation is [5]: "First, look at each cell and its right-hand neighbor. If both of these were white on the previous step, then take the new color of the cell to be whatever the previous color of its left-hand neighbor was. Otherwise, take the new color to be the opposite of that" (emphasis on "Otherwise" added). Rule 30 is used as a pseudorandom number generator in Wolfram's Mathematica software. Both concepts rely either on the problem's difficulty (Apple's Collatz-based hash) or on empirical evidence (Wolfram's rule 30).

Generally, an algorithm is perceived as a well-defined procedure, but that is not always the case. Conditional branching can undermine the procedure's definition and make the algorithm's behaviour quite unpredictable. The OWF proposal uses the underlying conditional branching structure of the 3x + 1 transformation (as in rule 30) to argue the difficulty of reversing the function. The acquired complexity depends on the number of conditional branching iterations r and not on the input size n.

For example, let the input x be a positive integer 512 bits long. Then apply the modified Collatz transformation: if x is even, divide it by 2; otherwise multiply it by 3, add 1 and divide by 2. Repeat the procedure 256 times and record the latest output x′ (see Figure 1).

Figure 1: The 3x + 1 composite function; f(x) = x/2 and g(x) = (3x + 1)/2.

Then split the 512-bit input x into two 256-bit values xl (left part) and xr (right part). Do the same for the latest iteration point x′, resulting in x′l and x′r. The path value p is a 256-bit path record in which an "even" step is recorded as 0 and an "odd" step as 1. The output is calculated as below, where ⊕ is exclusive or (basically, the output is the xor of the input, the stop value and the parity record):

$$ y = xl \oplus xr \oplus x'l \oplus x'r \oplus p $$

Reversal difficulty lies in the absence of a function description. Without a specified input, the transformation is in an ambiguous state and could happen in any of $2^{256}$ ways. Therefore only a particular input defines the corresponding transformation.
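The example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `trace`, `fold` and `owf` are invented for this sketch; step i of the path is stored in bit i of p; and, because the final value may be longer than 512 bits, `fold` XORs together all r-bit chunks of a value, which reduces to xl ⊕ xr for a 512-bit input but is only one possible reading of how the final value is split.

```python
def trace(x: int, r: int) -> tuple[int, list[int]]:
    """Run r steps of the modified 3x + 1 map and record the parity path."""
    path = []
    for _ in range(r):
        if x % 2 == 0:
            x = x // 2
            path.append(0)          # "even" step recorded as 0
        else:
            x = (3 * x + 1) // 2
            path.append(1)          # "odd" step recorded as 1
    return x, path


def fold(value: int, width: int) -> int:
    """XOR all width-bit chunks of value together (assumed splitting rule)."""
    mask = (1 << width) - 1
    out = 0
    while value:
        out ^= value & mask
        value >>= width
    return out


def owf(x: int, r: int = 256) -> int:
    """Sketch of y = xl XOR xr XOR x'l XOR x'r XOR p for r iterations."""
    final, path = trace(x, r)
    # Assumed bit order: the decision taken at step i becomes bit i of p.
    p = sum(bit << i for i, bit in enumerate(path))
    return fold(x, r) ^ fold(final, r) ^ p


# Ambiguity for a small r: count the distinct 4-step paths of inputs 1..16.
paths = {tuple(trace(x, 4)[1]) for x in range(1, 17)}
print(len(paths))
```

For small r the last two lines illustrate the ambiguity described above: every one of the $2^r$ possible path records is taken by some input, so the path cannot be predicted without knowing the input.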
If an arbitrary 256-bit integer is presented as an output y, it is hard to prove whether it is a valid output (let alone to find a matching input x), thanks to the $2^{256}$-sized ambiguous state of the function description.

The whole argumentation is based on the algorithm structure and the classical notions of polynomial and exponential costs.

Informal Theorem 1. To have a proper function description (3x + 1 for example), the exponential nature of conditional branching must be circumvented in the algorithm implementing that function. Avoidance results in either an exhaustive search with the accompanying exponential running cost, or the conditional branching is replaced with a combination of sequence and iteration structures incurring polynomial cost only.

An analysis of the relations between the running costs and the conditional branching complexity of various 3x + 1 algorithm variants is included in this paper.

The rest of the paper is organised as follows. In section 1 two 3x + 1 algorithms are presented. One uses branching and looks exactly like Figure 1. The other does not use branching at all. In section 2 the 3x + 1 candidate for a one-way function is shown and discussed. Section 3 contains a discussion of execution path coverage and its relation to the costs and function descriptions of the algorithms presented.

In this paper, the argumentation assumes the structured program theorem [6] to be true. It states that a control flow graph needs only three structures to compute any computable function: sequence, selection and iteration. The paper assumes the branching structure to be an elementary construction. Although it could be replaced with the other two structures, the replacement cannot happen in polynomial running time. That is discussed in section 3.

Cyclomatic complexity (CC) is also an important part of the following discussion. CC is a software metric which measures the number of execution paths a program can take during execution.
It really counts predicates (algorithm branching), where every count doubles the number of execution paths (if the decision is binary) [7]. The assumption is: if an algorithm has high CC, it is in a state where the algorithm can be run but its functional description and behaviour are unknown. For example, if a program has a CC of more than 10 (meaning $2^{10}$ execution paths), the program should be rewritten, because testing each path of that program becomes very costly and it is questionable what that program was intended to do in the first place. See the NIST article [8] for CC recommendations. On the other hand, some algorithms have high CC and apparently it cannot be avoided. For example, in the 3x + 1 problem every iteration doubles the number of possible execution paths. The same holds for some cellular automata, such as Wolfram's rule 30 [4].

To set the stage for the discussion, the 3x + 1 problem is presented with two algorithms: one with high CC and the other with constant CC with respect to the number of iterations. Finally, the 3x + 1 OWF is presented in section 2. All three algorithms are based on the modified 3x + 1 function. The reason is to avoid the extra iteration step of dividing the even result 3x + 1 by 2.

$$ f(x) = \begin{cases} x/2 & \text{if } x \equiv 0 \pmod{2} \\ (3x+1)/2 & \text{if } x \equiv 1 \pmod{2} \end{cases} $$

Let the input x be a positive integer for the composite function and let r be the number of function iterations, i.e. the number of functions f(x) or g(x) involved in the composition:

$$ \underbrace{\left\{\begin{array}{l} f \text{ if } x \text{ is even} \\ g \text{ if } x \text{ is odd} \end{array}\right\} \circ \left\{\begin{array}{l} f \text{ if } x \text{ is even} \\ g \text{ if } x \text{ is odd} \end{array}\right\} \circ \dots}_{r \text{ times}} $$

where f(x) = x/2 if x is even and g(x) = (3x + 1)/2 if x is odd.
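The composition just defined can be sketched in Python. The language and the name `collatz_branching` are choices made for this sketch only; the logic is the parity-driven choice between f and g described above.

```python
def f(x: int) -> int:
    """Even branch: halve x."""
    return x // 2


def g(x: int) -> int:
    """Odd branch: (3x + 1) / 2, which is a whole number for odd x."""
    return (3 * x + 1) // 2


def collatz_branching(x: int, r: int) -> int:
    """Apply r steps of the modified 3x + 1 map, choosing f or g by parity."""
    for _ in range(r):
        x = f(x) if x % 2 == 0 else g(x)
    return x
```

With x = 3 and r = 2 the sketch applies g twice (3 → 5 → 8); which of the two functions is applied at each step is decided only when the current value is known.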
Pseudo code looks like:

Algorithm 1 3x + 1 algorithm
procedure Colatz(x, r)    ▷ starting integer x and iterations r
    for i = 0; i < r; i++ do
        if x is even then
            x ← x/2
        else
            x ← (3x + 1)/2
        end if
    end for
    return x    ▷ finishing integer (output)
end procedure

For example, x = 3 and r = 2 gives the following:

input    transformations    output
3        g ∘ g              8

The second algorithm and its notation are equivalent to Algorithm 1. The difference is the structure of the algorithm. While Algorithm 1 uses if/else for function composition, this algorithm uses exhaustive search to find the particular composition that matches an input. First, depending on r, the list of all combinations for the composition is created:

r = 2      r = 3          r = ...
f ∘ f      f ∘ f ∘ f      ...
f ∘ g      f ∘ f ∘ g
g ∘ g      f ∘ g ∘ f
g ∘ f      f ∘ g ∘ g
           g ∘ g ∘ g
           g ∘ g ∘ f
           g ∘ f ∘ g
           g ∘ f ∘ f

Then the algorithm tries the input with every composition from the corresponding list and stops when the composite function outputs a whole number. Pseudo code is:
Algorithm 2 3x + 1 search algorithm
procedure Colatz(x, r)    ▷ starting integer x and iterations r
    initialise y as rational and i = 0    ▷ i is a counter
    initialise f(x) = x/2 and g(x) = (3x + 1)/2
    initialise list l    ▷ r sized 2d array with all combinations of f and g
    while y is rational (not a whole number) do
        y ← i-th composition of l applied to x    ▷ do ith row from the list
        i++
    end while
    return y    ▷ finishing integer (output)
end procedure

For example, x = 3 and r = 2. The algorithm will process the first column of the previous table (r = 2):

f ∘ f (3) = 3/4
f ∘ g (3) = 2 3/4
g ∘ g (3) = 8    the result is an integer, so the algorithm stops and outputs 8

It is not the first time that the Collatz conjecture has been used in a cryptographic application. Apple Inc. applied for a patent using the 3x + 1 problem as a hash system and method [3]. However, the approaches of Apple and of the proposed OWF are different. Apple uses the traditional 3x + 1 transformations (Input and Output columns of Table 1) and adds more operations. See the quote below:

1. A method comprising: receiving an input value and an iteration value; based on the iteration value, iteratively performing steps comprising: if a least significant bit of the input value is 0,
(1) dividing the input value by a first value
if a least significant bit of the input value is 1,
(1) multiplying the input value by a second value,
(2) adding one to the input value, and
(3) applying a modulo operation of a prime value to the input value,
to yield a first iteration value, to yield an updated input value; returning the updated input value as a hash value.

Note: It appears that the "first value" and "second value" could be numbers other than 2 and 3 respectively.

In contrast, the proposed OWF takes the exclusive or of the input value, the last iterated value and the execution path encoding as its output. Using the example from Table 1, the OWF output is calculated as y = 9 ⊕ 13 ⊕ p, where 13 is the last iterated value and p is the path encoding 1, 0, 1, 1, 1, 0 from the last column of Table 1.

3x + 1 OWF
This algorithm is also based on the 3x + 1 problem. It differs from the 3x + 1 problem algorithm in how it stops and in what the actual output is. The variables are defined below:

• Input x is a positive integer 512 bits long (binary encoding), $l_x = 512$
• The number of iterations r = 256
• The algorithm output is a binary string y whose bit length equals r
• s is a record of the selection decisions made during the algorithm execution

The program takes x as an input, runs as the 3x + 1 algorithm for r iterations and outputs y. Pseudo code is:

Algorithm 3 3x + 1 OWF algorithm
procedure Colatz OWF(x, s, r)    ▷ 512 bit x, 256 bit s and iterations r = 256
    create xl and xr    ▷ equally divided sides of x (256 bit)
    for i = 0; i < r; i++ do
        if x is even then
            x = x/2
            s_i ← 0    ▷ ith bit of s becomes 0
        else
            x = (3x + 1)/2
            s_i ← 1    ▷ ith bit of s becomes 1
        end if
    end for
    create x′l and x′r    ▷ divided sides of final x (x′r 256 bit wide)
    y = xl ⊕ xr ⊕ x′l ⊕ x′r ⊕ s    ▷ ⊕ exclusive or
    return y    ▷ finishing integer (output)
end procedure

Table 1: 3x + 1 OWF for x = 9 and r = 6, y = 9 ⊕ 13 ⊕ s, where s is the path encoding column

r    Input              Function       Output            Path encoding
1    9 (starts with)    (3x + 1)/2     14                1
2    14                 x/2            7                 0
3    7                  (3x + 1)/2     11                1
4    11                 (3x + 1)/2     17                1
5    17                 (3x + 1)/2     26                1
6    26                 x/2            13 (ends with)    0

The execution cost of Algorithm 1 depends linearly on the number of iterations r. Algorithm 2 is an exhaustive search, and its cost depends exponentially on the number of iterations r because of the $2^r$-sized table used for the search. The relation between the numbers of execution paths of the two algorithms is the opposite. For algorithm costs and numbers of paths, see Table 2. When examined further, both algorithms have problems:

• The problem with the branching algorithm is not so obvious. It runs fine, with linear cost with respect to the number of iterations. The problem starts when the input is not specifically defined. For example, if r = 256 and the unknown input x is more than r bits in length, the 3x + 1 algorithm can apply any of $2^{256}$ different composite functions. That is quite the opposite of what is expected from a mathematical function with a well-defined description. For example, the sin function is properly specified: if sin 30° = 0.5 and sin 60° = 0.866, then the sines of angles between 30° and 60° lie somewhere in the range 0.5 to 0.866. Such reasoning cannot be applied to the 3x + 1 algorithm because there is no underlying transformation defined. To have a proper 3x + 1 function description (for example for Algorithm 3), someone has to go through all the inputs (exponentially many) and create an ordered table of all inputs and outputs, which is not practical for large r.
• Non-branching algorithms do have well-defined execution paths, but their running cost is exponential with respect to the number of iterations r.

Table 2: Costs and path coverage with respect to iterations r

3x + 1 algorithm variants      running cost    number of paths
branching Algorithm 1          polynomial      2^r
non-branching Algorithm 2      O(2^r/2)        constant
mirage                         polynomial      constant

Let us consider the set V containing all the 3x + 1 algorithm variants (or reductions). CC is used to categorise the variants. Using CC is beneficial because every variant has a certain programming structure and consequently a CC metric assigned. The list of all variants can be divided into two groups according to the CC associated with the algorithm:

• The subset C, where C ⊂ V, is the branching algorithm subset in which CC depends exponentially on the iterations r (1st row of Table 2).
• The remaining variants are in the second subset R, where R ⊂ V and C + R = V (2nd row of Table 2).

The subset R can be divided again into two subsets using the algorithm running cost:

• The subset E, where E ⊂ R, with exponential running cost w.r.t. iterations r (exhaustive search), the same as Algorithm 2 in Table 2.
• The subset G of the remaining variants, where the running cost is polynomial w.r.t. iterations r (mirage in Table 2), with G ⊂ R and E + G = R.

Subset G is interesting because its members have polynomial running cost and well-defined execution paths.

The mirage variant (3rd row of Table 2) is a member of subset G. If a mirage variant exists, such an algorithm will behave in a similar fashion to the non-branching Algorithm 2, but instead of checking all entries in the $2^r$-sized table it will always choose the adequate function composition.
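For reference, the table scan of Algorithm 2 that a mirage variant would have to avoid can be sketched in Python. This is an illustration under assumptions rather than the paper's implementation: the name `collatz_search` is invented here, exact rationals come from the standard `fractions` module, and the table is scanned in `itertools.product` order, which differs from the order of the list shown for Algorithm 2.

```python
from fractions import Fraction
from itertools import product


def f(x: Fraction) -> Fraction:
    return x / 2


def g(x: Fraction) -> Fraction:
    return (3 * x + 1) / 2


def collatz_search(x: int, r: int) -> tuple[int, int]:
    """Return (output, compositions tried) after scanning the table of length-r compositions."""
    table = product((f, g), repeat=r)   # lazily enumerates all 2**r compositions
    tried = 0
    y = Fraction(1, 2)                  # placeholder: "initialise y as rational"
    while y.denominator != 1:           # loop until a whole number appears
        composition = next(table)
        y = Fraction(x)
        for step in composition:        # apply left to right, as in the worked example
            y = step(y)
        tried += 1
    return y.numerator, tried


print(collatz_search(3, 2))
```

For x = 3 and r = 2 the scan returns the same output, 8, as the branching algorithm. The number of tries depends on the assumed scan order, but in the worst case the whole table of $2^r$ compositions has to be visited.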
Essentially, the mirage variant would take the input x and the number of iterations r and would execute 3x + 1 in polynomial time with respect to r without using the branching programming structure. Sure enough, the mirage resembles a nondeterministic polynomial (NP) algorithm from complexity theory, which always chooses the correct path when a branching decision is needed.

From all of the above, Theorem 1 can be deduced. Note that the structured program theorem [6] already implies conditional branching as a basic algorithmic structure. The reason for this addition is to clarify the cases in which conditional branching is replaced with sequence and iteration structures (for example Algorithm 2).

Definitions 1.
The list of definitions is:

• cb: Conditional branching is an algorithm structure which causes a different sequence to be executed depending on some comparison. One example is the modified Collatz if/else statement:

$$ f(x) = \begin{cases} x/2 & \text{if } x \text{ is even} \\ (3x+1)/2 & \text{if } x \text{ is odd} \end{cases} $$

• si: Sequence and iteration reduction is a situation in which the cb is replaced with the other two algorithmic structures. For example, Algorithms 1 and 2 have different structures (cb and si) but produce identical outputs on the same inputs.
• r: Iterations represent the number of cb steps. One configuration example is shown in Figure 1. In that case, the number of execution paths is $2^r$, where 2 stands for binary branching and r is the number of steps. This is the simplest case for path counting. Other cb configurations are determined by the CC metric procedure [7], and the resulting CC value c is equivalent to r (the path count is $2^c$).
• exp: The set of all algorithms with exponential costs.

Theorem 1.
There exists at least one conditional branching structure cb in the set of all possible conditional branching structures CB that cannot be replaced with an equivalent sequence/iteration structure si in an algorithm $a_{si}$ while at the same time keeping the cost of that replacement polynomial.

$$ \exists\, cb \in CB : cb = si \wedge \forall\, a_{si} \in exp $$

Proof.
The opposite of the theorem statement is assumed (that no such cb exists). Consequently, every possible conditional branching construct cb has an equivalent sequence/iteration combination si with a polynomial-cost reduction. In other words, every program or algorithm can be constructed with sequences and iterations only.

The value of the above discussion lies in the applicability of the categorisation to every possible branching scenario. If all conditional branching cases are analysed and it is found that every branching case has an accompanying mirage variant, then the branching programming structure is redundant. Otherwise, designing algorithms with high cyclomatic complexity and asking difficult questions without specifying the input is quite easy.

References

[1] L. A. Levin. The tale of one-way functions. Probl. Inf. Transm., 39(1):92–103, January 2003.
[2] Jeffrey C. Lagarias. The 3x + 1 problem: An overview. http://bookstore.ams.org/mbk-78/ FreeAttachments/mbk-78-prev.pdf.
[3] M. Ciet, A. J. Farrugia, and T. Icart. System and method for a Collatz based hash function, May 2 2013. US Patent App. 13/308,452.
[4] Stephen Wolfram. Computation theory of cellular automata. Communications in Mathematical Physics, 96(1):15–57, 1984.
[5] Stephen Wolfram. A New Kind of Science. Wolfram Media, 2002.
[6] Corrado Böhm and Giuseppe Jacopini. Flow diagrams, Turing machines and languages with only two formation rules. Communications of the ACM, 9(5):366–371, 1966.
[7] Thomas J. McCabe. Structured testing: A software testing methodology using the cyclomatic complexity metric. US Department of Commerce, National Bureau of Standards, 1982.
[8] Arthur H. Watson, Thomas J. McCabe, and Dolores R. Wallace. Structured testing: A testing methodology using the cyclomatic complexity metric.