One Way Function Candidate based on the Collatz Problem

Раде Вучковац (Rade Vuckovac)

Abstract

A one way function based on the Collatz problem is proposed. It is based on the problem's conditional branching structure, which is not considered important even though the 3x + 1 question is quite famous. The analysis shows why the problem is mathematically so inaccessible and how the conditional branching structure of an algorithm can be used to construct one way functions. It also shows an exponential dependence between an algorithm's conditional branching and the running cost of its branch-less reductions.

Introduction

According to Levin [1], the existence of One Way Functions (OWF) is arguably the most important question in computing theory. The existence of an OWF would imply P ≠ NP and the existence of some very important constructs in cryptography, for example pseudorandom number generators, pseudorandom functions and various cryptographic protocols. Informally, an OWF is easy to compute given an input x. When the function description and an output y are given, however, it is difficult to find an input x.

In the absence of a theoretically proven OWF, OWF candidates are used in practice. The bulk of asymmetric encryption is based on OWF candidates such as the factoring and discrete logarithm problems. It has not been proven that they are hard to invert, but so far no efficient non-quantum inversion algorithms have been found. For example, if a relatively large composite integer is given, it is difficult to decompose it into a product of two integers (integer factorization).

The proposed OWF candidate is based on the 3x + 1 problem. This problem is also known as the Collatz problem, and it is a famous problem in mathematics. The problem is easy to state: the input is a positive integer; if the input is even, divide it by 2, otherwise multiply it by 3 and add 1. The outcome is a new input, and the procedure is repeated. The conjecture states that for every positive integer the procedure eventually reaches 1. While the problem does not suffer from a lack of attention, no single mathematical structure is associated with it, and an arbitrarily chosen iteration behaves like a fairly flipped coin [2]. Therefore, the use of the 3x + 1 problem in cryptography is not surprising. Apple Inc. applied for a patent using the Collatz conjecture as a system and method for a hash function [3].

Wolfram's rule 30 cellular automaton [4] is another example with a branching structure. It has almost the same transformation procedure as the 3x + 1 problem. Its English formulation is [5]: "First, look at each cell and its right-hand neighbor.
If both of these were white on the previous step, then take the new color of the cell to be whatever the previous color of its left-hand neighbor was. Otherwise, take the new color to be the opposite of that" (emphasis on "Otherwise" added). Rule 30 is used as a pseudorandom number generator in Wolfram's Mathematica software. Both concepts rely either on the problem's difficulty (Apple's Collatz-based hash) or on empirical evidence (Wolfram's rule 30).

Generally, an algorithm is perceived as a well defined procedure, but that is not always the case. Conditional branching can undermine the procedure's definition and make the algorithm's behaviour quite unpredictable. The OWF proposal uses the underlying conditional branching structure of the 3x + 1 transformation (as in rule 30) to argue that the function is difficult to reverse. The acquired complexity depends on the number of conditional branching iterations r and not on the input size n.

For example, let the input x be a positive integer 512 bits long. Then apply the modified Collatz transformation: if x is even, divide it by 2, otherwise multiply it by 3, add 1 and divide by 2. Repeat the procedure 256 times and record the last output x′ (see Figure 1).

Figure 1: The 3x + 1 composite function; f(x) = x/2 and g(x) = (3x + 1)/2.

Then split the 512-bit input x into two 256-bit values xl (left part) and xr (right part). Do the same for the last iteration point x′, resulting in x′l and x′r. The path value p is a 256-bit record of the path, where an "even" step is recorded as 0 and an "odd" step as 1. The output is calculated as below, where ⊕ is exclusive or (the output is the xor of the input, the stop value and the parity path):

$$y = x_l \oplus x_r \oplus x'_l \oplus x'_r \oplus p$$

Reversal difficulty lies in the absence of a function description. Without the input specified, the transformation is in an ambiguous state and could happen in any of $2^{256}$ ways. Therefore only a particular input defines the corresponding transformation.
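The construction above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. The function names are hypothetical, and two details the text leaves open are filled in by assumption. First, path step i is assumed to be stored at bit position i of p. Second, the final value x′ is not guaranteed to fit in 512 bits, so the sketch splits it at bit 256 and truncates its high half to 256 bits.

```python
def collatz_trace(x: int, r: int) -> tuple[int, list[int]]:
    """Apply the modified map r times; return the final value and the parity path."""
    path = []
    for _ in range(r):
        if x % 2 == 0:
            x //= 2                # even step: x/2, recorded as 0
            path.append(0)
        else:
            x = (3 * x + 1) // 2   # odd step: (3x + 1)/2, recorded as 1
            path.append(1)
    return x, path


def collatz_owf(x: int, r: int = 256, half_bits: int = 256) -> int:
    """Sketch of y = xl ^ xr ^ x'l ^ x'r ^ p (hypothetical helper, not the paper's code)."""
    if not 0 < x < 1 << (2 * half_bits):
        raise ValueError("x must be a positive integer of at most 2*half_bits bits")
    mask = (1 << half_bits) - 1
    xl, xr = x >> half_bits, x & mask          # left and right halves of the input
    final, path = collatz_trace(x, r)
    # Assumption: step i of the path is stored at bit position i of p.
    p = sum(bit << i for i, bit in enumerate(path))
    # Assumption: the final value is split at the same bit position as the input,
    # and any bits of x'l above half_bits are dropped so that x'l fits in half_bits bits.
    xl2, xr2 = (final >> half_bits) & mask, final & mask
    return xl ^ xr ^ xl2 ^ xr2 ^ p
```

With scaled-down parameters the sketch can be checked by hand. The call collatz_owf(9, r=6, half_bits=6) follows the walk 9 → 14 → 7 → 11 → 17 → 26 → 13 with path bits 1, 0, 1, 1, 1, 0 (p = 29), so y = 9 ⊕ 13 ⊕ 29 = 25. This is the same walk as the x = 9, r = 6 example of Table 1.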
If an arbitrary 256-bit integer is presented as an output y, it is hard to prove whether it is a valid output (let alone to find a matching input x), because the function description is in a $2^{256}$-sized ambiguous state. The whole argument is based on the algorithm structure and the classical notions of polynomial and exponential costs.

Informal Theorem 1. To have a proper function description (of 3x + 1, for example), the exponential nature of conditional branching must be circumvented in the algorithm implementing that function. Avoidance results either in an exhaustive search with an accompanying exponential running cost, or in the conditional branching being replaced with a combination of sequence and iteration structures incurring only polynomial cost.

This paper includes an analysis of the relations between the running costs and the conditional branching complexity of various 3x + 1 algorithm variants. The rest of the paper is organised as follows. In section 1, two 3x + 1 algorithms are presented. One uses branching and looks exactly like Figure 1; the other does not use branching at all. In section 2, the 3x + 1 candidate for a one way function is shown and discussed. Section 3 contains a discussion of execution path coverage and its relation to the costs and function descriptions of the algorithms presented.

In this paper, the argument assumes the structured program theorem [6] to be true. It states that a control flow graph needs only three structures to compute any computable function: sequence, selection and iteration. The paper treats the branching structure as an elementary construction. Although it could be replaced with the other two structures, the replacement cannot happen in polynomial running time. That is discussed in section 3.

The cyclomatic complexity (CC) is also an important part of the following discussion. CC is a software metric which measures the number of execution paths a program can take during execution.
It effectively counts predicates (algorithm branching points), and every count doubles the number of execution paths (if the decision is binary) [7]. The assumption is that if an algorithm has a high CC, it is in a state where it can be run but its functional description and behaviour are unknown. For example, if a program has a CC of more than 10 (meaning $2^{10}$ execution paths), the program should be rewritten, because testing each path becomes very costly and it is questionable what the program was intended to do in the first place. Please see the NIST article [8] for CC recommendations. On the other hand, some algorithms have a high CC that apparently cannot be avoided. For example, in the 3x + 1 problem every iteration doubles the number of possible execution paths. The same holds for some cellular automata such as Wolfram's rule 30 [4].

To set the stage for the discussion, the 3x + 1 problem is presented with two algorithms: one with a high CC and the other with a constant CC with respect to the number of iterations. Finally, the 3x + 1 OWF is presented in section 2. All three algorithms are based on the modified 3x + 1 function. The reason is to avoid the extra iteration step of dividing the even value 3x + 1 by 2:

$$f(x) = \begin{cases} x/2 & \text{if } x \equiv 0 \pmod{2} \\ (3x+1)/2 & \text{if } x \equiv 1 \pmod{2} \end{cases}$$

Let the input x be a positive integer for the composite function, and let r be the number of function iterations, i.e. the number of functions f(x) or g(x) involved in the composition:

$$\underbrace{\begin{Bmatrix} f & \text{if } x \text{ is even} \\ g & \text{if } x \text{ is odd} \end{Bmatrix} \circ \begin{Bmatrix} f & \text{if } x \text{ is even} \\ g & \text{if } x \text{ is odd} \end{Bmatrix} \circ \cdots}_{r \text{ times}}$$

where f(x) = x/2 if x is even and g(x) = (3x + 1)/2 if x is odd.
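The modified map and its r-fold composition can be sketched in Python. The sketch mirrors the branching Algorithm 1 that follows, and the function names are illustrative only. The helper parity_path records which of f (0) or g (1) is applied at each step. The final check illustrates the claim that every iteration doubles the number of execution paths: for a small r, inputs 1 to 2^r between them take every one of the 2^r possible compositions. This relies on a known property of this map. The first r parity bits depend only on x mod 2^r, and each pattern occurs exactly once per residue class. Here it serves only as a sanity check.

```python
def f(x: int) -> int:
    """Even branch of the modified map: x/2."""
    return x // 2


def g(x: int) -> int:
    """Odd branch of the modified map: (3x + 1)/2 (3x + 1 is even for odd x)."""
    return (3 * x + 1) // 2


def collatz(x: int, r: int) -> int:
    """Branching version: the if/else chooses f or g at each of the r steps."""
    for _ in range(r):
        x = f(x) if x % 2 == 0 else g(x)
    return x


def parity_path(x: int, r: int) -> tuple[int, ...]:
    """Composition chosen for x: 0 where f is applied, 1 where g is applied."""
    path = []
    for _ in range(r):
        bit = x % 2
        path.append(bit)
        x = g(x) if bit else f(x)
    return tuple(path)


if __name__ == "__main__":
    print(collatz(3, 2))  # g(g(3)) = 8, the x = 3, r = 2 example
    r = 8
    paths = {parity_path(x, r) for x in range(1, 2**r + 1)}
    print(len(paths), 2**r)  # all 2**r compositions occur among these inputs
```

The loop runs r times whatever the input, so its cost is linear in r. However, which of the 2^r compositions it realises is known only once x is fixed.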
Pseudo code looks like:

Algorithm 1 3x + 1 algorithm
  procedure Collatz(x, r)                 ▷ starting integer x and iterations r
    for i = 0; i < r; i++ do
      if x is even then
        x ← x/2
      else
        x ← (3x + 1)/2
      end if
    end for
    return x                              ▷ finishing integer (output)
  end procedure

For example, x = 3 and r = 2 gives the following:

  Input   Transformations   Output
  3       g ∘ g             8

Algorithm 2 and its notation are equivalent to Algorithm 1. The difference is the structure of the algorithm. While Algorithm 1 uses if/else for function composition, this algorithm uses exhaustive search to find the particular composition that matches an input. First, depending on r, the list of all combinations for the composition is created:

  r = 2    r = 3        ...
  f ∘ f    f ∘ f ∘ f    ...
  f ∘ g    f ∘ f ∘ g
  g ∘ g    f ∘ g ∘ f
  g ∘ f    f ∘ g ∘ g
           g ∘ g ∘ g
           g ∘ g ∘ f
           g ∘ f ∘ g
           g ∘ f ∘ f

Then the algorithm tries the input with every composition from the corresponding list and stops when the composite function outputs a whole number. Pseudo code is:

Algorithm 2 3x + 1 search algorithm
  procedure Collatz(x, r)                 ▷ starting integer x and iterations r
    initialise y as a non-integer rational and i = 0    ▷ i is a counter
    initialise f(x) = x/2 and g(x) = (3x + 1)/2
    initialise list l                     ▷ 2^r × r array with all combinations of f and g
    while y is not an integer do
      y ← i-th composition of l applied to x            ▷ do the i-th row from the list
      i ← i + 1
    end while
    return y                              ▷ finishing integer (output)
  end procedure

For example, let x = 3 and r = 2. The algorithm processes the first column of the previous table (r = 2), applying the functions in the order written: f ∘ f(3) = 3/4, f ∘ g(3) = 11/4 (= 2¾) and g ∘ g(3) = 8. The result is an integer, so the algorithm stops and outputs 8.

This is not the first time the Collatz conjecture has been used in a cryptographic application. Apple Inc. applied for a patent using the 3x + 1 problem as a hash system and method [3]. However, the approaches of Apple and of the proposed OWF are different. Apple uses the traditional 3x + 1 transformations (Input and Output columns of Table 1) and adds more operations. Please see the quote below:

"1. A method comprising: receiving an input value and an iteration value; based on the iteration value, iteratively performing steps comprising: if a least significant bit of the input value is 0, (1) dividing the input value by a first value; if a least significant bit of the input value is 1, (1) multiplying the input value by a second value, (2) adding one to the input value, and (3) applying a modulo operation of a prime value to the input value, to yield a first iteration value, to yield an updated input value; returning the updated input value as a hash value."

Note: it appears that the "first value" and "second value" could be numbers other than 2 and 3, respectively.

In contrast, the proposed OWF takes the exclusive or of the input value, the last iterated value and the execution path encoding as its output. Using the example from Table 1, the OWF output is calculated as y = 9 ⊕ 13 ⊕ s, where s encodes the execution path 1, 0, 1, 1, 1, 0.

2 The 3x + 1 OWF

This algorithm is also based on the 3x + 1 problem. It differs from the 3x + 1 problem algorithm in how the algorithm stops and in what the actual output is. The variables are defined below:

• The input x is a positive integer 512 bits long (binary encoding), l_x = 512.
• The number of iterations is r = 256.
• The algorithm output y is a binary string whose bit length equals r.
• s is a record of the selection decisions made during the algorithm's execution.

The program takes x as an input, runs as the 3x + 1 algorithm with r iterations and outputs y. Pseudo code is:

Algorithm 3 3x + 1 OWF algorithm
  procedure CollatzOWF(x, s, r)           ▷ 512-bit x, 256-bit s and iterations r = 256
    create xl and xr                      ▷ equally divided halves of x (256 bits each)
    for i = 0; i < r; i++ do
      if x is even then
        x ← x/2
        s_i ← 0                           ▷ i-th bit of s becomes 0
      else
        x ← (3x + 1)/2
        s_i ← 1                           ▷ i-th bit of s becomes 1
      end if
    end for
    create x′l and x′r                    ▷ divided halves of the final x (x′r is 256 bits wide)
    y ← xl ⊕ xr ⊕ x′l ⊕ x′r ⊕ s            ▷ ⊕ is exclusive or
    return y                              ▷ finishing integer (output)
  end procedure

Table 1: 3x + 1 OWF for x = 9 and r = 6, y = 9 ⊕ 13 ⊕ s

  r   Input       Function      Output      Path Encoding
  1   9 (start)   (3x + 1)/2    14          1
  2   14          x/2           7           0
  3   7           (3x + 1)/2    11          1
  4   11          (3x + 1)/2    17          1
  5   17          (3x + 1)/2    26          1
  6   26          x/2           13 (end)    0

The execution cost of Algorithm 1 depends linearly on the number of iterations r. Algorithm 2 is an exhaustive search, and its cost depends exponentially on the number of iterations r because of the $2^r$-sized table used for the search. The numbers of execution paths of the two algorithms behave in the opposite way. For the relation between algorithm costs and number of paths, see Table 2. When examined further, both algorithms have problems:

• The problem with the branching algorithm is not so obvious. It runs fine, with linear cost w.r.t. the number of iterations. The problem starts when the input is not specifically defined. For example, if r = 256 and the unknown input x is more than r bits long, the 3x + 1 algorithm can apply any of $2^{256}$ different composite functions. That is quite the opposite of what is expected from a mathematical function with a well defined description. For example, the sin function is properly specified: if sin 30° = 0.5 and sin 60° = 0.866, then the sines of angles between 30° and 60° lie somewhere in the range 0.5 to 0.866. That cannot be said for the 3x + 1 algorithm, because no underlying transformation is defined. To obtain a proper 3x + 1 function description (for example for Algorithm 3), someone would have to go through all inputs ($2^r$ of them) and create an ordered table of all inputs and outputs, which is not practical for large r.
• Non-branching algorithms do have well defined execution paths, but their running cost is exponential with respect to the number of iterations r.

Table 2: Costs and path coverage with respect to iterations r

  3x + 1 algorithm variant          Running cost    Number of paths
  Branching (Algorithm 1)           polynomial      2^r
  Non-branching (Algorithm 2)       exponential     constant
  Mirage                            polynomial      constant

Let us consider the set V containing all the 3x + 1 algorithm variants (or reductions). CC is used to categorise all variants. Using CC is beneficial because all variants have a certain programming structure and consequently have a CC metric assigned. The list of all variants can be divided into two groups according to the associated CC:

• The subset C, where C ⊂ V, is the subset of branching algorithms whose number of execution paths depends exponentially on the number of iterations r (1st row of Table 2).
• The remaining variants are in the second subset R, where R ⊂ V and C ∪ R = V (2nd row of Table 2).

The subset R can be divided again into two subsets according to the algorithm's running cost:

• The subset E, where E ⊂ R, contains the variants with exponential running cost w.r.t. iterations r (exhaustive search), like Algorithm 2 in Table 2.
• The subset G, where G ⊂ R and E ∪ G = R, contains the remaining variants, whose running cost is polynomial w.r.t. iterations r (mirage in Table 2).

Subset G is interesting because its members have polynomial running cost and well defined execution paths.

The mirage variant (3rd row of Table 2) is a member of subset G. If a mirage variant exists, it will behave similarly to the non-branching Algorithm 2. However, instead of checking all entries in the $2^r$-sized table, it will always choose the adequate function composition.
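For contrast, the exhaustive search of Algorithm 2 can be sketched with exact rational arithmetic from Python's fractions module. This is an illustrative sketch with hypothetical names. It enumerates compositions in itertools.product order, which differs from the listing order used earlier, and it applies the functions in the order written. The early stop is sound for the following reason. A wrong choice (f on an odd value, or g on an even one) produces a non-integer. After that, both f and g keep doubling the power of two in its denominator, so exactly one composition ends at an integer.

```python
from fractions import Fraction
from itertools import product


def f(x: Fraction) -> Fraction:
    return x / 2


def g(x: Fraction) -> Fraction:
    return (3 * x + 1) / 2


def collatz_search(x: int, r: int) -> tuple[int, int]:
    """Search version: tries compositions of f and g without testing parity.

    Returns the integer result and how many of the 2**r compositions were tried.
    """
    for tried, composition in enumerate(product((f, g), repeat=r), start=1):
        y = Fraction(x)
        for step in composition:  # apply the functions in the order written
            y = step(y)
        if y.denominator == 1:  # only the matching composition ends at an integer
            return int(y), tried
    raise AssertionError("unreachable: one composition always yields an integer")
```

For example, collatz_search(3, 2) returns (8, 4): the search passes through 3/4, 11/4 and 5/2 before g ∘ g gives 8. With x = 2^10 − 1, whose first ten steps are all odd, the all-g composition is the last one enumerated. collatz_search(1023, 10) therefore tries all 1024 compositions before returning 59048, while the branching loop needs only ten steps.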
Essentially, a mirage variant would take the input x and the number of iterations r and execute 3x + 1 in polynomial time with respect to r, without using the branching programming structure. Indeed, the mirage resembles a nondeterministic polynomial (NP) algorithm from complexity theory, which always chooses the correct path when a branching decision is needed. From all of the above, Theorem 1 can be deduced. Note that the structured program theorem [6] already implies conditional branching as a basic algorithmic structure. The reason for this addition is to clarify the cases where conditional branching is replaced with sequence and iteration structures (for example, Algorithm 2).

Definitions 1.

The list of definitions is:

• cb: Conditional branching is an algorithm structure which executes different sequences depending on some comparison. One example is the modified Collatz if/else statement:
$$f(x) = \begin{cases} x/2 & \text{if } x \text{ is even} \\ (3x+1)/2 & \text{if } x \text{ is odd} \end{cases}$$
• si: Sequence and iteration reduction is the situation where cb is replaced with the other two algorithmic structures. For example, Algorithms 1 and 2 have different structures (cb and si), but they produce identical outputs on the same inputs.
• r: Iterations represent the number of cb steps. One configuration example is shown in Figure 1. In that case, the number of execution paths is $2^r$, where 2 reflects binary branching and r is the number of steps. This is the simplest case for path counting. Other cb configurations are handled by the CC metric procedure [7], and the resulting CC value c is equivalent to r (the path count is $2^c$).
• exp: The set of all algorithms with exponential costs.

Theorem 1.

There exists at least one conditional branching structure cb in the set of all possible conditional branching structures CB that cannot be replaced with an equivalent sequence/iteration structure si in an algorithm a_si while also keeping the cost of that replacement polynomial:

$$\exists\, cb \in CB : cb = si \;\wedge\; \forall a_{si} : a_{si} \in exp$$

Proof.

The opposite of the theorem statement is assumed (that no such cb exists). Consequently, every possible conditional branching construct cb would have an equivalent sequence/iteration combination si with a polynomial cost reduction. In other words, every program or algorithm could be constructed with sequences and iterations only.

The value of the above discussion lies in the applicability of the categorisation to every possible branching scenario. Suppose all conditional branching cases are analysed and every branching case turns out to have an accompanying mirage variant. In that case, the branching programming structure is redundant. Otherwise, it is quite easy to design algorithms with high cyclomatic complexity and to ask difficult questions without specifying the input.

References

[1] L. A. Levin. The tale of one-way functions. Probl. Inf. Transm., 39(1):92–103, January 2003.
[2] Jeffrey C. Lagarias. The 3x + 1 problem: An overview. http://bookstore.ams.org/mbk-78/FreeAttachments/mbk-78-prev.pdf.
[3] M. Ciet, A. J. Farrugia, and T. Icart. System and method for a Collatz based hash function, May 2, 2013. US Patent App. 13/308,452.
[4] Stephen Wolfram. Computation theory of cellular automata. Communications in Mathematical Physics, 96(1):15–57, 1984.
[5] Stephen Wolfram. A New Kind of Science. Wolfram Media, 2002.
[6] Corrado Böhm and Giuseppe Jacopini. Flow diagrams, Turing machines and languages with only two formation rules. Communications of the ACM, 9(5):366–371, 1966.
[7] Thomas J. McCabe. Structured testing: A software testing methodology using the cyclomatic complexity metric. US Department of Commerce, National Bureau of Standards, 1982.
[8] Arthur H. Watson, Thomas J. McCabe, and Dolores R. Wallace. Structured testing: A testing methodology using the cyclomatic complexity metric.