On the existence of recurrent structures & statistical bias in the Collatz path sequences

Sawon Pratiher, Subhasis Kundu

Department of Electrical Engineering, IIT Kanpur, UP, India; Amazon Development Center, India, Private Limited

Abstract

This paper enumerates some numerical findings concerning the repetitive patterns arising in the so-called Collatz path sequences, $\{F^k(n)\}_{k=0}^{\infty}$, where $F^0(n) = n$ and $F^k(n) = F(F^{k-1}(n))$. This is followed by a closed-form finite state machine (FSM) model of these recurrences using a set of linear congruence equations, resulting in a different terminating condition for the Collatz problem. We show the completeness of the problem in a finite number of recurrent forms (here, six), such that the elements of the Collatz path sequence switch between these forms until one of them reaches a number of the form $2^m$. Further, using heuristic analysis of the frequency distribution of these recurrent forms, we exhibit a statistical bias and give a constructive analytical formulation and convergence for some of the forms. Unlike many other approaches described in the literature, the present contribution illustrates the existence of a recurrence quantification of the Collatz conjecture. If the presented analysis can be made rigorous enough to solve the set of linear congruence equations for the Collatz FSM model, the exact nature of the Collatz problem can be inferred.

Keywords: Collatz Problem; Number Theory; Recurrence; Sequence; Congruence; Bias; Complete; Finite State Machine

I. INTRODUCTION

Problems that are simple to state but extremely hard to solve abound in mathematics, and one such problem is the 3n + 1 problem. It is also known as the Collatz problem, Hasse's algorithm (after Helmut Hasse), Ulam's problem (after Stanislaw Ulam), the Syracuse problem and Kakutani's problem (after Shizuo Kakutani). The 3n + 1 problem is described by the repeated mapping $F: \mathbb{Z} \to \mathbb{Z}$, with $\mathbb{Z}$ here denoting the set of positive integers:

$$F(n) = \begin{cases} 3n + 1 & \text{if } n \equiv 1 \pmod 2 \\ n/2 & \text{if } n \equiv 0 \pmod 2 \end{cases}$$

For $n, k \in \mathbb{Z}$, the Collatz conjecture states that for every n there exists some k for which $F^k(n) = 1$, after which the sequence falls into the cycle 4 → 2 → 1.

From the works of Garner [1] and Terras [2], we find that the problem has been verified for all n < 2,000,000,000, which was further extended by Leavens and Vermeulen [3] to all $n < 5.6 \times 10^{13}$. Analysis of the stopping time and of the number of iterations required for the Collatz sequences to converge can be traced to the works of Everett [4], Silva [5] and Dolan [6], whereas Steiner's theorem [7] and Simons' work [8] on the non-existence of 2-cycles for the Collatz problem offer an analytical perspective. For a comprehensive account of previous findings, readers are referred to the works of Lagarias [9-12] and Chamberland [13].

Section II explains the mathematical model used for grouping the natural numbers into modulo 4 classes and the resulting exposition of the recurrent patterns in the Collatz path sequences. These recurrences are modelled using an FSM, and a closed-form recurrence quantification is described for the proposed FSM, together with the completeness of these patterns in a finite number of recurrent forms. Section III deals with the numerical enumerations and the frequency distribution analysis, leading to the observation of statistical bias in the distribution and to closed-form convergence for some of these recurrent forms. Finally, conclusions and the future scope of the proposed recurrence quantification model of the Collatz problem are discussed in Section IV.

II. APPLIED MATHEMATICAL MODEL

Segregation of the Natural Numbers into classes

We begin by defining the two operations of the Collatz problem as follows:

E = Collatz Even Operation, i.e., n → n/2, if n is even, and
O = Collatz Odd Operation, i.e., n → 3n + 1, if n is odd.

Prior work by Tzanis [14] on grouping numbers into classes motivates us to proceed by classifying all natural numbers into modulo 4 classes, namely classes A, B, C and D, as shown in Table I.

TABLE I - MODULO GROUPING OF NUMBERS

| Class A | Class B | Class C | Class D |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 | 8 |
| 9 | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |

$A = \{x \mid x \equiv 1 \pmod 4\}$, $B = \{x \mid x \equiv 2 \pmod 4\}$, $C = \{x \mid x \equiv 3 \pmod 4\}$, $D = \{x \mid x \equiv 0 \pmod 4\}$, i.e., the numbers in each of the classes are of the form:

Class A: 4k + 1
Class B: 4k + 2
Class C: 4k + 3
Class D: 4k + 4

where k = 0, 1, 2, 3, 4, … Table II lists the notation used throughout the rest of the paper to refer to the recurrent forms.

TABLE II - RECURRENT FORMS

| Recurrent Form | Notation |
|---|---|
| 9n + 8 | (a) |
| 9n + 4 | (b) |
| 9n + 2 | (c) |
| 9n + 1 | (d) |
| 9n + 5 | (e) |
| 9n + 7 | (f) |
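The two operations and the groupings of Tables I and II can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`collatz_even`, `collatz_odd`, `modulo4_class`, `recurrent_form`) are hypothetical and do not come from the paper.

```python
# Illustrative sketch; all names below are assumptions, not the authors' code.

def collatz_even(n):
    """E: Collatz Even Operation, n -> n/2 (n must be even)."""
    return n // 2


def collatz_odd(n):
    """O: Collatz Odd Operation, n -> 3n + 1 (n must be odd)."""
    return 3 * n + 1


def modulo4_class(n):
    """Class of Table I: A = 4k+1, B = 4k+2, C = 4k+3, D = 4k+4."""
    return "ABCD"[(n - 1) % 4]


# Residue modulo 9 -> notation of Table II.
FORMS = {8: "a", 4: "b", 2: "c", 1: "d", 5: "e", 7: "f"}


def recurrent_form(n):
    """Notation of Table II for n, or None if n mod 9 is 0, 3 or 6."""
    return FORMS.get(n % 9)


classes = [modulo4_class(n) for n in range(1, 9)]
```

Residues 0, 3 and 6 modulo 9 have no entry in Table II, which is why `recurrent_form` returns `None` for them; powers of 2 never fall in those residues.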

Different terminating condition of the Collatz problem:

It will be shown that the Collatz path sequence of every natural number either terminates in one of these six forms or keeps switching between them until one of the forms becomes $2^m$, $m \in \mathbb{Z}$, after which repeated Collatz Even Operations take it to 1. Proving the Collatz problem therefore entails showing that every natural number taken from classes A, B, C and D, after a finite sequence of E's and O's, results in one of the two scenarios below:

1. A number smaller than the initial starting number, or
2. A number of the form $2^m$.

If condition 2 is satisfied for $n \in \mathbb{Z}$, then repeated Collatz Even Operations take it to 1. If condition 1 is satisfied for $n \in \mathbb{Z}$, then by induction the smaller number is again reduced to one of the two conditions above, and so on.

I. Class A: numbers of the form 4k + 1

$4k + 1 \xrightarrow{O} 12k + 4 \xrightarrow{E} 6k + 2 \xrightarrow{E} 3k + 1$

3k + 1 < 4k + 1 for k ≥ 1 (yielding a smaller number)

II. Class B: numbers of the form 4k + 2

$4k + 2 \xrightarrow{E} 2k + 1$

2k + 1 < 4k + 2 (yielding a smaller number)

III. Class D: numbers of the form 4k + 4

$4k + 4 \xrightarrow{E} 2k + 2$

2k + 2 < 4k + 4 (yielding a smaller number)
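The reductions for classes A, B and D can be checked numerically over a range of k. The sketch below is an illustration only: the helper names are hypothetical and the range of k is an arbitrary choice. It also applies the sequence O, E, O, E that the paper associates with class C, to show that it leads from 4k + 3 to 9k + 8, which is larger than the starting number.

```python
# Illustrative check; helper names and the tested range are assumptions.

def apply_ops(n, ops):
    """Apply a string of Collatz operations ('E' or 'O') to n."""
    for op in ops:
        if op == "E":
            if n % 2 != 0:
                raise ValueError("E applied to an odd number")
            n //= 2
        else:
            if n % 2 != 1:
                raise ValueError("O applied to an even number")
            n = 3 * n + 1
    return n


def reductions_hold(max_k=1000):
    """True if classes A, B and D reduce as derived for k = 1..max_k."""
    for k in range(1, max_k + 1):
        a = apply_ops(4 * k + 1, "OEE")   # class A -> 3k + 1
        b = apply_ops(4 * k + 2, "E")     # class B -> 2k + 1
        d = apply_ops(4 * k + 4, "E")     # class D -> 2k + 2
        if not (a == 3 * k + 1 < 4 * k + 1):
            return False
        if not (b == 2 * k + 1 < 4 * k + 2):
            return False
        if not (d == 2 * k + 2 < 4 * k + 4):
            return False
    return True


def class_c_image(k):
    """Class C: 4k + 3 after the operations O, E, O, E."""
    return apply_ops(4 * k + 3, "OEOE")


ok = reductions_hold()
```

For k = 0 the class A case starts at 1, and O, E, E returns 1 again, so the strict inequality only holds from k = 1 onwards.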

Observation:

For the three classes A, B and D, the number reduces to a smaller one, and any number belonging to these classes is again reduced to a smaller one whenever possible, and so on.

IV. Class C: numbers of the form 4k + 3

$4k + 3 \xrightarrow{O} 12k + 10 \xrightarrow{E} 6k + 5 \xrightarrow{O} 18k + 16 \xrightarrow{E} 9k + 8$

For numbers belonging to class C, this sequence does not yield a number smaller than the initial starting number, so it must yield a number of the form $2^m$ for the Collatz conjecture to be true (i.e., the hypothesis). On studying the conditions required for class C numbers to converge to the form $2^m$, we find recurrent patterns in their path sequences; these patterns and their completeness are studied in the following sections.

From the above tree structure, we find that the path sequence elements converge to the forms (c) and (a), which can again be traced back to the initial occurrence of the forms (c) and (a), and hence the existence of repetitive patterns. These repetitive structures manifest in a closed-form recurrence quantification of the Collatz path sequences, which can be modelled using a Finite State Machine (FSM), shown in the next section. It may also be noted that these repetitive sequences continue until one of the forms reaches a number of the form $2^m$. Since elements of classes A, B and D are reducible to a smaller number, the Collatz problem now rests on proving that all natural numbers belonging to class C reach a number of the form $2^m$ through these recurrent path elements. The next section provides the FSM model and its recurrence quantification using a set of linear congruence equations.

Finite State Machine Modelling

Fig. 1. Finite State Machine of the recurrent structures of the Collatz path sequence for any number n

Fig. 1 describes the recurrence quantification of the Collatz problem using an FSM, where the E's and O's represent the sequence of Collatz even and odd operations. The closed-form completeness of the FSM is evident from Fig. 1, and its analytical explanation is given below. According to the Collatz problem, any positive integer must reach some power of two, say $2^m$, under the function $F(n)$ iterated i times. This $2^m$ will always lie in one of the six congruence classes (a), (b), (c), (d), (e) and (f).

Proof:

Let $2^i = 9k + 2$, where $i \ge 0$. At each step k denotes a fresh non-negative integer:

$$\begin{aligned}
2^{i} = 9k + 2 &\implies 2^{i+1} = 2(9k + 2) = 18k + 4 = 9n + 4 && [n = 2k] \\
2^{i+1} = 9k + 4 &\implies 2^{i+2} = 2(9k + 4) = 18k + 8 = 9n + 8 && [n = 2k] \\
2^{i+2} = 9k + 8 &\implies 2^{i+3} = 2(9k + 8) = 18k + 16 = 9m + 7 && [m = 2k + 1] \\
2^{i+3} = 9k + 7 &\implies 2^{i+4} = 2(9k + 7) = 18k + 14 = 9n + 5 && [n = 2k + 1] \\
2^{i+4} = 9k + 5 &\implies 2^{i+5} = 2(9k + 5) = 18k + 10 = 9n + 1 && [n = 2k + 1] \\
2^{i+5} = 9k + 1 &\implies 2^{i+6} = 2(9k + 1) = 18k + 2 = 9n + 2 && [n = 2k]
\end{aligned}$$

Thus, after every 6 iterations the powers of 2 return to the same congruence class, so every power of 2 falls in one of the congruence classes (a), (b), (c), (d), (e) and (f).

It may also be noted that, for each of the six classes (a), (b), (c), (d), (e) and (f) and any number x in that class, any number of the form $x \cdot 2^i$ (where $i \ge 0$) also lies in the same class.
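The six-step cycle of the powers of 2 modulo 9 can be confirmed with Python's built-in three-argument `pow`. This is a small sketch for illustration; the function name `power_of_two_residues` is hypothetical.

```python
# Illustrative sketch; the function name is an assumption.

def power_of_two_residues(count):
    """Residues of 2**0, 2**1, ..., 2**(count - 1) modulo 9."""
    return [pow(2, i, 9) for i in range(count)]


residues = power_of_two_residues(12)
# The first six residues are 1, 2, 4, 8, 7, 5, i.e. forms
# (d), (c), (b), (a), (f), (e); the next six repeat them.
```

The residues 0, 3 and 6 never occur, since a power of 2 is never divisible by 3.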

Proof (that $x \cdot 2^i$ lies in the same class as x):

Consider $x \cdot 2^i \in M$, where M can be any of the congruence classes listed in Table II. Now $F^i(x \cdot 2^i) = x$ [following the Collatz problem]. Thus, the stopping number for x and $x \cdot 2^i$ is the same. [The stopping number of x is the power of 2 that is first reached from x, i.e., $F^i(x) = 2^m$ for some m, while $F^{i-1}(x) \neq 2^k$ for any k.]

A closer look at the FSM described in Fig. 1 yields a set of linear equations involving congruences for the FSM model, numbered (1) to (6):

[Equations (1) to (6), one for each of the forms $a_n$, $b_n$, $c_n$, $d_n$, $e_n$ and $f_n$: not recoverable from the source text.]

where $a_n$, $b_n$, $c_n$, $d_n$, $e_n$ and $f_n$ denote the n-th recurrent forms for each class. Solving the above set of congruence equations gives the set of values of n for each of these converging recurrent forms. We need to show that the union of the solution sets of these six recurrences equals the set of natural numbers for the Collatz problem. There is an approach based on the power set construction [15, 16] with which it can be shown that, starting with the six sets $N_a$, $N_b$, $N_c$, $N_d$, $N_e$, $N_f$, their union is the set $\mathbb{N}$ (i.e., the set of all positive integers). The six sets are formed according to the following rule:

$$N_a = \{x \mid x \in \mathbb{N},\ F^i(x) = y,\ y = 2^m,\ y \in \{9k+8,\ 9k+4,\ 9k+2,\ 9k+1,\ 9k+5,\ 9k+7\}\}$$

It has been shown that any number x of the Tzanis form satisfies $F^i(x) = y$, where $y = 2^m$, $m \in \mathbb{Z}$, and y also satisfies one of the recurrent forms (a), (b), (c), (d), (e), (f). Thus,

$$\mathbb{N} = N_a \cup N_b \cup N_c \cup N_d \cup N_e \cup N_f = \mathbb{Z}$$

III. OBSERVATIONS

Enumerations & bias in the distribution of the recurrent forms

Inspired by the paper "Unexpected biases in the distribution of consecutive primes" by Robert J. Lemke Oliver and Kannan Soundararajan [17], and in order to study the asymptotic behavior of the repetitive patterns arising in the Collatz path sequences, we apply an enumerative approach to our mathematical model for the first 10 natural numbers. The enumeration approach is as follows:

1. For each multiple of 10, we find the frequency distribution $N(r)$ for each of these recurrent forms.
2. The frequency distribution $N(r)$ of each recurrent form is defined as:

$$N(r) = \frac{N_r}{N_{Total}}$$

where r is the recurrent form, which may be any of (a), (b), (c), (d), (e) and (f), and:

$N_{Total}$: total number of elements in the set containing all natural numbers from 1 to a given number N, as given in Table III. Note that $N_{Total} = N$.
$N_r$: total number of elements in the subset of that set whose terminating Collatz path sequence has the recurrent form r. The terminating Collatz path sequence is whichever of (a), (b), (c), (d), (e) and (f) in the Collatz path sequence becomes a number of the form $2^m$.

For example, for all the natural numbers from 1 to 100 we have $N = N_{Total} = 100$ and:

$N_a = 89$, $N(a) = 89/100 = 0.89$
$N_b = 1$, $N(b) = 1/100 = 0.01$
$N_c = 3$, $N(c) = 3/100 = 0.03$
$N_d = 2$, $N(d) = 2/100 = 0.02$
$N_e = 4$, $N(e) = 4/100 = 0.04$
$N_f = 1$, $N(f) = 1/100 = 0.01$
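The enumeration can be sketched in Python under one explicit assumption, since the text does not spell out the exact rule for assigning a terminating form. The sketch assumes that each odd step is taken together with the halving that follows it, i.e. n → (3n + 1)/2 (an O followed by an E), and that the terminating form is the residue modulo 9 of the first power of 2 met on the path. Under this assumption the counts for N = 100 agree with the example above; all function names are hypothetical.

```python
# Illustrative sketch under the assumption stated in the lead-in;
# this is not the authors' enumeration code.
from collections import Counter

# Residue modulo 9 -> notation of Table II.
FORMS = {8: "a", 4: "b", 2: "c", 1: "d", 5: "e", 7: "f"}


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def terminating_form(n):
    """Form of the first power of 2 met on the path of n.

    Assumption: an odd step is combined with the following halving,
    n -> (3n + 1) // 2. The loop relies on the path actually reaching
    a power of 2, which is known to hold for the small n used here.
    """
    while not is_power_of_two(n):
        n = n // 2 if n % 2 == 0 else (3 * n + 1) // 2
    return FORMS[n % 9]


def form_counts(limit):
    """N_r for every form r over the natural numbers 1..limit."""
    counts = Counter(terminating_form(n) for n in range(1, limit + 1))
    return {form: counts[form] for form in "abcdef"}


def frequency_distribution(limit):
    """N(r) = N_r / N_Total for every form r, with N_Total = limit."""
    return {form: c / limit for form, c in form_counts(limit).items()}


counts_100 = form_counts(100)
```

Calling `frequency_distribution(10)` under the same assumption gives the values 0.7, 0.1, 0.1, 0.1, 0 and 0 for forms (a) to (f).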

The details of the enumeration results are given in Table III.

Table III: Frequency distribution of the 6 terminating recurrence forms for the Collatz path sequence element becoming $2^m$

| N | N(a) | N(b) | N(c) | N(d) | N(e) | N(f) |
|---|---|---|---|---|---|---|
| 10 | 0.700000 | 0.1000 | 0.1000 | 0.1000 | 0.00000 | 0.0000 |
| . . . | | | | | | |

The following observations can be made from Table III.

1. The Collatz path sequence shows biases in the distribution of the recurrent forms through which it reaches $2^m$.
2. The chance that the recurrent form through which it becomes $2^m$ is one of (a), (b), (c), (d), (e) or (f) is greatest for recurrent forms (a) and (c), and asymptotically the frequencies converge towards a bias.
3. Also, from the stochastic viewpoint of a pure random process [18-21], if the Collatz problem were truly random, these recurrent forms would be equally distributed and a recurrence quantification as shown for these forms would not be possible.

The exact nature of the Collatz problem lies in solving the 6 linear congruence equations used in describing the FSM model. Prime factorization of the elements of the set $N_r$ for some of the recurrence classes shows patterns, and constructive formulations for some of the recurrent forms are given below.

Form (d): 9n + 1

S(d) = {1, 64, 4096, 262144, ...}

Table IV: Prime factorization of the elements of set S(d)

| Set Element | Prime Factorization |
|---|---|
| 1 | $2^{0}$ |
| 64 | $2^{6}$ |
| . . . | . . . |

So, the general term of the numbers for which it becomes $2^m$, for congruence class (d), is $2^{6k}$, where k = 0, 1, 2, 3, 4, …

Form (b): 9n + 4

S(b) = {4, 256, 16384, ...}

Table V: Prime factorization of the elements of set S(b)

| Set Element | Prime Factorization |
|---|---|
| 4 | $2^{2}$ |
| 256 | $2^{8}$ |
| . . . | . . . |

So, the general term of the numbers for which it becomes $2^m$, for congruence class (b), is $2^{(6k + 2)}$, where k = 0, 1, 2, 3, 4, …

Form (f): 9n + 7

S(f) = {16, 1024, 65536, ...}

Table VI: Prime factorization of the elements of set S(f)

| Set Element | Prime Factorization |
|---|---|
| 16 | $2^{4}$ |
| . . . | . . . |

So, the general term of the numbers for which it becomes $2^m$, for congruence class (f), is $2^{(6k + 4)}$, where k = 0, 1, 2, 3, 4, …

Form (e): 9n + 5

S(e) = {21, 32, 42, 84, 168, 336, 1344, 1365, 2048, 2688, 2730, 5376, 5460, 10752, 10920, 21504, 21840, 43008, 43680, 86016, 87360, 87381, 131072, 172032, 174720, 174762, 344064, 349440, 349524, 688128, 698880, 699048, ...}

Table VII: Prime factorization of the elements of set S(e)

| Set Element | Prime Factorization |
|---|---|
| 21 | $3 \times 7$ |
| 32 | $2^{5}$ |
| 42 | $2 \times 3 \times 7$ |
| 84 | $2^{2} \times 3 \times 7$ |
| 168 | $2^{3} \times 3 \times 7$ |
| 336 | $2^{4} \times 3 \times 7$ |
| 1344 | $2^{6} \times 3 \times 7$ |
| 1365 | $3 \times 5 \times 7 \times 13$ |
| 2048 | $2^{11}$ |
| 2688 | $2^{7} \times 3 \times 7$ |
| 2730 | $2 \times 3 \times 5 \times 7 \times 13$ |
| 5376 | $2^{8} \times 3 \times 7$ |
| 5460 | $2^{2} \times 3 \times 5 \times 7 \times 13$ |
| 10752 | $2^{9} \times 3 \times 7$ |
| 10920 | $2^{3} \times 3 \times 5 \times 7 \times 13$ |
| 21504 | $2^{10} \times 3 \times 7$ |
| 21840 | $2^{4} \times 3 \times 5 \times 7 \times 13$ |
| 43008 | $2^{11} \times 3 \times 7$ |
| 43680 | $2^{5} \times 3 \times 5 \times 7 \times 13$ |
| 86016 | $2^{12} \times 3 \times 7$ |
| 87360 | $2^{6} \times 3 \times 5 \times 7 \times 13$ |
| 87381 | $3^{2} \times 7 \times 19 \times 73$ |
| 131072 | $2^{17}$ |
| 172032 | $2^{13} \times 3 \times 7$ |
| 174720 | $2^{7} \times 3 \times 5 \times 7 \times 13$ |
| 174762 | $2 \times 3^{2} \times 7 \times 19 \times 73$ |
| 344064 | $2^{14} \times 3 \times 7$ |
| 349440 | $2^{8} \times 3 \times 5 \times 7 \times 13$ |
| 349524 | $2^{2} \times 3^{2} \times 7 \times 19 \times 73$ |
| 688128 | $2^{15} \times 3 \times 7$ |
| 698880 | $2^{9} \times 3 \times 5 \times 7 \times 13$ |
| 699048 | $2^{3} \times 3^{2} \times 7 \times 19 \times 73$ |
| . . . | . . . |

On analysis, although no generalized term representation was found for recurrence class (e), repetitive patterns were found in numbers of the form $2^{(6k+5)}$, occurring at gaps that follow a 2nd-order arithmetic progression, where k = 0, 1, 2, 3, 4, ...

Form (c): 9n + 2

S(c) = {2, 75, 85, 113, 128, 150, 170, 226, 267, 300, 301, 340, 401, 452, 453, 475, ...}

Table VIII: Prime factorization of the elements of set S(c)

| Set Element | Prime Factorization |
|---|---|
| 2 | $2$ |
| 75 | $3 \times 5^{2}$ |
| 85 | $5 \times 17$ |
| 113 | $113$ |
| 128 | $2^{7}$ |
| 150 | $2 \times 3 \times 5^{2}$ |
| 170 | $2 \times 5 \times 17$ |
| 226 | $2 \times 113$ |
| 267 | $3 \times 89$ |
| 300 | $2^{2} \times 3 \times 5^{2}$ |
| 301 | $7 \times 43$ |
| 340 | $2^{2} \times 5 \times 17$ |
| 401 | $401$ |
| 452 | $2^{2} \times 113$ |
| 453 | $3 \times 151$ |
| 475 | $5^{2} \times 19$ |
| . . . | . . . |

Form (a): 9n + 8

S(a) = {3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, ...}

Table IX: Prime factorization of the elements of set S(a)

| Set Element | Prime Factorization |
|---|---|
| 3 | $3$ |
| 5 | $5$ |
| 6 | $2 \times 3$ |
| 7 | $7$ |
| 8 | $2^{3}$ |
| 9 | $3^{2}$ |
| 10 | $2 \times 5$ |
| 11 | $11$ |
| 12 | $3 \times 2^{2}$ |
| 13 | $13$ |
| 14 | $2 \times 7$ |
| 15 | $3 \times 5$ |
| 17 | $17$ |
| 18 | $2 \times 3^{2}$ |
| 19 | $19$ |
| 20 | $2^{2} \times 5$ |
| . . . | . . . |

No repetitive patterns or constructive formulations were found for recurrence forms (a) and (c).

IV. DISCUSSION & CONCLUSION

We have presented a formal argument for a different terminating condition for the Collatz problem and shown the existence of recurrent patterns in the Collatz path sequences, followed by a model of these repetitive patterns by means of functional equations involving a set of linear congruence equations. The presence of bias in the distribution of the recurrent patterns arising in the so-called Collatz path sequences throws light on its limiting stochastic behavior from the viewpoint of a pure random process. It may be noted that classes A, B and D are reducible to a lower form, so the problem now reduces to solving for class C only, i.e., to showing that for every element of class C there exists a path sequence following this FSM that terminates in one of these recurrent forms with a value of the form $2^m$. Also, if the presented method can be made rigorous enough to solve the set of congruence equations, it will lead to a constructive formulation for the rest of the recurrence forms, throwing light on the statistical bias observed in the distribution of the recurrence forms in Table III. Since no such patterns have been reported for the Collatz problem so far, the presented work holds analytical significance. We hope that our findings will boost research in this direction. The fact that the Collatz path sequence elements recur in a periodic fashion offers a fresh and optimistic outlook on the Collatz problem.

REFERENCES

1. Garner, Lynn E. "On the Collatz $3n + 1$ algorithm." Proceedings of the American Mathematical Society.

2. Terras, Riho. "A stopping time problem on the positive integers." Acta Arithmetica.
3. Leavens, Gary T., and Mike Vermeulen. "3x+1 search programs." Computers & Mathematics with Applications.
4. Everett, C. J. "Iteration of the number-theoretic function f(2n) = n, f(2n+1) = 3n+2." Advances in Mathematics.
5. Silva, Tomás Oliveira E. "Maximum excursion and stopping time record-holders for the 3x+1 problem: computational results." Mathematics of Computation.
6. Dolan, J. M., A. F. Gilman, and S. Manickam. "A generalization of Everett's result on the Collatz 3x+1 problem." Advances in Applied Mathematics.
7. Steiner, Ray P. "A theorem on the Syracuse problem." Proceedings of the 7th Manitoba Conference on Numerical Mathematics and Computation. 1977.
8. Simons, John L. "A simple (inductive) proof for the non-existence of 2-cycles of the 3x+1 problem." Journal of Number Theory.
9. Lagarias, Jeffrey C. "The 3x+1 problem and its generalizations." The American Mathematical Monthly.
10. Lagarias, Jeffrey C. "The 3x+1 problem: An annotated bibliography, ii (2000-2009)." arXiv preprint math/0608208 (2006).
11. Applegate, David, and Jeffrey C. Lagarias. "Density bounds for the 3x+1 problem. I. Tree-search method." Mathematics of Computation.
12. Lagarias, Jeffrey C., ed. The ultimate challenge: The 3x+1 problem. American Mathematical Soc., 2010.
13. Chamberland, Marc. "An update on the 3x+1 problem." (2003).
14. Tzanis, Evangelos. "Collatz Problem: Properties and Algorithms." Computation And Reasoning Laboratory (Corelab), School of Electrical and Computer Engineering, National Technical University of Athens, Athens (2003).
15. Hopcroft, John E., Rajeev Motwani, and Jeffrey D. Ullman. "Introduction to automata theory, languages, and computation." ACM SIGACT News.
16. Hopcroft, John E., and Jeffrey D. Ullman (1979). "The equivalence of DFA's and NFA's". Introduction to Automata Theory, Languages, and Computation. Reading, Massachusetts: Addison-Wesley. pp. 22–23. ISBN 0-201-02988-X.
17. Oliver, Robert J. Lemke, and Kannan Soundararajan. "Unexpected biases in the distribution of consecutive primes." Proceedings of the National Academy of Sciences.
18. Pathria, R. K. "A statistical study of randomness among the first 10,000 digits of π." Mathematics of Computation.
19. Kendall, Maurice G., and B. Babington Smith. "Randomness and random sampling numbers." Journal of the Royal Statistical Society.
20. Kendall, Maurice G., and B. Babington-Smith. "Second paper on random sampling numbers." Supplement to the Journal of the Royal Statistical Society.
21. Good, I. J. "The serial test for sampling numbers and other tests for randomness."