Recursive Prime Factorizations: Dyck Words as Numbers
Recursive Prime Factorizations: Dyck Words as Representations of Numbers
arXiv [cs.FL]
Ralph [“Tim”] Leroy Childress, Jr.
February 16, 2021
Abstract
I propose a class of numeral systems where numbers are represented by Dyck words, with the systems arising from a generalization of prime factorization. After describing two proper subsets of the Dyck language capable of uniquely representing all natural and rational numbers respectively, I consider “Dyck-complete” languages, in which every member of the Dyck language represents a number. I conclude by suggesting possible research directions.
My fascination with patterns exhibited in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ led me to much experimentation and indeed quite a bit of frustration trying to discover and characterize such patterns. One of the most perplexing problems I encountered was the inherent arbitrariness of positional numeral systems. Consider the number 520:

$$520 = 5 \times 10^2 + 2 \times 10^1 + 0 \times 10^0. \tag{1}$$

The implicit selection of 10 as radix, though a convention tracing back to antiquity, reflects an arbitrary choice with consequences for patterns manifested in the representations. For instance, a well-known pattern is that if the sum of the digits in a decimal representation of a number is equal to a multiple of 3, then the number itself is divisible by 3; yet that is not the case with base-2 or base-5. This and many other such patterns may be generalized to apply to numeral systems of any base $\geq 2$, but for most of us the generalization detracts from the immediacy of the realization. It would be useful if the system of representation did not require any one number to assume undue importance above the others, so that patterns would directly reflect characteristics of those numbers under examination rather than being obscured by the selection of some irrelevant number to serve as “the radix.”

There is another drawback inherent in positional numeral systems, at least with regard to their use to identify and characterize patterns among numbers. As is evident in Equation 1, evaluation of a number’s positional representation requires three distinct operations, namely exponentiation, multiplication and addition. However, the number represented by decimal 520 can be more simply represented by a unique product of prime numbers

$$2^3 \times 5 \times 13,$$

called its prime factorization, the evaluation of which does not involve addition.

Remark. More precisely, the prime factorization of a number is unique up to the order of the factors. Also, when I say “the evaluation of which does not involve addition,” I am referring to addition as a distinct operation in the evaluation; obviously the multiplication of natural numbers may be viewed as iterated addition.

Thus I set out on a quest to discover systems for representing numbers where the systems, being based upon prime factorization, neither involve the concept of a radix nor require addition for evaluation. I furthermore sought such systems with alphabets of the smallest size.

I succeeded in my quest, discovering a class of systems I call “Natural Recursive Prime Factorizations” (“Natural RPFs”), where each of these systems can represent all members of $\mathbb{N}$ using a language with an alphabet of only two symbols. I subsequently realized natural RPFs can be modified to yield another class of systems, “rational RPFs,” each of which is capable of representing not merely all members of $\mathbb{N}$ but all rational numbers. A remarkable fact about rational RPF systems is that, unlike decimal and other positional numeral systems, no enlargement of the alphabet is needed for representation of the rationals; the same two-symbol alphabet is employed as for natural RPF, without need for a negative sign or a radix point. There is also no need for an overbar to designate repeating symbol sequences, since every rational RPF system is able to represent all rational numbers by strings of finite length.

I must warn the reader at the outset that these systems are impractical for application to the mundane tasks of everyday life, such as balancing checkbooks or enumerating street addresses. But they were never intended for such purposes, instead being conceived to facilitate the study of patterns among numbers, offering a convenient bridge between number theory and the theory of formal languages.
Natural RPF systems, for example, invite the analysis of their words using powerful techniques from computer science such as context-free grammars, parsers and finite state machines, providing direct connections between numbers and subsets of the well-studied Dyck language $D$, including $D$ itself. Words produced in these systems moreover do not involve an arbitrarily selected radix, eliminate the necessity for addition in their evaluation, and are closely related to the prime factorizations of almost all the numbers they represent.

Remark.
I say almost all because 0 has no prime factorization, and because the question of whether 1 has a prime factorization is a matter of dispute [4]. Also, note that I regard prime numbers as having prime factorizations, the factorization of a prime number being the number itself, as given by the equation $p_k = \prod_{i=k}^{k} p_i$.

Many interesting patterns arise in number sequences defined according to properties shared by their members’ representations as Dyck words (see Section 5.2, for example).

I begin by presenting a system capable of representing natural numbers by unique finite sequences of left and right parentheses. For now I will refer to this system as “minimal natural RPF,” abbreviated $\mathrm{RPF}_{\min}$, in order to introduce the concept without first launching into a lengthy digression concerning languages and their interpretations. In Section 2.3, I will identify the system more precisely as “the standard minimal RPF natural interpretation $\mathrm{RPF}_{\mathbb{N}_r\min}$.”

If challenged to describe minimal natural RPF in one sentence, I might say: “It is a numeral system in which 0 and 1 are represented by the empty string $\epsilon$ and () respectively, with every other natural number $n$ being written as a product of powers of consecutive primes from 2 up to and including the greatest prime factor of $n$, each exponential term being surrounded by a single pair of parentheses and nonzero exponents being recursively treated in the same fashion as described for $n$, with the final resulting expression being stripped of all symbols except the parentheses, which are then rewritten on one line while preserving their order from left to right.”

I myself have difficulty digesting that long-winded sentence; let us therefore abandon it in favor of three examples, these being collectively sufficient to suggest how an arbitrary natural number may be represented in minimal natural RPF.
To start with, the representations of zero and one are given by explicit definition:

• Zero is represented by the empty word $\epsilon$.
• One is represented by the word ().

The $\mathrm{RPF}_{\min}$ representation of every other natural number may be obtained by application of a recursive algorithm, as I will illustrate by finding the $\mathrm{RPF}_{\min}$ equivalent of decimal 520. But first I must introduce a function that will be used extensively in the algorithm.

We can express 520 as the exponential form of its prime factorization

$$p_1^3 \, p_3^1 \, p_6^1, \tag{2}$$

where $p_k$ is the $k$th prime number. The expression includes powers of $p_1$, $p_3$ and $p_6$, but not of $p_2$, $p_4$ or $p_5$, since these last three do not contribute to the prime factorization of 520. But let us rewrite Expression 2 as

$$p_1^3 \, p_2^0 \, p_3^1 \, p_4^0 \, p_5^0 \, p_6^1,$$

so that powers of all consecutive primes $p_1, \ldots, p_m$ are included, where $p_m$ is the greatest prime factor of 520. Now let us use single pairs of parentheses as grouping symbols around each exponential term, giving

$$(p_1^3)(p_2^0)(p_3^1)(p_4^0)(p_5^0)(p_6^1). \tag{3}$$

Expression 3 is the minimal parenthesized padded prime factorization (MPPPF, pronounced “MIP-fuh”) of 520. It is minimal because only powers of prime numbers up to and including the greatest prime factor of the number being represented are present, it is parenthesized for the obvious reason that all exponential terms are enclosed in parentheses, and it is padded because it includes exponential terms not appearing in the prime factorization.

Whether 1 has a prime factorization is a matter of dispute; we avoid the issue altogether by defining MPPPF(1) to be $(p_1^0)$. Certainly 0 has no prime factorization, but neither would one be useful for our purposes even if it were to exist. We accordingly define the domain of MPPPF to be the set of positive integers.
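The padding step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `is_prime`, `padded_exponents` and `mpppf`, and the textual rendering `(p1^3)...`, are assumptions made for this example and do not appear in the paper.

```python
def is_prime(k):
    """Trial-division primality test; adequate for small illustrative inputs."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def padded_exponents(n):
    """Return [a_1, ..., a_m] with n = p_1^a_1 * ... * p_m^a_m, where p_m is
    the greatest prime factor of n. MPPPF(1) is (p_1^0) by definition."""
    if n < 1:
        raise ValueError("MPPPF is defined only for positive integers")
    if n == 1:
        return [0]
    exponents = []
    candidate = 2
    while n > 1:
        if is_prime(candidate):
            a = 0
            while n % candidate == 0:
                n //= candidate
                a += 1
            exponents.append(a)  # a == 0 for noncontributing primes (padding)
        candidate += 1
    return exponents


def mpppf(n):
    """Render the MPPPF of n as text (hypothetical notation for this sketch)."""
    return "".join(f"(p{i}^{a})" for i, a in enumerate(padded_exponents(n), start=1))


print(mpppf(520))  # (p1^3)(p2^0)(p3^1)(p4^0)(p5^0)(p6^1)
```

Trial division is chosen here purely for brevity; any factorization routine would serve, since only the padded exponent sequence matters.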
Observe that there cannot be more than one MPPPF corresponding to a given number, since MPPPF(1) is unique and MPPPFs for all other numbers $n$ in the domain of MPPPF are the result of padding the unique prime factorization of $n$ with the 0th powers of those noncontributing primes less than $n$’s greatest prime factor.

2.1.2 Finding the $\mathrm{RPF}_{\min}$ Equivalent of Decimal 520
We begin by expressing 520 as its MPPPF

$$(p_1^3)(p_2^0)(p_3^1)(p_4^0)(p_5^0)(p_6^1).$$

Our next step is to replace all nonzero exponents in the expression by their MPPPFs as well. We repeat this step until there no longer exist opportunities to replace exponents by MPPPFs:

$$\begin{aligned}
(p_1^3)(p_2^0)(p_3^1)(p_4^0)(p_5^0)(p_6^1)
&= (p_1^{(p_1^0)(p_2^1)})(p_2^0)(p_3^{(p_1^0)})(p_4^0)(p_5^0)(p_6^{(p_1^0)}) \\
&= (p_1^{(p_1^0)(p_2^{(p_1^0)})})(p_2^0)(p_3^{(p_1^0)})(p_4^0)(p_5^0)(p_6^{(p_1^0)}).
\end{aligned} \tag{4}$$

Now we proceed to the next and final step, which is to treat the expression as a string and delete all symbols except parentheses from it, writing the parentheses all on one line while preserving their order from left to right to yield the $\mathrm{RPF}_{\min}$ word

(()(()))()(())()()(()).

We may be certain (()(()))()(())()()(()) is the only minimal natural RPF word corresponding to decimal 520. This is because the MPPPF of 520 is unique, and each nonzero exponent in Equation Set 4 has exactly one corresponding MPPPF.
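The three steps just described (pad, recurse on exponents, keep only the parentheses) can be folded into one recursive function. The following is a minimal sketch, assuming hypothetical helper names `is_prime`, `padded_exponents` and `spell`; it is meant to illustrate the procedure, not to reproduce any implementation by the author.

```python
def is_prime(k):
    """Trial-division primality test; adequate for small illustrative inputs."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


def padded_exponents(n):
    """Exponents of consecutive primes 2, 3, 5, ... up to the greatest
    prime factor of n (n >= 2), with zeros for noncontributing primes."""
    exponents, candidate = [], 2
    while n > 1:
        if is_prime(candidate):
            a = 0
            while n % candidate == 0:
                n //= candidate
                a += 1
            exponents.append(a)
        candidate += 1
    return exponents


def spell(n):
    """Return the minimal natural RPF word for the natural number n."""
    if n == 0:
        return ""    # zero is the empty word
    if n == 1:
        return "()"  # one is given by explicit definition
    # Each exponent is spelled recursively and wrapped in one pair of
    # parentheses; the wrapped pieces are concatenated from left to right.
    return "".join("(" + spell(a) + ")" for a in padded_exponents(n))


print(spell(520))  # (()(()))()(())()()(())
```

Because the primes and exponent symbols are never written out, the string produced is exactly what remains after deleting everything but parentheses from Equation Set 4.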
Remark.
Since RPF stands for recursive prime factorizations, the reader may wonder why I chose this name if words in the underlying languages do not appear to be prime factorizations at all, a prime factorization being by definition the product of only those prime numbers that are factors of the number being factorized. I offer this defense of my choice: we may indeed regard natural RPF words as prime factorizations, if we consider the empty pairs of matching parentheses arising from transcribing the zeroth powers of noncontributing primes to be markers collectively allowing us to deduce which prime numbers do contribute to the factorization.

2.1.3 Finding the Decimal Equivalent of (()(()))()(())()()(())

Having found the equivalent of decimal 520 in minimal natural RPF, let us go in the reverse direction, finding the decimal equivalent of the $\mathrm{RPF}_{\min}$ word (()(()))()(())()()(()).

We begin by inserting 0 inside each empty matched pair of parentheses, yielding the expression

((0)((0)))(0)((0))(0)(0)((0)).

For the next step, we will treat expressions as containing zero or more “clusters,” by which I mean substrings beginning and ending with outermost matching parentheses; for example, the clusters from left to right in the expression above are ((0)((0))), (0), ((0)), (0), (0) and ((0)). For each cluster $w_k$, we replace $w_k$ by the string

$$p_k^{\mathit{contents}_k},$$

where $\mathit{contents}_k$ is the string obtained by deleting the outermost parentheses of $w_k$. We do this repeatedly to the successive expressions until all the parentheses are gone:

$$\begin{aligned}
((0)((0)))(0)((0))(0)(0)((0))
&\leadsto p_1^{(0)((0))} \, p_2^{0} \, p_3^{(0)} \, p_4^{0} \, p_5^{0} \, p_6^{(0)} \\
&\leadsto p_1^{p_1^0 p_2^{(0)}} \, p_2^{0} \, p_3^{p_1^0} \, p_4^{0} \, p_5^{0} \, p_6^{p_1^0} \\
&\leadsto p_1^{p_1^0 p_2^{p_1^0}} \, p_2^{0} \, p_3^{p_1^0} \, p_4^{0} \, p_5^{0} \, p_6^{p_1^0}.
\end{aligned}$$

All that remains to be done is to evaluate the expression:

$$p_1^{p_1^0 p_2^{p_1^0}} \, p_2^{0} \, p_3^{p_1^0} \, p_4^{0} \, p_5^{0} \, p_6^{p_1^0} = 2^3 \cdot 5 \cdot 13 = 520.$$

We thus have a system capable of representing every natural number with an alphabet of only two symbols and not involving addition for evaluation. This fact may not seem particularly significant, given that the unary system of representing $n$ by $n$ contiguous marks uses an alphabet of only one […] $\mathbb{N}$ except 0 and 1). Indeed, for all natural numbers $n$ greater than one, the minimal natural RPF word representing $n$ contains not merely the prime factorization of $n$, but also the prime factorizations of all factorizable numbers involved in the prime factorization of $n$, the exponents in the prime factorization themselves being represented by their factorizations in recursive fashion.

Remark.
See Table 1 for the minimal natural RPF representations of the first 20 natural numbers.

Later (Section 3) I will show that minimal natural RPF can be modified to yield a system capable of representing every rational number, again using an alphabet of only two symbols, with no need for a negative sign or a radix point. Moreover, unlike decimal and other positional numeral systems, rational numbers can always be represented in this system by words of finite length, without the aid of such devices as continuation dots or overbars. What decimal represents as −[…], for example, is represented in minimal rational RPF as ()(()())(). But before we consider these matters in detail, let us move away from my hand-waving description of $\mathrm{RPF}_{\min}$ and establish the concept upon a firmer foundation.

Recall that I introduced Equation 1 by writing “Consider the number 520.” My wording was intended as a device to illustrate an important point in the present section. So conflated in our minds are numbers with their representations, and with their decimal representations in particular, that I suspect few readers were bothered by the phrase “the number 520” as being meaningless, or at best an incomplete abbreviation of “the number represented by 520 in the decimal numeral system.” That is to say, most of us seldom stop to distinguish between numbers and number words. But there is in fact a distinction, and to ignore it can yield untoward consequences. For example, the set of natural numbers contains a unique multiplicative identity element 1 such that $1 \cdot n = n = n \cdot 1$ for all $n \in \mathbb{N}$. But all of the members of the following set are in the decimal system, and all evaluate to 1: $\{1, 01, 001, 0001, \ldots\}$. There are then infinitely many “decimal numbers” (decimal number words) that can be considered the identity element for multiplication.
Thus we might be tempted to conclude that the set of natural numbers contains infinitely many multiplicative identity elements, despite the existence of simple proofs to the contrary.

It is especially important that we maintain the distinction in this paper, which is intimately concerned with numbers and different ways of representing them. A sequence of symbols is one thing; what that sequence means is quite another. For example, 11 can be understood to mean 3 in binary, 17 in hexadecimal, or 11 in decimal.

I find “meaning” a difficult concept to state with precision, so instead I offer a definition of the word “interpretation.”

Definition 2.1.
Let $L$ be a formal language, and let $S$ be a set. If there exists some surjective function $f : L \to S$, then the triplet $(L, S, f)$ is an interpretation of $L$ as $S$, specifically the interpretation of $L$ as $S$ according to $f$, and we may say any of the following:

• $L$ interpreted according to $f$ is $S$ (equivalently: $L$ is the underlying language in the interpretation $(L, S, f)$).
• $S$ is $L$ interpreted according to $f$ (equivalently: $S$ is the target set in the interpretation $(L, S, f)$).
• $f$ interprets $L$ as $S$ (equivalently: $f$ is the evaluation function in the interpretation $(L, S, f)$).

As an example, let $B$ denote the set of nonempty strings over the alphabet $\{0, 1\}$, and let $f : B \to \mathbb{N}$, where $f(b)$ is the nonnegative integer corresponding to $b$ such that the latter is regarded as a word in unsigned binary. Then $(B, \mathbb{N}, f)$ is an interpretation of $B$ as the set of natural numbers, specifically the interpretation of $B$ as the set of natural numbers according to $f$. Now consider $g : B \to \mathbb{Z}$, where $g(b)$ is the integer corresponding to $b$ such that the latter is regarded as a word in two's-complement binary, with the qualification that nonnegative integers always correspond to words containing the prefix 0. Then $(B, \mathbb{Z}, g)$ is an interpretation of $B$ as the set of integers, specifically the interpretation of $B$ as the set of integers according to $g$. Thus we see that the same language may underlie multiple interpretations.

The following definition allows us to speak of interpretations in terms of set members as well.

Definition 2.2.
Let $(L, S, f)$ be an interpretation. For any $l \in L$, let $s \in S$ such that $s = f(l)$. Then we may say any of the following:

• $s$ is $l$ interpreted according to $f$.
• $l$ interpreted according to $f$ is $s$.
• $f$ interprets $l$ as $s$.

Sometimes the terminology of interpretations becomes awkward, resulting in a surfeit of passive participles (“$l$ interpreted as...”; “$l$ interpreted according to...”). We may remedy this to some extent by making use of the following definition.

Definition 2.3.
Let $(L, S, f)$ be an interpretation, let $l \in L$, and let $s$ be a member of $S$ satisfying the equation $s = f(l)$. Then we say $l$ represents $s$ in $(L, S, f)$. If we do not wish to mention a particular interpretation (as for example when the interpretation would be clear from the context), we may simply say $l$ represents $s$, implying some interpretation exists such that $l$ represents $s$ in that interpretation.

In addition to speaking of members of languages representing members of sets, we can speak of languages representing sets.

Definition 2.4.
If $(L, S, f)$ is an interpretation, then we say $L$ represents $S$ in $(L, S, f)$, or, equivalently, $L$ is a representation of $S$ in $(L, S, f)$. In cases where we do not wish to mention a specific interpretation, we may simply say $L$ represents $S$, or, equivalently, $L$ is a representation of $S$.

Definition 2.1 only requires that the function $f$ be surjective. I introduce special terminology for the case where $f$ is injective as well.

Definition 2.5.
Let $(L, S, f)$ be an interpretation such that $f$ is a bijection. Then we may say any of the following:

• $(L, S, f)$ is minimal.
• $L$ is minimal with respect to $(L, S, f)$.
• The representation of $S$ in $(L, S, f)$ is minimal.

If the interpretation is minimal, there is exactly one member of $L$ representing any given $s \in S$; otherwise we cannot exclude the possibility that $s$ may have multiple representations. Regardless of whether the interpretation is minimal, the surjectivity of $f$ ensures that every member of $S$ has at least one representation in $L$.

The following definition clarifies what I understand by the word system when I speak of RPF systems.

Definition 2.6.
Let $I$ be an interpretation such that the target set in $I$ is numerical. Then we say $I$ is a numeral system. We may simply say “$I$ is a system,” if doing so does not incur ambiguity.

2.2.1 Standard and Other Prime-permuted RPF Interpretations

RPF systems are less arbitrary than positional numeral systems, not requiring the selection of a special number to serve as the radix. Yet some particular permutation of the sequence of prime numbers must be chosen for the interpretation of RPF words as numbers and the representation of numbers by RPF words. Returning momentarily to the concept of minimal parenthesized padded prime factorizations (Section 2.1.1), MPPPFs involve powers of primes appearing in the same order as those primes occur in the sequence $(2, 3, 5, \ldots, p_k)$, where $p_k$ is the greatest prime factor of the number of which the MPPPF is taken. But the descending sequence $(p_k, \ldots, 5, 3, 2)$ […] $S_{\mathrm{swap}}$ in $(p_2, p_1, \ldots, p_j, p_{j-1}, \ldots)$ such that $p_2$ was the first term in $S_{\mathrm{swap}}$ and all the primes from the least to the greatest prime factor of the argument of MPPPF were in $S_{\mathrm{swap}}$. In fact, any permutation of the sequence of prime numbers would suffice to determine the ordering of the prime powers in the definition of a proposed MPPPF. Nevertheless, in order to avoid a profusion of symbols designating the choice of the underlying prime permutation, and to have common ground for discussing RPF systems, it would be well to consider one sequence as the “standard,” with other permutations only being talked about when their existence was relevant to the discussion.

I select the identity permutation $P = (2, 3, 5, 7, \ldots)$ of prime numbers, where the terms appear in the same order as they occur in the sequence of natural numbers, to be the standard permutation. Indeed, we can regard $P$ as not being a permutation at all, but the original sequence from which other prime sequences are derived by scrambling the terms in $P$.
Because values of successive terms in $P$ increase as the terms are written in customary order from left to right, a standard RPF system can also be called a right-ascending or rightwise system. This is why I will often refer to a standard RPF system using the subscript $r$, as in $\mathrm{RPF}_{\mathbb{N}_r\min}$ and $\mathrm{RPF}_{\mathbb{Q}_r\min}$.

Definition 2.7.
The standard permutation, also called the rightwise or right-ascending permutation, is the sequence $P$ of prime numbers $(2, 3, 5, 7, 11, \ldots)$.

Remark.
Here we have an illustration of how the intimate relationship between prime factorizations and recursive prime factorizations results in parallels between the two. The fundamental theorem of arithmetic states that a number’s prime factorization is unique except for the order of the factors, so that the prime factorization of 520 could be written variously as $2^3 \times 5 \times 13$, $13 \times 5 \times 2^3$, etc. But in practice we usually write prime factorizations with the primes appearing in ascending order.

On occasion I will designate an RPF interpretation based upon a particular but arbitrary permutation of the prime number sequence (with the identity permutation being one of the possibilities); in such cases I will use lowercase Greek letters in the notation, as with $\mathrm{RPF}_{\mathbb{N}_\sigma\min}$.

In all of this, we must take care to remember that when we speak of “permutations,” we are not referring to the RPF systems themselves but rather to the sequences of prime numbers underlying them. I will therefore not refer to an RPF interpretation or its language as being a permutation, but rather as being prime-permuted, or, if I am referring to a specific prime permutation $\sigma$, as being $\sigma$-permuted.

$\gamma_{\mathbb{N}_r}$ and $\mathrm{RPF}_{\mathbb{N}_r\min}$

Remark.
This section is confined to a discussion of mathematical objects relevant to those RPF systems arising from the standard permutation. See Section 2.7 for generalizations of the same objects for all prime permutations.

We can think of an RPF representation of a number as being a spelling of that number, with a spelling function mapping from the set of numbers onto the set of words in the language underlying the corresponding RPF interpretation. If the interpretation is minimal, we can then use the resulting possible spellings to define the language as the image of the spelling function.

We will employ a modification of this approach to define the standard minimal RPF natural interpretation $\mathrm{RPF}_{\mathbb{N}_r\min}$, as follows. We will define a function $\gamma'_{\mathbb{N}_r}$ (from $\gamma$ in ὀρθογραφία, orthographia, Greek for “spelling”) mapping a natural number $n$ to a unique string of parentheses, and then we will define the language underlying $\mathrm{RPF}_{\mathbb{N}_r\min}$ as the set of all possible strings produced by the function. Since $\gamma'_{\mathbb{N}_r}$ takes a natural number input and outputs an $\mathrm{RPF}_{\mathbb{N}_r\min}$ spelling of the number, we could choose to regard $\gamma'_{\mathbb{N}_r}$ as our spelling function; however, because we wish the spelling function to have an inverse, we will instead define a spelling function $\gamma_{\mathbb{N}_r}$ identical to $\gamma'_{\mathbb{N}_r}$ except with its codomain restricted to $\mathrm{RPF}_{\mathbb{N}_r\min}$. Thus $\gamma_{\mathbb{N}_r}$ will be a bijection, enabling us to speak of its inverse.

First I introduce two notations for convenience in concatenating strings; these will find extensive use throughout the rest of this paper.

Definition 2.8. The symbol $\frown$ is the string concatenation operator; $a \frown b$ denotes the concatenation of strings $a$ and $b$. The concatenation of $a$ and $b$ may also be written in customary fashion as $ab$, provided that doing so incurs no ambiguity.

Definition 2.9.
Let $j, k \in \mathbb{N}^+$. Then we shall understand $\mathop{\frown}\limits_{i=j}^{k} s_i$ to mean the string concatenation $s_j \ldots s_k$ if $j \leq k$; otherwise the concatenation is the null string $\epsilon$.

Without further ado, let us define the nonsurjective precursor to our spelling function.

Definition 2.10.
Let $\Sigma^*$ be the Kleene closure of the set $\{(, )\}$. Then the standard nonsurjective RPF natural transcription function, denoted by $\gamma'_{\mathbb{N}_r}$, is given by $\gamma'_{\mathbb{N}_r} : \mathbb{N} \to \Sigma^*$, where

• For $n = 0$, $\gamma'_{\mathbb{N}_r}(n)$ is the empty string $\epsilon$.
• For $n = 1$, $\gamma'_{\mathbb{N}_r}(n)$ is the string ().
• For $n > 1$, let $p_m$ be the greatest prime factor of $n$, and let $a = (a_1, \ldots, a_m)$ be the integer sequence satisfying the equation

$$n = \prod_{i=1}^{m} p_i^{a_i}.$$

Then

$$\gamma'_{\mathbb{N}_r}(n) = \mathop{\frown}\limits_{i=1}^{m} \left( \text{‘(’} \frown \gamma'_{\mathbb{N}_r}(a_i) \frown \text{‘)’} \right).$$

Now we can define our bijective spelling function by specifying its graph.
Definition 2.11.
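The standard RPF natural spelling function, denoted by $\gamma_{\mathbb{N}_r}$, is given by

$$\gamma_{\mathbb{N}_r} = \{ (n, w) \in \mathbb{N} \times \gamma'_{\mathbb{N}_r}(\mathbb{N}) \mid w = \gamma'_{\mathbb{N}_r}(n) \},$$

where $\gamma'_{\mathbb{N}_r}(\mathbb{N})$ is the image of $\gamma'_{\mathbb{N}_r}$.

Remark.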
Definition 2.11 gives us our spelling function, but relies upon Definition 2.10 for computation of its values.

At times it will be convenient to use a notation for $\gamma_{\mathbb{N}_r}(n)$ that does not include the parentheses inherent in function notation.

Definition 2.12.
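Let $k$ be a natural number. The expression $\gamma_{\mathbb{N}_r k}$ denotes the $k$th term in the sequence $(\gamma_{\mathbb{N}_r}(n))_{n \in \mathbb{N}}$.

Example. Let us find the spelling corresponding to decimal 2646. First we note that

$$2646 = 2^1 \cdot 3^3 \cdot 5^0 \cdot 7^2 = p_1^1 \cdot p_2^3 \cdot p_3^0 \cdot p_4^2.$$

Thus the standard RPF natural spelling of 2646 is

$$\begin{aligned}
\gamma_{\mathbb{N}_r 2646} &= (\gamma_{\mathbb{N}_r 1})(\gamma_{\mathbb{N}_r 3})(\gamma_{\mathbb{N}_r 0})(\gamma_{\mathbb{N}_r 2}) \\
&= (())((\gamma_{\mathbb{N}_r 0})(\gamma_{\mathbb{N}_r 1}))()((\gamma_{\mathbb{N}_r 1})) \\
&= (())(()(()))()((())).
\end{aligned} \tag{5}$$

Before we define the standard minimal RPF natural interpretation, let us give a name to its underlying language.

Definition 2.13.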
The standard minimal RPF language, denoted by $D_{r\min}$, is the codomain of $\gamma_{\mathbb{N}_r}$.

Remark.
The presence of the symbol $D$ in the notation is not intended to signify that $D_{r\min}$ is a Dyck language; it is rather intended to suggest a relationship between the standard minimal RPF language and the Dyck language. This relationship is the subject of Section 2.4. Also, notice that I neither included $\mathbb{N}$ in $D_{r\min}$ nor used the word natural in the appellation standard minimal RPF language. That is because the language has a nonnumerical alternative definition, as we will see in Section 2.8.

At long last we arrive at the definition of $\mathrm{RPF}_{\mathbb{N}_r\min}$.

Definition 2.14.
The standard minimal RPF natural interpretation, denoted by $\mathrm{RPF}_{\mathbb{N}_r\min}$, is the interpretation $(D_{r\min}, \mathbb{N}, \gamma^{-1}_{\mathbb{N}_r})$.

$D$ and the Dyck Natural Numbers

Table 1 shows the standard minimal RPF spellings of the first twenty natural numbers, suggesting a resemblance between words in $D_{r\min}$ and those in the Dyck language $D$; indeed, $D_{r\min}$ is a proper subset of $D$. In light of this fact, and in light of the relevance of the Dyck language to all RPF systems, I provide here a brief description of $D$.

Decimal   RPF_{N_r min}            Decimal   RPF_{N_r min}
0         ϵ                        10        (())()(())
1         ()                       11        ()()()()(())
2         (())                     12        ((()))(())
3         ()(())                   13        ()()()()()(())
4         ((()))                   14        (())()()(())
5         ()()(())                 15        ()(())(())
6         (())(())                 16        (((())))
7         ()()()(())               17        ()()()()()()(())
8         (()(()))                 18        (())((()))
9         ()((()))                 19        ()()()()()()()(())

Table 1: Standard RPF spellings of the first twenty natural numbers.
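The entries of Table 1 can be cross-checked by decoding each word with the cluster procedure of Section 2.1.3. The sketch below is only illustrative: the names `primes`, `clusters`, `evaluate` and `TABLE_1` are assumptions introduced here, and the decoder presumes its input is a well-balanced string of parentheses.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short words)."""
    for k in count(2):
        if all(k % d for d in range(2, int(k ** 0.5) + 1)):
            yield k


def clusters(word):
    """Split a balanced word into the contents of its outermost clusters."""
    depth, start, contents = 0, 0, []
    for i, ch in enumerate(word):
        depth += 1 if ch == "(" else -1
        if depth == 0:                      # an outermost cluster just closed
            contents.append(word[start + 1:i])
            start = i + 1
    return contents


def evaluate(word):
    """Return the natural number represented by an RPF word."""
    if word == "":
        return 0
    value = 1
    # The k-th cluster contributes p_k raised to the value of its contents.
    for p, inner in zip(primes(), clusters(word)):
        value *= p ** evaluate(inner)
    return value


TABLE_1 = [
    "", "()", "(())", "()(())", "((()))",
    "()()(())", "(())(())", "()()()(())", "(()(()))", "()((()))",
    "(())()(())", "()()()()(())", "((()))(())", "()()()()()(())",
    "(())()()(())", "()(())(())", "(((())))", "()()()()()()(())",
    "(())((()))", "()()()()()()()(())",
]

decoded = [evaluate(w) for w in TABLE_1]
print(decoded == list(range(20)))  # True
```

Note that only exponentiation and multiplication appear in `evaluate`, in keeping with the claim that evaluation of an RPF word does not involve addition as a distinct operation.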
Remark.
We should avoid drawing too many conclusions from merely looking at the RPF spellings in Table 1. For example, we might hypothesize that the RPF spelling of every natural number greater than 0 is longer than its decimal counterpart. Yet that hypothesis is easily disproven by noting that

$$\gamma_{\mathbb{N}_r}(443426488243037769948249630619149892803) = \text{()(()(((()))))}.$$

Recall Section 2.1.2, where we found an RPF representation of the natural number represented by decimal 520. In doing so, we wrote 520 as the expression

$$(p_1^{(p_1^0)(p_2^{(p_1^0)})})(p_2^0)(p_3^{(p_1^0)})(p_4^0)(p_5^0)(p_6^{(p_1^0)})$$

and then deleted everything except the parentheses to yield

(()(()))()(())()()(()).

The result was a word in the Dyck language $D$, the set of all strings consisting of zero or more well-balanced parenthesis pairs. Informally, we can think of the property of being well-balanced as what distinguishes a syntactically correct use of grouping parentheses in an algebraic expression from a syntactically incorrect one. For example, the expression (p(p) is nonsensical, the number of closing parentheses not equaling the number of opening parentheses; thus the string (() is not well-balanced and is not a word in the Dyck language. The expression p)(p is also nonsensical, even though the number of opening and closing parentheses is equal, because the closing parenthesis is not preceded by a matching opening parenthesis. Thus the string )(, not being well-balanced, is not a word in $D$.

Remark. The symbols in the language do not have to be parentheses; various authors use parentheses, square brackets, 1s and 0s, etc. Any binary symbol set will suffice.
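The informal notion of well-balancedness described above can be checked with a single left-to-right scan that tracks nesting depth. The function name `is_dyck` is an assumption made for this sketch.

```python
def is_dyck(word):
    """Return True if `word` is a well-balanced string of parentheses."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no matching "(" before it
                return False
        else:
            return False       # symbol outside the two-letter alphabet
    return depth == 0          # every "(" must have been closed


print(is_dyck("(()(()))()(())()()(())"))  # True
print(is_dyck("(()"))                     # False
print(is_dyck(")("))                      # False
```

The two failing inputs correspond to the two nonsensical expressions discussed in the text: an unclosed opening parenthesis, and a closing parenthesis that precedes its partner.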
Definition 2.15.
Let $\Sigma^*$ be the Kleene closure of the alphabet $\{(, )\}$. The Dyck language, denoted by $D$, is the set of all $w \in \Sigma^*$ such that the number of right parentheses in any prefix $w'$ of $w$ does not exceed the number of left parentheses in $w'$, and the number of left parentheses in $w$ is equal to the number of right parentheses in $w$.

The Dyck language is context-free, and can be generated by the following grammar:

$$S \to (S)S \mid \epsilon.$$

$D$ has significance beyond that of merely being just another context-free language, rather having a deep relationship with all context-free languages. According to the Chomsky-Schützenberger representation theorem, every context-free language $L$ is a homomorphic image of the intersection of some regular language $R$ and $D$. From a practical standpoint, this can be useful if we wish to prove $L$ is context-free but have not been able to isolate a set of rules defining a context-free grammar for $L$. (Indeed, I will soon employ the theorem to prove that $D_{r\min}$ is context-free.)

Now that I have provided a bare-bones introduction to $D$, I return to my discussion concerning the relationship between that language and $D_{r\min}$.

The function $\gamma_{\mathbb{N}_r}$ is recursive. In order that I might discuss sequences of recursive function evaluations, I now develop special notation and vocabulary. Recall Equation Set 5, which showed the steps in the evaluation of $\gamma_{\mathbb{N}_r}(2646)$:

$$\begin{aligned}
\gamma_{\mathbb{N}_r 2646} &= (\gamma_{\mathbb{N}_r 1})(\gamma_{\mathbb{N}_r 3})(\gamma_{\mathbb{N}_r 0})(\gamma_{\mathbb{N}_r 2}) \\
&= (())((\gamma_{\mathbb{N}_r 0})(\gamma_{\mathbb{N}_r 1}))()((\gamma_{\mathbb{N}_r 1})) \\
&= (())(()(()))()((())).
\end{aligned}$$
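We can use these equations to construct a tree where every node represents an invocation of the spelling function to evaluate a number, with children of the node appearing in the same order as their invocations occur in the node’s evaluation:

$\gamma_{\mathbb{N}_r 2646}$
├─ $\gamma_{\mathbb{N}_r 1}$
├─ $\gamma_{\mathbb{N}_r 3}$
│  ├─ $\gamma_{\mathbb{N}_r 0}$
│  └─ $\gamma_{\mathbb{N}_r 1}$
├─ $\gamma_{\mathbb{N}_r 0}$
└─ $\gamma_{\mathbb{N}_r 2}$
   └─ $\gamma_{\mathbb{N}_r 1}$

Invoking $\gamma_{\mathbb{N}_r}$ to spell 2646 leads to $\gamma_{\mathbb{N}_r}$ being invoked to spell 1, followed by $\gamma_{\mathbb{N}_r}$ being invoked to spell 3, followed by $\gamma_{\mathbb{N}_r}$ being invoked to spell 0, followed by $\gamma_{\mathbb{N}_r}$ being invoked to spell 2, with the spellings of 3 and 2 leading recursively to further invocations.

Definition 2.16.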
Let $S$ be a set, let $f$ be a unary function such that the domain of $f$ is $S$, and let $s \in S$. Then the recursion tree for the evaluation of $f(s)$, also called the recursion tree of $f(s)$, is a tree $T$ in which every node represents an invocation of $f$ to evaluate a member of $S$ such that the invocation arises from the evaluation of $f(s)$, with the root of $T$ representing the invocation $f(s)$, the children of a node having the same order as they occur in the invocation, and all invocations of $f$ arising from the evaluation of $f(s)$ appearing in $T$. A recursion tree consisting only of a root node is said to be trivial.

Definition 2.17.
Let $f$ be a unary function such that the domain of $f$ is some set $S$, and let $T$ be the recursion tree for $f(s)$, where $s \in S$. Suppose there exist distinct nodes $A$ and $B$ in $T$ such that $A$ represents the invocation of $f$ to evaluate some $a \in S$ and $B$ represents the invocation of $f$ to evaluate some $b \in S$. If $B$ is a descendant of $A$, then we say $f(a)$ entails $f(b)$. If $B$ is a child of $A$, we may also say $f(a)$ directly entails $f(b)$.

Example. Referring to the recursion tree for the natural RPF spelling of 2646, we see that the spelling of 2646 entails the spelling of 0, since the spelling of 0 is a descendant of the spelling of 2646. Moreover, the spelling of 2646 directly entails the spelling of 0, since one of the children of the spelling of 2646 is the spelling of 0. However, the spelling of 3 does not entail the spelling of 2, or vice versa; although both are nodes in the tree, neither is a descendant of the other.
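The recursion tree and the entailment relation for the spelling of 2646 can be reproduced mechanically. The sketch below represents a tree as a nested `(label, children)` tuple; this representation and the names `padded_exponents`, `recursion_tree` and `descendants` are assumptions chosen for illustration only.

```python
def is_prime(k):
    """Trial-division primality test; adequate for small illustrative inputs."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


def padded_exponents(n):
    """Exponents of consecutive primes up to the greatest prime factor of n."""
    exponents, candidate = [], 2
    while n > 1:
        if is_prime(candidate):
            a = 0
            while n % candidate == 0:
                n //= candidate
                a += 1
            exponents.append(a)
        candidate += 1
    return exponents


def recursion_tree(n):
    """Return (n, children): one child per exponent spelled while spelling n."""
    if n <= 1:   # spellings of 0 and 1 are explicit, so these nodes are leaves
        return (n, [])
    return (n, [recursion_tree(a) for a in padded_exponents(n)])


def descendants(tree):
    """Labels of all proper descendants of the root, in invocation order."""
    labels = []
    for child in tree[1]:
        labels.append(child[0])
        labels.extend(descendants(child))
    return labels


tree = recursion_tree(2646)
print([child[0] for child in tree[1]])  # [1, 3, 0, 2]: directly entailed
print(descendants(tree))                # [1, 3, 0, 1, 0, 2, 1]: entailed
```

Under this representation, "the spelling of a entails the spelling of b" amounts to b appearing among the descendants of a node labeled a, and "directly entails" amounts to b appearing among that node's children.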
Definition 2.18.
Let $f$ be a unary function such that the domain of $f$ is some set $S$, and let $T$ be a recursion tree such that $(f(s_1), \ldots, f(s_k))$ is the sequence of nodes in a path from the root of $T$ to one of its leaves. Then we call $(f(s_1), \ldots, f(s_k))$ a recursion chain in the evaluation of $f(s_1)$.

Example. Referring to the recursion tree shown earlier for the natural RPF spelling of 2646, we see that $\gamma_{\mathbb{N}_r}(2646)$ directly entails $\gamma_{\mathbb{N}_r}(3)$, which in turn directly entails $\gamma_{\mathbb{N}_r}(0)$. Furthermore, the tree is rooted at $\gamma_{\mathbb{N}_r}(2646)$, and $\gamma_{\mathbb{N}_r}(0)$ is a leaf. Therefore $(\gamma_{\mathbb{N}_r}(2646), \gamma_{\mathbb{N}_r}(3), \gamma_{\mathbb{N}_r}(0))$ is a recursion chain in the evaluation of $\gamma_{\mathbb{N}_r}(2646)$. Since $\gamma_{\mathbb{N}_r}(2646)$ = (())(()(()))()((())), $\gamma_{\mathbb{N}_r}(3)$ = ()(()), and $\gamma_{\mathbb{N}_r}(0) = \epsilon$, we can also write the recursion chain as ('(())(()(()))()((()))', '()(())', $\epsilon$).

Now we have sufficient notation and vocabulary to prove that the standard minimal RPF language is a proper subset of the Dyck language.

Theorem 2.1. The language $D_{r\min}$ is a proper subset of the Dyck language $D$.

Proof.
In order for D r min to be a proper subset of D , every member of theformer must also be a member of the latter, and at least one member of thelatter must not be a member of the former. Let us address these criteriaseparately:• Definition 2.13 identifies D r min as the codomain of γ N r , which in turnis the image of the standard nonsurjective RPF natural transcriptionfunction γ ′ N r according to Definition 2.10. That latter definition con-tains three cases for the spelling of a natural number n . The first casegives the spelling of 0 as ǫ , while the second case gives the spellingof 1 as (); both of these are members of D . For every other naturalnumber n , the recursion tree of the spelling of n must have as its leavesmembers of the set { γ N r , γ N r } , because only the spellings of 0 and 1do not entail further spellings. The parent of a leaf in the recursion treeof the spelling of n is a spelling where the leaf and its siblings are eachsurrounded by single pairs of matched parentheses, the parenthesizedexpressions then being concatenated in the same order as the siblingsappear in the tree. Enclosing a Dyck word in matched single parenthe-ses results in a Dyck word, and the concatenation of Dyck words alsoresults in a Dyck word; thus the parents of leaves are spellings of Dyckwords. The same process of surrounding children by parentheses andconcatenating the resulting expressions to spell the parent applies atevery level of the tree, so the spelling of n is a Dyck word. Thus thespellings of all members of N are Dyck words, allowing us to concludethat D r min ⊆ D .• The string s = ′ (())() ′ will suffice as a counterexample to demonstratethat at least one member of D is not a member of D r min . 
The string satisfies the definition of a Dyck word: it contains no symbols other than left and right parentheses, with the number of left parentheses equal to the number of right parentheses and with every prefix of $s$ containing at least as many left parentheses as right parentheses. However, $s$ is not a member of $D_{r\min}$, as it is not in the codomain of $\gamma_{\mathbb{N}_r}$. To see why the spelling function is incapable of producing $s$, we observe that $s$ contains the suffix '()' and then try to find a number $n$ such that its spelling contains that suffix. Let us treat the possibilities individually as follows. Certainly the spelling of natural number 0 does not contain '()' as a suffix, since $\gamma_{\mathbb{N}_r}(0) = \epsilon$. While it is true that $\gamma_{\mathbb{N}_r}(1)$ = '()' is a word containing suffix '()', the word does not equal $s$. The final possibility is $\gamma_{\mathbb{N}_r}(n)$ for some $n \geq 2$. We note that regardless of which such $n$ we choose, its spelling only includes exponents $a_i$ from $a_1$ up to and including $a_m$, where $p_m$ is the greatest prime factor of $n$. However, spelling function $\gamma_{\mathbb{N}_r}$ only outputs an empty parenthesis pair if it encounters the number 1, either as the original number being spelled or in the course of spelling a prime number raised to the 0th power. But $a_m$ cannot be 0, since any prime number $p$ raised to the 0th power is equal to one and is therefore not present in the prime factorization of $n$, implying $p$ cannot be the greatest prime factor of $n$. This in turn implies that the spelling of $n$ cannot contain the suffix '()', and we can thus be certain that $s \in D$ but $s \notin D_{r\min}$.

We therefore conclude that $D_{r\min}$ is a proper subset of $D$.

We will see (Theorem 2.3) that every Dyck word represents a natural number. But the fact that $D_{r\min}$ contains the spelling of every $n \in \mathbb{N}$, together with the fact that $D_{r\min} \subset D$, implies there must exist natural numbers possessing multiple representations in $D$ (indeed, 0 is the only natural number having a unique representation in $D$, every other natural number having infinitely many). Therefore we cannot view the number-words in $D$ as being equivalent to the numbers they represent; to do so would result in absurdities such as the existence of more than one value of the product of two integers. On the other hand, $D_{r\min}$, enjoying a 1:1 correspondence with $\mathbb{N}$, may be treated as if its members are the natural numbers themselves. For instance, the equation
$$6 \cdot 2^3 + 2 = 50$$
is equivalent to
$$\text{(())(())} \cdot \text{(())}^{\text{()(())}} + \text{(())} = \text{(())()((()))},$$
and the existence of a unique identity element for natural multiplication can be stated by asserting that for all $w \in D_{r\min}$,
$$\text{()} \cdot w = w = w \cdot \text{()}.$$
In short, we may regard our $D_{r\min}$ number-words as numbers. And so I give $D_{r\min}$ a simpler name:

Definition 2.19.
The set of Dyck natural numbers is the language D r min . Remark.
There existing infinitely many σ -permuted minimal natural RPFlanguages, I could choose any one of them to be the Dyck naturals. Butdoing so would be no more advantageous than, say, reversing the order ofwriting digits in decimal numbers so that 102 would instead be written as201 . Thus I select the language underlying the standard interpretation.18 .5 The Standard RPF Natural Evaluation Functions α N r and α N r min Definition 2.11 gave us the standard RPF natural spelling function γ N r , whichis a bijection and therefore for which there exists an inverse function γ − N r . Buteven though we are aware that the inverse exists, we do not yet know how tocompute its values; we would like to correct this deficiency, especially since γ − N r is the evaluator in RPF N r min . An evaluation function mapping D r min onto N will thus be useful. First, though, I will define a function having a domainequal to D rather than D r min , because I intend the function to eventuallyserve as the evaluator in the standard general RPF natural interpretation.I will then restrict the function to D r min to yield the evaluation function wepreviously referred to as γ − N r .The definition of the evaluation function involves regarding the inputword as a concatenation of chunks and recursively evaluating these. There-fore before we go any further we must define exactly what we mean by a“chunk,” which in turn requires a definition of the “dimensionality” of aDyck word. Definition 2.20.
The dimensionality of a Dyck word $w$ is the number of outermost matching parenthesis pairs in $w$. More formally, the dimensionality of $w$ is given by the function $\dim : D \to \mathbb{N}$ such that
• For $w = \epsilon$, $\dim(w) = 0$.
• For $w \neq \epsilon$, $\dim(w) = k$, where $k$ satisfies
$$w = \bigoplus_{i=1}^{k} \left( \text{'('} \frown d_i \frown \text{')'} \right)$$
for some sequence of Dyck words $(d_1, \ldots, d_k)$.

Remark.
Dyck words $d_1, \ldots, d_k$ are certain to exist, because a criterion for the well-formedness of a Dyck word is that if the word can be expressed as a string $s$ enclosed within a matching pair of parentheses, then $s$ is also a Dyck word. Furthermore, the sequence $d$ is unique, since there is only one set of outermost matching parenthesis pairs in $w$, and $d_i$ is the Dyck word contained within the $i$th outermost matching parenthesis pair.

Definition 2.21. A chunk is a Dyck word of dimensionality 1.

Lemma 2.2. If $w$ is a chunk, $w = \text{'('} \frown d \frown \text{')'}$ for some Dyck word $d$, and $|d| = |w| - 2$.

Proof. Being a chunk, $w$ is a Dyck word of dimensionality 1. Therefore
$$w = \bigoplus_{i=1}^{1} \left( \text{'('} \frown d_i \frown \text{')'} \right) = \text{'('} \frown d_1 \frown \text{')'} = \text{'('} \frown d \frown \text{')'},$$
where $d = d_1$. Since $w = \text{'('} \frown d \frown \text{')'}$, $d$ has exactly two fewer parentheses than $w$; thus $|d| = |w| - 2$.

Definition 2.22.
The content of a chunk $w$ is the Dyck word $d$, where $d$ is given by:
• For $w$ = (), $d$ is the Dyck word $\epsilon$.
• For $w \neq$ (), $d$ is the Dyck word satisfying the equation $\text{'('} \frown d \frown \text{')'} = w$.

Definition 2.23.
The standard RPF natural evaluation of a Dyck word $w$ is the function $\alpha_{\mathbb{N}_r} : D \to \mathbb{N}$ such that
• For $w = \epsilon$, $\alpha_{\mathbb{N}_r}(w) = 0$.
• For $w \neq \epsilon$,
$$\alpha_{\mathbb{N}_r}(w) = \prod_{i=1}^{\dim(w)} p_i^{\alpha_{\mathbb{N}_r}(d_i)},$$
where $d_i$ is the content of the $i$th chunk in $w$.

Remark.
The function is named α for ἀριθμός ( arithmos ), Greek for “num-ber.” Remark.
A generalization of the standard RPF natural evaluation functionto apply to all prime-permuted natural interpretations may be found in Sec-tion 2.7.
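The definitions of dimensionality, chunk content and standard evaluation translate almost directly into code. The following Python sketch is offered as an illustration under stated assumptions rather than a reference implementation: the function names (`primes`, `chunk_contents`, `dim`, `alpha`) are hypothetical, Dyck words are plain strings over '(' and ')', and the prime sequence is generated by naive trial division, which is adequate only for short words.

```python
from itertools import count


def primes():
    """Yield the primes 2, 3, 5, ... by trial division (fine for small inputs)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def chunk_contents(w):
    """Contents d_1, ..., d_k of the outermost parenthesis pairs of Dyck word w.

    Raises ValueError if w is not a Dyck word.
    """
    contents, depth, start = [], 0, 0
    for i, ch in enumerate(w):
        if ch == '(':
            depth += 1
            if depth == 1:
                start = i + 1          # first symbol inside a new chunk
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError('unbalanced right parenthesis')
            if depth == 0:
                contents.append(w[start:i])
        else:
            raise ValueError('symbol other than a parenthesis')
    if depth != 0:
        raise ValueError('unbalanced left parenthesis')
    return contents


def dim(w):
    """Dimensionality: the number of outermost matching parenthesis pairs."""
    return len(chunk_contents(w))


def alpha(w):
    """Standard RPF natural evaluation of the Dyck word w."""
    if w == '':
        return 0
    value = 1
    for p, d in zip(primes(), chunk_contents(w)):
        value *= p ** alpha(d)         # i-th prime raised to the evaluated content
    return value


print(alpha('()(()(()))'))
print(dim('()(()(()))'))
```

The recursion terminates because each content is strictly shorter than the chunk it came from.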
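Example 2.4. Let us evaluate ()(()(())):
$$\begin{aligned}
\alpha_{\mathbb{N}_r}(\text{'()(()(()))'}) &= \prod_{i=1}^{2} p_i^{\alpha_{\mathbb{N}_r}(d_i)} \\
&= p_1^{\alpha_{\mathbb{N}_r}(\epsilon)} \, p_2^{\alpha_{\mathbb{N}_r}(\text{'()(())'})} \\
&= p_1^{0} \, p_2^{\,p_1^{\alpha_{\mathbb{N}_r}(\epsilon)} p_2^{\alpha_{\mathbb{N}_r}(\text{'()'})}} \\
&= p_1^{0} \, p_2^{\,p_1^{0} p_2^{\,p_1^{\alpha_{\mathbb{N}_r}(\epsilon)}}} \\
&= p_1^{0} \, p_2^{\,p_1^{0} p_2^{\,p_1^{0}}} \\
&= p_2^{\,p_2} \\
&= 3^3 = 27.
\end{aligned}$$

As proof that $\alpha_{\mathbb{N}_r}(d)$ exists for every $d \in D$, I offer the following:

Theorem 2.3.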
The domain of the standard RPF natural evaluation function α N r is D . Proof.
Let $S$ be the domain of $\alpha_{\mathbb{N}_r}$. We must show that $S \subseteq D$ and $D \subseteq S$. The first of the two requirements is satisfied by the language of Definition 2.23 itself, which explicitly restricts the domain of $\alpha_{\mathbb{N}_r}$ to $D$. It therefore remains to be shown that $D \subseteq S$. Toward that end, let $d$ be a member of $D$. Then either $d = \epsilon$ or $d \neq \epsilon$.
• For $d = \epsilon$, $\alpha_{\mathbb{N}_r}(d)$ is explicitly and uniquely defined to be 0.
• For $d \neq \epsilon$, $\alpha_{\mathbb{N}_r}(d)$ is recursively defined as $\prod_{i=1}^{\dim(d)} p_i^{\alpha_{\mathbb{N}_r}(e_i)}$, where $e_i$ is the content of the $i$th chunk in $d$. If $\alpha_{\mathbb{N}_r}(e_i)$ exists for all $i \in \{1, \ldots, \dim(d)\}$, then $\prod_{i=1}^{\dim(d)} p_i^{\alpha_{\mathbb{N}_r}(e_i)}$ exists as well, so that $\alpha_{\mathbb{N}_r}(d)$ evaluates to a number. Suppose that one of the terms $p_i^{\alpha_{\mathbb{N}_r}(e_i)}$ does not exist. There are only two possible reasons for its nonexistence: either $e_i$ is not a Dyck word, or $\alpha_{\mathbb{N}_r}(e_i)$ involves an endless recursion of invocations and thus does not yield a value. Let us show that neither of these supposed possibilities can be the case:
– Lemma 2.2 assures us that $e_i$, being the content of a chunk, must itself be a Dyck word.
– To see why the evaluation function cannot involve an endless recursion of invocations of itself, consider the evaluation of ()(()(())) from Example 2.4. The recursion tree for $\alpha_{\mathbb{N}_r}(\text{'()(()(()))'})$ is:

α('()(()(()))')
├─ α(ǫ)
└─ α('()(())')
   ├─ α(ǫ)
   └─ α('()')
      └─ α(ǫ)

(Here α abbreviates $\alpha_{\mathbb{N}_r}$.) The longest recursion chain arising from the evaluation of ()(()(())) is $(\alpha_{\mathbb{N}_r}(\text{'()(()(()))'}), \alpha_{\mathbb{N}_r}(\text{'()(())'}), \alpha_{\mathbb{N}_r}(\text{'()'}), \alpha_{\mathbb{N}_r}(\epsilon))$. Now let $k$ be the greatest integer such that there is a recursion chain of length $k$ arising from the evaluation of an arbitrary Dyck word $d$.
Then $k$ must be finite, because the first term in the chain is the evaluation of a finite-length word and each subsequent term is the evaluation of the content of a chunk from the evaluation of its predecessor, with the length of the content being 2 less than the length of the chunk from which it came, according to Lemma 2.2. As the longest chain (or chains, if more than one chain is of length $k$) must be finite, the evaluation of $d$ cannot involve an infinite sequence of recursive invocations of $\alpha_{\mathbb{N}_r}$ upon itself.

Since we have demonstrated that $\alpha_{\mathbb{N}_r}(d)$ exists for every $d \in D$, we have demonstrated that $D \subseteq S$.

Having shown that $S \subseteq D$ and $D \subseteq S$, we conclude that $S = D$. Therefore the domain of $\alpha_{\mathbb{N}_r}$ is $D$.

Note that $\alpha_{\mathbb{N}_r}(w)$ exists for every $w \in D_{r\min}$, since $D_{r\min} \subset D$. However, the inverse of bijective function $\gamma_{\mathbb{N}_r}$ is not $\alpha_{\mathbb{N}_r}$, since the domain of the latter is a proper superset of $D_{r\min}$. We could opt to simply continue using $\gamma_{\mathbb{N}_r}^{-1}$ to refer to the evaluation function in the standard minimal natural interpretation, using $\alpha_{\mathbb{N}_r}$ to compute its values; nevertheless I prefer a more direct approach:

Definition 2.24.
The standard minimal RPF natural evaluation function ,denoted by α N r min , is the restriction of α N r to D r min .I will make use of the following two lemmas to prove a theorem establish-ing that α N r min = γ − N r . Lemma 2.4.
Let n be a natural number greater than 1. Then dim( γ N r ( n )) = m , where p m is the greatest prime factor of n . Proof.
Referring to Definition 2.10, which we use to calculate the value of γ N r ( n ), we see that the spelling of n results in a word containing m outermostmatching parenthesis pairs, where p m is the greatest prime factor of n . Lemma 2.5.
Let p k be the k th prime number. Then α N r min ( γ N r ( p k )) = p k . Proof.
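Observe that
$$\gamma_{\mathbb{N}_r}(p_k) = \bigoplus_{i=1}^{k} \left( \text{'('} \frown \gamma_{\mathbb{N}_r}(a_i) \frown \text{')'} \right),$$
where every $a_i$ is zero except for $a_k = 1$. We may thus rewrite $\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(p_k))$ as
$$\alpha_{\mathbb{N}_{r\min}}\!\left( \bigoplus_{i=1}^{k} \left( \text{'('} \frown \gamma_{\mathbb{N}_r}(a_i) \frown \text{')'} \right) \right),$$
which is equal to
$$\prod_{i=1}^{k} p_i^{a_i},$$
since the spellings of 0 and 1 evaluate to 0 and 1 respectively. Because the only nonzero $a_i$ is $a_k = 1$, we conclude that for every prime number $p_k$,
$$\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(p_k)) = p_k.$$
With these lemmas in hand, I am ready to present the theorem and its proof.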
Theorem 2.6.
The bijections α N r min and γ N r are mutual inverses. Proof.
Our proof will be by mathematical induction. Let $P_k$ be the proposition $\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(k)) = k$ for natural number $k$.

Propositions $P_0$ and $P_1$ are easily verified by considering Definition 2.10 together with Definition 2.23:
• $\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(0)) = \alpha_{\mathbb{N}_{r\min}}(\epsilon) = 0$.
• $\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(1)) = \alpha_{\mathbb{N}_{r\min}}(\text{'()'}) = p_1^0 = 1$.

Now let $n$ be a natural number such that $n \geq 2$, and suppose $P_l$ is true for all $l$ less than $n$. Choose nonnegative integers $a_1, \ldots, a_m$ satisfying
$$\prod_{i=1}^{m} p_i^{a_i} = n, \tag{6}$$
where $p_m$ is the greatest prime factor of $n$. Then
$$\gamma_{\mathbb{N}_r}(n) = \bigoplus_{i=1}^{m} \left( \text{'('} \frown \gamma_{\mathbb{N}_r}(a_i) \frown \text{')'} \right).$$
Applying $\alpha_{\mathbb{N}_{r\min}}$ to both sides gives
$$\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(n)) = \alpha_{\mathbb{N}_{r\min}}\!\left( \bigoplus_{i=1}^{m} \left( \text{'('} \frown \gamma_{\mathbb{N}_r}(a_i) \frown \text{')'} \right) \right).$$
Observe that the dimensionality of $\gamma_{\mathbb{N}_r}(n)$ is $m$, according to Lemma 2.4; thus
$$\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(n)) = \prod_{i=1}^{m} p_i^{\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(a_i))}. \tag{7}$$
Note that either $n$ is a product of powers of prime numbers all of which are less than or equal to $n - 1$, or $n$ is itself prime. But since we are given that every natural number $l$ less than $n$ satisfies $P_l$, and since we know from Lemma 2.5 that $\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(p_k)) = p_k$ for any prime number $p_k$, we can rewrite Equation 7 as
$$\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(n)) = \prod_{i=1}^{m} p_i^{a_i}. \tag{8}$$
Considering Equation 6 together with Equation 8, we see that
$$\alpha_{\mathbb{N}_{r\min}}(\gamma_{\mathbb{N}_r}(n)) = n.$$
Thus $P_k$ is true for all $k \in \mathbb{N}$, implying that $\alpha_{\mathbb{N}_{r\min}}$ is the inverse of $\gamma_{\mathbb{N}_r}$. Furthermore, the two functions are mutual inverses, as the inverse of a bijection's inverse is the bijection itself.

Theorem 2.6 gives us certainty that $\alpha_{\mathbb{N}_{r\min}}$ is the evaluation function in the standard minimal RPF natural interpretation.

Whether an algorithm exists capable of performing prime factorization in polynomial time on a classical (non-quantum) computer is one of the great unanswered questions of computer science, if the input and output are expressed using a positional numeral system such as binary or decimal. Using Dyck naturals, Algorithm 1 executes in linear time.

Algorithm 1: Finding the prime factorization of a Dyck natural number.
Input: An arbitrary Dyck natural w.
Output: The prime factorization of w as an expression of Dyck naturals, with ∧ as the exponentiation operator, square brackets as grouping symbols, and multiplication of grouped expressions implied by their juxtaposition.
    base ← '(())';
    foreach chunk c_k in w do
        if c_k ≠ '()' then
            print '[';            /* print without newline */
            print base;
            print '∧';
            print content of c_k;
            print ']';
        end
        base ← '()' ⌢ base;
    end

Example. Given (())()()(()(()))((())) as input, the algorithm outputs the expression [(()) ∧ ()][()()()(()) ∧ ()(())][()()()()(()) ∧ (())]. Thus the algorithm tells us that
$$\text{(())()()(()(()))((()))} = \text{(())}^{\text{()}} \cdot \text{()()()(())}^{\text{()(())}} \cdot \text{()()()()(())}^{\text{(())}}.$$
We can check the correctness of the equation by replacing Dyck naturals with their decimal equivalents, yielding
$$83006 = 2^1 \cdot 7^3 \cdot 11^2.$$

Listing 1 is an implementation of the algorithm in Python. Looking at the listing, we can see that there is only one loop, with the number of iterations being equal to the length of the input string; thus the script runs in linear time. But while that is technically true, it may be misleading. For example, numbers represented in unary can be factorized in linear time, but still the running time is slower than with numbers represented in a positional numeral system, as the length of input for a unary representation of a number $n$ is equal to the number itself, whereas the length of a base-$k$ word representing $n$ is proportional to $\log_k(n)$. On the other hand, we might speed things up for the RPF-based algorithm by abbreviating words, e.g., by replacing sufficiently long strings of empty parenthesis pairs with hexadecimal words giving the lengths of unbroken sequences of such pairs; (())()()()()()()()()()(()) thus might be shortened to (())9(()). Under such a scheme the following two factorizations would be equivalent:
$$\text{(())()()()()()()()()()(()(()))} = \text{(())}^{\text{()}} \cdot \text{()()()()()()()()()()(())}^{\text{()(())}}$$
$$\text{(())9(()(()))} = \text{(())}^{\text{()}} \cdot \text{A(())}^{\text{()(())}}.$$
A prime factorization algorithm might use the compressed form for both input and output without ever having to reconstitute the corresponding Dyck numbers, since empty parenthesis pairs have no contents requiring processing.
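The abbreviation scheme just described can be sketched in a few lines of Python. This is only one possible reading of the scheme, under assumptions that the text leaves open: the names `abbreviate` and `expand` are hypothetical, "sufficiently long" is modeled by a `min_run` parameter defaulting to 2, and run lengths are written as uppercase hexadecimal numerals. Because the substring "()" always denotes an empty pair, and hexadecimal digits never occur in a Dyck word, the abbreviation can be undone unambiguously.

```python
import re


def abbreviate(word, min_run=2):
    """Replace each maximal run of at least min_run empty pairs by its hex length."""
    def shorten(match):
        pairs = len(match.group(0)) // 2
        return format(pairs, 'X') if pairs >= min_run else match.group(0)
    # (?:\(\))+ matches a maximal unbroken sequence of "()" substrings.
    return re.sub(r'(?:\(\))+', shorten, word)


def expand(abbreviated):
    """Inverse of abbreviate: turn each hex numeral back into empty pairs."""
    return re.sub(r'[0-9A-F]+',
                  lambda m: '()' * int(m.group(0), 16),
                  abbreviated)


word = '(())' + '()' * 9 + '(())'
short = abbreviate(word)
print(short)
print(expand(short) == word)
```

A run of length 1, such as the pair inside (()), is left untouched when `min_run` is 2, which keeps short words unchanged.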
Remark.
The reader might object that finding the prime factorization of a Dyck natural number in linear time is a contrived problem, since the work involved with prime factorization has already been performed "up front" in order to get the RPF representation in the first place. But that is not necessarily true; in Theorem 2.7 we will see that $D_{r\min}$ has an alternative nonnumerical definition, so that we can easily determine whether a given Dyck word is a Dyck natural, even though we may have no idea what number the Dyck natural represents.

Listing 1: A Python implementation of Algorithm 1.

    w = input('Enter the Dyck word to be factorized: ')
    base = '(())'
    parenthesisDifference = 0
    leftIndex = 0
    ignoreThisChunk = True
    for i in range(0, len(w)):
        if w[i] == '(':
            parenthesisDifference = parenthesisDifference + 1
            if parenthesisDifference > 1:
                ignoreThisChunk = False
        elif w[i] == ')':
            parenthesisDifference = parenthesisDifference - 1
            if parenthesisDifference == 0:
                if not ignoreThisChunk:
                    print('[{}^{}]'.format(base, w[leftIndex+1 : i]), end='')
                ignoreThisChunk = True
                leftIndex = i + 1
                base = '()' + base

So far I have confined my treatment of minimal interpretations of $\mathbb{N}$ to the "standard" or "rightwise" one; now I will provide a generalization to include all minimal prime-permuted natural interpretations. Using this generalization, we will be able to convert objects in one permuted interpretation into corresponding objects in another.

We will define a function $\gamma'_{\mathbb{N}_\sigma}$ mapping a natural number $n$ to a unique string of parentheses, and then we will define the language underlying $\mathrm{RPF}_{\mathbb{N}_{\sigma\min}}$ as the set of all possible strings produced by the function. Since $\gamma'_{\mathbb{N}_\sigma}$ takes a natural number input and outputs an $\mathrm{RPF}_{\mathbb{N}_{\sigma\min}}$ spelling of the number, we could choose to regard $\gamma'_{\mathbb{N}_\sigma}$ as our spelling function; however, because we wish the spelling function to have an inverse, we will instead define a spelling function $\gamma_{\mathbb{N}_\sigma}$ identical to $\gamma'_{\mathbb{N}_\sigma}$ except with its codomain restricted to $\mathrm{RPF}_{\mathbb{N}_{\sigma\min}}$.
Thus γ N σ will be a bijection, enabling us to speak of its inverse. Definition 2.25.
Let $\Sigma^*$ be the Kleene closure of the set $\{(, )\}$, and let $\sigma$ be a permutation of $P$, where $P$ is the sequence of prime numbers $(2, 3, 5, 7, \ldots)$. Then the $\sigma$-permuted nonsurjective RPF natural transcription function, denoted by $\gamma'_{\mathbb{N}_\sigma}$, is given by $\gamma'_{\mathbb{N}_\sigma} : \mathbb{N} \to \Sigma^*$, such that
• For $n = 0$, $\gamma'_{\mathbb{N}_\sigma}(n)$ is the empty string $\epsilon$.
• For $n = 1$, $\gamma'_{\mathbb{N}_\sigma}(n)$ is the string ().
• For $n > 1$, let $(s_k)_{k=1}^{\infty}$ be the sequence $(\sigma(p_1), \sigma(p_2), \sigma(p_3), \ldots)$, and let $m$ be the smallest number for which an integer sequence $(a_j)_{j=1}^{m}$ exists satisfying the equation $n = \prod_{i=1}^{m} s_i^{a_i}$. Then
$$\gamma'_{\mathbb{N}_\sigma}(n) = \bigoplus_{i=1}^{m} \left( \text{'('} \frown \gamma'_{\mathbb{N}_\sigma}(a_i) \frown \text{')'} \right).$$

At this point we can define our bijective spelling function by specifying its graph.
Definition 2.26.
The σ -permuted RPF natural spelling function , denotedby γ N σ , is given by γ N σ = { ( n, w ) ∈ N × γ ′ N σ ( N ) | w = γ ′ N σ ( n ) } , where γ ′ N σ ( N ) is the image of γ ′ N σ .At times it will be convenient to use a notation for γ N σ ( n ) that does notinclude the parentheses inherent in function notation. Definition 2.27.
Let k be a natural number. The expression γ N σ k denotesthe k th term in the sequence ( γ N σ ( n )) n ∈ N .Before we define the σ -permuted minimal RPF natural interpretationRPF N σ min , we will assign a name to the language underlying the interpreta-tion. Definition 2.28.
The σ -permuted minimal RPF language , denoted by D σ min ,is the codomain of γ N σ . 27ow we are ready to state the definition of RPF N σ min . Definition 2.29.
The σ -permuted minimal RPF natural interpretation , de-noted by RPF N σ min , is the interpretation ( D σ min , N , γ − N σ ).An evaluation function mapping D σ min into N will be useful. However, Iwill define the function so that its domain is D rather than D σ min , as I intendfor the definition to apply equally well to the general natural RPF languageas to its subset D σ min . Definition 2.30.
Let $\sigma$ be a permutation of the sequence $(2, 3, 5, 7, \ldots)$ of prime numbers. Then the $\sigma$-permuted RPF natural evaluation of a Dyck word $w$ is the function $\alpha_{\mathbb{N}_\sigma} : D \to \mathbb{N}$ such that:
• For $w = \epsilon$, $\alpha_{\mathbb{N}_\sigma}(w) = 0$.
• For $w \neq \epsilon$, let $(s_k)_{k=1}^{\infty}$ be the sequence $(\sigma(p_1), \sigma(p_2), \sigma(p_3), \ldots)$. Then
$$\alpha_{\mathbb{N}_\sigma}(w) = \prod_{i=1}^{\dim(w)} \sigma(p_i)^{\alpha_{\mathbb{N}_\sigma}(d_i)},$$
where $d_i$ is the content of the $i$th chunk in $w$.

Note that $\alpha_{\mathbb{N}_\sigma}(w)$ exists for every $w \in D_{\sigma\min}$, since $D_{\sigma\min} \subset D$. However, the inverse of bijective function $\gamma_{\mathbb{N}_\sigma}$ is not $\alpha_{\mathbb{N}_\sigma}$, the domain of the latter being a proper superset of $D_{\sigma\min}$. The following definition gives us a $\sigma$-permuted evaluation function that is the inverse of the $\sigma$-permuted spelling function:

Definition 2.31.
The σ -permuted minimal RPF natural evaluation function ,denoted by α N σ min , is the restriction of α N σ to D σ min .Now we have the ability to convert spellings in one miminal natural in-terpretation to equivalent spellings in another, according to the followingstraightforward procedure. Suppose we have a word w σ evaluating to somenatural number n in the σ -permuted minimal natural interpretation and wishto find the word w τ evaluating to n in the τ -permuted minimal natural in-terpretation. We first find n by applying α N σ min to w σ ; we then apply γ N τ to n , giving us w τ . In other words, we use the following equation: w τ = γ N τ ( α N σ min ( w σ )) . (9)We can also convert the evaluations in one minimal natural interpretationto equivalent evaluations in another, as follows. Suppose we have a number n σ ∈ N , the spelling of which is w in the σ -permuted minimal natural inter-pretation, and we wish to find the number n τ ∈ N with the spelling w in the28 -permuted minimal natural interpretation. The following equation gives us n τ : n τ = α N τ min ( γ N σ ( n σ )) . (10)For the rest of this paper, I will focus primarily upon interpretationsarising from the standard permutation. Generalizations of the associatedmathematical objects to all prime-permuted interpretations are somewhattedious but straightforward. We defined D r min to be the codomain of γ N r (Definition 2.13). However,there is an interesting alternative definition—I say interesting, because it isa non-numerical definition of D r min , and because it can be used to prove that D r min is a context-free language (see Theorem 2.8 on the next page). Theorem 2.7.
These definitions are equivalent:1. The standard minimal RPF language D r min is the codomain of thestandard RPF natural spelling function γ N r .2. The standard minimal RPF language D r min is the set { d ∈ D | ( ′ )()) ′ is not a substring of d ) ∧ ( ′ )() ′ is not a suffix of d ) } . Proof.
The spellings of 0 and 1 are $\epsilon$ and () respectively, by explicit definition; neither of these contains the substring )()) or the suffix )().

Suppose that there exists some natural number $n$ greater than 1 such that its spelling contains the suffix )(), i.e., $\gamma_{\mathbb{N}_r}(n) = w = w' \frown \text{'()'}$, with $w'$ being a nonempty Dyck word. This would imply that the last chunk in $w$ corresponds to the zeroth power of the greatest prime factor of $n$, which is a contradiction, the zeroth power of a prime number not appearing in the (nonrecursive) prime factorization of $n$.

Now suppose that there exists some natural number $m$ greater than 1 such that its spelling contains the substring )()). The presence of the closing parenthesis immediately following the empty matched pair of parentheses would imply that $\gamma_{\mathbb{N}_r}$ had been invoked with some number $n \geq 2$ such that the spelling of $n$ contained the suffix )(), which we have already shown is a contradiction.

As for all other words $w$ in $D$ other than those we have already considered, these must also be members of $D_{r\min}$. To see this, suppose that $w$ is not in the codomain of $\gamma_{\mathbb{N}_r}$. Since the domain of $\alpha_{\mathbb{N}_r}$ is $D$, $\alpha_{\mathbb{N}_r}(w)$ is a natural number. This would imply that a natural number exists (other than 0 or 1, these already having been considered) which cannot be represented by a product of prime powers
$$n = \prod_{i=1}^{m} p_i^{a_i},$$
where $p_m$ is the greatest prime factor of $n$ and $a$ is the unique integer sequence satisfying the equation. Note that if we form a product of only the $p_i^{a_i}$ where $a_i \neq 0$, we obtain the (nonrecursive) prime factorization of $n$. Also note that all zero $a_i$ designate the exponents of primes not contributing to the prime factorization. Thus $n$ is an integer greater than or equal to 2 such that $n$ has no prime factorization, which is a contradiction.

We now make use of the alternative definition of $D_{r\min}$ to prove that $D_{r\min}$ is context-free.

Theorem 2.8.
Let $\Sigma^*$ be the Kleene closure of the set $\{(, )\}$, and let $R$ be the language accepted by the deterministic finite automaton (DFA) represented by the following state transition diagram:

[Figure: state transition diagram of a DFA with five states q0, q1, q2, q3, q4 over the alphabet { (, ) }.]

Then
$$R = \{ w \in \Sigma^* \mid (\text{')())' is not a substring of } w) \wedge (\text{')()' is not a suffix of } w) \},$$
and
$$D_{r\min} = D \cap R,$$
proving by the Chomsky-Schützenberger representation theorem that $D_{r\min}$ is a context-free language.

Remark. I am indebted to Dr. Brian M. Scott [2] for his assistance identifying the state transition table for the DFA used in Theorem 2.8.
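A membership test for $D \cap R$ can be sketched in Python as follows. The transition table below is an assumption: the state diagram did not survive in this copy, so the table is the standard five-state automaton that tracks the longest suffix of the input that is a prefix of )()), with states numbered 0 to 4, state 0 the start state, state 3 meaning "the input so far ends in )()" and state 4 a dead state entered once )()) has been read. The numbering may differ from that of the original diagram, and the names `DELTA`, `in_R`, `is_dyck` and `is_dyck_natural` are hypothetical.

```python
# Assumed transition table (see lead-in); keys are (state, input symbol).
DELTA = {
    (0, '('): 0, (0, ')'): 1,
    (1, '('): 2, (1, ')'): 1,
    (2, '('): 0, (2, ')'): 3,
    (3, '('): 2, (3, ')'): 4,
    (4, '('): 4, (4, ')'): 4,   # dead state: ")())" has occurred
}
ACCEPTING = {0, 1, 2}


def in_R(w):
    """True if w has no substring ')())' and no suffix ')()'."""
    state = 0
    for ch in w:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


def is_dyck(w):
    """True if w is a balanced string of parentheses."""
    depth = 0
    for ch in w:
        if ch not in '()':
            return False
        depth += 1 if ch == '(' else -1
        if depth < 0:
            return False
    return depth == 0


def is_dyck_natural(w):
    """Membership in the intersection of the Dyck language with R."""
    return is_dyck(w) and in_R(w)


print(is_dyck_natural('()()(())'))   # minimal spelling of 5
print(is_dyck_natural('(())()'))     # Dyck word with the forbidden suffix
```

Both checks make a single left-to-right pass, so membership is decided in time linear in the length of the word.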
Proof.
We first verify that no word w ∈ Σ ∗ containing the substring )())is accepted by the DFA. Suppose the automaton has read some arbitrarynumber of symbols and is currently at state q i , with the remaining inputstarting with the string )()). For each i ∈ { , , , , } , the input sequence)()) places the automaton in state q , from which no transition to anotherstate is possible, so that w ends with the automaton in state q . Since q is notan acceptor state, w is rejected. Thus the DFA rejects all words containingthe substring )()).Next we verify that no word w in Σ ∗ containing the suffix )() is acceptedby the DFA. Suppose the automaton has read some arbitrary number ofsymbols and is currently at state q i , the remaining input being )(). Now letus consider each of the possibilities.• If i ∈ { , , } , then the input ends with the automaton in state q .Since q is not an acceptor state, w is rejected.• If i ∈ { , } , then the input ends with the automaton in state q . Since q is not an acceptor state, w is rejected.Thus the automaton rejects all words in Σ ∗ containing the suffix )().We must still verify that all words w in Σ ∗ other than those rejected aboveare recognized by the DFA. In our verification, we can ignore any wordsending with the DFA in state q , since the only way to reach that state isfor w to contain the substring )()), and all of these words have already beenconsidered. We may likewise ignore words ending with the DFA in state q ,these all containing the suffix )(). The only other possibility is that w endsin either q , q or q , each of which is an acceptor state. Thus the automatonaccepts all words in Σ ∗ except for those containing the substring )()) or thesuffix )().Because D ⊂ Σ ∗ , the DFA accepts every d ∈ D such that )()) is not asubstring of d and )() is not a suffix of d . 
Therefore, from Theorem 2.7,
$$D_{r\min} = D \cap R,$$
which we may rewrite as
$$D_{r\min} = h(D \cap R),$$
where $h : D \to D$ is the identity function, so that $h$ is a homomorphism as required by the Chomsky-Schützenberger representation theorem. And so we conclude that $D_{r\min}$ is context-free.

I propose a context-free grammar for the generation of $D_{r\min}$. The grammar is ambiguous, yielding two shift-reduce conflicts among the states generated by an LALR parser; nevertheless, with these conflicts resolved in favor of shifting I have used the parser to verify that the grammar generates minimal spellings of the first thousand natural numbers.

Conjecture 2.1.
The language underlying $\mathrm{RPF}_{\mathbb{N}_{r\min}}$ is generated by the following context-free grammar:

N → ǫ
N → ( )
N → S
S → S S
S → ( ) S
S → ( S )
S → ( ( ) )

$\mathbb{Q}_{r\min}$

The prime factorization of each natural number $n \geq 2$ is a product of prime numbers equal to $n$ (if we understand prime numbers to be their own prime factorizations); we are guaranteed by the fundamental theorem of arithmetic that the product is unique up to the order of the factors. Thus we may write 40 as $2 \times 2 \times 2 \times 5$, or more briefly in exponential form as $2^3 \cdot 5$. Let us consider the general exponential form of the prime factorization of $n$,
$$p_{a_1}^{b_1} \cdots p_{a_k}^{b_k},$$
where $p_{a_i}$ is the $i$th prime factor of $n$, $p_{a_k}$ is the greatest prime factor of $n$, and $(b_1, \ldots, b_k)$ is the unique sequence of positive integers such that the expression evaluates to $n$.

But let us now relax our requirement that the integers $b_i$ in $(b_i)_{i=1}^{k}$ be positive, allowing negative integers to be included as well. Then we could write 5.6, for example, as
$$p_1^{2} \cdot p_3^{-1} \cdot p_4.$$
Observe that the above expression bears a striking resemblance to the exponential form of prime factorization. Indeed, just as with prime factorization, the expression is unique up to the order of its factors, since $5.6 = \frac{28}{5}$, the numerator and denominator each corresponding to unique prime factorizations:
$$5.6 = \frac{2 \times 2 \times 7}{5}.$$
If we are willing to include $-1$ as a factor, we may represent negative rationals as well. Hence we may generalize prime factorization to nonzero rational numbers.
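The generalization to nonzero rationals can be tried out with a small Python sketch. It is illustrative only: the function names `factor` and `rational_factorization` are hypothetical, the result is returned as a sign together with a dictionary mapping each contributing prime to its (possibly negative) exponent, and naive trial division is used, so it is suitable only for small numerators and denominators.

```python
from fractions import Fraction


def factor(n):
    """Prime factorization of a positive integer as {prime: exponent}."""
    factors, p = {}, 2
    while n > 1:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1   # composites never divide n here: their prime factors are gone
    return factors


def rational_factorization(q):
    """Return (sign, {prime: exponent}) for a nonzero rational q."""
    q = Fraction(q)                      # Fraction reduces to lowest terms
    if q == 0:
        raise ValueError('zero has no rational prime factorization')
    sign = -1 if q < 0 else 1
    exponents = factor(abs(q.numerator))
    for p, b in factor(q.denominator).items():
        exponents[p] = -b                # no overlap with the numerator's primes
    return sign, dict(sorted(exponents.items()))


print(rational_factorization(Fraction(28, 5)))
print(rational_factorization(Fraction(-3, 4)))
```

Reducing to lowest terms first is what makes the exponent sequence unique, mirroring the argument that the numerator and denominator each have a unique prime factorization.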
Definition 3.1.
Let $q$ be a nonzero rational number, and let $(b_1, \ldots, b_k)$ be the integer sequence of shortest length such that $|q|$ is equal to $p_{a_1}^{b_1} \cdots p_{a_k}^{b_k}$, where $(p_{a_i})_{i=1}^{k}$ is a subsequence of the sequence $P$ of prime numbers. Then the rational prime factorization of $q$ is $p_{a_1}^{b_1} \cdots p_{a_k}^{b_k}$ if $q > 0$; otherwise, the rational prime factorization of $q$ is $-1 \cdot p_{a_1}^{b_1} \cdots p_{a_k}^{b_k}$.

Remark.
I used the name "rational prime factorization" rather than simply "prime factorization" in order to avoid controversy. For one thing, the fundamental theorem of arithmetic only addresses integers, specifically integers greater than 1. Also, note that while I was able to write the prime factorization of decimal 40 without using exponentiation, there is no integer $k$ satisfying
$$5.6 = 2 \times 2 \times \underbrace{5 \times \cdots \times 5}_{k} \times 7.$$

We saw in Section 2.4 that $D_{r\min}$ is not equal to $D$, being rather a proper subset of it. Recall that this is because $D_{r\min}$ is the codomain of the spelling function $\gamma_{\mathbb{N}_r}$, which is only capable of producing spellings where empty chunks correspond to the zeroth powers of those prime numbers less than the greatest prime factor of the number being spelled (with the exception of the spelling of 1, that being '()'). We will relax this restriction to include additional words beyond those produced by $\gamma_{\mathbb{N}_r}$, yielding nonminimal representations of natural numbers. But we will do so incrementally, because making slight changes to $\alpha$ and $\gamma$ will give us the remarkable ability to regard every rational number as a unique Dyck word.

Definition 2.5 tells us what we are to understand by the term "minimal representation," and of course a nonminimal representation is a representation that is not minimal. But there is another way to view the concept of nonminimality. In the decimal system, every member of the infinite sequence of strings (102, 0102, 00102, . . .) is understood to represent the number one hundred and two. We have no problem, for instance, determining the quantity represented by 000102 on a car odometer. Nevertheless there is one particular string, 102, that is the minimal word representing one hundred and two; all others may be regarded as a result of "inflating" the minimal word with zeroes without changing the quantity it represents.
The situation is similar for words in nonminimal RPF languages, except that the inflations involve empty parenthesis pairs rather than zeros and there almost always exist multiple locations in the word where inflation may take place.

Definition 3.2. An empty pair is the string ().

Consider the three Dyck words $w$ = ()()(()), $x$ = ()()(())() and $y$ = ()()(()()). Applying the standard natural evaluation function to each of these, we find that

$$\alpha_{\mathbb{N}_r}(w) = \alpha_{\mathbb{N}_r}(x) = \alpha_{\mathbb{N}_r}(y) = 5.$$

If we apply the standard natural spelling function to each of the above evaluations, we discover that

$$\gamma_{\mathbb{N}_r}(\alpha_{\mathbb{N}_r}(w)) = \gamma_{\mathbb{N}_r}(\alpha_{\mathbb{N}_r}(x)) = \gamma_{\mathbb{N}_r}(\alpha_{\mathbb{N}_r}(y)) = \text{'()()(())'} = w,$$

so that $w$ is the only Dyck word of the three where the word equals the spelling of its evaluation. Also notice that of $w$, $x$ and $y$, $w$ is the word of shortest length; indeed, ()()(()), being a member of $\mathcal{D}_{r_{\min}}$, is the shortest Dyck word evaluating to 5 in the standard natural interpretation.

Definition 3.3.
Let $d$ and $d'$ be Dyck words. $d$ is said to be an inflation of $d'$ if $\alpha_{\mathbb{N}_r}(d) = \alpha_{\mathbb{N}_r}(d')$ and $|d| > |d'|$, where $|s|$ denotes the length of string $s$. If we do not wish to specify $d'$, we may simply say $d$ is an inflation, with the implication that some $w \in \mathcal{D}$ exists such that $d$ is an inflation of $w$.

Remark.
Dyck word $w$ is an inflation if and only if $w \neq \gamma_{\mathbb{N}_r}(\alpha_{\mathbb{N}_r}(w))$.

Definition 3.4.
Let $d'$ and $d$ be Dyck words such that $d$ is an inflation of $d'$. Then we say $d'$ is a deflation of $d$.

Words in $\mathcal{D}_{r_{\min}}$ cannot be inflations, as the only empty pairs they contain are those essential as separators to collectively designate indices of prime numbers contributing to the prime factorization (or, in the case of (), to spell the number 1). Consider an arbitrary Dyck word $w$. We can form a sequence with $w$ as its first term, with every subsequent term being a deflation of its predecessor; the longest possible such sequence must have a member of $\mathcal{D}_{r_{\min}}$ as its last term.

Definition 3.5.
Let $w$ and $w'$ be distinct Dyck words. We say $w$ collapses to $w'$ if $\alpha_{\mathbb{N}_r}(w) = \alpha_{\mathbb{N}_r}(w')$ and $w' \in \mathcal{D}_{r_{\min}}$.

Example. Dyck word $w$ = (()()(())()())()()() collapses to $w'$ = (()()(())); the standard natural evaluation of both $w$ and $w'$ is 32, and $w'$ is a member of $\mathcal{D}_{r_{\min}}$.

The following gives us terminology to talk about strings consisting solely of empty pairs.

Definition 3.6.
Let $w = \bigoplus_{i=1}^{k} \text{'()'}$ for some $k \in \mathbb{N}$. Then we say $w$ is the string of $k$ contiguous empty pairs. If $k = 0$, we say the string is trivial.

Thus the trivial string of zero contiguous empty pairs is $\epsilon$, the string of one contiguous empty pair is (), the string of two contiguous empty pairs is ()(), and so on.

The following definition allows us to quantify “how inflated” we consider a given Dyck word to be.

Definition 3.7.
Let $w$ be a Dyck word. The standard inflationary degree of $w$, denoted by $\mathrm{dinf}_r(w)$, is the largest integer $n$ such that a string of $n$ contiguous empty pairs can be deleted from $w$ to yield a Dyck word $w'$ satisfying the equation

$$\alpha_{\mathbb{N}_r}(w') = \alpha_{\mathbb{N}_r}(w).$$

Example. Let $w$ = (()()()())(). The longest substring that can be deleted from $w$ to yield a Dyck word $w'$ with the same standard natural evaluation as that of $w$ is ()()(), the string of 3 contiguous empty pairs. Thus $\mathrm{dinf}_r(w) = 3$.

And now I introduce the language underlying the standard minimal RPF rational interpretation.

Definition 3.8.
The standard quasiminimal RPF language, also called the set of Dyck rational numbers and denoted by $\mathcal{D}_{r_{\mathrm{qmin}}}$, is the set of all Dyck words $d$ such that $\mathrm{dinf}_r(d) \leq 1$.

Remark.
The designation quasiminimal is due to the fact that $\mathcal{D}_{r_{\mathrm{qmin}}}$ is the language underlying both the standard minimal RPF rational interpretation and the standard nonminimal RPF natural interpretation $(\mathcal{D}_{r_{\mathrm{qmin}}}, \mathbb{N}, \alpha_{\mathbb{N}_r})$.

3.3 The Standard Rational Spelling Function $\gamma_{\mathbb{Q}_r}$

Definition 3.9.
Let $q$ be a nonzero rational number. The greatest prime base of $q$ is given by the function $\mathrm{gpb} : \mathbb{Q} \setminus \{0\} \to \{p_1, p_2, p_3, \ldots\}$, where
• If $q = 1$, then $\mathrm{gpb}(q) = p_1$.
• Otherwise, let the integer sequence $(a_1, \ldots, a_m)$ satisfy the equation $q = \prod_{i=1}^{m} p_i^{a_i}$ such that $m$ is the greatest number for which $a_m \neq 0$. Then $\mathrm{gpb}(q) = p_m$.

Remark.
Informally, the greatest prime base may be thought of as the largest prime number that must appear in the product in order for the product to evaluate to $q$.

Definition 3.10.
Let $q$ be an arbitrary member of the set $\mathbb{Q}$ of rational numbers. The standard RPF rational spelling is the function $\gamma_{\mathbb{Q}_r} : \mathbb{Q} \to \mathcal{D}_{r_{\mathrm{qmin}}}$, where
• For $q = 0$, $\gamma_{\mathbb{Q}_r}(q) = \epsilon$.
• For $q = 1$, $\gamma_{\mathbb{Q}_r}(q) = \text{'()'}$.
• For $q < 0$, $\gamma_{\mathbb{Q}_r}(q) = \gamma_{\mathbb{Q}_r}(|q|) \frown \text{'()'}$.
• Otherwise, let $p_m$ be the greatest prime base of $q$, and let integer sequence $(a_1, \ldots, a_m)$ satisfy the equation $q = \prod_{i=1}^{m} p_i^{a_i}$. Then

$$\gamma_{\mathbb{Q}_r}(q) = \bigoplus_{i=1}^{m} \left( \text{'('} \frown \gamma_{\mathbb{Q}_r}(a_i) \frown \text{')'} \right).$$

Example. Let us spell the Dyck rational corresponding to decimal $-0.\overline{2}$. We have $-0.\overline{2} = -\frac{2}{9} = -1 \cdot 2 \cdot 3^{-2} = -p_1 p_2^{-2}$. Thus the standard RPF rational spelling corresponding to decimal $-0.\overline{2}$ is

$$\begin{aligned}
\gamma_{\mathbb{Q}_r}(-0.\overline{2}) &= \gamma_{\mathbb{Q}_r}(0.\overline{2}) \frown \text{'()'} \\
&= \text{'('} \frown \gamma_{\mathbb{Q}_r}(1) \frown \text{')'} \frown \text{'('} \frown \gamma_{\mathbb{Q}_r}(-2) \frown \text{')'} \frown \text{'()'} \\
&= \text{'(())'} \frown \text{'('} \frown \gamma_{\mathbb{Q}_r}(2) \frown \text{'()'} \frown \text{')()'} \\
&= \text{'(())('} \frown \text{'('} \frown \gamma_{\mathbb{Q}_r}(1) \frown \text{')'} \frown \text{'())()'} \\
&= \text{(())((())())()}.
\end{aligned}$$

Observe how our modification of the spelling function allowed us to enlarge its domain from $\mathbb{N}$ to $\mathbb{Q}$ without requiring an intermediate modification to go from $\mathbb{N}$ to $\mathbb{Z}$. This is because the recursive nature of the spelling function implies that if it can spell negative numbers, it can also spell negative exponents. Indeed, enlargement of the domain of the spelling function from $\mathbb{N}$ to merely $\mathbb{Z}$ requires that the set of permitted inflations be expressly limited to those where a string of inflationary empty pairs only occurs as a suffix of the original word being spelled; this constitutes an artificial restriction, intended to defeat the recursivity of the function. In other words, with recursive prime factorizations it is easier and more straightforward to go directly from representing natural numbers to representing rational numbers, than to go from representing natural numbers to representing integers and thence to representing rationals. When we modify the spelling function so we can spell negative numbers, we get the ability to spell rational numbers “for free.”

The standard quasiminimal RPF language $\mathcal{D}_{r_{\mathrm{qmin}}}$ has alternative definitions equivalent to Definition 3.8. I leave the following as a proposition; its proof is tedious but not conceptually difficult, with my preceding treatment of $\mathcal{D}_{r_{\min}}$ providing background for how to proceed (refer especially to Theorem 2.7 on page 29).

Proposition 1.
The following definitions are equivalent:
• The standard quasiminimal RPF language, also called the set of Dyck rational numbers and denoted by $\mathcal{D}_{r_{\mathrm{qmin}}}$, is the set of all Dyck words $d$ such that $\mathrm{dinf}_r(d) \leq 1$.
• The standard quasiminimal RPF language, also called the set of Dyck rational numbers and denoted by $\mathcal{D}_{r_{\mathrm{qmin}}}$, is the codomain of $\gamma_{\mathbb{Q}_r}$.
• The standard quasiminimal RPF language, also called the set of Dyck rational numbers and denoted by $\mathcal{D}_{r_{\mathrm{qmin}}}$, is the set of all $d \in \mathcal{D}$ such that $d$ fulfills the following two criteria:
  – $d$ does not contain the substring )()()).
  – $d$ does not contain the suffix )()().

Remark.
Using the nonnumerical alternative definition of $\mathcal{D}_{r_{\mathrm{qmin}}}$, the definitions of “inflation” (Def. 3.3), “collapse” (Def. 3.5) and “inflationary degree” (Def. 3.7) can be restated so that they do not refer to evaluation function $\alpha_{\mathbb{N}_r}$, thus eliminating the necessity to perform prime factorization in the course of their application.

3.4 The Standard RPF Rational Evaluation Functions and the Standard Minimal RPF Rational Interpretation

I define a function capable of evaluating every member of the Dyck language as a rational number, and then I define a second function as a restriction of the first, yielding the bijection to serve as the evaluator in the standard minimal rational interpretation.
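Looking back at Definitions 3.3 to 3.8 and Proposition 1, the following is a minimal Python sketch that compares the numerical notion of inflationary degree with the string criteria of the proposition. It assumes that the standard natural evaluation maps the empty word to 0 and any other word to the product of the primes $p_i$ raised to the evaluation of the content of the $i$th chunk, which agrees with the examples above; the function names (`alpha_n`, `gamma_n`, `dinf`, `is_quasiminimal`, and so on) are illustrative only and do not come from the paper.

```python
def nth_prime(i):
    """Return the i-th prime, with p_1 = 2 (plain trial division)."""
    count, n = 0, 1
    while count < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n


def chunks(w):
    """Return the contents of the top-level chunks of a Dyck word."""
    out, depth, start = [], 0, 0
    for pos, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if c == "(" and depth == 1:
            start = pos + 1
        elif depth == 0:
            out.append(w[start:pos])
    return out


def alpha_n(w):
    """Assumed natural evaluation: epsilon -> 0, else prod p_i ** alpha_n(d_i)."""
    if w == "":
        return 0
    value = 1
    for i, d in enumerate(chunks(w), start=1):
        value *= nth_prime(i) ** alpha_n(d)
    return value


def gamma_n(n):
    """Assumed natural spelling: 0 -> epsilon, 1 -> '()', else one chunk per prime."""
    if n == 0:
        return ""
    if n == 1:
        return "()"
    word, i = "", 1
    while n > 1:
        p, e = nth_prime(i), 0
        while n % p == 0:
            n, e = n // p, e + 1
        word += "(" + gamma_n(e) + ")"
        i += 1
    return word


def is_inflation(w):
    """Remark after Definition 3.3: w differs from the spelling of its evaluation."""
    return w != gamma_n(alpha_n(w))


def collapse(w):
    """The member of D_rmin with the same natural evaluation as w (Definition 3.5)."""
    return gamma_n(alpha_n(w))


def dinf(w):
    """Inflationary degree (Definition 3.7), found by trying every deletable run."""
    target, best = alpha_n(w), 0
    for i in range(len(w)):
        n = 1
        while w[i:i + 2 * n] == "()" * n:
            if alpha_n(w[:i] + w[i + 2 * n:]) == target:
                best = max(best, n)
            n += 1
    return best


def is_quasiminimal(w):
    """String criteria from Proposition 1; no evaluation is needed."""
    return ")()())" not in w and not w.endswith(")()()")


def dyck_words(n):
    """Yield every Dyck word made of exactly n parenthesis pairs."""
    if n == 0:
        yield ""
        return
    for k in range(n):
        for inner in dyck_words(k):
            for tail in dyck_words(n - 1 - k):
                yield "(" + inner + ")" + tail


if __name__ == "__main__":
    print(dinf("(()()()())()"))              # 3
    print(collapse("(()()(())()())()()()"))  # (()()(()))
```

On all Dyck words of up to five pairs, the brute-force test `dinf(w) <= 1` and the string test `is_quasiminimal(w)` agree, which is consistent with the equivalence of the first and third definitions in Proposition 1 (a check on small cases, not a proof).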
Definition 3.11.
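Let $w \in \mathcal{D}$. Then the standard RPF rational evaluation of $w$ is the function $\alpha_{\mathbb{Q}_r} : \mathcal{D} \to \mathbb{Q}$, where
• If $w = \epsilon$, $\alpha_{\mathbb{Q}_r}(w) = 0$.
• Otherwise, let $w'$ and $z$ be Dyck words where $w = w' \frown z$, with $z$ being the longest suffix of $w$ such that $w' \neq \epsilon$ and $z$ contains only empty parenthesis pairs. Then

$$\alpha_{\mathbb{Q}_r}(w) = (-1)^{\dim(z)} \prod_{i=1}^{\dim(w')} p_i^{\alpha_{\mathbb{Q}_r}(d_i)},$$

where $d_i$ is the content of the $i$th chunk in $w'$.

Example. Let us evaluate the Dyck word $w$ = (())((())())() as a standard RPF rational number. Note that $w = w' \frown z$, where $w'$ = (())((())()) and $z$ = (). Thus we have

$$\alpha_{\mathbb{Q}_r}(w) = (-1)^{\dim(z)} \prod_{i=1}^{\dim(w')} p_i^{\alpha_{\mathbb{Q}_r}(d_i)},$$

with $d$ = ('()', '(())()'). Therefore

$$\begin{aligned}
\alpha_{\mathbb{Q}_r}(w) &= (-1)^{1} \, p_1^{\alpha_{\mathbb{Q}_r}(\text{'()'})} \, p_2^{\alpha_{\mathbb{Q}_r}(\text{'(())()'})} \\
&= -p_1^{p_1^{\alpha_{\mathbb{Q}_r}(\epsilon)}} \, p_2^{(-1)^{1} p_1^{\alpha_{\mathbb{Q}_r}(\text{'()'})}} \\
&= -p_1^{p_1^{0}} \, p_2^{-p_1^{p_1^{\alpha_{\mathbb{Q}_r}(\epsilon)}}} \\
&= -p_1^{1} \, p_2^{-p_1^{p_1^{0}}} \\
&= -p_1 \, p_2^{-2} \\
&= -2 \cdot 3^{-2} \\
&= -\tfrac{2}{9} = -0.2222\ldots
\end{aligned}$$

Definition 3.12.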
The standard minimal RPF rational evaluation function, denoted by $\alpha_{\mathbb{Q}_{r_{\min}}}$, is the restriction of $\alpha_{\mathbb{Q}_r}$ to $\mathcal{D}_{r_{\mathrm{qmin}}}$.

Now we are ready to define the interpretation.

Definition 3.13. The standard minimal RPF rational interpretation, denoted by $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$, is the interpretation $(\mathcal{D}_{r_{\mathrm{qmin}}}, \mathbb{Q}, \alpha_{\mathbb{Q}_{r_{\min}}})$.

Remark.
The presence of both “min” and “qmin” in the definition above does not reflect a typographical error. The interpretation is minimal, as evaluation function $\alpha_{\mathbb{Q}_{r_{\min}}}$ is a bijection. However, the underlying language is quasiminimal, as was noted in the remark following Definition 3.8 on page 35.

We found that (())((())())() evaluates to the number represented in decimal by −0.2222.... Note that the decimal representation required three augmentations to the language: a negative sign to designate negative numbers, a radix point to mark the boundary between nonnegative and negative powers of the radix, and an ellipsis (or, alternatively, an overbar) to designate endlessly repeating digit sequences. Also note that the RPF representation of the same number did not require such augmentation; especially note that no ellipsis or overbar was required for the RPF representation to be of finite length. The following theorem tells us that all rational numbers have finite-length RPF representations, with no addition to the symbol set required.

Theorem 3.1.
Every rational number may be represented by a word of finite length in $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$.

Proof.
Let $q$ be an arbitrary rational number. To prove that $q$ has a finite-length representation in $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$, we will find the representation.
• For $q = 0$, the finite-length representation of $q$ is $\epsilon$.
• For $|q| = 1$, $q$ can be either 1 or −1, the finite-length representations of which are () and ()() respectively.
• Otherwise choose $t \in \mathbb{N}$ and $u \in \mathbb{N}^+$ such that $\frac{t}{u}$ is a reduced fraction and $|q| = \frac{t}{u}$. By the definition of reduced fractions, $t$ and $u$ have no common factors. Let us express $t$ as $p_1^{0}$ if $t = 1$, otherwise as $p_1^{a_1} \cdots p_k^{a_k}$, where $p_k$ is the greatest prime factor of $t$. Similarly, let us express $u$ as $p_1^{0}$ if $u = 1$, otherwise as $p_1^{b_1} \cdots p_m^{b_m}$, where $p_m$ is the greatest prime factor of $u$. Thus $\frac{t}{u} = p_1^{c_1} \cdots p_j^{c_j}$, where $j = \max(k, m)$ and

$$c_i = \begin{cases} a_i & \text{if } p_i \text{ is a factor of } t \\ -b_i & \text{if } p_i \text{ is a factor of } u \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } 1 \leq i \leq j.$$

The expression $p_1^{c_1} \cdots p_j^{c_j}$ is then equal to $|q|$ and corresponds to the $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$ word $w = \gamma_{\mathbb{Q}_r}(\frac{t}{u})$ of finite length, since $j$ is finite. If $q$ is positive, then $w$ is the representation of $q$ and we are done; otherwise the finite-length word $w \frown \text{'()'}$ is the representation of $q$.

Remark.
In Section 4 we will see that the language underlying $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$ is a proper subset of the language $\mathcal{D}$ underlying the standard Dyck-complete interpretation of $\mathbb{Q}$. Since the representation of $q$ as obtained above is a member of the former set, it is also a member of the latter, implying that every rational number has a representation of finite length in $\mathcal{D}$.

Having modified $\mathrm{RPF}_{\mathbb{N}_{r_{\min}}}$ to yield $\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}$ by permitting first-degree inflations, we might imagine that further modifying it to permit second-degree inflations would buy us additional representational power, so that we could use terminal empty parenthesis pairs to encode additional two-state attributes: for example, an attribute called “spin,” which could either be clockwise or counterclockwise. Furthermore, we might imagine that the attributes of sign and of spin would be orthogonal, i.e., that the value of the sign could be determined without having to know that of the spin (and vice versa). But such a happy state of affairs is not the case, as the following theorem states for standard RPF interpretations.

Theorem 3.2.
Let $R_0$ be the standard minimal RPF language of 0-degree inflations, let $R_1$ be the standard quasiminimal RPF language of inflations of degree $\leq 1$, and let $R_2$ be the standard RPF language of inflations of degree $\leq 2$. Then $R_2$ is no more powerful than $R_1$ for encoding orthogonal binary attributes.

Proof. $R_1$ includes minimal RPF words and their single terminal inflations, so it can be successfully used for encoding a single binary attribute modifying the evaluation of the word. For example, suppose the attribute is (north, south), such that (()) means “2 units due north,” whereas its terminal inflation (())() evaluates to “2 units due south.” This is indeed possible using a subset of the Dyck language confined to inflations of degree $\leq 1$. But now suppose we try to extend the concept by using $R_2$ so we can encode an additional orthogonal binary attribute, say, that of (west, east), with the next-to-last empty parenthesis pair signifying north versus south and the last empty parenthesis pair signifying west versus east. We might for instance naïvely claim that whereas (()) represents “2 units northwest”, (())()() represents “2 units southeast.” Under such an interpretation, however, the word (())() is ambiguous; without further information aside from the word itself and its interpretation, we cannot tell whether it signifies 2 units northeast or 2 units southwest. In other words, we can only be sure of the values of words that are minimal or have '()()' as a proper suffix. We might be content to impose a convention for disambiguation, so that all occurrences of a minimal word concatenated with a single empty parenthesis pair would be resolved in favor of, say, south rather than east. But to impose such a convention would destroy the orthogonality of the two attributes.

Remark.
The theorem states that the set of inflations of degree $\leq 2$ is no more powerful than the set of inflations of degree $\leq 1$, but says nothing about sets including inflations of even higher degree. Obviously, though, permitting even longer strings of inflationary empty pairs does not eliminate the ambiguity.
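To make Definitions 3.10 and 3.11 concrete, here is a minimal Python sketch of the rational spelling and evaluation functions, using exact arithmetic from the standard `fractions` module. The names `gamma_q` and `alpha_q` are illustrative, not the paper's. One assumption is made explicit in the code: `alpha_q` only accepts words whose chunk contents evaluate to integers, as is the case for every word produced by `gamma_q`; for any other word it raises an error rather than return a non-rational value.

```python
from fractions import Fraction


def nth_prime(i):
    """Return the i-th prime, with p_1 = 2 (plain trial division)."""
    count, n = 0, 1
    while count < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n


def chunks(w):
    """Return the contents of the top-level chunks of a Dyck word."""
    out, depth, start = [], 0, 0
    for pos, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if c == "(" and depth == 1:
            start = pos + 1
        elif depth == 0:
            out.append(w[start:pos])
    return out


def gamma_q(q):
    """Standard RPF rational spelling, following the cases of Definition 3.10."""
    q = Fraction(q)
    if q == 0:
        return ""
    if q == 1:
        return "()"
    if q < 0:
        return gamma_q(-q) + "()"
    num, den, word, i = q.numerator, q.denominator, "", 1
    while num > 1 or den > 1:
        p, e = nth_prime(i), 0
        while num % p == 0:      # positive exponent from the numerator
            num, e = num // p, e + 1
        while den % p == 0:      # negative exponent from the denominator
            den, e = den // p, e - 1
        word += "(" + gamma_q(e) + ")"
        i += 1
    return word


def alpha_q(w):
    """Standard RPF rational evaluation (Definition 3.11), integer exponents only."""
    if w == "":
        return Fraction(0)
    parts = chunks(w)
    # z counts the trailing empty pairs, always leaving w' nonempty.
    z = 0
    while z < len(parts) - 1 and parts[-1 - z] == "":
        z += 1
    value = Fraction(-1) ** z
    for i, d in enumerate(parts[:len(parts) - z], start=1):
        e = alpha_q(d)
        if e.denominator != 1:
            # Assumption of this sketch: exponents must be integers.
            raise ValueError("exponent %s is not an integer" % e)
        value *= Fraction(nth_prime(i)) ** int(e)
    return value


if __name__ == "__main__":
    print(gamma_q(Fraction(-2, 9)))      # (())((())())()
    print(alpha_q("(())((())())()"))     # -2/9
```

Spelling a rational and evaluating the result returns the original number, which mirrors the constructive argument in the proof of Theorem 3.1: the loop in `gamma_q` terminates because only finitely many primes divide the numerator or denominator.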
While a minimal RPF natural interpretation and its corresponding minimal RPF rational interpretation are sufficient to represent all natural numbers and all rational numbers respectively, and while each of these interpretations enjoys the property that a bijection exists between its underlying language and its target set, we can generalize our notion of RPF languages to permit all possible inflations. The result is the Dyck language, which underlies all generalized RPF interpretations, regardless of whether they are natural or rational, or which prime permutations determine the ordering of factors in their words.
Remark.
The Dyck language has been a topic of much research, resulting in an extensive body of knowledge of potential use for the study of recursive prime factorizations.
Definition 4.1.
An interpretation is Dyck-complete if its underlying language is $\mathcal{D}$.

We already have our evaluation functions for the standard Dyck-complete natural and rational interpretations; these are $\alpha_{\mathbb{N}_r}$ and $\alpha_{\mathbb{Q}_r}$, respectively.

Definition 4.2.
The standard RPF Dyck-complete natural interpretation, denoted by $\mathrm{RPF}_{\mathbb{N}_r}$, is the interpretation $(\mathcal{D}, \mathbb{N}, \alpha_{\mathbb{N}_r})$.

Definition 4.3. The standard RPF Dyck-complete rational interpretation, denoted by $\mathrm{RPF}_{\mathbb{Q}_r}$, is the interpretation $(\mathcal{D}, \mathbb{Q}, \alpha_{\mathbb{Q}_r})$.

These interpretations are nonminimal, their evaluation functions being noninjective. There is a spelling function associated with each interpretation ($\gamma_{\mathbb{N}_r}$ for $\mathrm{RPF}_{\mathbb{N}_r}$, $\gamma_{\mathbb{Q}_r}$ for $\mathrm{RPF}_{\mathbb{Q}_r}$), but words do not generally equal the spellings of their evaluations. Indeed, the evaluation functions define equivalence relations partitioning the underlying languages of their interpretations into equivalence classes, where each equivalence class contains Dyck words that evaluate to the same number. For example, let $R_{\mathbb{N}}$ be the equivalence relation

$$R_{\mathbb{N}} = \{ (w_1, w_2) \in \mathcal{D}^2 \mid \alpha_{\mathbb{N}_r}(w_1) = \alpha_{\mathbb{N}_r}(w_2) \}.$$

Then we can identify each equivalence class as $S_n$, where every member of the class evaluates to the natural number $n$. Thus $S_0 = \{\epsilon\}$, $S_1 = \{\text{'()'}, \text{'()()'}, \text{'()()()'}, \ldots\}$, and so on. In each equivalence class, there is exactly one word that is in the codomain of the spelling function; it is thus the only member of its class such that it is equal to the spelling of its evaluation.

Figure 1 shows the hierarchy of the languages $L$ underlying standard RPF interpretations, with the interpretations designated by subscripts. Also shown are the same sets as named according to their status as subsets of the Dyck language: the standard Dyck minimals, the standard Dyck quasiminimals, and the Dyck language itself.

[Figure 1, nested regions from innermost to outermost: $L_{\mathrm{RPF}_{\mathbb{N}_{r_{\min}}}} = \mathcal{D}_{r_{\min}}$; $L_{\mathrm{RPF}_{\mathbb{Q}_{r_{\min}}}} = \mathcal{D}_{r_{\mathrm{qmin}}}$; $L_{\mathrm{RPF}_{\mathbb{N}_r}} = L_{\mathrm{RPF}_{\mathbb{Q}_r}} = \mathcal{D}$]

Figure 1: Euler diagram illustrating the hierarchy of standard RPF languages.
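The partition of the Dyck language into equivalence classes can be explored on small cases. The Python sketch below groups every Dyck word of bounded semilength by its natural evaluation; it is an illustration under the same assumption about the natural evaluation and spelling functions as in the examples of Section 3 (empty word to 0, one chunk per prime otherwise), and the helper names are hypothetical.

```python
from collections import defaultdict


def nth_prime(i):
    """Return the i-th prime, with p_1 = 2 (plain trial division)."""
    count, n = 0, 1
    while count < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n


def chunks(w):
    """Return the contents of the top-level chunks of a Dyck word."""
    out, depth, start = [], 0, 0
    for pos, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if c == "(" and depth == 1:
            start = pos + 1
        elif depth == 0:
            out.append(w[start:pos])
    return out


def alpha_n(w):
    """Assumed natural evaluation: epsilon -> 0, else prod p_i ** alpha_n(d_i)."""
    if w == "":
        return 0
    value = 1
    for i, d in enumerate(chunks(w), start=1):
        value *= nth_prime(i) ** alpha_n(d)
    return value


def gamma_n(n):
    """Assumed natural spelling: 0 -> epsilon, 1 -> '()', else one chunk per prime."""
    if n == 0:
        return ""
    if n == 1:
        return "()"
    word, i = "", 1
    while n > 1:
        p, e = nth_prime(i), 0
        while n % p == 0:
            n, e = n // p, e + 1
        word += "(" + gamma_n(e) + ")"
        i += 1
    return word


def dyck_words(n):
    """Yield every Dyck word made of exactly n parenthesis pairs."""
    if n == 0:
        yield ""
        return
    for k in range(n):
        for inner in dyck_words(k):
            for tail in dyck_words(n - 1 - k):
                yield "(" + inner + ")" + tail


def equivalence_classes(max_pairs):
    """Group all Dyck words with at most max_pairs pairs by natural evaluation."""
    classes = defaultdict(list)
    for n in range(max_pairs + 1):
        for w in dyck_words(n):
            classes[alpha_n(w)].append(w)
    return dict(classes)


if __name__ == "__main__":
    for value, words in sorted(equivalence_classes(3).items()):
        fixed = [w for w in words if w == gamma_n(value)]
        print(value, words, "spelling:", fixed)
```

In each truncated class printed this way, exactly one word coincides with the spelling of its evaluation, in line with the statement above; the truncation to a maximum semilength is an artifact of the sketch, since the full classes other than $S_0$ are infinite.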
5 Conclusion: Research Directions and Possible Applications
I conclude with suggestions for further study of recursive prime factorizations, as well as possible applications in mathematics and computer science.
Conjecture 5.1.
Let $n \in \mathbb{N}$. The number of words in $\mathcal{D}_{r_{\min}}$ of length $2(n - 1)$ is given by the sequence

$$(a_n)_{n \in \mathbb{N}^+} = (1, 1, 1, 2, 5, 13, 35, 97, 275, 794, 2327, 6905, 20705, 62642, \ldots),$$

i.e., Sequence A082582 in the Online Encyclopedia of Integer Sequences (“Expansion of $(1 + x^2 - \sqrt{1 - 4x + 2x^2 + x^4}) / (2x)$ in powers of x”) [3].

Remark.
For example, the number of words of length $2(4 - 1) = 6$ in $\mathcal{D}_{r_{\min}}$ is given by $a_4 = 2$. This may be easily verified for the case of $n = 4$ by enumerating all 6-character strings of left and right parentheses, and then eliminating those that are not Dyck naturals. The only two strings not eliminated are ()(()) and ((())). One interpretation of sequence A082582 involves excluding Dyck paths containing the 4-character subpath DDUU. How that relates to $\mathrm{RPF}_{\mathbb{N}_{r_{\min}}}$ words is a question to be pondered.

Definition 5.1.
Let $k$ be a natural number. The stripe of semilength $k$, denoted by $\theta(k)$, is the longest subsequence $s$ of $(0, 1, 2, 3, \ldots)$ such that every term $n$ in $s$ satisfies the equation $|\gamma_{\mathbb{N}_r}(n)| = 2k$, where $|w|$ is the length of string $w$. We call $\theta(0)$ and $\theta(1)$ trivial stripes.

Let $S$ be the sequence of the first four nontrivial stripes. Then $S$ is

((2), (3, 4), (5, 6, 8, 9, 16), (7, 10, 12, 15, 18, 25, 27, 32, 64, 81, 256, 512, 65536)).

We see interesting patterns in $S$. For example, the first member of the $i$th stripe in $S$ is the prime number $p_i$, and no member $m$ exists in $S_i$ such that the spelling of $m$ entails the spelling of $p_i$. Also, the sequence of last terms of stripes in $S$ may be expressed as $(2, 2^2, 2^{2^2}, 2^{2^{2^2}})$. Do these patterns hold for all stripes of semilength $\geq 2$? What other patterns characterize stripes? Do these patterns suggest a nonnumerical algorithm for finding the successor of Dyck natural $w$, i.e., $\gamma_{\mathbb{N}_r}(\alpha_{\mathbb{N}_{r_{\min}}}(w) + 1)$? And so we are led to the next question:

5.3 Can We Develop an Algorithm to Compute the Successor of a Dyck Natural Number in Polynomial Time?

Whether numerical or nonnumerical, an algorithm to find the successor of a Dyck natural number such that the algorithm runs in polynomial time on a classical computer could be easily modified according to the Peano axioms to yield a second algorithm performing addition on Dyck naturals, again running on a classical computer in polynomial time; this algorithm in turn could be modified to yield a third algorithm capable of solving prime factorization of a number on a classical computer in polynomial time, with the input and output both being representations in a positional numeral system such as decimal. The last algorithm would constitute a solution to one of the currently unsolved problems of computer science. (As we saw in Section 2.6, prime factorization in linear time on a classical computer is already attainable, if we are willing to accept algorithms where the input and the output are expressed using Dyck natural numbers.)
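The stripes of Definition 5.1 and the counts in Conjecture 5.1 lend themselves to small experiments. The Python sketch below builds each stripe from shorter ones, using the fact that a spelling of semilength $k$ is a sequence of chunks, each costing one pair plus the semilength of the spelled exponent, with a nonempty last chunk. It also counts the words of each semilength without constructing them. The names `stripe` and `count_minimal` and the fixed prime table are assumptions of this sketch; because the largest member of a stripe grows as a power tower, `stripe` is only practical for very small $k$.

```python
from functools import lru_cache

# First few primes; chunk i of a spelling carries the exponent of PRIMES[i].
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


@lru_cache(maxsize=None)
def stripe(k):
    """Return theta(k) as a sorted tuple, built from stripes of smaller semilength."""
    if k < 2:
        return (k,)  # theta(0) = (0) and theta(1) = (1)
    members = []

    def extend(i, pairs_left, value):
        # The chunk for PRIMES[i] uses one pair plus s pairs for its exponent.
        for s in range(pairs_left):
            for a in stripe(s):
                rest = pairs_left - 1 - s
                if rest > 0:
                    extend(i + 1, rest, value * PRIMES[i] ** a)
                elif a > 0:  # the last chunk of a spelling is never empty
                    members.append(value * PRIMES[i] ** a)

    extend(0, k, 1)
    return tuple(sorted(members))


@lru_cache(maxsize=None)
def _ways(pairs_left):
    """Count chunk sequences using exactly pairs_left pairs, last chunk nonempty."""
    total = 0
    for s in range(pairs_left):
        rest = pairs_left - 1 - s
        if rest > 0:
            total += count_minimal(s) * _ways(rest)
        elif s > 0:
            total += count_minimal(s)
    return total


def count_minimal(k):
    """Number of natural spellings of semilength k (length 2k)."""
    return 1 if k < 2 else _ways(k)


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, stripe(k))
    print([count_minimal(k) for k in range(10)])
```

For semilengths 0 through 9 the counts produced are 1, 1, 1, 2, 5, 13, 35, 97, 275, 794, which agrees with the initial terms listed in Conjecture 5.1; this is supporting evidence for small cases only.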
While Algorithm 1 accomplishes prime factorization in linear time, that does not mean the algorithm is faster than currently known prime factorization algorithms. In particular, the presence of long sequences of empty pairs in natural RPF words may result in very long input strings, with the result that execution time, though linear, is too large for the algorithm to be practical. I suggested a way to improve performance by using a hybrid scheme for input and output, where each sufficiently long substring of empty pairs would be replaced by a hexadecimal word giving the number of empty pairs in the substring. What would be the time and space complexities of such an algorithm?
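One possible reading of the hybrid scheme can be sketched in Python as follows. The concrete format is an assumption made for illustration: runs of at least `threshold` empty pairs are replaced by a lowercase hexadecimal count written with no delimiters, which is unambiguous because hexadecimal digits never occur in a Dyck word. The function names `compress` and `expand` and the default threshold are hypothetical, and the scheme described earlier in the paper may differ in its details.

```python
import re


def compress(word, threshold=4):
    """Replace each run of at least `threshold` empty pairs by its length in hex."""
    run = re.compile(r"(?:\(\)){%d,}" % threshold)
    return run.sub(lambda m: format(len(m.group(0)) // 2, "x"), word)


def expand(text):
    """Inverse of compress: turn every hexadecimal count back into empty pairs."""
    return re.sub(r"[0-9a-f]+", lambda m: "()" * int(m.group(0), 16), text)


if __name__ == "__main__":
    word = "()" * 24 + "(())"   # 24 empty pairs followed by one nonempty chunk
    packed = compress(word)
    print(packed)               # 18(())
    print(expand(packed) == word)
```

Because the regular expression matches maximal runs, two hexadecimal counts can never be adjacent, so `expand(compress(w))` returns `w` for any Dyck word `w`.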
In decimal, we use a negative sign to extend the set of numbers we can exactly represent with a word of finite length from $\mathbb{N}$ to $\mathbb{Z}$; if we then use a radix point to imply the positions of coefficients of negative powers of the radix, we can exactly represent such members of $\mathbb{Q}$ as −4.2 and 1.75, but not those whose digits repeat endlessly. If we then use a vinculum (overbar or underbar) to designate an unending sequence of repeating digits, we can exactly represent every member of $\mathbb{Q}$ by a decimal word of finite length.

The Dyck rationals already represent $\mathbb{Q}$ by words of finite length. Also, we know that irrational numbers can be expressed as infinite products of rationals, as is the case with the Wallis product:

$$\frac{\pi}{2} = \prod_{n=1}^{\infty} \frac{4n^2}{4n^2 - 1}.$$

Can we use a “vinculoid,” by which I mean a device serving a purpose analogous to that served by a vinculum in decimal to permit the exact representations of rational numbers, to extend the set of numbers we can represent by finite-length words in rational RPF from $\mathbb{Q}$ to $\mathbb{R}$?

Perhaps a vinculoid would be a form of augmentation or “markup” such that marked-up Dyck rationals could represent irrational numbers.

Conjecture 5.2.
It is possible to augment $\mathcal{D}_{r_{\min}}$ to represent the set of real numbers by words of finite length.

Alternatively, a vinculoid might be a set of grammar productions from which an infinite sequence of Dyck rationals would be generated, such that the grammar would define a pattern manifested by successive Dyck rational approximations of an irrational number.

Conjecture 5.3.
For every irrational number $x$, there exists a context-free grammar $G_x$ such that $G_x$ generates an infinite sequence of Dyck rationals $(s_i)_{i \in \mathbb{N}^+}$, with $\alpha_{\mathbb{Q}_r}(s_i)$ converging to $x$ as $i$ approaches infinity.

Grammar-based compression algorithms such as Re-Pair [1] produce a context-free grammar for the string being compressed. Input with lower information entropy (i.e., input that is less random) will have a higher compression ratio than input with higher information entropy. Suppose we have two samples of input, these being of equal size but with one consisting of a concatenation of randomly-chosen Dyck naturals, the other consisting of a concatenation of Dyck naturals we hypothesize to be consecutive terms in a sequence manifesting some pattern. If the compression ratio obtained for the second input string is greater than that obtained for the first, we might take this as corroborating (but not proving, of course) our hypothesis. If we find the difference in compression ratios becomes more pronounced as the size of the input increases, we might take that to be an even stronger indication our hypothesis is correct.

Perhaps a conventional compression algorithm would serve as well as a grammar-based algorithm to accomplish what I described in the previous paragraph. But now suppose we have a sequence of Dyck rationals constituting successive approximations of an irrational number such as $\pi$, and we feed the algorithm longer and longer concatenations of terms in the sequence. Might we observe a pattern in the corresponding sequence of context-free grammars produced by the algorithm such that we can use the pattern to obtain a vinculoid (see Conjecture 5.3) enabling us to exactly represent the irrational number?

The
Gaussian rationals are complex numbers of the form $q + ri$, such that $q$ and $r$ are rational. While we cannot represent such numbers as Dyck words (see Theorem 3.2 on page 40), the Dyck language can be generalized to include more than one type of delimiter, where the criteria for well-formedness apply to each type of delimiter in a word. Thus $w$ = (()())(())[()()(()())] is a word in that generalization of the Dyck language having the alphabet {(, ), [, ]}. We can regard $w$ as an RPF representation of the Gaussian rational $\frac{3}{2} + \frac{1}{5}i$, where the imaginary part is designated by being surrounded with square brackets. What would be the description of such a language $L$ with the additional property that $L$ is minimal? Does there exist a language $L'$ satisfying the criteria for $L$, with the additional property that $L'$ is context-free? What class of algebraic objects would $L'$ generate?

My work on RPFs arose from a search for ways to make patterns in numbers more immediately accessible by avoiding certain drawbacks of positional numeral systems. While RPF spellings of numbers do show the prime factorizations involved, long strings of parentheses can become overwhelming, reminiscent of what programmers call “LISP hell.” I wonder whether graphical methods might be developed to aid the recognition of numerical patterns among RPF words, for example:
• Define an “RPF natural number spiral,” based on the Sacks natural number spiral but with the $k$th counterclockwise rotation of the spiral containing all naturals $n$ such that $|\gamma_{\mathbb{N}_r}(n)| = 2k$, these appearing in numerical order. The sequence of numbers appearing on the zeroth rotation of the spiral, for instance, would be (0), while the sequence of numbers on the fifth rotation would be (5 , , , , . . . −
1. But if Conjecture 5.1 is true, there are 54,857,506 numbers appearing in rotation 19 of the RPF natural spiral.
• Create a computer program that produces two-dimensional images according to an adaptation of escape-time algorithms such as those used to depict Mandelbrot and Julia sets, as follows. For every coordinate $(x, y)$ on the pixel display, let $z_0$ be a function of $(x, y)$, for example $z_0 = \frac{\lfloor x \rfloor}{\lfloor y \rfloor}$, and let $z_{k+1}$ be defined as a function of $z_k$, for example $z_{k+1} = \alpha_{\mathbb{Q}_r}(\text{'()'} \frown \gamma_{\mathbb{Q}_r}(z_k))$. Now let $z_n$ be the smallest number greater than $L$ in the sequence, where $L$ is a large natural number of our choosing; we may regard $n$ as a measure of how quickly the values of $z_k$ are approaching infinity for the coordinate $(x, y)$. We then map values of $n$ to a range of colors, using the appropriate color to display the pixel corresponding to $(x, y)$. Of course, there may be instances where successive values of $z_k$ do not grow out of bounds, instead converging to some number; to handle this possibility, if $k$ reaches a certain large positive integer value $Q$ we have chosen, we quit the iterations and display the pixel for $(x, y)$ using the color black (for example).

σ-permuted Minimal Natural RPF Languages

Does $\bigcup_{\sigma} \mathcal{D}_{\sigma_{\min}}$ equal $\mathcal{D}$? Does $\bigcap_{\sigma} \mathcal{D}_{\sigma_{\min}}$ equal $\{\epsilon, ()\}$?

5.10 Context-free Grammars for the Minimal and Quasiminimal Dyck Languages

Theorem 2.8 assures us that a context-free grammar (CFG) exists for $\mathcal{D}_{r_{\min}}$, and I have provided a candidate grammar as a conjecture. Is that grammar correct? If so, does an equivalent grammar with a smaller set of productions exist? Can we formulate a meta-grammar useful for deriving the productions of the language underlying an arbitrary prime-permuted minimal natural RPF interpretation? What would be an example of a CFG for $\mathcal{D}_{r_{\mathrm{qmin}}}$?

On many occasions it is desirable to be able to specify, manipulate and test for the exact values of rational numbers; however, many such numbers can only be approximated in binary.
This problem may be ameliorated to some extent by use of arbitrary-precision arithmetic, in which the amount of storage allocated for numbers is determined by the precision required for the application. I can imagine a modification of arbitrary-precision arithmetic to include an intermediate lookup to an array of RPF natural spellings, each of bit width $b$, the spellings stored as bit sequences encoding left parentheses (respectively, right parentheses) as 1s and right parentheses (respectively, left parentheses) as 0s, such that the spelling of natural number $k$ would reside at offset $bk$ from the start of the array. The purpose of this arrangement would be to allow “RPF shortcuts” in arbitrary-precision computations in those situations where taking such shortcuts would be advantageous in terms of time and/or space, or when an exact result was sufficiently important to warrant the implicit conversions. Of course, while it is true that every rational number has a finite-length representation in standard rational RPF (or in any other prime-permuted rational RPF interpretation, for that matter), there are several limitations that would impose constraints, and I do not know whether anything of value would arise from the idea. One limitation that can be overcome is the excessive size of words containing long substrings of empty pairs; in order for the scheme to be practical, it would have to be modified so that Dyck words are stored and accessed using a form of compression, such as replacing each sufficiently long substring of empty pairs by a hexadecimal word specifying the number of pairs replaced.

5.12 Can Recursive Prime Factorizations Be Useful for Cryptography?

I can imagine encryption algorithms involving recursive prime factorizations. As a very simple example, suppose we feed the ASCII values of plaintext into an algorithm that spells the values as Dyck naturals and then outputs as ciphertext the $\sigma$-permuted evaluations of the spellings for some prime permutation $\sigma$.
Of course, this particular scheme is useless, being easily defeated by frequency analysis. Can recursive prime factorizations find practical use in cryptography?

Although $\mathcal{D}_{r_{\min}}$ is not regular, many interesting subsets of it are. For example, the set $\{ w \in \mathcal{D}_{r_{\min}} \mid \alpha_{\mathbb{N}_{r_{\min}}}(w) \text{ is prime} \}$ is recognized by the following DFA.

[Figure: DFA state diagram with states q0, q1, q2, q3, q4 and q5]

Remark.
We can also employ regular expressions to describe such subsets. In order to avoid confusion between grouping parentheses and symbols in the alphabet, let us use the binary alphabet {0, 1}, where '1' is equivalent to '(' and '0' is equivalent to ')'. Then (10)*1100 denotes the set of prime Dyck naturals, and (10)*1100(10)*1100 denotes the set of squarefree semiprime Dyck naturals.

Let $k$ be a positive integer. What is the characterization of the set $\mathbb{N}_k \subset \mathbb{N}$ such that each member $n$ of $\mathbb{N}_k$ is represented according to $\alpha_{\mathbb{N}_{r_{\min}}}$ by a word $w$ in some regular subset $D \subset \mathcal{D}_{r_{\min}}$, where no spelling $\gamma_{\mathbb{N}_r}(n)$ has a recursion chain of length greater than $k$?

References

[1] Bille, P., Gørtz, I. L., and Prezza, N.
Practical and effective re-pair compression. arXiv preprint arXiv:1704.08558 (2017).
[2] justasking (https://math.stackexchange.com/users/816393/justasking). Is language of all binary-digit strings not containing substring 0100 or suffix 010 context-free? Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/3926763 (version: 2020-11-28).
[3] Munarini, E. The On-Line Encyclopedia of Integer Sequences. http://oeis.org/A082582, A082582, May 2003. Expansion of (1 + x^2 - sqrt(1 - 4*x + 2*x^2 + x^4)) / (2*x) in powers of x.
[4] user11750 (https://math.stackexchange.com/users/11750/user11750)