INNOVATION AND IMITATION

JESS BENHABIB, ERIC BRUNET, AND MILDRED HAGER

Abstract.

We study several models of growth driven by innovation and imitation by a continuum of firms, focusing on the interaction between the two. We first investigate a model on a technology ladder where innovation and imitation combine to generate a balanced growth path (BGP) with compact support, and with productivity distributions for firms that are truncated power-laws. We start with a simple model where firms can adopt technologies of other firms with higher productivities according to exogenous probabilities. We then study the case where the adoption probabilities depend on the probability distribution of productivities at each time. We finally consider models with a finite number of firms, which by construction have firm productivity distributions with bounded support. Stochastic imitation and innovation can make the distance of the productivity frontier to the lowest productivity level fluctuate, and this distance can occasionally become large. Alternatively, if we fix the length of the support of the productivity distribution because firms too far from the frontier cannot survive, the number of firms can fluctuate randomly.

1. Introduction

Economic growth is partly the result of costly research activities that firms undertake in order to innovate, and to increase their productivity. Growth is also driven by technology diffusion and imitation that takes place between firms. The role of technology diffusion across countries is evidenced by the extraordinary sustained growth rates in China and other East Asian countries during the recent decades. In this paper we investigate several models of growth driven both by innovation and imitation, focusing on the interaction between the two. New ideas and innovations push out the technology frontier. Imitation enables firms to catch up with those higher up on the technology ladder. We study the dynamics of the productivity distribution of firms, where productivity is increasing with the rates of innovation and imitation, and we provide a characterization of its stationary distribution in the long run. In our study we do not take into account the effect of the size of the firm on its growth.¹

Date: August 27, 2020. arXiv [econ.TH].

¹ One of the first precursor papers that explores the dynamics of firm-size distributions is Bonini and Simon (1958). They introduce random growth proportionate to firm size, coupled with entry of new firms of the smallest size at a constant rate. In the limit the productivity distribution converges to a Pareto distribution. Another classical investigation of the firm productivity and size distribution is Hopenhayn (1992).

As demonstrated by Lucas (2009), in models of technology diffusion based on imitation alone, growth can be sustained only if the initial distribution of productivities has unbounded support for high productivity levels. In Lucas and Moll (2014), and Perla and Tonetti (2014), technology diffusion is search theoretic, where firms seek higher productivity firms to imitate and to adopt superior technology from. In these models an unbounded productivity distribution is necessary to sustain growth through imitation in the long run. With an initial productivity distribution that has bounded support, imitation ultimately stops, as productivities of the imitating firms collapse toward the productivity frontier. Therefore, unboundedness is more than a convenient and inconsequential simplification.

In contrast, in models of endogenous growth, innovation is the primary driving force of growth. Firms engage in research to generate individual innovations. These innovations later may become a common stock of ideas that are available to the whole economy, generating spillovers (Romer (1990)). Alternatively, innovations are Schumpeterian, in the sense that firms can leapfrog beyond the productivity frontier. They overtake incumbent firms and drive them out of business, increasing overall productivity over time (Aghion and Howitt (1992)).

Models involving both random innovations via a geometric Brownian motion, as well as imitation via random meetings between firms generating technology diffusion, have been proposed by Luttmer (2012) and Staley (2011).² Their approach is related to the KPP equation, originally studied in the mathematics literature by Kolmogorov, Petrovski and Piskunov (1937), and later by McKean (1975) and Bramson (1984) among others. These models can admit a unique balanced growth path (BGP) that is a global attractor, and whose shape depends on imitation and innovation propensities, but not on initial conditions.³ Innovations driven by Brownian motion, however, ensure that the productivity distribution immediately becomes unbounded, and the resulting BGP does not have compact support.

Having a compact support is particularly relevant for empirical purposes, as the support of the productivity distribution in individual industries is found to be quite localized (Syverson (2004), Hsieh and Klenow (2009)). Firms with significantly low productivity relative to the frontier firms are unlikely to survive the competition, and to preserve their market shares for long. The forces of Schumpeterian “creative destruction” may endogenously replace the inefficient firms at the bottom of the productivity distribution. However other firms, below but not too far from the frontier, may survive, giving a distribution of productivities that allows both innovation and imitation to persist over time.

² Other recent models combining innovation and imitation include Benhabib, Perla and Tonetti (2014), König, Lorenz and Zilibotti (2016), Akcigit and Kerr (2016) and Buera and Lucas (2018).

³ To be more precise, however, for the KPP equation the asymptotic BGP velocity and shape do depend on initial conditions if the initial distribution is thick tailed. See Bramson (1984).


In section 2, we first investigate a model on a ladder. Innovation and imitation combine to generate a balanced growth path (BGP) with compact support. In contrast to models with imitation alone (see Lucas (2009)), the distribution does not collapse to the frontier either. The distribution of productivities is centered around some productivity moving up at a constant growth rate, and keeps its shape relative to this productivity over time (a traveling wave which is compactly supported). In section 2.1 we first propose a very simple model of firms on a quality ladder that can both innovate and imitate, and where with some positive probability imitators can leapfrog to the productivity frontier. We characterize the stationary distribution of productivities as a truncated power law. This model has the advantage of being very simple, but leaves imitation rates mostly exogenous. In section 2.2 we extend this model to introduce density dependent imitation rates. In section 3 we endogenize the length of the support of the balanced growth path as arising from optimal choices of firms.

Another approach to generating productivity distributions that have finite support is to limit the number of firms to be finite. By construction, distributions over a finite number of firms have bounded support; however, stochastic imitation and innovation can make the distance of the productivity frontier to the lowest productivity level fluctuate, and this distance can occasionally become quite large. In section 4, we study such models with innovation, imitation and a finite number of firms, the so-called N-BRW and L-BRW models. These models introduce alternative approaches to modeling entry, exit, and competition, but also feature balanced growth paths with compact support. We characterize some features of their productivity distributions and relate them to results obtained in earlier sections. Section 5 concludes.

2. Innovation and imitation with fixed compact support

In this section, we consider a discrete time model of innovation and imitation. Innovation can be gradual, by moving up a quality ladder as in Klette and Kortum (2004) and König, Lorenz and Zilibotti (2016). But it can also be a breakthrough, where agents or firms move up from the bottom and overtake the top, that is they “leapfrog”. On top of that, agents imitate other agents. We start in section 2.1 with a model of exogenous imitation rates. In section 2.2, we endogenize the imitation choice and obtain a stationary productivity distribution that looks like a truncated power law on a finite support. Then in section 3 we show that the assumption of a fixed support length of the BGP is actually the result of an optimal choice problem that trades off the costs of imitation and its benefits, as in Perla and Tonetti (2014).

In this section, we only consider the case where the number of firms is sufficiently large to neglect finite-size effects and stochastic behavior. In fact, to make “microscopic” and probabilistic interpretations at the firm level, and not only speak about densities, we have to assume that a law of large numbers holds.


2.1. Exogenous innovation and imitation.

At each time $t \in \mathbb{N}$, a firm has a productivity level $i \in \mathbb{N}$ on a discrete ladder.⁴ The density of firms at time $t$ on level $i$ is given by a non-negative number $f_t^i \in \mathbb{R}$ with $\sum_i f_t^i = 1$ for every $t$. At each time step, when going from $t$ to $t+1$, firms improve their productivity and the density climbs up the ladder according to rules which we now make explicit. We assume that, at each time step and at each level, a fraction $a \in (0,1)$ of firms moves up the productivity ladder by one level (“innovation”), and a fraction $1-a$ remains stagnant, with the same productivity. This amounts to assuming a law of large numbers for random innovation with probability of success $a$. Then, all of the firms that remained stagnant at the lowest productivity level $i = 1$ either leapfrog or imitate as described below, leaving the lowest level empty. (This corresponds to a fraction $(1-a) f_t^1$ of all firms.)

⁴ There is no assumption that these productivity levels placed on a ladder are equally spaced: the rungs of the ladder need not be equidistant from each other. They simply represent the productivities that can be imitated and adopted, and we could easily use any ladder for $i$ that maps to $\mathbb{N}$.

We call $m \in \mathbb{N}$ the highest level at time $t$. Given our process, at time $t+1$, the productivity level $i = m+1$ gets populated, and the lowest productivity level $i = 1$ is emptied as described below. Then, at each period, we rename the levels: what was the level $i$ at time $t$ becomes the level $i-1$ at time $t+1$. In this way, the populated levels at the beginning of each time step are always numbered $\{1, 2, \ldots, m\}$. For the moment, we take the length of support $m$ as given, and postpone a discussion of endogenously chosen $m$ to section 3.

In this section 2.1, imitation for non-innovating firms at the lowest level happens as follows: at level $i \in \{1, 2, \ldots, m\}$, the fraction of imitators entering level $i$ at time $t+1$ is $(1-a) f_t^1 q_i$, where the $q_i \in [0,1]$ satisfy

$$ q_i \ge 0, \qquad \sum_{i=1}^{m} q_i = 1. \tag{1} $$

The $q_i$ for $i \in \{1, \ldots, m-1\}$ represent imitation (exogenous in this section 2.1), while $q_m$ represents leapfrogging, i.e. firms at the lowest level of the productivity distribution at each time $t$ that overtake the current productivity frontier (indeed, the productivity level $m$ at time $t+1$ corresponds to the productivity level $m+1$ at time $t$, which was not yet populated). If $q_m = 0$, leapfrogging is excluded, while setting $q_m = 1$ excludes any imitation.

Note that we assume that jumps to higher productivity levels, whether from leapfrogging or imitation, do not depend on target productivity densities, so for the time being we abstract away from any search-theoretic microfoundation.

The transition dynamics can be written in a single equation

$$ f_{t+1}^i = \underbrace{(1-a) f_t^{i+1}}_{\text{fall back}} + \underbrace{a f_t^i}_{\text{innovation}} + \underbrace{(1-a) f_t^1 q_i}_{\text{imitation or leapfrogging}} \tag{2} $$

or, more conveniently, with a matrix $A \in M_m([0,1])$, $f_{t+1} = A f_t$:

$$ \begin{pmatrix} f_{t+1}^1 \\ f_{t+1}^2 \\ \vdots \\ f_{t+1}^m \end{pmatrix} = \begin{pmatrix} a + q_1(1-a) & 1-a & 0 & \cdots & 0 \\ q_2(1-a) & a & 1-a & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ q_{m-1}(1-a) & 0 & \cdots & a & 1-a \\ q_m(1-a) & 0 & \cdots & 0 & a \end{pmatrix} \begin{pmatrix} f_t^1 \\ f_t^2 \\ \vdots \\ f_t^m \end{pmatrix}. \tag{3} $$

By construction the columns of $A$ sum to 1, as the number of firms remains constant (in effect, we have a particular birth and death model). $A$ admits 1 as an eigenvalue. The associated eigenvector is the stationary distribution for productivity densities, moving up as a traveling wave. The stationary distribution can be characterized as follows.

Proposition 1.

Let $Q_s = (q_m + q_{m-1} + \cdots + q_s) = \sum_{j=s}^{m} q_j$, with $Q_m = q_m$ and $Q_1 = 1$. The stationary distribution $(f_\infty^1, f_\infty^2, \ldots, f_\infty^m)$, for any $a \in (0,1)$, is given by:

$$ f_\infty^s = Q_s f_\infty^1, \quad s = 1, \ldots, m, \quad \text{and} \quad f_\infty^1 \left( \sum_{s=1}^{m} Q_s \right) = 1 \quad \text{or} \quad f_\infty^1 = \left( \sum_{s=1}^{m} Q_s \right)^{-1}. \tag{4} $$

Proof.

The stationary solution fulfills $f_\infty = A f_\infty$. To simplify notation let $f_\infty^j \equiv x_j$, $j = 1, \ldots, m$. We start with the last line of equation (3):

$$ q_m (1-a) x_1 + a x_m = x_m \;\Rightarrow\; x_m = q_m x_1. $$

We proceed by induction. The next-to-last line yields

$$ q_{m-1}(1-a) x_1 + a x_{m-1} + (1-a) x_m = x_{m-1} \;\Rightarrow\; (q_{m-1} + q_m)(1-a) x_1 = (1-a) x_{m-1} \;\Rightarrow\; x_{m-1} = (q_{m-1} + q_m) x_1. $$

Assume that $x_{m-(s-1)} = (q_{m-(s-1)} + \cdots + q_m) x_1$. Then we have

$$ q_{m-s}(1-a) x_1 + a x_{m-s} + (1-a) x_{m-(s-1)} = x_{m-s} \;\Rightarrow\; x_{m-s} = \left( q_{m-s} + q_{m-(s-1)} + \cdots + q_{m-1} + q_m \right) x_1. $$

This completes the induction proof. Relabeling $m-s$ as $s$, we obtain (4).

We have left one free variable, $x_1$, which will be determined by the normalization of $f$; writing $\sum_{i=1}^{m} x_i = 1 = x_1 \left( \sum_{s=0}^{m-1} Q_{m-s} \right) = x_1 \left( \sum_{s=1}^{m} Q_s \right)$, we get the results. □

The stationary distribution is independent of the probability of innovation $a \in (0,1)$, and only depends on the intensity $q_i$ of imitation rates across productivities. But the speed of convergence to the stationary distribution depends on $a$, as it affects the eigenvalues of $A$. In particular the second highest eigenvalue of $A$, which is less than 1 in modulus,⁵ can be taken as an indicator of the convergence rate. The lower this eigenvalue, the faster the convergence rate. For $m = 2$, the trace of $A$ is $2a + q_1(1-a)$ and one eigenvalue is 1, so the second eigenvalue is $2a - 1 + q_1(1-a) = a - (1-a) q_2$. This is increasing in $q_1$ (more imitation implies slower convergence), decreasing in $q_2$, the leapfrogging rate (more leapfrogging implies faster convergence), and increasing in $a$ (more innovation implies slower convergence).

Firms at any productivity level except the lowest one tend to drop down the ladder over time. At the bottom of the ladder, non-innovating firms jump to higher levels through leapfrogging and imitation. Overall, the stationary density of productivity levels is non-increasing over productivity levels.

We now discuss two special cases:

No imitation, only leapfrogging.

If there is no imitation and only leapfrogging, that is if $q_m = 1$ and therefore $q_i = 0$ for $i \in \{1, \ldots, m-1\}$, it follows that $f_\infty^i = m^{-1}$ for $i = 1, \ldots, m$, so the productivity distribution becomes uniform. Firms that jump to the frontier slide down the productivity distribution until they reach the lowest level, from which they again jump to the frontier.

No leapfrogging, only imitation.

In this case $q_m = 0$ and the matrix in (3) is decomposable. In particular, the highest productivity level evolves independently with $f_{t+1}^m = a f_t^m$, and converges to zero. This makes the last element of the eigenvector associated with eigenvalue 1 equal to zero, so there is no density at the top level in the stationary distribution: $f_\infty^m = 0$.

2.2. Density dependent imitation.

In Proposition 1 we solved for the densities in terms of exogenous imitation rates $q_i$, $i = 1, \ldots, m-1$, with $m > 2$. Now we consider the case where the imitation rates are proportional to densities. We are again seeking a stationary solution.

If imitation is similar to learning from another firm, then imitation rates should be proportional to the number of firms to learn from, or the density at the corresponding ladder point. Learning is then conditional on meeting another firm with higher productivity, which happens with a probability proportional to the density there. Therefore, we let imitation rates be

$$ q_j \equiv q_j^t = \mu f_t^{j+1}, \quad j = 1, \ldots, m-1. \tag{5} $$

(Recall that $q_j$ is the probability of jumping to site $j$ at time $t+1$, which is the same as site $j+1$ at time $t$ because of the relabeling at each time step; this is why $q_j$ is proportional to $f_t^{j+1}$ and not $f_t^j$.) Here, $\mu$, which is determined by normalization, is time-dependent: as seen below, it can be written as a function of $f_t^1$. The highest level $f_{t+1}^m$ is not imitated because it is not yet available for imitation, so $q_m$, which represents leapfrogging, is independent of the densities, as in section 2.1.

⁵ Indeed, the matrix $A$ has non-negative entries, is aperiodic since the diagonal elements are positive, and irreducible as any productivity level can be reached from any other one. Therefore, the Perron-Frobenius Theorem implies that the largest eigenvalue (here, 1) is simple, and that all other eigenvalues are strictly smaller in modulus.

Observe that the problem is now non-linear, so existence and uniqueness of a solution are more involved than in the linear case. A stationary solution is again $f_\infty = A(f_\infty) f_\infty$.

We first determine $\mu$: with $m$ fixed, we must have

$$ 1 = f_t^1 + \cdots + f_t^m \tag{6} $$

and

$$ 1 = q_1 + q_2 + \cdots + q_m. \tag{7} $$

Therefore, in order to find a solution, $\mu$ cannot take arbitrary values but will be determined (together with $f_\infty$) as a function of $q_m$. Indeed, inserting (5) into (7), and using (6) we obtain:

$$ 1 = q_m + \mu \sum_{j=1}^{m-1} f_t^{j+1} = q_m + \mu (1 - f_t^1). \tag{8} $$

This implies that, assuming that $f_t^1 < 1$,

$$ \mu \equiv \mu_t = \frac{1 - q_m}{1 - f_t^1}. \tag{9} $$

Overall, in this subsection, the two parameters $m$ and $q_m$ determine all other quantities, including $\mu$. Note that the reason for which we are not free to choose $\mu$ is that we insist, in our model, that the lowest occupied site be emptied at each time step. This condition leads to (7) and then to (9). In section 4, we briefly discuss a model where $\mu$ is an arbitrary parameter and the lowest occupied site is not necessarily emptied at each time step.

For the stationary solution, we write $x_j = f_\infty^j$ as before and

$$ \mu = \frac{1 - q_m}{1 - x_1}. \tag{10} $$

Remark 1. $x_1 = 1$ is never a solution. If $x_1 = 1$, then $x_j = 0$, $\forall j = 2, \ldots, m$. Then, either $q_m = 1$ and $q_j = 0$, $\forall j = 1, \ldots, m-1$, in which case the last line of (3) reads $x_m = (1-a) q_m x_1 = 0$. For $a < 1$, this is only possible if $q_m = 0$, a contradiction. Or, if $q_m < 1$, $\mu = \infty$ and the problem is not well-defined.

Proposition 2.

Under the assumptions above, with $q_m \in (0,1)$,

$$ x_i = q_m x_1 \left( 1 + \frac{1 - q_m}{1 - x_1} x_1 \right)^{m-i}, \quad i = 1, \ldots, m, \tag{11} $$

where $x_1 \in [0,1)$ is the unique solution to

$$ 1 = q_m \left( 1 + \frac{1 - q_m}{1 - x_1} x_1 \right)^{m-1}, \tag{12} $$

or

$$ x_1 = \frac{(q_m)^{-\frac{1}{m-1}} - 1}{(q_m)^{-\frac{1}{m-1}} - q_m}. \tag{13} $$

Proof.

We give a recursive proof. The last line of (3) gives again $x_m = q_m x_1$. Replacing $q_i = \mu x_{i+1}$ for $i = 1$ to $m-1$, line $m-j$ of (3) gives

$$ x_{m-j} = x_{m-(j-1)} + q_{m-j} x_1 = x_{m-(j-1)} (1 + \mu x_1) \;\Rightarrow\; x_{m-j} = x_m (1 + \mu x_1)^j = q_m x_1 (1 + \mu x_1)^j. $$

This finishes the induction proof. Relabeling $i = m-j$, we obtain (11).

For existence of a solution, we need that

$$ x_1 = (1 + \mu x_1)^{m-1} q_m x_1 \tag{14} $$

or

$$ (1 + \mu x_1)^{m-1} q_m = 1. \tag{15} $$

Inserting the expression for $\mu$, (10), this gives equation (12). Let us check that the solution thus obtained is normalized; we have

$$ \sum_{i=1}^{m} x_i = \sum_{j=0}^{m-1} (1 + \mu x_1)^j q_m x_1 = \frac{(1 + \mu x_1)^m - 1}{(1 + \mu x_1) - 1} q_m x_1 = \frac{(1 + \mu x_1) - q_m}{\mu}, $$

where we have also used (15). But according to equation (10), one has $\mu = 1 + \mu x_1 - q_m$, and so we conclude that

$$ \sum_{i=1}^{m} x_i = 1. $$

Therefore, a solution to (15) with $\mu$ given by (10) gives rise to a normalized $x$, as summed up in equation (12). Inserting the expression (13) for $x_1$ proves existence of a solution. This finishes the proof of Proposition 2. □

Corollary 1. If $q_m = 0$, there is no stationary solution.

Proof.

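For $q_m = 0$, we have $x_m = 0$. Using equation (3) with $q_{m-1} = \mu x_m$, this implies that

$$ x_{m-1} = (1-a)\, \mu \cdot 0 \cdot x_1 + (1-a) \cdot 0 + a x_{m-1}, \tag{16} $$

which implies that $x_{m-1} = 0$ for $a < 1$. By recursion, $x_j = 0$ $\forall j$, and there is no solution. □

While there is no stationary solution for $q_m = 0$, the dynamics may nevertheless converge to a distribution with $x_1 \to 1$, $x_i \to 0$ for $i > 1$. Recall that $\{x_i\} = \{1, 0, 0, \ldots\}$ is not a stationary state, because it would lead to $\mu = \infty$ (see Remark 1) and an ill-defined model. This case is easily illustrated for $m = 2$.

Example 1.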

1. Recallthat { x i } = { , , , . . . } is not a stationary state, because it would lead to µ = 0and a ill-deﬁned model. This case is easily illustrated for m = 2. Example 1.

Dynamics for $m = 2$, $q_2 = 0$. We start from a density $f_0$ with $f_0^1 = 1 - f_0^2$ and $f_0^2 > 0$ (else, as already pointed out, we would have $\mu \to \infty$). Then the dynamics for $f_t^2$ reduce to $f_{t+1}^2 = a f_t^2$ and hence $f_t^2 = f_0^2 a^t \to 0$ as $t \to \infty$. By normalization, $f_t^1 = 1 - f_t^2 = 1 - f_0^2 a^t \to 1$ as $t \to \infty$.

We can observe that because $q_m = 0$, the upper level of the density is falling over time as only a fraction $a$, namely the innovators, remains there each period. In the general case, all upper levels will successively experience such a decline in population. Because imitation is proportional to the number of firms present at the productivity level, fewer and fewer firms will flow into the higher steps of the ladder, which will be successively depopulated. In the limit, a single ladder step survives.

We do not think that the problematic asymptotic behavior for $q_m = 0$ is a major drawback of this model. Surely there are some highly innovative firms that leapfrog to the highest operational productivity levels, so that the case $q_m = 0$ may be economically less interesting.

Example 2.

We provide numerical illustrations for the stationary densities for $m = 10$ and four values of $q_m$. The solutions for $\mu$ and $x_1$ are

$\mu = 1.…$, $x_1 = 0.…$ if $q_m = 0.…$
$\mu = 0.…$, $x_1 = 0.…$ if $q_m = 0.…$
$\mu = 0.…$, $x_1 = 0.…$ if $q_m = 0.…$
$\mu = 0.…$, $x_1 = 0.…$ if $q_m = 0.…$

These values can be computed from $\mu = (q_m)^{-\frac{1}{m-1}} - q_m$, which is obtained from (10) and (13). Figure 1 plots the solution for $\{x_i\}_{i=1}^{m}$ for these four cases.

Figure 1. Stationary densities for different values of leapfrogging intensity $q_m$ with $m = 10$.

Notice that higher values of $q_m$, or higher leapfrog values, flatten the productivity distribution. As $q_m \to 1$, we have $\mu \to 0$ and $x_i \to m^{-1}$ for $i = 1, \ldots, m$, so the distribution is uniform. The stationary density gets increasingly concentrated at the lower boundary of the productivity ladder as $q_m$ gets smaller.

We note that in a continuous time version of this model with a continuum of firm productivities, with growth driven by imitation as well as by leapfrogging innovation to the frontier that is governed by a finite Markov chain, Benhabib, Perla and Tonetti (2017) also show that there exists a stationary productivity distribution evolving as a travelling wave with compact support.

3. Endogenizing the length of productivity distributions

In the previous section, only the firms at the lowest level $j = 1$ would leapfrog or imitate at each time step. We now allow firms at any level $j \le m$ to choose to leapfrog or imitate by paying a cost, if the firm estimates that it is profitable to do so. We first consider the simpler case of leapfrogging only (no imitation) in section 3.1, and we consider the full model with leapfrogging and imitation in section 3.2. In the first case (leapfrogging only, section 3.1), we will show that firms will only choose to leapfrog at or below a certain threshold level $j^*$ which is independent of time.⁶ Then with $j^*$ determined, the ladder length is fixed and given by $m - j^* + 1$, and the results of section 2.1 with $q_m = 1$ apply. In the second case (leapfrogging and imitation, section 3.2) we will also show that firms optimally choose to incur a cost for the opportunity to imitate or leapfrog at or below a certain threshold level $j^*(t)$, but this level might depend on time. In this case, characterizing the transition dynamics of the productivity distribution is not straightforward, but if we assume the system reaches a stationary distribution, then $j^*(t)$ converges to some $j^*(\infty)$ and the results of Proposition 2 in section 2.2 can be applied with the distribution of firms supported on an interval of size $m - j^*(\infty) + 1$.

In either case, the finite size of the support is now endogenized and depends on the cost of imitation, the payoffs at each quality level, etc.

3.1. Leapfrogging Only.

We assume that a firm still faces an exogenous probability of innovation $a$ as in sections 2.1 and 2.2 but, when the firm fails to innovate, it is allowed to choose whether to leapfrog (and pay some cost) or not. The firm’s optimal choice problem is to maximize its value function, i.e. the expected discounted value of current and future payoff streams net of costs.⁷ In every period, the firm chooses whether to pay a cost to leapfrog and benefit from higher payoffs now and in the future, or not to do so.

When evaluating whether it is advantageous to take some action, a firm usually needs to anticipate the future distribution in order to form expectations about imitation probabilities and outcomes. In the case of leapfrogging only which we consider now, we will see that it is actually enough to know the position of the frontier $m = m(t)$, which remains constant (after relabeling) and equal to its initial position $m(0)$. (Without relabeling, we would have $m(t) = m(0) + t$ due to innovation.) Therefore, in the case of leapfrogging only, the outcome, when the firm decides whether to leapfrog or not, depends only on the initial value of $m$; it does not depend on the distribution of firms on the quality ladder, nor on time. (Note that this will no longer be true when we add imitation in section 3.2: with imitation, it is necessary to anticipate the future distribution of firms to make an optimal choice.)

As is usual in economics, this optimal choice problem can be reformulated in a recursive way using a Bellman equation, which we write down below. We will show here that the firm’s optimal choice is to leapfrog if it lies at a fixed length below the frontier. This fixed length becomes the new support size, thus providing a microfoundation for the previously exogenous support size $m$.

⁶ As in the previous sections, it is understood that after each time step the levels are relabeled (so that level $i$ at time $t$ becomes level $i-1$ at time $t+1$). Then, the highest occupied productivity level is always $m$ at the beginning of each time step.

⁷ Under the assumption of linear utility, the benefit of payoffs to the firm are the payoffs themselves. “Expected” refers to the fact that the firm might have to anticipate the future firms density in order to project imitation probabilities and thus payoffs. “Discounting” with a constant intertemporal discounting factor $\beta$ as usual reflects the fact that the firm values the future less than the present. For the reader unfamiliar with dynamic optimal choice problems, we refer for example to Lucas and Stokey (1989) or Ljungqvist and Sargent (2018).

Every time step, at every level, a firm innovates and moves up one ladder step with probability $a$. The firms that do not innovate have the choice either to fall behind, or to catch up with the highest productivity level $m$ (after relabeling, or $m+1$ before) by paying a cost. We assume that it is not possible to “imitate” intermediate levels.

We assume that the payoffs realized by a given firm increase by some factor $\lambda > 1$ per ladder step, which gives normalized⁸ payoffs $p_j = \lambda^j$ for a firm at level $j \in \{1, \ldots, m\}$. If, as the firm distribution moves up the quality ladder, costs to implement leapfrogging grow at the same rate $\lambda$ as the payoffs, the firm problem can be reduced to a stationary problem where the normalized payoffs $p_j = \lambda^j$ and the normalized cost $C$ are independent of time.

Firms, in deciding whether to leapfrog or not, compare the costs to the expected payoffs. As normalized payoffs increase over the ladder, while normalized costs do not, firms choose to leapfrog if their distance to the frontier (the level $m$ of the highest performing firm) is larger than some threshold, and choose not to leapfrog if their distance to the frontier is smaller than that threshold.

In other words, there must be a certain threshold level $j^*$ such that a firm chooses not to leapfrog for productivity levels $j = j^* + 1, \ldots, m$, but does leapfrog at levels $j \le j^*$. We now provide a formal argument.

Let $V^{LF}(j)$ be the value of leapfrogging from some level $j$ and $V^{NLF}(j)$ the value of not leapfrogging from this level. Then, the value of being at productivity level $j$ is

$$ V(j) = \max \left\{ V^{LF}(j), V^{NLF}(j) \right\}. \tag{17} $$

The following equation represents the leapfrogging choice. We have

$$ V^{LF}(j) = p_j + \beta a V(j) + (1-a) \left[ \beta V(m) - C \right], \tag{18} $$

where $\beta = \lambda \tilde{\beta}$, with $\tilde{\beta} < \beta < 1$. This is the Bellman equation for the leapfrogging value function, which determines the optimal choice recursively. The first term on the right-hand side is the payoff received this period. Then, with probability $a$, the firm innovates and moves up one step from $j$ to $j+1$, which after relabeling becomes $j$, and this continuation value is discounted with $\beta$. With probability $(1-a)$, the firm does not innovate but decides to leapfrog; the firm moves above the frontier, to level $m+1$ (which after relabeling becomes level $m$), and pays the cost $C$.

⁸ Remember that levels are relabeled at each time step, so level 1 (for instance) at different times corresponds to different quality levels with different payoffs. At a given time step, the real payoffs of the different firms can be obtained by multiplying the normalized payoffs $p_j$ by $\lambda^t$.

Similarly,

$$ V^{NLF}(j) = p_j + \beta a V(j) + \beta (1-a) V(j-1). \tag{19} $$

Notice that neither (18) nor (19) depends on the densities $f$ or on time; the value $V(j)$ of being at some level $j$ remains constant in time.

The firm wants to leapfrog from some level $j$ if leapfrogging is beneficial, i.e.

$$ V^{LF}(j) > V^{NLF}(j), \tag{i} $$

and does not want to leapfrog if

$$ V^{LF}(j) < V^{NLF}(j). \tag{ii} $$

Observe from (18) and (19) that

$$ V^{LF}(j) - V^{NLF}(j) = (1-a) \left[ \beta V(m) - C - \beta V(j-1) \right]. \tag{20} $$

We assume from here on that the value function $V(j)$ increases with the productivity level $j$. This property is not an obvious consequence of Bellman’s equation, but it seems clear that any mathematical solution where $V(j)$ is not increasing cannot reasonably describe a real-life situation, because it would mean that some firms should degrade the quality of their production in order to increase their value. Then, from (20) and using that $V(j)$ increases in $j$, the quantity $V^{LF}(j) - V^{NLF}(j)$ decreases with $j$. Hence, if leapfrogging is beneficial at $j$, it is even more so at $j-1$, $j-2$, etc. Similarly, if leapfrogging is not beneficial at $j$, it will be even less so at $j+1$, $j+2$, etc. In other words, there must be a threshold level $j^*$ such that

a site $j$ leapfrogs if and only if $j \le j^*$.

Assume we let this system evolve from an initial condition where the highest occupied site is $m$. At the end of the first time step, site $m+1$ is occupied (through innovation and leapfrogging) and all sites up to and including $j^*$ are emptied through leapfrogging. At the start of the second time step, after relabeling, the system occupies a subset of sites $\{j^*, \ldots, m\}$. Then, at each following time step, only site $j^*$ gets emptied through leapfrogging and the system remains in $\{j^*, \ldots, m\}$ after relabeling. In the large time limit, the system reaches its stationary state, which is a uniform distribution over $\{j^*, \ldots, m\}$.

The behavior we have just described is very similar to the behavior of the system in section 2.1 with $q_m = 1$, except that the lowest occupied site is now $j^*$ instead of 1 in section 2.1. In other words, the size of the support is now $m - j^* + 1$ instead of $m$. This size of support depends on the parameters of the model: $a$, $\lambda$, $C$ and $\beta$. (Using invariance by translation, it is easy to see that $m - j^* + 1$ does not depend on $m$.) This means that the size of the support results from an endogenized optimum between costs and expected payoffs. By adjusting the values of the different parameters, any size of support can be obtained.

Through invariance by translation, one can shift the whole system on the value scale so that the support is on $\{1, \ldots, m' = m - j^* + 1\}$. Then, the model is even more similar to section 2.1 with $q_m = 1$, with the lowest occupied level at $j = 1$ and with the endogenized $m'$ being both the highest occupied site and the size of the support.

3.2. Leapfrogging and imitation.

We now introduce density-dependent imitation as in section 2.2. A firm innovates at no cost with probability $a$ and, if it does not innovate, it can choose to pay a fixed cost $C$ to randomly leapfrog or imitate with density-dependent imitation probabilities, or it can forgo this opportunity. We are already assuming mean-field dynamics, which are valid in the limit of a large number of firms. We can therefore safely assume that the choice taken by a single firm does not impact the distribution.

Recall the following assumptions made in section 2.2: at time $t$, when a firm chooses to innovate or leapfrog, it jumps with probability $q_j$ (with $j \in \{1, \dots, m\}$) onto site $j+1$ which, after relabeling, becomes site $j$ at time $t+1$. Then, $q_m$ is the (exogenously given) probability of leapfrogging (since site $m+1$ is empty at time $t$) and $q_j$ for $j \le m-1$ is the probability of imitating. For $j \le m-1$, the probability of jumping onto site $j+1$ (site $j$ after relabeling) is proportional to the proportion of firms $f_t^{j+1}$ on that site at the beginning of the time step, and we write $q_j = q_j(f) = (1 - q_m) f_t^{j+1}$ for $j \le m-1$. The value $1 - q_m$ of the prefactor is chosen in such a way that the probabilities are normalized: $\sum_{j \le m} q_j(f) = 1$.

The value $V(j; f)$ of being at a site $j$ now depends on the density $\{f_t^k\}$ of firms at all sites for the current time. As in section 3.1, we write $V_{NLF}(j; f)$ for the value of being at $j$ and choosing not to imitate/leapfrog given the current densities $f = \{f^k\}$, and $V_{LF}(j; f)$ for the value of being at site $j$ and choosing to imitate/leapfrog.

The Bellman equations become

(21) $V(j; f_t) = \max\big( V_{NLF}(j; f_t),\, V_{LF}(j; f_t) \big)$,

(22) $V_{NLF}(j; f_t) = p_j + \beta a V(j; f_{t+1}) + \beta (1-a) V(j-1; f_{t+1})$,

(23) $V_{LF}(j; f_t) = p_j + \beta a V(j; f_{t+1}) + (1-a) \Big( \beta \sum_{k=j}^{m} q_k(f_t)\, V(k; f_{t+1}) + \beta \sum_{k}$ [...]

[...] 1, the fluctuations are negligible compared to the average behavior and stochasticity can be ignored. On the other hand, when $a\,n_t^j$ is of order 1, the number of innovating firms is essentially random.

The models we consider in this section are stochastic versions of the model described in section 2.2. We still assume that firms live on a discrete quality ladder and that time is discrete. At the beginning of any time step, an active firm is characterized by its productivity level. Then, during one time step, for each firm, two things can happen (independently).

• The firm can innovate with probability $a$, thus gaining one productivity level.
• The firm can be imitated with probability $\mu$ by a new entrant.

The four outcomes for a single firm are graphically represented in figure 2.

[Figure 2: diagram with axes "productivity level" and "time" showing the four outcomes:
no innovation, not imitated: probability $(1-a)(1-\mu)$;
no innovation, imitated: probability $(1-a)\mu$;
innovation, not imitated: probability $a(1-\mu)$;
innovation and imitated: probability $a\mu$.]

Figure 2. The four outcomes after a time step for a single firm.

Note that in this section, and unlike in section 2, we do not rename the productivity levels after each time step and we assume that $\mu$ is a parameter given exogenously.
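The four outcomes listed for figure 2 can be enumerated in a few lines. The following is only a minimal sketch: the function name `single_firm_outcomes` and the parameter values are hypothetical, and it assumes that an imitator enters at the level the imitated firm had at the start of the time step. It checks that the four probabilities sum to one, that a firm is replaced on average by $1+\mu$ firms, and that its own expected productivity gain is $a$.

```python
from itertools import product

def single_firm_outcomes(a, mu):
    """Enumerate the four one-step outcomes for a single firm.

    Returns a list of (probability, offsets) pairs, where `offsets` holds the
    productivity levels, relative to the firm's starting level, of the firms
    present after the step. Assumption of this sketch: an imitator enters at
    offset 0, i.e. at the level the imitated firm had before innovating.
    """
    outcomes = []
    for innovates, imitated in product((False, True), repeat=2):
        prob = (a if innovates else 1 - a) * (mu if imitated else 1 - mu)
        offsets = [1 if innovates else 0]   # the firm itself
        if imitated:
            offsets.append(0)               # the new entrant
        outcomes.append((prob, offsets))
    return outcomes

# Illustrative (hypothetical) parameter values.
outcomes = single_firm_outcomes(a=0.1, mu=0.2)
total_prob = sum(p for p, _ in outcomes)                 # should equal 1
mean_firms = sum(p * len(offs) for p, offs in outcomes)  # should equal 1 + mu
mean_gain = sum(p * offs[0] for p, offs in outcomes)     # should equal a
```

Because innovation and imitation are independent, each outcome probability is simply a product of the two marginal probabilities, which is what the loop over `product` encodes.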

The evolution of the whole system during one time step then comes in two phases:

(25)
(a) each firm present at time $t$ evolves independently according to the probabilities in figure 2,
(b) a culling of the firms in the system occurs by removing some firms at the bottom of the productivity scale.

Note that in the evolution phase, the imitating firms can either be some firms at the lowest productivity level who successfully imitate those above them, or new entrants displacing firms at the lowest level of productivity. There is no leapfrogging in this model.

We consider two variants of the model, depending on the way the culling occurs. A first variant is to fix the number of firms at each time step to an exogenous parameter $N$. Then, the number of removed firms during the culling phase must be equal to the number of imitated firms in the evolution phase to keep the total number of firms constant. This model is called an $N$-BRW ($N$ Branching Random Walk) and is discussed in section 4.1.

Another possibility for the culling phase is to remove all firms lagging $L$ productivity steps or more behind the most productive firm, with $L$ given exogenously. In this variant, the total number of firms fluctuates with time. This model is called an $L$-BRW ($L$ Branching Random Walk) and is discussed in section 4.2.

In the following sections we characterize the shape and properties of the productivity distributions in the $N$-BRW and $L$-BRW models of innovation and imitation. The discussion is adapted from works that have been conducted on KPP fronts since the late nineties in the context of statistical mechanics, reaction-diffusion models and population genetics. A good point of entry to this literature is Brunet (2016).

4.1. The $N$-BRW model.

Before introducing the $N$-BRW, we first need to discuss what a BRW is. A Branching Random Walk is a process in discrete time started from a single particle at the origin.
At each time step, each particle (each "parent") is replaced by a random number of particles (the "children") positioned relative to the parent according to some point process. This rule is applied independently at each generation for each particle.

For instance, following figure 2, the rule could be that a particle at $y$ gives either one particle at $y$, or two particles at $y$, or one particle at $y+1$, or two particles respectively at $y$ and $y+1$. The left part of figure 3 shows a BRW with a different rule where each parent can have 1, 2 or 3 children.

Note that the number of particles $N_t$ at each generation satisfies the recursion $N_{t+1} = \sum_{i=1}^{N_t} n_{i,t}$, where $n_{i,t}$ is the number of children of individual $i$ at time $t$ and where it is assumed that the $n_{i,t}$ are independent identically distributed random variables over the integers. This is called a Galton-Watson process. In other words, a BRW is a Galton-Watson process where we keep as extra information the position of the particles. For simplicity, we exclude the possibility that a particle has zero children and we insist that it has more than one child with positive probability. Then, the population size increases exponentially with time.

[Figure 3: two panels, each with axes "position" and "time".]

Figure 3. Left: an example of BRW where, at each generation, a particle can have 1, 2 or 3 offspring. Right: an $N$-BRW with $N = 2$ obtained by keeping at each time step only the two highest children of the surviving particles of the previous time step. Notice that this rule is not the same as keeping the two highest particles of the BRW at each time step.

Denote by $(\epsilon_1, \epsilon_2, \dots, \epsilon_n)$ the positions of the children relative to the parent (both $n$ and the $\epsilon_i$ are random). Then, under conditions on the laws of $n$ and $\epsilon_i$, listed for example in Gantert, Hu and Shi (2011) (see also there for references), one can show that the highest position $y_{\max}(t)$ in the BRW at time $t$ increases linearly with time:

(26) $\lim_{t \to \infty} \frac{y_{\max}(t)}{t} = v_c$,

with some velocity $v_c$ given by

(27) $v_c = \min_{\gamma} v(\gamma) = v(\gamma_c)$ with $v(\gamma) = \frac{1}{\gamma} \log E\Big[ \sum_{i=1}^{n} e^{\gamma \epsilon_i} \Big]$,

as soon as this minimum exists for some $\gamma_c > 0$. The expectation is taken over the displacements $\epsilon_i$ and the number $n$ of children, and $\gamma_c$ is the value of $\gamma$ for which the minimum is reached.

Footnote: The conditions are:
a) $E[n] > 1$ (recall that we excluded that $n$ can be 0, and insisted that $n > 1$ with positive probability);
b) there exists $\delta > 0$ such that $E[n^{1+\delta}] < \infty$ (in other words, there are never too many children; this is automatic if the number of children is bounded);
c) there exists $\delta > 0$ such that $E\big[\sum_{i=1}^{n} e^{\delta \epsilon_i}\big] < \infty$ (in other words, the children are not created too far upwards relative to the parent; this is automatic if the number of children $n$ and the displacements $\epsilon_i$ are bounded, as in our case);
d) there exists $\delta > 0$ such that $E\big[\sum_{i=1}^{n} e^{-\delta \epsilon_i}\big] < \infty$ (in other words, the children are not created too far downwards relative to the parent; this is automatic if the number $n$ of children and the displacements $\epsilon_i$ are bounded);
e) the function $v(\gamma) = \frac{1}{\gamma} \log E\big[\sum_{i=1}^{n} e^{\gamma \epsilon_i}\big]$, which is necessarily well defined on some interval $(0, \delta)$ with $\delta \in (0, \infty]$, must reach a minimum $v_c = v(\gamma_c)$ on that interval. This is automatic in the example developed below for any $\mu \in (0, 1]$ and $a \in (0, 1)$.

For instance, with the rules of figure 2, one checks that

$v(\gamma) = \frac{1}{\gamma} \log\big[ 1 + \mu + a (e^{\gamma} - 1) \big]$.

Indeed,

$E\Big[ \sum_{i=1}^{n} e^{\gamma \epsilon_i} \Big] = (1-a)(1-\mu) \times e^{0} + (1-a)\mu \times (e^{0} + e^{0}) + a(1-\mu) \times e^{\gamma} + a\mu \times (e^{0} + e^{\gamma}) = 1 + \mu + a(e^{\gamma} - 1)$.

We can now define the $N$-BRW. The evolution for one time step of an $N$-BRW goes like a BRW, except that after each step only the $N$ highest particles are kept, the others being removed, so that after some time there are exactly $N$ particles in the system at each time step. Note that this rule is not the same as keeping the $N$ highest of a BRW at each time step; see the right part of figure 3.

The $N$-BRW and related models (the $N$-BBM, the stochastic Fisher equation) have been studied in mathematics, theoretical physics and biology, and several results are known both from non-rigorous and rigorous arguments.

For the $N$-BRW, a striking result is that one can still define a velocity $v_N$ for the highest particle, as in (26). This velocity depends on $N$ and converges to $v_c$ as $N \to \infty$, but the speed of convergence is unexpectedly slow (this is explained in Brunet and Derrida (1997), with a rigorous proof provided by Bérard and Gouéré (2010) for the case $\mu = 1$).

Theorem 1 (Velocity. Bérard and Gouéré (2010)). For the $N$-BRW with $\mu = 1$, we have:

(28) $v_N = v_c - \frac{\pi^2 v''(\gamma_c)}{2 L^2} + o\Big(\frac{1}{L^2}\Big)$

with

(29) $L = \frac{1}{\gamma_c} \log N$,

$v_c$, $v(\gamma)$ and $\gamma_c$ defined as in (27), and $o(1/L^2)$ a term that vanishes faster than $1/L^2$ as $N \to \infty$.

(Note: even though a proof is available only in one case, heuristic arguments and numerical simulations suggest that (28) holds in a large number of cases.)
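To make the velocity formulas concrete, here is a small numerical sketch for the figure 2 rule, for which $v(\gamma) = \frac{1}{\gamma}\log[1 + \mu + a(e^{\gamma}-1)]$. It locates $\gamma_c$ and $v_c$ and then evaluates the leading terms of the finite-$N$ correction, read here as $v_N \approx v_c - \pi^2 v''(\gamma_c)/(2L^2)$ with $L = (\log N)/\gamma_c$. The function names, the search interval and the parameter values are assumptions of this illustration, not part of the original text; the golden-section search relies on $v$ having a single minimum on the search interval.

```python
import math

def v(gamma, a, mu):
    """v(gamma) = (1/gamma) * log(1 + mu + a*(exp(gamma) - 1)) for the figure 2 rule."""
    return math.log(1.0 + mu + a * (math.exp(gamma) - 1.0)) / gamma

def critical_point(a, mu, lo=1e-6, hi=30.0, tol=1e-10):
    """Golden-section search for the minimizer gamma_c of v on [lo, hi].

    Assumption: v is unimodal on [lo, hi] and its minimum lies inside it.
    Returns (gamma_c, v_c).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if v(x1, a, mu) < v(x2, a, mu):
            hi, x2 = x2, x1                 # minimum lies in [lo, old x2]
            x1 = hi - inv_phi * (hi - lo)
        else:
            lo, x1 = x1, x2                 # minimum lies in [old x1, hi]
            x2 = lo + inv_phi * (hi - lo)
    gamma_c = 0.5 * (lo + hi)
    return gamma_c, v(gamma_c, a, mu)

def velocity_prediction(N, a, mu):
    """Leading-order finite-N velocity: v_c - pi^2 v''(gamma_c) / (2 L^2)."""
    gamma_c, v_c = critical_point(a, mu)
    # With G(gamma) = gamma * v(gamma), one has G'' = 2 v' + gamma v'',
    # so v''(gamma_c) = G''(gamma_c) / gamma_c because v'(gamma_c) = 0.
    p = a * math.exp(gamma_c) / (1.0 + mu + a * (math.exp(gamma_c) - 1.0))
    v_second = p * (1.0 - p) / gamma_c
    L = math.log(N) / gamma_c
    return v_c - math.pi ** 2 * v_second / (2.0 * L ** 2)

# Illustrative (hypothetical) parameters; mu = 1 is the case covered by Theorem 1.
gamma_c, v_c = critical_point(a=0.1, mu=1.0)
for N in (10 ** 3, 10 ** 6, 10 ** 12):
    print(N, round(velocity_prediction(N, a=0.1, mu=1.0), 4))
```

The printed values approach $v_c$ only slowly as $N$ grows, which illustrates the logarithmic convergence discussed above. Since the $o(1/L^2)$ remainder is dropped, the numbers are indicative rather than exact.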
Size of support.

Based on numerical observations and phenomenological theory for closely related models, it is believed (see Brunet and Derrida (1997) and Brunet, Derrida, Mueller and Munier (2006)) that after a long time, the system reaches a stationary regime as seen from the center of mass of the system. Here, stationary is to be interpreted in a probabilistic sense: while for finite $N$ there are still fluctuations, the laws determining the system become stationary. In this stationary regime, the size of the support, which is the difference between the position $y_{\max}$ of the highest particle and the position $y_{\min}$ of the lowest particle, satisfies

(30) $y_{\max} - y_{\min} = L + O(1)$,

with $L$ as in (29) and $O(1)$ designating a random variable whose law becomes independent of $N$ in the large $N$ limit. (Therefore, it becomes smaller and smaller compared to $L$ when $N \to \infty$.)

By construction, a finite number of firms ensures a productivity distribution that has a finite support at any fixed time, but what (30) means is that the firms have comparable productivity levels at all times, and the scenario where some firms stay put while others diverge to infinity due to innovations cannot occur. However, because the process is stochastic, there is a probability that a firm with an extended streak of successful innovations breaks out for a while, so that the support of the productivity distribution may occasionally get large; after some time, laggard firms will catch up via imitation and close the gap.

Shape of the front.

Another interesting result concerns the typical density of the cloud of particles in the stationary regime. To simplify the discussion, we assume that the underlying BRW is the one described in figure 2. Then, the population lives on the lattice, and we introduce $f(y, t)$, the fraction of particles (or firms) at position (or quality level) $y$ at time $t$.

After the reproduction phase (but before the culling phase, see (25)), the expected fraction of firms at position $y$ and time $t+1$ is $(1 - a + \mu) f(y, t) + a f(y-1, t)$. Then, one could write the evolution equation as

(31) $f(y, t+1) = (1 - a + \mu) f(y, t) + a f(y-1, t) + (\text{noise})$ if $y > y_{\min}(t+1)$,

where $y_{\min}(t)$ is the position of the lowest firm at time $t$ (the values of $y_{\min}(t+1)$ and of $f\big(y_{\min}(t+1), t+1\big)$ are obtained by writing $\sum_y f(y, t+1) = 1$). The noise term is some random number with zero expectation and standard deviation of order $O(\sqrt{f/N})$, depending on the density.

Footnote: The value for $N[f(y, t+1) - f(y, t)]$ before the culling phase is (the number of new firms innovating from $y-1$) minus (the number of firms innovating to $y+1$) plus (the number of imitators). These three terms are independent Binomial random variables and so one finds that the exact expression for the standard deviation of the noise term in (31) is $\sqrt{[f(y-1, t)\, a(1-a) + f(y, t)\, a(1-a) + f(y, t)\, \mu(1-\mu)]/N}$.

As $N \to \infty$, in the so-called hydrodynamic limit, the noise term in (31) is expected to disappear. While there is no rigorous result concerning this hydrodynamic limit for the $N$-BRW, such a result exists for two closely related models, see Durrett and Remenik (2011) and De Masi, Ferrari, Presutti and Soprano-Loto (2019). In the first model, time is continuous, and at rate 1 each particle creates an additional particle at a random distance $\epsilon$; when this occurs, the lowest particle is removed to keep the population constant. The second model is the $N$-BBM, which can be described as follows: time and space are continuous, and $N$ particles perform independent Brownian motions. At rate 1, each particle creates an offspring at its own position ($\epsilon = 0$); when this occurs, the lowest particle is removed to keep the population constant.

The equation obtained in this large $N$ limit, as given by (31) without the noise term, is reminiscent of the model described in section 2.2. The only remaining difference is that in section 2.2, the imitation rate was tuned at each time step in such a way that $y_{\min}(t)$ would increase by exactly one unit at each time step. In (31) (with or without the noise term), the imitation rate $\mu$ is fixed exogenously and, depending on its value, the lower bound $y_{\min}(t)$ can increase by several units in a time step or take several time steps to increase by one unit.

The evolution equation is perhaps easier to write in terms of $h(y, t) = \sum_{z \ge y} f(z, t)$, which represents the fraction of firms with a quality level at least $y$. One checks that

(32) $h(y, t+1) = \min\big[ 1,\ (1 - a + \mu) h(y, t) + a h(y-1, t) + (\text{noise}) \big]$

where, here again, the noise term disappears in the large $N$ limit. Without the noise term, (32) is the discrete-time version of the equation studied in [8], which was shown to display most of the characteristics of the Fisher-KPP equation. With the noise term, it is very similar to the equations studied in Brunet and Derrida (1997), Brunet and Derrida (2001) and Brunet, Derrida, Mueller and Munier (2006), as well as in Mueller, Mytnik and Quastel (2011), which deals with continuous time and space.

As suggested by Brunet and Derrida (1997), the velocity (28) of the noisy front (and thus of the $N$-BRW) could be obtained to the $1/L^2$ order by replacing the noise term in (32) by a cutoff of order $1/N$, meaning that after each time step the value of $h$ is set to 0 at all the positions $y$ where the evolution equation leads to a result smaller than $1/N$. Furthermore, for large $N$ (and hence large $L = (\log N)/\gamma_c$), large $z$ and large $L - z$ (so that $z$ is not too close to 0 or to $L$), the shape of the front at large times is approximately given by

(33) $h(y_{\min}(t) + z, t) \approx A L \sin\Big(\frac{\pi z}{L}\Big) e^{-\gamma_c z}$.

Notice then that, to leading order, the density $f(y, t) = h(y, t) - h(y+1, t)$ is given by the same equation with the prefactor $A$ replaced by $A(1 - e^{-\gamma_c})$.

The shape of the front (33) is for the front equation (31) with the noise replaced by a cutoff. For the $N$-BRW model itself, as described by equation (31) with its noise term, Brunet, Derrida, Mueller and Munier (2006) give the following non-rigorous phenomenological description of the model. This description is supported by numerical simulation and, to some extent, by rigorous work (Maillard (2016)).

In the $N$-BRW, the shape of the front is most of the time given by the cutoff shape (33) plus some small fluctuations.
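The $N$-BRW dynamics discussed in this section are easy to simulate directly. The sketch below is a minimal, unoptimized illustration: the function names, the seed and the parameter values are hypothetical, and it assumes the figure 2 reproduction rule in which an imitator enters at the imitated firm's starting level. It records the velocity of the highest particle and the support size $y_{\max} - y_{\min}$ over time, which can be compared informally with $v_c$ and with $(\log N)/\gamma_c$.

```python
import random

def nbrw_step(positions, N, a, mu, rng):
    """One time step of the N-BRW: reproduction phase, then culling to the N highest."""
    children = []
    for y in positions:
        # The firm itself innovates (moves up one level) with probability a.
        children.append(y + 1 if rng.random() < a else y)
        # Independently, it is imitated with probability mu; the imitator
        # enters at the firm's level at the start of the step (assumption).
        if rng.random() < mu:
            children.append(y)
    children.sort(reverse=True)
    return children[:N]

def simulate_nbrw(N=100, a=0.1, mu=0.1, steps=2000, seed=0):
    """Run the N-BRW from N firms at level 0.

    Returns (final sorted positions, velocity of the top particle, support sizes).
    """
    rng = random.Random(seed)
    positions = [0] * N
    spans = []
    for _ in range(steps):
        positions = nbrw_step(positions, N, a, mu, rng)
        spans.append(positions[0] - positions[-1])   # y_max - y_min
    velocity = positions[0] / steps
    return positions, velocity, spans

positions, velocity, spans = simulate_nbrw()
late_spans = spans[len(spans) // 2:]
print("velocity of the highest firm:", velocity)
print("mean support size (second half):", sum(late_spans) / len(late_spans))
```

Because every firm keeps at least one descendant at or above its own level, culling to the $N$ highest never lowers $y_{\min}$, and the population size stays exactly $N$ from the first step on. For such small $N$ the asymptotic formulas are only a rough guide, in line with the word of caution given later in this section.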
Occasionally, typically every $\propto L^3$ units of time, a huge fluctuation occurs where the shape of the front is significantly different from (33) for about $\propto L^2$ units of time. Such a huge fluctuation arises in the following way: a single particle moves up further than typical (a single firm innovates a lot in a short time). That particle branches as usual (the firm is imitated), but its "imitation offspring", i.e. its imitators, the imitators of its imitators, etc., are at first rarely removed from the system because they typically lie above the others (they have better quality than the other firms). The end effect of such a fluctuation is that a positive fraction of all the firms are replaced by the imitation offspring of the highly successful firm that started the fluctuation.

So, to reformulate, based on numerical computations for similar models, in the stationary regime, a density of firms like (33) is expected, while occasionally (every $\propto L^3$ units of time), a single firm innovates a lot and gets imitated by so many firms that it redefines the industry (in the sense that the innovation is shared by a positive fraction of the agents). The transition time to redefine the industry is of order $\propto L^2$. This is particularly interesting: at random times, a firm is so successful that a positive fraction of the industry ends up imitating it (or its imitators).

A word of caution: the results presented above are asymptotic results, which are believed to be valid for large values of $N$. It is not obvious that $N = 10^{[\ldots]}$ or $N = 10^{[\ldots]}$ is big enough for these results to be very accurate.

4.2. The $L$-BRW model.

A variant of the $N$-BRW is the $L$-BRW, which might be better suited to describe a situation where lagging firms go out of business and new firms enter the market.
The evolution phase of the $L$-BRW (innovation and imitation) is the same as for the BRW or the $N$-BRW (particles innovate and are imitated), but the culling phase is different: in the $L$-BRW, at each time step, firms with a productivity lagging more than $L$ levels below the leading firm are removed from the system, as it is assumed that they are not productive enough to survive the market. Here, the parameter $L$ is given exogenously.

In the $L$-BRW, the number $N$ of firms fluctuates. However, for large $L$, one observes that the number $N$ of firms fluctuates around some average value given by

(34) $N = e^{\gamma_c L}$,

which is formally the same relation as (29).

The heuristic argument of Brunet, Derrida, Mueller and Munier (2006) is that an $N$-BRW and an $L$-BRW have very similar typical behaviors: in the $N$-BRW, $N$ is a given parameter and $L$ (defined as the observed support size, or distance between the best and worst firm) fluctuates, while in the $L$-BRW, it is the support size $L$ which is given, and the population size $N$ fluctuates. In either case, one has the relation

$L \approx \frac{1}{\gamma_c} \log N$.

Then, one expects that the velocity $v_L$ of the $L$-BRW is given by

(35) $v_L \approx v_c - \frac{\pi^2 v''(\gamma_c)}{2 L^2}$

(compare to (28)), that the average shape of the front is given by the sine shape (33) of the cutoff theory, etc.

There is unfortunately no rigorous paper establishing these results for the $L$-BRW. However, Pain (2016) has established result (35) in the case of the $L$-BBM (where BBM stands for Branching Brownian Motion), which is a continuous version of the $L$-BRW. More precisely, in the $L$-BBM, particles perform Brownian motions. With rate 1, a particle is replaced by two particles, and any particle at a distance more than $L$ from the highest particle is removed. The fact that (35) holds for the $L$-BBM and the close similarity between the $L$-BBM and the $L$-BRW are a strong indication that the heuristic arguments are correct.

Footnote: An interesting question, which we postpone to another paper, is whether the sin prefactor can be observed in real data.

5. Conclusion

We model the dynamics of technology diffusion to characterize the shapes of stationary firm productivity distributions with a skew, and explore conditions that lead to compact productivity supports. Innovation and imitation activities move the productivity distribution forward, and compact supports can be sustained as competition causes the low productivity firms to exit. Section 2 provides a model generating skewed productivity distributions with compact support. It relies on an endogenously determined finite productivity ladder, sustained by a fraction of firms that can leapfrog to the frontier. Section 4 introduces models with either a finite number $N$ of firms, or a finite productivity support $L$. In both cases the support of the productivity distribution remains compact; in the former case the length of the support is stochastic while the number of firms is constant, and in the latter the support length is fixed while the number of firms fluctuates.

References

[1] P. Aghion and P. Howitt, A model of growth through creative destruction, Econometrica (1992), 323-351.
[2] U. Akcigit and W. R. Kerr, Growth through heterogeneous innovations, Journal of Political Economy (2016), 1374-1443.
[3] J. Benhabib, J. Perla and C. Tonetti, Catch-up and fall-back through innovation and imitation, Journal of Economic Growth (2014), 1-35.
[4] J. Benhabib, J. Perla and C. Tonetti, Reconciling models of diffusion and innovation: A theory of the productivity distribution and technology frontier, NBER WP.
[5] J. Bérard and J.-B. Gouéré, Brunet-Derrida behavior of branching-selection particle systems on the line, Commun. Math. Phys. (2010), 323-342.
[6] C. P. Bonini and H. Simon, The size distribution of business firms, American Economic Review (1958), 607-617.
[7] M. Bramson, Convergence of solutions of the Kolmogorov equation to travelling waves, American Mathematical Society, Providence, RI, 1983.
[8] E. Brunet and B. Derrida, An exactly solvable travelling wave equation in the Fisher KPP class, Journal of Statistical Physics (2015), 801-820.
[9] E. Brunet and B. Derrida, Shift in the velocity of a front due to a cutoff, Physical Review E (1997), 2597-2604.
[10] E. Brunet and B. Derrida, Effect of microscopic noise on front propagation, Journal of Statistical Physics (2001), 269-282.
[11] E. Brunet, B. Derrida, A. H. Mueller and S. Munier, Phenomenological theory giving the full statistics of the position of fluctuating pulled fronts, Phys. Rev. E (2006), 056126.
[12] E. Brunet, B. Derrida, A. H. Mueller and S. Munier, Noisy traveling waves: Effect of selection on genealogies, Europhys. Lett. (2006), 1-7.
[13] E. Brunet, Some aspects of the Fisher-KPP equation and the branching Brownian motion, Habilitation à diriger des recherches (2016).
[14] F. J. Buera and R. E. Lucas, Idea flows and economic growth, Annual Review of Economics 10 (2018), 315-345.
[15] A. De Masi, P. A. Ferrari, E. Presutti and N. Soprano-Loto, Non local branching Brownians with annihilation and free boundary problems, Electron. J. Probab. (2019), 1-30.
[16] R. Durrett and D. Remenik, Brunet-Derrida particle systems, free boundary problems and Wiener-Hopf equations, Ann. Probab. (2011), 2043-2078.
[17] N. Gantert, Y. Hu and Z. Shi, Asymptotics for the survival probability in a killed branching random walk, Annales de l'I.H.P. Probabilités et Statistiques (2011), 111-129.
[18] H. A. Hopenhayn, Entry, exit, and firm dynamics in long run equilibrium, Econometrica (1992), 1127-1150.
[19] C.-T. Hsieh and P. J. Klenow, Misallocation and manufacturing TFP in China and India, The Quarterly Journal of Economics (2009), 1403-1448.
[20] B. Jovanovic and R. Rob, The growth and diffusion of knowledge, Review of Economic Studies (1989), 569-582.
[21] T. J. Klette and S. Kortum, Innovating firms and aggregate innovation, Journal of Political Economy (2004), 986-1018.
[22] M. D. König, J. Lorenz and F. Zilibotti, Innovation vs. imitation and the evolution of productivity distributions, Theoretical Economics (2016), 1053-1102.
[23] A. Kolmogorov, I. Petrovsky and N. Piscounov, Étude de l'équation de la diffusion avec croissance de la quantité de matière et son application à un problème biologique, Bull. Univ. État Moscou, A (1937), 1-25.
[24] L. Ljungqvist and T. J. Sargent, Recursive macroeconomic theory, MIT Press, Cambridge, MA, 2018.
[25] R. E. Lucas, Ideas and growth, Economica (2009), 1-19.
[26] R. E. Lucas and B. Moll, Knowledge growth and the allocation of time, Journal of Political Economy (2014), 1-51.
[27] R. E. Lucas and N. L. Stokey, Recursive methods in economic dynamics, Harvard University Press, Cambridge, MA, 1989.
[28] E. G. J. Luttmer, Selection, growth, and the size distribution of firms, Quarterly Journal of Economics (2007), 1103-1144.
[29] E. G. J. Luttmer, Eventually, noise and imitation implies balanced growth, working paper 699, Federal Reserve Bank of Minneapolis (2012).
[30] P. Maillard, Speed and fluctuations of N-particle branching Brownian motion with spatial selection, Probability Theory and Related Fields (2016), 1061-1173.
[31] H. P. McKean, Applications of Brownian motion to the equation of Kolmogorov-Petrovski-Piscounov, Commun. Pure Appl. Math. (1975), 323-331.
[32] C. Müller, L. Mytnik and J. Quastel, Effect of noise on front propagation in reaction-diffusion equations of KPP type, Invent. Math. (2011), 405-453.
[33] M. Pain, Velocity of the L-branching Brownian motion, Electron. J. Probab. 21 (2016), 1-28.
[34] J. Perla and C. Tonetti, Equilibrium imitation and growth, Journal of Political Economy (2014), 52-76.
[35] P. Romer, Endogenous technical change, Journal of Political Economy, Part 2 (1990), S71-S102.
[36] M. Staley, Growth and the diffusion of ideas, Journal of Mathematical Economics (2011), 470-478.
[37] P. Segerstrom, Innovation, imitation and economic growth, Journal of Political Economy (1991), 807-827.
[38] C. Syverson, Market structure and productivity: A concrete example, Journal of Political Economy (2004), 1181-1222.