INNOVATION AND IMITATION
JESS BENHABIB, ERIC BRUNET, AND MILDRED HAGER
Abstract.
We study several models of growth driven by innovation and imitation by a continuum of firms, focusing on the interaction between the two. We first investigate a model on a technology ladder where innovation and imitation combine to generate a balanced growth path (BGP) with compact support, and with productivity distributions for firms that are truncated power-laws. We start with a simple model where firms can adopt technologies of other firms with higher productivities according to exogenous probabilities. We then study the case where the adoption probabilities depend on the probability distribution of productivities at each time. We finally consider models with a finite number of firms, which by construction have firm productivity distributions with bounded support. Stochastic imitation and innovation can make the distance of the productivity frontier to the lowest productivity level fluctuate, and this distance can occasionally become large. Alternatively, if we fix the length of the support of the productivity distribution because firms too far from the frontier cannot survive, the number of firms can fluctuate randomly.

1. Introduction
Economic growth is partly the result of costly research activities that firms undertake in order to innovate and to increase their productivity. Growth is also driven by technology diffusion and imitation that takes place between firms. The role of technology diffusion across countries is evidenced by the extraordinary sustained growth rates in China and other East Asian countries during recent decades. In this paper we investigate several models of growth driven both by innovation and imitation, focusing on the interaction between the two. New ideas and innovations push out the technology frontier. Imitation enables firms to catch up with those higher up on the technology ladder. We study the dynamics of the productivity distribution of firms, where productivity is increasing with the rates of innovation and imitation, and we provide a characterization of its stationary distribution in the long run. In our study we do not take into account the effect of the size of the firm on its growth [1].

Date: August 27, 2020. arXiv [econ.TH].

[1] One of the first precursor papers that explores the dynamics of firm-size distributions is Bonini and Simon (1958). They introduce random growth proportionate to firm size, coupled with entry of new firms of the smallest size at a constant rate. In the limit the productivity distribution converges to a Pareto distribution. Another classical investigation of the firm productivity and size distribution is Hopenhayn (1992).
As demonstrated by Lucas (2009), in models of technology diffusion based on imitation alone, growth can be sustained only if the initial distribution of productivities has unbounded support for high productivity levels. In Lucas and Moll (2014), and Perla and Tonetti (2014), technology diffusion is search theoretic: firms seek higher productivity firms to imitate and adopt their superior technology. In these models an unbounded productivity distribution is necessary to sustain growth through imitation in the long run. With an initial productivity distribution that has bounded support, imitation ultimately stops, as productivities of the imitating firms collapse toward the productivity frontier. Therefore, unboundedness is more than a convenient and inconsequential simplification.

In contrast, in models of endogenous growth, innovation is the primary driving force of growth. Firms engage in research to generate individual innovations. These innovations may later become a common stock of ideas that are available to the whole economy, generating spillovers (Romer (1990)). Alternatively, innovations are Schumpeterian, in the sense that firms can leapfrog beyond the productivity frontier. They overtake incumbent firms and drive them out of business, increasing overall productivity over time (Aghion and Howitt (1992)).

Models involving both random innovations via a geometric Brownian motion, as well as imitation via random meetings between firms generating technology diffusion, have been proposed by Luttmer (2012) and Staley (2011) [2]. Their approach is related to the KPP equation, originally studied in the mathematics literature by Kolmogorov, Petrovski and Piskunov (1937), and later by McKean (1975) and Bramson (1984) among others. These models can admit a unique balanced growth path (BGP) that is a global attractor, and whose shape depends on imitation and innovation propensities, but not on initial conditions [3]. Innovation driven by Brownian motion, however, ensures that the productivity distribution immediately becomes unbounded, and the resulting BGP does not have compact support.

Having a compact support is particularly relevant for empirical purposes, as the support of the productivity distribution in individual industries is found to be quite localized (Syverson (2004), Hsieh and Klenow (2009)). Firms with significantly low productivity relative to the frontier firms are unlikely to survive the competition and to preserve their market shares for long. The forces of Schumpeterian "creative destruction" may endogenously replace the inefficient firms at the bottom of the productivity distribution. However, other firms, below but not too far from the frontier, may survive, giving a distribution of productivities that allows both innovation and imitation to persist over time.

[2] Other recent models combining innovation and imitation include Benhabib, Perla and Tonetti (2014), König, Lorenz and Zilibotti (2016), Akcigit and Kerr (2016) and Buera and Lucas (2018).

[3] To be more precise, however, for the KPP equation the asymptotic BGP velocity and shape do depend on initial conditions if the initial distribution is thick tailed. See Bramson (1984).
In section 2, we first investigate a model on a ladder. Innovation and imitation combine to generate a balanced growth path (BGP) with compact support. In contrast to models with imitation alone (see Lucas (2009)), the distribution does not collapse to the frontier either. The distribution of productivities is centered around some productivity moving up at a constant growth rate, and keeps its shape relative to this productivity over time (a traveling wave which is compactly supported). In section 2.1 we first propose a very simple model of firms on a quality ladder that can both innovate and imitate, and where with some positive probability imitators can leapfrog to the productivity frontier. We characterize the stationary distribution of productivities as a truncated power law. This model has the advantage of being very simple, but leaves imitation rates mostly exogenous. In section 2.2 we extend this model to introduce density dependent imitation rates. In section 3 we endogenize the length of the support of the balanced growth path as arising from optimal choices of firms.

Another approach to generating productivity distributions that have finite support is to limit the number of firms to be finite. By construction, distributions over a finite number of firms have bounded support; however, stochastic imitation and innovation can make the distance of the productivity frontier to the lowest productivity level fluctuate, and this distance can occasionally become quite large. In section 4, we study such models with innovation, imitation and a finite number of firms, the so-called N-BRW and L-BRW models. These models introduce alternative approaches to modeling entry, exit, and competition, but also feature balanced growth paths with compact support. We characterize some features of their productivity distributions and relate them to results obtained in earlier sections. Section 5 concludes.

2. Innovation and imitation with fixed compact support
In this section, we consider a discrete time model of innovation and imitation. Innovation can be gradual, by moving up a quality ladder as in Klette and Kortum (2004) and König, Lorenz and Zilibotti (2016). But it can also be a breakthrough, where agents or firms move up from the bottom and overtake the top, that is, they "leapfrog". On top of that, agents imitate other agents. We start in section 2.1 with a model of exogenous imitation rates. In section 2.2, we endogenize the imitation choice and obtain a stationary productivity distribution that looks like a truncated power law on a finite support. Then in section 3 we show that the assumption of a fixed support length of the BGP is actually the result of an optimal choice problem that trades off the costs of imitation and its benefits, as in Perla and Tonetti (2014).

In this section, we only consider the case where the number of firms is sufficiently large to neglect finite-size effects and stochastic behavior. In fact, to make "microscopic" and probabilistic interpretations at the firm level, and not only speak about densities, we have to assume that a law of large numbers holds.
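The law-of-large-numbers assumption can be illustrated with a minimal sketch. It assumes, for illustration only, that each of N firms innovates independently with the same probability a; the function name, the value of a, the seed and the population sizes are hypothetical choices, not taken from the paper. The realized fraction of innovators should approach a as N grows, which is what allows the text to treat "a fraction a of firms" as deterministic.

```python
import random

def innovating_fraction(n_firms, a, rng):
    """Fraction of n_firms that innovate when each succeeds independently with probability a."""
    successes = sum(1 for _ in range(n_firms) if rng.random() < a)
    return successes / n_firms

rng = random.Random(0)          # fixed seed, for reproducibility only
a = 0.3                         # hypothetical innovation probability
# Realized innovating fraction for increasingly large populations of firms.
fractions = {n: innovating_fraction(n, a, rng) for n in (10, 1_000, 100_000)}
```

For small N the realized fraction can differ noticeably from a; this is the finite-size, stochastic regime that the paper defers to section 4.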
2.1. Exogenous innovation and imitation.
At each time $t \in \mathbb{N}$, a firm has a productivity level $i \in \mathbb{N}$ on a discrete ladder [4]. The density of firms at time $t$ on level $i$ is given by a non-negative number $f^i_t \in \mathbb{R}$ with $\sum_i f^i_t = 1$ for every $t$. At each time step, when going from $t$ to $t+1$, firms improve their productivity and the density climbs up the ladder according to rules that we now make explicit. We assume that, at each time step and at each level, a fraction $a \in (0,1)$ of firms moves up the productivity ladder by one level ("innovation"), and a fraction $1-a$ remains stagnant, with the same productivity. This amounts to assuming a law of large numbers for random innovation with probability of success $a$. Then, all of the firms that remained stagnant at the lowest productivity level $i = 1$ either leapfrog or imitate as described below, leaving the lowest level empty. (This corresponds to a fraction $(1-a) f^1_t$ of all firms.)

[4] There is no assumption that these productivity levels placed on a ladder are equally spaced: the rungs of the ladder need not be equidistant from each other. They simply represent the productivities that can be imitated and adopted, and we could easily use any ladder for $i$ that maps to $\mathbb{N}$.

We call $m \in \mathbb{N}$ the highest level at time $t$. Given our process, at time $t+1$, the productivity level $i = m+1$ gets populated, and the lowest productivity level $i = 1$ is emptied as described below. Then, at each period, we rename the levels: what was the level $i$ at time $t$ becomes the level $i-1$ at time $t+1$. In this way, the populated levels at the beginning of each time step are always numbered $\{1, 2, \dots, m\}$. For the moment, we take the length of support $m$ as given, and postpone a discussion of endogenously chosen $m$ to section 3.

In this section 2.1, imitation for non-innovating firms at the lowest level happens as follows: at level $i \in \{1, 2, \dots, m\}$, the fraction of imitators entering level $i$ at time $t+1$ is $(1-a) f^1_t q_i$, where the $q_i \in [0,1]$ satisfy

(1)  $q_i \ge 0, \qquad \sum_{i=1}^{m} q_i = 1.$

The $q_i$ for $i \in \{1, \dots, m-1\}$ represent imitation (exogenous in this section 2.1), while $q_m$ represents leapfrogging, i.e. firms at the lowest level of the productivity distribution at each time $t$ that overtake the current productivity frontier (indeed, the productivity level $m$ at time $t+1$ corresponds to the productivity level $m+1$ at time $t$, which was not yet populated). If $q_m = 0$, leapfrogging is excluded, while setting $q_m = 1$ excludes any imitation.

Note that we assume that jumps to higher productivity levels, whether from leapfrogging or imitation, do not depend on target productivity densities, so for the time being we abstract away from any search-theoretic microfoundation.

The transition dynamics can be written in a single equation

(2)  $f^i_{t+1} = \underbrace{(1-a)\, f^{i+1}_t}_{\text{fall back}} + \underbrace{a\, f^i_t}_{\text{innovation}} + \underbrace{(1-a)\, f^1_t\, q_i}_{\text{imitation or leapfrogging}}$

(with the convention $f^{m+1}_t = 0$) or, more conveniently, with a matrix $A \in M_m([0,1])$, $f_{t+1} = A f_t$:

(3)  $\begin{pmatrix} f^1_{t+1} \\ f^2_{t+1} \\ \vdots \\ f^{m-1}_{t+1} \\ f^m_{t+1} \end{pmatrix} = \begin{pmatrix} a + q_1(1-a) & 1-a & 0 & \cdots & 0 \\ q_2(1-a) & a & 1-a & \ddots & \vdots \\ \vdots & 0 & \ddots & \ddots & 0 \\ q_{m-1}(1-a) & \vdots & \ddots & a & 1-a \\ q_m(1-a) & 0 & \cdots & 0 & a \end{pmatrix} \begin{pmatrix} f^1_t \\ f^2_t \\ \vdots \\ f^{m-1}_t \\ f^m_t \end{pmatrix}.$

By construction $A$ has column sums adding to 1, as the number of firms remains constant (in effect, we have a particular birth and death model). $A$ admits 1 as an eigenvalue. The associated eigenvector is the stationary distribution for productivity densities, moving up as a traveling wave. The stationary distribution can be characterized as follows.

Proposition 1.
Let $Q_s = (q_m + q_{m-1} + \cdots + q_s) = \sum_{j=s}^{m} q_j$, with $Q_m = q_m$ and $Q_1 = 1$. The stationary distribution $(f^1_\infty, f^2_\infty, \dots, f^m_\infty)$, for any $a \in (0,1)$, is given by:

(4)  $f^s_\infty = Q_s f^1_\infty, \quad s = 1, \dots, m, \qquad \text{and} \qquad f^1_\infty \Big( \sum_{s=1}^{m} Q_s \Big) = 1 \quad \text{or} \quad f^1_\infty = \Big( \sum_{s=1}^{m} Q_s \Big)^{-1}.$

Proof.
The stationary solution fulfills $f_\infty = A f_\infty$. To simplify notation let $f^j_\infty \equiv x_j$, $j = 1, \dots, m$.

We start with the last line of equation (3):

$q_m (1-a) x_1 + a x_m = x_m \;\Rightarrow\; x_m = q_m x_1.$

We proceed by induction. The next-to-last line yields

$q_{m-1}(1-a) x_1 + a x_{m-1} + (1-a) x_m = x_{m-1} \;\Rightarrow\; (q_{m-1} + q_m)(1-a) x_1 = (1-a) x_{m-1} \;\Rightarrow\; x_{m-1} = (q_{m-1} + q_m)\, x_1.$

Assume that $x_{m-(s-1)} = (q_{m-(s-1)} + \cdots + q_m)\, x_1$. Then we have

$q_{m-s}(1-a) x_1 + a x_{m-s} + (1-a) x_{m-(s-1)} = x_{m-s} \;\Rightarrow\; x_{m-s} = \big( q_{m-s} + q_{m-(s-1)} + \cdots + q_{m-1} + q_m \big)\, x_1.$

This completes the induction proof. Relabeling $m-s$ as $s$, we obtain (4).
We have left one free variable, $x_1$, which will be determined by the normalization of $f$; writing $\sum_{i=1}^{m} x_i = 1 = x_1 \big( \sum_{s=0}^{m-1} Q_{m-s} \big) = x_1 \big( \sum_{s=1}^{m} Q_s \big)$, we get the result. □

The stationary distribution is independent of the probability of innovation $a \in (0,1)$, and only depends on the intensity $q_i$ of imitation rates across productivities. But the speed of convergence to the stationary distribution depends on $a$, as it affects the eigenvalues of $A$. In particular the second highest eigenvalue of $A$, which is less than 1 in modulus [5], can be taken as an indicator of the convergence rate. The lower this eigenvalue, the faster the convergence rate. For $m = 2$, the two eigenvalues are 1 and the trace of $A$ minus 1, so the second eigenvalue can be explicitly computed to be equal to $2a - 1 + q_1(1-a) = a - (1-a) q_2$. This is increasing in $q_1$ (more imitation implies slower convergence), decreasing in $q_2$, the leapfrogging rate (more leapfrogging implies faster convergence), and increasing in $a$ (more innovation implies slower convergence).

Firms at any productivity level except the lowest one tend to drop down the ladder over time. At the bottom of the ladder, non-innovating firms jump to higher levels through innovation and imitation. Overall, the stationary density of productivity levels is non-increasing over productivity levels.

We now discuss two special cases:

No imitation, only leapfrogging. If there is no imitation and only leapfrogging, that is if $q_m = 1$ and therefore $q_i = 0$ for $i \in \{1, \dots, m-1\}$, it follows that $f^i_\infty = m^{-1}$ for $i = 1, \dots, m$, so the productivity distribution becomes uniform. Firms that jump to the frontier slide down the productivity distribution, until they reach the lowest level, from which they again jump to the frontier.

No leapfrogging, only imitation. In this case $q_m = 0$ and the matrix in (3) is decomposable. In particular, the highest productivity level evolves independently, with $f^m_{t+1} = a f^m_t$, and converges to zero. This makes the last element of the eigenvector associated with the eigenvalue 1 equal to zero, so there is no density at this level in the stationary distribution: $f^m_\infty = 0$.

2.2. Density dependent imitation.
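As a numerical companion to Proposition 1 of section 2.1, the following sketch builds the matrix $A$ of equation (3) and checks that the closed form (4) is a fixed point of it. The function names and the values of $a$ and $q$ are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def transition_matrix(a, q):
    """Build the m x m matrix A of equation (3) for innovation rate a and jump probabilities q."""
    q = np.asarray(q, dtype=float)
    m = len(q)
    A = np.zeros((m, m))
    A[:, 0] = (1.0 - a) * q          # stagnant firms at level 1 jump to level i with probability q_i
    for i in range(m):
        A[i, i] += a                 # innovators at level i keep label i after relabeling
        if i + 1 < m:
            A[i, i + 1] = 1.0 - a    # stagnant firms at level i+1 are relabeled i
    return A

def stationary_closed_form(q):
    """Proposition 1: f^s = Q_s * f^1, with Q_s = q_s + ... + q_m and f^1 = 1 / sum_s Q_s."""
    q = np.asarray(q, dtype=float)
    Q = np.cumsum(q[::-1])[::-1]     # tail sums Q_1, ..., Q_m
    return Q / Q.sum()

a = 0.3                              # hypothetical innovation probability
q = [0.4, 0.3, 0.2, 0.1]             # hypothetical jump probabilities, m = 4, q_m = 0.1
A = transition_matrix(a, q)
f_inf = stationary_closed_form(q)

# For m = 2 the second eigenvalue should equal a - (1 - a) * q_2.
second_eig = np.sort(np.linalg.eigvals(transition_matrix(a, [0.6, 0.4])).real)[0]
```

Changing `a` while keeping `q` fixed leaves `f_inf` unchanged, in line with the remark that the stationary distribution does not depend on the innovation probability, while `second_eig` does move with `a`.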
In Proposition 1 we solved for the densities in terms of exogenous imitation rates $q_i$, $i = 1, \dots, m-1$, with $m > 2$. Now we consider the case where the imitation rates are proportional to densities. We are again seeking a stationary solution.

If imitation is similar to learning from another firm, then imitation rates should be proportional to the number of firms to learn from, or the density at the corresponding ladder point. Learning is then conditional on meeting another firm with higher productivity, which happens with a probability proportional to the density there. Therefore, we let imitation rates be

(5)  $q_j \equiv q^t_j = \mu f^{j+1}_t, \qquad j = 1, \dots, m-1.$

(Recall that $q_j$ is the probability of jumping to site $j$ at time $t+1$, which is the same as site $j+1$ at time $t$ because of the relabeling at each time step; this is why $q_j$ is proportional to $f^{j+1}_t$ and not $f^j_t$.) Here, $\mu$, which is determined by normalization, is time-dependent: as seen below, it can be written as a function of $f^1_t$. The highest level, with density $f^m_{t+1}$, is not imitated because it is not yet available for imitation, so $q_m$, which represents leapfrogging, is independent of the densities, as in section 2.1.

Observe that the problem is now non-linear, so existence and uniqueness of a solution are more involved than in the linear case. A stationary solution is again $f_\infty = A(f_\infty) f_\infty$.

[5] Indeed, the matrix $A$ has non-negative entries, is aperiodic since the diagonal elements are positive, and irreducible as any productivity level can be reached from any other one. Therefore, the Perron-Frobenius Theorem implies that the largest eigenvalue (here, 1) is simple, and that all other eigenvalues are strictly smaller in modulus.

We first determine $\mu$: with $m$ fixed, we must have

(6)  $1 = f^1_t + \cdots + f^m_t$

and

(7)  $1 = q_1 + q_2 + \cdots + q_m.$

Therefore, in order to find a solution, $\mu$ cannot take arbitrary values but will be determined (together with $f_\infty$) as a function of $q_m$. Indeed, inserting (5) into (7), and using (6), we obtain:

(8)  $1 = q_m + \mu \sum_{j=1}^{m-1} f^{j+1}_t = q_m + \mu (1 - f^1_t).$

This implies that, assuming that $f^1_t < 1$,

(9)  $\mu \equiv \mu_t = \dfrac{1 - q_m}{1 - f^1_t}.$

Overall, in this subsection, the two parameters $m$ and $q_m$ determine all other quantities, including $\mu$. Note that the reason why we are not free to choose $\mu$ is that we insist, in our model, that the lowest occupied site be emptied at each time step. This condition leads to (7) and then to (9). In section 4, we briefly discuss a model where $\mu$ is an arbitrary parameter and the lowest occupied site is not necessarily emptied at each time step.

For the stationary solution, we write $x_j = f^j_\infty$ as before and

(10)  $\mu = \dfrac{1 - q_m}{1 - x_1}.$

Remark 1. $x_1 = 1$ is never a solution. If $x_1 = 1$, then $x_j = 0$, $\forall j = 2, \dots, m$. Then, either $q_m = 1$ and $q_j = 0$, $\forall j = 1, \dots, m-1$, in which case the last line of (3) reads $x_m = (1-a) q_m x_1 = 0$. For $a < 1$, this is only possible if $q_m = 0$, a contradiction. Or, if $q_m < 1$, $\mu = \infty$ and the problem is not well-defined.
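The non-linear update of this subsection can be sketched numerically by combining equations (2), (5) and (9). The sketch below is only an illustration under stated assumptions: the function names, the parameter values and the uniform starting density are hypothetical, NumPy is assumed, and the closed form used as a candidate fixed point is the one given in Proposition 2 of the text that follows.

```python
import numpy as np

def step(f, a, q_m):
    """One time step of the density-dependent model: equations (2), (5) and (9)."""
    f = np.asarray(f, dtype=float)
    mu = (1.0 - q_m) / (1.0 - f[0])        # equation (9); requires f[0] < 1
    q = np.append(mu * f[1:], q_m)         # q_j = mu * f^{j+1} for j < m, then q_m
    shifted = np.append(f[1:], 0.0)        # f^{i+1}_t, with f^{m+1}_t = 0
    return (1.0 - a) * shifted + a * f + (1.0 - a) * f[0] * q

def stationary_candidate(m, q_m):
    """Closed form (11) and (13) of Proposition 2, stated below in the text."""
    r = q_m ** (-1.0 / (m - 1))            # r = 1 + mu * x_1 at the fixed point
    x1 = (r - 1.0) / (r - q_m)             # equation (13)
    i = np.arange(1, m + 1)
    return q_m * x1 * r ** (m - i)         # equation (11)

a, q_m, m = 0.3, 0.25, 5                   # hypothetical parameter values
f0 = np.full(m, 1.0 / m)                   # arbitrary normalized starting density
f1 = step(f0, a, q_m)                      # density after one step
x = stationary_candidate(m, q_m)           # candidate stationary density
```

Because $\mu$ is recomputed from $f^1_t$ at every step, the jump probabilities always sum to 1 and the total density is conserved; applying `step` to `x` returns `x` up to rounding, which is a numerical check of the fixed-point property rather than a proof of convergence.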
Proposition 2.
Under the assumptions above, with $q_m \in (0,1)$,

(11)  $x_i = q_m x_1 \Big( 1 + \dfrac{1 - q_m}{1 - x_1}\, x_1 \Big)^{m-i}, \qquad i = 1, \dots, m,$

where $x_1 \in [0,1)$ is the unique solution to

(12)  $1 = q_m \Big( 1 + \dfrac{1 - q_m}{1 - x_1}\, x_1 \Big)^{m-1},$

or

(13)  $x_1 = \dfrac{(q_m)^{-\frac{1}{m-1}} - 1}{(q_m)^{-\frac{1}{m-1}} - q_m}.$

Proof.
We give a recursive proof. The last line of (3) gives again $x_m = q_m x_1$. Replacing $q_i = \mu x_{i+1}$ for $i = 1$ to $m-1$ and proceeding from level $m-(j-1)$ to level $m-j$, we get

$x_{m-j} = x_{m-(j-1)} + q_{m-j}\, x_1 = x_{m-(j-1)} (1 + \mu x_1) \;\Rightarrow\; x_{m-j} = x_m (1 + \mu x_1)^j = q_m x_1 (1 + \mu x_1)^j.$

This finishes the induction proof. Relabeling $i = m-j$, we obtain (11).

For existence of a solution, we need that

(14)  $x_1 = (1 + \mu x_1)^{m-1} q_m x_1$

or

(15)  $(1 + \mu x_1)^{m-1} q_m = 1.$

Inserting the expression for $\mu$, (10), this gives equation (12). Let us check that the solution thus obtained is normalized; we have

$\sum_{i=1}^{m} x_i = \sum_{j=0}^{m-1} (1 + \mu x_1)^j q_m x_1 = \dfrac{(1 + \mu x_1)^m - 1}{(1 + \mu x_1) - 1}\, q_m x_1 = \dfrac{(1 + \mu x_1) - q_m}{\mu},$

where we have also used (15). But according to equation (10), one has $\mu = 1 + \mu x_1 - q_m$, and so we conclude that

$\sum_{i=1}^{m} x_i = 1.$

Therefore, a solution to (15) with $\mu$ given by (10) gives rise to a normalized $x$, as summed up in equation (12). Inserting the expression (13) for $x_1$ proves existence of a solution. This finishes the proof of Proposition 2. □

Corollary 1. If $q_m = 0$, there is no stationary solution.
Proof.
For $q_m = 0$, we have $x_m = 0$. Using equation (3) with $q_{m-1} = \mu x_m$, this implies that

(16)  $x_{m-1} = (1-a)\, \mu \cdot 0 \cdot x_1 + (1-a) \cdot 0 + a\, x_{m-1},$

which implies that $x_{m-1} = 0$ for $a < 1$. By recursion, $x_j = 0$ $\forall j$, and there is no solution. □

While there is no stationary solution for $q_m = 0$, the dynamics may nevertheless converge to a distribution with $x_1 \to 1$ and $x_i \to 0$ for $i > 1$. Recall that $\{x_i\} = \{1, 0, 0, \dots\}$ is not a stationary state, because $\mu$ in (10) would then be infinite and the model ill-defined. This case is easily illustrated for $m = 2$.

Example 1.
Dynamics for $m = 2$, $q_2 = 0$. We start from a density $f_0$ with $f^1_0 = 1 - f^2_0$ and $f^2_0 > 0$ (else, as already pointed out, we would have $\mu \to \infty$). Then the dynamics for $f^2_t$ reduce to

$f^2_{t+1} = a f^2_t$

and hence

$f^2_t = f^2_0\, a^t \to 0$ as $t \to \infty$.

By normalization,

$f^1_t = 1 - f^2_t = 1 - f^2_0\, a^t \to 1$ as $t \to \infty$.

We can observe that, because $q_m = 0$, the upper level of the density is falling over time, as only a fraction $a$, namely the innovators, remains there each period. In the general case, all upper levels will successively experience such a decline in population. Because imitation is proportional to the number of firms present at the productivity level, fewer and fewer firms will flow into the higher steps of the ladder, which will be successively depopulated. In the limit, a single ladder step survives.

We do not think that the problematic asymptotic behavior for $q_m = 0$ is a major drawback of this model. Surely there are some highly innovative firms that leapfrog to the highest operational productivity levels, so that the case $q_m = 0$ may be economically less interesting.

Example 2.
We provide numerical illustrations for the stationary densities for $m = 10$ and $q_m \in \{\,\dots\,\}$ (four values). The solutions for $\mu$ and $x_1$ are

$\mu = 1.\ldots$, $x_1 = 0.\ldots$ if $q_m = 0.\ldots$
$\mu = 0.\ldots$, $x_1 = 0.\ldots$ if $q_m = 0.\ldots$
$\mu = 0.\ldots$, $x_1 = 0.\ldots$ if $q_m = 0.\ldots$
$\mu = 0.\ldots$, $x_1 = 0.\ldots$ if $q_m = 0.\ldots$

These values can be computed from

$\mu = (q_m)^{-\frac{1}{m-1}} - q_m,$

which is obtained from (10) and (13). Figure 1 plots the solution for $\{x_i\}_{i=1}^{m}$ for these four cases.

Figure 1.
Stationary densities for different values of leapfrogging intensity $q_m$ with $m = 10$.

Notice that higher values of $q_m$, or higher leapfrog values, flatten the productivity distribution. As $q_m \to 1$, we have $\mu \to 0$ and $x_i \to m^{-1}$ for $i = 1, \dots, m$, so the distribution is uniform. The stationary density gets increasingly concentrated at the lower boundary of the productivity ladder as $q_m$ gets smaller.

We note that in a continuous time version of this model with a continuum of firm productivities, with growth driven by imitation as well as by leap-frogging innovation to the frontier that is governed by a finite Markov chain, Benhabib, Perla and Tonetti (2017) also show that there exists a stationary productivity distribution evolving as a travelling wave with compact and bounded support.

3. Endogenizing the length of productivity distributions
In the previous section, only the firms at the lowest level $j = 1$ would leapfrog or imitate at each time step. We now allow firms at any level $j \le m$ to choose to leapfrog or imitate by paying a cost, if the firm estimates that it is profitable to do so. We first consider the simpler case of leapfrogging only (no imitation) in section 3.1, and we consider the full model with leapfrogging and imitation in section 3.2. In the first case (leapfrogging only, section 3.1), we will show that firms will only choose to leapfrog or imitate at or below a certain threshold level $j_0$ which is independent of time [6]. Then with $j_0$ determined, the ladder length is fixed and given by $m - j_0 + 1$, and the results of section 2.1 with $q_m = 1$ apply. In the second case (leapfrogging and imitation, section 3.2) we will also show that firms optimally choose to incur a cost for the opportunity to imitate or leapfrog at or below a certain threshold level $j_0(t)$, but this level might depend on time. In this case, characterizing the transition dynamics of the productivity distribution is not straightforward, but if we assume the system reaches a stationary distribution, then $j_0(t)$ converges to some $j_0(\infty)$ and the results of Proposition 2 in section 2.2 can be applied with the distribution of firms supported on an interval of size $m - j_0(\infty) + 1$.

In either case, the finite size of the support is now endogenized and depends on the cost of imitation, the payoffs at each quality level, etc.

3.1. Leapfrogging Only.
We assume that a firm still faces an exogenous probability of innovation $a$ as in sections 2.1 and 2.2 but, when the firm fails to innovate, it is allowed to choose whether to leapfrog (and pay some cost) or not. The firm's optimal choice problem is to maximize its value function, i.e. the expected discounted value of current and future payoff streams net of costs [7]. In every period, the firm chooses whether to pay a cost to leapfrog and benefit from higher payoffs now and in the future, or not to do so.

When evaluating whether it is advantageous to take some action or not, a firm usually needs to anticipate the future distribution in order to have expectations for imitation probabilities and outcomes. In the case of leapfrogging only, which we consider now, we will see that it is actually enough to know the position of the frontier $m = m(t)$, which remains constant (after relabeling) and equal to its initial position $m(0)$. (Without relabeling, we would have $m(t) = m(0) + t$ due to innovation.) Therefore, in the case of leapfrogging only, the outcome, when the firm decides whether to leapfrog or not, depends only on the initial value of $m$; it does not depend on the distribution of firms on the quality ladder, nor on time. (Note that this will no longer be true when we add imitation in section 3.2: with imitation, it is necessary to anticipate the future distribution of firms to make an optimal choice.)

As is usual in economics, this optimal choice problem can be reformulated in a recursive way using a Bellman equation, which we will write down below. We will show here that the firm's optimal choice is to leapfrog if it lies at a fixed length below the frontier. This fixed length becomes the new support size, thus providing a microfoundation to the previously exogenous support size $m$.

[6] As in the previous sections, it is understood that after each time step the levels are relabeled (so that level $i$ at time $t$ becomes level $i-1$ at time $t+1$). Then, the highest occupied productivity level is always $m$ at the beginning of each time step.

[7] Under the assumption of linear utility, the benefit of payoffs to the firm are the payoffs themselves. "Expected" refers to the fact that the firm might have to anticipate the future firms density in order to project imitation probabilities and thus payoffs. "Discounting" with a constant intertemporal discounting factor $\beta$ as usual reflects the fact that the firm values the future less than the present. For the reader unfamiliar with dynamic optimal choice problems, we refer for example to Lucas and Stokey (1989) or Ljungqvist and Sargent (2018).

Every time step, at every level, a firm innovates and moves up one ladder step with probability $a$. The firms that do not innovate have the choice either to fall behind, or to catch up with the highest productivity level $m$ (after relabeling, or $m+1$ before) by paying a cost. We assume that it is not possible to "imitate" intermediate levels.

We assume that the payoffs realized by a given firm increase by some factor $\lambda > 1$ with each step up the ladder, which gives normalized payoffs $p_j = \lambda^j$ for a firm being at level $j \in \{1, \dots, m\}$ [8]. If, as the firm distribution moves up the quality ladder, costs to implement leap-frogging grow at the same rate $\lambda$ as the payoffs, the firm problem can be reduced to a stationary problem where the normalized payoffs $p_j = \lambda^j$ and the normalized cost $C$ are independent of time.

Firms, in deciding whether to leapfrog or not, compare the costs to the expected payoffs. As normalized payoffs increase over the ladder, while normalized costs do not, firms choose to leapfrog if their distance to the frontier (the level $m$ of the highest performing firm) is larger than some threshold, and choose not to leapfrog if their distance to the frontier is smaller than that threshold.

In other words, there must be a certain threshold level $j_0$ such that a firm chooses not to leapfrog for productivity levels $j = j_0 + 1, \dots, m$, but does leapfrog at levels $j \le j_0$. We now provide a formal argument.

Let $V^{LF}(j)$ be the value of leapfrogging from some level $j$ and $V^{NLF}(j)$ the value of not leapfrogging from this level. Then, the value of being at productivity level $j$ is

(17)  $V(j) = \max \big\{ V^{LF}(j),\, V^{NLF}(j) \big\}.$

The following equation represents the leapfrogging choice. We have

(18)  $V^{LF}(j) = p_j + \beta a V(j) + (1-a) \big[ \beta V(m) - C \big],$

where $\beta = \lambda \beta_0$, with $\beta_0 < \beta < 1$. This is the Bellman equation for the leapfrogging value function, which determines the optimal choice recursively. The first term on the right-hand side is the payoff received this period. Then, with probability $a$, the firm innovates and moves up one step from $j$ to $j+1$, which after relabeling becomes $j$, and this continuation value is discounted with $\beta$. With probability $(1-a)$, the firm does not innovate but decides to leapfrog; the firm moves above the frontier, to level $m+1$ (which after relabeling becomes level $m$), and pays the cost $C$.

[8] Remember that levels are relabeled at each time step, so level 1 (for instance) at different times corresponds to different quality levels with different payoffs. At a given time step, the real payoffs of the different firms can be obtained by multiplying the normalized payoffs $p_j$ by $\lambda^t$.
Similarly,

(19)  $V^{NLF}(j) = p_j + \beta a V(j) + \beta (1-a) V(j-1).$

Notice that neither (18) nor (19) depends on the densities $f$ or on time; the value $V(j)$ of being at some level $j$ remains constant in time.

The firm wants to leapfrog from some level $j$ if leapfrogging is beneficial, i.e.

(i)  $V^{LF}(j) > V^{NLF}(j),$

and does not want to leapfrog if

(ii)  $V^{LF}(j) < V^{NLF}(j).$

Observe from (18) and (19) that

(20)  $V^{LF}(j) - V^{NLF}(j) = (1-a) \big[ \beta V(m) - C - \beta V(j-1) \big].$

We assume from here on that the value function $V(j)$ increases with the productivity level $j$. This property is not an obvious consequence of Bellman's equation, but it seems clear that any mathematical solution where $V(j)$ is not increasing cannot reasonably describe a real-life situation, because it would mean that some firms should degrade the quality of their production in order to increase their value.

Then, from (20) and using that $V(j)$ increases in $j$, the quantity $V^{LF}(j) - V^{NLF}(j)$ decreases with $j$. Hence, if leapfrogging is beneficial at $j$, it is even more so at $j-1$, $j-2$, etc. Similarly, if leapfrogging is not beneficial at $j$, it will be even less so at $j+1$, $j+2$, etc. In other words, there must be a threshold level $j_0$ such that

A site leapfrogs if and only if $j \le j_0$.

Assume we let this system evolve from an initial condition where the highest occupied site is $m$. At the end of the first time step, site $m+1$ is occupied (through innovation and leapfrogging) and all sites up to and including $j_0$ are emptied through leapfrogging. At the start of the second time step, after relabeling, the system occupies a subset of sites $\{j_0, \dots, m\}$. Then, at each following time step, only site $j_0$ gets emptied through leapfrogging and the system remains in $\{j_0, \dots, m\}$ after relabeling. In the large time limit, the system reaches its stationary state, which is a uniform distribution over $\{j_0, \dots, m\}$.

The behavior we have just described is very similar to the behavior of the system in section 2.1 with $q_m = 1$, except that the lowest occupied site is now $j_0$ instead of 1. In other words, the size of the support is now $m - j_0 + 1$ instead of $m$. This size of support depends on the parameters of the model: $a$, $\lambda$, $C$ and $\beta$. (Using invariance by translation, it is easy to see that $m - j_0 + 1$ does not depend on $m$.) This means that the size of the support results from an endogenized optimum between costs and expected payoffs. By adjusting the values of the different parameters, any size of support can be obtained.

Through invariance by translation, one can shift the whole system on the value scale so that the support is on $\{1, \dots, m' = m - j_0 + 1\}$. Then, the model is even more similar to section 2.1 with $q_m = 1$, with the lowest occupied level at $j = 1$ and with the endogenized $m'$ being both the highest occupied site and the size of the support.

3.2. Leapfrogging and imitation.
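As a point of comparison for what follows, the leapfrogging-only problem of section 3.1 can be solved numerically by value iteration on equations (17) to (19). This is a minimal sketch under assumptions that are not in the paper: the ladder is truncated at an artificial floor (level 0) from which a non-leapfrogging firm cannot fall further, the argument `beta` stands for the normalized discount factor, and the function name and all parameter values are hypothetical. NumPy is assumed.

```python
import numpy as np

def solve_leapfrog(m, a, beta, lam, C, tol=1e-10, max_iter=100_000):
    """Value iteration for equations (17)-(19) on levels 0..m (level 0 is an assumed floor)."""
    levels = np.arange(m + 1)
    p = lam ** levels.astype(float)               # normalized payoffs p_j = lambda^j
    prev = np.maximum(levels - 1, 0)              # assumption: a firm at the floor cannot fall further
    V = np.zeros(m + 1)
    for _ in range(max_iter):
        v_lf = p + beta * a * V + (1 - a) * (beta * V[m] - C)     # equation (18)
        v_nlf = p + beta * a * V + beta * (1 - a) * V[prev]       # equation (19)
        V_new = np.maximum(v_lf, v_nlf)                           # equation (17)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    leap = (beta * V[m] - C) > beta * V[prev]     # sign of equation (20)
    return V, leap

# Hypothetical parameter values, with beta < 1 as required in the text.
V, leap = solve_leapfrog(m=20, a=0.3, beta=0.9, lam=1.05, C=5.0)
```

With an increasing value function, the levels where `leap` is true form a block at the bottom of the ladder, which is the threshold structure derived from (20); raising `C` in this sketch can only shrink that block or leave it unchanged in the limit where no firm leapfrogs.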
We now introduce density-dependent imitation as in section 2.2. A firm innovates at no cost with probability $a$ and, if it does not innovate, it can choose to pay a fixed cost $C$ to randomly leapfrog or imitate with density-dependent imitation probabilities, or it can forgo this opportunity. We are already assuming mean-field dynamics, which are valid in the limit of a large number of firms. We can therefore safely assume that the choice taken by a single firm does not impact the distribution.

Recall the following assumptions made in section 2.2: at time $t$, when a firm chooses to innovate or leapfrog, it jumps with probability $q_j$ (with $j \in \{1, \ldots, m\}$) onto site $j+1$ which, after relabeling, becomes site $j$ at time $t+1$. Then, $q_m$ is the (exogenously given) probability of leapfrogging (since site $m+1$ is empty at time $t$), and $q_j$ for $j \le m-1$ is the probability of imitation: for $j \le m-1$, the probability of jumping onto site $j+1$ (site $j$ after relabeling) is proportional to the proportion of firms $f^{j+1}_t$ on that site at the beginning of the time step, and we write $q_j = q_j(f) = (1 - q_m) f^{j+1}_t$ for $j \le m-1$. The prefactor $1 - q_m$ is chosen in such a way that the probabilities are normalized: $\sum_{j \le m} q_j(f) = 1$.

The value $V(j; f)$ of being at a site $j$ now depends on the density $\{f^k_t\}$ of firms at all sites for the current time. As in section 3.1, we write $V_{NLF}(j; f)$ for the value of being at $j$ and choosing not to imitate/leapfrog given the current densities $f = \{f^k\}$, and $V_{LF}(j; f)$ for the value of being at site $j$ and choosing to imitate/leapfrog. The Bellman equations become
(21) $V(j; f_t) = \max\big( V_{NLF}(j; f_t),\, V_{LF}(j; f_t) \big)$,
(22) $V_{NLF}(j; f_t) = p_j + \beta a V(j; f_{t+1}) + \beta (1-a) V(j-1; f_{t+1})$,
(23) $V_{LF}(j; f_t) = p_j + \beta a V(j; f_{t+1}) + (1-a)\Big( \beta \sum_{k=j}^{m} q_k(f_t) V(k; f_{t+1}) + \beta \sum_{k<j} \cdots$ [...]

[...] $\gg 1$, the fluctuations are negligible compared to the average behavior and stochasticity can be ignored. On the other hand, when $a\,n^j_t$ is of order 1, the number of innovating firms is essentially random.

The models we consider in this section are stochastic versions of the model described in section 2.2. We still assume that firms live on a discrete quality ladder and that time is discrete. At the beginning of any time step, an active firm is characterized by its productivity level. Then, during one time step, for each firm, two things can happen (independently).
• The firm can innovate with probability $a$, thus gaining one productivity level.
• The firm can be imitated with probability $\mu$ by a new entrant.

The four outcomes for a single firm are graphically represented in figure 2.

[Figure 2: diagram with axes productivity level and time, showing the four outcomes:
• no innovation, not imitated: probability $(1-a)(1-\mu)$;
• no innovation, imitated: probability $(1-a)\mu$;
• innovation, not imitated: probability $a(1-\mu)$;
• innovation and imitated: probability $a\mu$.]
Figure 2.
The four outcomes after a time step for a single firm.

Note that in this section, and unlike in section 2, we do not rename the productivity levels after each time step, and we assume that $\mu$ is a parameter given exogenously.

The evolution of the whole system during one time step then comes in two phases:
(25)
(a) each firm present at time $t$ evolves independently according to the probabilities in figure 2;
(b) a culling of the firms in the system occurs by removing some firms at the bottom of the productivity scale.

Note that in the evolution phase, the imitating firms can either be some firms at the lowest productivity level who successfully imitate those above them, or new entrants displacing firms at the lowest level of productivity. There is no leapfrogging in this model.

We consider two variants of the model, depending on the way the culling occurs. A first variant is to fix the number of firms at each time step to an exogenous parameter $N$. Then, the number of firms removed during the culling phase must be equal to the number of imitated firms in the evolution phase, to keep the total number of firms constant. This model is called an $N$-BRW ($N$ Branching Random Walk) and is discussed in section 4.1.

Another possibility for the culling phase is to remove all firms lagging $L$ productivity steps or more behind the most productive firm, with $L$ given exogenously. In this variant, the total number of firms fluctuates with time. This model is called an $L$-BRW ($L$ Branching Random Walk) and is discussed in section 4.2.

In the following sections we characterize the shape and properties of the productivity distributions in the $N$-BRW and $L$-BRW models of innovation and imitation. The discussion is adapted from work conducted on KPP fronts since the late nineties in the context of statistical mechanics, reaction-diffusion models and population genetics. A good point of entry to this literature is Brunet (2016).

4.1. The $N$-BRW model.
Before introducing the $N$-BRW, we first need to discuss what a BRW is. A Branching Random Walk is a process in discrete time started from a single particle at the origin. At each time step, each particle (each "parent") is replaced by a random number of particles (the "children") positioned relative to the parent according to some point process. This rule is applied independently at each generation for each particle.

For instance, following figure 2, the rule could be that a particle at $y$ gives either one particle at $y$, or two particles at $y$, or one particle at $y+1$, or two particles at $y$ and $y+1$ respectively. The left part of figure 3 shows a BRW with a different rule where each parent can have 1, 2 or 3 children.

Note that the number of particles $N_t$ at each generation obeys the recursion $N_{t+1} = \sum_{i=1}^{N_t} n_{i,t}$, where $n_{i,t}$ is the number of children of individual $i$ at time $t$ and where it is assumed that the $n_{i,t}$ are independent identically distributed random variables over the integers. This is called a Galton-Watson process. In other words, a BRW is a Galton-Watson process where we keep the positions of the particles as extra information. For simplicity, we exclude the possibility that a particle has zero children and we insist that it has more than one child with positive probability. Then, the population size increases exponentially with time.

[Figure 3: two diagrams, each with axes position and time.]
Figure 3. Left: an example of BRW where, at each generation, a particle can have 1, 2 or 3 offspring. Right: an $N$-BRW with $N = 2$ obtained by keeping at each time step only the two highest children of the surviving particles of the previous time step. Notice that this rule is not the same as keeping the two highest particles of the BRW at each time step.

Denote by $(\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$ the positions of the children relative to the parent (both $n$ and the $\epsilon_i$ are random).
Then, under conditions on the laws of $n$ and $\epsilon_i$, listed for example in Gantert, Hu and Shi (2011) (see also there for references), one can show that the highest position $y_{\max}(t)$ in the BRW at time $t$ increases linearly with time:
(26) $\lim_{t \to \infty} \frac{y_{\max}(t)}{t} = v_c$,
with some velocity $v_c$ given by
(27) $v_c = \min_{\gamma} v(\gamma) = v(\gamma_c)$ with $v(\gamma) = \frac{1}{\gamma} \log E\Big[ \sum_{i=1}^{n} e^{\gamma \epsilon_i} \Big]$,
as soon as this minimum exists for some $\gamma_c > 0$. The expectation is over the displacements $\epsilon_i$ and the number $n$ of children, and $\gamma_c$ is the value of $\gamma$ for which the minimum is reached.

Footnote. The conditions are:
a) $E[n] > 1$ (this holds here, since we excluded that $n$ can be 0 and insisted that $n > 1$ with positive probability);
b) there exists $\delta > 0$ such that $E[n^{1+\delta}] < \infty$ (in other words, there are never too many children; this is automatic if the number of children is bounded);
c) there exists $\delta > 0$ such that $E\big[\sum_{i=1}^{n} e^{\delta \epsilon_i}\big] < \infty$ (in other words, the children are not created too far upwards relative to the parent; this is automatic if the number of children $n$ and the displacements $\epsilon_i$ are bounded, as in our case);
d) there exists $\delta > 0$ such that $E\big[\sum_{i=1}^{n} e^{-\delta \epsilon_i}\big] < \infty$ (in other words, the children are not created too far downwards relative to the parent; this is automatic if the number $n$ of children and the displacements $\epsilon_i$ are bounded);
e) the function $v(\gamma) = \frac{1}{\gamma} \log E\big[\sum_{i=1}^{n} e^{\gamma \epsilon_i}\big]$, which is necessarily well defined on some interval $(0, \delta)$ with $\delta \in (0, \infty]$, must reach a minimum $v_c = v(\gamma_c)$ on that interval. This is automatic in the example developed below for any $\mu \in (0, 1]$ and $a \in (0, 1)$.

For instance, with the rules of figure 2, one checks that
$v(\gamma) = \frac{1}{\gamma} \log\big[ 1 + \mu + a(e^{\gamma} - 1) \big]$.
Indeed,
$E\Big[\sum_{i=1}^{n} e^{\gamma \epsilon_i}\Big] = (1-a)(1-\mu) \times e^{0} + (1-a)\mu \times (e^{0} + e^{0}) + a(1-\mu) \times e^{\gamma} + a\mu \times (e^{0} + e^{\gamma}) = 1 + \mu + a(e^{\gamma} - 1)$.

We can now define the $N$-BRW.
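Before doing so, the minimization in (27) can be checked numerically for the rules of figure 2. The following is a minimal sketch, not part of the original analysis. It assumes $a \in (0,1)$ and $\mu > 0$, and the function names (`log_mgf`, `v`, `critical_point`) and the bisection bracket are illustrative choices. Writing $G(\gamma) = \log[1 + \mu + a(e^{\gamma} - 1)]$, one has $v'(\gamma) = [\gamma G'(\gamma) - G(\gamma)]/\gamma^2$, so $\gamma_c$ is the root of $\gamma G'(\gamma) - G(\gamma)$, which is negative near 0 and increasing in $\gamma$.

```python
import math


def log_mgf(gamma, a, mu):
    """G(gamma) = log E[sum_i exp(gamma * eps_i)] for the rules of figure 2."""
    return math.log(1.0 + mu + a * (math.exp(gamma) - 1.0))


def v(gamma, a, mu):
    """The function v(gamma) of equation (27)."""
    return log_mgf(gamma, a, mu) / gamma


def critical_point(a, mu, lo=1e-9, hi=50.0, iters=200):
    """Return (gamma_c, v_c) by bisection on phi(g) = g * G'(g) - G(g).

    The bracket [lo, hi] is an assumption of this sketch; phi is negative
    at lo and positive at hi for a in (0, 1) and mu > 0.
    """
    def phi(g):
        eg = math.exp(g)
        g_prime = a * eg / (1.0 + mu + a * (eg - 1.0))  # G'(g)
        return g * g_prime - log_mgf(g, a, mu)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    gamma_c = 0.5 * (lo + hi)
    return gamma_c, v(gamma_c, a, mu)


gamma_c, v_c = critical_point(a=0.25, mu=0.25)
print(gamma_c, v_c)
```

For $a = \mu = 1/4$ the minimum can also be found by hand: $G(\gamma) = \log(1 + e^{\gamma}/4)$, so that $\gamma_c = \log 4$ and $v_c = 1/2$, which the sketch reproduces up to rounding.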
The evolution for one time step of an $N$-BRW goes like that of a BRW, except that after each step only the $N$ highest particles are kept, the others being removed, so that after some time there are exactly $N$ particles in the system at each time step. Note that this rule is not the same as keeping the $N$ highest particles of a BRW at each time step; see the right part of figure 3.

The $N$-BRW and related models (the $N$-BBM, the stochastic Fisher equation) have been studied in mathematics, theoretical physics and biology, and several results are known both from non-rigorous and rigorous arguments.

For the $N$-BRW, a striking result is that one can still define a velocity $v_N$ for the highest particle, as in (26). This velocity depends on $N$ and converges to $v_c$ as $N \to \infty$, but the speed of convergence is unexpectedly slow (this is explained in Brunet and Derrida (1997), with a rigorous proof provided by Bérard and Gouéré (2010) for the case $\mu = 1$).

Theorem 1 (Velocity; Bérard and Gouéré (2010)). For the $N$-BRW with $\mu = 1$, we have
(28) $v_N = v_c - \frac{\pi^2 v''(\gamma_c)}{2 L^2} + o\Big(\frac{1}{L^2}\Big)$
with
(29) $L = \frac{1}{\gamma_c} \log N$,
where $v_c$, $v(\gamma)$ and $\gamma_c$ are defined as in (27) and $o(1/L^2)$ is a term vanishing faster than $1/L^2$ as $N \to \infty$.

(Note: even though a proof is available only in one case, heuristic arguments and numerical simulations suggest that (28) holds in a large number of cases.)

Size of support. Based on numerical observations and phenomenological theory for closely related models, it is believed (see Brunet and Derrida (1997) and Brunet, Derrida, Mueller and Munier (2006)) that after a long time, the system reaches a stationary regime as seen from the center of mass of the system. Here, stationary is to be interpreted in a probabilistic sense: while for finite $N$ there are still fluctuations, the laws determining the system become stationary.
In this stationary regime, the size of support, which is the difference between the position $y_{\max}$ of the highest particle and the position $y_{\min}$ of the lowest particle, satisfies
(30) $y_{\max} - y_{\min} = L + O(1)$,
with $L$ as in (29) and $O(1)$ designating a random variable whose law becomes independent of $N$ in the large $N$ limit. (It therefore becomes smaller and smaller compared to $L$ when $N \to \infty$.)

By construction, a finite number of firms ensures a productivity distribution that has a finite support at any fixed time, but what (30) means is that the firms have comparable productivity levels at all times, and the scenario where some firms stay put while others diverge to infinity due to innovations cannot occur. However, because the process is stochastic, there is a probability of a firm with an extended streak of successful innovations breaking out for a while, so that the support of the productivity distribution may occasionally get large; after some time, laggard firms will catch up via imitation and close the gap.

Shape of the front. Another interesting result concerns the typical density of the cloud of particles in the stationary regime. To simplify the discussion, we assume that the underlying BRW is the one described in figure 2. Then, the population lives on the lattice, and we introduce $f(y, t)$, the fraction of particles (or firms) at position (or quality level) $y$ at time $t$.

After the reproduction phase (but before the culling phase, see (25)), the expected fraction of firms at position $y$ and time $t+1$ is $(1 - a + \mu) f(y, t) + a f(y-1, t)$. Then, one could write the evolution equation as
(31) $f(y, t+1) = (1 - a + \mu) f(y, t) + a f(y-1, t) + (\text{noise})$ if $y > y_{\min}(t+1)$,
where $y_{\min}(t)$ is the position of the lowest firm at time $t$ (the values of $y_{\min}(t+1)$ and of $f\big(y_{\min}(t+1), t+1\big)$ are obtained by writing $\sum_y f(y, t+1) = 1$).
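The two-phase step (25) behind (31) can also be simulated directly. The sketch below is an illustration under stated assumptions rather than the procedure used in the literature cited here: the parameter values, the seed, the initial condition (all firms at level 0) and the names `nbrw_step` and `simulate_nbrw` are hypothetical choices. It applies the rules of figure 2 to every firm, keeps the $N$ highest firms, and records the support size $y_{\max} - y_{\min}$ appearing in (30).

```python
import random


def nbrw_step(positions, a, mu, n_keep, rng):
    """One time step of (25): evolution by the rules of figure 2, then culling."""
    children = []
    for y in positions:
        innovates = rng.random() < a   # the firm gains one productivity level
        imitated = rng.random() < mu   # an imitator appears at the firm's level at time t
        children.append(y + 1 if innovates else y)
        if imitated:
            children.append(y)
    children.sort(reverse=True)
    return children[:n_keep]           # keep only the n_keep highest firms


def simulate_nbrw(n_firms=200, a=0.25, mu=0.25, steps=1000, seed=0):
    """Run the N-BRW and return (final positions, velocity, mean support size)."""
    rng = random.Random(seed)
    positions = [0] * n_firms          # assumed initial condition: all firms at level 0
    supports = []
    for _ in range(steps):
        positions = nbrw_step(positions, a, mu, n_firms, rng)
        supports.append(positions[0] - positions[-1])   # y_max - y_min
    velocity = positions[0] / steps    # average speed of the highest firm
    second_half = supports[steps // 2:]
    mean_support = sum(second_half) / len(second_half)
    return positions, velocity, mean_support


final_positions, velocity, mean_support = simulate_nbrw()
print(velocity, mean_support)
```

The measured velocity and support size can be compared with (28) and (30), keeping in mind that those are asymptotic statements for large $N$ and that a single run at $N = 200$ is only indicative.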
The noise term in (31) is a random number with zero expectation and standard deviation of order $O(\sqrt{f/N})$, depending on the density.

Footnote. The value of $N[f(y, t+1) - f(y, t)]$ before the culling phase is (the number of new firms innovating from $y-1$) minus (the number of firms innovating to $y+1$) plus (the number of imitators). These three terms are independent Binomial random variables, and so one finds that the exact expression for the standard deviation of the noise term in (31) is $\sqrt{[f(y-1, t)\, a(1-a) + f(y, t)\, a(1-a) + f(y, t)\, \mu(1-\mu)]/N}$.

As $N \to \infty$, in the so-called hydrodynamic limit, the noise term in (31) is expected to disappear. While there is no rigorous result concerning this hydrodynamic limit for the $N$-BRW, such a result exists for two closely related models; see Durrett and Remenik (2011) and De Masi, Ferrari, Presutti and Soprano-Loto (2019). In the first model, time is continuous, and at rate 1 each particle creates an additional particle at a random distance $\epsilon$; when this occurs, the lowest particle is removed to keep the population constant. The second model is the $N$-BBM, which can be described as follows: time and space are continuous, and $N$ particles perform independent Brownian motions. At rate 1, each particle creates an offspring at its own position ($\epsilon = 0$); when this occurs, the lowest particle is removed to keep the population constant.

The equation obtained in this large $N$ limit, as given by (31) without the noise term, is reminiscent of the model described in section 2.2. The only remaining difference is that in section 2.2, the imitation rate was tuned at each time step in such a way that $y_{\min}(t)$ would increase by exactly one unit at each time step.
In (31) (with or without the noise term), the imitation rate $\mu$ is fixed exogenously and, depending on its value, the lower bound $y_{\min}(t)$ can increase by several units in a time step or take several time steps to increase by one unit.

The evolution equation is perhaps easier to write in terms of $h(y, t) = \sum_{z \ge y} f(z, t)$, which represents the fraction of firms with a quality level at least $y$. One checks that
(32) $h(y, t+1) = \min\big[ 1,\ (1 - a + \mu) h(y, t) + a h(y-1, t) + (\text{noise}) \big]$,
where, here again, the noise term disappears in the large $N$ limit. Without the noise term, (32) is the discrete-time version of the equation studied in [8], which was shown to display most of the characteristics of the Fisher-KPP equation. With the noise term, it is very similar to the equations studied in Brunet and Derrida (1997), Brunet and Derrida (2001) and Brunet, Derrida, Mueller and Munier (2006), as well as in Mueller, Mytnik and Quastel (2011), which deals with continuous time and space.

As suggested by Brunet and Derrida (1997), the velocity (28) of the noisy front (and thus of the $N$-BRW) could be obtained to the $1/L^2$ order by replacing the noise term in (32) by a cutoff of order $1/N$, meaning that after each time step the value of $h$ is set to 0 at all the positions $y$ where the evolution equation leads to a result smaller than $1/N$. Furthermore, the shape of the front at large times is, for large $N$ (and hence large $L = (\log N)/\gamma_c$), large $z$ and large $L - z$ (so that $z$ is not too close to 0 or to $L$), approximately given by
(33) $h(y_{\min}(t) + z, t) \approx A L \sin\Big(\frac{\pi z}{L}\Big) e^{-\gamma_c z}$.
Notice then that, to leading order, the density $f(y, t) = h(y, t) - h(y+1, t)$ is given by the same equation with the prefactor $A$ replaced by $A(1 - e^{-\gamma_c})$.

The shape of the front (33) is for the front equation (31) with the noise replaced by a cutoff.
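The cutoff equation itself is straightforward to iterate. The sketch below is a rough illustration, with hypothetical names and parameter values and an assumed step initial condition ($h = 1$ for $y \le 0$, $h = 0$ for $y > 0$); it is not the numerical scheme of the cited papers. It iterates (32) without the noise term, sets $h$ to 0 wherever the update falls below $1/N$, and estimates the front velocity from the growth of $\sum_{y \ge 1} h(y, t)$. For $a = \mu = 1/4$, equation (27) gives $\gamma_c = \log 4$ and $v_c = 1/2$, so the measured value can be compared with $1/2$ and with the correction in (28).

```python
def cutoff_front_velocity(a=0.25, mu=0.25, n_firms=10**6, steps=2000):
    """Iterate (32) without noise, zeroing h wherever it falls below 1/N."""
    cutoff = 1.0 / n_firms
    size = steps + 2              # the front advances by at most one site per step
    h = [0.0] * size              # h[y] for sites y = 0, 1, ..., size - 1
    h[0] = 1.0                    # site 0 (and everything behind it) stays saturated
    for _ in range(steps):
        new_h = [1.0] + [0.0] * (size - 1)
        for y in range(1, size):
            if h[y - 1] == 0.0:
                break             # h is non-increasing in y: the rest is still empty
            value = min(1.0, (1.0 - a + mu) * h[y] + a * h[y - 1])
            new_h[y] = value if value >= cutoff else 0.0   # the cutoff rule
        h = new_h
    front_position = sum(h[1:])   # total "mass" ahead of site 0
    return front_position / steps


v_cut = cutoff_front_velocity()
print(v_cut)
```

Measuring the front position through the sum of $h$ rather than through the last occupied site is a design choice of this sketch: it varies more smoothly in time on a lattice.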
For the $N$-BRW model itself, as described by equation (31) with its noise term, Brunet, Derrida, Mueller and Munier (2006) give the following non-rigorous phenomenological description of the model. This description is supported by numerical simulation and, to some extent, by rigorous work (Maillard (2016)).

In the $N$-BRW, the shape of the front is most of the time given by the cutoff shape (33) plus some small fluctuations. Occasionally, typically every $\propto L^3$ units of time, a huge fluctuation occurs where the shape of the front is significantly different from (33) for about $\propto L^2$ units of time. Such a huge fluctuation comes about in the following way: a single particle moves up further than typical (a single firm innovates a lot in a short time). That particle branches as usual (the firm is imitated), but its "imitation offspring", i.e. its imitators, the imitators of its imitators, etc., are at first rarely removed from the system because they typically lie above the others (they have better quality than the other firms). The end effect of such a fluctuation is that a positive fraction of all the firms are replaced by the imitation offspring of the highly successful firm that started the fluctuation.

So, to reformulate, based on numerical computations for similar models: in the stationary regime, a density of firms like (33) is expected, while occasionally (every $\propto L^3$ units of time) a single firm innovates a lot and gets imitated by so many firms that it redefines the industry (in the sense that the innovation is shared by a positive fraction of the agents). The transition time to redefine the industry is of order $\propto L^2$. This is particularly interesting: at random times, a firm is so successful that a positive fraction of the industry ends up imitating it (or its imitators).

A word of caution: the results presented above are asymptotic results, which are believed to be valid for large values of $N$. It is not obvious that values of $N$ equal to moderate powers of ten are big enough for these results to be very accurate.

4.2.
The $L$-BRW model. A variant of the $N$-BRW is the $L$-BRW, which might be better suited to describe a situation where lagging firms go out of business and new firms enter the market. The evolution phase of the $L$-BRW (innovation and imitation) is the same as for the BRW or the $N$-BRW (particles innovate and are imitated), but the culling phase is different: in the $L$-BRW, at each time step, firms with a productivity lagging more than $L$ levels below the leading firm are removed from the system, as it is assumed that they are not productive enough to survive in the market. Here, the parameter $L$ is given exogenously.

In the $L$-BRW, the number $N$ of firms fluctuates. However, for large $L$, one observes that the number of firms fluctuates around an average value given by
(34) $N = e^{\gamma_c L}$,
which is formally the same relation as (29).

The heuristic argument of Brunet, Derrida, Mueller and Munier (2006) is that an $N$-BRW and an $L$-BRW have very similar typical behaviors: in the $N$-BRW, $N$ is a given parameter and $L$ (defined as the observed support size, or distance between the best and worst firm) fluctuates, while in the $L$-BRW, it is the support size $L$ which is given, and the population size $N$ fluctuates. In either case, one has the relation
$L \approx \frac{1}{\gamma_c} \log N$.
Then, one expects that the velocity $v_L$ of the $L$-BRW is given by
(35) $v_L \approx v_c - \frac{\pi^2 v''(\gamma_c)}{2 L^2}$
(compare to (28)), that the average shape of the front is given by the sine shape (33) of the cutoff theory, etc.

Footnote. An interesting question, which we postpone to another paper, is whether the sine prefactor can be observed in real data.

There is unfortunately no rigorous paper establishing these results for the $L$-BRW. However, Pain (2016) has established result (35) in the case of the $L$-BBM (where BBM stands for Branching Brownian Motion), which is a continuous version of the $L$-BRW.
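The culling rule of the $L$-BRW described above is also simple to simulate, which gives a rough feel for how the number of firms fluctuates. This is a sketch under stated assumptions only: the parameter values, the seed, the single-firm initial condition and the names `lbrw_step` and `simulate_lbrw` are hypothetical, and the rule "remove firms lagging more than $L$ levels" is taken literally, so a firm exactly $L$ levels behind the leader survives.

```python
import random


def lbrw_step(positions, a, mu, max_lag, rng):
    """Evolution by the rules of figure 2, then removal of firms lagging more than max_lag."""
    children = []
    for y in positions:
        children.append(y + 1 if rng.random() < a else y)   # innovation
        if rng.random() < mu:
            children.append(y)                              # imitation by a new entrant
    leader = max(children)
    return [y for y in children if leader - y <= max_lag]   # culling phase


def simulate_lbrw(a=0.25, mu=0.25, max_lag=4, steps=300, seed=1):
    """Run the L-BRW from a single firm and return (final positions, population sizes)."""
    rng = random.Random(seed)
    positions = [0]
    sizes = []
    for _ in range(steps):
        positions = lbrw_step(positions, a, mu, max_lag, rng)
        sizes.append(len(positions))
    return positions, sizes


lbrw_positions, lbrw_sizes = simulate_lbrw()
late_sizes = lbrw_sizes[len(lbrw_sizes) // 2:]
print(min(late_sizes), sum(late_sizes) / len(late_sizes), max(late_sizes))
```

For $a = \mu = 1/4$ one has $\gamma_c = \log 4$, so (34) reads $N = 4^L$. Since (34) is a statement for large $L$, a comparison at $L = 4$ is only indicative.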
In the $L$-BBM, particles perform Brownian motions. With rate 1, a particle is replaced by two particles, and any particle at a distance more than $L$ from the highest particle is removed. The fact that (35) holds for the $L$-BBM, together with the close similarity between the $L$-BBM and the $L$-BRW, is a strong indication that the heuristic arguments are correct.

5. Conclusion

We model the dynamics of technology diffusion to characterize shapes of the stationary firm productivity distributions with a skew, and explore conditions that will lead to compact productivity supports. Innovation and imitation activities move the productivity distribution forward, and compact supports can be sustained as competition causes the low productivity firms to exit. Section 2 provides a model generating skewed productivity distributions with compact support. It relies on an endogenously determined finite productivity ladder, sustained by a fraction of firms that can leapfrog to the frontier. Section 4 introduces models with either a finite number $N$ of firms, or a finite productivity support $L$. In both cases the support of the productivity distribution remains compact; in the former case the length of the support is stochastic while the number of firms is constant, and in the latter the support length is fixed while the number of firms fluctuates.

References

[1] P. Aghion and P. Howitt, A model of growth through creative destruction, Econometrica (1992), 323-351.
[2] U. Akcigit and W. R. Kerr, Growth through heterogeneous innovations, Journal of Political Economy (2016), 1374-1443.
[3] J. Benhabib, J. Perla and C. Tonetti, Catch-up and fall-back through innovation and imitation, Journal of Economic Growth (2014), 1-35.
[4] J. Benhabib, J. Perla and C. Tonetti, Reconciling models of diffusion and innovation: A theory of the productivity distribution and technology frontier, NBER WP [...].
[5] J. Bérard and J.-B. Gouéré, Brunet-Derrida behavior of branching-selection particle systems on the line, Commun. Math. Phys.
(2010), 323-342.
[6] C. P. Bonini and H. Simon, The size distribution of business firms, American Economic Review (1958), 607-617.
[7] M. Bramson, Convergence of solutions of the Kolmogorov equation to travelling waves, American Mathematical Society, Providence, RI, 1983.
[8] E. Brunet and B. Derrida, An exactly solvable travelling wave equation in the Fisher KPP class, Journal of Statistical Physics (2015), 801-820.
[9] E. Brunet and B. Derrida, Shift in the velocity of a front due to a cutoff, Physical Review E (1997), 2597-2604.
[10] E. Brunet and B. Derrida, Effect of microscopic noise on front propagation, Journal of Statistical Physics (2001), 269-282.
[11] E. Brunet, B. Derrida, A. H. Mueller and S. Munier, Phenomenological theory giving the full statistics of the position of fluctuating pulled fronts, Phys. Rev. E (2006), 056126.
[12] E. Brunet, B. Derrida, A. H. Mueller and S. Munier, Noisy traveling waves: Effect of selection on genealogies, Europhys. Lett. (2006), 1-7.
[13] E. Brunet, Some aspects of the Fisher-KPP equation and the branching Brownian motion, Habilitation à diriger des recherches (2016).
[14] F. J. Buera and R. E. Lucas, Idea flows and economic growth, Annual Review of Economics 10 (2018), 315-345.
[15] A. De Masi, P. A. Ferrari, E. Presutti and N. Soprano-Loto, Non local branching Brownians with annihilation and free boundary problems, Electron. J. Probab. (2019), 1-30.
[16] R. Durrett and D. Remenik, Brunet-Derrida particle systems, free boundary problems and Wiener-Hopf equations, Ann. Probab. (2011), 2043-2078.
[17] N. Gantert, Y. Hu and Z. Shi, Asymptotics for the survival probability in a killed branching random walk, Annales de l'I.H.P. Probabilités et Statistiques (2011), 111-129.
[18] H. A. Hopenhayn, Entry, exit, and firm dynamics in long run equilibrium, Econometrica (1992), 1127-1150.
[19] C.-T. Hsieh and P. J.
Klenow, Misallocation and manufacturing TFP in China and India, The Quarterly Journal of Economics (2009), 1403-1448.
[20] B. Jovanovic and R. Rob, The growth and diffusion of knowledge, Review of Economic Studies (1989), 569-582.
[21] T. J. Klette and S. Kortum, Innovating firms and aggregate innovation, Journal of Political Economy (2004), 986-1018.
[22] M. D. König, J. Lorenz and F. Zilibotti, Innovation vs. imitation and the evolution of productivity distributions, Theoretical Economics (2016), 1053-1102.
[23] A. Kolmogorov, I. Petrovsky and N. Piscounov, Étude de l'équation de la diffusion avec croissance de la quantité de matière et son application à un problème biologique, Bull. Univ. État Moscou, A (1937), 1-25.
[24] L. Ljungqvist and T. J. Sargent, Recursive macroeconomic theory, MIT Press, Cambridge, MA, 2018.
[25] R. E. Lucas, Ideas and growth, Economica (2009), 1-19.
[26] R. E. Lucas and B. Moll, Knowledge growth and the allocation of time, Journal of Political Economy (2014), 1-51.
[27] R. E. Lucas and N. L. Stokey, Recursive methods in economic dynamics, Harvard University Press, Cambridge, MA, 1989.
[28] E. G. J. Luttmer, Selection, growth, and the size distribution of firms, Quarterly Journal of Economics (2007), 1103-1144.
[29] E. G. J. Luttmer, Eventually, noise and imitation implies balanced growth, working paper 699, Federal Reserve Bank of Minneapolis (2012).
[30] P. Maillard, Speed and fluctuations of N-particle branching Brownian motion with spatial selection, Probability Theory and Related Fields (2016), 1061-1173.
[31] H. P. McKean, Applications of Brownian motion to the equation of Kolmogorov-Petrovski-Piscounov, Commun. Pure Appl. Math. (1975), 323-331.
[32] C. Müller, L. Mytnik and J. Quastel, Effect of noise on front propagation in reaction-diffusion equations of KPP type, Invent. Math. (2011), 405-453.
[33] M. Pain, Velocity of the L-branching Brownian motion, Electron. J.
Probab. 21 (2016), 1-28.
[34] J. Perla and C. Tonetti, Equilibrium imitation and growth, Journal of Political Economy (2014), 52-76.
[35] P. Romer, Endogenous technical change, Journal of Political Economy, Part 2 (1990), S71-S102.
[36] M. Staley, Growth and the diffusion of ideas, Journal of Mathematical Economics (2011), 470-478.
[37] P. Segerstrom, Innovation, imitation and economic growth, Journal of Political Economy (1991), 807-27.
[38] C. Syverson, Market structure and productivity: A concrete example, Journal of Political Economy (2004), 1181-1222.