A Quantitative History of A.I. Research in the United States and China

Abstract

Motivated by recent interest in the status and consequences of competition between the U.S. and China in A.I. research, we analyze 60 years of abstract data scraped from Scopus to explore and quantify trends in publications on A.I. topics from institutions affiliated with each country. We find the total volume of publications produced in both countries grows with a remarkable regularity over tens of years. While China initially experienced faster growth in publication volume than the U.S., growth slowed in China when it reached parity with the U.S. and the growth rates of both countries are now similar. We also see both countries undergo a seismic shift in topic choice around 1990, and connect this to an explosion of interest in neural network methods. Finally, we see evidence that between 2000 and 2010, China's topic choice tended to lag that of the U.S. but that in recent decades the topic portfolios have come into closer alignment.



Daniel Ish, Andrew Lohn, and Christian Curriden
RAND Corporation, 1776 Main Street, Santa Monica, CA, USA
September 2019

Introduction

In July 2017, the State Council of the People's Republic of China announced its goal to make China the world leader in A.I. technology by 2030.[1] This interest in developing AI technology is easily understood by considering present assessments of potential economic benefits[2] and recent interest in the defense applications of this technology.[3] In the United States, this statement was received in some quarters as indicating a competitive approach to AI on the part of China and heralding a new period of technological competition.[4] Naturally, as with any competition, the question of who is ahead has piqued the interest of a number of observers.

A number of groups across both the US and China produce quantitative analyses of the state and recent evolution of the AI research and technology communities.[5, 6, 7, 8] Generally speaking, these analyses treat publication count and impact as a proxy for the productivity of a given academic community and do tend to engage, to one extent or another, with the question of which countries have more robust AI communities. Some even attempt to directly answer the question of who is ahead in the AI race between the U.S. and China, such as recent work from the Allen Institute that takes scientific publications partitioned by impact factor as its unit of analysis.[5] These analyses neglect any consideration of the structural features likely present in these communities, however, like the long-known dynamics of the growth of publication volume[9] or the dynamics of knowledge sharing through social networks[10, 11] and across geographic[12, 13, 14], institutional[15] or disciplinary[16] boundaries. 
Nor, indeed, do these analyses grapple with the difficult question of how research activities provide benefits to the communities in which they take place.[17, 18, 19] Without putting analyses of the AI research communities in the US and China in the context of their internal dynamics, the relationship between the two communities and the means by which they provide value to their host communities, one cannot get a meaningful picture of who is "ahead," the rate of change of the positions of the two communities or, indeed, the extent to which the picture of two independent entities in competition is even meaningful. Conversely, to the authors' knowledge, the scientometric literature has relatively little to say about how to think about the scientific dimensions of nation-state competition.

This work is intended as a first step towards a rigorous understanding of the scientific dimensions of nation-state competition. Looking at long term trends in A.I. publications in the U.S. and China, we will see that both countries have exhibited remarkably stable exponential growth in the volume of publications affiliated with their institutions. The single exception we observe is a reduction in the rate of this exponential growth in institutions affiliated with China when their publication volume first began to match that of the U.S. between 2008 and 2009. We also report a metric for the overall similarity between the choice of topics in these two countries over time, and find evidence of a dramatic and persistent shift in the topic choices of both countries individually around 1990. Across this shift, we notice that China's choice of research topics through the 1990s and 2000s generally lagged that of the U.S., with our metric suggesting that until 2010 the choice of topics in China's research portfolio tended to more closely resemble that of the U.S. in previous years than that of the U.S. in the corresponding year. 
We offer a possible explanation for these phenomena, noting a dramatic increase in the proportion of publications on neural networks across both countries during the 1990s which then receded over time to be overtaken by a more diverse portfolio of topics. This shift was especially strong in China, where we note a marked overall reduction in the diversity of the topic portfolio during the 1990s. Both our similarity and diversity metrics, together with direct inspection of the popularity of selected topics, suggest that starting in 2010 these gaps closed and the modern topic portfolio of China bears close resemblance to that of the U.S.

Our results do not show any signals which map cleanly onto the onset of a competitive relationship between the two research communities. On the contrary, our results suggest that the A.I. research communities in these two countries appear to be closely linked and develop without being affected by the relationship between their governments. The apparent shift of China's publication growth when it first matched that of the U.S., the presence of the dramatic topic shift corresponding to an increase in focus on neural network methods in 1990 across both countries, and the evolution of China's topic choice to more closely match that of the U.S. all suggest that the two research communities interact and affect each other's development. Subsequent research is needed to determine whether this phenomenon is borne out in more granular co-authorship and citation data and to begin to address questions of the value each research community is providing to its host country above that which it provides to the international community. Further research is also needed to determine the degree to which these results are specific to the AI community. 
For example, robust private sector interest in AI technology[8] could be swamping any effect due to national priorities, which could be explored by comparing our results here with a comparable study of a technology more closely affiliated with government priorities, e.g. aerospace research and development. Beyond explorations of the role of science and technology in international competition, the metrics presented here form a lightweight suite of techniques for exploring the relationship of research communities across time and boundaries that is complementary to studies of citation and co-authorship.

Defining the boundaries of the field of artificial intelligence can be a research question unto itself.[6] Typically, AI is defined as the set of technologies which allow computers to complete tasks which seem to require intelligence, most frequently though not exclusively through the use of machine learning, a set of techniques for using computers to discover statistical relationships in data.[20] This includes tasks like image and speech recognition and natural language processing.[6] This focus on machine learning techniques is a modern development in the field, with older research utilizing a wide variety of techniques, including formal logic.[21]

The subject of our study is data scraped from the web interface of Scopus, an abstract and citation database run by Elsevier. We automatically queried Scopus for articles affiliated with China or the U.S. with certain A.I. related keywords in the title, abstract or keyword fields and recorded the number of results by year. Scopus defines article country affiliation as the country in which the author's affiliated institution is located. Keywords were generated from the methodologies described in the scikit-learn documentation as of January 2019 and topics listed in the tutorials and workshops at the 2018 NeurIPS, ICML, and AAAI conferences. Scikit-learn was the second most popular open-source machine learning software in 2019 as measured by cumulative GitHub "stars"[8] and contains extensive documentation describing the wide variety of techniques supported. NeurIPS, ICML and AAAI were the largest, third largest and fifth largest AI conferences by attendance in 2019, with NeurIPS and ICML exhibiting the fastest growth.[8] We included terms from these resources among our search keywords if, in the judgment of the authors, they represented a topic or method specific to AI research. 
We kept results starting in 1981 for papers affiliated with China, since 1980 was the latest year for which we did not find any publications associated with our keywords and affiliated with China on Scopus. For the U.S., we kept results starting in 1962, as this was the earliest year for which we found data. For both countries, we only analyzed data up to 2018, since that was the last full year of data when the analysis was conducted.

Before proceeding with our analysis, we offer a few caveats about the quality of the data we obtained this way. The first and most glaring issue is that our data collection methodology does not capture any information about publications that are not written in English. As such, our estimates of publication volume affiliated with China are probably a significant underestimate and it is possible that the overall topic portfolio of research affiliated with China is different from what we see in our data. Indeed, while our search for "machine learning" returned , articles affiliated with China on Scopus, a search for the Mandarin equivalent ("机器学习") on CNKI, a repository of Chinese scholarly publications, returned , results, suggesting rough parity between the size of the English and Mandarin literatures. In their recent work tracking China's overall scientific output, Qingnan Xie and Richard Freeman found this to be true of scientific articles in general.[22] We did not collect information on citations for each paper, so we have no proxy for the quality or impact of a particular paper. Our analysis also relies on assigning national affiliation by institution, which elides the tricky question of how to assign the research production of, for example, a Chinese national affiliated with a U.S. university or a U.S. national affiliated with a Chinese A.I. research firm. 
While this omission seems especially acute in our dataset, it is emblematic of the difficulty inherent in analyzing entangled international institutions like research communities in the frame of competition between nation states.

We should also point out that the way we assembled our dataset and chose our keywords is somewhat ad hoc and biased towards the modern conception of the field in the U.S. technology sector. One might reasonably be concerned about missing research on substantially similar topics, particularly the farther into the past one looks, possibly even connected to the citation network of some of the papers we do assemble. Based on some checks performed by hand, it is also likely that some number of publications which the authors would not consider artificial intelligence or machine learning papers made it into the dataset. For example, we observed psychology papers among the results for our search of "transfer learning." We are somewhat concerned about the implications for our results on the keyword "logistic regression" in particular, since a large number of other fields of science use logistic regression as a method for statistical inference. Ultimately, this presents a definitional problem like that tackled in the Elsevier study[6] and we proceed under the theory that further manual tweaks to our dataset to try to address these deficiencies are as likely to introduce additional bias as remove any. For this work, we leave it to the reader to decide how substantively they believe these considerations affect the results presented and look forward to future work which explores this topic with a more robust data collection process.

As a simplifying assumption to enable our analysis, we take each of these keywords to be mutually exclusive. That is, we assume that the papers returned in the search for each keyword do not appear in the searches for any other keyword. Put another way, we ignore any double counting that has occurred in the process of assembling our data. 
This is certainly not true in reality (e.g. machine vision papers are likely to utilize neural networks), but is hopefully close enough to true as a first approximation to provide some insight.

To address the question of trends in Chinese and U.S. publication volume relative to one another, we develop two metrics. The first of these is simply the total number of publications we observed under the assumption of mutual exclusivity. That is, if $n_C(y, k)$ (resp. $n_U(y, k)$) is the total number of publications observed with keyword $k$ in year $y$ from China (resp. the U.S.), we track

$$n_I(y) = \sum_k n_I(y, k) \qquad (1)$$

as our proxy for the total number of papers produced, where $I = C$ or $U$. A visual inspection of the data suggests that the total volume for the U.S. is exponential over the whole period of study, so we fit it to the model

$$\ln(n_U(y)) = m_U y + b_U + \epsilon_U(y) \qquad (2)$$

Footnote: On 8/21/2019.

[Figure 1 panels: "Sum of Publication Number over Keywords" (total number vs. year, 1960 to 2020; series: US, US fit, China, China fit) and "Lag Averaged over Keywords" (years of lag vs. US year; series: lag, fit).]
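The total volume of Equation 1 and the log-linear fit of Equation 2 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `total_volume` and `fit_log_linear` and the `toy_counts` data are hypothetical, and the fit uses the closed-form ordinary least squares solution for a single regressor.

```python
import math

def total_volume(counts):
    """Equation 1: sum keyword counts per year.

    `counts` maps year -> {keyword: number of publications}.
    """
    return {year: sum(by_kw.values()) for year, by_kw in counts.items()}

def fit_log_linear(volume):
    """OLS fit of ln(n(y)) = m*y + b; returns (m, b).

    Years with zero volume are skipped because ln(0) is undefined.
    """
    pts = [(y, math.log(n)) for y, n in sorted(volume.items()) if n > 0]
    mean_y = sum(y for y, _ in pts) / len(pts)
    mean_l = sum(l for _, l in pts) / len(pts)
    sxx = sum((y - mean_y) ** 2 for y, _ in pts)
    sxy = sum((y - mean_y) * (l - mean_l) for y, l in pts)
    m = sxy / sxx
    b = mean_l - m * mean_y
    return m, b

# Synthetic, illustrative counts (not the paper's data): volume doubles yearly.
toy_counts = {
    2000: {"neural network": 3, "decision tree": 1},
    2001: {"neural network": 6, "decision tree": 2},
    2002: {"neural network": 12, "decision tree": 4},
}
toy_volume = total_volume(toy_counts)
slope, intercept = fit_log_linear(toy_volume)
print(toy_volume, slope)
```

Because the toy volume doubles each year, the fitted slope should come out close to ln 2. The sketch returns only point estimates; the standard errors reported in the paper require a separate autocorrelation-robust calculation.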

Fit      Value      Std. Err.   t-statistic
$m_U$
$m_C$
$\mu_C$  -0.2210    0.015       -14.756
$m_\delta$  -0.5964    0.008       -76.245

Figure 1: Plots and linear fits of keyword-averaged lag and total volume. Note the logarithmic scale on the plots of total volume.

For China, the data also appears to follow an exponential growth pattern, though there is some suggestion that the rate of growth of the exponential changes when the publication volume of China first matches that of the U.S. between 2008 and 2009. So, we fit the total volume data for China to the model

$$\ln(n_C(y)) = m_C y + \mu_C (y - y_0)_+ + b_C + \epsilon_C(y) \qquad (3)$$

where

$$(x)_+ = \begin{cases} x & x \geq 0 \\ 0 & x < 0 \end{cases} \qquad (4)$$

and $y_0$ is chosen to be the point in time where we estimate that Chinese total publication volume first met that of the U.S. by linear interpolation between the data points in 2008 and 2009. $\mu_C$ is the magnitude of the change in slope of the log-linear relationship, i.e. the putative change in the rate of growth of the exponential.

The second metric is somewhat more circuitous, but does not depend on our assumption of mutual exclusivity. We define the "lag" of the keyword $k$ at year $y$, $\delta_k(y)$, to be the smallest integer such that

$$n_C(y + \delta_k(y), k) \geq n_U(y, k) \qquad (5)$$

if such an integer exists. That is, the lag is the number of years by which Chinese publication volume is behind the U.S. publication volume in keyword $k$. For example, in 1987, the U.S. volume of papers with the keyword "neural network" was , and China did not meet or exceed that volume of papers with the keyword "neural network" until 1993 when we observed such papers. Thus the lag for the year 1987 and keyword "neural network" was 6 years. We then calculate the lag averaged over keywords

$$\delta(y) = \frac{1}{m(y)} \sum_k \delta_k(y) \qquad (6)$$

where $m(y)$ is the number of keywords in year $y$ for which $\delta_k(y)$ exists, and fit it to the model

$$\delta(y) = m_\delta y + b_\delta + \varepsilon(y) \qquad (7)$$

The relationship between total volume and lag is somewhat complicated. 
If every keyword were individually exponential in time with the same rate of growth and $\mu_C = 0$ (which is not the case in our data), we would expect that

$$m_\delta = \frac{m_U - m_C}{m_C} \qquad (8)$$

That is, if all the keywords grew at the same rate the slope of the lag would be related to the rates of growth of the total volumes by Equation 8. (Setting $m_C (y + \delta) + b_C = m_U y + b_U$ and solving for $\delta$ gives a lag that is linear in $y$ with this slope.) However, this relationship is no longer guaranteed under weaker assumptions. To see this, note that the lag weights each keyword individually while the total volume fit weights them by their observed volume. That is, this relationship could be violated if our data had a large number of low-volume keywords and a few high-volume keywords with radically different time dependences. Put another way, the lag incorporates some information about how the keyword distributions compare and some information about the volume over time within each keyword and, generally speaking, rewards diverse research portfolios. The lag would remain large, for example, if China's growth to match the U.S. in total publication volume occurred only within a few keywords with the U.S. continuing to lead by large numbers of years in many other keywords. Furthermore, it is mathematically possible for the lag to not assume a linear dependence on time despite the total volumes from each country remaining exponential in time.

We obtain the fit parameters for all these models with the ordinary least squares estimator, as this estimator remains consistent under any structure for the covariance matrix of the errors. 
The Durbin-Watson statistic of each of these fits suggests positive autocorrelation, however, so to quantify the extent to which these fit parameters differ from zero we use a Bartlett kernel HAC estimator with bandwidth $\sqrt{N}$, where $N$ is the number of data points, to calculate the standard error of each of the fit parameters.[23] Thus, the "t-statistic" reported does not have a t distribution in finite samples, though it remains asymptotically normally distributed.

In general, these fits are quite convincing. The t-statistics for the parameters of interest listed in the table in Figure 1 are all quite large, assuaging any anxiety we might have about the finite sample distributions of these quantities. This analysis suggests that publication volume on these topics in both countries grew exponentially in time, and that the time constant of China's growth shifted to come into closer alignment with that of the U.S. when the two countries began to have similar publication volumes. This could be due, for example, to the growth of both countries' publications being limited by a third independent process, e.g. the growth in the number of academic journals accepting publications on these topics. The right-hand quantity in Equation 8 is more negative than $m_\delta$ by more than times the standard error in $m_\delta$. This is easily explained by our subsequent explorations of the keyword distribution, as China's heavy focus on neural networks relative to the U.S. during its period of initial growth likely resulted in a larger lag value than simply observing the volume would suggest. This results in the lag suggesting that the moment China "caught up" to the U.S. was around mid-2011, later than the date suggested by the total volume, in mid-2008. 
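The per-keyword lag of Equation 5 and its keyword average in Equation 6 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the names `lag` and `mean_lag` are hypothetical, the counts are invented for the example, and the code reads "smallest integer" literally, so a negative lag is returned when China reached the U.S. volume in an earlier year.

```python
def lag(n_china, n_us, year):
    """Equation 5: smallest integer d with n_china[year + d] >= n_us[year].

    `n_china` and `n_us` map year -> publication count for one keyword.
    Returns None when no such integer exists in the available data.
    """
    target = n_us.get(year)
    if target is None:
        return None
    candidates = [y - year for y, n in n_china.items() if n >= target]
    return min(candidates) if candidates else None

def mean_lag(china, us, year):
    """Equation 6: average of the lag over keywords for which it exists.

    `china` and `us` map keyword -> {year: count}.
    """
    lags = [lag(china[k], us[k], year) for k in us if k in china]
    lags = [d for d in lags if d is not None]
    return sum(lags) / len(lags) if lags else None

# Invented counts for illustration only (not the paper's data).
us = {
    "neural network": {1987: 50, 1988: 80},
    "decision tree": {1987: 10},
}
china = {
    "neural network": {1990: 20, 1992: 45, 1993: 60},
    "decision tree": {1987: 2, 1989: 12},
}
print(lag(china["neural network"], us["neural network"], 1987))
print(mean_lag(china, us, 1987))
```

In this toy data China first reaches the 1987 U.S. "neural network" volume in 1993, giving a lag of 6 years, and the 1988 lag does not exist because China never reaches 80 papers in the available years.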
It is remarkable that the dynamics of the diversity of each country's portfolio interact with those of the total volume to produce a linear dependence of the lag on time, despite the complicated structure of topic choice over time we will see below.

Topic Distributions

[Figure 2 panels: "US-China Total Variation Distance" (China year vs. US year), "US-US Total Variation Distance" (US year vs. US year), "China-China Total Variation Distance" (China year vs. China year).]

Figure 2: Plots of the total variation distance across both country and year. Darker colors indicate more similar topic choice.

We turn now to attempting to quantify and explore the similarities of topic choice both between countries and over time. We interpret the quantity

$$\hat{p}_I(y, k) = \frac{n_I(y, k)}{n_I(y)} \qquad (9)$$

as being an estimate of the underlying probability for a paper produced by country $I$ in year $y$ to possess keyword $k$. We then investigate the total variation distance

$$\Delta_{I,I'}(y, y') = \frac{1}{2} \sum_k \left| \hat{p}_I(y, k) - \hat{p}_{I'}(y', k) \right| \qquad (10)$$

as our measure of the difference between the distribution of keywords of country $I$ in year $y$ and that in country $I'$ in year $y'$. Under the mutual exclusivity assumption, this quantity is the maximum difference in probability the two distributions would assign to an event.

Even absent this assumption, however, we can still regard $\hat{p}_I(y, k)$ for fixed $I$ and $y$ as a random vector of unit $\ell_1$ norm which captures some information about the distribution of topics. $\Delta$ is then simply half the $\ell_1$ distance between these vectors, which still captures some idea of the distance between the topic distribution in different years and countries. In either interpretation, smaller values of $\Delta$ indicate that the topic portfolios being compared bear a closer relationship to one another.

Looking to Figure 2, we find a remarkable amount of structure. Both countries show a marked transformation of their research portfolio around 1990 that persists to the present day. This is perhaps most striking in the case of the U.S., which has significant autocorrelation both before and after this sudden shift. We see smaller shifts take place both before and after this realignment in the U.S. portfolio, suggesting a balance of topics slowly shifting over time. 
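The keyword shares of Equation 9 and the total variation distance of Equation 10 are straightforward to compute. The sketch below is illustrative only: the function names `keyword_shares` and `total_variation` and the counts are assumptions made for the example, and a keyword missing from one distribution is treated as having share zero.

```python
def keyword_shares(counts):
    """Equation 9: fraction of one country-year's volume held by each keyword.

    `counts` maps keyword -> number of publications for one country and year.
    """
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}

def total_variation(p, q):
    """Equation 10: half the l1 distance between two keyword distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Invented counts for illustration only (not the paper's data).
p_us = keyword_shares({"neural network": 50, "decision tree": 50})
p_cn = keyword_shares({"neural network": 75, "decision tree": 25})
print(total_variation(p_us, p_cn))
```

Identical portfolios give a distance of 0 and portfolios with no keyword in common give 1, so each cell of a heat map like those in Figure 2 would be one call to `total_variation` for a pair of country-years.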
China does not show the same clear, marked autocorrelation before about 1990, but this is easily understood by returning to Figure 1 and noticing that before about 1990 China was producing fewer than 100 English language publications on these topics per year. The quantity in Equation 9 will be more noisy for smaller sample sizes, and so too will our estimate of the total variation distance between the underlying probabilities.

[Figure 3 panels: "Fraction of US Publication Volume for Selected Keywords" (fraction of publications vs. year, 1960 to 2020; keywords: discriminant analysis, factor analysis, logistic regression, machine learning, nearest neighbors, neural network, reinforcement learning) and "Fraction of Chinese Publication Volume for Selected Keywords" (fraction of publications vs. year, 1980 to 2015; keywords: birch, factor analysis, knowledge representation, nearest neighbors, neural network, principal component analysis, ridge regression, support vector machine).]

Figure 3: Plots of the fraction of publication volume for selected keywords over time. Keywords were included if they contributed at least 12% of total volume from the country in question in at least one year.

Considering the third plot in Figure 2, which depicts the total variation distance between the keyword prevalence in the U.S. and China across all available years, we see some evidence of similarity between the research programs of the two countries which carries across the large shift around 1990. Before 1990, this relationship is more noisy, likely due to the low volume of English language publications out of China during that time period. The two countries had similar research programs during the 1990s, during which time both exhibited visible autocorrelation. During the 2000s, however, China's research program bore more resemblance to the research program of the U.S. in the 1990s than that of the 2000s. In the past decade, however, China's research program has become more closely aligned with the U.S. program from a broad base of years between 2000 and the present, mirroring the autocorrelation of the current U.S. research program.

[Figure 4 panel: "Keywords with at least 1% of total volume in 2018" (fraction of papers by country, 0.00 to 0.25; series: United States, China; keywords: logistic regression, machine learning, neural network, deep learning, factor analysis, principal component analysis, support vector machine, decision tree, random forest, nearest neighbors, natural language processing, multi-agent systems, reinforcement learning, generalized linear model, discriminant analysis, k-mean, recommender systems, singular value decomposition).]

Figure 4: Fraction of total publication volume in 2018 by country for keywords which make up at least 1% of total volume of at least one country in that year.

[Figure 5 panel: "Keyword Distribution Entropy" (entropy vs. year; series: US, China).]

Figure 5: The entropy of the two publication distributions (as estimated by Equation 11) over time.

While the total variation distance provides an attractive measure of the similarity between the research programs of different countries and years which captures the contributions of all keywords, it does not easily expose which keywords actually participated in these shifts in research nor admit a clear intuition for how similar two research programs actually are given their total variation distance. Figure 3 addresses the first question, allowing us in particular to attribute the seismic shift that occurred in research focus around 1990 at least in part to the rise of neural networks. Similarly, we can see one possible explanation for the phenomenon observed in the third plot of Figure 2 in that the proportion of U.S. publication volume devoted to neural networks waned much more quickly at the turn of the millennium than that of Chinese publication volume.

Though the less popular topics which still met the criterion for inclusion in Figure 3 are different between the two countries, we can look to Figure 4 to see that the leading topics in both countries tend to appear in similar though distinct proportions, at least in 2018. The two research programs are different in emphasis rather than wholly different, at least by this analysis. This presents us with a bit more context on why the waning of popularity of neural networks in China began to bring China back into closer alignment with the U.S.: the next most popular topics are, broadly speaking, shared between both countries.

Looking at Figure 3, we see some evidence that though the U.S. shifted some focus onto neural networks, China shifted its focus much more aggressively. Some of that focus remains even today, as we can see in Figure 4, which still shows neural networks making up a greater fraction of China's publications than any single topic makes up of the U.S. publications. 
To probe the extent to which this focus impacts the overall diversity of the research portfolio in China, we study the entropy

$$S_I(y) = -\sum_k \hat{p}_I(y, k) \ln \hat{p}_I(y, k) \qquad (11)$$

of the publication distribution. If a country were to publish all of its papers on a single topic in a year, we would have $S_I(y) = 0$. With our keywords, the maximum value $S_I(y)$ can take is $\ln(64) \approx 4.16$, which would indicate a country that publishes equally on all topics.

Figure 5 confirms quantitatively one of our suspicions from looking at Figure 3. During the 1990s, the overall topic diversity of China's English language publication output declined precipitously due to its focus on neural networks. Conversely, though the U.S. entropy shows a slight decline in the 1990s, the rise of neural networks did not affect the overall diversity of U.S. publication nearly as strongly. Notably, as neural networks waned in popularity in China we see the overall diversity of publication topics rise to eventually exceed that of the U.S.

Though the U.S. and China reacted to different degrees to the recent interest in neural network methods, the data shows a dramatic consonance between the research programs in both time and topic choice. China was and remains more focused on neural network methods than the U.S., at least by this data, but both countries clearly explore the same areas and react to the same trends in research topics. Notably, most of the regions of lower total variation distance between the two countries shown in Figure 2 occur above the diagonal. That is, from about 1990 through to the present, the portfolio of topics that institutions in China produced publications on in any given year bore more resemblance to the U.S. in previous years than it did to the U.S. in that year or future years. This is consistent with a picture wherein China lagged the U.S. not just in volume but also in topic choice. 
Without further study, however, it's difficult to say whether this indicated that China's research program was actually any less effective during that time.

Further study is also needed to determine how a competitive posture between two governments affects the interaction of their scientific communities, and indeed if it does at all. If some effect is found, to what degree is it heterogeneous across disciplines? Most critically, does this effect actually correspond to each community providing benefits to its host nation above those that it provides to the international community as a whole? Though we have plenty of historical and anecdotal reason to suspect that science plays an important role in nation-state competition, at present there is no rigorous, quantitative accounting of this role.

References

[1] Paul Mozur. Beijing wants A.I. to be made in China by 2030. New York Times, 20, 2017. (Accessed 8/23/2019).

[2] Jacques Bughin, Jeongmin Seong, James Manyika, Michael Chui, and Raoul Joshi. Notes from the AI frontier: Modeling the impact of AI on the world economy. McKinsey Global Institute, 2018.

[3] Danielle C. Tarraf, William Shelton, Edward Parker, Brien Alkire, Diana Gehlhaus Carew, Justin Grana, Alexis Levedahl, Jasmin Leveille, Jared Mondschein, James Ryseff, Ali Wyne, Dan Elinoff, Edward Geist, Benjamin N. Harris, Eric Hui, Cedric Kenney, Sydne Newberry, Chandler Sachs, Peter Schirmer, Danielle Schlang, Victoria M. Smith, Abbie Tingstad, Padmaja Vedula, and Kristin Warren. The Department of Defense posture for artificial intelligence. RAND Corporation, 2019.

[4] National Security Commission on Artificial Intelligence. Interim report. 2019.

[5] Field Cady and Oren Etzioni. China may overtake US in AI research. Allen Institute for Artificial Intelligence, https://medium.com/ai2-blog/china-to-overtake-us-in-ai-research-8b6b1fe30595 (Accessed 8/23/2019), 2019.

[6] Artificial intelligence: How knowledge is created, transferred, and used. Technical report, Elsevier, 2018.

[7] Global artificial intelligence industry data report [全球人工智能产业数据报告]. Technical report, China Academy of Information and Communications Technology Data Research Center [中国信息通信研究院数据研究中心], 2019. (Accessed 8/22/19), Translation by Joy Dantong Ma and Jeffrey Ding available at https://chinai.substack.com/, (Accessed 8/22/19), see issue

[8] AI Index Steering Committee, Human-Centered AI Institute, Stanford University, Stanford, CA, 2019.

[9] Derek John de Solla Price. Little science, big science. Columbia University Press New York, 1965.

[10] Mark EJ Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.

[11] Yansong Hu. Hyperlinked actors in the global knowledge communities and diffusion of innovation tools in nascent industrial field. Technovation, 33(2-3):38–49, 2013.

[12] Behlül Üsdiken and Yorgo Pasadeos. Organizational analysis in North America and Europe: A comparison of co-citation networks. Organization Studies, 16(3):503–526, 1995.

[13] Jasjit Singh and Matt Marx. Geographic constraints on knowledge spillovers: Political borders vs. spatial proximity. Management Science, 59(9):2056–2078, 2013.

[14] Adam B Jaffe, Manuel Trajtenberg, and Rebecca Henderson. Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3):577–598, 1993.

[15] Adam B. Jaffe and Manuel Trajtenberg. Flows of knowledge from universities and federal laboratories: Modeling the flow of patent citations over time and across institutional and geographic boundaries. Proceedings of the National Academy of Sciences, 93(23):12671–12677, 1996.

[16] Erjia Yan. Disciplinary knowledge production and diffusion in science. Journal of the Association for Information Science and Technology, 67(9):2223–2245, 2016.

[17] Adam B Jaffe. Real effects of academic research. The American Economic Review, pages 957–970, 1989.

[18] Lee G Branstetter. Are knowledge spillovers international or intranational in scope?: Microeconometric evidence from the U.S. and Japan. Journal of International Economics, 53(1):53–79, 2001.

[19] Stefano Breschi and Francesco Lissoni. Mobility of skilled workers and co-invention networks: an anatomy of localized knowledge flows. Journal of Economic Geography, 9(4):439–468, 2009.

[20] National Academies of Sciences, Engineering, and Medicine. Implications of Artificial Intelligence for Cybersecurity: Proceedings of a Workshop. The National Academies Press, Washington, DC, 2019.

[21] Bruce G Buchanan. A (very) brief history of artificial intelligence. AI Magazine, 26(4):53–53, 2005.

[22] Qingnan Xie and Richard B Freeman. Bigger than you thought: China's contribution to scientific publications and its impact on the global economy. China & World Economy, 27(1):1–27, 2019.

[23] Yixiao Sun, Peter CB Phillips, and Sainan Jin. Optimal bandwidth selection in heteroskedasticity–autocorrelation robust testing. Econometrica, 76(1):175–194, 2008.