The Privatization of AI Research(-ers): Causes and Potential Consequences -- From university-industry interaction to public research brain-drain?
Roman Jurowetzki, Daniel Hain, Juan Mateos-Garcia, Konstantinos Stathoulopoulos
Roman Jurowetzki*, Daniel S. Hain*, Juan Mateos-Garcia†, and Konstantinos Stathoulopoulos†
* Aalborg University Business School, DK; † Nesta, UK
February 15, 2021
Abstract:
The private sector is playing an increasingly important role in basic Artificial Intelligence (AI) R&D. This phenomenon, which is reflected in the perception of a brain drain of researchers from academia to industry, is raising concerns about a privatisation of AI research which could constrain its societal benefits. We contribute to the evidence base by quantifying transition flows between industry and academia and studying their drivers and potential consequences. We find a growing net flow of researchers from academia to industry, particularly from elite institutions into technology companies such as Google, Microsoft and Facebook. Our survival regression analysis reveals that researchers working in the field of deep learning as well as those with higher average impact are more likely to transition into industry. A difference-in-differences analysis of the effect of switching into industry on a researcher’s influence, proxied by citations, indicates that an initial increase in impact declines as researchers spend more time in industry. This points at a privatisation of AI knowledge compared to a counterfactual where those high-impact researchers had remained in academia. Our findings highlight the importance of strengthening the public AI research sphere in order to ensure that the future of this powerful technology is not dominated by private interests.

Keywords: AI, university-industry interaction, researcher careers, private research, bibliometrics

Introduction

“The regulatory environment for technology is often led by the people who control the technology” -- Zoubin Ghahramani
In December 2020, renowned AI researcher Timnit Gebru was dismissed from her position as Ethics Co-Lead in Google Brain, Google’s AI research unit (Hao, 2020b). The reason was a disagreement with senior management about a conference paper where she and her co-authors outline the limitations and risks of large language models that have come to dominate AI research and become an important component of Google’s technical infrastructure (Bender et al., 2021). More specifically, the paper highlighted growing concerns about the fairness of models trained on biased and noisy internet data, their substantial environmental impacts and their limited ability to understand language as opposed to generating plausible-reading text. Gebru’s dismissal created an uproar in the AI research community: as of 1st February 2021, a letter in her support had garnered just under 7,000 signatures, including 2,695 Google employees.

Footnote: https://googlewalkout.medium.com/standing-with-dr-timnit-gebru-isupporttimnit-believeblackwomen-6dadc300d382

This controversy illustrates the increasing role that industrial labs are playing in AI research, where they are not only advancing new AI techniques but also studying their ethical challenges and socio-economic impacts. It also underscores the risk that these labs may discourage employees from pursuing research agendas that are not aligned with their commercial interests, potentially resulting in the development of AI technologies that are unfair, unsafe or unsuitable beyond the use-cases of the companies that build them. Ultimately, it bolsters the case for increasing AI research capabilities in academia and government in order to ensure that public interests can continue playing an active role in monitoring and shaping the trajectory of powerful AI systems. However, strong industry demand for AI researchers with advanced technical skills may create a brain drain from academia into industry that shrinks the pool of talent available for public interest AI research.
In this paper, we use bibliographic data to measure this flow of researchers from academia to industry, study the factors driving it and consider its potential consequences. In doing this we provide, to the best of our knowledge, the first comprehensive quantitative analysis of AI researcher flows between academia and industry, contributing to the evidence base for science policies aimed at ensuring that AI evolves following a trajectory that is consistent with the public good.

The last decade has witnessed an unprecedented acceleration in the development, diffusion and application of methods and technologies from the field of machine learning (ML) and Artificial Intelligence (AI). This has been driven by breakthroughs in the development of deep learning (DL) algorithms (LeCun et al., 2015) trained on a growing amount of public and private data (Einav and Levin, 2014). Since 2012 in particular, AI has flourished in academia and industry alike (Arthur, 2017), and is considered a likely candidate to become a general purpose technology (GPT; Trajtenberg, 2018; Goldfarb et al., 2019; Klinger et al., 2018; Bianchini et al., 2020).
One important feature of AI’s modern R&D trajectory is that private companies native to the digital economy such as Google and Facebook are playing an increasingly important role in basic research activities that used to be the domain of academia. For example, at the 2019 “Neural Information Processing Systems” (NeurIPS) conference, the main annual conference in AI and DL, Google research accounted for 167 of the accepted full papers (fractionalized by the number of authors), more than twice the amount of the second most represented institution, Stanford University (82 full papers).

Footnote: Gofman and Jin (2019) also study the AI brain drain but with a specific focus on the transitions of university professors and the subsequent (negative) impacts that their transition into industry has on the levels of entrepreneurship in their departments.
In addition to the growing amount of research output generated, industry is also playing a dominant role in the creation of research tools, platforms, and frameworks. While the first DL frameworks - Theano and Caffe - emerged out of universities, today’s most popular frameworks for deep learning - TensorFlow (Google Brain) and PyTorch (Facebook AI) - have been developed by corporate players.

This shift in the centre of gravity of AI research from academia to industry is also reflected in the career trajectory of researchers. Many star-scientists in the field of DL have over time moved to full- or part-time industry affiliations, for instance Geoffrey Hinton (Google), Yann LeCun (Facebook AI), Ian Goodfellow (Apple via Google Brain), Zoubin Ghahramani (Uber AI) or Ruslan Salakhutdinov (Apple). The scale of movement of AI researchers from academia to industry has led to concerns about an “AI brain drain” (Sample, 2017; Gofman and Jin, 2019).

There are multiple (complementary) potential explanations for the increasing participation of private sector companies in basic research activities despite the possibility they may generate spillovers that benefit their competitors.

1. Modern AI methods have to be trained on large datasets and computational infrastructures that have already been collected by these companies and are difficult to transfer to researchers in academia for technical, data protection and privacy reasons.
2. There may be a disconnect between the type of AI research undertaken in academia and the needs of industry (Arora et al., 2020) as a consequence of innovation systems failures (Gustafsson and Autio, 2011), leading private companies to take basic research activities “in their own hands”.
3. AI systems are increasingly becoming tightly integrated into the cloud infrastructure of private sector companies to help them address their own needs as well as those of third-party customers; the development of such systems may be easier to undertake in-house. In doing this, companies also seek to establish their AI systems as a de facto standard that increases the competitiveness of complementary platforms and cloud computing services.
4. The opportunity to continue doing basic research and publishing results helps private sector companies attract top talent that is intrinsically attracted to environments where it is possible to conduct creative, “blue-skies” research and gain academic esteem.
5. Large technology companies with a substantial degree of market power are able to internalize many of the externalities generated by basic research by, for example, recruiting researchers, developers and engineers who have built up their AI skills using open source tools and research results generated in industry, and acquiring start-ups that sell AI-driven products and services.

In general terms, there are several reasons to worry about an encroachment on public research agendas by the private sector. Increasing participation of private sector organisations in basic research could lead to a potential homogenisation of public and private research spheres as academic researchers respond to financial incentives to commercialise their work in a way that limits its spillovers (David, 2003; David and Hall, 2006). Further, there is no guarantee that market-led opportunities correspond to social needs (Archibugi and Filippetti, 2018) or that they take into account technology’s externalities and broader (perhaps longer-term) socioeconomic impacts.

If anything, industry-driven and dominated technological development could be expected to favor solutions that can be monetized in the short term and that utilize incumbents’ accumulated capabilities, resources, infrastructure, and other types of competitive advantages, thus making them less inclusive and posing higher barriers for new entrants (Hain and Jurowetzki, 2017). Ultimately, all this restricts the scope to steer technological development in a way that is aligned with societal goals.
As biologist Paul Berg wrote in relation to the Asilomar conference that led to a moratorium on genetic modification of humans: “the best way to respond to concerns created by emerging knowledge or early-stage technologies is for scientists from publicly funded institutions to find common cause with the wider public about the best way to regulate - as early as possible. Once scientists from corporations begin to dominate the research enterprise, it will simply be too late” (Berg, 2008).

All these concerns are heightened in the case of AI because of its potentially pervasive impact. As a strong candidate for one of the near-future’s GPTs, AI technologies are expected to cause major disruptions across multiple domains, from communication, production and transport to education and health, and more broadly in socioeconomic dynamics, for example around public attitudes to privacy, autonomy or the right to an explanation for a decision. As a still emerging technology, AI’s dominant trajectory is yet to be established, but there are increasing concerns about certain aspects of the industry-sponsored DL trajectory that has driven recent advances in the field.

Training deep neural networks requires enormous amounts of data and computing power (Marcus, 2018; Russell, 2019), often exclusively available to large industry players and costly in terms of energy use and carbon emissions (Strubell et al., 2019). While platforms and frameworks provided by industry (such as TensorFlow or PyTorch) dramatically decrease entry barriers and advance collective progress, the direction of search and effort along this trajectory reinforces the data and computation hungry DL paradigm. Strong demand for data has led researchers to exploit large online corpora that are increasingly being shown to incorporate a variety of gender and racial biases that are subsequently transmitted into the trained models and their outputs (Paullada et al., 2020).
In the field of natural language processing (NLP), pretrained language models in need of enormous resources such as “Bidirectional Encoder Representations from Transformers” (BERT, Google AI; Devlin et al., 2018) have become the de facto standard for research and industry alike, shifting attention and resources away from other “leaner” techniques. This concern was at the heart of the censored Timnit Gebru paper mentioned in the introduction.

Footnote: It should be noted that a comparably big and resource intensive model (GPT-2; Radford et al., 2019) has been open-sourced by the nonprofit research lab OpenAI, which aims at counterbalancing corporate AI with a public-spirited approach to technology development. Interestingly, as the costs of basic AI research have increased, OpenAI has been criticized for becoming more secretive and aggressive in fundraising in order to keep pace with their corporate competitors (see Hao, 2020a).

Already today, algorithmic bias has been identified as one such problem, where technologies are clearly in conflict with social values and regulations but a lack of technological insight is hindering regulation (Sweeney, 2013; Hajian et al., 2016; Zou and Schiebinger, 2018; Clark and Hadfield, 2019).
Here, we analyze the causes and discuss potential consequences of this ongoing privatization of AI research, focusing particularly on the transition of AI researchers from academia to industry. We start by assessing the scale of the phenomenon by measuring transition flows between industry and academia, and providing a descriptive account and exploratory analysis of characteristics of industry transition, research topics, and temporal dynamics.

Footnote: In recent years public agencies have launched numerous funding calls and initiatives to support AI. Yet, it remains questionable to what extent research in such a dynamic and competitive domain can be supported with the volumes of funding currently available from public funders in a way that would allow it to compete with private AI labs.

Having done this, we estimate the importance of various mechanisms that trigger these university-industry transitions including researcher characteristics, performance [...] researcher-push mechanism. Further, the increasing demand for data and infrastructure in particular fields of AI research (e.g., DL) results in a technology-push, providing incentives for AI researchers to seek an industry affiliation in order to get access to necessary resources beyond the capacities most universities offer (Ahmed and Wahed, 2020). Lastly, industry might indeed attempt to play a more active role in shaping the trajectories of AI research by recruiting star AI researchers per se, researchers associated with current key technologies, or researchers in the process of developing potentially disruptive future technologies; we refer to this as an industry-pull mechanism.

To assess the relative importance of these factors we deploy a survival model where we estimate the probability that academic AI researchers will transition into industry. In doing this, we test the effect of a range of researcher characteristics related to their preferences for academia, their topical focus, and academic success.
Finally, we attempt to quantify the effect of university-industry transition on researchers’ productivity. To do this, we match researchers who transition into industry with similar peers that remained in academia. Here, we leverage insights on the mechanisms triggering academia-industry transitions identified in the previous step. In a difference-in-difference analysis we investigate the impact of transitions on researcher outputs proxied through citations.

Footnote: We note that lack of data about potential drivers of researcher career decisions, such as salary differentials between academia and industry, makes it difficult for us to distinguish, in practice, between the researcher and technology push mechanisms we mentioned above.

We collect data from Microsoft Academic Graph (MAG), a scientometric database with more than 232 million academic documents (Wang et al., 2019). We leverage [...] machine learning, deep learning and reinforcement learning. We bound the timeframe of our analysis between 2000 and 2020 and retrieve the academic publications containing at least one of the queried fields of study (FoS). In total, we collect 786,118 AI research papers alongside their metadata such as citation count, publication year and venue, title and abstract, fields of study, author names and affiliations. These papers include peer-reviewed academic journal publications, conference proceedings and preprint collections such as arXiv, which are a popular medium of knowledge dissemination in ML and AI research. We find that 1,165,913 scholars have developed or used AI methods in their research, which has been published in 10,653 journals and presented at 3,150 conferences. We believe that this is an implausibly high number: only 294,000 authors have more than one publication in the data, suggesting potential quality issues.
For this reason, the bulk of the analysis we present below focuses on researchers whose activity is observable over five years, a restricted and more relevant sample for the analysis of career transitions.

To investigate this paper’s main research question, we construct the affiliation history of all researchers found as (co-)authors of the AI papers that we have identified. We leverage the affiliation information found on the papers and identify 10,381 unique institutional affiliations, allowing us to construct the affiliation history for all authors. Having done this, we infer the type of an affiliation (industry or non-industry) using an expansive list of terms related to academic institutions and governmental agencies, finding that 80.73% are non-industry affiliations. We use the resulting variable to identify academia-industry transitions.

Footnote: This is complemented, in our exploratory data analysis, with an alternative strategy where we match researcher affiliations with the Global Research Identifier (GRID) database using the method described in Klinger et al. (2018), providing information about the character of an organisation (in particular, whether it is a private company or an educational institution).

Analytical strategy

To investigate the phenomenon of university-industry transition in AI research, we structure our analysis in three steps. First, we perform a basic exploratory data analysis to determine the magnitude, characteristics, pattern, and trends of academia-industry transitions.

Second, we aim at identifying the drivers of university-industry transition. Here, we assume that the transitions of academics to industry do not happen at random, but are instead subject to self-selection by the researchers (researcher-push), technology and resource requirements of particular technologies (technology-push), and external selection by potential employers (industry-pull).
Using the affiliation history of all deep learning researchers who either remain in academia or at one point transit to industry, we perform a survival analysis (Cox proportional hazard model) where we model the probability of a researcher undergoing a university-industry transition in a particular year as a function of researcher characteristics, their research interactions and overall pre-transition academic performance as potential candidates for transition drivers.

Third, we perform a regression analysis of the consequences of university-industry transition in terms of research performance. To address the assumed endogenous selection of researchers that transit to industry (ca. 10%), we apply the following strategy to mimic a (quasi-)experimental setting. For every researcher that undergoes a university-industry transition, we perform a propensity-score matching (PSM) procedure to find their most similar counterpart among peers who remained in academia throughout their observed career. We then create, for every academia-remainer, an “artificial transition” point, which we define to happen after the same number of periods observed as the actual transition of their academia-industry matched peer. By doing so, we aim at constructing an empirical setting that allows us to tackle the question: “What would have happened to the researcher if she had remained in academia?”. Using this matched sample, we perform a difference-in-difference regression [...] effect on citations of university-industry transitions of researchers who undergo this transition with peers that remained in academia.

Footnote: We match researchers on their main field of study, mean number of annual publications and received citations, and gender. We also enforce that matched researchers need to have exactly the same number of periods observed in our sample.
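The matching and “artificial transition” steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Researcher` record, the function names, the greedy 1:1 matching without replacement and the assumption that propensity scores have already been estimated are all illustrative choices. The plain 2x2 difference-in-differences of means at the end is a simplification of the regression used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Researcher:
    rid: str
    score: float    # propensity score, assumed to be estimated beforehand
    periods: int    # number of yearly periods observed in the sample
    transition_period: Optional[int] = None  # None for academia-remainers


def match_with_placebo(switchers: List[Researcher],
                       stayers: List[Researcher]) -> Dict[str, Researcher]:
    """Greedy 1:1 nearest-neighbour matching without replacement (assumption).

    Candidates must have exactly the same number of observed periods as the
    switcher. The matched stayer is returned with the switcher's transition
    period attached as an artificial (placebo) transition point.
    """
    available = {s.rid: s for s in stayers}
    matches: Dict[str, Researcher] = {}
    for sw in switchers:
        candidates = [c for c in available.values() if c.periods == sw.periods]
        if not candidates:
            continue  # no admissible control: the switcher stays unmatched
        best = min(candidates, key=lambda c: abs(c.score - sw.score))
        del available[best.rid]
        matches[sw.rid] = Researcher(best.rid, best.score, best.periods,
                                     transition_period=sw.transition_period)
    return matches


def did_estimate(treated_pre: List[float], treated_post: List[float],
                 control_pre: List[float], control_post: List[float]) -> float:
    """Plain 2x2 difference-in-differences of group means."""
    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs)
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))
```

Given a switcher observed for eight periods who transitions in period five, the sketch picks the closest-scoring stayer who was also observed for eight periods and gives that stayer a placebo transition in period five, so that pre- and post-transition outcomes can be compared across the pair.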
In the following, we describe the construction of and rationale behind the variables used in our survival (transition drivers) and difference-in-difference (impact of university-industry transition) models (see Table 1 for a summary). To address remaining endogeneity concerns, all independent and control variables are lagged by one year.

Table 1: Variable Description

Variable        Model        Description

Dependent Variables
transition      Surv.        Dummy indicating the year of academia-industry transition.
citation_rank   DiD          Percentage rank of researcher's received citations in the corresponding year.

Independent Variables
DeepLearning    Surv.        Dummy variable for researcher's publication of min. 1 deep learning paper in corresponding year.
cent_dgr        Surv.        Researcher's degree centrality in overall co-publication network.
cent_dgr-ind    Surv.        Researcher's degree centrality in industry co-publication network.
switcher        DiD          Dummy variable indicating that the researcher at one point undergoes a university-industry transition.
transited       DiD          Dummy variable indicating the researcher has undergone a university-industry transition.
transited_t     DiD          Number of years since researcher's university-industry transition.

Control Variables
seniority       Surv., DiD   Years since first observed publication.
gender          Surv., DiD   Dummy variable for researcher's gender (female = 0, male = 1).
paper_n         Surv., DiD   Number of researcher's publications in corresponding year.
cit_cumln       Surv.        Cumulative number of researcher's citations (natural logarithm).
StudyField      Surv., DiD   Categorical control for most popular field of study in the researcher's publications.
Year            Surv., DiD   Categorical control for the corresponding year.
Dependent Variables
The dependent variable in the survival analysis (transition drivers) is a dichotomous indicator which takes the value of zero in the years in which a researcher was affiliated with academia in the previous year and continues to be so in this year, and takes the value of one in the year the researcher first changes to a corporate affiliation. To measure this, we use the affiliation information found in the researcher’s published papers in the corresponding year. In order to avoid being biased by short term affiliations (e.g. project based co-affiliation, internship, visiting researcher programs, random errors when extracting institutional information from paper metadata), we compute the affiliation of researchers on an annual basis, and assign it to the institution found on most papers published by the researcher in the corresponding year. In case of a draw, we prioritize affiliations in the order they are mentioned on the publication.

This allows us to identify three distinct research-career profiles over time: (i) academia-only, (ii) industry-only, and (iii) university-industry transitions. We define the latter as researchers who started their career in academia, but at one point become mainly associated with industry for at least one consecutive year. We do not further differentiate between additional career paths, for instance “academia returnees” or “serial switchers”. To derive meaningful information regarding the researchers’ career paths, we also exclude researchers that could not be observed in the MAG data for at least five years. Furthermore, due to the timeliness of the phenomenon under research, we exclude researchers who were last observed before 2015.

When analysing the effect of university-industry transitions on a researcher’s career in a difference-in-difference regression, we use the percentage rank of the researcher’s received citations in the corresponding year (cit_rank) as the dependent variable to approximate research performance and impact.
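The annual affiliation rule and the resulting transition indicator can be sketched as follows. This is an illustrative simplification rather than the authors' code: the record layout and function names are hypothetical, affiliations are assumed to be already classified by sector (the paper assigns the institution first), and years without publications are simply skipped.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One record per (paper, listed affiliation): (year, position of the
# affiliation on the paper, sector), with sector "academia" or "industry".
# This layout is an assumption made for the sketch.
Record = Tuple[int, int, str]


def annual_sector(records: List[Record]) -> Dict[int, str]:
    """Assign each year the sector found on most of that year's papers.

    Ties are broken in favour of the affiliation listed earliest on a paper.
    """
    by_year: Dict[int, List[Tuple[int, str]]] = {}
    for year, position, sector in records:
        by_year.setdefault(year, []).append((position, sector))
    result: Dict[int, str] = {}
    for year, items in by_year.items():
        counts = Counter(sector for _, sector in items)
        earliest: Dict[str, int] = {}
        for position, sector in items:
            earliest[sector] = min(position, earliest.get(sector, position))
        # Most frequent first; on equal counts, lowest listing position wins.
        result[year] = min(counts, key=lambda s: (-counts[s], earliest[s]))
    return result


def first_transition_year(sector_by_year: Dict[int, str]) -> Optional[int]:
    """Return the first industry year that follows an academic year, if any."""
    seen_academia = False
    for year in sorted(sector_by_year):
        sector = sector_by_year[year]
        if sector == "academia":
            seen_academia = True
        elif sector == "industry" and seen_academia:
            return year
    return None  # academia-only or industry-only career
```

Aggregating per year before looking for a change is what keeps a single co-affiliated paper or internship from registering as a transition.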
In this percentage rank, zero corresponds to the researcher with the lowest and one to the researcher with the highest citation rank in the corresponding year.

Independent Variables
We construct additional independent variables in the following way:
DeepLearning: A dummy variable indicating that the researcher published at least one paper in the corresponding year which includes the MAG field-of-research tag for either “Deep Learning” or one of the most related tags. Since deep learning represents a field of research where access to large amounts of data and computing power gives researchers an important competitive advantage, we expect deep learning researchers to be more likely to undergo a university-industry transition in their career.

Footnote: In this case, we include the field tags that most often co-occur together with “Deep Learning” in our corpus. These are Recurrent neural network, Time delay neural network, Types of artificial neural networks, Deep neural networks, Autoencoder, Deep belief network.

cent_dgr: The author's degree centrality in the co-publication network of papers published in the corresponding year. Edges are weighted by the number of researchers per paper, so that an increasing number of authors on a paper leads to a decreasing edge-weight attributed to that paper. This variable approximates the researcher’s current embeddedness within the research community. We expect researchers that are more embedded in the community to be better networked and more influential, and therefore attractive for industry recruiters (industry-pull).

Footnote: Here we follow Newman (2004) in assuming that a larger number of authors will lead to decreased interaction and general bonding between the authors.

cent_dgr-ind: The author's degree centrality in the co-publication network of papers published in the corresponding year, where only edges to researchers with a current industry affiliation are included. This variable approximates the researcher’s proximity to industry actors. We expect researchers that are already collaborating actively with industry to be more likely to transition into industry.

paper_n: The number of papers (co-)authored by the researcher in the corresponding year, fractionalized by the number of authors, approximating the quantity of research output.

cit_cumln: Accumulated number of citations to the researcher’s current and historical publications. Assuming cumulative citations to have a decreasing marginal effect, we transform this variable’s value by its natural logarithm.

For the difference-in-difference analysis, we create three additional independent variables, namely:

Switcher: A dummy variable indicating that the researcher at one point in time during their observable career transits from academia to industry.

Transited: A dummy variable which takes the value of zero for researchers that have not undergone a university-industry transition throughout their observable career up to the corresponding year, and one otherwise.

transited_t: The number of years passed since a researcher has undergone the university-industry transition, zero for researchers (yet) in academia.
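The network and output variables defined above could be computed along the following lines. The sketch rests on several assumptions that the text does not confirm: each paper with n authors contributes a weight of 1/(n - 1) to every co-author pair (one common reading of the Newman (2004) weighting), the result is an unnormalized weighted degree, and the logarithm is taken as ln(1 + citations) so that researchers without citations remain defined. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import AbstractSet, Dict, List


def weighted_degree(papers: List[List[str]],
                    industry: AbstractSet[str] = frozenset(),
                    industry_only: bool = False) -> Dict[str, float]:
    """Weighted degree in the co-publication network of one year.

    Each paper with n authors adds 1 / (n - 1) to the edge between every
    pair of its authors, so larger author teams yield weaker ties. With
    industry_only=True, only edges to industry-affiliated co-authors count.
    """
    degree: Dict[str, float] = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        if len(unique) < 2:
            continue  # single-author papers create no edges
        weight = 1.0 / (len(unique) - 1)
        for a, b in combinations(unique, 2):
            if not industry_only or b in industry:
                degree[a] += weight
            if not industry_only or a in industry:
                degree[b] += weight
    return dict(degree)


def fractional_paper_count(papers: List[List[str]]) -> Dict[str, float]:
    """Papers per author, each paper divided by its number of authors."""
    counts: Dict[str, float] = defaultdict(float)
    for authors in papers:
        unique = set(authors)
        for a in unique:
            counts[a] += 1.0 / len(unique)
    return dict(counts)


def log_cumulative_citations(citations: int) -> float:
    """ln(1 + citations); the +1 is an assumption to handle zero citations."""
    return math.log1p(citations)
```

Calling `weighted_degree` once with the default arguments and once with `industry_only=True` gives rough analogues of cent_dgr and cent_dgr-ind for a single year.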
Control Variables
We approximate Seniority by the number of years since we observe a researcher’s first publication in the data. We also control for the researcher’s Gender, which we infer automatically from their name using GenderAPI (Stathoulopoulos and Mateos-Garcia, 2019). This dummy variable takes the value of one for researchers who are inferred to be male. We also include categorical controls for the researcher’s main MAG field-of-research (Shen et al., 2018), where we assign the MAG field which is most often found within the categories of their publications in the corresponding year. Finally, to cover time-dependent exogenous effects, we also control for the current year.

Table 2 provides descriptive statistics and Table 3 the corresponding correlation matrix on our full dataset.
We begin our exploration of the data by considering changes in levels of overall activity (Figure 1), company participation in research (Figure 2) and thematic focus of different organisation types (Figure 3).

Footnote: Note that, because our sample only includes publications from 2000 onwards, the seniority variable is left-censored by our starting point.

[Table 3: Correlation matrix; rows include seniority, gender, DeepLearning, paper_n, cit_rank, cit_cumln, cent_dgr and cent_dgr-ind]

[...] affiliation have started capturing a larger share of research since the 2010s. This is consistent with the idea that private sector organisations are playing a stronger role in AI research although, at least in overall volume of activity, they are very far from dominant.

In Figure 3 we look at the share of all papers involving an educational institution or a company in a year that contain a field of study (focusing on the 20 most frequently occurring fields of study in the data). We note in particular that deep learning was over-represented in private sector research by comparison to academia, but educational institutions seem to have caught up in recent years. Companies are also more active in reinforcement learning and, more broadly, computer science topics. This could also be linked to the finding elsewhere in the literature that private sector companies specialise in more scalable and computationally demanding techniques than academic researchers (Klinger et al., 2020; Ahmed and Wahed, 2020), consistent with one of our hypotheses that the private sector may be a more suitable setting to pursue research in deep learning methods.

Figure 2: Organisational participation in absolute terms (top panel) and share of papers with company participation (bottom panel).
Figure 3: Share of all papers involving companies (orange point) and educational institutions (blue point) that have been assigned a field of study each year.

Transition trends

We move on to analyse the dynamics of researcher transitions, going from a macro picture that considers all transition types in the data (Figure 4) to focus on researcher flows between university and industry (Figure 5), distinguishing between academic institutions in different positions of Nature’s global university rankings (Figure 6) and finally considering the main educational sources and industrial destinations of AI research talent (Figure 7).

Figure 4 shows the changes in the composition of transitions by transition type in total (top panel) and as share of all transitions (bottom panel). It shows rapid growth in all types of researcher transitions in the AI ecosystem while underscoring that researcher mobility between academic institutions remains the dominant type of transition, reflecting the prevalence of educational institutions, at least when measured based on bibliometric data.

Figure 4: Number of researcher transitions by type (top panel) and transition types as a share of the total (bottom panel).

In Figure 5 we concentrate on researcher transitions between educational institutions and industry, taking into account that flows can go in either direction. Our analysis shows that in net terms, researcher flows favour industry, consistent with the hypothesis of a ‘brain drain’ from academia to industry, but also that there is a non-trivial number of industry researchers transitioning into academia.
One potential explanation for this, which would be worth exploring, is that having moved into industry, academic researchers do not enjoy the environment and decide to return to the public sector.

Figure 5: Researcher transitions from education to industry (blue area) and from industry to education (orange area). Net flow in black bars.

When looking at labour flows between academia and industry it is important to take into account the prestige of the organisations involved, which could be seen as a rough proxy for the ‘quality’ of the researchers involved. To do this, we have fuzzy-matched institution names from Microsoft Academic Graph with the 2020 Nature Index, which ranks institutions based on the quality of their research in the Natural Sciences. In Figure 6 we present the share of transitions from institutions in different positions of the ranking into industry (the Nature Index only includes 500 institutions, so those not included in it are labelled as ‘unranked’). The chart shows a clear and strong link between a university’s prestige and its propensity to experience a flow of researchers into industry. In particular, 25% of the AI researcher transitions from institutions in [...]

Figure 6: Share of all transitions from education to industry by year and position of university in Nature University ranking.

Figure 7 drills down further to consider what are the top educational sources of talent moving into industry and what are the top industrial destinations for graduates from those institutions. It shows that the top academic sources of talent (on the vertical axis) are prestigious institutions such as Carnegie Mellon, Stanford, Princeton, MIT etc. The top destinations for AI talent (on the horizontal axis) are tech companies, and particularly Google.
We note the rapid growth in the share of all AI researchertransitions from source institutions into Google between an early period (before 2015)and a late period (after 2015) - in many cases Google accounts for more than 10% of allresearcher transitions into industry from top institutions. We also see that Facebookhas rapidly growth in importance as a destination for AI researcher talent since 2015. We note that this strategy could be detrimental for socio-demographic diversity in the AI industrialresearch workforce, for example because it excludes graduates from historically black colleges anduniversities. .3 Characterisation of academic researchers transitioning into industry
Our strategy for defining and measuring career transitions between academia and industry yields a set of summary statistics that we present in Table 4.

Table 4: Number and characteristics of author types

author type    n        share    paper n (mean)    cit (mean)    gender
academia       54113    0.89     0.96              1.33          0.82
industry       1837     0.03     0.79              2.78          0.85
switcher       4751     0.08     1.18              4.23          0.86

Table 4 reports counts and mean values for characteristics and publication performance for the different researcher groups. Overall, our sample contains 60,701 unique AI researchers, of whom approximately 90 percent have spent their observable career up to now solely in academia, 3 percent solely in industry, and 8 percent transitioning from academia to industry. As for research productivity, we observe that AI researchers in industry are the least productive with regard to the number of papers produced per year, which is perhaps not surprising given that output in industry is measured differently than in academia. However, at least by the academic yardstick of knowledge dissemination, we see that industry researchers lag behind. At the same time, their impact in terms of received citations per paper is on average double that of academic researchers, suggesting that industry researchers might participate more selectively in the documentation of their research in the form of papers, and only do so if they deem the impact to justify the effort. Finally, transitioning AI scholars show the highest publication totals and citation averages. This could indicate “cherry picking” by the private sector of either already established or currently rising star researchers. As for the gender of the scholars, we can see that the field is mainly populated by men. Diversity is even lower in industry, and particularly inside the “switcher” group.

Footnote: Additionally, internal peer review processes such as those controversially deployed by Google may create additional filters to publication in the private sector.

Econometric analysis
In this section we investigate the drivers and mechanisms of university-industry transition in AI research. We do so by performing a survival analysis on the likelihood that an academic AI researcher will transit into industry at a particular point in time. Generally, survival analysis refers to a set of statistical techniques for investigating the time it takes for an event of interest to occur. Here we deploy a proportional hazard model (Cox, 1972), a multivariate regression technique that allows us to identify the simultaneous effect of continuous as well as categorical variables on the probability that a certain event (in this case, the transition to industry) takes place.

The results of this model are reported in Table 5. Model (1) only includes the control variables, model (2) additionally includes the deep learning dummy, model (3) adds the network-related independent variables, model (4) the research-performance-related independent variables, and finally model (5) includes all variables together.

In model (1), which includes only our basic control variables, we observe a strong negative and significant effect for seniority, indicating that industry transitions appear to happen sooner rather than later in research careers. This could mean either that researchers with a taste for industry already set themselves up for an early post-graduation transition, or that industry generally prefers promising young researchers over already established ones. The coefficient for gender is positive and significant at the 1% level, indicating that female researchers, who are already underrepresented in AI research, are less likely to transit to a career in industry. This effect persists in all following models.

The DeepLearning variable included in model (2) has a relatively high positive coefficient, significant at the 1% level.
This is in line with our initial expectations that the characteristics of this particular research field make a transition to industry more attractive (research- and technology-push), as well as with the earlier observation of the strong engagement of companies with deep learning.

Table 5: Cox Proportional Hazard Regression: Probability of university-industry transition

Dependent variable: Industry Transition

                        (1)          (2)          (3)          (4)          (5)
seniority               -…***        -…***        -…***        -…***        -…***
                        (5.113)      (5.099)      (5.068)      (5.100)      (5.113)
gender                  0.230***     …***         …***         …***         …***
                        (0.018)      (0.018)      (0.018)      (0.018)      (0.018)
DeepLearning                         0.568***                               …***
                                     (0.017)                                (0.017)
cent dgr                                          …***                      -…
                                                  (…)                       (…)
cent dgr-ind                                      …***                      -…***
                                                  (0.002)                   (0.003)
paper n                                                        -…***        -…***
                                                               (0.003)      (0.003)
cit rank                                                       …***         …***
                                                               (0.020)      (0.020)
cit cumln                                                      …***         …***
                                                               (0.004)      (0.004)
Study Field Control     Yes          Yes          Yes          Yes          Yes
Year Control            Yes          Yes          Yes          Yes          Yes
N                       479,093      479,093      479,093      479,093      479,093
Pseudo R²               …***         …***         …***         …***         …***
LR Test                 91,412***    …***         …***         …***         …***
Score (Logrank) Test    36,988***    …***         …***         …***         …***

Note: * p < …; ** p < …; *** p < 0.01. Standard errors in parentheses; … marks a value missing from the source.

[…] coefficient, thereby preliminarily lending support to our initial expectations.

Model (4) shows the impact of research-performance-related variables, measuring the current quantity (paper n) and quality (cit rank) of research output as well as accumulated reputation (cit cumln). Our results indicate that the average citation rank as well as cumulative citation numbers increase the probability of transition, while the number of papers published decreases it. This may indicate that industry favours quality over quantity in the research output of transitioning scholars, and thus provides further support for the “cherry-picking” hypothesis.

Finally, when including all variables jointly in model (5), most observed effects remain roughly unchanged. The only exceptions are the results regarding embeddedness in the AI research community, where the formerly significant and positive overall embeddedness turns insignificant. The variable measuring industry embeddedness remains significant, yet its coefficient changes direction from positive to negative. This might indicate that the positive impact we saw in model (3) was driven by the variable’s correlation with the quantity of papers (more co-authored papers result in higher centrality). When controlling for the number of papers, it turns out that, against initial expectations, industry embeddedness in research makes a transition to industry less likely. This might hint at an industry preference for hiring researchers engaged in more fundamental and basic rather than applied research.

Consequences of switching - Difference-in-Difference analysis

Finally, we investigate the consequences of university-industry transition in terms of research performance. Table 6 reports the results of a regression analysis in which we investigate the effect of university-industry transitions on research productivity, which we approximate by a researcher’s annual citation rank.
We perform this analysis in a difference-in-difference setting, where we compare the development of the scientific performance of researchers who undergo a university-industry transition (treated) with that of their counterparts who stay in academia. The dependent variable here is the researcher’s annual citation rank (citation rank) as a three-year moving average.

Due to self-selection into an industry career, switchers are expected to be systematically different from their peers remaining in academia. We address this issue by performing a difference-in-difference analysis containing the following steps. First, we perform a nearest neighbor matching, where we match every researcher in the sample who at some point transits to industry with a peer who is only observed with academic affiliations. We match these pairs on their field of study, gender, and mean number of papers published and citations received per year. We additionally require the matched pair to be observable for exactly the same number of periods.

Having done so, we attempt to empirically transform this observational study into a quasi-experimental econometric setting. In a difference-in-difference analysis, one usually matches an observation subject to an intervention (treatment) with a similar one which did not experience this intervention. However, since our sample is not stratified and is subject to left and right censoring, and since the university-industry transition happens at different points in time and at different career stages for each researcher, we cannot define one intervention point across the sample. Rather, we create a ‘pseudo-treatment’ time for every researcher remaining in academia, which is equal to the observation period in which their matched university-industry switcher transits (variable transited). We furthermore create a variable indicating the years since this transit took place (transited t).
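The construction of the pseudo-treatment variables can be illustrated with a minimal Python sketch. It assumes that matched pairs are already available; the function name `add_treatment_timing`, the dictionary layout and the interaction column names are hypothetical and introduced here only for illustration. It also assumes that the transit period itself counts as post-transition, which the text does not specify.

```python
def add_treatment_timing(panel, transit_period, matched_switcher):
    """Return a copy of `panel` with difference-in-difference timing variables.

    panel:            list of dicts with keys 'researcher', 'period', 'switcher'
    transit_period:   {switcher id: period of the move to industry}
    matched_switcher: {academic (control) id: id of the matched switcher}
    """
    rows = []
    for obs in panel:
        rid = obs["researcher"]
        # Switchers use their own transit period; matched academics borrow the
        # transit period of their matched switcher as a pseudo-treatment time.
        source = rid if obs["switcher"] else matched_switcher[rid]
        t0 = transit_period[source]
        # Assumption: the transit period itself is treated as "post".
        transited = int(obs["period"] >= t0)
        # Periods elapsed since the (pseudo-)transit, zero before it.
        transited_t = max(obs["period"] - t0, 0)
        rows.append({
            **obs,
            "transited": transited,
            "transited_t": transited_t,
            # Interaction terms: non-zero only for actual switchers.
            "switcher_x_transited": obs["switcher"] * transited,
            "switcher_x_transited_t": obs["switcher"] * transited_t,
        })
    return rows
```

For a switcher who transits in 2012 and a matched academic peer, both receive transited = 1 from 2012 onwards, but only the switcher's interaction terms are non-zero, which is what allows the interaction coefficients to separate a real transition from the pseudo transition.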
Beyond this, the models include a similar selection of independent and control variables as the ones above.

Table 6: Difference-in-Difference Regression: Effect of university-industry transition

Dependent variable: citation rank

                        (1)          (2)          (3)          (4)
switcher                -…***        -…***        -…***        -…***
                        (0.002)      (0.002)      (0.002)      (0.002)
transited               0.029***     …***         [unclear]    [unclear]
                        (0.003)      (0.003)      (0.004)      (0.003)
seniority               -…**         -…***        -…***        -…***
                        (0.0003)     (0.0003)     (0.0003)     (0.0003)
gender                  0.015***     …***         …***         …***
                        (0.003)      (0.003)      (0.003)      (0.003)
cent dgr                             …***                      …***
                                     (0.001)                   (0.001)
cent dgr-ind                         …***                      …***
                                     (0.001)                   (0.001)
transited t                                       …***         …***
                                                  (0.001)      (0.001)
switcher*transited      0.050***     …***         …***         …***
                        (0.004)      (0.003)      (0.005)      (0.005)
switcher*transited t                              -…***        -…***
                                                  (0.001)      (0.001)
Study Field Control     Yes          Yes          Yes          Yes
Year Control            Yes          Yes          Yes          Yes
N                       83,364       83,364       83,364       83,364
R²                      …            …            …            …
Adjusted R²             …            …            …            …

Note: * p < …; ** p < …; *** p < …. Standard errors in parentheses; … marks a value missing from the source.

[…] (switcher) and the periods after the transition has taken place (transited). In the next model (2) we include further controls for the researcher’s overall (cent dgr) and industry (cent dgr-ind) centrality within the co-citation network. In the following model (3), we turn our attention to the effect of university-industry transitions by including the number of periods since the researcher has transited to industry (transited t) as well as interactions between switcher and the variables indicating the post-transition period. This enables us to identify differences in citation rank between researchers after their transition has taken place and peers remaining in academia with otherwise similar characteristics, and thereby to isolate the effect of university-industry transitions on research performance. The final model (4) includes all variables jointly.

While we control for a set of variables also included in the previous survival analysis, this difference-in-difference analysis also includes the interaction terms between switcher, transited and additional variables, since they reveal the impact of a “real” industry transition as compared to the artificial “pseudo” transition of the matched peers that remain in academia.

In model (1), we only include the switcher*transited interaction term, which turns out to be significant with a positive coefficient. This reveals that the industry transition indeed appears to be conducive to research performance, placing switchers post-transition 5 percent higher in the citation ranking than their academia counterparts. Additionally controlling for embeddedness effects in model (2) does not alter the results.

In model (3), we introduce an additional interaction term, switcher*transited t, which captures the time effects of industry transition. Its negative and significant coefficient indicates that, after the initial boost in citation ranking that researchers experience following their transition, there is no continued beneficial effect in the long term. Instead, post-transition researchers lose 0.7% in their citation ranking per year compared to their academia counterparts, as illustrated in Figure 8. However, the comparably small coefficient indicates that this happens slowly, taking approximately ten years for a switcher to fall back, after the initial boost, to the same level as their counterpart remaining in academia. Again, controlling for embeddedness effects in model (4) leaves the main results unchanged.

Figure 8: Interaction plot: Model (3), switcher*transited t. Note that this graph only depicts the over-time effect and not the constant effect of the switcher*transited interaction term.
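As a reading aid, model (4) can be written schematically as follows. This is a sketch that assumes a linear specification with the regressors listed in Table 6; the symbols $\beta_k$, $\gamma$, $X_{it}$, $\delta_f$, $\tau_t$ and $\varepsilon_{it}$ are notation introduced here for illustration and do not appear in the table.

```latex
\begin{aligned}
\text{citation rank}_{it} ={}& \beta_0
  + \beta_1\,\text{switcher}_i
  + \beta_2\,\text{transited}_{it}
  + \beta_3\,\text{transited t}_{it} \\
 &+ \beta_4\,(\text{switcher}_i \times \text{transited}_{it})
  + \beta_5\,(\text{switcher}_i \times \text{transited t}_{it}) \\
 &+ \gamma^{\top} X_{it} + \delta_{f(i)} + \tau_t + \varepsilon_{it}
\end{aligned}
```

Here $X_{it}$ collects seniority, gender and the two centrality measures, while $\delta_{f(i)}$ and $\tau_t$ stand for the study field and year controls. Under this notation the effect of a real transition $k$ years after switching is $\beta_4 + \beta_5 k$. With $\beta_4 > 0 > \beta_5$, the initial boost is used up after $k^{*} = \beta_4 / |\beta_5|$ years, which is the logic behind the statement that a small yearly decline takes a long time to offset the initial gain.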
Studying the career paths of AI researchers, we shed light on the interplay between academic and corporate research in this field and provide evidence about a potential brain drain from the public sector together with its drivers and outcomes. Our primary aim is to inform science policy discussions around the development and application of AI technologies and the supply of talent required to preserve a public research space for AI, focused on the creation of AI systems independently from short-term commercial interests and in a way that is aware of ethical risks and externalities.

We show that increasing participation of the private sector in AI research has been accompanied by a growing flow of researchers from academia into industry, and especially into technology companies such as Google, Microsoft and Facebook.

The survival analysis shows that researchers working with the deep learning techniques that have driven recent advances in AI systems have a much higher likelihood of transitioning to industry. This is consistent with the idea that the private sector has been building capabilities in state-of-the-art AI systems, and it raises questions about the ability of ‘public interest’ deep learning research to keep up, especially since industry tends to recruit influential, high-impact researchers.

Scholars producing lower numbers of papers with higher impact, which we interpret as prioritising quality over quantity, are also more likely to transition. One question for further research is to what extent some of these indicators are linked to differences in publication strategies across subfields of AI.

Interestingly, stronger embedding of researchers in the overall and industry-specific research community seems to marginally reduce the transition probability when paper impact is accounted for.

Together, these results suggest that supply push and demand pull both play a role in researcher transitions from academia to industry: researchers who specialise in deep learning techniques may have incentives to pursue their careers in technology companies with the data and infrastructure required to deploy these methods, and businesses have incentives to hire them because those techniques complement their assets and business models. The private sector’s propensity to hire high-impact researchers suggests that there is an element of cherry-picking of researchers by industry which, as mentioned, could raise concerns about a hollowing-out of the talent pool for public interest AI research.

On average, researchers working in industry receive twice as many citations as scholars in academia, while publishing less. In an industry with a rather short distance from research to deployment in products and services, it is not surprising that industry players are better at selecting and funding promising research, and at promoting its relevance. However, looking at the results from the diff-in-diff analysis, we also see some indication of stagnation in the academic impact of researchers who transition to industry. This presents some similarities with the outcomes of start-ups that are acquired and absorbed by large companies that may be more interested in the implementation and exploitation of existing technology than in the exploration of entirely new trajectories. Looking at recent developments in NLP, one could argue that this is not the case: the majority of breakthrough developments (i.e. large-scale models) have come out of industrial labs in recent years. On the other hand, and this brings us back to the story that we mentioned in the introduction, it might be argued that these models are in line with the interests of large companies, while leaner approaches to state-of-the-art language processing remain unexplored. Would the situation be different if researchers who transitioned into industry had stayed in academia? That kind of counterfactual analysis is challenging. Existing research suggests that there are important differences between the research portfolios of academia and industry, but it does not consider how these differences are shaped by researcher career transitions (Klinger et al., 2020). One potential avenue to understand this would be to compare the ‘research trajectory’ of individual researchers, estimated for example through a semantic analysis of their papers before and after joining industry, with that of their peers who remain in academia.

Overall, our results, based on a comprehensive analysis of bibliographic data, support the idea of a growing flow of talent from academia to industry which may require attention from policymakers.

Future work should look into further effects of researcher transitions into industry, examining for instance potential thematic change and the diversity of themes as well as of co-authors. It is also important to explain the reasons for the gradual decline in academic impact for those researchers who transition into industry: is this a consequence of corporate policies that lead researchers to concentrate on specialised technological development activities that are less relevant for the outside community (in line with the model proposed by Rock (2019) and along the lines of our start-up harvesting analogy), or is it that, over time, researchers experience ‘industry burn-out’ and become less productive? Ultimately, and in order to answer the ‘so what’ question, we need to find ways to measure the impact of career transitions from academia into industry beyond our scholarly productivity proxies: in what ways are public interest AI research trajectories diminished when AI researchers transition into industry, and what is the opportunity cost of subsequent declines in the research productivity of switchers?

We conclude by pointing out that while strong contributions to research from private companies are commendable, it is vital to understand where complementary public investments in R&D can contribute to favourable long-term outcomes. More specifically, it is important to make sure that public research organizations remain an attractive workplace for talented AI researchers who may otherwise be attracted by lucrative positions in industry that also offer, at least in the short term, the prospect of enhanced academic impact. This requires investments in equipment and research funding, as well as well-coordinated frameworks that allow these scholars to contribute to the development of this technology and promote their contributions in the same way that marketing departments in technology companies do with their own research outputs.

We know that AI is a strong contender for being a general purpose technology, and therefore much of its development and application requires coordination between different stakeholders across disciplines. In practice, this means that it is unlikely, and perhaps undesirable from an efficiency standpoint, that public research institutes will try to replicate the open source frameworks and cloud computing infrastructures developed in industry.
Researchers in the public sector, however, have an important role to play in studying a variety of questions related to the societal suitability and impacts of AI systems (for example around fairness, security and accessibility), as well as in exploring new ideas that may provide the foundation for future AI research trajectories that are less reliant on big datasets and computational infrastructures and more environmentally sustainable, explainable and robust. A burgeoning public interest sphere conducting this research without having to balance academic integrity with commercial interests is, as the Timnit Gebru case with which we began this paper illustrates, a critical requirement for this space, and one that may be threatened by the sustained flow of researchers from academia to industry that we have evidenced in this paper.

References

Acemoglu, D. and Restrepo, P. (2019). The wrong kind of AI? Artificial intelligence and the future of labor demand. Technical report, National Bureau of Economic Research.
Ahmed, N. and Wahed, M. (2020). The de-democratization of AI: Deep learning and the compute divide in artificial intelligence research.
Archibugi, D. and Filippetti, A. (2018). The retreat of public research and its adverse consequences on innovation. Technological Forecasting and Social Change, 127:97–111.
Arora, A., Belenzon, S., Patacconi, A., and Suh, J. (2020). The changing structure of American innovation: Some cautionary remarks for economic growth. Innovation Policy and the Economy, 20(1):39–93.
Arthur, B. (2017). Where is technology taking the economy? McKinsey Quarterly, (October):1–12.
Bender, E., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big?
Berg, P. (2008). Asilomar 1975: DNA modification secured. Nature, 455(7211):290–291.
Bianchini, S., Müller, M., and Pelletier, P. (2020). Deep learning in science. arXiv preprint arXiv:2009.01575.
Clark, J. and Hadfield, G. K. (2019). Regulatory markets for AI safety. arXiv preprint arXiv:2001.00078.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202.
D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M. D., et al. (2020). Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
David, P. A. (2003). 8 Innovation and Europe’s academic institutions: second thoughts about embracing the Bayh–Dole regime. This page intentionally left blank, page 251.
David, P. A. and Hall, B. H. (2006). Property and the pursuit of knowledge: IPR issues affecting scientific research. Research Policy, 35(6).
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Einav, L. and Levin, J. (2014). Economics in the age of big data. Science, 346(6210):1243089.
Gofman, M. and Jin, Z. (2019). Artificial intelligence, human capital, and innovation. Human Capital, and Innovation (August 20, 2019).
Goldfarb, A., Gans, J., and Agrawal, A. (2019). The Economics of Artificial Intelligence: An Agenda. University of Chicago Press.
Gustafsson, R. and Autio, E. (2011). A failure trichotomy in knowledge exploration and exploitation. Research Policy, 40(6):819–831.
Hain, D. S. and Jurowetzki, R. (2017). Incremental by design? On the role of incumbents in technology niches. In Foundations of Economic Change, pages 299–332. Springer.
Hajian, S., Bonchi, F., and Castillo, C. (2016). Algorithmic bias: From discrimination discovery to fairness-aware data mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 2125–2126, New York, NY, USA. ACM.
Hao, K. (2020a). The messy, secretive reality behind OpenAI’s bid to save the world. MIT Technology Review, March/April.
Hao, K. (2020b). “I started crying”: Inside Timnit Gebru’s last days at Google.
Klinger, J., Mateos-Garcia, J., and Stathoulopoulos, K. (2018). Deep learning, deep change? Mapping the development of the artificial intelligence general purpose technology. arXiv preprint arXiv:1808.06355.
Klinger, J., Mateos-Garcia, J. C., and Stathoulopoulos, K. (2020). A narrowing of AI research? SSRN Electronic Journal.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436.
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.
Newman, M. (2004). Who is the best connected scientist? A study of scientific coauthorship networks. Complex networks, pages 337–370.
Paullada, A., Raji, I. D., Bender, E. M., Denton, E., and Hanna, A. (2020). Data and its (dis)contents: A survey of dataset development and use in machine learning research.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
Roach, M. and Sauermann, H. (2010). A taste for science? PhD scientists’ academic orientation and self-selection into research careers in industry. Research Policy, 39(3):422–434.
Rock, D. (2019). Engineering value: The returns to technological talent and investments in artificial intelligence. Available at SSRN 3427412.
Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Penguin.
Sample, I. (2017). ’We can’t compete’: why universities are losing their best AI scientists.
Shen, Z., Ma, H., and Wang, K. (2018). A web-scale system for scientific knowledge exploration. arXiv preprint arXiv:1805.12216.
Stathoulopoulos, K. and Mateos-Garcia, J. C. (2019). Gender diversity in AI research. Available at SSRN 3428240.
Strubell, E., Ganesh, A., and McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243.
Sweeney, L. (2013). Discrimination in online ad delivery. Queue, 11(3):10:10–10:29.
Trajtenberg, M. (2018). AI as the next GPT: a political-economy perspective. Technical report, National Bureau of Economic Research.
Wang, K., Shen, Z., Huang, C.-Y., Wu, C.-H., Eide, D., Dong, Y., Qian, J., Kanakia, A., Chen, A., and Rogahn, R. (2019). A review of Microsoft Academic Services for science of science studies.