An analysis of the evolution of science-technology linkage in biomedicine

Abstract

Demonstrating the practical value of public research has been an important subject in science policy. Here we present a detailed study on the evolution of the citation linkage between life science related patents and biomedical research over a 37-year period. Our analysis relies on a newly created dataset that systematically links millions of non-patent references to biomedical papers. We find a large disparity in the volume of science linkage among technology sectors, with biotechnology and drug patents dominating it. The linkage has been growing exponentially over a long period of time, doubling every 2.9 years. The U.S. has been the largest producer of cited science for years, receiving nearly half of the citations. More than half of the citations go to universities. We use a new paper-level indicator to quantify to what extent a paper is basic research or clinical medicine. We find that the cited papers are likely to be basic research, yet a significant portion of papers cited in patents that are related to FDA-approved drugs are clinical research. The U.S. National Institutes of Health continues to be an important funder of cited science. For the majority of companies, more than half of the papers cited in their patents are authored by public research institutions. Taken together, these results indicate a continuous linkage of public science to private sector inventions.


Qing Ke ∗ Northeastern University, Boston, MA 02115, USA


Keywords: patent-to-paper citation; non-patent reference; science-technology interaction; biomedical research; public science

I. INTRODUCTION

There is a longstanding policy interest in unraveling how knowledge generated from public research is used in the private sector. Studies towards this goal have heavily focused on patent data and considered citations between patents as evidence of knowledge flow. Despite some criticism [17], this notion has been widely accepted in the literature. Consequently, substantial attention has been paid to patents assigned to universities and other public organizations, examining how those patents are cited by other patents, especially by patents from companies [26, 31].

University patents, however, account for only a small portion of granted patents, and the main products of public research are scholarly papers rather than patents. Like patents, papers can also be cited by patents, and indeed both the cited patents and the cited papers serve as the “prior art” of a patent application, playing a significant role when a patent examiner determines the patentability of the application. There has been a large literature on both the patent-to-patent and the patent-to-paper citation linkage. Yet systematic studies, such as the one we present in this paper, have been relatively scarce.

Our primary interest in this work is in the life science sector. The last several decades have seen unprecedentedly rapid progress in life science, both in basic scientific discoveries and in clinical medicine. Recent studies have suggested that biotechnology and pharmaceutical patents have been the main driver of the overall growth of patents and exhibit a particularly prominent “science linkage” [18]. This has prompted us to ask: How has the patent-to-paper citation linkage of life science patents changed over time? In particular, we aim to answer the following lines of research questions:

1. How has the amount of science linkage changed over time? Does the change vary across different technology classes?
2. On the cited side of the linkage, which countries and types of institutions produce the cited papers? Is basic or applied research more likely to be cited?
3. On the citing side, to what extent do company patents cite public science?

These questions are important due to their high relevance to the policy community. Although the study of science linkage of patents has a long history, initiated by Narin and his colleagues in the 1980s [16, 20, 21], an up-to-date “status report” of science linkage has been lacking in the literature, partially due to the daunting task of resolving non-patent references to corresponding scholarly papers. Even in Narin’s landmark study [20], the analyzed patents were granted in two two-year periods (1987–1988 and 1993–1994). By contrast, our analysis covers patents from 1976 to 2012. Such a large-scale corpus allows us to probe how the science linkage has changed over time. By using a large sample over a 37-year period, we contribute to the literature a systematic accounting of linkage from technology to science. On the methodology side, we use a novel, paper-level indicator to quantify to what extent a paper is basic science or clinical medicine, allowing us to distill new insights on the science-technology linkage in biomedicine.

The rest of the paper is organized as follows. Section II discusses the context of our work. In Section III, we describe the data source, the selection of the cohort of patents analyzed in this work, and the methods used to identify various properties of patents and cited papers. Section IV presents the results of our analysis. Finally, we discuss and conclude in Section V.

II. LITERATURE REVIEW

This section briefly reviews three lines of literature that are closely related to our work. The first two are about knowledge flows as evidenced from patent-to-patent and patent-to-paper citations, and the third one presents some alternative interpretations other than knowledge flows.

A. Knowledge flow as evidenced from patent-to-patent citations

Many studies have compared the importance of patents from different sectors, operationalized as the number of citations a patent receives. Jaffe and Trajtenberg [11] confirmed the geographic localization of citations and found that university patents are cited more frequently, and government patents less frequently, than company patents. Henderson et al. [10] pointed out that the importance of university patents has been overshadowed by the increasing rate of university patenting. This finding was later challenged by Sampat et al. [27], who found that such a decline is due to the “changes in the intertemporal distribution of citations to university patents.” Bacchiocchi and Montobbio [2] compared patent citations across countries and technological fields, showing that chemical, drugs & medical, and mechanical patents from U.S. universities are more cited than company patents, which does not hold for European and Japanese patents. Numerous works have used regression frameworks to measure the likelihood of knowledge flow between patents assigned to different types of institutions, suggesting that university patents are more important than corporate ones in terms of knowledge diffusion [2, 11, 31]. Rosell and Agrawal [26] found that for a limited scope of technological fields, there was a more-than-half decline of both knowledge inflows and outflows during the 1980s.

B. Knowledge flow as evidenced from patent-to-paper citations

We now shift our attention to the studies of patent-to-paper knowledge flow. This topic has a longer history than that of the patent-to-patent case, dating back to the 1980s when Narin and his colleagues published a “status report” examining the time and nation dimensions of the science-technology linkage [21]. Follow-up studies increased the timespan of analyzed data and pointed out the increasingly heavy reliance of private-sector patents on public science [20]. Particularly related to our work is McMillan et al. [16], who concluded that the dependence of biotechnology patents on public science is much heavier than that of other industries.

Several studies have found that patent-to-paper citations represent knowledge flow better than patent-to-patent citations do. For example, Lemley and Sampat [14] showed that patent examiners generate fewer non-patent references (NPRs). Later studies reinforced this claim [25]. Sorenson and Fleming [28] found that patents that cite published non-patent literature receive more citations, implying an important role of publications in technological innovation.

Some works have examined the country dimension of patent-to-paper citations. Tijssen [29] analyzed Dutch-authored papers referenced in patents granted at the USPTO and found the dominance of self-citation for domestic citation links. Acosta and Coronado [1] uncovered significant differences between scientific citations in sectors and patent citations in Spanish regions. Guan and He [8] explored the science-technology linkage in terms of regions and sectors for Chinese patents at the USPTO and showed the heterogeneity in cited journals.

Other studies have instead paid attention to different technology sectors. Popp [24] analyzed three types of knowledge flow, namely patent-to-patent, paper-to-paper, and patent-to-paper, in the alternative energy sector and revealed that papers produced by government research are more likely to accrue patent citations than those from any other type of institution, highlighting an important role of government research in translating basic research into applied research. The analysis also emphasized a less important role for universities in wind research, when compared to solar and biofuels research. Du et al. [5] looked at the grant-publication-patent-drug linkage and observed, among other findings, that the vast majority of papers cited by drug patents are publicly funded.

NPRs have also been used to assist in the identification of novel patents. Verhoeven et al. [32], for example, measured the novelty of patents in terms of both the combinatorial novelty of cited patents and the novelty in knowledge origin, which is based on NPRs.

C. Interpretation of patent citations

While the vast majority of the literature has interpreted patent citations as knowledge flow, some studies have criticized this interpretation and proposed alternatives. Meyer [17] looked at nanoscale patents and suggested that citation linkages from citing patents to cited papers hardly represent direct knowledge transfer. Callaert et al. [4] argued that patent-to-paper citations reveal the relatedness between science and technology. Fleming and Sorenson [6] viewed invention as combinatorial search and hypothesized that science helps direct inventors’ search process to more useful combinations, therefore helping increase the invention rate.

D. Indicators for “basicness” of papers

A recurring assumption in the literature about the role of public science is that public science institutions conduct basic science, while private firms perform applied research by utilizing findings from basic research. Yet, few studies have examined to what extent papers cited in patents are basic research, possibly because of the difficulty in operationalizing the two notions at the paper level. One notable exception is McMillan et al. [16], who used the four-level classification scheme developed by CHI Research in the 1970s [22]. The scheme assigns journals to one of four categories, which are, from the most basic to the most applied, “basic research”, “clinical investigation”, “clinical mix”, and “clinical observation”. Focusing on the biomedicine domain, a recent proposal from Weber [33] used MeSH terms to develop an indicator of whether a paper is basic research or clinical research. The indicator is constructed based on whether the MeSH terms of a paper contain cell-, animal-, and human-related terms, and it classifies the paper as clinical research if there are human-related terms, in accordance with the widely adopted definition that clinical research is the study of human subjects.

III. DATA AND METHODS

A. Sample selection

The NBER patent database [9] has been one of the major sources of information about U.S. patents. However, it only covers patents granted until 2006, whereas we want to extend to later patents. We therefore used patent data directly from the USPTO and parsed the downloaded XML files (https://bulkdata.uspto.gov/) to obtain bibliographic information of patents. The NBER dataset is instead used as an auxiliary source when we infer various attributes of patents.

As we are interested in the science-technology linkage in the life science domain, we need to select life science patents. In doing this, we note that there is an inherent trade-off regarding sample coverage. On one hand, it may not be desirable to narrow our analysis to, for example, patents about drugs that treat certain diseases. On the other, selecting patents from other domains, such as the software industry, may bias our statistics about science linkage, since those patents seldom cite biomedical papers. Here we leverage the categorization developed by NBER [9], which segments patents into six categories. We define life science patents as those belonging to one of two NBER technological categories, namely Chemical (Category 1) and Drugs & Medical (Category 3). Operationally, we selected not-withdrawn utility patents granted between 1976 and 2012 whose primary, three-digit USPC (U.S. Patent Classification) technology codes are among the 92 codes corresponding to the two NBER categories (Appendix 1 in [9]). The final sample used in our study consists of 1,088,650 patents.

TABLE I. Top 20 countries with most patents.

Country  Patents     %      Country  Patents   %
US       602695.19   55.42  TW       9905.90   0.91
JP       156946.13   14.43  SE       9244.52   0.85
DE       87173.10    8.02   BE       7976.88   0.73
FR       35592.16    3.27   IL       6849.96   0.63
GB       33518.90    3.08   AU       6410.51   0.59
CA       20636.18    1.90   DT       5365.60   0.49
CH       18003.52    1.66   DK       4885.43   0.45
KR       14994.77    1.38   JA       4850.13   0.45
IT       14563.86    1.34   FI       4075.29   0.37
NL       11000.00    1.01   AT       3522.86   0.32

B. Country origin of patent

To examine whether science linkage varies across patents from different countries, we need to identify the country origin of a patent. We do so by looking at the residences of all inventors. For a tiny portion (0.63%) of patents whose country of origin cannot be determined in this way, owing to missing data on the first inventor’s address, we use the NBER data to locate the country.

Table I lists the top 20 countries that have the largest number of patents, based on fractional counting. In total they contribute 97.3% of all patents in our cohort. It is clear that the US patent system has granted life science patents to inventors from diverse countries, although the US accounts for more than half of the patents. These country statistics remain very similar if we use the residence of the first inventor to identify country origin (Appendix A).
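The fractional counting behind Table I can be sketched as follows. This is a minimal illustration, assuming that each patent carries one unit of credit split equally among the countries of residence of its listed inventors; the function name and the toy input are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fractional_country_counts(patents):
    """Split one unit of credit per patent equally across its inventors' countries.

    `patents` is an iterable of lists of inventor country codes (assumed format).
    """
    totals = defaultdict(float)
    for inventor_countries in patents:
        if not inventor_countries:
            # No inventor residence available; the text falls back to NBER data here.
            continue
        share = 1.0 / len(inventor_countries)
        for country in inventor_countries:
            totals[country] += share
    return dict(totals)


# Toy input: three patents with 3, 1, and 2 inventors (hypothetical data).
toy_patents = [["US", "US", "JP"], ["DE"], ["US", "CA"]]
counts = fractional_country_counts(toy_patents)
```

Under this scheme the country totals sum to the number of patents, which would explain why the patent counts in Table I are not integers.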

C. Type of patent assignee

To study how the types of assignees may affect citations to scientific papers, we need to classify patent assignees. To do this, we again leverage the NBER patent dataset, which has already classified assignees of patents in 1976–2006 into one of the following six types: corporation, university, institute, government, hospital, and individual. For later patents, we assign the type based on the exact match of assignee names. If this fails, we then classify by checking the role of the assignee provided by USPTO, whether the assignee name is the same as an applicant, and whether it contains certain keywords (e.g., “Ltd”, “University”). 9.76% of patents have no assignee listed.

Figure 1 gives a side-by-side comparison of the decomposition of the types of assignees for both chemical and drugs & medical (DM) patents. Not surprisingly, the overwhelming majority of patents are assigned to companies. A larger fraction of DM patents, however, come from universities. Many previous works have linked this to the Bayh–Dole Act, which permits universities to own inventions that are funded by government [19].

FIG. 1. Distribution of types of assignees (Hospital, Individual, Institute, Government, University, Company) for chemical patents (%) and drugs & medical patents (%).
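The two-stage assignment of assignee types for later patents can be sketched as below. This is only an illustration under stated assumptions: the lookup table, the function name, and the rule list are hypothetical, only the keywords “Ltd” and “University” come from the text, and the checks on the USPTO assignee role and on applicant names are omitted.

```python
import re

# Only "Ltd" and "University" are named in the text; the rule table is otherwise a placeholder.
KEYWORD_RULES = [
    ("university", {"university"}),
    ("corporation", {"ltd"}),
]


def classify_assignee(name, known_types):
    """Return an assignee type: exact name match first, then keyword rules."""
    key = name.strip().lower()
    if key in known_types:  # step 1: exact match against already-classified names
        return known_types[key]
    tokens = set(re.findall(r"[a-z]+", key))
    for assignee_type, keywords in KEYWORD_RULES:  # step 2: keyword fallback
        if tokens & keywords:
            return assignee_type
    return "unclassified"


# Hypothetical lookup standing in for names already classified in the 1976-2006 data.
known = {"example research institute": "institute"}
```

Matching on whole tokens rather than substrings avoids false hits on names that merely contain the letters of a keyword.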

D. Non-patent references in patent

Every NPR cited in the patents has been resolved previously to determine whether and which MEDLINE paper it refers to, with high accuracy [12]. MEDLINE is perhaps the most widely used database for the biomedical research literature, curated and maintained by the US National Library of Medicine (NLM). It is publicly available and provides a variety of metadata about the papers indexed there, including common bibliographic information like authors, affiliations, journal, publication year, funding, etc. It also provides domain-specific information like Medical Subject Headings (MeSH). Moreover, many additional resources that we rely on have been built on top of MEDLINE, and the literature has been using MEDLINE for innovation studies, such as the operationalization of the triple-helix model based on MeSH terms [23].

E. Country and institution type of papers

To understand how public science contributes to knowledge cited in patents, we need to classify the types of institutions of papers. However, an important question before the classification is which author’s (or authors’) affiliation we should use, as modern science has become a collaborative endeavor [34]. Here we choose to look at only the first author’s affiliation for two reasons. First, as stated by the NLM, “until 2014, only the affiliation of the first author was included,” and the first author’s affiliation was not included until 1988. This limitation is also reflected in the data: 87% of the 218,[...] MapAffil (http://abel.lis.illinois.edu/cgi-bin/mapaffil/search.py; [30]). It returns geographic information and the institution type of the input MEDLINE paper and has a reported accuracy of 97.7%. MapAffil classifies institutions into eight categories, namely educational, hospital, educational hospital, organization, commercial, government, military, and unknown. For our study, we merge educational hospital into educational, since teaching hospitals still serve the educational role of training medical students. Furthermore, we combine the organization, government, and military categories into a single one, called public research organization (PRO), because our primary concern is whether or not cited research is performed by companies. Previous studies have also employed a similar procedure [2]. Therefore, there are five types of institutions of papers, namely educational (EDU), PRO, hospital (HOS), commercial (COM), and unknown (UNK).
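The merging of the eight institution categories into five can be written as a simple lookup. This is a sketch only: the keys below are the category names as written in the text, and the literal labels returned by MapAffil may differ; the function name is hypothetical.

```python
# Keys are the category names as written in the text; the literal labels
# returned by MapAffil may differ (assumption).
MERGED_TYPE = {
    "educational": "EDU",
    "educational hospital": "EDU",  # teaching hospitals are folded into educational
    "hospital": "HOS",
    "organization": "PRO",
    "government": "PRO",
    "military": "PRO",
    "commercial": "COM",
    "unknown": "UNK",
}


def merge_institution_type(raw_category):
    """Collapse the eight source categories into EDU, PRO, HOS, COM, or UNK."""
    return MERGED_TYPE.get(raw_category.strip().lower(), "UNK")
```

Unrecognized labels fall back to UNK here, which is a design choice of this sketch rather than something the text specifies.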

F. Funding support of papers

An ongoing effort in the study of the patent-to-paper citation linkage is to understand to what extent cited papers are supported by public funding. We retrieve this information from the paper metadata provided in the MEDLINE database. First, we determine whether a paper is funded by the US government by looking at whether the “Publication Type” field has any of the following four terms: “Research Support, U.S. Gov’t, Non-P.H.S.”, “Research Support, U.S. Gov’t, P.H.S.”, “Research Support, N.I.H., Extramural”, and “Research Support, N.I.H., Intramural”. Second, we determine whether a paper is supported by the NIH by looking at the “Grant List” field and further record which NIH institutes support the paper.
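The two funding checks can be sketched as follows, assuming the publication types and grant agencies of a paper have already been parsed into Python lists. The four publication type strings are those listed above; the function names and the agency string format (for example "NCI NIH HHS") are assumptions of this sketch.

```python
US_GOV_SUPPORT_TYPES = {
    "Research Support, U.S. Gov't, Non-P.H.S.",
    "Research Support, U.S. Gov't, P.H.S.",
    "Research Support, N.I.H., Extramural",
    "Research Support, N.I.H., Intramural",
}


def is_us_gov_funded(publication_types):
    """True if any of the four US government support terms is present."""
    return any(pt in US_GOV_SUPPORT_TYPES for pt in publication_types)


def nih_institutes(grant_agencies):
    """Collect institute acronyms from agency strings that mention NIH.

    Assumes strings such as "NCI NIH HHS" (assumed format, not from the text).
    """
    institutes = set()
    for agency in grant_agencies:
        tokens = agency.split()
        if "NIH" in tokens:
            institutes.update(t for t in tokens if t not in ("NIH", "HHS"))
    return sorted(institutes)
```

For example, `nih_institutes(["NCI NIH HHS", "Wellcome Trust"])` keeps only the acronym from the first entry.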

G. “Basicness” of papers

In this study, we do not adopt the method proposed in Narin et al. [22] to quantify the “basicness” of papers, for four reasons. First, we are not aware that the scheme is publicly available. Second, it remains unclear whether a scheme developed in the 1970s is still applicable nowadays, with numerous journals established since then. Third, the scheme only considers journals indexed in the SCI, while many MEDLINE journals are not there. Lastly and most importantly, the scheme operates on journals rather than papers. One immediate implication of this is that papers published in all the journals belonging to the same category have the same “basicness.” This is problematic, because many biomedical journals publish qualitatively different types of research, which can be basic or applied. As an example, Circulation, a prestigious journal with a 2017 Impact Factor of 18.88, “publishes [...] related to cardiovascular health and disease, including observational studies, clinical trials, epidemiology, health services and outcomes studies, and advances in basic and translational research”.

Here we use an innovative method that was recently proposed to identify translational science in biomedicine [13]. Translational science is research that “translates” basic scientific discoveries (bench-side or basic research) into clinical applications (bed-side or applied research). The method quantifies the basicness of papers directly. It results in a paper-level indicator called level score (LS) ranging from -1 to 1, with an LS closer to -1 meaning that the paper is, by construction, more basic and an LS closer to 1 meaning that it is more applied. The method learns similarities between MeSH terms based on their co-occurrences among papers, using modern representation learning techniques. It then identifies an axis that points from basic science terms to applied ones. MeSH terms are organized into a hierarchical structure, and each of them has a location in the tree. For example, “Eukaryota” (B01) is within branch B (“Organisms”). Given this tree, a MeSH term is a basic science one if it is located within the following terms: “Cells” (A11), “Archaea” (B02), “Bacteria” (B03), “Viruses” (B04), “Molecular Structure” (G02.111.570), “Chemical Processes” (G02.149), and “Eukaryota” (B01) except “Humans”. A term is applied if it is located within the following nodes: “Humans” (B01.050.150.900.649.801.400.112.400.400) and “Persons” (M01). The basicness of a MeSH term is its projected position onto the axis, expressed as the cosine similarity between the axis vector and the term vector. The LS of a paper is the average basicness of its MeSH terms. The method has been validated and is consistent with Narin’s four-level classification and other existing methods.
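The projection step of the level score can be illustrated with a small sketch. The cosine similarity and the averaging over a paper's MeSH terms follow the description above; how the axis itself is built is an assumption here (the difference between the centroids of the applied and the basic term vectors), and the two-dimensional vectors are made-up stand-ins for learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def level_score(mesh_terms, term_vectors, axis):
    """Average cosine similarity between the axis and a paper's MeSH term vectors."""
    scores = [cosine(axis, term_vectors[t]) for t in mesh_terms if t in term_vectors]
    return sum(scores) / len(scores) if scores else None


# Hypothetical 2-D embeddings; real ones would be learned from MeSH co-occurrences.
term_vectors = {
    "Cells": [-1.0, 0.1],
    "Bacteria": [-0.9, -0.2],
    "Humans": [1.0, 0.0],
    "Persons": [0.8, 0.3],
}
basic_centroid = mean_vector([term_vectors["Cells"], term_vectors["Bacteria"]])
applied_centroid = mean_vector([term_vectors["Humans"], term_vectors["Persons"]])
# Assumed axis construction: pointing from the basic centroid to the applied one.
axis = [a - b for a, b in zip(applied_centroid, basic_centroid)]

basic_paper_ls = level_score(["Cells", "Bacteria"], term_vectors, axis)
applied_paper_ls = level_score(["Humans", "Persons"], term_vectors, axis)
```

With these toy vectors the paper indexed with basic terms gets a negative score and the one indexed with applied terms a positive score, matching the sign convention of LS.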

IV. RESULTS

A. Summary statistics

Table II reports the overall statistics of NPRs cited in the 1,088,650 patents in our sample, grouped by their NBER subcategories. The first group of statistics in Table II concerns the total number of patents. Chemical patents account for 62.7%, and the rest are DM patents. Among chemical patents, resins and organic compounds are the two largest subcategories, whereas drugs and surgery & medical instruments patents are the most represented ones in the DM category. Overall, only 252,821 (23.2%) patents have at least one NPR linked to a MEDLINE paper (hereafter MNPR). This fraction, however, varies significantly across the two categories: only 9.6% for chemical patents but 46.1% for DM ones. The variability also holds at the subcategory level. 29.1% of resins patents and 14.8% of organic compounds patents have MNPRs; by contrast, 80% of biotechnology patents cite MEDLINE papers, while 56.1% of drugs patents and 21% of surgery & medical instruments patents do so.

The second group of statistics is the total number of NPRs and MNPRs. A total of 6,[...],178 NPRs emanated from our corpus of patents, among which 2,[...],621 (33.3%) are from chemical ones. More than half (3,[...]

TABLE II. Summary statistics of non-patent references (NPRs) cited by U.S. life science utility patents 1976–2012, grouped by their NBER categories. MNPR refers to an NPR that corresponds to a MEDLINE paper.

NBER Category 1: Chemical

                                        Patents                      Total                            Mean MNPRs by
Sub-cat  Name                           All      w/ MNPRs  %         NPRs         MNPRs        %       All     w/ MNPRs
11       Agriculture, food, textiles    22,166   1,019     4.60      54,183       2,853        5.26    0.129   2.800
12       Coating                        58,326   1,873     3.21      127,440      9,402        7.38    0.161   5.020
13       Gas                            20,196   319       1.58      32,179       1,350        4.20    0.067   4.232
14       Organic compounds              91,301   26,538    29.07     686,384      291,540      42.47   3.193   10.986
15       Resins                         105,960  15,667    14.78     585,437      288,730      49.32   2.725   18.429
19       Miscellaneous                  384,434  20,049    5.22      827,008      123,405      14.92   0.321   6.155
Total:                                  682,383  65,465    9.59      2,312,631    717,280      31.02   1.051   10.957

NBER Category 3: Drugs & Medical

                                        Patents                      Total                            Mean MNPRs by
Sub-cat  Name                           All      w/ MNPRs  %         NPRs         MNPRs        %       All     w/ MNPRs
31       Drugs                          158,665  89,008    56.10     2,[...],049  1,[...],016  62.70   8.792   15.673
32       Surgery & medical instruments  137,981  28,975    21.00     668,424      274,526      41.07   1.990   9.474
33       Biotechnology                  79,148   63,625    80.39     1,[...],241  1,[...],578  70.54   14.423  17.942
39       Miscellaneous                  30,473   5,748     18.86     123,833      46,377       37.45   1.522   8.068
Total:                                  406,267  187,356   46.12     4,[...],547  2,[...],497  61.64   7.034   15.252

B. Overall characteristics over time

Next, we investigate how overall characteristics change over time. Figure 2A shows a steady increase of the total number of granted patents over the examined period, from 21,151 in 1976 to 52,994 in 2012. This increase is largely driven by the remarkable growth of DM patents: a nearly ten-fold increase from only 2,827 in [...],616 in 2012. The number of chemical patents, on the other hand, has increased relatively slowly, by 44%. We notice that there is a flat period followed by a decreasing period from 1998 to 2005, for both chemical and DM patents. Accompanying the increase of the raw number of patents is an increasing fraction of patents that cite MEDLINE-indexed papers, as presented in Figure 2B. In 1976, only 1.7% of chemical and 6.8% of DM patents had MNPRs, and in 2012 the fractions reached 21.3% and 58.7%, respectively. Figure 2C plots the total number of patent-to-paper citations for patents granted in each year, demonstrating a remarkable increase of science linkage. We fit the growth from 1976 to 1998, obtaining an exponential fit of the form $N_t = 10^{\alpha t - \beta}$ with slope $\alpha = 0.102$, where $t$ is the calendar year and $N_t$ is the total number of citations at $t$. This means that there is an exponential growth of the total number of MNPRs, which doubled every $\log_{10} 2 / 0.102 \approx 2.94$ years. DM patents, again, drive the increase, and generate more MNPRs than chemical patents across years. Finally, the increase of the total number of MNPRs is not due to the increase of the number of patents, but rooted in the patents themselves, as confirmed in Figure 2D, which shows that the average number of MNPRs per patent also increases. Yet, DM patents have a faster increase than chemical patents.

We then add the country dimension to the analysis of patent-to-paper citations. Figure 3A shows that the average number of MNPRs per patent has been increasing for patents originating from the top 6 countries with the largest number of patents. The extent, however, varies by country. For patents from Canada, the U.S., and the U.K., the average increases faster than the overall case, while for patents from France, Germany, and Japan, it increases more slowly than the overall case. What is noteworthy is that, starting from around 1996, Canada has surpassed the US in generating more MNPRs per patent. Figures 3B–G further look at chemical and DM patents separately for each of the top 6 countries. From these figures, we can conclude that (1) across the top countries and categories, there is an increasing citation linkage from life science patents to biomedical research; and (2) DM patents exhibit a faster increase than chemical patents across years and countries.
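The doubling time quoted in this section follows directly from the slope of the base-10 exponential fit: if $N_t$ grows as $10^{\alpha t}$, it doubles whenever $\alpha t$ increases by $\log_{10} 2$. A minimal check, using the rounded slope 0.102 from the text (the function name is hypothetical):

```python
import math


def doubling_time(slope, base=10.0):
    """Years needed for base ** (slope * t) to double."""
    return math.log(2.0, base) / slope


# Rounded slope taken from the text.
years_to_double = doubling_time(0.102)
```

With the rounded slope this gives roughly 2.95 years; the 2.94 years reported in the text presumably reflects the unrounded fitted slope, which is an assumption here.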

C. Cited science

In this section, we explore the characteristics of pa-pers that are cited by patents. We do so at the referencelevel; that is, a paper that is cited by multiple patents iscounted multiple times, since the number of citations apaper receives from patents displays a heavy-tailed dis-tribution, similar to the case of citations from papers[12].First, we study the distributions of countries wherecited papers are produced. Figure 4A plots the fractionsof MNPRs authored by diﬀerent countries over time.Here we display the results separately for the six indi-vidual countries that have the largest shares at 2012 andcombine the shares of the rest countries together. Thedistribution at a particular year is derived as follows. Weﬁrst get all the patents granted in that year, and thencount the number of MNPRs produced by a given coun-try and normalize it by the total number of MNPRs cited P a t e n t s ×10 A All Chemical Drugs & Medical F r a c t i o n w i t h S N P R B Year, t T o t a l S N P R s , N t C N t = . t . Year M e a n S N P R s D FIG. 2. Overall characteristics of the patent-to-paper citation linkage over time. (A) The number of patents. (B) The fractionof patents with at least one MNPR. (C) The total number of MNPRs. The red, dash-dotted line represent an exponential ﬁtof the total number of MNPRs from 1976 to 1998, N t = 10 . · t − . . (D) The average number of MNPRs per patent. by all the patents in that year.Figure 4A shows that the U.S. has been consistentlythe largest producer of cited science, accounting for al-most half (49%) of the MNPRs cited by patents in 2012.Other top countries contribute to signiﬁcantly smallerfractions: 6.8% for the UK and 5.5% for Japan. Notethat here one may refrain to conclude that US sciencehas been increasingly cited by patents over time, becausethe apparent increase of the fraction of US science couldsimply due to an increasing portion of cited papers withaﬃliation information available. 
This is corroborated by the observation that the share of US science has been stable since around 2000.

Figure 4B presents the fractions of cited references that are produced by different types of institutions over time, derived using the same procedure described above. Universities have been consistently the largest producer; 57.7% of references cited by patents granted in 2012 are written by them. PRO, which includes institutes and government, is the second major player, contributing 9.8%. Public science, therefore, accounts for 67.5% of cited science in patents. Companies account for only 10%. We also examine which funding agencies supported the science cited by patents. Figure 4C shows the portion of references that are supported by the U.S. government and by NIH specifically. Since 1990, more than 30% of cited science has been supported by the U.S. government and 20% by NIH. Table III further shows the top NIH institutes by the number of citations they receive.

The last effort to characterize the cited science is to examine to what extent it is basic or applied research. We use the LS indicator described in Section III to measure the basicness of each paper. First, we plot in Figure 4D the histogram of LS for all the 14 , , th to separate the two modes, which is 0.16. For 42.7% of all papers in Figure 4D, the score is smaller than th; by contrast, 85.2% of MNPRs in Figure 4E fall into this category. This result is robust if we instead look at the paper level. We make additional measurements to ascertain that the observation is not driven by patents with many MNPRs. For each patent, we calculate (1) the average value of LS of its cited papers; and (2) the fraction of papers with LS smaller than th. The results confirm that for the vast majority of patents, most of their references are papers from the basic side.

As a separate case study, we examine MNPRs from patents that are associated with drugs approved by the U.S. Food and Drug Administration (FDA). The Hatch–Waxman Act mandates that drug innovators provide FDA with the list of patents that cover the drug, and FDA includes these patents in the Approved Drug Products With Therapeutic Equivalence Evaluations (also known as the Orange Book), although it is not FDA's task to actually evaluate the coverage. Such patents may possess economic value for their owners that surpasses the cost of drug development, and at the same time have curative value for patients. We get this list of patents from . We find a much smaller number (4 , , 512 MNPRs in our sample of papers. Figure 4F shows that, although most (59.6%) of these MNPRs are on the basic side, a substantial number are on the applied side, yielding a bimodal distribution that is not present in the overall case in Figure 4E. This may be related to the underlying process of drug development, where pharmaceutical companies need to test the safety and effectiveness of drugs in humans, which is applied research by definition.

FIG. 3. The country dimension of the patent-to-paper citation linkage. (A) The average number of MNPRs per patent over all patents and over patents from the six most-patented countries. (B–G) The average number of MNPRs in patents originated from (B) Canada, (C) the U.S., (D) the United Kingdom, (E) France, (F) Germany, and (G) Japan. Solid lines in (B–D) represent all patents in the country, dashed lines chemical patents, and dotted lines drugs and medical patents.

TABLE III. Number of citations for top NIH IC.

IC                                                                 Citations     %
National Cancer Institute                                            416,642  22.3
National Institute of General Medical Sciences                       251,171  13.5
National Institute of Allergy and Infectious Diseases                248,842  13.3
National Heart, Lung, and Blood Institute                            239,139  12.8
National Institute of Diabetes and Digestive and Kidney Diseases     128,801   6.9
National Institute of Neurological Disorders and Stroke               89,904   4.8
National Institute of Child Health and Human Development              55,184   3.0
National Center for Research Resources                                54,456   2.9
National Institute on Aging                                           49,538   2.7
National Institute of Diabetes and Digestive and Kidney Diseases      40,886   2.2

FIG. 4. Characteristics of cited science. (A–C) Fraction of MNPRs produced (A) by countries, (B) by institution types, and (C) supported by the U.S. government and the NIH in particular. (D) Histogram of level score for all MEDLINE-indexed papers published between 1980 and 2012. The blue dotted line indicates the score (0.16) corresponding to the local minimum of density. (E) Histogram of level score of references cited in USPTO-issued patents. (F) The same as (E) but based on patents associated with FDA-approved drugs. For 42.7% of papers (D) and 85.2% (E) and 59.6% (F) of references, the score is smaller than 0.16.
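The two counting procedures used in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used for the paper: the function names, the input layout, and the toy records are hypothetical, and only the threshold of 0.16 is taken from the text. The first function computes reference-level shares per grant year (a paper cited by several patents is counted once per citing patent); the second computes the two per-patent measures, the mean LS and the fraction of cited papers with LS below the threshold.

```python
from collections import Counter, defaultdict
from statistics import mean

TH = 0.16  # threshold separating the two modes of LS, as reported in the text


def shares_by_year(citations):
    """citations: iterable of (grant_year, label) pairs, one pair per MNPR.

    Returns {year: {label: share}}, where each share is the count for that
    label divided by all MNPRs cited by patents granted in that year.
    """
    counts = defaultdict(Counter)
    for year, label in citations:
        counts[year][label] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {label: n / total for label, n in counter.items()}
    return shares


def patent_basicness(level_scores, th=TH):
    """Return (mean LS, fraction of cited papers with LS < th) for one patent."""
    if not level_scores:
        raise ValueError("patent has no cited paper with a level score")
    below = sum(1 for s in level_scores if s < th)
    return mean(level_scores), below / len(level_scores)


# Hypothetical toy records, for illustration only.
toy_citations = [(2012, "US"), (2012, "US"), (2012, "GB"), (2012, "JP"),
                 (2011, "US"), (2011, "UNK")]
shares = shares_by_year(toy_citations)

toy_patents = {"P1": [0.02, 0.05, 0.10, 0.40], "P2": [0.30, 0.55], "P3": [0.01]}
summary = {pid: patent_basicness(ls) for pid, ls in toy_patents.items()}
# Patents whose references mostly fall on the basic side (LS < th).
mostly_basic = [pid for pid, (_, frac) in summary.items() if frac > 0.5]
```

The same `shares_by_year` routine applies unchanged to institution types or funders, since only the label attached to each MNPR differs.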

D. Private-sector patents

In this section, we analyze the science linkage of patents assigned to companies. Table IV presents the overall percentages of citations originating from company patents to papers authored by different types of affiliations. We observe that about 48% of citations from company patents go to university papers, and this varies little if we focus on chemical or DM patents separately or on US patents only. Other public research organizations rank second, contributing 13–15%. Companies account for only 11–13% of the science base of their patents.

We further look at the science linkage of individual companies. By way of example, Medtronic, a global medical device company, owns the largest number (3 , , 242 MNPRs in those patents, among which 5,824 are from universities, 579 from PRO, and only 297 from companies. The fraction of MNPRs authored by the public science sector (universities and PRO), therefore, is 0.57. Table V extends this calculation to the top 10 companies that have the largest number of chemical and DM patents, indicating a significant linkage to public science. We go one step further and repeat this calculation for all the companies whose patents have at least one MNPR. Figure 5 shows the cumulative distributions of the fraction of public science MNPRs for all those companies, across the chemical and DM categories. For more than 60% of companies, more than half of the MNPRs cited in their patents are from public science.

TABLE IV. Percentage of MNPRs originating from company patents that go to different types of institutions.

            All           Chemical          DM
        All     US      All     US      All     US
EDU    48.1   47.6     47.7   47.7     48.2   47.6
PRO    13.4   13.0     14.8   14.5     13.1   12.6
COM    11.5   11.6     13.1   13.2     11.1   11.2
HOS     7.1    7.4      5.4    5.6      7.5    7.9
UNK    19.9   20.4     19.0   19.0     20.1   20.8

FIG. 5. Cumulative distributions of the fraction of public science MNPRs cited in company patents.
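As an illustration of the company-level calculation, the sketch below computes the public science fraction (universities plus PRO over all MNPRs) and the share of companies whose fraction exceeds one half. The function names are hypothetical. The inputs are the "All" column of Table IV, used here as stand-in counts, and the Fraction columns of Table V; these ten-company samples are not the full population behind Figure 5.

```python
def public_fraction(counts):
    """counts: MNPR counts by institution type (EDU, PRO, COM, HOS, UNK)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no MNPRs to count")
    return (counts.get("EDU", 0) + counts.get("PRO", 0)) / total


def share_above(fractions, cutoff=0.5):
    """Share of companies whose public science fraction exceeds the cutoff."""
    return sum(1 for f in fractions if f > cutoff) / len(fractions)


# "All" column of Table IV (percentages treated as counts).
table_iv_all = {"EDU": 48.1, "PRO": 13.4, "COM": 11.5, "HOS": 7.1, "UNK": 19.9}
overall = public_fraction(table_iv_all)

# Fraction columns of Table V (top 10 companies in each category).
chemical_top10 = [0.57, 0.42, 0.58, 0.73, 0.46, 0.56, 0.47, 0.31, 0.36, 0.73]
dm_top10 = [0.57, 0.47, 0.52, 0.42, 0.54, 0.81, 0.61, 0.55, 0.58, 0.80]
chemical_share = share_above(chemical_top10)
dm_share = share_above(dm_top10)
```

Evaluating `share_above` over a grid of cutoffs gives the complement of the cumulative distribution plotted in Figure 5.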

V. DISCUSSION

We have uncovered several empirical findings regarding how the science linkage of US life science patents has changed over a 37-year period. From the prevalent perspective of viewing citation linkage as knowledge flow, this study is particularly important, because our results suggest a continuous linkage of public science to private sector inventions. First, the overall growth of life science patents is largely driven by the increase in drug and medical patents. The volume of science linkage is increasing exponentially, doubling every 2.9 years. The increase happens in both chemical and drugs & medical patents, as well as in patents originating from different countries.

Second, almost half of the MNPRs are produced in the US; the majority of them are from the public science sector. Public science (research performed by academics and government institutions) is widely acknowledged to have a strong influence on technology development. Our work provides empirical, quantitative, and longitudinal evidence of the magnitude of the dependence of technologies on public science.

Third, the overwhelming majority of cited science is basic research; yet the picture is more nuanced for patents associated with drugs, where a non-negligible portion of cited science is applied research. The premise that basic science lays foundations for applied science is extensively discussed and widely embedded in many theoretical models of science-technology interaction (e.g., the "linear model" of innovation running from basic science to applied science to technologies and economic growth [3]). Our work pushes forward this line of inquiry by moving from a dichotomous question of whether basic science fuels applied research towards a quantitative understanding of the extent of the reliance. Such a nuance is important, because our findings in general corroborate the pivotal role of basic science, but at the same time point to a previously ignored contribution of applied research. In this regard, our work supports Gittelman [7], which argues that understanding diseases requires embedding applied science into basic research.

Fourth, the US government, and NIH in particular, continue to be funders of research cited in patents. Last, for the majority of companies, most of the references in their patents are to public science. However, to what extent the linkage represents a direct knowledge flow is a challenging line of future work.

Many previous works about the role of public science in private sector innovation assumed that public research organizations conduct basic research, while private firms perform applied research by utilizing results generated from basic research. However, few studies have examined whether papers that are cited in patents are basic research. We bridge this gap by using a novel indicator proposed in our previous work that quantifies the extent to which a paper is basic research or applied research, leveraging recent advances in the machine learning literature. Using this indicator, we quantitatively show that cited papers are more likely to be basic research, resonating with earlier results [16, 20]. Yet, we also find that a significant portion of papers cited in patents that are related to FDA-approved drugs are clinical research. These findings appear to be in sharp contrast with a recent finding of no relationship between whether research is basic or applied and its likelihood of being cited by patents [15]. The inconsistency may be due to the difference in the entities analyzed. While we focused on papers, they focused on grants and made the basic/applied dichotomy based on grant abstracts. Furthermore, it remains to be seen to what extent one short grant abstract can represent the actual research performed and how different the level scores are for papers produced under the same grant.

Throughout the work, we have grouped patents based on NBER categories, which rely on the USPC codes.
The USPTO, however, was scheduled to replace USPC codes with the Cooperative Patent Classification (CPC) schema in 2013 ( ), raising the question of whether our analysis can be extended to later patents without USPC codes. In Appendix B, we demonstrate that it is still feasible to assign USPC classes and NBER categories to those patents, through their IPC classes.

Some of our analysis may be limited by the quality of the bibliographic data of papers. First, we have relied on the first author's affiliation to infer the country and institution type of a paper, due to the unavailability of affiliation information for other authors. As scientific collaboration has become the dominant mode of knowledge production, future work is needed to collect missing affiliation data and examine how the results may change. Second, we have used the funding information provided in the MEDLINE database to analyze the role of NIH. Although the completeness of the data remains unclear, our results nevertheless provide a lower bound on the fraction of cited papers that are funded by NIH. Future studies could also use other data sources such as Web of Science to get the funding information.

Future work is needed to model patent-to-paper knowledge flow among different types of institutions and compare how it differs from patent-to-patent knowledge flow. That the science linkage is dominated by biotechnology and drug patents may suggest that a finer-level categorization of these patents, going beyond existing schemes, is needed. Future work can cluster these patents based on their linkage to science and compare how the data-driven clusters align with traditional schemes.

TABLE V. The top 10 companies that own the largest number of chemical (top) and DM (bottom) patents. The Fraction columns refer to the fraction of MNPRs that are authored by the public science sector (universities and PRO).

Company                                  Patents  Fraction
Chemical
BASF AG                                    8,523      0.57
Bayer AG                                   8,450      0.42
E. I. du Pont de Nemours and Company       7,462      0.58
General Electric Company                   7,276      0.73
Eastman Kodak Company                      7,028      0.46
Fuji Photo Film Co., Ltd.                  6,463      0.56
The Dow Chemical Company                   5,545      0.47
Ciba-Geigy Corporation                     5,059      0.31
Hoechst AG                                 4,468      0.36
Shell Oil Company                          4,076      0.73
Drug & Medical
Medtronic Inc.                             3,565      0.57
Merck & Co., Inc.                          3,000      0.47
The Procter & Gamble Company               2,349      0.52
Eli Lilly and Company                      2,314      0.42
Bayer AG                                   2,258      0.54
Pioneer Hi-Bred International, Inc.        2,064      0.81
Cardiac Pacemakers, Inc.                   1,955      0.61
Pfizer Inc.                                1,816      0.55
Abbott Laboratories                        1,736      0.58
Monsanto Technology LLC                    1,696      0.80
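For readers who prefer growth rates to doubling times, the reported doubling time of 2.9 years can be converted as follows. This is a worked conversion under the assumption of pure exponential growth over the fitted period; the symbols r, g, and a are introduced here for illustration and do not appear in the original text.

```latex
% Exponential growth with doubling time T = 2.9 years:
%   N_t = N_0 \, e^{r t} = N_0 \, 2^{t/T} = N_0 \, 10^{a t}
\begin{align}
  r &= \frac{\ln 2}{T} = \frac{0.6931}{2.9} \approx 0.239 \ \text{per year}, \\
  g &= 2^{1/T} - 1 = e^{r} - 1 \approx 0.27 \quad \text{(about 27\% growth per year)}, \\
  a &= \frac{\log_{10} 2}{T} = \frac{0.3010}{2.9} \approx 0.104 \ \text{per year}.
\end{align}
```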

Appendix A: Country origin of patents based on first inventor residence

Table A.1 presents the number of patents by country based on the residence of the first inventor, a standard practice adopted by organizations like WIPO ( ), patent offices such as USPTO ( ) and EPO ( ), and the literature [2, 9].

Appendix B: Assigning USPC classes and NBER categories to patents without USPC codes

To assign NBER categories to patents without USPC classes, we first establish the mapping from IPC to USPC classes, using the US-to-IPC8 Concordance provided by USPTO. For example, the table that maps USPC subclasses of class 424 to IPC subclass and group can be found at . We scraped all these tables and created the IPC-to-USPC mapping using fractional counting. As examples, IPC "A61K 51/00" is uniquely mapped to USPC 424. IPC "A61M 36/14" can be mapped to 22 unique USPC subclasses, 21 of which correspond to USPC class 424 and the remaining 1 to class 427. Therefore, "A61M 36/14" is mapped to USPC 424 with weight 21/22, and to USPC 427 with weight 1/22. Table A.2 provides the mapping for the 3 exemplar IPC codes.

TABLE A.1. Top 20 countries (territories) with most patents.

Country  Patents     %     Country  Patents    %
US       605,875  55.7     TW         9,849  0.9
JP       156,491  14.4     SE         9,192  0.8
DE        87,127   8.0     BE         7,883  0.7
FR        35,505   3.3     IL         6,825  0.6
GB        33,274   3.1     AU         6,388  0.6
CA        20,691   1.9     DT         5,361  0.5
CH        17,865   1.6     JA         4,847  0.4
KR        14,901   1.4     DK         4,816  0.4
IT        14,376   1.3     FI         4,051  0.4
NL        10,794   1.0     AT         3,476  0.3

TABLE A.2. Mapping from IPC code to USPC class, with weight in parentheses.

IPC code      USPC class and weight
A61K 51/00    424 (1)
A61M 36/14    424 (21/22); 427 (1/22)
A61K 51/04    534 (1)
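The fractional counting described in this appendix can be sketched as follows. The first function reproduces the IPC-to-USPC weights from the number of USPC subclasses falling in each class, as in the "A61M 36/14" example (21 of 22 subclasses in class 424, 1 in class 427). The second function combines the weights of a patent's IPC codes; giving every IPC code equal weight is an assumption made here for illustration, not a rule stated in the text. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def ipc_to_uspc_weights(subclass_classes):
    """subclass_classes: the USPC class of each USPC subclass an IPC code maps to.

    Returns {USPC class: weight}, with weights summing to 1.
    """
    counts = Counter(subclass_classes)
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}


def patent_uspc_weights(ipc_codes, mapping):
    """Average per-IPC weights over a patent's IPC codes.

    Equal weighting across IPC codes is an assumption of this sketch.
    """
    weights = defaultdict(Fraction)  # Fraction() is 0
    for code in ipc_codes:
        for cls, w in mapping[code].items():
            weights[cls] += w / len(ipc_codes)
    return dict(weights)


# The three exemplar IPC codes of Table A.2.
mapping = {
    "A61K 51/00": ipc_to_uspc_weights(["424"]),
    "A61M 36/14": ipc_to_uspc_weights(["424"] * 21 + ["427"]),
    "A61K 51/04": ipc_to_uspc_weights(["534"]),
}
weights = patent_uspc_weights(["A61K 51/00", "A61M 36/14", "A61K 51/04"], mapping)
```

Exact fractions are used so that the weights of a patent sum to exactly 1; converting with `float()` gives decimal values for reporting.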

For a particular patent without USPC codes, we can then assign USPC classes based on its IPC codes. For example, patent 9,044,520 has 3 IPC codes: "A61K 51/00", "A61M 36/14", and "A61K 51/04". The weight for USPC 424 is × × /22 = 0 .

FIG. A.1. The number of patents granted between 1976 and 2016.

[1] Acosta, M., Coronado, D., 2003. Science–technology flows in Spanish regions: An analysis of scientific citations in patents. Research Policy 32, 1783–1803. doi:10.1016/S0048-7333(03)00064-7.
[2] Bacchiocchi, E., Montobbio, F., 2009. Knowledge diffusion from university and public research. A comparison between US, Japan and Europe using patent citations. The Journal of Technology Transfer 34, 169–181. doi:10.1007/s10961-007-9070-y.
[3] Balconi, M., Brusoni, S., Orsenigo, L., 2010. In defence of the linear model: An essay. Research Policy 39, 1–13. doi:10.1016/j.respol.2009.09.013.
[4] Callaert, J., Pellens, M., Looy, B.V., 2014. Sources of inspiration? Making sense of scientific references in patents. Scientometrics 98, 1617–1629. doi:10.1007/s11192-013-1073-x.
[5] Du, J., Li, P., Guo, Q., Tang, X., 2019. Measuring the knowledge translation and convergence in pharmaceutical innovation by funding-science-technology-innovation linkages analysis. Journal of Informetrics 13, 132–148. doi:10.1016/j.joi.2018.12.004.
[6] Fleming, L., Sorenson, O., 2004. Science as a map in technological search. Strategic Management Journal 25, 909–928. doi:10.1002/smj.384.
[7] Gittelman, M., 2016. The revolution re-visited: Clinical and genetics research paradigms and the productivity paradox in drug discovery. Research Policy 45, 1570–1585. doi:10.1016/j.respol.2016.01.007.
[8] Guan, J., He, Y., 2007. Patent-bibliometric analysis on the Chinese science – technology linkages. Scientometrics 72, 403–425. doi:10.1007/s11192-007-1741-1.
[9] Hall, B.H., Jaffe, A.B., Trajtenberg, M., 2001. The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. Working Paper 8498. National Bureau of Economic Research. doi:10.3386/w8498.