Coevolution of theoretical and applied research: a case study of graphene research by temporal and geographic analysis
Ai Linh Nguyen (a,1), Wenyuan Liu (a,1,∗), Siew Ann Cheong (a)
(a) Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, 637371, Singapore
Abstract
As part of science of science (SciSci) research, the evolution of scientific disciplines has attracted a great deal of attention recently. This kind of discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. In this paper we focus on graphene research, a fast-growing field that covers both theoretical and applied study. Using a co-clustering method, we split the graphene literature into two groups and confirm that one group corresponds to theoretical research (T) and the other to applied research (A). We analyze the proportions of T and A and find that applied research has become increasingly popular since 2007. Geographical analysis demonstrates that countries have different preferences in terms of T/A and have reacted differently to research trends. We also analyze the interaction between the two groups and find that T relies overwhelmingly on T and A relies heavily on A; however, the situation is very stable for T but has changed markedly for A. No geographic difference is found in the interaction dynamics. Our results give a comprehensive picture of the evolution of graphene research and also provide a general framework that can be used to analyze other disciplines.

∗ Corresponding author
Email address: [email protected] (Wenyuan Liu)
1 These authors contributed equally.
Preprint submitted to arXiv February 5, 2021

1. Introduction

As an emerging field, science of science (SciSci) has attracted a great deal of attention recently (Zeng et al., 2017; Fortunato et al., 2018; Wang and Barabási, 2021). In SciSci, science is treated as a complex system that includes ideas, papers, scientists, funding agencies and, more importantly, the connections among them. Using methods from complex systems and complex networks, researchers have revealed many interesting findings about science from data. These include the universal citation distribution (Radicchi et al., 2008), scientists’ career dynamics (Petersen et al., 2012), and the role of teams in science (Wuchty et al., 2007), to name just a few.

As a complex system, the science ecosystem has an obvious hierarchical structure: roughly speaking, science includes physical science and life science; physical science includes physics, astronomy, chemistry and earth science; physics includes mechanics, electromagnetism, thermodynamics, relativity, quantum physics and so on; and electromagnetism itself has its own internal fine structure. At each hierarchical level, scientific elements (people, ideas, papers) are closely connected within disciplines and loosely connected between disciplines. This naturally occurring modular structure is convenient for SciSci research, since we can concentrate on a particular discipline by assuming that the influence from other disciplines is negligible. This kind of discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. Therefore, interest in discipline analysis has been growing rapidly.

One of the earliest extensive studies on this subject was done by Bettencourt and Kaur (2011). The authors focused on the evolution of sustainability science, a new discipline that emerged in the 1980s.
By analyzing a large corpus of relevant publications, they found that sustainability science has been growing explosively since its advent in the 1980s. The discipline has an unusual spatial distribution of contributions: they are widely distributed across both developed and developing countries, and the collaboration network has strong roots in national capitals rather than in traditionally more academic cities. To capture the main themes that define the field, the authors decomposed the corpus into traditional disciplines and found that these themes concern the integrated management of human, social, and ecological systems from an engineering and policy perspective. Through collaboration network analysis, Bettencourt and Kaur (2011) also revealed that the unification of sustainability science happened around the year 2000.

A recent discipline analysis looked at artificial intelligence (AI) research (Frank et al., 2019). The authors tried to determine whether AI research and relevant social science fields keep pace with each other. To answer this question, they used citations to track the communication between AI research and other fields. They analyzed citation flows from 1950 to 2018 and found that these flows are neither constant nor symmetric. AI research cited philosophy, geography and art heavily in its early years, whereas current AI research cites mathematics and computer science most strongly. On the other hand, other fields have not cited AI research in proportion to its growing number of publications. There is an attention gap between AI research and social science.

Here, we conduct a discipline analysis of graphene research, a relatively young field focusing on a single layer of carbon atoms arranged in a two-dimensional honeycomb lattice. Since the seminal work of Novoselov et al. (2004), which helped Andre Geim and Konstantin Novoselov win the Nobel Prize in Physics in 2010, interest in graphene has grown explosively and even led the European Commission to fund the Graphene Flagship with e .
2. Methodology

To build the dataset, we chose ‘graphene’ as the topic keyword to search the Web of Science Core Collection and obtained bibliographical records of 135,617 graphene-related journal papers in August 2018. These records were also used in another paper of ours (Nguyen et al., 2020), where interested readers can find more information about them. Web of Science may not cover every journal publication on this topic; however, given the wide coverage of the Science Citation Index, most mainstream graphene papers should be included in our dataset.

The 135,617 records contain various document types: articles, proceedings papers, reviews, meeting abstracts, etc. Since the primary focus of this paper is the coevolution between the theoretical and applied branches of graphene research, we only included research articles with DOI names and publication years in our analysis. This leaves 115,988 records, and all analyses in this paper are done with them unless mentioned otherwise.
To group these papers into clusters by research topic, we applied a block diagonal co-clustering algorithm introduced by Ailem et al. (2015, 2016), namely CoClus, to divide the papers into a number of non-overlapping clusters together with their characteristic words. It is well known that papers on different research topics tend to have different word frequency features, and researchers have demonstrated that the CoClus algorithm can effectively co-cluster document-word matrices (Ailem et al., 2016). In this paper, we assume that (a) the linguistic content of the title and abstract is sufficient to tell the topic of each article, and (b) words that appear in fewer than 0.01% of all records have an insignificant impact on the clustering process. Generally speaking, the CoClus algorithm aims to partition an object set I of size P and the corresponding attribute set J of size N into g non-overlapping clusters with high in-cluster density and low cross-cluster density. In our case, the goal is to split our data collection (papers and associated words) into g groups, each consisting of a set of papers and a set of words. These words are used more frequently in the papers belonging to the same group and less frequently in the papers belonging to other groups.

First, every article is represented by an N-dimensional vector $a_i$:

$$a_i = (a_{w_1}, a_{w_2}, \ldots, a_{w_N}), \tag{1}$$

where N is the number of feature words considered in this study and $a_{w_j}$ is the count of word $w_j$ in article $a_i$. By stacking all paper vectors, we constructed a paper-word matrix A:

$$A = \begin{pmatrix} a_1 \\ \vdots \\ a_P \end{pmatrix} = \begin{pmatrix} a_{1w_1} & \cdots & a_{1w_N} \\ \vdots & \ddots & \vdots \\ a_{Pw_1} & \cdots & a_{Pw_N} \end{pmatrix}, \tag{2}$$

where $a_{Pw_N}$ represents the total count of word $w_N$ in the title and abstract of article P. The algorithm then clusters the matrix by introducing a block seriation C:

$$C = \{c_{pn}\}_{p=1,\ldots,P;\; n=1,\ldots,N}, \tag{3}$$

where $c_{pn} = 1$ if object p and attribute n belong to the same cluster and $c_{pn} = 0$ otherwise. Ailem et al. (2015) introduced a reformulated modularity:

$$Q(A, C) = \frac{1}{\sum_{p,n} a_{pn}} \sum_{p=1}^{P} \sum_{n=1}^{N} \left( a_{pn} - \frac{\sum_{n} a_{pn} \sum_{p} a_{pn}}{\sum_{p,n} a_{pn}} \right) c_{pn}, \tag{4}$$

where $a_{pn}$ denotes the entry of A for paper p and word $w_n$. It turns out that a partition with high in-cluster density and low cross-cluster density corresponds to a high Q, so the C that returns the largest Q is the best partition we are looking for. The reformulated modularity depends linearly on C; therefore, the co-clustering task can be regarded as an integer linear programming problem. The Python package CoClust (Role et al., 2019) was used to find the partition C that returns the largest reformulated modularity Q. Each paper or word has a cluster label g in C, and we can easily construct any cluster by grouping the papers or words with the same label g. This is the main idea of the CoClus algorithm, and we refer interested readers to Ailem et al. (2015, 2016).

2.3. Keywords by Z-score

The co-clustering method helps us divide papers and words into several groups. However, it is not a panacea for all research problems. Apart from the technical perspective, the result of co-clustering depends heavily on how the research problem is abstracted into mathematical form. If the abstraction is not reasonable, co-clustering may not return meaningful results. It is therefore very important to validate the co-clustering result from different perspectives. In this study, we built a keyword list for each cluster. If the co-clustering works well, one keyword list should contain more “theoretical” words and the other more “applied” words, and this can easily be checked by anyone with basic science training. Beyond validation, these keywords also give us a comprehensible overview of graphene research.

There are many ways to extract keywords from a corpus (Berry and Kogan, 2010). The most straightforward option for us is to count all words in each cluster and pick the most frequent words as keywords.
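The naive frequency ranking just described can be sketched in a few lines of Python. This is only an illustration: the helper name `top_words_by_frequency` and the toy abstracts are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def top_words_by_frequency(cluster_docs, n=3):
    """Rank words by raw count within one cluster (the naive approach)."""
    counts = Counter(word for doc in cluster_docs for word in doc.split())
    return [word for word, _ in counts.most_common(n)]

# Hypothetical, already tokenized toy abstracts (not real records).
theory_docs = [
    "graphene dirac band study",
    "graphene spin gap study",
    "graphene band gap",
]
applied_docs = [
    "graphene electrode study",
    "graphene supercapacitor electrode",
    "graphene sensor study",
]

theory_top = top_words_by_frequency(theory_docs)
applied_top = top_words_by_frequency(applied_docs)
print(theory_top)
print(applied_top)
```

In this toy setting the same generic words lead both rankings, so raw counts alone say little about what distinguishes one cluster from the other.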
However, this naive method may pick general words like “research”, “study”, “found”, “result” or even “graphene”, which are used extensively in most papers. Such words are not informative as keywords for graphene research topics. Therefore, we use the statistical significance of word occurrence to measure the correlation between words and clusters. The words with strong correlation are the keywords for that cluster.

To measure the correlation between word $w_j$ and cluster $I_k$ (with $N_k$ papers), we first count the number of papers in $I_k$ that contain $w_j$ in their titles and abstracts, and refer to it as $\hat{\mu}_{w_j,k}$. We also measure the frequency of $w_j$ in the whole data collection and denote it as $P_{w_j}$. Assuming that this count follows a binomial distribution, the expected mean and variance of the number of records in $I_k$ that contain $w_j$ are:

$$\mu_{w_j,k} = N_k \cdot P_{w_j}, \tag{5}$$

$$\sigma^2_{w_j,k} = N_k \cdot P_{w_j} \cdot (1 - P_{w_j}). \tag{6}$$

Then the z-score for word $w_j$ can be defined as:

$$z_{w_j} = \frac{\hat{\mu}_{w_j,k} - \mu_{w_j,k}}{\sigma_{w_j,k}}. \tag{7}$$

Evidently, a word that appears more often in one cluster than in any other will have a high z-score in that cluster. On the other hand, a word with a high z-score does not necessarily appear often, as long as its frequency is higher than the expected value. The top z-score words should be able to illustrate the topic of each cluster, and we use them as the keywords for that cluster.

Because graphene is a highly competitive field, many regions give it an important position when they draw up their research programmes. For example, European Commission fund Graphene Flagship with e £
38 million (homepage, 2021b) and the South Korean government announced the “Technology Roadmap for Promoting Commercialization of graphene (2015–2020)” in 2015 (). Since regions have different research traditions, resources and targets, it is not surprising that they may have different aims and preferences in graphene research. To examine this hypothesis, we counted the authors’ affiliations of each paper and calculated regional credit accordingly. More specifically, if all authors of paper P only have affiliations in region R, then R has full credit for paper P, that is, 1. On the other hand, if paper P is the result of international collaboration, we first split the credit evenly among all authors. Then, for each author, we split his/her credit uniformly among all his/her affiliations. Finally, the credits associated with each affiliation are added together to get the regional credits for that paper, which can be fractions. For instance, suppose paper P has three authors $A_1$, $A_2$ and $A_3$. $A_1$ has two affiliations, one in country $C_1$ and the other in country $C_2$. $A_2$ has only one affiliation, in country $C_2$. $A_3$ has two affiliations, both in $C_3$. Then $C_1$ has credit $\frac{1}{6}$, $C_2$ has credit $\frac{1}{2} = \frac{1}{6} + \frac{1}{3}$ and $C_3$ has credit $\frac{1}{3}$.

2.5. T/A dependency

Once the graphene research literature has been divided into theoretical and applied branches, a question emerges naturally: how did the theoretical branch and the applied branch influence each other and shape current graphene research? Although the existence of interplay between them is apparent, the extent and strength of the interaction are far from obvious. Generally, theoretical research provides guidelines for applied research, and applied research provides a proving ground for theoretical research. Meanwhile, theoretical research inspires further discussion of theoretical questions, and applied research stimulates subsequent applied study of common interest. Given these complicated interactions, a quantitative method is needed to measure the influence between theoretical research and applied research.
Only then can we answer that question from our dataset.

In this study, we use citations to capture influence: if paper A cites only theoretical papers, only theoretical research influences A; if paper B cites only applied papers, only applied research influences B; if paper C cites both theoretical and applied papers, both influence C. At first glance, it is tempting to simply use the proportion of references to measure influence: if paper D cites 3 theoretical papers and 7 applied papers, then theoretical research has 30% influence and applied research has 70% influence over paper D. However, this method oversimplifies the process of knowledge accumulation: the 3 theoretical papers in D’s references may cite some applied papers, and the 7 applied papers in D’s references may depend heavily on earlier theoretical works. Simply counting references misses this information and therefore cannot accurately reflect the interaction between T and A.

Inspired by the persistent influence in della Briotta Parolo et al. (2020), we introduce a dependency factor pair $(D^n_T, D^n_A)$ for paper n to describe its dependency on theoretical research ($D^n_T$) and applied research ($D^n_A$). A paper with a pair (0.7, 0.3) receives 70% of its influence from theoretical research and 30% from applied research. The pair satisfies the following conditions: $0 \le D_T \le 1$, $0 \le D_A \le 1$, and $D_T + D_A = 1$. To avoid the oversimplification mentioned in the previous paragraph, we split the whole dependency into two parts: direct dependency ($d_{T/A}$) and indirect dependency ($i_{T/A}$). The direct dependency describes the proportions of T and A among a paper’s references, while the indirect dependency captures the dependency of those references. The direct dependency is straightforward: for any paper, we only need to check its references; if x% of them are theoretical papers, then $d_T = x\%$ and $d_A = 1 - x\%$. We use the indirect dependency to reflect implicit influence: paper A may cite only applied papers, but if those applied papers cite many theoretical papers, paper A benefits from theoretical research indirectly, and this influence is captured by $i_{T/A}$. For paper n, the indirect dependency is defined as the average of the dependency factors of all its references, that is to say,

$$i^n_T = \frac{1}{\lVert \text{refs of } n \rVert} \sum_{m \in \text{refs of } n} D^m_T, \tag{8a}$$

$$i^n_A = \frac{1}{\lVert \text{refs of } n \rVert} \sum_{m \in \text{refs of } n} D^m_A. \tag{8b}$$

By combining the direct and indirect dependency, our definition reflects a paper’s reliance on theoretical and applied research more accurately and comprehensively.

After obtaining $(d^n_T, d^n_A)$ and $(i^n_T, i^n_A)$, the direct and indirect dependency of paper n on T and A respectively, the overall dependencies $D^n_T$ and $D^n_A$ can be expressed as:

$$D^n_T = r \cdot d^n_T + (1 - r) \cdot i^n_T, \tag{9a}$$

$$D^n_A = r \cdot d^n_A + (1 - r) \cdot i^n_A, \tag{9b}$$

where r is the control parameter determining the mix ratio of direct and indirect dependency, with the constraint $0 \le r \le 1$.

Figure 1: Illustration of the dependency factor in a citation network. In all panels, we plot the same citation network, where nodes are papers and each edge represents a citation. The arrow corresponds to the flow of influence, i.e., from a cited paper to a citing paper. The square nodes are theoretical papers and the circle nodes are applied papers. All nodes are colored according to their $D_T$ values, which are calculated using Eq. 9. Root papers are left blank as their $D_T$ are undefined. The control parameter r is set to 0 in (a), 0.5 in (b) and 1 in (c).

By setting r = 1, the dependency factor reduces to the direct dependency only, that is to say, it only reflects the proportions of T and A among the paper’s references. On the other hand, with r = 0 the dependency factor reduces to the indirect dependency only, and the reference proportions have no explicit effect. These extreme cases correspond to the oversimplification problem discussed in the previous paragraphs. By introducing r, we are able to overcome this oversimplification without loss of generality. The effect of r is shown in Fig. 1. We discuss the numerical effect of r in the SI and its value is set to 0 .
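The recursion in Eqs. (8) and (9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dependency_factors`, the toy network, and the value r = 0.5 are hypothetical; papers are assumed to be listed so that every cited paper precedes the papers citing it; and, since the text leaves root papers undefined, this sketch simply skips them in the indirect average and falls back to the direct value when every reference is a root.

```python
def dependency_factors(labels, references, r=0.5):
    """Return {paper: (D_T, D_A)} following Eqs. (8)-(9).

    labels:     {paper: "T" or "A"}
    references: {paper: [cited papers]}, ordered so that cited papers
                come before the papers citing them (assumption)
    r:          mix ratio of direct and indirect dependency (illustrative)
    """
    D = {}
    for paper, refs in references.items():
        refs = [m for m in refs if m in labels]
        if not refs:
            continue  # root paper: dependency left undefined
        # Direct dependency: share of theoretical papers among references.
        d_T = sum(labels[m] == "T" for m in refs) / len(refs)
        # Indirect dependency: mean D_T of references that have one.
        # Assumption: roots are skipped; if none is defined, use d_T.
        known = [D[m][0] for m in refs if m in D]
        i_T = sum(known) / len(known) if known else d_T
        D_T = r * d_T + (1 - r) * i_T
        D[paper] = (D_T, 1 - D_T)
    return D

# Hypothetical toy network: p3 cites one T and one A root; p4 cites only p3.
labels = {"p1": "T", "p2": "A", "p3": "A", "p4": "A"}
references = {"p1": [], "p2": [], "p3": ["p1", "p2"], "p4": ["p3"]}

D_mixed = dependency_factors(labels, references, r=0.5)
D_direct = dependency_factors(labels, references, r=1.0)
print(D_mixed["p4"], D_direct["p4"])
```

In this toy network p4 cites only an applied paper, so its direct dependency on T is zero, yet with r = 0.5 it inherits a nonzero $D_T$ through p3; with r = 1 that indirect contribution disappears.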
3. Results and discussion
We start our study by identifying theoretical and applied papers in our data collection. Given the complexity of research, this dichotomy may lose some information, since some papers are hard to categorize. However, this classification is well accepted by the graphene research community and our results are in good agreement with this convention. Dividing the graphene literature into two branches captures the most significant heterogeneity inside graphene research. Therefore, we use the theoretical/applied dichotomy throughout this paper and call the two branches T and A for short.

It is well known that word frequency changes significantly from field to field. We assume that theoretical graphene papers tend to use one set of words frequently while applied graphene papers tend to use another, and that these two sets are distinct. Based on this assumption, we first counted all words in the titles and abstracts of the 115,988 papers using standard natural language processing techniques (tokenization, stopword filtering, stemming and so on). The words with frequency larger than 0.01% were selected as feature words, giving 12328 of them. Every article is represented by a 12328-dimensional vector $(a_{w_1}, a_{w_2}, \ldots, a_{w_{12328}})$, where $a_{w_j}$ is the count of word $w_j$ in that article. Combining all paper vectors together, we constructed a 115988 × k first to find best partition with this k. Normally, the number of clusters is unknown at the beginning. The common protocol is to repeat this process with different k and choose the k with the highest modularity as the result. We ran the CoClus algorithm with 2 ≤ k ≤ k = 4 and k = 6, see supplementary information for more details.

However, the cluster structure in the paper-word network is quite fuzzy, and more effort is needed to obtain a reasonable partition. By comparing the best partitions under different k, we noticed that they share a common feature: one group recurs in most partitions while the other groups are not stable.
This suggests that there exists a distinct boundary between that group and the others, while the other detected structures are more or less “overfitting”. To show the stability of this “hyperstructure”, we compare the two partitions with the highest modularity, namely k = 4 and k = 6. For k = 4, we name the stable group I and the other groups II, III and IV for convenience. Likewise, for k = 6 we name the stable group 1 and the other groups 2, 3, 4, 5 and 6. Since the union of groups II, III and IV is the complement of group I, we call it group I’; similarly, we call the union of groups 2, 3, 4, 5 and 6 group 1’. We find that group I has 43323 papers, group 1 has 42798 papers, and they have 40406 papers in common; group I’ has 72665 papers, group 1’ has 73190 papers, and they have 70293 in common. The situation is very similar for other k values. Therefore, to avoid “overfitting”, we use the “hyperstructure” with k = 4, namely groups I and I’, as the partition result.

Using the keyword extraction method, we found that the stable group is about theoretical research, and we call it T. The other three groups under k = 4 are (1) synthesis and functionalization, (2) supercapacitor, and (3) sensor. This also explains the fuzzy boundaries between them, since they are closer to each other than they are to theoretical research. Since they are application-oriented compared with T, we merged them into one cluster, A. There are 72665 papers and 8162 words in group A, and 43323 papers and 4166 words in group T. We visualize the partition in Fig. 2: paper vectors in T are colored blue and paper vectors in A are colored red. We also sorted the columns so that the first 4166 columns represent words in group T and the remaining 8162 columns represent words in group A. As illustrated by Fig. 2, the first 4166 words are used more frequently in group T and the other words are used more frequently in group A, since these areas are darker than the adjacent blocks.
This is exactly the aim of co-clustering: high in-cluster density and low cross-cluster density. It is worth noting that there are three subclusters inside A (these blocks are darker than the rest of the red area). They correspond to the more specific topics (1) synthesis and functionalization, (2) supercapacitor, and (3) sensor, respectively.

Figure 2: Heatmap of the word distribution in our data collection. There are 115988 rows and 12328 columns. Each row represents a paper and each column represents a word. A filled block means that the word appears in the corresponding paper; otherwise we leave it empty. For the purpose of comparison, theoretical papers are colored blue and applied papers are colored red.

To validate the clustering result and obtain an intuitive picture of the research topics, we built a keyword list for each cluster. To avoid statistical fluctuations, we only considered words in the top 2% by frequency. The results are not sensitive to the quantile chosen here as long as rare words (which have large fluctuations) are dropped. These keywords are ranked by their z-scores, and the top 25 keywords for T and A are shown in Fig. 3. The lists are very informative: list A covers hot terms in applied graphene research, like electrochemical, cycles, supercapacitors, batteries, electrode, lithium and so on; list T covers key concepts in theoretical graphene research, such as Dirac, gap, spin, calculations, band, point, states and so on. Therefore, we can confidently conclude that the co-clustering method successfully divides the graphene research literature into a theoretical branch and an applied branch. We also give two more extensive word clouds for T and A in the SI.

Based on the co-clustering result, we first analyze the proportions of T and A. Among all graphene research papers, 62.6% belong to group A and 37.4% belong to group T. This suggests that graphene research attracts attention from both theoretical and applied perspectives, and that both have made indispensable contributions to this field.
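The overall shares quoted above follow directly from the cluster sizes reported in the text (43323 papers in T and 72665 in A). The short Python sketch below reproduces that arithmetic and adds a hypothetical helper, `yearly_shares`, showing how a per-year share of T could be tallied; the toy records are invented for illustration and are not data from the paper.

```python
from collections import Counter, defaultdict

# Cluster sizes reported in the text.
N_T, N_A = 43323, 72665
total = N_T + N_A
share_T = 100 * N_T / total  # percentage of papers in T
share_A = 100 * N_A / total  # percentage of papers in A

def yearly_shares(records):
    """records: iterable of (year, label) pairs with label 'T' or 'A'.

    Returns {year: fraction of that year's papers labeled 'T'}.
    """
    counts = defaultdict(Counter)
    for year, label in records:
        counts[year][label] += 1
    return {year: c["T"] / (c["T"] + c["A"])
            for year, c in sorted(counts.items())}

# Hypothetical toy records (not taken from the dataset).
toy = [(2005, "T"), (2005, "T"), (2005, "A"),
       (2015, "A"), (2015, "A"), (2015, "T"), (2015, "A")]
toy_shares = yearly_shares(toy)
print(round(share_T, 1), round(share_A, 1), toy_shares)
```

Grouping by year before normalizing matters here because the number of papers differs greatly from year to year, so each year's share has to be computed from that year's counts alone.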
Furthermore, this ratio is not constant over time. As shown in Fig. 4, the proportion of T increased from 70% to around 90% during 2004-2007, then gradually decreased to below 30% in 2017. (Our temporal analysis focuses on papers since 2004 because that is the year graphene gained global attention; papers about graphene before 2004 are rare, only about 0.5% of our dataset.) These curves indicate that at the early stage of graphene research, the theoretical branch played a dominant role and gained even more popularity until 2007. After that, the applied branch grew relatively faster and became the majority after 2012, and this trend continued until 2017.

Figure 3: Top keywords by z-score in groups A and T. Each bar’s length is proportional to the word frequency in that group and its color is based on the z-score.

The reason behind this process is not clear. Our speculation is that after the seminal paper of Novoselov et al. (2004), graphene research attracted a great deal of attention. At that time, the area was still in its infancy and many theoretical questions remained to be answered, while the preparation of graphene was difficult and expensive, so applied research was limited to a few labs. Researchers therefore published more theoretical papers than applied papers. As time elapsed, people gained more understanding of graphene, the low-hanging fruit was picked, and theoretical questions became more difficult and time-consuming. In contrast, technical advances made the preparation of graphene easier. This lowered the research barrier and allowed more scientists to join applied research. Also, a better understanding of graphene inspired more application scenarios, which motivated more applied research. All of these factors together shape the curves in Fig. 4.
Although this hypothesis remains to be validated, our finding that the proportion of group A has increased steadily since 2007 provides a big picture of the graphene research ecosystem for researchers, companies and funding agencies.

Figure 4: The yearly percentage of T and A from 2004 to 2017.

Besides temporal evolution, we also studied the geographic distribution of graphene research. As a highly active field with enormous economic potential, it attracts scientists and engineers from all over the world. Affected by tradition, manpower and funding policy, regions may have different aims and preferences in graphene research. To validate this conjecture, we calculated regional credit using the method in Sec. 2.4. By summing the credit distributions of all papers, we are able to measure each region’s contribution to the whole field. As shown in Fig. 5, Mainland China is the top player in both theoretical and applied research, with a 22.4% share in T and a 51.8% share in A. That means Mainland China’s contribution to applied research is even larger than the sum of all other regions. The United States is also a big player, with an 18.9% share in T and a 7.5% share in A. In contrast to Mainland China, the US has a larger share in T than in A. This suggests that Mainland China is more focused on applied graphene research while the United States takes a more balanced position. Furthermore, the composition of Fig. 5 (a) and (b) reveals a subtle difference between theoretical and applied research: there are 13 regions with at least a 2% contribution in T but only 7 such regions in A. This suggests that applied research is less geographically diverse than theoretical research.

Figure 5: The top regions by contribution in T and A. Only regions with at least 2% are shown. See Appendix A for region codes.

The reason behind this is complicated and may be due to economic factors, funding policy, research culture and so on.
We leave it for future research.

So far we have studied graphene research temporally and geographically. If these two dimensions are combined, we are able to analyze the evolution of graphene research in particular regions. More concretely, for a given region in a given year we can calculate its credit in T ($C_T$) and A ($C_A$) by the method in Sec. 2.4, using only publications from that year. The yearly proportions of T and A for that region can then be calculated as $\frac{C_T}{C_T + C_A}$ and $\frac{C_A}{C_T + C_A}$. We plot the yearly proportions of T and A for six regions in Fig. 6. These six regions have at least a 2% share in both theoretical and applied research (see Fig. 5) and are therefore considered top players in graphene research. The curves in Fig. 6 can be thought of as regional versions, or components, of Fig. 4, and they show that all six regions follow the same general trend as Fig. 4: a gradual decrease of theoretical research and an increase of applied research. However, that shift occurred more slowly in the United States than in other regions. This suggests that the research community in the United States still puts considerable resources into theoretical research. This finding is important for understanding graphene research competition among regions and for making funding policy.

Figure 6: The yearly proportion of T and A in Mainland China, the United States, South Korea, India, Iran and Singapore from 2004 to 2017. Curves break when the region has no paper in that year.

Like most, if not all, scientific research fields, graphene research has both theoretical and applied branches as indispensable parts. Each of them has its own mission and focus. At the same time, they both rely on communication with each other: theoretical research gets feedback from applied research, and applied research receives guidance from theoretical research. This internal coevolution is extremely important for any scientific research field.
If this coevolution mechanism does not work well, the research field will have difficulty moving forward, like Aristotelian physics or science in ancient non-western civilizations. Given this importance, we quantified the internal coevolution in graphene research in terms of the interplay between T and A. The dependency factors were computed for all papers, and the average values were then calculated for papers in groups T and A respectively. As shown in Table 1, both T and A rely more on themselves than on each other. However, the difference is obvious: T depends critically on T (90%) while A relies more moderately on A (69%). This result illustrates a remarkable difference between T and A: theoretical research is mostly driven by other theoretical research, whereas applied research pays fairly high attention to theoretical research. One possible interpretation of this difference is that theoretical graphene research has achieved significant progress and its current effort is beyond existing technology, while applied research benefits a lot from theoretical research, and as a result A cites T a lot, both directly and indirectly. More evidence is needed to verify this explanation; however, given that in the history of science theory has been ahead of application most of the time, we believe it is a plausible explanation.

Table 1: The average dependency of T and A groups on T and A respectively.

Dependency on      Group T    Group A
Theoretical (T)    0.90       0.31
Applied (A)        0.10       0.69

The numbers in Table 1 are aggregated results for papers published in all years. Even though they provide many insights, the temporal information is lost. To get a better understanding of the coevolution process, we calculated the average values of T/A dependency for T/A papers in each year and plot them in Fig. 7.
As shown in that figure, the dependency curves behave significantly differently in the theoretical and applied groups: the curves for T are very stable, while the curves for A underwent dramatic changes during that time. From 2004 to 2006, A’s dependency on T increased from 0.3 to 0.7 and gradually decreased afterward. That is to say, applied graphene research was mainly driven by theoretical research at the early stage and became more self-motivated as time went on. This result suggests that the internal coevolution is not a “static equilibrium”, but rather a “dynamic process”. Not only do the proportions of T and A change with time (Fig. 4), but the interaction between them also changes. More work is needed to fully understand the reasons behind these changes, and our finding can serve as a basis for further research.

Figure 7: The yearly average dependency of T (top panel) and A (bottom panel) on T (blue) and A (red) respectively.

We have already shown the geographic differences in Fig. 6. Does such an effect also occur in T/A dependency? To answer this question, we grouped all papers according to the authors’ affiliation regions and calculated the dependency of their T and A papers on T and A. We only consider papers with all authors in the same region, since it is not clear how to attribute the dependency of internationally collaborated papers to regions. However, we found that single-region papers are very similar to internationally collaborated papers in terms of dependency evolution. Therefore, the result here would not change much if internationally collaborated papers were included. Please see the SI for more details. The results for the top six regions are shown in Fig. 8. Unlike Fig. 6, all regions in Fig. 8 show roughly the same behavior, even Mainland China and the United States. This suggests that the trend we found in Fig. 7 is a universal phenomenon in graphene research and is insensitive to geographic factors.
4. Conclusion
Discipline-level analysis gives many insights into the scientific enterprise and also offers an important reference for practical purposes such as career decisions, hiring decisions, and funding policy. Some important work has been done within the SciSci paradigm, for example on the evolution of sustainability science, the structure and evolution of physics, and the development of artificial intelligence (Bettencourt and Kaur, 2011; Sinatra et al., 2015; Frank et al., 2019).

Figure 8: The yearly average dependency of T (square) and A (circle) on T (blue) and A (red) in Mainland China, the United States, South Korea, India, Iran and Singapore from 2004 to 2017. Curves are broken in years for which a region has no T or A papers.
In this study, we complemented previous studies by investigating the evolution of graphene research in terms of its main components: a theoretical branch and an applied branch. Using the co-clustering method, we divided the graphene publication collection into two groups. By extracting each group’s keywords, we confirmed that one group is about theoretical research and the other is about applied research. Overall, 37.4% of papers belong to T and the remaining 62.6% belong to A. However, these ratios are not constant over time: the proportion of T increased to around 90% in 2007 and gradually decreased afterward, while the proportion of A did just the opposite. This suggests that applied research grew faster than theoretical research and that the attention of the graphene research community shifted gradually from theory to application. By analyzing authors’ affiliations, we computed the region credit for every paper and each region’s total contribution. The distribution of contributions is very different in T and A: many regions made significant contributions in T, while Mainland China is dominant in A. The evolution curves also show significant differences among regions. Using the dependency factor we introduced, the reliance between theoretical research and applied research is quantified. We found that this dependency is asymmetric: theoretical research is overwhelmingly influenced by itself, while applied research benefits from T and A in a more balanced way. The dependency relation is very stable for theoretical research but changed significantly for applied research. We also found that this phenomenon is insensitive to geographic factors, which suggests it is a universal process.

Although many interesting findings were observed, several important questions remain to be answered. For instance, graphene papers were classified as either theoretical research or applied research in this study.
However, this dichotomy may fail for some papers, since collaboration between theorists and experimentalists has become common and it is inaccurate to put such papers in either T or A. In other words, an overlapping-clusters picture of graphene research may be a better description of reality. Future work should take this overlap into account, given the importance of collaboration in the modern scientific enterprise. In addition, all graphene papers are treated with equal importance in this study. This choice simplifies our analysis but deviates from reality, where a few papers receive most of the attention. It would be beneficial to incorporate this fact into our framework, and the new results may give a better picture of graphene research.
5. Acknowledgements
This research is supported by the Singapore Ministry of Education Academic Research Fund, under grant number MOE2017-T2-2-075.
6. Author Contributions
Conceived and designed the analysis: Wenyuan Liu.
Collected the data: Ai Linh Nguyen.
Performed the analysis: Ai Linh Nguyen.
Wrote the paper: Ai Linh Nguyen, Wenyuan Liu and Siew Ann Cheong.

Appendix A. Region code
CN: Mainland China. US: The United States. JP: Japan. DE: Germany. KR: South Korea. IR: Iran. IN: India. RU: Russia. GB: The United Kingdom. FR: France. ES: Spain. IT: Italy. SG: Singapore. TW: Taiwan.
References
Melissa Ailem, François Role, and Mohamed Nadif. Co-clustering document-term matrices by direct maximization of graph modularity. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 1807–1810, Melbourne Australia, October 2015. ACM. ISBN 9781450337946. doi: 10.1145/2806416.2806639. URL https://dl.acm.org/doi/10.1145/2806416.2806639.
Melissa Ailem, François Role, and Mohamed Nadif. Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 109:160–173, October 2016. ISSN 09507051. doi: 10.1016/j.knosys.2016.07.002. URL https://linkinghub.elsevier.com/retrieve/pii/S0950705116302064.
Andreas Barth and Werner Marx. Graphene - A rising star in view of scientometrics. arXiv:0808.3320 [cond-mat, physics:physics], September 2008. URL http://arxiv.org/abs/0808.3320. arXiv: 0808.3320.
Michael W Berry and Jacob Kogan. Text mining applications and theory. 2010. ISBN 9780470689653 9780470749821 9780470689646. URL https://nbn-resolving.org/urn:nbn:de:101:1-201501026976. OCLC: 719451203.
L. M. A. Bettencourt and J. Kaur. Evolution and structure of sustainability science. Proceedings of the National Academy of Sciences, 108(49):19540–19545, December 2011. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1102712108.
Pietro della Briotta Parolo, Rainer Kujala, Kimmo Kaski, and Mikko Kivelä. Tracking the cumulative knowledge spreading in a comprehensive citation network. Physical Review Research, 2(1):013181, February 2020. ISSN 2643-1564. doi: 10.1103/PhysRevResearch.2.013181. URL https://link.aps.org/doi/10.1103/PhysRevResearch.2.013181.
Santo Fortunato, Carl T. Bergstrom, Katy Börner, James A. Evans, Dirk Helbing, Staša Milojević, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, and Albert-László Barabási. Science of science. Science, 359(6379):eaao0185, March 2018. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.aao0185.
Morgan R. Frank, Dashun Wang, Manuel Cebrian, and Iyad Rahwan. The evolution of citation graphs in artificial intelligence research. Nature Machine Intelligence, 1(2):79–85, February 2019. ISSN 2522-5839. doi: 10.1038/s42256-019-0024-5.
Graphene Flagship homepage, 2021a. URL https://graphene-flagship.eu/.
National Graphene Institute homepage, 2021b.
Peng Hui Lv, Gui-Fang Wang, Yong Wan, Jia Liu, Qing Liu, and Fei-cheng Ma. Bibliometric trend analysis on global graphene research. Scientometrics, 88(2):399–419, August 2011. ISSN 1588-2861. doi: 10.1007/s11192-011-0386-x. URL https://doi.org/10.1007/s11192-011-0386-x.
Ai Linh Nguyen, Wenyuan Liu, Khiam Aik Khor, Andrea Nanetti, and Siew Ann Cheong. The golden eras of graphene science and technology: Bibliographic evidences from journal and patent publications. Journal of Informetrics, 14(4):101067, November 2020. ISSN 17511577. doi: 10.1016/j.joi.2020.101067. URL https://linkinghub.elsevier.com/retrieve/pii/S1751157719303542.
K. S. Novoselov, A. K. Geim, S. V. Morozov, D. Jiang, Y. Zhang, S. V. Dubonos, I. V. Grigorieva, and A. A. Firsov. Electric field effect in atomically thin carbon films. Science, 306(5696):666–669, October 2004. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1102896.
A. M. Petersen, M. Riccaboni, H. E. Stanley, and F. Pammolli. Persistence and uncertainty in the academic career. Proceedings of the National Academy of Sciences, 109(14):5213–5218, April 2012. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1121429109.
F. Radicchi, S. Fortunato, and C. Castellano. Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105(45):17268–17272, November 2008. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.0806977105.
François Role, Stanislas Morbieu, and Mohamed Nadif. coclust: a python package for co-clustering. Journal of Statistical Software, 88(7), 2019. ISSN 1548-7660. doi: 10.18637/jss.v088.i07.
Roberta Sinatra, Pierre Deville, Michael Szell, Dashun Wang, and Albert-László Barabási. A century of physics. Nature Physics, 11(10):791–796, October 2015. ISSN 1745-2473, 1745-2481. doi: 10.1038/nphys3494.
Dashun Wang and Albert-László Barabási. The science of science. Cambridge University Press, Cambridge, 2021. ISBN 9781108492669.
S. Wuchty, B. F. Jones, and B. Uzzi. The increasing dominance of teams in production of knowledge. Science, 316(5827):1036–1039, May 2007. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1136099.
An Zeng, Zhesi Shen, Jianlin Zhou, Jinshan Wu, Ying Fan, Yougui Wang, and H. Eugene Stanley. The science of science: From the perspective of complex systems. Physics Reports, 714-715:1–73, November 2017. ISSN 03701573. doi: 10.1016/j.physrep.2017.10.001. URL https://linkinghub.elsevier.com/retrieve/pii/S0370157317303289.