CybergeoNetworks, an interactive application for the geographical and semantic analysis of scientific publications
C. Cottineau, J. Raimbault, P.-O. Chasset, H. Commenges, A. Banos, D. Pumain
Translated from:
Clémentine Cottineau, Juste Raimbault, Pierre-Olivier Chasset, Hadrien Commenges, Arnaud Banos, and Denise Pumain (2017). CybergeoNetworks, une application interactive pour l’analyse géographique et sémantique des publications scientifiques. In Bouzeghoub M., Mosseri R. (eds.) Les Big Data à découvert, CNRS Editions (EAN: 9782271114648), pp. 272-273.
C. Cottineau, J. Raimbault∗, P.-O. Chasset, H. Commenges, A. Banos and D. Pumain
UMR CNRS 8504 Géographie-cités, France
UMR CNRS 8097 Centre Maurice Halbwachs, France
CASA, UCL, United Kingdom
UPS CNRS 3611 ISC-PIF, France
LISER, Luxembourg
UMR IDEES 6266, France
∗ [email protected]

Abstract
The increase in the number of publications has made it more difficult for authors to situate their work within previous literature, especially on subjects studied from different disciplinary viewpoints. Besides, new data analysis techniques and new bibliometric data sources provide an opportunity to map and navigate scientific landscapes. We introduce here an open-source and open-access web application designed for the multi-dimensional exploration of a journal's content, including the mapping of geographical, semantic and citation networks. The application is profiled and implemented for Cybergeo, a generalist geography journal which receives contributions from multiple sub-disciplines. We suggest that such initiatives are crucial to promote open science and reflexivity.

The exponential growth in the number of published papers and the increasing number of journals have led scientific publication into an era of big data. The changes induced by new information technologies seem to have contributed more to this massive growth of references than to simplifying editorial processes or facilitating user access to the scientific literature. The difficulties in dealing with these data go far beyond what Eugene Garfield had anticipated when creating in 1964 the ISI database (Institute for Scientific Information, which later became Web of Science after being bought by Thomson Reuters) (Garfield, 1970). Progressively, the benefits of peer review, the work done by volunteers to evaluate scientific output before publication, mostly by placing it in the context of existing knowledge, have been captured by publishers.
These were supposed to guarantee quality but have drained the budgets of academic libraries and built barriers to knowledge diffusion, despite public calls for open access to science.

The issue of mastering existing literature remains crucial for any scientist who aims to know and survey the state of the art. It becomes even more difficult for subjects situated at the interface of several disciplines, which risk being treated incompletely if habitual disciplinary “niches” are followed. This may undermine the reach of the solutions that science can bring to societal problems. Geographers are particularly aware of such pitfalls since they have long built a discipline at the boundaries of natural and social sciences, working on settlements and urbanism, on the environment and health, and on planning and development (Kosmopoulos and Pumain, 2007). It may therefore be no coincidence that a tool proposed to improve the exploration of a publication universe was developed from a geography journal, Cybergeo.

Progress in data collection and big data analysis makes it possible today to navigate networks of scientific publications in a novel way. We designed and implemented an original open-access application, which enables the exploration of text contents and keywords in more than 900 papers published by Cybergeo over 20 years. It also includes the references cited in these papers and the papers that cite them or cite the same references, among all other scientific journals accessible in Google Scholar, which corresponds to a database of around 200,000 papers with hundreds of thousands of associated keywords. The objective is to allow web users to produce for themselves, on demand, maps of geographical and semantic proximities between publications and between geographical areas by exploring around this corpus.
The application is available at https://analytics.huma-num.fr/geographie-cites/cybergeonetworks/.

A first possibility offered by the application is a diachronic mapping which represents, for a given period, the countries of affiliation of authors and the countries studied in the papers. When this information is crossed with the thematic profile of papers, it reveals a diversity of research interests depending on location, and a semantic proximity between countries studied with similar terms. Spatial proximity and semantic proximity coincide in some places. For example, institutional Europe is identified as a space of strong semantic proximity in which the lexicon of boundaries is over-represented and the lexicon of risk is under-represented (Figure 1).

Figure 1: Classification of countries studied in Cybergeo, as a function of the thematic profile of terms used in the main text.

The keywords of papers in the journal and of cited papers make it possible to connect the articles that use them and to build networks weighted by the number of these co-occurrences (Chavalarias and Cointet, 2013). The structure of these networks provides detailed information on proximities between themes, which are gathered into “communities”. Such communities are identified by colours in Figure 2, and can be explored at different levels of granularity by zooming into the network graph in the application.

Figure 2 spatialises this network and shows that disciplines such as physical geography and economic geography are linked through their common practice of methods such as spatial analysis and statistics, or of complexity paradigms.

Figure 2: Keywords network within papers cited in Cybergeo or citing them.

The semantic analysis of paper full texts also allows the construction of word clouds gathered into synthetic themes.
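Since the size of words in such clouds is proportional to their frequency within a theme (Figure 3), the underlying computation can be illustrated with a minimal Python sketch. It assumes that documents have already been tokenised and assigned to themes, a step not shown here. The function name `theme_word_sizes`, the `max_size` parameter and the toy tokens are hypothetical and are not taken from the application's code.

```python
from collections import Counter


def theme_word_sizes(theme_docs, max_size=40.0):
    """Return a display size per word, proportional to its frequency in one theme.

    theme_docs: list of token lists, one per document assigned to the theme
    (hypothetical input format, chosen for this illustration only).
    """
    # Count every token across all documents of the theme
    counts = Counter(token for doc in theme_docs for token in doc)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale linearly
    return {word: max_size * n / top for word, n in counts.items()}


# Toy corpus with invented tokens, grouped by an assumed theme label
themes = {
    "mobility": [["transport", "city", "transport"], ["city", "transport"]],
    "risk": [["flood", "risk", "risk"]],
}

for name, docs in themes.items():
    # The number of documents per theme could drive the cloud colour
    print(name, len(docs), theme_word_sizes(docs))
```

Changing how documents are grouped in `themes` is one simple way to picture a coarser or finer theme granularity under these assumptions.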
The granularity of theme grouping can be varied and the frequency of words within a theme can be measured (Figure 3).

Figure 3: Around ten broad themes obtained through the analysis of Cybergeo full texts and gathered into word clouds. The colour varies depending on the number of documents within each theme; the size of words in word clouds is proportional to the frequency of the word in the cloud theme.

The utility of this tool for authors is to improve their knowledge of the themes studied in the journal, and to situate the potential contribution of a new paper in the universe of its references and of the neighbouring disciplines it can influence. For the editorial team, it provides an instrument to steer the journal's policy, which can maintain a rather generalist editorial line or become more specialised. Finally, the application can be adapted to other online journals.

The large scientific publishing companies offer bibliometric analysis tools that allow researchers and journals to optimise their investment in order to be best positioned on the citation market. We propose here an open and free tool for self-analysis, conceived to foster the reflexivity and integrity of research.

References
Chavalarias, D. and Cointet, J.-P. (2013). Phylomemetic patterns in science evolution: the rise and fall of scientific fields. PLoS ONE, 8(2):e54847.
Garfield, E. (1970). Citation indexing for studying science. Nature, 227(5259):669–671.
Kosmopoulos, C. and Pumain, D. (2007). Citation, citation, citation: Bibliometrics, the web and the social sciences and humanities.