Report on the 7th International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2018)
ECIR WORKSHOP REPORT
Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany, [email protected]
Ingo Frommholz, Institute for Research in Applicable Computing, University of Bedfordshire, Luton, UK, [email protected]
Guillaume Cabanac, University of Toulouse, Computer Science Department, IRIT UMR 5505, France, [email protected]
Abstract
The Bibliometric-enhanced Information Retrieval (BIR) workshop series started at ECIR in 2014 and serves as the annual gathering of IR researchers who address various information-related tasks on scientific corpora and bibliometrics. We welcome contributions elaborating on dedicated IR systems, as well as studies revealing original characteristics on how scientific knowledge is created, communicated, and used. This report presents all accepted papers at the 7th BIR workshop at ECIR 2018 in Grenoble, France.
The Bibliometric-enhanced Information Retrieval (BIR) workshop series started at ECIR in 2014 [1] and serves as the annual gathering of IR researchers who address various information-related tasks on scientific corpora and bibliometrics [2]. The workshop features original approaches to search, browse, and discover value-added knowledge from scientific documents and related information networks (e.g., terms, authors, institutions, references).

The current incarnation continues the evolution of our workshop series. The first BIR workshops set the research agenda by introducing the workshop topics, illustrating state-of-the-art methods, reporting on current research problems, and brainstorming about common interests. For the fourth workshop, co-located with the ACM/IEEE-CS JCDL 2016, we broadened the workshop scope and interlinked the BIR workshop with the natural language processing (NLP) and computational linguistics field [3]. This joint activity was continued in 2017 at SIGIR in the second BIRNDL workshop [4].

This 7th full-day BIR workshop at ECIR 2018 aimed to foster a common ground for the incorporation of bibliometric-enhanced services (including text mining functionality) into scholarly search engine interfaces. In particular we addressed specific communities, as well as studies on large, cross-domain collections. This workshop strove to feature contributions from core bibliometricians and core IR specialists who already operate at the interface between scientometrics and IR.

This year's workshop hosted two keynotes as well as a set of regular papers and two demos. The publications are briefly outlined in the following subsections.

This workshop featured two inspirational keynotes to kick-start thinking and discussion on the workshop topic. They were followed by paper presentations and demos (Fig. 1) in a format that we found to be successful at previous BIR workshops.

Cyril Labbé tackled a hot topic in his keynote titled “Trends in gaming indicators: On failed attempts at deception and their computerised detection” [5]. He outlined various efforts to manipulate indicators by tricking the scientific community (e.g., by submitting automatically generated papers). Other issues undermining the trust we place in peer-reviewed science were examined, such as data–results mismatch impeding the reproduction of results in cancer research. Labbé surveyed his recent work in these areas while reflecting on the potential of B+IR (bibliometrics and information retrieval) to address these critical issues.

In his keynote “Integrating and exploiting public metadata sources in a bibliographic information system” [6], Ralf Schenkel presented an in-depth summary of recent metadata activities in the computer science bibliography DBLP, which is maintained by Schloss Dagstuhl and the University of Trier. He outlined procedures for monitoring, selecting, and prioritizing computer science venues for inclusion in the DBLP bibliography. A special focus was given to author disambiguation and the utilization of citation data.
Sarol, Liu, and Schneider proposed a citation and text-based publication retrieval framework [7]. After the user provides some seed articles, the system collects papers connected by citations and applies a combination of citation- and text-based filtering methods. The framework was evaluated in a systematic reviewing task.

http://bit.ly/bir2018
Workshop proceedings are available at: http://ceur-ws.org/Vol-2080/

Figure 1: A sense of the atmosphere at the BIR workshop.

Ollagnier, Fournier, and Bellot highlighted the central references of a paper based on the mining of its fulltext, quantifying the occurrences of all in-text references [8]. They benchmarked this approach against a system in production at OpenEdition, and discussed the results in terms of enhanced relevance.

In their article on query expansion, Rattinger, Le Goff, and Guetl combined word embeddings and co-authorship relations [9]. The set of documents used for pseudo-relevance feedback was enriched by similar documents from co-authors, applying a locally trained Word2Vec model. Adding similar documents from co-authors significantly improved on the baseline.

Bertin and Atanassova reported on the construction of the InTeReC dataset [10]. Utilising different section types from PLOS articles, InTeReC consists of within-text references and their surrounding sentences. Additionally, verb phrases were extracted, providing an idea of the nature of the reference.

Kacem and Mayr investigated the usage and influence of a specific search stratagem – the Journal Run – in an academic search engine log file [11]. They studied the frequency and stage of use of the journal run as well as its impact on sessions. The authors found that the frequency of usage of the analyzed journals is not related to the impact factor within these sessions and that the size of the journal (Bradford Zones) has an insignificant correlation.
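To make the idea of quantifying in-text reference occurrences more concrete, the following is a minimal sketch under stated assumptions, not the method of Ollagnier et al. [8]. It assumes that citations appear as bracketed numeric markers such as [3], [2, 5] or [4-6]; the function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: citations are bracketed numeric markers such as [3], [2, 5] or [4-6].
CITATION_MARKER = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")


def expand_marker(marker):
    """Turn the inside of one marker, e.g. '2, 5-7', into [2, 5, 6, 7]."""
    refs = []
    for part in marker.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            refs.extend(range(start, end + 1))
        else:
            refs.append(int(part))
    return refs


def count_in_text_references(fulltext):
    """Count how often each reference number is cited in the running text."""
    counts = Counter()
    for match in CITATION_MARKER.finditer(fulltext):
        counts.update(expand_marker(match.group(1)))
    return counts


def central_references(fulltext, top_n=3):
    """Return the top_n most frequently cited reference numbers with their counts."""
    return count_in_text_references(fulltext).most_common(top_n)
```

Calling `central_references(fulltext, top_n=5)` on the body of a paper returns the five reference numbers cited most often in the text. Raw frequency is only one possible signal of centrality; a production system would also need to handle author-year citation styles and weight occurrences by section.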
Cataldi, Di Caro, and Schifanella designed the d-index to evaluate the degree of dependence of a researcher on his/her co-authors over time. They implemented this indicator and demonstrated it online (http://d-index.di.unito.it) with DBLP as the bibliographic data source [12].

The demo paper by Bessagnet presented a framework combining thematic, temporal, and spatial features of tweets in the field of Human and Social Sciences [13]. The author promoted the 5W dimensions (who, when, what, where, why) for the analysis of tweets.

While the past workshops laid the foundations for further work and also made the benefit of bringing information retrieval and bibliometrics together more explicit, there are still many challenges ahead. One of them is to provide infrastructures and testbeds for the evaluation of retrieval approaches that utilise bibliometrics and scientometrics. To this end, a focus of the workshop and the discussion was on real experimentation (including demos) and industrial participation. This line was started in a related workshop at JCDL (BIRNDL 2016) and continued at SIGIR (BIRNDL 2017), but with a focus on digital libraries and computational linguistics. Given the complex information needs scholars usually face, we emphasized information retrieval and information seeking and searching aspects.

In July 2018 we will run the third iteration of the BIRNDL workshop (http://wing.comp.nus.edu.sg/birndl-sigir2018/) at the 41st SIGIR conference in Ann Arbor, MI, USA. In conjunction with the BIRNDL workshop, the 4th CL-SciSumm Shared Task in Scientific Document Summarization (http://wing.comp.nus.edu.sg/cl-scisumm2018/) will be held.

In 2015 we published a first special issue on “Combining Bibliometrics and Information Retrieval” in the Scientometrics journal [2]. A special issue on “Bibliometrics, Information Retrieval and Natural Language Processing in Digital Libraries” will appear in 2018 in the International Journal on Digital Libraries [14]. Another special issue on “Bibliometric-enhanced Information Retrieval and Scientometrics” is in preparation for the Scientometrics journal.

Since 2016 we have maintained the “Bibliometric-enhanced-IR Bibliography” (https://github.com/PhilippMayr/Bibliometric-enhanced-IR_Bibliography/), which collects scientific papers that appear in collaboration with the BIR/BIRNDL organizers. We invite interested researchers to join this project and contribute related publications.

Acknowledgement
We wish to thank all those who have contributed to the workshop proceedings: all the contributing authors and the many reviewers who generously offered their time and expertise.

References
[1] Mayr, P., Scharnhorst, A., Larsen, B., Schaer, P., Mutschke, P.: Bibliometric-Enhanced Information Retrieval. In: 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands (2014) 798–801
[2] Mayr, P., Scharnhorst, A.: Scientometrics and Information Retrieval: Weak-links revitalized. Scientometrics (3) (2015) 2193–2199
[3] Cabanac, G., Chandrasekaran, M.K., Frommholz, I., Jaidka, K., Kan, M.Y., Mayr, P., Wolfram, D.: Report on the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016). SIGIR Forum (2) (2016) 36–43
[4] Mayr, P., Chandrasekaran, M.K., Jaidka, K.: Report on the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017). SIGIR Forum (3) (2017) 107–113
[5] Labbé, C.: Trends in gaming indicators: On failed attempts at deception and their computerised detection. In: Proc. of the Seventh Workshop on Bibliometric-enhanced Information Retrieval (BIR), Grenoble, France, CEUR-WS.org (2018) 6–15
[6] Schenkel, R.: Integrating and exploiting public metadata sources in a bibliographic information system. In: Proc. of the Seventh Workshop on Bibliometric-enhanced Information Retrieval (BIR), Grenoble, France, CEUR-WS.org (2018) 16–21
[7] Sarol, M.J., Liu, L., Schneider, J.: Testing a citation and text-based framework for retrieving publications for literature reviews. In: Proc. of the Seventh Workshop on Bibliometric-enhanced Information Retrieval (BIR), Grenoble, France, CEUR-WS.org (2018) 22–33
[8] Ollagnier, A., Fournier, S., Bellot, P.: BIBLME RecSys: Harnessing bibliometric measures for a scholarly paper recommender system. In: Proc. of the Seventh Workshop on Bibliometric-enhanced Information Retrieval (BIR), Grenoble, France, CEUR-WS.org (2018) 34–45
[9] Rattinger, A., Le Goff, J.M., Guetl, C.: Local word embeddings for query expansion based on co-authorship and citations. In: Proc. of the Seventh Workshop on Bibliometric-enhanced Information Retrieval (BIR), Grenoble, France, CEUR-WS.org (2018) 46–53

The list of PC members can be found at