Featured Research

Digital Libraries

Collecting 16K archived web pages from 17 public web archives

We document the creation of a dataset of 16,627 archived web pages, or mementos, of 3,698 unique live web URIs (Uniform Resource Identifiers) from 17 public web archives. We used four different methods to collect the dataset. First, we used the Los Alamos National Laboratory (LANL) Memento Aggregator to collect mementos of an initial set of URIs obtained from four sources: (a) the Moz Top 500, (b) the dataset used in our previous study, (c) the HTTP Archive, and (d) the Web Archives for Historical Research group. Second, we extracted URIs from the HTML of already collected mementos. These URIs were then used to look up mementos in LANL's aggregator. Third, we downloaded web archives' published lists of URIs of both original pages and their associated mementos. Fourth, we collected more mementos from archives that support the Memento protocol by requesting TimeMaps directly from the archives rather than through the Memento aggregator. Finally, we downsampled the collected mementos to 16,627 to satisfy two constraints: a maximum of 1,600 mementos per archive, and the ability to download all mementos from each archive in less than 40 hours.
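
The per-archive cap in the final step can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes uniform random sampling within each archive (the abstract does not say how mementos were chosen), it does not model the 40-hour download constraint, and the function name and input layout are hypothetical.

```python
import random
from collections import defaultdict


def downsample_per_archive(mementos, max_per_archive=1600, seed=0):
    """Cap the number of mementos kept for each archive.

    mementos: iterable of (archive_name, memento_uri) pairs (assumed layout).
    Returns a dict mapping archive_name -> list of kept memento URIs.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group memento URIs by the archive that holds them.
    by_archive = defaultdict(list)
    for archive, uri in mementos:
        by_archive[archive].append(uri)

    sample = {}
    for archive, uris in by_archive.items():
        if len(uris) > max_per_archive:
            # Assumption: uniform random sampling without replacement.
            uris = rng.sample(uris, max_per_archive)
        sample[archive] = uris
    return sample
```

Archives that hold fewer mementos than the cap are kept whole, so only the largest archives are reduced.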

Digital Libraries

Collective authorship in Ukrainian science: marginal effect or new phenomenon?

One of the features of modern science is the formation of stable, large collaborations of researchers working together within projects that require the concentration of huge financial and human resources. The results of such joint work are published in scientific papers by large co-authorship teams that sometimes include thousands of names. The goal of this work is to study the influence of such publications on the values of scientometric indicators calculated for individuals, research groups, and the science of Ukraine in general. Bibliometric data related to Ukraine, some academic institutions, and selected individual researchers were collected from the Scopus database and used for our study. It is demonstrated that while the relative share of publications by collective authors is comparatively small, their presence in a general pool can lead to statistically significant effects. The obtained results clearly show that traditional quantitative approaches to research assessment should be changed in order to take this phenomenon into account. Keywords: collective authorship, scientometrics, group science, Ukraine.

Digital Libraries

Comment on 'Open is not forever: a study of vanished open access journals'

This is a comment on an article by Laakso, Matthias and Jahn (arXiv:2008.11933).

Digital Libraries

Common Library 1.0: A Corpus of Victorian Novels Reflecting the Population in Terms of Publication Year and Author Gender

Research in 19th-century book history, sociology of literature, and quantitative literary history is hindered by the absence of a collection of novels that captures the diversity of literary production. We introduce a corpus of 75 Victorian novels sampled from a 15,322-record bibliography of novels published between 1837 and 1901 in the British Isles. This corpus, the Common Library, is distinctive in the following way: the shares of novels in the corpus associated with sociologically important subgroups match the shares in the broader population. For example, the proportion of novels written by women in the 1880s in the corpus is approximately the same as in the population. Although we do not, in this particular paper, claim that the corpus is a representative sample in the familiar sense--a sample is representative if "characteristics of interest in the population can be estimated from the sample with a known degree of accuracy" (Lohr 2010, p. 3)--we are confident that the corpus will be useful to researchers. This is because existing corpora--frequently convenience samples--are conspicuously misaligned with the population of published novels. They tend to over-represent novels published in specific periods and novels by men. The Common Library may be used alongside or in place of these non-representative convenience corpora.
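
One way to make subgroup shares in a 75-novel corpus track the shares in a 15,322-record bibliography is proportional allocation across strata. The sketch below is an assumption about how such matching could be done, not the authors' documented method: it uses the largest-remainder rule to turn fractional quotas into whole novels, and the stratum labels and counts are made up for illustration.

```python
from math import floor


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes.

    stratum_sizes: dict mapping stratum label -> number of records in it.
    Returns a dict mapping stratum label -> number of items to sample.
    """
    total = sum(stratum_sizes.values())

    # Exact (fractional) quota for each stratum.
    quotas = {k: sample_size * n / total for k, n in stratum_sizes.items()}

    # Start from the integer part of each quota.
    allocation = {k: floor(q) for k, q in quotas.items()}

    # Largest-remainder rule: hand the leftover slots to the strata whose
    # quotas were rounded down the most.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - allocation[k], reverse=True)
    for k in by_remainder[:leftover]:
        allocation[k] += 1
    return allocation


# Hypothetical strata; the counts are invented and merely sum to 15,322.
hypothetical_strata = {"stratum_x": 7000, "stratum_y": 5322, "stratum_z": 3000}
allocation = proportional_allocation(hypothetical_strata, sample_size=75)
```

With real strata (for example, decade of publication crossed with author gender) the same function would give the number of novels to draw from each cell.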

Digital Libraries

Communities of attention networks: introducing qualitative and conversational perspectives for altmetrics

We propose to analyze the level of recommendation and spreading in the sharing of scientific papers on Twitter to understand the interactions of communities around papers and to develop the "Community of Attention Network" (CAN). A pilot case study was conducted for the paper 'Pharmacological Treatment of Obesity' by Mancini & Halpern (2002), an extensive review of the criteria for evaluating the efficacy of anti-obesity treatments and derived pharmacological agents. The altmetric data were collected from this http URL and the description information for each tweeter was extracted from their Twitter profiles. The data were analyzed from the Microanalysis of Online Data perspective to investigate the formation of a CAN around this focal paper and the context of its formation. The studied article received 736 tweets from 134 different users with a combined exposure of more than 459,018 followers and a high level of spreading (67.26%) and recommendation (28.53%). Analysis of the bio information of the users who shared the article indicates individual profiles focused on personal issues and strong civic and political engagement. Personal-professional and institutional tweeters from the national political scene are often mentioned in the tweets. In analyzing the content of the tweets, we note that the altmetric score of the paper results from its strategic use as an online activism resource and a digital advocacy tool to mobilize stakeholders for awareness and support activities. This study, and the contextual and network perspective it introduces, may help to understand the social impact of publications by using altmetrics.

Digital Libraries

Community Detection and Growth Potential Prediction Using the Stochastic Block Model and the Long Short-term Memory from Patent Citation Networks

Scoring patent documents is very useful for technology management. However, conventional methods are based on static models and thus do not reflect the growth potential of the technology cluster to which a patent belongs. Even if the cluster of a patent has no hope of growing, the patent is judged important when its PageRank or other ranking score is high. Citation network clustering and the prediction of future citations are therefore needed. In our research, we clustered patent citation networks with the Stochastic Block Model (SBM), with the aim of enabling corporate managers and investors to evaluate the scale and life cycle of technology. As a result, we confirmed that the nested SBM is appropriate for graph clustering of patent citation networks. Also, a high MAPE value was obtained and the direction accuracy achieved a value greater than 50% when predicting the growth potential of each cluster using LSTM.
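
The two evaluation measures named above can be sketched in a few lines. MAPE has a standard definition; "direction accuracy" does not, so the version below is an assumption: a prediction counts as a hit when it moves in the same direction (up or not up) from the previous actual value as the true series does. The function names are illustrative and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (the ratio would be undefined).
    """
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def direction_accuracy(actual, predicted):
    """Share of steps (in percent) where the predicted direction of change
    matches the actual direction, both measured from the previous actual value.
    """
    hits = 0
    steps = len(actual) - 1
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        if actual_up == predicted_up:
            hits += 1
    return 100.0 * hits / steps
```

A direction accuracy of 50% is what a coin flip would achieve on a series with equally likely rises and falls, which is why values above 50% are reported as the threshold of interest.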

Digital Libraries

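How to Measure the Importance, Influence, and Relevance of Twitter Users? An analysis of the interaction of the candidates for the presidency of Brazil in the 2018 elections [original title: Como Mensurar a Importância, Influência e a Relevância de Usuários do Twitter? Uma análise da interação dos candidatos à presidência do Brasil nas eleições de 2018]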

In the contemporary world, a significant number of people use social networking services for a variety of purposes, including, but not limited to, communicating, exchanging messages, and searching for information. A popular social network in the political arena is Twitter, a microblogging service for posting messages of up to 280 characters, called "tweets"; influential politicians from various countries often use it to spread ideas and make public statements. In this work, we analyze the connections of the candidates for the presidency of the Republic of Brazil in 2018. Using complex network analysis to measure influence and relevance, we establish a metric able to quantify the importance of users in the network. As part of the analysis, a Memory Algorithm was used to detect communities, that is, groups of strongly connected vertices (tweets) that reveal groupings of users.

Digital Libraries

Comparing like with like: China ranks first in SCI-indexed research articles since 2018

China's rise in scientific research output is impressive. The academic community is curious about when China will overtake the USA in annual scientific publication output. Using the Web of Science Core Collection's Science Citation Index Expanded database, this study finds that China still ranked second in the production of SCI-indexed publications in 2019 but may leapfrog the USA to rank first in 2020 or 2021, if all document types are considered. By comparison, China has already overtaken the USA and has been the largest producer of SCI-indexed original research articles since 2018. However, China still lags behind the USA in the number of review papers produced. In general, a quantitative advantage does not equal an advantage in quality or impact. We think that the USA will continue to be the global scientific leader for a long time.

Digital Libraries

Comparing the impact of subfields in scientific journals

The impact factor has been used extensively in recent years to assess journal visibility and prestige. While the impact factor is useful for comparing journals, the specificities of subfield visibility within journals are overlooked whenever visibility is measured only at the journal level. In this paper, we analyze subfield visibility in a subset of over 450,000 Physics papers. We show that the visibility of subfields is not regular in the considered dataset. In particular years, the variability in subfield impact factor within a journal reached 75% of the average subfield impact factor. We also found that the difference in subfield visibility within the same journal can be even higher than the difference in visibility between different journals. Our results show that subfield impact is an important factor accounting for journal visibility.
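
A statement such as "variability reached 75% of the average" can be read as a coefficient of variation: the spread of subfield impact factors within one journal divided by their mean. The sketch below assumes that reading and uses the population standard deviation as the measure of spread; the abstract does not specify which measure the authors used, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def relative_variability(subfield_impact_factors):
    """Population standard deviation of the subfield impact factors,
    expressed as a percentage of their mean (coefficient of variation).

    Assumes at least one value and a non-zero mean.
    """
    average = mean(subfield_impact_factors)
    return 100.0 * pstdev(subfield_impact_factors) / average
```

Under this reading, a value of 75 would mean that subfield impact factors in a journal typically deviate from the journal's subfield average by three quarters of that average.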

Digital Libraries

Comparison of Citations Trends between the COVID-19 Pandemic and SARS-CoV, MERS-CoV, Ebola, Zika, Avian and Swine Influenza Epidemics

Objective: The novel coronavirus COVID-19 outbreak rapidly evolved into a pandemic. Global research efforts focused on this topic and, with the collaboration of the scientific journal publishing industry, produced more than 16,000 related published articles in PubMed within five months of the onset of the outbreak. Herein, the COVID-19 citations in PubMed and Web of Science are compared with those of the SARS-CoV, MERS-CoV, Ebola, Zika, avian influenza, and swine influenza epidemics. Methods: The citations were searched and collected using the disease terms and a restriction on the date of publication. The total number of PubMed citations and the HIV-associated papers during the same chronological periods were examined in parallel. The journal category and country information of the publications were gathered from Web of Science. The collected data were statistically analyzed and compared. Results: Significant correlations were found between the five-month peak of novel PubMed citations for COVID-19 and those for the MERS (CC=0.988; p=0.003; q=0.006), Ebola (CC=0.987; p=0.003; q=0.011), and SARS (CC=0.964; p=0.015; q=0.028) epidemics. However, COVID-19 publications accumulated earlier and in larger numbers than those of any other major communicable disease outbreak of the 21st century. Conclusion: The acceleration and the total number of COVID-19 publications represent an unprecedented landmark event in the history of medical libraries. The immediate adoption of fast-track peer review and publishing, as well as open access publication policies, by journal publishers contributed significantly to this bibliographic phenomenon.
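
A correlation coefficient between two citation time series can be computed as sketched below. This assumes the Pearson coefficient; the abstract reports "CC" without naming the type, and the p- and q-values it reports are not reproduced here. The monthly counts are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly citation counts for two outbreaks over five months.
outbreak_a = [5, 40, 120, 260, 400]
outbreak_b = [1, 9, 25, 50, 81]
r = pearson_r(outbreak_a, outbreak_b)
```

A high coefficient only says that the two series rise in a similar pattern over the five months; it says nothing about their absolute volumes, which is why the abstract separately notes that COVID-19 publications accumulated earlier and in larger numbers.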
