Featured Research

Digital Libraries

Citing and referencing habits in Medicine and Social Sciences journals in 2019

This article explores citing and referencing systems in Social Sciences and Medicine articles from different theoretical and practical perspectives, considering bibliographic references as a facet of descriptive representation. The analysis of citing and referencing elements (i.e. bibliographic references, mentions, quotations, and respective in-text reference pointers) identified citing and referencing habits within the disciplines under consideration, as well as errors that have persisted over the long term, as reported by previous studies that this work expands. Expected future trends in information retrieval from bibliographic metadata were gathered by approaching these referencing elements through the FRBR Entities concepts. Reference styles do not fully accomplish their role of guiding authors and publishers in providing concise and well-structured bibliographic metadata within bibliographic references. Trends in the revision of descriptive representation suggest a growing distance between the ways bibliographic references and bibliographic catalogs adopting FRBR concepts approach information, including the description levels each adopts from the perspective of the FRBR Entities concept. This study was based on a subset of Medicine and Social Sciences articles published in 2019 and, therefore, should not be taken as final or broad coverage. Future studies expanding these approaches to other disciplines and chronological periods are encouraged. By approaching citing and referencing issues as facets of descriptive representation, the findings of this study may encourage further studies that will support Information Science and Computer Science in providing tools to make bibliographic metadata description simpler, better structured, and more efficient in the face of the revision of descriptive representation currently in progress.


Citing is earlier than Cited?

Cited papers are generally published earlier than the papers that cite them. However, we found three different cases in which this order is reversed, and more may remain undiscovered. In this letter, we attempt to explain the reasons. A negative time lag between citing and cited papers may be misleading when studying the characteristics of science.
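
As an illustration of what a negative time lag looks like in citation data, the following minimal sketch flags citation pairs in which the cited paper carries a later publication date than the citing paper. The paper identifiers, dates, and the function name `negative_lag_pairs` are hypothetical and are not taken from the letter.

```python
from datetime import date


def negative_lag_pairs(pub_dates, citations):
    """Return (citing, cited, lag_days) for every citation whose lag is negative.

    The lag is the citing paper's date minus the cited paper's date, in days.
    """
    flagged = []
    for citing, cited in citations:
        lag_days = (pub_dates[citing] - pub_dates[cited]).days
        if lag_days < 0:
            flagged.append((citing, cited, lag_days))
    return flagged


# Hypothetical publication dates and citation links, for illustration only.
pub_dates = {
    "X": date(2018, 3, 1),
    "Y": date(2018, 9, 15),
    "Z": date(2017, 1, 10),
}
citations = [("X", "Y"), ("X", "Z"), ("Y", "Z")]

flagged = negative_lag_pairs(pub_dates, citations)
for citing, cited, lag_days in flagged:
    print(f"{citing} cites {cited} with a lag of {lag_days} days")
```

Here only the pair in which X cites Y is flagged, because Y is dated after X.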


Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

In this paper, we show how selecting and combining encodings of natural and mathematical language affect classification and clustering of documents with mathematical content. We demonstrate this by using sets of documents, sections, and abstracts from the arXiv preprint server that are labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of text and formulae and evaluate the performance and runtimes of selected classification and clustering algorithms. Our encodings achieve classification accuracies up to 82.8% and cluster purities up to 69.4% (number of clusters equals number of classes), and 99.9% (unspecified number of clusters) respectively. We observe a relatively low correlation between text and math similarity, which indicates the independence of text and formulae and motivates treating them as separate features of a document. The classification and clustering can be employed, e.g., for document search and recommendation. Furthermore, we show that the computer outperforms a human expert when classifying documents. Finally, we evaluate and discuss multi-label classification and formula semantification.
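
For readers unfamiliar with the purity measure quoted above, here is a minimal sketch of the standard definition: each cluster is credited with its most frequent class, and purity is the share of documents covered by those majority classes. The labels and the function name `cluster_purity` are illustrative assumptions, not the paper's data or code.

```python
from collections import Counter


def cluster_purity(true_labels, cluster_ids):
    """Fraction of items that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label and cluster sequences must have equal length")
    if not true_labels:
        raise ValueError("at least one item is required")
    per_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # Sum the size of the largest class inside every cluster.
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical subject classes and cluster assignments.
subjects = ["math", "math", "cs", "cs", "physics", "physics"]
assignments = [0, 0, 0, 1, 1, 1]

purity = cluster_purity(subjects, assignments)
singleton_purity = cluster_purity(subjects, list(range(len(subjects))))
print(f"purity: {purity:.3f}, one cluster per document: {singleton_purity:.3f}")
```

Under this definition, purity reaches 1.0 when every document forms its own cluster, so purity values obtained with different numbers of clusters are not directly comparable.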


Classification of abrupt changes along viewing profiles of scientific articles

With the expansion of electronic publishing, a new dynamic in the dissemination of scientific articles emerged. Nowadays, many works are widely disseminated even before publication, in the form of preprints. Another important new element concerns the views of published articles. Because some journals, such as PLoS ONE, make the corresponding data available, it became possible to investigate how scientific works are viewed over time, often before the first citations appear. This provides the main theme of the present work. More specifically, our research was motivated by preliminary observations that view profiles over time tend to be piecewise linear. A methodology was then developed to identify the main segments in the view profiles, which allowed several related measurements to be derived. In particular, we focused on the inclination and length of each successive segment. Basic statistics indicated that the inclination can vary substantially across successive segments, while the segment lengths proved more stable. A complementary joint statistical analysis, considering pairwise correlations, provided further information about the properties of the views. To better understand the view profiles, we performed multivariate statistical analyses, including principal component analysis and hierarchical clustering. The results suggest that a portion of the polygonal view profiles are organized into clusters or groups. These groups were characterized in terms of prototypes indicating the relative increase or decrease along successive segments. Four distinct models were then developed to represent the observed segments. Models incorporating joint dependencies between the properties of the segments provided the most accurate results among the alternatives considered.


Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets

Metadata automatically extracted from scholarly documents in PDF format is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision. The existing solution, which is based on information retrieval and string similarity on titles, works well only if the titles are clean. We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset. The blocking function uses the classic BM25 algorithm to find matching candidates in the reference data, which has been indexed by ElasticSearch. The core components use supervised methods that combine features extracted from all available metadata fields. The system also leverages available citation information to match entities. The combination of metadata and citation information achieves high accuracy and significantly outperforms the baseline method on the same test dataset. We apply this system to match the database of CiteSeerX against Web of Science, PubMed, and DBLP. This method will be deployed in the CiteSeerX system to clean metadata and link records to other scholarly big datasets.
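
The BM25-based blocking step can be sketched as follows. This is not the authors' system: it is a small in-memory stand-in for an ElasticSearch index, using the classic Okapi BM25 formula with the commonly used parameters k1 = 1.2 and b = 0.75. The class name `BM25Index`, the whitespace tokenizer, and the example titles are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class BM25Index:
    """Tiny in-memory BM25 index over a list of reference titles."""

    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [Counter(tokenize(d)) for d in documents]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        # Number of documents containing each term.
        self.doc_freq = Counter()
        for counts in self.docs:
            self.doc_freq.update(counts.keys())

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        total = len(self.docs)
        return math.log(1 + (total - n + 0.5) / (n + 0.5))

    def score(self, query, idx):
        counts = self.docs[idx]
        norm = self.k1 * (1 - self.b + self.b * self.lengths[idx] / self.avg_len)
        total = 0.0
        for term in set(tokenize(query)):
            tf = counts.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def candidates(self, query, top_k=2):
        """Indices of the top_k reference records with a positive score."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:top_k]]


# Hypothetical reference titles and a noisy extracted title with typos.
reference_titles = [
    "cleaning noisy metadata for record linking",
    "classification and clustering of arxiv documents",
    "code replicability in computer graphics",
]
index = BM25Index(reference_titles)
noisy_title = "Cleaning Noisy and Heterogeneus Metadata Record Linkng"
block = index.candidates(noisy_title)
print(block)
```

The blocking step only narrows the search to a few candidates. In the system described above, the final match decision is made by supervised components that use all available metadata fields and citation information.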


Climate Change and Social Sciences: a bibliometric analysis

The complexity of emergent wicked problems, such as climate change, culminates in a reformulation of how we think about society and mobilizes scientists from various disciplines to seek solutions and perspectives on the problem. From an epistemological point of view, it is essential to evaluate how such topics develop inside the academic arena, but doing so requires a complex analysis of the large number of recent academic publications. In this work, we discuss how climate change has been addressed by the social sciences in practice. Can we observe the development of a new epistemology with the emergence of the climate change debate? Are there contributions in academic journals within the field of social sciences addressing climate change? Which journals are these? Who are the authors? To answer these questions, we developed an innovative method combining different tools to search, filter, and analyze the impact of the academic production related to climate change in the most relevant social sciences journals.


Co-author weighting in bibliometric methodology and subfields of a scientific discipline

Collaborative work and co-authorship are fundamental to the advancement of modern science. However, it is not clear how collaboration should be measured in achievement-based metrics. Co-author weighted credit introduces distortions into the bibliometric description of a discipline. It puts great weight on collaboration, not because of the results of collaboration but purely because collaborations exist. In terms of publication and citation impact, it artificially favors some subdisciplines. To understand how credit is given in a co-author weighted system (like the NRC's method), we introduce credit spaces. We include a study of the discipline of physics to illustrate the method. Indicators are introduced to measure the proportion of a credit space awarded to a subfield or a set of authors.


Co-citations in context: disciplinary heterogeneity is relevant

Citation analysis of the scientific literature has been used to study and define disciplinary boundaries, to trace the dissemination of knowledge, and to estimate impact. Co-citation, the frequency with which pairs of publications are cited together, provides insight into how documents relate to each other and across fields. Co-citation analysis has been used to characterize combinations of prior work as conventional or innovative and to derive features of highly cited publications. Given the organization of science into disciplines, a key question is how sensitive such analyses are to the frame of reference. Our study examines this question using semantically themed citation networks. We observe that trends reported to be true across the scientific literature do not hold for focused citation networks, and we conclude that inferring novelty using co-citation analysis and random graph models benefits from disciplinary context.
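
To make the co-citation measure concrete, the following minimal sketch counts, for each pair of cited publications, how many citing papers list both in their references. The paper identifiers and the function name `cocitation_counts` are hypothetical, and the sketch does not attempt the random graph modeling discussed above.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Map each unordered pair of cited works to its co-citation frequency.

    reference_lists maps a citing paper to the list of works it cites.
    """
    counts = Counter()
    for references in reference_lists.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(references)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers (P1..P3) and cited works (A, B, C).
citing = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

counts = cocitation_counts(citing)
for pair, frequency in sorted(counts.items()):
    print(pair, frequency)
```

Restricting `citing` to the papers of a single discipline before counting is one simple way to obtain the kind of focused, discipline-specific frame of reference the abstract refers to.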


Code Replicability in Computer Graphics

Being able to duplicate published research results is an important part of conducting research, whether to build upon these findings or to compare with them. This process is called "replicability" when using the original authors' artifacts (e.g., code), or "reproducibility" otherwise (e.g., re-implementing algorithms). Reproducibility and replicability of research results have gained a lot of interest recently, with assessment studies being conducted in various fields, and they are often seen as a trigger for better result diffusion and transparency. In this work, we assess replicability in Computer Graphics by evaluating whether the code is available and whether it works properly. As a proxy for this field, we compiled, ran, and analyzed 151 codes out of 374 papers from the 2014, 2016, and 2018 SIGGRAPH conferences. This analysis shows a clear increase in the number of papers with available and operational research code, with differences across subfields, and indicates a correlation between code replicability and citation count. We further provide an interactive tool to explore our results and evaluation data.


Coevolution of theoretical and applied research: a case study of graphene research by temporal and geographic analysis

As part of science of science (SciSci) research, the evolution of scientific disciplines has been attracting a great deal of attention recently. This kind of discipline-level analysis not only gives insight into one particular field but also sheds light on general principles of the scientific enterprise. In this paper, we focus on graphene research, a fast-growing field that covers both theoretical and applied study. Using a co-clustering method, we split the graphene literature into two groups and confirm that one group corresponds to theoretical research (T) and the other to applied research (A). We analyze the proportion of T to A and find that applied research became increasingly popular after 2007. Geographical analysis shows that countries have different preferences in terms of T and A and reacted differently to the research trend. We also analyze the interaction between the two groups and find that T relies overwhelmingly on T and A relies heavily on A; however, this pattern is very stable for T but has changed markedly for A. No geographic difference is found in the interaction dynamics. Our results give a comprehensive picture of the evolution of graphene research and also provide a general framework that can be used to analyze other disciplines.
