Featured Research

Digital Libraries

Making Recommendations from Web Archives for "Lost" Web Pages

When a user requests a web page from a web archive, the user will typically either get an HTTP 200 if the page is available, or an HTTP 404 if the web page has not been archived. This is because web archives are typically accessed by URI lookup, and the response is binary: the archive either has the page or it does not, and the user will not know of other archived web pages that exist and are potentially similar to the requested web page. In this paper, we propose augmenting these binary responses with a model for selecting and ranking recommended web pages in a web archive. The goal is to enhance both HTTP 404 responses and HTTP 200 responses by surfacing web pages in the archive that the user may not know existed. First, we check if the URI is already classified in DMOZ or Wikipedia. If the requested URI is not found, we use machine learning to classify the URI using DMOZ as our ontology and collect candidate URIs to recommend to the user. Next, we filter the candidates based on whether they are present in the archive. Finally, we rank candidates based on several features, such as archival quality, web page popularity, temporal similarity, and URI similarity. We calculated the F1 score for different methods of classifying the requested web page at the first level. We found that using all-grams from the URI after removing numerals and the TLD produced the best result with F1=0.59. For second-level classification, the micro-average F1=0.30. We found that 44.89% of the correctly classified URIs contained at least one word that exists in a dictionary and 50.07% of the correctly classified URIs contained long strings in the domain. In comparison, among the URIs from our Wayback access logs, only 5.39% contained only words from a dictionary, and 26.74% contained at least one word from a dictionary. These percentages are low and may affect whether a requested URI can be correctly classified.
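
The feature-extraction step described above (all-grams from the URI after removing numerals and the TLD) might look roughly like the following sketch. It is an illustration only: the abstract does not specify the tokenization, so reading "all-grams" as every contiguous token n-gram, treating the TLD as the last host label, and the function names `uri_tokens` and `all_grams` are all assumptions, not the authors' implementation.

```python
import re
from urllib.parse import urlparse


def uri_tokens(uri):
    """Split a URI into lowercase alphabetic tokens, dropping numerals and the TLD."""
    # urlparse only fills in netloc when the string contains "//".
    parsed = urlparse(uri if "//" in uri else "//" + uri)
    host_labels = parsed.netloc.lower().split(".")
    # Assumption: the TLD is simply the last dot-separated host label.
    if len(host_labels) > 1:
        host_labels = host_labels[:-1]
    text = " ".join(host_labels) + " " + parsed.path.lower()
    # Keeping only runs of letters also removes numerals and punctuation.
    return re.findall(r"[a-z]+", text)


def all_grams(tokens):
    """Return every contiguous token n-gram, for n = 1 .. len(tokens)."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, len(tokens) + 1)
        for i in range(len(tokens) - n + 1)
    ]


tokens = uri_tokens("http://www.example.com/sports/2019/football-news.html")
features = all_grams(tokens)
```

The resulting n-gram strings could then be fed to any text classifier; which classifier the paper used is not stated in the abstract.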

Making sense of global collaboration dynamics: Developing a methodological framework to study (dis)similarities between country disciplinary profiles and choice of collaboration partners

This paper presents a novel methodological framework for studying and understanding the effects of globalization on international collaboration. Using the cosine similarity of countries' disciplinary and partner profiles by collaboration type, it is possible to analyse the effects of globalization and the costs and benefits of an increasingly global networked research system.
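
As a minimal sketch of the similarity measure named above, the cosine similarity of two country profiles can be computed as follows. The profile representation (a mapping from discipline to publication share) and the example values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity of two sparse profiles given as {category: weight} dicts."""
    keys = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(k, 0.0) * profile_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical disciplinary profiles (share of output per discipline).
country_x = {"physics": 0.5, "medicine": 0.3, "chemistry": 0.2}
country_y = {"physics": 0.1, "medicine": 0.7, "chemistry": 0.2}
similarity = cosine_similarity(country_x, country_y)
```

A value near 1 indicates profiles with nearly the same proportions, and a value near 0 indicates profiles with little overlap.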

Mapping Researchers with PeopleMap

Discovering research expertise at universities can be a difficult task. Directories routinely become outdated, and few help in visually summarizing researchers' work or supporting the exploration of shared interests among researchers. This results in lost opportunities for both internal and external entities to discover new connections, nurture research collaboration, and explore the diversity of research. To address this problem, at Georgia Tech, we have been developing PeopleMap, an open-source interactive web-based tool that uses natural language processing (NLP) to create visual maps for researchers based on their research interests and publications. Requiring only the researchers' Google Scholar profiles as input, PeopleMap generates and visualizes embeddings for the researchers, significantly reducing the need for manual curation of publication information. To encourage and facilitate easy adoption and extension of PeopleMap, we have open-sourced it under the permissive MIT license at this https URL. PeopleMap has received positive feedback and enthusiasm for expanding its adoption across Georgia Tech.

Mapping Three Decades of Intellectual Change in Academia

Research on the development of science has focused on the creation of multidisciplinary teams. However, while this coming together of people is symmetrical, the ideas, methods, and vocabulary of science have a directional flow. We present a statistical model of the text of dissertation abstracts from 1980 to 2010, revealing for the first time the large-scale flow of language across fields. Results of the analysis include identifying methodological fields that export broadly, emerging topical fields that borrow heavily and expand, and old topical fields that grow insular and retract. Particular findings show a growing split between molecular and ecological forms of biology and a sea change in the humanities and social sciences driven by the rise of gender and ethnic studies.

Mapping of Publications Productivity on Journal of Documentation 1989-2018: A Study Based on Clarivate Analytics-Web of Science Database

The present study analyzes research productivity in the Journal of Documentation (JDoc) over the 30 years from 1989 to 2018. Citation and source data were downloaded from the Web of Science database, a service from Clarivate Analytics. The Bibexcel and Histcite applications were used to present the datasets. The analysis focuses on parameters such as citation impact at the local and global levels, influential authors and their total output, and the ranking of contributing institutions and countries. In addition, a scientographical mapping of the data is presented through graphs produced with the VOSviewer mapping software.

Mapping social media attention in Microbiology: Identifying main topics and actors

This paper aims to map and identify topics of interest within the field of Microbiology and to identify the main sources driving such attention. We combine data from Web of Science and this http URL, a platform which retrieves mentions of scientific literature from social media and other non-academic communication outlets. We focus on the dissemination of microbial publications on Twitter, in news media, and in policy briefs. A two-mode network of social accounts shows distinctive areas of activity. We identify a cluster of papers mentioned solely by regional news media. A central area of the network is formed by papers discussed by all three outlets. A large portion of the network is driven by Twitter activity. When analyzing the top actors contributing to this network, we observe that more than half of the Twitter accounts are bots, mentioning 32% of the documents in our dataset. Within news media outlets, there is a predominance of popular science outlets. With regard to policy briefs, both international and national bodies are represented. Finally, our topic analysis shows that the thematic focus of the papers mentioned varies by outlet. While news media cover the widest range of topics, policy briefs focus on translational medicine and bacterial outbreaks.

Mapping the "long tail" of research funding: A topic analysis of NSF grant proposals in the Division of Astronomical Sciences

"Long tail" data are considered to be smaller, heterogeneous, researcher-held data, which present unique data management and scholarly communication challenges. These data are presumably concentrated within relatively lower-funded projects due to insufficient resources for curation. To better understand the nature and distribution of long tail data, we examine National Science Foundation (NSF) funding patterns using Latent Dirichlet Allocation (LDA) and bibliographic data. We also introduce the concept of "Topic Investment" to capture differences in topics across funding levels and to illuminate the distribution of funding across topics. This study uses the discipline of astronomy as a case study, exploring possible associations between topic, funding level, and research output, with implications for research policy and practice. We find that while different topics demonstrate different funding levels and publication patterns, dynamics predicted by the "long tail" theoretical framework presented here can be observed within NSF-funded topics in astronomy.

Mapping the backbone of the Humanities through the eyes of Wikipedia

The present study aims to establish a valid method by which to apply the theory of co-citations to Wikipedia article references and, subsequently, to map these relationships between scientific papers. This theory, originally applied to scientific literature, will be transferred to the digital environment of collective knowledge generation. To this end, a dataset containing Wikipedia references collected from Altmetric and Scopus' Journal Metrics journals has been used. The articles have been categorized according to the disciplines and specialties established in the All Science Journal Classification (ASJC). They have also been grouped by journal of publication. A set of articles in the Humanities, comprising 25 555 Wikipedia articles with 41 655 references to 32 245 resources, has been selected. Finally, a descriptive statistical study has been conducted and co-citations have been mapped using networks and indicators of degree and betweenness centrality.
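
To illustrate the co-citation idea described above, the sketch below counts how often two resources are referenced by the same Wikipedia article and derives a simple degree centrality from the resulting network. The input format, the identifiers, and the function names are hypothetical, and betweenness centrality is omitted for brevity; this is not the study's actual pipeline.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(article_refs):
    """Count, for each pair of resources, the articles that reference both."""
    counts = Counter()
    for refs in article_refs.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(counts):
    """Fraction of the other nodes each resource is co-cited with."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical data: Wikipedia article -> referenced papers.
article_refs = {
    "Article A": ["paper1", "paper2", "paper3"],
    "Article B": ["paper1", "paper2"],
    "Article C": ["paper3", "paper4"],
}
pairs = cocitation_counts(article_refs)
centrality = degree_centrality(pairs)
```

Here `paper1` and `paper2` are co-cited twice, so their link would carry the highest weight in a co-citation map.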

Mapping the co-evolution of artificial intelligence, robotics, and the internet of things over 20 years (1998-2017)

Understanding the emergence, co-evolution, and convergence of science and technology (S&T) areas offers competitive intelligence for researchers, managers, policy makers, and others. The resulting data-driven decision support helps set proper research and development (R&D) priorities; develop future S&T investment strategies; monitor key authors, organizations, or countries; perform effective research program assessment; and implement cutting-edge education/training efforts. This paper presents new funding, publication, and scholarly network metrics and visualizations that were validated via expert surveys. The metrics and visualizations exemplify the emergence and convergence of three areas of strategic interest: artificial intelligence (AI), robotics, and internet of things (IoT) over the last 20 years (1998-2017). For 32,716 publications and 4,497 NSF awards, we identify their conceptual space (using the UCSD map of science), geospatial network, and co-evolution landscape. The findings demonstrate how the transition of knowledge (through cross-discipline publications and citations) and the emergence of new concepts (through term bursting) create a tangible potential for interdisciplinary research and new disciplines.

Mathematical Formulae in Wikimedia Projects 2020

This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern, semantically enriched, language-independent MathML formulae in 2020. Additionally, we describe our plans to further improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects.
