Data and Information Management | 2019
Special Issue on Cyberinfrastructure, Machine Learning, and Digital Library
Abstract
This special issue aims to bridge the gap between digital libraries and archives on the one hand and cyberinfrastructure on the other by inviting researchers and practitioners from both fields, as well as domain experts, to share their ideas, introduce theories and methods, and demonstrate successful use cases. Increasingly, digital libraries and archives need, and are using, cyberinfrastructure and machine learning to meet curation, data management, and researchers’ needs. Academic libraries have made significant progress in accommodating data into their services and collections. This has been achieved through data management consulting services and institutional repositories for final and relatively small data publications. However, research data management remains challenging for large-scale data generated from complex analysis pipelines run on distributed computational resources. More often than not, research conducted in these ecosystems involves multiple computational facilities, remote users facing access and authorization roadblocks, data that evolve over varying time frames, missing documentation, data transfer problems, storage scalability limitations, and data vulnerability risks. A necessary approach to meeting these challenges is to use cyberinfrastructure, which refers to large shared online research environments backed by advanced computing resources hosted in data centers and supported by experts. Coupling cyberinfrastructure with digital libraries and archives can provide the technical resources and expertise required to manage and analyze data at scale, as well as new opportunities to facilitate data preservation, access, and reuse. Facilitating the adoption and integration of machine learning, cyberinfrastructure, and digital libraries is an important step toward smart data and information management.
However, there is a lack of venues and forums that bring together researchers and practitioners to share visions, questions, the latest advances in methodology, application experiences, and best practices. Library and archival professionals are often unfamiliar with cyberinfrastructure. In turn, cyberinfrastructure experts lack experience in traditional digital library and archival practices such as metadata, provenance, publishing, information retrieval, and digital preservation. To address this challenge, the workshop on cyberinfrastructure, machine learning, and digital library was held in conjunction with the 2018 Joint Conference on Digital Libraries on June 3, 2018, in Dallas. The workshop included 10 project presentations and drew more than 40 participants working at the crossroads of these three fields. The projects span the entire data life cycle and range from the infrastructure and service provider perspectives to the end-user perspective. From the workshop, five projects were invited to submit extended versions for this special issue.
In the paper “Capsule Computing: Safe Open Science,” Beth et al. present their recent work and insights on enabling more options for data sharing. The authors argue that more flexible approaches are needed to share protected data and propose a solution named the capsule framework. They demonstrate the synergies and tradeoffs of the current implementation through a use case that applies the framework to a massive collection of copyrighted texts. Instead of just providing public data content and withholding any protected data from research, the proposed capsule framework enables a novel sociotechnical system involving controlled interaction between humans, machines, and the environmental aspects of the work system. Through this framework, users can access protected data, but with limited interaction. The limitations are imposed by the capsule framework to ensure that the data content remains safe.
The HathiTrust data collection is used in the paper as an example to demonstrate the usage and features of the proposed framework.
*Corresponding author: Weijia Xu, Texas Advanced Computing Center, University of Texas at Austin, Austin, USA, Email: [email protected]
Maria Esteva, Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
Jessica Trelogan, University of Texas Library, TX, USA
Dan Wu, Center for Studies of Information Resources, Wuhan University, Wuhan, China