Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Maria Esteva is active.

Publication


Featured research published by Maria Esteva.


ACM/IEEE Joint Conference on Digital Libraries | 2010

Visualizing personal digital collections

Weijia Xu; Maria Esteva; Suyog Dott Jain

This paper describes the use of a relational database management system (RDBMS) and treemap visualization to represent and analyze a group of personal digital collections created in the context of work and lacking external metadata. We evaluated the visualization vis-à-vis the results of previous personal information management (PIM) studies. We suggest that this visualization supports analyses that allow understanding PIM practices over time.
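
A minimal sketch of the general approach, assuming a local directory tree: file-system metadata is loaded into SQLite (a lightweight RDBMS bundled with Python) and aggregated per directory, producing the kind of summary a treemap view would be drawn from. The schema, paths, and queries are illustrative, not the paper's implementation.

```python
# Sketch: index file-system metadata for a personal collection into SQLite
# and aggregate it per directory; the size totals could drive treemap cell areas.
import os
import sqlite3

def index_collection(root: str, db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE files (dir TEXT, name TEXT, ext TEXT, bytes INTEGER, mtime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable files
            ext = os.path.splitext(name)[1].lower()
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
                (dirpath, name, ext, st.st_size, st.st_mtime),
            )
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = index_collection(".")
    # Per-directory totals: one row per prospective treemap cell.
    for row in conn.execute(
        "SELECT dir, COUNT(*) AS n_files, SUM(bytes) AS total_bytes "
        "FROM files GROUP BY dir ORDER BY total_bytes DESC LIMIT 10"
    ):
        print(row)
```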


ACM/IEEE Joint Conference on Digital Libraries | 2009

Inferring intra-organizational collaboration from cosine similarity distributions in text documents

Maria Esteva; Hai Bi

We present a method that uses text mining and statistical distributions to infer degrees of collaboration between staff members in an organization, based on the similarity of the documents that they wrote and exchanged over time.
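
As an illustration of the general technique (not the paper's code), the sketch below scores document similarity with TF-IDF and cosine similarity via scikit-learn, then averages the scores per pair of authors as a rough proxy for collaboration. The toy documents and author names are invented.

```python
# Pairwise document similarity with TF-IDF + cosine similarity, summarised per
# author pair. Toy data only; real use would iterate over an organization's files.
from collections import defaultdict
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    ("alice", "quarterly budget report for the records program"),
    ("bob",   "budget notes and records program planning"),
    ("carol", "server maintenance log and backup schedule"),
]

texts = [text for _author, text in docs]
tfidf = TfidfVectorizer().fit_transform(texts)
sim = cosine_similarity(tfidf)  # sim[i, j] ranges from 0 to 1

# Average the document-level similarities for every pair of distinct authors.
pair_scores = defaultdict(list)
for (i, (a_i, _)), (j, (a_j, _)) in combinations(enumerate(docs), 2):
    if a_i != a_j:
        pair_scores[tuple(sorted((a_i, a_j)))].append(sim[i, j])

for pair, scores in pair_scores.items():
    print(pair, sum(scores) / len(scores))
```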


ASIST '13 Proceedings of the 76th ASIS&T Annual Meeting: Beyond the Cloud: Rethinking Information Boundaries | 2013

Data mining for big archives analysis: a case study

Maria Esteva; Weijia Xu; Jeffrey Felix Tang; Karthik Anantha Padmanabhan

We present a case of archival analysis using a combination of data mining methods. The team of researchers, composed of archivists and computer scientists, used a collection of declassified Department of State cables as a case study. The methods implemented included Support Vector Machines (SVM) and Association Rule Mining. Combined in an analysis workflow, the results of the different methods allowed the team to identify the different security classes, understand how they changed over time, and generate descriptions for the cables in each class. The interpretation of the results also allowed the team to understand contextual aspects of the collection. Until now, the use of data mining for archival analysis and processing has not been thoroughly explored by the archival community. This study constitutes a seminal roadmap for understanding how to apply, interpret, and integrate data mining with archivists' experience and judgment in collaboration with computer scientists. It proposes an inductive approach to archives analysis and the possibility of verifying processing decisions.
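
A hedged sketch of one component of such a workflow, SVM-based text classification of documents into security classes, using scikit-learn. The toy cables and labels are invented, and the association rule mining step is omitted.

```python
# Train an SVM text classifier to assign documents to (toy) security classes.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_texts = [
    "routine administrative travel arrangements for embassy staff",
    "press summary of local newspaper coverage",
    "sensitive assessment of bilateral negotiations",
    "restricted analysis of regional security posture",
]
train_labels = ["UNCLASSIFIED", "UNCLASSIFIED", "CONFIDENTIAL", "CONFIDENTIAL"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)

# Predict the class of a new, unseen document.
print(clf.predict(["embassy staff travel schedule for next month"]))
```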


ACM/IEEE Joint Conference on Digital Libraries | 2016

Data Curation with a Focus on Reuse

Maria Esteva; Sandra Sweat; Robert McLay; Weijia Xu; Sivakumar Kulasekaran

A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation tasks to conduct, which included: adding information elements to the dataset to expand its content scope; removing personal information; and packaging the data in a size, format, and delivery frequency convenient for access and analysis. The curation tasks are embedded in the software that produces the data and are implemented as an automated workflow that spans the HPC resources on which the dataset is generated, processed, and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements and is the first implemented as a collaboration between the supercomputing center, where the data is created on an ongoing basis, and the University Libraries at UT Austin, where it is published. The targeted curation strategy included the design of proof-of-concept data analyses to evaluate whether the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in introducing the data to new users.
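
The sketch below illustrates two of the curation steps mentioned, removing personal fields and packaging records into compressed, conveniently sized batches, assuming the data arrives as Python dictionaries. Field names, the batch size, and the JSON-lines format are assumptions, not the project's actual pipeline.

```python
# Scrub personal fields and package records into gzipped JSON-lines batches.
import gzip
import json

PERSONAL_FIELDS = {"username", "home_directory"}  # assumed sensitive fields

def scrub(record):
    """Remove personal information before publication."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}

def write_batch(batch, path):
    """Write one batch as gzipped JSON lines and return its path."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for rec in batch:
            fh.write(json.dumps(rec) + "\n")
    return path

def package(records, prefix, batch_size=1000):
    """Scrub records and split them into conveniently sized delivery files."""
    paths, batch, n = [], [], 0
    for rec in records:
        batch.append(scrub(rec))
        if len(batch) == batch_size:
            paths.append(write_batch(batch, f"{prefix}-{n:04d}.jsonl.gz"))
            batch, n = [], n + 1
    if batch:
        paths.append(write_batch(batch, f"{prefix}-{n:04d}.jsonl.gz"))
    return paths

if __name__ == "__main__":
    sample = [{"job_id": i, "username": "jdoe", "runtime_s": 42} for i in range(3)]
    print(package(sample, "hpc-usage", batch_size=2))
```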


International Conference on Big Data | 2013

A case study on entity resolution for distant processing of big Humanities data

Weijia Xu; Maria Esteva; Jessica Trelogan; Todd Swinson

At the forefront of big data in the Humanities, collections management can directly impact collections access and reuse. However, for curators using traditional data management methods for tasks such as separating redundant records from relevant and related ones, a small increase in data volume can significantly increase the workload. In this paper, we present preliminary work aimed at assisting curators in making important data management decisions for organizing and improving the overall quality of large unstructured Humanities data collections. Using Entity Resolution as a conceptual framework, we created a similarity model that compares directories and files based on their implicit metadata and clusters pairs of closely related directories. Useful relationships between data are identified and presented through a graphical user interface that allows qualitative evaluation of the clusters and provides a guide for deciding on data management actions. To evaluate the model's performance, we experimented with a test collection and asked the curator to classify the clusters according to four model cluster configurations that consider the presence of related and duplicate information. Evaluation results suggest that the model is useful for making data management action decisions.
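
A simplified sketch of the pairwise-comparison idea, assuming directories on a local disk: directories are scored by the Jaccard overlap of their file names, and pairs above a threshold are reported. The metadata features and threshold are illustrative; the paper's model uses richer implicit metadata and clustering.

```python
# Score directory pairs by overlap of their file names and keep related pairs.
import os
from itertools import combinations

def file_name_set(directory: str) -> set:
    """Collect all file names found anywhere under a directory."""
    names = set()
    for _dirpath, _dirs, files in os.walk(directory):
        names.update(files)
    return names

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def related_pairs(directories, threshold: float = 0.5):
    """Yield directory pairs whose file-name overlap exceeds the threshold."""
    signatures = {d: file_name_set(d) for d in directories}
    for d1, d2 in combinations(directories, 2):
        score = jaccard(signatures[d1], signatures[d2])
        if score >= threshold:
            yield d1, d2, score

if __name__ == "__main__":
    candidates = [d for d in os.listdir(".") if os.path.isdir(d)]
    for d1, d2, score in related_pairs(candidates):
        print(f"{d1} <-> {d2}: {score:.2f}")
```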


International Conference on Big Data | 2016

Content-based comparison for collections identification

Weijia Xu; Ruizhu Huang; Maria Esteva; Jawon Song; Ramona L. Walls

Assigning global unique persistent identifiers (GUPIs) to datasets has the goal of improving their accessibility and simplifying how they are referenced and reused. However, as repositories receive more and more complex data, attesting to the identity of datasets attached to persistent identifiers over time is becoming more challenging. This is due to the nature of scientific research data, which is generated through distributed research practices and evolves across different computational environments. This work presents a robust, automated computational service for data content comparison as a valuable addition to assigning, managing, and tracking persistent identifiers. We operationalized the functions of the service within the archival space by linking data provenance and identity to authenticity. The need for such a service is shown through three genomics data use cases in which the results aided curators in establishing the identity of datasets and inferring issues of provenance. We describe the system's design, implementation, and performance, and report on lessons learned.
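
As a rough illustration of content-level comparison (not the service described), the sketch below fingerprints two local copies of a dataset with SHA-256 manifests and reports which files differ. The directory names are hypothetical.

```python
# Build order-independent content manifests for two dataset copies and compare them.
import hashlib
import os

def manifest(root: str) -> dict:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests

def compare(root_a: str, root_b: str) -> dict:
    """Report whether two copies are identical and, if not, what differs."""
    a, b = manifest(root_a), manifest(root_b)
    return {
        "identical": a == b,
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "content_differs": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }

if __name__ == "__main__":
    print(compare("dataset_v1", "dataset_v2"))  # hypothetical directories
```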


ACM/IEEE Joint Conference on Digital Libraries | 2018

Cyberinfrastructure for Digital Libraries and Archives: Integrating Data Management, Analysis, and Publication

Weijia Xu; Maria Esteva; Jessica Trelogan

Increasingly, digital libraries and archives need to use, and are using, cyberinfrastructure and machine learning to meet curation, data management, and researchers' needs. This workshop focuses on facilitating adoption and integration between these spaces. It brings together researchers and practitioners to share visions, questions, the latest advances in methodology, application experiences, and best practices.


ACM/IEEE Joint Conference on Digital Libraries | 2017

A portable strategy for preserving web applications functionality

Weijia Xu; Maria Esteva; Deborah Beck; Yi-Hsuan Hsieh

The value of research data resides not only in its content but also in how it is made available to users. Research data is often presented interactively through a web application whose design is often the result of years of work by researchers. Therefore, preserving the data and the application's functionality becomes equally important. However, web applications are commonly deployed within shared and changing technology infrastructures, which presents challenges to the reproducibility and portability of the application across technology platforms over time. We propose a functional preservation strategy that decouples web applications and their corresponding data from their hosting environments. The strategy allows re-launching the applications in more portable, simplified environments without compromising their interactive features, and it allows reusing the data in other technical and functional contexts. The strategy fits well with the evolving nature of digital preservation and with requirements for data reuse. We demonstrate this approach and its evaluation using the Speech Presentation in Homeric Epic digital humanities project.
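
A hypothetical sketch of the decoupling idea, not the paper's tooling: the application code, a data export, and an explicit manifest of runtime assumptions are bundled into a single archive so the application can be re-launched later in a simpler, self-contained environment. All paths and manifest fields below are invented.

```python
# Bundle a web application, its data export, and a runtime manifest into one archive.
import json
import tarfile
import time
from pathlib import Path

def bundle_application(app_dir: str, data_dump: str, out: str = "app_bundle.tar.gz") -> str:
    manifest = {
        "created": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "app_dir": app_dir,
        "data_dump": data_dump,
        # Runtime assumptions recorded explicitly rather than left implicit
        # in the original hosting environment (hypothetical values).
        "runtime": {"language": "python3", "server": "any WSGI server"},
    }
    manifest_path = Path("manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2))
    with tarfile.open(out, "w:gz") as tar:
        tar.add(app_dir)
        tar.add(data_dump)
        tar.add(manifest_path)
    return out

if __name__ == "__main__":
    # Hypothetical paths for a web application and its exported database.
    print(bundle_application("webapp/", "site_data.sql"))
```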


Archive | 2016

User Guided Design: Building Confidence in Engineering Data Publication

Sandra Sweat; Aditi Ranganath; Maria Esteva; Maša Prodanović

Advances in imaging technology have generated large volumetric datasets in the field of petroleum engineering. To address the need to share these data, a multidisciplinary team developed the Digital Rocks Portal. This paper describes the protocol used to conduct the user experience study, the analysis of its results, and the methods employed to improve the researcher experience and enhance the quality of the data publications.


Information Visualization | 2014

Interactive visualization for curatorial analysis of large digital collection

Weijia Xu; Maria Esteva; Suyog Dott Jain; Varun Jain

To make decisions about the long-term preservation of and access to large digital collections, digital curators use information such as the collections' digital object types, their contents and preservation risks, and how they are organized. To date, the process of analyzing a collection—from data gathering to exploratory analysis and final conclusions—has largely been conducted through linear review and pen-and-paper methods. To help curators analyze large-scale digital collections, we developed an interactive visual analytics application. We put methods in place to summarize large and diverse information about a collection and to present it as integrated views. Multiple views can be linked or unlinked on demand, enabling curators to identify trends and particularities at different levels of detail and to compare and contrast views. We describe two analysis workflows to illustrate how the application can be used to triage digital collections, facilitate collection management decision making, and provide access. After conducting a focus group study with domain specialists, we introduced features to address their concerns and needs.
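
A toy sketch of the linked-views idea, assuming a local collection: the collection is indexed once, and two summaries (by file type and by last-modified year) are derived from the same filtered selection, mimicking how linked views narrow together. This is illustrative only, not the application described.

```python
# Index a collection once and derive two coordinated summaries from a shared filter.
import os
import time
from collections import Counter

def index_files(root: str):
    """Record a file extension and last-modified year for every file under root."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            records.append({
                "ext": os.path.splitext(name)[1].lower() or "<none>",
                "year": time.localtime(st.st_mtime).tm_year,
            })
    return records

def linked_views(records, ext_filter=None, year_filter=None):
    """Two summaries over the same records; the filters emulate linked selection."""
    selected = [
        r for r in records
        if (ext_filter is None or r["ext"] == ext_filter)
        and (year_filter is None or r["year"] == year_filter)
    ]
    return Counter(r["ext"] for r in selected), Counter(r["year"] for r in selected)

if __name__ == "__main__":
    recs = index_files(".")
    by_ext, by_year = linked_views(recs)           # unfiltered overview
    print(by_ext.most_common(5), dict(by_year))
    print(linked_views(recs, ext_filter=".txt"))   # "select" .txt in one view
```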

Collaboration


Dive into Maria Esteva's collaborations.

Top Co-Authors

Weijia Xu
University of Michigan

Jessica Trelogan
University of Texas at Austin

David Walling
University of Texas at Austin

Sivakumar Kulasekaran
University of Texas at Austin

Jeffrey Felix Tang
University of Texas at Austin

Sandra Sweat
University of Texas at Austin

Adam Rabinowitz
University of Texas at Austin

Alan C. Bovik
University of Texas at Austin

Christopher Jordan
University of Texas at Austin