Publication


Featured research published by Yasmin AlNoamany.


ACM/IEEE Joint Conference on Digital Libraries | 2013

Access patterns for robots and humans in web archives

Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson

Although user access patterns on the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives. Based on samples from the Internet Archive's public Wayback Machine, we propose a set of basic usage patterns: Dip (a single access), Slide (the same page at different archive times), Dive (different pages at approximately the same archive time), and Skim (lists of what pages are archived, i.e., TimeMaps). Robots are limited almost exclusively to Dips and Skims, but human accesses are more varied across all four types. Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of megabytes transferred. Robots almost always access TimeMaps (95% of accesses), but humans predominantly access the archived web pages themselves (82% of accesses). In terms of unique archived web pages, there is no overall preference for a particular time, but the recent past (within the last year) shows significant repeat accesses.
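The four patterns defined in the abstract can be illustrated with a minimal sketch. Everything here beyond the pattern names and their one-line definitions is an assumption made for illustration: the `Access` record, the `classify_session` function, the 30-day `DIVE_TOLERANCE` used for "approximately the same archive time", and the extra "Mixed" label for sessions that fit none of the four patterns. It is not the authors' classification procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Access:
    """One request in a session (hypothetical record layout)."""
    uri: str                          # original URI of the requested page
    archived_at: Optional[datetime]   # capture time; None means a TimeMap request


# Assumption: the abstract does not say how close "approximately the same
# archive time" is, so this tolerance is purely illustrative.
DIVE_TOLERANCE = timedelta(days=30)


def classify_session(accesses: Sequence[Access]) -> str:
    """Label a session as Dip, Slide, Dive, Skim, or Mixed (hypothetical)."""
    if not accesses:
        raise ValueError("a session needs at least one access")
    # Skim: only lists of archived pages (TimeMaps) are requested.
    if all(a.archived_at is None for a in accesses):
        return "Skim"
    # TimeMaps interleaved with archived pages fit no single pattern here.
    if any(a.archived_at is None for a in accesses):
        return "Mixed"
    # Dip: a single access to an archived page.
    if len(accesses) == 1:
        return "Dip"
    uris = {a.uri for a in accesses}
    times = [a.archived_at for a in accesses]
    # Slide: the same page at different archive times.
    if len(uris) == 1 and len(set(times)) > 1:
        return "Slide"
    # Dive: different pages at approximately the same archive time.
    if len(uris) > 1 and max(times) - min(times) <= DIVE_TOLERANCE:
        return "Dive"
    return "Mixed"
```

A real log analysis would first have to split raw HTTP accesses into sessions and separate robots from humans, which this sketch leaves out.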


ACM/IEEE Joint Conference on Digital Libraries | 2012

Visualizing digital collections at Archive-It

Kalpesh Padia; Yasmin AlNoamany; Michele C. Weigle

Archive-It, a subscription service from the Internet Archive, allows users to create, maintain, and view digital collections of web resources. The current interface of Archive-It is largely text-based, supporting drill-down navigation using lists of URIs. To provide an overview of each collection and highlight the collection's underlying characteristics, we present four alternative visualizations (image plot with histogram, wordle, bubble chart, and timeline). The sites in an Archive-It collection may be organized by the collection curator into groups for easier navigation. However, many collections do not have such groupings, making them difficult to explore. We introduce a heuristics-based categorization for such collections.


International Conference on Theory and Practice of Digital Libraries | 2013

Who and What Links to the Internet Archive

Yasmin AlNoamany; Ahmed AlSum; Michele C. Weigle; Michael L. Nelson

The Internet Archive’s (IA) Wayback Machine is the largest and oldest public web archive and has become a significant repository of our recent history and cultural heritage. Despite its importance, there has been little research about how it is discovered and used. Based on web access logs, we analyze what users are looking for, why they come to IA, where they come from, and how pages link to IA. We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web. About 65% of the requested archived pages no longer exist on the live web. We find that more than 82% of human sessions connect to the Wayback Machine via referrals from other web sites, while only 15% of robots have referrers. Most of the links (86%) from websites are to individual archived pages at specific points in time, and of those 83% no longer exist on the live web.


International Conference on Theory and Practice of Digital Libraries | 2015

Detecting Off-Topic Pages in Web Archives

Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson

Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy are a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of Web archive captures. In this paper, we address the problem of detecting off-topic pages in Web archive collections. We evaluate six different methods to detect when a page has gone off-topic through subsequent captures. The predicted off-topic pages will be presented to the collection’s curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold −0.85 performs the best, with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968. We evaluated the performance of the proposed method on several Archive-It collections. The average precision of detecting the off-topic pages is 0.92.
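The best-performing combination reported in the abstract can be sketched as follows. The two thresholds (0.10 and −0.85) come from the abstract; everything else is an assumption made for illustration: the function names, the simple regex tokenizer, the use of raw term counts as vector weights, the definition of word-count change as a relative difference against the first capture, and combining the two tests with a logical OR. The abstract does not specify these details, so this is not the authors' implementation.

```python
import math
import re
from collections import Counter

COSINE_THRESHOLD = 0.10       # from the abstract
WORD_COUNT_THRESHOLD = -0.85  # from the abstract


def tokenize(text):
    """Lowercase word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of raw term-count vectors (assumed weighting)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_count_change(first_tokens, later_tokens):
    """Relative change in word count against the first capture (assumed definition)."""
    if not first_tokens:
        return 0.0
    return (len(later_tokens) - len(first_tokens)) / len(first_tokens)


def is_off_topic(first_text, later_text):
    """Flag a later capture if either test fires (OR is an assumption)."""
    first, later = tokenize(first_text), tokenize(later_text)
    return (cosine_similarity(first, later) < COSINE_THRESHOLD
            or word_count_change(first, later) < WORD_COUNT_THRESHOLD)
```

Under these assumptions, a capture that keeps the original vocabulary but shrinks to a small fraction of its original length is still flagged, because the word-count test fires even when the cosine test does not.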


International Journal on Digital Libraries | 2016

Detecting off-topic pages within TimeMaps in Web archives

Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson

Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy are a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of Web archive captures. In this paper, we address the problem of detecting when a particular page in a Web archive collection has gone off-topic relative to its first archived copy. We do not delete off-topic pages (they remain part of the collection), but they are flagged as off-topic so they can be excluded from consideration for downstream services, such as collection summarization and thumbnail generation. We propose different methods (cosine similarity, Jaccard similarity, intersection of the 20 most frequent terms, Web-based kernel function, and the change in size using the number of words and content length) to detect when a page has gone off-topic. The predicted off-topic pages will be presented to the collection’s curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold −0.85 performs the best, with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968.
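Two of the similarity measures listed in the abstract, Jaccard similarity and the intersection of the 20 most frequent terms, can be sketched against a page's first archived copy as below. The function names, the regex tokenizer, and the choice to report the term intersection as a plain count are assumptions for illustration; the abstract does not give thresholds or normalization for these two measures, so none are applied here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(first_text, later_text):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    first, later = set(tokenize(first_text)), set(tokenize(later_text))
    if not first and not later:
        return 1.0  # assumption: two empty captures count as identical
    return len(first & later) / len(first | later)


def top_terms_intersection(first_text, later_text, k=20):
    """Number of terms shared by the k most frequent terms of each capture."""
    top_first = {t for t, _ in Counter(tokenize(first_text)).most_common(k)}
    top_later = {t for t, _ in Counter(tokenize(later_text)).most_common(k)}
    return len(top_first & top_later)
```

Both functions compare a later capture with the first one, matching the abstract's framing of "off-topic relative to its first archived copy"; low values of either measure would suggest the page has drifted.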


International Conference on Theory and Practice of Digital Libraries | 2015

Characteristics of Social Media Stories

Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson


PeerJ | 2018

Towards computational reproducibility: researcher perspectives on the use and sharing of software

Yasmin AlNoamany; John Borghi



Journal of Librarianship and Scholarly Communication | 2018

Software Curation in Research Libraries: Practice and Promise

Alexandra Chassanoff; Yasmin AlNoamany; Katherine Thornton; John Borghi


International Journal on Digital Libraries | 2014

Who and what links to the Internet Archive

Yasmin AlNoamany; Ahmed AlSum; Michele C. Weigle; Michael L. Nelson



Web Science | 2017

Generating Stories From Archived Collections

Yasmin AlNoamany; Michele C. Weigle; Michael L. Nelson

Collaboration


Dive into Yasmin AlNoamany's collaborations.

Top Co-Authors

Ahmed AlSum

Old Dominion University


John Borghi

California Digital Library


Alexandra Chassanoff

Massachusetts Institute of Technology


Mat Kelly

Old Dominion University
