2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) | 2019

Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit

Abstract


We demonstrate the integration of the Archives Unleashed Toolkit, a scalable platform for exploring web archives, with Google's TensorFlow deep learning toolkit to provide scholars with content-based image analysis capabilities. By applying pretrained deep neural networks for object detection, we extract images of common objects from a 4TB web archive of GeoCities and compile them into browsable collages. This case study illustrates the kinds of analyses made possible by combining big data and deep learning capabilities.
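The abstract does not show code, so the following is only a minimal sketch of the step that follows object detection: grouping archived images by the object class a detector reported and laying each group out on a collage grid. It assumes the detector (not run here; no TensorFlow or Archives Unleashed Toolkit calls are reproduced) has already produced (image URL, label, score) records. The names `Detection`, `group_by_object`, and `collage_layout`, the 0.5 confidence threshold, the 64-pixel square tiles, and the near-square grid are all hypothetical choices for illustration, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import ceil, sqrt


@dataclass
class Detection:
    # Hypothetical record shape: one detected object in one archived image.
    image_url: str  # where the archived image came from
    label: str      # object class predicted by some pretrained detector
    score: float    # detector confidence in [0, 1]


def group_by_object(detections, min_score=0.5):
    """Map each object label to the sorted, de-duplicated URLs of images
    that contain that object with confidence >= min_score (assumed threshold)."""
    groups = defaultdict(set)
    for d in detections:
        if d.score >= min_score:
            groups[d.label].add(d.image_url)
    return {label: sorted(urls) for label, urls in groups.items()}


def collage_layout(n_images, tile=64):
    """Place n_images square tiles of side `tile` pixels on a near-square grid.
    Returns (canvas_width, canvas_height, [(x, y) top-left corner per tile])."""
    if n_images == 0:
        return 0, 0, []
    cols = ceil(sqrt(n_images))    # columns: smallest count giving a near-square grid
    rows = ceil(n_images / cols)   # rows needed to hold every tile
    positions = [((i % cols) * tile, (i // cols) * tile) for i in range(n_images)]
    return cols * tile, rows * tile, positions


if __name__ == "__main__":
    # Made-up detector output for illustration only.
    dets = [
        Detection("http://example.org/a.gif", "cat", 0.91),
        Detection("http://example.org/b.jpg", "cat", 0.30),  # below threshold, dropped
        Detection("http://example.org/c.jpg", "dog", 0.75),
        Detection("http://example.org/a.gif", "dog", 0.62),
    ]
    for label, urls in group_by_object(dets).items():
        width, height, _ = collage_layout(len(urls))
        print(label, urls, f"canvas {width}x{height}")
```

Grouping by label before layout means one image containing several objects (here `a.gif`) can appear in more than one collage, which is one plausible reading of "images of common objects"; a real pipeline might instead keep only each image's highest-scoring label.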

Pages 436-437
DOI 10.1109/JCDL.2019.00107
Language English