Microscopy and Microanalysis | 2021

CEM500K – A large-scale heterogeneous unlabeled cellular electron microscopy image dataset for deep learning.

 
 

Abstract


Recent advances in volume electron microscopy (vEM) have produced massive amounts of cellular EM image data, yet only a minuscule fraction of that data is annotated or segmented. Much work is being done to apply deep learning (DL) approaches to segment vEM data, but generalization remains a key hurdle: models trained on a specific type of data (say, mitochondria from 3-D reconstructions of mouse hippocampus) perform poorly when confronted with previously unseen contexts (say, mitochondria from the same 3-D reconstruction but within the retina instead) [1][2]. In DL research, a successful paradigm for correcting this deficiency is to pre-train a neural network on a large dataset and then apply transfer learning to a downstream task with a much smaller dataset. This approach yields better-performing models that train more quickly and require fewer labeled examples. Unsupervised algorithms can now leverage unlabeled image data for pre-training, which removes the constraint of image annotation but creates the need for a relevant, large, heterogeneous, and information-rich dataset.

Volume 27
Pages 3036 - 3037
DOI 10.1017/S1431927621010539
Language English
Journal Microscopy and Microanalysis
