GIMP-ML: Python Plugins for using Computer Vision Models in GIMP
Kritik Soman
Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
[email protected], [email protected]
May 12, 2020

ABSTRACT
This paper introduces GIMP-ML, a set of Python plugins for the widely popular GNU Image Manipulation Program (GIMP). It brings recent advances in computer vision into the conventional image editing pipeline. Deep learning applications such as monocular depth estimation, semantic segmentation, mask generative adversarial networks, image super-resolution, denoising and coloring have been incorporated into GIMP through Python-based plugins. Additionally, operations on images such as edge detection and color clustering have also been added. GIMP-ML relies on standard Python packages such as numpy, scikit-image, pillow, pytorch, open-cv and scipy.
Apart from these, several image manipulation techniques using these plugins have been compiled and demonstrated in a YouTube playlist, with the objective of demonstrating the use cases for machine learning based image modification. In addition, GIMP-ML also aims to bring the benefits of deep learning networks used for computer vision tasks to routine image processing workflows. The code and installation procedure for configuring these plugins are available at https://github.com/kritiksoman/GIMP-ML.

1 Introduction

Image editing has conventionally been performed manually by users or graphics designers using various image processing tools or software. A plethora of image editing and transformation functions are provided in such tools, which are available under open-source, commercial or proprietary licenses. Image processing workflows have varying levels of complexity, and sometimes even simple modifications to images require significant effort from the user.

GNU Image Manipulation Program (GIMP) is a popular free and open-source image editing software that has been widely used on Linux-based platforms, as well as on other operating systems. It provides several features for image editing and manipulation and has a simple user interface to work with. It also supports plugins, which can be developed independently and integrated with a local GIMP installation. Using plugins, one can realize custom workflows or sets of operations that can be applied to an image.

Recently, machine learning techniques have completely changed the landscape of image understanding, and many applications which were previously not possible have become the new baseline. This has been significantly facilitated by recent advances in deep learning and the application of the resulting models to tasks in the computer vision domain.
However, these deep learning models have been made available through independent deep learning frameworks such as Keras, TensorFlow and PyTorch, among others. It may also be noted that since these networks have "large" architectures, their training is done on compute-intensive platforms (using GPUs) and the resulting models have a high memory footprint. Since the use of these models requires the user to write code, graphics designers and users involved in conventional image editing workflows have often not been able to directly leverage their benefits. As such, a framework that enables the use of deep learning models in image editing tasks through commonly available image processing tools would potentially benefit both the deep learning / computer vision community and graphics designers and common users of such software.

The motivation for this paper is to bridge the gap between cutting-edge research in deep learning (computer vision) and manual image editing, specifically for the case of GIMP. A pilot implementation of plugins for GIMP, collectively termed "GIMP-ML" (GIMP - Machine Learning), is presented for various tasks such as background blurring, image coloring, face parsing, generative portrait modification, monocular depth based relighting, motion deblurring and generating super-resolution images. It is expected that the image editing process will become highly automated in the near future as the semantic understanding of images improves, facilitated by advances in artificial intelligence.

Figure 1: GIMP-ML Plugins Menu

The rest of this paper is organized as follows. Section 2 presents the key dependencies of GIMP-ML. This is followed by implementation details in Section 3. Various applications of GIMP-ML are illustrated in Section 4, which also includes links to demonstration videos on YouTube. Finally, conclusions and future work are presented in Section 5.
2 Dependencies

The Python package dependencies involved in the development of GIMP-ML are as follows:

1. NumPy: The base N-dimensional array package, numpy [1], has been used for converting a GIMP layer to a tensor for use in PyTorch.
2. SciPy: The fundamental library for scientific computing, scipy [2], has been used for performing basic computing operations.
3. Scikit-image: The scikit-image [3] package has been used for realizing basic image processing operations for the plugins.
4. OpenCV: The opencv-python [4] package provides OpenCV libraries in Python. It has been used for edge detection.
5. Pre-Trained Models: The pretrainedmodels package includes a set of pre-trained models for PyTorch [5], of which InceptionResNetV2 has been used for the applications presented in this paper.
6. Torch & Torchvision: The torch [5] and torchvision [7] packages have been used to incorporate the deep learning framework through PyTorch.
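As a sketch of how the numpy dependency is used in item 1 above, the conversion between a GIMP layer buffer and the NCHW float array expected by PyTorch models might look as follows. The helper names and the assumption of an H x W x 3 uint8 buffer are illustrative, not the plugins' actual code:

```python
import numpy as np

def layer_to_array(pixels):
    """Convert an H x W x 3 uint8 layer buffer into a 1 x 3 x H x W float32
    array in [0, 1], the NCHW layout expected by PyTorch vision models."""
    arr = pixels.astype(np.float32) / 255.0
    return np.transpose(arr, (2, 0, 1))[np.newaxis, ...]

def array_to_layer(batch):
    """Inverse mapping: 1 x 3 x H x W float array back to an H x W x 3 uint8
    buffer suitable for writing into a GIMP layer."""
    arr = np.clip(np.transpose(batch[0], (1, 2, 0)), 0.0, 1.0)
    return (arr * 255.0 + 0.5).astype(np.uint8)
```

A `np.transpose`/`np.newaxis` pair like this is all that is needed to bridge the channels-last convention of image buffers and the channels-first convention of PyTorch.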
3 Implementation

The GIMP-ML plugins have been developed in Python 2.7, which is supported in GIMP 2.10. A virtual environment is created separately and added to the gimp-python path; it contains all the Python packages used by the plugins. The plugins use the CPU by default and switch to the GPU for prediction when one is available. Currently, all plugins assume that the input layer does not have an alpha channel. The plugins take advantage of layers in GIMP for various workflows; as a consequence, image manipulation in the following applications is also non-destructive in nature.
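A minimal sketch of the two conventions just described, the no-alpha-channel assumption and the CPU-default/GPU-when-available behavior, could look like this (function names are hypothetical; the actual plugins may structure these checks differently):

```python
def validate_rgb_layer(pixels):
    """Mirror the plugins' assumption: the input layer must be plain RGB,
    i.e. an H x W x 3 array with no alpha channel."""
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError("input layer must be RGB without an alpha channel")
    return pixels

def pick_device():
    """CPU by default; switch to GPU only when PyTorch reports CUDA support."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"
```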
4 Applications

This section describes applications of GIMP-ML, which include background blurring, image coloring, face parsing, generative portrait modification, monocular depth based relighting, motion deblurring and generating super-resolution images. Demo videos of all the applications are available in a YouTube playlist.

4.1 Background Blurring

We used the PyTorch DeepLabv3 [8] model trained on the Pascal VOC dataset [9]. It has 20 classes, namely person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and tv/monitor. These objects can be directly segmented in images. The segmentation map can then be used to selectively perform operations on regions of the image, such as blurring, hue/saturation changes, etc. A demonstration video for background blurring is available at https://youtu.be/U1CieWi--gc.
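The selective-blurring step can be sketched as follows, assuming a per-pixel class map has already been produced by DeepLabv3 (argmax over its output logits). The class index 15 for person follows the Pascal VOC labeling convention; `blur_background` is an illustrative helper, not the plugin's actual code, and it uses scipy's Gaussian filter rather than GIMP's own blur:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

PERSON = 15  # Pascal VOC class index for "person"

def blur_background(image, seg_map, keep_class=PERSON, sigma=5.0):
    """image: H x W x 3 float array in [0, 1]; seg_map: H x W int array of
    Pascal VOC class indices. Pixels outside keep_class are replaced with a
    Gaussian-blurred copy, leaving the segmented subject sharp."""
    blurred = np.stack(
        [gaussian_filter(image[..., c], sigma) for c in range(3)], axis=-1
    )
    mask = (seg_map == keep_class)[..., np.newaxis]
    return np.where(mask, image, blurred)
```

In the actual plugin workflow the same segmentation map can equally drive hue or saturation changes by swapping the blur for a different per-pixel operation.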
4.2 Image Coloring

Conversion of grayscale images to RGB [10] using deep learning has also been included in GIMP-ML (ported from https://github.com/richzhang/colorization). The input image should be in grayscale mode; this can be set from the menu Image->Mode->Grayscale. The demo for image coloring is shown in https://youtu.be/HVwISLRow_0.

4.3 Face Parsing

For segmenting portrait images, we used BiSeNet [11] (https://github.com/zllrunning/face-parsing.PyTorch) trained on the CelebAMask-HQ dataset (https://github.com/switchablenorms/CelebAMask-HQ). It can segment 19 classes, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. The segmentation map can then be used to selectively manipulate various facial features. Hair color manipulation using this plugin is demonstrated in the video at https://youtu.be/thS8VqPvuhE.

4.4 Generative Portrait Modification

With the facegen plugin, facial features in a portrait photo can be segmented, modified and then newly generated. Trained on the CelebAMask-HQ dataset [12], this model relies on the facial segmentation map generated in the previous sub-section. The mask can be duplicated into another layer and manipulated using the Color Picker Tool and the Paintbrush Tool. The input image, original mask and modified mask can then be fed into MaskGAN to generate the desired image (as shown in Fig. 2). A drawback of such a model is that it does not preserve unmodified facial features. This can, however, be taken care of by manually erasing unwanted facial feature changes from the generated layer, thereby exposing the original image in the layer underneath. This is a valuable workflow, since professional image editors spend a large amount of time making portrait shots perfect and would want to retain the original facial features. The demo of generative portrait modification is shown in https://youtu.be/kXYsWvOB4uk.

Figure 2: Menu for Generative Portrait Modification
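As an illustration of this mask-driven editing workflow, a hair recoloring step might look like the following sketch. The class index 17 for hair is an assumption about the face-parsing model's label map, and `tint_region` is a hypothetical helper, not the plugin's code:

```python
import numpy as np

HAIR = 17  # assumed hair class index in the face-parsing label map

def tint_region(image, seg_map, target_class, tint, alpha=0.5):
    """Blend a tint color into pixels of the given parsing class, leaving
    the rest of the portrait untouched. image: H x W x 3 uint8;
    seg_map: H x W int class map from the face-parsing network."""
    out = image.astype(np.float32).copy()
    mask = seg_map == target_class
    out[mask] = (1.0 - alpha) * out[mask] + alpha * np.asarray(tint, np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because only the masked pixels change, applying the result on a separate GIMP layer keeps the edit non-destructive, as described in Section 3.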
4.5 Monocular Depth Based Relighting

Disparity maps can be generated from stereo image pairs using deep learning methods. Recently, self-supervised monocular depth estimation has been proposed in [13]. This has been ported to GIMP-ML (from https://github.com/nianticlabs/monodepth2) using the model that was trained on the KITTI dataset [14]. Using this model, the disparity map of a street image can be desaturated, inverted and colorized to create a layer representing light falling from the sky. In the demo video (https://youtu.be/q9Ny5XqIUKk), a daytime image of a street is converted to night time using this approach.

4.6 Motion Deblurring

GAN based motion deblurring from [15] was also ported (from https://github.com/TAMU-VITA/DeblurGANv2). The video demo is shown in https://youtu.be/adgHtu4chyU.

4.7 Image Super-Resolution

The model in [16] for image super-resolution was also implemented (from https://github.com/twtygqyy/pytorch-SRResNet). Using this plugin, the input image layer can be upscaled to up to 4x its original size. The demo is shown in https://youtu.be/HeBgWcXFQpI.

5 Conclusions and Future Work

This paper presented GIMP-ML, a set of Python plugins that enable the use of deep learning models in GIMP via PyTorch for various applications. It has been shown that several manual and time-consuming image processing tasks can be simplified by the use of deep learning models, which makes it convenient for users of image processing software to perform such tasks. GIMP 2.10 currently relies on Python 2.7, which has been deprecated as of 1 January 2020. The next version of GIMP will use Python 3, and the GIMP-ML codebase will be updated to support it. Further, deep learning models suffer from the data bias problem and only work well when the test image is from the same distribution as the data on which the model was trained. In the future, the framework will be enhanced to handle such scenarios.
References

[1] Stéfan van der Walt, S Chris Colbert, and Gael Varoquaux. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, 2011.
[2] Eric Jones, Travis Oliphant, and Pearu Peterson. SciPy: Open source scientific tools for Python. 2001.
[3] Stefan Van der Walt, Johannes L Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D Warner, Neil Yager, Emmanuelle Gouillart, and Tony Yu. scikit-image: image processing in Python. PeerJ, 2:e453, 2014.
[4] Alexander Mordvintsev and K Abid. OpenCV-Python tutorials documentation. Obtained from https://media.readthedocs.org/pdf/opencv-python-tutroals/latest/opencv-python-tutroals.pdf, 2014.
[5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
[6] John D Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
[7] Sébastien Marcel and Yann Rodriguez. Torchvision: the machine-vision package of Torch. In Proceedings of the 18th ACM International Conference on Multimedia, pages 1485–1488, 2010.
[8] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation, 2017.
[9] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
[10] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution, 2016.
[11] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. BiSeNet: Bilateral segmentation network for real-time semantic segmentation, 2018.
[12] Cheng-Han Lee, Ziwei Liu, Lingyun Wu, and Ping Luo. MaskGAN: towards diverse and interactive facial image manipulation, 2019.
[13] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel J Brostow. Digging into self-supervised monocular depth estimation, 2019.
[14] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition.