Image Segmentation to Distinguish Between Overlapping Human Chromosomes
R. Lily Hu, Jeremy Karnowski, Ross Fadely, Jean-Patrick Pommier
R. Lily Hu
UC Berkeley, Salesforce Research
Jeremy Karnowski
Insight Data Science
Ross Fadely
Insight Data Science
Jean-Patrick Pommier
https://dip4fish.blogspot.fr/
Abstract
In medicine, visualizing chromosomes is important for medical diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap, and it is necessary to identify and distinguish between the overlapping chromosomes. A segmentation solution that is fast and automated will enable scaling of cost-effective medicine and biomedical research. We apply neural network-based image segmentation to the problem of distinguishing between partially overlapping DNA chromosomes. A convolutional neural network is customized for this problem. The results achieved intersection over union (IOU) scores of 94.7% for the overlapping region and 88-94% on the non-overlapping chromosome regions.
Neural networks are a powerful approach to segmenting images, including street scenes and biomedical images of tissue. In medicine, visualizing chromosomes is important for medical diagnostics, drug development, and biomedical research. Unfortunately, chromosomes often overlap, and it is necessary to identify and distinguish between the overlapping chromosomes. For example, some diseases are associated with particular chromosomes or with the existence of more or fewer than the expected number of chromosomes. Challenges to this problem include that the overlapping objects may be nearly identical and that it is arbitrary which object is considered the first object and which one the second. Furthermore, overlapping chromosomes may look like one larger chromosome, may criss-cross, or one may be almost entirely on top of the other. A segmentation solution that is fast and automated will enable scaling of cost-effective medicine and biomedical research. Traditional methods of distinguishing between overlapping chromosomes involved printing and cutting out individual chromosomes by hand, thresholding on histogram values of pixels, and geometric analysis of chromosome contours, among others, and required human intervention when partial overlaps occur.
In this work, we apply neural network-based image segmentation to the problem of distinguishing between partially overlapping human chromosomes. A convolutional neural network, based on U-Net, is customized for this problem. The model is designed so that the output segmentation map has the same dimensions as the input image. To reduce computation time and storage, the model is also simplified. This is because the dimensions of the input image, the set of potential objects in the image, and the set of potential chromosome shapes are all small, which reduces the scope of the problem, the required capacity of the model, and thus the modeling needs. Various hyperparameters of the model are explored and tested.
Code available: https://github.com/LilyHu/image_segmentation_chromosomes
Section 2 outlines the background, Section 3 describes the data and preprocessing, Section 4 elaborates on the model, Section 5 summarizes the results, and Section 6 concludes with future work.
Cytogenetics is the study of chromosomes, including their numbers and structures up to the nucleotide scale [4] [13]. Pioneering works in species from flies to maize [15] enabled the understanding of genes and their inheritance. Human cytogenetics started in 1956 with the discovery of the exact number of chromosomes in humans [21], soon followed by the discovery that structural or numerical chromosomal anomalies can be associated with cancer or developmental diseases. Human cytogenetics became a diagnostic tool. Cytogenetics is also used as a biological dosimeter in radiobiology, which is the study of the effect of radiation on living beings [5].
The advent of molecular cytogenetics and fluorescent probes (FISH, or Fluorescent in-situ Hybridization) yielded insights otherwise inaccessible by stain-based cytogenetics. Computers and dedicated software applications started to replace scissor cutouts of black and white pictures of chromosomes for karyotyping. New algorithms and applications were developed to process and interpret fluorescent images, study genomic hybridization, and measure telomere length by Q-FISH [12] [18] [2]. Quantitative methods were developed to become metaphase-free and array-based [4]. Metaphasic chromosomes were used to detect targeted chromosomal anomalies [21] or for Q-FISH [22].
Computer-based chromosome segmentation and classification is still an open problem [1], particularly the resolving of overlapping chromosomes. Up to now, approaches have relied on geometric methods based on contour analysis [7] or on finding a skeleton [19] [17] [16]. These methods can be rule-based or involve classifiers with hand-crafted features. Even for a case as simple as a pair of crossing chromosomes forming a cross, there is ambiguity when it comes to reassembling the pieces to reconstitute the two chromosomes [8]. Grisan et al. developed a tree search to address this issue [6].
Chromosomes can be DAPI stained in fluorescence imaging, or stained with Giemsa in conventional cytogenetics. After adaptive thresholding and labeling of connected components of binary particles, images of chromosomes can be isolated. Those images can yield single chromosomes, touching chromosomes, or overlapping chromosomes.
In the following emblematic example taken from a metaphase, shown in Figure 1, a polygonal approximation is computed from the chromosome contour and some remarkable points can be isolated. The four points corresponding to the chromosomal crossing determine a polygon containing the pixels belonging to the overlapping domain.
Figure 1: Isolation of crossing domain from contour analysis of two crossing chromosomes. Remarkable points are found from the contour (left); the crossing domain can be found from four points and then used to isolate the different parts of the two crossing chromosomes (right).
Even for a case as emblematic as a pair of crossing chromosomes forming a four-armed cross, there is ambiguity of a combinatorial nature when it comes to reassembling the pieces to reconstitute the two chromosomes [8]. This ambiguity is illustrated in Figure 2.
This ambiguity necessitates a decision. Grisan et al. developed a tree search from high resolution Q-banded chromosomes to address this issue [6]. Successful results were reported on resolving chromosome clusters [17] [16], on limited numbers of chromosome clusters extracted from images of metaphases, and in some cases on synthetic images combining chromosomes using Adobe CS [17].
Figure 2: Combinatorial issue when reassembling segmented parts of two crossing chromosomes. In this case three pairs, mutually exclusive, can be generated.
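The combinatorial ambiguity of the four-armed cross can be made concrete with a small sketch. It assumes that each chromosome is rebuilt from the overlapping domain plus two of the four arms, so a candidate reconstruction is a split of the arms into two unordered pairs. The function name `arm_pairings` and the arm labels are hypothetical and serve only as illustration.

```python
def arm_pairings(arms):
    """List every way to split four arms into two unordered pairs.

    Assumption: each chromosome is the overlap region plus two arms,
    so a candidate reconstruction is one such split.
    """
    if len(arms) != 4:
        raise ValueError("this sketch only handles a four-armed cross")
    first, *others = arms
    options = []
    for partner in others:
        # The arm paired with `first` fixes the other pair completely.
        remaining = tuple(a for a in others if a != partner)
        options.append(((first, partner), remaining))
    return options


# Hypothetical arm labels for a single cross.
candidates = arm_pairings(["a1", "a2", "a3", "a4"])
for chromosome_1, chromosome_2 in candidates:
    print(chromosome_1, chromosome_2)
```

The sketch enumerates three mutually exclusive candidates, and a geometric method has to choose one of them.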
Convolutional neural networks are popular for image segmentation. These include fully convolutional networks [14], dilated convolutions [23], and encoder-decoder architectures [20] [3]. We propose to solve the overlapping chromosome problem by replacing geometric algorithms with methods from deep learning.
To create a segmentation solution to resolve overlapping chromosomes, we built a dataset for semantic segmentation using thousands of semi-synthetically generated overlapping chromosomes.
Images of single chromosomes were extracted from an image of a human metaphase hybridized with a Cy3 fluorescent telomeric probe [12]. Blue (DAPI) and orange (Cy3) components of the image of a single chromosome were combined into a greyscale image as shown in Figure 3. Then the resolution of the images was decreased by a factor of two.
Figure 3: Combination of DAPI (chromosome) and Cy3 (telomeres) images into a greyscale image
From the set of 46 chromosomes, there are $\binom{46}{2} = 1035$ possible pairs of chromosomes. 12 chromosomes were kept to generate a subset of $\binom{12}{2} = 66$ pairs of chromosomes to combine different chromosomal sizes and morphologies. In each pair of chromosomes, each chromosome was rotated and one chromosome was translated horizontally and vertically relative to the other one. The overlapping chromosomes were generated by taking the mean of the two greyscale images of the chromosomes. The so-called ground-truth labels were generated by adding the masks of the single chromosomes. By choosing the value 1 for the mask of the first chromosome and the value 2 for the mask of the other chromosome, the label of the overlapping domain has the value 3. Only pairs with ground truth containing overlapping domains were kept. Raw images of metaphasic chromosomes, the dataset, and a Jupyter notebook are available from Kaggle or from the dip4fish blog [9], [10], [11].
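The pair counts and the label arithmetic described above can be illustrated with a minimal NumPy sketch. It is not the authors' generation notebook: rotation and translation are omitted, the two toy "chromosomes" are plain rectangles, and the helper name `overlap_pair` is hypothetical.

```python
from math import comb

import numpy as np

# Pair counts quoted in the text.
n_pairs_all = comb(46, 2)     # 1035
n_pairs_subset = comb(12, 2)  # 66


def overlap_pair(img_a, img_b, mask_a, mask_b):
    """Combine two single-chromosome images into one training example.

    The image is the pixel-wise mean of the two greyscale images.
    The label is mask_a * 1 + mask_b * 2, so pixels covered by both
    chromosomes receive the value 3.
    """
    image = (img_a.astype(np.float32) + img_b.astype(np.float32)) / 2.0
    label = mask_a.astype(np.uint8) + 2 * mask_b.astype(np.uint8)
    return image, label


# Toy stand-ins for two already rotated and translated chromosomes.
img_a = np.zeros((6, 6))
img_a[1:5, 2:4] = 200.0  # vertical bar
img_b = np.zeros((6, 6))
img_b[2:4, 1:5] = 100.0  # horizontal bar

image, label = overlap_pair(img_a, img_b, img_a > 0, img_b > 0)

# Only examples whose label map contains an overlap (value 3) are kept.
has_overlap = bool((label == 3).any())
```

Because the two masks carry the values 1 and 2, a plain sum is enough to encode all four classes (0, 1, 2, 3) without ambiguity.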
The final data set comprises about thirteen thousand grayscale images (94 x 93 pixels). For each image, there is a ground truth segmentation map of the same size, as shown in Figure 4. In the segmentation map, class labels of 0 (shown as black) correspond to the background, class labels of 1 (shown as red below) correspond to non-overlapping regions of one chromosome, class labels of 2 (shown as green) correspond to non-overlapping regions of the second chromosome, and labels of 3 (shown as blue) correspond to overlapping regions.
Figure 4: Sample of overlapping chromosomes input image and ground-truth label
A few erroneous labels of 4 were corrected to match the label of the surrounding pixels. Mislabels on the non-overlapping regions, which were seen as artifacts in the segmentation map (example in Figure 5), were addressed by assigning them to the background class unless there were at least three neighboring pixels that were in the chromosome class. The images were cropped to 88 x 88 pixels so that the dimensions were divisible by 2, which helped processing in the pooling layers of the neural network.
Figure 5: An initial data pre-processing step was performed on segmentation maps that had artifacts
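The two preprocessing steps above can be sketched as follows. Several details are assumptions, since the text does not specify them: the neighborhood is taken to be the 8 surrounding pixels, "in the chromosome class" is read as "having the same label as the pixel", and the crop is centered. The function names are hypothetical.

```python
import numpy as np


def remove_isolated_labels(label, min_neighbors=3):
    """Send a labeled pixel to background (0) unless at least
    `min_neighbors` of its 8 neighbors carry the same label."""
    h, w = label.shape
    padded = np.pad(label, 1, mode="constant", constant_values=0)
    same = np.zeros((h, w), dtype=np.int32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            same += (shifted == label)
    cleaned = label.copy()
    cleaned[(label != 0) & (same < min_neighbors)] = 0
    return cleaned


def center_crop(arr, size=88):
    """Crop the central size x size window (the crop position is an assumption)."""
    h, w = arr.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return arr[top:top + size, left:left + size]


# Toy 94 x 93 label map: one solid block plus one stray pixel.
label = np.zeros((94, 93), dtype=np.uint8)
label[10:20, 10:20] = 1
label[50, 50] = 1  # artifact

cleaned = center_crop(remove_isolated_labels(label))
```

With this reading of the rule, the corner pixels of a solid region survive because each still has exactly three neighbors with the same label.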
One simple solution is to classify pixels based on their intensity. Unfortunately, when histograms of the overlapping region and the single chromosome regions are plotted, as shown in Figure 6, there is significant overlap between the two histograms. Thus, a simple algorithm based on a threshold pixel intensity value would perform poorly.
Figure 6: Histogram of pixel values
A convolutional neural network was created for this problem, illustrated in Figure 7. The deep learning solution used for this problem was inspired by U-Net, a convolutional neural network for image segmentation that was demonstrated on medical images of cells. The model for overlapping chromosomes was designed so that the output segmentation map has the same length and width as the input image. To reduce computation time and storage, the model was also simplified, with almost a third fewer layers and blocks. This is because the dimensions of the input image are small (an order of magnitude smaller than the input to U-Net) and thus too many pooling layers are undesirable. Furthermore, the set of potential objects in the chromosome images is small and the set of potential chromosome shapes is also quite limited, which reduces the scope of the problem and thus the modeling needs. Also, cropping was not done within the network and padding was set to be ‘same’. This was because, given the small input image, it was undesirable to remove pixels.
Figure 7: Resulting neural network for separating overlapping chromosomes
Since the problem was not straightforward, various architectures were investigated and the design of the model went through several iterations. These investigations included encoding the class labels as integers, using one-hot encodings, combining the classes of the non-overlapping regions, treating each chromosome separately, using or not using class weights, trying different activation functions, and choosing different loss functions.
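Two of the label encodings mentioned above, integer class maps and one-hot maps, can be converted into each other as in the sketch below. It is an illustration in plain NumPy and not the authors' training code. The four-class layout (background, chromosome 1, chromosome 2, overlap) follows the data description, and the helper names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 4  # background, chromosome 1, chromosome 2, overlap


def to_one_hot(label, num_classes=NUM_CLASSES):
    """Turn an (H, W) integer label map into an (H, W, num_classes) map."""
    return np.eye(num_classes, dtype=np.float32)[label]


def from_one_hot(one_hot):
    """Recover the integer label map, for example from per-class scores."""
    return np.argmax(one_hot, axis=-1)


label = np.array([[0, 1],
                  [2, 3]])
encoded = to_one_hot(label)      # shape (2, 2, 4)
decoded = from_one_hot(encoded)  # equal to `label`
```

A network with one output channel per class pairs naturally with the one-hot form, and `from_one_hot` applied to its per-pixel scores gives back a segmentation map with the same height and width as the input.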
The model was trained on 64% of the data, validated on 16% of the data, and tested on the last 20% of the data.
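The 64/16/20 proportions are consistent with holding out 20% for testing and then 20% of the remainder for validation (0.8 x 0.8 = 0.64 and 0.8 x 0.2 = 0.16). A minimal sketch of such a split follows. The shuffling, the seed, the helper name `split_indices`, and the sample count of 13000 (a stand-in for "about thirteen thousand") are all assumptions.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 64% / 16% / 20%."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(0.20 * n_samples))
    n_val = int(round(0.16 * n_samples))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


# Hypothetical dataset size; the real count is only given approximately.
train_idx, val_idx, test_idx = split_indices(13000)
```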
Visualizations of the input, ground truth, and model predictions are shown in Figure 8. To quantitatively assess the results, the intersection over union (IOU, or Jaccard’s index) is calculated. The model is able to achieve an IOU of 94.7% for the overlapping region, and 88.2% and 94.4% on the two chromosomes.
Figure 8: Comparison of prediction with ground truth
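For reference, a per-class IOU between a predicted and a ground truth label map can be computed as in the sketch below. How the reported scores were aggregated (per image or over the whole test set) is not stated, so the code only shows the metric on a single pair of maps. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def iou_per_class(pred, truth, num_classes=4):
    """Intersection over union (Jaccard index) for each class label."""
    scores = {}
    for c in range(num_classes):
        p = pred == c
        t = truth == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            scores[c] = float("nan")  # class absent from both maps
        else:
            scores[c] = float(np.logical_and(p, t).sum() / union)
    return scores


truth = np.array([[0, 1, 1],
                  [2, 3, 3],
                  [0, 0, 2]])
pred = np.array([[0, 1, 2],
                 [2, 3, 3],
                 [0, 0, 2]])

scores = iou_per_class(pred, truth)
```

In this toy case one pixel of class 1 is predicted as class 2, which lowers the IOU of both classes while leaving the overlap class (3) at 1.0.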
The deep learning model resulted in IOU scores of up to 94.7% on overlapping chromosomes. To improve the prediction results, the data set can be supplemented with images of single chromosomes and more than two overlapping chromosomes. Data augmentation can also include transformations such as rotations, reflections, and stretching. Additional hyperparameters can also be explored, such as sample weights, filter numbers, and layer numbers. Increasing convolution size may reduce misclassification between the red and green chromosomes.
To build a production system that can operate on entire microscope images, the model proposed in this paper can be combined with an object detection algorithm. First, the object detection algorithm can draw bounding boxes around chromosomes in an image. Then, an image segmentation algorithm, based on the model presented here, can identify and separate chromosomes.
References
[1] T. Arora and R. Dhir. A review of metaphase chromosome image selection techniques for automatic karyotype generation. Med Biol Eng Comput, 54(8):1147–1157, Aug 2016.
[2] G. Aubert, M. Hills, and P. M. Lansdorp. Telomere length measurement-caveats and a critical assessment of the available technologies and tools. Mutat. Res., 730(1-2):59–67, Feb 2012.
[3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
[4] M. A. Ferguson-Smith. History and evolution of cytogenetics. Mol Cytogenet, 8:19, 2015.
[5] J. M. Garcia-Sagredo. Fifty years of cytogenetics: a parallel view of the evolution of cytogenetics and genotoxicology. Biochim. Biophys. Acta, 1779(6-7):363–375, 2008.
[6] E. Grisan, E. Poletti, and A. Ruggeri. Automatic segmentation and disentangling of chromosomes in Q-band prometaphase images. IEEE Trans Inf Technol Biomed, 13(4):575–581, Jul 2009.
[7] L. Ji. Fully automatic chromosome segmentation. Cytometry, 17(3):196–208, Nov 1994.
[8] Pommier JP. Resolving overlapping chromosomes: an emblematic case, 2013.
[9] Pommier JP. Generating images of overlapping chromosomes. https://dip4fish.blogspot.fr/2016/06/generating-images-of-overlapping.html, 2016. [Online; accessed 19-2016-06-21].
[10] Pommier JP. Overlapping chromosomes, 2016.
[11] Pommier JP. Overlapping chromosomes. https://github.com/jeanpat/DeepFISH/tree/master/dataset, 2016.
[12] P. M. Lansdorp, N. P. Verwoerd, F. M. van de Rijke, V. Dragowska, M. T. Little, R. W. Dirks, A. K. Raap, and H. J. Tanke. Heterogeneity in telomere length of human chromosomes. Hum. Mol. Genet., 5(5):685–691, May 1996.
[13] T. Liehr. Cytogenetically visible copy number variations (CG-CNVs) in banding and molecular cytogenetics of human; about heteromorphisms and euchromatic variants. Mol Cytogenet, 9:5, 2016.
[14] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
[15] B. McClintock. A Cytological and Genetical Study of Triploid Maize. Genetics, 14(2):180–222, Mar 1929.
[16] Shervin Minaee, Mehran Fotouhi, and Babak Hossein Khalaj. A geometric approach for fully automatic chromosome segmentation, 2011.
[17] M. V. Munot, J. Mukherjee, and M. Joshi. A novel approach for efficient extrication of overlapping chromosomes in automated karyotyping. Med Biol Eng Comput, 51(12):1325–1338, Dec 2013.
[18] S. S. Poon, U. M. Martens, R. K. Ward, and P. M. Lansdorp. Telomere length measurements using digital fluorescence microscopy. Cytometry, 36(4):267–278, Aug 1999.
[19] M. Popescu, P. Gader, J. Keller, C. Klein, J. Stanley, and C. Caldwell. Automatic karyotyping of metaphase cells with overlapping chromosomes. Comput. Biol. Med., 29(1):61–82, Jan 1999.
[20] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[21] B. J. Trask. Human cytogenetics: 46 chromosomes, 46 years and counting. Nat. Rev. Genet., 3(10):769–778, 10 2002.
[22] E. Vera and M. A. Blasco. Beyond average: potential for measurement of short telomeres. Aging (Albany NY), 4(6):379–392, Jun 2012.
[23] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.