Semantic Segmentation of Seismic Images

Daniel Civitarese∗, Daniela Szwarcman, Emilio Vital Brazil and Bianca Zadrozny
IBM Research, PUC-Rio
{sallesd, daniszw, evital, biancaz}@br.ibm.com

∗ Contact Author

Abstract
Almost all work to understand Earth's subsurface on a large scale relies on the interpretation of seismic surveys by experts who segment the survey (usually a cube) into layers, a process that is very time-demanding. In this paper, we present a new deep neural network architecture specially designed to semantically segment seismic images with a minimal amount of training data. To achieve this, we make use of a transposed residual unit that replaces the traditional dilated convolution in the decode block. Also, instead of using a predefined shape for up-scaling, our network learns all the steps to upscale the features from the encoder. We train our neural network using the Penobscot 3D dataset, a real seismic dataset acquired offshore Nova Scotia, Canada. We compare our approach with two well-known deep neural network topologies: Fully Convolutional Network and U-Net. In our experiments, we show that our approach can achieve more than 99% of the mean intersection over union (mIOU) metric, outperforming the existing topologies. Moreover, our qualitative results show that the obtained model can produce masks very close to human interpretation, with very little discontinuity.
Introduction

Seismic surveys are the most used data to understand the subsurface nowadays; they are an indirect measure with a long and complex chain of computational processing. In one of the crucial steps in seismic interpretation, the expert must find the separation between two bodies of rock having different physical properties; this separation (the contact between two strata) is called a horizon. To mark the horizons, the geoscientist analyzes slices (images) of the seismic cube (survey) looking for visual patterns that could indicate the strata [Randen et al., 2000]. This task is time-consuming, and it overloads human interpreters as the amount of geophysical information is continually increasing. Typically, the expert must analyze more than 20% of all lines of the seismic cube, which can mean thousands of images in some recent surveys.

Both industry and academia have been advancing the performance of automated and semi-automated algorithms to extract features from seismic images [Randen et al., 2000]. In the past, most attempts to automate parts of the seismic facies analysis process relied mainly on traditional computer vision (CV) techniques. For instance, [Gao, 2011] studied the application of gray level co-occurrence matrix (GLCM) methods; some more recent work introduced other image processing techniques such as phase congruency [Shafiq et al., 2017], the gradient of texture (GoT) [Wang et al., 2016], local binary patterns (LBP) [Mattos et al., 2017], or even combinations of more than one method [Ferreira et al., 2016]. In the last two years, interest in applying machine learning techniques to process seismic data has increased, and last year there was even a workshop at NeurIPS called
Machine Learning for Geophysical & Geochemical Signals.

[Chevitarese et al., 2018d] present a study of seismic facies classification using convolutional neural networks. They test the models on two real-world seismic datasets [Baroni et al., 2018a; Baroni et al., 2018b] with promising results. In [Chevitarese et al., 2018b], the authors show how transfer learning may be used to accelerate and improve training on seismic images, and in [Chevitarese et al., 2018a] they applied the transfer technique to semantically segment rock layers.

However, all these articles lack a detailed discussion of the effect of different network topologies, training procedures, and parameter choices, being focused on the domain application rather than on the machine learning aspects. To address this lack of detail, [Chevitarese et al., 2018c] present deep learning models specifically for the task of classification of seismic facies, along with a detailed discussion of how different parameters may affect the model's performance. In that work, they present four different topologies, Danet1, Danet2, Danet3, and Danet4, that are able to reach 90% classification accuracy with less than 1% of the available data.
Seismic facies segmentation consists of generating dense pixel-wise predictions that delineate the horizons in seismic images. To the best of our knowledge, [Chevitarese et al., 2018a] presented the first study applying deep learning models to seismic facies segmentation. The authors in [Alaudah et al., 2019] presented similar work using the same dataset, based on deconvolutional neural networks, but with weaker results in the final segmentation.

In addition to the stratigraphic interpretation of an entire seismic cube, other works have demonstrated how CNNs can successfully segment fragments of seismic data. In [Waldeland et al., 2018; Shi et al., 2018; Zeng et al., 2018], a CNN is used to identify an entire salt structure, where the output image is very close to the ground truth. The authors in [Zeng et al., 2018; Karchevskiy et al., 2018] use a segmentation process to delineate salt domes in a given seismic volume.

[Chevitarese et al., 2018a] introduce a deep neural network topology for segmentation, based on existing work on fully convolutional networks [Long et al., 2015], showing good results on seismic data. However, they discuss neither the topology nor the training parameters. Furthermore, there is no performance comparison with network topologies from the semantic segmentation literature.

Here, we develop deep neural models specifically for the semantic segmentation task, using state-of-the-art concepts and tools to train and predict seismic facies efficiently. Our goal is to reduce the amount of labeled data the geoscientist has to produce to have all the slices in one cube correctly segmented. In other words, with a minimum number of annotated images, we want to produce efficient and good-quality segmentation in the rest of the cube. We explore the network topology to reduce the number of parameters and operations while still improving performance on test data.
Additionally, we analyze the performance impact of different input sizes and numbers of training examples. Finally, we compare our results to those obtained with well-known topologies applied to the seismic facies segmentation task in the literature.

We divide this paper into five sections. In the following section, we present the dataset and the pre-processing procedure. Next, in the Network Architectures section, we present the techniques used to design our models and discuss the network topologies. In the Experiments section, we describe training parameters and the experimental results. Finally, we present the conclusions and future work in the last section.

Dataset

In this section, we describe the dataset used in our experiments and the data preparation process.
The generation of seismic images is a sophisticated process which comprises many steps. The first one is data acquisition, which requires an intense sound source, such as an air gun, to direct sound waves into the ground. These waves pass through different layers of rock (strata) and are reflected, returning to the surface, where geophones or hydrophones record them. This signal is then processed in an iterative procedure to generate the seismic images. Finally, interpreters analyze the resulting images and discriminate the different categories, or facies [Mattos et al., 2017]. These categories represent the overall seismic characteristics of a rock unit that reflect its origin, differentiating this unit from the other ones around it [Mattos et al., 2017].

The Penobscot 3D Survey [Baroni et al., 2018b] is the publicly available dataset used in this work. It consists of a horizontal stack of 2D seismic images (slices), creating a 3D volume as depicted in Figure 1. The vertical axis of this volume represents depth. The remaining axes define the inline and crossline directions (red arrows in Figure 1). The inline slices are the images in the cube perpendicular to the inline direction, indicated by the blue blocks in Figure 1. The same idea applies to the crossline slices, which are images along the depth axis and perpendicular to the crossline axis. The Penobscot dataset contains 481 crossline and 601 inline slices.

It is important to mention that we used only inline slices in this work, and we removed corrupted or poor-quality images, as indicated by an expert. After removing these images, we ended up with 459 inline slices.

Figure 1: 3D seismic volume with the division highlighted in blocks of inline slices. The red arrows indicate the inline, crossline and depth directions. In each block, the first slices go to the training set and the rest to the validation set.
Geoscientists interpreted and annotated the slices, generating a label mask for each one. The experts interpreted seven horizons: H1, H2, H3, H4, H5, H6, and H7, numbered from the lowest depth to the highest. They divide the seismic cube into eight intervals with different pattern configurations. Figure 2b shows the 7 interpreted horizons, creating 8 classes. The geoscientists based their interpretation on configuration patterns that indicate geological factors like lithology, stratification, depositional systems, etc. [Brown Jr, 1980]. In the following list, we briefly explain the seismic facies of each of the horizon intervals based on the amplitude and continuity of reflectors.

0. ocean.
1. the facies unit is composed of high-amplitude reflectors. Although most of the reflectors are continuous, some have diving angles and others are truncated, evidencing a high-energy environment.
2. the package consists mostly of parallel, high-amplitude reflectors.
3. reflectors are predominantly subparallel and present varying amplitude.
4. reflectors between these horizons are continuous but have low amplitude, which makes it difficult to identify them.
5. facies unit containing parallel to subparallel reflectors, like the previous interval, but less continuous.
6. the facies unit is characterized by parallel to subparallel, continuous, high-amplitude reflectors.
7. the facies unit below the 7th horizon is characterized primarily by parallel, concordant, high-amplitude reflectors.

Figure 2: (a) Example of an inline slice (we cropped the bottom of the image). (b) The corresponding mask for the input image, showing the eight categories. (c) Tile examples (80 x 120 and 128 x 128) with their respective label masks. For better visualization, the tiles are not on the same scale as (a) and (b).
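The relation between interpreted horizons and class masks can be illustrated with a small sketch: the class of a pixel is simply the number of horizons lying above it. This is an illustrative reconstruction, not the authors' actual annotation pipeline; `mask_from_horizons` and its arguments are names of our own.

```python
import numpy as np

def mask_from_horizons(horizon_depths, height):
    """Build a class mask from horizon picks.

    horizon_depths: (n_horizons, width) array with the picked depth of
    each horizon per column (trace), ordered from shallowest to deepest.
    A pixel's class is the number of horizons at or above it: 0 (ocean)
    above the first horizon, 1 between the first and second, and so on.
    """
    horizon_depths = np.asarray(horizon_depths)
    depths = np.arange(height)[:, None, None]  # (height, 1, 1)
    # broadcast to (height, n_horizons, width) and count crossed horizons
    return (depths >= horizon_depths[None, :, :]).sum(axis=1)
```

With seven horizons this yields the eight intervals of Figure 2b.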
As discussed in [Mattos et al., 2017; Chopra and Alexeev, 2006], here we assume that one may identify different categories by their textural features. Therefore, a model that can distinguish textures in an image can be used to separate seismic facies.

In our segmentation task, we chose to break the seismic images into smaller tiles (Figure 2(c)), as the original ones are large and would require a more complex model to deal with such a big input size. Also, the original images have a float range around [-30.000, 33.000], and we decided to rescale them to the grayscale range of [0, 255].

Our dataset preparation comprises the following steps:
1. remove corrupted images (indicated by a domain expert);
2. split image files into training, validation and test sets;
3. process images, removing outliers and rescaling values between 0-255;
4. break processed images into tiles using a sliding-window mechanism that allows overlap.

As we want to simplify the interpreter's job, we must create training, validation and test sets that correspond to the nature of this task, i.e., we should use only data from a single seismic cube. We acknowledge that adjacent slices can be similar, since they represent seismic characteristics of neighboring regions. For the same reason, we also expect slices located far from each other in the volume to be less correlated. However, different cubes represent different regions that usually do not have interchangeable classes or characteristics. Therefore, for our application, we could not use data from other cubes to test overfitting.

Keeping that in mind, we used the following scheme to split the cube into training, validation and test sets (see Figure 1):
1. divide the cube into n blocks;
2. in each block, select the first 70% of slices for the training set and the remaining 30% for the validation set;
3.
(optionally) limit the total number of training slices to x and randomly select x/n slices in each block, so we can simulate limited-data scenarios.

Following the work in [Baroni et al., 2018b], we merged classes 2 and 3, as class 3 represents a very thin layer (about 20 pixels of depth). Therefore, the final number of classes is seven.

Network Architectures

In this paper, we present two new topologies built explicitly for the semantic segmentation of seismic data, Danet-FCN2 and Danet-FCN3. As mentioned before, this kind of image can be very different from common texture images, such as the images in the DTD dataset mentioned in the introduction, where tiles representing textures are larger and have colors that make classes more separable.

The proposed architectures use residual blocks [He et al., 2016] to extract features (encoder), and a novel transposed unit of the same structure to reconstruct the original image with its respective labels (decoder). In contrast to the previously developed Danet-FCN [Chevitarese et al., 2018a], these newer models do not have additional fused connections between the encoder and the decoder. In [Long et al., 2015], the authors incorporated skip connections in the topology to combine coarse, high-layer information with detailed, low-layer information. On the other hand, He and his colleagues show that shortcut connections and identity after-addition activation make information propagation smooth [He et al., 2016]. In this sense, we merge FCN and ResNet ideas by removing the shortcuts, as in FCN and U-Net, and by replacing the contracting path with residual units. For the expansive part, however, we had to create the inverse operation, which we call the transposed residual, since it did not exist.
In addition to a gain in performance, our new architecture, composed of residual blocks, already implements shortcut paths along with the skip connection, which makes fuse (FCN and U-Net) operations unnecessary.

In Figure 3 we compare all three Danet-FCN topologies, where c stands for a convolution layer, s is the stride, and ru stands for a residual unit. So, an entry c5 s2 stands for a convolution with filters of size 5 × 5 and stride 2.

Figure 3: The topologies of the three models. On the right, we present the description of each block.

The residual configuration we use in this work is the same one presented in [Chevitarese et al., 2018c], described by Equation 1:

x ← h(x) + F(x, θ)
y ← f(x)                                    (1)

where h(x) is an identity mapping, and f(x) is a ReLU [Krizhevsky et al., 2012] function.

Danet-FCN

[Chevitarese et al., 2018a] introduced this model, which is based on the VGG-FCN topology presented by [Long et al., 2015]. In [Long et al., 2015], the authors convert the fully connected layers at the end of the network structure into 1 × 1 convolutions, and append a transposed convolution to up-sample its input features to the input image size. We decided to revisit this network not only to compare it with the ones proposed here but also to extend the discussion about it. The main idea behind Danet-FCN is to have a simpler network (instead of using VGG-size structures) that leads to faster training times and less memory usage. In contrast to its relative, VGG-FCN, Danet-FCN only performs convolutions, and strides in its operations replace pooling to reduce data dimensionality. We believe that this is one of the reasons that made Danet-FCN perform remarkably well on the datasets on which it was tested.

Transposed Residual Units

In this work, we propose the use of transposed convolutions to create transposed residual units, producing a "mirror" structure similar to the one in U-Net [Ronneberger et al., 2015], as illustrated in Figure 4. Although this operation can be more expensive compared to dilated convolutions, we noticed that it can produce smoother results. The fact that the upscaling is learned instead of being predefined by the user may be one explanation for this. Another nice feature we obtained with this new residual structure is the creation of links between the encoder layers and the decoder ones, similar to the fuse we have in FCN-16 and FCN-8. Those links improve the segmentation results substantially, as discussed in [Long et al., 2015].
Figure 4: On the left we have the regular residual structure. On theright, we have the proposed transposed residual unit.
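The upsampling operation at the heart of the transposed residual unit can be sketched in a few lines. The single-channel NumPy sketch below is a simplification of our own (the actual units also include batch normalization, activations, and a shortcut path); it shows how a stride-2 transposed convolution scatters each input pixel through a kernel, roughly doubling the spatial size:

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=2):
    """Minimal single-channel transposed convolution (no padding).

    Each input pixel is multiplied by the kernel and accumulated into
    the output at a stride-spaced location, so the spatial size grows
    from (H, W) to (stride*(H-1)+kh, stride*(W-1)+kw).
    """
    H, W = x.shape
    kh, kw = kernel.shape
    out = np.zeros((stride * (H - 1) + kh, stride * (W - 1) + kw))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out
```

Because the kernel is a trainable weight, the network learns the upscaling instead of applying a predefined interpolation, which is the property we credit for the smoother results.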
Danet-FCN2

Figure 3 shows the entire structure of Danet-FCN2. It has a very compact topology, allowing it to learn from a minimal number of examples (see Table 2). We also noticed that this model is the fastest to train in most configurations.
Danet-FCN3
Our biggest topology is similar to Danet-FCN2, but comprises more residual units with more filters. Compared to U-Net, it has more parameters on the one hand, but fewer operations to compute on the other. A deterioration of the results is expected for smaller datasets, since Danet-FCN3 has many more parameters than its relatives Danet-FCN and Danet-FCN2.
Experiments

Following the idea presented in [Chevitarese et al., 2018c], we want to investigate whether the proposed topologies are useful in a limited-data context, which amounts to analyzing the influence of the number of slices available for training on the models' performance. Also, there are only a few small datasets publicly available for training in the Oil and Gas industry. It is important to highlight that the process of annotating seismic images is expensive and time-consuming. Hence, finding the minimum number of labeled images required to train a model successfully is valuable for the O&G industry. In this section, we describe the experiments we conducted to accomplish these goals.

We generated different tile datasets from the seismic images by varying the number of slices in the training set: 100, 13, 9 and 5 slices. This allows us to simulate limited-dataset scenarios and analyze the models' performance with a decreasing number of training examples. Based on the results presented in [Chevitarese et al., 2018a], we selected a tile size of 80 × 120 pixels. In addition, we generated tiles of 128 × 128 pixels so we could compare our networks with relevant models in the semantic segmentation literature, such as U-Net [Ronneberger et al., 2015], which expects larger input images. For both tile sizes, we allowed 50% overlap in each direction; that is, the sliding window skips half the tile size horizontally and half the tile size vertically. Table 1 summarizes all the generated tile datasets and shows the number of training examples in each one.

We compare our models with slightly modified networks from the literature: FCN16, FCN8 [Long et al., 2015] and U-Net [Ronneberger et al., 2015]. In their original applications, the models were designed to handle input images larger than our tiles, but considerably smaller than an entire seismic image.
Therefore, to fairly compare our models with FCN and U-Net, we decided to make minor changes in their architectures to accommodate the smaller or non-square tiles. For the FCN networks, we added a crop mechanism to correct the shapes of the feature maps in the upsampling part. For the selected tile sizes, we cropped at most 1 pixel from the feature maps, which we consider will not affect the results. For U-Net, we changed the unpadded convolutions to padded convolutions, keeping the rest of the structure intact.

It is important to mention that, of the selected models from the literature, we only trained FCN-8 using both tile sizes. The other ones would require further changes, or cropping a significant number of pixels, to comply with the smaller input size.

In all experiments, we fixed the mini-batch size, the weight decay coefficient, and the maximum number of epochs. We also used the RMSProp optimization algorithm [Tieleman and Hinton, 2012], with fixed values of ν, ϕ, and ε, in all training sessions. The Xavier method [Glorot and Bengio, 2010] of initialization was used for all convolutional kernels, while the biases were initialized with zeros. The Danet models have batch normalization layers, with fixed ε and momentum (µ) parameters. We also used a learning rate (lr) schedule for the Danet models, to take advantage of the batch normalization benefits: starting with a relatively high lr, we performed a stepwise decrease at three fixed epochs.

For the FCN and U-Net models, we used a constant learning rate, and dropout, as indicated in the original papers. Also, for the FCN models, we loaded the weights of a VGG16 network previously trained on the ImageNet dataset [Russakovsky and others, 2015].

We trained the models using TensorFlow 1.9 on 2 or 4 K80 GPUs (depending on the number of examples) in a Power8 node.
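A stepwise learning-rate decrease like the one used for the Danet models can be expressed as a small helper. The boundary epochs and rates below are hypothetical placeholders, not the paper's exact settings:

```python
def stepwise_lr(epoch, base_lr, boundaries):
    """Return the learning rate for an epoch under stepwise decay.

    boundaries maps an epoch threshold to the rate used from that
    epoch onward; before the first threshold, base_lr is used.
    """
    lr = base_lr
    for boundary, new_lr in sorted(boundaries.items()):
        if epoch >= boundary:
            lr = new_lr
    return lr

# hypothetical schedule: decrease by 10x at epochs 10, 20, and 30
schedule = {10: 1e-3, 20: 1e-4, 30: 1e-5}
```

Frameworks provide equivalents (e.g. piecewise-constant decay in TensorFlow), but the helper makes the schedule's semantics explicit.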
Also, we adopted the mean intersection over union (mIOU) on the validation dataset as the metric to select the best model during training. One should notice that we calculated the mIOU by averaging the IOU over all classes with the same weights, even though the classes are not balanced.

To compare the trained models, we used the 40 images selected for the test set, as mentioned before. Our test procedure for all trained models consists of the following steps:
1. for each image, we break it into non-overlapping tiles of the same size the model was trained with;
Table 1: Datasets generated for the experiments.

Tile size     Slices:  100    13    9     5
80 × 120      Tiles:   15392  2000  1376  768
128 × 128     Tiles:   7792   1008  688   384

2. we predict the tile masks with the trained model;
3. the entire mask for that image is assembled by uniting the predicted mask tiles;
4. once the entire predicted mask is formed, we calculate the IOU for each class in that image and average the results to get the mIOU for the image;
5. finally, when we have the mIOU for all 40 images, we average the results to get the overall response on the test set, which we name here mmIOU.

The test procedure enables us to compare the models both quantitatively (mmIOU) and qualitatively (predicted masks). Table 2 summarizes the mmIOU for all the experiments.
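The test procedure can be sketched as follows. This is an illustrative NumPy reconstruction of our own (function names are not from a library), assuming image dimensions that are exact multiples of the tile size:

```python
import numpy as np

def split_tiles(img, th, tw):
    """Step 1: break an image into non-overlapping th x tw tiles."""
    H, W = img.shape
    return [img[i:i + th, j:j + tw]
            for i in range(0, H, th) for j in range(0, W, tw)]

def join_tiles(tiles, H, W, th, tw):
    """Step 3: reassemble predicted tile masks into the full mask."""
    out = np.zeros((H, W), dtype=tiles[0].dtype)
    it = iter(tiles)
    for i in range(0, H, th):
        for j in range(0, W, tw):
            out[i:i + th, j:j + tw] = next(it)
    return out

def miou(pred, true, n_classes):
    """Step 4: per-class IOU averaged with equal weights."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, true == c).sum()
        union = np.logical_or(pred == c, true == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

Step 5 (mmIOU) is then just the mean of `miou` over the 40 test images.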
Table 2: Experiment results.

Image size   Model      Number of slices (mmIOU)
                        5        9        13       100
80 × 120     Danet-FCN
             FCN8       0.5819   0.7196   0.7389   0.9364
128 × 128    Danet-FCN
             FCN8       0.4497   0.5790   0.6605   0.9057
             FCN16      0.1524   0.4997   0.5950   0.9369
             U-Net      0.1397   0.1431   0.1371   0.1880
For the tile size of 80 × 120 pixels, all Danet models presented a mmIOU higher than 80%, even with the very limited dataset of 5 slices. One can notice that the results deteriorate as the number of examples decreases, but this effect is stronger for FCN8 and FCN16. In the non-limited context (100 slices in the training set), both FCN8 and FCN16 have a performance similar to the Danet models, but Danet-FCN3 beats them for both tile sizes. On the other hand, we were not able to effectively train the U-Net model with our datasets, as shown in the last line of Table 2. The training loss did not converge and the results were poor. Further investigation is required to understand the reason for this; a first assumption would be that the input size is still too small for this network.

The qualitative results for Danet-FCN3 can be seen in Figure 5. In the 100-slice case, the predicted mask is almost identical to the original one and, in the limited-data context, the result is still very close to the mask generated by a human expert. Table 3 complements Figure 5, as it shows the IOU for each class separately. From these values, we can see that the classes most affected by the reduced amount of examples are classes 1, 3 and 5, but the other ones still maintain an IOU higher than 80%. The predicted masks, along with the high values of mIOU, are a strong indication that our topologies are well suited for the semantic segmentation of seismic images, even with a restricted number of training examples available.

Figure 5: Masks generated by Danet-FCN3 on the test set, for tiles of size 80 x 120. (a) Best results for the 5 slices case (mIOU = 0.8962). (b) Worst results for the 5 slices case (mIOU = 0.7921).
Table 3: IOU per class for Danet-FCN3 and 80 × 120 tiles.

                        IOU per class
slices  case    0      1      2      3      4      5      6      mIOU
5       best    0.888  0.531  0.983  0.887  0.997  0.989  0.998  0.896
        worst   0.889  0.350  0.897  0.596  0.968  0.855  0.989  0.792
100     best    0.991  0.991  0.999  0.990  1.000  0.998  1.000  0.995
        worst   0.970  0.983  0.994  0.981  0.999  0.995  0.995  0.988
Table 4: Computational resources.

Model                     Danet-FCN  Danet-FCN2  Danet-FCN3  FCN8     FCN16    U-Net
Millions of parameters    4.46       6.66        39.2        134.31   134.34   31.03
Millions of operations
(tile = 80 x 120)         3213       1880        10572       9032     -        -
Millions of operations
(tile = 128 x 128)        5477       3140        17675       14094    14093    24133
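The parameter counts in Table 4 follow directly from the layer shapes. As a sanity-check sketch (a helper of our own, not the authors' code), the parameters of a single convolution layer are:

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution: one k*k*c_in kernel per
    output filter, plus an optional bias per filter."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# e.g. a 3x3 convolution inside a residual unit with 128 filters:
n = conv_params(3, 128, 128)  # 3*3*128*128 + 128 = 147584
```

Summing such terms over all layers explains the magnitudes above; for instance, FCN8 and FCN16 inherit VGG16's roughly 134 million parameters, most of them in the converted fully connected layers.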
Table 4 compares the models by the number of parameters and operations. Danet-FCN has the lowest number of parameters and, at the same time, is the best-performing model in the 5-slice training case for both tile sizes. Danet-FCN3 and U-Net have parameter counts in the same order of magnitude, but U-Net has significantly more operations. In this case, we were able to take advantage of the deep residual network and get excellent results while keeping the number of operations similar to what we find in FCN and U-Net.

Comparing Table 4 with Table 2, we can conclude that Danet-FCN2 offers the best balance between performance and efficiency. It has the lowest number of operations and almost 5 times fewer parameters than U-Net, while presenting mmIOU above 89% in all cases with 80 × 120 tiles.

Conclusion

In this work, we designed and compared deep neural models specifically for the task of semantic segmentation of seismic images, which have different characteristics than standard images. Our proposed networks combine features from existing architectures with a novel one: the use of transposed convolutions to create transposed residual units, producing a "mirror" structure. To the best of our knowledge this feature is novel, and it could be tested as part of neural models for other types of images and even for other tasks.

Our experiments show that our proposed models lead to significantly better results than the semantic segmentation neural models designed for standard images. Furthermore, we evaluated these models under the realistic condition that there would be very few examples for training (as few as 5 slices) and showed that we can still achieve high values of the intersection over union metric and, through visual inspection, see a good reproduction of the original masks.

For future work, we want to test the proposed topologies on new seismic datasets.
Also, it is important to experimentwith more seismic layers being considered by the model, andthen increasing the number of facies (classes) recognized bythe network. Finally, we want to include to the current data(texture only) the geophysical properties (e.g., velocity). eferences [Alaudah et al. , 2019] Y. Alaudah, P. Michalowicz, M. Al-farraj, and G. AlRegib. A machine learning benchmarkfor facies classification. arXiv:1901.07659 , 2019.[Baroni et al. , 2018a] L. Baroni, R. M. Silva, R. S. Fer-reira, D. Chevitarese, D. Szwarcman, and E. Vital Brazil.Netherlands f3 interpretation dataset, September 2018.[Baroni et al. , 2018b] L. Baroni, R. M. Silva, R. S. Ferreira,D. Chevitarese, D. Szwarcman, and E. Vital Brazil. Penob-scot interpretation dataset, July 2018.[Brown Jr, 1980] L. F. Brown Jr. Seismic stratigraphic inter-pretation and petroleum exploration.
Course Notes AAPG ,1(16):181p, 1980.[Chevitarese et al. , 2018a] D. Chevitarese, D. Szwarcman,R. M. D Silva, and E. Vital Brazil. Seismic facies seg-mentation using deep learning. In
American Associationof Petroleum Geologists (AAPG) , 2018.[Chevitarese et al. , 2018b] D. Chevitarese, D. Szwarcman,R. M. D Silva, and E. Vital Brazil. Transfer learning ap-plied to seismic images classification. In
AAPG , 2018.[Chevitarese et al. , 2018c] D. S. Chevitarese, D. Szwarc-man, E. V. Brazil, and B. Zadrozny. Efficient Classificationof Seismic Textures. In
IJCNN , jul 2018.[Chevitarese et al. , 2018d] D. S. Chevitarese, D. Szwarc-man, R. M. Gama e Silva, and E. Vital Brazil. Deep learn-ing applied to seismic facies classification: a methodologyfor training. In
European Association of Geoscientists andEngineers (EAGE) , 2018.[Chopra and Alexeev, 2006] S. Chopra and V. Alexeev. Ap-plications of texture attribute analysis to 3d seismic data.
The Leading Edge , 25(8):934–940, 2006.[Ferreira et al. , 2016] R. D. S. Ferreira, A. B. Mattos, E. Vi-tal Brazil, R. Cerqueira, M. Ferraz, and S. Cersosimo.Multi-scale evaluation of texture features for salt dome de-tection. In
IEEE ISM , pages 632–635, 2016.[Gao, 2011] Dengliang Gao. Latest developments in seismictexture analysis for subsurface structure, facies, and reser-voir characterization: A review.
Geophysics , 2011.[Glorot and Bengio, 2010] X. Glorot and Y. Bengio. Under-standing the difficulty of training deep feedforward neuralnetworks. In , volume 9 of
PMLR , pages 249–256, May 2010.[He et al. , 2016] K. He, X. Zhang, S. Ren, and J. Sun. Deepresidual learning for image recognition. In
CVPR , pages770–778, 2016.[Karchevskiy et al. , 2018] M. Karchevskiy, I. Ashrapov, andL. Kozinkin. Automatic salt deposits segmentation: Adeep learning approach. arXiv:1812.01429 , 2018.[Krizhevsky et al. , 2012] A. Krizhevsky, I. Sutskever, andG. E. Hinton. Imagenet classification with deep convolu-tional neural networks. In
Advances in Neural InformationProcessing Systems 25 , pages 1097–1105. Curran Asso-ciates, Inc., 2012. [Long et al. , 2015] J. Long, E. Shelhamer, and T. Darrell.Fully convolutional networks for semantic segmentation.In
CVPR , pages 3431–3440, 2015.[Mattos et al. , 2017] A. B. Mattos, R. S. Ferreira, R. M.Silva, M. Riva, and E. Vital Brazil. Assessing texture de-scriptors for seismic image retrieval. In ,pages 292–299, Oct 2017.[Randen et al. , 2000] T. Randen, E. Monsen, C. Signer,A. Abrahamsen, J. O. Hansen, T. Sæter, and J. Schlaf.Three-dimensional texture attributes for seismic data anal-ysis. In
SEG Technical Program Expanded Abstracts2000 , pages 668–671. 2000.[Ronneberger et al. , 2015] O. Ronneberger, P. Fischer, andT. Brox. U-net: Convolutional networks for biomedicalimage segmentation. In
International Conference on Med-ical image computing and computer-assisted intervention ,pages 234–241. Springer, 2015.[Russakovsky and others, 2015] Olga Russakovsky et al.ImageNet Large Scale Visual Recognition Challenge.
IJCV , 115(3):211–252, 2015.[Shafiq et al. , 2017] M. A. Shafiq, Y. Alaudah, G. AlRegib,and M. Deriche. Phase congruency for image understand-ing with applications in computational seismic interpreta-tion. In
IEEE ICASSP , pages 1587–1591, 2017.[Shi et al. , 2018] Y. Shi, X. Wu, and S. Fomel. Automaticsalt-body classification using a deep convolutional neuralnetwork. In
SEG Technical Program Expanded Abstracts2018 , pages 1971–1975. 2018.[Tieleman and Hinton, 2012] T. Tieleman and G. Hinton.Lecture 6.5—RmsProp. COURSERA: Neural Networksfor Machine Learning, 2012.[Waldeland et al. , 2018] A. U Waldeland, A. C. Jensen,L. Gelius, and A. H. S. Solberg. Convolutional neural net-works for automated seismic interpretation.
The LeadingEdge , 37(7):529–537, 2018.[Wang et al. , 2016] Z. Wang, Z. Long, and G. AlRegib.Tensor-based subspace learning for tracking salt-domeboundaries constrained by seismic attributes. In
IEEEICASSP , pages 1125–1129, 2016.[Zeng et al. , 2018] Y. Zeng, K. Jiang, and J. Chen. Auto-matic seismic salt interpretation with deep convolutionalneural networks. arXiv:1812.01101arXiv:1812.01101